R - find/replace line breaks using regex
I'm trying to clean a bunch of .txt files in a folder using regex. I can't seem to get R to find line breaks.
This is the code I'm using. It works for character substitution, but not for line breaks.
gsub_dir(dir = "folder_name", pattern = "\n", replacement = "#")
I've also tried \r and various other permutations. Using a plain text editor, I can find all the line breaks with \n.
r regex
Actually I think you would need "\\n", but it's hard to test.
– NelsonGon
Mar 21 at 15:59
Like this maybe (I haven't used cat): test <- paste("This is a \n", "test"); test; gsub("\\n", "", test). Although in this case using "\\n" might not make a difference.
– NelsonGon
Mar 21 at 16:01
fortunes::fortune(365): "When in doubt, keep adding slashes until it works."
– Gregor
Mar 21 at 16:02
You also might see a significant speed up if you use the fixed = TRUE argument. You don't actually need regex, you're only looking for exact matches.
– Gregor
Mar 21 at 16:04
"\\n" did not work; you are right that I don't need regex for this example, but I do need regex + line break for the project.
– Will Hanley
2 days ago
asked Mar 21 at 15:54
Will Hanley
1 Answer
You can't do that with xfun::gsub_dir.
Have a look at the source code:
- The files are read in using read_utf8, which basically executes x = readLines(con, encoding = 'UTF-8', warn = FALSE),
- Then, gsub is fed with these lines, and when all replacements are done,
- The write_utf8 function concatenates the lines... with the LF (newline) symbol.
As a result, gsub only ever sees single lines whose terminators have already been stripped, so a pattern containing a line break has nothing to match.
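To see why a per-line substitution cannot find line breaks, here is a minimal sketch using only base R. The temporary file and the variable names are illustrative assumptions, not part of xfun.

```r
# Write a small two-line file to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

# readLines() returns one string per line, without the line terminators.
x <- readLines(tmp, encoding = "UTF-8", warn = FALSE)

# No element contains a newline, so the substitution leaves every line unchanged.
has_newline <- grepl("\n", x, fixed = TRUE)
replaced <- gsub("\n", "#", x)
```

Both elements of has_newline are FALSE here, and replaced is identical to x.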
You need to use a custom function for that. Here is a "quick and dirty" one that will replace all LF symbols with #:

    lbr_change_gsub_dir = function(newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
      files = list.files(dir, full.names = TRUE, recursive = recursive)
      for (f in files) {
        # Read the file as a vector of lines (line terminators are dropped)
        x = readLines(f, encoding = encoding, warn = FALSE)
        # Write the lines back, joined with the chosen separator
        cat(x, sep = newline, file = f)
      }
    }
    folder <- "C:\\MyFolder\\Here"
    lbr_change_gsub_dir(newline = "#", dir = folder)
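As a rough illustration of what this read-then-rejoin step does to a single file, the following sketch runs the same two calls on a temporary file. The file and its contents are made up for the example.

```r
# Create a three-line sample file (hypothetical content).
tmp <- tempfile(fileext = ".txt")
writeLines(c("a", "b", "c"), tmp)

# Read the lines, then write them back joined by "#" instead of a line break.
x <- readLines(tmp, encoding = "UTF-8", warn = FALSE)
cat(x, sep = "#", file = tmp)

# The file now holds a single line: "a#b#c".
joined <- readLines(tmp, warn = FALSE)
```

Note that cat() does not append a trailing newline, so the rewritten file ends right after the last element.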
If you want to be able to match multiline patterns, paste the lines, collapsing them with newline, and use any pattern you like:

    lbr_gsub_dir = function(pattern, replacement, perl = TRUE, newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
      files = list.files(dir, full.names = TRUE, recursive = recursive)
      for (f in files) {
        x <- readLines(f, encoding = encoding, warn = FALSE)
        # Collapse all lines into one string so the pattern can span line breaks
        x <- paste(x, collapse = newline)
        x <- gsub(pattern, replacement, x, perl = perl)
        cat(x, file = f)
      }
    }
    folder <- "C:\\1"
    lbr_gsub_dir("(?m)\\d+\\R(.+)", "\\1", dir = folder)

This removes each run of digits together with the line break that follows it, keeping the text of the next line.
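The collapse-then-substitute idea can be tried safely on a throwaway folder. The sketch below is one way to do that under stated assumptions: replace_across_lines is a hypothetical helper name, the demo folder and file contents are invented, and writeLines is used instead of cat so the file ends with a newline.

```r
# Hypothetical helper: apply a regex across line breaks to every file in a folder.
# Assumes the folder contains only plain text files (no subdirectories).
replace_across_lines <- function(pattern, replacement, dir = ".") {
  files <- list.files(dir, full.names = TRUE)
  for (f in files) {
    lines <- readLines(f, encoding = "UTF-8", warn = FALSE)
    # Join the lines into a single string so "\n" is visible to the regex.
    text <- paste(lines, collapse = "\n")
    text <- gsub(pattern, replacement, text, perl = TRUE)
    # writeLines() adds a final newline after the text.
    writeLines(text, f)
  }
  invisible(files)
}

# Build a temporary folder with one sample file.
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
demo_file <- file.path(demo_dir, "sample.txt")
writeLines(c("123", "abc", "456", "def"), demo_file)

# Drop each digit run plus the line break after it, keeping the next line.
replace_across_lines("\\d+\\R(.+)", "\\1", dir = demo_dir)
result <- readLines(demo_file, warn = FALSE)
# result is c("abc", "def")
```

Working on a copy of the real folder first is a sensible precaution, since the files are overwritten in place.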
Thank you -- this works in answer to my narrow question. I still can't figure out my broader problem, which is how to use regex including line breaks over a folder of text files. I will post a new question about that.
– Will Hanley
2 days ago
@WillHanley Please note that all you need is to paste the lines. See the updated answer.
– Wiktor Stribiżew
2 days ago
I am still unsure how to do what I want to do, so I posted a question that I hope is clearer: stackoverflow.com/questions/55345453/…
– Will Hanley
2 days ago
edited 2 days ago
answered Mar 21 at 18:38
Wiktor Stribiżew