R - find/replace line breaks using regex

I'm trying to clean a bunch of .txt files in a folder using regex. I can't seem to get R to find line breaks.



This is the code I'm using. It works for character substitution, but not for line breaks.



gsub_dir(dir = "folder_name", pattern = "\n", replacement = "#")


I've also tried \r and various other permutations. Using a plain text editor, I can find all the line breaks with \n.










r regex






asked Mar 21 at 15:54









Will Hanley













  • Actually I think you would need "\\n" but it's hard to test.

    – NelsonGon
    Mar 21 at 15:59











  • Like this, maybe (I haven't used cat): test <- paste("This is a \n", "test"); test; gsub("\\n", "", test). Although in this case, using "\n" might not make a difference.

    – NelsonGon
    Mar 21 at 16:01







  • fortunes::fortune(365) When in doubt, keep adding slashes until it works.

    – Gregor
    Mar 21 at 16:02






  • You also might see a significant speedup if you use the fixed = TRUE argument. You don't actually need regex; you're only looking for exact matches.

    – Gregor
    Mar 21 at 16:04












  • "\\n" did not work; you are right that I don't need regex for this example, but I do need regex + line breaks for the project.

    – Will Hanley
    2 days ago


















1 Answer
You can't do that with xfun::gsub_dir.



Have a look at the source code:



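  • The files are read in with read_utf8, which essentially executes x = readLines(con, encoding = 'UTF-8', warn = FALSE), so every line becomes a separate string with its line break stripped;

  • gsub is then applied to that character vector element by element, so the text it searches never contains a line break;

  • Finally, write_utf8 joins the lines back together with LF (newline) characters when it writes the file.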
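To see this effect without xfun, here is a small base-R sketch. It writes a made-up two-line file to a temporary location (standing in for one of the .txt files), reads it back with readLines() the way read_utf8 does, and shows that "\n" only matches once the lines are pasted back together:

```r
# A made-up two-line file standing in for one of the .txt files
f <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), f)

# Like read_utf8(), readLines() returns one element per line, with the
# line terminators (LF, CRLF or CR) already stripped
x <- readLines(f, encoding = "UTF-8", warn = FALSE)
# x is c("first line", "second line")

# No element contains "\n", so this replacement changes nothing
per_line <- gsub("\n", "#", x, fixed = TRUE)
unchanged <- identical(per_line, x)  # TRUE

# Pasting the lines back together re-inserts "\n", which can now be matched
whole <- gsub("\n", "#", paste(x, collapse = "\n"), fixed = TRUE)
# whole is "first line#second line"
```

The same reasoning explains why "\r" does not help either: readLines() accepts LF, CRLF and CR as line endings and removes all of them.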

You need a custom function for that. Here is a "quick and dirty" one that replaces every line break with #:



lbr_change_gsub_dir = function(newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
  files = list.files(dir, full.names = TRUE, recursive = recursive)
  for (f in files) {
    x = readLines(f, encoding = encoding, warn = FALSE)  # one element per line, breaks removed
    cat(x, sep = newline, file = f)                       # write back, joining lines with `newline`
  }
}

folder <- "C:\\MyFolder\\Here"
lbr_change_gsub_dir(newline="#", dir=folder)
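Since the question concerns a folder of .txt files, you may prefer to leave any other files in that folder untouched. The sketch below is a hypothetical variant (the name lbr_change_txt_dir is made up for illustration) that adds a pattern filter to list.files() and is exercised on a throwaway folder under tempdir() rather than on a real path:

```r
# Hypothetical variant of lbr_change_gsub_dir() that only rewrites .txt files
lbr_change_txt_dir <- function(newline = "\n", encoding = "UTF-8",
                               dir = ".", recursive = TRUE) {
  # Keep only file names ending in .txt (case-insensitive)
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE,
                      recursive = recursive, ignore.case = TRUE)
  for (f in files) {
    x <- readLines(f, encoding = encoding, warn = FALSE)
    # cat() writes `newline` between elements, i.e. where the line breaks were
    cat(x, sep = newline, file = f)
  }
  invisible(files)  # return the processed paths without printing them
}

# Usage on a throwaway folder (contents made up for illustration)
folder <- file.path(tempdir(), "lbr_demo")
dir.create(folder, showWarnings = FALSE)
writeLines(c("a", "b", "c"), file.path(folder, "one.txt"))
writeLines(c("keep", "me"), file.path(folder, "notes.md"))

lbr_change_txt_dir(newline = "#", dir = folder)
txt_after <- readLines(file.path(folder, "one.txt"), warn = FALSE)  # "a#b#c"
md_after <- readLines(file.path(folder, "notes.md"), warn = FALSE)  # unchanged
```

Because the files are overwritten in place, it is worth trying this on a copy of the folder first.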


If you want to match patterns that span several lines, paste the lines together (collapsing them with a newline) and then use any pattern you like:



lbr_gsub_dir = function(pattern, replacement, perl = TRUE, newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
  files = list.files(dir, full.names = TRUE, recursive = recursive)
  for (f in files) {
    x <- readLines(f, encoding = encoding, warn = FALSE)
    x <- paste(x, collapse = newline)   # one string per file, line breaks restored
    x <- gsub(pattern, replacement, x, perl = perl)
    cat(x, file = f)
  }
}

folder <- "C:\\1"
lbr_gsub_dir("(?m)^\\d+\\R(.+)", "\\1", dir = folder)


This removes every digit-only line (together with its line break), keeping the non-empty line that follows it.
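The collapse-then-gsub idea can be tried on an in-memory vector before touching any files. In this sketch the sample lines are invented, and the pattern anchors \d+ with ^ so that only digit-only lines are affected; with perl = TRUE, (?m) makes ^ match at the start of every line and \R matches a line break:

```r
# Sample lines, as readLines() would return them (made up for illustration)
x <- c("Chapter", "12", "Some text", "More text", "345", "Next part")

# Step 1: restore the line breaks by collapsing everything into one string
s <- paste(x, collapse = "\n")

# Step 2: drop each digit-only line together with its line break,
# keeping the line that follows it (captured as group 1)
out <- gsub("(?m)^\\d+\\R(.+)", "\\1", s, perl = TRUE)

# Step 3: split again, only to inspect the result line by line
result <- strsplit(out, "\n", fixed = TRUE)[[1]]
# result is c("Chapter", "Some text", "More text", "Next part")
```

Splitting at the end is only for inspection; lbr_gsub_dir() writes the collapsed string straight back with cat().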






edited 2 days ago

























answered Mar 21 at 18:38









Wiktor Stribiżew













  • Thank you -- this works in answer to my narrow question. I still can't figure out my broader problem, which is how to use regex including line breaks over a folder of text files. I will post a new question about that.

    – Will Hanley
    2 days ago






  • @WillHanley Please note that all you need is to paste the lines. See the updated answer.

    – Wiktor Stribiżew
    2 days ago











  • I am still unsure how to do what I want to do--posted a question that I hope is clearer: stackoverflow.com/questions/55345453/…

    – Will Hanley
    2 days ago

















