R - find/replace line breaks using regex

I'm trying to clean a bunch of .txt files in a folder using regex. I can't seem to get R to find line breaks.

This is the code I'm using. It works for character substitution, but not for line breaks.

gsub_dir(dir = "folder_name", pattern = "\n", replacement = "#")

I've also tried \r and various other permutations. Using a plain text editor I find all the line breaks with \n.

asked Mar 21 at 15:54

Will Hanley


Actually I think you would need "\n" but it's hard to test.

– NelsonGon
Mar 21 at 15:59

Like this, maybe (I haven't used cat): test <- paste("This is a \n", "test"); test; gsub("\n", "", test). Although in this case using "\n" might not make a difference.

– NelsonGon
Mar 21 at 16:01

5

fortunes::fortune(365) When in doubt, keep adding slashes until it works.

– Gregor
Mar 21 at 16:02

2

You also might see a significant speed up if you use the fixed = TRUE argument. You don't actually need regex, you're only looking for exact matches.

– Gregor
Mar 21 at 16:04

"\n" did not work; you are right that I don't need regex for this example but I do need regex + line break for the project.

– Will Hanley
2 days ago

r regex

1 Answer
You can't do that with xfun::gsub_dir.

Have a look at the source code:

The files are read in using read_utf8, which basically executes x = readLines(con, encoding = 'UTF-8', warn = FALSE), so each line becomes a separate element with its line break stripped.

Then gsub is fed these lines one at a time, and when all replacements are done,

the write_utf8 function concatenates the lines again with the LF (newline) symbol.
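
The effect described above can be reproduced without xfun. This is only a minimal sketch on a throwaway temp file (the file contents and variable names are made up for illustration): readLines() returns one element per line with the terminators dropped, so a per-line gsub() has no "\n" left to find, while the same pattern matches once the lines are collapsed into a single string.

```r
# Write a small three-line file to a temporary location (hypothetical sample data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("first", "second", "third"), tmp)

# readLines() splits on line breaks and drops them from the result.
x <- readLines(tmp, encoding = "UTF-8", warn = FALSE)
length(x)            # one element per line
any(grepl("\n", x))  # FALSE: no element contains a line break

# gsub() applied line by line therefore leaves the text unchanged.
per_line <- gsub("\n", "#", x)
identical(per_line, x)

# Collapsing into a single string puts the line breaks back as characters,
# so the same pattern now has something to match.
joined <- paste(x, collapse = "\n")
replaced <- gsub("\n", "#", joined)
replaced             # "first#second#third"

unlink(tmp)
```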

You need to use a custom function for that. Here is a "quick and dirty" one that will replace all LF symbols with #:

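lbr_change_gsub_dir = function(newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
  # Collect every file under dir (including subfolders when recursive = TRUE)
  files = list.files(dir, full.names = TRUE, recursive = recursive)
  for (f in files) {
    # readLines() returns the lines without their line breaks
    x = readLines(f, encoding = encoding, warn = FALSE)
    # Write the lines back in place, joined with the chosen separator
    cat(x, sep = newline, file = f)
  }
}

# Backslashes in Windows paths must be doubled inside R strings
folder <- "C:\\MyFolder\\Here"
lbr_change_gsub_dir(newline="#", dir=folder)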

If you want to be able to match multiline patterns, paste the lines together, collapsing them with a newline, and use any pattern you like:

lbr_gsub_dir = function(pattern, replacement, perl = TRUE, newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
  files = list.files(dir, full.names = TRUE, recursive = recursive)
  for (f in files) {
    x <- readLines(f, encoding = encoding, warn = FALSE)
    # Collapse the lines into one string so the pattern can span line breaks
    x <- paste(x, collapse = newline)
    x <- gsub(pattern, replacement, x, perl = perl)
    # Overwrite the file with the modified text
    cat(x, file = f)
  }
}

# Backslashes are doubled inside R strings, both in the path and in the regex
folder <- "C:\\1"
lbr_gsub_dir("(?m)\\d+\\R(.+)", "\\1", dir = folder)

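This removes each run of digits at the end of a line together with the line break that follows it, so digit-only lines are dropped while the (non-empty) line after each one is kept.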
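
Because the function overwrites files in place, it may be worth trying it on a scratch copy first. The sketch below restates the multiline idea with braces and escaped backslashes and runs it on a temporary directory; the directory name demo_dir, the file sample.txt and its contents are hypothetical and only serve the illustration.

```r
# Restatement of the collapse-then-gsub approach (same logic as lbr_gsub_dir).
lbr_gsub_dir <- function(pattern, replacement, perl = TRUE, newline = "\n",
                         encoding = "UTF-8", dir = ".", recursive = TRUE) {
  files <- list.files(dir, full.names = TRUE, recursive = recursive)
  for (f in files) {
    x <- readLines(f, encoding = encoding, warn = FALSE)
    x <- paste(x, collapse = newline)              # one string per file
    x <- gsub(pattern, replacement, x, perl = perl)
    cat(x, file = f)                               # overwrite in place
  }
}

# Build a scratch directory with one sample file (made-up contents).
demo_dir <- tempfile("lbr_demo_")
dir.create(demo_dir)
sample_file <- file.path(demo_dir, "sample.txt")
writeLines(c("12", "keep me", "abc", "7", "also kept"), sample_file)

# Digits, a line break, then the rest of the next line; keep only group 1.
lbr_gsub_dir("\\d+\\R(.+)", "\\1", dir = demo_dir)

result <- readLines(sample_file, warn = FALSE)
print(result)  # "keep me" "abc" "also kept"

unlink(demo_dir, recursive = TRUE)
```

Note that cat(x, file = f) writes the text without a trailing line break, which is why the sample is read back with warn = FALSE.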

edited 2 days ago

answered Mar 21 at 18:38

Wiktor Stribiżew


Thank you -- this works in answer to my narrow question. I still can't figure out my broader problem, which is how to use regex including line breaks over a folder of text files. I will post a new question about that.

– Will Hanley
2 days ago

1

@WillHanley Please note that all you need is to paste the lines. See the updated answer.

– Wiktor Stribiżew
2 days ago

I am still unsure how to do what I want to do--posted a question that I hope is clearer: stackoverflow.com/questions/55345453/…

– Will Hanley
2 days ago
