Remove everything that doesn't match regex patterns in Python
I have a regex pattern that identifies dates in a column. Some of the dates are embedded in a longer string, while others are plain dates by themselves. The pattern finds every date, but now I want to be able to say "remove everything that doesn't fit the date pattern", which would get rid of the text in front of or behind some dates.
Example of the stuff I want gone:
Mexico [12/20/1985]
If I could remove what doesn't match the pattern, the brackets and "Mexico" would go away.
Say my regex pattern is the following (I have two more that match more specific date formats, but I'm not including them because that's beside the point):
pattern = r"(19|20)\d\d"
I'm using
has_date = data.str.contains(pattern)
and it works perfectly to find what I'm looking for. But now that I've identified the observations that contain the dates I want, I need to strip (replace with nothing) everything that isn't that pattern.
I made a file of what matched the regex patterns and what didn't, and checked that my patterns caught everything, so I'm good on that front.
Does anyone have suggestions on how to replace what isn't my pattern? Any thoughts are welcome. Thanks.
python regex pandas
edited Apr 11 at 18:32 by Wiktor Stribiżew
asked Mar 28 at 20:40 by hapigolucki
This sounds like you want to extract the texts your pattern matches. Try
df['Dates'] = df['Data'].str.extract(r'\b((?:19|20)\d{2})\b', expand=False).fillna('')
(if Data is the column with the original texts and Dates is the target column).
– Wiktor Stribiżew
Mar 28 at 21:33
Have you tried that yet?
– Wiktor Stribiżew
Mar 29 at 22:18
1 Answer
To address your exact problem, namely replacing everything that does not match the pattern, you may use
df['Data'] = df['Data'].str.replace(r"(?s)((?:19|20)\d\d)?.", r"\1", regex=True)
(regex=True is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.)
See the regex demo.
Here, (?s) makes . match any char, including line breaks. ((?:19|20)\d\d)? is an optional capturing group #1 that matches either 19 or 20 followed by any 2 digits, 1 or 0 times, and the . pattern then matches any single char. If Group 1 matched, it is put back into the result by the \1 backreference.
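As a rough illustration of how that substitution behaves, here is a minimal sketch using the standard `re` module on plain strings rather than a pandas column. The helper names `keep_years` and `keep_years_alt` are made up for this example, and the alternation variant is an assumption of mine, not part of the answer above.

```python
import re

# Pattern from the answer: an optional captured year, then exactly one more character.
ANSWER_PATTERN = re.compile(r"(?s)((?:19|20)\d\d)?.")

# Hypothetical variant (not from the answer): match EITHER a year OR one character.
ALT_PATTERN = re.compile(r"(?s)((?:19|20)\d\d)|.")


def keep_years(text: str) -> str:
    # Each match is replaced by group 1; an unmatched group becomes ''
    # (Python 3.5+), so every non-year character is dropped.
    return ANSWER_PATTERN.sub(r"\1", text)


def keep_years_alt(text: str) -> str:
    # Same idea, but the year no longer needs a character after it.
    return ALT_PATTERN.sub(r"\1", text)


print(keep_years("Mexico [12/20/1985]"))      # 1985
print(repr(keep_years("2003")))               # '' (see the note below)
print(keep_years_alt("2003"))                 # 2003
```

One thing worth checking on your own data: because the answer's pattern requires one more character after the year, a year at the very end of a string (such as a bare "2003") is not kept. The alternation variant is one possible way around that.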
However, it seems you just want to extract the year from the data and get an empty string when there is none, so use
df['Data'] = df['Data'].str.extract(r'\b((?:19|20)\d{2})\b', expand=False).fillna('')
The \b((?:19|20)\d{2})\b pattern matches 19 or 20 followed by any two digits as a whole word (due to the \b word boundaries).
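The extraction approach could look roughly like this on a small frame. The sample rows, the `Year` and `Date` column names, and the MM/DD/YYYY pattern are all assumptions for illustration; the question does not show its other date patterns, so adjust `full_date` to whatever formats you actually have.

```python
import pandas as pd

# Hypothetical sample data; only the first row comes from the question.
df = pd.DataFrame({"Data": ["Mexico [12/20/1985]", "2003", "no date here"]})

# Year only, as in the answer: first whole-word 19xx/20xx, '' when absent.
df["Year"] = (
    df["Data"].str.extract(r"\b((?:19|20)\d{2})\b", expand=False).fillna("")
)

# Assumed full-date pattern (MM/DD/YYYY); not taken from the question.
full_date = r"\b(\d{1,2}/\d{1,2}/(?:19|20)\d{2})\b"
df["Date"] = df["Data"].str.extract(full_date, expand=False).fillna("")

print(df)
```

Writing the result to new columns keeps the original text available for checking; assign back to `df['Data']` instead if you want to overwrite it as in the answer.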
answered Apr 1 at 20:57 by Wiktor Stribiżew
Thank you!! Sorry for the late response, but I hadn't been able to get into my account. That solution was very helpful
– hapigolucki
Apr 11 at 17:03