Remove everything that doesn't match regex patterns in Python

I have a regex pattern that identifies dates in a whole column of dates, but some of the dates are included in a string, while some are just plain dates by themselves. My regex pattern finds every date perfectly, but now I wanted to be able to say "remove everything that doesn't fit the date pattern" which will get rid of the text that's either in front of or behind some dates.

Example of the stuff I want gone:

Mexico [12/20/1985]

If I could remove what doesn't match the pattern, then the brackets and "Mexico" would go away.

Say my regex pattern is the following (I have two more that match more specific date formats, but I'm not including them because that's beside the point):

pattern = r"(19|20)\d\d"

I'm using has_date = data.str.contains(pattern) and it works perfectly to find what I'm looking for. But, now that I've identified the observations that have the dates that I want, I need to strip/remove/replace with nothing everything that isn't that pattern.

I made a file of what didn't match the regex patterns and what did, and checked to make sure my regex patterns got everything, so I'm good on that front.

Anyone have any suggestions on how to replace what isn't my pattern? Welcome any thoughts. Thanks

edited Apr 11 at 18:32

Wiktor Stribiżew

asked Mar 28 at 20:40

hapigolucki

This sounds like you want to extract the text your pattern matches. Try df['Dates'] = df['Data'].str.extract(r'\b((?:19|20)\d{2})\b', expand=False).fillna('') (if Data is the column with the original text and Dates is the target column).

– Wiktor Stribiżew
Mar 28 at 21:33

Have you tried that yet?

– Wiktor Stribiżew
Mar 29 at 22:18

python regex pandas

1 Answer
To address your exact problem, namely replacing everything not matching the pattern, you may use

df['Data'] = df['Data'].str.replace(r"(?s)((?:19|20)\d\d)?.", r"\1", regex=True)

See the regex demo.

Here, (?s) makes . match any character, including line breaks. ((?:19|20)\d\d)? is an optional capturing group #1 that matches either 19 or 20 followed by any two digits, one or zero times, and the trailing . then matches any single character. If Group 1 matched, it is put back into the result by the \1 backreference.
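A minimal sketch of how the \1 replacement behaves, using the standard re module instead of pandas so it runs without a DataFrame. The names OPTIONAL_FORM, ALTERNATION_FORM, and keep_years are hypothetical. The alternation variant is not from the answer; it is an assumption added for comparison, because the optional-group form needs one more character after the year for . to consume, so a year at the very end of the string is dropped.

```python
import re

# Pattern from the answer: optional year group, then any one character.
OPTIONAL_FORM = re.compile(r"(?s)((?:19|20)\d\d)?.")

# Hypothetical variant (an assumption, not from the answer):
# either the year group, or any one character.
ALTERNATION_FORM = re.compile(r"(?s)((?:19|20)\d\d)|.")


def keep_years(pattern, text):
    """Replace each match with group 1; the group is empty when it did not take part."""
    return pattern.sub(r"\1", text)


print(keep_years(OPTIONAL_FORM, "Mexico [12/20/1985]"))     # 1985
print(keep_years(ALTERNATION_FORM, "Mexico [12/20/1985]"))  # 1985

# With no character after the year, the optional form falls back to
# consuming the digits one at a time, so nothing is kept.
print(repr(keep_years(OPTIONAL_FORM, "12/20/1985")))        # ''
print(keep_years(ALTERNATION_FORM, "12/20/1985"))           # 1985
```

If the column mixes plain dates with dates embedded in text, as the question describes, the alternation form may be the safer one to try.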

However, it seems you want to just extract the year from the data, and in case there is none, just get an empty string, so use

df['Data'] = df['Data'].str.extract(r'\b((?:19|20)\d{2})\b', expand=False).fillna('')

The \b((?:19|20)\d{2})\b pattern matches 19 or 20 followed by any two digits as a whole word (because of the \b word boundaries).
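To illustrate the word-boundary behaviour on its own, here is a small sketch with the standard re module. It mirrors the idea of taking the first match per row and falling back to an empty string; first_year and the sample rows are hypothetical and only stand in for the real column.

```python
import re

# Same pattern as in the answer: a 19xx or 20xx year as a whole word.
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")


def first_year(text):
    """Return the first standalone 19xx/20xx year in text, or '' if there is none."""
    match = YEAR.search(text)
    return match.group(1) if match else ""


# Hypothetical sample rows, not taken from the asker's data.
rows = ["Mexico [12/20/1985]", "12/20/1985", "no date here", "id 119850"]
years = [first_year(row) for row in rows]
print(years)  # ['1985', '1985', '', '']
```

The last row shows what the boundaries are for: 1985 sits inside a longer run of digits, so it is not treated as a year.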

answered Apr 1 at 20:57

Wiktor Stribiżew

Thank you!! Sorry for the late response, but I hadn't been able to get into my account. That solution was very helpful

– hapigolucki
Apr 11 at 17:03
