Remove everything that doesn't match regex patterns in Python


I have a regex pattern that identifies dates in a column of dates. Some of the dates are embedded in a longer string, while others are plain dates by themselves. My regex pattern finds every date perfectly, but now I want to be able to say "remove everything that doesn't fit the date pattern", which would get rid of the text in front of or behind some dates.



Example of the stuff I want gone:



Mexico [12/20/1985]

If I could remove what doesn't match the pattern, then the brackets and Mexico would go away.



Say my regex pattern is (I have two more that match more specific date formats, but I'm not including them because that's beside the point):



pattern = (r"(19|20)\d\d")



I'm using has_date = data.str.contains(pattern) and it works perfectly to find what I'm looking for. But, now that I've identified the observations that have the dates that I want, I need to strip/remove/replace with nothing everything that isn't that pattern.



I made a file of what didn't match the regex patterns and what did, and checked to make sure my regex patterns got everything, so I'm good on that front.



Does anyone have suggestions on how to replace what isn't my pattern? Any thoughts are welcome. Thanks.
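
For illustration, the gap between detecting a date and isolating it can be sketched with Python's standard re module, using the question's year pattern read as (19|20)\d\d. The sample strings and the helper name first_year are hypothetical. On an ordinary pandas string column, a contains-style check should report the same True/False values as the re.search test here.

```python
import re

# Hypothetical sample values: a date embedded in text, a bare date, and a row with no date.
samples = ["Mexico [12/20/1985]", "03/14/2001", "no date here"]

# The question's year pattern, written with a non-capturing group.
pattern = re.compile(r"(?:19|20)\d\d")

def first_year(text):
    """Return the first matched year, or an empty string if nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else ""

# Detection only: one True/False per value.
has_date = [bool(pattern.search(s)) for s in samples]

# Isolation: keep only the matched text.
years = [first_year(s) for s in samples]

print(has_date)  # [True, True, False]
print(years)     # ['1985', '2001', '']
```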










  • This sounds like you want to extract texts your pattern matches. Try df['Dates'] = df['Data'].str.extract(r'\b((?:19|20)\d{2})\b', expand=False).fillna('') (if Data is the column with original texts and Dates is the target column).

    – Wiktor Stribiżew
    Mar 28 at 21:33












  • Have you tried that yet?

    – Wiktor Stribiżew
    Mar 29 at 22:18


















python regex pandas






edited Apr 11 at 18:32 by Wiktor Stribiżew

asked Mar 28 at 20:40 by hapigolucki















1 Answer


To address your exact problem, namely replacing everything not matching the pattern, you may use



df['Data'] = df['Data'].str.replace(r"(?s)((?:19|20)\d\d)?.", r"\1", regex=True)


See the regex demo.



Here, (?s) makes . match any character, including line breaks. ((?:19|20)\d\d)? is optional capturing group #1: it matches 19 or 20 followed by any two digits, one or zero times. The trailing . then matches any single character. If Group 1 matched, the \1 backreference in the replacement puts it back into the result.
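
A minimal sketch of how this substitution behaves, using Python's re module directly. That pandas applies the same regex semantics on an ordinary string column is an assumption here, and the sample strings are hypothetical. The sketch also shows one edge case worth checking on real data: because the pattern requires one more character after the optional group, a year that ends the string is not kept. The alternation variant in strip_alt is a suggested alternative, not part of the original answer.

```python
import re

# Pattern from the answer: optional year group, then one mandatory character.
ANSWER_PATTERN = r"(?s)((?:19|20)\d\d)?."
# Hypothetical variant: either capture a year, or consume a single character.
ALT_PATTERN = r"(?s)((?:19|20)\d\d)|."

def strip_answer(text):
    # When group 1 did not take part in a match, \1 becomes an empty string (Python 3.5+).
    return re.sub(ANSWER_PATTERN, r"\1", text)

def strip_alt(text):
    return re.sub(ALT_PATTERN, r"\1", text)

print(strip_answer("Mexico [12/20/1985]"))  # 1985
print(strip_answer("12/20/1985"))           # empty string: nothing follows the year
print(strip_alt("12/20/1985"))              # 1985
```

If bare dates are common in the column, comparing both variants on a few known rows before overwriting the data may be worthwhile.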



However, it seems you want to just extract the year from the data, and in case there is none, just get an empty string, so use



df['Data'] = df['Data'].str.extract(r'\b((?:19|20)\d{2})\b', expand=False).fillna('')


The \b((?:19|20)\d{2})\b pattern matches 19 or 20 followed by any two digits as a whole word (due to the \b word boundaries).
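
The effect of the \b word boundaries can be sketched with re.search, which returns the first match in a string much as str.extract returns the first match per row. The helper name extract_year and the sample strings are hypothetical.

```python
import re

# Year pattern from the answer: 19 or 20 plus two digits, delimited by word boundaries.
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")

def extract_year(text):
    """Return the first standalone year in text, or '' if there is none."""
    match = YEAR.search(text)
    return match.group(1) if match else ""

print(extract_year("Mexico [12/20/1985]"))  # 1985
print(extract_year("order 119850"))         # '' because 1985 sits inside a longer number
print(extract_year("no date here"))         # ''
```

Returning an empty string for rows without a match mirrors the fillna('') step in the pandas version.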






  • Thank you!! Sorry for the late response, but I hadn't been able to get into my account. That solution was very helpful.

    – hapigolucki
    Apr 11 at 17:03












answered Apr 1 at 20:57 by Wiktor Stribiżew









