Why do I get a different result every time I save and extract a response string from a web service?

My code works, and I have no problem extracting what I need. My problem is that running the extraction on the live response from the web service gives a different result from running the same extraction on that response saved in a variable. I have been blocked on this for days and would appreciate any help.

NOTE: the answers to the suggested duplicate questions don't work for me; this isn't a duplicate question.

I'm consuming a web service. The response is stored in the variable answerService; it is a very long string, and from it I extract the contents of each span tag that has this structure:

<span style="font-weight:bold"> xxx </span>
"xxx" is what I want to extract
 # this gives me the "xxx"
 arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', answerService)

I get a list of length n, with one entry per span that has this structure.

If I do this directly on the response from the web service, it does not work, and I only get this result:

['áGILMENTE']

Now, if I paste the response of the web service into my code as sameStringOfAnswer, the result is different:

print(arraySpan)
['ADV', 'áGILMENTE']

Logically the response is the same and never changes, yet for some strange reason, when I process the live response from the web service I only get ['áGILMENTE'], when the result I expect is ['ADV', 'áGILMENTE'].

This is the key piece that shows that the two spans always arrive with the structure I need:

Here is my code:

import re
import requests

session = requests.Session()

# the first request sets the CGISESSID cookie, which is reused below as the csrf parameter
session.get('http://cartago.lllf.uam.es/grampal/grampal.cgi')
cookie = session.cookies.get_dict()
getId = cookie["CGISESSID"]

# request the analysis of the word, passing the session ID and the cookie
getService = requests.get("http://cartago.lllf.uam.es/grampal/grampal.cgi?m=analiza&csrf=" + getId + "&e=" + "ágilmente", cookies=cookie)

answerService = getService.text
# get the contents of each bold <span>
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', answerService)
print(answerService)
print("array", arraySpan)

# same extraction, but run on a saved copy of the web service response
sameStringOfAnswer='<html xmlns="http://www.w3.org/TR/REC-html40"><head><title>Grampal </title><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><meta name="Content-Language" content="EN"><meta name="author" content="jmguirao@ugr.es"><link rel="icon" type="image/ico" href="/favicon.ico"/><style type="text/css">html,body,form,ul,li,h1,h3,pmargin:0; padding:0bodyfont-family: Arial, Helvetica, sans-serif; background-color:#fffatext-decoration: none;a:hovertext-decoration: underlineullist-style-type: nonetdpadding: 0.5pc 2pc 0pc 0pc.navfloat: right; padding: 0.5pc 0.5pc 0.5pc 0.5pc; margin-left:5px.nav lidisplay:inline; border-left: 1px solid #444; padding:0 0.4em;.nav li.firstborder-left:0.hidedisplay:noneinputtext-indent: 2pxinput[type="submit"]text-indent: 0DIV.delPagepadding: 0.5ex 5em 0.5em 5em; background-color:#ffd6ba;.delMainpadding: 2ex 0.5em 0.5pc 0.5em;.postmargin-bottom: 0.25pc; font-size: 100%; padding-top: 0.5ex;.posts, #postspadding: 0.5ex 0.5em 0.5pc 50px;.bannerpadding: 0.5ex 0 0.5pc 0.5em;background-color: #ffc6aa;clear: both.banner h1font-weight: bolder; font-size: 150%;margin:0; padding:0 0 0 26px; display: inline;h2font-weight: bolder; font-size: 140%; color: red; margin:0; padding:0 0 0 26px; display: inline;.resaltadofont-weight: bolder;font-size: 100%</style></head><body><div class="banner"><ul class="hide"><li><a href="#content">skip to content</a></li></ul><ul class="nav">Análsis de:<li class="first"><a title="Analizador morfosintáctico" href="/grampal/grampal.cgi?m=analiza&e=ágilmente">palabras</a></li><li><a title="Desambiguador contextual" href="/grampal/grampal.cgi?m=etiqueta&e=ágilmente">oraciones</a></li><li><a title="Etiquetado de textos" href="/grampal/grampal.cgi?m=xml">textos</a></li><li><a title="Formas de una palabra" href="/grampal/grampal.cgi?m=genera&e=ágilmente">Generación de formas</a></li><!--<li><a title="Transcripción fonética" 
href="/grampal/grampal.cgi?m=transcribe&e=ágilmente">Transcripción</a></li>--><li><a href="/grampal/grampal.cgi?m=etiquetario">Etiquetario</a></li><li><a href="/grampal/grampal.cgi?m=autores">Autores</a></li></ul><h1>Grampal</h1></div><div class="delPage" style="font-size: 80%;"><form method="GET" action="/grampal/grampal.cgi"><input type="hidden" name="m" value="analiza"><input type="hidden" name="csrf" value="94508700a0ae409a90718299ae00b0e0"><span class="resaltado">Palabra : </span><input name="e" size="60" value="ágilmente"><input type="submit" value="Analiza"> &nbsp;</form></div><br><h2>ágilmente</h2><div class="delMain"><div id="posts"><table><tr><td style="font-style:italic;font-size:90%">categoría&nbsp;<span style="font-weight:bold"> ADV </span></td><td style="font-style:italic;font-size:90%">lema&nbsp;<span style="font-weight:bold"> áGILMENTE </span></td></tr></table></div></div></body></html>'
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', sameStringOfAnswer)
print(arraySpan)

What am I doing wrong?

edited Mar 27 at 15:34

LogicalBranch

2,316 (2 gold badges, 10 silver badges, 40 bronze badges)

asked Mar 27 at 14:50

unusuario

4911 bronze badges

1

Why are you using regex to parse html?

– TheIncorrigible1
Mar 27 at 14:52

@TheIncorrigible1 I'm new to Python; maybe this is bad practice, but it's the way I found to extract what I need.

– unusuario
Mar 27 at 14:53

@TheIncorrigible1 Please do not mark my question as resolved. Regardless of whether this is bad practice, my code works, and the problem I have could also occur if it were done differently. Please look at my problem; it's rather strange.

– unusuario
Mar 27 at 14:57

Possible duplicate of RegEx match open tags except XHTML self-contained tags

– Ralf
Mar 27 at 14:59

@Ralf It is not a duplicate; please do not mark my question as a duplicate. My code works, and I have no problem extracting what I need. My problem is that running the extraction on the live response from the web service gives a different result from running the same extraction on that response saved in a variable. I have been blocked on this for days and would appreciate any help.

– unusuario
Mar 27 at 15:02


python


1 Answer

The HTML from the web service contains:

<span style="font-weight:bold"> ADV\n </span>

But your minified code contains the tag without the newline \n:

<span style="font-weight:bold"> ADV </span>

You can test the difference yourself:

>>> pattern = r'<span style="font-weight:bold">(.*?)<'
>>> re.findall(pattern, '<span style="font-weight:bold">AAA\n<')
[]
>>> re.findall(pattern, '<span style="font-weight:bold">AAA<')
['AAA']

That is why they are different. You should have mentioned that you use a minifier: minifiers alter the HTML, so you cannot run the same regex on the minified copy and still expect the same output.
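One way to check this on your own data is to print the repr() of the text, which shows control characters such as \n explicitly. The following is only a minimal sketch: the strings live and pasted are hypothetical stand-ins for answerService and sameStringOfAnswer, built on the assumption that the live response has a newline after ADV as described above.

```python
import re

# Hypothetical stand-ins: `live` mimics the service response (newline after ADV),
# `pasted` mimics the copy saved in the source file (newline lost).
live = ('<span style="font-weight:bold"> ADV\n </span>'
        '<span style="font-weight:bold"> áGILMENTE </span>')
pasted = live.replace("\n", "")

pattern = r'<span style="font-weight:bold">(.*?)<'

# repr() makes the otherwise invisible newline visible as \n.
print(repr(live[:45]))

# '.' does not match a newline by default, so the ADV span is skipped in `live`.
live_matches = re.findall(pattern, live)
pasted_matches = re.findall(pattern, pasted)
print(live_matches)
print(pasted_matches)
```

Comparing repr(answerService) with repr(sameStringOfAnswer) in the same way should show where the two strings diverge.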

This whole problem would have been avoided if you used an XML parser instead of regex, just like the linked question suggests: RegEx match open tags except XHTML self-contained tags

edited Mar 27 at 15:26

answered Mar 27 at 15:21

Ralf

8,859 (4 gold badges, 18 silver badges, 40 bronze badges)

You are a genius, I think I finally understand my problem. In theory, though, I am getting everything that is inside the span. What is the best way to get what I need from inside those tags?

– unusuario
Mar 27 at 15:30

The answers in this question suggest using ([\s\S]*?) (or some variation of it) instead of (.*?).

– Ralf
Mar 27 at 15:41
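As a rough illustration of that suggestion, here is a minimal sketch of two ways to let the capture group cross a newline. The html string is a hypothetical sample, not the real service response, and the final strip() step is an addition to remove the captured whitespace.

```python
import re

# Hypothetical sample with a newline inside the first span.
html = ('<span style="font-weight:bold"> ADV\n </span>'
        '<span style="font-weight:bold"> áGILMENTE </span>')

# Option 1: a character class that matches any character, newline included.
any_char = re.findall(r'<span style="font-weight:bold">([\s\S]*?)<', html)

# Option 2: keep '.', but let it match newlines via the re.DOTALL flag.
dotall = re.findall(r'<span style="font-weight:bold">(.*?)<', html, flags=re.DOTALL)

# Strip the surrounding whitespace, including the captured newline.
values = [match.strip() for match in any_char]
print(values)
```

Both options capture the same text here; which one reads better is a matter of taste.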

@unusuario you should read more about regex to get a good solution for your use case.

– Ralf
Mar 27 at 15:41

You should really really use a parser. Try BeautifulSoup. Here's some code that does what you want to get you started. gist.github.com/akent/86dd72a085d452e8db5f4d76c3cce2c9

– akent
Mar 27 at 15:46
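For readers who cannot open the gist, here is a minimal sketch of the parser approach. It is not the code from the gist, and it uses the standard library's html.parser instead of BeautifulSoup so that nothing needs to be installed. The class name BoldSpanCollector and the sample string are hypothetical, and the sketch assumes the wanted spans are never nested.

```python
from html.parser import HTMLParser


class BoldSpanCollector(HTMLParser):
    """Collects the text of every <span> whose style is font-weight:bold."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._inside = False  # True while inside a matching <span>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None, so fall back to an empty string.
        style = dict(attrs).get("style") or ""
        if tag == "span" and style.replace(" ", "") == "font-weight:bold":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._inside:
            self.values.append("".join(self._buffer).strip())
            self._inside = False


# Hypothetical sample shaped like the table in the question, newline included.
sample = ('<td>categoría&nbsp;<span style="font-weight:bold"> ADV\n </span></td>'
          '<td>lema&nbsp;<span style="font-weight:bold"> áGILMENTE </span></td>')

collector = BoldSpanCollector()
collector.feed(sample)
collector.close()
print(collector.values)
```

Because the parser works on tags and not on characters, the newline inside the span no longer affects which spans are found.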

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55380168%2fwhy-do-i-get-a-different-result-every-time-i-save-and-extract-a-response-string%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

The HTML from the webservice contains:

<span style="font-weight:bold"> ADVn </span>

But your minified code contains the tag without the newline n:

<span style="font-weight:bold"> ADV </span>

You can test the difference yourself:

>>> pattern = r'<span style="font-weight:bold">(.*?)<'
>>> re.findall(pattern, '<span style="font-weight:bold">AAAn<')
[]
>>> re.findall(pattern, '<span style="font-weight:bold">AAA<')
['AAA']

That is why the are different. You should have mentioned that you use a minifier, as they alter the HTML and you can not use regex after that and still expect the same output.

This whole problem would have been avoided if you used an XML parser instead of regex, just like the linked question suggests: RegEx match open tags except XHTML self-contained tags
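As a rough illustration of the parser approach, here is a minimal sketch using Python's standard-library html.parser module rather than a full XML parser. The class name BoldSpanText and the helper bold_span_texts are made up for this example, and it assumes the style attribute is exactly "font-weight:bold", as in the snippets above.

```python
from html.parser import HTMLParser


class BoldSpanText(HTMLParser):
    """Collect the text of <span style="font-weight:bold"> elements.

    Hypothetical helper for illustration; not taken from the question.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a matching span
        self.parts = []     # text pieces of the span currently being read
        self.results = []   # one stripped string per matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # A span nested inside a matching span: track it so the
            # closing tags are paired up correctly.
            self.depth += 1
        elif dict(attrs).get("style") == "font-weight:bold":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Surrounding whitespace, including newlines, is stripped.
                self.results.append("".join(self.parts).strip())
                self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def bold_span_texts(html):
    """Return the stripped text of every bold span in the given HTML."""
    parser = BoldSpanText()
    parser.feed(html)
    parser.close()
    return parser.results


original = '<span style="font-weight:bold"> ADV\n </span>'
minified = '<span style="font-weight:bold"> ADV </span>'
print(bold_span_texts(original))
print(bold_span_texts(minified))
```

Because the parser works on tags and text nodes instead of raw characters, the original and the minified HTML should give the same result here.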

edited Mar 27 at 15:26

answered Mar 27 at 15:21

Ralf

8,859 reputation, 4 gold badges, 18 silver badges, 40 bronze badges

You are a genius, I think I finally understand my problem, although in theory I am getting everything that is inside the tags. What is the best way to get what I need inside those tags?

– unusuario
Mar 27 at 15:30

The answers in this question suggest using ([\s\S]*?) (or some variation of it) instead of (.*?).

– Ralf
Mar 27 at 15:41
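A small sketch of what that suggestion looks like in practice, reusing the span from the answer. The variable names are only for illustration. re.DOTALL is the standard flag that makes the dot match newlines too, so it is shown next to the [\s\S] variant.

```python
import re

# The un-minified HTML from the answer, with the newline inside the span.
html = '<span style="font-weight:bold"> ADV\n </span>'
prefix = r'<span style="font-weight:bold">'

# Default behaviour: the dot stops at the newline, so nothing matches.
default = re.findall(prefix + r'(.*?)<', html)

# [\s\S] matches any character, newlines included.
char_class = re.findall(prefix + r'([\s\S]*?)<', html)

# Same effect with the dot, by passing the re.DOTALL flag.
dotall = re.findall(prefix + r'(.*?)<', html, flags=re.DOTALL)

# The captured text still contains the whitespace; strip it if needed.
cleaned = [match.strip() for match in dotall]

print(default, char_class, dotall, cleaned)
```

Both variants capture the raw text including the newline, so a strip() afterwards is usually still needed. This only fixes the newline case; a parser remains the more robust option.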

@unusuario you should read more about regex to get a good solution for your use case.

– Ralf
Mar 27 at 15:41

You should really really use a parser. Try BeautifulSoup. Here's some code that does what you want to get you started. gist.github.com/akent/86dd72a085d452e8db5f4d76c3cce2c9

– akent
Mar 27 at 15:46
