Why do I get a different result every time I save and extract a response string from a web service?
My code works and I have no problem extracting what I need. My problem is that I get one result when I run the extraction directly on the web service response, and a different result when I run the same extraction on that response saved in a variable. I have been blocked on this for days and would appreciate any help.
NOTE: the answers to the suggested duplicate questions don't work for me; this isn't a duplicate question.
I'm consuming a web service. The response is stored in the variable answerService, which is a very long string. From it I extract the content of every span tag that has this structure:
<span style="font-weight:bold"> xxx </span>
"xxx" is what I want to extract.
# this extracts the "xxx"
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', answerService)
I get a list of length "n", one entry per span that has this structure.
If I do this directly on the web service response, it does not work and I only get this result:
['áGILMENTE']
Now, if I paste the web service response into my code as sameStringOfAnswer, the result is different:
print(arraySpan)
['ADV', 'áGILMENTE']
Logically the response is the same and never changes, yet for some strange reason, when I process the live response from the web service, I only get ['áGILMENTE'] when the result I expect is ['ADV', 'áGILMENTE'].
This is the key piece that shows that the 2 span elements always arrive with the structure I need:
Here is my code:
import requests
import re
session = requests.Session()
getId=session.get('http://cartago.lllf.uam.es/grampal/grampal.cgi')
cookie=session.cookies.get_dict()
getId=session.cookies.get_dict()
getId=getId["CGISESSID"]
# the session ID obtained above is sent as the csrf parameter of the web service request
getService=requests.get("http://cartago.lllf.uam.es/grampal/grampal.cgi?m=analiza&csrf="+getId+"&e="+"ágilmente", cookies=cookie)
answerService=getService.text
#get the value of the <span>
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', answerService)
print(answerService)
print("array",arraySpan)
# same extraction, but run on a saved copy of the web service response
sameStringOfAnswer='<html xmlns="http://www.w3.org/TR/REC-html40"><head><title>Grampal </title><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><meta name="Content-Language" content="EN"><meta name="author" content="jmguirao@ugr.es"><link rel="icon" type="image/ico" href="/favicon.ico"/><style type="text/css">html,body,form,ul,li,h1,h3,pmargin:0; padding:0bodyfont-family: Arial, Helvetica, sans-serif; background-color:#fffatext-decoration: none;a:hovertext-decoration: underlineullist-style-type: nonetdpadding: 0.5pc 2pc 0pc 0pc.navfloat: right; padding: 0.5pc 0.5pc 0.5pc 0.5pc; margin-left:5px.nav lidisplay:inline; border-left: 1px solid #444; padding:0 0.4em;.nav li.firstborder-left:0.hidedisplay:noneinputtext-indent: 2pxinput[type="submit"]text-indent: 0DIV.delPagepadding: 0.5ex 5em 0.5em 5em; background-color:#ffd6ba;.delMainpadding: 2ex 0.5em 0.5pc 0.5em;.postmargin-bottom: 0.25pc; font-size: 100%; padding-top: 0.5ex;.posts, #postspadding: 0.5ex 0.5em 0.5pc 50px;.bannerpadding: 0.5ex 0 0.5pc 0.5em;background-color: #ffc6aa;clear: both.banner h1font-weight: bolder; font-size: 150%;margin:0; padding:0 0 0 26px; display: inline;h2font-weight: bolder; font-size: 140%; color: red; margin:0; padding:0 0 0 26px; display: inline;.resaltadofont-weight: bolder;font-size: 100%</style></head><body><div class="banner"><ul class="hide"><li><a href="#content">skip to content</a></li></ul><ul class="nav">Análsis de:<li class="first"><a title="Analizador morfosintáctico" href="/grampal/grampal.cgi?m=analiza&e=ágilmente">palabras</a></li><li><a title="Desambiguador contextual" href="/grampal/grampal.cgi?m=etiqueta&e=ágilmente">oraciones</a></li><li><a title="Etiquetado de textos" href="/grampal/grampal.cgi?m=xml">textos</a></li><li><a title="Formas de una palabra" href="/grampal/grampal.cgi?m=genera&e=ágilmente">Generación de formas</a></li><!--<li><a title="Transcripción fonética" 
href="/grampal/grampal.cgi?m=transcribe&e=ágilmente">Transcripción</a></li>--><li><a href="/grampal/grampal.cgi?m=etiquetario">Etiquetario</a></li><li><a href="/grampal/grampal.cgi?m=autores">Autores</a></li></ul><h1>Grampal</h1></div><div class="delPage" style="font-size: 80%;"><form method="GET" action="/grampal/grampal.cgi"><input type="hidden" name="m" value="analiza"><input type="hidden" name="csrf" value="94508700a0ae409a90718299ae00b0e0"><span class="resaltado">Palabra : </span><input name="e" size="60" value="ágilmente"><input type="submit" value="Analiza"> </form></div><br><h2>ágilmente</h2><div class="delMain"><div id="posts"><table><tr><td style="font-style:italic;font-size:90%">categoría <span style="font-weight:bold"> ADV </span></td><td style="font-style:italic;font-size:90%">lema <span style="font-weight:bold"> áGILMENTE </span></td></tr></table></div></div></body></html>'
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', sameStringOfAnswer)
print(arraySpan)
What am I doing wrong?
python
Why are you using regex to parse html?
– TheIncorrigible1
Mar 27 at 14:52
@TheIncorrigible1 I'm new to Python. Maybe this is bad practice, but it's the way I found to extract what I need.
– unusuario
Mar 27 at 14:53
@TheIncorrigible1 Please do not mark my question as resolved. Regardless of whether this is bad practice, I have working code, and the problem I have could also occur if it were done differently. Please look at my problem; it's kind of weird.
– unusuario
Mar 27 at 14:57
Possible duplicate of RegEx match open tags except XHTML self-contained tags
– Ralf
Mar 27 at 14:59
@Ralf It is not a duplicate; please do not mark my question as a duplicate. My code works and I have no problem extracting what I need. My problem is that I get one result when I run the extraction directly on the web service response, and a different result when I run the same extraction on that response saved in a variable. I have been blocked on this for days and would appreciate any help.
– unusuario
Mar 27 at 15:02
edited Mar 27 at 15:34
LogicalBranch
asked Mar 27 at 14:50
unusuario
1 Answer
The HTML from the webservice contains:
<span style="font-weight:bold"> ADV\n </span>
But your minified code contains the tag without the newline \n:
<span style="font-weight:bold"> ADV </span>
You can test the difference yourself:
>>> pattern = r'<span style="font-weight:bold">(.*?)<'
>>> re.findall(pattern, '<span style="font-weight:bold">AAA\n<')
[]
>>> re.findall(pattern, '<span style="font-weight:bold">AAA<')
['AAA']
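That is why they are different: by default, . in a Python regex does not match a newline, so (.*?) cannot match across the newline inside the first span. You should have mentioned that you use a minifier, as minifiers alter the HTML, and you cannot run the same regex on the altered text and still expect the same output.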
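When two strings look identical on screen, comparing them character by character and printing the surroundings with repr() makes hidden characters such as a newline visible. The following is only a sketch: first_difference is a hypothetical helper, and the two short strings stand in for the live response and the pasted copy.

```python
def first_difference(a, b, context=20):
    """Return (index, repr of a near index, repr of b near index), or None if equal.

    Hypothetical helper for illustration; not part of the original question.
    """
    limit = min(len(a), len(b))
    # Walk both strings until the first position where they disagree.
    index = next((i for i in range(limit) if a[i] != b[i]), None)
    if index is None:
        if len(a) == len(b):
            return None  # the strings are identical
        index = limit  # one string is a prefix of the other
    start = max(0, index - context)
    # repr() shows escapes such as \n instead of rendering them.
    return index, repr(a[start:index + context]), repr(b[start:index + context])


# Stand-ins for answerService (live) and sameStringOfAnswer (pasted copy).
live = '<span style="font-weight:bold"> ADV\n </span>'
saved = '<span style="font-weight:bold"> ADV </span>'

result = first_difference(live, saved)
print(result)
```

Running the same helper on the real answerService and sameStringOfAnswer should point at the first place where the two differ.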
This whole problem would have been avoided if you had used an HTML or XML parser instead of regex, just as the linked question suggests: RegEx match open tags except XHTML self-contained tags
You are a genius, I think I finally understand my problem. Although in theory I am getting everything that is inside the <span>, what is the best way to get what I need from inside those <span> tags?
– unusuario
Mar 27 at 15:30
The answers in this question suggest using ([\s\S]*?) (or some variation of it) instead of (.*?).
– Ralf
Mar 27 at 15:41
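As a rough illustration of that suggestion, the sketch below runs the original pattern, the [\s\S] variant, and the original pattern with the re.DOTALL flag on a shortened sample. The sample string is an assumption modelled on the response described in the answer, with a newline after ADV; it is not the full web service output.

```python
import re

# Shortened stand-in for the live response; note the newline after ADV.
html = (
    '<td>categoría <span style="font-weight:bold"> ADV\n </span></td>'
    '<td>lema <span style="font-weight:bold"> áGILMENTE </span></td>'
)

original = r'<span style="font-weight:bold">(.*?)<'
any_char = r'<span style="font-weight:bold">([\s\S]*?)<'

# "." stops at the newline, so the first span is skipped.
missed = re.findall(original, html)

# [\s\S] matches any character, newlines included.
found = [value.strip() for value in re.findall(any_char, html)]

# re.DOTALL makes "." match newlines too, leaving the pattern unchanged.
found_with_flag = [
    value.strip() for value in re.findall(original, html, flags=re.DOTALL)
]

print(missed)
print(found)
print(found_with_flag)
```

The strip() call removes the surrounding spaces and the newline from each captured value.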
@unusuario you should read more about regex to get a good solution for your use case.
– Ralf
Mar 27 at 15:41
You should really really use a parser. Try BeautifulSoup. Here's some code that does what you want to get you started. gist.github.com/akent/86dd72a085d452e8db5f4d76c3cce2c9
– akent
Mar 27 at 15:46
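The gist is not reproduced here. As a rough sketch of the parser approach, the example below uses html.parser from the Python standard library rather than BeautifulSoup, so it runs without installing anything. BoldSpanExtractor and extract_bold_spans are hypothetical names, the sample string is a shortened stand-in for the response, and nested spans are not handled.

```python
from html.parser import HTMLParser


class BoldSpanExtractor(HTMLParser):
    """Collect the text of every <span style="font-weight:bold"> element."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._buffer = None  # list of text chunks while inside a target span

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        # Ignore spaces so "font-weight: bold" is accepted as well.
        if tag == "span" and style.replace(" ", "") == "font-weight:bold":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._buffer is not None:
            # strip() drops surrounding spaces and newlines.
            self.values.append("".join(self._buffer).strip())
            self._buffer = None


def extract_bold_spans(html):
    parser = BoldSpanExtractor()
    parser.feed(html)
    parser.close()
    return parser.values


# Shortened stand-in for the live response, newline after ADV included.
sample = (
    '<span class="resaltado">Palabra : </span>'
    '<table><tr><td>categoría <span style="font-weight:bold"> ADV\n </span></td>'
    '<td>lema <span style="font-weight:bold"> áGILMENTE </span></td></tr></table>'
)
print(extract_bold_spans(sample))
```

Because the parser works on tags rather than on raw characters, a newline or extra whitespace inside the span does not change which elements are found.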
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55380168%2fwhy-do-i-get-a-different-result-every-time-i-save-and-extract-a-response-string%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
The HTML from the webservice contains:
<span style="font-weight:bold"> ADV\n </span>
But your minified code contains the tag without the newline \n:
<span style="font-weight:bold"> ADV </span>
You can test the difference yourself:
>>> import re
>>> pattern = r'<span style="font-weight:bold">(.*?)<'
>>> re.findall(pattern, '<span style="font-weight:bold">AAA\n<')
[]
>>> re.findall(pattern, '<span style="font-weight:bold">AAA<')
['AAA']
That is why the results are different: by default, . does not match a newline, so (.*?) cannot reach the closing <. You should have mentioned that you use a minifier; minifiers alter the HTML, so you cannot run the same regex on the minified text and expect the same output.
This whole problem would have been avoided if you had used a proper HTML/XML parser instead of regex, just as the linked question suggests: RegEx match open tags except XHTML self-contained tags
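To make the difference concrete, here is a minimal sketch that runs the same pattern against both variants and then again with re.DOTALL, which lets . match newlines. The sample strings and variable names are illustrative assumptions, not the asker's actual data.

```python
import re

# Same pattern as in the answer above.
PATTERN = r'<span style="font-weight:bold">(.*?)<'

# Hypothetical inputs: the raw response keeps the newline, the minified copy does not.
raw = '<span style="font-weight:bold"> ADV\n </span>'
minified = '<span style="font-weight:bold"> ADV </span>'

# Default behaviour: "." stops at "\n", so the raw response yields no match.
default_raw = re.findall(PATTERN, raw)
default_minified = re.findall(PATTERN, minified)

# With re.DOTALL, "." also matches "\n", so both inputs match.
dotall_raw = re.findall(PATTERN, raw, flags=re.DOTALL)

# Stripping whitespace makes the two variants comparable.
stripped = [text.strip() for text in dotall_raw]

print(default_raw, default_minified, dotall_raw, stripped)
```

This only removes the newline sensitivity; the pattern is still tied to the exact attribute text, which is one reason a parser is the more robust choice.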
You are a genius, I think I finally understand my problem. Although in theory I am getting everything inside the <span>, what is the best way to get what I need inside those <span> tags?
– unusuario
Mar 27 at 15:30
The answers in this question suggest using ([\s\S]*?) (or some variation of it) instead of (.*?).
– Ralf
Mar 27 at 15:41
@unusuario you should read more about regex to get a good solution for your use case.
– Ralf
Mar 27 at 15:41
You should really really use a parser. Try BeautifulSoup. Here's some code that does what you want to get you started. gist.github.com/akent/86dd72a085d452e8db5f4d76c3cce2c9
– akent
Mar 27 at 15:46
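As a rough illustration of the parser-based approach suggested in these comments, the sketch below uses the standard library's html.parser rather than BeautifulSoup, so that it runs without extra installs. The class name BoldSpanText and the helper bold_span_texts are hypothetical, and this is not the code from the linked gist.

```python
from html.parser import HTMLParser


class BoldSpanText(HTMLParser):
    """Collects the text inside <span style="font-weight:bold"> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # span nesting depth while inside a bold span
        self._buffer = []    # text pieces of the bold span being read
        self.texts = []      # stripped text of every bold span found

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # Nested span inside a bold span: track it so the end tags pair up.
            self._depth += 1
            return
        style = dict(attrs).get("style") or ""
        if style.replace(" ", "") == "font-weight:bold":
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def bold_span_texts(html):
    """Return the stripped text of each bold span in the given HTML string."""
    parser = BoldSpanText()
    parser.feed(html)
    parser.close()
    return parser.texts


print(bold_span_texts('<span style="font-weight:bold"> ADV\n </span>'))
print(bold_span_texts('<span style="font-weight:bold"> ADV </span>'))
```

Because the parser works on elements rather than on raw characters, the raw and minified HTML give the same result regardless of newlines inside the tag.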
edited Mar 27 at 15:26
answered Mar 27 at 15:21
Ralf
8,859 reputation, 4 gold badges, 18 silver badges, 40 bronze badges
1
Why are you using regex to parse html?
– TheIncorrigible1
Mar 27 at 14:52
@TheIncorrigible1 I'm new to python, maybe I'm doing some bad practice, but it's the way I found to extract what I need.
– unusuario
Mar 27 at 14:53
@TheIncorrigible1 Please do not mark my question as resolved. Whether or not I am following bad practice, I have working code, and the problem I have could also occur if it were done differently. Please look at my problem; it's kind of weird.
– unusuario
Mar 27 at 14:57
Possible duplicate of RegEx match open tags except XHTML self-contained tags
– Ralf
Mar 27 at 14:59
@Ralf It is not a duplicate; please do not mark my question as a duplicate. My code works well, and I have no problem extracting what I need. My problem is that using the web service response directly gives a different result from doing the same thing with the web service's value saved in a variable. I have been blocked on this for days, and I hope you can help.
– unusuario
Mar 27 at 15:02