UnicodeEncodeError while using spark-submit and BeautifulSoup

I keep getting a UnicodeEncodeError in Python 2.7 when I submit a job with spark-submit (Spark 1.6, Hadoop 2.7), but I do not get the error when I execute the same code line by line in the pyspark shell.



I am using BeautifulSoup to find all the ref tags and extract their text with this line of code:



[r.text for r in BeautifulSoup(line).findAll('ref') if r.text]



I have tried the following things:

  1. Setting the environment variable with export PYTHONIOENCODING="utf8"

  2. Using r.text.encode('ascii', 'ignore')

  3. Calling sys.setdefaultencoding('utf-8')

Could someone please tell me how to fix it? Below is the error stack:



  File "/hdata/dev/sdf1/hadoop/yarn/local/usercache/harshdee/appcache/application_1551632819863_0039/container_e36_1551632819863_0039_01_000004/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/home/harshdee/get_data.py", line 63, in get_as_row
    return Row(citations=get_citations(line.content), id=line.id, title=line.title)
  File "/home/harshdee/get_data.py", line 47, in get_citations
    refs_in_line = [r.text for r in BeautifulSoup(line).findAll('ref') if r.text]
  File "/usr/lib/python2.7/site-packages/bs4/__init__.py", line 274, in __init__
    self._check_markup_is_url(markup)
  File "/usr/lib/python2.7/site-packages/bs4/__init__.py", line 336, in _check_markup_is_url
    ' that document to Beautiful Soup.' % decoded_markup
  File "/usr/lib64/python2.7/warnings.py", line 29, in _show_warning
    file.write(formatwarning(message, category, filename, lineno, line))
  File "/usr/lib64/python2.7/warnings.py", line 38, in formatwarning
    s = "%s:%s: %s: %s\n" % (filename, lineno, category.__name__, message)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 21-28: ordinal not in range(128)
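The last frame of the stack fails inside formatwarning in warnings.py, where the warning message is turned into a string, which suggests that Python 2 is implicitly encoding unicode text with the ascii codec at that point. Python 3 does not perform that implicit conversion, so the minimal sketch below, written for Python 3, does the same encoding step explicitly. The sample text is hypothetical and only stands in for markup containing non-ASCII characters.

```python
# Hypothetical sample: text containing non-ASCII characters (U+00E9, U+00A0).
message = u"caf\u00e9\u00a0ref"

try:
    # The ascii codec only covers code points 0-127, so this raises.
    message.encode("ascii")
    encode_failed = False
except UnicodeEncodeError as exc:
    encode_failed = True
    print("ascii encoding failed:", exc)

# The same text encodes without error once a codec that covers it is used.
utf8_bytes = message.encode("utf-8")
```

This only illustrates the kind of error reported above; it does not reproduce the Spark job or the BeautifulSoup warning itself.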









  • Python 2.7 is now rather old, but have you tried to convert everything to unicode: ...findAll(u'ref')?

    – Serge Ballesta
    Mar 5 at 12:36











  • No, I haven't. I will give it a try!

    – Harshdeep Singh
    Mar 5 at 12:57











  • [r.text for r in BeautifulSoup(line).findAll(u'ref') if r.text]: I tried this, but it gave me the same error @SergeBallesta

    – Harshdeep Singh
    Mar 5 at 13:21


















python python-2.7 apache-spark hadoop beautifulsoup
















asked Mar 5 at 11:16









Harshdeep Singh















1 Answer
I solved the problem on my own. I think the problem was in the representation of the string.

To fix it, I used the built-in repr function, which returns the object's printable representation. In Python 2, that is an ASCII-only string in which non-ASCII characters are written as escape sequences such as \xa0.

I applied this to the line variable.
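A minimal sketch of that idea, not the exact code from my job: the helper name to_ascii_repr and the sample line are hypothetical. It assumes that on Python 3 the built-in ascii() is the closest match to what Python 2's repr() does, since Python 3's repr() leaves printable non-ASCII characters unescaped.

```python
import sys


def to_ascii_repr(text):
    """Return an ASCII-only representation of text (hypothetical helper)."""
    if sys.version_info[0] >= 3:
        # Python 3's repr() keeps printable non-ASCII characters; ascii() escapes them.
        return ascii(text)
    # Python 2's repr() already escapes non-ASCII characters.
    return repr(text)


# Hypothetical sample input standing in for the line variable.
line = u"<ref>caf\u00e9\u00a0au lait</ref>"
safe_line = to_ascii_repr(line)

# Every character is now in the ASCII range, so this encode cannot fail.
safe_line.encode("ascii")
```

Note that the result includes the surrounding quotes and the escape sequences themselves, so any text extracted from it afterwards contains \xe9 rather than the original character.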






answered Mar 27 at 14:51

Harshdeep Singh




















