UnicodeEncodeError while using spark-submit and BeautifulSoup

I keep getting a UnicodeEncodeError in Python 2.7 when I submit a job with spark-submit (Spark 1.6, Hadoop 2.7), but I do not get the error when I execute the same code line by line in the pyspark shell.

I am using BeautifulSoup to find all the ref tags and extract their text with this line of code:



[r.text for r in BeautifulSoup(line).findAll('ref') if r.text]



I have tried the following things:



  1. Setting the environment variable: export PYTHONIOENCODING="utf8"

  2. Using r.text.encode('ascii', 'ignore')

  3. Calling sys.setdefaultencoding('utf-8')
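
For illustration, here is a minimal sketch of what the second attempt in the list above does to a string. The sample text is made up for this example. Note that this call only cleans text after it has been extracted, whereas the traceback in this question shows the error being raised earlier, inside the BeautifulSoup(line) constructor, so the markup passed in is never touched by it.

```python
# Hypothetical sample text (an assumption for illustration) containing
# an accented letter and a non-breaking space (U+00A0).
sample = u"caf\xe9\xa0menu"

# errors="ignore" silently drops every character ASCII cannot represent.
cleaned = sample.encode("ascii", "ignore")
print(cleaned)

# Without an error handler, the same call raises UnicodeEncodeError.
try:
    sample.encode("ascii")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print(exc)
```

Dropping characters this way loses data, so it is probably only acceptable when the non-ASCII characters do not matter for the result.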

Could someone please tell me how to fix it? Below is the error stack:



  File "/hdata/dev/sdf1/hadoop/yarn/local/usercache/harshdee/appcache/application_1551632819863_0039/container_e36_1551632819863_0039_01_000004/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/home/harshdee/get_data.py", line 63, in get_as_row
    return Row(citations=get_citations(line.content), id=line.id, title=line.title)
  File "/home/harshdee/get_data.py", line 47, in get_citations
    refs_in_line = [r.text for r in BeautifulSoup(line).findAll('ref') if r.text]
  File "/usr/lib/python2.7/site-packages/bs4/__init__.py", line 274, in __init__
    self._check_markup_is_url(markup)
  File "/usr/lib/python2.7/site-packages/bs4/__init__.py", line 336, in _check_markup_is_url
    ' that document to Beautiful Soup.' % decoded_markup
  File "/usr/lib64/python2.7/warnings.py", line 29, in _show_warning
    file.write(formatwarning(message, category, filename, lineno, line))
  File "/usr/lib64/python2.7/warnings.py", line 38, in formatwarning
    s = "%s:%s: %s: %s\n" % (filename, lineno, category.__name__, message)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 21-28: ordinal not in range(128)
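
Reading the traceback, the exception is raised while the warnings module formats a warning issued by Beautiful Soup's _check_markup_is_url, not while r.text is evaluated. One possible workaround, sketched here under the assumption that the warning itself is not needed, is to silence warnings around the parsing call so that the failing formatting step is never reached. The names parse_quietly and noisy_parser are hypothetical, the URL is invented, and noisy_parser merely stands in for BeautifulSoup(line) so that the sketch runs without bs4 installed.

```python
import warnings


def noisy_parser(markup):
    # Hypothetical stand-in for BeautifulSoup(markup): it warns about
    # URL-like input, as the traceback shows bs4 doing.
    if markup.startswith(u"http:"):
        warnings.warn(u'"%s" looks like a URL.' % markup)
    return markup


def parse_quietly(parser, markup):
    # Temporarily ignore all warnings raised inside the parser call.
    # The previous filter settings are restored when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return parser(markup)


# Made-up URL-like input containing a non-ASCII character.
result = parse_quietly(noisy_parser, u"http://example.org/caf\xe9")
print(repr(result))
```

In a real job the call would presumably look like parse_quietly(BeautifulSoup, line). Whether this removes the error in a given Spark setup is untested here, so treat it as a starting point rather than a confirmed fix.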









  • Python 2.7 is now rather old, but have you tried to convert everything to unicode: ...findAll(u'ref')?

    – Serge Ballesta
    Mar 5 at 12:36











  • No, I haven't. I will give it a try!

    – Harshdeep Singh
    Mar 5 at 12:57











  • @SergeBallesta I tried [r.text for r in BeautifulSoup(line).findAll(u'ref') if r.text], but it gave me the same error.

    – Harshdeep Singh
    Mar 5 at 13:21


















python python-2.7 apache-spark hadoop beautifulsoup






asked Mar 5 at 11:16

Harshdeep Singh















1 Answer
I solved the problem on my own. I think the problem was in the representation of the string.

To work around it, I used the repr function, which returns the printable representation of an object. In Python 2, calling repr on a unicode string returns an ASCII-only byte string in which the non-ASCII characters are written as escape sequences.

I applied this to the line variable.
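
To illustrate what that conversion does, here is a short sketch. It is written for Python 3, where the built-in ascii() produces a string similar to what repr() returned in Python 2, and the sample line is invented for this example. The result contains only ASCII characters, but it also gains surrounding quotes and escape sequences, so the text handed to BeautifulSoup is no longer identical to the original.

```python
# Hypothetical input line with non-ASCII characters (an assumption
# for illustration, not taken from the actual data).
line = "http://example.org/caf\xe9\xa0page"

# Python 3's ascii() mirrors Python 2's repr(): non-ASCII characters
# become backslash escapes and the value is wrapped in quotes.
escaped = ascii(line)
print(escaped)

# Every character of the result is now in the ASCII range.
is_ascii_only = all(ord(ch) < 128 for ch in escaped)
print(is_ascii_only)
```

Because the escaped string now starts with a quote character, it also no longer begins with "http:", which may be why the URL check in the traceback stops triggering. The extracted text will contain the escape sequences, though, so it may need to be converted back afterwards.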






answered Mar 27 at 14:51

Harshdeep Singh




















