How can I get a proper response back from Scrapy?
















I am trying to scrape some search results from this company register, but when I try to scrape the company name, my results don't seem to come back properly. It's as if the company name is split into two HTML items based on the search keyword.

Is there a way to join these together? This is my spider:

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = 'gov2'
        start_urls = ['https://beta.companieshouse.gov.uk/search/companies?q=a']

        def parse(self, response):
            for i in response.css('ul.results-list'):
                yield {
                    'company_name': i.css('li.type-company h3 a::text').extract(),
                    'address': i.css('li.type-company p::text').extract(),
                }

As you can see, my results are missing some parts: [screenshot of the results]

I hope one of you can see what's going on. Thank you!

















python web-scraping scrapy

asked Mar 24 at 20:44
Hi tE






















2 Answers





































































As I understand it, you want to fetch all the text within the "a" and "p" tags, and there are further tags nested inside those tags.

Try this. The space before ::text selects the text of nested tags as well, and the regex collapses the unnecessary whitespace:

    import re

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = 'gov2'
        start_urls = ['https://beta.companieshouse.gov.uk/search/companies?q=a']

        def parse(self, response):
            for i in response.css('ul.results-list'):
                yield {
                    # ' ::text' (with a space) also picks up text inside nested tags
                    'company_name': re.sub(r'\s+', ' ', ''.join(i.css('li.type-company h3 a ::text').extract())),
                    'address': re.sub(r'\s+', ' ', ''.join(i.css('li.type-company p ::text').extract())),
                }
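To see why the name comes back in pieces, here is a minimal sketch. It assumes the site wraps the matched keyword in a nested tag such as "strong"; the real markup is not shown in the question, so the snippet below is hypothetical. It also uses the standard library's xml.etree.ElementTree rather than Scrapy, only so that it runs offline. The two helper functions are illustrative stand-ins for what 'a::text' and 'a ::text' select.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the search keyword "A" sits in a nested tag.
SNIPPET = '<h3><a href="/company/1">ACME <strong>A</strong> LTD</a></h3>'

link = ET.fromstring(SNIPPET).find('a')


def direct_text_nodes(element):
    """Text nodes that are direct children of element (roughly 'a::text')."""
    nodes = [element.text] + [child.tail for child in element]
    return [node for node in nodes if node]


def all_text_nodes(element):
    """Text nodes of element and all its descendants (roughly 'a ::text')."""
    return list(element.itertext())


print(direct_text_nodes(link))        # the keyword inside the nested tag is missing
print(all_text_nodes(link))           # every fragment, in document order
print(''.join(all_text_nodes(link)))  # the joined company name
```

The direct text nodes skip whatever is inside the nested tag, which would match the "missing parts" in the question if the markup is shaped like this.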















answered Mar 25 at 1:22, edited Mar 25 at 5:44
Pankaj







• Thank you very much, that is what I was looking for! I'll look into regex some more. That's amazing, haha!
  – Hi tE
  Mar 25 at 11:37






































Building on the regex approach, I modified the code to loop over each .type-company result, which gives better output:

    import re

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = 'gov2'
        start_urls = ['https://beta.companieshouse.gov.uk/search/companies?q=a']

        def parse(self, response):
            # Iterate over each result entry instead of the whole list
            for i in response.css('.type-company'):
                yield {
                    'company_name': re.sub(r'\s+', ' ', ''.join(i.css('h3 a ::text').extract())),
                    'address': re.sub(r'\s+', ' ', ''.join(i.css('p ::text').extract())),
                }
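The whitespace clean-up can be pulled out into a small helper. This is only a sketch: clean_text is a hypothetical name, the fragments list is made-up sample data shaped like what .extract() returns, and the trailing .strip() is an addition that the answers above do not use. Note that the pattern needs the backslash: r'\s+' matches runs of whitespace, whereas 's+' would match the literal letter s.

```python
import re


def clean_text(fragments):
    """Join extracted text fragments and collapse runs of whitespace."""
    joined = ''.join(fragments)
    # r'\s+' matches one or more whitespace characters (spaces, tabs, newlines).
    return re.sub(r'\s+', ' ', joined).strip()


# Hypothetical fragments for a single search result.
fragments = ['\n      ACME ', 'A', ' LTD\n    ']
print(clean_text(fragments))
```

In the spider this could be called as clean_text(i.css('h3 a ::text').extract()), which keeps the yield statement shorter.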

















answered Mar 25 at 2:49
Arun Augustine












• Thanks, the output looks way better, haha.
  – Hi tE
  Mar 25 at 11:39

































