Can't Identify Proper CSS Selector to Scrape with Mechanize


I have built a web scraper that successfully pulls almost everything I need from the pages I'm scraping. The one piece still missing is the URL of a particular image on each coffee's product page; the coffees are all linked from a single listing URL.



The rake task I have defined to complete the scraping is as follows:



mechanize = Mechanize.new
mechanize.get(url) do |page|
  page.links_with(:href => /products/).each do |link|
    coffee_page = link.click

    bean = Bean.new

    bean.acidity = coffee_page.css('[data-id="acidity"]').text.strip.gsub("acidity ","")
    bean.elevation = coffee_page.css('[data-id="elevation"]').text.strip.gsub("elevation ","")
    bean.roaster_id = "2"
    bean.harvest_season = coffee_page.css('[data-id="harvest"]').text.strip.gsub("harvest ","")
    bean.price = coffee_page.css('.price-wrap').text.gsub("$","")
    bean.roast_profile = coffee_page.css('[data-id="roast"]').text.strip.gsub("roast ","")
    bean.processing_type = coffee_page.css('[data-id="process"]').text.strip.gsub("process ","")
    bean.cultivar = coffee_page.css('[data-id="cultivar"]').text.strip.gsub("cultivar ","")
    bean.flavor_profiles = coffee_page.css('.price-wrap+ p').text.strip
    bean.country_of_origin = coffee_page.css('#pdp-order h1').text.strip
    bean.image_url = coffee_page.css('img data-featured-product-image').attr('src')

    if bean.country_of_origin == "Origin Set" || bean.country_of_origin == "Gift Card (online use only)"
      bean.destroy
    else
      ap bean
    end
  end
end


The information I need is all on the page. I'm looking for the image URL that appears in markup like the example below, and I need it for every individual coffee_page linked from the source page. The selector needs to be generic enough to pull this image source and nothing else. I've tried a number of different CSS selectors, but each one returns either nil or a blank result.



<img src="//cdn.shopify.com/s/files/1/2220/0129/products/ceremony-product-gummy-bears_480x480.jpg?v=1551455589" alt="Burundi Kiryama" data-product-featured-image style="display:none">


The coffee_page I'm on is here: https://shop.ceremonycoffee.com/products/burundi-kiryama










ruby-on-rails ruby nokogiri mechanize






asked Mar 19 at 6:58

Andrew Hyman












  • CSS does have substring matching, so you could use img[src^='//cdn.shopify.com/s/files/']. I'm not sure that is specific enough for your needs; you can scope it to a parent element if required. See stackoverflow.com/questions/8903313/… and w3.org/TR/selectors/#attribute-substrings

    – max pleaner
    Mar 19 at 18:51











  • Let me know if my answer to your question is sufficient. If so please mark as correct.

    – NemyaNation
    Mar 28 at 23:00











  • Please read "How to Ask". When asking about a problem with your code we need the minimum data necessary to demonstrate the problem in the question itself. A link forces us to search through a page's HTML which wastes our time and discourages people from trying to help you. We need you to prepare the question so we can help you. In addition, now that the link is broken your question makes little sense.

    – the Tin Man
    May 23 at 23:59


















1 Answer
You need to change



bean.image_url = coffee_page.css('img data-featured-product-image').attr('src')


to



bean.image_url = coffee_page.css('#mobile-only>img').attr('src')


If you can, always use nearby identifiers to locate the element you want to access.






answered Mar 24 at 16:28

NemyaNation




























