Using rendered page links via a Java client

I am given a URL. I need to fetch that page's HTML and extract the site's links from it.
I thought about using a headless browser. I am using Java, so I would like to do everything within a Java process.

An example is the CNN site.
So far I have tried using:




testCompile 'net.sourceforge.htmlunit:htmlunit:2.32'




@Test
public void htmlUnitTest() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
        webClient.waitForBackgroundJavaScriptStartingBefore(20000);
        webClient.getOptions().setThrowExceptionOnScriptError(false);

        final HtmlPage page = webClient.getPage(URL);
        WebResponse response = page.getWebResponse();
        String content = response.getContentAsString();

        List<HtmlAnchor> anchors = page.getAnchors();

        System.out.println("anchors.size() : " + anchors.size());
        System.out.println("***********");
        System.out.println(content);
        System.out.println("***********");

        try (BufferedWriter writer = new BufferedWriter(new FileWriter("htmlUnit.txt"))) {
            writer.write(content);
        }
    }
}





But the response I get is the original HTML, not the rendered page (in my case, the JavaScript has not run and has not created the page's anchors).

Can someone recommend another library, or tell me whether I am misusing HtmlUnit and suggest a working solution? It would be very helpful.

java htmlunit






asked Mar 26 at 8:09 by yoav.str

  • Please provide the URL to give us a chance to reproduce your case.

    – RBRi
    Mar 26 at 10:37

  • Any site that relies on JavaScript rendering. Try edition.cnn.com, for instance.

    – yoav.str
    Mar 27 at 8:31

1 Answer

The waitForBackgroundJavaScriptXX methods are not options that configure the WebClient; you have to call them AFTER getPage(URL), or after any other interaction such as click().

One of the major differences between HtmlUnit and Selenium is the integration of all parts. In HtmlUnit the JavaScript engine is part of the implementation, which means the API can find out the current status of script execution. As a result, the waitForBackgroundJavaScriptXX methods only wait if some JavaScript is pending. If there is none, they are no-ops. Called before getPage(URL), as in the question, nothing is pending yet, so the call returns without waiting.
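To illustrate why the call order matters, here is a toy model in plain Java. It is not HtmlUnit code: `ToyBrowser`, its `getPage()`, `waitForBackgroundJobs()` and `getAnchors()` methods, and the link names are all hypothetical stand-ins, and the 200 ms script delay is an arbitrary assumption. It only sketches the "wait only if something is pending" behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: every name here is hypothetical, none of it is HtmlUnit API.
class ToyBrowser implements AutoCloseable {
    private final ScheduledExecutorService jobs = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger pending = new AtomicInteger();
    private final List<String> anchors = new CopyOnWriteArrayList<>();

    // "Loads" a page: one link is in the static HTML, two more are added
    // by a background script that finishes 200 ms later.
    void getPage() {
        anchors.add("/static-link");
        pending.incrementAndGet();
        jobs.schedule(() -> {
            anchors.add("/js-link-1");
            anchors.add("/js-link-2");
            pending.decrementAndGet();
        }, 200, TimeUnit.MILLISECONDS);
    }

    // Blocks until no job is pending or the timeout expires and returns the
    // number of jobs still pending. With nothing pending it returns at once.
    int waitForBackgroundJobs(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (pending.get() > 0 && System.nanoTime() < deadline) {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return pending.get();
    }

    // Snapshot of the links currently present in the "DOM".
    List<String> getAnchors() {
        return new ArrayList<>(anchors);
    }

    @Override
    public void close() {
        jobs.shutdownNow();
    }

    // Wrong order (as in the question): wait first, then load, then read.
    static int anchorsWhenWaitingBeforeLoad() {
        try (ToyBrowser browser = new ToyBrowser()) {
            browser.waitForBackgroundJobs(2000); // nothing pending yet, so no waiting happens
            browser.getPage();
            return browser.getAnchors().size();
        }
    }

    // Right order (as in the answer): load, then wait, then read.
    static int anchorsWhenWaitingAfterLoad() {
        try (ToyBrowser browser = new ToyBrowser()) {
            browser.getPage();
            browser.waitForBackgroundJobs(2000); // waits for the pending script
            return browser.getAnchors().size();
        }
    }

    public static void main(String[] args) {
        System.out.println("wait before load: " + anchorsWhenWaitingBeforeLoad());
        System.out.println("wait after load:  " + anchorsWhenWaitingAfterLoad());
    }
}
```

In HtmlUnit terms, the corresponding order would be to call `webClient.getPage(URL)` first, then `webClient.waitForBackgroundJavaScript(20000)`, and only then read `page.getAnchors()`. Note also that `page.getWebResponse().getContentAsString()` returns the response as the server sent it, whereas `page.asXml()` serializes the DOM after scripts have modified it.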






edited Mar 29 at 17:44

answered Mar 27 at 18:19 by RBRi

  • The JavaScript in question is what renders the page: it runs as soon as it is interpreted on page load, without any event being triggered. My motivation is to be website-agnostic, meaning I don't want to depend on any particular site's architecture. How do real-world crawlers such as Yahoo's and Google's do it?

    – yoav.str
    Mar 28 at 19:05
