Web Scraping Using Selenium with Node.js

Posted By : Parveen Kumar Yadav | 27-Jun-2017

nodejs

For web scraping you can use PhantomJS, Nightmare.js, and similar tools. In some cases, though, the server detects that a request from PhantomJS or Nightmare comes from a bot rather than a real user. Selenium can sometimes avoid that detection; it does not work in every case, but it is one more option for scraping. Selenium is a web testing framework that automatically launches a web browser to mimic a normal user. Once a page loads, you can scrape its content. To use Selenium in your project, follow these steps:

npm install selenium-standalone@latest -g
selenium-standalone install
selenium-standalone start

You can check the documentation for this at:

https://www.npmjs.com/package/selenium-standalone

After that, you need to install Selenium WebDriver:

npm install selenium-webdriver

For a detailed description of installation and usage, see:

https://www.npmjs.com/package/selenium-webdriver
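With both packages installed, a session can be opened through the server started by `selenium-standalone start`. The following is a minimal sketch, not the author's code: the helper name `buildRemoteDriver` is hypothetical, and the address `http://localhost:4444/wd/hub` is an assumption about the server's default, so adjust it if your server reports a different one at startup.

```javascript
// Assumed default address of a locally started selenium-standalone server.
const DEFAULT_SERVER_URL = 'http://localhost:4444/wd/hub';

// Opens a browser session through a running Selenium server.
// (Hypothetical helper name; Builder, usingServer and forBrowser are
// part of the selenium-webdriver package.)
function buildRemoteDriver(serverUrl = DEFAULT_SERVER_URL, browser = 'firefox') {
  // Loaded lazily, so the package is only needed when a session is opened.
  const { Builder } = require('selenium-webdriver');
  return new Builder()
    .usingServer(serverUrl)
    .forBrowser(browser)
    .build();
}
```

Calling `buildRemoteDriver()` returns a driver on which `get()` and `getPageSource()` can be used; call `quit()` on it when done so the browser is closed.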

If you get the following error:

Error: The geckodriver executable could not be found on the current PATH. Please download the latest version from https://github.com/mozilla/geckodriver/releases/WebDriver and ensure it can be found on your PATH.

then you need to download the latest version of geckodriver, or first check that your PATH is set correctly. If you are using Ubuntu, you can install geckodriver by following this link:

https://askubuntu.com/questions/870530/how-to-install-geckodriver-in-Ubuntu
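The PATH check mentioned above can also be done from Node.js itself. This is a rough sketch using only built-in modules; `findOnPath` is a hypothetical helper name, and the `.exe` handling for Windows is an assumption added for completeness.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the full path of the first matching executable found in the
// given PATH string, or null when none is found.
function findOnPath(name, pathVar = process.env.PATH || '') {
  // On Windows the binary usually carries an .exe suffix.
  const candidates = process.platform === 'win32' ? [name, `${name}.exe`] : [name];
  for (const dir of pathVar.split(path.delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    for (const candidate of candidates) {
      const full = path.join(dir, candidate);
      try {
        fs.accessSync(full, fs.constants.X_OK);
        if (fs.statSync(full).isFile()) return full;
      } catch (err) {
        // Missing or not executable in this directory; keep looking.
      }
    }
  }
  return null;
}

const location = findOnPath('geckodriver');
console.log(location ? `geckodriver found at ${location}` : 'geckodriver is not on the PATH');
```

If the script reports that geckodriver is not on the PATH, add the directory containing the binary to the PATH and open a new terminal before retrying.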

After that, you also need to install a compatible Firefox version, which you can download by following this link:

https://askubuntu.com/questions/661186/how-to-install-previous-firefox-version (how to install any Firefox version)
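Because Firefox and geckodriver have to be compatible, it helps to see which versions are actually installed. Here is a small sketch that assumes both commands accept `--version` and print a dotted version number; `parseVersion` and `installedVersion` are hypothetical helper names.

```javascript
const { execFileSync } = require('child_process');

// Pulls the first dotted version number out of a "--version" banner.
function parseVersion(text) {
  const match = /(\d+(?:\.\d+)+)/.exec(text);
  return match ? match[1] : null;
}

// Runs "<command> --version" and returns the parsed version, or null when
// the command is missing, fails, or prints no recognisable version.
function installedVersion(command) {
  try {
    const output = execFileSync(command, ['--version'], {
      encoding: 'utf8',
      timeout: 10000, // give up after 10 seconds
    });
    return parseVersion(output);
  } catch (err) {
    return null;
  }
}

console.log('Firefox:', installedVersion('firefox'));
console.log('geckodriver:', installedVersion('geckodriver'));
```

The script only reports the versions. Which combinations work together has to be looked up in the geckodriver release notes.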

The error is related to the Firefox version and the geckodriver version in use. I upgraded my Firefox browser to the stable version, 51.0.1, upgraded the driver to 0.16.1, and set the PATH again in .bashrc; after that the issue was resolved. If everything works, you can get the HTML content of any web page via the getPageSource() method:

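const { Builder } = require('selenium-webdriver');
const driver = new Builder().forBrowser('firefox').build(); // needs geckodriver on the PATH
driver.get('http://example.com')
  .then(() => driver.getPageSource()) // resolves to the page's HTML
  .then((html) => console.log(html))
  .finally(() => driver.quit());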

In this way you can get the source of a page using Selenium WebDriver in Node.js.
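For anything beyond a one-off call, it is worth making sure the browser is closed even when a page fails to load. The sketch below is one possible arrangement, not the only one: `scrapePage` and `main` are hypothetical names, while `get`, `getTitle`, `getPageSource`, and `quit` are WebDriver methods.

```javascript
// Loads a URL in an already-built WebDriver session and returns the page
// title and HTML. The session is always closed, even when loading fails.
async function scrapePage(driver, url) {
  try {
    await driver.get(url);
    const title = await driver.getTitle();
    const html = await driver.getPageSource();
    return { title, html };
  } finally {
    await driver.quit();
  }
}

// Builds a local Firefox session and scrapes one page.
async function main() {
  // Loaded only when main() runs; requires geckodriver on the PATH.
  const { Builder } = require('selenium-webdriver');
  const driver = await new Builder().forBrowser('firefox').build();
  const page = await scrapePage(driver, 'http://example.com');
  console.log(page.title, page.html.length);
}
```

Running `main().catch(console.error)` opens Firefox, prints the page title and the length of the HTML, and then closes the browser.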

Hope this will help. Thanks!

About Author

Parveen Kumar Yadav

Parveen is an experienced Java developer working with Java, J2EE, Spring, Hibernate, Grails, Node.js, Meteor, Blaze, Neo4j, MongoDB, Wowza Streaming Server, FFmpeg, video transcoding, Amazon Web Services, AngularJS, and JavaScript. He likes to learn new technologies.
