Web Scraping with Selenium Java using MongoDB

Introduction:

Web scraping using Selenium is a widely discussed topic.

As the name suggests, web scraping is a technique for extracting data from websites. Doing this manually would be very tedious. For example, when a big e-commerce store wants to review or monitor price changes of products, or when someone needs to monitor weather data, they generally use web scraping.

So, let us start step by step and delve deep into understanding the process:

  1. Downloading MongoDB:

MongoDB is open-source. You need to download MongoDB (preferably the latest version) and add MongoDB's bin path to the Windows PATH environment variable.

Downloading MongoDB

Once this is done, start the server (mongod) first and then the client (mongo, or mongosh in recent MongoDB releases, where the legacy mongo shell has been replaced) from cmd. The MongoDB client connects to localhost (mongodb://127.0.0.1:27017/).

mongod

  2. Adding dependencies in POM.xml

WebDriverManager by Boni Garcia manages driver-related setup for us and saves us from manually downloading and configuring drivers for each browser.

Place the dependencies below in your pom.xml file.

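The original dependency listing is not reproduced in the text, so the fragment below is only a sketch of what the dependencies section might look like. The Maven coordinates are the published ones for Selenium, WebDriverManager, and the MongoDB synchronous Java driver; the version numbers are assumptions chosen for illustration and should be replaced with whatever current versions suit your project.

```xml
<dependencies>
    <!-- Selenium WebDriver bindings for Java (version is an assumption) -->
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>4.21.0</version>
    </dependency>
    <!-- WebDriverManager by Boni Garcia (version is an assumption) -->
    <dependency>
        <groupId>io.github.bonigarcia</groupId>
        <artifactId>webdrivermanager</artifactId>
        <version>5.8.0</version>
    </dependency>
    <!-- MongoDB synchronous Java driver (version is an assumption) -->
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-sync</artifactId>
        <version>4.11.1</version>
    </dependency>
</dependencies>
```
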
POM.xml file

  3. Writing code for Web Scraping

With the required software and drivers installed, we can start coding. First, create a connection to MongoDB, then create a database and a collection in it (MongoDB is a NoSQL database, so it uses collections rather than tables). We'll be using the Books to Scrape website here, so simply point your driver at its URL (https://books.toscrape.com/). Next, scrape some data from the page: the title, the URL, and the links and images present on the page, along with their href and src values respectively.

Here’s the full code:

Web Scraping

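Since the original code screenshots are not available in the text, here is a minimal sketch of the steps described above, not the author's exact code. It assumes Chrome as the browser, the MongoDB synchronous driver, and a MongoDB server running on localhost. The class name BookScraper, the helper names, and the document field names (title, url, linkCount, links, imageCount, images) are hypothetical; autodb and web are the database and collection names used in this article.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import io.github.bonigarcia.wdm.WebDriverManager;
import org.bson.Document;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.util.ArrayList;
import java.util.List;

public class BookScraper {

    static final String MONGO_URI = "mongodb://127.0.0.1:27017/";
    static final String TARGET_URL = "https://books.toscrape.com/";

    /** Collects the non-empty values of one attribute (e.g. href or src) from the given elements. */
    static List<String> attributeValues(List<WebElement> elements, String attribute) {
        List<String> values = new ArrayList<>();
        for (WebElement element : elements) {
            String value = element.getAttribute(attribute);
            if (value != null && !value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    /** Builds the MongoDB document for one scraped page (field names are illustrative). */
    static Document toDocument(String title, String url, List<String> links, List<String> images) {
        return new Document("title", title)
                .append("url", url)
                .append("linkCount", links.size())
                .append("links", links)
                .append("imageCount", images.size())
                .append("images", images);
    }

    public static void main(String[] args) {
        // Let WebDriverManager resolve a matching chromedriver binary.
        WebDriverManager.chromedriver().setup();
        WebDriver driver = new ChromeDriver();

        try (MongoClient client = MongoClients.create(MONGO_URI)) {
            // MongoDB creates the database and collection lazily on first insert.
            MongoDatabase database = client.getDatabase("autodb");
            MongoCollection<Document> collection = database.getCollection("web");

            driver.get(TARGET_URL);

            // href values of all anchors and src values of all images on the page.
            List<String> links = attributeValues(driver.findElements(By.tagName("a")), "href");
            List<String> images = attributeValues(driver.findElements(By.tagName("img")), "src");

            collection.insertOne(toDocument(driver.getTitle(), driver.getCurrentUrl(), links, images));
        } finally {
            // Always close the browser, even if scraping or the insert fails.
            driver.quit();
        }
    }
}
```

Running main once should insert a single document describing the landing page into the web collection. Keeping toDocument separate from the browser calls makes the document shape easy to change without touching the Selenium code.
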
  4. Viewing the data in the MongoDB database:

MongoDB has no rows or columns, so the data is stored in the form of documents.

You can view the output on the MongoDB database in JSON format with the below commands:

>use autodb

>db.web.find().pretty();

As used in the code, autodb is the database name and web is the collection name.

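The same data can also be read back from Java instead of the shell. The sketch below is one possible way to do it, assuming the MongoDB synchronous driver and the autodb database and web collection named above; the class name ViewScrapedData and the pretty helper are hypothetical.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import org.bson.json.JsonWriterSettings;

public class ViewScrapedData {

    static final String MONGO_URI = "mongodb://127.0.0.1:27017/";

    /** Renders a document as indented JSON, similar to pretty() in the shell. */
    static String pretty(Document doc) {
        return doc.toJson(JsonWriterSettings.builder().indent(true).build());
    }

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create(MONGO_URI)) {
            MongoCollection<Document> collection =
                    client.getDatabase("autodb").getCollection("web");

            // find() with no filter returns every document in the collection.
            for (Document doc : collection.find()) {
                System.out.println(pretty(doc));
            }
        }
    }
}
```
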
MongoDB Database

Conclusion: 

After the scraping run, all the requested data from the website is available in one place, where it can be used for further data analysis and to support strategic decisions.
You now have the foundational skills necessary to scrape websites. 💪

Web scraping with Selenium Java and MongoDB empowers data extraction and analysis. For enhanced web scraping endeavours, consider exploring Afour Technologies. Their expertise in cyber security testing, DevOps consulting, and test strategy can elevate your projects. Visit AfourTech for more information and embark on a transformative web scraping journey.
