- April 15, 2021
- Posted by: Shreya Aggarwal
- Category: Blogs
Web scraping using Selenium is a widely discussed topic.
As the name suggests, web scraping is a technique for extracting data from websites. Doing this manually would be very tedious, which is why, for example, big e-commerce stores that want to review or monitor product price changes, or services that monitor weather data, generally rely on web scraping.
So, let us go through the process step by step:
- Downloading MongoDB:
The MongoDB Community Server is free to download. Download MongoDB (preferably the latest version) and add MongoDB's bin directory to the Windows PATH environment variable.
Once this is done, start the server (mongod) and then the client shell (mongo, or mongosh in MongoDB 6.0 and later) from the command prompt. By default, the client connects to the local server at mongodb://127.0.0.1:27017/.
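On the Java side, the scraper reaches that same local server through the MongoDB Java driver. The sketch below assumes the synchronous driver (mongodb-driver-sync, version 4.x or later) is on the classpath; the class MongoConnectionCheck and its hostsOf helper are hypothetical names. It sends a ping command, which is a quick way to confirm that mongod is running before any scraping starts.

```java
import com.mongodb.ConnectionString;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

import java.util.List;

// Hypothetical helper class; assumes the MongoDB sync Java driver (4.x+) is on the classpath.
class MongoConnectionCheck {

    // Default address of a local mongod started without extra options.
    static final String LOCAL_URI = "mongodb://127.0.0.1:27017/";

    // Returns the host:port entries the driver will try to reach for a given URI.
    static List<String> hostsOf(String uri) {
        return new ConnectionString(uri).getHosts();
    }

    public static void main(String[] args) {
        System.out.println("Connecting to " + hostsOf(LOCAL_URI));
        // try-with-resources closes the client and its connection pool when done.
        try (MongoClient client = MongoClients.create(LOCAL_URI)) {
            // "ping" is a lightweight server command; it fails (after a timeout) if mongod is not running.
            Document reply = client.getDatabase("admin").runCommand(new Document("ping", 1));
            System.out.println("Server replied: " + reply.toJson());
        }
    }
}
```

If the ping succeeds, the scraper can use the same connection string.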
- Adding dependencies to pom.xml
WebDriverManager by Boni Garcia helps us manage browser drivers with ease and rescues us from manually downloading and setting up a driver for each browser.
Add the Selenium, WebDriverManager, and MongoDB Java driver dependencies to your pom.xml file.
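Once the dependencies are in place, WebDriverManager needs only one call before a driver is created. The following is a minimal sketch, assuming Selenium 4, WebDriverManager 5, and a local Chrome installation; DriverSetup and headlessOptions are hypothetical names, and the headless flag is optional.

```java
import io.github.bonigarcia.wdm.WebDriverManager;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

// Hypothetical example; assumes Selenium 4 and WebDriverManager 5 are declared in pom.xml
// and that Google Chrome is installed locally.
class DriverSetup {

    // Options for running Chrome without a visible window (useful on servers).
    static ChromeOptions headlessOptions() {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");
        return options;
    }

    public static void main(String[] args) {
        // Resolves a chromedriver binary matching the installed Chrome, downloads it if needed,
        // and exports its path as a Java system property for Selenium to pick up.
        WebDriverManager.chromedriver().setup();

        WebDriver driver = new ChromeDriver(headlessOptions());
        try {
            driver.get("https://books.toscrape.com/");
            System.out.println("Page title: " + driver.getTitle());
        } finally {
            driver.quit(); // always release the browser process
        }
    }
}
```

Since Selenium 4.6, the bundled Selenium Manager can also locate drivers automatically, so WebDriverManager is a convenience rather than a strict requirement.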
- Writing code for Web Scraping
After setting up all the required software and dependencies, we can start on the code. First, create a connection to MongoDB, then create a database and a collection in it (MongoDB is a NoSQL database, so data is organized into collections of documents rather than tables). We'll use the Books to Scrape website here, so you simply need to point your driver at its URL (https://books.toscrape.com/). Next, scrape some data from the page: the page title, the URL, and the links and images present on the page, along with their href and src values respectively.
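Putting these steps together, a scraper along these lines might look like the sketch below. It is an illustration under stated assumptions, not the original program: it assumes the same dependencies as above, a local mongod on the default port, and the database and collection names autodb and web used later in this post. The class BookScraper, its helper methods, and the field names (linkCount, imageCount, and so on) are hypothetical.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import io.github.bonigarcia.wdm.WebDriverManager;
import org.bson.Document;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.util.ArrayList;
import java.util.List;

// Minimal sketch, not the original listing. Assumes Selenium 4, WebDriverManager 5 and the
// MongoDB sync driver are on the classpath, Chrome is installed, and mongod runs locally.
class BookScraper {

    static final String START_URL = "https://books.toscrape.com/";

    // Collects one attribute (e.g. "href" or "src") from every element, skipping missing values.
    static List<String> attributeValues(List<WebElement> elements, String attribute) {
        List<String> values = new ArrayList<>();
        for (WebElement element : elements) {
            String value = element.getAttribute(attribute);
            if (value != null && !value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    // Builds the MongoDB document for one scraped page; field names are illustrative.
    static Document toDocument(String title, String url, List<String> links, List<String> images) {
        return new Document("title", title)
                .append("url", url)
                .append("linkCount", links.size())
                .append("links", links)
                .append("imageCount", images.size())
                .append("images", images);
    }

    public static void main(String[] args) {
        WebDriverManager.chromedriver().setup();
        WebDriver driver = new ChromeDriver();
        try (MongoClient client = MongoClients.create("mongodb://127.0.0.1:27017/")) {
            // MongoDB creates the database and the collection lazily, on the first insert.
            MongoCollection<Document> collection =
                    client.getDatabase("autodb").getCollection("web");

            driver.get(START_URL);
            List<String> links = attributeValues(driver.findElements(By.tagName("a")), "href");
            List<String> images = attributeValues(driver.findElements(By.tagName("img")), "src");

            Document page = toDocument(driver.getTitle(), driver.getCurrentUrl(), links, images);
            collection.insertOne(page);
            System.out.println("Inserted " + links.size() + " links and " + images.size() + " images");
        } finally {
            driver.quit();
        }
    }
}
```

Keeping the extraction step (attributeValues) separate from the document-building step (toDocument) makes the stored document shape easy to check without launching a browser.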
- Viewing the data in the MongoDB database:
As you may already know, MongoDB has no rows or columns; data is stored as documents (JSON-like records, stored internally as BSON) inside collections.
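To make the row-versus-document difference concrete, here is how a single scraped page can be represented as one org.bson.Document, with its list of links embedded as an array instead of being spread across rows of a second table. The class name DocumentShape and the sample values are made up for illustration; the sketch assumes the bson library that ships with the MongoDB Java driver.

```java
import org.bson.Document;

import java.util.List;

// Illustrative only: field names and values are made up, not scraped data.
class DocumentShape {

    static Document samplePage() {
        return new Document("title", "Example page")
                .append("url", "https://books.toscrape.com/")
                // An array lives inside the same document; a relational schema would
                // typically need a separate links table joined by a foreign key.
                .append("links", List.of("https://books.toscrape.com/index.html"))
                // Sub-documents can be nested as well.
                .append("stats", new Document("linkCount", 1).append("imageCount", 0));
    }

    public static void main(String[] args) {
        // toJson() renders the document as JSON text.
        System.out.println(samplePage().toJson());
    }
}
```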
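You can view the output stored in MongoDB, in JSON format, from the mongo shell: for example, switch to the database with use autodb and then list the stored documents with db.web.find().pretty().
Here, as in the code, autodb is the database name and web is the collection name.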
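The same check can also be done from Java. This sketch assumes the scraper has already written to the autodb database and web collection on a local mongod; ViewScrapedData and toJsonLines are hypothetical names.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

// Sketch: assumes the scraper has already written to autodb.web on a local mongod.
class ViewScrapedData {

    // Converts documents to one JSON string each, in iteration order.
    static List<String> toJsonLines(Iterable<Document> documents) {
        List<String> lines = new ArrayList<>();
        for (Document doc : documents) {
            lines.add(doc.toJson());
        }
        return lines;
    }

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://127.0.0.1:27017/")) {
            MongoCollection<Document> web = client.getDatabase("autodb").getCollection("web");
            System.out.println("Documents in autodb.web: " + web.countDocuments());
            // find() with no filter iterates over every document in the collection.
            toJsonLines(web.find()).forEach(System.out::println);
        }
    }
}
```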
After the scrape, all the requested data from the website is in one place, ready to be used for further data analysis and to inform strategic decisions.
You now have the power to scrape! 💪
You now have the foundational skills necessary to scrape websites.