Scraping Images from a Static Website

Rohit Joshi
3 min read · Apr 17, 2021

Web scraping? Wait, what’s “web scraping”? And how do you scrape images?
These are some of the questions that might have crossed your mind after reading the title. You will get answers to all of them in a few minutes.
So, let’s begin…

What is Web Scraping?

In layman’s terms, “web scraping” is the process of extracting data from websites. The process is usually automated with a program written in Python or another language. The websites may be static or dynamic.

Wait… what are static and dynamic web pages?
Static web pages are stored as ready-made files (HTML, CSS, JavaScript, etc.) and sent to the browser as they are. The structure of the page is the same for all visitors.
Dynamic web pages are generated at request time, for example by server-side technologies such as CGI or ASP.NET. The content displayed can differ from person to person. Good examples of dynamic web pages are the ones where the data is updated repeatedly.

Why do we need Web Scraping?

Let’s try to understand this with an example:
Suppose you are looking for internships, and every day you visit internship portals looking for opportunities. Visiting websites, checking for the right intern position and collecting the details is a tedious task. What if we could automate this process? Voila!!🤩 That’s where web scraping comes to our rescue. We can write a program that looks for internship positions matching given preferences and extracts this information in a well-structured format.

How to Scrape images from a website?

So, finally, we are going to look at how to scrape images. I am using Python as the programming language for scraping.

Step 1: Know the Python modules
We need some Python modules/libraries to help us with this task. These modules are:
1) requests: This allows us to send HTTP requests to the website.
2) BeautifulSoup: This helps us pull the required data out of HTML and XML documents.
3) urllib: urllib is a package of several modules for working with URLs.

Step 2: Importing these modules

import requests
from bs4 import BeautifulSoup
import urllib.request
import random
import os

Step 3: Writing the main block of code

url = "https://www.creativeshrimp.com/top-30-artworks-of-beeple.html"
source_code = requests.get(url)

Explanation: The url variable holds the URL of the website from which we are going to extract images.
The requests.get() function sends an HTTP GET request to the URL, and the response from the website is stored in the source_code variable.
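
Before parsing anything, it can help to confirm that the request actually succeeded. The following is a minimal sketch of that idea, not the code from this post: the helper name fetch_html and the 10-second timeout are assumptions made purely for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string.

    fetch_html and the default timeout are illustrative choices,
    not part of the original script.
    """
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    # so a missing or blocked page fails loudly instead of being parsed.
    response.raise_for_status()
    return response.text


# Example call (needs network access, so it is left commented out):
# html = fetch_html("https://www.creativeshrimp.com/top-30-artworks-of-beeple.html")
```

Wrapping the request in a small function like this keeps the error handling in one place; the rest of the script can then assume it received real HTML.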

plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")

Explanation: .text gives the body of the response decoded into a string.
BeautifulSoup() is used to parse the data. Parsing here means turning the raw HTML text into a structured tree of tags that our program can search and navigate.
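
To make “parsing” a bit more concrete, here is a small sketch that parses a hand-written HTML string instead of the real page. The sample markup is made up for illustration, and "html.parser" is simply the parser that ships with Python; other parsers would work too.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document used only for illustration.
sample_html = """
<html>
  <head><title>Sample gallery</title></head>
  <body>
    <h1>Artworks</h1>
    <p class="intro">Two pictures below.</p>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python itself.
soup = BeautifulSoup(sample_html, "html.parser")

# After parsing, elements can be reached by tag name or searched for.
page_title = soup.title.string
heading = soup.find("h1").get_text()
intro = soup.find("p", {"class": "intro"}).get_text()

print(page_title)
print(heading)
print(intro)
```

Once the text is parsed, we no longer deal with one long string; we ask the soup object for the tags we care about.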

for link in soup.find_all("a",{"class":"lightbox"}):
    href=link.get('href')
    print(href)

Explanation:

If you visit the URL given above, you will notice that all the images we are trying to scrape are inside an <a> tag with the class “lightbox”.

So in the above code snippet we use a for loop with find_all() [find_all() returns all occurrences of a particular tag on a page] to find all of these tags and get their href attribute, which holds the link to each image.
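
The same loop can be tried out without touching the real website. Below is a rough sketch that runs find_all() on made-up markup imitating the structure described above. The sample HTML and the base_url value are assumptions for illustration only; urljoin is added in case a page uses relative links, which may not be needed for the page used in this post.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up markup that imitates the structure described above:
# each image is wrapped in an <a> tag with the class "lightbox".
sample_html = """
<a class="lightbox" href="https://example.com/images/one.jpg"><img src="one_thumb.jpg"></a>
<a class="lightbox" href="/images/two.jpg"><img src="two_thumb.jpg"></a>
<a class="menu" href="/about.html">About</a>
"""

# Hypothetical page address, used only to resolve relative links.
base_url = "https://example.com/gallery.html"

soup = BeautifulSoup(sample_html, "html.parser")

image_links = []
for link in soup.find_all("a", {"class": "lightbox"}):
    href = link.get("href")  # returns None if the attribute is missing
    if href:
        # urljoin leaves absolute URLs unchanged and completes relative ones.
        image_links.append(urljoin(base_url, href))

print(image_links)
```

Note that the <a> tag with the class “menu” is skipped, because find_all() only returns the tags that match both the tag name and the class.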

The snippet above only prints each link; the step that saves it to disk is not shown here. With that step inside the loop, the program stops when the loop ends and all the images are available in the directory that contains this Python file.
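
For readers who want to see what a saving step could look like, here is one possible sketch built on urllib.request.urlretrieve. It is not necessarily how the original script does it: the function names filename_from_url and download_images, the fallback file name and the idea of naming each file after the last part of its URL are all assumptions made for this example.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_from_url(image_url, fallback="image.jpg"):
    """Derive a local file name from the last part of the URL path."""
    name = os.path.basename(urlparse(image_url).path)
    # A URL that ends in "/" has no file name, so use the fallback.
    return name or fallback


def download_images(image_urls, target_dir="."):
    """Save each URL into target_dir and return the list of saved paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for image_url in image_urls:
        path = os.path.join(target_dir, filename_from_url(image_url))
        # urlretrieve downloads the resource and writes it to the given path.
        urllib.request.urlretrieve(image_url, path)
        saved.append(path)
    return saved


# Example call (needs network access, so it is left commented out):
# download_images(["https://example.com/images/one.jpg"], target_dir="images")
```

Taking the file name from the URL keeps the original names, but two images with the same name would overwrite each other, so a real script may need a different naming scheme.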
You can find the code on my GitHub.

Congratulations!! You have now successfully learnt and implemented web scraping to extract images.🥳

That's all folks,

You can follow me on:
GitHub: Rohit Joshi (rohitjoshi6)
LinkedIn: Rohit Joshi

Rohit Joshi

Google DSC Lead’21 | CS’23 | Frontend Web developer