How to Extract IMDb Data With Python and Cinemagoer

The Internet Movie Database (IMDb) is one of the largest online databases of information about films, television series, home videos, video games, and streaming content. It contains millions of records that you can use for data analysis.

Cinemagoer (formerly known as IMDbPY) is a Python library for retrieving and managing data from IMDb. You can use it to access data about movies, people, and companies for further analysis.

Installing Required Libraries

You need to install the cinemagoer Python library to access the IMDb database. Run the following command in the command prompt to install the library:

pip install cinemagoer

You must have pip installed on your system to install external Python libraries.

The code used in this project is available in a GitHub repository and is free for you to use under the MIT license.

You need to import the cinemagoer library before using it in your code.

from imdb import Cinemagoer
ia = Cinemagoer()

The above code imports the Cinemagoer class from the imdb package and creates an instance of it.

Searching Movies

You can search for movies with a given (or similar) title using the search_movie() method. For example, if you want to search for movies having the title “rock”, you need to run the following code:

from imdb import Cinemagoer

ia = Cinemagoer()

movies = ia.search_movie('rock')
print(movies[0])

This prints the first movie the search finds.
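Since search_movie() returns a list, it can help to view several matches at once together with their IDs, which you can later pass to get_movie(). The following is a minimal sketch, not part of the original code: format_results() and StandInMovie are hypothetical names, and the sample entries are made up so that the snippet runs offline. It assumes each result offers a dict-style get() method and a movieID attribute, as Cinemagoer's Movie objects do.

```python
def format_results(results, limit=5):
    """Return one 'ID: title (year)' line for each of the first `limit` results."""
    lines = []
    for item in results[:limit]:
        # Search results carry only basic data, so 'year' may be absent.
        title = item.get('title', 'unknown title')
        year = item.get('year', 'n/a')
        lines.append(f"{item.movieID}: {title} ({year})")
    return lines


class StandInMovie(dict):
    """Hypothetical stand-in for a search result, for offline use only."""

    def __init__(self, movie_id, **data):
        super().__init__(**data)
        self.movieID = movie_id


# Made-up sample data; a live call would pass ia.search_movie('rock') instead.
sample = [
    StandInMovie('0000001', title='Example Rock', year=1999),
    StandInMovie('0000002', title='Another Rock'),
]

for line in format_results(sample):
    print(line)
```

With a live connection, you would call format_results(ia.search_movie('rock')) instead of using the sample list.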

You can also fetch a movie by its IMDb ID using the get_movie() method. The returned object gives you further information, such as the movie’s directors and genres. Because these fields are lists, you need to loop through them to get the individual values.

from imdb import Cinemagoer

ia = Cinemagoer()

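movie = ia.get_movie('0468569')
print(movie)

# Each director is a person object; its name is stored under 'name'
print('Directors:')
for director in movie['directors']:
    print(director['name'])

# Genres are plain strings
print('Genres:')
for genre in movie['genres']:
    print(genre)

In the output, you should see the name of the given movie, its director(s), and its genre(s).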
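Not every title has every field, so indexing with movie['directors'] can raise a KeyError when the data is missing. As a hedged sketch, the helper below gathers the same three pieces of information with get() and default values. summarize_movie() is a hypothetical name, and the dictionaries are made-up stand-ins for the objects that get_movie() returns, so the snippet runs without a network connection.

```python
def summarize_movie(movie):
    """Collect title, director names, and genres into a plain dict."""
    return {
        'title': movie.get('title'),
        # Each director is itself dict-like; keep only the name.
        'directors': [d.get('name') for d in movie.get('directors', [])],
        'genres': list(movie.get('genres', [])),
    }


# Made-up stand-ins for movie objects (assumption: same keys as in the article).
complete = {
    'title': 'Example Film',
    'directors': [{'name': 'Jane Doe'}],
    'genres': ['Drama', 'Crime'],
}
sparse = {'title': 'No Credits'}

print(summarize_movie(complete))
print(summarize_movie(sparse))
```

The second call shows the point of the defaults: missing keys become empty lists instead of raising an error.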

Searching for a Person

You can search for people using the search_person() method. For example, if you want to search for “Heath”, you need to run the following code:

from imdb import Cinemagoer

ia = Cinemagoer()

persons = ia.search_person('Heath')
print(persons[0])

You’ll see the name of the first matching person the search finds.

Searching Companies

You can search for companies using the search_company() method. For example, if you want to search for “Universal”, you need to run the following code:

from imdb import Cinemagoer

ia = Cinemagoer()

companies = ia.search_company('Universal')
print(companies)

You’ll get a list of the companies whose names match “Universal”.

You can also retrieve a person’s or a company’s data by ID, using the get_person() and get_company() methods:

from imdb import Cinemagoer

ia = Cinemagoer()

person = ia.get_person('0005132')
print(person)
print(person['birth date'])

company = ia.get_company('0005073')
print(company)

The output shows the person’s name and birth date, followed by the name of the company.
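The IDs passed to get_movie(), get_person(), and get_company() above consist of digits only, whereas IMDb page URLs carry a two-letter prefix (tt for titles, nm for names, co for companies). The sketch below strips that prefix from a URL or a prefixed ID. numeric_id() is a hypothetical helper, and it rests on the assumption that the methods expect the bare digits, as in the examples above.

```python
import re

# Two-letter prefix followed by digits, e.g. tt0468569, nm0005132, co0005073.
_ID_PATTERN = re.compile(r'\b(tt|nm|co)(\d+)\b')


def numeric_id(text):
    """Return the digits of the first IMDb ID found in `text`, or None."""
    match = _ID_PATTERN.search(text)
    return match.group(2) if match else None


print(numeric_id('https://www.imdb.com/title/tt0468569/'))
print(numeric_id('nm0005132'))
```

The ID is kept as a string so that leading zeros survive; converting it to an integer would drop them.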

Finding Top and Bottom Movies

You can retrieve the data for top 250 and bottom 100 movies using the get_top250_movies() and get_bottom100_movies() methods, respectively:

from imdb import Cinemagoer

ia = Cinemagoer()

top = ia.get_top250_movies()
print(top[0])

bottom = ia.get_bottom100_movies()
print(bottom[0])

In response, you’ll see the name of the top-ranked movie, followed by the name of the bottom-ranked one.

The cinemagoer library also provides some other methods like get_top250_tv(), get_popular100_movies(), and get_top250_indian_movies().
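To look at more than the first entry of a chart, you can number the leading items. This is only a sketch: ranked_titles() is a hypothetical helper, it assumes the chart list is ordered by rank (as the use of the first element above suggests), and the sample list is made up so the code runs offline. Slicing also keeps the helper safe if a chart call returns an empty list.

```python
def ranked_titles(movies, n=5):
    """Return 'rank. title' lines for the first `n` entries of a chart list."""
    return [
        f"{rank}. {movie.get('title', 'unknown title')}"
        for rank, movie in enumerate(movies[:n], start=1)
    ]


# Made-up chart entries; a live call would pass ia.get_top250_movies() instead.
chart = [
    {'title': 'First Example'},
    {'title': 'Second Example'},
    {'title': 'Third Example'},
]

for line in ranked_titles(chart, n=2):
    print(line)
```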

Data analysis is the evaluation of data using analytical or statistical tools to extract useful information. It’s now widely used by businesses, marketing companies, and sports teams. The complete process includes defining objectives, posing questions, collecting data, cleaning (scrubbing) the data, analyzing it, and drawing conclusions.

You can get datasets for your projects using Python libraries like Cinemagoer or via online platforms like Kaggle. Alongside programming languages like Python and R, you can use other tools like Microsoft Excel, Tableau, and Stata to perform data analysis.
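As a small illustration of such an analysis, the sketch below counts how often each genre occurs across a set of movies. genre_counts() is a hypothetical helper and the movie dictionaries are made up. With real data, it assumes each movie has already been fetched in full (for example with get_movie()), so that its 'genres' key is populated.

```python
from collections import Counter


def genre_counts(movies):
    """Count how often each genre appears across a collection of movies."""
    counts = Counter()
    for movie in movies:
        # A movie without genre data simply contributes nothing.
        counts.update(movie.get('genres', []))
    return counts


# Made-up sample data standing in for fully fetched movie objects.
movies = [
    {'title': 'A', 'genres': ['Drama', 'Crime']},
    {'title': 'B', 'genres': ['Drama']},
    {'title': 'C'},
]

counts = genre_counts(movies)
print(counts.most_common(1))
```

A Counter keeps the tally in one pass and provides most_common() for ranking, which is why it is used here rather than a plain dictionary.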
