DevRescue

Load CSV Files with Python and pandas

by Khaleel O.
July 10, 2021
in Python
Reading Time: 5 mins read

For this Load CSV Files with Python and pandas tutorial, we will show you the following:

  • Import a CSV file
  • Do some basic EDA (Exploratory Data Analysis)
  • Select rows and columns

We will be using Python 3.8.10 and the pandas module to accomplish this.


The dataset we will be using for this tutorial is Box Office Mojo’s listing of the domestic (US-only) lifetime gross (in US dollars), rank and production year of 14,000+ movies, via Data.World. You can find it HERE.

Box Office Mojo Domestic (US Only) Box Office Results via Data.World

Next, we begin writing our code! ⚡⚡ As previously mentioned, we will use the pandas module to import the CSV file. pandas is one of the best tools for working with tabular (relational) data in Python. We also use the requests module to download the file. If you don’t already have these modules installed, you can install them with the following command at the terminal:

pip install pandas requests

Next we write the actual logic:

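# Import packages
import io

import pandas as pd  # pip install pandas
import requests  # pip install requests

url = 'https://raw.githubusercontent.com/devrescue/python/main/datasets/boxoffice.csv'

# Download the raw CSV bytes and stop early if the request failed
response = requests.get(url)
response.raise_for_status()

# Decode the bytes to text, wrap it in a file-like object and parse it into a DataFrame
df = pd.read_csv(io.StringIO(response.content.decode('utf-8')))

df.info()  # info() prints its summary itself (wrapping it in print() would also print "None")
print(df.head())  # first 5 rows
print(df.describe())  # summary statistics for the numeric columns
# Same statistics for lifetime_gross, shown in plain decimal notation instead of scientific notation
print(df["lifetime_gross"].describe().apply(lambda x: format(x, 'f')))
print(df.tail())  # last 5 rows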

Let’s explain what is happening here:

  1. Import our modules: We use the requests module to download the file over HTTP and the io module to wrap the downloaded text in a file-like object. We are also using the pandas module, as previously mentioned.
  2. Import the data from the CSV file: Using the requests module, we grab the CSV directly from a URL.
  3. Create a pandas DataFrame: pd.read_csv() reads CSV (comma-separated values) data, here the decoded text wrapped in io.StringIO, into a pandas DataFrame called df. A DataFrame is the primary pandas data structure. It is two-dimensional, which means it has rows and columns. From here on, our dataset is contained in the df DataFrame.
  4. Perform simple Exploratory Data Analysis (EDA): EDA is what most data scientists do to understand the data they are working with before doing more in-depth analysis:
    • The info() method prints a concise summary of the DataFrame df, such as the column names, data types, number of rows, non-null counts and memory usage.
    • The head(n=5) method returns the first n rows of the DataFrame by position. If n is omitted, it returns the first 5 rows.
    • The describe() method returns general descriptive statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numeric columns of df. We also call describe() on the lifetime_gross column alone and format each value with format(x, 'f'), so that large dollar amounts appear in plain decimal notation rather than scientific notation.
    • The tail(n=5) method returns the last n rows of the DataFrame by position. If n is omitted, it returns the last 5 rows.
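To try these EDA calls without downloading anything, here is a minimal sketch that parses a small in-memory CSV. The six rows are made-up placeholder values, not real Box Office Mojo figures, and the column names rank, title, studio, lifetime_gross and year are assumptions about the layout of boxoffice.csv (only title, lifetime_gross and year are named in this tutorial). It also applies the format(x, 'f') trick to the statistics of a single column.

```python
import io

import pandas as pd

# Made-up placeholder rows; the column layout is an assumption about boxoffice.csv
csv_text = """rank,title,studio,lifetime_gross,year
1,Movie A,Studio X,900000000,2015
2,Movie B,Studio Y,700000000,2009
3,Movie C,Studio X,450000000,2012
4,Movie D,Studio Z,300000000,1997
5,Movie E,Studio Y,120000000,2019
6,Movie F,Studio Z,80000000,2003
"""

# io.StringIO makes the string behave like an open file, which read_csv accepts
df = pd.read_csv(io.StringIO(csv_text))

df.info()               # column names, dtypes, non-null counts, memory usage
first_two = df.head(2)  # first 2 rows by position
last_two = df.tail(2)   # last 2 rows by position
stats = df.describe()   # count, mean, std, min, quartiles, max for numeric columns

# Format one column's statistics as plain decimal strings (no scientific notation)
plain = stats["lifetime_gross"].apply(lambda x: format(x, "f"))

print(first_two)
print(last_two)
print(stats)
print(plain)
```

As a side note, pd.read_csv() can also read a URL directly (for example pd.read_csv(url)), which would make the requests/io step optional for a publicly reachable file like this one.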

The Python notebook for this tutorial lets you see the output of each of these statements. Find it HERE.

Now that we have imported our CSV file and have some idea of what the data represents, we can use indexing to extract specific rows and columns.

The iloc[] indexer lets us select by integer position. For example, we already know from our EDA that there are 16,542 rows (positions 0 to 16541), so we can use this code to display the first and last rows:

print(df.iloc[[0,16541]])

We pass iloc a list containing the positions of the first and last rows, 0 and 16541 respectively. Because the positions are wrapped in a list, the result is a DataFrame.

iloc[] also accepts a single integer if you want just one row; in that case it returns the row as a Series rather than a DataFrame.
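Hard-coding 16541 only works while the dataset keeps exactly 16,542 rows. Here is a minimal sketch, using made-up placeholder rows and assumed column names (not the real dataset), that uses the negative position -1 for the last row and contrasts a single integer with a one-element list:

```python
import io

import pandas as pd

# Made-up placeholder rows (not real box office figures); column names are assumed
csv_text = """rank,title,studio,lifetime_gross,year
1,Movie A,Studio X,900000000,2015
2,Movie B,Studio Y,700000000,2009
3,Movie C,Studio X,450000000,2012
4,Movie D,Studio Z,300000000,1997
5,Movie E,Studio Y,120000000,2019
6,Movie F,Studio Z,80000000,2003
"""
df = pd.read_csv(io.StringIO(csv_text))

# Negative positions count from the end, so -1 is always the last row
first_and_last = df.iloc[[0, -1]]  # list of positions -> DataFrame
one_row = df.iloc[0]               # single integer -> Series holding one row
one_row_frame = df.iloc[[0]]       # list with one position -> 1-row DataFrame

print(first_and_last)
print(type(one_row).__name__, type(one_row_frame).__name__)
```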

We can also select rows and columns at the same time. For example, if we want only the title, lifetime_gross and year columns for the first 5 rows, we can do the following:

print(df.iloc[:5,[1,3,4]])

Inside iloc[], the part before the comma selects rows and the part after it selects columns. The :5 slice gives us the first five rows (positions 0 to 4), and the list [1,3,4] gives us the 2nd, 4th and 5th columns of the dataset, because positions start at 0.
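Column positions like [1,3,4] silently pick the wrong data if the column order ever changes. A hedged alternative is label-based selection with loc[]; the sketch below (again with made-up rows and assumed column names) checks that both forms return the same data. Note that loc slices include their end label, so :4 with the default 0-based index covers the same five rows as iloc's :5.

```python
import io

import pandas as pd

# Made-up placeholder rows (not real box office figures); column names are assumed
csv_text = """rank,title,studio,lifetime_gross,year
1,Movie A,Studio X,900000000,2015
2,Movie B,Studio Y,700000000,2009
3,Movie C,Studio X,450000000,2012
4,Movie D,Studio Z,300000000,1997
5,Movie E,Studio Y,120000000,2019
6,Movie F,Studio Z,80000000,2003
"""
df = pd.read_csv(io.StringIO(csv_text))

# Position-based: first five rows, columns at positions 1, 3 and 4
subset_by_position = df.iloc[:5, [1, 3, 4]]

# Label-based: the slice :4 INCLUDES label 4, and columns are chosen by name
subset_by_label = df.loc[:4, ["title", "lifetime_gross", "year"]]

print(subset_by_label)
```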

Finally, we can select data from our imported CSV dataset based on a particular condition. For example, if we want only movies where the lifetime_gross is more than 500M dollars we can do this:

print(df[df["lifetime_gross"] > 500000000])

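This is called boolean indexing: the expression df["lifetime_gross"] > 500000000 produces a Series of True/False values, one per row, and passing it to df[] keeps only the rows where the value is True, i.e. the rows that meet our condition.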
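Conditions can also be combined. The sketch below, with made-up placeholder rows and assumed column names, keeps movies above $500 million released in 2010 or later, first with & (each comparison wrapped in parentheses, because & binds more tightly than > in Python) and then with the equivalent DataFrame.query() string.

```python
import io

import pandas as pd

# Made-up placeholder rows (not real box office figures); column names are assumed
csv_text = """rank,title,studio,lifetime_gross,year
1,Movie A,Studio X,900000000,2015
2,Movie B,Studio Y,700000000,2009
3,Movie C,Studio X,450000000,2012
4,Movie D,Studio Z,300000000,1997
5,Movie E,Studio Y,120000000,2019
6,Movie F,Studio Z,80000000,2003
"""
df = pd.read_csv(io.StringIO(csv_text))

# The comparison alone is a boolean Series with one True/False per row
mask = df["lifetime_gross"] > 500_000_000
big_hits = df[mask]

# Combine conditions with & (and) or | (or); parentheses around each comparison are required
recent_big_hits = df[(df["lifetime_gross"] > 500_000_000) & (df["year"] >= 2010)]

# query() expresses the same filter as a string expression
recent_big_hits_q = df.query("lifetime_gross > 500000000 and year >= 2010")

print(big_hits)
print(recent_big_hits)
```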

So we have been able to import a CSV file using pandas, create a DataFrame out of that CSV data, do EDA on our dataset, and select rows and columns from our dataset for further processing. Yaaaaay 🙌🙌🙌.

Find out more about pandas HERE, and you can find the full code on GitHub HERE. We hope this tutorial was helpful; thank you for reading. 👌👌👌

Wanna learn more? We have other awesome Python Tutorials which you can find HERE.

Tags: machine learning, Python CSV Files, python pandas

Khaleel O.

I love to share, educate and help developers. I have 14+ years experience in IT. Currently transitioning from Systems Administration to DevOps. Avid reader, intellectual and dreamer. Enter Freely, Go safely, And leave something of the happiness you bring.


DevRescue © 2021 All Rights Reserved. Privacy Policy. Cookie Policy
