In this tutorial we will explore Python list comprehensions with dataframes. First, let’s install the pandas library. Use the following command at the console/command line/terminal:
pip install pandas
A dataframe is a two-dimensional data structure with rows and columns.
A list comprehension is a shorthand syntax for creating a new list from an existing list or other iterable. It is one of several ways to do this in Python.
List comprehensions are versatile and powerful and we will show you how to use them over the next few tutorials.
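To make the "shorthand" idea concrete, here is a minimal sketch that builds the same list twice, once with an explicit for loop and once with a list comprehension. The names numbers, squares_loop and squares_comp are made up for illustration and do not appear elsewhere in this tutorial.

```python
# Hypothetical sample data, used only for illustration
numbers = [1, 2, 3, 4]

# Building a new list with an explicit for loop
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

# The same result with a list comprehension
squares_comp = [n * n for n in numbers]

print(squares_loop)  # [1, 4, 9, 16]
print(squares_comp)  # [1, 4, 9, 16]
```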
Let’s create our first simple dataframe:
import pandas as pd
data = {'TOP10': ['United States','China','Japan','Germany','India','United Kingdom','France','Italy','Brazil','Canada']}
df = pd.DataFrame(data)
print(df)
#output
# TOP10
#0 United States
#1 China
#2 Japan
#3 Germany
#4 India
#5 United Kingdom
#6 France
#7 Italy
#8 Brazil
#9 Canada
The code above defines a new, simple, one-column dataframe: a list of the top 10 world economies by GDP.
How can we get all of these country names in ALL CAPS? We can use the following list comprehension:
upperCaseNames = [x.upper() for x in df["TOP10"]]
print(upperCaseNames)
#output
#['UNITED STATES', 'CHINA', 'JAPAN', 'GERMANY', 'INDIA', 'UNITED KINGDOM', 'FRANCE', 'ITALY', 'BRAZIL', 'CANADA']
Let’s explain what’s going on here:
- A list comprehension must be enclosed with the square brackets []. In this case, our list comprehension is returning a list.
- df["TOP10"] is our iterable. It is the column named "TOP10" in our dataframe. An iterable is a Python object capable of returning its members one at a time, which allows it to be iterated over in a for loop. When we iterate over an iterable, we are “touching” each individual member in order to perform an operation on it.
- x is our iterator variable. It is used to represent each individual member of our iterable. Because we are using a dataframe, each iterator variable represents a row value in the TOP10 Column. For each iteration of the for loop, the value of x will be a different country name, starting at the top and advancing one member at a time until the for loop reaches the end of the dataframe column values.
- x.upper() is our output expression. This is where we define and perform the operation on the iterator variable. In this case we are converting each country name to UPPERCASE.
- upperCaseNames is the list returned by the list comprehension. This new list will contain members of the original list that have been modified by the output expression.
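The pieces described in the list above map directly onto an ordinary for loop. The sketch below rebuilds the same result step by step. It assumes a shortened three-country version of the column, and the name upper_case_loop is introduced here purely for illustration.

```python
import pandas as pd

# Shortened version of the TOP10 column, for illustration only
df = pd.DataFrame({'TOP10': ['United States', 'China', 'Japan']})

upper_case_loop = []                    # the list we are building
for x in df["TOP10"]:                   # df["TOP10"] is the iterable, x is the iterator variable
    upper_case_loop.append(x.upper())   # x.upper() is the output expression

# The list comprehension does the same work in one line
upperCaseNames = [x.upper() for x in df["TOP10"]]

print(upper_case_loop == upperCaseNames)  # True
```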
See how easy that was? With list comprehension all it takes is one line. Let’s do another example:
forwardAndReverse = [(x.upper(),x.upper()[::-1]) for x in df["TOP10"]]
print(forwardAndReverse)
#Output
#[('UNITED STATES', 'SETATS DETINU'), ('CHINA', 'ANIHC'), ('JAPAN', 'NAPAJ'), ('GERMANY', 'YNAMREG'), ('INDIA', 'AIDNI'), ('UNITED KINGDOM', 'MODGNIK DETINU'), ('FRANCE', 'ECNARF'), ('ITALY', 'YLATI'), ('BRAZIL', 'LIZARB'), ('CANADA', 'ADANAC')]
In this example we have two operations in our output expression, which is permitted (we can have more than two). They are wrapped in a tuple, so each member of the new list is a pair. The first operation converts each value to uppercase and the second one gives us the reverse of the uppercase value of x. Recall that x is our iterator variable.
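Because each member of the result is a tuple, the pairs can be split apart again if we need the two versions separately. Here is a small sketch of one way to do that. It uses a plain Python list in place of the dataframe column, and the names names, pairs, forward and backward are hypothetical.

```python
# A plain list standing in for the dataframe column (illustration only)
names = ['Japan', 'India', 'Italy']

# Same output expression as above: (uppercase, reversed uppercase)
pairs = [(x.upper(), x.upper()[::-1]) for x in names]

# zip(*pairs) regroups the tuples: all first items, then all second items
forward, backward = zip(*pairs)

print(forward)   # ('JAPAN', 'INDIA', 'ITALY')
print(backward)  # ('NAPAJ', 'AIDNI', 'YLATI')
```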
So now that we have the basics let’s do something useful with our newfound skills. Let’s re-define our dataframe df.
# GDP values are whole US dollars, so that they match the printed output below
data = {'Country Name': ['United States','China','Japan','Germany','India','United Kingdom','France','Italy','Brazil','Canada'],
'GDP 2021 Est. Trillions':[22675271000000,16642318000000,5378136000000,4319286000000,3124650000000,3049704000000,2938271000000,2106287000000,1883487000000,1806707000000]}
df = pd.DataFrame(data, index = [1,2,3,4,5,6,7,8,9,10])
print(df)
#Output
# Country Name GDP 2021 Est. Trillions
#1 United States 22675271000000
#2 China 16642318000000
#3 Japan 5378136000000
#4 Germany 4319286000000
#5 India 3124650000000
#6 United Kingdom 3049704000000
#7 France 2938271000000
#8 Italy 2106287000000
#9 Brazil 1883487000000
#10 Canada 1806707000000
OK, so we redefined our dataframe to better reflect the data we are trying to represent, gave it an index that runs from 1 to 10, and added the GDP column.
Next, let’s make our Country Name column upper case and a bit more professional looking with list comprehension:
df["Country Name"] = [x.upper() for x in df["Country Name"]]
print(df)
#Output
# Country Name GDP 2021 Est. Trillions
#1 UNITED STATES 22675271000000
#2 CHINA 16642318000000
#3 JAPAN 5378136000000
#4 GERMANY 4319286000000
#5 INDIA 3124650000000
#6 UNITED KINGDOM 3049704000000
#7 FRANCE 2938271000000
#8 ITALY 2106287000000
#9 BRAZIL 1883487000000
#10 CANADA 1806707000000
Much better right? We are able to use our new skills to upgrade the look of our pandas dataframe.
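One detail worth noting: when we assign a list back to a column, pandas matches the values to the rows by position, so the list must have one value per row, and our custom index stays as it was. The sketch below checks this on a shortened three-row version of the dataframe, which is an assumption made to keep the example small.

```python
import pandas as pd

# Shortened version of the dataframe, for illustration only
df = pd.DataFrame({'Country Name': ['United States', 'China', 'Japan']},
                  index=[1, 2, 3])

# Overwrite the column with the list produced by the comprehension
df["Country Name"] = [x.upper() for x in df["Country Name"]]

print(df.index.tolist())          # [1, 2, 3] - the custom index is unchanged
print(df.loc[2, "Country Name"])  # CHINA - row label 2 still holds China
```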
Let’s make a second change. We want to shorten the Trillion number just to make our dataset look cleaner and more readable. We can do this with the numerize package. Let’s install it by using the following command at the terminal:
pip install numerize
Once we do that, we can import it and shorten our Trillions column for a neater appearance:
from numerize import numerize
df["GDP 2021 Est. Trillions"] = [numerize.numerize(x) for x in df["GDP 2021 Est. Trillions"]]
print(df)
#Output
# Country Name GDP 2021 Est. Trillions
#1 UNITED STATES 22.68T
#2 CHINA 16.64T
#3 JAPAN 5.38T
#4 GERMANY 4.32T
#5 INDIA 3.12T
#6 UNITED KINGDOM 3.05T
#7 FRANCE 2.94T
#8 ITALY 2.11T
#9 BRAZIL 1.88T
#10 CANADA 1.81T
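If you would rather not install an extra package, the output expression can be any function of your own. The sketch below uses a hypothetical helper named abbreviate that only roughly approximates what numerize does (it always keeps two decimals). It is not part of numerize or pandas.

```python
def abbreviate(value):
    """Hypothetical helper: format a number with a K/M/B/T suffix and two decimals."""
    for divisor, suffix in [(1e12, 'T'), (1e9, 'B'), (1e6, 'M'), (1e3, 'K')]:
        if abs(value) >= divisor:
            return f"{value / divisor:.2f}{suffix}"
    return str(value)

# Two of the GDP values from the dataframe above
gdp = [22675271000000, 1806707000000]

# Our own function is used as the output expression
short = [abbreviate(x) for x in gdp]

print(short)  # ['22.68T', '1.81T']
```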
See how convenient list comprehensions are with pandas dataframes? We could write ordinary for loops instead, but a list comprehension achieves the same result in one line, which makes for cleaner code.
Click here to go to Part 2 of this tutorial where we continue list comprehension on dataframes with Python. Find the source code for Part 1 HERE. 👌👌👌