Lists are one of the most powerful data types in Python. In this Python List Tutorial, you’ll learn how to work with lists while analyzing data about mobile apps.
In this tutorial, we assume you know the very fundamentals of Python, including working with strings, integers, and floats.
We’ll be working with this table of data, taken from the Mobile App Store data set (Ramanathan Perumal):
| Clash of Clans | 0.0 | USD | 2130805 | 4.5 |
| Pandora – Music & Radio | 0.0 | USD | 1126879 | 4.0 |
Each value in the table is a data point. For instance, the first row (after the column titles) has five data points: 'Facebook', 0.0, 'USD', 2974676, and 3.5.
A collection of data points makes up a data set. We can understand our entire table above as a collection of data points, so we call the entire table a data set. We can see that our data set has five rows and five columns.
Using our understanding of Python types, we might think we could store each data point in its own variable — for instance, this is how we might store the first row’s data points:
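As a minimal sketch of the one-variable-per-data-point approach, the first row could be stored like this. The variable names are illustrative assumptions and do not come from the data set itself.

```python
# One variable per data point for the first row (Facebook).
# The variable names below are hypothetical, chosen for illustration.
track_name = 'Facebook'
price = 0.0
currency = 'USD'
rating_count_tot = 2974676
user_rating = 3.5

print(track_name, price, currency, rating_count_tot, user_rating)
```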
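For the first row, this means storing the text 'Facebook' as a string, the price 0.0 as a float, the currency 'USD' as a string, the rating count 2974676 as an integer, and the user rating 3.5 as a float.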
Creating a variable for each data point in our data set would be a cumbersome process. Fortunately, we can store data more efficiently using lists. This is how we can create a list of data points for the first row:
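A short sketch of the list for the first row, using the variable name row_1 that the rest of the tutorial refers to:

```python
# Store all five data points of the first row in a single list.
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]

print(row_1)
print(type(row_1))
```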
To create a list of data points, we only need to type out a sequence of data points, separating each with a comma:
'Facebook', 0.0, 'USD', 2974676, 3.5
and then surround the sequence with square brackets:
['Facebook', 0.0, 'USD', 2974676, 3.5]
After creating the list, we store it in the computer’s memory by assigning it to a variable named row_1.
Now let’s create five lists, one for each row in our dataset:
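One way to write the five lists is sketched below. The Facebook, Clash of Clans, and Pandora values appear in this tutorial; the Instagram and Temple Run rows are not visible in the table as shown here, so their values are assumptions taken from the public data set.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
# Assumption: this row is not shown in the table above.
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
# Assumption: this row is not shown in the table above.
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0]

print(row_1)
print(row_5)
```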
A list can contain a variety of data types. A list like [4, 5, 6] has identical data types (only integers), while the list ['Facebook', 0.0, 'USD', 2974676, 3.5] has mixed data types: two strings ('Facebook' and 'USD'), two floats (0.0 and 3.5), and one integer (2974676).
The ['Facebook', 0.0, 'USD', 2974676, 3.5] list has five data points. To find the length of a list, we can use the len() command.
For small lists, we can just count the data points on our screens to find the length, but the len() command will prove very useful whenever you work with lists containing many elements, or need to write code for data where you don’t know the length ahead of time.
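A brief sketch of len() on the two lists discussed above (the name list_1 is an illustrative assumption):

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
list_1 = [4, 5, 6]  # hypothetical name for the all-integer list

print(len(row_1))   # 5
print(len(list_1))  # 3
print(len([]))      # 0, an empty list has no elements
```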
Each element (data point) in a list has a specific number associated with it, called an index number. The indexing always starts at 0, so the first element will have the index number 0, the second element the index number 1, and so on.
To quickly find the index of a list element, identify its position number in the list, and then subtract 1. For example, the string 'USD' is the third element of the list (position number 3), so its index number must be 2 since 3 – 1 = 2.
The index numbers help us retrieve individual elements from a list. Looking back at the list row_1, we can retrieve the first element (the string 'Facebook') with the index number 0 by running the code row_1[0].
The syntax for retrieving individual list elements follows the model list_name[index_number]. For instance, the name of our list is row_1 and the index number of the first element is 0. Following the list_name[index_number] model, we get row_1[0], where the index number 0 is in square brackets after the variable name row_1.
We can retrieve every element in row_1 the same way: row_1[0] returns 'Facebook', row_1[1] returns 0.0, row_1[2] returns 'USD', row_1[3] returns 2974676, and row_1[4] returns 3.5.
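As a runnable sketch of the list_name[index_number] model applied to every element of row_1:

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]

print(row_1[0])  # Facebook
print(row_1[1])  # 0.0
print(row_1[2])  # USD
print(row_1[3])  # 2974676
print(row_1[4])  # 3.5
```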
Retrieving list elements makes it easier to perform operations. For instance, we can select the ratings for Facebook and Instagram, and find the average or the difference between the two:
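The sketch below shows one way to do this. The Instagram row is an assumption, since it is not visible in the table as shown here, and the variable names are illustrative.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
# Assumption: the Instagram row is not shown in the table above.
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]

# The user rating is the fifth element, so its index number is 4.
fb_rating = row_1[4]
insta_rating = row_2[4]

difference = insta_rating - fb_rating
average = (fb_rating + insta_rating) / 2

print(difference)  # 1.0
print(average)     # 4.0
```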
Let’s use list indexing to extract the number of ratings from the first three rows and then average them:
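A possible solution is sketched here. The number of ratings sits at index 3 in each row; the Instagram values are an assumption because that row is not visible in the table as shown.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
# Assumption: the Instagram row is not shown in the table above.
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

# The number of ratings is the fourth element (index number 3).
ratings_1 = row_1[3]
ratings_2 = row_2[3]
ratings_3 = row_3[3]

total = ratings_1 + ratings_2 + ratings_3
average = total / 3

print(total)
print(average)
```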
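In Python, we have two indexing systems for lists: positive indexing, where the first element has the index number 0, the second element has the index number 1, and so on; and negative indexing, where the last element has the index number -1, the second to last element has the index number -2, and so on.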
In practice, we almost always use positive indexing to retrieve list elements. Negative indexing is useful when we want to select the last element of a list — especially if the list is long, and we can’t tell the length by counting.
Notice that if we use an index number that is outside the range of the two indexing systems, we’ll get an IndexError.
Let’s use negative indexing to extract the user rating (the last value) from each of the first three rows and then average them.
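The following sketch illustrates negative indexing, the out-of-range error, and one way to solve the exercise. The Instagram row is an assumption (it is not visible in the table as shown), and the try/except block is only there so the example keeps running after the error.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
# Assumption: the Instagram row is not shown in the table above.
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

print(row_1[-1])  # 3.5, the last element
print(row_1[-5])  # Facebook, the first element

# Valid index numbers for a five-element list run from -5 to 4.
try:
    row_1[5]
except IndexError as error:
    print('IndexError:', error)

# Extract the user rating (the last value) from each row, then average.
rating_1 = row_1[-1]
rating_2 = row_2[-1]
rating_3 = row_3[-1]
average_rating = (rating_1 + rating_2 + rating_3) / 3
print(average_rating)
```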
Instead of selecting list elements individually, we can use a syntax shortcut to select two or more consecutive elements:
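For instance, selecting the first three elements of row_3 might look like this:

```python
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

# Select the elements at index numbers 0, 1, and 2.
first_three = row_3[0:3]
print(first_three)  # ['Clash of Clans', 0.0, 'USD']
```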
When we select the first n elements (n stands for a number) from a list named a_list, we can use the syntax shortcut a_list[0:n]. For example, to select the first three elements from the list row_3, we use row_3[0:3].
When we selected the first three elements, we sliced a part of the list. For this reason, the process of selecting a part of a list is called list slicing.
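There are many ways that we might want to slice a list: we might need the first few elements, the last few elements, or a run of elements from the middle.
To retrieve any list slice we want, we first identify the first and last elements of the slice and their index numbers. We can then retrieve the slice using the syntax a_list[m:n], where:
m represents the index number of the first element of the slice; and
n represents the index number of the last element of the slice plus one (if the last element has the index number 2, then n will be 3; if the last element has the index number 4, then n will be 5; and so on).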
When we need to select the first or last x elements (x stands for a number), we can use even simpler syntax shortcuts:
a_list[:x] when we want to select the first x elements; and
a_list[-x:] when we want to select the last x elements.
Let’s look at how we extract the first four elements from the first row (with data about Facebook):
The last three elements from that same row:
And elements three and four from the fifth row (with data about Pandora):
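The three slices described above can be sketched as follows (the result variable names are illustrative):

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_5 = ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0]

# First four elements of the first row.
first_4_fb = row_1[:4]
print(first_4_fb)      # ['Facebook', 0.0, 'USD', 2974676]

# Last three elements of the first row.
last_3_fb = row_1[-3:]
print(last_3_fb)       # ['USD', 2974676, 3.5]

# Elements three and four of the fifth row (index numbers 2 and 3).
pandora_3_4 = row_5[2:4]
print(pandora_3_4)     # ['USD', 1126879]
```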
Previously, we introduced lists as a better alternative to using one variable per data point. Instead of having a separate variable for each of the five data points 'Facebook', 0.0, 'USD', 2974676, 3.5, we can bundle the data points together into a list, and then store the list in a single variable.
So far, we’ve been working with a data set having five rows, and we’ve been storing each row as a list in a separate variable (the variables row_1, row_2, row_3, row_4, and row_5). If we had a data set with 5,000 rows, however, we’d end up with 5,000 variables, which would make our code messy and almost impossible to work with.
To solve this problem, we can store our five variables in a single list:
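A minimal sketch of that single list is shown below. The Instagram and Temple Run rows are assumptions, since they are not visible in the table as shown here.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]    # assumed values
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]   # assumed values
row_5 = ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0]

# Store the five row lists inside one outer list.
data_set = [row_1, row_2, row_3, row_4, row_5]

print(data_set)
print(len(data_set))  # 5
```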
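The data_set variable is a list that stores five other lists (row_1, row_2, row_3, row_4, and row_5). A list that contains other lists is called a list of lists.
The data_set variable is still a list, which means we can retrieve individual list elements and perform list slicing using the syntax we learned. For instance, data_set[0] retrieves the first list element (row_1), data_set[-1] retrieves the last list element (row_5), and data_set[:2] retrieves the first two list elements (row_1 and row_2) by performing list slicing.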
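Indexing and slicing a list of lists could look like the sketch below. The Instagram and Temple Run rows are assumed values, as they are not visible in the table as shown.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]    # assumed values
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]   # assumed values
row_5 = ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0]
data_set = [row_1, row_2, row_3, row_4, row_5]

print(data_set[0])   # the first row (row_1)
print(data_set[-1])  # the last row (row_5)
print(data_set[:2])  # a list holding the first two rows
```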
We’ll often need to retrieve individual elements from a list that’s part of a list of lists. For instance, we may want to retrieve the value 3.5 from ['Facebook', 0.0, 'USD', 2974676, 3.5], which is part of the data_set list of lists. We can extract 3.5 from data_set using what we’ve learned:
We retrieve row_1 using data_set[0], and assign the result to a variable named fb_row.
We print fb_row, which outputs ['Facebook', 0.0, 'USD', 2974676, 3.5].
We retrieve the last element from fb_row using fb_row[-1] (since fb_row is a list), and assign the result to a variable named fb_rating.
We print fb_rating, which outputs 3.5.
In this approach, we retrieve 3.5 in two steps: we first retrieve data_set[0], and then we retrieve fb_row[-1]. However, there’s an easier way to retrieve the same value of 3.5 by chaining the two indices ([0] and [-1]): the code data_set[0][-1] retrieves 3.5.
We’ve now seen two ways of retrieving the value 3.5. Both ways lead to the same output (3.5), but the second way involves less typing because it combines the two steps of the first into one expression. While you can choose either option, people generally choose the second one.
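Both approaches can be compared side by side in a short sketch. Only the Facebook row is needed here, so data_set is shortened to two rows shown in this tutorial for brevity.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
data_set = [row_1, row_3]  # shortened list of lists, for illustration only

# Two steps: retrieve the row first, then the rating.
fb_row = data_set[0]
fb_rating = fb_row[-1]
print(fb_rating)            # 3.5

# One step: chain the two indices.
fb_rating_chained = data_set[0][-1]
print(fb_rating_chained)    # 3.5
```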
Let’s transform our five individual lists into a list of lists:
Previously in this tutorial, we were interested in computing the average rating of an app. This was a doable task when we were working with only three rows, but the more rows we add, the harder it becomes. Using our strategy from earlier, we’ll retrieve each individual rating, sum up the ratings, and divide the sum by the number of ratings.
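Written out row by row, that strategy might look like the sketch below. The Instagram and Temple Run rows are assumed values (they are not visible in the table as shown), and the rating variable names are illustrative.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]    # assumed values
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]   # assumed values
row_5 = ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0]
app_data_set = [row_1, row_2, row_3, row_4, row_5]

# Retrieve each rating individually: one line of code per row.
rating_1 = app_data_set[0][-1]
rating_2 = app_data_set[1][-1]
rating_3 = app_data_set[2][-1]
rating_4 = app_data_set[3][-1]
rating_5 = app_data_set[4][-1]

total_rating = rating_1 + rating_2 + rating_3 + rating_4 + rating_5
average_rating = total_rating / 5
print(average_rating)
```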
With five ratings, this already becomes complex. If we were working with data containing thousands of rows, it would require an impractical amount of code! We need to find a simple way to retrieve many ratings.
In the row-by-row approach, a process keeps repeating: we select the last list element for each list within app_data_set. The app_data_set list stores five lists, so we repeat the same process five times. What if we could tell Python directly that we want to repeat this process for each list in app_data_set?
Fortunately, we can do that — Python offers us an easy way to repeat a process, which helps us enormously when we need to repeat a process hundreds, thousands, or even millions of times.
Let’s say we have a list [3, 5, 1, 2] assigned to a variable ratings, and we want to repeat the following process: for each element in ratings, print that element. This is how we could translate that into Python syntax:
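A minimal sketch of that translation:

```python
ratings = [3, 5, 1, 2]

# For each element in ratings, print that element.
for element in ratings:
    print(element)
```

Running this prints 3, 5, 1, and 2, each on its own line.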
In our first example above, the process we wanted to repeat was “extract the last element for each list in app_data_set”. This is how we can translate that process into Python syntax:
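As a sketch, the loop could be written like this. The Instagram and Temple Run rows are assumed values, since they are not visible in the table as shown.

```python
app_data_set = [
    ['Facebook', 0.0, 'USD', 2974676, 3.5],
    ['Instagram', 0.0, 'USD', 2161558, 4.5],    # assumed values
    ['Clash of Clans', 0.0, 'USD', 2130805, 4.5],
    ['Temple Run', 0.0, 'USD', 1724546, 4.5],   # assumed values
    ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0],
]

# Extract (and print) the last element of each list in app_data_set.
for each_list in app_data_set:
    print(each_list[-1])
```

This prints 3.5, 4.5, 4.5, 4.5, and 4.0, one rating per line.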
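Let’s try to get a better understanding of what happens in this loop. Python isolates, one at a time, each list element from app_data_set, and assigns it to each_list (which basically becomes a variable that stores a list). On the first iteration each_list stores the first row, on the second iteration it stores the second row, and so on until the last row.
The loop is a much more simplified and abstracted version of code that prints the last element of each row one line at a time: print(app_data_set[0][-1]), then print(app_data_set[1][-1]), and so on up to print(app_data_set[4][-1]).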
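The longer, row-by-row version can be sketched as follows. The Instagram and Temple Run rows are assumed values, since they are not visible in the table as shown.

```python
app_data_set = [
    ['Facebook', 0.0, 'USD', 2974676, 3.5],
    ['Instagram', 0.0, 'USD', 2161558, 4.5],    # assumed values
    ['Clash of Clans', 0.0, 'USD', 2130805, 4.5],
    ['Temple Run', 0.0, 'USD', 1724546, 4.5],   # assumed values
    ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0],
]

# One print statement per row: this is what the loop does for us.
print(app_data_set[0][-1])
print(app_data_set[1][-1])
print(app_data_set[2][-1])
print(app_data_set[3][-1])
print(app_data_set[4][-1])
```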
Writing the code out row by row requires a line of code for every row in the data set. But using the for each_list in app_data_set technique requires us to write only two lines of code regardless of the number of rows in the data set; the data set can have five rows or one million.
Our intermediate goal is to use this new technique to compute the average rating for our five rows above, and our final goal is to compute the average rating for the full data set with 7,197 rows. For now, we’ll focus on practicing this technique to get a good grasp of it.
When writing a loop, we need to indent the code we want repeated four space characters to the right of the for line, so that Python knows which code belongs to the loop.
Technically, we only need to indent the code at least one space character to the right, but the convention in the Python community is to use four space characters. This helps with readability — it will be easier for other people who follow this convention to read your code, and it will be easier for you to read theirs.
Let’s use this technique to print the name and rating of each app:
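A possible solution is sketched below. The Instagram and Temple Run rows are assumed values (they are not visible in the table as shown), and the names name and rating are illustrative.

```python
app_data_set = [
    ['Facebook', 0.0, 'USD', 2974676, 3.5],
    ['Instagram', 0.0, 'USD', 2161558, 4.5],    # assumed values
    ['Clash of Clans', 0.0, 'USD', 2130805, 4.5],
    ['Temple Run', 0.0, 'USD', 1724546, 4.5],   # assumed values
    ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0],
]

for each_list in app_data_set:
    # Both indented lines below belong to the loop body.
    name = each_list[0]
    rating = each_list[-1]
    print(name, rating)
```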
The technique we’ve just learned is called a loop. Loops are an incredibly useful tool for performing repetitive processes with Python lists. Because we always start with for (like in for some_variable in some_list:), this technique is known as a for loop.
A for loop has three structural parts: the iteration variable (some_variable), the iterable variable (some_list), and the body (the indented code below the for line).
The indented code in the body gets executed the same number of times as elements in the iterable variable. If the iterable variable is a list that has three elements, the indented code in the body gets executed three times. We call each code execution an iteration, so there’ll be three iterations for a list that has three elements. For each iteration, the iteration variable will take a different value, following this pattern:
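The pattern can be illustrated with a three-element list; the names a_list and value are the ones used in the discussion that follows.

```python
a_list = [1, 3, 5]

for value in a_list:
    # Iteration 1: value is 1
    # Iteration 2: value is 3
    # Iteration 3: value is 5
    print(value)
```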
The name of the iteration variable can be whatever you like. If you replaced value in a loop such as for value in a_list: with dog, the code would work exactly the same way. That said, it’s convention to use something that helps communicate what the data is.
The code outside the loop body can interact with the code inside the loop body. For instance, to sum the numbers in a list a_list containing 1, 3, and 5, we:
Initialize a variable a_sum with a value of zero outside the loop body.
Loop (or iterate) over a_list. For every iteration of the loop, we perform an addition between the current value of the iteration variable value and the current value stored in a_sum (a_sum was defined outside the loop body); assign the result of the addition back to a_sum (inside the loop body); and print the value of the a_sum variable (inside the loop body).
Notice that the value of a_sum changes after each addition. At the end of the loop, a_sum has the value 9, which is equivalent to the sum of the numbers in a_list (1 + 3 + 5).
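The steps described above can be sketched as:

```python
a_list = [1, 3, 5]

# Defined outside the loop body.
a_sum = 0

for value in a_list:
    # Add the current value to the running total and store it back.
    a_sum = a_sum + value
    print(a_sum)  # prints 1, then 4, then 9

print(a_sum)  # 9
```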
Above, we created a way to sum up the numbers in a list. We can use this technique to sum up the ratings in our dataset. Once we have the sum, we only need to divide by the number of ratings to get the average value.
Now we’ll learn an alternative way to compute the average rating value. Once we create a list, we can add (or append) values to it using the append() command.
Unlike other commands we’ve learned, notice that append() has a special syntactical usage, following the pattern list_name.append() rather than being simply used as append().
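A short sketch of the list_name.append() pattern, using an illustrative list named a_list:

```python
a_list = [1, 2]

# append() adds one value to the end of the list it is called on.
a_list.append(3)
print(a_list)  # [1, 2, 3]

a_list.append('a string')
print(a_list)  # [1, 2, 3, 'a string']
```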
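Now that we know how to append values to a list, we can take the steps below to compute the average app rating:
Initialize an empty list.
Loop over our data set and extract the ratings.
Append the ratings to the empty list.
Use the sum() command to sum up all the ratings (to be able to use sum(), we’ll need to store the ratings as floats or integers); and then
Divide the sum by the number of ratings (which we can get using the len() command).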
Below, we can see the steps above implemented for our data set with five rows:
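One way to implement these steps is sketched here. The Instagram and Temple Run rows are assumed values (they are not visible in the table as shown), and the names all_ratings and avg_rating are illustrative.

```python
app_data_set = [
    ['Facebook', 0.0, 'USD', 2974676, 3.5],
    ['Instagram', 0.0, 'USD', 2161558, 4.5],    # assumed values
    ['Clash of Clans', 0.0, 'USD', 2130805, 4.5],
    ['Temple Run', 0.0, 'USD', 1724546, 4.5],   # assumed values
    ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0],
]

# Start with an empty list, then append one rating per row.
all_ratings = []
for row in app_data_set:
    rating = row[-1]
    all_ratings.append(rating)

# Sum the ratings and divide by how many there are.
avg_rating = sum(all_ratings) / len(all_ratings)
print(all_ratings)  # [3.5, 4.5, 4.5, 4.5, 4.0]
print(avg_rating)
```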
We can also use append() to add another row to our list of lists by appending the data as a list. Let’s look at how that works:
Now, let’s use the technique we learned above to calculate the average rating of all six apps:
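The sketch below appends a sixth row and then averages all six ratings. The sixth app is entirely hypothetical, made up for illustration; the Instagram and Temple Run rows are also assumed values, since they are not visible in the table as shown.

```python
app_data_set = [
    ['Facebook', 0.0, 'USD', 2974676, 3.5],
    ['Instagram', 0.0, 'USD', 2161558, 4.5],    # assumed values
    ['Clash of Clans', 0.0, 'USD', 2130805, 4.5],
    ['Temple Run', 0.0, 'USD', 1724546, 4.5],   # assumed values
    ['Pandora – Music & Radio', 0.0, 'USD', 1126879, 4.0],
]

# Hypothetical sixth row, invented for this example only.
new_row = ['Example App', 0.0, 'USD', 1000, 4.0]
app_data_set.append(new_row)
print(len(app_data_set))  # 6

# Same technique as before: collect the ratings, then average them.
all_ratings = []
for row in app_data_set:
    all_ratings.append(row[-1])

avg_rating = sum(all_ratings) / len(all_ratings)
print(avg_rating)
```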
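In this tutorial, we learned how to store data points in lists, retrieve list elements with positive and negative indexing, slice lists, work with lists of lists, repeat a process with for loops, and add values to a list with append().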
Thanks for reading!