Master the essential Python skills for data science with this cheat sheet. It covers the basics you need to get started, from variables and data types to strings, lists, and NumPy arrays, with a short code example for each operation.
Python has been on the rise in the data science industry, and it certainly seems that it is here to stay.
Its popularity in industry, the maturity of its data analysis packages, its gentle learning curve, and the fact that it is a fully fledged programming language are only a few of the reasons that make Python an exceptional tool for data science.
Although Python is a very readable language, you might still want some help.
That is reason enough for DataCamp to make a Python cheat sheet for data science, especially for beginners. It can serve as a quick reference if you are just beginning your data science journey, or as a guide that makes it easier to learn and use Python.
(Above is the printable version of this cheat sheet)
This Python cheat sheet will guide you through variables and data types, strings, and lists, and eventually lands at NumPy, the fundamental package for scientific computing with Python.
import numpy
import numpy as np
from math import pi
>>> help(str)
>>> x=5
>>> x
5
Sum of two variables
>>> x+2
7
Subtraction of two variables
>>> x-2
3
Multiplication of two variables
>>> x*2
10
Exponentiation of a variable
>>> x**2
25
Remainder of a variable
>>> x%2
1
Division of a variable
>>> x/float(2)
2.5
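As a small sketch of how division behaves, assuming Python 3 (the cheat sheet itself does not state a version): `/` always returns a float, so the `float()` call above is optional there, while `//` and `divmod()` give the integer quotient.

```python
x = 5

true_division = x / 2               # always a float in Python 3
floor_division = x // 2             # integer quotient, rounded down
quotient, remainder = divmod(x, 2)  # quotient and remainder in one call

print(true_division, floor_division, quotient, remainder)
```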
Variables to strings
str()
'5', '3.45', 'True'
Variables to integers
int()
5, 3, 1
Variables to floats
float()
5.0, 1.0
Variables to booleans
bool()
True, True, True
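The conversion functions above can be tried out as follows. This is only a sketch: the input values are assumptions chosen to reproduce the outputs listed, since the cheat sheet shows the results but not the arguments.

```python
# Inputs are illustrative assumptions; only the outputs appear in the cheat sheet.
as_strings = [str(5), str(3.45), str(True)]    # ['5', '3.45', 'True']
as_ints = [int('5'), int(3.45), int(True)]     # [5, 3, 1]; int() truncates toward zero
as_floats = [float(5), float(True)]            # [5.0, 1.0]
as_bools = [bool(5), bool(3.45), bool('text')] # non-zero numbers and non-empty strings are True

print(as_strings, as_ints, as_floats, as_bools)
```

Note that `bool()` returns `False` for `0`, `0.0`, and the empty string `''`.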
>>> my_string = 'thisStringIsAwesome'
>>> my_string
'thisStringIsAwesome'
>>> my_string * 2
'thisStringIsAwesomethisStringIsAwesome'
>>> my_string + 'Innit'
'thisStringIsAwesomeInnit'
>>> 'm' in my_string
True
>>> my_string[3]
>>> my_string[4:9]
String to uppercase
>>> my_string.upper()
String to lowercase
>>> my_string.lower()
Count String elements
>>> my_string.count('w')
Replace String elements
>>> my_string.replace('e', 'i')
Strip whitespace from ends
>>> my_string.strip()
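The string selections and methods above print nothing in the cheat sheet, so here is a minimal sketch of what they return. The `padded` variable is a hypothetical addition, used only because `my_string` has no surrounding whitespace to strip.

```python
my_string = 'thisStringIsAwesome'

char_at_3 = my_string[3]               # 's' (indexing starts at 0)
chars_4_to_8 = my_string[4:9]          # 'Strin' (the end index 9 is excluded)
upper = my_string.upper()              # 'THISSTRINGISAWESOME'
lower = my_string.lower()              # 'thisstringisawesome'
w_count = my_string.count('w')         # 1
replaced = my_string.replace('e', 'i') # 'thisStringIsAwisomi'

padded = '  some text  '               # hypothetical example string
stripped = padded.strip()              # 'some text'

# Strings are immutable: every method returns a new string.
print(my_string)
```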
>>> a = 'is'
>>> b = 'nice'
>>> my_list = ['my', 'list', a, b]
>>> my_list2 = [[4,5,6,7], [3,4,5,6]]
Subset
Select item at index 1
>>> my_list[1]
Select 3rd last item
>>> my_list[-3]
Select items at index 1 and 2
>>> my_list[1:3]
Select items after index 0
>>> my_list[1:]
Select items before index 3
>>> my_list[:3]
Copy my_list
>>> my_list[:]
my_list[list][itemOfList]
>>> my_list2[1][0]
>>> my_list2[1][:2]
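For reference, a short sketch of what each of the selections above returns, using the same `my_list` and `my_list2`. The identity check at the end is an addition to show that `my_list[:]` creates a new list rather than a second name for the same one.

```python
a = 'is'
b = 'nice'
my_list = ['my', 'list', a, b]
my_list2 = [[4, 5, 6, 7], [3, 4, 5, 6]]

second = my_list[1]            # 'list'
third_last = my_list[-3]       # 'list' (negative indices count from the end)
middle = my_list[1:3]          # ['list', 'is']
after_first = my_list[1:]      # ['list', 'is', 'nice']
first_three = my_list[:3]      # ['my', 'list', 'is']
copy_of_list = my_list[:]      # shallow copy of the whole list

nested_item = my_list2[1][0]   # 3: first item of the second inner list
nested_slice = my_list2[1][:2] # [3, 4]

# The copy is equal to the original but is a separate object.
print(copy_of_list == my_list, copy_of_list is my_list)
```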
>>> my_list + my_list
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list * 2
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list2 > 4
TypeError: '>' not supported between instances of 'list' and 'int' (Python 3; Python 2 returned True)
Get the index of an item
>>> my_list.index(a)
Count an item
>>> my_list.count(a)
Append an item at a time
>>> my_list.append('!')
Remove an item
>>> my_list.remove('!')
Remove a slice of items
>>> del my_list[0:1]
Reverse the list
>>> my_list.reverse()
Extend the list with the items of an iterable
>>> my_list.extend('!')
Remove and return the last item
>>> my_list.pop(-1)
Insert an item
>>> my_list.insert(0,'!')
Sort the list
>>> my_list.sort()
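Most of the list methods above change the list in place and return `None`, so it helps to follow the list's state step by step. The sketch below runs the same calls in the same order, starting from the original `my_list`.

```python
a = 'is'
b = 'nice'
my_list = ['my', 'list', a, b]

position = my_list.index(a)   # 2
occurrences = my_list.count(a)  # 1

my_list.append('!')           # ['my', 'list', 'is', 'nice', '!']
my_list.remove('!')           # ['my', 'list', 'is', 'nice']
del my_list[0:1]              # ['list', 'is', 'nice']
my_list.reverse()             # ['nice', 'is', 'list']
my_list.extend('!')           # ['nice', 'is', 'list', '!']
popped = my_list.pop(-1)      # returns '!', list is ['nice', 'is', 'list']
my_list.insert(0, '!')        # ['!', 'nice', 'is', 'list']
my_list.sort()                # ['!', 'is', 'list', 'nice']

print(my_list)
```

Because `extend()` iterates over its argument, passing a longer string such as `'ab'` would add `'a'` and `'b'` as separate items; use `append()` to add a single item.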
>>> my_list = [1, 2, 3, 4]
>>> my_array = np.array(my_list)
>>> my_2darray = np.array([[1,2,3],[4,5,6]])
Select item at index 1
>>> my_array[1]
2
Select items at index 0 and 1
>>> my_array[0:2]
array([1, 2])
my_2darray[rows, columns]
>>> my_2darray[:,0]
array([1, 4])
>>> my_array > 3
array([False, False, False,  True])
>>> my_array * 2
array([2, 4, 6, 8])
>>> my_array + np.array([5, 6, 7, 8])
array([ 6,  8, 10, 12])
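The key difference from lists is that NumPy operations work element by element. A minimal sketch, reusing the arrays defined above; the boolean-mask selection at the end is an addition that is not part of the cheat sheet.

```python
import numpy as np

my_list = [1, 2, 3, 4]
my_array = np.array(my_list)
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])

repeated = my_list * 2                      # list: [1, 2, 3, 4, 1, 2, 3, 4]
doubled = my_array * 2                      # array: [2, 4, 6, 8]
summed = my_array + np.array([5, 6, 7, 8])  # [6, 8, 10, 12]

first_column = my_2darray[:, 0]             # all rows, column 0: [1, 4]
mask = my_array > 3                         # [False, False, False, True]
selected = my_array[mask]                   # keeps only items where mask is True: [4]

print(doubled, summed, first_column, selected)
```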
Get the dimensions of the array
>>> my_array.shape
Append items to an array
>>> np.append(my_array, other_array)
Insert items in an array
>>> np.insert(my_array, 1, 5)
Delete items in an array
>>> np.delete(my_array,[1])
Mean of the array
>>> np.mean(my_array)
Median of the array
>>> np.median(my_array)
Correlation coefficient
>>> np.corrcoef(my_array)
Standard deviation
>>> np.std(my_array)
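Unlike list methods, `np.append`, `np.insert`, and `np.delete` return a new array and leave the original untouched. The sketch below illustrates this along with the statistics functions. The contents of `other_array` are an assumption, since the cheat sheet never defines it, and the two-array call to `np.corrcoef` is one possible usage chosen for illustration.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
other_array = np.array([5, 6])  # hypothetical values, not given in the cheat sheet

shape_1d = my_array.shape       # (4,)
shape_2d = my_2darray.shape     # (2, 3): 2 rows, 3 columns

appended = np.append(my_array, other_array)  # [1, 2, 3, 4, 5, 6]
inserted = np.insert(my_array, 1, 5)         # [1, 5, 2, 3, 4]: 5 placed before index 1
deleted = np.delete(my_array, [1])           # [1, 3, 4]: item at index 1 removed

mean = np.mean(my_array)        # 2.5
median = np.median(my_array)    # 2.5
std = np.std(my_array)          # population standard deviation, about 1.118

# With two arrays, np.corrcoef returns a 2x2 correlation matrix;
# the off-diagonal entry is the coefficient between them.
r = np.corrcoef(my_array, my_array * 2)[0, 1]

print(my_array)  # still [1 2 3 4]
```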