# Linear Regression in Python | Tutorial for Beginners

A Beginner’s Guide to Linear Regression in Python

This tutorial explains what linear regression is and how it can be implemented for both two variables and multiple variables using Scikit-Learn, one of the most popular machine learning libraries for Python. The term “linearity” in algebra refers to a linear relationship between two or more variables. If we draw this relationship in a two-dimensional space (between two variables), we get a straight line.
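As a minimal sketch of the two-variable case, the snippet below fits a straight line with Scikit-Learn's `LinearRegression`. The data is synthetic and assumed for illustration only: it lies exactly on the line y = 2x + 1, so the fitted slope and intercept should recover those two numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data on the line y = 2x + 1 (an assumption made for illustration)
x = np.arange(10, dtype=float).reshape(-1, 1)  # feature matrix with a single column
y = 2.0 * x.ravel() + 1.0                      # response vector

# Fit the straight line to the data
model = LinearRegression()
model.fit(x, y)

slope = model.coef_[0]
intercept = model.intercept_
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))

# Predict the response for a value of x that was not in the data
prediction = model.predict(np.array([[12.0]]))[0]
print("prediction for x = 12:", round(prediction, 3))
```

For multiple variables the call is the same; the feature matrix simply has more than one column, and `coef_` holds one slope per column.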

## Learn Python Tutorial from Basic to Advance

Basic programming concepts in any language will help, but are not required to follow this tutorial.

Description
Become a Python programmer and learn one of employers' most requested skills of the 21st century!

This is the most comprehensive, yet straightforward, course for the Python programming language on Simpliv! Whether you have never programmed before, already know basic syntax, or want to learn about the advanced features of Python, this course is for you! In this course we will teach you Python 3. (Note: we also provide older Python 2 notes in case you need them.)

With over 40 lectures and more than 3 hours of video, this comprehensive course leaves no stone unturned! This course includes tests and homework assignments, as well as 3 major projects to create a Python project portfolio!

This course will teach you Python in a practical manner: every lecture comes with a full coding screencast and a corresponding code notebook! Learn in whatever manner is best for you!

We will start by helping you get Python installed on your computer. Whether your operating system is Linux, macOS, or Windows, we've got you covered!

We cover a wide variety of topics, including:

Command Line Basics
Installing Python
Running Python Code
Strings
Lists
Dictionaries
Tuples
Sets
Number Data Types
Print Formatting
Functions
Scope
Built-in Functions
Debugging and Error Handling
Modules
External Modules
Object Oriented Programming
Inheritance
Polymorphism
File I/O
Web scraping
Database Connection
Email sending
and much more!

Projects that we will complete:

Guess the number
Guess the word using speech recognition
Love Calculator
Click and save an image using OpenCV
Ludo game dice simulator
Open Wikipedia from the command prompt

So what are you waiting for? Learn Python in a way that will advance your career and increase your knowledge, all in a fun and practical way!

Basic knowledge
Basic programming concepts in any language will help, but are not required to follow this tutorial.
What will you learn
Learn to use Python professionally, learning both Python 2 and Python 3!
Create games with Python, like Tic Tac Toe and Blackjack!
Learn advanced Python features, like the collections module and how to work with timestamps!
Learn to use Object Oriented Programming with classes!
Understand complex topics, like decorators.
Understand how to use both PyCharm and plain .py files
Get an understanding of how to create GUIs in PyCharm!
Build a complete understanding of Python from the ground up!

## Best Way to Learn Python Programming Language | Python Tutorial

Worried that you have no experience with Python? Don’t be! This Python programming course from Simpliv gets you writing Python programs with ease. You will place object-oriented programming in a Python context and use Python to perform complicated text processing.

Description
A Note on the Python versions 2 and 3: The code-alongs in this class all use Python 2.7. Source code (with copious amounts of comments) is attached as a resource with all the code-alongs. The source code has been provided for both Python 2 and Python 3 wherever possible.

What's Covered:

Introductory Python: Functional language constructs; Python syntax; Lists, dictionaries, functions and function objects; Lambda functions; iterators, exceptions and file-handling
Database operations: Just as much database knowledge as you need to do data manipulation in Python
Auto-generating spreadsheets: Kill the drudgery of reporting tasks with xlsxwriter; automated reports that combine database operations with spreadsheet auto-generation
Text processing and NLP: Python’s powerful tools for text processing - nltk and others.
Website scraping using Beautiful Soup: Scrapers for the New York Times and Washington Post
Machine Learning: Use scikit-learn to apply machine learning techniques like KMeans clustering
Hundreds of lines of code with hundreds of lines of comments
Drill #1: Download a zip file from the National Stock Exchange of India; unzip and process to find the 3 most actively traded securities for the day
Drill #2: Store stock-exchange time-series data for 3 years in a database. On-demand, generate a report with a time-series for a given stock ticker
Drill #3: Scrape a news article URL and auto-summarize into 3 sentences
Drill #4: Scrape newspapers and a blog and apply several machine learning techniques - classification and clustering to these
Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students:-(

We're super small and self-funded with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to NOT offer additional technical support over email or in-person. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

Thank you for your patience and understanding!

Who is the target audience?

Yep! Folks with zero programming experience looking to learn a new skill
Machine Learning and Language Processing folks looking to apply concepts in a full-fledged programming language
Yep! Computer Science students or software engineers with no experience in Java, but experience in Python, C++ or even C#. You might need to skip over some bits, but in general the class will still have new learning to offer you :-)
Basic knowledge
No prior programming experience is needed :-)
The course will use a Python IDE (integrated development environment) called IPython from Anaconda. We will go through a step-by-step procedure for downloading and installing this IDE.
What will you learn
Pick up programming even if you have NO programming experience at all
Write Python programs of moderate complexity
Perform complicated text processing - splitting articles into sentences and words and doing things with them
Work with files, including creating Excel spreadsheets and working with zip files
Apply simple machine learning and natural language processing concepts such as classification, clustering and summarization
Understand Object-Oriented Programming in a Python context

## Learning Model Building in Scikit-learn : A Python Machine Learning Library

scikit-learn is an open source Python library that implements a range of machine learning, pre-processing, cross-validation and visualization algorithms using a unified interface.


### Important features of scikit-learn:

Simple and efficient tools for data mining and data analysis. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means, etc.
Accessible to everybody and reusable in various contexts.
Built on top of NumPy, SciPy, and matplotlib.
Open source, commercially usable – BSD license.

In this article, we are going to see how we can easily build a machine learning model using scikit-learn.

#### Installation:

Scikit-learn requires NumPy and SciPy. Once you have a working installation of both, the easiest way to install scikit-learn is using pip:

``````
pip install -U scikit-learn
``````
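A quick way to confirm that the installation worked is to import the three packages and print their versions. This is only a sanity-check sketch; the version numbers you see depend on your environment.

```python
import numpy
import scipy
import sklearn

# Collect the installed version of each package
versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(name, version)
```

If any of the imports fails with `ModuleNotFoundError`, that package is not installed in the Python environment you are running.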

Let us get started with the modeling process now.

Step 1: Loading a dataset

A dataset is nothing but a collection of data. A dataset generally has two main components: the features, stored in a feature matrix (X), and the response, stored in a response vector (y).

**Loading exemplar dataset:** scikit-learn comes loaded with a few example datasets, like the iris and digits datasets for classification. (The Boston house prices dataset for regression, which older tutorials mention, was removed in scikit-learn 1.2.)

Given below is an example of how one can load an exemplar dataset:

``````
# load the iris dataset as an example
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# store the feature and target names
feature_names = iris.feature_names
target_names = iris.target_names

# printing features and target names of our dataset
print("Feature names:", feature_names)
print("Target names:", target_names)

# X and y are numpy arrays
print("\nType of X is:", type(X))

# printing first 5 input rows
print("\nFirst 5 rows of X:\n", X[:5])
``````

Output:

``````
Feature names: ['sepal length (cm)','sepal width (cm)',
'petal length (cm)','petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

Type of X is: <class 'numpy.ndarray'>

First 5 rows of X:
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]
``````
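The other bundled datasets load the same way. As a small sketch, here is the digits dataset mentioned above; the variable names `X_digits` and `y_digits` are chosen here only to keep them apart from the iris variables.

```python
from sklearn.datasets import load_digits

# load the digits dataset (8x8 images of handwritten digits, flattened to 64 features)
digits = load_digits()

# store the feature matrix and response vector
X_digits = digits.data
y_digits = digits.target

print("Shape of X:", X_digits.shape)
print("Shape of y:", y_digits.shape)
print("Number of classes:", len(digits.target_names))
```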

**Loading external dataset:** Now, consider the case when we want to load an external dataset. For this purpose, we can use the pandas library to load and manipulate the dataset easily.

To install pandas, use the following pip command:

``````
pip install pandas
``````

In pandas, important data types are:

Series: Series is a one-dimensional labeled array capable of holding any data type.

DataFrame: It is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
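To make the two types concrete, here is a small sketch that builds one of each by hand. The column names and values are made up for illustration and are not taken from weather.csv.

```python
import pandas as pd

# A Series: one-dimensional, with an index label for every value
temperature = pd.Series([30, 18, 22], index=["mon", "tue", "wed"], name="Temperature")

# A DataFrame: two-dimensional, built here from a dict of columns of different types
frame = pd.DataFrame({
    "Outlook": ["sunny", "rainy", "overcast"],
    "Windy": [False, True, False],
    "Temperature": [30, 18, 22],
})

# Selecting a single column of a DataFrame gives back a Series
windy = frame["Windy"]

print(temperature["tue"])
print(type(windy).__name__)
print(frame.shape)
```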

Note: The CSV file used in example below can be downloaded from here: weather.csv

``````
import pandas as pd

# reading the csv file
data = pd.read_csv('weather.csv')

# shape of dataset
print("Shape:", data.shape)

# column names
print("\nFeatures:", data.columns)

# storing the feature matrix (X) and response vector (y)
X = data[data.columns[:-1]]
y = data[data.columns[-1]]

# printing first 5 rows of feature matrix
print("\nFeature matrix:\n", X.head())

# printing first 5 values of response vector
print("\nResponse vector:\n", y.head())
``````

Output:

``````
Shape: (14, 5)

Features: Index(['Outlook', 'Temperature', 'Humidity',
'Windy', 'Play'], dtype='object')

Feature matrix:
    Outlook Temperature Humidity  Windy
0  overcast         hot     high  False
1  overcast        cool   normal   True
2  overcast        mild     high   True
3  overcast         hot   normal  False
4     rainy        mild     high  False

Response vector:
0    yes
1    yes
2    yes
3    yes
4    yes
Name: Play, dtype: object
``````

Step 2: Splitting the dataset

One important aspect of all machine learning models is to determine their accuracy. Now, in order to determine their accuracy, one can train the model using the given dataset and then predict the response values for the same dataset using that model and hence, find the accuracy of the model.

But this method has a serious flaw: a model that is evaluated on the same data it was trained on can score well simply by memorizing that data (overfitting), which says little about how it will perform on out-of-sample data.
A better option is to split our data into two parts: first one for training our machine learning model, and second one for testing our model.
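The difference between the two ways of measuring accuracy can be seen in a small sketch. It assumes a k-nearest-neighbors classifier with a single neighbor, chosen here only because it memorizes its training data, and uses the `train_test_split` function that the example further down introduces.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# With n_neighbors=1, each training point has itself as a nearest neighbor,
# so the model effectively memorizes the training set
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Accuracy on the data the model has already seen
train_accuracy = accuracy_score(y_train, knn.predict(X_train))

# Accuracy on data the model has never seen
test_accuracy = accuracy_score(y_test, knn.predict(X_test))

print("Accuracy on the training data:", train_accuracy)
print("Accuracy on held-out data:", test_accuracy)
```

The first number is flattering by construction; the second is the more honest estimate of how the model behaves on new data.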

To summarize: split the dataset into a training set and a testing set, train the model on the training set, then test the model on the testing set and evaluate how well it did.
Consider the example below:

``````
# load the iris dataset as an example
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# printing the shapes of the new X objects
print(X_train.shape)
print(X_test.shape)

# printing the shapes of the new y objects
print(y_train.shape)
print(y_test.shape)
``````

Output:

``````
(90, 4)
(60, 4)
(90,)
(60,)
``````

The train_test_split function takes several arguments which are explained below:
X, y: the feature matrix and response vector to be split.
test_size: the fraction of the data held out for testing. With test_size=0.4 and 150 rows, the test set gets 150 * 0.4 = 60 rows.
random_state: the seed for the random shuffling applied before the split. Passing the same value reproduces the same split every time.

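The effect of `test_size` and `random_state` can be checked on a toy array. This is a sketch on made-up data (10 rows, 2 features), not on the iris dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 rows, 2 features (values are arbitrary, for illustration only)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.4 holds out 40% of the 10 rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Calling again with the same random_state reproduces the same split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.4, random_state=1)
same_split = np.array_equal(X_test, X_test_again)

print(X_train.shape, X_test.shape)
print("Same split with the same seed:", same_split)
```

Fixing `random_state` is what makes results like the accuracy figures in a tutorial reproducible from one run to the next.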
Step 3: Training the model

Now it's time to train a prediction model using our dataset. Scikit-learn provides a wide range of machine learning algorithms which share a unified, consistent interface for fitting, predicting, measuring accuracy, etc.

The example given below uses a KNN (k-nearest neighbors) classifier.

Note: We will not go into the details of how the algorithm works as we are interested in understanding its implementation only.

Now, consider the example below:

``````
# load the iris dataset as an example
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# training the model on training set
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# making predictions on the testing set
y_pred = knn.predict(X_test)

# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("kNN model accuracy:", metrics.accuracy_score(y_test, y_pred))

# making prediction for out of sample data
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
preds = knn.predict(sample)
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)

# saving the model (joblib is a standalone package; sklearn.externals.joblib was removed)
import joblib
joblib.dump(knn, 'iris_knn.pkl')
``````

Output:

``````
kNN model accuracy: 0.983333333333
Predictions: ['versicolor', 'virginica']
``````

Important points to note from the above code:
We create a KNN classifier object, setting the number of neighbors to 3:

``````
knn = KNeighborsClassifier(n_neighbors=3)
``````

The classifier is trained on the training set by passing the feature matrix and the corresponding response vector to fit:

``````
knn.fit(X_train, y_train)
``````

Predictions for the testing set are made with predict, which returns the predicted response vector y_pred:

``````
y_pred = knn.predict(X_test)
``````

The accuracy of the model is found by comparing the actual response values (y_test) with the predicted response values (y_pred):

``````
print(metrics.accuracy_score(y_test, y_pred))
``````
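What `accuracy_score` computes is simply the fraction of predictions that match the true labels. The sketch below checks this by hand on a few hypothetical labels, made up for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels (made up for illustration)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_hat = np.array([0, 1, 2, 1, 1, 0])

# Fraction of positions where the prediction equals the true label
manual_accuracy = np.mean(y_true == y_hat)

# The same quantity computed by scikit-learn
library_accuracy = accuracy_score(y_true, y_hat)

print(manual_accuracy, library_accuracy)
```

Five of the six predictions match, so both values equal 5/6.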

Predictions for out-of-sample data are made the same way, by passing the new samples as a feature matrix:

``````
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
preds = knn.predict(sample)
``````

The trained model can be saved to a file with joblib, so that it does not have to be trained again:

``````
joblib.dump(knn, 'iris_knn.pkl')
``````

A saved model can be loaded again with:

``````
knn = joblib.load('iris_knn.pkl')
``````
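A round trip through `joblib.dump` and `joblib.load` can be checked as in the sketch below. Writing to a temporary directory is a choice made here so the example leaves no files behind; it is not part of the original walkthrough.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Save the model to a temporary directory and load it straight back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_knn.pkl")
    joblib.dump(knn, path)
    restored = joblib.load(path)

# The restored model should make exactly the same predictions
same_predictions = np.array_equal(knn.predict(X), restored.predict(X))
print("Restored model agrees with the original:", same_predictions)
```

Only load model files from sources you trust, since joblib files are pickle-based and loading one can execute arbitrary code.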

As we approach the end of this article, one benefit of scikit-learn is worth restating: its models share a consistent interface, so a classifier is created, trained with fit, and queried with predict in the same way whichever algorithm is chosen.
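The consistent interface can be sketched by running the same fit and predict loop over several classifiers. The choice of logistic regression and a decision tree alongside KNN is an assumption made here for illustration; the article itself only uses KNN.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Three different algorithms, all exposing the same fit/predict methods
models = {
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                                   # same call for every model
    scores[name] = accuracy_score(y_test, model.predict(X_test))  # same call for every model
    print(name, scores[name])
```

Swapping one algorithm for another changes a single line, which is what makes it easy to compare models on the same split.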