Shubham Ankit

1563520340

Learning Model Building in Scikit-learn: A Python Machine Learning Library

scikit-learn is an open source Python library that implements a range of machine learning, pre-processing, cross-validation and visualization algorithms using a unified interface.

Important features of scikit-learn:

  • Simple and efficient tools for data mining and data analysis. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means, etc.
  • Accessible to everybody and reusable in various contexts.
  • Built on top of NumPy, SciPy, and matplotlib.
  • Open source, commercially usable – BSD license.

In this article, we are going to see how we can easily build a machine learning model using scikit-learn.

Installation:

Scikit-learn requires NumPy and SciPy. pip installs both automatically as dependencies, so the easiest way to install scikit-learn is using pip:

pip install -U scikit-learn
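A quick way to confirm that the installation worked is to import the packages and print their versions. This is only a minimal check and not part of the original walkthrough; the version numbers printed depend on your environment.

```python
# Sanity check: each import raises ImportError if the package is missing.
import numpy
import scipy
import sklearn

versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(name, version)
```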

Let us get started with the modeling process now.

Step 1: Load a dataset

A dataset is nothing but a collection of data. A dataset generally has two main components:

  • Features (also called predictors or inputs): the variables of the data, stored in a feature matrix, conventionally named X, with one row per sample and one column per feature.
  • Response (also called the target or label): the value to be predicted, stored in a response vector, conventionally named y, with one entry per sample.

**Loading exemplar dataset:** scikit-learn comes loaded with a few example datasets, such as the iris and digits datasets for classification and the diabetes dataset for regression. (The Boston house prices dataset mentioned in older tutorials was removed in scikit-learn 1.2.)

Given below is an example of how one can load an exemplar dataset:

# load the iris dataset as an example
from sklearn.datasets import load_iris
iris = load_iris()
  
# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target
  
# store the feature and target names
feature_names = iris.feature_names
target_names = iris.target_names
  
# printing features and target names of our dataset
print("Feature names:", feature_names)
print("Target names:", target_names)
  
# X and y are numpy arrays
print("\nType of X is:", type(X))
  
# printing first 5 input rows
print("\nFirst 5 rows of X:\n", X[:5])

Output:

Feature names: ['sepal length (cm)','sepal width (cm)',
                'petal length (cm)','petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

Type of X is: <class 'numpy.ndarray'>

First 5 rows of X:
 [[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]
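Before modeling, it can help to check the dimensions of the feature matrix and how the samples are distributed over the classes. The snippet below is a small sketch of such a check; the variable name counts is our own choice.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# one row per flower, one column per feature
print("X shape:", X.shape)
# one response value per flower
print("y shape:", y.shape)

# number of samples in each class, paired with the class name
counts = np.bincount(y)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```

The number of rows in X must always match the length of y, since each row of features needs exactly one response value.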

**Loading external dataset:** Now consider the case where we want to load an external dataset. For this purpose, we can use the pandas library, which makes loading and manipulating datasets easy.

To install pandas, use the following pip command:

pip install pandas

In pandas, important data types are:

Series: Series is a one-dimensional labeled array capable of holding any data type.

DataFrame: It is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
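To make the two data types concrete, here is a minimal sketch that builds a Series and then a DataFrame from a dict of Series. The labels day1 to day3 and the values are made up for illustration.

```python
import pandas as pd

days = ["day1", "day2", "day3"]

# A Series: one-dimensional, every value carries a label (the index)
temperature = pd.Series(["hot", "cool", "mild"], index=days)
windy = pd.Series([False, True, True], index=days)

# A DataFrame built from a dict of Series; columns may have different types
df = pd.DataFrame({"Temperature": temperature, "Windy": windy})

print(df)
print(df.dtypes)

# selecting a single column gives back a Series
print(type(df["Windy"]))
```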

Note: The example below reads a CSV file named weather.csv, which must be present in the working directory.

import pandas as pd
  
# reading csv file
data = pd.read_csv('weather.csv')
  
# shape of dataset
print("Shape:", data.shape)
  
# column names
print("\nFeatures:", data.columns)
  
# storing the feature matrix (X) and response vector (y)
X = data[data.columns[:-1]]
y = data[data.columns[-1]]
  
# printing first 5 rows of feature matrix
print("\nFeature matrix:\n", X.head())
  
# printing first 5 values of response vector
print("\nResponse vector:\n", y.head())

Output:

Shape: (14, 5)

Features: Index(['Outlook', 'Temperature', 'Humidity', 
                'Windy', 'Play'], dtype='object')

Feature matrix:
     Outlook Temperature Humidity  Windy
0  overcast         hot     high  False
1  overcast        cool   normal   True
2  overcast        mild     high   True
3  overcast         hot   normal  False
4     rainy        mild     high  False

Response vector:
0    yes
1    yes
2    yes
3    yes
4    yes
Name: Play, dtype: object
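The feature matrix printed above contains strings, which most scikit-learn estimators cannot use directly. One common approach, not covered in the original walkthrough, is one-hot encoding with pandas.get_dummies. The sketch below rebuilds the five rows shown in the output inline instead of reading weather.csv, so that it runs on its own; the yes/no to 1/0 mapping is our own assumption about how to encode the response.

```python
import pandas as pd

# the five rows printed above, rebuilt inline so the example is self-contained
data = pd.DataFrame({
    "Outlook": ["overcast", "overcast", "overcast", "overcast", "rainy"],
    "Temperature": ["hot", "cool", "mild", "hot", "mild"],
    "Humidity": ["high", "normal", "high", "normal", "high"],
    "Windy": [False, True, True, False, False],
    "Play": ["yes", "yes", "yes", "yes", "yes"],
})

X = data[data.columns[:-1]]
y = data[data.columns[-1]]

# one-hot encode the string columns; the boolean Windy column is left as it is
X_encoded = pd.get_dummies(X, columns=["Outlook", "Temperature", "Humidity"])

# map the response labels to integers
y_encoded = y.map({"no": 0, "yes": 1})

print(X_encoded.columns.tolist())
print(y_encoded.tolist())
```

Each distinct string value becomes its own column, so the encoded matrix is wider than the original one.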

Step 2: Splitting the dataset

One important aspect of any machine learning model is determining its accuracy. One way to do this is to train the model on the given dataset, predict the response values for that same dataset, and compare the predictions with the known responses.

But this method is flawed: the goal is to estimate how the model will perform on data it has not seen, and accuracy measured on the training data rewards overly complex models that memorize that data without generalizing.

A better option is to split our data into two parts: the first for training our machine learning model, and the second for testing it.

To summarize:

  • Split the dataset into a training set and a testing set.
  • Train the model on the training set.
  • Test the model on the testing set and evaluate how well it performs.

Advantages of train/test split:

  • The model is trained and tested on different data.
  • The response values are known for the testing set, so the predictions can be evaluated.
  • Testing accuracy is a better estimate of out-of-sample performance than training accuracy.

Consider the example below:

# load the iris dataset as an example
from sklearn.datasets import load_iris
iris = load_iris()
  
# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target
  
# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
  
# printing the shapes of the new X objects
print(X_train.shape)
print(X_test.shape)
  
# printing the shapes of the new y objects
print(y_train.shape)
print(y_test.shape)

Output:

(90, 4)
(60, 4)
(90,)
(60,)
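Two properties of the split are worth checking. Passing the same random_state should give the same split on every call, and the optional stratify parameter (not used in the example above) asks train_test_split to keep the class proportions equal in both parts. A minimal sketch, assuming the iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# same random_state -> identical split on every call
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.4, random_state=1)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.4, random_state=1)
print("Reproducible:", np.array_equal(X_test_a, X_test_b))

# stratify=y keeps the class proportions the same in the training and testing parts
_, _, _, y_test_s = train_test_split(
    X, y, test_size=0.4, random_state=1, stratify=y
)
print("Test-set class counts:", np.bincount(y_test_s))
```

Stratifying matters most for small or imbalanced datasets, where a purely random split can leave a class under-represented in the testing set.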

The train_test_split function takes several arguments which are explained below:

  • X, y: the feature matrix and response vector to be split.
  • test_size: the fraction of the data to put in the testing set. With test_size=0.4, 60 of the 150 rows go to the testing set and 90 to the training set.
  • random_state: a seed for the random shuffling done before the split. Using the same value gives the same split every time, which makes results reproducible.

Step 3: Training the model

Now it's time to train a prediction model on our dataset. Scikit-learn provides a wide range of machine learning algorithms that share a consistent interface for fitting, predicting, measuring accuracy, and so on.

The example below uses a KNN (k-nearest neighbors) classifier.

Note: We will not go into the details of how the algorithm works as we are interested in understanding its implementation only.

Now, consider the example below:

# load the iris dataset as an example
from sklearn.datasets import load_iris
iris = load_iris()
  
# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target
  
# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
  
# training the model on training set
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
  
# making predictions on the testing set
y_pred = knn.predict(X_test)
  
# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("kNN model accuracy:", metrics.accuracy_score(y_test, y_pred))
  
# making prediction for out of sample data
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
preds = knn.predict(sample)
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)
  
# saving the model
# (sklearn.externals.joblib was removed from scikit-learn; joblib is imported directly)
import joblib
joblib.dump(knn, 'iris_knn.pkl')

Output:

kNN model accuracy: 0.983333333333
Predictions: ['versicolor', 'virginica']
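As a rough illustration of why Step 2 evaluates on held-out data, the sketch below fits a KNN classifier with a single neighbor and scores it on both the training and the testing set. With n_neighbors=1, each training point is its own nearest neighbor, so the training accuracy should come out essentially perfect regardless of how well the model generalizes. The names knn1, train_acc and test_acc are our own.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# with a single neighbor, each training point's nearest neighbor is itself
knn1 = KNeighborsClassifier(n_neighbors=1)
knn1.fit(X_train, y_train)

# accuracy on the data the model was fitted on vs. data it has not seen
train_acc = accuracy_score(y_train, knn1.predict(X_train))
test_acc = accuracy_score(y_test, knn1.predict(X_test))

print("Training accuracy:", train_acc)
print("Testing accuracy:", test_acc)
```

The testing accuracy is the figure to report, since it is measured on samples the model never saw during fitting.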

Important points to note from the above code:

  • We create a KNN classifier object using:

knn = KNeighborsClassifier(n_neighbors=3)

  • The classifier is trained on the training data. This process is called fitting. We pass the feature matrix and the corresponding response vector:

knn.fit(X_train, y_train)

  • Next, we test the classifier on X_test. The predict method takes a feature matrix and returns the predicted response vector:

y_pred = knn.predict(X_test)

  • To find the accuracy of the model, we compare the actual responses (y_test) with the predicted responses (y_pred):

print(metrics.accuracy_score(y_test, y_pred))

  • To make predictions for out-of-sample data, pass the new samples in the same form as any other feature matrix:

sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
preds = knn.predict(sample)

  • To avoid retraining the classifier every time, save the trained model with joblib:

joblib.dump(knn, 'iris_knn.pkl')

  • To load a previously saved classifier:

knn = joblib.load('iris_knn.pkl')
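The save and load steps can be checked with a round trip: dump the fitted model, load it back, and confirm that both objects predict the same labels. This is a minimal sketch; it writes to a temporary directory (our own choice) so that it leaves no file behind, and the name restored is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# write to a temporary directory so the example leaves no file behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_knn.pkl")
    joblib.dump(knn, path)
    restored = joblib.load(path)

# the restored model should give the same predictions as the original
same = np.array_equal(knn.predict(X), restored.predict(X))
print("Predictions match:", same)
```

A saved model should only be loaded from a source you trust, and ideally with the same scikit-learn version that created it, because the file is a pickle.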

As we approach the end of this article, the main benefit of scikit-learn seen in the examples above is its consistent interface: every model is created, trained with fit, and used with predict in the same way, so switching algorithms requires very little new code.