Poppy Cooke

Machine Learning Pipelines: Nonlinear Model Stacking

Normally, we face data sets that are fairly linear or can be manipulated into one. But what if the data set we are examining really should be looked at in a nonlinear way? Step into the world of nonlinear feature engineering. First, we'll look at examples of nonlinear data. Next, we'll briefly discuss the K-means algorithm as a means of nonlinear feature engineering. Lastly, we'll stack K-means on top of logistic regression to build a superior model for classification.

Examples of Nonlinear Data

Nonlinear data occurs quite often in the business world. Examples include segmenting group behavior (marketing), finding patterns in inventory by group activity (sales), and detecting anomalies in previous transactions (finance). For a more concrete example (supply chain / logistics), we can see it in a visualization of truck driver data plotting speeding against distance:

From a quick glance, we can see at least two groups within this data set, split between distances above 100 and below 100. Intuitively, fitting a linear model here would be horrendous, so we need a different type of model. Applying K-means, we can actually find four groups, as seen below [1]:
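
To give a rough idea of what that step looks like in code, here is a minimal sketch of clustering driver data with K-means. The file name and column names (drivers.csv, Distance_Feature, Speeding_Feature) are hypothetical stand-ins for the data set behind the figure in [1]:

# Hypothetical sketch: cluster drivers by distance and speeding into 4 groups
import pandas as pd
from sklearn.cluster import KMeans

drivers = pd.read_csv('drivers.csv')  # assumed file with one row per driver
features = drivers[['Distance_Feature', 'Speeding_Feature']]  # assumed column names
drivers['group'] = KMeans(n_clusters=4, random_state=0).fit_predict(features)

# Average distance and speeding per discovered group
print(drivers.groupby('group')[['Distance_Feature', 'Speeding_Feature']].mean())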

With K-means, we can now run additional analysis on the drivers' data set above to produce predictive insights that help businesses categorize drivers by distance traveled and speeding patterns. In our case, we'll apply K-means to our own fictitious data set to save ourselves the extra steps of feature engineering real-life data.

K-Means

Before we begin constructing our data, let's take some time to go over what K-means actually is. K-means is an algorithm that looks for a certain number of clusters within an unlabeled data set [2]. Take note of the word unlabeled: K-means is an unsupervised learning model. This is super helpful when you get data but don't really know how to label it. K-means can help out by labeling the groups for you. Pretty cool!

Applying Nonlinear Feature Engineering

For our data, we'll use the make_circles data set from sklearn [3]. Alright, let's get to our hands-on example:

# Load up our packages
import pandas as pd
import numpy as np
import sklearn
import sklearn.metrics
import scipy
import scipy.sparse
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder
from scipy.spatial import Voronoi, voronoi_plot_2d
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
%matplotlib notebook

Our next step is to create a K-means featurizer class. For those of you unfamiliar with classes (not a subject you take in school), think of a class in coding as a super function with a lot of functions inside it. Now, I know there's already a k-means clustering algorithm in sklearn, but I really like this class written by Alice Zheng because of its detailed comments and the visualization that we'll soon see [4]:

class KMeansFeaturizer:
    """Transforms numeric data into k-means cluster memberships.

    This transformer runs k-means on the input data and converts each data point
    into the id of the closest cluster. If a target variable is present, it is
    scaled and included as input to k-means in order to derive clusters that
    obey the classification boundary as well as group similar points together.

    Parameters
    ----------
    k : integer, optional, default 100
        The number of clusters to group data into.
    target_scale : float, [0, infty], optional, default 5.0
        The scaling factor for the target variable. Set this to zero to ignore
        the target. For classification problems, larger `target_scale` values
        will produce clusters that better respect the class boundary.
    random_state : integer or numpy.RandomState, optional
        This is passed to k-means as the generator used to initialize the
        k-means centers. If an integer is given, it fixes the seed. Defaults to
        the global numpy random number generator.

    Attributes
    ----------
    cluster_centers_ : array, [k, n_features]
        Coordinates of cluster centers. n_features does not include the target column.
    """

    def __init__(self, k=100, target_scale=5.0, random_state=None):
        self.k = k
        self.target_scale = target_scale
        self.random_state = random_state
        # Pre-fit a one-hot encoder on the possible cluster ids 0..k-1
        self.cluster_encoder = OneHotEncoder().fit(np.array(range(k)).reshape(-1, 1))

    def fit(self, X, y=None):
        """Runs k-means on the input data and finds the centroids.

        If no target is given (`y` is None), run vanilla k-means on the input `X`.

        If a target `y` is given, include the target (weighted by `target_scale`)
        as an extra dimension for k-means clustering. In this case, run k-means
        twice: first with the target, then an extra iteration without it.

        After fitting, the attribute `cluster_centers_` is set to the k-means
        centroids in the input space represented by `X`.

        Parameters
        ----------
        X : array-like or sparse matrix, shape=(n_data_points, n_features)
        y : vector of length n_data_points, optional, default None
            If provided, will be weighted with `target_scale` and included in
            k-means clustering as a hint.
        """
        if y is None:
            # No target variable, just do plain k-means
            km_model = KMeans(n_clusters=self.k,
                              n_init=20,
                              random_state=self.random_state)
            km_model.fit(X)

            self.km_model = km_model
            self.cluster_centers_ = km_model.cluster_centers_
            return self

        # There is target information. Apply appropriate scaling and include
        # it in the input data to k-means.
        data_with_target = np.hstack((X, y[:, np.newaxis] * self.target_scale))

        # Build a pre-training k-means model on data and target
        km_model_pretrain = KMeans(n_clusters=self.k,
                                   n_init=20,
                                   random_state=self.random_state)
        km_model_pretrain.fit(data_with_target)

        # Run k-means a second time to get the clusters in the original space
        # without target info. Initialize using centroids found in pre-training.
        # Go through a single iteration of cluster assignment and centroid
        # recomputation.
        km_model = KMeans(n_clusters=self.k,
                          init=km_model_pretrain.cluster_centers_[:, :2],
                          n_init=1,
                          max_iter=1)
        km_model.fit(X)

        self.km_model = km_model
        self.cluster_centers_ = km_model.cluster_centers_
        return self

    def transform(self, X, y=None):
        """Outputs the one-hot encoded id of the closest cluster for each input data point.

        Parameters
        ----------
        X : array-like or sparse matrix, shape=(n_data_points, n_features)
        y : vector of length n_data_points, optional, default None
            Target vector is ignored even if provided.

        Returns
        -------
        cluster_ids : sparse matrix, shape=(n_data_points, k)
        """
        clusters = self.km_model.predict(X)
        return self.cluster_encoder.transform(clusters.reshape(-1, 1))

    def fit_transform(self, X, y=None):
        """Runs fit followed by transform."""
        self.fit(X, y)
        return self.transform(X, y)

Don't let that huge amount of text bother you. I just put it there in case you want to experiment with it on your own projects. Next, we'll create our training set and set the seed to 420 to get the same results:

# Creating our training set and fitting the k-means featurizers
seed = 420
training_data, training_labels = make_circles(n_samples=2000, factor=0.2)

kmf_hint = KMeansFeaturizer(k=100, target_scale=10, random_state=seed).fit(training_data, training_labels)
kmf_no_hint = KMeansFeaturizer(k=100, target_scale=0, random_state=seed).fit(training_data, training_labels)

def kmeans_voronoi_plot(X, y, cluster_centers, ax):
    #Plots Voronoi diagram of k-means clusters overlaid with data
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='Set1', alpha=0.2)
    vor = Voronoi(cluster_centers)
    voronoi_plot_2d(vor, ax=ax, show_vertices=False, alpha=0.5)

Now, let’s look at our unlabeled nonlinear data:

#looking at circles data
df = pd.DataFrame(training_data)
ax = sns.scatterplot(x=0, y=1, data=df)

Just like the drivers' data set from the intro, our circle within a circle is definitely not a linear data set. Next, we'll apply K-means and visually compare the results of giving it a hint (the target labels) versus giving it no hint:

#With hint
fig = plt.figure()
ax = plt.subplot(211, aspect='equal')
kmeans_voronoi_plot(training_data, training_labels, kmf_hint.cluster_centers_, ax)
ax.set_title('K-Means with Target Hint')

# Without hint
ax2 = plt.subplot(212, aspect='equal')
kmeans_voronoi_plot(training_data, training_labels, kmf_no_hint.cluster_centers_, ax2)
ax2.set_title('K-Means without Target Hint')

I find that the results with and without the hint are fairly close. If you want more automation, you might skip the hint. But if you can spend some time looking at your data set to give it a hint, I would: it can save you some run time, since k-means spends less time figuring things out on its own. Another reason to give k-means a hint is that you have domain expertise in your data set and know there is a specific number of clusters.

Model Stacking for Classification

Time for the fun part: making the stacked model. Some of you might be asking, what's the difference between a stacked model and an ensemble model? An ensemble model combines multiple machine learning models to make another model [5], so not much. I think model stacking is the more precise term here, since k-means is feeding into logistic regression. If we drew a Venn diagram, we would find stacked models inside the broader concept of ensemble models. I couldn't find a good example on Google Images, so I applied the magic of MS Paint to present a rough illustration for your viewing pleasure:

OK, art class is over and back to coding. We're going to plot ROC curves for kNN, logistic regression (LR), and k-means feeding into logistic regression.

# Generate test data from the same circles distribution, with noise added
test_data, test_labels = make_circles(n_samples=2000, factor=0.2, noise=0.3, random_state=seed+5)

training_cluster_features = kmf_hint.transform(training_data)
test_cluster_features = kmf_hint.transform(test_data)

# Append the one-hot cluster memberships to the original features
training_with_cluster = scipy.sparse.hstack((training_data, training_cluster_features))
test_with_cluster = scipy.sparse.hstack((test_data, test_cluster_features))

# Run the models
lr_cluster = LogisticRegression(random_state=seed).fit(training_with_cluster, training_labels)

classifier_names = ['LR',
                    'kNN']
classifiers = [LogisticRegression(random_state=seed),
               KNeighborsClassifier(5)]
for model in classifiers:
    model.fit(training_data, training_labels)
    
# Plot the ROC
def test_roc(model, data, labels):
    if hasattr(model, "decision_function"):
        predictions = model.decision_function(data)
    else:
        predictions = model.predict_proba(data)[:,1]
    fpr, tpr, _ = sklearn.metrics.roc_curve(labels, predictions)
    return fpr, tpr

plt.figure()
fpr_cluster, tpr_cluster = test_roc(lr_cluster, test_with_cluster, test_labels)
plt.plot(fpr_cluster, tpr_cluster, 'r-', label='LR with k-means')

for i, model in enumerate(classifiers):
    fpr, tpr = test_roc(model, test_data, test_labels)
    plt.plot(fpr, tpr, label=classifier_names[i])
    
plt.plot([0, 1], [0, 1], 'k--')
plt.legend()
plt.xlabel('False Positive Rate', fontsize=14)
plt.ylabel('True Positive Rate', fontsize=14)
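
If you'd like a single number to go with each curve, you can also summarize them with the area under the curve (AUC). This small add-on reuses the test_roc helper and the fitted models from above:

# Summarize each ROC curve with its area under the curve (AUC)
print('LR with k-means AUC: %.3f' % sklearn.metrics.auc(fpr_cluster, tpr_cluster))
for name, model in zip(classifier_names, classifiers):
    fpr, tpr = test_roc(model, test_data, test_labels)
    print('%s AUC: %.3f' % (name, sklearn.metrics.auc(fpr, tpr)))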

Alright, the first time I saw a ROC curve, I wondered how to read the thing. Well, what you want is the model whose curve shoots up to the top left corner the fastest. In this case, our best-performing model is the stacked model: logistic regression with k-means features. The classification task our models worked on is deciding whether each data point belongs to the big circle or the small circle.

Conclusion

Phew, we covered quite a few things here. First, we looked at nonlinear data and examples we might face in the real world. Second, we looked at k-means as a tool to discover features in our data that were not there before. Next, we applied k-means to our own data set. Lastly, we stacked k-means into logistic regression to make a superior model. Pretty cool stuff overall. Some things to note: we didn't tune the models, which would change their performance, nor did we compare very many models. But combining unsupervised learning with your supervised models can prove pretty useful and help you deliver insights you couldn't get otherwise!
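
Side note for the curious: here is a rough sketch of the kind of tuning we skipped. The parameter grid below is purely illustrative, searching over logistic regression's regularization strength C on the cluster-augmented features from earlier:

# Illustrative sketch of the tuning we skipped: grid search over LR's C
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(LogisticRegression(random_state=seed),
                      param_grid, scoring='roc_auc', cv=5)
# Convert to CSR so cross-validation can index rows of the sparse matrix
search.fit(training_with_cluster.tocsr(), training_labels)
print(search.best_params_, search.best_score_)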

Disclaimer: All things stated in this article are my own opinion and not that of any employer. Affiliate links are also sprinkled throughout.

[1] A. Trevino, Introduction to K-means Clustering (2016), https://www.datascience.com/blog/k-means-clustering

[2] J. VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data (2016), https://amzn.to/2SMdZue

[3] Scikit-learn Developers, sklearn.datasets.make_circles (2019), https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html#sklearn.datasets.make_circles

[4] A. Zheng et al., Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists (2018), https://amzn.to/2SOFh3q

[5] F. Gunes, Why do stacked ensemble models win data science competitions? (2017), https://blogs.sas.com/content/subconsciousmusings/2017/05/18/stacked-ensemble-models-win-data-science-competitions/

#machine-learning #data-science #deep-learning

Mckenzie Osiki

How To Use “Model Stacking” To Improve Machine Learning Predictions

What is Model Stacking?

Model stacking is a way to improve model predictions by combining the outputs of multiple models and running them through another machine learning model called a meta-learner. It is a popular strategy used to win Kaggle competitions, but despite its usefulness it is rarely talked about in data science articles, which I hope to change.

Essentially, a stacked model works by running the output of multiple models through a "meta-learner" (usually a linear regressor/classifier, but it can be another model such as a decision tree). The meta-learner attempts to minimize the weaknesses and maximize the strengths of each individual model. The result is usually a very robust model that generalizes well on unseen data.
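
For a concrete feel of this, here is a minimal sketch using scikit-learn's built-in StackingClassifier; the base models and data below are arbitrary examples chosen for illustration:

# Minimal stacking sketch: two base models with logistic regression as the meta-learner
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[('knn', KNeighborsClassifier(5)),
                ('tree', DecisionTreeClassifier(max_depth=3))],
    final_estimator=LogisticRegression())  # the meta-learner
stack.fit(X_train, y_train)
print('Stacked model accuracy:', stack.score(X_test, y_test))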

The architecture for a stacked model can be illustrated by the image below:

#tensorflow #neural-networks #model-stacking #how to use “model stacking” to improve machine learning predictions #model stacking #machine learning

sophia tondon

5 Latest Technology Trends of Machine Learning for 2021

Check out the five latest machine learning technology trends to boost business growth in 2021, built on the best of today's digital development tools. It is the right time to enhance the user experience by bringing these advancements into everyday life.

#machinelearningapps #machinelearningdevelopers #machinelearningexpert #machinelearningexperts #expertmachinelearningservices #topmachinelearningcompanies #machinelearningdevelopmentcompany

Visit Blog- https://www.xplace.com/article/8743

#machine learning companies #top machine learning companies #machine learning development company #expert machine learning services #machine learning experts #machine learning expert

Nora Joy

Hire Machine Learning Developers in India

Hire machine learning developers in India. DxMinds Technologies is the best product engineering company in India, building innovative solutions using machine learning and deep learning. We are among the best choices to hire machine learning experts in India, working across industry domains like healthcare, retail, banking and finance, oil and gas, ecommerce, telecommunication, FMCG, fashion, etc.

Services

Product Engineering & Development

Re-engineering

Maintenance / Support / Sustenance

Integration / Data Management

QA & Automation

Reach us 917483546629

#hire machine learning developers in india #hire dedicated machine learning developers in india #hire machine learning programmers in india #hire machine learning programmers #hire dedicated machine learning developers #hire machine learning developers

Nora Joy

Applications of machine learning in different industry domains

Machine learning applications are a staple of modern business in this digital age, as they allow businesses to perform tasks on a scale and scope previously impossible to accomplish. Businesses from different domains realize the importance of incorporating machine learning in their processes. Today this trending technology is transforming almost every single industry, and businesses from different industry domains hire dedicated machine learning developers to accelerate their growth. The following are the applications of machine learning in different industry domains.

Transportation industry

Machine learning is one of the technologies that has already begun to make a promising mark on the transportation industry. Autonomous vehicles, smartphone apps, traffic management solutions, law enforcement, and passenger transportation are among the applications of AI and ML in the transportation industry. The following challenges in the transportation industry can be addressed by machine learning and artificial intelligence.

  • ML and AI can offer high security in the transportation industry.
  • It offers high reliability of their services or vehicles.
  • The adoption of this technology in the transportation industry can increase the efficiency of the service.
  • In the transportation industry ML helps scientists and engineers come up with far more environmentally sustainable methods for powering and operating vehicles and machinery for travel and transport.

Healthcare industry

Technology-enabled smart healthcare is the latest trend in the healthcare industry. Machine learning can improve many areas of healthcare, such as patient care, medical records, billing, alternative models of staffing, IP capitalization, smart healthcare, and administrative and supply cost reduction. Hire dedicated machine learning developers for any of the following applications.

  • Identifying Diseases and Diagnosis
  • Drug Discovery and Manufacturing
  • Medical Imaging Diagnosis
  • Personalized Medicine
  • Machine Learning-based Behavioral Modification
  • Smart Health Records
  • Clinical Trial and Research
  • Better Radiotherapy
  • Crowdsourced Data Collection
  • Outbreak Prediction

Finance industry

In the financial industry, organizations like banks, fintech companies, regulators, and insurers are adopting machine learning to improve their services. The following are use cases of machine learning in finance.

  • Fraud prevention
  • Risk management
  • Investment predictions
  • Customer service
  • Digital assistants
  • Marketing
  • Network security
  • Loan underwriting
  • Algorithmic trading
  • Process automation
  • Document interpretation
  • Content creation
  • Trade settlements
  • Money-laundering prevention
  • Custom machine learning solutions

Education industry

The education industry is one of the industries investing in machine learning, as it offers more efficient and easier learning. Adaptive learning, increasing efficiency, learning analytics, predictive analytics, personalized learning, and evaluating assessments are among the applications of machine learning in the education industry.

Outsource your machine learning solution to India. India is the best outsourcing destination, offering best-in-class, high-performing work at an affordable price. Businesses hire dedicated machine learning developers in India to turn their machine learning app ideas into reality.

Future of machine learning

Continuous technological advances are bound to hit the field of machine learning, which will shape its future as an intensively evolving field.

  • Improved Unsupervised Algorithms
  • Increased Adoption of Quantum Computing
  • Enhanced Personalization
  • Improved Cognitive Services
  • Rise of Robots

Conclusion

Today most businesses from different industries hire machine learning developers in India to achieve their business goals. This technology has many applications and, interestingly, it has only just begun; having taken such a massive leap, it opens up many possibilities for existing business models in a short period of time. There is no question that the rise of machine learning also brings demand for mobile apps, so most companies and agencies employ Android developers and hire iOS developers to incorporate machine learning features into them.

#hire machine learning developers in india #hire dedicated machine learning developers in india #hire machine learning programmers in india #hire machine learning programmers #hire dedicated machine learning developers #hire machine learning developers
