Machine learning with Python: An introduction

Find out how Python compares to Java for data analysis, then use Flask to build a Python-based web service for machine learning

Machine learning is one of our most important technologies for the future. Self-driving cars, voice-controlled speakers, and face detection software are all built on machine learning technologies and frameworks. As a software developer, you may wonder how this will impact your daily work, including the tools and frameworks you should learn. If you’re reading this article, my guess is you’ve already decided to learn more about machine learning.

In my previous article, “Machine Learning for Java developers,” I introduced Java developers to setting up a machine learning algorithm and developing a simple prediction function in Java. While Java’s ecosystem includes many tools and frameworks for machine learning, Python has emerged as the most popular language for this field. Figure 1 shows the result of a recent Google Trends query combining the search term “machine learning” with “Python,” “Java,” and “R.” Although this graph is not reliable from a statistical point of view, it does allow us to visualize the popularity of Python for machine learning.

In this article, you’ll learn why Python is especially successful for machine learning and other uses involving data science. I’ll briefly introduce some of the Python-based tools data scientists and software engineers use for machine learning, and suggest a few ways to integrate Python into your machine learning development process, from mixed environments leveraging a Java backend to Python-based solutions in clouds, containers, and more.

Get the code

Get the source code for this introduction to machine learning with Python, including examples not found in the article.

A use case for machine learning

To start, let’s revisit the use case from my previous introduction to machine learning. Assume you’re working for a large, multinational real estate company, Better Home Inc. To support its agents, the company uses third-party software systems as well as a custom-developed core system. This system is built on top of a massive database containing historical data on sold homes, sale prices, and descriptions of available houses. The database is updated continuously by internal and external sources, and is used to manage sales as well as to estimate the market value of properties for sale.

An agent may enter features such as house size, year of construction, location, and so on to receive the estimated sale price. Internally, this function uses a machine learning model, essentially a mathematical expression of model parameters, to calculate a prediction. (Please see my previous article for a more detailed explanation of machine learning algorithms and how to develop and use them in Java.)

Listing 1. A machine learning model based on linear regression

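double predictPrice(double[] houseFeatures) {
    // mathematical expression (here linear regression):
    // modelParams[0] is the constant term; each following parameter
    // weights the house feature at the preceding index
    double price = this.modelParams[0] * 1;
    for (int i = 0; i < houseFeatures.length; i++) {
        price += this.modelParams[i + 1] * houseFeatures[i];
    }
    return price;
}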


In Listing 1, a machine learning model is implemented using a linear regression algorithm, which is very popular in machine learning. The algorithm multiplies model parameters with the feature parameters for a given property and sums them up. As is typical in machine learning, a training process determines the parameter values to be used for the model. This approach is called supervised learning.
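Since the rest of this article works in Python, the same expression can be sketched there too. This is a minimal illustration, not code from the earlier article: the function name `predict_price` and the example parameter values are made up for the purpose.

```python
def predict_price(model_params, house_features):
    """Linear regression prediction: constant term plus weighted sum of features."""
    # model_params[0] is the constant term; model_params[i + 1] weights house_features[i]
    if len(model_params) != len(house_features) + 1:
        raise ValueError("expected one parameter per feature plus a constant term")
    price = model_params[0]
    for weight, feature in zip(model_params[1:], house_features):
        price += weight * feature
    return price


# Hypothetical parameters: base price, price per square meter, price per room
example_params = [50_000.0, 2_000.0, 10_000.0]
example_price = predict_price(example_params, [120.0, 4.0])
```

With these made-up parameters, a house of 120 square meters with 4 rooms is priced at the base amount plus the two weighted features.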

Supervised learning consists of feeding a system labeled example records, which are then analyzed for correlations. In this case, the system is fed historical house record features that have been labeled with the sale price. The model looks for correlations between features that have some impact on sale price, as well as the weight of these relationships. Model parameters are then adjusted based on the identified correlations and weights. This is how a machine learning model “learns” to estimate the price for a given house.

Listing 2. Training model parameters

void train(double[] houseFeatures, double[] pricesOfSale) {
    // .. find hidden structures and determine the
    // proper model parameters
    this.modelParams = ...
}
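Listing 2 leaves the training step open. One common way to fill it for linear regression is batch gradient descent, sketched below in Python. This is an assumption about how training could be done, not the method used in the earlier article; the function name, the default learning rate, and the toy data are all hypothetical. The sketch also assumes feature values of modest size, since large unscaled values (such as raw square footage) would typically need rescaling for the fixed learning rate to work.

```python
def train(feature_rows, sale_prices, learning_rate=0.05, epochs=5000):
    """Fit linear regression parameters with batch gradient descent."""
    n_samples = len(feature_rows)
    n_params = len(feature_rows[0]) + 1  # one weight per feature plus a constant term
    params = [0.0] * n_params

    for _ in range(epochs):
        gradients = [0.0] * n_params
        for features, price in zip(feature_rows, sale_prices):
            inputs = [1.0] + list(features)  # the constant 1 pairs with params[0]
            predicted = sum(p * x for p, x in zip(params, inputs))
            error = predicted - price
            for j, x in enumerate(inputs):
                gradients[j] += error * x
        # Step against the gradient of half the mean squared error
        params = [p - learning_rate * g / n_samples
                  for p, g in zip(params, gradients)]
    return params


# Toy labeled records generated from price = 1 + 2 * size (arbitrary units)
toy_features = [[0.0], [1.0], [2.0], [3.0], [4.0]]
toy_prices = [1.0, 3.0, 5.0, 7.0, 9.0]
learned_params = train(toy_features, toy_prices)
```

On this toy data the learned parameters approach the constant term 1 and the weight 2 that generated the prices.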


Challenges in machine learning

While the code example may appear quite simple, the challenge is to find and train the appropriate algorithm. In contrast to linear regression, which is relatively simple, most algorithms used for machine learning are more complex. Many machine learning algorithms require additional parameters, known as hyperparameters, and tuning them requires a deeper understanding of the mathematics behind the algorithm.

Another challenge is finding and selecting appropriate training data. Data records have to be collected and understood, and collecting the records is not always easy. In order to build and train a price-prediction model, you must first locate a large number of sold house records. To be useful, each record needs not only the sale price but also other features that help define the value of the house. In many cases, this means importing and consolidating data from external as well as internal data sources. As an example, you might fetch house characteristics as well as the sale price from an internal database storing sales transactions. For additional characteristics, you might call external partner APIs that provide information regarding the transport infrastructure or income levels for the given neighborhood.
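The consolidation step can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the record keys (`postal_code`, `median_income`, `transit_stops`), the idea of joining on postal code, and the sample rows, which stand in for a real database query and a real partner API response.

```python
def consolidate(sales, neighborhoods):
    """Attach neighborhood attributes to each sale record by postal code."""
    by_postal_code = {n["postal_code"]: n for n in neighborhoods}
    merged = []
    for sale in sales:
        extra = by_postal_code.get(sale["postal_code"])
        if extra is None:
            continue  # skip sales that have no matching external data
        record = dict(sale)
        record["median_income"] = extra["median_income"]
        record["transit_stops"] = extra["transit_stops"]
        merged.append(record)
    return merged


# Hypothetical rows standing in for an internal database query ...
internal_sales = [
    {"postal_code": "10115", "size_sqm": 80.0, "sale_price": 400_000.0},
    {"postal_code": "99999", "size_sqm": 95.0, "sale_price": 310_000.0},
]
# ... and for a response from an external partner API
partner_data = [
    {"postal_code": "10115", "median_income": 42_000.0, "transit_stops": 7},
]
training_records = consolidate(internal_sales, partner_data)
```

Dropping unmatched sales is only one possible policy; keeping them with missing values and handling those later would be equally reasonable.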

Machine learning as a scientific process

Developing machine learning models is more similar to a scientific process than to traditional computer programming. A scientific process starts with a question, or an observation. For instance, you might observe that senior estate agents at Better Home Inc. are quite good at estimating the market price of a house. By interviewing these agents, you discover that they are able to quickly enumerate the features that determine the market value of a house. Furthermore, they’re well versed in market conditions for different cities and regions. From this observation, you theorize that anyone could determine the market price of a house by combining historical sales data with key features of the property. Using this data, you could develop a machine learning model capable of estimating the sale price of a house. This feature would be of value to the company because it would enable inexperienced agents to determine the expected sale price of a new offer.

In order to test your thesis, you will need to acquire and explore the selected data sets. At this point, you are seeking an overview of the data structure. To get this overview, you will likely use tools such as Tableau, KNIME, and Weka, or even simple libraries like Python Data Analysis Library (pandas) or matplotlib. Before attempting to build your machine learning models, you will also need to prepare your data records by handling invalid or missing values. Once you’ve built your models, you will need to test and validate them in order to know whether your assumptions are true or false. You might, for example, validate whether the Better Home Inc. machine learning model is capable of estimating the proper sale price of a house. In general, data exploration, analysis, cleaning, and validation are the most time-consuming activities of machine learning.
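As a rough illustration of handling invalid or missing values, the sketch below drops records that cannot serve as labeled examples and fills in missing sizes with the median. The record keys and the cleaning rules are assumptions chosen for the example, and the function expects at least one labeled record with a known size.

```python
from statistics import median


def clean_records(records):
    """Drop records without a usable label and fill in missing sizes."""
    # A record without a positive sale price cannot serve as a labeled example
    labeled = [r for r in records
               if r.get("sale_price") is not None and r["sale_price"] > 0]
    known_sizes = [r["size_sqm"] for r in labeled if r.get("size_sqm") is not None]
    fallback_size = median(known_sizes)  # raises StatisticsError if no size is known
    cleaned = []
    for r in labeled:
        fixed = dict(r)  # copy so the raw records stay untouched
        if fixed.get("size_sqm") is None:
            fixed["size_sqm"] = fallback_size
        cleaned.append(fixed)
    return cleaned


raw_records = [
    {"size_sqm": 80.0, "sale_price": 400_000.0},
    {"size_sqm": None, "sale_price": 350_000.0},   # missing size
    {"size_sqm": 120.0, "sale_price": None},       # missing label
    {"size_sqm": 100.0, "sale_price": -1.0},       # invalid label
    {"size_sqm": 60.0, "sale_price": 250_000.0},
]
usable_records = clean_records(raw_records)
```

Whether to drop a record or impute a value is a judgment call that depends on the data; median imputation is shown here only because it is simple.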

The role of the data scientist

Data scientists are frequently responsible for the major tasks of a machine learning process. Most data scientists have a background in mathematics and statistics, but they are also typically proficient with programming and data modeling skills. Data scientists often have a strong understanding of data-mining techniques, which helps them understand and select data sources, as well as gain insight from the data. Careful data analysis helps teams choose the appropriate machine learning algorithms for a given use case.

In contrast to traditional software engineers, including enterprise Java developers, a data scientist is more focused on data and the hidden patterns in data. Data scientists typically develop, train, and process machine learning models using computing environments and data platforms implemented by traditional software engineers.

Python-based tools for data analysis

Understanding the role of data scientists in machine learning helps us understand why Python is the preferred language for this field. Unlike traditional software engineers, most data scientists prefer Python as a programming language. This is because data scientists are generally closer to scientific and research communities, where R and Python are widely used. Moreover, these communities have developed Python-based scientific libraries that make it easier to develop machine learning models. Now there is a growing, Python-based tools ecosystem specifically for machine learning. This ecosystem includes Jupyter Notebook, an interactive web-based Python shell, which is the current, de facto standard in the field of data science.

Jupyter Notebook: A web interface for visualizing data analysis

Jupyter Notebook extends a command-line Python interpreter with a web-based user interface and some enhanced visualization capabilities. It integrates code and output into a single web document that combines code, explanatory text, and visualizations. The inline plotting of the output allows immediate data visualization and iterative development and analysis. A notebook is used to explore data as well as to develop, train, and test machine learning models. As an example, a data scientist working for Better Home Inc. might use a notebook to load and explore available housing data sets, as shown in Figure 3.

A notebook in Jupyter consists of input cells and output cells. The editable input cells contain common Python code, which will be executed by pressing the key combination Ctrl+Enter. In the notebook shown in Figure 3, the second input cell is used to load a houses.csv file into a pandas dataframe. The dataframe provides utilities to manipulate and visualize data in an intuitive way. The third cell of the notebook uses the dataframe to plot a histogram of house prices over time.

Data scientists use histograms and other charts and visualizations to understand data, and to identify outliers and inconsistencies in the data. Identifying inconsistencies and outliers is important because it allows you to sort through and resolve them in the data preparation process. This process eventually leads to clean data sets, which you can use to develop reliable machine learning models. You use the data sets to identify the features or house properties that are most relevant to the final sale price. These are the features that will define your machine learning model. Most algorithms aren’t intelligent enough to automatically extract meaningful features from the full data set, and most algorithms won’t work well if there are too many features to be analyzed.
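Outliers can also be flagged numerically rather than by eye. One common rule of thumb, which this article does not prescribe, marks values that lie more than 1.5 interquartile ranges outside the first or third quartile. The sketch below applies it to made-up sale prices using only the standard library; the function name and the sample values are hypothetical.

```python
from statistics import quantiles


def find_outliers(values, factor=1.5):
    """Return the values that fall outside the interquartile-range fences."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sale prices in thousands; the last one looks like a data-entry error
sale_prices = [200, 210, 220, 230, 240, 250, 260, 1500]
suspicious = find_outliers(sale_prices)
```

A flagged value is only a candidate for review. Whether it is an error or a genuinely unusual house still has to be decided during data preparation.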

Scikit-learn: A library of advanced machine learning algorithms

I explained in my Java-based introduction to machine learning that logistic regression algorithms require numeric values. For such a machine learning model, all of your strings or category values must be converted to numeric values. The process of conversion is done during feature extraction. One way to extract features is to develop a dedicated function that converts the raw input of house records into a vectorized representation that the algorithm can understand.

Below is a simplified extract_core_features() method written in Python. If you are unfamiliar with Python, don’t be confused by the self argument. In Python, the first argument of an instance method definition is a reference to the current instance of the class, conventionally named self. On the caller side, this argument is passed automatically when the method is called on an object.
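A method of this kind might look like the following sketch. It is an illustration under assumptions rather than the article's original listing: the class name `HouseFeatureExtractor`, the record keys, and the list of heating types are all hypothetical. It converts numeric fields to floats and turns one category value into a one-hot group of 0.0 and 1.0 entries.

```python
class HouseFeatureExtractor:
    """Hypothetical class; the record keys and categories are illustrative."""

    HEATING_TYPES = ["gas", "oil", "electric"]

    def extract_core_features(self, house_record):
        """Convert one raw house record into a list of numeric features."""
        features = [
            float(house_record["size_sqm"]),
            float(house_record["year_built"]),
        ]
        # One-hot encode the category: 1.0 at the matching position, else 0.0
        for heating_type in self.HEATING_TYPES:
            features.append(1.0 if house_record["heating"] == heating_type else 0.0)
        return features


extractor = HouseFeatureExtractor()
# self is not passed explicitly; Python binds it to extractor
vector = extractor.extract_core_features(
    {"size_sqm": "120", "year_built": 1995, "heating": "oil"}
)
```

The call passes only the record; the resulting vector holds the size, the year, and three heating-type entries with a single 1.0 marking "oil".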

A significant portion of the machine learning code data scientists write is for feature extraction. In the field of natural language processing, for instance, several non-trivial conversion steps are required to transform human text into a vectorized form.
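To give a sense of what such a conversion involves, here is a minimal bag-of-words sketch, one of the simplest ways to vectorize text. It is a deliberately reduced example with hypothetical function names and sample descriptions; real natural language pipelines usually add further steps that are not shown here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a fixed vector index, in sorted order."""
    tokens = {token for doc in documents for token in tokenize(doc)}
    return {token: index for index, token in enumerate(sorted(tokens))}


def vectorize(text, vocabulary):
    """Count how often each vocabulary token occurs in the text."""
    vector = [0] * len(vocabulary)
    for token, count in Counter(tokenize(text)).items():
        if token in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[token]] = count
    return vector


# Made-up listing descriptions used to build the vocabulary
descriptions = ["Bright flat with garden", "Flat with garage and garden view"]
vocabulary = build_vocabulary(descriptions)
text_vector = vectorize("Garden flat, garden view", vocabulary)
```

Each position in the vector corresponds to one word of the vocabulary, so texts of any length end up as numeric vectors of the same size.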
