Finding patterns or clusters in a dataset is one of the basic abilities of human intelligence.
_Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details._
Nowadays, nearly everything in our lives can be quantified by data. Whether it involves search engine results, social media usage, weather trackers, cars, or sports, data is always being collected to enhance our quality of life. How do we get from all this raw data to better performance? This article introduces the tools and techniques developed to make sense of unstructured data and discover hidden patterns. Specifically, the main topics covered are:
1. Supervised & Unsupervised Learning and the main techniques corresponding to each one (Classification and Clustering, respectively).
2. An in-depth look at the K-Means algorithm
1. Understanding the many different techniques used to discover patterns in a set of data
2. In-depth understanding of the K-Means algorithm
In unsupervised learning, we try to discover hidden patterns in data when we don’t have any labels. We will go through what hidden patterns are and what labels are, and we will work through real data examples.
What is unsupervised learning?
First, let’s step back to what learning even means. In machine learning and statistics, we are typically trying to find hidden patterns in data. Ideally, we want these hidden patterns to help us in some way: for instance, to understand some scientific results, to improve our user experience, or to maximize profit on an investment. Supervised learning is when we learn from data and have labels for all the data we have seen so far. Unsupervised learning is when we learn from data but don’t have any labels.
Let’s use an example of an email. In general, it can be hard to keep our inbox in check. We get many e-mails every day and a big problem is spam. In fact, it would be an even bigger problem if e-mail providers, like Gmail, were not so effective at keeping spam out of our inboxes. But how do they know whether a particular e-mail is a spam or not? This is our first example of a machine learning problem.
Every machine learning problem has a data set, which is a collection of data points that help us learn. Here, the data set is all the e-mails you receive over a month, and each data point is a single e-mail. Whenever you get an e-mail, you can quickly tell whether it’s spam, and you might hit a button to label it as spam or not spam. Now you can imagine that each of your data points has one of two labels, spam or not spam. In the future, you will keep getting e-mails, but you won’t know in advance which label each one should have. The machine learning problem is to predict the label of the next e-mail: spam or not spam. If our machine learning algorithm works, it can put all the spam in a separate folder.
This spam problem is an example of supervised learning. You can imagine a teacher, or supervisor, telling you the label of each data point, that is, whether each e-mail is spam or not spam. The supervisor might also be able to tell us whether the labels we predicted were correct.
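To make the spam example concrete, here is a minimal sketch of a supervised classifier. It assumes a toy naive Bayes model over word counts; the four labeled e-mails, the function names `train` and `predict`, and the choice of naive Bayes are all illustrative assumptions, not a description of how Gmail or any real provider filters spam.

```python
import math
from collections import Counter

LABELS = ("spam", "not spam")

def train(labeled_emails):
    """Count words per label from (text, label) pairs supplied by the 'supervisor'."""
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for text, label in labeled_emails:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(model, text):
    """Return the label with the highest naive Bayes log-score for a new e-mail."""
    word_counts, label_counts = model
    vocab = set().union(*word_counts.values())
    total_emails = sum(label_counts.values())
    scores = {}
    for label in LABELS:
        # Prior: how common the label is. Assumes every label appears in training.
        score = math.log(label_counts[label] / total_emails)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids log(0) for words unseen under this label.
                score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up training data: each data point is one e-mail with a human-assigned label.
labeled = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]
model = train(labeled)
print(predict(model, "claim your free prize"))        # spam
print(predict(model, "agenda for the team meeting"))  # not spam
```

The labels in `labeled` play the role of the supervisor: without them, `train` would have nothing to count against.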
So what is unsupervised learning? Let’s try another example of a machine learning problem. Imagine you are looking at your inbox and realize you have too many e-mails. It would be helpful if you could read all the e-mails on the same topic at the same time. So, you might run a machine learning algorithm that groups similar e-mails together. After running it, you find that there are natural groups of e-mails in your inbox. This is an example of an unsupervised learning problem. Nobody assigned a label to any e-mail, which means there is no supervisor.
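The grouping idea can be sketched without any labels at all. The example below assumes a very simple notion of similarity (the Jaccard overlap between the sets of words in two e-mails) and a greedy grouping rule; the function names, the threshold of 0.2, and the sample e-mails are hypothetical choices made for illustration.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 = nothing, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_emails(emails, threshold=0.2):
    """Greedily put each e-mail into the first group it is similar enough to."""
    groups = []  # each group keeps its member indices and the words seen so far
    for index, text in enumerate(emails):
        words = set(text.lower().split())
        for group in groups:
            if jaccard(words, group["words"]) >= threshold:
                group["members"].append(index)
                group["words"] |= words
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append({"words": set(words), "members": [index]})
    return [group["members"] for group in groups]

# Made-up inbox: note that no e-mail carries a label.
inbox = [
    "team meeting moved to friday",
    "cheap flights to paris this summer",
    "agenda for the friday team meeting",
    "paris summer flights deal",
]
print(group_emails(inbox))  # [[0, 2], [1, 3]]
```

The algorithm never learns what the groups mean; it only discovers that e-mails 0 and 2 resemble each other, as do e-mails 1 and 3.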
#reinforcement-learning #supervised-learning #unsupervised-learning #k-means-clustering #machine-learning
Machine learning is quite an exciting field to study and rightly so. It is all around us in this modern world. From Facebook’s feed to Google Maps for navigation, machine learning finds its application in almost every aspect of our lives.
It is both frightening and interesting to think of how our lives would look without machine learning. That is why it is important to understand what machine learning is, along with its applications and importance.
To help you understand this topic I will give answers to some relevant questions about machine learning.
But before we answer these questions, it is important to first know about the history of machine learning.
You might think that machine learning is a relatively new topic, but the concept dates back to 1950, when Alan Turing (yes, the one from The Imitation Game) published a paper posing the question “Can machines think?”
In 1957, Frank Rosenblatt designed the first neural network for computers, which is now commonly called the Perceptron Model.
In 1959, Bernard Widrow and Marcian Hoff created two neural network models: ADALINE, which could detect binary patterns, and MADALINE, which could eliminate echo on phone lines.
In 1967, the Nearest Neighbor Algorithm was written, allowing computers to perform very basic pattern recognition.
In 1981, Gerald DeJong introduced the concept of explanation-based learning, in which a computer analyses data and creates a general rule to discard unimportant information.
During the 1990s, work on machine learning shifted from a knowledge-driven approach to a more data-driven approach. During this period, scientists began creating programs for computers to analyse large amounts of data and draw conclusions or “learn” from the results. Over time, and after several further developments, this work grew into the modern age of machine learning.
Now that we know about the origin and history of machine learning, let us start by answering a simple question - What is Machine Learning?
#machine-learning #machine-learning-uses #what-is-ml #supervised-learning #unsupervised-learning #reinforcement-learning #artificial-intelligence #ai
Machine learning tasks usually come with a dataset in which each example has a set of input parameters and a corresponding known output. A model trained on such a dataset can then predict the output for similar, unseen data. This process is what happens in supervised learning.
An example of supervised learning is determining whether a patient appears to have a tumor. We have a large dataset in which each patient’s parameters are matched with the corresponding result. We can treat this as a simple classification task, with ‘1’ for tumor and ‘0’ for none.
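As a rough illustration of this kind of 1/0 classification, the sketch below uses a k-nearest-neighbours vote over two numeric parameters. The feature values are synthetic numbers invented for the example, not real medical measurements, and the choice of k-nearest-neighbours (and the name `knn_predict`) is an assumption rather than the method any particular system uses.

```python
import math
from collections import Counter

def knn_predict(training_rows, query, k=3):
    """Predict a label by majority vote among the k closest labeled examples."""
    neighbors = sorted(training_rows, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Synthetic (parameters, result) pairs: 1 = tumor, 0 = none.
training_rows = [
    ((1.0, 1.2), 0),
    ((1.5, 0.8), 0),
    ((0.9, 1.0), 0),
    ((4.0, 4.2), 1),
    ((4.5, 3.8), 1),
    ((3.9, 4.1), 1),
]

print(knn_predict(training_rows, (4.1, 4.0)))  # 1
print(knn_predict(training_rows, (1.1, 1.0)))  # 0
```

Every row pairs input parameters with a known result, which is exactly what makes the task supervised.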
However, let’s say we have a dataset of dogs and cats with no labels telling us which one is a cat and which is a dog. Problems of this kind, with unlabeled datasets, can be solved with the help of unsupervised learning. In technical terms, we can define unsupervised learning as a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision. Clustering and association are two of the most important types of unsupervised learning algorithms. Today, we will be focusing only on clustering.
Using patterns in the data, a machine learning algorithm can find similarities and sort the data points into groups. In other words, cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
In clustering, we don’t have any predictions or labeled data. We are given a set of input data points, and using these we need to find the most similar matches and group them into clusters. The clustering algorithms have a wide range of applications that we will discuss in future sections.
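To show what grouping similar points looks like in practice, here is a minimal sketch of K-Means in plain Python. It assumes 2-D points, Euclidean distance, and a naive initialisation that takes the first `k` points as starting centroids; real libraries use more careful initialisation, and the data below is made up for illustration.

```python
import math

def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, max_iterations=100):
    """Alternate between assigning points and recomputing centroids until stable."""
    centroids = list(points[:k])  # naive initialisation (an assumption of this sketch)
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # New centroid = coordinate-wise mean of the cluster's members.
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Made-up unlabeled points forming two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Note that the cluster indices 0 and 1 carry no meaning of their own; they only say which points ended up together.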
Let us analyze the clustering algorithms that are available. Among the many existing approaches, we will discuss the three most popular. We will also look at the performance metrics used for unsupervised learning and finally discuss their applications in the real world.
#data-science #machine-learning #unsupervised-learning #clustering
Self-supervised learning is an interesting research area where the goal is to learn rich representations from unlabeled data without any human annotation.
This can be achieved by creatively formulating a problem such that you use parts of the data itself as labels and try to predict them. Such formulations are called pretext tasks.
For example, you can set up a pretext task to predict the color version of an image given its gray-scale version. Similarly, you could remove a part of the image and train a model to predict the missing part from its surroundings. There are many such pretext tasks.
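The second pretext task can be sketched as a data-preparation step: hide a patch of the image and keep the hidden pixels as the label. The sketch below assumes an image stored as a nested list of numbers; the function name `make_inpainting_pair`, its parameters, and the 4x4 example image are hypothetical, and no model is trained here.

```python
def make_inpainting_pair(image, top, left, size, fill=0):
    """Build one (input, label) training pair from a single unlabeled image.

    The input is the image with a square patch blanked out; the label is the
    original content of that patch, so no human annotation is needed.
    """
    target_patch = [row[left:left + size] for row in image[top:top + size]]
    masked_image = [list(row) for row in image]  # copy so the original is untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            masked_image[r][c] = fill
    return masked_image, target_patch

# A made-up 4x4 "image" of pixel intensities.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
masked_image, target_patch = make_inpainting_pair(image, top=1, left=1, size=2)
print(masked_image[1])  # [5, 0, 0, 8]
print(target_patch)     # [[6, 7], [10, 11]]
```

A model trained to map `masked_image` to `target_patch` has to learn something about image structure, which is the representation the pretext task is meant to produce.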
#unsupervised-learning #self-supervised-learning #machine-learning #unsupervised-clustering #knowledge-distillation