In this article, I will provide a basic guide to the concept of Transfer Learning in deep learning.
As Machine Learning becomes ever more powerful and advanced, the models behind this capability grow ever larger, requiring huge amounts of time, power, and data to train. There is currently no sign of this trend slowing: models continue to get larger and more complicated at an increasing rate.
These requirements push access to the most powerful models and capabilities out of the reach of all but the largest technology companies, such as Microsoft, IBM, and Google. They also mean incurring a huge cost every time anyone wishes to retrain a model for a new task.
For example, cutting-edge Natural Language Processing (NLP) models such as BERT and GPT-3 have 340 million and 175 billion parameters respectively in their full-size versions, and training them involves optimising all of these parameters. BERT is reported to take four days to train as a full model on high-end computing hardware, consuming huge amounts of electricity in the process. The precise details of how GPT-3 was trained remain unclear, but it is believed to have used around 45 terabytes of data, and training is estimated to have cost $4.6M.
There are cut-down versions of these models available, which follow the general concept and architecture of the larger versions while sacrificing a degree of performance; even these, however, are sizeable. The smaller released version of BERT, for example, contains 110 million parameters.
Luckily, there is a way to work with these models that avoids fully retraining them for each new task. Once these huge models have been trained, they are often publicly released, so they are available to everyone. These released models can then be fine-tuned to deal with the specific task at hand. It is this fine-tuning stage that is referred to as 'Transfer Learning': we are transferring the knowledge embedded in the trained model to a specific purpose.
This blog will explain how Transfer Learning can be achieved, and I hope to provide demonstrations of it in a later article.
In short, Transfer Learning means taking an advanced and very complex Machine Learning model that has been pre-trained on huge amounts of data, then fine-tuning it to work on a specific task. For example, we can download the pre-trained version of BERT, which is freely available on the internet, and 'freeze' large parts of the model, keeping those learned parameters fixed.
We then allow the remaining parts of the model to train on data specific to our current text-related task. The aim is to take advantage of the general knowledge embedded within the model and apply it to a narrow task. This is many orders of magnitude cheaper and easier than retraining the model from scratch each time it is used: with reasonable hardware, a model can be fine-tuned in a few hours, as opposed to days or weeks for full training.
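The freeze-then-fine-tune idea can be sketched in a few lines of plain Python. This is a conceptual illustration only: the layer names and parameter counts below are made up for the example and are not BERT's real structure.

```python
# Conceptual sketch of transfer learning: freeze the pre-trained
# layers and train only a small task-specific head.
# Layer names and parameter counts are illustrative, not BERT's.

layers = {
    "embeddings": {"params": 23_000_000, "trainable": True},
    "encoder":    {"params": 85_000_000, "trainable": True},
    "task_head":  {"params":      1_538, "trainable": True},
}

# Freeze everything except the newly added task-specific head.
for name, layer in layers.items():
    if name != "task_head":
        layer["trainable"] = False

total = sum(l["params"] for l in layers.values())
trainable = sum(l["params"] for l in layers.values() if l["trainable"])
print(f"Training {trainable:,} of {total:,} parameters")
```

In a real deep learning framework such as PyTorch, "freezing" a layer corresponds to setting `requires_grad = False` on its parameters so the optimiser skips them; the huge saving comes from updating only the tiny task-specific fraction of the parameters, as the counts above illustrate.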
To understand Transfer Learning in more detail, we first need to understand the general concept of an Artificial Neural Network. Figure 1 below shows the conceptual layout of a very simple Neural Network. Moving left to right, data is fed into the nodes (represented by circles in the diagram) of the Input Layer, passed along the connections (shown by lines), processed in the Hidden Layer, and finally a result is produced at the Output Layer. This output could be a number we are trying to predict, such as the value of a house, where we have put its details into the model as input. Alternatively, it could be the likelihood of a tweet being 'fake news', with the text of the tweet as the input, or anything else we are looking to predict from a given relevant input.
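The forward pass described above can be written out directly. The sketch below assumes a tiny network like the one in Figure 1, with three input nodes, one hidden layer of four nodes, and a single output node; the weights are illustrative placeholders, not trained values.

```python
import math

def neuron(inputs, weights, bias):
    """One node: weighted sum of inputs plus bias, through a tanh activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return math.tanh(total)

def forward(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """Pass data left to right: input layer -> hidden layer -> output layer."""
    hidden = [neuron(inputs, w, b)
              for w, b in zip(hidden_weights, hidden_biases)]
    return neuron(hidden, out_weights, out_bias)

# Example input: three made-up house features (size, rooms, age), scaled.
x = [0.5, 0.2, 0.9]
hidden_weights = [[0.1, -0.4, 0.2],
                  [0.3,  0.8, -0.5],
                  [-0.2, 0.1,  0.7],
                  [0.6, -0.1,  0.4]]
hidden_biases = [0.0, 0.1, -0.1, 0.2]
out_weights = [0.5, -0.3, 0.8, 0.1]

y = forward(x, hidden_weights, hidden_biases, out_weights, 0.0)
print(f"Network output: {y:.3f}")
```

Training a network means adjusting all of these weights and biases to minimise prediction error; in models like BERT there are hundreds of millions of them, which is exactly why freezing most of them during fine-tuning saves so much work.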