Discover the many uses of encoders for the purpose of machine-learning with this “ from-scratch” walkthrough! In a perfect world, all programmers, scientists, data-engineers, analysts, and machine-learning engineers alike dream that all data could arrive at their doorstep in the cleanest form possible. Unfortunately, humans went and developed the phonetic alphabet before they started talking in binary, or “ beep boop” speech.
In a perfect world, all programmers, scientists, data-engineers, analysts, and machine-learning engineers alike dream that all data could arrive at their doorstep in the cleanest form possible. Unfortunately, humans went and developed the phonetic alphabet before they started talking in binary, or “ beep boop” speech. As a result, it is unfortunately incredibly common to come across words (or “ strings” in “ beep boop” language) rather than numbers when working with data-sets, and this is even true of the cleanest data-sets available today.
The problem with the combination of data and strings and words is that words cannot directly be analyzed by an artificial brain. Computers speak quantitatively, rather than qualitatively. Asking a computer to interpret words, especially sentences with subjective meaning or emotion is like having the Cookie monster eat celery;
it’s just not going to happen.
Fortunately, there is a solution to this problem — there are many different ways that you can approach turning words into numbers for analysis! Though doing so might not allow a computer to analyze certain things about words, it can certainly help with solving common machine-learning problems that you may encounter in the educational grind that is Data-Science. Typically, whenever machine-learning is being done with strings, a Data-Scientist will be working with an encoder. Without further ado, let’s look at some encoders!
If you’re new to machine-learning, one trick you should definitely snatch up as soon as possible is the ability to One-Hot-Encode a Data-Frame. One-Hot-Encoding, also called One-Hot, or Dummy-Encoding takes a very radical approach to dealing with categorical variables. Typically, I use One-Hot in situations where I have as few categories as possible. This is because first and foremost, One-Hot-Encoded data takes up a lot of memory and disk space compared to the other algorithms available. Additionally, One-Hot really shines in this exact light.
Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.
Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.
Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.
Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.
Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.