Hello again, lovely people! We’re back this week with another data science quick tip, and this one is sort of a two-parter. In this first part, we’ll cover how to use Scikit-Learn pipelines with Scikit-Learn’s built-in transformers, and in the next part, I’ll teach you how to use your own custom data transformers within this same pipeline framework. (Stay tuned for that post!)
Before getting into things, let me share my GitHub for this post in case you want to follow along more closely. I’ve also included the data we’ll be working with. Check it all out at this link.
As always, let’s start with the intuition on why you would want to leverage something like this. I’m assuming you’re already familiar with the idea that predictive models are often exported as binary pickle files. These binaries are then imported for use elsewhere, in things like APIs. That way, when you receive data through said API, your deserialized pickle can perform the prediction and ship the results back to your eager, smiling user.
But it’s not always the case that your data will arrive ready to go for your API! In many cases, you’ll have to do a little preprocessing work before it can be appropriately put through the model for prediction. I’m talking about things like one-hot encoding, scaling, imputing, and more. The Scikit-Learn package offers a number of these transformers to use, but if you do NOT use a pipeline, you’ll have to serialize each individual transformer separately. In the end, you could end up with like 6–7 serialized pickle files. Not ideal!
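To make that concrete, here’s a minimal sketch of the approach: bundle the imputers, one-hot encoder, scaler, and the model itself into a single Scikit-Learn `Pipeline`, so you serialize exactly one pickle instead of a handful. The toy DataFrame, column names, and the choice of `LogisticRegression` are all my own illustrative assumptions, not anything specific from this post’s dataset.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with hypothetical column names -- substitute your own.
df = pd.DataFrame({
    "color": ["red", "blue", np.nan, "red"],
    "size": [1.0, 2.5, np.nan, 3.0],
    "label": [0, 1, 0, 1],
})
X, y = df[["color", "size"]], df["label"]

# Each column type gets its own little chain of transformers.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("cat", categorical, ["color"]),
    ("num", numeric, ["size"]),
])

# Preprocessing + model in ONE pipeline, so it all fits in ONE pickle.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(X, y)

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, in your API: one load, and raw data goes straight to predict().
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)
preds = loaded.predict(X)
```

The nice part is that `loaded.predict` accepts the same raw columns your API receives; the imputing, encoding, and scaling all happen inside the deserialized object.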