I have created a list of basic Machine Learning Interview Questions and Answers. These Machine Learning Interview Questions are common, simple and straight-forward.
These questions are categorized into 8 groups:
1. Basic Introduction
2. Data Exploration and Visualization
3. Data Preprocessing and Wrangling
4. Dimensionality Reduction
5. Algorithms
6. Accuracy Measurement
7. Python
8. Practical Implementations
These Machine Learning Interview Questions cover the following basic concepts of Machine Learning:
1. General introduction to Machine Learning
2. Data Analysis, Exploration, Visualization and Wrangling techniques
3. Dimensionality Reduction techniques like PCA (Principal Component Analysis), SVD (Singular Value Decomposition), LDA (Linear Discriminant Analysis), MDS (Multidimensional Scaling), t-SNE (t-Distributed Stochastic Neighbor Embedding) and ICA (Independent Component Analysis)
4. Supervised and Unsupervised Machine Learning algorithms like K-Nearest Neighbors (KNN), Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), Linear Regression, Logistic Regression, K-Means Clustering, Time Series Analysis, Sentiment Analysis etc.
5. Bias and Variance, Overfitting and Underfitting, Cross-validation
6. Regularization, Ridge, Lasso and Elastic Net Regression
7. Bagging and Boosting techniques like Random Forest, AdaBoost, Gradient Boosting Machine (GBM) and XGBoost.
8. Basic data structures and libraries of Python used in Machine Learning
I will keep adding more questions to this list in the future.
Basic Introduction (7 Questions)
What is Machine Learning? What are its various applications? Why is Machine Learning gaining so much attention nowadays?
What is the difference between Artificial Intelligence, Machine Learning and Deep Learning?
What are various types of Machine Learning? What is Supervised Learning, Unsupervised Learning, Semi-supervised Learning and Reinforcement Learning? Give some examples of these types of Machine Learning.
Explain Deep Learning and Neural Networks.
What is the difference between Data Mining and Machine learning?
What is the difference between Inductive and Deductive Machine Learning?
What are the various steps involved in a Machine Learning Process?
Data Exploration and Visualization (4 Questions)
What is Hypothesis Generation? What is the difference between Null Hypothesis (Ho) and Alternate Hypothesis (Ha)?
What is Univariate, Bivariate and Multivariate Data Exploration?
Explain various plots and grids available for data exploration in seaborn and matplotlib libraries?
Data Preprocessing and Wrangling (19 Questions)
What is the difference between Data Processing, Data Preprocessing and Data Wrangling?
What is Data Wrangling? What are the various steps involved in Data Wrangling?
What is the difference between **Labeled** and Unlabeled data?
What do you mean by **Features** and **Labels** in the dataset?
What are the **Independent / Explanatory** and **Dependent** variables?
What is the difference between **Continuous** and **Categorical / Discrete** variables?
What do you mean by **Noise** in the dataset? How do you remove it?
What are **Skewed Variables** and Outliers in the dataset? What are the various ways to visualize and remove these? What do you mean by log transformation of skewed variables?
What is the difference between Mean, Median and Mode? How are these terms used to impute missing values in numeric variables?
How will you calculate **Mean, Variance** and **Standard Deviation** of a feature / variable in a given dataset? What is the formula?
What is Binning Technique? What is the difference between Fixed Width Binning and Adaptive Binning?
Which Machine Learning Algorithms require Feature Scaling (Standardization and Normalization) and which do not?
What do you mean by Imbalanced Dataset? How will you handle it?
What is the difference between "Training" dataset and "Test" dataset? What are the common ratios we generally maintain between them?
What is the difference between Validation set and **Test** set?
What do you understand by Fourier Transform? How is it used in Machine Learning?
Dimensionality Reduction (9 Questions)
Feature Selection and Feature Extraction
4. **Principal Component Analysis**
What is SVD (Singular Value Decomposition)?
Linear Discriminant Analysis
Algorithms (27 Questions)
2. **Supervised Learning**
What is the difference between Linear Regression and Logistic Regression?
Compare KNN, SVM and Naive Bayes.
What is the difference between Decision Tree and Random Forest?
Bias and Variance
Hint: You should explain Dimensionality Reduction Techniques, Regularization, Cross-validation, Decision Tree Pruning and Ensemble Learning Techniques.
What is the difference between Random Forest and AdaBoost?
GBM (Gradient Boosting Machine)
What is the difference between AdaBoost and GBM?
24. **K-Means Clustering**
What is the difference between KNN and K-Means Clustering algorithms?
Time Series Analysis
Accuracy Measurement (10 Questions)
Classification metrics: Confusion Matrix, Classification Report, Accuracy Score etc.
Regression metrics: MAE, MSE, RMSE
What is Confusion Matrix? What do you mean by True Positive, True Negative, False Positive and False Negative in Confusion Matrix?
How do we manually calculate Accuracy Score from Confusion Matrix?
What is Sensitivity (True Positive Rate) and Specificity (True Negative Rate)? How will you calculate it from Confusion Matrix? What is its formula?
What is the difference between **Precision** and Recall? How will you calculate them from the Confusion Matrix? What are their formulas?
What do you mean by ROC (Receiver Operating Characteristic) curve and AUC (Area Under the ROC Curve)? How is this curve used to measure the performance of a classification model?
What do you understand by Type I vs Type II error? What is the difference between them?
What is Classification Report? Describe its various attributes like Precision, Recall, F1 Score and Support.
What is the difference between F1 Score and Accuracy Score?
What do you mean by Loss Function? Name some commonly used Loss Functions. Define Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, Sum of Absolute Error, Sum of Squared Error, R Square Method, Adjusted R Square Method.
Python (16 Questions)
What are the commonly used libraries in Python for Machine Learning? Explain pandas, numpy, sklearn, matplotlib, seaborn and scipy libraries.
Which data structures in Python are commonly used in Machine Learning? Explain tuple, list and **dictionary**.
What are **mutable** and **immutable** objects in Python?
What are the magic functions in IPython?
What is the purpose of writing "inline" with "%matplotlib" (%matplotlib inline)?
What are the basic steps to implement any Machine Learning algorithm in Python?
What is the **random_state (seed)** parameter in train_test_split?
What are the various metrics present in the **sklearn** library to measure the accuracy of an algorithm? Describe classification_report, confusion_matrix, accuracy_score, f1_score, r2_score, score, mean_absolute_error, mean_squared_error.
Practical Implementations (5 Questions)
Write a pseudo code for a given algorithm.
What are the parameters on which we decide which algorithm to use for a given situation?
How will you design a Chess Game, Spam Filter, Recommendation Engine etc.?
How can you use Machine Learning Algorithms to increase revenue of a company?
How will you design a promotion campaign for a business using Machine Learning?
In this post, I will be discussing the top Machine Learning related questions asked in your interviews.
Originally published by Zulaikha Lateef at https://www.edureka.co
Ever since machines started learning and reasoning without human intervention, the pace of technical evolution has only accelerated. Needless to say, the world has changed since Artificial Intelligence, Machine Learning and Deep Learning were introduced, and it will continue to do so. In this Machine Learning Interview Questions post, I have collected the questions most frequently asked by interviewers. These questions were collected after consulting with Machine Learning Certification Training experts.
In case you have attended any Machine Learning interview in the recent past, do paste those interview questions in the comments section and we’ll answer them at the earliest. You can also comment below if you have any questions in your mind, which you might face in your Machine Learning interview.
You may go through this recording of Machine Learning Interview Questions and Answers where our instructor has explained the topics in a detailed manner with examples that will help you to understand this concept better.
For your better understanding, I have divided this blog into the following 3 sections:
Types of Machine Learning – Machine Learning Interview Questions
There are three ways in which machines learn:
Supervised learning is a method in which the machine learns using labeled data.
Unsupervised learning is a method in which the machine is trained on unlabeled data, without any guidance.
Reinforcement learning involves an agent that interacts with its environment by producing actions and discovering errors or rewards.
Deep Learning vs Machine Learning – Machine Learning Interview Questions
Classification vs Regression – Machine Learning Interview Questions
Let me explain this with an analogy:
However, you might be wrong in some answers.
Let’s consider a scenario of a fire emergency:
A confusion matrix or an error matrix is a table which is used for summarizing the performance of a classification algorithm.
Confusion Matrix – Machine Learning Interview Questions
Consider the above table where:
Inductive vs Deductive learning – Machine Learning Interview Questions
K-means vs KNN – Machine Learning Interview Questions
The Receiver Operating Characteristic curve (or ROC curve) is a fundamental tool for diagnostic test evaluation. It is a plot of the true positive rate (Sensitivity) against the false positive rate (1 − Specificity) for the different possible cut-off points of a diagnostic test.
ROC – Machine Learning Interview Questions
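To make the definition concrete, here is a minimal sketch of how the points of a ROC curve could be computed by hand. The function name `roc_points`, the labels, the scores and the thresholds are all hypothetical and chosen only for illustration; the sketch assumes the data contains at least one positive and one negative example.

```python
def roc_points(y_true, scores, thresholds):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = []
    for t in thresholds:
        # Predict positive whenever the score reaches the cut-off point.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical labels and classifier scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores, thresholds=[0.0, 0.3, 0.5, 1.0])
print(points)
```

Each cut-off point contributes one (false positive rate, true positive rate) pair; plotting these pairs and joining them gives the ROC curve.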
Type 1 vs Type 2 Error – Machine Learning Interview Questions
Q13. Is it better to have too many false positives or too many false negatives? Explain.
False Negatives vs False Positives – Machine Learning Interview Questions
It depends on the question as well as on the domain for which we are trying to solve the problem. If you’re using Machine Learning in the domain of medical testing, then a false negative is very risky, since the report will not show any health problem when a person is actually unwell. Similarly, if Machine Learning is used in spam detection, then a false positive is very risky because the algorithm may classify an important email as spam.
Model Accuracy vs Performance – Machine Learning Interview Questions
Well, you must know that model accuracy is only a subset of model performance. In general, the two go hand in hand: the better the model performs, the more accurate its predictions are.
Over-fitting occurs when a model studies the training data to such an extent that it negatively influences the performance of the model on new data.
This means that the noise in the training data is recorded and learned as concepts by the model. The problem is that these concepts do not apply to the testing data, so they hurt the model’s ability to classify new data and reduce its accuracy on the testing data.
Three main methods to avoid overfitting:
Ensemble Learning – Machine Learning Interview Questions
Ensemble learning is a technique that is used to create multiple Machine Learning models, which are then combined to produce more accurate results. A general Machine Learning model is built by using the entire training data set. However, in Ensemble Learning the training data set is split into multiple subsets, wherein each subset is used to build a separate model. After the models are trained, they are then combined to predict an outcome in such a way that the variance in the output is reduced.
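The idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the subsets are drawn by sampling with replacement (bagging, one common way to form them), each model is a simple one-feature threshold classifier, and the function names and toy data are hypothetical.

```python
import random


def train_stump(xs, ys):
    """Fit a one-feature threshold classifier: midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def train_ensemble(xs, ys, n_models, rng):
    """Train one stump per random subset (sampled with replacement) of the training data."""
    thresholds = []
    n = len(xs)
    while len(thresholds) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_y = [ys[i] for i in idx]
        if len(set(sample_y)) < 2:
            continue  # resample if a class is missing from this subset
        thresholds.append(train_stump([xs[i] for i in idx], sample_y))
    return thresholds


def predict(thresholds, x):
    """Combine the models by majority vote."""
    votes = sum(1 for t in thresholds if x >= t)
    return 1 if votes > len(thresholds) / 2 else 0


# Hypothetical toy data: the label is 1 when x >= 10.
xs = list(range(20))
ys = [1 if x >= 10 else 0 for x in xs]
ensemble = train_ensemble(xs, ys, n_models=5, rng=random.Random(0))
print(predict(ensemble, 2), predict(ensemble, 17))
```

Each individual threshold varies with the subset it was trained on, but the majority vote is more stable than any single model, which is the variance reduction described above.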
Bagging & Boosting – Machine Learning Interview Questions
The following methods can be used to screen outliers:
How do you handle these outliers?
Eigenvalue & Eigenvectors – Machine Learning Interview Questions
In the above example, 3 is an Eigenvalue, with the original vector in the multiplication problem being an eigenvector.
The Eigenvector of a square matrix A is a nonzero vector x such that for some number λ, we have the following:
Ax = λx,
where λ is an Eigenvalue
So, in our example, λ = 3 and x = [1 1 2]
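The relation Ax = λx can be checked numerically. The matrix below is hypothetical: it is not the one from the original example, it was simply chosen so that [1 1 2] is an eigenvector with eigenvalue 3.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


# Hypothetical matrix chosen so that [1, 1, 2] is an eigenvector with eigenvalue 3.
A = [
    [2, 1, 0],
    [1, 2, 0],
    [0, 2, 2],
]
x = [1, 1, 2]
lam = 3

Ax = mat_vec(A, x)            # left-hand side:  A x
lam_x = [lam * v for v in x]  # right-hand side: lambda x
print(Ax, lam_x)
```

Both sides come out as the same vector, which is exactly what it means for x to be an eigenvector of A with eigenvalue λ.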
A/B Testing – Machine Learning Interview Questions
Here p and q are the probabilities of success and failure, respectively, in that node.
This set of Machine Learning interview questions deals with Python-related Machine Learning questions.
Here is a list of Python libraries mainly used for Data Analysis:
Python Libraries – Machine Learning Interview Questions
It depends on the visualization you’re trying to achieve. Each of these libraries is used for a specific purpose:
Pandas Series vs DataFrame – Machine Learning Interview Questions
Consider the following Python code:
    import pandas as pd

    bill_data = pd.read_csv("datasets/Telecom Data Analysis/Bill.csv")
    print(bill_data.shape)

    # Identify duplicate records in the data
    dupes = bill_data.duplicated()
    print(dupes.sum())

    # Remove duplicates
    bill_data_uniq = bill_data.drop_duplicates()
    # Import the dataset
    from sklearn import datasets, tree
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    iris = datasets.load_iris()
    X = iris.data
    Y = iris.target

    # Split the dataset into training and test halves
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5)

    # Select and train the classifier
    my_classifier = tree.DecisionTreeClassifier()
    my_classifier.fit(X_train, Y_train)
    predictions = my_classifier.predict(X_test)

    # Check accuracy
    print(accuracy_score(Y_test, predictions))

3 - Machine Learning Scenario Based Questions
This set of Machine Learning interview questions deals with scenario-based Machine Learning questions.
    SELECT f.user_id, l.page_id
    FROM friend f
    JOIN like l ON f.friend_id = l.user_id
    WHERE l.page_id NOT IN (
        SELECT page_id FROM like WHERE user_id = f.user_id
    )
(note: here 0.96 denotes the chance of not seeing an ad in 100 stories, 99 denotes the possibility of seeing only 1 ad, 0.04 is the probability of seeing an ad once in 100 stories )
Since the data is spread across the median, let’s assume it’s a normal distribution.
As you know, in a normal distribution, ~68% of the data lies within 1 standard deviation of the mean (which coincides with the mode and median), which leaves ~32% of the data outside that range. Therefore, ~32% of the data would remain unaffected by missing values.
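The ~68% and ~32% figures can be verified with the standard library. This is a small sketch assuming the data really is normally distributed; the helper name `fraction_within` is hypothetical.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


within_one_sd = fraction_within(1)   # roughly 0.68
outside_one_sd = 1 - within_one_sd   # roughly 0.32
print(round(within_one_sd, 4), round(outside_one_sd, 4))
```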
You can do the following:
Type 1: How to tackle high variance?
Type 2: How to tackle high variance?
Possibly, you might get tempted to say no, but that would be incorrect.
Discarding correlated variables will have a substantial effect on PCA because, in the presence of correlated variables, the variance explained by a particular component gets inflated.
Yes, it is possible.
Q15. ‘People who bought this also bought…’ recommendations seen on Amazon is based on which algorithm?
E-commerce websites like Amazon make use of Machine Learning to recommend products to their customers. The basic idea of this kind of recommendation comes from collaborative filtering. Collaborative filtering is the process of comparing users with similar shopping behaviors in order to recommend products to a new user with similar shopping behavior.
Collaborative Filtering – Machine Learning Interview Questions
To better understand this, let’s look at an example. Let’s say user A, a sports enthusiast, bought pizza, pasta, and a coke. A couple of weeks later, another user B, who rides a bicycle, buys pizza and pasta. He does not buy the coke, but Amazon recommends a bottle of coke to user B since his shopping behavior and lifestyle are quite similar to user A’s. This is how collaborative filtering works.
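The pizza, pasta and coke example can be sketched as a tiny user-based collaborative filter. This is not how Amazon implements it; it is only a minimal illustration that assumes purchase histories are sets and uses Jaccard similarity (a choice made here, not taken from the text). User C and the function names are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two purchase sets: shared items divided by all items."""
    return len(a & b) / len(a | b)


def recommend(target, purchases):
    """Suggest items bought by the most similar other user but not yet by the target."""
    others = {user: items for user, items in purchases.items() if user != target}
    nearest = max(others, key=lambda user: jaccard(purchases[target], others[user]))
    return sorted(others[nearest] - purchases[target])


# Hypothetical purchase histories mirroring the example (user C is made up).
purchases = {
    "A": {"pizza", "pasta", "coke"},
    "B": {"pizza", "pasta"},
    "C": {"notebook", "pen"},
}
print(recommend("B", purchases))
```

User B shares two of three items with user A and nothing with user C, so A is the nearest neighbor and the one item A bought that B did not is recommended.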
This complete Machine Learning full course video covers all the topics that you need to know to become a master in the field of Machine Learning.
Machine Learning Full Course | Learn Machine Learning | Machine Learning Tutorial
It covers all the basics of Machine Learning (01:46), the different types of Machine Learning (18:32), and the various applications of Machine Learning used in different industries (04:54:48). This video will help you learn different Machine Learning algorithms in Python. Linear Regression, Logistic Regression (23:38), K Means Clustering (01:26:20), Decision Tree (02:15:15), and Support Vector Machines (03:48:31) are some of the important algorithms you will understand with a hands-on demo. Finally, you will see the essential skills required to become a Machine Learning Engineer (04:59:46) and come across a few important Machine Learning interview questions (05:09:03). Now, let's get started with Machine Learning.
Below topics are explained in this Machine Learning course for beginners:
Basics of Machine Learning - 01:46
Why Machine Learning - 09:18
What is Machine Learning - 13:25
Types of Machine Learning - 18:32
Supervised Learning - 18:44
Reinforcement Learning - 21:06
Supervised VS Unsupervised - 22:26
Linear Regression - 23:38
Introduction to Machine Learning - 25:08
Application of Linear Regression - 26:40
Understanding Linear Regression - 27:19
Regression Equation - 28:00
Multiple Linear Regression - 35:57
Logistic Regression - 55:45
What is Logistic Regression - 56:04
What is Linear Regression - 59:35
Comparing Linear & Logistic Regression - 01:05:28
What is K-Means Clustering - 01:26:20
How does K-Means Clustering work - 01:38:00
What is Decision Tree - 02:15:15
How does Decision Tree work - 02:25:15
Random Forest Tutorial - 02:39:56
Why Random Forest - 02:41:52
What is Random Forest - 02:43:21
How does Decision Tree work - 02:52:02
K-Nearest Neighbors Algorithm Tutorial - 03:22:02
Why KNN - 03:24:11
What is KNN - 03:24:24
How do we choose 'K' - 03:25:38
When do we use KNN - 03:27:37
Applications of Support Vector Machine - 03:48:31
Why Support Vector Machine - 03:48:55
What is Support Vector Machine - 03:50:34
Advantages of Support Vector Machine - 03:54:54
What is Naive Bayes - 04:13:06
Where is Naive Bayes used - 04:17:45
Top 10 Application of Machine Learning - 04:54:48
How to become a Machine Learning Engineer - 04:59:46
Machine Learning Interview Questions - 05:09:03
Top 10 Machine Learning Interview Questions 2019 - Check out ten machine learning interview questions and answers.
Originally published by Vibhuthi Viswanathan at dzone.com
Emerging technologies have taken the world by storm. The innovations, opportunities, and threats they have unleashed are like no other. Along with their growth, the demand for specialists in these areas has grown.
As per the findings of the latest industry report, jobs in emerging technologies like machine learning, artificial intelligence, and data science rank among the top emerging jobs. A career in emerging technologies such as machine learning, AI, or data science can be highly lucrative as well as intellectually stimulating.
In this article, I have compiled some of the most frequently asked machine learning interview questions with their corresponding answers. Machine learning aspirants, as well as experienced ML professionals, can use this to revise their fundamentals before the interview.
Machine Learning Interview Questions 2019
1. Differentiate Machine Learning and Deep Learning
Machine learning, a subset of artificial intelligence, gives machines the capability to learn and improve automatically without any explicit programming. Deep learning, in turn, is a subset of machine learning that uses artificial neural networks capable of making intuitive decisions.
2. What do you understand by the terms Recall and Precision?
Recall is also called the true positive rate. It is the number of positives correctly identified by your model compared to the number of actual positives present throughout the data.
Precision, also called the positive predictive value, is based on prediction. It is the number of correct positives the model has claimed compared to the total number of positives the model has claimed.
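As a worked example, the two definitions can be computed directly from true and predicted labels. The labels below are hypothetical, and the sketch assumes at least one actual positive and one predicted positive so that neither denominator is zero.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)  # of everything predicted positive, how much was right
    recall = tp / (tp + fn)     # of everything actually positive, how much was found
    return precision, recall


# Hypothetical labels: 4 actual positives, 5 predicted positives, 3 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here the model claims 5 positives, of which 3 are correct (precision 3/5), and it finds 3 of the 4 actual positives (recall 3/4).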
3. Differentiate between Supervised Machine Learning and Unsupervised Machine Learning
In supervised learning, the machine is trained with the help of labeled data, i.e., data that is tagged with the right answers. In unsupervised machine learning, the model learns by discovering information by itself. Compared to supervised learning models, unsupervised models are often preferred for more complex processing tasks.
4. What are K-means and KNN?
K-means is an unsupervised algorithm used for clustering problems, whereas KNN, or K-nearest neighbors, is a supervised algorithm used for regression and classification.
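The contrast is easier to see side by side. The sketch below works on one-dimensional toy data and is only an illustration: the function names, the data and the choice of initial centroids (the smallest and largest point) are assumptions made here. Note that K-means receives no labels at all, while KNN needs a labeled point set.

```python
from collections import Counter


def k_means_1d(points, centroids, iterations=10):
    """Unsupervised: group unlabeled points around centroids (no labels needed)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def knn_predict(labeled_points, query, k=3):
    """Supervised: label a new point by majority vote of its k nearest labeled neighbors."""
    nearest = sorted(labeled_points, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical one-dimensional data, for illustration only.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = k_means_1d(points, centroids=[min(points), max(points)])

labeled = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
           (10.0, "high"), (11.0, "high"), (12.0, "high")]
label = knn_predict(labeled, query=2.5)
print(centroids, label)
```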
5. What makes Classification different from Regression?
Both these concepts are an important aspect of supervised machine learning techniques. With Classification, the output is classified into different categories for making predictions. Regression models, in contrast, are usually used to find the relationship between variables and to forecast values. A key difference between classification and regression is that in the former the output variable is discrete and it is continuous in the latter.
6. How will you deal with missing data in a dataset?
One of the greatest challenges faced by a data scientist pertains to the problem of missing data. You can handle missing values in many ways, including assigning a unique category, deleting rows, substituting the mean/median/mode, using algorithms that support missing values, and predicting the missing value, to name a few.
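Substituting the mean, median or mode is the simplest of these options to show in code. The sketch below uses only the standard library; the function name `impute`, the use of `None` to mark a missing value, and the sample column are assumptions for illustration.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy](observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [20, 30, None, 30, 60, None]
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
mode_filled = impute(ages, "mode")
print(mean_filled, median_filled, mode_filled)
```

With a skewed column like this one, the mean (35) is pulled toward the large value while the median and mode (both 30) are not, which is one reason the median is often preferred when outliers are present.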
7. What do you understand by Inductive Logic Programming (ILP)?
A subfield of machine learning, Inductive Logic Programming searches for patterns in data by using logic programming to develop predictive models. In this process, logic programs are used to represent the hypotheses and the background knowledge.
8. What are the steps you need to ensure you don't overfit with a specific model?
When a model is fitted too closely to the training data, it starts to learn from the noise and other inaccurate entries in the data set. This makes it difficult for the model to generalize to new instances beyond the training set. There are three ways by which you can avoid overfitting in machine learning. The first way is by keeping the model simple, the second way is by using cross-validation techniques and thirdly, by using regularization techniques, for example, LASSO.
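Of the three ways, cross-validation is the easiest to sketch without any library. The generator below shows one possible way to produce k-fold train/validation index splits; the name `k_fold_indices` is hypothetical, and the sketch does not shuffle the data, which a real workflow usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Every sample is used for validation exactly once, so the averaged validation score reflects how the model behaves on data it was not trained on.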
9. What is Ensemble Learning?
Ensemble methods are also called multiple classifier systems or committee-based learning. They are learning algorithms that build a set of classifiers and then classify new data points by combining their predictions, for example by voting. This method trains a number of hypotheses to address the same problem. A well-known example of ensemble modeling is the random forest, where many decision trees are used to predict the result.
10. Name the steps that are required in a machine learning project
Some of the critical steps that you should take for achieving a good working model are collecting data, preparing data, selecting a machine learning model, training the model, evaluating the model, tuning the parameters, and lastly, prediction.