How Much Data is Enough to Build a Machine Learning Model

Because machine learning models learn from data, it is important to have enough data that the model can learn to handle every case it will encounter when it is actually used. A common practice is to make sure that every input to a model (such as a neural network) falls within the range seen in the training data. However, this univariate check says nothing about multivariate coverage, that is, whether combinations of input values are represented. For example, your training data may include individuals whose heights span the normal range for humans. But do you have enough males and females at each height? How about the age ranges? How fully filled out is your training data? This video shows some ways that you can measure the coverage of your data.
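To make the height, sex, and age question concrete, here is a minimal sketch of one possible coverage measure: split each numeric input into bins, cross the bins with the categories, and count how many of the resulting cells contain any training rows. It is not taken from the linked notebook. The function names (`bin_index`, `cell_coverage`), the bin edges, the number of bins, and the sample rows are all assumptions made up for illustration.

```python
from collections import Counter
from itertools import product

# Assumed (illustrative) ranges and bin counts: (low, high, number_of_bins).
HEIGHT_CM = (140.0, 200.0, 3)
AGE_YEARS = (18.0, 78.0, 3)
SEXES = ("F", "M")


def bin_index(value, low, high, n_bins):
    """Map a value in [low, high] to an equal-width bin index 0..n_bins-1."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range [{low}, {high}]")
    idx = int((value - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # the upper edge belongs to the last bin


def cell_coverage(rows, min_count=1):
    """Return (fraction of covered cells, list of under-filled cells).

    rows is an iterable of (height_cm, age_years, sex) tuples.
    A cell counts as covered when it holds at least min_count rows.
    """
    counts = Counter(
        (bin_index(h, *HEIGHT_CM), bin_index(a, *AGE_YEARS), s)
        for h, a, s in rows
    )
    all_cells = list(product(range(HEIGHT_CM[2]), range(AGE_YEARS[2]), SEXES))
    under_filled = [cell for cell in all_cells if counts[cell] < min_count]
    covered = len(all_cells) - len(under_filled)
    return covered / len(all_cells), under_filled


# Hypothetical rows: every input spans its full range when checked on its
# own, yet most height/age/sex combinations never occur.
sample_rows = [
    (150, 20, "F"), (165, 45, "F"), (145, 75, "F"),
    (170, 30, "M"), (185, 50, "M"), (195, 70, "M"),
]

fraction, empty_cells = cell_coverage(sample_rows)
print(f"covered cells: {fraction:.0%}")
print(f"cells with no data: {len(empty_cells)}")
```

With these made-up rows, each input passes a univariate range check, but only a third of the height/age/sex cells contain any data. Raising `min_count` turns the question from "is this combination present?" into "is there enough of it?". Note that the number of cells grows quickly as inputs or bins are added, so a grid like this is mainly practical for a small number of inputs.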

Source code: https://github.com/jeffheaton/present/blob/master/prepare_ai_2019/howMuchData.ipynb

Subscribe: https://www.youtube.com/channel/UCR1-GEpyOPzT2AO4D_eifdw

#machine-learning