If your model is not performing at the level you believe it should be performing, you’ve come to the right place! This reference article will detail common issues and solutions in model building.

**Problem:** Your neural network performs very well in training but poorly on validation (testing) sets.

**Issue**: Your neural network is probably overfitting to the training data. This means that instead of learning deeper, generalizable patterns in the training examples, it is ‘taking the easy way out’ and simply memorizing the data points, which a sufficiently large architecture makes possible.

**Solutions:**

- Simplify the model architecture. When your neural network has too many layers and nodes, it has the capacity to memorize the data instead of learning generalized patterns. By reducing the storage capacity of a neural network, you are taking away its ability to ‘cheat’ its way to high performance.
- Early stopping is a form of regularization that involves halting training at the point where the validation error is smallest, before the network begins to overfit.
- Data augmentation, which applies mainly to images, is a good way to drastically increase the effective dataset size and hence make it much harder for the neural network to memorize everything. It also helps you get the most out of each image, but it’s important to be careful with how augmentations are performed. For instance, if you allow vertical flips as an augmentation on the MNIST digits dataset, the model will have difficulty differentiating 6 and 9, and the augmentation will do more harm than good.
- Use regularization, which aims to reduce the complexity of the model. L1 regularization penalizes weights by their absolute value, whereas L2 regularization penalizes weights by their square. Hence, L2 puts disproportionately high penalties on large weights: shrinking a weight from 5 to 4 saves 25 − 16 = 9 in penalty, but shrinking a weight from 1 to 0 saves only 1 − 0 = 1. As a result, L2 regularization yields coefficients that are very close to zero but not exactly zero, since there is little relative incentive to eliminate already-small weights. L1 regularization, on the other hand, rewards every unit of shrinkage equally (a decrease from 1000 to 999 saves as much penalty as one from 1 to 0), so it keeps pushing coefficients all the way down to zero whenever it is profitable to do so.
- Generally speaking, L2 regularization may be better for more complex tasks and L1 for simpler ones, but the right choice depends entirely on the nature of the task at hand.
- Adding dropout as a layer can help reduce a model’s ability to simply memorize information and hence overfit. The dropout layer takes in inputs from the previous layer and randomly blocks a prespecified fraction of them on each pass, forcing the network to adapt. With this handicap introduced, the reasoning goes, the neural network must learn to select and spread the most important information across nodes, in anticipation that some of it is bound to be blocked.
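The early-stopping rule described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not any framework’s API; the validation losses are made up, and `patience` (how many non-improving epochs to tolerate) is a hypothetical parameter:

```python
def early_stop(val_losses, patience=3):
    """Return the epoch index where training should have stopped:
    the point of lowest validation loss, found by waiting until the
    loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # loss has stopped improving; overfitting likely
    return best_epoch

# Validation loss dips, then rises as the model starts to overfit.
losses = [0.9, 0.7, 0.5, 0.45, 0.47, 0.52, 0.60]
print(early_stop(losses))  # prints 3, the epoch with the smallest loss
```

Real frameworks implement the same idea as a callback; the key design choice is the patience value, which trades off stopping too early on a noisy loss curve against training too long.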
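The L1-versus-L2 arithmetic above can be checked directly. This is a minimal sketch of the two penalty functions applied to a single weight, not any particular library’s implementation:

```python
# Penalty contributed by a single weight under each scheme.
def l1(w): return abs(w)
def l2(w): return w ** 2

# L2 rewards shrinking a large weight heavily...
print(l2(5) - l2(4))  # 25 - 16 = 9
# ...but barely rewards shrinking a small weight to exactly zero,
print(l2(1) - l2(0))  # 1 - 0 = 1
# while L1 pays the same for every unit of shrinkage, which is why
# it drives weights exactly to zero and produces sparse models.
print(l1(5) - l1(4), l1(1) - l1(0))  # 1 1
```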
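The training-time behavior of a dropout layer can be sketched with NumPy. This assumes the common ‘inverted dropout’ convention, where surviving activations are rescaled so the expected total signal is unchanged; it is an illustration, not any specific framework’s API:

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    """Zero each activation with probability p and rescale the
    survivors by 1 / (1 - p) (inverted dropout, training mode)."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p  # keep with probability 1 - p
    return x * mask / (1.0 - p)

x = np.ones(10)
out = dropout(x, p=0.3)
# Each entry is now either 0 (blocked) or 1 / 0.7 (kept and rescaled).
```

At inference time the layer is simply disabled; the rescaling during training is what lets the rest of the network be used unmodified at test time.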
