If you have recently started on Kaggle, or if you are a long-time regular of the platform, you have probably wondered how to easily improve your model's performance. Here are some practical tips I have accumulated over my Kaggle journey. Build your own model or start from a public baseline kernel, and try implementing these suggestions!

1. Always review past competitions

Although Kaggle’s policy is never to run the exact same competition twice, remakes of very similar problems appear often. For example, some hosts propose a recurring challenge on the same theme every year (NFL’s Big Data Bowl, for instance) with only small variations, and in some fields (such as medical imaging) there are many competitions with different targets but a very similar spirit.

Reviewing winners’ solutions (always made public after a competition ends, thanks to the incredible Kaggle community) can therefore be a great help: it gives you ideas to get started and a proven strategy. If you have time to review many of them, you will also soon find that, even across very different competitions, a few popular baseline models seem to do the job well enough:

  • Convolutional Neural Networks, or the deeper ResNet and EfficientNet architectures, in computer vision challenges,
  • WaveNet in audio processing challenges (audio can also be handled very well by image recognition models if you simply convert it to a Mel spectrogram),
  • BERT and its derivatives (RoBERTa, etc.) in natural language processing challenges,
  • LightGBM (or other gradient boosting and tree-based methods) on tabular data — a minimal baseline is sketched after this list.

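To make the last item concrete, here is a minimal sketch of a cross-validated LightGBM baseline on tabular data. The stand-in dataset, the hyperparameter values, and the AUC metric are illustrative assumptions for the example, not recommendations from any particular competition.

```python
# Minimal LightGBM baseline for a tabular classification task.
# The data and hyperparameters below are illustrative placeholders.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data; replace with the competition's train features and target.
X, y = make_classification(n_samples=5_000, n_features=30, random_state=42)

model = lgb.LGBMClassifier(
    n_estimators=1000,
    learning_rate=0.05,
    num_leaves=31,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
)

# Cross-validated AUC gives a quick read on how strong the baseline is.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"CV AUC: {scores.mean():.4f} +/- {scores.std():.4f}")
```

A baseline like this is usually enough to validate your cross-validation setup and feature pipeline before you invest time in heavier models or ensembling.
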
You can either look for similar competitions on the Kaggle platform directly, or take a look at this great summary by Sudalai Rajkumar.

Reviewing past competitions can also help you pick up hints on all the other steps described below: how people preprocess data for similar problems, how they choose their hyperparameters, which additional tricks they build into their models to win, and whether they focused on bagging slightly different versions of their best models or instead ensembled a melting pot of all the available public kernels.
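As a rough illustration of those last two blending styles, here is a minimal sketch contrasting bagging-style averaging of near-identical models with a weighted "melting pot" ensemble of diverse models. The prediction arrays and blend weights are hypothetical placeholders; in practice you would use real out-of-fold predictions and tune the weights on validation data.

```python
# Two common blending styles, shown with placeholder prediction arrays.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test predictions from different models (placeholders).
preds_lgbm_seed1 = rng.random(1000)   # same model, different random seed
preds_lgbm_seed2 = rng.random(1000)
preds_lgbm_seed3 = rng.random(1000)
preds_nn = rng.random(1000)           # a different architecture
preds_public_kernel = rng.random(1000)

# Bagging-style: average near-identical versions of your best model.
bagged = np.mean([preds_lgbm_seed1, preds_lgbm_seed2, preds_lgbm_seed3], axis=0)

# Melting-pot ensemble: weighted average of diverse models and kernels;
# these weights are arbitrary here and would normally be tuned on validation data.
ensemble = 0.5 * bagged + 0.3 * preds_nn + 0.2 * preds_public_kernel
```
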
