Luna Hermann

Kaggling: A Journey of Past Competitions — Part 1

1. Jigsaw Multilingual Toxic Comment Classification Competition

In the Jigsaw competition, cross-validation, preprocessing, and postprocessing carried a lot of the weight.

  1. Pseudo-labeling: we saw a performance improvement when we used test-set predictions as training data; the intuition is that this helps the model learn the test-set distribution. Using all test-set predictions as soft labels worked better than any other variant of pseudo-labeling (e.g., hard labels or confidence-thresholded pseudo-labels). Towards the end of the competition this gave us a minor but material boost on the LB. Pseudo-labeling is best explained here by Chris, from whom I learned a lot; a minimal sketch of the soft-label variant follows this list.
  2. Cross-validation: we used k-fold CV with the validation set as a hold-out, but as we refined the test predictions and trained on pseudo-labels plus the validation set, the validation metric became noisy to the point where we relied primarily on the public LB score.
  3. Postprocessing: we kept a history of submissions and tweaked the test-set predictions. Tracking the per-sample delta in predictions across successful submissions, averaging those deltas, and nudging the predictions in the same direction paved the way for the winning solution; a sketch of this nudging step also appears after the list.
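
A minimal sketch of the soft-label variant, assuming a fitted `model` whose `predict_proba` returns positive-class probabilities and whose loss accepts probabilities (e.g., binary cross-entropy), plus pandas DataFrames `train` and `test` with a `comment_text` feature and a `toxic` target; all names are illustrative, not the actual competition pipeline:

```python
import pandas as pd

# Predict on the full test set and keep the raw probabilities as soft labels
# instead of rounding them to 0/1 hard labels.
test_pl = test.copy()
test_pl["toxic"] = model.predict_proba(test_pl[["comment_text"]])[:, 1]

# Retrain on the original training data plus all pseudo-labeled test rows.
# Training on soft labels assumes the model's loss accepts probabilities.
combined = pd.concat([train, test_pl], ignore_index=True)
model.fit(combined[["comment_text"]], combined["toxic"])
```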

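And a hedged sketch of the nudging step from item 3; the step size `alpha` is an illustrative knob, not a value from the write-up:

```python
import numpy as np

def nudge(current_preds, submission_history, alpha=0.1):
    """Push predictions in the direction successful submissions have been moving.

    current_preds: (n,) array with the latest predictions.
    submission_history: ordered list of (n,) arrays from earlier successful submissions.
    """
    # Per-sample deltas between consecutive successful submissions, averaged.
    deltas = [later - earlier
              for earlier, later in zip(submission_history[:-1], submission_history[1:])]
    mean_delta = np.mean(deltas, axis=0)
    # Nudge in the same direction and clip back to a valid probability range.
    return np.clip(current_preds + alpha * mean_delta, 0.0, 1.0)
```
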
2. TReNDS Neuroimaging Competition

In this competition, reading the MRI data was a bit tedious. So preprocessing the data into constructed features and postprocessing played a major role; a short TL;DR of the approach is included at the end.

  1. Preprocessing: adding a bias to different columns in the test set to bring it closer to the train set. Linear models performed well in this competition, so it was expected that adding biases would help a lot (at least for linear models). There are many ways to find suitable biases: we minimized the Kolmogorov-Smirnov (KS) test statistic between the train and test distributions, as sketched after this list. To learn more about the KS statistic, here is a notebook.
  2. Postprocessing: postprocessing used the same logic as preprocessing, i.e., calculating the KS statistic and finding the best-fitting shift. The effect, however, was small: only about 5e-5 on both the public and private leaderboards.
  3. TL;DR: incremental PCA for the 3D images (see the sketch below) and offsets for the test features (like we did in ION).
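
A minimal sketch of the bias search, assuming DataFrames `train_df` and `test_df` and a list `feature_cols` (illustrative names). The KS statistic is a step function of the shift, so a simple grid search is more robust than a smooth optimizer:

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def best_bias(train_col, test_col, grid=np.linspace(-0.5, 0.5, 201)):
    # Shift the test column by each candidate bias and keep the shift that
    # minimizes the two-sample KS statistic against the train column.
    stats = [ks_2samp(train_col, test_col + b).statistic for b in grid]
    return grid[int(np.argmin(stats))]

biases = pd.Series({col: best_bias(train_df[col].values, test_df[col].values)
                    for col in feature_cols})
test_df[feature_cols] = test_df[feature_cols] + biases  # apply the learned offsets
```

The same helper can be reused for the postprocessing step by shifting the predictions instead of the raw features.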

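For the TL;DR item, here is a hedged sketch of the incremental-PCA step for the 3D maps, assuming each volume is stored as a `.npy` file listed in `volume_paths` (an illustrative setup). `IncrementalPCA` only needs one batch in memory at a time, and each batch passed to `partial_fit` must contain at least `n_components` samples:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

n_components = 50
batch_size = 100  # must be >= n_components
ipca = IncrementalPCA(n_components=n_components)

# First pass: fit the components batch by batch, never holding all volumes in memory.
for start in range(0, len(volume_paths), batch_size):
    batch = np.stack([np.load(p).ravel() for p in volume_paths[start:start + batch_size]])
    if batch.shape[0] >= n_components:  # skip a too-small final batch
        ipca.partial_fit(batch)

# Second pass: project every volume onto the learned components.
features = np.vstack([ipca.transform(np.load(p).ravel()[None, :]) for p in volume_paths])
```
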
#deep-learning #solutions #neural-networks #artificial-intelligence #kaggle

Vern Greenholt

Kaggle Beginner Competitions Can Be Cheated

The purpose of this article is to warn new kagglers before they waste their time trying to achieve an impossible score. Some kagglers got maximum accuracy with one click. Before we discuss how they did it and why, let's briefly introduce the Kaggle scoring model to understand why anybody would even try to cheat.

Kaggle Progression System

Kaggle is a portal where data scientists, machine learning experts, and analysts can challenge their skills, share knowledge, and take part in various competitions. It is open to every level of experience, from complete newbie to grandmaster. You can use open datasets to broaden your knowledge, gain kudos/swag, and even win money.

[Image: Some of the available competitions. (Image by author)]

Winning competitions, taking part in discussions, and sharing your ideas result in medals. Medals are presented on your profile along with all your achievements.

#data-science #beginner #kaggle-competition #competition #kaggle #data science

Lessons From My First Kaggle Competition

How I chose my first Kaggle competition to enter and what I learned from doing it.

A little background

I find starting out in a new area of programming a somewhat daunting experience. I have been programming for the past 8 years, but only recently have developed a keen interest in Data Science. I want to share my experience to encourage you to take the plunge too!

I started out dipping my toe in the ocean of this vast topic with a couple of the Kaggle mini-courses. I didn't need to learn how to write in Python, but I did need to equip myself with the tools for the kind of programming I wanted to do. First up was Intro to Machine Learning, which seemed like a good place to start. As part of this course you contribute to an in-course competition, but even after completing it, I didn't feel prepared for a public competition. Cue Intermediate Machine Learning, where I learned to use a new model and how to think more deeply about a data problem.

#2020 sep tutorials # overviews #competition #data science #kaggle

Alec Nikolaus

International alternatives to Kaggle for Data Science / Machine Learning competitions

We’ve all heard of Kaggle, but that also means there’s more competition — recently, Kaggle reached 5 million users. Further, not all competitions are open to everyone in the world. Here’s the policy of one competition, for instance:

“Members of the Kaggle community who are not United States Citizens or legal permanent residents at the time of entry are allowed to participate in the Competition but are not eligible to win prizes. If a team has one or more members who are not prize eligible, then the entire team is not prize eligible.”

By trying out other competition platforms, you can be a “big fish in a small pond,” as there are a lot fewer competitors.

Keep in mind that AI competitions aren't the be-all and end-all if you want to enter the industry, as you'll need knowledge in statistics, computing, communication, and more, not just knowing how to build models.

Besides ranking in competitions, you’ll want to work on practical projects that you can share with the world. Ideally, your projects can resonate with non-technical audiences as well, such as hiring managers who often don’t understand the intricacies of the field. To do so, you can use no-code analytics tools like Apteo that let you share very simple and easy-to-understand analyses.

That being said, let's dive in.

#2020 sep tutorials # overviews #competition #data science #kaggle #machine learning