How do Kaggle competitions work?

What is Kaggle? The new platform simply explained.

Kaggle is a platform where you can show your skills in data analysis and machine learning and compare yourself against others. Prize money of more than $10,000 is often offered as a reward.

What is Kaggle?

Kaggle is a platform specializing in data science, with regular competitions. Most of them are about optimizing machine learning-based predictions, for example time series forecasting or classification. Organizations provide real data and prize money, which in some cases reaches millions of euros. As a result, participants measure their skills against one another and “hunt” for the top placements.

In general, a competition works like this: a company or other organization posts data and a description of the problem (e.g. “forecast of sales in month X”). Based on this, individual participants or teams develop their solutions and upload them, mostly as a list of ID-prediction pairs.

These solutions are then evaluated automatically, and the results form the leaderboard. The lower the error, i.e. the better the prediction, the higher the ranking. How the error is calculated depends on the competition, but it is usually the mean squared error or a similar measure.
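The evaluation described above can be sketched in a few lines of Python. This is only an illustration, not Kaggle's actual scoring code: the team names, the numbers and the function names are made up, and the mean squared error is assumed as the metric.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between truth and prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("truth and prediction must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def build_leaderboard(y_true, submissions):
    """Score every submission and sort ascending: lowest error ranks first."""
    scores = {team: mean_squared_error(y_true, preds)
              for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical hidden ground truth and three hypothetical submissions.
truth = [10.0, 20.0, 30.0]
submissions = {
    "team_a": [11.0, 19.0, 33.0],
    "team_b": [10.0, 20.0, 31.0],
    "team_c": [15.0, 25.0, 35.0],
}

leaderboard = build_leaderboard(truth, submissions)
for rank, (team, score) in enumerate(leaderboard, start=1):
    print(rank, team, round(score, 3))
```

In this made-up example team_b has the smallest error and therefore takes first place.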

The story of Kaggle

Kaggle was founded in 2010 in Melbourne, Australia, was acquired by Google in 2017 and passed one million members in the same year. From the beginning, Kaggle was known as a “Competition Platform” and dedicated itself to the challenge of treating machine learning as an optimization problem.

Today you will find not only hundreds of competitions on Kaggle, but also a repository of publicly accessible data sets and a range of courses. Kaggle therefore plays an increasingly central role in the careers of many data scientists, because it offers a first chance to gain practical experience that goes beyond the prepared standard data sets (Titanic, Iris, etc.).

Who is the target audience for Kaggle?

While Kaggle was initially intended more for experienced data scientists and machine learning engineers, it now covers pretty much the entire spectrum of experience in data science and AI. The challenging competitions for experienced data scientists remain the central component of Kaggle, but its broad range of offerings holds more and more of interest for beginners.

Newcomers can quickly gain insight into other ways of thinking and analyzing, and try out their own ideas, especially through the public notebooks, which contain code from other participants. There are also relatively old but very accessible competitions that are well suited for expanding your knowledge.

What makes Kaggle so special?

Kaggle was the first public platform that dealt with the topic of “machine learning as a competition”. High prize money is one attraction, but participants often regard a very high placement in a competition as a reward in itself. Particularly noteworthy is the possibility of publishing notebooks, i.e. scripts.

Almost every competition has a publicly available notebook that provides a basic analysis (an exploratory data analysis, sometimes with initial modeling). You can work out refinements based on it. Of course, you can also work entirely on your own without publishing any scripts.

Frequently asked questions about Kaggle

In which programming languages do you work on Kaggle?

Whether Python, R or Java: the choice of language has no influence on the competitions at Kaggle. Since it is not the script that is evaluated but only the predictions, submitted as a .csv file, you can generate this output with any tool you can think of.

However, if you want to work directly in the Kaggle Notebook Environment, you have to use Python or R. In return, you get the advantage of working directly on the resources provided by Kaggle.
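As a rough illustration of what such a .csv submission can look like, here is a minimal Python sketch using only the standard library. The column names "Id" and "Prediction", the function name and the example values are assumptions; every competition specifies its own required file format.

```python
import csv
import io


def write_submission(rows, out_file, id_column="Id", target_column="Prediction"):
    """Write (id, prediction) pairs as CSV with a header row.

    The column names are placeholders; use the ones the competition asks for.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow([id_column, target_column])
    for row_id, prediction in rows:
        writer.writerow([row_id, prediction])


# Hypothetical predictions, e.g. produced by any model in any language.
predictions = [(1, 0.25), (2, 0.75), (3, 0.5)]

# An in-memory buffer keeps the example self-contained; for a real
# submission you would pass open("submission.csv", "w", newline="").
buffer = io.StringIO()
write_submission(predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Because only this file is uploaded, the same output could just as well come from R, Java or a spreadsheet.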

How do you become a Kaggle Grandmaster?

Grandmaster is the final stage of the Kaggle Progression System. To become a Kaggle Grandmaster, you have to consistently excel in one of the four categories: Competitions, Datasets, Notebooks and Discussion.

For example, to become a Notebook Grandmaster you need 15 gold medals, with one medal standing for 50 upvotes; upvotes from new members and on old posts do not count. Consequently, to become a Notebook Grandmaster, you have to publish an exceptionally good basic analysis in 15 different competitions. Most people, however, equate Kaggle Grandmaster with the category “Competitions”, as this is where the analyses themselves are evaluated. A top 10 placement in a number of competitions is usually required here, and that against several thousand participants.
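The notebook arithmetic above can be written down as a small Python sketch. It uses only the two figures from the text (50 upvotes per gold medal, 15 gold medals) and reads “50 upvotes” as “at least 50”, which is an assumption. The upvote counts are invented and are assumed to already exclude the votes that do not count.

```python
GOLD_UPVOTES = 50        # from the text: 50 upvotes earn one gold medal
GOLD_MEDALS_NEEDED = 15  # from the text: 15 gold medals for Grandmaster


def count_gold_medals(eligible_upvotes):
    """Count notebooks whose eligible upvotes reach the gold threshold."""
    return sum(1 for votes in eligible_upvotes if votes >= GOLD_UPVOTES)


def is_notebook_grandmaster(eligible_upvotes):
    """True if enough notebooks have earned a gold medal."""
    return count_gold_medals(eligible_upvotes) >= GOLD_MEDALS_NEEDED


# Hypothetical eligible upvote counts, one entry per published notebook.
example = [120, 50, 49, 75] + [60] * 12

gold_medals = count_gold_medals(example)
print(gold_medals)
print(is_notebook_grandmaster(example))
```

The notebook with 49 upvotes misses the threshold, so this invented profile reaches exactly 15 gold medals.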

Overall, the highest level in the Kaggle Progression System is 4x Kaggle Grandmaster, something that very few people have achieved so far. To be precise, as of January 20, 2021, exactly three of more than 150,000 active participants had done so: Chris Deotte, Vopani and Abhishek Thakur.

The Kaggle Titanic Data Set

The Titanic Dataset is often used, not only at Kaggle but in data science in general, when you want to try out classification in practice. Kaggle guides its new users directly through the analysis of the dataset as a kind of tutorial on how Kaggle works as a platform and how to submit solutions.

Is Kaggle free?

Yes, Kaggle membership is free. However, you must be registered to download data sets or to take part in the competitions.

What can you win at Kaggle?

Kaggle competitions usually offer cash prizes in the lower five-digit range, but higher prizes are also possible. There are also competitions without prize money or with other prizes, such as company memberships or the like.

Where can I find datasets on Kaggle?

Kaggle now has a dedicated section for publicly accessible data sets: https://www.kaggle.com/datasets

In order to download the data you have to be registered.

Who Owns the Kaggle Platform?

The platform was founded and is managed by Anthony Goldbloom and Ben Hamner. Google has since acquired the platform and is therefore its owner.

Who should join Kaggle?

We recommend trying Kaggle at least once. Only those who have a lot of time and experience will be able to deliver good results, so prioritization is important, as usual. In general, however, if you have barely any practical experience in machine learning, Kaggle can be a good starting point for really getting to grips with the problems of data science.
