Novice data scientists need practice, and Kaggle provides it. In this article you will learn how to get started with the popular service.
What is Kaggle?
Kaggle is a community for data scientists. It began as a competition platform and has since expanded into many other sections (more about them below). The result is a kind of social network for data scientists that covers the full path of a new specialist’s development: from educational mini-courses on the basics of machine learning to data science competitions.
How to get started?
Before you start conquering Kaggle, you need to register on the website. Go to kaggle.com and click Register. You will have two options: register through your Google account or by email address. Once you receive the confirmation email and log in, that’s it: you’re now in the Kaggle community.
Important: Kaggle has a progression system. Once you register, your account will be at the lowest level: Novice.
The next level is Contributor. You can reach it in a few simple steps:
- Run 1 script or notebook.
- Make 1 submission in any competition.
- Write 1 comment.
- Cast 1 upvote (the equivalent of a “like”, shown as an up arrow).
Below you will find a detailed guide on how to do these things and earn a Contributor badge.
What’s on Kaggle?
After registering, you will be taken to the home page and you will see several sections.
Competitions – the essence of Kaggle
Competitions come in several types. Getting Started and Playground competitions offer pre-processed data and simple tasks. Featured and Research competitions are more complex and full-scale: they cover the entire cycle of a data scientist’s work, including steps such as bringing raw data to a usable state. The challenge there is to solve a non-trivial problem, often posed by large corporations such as Google or Amazon, with substantial cash rewards for the winners.
Datasets – datasets for all tastes and colours
Both competition datasets and user-uploaded datasets are presented here. You can also upload your own datasets.
Code – custom code
An online editor that lets you create a Jupyter Notebook or a simple script in Python or R. You just plug in the data and work in the browser without installing any libraries or dependencies.
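Once data is attached to a notebook, it is typically available as read-only files under /kaggle/input. The sketch below lists those files; the function name is illustrative, and outside Kaggle the folder will normally not exist, in which case the list is simply empty.

```python
import os


def list_input_files(root="/kaggle/input"):
    """Return the paths of all files under root, sorted for stable output."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


# On Kaggle this prints the attached data files; if the folder is missing,
# os.walk yields nothing and the loop body never runs.
for path in list_input_files():
    print(path)
```

Passing a different root makes the same helper usable on a local copy of the data.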
Public notebooks cover everything from EDA (Exploratory Data Analysis) for competition tasks to simple techniques that help you optimise your programs.
Below the search bar are tags that let you filter for the “notebooks” you’re interested in.
The arrow below the name is the Upvote, which is used to determine relevance. I recommend that you pick a notebook you are interested in, upvote it, comment on it and click Copy and Edit. This saves a copy to your profile (similar to a fork on GitHub) and lets you run its cells. It also covers three of the four Contributor steps described above; only a competition submission remains.
Discussions
A forum divided into several parts.
- General – everything related to Kaggle itself (announcements, discussions about past competitions) and life cycles of machine learning models.
- Getting Started – the same as General, but for newcomers. This is the best place to visit first.
- Product Feedback – feedback on the site. If you run into technical problems while working on Kaggle, this is the place to report them.
- Questions & Answers – technical advice from other data scientists.
- Learn – questions and discussions about the Courses section of the site.
How do I enter a Kaggle competition?
Find a competition that suits you. Then click Join Competition and agree to the terms and conditions. The competition page has the following tabs:
- Overview – a description of the competition. It sets out the problem to be solved and specifies the evaluation metric and other requirements (e.g. the submission format).
- Data – the data on which you need to achieve the best metric.
- Code – here contestants post their ideas and solutions. It is worth visiting this tab first, as you can pick up ideas for your own solutions.
- Discussion – discussion of the competition’s problems, solution methods and nuances.
- Leaderboard – the ranking of participants. In advanced competitions there is a gold section for cash prizes, a silver section for incentives and a bronze section for Kaggle medals.
- Rules – the rules of the competition.
- Team – not all competitions have teams. Teams are most useful at the more difficult stages of conquering Kaggle; to start with, compete on your own to gain the necessary skills.
So you’ve got the interface sorted. The classic aim of a competition is this: achieve the best possible value of the metric using the available data.
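To make "the metric" concrete, here is a minimal sketch of two common ones: accuracy for classification and root mean squared error (RMSE) for regression. These are illustrations only; the metric a given competition actually uses is stated on its Overview tab, and the function names here are my own.

```python
import math


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels (higher is better)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions (lower is better)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75 (3 of 4 labels match)
print(rmse([3.0, 5.0], [2.0, 6.0]))          # 1.0 (both errors have size 1)
```

Note that the direction differs: a higher accuracy is better, while a lower RMSE is better.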
More often than not, the data is split into train and test sets. You train the model on the first, which includes the target values. The second comes without them: you use it to make the predictions that you save as your solution (a submission).
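The train, predict and save flow can be sketched as follows. This is a deliberately tiny baseline, not a recommended model: the rows are made up, the column names follow the Titanic competition, and only the standard library is used so that the sketch runs anywhere. In practice most notebooks use pandas and scikit-learn for the same steps.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-ins for train.csv and test.csv (Titanic-style column names).
TRAIN_CSV = """PassengerId,Sex,Survived
1,male,0
2,female,1
3,female,1
4,male,0
5,male,1
6,female,0
"""
TEST_CSV = """PassengerId,Sex
7,female
8,male
"""


def fit(train_rows):
    """Learn the most common outcome for each value of the Sex column."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number of 0s, number of 1s]
    for row in train_rows:
        counts[row["Sex"]][int(row["Survived"])] += 1
    return {group: int(ones > zeros) for group, (zeros, ones) in counts.items()}


def predict(model, test_rows, default=0):
    """Predict a label for every test row; unseen groups get `default`."""
    return [(row["PassengerId"], model.get(row["Sex"], default)) for row in test_rows]


def write_submission(predictions, out):
    """Write the id and prediction columns in CSV form."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    writer.writerows(predictions)


train_rows = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
test_rows = list(csv.DictReader(io.StringIO(TEST_CSV)))

model = fit(train_rows)                  # train on the train set
predictions = predict(model, test_rows)  # predict on the test set

# In a notebook you would write to open("submission.csv", "w", newline="").
submission = io.StringIO()
write_submission(predictions, submission)
print(submission.getvalue())
```

The model never sees the test rows during training; it only uses them to produce one prediction per id.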
To create a solution on the website, you have to:
1. Create a new notebook under Code.
2. Add the competition data, by clicking the Add data button.
3. Save the notebook.
4. In the menu that pops up, click Submit to Competition.
Your solution now appears on the leaderboard.
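A submission is only scored if it matches the format required by the competition. Many competitions provide a sample submission file in the Data tab; assuming one is available, a quick check like the sketch below can catch format mistakes before you submit. The function name and the checks are illustrative, not part of Kaggle itself.

```python
import csv
import io


def check_submission(submission_file, sample_file, id_column):
    """Return a list of problems found; an empty list means the format matches."""
    submission = csv.DictReader(submission_file)
    sample = csv.DictReader(sample_file)
    if submission.fieldnames != sample.fieldnames:
        return [f"columns are {submission.fieldnames}, expected {sample.fieldnames}"]
    if id_column not in (sample.fieldnames or []):
        return [f"id column {id_column!r} not found"]

    submitted_ids = [row[id_column] for row in submission]
    expected_ids = [row[id_column] for row in sample]
    problems = []
    if len(submitted_ids) != len(expected_ids):
        problems.append(f"{len(submitted_ids)} rows, expected {len(expected_ids)}")
    if set(submitted_ids) != set(expected_ids):
        problems.append("ids do not match the sample file")
    return problems


# Made-up file contents; with real files, pass open(...) handles instead.
sample_text = "PassengerId,Survived\n7,0\n8,0\n"
submission_text = "PassengerId,Survived\n7,1\n8,0\n"
print(check_submission(io.StringIO(submission_text), io.StringIO(sample_text), "PassengerId"))  # []
```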
The easiest competitions for beginners
Note: These competitions belong to the Getting Started and Playground categories. You won’t get any cash prizes or medals for them, but they are a great way to improve your skills and get into the competitive environment of Kaggle.
- Titanic. Probably the most famous competition for newcomers. The Titanic dataset contains the passenger data of the ship of the same name. Your goal is to build a model that can best predict whether an arbitrary passenger survived or not. This is a typical classification problem.
- House Prices. The goal is to predict a house’s price from a number of features, such as location, square footage, number of rooms and garage. There is a more advanced version of this competition – Advanced Regression Techniques. This is a regression problem, so linear models are a reasonable place to start.
- Tabular Playground Series. Launched in January 2021 as a monthly series. Your goal is to predict the target column from simple tabular data. Unlike the open-ended competitions described above, each Tabular Playground competition lasts exactly one month, which makes it more dynamic. There are fewer public notebooks with ready-made answers, but more room to create your own unique solution.
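As an illustration of the linear methods mentioned for house price regression, here is a minimal one-feature least-squares sketch. The numbers are made up and the single feature (living area) is chosen only for simplicity; a real solution would use many features and a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up toy data: living area in square feet and sale price in dollars.
areas = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [150000.0, 200000.0, 250000.0, 300000.0]

intercept, slope = fit_line(areas, prices)

# Predict the price of a hypothetical 1800 square foot house.
print(round(intercept + slope * 1800.0))  # 230000
```

The toy data lies exactly on a line, so the fit recovers it without error; real data is noisy, which is where the competition metric comes in.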
Why participate in Kaggle competitions?
If you haven’t tried Kaggle yet, now is a good time to start. Competitions teach you to solve real-world data science problems and help you choose among the field’s many directions. With continuous practice, you can learn more in a week than in 3 months of studying theory alone. Competition medals are also a plus when looking for work, since they give employers concrete evidence of your practical experience. In the next article, we will work through one of the most basic Kaggle competitions – House Prices.