Your Year on Kaggle: Most Memorable Community Stats from 2016

Now that we have entered a new year, we want to share and celebrate some of your 2016 highlights in the best way we know how: through numbers. From breaking competition records to publishing eight Pokémon datasets since August alone, 2016 was a great year to witness the growth of the Kaggle community. And we can’t help but quantify some of our favorite moments and milestones. Read about the major machine learning trends, impressive achievements, and fun factoids that all add up to one amazing community. We hope you enjoy your year in review!

We’d love to hear what numbers you’re looking forward to in 2017!
Share your data science predictions, resolutions, and plans in the comments or tag us on Twitter.



This past year we welcomed well over 300,000 new users to Kaggle from all over the world. The world map below highlights the growing global data science community, with representation from nearly every country.

Say hello to the newest members of the data science community!


Looking at our community, these were some of our favorite numbers that represent who you are, your accomplishments, and the future of Kaggle:

We wanted to get to know our Twitter followers a bit better, so we read all of their bios. Well, machine-read. This is the highest tf-idf score calculated from the words in our followers’ bios. What’s the word, you ask? Hint: it’s not “#bigdata”. The word is analytics.
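For anyone who hasn't met tf-idf before, here is a minimal sketch of how such a score can be computed. It assumes one common weighting (term frequency as a share of the bio's words, inverse document frequency as the natural log of total bios over bios containing the word); the post doesn't say which variant or tooling was actually used, and the bios below are invented purely for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return {(doc_index, word): score} with tf = count / doc length, idf = ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many docs contain each word at least once.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = {}
    for i, tokens in enumerate(tokenized):
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[word])
            scores[(i, word)] = tf * idf
    return scores


# Hypothetical bios, made up for this example (not real follower data).
bios = [
    "data science and analytics",
    "analytics analytics analytics",
    "data engineer",
]
scores = tfidf_scores(bios)
for (doc_index, word), score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(doc_index, word, round(score, 3))
```

Under this weighting, a word that appears in every bio scores zero, so the ranking rewards words that are frequent within a bio but not universal across bios.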

The one-millionth Kaggler is currently projected to register on September 9th, 2017. In other words, that special moment will happen at 1504915200 in Unix epoch time. We didn’t run a competition to arrive at this prediction, though, so we won’t be surprised to see it happen sooner!
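If you'd like to check the epoch arithmetic yourself, the short sketch below converts the timestamp quoted above back into a calendar date. It assumes the timestamp is meant in UTC, which the post doesn't state explicitly; the variable names are our own.

```python
from datetime import datetime, timezone

PROJECTED_EPOCH = 1504915200  # seconds since 1970-01-01 UTC, as quoted in the post

# Interpret the timestamp in UTC (an assumption; the post names no time zone).
projected = datetime.fromtimestamp(PROJECTED_EPOCH, tz=timezone.utc)
print(projected.isoformat())  # 2017-09-09T00:00:00+00:00

# Round trip: midnight UTC on September 9th, 2017, converted back to epoch seconds.
round_trip = int(datetime(2017, 9, 9, tzinfo=timezone.utc).timestamp())
print(round_trip == PROJECTED_EPOCH)  # True
```

So the quoted value corresponds to midnight UTC at the start of September 9th, 2017.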

In addition to these highlights, we applaud the eighty-eight Kagglers who’ve achieved Grandmaster status in Competitions, plus one Kernel Master, ZFTurbo. And conversation was good in 2016: nearly 50,000 discussion posts were shared, including remembrances of the life and accomplishments of Lucas (Leustagos), a data science hero and #1 Kaggler.



Studying what techniques Kagglers are using and talking about is one way to keep your finger on the pulse of machine learning. This is why we’ve published the Meta Kaggle dataset containing our public data on competitions and more. Looking at the hot topics of 2016, it likely comes as no surprise that XGBoost dominated discussion of ML techniques this past year. But you can see below that Keras is ending the year strong! Newcomer LightGBM is already piquing community interest, and we’re curious to see how it will do in 2017.

ML techniques/frameworks discussed on Kaggle

In 2016, over 60,000 Kagglers competed for $1.1M in prizes, jobs, and knowledge in 31 competitions. Thirty-nine winning teams shared their approaches right here on No Free Hunch, and 154,986 submissions were made to the Titanic Getting Started competition alone. Plus, the future of competitions is bright: we launched our first Code Competition in December. Here are some of our favorite moments (and their numbers) from the last year:

We saw 1.92 times as many Kaggle InClass competitions launched by professors in 2016 as in the year before. 21,304 high fives to the students who made a submission!


Kernels & Datasets

In 2016 we observed a lot of clickbait headlines crowning either Python or R as the best language for doing data science. Well, we have some numbers to lend some substance to the arguments. In past years, R was the language of choice on Kaggle, but 2016 saw Python emerge as the clear winner by number of kernels written. One question remains: will Python maintain its constrictive grip in the coming year?

Monthly kernels written on Kaggle by language

In other big news, we began to allow users to publish their own datasets on our open data platform last August. Naturally, we were excited to dig into the weird and wonderful new numbers this would give us! Here are some of our favorites:

Our open data platform isn’t quite like the Billboard 200. But if it were, the dataset How ISIS Uses Twitter would be a top-10 chart-topper for an impressive 31 weeks. Read about the stories behind this dataset and others in our Open Data Spotlight.

Even Kagglers caught the Pokémon craze. Users have published eight Pokémon-related datasets, which together claim nearly 9,000 downloads. Gotta download ’em all!

When it came to open datasets published by users and organizations in 2016, sports and games were the clear winners. From the incredible European Soccer Database with its 331 kernels to 20 Years of Games from IGN, the numbers make us look forward to a fantastic, data-filled 2017.

