Here's the scenario: a new Kaggle competition, a new dataset. Gigabytes? Ouch! Cold shivers as you anticipate hours of waiting to extract features and train models, then middle-of-the-night cold feet as you’re ‘just checking’ that your Python script is still running.

Not familiar with H2O, Spark’s MLlib or GraphLab? Fear not!

Stupendous Scikit-learn will come to your rescue with its line-up of out-of-core classifiers.
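
To give a taste of what "out-of-core" means in practice, here is a minimal sketch, assuming a linear model is good enough for the task. It feeds an `SGDClassifier` one chunk at a time through `partial_fit`, so the full dataset never has to sit in memory. The `iter_chunks` generator and its synthetic data are made up for illustration; in a real competition you would swap it for something that reads your file in pieces.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical hidden rule used to label the synthetic data (illustration only).
TRUE_W = np.random.RandomState(42).randn(10)

def iter_chunks(n_chunks=20, chunk_size=500, seed=0):
    """Yield (X, y) mini-batches; a stand-in for reading a big file in pieces."""
    rng = np.random.RandomState(seed)
    for _ in range(n_chunks):
        X = rng.randn(chunk_size, TRUE_W.shape[0])
        y = (X @ TRUE_W > 0).astype(int)
        yield X, y

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # partial_fit must know every label up front

# Only one chunk is held in memory at any time.
for X_chunk, y_chunk in iter_chunks():
    clf.partial_fit(X_chunk, y_chunk, classes=classes)

# Score on a chunk drawn with a different seed, so it was never seen in training.
X_test, y_test = next(iter_chunks(n_chunks=1, seed=1))
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The same loop works with other scikit-learn estimators that expose `partial_fit`; which one suits your data is a separate question.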

The rest of the story is on the Open Data Science Conference Blog: Riding on Large Data with Scikit-learn by yours truly.

