Posts

  • Writing Effective Amazon Machine Learning

    My article on the Amazon Machine Learning service, first published on the ODSC blog and then republished on KDnuggets, triggered a book project. Shortly after writing that article, I was contacted by Packt Publishing to write an entire book on the AWS Machine Learning service. Packt Publishing is well known for...

  • Large Data with Scikit-learn - Boston Meetup

    Large Data with Scikit-learn. Alex Perrier ([@alexip](https://twitter.com/alexip)): Data & Software at [@BerkleeOnline](https://twitter.com/berkleeonline) by day, Data Science contributor at [@ODSC](https://twitter.com/odsc) by night. Plan: 1) What is large data? Out-of-core, streaming, online, batch? 2) Algorithms 3) Implementation 4) Examples. Many great alternatives: Dato: [GraphLab...

  • Paris Meetup slides: Topic Modeling of Twitter Followers

    Topic Modeling applied to Twitter feeds. Alexis Perrier ([@alexip](https://twitter.com/alexip)): Data & Software, Berklee College of Music, Boston ([@BerkleeOnline](https://twitter.com/berkleeonline)); Data Science contributor ([@ODSC](https://twitter.com/odsc)). **Part I: Topic Modeling**: nature and applications; algorithms and libraries. **Part II: Project: Twitter followers**: methods; problems; Viz...

  • Hands-on analysis of the Amazon Machine Learning service

    Is the new Amazon Machine Learning too simple to reap the benefits of predictive analytics? Machine Learning as a Service (MLaaS) promises to put data science within the reach of companies. In that context, Amazon Machine Learning is a predictive analytics service with binary/multiclass classification and linear regression features. The...

  • Jupyter, Zeppelin, Beaker: The Rise of the Notebooks

    One of the particularities of scientific computing is the need for experiments, explorations, and collaborations. This need is addressed by notebooks. Notebooks are collaborative web-based environments for data exploration and visualization — the perfect toolbox for data science. They help create reproducible, shareable, collaborative computational narratives. There are alternatives to...

  • Dynamics of Debates with Time Maps

    2015 presidential debates: The race for the presidential nomination in both parties is going full speed with a plethora of debates. At the time of writing, there have been 4 Republican debates and 2 Democratic ones. These debates have a strong impact on the presidential nomination race with candidates dropping out and...

  • NLP Analysis of the 2015 presidential candidate debates

    I’ve been fascinated by the recent presidential nomination debates. Their format, the number of participants, and the post-debate media frenzy all make for a good show. In the following 2 articles, I’ve applied several Text Mining and Natural Language Processing techniques to the transcripts. In this first article: Dissecting...

  • Scikit-learn's Out-of-Core Classifiers for Large Data

    Here’s the scenario: a new Kaggle competition, a new dataset. Gigabytes? Ouch! Cold shivers as you anticipate hours of waiting to extract features and train models, and middle-of-the-night cold feet as you’re ‘just checking’ that your Python script is still running. Not familiar with H2O, Spark’s MLlib or GraphLab?...

  • Segmentation of Twitter Timelines via Topic Modeling

    Following up on our first post on the subject, Topic Modeling of Twitter Followers, we compare different unsupervised methods to further analyze the timelines of the followers of the @alexip account. We compare the results obtained through Latent Semantic Analysis and Latent Dirichlet Allocation and we segment Twitter timelines based...

  • Topic Modeling of Twitter Followers

    In this post, we explore LDA, an unsupervised topic modeling method, in the context of Twitter timelines. Given a Twitter account, is it possible to find out what subjects its followers are tweeting about? Knowing the evolution or the segmentation of an account’s followers can give actionable insights to a...

  • Feature Importance in Random Forests: Comparing Gini and Accuracy Metrics

    We’re following up on Part I, where we explored the DrivenData blood donation data set. The objective of the present article is to explore feature engineering and assess the impact of newly created features on the predictive power of the model in the context...

  • Blood Donation on DrivenData: Exploration

    Blood Donation on DrivenData - Part I: Exploration. DrivenData.org is a machine learning competition website similar to the better-known Kaggle.com, but with a different angle: it focuses on leveraging Data Science for social issues. And it’s based in Boston! For the learning Data Scientist, DrivenData offers a good...
