DataTron - Data Science for Everyone

At the Hack n Roll hackathon at NUS, I worked on a particularly interesting project that I thought I’d briefly write about.

Inspiration

Nowadays, data is ubiquitous but hidden in plain sight. Yet there aren’t many options for doing data science if you’re not a data scientist. Most data science tools, such as Looker and DataRobot, are aimed at business intelligence use cases rather than personal use.

Since data is rapidly becoming a big part of our lives, can we make data science accessible to everyone? What if you could unwrap the insights hidden in your own data without having to know how to use IPython, Spark, or Hadoop?

Introducing DataTron

DataTron is a modern web-enabled data science toolkit for normal people. It lets you easily explore your dataset and mine for insights via a friendly GUI.

  • Drop a CSV file and let DataTron do data science for you. It’s like having your own personal data scientist.
  • DataTron automagically parses and trains classifiers on your data.
  • Upload your dataset and share your link code with others!
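
To give a rough idea of what the parsing step involves, here is a minimal sketch in JavaScript. It is not DataTron's actual parser: the `parseCsv` helper is hypothetical, and it assumes a simple CSV with no header row, no quoted fields, and the class label in the last column. The sample rows follow the format of the Car Evaluation dataset.

```javascript
// Hypothetical helper (not DataTron's real parser): split simple CSV text
// into feature rows and class labels.
// Assumptions: no header row, no quoted fields, label in the last column.
function parseCsv(text) {
  const rows = [];
  const labels = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue; // skip blank lines
    const cells = line.split(',').map((cell) => cell.trim());
    labels.push(cells.pop()); // last column is the class label
    rows.push(cells);         // remaining columns are the features
  }
  return { rows, labels };
}

// Illustrative input in the Car Evaluation format.
const sample = 'vhigh,vhigh,2,2,small,low,unacc\nlow,low,4,4,big,high,vgood\n';
const parsed = parseCsv(sample);
console.log(parsed.rows.length, parsed.labels);
```

A real-world parser would also need to handle headers and quoted fields, so an existing CSV library is usually a safer choice than splitting on commas.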

DataTron was built on Electron, Node, D3.js, and JavaScript.

Screenshots

The dataset used in the GIFs is the UCI Machine Learning Repository’s Car Evaluation Dataset.

How

DataTron is built on Electron (which powers the Atom code editor), Node, and vanilla JavaScript. Over the hackathon, I wrote my own Bayesian classifier, wrote the desktop app, developed a lightweight Ruby API for uploading and sharing datasets, and rendered some visualizations using D3.js and C3.js.

Some challenges:

  • Writing my own Bayesian machine learning classifier from scratch! It’s open source: yosriady/node-bayes!
  • Data visualization of nontrivial machine learning hypothesis representations (ID3 Decision Tree classifier)
  • Building cross-platform desktop apps with Electron + Node
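
For the curious, the core idea behind a Bayesian classifier for categorical data can be sketched in a few lines of JavaScript. This is a simplified naive Bayes with add-one (Laplace) smoothing, not the node-bayes code or its API: the function names `trainNaiveBayes` and `predict` are hypothetical, and the toy data is made up for illustration.

```javascript
// Minimal categorical naive Bayes sketch.
// Hypothetical names; this is NOT the node-bayes API.

// Count how often each class and each (class, column, value) triple occurs.
function trainNaiveBayes(rows, labels) {
  const model = {
    total: rows.length,
    classCounts: {},   // label -> number of rows with that label
    featureCounts: {}, // label -> column index -> value -> count
    featureValues: [], // column index -> Set of distinct values seen
  };
  rows.forEach((row, i) => {
    const label = labels[i];
    model.classCounts[label] = (model.classCounts[label] || 0) + 1;
    model.featureCounts[label] = model.featureCounts[label] || {};
    row.forEach((value, col) => {
      const byLabel = model.featureCounts[label];
      byLabel[col] = byLabel[col] || {};
      byLabel[col][value] = (byLabel[col][value] || 0) + 1;
      model.featureValues[col] = model.featureValues[col] || new Set();
      model.featureValues[col].add(value);
    });
  });
  return model;
}

// Return the label with the highest posterior score for one row.
function predict(model, row) {
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.classCounts)) {
    const classCount = model.classCounts[label];
    // Sum log-probabilities instead of multiplying, to avoid underflow.
    let score = Math.log(classCount / model.total);
    row.forEach((value, col) => {
      const count = (model.featureCounts[label][col] || {})[value] || 0;
      const distinct = model.featureValues[col] ? model.featureValues[col].size : 1;
      // Add-one smoothing keeps unseen values from zeroing out the score.
      score += Math.log((count + 1) / (classCount + distinct));
    });
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Made-up toy data: two categorical features per row.
const trainingRows = [['low', 'high'], ['low', 'med'], ['high', 'low'], ['high', 'med']];
const trainingLabels = ['acc', 'acc', 'unacc', 'unacc'];
const model = trainNaiveBayes(trainingRows, trainingLabels);
console.log(predict(model, ['low', 'high']));
```

The "naive" part is the assumption that features are independent given the class, which is what lets the score be a simple sum of per-column terms.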

In Closing

Hackathons are a great way to instill that sense of urgency necessary to ship projects.