What Tools Do Elite Data Scientists Use?

This post was published on the now-closed HuffPost Contributor platform.

What is the recommended tool set of a data scientist at your level? originally appeared on Quora - the knowledge sharing network where compelling questions are answered by people with unique insights.

Answer by DJ Patil, U.S. Chief Data Scientist, on Quora:

What is the recommended tool set of a data scientist at my level? This is one of the questions I get most often. For me, it depends on what kind of problem you're working on. If it is the early part of the problem, then I'd focus on iteration speed. Go fast. This means very fast scripting and visualization. This can be done in lots of today's products. Once you know what you want to do, then it's time to bring in the heavier tool sets that allow you to scale a problem.

I remember when I first started working with open weather data from NOAA, where I used shell scripts combined with Matlab (this was in the days before Python, etc.). I also did some of the more mathematical work in Mathematica. This allowed me to "rip" through the data fast, effectively trying out lots of different ideas on a platform where I could view the data quickly. Over time, we figured out what we wanted to study (the Maryland Ensemble Kalman Filter), and then we had to go to MPI with Fortran.

When I'm looking at data for the first time, I just like to look at the data to get a sense of order of magnitude, variability, text, etc. Then I like to plot lots and lots of histograms. This helps me start thinking about interesting questions to ask. If it is time series data, I always like to go to the frequency domain (even in spam problems, to see how often the spam is being sent/received).
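As a rough illustration of that first look, here is a minimal Python sketch using NumPy. It is not the author's own code; the function names `magnitude_histogram` and `dominant_period` are hypothetical, and the sketch assumes the time series is evenly sampled. The first function counts values by order of magnitude, and the second moves a series into the frequency domain and reports the period of its strongest cycle.

```python
import numpy as np


def magnitude_histogram(values):
    """Count nonzero values by order of magnitude (floor of log10 of |value|).

    Hypothetical helper for a quick first look; zeros are skipped because
    log10(0) is undefined.
    """
    values = np.asarray(values, dtype=float)
    nonzero = np.abs(values[values != 0])
    exponents = np.floor(np.log10(nonzero)).astype(int)
    orders, counts = np.unique(exponents, return_counts=True)
    return dict(zip(orders.tolist(), counts.tolist()))


def dominant_period(samples, spacing=1.0):
    """Return the period of the strongest cycle in an evenly sampled series.

    Assumption: samples are evenly spaced, `spacing` time units apart.
    """
    samples = np.asarray(samples, dtype=float)
    # Subtract the mean so the zero-frequency term does not dominate.
    spectrum = np.abs(np.fft.rfft(samples - samples.mean()))
    freqs = np.fft.rfftfreq(samples.size, d=spacing)
    # Skip index 0 (zero frequency) when looking for the peak.
    peak = int(np.argmax(spectrum[1:])) + 1
    return 1.0 / freqs[peak]
```

For example, calling `dominant_period` on hourly readings with a daily cycle should return a value close to 24, and the same idea could be applied to counts of spam messages per hour to see whether they arrive on a schedule. For event data such as individual spam timestamps, the events would first need to be binned into evenly spaced counts, which this sketch does not do.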

One of the coolest things now is the number of awesome packages out there: SciPy, TensorFlow, etc. These are just starting to scratch the surface of the next great iteration of toolkits. Combined with D3, this reminds me of the stories I used to hear at Los Alamos about when LINPACK came out (true story: Stewart failed me in numerical analysis :) ). I think we'll look back in two or three years and see a whole new level of tooling and systems to help make looking at data go faster than ever!

