The Definitive Q&A for Aspiring Data Scientists

Alex Woodie of Datanami recently asked me five questions for an article he was preparing, “So You Want To Be A Data Scientist.” He used a few snippets from my answers. For aspiring data scientists of all ages, I provide here the full, unabridged version of my answers, which includes additional advice and may help you even more to achieve your goal. (Note: I paraphrase Alex’s original questions in quotes below.)

1. “What is the number one piece of advice you give to aspiring data scientists?”

My number one piece of advice is always to follow your passions first. Know what you are good at and what you care about, and pursue that. You might be good at math, or programming, or data manipulation, or problem solving, or communications (data journalism), or something else. You can do that flavor of data science within the context of any domain: scientific research, government, media communications, marketing, business, healthcare, finance, cybersecurity, law enforcement, manufacturing, transportation, and more. As a successful data scientist, you can begin and end your day counting your blessings that you are living your dream by solving real-world problems with data. I saw a quote recently that summarizes this: "If you think your scarce data science skills could be better used elsewhere, be bold and make the move." (Reference).

2. “What are the most important skills for an aspiring data scientist to acquire?” 

There are many skills under the umbrella of data science, and we should not expect any single person to be a master of them all. The best solution to the data science talent shortage is a team of data scientists. So I suggest that you become an expert in two or more skill areas, but also have a working knowledge of the others. Those skills include machine learning (data mining), information retrieval, statistics, data and information visualization, databases (modeling, organizing, querying), data structures (including indexing schemes), programming and big data tools (Python, R, SAS, Hadoop, Spark), graph/network analysis, natural language processing, optimization, and modeling & simulation.

3. “Is it better for a person to stay in school and enroll in a graduate program, or is it better to acquire the skills on-the-job?”  

This is a tough question. Academically, the "sweet spot" is a Master's degree, since it is an advanced professional degree that is completely sufficient in most work environments (though a PhD is required for a research appointment in academic institutions or research labs). These days, more and more organizations are willing to hire data scientists who have some experience and a little course work but no advanced degree. The degree will eventually be very important for career advancement (perhaps most importantly an MBA, which now includes business analytics), so don't avoid getting your degree. It just doesn't have to come before your first data science job.

4. “For someone who stays in school, do you recommend that they enroll in a program tailored toward data science, or would they get the requisite skills in a ‘hard science’ program such as astrophysics (like you)?”

The good news for physics, biology, astronomy, chemistry, and other science students is that they can easily translate their science skills into a data science profession. So, that's a valuable and valid consideration that matches my earlier advice: follow your passion first! But, if you are starting out now with the vision of being a professional data scientist, then definitely get into a Master's degree program in that field. The degree programs come with a variety of emphases: Data Science, Data Analytics, Big Data Engineering, Business Analytics, Healthcare Analytics, Machine Learning, Statistics, Operations Research, Decision Sciences, Computational Science & Informatics, and a few more. Find the degree program whose course requirements are consistent with your strengths, goals, and desired skills.

5. “Do you see advances in analytic packages replacing the need for some of the skills that data scientists have traditionally had, such as programming skills (Python, Java, etc.)?”

There is a difference between skills, talents, and aptitudes. Agility in skills is enabled by a life-long commitment to developing your talents and aptitudes. The aptitudes are invariant to changes in software packages: being curious, creative, inquisitive, and innovative, and being a problem-solver, communicator, lifelong learner, and risk-taker. There are also basic talents that are independent of specific software packages: statistical literacy, data literacy, computational literacy, machine learning algorithms, data wrangling, data cleaning, data storytelling, and subject matter expertise (in some discipline). Consequently, armed with these talents and aptitudes, the agile data scientist can learn and apply new software packages, new programming skills, and new approaches that are created by the brilliant analytic minds in numerous organizations (commercial, government, and academic). Therefore, advances in analytic packages will not replace the need for data scientists (as some folks have predicted), but these advances will definitely replace the need for some of the data scientist's skills (such as Java), though not all of them. I think that we all will need to know a programming language (Python, R, or SAS) and also SQL for the foreseeable future.

In summary, I hope that these answers are useful to you as you prepare for your data science journey. And don’t forget to check out the amazing and free Hadoop training program at MapR, which also covers several other critical big data analytics skills (including multiple HBase and Drill courses) for the successful data scientist.


