What is a data scientist? I can draw on my own experience in the field for some inspiration. In my mind, a data scientist has knowledge in the following four areas:
1) Programming:
I first entered the field of data science by learning how to code. After a year and a half of living in New York City post-college, I quit my job with a desire to enter the field of data science. I enrolled in a 3-month bootcamp dedicated to learning Python, backend web development, and data science. This first experience showed me that a big part of data science is understanding how to put ideas into action through programming. That can mean implementing a machine learning algorithm or creating an R Shiny dashboard to present a data analysis. Learning how to program opened up a lot of doors to learn data analysis…
2) Data Analysis:
Data analysis is the process of surfacing data and exploring it with any number of methods to derive insights that would not be apparent from simply looking at the raw data. This often involves data visualization, simple statistical methods such as regression, describing relationships between variables, and summarizing data. A data scientist must be able to gain a good understanding of the contents of a data set through this exploration. Sometimes, this leads directly to a presentation, if your goal is simply to analyze the characteristics of that particular data set. However, we often want to do more interesting things beyond exploratory analysis, which is where the next two skills come into play.
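To make the idea of summarizing data and describing a relationship between two variables a little more concrete, here is a minimal sketch in Python using only the standard library. The numbers are made up for illustration, and the names `ad_spend`, `units_sold`, `summarize`, and `fit_line` are my own hypothetical choices, not part of any particular library.

```python
import statistics

# Hypothetical data: weekly ad spend (in $1,000s) and units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 15.0, 19.0, 24.0, 25.0]


def summarize(values):
    """Return a few basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = summarize(units_sold)
slope, intercept = fit_line(ad_spend, units_sold)

print(summary)
print(f"units_sold ~ {intercept:.2f} + {slope:.2f} * ad_spend")
```

In practice this kind of exploration is usually done with dedicated data analysis and plotting tools, but the underlying questions are the same: what does a typical value look like, how spread out is the data, and how do two variables move together?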
3) Statistics:
The field of statistics allows us to bring mathematical rigor to uncertainty. Often, the first step in applying a statistical method is to start with a hypothesis - your problem statement. Then, we can apply certain methods to determine whether some intervention or interaction has had a significant effect on the data we observe. Examples include the effect of an ad campaign on sales, or the effect of a drug treatment on patient outcomes.
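As one hedged illustration of the ad campaign example, the sketch below runs an exact permutation test, which is just one of many possible methods and is my own choice for simplicity. The hypothesis is that the campaign had no effect on sales; the test asks how often a random relabeling of the observations produces a difference in means at least as large as the one observed. The sales figures and the names `sales_before`, `sales_after`, and `permutation_p_value` are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical daily sales before and after an ad campaign.
sales_before = [20, 22, 19, 24, 21]
sales_after = [25, 27, 26, 30, 24]


def permutation_p_value(control, treated):
    """One-sided exact permutation test for a difference in means.

    Returns the observed difference (treated minus control) and the
    fraction of all possible relabelings whose difference is at least
    as large as the observed one.
    """
    pooled = control + treated
    n_treated = len(treated)
    observed = mean(treated) - mean(control)

    at_least_as_extreme = 0
    total = 0
    # Enumerate every way of choosing which observations count as "treated".
    for chosen in combinations(range(len(pooled)), n_treated):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        diff = mean(group_a) - mean(group_b)
        # Small tolerance guards against floating point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1

    return observed, at_least_as_extreme / total


observed_diff, p_value = permutation_p_value(sales_before, sales_after)
print(f"observed difference in means: {observed_diff:.2f}")
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would be unlikely if the campaign truly had no effect. Enumerating every relabeling is only feasible for small samples like this one; with more data, one would typically sample random relabelings or use a parametric test instead.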
4) Machine learning:
Machine learning largely focuses on making predictions from data. It is a large field that encompasses supervised and unsupervised learning, regression and classification, linear and non-linear methods, and even deep learning. It is a subfield of the broader field of Artificial Intelligence. A data scientist is often tasked with creating intelligent systems from data, such as a classifier - or more complex programs built with deep learning, such as a translator or product recommender.
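To give a feel for what a classifier looks like in code, here is a small sketch of a nearest-centroid classifier, one of the simplest supervised learning methods. I picked it only because it fits in a few lines; it is not a claim about what is used in practice. The customer data, the labels, and the function names `fit_centroids` and `predict` are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical training data: (store visits per month, average basket size in $).
train_points = [(1, 10), (2, 12), (1, 8), (8, 40), (9, 45), (10, 38)]
train_labels = ["casual", "casual", "casual", "loyal", "loyal", "loyal"]


def fit_centroids(points, labels):
    """Compute the mean point (centroid) of each labeled class."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, point):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


centroids = fit_centroids(train_points, train_labels)
print(predict(centroids, (2, 11)))
print(predict(centroids, (9, 42)))
```

The "learning" step here is just averaging the training points for each class, and the "prediction" step is a distance comparison. More sophisticated models follow the same fit-then-predict pattern, with far more flexible ways of drawing the boundary between classes.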
Comparisons Between Career Fields
A statistician differs in that they likely focus more on #3. A data analyst, by comparison, will focus more on #1 and #2, while a machine learning engineer will focus on productionizing #4. A data scientist is comfortable in all of these areas, though they may not have as much depth (especially research-wise) in any particular domain. I began my career as a data analyst, then learned machine learning, and am rounding out my experience with this Master’s program in Statistics, with the goal of being a more well-rounded data scientist.