Help wanted: Data scientist
By Thomas H. Davenport
"Big data" has excited many executives with its potential to transform businesses and organizations. The concept refers to data that is either too voluminous, too unstructured, or from too many diverse sources to be managed and analyzed through traditional means. The related concept of "high-performance analytics" (HPA) involves using new technologies to dramatically accelerate the speed of large-scale analytical projects.
But prospering from big data and HPA takes more than simply employing new technologies. Firms will need to build new capabilities to deal with this new resource, even if they are already experienced users of analytics.
One of the most important of these capabilities is a staff of "data scientists" who do the day-to-day work of big data management and analysis. I interviewed more than 30 of them in a recent study, and here's what you need to know:
What is a data scientist?
GE, for example, expects to hire more than 400 computer and data scientists at its new Global Software and Analytics Center in San Francisco to focus on big data for industrial products, such as locomotives, turbines, jet engines and large energy generation facilities.
Data scientists have somewhat different roles from traditional quantitative analysts. Whereas traditional analysts typically use analytics on internally generated data to support internal decisions, the focus of many data scientists is on customer-facing products and processes, where they help to generate products, features, and value-adding services. For example:
Given this product-centric focus, data scientists are most likely to be in product development or marketing organizations. Some work in the reporting structure of the chief technology officer (CTO). Those who report to CTOs are likely to work on tools that make data science easier and more productive.
Data scientists who focus on HPA applications don't necessarily need to understand how to process unstructured data, but they do need to understand how analytical work can be divided across multiple parallel servers. They should be able to explore a variety of ways to use the extra time from HPA to refine their models. In addition, they need to try to accelerate decision speeds to match the much faster cycle times of data analysis.
They're not just programmers; many refer to data scientists' computational skills as "hacking" – bending technology to do their bidding in unusual ways. The specific technologies on which data scientists focus include:
In addition to these technical skills, data scientists also need the attributes previously necessary for analytical professionals, including mathematical and statistical skills, business acumen, and the ability to communicate effectively with customers, product managers and decision makers.
Of course, the combination of these skills is difficult to find in one person, so some companies have created data science teams that together embody this collection of skills.
Finding data scientists
Companies are using a variety of other approaches to develop and hire data scientists. EMC has determined that the availability of data scientists will be an important gating factor in its own big data efforts and those of its customers, so it has created a "Data Science and Data Analytics" training program for its employees and customers. Some large consulting firms are beginning to offer data scientists to their clients. And a Silicon Valley program, the Insight Data Science Fellows Program, takes scientists for six weeks and teaches them the skills they need to become data scientists.
Another problem is that while data scientists combine technology-intensive “data wrangling” and analytics, there is often more of the former than the latter – “big data often equals small math.” The amount of effort necessary to deal with large volumes of unstructured data sometimes means there are fewer resources and less time left over for detailed statistical analysis.
Because of the difficulties of extracting and structuring data, current data scientists also often face issues of relatively low productivity. The next generation of data scientists will undoubtedly be more productive and will use tools that make common tasks much easier.
Just as traditional quantitative analysis on "small data" didn't happen without professional and semi-professional analysts, big data can't be analyzed without data scientists.
Such a person can not only convert unstructured data to structured data and perform quantitative analysis on it, but also help an organization think about what data sources to investigate, what customers really need in data and analysis requirements, and how best to incorporate big data-based products and services into an effective business model.
The many executives who are excited about the potential of big data and high-performance analytics for their organizations need to realize that putting big data to work requires a special breed of analyst. Even if an organization isn't quite ready to aggressively pursue big data opportunities yet, it's worth thinking now about how and when it will acquire the most scarce and valuable resource in big data – the data scientists.
A degree in BIG DATA?
This story appears in the Fourth Quarter 2012 issue of