Data Science

What it is and why it matters

Data science is a multidisciplinary field that broadly describes the use of data to generate insight. Unlike more specialized data-related fields, such as data mining or data engineering, data science encompasses the complete life cycle of translating raw data into usable information and applying it for productive ends in a wide variety of applications.

The evolution of data science

When tracing the origin of data science, many think back to 1962, when mathematician John Tukey hinted at the discipline in his seminal paper “The Future of Data Analysis.” In it, he described the existence of an “unrecognized science,” one that involved learning from data.

It’s more helpful, however, to examine data science in the modern world. The advent of big data – made possible by leaps in processing and storage capabilities – has brought about unprecedented opportunities for organizations to reveal hidden patterns in data and use this insight to improve decision making. But to do so, they must first collect, process, analyze and share that data. Managing this data life cycle is the essence of data science.

Today, data science is ubiquitous in the business world – and beyond. So much so that Harvard Business Review dubbed data scientist “the sexiest job of the 21st century.” If data scientists are the practitioners, data science is the set of techniques and technologies they use.

Manufacturing

Deploying the best model into production

As a global manufacturer of construction material, USG must produce high-quality products at affordable prices. By deploying SAS® Model Manager, the Sheetrock producer can single out the optimal formulation of raw materials and adjust its production process in near-real time to achieve that goal.

Data science in today’s world

Get a glimpse into the modern world of data science.

The Data Science Experience

Explore real examples of data science in action with videos, articles and on-demand webinars from citizen data scientists.

Drive Analytic Innovation Through SAS® and Open Source Integration

This e-book provides guidance for innovating in the modern organization by integrating open source software with SAS in data science.

Data Science and the Art of Persuasion

This summary of a Harvard Business Review webinar describes what data science teams must do to achieve greater success and the skills that data scientists should develop to improve their overall effectiveness.

Data Science Resource Hub

This resource center is chock-full of everything you need to supplement your training as a data scientist. It includes videos, articles, webinars and other learning materials. Practical topics include data storytelling, scientific research and nailing your data science interview.

Gartner’s Magic Quadrant for Data Science

Curious how the various data science platforms stack up? Explore Gartner’s Magic Quadrant for Data Science and Machine Learning Platforms to compare the top 20 offerings.

Who’s using data science?

You’d be hard-pressed to find an industry that doesn’t infuse data science into critical business functions. Here are a few of the most interesting use cases.

Health care

The growing demand for value-based care and faster drug discovery cycles has sped up the adoption of data science in health care. In the field of medical imaging alone, AI and analytics now help enhance diagnostic accuracy, augment physicians and radiologists, and improve patient care delivery.

Retail

To compete with the Amazons of the world, retailers must be able to rapidly fulfill customer needs using data science technologies such as predictive analytics. Doing so can help them forecast demand, manage fluctuations in it, and identify trends and relationships across the supply chain.

Public Sector

As the sheer volume and complexity of decisions rise across governments, agencies are turning to data science to improve the accuracy, fairness and speed of those decisions. Read how governments worldwide are using analytics to make millions of vital decisions each day.

Banking

For banks, data science is more than a trend – it’s how business gets done. With a plethora of use cases from fraud detection to customer intelligence to risk management, data science is now the driving force behind critical business decisions and a competitive differentiator in a crowded financial landscape.

Data science outputs

To understand the many ways data science can affect an organization, it’s helpful to examine some of the common data science goals and deliverables. 

  • Prediction (when an asset will fail).
  • Classification (new or existing customer).
  • Recommendations (if you like that, try this).
  • Anomaly detection (fraudulent purchases).
  • Recognition (image, text, audio, video, etc.).
  • Actionable insights (dashboards, reports, visualizations).
  • Automated processes and decision making (credit card approval).
  • Scoring and ranking (credit score).
  • Segmentation (targeted marketing).
  • Optimization (manufacturing improvements).
  • Forecasts (predicting sales and revenue).
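
As a rough illustration of one item from the list above, anomaly detection, here is a minimal Python sketch that flags purchases far from the typical amount using a z-score. The function name, the sample amounts and the threshold of 2.0 are assumptions made for this example; production fraud detection typically relies on far richer models.

```python
from statistics import mean, stdev

def flag_anomalies(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold (hypothetical helper)."""
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Illustrative purchase amounts; the last one is deliberately unusual.
purchases = [25.0, 30.0, 27.5, 22.0, 31.0, 28.0, 26.5, 950.0]
suspicious = flag_anomalies(purchases)
print(suspicious)
```

With this sample data, only the 950.0 purchase sits more than two standard deviations from the mean, so it is the only value flagged.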

“If you’re looking to augment your data science work with a better grasp of choosing, deploying and managing models, then exploring more training in AI and ML is ideal.”

Ronald van Loon, Principal Analyst, CEO of Intelligent World

Composite AI

Most AI projects today rely on multiple data science technologies. According to Gartner, using a combination of different AI techniques to achieve the best result is called “composite AI.”

With composite AI, you start with the problem and then apply the right data and tools to solve the problem. This often includes using a combination of data science techniques, including ML, statistics, advanced analytics, data mining, forecasting, optimization, natural language processing, computer vision and others. 

Composite AI is increasingly synonymous with data science. That’s because choosing the right AI technology to use is not always straightforward. It requires a deep understanding of the business problem you’re trying to solve and the data available to solve it. This combination of business and technology expertise is the essence of data science. 

How data science works

Data science involves the use of multiple tools and technologies to derive meaningful information from structured and unstructured data. Here are some of the common practices used by data scientists to transform raw information into business-changing insight.

Data management is the practice of managing data to unlock its potential for an organization. Managing data effectively requires having a data strategy and reliable methods to access, integrate, cleanse, govern, store and prepare data for analytics. 
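
To make the “cleanse” and “prepare” steps more concrete, the following Python sketch normalizes names, drops incomplete or non-numeric rows, and removes duplicates. The field names `customer` and `amount`, the helper name and the sample records are all hypothetical; real data management pipelines would also cover access, integration and governance.

```python
def cleanse(records):
    """Trim whitespace, normalize case, and drop duplicate or unusable rows."""
    seen = set()
    cleaned = []
    for row in records:
        name = row.get("customer", "").strip().title()
        raw_amount = str(row.get("amount", "")).strip()
        if not name or not raw_amount:
            continue  # skip incomplete rows
        try:
            amount = float(raw_amount)
        except ValueError:
            continue  # skip rows whose amount is not numeric
        key = (name, amount)
        if key in seen:
            continue  # skip exact duplicates after normalization
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned

raw = [
    {"customer": "  alice smith ", "amount": "120.50"},
    {"customer": "Alice Smith", "amount": "120.50"},  # duplicate after normalization
    {"customer": "bob jones", "amount": ""},          # missing amount
    {"customer": "Carla Diaz", "amount": "n/a"},      # non-numeric amount
    {"customer": "dan wu", "amount": "75"},
]
print(cleanse(raw))
```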

Machine learning automates analytical model building. With unsupervised machine learning, the technology uses methods from neural networks, statistics, operations research and physics to find hidden insights in data without being explicitly programmed where to look or what to conclude.
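
One common unsupervised technique is k-means clustering, which groups points without being told what the groups are. The sketch below is a minimal pure-Python version under simplifying assumptions: 2-D points, a fixed number of iterations, and a deterministic choice of starting centroids. The function name and sample data are illustrative only.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    # Deterministic start: take k evenly spaced points as initial centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels

# Two visibly separate groups; the algorithm is never told which is which.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this sample, the first three points end up in one cluster and the last three in the other.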

A neural network is a kind of machine learning inspired by the workings of the human brain. It’s a computing system made up of interconnected units (like neurons) that processes information by responding to external inputs and relaying information between units.
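
The idea of interconnected units relaying information can be sketched in a few lines of Python. This example only runs a forward pass through a tiny network; the weights are hand-picked, hypothetical values, and no training takes place.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each unit sums its weighted inputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def forward(inputs, network):
    """Relay the signal through each layer in turn."""
    signal = inputs
    for weights, biases in network:
        signal = layer(signal, weights, biases)
    return signal

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden units -> 1 output.
network = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),
    ([[1.2, -0.7]], [0.05]),
]
print(forward([1.0, 0.0], network))
```

In practice the weights are learned from data rather than set by hand, usually with a dedicated library.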

Deep learning uses huge neural networks with many layers of processing units, taking advantage of advances in computing power and improved training techniques to learn complex patterns in large amounts of data. Common applications include image and speech recognition.

Computer vision relies on pattern recognition and deep learning to recognize what’s in a picture or video. When machines can process, analyze and understand images, they can capture images or videos in real time and interpret their surroundings.
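
A basic building block behind this kind of pattern recognition is sliding a small filter over an image to highlight features such as edges. The sketch below applies a Sobel kernel to a tiny made-up grayscale image in pure Python. The function name and the image values are assumptions for illustration, and the operation shown is the cross-correlation that deep learning libraries commonly call convolution.

```python
def apply_filter(image, kernel):
    """Slide a kernel over a grayscale image (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Sobel kernel that responds to left-to-right changes in brightness.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Tiny 4x5 image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]
for row in apply_filter(image, SOBEL_X):
    print(row)
```

The response is zero over the uniform dark area and large wherever the filter window straddles the dark-to-bright boundary.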

Natural language processing is the ability of computers to analyze, understand and generate human language, including speech. The next stage of NLP is natural language interaction, which allows humans to communicate with computers using everyday language to perform tasks.
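
A first step in many NLP pipelines is turning raw text into tokens and counting them, often called a bag-of-words representation. The following Python sketch shows that step only; the function names and the simple regular expression are assumptions, and real systems use far more sophisticated tokenizers and language models.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text):
    """Count how often each token occurs."""
    return Counter(tokenize(text))

sentence = "Data science turns raw data into insight."
print(tokenize(sentence))
print(bag_of_words(sentence).most_common(1))
```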

Data visualization is the presentation of data in a pictorial or graphical format so it can be easily analyzed. This is especially important to enable organizations to make business decisions based on the output of data science efforts. 
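
Even a plain-text chart shows why a graphical format is easier to read than a table of numbers. This Python sketch renders a hypothetical set of quarterly sales figures as text bars; the function name and data are illustrative, and it assumes at least one positive value. Real projects would normally use a plotting library or a BI tool.

```python
def bar_chart(values, width=20):
    """Render label/value pairs as text bars scaled to the largest value."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<8}{bar} {value}")
    return "\n".join(lines)

# Hypothetical quarterly sales figures.
sales = {"Q1": 120, "Q2": 180, "Q3": 90, "Q4": 240}
print(bar_chart(sales))
```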

Popular programming languages for data science

Just as humans use a wide variety of languages, the same is true for data scientists. With hundreds of programming languages available today, choosing the right one comes down to what you’re trying to accomplish. Here’s a look at some of the top data science programming languages. 

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as a scripting or glue language to connect existing components.

R is a free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS) or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.
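
To show how SQL expresses relations among entities, the sketch below runs a small join and aggregation from Python using the standard library’s `sqlite3` module and an in-memory database. The table names, columns and rows are invented for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0)],
)

# Join the two related tables and total the spend per customer.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
    """
).fetchall()
print(rows)
conn.close()
```

The query links each order to its customer through `customer_id`, then sums the amounts per customer, which is the kind of structured, relational question SQL is designed for.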

SAS is a programming language trusted by hundreds of thousands of data scientists worldwide. The SAS Viya platform allows you to combine the benefits of every technology system and programming language in your organization for better analytical model development and deployment. Read how SAS Viya can help turn your modeling melting pot into smarter business decisions.

Next Steps

If you want to learn data science, SAS is the place to do it.

Data science solutions

SAS Viya offerings feature robust data management, visualization, advanced analytics and model management capabilities to accelerate data science at any organization.

SAS Visual Data Mining and Machine Learning enables you to solve the most complex analytical problems with a single, integrated, collaborative solution – now with its own automated modeling API.

SAS Visual Analytics provides you with the means to quickly prepare reports interactively, explore your data through visual displays and perform your analyses on a self-service basis.

These solutions and more are powered by SAS Viya, SAS’ market-leading data science platform that runs on a modern, scalable, cloud-enabled architecture.