What it is and why it matters
Data science is a multidisciplinary field that broadly describes the use of data to generate insight. Unlike more specialized data-related fields, such as data mining or data engineering, data science encompasses the complete life cycle of translating raw data into usable information and applying it for productive ends in a wide variety of applications.
The evolution of data science
When tracing the origin of data science, many think back to 1962, when mathematician John Tukey hinted at the discipline in his seminal paper The Future of Data Analysis. In it, he described the existence of an “unrecognized science,” one which involved learning from data.
It’s more helpful, however, to examine data science in the modern world. The advent of big data – made possible by leaps in processing and storage capabilities – has brought about unprecedented opportunities for organizations to reveal hidden patterns in data and use this insight to improve decision making. But to do so, they must first collect, process, analyze and share that data. Managing this data life cycle is the essence of data science.
Today, data science is ubiquitous in the business world and beyond, so much so that Harvard Business Review dubbed data scientist the sexiest job of the 21st century. If data scientists are the practitioners, data science is the set of techniques and technologies they use.
Deploying the best model into production
As a global manufacturer of construction material, USG must produce high-quality products at affordable prices. By deploying SAS® Model Manager, the Sheetrock producer can single out the optimal formulation of raw materials and adjust its production process in near-real time to achieve that goal.
Who’s using data science?
You’d be hard-pressed to find an industry that doesn’t infuse data science into critical business functions.
Data science outputs
To understand the many ways data science can affect an organization, it’s helpful to examine some of the common data science goals and deliverables.
- Prediction (when an asset will fail).
- Classification (new or existing customer).
- Recommendations (if you like that, try this).
- Anomaly detection (fraudulent purchases).
- Recognition (image, text, audio, video, etc.).
- Actionable insights (dashboards, reports, visualizations).
- Automated processes and decision making (credit card approval).
- Scoring and ranking (credit score).
- Segmentation (targeted marketing).
- Optimization (manufacturing improvements).
- Forecasts (predicting sales and revenue).
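To make one of these goals concrete, here is a minimal sketch of anomaly detection using a robust statistical rule (the modified z-score, based on the median and the median absolute deviation). It is an illustration only, not a production fraud model. The purchase amounts are made up, and the function name `find_anomalies` and the 3.5 cutoff are assumptions chosen for this example.

```python
from statistics import median

def find_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Hypothetical helper for illustration; 3.5 is a commonly used cutoff,
    not a value taken from the article.
    """
    center = median(values)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be scored
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]

# Made-up card purchases; one amount is far outside the usual range.
purchases = [25.0, 30.5, 27.8, 22.1, 29.9, 31.2, 26.4, 950.0]
suspicious = find_anomalies(purchases)
print(suspicious)
```

A median-based rule is used here instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in small samples.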
“If you’re looking to augment your data science work with a better grasp of choosing, deploying and managing models, then exploring more training in AI and ML is ideal.”
Ronald van Loon, Principal Analyst, CEO of Intelligent World
Most AI projects today rely on multiple data science technologies. According to Gartner, using a combination of different AI techniques to achieve the best result is called “composite AI.”
With composite AI, you start with the problem and then apply the right data and tools to solve the problem. This often includes using a combination of data science techniques, including ML, statistics, advanced analytics, data mining, forecasting, optimization, natural language processing, computer vision and others.
Composite AI is increasingly synonymous with data science. That’s because choosing the right AI technology to use is not always straightforward. It requires a deep understanding of the business problem you’re trying to solve and the data available to solve it. This combination of business and technology expertise is the essence of data science.
How data science works
Data science involves the use of multiple tools and technologies to derive meaningful information from structured and unstructured data. Here are some of the common practices used by data scientists to transform raw information into business-changing insight.
Data management is the practice of managing data to unlock its potential for an organization. Managing data effectively requires having a data strategy and reliable methods to access, integrate, cleanse, govern, store and prepare data for analytics.
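As a rough illustration of the cleanse and prepare steps, the sketch below normalizes names, drops rows with missing or invalid amounts, and removes duplicate records. The record layout, the field names (`id`, `name`, `amount`) and the `cleanse` function are hypothetical, chosen only for this example.

```python
def cleanse(records):
    """Return cleaned copies of the records (hypothetical helper).

    Rows are skipped when the id is missing, the id was already kept,
    or the amount cannot be converted to a number.
    """
    seen_ids = set()
    cleaned = []
    for row in records:
        customer_id = row.get("id")
        if customer_id is None or customer_id in seen_ids:
            continue
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # missing (None) or non-numeric amount
        seen_ids.add(customer_id)
        name = (row.get("name") or "").strip().title()
        cleaned.append({"id": customer_id, "name": name, "amount": amount})
    return cleaned

# Made-up raw input with typical quality problems.
raw = [
    {"id": 1, "name": "  alice SMITH ", "amount": "19.99"},
    {"id": 2, "name": "bob jones", "amount": None},
    {"id": 1, "name": "Alice Smith", "amount": "19.99"},
    {"id": 3, "name": "carol WU", "amount": "n/a"},
    {"id": 4, "name": "dan lee", "amount": 42},
]
prepared = cleanse(raw)
print(prepared)
```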
Machine learning automates analytical model building. With unsupervised machine learning, the technology uses methods from neural networks, statistics, operations research and physics to find hidden insights in data without being explicitly programmed where to look or what to conclude.
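One widely used unsupervised technique is k-means clustering, which groups similar points without being told what the groups are. The sketch below is a bare-bones version for illustration, assuming a tiny made-up data set and a simplistic initialization (the first k points); the function name `k_means` is our own, and real projects would normally use a tested library implementation.

```python
import math

def k_means(points, k, iterations=10):
    """Cluster points into k groups; returns (labels, centroids).

    Simplified sketch: the initial centroids are just the first k points.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Made-up 2-D measurements that form two visibly separate groups.
data = [(1.0, 1.1), (8.0, 8.2), (1.2, 0.9), (7.9, 8.1), (0.8, 1.0), (8.3, 7.8)]
labels, centroids = k_means(data, k=2)
print(labels, centroids)
```

No labels are supplied anywhere in the input; the grouping emerges from the distances between points alone.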
A neural network is a kind of machine learning inspired by the workings of the human brain. It’s a computing system made up of interconnected units (like neurons) that processes information by responding to external inputs and relaying information between units.
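The idea of units relaying information to one another can be sketched with a tiny two-layer network. In this illustrative example the weights are hand-picked (not learned from data) so that the network computes XOR, a function no single unit can represent on its own. All names and values here are assumptions made for the sketch.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each unit sums its weighted inputs, adds its bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]

# Hand-picked parameters (assumption): the hidden units behave roughly like
# OR and NAND, and the output unit behaves roughly like AND.
HIDDEN_WEIGHTS = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_BIASES = [-10.0, 30.0]
OUTPUT_WEIGHTS = [[20.0, 20.0]]
OUTPUT_BIASES = [-30.0]

def forward(inputs):
    """Relay the external inputs through the hidden layer to the output unit."""
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward([a, b])[0]))
```

In practice the weights are not set by hand; they are adjusted automatically during training.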
Deep learning uses huge neural networks with many layers of processing units, taking advantage of advances in computing power and improved training techniques to learn complex patterns in large amounts of data. Common applications include image and speech recognition.
Computer vision relies on pattern recognition and deep learning to recognize what’s in a picture or video. When machines can process, analyze and understand images, they can capture images or videos in real time and interpret their surroundings.
Natural language processing is the ability of computers to analyze, understand and generate human language, including speech. The next stage of NLP is natural language interaction, which allows humans to communicate with computers using everyday language to perform tasks.
Data visualization is the presentation of data in a pictorial or graphical format so it can be easily analyzed. This is especially important to enable organizations to make business decisions based on the output of data science efforts.
Popular programming languages for data science
Just as humans use a wide variety of languages, so do data scientists. With hundreds of programming languages available today, choosing the right one comes down to what you’re trying to accomplish. Here’s a look at some of the top data science programming languages.
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as a scripting or glue language to connect existing components.
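A short sketch shows what built-in data structures and dynamic typing look like in practice. The function name `describe` and the sample values are made up for illustration; `Counter` comes from the standard library rather than being a built-in type.

```python
from collections import Counter

def describe(values):
    """Summarize any iterable of numbers; no type declarations are needed."""
    items = list(values)
    return {
        "count": len(items),             # dict: named results
        "total": sum(items),
        "distinct": sorted(set(items)),  # set: removes duplicates
    }

# The same function accepts mixed ints and floats, or a range object.
summary = describe([3, 1.5, 3, 2])
range_summary = describe(range(4))

# Quick "glue" work: count words in a sentence with a few lines of code.
word_counts = Counter("data science turns raw data into insight".split())

print(summary)
print(range_summary)
print(word_counts.most_common(1))
```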
R is a free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS) or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.
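To show how SQL expresses relations among entities, the sketch below runs a small query from Python using the standard library’s `sqlite3` module and an in-memory database. The `customers` and `orders` tables and their contents are hypothetical, invented for this example.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0);
    """
)

# Join the two related tables and aggregate orders per customer.
rows = conn.execute(
    """
    SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total_spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
    """
).fetchall()
conn.close()

print(rows)
```

The query states what result is wanted (one row per customer with a count and a total) and leaves it to the database engine to decide how to compute it.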
SAS is a programming language trusted by hundreds of thousands of data scientists worldwide. The SAS Viya platform allows you to combine the benefits of every technology system and programming language in your organization for better analytical model development and deployment.
Data science solutions
SAS Viya data science offerings feature robust data management, visualization, advanced analytics and model management capabilities to accelerate data science at any organization.
SAS Visual Data Mining and Machine Learning enables you to solve the most complex analytical problems with a single, integrated, collaborative solution – now with its own automated modeling API.
These solutions and more are powered by SAS Viya, SAS’ market-leading data science platform that runs on a modern, scalable, cloud-enabled architecture.