Data Science

What it is and why it matters

Data science is a multidisciplinary field that broadly describes the use of data to generate insight. Unlike more specialised data-related fields, such as data mining or data engineering, data science encompasses the complete life cycle of translating raw data into usable information and applying it for productive ends in a wide variety of applications.

The evolution of data science

When tracing the origin of data science, many think back to 1962, when mathematician John Tukey hinted at the discipline in his seminal paper “The Future of Data Analysis.” In it, he described the existence of an “unrecognised science,” one which involved learning from data.

It’s more helpful, however, to examine data science in the modern world. The advent of big data – made possible by leaps in processing and storage capabilities – has brought about unprecedented opportunities for organisations to reveal hidden patterns in data and use this insight to improve decision making. But to do so, they must first collect, process, analyse and share that data. Managing this data life cycle is the essence of data science.

Today, data science is ubiquitous in the business world and beyond, so much so that Harvard Business Review dubbed data scientist the sexiest job of the 21st century. If data scientists are the practitioners, data science is the set of techniques and technologies they use.

Are you curious about what the top programming languages are in data science today?

Data science in today’s world

Get a glimpse into the modern world of data science.

The data science experience

Explore real examples of data science in action with videos, articles and on-demand webinars from citizen data scientists across many different industries.

Drive Analytic Innovation Through SAS® and Open Source Integration

This e-book provides guidance for innovating in the modern organisation by integrating open source software with SAS in data science.

A data scientist’s views on data literacy

In this article, data scientist and astrophysicist Kirk Borne takes a deep dive into what data literacy is, how to become data literate and, most importantly, how to use it.

Data Science Resource Hub

This resource centre is chock-full of everything you need to supplement your training as a data scientist. It includes videos, articles, webinars and other learning materials. Practical topics include data storytelling, scientific research and nailing your data science interview.

Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms

Curious how the various data science platforms stack up? Explore the Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms to compare the top 20 offerings.

Who’s using data science?

You’d be hard-pressed to find an industry that doesn’t infuse data science into critical business functions. Here are a few of the most interesting use cases.

Banking

For banks, data science is more than a trend – it’s how business gets done. With a plethora of use cases from fraud detection to customer intelligence to risk management, data science is now the driving force behind critical business decisions and a competitive differentiator in a crowded financial landscape.

Insurance 

In the insurance industry, data analysts can help their organizations harness data insights to make better decisions and improve business performance. Insurers that incorporate advanced data science capabilities are positioned to improve their underwriting profitability, gross written premium and net promoter score.

Public Sector

As the sheer volume and complexity of decisions rise across governments, agencies are turning to data science to improve the accuracy, fairness and speed of those decisions. Read how governments worldwide are using analytics to make millions of vital decisions each day.

Retail

To compete with the Amazons of the world, retailers must be able to rapidly fulfil customer needs using data science technologies such as predictive analytics. Doing so can help them forecast demand, manage fluctuations in that demand and identify correlations among trends across the supply chain.

Bridging the data science skills gap

The demand for advanced analytical skills has skyrocketed, leaving countries scrambling to bridge the talent gap. By using SAS® Education Analytical Suite and SAS® Viya®, North-West University is providing innovative data science education. This is transforming South Africa's workforce by helping students gain vital firsthand experience in problem formulation, business etiquette and writing, and value delivery.

Data science outputs

To understand the many ways data science can affect an organization, it’s helpful to examine some of the common data science goals and deliverables.

  • Prediction (when an asset will fail).
  • Classification (new or existing customer).
  • Recommendations (if you like that, try this).
  • Anomaly detection (fraudulent purchases).
  • Recognition (image, text, audio, video, etc.).
  • Practical insights (dashboards, reports, visualizations).
  • Automated processes and decision making (credit card approval).
  • Scoring and ranking (credit score).
  • Segmentation (targeted marketing).
  • Optimization (manufacturing improvements).
  • Forecasts (predicting sales and revenue).
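
As a rough illustration of one output from the list above, anomaly detection, the following sketch flags unusually large purchase amounts. It assumes a simple rule (the modified z-score based on the median absolute deviation, with the commonly used cut-off of 3.5); the function name and the sample amounts are hypothetical, and a production fraud model would use far richer features.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a measure of spread that outliers barely affect.
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread to measure against
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical purchase amounts; one is far larger than the rest.
purchases = [12.5, 9.99, 15.0, 11.25, 13.4, 950.0, 10.8]
suspicious = flag_anomalies(purchases)
print(suspicious)
```

Using the median rather than the mean keeps a single extreme purchase from distorting the baseline it is compared against.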

“If you’re looking to augment your data science work with a better grasp of choosing, deploying and managing models, then exploring more training in AI and ML is ideal.”
Ronald van Loon, Principal Analyst, CEO of Intelligent World


Composite AI

Most AI projects today rely on multiple data science technologies. According to Gartner, using a combination of different AI techniques to achieve the best result is called “composite AI.”

With composite AI, you start with the problem and then apply the right data and tools to solve the problem. This often includes using a combination of data science techniques, including ML, statistics, advanced analytics, data mining, forecasting, optimization, natural language processing, computer vision and others. 
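
To make the idea of combining techniques concrete, here is a minimal sketch that chains two of the techniques named above: a forecast feeds an optimisation step. Everything in it is an assumption for illustration only (the moving-average forecast, the cost and price figures, and the function names); it is not a description of any particular product.

```python
from statistics import mean

def forecast_next(demand_history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    return mean(demand_history[-window:])

def best_order_quantity(forecast, candidates, unit_cost=2.0, unit_price=5.0):
    """Pick the candidate quantity with the highest profit if demand equals the forecast."""
    def profit(quantity):
        sold = min(quantity, forecast)  # cannot sell more than is demanded
        return sold * unit_price - quantity * unit_cost
    return max(candidates, key=profit)

history = [90, 110, 100, 120, 110]      # hypothetical weekly demand
forecast = forecast_next(history)       # forecasting step
order = best_order_quantity(forecast, candidates=[80, 100, 110, 130])  # optimisation step
print(forecast, order)
```

The point of the sketch is the order of operations: the business problem (how much to order) comes first, and each technique is chosen to answer one part of it.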

Composite AI is increasingly synonymous with data science. That’s because choosing the right AI technology to use is not always straightforward. It requires a deep understanding of the business problem you’re trying to solve and the data available to solve it. This combination of business and technology expertise is the essence of data science. 

How data science works – and data science tools

Data science projects involve the use of multiple tools and technologies to derive meaningful information from structured and unstructured data. Here are some of the common practices data scientists use as part of the data science process to transform raw information into business-changing insight.

Computer vision relies on pattern recognition and deep learning to recognize what’s in a picture or video. When machines can process, analyze and understand images, they can capture images or videos in real time and interpret their surroundings.
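
A very small taste of pattern recognition in images is edge detection. The sketch below assumes a greyscale image stored as a list of rows of brightness values (0 to 255) and looks for a vertical edge by comparing neighbouring pixels. The image data, threshold and function names are hypothetical; real systems typically rely on deep learning rather than a hand-written rule like this.

```python
def horizontal_gradient(image):
    """Return absolute brightness differences between horizontally adjacent pixels."""
    return [
        [abs(row[col + 1] - row[col]) for col in range(len(row) - 1)]
        for row in image
    ]

def edge_columns(image, threshold=100):
    """Return the column indices where every row shows a jump above the threshold."""
    gradient = horizontal_gradient(image)
    width = len(gradient[0])
    return [
        col for col in range(width)
        if all(row[col] > threshold for row in gradient)
    ]

# A hypothetical 4x6 greyscale image: dark on the left, bright on the right.
image = [
    [10, 12, 11, 240, 242, 241],
    [9, 11, 10, 238, 240, 243],
    [11, 10, 12, 241, 239, 240],
    [10, 13, 11, 239, 241, 242],
]
edges = edge_columns(image)
print(edges)  # index of the pixel just left of the dark-to-bright boundary
```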

Data management is the practice of managing data to unlock its potential for an organization. Managing data effectively requires having a data strategy and reliable methods to access, integrate, cleanse, govern, store and prepare data for analytics.
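
The cleansing step mentioned above can be sketched in a few lines. This example assumes customer records arrive as dictionaries with `name` and `email` fields and that the email address is the unique key; both the record layout and the rules are hypothetical.

```python
def cleanse(records):
    """Normalise, validate and de-duplicate raw customer records."""
    seen = set()
    clean = []
    for record in records:
        email = (record.get("email") or "").strip().lower()
        name = " ".join((record.get("name") or "").split()).title()
        if "@" not in email:
            continue  # drop rows without a usable key
        if email in seen:
            continue  # drop duplicates, keeping the first occurrence
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean

raw = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Grace Hopper", "email": None},
    {"name": "alan turing", "email": "alan@example.com"},
]
customers = cleanse(raw)
print(customers)
```

Normalising before comparing is what lets the first two rows be recognised as the same customer.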

Data visualization is the presentation of data in a pictorial or graphical format so it can be easily understood by business analysts and others. Data visualizations are especially important in helping organizations analyze large amounts of data and make business decisions based on the output.
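
Even without a charting library, the principle can be shown with a text bar chart. The sketch below assumes a small dictionary of regional sales figures (hypothetical data) and scales each bar relative to the largest value.

```python
def bar_chart(values, width=20):
    """Render a dict of label -> number as horizontal text bars."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)

sales = {"North": 120, "South": 60, "East": 90, "West": 30}  # hypothetical figures
chart = bar_chart(sales)
print(chart)
```

The relative bar lengths make the ranking of regions visible at a glance, which is harder to see in the raw numbers.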

Deep learning uses huge neural networks with many layers of processing units, taking advantage of advances in computing power and improved training techniques to learn complex patterns in high volumes of data. Common applications include image and speech recognition.
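
Why layers matter can be seen in a toy example. The sketch below stacks two layers of simple threshold units to compute XOR, a pattern that a single unit of this kind cannot represent. The weights here are hand-picked for illustration; in actual deep learning they are learned from data, and the networks have many more layers and units.

```python
def step(x):
    """Threshold activation: fire (1) when the weighted input is positive."""
    return 1 if x > 0 else 0

def layer(inputs, weights, biases):
    """Compute one layer: each unit takes a weighted sum of all inputs plus a bias."""
    return [
        step(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]

def xor_network(x1, x2):
    # Hidden layer: the first unit behaves like OR, the second like AND.
    hidden = layer([x1, x2], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output layer: fires when OR is true but AND is not.
    return layer(hidden, weights=[[1, -1]], biases=[-0.5])[0]

results = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
print(results)
```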

Machine learning, a branch of artificial intelligence, automates analytical model building. With unsupervised machine learning models, the technology uses methods from neural networks, statistics, operations research and physics to find hidden insights in data without being explicitly programmed where to look or what to conclude.
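
One common unsupervised method is clustering. The following is a minimal one-dimensional k-means sketch: it is never told which customers are "low" or "high" spenders, yet it separates them. The starting centres, iteration count and sample data are assumptions made for the example.

```python
from statistics import mean

def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters by repeatedly reassigning and re-averaging."""
    ordered = sorted(values)
    # Assumption: spread the starting centres evenly across the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for value in values:
            nearest = min(range(len(centres)), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # Move each centre to the mean of its members (keep it if the cluster is empty).
        centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters

spend = [12, 15, 14, 210, 13, 198, 205, 11]  # hypothetical monthly spend per customer
low, high = k_means_1d(spend)
print(sorted(low), sorted(high))
```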

Natural language processing is the ability of computers to analyze, understand and generate human language, including speech. The next stage of NLP is natural language interaction, which allows humans to communicate with computers using everyday language to perform tasks.
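
A first step in most text analysis is breaking language into tokens and counting them. The sketch below assumes English text, a deliberately tiny stop-word list and a hypothetical customer review; modern NLP systems go much further, but they usually begin with some form of tokenisation.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "to", "of", "was"}  # tiny illustrative list

def tokenise(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_counts(text):
    """Count the tokens that remain after removing stop words."""
    return Counter(token for token in tokenise(text) if token not in STOP_WORDS)

review = "The delivery was late and the packaging was damaged. Late again!"
counts = keyword_counts(review)
print(counts.most_common(3))
```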

A neural network is a kind of machine learning inspired by the workings of the human brain. It’s a computing system made up of interconnected units (like neurons) that process information by responding to external inputs, relaying information between each unit.
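
The smallest possible version of this idea is a single artificial neuron. The sketch below trains one unit with the classic perceptron update rule to reproduce a logical AND. The learning rate, epoch count and function names are choices made for this illustration.

```python
def train_perceptron(samples, epochs=10, learning_rate=1):
    """Learn weights for one artificial neuron with the perceptron update rule."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when the neuron is already right
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

def predict(weights, bias, inputs):
    """Respond to external inputs: fire (1) if the weighted sum is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(predictions)
```

A full neural network connects many such units so that the output of one becomes the input of the next.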

Popular programming languages for data science

Just as humans use a wide variety of languages, so do data scientists. With hundreds of programming languages available today, choosing the right one comes down to what you’re trying to accomplish. Here’s a look at some of the top data science programming languages. 

SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS) or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.
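
To show what "relations among entities" looks like in practice, here is a small sketch that runs SQL from Python using the standard-library `sqlite3` module and an in-memory database. The table names, columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (customer_id, amount) VALUES (1, 40.0), (1, 60.0), (2, 25.0);
""")

# Join the two related tables and total the spend per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total_spend
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total_spend DESC
""").fetchall()
conn.close()
print(totals)
```

The `JOIN` follows the relation between the two entities (each order belongs to a customer), and `GROUP BY` summarises the structured result.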

SAS is a programming language trusted by hundreds of thousands of data scientists worldwide. The SAS Viya platform allows you to combine the benefits of every technology system and programming language in your organisation for better analytical model development and deployment. Read how SAS Viya can help turn your modelling melting pot into smarter business decisions.


Next steps

If you want to learn data science, SAS is the place to do it.

Data science solutions

SAS Viya offerings feature robust data management, visualisation, advanced analytics and model management capabilities to accelerate data science at any organisation.

SAS Visual Data Mining and Machine Learning enables you to solve the most complex analytical problems with a single, integrated, collaborative solution – now with its own automated modelling API.

SAS Visual Analytics provides you with the means to quickly prepare reports interactively, explore your data through visual displays and perform your analyses on a self-service basis.

These solutions and more are powered by SAS Viya, SAS’ market-leading data science platform that runs on a modern, scalable, cloud-enabled architecture.