Data Science

What it is and why it matters

Data science is a multidisciplinary field that broadly describes the use of data to generate insight. Unlike more specialized data-related fields, such as data mining or data engineering, data science encompasses the complete life cycle of translating raw data into usable information and applying it for productive ends in a wide variety of applications.

The evolution of data science

When tracing the origin of data science, many think back to 1962, when mathematician John Tukey hinted at the discipline in his seminal paper The Future of Data Analysis. In it, he described the existence of an “unrecognized science,” one which involved learning from data.

It’s more helpful, however, to examine data science in the modern world. The advent of big data – made possible by leaps in processing and storage capabilities – has brought about unprecedented opportunities for organizations to reveal hidden patterns in data and use this insight to improve decision making. But to do so, they must first collect, process, analyze and share their data sets. Managing this data life cycle is the essence of data science.

Today, data science is ubiquitous in the business world – and beyond. So much so that Harvard Business Review dubbed data scientist the sexiest job of the 21st century. If data scientists are the practitioners, data science is the set of techniques and technologies they use.

Curious about the top programming languages in data science today? Learn more in this article by ZDNET.

Bridging the data science skills gap

The demand for advanced analytical skills has skyrocketed, leaving countries scrambling to bridge the talent gap. By using SAS® Education Analytical Suite and SAS® Viya®, North-West University is providing innovative data science education. This is transforming South Africa's workforce by helping students gain vital first-hand experience in problem formulation, business etiquette and writing, and value delivery. 

Data science in today’s world

Get a glimpse into the modern world of data science.

The data science experience

Explore real examples of data science in action with videos, articles and on-demand webinars from citizen data scientists across many different industries.

How to become a data scientist (who lives at the beach)

Learn more about the day-to-day life of data scientist Robert Blanchard as he answers a variety of questions – all from his computer at the beach.

A data scientist’s views on data literacy

In this article, data scientist and astrophysicist Kirk Borne takes a deep dive into what data literacy is, how to become data literate and, most importantly, how to use it.

Data science blog posts

From recurrent neural networks to feature engineering and machine learning algorithms, learn from SAS experts as they explain technical methods used to solve many challenging business problems.

Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms

Curious how the various data science platforms stack up? Explore the Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms to compare the top 20 offerings.

Who’s using data science?

You’d be hard-pressed to find an industry that doesn’t infuse data science into critical business functions. Here are a few of the most interesting use cases.

Insurance 

In the insurance industry, data analysts can help their organizations harness data insights to make better decisions and improve business performance. Insurers that incorporate advanced data science capabilities are positioned to improve their underwriting profitability, gross written premium and net promoter score.

Retail

To compete with the Amazons of the world, retailers must be able to rapidly fulfill customer needs using data science technologies such as predictive analytics. Doing so can help them forecast demand, manage fluctuations in it, and uncover trends and relationships across the supply chain.

Public Sector

As the sheer volume and complexity of decisions rise across governments, agencies are turning to data science to improve the accuracy, fairness and speed of those decisions. Read how governments worldwide are using analytics to make millions of vital decisions each day.

Banking

For banks, data science is more than a trend – it’s how business gets done. With a plethora of use cases from fraud detection to customer intelligence to risk management, data science is now the driving force behind critical business decisions and a competitive differentiator in a crowded financial landscape.

Data science outputs

To understand the many ways data science can affect an organization, it’s helpful to examine some of the common data science goals and deliverables. 

  • Prediction (when an asset will fail).
  • Classification (new or existing customer).
  • Recommendations (if you like that, try this).
  • Anomaly detection (fraudulent purchases).
  • Recognition (image, text, audio, video, etc.).
  • Actionable insights (dashboards, reports, visualizations).
  • Automated processes and decision making (credit card approval).
  • Scoring and ranking (credit score).
  • Segmentation (targeted marketing).
  • Optimization (manufacturing improvements).
  • Forecasts (predicting sales and revenue).
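
As a rough illustration of one deliverable from the list above, anomaly detection, the sketch below flags purchase amounts that sit far from the typical value. The sample amounts, the function name flag_anomalies and the 3.5 cutoff are assumptions made for this example. It uses a median-based modified z-score, which is one common statistical approach and not a description of any particular product's method.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by extreme values than the mean and standard deviation.
    """
    center = median(amounts)
    mad = median(abs(x - center) for x in amounts)
    if mad == 0:
        return []  # at least half the values equal the median; score undefined
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    return [x for x in amounts if abs(0.6745 * (x - center) / mad) > threshold]

# Hypothetical purchase amounts for one card; one value is far out of pattern
purchases = [23.5, 19.9, 31.0, 27.4, 22.1, 25.8, 980.0, 24.3]
print(flag_anomalies(purchases))
```

In practice a fraud model would draw on many more signals (merchant, location, timing), but the principle of measuring distance from normal behavior is the same.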

“If you’re looking to augment your data science work with a better grasp of choosing, deploying and managing models, then exploring more training in AI and ML is ideal.”
Ronald van Loon, Principal Analyst and CEO of Intelligent World

Composite AI

Most AI projects today rely on multiple data science technologies. According to Gartner, using a combination of different AI techniques to achieve the best result is called “composite AI.”

With composite AI, you start with the problem and then apply the right data and tools to solve the problem. This often includes using a combination of data science techniques, including ML, statistics, advanced analytics, data mining, forecasting, optimization, natural language processing, computer vision and others. 

Composite AI is increasingly synonymous with data science. That’s because choosing the right AI technology to use is not always straightforward. It requires a deep understanding of the business problem you’re trying to solve and the data available to solve it. This combination of business and technology expertise is the essence of data science. 

How data science works – and data science tools

Data science projects involve the use of multiple tools and technologies to derive meaningful information from structured and unstructured data. Here are some of the common practices data scientists use as part of the data science process to transform raw information into business-changing insight.

Data management is the practice of managing data to unlock its potential for an organization. Managing data effectively requires having a data strategy and reliable methods to access, integrate, cleanse, govern, store and prepare data for analytics.
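
To make the "cleanse and prepare" step more concrete, here is a minimal sketch in plain Python. The records, the field names (id, name, age) and the choice to fill missing ages with the median are all assumptions made for illustration. Real pipelines typically rely on dedicated data management tooling.

```python
from statistics import median

# Hypothetical raw customer records with typical quality problems
raw = [
    {"id": 1, "name": " Alice ", "age": 34},
    {"id": 2, "name": "BOB", "age": None},   # missing value
    {"id": 1, "name": "alice", "age": 34},   # duplicate id
    {"id": 3, "name": "carol", "age": 41},
]

def cleanse(records):
    """Deduplicate by id, standardize names and fill missing ages."""
    seen, cleaned = set(), []
    for rec in records:
        if rec["id"] in seen:
            continue  # keep only the first record for each id
        seen.add(rec["id"])
        # Copy the record so the raw input is left untouched
        cleaned.append({**rec, "name": rec["name"].strip().title()})
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill = median(known_ages)  # simple imputation choice for this sketch
    for rec in cleaned:
        if rec["age"] is None:
            rec["age"] = fill
    return cleaned

for row in cleanse(raw):
    print(row)
```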

Machine learning – a branch of artificial intelligence – automates analytical model building. With unsupervised machine learning models, the technology uses methods from neural networks, statistics, operations research and physics to find hidden insights in data without being explicitly programmed where to look or what to conclude.
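
One well-known unsupervised method is k-means clustering, which groups similar records without being told what the groups are. The sketch below is a bare-bones version in plain Python. The customer data, the two-cluster choice and the naive "first k points" initialization are assumptions for illustration, and production work would normally use an established library.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    centroids = list(points[:k])  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical customers: (store visits per month, average basket size)
customers = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.3), (0.9, 1.1), (8.2, 7.7)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

No labels are supplied. The algorithm discovers the two groups from the data alone, which is the sense in which it is not told "where to look."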

A neural network is a kind of machine learning inspired by the workings of the human brain. It’s a computing system made up of interconnected units (like neurons) that processes information by responding to external inputs and relaying information between units.
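
The idea of units passing signals to one another can be sketched in a few lines. In the example below, the weights and biases are picked by hand so that the network computes XOR. That is an assumption made to keep the example small, since a real network would learn its weights from data.

```python
def step(z):
    """Threshold activation: the unit fires (1) when its input is positive."""
    return 1 if z > 0 else 0

def unit(inputs, weights, bias):
    """One unit: weighted sum of incoming signals, then activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def tiny_network(x1, x2):
    # Hidden layer: two units that each respond to the external inputs
    h_or = unit([x1, x2], [1, 1], -0.5)    # fires if at least one input is 1
    h_and = unit([x1, x2], [1, 1], -1.5)   # fires only if both inputs are 1
    # Output unit: receives the hidden units' signals (OR but not AND = XOR)
    return unit([h_or, h_and], [1, -1], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", tiny_network(a, b))
```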

Deep learning uses huge neural networks with many layers of processing units, taking advantage of advances in computing power and improved training techniques to learn complex patterns in high volumes of data. Common applications include image and speech recognition.

Computer vision relies on pattern recognition and deep learning to recognize what’s in a picture or video. When machines can process, analyze and understand images, they can capture images or videos in real time and interpret their surroundings.

Natural language processing is the ability of computers to analyze, understand and generate human language, including speech. The next stage of NLP is natural language interaction, which allows humans to communicate with computers using everyday language to perform tasks.
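
A very early step in many text-analysis workflows is to break text into tokens and count them. The sketch below shows only that step. The sample review, the stopword list and the function names are assumptions for illustration, and it covers analysis of text, not understanding or generation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_terms(text, stopwords, n=2):
    """Count non-stopword tokens and return the n most frequent."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)

# Hypothetical customer feedback
review = "The delivery was late. Late delivery again, and support was slow."
STOPWORDS = {"the", "was", "and", "a", "is"}
print(top_terms(review, STOPWORDS))
```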

Data visualization is the presentation of data in a pictorial or graphical format so it can be easily understood by business analysts and others. Data visualizations are especially important in helping organizations analyze large amounts of data and make business decisions based on the output.
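
Even a crude chart can make a pattern easier to see than a column of numbers. The sketch below renders a text-only bar chart using nothing beyond standard Python. The sales figures and the bar_chart helper are hypothetical, and a plotting library or BI tool would be the usual choice in practice.

```python
def bar_chart(data, width=30):
    """Render a dict of label -> value as horizontal text bars."""
    peak = max(data.values())
    pad = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<{pad}} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical quarterly sales figures
quarterly_sales = {"Q1": 120, "Q2": 180, "Q3": 90, "Q4": 240}
print(bar_chart(quarterly_sales))
```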

Popular programming languages for data science

Just as humans use a wide variety of languages, so do data scientists. With hundreds of programming languages available today, choosing the right one comes down to what you’re trying to accomplish. Here’s a look at some of the top data science programming languages.

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development and building data pipelines. It's also used as a scripting or glue language to connect existing components.

R is a free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis.

SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS) or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.
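
To show what "relations among entities" looks like in practice, the sketch below runs a small SQL query from Python using the standard sqlite3 module and an in-memory database. The customers and orders tables and their contents are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0);
""")

# Join the two related tables and aggregate spend per customer
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
""").fetchall()

print(rows)
conn.close()
```

The join on customer_id is the relation between the two entities, and the query describes the result wanted without spelling out how to compute it.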

SAS is a programming language trusted by hundreds of thousands of data scientists worldwide. The SAS Viya platform allows you to combine the benefits of every technology system and programming language in your organization for better analytical model development and deployment. Read how SAS Viya can help turn your modeling melting pot into smarter business decisions.

Next Steps

If you want to learn data science, SAS is the place to do it.

Data science solutions

SAS Viya capabilities include robust data management, visualization, advanced analytics and model management to accelerate data science at any organization.

SAS for Machine Learning and Deep Learning enables you to solve the most complex analytical problems with a single, integrated, collaborative solution – now with its own automated modeling API.

SAS Visual Analytics provides you with the means to quickly prepare reports interactively, explore your data through visual displays and perform your analyses on a self-service basis.

These solutions and more are powered by SAS Viya, SAS’ market-leading data science platform that runs on a modern, scalable, cloud-enabled architecture.