Big Data

What it is and why it matters

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

Big Data History and Current Considerations

While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs:

Volume. Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would’ve been a problem – but new technologies (such as Hadoop) have eased the burden.

Velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.

Variety. Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.

At SAS, we consider two additional dimensions when it comes to big data:

Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage. Even more so with unstructured data.

Complexity. Today's data comes from multiple sources, which makes it difficult to link, match, cleanse and transform data across systems. However, it’s necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control.

SciSports and SAS Viya Score New Insights on Soccer Pitch

Sports data analytics company SciSports has developed a camera system called BallJames to capture big data from all the players on the field who don’t have the ball – an extension of traditional approaches. This real-time tracking technology automatically generates 3-D data from the video of 14 cameras placed around the stadium to record every movement of the players. BallJames generates player data such as the precision, direction and speed of passes, as well as sprinting and jumping strength – that’s a lot of big data.

Why Is Big Data Important?

The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:

  • Determining root causes of failures, issues and defects in near-real time.
  • Generating coupons at the point of sale based on the customer’s buying habits.
  • Recalculating entire risk portfolios in minutes.
  • Detecting fraudulent behavior before it affects your organization.

Big Data in Today’s World

Big data – and the way organizations manage and derive insight from it – is changing the way the world uses business information. Learn more about big data’s impact.

Data Integration Deja Vu: Big Data Reinvigorates DI

To stay relevant, data integration needs to work with many different types and sources of data, while operating at different latencies – from real time to streaming. Learn how DI has evolved to meet modern requirements.

SAS: A Comprehensive Approach to Big Data Governance

Some analysts predict that data will soar to 10 times its 2016 volume by 2025. Along with this surge, big data governance issues will be more daunting than ever. Find out how a comprehensive platform from SAS – spanning data management and analytics – can help.

Data lake and data warehouse – know the difference

Is the term "data lake" just marketing hype? Or a new name for a data warehouse? Phil Simon sets the record straight about what a data lake is, how it works and when you might need one.

Adding Hadoop to your Big Data Mix?

SAS provides everything you need to get valuable insights from all that data.

Learn more about big data solutions from SAS

Who uses big data?

Big data affects organizations across practically every industry. See how each industry can benefit from this onslaught of information.

Banking

With large amounts of information streaming in from countless sources, banks are faced with finding new and innovative ways to manage big data. While it’s important to understand customers and boost their satisfaction, it’s equally important to minimize risk and fraud while maintaining regulatory compliance. Big data brings big insights, but it also requires financial institutions to stay one step ahead of the game with advanced analytics.

Education

Educators armed with data-driven insight can make a significant impact on school systems, students and curriculums. By analyzing big data, they can identify at-risk students, make sure students are making adequate progress, and implement a better system for evaluating and supporting teachers and principals.

Government

When government agencies are able to harness and apply analytics to their big data, they gain significant ground when it comes to managing utilities, running agencies, dealing with traffic congestion or preventing crime. But while there are many advantages to big data, governments must also address issues of transparency and privacy.

Health Care

Patient records. Treatment plans. Prescription information. When it comes to health care, everything needs to be done quickly, accurately – and, in some cases, with enough transparency to satisfy stringent industry regulations. When big data is managed effectively, health care providers can uncover hidden insights that improve patient care.

Manufacturing

Armed with insight that big data can provide, manufacturers can boost quality and output while minimizing waste – processes that are key in today’s highly competitive market. More and more manufacturers are working in an analytics-based culture, which means they can solve problems faster and make more agile business decisions.

Retail

Customer relationship building is critical to the retail industry – and the best way to manage that is to manage big data. Retailers need to know the best way to market to customers, the most effective way to handle transactions, and the most strategic way to bring back lapsed business. Big data remains at the heart of all those things.

What is a modern analytics platform?

Oliver Schabenberger, Executive Vice President and Chief Technology Officer at SAS, describes the characteristics of a modern analytics platform. He cites the diversity of data management and analytics challenges such a platform must handle – for small data as well as big data. The modern analytics platform, he says, should be able to process structured and unstructured data and accommodate everything from simple analytics to complex machine learning problems.

“It’s important to remember that the primary value from big data comes not from the data in its raw form, but from the processing and analysis of it and the insights, products, and services that emerge from analysis. The sweeping changes in big data technologies and management approaches need to be accompanied by similarly dramatic shifts in how data supports decisions and product/service innovation.”
Thomas H. Davenport in Big Data in Big Companies

Data Exploration & Visualization

SAS makes it easy to understand what your data has to tell you. Interactively explore billions of rows of data in seconds.

How It Works

Before discovering how big data can work for your business, you should first understand where it comes from. The sources for big data generally fall into one of three categories:

Streaming data

This category includes data that reaches your IT systems from a web of connected devices, often part of the Internet of Things (IoT). You can analyze this data as it arrives and decide what to keep, what to discard and what requires further analysis.
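
To make the keep, discard or analyze-further decision concrete, here is a minimal Python sketch. It is illustrative only: the Reading type, the triage function and the numeric thresholds are hypothetical and not part of any SAS product. It assumes each device reports a single numeric value, and that values outside a plausible range are sensor errors.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class Reading:
    """One measurement from a connected device (hypothetical shape)."""
    device_id: str
    value: float


def triage(
    readings: Iterable[Reading],
    low: float = 0.0,
    high: float = 100.0,
    margin: float = 10.0,
) -> Iterator[Tuple[str, Reading]]:
    """Label each reading as it arrives, without buffering the stream.

    Assumed rules, chosen only for illustration:
      - outside [low, high]           -> "discard" (implausible value)
      - within `margin` of either end -> "review"  (needs further analysis)
      - anything else                 -> "keep"
    """
    for reading in readings:
        if reading.value < low or reading.value > high:
            yield ("discard", reading)
        elif reading.value < low + margin or reading.value > high - margin:
            yield ("review", reading)
        else:
            yield ("keep", reading)


if __name__ == "__main__":
    stream = [Reading("a", 50.0), Reading("b", 95.0), Reading("c", 150.0)]
    for decision, reading in triage(stream):
        print(decision, reading.device_id, reading.value)
```

Because triage is a generator, each reading is labeled the moment it arrives, so the same function would work on an unbounded feed as well as on a list.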

Social media data

The data on social interactions is an increasingly attractive set of information, particularly for marketing, sales and support functions. It's often in unstructured or semistructured forms, so it poses a unique challenge when it comes to consumption and analysis.

Publicly available sources

Massive amounts of data are available through open data sources like the US government’s data.gov, the CIA World Factbook or the European Union Open Data Portal.

After identifying all the potential sources for data, consider the decisions you’ll need to make once you begin harnessing information. These include:

How to store and manage it

Whereas storage would have been a problem several years ago, there are now low-cost options for storing data if that’s the best strategy for your business.

How much of it to analyze

Some organizations don't exclude any data from their analyses, which is possible with today’s high-performance technologies such as grid computing or in-memory analytics. Another approach is to determine upfront which data is relevant before analyzing it.
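
The two approaches can be contrasted in a short Python sketch. Everything in it is an assumption made for illustration: the record layout, the "amount" and "region" fields, and the function names are hypothetical, and a simple average stands in for whatever analysis you would actually run.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def analyze_all(records: List[Record]) -> float:
    """Approach 1: exclude nothing and analyze every record."""
    return mean(float(r["amount"]) for r in records)


def analyze_relevant(
    records: List[Record],
    is_relevant: Callable[[Record], bool],
) -> Optional[float]:
    """Approach 2: decide upfront which records matter, then analyze those."""
    selected = [r for r in records if is_relevant(r)]
    if not selected:
        return None  # nothing passed the relevance test
    return mean(float(r["amount"]) for r in selected)


if __name__ == "__main__":
    sales = [
        {"region": "EU", "amount": 10.0},
        {"region": "US", "amount": 30.0},
        {"region": "EU", "amount": 20.0},
    ]
    print(analyze_all(sales))
    print(analyze_relevant(sales, lambda r: r["region"] == "EU"))
```

The trade-off is the one described above: the first function needs enough computing power to touch every record, while the second depends on the relevance rule being right before any analysis has been done.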

How to use any insights you uncover

The more knowledge you have, the more confident you’ll be in making business decisions. It’s smart to have a strategy in place once you have an abundance of information at hand.

The final step in making big data work for your business is to research the technologies that help you make the most of big data and big data analytics. Consider:

  • Cheap, abundant storage.
  • Faster processors.
  • Affordable open source, distributed big data platforms, such as Hadoop.
  • Parallel processing, clustering, massively parallel processing (MPP), virtualization, large grid environments, high connectivity and high throughput.
  • Cloud computing and other flexible resource allocation arrangements.
