Big Data
What it is and why it matters
Big data is a term that describes large, hard-to-manage volumes of data – both structured and unstructured – that inundate businesses on a day-to-day basis. But it’s not just the type or amount of data that’s important; it’s what organizations do with the data that matters. Big data can be analyzed for insights that improve decisions and give confidence for making strategic business moves.
History of Big Data
Big data refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. The act of accessing and storing large amounts of information for analytics has been around for a long time. But the concept of big data gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three V’s:
Volume. Organizations collect data from a variety of sources, including transactions, smart (IoT) devices, industrial equipment, videos, images, audio, social media and more. In the past, storing all that data would have been too costly – but cheaper storage using data lakes, Hadoop and the cloud has eased the burden.
Velocity. With the growth in the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner. RFID tags, sensors and smart meters are driving the need to deal with these torrents of data in near-real time.
Variety. Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio files, stock ticker data and financial transactions.
At SAS, we consider two additional dimensions when it comes to big data:
Variability
In addition to the increasing velocities and varieties of data, data flows are unpredictable – changing often and varying greatly. It’s challenging, but businesses need to know when something is trending in social media, and how to manage daily, seasonal and event-triggered peak data loads.
Veracity
Veracity refers to the quality of data. Because data comes from so many different sources, it’s difficult to link, match, cleanse and transform data across systems. Businesses need to connect and correlate relationships, hierarchies and multiple data linkages. Otherwise, their data can quickly spiral out of control.
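To make the "link, match and cleanse" idea concrete, here is a minimal Python sketch. It assumes two hypothetical systems (a CRM and a billing system) that both store an email address, and it treats the cleansed email as the matching key. The field names, sample rows and the function names normalize_email and link_records are illustrative assumptions, not part of any SAS product.

```python
def normalize_email(raw: str) -> str:
    """Cleanse step: trim whitespace and lowercase so equal addresses compare equal."""
    return raw.strip().lower()


def link_records(crm_rows, billing_rows):
    """Match step: join two lists of dicts on the cleansed email key.

    Returns (linked, unmatched) so records that fail to match stay visible
    instead of silently disappearing.
    """
    billing_by_email = {normalize_email(r["email"]): r for r in billing_rows}
    linked, unmatched = [], []
    for row in crm_rows:
        key = normalize_email(row["email"])
        match = billing_by_email.get(key)
        if match is None:
            unmatched.append(row)
        else:
            linked.append({"email": key, "name": row["name"], "balance": match["balance"]})
    return linked, unmatched


# Hypothetical sample data: the same customer is spelled differently in each system.
crm = [
    {"name": "Ada", "email": " Ada@Example.com "},
    {"name": "Bo", "email": "bo@example.com"},
]
billing = [{"email": "ada@example.com", "balance": 120.0}]

linked, unmatched = link_records(crm, billing)
print(linked)
print(unmatched)
```

Real-world matching usually needs more than one key and some tolerance for misspellings, but even this simple version shows why cleansing has to happen before linking: without it, the two "Ada" records would never meet.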
Big data and analytics enable whole person care
Riverside County uses data management and analytics from SAS to integrate health and non-health data from its public hospital, behavioral health system, county jail, social services systems and homelessness systems. By understanding how individuals interact with different services, care pathways can be mapped to health outcomes – resulting in coordinated, whole person care.
Why Is Big Data Important?
The importance of big data doesn’t simply revolve around how much data you have. The value lies in how you use it. By taking data from any source and analyzing it, you can find answers that 1) streamline resource management, 2) improve operational efficiencies, 3) optimize product development, 4) drive new revenue and growth opportunities and 5) enable smart decision making. When you combine big data with high-performance analytics, you can accomplish business-related tasks such as:
- Determining root causes of failures, issues and defects in near-real time.
- Spotting anomalies faster and more accurately than the human eye.
- Improving patient outcomes by rapidly converting medical image data into insights.
- Recalculating entire risk portfolios in minutes.
- Sharpening deep learning models' ability to accurately classify and react to changing variables.
- Detecting fraudulent behavior before it affects your organization.
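As one small illustration of spotting anomalies, the sketch below flags readings that sit far from the average, using a simple z-score. This is only one of many possible techniques and is not presented as the method any particular product uses. The function name find_anomalies, the threshold of 2.0 and the sample sensor readings are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    avg = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # All values are identical, so nothing can stand out.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - avg) / spread > threshold
    ]


# Hypothetical sensor readings with one obvious spike.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0, 10.1]
print(find_anomalies(readings))
```

A z-score is easy to explain but is itself distorted by extreme values, so production systems often prefer more robust statistics or learned models.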
Big Data in Today’s World
Big data – and the way organizations manage and derive insight from it – is changing the way the world uses business information. Learn more about big data’s impact.
What's a data hero to do?
Who are data heroes? A data scientist analyzes and looks for insights in data. Data engineers build pipelines focused on DataOps. Data officers ensure data is reliable and managed responsibly. Synergy among roles drives analytics success.
What is a data lake and why does it matter?
Unlike its older cousin – the data warehouse – a data lake is ideal for storing unstructured big data like tweets, images, voice and streaming data. But it can store all types of data – any source, size, speed or structure.
Big data and cloud
Big data projects demand intense resources for data processing and storage. Working together, big data technologies and cloud computing provide a cost-effective way to handle all types of data – for a winning combination of agility and elasticity.
Deep learning craves big data because big data is necessary to isolate hidden patterns and to find answers without overfitting the data. With deep learning, the more good quality data you have, the better the results.
Data-driven innovation
Today’s exabytes of big data open countless opportunities to capture insights that drive innovation. From more accurate forecasting to increased operational efficiency and better customer experiences, sophisticated uses of big data and analytics propel advances that can change our world – improving lives, healing sickness, protecting the vulnerable and conserving resources.
How Big Data Works
Before businesses can put big data to work for them, they should consider how it flows among a multitude of locations, sources, systems, owners and users. There are five key steps to taking charge of this "big data fabric" that includes traditional, structured data along with unstructured and semistructured data:
- Set a big data strategy.
- Identify big data sources.
- Access, manage and store the data.
- Analyze the data.
- Make intelligent, data-driven decisions.
1) Set a big data strategy
At a high level, a big data strategy is a plan designed to help you oversee and improve the way you acquire, store, manage, share and use data within and outside of your organization. A big data strategy sets the stage for business success amid an abundance of data. When developing a strategy, it’s important to consider existing – and future – business and technology goals and initiatives. This calls for treating big data like any other valuable business asset rather than just a byproduct of applications.
2) Identify big data sources
- Streaming data flows into IT systems from the Internet of Things (IoT) and other connected devices, such as wearables, smart cars, medical devices and industrial equipment. You can analyze this big data as it arrives, deciding which data to keep, which to discard and which needs further analysis.
- Social media data stems from interactions on Facebook, YouTube, Instagram, etc. This includes vast amounts of big data in the form of images, videos, voice, text and sound – useful for marketing, sales and support functions. This data is often in unstructured or semistructured forms, so it poses a unique challenge for consumption and analysis.
- Publicly available data comes from large open data sources such as the US government’s data.gov, the CIA World Factbook and the European Union Open Data Portal.
- Other big data may come from data lakes, cloud data sources, suppliers and customers.
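For streaming sources in particular, the choice between keeping a record, dropping it or sending it on for further analysis can be sketched as a small triage function that handles one reading at a time. The example below is a minimal Python sketch under stated assumptions: each reading is a dict with a "value" field, and the 0 to 100 range that defines a "normal" reading is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def triage(readings: Iterable[dict], low: float = 0.0, high: float = 100.0) -> Iterator[Tuple[str, dict]]:
    """Label each reading as it arrives, without loading the whole stream into memory."""
    for reading in readings:
        value = reading.get("value")
        if value is None:
            # Nothing usable was captured, so the reading is not worth storing.
            yield ("discard", reading)
        elif low <= value <= high:
            # Within the expected range: store it as-is.
            yield ("keep", reading)
        else:
            # Outside the expected range: route it for closer inspection.
            yield ("analyze", reading)


# Hypothetical stream of wearable-device readings.
stream = [
    {"device": "watch-1", "value": 72.0},
    {"device": "watch-2", "value": None},
    {"device": "watch-3", "value": 180.0},
]

labels = [label for label, _ in triage(stream)]
print(labels)
```

Because triage is a generator, it works the same way on a short list or on an unbounded feed: each record is labeled as soon as it is read.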
3) Access, manage and store big data
Modern computing systems provide the speed, power and flexibility needed to quickly access massive amounts and types of big data. Along with reliable access, companies also need methods for integrating the data, building data pipelines, ensuring data quality, providing data governance and storage, and preparing the data for analysis. Some big data may be stored on-site in a traditional data warehouse – but there are also flexible, low-cost options for storing and handling big data via cloud solutions, data lakes, data pipelines and Hadoop.
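A data pipeline can be thought of as a chain of small steps, each taking data in and passing improved data on. The Python sketch below is a simplified illustration with two hypothetical steps: a data quality gate (drop_incomplete) and a preparation step (to_cents). The step names, field names and sample rows are assumptions made for this example, not a description of any specific tool.

```python
from functools import reduce


def drop_incomplete(rows):
    """Data quality gate: keep only rows that have every required field filled in."""
    required = {"id", "amount"}
    return [r for r in rows if required.issubset(r) and r["amount"] is not None]


def to_cents(rows):
    """Preparation step: add an integer amount in cents, which is safer to sum than floats."""
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in rows]


def run_pipeline(rows, steps):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, rows)


# Hypothetical raw input: one good row, one with a missing value, one missing the field.
raw = [{"id": 1, "amount": 19.99}, {"id": 2, "amount": None}, {"id": 3}]

prepared = run_pipeline(raw, [drop_incomplete, to_cents])
print(prepared)
```

Keeping each step as a separate function makes it easy to reorder steps, test them one at a time or add governance checks later without rewriting the whole pipeline.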
4) Analyze the data
With high-performance technologies like grid computing or in-memory analytics, organizations can choose to use all their big data for analyses. Another approach is to determine upfront which data is relevant before analyzing it. Either way, big data analytics is how companies gain value and insights from data. Increasingly, big data feeds today’s advanced analytics endeavors such as artificial intelligence (AI) and machine learning.
5) Make intelligent, data-driven decisions
Well-managed, trusted data leads to trusted analytics and trusted decisions. To stay competitive, businesses need to seize the full value of big data and operate in a data-driven way – making decisions based on the evidence presented by big data rather than gut instinct. The benefits of being data driven are clear. Data-driven organizations perform better, are operationally more predictable and are more profitable.
Next Steps
Big data demands sophisticated data management technology to transform your analytics and AI programs into big opportunities. SAS has you covered.
SAS® Information Governance
Regardless of source, where the data is stored, or how large and complex it is, SAS Information Governance makes it faster and easier for data users to find, catalog and protect the big data that is most valuable for analysis. Metadata-oriented search results show detailed information about each data asset. In turn, business users can evaluate the data’s fitness for purpose with less reliance on IT while avoiding rework and making more informed choices.
Recommended Reading
- Interview. The Internet of Things in industry: news from the field. The Internet of Things is having a huge impact on the technology sector. Learn how to bring the concept of "smart" manufacturing to life.
- Article. A new arms race: Analytics for commodity market compliance. Rogue trading and dodgy deals are not the only things keeping chief risk officers awake. Today’s regulators now employ big data analytics to uncover troubles in the commodity swaps market. Staying ahead of innocent compliance errors – and quickly identifying the occasional bad actor from within – will require some tough analytics of your own.
- Big data and global development. Find out how organizations are using online and mobile data to make the world a better place. The United Nations Global Pulse team defines global development and explains how big data can help improve human welfare.