Big Data – What Is It?
Big data is a popular term used to describe the exponential growth, availability and use of information, both structured and unstructured. Much has been written on the big data trend and how it can serve as the basis for innovation, differentiation and growth.
According to IDC, it is imperative that organizations and IT leaders focus on the ever-increasing volume, variety and velocity of information that forms big data.1
- Volume. Many factors contribute to the increase in data volume – transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, etc. In the past, excessive data volume created a storage issue. But with today's decreasing storage costs, other issues emerge, including how to determine relevance amidst the large volumes of data and how to create value from data that is relevant.
- Variety. Data today comes in all types of formats – from traditional databases to hierarchical data stores created by end users and OLAP systems, to text documents, email, meter-collected data, video, audio, stock ticker data and financial transactions. By some estimates, 80 percent of an organization's data is not numeric! But it still must be included in analyses and decision making.
- Velocity. According to Gartner, velocity "means both how fast data is being produced and how fast the data must be processed to meet demand." RFID tags and smart metering are driving an increasing need to deal with torrents of data in near-real time. Reacting quickly enough to deal with velocity is a challenge to most organizations.
Big data according to SAS
At SAS, we consider two other dimensions when thinking about big data:
- Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something big trending on social media? Perhaps there is a high-profile IPO looming. Maybe swimming with pigs in the Bahamas is suddenly the must-do vacation activity. Daily, seasonal and event-triggered peak data loads can be challenging to manage, especially with social media involved.
- Complexity. Huge volumes of data typically come from multiple sources, and it is quite an undertaking to link, match, cleanse and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages, or your data can quickly spiral out of control. Data governance can help you determine how disparate data relates to common definitions and how to systematically integrate structured and unstructured data assets to produce high-quality information that is useful, appropriate and up-to-date.
Ultimately, regardless of the factors involved, we believe that the term big data is relative; it applies (per Gartner’s assessment) whenever the need to handle, store and analyze data exceeds an organization’s current capacity.
Examples of big data
- RFID (radio frequency ID) systems generate up to 1,000 times the data of conventional bar code systems.
- 10,000 payment card transactions are made every second around the world.2
- Walmart handles more than 1 million customer transactions an hour.3
- 340 million tweets are sent per day. That's nearly 4,000 tweets per second.4
- Facebook has more than 901 million active users generating social interaction data.5
- More than 5 billion people are calling, texting, tweeting and browsing websites on mobile phones.
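To compare the figures above on a single scale, it can help to convert each one to an average per-second rate. The snippet below is only an arithmetic sketch using the numbers quoted in the list; the helper name `per_second` is illustrative and does not come from any of the cited sources.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def per_second(count, period_seconds):
    """Convert a count observed over a period into an average per-second rate."""
    return count / period_seconds


# Figures taken from the examples listed above.
rates = {
    "payment card transactions": per_second(10_000, 1),
    "Walmart customer transactions": per_second(1_000_000, SECONDS_PER_HOUR),
    "tweets": per_second(340_000_000, SECONDS_PER_DAY),
}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second")
```

The tweet figure works out to roughly 3,935 per second, which is consistent with the "nearly 4,000" quoted above.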
Uses for big data
So the real issue is not that you are acquiring large amounts of data (because we are clearly already in the era of big data). It's what you do with your big data that matters. The hopeful vision for big data is that organizations will be able to harness relevant data and use it to make the best decisions.
Technologies today not only support the collection and storage of large amounts of data, they provide the ability to understand and take advantage of its full value, which helps organizations run more efficiently and profitably. For instance, with big data and big data analytics, it is possible to:
- Analyze millions of SKUs to determine optimal prices that maximize profit and clear inventory.
- Recalculate entire risk portfolios in minutes and understand future possibilities to mitigate risk.
- Mine customer data for insights that drive new strategies for customer acquisition, retention, campaign optimization and next best offers.
- Quickly identify customers who matter the most.
- Generate retail coupons at the point of sale based on the customer's current and past purchases, ensuring a higher redemption rate.
- Send tailored recommendations to mobile devices at just the right time, while customers are in the right location to take advantage of offers.
- Analyze data from social media to detect new market trends and changes in demand.
- Use clickstream analysis and data mining to detect fraudulent behavior.
- Determine root causes of failures, issues and defects by investigating user sessions, network logs and machine sensors.
"High-performance analytics, coupled with the ability to score every record and feed it into the system electronically, can identify fraud faster and more accurately."
Many organizations are concerned that the amount of amassed data is becoming so large that it is difficult to find the most valuable pieces of information.
- What if your data volume gets so large and varied you don't know how to deal with it?
- Do you store all your data?
- Do you analyze it all?
- How can you find out which data points are really important?
- How can you use it to your best advantage?
Until recently, organizations were limited to using subsets of their data, or were constrained to simplistic analyses, because the sheer volume of data overwhelmed their processing platforms. What is the point of collecting and storing terabytes of data if you can't analyze it in full context, or if you have to wait hours or days to get results? On the other hand, not all business questions are better answered by bigger data.
You now have two choices:
- Incorporate massive data volumes in analysis. If the answers you are seeking will be better provided by analyzing all of your data, go for it. The game-changing technologies that extract true value from big data – all of it – are here today. One approach is to apply high-performance analytics to analyze the massive amounts of data using technologies such as grid computing, in-database processing and in-memory analytics.
- Determine upfront which big data is relevant. Traditionally, the trend has been to store everything (some call it data hoarding) and only when you query the data do you discover what is relevant. We now have the ability to apply analytics on the front end to determine data relevance based on context. This analysis can be used to determine which data should be included in analytical processes and which can be placed in low-cost storage for later availability if needed.
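The second choice can be pictured as a triage step that runs before data reaches the analytical environment. The sketch below is a minimal illustration and not a SAS method: the `relevance` scoring function, the keyword set and the 0.5 threshold are all hypothetical stand-ins for whatever front-end analytics an organization actually applies.

```python
def relevance(record, keywords):
    """Hypothetical score: fraction of the keywords found in the record text."""
    words = set(record["text"].lower().split())
    return len(words & keywords) / len(keywords)


def triage(records, keywords, threshold=0.5):
    """Split records into those kept for analysis and those sent to low-cost storage."""
    analyze, archive = [], []
    for record in records:
        target = analyze if relevance(record, keywords) >= threshold else archive
        target.append(record)
    return analyze, archive


# Made-up sample records, purely for illustration.
records = [
    {"id": 1, "text": "Customer reports billing error on invoice"},
    {"id": 2, "text": "Weather is nice today"},
    {"id": 3, "text": "Invoice dispute escalated by customer"},
]
keywords = {"customer", "invoice"}

analyze, archive = triage(records, keywords)
print("analyze:", [r["id"] for r in analyze])
print("archive:", [r["id"] for r in archive])
```

In practice the scoring step would be far richer than keyword overlap, but the shape is the same: score on arrival, then route each record either into analytical processes or into cheaper storage for later use.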
"Now you can run hundreds and thousands of models at the product level – at the SKU level – because you have the big data and analytics to support those models at that level."
A number of recent technology advancements are enabling organizations to make the most of big data and big data analytics:
- Cheap, abundant storage and server processing capacity.
- Faster processors.
- Affordable open source, distributed big data platforms, such as Hadoop.
- New storage and processing technologies designed specifically for large data volumes, including unstructured data.
- Parallel processing, clustering, MPP, virtualization, large grid environments, high connectivity and high throughputs.
- Cloud computing and other flexible resource allocation arrangements.
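The idea behind several items in this list (parallel processing, clustering, grid environments) is to split a large data set into partitions, summarize each partition independently and then combine the partial results. The sketch below shows that pattern on a single machine; it assumes Python's standard `concurrent.futures` module and uses threads only to stay self-contained, whereas a real deployment would spread partitions across processes or machines. The function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Return the per-partition pieces needed to compute a global mean."""
    return sum(chunk), len(chunk)


def parallel_mean(values, partitions=4):
    """Split values into partitions, summarize each concurrently, then combine."""
    if not values:
        raise ValueError("values must not be empty")
    size = -(-len(values) // partitions)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count


values = list(range(1, 1_000_001))
print(parallel_mean(values))
```

Because each partition returns only a sum and a count, the combine step stays small no matter how large the partitions are, which is what lets this pattern scale out.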
Big data technologies not only support the ability to collect large amounts of data, they provide the ability to understand it and take advantage of its value. The goal of all organizations with access to large data collections should be to harness the most relevant data and use it for optimized decision making.
It is important to understand that not all of your data will be relevant or useful. But how can you find the data points that matter most? It is a widely acknowledged problem. "Most businesses have made slow progress in extracting value from big data. And some companies attempt to use traditional data management practices on big data, only to learn that the old rules no longer apply," says Dan Briody in the 2011 Economist Intelligence Unit publication "Big Data: Harnessing a Game-Changing Asset."
Big data solutions from SAS
How can you make the most of all that data, now and in the future? It is a twofold proposition. You can only optimize your success if you weave analytics into your big data solution. But you also need analytics to help you manage the big data itself.
There are several key technologies that can help you get a handle on your big data, and more important, extract meaningful value from it.
- Information management for big data. Many vendors look at big data as a discussion related to technologies such as Hadoop, NoSQL, etc. SAS takes a more comprehensive data management/data governance approach by providing a strategy and solutions that allow big data to be managed and used more effectively.
- High-performance analytics. By taking advantage of the latest parallel processing power, high-performance analytics lets you do things you never thought possible because the data volumes were just too large.
- High-performance visual analytics. High-performance visual analytics lets you explore huge volumes of data in mere seconds so you can quickly identify opportunities for further analysis. What creates organizational gains is not simply having big data but the decisions you make with it.
- Flexible deployment options for big data. Flexible deployment models bring choice. High-performance analytics from SAS can analyze billions of variables, and those solutions can be deployed in the cloud (with SAS or another provider), on a dedicated high-performance analytics appliance or within your existing IT infrastructure, whichever best suits your organization's requirements.
1 Source: IDC. "Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO," September 2011.
2 Source: American Bankers Association, March 2009
3 Source: http://www.economist.com
4 Source: http://blog.twitter.com
5 Source: http://newsroom.fb.com/