How to visualize billions of rows of data

New data visualization technologies make finding answers in big data clear and simple

By Randy Guard, VP of Sales Development and Product Management, SAS

The challenges of big data are well-documented – but so are the opportunities. If you can overcome storage, speed and structural challenges with your big data sources, you can find answers at a deeper level and ask what-if questions of the data in ways you never imagined.

But how do you visualize those answers? How can you explore relationships among thousands of variables and determine their relative importance in order to quickly build predictive models and make iterative changes on the fly? Finally, how do you display results in a way that is not overwhelming – and allows you to drive better business decisions?

You need a new way to look at the data that collapses and condenses the results in an intuitive fashion but still displays graphs and charts that decision makers are accustomed to seeing. You also need those results to be available quickly via mobile devices, and you need easy access to explore the data in real time through a user-friendly interface.

I'd like to offer four important tips for achieving visualizations that simplify the process of uncovering insights, solving complex problems and identifying new opportunities with big data:

1. Provide auto-charting capabilities to help users create the best possible visualizations, including visualizations specifically designed to display big data.
2. Build hierarchies on the fly and display data in various ways to address specific business problems, without relying on help from the IT department.
3. Offer clear explanations of complex analytic functions, correlations and linear regressions in a language that business users can understand.
4. Use in-memory processing for high-speed analytic calculations. If all calculations happen in memory, results can be delivered fast enough for exploratory analysis of even very large data sets.

Take something as simple as a box plot. It is a convenient way to depict groups of numerical data graphically through five summary values: minimum, lower quartile, median, upper quartile and maximum (see Figure 1).

Figure 1: This box plot quickly compares the distribution of data within a category
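For readers who want to see what sits behind each box, here is a minimal sketch in plain Python that computes the five summary values per category. It is an illustration, not SAS's implementation. The function names `five_number_summary` and `box_stats_by_category` are hypothetical. It assumes the standard-library `statistics.quantiles` with `method="inclusive"`, and each category needs at least two values because Python versions before 3.13 raise `StatisticsError` for fewer.

```python
import statistics
from collections import defaultdict


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # With n=4, quantiles() returns the three cut points: Q1, median, Q3.
    # method="inclusive" treats the data as a full population, so the
    # minimum and maximum act as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def box_stats_by_category(rows):
    """Group (category, value) pairs and summarize each group for one box."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {cat: five_number_summary(vals) for cat, vals in groups.items()}


# Hypothetical example data: two categories to compare side by side.
rows = [("A", v) for v in range(1, 10)] + [("B", v) for v in (2, 4, 4, 5, 9)]
print(box_stats_by_category(rows))
```

Note that this version sorts every value in memory, which is exactly the step that becomes expensive as data grows.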

But even the most common descriptive statistics can become complicated to calculate when you are working with big data and don't want to be restricted by column limits, storage constraints or limited data type support. The solution is an in-memory engine that speeds up data exploration, paired with a visual interface that displays the results in a simple visualization.
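One general way to keep descriptive statistics tractable on large, partitioned data is to summarize each partition separately and then merge the partial results. The sketch below illustrates that generic technique and is not a description of how SAS's in-memory engine works. Count, sum, minimum and maximum merge exactly. Quartiles do not, so they are approximated from a fixed-bin histogram. The class `PartialSummary`, the helper `summarize`, and the assumption that the value range (`lo`, `hi`) is known in advance are all hypothetical.

```python
from functools import reduce


class PartialSummary:
    """Mergeable summary of one partition: count, sum, min, max, histogram."""

    def __init__(self, lo, hi, bins=100):
        if hi <= lo or bins < 1:
            raise ValueError("need hi > lo and bins >= 1")
        self.lo, self.hi, self.bins = lo, hi, bins
        self.width = (hi - lo) / bins
        self.count = 0
        self.total = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")
        self.hist = [0] * bins

    def add(self, x):
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        # Values outside [lo, hi) are clamped into the first or last bin.
        i = min(max(int((x - self.lo) / self.width), 0), self.bins - 1)
        self.hist[i] += 1

    def merge(self, other):
        """Combine two partition summaries without revisiting raw rows."""
        if (self.lo, self.hi, self.bins) != (other.lo, other.hi, other.bins):
            raise ValueError("summaries must share the same binning")
        out = PartialSummary(self.lo, self.hi, self.bins)
        out.count = self.count + other.count
        out.total = self.total + other.total
        out.minimum = min(self.minimum, other.minimum)
        out.maximum = max(self.maximum, other.maximum)
        out.hist = [a + b for a, b in zip(self.hist, other.hist)]
        return out

    def mean(self):
        return self.total / self.count

    def approx_quantile(self, q):
        """Estimate the q-quantile by interpolating inside the bin that holds it."""
        target = q * self.count
        cumulative = 0
        for i, c in enumerate(self.hist):
            if c and cumulative + c >= target:
                return self.lo + (i + (target - cumulative) / c) * self.width
            cumulative += c
        return self.hi


def summarize(values, lo, hi, bins=100):
    """Build a PartialSummary for one partition of the data."""
    s = PartialSummary(lo, hi, bins)
    for x in values:
        s.add(x)
    return s


# Hypothetical example: four partitions summarized independently, then merged.
partitions = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
combined = reduce(PartialSummary.merge, (summarize(p, 0, 1000) for p in partitions))
print(combined.count, combined.minimum, combined.maximum, combined.mean())
print(combined.approx_quantile(0.5))
```

The quantile estimate can be off by up to one bin width; more bins trade memory for accuracy.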

Our next example (Figure 2) is a binned scatterplot overlaid with a regression line. Even though a huge amount of computation is needed to analyze and produce this type of information, bottlenecks can be avoided by using in-memory technology to perform the calculations on the server and present the results on the fly. On screen, users are told when they are applying the analytics and what the analytics mean in layman's terms. To get the most out of your big data sources, you need the ability to quickly analyze and then visualize what the data is saying. Once you can explore and see the information quickly and intuitively, you will gain better insights about customers, market trends, products or whatever is being analyzed.

Figure 2: This binned scatter plot allows quick comparison of two measures, with a regression line that shows whether there is a relationship between the two
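To make the idea behind a binned scatter plot concrete, the sketch below makes a single pass over the points. It counts how many fall into each grid cell, so one marker per cell is drawn instead of one per row, and it fits an ordinary least-squares line from running sums, so the raw points never have to be kept. This is a generic illustration, not SAS's method. The function name `binned_scatter_with_fit` and the cell widths `x_bin` and `y_bin` are hypothetical.

```python
import math
from collections import Counter


def binned_scatter_with_fit(points, x_bin, y_bin):
    """Count points per (x, y) grid cell and fit y = intercept + slope * x by OLS."""
    cells = Counter()
    n = sx = sy = sxx = sxy = 0.0
    for x, y in points:
        # Integer cell index along each axis; each cell becomes one plotted marker.
        cells[(math.floor(x / x_bin), math.floor(y / y_bin))] += 1
        # These running sums are all that ordinary least squares needs.
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return cells, intercept, slope


# Hypothetical example: points on the line y = 2x + 1.
points = [(x, 2 * x + 1) for x in range(10)]
cells, intercept, slope = binned_scatter_with_fit(points, x_bin=5, y_bin=5)
print(dict(cells), intercept, slope)
```

The raw-sum formula is simple but can lose precision when values are very large; centering the data or using an update rule such as Welford's is a common remedy.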

Bio: Randy Guard leads the product strategy and business development efforts across SAS. His teams work closely with SAS' Research and Development organization to define and manage product road maps based on market needs and customer input.