How to visualize billions of rows of data
New data visualization technologies make finding answers in big data clear and simple
By Randy Guard, VP of Sales Development and Product Management, SAS
The challenges of big data are well-documented – but so are the opportunities. If you can overcome storage, speed and structural challenges with your big data sources, you can find answers at a deeper level and ask what-if questions of the data in ways you never imagined.
But how do you visualize those answers? How can you explore relationships among thousands of variables and determine their relative importance in order to quickly build predictive models and make iterative changes on the fly? Finally, how do you display results in a way that is not overwhelming – and allows you to drive better business decisions?
You need a new way to look at the data that collapses and condenses the results in an intuitive fashion but still displays graphs and charts that decision makers are accustomed to seeing. You also need those results to be available quickly via mobile devices, and you need easy access to explore the data in real time through a user-friendly interface.
I'd like to offer four important tips for achieving visualizations that simplify the process of uncovering insights, solving complex problems and identifying new opportunities with big data:
1. Provide auto-charting capabilities to help users create the best possible visualizations, including visualizations specifically designed to display big data.
Take something as simple as a box plot. It is a convenient way of graphically depicting groups of numerical data through summary calculations – minimum, maximum, upper and lower quartile, and median (see Figure 1).
But even the most common descriptive statistics can become complicated to calculate when you are dealing with big data and you don't want to be restricted by column limits, storage constraints and limited data type support. The solution is an in-memory engine that speeds up data exploration, paired with a visual interface that displays the results clearly and simply.
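To make the box plot's summary calculations concrete, here is a minimal sketch in Python using only the standard library. It is an illustration for a small, made-up sample that fits in memory, not the in-memory engine described above; the function name `five_number_summary` and the choice of the "inclusive" quartile method are assumptions made for this example.

```python
import statistics


def five_number_summary(values):
    """Return the five values a box plot is drawn from.

    Hypothetical helper for illustration; quartile conventions vary
    between tools, and the "inclusive" method is an assumption here.
    """
    data = sorted(values)
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }


# Made-up sample data, used only to show the call.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print(summary)
```

Sorting the full data set is what becomes expensive at billions of rows, which is why the calculation is better done on the server than in the charting client.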
Our next example (Figure 2) is a binned scatterplot overlaid with a regression line. Even though a huge number of computations is needed to analyze and produce this type of information, bottlenecks do not occur if you use in-memory technology to perform the calculations on the server and present results on the fly. On screen, users are told when they are applying analytics and what the analysis means in layman's terms.
To get the most out of your big data sources, you need the ability to quickly analyze and then visualize what the data is saying. Once you can explore and see the information quickly and intuitively, you will gain better insights about customers, market trends, products, or whatever is being analyzed.
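The idea behind the binned scatterplot with a regression line can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's implementation: the function name `bin_and_fit`, the fixed bin sizes and the sample points are all invented for the example. It makes a single pass over the points, counting how many fall into each rectangular bin and accumulating the running sums needed for an ordinary least-squares line, so only the bin counts and two coefficients need to be sent to the screen rather than every row.

```python
import math
from collections import Counter


def bin_and_fit(points, bin_width, bin_height):
    """Count points per rectangular bin and fit y = slope * x + intercept.

    Hypothetical helper for illustration. One pass over the data; only
    the bin counts and five running sums are kept in memory.
    """
    counts = Counter()
    n = sx = sy = sxx = sxy = 0.0
    for x, y in points:
        # Identify the bin by its integer column and row index.
        counts[(math.floor(x / bin_width), math.floor(y / bin_height))] += 1
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    # Closed-form ordinary least-squares estimates from the running sums.
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return counts, slope, intercept


# Made-up points that lie exactly on the line y = 2x + 1.
points = [(x, 2 * x + 1) for x in range(10)]
counts, slope, intercept = bin_and_fit(points, bin_width=5, bin_height=10)
print(dict(counts), slope, intercept)
```

Each bin would then be drawn as one shaded cell whose color reflects its count, with the fitted line overlaid on top.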
Bio: Randy Guard leads the product strategy and business development efforts across SAS. His teams work closely with SAS' Research and Development organization to define and manage product road maps based on market needs and customer input.