What does the data look like?
The data must meet the following requirements:
Number of Tables
There can only be one, so you need to prepare a single table. This table is called the analytical base table (ABT). Getting to the ABT, for example from a star schema, may require some data preparation such as performing joins.
Example: From Star Schema to 1 single Analytical Base Table (ABT)
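The join step can be sketched in a few lines of Python. This is only an illustration under assumed data: the fact table, the two dimension tables and all column names (sales_fact, customer_dim, product_dim, ...) are hypothetical and do not come from the original example.

```python
# Hypothetical star schema: one fact table plus two dimension tables
# keyed by their ID columns.
sales_fact = [
    {"customer_id": 1, "product_id": 10, "amount": 250.0},
    {"customer_id": 2, "product_id": 11, "amount": 120.0},
    {"customer_id": 1, "product_id": 11, "amount": 80.0},
]
customer_dim = {
    1: {"customer_segment": "Retail", "customer_country": "Belgium"},
    2: {"customer_segment": "Corporate", "customer_country": "France"},
}
product_dim = {
    10: {"product_name": "Bike", "product_category": "Sports"},
    11: {"product_name": "Helmet", "product_category": "Safety"},
}


def build_abt(fact_rows, customers, products):
    """Join every fact row with its dimension attributes into one flat row."""
    abt = []
    for fact in fact_rows:
        row = dict(fact)                               # copy the fact columns
        row.update(customers[fact["customer_id"]])     # add customer attributes
        row.update(products[fact["product_id"]])       # add product attributes
        abt.append(row)
    return abt


abt = build_abt(sales_fact, customer_dim, product_dim)
for row in abt:
    print(row)
```

Each row of the result carries the measures and all descriptive attributes, so no further lookups are needed during analysis.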
The preferred format: the SAS dataset.
It's not the size of the data that matters, but what you do with your data. Therefore the size of the single table is limited to a maximum of 1 GB of data. The table must be neither compressed nor zipped.
The data needs to be masked and anonymous. This means the data must be depersonalized and must not be confidential.
Example: Depersonalising Tables
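One possible way to depersonalize a table is to drop directly identifying columns and replace the remaining identifier with a keyed hash. The sketch below assumes hypothetical column names (customer_id, name, email) and a hypothetical secret key. It shows pseudonymization only, and whether that is sufficient for your data is something you need to judge yourself.

```python
import hashlib
import hmac

# Hypothetical key: keep it private and never ship it with the dataset.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymize(value, key=SECRET_KEY):
    """Return a stable token for an identifier using HMAC-SHA256."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def depersonalize(rows, id_field, drop_fields):
    """Drop identifying columns and replace the ID column with a token."""
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in drop_fields}
        out[id_field] = pseudonymize(row[id_field])
        cleaned.append(out)
    return cleaned


customers = [
    {"customer_id": 1001, "name": "Jane Doe", "email": "jane@example.com",
     "city": "Gent", "revenue": 1200},
    {"customer_id": 1002, "name": "John Roe", "email": "john@example.com",
     "city": "Leuven", "revenue": 800},
]
masked = depersonalize(customers, id_field="customer_id",
                       drop_fields={"name", "email"})
for row in masked:
    print(row)
```

Because the same input always yields the same token, rows belonging to one customer can still be grouped after masking.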
Variable Format Types
We love to KISS, meaning Keeping It Short & Simple. Therefore each variable must be in one of the following formats: Character, Numeric, Date, Time, DateTime.
Besides the 'must have' requirements described above, try to follow the guidelines below as much as possible:
Analytical Base Table Layout
Please pay attention to the layout of the data table. To be powerful for analytics, the layout of an analytical base table should be wide (as opposed to long), as shown in the example below.
Example: Analytical Base Table Layout
Note that 'Sale' and 'CostOfSale' show up in different rows. This is not good for performing analytics. An analytical layout is needed: the table should be flattened by moving rows to columns.
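Moving rows to columns can be sketched as follows. The input rows and the column names Date, Product, Measure and Value are assumptions made for illustration; only 'Sale' and 'CostOfSale' come from the text above.

```python
# Hypothetical long layout: one row per measure.
long_rows = [
    {"Date": "2013-01-01", "Product": "A", "Measure": "Sale", "Value": 100.0},
    {"Date": "2013-01-01", "Product": "A", "Measure": "CostOfSale", "Value": 60.0},
    {"Date": "2013-01-02", "Product": "A", "Measure": "Sale", "Value": 120.0},
    {"Date": "2013-01-02", "Product": "A", "Measure": "CostOfSale", "Value": 70.0},
]


def to_wide(rows, key_fields, name_field, value_field):
    """Turn each distinct value of name_field into its own column."""
    wide = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        # Create the output row once per key, then add one column per measure.
        target = wide.setdefault(key, {f: row[f] for f in key_fields})
        target[row[name_field]] = row[value_field]
    return list(wide.values())


wide_rows = to_wide(long_rows, ["Date", "Product"], "Measure", "Value")
for row in wide_rows:
    print(row)
```

After the transformation, 'Sale' and 'CostOfSale' sit side by side in the same row, so a derived measure such as margin can be computed per row.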
Make sure that your dataset contains geographical dimensions such as continent, country, province, city/zip, ...
Visualizations using a geographical map give a unique power to explore the geographical dimension. Therefore geographical coordinates will be added to the dataset. The coordinate space used must be one of the following: World Geodetic System (WGS84), Web Mercator, British National Grid (OSGB36).
Coordinates for the Belgian cities (zip codes) and provinces can be provided by SAS.
Other geographical coordinates can be obtained via OpenStreetMap as follows:
- Navigate to http://www.openstreetmap.org
- Double click the location you're looking for (the more you zoom, the more precise it will be)
- Click on 'permalink' in the bottom right corner of your screen
- The Latitude & Longitude coordinates are now shown in the URL of the webpage.
- These coordinates are ready for use in SAS Visual Analytics.
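If you need to collect many coordinates, reading them from the URL by hand gets tedious. The helper below is a minimal sketch that extracts latitude and longitude from such a URL. It assumes two URL shapes, a query string with lat and lon parameters and a fragment of the form #map=zoom/lat/lon. The actual shape of the URL may differ depending on the version of the website, so check it against what your browser shows.

```python
from urllib.parse import urlparse, parse_qs


def coordinates_from_osm_url(url):
    """Return (latitude, longitude) parsed from an OpenStreetMap-style URL.

    Assumed URL shapes (not guaranteed by the site):
      ...?lat=50.8503&lon=4.3517&zoom=15
      ...#map=15/50.8503/4.3517
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    if "lat" in query and "lon" in query:
        return float(query["lat"][0]), float(query["lon"][0])
    # The fragment may carry extra settings after '&'; keep the first part only.
    fragment = parts.fragment.split("&")[0]
    if fragment.startswith("map="):
        pieces = fragment[len("map="):].split("/")
        if len(pieces) >= 3:
            return float(pieces[1]), float(pieces[2])
    raise ValueError("no coordinates found in URL: " + url)


print(coordinates_from_osm_url(
    "http://www.openstreetmap.org/?lat=50.8503&lon=4.3517&zoom=15"))
print(coordinates_from_osm_url(
    "http://www.openstreetmap.org/#map=15/50.8503/4.3517"))
```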
Timing is everything. Therefore your data needs to contain date/time dimensions. This is a must for gaining insight through forecasting and scenario analysis.
Multiple Date/Time Intervals
Derive other variables from one Date/Time variable. For example, the following variables can be derived from a Date variable: DateByMonth, DateByQuarter, DateByYear. This has several advantages in SAS Visual Analytics, such as creating hierarchies in the date/time dimension or filtering easily on a certain month/quarter/year.
Make sure your dataset makes it possible to create hierarchies.
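As an illustration, the month, quarter and year variables can be computed from a single date as sketched below. The output formats (for example "2013-Q3") are an assumption chosen for readability, not a format prescribed by SAS Visual Analytics.

```python
from datetime import date


def derive_date_variables(d):
    """Derive month, quarter and year variables from one date."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return {
        "Date": d.isoformat(),
        "DateByMonth": f"{d.year}-{d.month:02d}",
        "DateByQuarter": f"{d.year}-Q{quarter}",
        "DateByYear": d.year,
    }


print(derive_date_variables(date(2013, 8, 15)))
```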
The example below shows how the data should be structured to be able to get a hierarchy from Make to Type to Engine:
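As a rough sketch of such a structure, each hierarchy level gets its own column, so that a measure can be rolled up at any level. The rows and the Sales measure below are hypothetical and are not the original example.

```python
# Hypothetical rows: one column per hierarchy level, plus one measure.
cars = [
    {"Make": "Audi", "Type": "Sedan", "Engine": "Diesel", "Sales": 12},
    {"Make": "Audi", "Type": "Sedan", "Engine": "Petrol", "Sales": 8},
    {"Make": "Audi", "Type": "Wagon", "Engine": "Diesel", "Sales": 5},
    {"Make": "Volvo", "Type": "Wagon", "Engine": "Petrol", "Sales": 7},
]


def rollup(rows, levels, measure):
    """Sum a measure at every level of the hierarchy (Make, Make/Type, ...)."""
    totals = {}
    for row in rows:
        for depth in range(1, len(levels) + 1):
            key = tuple(row[level] for level in levels[:depth])
            totals[key] = totals.get(key, 0) + row[measure]
    return totals


totals = rollup(cars, ["Make", "Type", "Engine"], "Sales")
for key, value in totals.items():
    print(" > ".join(key), value)
```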
More is better when it comes to measures. Therefore the dataset should contain many measures, which makes it possible to perform correlation analysis.
Translate all IDs into meaningful values, like in the example below:
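A lookup table is usually enough for this translation. The codes and labels below (status_id, STATUS_LABELS) are hypothetical and only show the idea. Unknown codes are kept visible rather than silently dropped.

```python
# Hypothetical code list: replace it with the real meaning of your IDs.
STATUS_LABELS = {1: "Open", 2: "In progress", 3: "Closed"}


def translate_ids(rows, id_field, labels, new_field):
    """Replace an ID column by a column holding the meaningful value."""
    translated = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != id_field}
        code = row[id_field]
        out[new_field] = labels.get(code, f"Unknown ({code})")
        translated.append(out)
    return translated


tickets = [
    {"ticket": "T-1", "status_id": 1},
    {"ticket": "T-2", "status_id": 3},
    {"ticket": "T-3", "status_id": 9},
]
readable = translate_ids(tickets, "status_id", STATUS_LABELS, "status")
for row in readable:
    print(row)
```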
Try to use open/public data.
For example, try to solve mobility issues in your town by predicting where and when to expect traffic jams, or by working out how to improve the public road infrastructure to make them disappear completely.
Or you may want to understand which parameters influence the spread of diseases most, link employment rates to education, or analyze interesting phenomena such as floods, garbage trucks, international trade, world population evolutions, etc.
The data is out there, grab it!
Here are some examples of great sites where you can find open data:
- Data.gov.be - Open data initiative from the Belgian government
- Open Belgium
- Gapminder World data
- Aviz Visual Analytics Project - statistics per country
- Datacatalog World Bank - Open data to alleviate poverty
- OpenSpending.org - Public data on government finance
- PublicData.eu - European open data initiative
- The Data Hub - Open data search engine
- The Guardian Data Store - Gateway to open data from governments around the globe
- Open flight data
- US Bureau of Transportation Statistics
Example: PROC Contents of a valid sample dataset: