www.sas.com > SAS UK > In the Know Homepage Search | Contact Us    
SAS UK Newsletter Banner SAS - The power to know(tm)  

Cleaning Up: Turning Data into Valued Information


The last issue of In the Know looked at profiling data to find problem areas. Once this is complete you're in a strong position to move to the next stage: applying rules to clean your data.

In a recent survey, two-thirds of European organisations said 'dirty data' affects their profitability. It's no surprise that poor quality data compromises decision-making and has spin-off effects in areas ranging from resource allocation to costs. Of course, high quality data can mean enhanced decision-making, better use of resources, cost savings, and more. The key to resolving the data quality challenge lies in automating the development, deployment and management of rules to replace time-consuming and inefficient manual practices.

This means resolving inaccuracies by using various techniques, for example to standardise name and address data, reconcile supplier information, and to transform and code commodity data. Such measures are necessary to clean your data prior to identifying duplicate/matching records. And whether your goal is to match data, link data or household data, it's worth remembering that your matching rules can only be as accurate as the data.

Like the other steps in this process (Profiling, Integration and Augmentation) the Quality stage covers a number of areas to help you tackle a host of quality issues, ranging from the commonplace to the more unusual.

Analysis and Categorisation
Understanding the type of data in your stores can be time consuming. Simply validating all address data or name data in a column can mean hours of manual work. A key aspect of the SAS approach is the analysis technology provided in SAS DFPower Studio, which uncovers the kind of quality issues that exist within a data source and builds the business logic required to resolve them.

Analysis mainly covers the subset of data quality known as data inconsistency, which can damage any system (causing inaccurate results, especially when reporting, querying or performing quantitative analysis on a data set). SAS DFPower Studio addresses this challenge. It automates the proactive detection and categorisation of information in structured and unstructured data sources, segmenting information based on identified categories and enabling you to re-engineer data into the appropriate columns.

In short, you can automate your manual data-type detection processes. And since the core data quality engine can be easily extended to handle any data type, you're in a position to identify and categorise company-specific codes, multi-value columns, account types and more.

Standardisation
Standardised information - like uniform abbreviations, correct spelling and formatted patterns - plays a vital role in the success of strategic information initiatives. To analyse, report and migrate information correctly, every data element must be scrutinised and transformed according to project and enterprise quality requirements. Without standardised information, business intelligence initiatives can be skewed and will never present the 'true picture'. Once again, SAS DFPower Studio provides the solution. It automates tedious standardisation processes by enabling data comprehension, by enforcing custom and out-of-the-box rules, and by reconciling pattern differences.

Example of Standardisation in SAS DFPower Studio

Parsing and Transforming Business Data
Breaking apart multi-value columns is vital to unlocking the value of data and information stores; information trapped in such columns may not conform to the requirements of your business intelligence and analytical applications. SAS DFPower Studio's natural language parsing engine comprehends business data with the same accuracy as the human mind, so it can break up multi-value columns into individual columns.

For example, identifying measurements, quantities, packaging information, manufacturer names or product IDs - all in one column - requires just a few minutes of customisation. Point and click knowledge transfer from domain experts to the rules-based parsing engine supports automated data comprehension, creating more accurate information and delivering dramatic productivity gains. SAS DFPower Studio will handle virtually any parsing requirement, no matter how big the data volume.

Saving Time and Resources
In short, the SAS Data Quality solution and methodology accelerate the development and deployment of quality rules, saving a huge amount of time and effort while improving accuracy. What previously took weeks or months in terms of checking and recoding is distilled down to automated processes that run in a matter of hours. Once the recoding work is complete, it can simply be handed over for integration into the ETL environment. Indeed, full integration into SAS ETL technology supports the easy deployment of quality rules into an automated batch environment.

The net result is that you can quickly and easily analyse and resolve data quality problems and integrate transformations into the ETL environment. And that means better quality business information and a faster return on investment.

An End-to-End Solution
The SAS Data Quality solution combines powerful software with a complete data management methodology that outlines best practice and describes how to tackle even the biggest challenges. When deployed in conjunction with the SAS Enterprise ETL Server, the SAS Data Quality Solution provides a single point of control for managing the ETL process.

In addition to Profiling and Quality, the third and fourth steps required for success are:

  • Integration: identifying and processing matched and duplicate records (e.g. to merge and link data)
  • Augmentation: adding value to existing data through intelligent use of other data sources and analytical functionality

These stages will be covered in the next two issues of In the Know.

Read how SAS data quality technology is being used by Hewlett Packard, one of the premier IT infrastructure companies, to reduce data reconciliation time dramatically and improve data analysis to improve marketing performance.

For more details on SAS ETL and data quality solutions, please visit: http://www.sas.com/technologies/dw/etl/index.html