Company Data: What You Don't Know Can Hurt You!


Data profiling is an essential first step in any data integration and data quality project. SAS offers the most comprehensive end-to-end solution, based on four building blocks:
  • Profiling: inspecting data for errors, inconsistencies, redundancies and omissions
  • Quality: resolving inaccuracies through a range of tools and techniques such as standardisation of name and address data, reconciliation of supplier information and transformation of code data
  • Integration: rapid identification and processing of matched or duplicate records, for example to merge and link data
  • Augmentation: enriching existing data through intelligent use of other data sources and analytical functionality
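The Python sketch below (not SAS code) makes the Quality and Integration building blocks more concrete. It standardises name and address strings, then flags records whose standardised values match. Everything in it is a hypothetical simplification: the abbreviation table, the `standardise` and `find_duplicates` functions and the exact-match rule. Real matching usually relies on fuzzy comparison and much richer, locale-specific standardisation rules.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation map, for illustration only.
ABBREVIATIONS = {"LTD": "LIMITED", "RD": "ROAD", "ST": "STREET", "CO": "COMPANY"}


def standardise(text: str) -> str:
    """Upper-case, replace punctuation with spaces, collapse whitespace, expand abbreviations."""
    cleaned = re.sub(r"[^\w\s]", " ", text.upper())
    return " ".join(ABBREVIATIONS.get(token, token) for token in cleaned.split())


def find_duplicates(records: list[dict]) -> list[list[int]]:
    """Group record positions whose standardised name and address match exactly."""
    groups: dict[tuple[str, str], list[int]] = defaultdict(list)
    for position, record in enumerate(records):
        key = (standardise(record["name"]), standardise(record["address"]))
        groups[key].append(position)
    # Only groups with more than one member are candidate duplicates.
    return [positions for positions in groups.values() if len(positions) > 1]


records = [
    {"name": "Acme Ltd.", "address": "1 High St"},
    {"name": "ACME LIMITED", "address": "1 High Street"},
    {"name": "Widget Co", "address": "22 Mill Rd"},
]
print(find_duplicates(records))  # [[0, 1]]
```

Exact matching on standardised keys is the simplest possible approach. It catches formatting variants such as "Ltd." versus "LIMITED", but not misspellings.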
Our solution combines powerful software tools with a complete data management methodology that outlines best practices and describes how to tackle even the toughest challenges. The end result is not simply consistent, accurate data, but new insights for analysts into their organisation's customers, suppliers and internal processes.

Deployed in conjunction with the SAS Enterprise ETL Server, the SAS Data Quality Solution provides a single point of control for managing the ETL process, significantly reducing maintenance and deployment costs for business intelligence applications. It empowers IT departments and information analysts to integrate new data warehousing solutions into their operational infrastructure faster than ever before.

This is the first of a series of In the Know articles covering all four building blocks of data quality.

How good is your data?
How complete, and how accurate, is the data in your corporate repositories? Many organisations would find it difficult to answer that question with confidence. They simply don't know. If that is the case, then typically they are in for a rather rude awakening as soon as they undertake a business intelligence project, for example to build a data mart for CRM. At that point they discover that it's next to impossible to build accurate source-to-target mappings and business rules within their ETL tool.

Manually analysing and correcting the source data can become a black hole sucking in huge amounts of time and effort. The end result is spiralling costs and missed deadlines. So what you don't know about your data can and probably will hurt you, and hurt you badly.

If you've worked on a large data integration project, you will know that there can be many reasons for poor data quality, including:

  • Data that does not fulfil the business requirement
  • Inaccurate and incomplete data
  • Problems arising from consolidation of multiple systems
  • Inadequate documentation of legacy systems
Quite apart from the strain and expense imposed on the IT function, these issues can delay or derail analytical projects and cause significant business damage, for example to customer goodwill.

The data profiling process
If you rely on manual verification of data, there are no effective shortcuts to data quality. Automatic data profiling, by contrast, can give you a complete understanding of the structure and content of your source data in a fraction of the time. This enables your analysts to design a reliable process for moving data into a target repository for business intelligence and analytics.

Data profiling is a three-step process:

  1. Analyse the source data. The objective is to ensure that the profile is based on what is actually in the data, not gut feel, wishful thinking or outdated system documentation
  2. Based on your findings, apply the appropriate business rules and cleansing techniques within your ETL process
  3. Execute a final set of profiling reports to confirm data quality and validate the transformation process
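The three steps can be sketched in a few lines of Python. This illustrates the idea under stated assumptions and is not how SAS implements the process. The `profile`, `apply_rule` and `validate` helpers and the country-code mapping are all hypothetical.

```python
from collections import Counter


def profile(rows: list[dict], column: str) -> dict:
    """Steps 1 and 3: summarise what is actually in one column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {"rows": len(values), "nulls": len(values) - len(present), "frequencies": Counter(present)}


# Step 2: hypothetical cleansing rule written after reviewing the step 1 profile.
COUNTRY_RULE = {"UK": "GB", "United Kingdom": "GB", "GB": "GB", "France": "FR", "FR": "FR"}


def apply_rule(rows: list[dict], column: str, mapping: dict) -> list[dict]:
    """Return copies of the rows with `column` mapped to standard codes (unmapped or blank -> None)."""
    return [{**row, column: mapping.get(row.get(column))} for row in rows]


def validate(report: dict, allowed: set) -> list[str]:
    """Step 3: compare the re-profiled column with what the target expects."""
    issues = []
    unexpected = sorted(set(report["frequencies"]) - allowed)
    if unexpected:
        issues.append(f"unexpected values: {unexpected}")
    if report["nulls"]:
        issues.append(f"{report['nulls']} null value(s) remain")
    return issues


source = [{"country": c} for c in ["UK", "United Kingdom", "", "France", "GB"]]
before = profile(source, "country")  # four distinct spellings plus one blank
after = profile(apply_rule(source, "country", COUNTRY_RULE), "country")
print(after["frequencies"])           # Counter({'GB': 3, 'FR': 1})
print(validate(after, {"GB", "FR"}))  # ['1 null value(s) remain']
```

The final profile shows that the variant spellings have been standardised. It also shows that one blank value survived the rule, which is exactly the kind of issue the last round of profiling is meant to catch.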
By undertaking this process systematically with the right tools, you can remove the risks of poor data quality early on, and this will pay handsome dividends in the subsequent stages of data management.

SAS® Data Profiling
SAS offers the statistical power to profile any data source and gain a thorough understanding of its structure and content. What's more, you can do the profiling through a uniquely intuitive visual interface, greatly reducing time and cost.

SAS Data Profiling uses comprehensive column, attribute and pattern analysis to highlight potential problems in structured data. Its interactive, metadata-based reports present table-level, column-level and record-level statistics such as null values, frequency counts and uniqueness. They also surface other statistical results, including minimum, maximum and mean values and primary key candidate recognition.
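A short Python sketch can approximate the kinds of statistics listed above. It is not SAS Data Profiling, and it makes several assumptions:

- "Uniqueness" is distinct values divided by row count.
- Empty strings count as nulls.
- Pattern analysis maps letters to `A` and digits to `9`.
- A column is a primary key candidate only if it has no nulls and no repeated values.

All function names are hypothetical.

```python
import re
from collections import Counter
from statistics import mean


def value_pattern(value) -> str:
    """Pattern analysis: letters become 'A', digits become '9', other characters stay."""
    text = re.sub(r"[A-Za-z]", "A", str(value))
    return re.sub(r"[0-9]", "9", text)


def profile_column(rows: list[dict], column: str) -> dict:
    """Column-level statistics; None and empty strings are treated as nulls."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    distinct = set(present)
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    stats = {
        "null_count": len(values) - len(present),
        "distinct_count": len(distinct),
        # Assumed definition: share of rows carrying a distinct value.
        "uniqueness": len(distinct) / len(values) if values else 0.0,
        "top_values": Counter(present).most_common(3),
        "top_patterns": Counter(value_pattern(v) for v in present).most_common(3),
        # Key candidate: every row populated and no value repeated.
        "key_candidate": bool(values) and len(present) == len(values) == len(distinct),
    }
    if numeric:
        stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return stats


def profile_table(rows: list[dict]) -> dict:
    """Table-level row count plus a profile for every column seen in any row."""
    columns = sorted({column for row in rows for column in row})
    return {
        "row_count": len(rows),
        "columns": {column: profile_column(rows, column) for column in columns},
    }


rows = [
    {"id": 1, "postcode": "AB1 2CD", "spend": 120.0},
    {"id": 2, "postcode": "AB12CD", "spend": None},
    {"id": 3, "postcode": "AB1 2CD", "spend": 80.0},
    {"id": 4, "postcode": "", "spend": 100.0},
]
report = profile_table(rows)
print(report["columns"]["postcode"]["top_patterns"])  # [('AA9 9AA', 2), ('AA99AA', 1)]
print(report["columns"]["id"]["key_candidate"])       # True
```

In this sample, the pattern counts expose the inconsistently formatted postcode. The `id` column is the only one that qualifies as a key candidate.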

All of the reports feature drill-down to any attribute value and corresponding data row, so you can quickly view data in the context of an entire record.
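Drilling down from a value to its records amounts to keeping an index from each attribute value to the rows that contain it. The following Python sketch shows the idea using hypothetical `build_drilldown_index` and `drill_down` helpers. It is an illustration only, not how SAS implements its reports.

```python
from collections import defaultdict


def build_drilldown_index(rows: list[dict], column: str) -> dict:
    """Map each value of `column` to the positions of the rows that contain it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row.get(column)].append(position)
    return dict(index)  # plain dict, so looking up an unknown value adds nothing


def drill_down(rows: list[dict], index: dict, value) -> list[dict]:
    """Return the complete records that hold `value`, in their original order."""
    return [rows[position] for position in index.get(value, [])]


rows = [
    {"id": 1, "postcode": "AB1 2CD", "spend": 120.0},
    {"id": 2, "postcode": "AB12CD", "spend": 95.0},
    {"id": 3, "postcode": "AB1 2CD", "spend": 80.0},
]
postcode_index = build_drilldown_index(rows, "postcode")
print(drill_down(rows, postcode_index, "AB12CD"))
# [{'id': 2, 'postcode': 'AB12CD', 'spend': 95.0}]
```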

Join us in Marlow for a Data Quality Briefing on the 4th November to find out more about the latest data quality techniques and receive practical guidance on implementing data quality in your organisation.

For further information about SAS ETL and data quality solutions, please visit: http://www.sas.com/technologies/dw/etl/index.html