FALL 2009
DATA 101


 

How Do You Eat an Elephant?
Master Data Management: Integrating data across the enterprise
By Marc Smith, Manager, Solutions Specialists, SAS Canada

Why now? Simply put, organizations need to integrate information across the enterprise. Compliance, mergers and acquisitions, and the drive to increase sustainable profits and services fuel the need for rapid delivery of integration services. This challenge is not new. But the importance of consistent core data across the enterprise has increased dramatically in the last decade.

How did we get here? Over the past 10-15 years, there have been numerous waves of technology meant to rapidly monetize information through the consolidation of data. Whether the goal is to streamline supply chain management, improve resource planning, or get up close and personal with your customers, these ideas have usually been marketed and sold as packaged software solutions. These solutions have usually promised far more than they could deliver.

Why is this? There are a few reasons. First of all, these solutions are designed with functionality in mind, on the assumption that the data that fuels the operations is readily available and of high quality. Second, introducing a performance orientation into most organizations must be accompanied by the behavioral and organizational practices that empower people to take actions with measurable results. Third, these solutions cannot stand apart from the ongoing transactional/operational parts of the business, but instead must be more tightly integrated across both functional and analytical domains.

Addressing these issues requires more than just software – it also must include the governance and organizational discipline that can make enterprise resource planning (ERP), supply chain management (SCM) or customer relationship management (CRM) viable. This is where master data management (MDM) fits into an enterprise information management program. An MDM program is intended to bridge the gap between line-of-business applications and coordinated, centralized and consistent information management that assures high-quality data feeding both into and out of enterprise analytical applications. When it accompanies a strong analytical set of capabilities, MDM is a strategic organizational infrastructure initiative that provides a seamless mechanism for extracting true information value out of your islands of data.

Learn to play nice in each other’s sandbox
Crossing organizational boundaries means that all parts of the business need to learn to play nice in each other’s domain. Enterprise governance attempts to align skills, knowledge processes, culture and technology. The enterprise approach that drives MDM means that you need to learn to look beyond departmental boundaries. The proliferation of enterprise software systems and business practices has reached beyond the enterprise, through multichannel touch points, complex distribution channels and integrated supply chains. Business process management (BPM) and service-oriented architecture (SOA) ensure that components are shared and reusable, and that organizations can evolve faster through innovation, flexibility, and integration with technology.

When giants collide
MDM bridges the gap between the operational and analytical data worlds. Essential data for the enterprise must be complete, consistent, accurate and able to span both operational and analytical systems. As organizations mature in managing their core data assets, core master data is managed in enterprise hubs that span all areas of the business. These enterprise hubs are surrounded by services for global identification, linking and synchronization of heterogeneous data sources. These services are required to support the full life cycle of core data objects that are influenced by long-standing, policy-driven processes and behavior. With proper governance in place over key data, quality data can be applied to cross-functional applications, while the use of analytics will enable real-time, automated decision-making capabilities.

Predictive analytics is a natural step for MDM. As processes become unified through process orchestration, so too does the need to enable these business processes with analytics. For example, a customer credit score can be used to accept or deny credit, change billing processes, affect a marketing campaign, etc. By sharing powerful analytics, such as credit score and expected lifetime value, across the enterprise, the organization will be able to treat customers differently based on business processes that consistently react to the enhanced intelligence.
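To make the idea of shared analytics concrete, here is a minimal Python sketch in which three business processes all read the same credit score and expected lifetime value from one place. Everything in it is an assumption made for illustration: the `SHARED_ANALYTICS` dictionary stands in for an enterprise hub, and the customer IDs, thresholds and function names are hypothetical rather than part of any SAS or DataFlux product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerAnalytics:
    """Analytic values published once and reused by every process."""
    customer_id: str
    credit_score: int
    expected_lifetime_value: float


# Hypothetical shared store standing in for an enterprise hub.
SHARED_ANALYTICS = {
    "C001": CustomerAnalytics("C001", 720, 15000.0),
    "C002": CustomerAnalytics("C002", 580, 900.0),
}


def lookup(customer_id):
    """Single point of access, so every process sees the same values."""
    return SHARED_ANALYTICS[customer_id]


def credit_decision(customer_id, minimum_score=650):
    """Accept or deny credit (the threshold is an illustrative assumption)."""
    score = lookup(customer_id).credit_score
    return "accept" if score >= minimum_score else "deny"


def billing_terms(customer_id, minimum_score=650):
    """Choose a billing process from the same shared score."""
    score = lookup(customer_id).credit_score
    return "net-30 invoice" if score >= minimum_score else "prepayment"


def campaign_segment(customer_id, value_threshold=5000.0):
    """Pick a marketing treatment from expected lifetime value."""
    value = lookup(customer_id).expected_lifetime_value
    return "retention offer" if value >= value_threshold else "standard offer"


if __name__ == "__main__":
    for cid in SHARED_ANALYTICS:
        print(cid, credit_decision(cid), billing_terms(cid), campaign_segment(cid))
```

Because all three functions go through `lookup`, a corrected score changes the credit, billing and marketing outcomes together, which is the consistency the paragraph above describes.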

How do you eat an elephant? 
MDM initiatives are large projects and disruptive to an enterprise application architecture. Many early attempts at MDM failed, mainly because organizations viewed MDM as an IT problem. Leading organizations, however, see master data management programs as strategic. Today, it is widely recognized that data is a key strategic asset for organizations to optimize their operations and innovate new products and services. For decades, though, corporations tolerated inaccuracy in their data. It’s a business problem where the cost of poor data quality is unknown and the value in fixing it on a large scale is too difficult to quantify.

A more recent approach to MDM is to build incrementally, focused on business value creation in short bites, giving the organization time to digest what has been delivered. To be successful in the evolution, data governance programs must materialize in organizations and these programs must mature as the organization matures. Data governance is a programmatic approach to managing information across an organization. It involves a formal set of business processes and policies designed to ensure that data is handled in a prescribed fashion, with human intervention handled by trained data stewards.

SAS and DataFlux for Data Integration
There are fundamental differences between DataFlux and SAS software for Data Integration. SAS is optimized for moving, analyzing and transforming large amounts of data. Its architecture is ideal for reading and processing very large data sets. SAS has the ability to read almost any data format and to transform this data into storage formats for analytics and business intelligence. The SAS platform is built around the data warehouse and analytic storage frameworks.

The DataFlux platform is ideal for processing transaction data. By utilizing operational interfaces and service-oriented architecture, DataFlux is optimized for creating a set of business rules, data quality rules and transformation rules and executing those rules in a high transaction rate environment. This platform is also optimized for data normalization and rationalization of data sources when there is no requirement to store historical data. Often, these types of opportunities require the processing of large numbers of transactions, and the DataFlux platform is designed to process these high-volume transaction environments.

Together, DataFlux and SAS provide an end-to-end set of technologies that are best-of-breed for data quality, data integration and analytics. The two environments work well together. With the ability to share data quality rules and data transformations and business rules, customers are able to build their operational and analytical environments based on a consistent set of data management techniques, ensuring data is accurate and reliable across the enterprise.

 

SAS Platform
By Gary Gray, Senior Solutions Specialist

This edition’s special User Tip comes from Gary Gray, a Senior Solutions Specialist at SAS Canada. Gary has been with SAS for 15 years, and focuses on the SAS® platform with particular attention to data integration and data quality, as well as business intelligence. Gary has elected to address some frequently asked questions.

1. Can I share jobs and schemes in DataFlux® dfPower Studio® with other users?

To share information among multiple DataFlux dfPower Studio users, you have a few options.
Of course, you can import and export most jobs and reports in DataFlux dfPower Studio. However, for tighter integration of all jobs, schedules and reports, you can share a common Management Resources directory. To do this, all users must choose the Options menu item in the main DataFlux dfPower Studio application and point to a common location for the Management Resources directory. This could be a network location or a shared folder on a particular user's computer. To share items in a Quality Knowledge Base (QKB), such as schemes and match definitions, all users must point DataFlux dfPower Studio to a common QKB location. This is done in the same way as described above, except that it is the path for the QKB directory that must be the same for each user.

2. How do I access data in an Excel spreadsheet?

The additional step that you need to take to properly connect to an Excel spreadsheet is to define and name the range of data (the range of cells in the spreadsheet that contains the data you want to work with). In Excel, select the range you want to work with (this can be the whole spreadsheet), and use the menu item "Insert > Name > Define." Simply name the range. When you refresh your data sources in DataFlux dfPower Studio, you should now be able to select that ODBC data source and the "table" from that source, which is actually the range you just configured.

3. Why do my match results differ between versions of DataFlux dfPower Studio?

There are a few possible reasons for this. Usually, differing match results mean you are using different QKBs. It could also be that you are using the same QKB, but used different criteria when setting up the match job (i.e., different combinations of fields, conditions, match definitions or sensitivities). Lastly, if you move between drastically different versions of DataFlux dfPower Studio, such as upgrading from version 4.3 to 6.0, you may see differences in your match results.

4. How do I improve match sensitivity?

All of these mechanisms work to overcome the ambiguities inherent in that kind of data. For example, a Name match definition most likely will match values like "Bob" and "Robert." An Organization match definition is directed to overlook values like "Inc." and "Corp." while at the same time matching "First" with "1st."

When all of these transformations are finished in memory, only a certain number of characters are actually utilized to create the resultant match code. At low sensitivities, less of the now-transformed value is used; at higher sensitivities, more of it is used. When more characters are used, you are less likely to find other similar match codes. The number of characters used for each sensitivity depends on how certain values should be weighted with regard to their importance. In a matching process, last names are more useful than first names. So when matching names, more of the last name is used compared to the first name, at a given sensitivity.
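The effect of sensitivity on a match code can be illustrated with a toy Python sketch. It is not the DataFlux algorithm, which is not described here in enough detail to reproduce. The lookup tables, the `NAME_LENGTHS` mapping from sensitivity to characters kept, and both function names are hypothetical, chosen only to show values being standardized and then truncated, with more of the last name kept than the first name.

```python
import re

# Hypothetical lookup tables; a real knowledge base would be far larger.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}
NOISE_WORDS = {"inc", "corp", "ltd"}
ORDINALS = {"1st": "first", "2nd": "second"}

# Hypothetical mapping: sensitivity -> (first-name chars, last-name chars).
# The last name always contributes more characters than the first name.
NAME_LENGTHS = {50: (1, 3), 70: (2, 5), 90: (4, 8)}


def _tokens(value):
    """Lower-case the value and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def name_match_code(first, last, sensitivity):
    """Standardize a name, then keep fewer characters at lower sensitivity."""
    first_len, last_len = NAME_LENGTHS[sensitivity]
    first_norm = "".join(NICKNAMES.get(t, t) for t in _tokens(first))
    last_norm = "".join(_tokens(last))
    return f"{last_norm[:last_len].upper()}${first_norm[:first_len].upper()}"


def organization_match_code(name, length):
    """Drop noise words, expand ordinals, then truncate to `length` characters."""
    kept = [ORDINALS.get(t, t) for t in _tokens(name) if t not in NOISE_WORDS]
    return "".join(kept)[:length].upper()


if __name__ == "__main__":
    for sensitivity in sorted(NAME_LENGTHS):
        a = name_match_code("Bob", "Smith", sensitivity)
        b = name_match_code("Rob", "Smithson", sensitivity)
        print(sensitivity, a, b, "match" if a == b else "no match")
```

In this sketch "Bob Smith" and "Rob Smithson" share a code at the lowest sensitivity but not at the highest, while "Bob Smith" and "Robert Smith" share a code at every level because the nickname is standardized before truncation.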

As you can see, when you adjust the sensitivity level, there is actually quite a bit going on "behind the scenes." Choosing the correct level for each type of data depends on the overall importance of that data to the quality of the match. The looser the match sensitivity, the greater the chance you have of finding all possible permutations. However, at this level, you open your results up to more falsely matched values. And using the highest sensitivity does not mean you are performing an exact match. To do that, you actually have to select the Exact match definition from within DataFlux dfPower’s match functionality.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2009, SAS Institute Inc. All rights reserved.