How Do You Eat an Elephant?
Master Data Management: Integrating
data across the enterprise
By Marc Smith, Manager,
Solutions Specialists, SAS Canada
Why now? Simply put, organizations need to integrate
information across the enterprise. Compliance, mergers and acquisitions,
and the drive to increase sustainable profits and services fuel
the need for rapid delivery of integration services. This challenge
is not new. But the importance of consistent core data across the
enterprise has increased dramatically in the last decade.
How did we get here? Over the past 10-15 years,
there have been numerous waves of technology meant to rapidly monetize
information through the consolidation of data. Whether the goal
is to streamline supply chain management, improve resource planning,
or get up close and personal with your customers, these ideas have
usually been marketed and sold as packaged software solutions.
These solutions have usually promised far more than they could
deliver.
Why is this? There are a few reasons. First of all, these solutions
are designed with functionality in mind, with the assumption that
the data that fuels the operations is readily available and is
of high quality. Second, introducing a performance orientation into
most organizations must be accompanied by the behavioral and
organizational practices that empower people to take actions with
measurable results. Third, these solutions cannot stand apart
from the ongoing transactional/operational parts of the business,
but instead must be more tightly integrated across both functional
and analytical domains.
Addressing these issues requires more than just software – it
also must include the governance and organizational discipline
that can make enterprise resource planning (ERP), supply chain
management (SCM) or customer relationship management (CRM) viable.
This is where master data management (MDM) fits into an enterprise
information management program. An MDM program is intended to bridge
the gap between line-of-business applications and coordinated,
centralized and consistent information management that assures
high-quality data feeding both into and out of enterprise analytical
applications. When it accompanies a strong analytical set of capabilities,
MDM is a strategic organizational infrastructure initiative that
provides a seamless mechanism for extracting true information value
out of your islands of data.
Learn to play nice in each other’s sandbox
Crossing
organizational boundaries means that all parts of the business
need to learn to play nice in each other’s domain. Enterprise governance
attempts to align skills, knowledge processes, culture and technology.
The enterprise approach that drives MDM means that you need to
learn to look beyond departmental boundaries. The proliferation
of enterprise software systems and business practices has reached
beyond the enterprise, through multichannel touch points, complex
distribution channels and integrated supply chains. Business process
management (BPM) and service-oriented architecture (SOA) ensure
that components are shared and reusable, and that organizations
can evolve faster through innovation, flexibility, and integration
with technology.
When giants collide
MDM bridges the gap between the operational
and analytical data worlds. Essential data for the enterprise
must be complete, consistent, accurate and able to span both operational
and analytical systems. As organizations mature in managing their
core data assets, core master data is managed in enterprise hubs
that span across all areas of the business. These enterprise hubs
are surrounded by services for global identification, linking and
synchronization of heterogeneous data sources. These services are
required to support the full life cycle of core data objects that
are influenced by long-standing, policy-driven processes and behavior.
With proper governance in place over key data, quality data can
be applied to cross-functional applications, while the use of analytics
will enable real-time, automated decision making capabilities.
Predictive analytics is a natural
step for MDM. As processes become unified through process orchestration,
the need to enable these business processes with analytics grows as well.
For example, a customer credit score can be used to accept or deny
credit, change billing processes, affect a marketing campaign,
etc. By sharing powerful analytics, such as credit score and expected
lifetime value, across the enterprise, the organization will be
able to treat customers differently based on business processes
that consistently react to the enhanced intelligence.
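
As a minimal sketch of that idea, the Python below lets three separate business processes read the same shared customer analytics instead of each computing its own. Everything in it is an assumption made for illustration: the `Customer` fields, the function names and the cut-off values are hypothetical and do not come from any SAS or DataFlux product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    credit_score: int               # computed once by a shared analytic model
    expected_lifetime_value: float  # likewise shared across processes


# Hypothetical cut-offs, chosen only for this example.
MIN_SCORE_FOR_CREDIT = 650
HIGH_VALUE_THRESHOLD = 10_000.0


def credit_decision(customer: Customer) -> str:
    """Credit process: accept or deny based on the shared score."""
    return "accept" if customer.credit_score >= MIN_SCORE_FOR_CREDIT else "deny"


def billing_terms(customer: Customer) -> str:
    """Billing process: reacts to the same score, not a private copy."""
    return "net-30" if customer.credit_score >= MIN_SCORE_FOR_CREDIT else "prepay"


def campaign_segment(customer: Customer) -> str:
    """Marketing process: segments on the shared lifetime value."""
    if customer.expected_lifetime_value >= HIGH_VALUE_THRESHOLD:
        return "premium"
    return "standard"
```

Because each function reads the same `Customer` record, a change to the score is reflected consistently in credit, billing and marketing decisions.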
How do you eat an elephant?
MDM initiatives are large projects
and disruptive to an enterprise application architecture. Many
early attempts at MDM failed, mainly because organizations
viewed MDM as an IT problem. Leading organizations, however, see
master data management programs as strategic. Today, it is widely
recognized that data is a key strategic asset for organizations
to optimize their operations and innovate new products and services.
For decades, though, corporations tolerated inaccuracy in their
data. It’s a business problem where the cost of poor data quality
is unknown and the value in fixing it on a large scale is too difficult
to quantify.
A more recent approach to MDM is
to build incrementally, focused on business value creation in short
bites, giving the organization time to digest what has been delivered.
To be successful in the evolution, data governance programs must
materialize in organizations and these programs must mature as
the organization matures. Data governance is a programmatic approach
to managing information across an organization. It involves a formal
set of business processes and policies designed to ensure that
data is handled in a prescribed fashion, with human intervention
handled by trained data stewards.
SAS and DataFlux for Data Integration
There are fundamental differences
between DataFlux and SAS software for Data Integration. SAS is
optimized for moving, analyzing and transforming large amounts
of data. Its architecture is ideal for reading and processing very
large data sets. SAS has the ability to read almost any data format
and to transform this data into storage formats for analytics and
business intelligence. The SAS platform is built around the data
warehouse and analytic storage frameworks.
The DataFlux platform is ideal for
processing transaction data. By utilizing operational interfaces
and service-oriented architecture, DataFlux is optimized for creating
a set of business rules, data quality rules and transformation
rules and executing those rules in a high transaction rate environment.
This platform is also optimized for data normalization and rationalization
of data sources when there is no requirement to store historical
data. Often, these types of opportunities require the processing
of large numbers of transactions, and the DataFlux platform is designed
to handle these high-volume transaction environments.
Together, DataFlux and SAS provide
an end-to-end set of technologies
that are best-of-breed for data quality, data integration and
analytics. The two environments work well together. With the ability
to share data quality rules and data transformations and business
rules, customers are able to build their operational and analytical
environments based on a consistent set of data management techniques,
ensuring data is accurate and reliable across the enterprise.
SAS Platform
By Gary Gray, Senior
Solutions Specialist
This edition’s special User Tip comes from Gary
Gray, a Senior Solutions Specialist
at SAS Canada. Gary has been with
SAS for 15 years, and focuses on
the SAS® platform with particular attention
to data integration and data quality,
as well as business intelligence.
Gary has elected to address some
frequently asked questions.
1. Can I share jobs and schemes in DataFlux®
dfPower Studio® with other users?
To share information among multiple DataFlux
dfPower Studio users, you have a few options.
Of course, you can import and export
most jobs and reports in DataFlux dfPower Studio. However, for
tighter integration of all jobs, schedules and reports, you can
share a common Management Resource directory. To do this, all users
must choose the Options menu item in the main DataFlux dfPower
Studio application and point to a common location for the Management
Resources directory. This could be a network location or a shared
folder on a particular user's computer. To share items in a Quality
Knowledge Base (QKB), such as schemes and match definitions, all
users must point DataFlux dfPower Studio to a common QKB location.
This is done in much the same way as described above, except that the
path for the QKB directory should be the same for each user.
2. How do I access data in an Excel spreadsheet?
The additional step that you need to take to
properly connect to an Excel spreadsheet is to define and name
the range of data (the range of cells in the spreadsheet that contains
the data you want to work with). In Excel, select the range you
want to work with (this can be the whole spreadsheet), and use
the menu item "Insert > Name > Define." Simply
name the range. When you refresh your data sources in DataFlux
dfPower Studio, you should now be able to select that ODBC data
source and the "table" from that source, which is actually
the range you just configured.
3. Why do my match results differ between versions
of DataFlux dfPower Studio?
There are a few possible reasons for this. Usually,
differing match results mean you are using different QKBs. It could
also be that you are using the same QKB, but used different criteria
when setting up the match job (i.e., different combinations of
fields, conditions, match definitions or sensitivities). Lastly,
if you move between drastically different versions of DataFlux dfPower
Studio, like upgrading from version 4.3 to 6.0, you may see differences
in your match results.
4. How do I improve match sensitivity?
All of these mechanisms work to overcome the
ambiguities inherent in that kind of data. For example, a Name
match definition most likely will match values like "Bob" and "Robert." An
Organization match definition is directed to overlook values like "Inc." and "Corp." while
at the same time matching "First" with "1st."
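
The examples above can be imitated with a toy normalizer. This is only a sketch and not the logic of any DataFlux match definition: the nickname table, noise-word list and ordinal map are small hypothetical stand-ins chosen to reproduce the cases mentioned in the text.

```python
import re

# Hypothetical lookup tables; a real QKB holds far richer vocabularies.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}
NOISE_WORDS = {"inc", "corp", "ltd", "co"}
ORDINALS = {"1st": "first", "2nd": "second", "3rd": "third"}


def _tokens(value):
    """Lowercase the value and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def normalize_name(value):
    """Map known nicknames to one canonical given name."""
    return " ".join(NICKNAMES.get(token, token) for token in _tokens(value))


def normalize_organization(value):
    """Drop legal-form noise words and spell out ordinals."""
    kept = [ORDINALS.get(token, token)
            for token in _tokens(value)
            if token not in NOISE_WORDS]
    return " ".join(kept)
```

With these tables, `normalize_name("Bob")` and `normalize_name("Robert")` produce the same string, as do `normalize_organization("1st National Bank Inc.")` and `normalize_organization("First National Bank Corp.")`.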
When all of these transformations
are finished in memory, only a certain number of characters are
actually utilized to create the resultant match code. At low sensitivities,
less of the now-transformed value is used; at higher sensitivities,
more of it is used. When more characters are used, you are less
likely to find other similar match codes. The number of characters
used for each sensitivity depends on how certain values should
be weighted with regard to their importance. In a matching process,
last names are more useful than first names. So when matching names,
more of the last name is used compared to the first name, at a
given sensitivity.
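
A rough sketch of how sensitivity could control the number of characters used is shown below. The sensitivity levels, the character budgets and the plain-prefix "match code" are all invented for illustration; real match codes are encoded values, and the actual weights are defined by the match definition in use.

```python
# Hypothetical budget: characters kept from (last name, first name) at each
# sensitivity. Last names get more characters because they are more useful
# for telling records apart.
CHARS_BY_SENSITIVITY = {
    50: (3, 1),
    70: (5, 2),
    85: (7, 3),
    95: (10, 5),
}


def toy_match_code(first, last, sensitivity):
    """Build a toy match code from prefixes of already-normalized names."""
    last_len, first_len = CHARS_BY_SENSITIVITY[sensitivity]
    return f"{last.lower()[:last_len]}|{first.lower()[:first_len]}"


def is_match(record_a, record_b, sensitivity):
    """Two (first, last) records match when their toy match codes are equal."""
    return (toy_match_code(*record_a, sensitivity)
            == toy_match_code(*record_b, sensitivity))
```

In this sketch, `("Robert", "Johnson")` and `("Roberta", "Johnston")` match at sensitivities 50 and 70, because only short prefixes are compared, but not at 85 or 95, where more of the last name is used.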
As you can see, when you adjust the
sensitivity level, there is actually
quite a bit going on "behind
the scenes." Choosing the correct level for each type of data
depends on the overall importance
of that data to the quality of the match. The looser the match
sensitivity, the greater chance you have to find all possible permutations.
However, at this level, you open your results up to more falsely
matched values. And using the highest sensitivity does not mean
you are performing an exact match. To do that, you actually have
to select the Exact match definition from within DataFlux dfPower Studio’s
match functionality.
SAS and all other SAS Institute Inc. product or service names are registered
trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates
USA registration. Other brand and product names are trademarks
of their respective companies. Copyright © 2009, SAS Institute
Inc. All rights reserved.