Managing the data asset
How to treat your data like the high-value business asset that it is
Data is now collected and saved from every conceivable source – Internet applications, front-office and back-office systems, trading networks, social media – and such complexity requires a sophisticated, deliberate process for managing this vital information. After all, data holds the key to sales, marketing, customer support, production and other initiatives. Without an accurate view of customers, products, materials, locations and assets, how can a company compete in today's marketplace?
Organizations must approach data management the same way they approach any other business process: with a well-defined, repeatable methodology. That means adopting a data management life cycle methodology that governs how data is managed, monitored and maintained for the benefit of every part of the business.
DataFlux recommends this six-phase process:
- Define
- Discover
- Design
- Execute
- Evaluate
- Control
Let's look at each step in detail.
Define
The Define phase of the data management methodology is like mapping out a journey before you set off. The decisions made in this phase guide the collection, organization, enhancement, monitoring and retirement of your data assets throughout the life cycle. You don't need all the answers at the beginning, but you do need a solid plan for how to proceed and a clear view of what the ultimate success indicators will be. These success indicators must map to the business problem that prompted the project; the reason for the project (cutting costs, mitigating risks, enhancing revenue, etc.) provides a crucial guide for the Define phase.
During the Define phase, an organization should first answer a series of questions about:
People. Who's involved? And for what purpose? This step identifies everyone involved in the data management process, including executive support, sponsorship from managers and directors, and participation from both business and IT. Many organizations also set up steering committees and/or stewardship teams to facilitate collaboration on cross-functional issues.
Road map. Where are we now? Where do we want to go? What obstacles are in our way? Often, the first task after selecting the right people is to chart a path to a successful outcome, which includes defining what a successful outcome is.
Source systems. What data will we need? Where is that data coming from? The road map tells the story of where the project is intended to go; this part of the Define phase tells the team which source systems, and which data within them, will play a role in the data management project.
Business processes. Which business processes will be affected? How will better data enhance the way the organization operates? This part of the Define phase maps the data management strategy to existing business processes. Better data can ultimately streamline those processes, because less time is spent reconciling conflicting views or managing poor-quality data.
Business rules and data definitions. How do we define "customer"? How do we want to optimize procurement and spending? This step seems simple enough, but it can be deceptively difficult. Billing might define "customer" as anyone who receives an invoice, while customer support may only care about who actually uses the product. These decisions form the basis of the business rules and data definitions that guide later phases.
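To see why the definition of "customer" has to be settled early, consider a minimal Python sketch. The `Account` record and its `receives_invoice` and `is_active_user` fields are hypothetical, chosen only to mirror the billing and customer-support views described above. Applied to the same three accounts, the two definitions produce different customer lists.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Account:
    # Hypothetical fields for illustration only.
    account_id: str
    receives_invoice: bool
    is_active_user: bool

def billing_customer(a: Account) -> bool:
    # Billing's view: anyone who receives an invoice.
    return a.receives_invoice

def support_customer(a: Account) -> bool:
    # Customer support's view: anyone who actually uses the product.
    return a.is_active_user

accounts = [
    Account("A1", receives_invoice=True, is_active_user=True),
    Account("A2", receives_invoice=True, is_active_user=False),  # pays, but nobody uses it
    Account("A3", receives_invoice=False, is_active_user=True),  # uses it, billed elsewhere
]

billing_view = {a.account_id for a in accounts if billing_customer(a)}
support_view = {a.account_id for a in accounts if support_customer(a)}

print(sorted(billing_view))                 # ['A1', 'A2']
print(sorted(support_view))                 # ['A1', 'A3']
print(sorted(billing_view ^ support_view))  # ['A2', 'A3']: accounts the departments disagree on
```

Until the organization agrees on one definition, any report that counts "customers" depends on which department's rule it happens to use.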
Discover
Data is useful only if you understand where it is, what it means to your organization and how it relates to other data in your organization. The Discover phase is designed to provide exactly that understanding.
Every new application implementation, data warehouse project, data migration or consolidation initiative should start with data discovery, as should the arrival of any new data source in your organization. Data discovery has several components, each of which prepares you for your data initiatives:
Data exploration. This diagnostic step documents the data in your organization and its characteristics. Exploration gives you information about the accuracy, consistency and reliability of your data.
Data profiling and auditing. Data profiling alerts you to data that does not match the characteristics recorded in the metadata compiled during data exploration. More importantly, profiling can tell you whether the data meets the business rules and definitions established in the Define phase. It can also help you determine the relationships across your data sources: where you have similar data, where data is in conflict, where data is duplicated and where data may be dormant.
Data cataloging and business vocabulary. You need a development environment where data sources can be combined and rationalized: a place where you can group data sources into projects, work across those sources and develop a consistent environment for managing your data. Data cataloging lays the groundwork for all the data management tasks that follow. Data catalogs must be augmented with business definitions and vocabularies so that business users can navigate the data landscape comfortably.
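As a rough illustration of data profiling, the Python sketch below checks rows against column metadata of the kind gathered during data exploration and flags duplicated keys. The `METADATA` patterns, column names and sample rows are hypothetical; a dedicated profiling tool would typically gather many more statistics, such as value frequencies, ranges and cross-source relationships.

```python
import re
from collections import Counter

# Hypothetical metadata captured during data exploration:
# the expected pattern for each column and whether nulls are allowed.
METADATA = {
    "customer_id": {"pattern": r"C\d{4}", "nullable": False},
    "email": {"pattern": r"[^@\s]+@[^@\s]+\.[a-z]+", "nullable": True},
    "country": {"pattern": r"[A-Z]{2}", "nullable": False},
}

def profile(rows):
    """Count metadata violations per column and find duplicated customer IDs."""
    violations = Counter()
    for row in rows:
        for column, spec in METADATA.items():
            value = row.get(column)
            if value is None:
                if not spec["nullable"]:
                    violations[column] += 1
            elif not re.fullmatch(spec["pattern"], value):
                violations[column] += 1
    key_counts = Counter(row.get("customer_id") for row in rows)
    duplicates = {key for key, n in key_counts.items() if n > 1}
    return violations, duplicates

rows = [
    {"customer_id": "C0001", "email": "ann@example.com", "country": "US"},
    {"customer_id": "C0002", "email": "bob(at)example.com", "country": "usa"},
    {"customer_id": "C0001", "email": None, "country": "US"},
]
violations, duplicates = profile(rows)
print(dict(violations))  # {'email': 1, 'country': 1}
print(duplicates)        # {'C0001'}
```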
Design
After completing the first two phases of the data management methodology, you will be able to map your strategy, identify sources and understand the underlying formats and structures, as well as assess the relationships and uses of your data. Now you face another challenge: taking all of these different structures, formats, data sources and data feeds and creating an environment that accommodates the needs of your business. The Design phase requires consolidation and coordination while concentrating on three major imperatives:
Consistency of rules. Ultimately, an organization needs one set of business rules that can be stored centrally but deployed across all data sources, applications and lines of business.
Consistency of the data model. The data model is the single, definitive source for how your data maps to your business. Through the process of creating a well-structured data model, you identify the appropriate source systems and begin to reconcile multiple views, if required.
Consistency of business processes. During the Define and Discover phases, you identified the processes that could be affected. Now the task is to provide consistency across those processes. When creating business rules, you have to know how to answer questions such as "Is this a new customer or an existing customer?" or "Is this a customer in good standing?" By understanding the affected processes, you can design more effective rules to automate them.
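A rule such as "Is this a new customer or an existing customer?" ultimately has to be expressed in a form a system can evaluate. The sketch below assumes a hypothetical rule: two records refer to the same customer when their normalized names and postal codes match. The helper names and the list of legal suffixes are illustrative, and exact matching on a normalized key is deliberately simple; production matching rules are usually more tolerant of spelling variations.

```python
import re
import unicodedata

# Hypothetical list of legal suffixes ignored when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp"}

def match_key(name: str, postal_code: str) -> str:
    """Build a simple matching key from a normalized name plus postal code."""
    # Strip accents, lowercase, and drop punctuation.
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    # Ignore legal suffixes so "Acme, Inc." and "ACME" compare equal.
    words = [w for w in text.split() if w not in LEGAL_SUFFIXES]
    return " ".join(words) + "|" + postal_code.replace(" ", "").upper()

existing_customers = {match_key("Acme, Inc.", "27513")}

def is_existing_customer(name: str, postal_code: str) -> bool:
    return match_key(name, postal_code) in existing_customers

print(is_existing_customer("ACME Inc", "27513"))      # True
print(is_existing_customer("Acme Widgets", "27513"))  # False
```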
Execute
Now that the business users have established how the data and rules should be defined, it is up to the IT staff to ensure that databases and applications adhere to those definitions. Many types of architecture are involved in this phase: enabling ERP and CRM applications via proprietary interfaces, enabling data marts and data warehouses via extraction, transformation and loading (ETL) flows, and enabling master data management (MDM) systems via service-oriented architecture (SOA), ETL or other technologies. How to enable and manage the data in each of these environments is a decision IT has to make to preserve data integrity and ensure clean integration with the various systems.
One potential pitfall in the Execute phase is duplicating the rules and standards from the Design phase for each application or data source. When rules and definitions are copied across siloed, unrelated systems, multiple point-to-point interfaces are inadvertently created, and each copy of the rule definitions must then be updated separately every time a rule or business initiative changes.
Naturally, this approach is highly impractical for the IT team to manage. A better solution is to build the definitions once and ensure that you have the ability to collectively apply those definitions across your organization. As one IT director put it: "We want to build our standards and rules once and then have the ability to use them repeatedly and propagate to the entire organization seamlessly."
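One way to "build the definitions once" is to keep every rule in a single registry that all consumers call. In the Python sketch below, `RULES`, the `rule` decorator, the two rules and the two consumer functions are hypothetical names. The point is that the CRM loader and the warehouse load share one definition of each rule, so changing a rule in one place changes it everywhere.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]

# Hypothetical central rule registry: each rule is defined exactly once.
RULES: Dict[str, Rule] = {}

def rule(name: str) -> Callable[[Rule], Rule]:
    """Decorator that registers a validation rule under a unique name."""
    def register(fn: Rule) -> Rule:
        if name in RULES:
            raise ValueError(f"rule {name!r} is already defined")
        RULES[name] = fn
        return fn
    return register

@rule("postal_code_present")
def postal_code_present(r: Record) -> bool:
    return bool(r.get("postal_code", "").strip())

@rule("country_is_iso2")
def country_is_iso2(r: Record) -> bool:
    country = r.get("country", "")
    return len(country) == 2 and country.isalpha() and country.isupper()

def failed_rules(record: Record) -> List[str]:
    """Return the names of every registered rule the record violates."""
    return [name for name, check in RULES.items() if not check(record)]

# Two hypothetical consumers (a CRM loader and a warehouse ETL step)
# call the shared registry instead of keeping their own copies of the rules.
def crm_accepts(record: Record) -> bool:
    return not failed_rules(record)

def warehouse_rejects(record: Record) -> List[str]:
    return failed_rules(record)

rec = {"postal_code": " ", "country": "us"}
print(crm_accepts(rec))        # False
print(warehouse_rejects(rec))  # ['postal_code_present', 'country_is_iso2']
```

The duplicate-name check in `rule` is one way to enforce "define once"; a second definition of the same rule fails loudly instead of silently diverging.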
For each data source, business process and application that is modified to conform to the new data definitions, you need to:
- Understand the requirements.
- Validate that the new integration meets the requirements.
- Deploy the interface into production.
By repeating this process throughout the Execute phase, you put in place the data management rules that guide the collection and organization of data, test the data's integrity and prepare to move to the next phase of the process.
Evaluate
A healthy data life cycle requires a robust monitoring and reporting system. Data needs to be monitored continuously so that it remains fit for purpose for your organization. Why is this so critically important? After all, you have just spent a great deal of time, energy and resources getting your systems to a point where business users have a consistent and validated view of your organization. Isn't it time to simply enjoy the success of all that effort?
Actually, the opposite is true. Very few organizations are static – they are forever growing and evolving. For example, you add new partners that bring new data to the table. Your business changes, sales regions are created or modified, you take on new initiatives and you develop new products. All of these changes must be reflected in your data, which makes the Evaluate phase so important.
Your mantra for success at this point needs to be: 1. Monitor; 2. Review; and 3. Optimize. Data should be monitored and validated as it enters your organization to verify that it meets your rules, and the rules themselves need to be reviewed constantly to ensure they still meet the needs of your business. The work done in the Discover, Design and Execute phases allows you to consolidate the rules and requirements into a single environment. Because the required data management rules are centralized, changes can be propagated immediately across the organization without duplication of effort.
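The Monitor step can start as simply as measuring how many incoming records pass each rule. This Python sketch assumes two hypothetical rules and a hypothetical 95% alert threshold; in practice the rules would come from the central rule store. In the sample batch, one record uses a newly created sales region that the rules do not yet recognize, the kind of business change that should trigger the Review and Optimize steps.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical rules standing in for the centrally managed rule set.
RULES: Dict[str, Callable[[Record], bool]] = {
    "email_present": lambda r: bool(r.get("email")),
    "region_known": lambda r: r.get("region") in {"NA", "EMEA", "APAC"},
}

def monitor(batch: List[Record], threshold: float = 0.95) -> Dict[str, float]:
    """Return each rule's pass rate for a batch and print an alert for rules below threshold."""
    rates: Dict[str, float] = {}
    for name, check in RULES.items():
        passed = sum(1 for r in batch if check(r))
        rates[name] = passed / len(batch) if batch else 1.0
        if rates[name] < threshold:
            print(f"ALERT: {name} pass rate {rates[name]:.0%} is below {threshold:.0%}")
    return rates

batch = [
    {"email": "a@example.com", "region": "NA"},
    {"email": "b@example.com", "region": "LATAM"},  # new region not yet in the rules
    {"email": "", "region": "EMEA"},
    {"email": "d@example.com", "region": "APAC"},
]
rates = monitor(batch)
# rates == {'email_present': 0.75, 'region_known': 0.75}
```

A falling pass rate does not say whether the data or the rule is wrong; that judgment is the Review step, shared between IT and business users.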
Monitoring is a joint activity between IT and business users. IT monitors and validates that systems are running within their required service levels. Business users also benefit from the monitoring reports: they review the reports constantly to validate that business needs are being met, and they make changes when those needs change.
Control
One thing is certain in today's information age: a wide variety of data will continue to pour quickly into your organization. It is easy to see why data is a key asset. However, it is also important to recognize when data needs to be retired. The Control phase is about reassessing data: if data is no longer useful to your organization, you must be able to retire it appropriately. Doing so frees up the resources currently being spent maintaining that data.
For example, let's look at a common data problem facing financial services firms. When mergers, acquisitions and divestitures occur, you need the ability to purge or re-categorize data. You don't want to spend resources managing the data of a company that no longer exists.
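The financial services example can be sketched as a pass over account records after corporate events. The entity names and the `DIVESTED` and `MERGED_INTO` tables are hypothetical. In this sketch, accounts of a divested entity are marked for purge, and accounts of an acquired entity are re-categorized under the surviving entity rather than deleted. Whether a given record may actually be purged depends on your retention obligations, which this sketch does not model.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class Account:
    account_id: str
    legal_entity: str

# Hypothetical corporate events: one divested entity, one merged entity.
DIVESTED = {"OldBank Ltd"}
MERGED_INTO = {"SmallTrust Inc": "BigBank Corp"}

def apply_corporate_events(accounts: List[Account]) -> Tuple[List[Account], List[Account]]:
    """Split accounts into (keep, purge) after mergers and divestitures."""
    keep: List[Account] = []
    purge: List[Account] = []
    for a in accounts:
        if a.legal_entity in DIVESTED:
            # The company no longer owns this entity: stop managing its data.
            purge.append(a)
        elif a.legal_entity in MERGED_INTO:
            # Re-categorize under the surviving entity instead of deleting.
            keep.append(replace(a, legal_entity=MERGED_INTO[a.legal_entity]))
        else:
            keep.append(a)
    return keep, purge

accounts = [
    Account("1", "BigBank Corp"),
    Account("2", "SmallTrust Inc"),
    Account("3", "OldBank Ltd"),
]
keep, purge = apply_corporate_events(accounts)
print([(a.account_id, a.legal_entity) for a in keep])
# [('1', 'BigBank Corp'), ('2', 'BigBank Corp')]
print([a.account_id for a in purge])  # ['3']
```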
Lastly, it is important to promote your successes across your organization. When you began your life cycle, you were solving a business problem. By the time you have reached this phase in the life cycle, you should have improved your business. Communicate and evangelize these messages to help everyone from senior management on down recognize that the efforts were successful and the business is improved. This demonstrates the business benefits of a sound data management methodology across the organization, and it paves the way for support of future initiatives.
President and CEO of DataFlux