Data management backgrounder

What it is – and why it matters

You’ve done enough research to know that data management is an important first step in dealing with big data or starting any analytics project. But you’re not too proud to admit that you’re still confused about the differences between master data management and data governance. Or maybe you know these terms by heart. And you feel like you’ve been explaining them to your boss or your business units over and over again.

Either way, we’ve created the primer you’ve been looking for. Print it, post it to the team bulletin board, or share it with your mom so she'll understand what you do. And remember, a data management strategy should not focus on just one of these areas. You need to consider them all.

Data access

What is it? Data is only an asset if you can get to it. Data access refers to an organization’s ability to retrieve information from any source. Data access technology, such as database drivers or document converters, helps to make this step as easy and efficient as possible so you can spend your time using the data – not just trying to find it.
Why is it important? The data that organizations need can exist in many sources – spreadsheets, text files, databases, emails, business applications, web pages, social media feeds and streaming data from the IoT. Without a good way to access data from these sources, collecting the information becomes a nightmare. Though it’s often a forgotten element of data management, good data access technology is essential for organizations to extract useful data from any type of storage mechanism or format that’s available. Without it, trying to get the data you need is like walking into a vast, sprawling library with row after row of bookshelves and being told to look for a specific printed sentence with no instructions, no map, no organization and no one to help you.
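As a minimal sketch of uniform data access, the Python example below reads customer records from two different kinds of sources, a CSV text file and a SQLite database, and returns both in the same list-of-dictionaries shape. The sample contents, table name and column names are invented for illustration; a real deployment would use whatever drivers its own sources require.

```python
import csv
import io
import sqlite3


def read_csv_rows(text):
    """Read rows from CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def read_db_rows(conn, query):
    """Run a query and return its rows as a list of dictionaries."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical source 1: a spreadsheet export held as CSV text.
csv_text = "name,city\nAda,Phoenix\nGrace,Tucson\n"

# Hypothetical source 2: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES ('Linus', 'Mesa')")

# Both sources now look the same to the code that consumes them.
all_rows = read_csv_rows(csv_text) + read_db_rows(
    conn, "SELECT name, city FROM customers"
)
for row in all_rows:
    print(row["name"], row["city"])
```

Once every source is read into one common shape, the rest of the pipeline no longer needs to know where a record came from.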

Data integration

What is it? Once you’ve accessed the data you need, what do you do with it? A pretty common next step is to combine it with other data to present unified results. Data integration is the process that defines the steps to do this, and data integration tools help you design and automate the steps that do this work. The most common types of data integration tools are known as ETL, which stands for extract, transform and load, and ELT, which stands for extract, load and transform. Today, data integration isn’t limited to movements between databases. With the availability of in-memory servers, you might be loading data straight into memory, which bypasses the traditional database altogether.
Why is it important? Data integration allows organizations to create blended combinations of data that are ultimately more useful for making decisions. For example, one set of data might include a list of all customer names and their addresses. Another set of data might be a list of online activity and customer names. By itself, each set of data is relevant and can tell you something important. But when you integrate elements of both data sets, you can start to answer questions like, “Who are my best customers?” and “What’s the next best offer?” By combining key information from each set of data, you can create the best possible customer experience.
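The customer example above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the two data sets, their field names and the use of page views as a stand-in for "best customer" are all hypothetical.

```python
# Hypothetical data set 1: customer names and addresses.
customers = [
    {"name": "Ada", "address": "123 Main St"},
    {"name": "Grace", "address": "9 Elm Ave"},
]

# Hypothetical data set 2: online activity keyed by customer name.
activity = [
    {"name": "Ada", "page_views": 42},
    {"name": "Ada", "page_views": 8},
    {"name": "Grace", "page_views": 5},
]


def integrate(customers, activity):
    """Total the activity per customer, then attach it to each customer row."""
    totals = {}
    for event in activity:
        totals[event["name"]] = totals.get(event["name"], 0) + event["page_views"]
    return [
        {**customer, "total_page_views": totals.get(customer["name"], 0)}
        for customer in customers
    ]


unified = integrate(customers, activity)

# A question neither data set could answer alone: who is most active, and where?
best = max(unified, key=lambda row: row["total_page_views"])
print(best["name"], best["address"], best["total_page_views"])
```

Joining on a name is a simplification; real integration jobs usually match on a more reliable key, such as a customer ID.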

Data quality

What is it? Data quality is the practice of making sure data is accurate and usable for its intended purpose. Just like ISO 9000 quality management in manufacturing, data quality should be applied at every step of a data management process. This starts the moment data is accessed, continues through each integration point with other data, and extends to the point just before the data is published, reported on or referenced at another destination.
Why is it important? It’s easy to store data, but what is the value of that data if it’s incorrect or unusable? A simple example is a file with the text “123 MAIN ST Anytown, AZ 12345” in it. Any computer can store this information and provide it to a user – but without help, it can’t determine that this record is an address, which part of the address is the state, or whether mail sent to the address will get there. Correcting a simple, single record manually is not hard. But just try to perform this process for hundreds, thousands or millions of records! It’s much faster to use a data quality solution that can standardize, parse and verify in an automated, consistent way. By doing this at every step, you can eliminate risks like sending customer mail to an incorrect address.
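To make the address example concrete, here is a rough Python sketch of automated parsing and standardization. The pattern, the short list of street suffixes and the tiny set of known states are assumptions chosen for illustration; a real data quality tool relies on far richer reference data, and checking whether mail will actually arrive requires an external postal database that this sketch does not attempt.

```python
import re

# Simplified pattern: number, street name, suffix, city, state, five-digit ZIP.
ADDRESS_PATTERN = re.compile(
    r"^(?P<street>\d+\s+.+?\s+(?:ST|AVE|RD|BLVD))\s+"
    r"(?P<city>[^,]+),\s*(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})$",
    re.IGNORECASE,
)

# Illustrative subset of valid state codes, not a complete list.
KNOWN_STATES = {"AZ", "CA", "NY"}


def parse_address(text):
    """Parse and standardize one address, or return None if it fails the checks."""
    match = ADDRESS_PATTERN.match(text.strip())
    if match is None:
        return None
    state = match.group("state").upper()
    if state not in KNOWN_STATES:
        return None
    return {
        "street": match.group("street").title(),
        "city": match.group("city").strip().title(),
        "state": state,
        "zip": match.group("zip"),
    }


print(parse_address("123 MAIN ST Anytown, AZ 12345"))
print(parse_address("not an address"))
```

The same function can be applied to every record in a file, which is what makes the automated approach consistent where manual correction is not.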

Data governance

What is it? Data governance is a framework of people, policies, processes and technologies that define how you manage your organization’s data. It’s one way to make sure your data strategy aligns with your business strategy.
Why is it important? Data governance is usually driven by the need to comply with regulations such as the Sarbanes-Oxley Act or the GDPR. It starts by asking general business questions and developing policies around the answers: How does your organization use its data? What constraints do you have to work around? Who has responsibility over the data? How should different data types be defined? Once you know the answers to these questions, you can define rules to enforce them and develop a data catalog to define terms. Let’s say you need to define what data users can access, what is considered "customer" data, which users can change the data (versus simply viewing it), and how to handle exceptions to the policies. Data governance tools help to control and manage those policies, trace how they’re handled and deliver reports for auditing purposes. And, similar to data quality, you can create data governance dashboards to help monitor the level of adherence to these policies.
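One way to picture how such policies become enforceable rules is the small Python sketch below. The role names, the policy table and the audit log structure are hypothetical; the sketch does not represent any particular governance product.

```python
# Hypothetical policy: which roles may view or change each category of data.
POLICY = {
    "customer": {"view": {"analyst", "steward"}, "change": {"steward"}},
}

# Every decision is recorded so it can be reported on later.
audit_log = []


def is_allowed(role, action, category):
    """Check a request against the policy and record the decision for auditing."""
    allowed = role in POLICY.get(category, {}).get(action, set())
    audit_log.append(
        {"role": role, "action": action, "category": category, "allowed": allowed}
    )
    return allowed


print(is_allowed("analyst", "view", "customer"))
print(is_allowed("analyst", "change", "customer"))

# A dashboard-style metric: the number of requests that were denied.
denied = sum(1 for entry in audit_log if not entry["allowed"])
print(f"{denied} of {len(audit_log)} requests were denied")
```

Keeping the policy in one table and logging every decision is what lets the same rules drive both enforcement and audit reporting.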

Data federation

What is it? Data federation is a special kind of data integration. The ETL and ELT types of data integration combine data and then store it elsewhere for use. In the past, it was stored within a data mart or a data warehouse – and more recently, in a data lake. But what if you simply want to look at the combined results without needing to move and store it beforehand? Data federation provides the capacity to do just that, so you can access the combined data at exactly the time it’s requested.
Why is it important? While many ETL and ELT data integration tools can run fast, their results only represent a snapshot of what happened at a certain point in time (that is, when the process ran). With data federation, a result is generated based on what the sources of data look like at the time of the request. This allows for a timelier and potentially more accurate view of information. Data federation helps in other ways, too. When you use it to reference and orchestrate data sources without copying or moving the data, you can process the data where it lives, avoiding unnecessary data transfers. Consider how data federation helps address a central tenet of the GDPR known as “privacy by design.” Data federation capabilities enable you to provide centralized and role-based access to sensitive data. Security and data masking – such as hashing, randomization and encryption – provide additional protection.
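The difference between a stored snapshot and a federated result can be sketched as follows. The two in-memory "sources", the role names and the truncated SHA-256 masking are assumptions made for illustration; in practice the sources would be separate live systems.

```python
import hashlib

# Hypothetical live sources; in practice these would be separate systems.
crm = {"c1": {"name": "Ada", "email": "ada@example.com"}}
orders = {"c1": [120.0, 35.5]}


def mask(value):
    """Hash a sensitive value so it can still be matched but not read."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def federated_view(customer_id, role):
    """Combine both sources at request time; nothing is copied or stored."""
    person = crm[customer_id]
    # Role-based access: only a data steward sees the real email address.
    email = person["email"] if role == "steward" else mask(person["email"])
    return {
        "name": person["name"],
        "email": email,
        "order_total": sum(orders.get(customer_id, [])),
    }


before = federated_view("c1", "analyst")
orders["c1"].append(10.0)  # the source changes after the first request
after = federated_view("c1", "analyst")
print(before["order_total"], after["order_total"])
```

Because the view is computed on each request, the second call reflects the new order without any reload step, and the analyst role never receives the unmasked email.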

Master data management

What is it? Master data management (MDM) is a set of processes and technologies that defines, unifies and manages all of the data that is common and essential to all areas of an organization. This master data is typically managed from a single location, often called a master data management hub. The hub acts as a common access point for publishing and sharing this critical data throughout the organization in a consistent manner.
Why is it important? Simple: It ensures that different users are not using different versions of the organization’s common, essential data. Without MDM, a customer who buys insurance from an insurer might continue to receive marketing solicitations to buy insurance from the same insurer. This happens, for example, when the customer relationship database and the marketing database aren’t linked together. That leads to two different records of the same person – and a confused, irritated customer.

With master data management, all organizational systems and data sources can be linked together and managed consistently on an ongoing basis. This ensures that any master data used by the organization is always consistent and accurate. In the big data world, MDM can also automate the use of certain data sources and help determine which types of analytical models to apply, what context to apply them in, and the best visualization techniques for your data.
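The insurance scenario can be illustrated with a small Python sketch of a master data hub. The records, the field names and the choice of a normalized email address as the matching key are hypothetical; real MDM tools use more sophisticated matching and survivorship rules.

```python
# Hypothetical records for the same person held in two separate systems.
crm_records = [
    {"email": "Ada@Example.com", "name": "Ada Lovelace", "has_policy": True}
]
marketing_records = [
    {"email": "ada@example.com ", "name": "A. Lovelace", "opted_in": True}
]


def match_key(record):
    """Normalize the email so both systems produce the same key."""
    return record["email"].strip().lower()


def build_hub(*sources):
    """Merge records that share a key into one master record per customer."""
    hub = {}
    for source in sources:
        for record in source:
            master = hub.setdefault(match_key(record), {})
            for field, value in record.items():
                master.setdefault(field, value)  # first source wins on conflicts
            master["email"] = match_key(record)
    return hub


hub = build_hub(crm_records, marketing_records)
master = hub["ada@example.com"]

# Marketing can now see that this customer already holds a policy.
should_solicit = master["opted_in"] and not master["has_policy"]
print(len(hub), master["name"], should_solicit)
```

With one linked record instead of two, the marketing system has what it needs to stop soliciting a customer who has already bought.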

Data preparation

What is it? Data preparation is the task of blending, shaping and cleansing data before it’s used in business processes. It involves combining data from various sources, then cleansing and transforming it to get it ready for analytics or other business purposes. Data preparation is often done on a self-service basis so that business users can access and manipulate the data they need without writing code and without overloading IT.
Why is it important? Organizations face enormous volumes of dirty data, coupled with a gap in the skills needed to manage it. Business users can’t get timely access to the data, and data scientists waste valuable time preparing data instead of generating insights. In fact, most data scientists have to spend 50% to 80% of model development time on data preparation alone. Good data preparation tools remove the drudgery of routine data preparation, reveal sparkling clean data and add value along the way. By empowering business users and freeing IT to focus on more strategic initiatives, you can overcome the data prep skills gap and get speedy access to data you can trust.
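As a rough sketch of the routine work that data preparation tools automate, the Python example below trims and standardizes names, converts text to numbers, flags missing values and drops duplicates. The sample rows and field names are invented for illustration.

```python
# Hypothetical raw rows with inconsistent formatting.
raw_rows = [
    {"name": " ada ", "spend": "120.50"},
    {"name": "Ada", "spend": "120.50"},  # duplicate once cleansed
    {"name": "grace", "spend": ""},      # missing value
]


def prepare(rows):
    """Cleanse names, convert spend to a number and remove duplicate rows."""
    seen = set()
    prepared = []
    for row in rows:
        name = row["name"].strip().title()
        # An empty string becomes None so the gap stays visible downstream.
        spend = float(row["spend"]) if row["spend"].strip() else None
        key = (name, spend)
        if key in seen:
            continue
        seen.add(key)
        prepared.append({"name": name, "spend": spend})
    return prepared


clean_rows = prepare(raw_rows)
for row in clean_rows:
    print(row)
```

Each step here is simple on its own; the value of a self-service tool is in applying such steps consistently without requiring business users to write the code.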

Learn to trust your data

Despite recent advances in data management technology and tools, many organizations are still overwhelmed by today's massive volumes and types of fast-moving data. With SAS Data Management, you get data on demand, which helps you make decisions you can trust to run a data-driven business. Be prepared to act on what your data tells you – to achieve competitive advantage, address compliance mandates, boost profitability and keep your customers coming back for more.