Data Management

What it is and why it matters

Data management is the practice of managing data as a valuable resource to unlock its potential for an organization. Managing data effectively requires having a data strategy and reliable methods to access, integrate, cleanse, govern, store and prepare data for AI and analytics.

Data management in the era of AI

As long as businesses have collected data, they’ve had to manage it to avoid the conundrum of “garbage in, garbage out.” Good data management is essential to producing trusted, ethical outputs with as little bias as possible. It’s particularly critical for artificial intelligence and machine learning tasks, and for large language models (LLMs) that are trained on huge data sets and then used to understand and generate human language.

What is modern data management?

Modern data management is coupled with AI and machine learning. As these technologies evolve, the need for data access, data quality and data governance intensifies. In this explainer video, learn how modern data management is about new technologies and operations, such as DataOps and AIOps. You’ll hear why AI and machine learning models require trusted data for organizations to avoid risk, cost and lost productivity – especially in highly regulated industries with strict compliance requirements.

History of data management

Some say the need for data management began in the 1890s with mechanical punch cards that recorded information (data) on a thick card. But the concept of data management wasn’t widely discussed until the 1960s, when the Association of Data Processing Service Organizations (ADAPSO) began providing data management advice for professionals.

Data management systems as we know them today weren’t common until the 1970s. These data management systems were strictly operational. They provided records (reports) of business operations at a given point in time, pulled from a relational database that stored information in rows and columns (typically a data warehouse). Some of the common processes and technologies related to data management include:

  • Batch processing and extract, transform, load (ETL).
  • Structured query language (SQL) and relational database management systems (RDBMSs).
  • Not-only SQL (NoSQL) and nonrelational databases.
  • Enterprise data warehouses, data lakes and data fabrics.
  • Data federation and virtualization.
  • Data catalogs, metadata management and data lineage.
  • Cloud computing and event stream processing (data streaming).

Today, business and IT functions can collaborate to optimize the way data is managed before it’s used with AI and generative AI (GenAI). Data engineers and analysts, for example, work with data scientists to manage and extract value from data.

Strong data focus: The foundation for student and university success

Like many others, the University of North Texas (UNT) was data rich and insight poor. It had fundamental issues with data integrity, data management and data governance – and with data relegated to silos, enterprise analytics was difficult. Learn how deploying data management software from SAS brought about a seismic shift in analytics capabilities at UNT – resulting in better student outcomes and tremendous savings.

Data management in today's world

Taking charge of your data requires tackling a wide range of data management concepts, technologies and processes. Learn from data experts what it takes to master this approach.

The future of data and AI

Rapid innovation, like AI, requires a solid data quality strategy. Learn how data quality can position organizations to succeed in their AI endeavors.

The data management path to AI

Maximize the business value of AI with modern data management. Discover how to move your organization forward instead of spending so much time asking questions about the data.

Self-service data preparation

Imagine the results if business users could prep data for analytics without relying on IT – no coding or special skills required. SAS® Data Preparation lets business users access, cleanse, profile and transform data on their own.

Generative AI and data management

Data management tools are essential to feeding LLMs with high-quality data and prompts – data that is auditable and traceable. With robust data protection measures like data minimization, anonymization and encryption, these tools protect user privacy and security.

Who's using data management?

Data management powers the processes for successful organizations across all industries. With more data and easier access to analytics comes the chance to seize more opportunities, ask more questions and solve more problems. Learn how global industries are using data management to support their goals.

Banking

More than ever, issues around data privacy, compliance and digitization require banks to have a trusted data foundation. Only with a complete, integrated view of all their data – and sound techniques for quality, governance and personal data protection – can banks gain customers’ trust and pursue forward-looking digital transformation efforts.

Health care

Enterprise data management is a must-have in the health care industry. The industry relies on being able to integrate data from all formats and sources – including external data – all while spotting duplicate data, fixing data quality issues, and adhering to strict regulatory and compliance requirements for protecting personal data and privacy.

Insurance

Insurers juggle massive amounts of data every day – including data from insurance quotes, policies, claims, customers and connected IoT devices. Building good actuarial models and making informed decisions about pricing, reserving, payments and more depends on having reliable data management capabilities to appropriately integrate, cleanse and govern insurance data.

Manufacturing

In the manufacturing industry, nothing speaks success like quality. With solid data management and data quality technologies, manufacturers can efficiently manage product inventory, and integrate structured and unstructured data from all sources to get an enterprise view of performance, drive better outcomes and make well-informed business decisions.

Public sector

Local and national governments are responsible for a vast range of services and programs. Reliable data management technologies support all those efforts – from fighting fraud and improper payments to ensuring citizen safety to overseeing population health outcomes, economic development and smart city initiatives.

Retail and consumer goods

Understanding customer experience and responding appropriately to expectations requires an accurate, up-to-date view of all the data – whether it’s streaming, cloud-based, or stored in a data lake or warehouse. From marketing to merchandising to sales, trusted data management is essential to taking charge of retail data.

“Data management needs AI and machine learning, and just as important, AI/ML needs data management. As of now, the two are connected, with the path to successful AI intrinsically linked to modern data management practices.”
Dan Soceanu, Senior Product Manager for AI and Data Management, SAS

How data management works

As volumes, types and sources of data soar, the need to process data in real time expands – and the urgency to manage data well remains a top priority for business success. Dig into some of the core data management technologies.


Augmented data management

This approach uses artificial intelligence or machine learning techniques to make processes like data quality, metadata management and data integration self-configuring and self-tuning.

For example, augmented data management can:

  • Generate a list of suggestions for how to improve data. Actions taken over time will continue to improve results.
  • Profile data and automatically find personal information, which can be flagged to influence behavior – such as only allowing specified users to access personal data in a table.
  • Suggest data transformations, then suggest improvements over time using machine learning – done via a discovery engine that analyzes data and metadata.
  • Provide recommendations to users and suggest next-best actions during the data preparation process.
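
The profiling example above, finding personal information and then restricting who can see it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patterns, the 80% threshold and the function names `profile_columns` and `visible_columns` are hypothetical and do not come from any SAS product. Real profilers rely on far richer detection logic than two regular expressions.

```python
import re

# Hypothetical detection patterns (assumption, for illustration only).
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}$"),
}


def profile_columns(rows, threshold=0.8):
    """Return {column: pii_type} for columns whose non-empty values
    mostly match one of the patterns above."""
    flagged = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r.get(col) not in (None, "")]
        if not values:
            continue
        for pii_type, pattern in PII_PATTERNS.items():
            share = sum(bool(pattern.match(v)) for v in values) / len(values)
            if share >= threshold:
                flagged[col] = pii_type
                break
    return flagged


def visible_columns(rows, flagged, user_can_see_pii):
    """Hide flagged columns from users who lack permission."""
    if user_can_see_pii:
        return rows
    return [{k: v for k, v in r.items() if k not in flagged} for r in rows]


customers = [
    {"id": 1, "contact": "ana@example.com", "city": "Cary"},
    {"id": 2, "contact": "li@example.org", "city": "Austin"},
]
flags = profile_columns(customers)
restricted = visible_columns(customers, flags, user_can_see_pii=False)
```

Here `flags` marks the `contact` column as an email field, and `restricted` is the same table with that column removed for a user without access to personal data.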

More about how data management works

Data management for AI and machine learning (ML)

Many business processes rely on AI, which is the science of training systems to emulate human tasks through learning and automation. For example, AI and ML techniques are often used in making loan and credit decisions, medical diagnoses and retail offers. With AI and ML, it’s more important than ever to have well-managed data that you understand and trust – because if bad data feeds algorithms that adapt based on what they learn, mistakes can multiply quickly.

Data management for the IoT

The data that gushes from sensors embedded in Internet of Things (IoT) devices is often referred to as streaming data. Data streaming, or event stream processing, involves analyzing real-time data on the fly. This is accomplished by applying logic to the data, recognizing patterns in the data and filtering it for multiple uses as it flows into an organization. Fraud detection, network monitoring, e-commerce and risk management are popular applications for these techniques.
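
To make "applying logic, recognizing patterns and filtering" more concrete, here is a minimal Python sketch of processing events as they arrive. It assumes a hypothetical stream of `(sensor_id, value)` readings and a made-up alert rule (a rolling average above a limit). It is not how any particular event stream processing engine works, only the general shape of the idea.

```python
from collections import deque


def detect_spikes(readings, window=3, limit=80.0):
    """Yield (sensor_id, rolling_average) each time a sensor's average
    over its last `window` valid readings exceeds `limit`."""
    recent = {}
    for sensor_id, value in readings:
        if value is None:
            # Filtering: drop malformed events before they reach the logic.
            continue
        buf = recent.setdefault(sensor_id, deque(maxlen=window))
        buf.append(value)
        if len(buf) == window:
            avg = sum(buf) / window
            if avg > limit:
                # Pattern recognized: a sustained high reading.
                yield sensor_id, avg


# Hypothetical readings; in practice these would arrive continuously.
stream = [
    ("a", 70.0), ("b", 20.0), ("a", 85.0), ("a", None),
    ("a", 91.0), ("b", 22.0), ("a", 95.0),
]
alerts = list(detect_spikes(stream))
```

Because `detect_spikes` is a generator, each event is evaluated as it flows in and only a small window per sensor is kept in memory. With the sample readings, sensor `a` triggers two alerts and sensor `b` triggers none.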

Bidirectional metadata management

Bidirectional metadata management shares and connects metadata between different systems. SAS, for example, is committed to being part of the open metadata community through its involvement in the ODPi Egeria project – which underscores the need for metadata standards to promote responsible data exchange across varied technology environments.

Data fabric and semantic layer

The term data fabric describes an organization’s diverse data landscape – where vast amounts and types of data are managed, processed, stored and analyzed, using a variety of methods. The semantic layer plays an important role in the data fabric. Like a business glossary, the semantic layer is a way to link data to commonly defined business terms used across the organization.
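
One simple way to picture a semantic layer is as a lookup that translates physical column names into shared business terms. The Python sketch below assumes a hypothetical glossary and made-up system, table and column names. A real semantic layer would live in a catalog or metadata service rather than in a dictionary.

```python
# Hypothetical glossary: business term -> physical locations
# given as (system, table, column). All names are invented.
GLOSSARY = {
    "Customer ID": [
        ("crm", "customers", "cust_id"),
        ("billing", "accounts", "customer_no"),
    ],
    "Annual Revenue": [("finance", "ledger", "rev_yr")],
}


def term_for(system, table, column):
    """Return the business term linked to a physical column, if any."""
    for term, locations in GLOSSARY.items():
        if (system, table, column) in locations:
            return term
    return None


def relabel(system, table, row):
    """Rename a row's physical columns to business terms where defined;
    columns without a glossary entry keep their original name."""
    return {term_for(system, table, col) or col: value for col, value in row.items()}


labeled = relabel("billing", "accounts", {"customer_no": 42, "balance": 10.5})
```

In this sketch, `cust_id` in the CRM system and `customer_no` in billing both resolve to the same term, "Customer ID", which is the kind of common vocabulary the semantic layer is meant to provide.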

Data management and open source

Open source refers to software or infrastructure whose source code is publicly available for use and modification by a community of users. Using open source can speed development efforts and reduce costs. And data professionals can thrive when they can work in the programming language and environment of their choice.

Data federation/virtualization

Data federation is a special kind of virtual data integration that lets you look at combined data from multiple sources without needing to move and store the combined view in a new location. So, you can access combined data exactly when you request it. Unlike ETL and ELT tools that show a snapshot at a point in time, data federation generates results based on what the data sources look like at the time of the request. This gives a timelier and potentially more accurate view of the information.
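
The difference between a point-in-time snapshot and a federated view can be illustrated with a small Python sketch. It assumes two hypothetical sources, `crm` and `billing`, modeled here as separate in-memory SQLite databases. Dedicated federation tools push queries down to the sources far more efficiently than this, so treat it as a picture of the concept only.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database to stand in for a source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two hypothetical source systems with invented tables and values.
crm = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Li")],
)
billing = make_source(
    "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 100.0), (2, 40.0)],
)


def federated_totals():
    """Query both sources at request time and combine the results in memory.
    Nothing is copied to a new storage location."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
    )
    return {names[cid]: total for cid, total in totals if cid in names}


snapshot = federated_totals()  # kept around, like an ETL-style copy
billing.execute("INSERT INTO invoices VALUES (1, 25.0)")  # the source changes
live = federated_totals()  # federated view reflects the change
```

After the new invoice is added, `snapshot` still shows the old total for Ana, while `live` reflects what the sources look like at the time of the request.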


Next steps

Data management solutions

Trusted data leads to trusted AI and analytics – which is important for the success of every business. Our data management solutions include all the capabilities you need to access, integrate, clean, govern and prepare your data for analytics – including advanced analytics like artificial intelligence and machine learning.

SAS® Viya®: Performance, productivity and trust

SAS Viya – a data and AI platform for your entire company – helps you access, manage and govern data to ensure it’s accurate, high-quality and ready for analytics.