A data warehouse (or enterprise data warehouse) stores large amounts of data that has been collected and integrated from multiple sources. Because organizations depend on this data for analytics and reporting, the data needs to be consistently formatted and easily accessible – two qualities that define data warehousing and make it essential to today’s businesses.
History of the data warehouse
As data began to proliferate in the 1970s and '80s, organizations needed a way to store and access all their information. Computer scientist Bill Inmon, the father of data warehousing, began to define the concept in the 1970s and is credited with coining the term “data warehouse.” He published Building the Data Warehouse, lauded as a fundamental source on data warehousing technology, in 1992. Inmon’s definition of the data warehouse takes a “top-down” approach, where a centralized repository is established first and then data marts – which contain specific subsets of data – are created within that repository.
Ralph Kimball, another technology expert who published The Data Warehouse Toolkit in the mid-'90s, took a slightly different view of the data warehousing concept. In his “bottom-up” approach, individual data marts are developed first and integrated together later to create a data warehouse.
Data warehousing remains relevant today – but it continues to evolve as industries change to accommodate more cloud computing and real-time data analytics. One data storage repository that's similar to a data warehouse is a data lake. Data lakes began with disruptive, low-cost technologies like Apache Hadoop. Today, data lakes are often used for unfettered big data that streams in and is stored without processing or building schemas.
E-commerce retailer enhances customer engagement with cloud-based analytics and AI
With a rapidly growing business and an increasingly dispersed workforce, 1-800-FLOWERS.COM turned to SAS® Viya® hosted on Azure to obtain a more flexible, scalable infrastructure. To get data ready for analytics, the company first consolidates its databases and feeds them into Snowflake, a cloud-based data warehouse.
Data warehousing in today’s world
A data warehouse often means the difference between informed decisions and data chaos. Learn how and why data warehouses and related technologies are being used in our world today.
Who's using data warehouses?
How a data warehouse works
A data warehouse starts with the data itself, which is collected and integrated from both internal and external sources. Business users access this standardized data in a warehouse so they can use it for analytics and reporting. Business intelligence tools help them explore the data to make better-informed business decisions.
Data is typically stored in a data warehouse through an extract, transform and load (ETL) process. The information is extracted from the source, transformed into high-quality data and then loaded into the warehouse. Businesses perform this process on a regular basis to keep data updated and prepared for the next step.
When an organization is ready to use its data for analytics or reporting, the focus shifts from data warehousing to business intelligence (BI) tools. BI technologies like visual analytics and data exploration help organizations glean important insights from their business data. On the back end, it’s important to understand how the data warehouse architecture organizes data and how the database execution model optimizes queries – so developers can write data applications with reasonably high performance.
In addition to a traditional data warehouse and ETL process, many organizations use a variety of other methods, tools and techniques for their workloads. For example:
- Data pipelines can be used to populate cloud data warehouses, which can be fully managed by the organization or by the cloud provider.
- Continuously streaming data can be stored in a cloud data warehouse.
- A centralized data catalog is helpful in uniting metadata, making it easier to find data and track its lineage.
- Data warehouse automation tools get new data into warehouses faster.
- Data virtualization solutions create a logical data warehouse so users can view the data from their choice of tools.
- Online analytical processing (OLAP) is a way of representing data that has been summarized into multidimensional views and hierarchies. When used with an integrated ETL process, it allows business users to get reports without IT assistance.
- An operational data store (ODS) holds a subset of near-real-time data that’s used for operational reporting or notifications.
Why are data warehouses important?
Enterprise data warehouses are vital because they integrate and store – in a central database and standard format – all the valuable data organizations use for enterprise decisioning. In turn, organizations can avoid the unpredictable results of taking an ad hoc approach to data access and integration. Data warehouses:
- Maintain historical data records – storing months or even years’ worth of information.
- Keep data secure by storing it in a single location where only those who need specific data are granted access to it.
- Provide easy access to high-quality data, which enables faster, more-informed business decisions.
- Make big data available for basic reporting as well as for advanced analytics, like machine learning and natural language processing.
Comparison: Data warehouse, data mart and data lake