White Paper

SAS Best Practices Report: Data Lakes

TDWI Best Practices Report


When designed well, a data lake is an effective data-driven design pattern for capturing a wide range of data types, both old and new, at large scale. By definition, a data lake is optimized for the quick ingestion of raw, detailed source data plus on-the-fly processing of such data for exploration, analytics and operations. Even so, traditional, latent data practices are possible, too.

Organizations are adopting the data lake design pattern (whether on Hadoop or a relational database) because lakes provision the kind of raw data that users need for data exploration and discovery-oriented forms of advanced analytics. A data lake can also be a consolidation point for both new and traditional data, thereby enabling analytics correlations across all data.

To help users prepare, this TDWI Best Practices Report defines data lake types, then discusses their emerging best practices, enabling technologies and real-world applications. The report’s survey quantifies user trends and readiness for data lakes, and the report’s user stories document real-world activities.