Top 10 priorities for high-performance data warehousing
This quick guide will help you determine your business needs and technology requirements
Big data volumes, increasingly complex analytic workloads, growing user communities, business requirements for real-time operation … these are just a few of the challenges IT faces. The good news is that high-performance data warehousing is helping organizations meet these challenges, achieve speed – and scale as needed.
High-performance data warehousing solutions abound, but there are a few things you need to know to ensure success. Here are the top 10 priorities to consider when implementing a high-performance data warehousing platform:
- Enable new business practices based on high-performance business intelligence (BI), data warehousing (DW), data integration and analytics. This is what high-performance data warehousing is really about. Expect to apply high-performance data warehousing options to more business practices as your organization moves deeper into business analytics (demanding workloads for queries, mining, statistics) and big data (scaling to massive, diverse data sets to discover new facts about the business).
- Make real-time operation your first priority for high-performance data warehousing. Collecting, processing and delivering time-sensitive data is the key enabler of most new applications that businesses are currently clamoring for, including operational BI, operational analytics, just-in-time inventory, facility monitoring, price optimization, workforce management, fraud detection and mobile asset management.
- Make scalability your second priority. On the one hand, you have no choice but to keep pace with growth in data volumes, BI user communities and burgeoning bodies of reports. On the other hand, tapping new data sources – Web data, social media, traditional enterprise applications – can provide richer information for business programs, such as 360-degree views, sentiment analysis, operational efficiency, website visitor analysis and customer relationships beyond the usual channels.
- Hardware: Use it, but don't abuse it. There's no doubt that servers, networks and storage are key components of any performance strategy, but hurling hardware at performance problems raises the cost of high performance. It also dulls a team's optimization expertise for software and data. Balance your reliance on hardware with software optimization skills and well-performing designs for queries, reports, data models and ETL jobs.
- Select database platforms and analytics tools that are designed for high performance. There are many types to consider, including analytics DBMSs, columnar databases, appliances and other engineered systems, as well as Hadoop with MapReduce and other non-SQL databases. While the data structures and workloads of these tools and platforms will deliver performance gains out of the box, you should still expect to perform some development work to remodel and tweak data processing for additional gains.
- Rely on specialized platform and tool functionality for certain performance gains. For example, in-database analytics assists the overall performance of non-query-based analyses by alleviating the need to move analytics data before analysis. In-memory processing decreases query response time and analytics rescore time by reducing disk I/O. Columnar data stores accelerate column-oriented queries by collating the data of table columns in physical storage.
- Consider the many new architectures that boost performance. If your EDW is still on an SMP platform, make migration to MPP a priority. Consider distributing your data warehouse architecture, especially to offload a workload to a standalone platform that performs well with that workload. When possible, take analytics algorithms to the data, instead of moving data to the algorithm (as is the DW tradition); this new paradigm is seen with in-database analytics, Hadoop with MapReduce layered over it, and gate-array processing in some storage platforms and appliances. Hadoop with MapReduce is an alternative MPP architecture that fits some analytics workloads quite well.
- Keep your performance optimization skills sharp and current. Ample hardware and fast, scalable software can automatically provide good performance for many situations. However, it's inevitable that some queries, reports, data models, ETL logic and analytics algorithms will need tweaking and tuning before they achieve the desired performance level. So maintain your optimization skills, especially for SQL tuning and data model tweaking. Get optimization training, if needed.
- Design and develop with high performance in mind. Most teams have standards for the look and feel of reports, approaches to modeling warehouse data, preferred interfaces for specific data sources or targets and the style of handwritten code. Whoever determines and enforces these standards should ensure that they also foster high performance. Of course, the performance of any new development work should be tested during peer review and quality assurance processes.
- Develop and apply a technology strategy for high-performance data warehousing. No single approach to attaining high-performance data warehousing is adequate for all situations.
Excerpted from the TDWI Best Practices Report High-Performance Data Warehousing, Q4 2012.
©2012 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. Reprinted with permission. Visit tdwi.org for more information.
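As an illustration only (it is not part of the TDWI excerpt), the idea of taking analytics to the data rather than moving data to the algorithm can be sketched in a few lines of Python. The sketch assumes a hypothetical `sales` table with made-up values, and uses SQLite from the Python standard library as a stand-in for a warehouse platform. It contrasts pulling every row out for client-side processing with letting the database compute the aggregate itself.

```python
import sqlite3

# Hypothetical in-memory table standing in for warehouse data (illustrative values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("east", 80.0), ("west", 200.0), ("west", 100.0)],
)


def average_by_moving_data(conn):
    """Pull every row to the client, then aggregate in application code."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        count, total = totals.get(region, (0, 0.0))
        totals[region] = (count + 1, total + amount)
    return {region: total / count for region, (count, total) in totals.items()}


def average_in_database(conn):
    """Let the database aggregate; only one row per region leaves it."""
    rows = conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region")
    return dict(rows)


moved = average_by_moving_data(conn)
in_db = average_in_database(conn)
```

Both functions return the same averages. The difference is how much data crosses the boundary between the database and the application: every row in the first case, one row per region in the second. On a real warehouse that gap grows with data volume.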
Four requirements for high-performance success
- Up-to-date hardware platform components, especially CPUs, memory and storage.
- Enterprise software platforms and tools designed specifically for demanding data warehousing and analytics applications.
- Technical users’ global architectures for data and team standards for BI development, especially when governing data models, SQL coding, ETL logic and analytic algorithms.
- Tactical tweaking and tuning on the local level, as required by reports, data structures, analytic algorithms, or deficient tools and platforms.
SAS® helps IT go real time
To incorporate real-time analytical insights into business processes and systems, IT must provide a scalable, integrated high-performance infrastructure.
- Data management and governance for effective big data analytics, including the ability to process and analyze the entire data set or a targeted data set.
- Several distributed processing options: SAS In-Memory Analytics; SAS In-Database; and SAS Grid Computing.
- Easy access, processing, visualization and analysis of data stored in Hadoop. SAS simplifies and augments Hadoop to ensure it meets expectations.
Read more about high-performance analytics.