How to improve data prep for analytics: TDWI shares best practices

By Cindy Turner, Insights editor

Could increasing self-service data preparation reduce the time it takes to gain business insight? According to a survey from TDWI, 81 percent of organizations are banking on it. Nearly as many (76 percent) hope it will increase data-driven decision making.

TDWI shares these survey results along with conclusions about how to improve data prep for analytics in a Q3 2016 best practices report. Responses to the survey came from more than 400 professionals with different roles and levels of tool and platform expertise.

Organizations cannot expect manual data preparation processes to scale, and a lack of coordination will become a problem as the big data tsunami hits.

Across the board, organizations want to improve data prep for analytics

More than a third (37 percent) of TDWI’s survey participants said they were dissatisfied with their ability to easily find relevant data and understand how to use it appropriately for business intelligence (BI) and analytics. More specifically, participants said the top data-related barrier to improving how data is prepared for user BI and analytics projects was difficulty accessing and integrating data across system or application data silos. They also cited a closely related barrier: the speed and agility of existing ETL and data integration processes. Other top issues were insufficient inbound data quality and difficulty integrating data preparation with BI and analytics tools.

How can self-service data preparation help?

Organizations want to harness the power of analytics to make better decisions, and they want to do it as quickly as possible. But analytics can only be as good as the data. Getting good results from your data means doing a good job of preparing the data for analytics – that is, blending, integrating, cleansing, transforming, governing and defining the metadata of multiple data sources (including raw big data in Hadoop). This process has historically been slow, difficult and tedious.

Most organizations TDWI surveyed want to help business users and analysts do more on their own to serve their BI, data discovery and analytics needs without IT hand-holding. Many new, smarter self-service tools automate processes so users won’t need to do as much manual work to find the right data, cleanse it and transform it. Through self-service data preparation, users can become less dependent on IT and data specialists, a boon to everyone, provided IT is willing to give up a certain amount of control. Ideally, business users and IT will work together to ensure their self-service data preparation processes make users more productive without increasing data chaos or duplicating work.

Data prep – the underlying challenges

You don’t have to look far to see why “data chaos” is a pain point at many organizations. Spreadsheets are a big culprit – they’re the most commonly used tools for data access, queries, reporting, analysis, presentation and sharing. Even at organizations where there’s a data warehouse, spreadsheets are where many users continue to do their data preparation.

TDWI found that users tire of waiting for new data to become available in the data warehouse, and they don’t want to wait for long IT processes to complete. Instead, they copy and paste data into spreadsheets on their own and try to cleanse and prepare it there for personal or departmental use.

Spreadsheets are certainly an affordable way to view data, do calculations, create graphs and perform some data analysis. But spreadsheets are fraught with errors – many related to the manual, ad hoc nature of how they’re used, the lack of documentation around them and the number of disparate spreadsheets most organizations have.

Best practices – how to improve data prep for analytics

Data variety and velocity are still on the rise, and improving self-service data preparation is a viable way to speed time to insight from all that data. Consider that newer big data sources require integrating semistructured and unstructured data – like customer behavior data, machine or sensor data, log file data, geolocation data and feeds from external sources. By giving business users better self-service data prep techniques, organizations can position themselves to quickly analyze diverse data sources and variables while spotting trends, patterns and correlations.

Best practices that TDWI recommends for preparing data for analytics include:

  • Make shortening the time to business insight a priority for data preparation improvements. Put this at the top of your list. Apply new data preparation technologies and methods that trim delays in getting users from data to insight.
  • Focus on reducing how long data preparation takes to deliver valuable data. Evaluate your current data preparation procedures to get rid of unnecessary routines. And develop strategies that rely on automation and standardized processes for incorporating and integrating new data. Then, once it’s prepared, register the data in a data catalog so it can be reused by others.
  • Use new technologies and methods to achieve higher levels of repeatability. Say no to one-off data prep processes that will likely have to be redone each time there’s a new requirement or new data. Evaluate how you can apply technologies and adjust processes so that you can reuse scripts, workflows and other elements for different situations. And adopt a collaboration framework that will encourage people to share repeatable methods, scripts and workflows.
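The repeatability and data catalog recommendations can be illustrated with a small sketch. The Python example below assumes a hypothetical in-memory setup: the step functions (`trim_whitespace`, `drop_missing`, `dedupe`), the `Workflow` class and the `Catalog` class are illustrative names, not part of TDWI's report or any specific product. It defines cleansing steps once, chains them into a named workflow that can be rerun whenever new data arrives, and records what was done in a toy catalog entry so others can find and reuse the prepared data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for this sketch: a record is a dict of column -> value.
Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def trim_whitespace(rows: List[Record]) -> List[Record]:
    """Strip leading/trailing spaces from every string value."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in rows]


def drop_missing(required: List[str]) -> Step:
    """Build a reusable step that removes rows with an empty required field."""
    def step(rows: List[Record]) -> List[Record]:
        return [r for r in rows if all(r.get(c) for c in required)]
    step.__name__ = f"drop_missing({','.join(required)})"
    return step


def dedupe(key: str) -> Step:
    """Build a reusable step that keeps the first row seen for each key value."""
    def step(rows: List[Record]) -> List[Record]:
        seen, out = set(), []
        for r in rows:
            if r[key] not in seen:
                seen.add(r[key])
                out.append(r)
        return out
    step.__name__ = f"dedupe({key})"
    return step


@dataclass
class Workflow:
    """A named, repeatable sequence of prep steps (hypothetical)."""
    name: str
    steps: List[Step]

    def run(self, rows: List[Record]) -> List[Record]:
        for s in self.steps:
            rows = s(rows)
        return rows


@dataclass
class Catalog:
    """Toy stand-in for a data catalog: records how a dataset was prepared."""
    entries: Dict[str, dict] = field(default_factory=dict)

    def register(self, dataset: str, wf: Workflow, rows: List[Record]) -> None:
        self.entries[dataset] = {
            "workflow": wf.name,
            "steps": [s.__name__ for s in wf.steps],
            "row_count": len(rows),
            "columns": sorted(rows[0].keys()) if rows else [],
        }


# Usage: the same workflow object can be rerun on each new batch of raw data.
raw = [
    {"customer_id": " 101 ", "region": "East "},
    {"customer_id": "101", "region": "East"},
    {"customer_id": "102", "region": ""},
    {"customer_id": "103", "region": " West"},
]
wf = Workflow("customer_cleanup",
              [trim_whitespace, drop_missing(["customer_id", "region"]), dedupe("customer_id")])
clean = wf.run(raw)
catalog = Catalog()
catalog.register("customers_clean", wf, clean)
```

Because each step is an ordinary function, the same steps can be recombined for a different source without being rewritten. A real deployment would rely on a dedicated data preparation tool and data catalog rather than an in-memory dictionary.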

Read all the details and recommendations from TDWI by downloading the best practices report Improving Data Preparation for Business Analytics.