5 data management best practices to help you do data right
By Cindy Turner, SAS Insights Editor
If you’re in the business of pretty much anything, you’ve got some important data hanging out at your company. In fact, you probably have a lot of important data in a lot of different places – internal and external. What you might be lacking are the data management best practices that could help you get to all of that data and take a closer look at it. Doing that just might give you a glimmer of insight that could nudge your business into a brand new market, or send profits soaring beyond all expectations.
But what, and where, IS all the data that’s relevant to your business? Can you access it when you want it? Do you know that it’s accurate, current, clean and complete? Can you easily pull all the data together, no matter what format it’s in or how often it changes?
The big question here: Is your data ready to support business analytics? An often-ignored truth is that before you can do really exciting things with analytics, you need to be able to “do” data first. Data management, that is.
Data management best practices = better analytics
Sure, plenty of companies have done analytics on data that wasn’t really prepared for analytics. Their data might have been incomplete – maybe the company infrastructure couldn’t accommodate some new data format, like unstructured data from text messages. Or maybe they were working from duplicate data, corrupt data or outdated data.
Until those companies find a better way to manage their data, the results of their analytics are going to be somewhat … well, less than optimal. So how difficult is it to manage unfiltered data and get it ready for analytics? Ask a data scientist. Most of them spend 50 to 80 percent of their model development time on data preparation alone.
5 data management best practices to get your data ready for analytics
- Simplify access to traditional and emerging data. More data generally means better predictors, so bigger really is better when it comes to how much data your business analysts and data scientists can get their hands on. With access to more data, it’s easier to quickly determine which data will best predict an outcome. SAS helps by offering an abundance of native data access capabilities that make it easy to work with a variety of data from ever-increasing sources, formats and structures.
- Strengthen the data scientist’s arsenal with advanced analytic techniques. SAS provides sophisticated statistical analysis capabilities inside the ETL flow. For example, frequency analysis helps identify outliers and missing values that can skew other measures, such as the mean and median. Summary statistics help analysts understand the distribution and variance – because data isn’t always normally distributed, as many statistical methods assume. Correlation shows which variables or combinations of variables will be most useful based on the strength of their predictive capability – in light of which variables may influence one another, and to what degree.
- Scrub data to build quality into existing processes. Up to 40 percent of all strategic processes fail because of poor data. With a data quality platform designed around data management best practices, you can incorporate data cleansing right into your data integration flow. Pushing processing down to the database improves performance. Cleansing also removes data that is invalid for the analytic method you’re using, and enriches data via binning (that is, grouping values that were originally in smaller intervals into larger ones).
- Shape data using flexible manipulation techniques. Preparing data for analytics requires merging, transforming, de-normalizing and sometimes aggregating your source data from multiple tables into one very wide table, often called an analytic base table (ABT). SAS simplifies data transposition with intuitive, graphical interfaces for transformations. Plus it lets you use other reshaping transformations like frequency analysis, appending data, partitioning and combining data, and multiple summarization techniques.
- Share metadata across data management and analytics domains. A common metadata layer lets you consistently repeat your data preparation processes. It promotes collaboration, provides lineage information on the data preparation process, and makes it easier to deploy models. You’ll notice better productivity, more accurate models, faster cycle times, more flexibility, and auditable, transparent data.
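To make a few of these practices concrete, here is a minimal sketch in plain Python of the profiling, binning and shaping steps described above. It is not SAS code and does not represent how any SAS product works; the tables, column names and helper functions (frequency, summarize, bin_age, build_abt, pearson) are all hypothetical, invented purely for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Hypothetical source tables (illustrative data, not from the article).
customers = [
    {"id": 1, "age": 34, "region": "east"},
    {"id": 2, "age": None, "region": "west"},
    {"id": 3, "age": 51, "region": "east"},
    {"id": 4, "age": 29, "region": None},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 3, "amount": 80.0},
    {"customer_id": 4, "amount": 15.0},
]

def frequency(rows, column):
    """Frequency analysis: count each distinct value, including missing (None)."""
    return Counter(row[column] for row in rows)

def summarize(values):
    """Summary statistics computed over the non-missing values only."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
    }

def bin_age(age, width=20):
    """Binning: map an age to a fixed-width interval label such as '20-39'."""
    if age is None:
        return "unknown"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def build_abt(customers, orders):
    """Join and aggregate the source tables into one wide row per customer."""
    totals = defaultdict(lambda: {"order_count": 0, "total_spend": 0.0})
    for order in orders:
        agg = totals[order["customer_id"]]
        agg["order_count"] += 1
        agg["total_spend"] += order["amount"]
    # Customers with no orders get the zero-valued default aggregates.
    return [
        {**c, "age_band": bin_age(c["age"]), **totals[c["id"]]}
        for c in customers
    ]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

region_counts = frequency(customers, "region")
age_summary = summarize([c["age"] for c in customers])
abt = build_abt(customers, orders)

# Correlate age with total spend, using only rows where age is present.
complete = [row for row in abt if row["age"] is not None]
r = pearson([row["age"] for row in complete],
            [row["total_spend"] for row in complete])
```

In this toy data, the frequency count surfaces one missing region, the summary reports one missing age, and the analytic base table ends up with one row per customer that combines the raw attributes, a binned age band and aggregated order totals. A real pipeline would typically do the same work in the database or in a data preparation tool rather than in hand-written loops.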
Data: The foundation for decisions
Analytics may be one of the hottest IT topics around these days – it is, undeniably, very sexy technology. But as you dream about the magic of analytics, remember this: Underlying analytics is data. Don’t underestimate how important it is to do your data right.