Why you need a customer state vector

Improve modeling with a cross-departmental view of customer data 

In most organizations, customer data is stored in multiple disparate systems, resulting in separate product-related data stores about each individual customer. In a bank, for example, data about a customer’s credit card history is often stored separately from checking account data, home loan data and car loan data.

On a practical level, this limits your understanding of each customer to a set of discrete touch points. On a technical level, it limits the number of variables that analysts can use to build models for fraud detection, campaign management and customer support programs.

Often, organizations believe they are building an efficient, customer-centric environment when they are, in fact, increasing the costs and complexities of managing and maintaining widespread data. To reduce those complexities, we propose a real-time storage option called the customer state vector (CSV).

What is a customer state vector?
The customer state vector is based on an engineering concept that is popular in the science community. For example, NASA uses a state vector to control the space shuttle during operations. Variables in the shuttle state vector show the present position, velocity components and other factors of the orbital trajectory at snapshots in time. Together, they describe where an orbiting vehicle has been and where it is now, and they let you project where it is going. Vectors are an excellent prediction tool for launch, orbit and landing positions.

Likewise, a customer state vector acts as a central repository for time-based and “customer state” data. This approach to customer-centric data storage gathers data from various systems and stores the entirety of a person’s current interactions with the organization based on integrated, real-time information across all departments and divisions.

In business analytics, state vectors are used for modeling. In the process of building predictive models for fraud detection or marketing, you discover the underlying set of variables that are important for use in those predictive models. Once you’ve built the models, you get a good picture of the data and variables you need to collect. That set of data is labeled a state vector for the models you’re building.

With credit card fraud, for instance, you’re going to need quite a few variables from the credit card side, including purchase history and frequency of use. For fraud in the banking area, you look at size of deposits, average check size, the average number of checks written per week and similar variables. In marketing, you try to build models with all the relevant data available on campaign response rates and on which types of bank services are used together with different banking products.

In most organizations today, those variables for credit card fraud, banking fraud and marketing would all be stored separately. The customer state vector, on the other hand, works across all operational systems and combines the data needed for all the models into one table. Every time a change is made in a customer’s state, a message is sent to the state vector manager detailing what just occurred. The manager uses that information to update the state vector for the enterprise.
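The message-driven update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Event fields, the StateVectorManager class and the variable names are hypothetical, and a dictionary stands in for the single table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """A hypothetical change message sent by an operational system."""
    customer_id: str
    source: str      # illustrative system name, e.g. "card" or "checking"
    variable: str    # name of the state variable that changed
    value: float


class StateVectorManager:
    """Keeps one row of model variables per customer, across all sources."""

    def __init__(self):
        # customer_id -> {"source.variable": latest value}
        self.vectors = defaultdict(dict)

    def handle(self, event: Event) -> dict:
        """Apply one change message and return the customer's updated vector."""
        key = f"{event.source}.{event.variable}"
        self.vectors[event.customer_id][key] = event.value
        return self.vectors[event.customer_id]


manager = StateVectorManager()
manager.handle(Event("c-001", "card", "purchases_last_7d", 12))
vector = manager.handle(Event("c-001", "checking", "avg_check_size", 184.50))
```

Here the card system and the checking system each send their own message, yet both variables end up in the same row for customer c-001. The operational systems only need to emit messages; they do not need to know how the vector is stored.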

All the data you need for modeling
The idea of the state vector is to collect all the data for all the decision making for an entire enterprise in one place. The CSV is not necessarily storing raw data on every check that’s written. Instead, it might include a summarization of some statistics that are built up from that account. Rather than saving every last bit of customer data from operational systems, it only saves data used in the models the organization is running.
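As a rough illustration of storing a summary instead of raw transactions, the sketch below keeps a running count and mean of check amounts. The RunningStat class and the sample amounts are hypothetical; a real CSV would hold whichever statistics its models actually use.

```python
from dataclasses import dataclass


@dataclass
class RunningStat:
    """Running count and mean, updated without keeping raw transactions."""
    count: int = 0
    mean: float = 0.0

    def update(self, amount: float) -> None:
        self.count += 1
        # Incremental mean: shift the old mean toward the new observation.
        self.mean += (amount - self.mean) / self.count


check_size = RunningStat()
for amount in (100.0, 200.0, 300.0):   # made-up check amounts
    check_size.update(amount)
```

After the three updates the vector holds only two numbers for this account, the count and the average check size, no matter how many checks have been written.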

Once the vector is updated, any number of models can be executed against it and, if necessary, the results can be sent back to, say, the charge department to indicate whether or not to accept a charge. The purpose of the CSV is to supply all the variables needed for all the models throughout the bank, so you are ready to score any event as it occurs.
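The scoring step might look something like the following sketch. The two scoring functions are toy rules standing in for fitted models, and the variable names and the 0.7 threshold are assumptions made purely for illustration.

```python
def card_fraud_score(vector: dict) -> float:
    """Toy stand-in for a fraud model; a real model would be fitted to data."""
    score = 0.0
    if vector.get("card.purchases_last_hour", 0) > 5:
        score += 0.5
    # A cross-departmental variable: checking data used in a card model.
    if vector.get("checking.avg_balance", 0.0) < 100.0:
        score += 0.3
    return score


def cross_sell_score(vector: dict) -> float:
    """Toy stand-in for a marketing model."""
    return 0.8 if vector.get("checking.avg_balance", 0.0) > 10_000 else 0.1


# Every model reads from the same customer vector.
MODELS = {"card_fraud": card_fraud_score, "cross_sell": cross_sell_score}


def score_event(vector: dict, fraud_threshold: float = 0.7) -> dict:
    """Run all models on the updated vector and decide on the charge."""
    scores = {name: model(vector) for name, model in MODELS.items()}
    scores["accept_charge"] = scores["card_fraud"] < fraud_threshold
    return scores


result = score_event({"card.purchases_last_hour": 8, "checking.avg_balance": 40.0})
```

In this example the fraud score crosses the assumed threshold, so accept_charge comes back False and that answer could be returned to the charge department. Note that the toy fraud rule draws on a checking variable as well as a card variable, which is only possible because both live in the same vector.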

One of the important things about this approach is that it leaves all the operational systems untouched. Traditionally, so many reports depend on those silos of data that organizations never merge everything together in one giant database, because of the huge amounts of surrounding reporting and verification that go on within the silos.

A 20 percent lift in model performance
The CSV helps the organization break out of its silo approach and bring fraud data together with customer support data across the organization. It also differs from most data warehousing, which is not done in real time: warehouse updates come from a nightly or weekly extract of operational databases. The CSV becomes more of an operational data store that keeps track of the current state of each customer.

Once you’ve accumulated all the data used in all the different models, you could expand the models to make use of more data from that state vector and improve your modeling results. Instead of using just card data, for example, you can use card data and banking deposit data.

When you pull in variables from multiple areas across the organization, you’ll start to see an uplift in results as models are updated. I’d predict a 15 percent to 20 percent lift in model performance.

As expectations for CIOs continue to become more strategic and more business-focused, a state vector project is one way for IT to bring value to the business as a whole. IT leaders aren’t likely to remove the independent silos in their organization, but they can easily gather the data needed from each of those silos and start maintaining one CSV to see what every customer looks like as a whole.

The results in model uplift will be seen across divisions, and operational systems will not be affected. If, like most CIOs, you’re being asked to focus on business gain and long-term strategic plans, consider the benefits of a customer state vector.

Jim Goodnight, CEO, SAS, has been at the company's helm since its incorporation in 1976, overseeing an unbroken chain of revenue growth, a feat almost unheard of in the software industry.