Improving data quality
Thanks to its long history, Česká pojišťovna (ČP) has an extensive collection of client data, gathered over many years and across many systems (going back to punch cards). One of ČP's oldest operational systems contains life insurance data for close to 10 million policies and more than 10 million claims.
Some of this data suffered from inconsistencies, duplications and other data quality problems, such as:
- First names and surnames with missing diacritical marks.
- National ID numbers with an invalid suffix, usually four zeros.
- National ID numbers that didn't match the client's gender.
- Names, surnames and titles that were erroneously typed in single fields, rather than their respective fields.
- Obsolete post codes or incomplete addresses.
- Heavily abbreviated names, surnames and addresses.
- Miscellaneous remarks not stored in designated fields.
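Two of the checks listed above, the placeholder suffix and the gender mismatch, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the rules ČP or SAS actually applied: it assumes the national ID has the form YYMMDD followed by a 3- or 4-digit suffix, that women's birth months are stored with 50 added, and that an all-zero suffix is a placeholder. The function names and issue labels are hypothetical.

```python
import re
from typing import List, Optional

# Assumed layout: YYMMDD, optional slash, then a 3- or 4-digit suffix.
_ID_PATTERN = re.compile(r"(\d{2})(\d{2})(\d{2})/?(\d{3,4})")


def gender_from_month(month_field: int) -> Optional[str]:
    """Return 'M' or 'F' from the encoded month, or None if out of range.

    Assumption: men's months are stored unchanged (1-12) and women's
    months are stored with 50 added (51-62). Any later extensions of
    the numbering scheme are deliberately ignored in this sketch.
    """
    if 1 <= month_field <= 12:
        return "M"
    if 51 <= month_field <= 62:
        return "F"
    return None


def check_national_id(national_id: str, recorded_gender: str) -> List[str]:
    """Return a list of hypothetical issue labels for one client record."""
    match = _ID_PATTERN.fullmatch(national_id.strip())
    if match is None:
        return ["malformed"]

    issues = []
    month_field = int(match.group(2))
    suffix = match.group(4)

    # A suffix made up only of zeros is treated as a placeholder.
    if set(suffix) == {"0"}:
        issues.append("placeholder_suffix")

    encoded_gender = gender_from_month(month_field)
    if encoded_gender is None:
        issues.append("invalid_month")
    elif encoded_gender != recorded_gender:
        issues.append("gender_mismatch")

    return issues
```

A record that passes returns an empty list, so the function can be applied row by row and the non-empty results routed to manual review or automated correction.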
The poor data quality had an adverse effect on other systems within ČP. It was difficult to cleanse and deduplicate data after it had been transferred into a central client database. Information on policies didn't always match information on the corresponding claims, and incorrect age or gender information resulted in miscalculated insurance premiums. Moreover, the data inconsistency made it virtually impossible to move data from the existing operational system into a new, modern one.
ČP selected the SAS solution as part of a larger SAS technology offering to improve the quality of its enterprise data. The project consisted of data cleansing and deduplication of ČP's main client database. To prevent future data distortion, ČP plans to establish new unique and custom business rules.
Thanks to SAS DataFlux, ČP cleansed data in its oldest operational system over a four-month period. The data quality initiative achieved better than a 90 percent success rate (defined as the ratio of cleansed or verified records to the total number of records). The database held more than 20 million records in two data sets: policies and claims. Since this data was historical and static, the main focus was not on the performance of the data quality tool, but on the quality of the new information.
ČP is pleased with the outcome of the project. "The collaboration with SAS was excellent. The project ran smoothly and the results are impressive," said Štepán Cábelka, Head of Data Quality, Česká pojišťovna. "Data – the most valuable asset of our company – has been improved. The project was an important milestone on the road to consistent, error-free and reliable information."
Česká pojišťovna needed a solution for its large data set that was filled with inconsistencies, duplications and quality problems.
Česká pojišťovna cleansed the data in its oldest operational system, achieving a success rate of better than 90 percent.
About Česká pojišťovna
Česká pojišťovna is an all-purpose insurance company providing both individual life and non-life insurance, along with insurance for small, medium and large clients in industrial and business segments. Česká pojišťovna (ČP), the leader in the Czech insurance market, belongs to Generali PPF Holding, which serves nearly 31 percent of the market by volume. ČP is the largest insurance company in the country, with more than 9 million active policies.