A Case Study to Improve Data Vendor Selection

Rick McGraw
Copyright: © 2014 |Pages: 15
DOI: 10.4018/978-1-4666-4892-0.ch016

All financial services companies use external sources of consumer income data to assess credit risk, develop risk mitigation strategies, and create pre-approved marketing offers. International credit card issuers spend more than a billion dollars on marketing and invest hundreds of millions in product development based on data from third-party vendors. The accuracy of income data is the foundation for their business decisions. The purpose of this engagement was to evaluate third-party data providers by measuring their data across six dimensions: Accuracy, Relevancy, Completeness, Coverage, Effectiveness, and Cost. The data provided by the three vendors for this project was incorrect by more than 20% on 75% of the records. The goal of the project was to demonstrate the ability to improve the accuracy of multi-source income data used in credit card marketing applications.
The Challenge

  • Data has become a commodity – broad availability, similar characteristics;

  • Generally, commodities carry metrics-based standard grades to gauge suitability, but today, data does not;

  • Without an established baseline and score, it is impossible to calculate the quality, and therefore, the unique value and contribution of a specific data source.

The inability to resolve data conflict leads to less effective models and sub-optimal decisions.


Performance Measures

Performance measures were computed for each vendor under each operating scenario, one for each of the six dimensions: Accuracy, Relevancy, Completeness, Coverage, Effectiveness, and Cost. The sections below detail the performance measures and the data analyzed.


Completeness measures the number of records appended by a vendor that have income information. For this particular test, the vendor provided only one element (income), so completeness percentages were very high. In most cases, however, vendors are asked to provide multiple elements, and completeness can be measured per element or at a higher composite level.


Coverage measures the ratio of the number of records with income information appended by a vendor to the total number of records requested. This is a direct measure of the utility of the vendor data. A vendor with low coverage may have very good data, but for very few records.
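
The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a missing income is represented as `None`, and completeness is read as the share of returned records that carry an income value (the chapter reports it as a percentage but does not spell out the denominator).

```python
from typing import Optional, Sequence


def count_filled(appended: Sequence[Optional[float]]) -> int:
    """Count returned records that actually carry an income value."""
    return sum(1 for income in appended if income is not None)


def completeness(appended: Sequence[Optional[float]]) -> float:
    """Assumed definition: filled records / records returned by the vendor."""
    if not appended:
        return 0.0
    return count_filled(appended) / len(appended)


def coverage(appended: Sequence[Optional[float]], requested_count: int) -> float:
    """Filled records / total records requested from the vendor."""
    if requested_count <= 0:
        raise ValueError("requested_count must be positive")
    return count_filled(appended) / requested_count


# Hypothetical example: 10 records requested, 4 returned, 3 with income.
returned = [50_000.0, None, 42_000.0, 61_000.0]
print(completeness(returned))   # share of returned records with income
print(coverage(returned, 10))   # share of requested records with income
```

In this made-up example the vendor looks strong on completeness but weak on coverage, which is the situation the Coverage paragraph describes: good data on very few records.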


Accuracy measures how close the reported income is relative to the true value.

Two of the data vendors provided an income range rather than a specific value for income. Records are considered accurate if the true value is within the income band provided by the vendor.

One vendor did not provide an income band, but instead provided a specific value. For this vendor, each income was matched to the equivalent band used by the other vendors, and the result was treated as if the vendor had reported that band.
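
As a rough sketch of the band-based accuracy check described above: the band edges below are invented for illustration (the chapter does not list the vendors' actual bands), bands are assumed to be half-open intervals `[low, high)`, and the top band is assumed to be open-ended.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Band = Tuple[float, float]

# Hypothetical band edges; the real vendor bands are not given in the text.
BAND_EDGES: List[float] = [0, 25_000, 50_000, 75_000, 100_000]


def band_for(income: float, edges: Sequence[float] = BAND_EDGES) -> Band:
    """Map a point income to the band [low, high) that contains it."""
    i = bisect_right(edges, income) - 1
    if i < 0:
        raise ValueError("income is below the lowest band edge")
    low = edges[i]
    high = edges[i + 1] if i + 1 < len(edges) else float("inf")
    return (low, high)


def is_accurate(true_income: float, band: Band) -> bool:
    """A record is accurate if the true income falls inside the band."""
    low, high = band
    return low <= true_income < high


def accuracy(true_incomes: Sequence[float], bands: Sequence[Band]) -> float:
    """Share of records whose true income lies in the reported band."""
    if not true_incomes:
        return 0.0
    hits = sum(is_accurate(t, b) for t, b in zip(true_incomes, bands))
    return hits / len(true_incomes)


# A vendor reporting the point value 62,000 is treated as reporting
# the band that contains it.
reported_band = band_for(62_000)
print(reported_band)
print(accuracy([70_000, 80_000], [reported_band, reported_band]))
```

Converting the point-value vendor to bands puts all three vendors on the same footing, at the cost of discarding the extra precision that vendor supplied.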


Accuracy measures vendor performance against the bands set up by the vendor. Because the vendor sets up the bands, there is a subjective component to accuracy. For example, if one vendor reports many small bands while a second vendor reports fewer, larger bands, the second vendor may be measured as more accurate. This result is misleading because the first vendor reports more precise results, and this precision is typically desired.

Relevancy removes the vendor component and measures performance against a tolerance range for truth. For vendors reporting income bands, the result is considered relevant if the midpoint of the band is within the tolerance level of the true value.

Relevancy is computed at tolerances of 10% and 20%. For example, a vendor reporting an income that is 15% higher than the true value is considered relevant at 20% tolerance, but not at 10% tolerance.
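
The tolerance test can be written out as a short Python sketch. The function names are hypothetical, the tolerance is assumed to be a fraction of the true income, and the midpoint helper assumes a closed band with finite edges (an open-ended top band has no usable midpoint and would need separate handling).

```python
from typing import Sequence


def band_midpoint(low: float, high: float) -> float:
    """Midpoint of a finite income band, used as the reported value."""
    return (low + high) / 2


def is_relevant(reported: float, true_income: float, tolerance: float) -> bool:
    """True if the reported value is within tolerance of the true income."""
    if true_income <= 0:
        raise ValueError("true_income must be positive")
    return abs(reported - true_income) <= tolerance * true_income


def relevancy(reported: Sequence[float], true_incomes: Sequence[float],
              tolerance: float) -> float:
    """Share of records whose reported value is within tolerance."""
    if not true_incomes:
        return 0.0
    hits = sum(is_relevant(r, t, tolerance)
               for r, t in zip(reported, true_incomes))
    return hits / len(true_incomes)


# The 15% example from the text: true income 50,000, reported 57,500.
print(is_relevant(57_500, 50_000, 0.20))  # within 20% tolerance
print(is_relevant(57_500, 50_000, 0.10))  # outside 10% tolerance

# A banded vendor is scored on the band midpoint.
print(is_relevant(band_midpoint(50_000, 75_000), 60_000, 0.10))
```

Because the tolerance is tied to the true value rather than to vendor-defined bands, the same test applies to every vendor regardless of how wide its bands are.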
