PAKDD-2007: A Near-Linear Model for the Cross-Selling Problem

Thierry Van de Merckt, Jean-François Chevalier

Source Title: Strategic Advancements in Utilizing Data Mining and Warehousing Technologies: New Concepts and Developments

DOI: 10.4018/978-1-60566-717-1.ch020

Abstract

This chapter presents VADIS Consulting’s solution for the cross-selling problem of the PAKDD_2007 competition. For this competition, the authors have used their in-house developed tool RANK, which automates a lot of important tasks that must be done to provide a good solution for predictive modelling projects. It was for them a way of benchmarking their 3 years of investment effort against other tools and techniques. RANK encodes some important steps of the CRISP-DM methodology: Data Quality Audit, Data Transformation, Modelling, and Evaluation. The authors have used RANK as they would do in a normal project, however with much less access to the business information, and hence the task was quite elementary: they have audited the data quality and found some problems that were further corrected, they have then let RANK build a model by applying its standard recoding, and then applied automatic statistical evaluation for variable selection and pruning. The result was not extremely good in terms of prediction, but the model was extremely stable, which is what the authors were looking for.

Chapter Preview

Applied Methodology

Our methodology for building analytical solutions is based on CRISP-DM. To help our consultants apply this methodology rigorously and consistently, we have developed a platform called RANK that automates some of the major steps of the process, as shown in Figure 1.

Figure 1.

Methodological steps & RANK contribution

Since the PAKDD07 contest provides the data sets, the target, and the data dictionary, the first three steps are not applicable. Hence, the process is the following:

• Audit – Evaluation of the data quality, its consistency, etc.
• Transformation – Preparation of the data for modeling: defining types, binning, recoding, deriving new variables, linearization of the vector space, normalization, etc.
• Modeling – Building the model itself, by choosing the best technique, the set of relevant variables, etc.
• Evaluation – Assessing the model's stability, its statistical relevance, etc., and reviewing the business relevance (this last important step is not applicable to the contest).

The last two steps (Learning and Deployment) are not applicable to the PAKDD07 contest.

RANK helps the analyst with all of these steps.

Audit

The audit analyzes the distribution of the variables and helps spot anomalies. An example is given in the next figure.

This variable indicates the Number of Bureau Enquiries in the last 6 months for Mortgages. The maximum actual value is 97. Special values are:

• 98 = Went to bureau and no match found (new file created)
• 99 = Did not go to bureau

From the data dictionary, the distribution, and the output of RANK, we immediately see that there is a problem with values 98 and 99. Figure 2 shows, for each modality (possibly grouped to form a statistically relevant sample of the data), the total number of cases, the number of clients (target), the equivalent percentages, the index, which shows the target density compared to the total population, and the statistical significance of the modality in relation to the target density. We see that all modalities are significant and that the more often a prospect enquires about mortgages, the higher the chance of selling one. However, in the 8+9+ …+99 group of modalities, the index decreases, which does not make any business sense. This is simply a side effect of coding “no match found” and “did not go to the bureau” as 98 and 99, which groups them with high numbers of enquiries. This has to be corrected.

Figure 2.

Anomaly for 98 & 99 modalities
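
The kind of per-modality table described above can be approximated with a short Python sketch. This is not RANK's implementation, which the chapter does not describe. It assumes the index is the modality's target rate divided by the overall target rate, times 100, and it uses a one-sample z-score against the overall rate as a stand-in for the significance column. The function names, the labels `no_match` and `not_checked`, and the `top_bin` cut-off are hypothetical. The point of the sketch is that the special codes 98 and 99 are set aside before the high values are grouped.

```python
import math
from collections import defaultdict

# Special codes taken from the data dictionary; the labels are hypothetical.
SPECIAL_CODES = {98: "no_match", 99: "not_checked"}


def modality_of(value, top_bin=8):
    """Map a raw value to a modality label, keeping special codes apart."""
    if value in SPECIAL_CODES:
        return SPECIAL_CODES[value]
    if value >= top_bin:
        return f"{top_bin}+"  # group genuine high counts only
    return str(value)


def audit(values, targets, top_bin=8):
    """Return {modality: (cases, target_cases, index, z_score)}.

    index   = 100 * modality target rate / overall target rate (assumed)
    z_score = (modality rate - overall rate) / standard error (assumed test)
    """
    total = len(values)
    overall_rate = sum(targets) / total
    if not 0.0 < overall_rate < 1.0:
        raise ValueError("target must contain both classes")

    counts = defaultdict(lambda: [0, 0])  # modality -> [cases, target cases]
    for value, target in zip(values, targets):
        label = modality_of(value, top_bin)
        counts[label][0] += 1
        counts[label][1] += target

    report = {}
    for label, (n, k) in counts.items():
        rate = k / n
        index = 100.0 * rate / overall_rate
        std_err = math.sqrt(overall_rate * (1.0 - overall_rate) / n)
        report[label] = (n, k, index, (rate - overall_rate) / std_err)
    return report
```

Calling `audit(values, targets)` with a list of raw enquiry counts and a matching list of 0/1 targets yields one row per modality, with 98 and 99 reported separately instead of being absorbed into the top group.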

Another example is the presence of undocumented modalities, such as for MASTERCARD, where the modalities “2” and “1” are not described in the data dictionary. These values must be corrected as well.
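
A check for undocumented modalities can be sketched as a comparison between the observed values and the set listed in the data dictionary. The function name and the sample data below are hypothetical. In particular, the documented codes `"A"` and `"B"` are placeholders and not the contest's actual dictionary entries for MASTERCARD.

```python
from collections import Counter


def undocumented_modalities(observed, documented):
    """Count observed values that the data dictionary does not describe."""
    counts = Counter(observed)
    return {value: n for value, n in counts.items() if value not in documented}


# Placeholder data: "A" and "B" stand in for whatever the dictionary documents.
documented_codes = {"A", "B"}
sample_column = ["A", "1", "B", "2", "1"]

unexpected = undocumented_modalities(sample_column, documented_codes)
for value, n in sorted(unexpected.items()):
    print(f"undocumented modality {value!r}: {n} case(s)")
```

Reporting the counts alongside the values helps decide whether an undocumented modality is a rare data-entry error or a coding convention missing from the dictionary.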

The audit takes less than 10 seconds to compute, and analyzing its output to spot anomalies took minutes (see Figure 3).

Figure 3.

Audit output
