Using Data Science to Predict Hotel Booking Cancellations

Nuno António (ISCTE Instituto Universitário de Lisboa, Portugal), Ana de Almeida (ISCTE Instituto Universitário de Lisboa, Portugal & Centro de Informática e Sistemas da Universidade de Coimbra, Portugal) and Luis M. M. Nunes (ISCTE Instituto Universitário de Lisboa, Portugal & Instituto de Telecomunicações, Portugal)
DOI: 10.4018/978-1-5225-1054-3.ch006

Abstract

Booking cancellations in the hospitality industry not only generate revenue loss and affect pricing and inventory allocation decisions but, in overbooking situations, can also damage the hotel's online social reputation. Using data sets from four resort hotels and treating this issue as a classification problem in the scope of data science, the authors demonstrate that it is possible to build models that predict booking cancellations with accuracy above 90%. Contrary to the claim of Morales and Wang (2010), this research also demonstrates that it is possible to predict with high accuracy whether a booking will be canceled. The results allow hotel managers to act on bookings with a high cancellation probability and contain the associated revenue losses, produce better net demand forecasts, improve overbooking and cancellation policies, and adopt more assertive pricing and inventory allocation strategies.

Introduction

Bookings represent a contract between a customer and a service provider (Talluri & Van Ryzin, 2004). This contract gives the customer the right to use the service in the future at a settled price, usually with an option to cancel the contract prior to the service provision. In the hospitality industry, this option to cancel puts the risk on the hotel: the hotel has to guarantee rooms to the customers who honor their bookings, but at the same time it has to bear the opportunity cost of vacant capacity when a customer cancels a booking or does not show up (Talluri & Van Ryzin, 2004). Cancellation rates vary from hotel to hotel. For the purpose of this chapter, no-shows are treated as cancellations, even though the two differ: a cancellation occurs when the customer terminates the contract prior to his or her arrival, whereas a no-show occurs when the customer fails to check in without informing the hotel.
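Because no-shows are folded into the cancellation outcome, the label each booking receives can be sketched in Python as follows. The status strings ("Check-Out", "Canceled", "No-Show") and the function name `cancellation_label` are hypothetical; actual PMS exports use their own codes, so this only illustrates the labeling rule under that assumption.

```python
# Hypothetical status codes; actual PMS exports will use their own values.
CANCELED_STATUSES = {"Canceled", "No-Show"}


def cancellation_label(reservation_status: str) -> int:
    """Return 1 if the booking counts as canceled (cancellation or no-show), else 0."""
    return 1 if reservation_status in CANCELED_STATUSES else 0


statuses = ["Check-Out", "Canceled", "No-Show", "Check-Out"]
labels = [cancellation_label(s) for s in statuses]
print(labels)  # [0, 1, 1, 0]
```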

Canceled bookings can represent up to 20% of total bookings (Morales & Wang, 2010), and in airport/roadside hotels this share can reach 60% (Liu, 2004). These cancellations can have a substantial effect on revenue, not only because of the revenue they directly forgo but also because of their effect on pricing and inventory allocation decisions (Morales & Wang, 2010).

To compensate for the potential revenue losses caused by cancellations, hotels often sell above their capacity, a practice known as overbooking (Ivanov & Zhechev, 2012; Mehrotra & Ruttley, 2006; Morales & Wang, 2010). However, overbooking can also generate costs (Hayes & Miller, 2011; Mehrotra & Ruttley, 2006): relocating customers to alternative hotels, paying cash compensation, or damaging the hotel's social reputation. Identifying bookings with a high cancellation probability therefore matters, because it enables hotels to act on those bookings to prevent cancellations or mitigate their effects. At the same time, this prediction makes cancellation patterns easier to identify, allowing a better understanding of net demand and a better definition of overbooking and cancellation policies.

Using uncensored data from the Property Management Systems (PMS) of four hotels, data that reflect the tendency of hotels toward increasingly higher booking cancellation rates (illustrated in Figure 1), the authors aim to demonstrate how data science can be applied in the scope of hotel revenue management to:

  1. Identify features in PMS databases with predictive strength regarding a booking's cancellation probability.
  2. Build a model that can predict bookings with a high cancellation probability.
  3. Understand whether one prediction model fits all hotels or whether a specific model should be built for each hotel.

Figure 1. Booking cancellation ratios over time per hotel
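Figure 1 reports cancellation ratios per hotel over time. Assuming the ratio for a period is the number of canceled bookings (no-shows included) divided by the total number of bookings, the Python sketch below illustrates that computation. The record layout (hotel, arrival year, canceled flag), the grouping by arrival year, and the sample values are hypothetical and do not come from the chapter's data sets.

```python
from collections import defaultdict

# Hypothetical booking records: (hotel, arrival_year, is_canceled).
bookings = [
    ("H1", 2013, 1), ("H1", 2013, 0), ("H1", 2014, 1), ("H1", 2014, 1),
    ("H2", 2013, 0), ("H2", 2013, 0), ("H2", 2014, 1), ("H2", 2014, 0),
]


def cancellation_ratios(records):
    """Return {(hotel, year): canceled bookings / total bookings}."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for hotel, year, is_canceled in records:
        totals[(hotel, year)] += 1
        canceled[(hotel, year)] += is_canceled
    return {key: canceled[key] / totals[key] for key in sorted(totals)}


for (hotel, year), ratio in cancellation_ratios(bookings).items():
    print(hotel, year, f"{ratio:.0%}")
# H1 2013 50%
# H1 2014 100%
# H2 2013 0%
# H2 2014 50%
```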

Key Terms in this Chapter

F1 Score: Measure of prediction accuracy, defined as the harmonic mean of precision and recall.

Predictor: A variable that explains potential reasons behind variations in the outcome variable. Also known as independent variable, explanatory variable, or feature.

Recall: Measures the proportion of True Positives against all actual positives (True Positives and False Negatives). It can be interpreted as the probability that a randomly selected canceled booking is predicted as canceled.

FP (False Positive): The outcome prediction was true and the actual value was false (e.g., the booking was predicted as “has a cancellation” but in fact it was not canceled).

FN (False Negative): The outcome prediction was false and the actual value was true (e.g., the booking was predicted as “has not canceled” but in fact it was canceled).

Lead Time: Time (usually measured in days) between a booking’s date of placement in the hotel and the guest’s expected arrival date.

Outcome: The variable one wants to predict. Also known as response variable, dependent variable, or label.

Cancellation Time: Time (usually measured in days) between a booking’s date of cancellation and the guest’s expected arrival date.
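Lead time and cancellation time are both simple date differences. A minimal Python sketch using the standard `datetime` module, assuming the booking, cancellation, and arrival dates are available as `date` objects (the function names and sample dates are hypothetical):

```python
from datetime import date


def lead_time(booking_date: date, arrival_date: date) -> int:
    """Days between the date the booking was placed and the expected arrival date."""
    return (arrival_date - booking_date).days


def cancellation_time(cancellation_date: date, arrival_date: date) -> int:
    """Days between the date the booking was canceled and the expected arrival date."""
    return (arrival_date - cancellation_date).days


arrival = date(2015, 8, 15)
print(lead_time(date(2015, 6, 1), arrival))          # 75
print(cancellation_time(date(2015, 8, 1), arrival))  # 14
```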

TN (True Negative): The outcome prediction was false and the actual value was false (e.g., the booking was predicted as “has not canceled” and it was not canceled).

Classification Problem: A prediction problem in which the outcome is a class/category (discrete value).

Regression Problem: A prediction problem in which the outcome is a continuous value.

AUC (Area Under the Curve): Measure of success calculated as the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate.
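AUC can equivalently be computed as the probability that a randomly chosen canceled booking receives a higher predicted score than a randomly chosen non-canceled one, with ties counted as one half. The Python sketch below uses that pairwise formulation (the Mann-Whitney statistic normalized by the number of pairs) rather than integrating a plotted curve; the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # canceled booking scored above non-canceled one
            elif p == n:
                correct += 0.5   # tie counts as half a correct ordering
    return correct / (len(positives) * len(negatives))


y_true = [1, 0, 1, 0, 1]               # 1 = canceled
y_score = [0.9, 0.4, 0.35, 0.2, 0.8]   # predicted cancellation probabilities
print(f"{roc_auc(y_true, y_score):.3f}")  # 0.833 (5 of 6 pairs ranked correctly)
```

This pairwise loop is O(n·m) in the number of positives and negatives, so for PMS-sized data sets a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.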

TP (True Positive): The outcome prediction was true and the actual value was true (e.g., the booking was predicted as “has a cancellation” and it was canceled).
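The four outcome types (TP, FP, FN, TN) can be tallied from actual and predicted labels. A minimal Python sketch, assuming binary labels where 1 means "canceled" (the positive class) and 0 means "not canceled"; the function name and sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary 0/1 labels, with 1 = canceled as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # predicted canceled, was canceled
        elif p == 1 and a == 0:
            fp += 1  # predicted canceled, was not canceled
        elif p == 0 and a == 1:
            fn += 1  # predicted not canceled, was canceled
        else:
            tn += 1  # predicted not canceled, was not canceled
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
print(confusion_counts(actual, predicted))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```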

Precision: Measures the proportion of True Positives against the sum of all positive predictions (True Positives and False Positives).

Accuracy: Measure of outcome correctness. Measures the proportion of true results (True Positives and True Negatives) among the total number of predictions.
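Given the four counts, accuracy, precision, recall, and F1 follow directly from the definitions in this glossary. A Python sketch with hypothetical counts (not results from the chapter); returning 0.0 when a denominator is zero is an assumed convention that the chapter does not specify.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


m = classification_metrics(tp=80, fp=20, fn=40, tn=160)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.800
# precision: 0.800
# recall: 0.667
# f1: 0.727
```

scikit-learn exposes the same zero-denominator choice through a `zero_division` argument on `precision_score`, `recall_score`, and `f1_score`.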

PMS (Property Management System): A computerized system used to facilitate the management of hotels and other types of properties. Considered equivalent to Enterprise Resource Planning systems in other types of industries.
