Introduction
During Covid, the hospitality industry suffered a massive drop in revenue. Even now that people are traveling more, attracting customers remains a challenge. We will develop an ML tool to counter this problem and set a fitting room price to attract more customers. Using the hotel booking dataset, we will build an AI tool to select the correct room price, increase the occupancy rate, and increase hotel revenue.
Learning Objectives
- Importance of setting the correct price for hotel rooms.
- Cleaning Data, transforming datasets, and preprocessing datasets.
- Creating maps and visual plots using hotel booking data
- Real-world application of hotel booking data analysis used in data science.
- Performing hotel booking data analysis using the Python programming language
This article was published as a part of the Data Science Blogathon.
What is the Hotel Room Price Dataset?
The hotel booking dataset contains data from different sources, which includes columns such as hotel type, number of adults, stay time, special requirements, etc. These values can help predict the hotel room price and help in increasing hotel revenue.
What is Hotel Room Price Analysis?
In Hotel room price analysis, we will analyze the dataset’s pattern and trend. Using this information, we will make decisions related to pricing and operation. These things will depend upon several factors.
- Seasonality: Room prices rise significantly during peak seasons, such as holidays.
- Demand: Room price rises when the demand is high, such as during an event celebration or a sports event.
- Competition: Hotel room prices are highly influenced by nearby hotels' prices. If the number of hotels in an area is high, room prices tend to fall.
- Amenities: If the hotel has a pool, spa, and gym, it will charge more for these facilities.
- Location: A hotel in the main town can charge more than a hotel in a remote area.
Importance of Setting the Right Hotel Room Price
Setting the room price is essential to increase revenue and profit. The importance of setting the right hotel price is as follows:
- Maximize revenue: Room price is the primary lever for increasing revenue. By setting a competitive price, hotels can increase revenue.
- Increase customers: More guests book the hotel when room prices are fair. This helps increase the occupancy rate.
- Maximize profit: Hotels try to charge more to increase profit. However, setting the price too high reduces the number of guests, whereas the right price increases it.
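The trade-off between price and number of guests can be made concrete with a little arithmetic. The sketch below is only an illustration: the room count and the (price, occupancy) pairs are hypothetical numbers, not values from the dataset, and `nightly_revenue` is a helper name introduced here.

```python
# Hypothetical numbers chosen only to illustrate the trade-off.
ROOMS_AVAILABLE = 100

# Assumed (price, expected occupancy rate) pairs for a single night.
scenarios = [
    (80.0, 0.95),
    (120.0, 0.80),
    (200.0, 0.40),
]


def nightly_revenue(price: float, occupancy: float, rooms: int = ROOMS_AVAILABLE) -> float:
    """Room revenue for one night: price times the number of occupied rooms."""
    return price * occupancy * rooms


revenues = {price: nightly_revenue(price, occupancy) for price, occupancy in scenarios}
best_price = max(revenues, key=revenues.get)

for price, revenue in revenues.items():
    print(f"price={price:7.2f}  revenue={revenue:9.2f}")
print("price with the highest revenue:", best_price)
```

Under these assumed numbers, the middle price earns the most: the highest price loses too many guests, and the lowest price fills the hotel but earns less per room.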
Collecting Data and Preprocessing
Data collection and preprocessing are essential parts of hotel room price analysis. The data is collected from hotel websites, booking websites, and public datasets. This dataset is then converted to the required format for visualization purposes. In preprocessing, the dataset undergoes data cleaning and transformation. The transformed dataset is then used in visualization and model building.
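As a rough sketch of what cleaning and transformation can look like in pandas, the example below works on a few invented rows. The column names follow the hotel booking dataset used later in the article, while the `preprocess` function and the sample values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for a bookings table; the values are invented.
raw = pd.DataFrame(
    {
        "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
        "adr": [95.0, 0.0, 120.5, 6000.0],
        "children": [0.0, np.nan, 2.0, 1.0],
        "stays_in_week_nights": [2, 3, 1, 5],
        "stays_in_weekend_nights": [1, 0, 2, 2],
    }
)


def preprocess(df: pd.DataFrame, min_adr: float = 50, max_adr: float = 5000) -> pd.DataFrame:
    """Clean and transform a bookings frame without modifying the input."""
    out = df.copy()
    # Cleaning: treat a missing child count as zero.
    out["children"] = out["children"].fillna(0)
    # Transformation: derive the total length of stay.
    out["total_stay"] = out["stays_in_week_nights"] + out["stays_in_weekend_nights"]
    # Cleaning: keep only rows whose daily rate lies in a plausible range.
    out = out[out["adr"].between(min_adr, max_adr)]
    return out.reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

Working on a copy keeps the raw frame untouched, which makes it easier to rerun the cleaning with different thresholds.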
Visualizing Dataset Using Tools and Techniques
Visualizing the dataset helps get insight and find the pattern to make a better decision. Below are the Python tools to provide better visualization.
- Matplotlib: Matplotlib is one of the critical tools in Python used to create charts and graphs like bar and line charts.
- Seaborn: Seaborn is another visualization tool in Python. It helps create more detailed visualization images like heat maps and violin plots.
Techniques Used to Visualize the Hotel Booking Dataset.
- Box plots: We plot the number of nights stayed for each market segment. It helps in understanding the customer type.
- Bar charts: Using a bar chart, we plot the average daily rate (adr) against the arrival month; this helps identify the busier months.
- Count plot: We plot the market segment against the deposit type using a count plot to understand in which segments hotels receive more deposits.
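A count plot simply draws how many rows fall into each combination of categories. The sketch below computes those counts with `pd.crosstab` on a few invented rows; the column names follow the dataset, and the sample values are assumptions.

```python
import pandas as pd

# Invented sample rows; column names follow the hotel booking dataset.
bookings = pd.DataFrame(
    {
        "market_segment": ["Online TA", "Online TA", "Groups", "Groups", "Direct", "Online TA"],
        "deposit_type": ["No Deposit", "No Deposit", "Non Refund", "Non Refund", "No Deposit", "Non Refund"],
    }
)

# One row per market segment, one column per deposit type; cells hold booking counts.
counts = pd.crosstab(bookings["market_segment"], bookings["deposit_type"])
print(counts)

# With seaborn installed, the same counts can be drawn as grouped bars:
# sns.countplot(x="market_segment", hue="deposit_type", data=bookings)
```

Looking at the table first is a quick way to check that the plot shows what you expect.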
Use Cases and Applications of Hotel Room Data Analysis in Data Science
The hotel booking dataset has multiple use cases and applications as described below:
- Customer Sentiment Analysis: By applying machine learning techniques to customer reviews, managers can determine the sentiment and improve the service for a better experience.
- Forecasting Occupancy Rate: From customer reviews and ratings, managers can estimate the room occupancy rate in the short term.
- Business Operations: This dataset can also be used to track the inventory; this empowers the hotels to have sufficient room and material.
- Food and Beverage: Data can also be used to set prices for food and beverage items to maximize revenue while still being competitive.
- Performance Evaluation: This dataset also helps develop personalized suggestions for a guest's experience, thus improving hotel ratings.
Challenges in Hotel Room Data Analysis
Hotel room booking data can present several challenges:
- Data quality: As we are collecting data from multiple datasets, the quality of the dataset is compromised, and the chances of missing data, inconsistency, and inaccuracy arise.
- Data privacy: The hotel collects sensitive data from customers, and a leak of this data puts those customers at risk. So, following the data safety guidelines becomes a top priority.
- Data integration: The Hotel has multiple systems, like property management and booking websites, so integrating these systems has difficulties.
- Data volume: Hotel room data can be extensive, making it challenging to manage and analyze.
Best Practices in Hotel Room Data Analysis
Best practices in hotel room data analysis:
- To collect data, use property management systems, online booking platforms, and guest feedback systems.
- Ensure data quality by regularly monitoring and cleaning the data.
- Protect data privacy by implementing security measures and complying with data privacy regulations.
- Integrate data from different systems to get a complete picture of the hotel room data.
- Use machine learning techniques such as LSTM to forecast room rates.
- Use data analytics to optimize business operations, like inventory and staffing.
- Use data analytics to target marketing campaigns to attract more guests.
- Use data analytics to evaluate performance and provide innovative guest experiences.
- With the help of data analytics, management can better understand their customer and provide better service.
Future Trends and Advancements in Hotel Room Data Analysis in Data Science
As consumer spending increases, it greatly benefits the hotel & tourism industry. This creates new trends and data to analyze customer spending and behavior. The increase in AI tools creates an opportunity to explore and maximize the industry. With the help of an AI tool, we can gather the required data and remove unwanted data, i.e., performing data preprocessing.
On top of this data, we can train our model to generate valuable insight and produce real-time analysis. This also helps in providing personalized experiences based on individual customers and guests. This highly benefits the hotel and the customer.
Data analysis also helps the management team to understand their customer and inventory. This will help in setting dynamic room pricing based on demand. Better inventory management helps in reducing the cost.
Hotel Room Data Analysis with Python Implementation
Let us perform a basic data analysis in Python on a dataset from Kaggle. To download the dataset, click here.
Data Details
The hotel booking dataset includes information on different hotel types, such as resort hotels and city hotels, and on market segmentation.
Visualizations of the Datasets
Step 1. Import Libraries and read the dataset
#Importing the Library
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
Step 2. Importing Dataset and Inspecting Data
#Read the file and convert to dataframe
df = pd.read_csv('data/hotel_bookings.csv')
#Display the dataframe shape
df.shape
(119390, 32)
#Checking the data sample
df.head()
#Checking the dataset info
df.info()
#Checking null values
df.isna().sum()
OUTPUT
Step 3. Visualizing the dataset
#Boxplot Distribution of Nights Spent at Hotels by Market Segment and Hotel Type
plt.figure(figsize = (15,8))
sns.boxplot(x = "market_segment", y = "stays_in_week_nights", data = df, hue = "hotel",
palette="Set1")
OUTPUT
#Plotting box plot for market segment vs stay in weekend night
plt.figure(figsize=(12,5))
sns.boxplot(x = "market_segment", y = "stays_in_weekend_nights", data = df,
hue = "hotel", palette="Set1");
OUTPUT
Observation
The above plots show that most groups are normally distributed, and some have high skewness. Most people tend to stay less than a week. The customers from the Aviation Segment do not seem to be staying at the resort hotels and have a relatively lower day average.
#Barplot of average daily rate (adr) vs month
plt.figure(figsize = (12,5))
sns.barplot(x = 'arrival_date_month', y = 'adr', data = df);
OUTPUT
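Because `arrival_date_month` holds month names as strings, the bars are not guaranteed to appear in calendar order. One possible way to handle this, sketched below on invented rows, is to turn the column into an ordered categorical; `month_order` and `monthly_adr` are names introduced here for illustration.

```python
import calendar

import pandas as pd

# Invented sample; the column names match the article's dataset.
sample = pd.DataFrame(
    {
        "arrival_date_month": ["July", "January", "July", "March", "January"],
        "adr": [150.0, 60.0, 170.0, 90.0, 80.0],
    }
)

# Calendar order: January ... December (English names in the default locale).
month_order = list(calendar.month_name)[1:]

# An ordered categorical makes groupby results follow calendar order.
sample["arrival_date_month"] = pd.Categorical(
    sample["arrival_date_month"], categories=month_order, ordered=True
)

monthly_adr = sample.groupby("arrival_date_month", observed=True)["adr"].mean()
print(monthly_adr)

# The same order can also be passed to seaborn directly:
# sns.barplot(x="arrival_date_month", y="adr", data=df, order=month_order)
```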
Working Descriptions
In the implementation part, I will show how I used a ZenML pipeline to create a model that uses historical booking data to predict the hotel room price. I also deployed a Streamlit application to present the end product.
What is ZenML?
ZenML is an open-source MLOps framework that streamlines the creation of production-ready ML pipelines. A pipeline is a series of interconnected steps, where the output of one step serves as an input to another step, leading to the creation of a finished product. Below are the reasons for selecting a ZenML pipeline:
- Efficient pipeline creation
- Standardization of ML workflows
- Real-time data analysis
Building a model is not enough; we have to deploy the model into production and monitor its performance over time and how it interacts with real-world data. With an end-to-end machine learning pipeline, the entire workflow can be automated, from data preparation to model training and deployment. This helps us continuously predict and confidently deploy machine learning models, and it lets us track our production-ready model. I highly suggest you refer to the ZenML documentation for more details.
The first pipeline we create consists of the following steps:
- ingest_data: This method will ingest the data and create a DataFrame.
- clean_data: This method will clean the data and remove the unwanted columns.
- model_train: This method will train and save the model using MLflow auto logging.
- Evaluation: This method will evaluate the model and save the metrics – using MLflow auto logging – into the artifact store.
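The idea of steps feeding into one another can be sketched in plain Python, without ZenML. Everything below is a simplified stand-in: the step bodies, the sample data, and the `run_pipeline` helper are assumptions made to show the data flow, not the project's actual code.

```python
from typing import Callable, List

import pandas as pd


def ingest_data() -> pd.DataFrame:
    """Stand-in for reading a CSV: returns a small invented frame."""
    return pd.DataFrame({"adr": [40.0, 100.0, 150.0], "unused": ["a", "b", "c"]})


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the unwanted column and the rows with a very low rate."""
    kept = df.drop(columns=["unused"])
    return kept[kept["adr"] >= 50].reset_index(drop=True)


def model_train(df: pd.DataFrame) -> float:
    """Toy 'model': the mean rate, used as a constant prediction."""
    return float(df["adr"].mean())


def run_pipeline(steps: List[Callable]):
    """Call the first step, then pass each output to the next step."""
    result = steps[0]()
    for step in steps[1:]:
        result = step(result)
    return result


baseline_prediction = run_pipeline([ingest_data, clean_data, model_train])
print(baseline_prediction)
```

A framework such as ZenML adds tracking, caching, and artifact storage on top of this basic chaining.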
Model Development
We discussed the different steps above. Now, we will focus on the coding part.
Ingest Data
import logging

import pandas as pd
from zenml import step


class IngestData:
    """
    Ingesting data from the data_path
    """

    def __init__(self, data_path: str) -> None:
        """
        Args:
            data_path: Path at which the data file is located
        """
        self.data_path = data_path

    def get_data(self) -> pd.DataFrame:
        """
        Ingesting the data from data_path
        Returns the ingested data
        """
        logging.info(f"Ingesting data from {self.data_path}")
        return pd.read_csv(self.data_path)


@step
def ingest_df(data_path: str) -> pd.DataFrame:
    """
    Ingesting data from the data_path.
    Args:
        data_path: path to the data
    Returns:
        pd.DataFrame: the ingested data
    """
    try:
        ingest_data = IngestData(data_path)
        df = ingest_data.get_data()
        return df
    except Exception as e:
        # Log the failure with its cause, then re-raise for the pipeline
        logging.error(f"Error occurred while ingesting data: {e}")
        raise e
Above, we have defined an ingest_df() method, which takes the file path as an argument and returns the dataframe. Here @step is a zenml decorator. It is used to register the function as a step in a pipeline.
Clean Data & Processing
# Fill missing values
data["agent"] = data["agent"].fillna(data["agent"].median())
data["children"] = data["children"].fillna(0)
# Remove outliers: drop rows with adr below 50 or above 5000
data = data.drop(data[data['adr'] < 50].index)
data = data.drop(data[data['adr'] > 5000].index)
# Feature engineering: total nights and total guests per booking
data["total_stay"] = data['stays_in_week_nights'] + data['stays_in_weekend_nights']
data["total_person"] = data["adults"] + data["children"] + data["babies"]
# Label-encode the categorical columns (the encoder is refit for each column)
le = LabelEncoder()
categorical_columns = [
    'hotel', 'arrival_date_month', 'meal', 'country', 'market_segment',
    'reserved_room_type', 'assigned_room_type', 'deposit_type', 'customer_type',
]
for column in categorical_columns:
    data[column] = le.fit_transform(data[column])
- In the above code, we fill the null values and remove the outliers. We add the weeknight and weekend-night stays to get the total stay in days.
- Then, we label-encode the categorical columns such as hotel, country, deposit type, etc.
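When one encoder is reused across columns, only the mapping of the last column it was fit on remains available afterwards. If you need to translate codes back into labels, one option is to keep a mapping per column. The sketch below uses pandas categoricals rather than `LabelEncoder` (for plain string columns both should assign codes in sorted label order); the sample rows and the `mappings` dictionary are introduced here for illustration.

```python
import pandas as pd

# Invented sample rows; column names follow the article's dataset.
data = pd.DataFrame(
    {
        "hotel": ["Resort Hotel", "City Hotel", "City Hotel"],
        "meal": ["BB", "HB", "BB"],
    }
)

# Keep one code -> label mapping per column so the encoding can be undone.
mappings = {}
for column in ["hotel", "meal"]:
    categorical = data[column].astype("category")
    mappings[column] = dict(enumerate(categorical.cat.categories))
    data[column] = categorical.cat.codes.astype(int)

print(data)
print(mappings)

# Decode a column back to its original labels.
decoded_hotel = data["hotel"].map(mappings["hotel"])
print(decoded_hotel.tolist())
```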
Model Training
from zenml import pipeline

@pipeline(enable_cache=False)
def train_pipeline(data_path: str):
    # Each call is a ZenML step; its output feeds the next step
    df = ingest_df(data_path)
    X_train, X_test, y_train, y_test = clean_df(df)
    model = train_model(X_train, X_test, y_train, y_test)
    r2_score, rmse = evaluate_model(model, X_test, y_test)
We will use the ZenML @pipeline decorator to define the train_pipeline() method. The train_pipeline method takes the file path as an argument. After data ingestion and splitting the data into training and test sets, the train_model() method is called. This method, train_model(), will use different algorithms such as LightGBM, Random Forest, XGBoost, and Linear Regression to train on the dataset.
Model Evaluation
We will use the RMSE, R2 score, and MSE of the different algorithms to determine the best one. In the code below, we have defined the evaluate_model() method to compute these evaluation metrics.
# MSE, R2, and RMSE are the project's own metric classes, and experiment_tracker
# is assumed to be defined elsewhere in the project.
@step(experiment_tracker=experiment_tracker.name)
def evaluate_model(
    model: RegressorMixin,
    X_test: pd.DataFrame,
    y_test: pd.DataFrame,
) -> Tuple[
    Annotated[float, "r2_score"],
    Annotated[float, "rmse"],
]:
    """
    Evaluates the model on the test data.
    Args:
        model: RegressorMixin
        X_test: pd.DataFrame
        y_test: pd.DataFrame
    Returns:
        r2: R2 score
        rmse: RMSE
    """
    try:
        prediction = model.predict(X_test)
        # Compute each metric and log it to MLflow
        mse_class = MSE()
        mse = mse_class.calculate_scores(y_test, prediction)
        mlflow.log_metric("mse", mse)
        r2_class = R2()
        r2 = r2_class.calculate_scores(y_test, prediction)
        mlflow.log_metric("r2", r2)
        rmse_class = RMSE()
        rmse = rmse_class.calculate_scores(y_test, prediction)
        mlflow.log_metric("rmse", rmse)
        return r2, rmse
    except Exception as e:
        logging.error("Error in evaluating model: {}".format(e))
        raise e
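The MSE, R2, and RMSE classes are not shown in the article. A minimal sketch of what they might look like, assuming only the `calculate_scores(y_true, y_pred)` call shape used above, is given below; the bodies are the standard formulas written with NumPy, and the sample values are invented.

```python
import numpy as np


class MSE:
    """Mean squared error: the average of the squared residuals."""

    def calculate_scores(self, y_true, y_pred) -> float:
        y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
        return float(np.mean((y_true - y_pred) ** 2))


class RMSE:
    """Root mean squared error: the square root of the MSE."""

    def calculate_scores(self, y_true, y_pred) -> float:
        return float(np.sqrt(MSE().calculate_scores(y_true, y_pred)))


class R2:
    """Coefficient of determination: 1 - SS_res / SS_tot."""

    def calculate_scores(self, y_true, y_pred) -> float:
        y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean()) ** 2)
        return float(1.0 - ss_res / ss_tot)


# Invented targets and predictions, used only to exercise the classes.
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 210.0])

mse = MSE().calculate_scores(y_true, y_pred)
rmse = RMSE().calculate_scores(y_true, y_pred)
r2 = R2().calculate_scores(y_true, y_pred)
print(mse, rmse, r2)
```

Since RMSE is the square root of MSE, the two always rank models in the same order; RMSE is simply expressed in the same unit as the target.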
Setting the Environment
Create the virtual environment using Python or Anaconda.
#Command to create virtual environment
python3 -m venv <virtual_environment_name>
You must install some Python packages in your environment using the command below.
cd zenml-project/hotel-room-booking
pip install -r requirements.txt
For running the run_deployment.py script, you will also need to install some integrations using ZenML:
zenml init
zenml integration install mlflow -y
In this project, we have created two pipelines
- run_pipeline.py, a pipeline that only trains the model
- run_deployment.py, a pipeline that also continuously deploys the model.
run_pipeline.py takes the file path as an argument and executes the train_pipeline() method. Below is the pictorial view of the different operations performed by run_pipeline.py. This can be viewed by using the dashboard provided by ZenML.
Dashboard URL: http://127.0.0.1:8237/workspaces/default/pipelines/95881272-b1cc-46d6-9f73-7b967f28cbe1/runs/803ae9c5-dc35-4daa-a134-02bccb7d55fd/dag
run_deployment.py: In this file, we execute the continuous_deployment_pipeline and the inference_pipeline.
continuous_deployment_pipeline
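from pipelines.deployment_pipeline import continuous_deployment_pipeline, inference_pipeline

def main(config: str, min_accuracy: float):
    mlflow_model_deployment_component = MLFlowModelDeployer.get_active_model_deployer()
    deploy = config == DEPLOY or config == DEPLOY_AND_PREDICT
    predict = config == PREDICT or config == DEPLOY_AND_PREDICT
    if deploy:
        continuous_deployment_pipeline(
            data_path="data/hotel_bookings.csv",  # assumed: the CSV used in the analysis above
            min_accuracy=min_accuracy,
            workers=3,
            timeout=60,
        )

# continuous_deployment_pipeline, imported above from pipelines/deployment_pipeline.py.
# The decorator and signature are inferred from the call in main() and are assumptions.
@pipeline(enable_cache=False)
def continuous_deployment_pipeline(data_path: str, min_accuracy: float, workers: int, timeout: int):
    df = ingest_df(data_path=data_path)
    X_train, X_test, y_train, y_test = clean_df(df)
    model = train_model(X_train, X_test, y_train, y_test)
    r2_score, rmse = evaluate_model(model, X_test, y_test)
    # Deploy only when the trigger step approves the evaluated model
    deployment_decision = deployment_trigger(r2_score)
    mlflow_model_deployer_step(model=model,
                               deploy_decision=deployment_decision,
                               workers=workers,
                               timeout=timeout)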
In the above code, we create a continuous deployment pipeline that takes the data and performs data ingestion, splitting, and model training. Once the model is trained, it is evaluated, and the deployment trigger uses the R2 score to decide whether the model is deployed.
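The body of deployment_trigger is not shown in the article. A plausible minimal version, written here as a plain function rather than a ZenML step, compares the R2 score with a minimum threshold; the parameter name `min_accuracy` mirrors the pipeline argument, while the default of 0.8 is a hypothetical value chosen for illustration.

```python
def deployment_trigger(r2_score: float, min_accuracy: float = 0.8) -> bool:
    """Return True when the model's R2 score meets the deployment threshold.

    The 0.8 default is a hypothetical value, not taken from the project.
    """
    return r2_score >= min_accuracy


# R2 scores reported in the Results section for two of the models.
print(deployment_trigger(0.894))  # Random Forest
print(deployment_trigger(0.325))  # Linear Regression
```

With this threshold, the Random Forest model would be deployed and the Linear Regression model would not.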
inference_pipeline
@pipeline(enable_cache=False, settings={"docker": docker_settings})
def inference_pipeline(pipeline_name: str, pipeline_step_name: str):
    # Link all the steps artifacts together
    batch_data = dynamic_importer()
    model_deployment_service = prediction_service_loader(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        running=False,
    )
    predictor(service=model_deployment_service, data=batch_data)
In inference_pipeline, we make predictions once the model has been trained on the training dataset. The above code uses dynamic_importer, prediction_service_loader, and predictor. Each of these methods has a different function:
- dynamic_importer: Loads the dataset and performs preprocessing.
- prediction_service_loader: Loads the deployed model using the pipeline name and step name parameters.
- predictor: Once the deployed model is loaded, it makes predictions on the test dataset.
Now we will visualize the pipelines using the ZenML dashboard to get a clear view.
continuous_deployment_pipeline dashboard:
Dashboard URL: http://127.0.0.1:8237/workspaces/default/pipelines/9eb06aba-d7df-43ef-a017-8cb5bb13cd89/runs/e4208fa5-48c8-4a8c-91f1-011c5e1ddbf9/dag
inference_pipeline dashboard:
Dashboard URL: http://127.0.0.1:8237/workspaces/default/pipelines/07351bb1-6b0d-400e-aeea-551159346f0e/runs/c1ce61f8-dd12-4244-a4d6-514e5520b879/dag
We have deployed a Streamlit app that uses the latest model service asynchronously from the pipeline. It can be done quickly with ZenML within the Streamlit code. To run this Streamlit app in your local system, use the below command:
# command to run the streamlit app locally
streamlit run streamlit_app.py
You can get the complete end-to-end implementation code here
Results
We have experimented with multiple algorithms and compared the performance of each model. The results are as follows:
Models | MSE | RMSE | R2_Score |
---|---|---|---|
XGBoost | 267.465 | 16.354 | n/a |
LightGBM | 319.477 | 17.873 | 0.839 |
RandomForest | 209.837 | 14.485 | 0.894 |
Linear Regression | 1338.777 | 36.589 | 0.325 |
The Random Forest model performs the best, with the highest R^2 score (0.894) and the lowest error. This means that it is the most accurate at predicting the target variable and explains the most variance in the target variable. By MSE, the XGBoost model (267.465) is the second best, followed by the LightGBM model (319.477). The Linear Regression model performs the worst.
Demo Application
A live demo application of this project is built with Streamlit. It takes booking features as input and predicts the room price using our trained models.
Conclusion
The hotel room booking sector is rapidly evolving as internet accessibility has increased in different parts of the world. Due to this, the demand for online hotel room booking has increased. Hotel management wants to know how to keep their guests and improve products and services to make better decisions. Machine learning is vital in various business tasks, like customer segmentation, demand forecasting, product recommendation, guest satisfaction, etc.
Frequently Asked Questions
Several features determine the room price. Some of them are hotel_type, room_type, arrival_date, departure_date, number_of_guests, etc.
The model aims to set the correct room price so the hotels can keep the occupancy rate as high as possible. Multiple parties, such as hotels, travel websites, and businesses, can use this data.
A hotel room price optimization model is an ML tool that predicts the room price based on total stay days, room type, any special request, etc. Hotels can use this tool to set competitive prices and maximize profit.
In hotels, the prediction of room prices relies on several factors, including data type and quality. If the model undergoes training with additional parameters, it improves its ability to predict prices more accurately.
This model can be used in hotels to establish competitive prices, attract more customers, and increase occupancy rates. Travelers can utilize it to secure the best deals at reasonable rates without hotels overcharging them. This also helps in travel budget planning.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.