Predicting Next Purchase Using XGBoost and Python
Posted By : Aakash Chaddha | 29-Jan-2020
For every company, big or small, customer retention is a hard problem, and it is a clear signal of whether the business is doing well. Separating the fruitful customers from the haystack is always a better use of a company's resources.
Using machine learning, it is now possible to identify potential high-value or long-term customers. We can train a model on our data so that it picks up the subtle variations in customer behaviour that help produce an accurate prediction.
I will first explain the terminology we need to understand before diving into the nitty-gritty of the code. Code snippets are provided alongside each step.
Terminology:
RFM Model: The Recency, Frequency, Monetary model is used to track and identify a company's most fruitful clients.
Recency: The more recently a customer has made a purchase with a company, the more likely he or she is to keep the company and the brand in mind for subsequent purchases. The probability of future transactions is arguably higher for recent buyers than for consumers who have not bought from the company in months or even longer.
Frequency: The frequency of a customer's transactions may be affected by factors such as the type of product, the price point of the purchase, and the need for replenishment or replacement.
Monetary Value: Monetary value measures how much the customer spends with the company across their transactions.
XGBoost Model: XGBoost is a machine learning algorithm based on a decision-tree ensemble that uses a gradient boosting framework. Artificial neural networks tend to outperform other algorithms on problems involving unstructured data (images, text, etc.), but for small-to-medium structured or tabular data, such as customer purchase history, decision-tree-based algorithms like XGBoost work very well.
K-Means Clustering: K-means clustering is a method of vector quantization, originally from signal processing, that is commonly used for cluster analysis in data mining. The goal of k-means clustering is to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as the cluster's prototype.
Overview
We will be using customers' purchase data to predict their chances of repurchasing within a given period of time. We will need their purchase date, item price, and customer ID.
The following are the steps we may follow on this journey.
- Data preprocessing
- Training the model
- Prediction on New Customers.
Data Preprocessing:
Suppose a gadget company needs an ML model that can predict how likely current customers are to repurchase within the next 6 months. We currently have the clients' purchase history for 10 years, from 2009 to 2019.
We will use only the first 9 years of data ( < 2019 ) to train the model, and keep the last year (2019) aside to validate the model's predictions, as in the sketch below.
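A minimal sketch of this split is shown below; the file name and the column names (InvoiceDate, CustomerID, ItemPrice) are assumptions about the dataset, not something the article specifies.

```python
import pandas as pd

# Load the raw purchase history (file and column names are assumed)
data = pd.read_csv('purchase_history.csv', parse_dates=['InvoiceDate'])

# First nine years (2009-2018) are used to build the features...
data_tsc = data[data['InvoiceDate'] < pd.Timestamp('2019-01-01')].reset_index(drop=True)

# ...and the last year (2019) only to derive the label: the next purchase day
data_tsc_next = data[data['InvoiceDate'] >= pd.Timestamp('2019-01-01')].reset_index(drop=True)
```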
We can use data_tsc and data_tsc_next to calculate each customer's next purchase day, then replace all the NA values (customers with no purchase in 2019) with 9999 for easy grouping.
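One way to do this, sketched under the same column-name assumptions, is to take the gap between each customer's last purchase before 2019 and their first purchase in 2019:

```python
# One row per customer seen in the training window
data_user = pd.DataFrame(data_tsc['CustomerID'].unique(), columns=['CustomerID'])

# Last purchase date before 2019 for every customer
last_purchase = data_tsc.groupby('CustomerID')['InvoiceDate'].max().reset_index()
last_purchase.columns = ['CustomerID', 'MaxPurchaseDate']

# First purchase date in 2019, if any
next_purchase = data_tsc_next.groupby('CustomerID')['InvoiceDate'].min().reset_index()
next_purchase.columns = ['CustomerID', 'MinPurchaseDate']

# Days until the next purchase
purchase_dates = pd.merge(last_purchase, next_purchase, on='CustomerID', how='left')
purchase_dates['NextPurchaseDay'] = (
    purchase_dates['MinPurchaseDate'] - purchase_dates['MaxPurchaseDate']
).dt.days

data_user = pd.merge(data_user,
                     purchase_dates[['CustomerID', 'NextPurchaseDay']],
                     on='CustomerID', how='left')

# Customers who did not come back in 2019 get 9999 for easy grouping later
data_user = data_user.fillna(9999)
```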
As we are following the RFM model, let us calculate the Recency, Frequency, and Monetary value of each customer:
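A sketch of the three features follows; the revenue calculation assumes an ItemPrice column holding the price paid per purchase line.

```python
# Recency: days since the customer's last purchase in the training window
recency = data_tsc.groupby('CustomerID')['InvoiceDate'].max().reset_index()
recency.columns = ['CustomerID', 'MaxPurchaseDate']
recency['Recency'] = (recency['MaxPurchaseDate'].max() - recency['MaxPurchaseDate']).dt.days
data_user = pd.merge(data_user, recency[['CustomerID', 'Recency']], on='CustomerID')

# Frequency: number of purchases made by the customer
frequency = data_tsc.groupby('CustomerID')['InvoiceDate'].count().reset_index()
frequency.columns = ['CustomerID', 'Frequency']
data_user = pd.merge(data_user, frequency, on='CustomerID')

# Monetary value: total spend per customer (ItemPrice is an assumed column name)
data_tsc['Revenue'] = data_tsc['ItemPrice']
revenue = data_tsc.groupby('CustomerID')['Revenue'].sum().reset_index()
data_user = pd.merge(data_user, revenue, on='CustomerID')
```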
Now we will cluster each of these three features and order the clusters individually with the function order_cluster():
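The article does not reproduce the helper, but a common implementation of order_cluster(), which relabels k-means cluster ids so that a higher id always means a better value, looks like this:

```python
def order_cluster(cluster_field_name, target_field_name, df, ascending):
    """Relabel k-means cluster ids so they are ordered by the target field."""
    new_field = 'index'
    df_new = df.groupby(cluster_field_name)[target_field_name].mean().reset_index()
    df_new = df_new.sort_values(by=target_field_name, ascending=ascending).reset_index(drop=True)
    df_new[new_field] = df_new.index
    df_final = pd.merge(df, df_new[[cluster_field_name, new_field]], on=cluster_field_name)
    df_final = df_final.drop([cluster_field_name], axis=1)
    df_final = df_final.rename(columns={new_field: cluster_field_name})
    return df_final
```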
Clustering Recency, Frequency, and Monetary Value:
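A sketch of the clustering, assuming four clusters per feature (the exact number is not stated in the article):

```python
from sklearn.cluster import KMeans

# Recency: lower is better, so order descending (highest cluster id = most recent buyers)
kmeans = KMeans(n_clusters=4, random_state=42)
data_user['RecencyCluster'] = kmeans.fit_predict(data_user[['Recency']])
data_user = order_cluster('RecencyCluster', 'Recency', data_user, ascending=False)

# Frequency: higher is better
kmeans = KMeans(n_clusters=4, random_state=42)
data_user['FrequencyCluster'] = kmeans.fit_predict(data_user[['Frequency']])
data_user = order_cluster('FrequencyCluster', 'Frequency', data_user, ascending=True)

# Monetary value: higher is better
kmeans = KMeans(n_clusters=4, random_state=42)
data_user['RevenueCluster'] = kmeans.fit_predict(data_user[['Revenue']])
data_user = order_cluster('RevenueCluster', 'Revenue', data_user, ascending=True)
```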
The differences between each customer's last three consecutive transactions are added as features, and the mean and standard deviation of the gaps between their transactions are also calculated and added to the dataframe.
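A sketch of these features (column names such as DayDiff are assumptions):

```python
# Differences in days between each customer's consecutive purchases
day_order = data_tsc[['CustomerID', 'InvoiceDate']].drop_duplicates()
day_order = day_order.sort_values(['CustomerID', 'InvoiceDate'])

day_order['Prev1'] = day_order.groupby('CustomerID')['InvoiceDate'].shift(1)
day_order['Prev2'] = day_order.groupby('CustomerID')['InvoiceDate'].shift(2)
day_order['Prev3'] = day_order.groupby('CustomerID')['InvoiceDate'].shift(3)

day_order['DayDiff'] = (day_order['InvoiceDate'] - day_order['Prev1']).dt.days
day_order['DayDiff2'] = (day_order['InvoiceDate'] - day_order['Prev2']).dt.days
day_order['DayDiff3'] = (day_order['InvoiceDate'] - day_order['Prev3']).dt.days

# Mean and standard deviation of the gap between consecutive purchases
day_stats = day_order.groupby('CustomerID')['DayDiff'].agg(['mean', 'std']).reset_index()
day_stats.columns = ['CustomerID', 'DayDiffMean', 'DayDiffStd']

# Keep only each customer's latest transaction and merge everything back
# (customers with few purchases keep NaNs here; XGBoost handles missing values)
day_last = day_order.drop_duplicates(subset=['CustomerID'], keep='last')
data_user = pd.merge(data_user,
                     day_last[['CustomerID', 'DayDiff', 'DayDiff2', 'DayDiff3']],
                     on='CustomerID')
data_user = pd.merge(data_user, day_stats, on='CustomerID')
```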
We will include new columns “OverallScore” and “Segment”:
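A sketch, with illustrative score thresholds for the segments:

```python
# OverallScore sums the three ordered cluster ids
data_user['OverallScore'] = (data_user['RecencyCluster']
                             + data_user['FrequencyCluster']
                             + data_user['RevenueCluster'])

# Segment buckets the score (thresholds are illustrative, not from the article)
data_user['Segment'] = 'Low-Value'
data_user.loc[data_user['OverallScore'] > 2, 'Segment'] = 'Mid-Value'
data_user.loc[data_user['OverallScore'] > 4, 'Segment'] = 'High-Value'

# One-hot encode the categorical Segment column for the model
data_class = pd.get_dummies(data_user, columns=['Segment'])
```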
The dataframe is then grouped into classes according to each customer's next purchase day.
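One plausible grouping, given that the goal is to predict repurchase within the next 6 months (the exact cut-offs are assumptions):

```python
# 2 -> buys again within 6 months, 1 -> within a year, 0 -> later or never (9999)
data_class['NextPurchaseDayRange'] = 0
data_class.loc[data_class['NextPurchaseDay'] <= 365, 'NextPurchaseDayRange'] = 1
data_class.loc[data_class['NextPurchaseDay'] <= 180, 'NextPurchaseDayRange'] = 2
```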
At this point, we have all the features and labels required to train our model.
Training the Model:
To train our model, we first need to split the data into training and test sets:
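For example, an 80/20 split (the ratio is an assumption):

```python
from sklearn.model_selection import train_test_split

# Features: everything except the identifier and the label columns
X = data_class.drop(['CustomerID', 'NextPurchaseDay', 'NextPurchaseDayRange'], axis=1)
y = data_class['NextPurchaseDayRange']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```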
We will use the XGBoost model for training:
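A sketch of the classifier; the hyperparameters are illustrative rather than the article's exact settings:

```python
import xgboost as xgb

# Multi-class classifier over the NextPurchaseDayRange classes
xgb_model = xgb.XGBClassifier(max_depth=5, learning_rate=0.1,
                              n_estimators=200, objective='multi:softprob')
xgb_model.fit(X_train, y_train)

print('Accuracy on the training set:', xgb_model.score(X_train, y_train))
print('Accuracy on the test set:', xgb_model.score(X_test, y_test))
```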
We can store our trained model and load it back whenever we need it:
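For instance, with pickle (the file name is arbitrary):

```python
import pickle

# Save the trained model
with open('xgb_next_purchase.pkl', 'wb') as f:
    pickle.dump(xgb_model, f)

# Load it back later
with open('xgb_next_purchase.pkl', 'rb') as f:
    xgb_model = pickle.load(f)
```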
Prediction on New Customers:
To test the model on new customer data, I would suggest repeating the steps from the top with these changes:
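The exact snippet is not reproduced here, so the following is only one plausible reading: build the features from the new customers' purchase history, and keep their subsequent purchases (if known) as data_tsc_next so that the accuracy check below has real labels. The file name and the cut-off date are assumptions.

```python
# Assumed file name; the boundary between "history" and "what happened next"
# is also an assumption.
data = pd.read_csv('new_customers_history.csv', parse_dates=['InvoiceDate'])

cutoff = pd.Timestamp('2019-01-01')
data_tsc = data[data['InvoiceDate'] < cutoff].reset_index(drop=True)
data_tsc_next = data[data['InvoiceDate'] >= cutoff].reset_index(drop=True)
```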
Then follow each step as before, up to the data-splitting snippet.
Instead of splitting into training and test sets, do this:
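That is, keep every customer row as model input instead of splitting (a sketch):

```python
# No train/test split at prediction time: every customer becomes an input row
X_new = data_class.drop(['CustomerID', 'NextPurchaseDay', 'NextPurchaseDayRange'], axis=1)

# Only needed if the true outcome is known and you want to score the model
y_new = data_class['NextPurchaseDayRange']
```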
Load the pre-trained model using the load-model code from the Training the Model section.
Use the following code to check your accuracy:
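A sketch of the check, using scikit-learn's accuracy score and confusion matrix:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

predictions = xgb_model.predict(X_new)

print('Accuracy on new customers:', accuracy_score(y_new, predictions))
print(confusion_matrix(y_new, predictions))
```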
Confusion matrix of the prediction (figure).
Diagram of the features and labels for the XGBoost model (figure).
Thanks for reading.