Combining Apache Spark, XGBoost, MLeap and the Play! framework to predict customer churn in telecommunications companies
Cancellation of services by existing customers is a phenomenon that is omnipresent in the business world and particularly common in highly competitive economic environments.
The telecommunications sector is in constant turmoil because of how attractive this market is. Users are less and less willing to put up with a non-functional service; they walk away easily and subscribe to a competitor.
From there, we understand the extent of customer churn in telecommunications companies.
A team of data enthusiasts at BAAMTU has implemented a solution that can reduce this churn phenomenon by combining Big Data and machine learning techniques.
But what exactly is churn?
Customer churn can be defined as the process by which a customer permanently suspends the use of services to which he or she has previously subscribed. This suspension may also be synonymous with non-use for a given period of time.
The marketing strategies traditionally used by telecommunications companies to manage their customers and prevent churn generally fall into three categories:
- acquiring new customers
- up-selling to existing customers
- retaining customers
Each of these techniques comes at a certain cost for the company. Acquisition plays a very important role in the marketing strategies of telecommunications companies, and most of them focus solely on it.
This is far from optimal: as long as acquisition monopolizes companies’ marketing efforts, customer retention will continue to suffer and the churn rate will keep increasing.
What about churn rate?
Churn rate is a measure of the number of customers who stop using the company’s services over a given period. Its measurement is very important because its fluctuations inform on the “state of health”, if I may say so, of the business. For example, a company that starts a quarter with 100,000 subscribers and loses 5,000 of them over that quarter has a quarterly churn rate of 5%.
This churn rate therefore, when not controlled, significantly erodes the acquisition efforts made by the marketing unit. Hence the need to put in place an effective retention policy.
But what to do when you have an overwhelming number of customers?
Telecommunications companies often have a very large number of customers to manage and therefore cannot afford to focus on each of them in an isolated manner.
But imagine if they could know in advance that a given customer is very likely to leave them. This would reduce retention-related marketing efforts by redirecting them to precisely the customers at risk.
Process
The idea is to set up a model that predicts churn, in other words a tool that can tell us that a given customer will churn with a certain appetence (propensity) score. The objective is to allow a manager to make these predictions interactively, with relatively low latency, and to get a visual stratification of clients according to their appetence scores.
The predictive model will be built on ‘historical’ data. By historical data, we mean data from customers who have stopped using the services, but also from customers who are still subscribed, as supervised machine learning requires.
In the following part we will discuss the tools necessary to achieve our goal.
Tools
Apache Spark
In terms of customer knowledge, the more heterogeneous the data describing them is, the better it is for us!
Thanks to the distributed paradigm, storing and processing large and varied datasets is no longer an obstacle.
Apache Spark is an engine for large-scale data processing that offers several processing modes, including batch, real-time (streaming), graph and machine learning workloads.
Spark’s machine learning library, MLlib, supports several classes of classification, regression and dimensionality reduction algorithms, as well as a large number of tools for data pre-processing.
For the creation of our model, we use the XGBoost library through its JVM packages (XGBoost4J-Spark) combined with Apache Spark. This lets us gain speed during training; the choice of parameters through cross-validation is also done in a distributed way.
We use Spark through its Scala API.
XGBoost
XGBoost, or eXtreme Gradient Boosting, is an algorithm based on gradient boosting. Boosting, in contrast to bagging, consists in training a given number of decision trees sequentially, each tree correcting the errors made during the previous iteration.
Models are added sequentially until no improvements can be made. The final prediction is obtained by summing the predictions made by the different decision trees.
The term gradient boosting refers to the use of the gradient descent algorithm to minimize loss or error when adding new models.
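Concretely, if $f_1, \dots, f_K$ denote the $K$ trees produced by boosting, the final prediction for a customer described by a feature vector $x_i$ can be written as

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),$$

where each new tree $f_k$ is fitted so as to reduce the loss left by the trees built before it.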
MLeap
We often need to provide the end user with the ability to interact with our model, whether through a web application, a mobile application or APIs…
To do this, the trained model needs to be usable on all types of platforms. This is why the MLeap project was created. MLeap is a probable successor of PMML, the XML-based export format for machine learning models. MLeap offers the serialization of models built with Spark MLlib, Scikit-Learn or TensorFlow into a JSON or Protobuf representation, and exports them under a single format called Bundle, usable in particular on platforms running the JVM (Java, Scala, …).
The models can then be used independently (without any dependency) of their original training platform (Spark, TensorFlow, Scikit-Learn, …), using only the runtime: MLeap Runtime.
We use MLeap through its Scala API.
Play
Play is an MVC framework for creating web applications.
Play offers support for Java and Scala programming languages.
Better yet, Play! comes packaged with its own server: Akka HTTP since version 2.6.x, and Netty before that. There is therefore no need to configure a web server for the development and production environments.
In addition, it offers an asynchronous engine based on Akka.
In our case, we use Play through its Scala API to set up a web application through which a manager can send prediction requests for his clients and view the results.
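As an illustration, a minimal controller for this could look like the sketch below; ChurnScorer is a hypothetical service standing for the component that wraps the MLeap model:
import javax.inject._
import play.api.libs.json._
import play.api.mvc._

// hypothetical service wrapping the loaded MLeap model
trait ChurnScorer { def score(customer: JsValue): Double }

@Singleton
class ChurnController @Inject() (scorer: ChurnScorer, cc: ControllerComponents) extends AbstractController(cc) {
  // POST endpoint receiving one customer as JSON and returning its churn score
  def predict: Action[JsValue] = Action(parse.json) { request =>
    Ok(Json.obj("churnScore" -> scorer.score(request.body)))
  }
}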
Scala
As you may have noticed, we love Scala. Why? Well! It’s quite simple.
The following phrase best describes my feelings towards the language, I think:
It’s functional, it’s object-oriented, it’s…, it’s everything you needed and more!
As you will have understood, Scala is an object-oriented programming language in the sense that everything is an object, and a functional one in the sense that every function is a value. Scala has excellent type support and also offers type inference.
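A couple of lines are enough to make the point (a generic illustration of the language, not code from the churn project):
// every function is a value: `double` can be stored, passed around and composed
val double: Int => Int = x => x * 2
// type inference: the compiler figures out that `doubled` is a List[Int]
val doubled = List(1, 2, 3).map(double)
// everything is an object: even a literal integer has methods
val label = 42.toString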
Description of the Data used
As mentioned above, we are implementing a predictive model of customer churn, i.e., we propose a model that predicts the likelihood that a given customer will discontinue the use of the company’s services.
The model produced is the result of supervised learning, more precisely of a binary classification, and therefore requires labelled data (data that carries the variable to be predicted).
The following table describes the different fields present in the data:
| FIELD | DESCRIPTION |
| --- | --- |
| Account Length | Equivalent to the client’s seniority in the business |
| VMail Message | Aggregation of the customer’s voice messages |
| Day Minutes | Accumulated minutes of the client’s actions during the day |
| Eve Minutes | Accumulated minutes of the client’s actions during the evening |
| Night Minutes | Accumulated minutes of the client’s actions during the night |
| International Minutes | Accumulated minutes of the client’s international actions |
| Customer Service Calls | Accumulated calls from the customer to customer service |
| Churn | Target variable we are trying to predict: whether or not the client churns |
| International Plan | Whether the customer has access to the international plan |
| VMail Plan | Whether the customer has a voicemail plan |
| Day Calls | Accumulated calls from the customer during the day |
| Day Charge | Accumulated credit amount of the customer’s actions during the day |
| Eve Calls | Accumulated calls from the customer during the evening |
| Eve Charge | Accumulated credit amount of the customer’s actions during the evening |
| Night Calls | Accumulated calls from the customer during the night |
| Night Charge | Accumulated credit amount of the customer’s actions during the night |
| International Calls | Accumulated international calls from the customer |
| International Charge | Accumulated credit amount of the client’s international actions |
| State | Customer’s state of origin |
| Area Code | Customer’s area code |
| Phone | Customer’s telephone number |
These data include a few categories of customer data commonly used in telecommunications companies, namely:
- Usage data: any customer data referring to the use of the services: calls, voice messages, internet services…
- Interaction data: informing about the customer’s interactions with services such as customer service, call centers, etc.
- Context data: any other data that may characterize the customer, such as personal data
Functional architecture
It can be noted that contextual data such as state, area code and phone are not very significant in the problem we are trying to solve. These variables were therefore excluded from the process.
So we will no longer have to deal with categorical features in the process of building our model.

As said earlier, Spark provides a set of tools to deal with the different tasks involved in a model building process:
- Transformers: involved in the tasks of scaling, converting or modifying features;
- Extractors: help in extracting features from raw data;
- Selectors: provide the ability to select a subset of features from a larger set of features; and
- Estimators: abstract the concept of a learning algorithm that fits on data.
Our first step consists in exploratory tasks on the data: numerical statistics on the columns of numerical variables, a correlation and intercorrelation study, and the predictive power of each variable. These tasks aim to provide a good understanding of the data we are given.
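A minimal sketch of this step, assuming the dataset has been loaded into a DataFrame named df:
// summary statistics (count, mean, stddev, min, max) of some numerical columns
df.describe("Account Length", "Day Minutes", "Customer Service Calls").show()
// Pearson correlation between two candidate features
val corr = df.stat.corr("Day Minutes", "Day Charge")
println(s"corr(Day Minutes, Day Charge) = $corr")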
While preprocessing, we bucketize our continuous features. Spark ML’s Bucketizer transformer transforms a column of continuous features into a column of feature buckets, where the buckets are specified by the user.
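A sketch of its use on one column; the split points below are illustrative, the real boundaries come out of the exploration step (we also assume stages was declared as var stages = Array.empty[org.apache.spark.ml.PipelineStage]):
import org.apache.spark.ml.feature.Bucketizer

// illustrative splits: (-inf, 100), [100, 200) and [200, +inf)
val bucketizer = new Bucketizer("Churn_Bucketizer")
  .setInputCol("Day Minutes")
  .setOutputCol("Day Minutes_bucketized")
  .setSplits(Array(Double.NegativeInfinity, 100.0, 200.0, Double.PositiveInfinity))
stages :+= bucketizer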
We finally gather all our features into a single vector, using Spark ML’s VectorAssembler transformer. This allows us to perform a dimensionality reduction using Spark ML’s PCA transformer.
PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
// assemble the (bucketized) feature columns into a single vector column
val assembler = new VectorAssembler("Churn_Assembler")
  .setInputCols(assemblerColumns.toArray)
  .setOutputCol("assembled")
stages :+= assembler
// reduce the assembled vector to its principal components
val reducer = new PCA("Churn_PCA")
  .setInputCol(assembler.getOutputCol)
  .setOutputCol("features")
  .setK(reducer_num_dimensions)
stages :+= reducer
The estimator used here is the XGBoostClassifier class of the XGBoost classification package. It is given the appropriate parameters obtained after cross-validation. For an explanation of the meaning of these parameters and more, one can visit the XGBoost website.
val xgbParam = Map(
  "eta" -> booster_eta,
  "max_depth" -> booster_max_depth,
  "objective" -> booster_objective,
  "num_round" -> booster_num_round,
  "num_workers" -> booster_num_workers,
  "min_child_weight" -> booster_min_child_weight,
  "gamma" -> booster_gamma,
  "alpha" -> booster_alpha,
  "lambda" -> booster_lambda,
  "subsample" -> booster_subsample,
  "colsample_bytree" -> booster_colsample_bytree,
  "scale_pos_weight" -> booster_scale_pos_weight,
  "base_score" -> booster_base_score
)
val xgbClassifier = new XGBoostClassifier(xgbParam)
  .setFeaturesCol(reducer.getOutputCol)
  .setLabelCol("label")
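The distributed cross-validation mentioned earlier can be run with Spark’s tuning tools. A sketch, with purely illustrative grid values, assuming this XGBoost4J-Spark version exposes eta and maxDepth as tunable Params:
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// illustrative grid: the values actually explored are not shown here
val paramGrid = new ParamGridBuilder()
  .addGrid(xgbClassifier.maxDepth, Array(4, 6, 8))
  .addGrid(xgbClassifier.eta, Array(0.05, 0.1, 0.3))
  .build()
val cv = new CrossValidator()
  .setEstimator(xgbClassifier)
  .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("label"))
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(5)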
The classifier is then added to our stages to form the final pipeline:
stages :+= xgbClassifier
val pipeline = new Pipeline("Churn_Pipeline").setStages(stages)
The model is obtained by fitting this pipeline to the training data. The resulting model is exported as an MLeap Bundle in Protobuf format:
// assumes the MLeap Spark imports: ml.combust.bundle.BundleFile, ml.combust.mleap.spark.SparkSupport._,
// org.apache.spark.ml.bundle.SparkBundleContext, ml.combust.bundle.serializer.SerializationFormat
// and resource._ (which provides `managed`)
val model = pipeline.fit(train) // `train` is the training DataFrame
implicit val sbc: SparkBundleContext = SparkBundleContext()
  .withDataset(model.transform(train))
(for (bundle <- managed(BundleFile("jar:file:/tmp/churn-model-protobuff.zip"))) yield {
  model.writeBundle.format(SerializationFormat.Protobuf).save(bundle).get
}).tried.get
The exported model will be used in our web application to perform churn prediction for new customers online.
// in the web application, the bundle is loaded once with the MLeap runtime
// (assumes import ml.combust.mleap.runtime.MleapSupport._ and resource._)
val dirBundle: Transformer = (for (bundle <- managed(BundleFile("jar:" + cwd + "conf/resources/churn-model-protobuff.zip"))) yield {
  bundle.loadMleapBundle().get.root
}).tried.get
// `leapFrame` holds the features of the customers to score (see the sketch below)
val transformedLeapFrame = dirBundle.transform(leapFrame).get
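For reference, the leap frame itself can be built from plain Scala values. A minimal sketch with two of the fields from the table above and made-up values:
import ml.combust.mleap.core.types._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}

// the schema must match the column names and types used during training
val schema = StructType(
  StructField("Account Length", ScalarType.Double),
  StructField("Day Minutes", ScalarType.Double)
).get
// one Row per customer to score (values are made up)
val leapFrame = DefaultLeapFrame(schema, Seq(Row(128.0, 265.1)))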
Results
Eventually, we will need to add customer information for scoring. To do this, you can add customers individually or several at a time by uploading a file. As a reminder, since the model is the result of supervised learning, it must be given data with the same names and the same number of variables as those used during training, as shown in the screenshot below.

The next step after adding clients is the actual prediction using the previously exported model.

The data shown below are output by the model. They provide the probability for the client to belong to each of the two classes, in other words, the appetence score of the client to be a “churner” or not.
Depending on these values, the customer is predicted as a potential “churner” or not, with respect to the chosen threshold. The threshold here is a limit value defined by the company.

These predictions can eventually be stored in a database for future use.
Depending on their churning appetence scores, customers can be classified as “high risk”, “medium risk” or “low risk”.
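A possible banding rule (the cut-off values below are illustrative; in practice they are set by the company):
// maps a churn probability to a risk band; the thresholds are illustrative
def riskBand(churnProbability: Double): String =
  if (churnProbability >= 0.7) "high risk"
  else if (churnProbability >= 0.4) "medium risk"
  else "low risk"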

This segmentation can help redirect retention marketing efforts more accurately to targeted customers, such as sending targeted messages:

Conclusion
The phenomenon of churn is omnipresent in telecommunications, due in particular to the high level of competitiveness that prevails there. Thinking about this phenomenon in a different way by advocating retention could be very beneficial. Through prediction, the use of machine learning techniques coupled with Big Data can help redirect marketing strategies aimed at retention to the right customers.
On another level, a customer is influenced by his peers (other customers with whom he often communicates): if many of his peers have churned, it is very likely that he will follow.
Considering information no longer limited to the user’s interaction with the company’s services, but also information related to his interaction with his ‘network’, can help increase the performance and accuracy of a model that predicts customer churn.
Awa Thiam