Apache PredictionIO and Python
Posted By : Anoop Sharma | 04-Dec-2020
Apache PredictionIO is an open-source Machine Learning server, built on top of a state-of-the-art open-source stack, that helps developers and data scientists create predictive engines for any machine learning task.
Setting up PredictionIO
There are two ways to set up PredictionIO on your Linux system:
- Download the binary file of an already-built distribution
- Build a new binary distribution from source

We'll go with the second option.
Steps to create a PredictionIO binary distribution from scratch on your system
- Download the source code from here - Source code for PredictionIO setup from scratch
- Extract the tar file into a new folder by running this in your terminal, in the directory where the tar file was downloaded:
tar zxvf apache-predictionio-0.14.0.tar.gz
- Once the files are extracted, create the distribution for the required software versions (Scala, Elasticsearch, and Spark) with the command below. Make sure you are in the directory containing the extracted files.
./make-distribution.sh -Dscala.version=2.11.12 -Dspark.version=2.4.0 -Delasticsearch.version=6.4.2
- If the above command is successful, then you’d see something like this on your terminal
…
PredictionIO-0.14.0/sbt/sbt
PredictionIO-0.14.0/conf/
PredictionIO-0.14.0/conf/pio-env.sh
PredictionIO binary distribution created at PredictionIO-0.14.0.tar.gz
- Now extract the binary distribution we just created, in the same directory, using this command:
tar zxvf PredictionIO-0.14.0.tar.gz
Now that we have the binary distribution, let's install the dependencies. We are only going to use Spark, so that is the only dependency we'll install.
- First, create a subdirectory named vendors inside the PredictionIO-0.14.0 folder
mkdir PredictionIO-0.14.0/vendors
- Download and setup Spark using the below commands
$ wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
$ tar zxvfC spark-2.4.0-bin-hadoop2.7.tgz PredictionIO-0.14.0/vendors
Now let’s start the PredictionIO server using the following command
PredictionIO-0.14.0/bin/pio-start-all
You’d get output something like this
$ PredictionIO-0.14.0/bin/pio-start-all
Starting PredictionIO Event Server…
This means everything is working and your server has started. To check the status of your server, run:
$ PredictionIO-0.14.0/bin/pio status
If you see something like the following, you are good to move on to the next part, where you select a template for creating the engine:
(sleeping 5 seconds for all messages to show up...)
Your system is all ready to go.
From here you can select the template you want to use for your engine
Template Gallery for Recommendation engine
I have selected this template, but you can choose any template you want to work on. First, set the path for the pio command like this:
$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH
Now follow these steps to create an engine
- Clone this repo like this
git clone https://github.com/apache/predictionio-template-recommender.git MyRecommendation
It will create a folder named MyRecommendation containing the cloned project. Move into this folder.
- Create an app using this command and store the output you receive after running this command somewhere safe.
$ pio app new MyApp1
Its output would be something like this:
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyApp1
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b
Now let's add data into our engine using a Python library called predictionio. Here are the steps; make sure you are still in the MyRecommendation folder.
$ pip install predictionio
$ curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
$ python data/import_eventserver.py --access_key $ACCESS_KEY
You would see an output something like this
Importing data…
1501 events are imported.
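For a clearer picture of what data/import_eventserver.py is doing, here is a minimal sketch of the same idea: it reads the MovieLens sample file (lines of the form user::item::rating) and sends one "rate" event per line to the Event Server through the predictionio SDK's EventClient. The access key placeholder and the localhost:7070 Event Server URL are assumptions; substitute the key printed by pio app new.

```python
def parse_line(line):
    """Split a MovieLens line 'user::item::rating' into its parts."""
    user, item, rating = line.strip().split("::")
    return user, item, float(rating)

def import_events(access_key, path="data/sample_movielens_data.txt",
                  url="http://localhost:7070"):
    # The import is done here so the parsing helper above can be used
    # even where the predictionio SDK is not installed.
    import predictionio
    client = predictionio.EventClient(access_key=access_key, url=url)
    count = 0
    with open(path) as f:
        for line in f:
            user, item, rating = parse_line(line)
            # One "rate" event: user rated item with the given score.
            client.create_event(
                event="rate",
                entity_type="user", entity_id=user,
                target_entity_type="item", target_entity_id=item,
                properties={"rating": rating},
            )
            count += 1
    print("%d events are imported." % count)

# With the Event Server running, you would call it with your access key:
# import_events("YOUR_ACCESS_KEY")
```

The bundled import script does essentially this, just with command-line argument handling on top.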
Our data is now fed into the engine, so it's time to deploy the engine as a service. Follow these steps:
- Open the engine.json file inside the MyRecommendation folder and change the appName to the one we chose when creating the new app.
...
"datasource": {
  "params" : {
    "appName": "MyApp1"
  }
},
...
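If you would rather not edit the file by hand, a small stdlib sketch like the following can rewrite the datasource appName in place. This assumes engine.json is plain JSON with the layout shown above; the helper names here are hypothetical, not part of PredictionIO.

```python
import json

def set_app_name(config, app_name):
    """Point the engine's datasource at the given PredictionIO app."""
    config["datasource"]["params"]["appName"] = app_name
    return config

def update_engine_json(app_name, path="engine.json"):
    # Read, patch, and write back the template's engine.json.
    with open(path) as f:
        config = json.load(f)
    set_app_name(config, app_name)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)

# Inside the MyRecommendation folder you would run:
# update_engine_json("MyApp1")
```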
- Now build your app using the following command
$ pio build --verbose
And the output for the correct build would be something like this
[INFO] [Console$] Your engine is ready for training.
- Now train your predictive model
$ pio train
Output:
[INFO] [CoreWorkflow$] Training completed successfully.
- Now deploy the engine using the following command
$ pio deploy
Output:
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.
- Now open 0.0.0.0:8000 in your browser to see it live. To check whether your trained model is working, let's hit it with some values. Don't kill the server; open another terminal and run the following command:
$ curl -H "Content-Type: application/json" \
  -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json
If everything went well, you'd get an output something like this:
{
  "itemScores": [
    {"item":"22","score":4.072304374729956},
    {"item":"62","score":4.058482414005789},
    {"item":"75","score":4.046063009943821},
    {"item":"68","score":3.8153661512945325}
  ]
}
This means your engine is working fine.
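The curl query above can also be sent from Python. Here is a stdlib-only sketch that POSTs the same JSON body to the deployed engine and parses the recommendations; it assumes the engine is serving on localhost:8000, as in the pio deploy output.

```python
import json
import urllib.request

def build_query(user, num):
    """The query the Recommendation template expects:
    a user id and how many items to recommend."""
    return {"user": str(user), "num": num}

def recommend(user, num, url="http://localhost:8000/queries.json"):
    # POST the query as JSON and decode the engine's response.
    body = json.dumps(build_query(user, num)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the engine deployed you would call:
# for hit in recommend("1", 4)["itemScores"]:
#     print(hit["item"], hit["score"])
```

The predictionio SDK also offers an EngineClient with a send_query method for the same purpose; the urllib version is shown here to make the wire format explicit.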
Keep Coding!
Oodles Technologies is a well-established IT company that can serve most of your Big Data software needs. Our development services cover open-source Big Data platforms such as Apache Hadoop, MongoDB, etc. We aim to serve our clients' varied needs for Big Data solutions that help them enhance their business reach.
About Author
Anoop Sharma
Anoop is a Python developer who has worked with the Python framework Django and is keen to broaden his skillset in the field. He has a zest for learning, is capable of handling challenges, and is an enthusiastic team player.