Apache PredictionIO and Python

Posted By : Anoop Sharma | 04-Dec-2020

Apache PredictionIO is an open-source machine learning server, built on top of a state-of-the-art open-source stack, that helps developers and data scientists create predictive engines for any machine learning task.

Setting up PredictionIO 

There are two ways in which you can set up PredictionIO on your Linux system:

  • Download a pre-built binary distribution
  • Build a binary distribution from source, from scratch

We'll work with the second option.

Steps to create a PredictionIO binary distribution from scratch on your system

  • Download the apache-predictionio-0.14.0.tar.gz source archive from the Apache PredictionIO download page, then extract it:

tar zxvf apache-predictionio-0.14.0.tar.gz
  • Once the files are extracted, build the distribution for the required versions of Scala, Elasticsearch, and Spark with the following command. Make sure you are in the directory where the extracted files are kept:

 

./make-distribution.sh -Dscala.version=2.11.12 -Dspark.version=2.4.0 -Delasticsearch.version=6.4.2
  • If the above command succeeds, you'll see something like this in your terminal:

 


PredictionIO-0.14.0/sbt/sbt
PredictionIO-0.14.0/conf/
PredictionIO-0.14.0/conf/pio-env.sh
PredictionIO binary distribution created at PredictionIO-0.14.0.tar.gz
  • Now extract the binary distribution we just created, in the same directory, using this command:

 

tar zxvf PredictionIO-0.14.0.tar.gz

Now that we have the binary distribution, let's install the dependencies. We're only going to use Spark as a dependency, so that's the only one we'll install.

  • First, create a subdirectory named vendors inside the PredictionIO-0.14.0 folder:


    mkdir PredictionIO-0.14.0/vendors
    

     
  • Download and set up Spark using the commands below:

     
$ wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
$ tar zxvfC spark-2.4.0-bin-hadoop2.7.tgz PredictionIO-0.14.0/vendors
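PredictionIO locates Spark through the SPARK_HOME setting in PredictionIO-0.14.0/conf/pio-env.sh. If the server fails to start in the next step, check that this variable points at the Spark directory extracted above. A fragment like the following reflects the layout used in this post (the exact line in your pio-env.sh may differ):

```
# in PredictionIO-0.14.0/conf/pio-env.sh
SPARK_HOME=$PIO_HOME/vendors/spark-2.4.0-bin-hadoop2.7
```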

 

Now let's start the PredictionIO server using the following command:
 

PredictionIO-0.14.0/bin/pio-start-all 

You should get output something like this:

 

$ PredictionIO-0.14.0/bin/pio-start-all 
Starting PredictionIO Event Server

This means everything is working and your server has started. To check the status of your server, just type $ PredictionIO-0.14.0/bin/pio status. If you see something like the output below, you are good to go to the next part, where you'll select the template for creating the engine:

 

(sleeping 5 seconds for all messages to show up...)
Your system is all ready to go.
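Besides pio status, the event server itself answers plain HTTP: a GET on its root (port 7070 by default) returns a small JSON status document reporting that it is alive. A quick Python health check might look like the sketch below; the URL and the exact response shape are assumptions based on the default setup:

```python
import json
import urllib.request

def parse_status(payload):
    """Return True if the event server's JSON status reports it as alive."""
    return json.loads(payload).get("status") == "alive"

def event_server_alive(url="http://localhost:7070"):
    # Hits the event server started by pio-start-all; False if unreachable.
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return parse_status(resp.read().decode("utf-8"))
    except OSError:
        return False
```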

From here you can select the template you want to use for your engine
Template Gallery for Recommendation engine 

I have selected this template, but you can choose any template you want to work with. First, set the path for using the pio command like this:

 

$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH

Now follow these steps to create an engine

  • Clone the template repository like this:
     
git clone https://github.com/apache/predictionio-template-recommender.git MyRecommendation

It will create a folder named MyRecommendation with the cloned project inside it. Move into this folder.

  • Create an app using this command, and store the output you receive somewhere safe:
     
$ pio app new MyApp1

Its output will look something like this:
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$]       Name: MyApp1
[INFO] [App$]         ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b

Now let's add data to our engine using a Python library called predictionio. Here are the steps; make sure you are still in the MyRecommendation folder, and replace $ACCESS_KEY below with the Access Key printed when you created the app (or export it as an environment variable first).

 

$ pip install predictionio
$ curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
$ python data/import_eventserver.py --access_key $ACCESS_KEY

You should see output something like this:

 

Importing data
1501 events are imported.
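For reference, the template's import script uses the predictionio SDK's EventClient to send one "rate" event per line of the MovieLens file, where each line has the form user::item::rating. A minimal sketch of the same idea, with the event-server URL assumed to be the default http://localhost:7070:

```python
def line_to_event(line):
    """Turn one 'user::item::rating' line into EventClient keyword arguments."""
    user, item, rating = line.strip().split("::")
    return {
        "event": "rate",
        "entity_type": "user",
        "entity_id": user,
        "target_entity_type": "item",
        "target_entity_id": item,
        "properties": {"rating": float(rating)},
    }

def import_events(access_key, path="data/sample_movielens_data.txt"):
    # Requires `pip install predictionio` and a running event server.
    import predictionio
    client = predictionio.EventClient(
        access_key=access_key, url="http://localhost:7070")
    count = 0
    with open(path) as f:
        for line in f:
            client.create_event(**line_to_event(line))
            count += 1
    client.close()
    print("%d events are imported." % count)
```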

Our data is now fed into the engine. It's time to deploy the engine as a service. Follow these steps:

  • Open the engine.json file inside the MyRecommendation folder and change appName to the name of the app we created earlier:
     
     

 

  ...
  "datasource": {
    "params" : {
      "appName": "MyApp1"
    }
  },
  ...
  • Now build your app using the following command
     
$ pio build --verbose

The output for a successful build will look something like this:

 

[INFO] [Console$] Your engine is ready for training.
  • Now train your predictive model

     
$ pio train

Output :

[INFO] [CoreWorkflow$] Training completed successfully.
  • Now deploy the engine using the following command

     
$ pio deploy

Output:

[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.
  • Now you can go to your browser and see it live at 0.0.0.0:8000. To check whether your trained model is working, let's hit it with some values. Don't kill the server; open another terminal and run the following commands:

     
$ curl -H "Content-Type: application/json" \
-d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json

If everything is working, you'll get an output something like this:

 

{
  "itemScores":[
    {"item":"22","score":4.072304374729956},
    {"item":"62","score":4.058482414005789},
    {"item":"75","score":4.046063009943821},
    {"item":"68","score":3.8153661512945325}
  ]
}

This means your engine is working fine.
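The same query can also be sent from Python with the predictionio SDK's EngineClient instead of curl. A small sketch, with the engine URL assumed to match the deployment address above, plus a helper for pulling the best items out of the response:

```python
def top_items(response, n=1):
    """Return the n highest-scoring item ids from a queries.json response."""
    ranked = sorted(response["itemScores"],
                    key=lambda s: s["score"], reverse=True)
    return [s["item"] for s in ranked[:n]]

def query_engine(user_id, num=4):
    # Requires `pip install predictionio` and the engine deployed on port 8000.
    import predictionio
    client = predictionio.EngineClient(url="http://localhost:8000")
    return client.send_query({"user": user_id, "num": num})
```

For the sample response above, top_items(response, 2) would return ['22', '62'].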

Keep Coding! 

 

Oodles Technologies is a well-established IT company that can serve most of your Big Data software needs. Our development services include open-source Big Data platforms such as Apache Hadoop and MongoDB. We aim to serve our clients' varied needs for Big Data solutions that help them enhance their business reach.

About Author

Anoop Sharma

Anoop is a Python developer, who has worked on Python Framework Django and is keen to increase his skillset in the field. He has a zest for learning and is capable of handling challenges. He is a team player and has good enthusiasm.
