Apache PredictionIO and Python
Posted By : Anoop Sharma | 04-Dec-2020
Apache PredictionIO is an open-source Machine Learning server, built on top of a state-of-the-art open-source stack, that helps developers and data scientists create predictive engines for any machine learning task.
Setting up PredictionIO
There are two ways to set up PredictionIO on your Linux system:
- Download the binary file of an already-built distribution
- Build a new binary distribution from source

We'll go with the second option.
Steps to create a PredictionIO binary distribution from scratch on your system
- Download the source code from here - Source code for PredictionIO setup from scratch
- Extract the tar file into a new folder by running this in your terminal, in the directory where the tar file was downloaded:
tar zxvf apache-predictionio-0.14.0.tar.gz
- Once the files are extracted, create the distribution for the required software versions (Scala, Elasticsearch, and Spark) with the command below. Make sure you are in the directory containing the extracted files.
./make-distribution.sh -Dscala.version=2.11.12 -Dspark.version=2.4.0 -Delasticsearch.version=6.4.2
- If the above command is successful, then you’d see something like this on your terminal
…
PredictionIO-0.14.0/sbt/sbt
PredictionIO-0.14.0/conf/
PredictionIO-0.14.0/conf/pio-env.sh
PredictionIO binary distribution created at PredictionIO-0.14.0.tar.gz
- Now extract the binary distribution we just created, in the same directory, using this command:
tar zxvf PredictionIO-0.14.0.tar.gz
Now that we have the binary distribution, let's install the dependencies. We are only going to use Spark, so that is the only dependency we'll install.
- First, create a subdirectory named vendors inside the PredictionIO-0.14.0 folder
mkdir PredictionIO-0.14.0/vendors
- Download and setup Spark using the below commands
$ wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
$ tar zxvfC spark-2.4.0-bin-hadoop2.7.tgz PredictionIO-0.14.0/vendors
Now let’s start the PredictionIO server using the following command
PredictionIO-0.14.0/bin/pio-start-all
You’d get output something like this
$ PredictionIO-0.14.0/bin/pio-start-all
Starting PredictionIO Event Server…
This means everything is working and your server has started. To check the status of your server, run:
$ PredictionIO-0.14.0/bin/pio status
If you see something like the following, you are good to move on to the next part, where you select a template for creating the engine:
(sleeping 5 seconds for all messages to show up...)
Your system is all ready to go.
From here you can select the template you want to use for your engine
Template Gallery for Recommendation engine
I have selected this template, but you can choose any template you want to work on. First, set the path for the pio command like this:
$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH
Now follow these steps to create an engine
- Clone this repo like this
git clone https://github.com/apache/predictionio-template-recommender.git MyRecommendation
It will create a folder named MyRecommendation containing the cloned project. Move into this folder.
- Create an app using this command and store the output you receive after running this command somewhere safe.
$ pio app new MyApp1
Its output would be something like this:
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyApp1
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b
Now let's add data into our engine using a Python library called predictionio. Here are the steps; make sure you are still in the MyRecommendation folder.
$ pip install predictionio
$ curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
$ python data/import_eventserver.py --access_key $ACCESS_KEY
You would see an output something like this
Importing data…
1501 events are imported.
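For a clearer picture of what data/import_eventserver.py is doing, here is a minimal sketch of the same idea: it reads the MovieLens sample file (lines of the form user::item::rating) and sends one "rate" event per line to the Event Server through the predictionio SDK's EventClient. The access key placeholder and the localhost:7070 Event Server URL are assumptions; substitute the key printed by pio app new.

```python
def parse_line(line):
    """Split a MovieLens line 'user::item::rating' into its parts."""
    user, item, rating = line.strip().split("::")
    return user, item, float(rating)

def import_events(access_key, path="data/sample_movielens_data.txt",
                  url="http://localhost:7070"):
    # The import is done here so the parsing helper above can be used
    # even where the predictionio SDK is not installed.
    import predictionio
    client = predictionio.EventClient(access_key=access_key, url=url)
    count = 0
    with open(path) as f:
        for line in f:
            user, item, rating = parse_line(line)
            # One "rate" event: user rated item with the given score.
            client.create_event(
                event="rate",
                entity_type="user", entity_id=user,
                target_entity_type="item", target_entity_id=item,
                properties={"rating": rating},
            )
            count += 1
    print("%d events are imported." % count)

# With the Event Server running, you would call it with your access key:
# import_events("YOUR_ACCESS_KEY")
```

The bundled import script does essentially this, just with command-line argument handling on top.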
Our data is now fed into the engine, so it's time to deploy the engine as a service. Follow these steps:
- Open the engine.json file inside the MyRecommendation folder and change the appName to the one we chose when creating the new app.
...
"datasource": {
  "params" : {
    "appName": "MyApp1"
  }
},
...
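If you would rather not edit the file by hand, a small stdlib sketch like the following can rewrite the datasource appName in place. This assumes engine.json is plain JSON with the layout shown above; the helper names here are hypothetical, not part of PredictionIO.

```python
import json

def set_app_name(config, app_name):
    """Point the engine's datasource at the given PredictionIO app."""
    config["datasource"]["params"]["appName"] = app_name
    return config

def update_engine_json(app_name, path="engine.json"):
    # Read, patch, and write back the template's engine.json.
    with open(path) as f:
        config = json.load(f)
    set_app_name(config, app_name)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)

# Inside the MyRecommendation folder you would run:
# update_engine_json("MyApp1")
```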
- Now build your app using the following command
$ pio build --verbose
And the output for the correct build would be something like this
[INFO] [Console$] Your engine is ready for training.
- Now train your predictive model
$ pio train
Output:
[INFO] [CoreWorkflow$] Training completed successfully.
- Now deploy the engine using the following command
$ pio deploy
Output:
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.
- Now open 0.0.0.0:8000 in your browser to see it live. To check whether your trained model is working, let's hit it with some values. Don't kill the server; open another terminal and run the following command:
$ curl -H "Content-Type: application/json" \
  -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json
If everything went well, you'd get an output something like this:
{
  "itemScores": [
    {"item":"22","score":4.072304374729956},
    {"item":"62","score":4.058482414005789},
    {"item":"75","score":4.046063009943821},
    {"item":"68","score":3.8153661512945325}
  ]
}
This means your engine is working fine.
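The curl query above can also be sent from Python. Here is a stdlib-only sketch that POSTs the same JSON body to the deployed engine and parses the recommendations; it assumes the engine is serving on localhost:8000, as in the pio deploy output.

```python
import json
import urllib.request

def build_query(user, num):
    """The query the Recommendation template expects:
    a user id and how many items to recommend."""
    return {"user": str(user), "num": num}

def recommend(user, num, url="http://localhost:8000/queries.json"):
    # POST the query as JSON and decode the engine's response.
    body = json.dumps(build_query(user, num)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the engine deployed you would call:
# for hit in recommend("1", 4)["itemScores"]:
#     print(hit["item"], hit["score"])
```

The predictionio SDK also offers an EngineClient with a send_query method for the same purpose; the urllib version is shown here to make the wire format explicit.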
Keep Coding!
Oodles Technologies is a well-established IT company that can serve most of your Big Data software needs. Our development services cover open-source Big Data platforms such as Apache Hadoop, MongoDB, etc. We aim to serve our clients' varied needs for Big Data solutions that help them enhance their business reach.
About Author
Anoop Sharma
Anoop is a Python developer who has worked with the Python framework Django and is keen to broaden his skillset in the field. He has a zest for learning, is capable of handling challenges, and is an enthusiastic team player.