Blog

  • Mahout

    This post shows a simple way to configure and create a Mahout project in the Eclipse IDE. Apache Mahout provides free implementations of distributed or otherwise scalable machine learning algorithms on the Hadoop platform.

     

    Mahout mainly supports four use cases:

     

    Recommendation mining takes users' behavior and tries to find items those users might like.

    Clustering takes, for example, text documents and groups them into sets of topically related documents.

    Classification learns from existing categorized documents what documents of a specific category look like, and can then assign unlabelled documents to the (hopefully) correct category.

    Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart contents) and identifies which individual items usually appear together.
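To make the last use case concrete, here is a minimal plain-Java sketch that counts how often each pair of items shows up in the same group. It is a brute-force illustration only, not the algorithm Mahout uses, and it assumes Java 8 or later. The class name `CooccurrenceSketch` and the shopping carts are hypothetical and made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Brute-force pair counting, for illustration only (not Mahout's algorithm).
class CooccurrenceSketch {

    // Count how often each unordered pair of items appears in the same group.
    // Assumes an item is listed at most once per group.
    static Map<String, Integer> countPairs(List<List<String>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> group : groups) {
            for (int i = 0; i < group.size(); i++) {
                for (int j = i + 1; j < group.size(); j++) {
                    String a = group.get(i);
                    String b = group.get(j);
                    // Order the two names so "milk+bread" and "bread+milk" share a key.
                    String key = a.compareTo(b) < 0 ? a + "+" + b : b + "+" + a;
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical shopping carts, invented for this sketch.
        List<List<String>> carts = Arrays.asList(
            Arrays.asList("bread", "milk", "eggs"),
            Arrays.asList("bread", "milk"),
            Arrays.asList("milk", "eggs"));
        // Prints {bread+eggs=1, bread+milk=2, eggs+milk=2}
        System.out.println(countPairs(carts));
    }
}
```

Pairs with the highest counts are the items that "usually appear together". A real frequent itemset miner also handles larger item sets and prunes rare ones, which this sketch does not attempt.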

    Prerequisites

    • Java 1.6 or later. Install Java on your Linux machine and set JAVA_HOME.
    • Hadoop - please refer to my blog on
    • Maven

      You can install Maven using the following command:

          sudo apt-get install maven2
      

      To make sure the installation succeeded, run the following command in your console:

      
          $ mvn --help
      
    • Mahout

      
          sudo apt-get install mahout
      

      Now let us go through the steps.

        • Configure Maven in your Eclipse IDE by installing the m2e plugin from the Eclipse Marketplace:

          
          Help -> Eclipse Marketplace -> m2e
           
        • Now edit the pom.xml file and add the following dependencies:

           

          
            
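            <dependencies>
              <dependency>
                <groupId>org.apache.mahout</groupId>
                <artifactId>mahout-core</artifactId>
                <version>0.5</version>
              </dependency>
              <dependency>
                <groupId>org.apache.mahout</groupId>
                <artifactId>mahout-math</artifactId>
                <version>0.5</version>
              </dependency>
              <dependency>
                <groupId>org.apache.mahout</groupId>
                <artifactId>mahout-math</artifactId>
                <version>0.5</version>
                <type>test-jar</type>
                <scope>test</scope>
              </dependency>
              <dependency>
                <groupId>org.apache.mahout</groupId>
                <artifactId>mahout-utils</artifactId>
                <version>0.5</version>
              </dependency>
            </dependencies>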
          

          These entries pull in Mahout version 0.5.

        • Run mvn install for your Maven project

          This resolves the dependencies just added to pom.xml: Mahout version 0.5 and the libraries it depends on are downloaded and added to the project.

          Now we are ready to run our first program in Eclipse.

           

          In src/main/java, create a package and add your class to it.

        • Now simply run this program as a Java application and you are done.

        • Test

        • Create a class in src/main/java

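        • Replace the main() function with the following code (the import lines go at the top of the class file)

          
              import java.io.File;
              import java.util.List;
              import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
              import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
              import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
              import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
              import org.apache.mahout.cf.taste.model.DataModel;
              import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
              import org.apache.mahout.cf.taste.recommender.RecommendedItem;
              import org.apache.mahout.cf.taste.recommender.Recommender;
              import org.apache.mahout.cf.taste.similarity.UserSimilarity;

              public static void main(String[] args) throws Exception {
                  // Load userID,itemID,preference triples from the CSV file.
                  DataModel model = new FileDataModel(new File("path of file xyz.csv"));
                  UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                  UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
                  Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
                  // Ask for 1 recommendation for user 1, based on the 2 nearest neighbors.
                  List<RecommendedItem> recommendations = recommender.recommend(1, 1);
                  for (RecommendedItem recommendation : recommendations) {
                      System.out.println(recommendation);
                  }
              }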
          
        • The file xyz.csv should contain the following data, one userID,itemID,preference triple per line:

          1,101,5.0
          1,102,3.0
          1,103,2.5
          2,101,2.0
          2,102,2.5
          2,103,5.0
          2,104,2.0
          3,101,2.5
          3,104,4.0
          3,105,4.5
          3,107,5.0
          4,101,5.0
          4,103,3.0
          4,104,4.5
          4,106,4.0
          5,101,4.0
          5,102,3.0
          5,103,2.0
          5,104,4.0
          5,105,3.5
          5,106,4.0

        • When you run the program as a Java application, the output console shows a recommendation like the following: item:104, value:4.257
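To see where the value 4.257 for item 104 comes from, the arithmetic can be reproduced without Mahout. The sketch below is plain Java (assuming Java 8 or later) and is not Mahout's implementation: it computes the Pearson correlation between user 1 and every other user over the items they both rated, keeps the 2 most similar users, and estimates the preference as the similarity-weighted average of their ratings. The class and method names (`RecommendationSketch`, `load`, `pearson`, `estimate`) are hypothetical; the data is the content of xyz.csv.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the arithmetic behind the result; not Mahout's own code.
class RecommendationSketch {

    // Same userID,itemID,preference triples as xyz.csv.
    static final String[] ROWS = {
        "1,101,5.0", "1,102,3.0", "1,103,2.5",
        "2,101,2.0", "2,102,2.5", "2,103,5.0", "2,104,2.0",
        "3,101,2.5", "3,104,4.0", "3,105,4.5", "3,107,5.0",
        "4,101,5.0", "4,103,3.0", "4,104,4.5", "4,106,4.0",
        "5,101,4.0", "5,102,3.0", "5,103,2.0", "5,104,4.0", "5,105,3.5", "5,106,4.0"
    };

    // Parse the rows into userID -> (itemID -> preference).
    static Map<Long, Map<Long, Double>> load() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        for (String row : ROWS) {
            String[] f = row.split(",");
            prefs.computeIfAbsent(Long.parseLong(f[0]), k -> new HashMap<>())
                 .put(Long.parseLong(f[1]), Double.parseDouble(f[2]));
        }
        return prefs;
    }

    // Pearson correlation over the items both users rated; NaN if it is undefined.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) common.add(item);
        }
        int n = common.size();
        if (n == 0) return Double.NaN;
        double meanA = 0, meanB = 0;
        for (Long item : common) { meanA += a.get(item); meanB += b.get(item); }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db; varA += da * da; varB += db * db;
        }
        return cov / Math.sqrt(varA * varB); // 0/0 gives NaN, e.g. for one common item
    }

    // Similarity-weighted average of the n most similar users' ratings for `item`.
    static double estimate(Map<Long, Map<Long, Double>> prefs, long user, long item, int n) {
        Map<Long, Double> sims = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : prefs.entrySet()) {
            if (e.getKey() == user) continue;
            double s = pearson(prefs.get(user), e.getValue());
            if (!Double.isNaN(s)) sims.put(e.getKey(), s);
        }
        // Rank the other users from most to least similar.
        List<Map.Entry<Long, Double>> ranked = new ArrayList<>(sims.entrySet());
        ranked.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        double weighted = 0, total = 0;
        for (Map.Entry<Long, Double> e : ranked.subList(0, Math.min(n, ranked.size()))) {
            Double pref = prefs.get(e.getKey()).get(item);
            if (pref != null) { weighted += e.getValue() * pref; total += e.getValue(); }
        }
        return weighted / total;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = load();
        // Prints sim(1,4)=1.0000 sim(1,5)=0.9449
        System.out.printf("sim(1,4)=%.4f sim(1,5)=%.4f%n",
            pearson(prefs.get(1L), prefs.get(4L)), pearson(prefs.get(1L), prefs.get(5L)));
        // Prints estimate for item 104: 4.257
        System.out.printf("estimate for item 104: %.3f%n", estimate(prefs, 1, 104, 2));
    }
}
```

Users 4 and 5 are the two nearest neighbors of user 1, and both rated item 104 (4.5 and 4.0), so the estimate is (1.0 × 4.5 + 0.9449 × 4.0) / (1.0 + 0.9449) ≈ 4.257. The same calculation gives 4.0 for item 106 and 3.5 for item 105, which is why 104 is the single item returned.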

Tags: mahout