
Data representation in Mahout

Mahout uses the concept of a data model to represent the data being analyzed.

DataModel implementations represent a repository of information about users and their associated preferences for items.

The class diagram below gives more detail.

Refreshable interface: Implementations of this interface have state that can be periodically refreshed. For example, an implementation instance might hold pre-computed information that needs to be recomputed from time to time. The refresh(Collection) method triggers such a refresh.
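
To make the idea of refreshable pre-computed state concrete, here is a minimal sketch. It declares a local stand-in interface with the same shape as Mahout's Refreshable so that it compiles without Mahout on the classpath. The CachedAverage class and RefreshDemo are hypothetical names invented for this illustration, and treating the collection argument as a list of objects already refreshed in the current pass is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collection;

// Local stand-in with the same shape as Mahout's Refreshable interface,
// declared here only so the sketch compiles without Mahout.
interface Refreshable {
    void refresh(Collection<Refreshable> alreadyRefreshed);
}

// Hypothetical component holding a pre-computed value (an average rating).
class CachedAverage implements Refreshable {
    private final double[] ratings;  // backing data; the caller may modify it
    private double cachedAverage;    // pre-computed state that can go stale

    CachedAverage(double[] ratings) {
        this.ratings = ratings;
        recompute();
    }

    double get() {
        return cachedAverage;
    }

    @Override
    public void refresh(Collection<Refreshable> alreadyRefreshed) {
        // Assumption: skip the work if this object was already refreshed in this pass.
        if (alreadyRefreshed != null && alreadyRefreshed.contains(this)) {
            return;
        }
        recompute();
        if (alreadyRefreshed != null) {
            alreadyRefreshed.add(this);
        }
    }

    private void recompute() {
        double sum = 0;
        for (double r : ratings) {
            sum += r;
        }
        cachedAverage = ratings.length == 0 ? 0 : sum / ratings.length;
    }
}

class RefreshDemo {
    public static void main(String[] args) {
        double[] ratings = {4.0, 2.0};
        CachedAverage avg = new CachedAverage(ratings);
        System.out.println("before change: " + avg.get());
        ratings[1] = 6.0;                      // backing data changes
        System.out.println("stale value:   " + avg.get());
        avg.refresh(new ArrayList<>());        // recompute the cached state
        System.out.println("after refresh: " + avg.get());
    }
}
```

The cached value only changes when refresh is called, which is the behaviour the interface description refers to.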

 

 

 

[Class diagram: DataModel]

AbstractDataModel: Contains some features common to all implementations.

 

[Class diagram: AbstractDataModel]

FileDataModel: A DataModel backed by a delimited file. This class expects a file where each line contains a user ID, followed by an item ID, followed by an optional preference value, followed by an optional timestamp. Commas or tabs delimit the fields, i.e. userID,itemID[,preference[,timestamp]]

Example code:

// FileDataModel(File) throws IOException if the file cannot be read.
DataModel model = new FileDataModel(new File(
        "src/main/java/com/jugnu.mahout.LearnDatamodel/intro.csv"));

Here we create a file-backed DataModel and pass it the path of the file that contains the data to be analyzed.
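
The line format described above (user ID, item ID, optional preference value, optional timestamp, separated by commas or tabs) can be illustrated with a small stand-alone parser. This is only a sketch of the format, not Mahout's own parsing code: PreferenceLine, its fields, and FormatDemo are hypothetical names, and the sketch simply stores null for an omitted field rather than reproducing whatever FileDataModel does in that case.

```java
// Hypothetical holder for one parsed line; not a Mahout class.
class PreferenceLine {
    final long userId;
    final long itemId;
    final Float value;     // null when the preference value is omitted
    final Long timestamp;  // null when the timestamp is omitted

    PreferenceLine(long userId, long itemId, Float value, Long timestamp) {
        this.userId = userId;
        this.itemId = itemId;
        this.value = value;
        this.timestamp = timestamp;
    }

    // Parses "userID,itemID[,preference[,timestamp]]" with commas or tabs as delimiters.
    static PreferenceLine parse(String line) {
        String[] fields = line.trim().split("[,\t]", -1);
        if (fields.length < 2 || fields.length > 4) {
            throw new IllegalArgumentException("Expected 2 to 4 fields: " + line);
        }
        long user = Long.parseLong(fields[0].trim());
        long item = Long.parseLong(fields[1].trim());
        Float value = null;
        if (fields.length > 2 && !fields[2].trim().isEmpty()) {
            value = Float.valueOf(fields[2].trim());
        }
        Long timestamp = null;
        if (fields.length > 3 && !fields[3].trim().isEmpty()) {
            timestamp = Long.valueOf(fields[3].trim());
        }
        return new PreferenceLine(user, item, value, timestamp);
    }
}

class FormatDemo {
    public static void main(String[] args) {
        // Sample lines made up for this illustration.
        String[] lines = {"1,101,5.0", "2\t102", "3,103,4.5,1700000000"};
        for (String line : lines) {
            PreferenceLine p = PreferenceLine.parse(line);
            System.out.println(p.userId + " " + p.itemId + " " + p.value + " " + p.timestamp);
        }
    }
}
```

A file such as intro.csv would hold one such line per user-item pair.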