Data representation in Mahout

Mahout uses the concept of a data model to represent the data being analyzed.

DataModel implementations represent a repository of information about users and their associated preferences for items.

The class diagrams below explain this in more detail.

Refreshable Interface : Implementations of this interface have state that can be periodically refreshed. For example, an implementation instance might contain some pre-computed information that should be periodically refreshed. The refresh(Collection) method triggers such a refresh.
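To make the refresh idea concrete, here is a minimal stand-alone sketch. It does not use Mahout: SimpleRefreshable and CachedAverage are hypothetical names invented for this illustration, and the way the collection argument is used (to skip objects already refreshed in the same pass) is an assumption about the intent of refresh(Collection), not a copy of Mahout's code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the Refreshable idea; NOT Mahout's interface.
interface SimpleRefreshable {
    void refresh(Collection<SimpleRefreshable> alreadyRefreshed);
}

// Holds raw ratings plus a pre-computed average that goes stale
// whenever a new rating is added.
class CachedAverage implements SimpleRefreshable {
    private final List<Double> ratings = new ArrayList<>();
    private double cachedAverage = 0.0;

    void addRating(double rating) {
        ratings.add(rating); // the cached average is deliberately not updated here
    }

    double getCachedAverage() {
        return cachedAverage;
    }

    @Override
    public void refresh(Collection<SimpleRefreshable> alreadyRefreshed) {
        // Assumption: objects already refreshed in this pass are skipped.
        if (alreadyRefreshed != null && alreadyRefreshed.contains(this)) {
            return;
        }
        double sum = 0.0;
        for (double r : ratings) {
            sum += r;
        }
        cachedAverage = ratings.isEmpty() ? 0.0 : sum / ratings.size();
        if (alreadyRefreshed != null) {
            alreadyRefreshed.add(this); // record that this object is now up to date
        }
    }
}
```

Until refresh is called, getCachedAverage() keeps returning the old value, which is the kind of periodically refreshed state the description above refers to.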

 

 

 

[Class diagram: DataModel]

 

AbstractDataModel : Contains some features common to all implementations.

 

[Class diagram: AbstractDataModel]

 

FileDataModel : A DataModel backed by a delimited file. This class expects a file in which each line contains a user ID, then an item ID, then an optional preference value, then an optional timestamp. Fields are delimited by commas or tabs.

Example code:

DataModel model = new FileDataModel(new File(
                "src/main/java/com/jugnu.mahout.LearnDatamodel/intro.csv"));

Here we create a file-backed DataModel and pass it the path of the file that contains the data to be analyzed.
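The line format that FileDataModel expects can be illustrated with a small stand-alone parser. This is only a sketch of the format described above, not Mahout's own parsing code: PreferenceRecord and PreferenceLineParser are hypothetical names, and the sample values used with it are made up.

```java
// Hypothetical value class for one parsed line; not a Mahout type.
class PreferenceRecord {
    final long userId;
    final long itemId;
    final Float value;    // null when the line has no preference value
    final Long timestamp; // null when the line has no timestamp

    PreferenceRecord(long userId, long itemId, Float value, Long timestamp) {
        this.userId = userId;
        this.itemId = itemId;
        this.value = value;
        this.timestamp = timestamp;
    }
}

// Parses lines of the form: userID, itemID, optional preference, optional timestamp,
// with fields separated by commas or tabs.
class PreferenceLineParser {
    static PreferenceRecord parse(String line) {
        String[] fields = line.trim().split("[,\t]", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("Need at least a user ID and an item ID: " + line);
        }
        long userId = Long.parseLong(fields[0].trim());
        long itemId = Long.parseLong(fields[1].trim());
        // The third and fourth fields are optional and may be absent or empty.
        Float value = fields.length > 2 && !fields[2].trim().isEmpty()
                ? Float.valueOf(fields[2].trim()) : null;
        Long timestamp = fields.length > 3 && !fields[3].trim().isEmpty()
                ? Long.valueOf(fields[3].trim()) : null;
        return new PreferenceRecord(userId, itemId, value, timestamp);
    }
}
```

For example, PreferenceLineParser.parse("1,101,5.0") yields user 1, item 101 and preference 5.0 with no timestamp, while a tab-separated line holding only "2" and "102" yields a record with neither a preference value nor a timestamp.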
