Cloudera Enterprise 5.16.x

Setting Up Apache Mahout Using the Command Line

  Important: This item is deprecated and will be removed in a future release. Cloudera supports items that are deprecated until they are removed. For more information about deprecated and removed items, see Deprecated Items.

Apache Mahout is a machine-learning tool. It aims to make building intelligent applications easier and faster by enabling you to build machine-learning libraries that scale to "reasonably large" datasets.

  Note:

To see which version of Apache Mahout is shipping in CDH 5, check the Version and Packaging Information. For important information on new and changed components, see the CDH 5 Release Notes.

The main use cases for Mahout are:

  • Recommendation mining, which tries to identify things users will like on the basis of their past behavior (for example, shopping or online-content recommendations)
  • Clustering, which groups similar items (for example, documents on similar topics)
  • Classification, which learns from existing categories what members of each category have in common, and on that basis tries to categorize new items
  • Frequent item-set mining, which takes a set of item-groups (such as terms in a query session, or shopping-cart content) and identifies items that usually appear together

  Important:

If you have not already done so, install the Cloudera yum, zypper/YaST or apt repository before using the instructions below to install Mahout. For instructions, see Installing and Deploying Unmanaged CDH Using the Command Line.

    Installing Mahout

    You can install Mahout from an RPM or Debian package, or from a tarball.

    Installing from packages is more convenient than installing the tarball because the packages:
    • Handle dependencies
    • Provide for easy upgrades
    • Automatically install resources to conventional locations

    These instructions assume that you will install from packages if possible.

      Note: Install Cloudera Repository
    Before using the instructions on this page to install or upgrade:
    • Install the Cloudera yum, zypper/YaST or apt repository.
    • Install or upgrade CDH 5 and make sure it is functioning correctly.
    For instructions, see Installing and Deploying Unmanaged CDH Using the Command Line and Upgrading Unmanaged CDH Using the Command Line.

    To install Mahout on a RHEL system:

    $ sudo yum install mahout

    To install Mahout on a SLES system:

    $ sudo zypper install mahout

    To install Mahout on an Ubuntu or Debian system:

    $ sudo apt-get install mahout
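If you script installs across hosts from different distribution families, the three commands above can be selected automatically. The sketch below uses a hypothetical helper, `mahout_install_cmd`, which is not part of CDH. It assumes that whichever of `yum`, `zypper`, or `apt-get` is found first on the `PATH` identifies the RHEL, SLES, or Ubuntu/Debian family, and it only prints the matching command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with CDH: print (without running) the
# Mahout install command for the first supported package manager on PATH.
# Assumption: yum => RHEL, zypper => SLES, apt-get => Ubuntu/Debian.
mahout_install_cmd() {
  if command -v yum >/dev/null 2>&1; then
    echo "sudo yum install mahout"
  elif command -v zypper >/dev/null 2>&1; then
    echo "sudo zypper install mahout"
  elif command -v apt-get >/dev/null 2>&1; then
    echo "sudo apt-get install mahout"
  else
    # None of the three package managers was found on PATH.
    echo "no supported package manager (yum, zypper, apt-get) found" >&2
    return 1
  fi
}
```

On a typical Ubuntu host (no yum or zypper installed), `mahout_install_cmd` prints `sudo apt-get install mahout`. Printing instead of executing lets you review the command before it runs with `sudo`.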

    To access the Mahout documentation, install the separate mahout-doc package. On an Ubuntu or Debian system:

    $ sudo apt-get install mahout-doc

    The contents of this package are installed under /usr/share/doc/mahout*.

    The Mahout Executable

    The Mahout executable is installed in /usr/bin/mahout. Use this executable to run your analysis.
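Before running an analysis, you may want to confirm that the launcher actually resolves on your `PATH`. The sketch below uses a hypothetical helper name, `check_mahout`; it relies only on the shell builtin `command -v` and assumes a package install has placed the executable at `/usr/bin/mahout`, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report where the mahout launcher resolves on PATH.
# After a package install this is expected to be /usr/bin/mahout.
check_mahout() {
  local bin
  if bin="$(command -v mahout)"; then
    echo "mahout found at ${bin}"
  else
    echo "mahout not found on PATH; install the mahout package first" >&2
    return 1
  fi
}
```

Once the check passes, running `mahout` with no arguments should print the list of available program names, which is a quick way to confirm that the installation works.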

    Getting Started with Mahout

    To get started with Mahout, follow the instructions in the Apache Mahout Quickstart.

    Viewing the Mahout Documentation

    For more information about Mahout, see mahout.apache.org.

    Page generated October 24, 2018.