Deploy Applications Using the MATLAB API for Spark

Create and execute MATLAB® applications against Spark™ using the MATLAB API for Spark

Supported Platform: Linux® only.

Deploying an application with the MATLAB API for Spark consists of two parts:

  • Creating your application using the MATLAB API for Spark and packaging it as a standalone application in the MATLAB desktop environment.

  • Executing the standalone application against a Spark enabled cluster from a Linux shell.

While creating your application using the MATLAB API for Spark, you can use Spark functions such as flatMap, mapPartitions, and aggregate in your MATLAB code. The API exposes the Spark programming model to MATLAB, providing MATLAB implementations of numerous Spark functions. Many of these MATLAB implementations accept function handles or anonymous functions as inputs to perform various types of analyses.
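
For example, a minimal sketch of passing anonymous functions and function handles to RDD operations might look like the following; it assumes sc is an existing SparkContext object and that the data is created in memory:

    % Hypothetical sketch: create an RDD from in-memory data, then
    % pass anonymous functions to map and reduce.
    numbers = sc.parallelize(num2cell(1:10));     % RDD from a cell array of numbers
    squares = numbers.map(@(x) x^2);              % anonymous function passed to map
    total   = squares.reduce(@(a, b) a + b);      % function handle passed to reduce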

The API lets you run your application interactively from within the MATLAB desktop environment in nondistributed mode on a single machine, with a second MATLAB session on the same machine serving as a worker. This functionality can be helpful for debugging your application before deploying it to a Spark enabled cluster. You must first configure your MATLAB environment for interactive debugging with the MATLAB API for Spark. For more information, see Configure Environment for Interactive Debugging.
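
For instance, one way to run in nondistributed mode is to point the master URL at the local machine when constructing the SparkConf object; this is a sketch, and the application name shown is hypothetical:

    % Sketch of a SparkConf set up for local, nondistributed debugging.
    conf = matlab.compiler.mlspark.SparkConf( ...
        'AppName', 'debugSession', ...   % hypothetical application name
        'Master', 'local[1]');           % run against the local machine only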

The general workflow for using the MATLAB API for Spark is as follows:

  1. Specify Spark properties.

  2. Create a SparkConf object.

  3. Create a SparkContext object.

  4. Create an RDD object from the data.

  5. Perform operations on the RDD object.
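
Put together, a minimal sketch of these five steps might look like the following; the data file name, application name, and property values are hypothetical, and the master URL assumes local execution rather than a cluster:

    % 1. Specify Spark properties as key-value pairs.
    sparkProp = containers.Map({'spark.executor.cores'}, {'1'});

    % 2. Create a SparkConf object.
    conf = matlab.compiler.mlspark.SparkConf( ...
        'AppName', 'myApp', ...
        'Master', 'local[1]', ...
        'SparkProperties', sparkProp);

    % 3. Create a SparkContext object.
    sc = matlab.compiler.mlspark.SparkContext(conf);

    % 4. Create an RDD object from the data.
    rdd = sc.textFile('mydata.csv');

    % 5. Perform operations on the RDD object.
    numRecords = rdd.count();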

You can package an application created with this API into a standalone application using the mcc command or deploytool. You can then run the application on a Spark enabled cluster from a Linux shell.
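
As a hedged sketch, packaging from the MATLAB prompt might look like this; myApp.m is a hypothetical file that uses the MATLAB API for Spark, and the exact mcc options depend on your deployment target:

    % Package the application as a standalone executable.
    mcc -m myApp.m

mcc also generates a shell script alongside the executable (for this sketch, run_myApp.sh); from a Linux shell, the packaged application is typically launched through that script, passing the MATLAB Runtime location as its first argument.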

Note

MATLAB applications developed using the MATLAB API for Spark cannot be deployed if they contain tall arrays.

For a complete example, see Deploy Applications to Spark Using the MATLAB API for Spark. You can follow the same instructions to deploy applications created using the MATLAB API for Spark to Cloudera® CDH.

Classes

  • matlab.compiler.mlspark.SparkConf: Interface class to configure an application with Spark parameters as key-value pairs

  • matlab.compiler.mlspark.SparkContext: Interface class to initialize a connection to a Spark enabled cluster

  • matlab.compiler.mlspark.RDD: Interface class to represent a Spark Resilient Distributed Dataset (RDD)

Topics