matlab.compiler.mlspark.SparkConf Class

Namespace: matlab.compiler.mlspark
Superclasses:

Interface class to configure an application with Spark parameters as key-value pairs

Description

A SparkConf object stores the configuration parameters of the application being deployed to Spark™. Every application must be configured before it is deployed on a Spark cluster. The configuration parameters are passed on to a Spark cluster through a SparkContext.

Construction

conf = matlab.compiler.mlspark.SparkConf('AppName',name,'Master',url,'SparkProperties',prop) creates a SparkConf object with the specified configuration parameters.

conf = matlab.compiler.mlspark.SparkConf(___,Name,Value) creates a SparkConf object with additional configuration parameters specified by one or more Name,Value pair arguments. Name is a property name of the class and Value is the corresponding value. Name must appear inside single quotes (''). You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Input Arguments

`name` — Name of the MATLAB® application deployed to Spark
character vector | string

Name of the application, specified as a character vector or string.

Example: 'AppName', 'myApp'

Data Types: char | string

`url` — Master URL to connect to
character vector | string

Master URL, specified as a character vector or string. Use one of the following values:

URL	Description
`local`	Run Spark locally with one worker thread. This option provides no parallelism.
`local[K]`	Run Spark locally with `K` worker threads. Set `K` to the number of cores on your machine.
`local[*]`	Run Spark locally with as many worker threads as logical cores on your machine.
`yarn-client`	Connect to a Hadoop® YARN cluster in client mode. The cluster location is found based on the `HADOOP_CONF_DIR` or `YARN_CONF_DIR` variable.

Example: 'Master', 'yarn-client'

Data Types: char | string
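As a minimal sketch of the local master URLs in the table, the following builds a `local[K]` URL from a thread count and passes it to the constructor. The thread count `K = 4` and the application name are illustrative assumptions, and the call requires MATLAB Compiler with Spark support to be available.

```matlab
% Hypothetical thread count; set it to the number of cores on your machine.
K = 4;

% Build the master URL string for K local worker threads.
masterURL = sprintf('local[%d]', K);

% 'SparkProperties' is omitted here, which is allowed for local deployment.
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'myApp', ...
    'Master', masterURL);
```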

`prop` — Map of key-value pairs that specify Spark configuration properties
`containers.Map` object

A containers.Map object containing Spark configuration properties as key-value pairs.

Note

When deploying to a local cluster using the MATLAB API for Spark, you can omit the 'SparkProperties' name-value pair when constructing a SparkConf object, so no value for prop is required. Alternatively, you can set prop to an empty containers.Map object as follows:

'SparkProperties',containers.Map({''},{''})

The key and value of the containers.Map object are empty char vectors.
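A short sketch of the empty-map form in context, assuming a local master and an illustrative application name:

```matlab
% Placeholder map whose single key and value are empty character vectors.
emptyProp = containers.Map({''}, {''});

% Pass the placeholder map explicitly for a local deployment.
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'myApp', ...
    'Master', 'local[1]', ...
    'SparkProperties', emptyProp);
```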

When deploying to a Hadoop YARN cluster, set prop to the appropriate Spark configuration properties as key-value pairs. The precise set of Spark configuration properties varies from one deployment scenario to another, depending on the deployment cluster environment. Verify the Spark setup with a system administrator to determine the appropriate configuration properties. The following tables list commonly used Spark properties. For the full set of properties, see the latest Spark documentation.

Running Spark on YARN

Property Name (Key)	Default (Value)	Description
`spark.executor.cores`	`1`	The number of cores to use on each executor. For YARN and Spark standalone mode only. In Spark standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application runs on each worker.
`spark.executor.instances`	`2`	The number of executors. Note: This property is incompatible with `spark.dynamicAllocation.enabled`. If both `spark.dynamicAllocation.enabled` and `spark.executor.instances` are specified, dynamic allocation is turned off and the specified number of `spark.executor.instances` is used.
`spark.driver.memory`	`1g` (default); `2048m` (recommended)	Amount of memory to use for the driver process. If you get any out of memory errors while using `tall/gather`, consider increasing this value.
`spark.executor.memory`	`1g` (default); `2048m` (recommended)	Amount of memory to use per executor process. If you get any out of memory errors while using `tall/gather`, consider increasing this value.
`spark.yarn.executor.memoryOverhead`	`executorMemory * 0.10`, with a minimum of `384` (default); `4096m` (recommended)	The amount of off-heap memory (in MB) to be allocated per executor. If you get any out of memory errors while using `tall/gather`, consider increasing this value.
`spark.dynamicAllocation.enabled`	`false`	Whether to use dynamic resource allocation, which integrates Spark with YARN resource management and scales the number of executors registered with this application up and down based on the workload. When set to `true`, Spark initiates as many executors as possible given the executor memory requirement and number of cores. This property requires that the cluster be set up for dynamic allocation and that `spark.shuffle.service.enabled` be set to `true`. The following configurations are also relevant: `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`, and `spark.dynamicAllocation.initialExecutors`.
`spark.shuffle.service.enabled`	`false`	Enables the external shuffle service. This service preserves the shuffle files written by executors so the executors can be safely removed. This must be enabled if `spark.dynamicAllocation.enabled` is set to `true`. The external shuffle service must be set up in order to enable it.
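The YARN properties above could be collected into a `containers.Map` along the following lines. This is only a sketch: the memory values are the recommended figures from the table, passing every value as a character vector is an assumption that mirrors the `'1'` used in this page's example, and enabling dynamic allocation assumes the cluster already has the external shuffle service set up. Confirm the actual settings with your system administrator.

```matlab
% Keys are Spark property names; values are given as character vectors
% (an assumption that follows the style of the example on this page).
keys = {'spark.executor.cores', ...
        'spark.driver.memory', ...
        'spark.executor.memory', ...
        'spark.yarn.executor.memoryOverhead', ...
        'spark.dynamicAllocation.enabled', ...
        'spark.shuffle.service.enabled'};
vals = {'1', '2048m', '2048m', '4096m', 'true', 'true'};
sparkProp = containers.Map(keys, vals);

% spark.executor.instances is deliberately left out, because specifying it
% turns dynamic allocation off.
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'myApp', ...
    'Master', 'yarn-client', ...
    'SparkProperties', sparkProp);
```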

MATLAB Specific Properties

Property Name (Key)	Default (Value)	Description
`spark.matlab.worker.debug`	`false`	For use in standalone/interactive mode only. If set to `true`, a Spark deployable MATLAB application executed within the MATLAB desktop environment starts another MATLAB session as a worker and enters the debugger. Logging information is directed to `log_<nbr>.txt`.
`spark.matlab.worker.reuse`	`true`	When set to `true`, a Spark executor pools workers and reuses them from one stage to the next. Workers terminate when the executor under which the workers are running terminates.
`spark.matlab.worker.profile`	`false`	Only valid when using a session of MATLAB as a worker. When set to `true`, it turns on the MATLAB Profiler and generates a Profile report that is saved to the file `profworker_<split_index>_<socket>_<worker pass>.mat`.
`spark.matlab.worker.numberOfKeys`	`10000`	Number of unique keys that can be held in a `containers.Map` object while performing `*ByKey` operations before map data is spilled to a file.
`spark.matlab.executor.timeout`	`600000`	Spark executor timeout in milliseconds. Not applicable when deploying tall arrays.
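MATLAB specific properties go into the same map as the Spark properties. The sketch below adds them to an existing map by indexed assignment. The particular values (a larger key limit and a longer timeout) are illustrative assumptions, not recommendations, and they are written as character vectors to match the style of the example on this page.

```matlab
% Start from a map that already holds a Spark property.
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});

% Add MATLAB specific properties one at a time.
sparkProp('spark.matlab.worker.reuse') = 'true';          % keep the default explicit
sparkProp('spark.matlab.worker.numberOfKeys') = '20000';  % hypothetical value
sparkProp('spark.matlab.executor.timeout') = '900000';    % hypothetical value, in ms

conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'myApp', ...
    'Master', 'local[1]', ...
    'SparkProperties', sparkProp);
```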

Monitoring and Logging

Property Name (Key)	Default (Value)	Description
`spark.history.fs.logDirectory`	`file:/tmp/spark-events`	Directory that contains application event logs to be loaded by the history server.
`spark.eventLog.dir`	`file:///tmp/spark-events`	Base directory in which Spark events are logged, if `spark.eventLog.enabled` is `true`. Within this base directory, Spark creates a sub directory for each application, and logs the events specific to the application in this directory. You can set this to a unified location like an HDFS™ directory so history files can be read by the history server.
`spark.eventLog.enabled`	`false`	Whether to log Spark events. This is useful for reconstructing the web UI after the application has finished.

Data Types: char

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

`ExecutorEnv` — Map of key-value pairs that will be used to establish the executor environment
`containers.Map` object

Map of key-value pairs specified as a containers.Map object.

Example: 'ExecutorEnv', containers.Map({'SPARK_JAVA_OPTS'}, {'-Djava.library.path=/my/custom/path'})

`MCRRoot` — Path to MATLAB Runtime that is used to execute driver application
character vector | string

Path to MATLAB Runtime, specified as a character vector or string.

Example: 'MCRRoot', '/share/MATLAB/MATLAB_Runtime/v91'

Data Types: char | string
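The two name-value arguments can be combined with the required arguments in one call. The following sketch reuses the example values shown above for `ExecutorEnv` and `MCRRoot`. Both paths are placeholders and must exist in your environment for the configuration to be usable.

```matlab
% Executor environment, using the example value from this page.
execEnv = containers.Map({'SPARK_JAVA_OPTS'}, ...
                         {'-Djava.library.path=/my/custom/path'});

% Placeholder MATLAB Runtime location, taken from the example above.
runtimeRoot = '/share/MATLAB/MATLAB_Runtime/v91';

conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'myApp', ...
    'Master', 'local[1]', ...
    'ExecutorEnv', execEnv, ...
    'MCRRoot', runtimeRoot);
```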

Properties

The properties of this class are hidden.

Methods

There are no user executable methods for this class.

Examples

Configure an Application with Spark Parameters

The SparkConf class allows you to configure an application with Spark parameters as key-value pairs.

sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf('AppName','myApp', ...
                        'Master','local[1]','SparkProperties',sparkProp);

More About

SparkConf

SparkConf stores the configuration parameters of the application being deployed to Spark. Every application must be configured before it is deployed on a Spark cluster. Some of the configuration parameters define properties of the application, and some are used by Spark to allocate resources on the cluster. The configuration parameters are passed on to a Spark cluster through a SparkContext.
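To illustrate the hand-off to a SparkContext, here is a minimal sketch that assumes a local master and the `matlab.compiler.mlspark.SparkContext` class from the same namespace. The application name and master URL are illustrative.

```matlab
% Configure the application for a local run.
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'myApp', ...
    'Master', 'local[1]');

% Create the SparkContext from the configuration; this is the point at
% which the parameters are passed on to Spark.
sc = matlab.compiler.mlspark.SparkContext(conf);

% ... create and process RDDs here ...

% Shut down the connection when the application is done.
delete(sc);
```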

References

See the latest Spark documentation for more information.

Version History

Introduced in R2016b
