How Parallel Computing Software Runs a Job
Overview
Parallel Computing Toolbox™ and MATLAB® Parallel Server™ software let you solve computationally and data-intensive problems using MATLAB and Simulink® on multicore and multiprocessor computers. Parallel processing constructs such as parallel for-loops and code blocks, distributed arrays, parallel numerical algorithms, and message-passing functions let you implement task-parallel and data-parallel algorithms at a high level in MATLAB without programming for specific hardware and network architectures.
A job is some large operation that you need to perform in your MATLAB session. A job is broken down into segments called tasks. You decide how best to divide your job into tasks. You could divide your job into identical tasks, but tasks do not have to be identical.
The MATLAB session in which the job and its tasks are defined is called the client session. Often, this is on the machine where you program MATLAB. The client uses Parallel Computing Toolbox software to define jobs and tasks and to run them on a cluster local to your machine. MATLAB Parallel Server software is the product that executes your job on a cluster of machines.
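As a minimal sketch of these ideas, the following client-session code defines one job made of three tasks that are not identical, runs it, and collects the results. It assumes that your default cluster profile points to a cluster you can use (for example, a local cluster). The task functions and inputs are illustrative only.

```matlab
% Client session: define a job and its tasks, then run them on a cluster.
c = parcluster;                        % cluster object for the default profile (assumed usable)
job = createJob(c);                    % the job is defined from the client session

% Tasks do not have to be identical: each has its own function and inputs.
createTask(job, @sum,   1, {[1 2 3]});
createTask(job, @max,   1, {[4 9 2]});
createTask(job, @numel, 1, {magic(4)});

submit(job);                           % hand the job to the scheduler
wait(job);                             % block until all tasks have been evaluated
results = fetchOutputs(job);           % cell array with one row per task
delete(job);                           % remove the job's data from the cluster
```

Each row of results corresponds to one task, in the order in which the tasks were created.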
The MATLAB Job Scheduler is the process that coordinates the execution of jobs and the evaluation of their tasks. The MATLAB Job Scheduler distributes the tasks for evaluation to the server's individual MATLAB sessions called workers. Use of the MATLAB Job Scheduler to access a cluster is optional; the distribution of tasks to cluster workers can also be performed by a third-party scheduler, such as Microsoft® Windows® HPC Server (including CCS) or Spectrum LSF®.
Basic Parallel Computing Setup
Toolbox and Server Components
MATLAB Job Scheduler, Workers, and Clients
The MATLAB Job Scheduler can be run on any machine on the network. The MATLAB Job Scheduler runs jobs in the order in which they are submitted, unless any jobs in its queue are promoted, demoted, canceled, or deleted.
Each worker is given a task from the running job by the MATLAB Job Scheduler, executes the task, returns the result to the MATLAB Job Scheduler, and then is given another task. When all tasks for a running job have been assigned to workers, the MATLAB Job Scheduler starts running the next job on the next available worker.
A MATLAB Parallel Server software setup usually includes many workers that can all execute tasks simultaneously, speeding up execution of large MATLAB jobs. It is generally not important which worker executes a specific task. In an independent job, the workers evaluate tasks one at a time as available, perhaps simultaneously, perhaps not, returning the results to the MATLAB Job Scheduler. In a communicating job, the workers evaluate tasks simultaneously. The MATLAB Job Scheduler then returns the results of all the tasks in the job to the client session.
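The difference between the two job types can be sketched with a small communicating job, in which one task definition is evaluated simultaneously on every worker of the job. This is only an illustration under stated assumptions: the default cluster profile can supply at least two workers, and the release provides spmdIndex (R2022b or later; earlier releases used labindex for the same purpose).

```matlab
% Communicating job: the same task runs on all workers at the same time.
c = parcluster;                                  % default cluster profile (assumed usable)
job = createCommunicatingJob(c, 'Type', 'spmd');
job.NumWorkersRange = [2 2];                     % illustrative: run on exactly two workers

% One task definition; each worker evaluates it and reports its own index.
createTask(job, @spmdIndex, 1, {});

submit(job);
wait(job);
out = fetchOutputs(job);                         % one row per worker
delete(job);
```

In an independent job, by contrast, you would create one task per unit of work, and the workers would pick the tasks up one at a time as they become available.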
Note
For testing your application locally or other purposes, you can configure a single computer as client, worker, and MATLAB Job Scheduler host. You can also have more than one worker session or more than one MATLAB Job Scheduler session on a machine.
Interactions of Parallel Computing Sessions
A large network might include several MATLAB Job Schedulers as well as several client sessions. Any client session can create, run, and access jobs on any MATLAB Job Scheduler, but a worker session is registered with and dedicated to only one MATLAB Job Scheduler at a time. The following figure shows a configuration with multiple MATLAB Job Schedulers.
Cluster with Multiple Clients and MATLAB Job Schedulers
Local Cluster
A feature of Parallel Computing Toolbox software is the ability to run a local cluster of workers on the client machine, so that you can run jobs without requiring a remote cluster or MATLAB Parallel Server software. In this case, all the processing required for the client, scheduling, and task evaluation is performed on the same computer. This gives you the opportunity to develop, test, and debug your parallel applications before running them on your network cluster.
Note
To develop and test your code, you can run batch jobs on a local cluster on your client machine instead of running them on a remote cluster. If you close your MATLAB session, any batch jobs using the local cluster also stop immediately.
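A batch job on a local cluster might look like the following sketch. It assumes that the default cluster profile resolves to a local cluster on the client machine. The function and its input are placeholders for your own code.

```matlab
% Run a function as a batch job on the (assumed local) default cluster.
c = parcluster;                         % default profile, assumed to be a local cluster
job = batch(c, @sum, 1, {1:10});        % evaluate sum(1:10) on a worker, expecting 1 output

wait(job);                              % the client must stay open while a local job runs
out = fetchOutputs(job);                % cell array holding the function's output
delete(job);                            % clean up the job's data
```

Because the workers of a local cluster belong to the client machine, keep the MATLAB session open until wait returns.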
Third-Party Schedulers
As an alternative to using the MATLAB Job Scheduler, you can use a third-party scheduler. This could be a Microsoft Windows HPC Server (including CCS), Spectrum LSF scheduler, PBS Pro® scheduler, TORQUE scheduler, or a generic scheduler.
Choosing Between a Third-Party Scheduler and a MATLAB Job Scheduler
Consider the following questions when deciding whether to use a third-party scheduler or the MATLAB Job Scheduler to distribute your tasks:
Does your cluster already have a scheduler?
If you already have a scheduler, you may be required to use it as a means of controlling access to the cluster. Your existing scheduler might be just as easy to use as a MATLAB Job Scheduler, so there might be no need for the extra administration involved.
Is the handling of parallel computing jobs the only cluster scheduling management you need?
The MATLAB Job Scheduler is designed specifically for MathWorks® parallel computing applications. If other scheduling tasks are not needed, a third-party scheduler might not offer any advantages.
Is there a file sharing configuration on your cluster already?
The MATLAB Job Scheduler can handle all file and data sharing necessary for your parallel computing applications. This might be helpful in configurations where shared access is limited.
Are you interested in batch mode or managed interactive processing?
When you use a MATLAB Job Scheduler, worker processes usually remain running at all times, dedicated to their MATLAB Job Scheduler. With a third-party scheduler, workers are run as applications that are started for the evaluation of tasks, and stopped when their tasks are complete. If tasks are small or take little time, starting a worker for each one might involve too much overhead time.
Are there security concerns?
Your own scheduler might be configured to accommodate your particular security requirements.
How many nodes are on your cluster?
If you have a large cluster, you probably already have a scheduler. Consult your MathWorks representative if you have questions about cluster size and the MATLAB Job Scheduler.
Who administers your cluster?
The person administering your cluster might have a preference for how jobs are scheduled.
Do you need to monitor your job's progress or access intermediate data?
A job run by the MATLAB Job Scheduler supports events and callbacks, so that particular functions can run as each job and task progresses from one state to another.
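The callback mechanism can be sketched as follows. The example assumes that the default cluster profile may or may not refer to a MATLAB Job Scheduler, so it checks the cluster class first and sets callbacks only in the MATLAB Job Scheduler case. The printed messages are illustrative.

```matlab
% Attach state-change callbacks to a job and a task (MATLAB Job Scheduler only).
c = parcluster;                                   % default profile; might not be an MJS
isMJS = isa(c, 'parallel.cluster.MJS');           % callbacks are set only for an MJS cluster

if isMJS
    job = createJob(c);
    % Each callback receives the object that changed state and an event-data argument.
    job.QueuedFcn   = @(j, ~) fprintf('Job %d queued\n',   j.ID);
    job.RunningFcn  = @(j, ~) fprintf('Job %d running\n',  j.ID);
    job.FinishedFcn = @(j, ~) fprintf('Job %d finished\n', j.ID);

    task = createTask(job, @sum, 1, {[1 2 3]});
    task.FinishedFcn = @(t, ~) fprintf('Task %d finished\n', t.ID);

    submit(job);
    wait(job);
    delete(job);
else
    disp('Default cluster is not a MATLAB Job Scheduler; callbacks were not set.');
end
```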
Components on Mixed Platforms or Heterogeneous Clusters
Parallel Computing Toolbox software and MATLAB Parallel Server software are supported on Windows, UNIX®, and Macintosh operating systems. Mixed platforms are supported, so that the clients, MATLAB Job Scheduler, and workers do not have to be on the same platform. Other limitations are described at System Requirements.
In a mixed-platform environment, system administrators should follow the installation instructions that apply to the platform of each machine on which they install the software.
mjs Service
If you are using the MATLAB Job Scheduler, every machine that hosts a worker or MATLAB Job Scheduler session must also run the mjs service.
The mjs service controls the worker and MATLAB Job Scheduler sessions and recovers them after their host machines crash. If a machine that hosts a worker or MATLAB Job Scheduler crashes, the mjs service, once it starts up again (it is usually configured to start at machine boot time), automatically restarts the MATLAB Job Scheduler and worker sessions so that they resume from where they were before the crash. More information about the mjs service is available in the MATLAB Parallel Server documentation.
Components Represented in the Client
A client session communicates with the MATLAB Job Scheduler by calling methods and configuring properties of a MATLAB Job Scheduler cluster object. Though not often necessary, the client session can also access information about a worker session through a worker object.
When you create a job in the client session, the job actually exists in the MATLAB Job Scheduler job storage location. The client session has access to the job through a job object. Likewise, tasks that you define for a job in the client session exist in the MATLAB Job Scheduler data location, and you access them through task objects.
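The relationship between these objects can be sketched as follows. The code assumes a usable default cluster profile and works with any cluster type, so it reads only general cluster properties rather than properties specific to the MATLAB Job Scheduler.

```matlab
% Cluster, job, and task objects as seen from the client session.
c = parcluster;                          % cluster object for the default profile
profileName = c.Profile;                 % configuration is exposed through properties
workerCount = c.NumWorkers;

job = createJob(c);                      % job object; the job's data lives with the cluster
createTask(job, @sum, 1, {[1 2 3]});
createTask(job, @sum, 1, {[4 5 6]});

numTasks = numel(job.Tasks);             % task objects are reached through the job object

% The same job can be looked up again through the cluster object.
sameJob = findJob(c, 'ID', job.ID);
foundAgain = ~isempty(sameJob) && sameJob(1).ID == job.ID;

delete(job);                             % remove the example job's data
```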
Life Cycle of a Job
When you create and run a job, it progresses through a number of stages. Each stage of a job is reflected in the value of the job object's State property, which can be pending, queued, running, or finished. Each of these stages is briefly described in this section.
The figure below illustrates the stages in the life cycle of a job. In the MATLAB Job Scheduler (or other scheduler), the jobs are shown categorized by their state. Some of the functions you use for managing a job are createJob, submit, and fetchOutputs.
Stages of a Job
The following table describes each stage in the life cycle of a job.
Job Stage | Description |
---|---|
Pending | You create a job on the scheduler with the createJob function. The job's first state is pending. |
Queued | When you execute the submit function on a job, the job's state becomes queued. |
Running | When a job reaches the top of the queue, the scheduler distributes the job's tasks to worker sessions for evaluation. The job's state is now running. |
Finished | When all of a job's tasks have been evaluated, the job is moved to the finished state. |
Failed | When using a third-party scheduler, a job might fail if the scheduler encounters an error when attempting to execute its commands or access necessary files. |
Deleted | When a job's data has been removed from its data location or from the MATLAB Job Scheduler with the delete function, the job's state is deleted. |
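The stages in the table can be observed by reading the State property at different points, as in the following sketch. It assumes a usable default cluster profile. The state seen immediately after submit depends on timing, so the code records it without relying on a particular value.

```matlab
% Observe a job's State property as the job moves through its life cycle.
c = parcluster;                          % default cluster profile (assumed usable)
job = createJob(c);
createTask(job, @sum, 1, {[1 2 3]});
stateAfterCreate = job.State;            % job is defined but not yet submitted

submit(job);
stateAfterSubmit = job.State;            % queued or running, depending on timing

wait(job);                               % returns once the job has finished
stateAfterWait = job.State;

results = fetchOutputs(job);             % results are available in the finished state
delete(job);
```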
Note that when a job is finished, its data remains in the MATLAB Job Scheduler's JobStorageLocation folder, even if you clear all the objects from the client session. The MATLAB Job Scheduler or other scheduler keeps all the jobs it has executed until you restart the MATLAB Job Scheduler in a clean state. Therefore, you can retrieve information from a job later or in another client session, so long as the MATLAB Job Scheduler has not been restarted with the -clean option.
You can permanently remove completed jobs from the storage location of the MATLAB Job Scheduler or other scheduler by using the Job Monitor GUI or the delete function.
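The following sketch illustrates both points: a finished job survives the clearing of its client-side object, and delete removes it permanently. It assumes a usable default cluster profile. The variable names are illustrative.

```matlab
% A finished job outlives its client-side object until it is deleted.
c = parcluster;                              % default cluster profile (assumed usable)
job = createJob(c);
createTask(job, @sum, 1, {[1 2 3]});
submit(job);
wait(job);

jobID = job.ID;
clear job                                    % removes only the object in the client workspace

% The job's data is still held by the cluster, so the job can be found again.
finishedJobs = findJob(c, 'State', 'finished');
recovered = findJob(c, 'ID', jobID);
results = fetchOutputs(recovered);

delete(recovered);                           % permanently removes the job's data
stillThere = ~isempty(findJob(c, 'ID', jobID));
```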