
Manage and Access MATLAB Job Scheduler Cluster Job History

Since R2024a

MATLAB® Job Scheduler clusters save job history information by default. You can use job history data to gain insights into cluster usage.

Enable Job History Logging

MATLAB Job Scheduler clusters with MATLAB Parallel Server™ release R2024a or later save job history information by default. If your cluster runs an earlier release, update MATLAB Parallel Server to R2024a or later to use job history saving.

You can control the saving of job history information using the SAVE_JOB_HISTORY parameter in the mjs_def file. For more information about the mjs_def file, see Define MATLAB Job Scheduler Startup Parameters.

Manage Job History Files

The MATLAB Job Scheduler saves job history data to the job_history folder in the CHECKPOINTBASE location on the head node. To find the location of the checkpoint folder, check the mjs_def file.

The scheduler saves job history data to a set of ten rotating CSV files, job_history.0.csv through job_history.9.csv. By default, when the active job history file, job_history.0.csv, reaches 1 GB in size, the scheduler archives it as job_history.1.csv and continues writing to a new job_history.0.csv file. At the same time, the scheduler deletes the oldest file, job_history.9.csv, and rotates the existing archives: job_history.1.csv becomes job_history.2.csv, job_history.2.csv becomes job_history.3.csv, and so on, until job_history.8.csv becomes job_history.9.csv. The combined size of all files in the database folder is limited to 10 GB.
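The rotation order described above can be illustrated with a short Python sketch. This is not the scheduler's own implementation, only a simulation of the renaming sequence on a folder of placeholder files. The function names rotate and demo are hypothetical, and the 1 GB size check is left out.

```python
from pathlib import Path
import tempfile

MAX_FILES = 10  # job_history.0.csv through job_history.9.csv, as described above


def rotate(folder: Path) -> None:
    """Simulate one rotation step (illustrative only, not the scheduler's code)."""
    oldest = folder / f"job_history.{MAX_FILES - 1}.csv"
    if oldest.exists():
        oldest.unlink()  # the oldest archive, job_history.9.csv, is deleted
    # Rename 8 -> 9, 7 -> 8, ..., 0 -> 1, highest index first so that
    # no rename lands on a file that still exists.
    for i in range(MAX_FILES - 2, -1, -1):
        src = folder / f"job_history.{i}.csv"
        if src.exists():
            src.rename(folder / f"job_history.{i + 1}.csv")
    # A new, empty active file takes over as job_history.0.csv.
    (folder / "job_history.0.csv").touch()


def demo() -> list[str]:
    """Create ten placeholder files, rotate once, and return their contents by index."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for i in range(MAX_FILES):
            (folder / f"job_history.{i}.csv").write_text(f"generation {i}\n")
        rotate(folder)
        return [
            (folder / f"job_history.{i}.csv").read_text().strip()
            for i in range(MAX_FILES)
        ]
```

After one call to rotate, the content that was in job_history.0.csv is in job_history.1.csv, the content that was in job_history.9.csv is gone, and job_history.0.csv is empty.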


If the scheduler encounters any issues during the setup of job history logging (for example, failure to create a file or write to disk), the startjobmanager command returns an error.

Read Job History File

File system permissions allow only the admin user to access the job history files.

Each job history entry corresponds to the completed execution of a task. Below is a snippet of a job history CSV file.

User,Version,Mode,Type,Job,Task,Attempt,Start,Duration,State,Worker,Host
user2,R2024a,batch,independent,1,1,1,1697120886.345,45.868,finished,mjs-worker-1,wkr1hostid
user5,R2024a,interactive,parpool,2,39,1,1697121035.862,72.551,finished,mjs-worker-25,wkr25hostid
user5,R2024a,interactive,parpool,2,115,1,1697121035.862,72.49,finished,mjs-worker-94,wkr94hostid
user5,R2024a,interactive,parpool,2,2,1,1697121035.862,72.613,finished,mjs-worker-10,wkr10hostid
user5,R2024a,interactive,parpool,2,3,1,1697121035.862,72.621,finished,mjs-worker-100,wkr100hostid
user5,R2024a,interactive,parpool,2,40,1,1697121035.862,72.585,finished,mjs-worker-26,wkr26hostid

The first row in the CSV file lists the names of the data columns.
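Because the first row holds the column names, a CSV reader that treats it as a header can return each entry keyed by name. The following is a minimal Python sketch using the standard csv module. The SAMPLE string copies two rows from the snippet above, and read_history is a hypothetical helper name.

```python
import csv
import io

# Two rows copied from the snippet above, preceded by the header row.
SAMPLE = """\
User,Version,Mode,Type,Job,Task,Attempt,Start,Duration,State,Worker,Host
user2,R2024a,batch,independent,1,1,1,1697120886.345,45.868,finished,mjs-worker-1,wkr1hostid
user5,R2024a,interactive,parpool,2,39,1,1697121035.862,72.551,finished,mjs-worker-25,wkr25hostid
"""


def read_history(stream):
    """Return job history rows as dictionaries keyed by the header names."""
    return list(csv.DictReader(stream))


rows = read_history(io.StringIO(SAMPLE))
for row in rows:
    print(row["User"], row["Job"], row["Task"], row["State"])
```

To read an actual job history file, pass an open file object instead of the in-memory stream, for example one obtained with open(path, newline=""). All values arrive as strings and need converting before any arithmetic.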

Column | Name | Data Type | Description

1 | User | String | Owner of the task.

2 | Version | String | MATLAB version of the task, for example R2024a.

3 | Mode | String | Execution mode of the job associated with the task. Possible values are:

  • "batch" — Batch job submitted to cluster.

  • "interactive" — Interactive parallel pool job submitted to cluster.

4 | Type | String | Type of the job associated with the task. Possible values are:

  • "independent" — Batch independent job submitted to cluster.

  • "pool" — Batch pool job submitted to cluster.

  • "spmd" — Batch spmd job submitted to cluster.

  • "parpool" — Interactive parallel pool job submitted to cluster.

5 | Job | Integer | ID number of the job associated with the task.

6 | Task | Integer | ID number of the task.

7 | Attempt | Integer | Number of the task attempt.

8 | Start | Double | Start time of the task, given as the number of seconds that have elapsed since the epoch of January 1, 1970, 00:00:00 UTC.

9 | Duration | Double | Duration of the task, measured in seconds.

10 | State | String | Finished state of the task. Possible values are:

  • "finished" — The task ran to completion without error.

  • "errored" — The task threw an error in MATLAB.

  • "failed" — The task failed because of a problem with the cluster.

  • "canceled" — The user canceled or deleted the task.

11 | Worker | String | Name of the worker that ran the task.

12 | Host | String | Host machine of the worker that ran the task.
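The column types above map directly onto a typed record. The Python sketch below converts Start from epoch seconds to a UTC timestamp and Duration to a time span. TaskRecord and parse_row are hypothetical names, and the sketch assumes every row has exactly the twelve columns listed.

```python
import csv
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TaskRecord:
    """One job history row, typed according to the column descriptions above."""
    user: str
    version: str
    mode: str
    type: str
    job: int
    task: int
    attempt: int
    start: datetime      # converted from seconds since 1970-01-01 00:00:00 UTC
    duration: timedelta  # converted from seconds
    state: str
    worker: str
    host: str

    @property
    def end(self) -> datetime:
        """End time of the task, derived from start plus duration."""
        return self.start + self.duration


def parse_row(line: str) -> TaskRecord:
    """Parse one data line (not the header) of a job history CSV file."""
    (user, version, mode, type_, job, task, attempt,
     start, duration, state, worker, host) = next(csv.reader([line]))
    return TaskRecord(
        user, version, mode, type_, int(job), int(task), int(attempt),
        datetime.fromtimestamp(float(start), tz=timezone.utc),
        timedelta(seconds=float(duration)),
        state, worker, host,
    )


# First data row of the snippet shown earlier.
record = parse_row(
    "user2,R2024a,batch,independent,1,1,1,1697120886.345,45.868,"
    "finished,mjs-worker-1,wkr1hostid"
)
print(record.start.isoformat(), record.end.isoformat())
```

Passing tz=timezone.utc keeps the result independent of the local time zone of the machine that reads the file, which matches the UTC epoch definition of the Start column.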

You can use any program that can read CSV files to view, extract, and analyze data from the job history files.
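As one example of such an analysis, the Python sketch below totals task run time per user across one or more job history files. The function name task_seconds_by_user is hypothetical, and the files are assumed to be copies that your account can read, since access to the originals is restricted to the admin user.

```python
import csv
from collections import defaultdict


def task_seconds_by_user(paths):
    """Sum the Duration column (seconds) per User over the given CSV files."""
    totals = defaultdict(float)
    for path in paths:
        with open(path, newline="") as stream:
            for row in csv.DictReader(stream):
                totals[row["User"]] += float(row["Duration"])
    return dict(totals)
```

For example, task_seconds_by_user(sorted(folder.glob("job_history.*.csv"))), with folder a pathlib.Path pointing at a copy of the job_history folder, covers the active file and all archives in one pass. The same pattern works for grouping by Worker, Host, or State.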
