
Resolve Out-of-Resource Conditions

Depending on how cluster resources are configured, the cluster can run out of physical resources, which affects pod status.

To see pod status, run the following command, replacing <namespace-name> with the namespace you used for MATLAB® Online Server™:

kubectl get pods --namespace <namespace-name>

How Out-of-Resource Conditions Can Occur

Some pod statuses that can indicate that the cluster is out of resources include Pending, Evicted (see Resolve Evicted or Terminated Pod Issues), and ContainerCreating.
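
On a busy cluster, the full pod list can be long. The following sketch narrows the output of kubectl get pods to the three statuses listed above. It assumes the default kubectl get pods column layout, in which STATUS is the third column. The function name filter_resource_issues is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only pods whose STATUS column is Pending,
# Evicted, or ContainerCreating. Expects `kubectl get pods --no-headers`
# output on stdin (columns: NAME READY STATUS RESTARTS AGE).
filter_resource_issues() {
  awk '$3 == "Pending" || $3 == "Evicted" || $3 == "ContainerCreating"'
}
```

For example:

kubectl get pods --namespace <namespace-name> --no-headers | filter_resource_issues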

Kubernetes® monitors node resources and acts to prevent total starvation of a compute resource. If a resource runs low while pods are being created, Kubernetes reclaims it by proactively failing one or more running pods. The status of those pods changes from Running to Evicted, and the new pods remain stuck in the ContainerCreating state.
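
To check whether a node is under resource pressure, you can read its conditions. Kubernetes sets the MemoryPressure, DiskPressure, or PIDPressure condition to True when the node runs low on that resource. The following is a minimal sketch; the function name node_pressure is hypothetical, and the node name comes from kubectl get nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pressure conditions that are currently True
# for the node named in $1 (MemoryPressure, DiskPressure, or PIDPressure).
node_pressure() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' |
    grep -E '^(MemoryPressure|DiskPressure|PIDPressure)=True$'
}
```

No output means that none of these conditions is True. To see which pods were evicted and why, you can also list eviction events:

kubectl get events --namespace <namespace-name> --field-selector reason=Evicted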

Possible Solution 1. Clean Up Unused Images

An out-of-resource condition can occur if you update the MATLAB image many times without cleaning up unused images. For example, suppose the ephemeral storage of the Amazon EC2® instance (node) in AWS® is approximately 100 GB, and you successfully spin up a single-node MATLAB Online Server cluster with all pods in the Running state. A large share of the node's storage is already occupied by the MATLAB image for a full MATLAB installation (approximately 27 GB).

If you update the image a few weeks later and then perform a MATLAB Pool update, the node downloads another image of approximately 27 GB (assuming that the new image has a different name). If you repeat this a few more times without cleaning up unused images, some pods start entering the Evicted state, and the pod you are trying to start (in this case, the MATLAB pod) gets stuck in the ContainerCreating state. The node was able to pull the image but does not have enough free resources (in this scenario, ephemeral storage) to start the container.
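
A rough check with the approximate figures above, ignoring the operating system, other container images, and any disk space that Kubernetes keeps in reserve, shows why only a few updates are enough:

```latex
\begin{aligned}
3 \times 27\ \text{GB} &= 81\ \text{GB} \le 100\ \text{GB} \\
4 \times 27\ \text{GB} &= 108\ \text{GB} > 100\ \text{GB}
\end{aligned}
```

By the fourth distinct MATLAB image, the images alone exceed the node's ephemeral storage. In practice, the node runs short earlier, because other files also need space and Kubernetes starts evicting pods before the disk is completely full.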

Periodically clean up unused images on the node. To see the images on the node, run the following command. If you use Podman as your container management tool, replace docker with podman.

docker images

Then, use either of the following commands to remove an unused image:

docker rmi <image-id>

or

docker rmi <image:tag>
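
After repeated updates, older tags of the same MATLAB image repository are the usual candidates for removal. The following sketch assumes that your MATLAB images share one repository and differ only by tag. It prints the older references so that you can review them before removing anything. The function name stale_image_refs is hypothetical; the {{.Repository}} and {{.Tag}} format placeholders work with both docker images and podman images.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print image references (repository:tag) that share a
# repository with the image currently in use but carry a different tag.
# Reads "repository:tag" lines on stdin, for example from:
#   docker images --format '{{.Repository}}:{{.Tag}}'
stale_image_refs() {
  local current="$1"           # the image:tag that the MATLAB Pool uses now
  local repo="${current%:*}"   # drop the tag (keeps a registry port, if any)
  awk -v cur="$current" -v repo="$repo" '
    /:<none>$/ { next }        # untagged images cannot be removed by name
    {
      r = $1
      sub(/:[^:]*$/, "", r)    # repository part of this line
      if (r == repo && $1 != cur) print $1
    }'
}
```

For example, after reviewing the list:

docker images --format '{{.Repository}}:{{.Tag}}' | stale_image_refs <image:tag> | xargs -r docker rmi

Removing by repository:tag rather than by image ID only untags an image that still has other tags, which makes mistakes easier to recover from.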

Possible Solution 2. Provision More Nodes or Set Explicit Resource Limits

If resource limits for CPU and memory are not set, users can inadvertently consume all available resources on a node.

For example, assume that the MATLAB Pool pod uses the default configuration, which sets resource requests but no resource limits:

matlabResources:
  requests:
    cpu: "200m"
    memory: "2Gi"
  limits:
    #cpu: "<cpu-limit-here>"
    #memory: "<memory-limit-here>"

Under these conditions, if a MATLAB Online™ user performs an operation that consumes all the resources on the node, other users cannot sign in to MATLAB Online because new MATLAB pods are stuck in the ContainerCreating state. Because resources are scarce, Kubernetes also starts evicting pods, which moves them to the Evicted state.
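
To see how much of a node's capacity the scheduled pods have requested, and how much they are limited to, you can inspect the "Allocated resources" section of kubectl describe node. These totals reflect configured requests and limits, not live usage. With the default configuration above, the MATLAB pods add nothing to the limits totals, so their actual usage is not capped. The function name allocated_resources is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "Allocated resources" section of
# `kubectl describe node` for the node named in $1. The section ends
# where the "Events:" section begins.
allocated_resources() {
  kubectl describe node "$1" |
    awk '/^Allocated resources:/ { show = 1 } /^Events:/ { show = 0 } show'
}
```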

Possible Action 1: Provision More Nodes

Add nodes to the cluster so that it has enough capacity for its workload.

Possible Action 2: Set Explicit Resource Limits

Set appropriate resource limits for the MATLAB Pool pod so that a single user cannot consume all the resources on a node.

  1. Update the matlab-pool override file at <overrides/matlab-online-server/mathworks/matlab-pool.yaml>. Under limits, uncomment the cpu and memory entries and replace the placeholders with appropriate limit values.

  2. Run the following mosadm command to apply the updated configuration:

    mosadm upgrade matlab-pool
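
To confirm that the new limits took effect, you can list the pods in the namespace for which no container declares resource limits. After a successful upgrade, the MATLAB Pool pods should no longer appear; other pods in the namespace might still be listed if they have no limits. The following is a sketch; the function name pods_without_limits is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods in namespace $1 for which no container
# declares resource limits (the limits field prints as empty).
pods_without_limits() {
  kubectl get pods --namespace "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits}{"\n"}{end}' |
    awk -F '\t' '$2 == "" { print $1 }'
}
```

For example:

pods_without_limits <namespace-name>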
