Resolve Out-of-Resource Conditions
Depending on how cluster resources are configured, the cluster can run out of physical resources in certain situations, which can affect pod status.
To see pod status, run the following command, replacing <namespace-name> with the namespace you used for MATLAB® Online Server™:
kubectl get pods --namespace <namespace-name>
How an Out-of-Resource Condition Can Occur
Pod statuses that can indicate that the cluster is out of resources include Pending, Evicted (see Resolve Evicted or Terminated Pod Issues), and ContainerCreating.
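To spot these statuses quickly, you can filter the output of kubectl get pods. The following is a minimal sketch, assuming the default plain-text table output of kubectl get pods, in which STATUS is the third column. The function name filter_resource_statuses is hypothetical.

```shell
# Print NAME and STATUS for pods whose STATUS is one of the states that can
# point to an out-of-resource condition. Reads the plain-text table printed
# by `kubectl get pods` on standard input.
filter_resource_statuses() {
  # NR > 1 skips the NAME/READY/STATUS/RESTARTS/AGE header row;
  # $3 is the STATUS column.
  awk 'NR > 1 && ($3 == "Pending" || $3 == "Evicted" || $3 == "ContainerCreating") { print $1, $3 }'
}
```

For example: kubectl get pods --namespace <namespace-name> | filter_resource_statuses. Matching on the STATUS column, rather than searching the whole line, avoids false matches on pod names that happen to contain one of these words.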
Kubernetes® can proactively monitor for and prevent total starvation of a compute resource. If a resource becomes starved during pod creation, Kubernetes can reclaim it by proactively failing one or more pods. The status of those pods then changes from Running to Evicted, and the new pods get stuck in the ContainerCreating state.
Possible Solution 1. Clean Up Unused Images
An out-of-resource condition can occur if you update the MATLAB image many times without cleaning up unused images. For example, suppose the ephemeral storage of the Amazon EC2® instance (node) in AWS® is approximately 100 GB, and you successfully spin up a single-node MATLAB Online Server cluster with all pods in the Running state. The MATLAB image for a full MATLAB install (approximately 27 GB) occupies most of the used capacity on the node.
If you update the image a few weeks later and then perform a MATLAB Pool update, the node downloads another image of approximately 27 GB (assuming the new image has a different name). If you repeat this a few more times without cleaning up unused images, some pods start entering the Evicted state, and the pod you are trying to start (in this case, the MATLAB pod) gets stuck in the ContainerCreating state. The pod was able to pull the image, but the node did not have enough free resources to start the container.
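The arithmetic behind this example can be sketched as follows. At approximately 27 GB each, three images use about 81 GB, so a fourth distinct image no longer fits in approximately 100 GB. This sketch assumes, for illustration only, that the whole capacity is available to images; in practice the operating system, other images, and container logs also use space, so fewer images fit. The function name images_that_fit is hypothetical.

```shell
# Estimate how many distinct images of one size fit in the node's
# ephemeral storage. Shell arithmetic is integer-only, so the result
# rounds down.
# Assumption: all of the capacity is available for images.
images_that_fit() {
  # $1 = node ephemeral storage (GB), $2 = size of one image (GB)
  echo $(( $1 / $2 ))
}

images_that_fit 100 27   # prints 3
```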
Make sure that unused images are periodically cleaned up from the node. Run the following command to see the images on the node. If you use Podman as your container management tool, replace docker with podman.
docker images
To remove an unused image, specify either its image ID or its name and tag:
docker rmi <image-id>
docker rmi <image:tag>
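If you keep several tags of the MATLAB image on the node, you can list the tags you no longer use from the docker images output. This is a sketch under stated assumptions: stale_images is a hypothetical helper, it assumes the default docker images (or podman images) table layout with REPOSITORY and TAG as the first two columns, and it skips <none> tags, which you remove by image ID instead.

```shell
# List <repository>:<tag> for every tag of one repository except the tag
# currently in use. Reads the table printed by `docker images` or
# `podman images` on standard input.
# Usage: stale_images <repository> <tag-in-use>
stale_images() {
  repo=$1
  keep=$2
  # NR > 1 skips the header row; $1 is REPOSITORY and $2 is TAG.
  # Untagged (<none>) images cannot be removed by name, so skip them.
  awk -v repo="$repo" -v keep="$keep" \
    'NR > 1 && $1 == repo && $2 != keep && $2 != "<none>" { print $1 ":" $2 }'
}
```

For example: docker images | stale_images <repository> <tag-in-use> | xargs -r docker rmi. The -r option, which skips running docker rmi when the list is empty, is a GNU xargs extension. docker rmi refuses to remove an image that a container still uses, so review the list before removing anything.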
Possible Solution 2. Provision More Nodes or Set Explicit Resource Limits
If resource limits for CPU and memory are not set, users can inadvertently consume all available resources on a node.
For example, assume that the MATLAB Pool pod is configured with no resource limits (the default), as shown:
matlabResources:
  requests:
    cpu: "200m"
    memory: "2Gi"
  limits:
    #cpu: "<cpu-limit-here>"
    #memory: "<memory-limit-here>"
Under these conditions, when a MATLAB Online™ user performs an operation that maxes out the resources on the node, other end users cannot sign in to MATLAB Online, because their MATLAB pods are stuck in the ContainerCreating state. Because resources are lacking, Kubernetes starts evicting some pods, putting them in the Evicted state.
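Missing limits cause this because the Kubernetes scheduler places pods based on their resource requests, not on their limits or actual usage. The following sketch uses a hypothetical node with 4 CPUs and 16 GiB of allocatable memory to show how many MATLAB pods with the requests above the scheduler could place on one node; without limits, each of those pods can still grow well beyond its 2 GiB request. pods_by_request is a hypothetical helper, and it ignores system pods and other workloads on the node.

```shell
# Sketch: how many pods fit on one node based only on resource *requests*,
# which is what the Kubernetes scheduler uses when placing pods.
# Usage: pods_by_request <alloc_millicpu> <alloc_mib> <req_millicpu> <req_mib>
pods_by_request() {
  by_cpu=$(( $1 / $3 ))   # pods that fit by CPU request
  by_mem=$(( $2 / $4 ))   # pods that fit by memory request
  # The tighter of the two resources decides how many pods fit.
  if [ "$by_cpu" -lt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

# Hypothetical node: 4 CPUs (4000m) and 16 GiB (16384 MiB) allocatable.
# Requests from matlabResources: cpu "200m", memory "2Gi" (2048 MiB).
pods_by_request 4000 16384 200 2048   # prints 8
```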
Possible Action 1: Provision More Nodes
Provision more nodes on the cluster to meet the resource needs of the cluster.
Possible Action 2: Set Explicit Resource Limits
Set appropriate resource limits for the MATLAB Pool pod so that users cannot consume all the resources on the node.
Update the matlab-pool override file at <overrides/matlab-online-server/mathworks/matlab-pool.yaml>. Uncomment the limits section and then update it with appropriate limit values. Then run the following mosadm command to update the node:
mosadm upgrade matlab-pool
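Before running the upgrade, you can check that the limit lines in the override file are no longer commented out. The following is a heuristic sketch, not a feature of mosadm: check_limits_set is a hypothetical helper that fails if the file is missing, contains commented-out cpu or memory lines, or still contains the <cpu-limit-here> or <memory-limit-here> placeholders from the default template.

```shell
# Heuristic pre-upgrade check for the matlab-pool override file.
# Returns 1 (with a message on stderr) if the file is missing or still has
# commented-out cpu/memory lines or template placeholders; returns 0 otherwise.
# Usage: check_limits_set <path-to-matlab-pool.yaml>
check_limits_set() {
  file=$1
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  if grep -Eq '^[[:space:]]*#[[:space:]]*(cpu|memory):|<(cpu|memory)-limit-here>' "$file"; then
    echo "limits in $file are still commented out or use placeholders" >&2
    return 1
  fi
  return 0
}
```

For example: check_limits_set <path-to-matlab-pool.yaml> && mosadm upgrade matlab-pool. Because the check is line-based, a commented-out cpu or memory line anywhere in the file, including under requests, also makes it fail.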