I think I found a relevant MATLAB example, "Train Network on Image and Feature Data", which could help me. The URL is: https://www.mathworks.com/help/deeplearning/ug/train-network-on-image-and-feature-data.html
In the example, each training array is converted into a datastore with arrayDatastore, and the three datastores are then combined into dsTrain.
It seems that the order of the combined datastores is the same as the order of the inputs the network expects, followed by the response:
dsTrain = combine(dsX1Train,dsX2Train,dsTTrain);
That is, dsX1Train (image input), dsX2Train (rotation-angle feature input), dsTTrain (target output).
Am I correct?
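To check the ordering myself, here is a minimal sketch with random stand-in data. The sizes, the 10 classes and the layer names are made up for illustration (they are not the example's real data), and it assumes R2021a or later for the Name=Value syntax. It builds the datastores the same way, reads one observation from the combined datastore, and prints the network's InputNames. As far as I understand the trainNetwork documentation, the input columns of the datastore are matched to the inputs in InputNames order, and the response column comes last.

```matlab
% Hypothetical stand-in data (random), shaped like the example's inputs
numObs = 100;
X1Train = rand(28, 28, 1, numObs);            % images: H x W x C x N
X2Train = rand(numObs, 1);                    % one feature per observation (e.g. an angle)
TTrain  = categorical(randi(10, numObs, 1));  % class labels

% Wrap each array in a datastore; images are iterated along dimension 4
dsX1Train = arrayDatastore(X1Train, IterationDimension=4);
dsX2Train = arrayDatastore(X2Train);
dsTTrain  = arrayDatastore(TTrain);

% Column order of the combined datastore: image input, feature input, response
dsTrain = combine(dsX1Train, dsX2Train, dsTTrain);
data = read(dsTrain);   % 1x3 cell: {image, feature, label}
disp(size(data))

% Small two-input network following the pattern of the example (hypothetical names)
layers = [
    imageInputLayer([28 28 1], Normalization="none", Name="images")
    convolution2dLayer(3, 8, Name="conv")
    reluLayer(Name="relu")
    fullyConnectedLayer(50, Name="fc1")
    flattenLayer(Name="flatten")
    concatenationLayer(1, 2, Name="cat")
    fullyConnectedLayer(10, Name="fc2")
    softmaxLayer(Name="softmax")
    classificationLayer(Name="output")];
lgraph = layerGraph(layers);
lgraph = addLayers(lgraph, featureInputLayer(1, Name="features"));
lgraph = connectLayers(lgraph, "features", "cat/in2");

% Order in which the network's inputs are listed
disp(lgraph.InputNames)
```

If the printed InputNames list the image input first and the feature input second, the combine(dsX1Train, dsX2Train, dsTTrain) order above should line up with them.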
Still, an answer from an experienced user or a MathWorker would help a lot. :D