If you share the code you've written so far, the community will be able to help you better.
If I had to guess without seeing it, I would say the problem is that YTrain is not a cell array of size 30x1 like XTrain. Each cell of YTrain would then hold an array of size 3x1.
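A quick way to test that guess is to compare the two containers directly. The sketch below fills XTrain and YTrain with random placeholder data only so that it runs on its own. The feature count (12) and the sequence lengths are assumptions that are not in your question, and you would drop the placeholder loop and run the two checks on your real variables.

```matlab
% Hypothetical placeholder data; replace with your own XTrain and YTrain.
numObservations = 30;   % the 30x1 size mentioned above
numFeatures = 12;       % assumption: number of features per time step
numResponses = 3;       % the 3x1 response size mentioned above

XTrain = cell(numObservations, 1);
YTrain = cell(numObservations, 1);
for i = 1:numObservations
    sequenceLength = randi([10 20]);               % assumption: variable-length sequences
    XTrain{i} = rand(numFeatures, sequenceLength); % features-by-time-steps matrix
    YTrain{i} = rand(numResponses, 1);             % one 3x1 response per sequence
end

% Check 1: YTrain is a cell array with the same size as XTrain.
sameContainer = iscell(YTrain) && isequal(size(YTrain), size(XTrain));

% Check 2: every cell of YTrain holds a numResponses-by-1 array.
responseSizesOk = sameContainer && ...
    all(cellfun(@(y) isequal(size(y), [numResponses 1]), YTrain));

fprintf('YTrain matches XTrain container size: %d\n', sameContainer);
fprintf('Every response is %dx1: %d\n', numResponses, responseSizesOk);
```

If either check prints 0 on your own data, that mismatch is the first thing I would fix.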
Also, ensure that the 'OutputMode' property of the lstmLayer is set to 'last'. You can set 'NumHiddenUnits' to 3, or you can set it to a larger value as required and follow the lstmLayer with a fullyConnectedLayer whose output size is 3, similar to this example on Japanese vowel classification. The output size of the lstmLayer equals 'NumHiddenUnits', as mentioned here, and this becomes the input size of the fullyConnectedLayer.
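As a rough sketch of the second option, the layer array could start like this. It requires Deep Learning Toolbox. The feature count (12) and hidden-unit count (100) are assumptions chosen for illustration, not values from your question.

```matlab
numFeatures = 12;       % assumption: features per time step in XTrain
numHiddenUnits = 100;   % assumption: any value you find works well
numResponses = 3;       % output size discussed above

layers = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, 'OutputMode', 'last')   % emits one vector of length numHiddenUnits per sequence
    fullyConnectedLayer(numResponses)];               % maps numHiddenUnits values to 3 outputs
```

This array is deliberately left without an output layer, because that part depends on your task: the Japanese vowel example ends with a softmaxLayer and a classificationLayer, while a regression problem would end with a regressionLayer instead.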