Does the built-in "splitEachLabel" function not really randomize the image distribution?
When I use R2017b for deep learning classification, I split an imageDatastore object into training and test sets with 'splitEachLabel'. Whether I specify a number or a proportion of files, and even with the optional 'randomized' argument, the images inside the training set are not randomly ordered. Why?
I followed the documentation example: https://cn.mathworks.com/help/nnet/examples/create-simple-deep-learning-network-for-classification.html
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
'nndatasets','DigitDataset');
digitData = imageDatastore(digitDatasetPath, ...
'IncludeSubfolders',true,'LabelSource','foldernames');
trainingNumFiles = 750;
rng(1) % For reproducibility
[trainDigitData,testDigitData] = splitEachLabel(digitData, ...
trainingNumFiles,'randomize');
When I open "trainDigitData.Files" and "trainDigitData.Labels" in the workspace, their order has not been shuffled. Why is that?

Accepted Answer
More Answers (1)
xingxingcui
2018-3-1
2 Comments
debojit sharma
2023-7-8
A standard random train/test split can be risky under strong class imbalance. With very few positive cases, the train and test sets may end up with very different class distributions, and the test set may even contain close to zero positive cases. So, is there any function in MATLAB that does stratified sampling during the train/test split, so that the class balance of the samples is not disturbed? @cui @Wentao Du. Something like the following Python code:
from sklearn.model_selection import train_test_split
train, test = train_test_split(data, test_size = 0.3, stratify=data.buy)
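To make concrete what a stratified split does, here is a minimal sketch using only the Python standard library. The helper `stratified_split` and the sample labels are hypothetical, written for illustration; this is not scikit-learn's implementation, and its rounding of the per-class test count may differ from `train_test_split`. The idea is to split each class separately, which is the same per-label principle that `splitEachLabel` in the question is named for, and then to shuffle the final order as a separate step.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.3, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # random choice within the class
        n_test = round(len(indices) * test_size)  # per-class test count (assumed rounding)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Selection is random per class, but the lists are still grouped by class
    # at this point; shuffle them so the final order is mixed as well.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Hypothetical imbalanced data: 90 negatives (0) and 10 positives (1).
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_size=0.3, seed=1)
print(Counter(labels[i] for i in train_idx))
print(Counter(labels[i] for i in test_idx))
```

With these sample labels, the training set keeps 63 negatives and 7 positives and the test set gets 27 negatives and 3 positives, so both sides preserve the 9:1 class ratio. Note that random selection and random ordering are two separate steps in this sketch: without the final shuffle, the outputs would still be grouped by class.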
xingxingcui
2023-10-24