signalDatastore of a large Dataset for feedforward training
I'm trying to train a feedforward net on a very large dataset (approx. 13k files, more than 3000 rows each). Since I can't fit all the data into a single matrix for training, I tried to build a signalDatastore and pass it to the network, but I always get the same error: 'Error using trainNetwork (line 191)
Invalid training data. The output size (1) of the last layer does not match the response size (2201).
Error in NN_datastore_v2 (line 76)
net=trainNetwork(sdsTrain, layers,options);'.
Where is the mistake? I suppose it's in the read function, maybe in the format of its output? I tried several options but I can't seem to get the right combination. Please help.
Here's the full code:
clc
clear all;
Folders="********";
sds=signalDatastore(Folders,"IncludeSubfolders",true,"ReadFcn", @dataproc, 'FileExtensions','.txt');
numFiles = numel(sds.Files);
rng('default'); % For reproducibility
fileIndices = randperm(numFiles);
trainRatio = 0.7;
valRatio = 0.15;
numTrain = floor(trainRatio * numFiles);
numVal = floor(valRatio * numFiles);
% Indices for each set
trainIdx = fileIndices(1:numTrain);
valIdx = fileIndices(numTrain+1:numTrain+numVal);
testIdx = fileIndices(numTrain+numVal+1:end);
% Create the sub-datastores
sdsTrain = subset(sds, trainIdx);
sdsVal = subset(sds, valIdx);
sdsTest = subset(sds, testIdx);
%%
layers = [
featureInputLayer(429, "Normalization", "zscore")
reluLayer
...
fullyConnectedLayer(1)
regressionLayer
];
options = trainingOptions('adam', ...
'MaxEpochs', 1000, ...
'MiniBatchSize', 64, ...
'ValidationData',sdsVal,...
'OutputNetwork','best-validation',...
'Verbose',true);
%%
net=trainNetwork(sdsTrain, layers,options);
%%
%%
function data=dataproc(filename)
l_max=2500;
% opts = detectImportOptions(filename, 'Delimiter','\t');
opts=delimitedTextImportOptions("NumVariables", 442);
opts.Delimiter = "\t";
fixedVariableNames = [******];
dynamicVariableNames = "Gage" + string(1:429);
opts.VariableNames = [fixedVariableNames, dynamicVariableNames];
opts.VariableTypes = repmat("double", 1, 442);
opts=setvaropts(opts, "DecimalSeparator", ",");
tableData = readtable(filename, opts);
dataNumeric=table2array(tableData);
if size(dataNumeric,1) <l_max
data={};
return
end
% if size(dataNumeric,1) > l_max
Fz= dataNumeric(300:l_max,strcmp(fixedVariableNames, 'FzN'));
lambdas = dataNumeric(300:l_max, 14:end);
[b_butter, a_butter] = butter(7, 0.03); % Low-pass filter
window_size = 5; % Window for the median filter
outlierIndices = isoutlier(lambdas, 'mean');
lambdas(outlierIndices) = nan;
lambdas = fillmissing(lambdas, 'linear');
strain_filt = medfilt1(lambdas, window_size);
filtered_force = filtfilt(b_butter, a_butter, Fz);
% data.X =strain_filt;
% data.Y =filtered_force;
data = {strain_filt, filtered_force};
% end
end
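The response size 2201 in the error message can be traced to the row range used in dataproc: 300:l_max with l_max = 2500 selects 2500 - 300 + 1 = 2201 rows. A minimal sketch on synthetic data shows the sizes of the two arrays that the read function returns for one file. The 3000-row file and the use of column 1 as a stand-in for 'FzN' are assumptions made only for illustration.

```matlab
% Synthetic stand-in for one file: 3000 rows, 442 columns (sizes taken from the post).
l_max = 2500;
dataNumeric = rand(3000, 442);

rowRange = 300:l_max;                      % same row range as in dataproc
lambdas  = dataNumeric(rowRange, 14:end);  % the 429 gauge columns
Fz       = dataNumeric(rowRange, 1);       % ASSUMPTION: column 1 stands in for 'FzN'

data = {lambdas, Fz};                      % same layout as the cell dataproc returns

disp(size(data{1}))   % predictors returned by one read: 2201 429
disp(size(data{2}))   % responses returned by one read:  2201 1
```

So each read hands trainNetwork a single response with 2201 elements, which is the number quoted in the error.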
Answers (1)
Gayathri
2024-12-23
As per my understanding, each of your files has 2201 samples, but the network outputs only one value because the last "fullyConnectedLayer" has a single neuron. Please replace that line with the following code.
fullyConnectedLayer(2201)
This would most probably solve the issue you are facing. I have not implemented the code at my end, as I do not have access to the input data.
For more information about "fullyConnectedLayer", please refer to its documentation page.
Hope you find this information helpful!
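If each row of a file is instead meant to be a separate observation (an assumption, since the post does not say), an alternative worth trying is to keep fullyConnectedLayer(1) and have the read function return one row per observation. The sketch below uses synthetic arrays in place of strain_filt and filtered_force and only demonstrates the reshaping. Whether signalDatastore and trainNetwork accept this layout in a given release has not been tested here, so check the trainNetwork documentation on datastore input for feature data.

```matlab
% Synthetic stand-ins for the two arrays built in dataproc (sizes from the post).
strain_filt    = rand(2201, 429);   % one row per sample, one column per gauge
filtered_force = rand(2201, 1);     % one force value per sample

% Split into one observation per row: each predictor becomes a 429-by-1
% column vector and each response becomes a scalar.
predictors = num2cell(strain_filt.', 1).';   % 2201-by-1 cell of 429-by-1 vectors
responses  = num2cell(filtered_force);       % 2201-by-1 cell of scalars

data = [predictors, responses];              % 2201-by-2 cell: {predictor, response} per row
```

With this layout the response of each observation has size 1, which matches a final fullyConnectedLayer(1).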