Reducing 3D to 2D in Neural Network Training

I MING 2024-10-30, 9:20
Edited: Umang Pandey 2024-11-5, 4:19
I want to reduce 3-dimensional data to 2-dimensional data during neural network training. I do not want to do this in preprocessing; I want to do it inside the network, using either a custom layer or a built-in layer. Here is my custom layer. It does not achieve the result I want: I am trying to reduce the dimension from 128x1xN to 128xN.
classdef DownDimension < nnet.layer.Layer
    % Custom dimension-reduction layer: reduces (S*S*C) to (S*C)
    methods
        function layer = DownDimension(name)
            % Layer constructor
            layer.Name = name;
            layer.Description = "Squeeze layer from (S*S*C) to (S*C)";
        end
        function Z = predict(layer, X)
            % Forward pass
            % Assume X has size (128, 1)
            % Display the original size of X
            disp('Original size of X:');
            disp(size(X));
            % If needed, convert to (128, 1, N)
            Z = reshape(X, [size(X,1), 1, size(X,2)]);
            % Display the new size of Z
            disp('New size of Z after reshaping:');
            disp(size(Z));
        end
    end
end
3 Comments
I MING 2024-10-30, 9:38
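I first want to perform the two-dimensional computation, then reduce the dimension and pass the result through one-dimensional convolution layers. Below is my code, which implements training of the transformer encoding layer described in a journal paper.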
numHeads = 8;
numKeyChannels = 16;
modelSize=1;
kernalsize=17;
filter_num=1;
d_model=128;
frame_mo=128;
flattenedSize = [128 * 128, 1];
%% encoder_block1================
TF_block1_colmean=[
imageInputLayer([d_model frame_mo], 'Name', "input") % changed to a spectrogram input layer
selfattention2dLayer(numHeads, numKeyChannels, modelSize, "encoder_block1_self2d")
%colmean
ColumnMeanLayer("encoder_block1_colmean")
DownDimension("encoder_block1_col_downdim1")
convolution1dLayer(kernalsize, filter_num, 'Name', 'encoder_block1_colmean_conv1', 'Padding', 'same') % kernel size kernalsize (17), filter_num (1) filter
reluLayer('Name', 'encoder_block1_colmean_relu')
convolution1dLayer(kernalsize, filter_num, 'Name', 'encoder_block1_colmean_conv2', 'Padding', 'same') % kernel size kernalsize (17), filter_num (1) filter
sigmoidLayer('Name', 'encoder_block1_colmean_sigmoid')
];
TF_block1_rowmean=[
%rowmean
RowMeanLayer('encoder_block1_rowmean')
DownDimension("encoder_block1_row_downdim1")
convolution1dLayer(kernalsize, filter_num, 'Name', 'encoder_block1_rowmean_conv1', 'Padding', 'same') % kernel size kernalsize (17), filter_num (1) filter
reluLayer('Name', 'encoder_block1_rowmean_relu')
convolution1dLayer(kernalsize, filter_num, 'Name', 'encoder_block1_rowmean_conv2', 'Padding', 'same') % kernel size kernalsize (17), filter_num (1) filter
sigmoidLayer('Name', 'encoder_block1_rowmean_sigmoid')
VectorMultiplicationLayer("encoder_block1_mult")
%add&mean
additionLayer(2, 'Name', "encoder_block1_add1")
layerNormalizationLayer('Name', "encoder_block1_norm1")
];
feedforward_layers=[
%feed-forward
fullyConnectedLayer(64, 'Name', "encoder_block1_fc1")
reluLayer('Name', "encoder_block1_relu")
fullyConnectedLayer(d_model*frame_mo*1, 'Name', "encoder_block1_fc2")
reshapeLayerfc('encoder_block1_reshape', [d_model,frame_mo])
additionLayer(2, 'Name', "encoder_block1_add2")
layerNormalizationLayer('Name', "encoder_block1_norm2")
];
%encoder_block1_connect
lgraph1 = layerGraph(TF_block1_colmean);
%lgraph1 = addLayers(lgraph1,TF_block1_colmean);
lgraph1 = addLayers(lgraph1,TF_block1_rowmean);
lgraph1 = addLayers(lgraph1,feedforward_layers);
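% connect the network input to the first addition layer
lgraph1 = connectLayers(lgraph1, 'input', 'encoder_block1_add1/in2');
% connect the self-attention output to the row-mean branch
lgraph1 = connectLayers(lgraph1, 'encoder_block1_self2d', 'encoder_block1_rowmean');
% connect the column branch to the matrix multiplication layer
lgraph1 = connectLayers(lgraph1, 'encoder_block1_colmean_sigmoid', 'encoder_block1_mult/in2');
% connect the input of the second addition layer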
lgraph1 = connectLayers(lgraph1, 'encoder_block1_norm1', 'encoder_block1_add2/in2');
lgraph1 = connectLayers(lgraph1, 'encoder_block1_norm1', 'encoder_block1_fc1');
%plot(lgraph1);
analyzeNetwork(lgraph1);
My self-attention is a custom layer that implements the two-dimensional version described in the paper. Below is my 2-D self-attention layer:
classdef selfattention2dLayer < nnet.layer.Layer
    properties
        % Layer properties.
        NumHeads
        NumKeyChannels
        ModelSize
    end
    properties (Learnable)
        % Layer learnable parameters.
        QueryWeights
        KeyWeights
        ValueWeights
        OutputWeights
    end
    methods
        function layer = selfattention2dLayer(numHeads, numKeyChannels, modelSize, name)
            % Create an instance of the layer.
            layer.Name = name;
            layer.NumHeads = numHeads;
            layer.NumKeyChannels = numKeyChannels;
            layer.ModelSize = modelSize;
            % Initialize learnable parameters.
            layer.QueryWeights = randn([1 1 modelSize numKeyChannels], 'single');
            layer.KeyWeights = randn([1 1 modelSize numKeyChannels], 'single');
            layer.ValueWeights = randn([1 1 modelSize numKeyChannels], 'single');
            layer.OutputWeights = randn([1 1 numKeyChannels*numHeads modelSize], 'single');
        end
        function Z = predict(layer, X)
            % Implement the forward pass of the layer.
            % X is the input feature map (H x W x C x N).
            % Specify data format
            dataFormat = 'SSCB'; % 'SSCB' indicates spatial-spatial-channel-batch
            % Create zero biases
            zeroBiasQuery = zeros(1, 1, size(layer.QueryWeights, 4), 'single');
            zeroBiasKey = zeros(1, 1, size(layer.KeyWeights, 4), 'single');
            zeroBiasValue = zeros(1, 1, size(layer.ValueWeights, 4), 'single');
            zeroBiasOutput = zeros(1, 1, size(layer.OutputWeights, 4), 'single');
            % Linear projections.
            Q = dlconv(X, layer.QueryWeights, zeroBiasQuery, 'DataFormat', dataFormat);
            K = dlconv(X, layer.KeyWeights, zeroBiasKey, 'DataFormat', dataFormat);
            V = dlconv(X, layer.ValueWeights, zeroBiasValue, 'DataFormat', dataFormat);
            % Get the dimensions.
            [H, W, ~, N] = size(X);
            % Check dimensions before reshaping
            % sizeQ = size(Q);
            % sizeK = size(K);
            % sizeV = size(V);
            % disp(['Size of Q: ', num2str(sizeQ)]);
            % disp(['Size of K: ', num2str(sizeK)]);
            % disp(['Size of V: ', num2str(sizeV)]);
            % Reshape Q, K, V to [H*W, numKeyChannels, NumHeads, N]
            Q = reshape(Q, [], layer.NumKeyChannels, layer.NumHeads, N);
            K = reshape(K, [], layer.NumKeyChannels, layer.NumHeads, N);
            V = reshape(V, [], layer.NumKeyChannels, layer.NumHeads, N);
            % Permute Q, K, V to [H*W, NumHeads, numKeyChannels, N] for batch-wise computation
            Q = permute(Q, [1, 3, 2, 4]);
            K = permute(K, [1, 3, 2, 4]);
            V = permute(V, [1, 3, 2, 4]);
            % Scaled dot-product attention.
            dk = size(K, 3);
            scores = pagemtimes(Q, 'none', K, 'transpose') / sqrt(dk);
            attentionWeights = softmax(scores, 'DataFormat', 'SSCB');
            A = pagemtimes(attentionWeights, V);
            % Permute A back to [H*W, numKeyChannels*NumHeads, N]
            A = permute(A, [1, 3, 2, 4]);
            % Reshape A to [H, W, numKeyChannels*NumHeads, N]
            A = reshape(A, H, W, [], N);
            % Update OutputWeights dimensions to match A's channel dimension
            layer.OutputWeights = randn([1 1 size(A, 3) size(layer.OutputWeights, 4)], 'single');
            % Concatenate heads and project.
            Z = dlconv(A, layer.OutputWeights, zeroBiasOutput, 'DataFormat', dataFormat);
            % disp(['Converted self-Z type: ', class(Z)]);
        end
    end
end
I MING 2024-10-30, 9:38
Yes, I tried to reduce the dimension from 128x1xN to 128xN


Answers (1)

Umang Pandey 2024-11-5, 4:19
Edited: Umang Pandey 2024-11-5, 4:19
Hi,
You can use the "squeeze" function to convert an array of size 128x1xN to 128xN. The function removes every dimension of length 1. Refer to the MATLAB documentation for "squeeze" for implementation details and examples.
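As a minimal sketch of this behaviour, the script below uses random data as a stand-in for the real activations (the batch size N = 5 is an arbitrary assumption) and compares "squeeze" with an explicit "reshape":

```matlab
% Random stand-in for a 128x1xN activation; N = 5 is an assumed batch size.
N = 5;
X = rand(128, 1, N);

% squeeze removes every dimension of length 1.
Z = squeeze(X);
disp(size(Z));            % 128 5

% Explicit alternative: drops only the second dimension, even when
% another dimension happens to have length 1.
Z2 = reshape(X, size(X, 1), size(X, 3));
disp(isequal(Z, Z2));     % 1
```

Inside a custom layer, the same call would go in the predict method. Note that the reshape in the question goes in the opposite direction, from (128, N) to (128, 1, N). One caveat, offered as an assumption about this setup and not as a tested fix: a custom layer that changes the number of dimensions generally also has to inherit from nnet.layer.Formattable and return a labeled dlarray (for example "SCB"), so that the following convolution1dLayer receives a format it accepts.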
Note, however, that "squeeze" only removes dimensions of length 1. If you instead want to perform dimensionality reduction, that is, reduce the number of dimensions while preserving as much of the information in the data as possible, you can use popular DR techniques such as PCA (Principal Component Analysis) or SOM (Self-Organizing Maps). Refer to the MATLAB documentation on these techniques for more information.
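To illustrate what PCA-style reduction looks like, here is a small sketch on made-up data (200 observations of 3 features, reduced to 2). It uses the base MATLAB "svd" function so that it runs without extra toolboxes; the "pca" function in Statistics and Machine Learning Toolbox is the more usual route. The data, the variable names, and the choice k = 2 are all assumptions for illustration only.

```matlab
% Hypothetical data: 200 observations of 3 features, two of them correlated.
rng(0);
t = randn(200, 1);
X = [t, 2*t + 0.1*randn(200, 1), randn(200, 1)];

% Center each column, then take the SVD of the centered data.
Xc = X - mean(X, 1);
[~, S, V] = svd(Xc, 'econ');

% Project onto the first k principal directions (k = 2 is an assumption).
k = 2;
Y = Xc * V(:, 1:k);       % 200x2 reduced representation

% Fraction of the total variance kept by the first k components.
explained = sum(diag(S(1:k, 1:k)).^2) / sum(diag(S).^2);
disp(size(Y));
disp(explained);
```

Unlike "squeeze", this changes the data itself: each row of Y is a projection of the original row, so some information is discarded whenever the explained fraction is below 1.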
Best,
Umang
