How to segment dataset and randomly sample and append datapoints?
I'm attempting to recreate an example from this paper. The method is essentially segmented random sampling. It is similar to sampling across the entire dataset, but it ensures that each segment has an equal chance of being represented.
Assume there's a table:
T(:,1) = [3.0, 5.6, 10.2, 12.0, 14.4, 15.6];
T(:,2) = ["08-Feb-2019 12:34:52", "11-Feb-2019 16:07:17", "16-Feb-2019 14:50:31", "20-Feb-2019 05:43:51", "25-Feb-2019 07:55:24", "02-Mar-2019 11:06:27"];
The table is divided into s=3 segments, resulting in the following divisions:
Seg1(:,1) = [3.0, 5.6];
Seg1(:,2) = ["08-Feb-2019 12:34:52", "11-Feb-2019 16:07:17"];
Seg2(:,1) = [10.2, 12.0];
Seg2(:,2) = ["16-Feb-2019 14:50:31", "20-Feb-2019 05:43:51"];
Seg3(:,1) = [14.4, 15.6];
Seg3(:,2) = ["25-Feb-2019 07:55:24", "02-Mar-2019 11:06:27"];
I need to randomly sample n-1 of the n datapoints in each segment, repeating the random sampling nCr times per segment, where r = n-1 (i.e., select 1 datapoint out of 2 from Seg1, then repeat so that 2C1 = 2 selections are made from Seg1 in total, which means sampling 1 more time. Repeat for each segment).
Then, to create the new datasets, we append the randomly selected datapoints from each segment. This should result in (nCr)^s new datasets, or in this case (2C1)^3 = 8 new datasets. An example of one dataset is:
T1(:,1) = [3.0, 10.2, 14.4];
T1(:,2) = ["08-Feb-2019 12:34:52","16-Feb-2019 14:50:31", "25-Feb-2019 07:55:24"];
This is my attempt to code the above.
numRows=size(T,1); %Establish total number of rows
numSeg = 3; % Split it into 3 segments
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
% Splitting the table into numSeg segments with splitIndex number of datapoints
for m = 1:splitIndex:numRows
    for i = length(numSeg)
        Seg{i}= T(m);
    end
end
% Randomly sample splitIndex-1 datapoints from each segment, repeat
% splitIndex choose splitIndex-1 times
for i = length(numSeg)
    datasample(Seg{i})
    T{i} = concat(Seg{i})
end
I'm especially struggling with randomly sampling from each segment and then matching the randomly sampled datapoints so they can be appended to those from each of the following segments. Thank you!
Answers (1)
Image Analyst
2024-4-23
Try this:
% Create initial full table.
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = {"08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"};
T = table(col1, col2)
% Get a list of randomly chosen rows with none repeated and none missing.
numSeg = 3; % Split it into 3 segments
numRows = height(T);
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
randomRows = randperm(numRows)
% Create 3 tables with random rows with no repeated rows.
t1 = T(randomRows(1:splitIndex), :)
t2 = T(randomRows(splitIndex+1: 2*splitIndex), :)
t3 = T(randomRows(2*splitIndex+1:3*splitIndex), :)
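Note that randperm over all rows mixes rows from different segments. If the goal is instead the one described in the question (keep the segments contiguous, take n-1 rows from each segment, and build every combination), a minimal sketch could look like the following. It assumes each segment has at least 2 rows and that any leftover rows after the last full segment are ignored. The names segLen, k, choices, combos and newSets are illustrative and not from the original post. It enumerates all (nCr)^s datasets rather than drawing them at random, and picks one at random at the end.

```matlab
% Example data from the question, with the date strings kept as a string column.
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; ...
        "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"];
T = table(col1, col2);

numSeg = 3;                           % s: number of segments
segLen = floor(height(T) / numSeg);   % n: rows per segment (assumed >= 2)
k = segLen - 1;                       % r: rows kept from each segment

% choices{s} lists every k-subset of the row indices of segment s,
% one subset per row of the matrix.
choices = cell(1, numSeg);
for s = 1:numSeg
    segRows = (s-1)*segLen + (1:segLen);
    choices{s} = nchoosek(segRows, k);
end

% Cartesian product over segments: each row of combos selects one
% subset per segment, giving (nCr)^s rows in total.
counts = cellfun(@(c) size(c, 1), choices);
idxLists = arrayfun(@(n) 1:n, counts, 'UniformOutput', false);
grids = cell(1, numSeg);
[grids{:}] = ndgrid(idxLists{:});
combos = cell2mat(cellfun(@(g) g(:), grids, 'UniformOutput', false));

% Build one new table per combination by appending the chosen rows
% of each segment in segment order.
newSets = cell(size(combos, 1), 1);
for c = 1:size(combos, 1)
    rowIdx = zeros(1, numSeg*k);
    for s = 1:numSeg
        rowIdx((s-1)*k + (1:k)) = choices{s}(combos(c, s), :);
    end
    newSets{c} = T(rowIdx, :);
end

fprintf('%d datasets generated\n', numel(newSets));

% To use a single randomly chosen dataset instead of all of them:
oneAtRandom = newSets{randi(numel(newSets))};
```

For the six-row example this produces 8 tables of 3 rows each, and newSets{1} holds the rows with values 3.0, 10.2 and 14.4, matching the T1 example in the question. The number of combinations grows quickly with segment size and segment count, so for larger tables drawing a random subset per segment with randperm(segLen, k) may be more practical than full enumeration.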