How to segment dataset and randomly sample and append datapoints?

I'm attempting to recreate an example from this paper. It is essentially segmented random sampling: similar to sampling across the entire dataset, but ensuring that each segment has an equal chance of being represented.
Assume there's a table:
T(:,1) = [3.0, 5.6, 10.2, 12.0, 14.4, 15.6];
T(:,2) = ["08-Feb-2019 12:34:52", "11-Feb-2019 16:07:17", "16-Feb-2019 14:50:31", "20-Feb-2019 05:43:51", "25-Feb-2019 07:55:24", "02-Mar-2019 11:06:27"];
The table is divided into s=3 segments, resulting in the following divisions:
Seg1(:,1) = [3.0, 5.6];
Seg1(:,2) = ["08-Feb-2019 12:34:52", "11-Feb-2019 16:07:17"];
Seg2(:,1) = [10.2, 12.0];
Seg2(:,2) = ["16-Feb-2019 14:50:31", "20-Feb-2019 05:43:51"];
Seg3(:,1) = [14.4, 15.6];
Seg3(:,2) = ["25-Feb-2019 07:55:24", "02-Mar-2019 11:06:27"];
I need to randomly sample r = n-1 out of the n datapoints in each segment, repeating the random sampling nCr times per segment (i.e., select 1 datapoint out of 2 from Seg1, then repeat the sampling once more for a total of 2C1 = 2 draws from Seg1, and do the same for each segment).
Then, to create each new dataset, we append the randomly selected datapoints from every segment. This should result in (nCr)^s new datasets, or in this case (2C1)^3 = 8 new datasets. An example of one dataset is:
T1(:,1) = [3.0, 10.2, 14.4];
T1(:,2) = ["08-Feb-2019 12:34:52","16-Feb-2019 14:50:31", "25-Feb-2019 07:55:24"];
This is my attempt to code the above.
numRows=size(T,1); %Establish total number of rows
numSeg = 3; % Split it into 3 segments
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
% Splitting the table into numSeg segments with splitIndex number of datapoints
for m = 1:splitIndex:numRows
    for i = length(numSeg)
        Seg{i}= T(m);
    end
end
% Randomly sample splitIndex-1 datapoints from each segment, repeat
% splitIndex choose splitIndex-1 times
for i = length(numSeg)
    datasample(Seg{i})
    T{i} = concat(Seg{i})
end
I'm especially struggling with randomly sampling from each segment and then matching the randomly sampled datapoints to append from each of the following segment. Thank you!
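For the segmentation step described in the question, a minimal sketch of splitting the table into contiguous segments might look like the following. It assumes the number of rows divides evenly by the number of segments (leftover rows would be dropped), and the names `col1`, `col2`, and `segments` are chosen here for illustration to match names that appear elsewhere in the thread.

```matlab
% Example data copied from the question; timestamps kept as strings.
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; ...
        "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; ...
        "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"];
T = table(col1, col2);

numSeg = 3;                                  % number of segments
splitIndex = floor(height(T) / numSeg);      % rows per segment

% One cell per segment, each holding a block of consecutive rows.
segments = cell(numSeg, 1);
for k = 1:numSeg
    rows = (k-1)*splitIndex + (1:splitIndex);   % e.g. 1:2, 3:4, 5:6
    segments{k} = T(rows, :);
end
```

Storing the segments in a cell array rather than in separately named variables keeps the later sampling loop independent of the number of segments.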

Answers (1)

Image Analyst 2024-4-23
Try this:
% Create initial full table.
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = {"08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"};
T = table(col1, col2)
% Get a list of randomly chosen rows with none repeated and none missing.
numSeg = 3; % Split it into 3 segments
numRows = height(T);
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
randomRows = randperm(numRows)
% Create 3 tables with random rows with no repeated rows.
t1 = T(randomRows(1:splitIndex), :)
t2 = T(randomRows(splitIndex+1: 2*splitIndex), :)
t3 = T(randomRows(2*splitIndex+1:3*splitIndex), :)
6 Comments
Joy 2024-4-24
This is very helpful, particularly in separating the segments into cells. For my purposes, it seems that datasample is more suitable for what I'm trying to obtain. In the last section I believe you're rearranging the rows in each segment. However, I'm trying to sample splitIndex-1 datapoints from each segment (one fewer than the number of datapoints it contains).
I'm trying to generate 8 new tables that mimic the original table. What I'm hoping the following code does is this:
There will be nchoosek(splitIndex,splitIndex-1)^numSeg datasets created, collected in TNew, and each one is populated by appending a random sample from each segment. There should be 8 cells in TNew, with each cell holding a table of size [6,2].
TNew = zeros([numSeg*splitIndex-1 2]);
for i = 1:(nchoosek(splitIndex,splitIndex-1))^numSeg
    for k = 1 : numel(segments)
        randDataSample = vertcat(datasample(segments{k},splitIndex-1));
        TNew{i}=randDataSample;
    end
end
When I execute datasample(segments{k},splitIndex-1), I see it outputting the format I want, but I'm having issues with vertically concatenating the randomly sampled datapoints.
Again, thank you so much for all the help.
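One way to approach the concatenation issue is to collect the per-segment samples in a cell array first and call `vertcat` once on the whole collection. The sketch below builds a single new table that way. It is an illustration under stated assumptions, not a confirmed fix from the thread: `segments` is rebuilt here as contiguous blocks of rows, and `randperm` from base MATLAB is swapped in for `datasample` so the example does not depend on a toolbox. Calling `datasample(segments{k}, nPick, 'Replace', false)` should be a comparable choice when that toolbox is available.

```matlab
% Example data and contiguous segments (assumed construction).
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; ...
        "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; ...
        "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"];
T = table(col1, col2);
numSeg = 3;
splitIndex = floor(height(T) / numSeg);
segments = cell(numSeg, 1);
for k = 1:numSeg
    segments{k} = T((k-1)*splitIndex + (1:splitIndex), :);
end

nPick = splitIndex - 1;          % rows to draw from each segment
picked = cell(numSeg, 1);        % one sampled sub-table per segment
for k = 1:numSeg
    % Draw nPick distinct row indices; sort to keep the original row order.
    idx = sort(randperm(height(segments{k}), nPick));
    picked{k} = segments{k}(idx, :);
end

% Stack the sampled sub-tables; they share the same variable names.
TNew = vertcat(picked{:});
```

Assigning inside the inner loop, as in `TNew{i} = randDataSample`, overwrites the previous segment's sample on every pass, which is why the samples are gathered in `picked` before being stacked.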
Joy 2024-4-25
I've gotten to the point where I can generate a cell array, and each cell should hold the datapoints drawn from the segments. Do you have any suggestions on how to concatenate the tables drawn from each segment?
%%
NumNewDatasets = (nchoosek(splitIndex,splitIndex-1))^numSeg;
TNew=cell(NumNewDatasets,1);
for k = 1 : numel(segments)
    sampledData(k)= datasample(segments{k},splitIndex-1)
    TNew{k,:}=table2cell(sampledData);
end
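If the goal is exactly (nCr)^s distinct datasets, repeated random draws may return the same combination more than once, so one option is to enumerate every combination instead of sampling. The following sketch does that with `nchoosek` and `ndgrid`. The enumeration is a swapped-in technique, not something the thread confirms as the paper's method, and the variable names `combos`, `choice`, and `parts` are hypothetical.

```matlab
% Example data from the question.
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; ...
        "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; ...
        "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"];
T = table(col1, col2);
numSeg = 3;
splitIndex = floor(height(T) / numSeg);   % rows per segment
nPick = splitIndex - 1;                   % rows kept per segment

% Every way to choose nPick rows inside one segment (one combination per row).
combos = nchoosek(1:splitIndex, nPick);
nCombos = size(combos, 1);

% Every assignment of one combination index to each segment.
grids = cell(1, numSeg);
[grids{:}] = ndgrid(1:nCombos);
choice = cell2mat(cellfun(@(g) g(:), grids, 'UniformOutput', false));

% Build one table per assignment by stacking the chosen rows of each segment.
TNew = cell(size(choice, 1), 1);
for i = 1:size(choice, 1)
    parts = cell(numSeg, 1);
    for k = 1:numSeg
        offset = (k-1)*splitIndex;        % first row of segment k, minus 1
        parts{k} = T(offset + combos(choice(i, k), :), :);
    end
    TNew{i} = vertcat(parts{:});
end
```

With the example data, `TNew` holds 8 tables of 3 rows each. A random subset of them could then be selected with `randperm(numel(TNew), m)` if fewer datasets are wanted.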

Release

R2023b
