Find random values that match a specific criterion

I have a table with 2000 rows and 8 columns collected from traffic data. However, the only columns I'm interested in are speed and time.
I have saved the time and speed columns as vectors. The time column format was in datetime but I have converted it to datenum to make it easier to work with.
time, speed
'2022-03-01 05:10:03', 55
.....
I want to find values of speed that are greater than the median speed value and then randomly keep only 20% of these values.
So far, I have attempted this:
speed_idx = find(speed > median(speed));% find the index of speed values > median speed
speed_idx_red = round((20/100)*length(speed_idx)); %keeping 20% of values > median speed
final_speed = speed(randperm(length(speed), speed_idx_red)); %randomised 20% of speed values
  1. How can I simplify this? I think I'm finding the wrong final_speed, as I should perform the randomization first and then keep 20% of those values, but I'm not sure how to do that.
  2. How can I find the time values that match the random speed values I've found?
Any help would be much appreciated!

Accepted Answer

Fifteen12 2022-12-4
Edited: Fifteen12 2022-12-4
Broken down:
speed = randi(80, 10, 1); % example data
m = median(speed);
pool = speed(speed > m); % speeds above the median
len = length(pool);
final_speed = pool(randi(len, floor(len * 0.2), 1)); % randi draws with replacement, so values can repeat
Find the time values:
time = randi(100, 10, 1);
final_time = time(ismember(speed, final_speed));
Note that this method can give you more values for final_time than for final_speed if any of the selected speed values occur more than once in speed. You'll have to choose how to handle duplicates. If you don't care about duplicates, and just want the random values, you can use unique to strip repeated values from speed before finding the indices of time.
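One way to sidestep the duplicate problem is to sample row indices instead of values. The following is only a sketch under assumptions: the data is made up, and the names above_idx, n_keep, pick and keep_idx are illustrative. It uses randperm(n, k), which returns k distinct integers, so no row is picked twice and the same indices can be applied to both vectors.

```matlab
% Sketch with made-up data; in practice speed and time come from the table.
rng(0);                                    % fix the seed so the run is repeatable
speed = randi(80, 2000, 1);                % hypothetical speeds
time  = 738581 + sort(rand(2000, 1));      % hypothetical datenum values

above_idx = find(speed > median(speed));   % row numbers of speeds above the median
n_keep    = round(0.2 * numel(above_idx)); % 20% of those rows
pick      = randperm(numel(above_idx), n_keep); % n_keep distinct positions, no repeats
keep_idx  = above_idx(pick);               % map back to rows of the original vectors

final_speed = speed(keep_idx);
final_time  = time(keep_idx);              % same rows, so each time matches its speed
```

Because final_speed and final_time are built from the same keep_idx, they always have the same length, even when speed contains repeated values.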
  2 Comments
newb_matlab help 2022-12-4
Can the speed be simplified to:
speed_median = speed(speed > median(speed));
final_speed = speed_median(randi(length(speed_median),floor(length(speed_median) * 0.2),1));
Also, for the time, you were right. I have a lot of duplicates and so the time vector is longer than the speed vector. I'll think of some ways to find the exact time values that match the random speed values. Thanks!
Fifteen12 2022-12-4
Edited: Fifteen12 2022-12-4
Yes, this works, as long as you use speed and not speed_median when you search for the time indices; otherwise your indices will be off.
You can use unique to strip the duplicates from both vectors, for instance:
speed = randi(100, 10, 1);
time = randi(100, 10, 1);
[speed, ia] = unique(speed, 'stable'); % ia holds the row of the first occurrence of each speed
time = time(ia); % keep only the time paired with each retained speed
This removes all the duplicate speeds from time as well as speed. I think you'll need to do this before you do your median check (and randomization), otherwise you might index values that no longer exist.
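If the data is still in the original table, the same selection can be done on table rows, which keeps time as datetime and avoids the datenum conversion. This is a rough sketch only: the table T, its variable names time and speed, and the generated values are all assumptions standing in for the real traffic data.

```matlab
% Hypothetical table standing in for the traffic data (only two of the 8 columns).
rng(1);
n = 2000;
T = table(datetime(2022, 3, 1, 5, 10, 3) + seconds(sort(randi(86400, n, 1))), ...
          randi(80, n, 1), 'VariableNames', {'time', 'speed'});

rows = find(T.speed > median(T.speed));                        % rows above the median speed
rows = rows(randperm(numel(rows), round(0.2 * numel(rows))));  % random 20% of them, no repeats
subset = T(rows, {'time', 'speed'});                           % time and speed stay paired per row
```

Selecting whole rows this way does not require removing duplicate speeds first, since nothing is looked up by value.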
