Moving skewness/k​urtosis/rm​s/autocorr​elation for time series. Fast Implementation.

20 次查看(过去 30 天)
Hello,
I want a really really fast implementation of computing a moving window skewness over my timeseries as I have to do this for a lot of different window sizes and the application is time critical (and similarly do this for other measures such as kurtosis, rms, autocorrelation etc. Here I focus on skewness.
I have an array of time series data state_timeseries (~9,880,000) and the corresponding time values.
I want to use a window size (~300,000 - 4,000,000) and a step size of (= 1% of corresponding window size = 3000 - 40000) to find the moving window skewness of this time series.
This is to be done for all window sizes from the following list:
window_size_list = floor(linspace(300000, 4000000, 200));
step_size_list = floor(window_size_list * (1 / 100));
For parallel computing I have 16 workers with ~130GB RAM and the GeForce RTX 2070 Super 8GB in size.
EDIT: Is it possible to get good speed gains if I make these calculations in C++ instead? (I don't know C++, but reading online makes me think this). Then get the generated moving window skewness timeseries back into matlab for further processing?
My Current Implementation:
Currently, I am simply looping over (both window sizes and for each window the moving window) as follows:
% Dummy data of equal length
state_timeseries = randn(1, 10000000);
time = 1: length(state_timeseries);
total_window_count = 200; % 200 as used above
window_size_list = floor(linspace(300000, 4000000, total_window_count));
step_size_list = floor(window_size_list * (1 / 100));
for k = 1: total_window_count
% Set window size and step size to calculate the corresponding EWS timeseries
window_size = window_size_list(k);
window_step = step_size_list(k);
% Generate EWS timeseries for particular window size
Data_skewness_timeseries{k} = sk_timeseries_func(time, state_timeseries, window_size, window_step);
end
function Data = sk_timeseries_func(time, state_timeseries, window_size, window_step)
% Use GPU if available
gpu_available = canUseGPU();
if gpu_available == 1
time = gpuArray(time);
state_timeseries = gpuArray(state_timeseries);
end
% Get the indices where windows start and the corresponding times at which they end.
% This is to specify the skewness of a window at the time value at the end of the window.
window_idx = 1: window_step: (length(time) - window_size + 1);
time_window_ends_idx = window_idx + window_size - 1;
time_window_ends = time(time_window_ends_idx);
sk_timeseries = zeros(1, length(time_window_ends));
for i_window = 1: length(window_idx)
timeseries_window_data = state_timeseries(window_idx(i_window): window_idx(i_window) + window_size - 1);
sk_timeseries(i_window) = skewness(timeseries_window_data);
end
Data.time_window_ends = time_window_ends;
Data.sk_timeseries = sk_timeseries;
end

回答(2 个)

Yatharth
Yatharth 2023-8-30
编辑:Yatharth 2023-8-30
Hi understand that you want to improve the speed of computing the moving window skewness for your time series data.
You can consider the following optimizations:
  1. Utilize Parallel Computing: Since you have 16 workers available, you can use MATLAB's Parallel Computing Toolbox to distribute the computation across multiple workers. This can significantly speed up the calculations. You can use the 'parfor' loop instead of the regular 'for' loop to parallelize the computation. However, keep in mind that the parallelization overhead might not be worth it for smaller window sizes.
  2. Preallocate Memory: Preallocating memory for the results can improve the performance. Instead of initializing 'sk_timeseries' as an empty array, you can preallocate it with the appropriate size using 'zeros' or 'NaN' values.
  3. Vectorization: Whenever possible, try to avoid explicit loops and use vectorized operations. This can significantly improve the execution speed. In your case, you can calculate the moving window skewness for all windows simultaneously using indexing and vectorized operations.
Here's an optimized version of your code incorporating these suggestions:
% Dummy data of equal length
state_timeseries = randn(1, 10000000);
time = 1:length(state_timeseries);
total_window_count = 200; % 200 as used above
window_size_list = floor(linspace(300000, 4000000, total_window_count));
step_size_list = floor(window_size_list * (1 / 100));
% Preallocate cell array for results
Data_skewness_timeseries = cell(1, total_window_count);
% Use GPU if available
gpu_available = canUseGPU();
if gpu_available == 1
time = gpuArray(time);
state_timeseries = gpuArray(state_timeseries);
end
parfor k = 1:total_window_count
% Set window size and step size to calculate the corresponding EWS timeseries
window_size = window_size_list(k);
window_step = step_size_list(k);
% Get the indices where windows start and the corresponding times at which they end.
% This is to specify the skewness of a window at the time value at the end of the window.
window_idx = 1:window_step:(length(time) - window_size + 1);
time_window_ends_idx = window_idx + window_size - 1;
time_window_ends = time(time_window_ends_idx);
% Calculate the moving window skewness using vectorized operations
sk_timeseries = arrayfun(@(idx) skewness(state_timeseries(idx:idx+window_size-1)), window_idx);
% Store the results in the cell array
Data_skewness_timeseries{k} = struct('time_window_ends', time_window_ends, 'sk_timeseries', sk_timeseries);
end
Regarding your question about implementing these calculations in C++, it is possible to achieve better performance by implementing the computation in a lower-level language like C++. MATLAB provides a way to call C++ code from MATLAB using the MATLAB Engine API. You can write the computationally intensive part of the code in C++ and then call it from MATLAB to get the generated moving window skewness time series back into MATLAB for further processing. This approach can potentially provide significant speed gains, especially for large datasets.
Here are some of the useful resources for you:
I hope this helps.

Steven Lord
Steven Lord 2023-9-15
Currently MATLAB does not have any functions for computing the moving skewness, kurtosis, rms, or autocorrelation. For a list of the moving statistics functions available in MATLAB, see the Moving Statistics section in the Descriptive Statistics category of the documentation. We have entered this into the enhancement database for consideration for a future release of MATLAB.

类别

Help CenterFile Exchange 中查找有关 Parallel Computing 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by