How to datasample exponential data without losing the exponential decay?
Hi all!
So this is the question:
I have a table with one column (std_spk_avg, attached) containing 400 numbers. The data follow an exponential distribution, so when I resample them with MATLAB's 'resample' function to obtain 1000 iterations, I lose the exponential decay in each iteration...
How can I use this function without losing the exponential decay in my 1000 iterations?
Thank you all in advance :)
Accepted Answer
Star Strider
2024-8-16
Your data are not exponentially distributed; however, they are convincingly lognormally distributed.
They do not have a corresponding independent variable (for example, a time vector); however, it is straightforward to create one. You can then use it to sample your data.
Other than that, what exactly do you want to do with them?
Try this:
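load('std_spk_avg.mat')                       % loads the vector std_spk_avg
L = numel(std_spk_avg)                        % number of samples
idxv = (1:L).';                               % index vector, used as the independent variable
figure
plot(std_spk_avg)                             % original data in time order
figure
histfit(std_spk_avg, 100, 'lognormal')        % 100-bin histogram with a fitted lognormal pdf
idxrs = sort(randsample(idxv, 50), 'ascend')  % Randomly Sample The Index Vector, then restore time order
figure
plot(idxrs, std_spk_avg(idxrs))               % Plot The Result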
Sampling the index vector and then plotting the sampled data against it preserves the exponential character of your original data.
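The question asks for 1000 iterations. A minimal sketch of extending the index-sampling idea to that, assuming a synthetic decaying series in place of std_spk_avg (the attached file is not available here) and 50 points per iteration as in the answer above; randsample needs the Statistics and Machine Learning Toolbox:

```matlab
% Synthetic stand-in for std_spk_avg (assumption: decaying values plus noise)
rng(0);
N = 400;
x = 2*exp(-(0:N-1).'/40) + 0.5 + 0.05*randn(N, 1);

nIter = 1000;        % number of resampled versions requested in the question
k     = 50;          % points kept per iteration (assumed, as in the answer)
idx   = zeros(k, nIter);
for i = 1:nIter
    % Draw k distinct indices without replacement, then restore time order
    idx(:, i) = sort(randsample(N, k), 'ascend');
end
ySub = x(idx);       % k-by-nIter matrix; each column is a time-ordered subsample
```

Because every column keeps its indices in ascending order, plotting a column of ySub against the matching column of idx shows the same decay as the full series, just more sparsely sampled.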
4 comments
Star Strider
2024-8-17
As always, my pleasure!
No worries! I was initially a bit confused as to what you intended by ‘exponentially distributed’, so I added the distribution to my answer.
More Answers (2)
Les Beckham
2024-8-16
Edited: Les Beckham
2024-8-16
I guess I don't understand what problem you are seeing. resample seems to work just fine here (see below). I assumed that your original data was sampled with a period of one second. What are you "losing"?
Note that I would not describe this data as having an "exponential decay".
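load('std_spk_avg.mat')
numel(std_spk_avg)                            % number of samples
plot(std_spk_avg)
grid on
ts = timeseries(std_spk_avg);                 % default time vector 0, 1, 2, ... (1 s spacing)
t = linspace(0, numel(std_spk_avg)-1, 1000);  % 1000 evenly spaced times over the same span
rs = resample(ts, t);                         % interpolate the timeseries at the new times
rs.Name = 'std_spk_avg resampled'
plot(rs)
grid on
rs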
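For comparison, a sketch of the same operation with plain interp1, under the same one-second-spacing assumption and with synthetic data in place of the attached file. I believe linear interpolation matches the default behaviour of timeseries resample, but check the timeseries documentation for your release:

```matlab
% Synthetic stand-in for std_spk_avg, assumed sampled once per second
rng(0);
N  = 400;
x  = 2*exp(-(0:N-1).'/40) + 0.5 + 0.05*randn(N, 1);

t0 = (0:N-1).';                     % original sample times (assumed 1 s apart)
t1 = linspace(0, N-1, 1000).';      % 1000 evenly spaced query times, same span
xr = interp1(t0, x, t1, 'linear');  % linear interpolation onto the finer grid
```

Interpolation only adds points between existing samples and keeps their time order, so the decay stays where it was.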
William Rose
2024-8-16
You want a matrix that is 400x1000, where the 1000 columns are "resampled" versions of the original data. You say the data have an exponential distribution, and you say that you do not want to lose the exponential decay when you resample. Is there a possible confusion about the distinction between exponential distribution and exponential decay? Is there possible confusion about resampling versus reshuffling?
MATLAB's resample() keeps the time order of the data the same. It samples the data faster or slower than the original sampling. I think this is not what you want, since it would produce resampled data with more (if faster) or fewer (if slower) than 400 samples.
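A small sketch of that effect, assuming the Signal Processing Toolbox form resample(x, p, q), which changes the sample rate by p/q and returns ceil(numel(x)*p/q) samples; synthetic data stands in for std_spk_avg:

```matlab
% Requires Signal Processing Toolbox. Synthetic stand-in data (assumption).
rng(0);
N = 400;
x = 2*exp(-(0:N-1).'/40) + 0.5 + 0.05*randn(N, 1);

yUp   = resample(x, 5, 2);   % rate x 5/2: ceil(400*5/2) = 1000 samples
yDown = resample(x, 1, 2);   % rate x 1/2: ceil(400/2)   = 200 samples
```

One caveat worth checking in the resample documentation: its anti-aliasing filter treats the signal as zero outside the given samples, so a series that starts high (as a decay does) can show edge artifacts at the beginning.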
Reshuffling is used in many machine learning applications and in some statistical testing. It will preserve the number of samples and will destroy any time correlations in the data.
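A sketch of both properties on synthetic, time-correlated data (an assumption in place of std_spk_avg): a shuffled copy has the same number of samples and exactly the same values, but its lag-1 autocorrelation drops to roughly zero:

```matlab
rng(0);
N = 400;
x = 2*exp(-(0:N-1).'/40) + 0.5 + 0.05*randn(N, 1);  % synthetic, strongly time-correlated

xs = x(randperm(N));                 % one reshuffled copy

% Lag-1 autocorrelation: correlation between each sample and the next one
r  = corrcoef(x(1:end-1),  x(2:end));
rs = corrcoef(xs(1:end-1), xs(2:end));
fprintf('lag-1 autocorrelation: original %.3f, shuffled %.3f\n', r(1,2), rs(1,2));
```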
Neither reshuffling nor resampling will alter the distribution of the data. The distribution is independent of the time ordering. A histogram of the data reveals the distribution of the sample. @Star Strider plotted the distribution of your data, and of the best-fit lognormal distribution. His plot shows that the sample histogram looks more like a lognormal distribution than an exponential distribution. I suspect that when you said your data has an exponential distribution, you meant to say that the data shows exponential decay with time (plus random noise).
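The visual lognormal-versus-exponential comparison can also be made numerically. A sketch, assuming synthetic lognormal values in place of the real data and the Statistics and Machine Learning Toolbox, that fits both candidates by maximum likelihood with fitdist and compares them by AIC (lower is better):

```matlab
% Synthetic, positive, lognormally distributed values (assumption)
rng(0);
x = lognrnd(0, 0.3, 400, 1);

pdLogn = fitdist(x, 'Lognormal');     % maximum-likelihood fits
pdExp  = fitdist(x, 'Exponential');

% AIC = 2*(number of parameters) + 2*(negative log-likelihood)
aicLogn = 2*2 + 2*negloglik(pdLogn);  % lognormal: mu, sigma
aicExp  = 2*1 + 2*negloglik(pdExp);   % exponential: mu (mean)
fprintf('AIC lognormal %.1f, exponential %.1f\n', aicLogn, aicExp);
```

Running the same comparison on std_spk_avg itself is the direct check; the synthetic numbers only show how the comparison works.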
This code reshuffles the data. After reshuffling, it no longer has exponential decay with time. But the distribution is unchanged.
load('std_spk_avg');
N=length(std_spk_avg);
y=zeros(N,1000);                        % preallocate: one shuffled copy per column
for i=1:1000
    y(:,i)=std_spk_avg(randperm(N));    % random permutation of all N samples
end
If the code above works as intended, then the mean of each column of y should equal the mean of std_spk_avg, and the standard deviation of each column of y should equal the standard deviation of std_spk_avg (up to floating-point rounding). Let's check and compare:
yMean=mean(y);                          % mean of each column
yStd=std(y);                            % standard deviation of each column
disp([mean(std_spk_avg),min(yMean),max(yMean)])
disp([std(std_spk_avg),min(yStd),max(yStd)])
Looks like it works as intended.
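A loop-free variant of the same reshuffle, offered as an alternative rather than a correction: sorting a matrix of uniform random numbers column by column yields one independent permutation of 1:N per column. Synthetic data again stands in for std_spk_avg:

```matlab
rng(0);
N = 400;
x = 2*exp(-(0:N-1).'/40) + 0.5 + 0.05*randn(N, 1);  % synthetic stand-in

nIter = 1000;
[~, P] = sort(rand(N, nIter));   % each column of P is a random permutation of 1:N
y = x(P);                        % N-by-nIter matrix of reshuffled copies
```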
3 comments
Image Analyst
2024-8-17
@SARA CACCIATO SALCEDO Do you just want to interpolate an additional 600 samples in between the existing 400 samples?
William Rose
2024-8-17
"this is what I need but without losing the exponential decay in the first spikes, as i'll perform further analysis exactly on the decay, is there any option?"
Can you explain what you are doing with the decay part and why you want to shuffle points? If I understand your analysis goals, I may be able to give better advice.
By the way, you say you have 400 spikes. In that case, each data point must represent some feature of each spike. It cannot be the time of each spike, or the inter-spike interval, because you also provided a time vector that increases linearly in 400 steps from 0 to 100. Do the y-values indicate spike height, width, area under each spike, or some other feature? Also, the time base you have provided implies that the spikes occur exactly once every 0.250 seconds. If so, these spikes are unlike the spike trains I have worked with (1, 2, 3).
Anyway, back to your question, you could simply keep the first 150 points as is, and shuffle the remaining 250, as follows:
load('std_spk_avg'); load('std_time');
N=length(std_spk_avg);
M=150; % points to keep un-shuffled
y=zeros(N,1000);
for i=1:1000
    % keep the first M points in place, shuffle the remaining N-M points
    y(:,i)=[std_spk_avg(1:M); std_spk_avg(M+randperm(N-M))];
end
Plot the first 6 columns:
figure
for i=1:6
plot(std_time,y(:,i)); hold on
end
xlabel('Time (s)'); grid on; hold off
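A quick sanity check of that output, sketched on synthetic data (an assumption in place of std_spk_avg): every column should keep the first M points unchanged, and the rest of each column should be a rearrangement of the original tail:

```matlab
rng(0);
N = 400;
x = 2*exp(-(0:N-1).'/40) + 0.5 + 0.05*randn(N, 1);  % synthetic stand-in
M = 150;
nIter = 1000;
y = zeros(N, nIter);
for i = 1:nIter
    y(:, i) = [x(1:M); x(M + randperm(N - M))];
end

% First M rows identical in every column
keptOK = isequal(y(1:M, :), repmat(x(1:M), 1, nIter));
% Remaining rows contain exactly the original tail values, in some order
tailOK = all(all(sort(y(M+1:end, :)) == sort(x(M+1:end))));
```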
Or you could shuffle the last 250 points as above, but shuffle the first 150 points only within consecutive, non-overlapping windows 3, 5, 6, or 10 points wide (widths that divide 150 evenly):
M1=150; % points in exponential decay portion
M2=5; % width of narrow shuffle; M1/M2 should be integer
y=zeros(N,1000);
for i=1:1000
    for j=1:M1/M2
        k=(j-1)*M2;                                 % offset of the j-th window
        y(k+1:k+M2,i)=std_spk_avg(k+randperm(M2));  % shuffle within this window only
    end
    y(M1+1:N,i)=std_spk_avg(M1+randperm(N-M1));     % fully shuffle the tail
end
Plot columns 1-6:
figure
for i=1:6
plot(std_time,y(:,i)); hold on
end
xlabel('Time (s)'); grid on; hold off
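The windowed version requires M1/M2 to be an integer. A sketch of a variant without that restriction, assuming the last, shorter window is simply shuffled on its own (a design choice, not something from the thread) and using synthetic data in place of std_spk_avg:

```matlab
rng(0);
N = 400;
x = 2*exp(-(0:N-1).'/40) + 0.5 + 0.05*randn(N, 1);  % synthetic stand-in

M1 = 150;   % points in the decay portion
w  = 4;     % window width; 150/4 is not an integer, so the last window has 2 points
nIter = 1000;
y = zeros(N, nIter);
for i = 1:nIter
    yi = x;
    for s = 1:w:M1
        seg = s:min(s + w - 1, M1);              % indices of this window (last may be shorter)
        yi(seg) = x(seg(randperm(numel(seg))));  % shuffle only within the window
    end
    yi(M1+1:N) = x(M1 + randperm(N - M1));       % fully shuffle the tail
    y(:, i) = yi;
end
```

Narrow windows only move each point a few samples away from its original position, so the decay shape survives while local ordering is randomized.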
As I said before, if you explain your analysis goals in more detail, you might get some additional useful comments. Another example of the use of shuffling in data analysis is simulation modeling analysis, as described in these papers:
1. Borckardt JJ, Nash MR (2014). Simulation modeling analysis for small sets of single-subject data collected over time. Neuropsychological Rehabilitation 24: 492-506.
2. Borckardt JJ, Nash MR, Murphy MD, Moore M, Shaw D, O’Neil PM (2008). Clinical practice as natural laboratory for psychotherapy research: A guide to case-based time-series analysis. American Psychologist 63(2): 77-95.