How to datasample exponential data without losing the exponential decay?

1 view (last 30 days)
Hi all!
So this is the question:
I have a table with one column (std_spk_avg, attached). This column has 400 numbers. The data follow an exponential distribution, so when I resample with MATLAB's 'resample' function to obtain 1000 iterations, I lose the exponential decay in each iteration...
How can I use this function so as not to lose the exponential decay in my 1000 iterations?
Thank you all in advance :)

Accepted Answer

Star Strider
Star Strider 2024-8-16
Your data are not exponentially distributed; however, they are convincingly lognormally distributed.
They do not have a corresponding independent variable (for example, a time vector); however, it is straightforward to create one. You can then use it to sample your data.
Other than that, what exactly do you want to do with them?
Try this —
load('std_spk_avg.mat')
% std_spk_avg
L = numel(std_spk_avg)
L = 400
idxv = (1:L).';
figure
plot(std_spk_avg)
figure
histfit(std_spk_avg, 100, 'lognormal')
idxrs = sort(randsample(idxv, 50), 'ascend') % Randomly Sample The Index Vector
idxrs = 50x1
1 15 27 32 41 53 75 84 104 111 ...
figure
plot(idxrs, std_spk_avg(idxrs)) % Plot The Result
Sampling the index vector and then plotting the sampled data against it preserves the exponential character of your original data.
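If you want a quantitative check of the lognormal impression, a minimal sketch (assuming the Statistics and Machine Learning Toolbox, which histfit already requires) is to fit both candidate distributions and compare their negative log-likelihoods, where lower is better:
pdLogn = fitdist(std_spk_avg(:), 'Lognormal');   % fit a lognormal distribution
pdExp  = fitdist(std_spk_avg(:), 'Exponential'); % fit an exponential distribution
negloglik(pdLogn) % lower negative log-likelihood = better fit
negloglik(pdExp)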
  4 Comments
Sara Woods
Sara Woods 2024-8-17
Thanks a lot! This is exactly what I needed! Excuse my explanation, as I'm a total beginner.
Star Strider
Star Strider 2024-8-17
As always, my pleasure!
No worries! I was initially a bit confused as to what you intended by ‘exponentially distributed’, so I added the distribution to my answer.


More Answers (2)

Les Beckham
Les Beckham 2024-8-16
Edited: Les Beckham 2024-8-16
I guess I don't understand what problem you are seeing. resample seems to work just fine here (see below). I assumed that your original data was sampled with a period of one second. What are you "losing"?
Note that I would not describe this data as having an "exponential decay".
load('std_spk_avg.mat')
numel(std_spk_avg)
ans = 400
plot(std_spk_avg)
grid on
ts = timeseries(std_spk_avg);
t = linspace(0, numel(std_spk_avg)-1, 1000);
rs = resample(ts, t);
rs.Name = 'std_spk_avg resampled'
timeseries

Common Properties:
        Name: 'std_spk_avg resampled'
        Time: [1000x1 double]
    TimeInfo: tsdata.timemetadata
        Data: [1000x1 double]
    DataInfo: tsdata.datametadata
plot(rs)
grid on
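As a side note, if the goal is simply 1000 evenly spaced samples, the Signal Processing Toolbox resample function (a different resample from the timeseries method used above; this sketch assumes that toolbox is available) can do the rational-factor interpolation in one line, since 400*(5/2) = 1000:
rs2 = resample(std_spk_avg, 5, 2); % interpolate by 5, decimate by 2 -> 1000 samples
numel(rs2)                         % 1000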
  1 Comment
Sara Woods
Sara Woods 2024-8-16
Thanks for the quick answer! I think I am not making myself clear; I'm a new MATLAB user, so excuse me. In the table I have 400 spikes following an "exponential decay".
From these 400 spikes, I would like to obtain 1000 resampled versions, regardless of time (is that possible? Anyway, I attached the original time table as std_time).
In the end, this would mean having a table with 1000 columns displaying 400 spikes each.
Thanks a lot again, waiting for your recommendations :)



William Rose
William Rose 2024-8-16
@Star Strider and @Les Beckham have provided excellent suggestions.
You want a matrix that is 400x1000, where the 1000 columns are "resampled" versions of the original data. You say the data have an exponential distribution, and you say that you do not want to lose the exponential decay when you resample. Is there a possible confusion about the distinction between exponential distribution and exponential decay? Is there possible confusion about resampling versus reshuffling?
Matlab's resample() keeps the time order of the data the same. It samples the data faster or slower than the original sampling. I think this is not what you want, since it would lead to resampled data with more (if faster) or fewer (if slower) than 400 samples.
Reshuffling is used in many machine learning applications and in some statistical testing. It will preserve the number of samples and will destroy any time correlations in the data.
Neither reshuffling nor resampling will alter the distribution of the data. The distribution is independent of the time ordering. A histogram of the data reveals the distribution of the sample. @Star Strider plotted the distribution of your data, and of the best-fit lognormal distribution. His plot shows that the sample histogram looks more like a lognormal distribution than an exponential distribution. I suspect that when you said your data has an exponential distribution, you meant to say that the data shows exponential decay with time (plus random noise).
This code reshuffles the data. After reshuffling, it no longer has exponential decay with time. But the distribution is unchanged.
load('std_spk_avg');
N = length(std_spk_avg);
y = zeros(N, 1000);                    % preallocate the 400x1000 result
for i = 1:1000
    y(:,i) = std_spk_avg(randperm(N)); % each column is a full random permutation
end
If the code above works as intended, then the mean of each column of y should equal the mean of std_spk_avg, and the std. dev. of each column of y should equal the std. dev. of std_spk_avg. Let's check and compare:
yMean=mean(y);
yStd=std(y);
disp([mean(std_spk_avg),min(yMean),max(yMean)])
0.3559 0.3559 0.3559
disp([std(std_spk_avg),min(yStd),max(yStd)])
0.0771 0.0771 0.0771
Looks like it works as intended.
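Another option, if "resampling" was meant in the bootstrap sense, is to draw the 400 values with replacement instead of permuting them. This is a sketch of that variant; unlike the pure reshuffle above, the column means and standard deviations will then vary slightly from column to column:
yb = zeros(N, 1000);                       % 400x1000 bootstrap result
for i = 1:1000
    yb(:,i) = std_spk_avg(randi(N, N, 1)); % N indices drawn with replacement
end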
  3 Comments
William Rose
William Rose 2024-8-17
"this is what I need but without losing the exponential decay in the first spikes, as i'll perform further analysis exactly on the decay, is there any option?"
Can you explain what you are doing with the decay part and why you want to shuffle points? If I understand your analysis goals, I might understand better.
By the way, you say you have 400 spikes. In that case, each data point must represent some feature of each spike. It cannot be the time of each spike, or the inter-spike interval, because you also provided a time vector which increases linearly in 400 steps from 0 to 100. Do the y-values indicate spike height, or width, or area under each spike, or some other feature? Also, the time base you have provided implies that the spikes occur exactly once every 0.250 seconds. These spikes are therefore unlike spike trains I have worked with (1, 2, 3).
Anyway, back to your question, you could simply keep the first 150 points as is, and shuffle the remaining 250, as follows:
load('std_spk_avg'); load('std_time');
N = length(std_spk_avg);
M = 150;                % points to keep un-shuffled
y = zeros(N, 1000);
for i = 1:1000
    y(:,i) = [std_spk_avg(1:M); std_spk_avg(M+randperm(N-M))]; % keep head, shuffle tail
end
Plot the first 6 columns:
figure
for i=1:6
plot(std_time,y(:,i)); hold on
end
xlabel('Time (s)'); grid on; hold off
Or you could shuffle the last 250 points, as above, but shuffle the first 150 points within a 3-, 5-, 6-, or 10-point-wide window:
M1 = 150;               % points in the exponential-decay portion
M2 = 5;                 % width of the narrow shuffle; M1/M2 should be an integer
y = zeros(N, 1000);
for i = 1:1000
    for j = 1:M1/M2
        k = (j-1)*M2;
        y(k+1:k+M2,i) = std_spk_avg(k+randperm(M2)); % shuffle within each window
    end
    y(M1+1:N,i) = std_spk_avg(M1+randperm(N-M1));    % full shuffle of the tail
end
Plot columns 1-6:
figure
for i=1:6
plot(std_time,y(:,i)); hold on
end
xlabel('Time (s)'); grid on; hold off
As I said before, if you explain your analysis goals in more detail, you might get some additional useful comments. Another example of the use of shuffling in data analysis is simulation modeling analysis, as described in these papers:
1. Borckardt JJ, Nash MR (2014). Simulation modeling analysis for small sets of single-subject data collected over time. Neuropsychological Rehabilitation, 24, 492-506.
2. Borckardt JJ, Nash MR, Murphy MD, Moore M, Shaw D, O’Neil PM (2008). Clinical practice as natural laboratory for psychotherapy research: A guide to case-based time-series analysis. American Psychologist, 63(2), 77-95.

