Handling irregular observations. Maybe more progress needs to be made by the MATLAB team

Dear all,
Since my analysis uses irregular time series observations that do not have a standard frequency (daily, monthly, quarterly, yearly), I was wondering how useful MATLAB can be in this case.
To give an example, please take a look at the following link, which shows how SAS (which I am not familiar with) can handle such problems "automatically".
I paste the table "Output 14.3.1 Measured Defect Rates":
1 13JAN1992 55
2 27JAN1992 73
3 19FEB1992 84
4 08MAR1992 69
5 27MAR1992 66
6 05APR1992 77
7 29APR1992 63
8 11MAY1992 81
9 25MAY1992 89
10 07JUN1992 94
11 23JUN1992 105
12 11JUL1992 97
13 15AUG1992 112
14 29AUG1992 89
15 10SEP1992 77
16 27SEP1992 82
These are irregular observations, and after the interpolation we get monthly averages:
Obs date defects
1 JAN1992 59.323
2 FEB1992 82.000
3 MAR1992 66.909
4 APR1992 70.205
5 MAY1992 82.762
6 JUN1992 99.701
7 JUL1992 101.564
8 AUG1992 105.491
9 SEP1992 79.206
I had a discussion with Oleg regarding one of my previous questions on how to obtain monthly averages when I have irregular observations. If I apply Oleg's approach, half the values in the output matrix interpData{b} are the same as in the original input matrix A. But as you can see from the second table above, none of its values are the same as those in the first table.
Is it possible to do something similar to what the SAS program does? If not, it is a pity that a program as powerful as MATLAB falls behind SAS at converting irregular time series observations to other frequencies.
Thank you
10 comments
salva 2012-8-6
Well, I have these data, and what I know is that each value represents a 4-, 5-, 6-, 8-, or 9-week average. From these values I have to obtain estimated monthly averages (via interpolation? via weighted averages?). I would be grateful if you could give me some guidelines on how to obtain them.
I have no other way of solving this problem apart from asking you guys.
Thank you
per isakson 2012-8-6
Edited: per isakson 2012-8-6
I'd like to pose a question. Assume you have bimonthly data
Jan&Feb 17
Mar&Apr 71
May&Jun 43
and I claim that the "best" monthly averages are
Jan 17
Feb 17
Mar 71
Apr 71
May 43
Jun 43
I guess you don't agree, but what arguments would you use to convince me that there are "better" estimates?
There is no magic trick!
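To make the point concrete, here is a minimal MATLAB sketch (an illustration only, not code from this thread). It expands the bimonthly values above into the step-wise monthly estimates and then builds a second, hypothetical set of monthly values (`alt`, using an arbitrary `shift`) that averages back to exactly the same bimonthly figures. Months are treated as equally long for simplicity.

```matlab
% Bimonthly averages from the example above
bimonthly = [17; 71; 43];

% Step-wise estimate: each month inherits its two-month average
monthly = kron(bimonthly, [1; 1]);           % 17 17 71 71 43 43

% Hypothetical alternative: move the two months of each pair in opposite directions
shift = 5;                                   % arbitrary, for illustration only
alt   = monthly + repmat([-shift; shift], 3, 1);

% Average each pair of months back to the bimonthly frequency
backMonthly = mean(reshape(monthly, 2, []), 1)';
backAlt     = mean(reshape(alt,     2, []), 1)';

disp([bimonthly backMonthly backAlt])        % the three columns are identical
```

Both candidates reproduce the data, so the bimonthly figures alone cannot tell them apart. Any preference for one over the other has to come from extra assumptions, such as smoothness or a model.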


Accepted Answer

Oleg Komarov 2012-8-5
Edited: Oleg Komarov 2012-8-6
I had a look at SAS and honestly I don't understand how they got those values!
My approach was to take intra-month averages (my attempt at interpreting SAS's method) and then interpolate them:
A = {
1 '13JAN1992' 55
2 '27JAN1992' 73
3 '19FEB1992' 84
4 '08MAR1992' 69
5 '27MAR1992' 66
6 '05APR1992' 77
7 '29APR1992' 63
8 '11MAY1992' 81
9 '25MAY1992' 89
10 '07JUN1992' 94
11 '23JUN1992' 105
12 '11JUL1992' 97
13 '15AUG1992' 112
14 '29AUG1992' 89
15 '10SEP1992' 77
16 '27SEP1992' 82};
% Convert dates to serial dates and store with data in a double matrix
data = [datenum(A(:,2),'ddmmmyyyy') cat(1,A{:,3})];
% Retrieve month year day
[yy mm dd] = datevec(data(:,1));
% Create aggregation subs for accumarray
subsr = repmat((yy-yy(1))*12 + mm-mm(1) + 1,2,1);
subsc = repmat(1:size(data,2),size(data,1),1);
% Take averages
avgData = accumarray([subsr subsc(:)], data(:),[],@nanmean);
% Interpolate
xi = datenum(1992,1:9,1);
intData = interp1(avgData(:,1),avgData(:,2),xi,'linear','extrap')
% Also, direct interpolation without averaging
intData2 = interp1(data(:,1),data(:,2),xi,'linear','extrap');
% Plot
plot(data(:,1),data(:,2),'-db',xi,intData,'--om',xi,intData2,'-.+r')
axis tight
grid on
set(gca,'Xtick',xi)
datetick('x','mmm yy','keepticks')
legend('your data','interpolation of averages','direct interpolation','location','NorthWest')
I feel a clarification is needed in response to salva's comments:
I don't know how many times I have already said this, but manipulating data is dodgy, all the more so the way SAS does it, which is not clear from the link.
If you're doing research in finance/economics and you manipulate your data because you need it at certain points in time (say, at the beginning of the month), the result is already artificial, but acceptable.
Do you think SAS is fancy because it changes ALL the values? I assure you that is not where SAS's power lies.
MATLAB may lack some functions, but nobody stops you from writing your own and sharing them on the FEX.
MATLAB is not just a program but a programming language, and it is not limited to statistics!
So yes, SAS could be better suited to statistical analysis because it has more built-in functions.
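For comparison, here is a sketch of a different route to monthly averages: interpolate the observations onto a daily grid and then average the daily values within each calendar month. This is not the code above, and it is not claimed to be what SAS does or to reproduce the SAS table. Linear interpolation and the restriction to the observed date range (no extrapolation before 13 Jan or after 27 Sep) are assumptions made for the sketch, and the variable names are illustrative.

```matlab
% Observation dates and values copied from the table in the question
dates = {'13JAN1992','27JAN1992','19FEB1992','08MAR1992','27MAR1992', ...
         '05APR1992','29APR1992','11MAY1992','25MAY1992','07JUN1992', ...
         '23JUN1992','11JUL1992','15AUG1992','29AUG1992','10SEP1992','27SEP1992'};
vals  = [55 73 84 69 66 77 63 81 89 94 105 97 112 89 77 82]';
t     = datenum(dates, 'ddmmmyyyy');
t     = t(:);                                % make sure it is a column

% Daily grid over the observed range only, so nothing is extrapolated
td = (t(1):t(end))';
vd = interp1(t, vals, td, 'linear');

% Month index for every day (1 = first month in the data)
[yy, mm, dd] = datevec(td);                  % dd is not used
idx = (yy - yy(1))*12 + (mm - mm(1)) + 1;

% Average the daily values within each month
monthlyAvg = accumarray(idx, vd, [], @mean);
```

Here every monthly figure depends on all the days in that month, so none of them has to coincide with an original observation. Note that the first and last months are averaged over only part of the month, and the result still depends entirely on the interpolation rule you pick ('linear' here, 'spline' or 'pchip' would give other numbers).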
8 comments
Oleg Komarov 2012-8-6
First of all, you have to quantify how much of the population you lose if you completely discard the irregular and the bi-monthly series. Then decide which series to keep.
Then I would suggest applying some selection rules, a very standard approach: filter out of the analysis those series that do not pass the rules, i.e. those with very irregular spacing in time. To decide on the rules, refer to literature that has already dealt with your type of analysis/data.
You can aggregate the monthly data to the bi-monthly frequency; that wouldn't affect your results as much as interpolation would.
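A rough sketch of that aggregation, with made-up monthly values (the numbers and variable names are purely illustrative). Weighting each month by its number of days is an assumption added here so that months of different length are handled consistently. A plain mean of each pair would be the unweighted variant.

```matlab
% Hypothetical monthly averages for Jan..Jun 1992 (made-up numbers)
monthly = [10 20 30 40 50 60]';

% Days in each month of 1992 (a leap year): 31 29 31 30 31 30
ndays = eomday(1992, 1:6)';

% Group index: 1 1 2 2 3 3 -> Jan&Feb, Mar&Apr, May&Jun
grp = ceil((1:6)'/2);

% Day-weighted average within each two-month block
bimonthly = accumarray(grp, monthly .* ndays) ./ accumarray(grp, ndays);
```

Going in this direction only combines values that were actually observed, which is why it distorts the data less than splitting a bimonthly figure into two monthly ones.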
salva 2012-8-6
Hi Oleg,
Thank you for your reply. I think that for my purposes it would be convenient to focus only on the case of transforming bimonthly to monthly data. Specifically, what I am asking is how I can modify the approach you proposed here
to take into account that months do not all have the same length. Actually, I have opened a new question for this purpose here
I think this is the most interesting case for me at the moment.
cheers


More Answers (0)
