Interpolating Multivariate time series

Andrew 2011-4-29
Hi all,
I'm trying to test a multivariate time series dataset which has 2536 instances and 73 attributes, with missing values (represented by ?) in some rows. I tried looking for ways to interpolate the time series, but everything I can find covers only 2-3 attributes.
Can someone help me with how to interpolate this dataset? The dataset is in .data format.
Andrew

Answers (3)

Andrew 2011-4-29
To be clear, the dataset will be something similar to this:
1/1/1998,0.8,1.8,2.4,2.1,10330,-55,0,0.
1/2/1998,2.8,3.2,3.3,2.7,10275,-55,0,0.
. . .
1/5/1998,2.6,2.1,1.6,1.4,?,?,?,0.58,0.
. . .
1/22/1998,2.8,3.6,?,?,4.6,10090,-40,0,0.
4 comments
Andrew 2011-4-29
@Oleg
Not really... all the rows have the same number of columns, with 73 attributes.
This is the dataset I'm talking about:
http://archive.ics.uci.edu/ml/machine-learning-databases/ozone/onehr.data
It has 75 columns in total: 1 date + 73 attributes + 1 result column which says whether it's an ozone day or not.
Andrew 2011-4-29
@andrei
I'm not sure how to use TriScatteredInterp. Would you mind helping with code that does the interpolation and saves the filled-in values to the .data file? I need to use that data to test the algorithm.
Thanks



Richard Willey 2011-4-29
Handling missing data is a very complicated topic.
There are a number of different approaches that you can use including listwise deletion, substitution models, multiple imputation, yada yada yada. Each approach has its own advantages and disadvantages.
For example, an approach based on substitution (regression substitution, interpolation, what have you) will give you a complete data set to work with; however, this new data set is going to be biased. (As a simple example, suppose that you use a regression substitution model to estimate plausible values for your missing data points. Later on, you fit a regression model to your [complete] data set and report an R^2...)
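One way to see the kind of bias described above is a small sketch like the following. It assumes a made-up series (a linear trend plus a deterministic wiggle) and pretends that every third value is missing; none of these numbers come from the ozone data. The imputed points lie exactly on the fitted line, so they add explained variation without adding residual error, and the reported R^2 goes up.

```matlab
% Hypothetical data: linear trend plus a deterministic wiggle (not the ozone data).
x = (1:20)';
y = 2*x + 5*sin(7*x);

% Pretend every third y value is missing.
miss = false(size(x));
miss(3:3:end) = true;
obs = ~miss;

% R^2 helper: 1 - SSE/SST.
r2 = @(yy, ff) 1 - sum((yy - ff).^2) / sum((yy - mean(yy)).^2);

% Fit a line to the observed points only.
p = polyfit(x(obs), y(obs), 1);
R2_obs = r2(y(obs), polyval(p, x(obs)));

% Regression substitution: fill the gaps with the fitted values.
yfill = y;
yfill(miss) = polyval(p, x(miss));

% Refit on the "complete" data set and report R^2 again.
p2 = polyfit(x, yfill, 1);
R2_filled = r2(yfill, polyval(p2, x));

fprintf('R^2 on observed rows: %.4f\n', R2_obs);
fprintf('R^2 after filling:    %.4f\n', R2_filled);
```

The second figure is never smaller than the first, even though no new information was added.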
Alternatively, an approach based on listwise deletion won't [necessarily] run into the same problems with bias; however, you will have issues with loss of statistical power.
I took a quick look at the data set in question. Two observations.
1. You are missing large blocks of data. This is going to cause some real problems for interpolation-based techniques.
2. Your data doesn't appear to be Missing Completely at Random (MCAR) or even Missing at Random (MAR).
Personally, I would start with listwise deletion...
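A minimal sketch of listwise deletion in MATLAB, assuming the numeric part of the data is already in a matrix with NaN marking the missing entries. The matrix X below is a small hypothetical stand-in, not the real data set.

```matlab
% Hypothetical 4-by-3 numeric block; NaN marks a missing entry.
X = [0.8 1.8 10330;
     NaN 3.2   NaN;
     2.6 NaN 10275;
     3.0 2.0 10250];

complete = ~any(isnan(X), 2);   % true for rows with no missing value
Xc = X(complete, :);            % listwise deletion: keep complete rows only

fprintf('kept %d of %d rows\n', size(Xc, 1), size(X, 1));
```

With large blocks of missing data, it is worth checking how many rows survive before going further.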

Andrew 2011-4-30
I guess I can't delete the missing values.
How do we interpolate them with interp1? Can I use it to interpolate the above dataset?
I've read in the MathWorks documentation that yi = interp1(Y,xi) assumes that x = 1:N, where N is the length of Y for vector Y, or size(Y,1) for matrix Y, and that yi = interp1(x,Y,xi,method) interpolates using alternative methods.
But then, how does it know which dataset to use? When I load the dataset using "load onehr.data", it says unknown value '?'...
Can someone help me??
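A minimal sketch of one possible approach, not a definitive answer. load expects purely numeric text, which is why it stops at '?'. textscan can instead read the date as text and treat '?' as an empty numeric field, which it fills with NaN by default. The sketch writes a tiny stand-in file so that it runs on its own; the values, the file names, and the count nNum are assumptions. For the real onehr.data, nNum would be 74 (73 attributes + 1 result column, going by the description above). It also assumes the rows are evenly spaced in time, so the row index serves as the time axis.

```matlab
% Write a tiny stand-in file (hypothetical values) so the sketch is self-contained.
fname = [tempname '.data'];
fid = fopen(fname, 'w');
fprintf(fid, '1/1/1998,0.8,1.8,10330,0\n');
fprintf(fid, '1/2/1998,?,3.2,?,0\n');
fprintf(fid, '1/3/1998,2.6,?,10275,0\n');
fprintf(fid, '1/4/1998,3.0,2.0,10250,1\n');
fclose(fid);

% Read: date as text, everything else as double, '?' becomes NaN.
nNum = 4;                                  % numeric columns after the date (assumed)
fmt  = ['%s' repmat('%f', 1, nNum)];
fid  = fopen(fname, 'r');
C    = textscan(fid, fmt, 'Delimiter', ',', 'TreatAsEmpty', '?');
fclose(fid);
X = [C{2:end}];                            % numeric block, one column per attribute

% Interpolate each attribute column separately with interp1.
t  = (1:size(X, 1))';                      % row index as time axis (assumption)
Xi = X;
for j = 1:size(X, 2) - 1                   % skip the last (result) column
    known = ~isnan(X(:, j));
    if any(~known) && nnz(known) >= 2
        % Linear interpolation; gaps before the first or after the last
        % known value stay NaN because interp1 does not extrapolate here.
        Xi(~known, j) = interp1(t(known), X(known, j), t(~known), 'linear');
    end
end

% Write the filled data back out in the same comma-separated layout.
outname = [tempname '.data'];
fid = fopen(outname, 'w');
for i = 1:size(Xi, 1)
    fprintf(fid, '%s', C{1}{i});
    fprintf(fid, ',%.10g', Xi(i, :));
    fprintf(fid, '\n');
end
fclose(fid);
delete(fname);                             % remove the stand-in input file
```

Each column is interpolated on its own, so the number of attributes does not matter. Long runs of missing values are simply bridged by a straight line, which ties in with the caveats in the answer above.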
