Regression analysis in Matlab

How can I fit a model to predict a response variable (y) from a set of regressor variables (x1, x2, x3, x4, x5, x6)? The model may or may not be linear. A sample of the simulation data:
x1=[263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x2=[323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453];
x3=[10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20];
x4=[0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
x5=[0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325];
x6=[0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05];
y=[257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];
Please advise.
T. Aseri

1 comment

If you have the Statistics Toolbox, see
doc regress
Without it, see
doc slash % NB: the backslash operator '\'


Answers (2)

dpb 2013-12-28
Edited: dpb 2013-12-29
Now having Matlab open and convenient, to amplify on the above...
Stat Toolbox ...
>> b1=regress(y',[x1' x2' x3' x4' x5' x6'])'
b1 =
1.0102 -0.0005 -0.0090 -6.8343 -0.2722 -13.6140
Base Matlab backslash operator...
>> b2=[[x1' x2' x3' x4' x5' x6']\y']'
b2 =
1.0102 -0.0005 -0.0090 -6.8343 -0.2722 -13.6140
>>
Remarkable similarity, wot? :)
Now, as you might expect, the Toolbox solution has some more interesting outputs...
>> [b,bint,r]=regress(y',[x1' x2' x3' x4' x5' x6']);
>> [b bint]
ans =
1.0102 0.9968 1.0237
-0.0005 -0.0085 0.0075
-0.0090 -0.0404 0.0223
-6.8343 -9.6170 -4.0516
-0.2722 -1.2170 0.6726
-13.6140 -37.7253 10.4972
>> sqrt(sum(r.*r)/length(r))
ans =
0.6206
>> [b,bint,r]=regress(y',[x1' x2' x4']);
>> [b bint]
ans =
1.0095 0.9980 1.0210
-0.0024 -0.0091 0.0043
-7.2257 -9.7197 -4.7316
>> sqrt(sum(r.*r)/length(r))
ans =
0.6663
>> [b,bint,r]=regress(y',[x1' x4']);
>> sqrt(sum(r.*r)/length(r))
ans =
0.6786
>>
Looking at the intervals on the estimated coefficients, only x1 and x4 have intervals that exclude zero, so a much more parsimonious model is possible with essentially the same residual error as blindly including all six: the RMS residual is 0.6786 with x1 and x4 alone versus 0.6206 with all six.
Your mission, should you choose to accept it, is to complete the analysis and judiciously choose the overall best model. I have not considered or looked at any interaction terms you'll note.
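As a starting point for that comparison, here is a minimal sketch in base Matlab (no toolbox needed) that repeats the RMS-residual calculation from the session above for each candidate set of columns. The variable names subsets and rmsResid are mine, the three candidate sets are just the ones tried above, and like that session it leaves out the intercept.

```matlab
% Data from the question, transposed to column vectors
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293]';
x2 = [323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453]';
x3 = [10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20]';
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78]';
x5 = [0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325]';
x6 = [0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05]';
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12]';

X = [x1 x2 x3 x4 x5 x6];

% Candidate column sets (illustrative choice: the three tried in the session)
subsets  = {1:6, [1 2 4], [1 4]};
rmsResid = zeros(numel(subsets), 1);

for k = 1:numel(subsets)
    Xk = X(:, subsets{k});        % keep only the chosen regressors
    b  = Xk \ y;                  % least-squares coefficients, no intercept
    r  = y - Xk*b;                % residuals
    rmsResid(k) = sqrt(mean(r.^2));
    fprintf('columns [%s]: RMS residual = %.4f\n', ...
            num2str(subsets{k}), rmsResid(k));
end
```

Because the three sets are nested, the RMS residual can only go up as columns are dropped; the question is whether the increase is large enough to matter.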
ADDENDUM:
Oversight--the above doesn't include the intercept term. Write the model as
b1=regress(y',[ones(size(x1')) x1' x2' x3' x4' x5' x6'])'
or similarly to include it.
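A minimal sketch of the same idea with the backslash operator, assuming you keep only x1 and x4 (that choice is just for illustration). The column of ones is what carries the intercept, so b(1) is the constant term.

```matlab
% Data from the question, transposed to column vectors
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293]';
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78]';
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12]';

% Design matrix: leading column of ones is the intercept term
X = [ones(size(x1)) x1 x4];

b = X \ y;                        % b(1) intercept, b(2) for x1, b(3) for x4
r = y - X*b;                      % residuals
rmsResid = sqrt(mean(r.^2));      % same RMS measure as in the session above

fprintf('intercept = %.4f, b_x1 = %.4f, b_x4 = %.4f\n', b(1), b(2), b(3));
fprintf('RMS residual = %.4f, sum of residuals = %.2e\n', rmsResid, sum(r));
```

With an intercept in the model the residuals sum to zero (up to rounding), which is a quick check that the ones column actually made it into the design matrix.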

4 comments

Dear dpb, I am really grateful to you. I will try your suggestion and get back to you soon. Thank you for your support.
T. Aseri
BTW, if you do have the Statistics Toolbox, look at
doc regstats
that does much of the work of computing the ancillary statistics needed.
I do wish TMW would take the last step of providing a nicely formatted table as an option a la SAS or their ilk.
OBTW, NB: I neglected to include an intercept term in the preceding -- see the ADDENDUM to the previous answer. regstats handles this automagically but regress or the backslash operator need the model coded explicitly.
Yes, I do have the Statistics Toolbox and I am working on it. I first need to learn it; then I will be able to choose the best-fitting model with the fewest regressors by performing all the needed tests. Thank you for your precious support, I'll be in touch with you.
Here is the problem, I've entered all data in column format with equal no. of rows (6696):


dpb 2014-1-1
Edited: dpb 2014-1-1
NB: you created a Matlab dataset object, Datas, so you must use dot notation to reference the individual variables it holds. (BTW, although it doesn't matter to Matlab what a variable is named, "data" is the Latin plural; the singular is a "datum" point. Common US English usage has corrupted this terribly.)
Use
Datas.Properties.VarNames
to see the variable names in the Datas object; then you get the actual data by using
Datas.VarName
where "VarName" is the name for the particular variable. Assuming the Excel sheet has headings of the names you've used above, something like
X=[ones(length(Datas),1) Datas.Ta Datas.Tabs ... Datas.eabs];
would appear to be correct. If there are no headers, then the default variable name 'Var1' would have been assigned and the data will be a single array, which is somewhat simpler to reference:
b=regress(Datas.Var1(:,7), [ones(length(Datas),1) Datas.Var1(:,1:6)]);
Again, note that you must specify the constant term in the model explicitly with regress.
Since you say you have the Statistics Toolbox, I recommend reverting to regstats to get the additional statistics you'll want/need to evaluate the quality of the model directly.
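A minimal sketch of a regstats call, assuming the Statistics Toolbox is installed and using only x1 and x4 from the original question (an illustrative choice). The 'linear' model adds the constant term itself, so no column of ones is needed here.

```matlab
% Data from the original question, transposed to column vectors
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293]';
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78]';
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12]';

% Ask only for the statistics of interest (omit whichstats to get the GUI)
whichstats = {'beta', 'tstat', 'rsquare', 'adjrsquare', 'mse'};
s = regstats(y, [x1 x4], 'linear', whichstats);

% Columns: coefficient, standard error, t statistic, p-value
% Rows:    intercept, x1, x4
disp([s.beta s.tstat.se s.tstat.t s.tstat.pval]);
fprintf('R^2 = %.4f, adjusted R^2 = %.4f, MSE = %.4f\n', ...
        s.rsquare, s.adjrsquare, s.mse);
```

The p-values in s.tstat.pval give the same information as checking whether the bint intervals from regress cover zero.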
See
doc dataset % and related for details on using the dataset object
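To make the dot referencing concrete, here is a small sketch that builds a dataset object by hand instead of importing it (Statistics Toolbox assumed). Ta and eabs are two of the names you used; Tout is a made-up name for the response, and the six rows of numbers are placeholders borrowed from the sample in the original question, not your real data.

```matlab
% Placeholder values only: first six entries of x1, x4 and y from the question
Ta   = [263; 268; 273; 278; 283; 288];
eabs = [0.83; 0.88; 0.77; 0.83; 0.84; 0.87];
Tout = [257.98; 262.99; 268.05; 273.17; 278.35; 283.59];   % hypothetical name

% Each {values, 'name'} pair becomes one named variable in the dataset
Datas = dataset({Ta, 'Ta'}, {eabs, 'eabs'}, {Tout, 'Tout'});

names = Datas.Properties.VarNames;     % cell array of variable names
disp(names);

% Dot referencing pulls each variable out as an ordinary numeric column
n = size(Datas, 1);                    % number of rows (observations)
X = [ones(n, 1) Datas.Ta Datas.eabs];  % explicit constant term for regress
b = regress(Datas.Tout, X);
disp(b');
```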
Alternatively, of course, you could use one of the other methods of reading in the file ( xlsread comes to mind) and return the data into a base Matlab array which would obviate all the dataset stuff which may not be of much real use for your present purposes.


Asked: 2013-12-28
Edited: dpb 2014-1-1
