Regression / Ordinary Least Squares on a custom equation

I am trying to model the relationship between Load and the variables X and T1 through T6 according to the following equation:
Load = alpha(X) + B1*T1 + B2*T2 + B3*T3 + B4*T4 + B5*T5 + B6*T6, for X = 1 to 672
1) I have Load in the form of 15 minute interval data for a few months
2) X is a variable that is defined like this based on time:
Monday 00.00 am to 00.15 am = 1
Monday 00.15 am to 00.30 am = 2
.
.
Sunday 11.45 pm to 00.00 am = 672
Note:
This repeats again from 1 to 672 for the next week and is not a running number
T1 T2 T3 T4 T5 T6 are temperatures at each 15 min interval
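For readers reproducing this indexing, here is a minimal MATLAB sketch of how X could be derived from timestamps. It assumes each timestamp marks the start of its 15-minute interval; the example dates are illustrative only (2021-07-26 is a Monday) and are not taken from the attached data.

```matlab
% One full week of interval start times, Monday 00:00 through Sunday 23:45.
t = (datetime(2021,7,26,0,0,0) : minutes(15) : datetime(2021,8,1,23,45,0))';

% weekday() returns 1 = Sunday ... 7 = Saturday; shift so Monday = 0 ... Sunday = 6.
dayIdx = mod(weekday(t) - 2, 7);

% 96 intervals per day, 4 per hour; +1 makes the index start at 1.
X = dayIdx*96 + hour(t)*4 + floor(minute(t)/15) + 1;
```

Applied to several weeks of timestamps, the same expression wraps back to 1 every Monday at midnight, so X is not a running number.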
Additional Info :
I can supply Load, X, and T1 to T6. How can I perform a regression on my equation to get the coefficients alpha and B1 to B6? Note that B1 to B6 do not change with X, but alpha does. So the regression output needs to be a vector of alpha coefficients, one for each X from 1 to 672, plus a single value each for B1, B2, B3, B4, B5 and B6, since they don't change with X. I have tried various approaches and searched online, but everything I found only shows how to fit this:
Load = Alpha*X + B1*T1 + B2*T2 + B3*T3 + B4*T4 + B5*T5 + B6*T6
I have attached a subset of the data - about 8 weeks
  6 comments
Roop_T
Roop_T 2021-7-25
  • OK, let me go into detail. I have several months of load data for a chiller at 15-minute intervals. The assumption is that chiller load depends not only on temperature but also on time of week.
  • For example, on a Wednesday from 10.00 to 10.15 am there may generally be less occupancy, so the chiller load might be lower than on another day with a similar outside air temperature. So the chiller load depends not purely on outside temperature but also on time of week.
  • The temperature at each interval is broken down into 6 components to get a piecewise-continuous linear equation (the details are not important here). Those are the T1 to T6 you see.
  • Then, to incorporate time of week, we break a week into 672 15-minute intervals, with X = 1 for Monday 00.00 am to 00.15 am and so on up to X = 672.
  • So the chiller load equation is modelled as:
Load = Alpha(X) + B1*T1 + B2*T2 + B3*T3 + B4*T4 + B5*T5 + B6*T6, for X = 1 to 672,
where Alpha (a function of the time-of-week variable X) and B1 to B6 are regression coefficients.
A week has 7 days * 24 hours * 60 minutes / 15 minutes = 672 intervals of 15 minutes.
  • So I want to feed in Load, X, and T1 to T6 using several months of data. The sample file has 8 weeks of data.
  • In 8 weeks we have 8 data points for Monday 00.00 to 00.15 am (X = 1), and so on. These are used to estimate alpha at X = 1, and similarly for X = 2 through 672. This is just a sample set. With only 8 data points for each 15-minute interval X, estimating a separate alpha for each X will likely overfit alpha. I am not sure of this, just FYI.
  • In 8 weeks of data there are many more data points for estimating B1 to B6, since these have no dependence on time of week (X).
  • The load curve over time looks roughly like the positive half of a sine curve.
It is based on this paper, if you are interested: https://buildings.lbl.gov/publications/quantifying-changes-building
Again, thank you all!
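The piecewise-linear temperature split mentioned in this comment can be sketched as follows. This is only one common construction, and whether it matches how the asker's T1 to T6 were actually built is an assumption. The breakpoints in b and the sample temperatures are hypothetical values chosen for illustration.

```matlab
% Hypothetical breakpoints separating 6 temperature segments (not from the source).
b = [50 60 70 80 90];
T = [45; 72; 95];                 % example outdoor temperatures

Tc = zeros(numel(T), 6);          % one column per component T1..T6
Tc(:,1) = min(T, b(1));           % part of T up to the first breakpoint
for k = 2:5
    % part of T that falls between breakpoints k-1 and k
    Tc(:,k) = min(max(T - b(k-1), 0), b(k) - b(k-1));
end
Tc(:,6) = max(T - b(5), 0);       % part of T above the last breakpoint
```

Under this construction the six components of each row add back up to the original temperature, so B1 to B6 act as the slopes of a continuous piecewise-linear function of temperature.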
Matt J
Matt J 2021-7-25
Edited: Matt J 2021-7-25
"we have 8 weeks of data."
If the same parameters are to be used every week, then you can equivalently just average together Load data samples that were taken at the same time-of-week, reducing the fitting problem to just one week of data.
Load= mean( reshape(Load,672,[]) ,2);
Again, though, without further constraints on alpha, it is a trivial result. Just set all the B variables to zero and alpha(X)=Load.
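A small rank check illustrates this point on synthetic data (all values below are random stand-ins, not the attached file). After averaging down to one week there are 672 equations but 678 unknowns, so the B coefficients cannot be separated from alpha. With the weeks kept separate, the stacked design matrix has full column rank as long as the temperatures differ from week to week.

```matlab
rng(0);
nX = 672; nWeeks = 8; n = nX*nWeeks;
X = repmat((1:nX)', nWeeks, 1);          % time-of-week index, repeating weekly
T = rand(n, 6);                          % synthetic stand-ins for T1..T6

% All weeks stacked: 672 indicator columns for alpha(X) plus 6 temperature columns.
D = full(sparse((1:n)', X, 1, n, nX));
rankFull = rank([D, T]);                 % number of identifiable parameters

% One averaged week: average each temperature column over the weeks.
Tavg = squeeze(mean(reshape(T, nX, nWeeks, 6), 2));   % 672-by-6
rankAvg = rank([eye(nX), Tavg]);         % cannot exceed the 672 rows
```

In this sketch rankFull equals the 678 unknowns while rankAvg is capped at 672, which is why alpha(X) = Load with all B set to zero fits the averaged data exactly.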


Answers (2)

Scott MacKenzie
Scott MacKenzie 2021-7-25
Edited: Scott MacKenzie 2021-7-25
This is probably too simple to be correct, but I'll toss it out there anyway. Admittedly, I haven't considered anything you've written about time intervals and such, because I think this is already present in the time variable, but I might be wrong.
Bottom line: You've got empirical data for eight variables (load, X or time, T1, T2, T3, T4, T5, and T6) and you want to build a model with one of the variables as the response variable and the other seven as predictors. Here's your model:
load = alpha*X+ b1*T1 + b2*T2 + b3*T3 + b4*T4 + b5*T5 + b6*T6
The script below fits this regression model using mvregress (which requires the Statistics and Machine Learning Toolbox). Note that mvregress takes the predictor matrix as its first argument and the response as its second:
f = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/694834/Data%20Subset%20for%20Matlab%20Central.xlsx';
T = readtable(f);
% dependent/response variable
y = T.load;
% predictor variables (Note: time is 'X' in the question)
P = [T.time, T.t1, T.t2, T.t3, T.t4, T.t5, T.t6];
format longg;
% design matrix first, response second; returns a 7-by-1 coefficient vector
beta = mvregress(P,y)
The seven model coefficients (alpha, b1, b2, etc.) are returned in beta, in the column order of P. Visit the documentation for mvregress for other options you might want to explore. Good luck.
  3 comments
Scott MacKenzie
Scott MacKenzie 2021-7-26
So, if your model requires a different alpha for each value of X, and X varies from 1 to 672 (or some other discrete value corresponding to a time interval), then you have 672 load equations, and, therefore, 672 models to build. But, in the data set, you only have about 8 rows for each value of X. So, you are trying to build a model with 7 predictors from only 8 data points. I played around with this a bit, but, I get errors such as "covariance is not positive-definite" or "insufficient data".
Roop_T
Roop_T 2021-7-26
True, but not entirely. The number of data points available for estimating alpha equals the number of weeks of data you have. I have presented a subset here. I have about a year's worth of data, so that's 52 weeks, or 52 data points for estimating alpha at each interval X. But you can use data from all X for B1 to B6. The difficulty is that for alpha at X = 1 you use only the load data where X = 1, while for B1 to B6 you use all the data, yet you want to estimate both simultaneously. That is the underlying programming complexity in this problem.
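One way to estimate both sets of coefficients at once, offered here as a sketch rather than as the method from the paper, is ordinary least squares with one indicator (dummy) column per value of X plus the six temperature columns. Each alpha then draws only on rows with its own X, while B1 to B6 draw on every row. The data below are synthetic, and names such as alphaTrue and betaTrue are made up for the illustration.

```matlab
rng(1);
nX = 672; nWeeks = 8; n = nX*nWeeks;
X = repmat((1:nX)', nWeeks, 1);            % time-of-week index, repeating weekly
T = 10*rand(n, 6);                         % synthetic stand-ins for T1..T6

% Synthetic "truth", used only to generate example data.
alphaTrue = 50 + 20*sin(2*pi*(1:nX)'/96);
betaTrue  = [0.5; 1.0; 1.5; 2.0; 2.5; 3.0];
Load = alphaTrue(X) + T*betaTrue + 0.1*randn(n, 1);

% Column j of D is 1 on rows where X == j and 0 elsewhere.
D = sparse((1:n)', X, 1, n, nX);
A = [D, sparse(T)];                        % n-by-678 design matrix

coef = A \ Load;                           % least-squares solution
alphaHat = coef(1:nX);                     % one alpha per time-of-week slot
betaHat  = coef(nX+1:end);                 % single B1..B6 shared by all rows
```

With real data, Load, X and T would come from the table instead of being simulated. Each alpha is still estimated from only as many points as there are weeks, so the overfitting concern raised in the question remains.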



the cyclist
the cyclist 2021-7-25
It's definitely an interesting modeling problem. Here is a plot of your data, where I used errorbar to plot the mean and error of the mean.
chillerData = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/694834/Data%20Subset%20for%20Matlab%20Central.xlsx');
chillerData = chillerData(1:6048,:); % Only doing this step out of laziness, to get a multiple of 672
chillerLoad = chillerData.load;
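chillerLoad = reshape(chillerLoad,672,9)'; % reshape fills column-wise, so build 672-by-9 and transpose: rows = weeks, columns = time-of-week slots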
figure
errorbar(mean(chillerLoad),std(chillerLoad)/sqrt(size(chillerLoad,1)))
This does look close to sinusoidal (but I don't think only the positive portion?), so I think my first pass at a model would be one that varies sinusoidally in your X variable (scaled so that one cycle is 24 hours). And of course include the other terms.
I would not recommend averaging over the days, because you will then lose the ability to estimate the error. Just include all the data.
I would use fitnlm to do the fit. I can give more guidance on fitting the model if you need it.
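As a rough sketch of what such a fit could look like (an assumption about the intended model, not the answerer's own code), alpha(X) is replaced below by a constant plus a sine and cosine with a 24-hour period, which is 96 intervals. The data are synthetic and bTrue is a made-up coefficient vector used only to generate them.

```matlab
rng(2);
n = 672*8;
X = repmat((1:672)', 8, 1);                % time-of-week index
T = 10*rand(n, 6);                         % synthetic stand-ins for T1..T6
P = [X, T];                                % predictor matrix: X first, then T1..T6
w = 2*pi/96;                               % one cycle per 96 intervals (24 hours)

% b(1): offset, b(2:3): sine/cosine amplitudes, b(4:9): B1..B6
modelfun = @(b, P) b(1) + b(2)*sin(w*P(:,1)) + b(3)*cos(w*P(:,1)) ...
                   + P(:,2:7)*reshape(b(4:9), [], 1);

bTrue = [50; 20; -5; 0.5; 1; 1.5; 2; 2.5; 3];
y = modelfun(bTrue, P) + 0.1*randn(n, 1);

beta0 = ones(9, 1);                        % starting values for the optimizer
mdl = fitnlm(P, y, modelfun, beta0);
bHat = mdl.Coefficients.Estimate;
```

The sine-plus-cosine form avoids an explicit phase parameter; the amplitude and phase can be recovered from b(2) and b(3) afterwards. This uses 9 parameters instead of 678, at the cost of assuming every day of the week follows the same daily shape.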

Release

R2020a
