Groupby Date and Categorical Column on Tables
6 次查看(过去 30 天)
显示 更早的评论
Suppose that I have 5 columns in my table as Stock, Date, Open, High, Low, Close. Stock is a string column and Date is in Date format. Suppose that same stock can repeat on same day.
How can I count of unique stocks on each Day? That would be table with number of unique stock counts against each date.
Given that each stock can have mutiple observation on same day how can I group by both Date and Stock to get average Open price for each stock on each day.
23 个评论
Furqan Hashim
2020-8-28
Iam new to MATLAB I've used the function that you mentioned. I got an error
Undefined function or variable 'groupsummary'.
I from a Python background so what I can guess I need some module or library to work with the function you mentioned.
Apart from that I tried the following piece of code
mean = varfun(@mean, data,'GroupingVariables',{'Stock', 'Date'},...
'InputVariables','Open');
I got an error
Error using tabular/varfun (line 206)
Not enough input arguments.
Adam Danz
2020-8-28
Looks like you didn't fill in the Matlab release you're using when you wrote your questions (see the "Release" field to the right). groupsummary was released in r2018a. It doesn't require any toolboxes (aka, libraries in pythonese).
Adam Danz
2020-8-28
编辑:Adam Danz
2020-8-28
I filled in your release info.
When I test your solution, it works well!
Perhaps you could show us a sample of your input using
head(data)
Avoid using mean as a variable name since it's a very commonly used function.
data = table(['A';'B';'A';'A';'B'], datetime('now')-days([1;2;1;2;2]), rand(5,1)*100, 'VariableNames', {'Stock','Date','Open'})
% data =
% 5×3 table
% Stock Date Open
% _____ ____________________ ______
% A 27-Aug-2020 16:27:51 81.472
% B 26-Aug-2020 16:27:51 90.579
% A 27-Aug-2020 16:27:51 12.699
% A 26-Aug-2020 16:27:51 91.338
% B 26-Aug-2020 16:27:51 63.236
mu = varfun(@mean, data,'GroupingVariables',{'Stock', 'Date'},'InputVariables','Open')
% mu =
% 3×4 table
% Stock Date GroupCount mean_Open
% _____ ____________________ __________ _________
% A 26-Aug-2020 16:27:51 1 91.338
% A 27-Aug-2020 16:27:51 2 47.086
% B 26-Aug-2020 16:27:51 2 76.908
I've also verified that the r2017b documentation contains the same inputs and name-value options.
https://www.mathworks.com/help/releases/R2017b/matlab/ref/varfun.html
Furqan Hashim
2020-8-28
编辑:Furqan Hashim
2020-8-28
I did exactly what you did. I am still facing the same error. see attached screenshot.
Adam Danz
2020-8-28
I wonder if you are shadowing the mean function.
What's the result of
which mean -all
and
which('mean','in','varfun')
Furqan Hashim
2020-8-29
See the results below that you asked
>> which mean -all
C:\Program Files\MATLAB\R2017b\toolbox\matlab\datafun\mean.m
C:\Program Files\MATLAB\R2017b\toolbox\matlab\timefun\@duration\mean.m % duration method
C:\Program Files\MATLAB\R2017b\toolbox\matlab\timefun\@datetime\mean.m % datetime method
C:\Program Files\MATLAB\R2017b\toolbox\matlab\bigdata\@tall\mean.m % tall method
C:\Program Files\MATLAB\R2017b\toolbox\distcomp\parallel\@codistributed\mean.m % codistributed method
C:\Program Files\MATLAB\R2017b\toolbox\distcomp\gpu\@gpuArray\mean.m % gpuArray method
C:\Program Files\MATLAB\R2017b\toolbox\finance\ftseries\@fints\mean.m % fints method
C:\Program Files\MATLAB\R2017b\toolbox\mbc\mbctools\@sweepset\mean.m % sweepset method
C:\Program Files\MATLAB\R2017b\toolbox\stats\stats\@ProbDistUnivParam\mean.m % ProbDistUnivParam method
C:\Program Files\MATLAB\R2017b\toolbox\matlab\timeseries\@timeseries\mean.m % timeseries method
>> which('mean','in','varfun')
C:\Program Files\MATLAB\R2017b\toolbox\matlab\datafun\mean.m
Furqan Hashim
2020-8-29
I am able to group by with 1 variable
mu = varfun(@mean, data,'GroupingVariables','Stock','InputVariables','Open')
mu =
2×3 table
Stock GroupCount mean_Open
_____ __________ _________
A 3 60.182
B 2 45.716
Adam Danz
2020-8-29
编辑:Adam Danz
2020-8-29
Thanks for the providing the mean function info. It doesn't appear to be the problem.
I just tested the demo in r2016b and r2017b and it worked perfectly. The 2017b documentation clearly shows support for the cell array of GroupingVariables.
Furqan Hashim
2020-8-29
I am testing code on sample data that you generated using
data = table(['A';'B';'A';'A';'B'], datetime('now')-days([1;2;1;2;2]), rand(5,1)*100, 'VariableNames', {'Stock','Date','Open'})
Still getting the error.
Adam Danz
2020-8-29
Ok, let's dig a little deeper.
Open varfun in Matlab r2017b
open varfun
put a breakpoint on line 206 which should start with (some parts removed)
[ ] = a.table2gidx(groupVars)
Run your code, and then show us the results of the groupVars.
Furqan Hashim
2020-8-29
Ok, I'll do that but 1 thing that amazes me is that when I open up varfun using
open varfun
A function named varfun is opened that contains only 183 lines of code.
But when I do click on line 206 in mentioned error it gives stops me from making any changes as the file is read-only
Adam Danz
2020-8-29
When I open varfun in r2017b, the file is 509 lines long.
Are you sure you're using 17b? Is the varfun file the official Mathworks file containing the copyright line below?
% Copyright 2012-2016 The MathWorks, Inc.
Furqan Hashim
2020-8-29
编辑:Furqan Hashim
2020-8-29
See the attached ouptut after placing break point on line 206.
The error pops up from 17b as the function that opens after clicking on line 206 of error is the function with 509 lines of code.
Furqan Hashim
2020-8-29
编辑:Furqan Hashim
2020-8-29
Up unitl break point everything is fine when I run next line I get an error
Error using tabular/varfun (line 206)
Not enough input arguments.
Adam Danz
2020-8-29
So, when you place the break point on line 206 within the function that contains 509 lines of code, after running the varfun line with our demo table, what are the values of groupvars input?
I got a little lost from the previous two comments. Was there supposed to be an attachment? What's the other varfun file that you were looking at -- is that a custom function?
Furqan Hashim
2020-8-29
See attached lines below which tells the ouput of group vars
>> mu = varfun(@mean, data,'GroupingVariables',{'Stock', 'Date'},'InputVariables','Open')
206 [group,grpNames,grpRowLoc] = a.table2gidx(groupVars); % leave out categories not present in data
K>> groupVars
groupVars =
1 2
Adam Danz
2020-8-29
编辑:Adam Danz
2020-8-29
I'm baffled. That's the output you should be getting and you didn't get the error that you get when you're not running debug mode. It makes no sense to me. Why would you get an error when you run the function normally but not get the error in debug mode? It might have something to do with the other varfun file you found which I still have questions about - is that a Matlab file? Nevertheless, when you call varfun it must be using the correct file since its breaks on line 206 within the 509-lined file, right?
Furqan Hashim
2020-8-29
Yes it is using the correct file the one with 509 lines of code.
I think groupVars is the input being given in line 206 so when groupVars is passed to a.table2gidx it would give error
Adam Danz
2020-8-29
编辑:Adam Danz
2020-8-29
Right, but that function only has 1 input and it's the correct input but the error message you shared was "Not enough input arguments", which doesn't occur on my end.
From this point I think your options are
- Install the updates to your current release and try again.
- Use a different machine / installation of matlab / or maybe even a newer release. If the updates didn't fix the problem, something seems to be wrong locally since none of these issues occur on my vs of r2017b update 9.
- Make a tech support request and point to this thread within the request so they can see what troubleshooting you've already done.
I would be interested in hearing the final solution and cause of the problem if you get to the bottom of this. Note that the error you reported in the answer below also doesn't match the error I got on the demo data in r2017b (update 9). So something's fishy.
回答(1 个)
Mohammad Sami
2020-8-29
Perhaps you can try the alternative below.
if true
[G,tabout] = findgroups(data(:,{'Stock','Date'}));
tabout.avgOpen = splitapply(@mean,data.Open,G);
end
4 个评论
Furqan Hashim
2020-8-29
The above answer leads to this error.
Error using left (line 13)
if not cellstr, string can only be a single row or empty!
Error in istable (line 7)
if strcmp(left(name,1),'`') && strcmp(right(name,1),'`')
Error in findgroups (line 129)
if istable(tOutTemplate{i})
Adam Danz
2020-8-29
编辑:Adam Danz
2020-8-29
I'm have some doubts that you're using r2017b. That is not the error message you should be getting in 17b with Mohammad's code if you're running it on the demo table "data" that we've been using.
What's the first line of the output from the command below?
ver
For example, I'm testing this on "MATLAB Version: 9.3.0.948333 (R2017b) Update 9" and the error message does not match the error message you shared.
Adam Danz
2020-8-29
编辑:Adam Danz
2020-8-29
The only difference between your and my versions is the update #9. You could easily update your version for free by pressing the little bell icon in the upper right of the main window. I can't promise that will affect this problem but it's a good idea to get the updates anyway.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Startup and Shutdown 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!发生错误
由于页面发生更改,无法完成操作。请重新加载页面以查看其更新后的状态。
您也可以从以下列表中选择网站:
如何获得最佳网站性能
选择中国网站(中文或英文)以获得最佳网站性能。其他 MathWorks 国家/地区网站并未针对您所在位置的访问进行优化。
美洲
- América Latina (Español)
- Canada (English)
- United States (English)
欧洲
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
亚太
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)