Table performance very slow
I have used tables within a physics model that is solved by ode23. The performance is very slow, and in troubleshooting (using profiler) I found that the majority of the time is spent in various table functions.
The three functions table.subsasgnDot, table.subsref, table.subsref alone take approximately 30% of the execution time. Within those functions it seems to be variable name checking that takes the majority of the time.
The variable names in every table are all known at the start of the program and don't change. It seems like it would be far more efficient to check them once rather than on every pass through the loop.
This is for a simulation that takes several minutes to run each case, and it would be run for many cases. So, the slow performance is a significant problem.
I understand performance is better when the problem is vectorized. One of the subroutines can calculate 15,000 points in 10 sec if called as a vector, but takes 1 hr if called in a loop.
However, since this problem is being solved with ode23, it is unavoidably called in a loop, and unfortunately I used tables everywhere before discovering how slow they are.
Is there any way to improve performance without major rewriting to remove all use of the table class?
Answers (4)
Oleg Komarov
2016-11-28
Edited: Oleg Komarov
2016-11-28
I have been using table() since way before it was introduced into the core package, since de facto it is the ported version of the dataset() class from the Statistics Toolbox. I also noticed a long time ago many limitations in terms of performance and functionality, and have logged feature enhancements with TMW.
To address the limitations of table() while waiting for the official implementation of my enhancement requests, I created tableutils(). Among the problems, you would be astonished to know that the disp() of a big table can literally freeze your PC until the next ice age (and I am not talking about the movies...). This is something that I fixed with a buffered disp method.
While my tableutils() do not address directly the problems in subsref/subsasgn, anyone is welcome to contribute to this effort to make the table() class better by submitting an issue or a Pull Request on Github.
0 Comments
Daniel Petrini
2016-10-5
Edited: dpb
2016-10-5
In my view, table performance is very erratic, ranging from quick to very slow. For example, do a clear and then just >> table(). On my 2016b that can take many seconds. :-S I had to rewrite a (large) class based on tables to use multiple vectors of the same types. Performance is now much more linear and trustworthy. It seems that the JIT does not know what to do with them? I wish MathWorks would post more about performance on these new data structures... In addition: the
<t=tic;my_class.insert_new_entry(...);toc(t)>
reported excellent times. The problem is that MATLAB is "busy", and the output of toc(t) could take 2 s to display (0.12 s)... What am I missing? I'm guessing it is some overhead in creating tables, i.e., table_1(1:5,my_col) creates a new table and freezes...? Disclaimer: sitting on an 8 GB Core i7.
/Daniel Petrini, Stardots AB
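A rough way to check the kind of overhead described above on your own machine is to time the same element-by-element loop on a table variable and on a plain double vector. This is only a sketch: the table `t`, its variable name `x`, and the size `n` are made up for illustration, and the timings will vary by release and hardware.

```matlab
% Hypothetical data: one table variable and one plain vector of the same size.
n = 2000;
t = table(zeros(n,1), 'VariableNames', {'x'});
v = zeros(n,1);

% Element-by-element assignment through the table interface.
tic
for i = 1:n
    t.x(i) = i;
end
tTable = toc;

% The same assignment on an ordinary double vector.
tic
for i = 1:n
    v(i) = i;
end
tVector = toc;

fprintf('table loop: %.4f s, vector loop: %.4f s\n', tTable, tVector);
```

Both loops produce the same values, so any difference in the two timings comes from the indexing path rather than from the arithmetic.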
2 Comments
Daniel Petrini
2016-10-6
My answer is probably not an answer, but rather a comment. Sorry. My first contribution to Matlab Answers.
Oleg Komarov
2016-11-28
The native table.disp() has a huge problem and can freeze your PC for a long time. I implemented a buffered disp that avoids this issue. See my answer below.
jbpritts
2016-11-24
I have Matlab 2016b. I can confirm that tables are terribly slow. Unless you really need them for heterogeneous data, avoid them in any performance-critical code. I will have to rewrite a fairly complicated section of code using legacy data structures. MathWorks should address this extreme performance deficiency.
2 Comments
Image Analyst
2016-11-24
They are tremendously better and faster than cell arrays though, and use far less memory.
Oleg Komarov
2016-11-28
Internally, table() stores data in a cell array, where each column is a cell. So, your statement about speed and memory cannot be true, since there is additional overhead linked to VariableNames and matlab-coded subsref/subsasgn.
I do agree that tables are more convenient.
Peter Perkins
2016-10-7
Byron, it's hard to make specific suggestions without knowing exactly what you're doing, but here are some thoughts.
Tables are best at managing data and doing vectorized operations. Based on your description, it sounds like you are probably doing scalar operations such as
t.Var(i) = x
in a loop. You've described your alternative as a complete rewrite to not use tables at all. But there is often a middle ground where you can find a localised scope in which you can pull some of the variables in a table out as, say, ordinary double vectors, do all the non-vectorized calculations on them in a scalar loop, and then assign back into the table. Sometimes you can even convert the table to a scalar struct and use the exact same syntax. Of course, separate variables or a scalar struct will not enforce equal number of rows, or provide a simple syntax for arbitrary rectangular selections, or the other things that tables are designed to do.
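The middle ground described above might look roughly like the following. It is a minimal sketch, not code from the original model: the table `t`, the variable names `Pos` and `Vel`, the step `dt`, and the update formulas are all placeholders chosen for illustration.

```matlab
% Hypothetical table; Pos and Vel are placeholder variable names.
n  = 1000;
dt = 0.01;
t  = table(zeros(n,1), zeros(n,1), 'VariableNames', {'Pos','Vel'});

% Pull the variables out once, as ordinary double vectors.
pos = t.Pos;
vel = t.Vel;

% Scalar loop on plain arrays: no table indexing inside the loop.
for i = 2:n
    vel(i) = vel(i-1) - 9.81*dt;
    pos(i) = pos(i-1) + vel(i-1)*dt;
end

% Assign back into the table once, after the loop.
t.Pos = pos;
t.Vel = vel;

% Alternative: a scalar struct keeps the same dot-and-index syntax.
s = table2struct(t, 'ToScalar', true);   % one field per table variable
for i = 2:n
    s.Vel(i) = s.Vel(i-1) - 9.81*dt;     % same form as t.Vel(i) = ...
end
t2 = struct2table(s);                    % back to a table when needed
```

In the struct version the loop body is unchanged apart from the variable name, which is why it can work as a drop-in replacement inside a localised scope. As noted above, nothing stops the fields of `s` from ending up with different lengths, so that check is on you before converting back.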
Hope this helps.
0 Comments