Is the time it takes for a database commit proportional to the amount of data?
I'm having the following very perplexing and frustrating problem. I'm trying to programmatically move large chunks of mixed-type data from multiple tab-delimited text files into a SQL Server database. Because the data has various "landmines" and needs to be inserted in two distinct steps (one chunk gets fastinsert-ed into one table; that table is then queried for the new primary key (PK) values, which the DB automatically propagates into a second table; and the second table gets update-d with a PK-ordered second chunk), I've found it necessary to turn AutoCommit off, so that if/when problems arise in the second step, the first step can be rolled back. So as not to lose too much work, however, I have a commit placed at the completion of each file. The workflow looks like this (sketched below):
1. dir for the list of files to loop over.
2. Read/parse the data from a file.
3. Open a database connection and set AutoCommit to off.
4. fastinsert chunk one.
5. Get the new PK values.
6. update with chunk two.
7. Check for success: if successful, commit; else, rollback (and add the name of the file to a list of "problem" files).
8. Close the database connection to try to enforce the commit (I added this step when I first detected that commit appeared not to be working).
9. Repeat with the next file.
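For concreteness, here is a minimal sketch of that loop (the DSN, credentials, table and column names, and the parseFile/pkWhereClauses helpers are placeholders, not my actual code):

    files = dir('*.txt');                % 1. files to loop over
    problemFiles = {};
    cols1 = {'colA', 'colB'};            % placeholder column lists
    cols2 = {'colC', 'colD'};
    for k = 1:numel(files)
        [chunk1, chunk2] = parseFile(files(k).name);    % 2. hypothetical parser
        conn = database('myDSN', 'user', 'password');   % 3. open connection...
        set(conn, 'AutoCommit', 'off');                 %    ...AutoCommit off
        try
            fastinsert(conn, 'Table1', cols1, chunk1);          % 4. chunk one
            curs = fetch(exec(conn, 'SELECT pk FROM Table1'));  % 5. new PK values
            update(conn, 'Table2', cols2, chunk2, ...
                   pkWhereClauses(curs.Data));                  % 6. chunk two
            commit(conn);                                       % 7. success
        catch
            rollback(conn);                                     % 7. failure
            problemFiles{end+1} = files(k).name; %#ok<AGROW>
        end
        close(conn);                     % 8. close to try to enforce the commit
    end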
The problem is, rollback is rolling back everything, not just the changes since the last commit! I suspected a bug, so I tried to reproduce it with a "minimal sample" (sketched below): insert one record, commit it, rollback, check the state. But when I did this, commit worked "as advertised." Thus, to "first order," the problem appears to be correlated with the size of the change that needs to be committed; e.g., I suspect that perhaps I'm issuing the rollback before the commit has had a chance to finish doing whatever it is that it does. (I know that chunk size is an issue on some level because the fastinsert-s take on the order of one to three minutes and the update-s take roughly five times that long. Of course, my "minimal sample" is rather less complicated than the full code, but really all the essential steps are reproduced, so I'm hard pressed to identify what my full code is doing differently...)
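Here is roughly what that minimal sample looked like (again, the DSN, credentials, and the one-column TestTable are placeholders):

    conn = database('myDSN', 'user', 'password');
    set(conn, 'AutoCommit', 'off');
    fastinsert(conn, 'TestTable', {'col1'}, {1});
    commit(conn);
    rollback(conn);          % nothing pending, so nothing should be undone
    curs = fetch(exec(conn, 'SELECT col1 FROM TestTable'));
    disp(curs.Data)          % the committed record survives, as advertised
    close(conn);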
Anyway, two days in a row now I've set my code to chug away overnight, expecting to come in the next day and find everything moved, only to come in and find that it hit another landmine and rolled back. I expected only the pending file's insertion to be undone, but instead everything (we're talking over 220,000 58-column records this last time!) is lost--including me!
Any ideas as to what the hell is going on?
Thanks!
DG
2 Answers
Oleg Komarov
2011-8-6
It's highly inefficient and slow to use MATLAB to send data to SQL Server. We used to process a database of 110e6 x 20, and we made a shy attempt with MATLAB at the beginning, since SSIS (SQL Server Integration Services) seemed a little too confusing. We soon discovered it was not feasible, and after investing a week we built a routine in SSIS that looped through all the files.
Believe me, you're still in time to change course.
2 Comments
Oleg Komarov
2011-8-11
Two things come to mind:
* The database's data file (and possibly its log file) is set to autogrow (right-click the database in SSMS > Properties > Files), but the growth increment is very small, so whenever you insert some data the file hits its limit, execution stops while the file grows, then resumes, over and over again. There are rules of thumb on the web for setting a sensible growth size and for disabling autoshrink.
* The Java memory is not enough to support this amount of data swapping (I hope increasing the Java heap space from MATLAB can help). See the sketch below for checking both.
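A couple of quick checks from MATLAB may help confirm either guess (the DSN and credentials are placeholders; sys.database_files is the standard SQL Server catalog view for per-file growth settings):

    % 1) Inspect the data/log file autogrowth settings of the current database.
    conn = database('myDSN', 'user', 'password');
    curs = fetch(exec(conn, ...
        'SELECT name, growth, is_percent_growth FROM sys.database_files'));
    disp(curs.Data)
    close(conn);

    % 2) See how much heap the MATLAB JVM actually has available.
    rt = java.lang.Runtime.getRuntime;
    fprintf('Max Java heap: %.0f MB\n', double(rt.maxMemory)/2^20);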