Large streaming data direct to file
Hi!
I would like to set up a system to log months' worth of financial JSON websocket data to a file.
- The incoming JSON data looks like this: {"this": "that", "foo": [1,2,3], "bar": ["a", "b", "c"]}, and there are about 20 messages per second.
- I tested FPRINTF writing directly to a .txt file. That works, but the files get really big (about 2 GB per day) because there is no compression.
- I tested different SAVE formats ('-v7' being by far the best) to save a new variable inside a .mat file every 10 minutes. This was a little too slow to keep up with the incoming stream, taking almost a second per save, and processing would not be ideal if I have to load a ton of different variables. But the file size looked very good. (http://undocumentedmatlab.com/blog/improving-save-performance)
- I tried MATFILE to write directly to the file, but I could only append to the end of a file with '-v7.3' .mat files, which makes the file a lot bigger than '-v7' and still takes a little too long.
- I would like a file format with good compression that I can append a new message to quickly. Maybe the HDF5 file format?
I believe I need to serialize the incoming data and save it directly to a file in some compressed form, but I'm not exactly sure how to do that.
- I read through this article and don't understand exactly how to implement it. ( https://undocumentedmatlab.com/blog/serializing-deserializing-matlab-data). Since this is an older article, is there a more up-to-date way?
- Do I use something like "h5write"? "getByteStreamFromArray"?
- After the file is created with months of data, how do I pull out each message, one by one, to process it?
- Is the "Fast serialize/deserialize" submission on the File Exchange the correct path? I can't figure out how to use it.
Thank you!
Joe
Answers (1)
Jan
2018-11-16
Edited: Jan
2018-11-16
You can create the text as a char vector with sprintf instead of fprintf and compress it in RAM before writing it to disk: https://www.mathworks.com/matlabcentral/fileexchange/69388-mkzip . This should avoid the overhead of compressed MAT files.
Maybe it is just the disk access that slows down the processing. In that case, try using an SSD instead.
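A minimal sketch of this idea, under some assumptions. It does not use the linked mkzip submission (its interface is not shown here). Instead it swaps in the gzip classes of the JVM that ships with MATLAB. The file name, the sample messages, and the framing (each compressed block prefixed with its length as a uint32) are hypothetical choices for illustration. The decompression step relies on `InterruptibleStreamCopier`, an undocumented MathWorks class that may change between releases, and `jsondecode` and `newline` need R2016b or later.

```matlab
% Hypothetical batch of messages; in practice these arrive from the websocket.
messages = {'{"this": "that", "foo": [1,2,3], "bar": ["a", "b", "c"]}', ...
            '{"this": "other", "foo": [4,5,6], "bar": ["d", "e", "f"]}'};
logFile = [tempname '.bin'];                 % hypothetical log file location

% --- Write: build the text with sprintf, gzip it in RAM, append to the file ---
txt = sprintf('%s\n', messages{:});          % one message per line
raw = unicode2native(txt, 'UTF-8');          % char vector -> uint8 bytes
byteStream = java.io.ByteArrayOutputStream();
gzipStream = java.util.zip.GZIPOutputStream(byteStream);
gzipStream.write(raw);
gzipStream.close();                          % finishes the gzip stream
packed = typecast(byteStream.toByteArray(), 'uint8');

fid = fopen(logFile, 'a');
assert(fid ~= -1, 'Could not open %s for appending.', logFile);
fwrite(fid, numel(packed), 'uint32');        % length prefix (assumed framing)
fwrite(fid, packed, 'uint8');
fclose(fid);

% --- Read: walk the blocks and decode the messages one by one ---
copier = com.mathworks.mlwidgets.io.InterruptibleStreamCopier.getInterruptibleStreamCopier();
decoded = {};
fid = fopen(logFile, 'r');
assert(fid ~= -1, 'Could not open %s for reading.', logFile);
while true
    n = fread(fid, 1, 'uint32');
    if isempty(n), break; end                % end of file reached
    block = fread(fid, n, '*uint8');
    gzipIn = java.util.zip.GZIPInputStream(java.io.ByteArrayInputStream(block));
    out = java.io.ByteArrayOutputStream();
    copier.copyStream(gzipIn, out);          % decompress into memory
    bytes = typecast(out.toByteArray(), 'uint8');
    lines = strsplit(strtrim(native2unicode(bytes.', 'UTF-8')), newline);
    for k = 1:numel(lines)
        decoded{end+1} = jsondecode(lines{k}); %#ok<AGROW>
    end
end
fclose(fid);
```

In a real logger you would probably collect the incoming messages in a cell array and flush one block every few seconds or every few hundred messages, since larger blocks generally compress better than single messages and the file is touched less often.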