Regular expressions on uint8 or single-byte characters
I have a 200 MB text file encoded in UTF-8. My maximum array size is around 350 MB, so I can safely read it in using fread(fid,'*uint8'). To use regular expressions, I need to turn this into a char array, which blows up the array size by at least a factor of two, since a MATLAB char takes 2 bytes (the exact factor depends on the encoding, but for my application I can ignore all fancy characters), and thus leads to an "out of memory" error.
I wrote some code that breaks up the original array, so that the matching of the regular expressions works on smaller chunks, but I am still wondering: Can I somehow run regular expressions on the uint8 array? Or is there a char-like variable type that only uses 1 byte per character?
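A minimal sketch of the chunked approach described above, not the poster's actual code. It assumes pure single-byte (ASCII) content, that no match is longer than `overlap` bytes, and that the pattern cannot start matching in the middle of another match. The names `bytes`, `chunkLen`, `overlap`, and `starts`, and the sample data, are illustrative only.

```matlab
% Stand-in for the (row-vector) output of fread; illustrative data only.
bytes    = uint8(sprintf('id=12 foo id=345 bar id=6789 baz'));
pattern  = 'id=\d+';
chunkLen = 10;   % bytes per chunk (tiny here, for demonstration)
overlap  = 8;    % assumed upper bound on the length of any match

n = numel(bytes);
starts = [];     % global start indices of all matches
for first = 1:chunkLen:n
    % Extend each chunk by `overlap` so matches that begin near the
    % end of the chunk are still seen in full.
    last  = min(first + chunkLen + overlap - 1, n);
    chunk = char(bytes(first:last));   % only this slice becomes char
    s     = regexp(chunk, pattern);    % start indices within the chunk
    % Keep only matches that begin in the non-overlap part, so a match
    % seen in two consecutive chunks is counted once.
    keep   = s <= chunkLen;
    starts = [starts, s(keep) + first - 1]; %#ok<AGROW>
end
```

Only one chunk at a time is held as a char array, so peak memory stays near the size of the uint8 buffer plus twice the chunk size.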
5 comments
dpb
2013-8-26
Instead of 'uint8', try 'uchar'. Not sure it'll help, but it is at least a character class, not an integer.
Cedric
2013-8-27
Edited: Cedric
2013-8-27
Actually, it would be simpler if you told us what you are trying to match rather than the pattern itself (copy/paste a chunk of the file content or a sample string, and explain what you want to extract). With a little luck, we can do this using STRFIND (which works on uint8 arrays) or some numeric test on the uint8 values.
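A minimal sketch of the two ideas in this comment, using made-up sample data. It relies on STRFIND accepting uint8 row vectors, as the comment states; as far as I know that behavior is not part of the documented interface, so treat it as an assumption. The names `bytes`, `needle`, `idx`, and `digitPositions` are illustrative.

```matlab
% Stand-in for the transposed output of fread (a row vector of bytes).
bytes  = uint8('alpha=1;beta=22;gamma=333');
needle = uint8('beta=');

% Literal search directly on the byte vector, with no char conversion.
% Assumption: strfind accepts uint8 row vectors, per the comment above.
idx = strfind(bytes, needle);

% Numeric test: mark ASCII digits '0'..'9' (codes 48..57) with plain
% comparisons. The mask is logical, i.e. 1 byte per element.
isDigit        = bytes >= 48 & bytes <= 57;
digitPositions = find(isDigit);
```

Whether this can replace a regular expression depends on what needs to be matched: fixed substrings and character-class tests translate directly, while more general patterns do not.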
Answers (0)