Convert cell array of strings to unicode quickly

I have an array of approximately 10M strings, and I'm interested in converting each string to its unicode values. Is there a quick, one-line way to convert the whole string array into numeric values? Ideally, I'd love a solution like this:
numeric_matrix = double(string_array);
But of course double (and unicode2native) do not accept cell arrays. So my current solution is to loop through the string array:
for ii = 1:length(string_array)
    numeric_matrix(ii,:) = double(string_array{ii});
end
Unfortunately this for-loop solution is very inefficient. It can take upwards of 10 minutes for very large numbers of strings. I tried googling this but didn't see anything better. Is there a simpler, faster way to do this, ideally in one line?

Accepted Answer

Walter Roberson 2016-2-2
Try
numeric_array = cellfun(@uint16, string_array, 'UniformOutput', false);  % returns a cell array, one numeric vector per string
Try it on a smaller subset first as I do not know how the timing would compare. It should have the advantage of not needing to change the internal representation.
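If a single numeric matrix is wanted, as in the question, one possible alternative is to let char build a padded character matrix first and then convert it in one call. This is only a sketch: the sample data below is hypothetical, and the padding behavior matters when the strings differ in length.

```matlab
% Hypothetical sample data; the question describes roughly 10M strings.
string_array = {'abc'; 'xyz'; 'hi'};

% char() turns the cell array into a character matrix, padding shorter
% strings with trailing spaces (code 32) so that every row has equal length.
numeric_matrix = double(char(string_array));

% Variant that avoids padding: one numeric vector per string, in a cell array.
numeric_cells = cellfun(@double, string_array, 'UniformOutput', false);

disp(numeric_matrix);
```

When all strings have the same length, no padding is added and the result matches what the loop in the question produces.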
3 Comments
Guillaume 2016-2-2
As far as I understand, MATLAB's native encoding is not Unicode but whatever your system locale is, so converting the string to double (or uint16) may not convert it to Unicode unless your locale is also Unicode. You would have to call native2unicode on the strings to be sure.
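One way to check this on a particular installation is to compare the value that double returns with the bytes of an explicit encoding. The sketch below is illustrative only; the sample character and the choice of 'UTF-8' are assumptions and do not come from the thread.

```matlab
% Sample character chosen for illustration: code 233 (e with acute accent).
s = char(233);

% Numeric value MATLAB stores for the character.
code_value = double(s);

% Bytes of the same character in an explicitly named encoding (UTF-8 here).
utf8_bytes = unicode2native(s, 'UTF-8');

% Decoding those bytes with the same encoding should return the character.
round_trip = native2unicode(utf8_bytes, 'UTF-8');

disp(code_value);
disp(utf8_bytes);
disp(isequal(round_trip, s));
```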
Most likely your cellfun is slower than a loop because you're using an anonymous function to perform your extra operation. Anonymous function calls have significant overhead in MATLAB.
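A rough way to see whether the anonymous function is the bottleneck is to time both forms with timeit. This is a sketch under assumed test data (the repeated string and the array size are hypothetical), and the actual numbers will vary by machine and MATLAB release.

```matlab
% Hypothetical test data: 100,000 copies of a short string.
string_array = repmat({'example'}, 1e5, 1);

% cellfun with a handle to a named function.
t_named = timeit(@() cellfun(@double, string_array, 'UniformOutput', false));

% cellfun with an anonymous function wrapping the same call.
t_anon = timeit(@() cellfun(@(s) double(s), string_array, 'UniformOutput', false));

fprintf('named handle: %.4f s, anonymous function: %.4f s\n', t_named, t_anon);
```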
Greg 2016-2-2
Thanks. I'm not interested in the Unicode values per se. I just wanted a way to turn a string into a (hopefully) unique numeric value. But that's good to know about Unicode.
And thanks for mentioning the anonymous function. That's probably what's happening!
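If the goal is only one numeric identifier per distinct string, a possible alternative that skips character codes entirely is the third output of unique. This is a sketch with hypothetical sample data, not something proposed in the thread.

```matlab
% Hypothetical sample data with one repeated entry.
string_array = {'pear'; 'apple'; 'pear'; 'fig'};

% unique_strings holds the sorted distinct strings; id(k) is the position
% of string_array{k} within unique_strings, so equal strings share an id.
[unique_strings, ~, id] = unique(string_array);

disp(id');
```

Identical strings always receive the same id and different strings receive different ids, but the ids depend on the set of strings passed in, so they are not stable across different data sets.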
