Most efficient way to search in text arrays

Dear All,
I have a large array of textual information (model element IDs). In this array, I frequently have to find the index of particular IDs. What is the computationally most efficient way to do this? Store the IDs in a character array, cell array of characters or string array? What is the most efficient way for indexing when working with text?
Thanks for your help! Uwe

Accepted Answer

Alain Kuchta 2017-4-21
Edited: Alain Kuchta 2017-4-21
Here are two possible approaches to accomplish this:
1) Use strcmp and find with string array
This option is O(n) for each lookup; in the worst case, every string in ids is compared against query. You can also use a cell array of character vectors for ids; in my test, a string array was slightly faster.
>> ids = ["M1","M2", "M3"];
>> query = "M2";
>> index = find(strcmp(ids, query))
index =
2
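If several IDs need to be located at once, the linear search can be done in a single call instead of one strcmp per query. The following is a minimal sketch using ismember rather than the strcmp/find pair shown above; the queries array and the absent ID "M9" are made-up example data.

```matlab
% Sketch: look up several IDs in one call (example data, not from the question)
ids = ["M1", "M2", "M3"];
queries = ["M3", "M1", "M9"];      % "M9" is deliberately not in ids

% found(k) is true if queries(k) occurs in ids;
% loc(k) is its position in ids, or 0 if it does not occur
[found, loc] = ismember(queries, ids);
```

A zero in loc marks an ID that is missing, so it should be filtered out before loc is used as an index.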
2) Use containers.Map with char arrays as keys and indices as values
This option is O(n) to set up, but O(1) for each lookup, meaning that regardless of how many ids are in your map, looking up each one takes the same amount of time.
>> ids = {'M1', 'M2', 'M3'};
>> indices = 1:length(ids);
>> idMap = containers.Map(ids, indices);
>> query = 'M2';
>> index = idMap(query)
index =
2
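Indexing a containers.Map with a key it does not contain raises an error, so it can help to guard lookups when a query might be missing. The sketch below shows one way to do that with isKey, plus a batch lookup with values. The query 'M9' and the use of 0 as a "not found" marker are assumptions made for illustration.

```matlab
% Sketch: guarded and batched lookups on the map from the example above
ids = {'M1', 'M2', 'M3'};
idMap = containers.Map(ids, 1:numel(ids));

% Guarded single lookup: check for the key before indexing
query = 'M9';                      % example ID that is not in the map
if isKey(idMap, query)
    index = idMap(query);
else
    index = 0;                     % marker chosen here for "not found"
end

% Batch lookup: values returns a cell array ordered like the requested keys
batch = values(idMap, {'M3', 'M1'});
batchIdx = cell2mat(batch);        % numeric row vector of indices
```

Note that values also errors if any requested key is absent, so the key list should be checked with isKey first when that can happen.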
Here is a performance comparison. For each array size, the average time to compute 500 random queries was measured for each approach, using the same set of queries for both. In my case, for fewer than ~1000 elements, find with strcmp is faster. But as the number of elements grows, containers.Map is the clear winner.
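To repeat this kind of comparison on your own machine, something along the following lines could be used. It is a rough sketch, not the script behind the measurements above: the array sizes, the "M<number>" ID format, and the tic/toc timing loop are all assumptions, and absolute timings will vary with hardware and MATLAB release.

```matlab
% Rough benchmark sketch (sizes and ID format are assumptions)
sizes = [100 1000 10000 100000];
nQueries = 500;
tFind = zeros(size(sizes));           % mean seconds per lookup, strcmp + find
tMap  = zeros(size(sizes));           % mean seconds per lookup, containers.Map

for s = 1:numel(sizes)
    n = sizes(s);
    ids = "M" + (1:n);                % string array "M1" ... "Mn"
    idsCell = cellstr(ids);           % char-vector keys for containers.Map
    idMap = containers.Map(idsCell, 1:n);
    q = randi(n, 1, nQueries);        % same random queries for both approaches

    tic
    for k = 1:nQueries
        idx = find(strcmp(ids, ids(q(k))));
    end
    tFind(s) = toc / nQueries;

    tic
    for k = 1:nQueries
        idx = idMap(idsCell{q(k)});
    end
    tMap(s) = toc / nQueries;
end

% One row per array size
results = table(sizes(:), tFind(:), tMap(:), ...
    'VariableNames', {'n', 'find_strcmp_s', 'map_s'});
disp(results)
```

The map construction is kept outside the timed loops, so the table reflects lookup cost only and not the O(n) setup.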
1 Comment
Walter Roberson 2017-4-21
The O(1) reference sounds as if containers.Map is using hashing, which is a possibility, but not one I see documented.
True O(1) would tend to imply that it is using a perfect hash, as regular hashes that can have collisions would have an O(n), O(ln(n)), or O(n*ln(n)) term for the worst case as the table fills up.


More Answers (1)

Uwe Ehret 2017-4-22
Dear Alain,
Thanks for your very helpful reply! The containers.Map option was new to me, but offers many very elegant and time-saving improvements to the program I am working on.
Uwe
