Custom List / Arbitrary Sequence Sort

Version 1.0.8 (53.6 KB) by Stephen23

Sort a text array into the order of custom lists / arbitrary text sequences. Sort words using alphabets that are NOT in ASCII order!

27 downloads

Updated 2024/12/28

Summary

Sort the elements of a text array (string/cell of char/...) into the order of custom/arbitrary text sequences. Particularly useful for collation/sorting words of languages using the Latin, Greek, or Cyrillic alphabets. Inspired by MS Excel's "custom list" sorting feature, but extended with case-insensitive partial text matching using powerful regular expressions, for any number of sequences.

For example:

>> A = ["LargeBurger", "MediumCoffee", "SmallCoffee", "MediumBurger"];
>> sort(A) % for comparison
ans =  ["LargeBurger"  "MediumBurger"  "MediumCoffee"  "SmallCoffee"]
>> arbsort(A, ["small","medium","large"])
ans =  ["SmallCoffee"  "MediumBurger"  "MediumCoffee"  "LargeBurger"]
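
The ordering shown above can be roughly reproduced with built-in functions, which may help clarify what a custom-list sort does. This is only a minimal sketch, not ARBSORT's implementation: it assumes R2017a or later (string arrays and `contains`), case-insensitive partial matching, and that ties are broken by character-code order. The names `seq`, `rnk`, `idx0`, and `idx1` are illustrative.

```matlab
A   = ["LargeBurger", "MediumCoffee", "SmallCoffee", "MediumBurger"];
seq = ["small", "medium", "large"];      % custom list (illustrative name)

% Rank each element by the first list entry it contains, ignoring case.
rnk = repmat(numel(seq)+1, size(A));     % unmatched elements are ranked last
for k = numel(seq):-1:1                  % reverse loop, so earlier entries win
    rnk(contains(A, seq(k), 'IgnoreCase', true)) = k;
end

[~, idx0] = sort(A);                     % character-code order, used to break ties
[~, idx1] = sort(rnk(idx0));             % SORT is stable, so tied ranks keep that order
B = A(idx0(idx1))                        % same order as the ARBSORT result above
```

Sorting twice (first by text, then stably by rank) is one simple way to obtain a two-level ordering without building a combined key.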

And of course the sorting itself can also be controlled:

ascending/descending sort direction
character case sensitivity/insensitivity
diacritic sensitivity/insensitivity
literal/regular-expression matching
whole/partial text matching

Alphabetic Sorting

ARBSORT is particularly useful for sorting text of languages for which ASCII/Unicode character-code order does not provide the correct alphabetic sort (e.g. text with diacritics or ligatures). ARBSORT does not provide the countless language-specific collation rules, but it does sort text into the order specified by the provided alphabet:

>> Ae = {'yo', 'os', 'la', 'ño', 'va', 'ni', 'de', 'ña'};
>> alfabeto = num2cell(['A':'N','Ñ','O':'Z']); % Spanish alphabet
>> arbsort(Ae, alfabeto)
ans =  {'de', 'la', 'ni', 'ña', 'ño', 'os', 'va', 'yo'}
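
To illustrate the idea of sorting by position in an alphabet rather than by character code, the following sketch maps every character to its index in the alphabet and sorts the resulting numeric keys. It is an assumption-laden approximation, not ARBSORT's algorithm: it uses a lowercase alphabet with `lower`, ranks characters outside the alphabet as 0 (before "a"), and the names `abc`, `K`, and `pos` are illustrative.

```matlab
Ae  = {'yo', 'os', 'la', 'ño', 'va', 'ni', 'de', 'ña'};
abc = ['a':'n', 'ñ', 'o':'z'];           % Spanish alphabet, lowercase (illustrative name)

% Build one numeric key row per word: each character becomes its alphabet index.
n = max(cellfun(@numel, Ae));
K = zeros(numel(Ae), n);                 % zero padding sorts shorter words first
for k = 1:numel(Ae)
    [~, pos] = ismember(lower(Ae{k}), abc);
    K(k, 1:numel(pos)) = pos;
end

[~, idx] = sortrows(K);                  % lexicographic sort of the key rows
B = Ae(idx)                              % same order as the ARBSORT result above
```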

Replacement Substrings

The sorting rules of some languages require certain characters to be replaced with (or considered equivalent to) other characters. For example, in German the eszett character "ß" is sorted as if it were written "ss", and in some circumstances vowels with umlauts are sorted as that vowel without an umlaut, suffixed with "e":

>> Ag = ["Goethe", "Goldmann", "Gurke", "Göbel", "Göthe", "Götz"]; % character code order
>> B1 = arbsort(Ag, ["ß";"ss"])                                    % DIN 5007 Variante 1
B1 = [ "Göbel", "Goethe", "Goldmann", "Göthe", "Götz", "Gurke"]
>> B2 = arbsort(Ag, ["ä","ö","ü","ß"; "ae","oe","ue","ss"])        % DIN 5007 Variante 2
B2 = [ "Göbel", "Goethe", "Göthe", "Götz", "Goldmann", "Gurke"]
>> Bg = arbsort(Ag, ["ß";"ss"], num2cell(['aä','b':'o','ö','p':'u','ü','v':'z'])) % Österreichische Sortierung
Bg = [ "Goethe", "Goldmann", "Göbel", "Göthe", "Götz", "Gurke"]
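
The replacement idea behind the "Variante 2" call above can be sketched with built-in functions: build a sort key in which each special character is replaced by its substitute, sort the keys, and apply the resulting order to the original text. This is a hedged illustration only (assuming R2017a or later for string arrays and `replace`), not a description of how ARBSORT works internally. The names `old`, `rep`, `key`, and `Bs` are illustrative.

```matlab
Ag  = ["Goethe", "Goldmann", "Gurke", "Göbel", "Göthe", "Götz"];
old = ["ä", "ö", "ü", "ß"];              % characters to replace
rep = ["ae", "oe", "ue", "ss"];          % their replacements

key = replace(lower(Ag), old, rep);      % e.g. "göbel" becomes "goebel"
[~, idx] = sort(key);                    % stable sort: equal keys keep input order
Bs = Ag(idx)                             % same order as B2 above
```

Only the keys are modified; the returned text is the original, unmodified input.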

Examples

>> Ab = {'L', 'XS', 'S', 'M', 'XL', 'S', 'M', 'XL', 'XS', 'L'};
>> [Bb,Xb] = arbsort(Ab, {'XS','S','M','L','XL'})
Bb = {'XS', 'XS', 'S', 'S', 'M', 'M', 'L', 'L', 'XL', 'XL'}
Xb = [2,9,3,6,4,7,1,10,5,8]
>> Ac = ["medium_test", "high_train", "low_train", "high_test", "medium_train", "low_test"];
>> arbsort(Ac, ["train","test"], ["low","medium","high"])
ans = ["low_train", "low_test", "medium_train", "medium_test", "high_train", "high_test"]
>> Ad = ["test_three", "test_one", "test_ninetynine", "test_two"];
>> arbsort(Ad, @words2num) % download WORDS2NUM from FEX 52925. 
ans = ["test_one", "test_two", "test_three", "test_ninetynine"]
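
For the simplest case above (whole-text matches against a list, with the sort index as a second output), a comparable result can be obtained with `ismember` and a stable `sort`. This is a minimal sketch under the assumption of exact, case-sensitive matches. Elements not found in the list receive rank 0 and would therefore sort first, which is a choice made for this sketch only. The names `order` and `rnk` are illustrative.

```matlab
Ab    = {'L', 'XS', 'S', 'M', 'XL', 'S', 'M', 'XL', 'XS', 'L'};
order = {'XS', 'S', 'M', 'L', 'XL'};     % custom list (illustrative name)

[~, rnk] = ismember(Ab, order);          % position of each element in the list
[~, Xb]  = sort(rnk);                    % stable sort returns the sort index
Bb = Ab(Xb)                              % Xb equals [2,9,3,6,4,7,1,10,5,8]
```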

Cite As

Stephen23 (2025). Custom List / Arbitrary Sequence Sort (https://www.mathworks.com/matlabcentral/fileexchange/132263-custom-list-arbitrary-sequence-sort), MATLAB Central File Exchange. Retrieved February 22, 2025.

MATLAB Release Compatibility

Created with R2010b

Compatible with R2009b and later releases

Platform Compatibility

Windows macOS Linux

Acknowledgements

Inspired by: Natural-Order Row Sort, Natural-Order Filename Sort, Customizable Natural-Order Sort, Interactive Regular Expression Tool, Scientific Prefix to Number, Words to Number

Inspired: Customizable Natural-Order Sort, Natural-Order Row Sort, Natural-Order Filename Sort, Interactive Regular Expression Tool

Version	Published	Release Notes
1.0.8	2024/12/28	* Add test cases. * Documentation improvements.
1.0.7	2024/5/1	* Documentation improvements.
1.0.6	2024/4/21	* Update FEX description.
1.0.5	2024/4/21	* Update FEX description.
1.0.4	2024/4/21	* Faster regular expression handling. * Documentation improvements.
1.0.3	2023/8/17	* Alphabets are defined as being the last provided sequence (more efficient). * Documentation improvements.
1.0.2	2023/7/13	* Add references.
1.0.1	2023/7/13	* Correct FEX submission number.
1.0.0	2023/7/13	