Sort the elements of a text array (string array, cell array of character vectors, ...) into the order given by custom/arbitrary text sequences. Particularly useful for collating words of languages that use the Latin, Greek, or Cyrillic alphabets. Inspired by MS Excel's "custom list" sorting feature, but extended with case-insensitive partial text matching using regular expressions, for any number of sequences. For example:
>> A = ["LargeBurger", "MediumCoffee", "SmallCoffee", "MediumBurger"];
>> sort(A)
ans = ["LargeBurger" "MediumBurger" "MediumCoffee" "SmallCoffee"]
>> arbsort(A, ["small","medium","large"])
ans = ["SmallCoffee" "MediumBurger" "MediumCoffee" "LargeBurger"]
The sorting itself can also be controlled:
- ascending/descending sort direction
- character case sensitivity/insensitivity
- diacritic sensitivity/insensitivity
- literal/regular-expression matching
- whole/partial text matching
Alphabetic Sorting
ARBSORT is particularly useful for sorting text of languages for which ASCII/Unicode character-code order does not provide the correct alphabetic sort (e.g. text with diacritics or ligatures). ARBSORT does not provide the countless language-specific collation rules, but it does sort text into the order specified by the provided alphabet:
>> Ae = {'yo', 'os', 'la', 'ño', 'va', 'ni', 'de', 'ña'};
>> alfabeto = num2cell(['A':'N','Ñ','O':'Z']);
>> arbsort(Ae, alfabeto)
ans = {'de', 'la', 'ni', 'ña', 'ño', 'os', 'va', 'yo'}
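The idea of sorting by a supplied alphabet can be illustrated with built-in functions only. The following is a minimal sketch, not ARBSORT's implementation: it assumes lowercase words of equal length whose characters all occur in the alphabet, and the variable names are purely illustrative.

```matlab
% Illustrative sketch only (not ARBSORT itself): sort by a custom alphabet.
% Assumes every word has the same length and uses only letters in the alphabet.
words    = {'yo', 'os', 'la', 'ño', 'va', 'ni', 'de', 'ña'};
alphabet = ['a':'n', 'ñ', 'o':'z'];   % Spanish alphabet, lowercase
% Replace each character with its position in the alphabet.
[~, pos] = cellfun(@(w) ismember(w, alphabet), words, 'UniformOutput', false);
keys = vertcat(pos{:});               % one row of positions per word
% Sort the rows of positions and apply the same permutation to the words.
[~, order] = sortrows(keys);
sorted = words(order);
```

Because "ñ" is placed between "n" and "o" in the alphabet, the words starting with "ñ" land between "ni" and "os", which plain character-code sorting would not do.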
Replacement Substrings
The sorting rules of some languages require certain characters to be replaced with (or considered equivalent to) other characters. For example, in German the eszett character "ß" is sorted as if it were written "ss", and in some circumstances vowels with umlauts are sorted as that vowel without an umlaut, suffixed with "e":
>> Ag = ["Goethe", "Goldmann", "Gurke", "Göbel", "Göthe", "Götz"];
>> B1 = arbsort(Ag, ["ß";"ss"])
B1 = [ "Göbel", "Goethe", "Goldmann", "Göthe", "Götz", "Gurke"]
>> B2 = arbsort(Ag, ["ä","ö","ü","ß"; "ae","oe","ue","ss"])
B2 = [ "Göbel", "Goethe", "Göthe", "Götz", "Goldmann", "Gurke"]
>> Bg = arbsort(Ag, ["ß";"ss"], num2cell(['aä','b':'o','ö','p':'u','ü','v':'z']))
Bg = [ "Goethe", "Goldmann", "Göbel", "Göthe", "Götz", "Gurke"]
Index Output
>> Ab = {'L', 'XS', 'S', 'M', 'XL', 'S', 'M', 'XL', 'XS', 'L'};
>> [Bb,Xb] = arbsort(Ab, {'XS','S','M','L','XL'})
Bb = {'XS', 'XS', 'S', 'S', 'M', 'M', 'L', 'L', 'XL', 'XL'}
Xb = [2,9,3,6,4,7,1,10,5,8]
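The second output is a sort index, so indexing the input with it reproduces the sorted output. For exact, whole-text matches only, a comparable ordering can be sketched with an ordinal categorical array. This is an illustrative comparison using built-in functions, not a description of how ARBSORT works, and the variable names Xc and C are hypothetical.

```matlab
% Values copied from the example above.
Ab = {'L', 'XS', 'S', 'M', 'XL', 'S', 'M', 'XL', 'XS', 'L'};
Xb = [2,9,3,6,4,7,1,10,5,8];          % index output shown above
Bb = Ab(Xb);                          % indexing the input gives the sorted output
% Built-in comparison: an ordinal categorical sorts in its category order.
C = categorical(Ab, {'XS','S','M','L','XL'}, 'Ordinal', true);
[~, Xc] = sort(C);                    % sort index from the built-in sort
sameIndex = isequal(Xc, Xb);          % compare with the index shown above
```

Unlike this categorical approach, which needs each element to equal one of the listed values exactly, the examples in this description rely on partial and case-insensitive matching.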
Multiple Sequences
>> Ac = ["medium_test", "high_train", "low_train", "high_test", "medium_train", "low_test"];
>> arbsort(Ac, ["train","test"], ["low","medium","high"])
ans = ["low_train", "low_test", "medium_train", "medium_test", "high_train", "high_test"]
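One way to reproduce the output above with built-in functions is a series of stable sorts, one per sequence, so that the last sequence ends up as the primary sort key. The following is a sketch under that assumption, using case-insensitive partial matching with contains. It is not ARBSORT's actual algorithm, and elements that match nothing are simply placed first here.

```matlab
% Illustrative sketch only: successive stable sorts, one per sequence.
Ac   = ["medium_test", "high_train", "low_train", "high_test", "medium_train", "low_test"];
seqs = {["train","test"], ["low","medium","high"]};
order = 1:numel(Ac);                  % current permutation of the input
for k = 1:numel(seqs)
    key = zeros(1, numel(Ac));        % 0 = no match (sorted first in this sketch)
    for j = 1:numel(seqs{k})
        % Rank of each element = position of the matching text in the sequence.
        isMatch = contains(Ac(order), seqs{k}(j), 'IgnoreCase', true);
        key(isMatch) = j;
    end
    [~, idx] = sort(key);             % the built-in sort is stable
    order = order(idx);               % later sequences take precedence
end
sorted = Ac(order);
```

Because each sort is stable, the ordering from the first sequence ("train" before "test") survives as the tie-breaker within each group of the second sequence ("low", "medium", "high").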
Function Handle
>> Ad = ["test_three", "test_one", "test_ninetynine", "test_two"];
>> arbsort(Ad, @words2num)
ans = ["test_one", "test_two", "test_three", "test_ninetynine"]