Speed Up String Conversion

Hi all. I am trying to speed up the string conversion of a table field, as below:
GoingUC=string(table2cell(Inps(:,5)));
Inps is a table with approximately 730,000 records and 13 fields. I have 6 categorical fields to convert, and it is taking over 2.5 hours, so I wondered whether there is a quicker way to do this. I need a string array for the following code, which converts the categorical strings to numbers using a map (that part is quick):
[Unique_GoingU,~,GoingU_Numeric_Cats] = unique(GoingUC);
CTNM_GoingU=containers.Map(Unique_GoingU,num2cell(1:length(Unique_GoingU)));
NTD_GoingU=cell2mat(values(CTNM_GoingU,num2cell(GoingUC)));
It all works perfectly for my use, but it would be great if I could speed it up.
Steve Gray
  2 Comments
Voss 2024-5-1
The third output from unique is the same as the end result (or the transpose of the end result, if GoingUC is a row vector), so using a Map is unnecessary.
GoingUC = string(randi(10,10000,1))
GoingUC = 10000x1 string array
"9" "6" "2" "3" "9" "1" "10" "5" "4" "9" "4" "10" "10" "3" "10" "8" "7" "2" "9" "7" "2" "2" "3" "7" "8" "9" "7" "1" "1" "6"
[Unique_GoingU,~,GoingU_Numeric_Cats] = unique(GoingUC)
Unique_GoingU = 10x1 string array
"1" "10" "2" "3" "4" "5" "6" "7" "8" "9"
GoingU_Numeric_Cats = 10000x1
10 7 3 4 10 1 2 6 5 10
CTNM_GoingU=containers.Map(Unique_GoingU,num2cell(1:length(Unique_GoingU)));
NTD_GoingU=cell2mat(values(CTNM_GoingU,num2cell(GoingUC)))
NTD_GoingU = 10000x1
10 7 3 4 10 1 2 6 5 10
isequal(GoingU_Numeric_Cats,NTD_GoingU)
ans = logical
1
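As a small illustration of the comment above, here is a minimal sketch (with made-up example values, not the original data) showing that the outputs of unique also cover what a Map lookup is typically used for. Indexing the unique values with the codes restores the original strings, and ismember returns the code for new values. The names restored, newVals, found and code are hypothetical.

```matlab
% Made-up example values standing in for one converted table field.
GoingUC = ["Good"; "Soft"; "Good"; "Firm"; "Soft"];

% The third output holds the numeric code of every element.
[Unique_GoingU, ~, GoingU_Numeric_Cats] = unique(GoingUC);

% Reverse mapping: indexing the unique values with the codes
% reproduces the original string array.
restored = Unique_GoingU(GoingU_Numeric_Cats);

% Forward lookup for new values without a containers.Map:
% code is 0 where a value is not among the known categories.
newVals = ["Soft"; "Heavy"];
[found, code] = ismember(newVals, Unique_GoingU);
```

Whether this replaces the Map everywhere depends on how the Map is used later in the code, which the question does not show.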


Accepted Answer

Voss 2024-5-1
Avoid using table2cell for this; instead, access the table data directly (using curly braces {} or, even better, dot indexing):
% 100000x1 table of categoricals
Inps = table(categorical(randi(10,100000,1)))
Inps = 100000x1 table
Var1 ____ 7 10 8 5 7 3 10 8 4 2 6 6 2 10 10 9
% using table2cell
tic
str1 = string(table2cell(Inps(:,1)));
toc
Elapsed time is 1.676799 seconds.
% using curly brace indexing
tic
str2 = string(Inps{:,1});
toc
Elapsed time is 0.013733 seconds.
% using dot indexing
tic
str3 = string(Inps.(1));
toc
Elapsed time is 0.005515 seconds.
Accessing the table data directly is more than 100 times faster and produces the same result:
isequal(str1,str2,str3)
ans = logical
1
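Putting the two suggestions together for several fields, a minimal sketch might look like the following. It assumes a small stand-in table with three categorical variables and made-up category names; catVars, codes and names are hypothetical, and the real table would use the indices of its own six categorical fields.

```matlab
% Hypothetical stand-in for the real table: three categorical variables.
rng(0);                                      % reproducible example data
labels = ["Firm"; "Good"; "Soft"; "Heavy"];  % made-up category names
n = 1000;
Inps = table( ...
    categorical(labels(randi(4,n,1))), ...
    categorical(labels(randi(4,n,1))), ...
    categorical(labels(randi(4,n,1))));

catVars = [1 2 3];                           % assumed positions of the categorical fields
codes = zeros(height(Inps), numel(catVars)); % one column of numeric codes per field
names = cell(1, numel(catVars));             % unique strings per field

for k = 1:numel(catVars)
    s = string(Inps.(catVars(k)));           % dot indexing, no table2cell
    [names{k}, ~, codes(:,k)] = unique(s);   % third output gives the codes
end
```

Each column of codes then indexes into the matching entry of names, so names{k}(codes(:,k)) gives back the strings of field k.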
  3 Comments
Voss 2024-5-1
Edited: Voss 2024-5-1
You're welcome!
table2cell could be useful for collecting multiple variables of a table into a cell array, particularly if the variables contain different classes of data, although I would most likely just keep the data in table form.
T = table(rand(10,1),cellstr(char(65+randi([0,9],10,5))),string(rand(10,1)))
T = 10x3 table
     Var1        Var2          Var3
    _______    _________    __________
    0.23051    {'ADAJB'}    "0.15424"
    0.46691    {'FACCA'}    "0.49046"
    0.60176    {'BFJGB'}    "0.12775"
    0.97235    {'IBGBJ'}    "0.93042"
    0.26794    {'GCCAI'}    "0.42212"
    0.13361    {'GABEB'}    "0.094709"
    0.12238    {'EEFBH'}    "0.14285"
    0.24268    {'CDDDG'}    "0.42503"
    0.69713    {'IGHGF'}    "0.075316"
    0.59503    {'JFEBG'}    "0.36855"
% table to cell keeps the data classes as they are in the table
C = table2cell(T(:,[1 2 3]))
C = 10x3 cell array
    {[0.2305]}    {'ADAJB'}    {["0.15424" ]}
    {[0.4669]}    {'FACCA'}    {["0.49046" ]}
    {[0.6018]}    {'BFJGB'}    {["0.12775" ]}
    {[0.9724]}    {'IBGBJ'}    {["0.93042" ]}
    {[0.2679]}    {'GCCAI'}    {["0.42212" ]}
    {[0.1336]}    {'GABEB'}    {["0.094709"]}
    {[0.1224]}    {'EEFBH'}    {["0.14285" ]}
    {[0.2427]}    {'CDDDG'}    {["0.42503" ]}
    {[0.6971]}    {'IGHGF'}    {["0.075316"]}
    {[0.5950]}    {'JFEBG'}    {["0.36855" ]}
% but the concatenation required when accessing directly converts
% numeric and cell char to string, in order to combine the
% numeric and cell char table variables with the string variable
T{:,[1 2 3]}
ans = 10x3 string array
    "0.23051"    "ADAJB"    "0.15424"
    "0.46691"    "FACCA"    "0.49046"
    "0.60176"    "BFJGB"    "0.12775"
    "0.97235"    "IBGBJ"    "0.93042"
    "0.26794"    "GCCAI"    "0.42212"
    "0.13361"    "GABEB"    "0.094709"
    "0.12238"    "EEFBH"    "0.14285"
    "0.24268"    "CDDDG"    "0.42503"
    "0.69713"    "IGHGF"    "0.075316"
    "0.59503"    "JFEBG"    "0.36855"
C = [T.(1) T.(2) T.(3)]
C = 10x3 string array
    "0.23051"    "ADAJB"    "0.15424"
    "0.46691"    "FACCA"    "0.49046"
    "0.60176"    "BFJGB"    "0.12775"
    "0.97235"    "IBGBJ"    "0.93042"
    "0.26794"    "GCCAI"    "0.42212"
    "0.13361"    "GABEB"    "0.094709"
    "0.12238"    "EEFBH"    "0.14285"
    "0.24268"    "CDDDG"    "0.42503"
    "0.69713"    "IGHGF"    "0.075316"
    "0.59503"    "JFEBG"    "0.36855"

Release

R2024a
