Removing specific characters from string in nested cells
I have a series of strings contained within a nested cell array (because regexp loves to nest cells), and I would like to remove any non-numeric or whitespace characters from them, namely asterisks, so that I can convert them to doubles.
I'm looking for the least painful way of removing these special characters from all of the strings. I do not have a sample file to attach, sorry, but I have described the shape of a sample array below.
X is a 1x1 cell
X{1} is a 1x1 cell (because regexp can't help itself, apparently)
X{1}{1} = {'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '};
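A minimal sketch for the sample above, assuming the five strings form a 5x1 cell (semicolons throughout) and that only digits and the decimal point need to survive. regexprep works element-wise on a cell array of character vectors, so only the two outer 1x1 wrappers have to be unwrapped before calling str2double. The names `C` and `values` are illustrative.

```matlab
% Sample data rebuilt from the shape described above
X = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};

C = X{1}{1};                      % unwrap the two outer 1x1 cells -> 5x1 cell of char
C = regexprep(C, '[^\d.]', '');   % delete everything except digits and periods
values = str2double(C);           % 5x1 double
```

str2double also tolerates leading and trailing whitespace, so the pattern could keep spaces; removing them as well simply keeps the cleaned strings tidy.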
Stephen23
2018-6-15
@Bob Nbob: you are right, it does not appear in the M-file help. I notice that many other useful regular expression features also do not appear in the M-file help: notably missing are dynamic expressions, lookaround operators, and named capture.
Both the inbuilt help and the page I linked to give a very useful introduction, and explain all features of regular expressions in MATLAB:
doc regexp
doc('Regular Expressions')
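A short illustration of the lookaround and named-capture features mentioned above, run on a made-up line in the style of the data in this thread. The string and the names `starred`, `first`, `col` and `val` are illustrative only.

```matlab
str = 'COLUMN3= 3.45, 5.78*, 6.54*';

% Lookahead (?=...): numbers immediately followed by an asterisk
starred = regexp(str, '\d+\.\d+(?=\*)', 'match');

% Lookbehind (?<=...): the first number that follows 'COLUMN3= '
first = regexp(str, '(?<=COLUMN3=\s)\d+\.\d+', 'match', 'once');

% Named capture (?<name>...): 'names' returns a struct with one field per name
s = regexp(str, '(?<col>COLUMN\d)=\s(?<val>\d+\.\d+)', 'names');
```

Here `starred` is {'5.78','6.54'}, `first` is '3.45', and `s` has fields `col` ('COLUMN3') and `val` ('3.45').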
Accepted Answer
Paolo
2018-6-15
Edited: Paolo
2018-6-15
Perhaps this can be achieved in two steps. For your input file, CORR.txt:
1 ****TABLE1****
COLUMN1= 1.12, 2.23, 3.34, 4.45, 5.56, 6.67,
COLUMN2= 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
COLUMN3= 1.23, 0.34, 3.45, 5.78*, 6.54*, 8.23,
1 ****TABLE2****
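Step 1. Remove every asterisk, period or comma that directly follows a number:
data = fileread('CORR.txt');
% Delete '*', '.' or ',' when it is preceded by d.ddd (optionally followed by '*')
expression_sub = '(?<=\d\.\d*\*?)([\*\.,])';
data = regexprep(data,expression_sub,'');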
data no longer contains those characters. It is now:
' 1 ****TABLE1****
COLUMN1= 1.12 2.23 3.34 4.45 5.56 6.67
COLUMN2= 0.00 0.00 0.00 0.00 0.00 0.00
COLUMN3= 1.23 0.34 3.45 5.78 6.54 8.23
1 ****TABLE2****
'
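The same clean-up can also be written without a lookbehind, by capturing each number and writing it back with the `$1` token. This is an alternative sketch, not part of the accepted answer, shown on one line of the sample held in memory. It assumes every value has the form d.dd and is followed by a comma, with an optional asterisk in between.

```matlab
% One line of the CORR.txt sample, held in memory instead of read from disk
txt = 'COLUMN3= 1.23, 0.34, 3.45, 5.78*, 6.54*, 8.23,';

% Token 1 captures the number; the optional '*' and the comma are matched
% but not captured, so replacing the whole match with $1 drops them
cleaned = regexprep(txt, '(\d+\.\d+)\*?,', '$1');
```

`cleaned` is 'COLUMN3= 1.23 0.34 3.45 5.78 6.54 8.23'.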
Step 2. Match your data. The expression is greedy and will match as many "digits, period, digits" groups as it can, so you don't need to repmat your expression as you showed.
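% The lookbehind anchors on 'COLUMN1= ' or 'COLUMN3= '; the greedy group then
% takes every following "number + whitespace" pair
expression_match = '(?<=COLUMN[13]=\s)(\d+\.?\d*\s)+';
match = regexp(data,expression_match,'match');
Then convert each matched run of numbers to a numeric row vector:
% strtrim drops the trailing newline; strsplit splits on whitespace by default
column1 = str2double(strsplit(strtrim(match{1})));
column3 = str2double(strsplit(strtrim(match{2})));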
column1 =
1.1200 2.2300 3.3400 4.4500 5.5600 6.6700
column3 =
1.2300 0.3400 3.4500 5.7800 6.5400 8.2300
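If every COLUMNn line is needed rather than only columns 1 and 3, one variation (not part of the accepted answer) is to capture each label and its numbers with named tokens and convert them with sscanf, storing the results in a struct keyed by label. The input below is the cleaned text shown after Step 1, built in memory rather than read from CORR.txt; the names `s` and `cols` are illustrative.

```matlab
% Cleaned text as produced by Step 1 (copied from the output shown above)
data = sprintf(['COLUMN1= 1.12 2.23 3.34 4.45 5.56 6.67\n' ...
                'COLUMN2= 0.00 0.00 0.00 0.00 0.00 0.00\n' ...
                'COLUMN3= 1.23 0.34 3.45 5.78 6.54 8.23\n']);

% One struct element per line: 'name' holds the label, 'vals' the numbers
% (digits, periods and spaces up to the newline)
s = regexp(data, '(?<name>COLUMN\d+)=(?<vals>[\d. ]+)', 'names');

cols = struct();
for k = 1:numel(s)
    v = sscanf(s(k).vals, '%f');      % whitespace-separated values -> column vector
    cols.(s(k).name) = v.';           % store as a row vector under its label
end
```

Adding a COLUMN4 line to the input would then produce `cols.COLUMN4` without changing the pattern.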
More Answers (1)
George Abrahams
2022-12-30
The others are right to fix the root problem causing the tricky nested cell array. Having said that, for future reference, my deepreplace function on File Exchange / GitHub would have done exactly what you requested.
x = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};
% Remove any character except for digits (0-9) and period (.)
match = regexpPattern('[^\d.]');
x = deepreplace(x,match,'');
% x = 1×1 cell array
% {1×1 cell}
% x{1} = 1×1 cell array
% {5×1 cell}
% x{1}{1} = 5×1 cell array
% {'1234.'}
% {'12.' }
% {'1234.'}
% {'123.' }
% {'321.' }
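deepreplace is a File Exchange function and is not built into MATLAB. If the nesting depth is known in advance (three levels here, as in the question), a sketch using only built-in functions is to nest cellfun calls so that regexprep runs on the innermost cell while the outer 1x1 wrappers are kept. Unlike a recursive helper, this assumes the fixed depth shown.

```matlab
x = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};

% Outer cellfun visits x{1}, inner cellfun visits x{1}{1}; regexprep then
% processes that 5x1 cell of character vectors element by element
y = cellfun(@(a) cellfun(@(b) regexprep(b, '[^\d.]', ''), a, 'UniformOutput', false), ...
            x, 'UniformOutput', false);
```

`y` has the same 1x1 / 1x1 / 5x1 shape as `x`, and `y{1}{1}` is {'1234.';'12.';'1234.';'123.';'321.'}.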