A problem while splitting a text input with regexp
I have a text file with the following contents:
sammy yo yo
yoyo with you
Samyukta
I tried the following code to put each word into its own element of an array:
fid = fopen('test4.txt');
table = fscanf(fid,'%c');              % read the whole file as characters
fclose(fid);
table2 = regexp(table,'\n','split');   % split into lines at each newline
This means that table2{1} returns 'sammy yo yo'. I then split every line individually with strsplit, using ' ' (a space) as the delimiter, so table2{1}{2} returns 'yo'. However, the last word of every line has more characters than appear, i.e. size(table2{1}{2},2) is 3 rather than 2. But when I compare it using strcmp with '\n', ' ', or anything else, it returns 0, so now I don't know what to do.
Comments
Walter Roberson
2013-8-15
What do you get for
table2{1}{2}(end) + 0
I suspect you will find it is 13 (a carriage return).
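To see this diagnosis without the original file, here is a minimal sketch that rebuilds the same text in memory, assuming the file uses Windows-style CR+LF line endings, and repeats the split from the question. The variable names crlf, words1, lastWord, codes and hasCR are hypothetical, chosen for illustration.

```matlab
% Rebuild the file contents in memory, ASSUMING CR+LF (char 13, char 10) line endings
crlf  = char([13 10]);
table = ['sammy yo yo' crlf 'yoyo with you' crlf 'Samyukta' crlf];

% Split into lines at '\n' (char 10), as in the question; each line keeps its '\r'
table2 = regexp(table, '\n', 'split');

% Split the first line into words at spaces
words1 = strsplit(table2{1}, ' ');

lastWord = words1{end};              % looks like 'yo' but has a hidden trailing character
codes = double(lastWord)             % 121 111 13
hasCR = lastWord(end) == char(13)    % true: the hidden character is a carriage return
```

Converting to double (or adding 0, as above) is what exposes the non-printing character; strcmp against a single-quoted '\n' cannot help here, because in MATLAB '\n' in single quotes is the two characters backslash and n, not a newline.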
Accepted Answer
Cedric
2013-8-15
>> fprintf('%d,', table) ; fprintf('\n') ;
115,97,109,109,121,32,121,111,32,121,111,13,10,121,111,121,111,32,119,105,
116,104,32,121,111,117,13,10,83,97,109,121,117,107,116,97,13,10,
As you can see, each line ends with 13 (carriage return, '\r') followed by 10 (newline, '\n').
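If you prefer to keep the two-step approach from the question (lines first, then words), one option, sketched here rather than taken from the thread, is to let the line pattern absorb an optional '\r'. The in-memory buffer and the names crlf and textLines are assumptions for illustration; with a real file you would read it with fileread('test4.txt') instead.

```matlab
% In-memory stand-in for the file, ASSUMING CR+LF line endings
crlf  = char([13 10]);
table = ['sammy yo yo' crlf 'yoyo with you' crlf 'Samyukta' crlf];

% '\r?\n' matches either CR+LF or a bare LF, so no '\r' is left on any line
textLines = regexp(table, '\r?\n', 'split');
textLines = textLines(~cellfun('isempty', textLines));   % drop the empty cell after the final line break

% Split each line into words at spaces
table2 = cellfun(@(s) strsplit(s, ' '), textLines, 'UniformOutput', false);
lastWord = table2{1}{end}    % 'yo', now only 2 characters
```

Because the '\r' is optional in the pattern, the same code also works on files that use bare '\n' line endings.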
If you just want to split the text into words, why not split with REGEXP alone, using a pattern that matches whitespace? For example:
>> buffer = fileread('test4.txt') ;
>> words = regexp(buffer, '\s+', 'split')
words =
'sammy' 'yo' 'yo' 'yoyo' 'with' 'you' 'Samyukta' ''
With this, you would just have to delete the last cell when it is empty (which happens when your file ends with '\r\n'), and you would be done.
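A minimal sketch of the whole approach, including the clean-up step, is below. It assumes the same text as test4.txt with CR+LF endings, built in memory so the snippet runs on its own. The second variant, matching runs of non-whitespace with '\S+' and the 'match' option, is an alternative not mentioned in the thread; it never produces empty cells, so no clean-up is needed.

```matlab
% In-memory stand-in for test4.txt, ASSUMING CR+LF line endings
crlf   = char([13 10]);
buffer = ['sammy yo yo' crlf 'yoyo with you' crlf 'Samyukta' crlf];
% With the real file: buffer = fileread('test4.txt');

% Split at runs of whitespace; '\s' covers space, tab, '\r', '\n', etc.
words = regexp(buffer, '\s+', 'split');
words(cellfun('isempty', words)) = [];   % remove the empty cell(s) left by leading/trailing whitespace

% Alternative: match the words themselves instead of splitting at the gaps
words2 = regexp(buffer, '\S+', 'match');
```

Removing every empty cell, rather than only the last one, also covers a file that happens to start with whitespace.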
Comments
Walter Roberson
2013-8-15
Only if the file was created with an older MS Windows editor. More modern MS Windows editors put in only \n (newline), without \r (carriage return). Linux and OS X have never used \r. (Mac OS before OS X might have used \r.)
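Since the line-ending convention depends on where the file came from, one defensive option (a sketch, not something proposed in the thread) is to normalize every ending to '\n' before splitting. The three sample strings below are hypothetical inputs, one per convention mentioned in the comment above.

```matlab
lf   = char(10);
cr   = char(13);
crlf = [cr lf];

% Hypothetical inputs, one per line-ending convention
samples = { ['sammy yo yo' crlf 'Samyukta' crlf], ...   % CR+LF (older Windows editors)
            ['sammy yo yo' lf   'Samyukta' lf], ...     % LF only (Linux, OS X)
            ['sammy yo yo' cr   'Samyukta' cr] };       % CR only (Mac OS before OS X)

normalized = cell(size(samples));
for k = 1:numel(samples)
    % Replace CR+LF, or a lone CR, with a single LF
    normalized{k} = regexprep(samples{k}, '\r\n?', lf);
end
```

After this step all three inputs are identical, so the rest of the code only has to deal with '\n'.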