How to remove punctuation from an Arabic text file
Hello, I have an Arabic string and want to discard all punctuation, keeping only the text and the white space between words. For example, this is my string: str='سلام. دوست خوب من!'. Can I change the code below to do this?
str= fileread('D:/docc111.txt');
str1 = regexprep(str,'\s+',' '); % replace each run of white space (including line breaks) with one space
%or str1 = regexprep(str,'[\n\r]+',' ')
%str1 = 'Hello, I need 1 MATLAB code to discard all punctuation, and signs from 9 text files.'
Lstr1=length(str1);
str_space='\s'; % pattern for white-space characters
str_caps='[A-Z]';
str_ch='[a-z]';
str_nums='[0-9]';
ind_space=regexp(str1,str_space); % indices of the characters that match each pattern
ind_caps=regexp(str1,str_caps);
ind_chrs=regexp(str1,str_ch);
ind_nums=regexp(str1,str_nums);
mask=[ind_space ind_caps ind_chrs ind_nums]; % indices to keep: white space, A-Z, a-z, 0-9 only
num_str2=1:1:Lstr1;
num_str2(mask)=[];
str3=str1;
str3(num_str2)=[];
chars = str3;
% pad with one space before the first character and one after the last
charsWithWhitespace = [' ', chars, ' '];
newTest = sprintf(strrep(charsWithWhitespace, '\n', ' '));
fid = fopen('myySE1.txt','w','n','UTF-8'); % write as UTF-8 so non-ASCII text is preserved
fprintf(fid, '%s',charsWithWhitespace);
fclose(fid);
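One possible adaptation, offered as a minimal sketch rather than a definitive answer: the code above keeps only characters matching [A-Z], [a-z], [0-9] and white space, so every Arabic letter is deleted along with the punctuation. The sketch below instead uses isstrprop, which classifies characters by Unicode category, to keep letters and white space. The variable names keep and cleaned are illustrative, and it assumes the text has already been read correctly into a char array (the file's encoding must match what MATLAB expects). Note that combining marks such as Arabic diacritics are not in the 'alpha' category and would be removed as well.

```matlab
% Sketch: keep only letters and white space, using Unicode-aware classification.
str = 'سلام. دوست خوب من!';      % example string from the question

% isstrprop follows Unicode categories, so Arabic letters count as 'alpha'.
keep = isstrprop(str, 'alpha') | isstrprop(str, 'wspace');
cleaned = str(keep);              % drops punctuation, digits, and other signs

% Collapse any runs of white space and trim the ends.
cleaned = strtrim(regexprep(cleaned, '\s+', ' '));
disp(cleaned)
```

To save the result, the same fopen/fprintf/fclose pattern as above should work; opening the file with fopen(filename,'w','n','UTF-8') is one way to make sure the Arabic characters are written as UTF-8. If digits should be kept too, adding | isstrprop(str, 'digit') to the mask is one option.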