How to replace a particular string in a text file
arun
2013-10-18
I have an efficiency problem. The code below replaces one string with '' and another with ' .'. It works properly for small text files, but when the file has approximately 40,0000+ lines it takes far too long to be usable. Can anyone suggest something that runs faster than this? Thanks in advance.
fid = fopen('input.txt','r');
f=fread(fid,'*char')';
fclose(fid);
f = regexprep(f,' ','');
f = regexprep(f,' ',' .');
fid = fopen('output.txt','w');
fprintf(fid,'%s',f);
fclose(fid);
Accepted Answer
Azzi Abdelmalek
2013-10-18
Edited: Azzi Abdelmalek
2013-10-18
strrep is faster than regexprep:
f = strrep(f,' ','');
f = strrep(f,' ',' .');
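For purely literal replacements the two functions should give the same result, so the swap is safe. A minimal sketch, using made-up sample text and assuming the strings being replaced are the `<s> ` and ` </s>` tags discussed later in this thread:

```matlab
% Made-up sample; the tag strings are an assumption based on later comments.
f = sprintf('<s> did_VBD new_JJ </s>\n<s> see_VB the_DT </s>\n');

% Literal replacement with regexprep (the patterns contain no metacharacters).
a = regexprep(f, '<s> ', '');
a = regexprep(a, ' </s>', ' .');

% Same replacement with strrep, which does plain substring matching.
b = strrep(f, '<s> ', '');
b = strrep(b, ' </s>', ' .');

sameResult = isequal(a, b);   % both approaches agree on literal patterns
```

On large inputs, the two variants can be compared with tic/toc or timeit to check the speed difference on your own data.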
17 Comments
Azzi Abdelmalek
2013-10-18
strrep is much faster, but when it comes to complex parsing, regexprep is more powerful
arun
2013-10-18
Then I can't replace these lines with something else?
f = regexprep( f, '([^\n\r]+)', '<s> $1' );
f = regexprep(f,' \w*_|\,_',' ');
Cedric
2013-10-18
What is the purpose of the code? Note that if you wanted to wrap all lines in <s> and </s> tags, you could probably achieve that with
f = ['<s>', strrep(f, '\r\n', '</s>\r\n<s>'), '</s>'] ;
(I changed the order of the newline and carriage return; change it back if it is inverted for any reason in your files.)
arun
2013-10-18
The purpose of my code is this: I have a text file which contains text like
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT ._.
First of all, I want to wrap all sentences like
<s> did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN </s>
<s> did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN </s>
<s> did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT </s>
<s> did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT </s>
and I am using this code: 'f = regexprep(f,'\._.','</s>');' 'f = regexprep( f, '([^\n\r]+)', '<s> $1' );'
After that, I want to extract the POS tags:
VBD JJ IN VBN NN VB DT NN
VBD JJ IN VBN NN VB DT NN
VBD JJ IN VBN NN VBP NN DT
VBD JJ IN VBN NN VBP NN DT
For this I am using 'f = regexprep(f,' \w*_|\,_',' ');'
The code you suggested above,
f = ['<s>', strrep(f, '\r\n', '</s>\r\n<s>'), '</s>'] ;
gives the result as
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT ._.
Cedric
2013-10-18
Edited: Cedric
2013-10-18
Did you try with
f = ['<s>', strrep(f, '\n\r', '</s>\n\r<s>'), '</s>'] ;
as I suggested in my note in parentheses? If it works, you can then modify it so that it removes '._.' as well, assuming that there is a white space between the last . and the newline character:
f = ['<s>', strrep(f, '._. \n\r', '</s>\n\r<s>'), '</s>'] ;
arun
2013-10-18
Yes, I noticed that and also tried:
f = ['<s>', strrep(f, '._. \r\n', '</s>\r\n<s>'), '</s>'] ;
f = ['<s>', strrep(f, '._. \n\r', '</s>\r\n<s>'), '</s>'] ;
f = ['<s>', strrep(f, '._.\n\r', '</s>\r\n<s>'), '</s>'] ;
f = ['<s>', strrep(f, '._.\n\r', '</s>\r\n<s>'), '</s>'] ;
I have also tried many other variants, but none of them work: they do not replace ._. with </s> or add the starting <s>.
I think the whole file is being read at once and the newline character is not being recognized:
fid = fopen('input.txt','r');
f=fread(fid,'*char')';
fclose(fid);
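One likely reason the strrep attempts above have no effect: strrep does plain substring matching, so the pattern '\r\n' is the four literal characters backslash, r, backslash, n, not a carriage return followed by a line feed. regexprep, by contrast, interprets those escapes in its pattern. A small sketch on a made-up two-line string (the variable names are only illustrative):

```matlab
% Two lines separated by CR+LF (character codes 13 and 10), built with sprintf.
f = sprintf('a ._. \r\nb ._. \r\n');

% strrep treats '\r\n' as 4 literal characters, so nothing in f matches.
unchanged = strrep(f, '\r\n', '|');

% Passing the actual CR+LF characters makes the match succeed.
crlf = char([13, 10]);
replaced = strrep(f, crlf, '|');

noMatch  = isequal(unchanged, f);
didMatch = isequal(replaced, 'a ._. |b ._. |');
```

Building the pattern with char([13, 10]) or sprintf('\r\n') is therefore needed whenever strrep has to match a line break.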
Cedric
2013-10-18
I thought that you were already matching '\r\n' and that it was working but too slow. Was it not working? Could you attach one of these files to your question so I can try?
arun
2013-10-18
It is not slow; I am trying these on 4 sentences (lines), and I am using the code given above.
Cedric
2013-10-18
Edited: Cedric
2013-10-18
You wrote "the code is working properly", and later that you were using '\r' and '\n' in a regexp pattern. Was it just the first part which was working properly?
In any case, could you attach a file or a chunk of file to your question? It would be easier if I could experiment with your file, because then I can check directly what special chars you have in there and how to match them or use them in replacements. If you post a large enough file, I can also try to optimize. If you cannot attach the file to a public forum page, you can send it to me by email.
arun
2013-10-18
I have attached two files, 'input.txt' and 'code.txt'. These are copies of the files I am currently using to get the expected output.
Cedric
2013-10-18
Ok, try the following:
content = fileread( 'inputtextfile.txt' ) ;
newContent = strrep( content, '._. ', '' ) ;
newContent = strrep( newContent, char([13,10]), sprintf('</s>\r\n') ) ;
newContent = ['<s>', newContent, ''] ;
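The same idea can be tried without a file. The sketch below is one possible way to wrap every line in `<s> ... </s>` using only strrep; it assumes CR+LF line endings, a trailing ' ._. ' on each line, and a file that ends with a line break. The sample text and variable names are made up for illustration.

```matlab
crlf = char([13, 10]);                        % actual CR+LF characters
content = ['did_VBD new_JJ ._. ', crlf, ...
           'see_VB the_DT ._. ', crlf];       % made-up two-line sample

body = strrep(content, ' ._. ', '');          % drop the sentence-final token
% Drop the final line break so no empty tag pair is produced at the end.
if numel(body) >= 2 && isequal(body(end-1:end), crlf)
    body = body(1:end-2);
end
% Close the tag at each line break and reopen it on the next line.
body = strrep(body, crlf, [' </s>', crlf, '<s> ']);
wrapped = ['<s> ', body, ' </s>', crlf];
```

Stripping the last line break first avoids having to clean up an extra tag at the end of the file afterwards.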
arun
2013-10-19
Edited: arun
2013-10-19
Yes, it is working:
content = fileread( 'inputtextfile.txt' ) ;
newContent = strrep( content, '._. ', '' ) ;
newContent = strrep( newContent, char([13,10]), sprintf('</s>\r\n ') ) ;
newContent = ['<s> ', newContent,''] ;
newContent = strrep( newContent, ' ', '' ) ; % it will remove extra from the end of file
But I think 'strrep' can't be used instead of 'regexprep' for the last step that produces the output file:
newContent = regexprep(newContent,' \w*_|\,_',' ');
Cedric
2013-10-19
Edited: Cedric
2013-10-19
So you want to remove (or replace with a white space) all prefixes like 'new_', 'on_', etc, as well as precisely the string ',_' ? If so, you can simplify the process by using STRREP for removing all ',_', which allows you to reduce the OR statement in the regexp pattern and keep only the first part ' \w*_'.
If it works, then you can profile REGEXP with other patterns which could apply as well to your case and be more efficient than '\w*', e.g. '\S*'.
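As a rough illustration of this suggestion, the sketch below applies the original alternation pattern, the split strrep + regexprep version, and the '\S*' variant to one made-up line. The sample text and variable names are assumptions for demonstration only.

```matlab
txt = '<s> did_VBD new_JJ ,_, see_VB </s>';   % made-up sample line

% Original single pattern with an alternation.
viaRegexOnly = regexprep(txt, ' \w*_|\,_', ' ');

% Split version: the literal ',_' goes through strrep, the rest through regexprep.
step = strrep(txt, ',_', ' ');
viaSplit = regexprep(step, ' \w*_', ' ');

% Alternative pattern: \S* also covers punctuation tokens such as ',_'.
viaNonSpace = regexprep(txt, ' \S*_', ' ');

sameAsOriginal = isequal(viaRegexOnly, viaSplit);
```

On this sample the first two variants produce identical text, while the '\S*' pattern handles the comma token in the same pass and so needs no separate strrep step. Which one is fastest on a large file is best checked by timing them, for example with timeit.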
arun
2013-10-19
Yes, now I am using
f = regexp(f,'\S*_','split')
to get the following output:
VBD JJ IN VBN NN VB DT NN
VBD JJ IN VBN NN VB DT NN
These statements are much better.
Thanks for your efforts and for your valuable suggestions.
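Note that regexp with the 'split' option returns a cell array of the pieces between matches rather than a single string, so the pieces need to be joined before they can be written out with fprintf('%s', ...). A minimal sketch on a made-up line:

```matlab
txt = 'did_VBD new_JJ on_IN';             % made-up sample line

% Each match ('did_', 'new_', 'on_') is removed; the text in between is kept.
parts = regexp(txt, '\S*_', 'split');     % cell array of the remaining pieces

% Concatenate the pieces back into one character vector.
tags = [parts{:}];
```

Here tags ends up holding only the POS tags, separated by the spaces that were already in the input.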