How should I fix my regular expression to parse this txt file?

This is part of my code. It reads the attached text file and uses regular expressions to extract the file name between 'subsystems.tbl/' and '.sub' for a given 'sub_sys' (Major Role) and 'location' (Minor Role).
% Pick a pattern for the requested subsystem code
if ismember(sub_sys, {'spr', 'dpr', 'bum', 'reb'})
    block_pattern = ['\/([^\/]+)\.', sub_sys];
elseif strcmp(sub_sys, 'susp')   % strcmp compares whole strings; ismember on two char vectors tests each character separately
    block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : suspension','[\s\S]*?Minor Role : ', location, '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
elseif ismember(sub_sys, {'steering', 'wheel'})
    block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : ', sub_sys, '[\s\S]*?Minor Role : ', location, '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
elseif strcmp(sub_sys, 'tir')
    block_pattern = ['PROPERTY_FILE\s*=\s*''[^'']+\/([^\/]+)\.tir'''];
else
    error('Unknown subsystem code: %s', sub_sys);
end
% Extract the first captured file name (without its extension)
name_tokens = regexp(file_content, block_pattern, 'tokens', 'once', 'dotexceptnewline');
It works for the front suspension system (susp, spr, dpr, bum, reb, steering, wheel, tir) and returns the correct paths, but for the rear suspension system my code returns rr_susp_path = 'AA_TCAR_WHEEL_RR_22inch' instead of rr_susp_path = 'AA_TCAR_SUSP_RR_RWS_230607'.
It seems that my regular expression is way too broad and causing this problem. How should I fix my regular expression?
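A minimal sketch that reproduces the symptom, assuming a file layout inferred from the patterns in this thread (a '$---...SUBSYSTEM' header followed by '$ Major Role', '$ Minor Role' and 'USAGE' lines). The sample text, the '<car_lib>' alias and the front suspension file name are hypothetical, not taken from the attached file. In this sample the rear wheel block comes before the rear suspension block, and the 'susp' pattern from the question returns the wheel file:

```matlab
% Hypothetical sample: three SUBSYSTEM blocks, rear wheel before rear suspension.
% The layout is assumed from the patterns in this thread, not the real file.
sample_lines = { ...
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : suspension'
    '$ Minor Role : front'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_SUSP_FR_EXAMPLE.sub'''
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : wheel'
    '$ Minor Role : rear'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_WHEEL_RR_22inch.sub'''
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : suspension'
    '$ Minor Role : rear'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_SUSP_RR_RWS_230607.sub'''};
file_content = sprintf('%s\n', sample_lines{:});

location = 'rear';
% The 'susp' pattern from the question, unchanged
block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : suspension', ...
    '[\s\S]*?Minor Role : ', location, ...
    '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
name_tokens = regexp(file_content, block_pattern, 'tokens', 'once', 'dotexceptnewline');
disp(name_tokens{1})   % the rear WHEEL file, not the rear suspension file
```

Because '[\s\S]*?' can run past the end of the front suspension block, the lazy search for 'Minor Role : rear' stops at the first block containing that line, whatever its Major Role.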

Accepted Answer

Stephen23 2024-4-18
Edited: Stephen23 2024-4-18
"It seems that my regular expression is way too broad and causing this problem."
There are several locations where your regular expression matches unlimited amounts of (almost) anything:
  • [^'']+
  • [^>]+
  • [\s\S]*
I doubt that you really want unlimited matches like that.
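A small illustration of this point on a made-up two-line string: a negated class such as '[^>]+' and the class '[\s\S]' both match the newline character, while the 'dotexceptnewline' option only changes what '.' matches.

```matlab
% Made-up text: the '<' is on line 1, the '>' and the 'x' are on line 2
txt = sprintf('USAGE = <a\nb> x');

% A negated character class happily matches the newline...
m1 = regexp(txt, '<[^>]+>', 'match', 'once');

% ...and so does [\s\S], even with 'dotexceptnewline'
m2 = regexp(txt, 'USAGE[\s\S]*?x', 'match', 'once', 'dotexceptnewline');

% Only '.' respects 'dotexceptnewline', so this finds no match (empty result)
m3 = regexp(txt, 'USAGE.*?x', 'match', 'once', 'dotexceptnewline');
```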
"How should I fix my regular expression?"
Perhaps something like this:
pf1 = 'suspension';
pf2 = 'rear';
tmp = strcat('\$\s+',{'Major';'Minor'},'\s+Role\s+:\s+',{pf1;pf2},'\s+');
rgx = ['(?<=',tmp{:},'(\$.+\s+)*USAGE\s+=.+?)\w+\.sub']
% displays: rgx = '(?<=\$\s+Major\s+Role\s+:\s+suspension\s+\$\s+Minor\s+Role\s+:\s+rear\s+(\$.+\s+)*USAGE\s+=.+?)\w+\.sub'
str = fileread('test_example.txt');
out = regexp(str,rgx,'match','once','dotexceptnewline')
% displays: out = 'AA_TCAR_SUSP_RR_RWS_230607.sub'
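A different technique, not the one used in this answer: split the text at each SUBSYSTEM header first, then check the role lines and read the USAGE line inside one block at a time, so no pattern can reach into a neighbouring block. This sketch assumes the file layout suggested by the patterns in this thread; the sample text and the variable names 'major', 'minor' and 'blocks' are hypothetical.

```matlab
% Hypothetical sample (layout assumed, not the attached file)
sample_lines = { ...
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : suspension'
    '$ Minor Role : front'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_SUSP_FR_EXAMPLE.sub'''
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : wheel'
    '$ Minor Role : rear'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_WHEEL_RR_22inch.sub'''
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : suspension'
    '$ Minor Role : rear'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_SUSP_RR_RWS_230607.sub'''};
str = sprintf('%s\n', sample_lines{:});

major = 'suspension';
minor = 'rear';

% Cut the text at every '$---...SUBSYSTEM' header
blocks = regexp(str, '\$-+\s*SUBSYSTEM', 'split');

name = '';
for k = 1:numel(blocks)
    b = blocks{k};
    % Each role must appear on its own line, within this block only
    isMajor = ~isempty(regexp(b, ['^\$\s*Major\s+Role\s*:\s*', major, '\s*$'], 'once', 'lineanchors'));
    isMinor = ~isempty(regexp(b, ['^\$\s*Minor\s+Role\s*:\s*', minor, '\s*$'], 'once', 'lineanchors'));
    if isMajor && isMinor
        % File name between the last '/' and '.sub' on the USAGE line
        tok = regexp(b, 'USAGE\s*=.*/(\w+)\.sub', 'tokens', 'once', 'dotexceptnewline');
        if ~isempty(tok)
            name = tok{1};
            break
        end
    end
end
disp(name)
```

Splitting first trades one longer regular expression for a loop, but each remaining pattern only has to describe a single line.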
  1 Comment
Munho Noh 2024-4-19
Hello Stephen, your answers are always helpful; thank you as always.
I modified your answer slightly, as follows, to capture only the file name without the .sub extension:
block_pattern = ['(?<=\$\s+Major\s+Role\s+:\s+', sub_sys, '\s+\$\s+Minor\s+Role\s+:\s+', location, '\s+(\$.+\s+)*USAGE\s+=.+\/)(\w+)(?=\.sub)'];
Thank you for your good advice.
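The modified pattern above can be wrapped in a small helper so that one line of code serves every Major/Minor Role pair. In this sketch 'make_pattern' is a hypothetical name, the sample text is hypothetical, and 'regexptranslate' escapes the role names so they are matched literally. Because the lookarounds are zero-width, 'match' returns just the file name. Note that the pattern requires whitespace right after the Major Role text, so a short code such as 'susp' would have to be mapped to 'suspension' first.

```matlab
% Hypothetical sample (layout assumed, not the attached file)
sample_lines = { ...
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : suspension'
    '$ Minor Role : front'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_SUSP_FR_EXAMPLE.sub'''
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : wheel'
    '$ Minor Role : rear'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_WHEEL_RR_22inch.sub'''
    '$------------------------------------------SUBSYSTEM'
    '$ Major Role : suspension'
    '$ Minor Role : rear'
    'USAGE = ''<car_lib>/subsystems.tbl/AA_TCAR_SUSP_RR_RWS_230607.sub'''};
file_content = sprintf('%s\n', sample_lines{:});

% Hypothetical helper: builds the lookaround pattern from the comment above,
% escaping the role names so that regex metacharacters are taken literally
make_pattern = @(major, minor) ['(?<=\$\s+Major\s+Role\s+:\s+', ...
    regexptranslate('escape', major), '\s+\$\s+Minor\s+Role\s+:\s+', ...
    regexptranslate('escape', minor), '\s+(\$.+\s+)*USAGE\s+=.+\/)(\w+)(?=\.sub)'];

rr_susp_path  = regexp(file_content, make_pattern('suspension', 'rear'), 'match', 'once', 'dotexceptnewline');
rr_wheel_path = regexp(file_content, make_pattern('wheel', 'rear'), 'match', 'once', 'dotexceptnewline');
disp(rr_susp_path)
disp(rr_wheel_path)
```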


Category: Time Series Events

Release: R2019b
