Converting rough strings to exact strings
I have a string array of filenames which are named in a semi-consistent manner, e.g.:
AllFiles
AllFiles =
4x1 string array
"textIdontCareAbout_Phenolic32_Group5_textIdontCareAbout"
"textIdontCareAbout_P1_textIdontCareAbout"
"textIdontCareAbout_Epx2_G3_textIdontCareAbout"
"textIdontCareAbout_Epoxy_105_textIdontCareAbout"
I'm trying to figure out how to extract and convert the inconsistent substrings of interest (the stuff between "textIdontCareAbout") into a consistent format, e.g.:
AllFiles
AllFiles =
4x1 string array
"P32G5"
"P1"
"E2G3"
"E105"
I had been avoiding regexp, but having caved and decided to work with it, I'm trying to figure out an elegant way to do this conversion. At present, the only approach I can see working is manually checking for each phrasing style I find while searching through the data I currently have.
Is there a better way to go about this, or even just some suggestions on how to define the regexp so that it needs as few searches as possible?
4 Comments
the cyclist
2022-9-7
Using regular expressions is the only way I can think to attack this. And, at the risk of stating the obvious ... you can't write the regular expression code until you can precisely define the rule for the expression you are trying to identify. So, that's the first step.
It seems that the rule might be ...
- only the substring between the outermost underscores
- all the digits
- only the capitalized letters
I don't want to bother figuring out a regular expression that does that, though, if that does not seem like the fully general rule you need.
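For what it's worth, if the rule above does turn out to be the right one, it could be sketched with two regexprep calls. This is only a sketch under that assumption; the variable names middle and shortNames are made up for illustration.

```matlab
AllFiles = [
    "textIdontCareAbout_Phenolic32_Group5_textIdontCareAbout"
    "textIdontCareAbout_P1_textIdontCareAbout"
    "textIdontCareAbout_Epx2_G3_textIdontCareAbout"
    "textIdontCareAbout_Epoxy_105_textIdontCareAbout"];

% Step 1: keep only the text between the first and the last underscore.
% ^[^_]*_  consumes everything up to and including the first underscore,
% _[^_]*$  consumes the last underscore and everything after it,
% (.*)     captures what is left in between.
middle = regexprep(AllFiles, '^[^_]*_(.*)_[^_]*$', '$1');

% Step 2: delete every character that is not an uppercase letter or a digit.
shortNames = regexprep(middle, '[^A-Z0-9]', '')
% shortNames = ["P32G5"; "P1"; "E2G3"; "E105"]
```

The trimming has to happen first, because the surrounding text ("textIdontCareAbout") contains capital letters of its own that would otherwise survive step 2.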
Gabriel Stanley
2022-9-7
There may be a way to do this with string operations. Hard to tell w/o knowing the rule(s) to apply for what to keep and what to discard from a single string. For example, supposing that @the cyclist has the correct rules, one could do
AllFiles = [
"textIdontCareAbout_Phenolic32_Group5_textIdontCareAbout"
"textIdontCareAbout_P1_textIdontCareAbout"
"textIdontCareAbout_Epx2_G3_textIdontCareAbout"
"textIdontCareAbout_Epoxy_105_textIdontCareAbout"]
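% Drop everything up to and including the first underscore
AllFiles = extractAfter(AllFiles,"_")
% Drop the last underscore and everything after it (reverse, trim, reverse back)
AllFiles = reverse(extractAfter(reverse(AllFiles),"_"))
% Logical masks, one cell per filename, flagging uppercase letters and digits
upperchars = isstrprop(AllFiles,'upper')
digitchars = isstrprop(AllFiles,'digit')
% Keep only the characters flagged by either mask
AllFiles = arrayfun(@(a,b,c)(string(a{1}(find(b{:} | c{:})))),cellstr(AllFiles),upperchars,digitchars)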
IDK, maybe regexp will be better/easier (I've never been able to get my head wrapped around regular expressions and patterns).
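To make the arrayfun line easier to follow, here is roughly what the string-function approach does to a single filename, step by step. It is just an illustrative trace under the same assumed rule; the names s, c, keep and shortName are not from the original code.

```matlab
s = "textIdontCareAbout_Epx2_G3_textIdontCareAbout";

% Remove the leading text: everything up to and including the first "_"
s = extractAfter(s, "_");                      % "Epx2_G3_textIdontCareAbout"

% Remove the trailing text: reverse, cut at the (now first) "_", reverse back
s = reverse(extractAfter(reverse(s), "_"));    % "Epx2_G3"

% Work on a char vector so isstrprop returns a plain logical vector
c = char(s);
keep = isstrprop(c, 'upper') | isstrprop(c, 'digit');

% Logical indexing keeps only the flagged characters
shortName = string(c(keep))                    % "E2G3"
```

The arrayfun call applies this same masking to every element, using the cell arrays that isstrprop returns for a multi-element string array.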
Gabriel Stanley
2022-9-8