Tricky Regexp Problem, How to specify a slightly varying pattern
Hello all,
I am working with some files that have weird/dumb naming conventions, and I need to extract some info from each file's name for some other stuff I'm doing for a school database project.
The name of a typical file looks something like this:
G001E94A13g1_NM_MAE1317.xls
Where the combination of letters and numbers lets users know at a glance the student's name, major, and department, among other things.
I want to be able to extract only G001E94A13 from the file name as the g1 is typically associated with something else that isn't necessary to know in the project's scope.
I am hoping to apply this regexp to an entire catalog of these files where the naming convention varies slightly. For example, some students will have two last names and other info that changes this "at a glance" file name. For reference it could look something like:
JG002C96E15X_TX_EE1317.xls
And all I would want is JG002C96E15. The big changes are the extra last name, which gives the initials JG, and the big X after 15. The commonality is that the naming conventions don't change drastically, so I can always say that the portion of the file name I want ends with the last two digits of the year the student started the program (15 in this one and 13 in the last one).
I am hoping there is a way to tell regexp to grab the letters and numbers from the beginning of the file name up to the first _, without grabbing the _ itself or the trailing characters immediately to its left (g1 or X in the examples above), while keeping the two-digit year that comes just before those trailing characters.
Sounds convoluted but hopefully this explains what I'm trying to do. I'm thinking it might have to be two regexps or something.
What I have tried is:
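fileName = {'G001E94A13g1_NM_MAE1317.xls'};                 % file names as a cell array of char vectors
splitStr = regexp(fileName{1}, '_', 'split');               % split the name on underscores
a = regexp(splitStr{1}, '^[A-Za-z0-9]+', 'match', 'once')   % leading run of letters/digits in the first piece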
And I get:
a = G001E94A13g1
But I want:
a = G001E94A13
Thank you in advance and please let me know if more clarification is needed.
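For reference, the rule described above can be written as a single regexp. This is only a sketch: it assumes the unwanted suffix is always exactly one letter followed by optional digits (g1, X) and that the wanted part always ends in a digit. The variable names fileNames, pattern and ids are illustrative and not part of the original attempt.

```matlab
% The two file names given in the question, as a cell array of char vectors.
fileNames = {'G001E94A13g1_NM_MAE1317.xls', 'JG002C96E15X_TX_EE1317.xls'};

% Group 1 is lazy and must end in a digit. It is followed by one letter,
% optional digits, and the first underscore, none of which are captured.
% ASSUMPTION: the suffix is always "one letter + optional digits".
pattern = '^([A-Za-z0-9]+?\d)[A-Za-z]\d*_';

% With a cell input and 'once', each element of tokens is a 1x1 cell
% (or an empty cell if that name did not match the pattern).
tokens = regexp(fileNames, pattern, 'tokens', 'once');

% Unwrap the single token per file. This line errors on a non-matching
% name, so real use would need an isempty check first.
ids = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
disp(ids)
```

For the two names above, ids should hold G001E94A13 and JG002C96E15. A name with no suffix before the underscore would not match this pattern at all.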
3 Comments
Can you simplify the issue by providing more examples like the ones you had at the very end? What are the EXACT patterns you are looking for, exceptions, etc.? It's easier for us to see a bunch of examples (instead of an essay) so that we can check the results, like:
G001E94A13g1_sdfsdf.xls ==> G001E94A13
JG002C96E15X_TX_EE1317.xls ==> JG002C96E15
What are the patterns that are okay? Start by filling in the brackets. EXAMPLE
"[1-2 letters][numbers][letters][numbers][IGNORE 1-2 char]_[IGNORE THE REST AFTER UNDERSCORE].xls"
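A filled-in template for the two file names in this comment might translate to a regexp like the sketch below. It assumes 1-2 leading uppercase letters, then digits, one uppercase letter, digits, one uppercase letter, digits, followed by a 1-2 character suffix that is always present. The poster has not confirmed any of this, and the names pattern and extracted are made up for the example.

```matlab
% Example names from this comment, as a cell array of char vectors.
fileNames = {'G001E94A13g1_sdfsdf.xls', 'JG002C96E15X_TX_EE1317.xls'};

% [1-2 letters][digits][letter][digits][letter][digits] is captured;
% [1-2 ignored chars]_[anything].xls is matched but not captured.
% ASSUMPTION: the ignored suffix is never empty.
pattern = '^([A-Z]{1,2}\d+[A-Z]\d+[A-Z]\d+)[A-Za-z0-9]{1,2}_.*\.xls$';

extracted = cell(size(fileNames));
for k = 1:numel(fileNames)
    tok = regexp(fileNames{k}, pattern, 'tokens', 'once');
    if isempty(tok)
        extracted{k} = '';                       % name does not fit the template
        fprintf('%s ==> (no match)\n', fileNames{k});
    else
        extracted{k} = tok{1};                   % the captured prefix
        fprintf('%s ==> %s\n', fileNames{k}, tok{1});
    end
end
```

Because the whole name is anchored with ^ and $, anything that does not fit the template is reported as a non-match instead of returning a partial result, which makes exceptions in the catalog easy to spot.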
Walter Roberson
2018-7-9
Is it 1 or more uppercase alphabetic letters, followed by uppercase hex, ending at the first non-hex character, with you wanting to stop just before that non-hex character?
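If that reading is right, a sketch of the corresponding regexp could look like this. It assumes "hex" means the characters 0-9 and A-F only, and that the character to be dropped is never an uppercase A-F (otherwise it would be swallowed into the match). The names hexPattern and hexIds are illustrative.

```matlab
% The two file names given in the question.
fileNames = {'G001E94A13g1_NM_MAE1317.xls', 'JG002C96E15X_TX_EE1317.xls'};

% One or more uppercase letters, then a run of uppercase hex characters.
% The match ends just before the first character that is not 0-9 or A-F.
hexPattern = '^[A-Z]+[0-9A-F]+';

% With a cell input, 'match' plus 'once' returns one char vector per file.
hexIds = regexp(fileNames, hexPattern, 'match', 'once');
disp(hexIds)
```

On these two names the match stops at the lowercase g and at the X, respectively, since neither is an uppercase hex character.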
L
2018-7-9