- How can we identify the start of the substring: does it always consist of exactly the same letters (e.g. MOD), or is it always preceded by some recognizable pattern of characters?
- How can we identify the end of the substring: is it always exactly the same file extension that you need to locate?
extract part of a string with an extension
Hi, I have a long string and I want to extract just the names that have "hdf" as an extension:
I want just to get "MOD11C1.A2013001.005.2013015221704.hdf"
My string is:
U.S. GOVERNMENT COMPUTER
This US Government computer is for authorized users only. By accessing this
system you are consenting to complete monitoring with no expectation of privacy.
Unauthorized access or use may subject you to disciplinary action and criminal
prosecution.
********************************************************************************
</pre>
<pre><img src="/icons/blank.gif" alt="Icon "> Name Last modified Size Description<hr><img src="/icons/back.gif" alt="[DIR]"> Parent Directory -
<img src="/icons/image2.gif" alt="[IMG]"> BROWSE.MOD11C1.A2013001.005.2013015221704.1.jpg 15-Jan-2013 16:29 3.2M
<img src="/icons/image2.gif" alt="[IMG]"> BROWSE.MOD11C1.A2013001.005.2013015221704.2.jpg 15-Jan-2013 16:29 3.3M
<img src="/icons/unknown.gif" alt="[ ]"> MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M
<img src="/icons/unknown.gif" alt="[ ]"> MOD11C1.A2013001.005.2013015221704.hdf.xml 16-Jan-2013 02:15 32K
<hr></pre>
</body></html>
Thanks,
Zeinab
Accepted Answer
per isakson
2014-12-3
Edited: per isakson
2014-12-4
Here is a solution(?) based on regexp
>> cac = cssm;
>> cac{:}
ans =
MOD11C1.A2013001.005.2013015221704.hdf
ans =
MOD11C1.A2013001.005.2013015221704.hdf
>>
where
function cac = cssm()
str = fileread( 'cssm.txt' );
name_xpr = '[\w\.]+\.hdf';
cac = regexp( str, name_xpr, 'match' );
end
and cssm.txt contains the text of your question. Two identical names seem to be correct. You might want to apply unique.
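A minimal sketch of that last suggestion, assuming the cell array holds the two matches shown in the output above. The variable name names is only illustrative, and the 'stable' option is a choice made here to keep the order of first appearance rather than the default sorted order.

```matlab
% The two matches returned by regexp, copied from the output shown above.
cac = {'MOD11C1.A2013001.005.2013015221704.hdf', ...
       'MOD11C1.A2013001.005.2013015221704.hdf'};

% unique removes repeated names; 'stable' keeps the order of first
% appearance instead of sorting the result.
names = unique(cac, 'stable');

disp(names)
```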
 
In response to comments:
My mistake illustrates a problem with regular expressions: they often match unexpected strings. I missed the case where ".hdf" is part of the base name rather than the extension, as in ".hdf.xml". I have now added the requirement that ".hdf" be followed by white space, using the lookahead (?=\s). From the documentation: "\s, Any white-space character; equivalent to [ \f\n\r\t\v]". Because the test is a lookahead, that white-space character is not included in the output.
>> cssm
ans =
'MOD11C1.A2013001.005.2013015221704.hdf'
function cac = cssm()
str = fileread( 'cssm.txt' );
name_xpr = '[\w\.]+\.hdf(?=\s)'; % <<<<<<< modified
cac = regexp( str, name_xpr, 'match' );
end
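To see the effect of the lookahead in isolation, here is a small sketch that runs both expressions on a two-line sample. The sample string is typed in by hand to stand in for cssm.txt, and the variable names loose and strict are only illustrative.

```matlab
% Hand-written stand-in for the two relevant lines of the listing.
str = sprintf('%s\n%s\n', ...
    'MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M', ...
    'MOD11C1.A2013001.005.2013015221704.hdf.xml 16-Jan-2013 02:15 32K');

% Without the lookahead, the ".hdf" inside ".hdf.xml" also yields a match.
loose = regexp(str, '[\w\.]+\.hdf', 'match');

% With (?=\s), ".hdf" must be followed by white space. The lookahead
% checks the next character without making it part of the match.
strict = regexp(str, '[\w\.]+\.hdf(?=\s)', 'match');

fprintf('without lookahead: %d match(es)\n', numel(loose));
fprintf('with lookahead:    %d match(es)\n', numel(strict));
```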
 
Stephen Cobeldick already proposed this modification to the expression. I like Stephen's list, which helps to pinpoint the unique characteristics of the string. It triggers thinking. Does the filename always start with "MOD"? Could "MOD" appear in the middle of the name? It's risky to deduce rules from small samples. If the name must always start with "MOD", then
name_xpr = '(?<=\s)MOD[\w\.]+\.hdf(?=\s)';
is a better expression.
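A short sketch of what the lookbehind changes, assuming a made-up sample in which "MOD" also appears in the middle of a name. The BROWSE...hdf entry is hypothetical and does not occur in the listing from the question.

```matlab
% Hypothetical sample: the first name contains "MOD" in the middle,
% the second name starts with "MOD".
str = ' BROWSE.MOD11C1.A2013001.005.hdf 46M MOD11C1.A2013001.005.hdf 46M ';

% Without the lookbehind, a match may start at a "MOD" inside a longer name.
anywhere = regexp(str, 'MOD[\w\.]+\.hdf(?=\s)', 'match');

% With (?<=\s), "MOD" must be preceded by white space, so only names
% that begin with "MOD" are returned.
atStart = regexp(str, '(?<=\s)MOD[\w\.]+\.hdf(?=\s)', 'match');

fprintf('without lookbehind: %d match(es)\n', numel(anywhere));
fprintf('with lookbehind:    %d match(es)\n', numel(atStart));
```

One thing to keep in mind: because the lookbehind requires a preceding white-space character, a name at the very first position of the string would not be matched by this expression.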
More Answers (1)
Stephen23
2014-12-3
Edited: Stephen23
2014-12-3
Why not all on one line?
str = fileread('temp.txt');
C = regexp(str,'MOD[\w\.]+\.hdf(?=\s)','match');
C =
'MOD11C1.A2013001.005.2013015221704.hdf'
This matches all substrings that meet the following conditions:
- starts with 'MOD'
- ends with '.hdf'
- contains any combination of alphanumeric characters, underscores, and periods
- is followed by a whitespace character (i.e. excludes '....hdf.xml')
As suggested by per isakson, you might also want to apply unique to the output.
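The conditions listed above can be checked one candidate at a time. This is only a sketch: each candidate has a trailing space appended so that the lookahead has something to test, the MYD... name is a hypothetical example that is not in the original listing, and the variable names are illustrative.

```matlab
pattern = 'MOD[\w\.]+\.hdf(?=\s)';

candidates = { ...
    'MOD11C1.A2013001.005.2013015221704.hdf ', ...          % meets every condition
    'MOD11C1.A2013001.005.2013015221704.hdf.xml ', ...      % ".hdf" not followed by white space
    'BROWSE.MOD11C1.A2013001.005.2013015221704.1.jpg ', ... % contains "MOD" but no ".hdf"
    'MYD11C1.A2013001.005.2013015221704.hdf '};             % hypothetical: no "MOD"

isHit = false(size(candidates));
for k = 1:numel(candidates)
    % With 'once', regexp returns the start index of the first match,
    % or an empty array when there is no match.
    isHit(k) = ~isempty(regexp(candidates{k}, pattern, 'once'));
end

disp(isHit)
```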