I am trying to modify file paths with consecutive repeated folder names, e.g, "archive" is repeated in "Clients/archive/archive/20220428.1349.zip". The modification I seek is to truncate that path beyond the 2nd occurance of a repeated folder, leaving the trailing file path separator, e.g., "Clients/archive/archive/". I thought this would do it:
FolderInSelf = regexprep( FolderInSelf, ...
"^(.*/(\w+)/\2/).*", "$1" );
"FolderInSelf" is vertical vector of strings, each representing a file path that contains a consecutively repeated folder name.
The outer set of brackets captures the 1st token, which is for the path upto the repeated folder, excluding anything after the slash.
The inner set of brackets is the 2nd token, which is the for the first occurrence of the repeated folder name ("archive" in the example above).
The back reference "\2" describes the fact that the token is repeated, and separated by a slash.
I am puzzled by why the above "regexprep" does nothing to the strings in FolderInSelf. To troubleshoot, I chose a simpler command that worked as expected
>> regexprep( "Clients/archive/archive/20220428.1349.zip", ...
"^(.*/(archive)/archive/).*", "$1" )
ans = "Clients/archive/archive/"
If I replace "$1" with "$2", I expect to get "archive" (the 2nd token). Instead, I get:
This suggest that the 2nd token is not being captured. Can anyone point out what I am doing wrong?