How to extract variables from Character array

Question

gvr 2015-6-19

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/224530-how-to-extract-variables-from-character-array

Commented: Stephen23 2015-7-8

Accepted Answer: Stephen23

labelStrings.mat


Is it possible to get the variable names from only a particular range of a character array?

For example, my char array looks like this:

en:
variable_en1 = expression; variable_en2 += expression;
variable_en3 := expression;
variable_en4++;
variable_en5--;
du:
variable_du1= expression;
variable_du2 := expression
ex:
variable_ex1=0;variable_ex2=1;
variable_ex3 = 2;

I would like to extract only variable_en1 to variable_en5 into one array and variable_ex1 to variable_ex3 into another array.

I am attaching the character array as a .mat file.

Could you please help me?

4 Comments

Stephen23 2015-6-19

Edited: Stephen23 2015-6-19

What are "variables"? The question is not very clear.

You uploaded a .mat file containing one string. There are easy ways to extract parts of a string (particularly indexing or regular expressions), but you have not explained what part of the strings you are interested in. Please give exact examples of the desired output, and an explanation of how these should be identified (e.g. preceding or trailing characters, newline locations, character patterns, etc).

gvr 2015-6-19

Edited: gvr 2015-6-19

That one string comes from a Stateflow state. I want to find the variables under en:. When you load the .mat file, it gives one string. In that string I need to find the left-side variables.


Answer 1

Stephen23 2015-6-19

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/224530-how-to-extract-variables-from-character-array#answer_183288

Edited: Stephen23 2015-6-19

labelStrings.mat

Thank you for editing your question and making it clearer.

You can use regexp to locate these substrings. Here are several different versions using regexpi, which I tested on your sample .mat file:

>> regexpi(transDestiLabel,'^[a-z]+(?=\s\S?=)','match','lineanchors')
ans = 
    'a'    'b'    'c'    'd'    'e'    'f'    'g'    'h'
>> regexpi(transDestiLabel,'^[a-z]+(?=+|-)','match','lineanchors')
ans = 
    'i'    'j'
>> regexpi(transDestiLabel,'^[a-z]+(?=\s\S?=|+|-)','match','lineanchors')
ans = 
    'a'    'b'    'c'    'd'    'e'    'f'    'g'    'h'    'i'    'j'
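
For readers without the .mat file, the same lookahead idea can be tried on the text from the question. This is only a sketch, not one of the tested versions above: the text is retyped from the question, and the names txt, tok, pat and vars, as well as the two-step approach of splitting at the headers first, are assumptions made for illustration.

```matlab
% Text retyped from the question ("expression" is a placeholder there too).
lines = { ...
    'en:', ...
    'variable_en1 = expression; variable_en2 += expression;', ...
    'variable_en3 := expression;', ...
    'variable_en4++;', ...
    'variable_en5--;', ...
    'du:', ...
    'variable_du1= expression;', ...
    'variable_du2 := expression', ...
    'ex:', ...
    'variable_ex1=0;variable_ex2=1;', ...
    'variable_ex3 = 2;'};
txt = strjoin(lines, char(10));   % char(10) is the newline character

% Step 1: split into sections. Token 1 is the header name, token 2 is the
% section body up to the next "name:" line or the end of the text.
tok = regexp(txt, '(\w+):\n(.*?)(?=\n\w+:\n|$)', 'tokens');

% Step 2: inside each body, keep every word that is followed by
% ++, --, or an (optionally prefixed) equals sign: =, +=, -=, :=
pat = '\w+(?=\s*(\+\+|--|[+\-:]?=))';
vars = struct();
for k = 1:numel(tok)
    header = tok{k}{1};
    body   = tok{k}{2};
    vars.(header) = regexp(body, pat, 'match');
end

disp(vars.en)   % the five variable_en names
disp(vars.ex)   % the three variable_ex names
```

Splitting first keeps the second pattern short, at the cost of assuming that every header sits alone on its own line, which is true for the question's text but not for messier files.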

The sample file:

12 Comments

Stephen23 2015-6-26

Edited: Stephen23 2015-6-26

Such sloppy file formatting makes it hard to parse. If the file was neater then this would be a much simpler task. Some improvements that would make parsing simpler:

ensuring that the group headings are on separate lines to the variables
one variable per line
consistent whitespace: some group headers have whitespace, some have none, some have whitespace in front of the colon...
no leading spaces
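
To illustrate the point of the list above, here is a rough sketch of how little is needed once the text follows those rules. The tidy sample text and the names tidy, rows, isHeader and out are made up for this example; they are not taken from the uploaded file.

```matlab
% Hypothetical tidy input: headers on their own lines, one variable per
% line, no leading spaces, no whitespace around the header colon.
tidy = strjoin({ ...
    'en:', ...
    'variable_en1 = 1;', ...
    'variable_en2 += 2;', ...
    'du:', ...
    'variable_du1 = 1;', ...
    'ex:', ...
    'variable_ex1 = 1;'}, char(10));

rows = strsplit(tidy, char(10));   % one header or statement per cell

% A header is a row that is nothing but a word followed by a colon.
isHeader = ~cellfun('isempty', regexp(rows, '^\w+:$', 'match', 'once'));

out = struct();
group = '';
for k = 1:numel(rows)
    if isHeader(k)
        group = rows{k}(1:end-1);     % drop the trailing colon
        out.(group) = {};
    else
        % The variable is simply the first word on the row.
        out.(group){end+1} = regexp(rows{k}, '^\w+', 'match', 'once');
    end
end

disp(out)
```

No lookbehind or lookahead is needed here, because the position of each name is fixed by the layout.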

The more untidy the file format is the harder it is to parse. And this one is a mess:

green
en: green_led=1;
variable_en1 = 1; 
variable_en2 += 2;
du:
variable_du1= 1;
 varibale_du2 := 2;
ex: 
 variable_ex1=1;variable_ex2=1;
variable_ex3 = 2;
green_led=0;
   entry: 
  variable_entry1 = 1; 
   variable_entry2 += 2;
during:
variable_during1= 1;
varibale_during2 := 2;
exit:
variable_exit1=1;variable_exit2=1;
variable_exit3 = 2;
en, du:variable_endu1=1;
variable_endu2=2;
   en,ex :
  variable_enex1=1;
  variable_enex2=2;
du,ex: 
variable_duex1=1;
variable_duex2=2;
en,du, ex: 
variable_enduex1=1;
variable_enduex2=2;
entry, during:variable_entryduring1=1;
variable_entryduring2=2;
entry, exit :
variable_entryexit1=1;
variable_entryexit2=2;
during, exit: 
variable_duringexit1=1;
variable_duringexit2=2;
entry, during, exit: 
variable_entryduringexit1=1;
variable_entryduringexit2=2;

In any case, have a play with this, it might do what you want:

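load('stateFlowCode.mat')
% C: names followed by an assignment operator, S: their start indices
[C,S] = regexp(allTypesOfActions,'(?<=\s|;|\:)\w+(?=(\s\S?)?(+|-|\:)?=)','match','start');
% D: group headers (text before a colon that is not part of :=), T: their start indices
[D,T] = regexp(allTypesOfActions,'(?<=\n\s*)[\w_, ]+(?=\s?\:[^=])','match','start');
% turn headers such as 'en, du' into valid field names such as 'en_du'
D = regexprep(strtrim(D),',\s?','_');
% second row: for each header, the names that start before the next header
D(2,:) = arrayfun(@(b,e){C(b<S&S<e)},T,[T(2:end),Inf],'UniformOutput',false);
% D{:} expands to name/value pairs, giving one field per group
X = struct(D{:});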

Where X is

                     en: {'green_led'  'variable_en1'  'variable_en2'}
                     du: {'variable_du1'  'varibale_du2'}
                     ex: {'variable_ex1'  'variable_ex2'  'variable_ex3'  'green_led'}
                  entry: {'variable_entry1'  'variable_entry2'}
                 during: {'variable_during1'  'varibale_during2'}
                   exit: {'variable_exit1'  'variable_exit2'  'variable_exit3'}
                  en_du: {'variable_endu1'  'variable_endu2'}
                  en_ex: {'variable_enex1'  'variable_enex2'}
                  du_ex: {'variable_duex1'  'variable_duex2'}
               en_du_ex: {'variable_enduex1'  'variable_enduex2'}
           entry_during: {'variable_entryduring1'  'variable_entryduring2'}
             entry_exit: {'variable_entryexit1'  'variable_entryexit2'}
            during_exit: {'variable_duringexit1'  'variable_duringexit2'}
      entry_during_exit: {'variable_entryduringexit1'  'variable_entryduringexit2'}
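
Once the names are grouped like this, getting the en and ex variables as separate arrays (as asked in the question) is plain field access. A small sketch, with X retyped by hand from three of the fields above rather than produced by the regexp code; the names enVars, exVars, allVars, shared and counts are illustrative only:

```matlab
% X retyped from the first three fields of the output above.
% The double braces make each value a cell array inside a scalar struct.
X = struct( ...
    'en', {{'green_led','variable_en1','variable_en2'}}, ...
    'du', {{'variable_du1','varibale_du2'}}, ...
    'ex', {{'variable_ex1','variable_ex2','variable_ex3','green_led'}});

enVars = X.en;                      % names assigned in the en group
exVars = X.ex;                      % names assigned in the ex group

allVars = unique([enVars, exVars]); % sorted, duplicates removed
shared  = intersect(enVars, exVars);% names that occur in both groups
counts  = structfun(@numel, X);     % number of names per group

disp(fieldnames(X))
disp(shared)
```

Note that green_led appears in both en and ex, so it is listed once in allVars.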

gvr 2015-7-8

Edited: Guillaume 2015-7-8

labels.mat

Yes, this is what I expected. Thank you.

On the other hand, I am trying to understand the regexp you used to filter the text, but I could not fully follow it.

[C,S] = regexp(label,'(?<=\s|;|\:)\w+(?=(\s\S?)?(+|-|\:)?=)','match','start')

What I understood: (?<=\s|;|\:)\w matches text that follows : or ; and identifies a word, and

(?=(\s\S?)?(+|-|\:)?=) matches whitespace and non-whitespace, then followed by + or - or :

Is my understanding correct, or am I missing something?

I also tried to filter the variables from one more char array, which is attached to this post, but I am missing some variables. Could you please check and let me know what is wrong?

Stephen23 2015-7-8


If you want to play around with regular expressions, try using my FEX submission, which lets you interactively build regular expressions and check them on a piece of text:

http://www.mathworks.com/matlabcentral/fileexchange/48930-regular-expression-helper

and keep reading this and trying examples until it all makes sense:

http://www.mathworks.com/help/matlab/matlab_prog/regular-expressions.html

Let's break down the regular expression:

 (?<=\s|;|\:)\w+(?=(\s\S?)?(+|-|\:)?=)
 (?<=\s|;|\:)                           % preceded by whitespace, ; or :
             \w+                        % any alphanumeric word
                (?=                     % followed by...
                   (\s\S?)?             % maybe whitespace + non-whitespace
                           (+|-|\:)?    % maybe +, - or :
                                    =)  % equals sign

Hmmm... it seems like the \S? is not really required.
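
That remark can be checked on a small made-up string. The sketch below is an assumption-laden test, not part of the original answer: the sample text and the names sample, withS, withoutS, A, B, P and Q are invented, and the + in the pattern is written as \+ here.

```matlab
% Made-up text using the assignment operators =, +=, := and -=.
sample = sprintf('en: a=1;b = 2; c += 3;\nd := 4;e -= 5;');

% Pattern from the breakdown above, with + escaped.
withS    = '(?<=\s|;|\:)\w+(?=(\s\S?)?(\+|-|\:)?=)';
% Same pattern with the \S? removed.
withoutS = '(?<=\s|;|\:)\w+(?=\s?(\+|-|\:)?=)';

A = regexp(sample, withS, 'match');
B = regexp(sample, withoutS, 'match');
disp(isequal(A, B))   % both find the same names in this sample

% The two can differ for an operator outside the +, -, : set:
other = ' f *= 6;';
P = regexp(other, withS, 'match');      % \S? absorbs the *, so f is found
Q = regexp(other, withoutS, 'match');   % no match without \S?
disp(numel(P))
disp(numel(Q))
```

So for the operators that occur in this file the \S? makes no difference, but it does let through any single non-space character before the equals sign when a space precedes it.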

As I noted in an earlier comment, the reason this regular expression is so complicated is that the file format is a complete mess. If you can tidy up the file format, then identifying the variables becomes much easier.

Good luck!
