How to search a string with multiple rows for text?

Question


Hello, after running seq=getgenpept('NP_036795'); I want to search seq.Features for some text value, 'Protein'. I have been unable to find the correct function to search a string with multiple rows.

Running k=strfind(seq.Features,'Protein') results in "Error using strfind. Input strings must have one row."

Any thoughts? Best, Joe

Comments

per isakson 2015-3-27

Edited: per isakson 2015-3-27

Excerpt from doc of getgenpept

Features: [40x64 char]

strfind cannot handle multi-row character arrays.

What does this array of characters look like? BTW: it's allowed to use for-loops.
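
A minimal sketch of two ways around the one-row restriction (not from the original thread). strfind accepts a cell array of character vectors, so converting the char matrix with cellstr is one option; a plain for-loop over the rows is the other. The variable features is a hypothetical stand-in for seq.Features, and its contents are illustrative only.

```matlab
% Hypothetical stand-in for seq.Features: char() pads the rows to equal length.
features = char({ 'source   1..116'
                  '         /organism="Rattus norvegicus"'
                  'Protein  1..116'
                  '         /product="vesicle-associated membrane protein 2"' });

% Option 1: convert the rows to a cell array; strfind accepts a cell input
% and returns one index vector per row (empty where there is no match).
rows    = cellstr(features);
hits    = strfind(rows, 'Protein');
row_idx = find(~cellfun(@isempty, hits));    % rows containing 'Protein'

% Option 2: loop over the rows and call strfind on one row at a time.
row_idx_loop = [];
for rr = 1 : size(features, 1)
    if ~isempty(strfind(features(rr,:), 'Protein'))
        row_idx_loop(end+1) = rr;
    end
end
```

The search is case-sensitive, so the lowercase "protein" in the /product line is not reported; only the row that starts with 'Protein' is.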

Luuk van Oosten 2015-3-28

Looks like the pic below.

What kind of info are you trying to extract from 'Protein'?


Answer 1

per isakson 2015-3-28

Edited: per isakson 2015-3-29

I guess this block of characters is easier to read on screen than to parse automatically. Regarding "find the correct function": I don't think there is such a function; a small program is needed. Anyhow, the script below creates a structure, sas, which is a start.

    %% Create test data. (The OCR program missed most of the underscores.)
    buf = { 'source   1..116                                                '
            '         /organism="Rattus norvegicus"                         '
            '         /dbxref="taxon: 10116^                                '
            '         /chromosome=^10^                                      '
            '         /map="10824"                                          '
            'Protein  1..116                                                '
            '         /product="vesicle-associated membrane protein 2^      '
            '         /note="VAMP-2; synaptobrevin-2; Synaptobrevin 2       '
            '         (vesicle-associated membrane protein VAMP-2);         '
            '         Vesicle-associated membrane protein (synaptobrevin 2)"'
            '         /calculated mol wt=12560                              '
            'Region   28..101                                               '
            '         /region name="Synaptobrevin"                          '
            '         /note="Synaptobrevin; pfam00957"                      '
            '         /dbxref="CDD:250253"                                  '
            'Site     95..114                                               '
            '         /site type="transmembrane region"                     '
            '         /inference="non-experimental evidence, no additional  '
            '         details recorded"                                     '
            '         /note="propagated from UniProt./Swiss-Prot (P63045.2).'
            'CDS      1..116                                                '
            '         /gene="Vamp2^                                         '
            '         /gene synonym="RATVAMPB; RATVAMPIR; SYS; Syb2^        '
            '         /coded by="NM 012663.2:83..433"                       '
            '         /dbxref="GeneID:24803^                                '
            '         /dbxref="RGD:3949"                                    '};
    str_array = char( buf );
    %% Read and parse
    for rr = 1 : size( str_array, 1 )
        % search rows starting with a word and followed by digits, two ".", digits
        buf = regexp( str_array(rr,:), '^(\w+)\s+(\d+\.{2}\d+)', 'tokens' );
        if not( isempty( buf ) )
            field_name = buf{1}{1};
            sas.(field_name) = buf{1}(2); 
        else
            sas.(field_name) = cat( 1, sas.(field_name)         ...
                                ,   strtrim( str_array(rr,:) )  );
        end
    end

The structure, sas, has one field for each sub-group:

    >> sas
    sas = 
         source: {5x1 cell}
        Protein: {6x1 cell}
         Region: {4x1 cell}
           Site: {5x1 cell}
            CDS: {6x1 cell}
    >> sas.Protein
    ans = 
        '1..116'
        '/product="vesicle-associated membrane protein 2^'
        '/note="VAMP-2; synaptobrevin-2; Synaptobrevin 2'
        '(vesicle-associated membrane protein VAMP-2);'
        'Vesicle-associated membrane protein (synaptobrevin 2)"'
        '/calculated mol wt=12560'
    >> char( sas.Protein )
    ans =
    1..116                                                
    /product="vesicle-associated membrane protein 2^      
    /note="VAMP-2; synaptobrevin-2; Synaptobrevin 2       
    (vesicle-associated membrane protein VAMP-2);         
    Vesicle-associated membrane protein (synaptobrevin 2)"
    /calculated mol wt=12560                              
    >>

The next step is to parse the sub-blocks.
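
One possible way to parse a sub-block, offered as a hedged sketch rather than as the answerer's method. It assumes clean qualifier names with underscores (unlike the OCR-damaged test data), that a line not starting with "/" continues the previous qualifier, and that a repeated qualifier such as /dbxref may simply overwrite the earlier one. The names block and qualifiers are hypothetical.

```matlab
% Hypothetical sub-block, shaped like sas.Protein but with clean qualifier names.
block = { '1..116'
          '/product="vesicle-associated membrane protein 2"'
          '/note="VAMP-2; synaptobrevin-2; Synaptobrevin 2'
          '(vesicle-associated membrane protein VAMP-2);'
          'Vesicle-associated membrane protein (synaptobrevin 2)"'
          '/calculated_mol_wt=12560' };

qualifiers = struct();
qualifiers.location = block{1};      % first row holds the residue range
key = '';
for ii = 2 : numel(block)
    txt = block{ii};
    % A qualifier line looks like /name=value
    tok = regexp(txt, '^/(\w+)=(.*)$', 'tokens', 'once');
    if ~isempty(tok)
        key = tok{1};
        qualifiers.(key) = tok{2};   % assumption: a repeated name overwrites
    elseif ~isempty(key)
        % Continuation line: append to the current qualifier value
        qualifiers.(key) = [qualifiers.(key), ' ', txt];
    end
end

% Strip the surrounding double quotes from every value.
names = fieldnames(qualifiers);
for ii = 1 : numel(names)
    qualifiers.(names{ii}) = regexprep(qualifiers.(names{ii}), '^"|"$', '');
end
```

After this runs, qualifiers.product holds the product name without quotes and qualifiers.note holds the three note lines joined by single spaces. To keep repeated qualifiers instead of overwriting them, the values could be collected in a cell array per name.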
