getgenpept
Retrieve sequence information from GenPept database
Syntax
Data = getgenpept(AccessionNumber)
getgenpept(AccessionNumber)
Data = getgenpept(..., 'PartialSeq', PartialSeqValue, ...)
Data = getgenpept(..., 'ToFile', ToFileValue, ...)
Data = getgenpept(..., 'FileFormat', FileFormatValue, ...)
Data = getgenpept(..., 'SequenceOnly', SequenceOnlyValue, ...)
Data = getgenpept(..., 'TimeOut', TimeOutValue, ...)
Arguments
AccessionNumber | Character vector specifying a unique alphanumeric identifier for a sequence record. |
PartialSeqValue | Two-element array of integers, [StartAA, EndAA], containing the start and end positions of the subsequence to retrieve. StartAA is an integer between 1 and EndAA; EndAA is an integer between StartAA and the length of the sequence. |
ToFileValue | Character vector specifying either a file name or a path and file name for saving the GenPept data. If you specify only a file name, the file is saved to the MATLAB® Current Folder. |
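FileFormatValue | Character vector specifying the format for the sequence information. Choices are 'GenPept' or 'FASTA'. When 'FASTA', Data contains only two fields, Header and Sequence. 'GenPept' is the default when SequenceOnlyValue is false; 'FASTA' is the default when SequenceOnlyValue is true. |
SequenceOnlyValue | Controls the return of only the sequence as a character array. Choices are true or false (default). |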
TimeOutValue | Connection timeout in seconds, specified as a positive scalar. The default value is 5. |
Description
getgenpept retrieves protein (amino acid) sequence information from the GenPept database, which is a translation of the nucleotide sequences in the GenBank® database and is maintained by the National Center for Biotechnology Information (NCBI).
Note
NCBI has changed the name of their protein search engine from GenPept to Entrez Protein. However, the function names in the Bioinformatics Toolbox™ software (getgenpept and genpeptread) are unchanged, reflecting the still-used GenPept report format. For more information on GenPept data, visit https://www.ncbi.nlm.nih.gov/home/about/policies.shtml.
Data = getgenpept(AccessionNumber) searches for the accession number in the GenPept database and returns Data, a MATLAB structure containing information for the sequence.
Tip
If an error occurs while retrieving the GenPept-formatted information, try rerunning the query. Errors can occur due to Internet connectivity issues that are unrelated to the GenPept record.
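The retry advice in this tip can be automated with a small loop. The following is a minimal sketch, not part of the toolbox: the retry limit (maxAttempts) and the pause length are arbitrary assumptions, and the accession number is the one used in the Examples section.

```matlab
accession   = 'AAA59174';  % accession number from the Examples section
maxAttempts = 3;           % assumed retry limit, not a toolbox setting
Data = [];

for attempt = 1:maxAttempts
    try
        Data = getgenpept(accession);
        break              % success, so stop retrying
    catch err
        % Report the failure and wait briefly before the next attempt.
        fprintf('Attempt %d failed: %s\n', attempt, err.message);
        pause(2);          % assumed wait time in seconds
    end
end

if isempty(Data)
    error('Could not retrieve %s after %d attempts.', accession, maxAttempts);
end
```

If every attempt fails, the final error is raised so that the calling code does not continue with an empty result.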
getgenpept(AccessionNumber) displays information in the MATLAB Command Window without returning data to a variable. The displayed information is only hyperlinks to the URLs used to search for and retrieve the data.
getgenpept(..., 'PropertyName', PropertyValue, ...) calls getgenpept with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
Data = getgenpept(..., 'PartialSeq', PartialSeqValue, ...) returns the specified subsequence in the Sequence field of the MATLAB structure. PartialSeqValue is a two-element array of integers containing the start and end positions of the subsequence [StartAA, EndAA]. StartAA is an integer between 1 and EndAA; EndAA is an integer between StartAA and the length of the sequence.
Data = getgenpept(..., 'ToFile', ToFileValue, ...) saves the data returned from the GenPept database to a file. ToFileValue is a character vector specifying either a file name or a path and file name for saving the GenPept data. If you specify only a file name, the file is saved to the MATLAB Current Folder. The function does not append data to an existing file. Instead, it overwrites the contents of the existing file without warning.
Tip
You can read a GenPept-formatted file back into MATLAB using the genpeptread function.
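As an illustration of the 'ToFile' property together with genpeptread, the following sketch saves a record and reads it back. The file name is hypothetical, and it is placed in the temporary folder (tempdir) so that nothing in the current folder is overwritten.

```matlab
% Hypothetical output file in the system temporary folder.
outFile = fullfile(tempdir, 'AAA59174_genpept.txt');

% Retrieve the record and save the GenPept-formatted text to outFile.
% Any existing file with this name is overwritten without warning.
Saved = getgenpept('AAA59174', 'ToFile', outFile);

% Read the saved file back into a MATLAB structure.
Reloaded = genpeptread(outFile);

% Compare the sequence retrieved online with the one read from disk.
sameSequence = isequal(Saved.Sequence, Reloaded.Sequence)
```

Because existing files are overwritten silently, checking with exist(outFile, 'file') before the call is a reasonable precaution when the target folder holds other data.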
Data = getgenpept(..., 'FileFormat', FileFormatValue, ...) returns the sequence in the specified format. Choices are 'GenPept' or 'FASTA'. When FileFormatValue is 'FASTA', Data contains only two fields, Header and Sequence. 'GenPept' is the default when SequenceOnlyValue is false. 'FASTA' is the default when SequenceOnlyValue is true.
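A short sketch of requesting FASTA output, using the accession number from the Examples section. The checks only confirm that the two fields named in the description are present; they make no claim about the exact header text that NCBI returns.

```matlab
% Request the record in FASTA format instead of the default GenPept format.
Fasta = getgenpept('AAA59174', 'FileFormat', 'FASTA');

% Confirm that the fields described above are present.
hasHeader   = isfield(Fasta, 'Header')
hasSequence = isfield(Fasta, 'Sequence')

% Show the FASTA header line and the number of residues.
disp(Fasta.Header)
disp(length(Fasta.Sequence))
```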
Data = getgenpept(..., 'SequenceOnly', SequenceOnlyValue, ...) returns only the sequence in Data, a character array. Choices are true or false (default).
Note
If you use the 'SequenceOnly' and 'ToFile' properties together, the output is always a FASTA-formatted file.
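For instance, when only the residues are needed, the 'SequenceOnly' property avoids handling the full structure. This is a minimal sketch; the variable name aa is arbitrary.

```matlab
% Return only the amino acid sequence as a character array.
aa = getgenpept('AAA59174', 'SequenceOnly', true);

% The result is plain text rather than a structure.
isText = ischar(aa)

% Number of residues in the retrieved sequence.
nResidues = length(aa)
```

For this record, the LocusSequenceLength field shown in the Examples section suggests a length of 1382.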
Data = getgenpept(..., 'TimeOut', TimeOutValue, ...) sets the connection timeout to retrieve data from the GenPept database.
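On a slow connection, a longer timeout may help. In this sketch, 30 seconds is an arbitrary illustrative value, not a recommendation.

```matlab
% Allow up to 30 seconds for the connection instead of the default 5.
% The value 30 is an assumption chosen only for illustration.
Data = getgenpept('AAA59174', 'TimeOut', 30);
```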
Examples
To retrieve the sequence for the human insulin receptor and store it in a structure, Seq, in the MATLAB Command Window, type:
Seq = getgenpept('AAA59174')

Seq =

                LocusName: 'AAA59174'
      LocusSequenceLength: '1382'
     LocusNumberofStrands: ''
            LocusTopology: 'linear'
        LocusMoleculeType: ''
     LocusGenBankDivision: 'PRI'
    LocusModificationDate: '06-JAN-1995'
               Definition: 'insulin receptor precursor.'
                Accession: 'AAA59174'
                  Version: 'AAA59174.1'
                       GI: '307070'
                  Project: []
                 DBSource: 'locus HUMINSR accession M10051.1'
                 Keywords: ''
                   Source: 'Homo sapiens (human)'
           SourceOrganism: [4x65 char]
                Reference: {[1x1 struct]}
                  Comment: [14x67 char]
                 Features: [40x64 char]
                 Sequence: [1x1382 char]
                SearchURL: [1x104 char]
              RetrieveURL: [1x92 char]
By looking at the Features field of the structure, you can determine that the furin-like repeats domain is positions 234 through 281. To retrieve only the furin-like repeats domain from the sequence for the human insulin receptor and store it in a structure, Fur, in the MATLAB Command Window, type:
Fur = getgenpept('AAA59174','PARTIALSEQ',[234,281]);
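One way to check a partial retrieval is to compare it with the same positions indexed from the full sequence. The sketch below repeats both calls so that it runs on its own. It assumes that the start and end positions are inclusive, as the PartialSeq description implies.

```matlab
% Retrieve the full record and the furin-like repeats domain.
Seq = getgenpept('AAA59174');
Fur = getgenpept('AAA59174', 'PartialSeq', [234 281]);

% Number of residues in the partial sequence.
nPartial = length(Fur.Sequence)

% Compare with the same range indexed from the full sequence.
matchesFull = isequal(Fur.Sequence, Seq.Sequence(234:281))
```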
Version History
Introduced before R2006a
See Also
genpeptread | getembl | getgenbank | getpdb