pdbread

Read data from Protein Data Bank (PDB) file

Description

PDBStruct = pdbread(File) reads data from a PDB-formatted text file and stores the data in a structure containing a field for each PDB record.

PDBStruct = pdbread(File,Name=Value) uses additional options specified by one or more name-value arguments.

Examples

Use the getpdb function to retrieve structure information for the nicotinic receptor protein from the Protein Data Bank, and save the data as a PDB-formatted file in the current folder.

PDBtoFile = getpdb('1abt',ToFile="nicotonic_receptor.pdb");

Read the data from the PDB-formatted file into a structure.

PDBStruct = pdbread("nicotonic_receptor.pdb")
PDBStruct = struct with fields:
            Header: [1×1 struct]
             Title: [2×60 char]
          Compound: [8×37 char]
            Source: [2×10 char]
          Keywords: 'TOXIN'
    ExperimentData: 'SOLUTION NMR'
           Authors: 'V.J.BASUS,G.SONG,E.HAWROT'
      RevisionDate: [1×5 struct]
           Journal: [1×1 struct]
           Remark1: [1×1 struct]
           Remark2: [1×1 struct]
           Remark3: [1×1 struct]
           Remark4: [2×59 char]
         Remark100: [3×59 char]
         Remark210: [25×59 char]
         Remark215: [6×59 char]
         Remark300: [6×59 char]
         Remark350: [13×59 char]
         Remark465: [13×59 char]
         Remark500: [168×59 char]
      DBReferences: [1×2 struct]
          Sequence: [1×2 struct]
             Helix: [1×1 struct]
             Sheet: [1×5 struct]
            SSBond: [1×5 struct]
       CISPeptides: [1×2 struct]
            Cryst1: [1×1 struct]
           OriginX: [1×3 struct]
             Scale: [1×3 struct]
             Model: [1×4 struct]
      Connectivity: [1×10 struct]
            Master: [1×1 struct]

Read the data from only the second model into a structure.

PDBStruct = pdbread("nicotonic_receptor.pdb",ModelNum=2);
PDBStruct.Model
ans = struct with fields:
    MDLSerNo: 2
        Atom: [1×1205 struct]
    Terminal: [1×2 struct]

Input Arguments

PDB file name or PDB-formatted text, specified as one of the following:

  • Character vector or string specifying a filename, a path and filename, or a URL pointing to a file. The file must be a PDB-formatted file (ASCII text file). If you specify only a filename, that file must be on the MATLAB® search path or in the current folder.

  • Character array or column vector of strings that contains the text of a PDB-formatted file.

For more information about PDB formatting, visit https://www.wwpdb.org/documentation/file-format.

Tip

You can use the getpdb function with the ToFile name-value argument to retrieve protein structure data from the PDB database and save it as a PDB-formatted file.

Data Types: char | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: PDBStruct = pdbread(File,ModelNum=2,TimeOut=10)

Model number to read from File, specified as a positive integer. If ModelNum does not correspond to an existing model number in File, then the function reads the coordinate information of all the models.

Example: PDBStruct = pdbread(File,ModelNum=2) reads only the second model from the file.

Data Types: double
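
Because an unmatched ModelNum silently falls back to reading all models, it can help to confirm which model came back. The following is a minimal sketch, not part of the original reference page. It assumes that the file created in the Examples section is in the current folder and that Model carries the MDLSerNo subfield shown in the example output; the variable names pdbFile, requestedModel, and gotRequested are illustrative only.

```matlab
% Assumption: the file from the Examples section exists in the current folder.
pdbFile = "nicotonic_receptor.pdb";
requestedModel = 2;

if isfile(pdbFile)
    PDBStruct = pdbread(pdbFile, ModelNum=requestedModel);
    models = PDBStruct.Model;

    % The request was honored only if exactly one model came back and its
    % serial number matches the requested one.
    gotRequested = isscalar(models) && isfield(models, 'MDLSerNo') ...
        && isequal(models.MDLSerNo, requestedModel);

    if ~gotRequested
        warning("Model %d not found; pdbread returned %d model(s).", ...
            requestedModel, numel(models));
    end
else
    disp("PDB file not found; run the getpdb example first.");
end
```

The isfield guard is there because a file without MODEL records might not provide MDLSerNo; that is an assumption rather than documented behavior.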

Connection timeout (in seconds) to read data from the PDB database, specified as a positive scalar.

Example: PDBStruct = pdbread(File,TimeOut=10) waits 10 seconds to receive a response from the PDB database.

Data Types: double

Output Arguments

PDB data, returned as a structure with a field for each PDB record. The following table summarizes the possible PDB records and the corresponding fields in PDBStruct.

PDB Database Record    Field in the MATLAB Structure
HEADER                 Header
OBSLTE                 Obsolete
TITLE                  Title
CAVEAT                 Caveat
COMPND                 Compound
SOURCE                 Source
KEYWDS                 Keywords
EXPDTA                 ExperimentData
AUTHOR                 Authors
REVDAT                 RevisionDate
SPRSDE                 Superseded
JRNL                   Journal
REMARK 1               Remark1
REMARK N               RemarkN
DBREF                  DBReferences
SEQADV                 SequenceConflicts
SEQRES                 Sequence
FTNOTE                 Footnote
MODRES                 ModifiedResidues
HET                    Heterogen
HETNAM                 HeterogenName
HETSYN                 HeterogenSynonym
FORMUL                 Formula
HELIX                  Helix
SHEET                  Sheet
TURN                   Turn
SSBOND                 SSBond
LINK                   Link
HYDBND                 HydrogenBond
SLTBRG                 SaltBridge
CISPEP                 CISPeptides
SITE                   Site
CRYST1                 Cryst1
ORIGXn                 OriginX
SCALEn                 Scale
MTRIXn                 Matrix
TVECT                  TranslationVector
MODEL                  Model
ATOM                   Atom
SIGATM                 AtomSD
ANISOU                 AnisotropicTemp
SIGUIJ                 AnisotropicTempSD
TER                    Terminal
HETATM                 HeterogenAtom
CONECT                 Connectivity

Note

In REMARK N and RemarkN, N can be any number from 2 through 999.
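
The example output earlier on this page lists only some of these fields (for instance, Helix and Sheet but not Turn), which suggests that a field is present only when the file contains the matching record. Under that assumption, code that depends on an optional record can check for the field first. The sketch below uses a small hypothetical stand-in structure so that it runs without a PDB file; the names wanted, present, and absent are illustrative.

```matlab
% Hypothetical stand-in with two record fields so the sketch runs anywhere.
% With real data, use instead:
%   PDBStruct = pdbread("nicotonic_receptor.pdb");
PDBStruct = struct('Helix', struct(), 'Sheet', struct());

% Fields the downstream code would like to use.
wanted = {'Helix', 'Sheet', 'Turn', 'Link'};

% isfield returns one logical value per requested name.
present = isfield(PDBStruct, wanted);
absent = wanted(~present);

if ~isempty(absent)
    fprintf('Records not present in this structure: %s\n', ...
        strjoin(absent, ', '));
end
```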

Several fields of PDBStruct are structures containing subfields.

  • The Sequence field contains sequence information in the following subfields:

    • NumOfResidues

    • ChainID

    • ResidueNames — Contains the three-letter codes for the sequence residues

    • Sequence — Contains the single-letter codes for the sequence residues

    Note

    If the sequence has modified residues, then the ResidueNames subfield might not correspond to the standard three-letter amino acid codes. In this case, the Sequence subfield contains the modified residue code in the position corresponding to the modified residue. The modified residue code is provided in the ModifiedResidues field.

  • The Model field is a structure or array of structures that contains coordinate information. If PDBStruct contains one model, then Model is a structure containing coordinate information for that model. If PDBStruct contains multiple models, then Model is an array of structures containing coordinate information for each model. The Model field contains the following subfields:

    • Atom

    • AtomSD

    • AnisotropicTemp

    • AnisotropicTempSD

    • Terminal

    • HeterogenAtom

  • The Atom subfield of Model is an array of structures containing the following subfields:

    • AtomSerNo

    • AtomName

    • altLoc

    • resName

    • chainID

    • resSeq

    • iCode

    • X

    • Y

    • Z

    • occupancy

    • tempFactor

    • segID

    • element

    • charge

    • AtomNameStruct — Contains three subfields: chemSymbol, remoteInd, and branch
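
The nested layout described above can be navigated with ordinary structure-array indexing. The following is a minimal sketch that uses a small hypothetical stand-in with placeholder values, so that it runs without a PDB file. It covers only the subfields it needs (NumOfResidues, ChainID, Sequence, Atom, AtomName, X, Y, Z), and it assumes that NumOfResidues, X, Y, and Z are numeric. The variable names are illustrative.

```matlab
% Hypothetical stand-in that mimics the documented layout (placeholder values).
% With real data, replace this section with:
%   PDBStruct = pdbread("nicotonic_receptor.pdb");
atomsModel1 = struct( ...
    'AtomName', {'N', 'CA', 'C'}, ...
    'X', {0, 1, 2}, 'Y', {0, 1, 0}, 'Z', {0, 0, 3});
atomsModel2 = struct( ...
    'AtomName', {'N', 'CA'}, ...
    'X', {5, 6}, 'Y', {5, 6}, 'Z', {5, 6});
PDBStruct = struct();
PDBStruct.Sequence = struct( ...
    'NumOfResidues', {3, 2}, 'ChainID', {'A', 'B'}, 'Sequence', {'GAS', 'KL'});
PDBStruct.Model = struct('Atom', {atomsModel1, atomsModel2});

% One summary line per chain in the Sequence field.
nChains = numel(PDBStruct.Sequence);
chainSummary = strings(nChains, 1);
for k = 1:nChains
    chain = PDBStruct.Sequence(k);
    chainSummary(k) = "Chain " + string(chain.ChainID) + ": " ...
        + string(chain.NumOfResidues) + " residues, " + string(chain.Sequence);
end
disp(chainSummary)

% Model can be a single structure or an array; numel and indexing cover both.
atomCounts = arrayfun(@(m) numel(m.Atom), PDBStruct.Model);

% Gather the coordinates of the first model into an N-by-3 matrix.
atoms = PDBStruct.Model(1).Atom;
xyz = [atoms.X; atoms.Y; atoms.Z].';

% Keep only alpha carbons; strtrim guards against padded atom names.
isCA = strcmp(strtrim({atoms.AtomName}), 'CA');
caXYZ = xyz(isCA, :);

% Geometric center of all atoms in the first model.
centroid = mean(xyz, 1);
```

Expanding atoms.X inside square brackets concatenates one value per atom, which avoids an explicit loop over what can be thousands of Atom entries.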

Version History

Introduced before R2006a