Encyclopedia of Biological Chemistry - Vol_4.pdf

Secondary Structure in
Protein Analysis
George D. Rose
The Johns Hopkins University, Baltimore, Maryland, USA
Encyclopedia of Biological Chemistry, Volume 4. © 2004, Elsevier Inc. All Rights Reserved.
Proteins are linear, unbranched polymers of the 20 naturally
occurring amino acid residues. Under physiological conditions,
most proteins self-assemble into a unique, biologically relevant
structure: the native fold. This structure can be dissected into
chemically recognizable, topologically simple elements of
secondary structure: α-helix, 3₁₀-helix, β-strand, polyproline
II helix, turns, and Ω-loops. Together, these six familiar motifs
account for ~95% of the total protein structure, and they are
utilized repeatedly in mix-and-match patterns, giving rise to the
repertoire of known folds. In principle, a protein’s
three-dimensional structure is predictable from its amino acid
sequence, but this problem remains unsolved. A related, but
ostensibly simpler, problem is to predict a protein’s secondary
structure elements from its sequence.
Protein Architecture
A protein is a polymerized chain of amino acid residues,
each joined to the next via a peptide bond. The
backbone of this polymer describes a complex path
through three-dimensional space called the “native
fold” or “protein fold.”
COVALENT STRUCTURE
Amino acids have both backbone and side chain
atoms. Backbone atoms are common to all amino
acids, while side chain atoms differ among the 20
types. Chemically, an amino acid consists of a
central, tetrahedral carbon atom (–C–), linked
covalently to (1) an amino group (–NH₂), (2) a carboxyl
group (–COOH), (3) a hydrogen atom (–H), and (4)
the side chain (–R). Upon polymerization, the amino
group loses an –H and the carboxyl group loses an
–OH; the remaining chemical moiety is called an
“amino acid residue” or, simply, a “residue.”
Residues in this polymer are linked via peptide bonds,
as shown in Figure 1.
FIGURE 1 (A) A generic amino acid. Each of the 20 naturally occurring amino acids has both backbone atoms (within the shaded rectangle) and
side chain atoms (designated R). Backbone atoms are common to all amino acids, while side chain atoms differ among the 20 types. Chemically, an
amino acid consists of a tetrahedral carbon atom (–C–), linked covalently to (1) an amino group (–NH₂), (2) a carboxyl group (–COOH), (3) a
hydrogen atom (–H), and (4) the side chain (–R). (B) Amino acid polymerization. The α-amino group of one amino acid condenses with the
α-carboxylate of another, releasing a water molecule. The newly formed amide bond is called a peptide bond and the repeating unit is a residue. The
two chain ends have a free α-amino group and a free α-carboxylate group and are designated the amino-terminal (or N-terminal) and the
carboxy-terminal (or C-terminal) ends, respectively. The peptide unit consists of the six shaded atoms (Cα–CO–NH–Cα), three on either side of the peptide
bond.
DEGREES OF FREEDOM IN THE BACKBONE
The six backbone atoms in the peptide unit [Cα(i)–CO–NH–Cα(i+1)]
are approximately coplanar, leaving
only two primary degrees of freedom for each residue.
By convention, these two dihedral angles are called φ
and ψ (Figure 2). The protein’s backbone conformation
is described by the φ,ψ-specification for each residue.
CLASSIFICATION OF STRUCTURE
Protein structure is usually classified into primary,
secondary, and tertiary structure. “Primary structure”
corresponds to the covalently connected sequence of
amino acid residues. “Secondary structure” corresponds
to the backbone structure, with particular emphasis on
hydrogen bonds. And “tertiary structure” corresponds
to the complete atomic positions for the protein.
Secondary Structure
Protein secondary structure can be subdivided into
repetitive and nonrepetitive, depending upon whether
the backbone dihedral angles assume repeating values.
There are three major elements (α-helix, β-strand, and
polyproline II helix) and one minor element (3₁₀-helix)
of repetitive secondary structure (Figure 3). There are
two major elements of nonrepetitive secondary structure
(turns and Ω-loops).
REPETITIVE SECONDARY STRUCTURE: THE α-HELIX
When backbone dihedral angles are assigned repeating
φ,ψ-values near (−60°, −40°), the chain twists into a
right-handed helix, with 3.6 residues per helical turn.
First proposed as a model by Pauling, Corey, and
Branson in 1951, the existence of this famous structure
was experimentally confirmed almost immediately by
Perutz in ongoing crystallographic studies, well before
elucidation of the first protein structure.
In an α-helix, each backbone N–H forms a hydrogen
bond with the backbone carbonyl oxygen situated four
residues away in the linear sequence (toward the
N-terminus): N–H(i)···O=C(i−4). The two sequentially
distant hydrogen-bonded groups are brought into
spatial proximity by conferring a helical twist upon
the chain. This results in a rod-like structure, with the
hydrogen bonds oriented approximately parallel to
the long axis of the helix.
In globular proteins, the average length of an α-helix
is 12 residues. Typically, helices are found on the outside
of the protein, with a hydrophilic face oriented toward
the surrounding aqueous solvent and a hydrophobic face
oriented toward the protein interior.
Inescapably, end effects deprive the first four amide
hydrogens and last four carbonyl oxygens of
Pauling-type, intra-helical hydrogen bond partners.
The special hydrogen-bonding motifs that can provide
partners for these otherwise unsatisfied groups are
known as “helix caps.”
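The hydrogen-bonding pattern and its end effects can be enumerated directly. The following Python sketch assumes an idealized helix in which every N–H(i) pairs with O=C(i−4) inside the segment; the function names are illustrative and are not taken from any published program.

```python
def helix_hbonds(first, last, offset=4):
    """List (donor, acceptor) residue pairs for N-H(i)...O=C(i - offset).

    Idealized model: both partners must lie inside the helix first..last.
    """
    return [(i, i - offset) for i in range(first + offset, last + 1)]


def unpaired_groups(first, last, offset=4):
    """Return residues whose N-H or C=O has no intra-helical partner."""
    pairs = helix_hbonds(first, last, offset)
    donors = {donor for donor, _ in pairs}
    acceptors = {acceptor for _, acceptor in pairs}
    residues = range(first, last + 1)
    free_nh = [i for i in residues if i not in donors]
    free_co = [i for i in residues if i not in acceptors]
    return free_nh, free_co


# A 12-residue helix (the average length quoted above), residues 1..12.
free_nh, free_co = unpaired_groups(1, 12)
print("N-H groups without a partner:", free_nh)
print("C=O groups without a partner:", free_co)
```

For this 12-residue example the first four N–H groups and the last four C=O groups are left without partners; these are the groups that helix caps can satisfy.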
In globular proteins, helices account for ~25% of
the structure on average, but this number varies.
Some proteins, like myoglobin, are predominantly
helical, while others, like plastocyanin, lack helices
altogether.
REPETITIVE SECONDARY STRUCTURE: THE 3₁₀-HELIX
When backbone dihedral angles are assigned repeating
φ,ψ-values near (−50°, −30°), the chain twists into a
right-handed helix. By convention, this helix is named
using formal nomenclature: 3₁₀ designates three residues
per helical turn and 10 atoms in the hydrogen-bonded
ring between each N–H donor and its C=O acceptor.
(In this nomenclature, the α-helix would be called a
3.6₁₃ helix.)
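The subscript in this nomenclature follows from counting atoms around the hydrogen-bonded ring. As a sketch (the counting rule below is a summary added here, not a formula stated in the article): for a bond N–H(i)···O=C(i−k), the ring contains O and C′ of residue i−k, the backbone atoms N, Cα, and C′ of each of the k−1 intervening residues, and N and H of residue i, giving 3k+1 atoms. The function name is hypothetical.

```python
def hbond_ring_atoms(k):
    """Number of atoms in the ring closed by N-H(i)...O=C(i - k)."""
    if k < 1:
        raise ValueError("k must be a positive sequence offset")
    acceptor_atoms = 2          # O and C' of residue i - k
    intervening = 3 * (k - 1)   # N, CA, C' of each residue between the partners
    donor_atoms = 2             # N and H of residue i
    return acceptor_atoms + intervening + donor_atoms


for name, k in [("3-10 helix", 3), ("alpha-helix", 4)]:
    print(name, hbond_ring_atoms(k))
```

With k = 3 and k = 4 the count reproduces the subscripts 10 and 13 used in the names above.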
Single turns of 3₁₀ helix are common and closely
resemble a type of β-turn (see below). Often, α-helices
terminate in a turn of 3₁₀ helix. Longer 3₁₀ helices are
sterically strained and much less common.
FIGURE 3 A contoured Ramachandran (φ, ψ) plot. Backbone φ,ψ-
angles were extracted from 1042 protein subunits of known structure.
Only nonglycine residues are shown. Contours were drawn in
population intervals of 10% and are indicated by the ten colors (in rainbow
order). The most densely populated regions are colored red. Three
heavily populated regions are apparent, each near one of the
major elements of repetitive secondary structure: α-helix (~−60°,
−40°), β-strand (~−120°, 120°), PII helix (~−70°, 140°). Adapted from
Hovmöller, S., Zhou, T., and Ohlson, T. (2002). Conformation of amino
acids in proteins. Acta Cryst. D58, 768–776, with permission of IUCr.
FIGURE 2 (A) Definition of a dihedral angle. In the diagram, the
dihedral angle, θ, measures the rotation of line segment CD with respect
to line segment AB, where A, B, C, and D correspond to the x,y,z-
positions of four atoms. (θ is calculated as the scalar angle between the
two normals to planes A–B–C and B–C–D.) By convention, clockwise
rotation is positive and θ = 0° when A and D are eclipsed. (B) Degrees of
freedom in the protein backbone. The peptide bond (C′–N) has partial
double bond character, so that the six atoms, Cα(i)–CO–NH–Cα(i+1), are
approximately co-planar. Consequently, only two primary degrees of
freedom are available for each residue. By convention, these two
dihedral angles are called φ and ψ: φ is specified by the four atoms C′(i)–
N–Cα–C′(i+1) and ψ by the four atoms N(i)–Cα–C′–N(i+1).
When the chain is fully extended, as depicted here, φ = ψ = 180°.
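The dihedral angle defined in Figure 2 can be computed from four atomic positions. The sketch below is one common formulation, not code from the article: it builds the two plane normals described in the caption and uses `atan2` so that the sign (clockwise positive when looking from B to C) is retained, which the bare scalar angle between the normals would lose. Function names are illustrative.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral(a, b, c, d):
    """Signed dihedral angle A-B-C-D in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)   # normal to plane A-B-C
    n2 = _cross(b2, b3)   # normal to plane B-C-D
    # atan2 recovers the sign that the angle between the normals alone loses.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the B-C bond lies along z, A sits on the +x side of B.
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(dihedral(a, b, c, (1.0, 0.0, 1.0)))    # A and D eclipsed
print(dihedral(a, b, c, (0.0, 1.0, 1.0)))    # D rotated a quarter turn
print(dihedral(a, b, c, (-1.0, 0.0, 1.0)))   # fully extended (trans)
```

Passing the four atoms listed in the caption for φ or ψ gives the corresponding backbone angle for a residue.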
REPETITIVE SECONDARY STRUCTURE: THE β-STRAND
When backbone dihedral angles are assigned repeating
φ,ψ-values near (−120°, +120°), the chain adopts
an extended conformation called a β-strand. Two or
more β-strands, aligned so as to form inter-strand
hydrogen bonds, are called a β-sheet. A β-sheet of just
two hydrogen-bonded β-strands interconnected by a
tight turn is called a β-hairpin. The average length of a
single β-strand is seven residues.
The classical definition of secondary structure found
in most textbooks is limited to hydrogen-bonded
backbone structure and, strictly speaking, would not include
a β-strand, only a β-sheet. However, the β-sheet is
tertiary structure, not secondary structure; the
intervening chain joining two hydrogen-bonded β-strands can
range from a tight turn to a long, structurally complex
stretch of polypeptide chain. Further, approximately
half the β-strands found in proteins are singletons and
do not form inter-strand hydrogen bonds with another
β-strand. Textbooks tend to blur this issue.
Typically, β-sheet is found in the interior of the
protein, although the outermost parts of edge-strands
usually reside at the protein’s water-accessible surface.
Two β-strands in a β-sheet are classified as either
parallel or anti-parallel, depending upon whether their
mutual N- to C-terminal orientation is the same or
opposite, respectively.
In globular proteins, β-sheet accounts for about 15%
of the structure on average, but, like helices, this
number varies considerably. Some proteins are
predominantly sheet while others lack sheet altogether.
REPETITIVE SECONDARY STRUCTURE: THE POLYPROLINE II HELIX (PII)
When backbone dihedral angles are assigned repeating
φ,ψ-values near (−70°, +140°), the chain twists into
a left-handed helix with 3.0 residues per helical turn.
The name of this helix is derived from a poly-proline
homopolymer, in which the structure is forced by its
stereochemistry. However, a polypeptide chain can
adopt a PII helical conformation whether or not it
contains proline residues.
Unlike the better known α-helix, a PII helix has no
intrasegment hydrogen bonds, and for this reason it is
not included in the classical definition of secondary
structure. Including it requires extending that
definition, an extension that is also needed in
the case of an isolated β-strand. Recent studies have
shown that the unfolded state of proteins is rich in
PII structure.
NONREPETITIVE SECONDARY STRUCTURE: THE TURN
Turns are sites at which the polypeptide chain changes
its overall direction, and their frequent occurrence is the
reason why globular proteins are, in fact, globular.
Turns can be subdivided into β-turns, γ-turns, and
tight turns. β-turns involve four consecutive residues,
with a hydrogen bond between the amide hydrogen of
the 4th residue and the carbonyl oxygen of the 1st
residue: N–H(i)···O=C(i−3). β-turns are further
subdivided into subtypes (e.g., Type I, I′, II, II′, III, …)
depending upon their detailed stereochemistry. γ-turns
involve only three consecutive, hydrogen-bonded
residues, N–H(i)···O=C(i−2), which are further divided
into subtypes.
More gradual turns, known as “reverse turns” or
“tight turns,” are also abundant in protein structures.
Reverse turns lack intra-turn hydrogen bonds but
nonetheless are involved in changes in overall chain
direction.
Turns are usually, but not invariably, found on the
water-accessible surface of proteins. Together, β-, γ-, and
reverse turns account for about one-third of the
structure in globular proteins, on average.
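Turns can also be located directly from coordinates. The sketch below applies a distance criterion from the literature that this article quotes in its discussion of structural identification: residues i and i+3 are flagged when their Cα atoms lie within 7 Å and the segment is not α-helical. The function name, the `in_helix` flags, and the rule that a window is skipped only when all four of its residues are helical are assumptions made for illustration.

```python
import math


def find_turns(ca_coords, in_helix, max_dist=7.0):
    """Return start indices i with |CA(i) - CA(i+3)| <= max_dist (angstroms).

    ca_coords: list of (x, y, z) C-alpha positions, one per residue.
    in_helix:  list of booleans, True where a residue is already assigned
               to an alpha-helix (assumed to come from a separate step).
    """
    turns = []
    for i in range(len(ca_coords) - 3):
        if all(in_helix[i:i + 4]):
            continue  # skip segments already assigned to alpha-helix
        if math.dist(ca_coords[i], ca_coords[i + 3]) <= max_dist:
            turns.append(i)
    return turns


# Toy hairpin: six C-alpha atoms spaced 3.8 angstroms apart, folding back.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
no_helix = [False] * len(hairpin)
print(find_turns(hairpin, no_helix))                # default 7 angstrom cutoff
print(find_turns(hairpin, no_helix, max_dist=8.5))  # looser cutoff
```

In this toy chain only the window starting at index 1 passes the 7 Å cutoff, whereas a cutoff of 8.5 Å admits all three windows, which illustrates how strongly such assignments depend on the chosen threshold.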
NONREPETITIVE SECONDARY STRUCTURE: THE Ω-LOOP
Ω-loops are sites at which the polypeptide loops back on
itself, with a morphology that resembles the Greek letter
“Ω” although often with considerable distortion. They
range in length from 6–16 residues, and, lacking any
specific pattern of backbone hydrogen bonding, can
exhibit significant structural heterogeneity.
Like turns, Ω-loops are typically found on the outside
of proteins. On average, there are about four such
structures in a globular protein.
Identification of Secondary Structure from Coordinates
Typically, one becomes familiar with a given protein
structure by visualizing a model – usually a computer
model – that is generated from experimentally
determined coordinates. Some secondary structure types are
well defined on visual inspection, but others are not. For
example, the central residues of a well-formed helix are
visually unambiguous, but the helix termini are subject
to interpretation. In general, visual parsing of the
protein into its elements of secondary structure can be
a highly subjective enterprise. Objective criteria have
been developed to resolve such ambiguity. These criteria
have been implemented in computer programs that
accept a protein’s three-dimensional coordinates as
input and provide its secondary structure components
as output.
INHERENT AMBIGUITY IN STRUCTURAL IDENTIFICATION
It should be realized that objective criteria for structural
identification can provide a welcome self-consistency,
but there is no single “right” answer. For example, turns
have been defined in the literature as chain sites at
which the distance between two α-carbon atoms,
three positions apart in sequence, is not more
than 7 Å, provided the residues are not in an α-helix:
distance[Cα(i)–Cα(i+3)] ≤ 7 Å and Cα(i)–Cα(i+3)
not α-helix. Indeed, turns identified using this definition
agree quite well with one’s visual intuition. However,
the 7 Å threshold is somewhat arbitrary. Had 7.1 Å been
used instead, additional, intuitively plausible turns
would have been found.
PROGRAMS TO IDENTIFY STRUCTURE FROM COORDINATES
Many workers have devised algorithms to parse the
three-dimensional structure into its secondary structure
components. Unavoidably, these procedures include
investigator-defined thresholds. Two such programs are
mentioned here.
The Database of Secondary Structure Assignments in Proteins
This is the most widely used secondary structure
identification method available today. Developed by
Kabsch and Sander, it is accessible on the internet, both
from the original authors and in numerous
implementations from other investigators.
The database of secondary structure assignments in
proteins (DSSP) identifies an extensive set of secondary
structure categories, based on a combination of
backbone dihedral angles and hydrogen bonds. In turn,
hydrogen bonds are identified based on geometric criteria
involving both the distance and orientation between a
donor–acceptor pair. The program has criteria for
recognizing α-helix, 3₁₀-helix, π-helix, β-sheet (both
parallel and anti-parallel), hydrogen-bonded turns, and
reverse turns. (Note: the π-helix is rare and has been
omitted from the secondary structure categories described above.)
Protein Secondary Structure Assignments
In contrast to DSSP, protein secondary structure
assignments (PROSS) identification is based solely on
backbone dihedral angles, without resorting to hydrogen
bonds. Developed by Srinivasan and Rose, it is
accessible on the internet.
PROSS identifies only α-helix, β-strand, and turns,
using standard φ,ψ definitions for these categories.
Because hydrogen bonds are not among the
identification criteria, PROSS does not distinguish between
isolated β-strands and those in a β-sheet.
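To illustrate assignment from dihedral angles alone, the following sketch labels each residue by the nearest of the three reference φ,ψ regions given in the Figure 3 caption. It is not the PROSS algorithm; the 30° tolerance, the label names, and the nearest-center rule are assumptions chosen for brevity.

```python
import math

# Approximate region centers (phi, psi) in degrees, from the Figure 3 caption.
REGIONS = {
    "alpha": (-60.0, -40.0),
    "beta": (-120.0, 120.0),
    "PII": (-70.0, 140.0),
}


def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def classify(phi, psi, tolerance=30.0):
    """Return the nearest region label, or 'other' if none is within tolerance."""
    best_label, best_dist = "other", tolerance
    for label, (phi0, psi0) in REGIONS.items():
        dist = math.hypot(angle_diff(phi, phi0), angle_diff(psi, psi0))
        if dist <= best_dist:
            best_label, best_dist = label, dist
    return best_label


# Made-up backbone angles for four residues.
backbone = [(-62.0, -41.0), (-118.0, 125.0), (-72.0, 145.0), (60.0, 45.0)]
print([classify(phi, psi) for phi, psi in backbone])
```

Because no hydrogen bonds are consulted, a residue labeled "beta" here could belong either to an isolated strand or to a sheet, which is the same limitation noted for dihedral-only assignment above.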
Prediction of Protein Secondary Structure from Amino Acid Sequence
Efforts to predict secondary structure from amino acid
sequence date back to the 1960s, to the works of Guzzo,
Prothero and, slightly later, Chou and Fasman. The
problem is complicated by the fact that protein secondary
structure is only marginally stable, at best. Proteins fold
cooperatively, with secondary and tertiary structure
emerging more or less concomitantly. Typical peptide
fragments excised from the host protein, and measured in
isolation, exhibit only a weak tendency to adopt their
native secondary structure conformation.
PREDICTIONS BASED ON EMPIRICALLY DETERMINED PREFERENCES
Motivated by early work of Chou and Fasman, this
approach uses a database of known structures to discover
the empirical likelihood, f, of finding each of the twenty
amino acids in helix, sheet, turn, etc. These likelihoods
are equated to the residue’s normalized frequency of
occurrence in a given secondary structure type, obtained
by counting. Using alanine in helices as an example:
\[
\text{fraction Ala in helix} = \frac{\text{occurrences of Ala in helices}}{\text{occurrences of Ala in database}}
\]
This fraction is then normalized against the
corresponding fraction of helices in the database:
\[
f^{\text{helix}}_{\text{Ala}} = \frac{\text{fraction Ala in helix}}{\text{fraction helices in database}}
= \frac{\text{occurrences of Ala in helices} \,/\, \text{occurrences of Ala in database}}{\text{number of residues in helices} \,/\, \text{number of residues in database}}
\]
A normalized frequency of unity indicates no
preference – i.e., the frequency of occurrence of the given
residue in that particular position is the same as its
frequency at large. Normalized frequencies greater
than/less than unity indicate selection for/against the
given residue in a particular position.
These residue likelihoods are then used in combination
to make a prediction. When only a small number of
proteins had been solved, these data-dependent f-values
fluctuated significantly as new structures were added to
the database. At this point there are more than 22 000
structures in the Protein Data Bank (www.rcsb.org), and
the f-values have reached a plateau.
DATABASE-INDEPENDENT PREDICTIONS: THE HYDROPHOBICITY PROFILE
Hydrophobicity profiles have been used to predict the
location of turns in proteins. A hydrophobicity profile is
a plot of the residue number versus residue
hydrophobicity, averaged over a running window. The only
variables are the size of the window used for averaging
and the choice of hydrophobicity scale (of which there
are many). No empirical data from the database are
required. Peaks in the profile correspond to local
maxima in hydrophobicity, and valleys to local minima.
Prediction is based on the idea that apolar sites along the
chain (i.e., peaks in the profile) will be disposed
preferentially to the molecular interior, forming a
hydrophobic core, whereas polar sites (i.e., valleys in
the profile) will be disposed to the exterior and
correspond to chain turns.
NEURAL NETWORKS
More recently, neural network approaches to
secondary structure prediction have come to dominate the
field. These approaches are based on
pattern-recognition methods developed in artificial intelligence.
When used in conjunction with the protein database,
these are the most successful programs available
today.
A neural network is a computer program that
associates an input (e.g., a residue sequence) with an
output (e.g., secondary structure prediction) through a
complex network of interconnected nodes. The path
taken from the input through the network to the output
depends upon past experience. Thus, the network is said
to be “trained” on a dataset.
The method is based on the observation that amino
acid substitutions follow a pattern within a family of
homologous proteins. Therefore, if the sequence of
interest has homologues within the database of known
structures, this information can be used to improve
predictive success, provided the homologues are
recognizable. In fact, a homologue can be recognized quite
successfully when the sequence of interest and a putative
homologue have an aligned sequence identity of 25%
or more.
Neural nets provide an information-rich approach
to secondary structure prediction that has become
increasingly successful as the protein databank has
grown.
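The normalized frequency described under the empirically determined preferences can be obtained by simple counting. The sketch below assumes the database is supplied as aligned pairs of strings, one of residue letters and one of per-residue structure labels; the function name, the input format, and the toy data are all hypothetical.

```python
def normalized_frequency(sequences, structures, residue, state):
    """f = (fraction of `residue` found in `state`) / (fraction of all residues in `state`)."""
    residue_total = residue_in_state = state_total = all_total = 0
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must align")
        for aa, label in zip(seq, ss):
            all_total += 1
            if label == state:
                state_total += 1
            if aa == residue:
                residue_total += 1
                if label == state:
                    residue_in_state += 1
    if residue_total == 0 or state_total == 0:
        raise ValueError("residue or state absent from the data")
    return (residue_in_state / residue_total) / (state_total / all_total)


# Made-up six-residue "database": H marks helix, C marks everything else.
seqs = ["AAGAGG"]
labels = ["HHHCCC"]
print(normalized_frequency(seqs, labels, "A", "H"))  # above 1: favored in helix
print(normalized_frequency(seqs, labels, "G", "H"))  # below 1: disfavored
```

In the toy data two of three alanines are helical while half of all residues are helical, so the value for Ala exceeds unity and the value for Gly falls below it.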
 
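A hydrophobicity profile of the kind described in the article, a running-window average with valleys taken as candidate turn sites, can be sketched in a few lines. The article does not name a scale; the Kyte-Doolittle hydropathy values are used here only as one widely cited choice, and the function names and the default window of seven residues are assumptions.

```python
# Kyte-Doolittle hydropathy values; any other published scale could be substituted.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence, window=7, scale=KYTE_DOOLITTLE):
    """Running-window mean; entry k covers residues k .. k + window - 1."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[aa] for aa in sequence]
    return [sum(values[k:k + window]) / window
            for k in range(len(values) - window + 1)]


def predicted_turn_sites(sequence, window=7, scale=KYTE_DOOLITTLE):
    """0-based residue indices at the centers of local minima (valleys)."""
    profile = hydrophobicity_profile(sequence, window, scale)
    offset = window // 2
    return [k + offset for k in range(1, len(profile) - 1)
            if profile[k] < profile[k - 1] and profile[k] < profile[k + 1]]


# Toy sequence: a polar stretch (DDD) flanked by apolar residues.
print(hydrophobicity_profile("IIIDDDIII", window=3))
print(predicted_turn_sites("IIIDDDIII", window=3))
```

Changing either the window size or the scale shifts the peaks and valleys, which are the only two variables the method has.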