Vol. 13, Issue 10, 3369-3387, October 2002
Howard Hughes Medical Institute, Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Submitted May 7, 2002; Revised June 25, 2002; Accepted July 10, 2002
ABSTRACT
The von Willebrand A (VWA) domain is a well-studied domain involved
in cell adhesion, in extracellular matrix proteins, and in
integrin receptors. A number of human diseases arise from
mutations in VWA domains. We have analyzed the phylogenetic
distribution of this domain and the relationships among ~500 proteins
containing this domain. Although the majority of VWA-containing
proteins are extracellular, the most ancient ones, present in all
eukaryotes, are all intracellular proteins involved in functions such
as transcription, DNA repair, ribosomal and membrane transport, and the
proteasome. A common feature seems to be involvement in multiprotein
complexes. Subsequent evolution involved deployment of VWA domains by
Metazoa in extracellular proteins involved in cell adhesion such
as integrin β subunits (all Metazoa). Nematodes and chordates
separately expanded their complements of extracellular matrix proteins
containing VWA domains, whereas plants expanded their intracellular
complement. Chordates developed VWA-containing integrin α
subunits, collagens, and other extracellular matrix proteins
(e.g., matrilins, cochlin/vitrin, and von Willebrand factor).
Consideration of the known properties of VWA domains in
integrins and extracellular matrix proteins allows insights
into their involvement in protein-protein interactions and the roles
of bound divalent cations and conformational changes. These allow
inferences about similar functions in novel situations such as protease
regulators (e.g., complement factors and trypsin inhibitors) and
intracellular proteins (e.g., helicases, chelatases, and copines).
INTRODUCTION
The rapid accumulation of genomic sequences offers both the challenge of understanding the functions of proteins encoded by those genomes and the opportunity for drawing inferences about the evolution of functions in proteins in different phyla. Effective annotation of the genes and their products requires both analyses of sequence and structural homologies among genes and incorporation of biochemical and biological information about the proteins to make best use of the genomic information.
We have been interested in the structure, function, and evolution of
proteins involved in cell adhesion and interaction (Hynes and Zhao, 2000). Annotation of these proteins represents a significant challenge
because we estimate that there are >2000 such proteins encoded by
mammalian genomes. Searching proteomes for conserved domains is a first
step toward overcoming that challenge. Some conserved domains have been
extensively studied and their presence within a protein suggests
specific biological properties. In this essay, we present an analysis
of a subset of proteins: those including the so-called von Willebrand A
(VWA) domain (reviewed in Colombatti et al., 1993; Tuckwell, 1999). Proteins containing VWA domains are present in Eukaryota
(Metazoa, fungi, plants, and protists), Eubacteria, and Archaea. VWA
domains are Rossmann folds consisting of a β-sheet sandwiched by
multiple α helices. Many VWA domains bind metal ions via a
noncontiguous sequence motif called metal ion-dependent adhesion site
(MIDAS). Frequently, VWA domain-containing proteins function in
multiprotein complexes. The eponymous VWA domains of von Willebrand
factor play key roles in the linkage of platelets to collagen (see
below). The homologous inserted or I domains of some integrin α
subunits are also involved in interactions with collagen and other
ligands. Our initial interest in VWA/I domains arose from
considerations of the evolution of integrin subunits because
VWA domain-containing integrin α subunits seem to be
restricted to chordates (Hynes and Zhao, 2000). In the course of
investigating the verity and significance of this supposition, we
explored the universe of VWA/I domains, and we present herein the
results of that inquiry, including a number of novel proteins and new insights.
The majority of well-characterized VWA domains are found in cell
adhesion and extracellular matrix (ECM) proteins (Tuckwell, 1999). In
many cases, it is clear, or plausible, that they are involved in
protein-protein (e.g., receptor-ligand) interactions that frequently
involve divalent cations. In exploring what might be the origin of this
domain, we discovered that the VWA domains most widely distributed
phylogenetically are intracellular proteins present in all eukaryotic
genomes sequenced thus far. The roles of the VWA domains in these
intracellular proteins are not clear, but many are components of
multiprotein complexes and a plausible hypothesis is that the VWA
domains mediate protein-protein interactions involved in the assembly
or function of these complexes. It seems that the presumptive
primordial VWA domains subsequently were deployed by metazoans in
extracellular protein-protein interactions, with integrin β
subunits as very early representatives. Later incorporations of VWA
domains also into integrin α subunits and into many ECM
molecules seem to be predominantly chordate elaborations presumably
related to the large expansion of ECM complexity in chordates, although
other eukaryotes, notably, Caenorhabditis elegans, have also
deployed VWA domains in ECM proteins. VWA domains also pop up in other
surprising and interesting contexts such as ion channel subunits, the
anthrax toxin receptor, and protease regulators.
In this review, we consider first the best understood VWA domains, those in integrins and von Willebrand factor (vWF), from which one can infer their likely functions. We then consider the other contexts in which one finds these domains and discuss their possible functions and potential evolution.
MATERIALS AND METHODS
As of 24 March 2002 the SMART nonredundant database
(nrdb) (a merger of swissprot, swissnew, sptrembl, and
sptremblnew) contained 948 proteins containing 1196 VWA domains, with
new additions being deposited regularly (Letunic et al., 2002; http://smart.embl-heidelberg.de/). The survey presented in
this essay is based largely on the SMART database and analysis because
of the experience with integrin β subunits. The properties of
integrin β subunits prompted speculation that they contained
a VWA domain (Lee et al., 1995; Tozer et al., 1996; Loftus and Liddington, 1997; Tuckwell and Humphries, 1997), in
spite of the fact that many domain prediction algorithms were unable to
support this claim. However, the SMART analysis techniques predict the
VWA domain in integrin β subunits with high confidence (Tuckwell, 1999; Ponting et al., 2000; Schultz et al., 2000). Recently, the crystal structure of the extracellular
portion of the integrin αvβ3 was reported (Xiong et al., 2001), and the presence of a VWA domain in the β subunit
was confirmed. Therefore, we feel that the SMART algorithm for
prediction of VWA domains is currently the best available, and we have
used SMART analysis as the basis for this essay. We have complemented
these analyses with Interpro where helpful. In addition, we have used
each human VWA domain predicted by SMART (as of March 2002) to query
the Genomescan-predicted protein database at National Center for
Biotechnology Information (Yeh et al., 2001; http://www.ncbi.nlm.nih.gov/genome/seq/HsBlast.html). The National
Center for Biotechnology Information genome annotation effort is
ongoing, and many novel proteins detected by Genomescan have not yet
been assigned to the databases included in the SMART nonredundant
database. In many cases, particularly the novel collagens we report
herein, the public and private human and mouse genome assemblies were
also used extensively to arrive at plausible structures for
uncharacterized molecules.
In addition to overall homology of domain structure and database
queries with basic local alignment search tool (BLAST), we also made
use of whole-molecule and individual VWA domain (extracted using SMART)
alignments. Alignments, bootstrap analysis, and tree preparation were
done using ClustalW, ClustalTree, and DrawTree provided by the Biology
Workbench 3.2 (http://workbench.sdsc.edu/; Felsenstein, 1989) and
the VectorNTI software package. Frequently, bootstrap numbers,
expressed as a percentage of 1000 pseudoreplicates, are provided to
give a confidence level for a given relationship (Efron et al., 1996). In the case of whole-molecule alignments, the
resulting phylogenetic trees were used to provide a framework for
discussion of families of molecules. In the case of the individual domain alignments, phylogenetic trees and bootstrap values were used to
identify uncharacterized homologues by calculating relatedness among
different domains. This type of analysis was more systematic and
quantifiable than individual BLAST searches and was particularly useful
in cases where the molecules in the databases were fragments or gene
predictions. Due to the relatively large number of VWA domains in the
database, it was sometimes necessary to create species-specific minimal
sets where a single representative sequence was selected for each group
of closely related sequences (usually paralogues; Table S1). This was
done by first removing nearly identical sequences such as allelic
variants and fragments to create a more strictly defined nonredundant
set. The nonredundant sets were then aligned and subjected to bootstrap
analysis and groups of closely related sequences were reduced by
randomly selecting a single representative from each node with
bootstrap support >94%. These minimal collections were then small
enough to allow pairwise comparisons of the complement of VWA domains
within a species or group.
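The reduction to a minimal set described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run for this study: the function name `minimal_set`, the input layout (a list of `(bootstrap_percent, member_ids)` pairs read off a bootstrapped tree), and the example identifiers are all hypothetical. Only the >94% threshold and the random choice of one representative per well-supported group come from the text.

```python
import random


def minimal_set(sequence_ids, clades, threshold=94.0, seed=0):
    """Collapse well-supported clades to one randomly chosen representative.

    sequence_ids: identifiers of the domains in the nonredundant set.
    clades: list of (bootstrap_percent, member_ids) pairs (hypothetical layout).
    threshold: groups with support strictly above this value are collapsed.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    kept = set(sequence_ids)
    # Visit larger clades first so that nested clades collapse to a single pick.
    for support, members in sorted(clades, key=lambda c: len(c[1]), reverse=True):
        if support <= threshold:
            continue  # weakly supported grouping: keep every member
        present = sorted(m for m in members if m in kept)
        if len(present) < 2:
            continue  # nothing left to collapse in this clade
        representative = rng.choice(present)
        kept.difference_update(present)
        kept.add(representative)
    return sorted(kept)


# Hypothetical identifiers, for illustration only.
domains = ["colA_vwa1", "colA_vwa2", "matn1_vwa1", "matn3_vwa1", "atr1_vwa"]
clades = [
    (99.0, ["colA_vwa1", "colA_vwa2"]),    # above threshold: collapsed to one
    (51.0, ["matn1_vwa1", "matn3_vwa1"]),  # below threshold: both retained
]
print(minimal_set(domains, clades))
```

Fixing the random seed is a design choice of this sketch so that the same representatives are chosen on every run; the text does not say how the random selection was made.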
We also investigated the occurrence of MIDAS motifs, first defined as
metal ion-binding motifs in the VWA domains of integrin α
subunits (Lee et al., 1995). Because metal ions play key
roles in the functions of VWA domains in integrins, we scored
the three noncontiguous elements of the MIDAS motif (D-x-S-x-S ... T ... D) in hand-edited alignments of all the VWA domains
identified in the five completed eukaryotic genomes as well as all
prokaryotic VWA domains and a subset of protist VWA domains. For the
purpose of discussion, the D-x-S-x-S will be referred to as region 1 and the other conserved residues will be called T4 and D5 (Figure S1,
a-c). Approximately 46% of VWA domains have a perfectly conserved MIDAS motif (see figures, red stars); others are missing one or more
elements. Structural studies of integrin VWA domains and biochemical analysis of copines (Tomsig and Creutz, 2000) indicate that
a perfectly conserved MIDAS motif is not required for metal ion
binding. To accommodate these observations and emphasize the importance
in metal binding of the region 1 D followed by spaced alcohol residues
(S or T), we coined the term imperfect MIDAS (open red stars) to refer
to VWA domains that lack a subset of MIDAS elements but are likely to
bind metal ions. Examples of imperfect motifs include those with region
1 (D-x-S-x-S) but without one or both of T4 and D5 or those with
conservative changes in region 1 (D-x-T-x-S in copines) with and
without conservation of T4 and D5. It is likely that confident
conclusions regarding the presence or absence of a functional MIDAS
motif will require structural analysis of the VWA domain in question in
both ligand-bound and -unbound states (Xiong et al., 2002).
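The scoring scheme described above can be expressed as a small classifier. The following is a minimal sketch, assuming the three MIDAS elements have already been extracted from a hand-edited alignment; the function name `classify_midas` and its three-argument interface are hypothetical, and the rule for "imperfect" (a region 1 D followed by spaced S or T residues, with or without T4 and D5) is one reading of the definition given in the text rather than the exact criteria used to build Table S3.

```python
import re

# Region 1 as defined in the text: D-x-S-x-S, where x is any residue.
PERFECT_REGION1 = re.compile(r"D.S.S")
# Relaxed region 1: D followed by spaced alcohol residues (S or T).
RELAXED_REGION1 = re.compile(r"D.[ST].[ST]")


def classify_midas(region1, t4, d5):
    """Return 'perfect', 'imperfect', or 'none' for one VWA domain.

    region1: the five aligned residues at region 1 (e.g. "DGSGS").
    t4, d5: the single aligned residues at the T4 and D5 positions.
    """
    region1, t4, d5 = region1.upper(), t4.upper(), d5.upper()
    if PERFECT_REGION1.fullmatch(region1) and t4 == "T" and d5 == "D":
        return "perfect"  # all three elements conserved
    if RELAXED_REGION1.fullmatch(region1):
        return "imperfect"  # region 1 usable, other elements missing or changed
    return "none"


# Illustrative inputs; the x positions are arbitrary placeholder residues.
print(classify_midas("DGSGS", "T", "D"))  # all three elements conserved
print(classify_midas("DGTGS", "D", "D"))  # D-x-T-x-S with T4 replaced by D
print(classify_midas("AGSGS", "T", "D"))  # region 1 lacks the leading D
```

As the text notes, a sequence-level call of this kind cannot establish whether a domain actually binds metal; it only sorts domains by conservation of the motif.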
In light of these considerations, we have presented our analysis of
MIDAS motifs in an Excel (Microsoft, Redmond, WA) spreadsheet (Table
S3) that will allow interested readers to sort VWA domains with respect
to conservation and sequence of any or all of the three MIDAS elements.
RESULTS
Our analysis of the human proteome uncovered 86 proteins
containing 134 VWA domains (pseudogenes and splice variants excluded). Phylogenetic analysis and examination of domain architectures indicate
that most of these proteins fall into 15 clearly paralogous groups
(Figure 1). These molecules can be
clustered into the following categories.
Integrins
Integrin β Subunits
Integrin αβ heterodimers are the major cell surface receptors for extracellular
matrix and can also support cell-cell adhesion (Hynes, 1992). We have
searched the available human genome assemblies and found no additional
integrin β subunits beyond the eight already known. The same
is true for the mouse genome with the exception of a
molecule called pactolus (Chen et al., 1998). At the
sequence level, murine pactolus is closely related to the
integrin β2 subunit but it does not seem to associate with
α subunits (Garrison et al., 2001) and is not detected
in the human genome. We were unable to detect any credible homologues
of integrins in nonmetazoan phyla.
[...] β subunits are predicted to have VWA domains
(Tuckwell, 1999) [...] αvβ3 integrin
heterodimer (Xiong et al., 2001) [...] β3 subunit. Integrin β subunits also contain
an N-terminal PSI domain and repeated EGF-related domains termed I-EGF
domains (Beglova et al., 2002). [...] VWA
domain interfaces with the α subunit and ligand binds across the
interface (Xiong et al., 2002). [...] α/β associations, and ligand specificity is conferred by the α/β combination. Sequences within the integrin β VWA
domains are important determinants of both heterodimer and ligand
specificity (Takagi et al., 1997). [...] β subunit VWA domains (Table S3)
and integrin-ligand interaction involves joint coordination of
a metal ion by the integrin MIDAS and a carboxylate from the ligand (Xiong et al., 2002). [...] β subunits are among the most ancient of those
involved in cell adhesion that use these properties of the domain.
Integrin α Subunits
Nine of the 18 known
human integrin α subunits have a VWA domain (Figure 1). No
new α subunits (with or without VWA domains) were detected in
detailed searches of the complete human and mouse genomes. The
VWA-positive α subunits can be divided into two groups. Integrins
α1, α2, α10, and α11 associate with
the integrin β1 subunit to form receptors for collagen.
Integrins αM, αX, αD, and αL complex with the β2 subunit,
whereas αE complexes with β7, and all five are expressed on
leukocytes where they mediate cell-cell adhesion. Like integrin β
subunits, α subunits in general are restricted to Metazoa. VWA
domain-containing α subunits, however, seem to be a chordate-specific
radiation of the gene family because they have been found only in
vertebrates and in the primitive chordate Halocynthia
roretzi (Miyazawa et al., 2001) but are absent
in the C. elegans and Drosophila
melanogaster genomes. The H. roretzi α subunit
is expressed on phagocytic hemocytes, perhaps suggesting that it is an
orthologue of the leukocyte α subunits. Because H.
roretzi contains homologues of the B and C3 components of
complement (Nonaka and Azumi, 1999), it is possible that the H.
roretzi VWA-containing α chain functions as a C3b receptor
like αMβ2 in mammals. The β subunit VWA domains probably predate
the α subunit VWA domains because β subunits are found in all
metazoans, whereas the VWA domain-containing α subunits so far are
known only in chordates. Recombinant VWA domains from
integrin α subunits retain the ligand-binding specificity and
dependence on divalent cations observed in intact heterodimers (Randi
and Hogg, 1994; Ueda et al., 1994; reviewed in Shimaoka
et al., 2002). As a result, integrin α
VWA domains are understood in detail and are informative reference points for
considering the functions of other VWA domains. The crystal structures
of the isolated VWA domains of α1 (Nolte et al., 1999), α2 (Emsley et al., 1997), αL (Qu and Leahy, 1995), and αM (Lee et al., 1995; Baldwin et
al., 1998) are known. It is clear that two α subunit VWA domain
conformations (open and closed) represent high-affinity and
low-affinity ligand-binding states, respectively (Shimaoka et
al., 2001; Ma et al., 2002). This observation
highlights the possibility that nonintegrin VWA domains may be
subject to similar regulation.
[...] α subunit VWA domains and interaction with ligand involves mutual coordination of metal ion (Emsley et al., 2000). [...] β VWA
domain MIDAS motif is critical for ligand binding in heterodimers that include VWA domain-containing α subunits. Mutations in the β2 and β7 VWA domain MIDAS motifs inhibit ligand binding despite being
located a distance from the point of integrin-ligand contact (Bajt et al., 1995) [...]
Multidomain ECM Proteins (Figure 2)
von Willebrand Factor
vWF is a plasma and ECM protein
that mediates adhesion of platelets to fibrillar collagen underlying
injured vascular endothelium (reviewed in Sadler, 1998
). It is known
only in vertebrates. Mutations in the vWF gene result in von Willebrand
disease (vWD), a common human bleeding disorder (reviewed in Keeney and
Cumming, 2001
). There are three VWA domains in vWF (referred to as A1,
A2, and A3); A1 and A2 are related and neither has a MIDAS motif. At
the sequence level, A3 has an imperfect MIDAS. The crystal structures of A1 and A3 are known; neither contains a coordinated metal ion, and
the conserved MIDAS elements in A3 are not required for function
(Bienkowska et al., 1997; Emsley et al., 1998). However, the VWA domains support the vWF-mediated linkage
between platelets and fibrillar collagens. The platelet receptor
GPIb/IX/V binds to domain A1 (Cruz et al., 2000
) and
mutations in A1 cause type 2B and type 2M vWD. These are characterized,
respectively, by increased or decreased platelet interaction. Domain A3
supports cation-independent vWF interaction with fibrillar collagens
(Cruz et al., 1995
; Bienkowska et al.,
1997
; Romijn et al., 2001
). A missense mutation in A3
causes vWD due to defective collagen binding (Ribba et
al., 2001
). The function of the vWF A2 domain is less well
defined but missense mutations in A2 frequently cause type 2A vWD, a
dominant disorder characterized by a decrease in high-molecular-weight multimers of vWF. This is due to improper secretion of vWF or increased
proteolysis of the secreted mutant protein (Keeney and Cumming,
2001
) or perhaps to a defect in assembly of multimers. Recombinant vWF lacking A2 is resistant to proteolysis after
denaturation, suggesting that a protease sensitivity site within the
domain may be exposed in misfolded mutant protein (Lankhof et
al., 1997
). Thus, like those in integrins, the vWF VWA
domains are well-characterized domains clearly involved in
protein-protein interactions important for cell adhesion. One
difference is that, in contrast with the integrin domains, the
VWA domains of vWF seem to be cation-independent, perhaps related to
their imperfect MIDAS motifs.
Hemicentins, NG37, and DICE1
Hemicentin is an
adhesion molecule originally characterized in C. elegans, where it is secreted by body wall muscle cells and
gonadal leader cells. Once secreted, hemicentin assembles into
track-like matrix structures. C. elegans lacking
hemicentin have defects in mechanosensory neuron development and
germline cell migration (Vogel and Hedgecock, 2001
). Both humans and
mice have two homologues of C. elegans hemicentin but
orthologues are not detectable in the D. melanogaster,
yeast, and Arabidopsis thaliana genomes. Hemicentin is
the only confirmed VWA domain-containing ECM protein found in both
C. elegans and mammals (but see DICE1 below). The domain
architecture is similar between C. elegans and humans
with a single VWA domain near the N terminus followed by >40 Ig
domains. All hemicentin VWA domains have imperfect MIDAS motifs and are
highly conserved among themselves; region 1 has the sequence D-x-T-x-S,
T4 is a D, and D5 is conserved. In contrast with the C. elegans gene, mammalian hemicentins contain additional domains
that are likely to be functionally important (Figure 2). Mammalian
hemicentin 1 contains multiple TSP-1 domains. TSP-1 domains were
originally identified in thrombospondin and are present in a wide range
of proteins with roles in cell adhesion. The only other known proteins
containing a combination of TSP-1 and VWA domains are proteins found in
parasites from the protist kingdom of eukaryotes. These organisms cause
malaria in humans and the TSP-1-VWA proteins seem to be secreted or
transmembrane and function in adhesion and motility during the invasive
phase of the parasitic life cycle (reviewed in Naitza et al., 1998; Figure S8). The Plasmodium proteins are not closely
related to any of the metazoan VWA-containing proteins. Both mammalian
hemicentins also contain a G2F domain. The functions of G2F domains are
unclear but they seem to be restricted to mammalian hemicentins and the
metazoan nidogens, also adhesion proteins.
Polydom
Polydom is a secreted protein originally
identified in mice in a screen for molecules containing EGF domains
(Gilges et al., 2000
). The domain architecture consists
of an N-terminal VWA domain and a central PTX domain; the remainder of
the protein is made up of a large number of repeated CCP and EGF
modules (Figure 2). The human orthologue of polydom was previously
uncharacterized and has a similar domain architecture (see Table S4 for
sequence). The VWA domain of polydom contains a conserved MIDAS motif,
suggesting a potential role for divalent cations in polydom
function. Included in this list of enormous ECM molecules with
proven or likely roles in cell adhesion is a predicted nematode protein
of >1300 kDa (Figure 2). This molecule has four VWA domains (all
lacking MIDAS motifs), three EGF-like, six Ig-like, one CCP, and 10 Fn3
domains. Its function, like that of polydom, is unknown but their
domain composition, including VWA domains, strongly suggests roles in cell adhesion.
Matrilins
The matrilins are a family of
fibril-forming vertebrate ECM proteins with four paralogues in the
human genome (reviewed in Deak et al., 1999
). With the
exception of matrilin 3, which has a single VWA domain in all
orthologues examined, matrilins contain two VWA domains that flank a
variable number of EGF domains (Figure 2). Matrilins 1 and 3 are
expressed in cartilage and matrilins 2 and 4 have a widespread
distribution (Deak et al., 1999
). The matrilins form
oligomers (Wu and Eyre, 1998
), and both VWA domains of matrilin 1 play
a role in oligomerization (Chen et al., 1999
). All
matrilin VWA domains have MIDAS motifs (imperfect in the matrilin 3 VWA
domain) and mutation of the MIDAS motif in both matrilin 1 VWA domains
blocks filamentous network formation, suggesting that cation binding by
the VWA domains of matrilins may be required for function (Chen
et al., 1999). Matrilin-1 supports integrin α1β1-mediated adhesion and spreading of chondrocytes (Makihira et al., 1999). Two groups observed normal development in
mice lacking matrilin-1 despite abnormal type II collagen
fibrillogenesis (Aszodi et al., 1999
; Huang et
al., 1999
). Two different recessive mutations in the exon
encoding the VWA domain of matrilin-3 found in unrelated families cause
the EDM5 form of multiple epiphyseal dysplasia (Chapman et
al., 2001
). These mutations result in single amino acid changes
of V194D or R121W. These residues are conserved in all matrilin family
members, are not part of the MIDAS motif, and the disease-causing
mechanism is unknown. Like the VWA collagens, matrilins seem to be a
chordate invention.
Cochlin and Vitrin
Cochlin and vitrin are proteins
containing a single LCCL domain followed by two VWA domains (Figure 2).
Cochlin is expressed by fibrocytes in the inner ear, localized to
extracellular spaces, and missense mutations in the LCCL domain lead to
the autosomal-dominant hearing disorder DFNA9 (Robertson et
al., 2001
). Vitrin was isolated from the vitreous of the bovine
eye (Mayne et al., 1999
). The LCCL is a domain found in
Limulus Factor C, cochlins, and lgl1. The C-terminal VWA domains of
cochlin and vitrin form a clade with those of matrilins (51% bootstrap
support). Whole-molecule alignments also support the relationship
between cochlin/vitrin and matrilins. The C-terminal VWA domains of
both cochlin and vitrin have perfectly conserved MIDAS motifs. The
functions of the VWA domains in these molecules are unknown but, given
their extracellular localization and sequence similarities, they are likely similar to the roles in matrilins and VWA collagens (see below).
All seem most likely to comprise ECM molecules and many, perhaps all,
their VWA domains are involved in protein-protein interactions.
VWA Collagens (Figure 3)
The collagens are a large family
of extracellular matrix molecules defined by the presence of repeating
(G-X-Y) sequences that form a triple helix (Ricard-Blum et
al., 2000
). Sixteen collagens also contain 57 of the 134 VWA
domains found in the human proteome. Of these, 25 have perfectly
conserved MIDAS motifs (Figure 3, solid red stars); four imperfect
MIDAS motifs are also indicated (Figure 3, hollow red stars). Eight of
the 16 collagens are well characterized and have been studied
experimentally; the other eight are novel genes detected in this study
(sequences available in Table S4). These 16 collagens can be considered
in several related groups (Figure 3).
Collagen 6 (α1, α2, and α3 chains) is a
heterotrimer that forms filamentous structures flanked by globular domains containing the VWA domains. These globular domains can associate with themselves or with other ECM molecules. The monomeric form of collagen 6 is a heterotrimer consisting of 6α1, 6α2, and 6α3 chains. Antiparallel association of collagen 6 monomers forms dimers and the dimers laterally associate to form tetramers. The tetramers associate end-to-end to form a beaded filament. Collagen 6 interacts with a wide range of ECM components (reviewed in Ricard-Blum et al., 2000). [...] α3 interact with
fibrillar collagen 1 (Bonaldo et al., 1990). [...] α1 and col6α2 but is not clearly
related at the sequence level. Three new collagens (A, B, and C) are in
a cluster on chromosome 3 in human, and their orthologues are on
chromosome 9 in mouse. The gene predictions are a little ambiguous in
the current genome assemblies, and the structures shown in Figure 3
represent a best estimate based on both the human and mouse assemblies
(public and Celera). Collagens A-D show close homologies in domain
arrangements and similarities among VWA domains with collagen 6α3
(Figure 3). Although the gene predictions shown represent open reading
frames (ORFs) without stop codons, their domain organizations have some puzzling features such as lack of collagen repeats in collagens A and B
and the anomalous position of a VWA-Ku domain pair at the N terminus of
collagen C. Whether these represent genome assembly problems or
pseudogenes will have to await further information, but it seems likely
that there may exist a family of collagen subunits related to collagen 6α3.
It seems very likely that the VWA domains of these collagens are
involved in protein-protein interactions with other matrix proteins
and possibly with cells. This elaboration of VWA collagens seems to be
chordate- or vertebrate-specific.
Other Extracellular VWA Domain Proteins
We turn next to a different class of VWA-domain proteins where the presence of VWA domains comes as something of a surprise (Figure 1).
Calcium Channel α2δ Family
Voltage-gated calcium
channels are a complex of five proteins: α1, β1, γ, α2, and δ. The α2 and δ subunits result from proteolytic processing of a
single gene product (De Jongh et al., 1990). The α2
subunit consists of ~950 N-terminal amino acids and contains a VWA
and a cache domain (Figure 1). In most cases, at least an imperfect
MIDAS motif is conserved (Figure S3). Cache domains are only found in
these molecules and in a family of prokaryotic chemotaxis receptors
(Anantharaman and Aravind, 2000). The remainder of the molecule
comprises the δ subunit. The α2 and δ subunits are
disulfide-linked and Brickley et al. (1995) conclude
that the δ subunit contains transmembrane domains and that the α2
subunit is entirely extracellular. When coexpressed with the
pore-forming α1 subunit, the α2δ complex regulates various
functional properties of the channel complex (reviewed in Hobom
et al., 2000). The α2δ gene family has
orthologues in the D. melanogaster and C. elegans
genomes, but none are detectable in A. thaliana or yeast. There are five paralogues in human, two in C. elegans, and
four in D. melanogaster (Figure S4). In human, three
paralogues were characterized previously (CACNA2D1-3); a fourth was
deposited in the database from a full-length cDNA sequencing project
and is the most divergent of the five. The fifth is a
Genomescan-predicted protein identified in this study (see Table S4 for
sequence). Both C. elegans orthologues were targeted with
RNAi and no mutant phenotype was observed (Maeda et al., 2001). Mutant phenotypes have not been described for any of the
Drosophila genes. Because the annotation of several of
these molecules is uninformative in the databases, we have provided
tentative names based on orthologous relationships with human molecules
(Figure S3).
CLCA Family of Putative Chloride Channel Subunits
The
first member of the CLCA family (Pauli et al., 2000
) was
characterized in bovine aortic endothelial cells as a protein involved
in attachment of melanoma cells to lung endothelium and was named
lung-specific endothelial adhesion molecule (Lu-ECAM-1; Zhu et
al., 1991
). Lu-ECAM-1 was found to encode a protein 88% identical to a putative calcium-activated chloride channel from bovine
trachea (Elble et al., 1997
). Expression of
calcium-activated chloride channel in Xenopus oocytes
resulted in the appearance of anion-selective conductance (Cunningham
et al., 1995
). In the human genome, four paralogues are
located in a cluster on chromosome 1p22. The transcript encoding the
human orthologue of bovine Lu-ECAM-1 (hCLCA3) contains several stop
codons and results in the production of a secreted variant
corresponding to the N-terminal 262 amino acids (without the VWA
domain) of the other family members (Gruber and Pauli, 1999
). Human
CLCA1 and CLCA4 have conserved MIDAS motifs. Orthologues of the CLCA
family are not detectable in the C. elegans and
D. melanogaster genomes but a possible homologue is
present in the Xenopus expressed sequence tag
(EST) collection (GB# AW767641), suggesting that the CLCA family
is a chordate invention. It is likely that CLCA proteins do not
form channels by themselves. According to SMART predictions, CLCA
family members consist of a central VWA domain and a single
transmembrane domain near the C terminus (Figure 1). Published
transmembrane predictions for these molecules (Cunningham et
al., 1995
; Gruber et al., 1999
) require that the VWA
domain spans the plasma membrane, a situation unlikely to occur based
on VWA domain structural studies. The biochemical data presented by
Gruber et al. (1999)
, however, are entirely consistent with
the model of CLCAs derived from SMART analysis: a type I transmembrane
protein with an extracellular VWA domain (Figure 1). Reminiscent of the α2δ
proteins, CLCAs are expressed as 125-kDa precursor proteins
that are subsequently processed into 90- and 35-kDa subunits (reviewed
in Pauli et al., 2000). It seems to us more likely that the
CLCAs are accessory molecules for chloride channels, analogous to the α2δ
subunits of calcium channels. The roles of the VWA domain in
both types of molecules are unknown but could involve modulation of
channel activity by binding other proteins or divalent cations via the VWA domains.
Anthrax Toxin Receptor Family
There are three members
of the anthrax toxin receptor family, and the two that have been
studied experimentally are type I transmembrane proteins with a single
extracellular VWA domain. The third is a Genomescan-predicted protein
that we have termed anthrax toxin receptor (ATR) 3 (Figure 1; see Table
S4 for sequence). The ATR is the cellular receptor for the anthrax
protective antigen and facilitates entry of the toxin into cells. The
VWA domain of ATR mediates interaction with protective antigen and the
binding is dependent on divalent cations (Bradley et al., 2001).
The murine orthologue of this gene, TEM-8, was
identified as a gene up-regulated in colon tumor endothelium (St Croix
et al., 2000). The second member of this family, CMG-2,
was identified in a similar screen for genes up-regulated during
capillary morphogenesis (Bell et al., 2001). The normal
cellular ligand for the anthrax receptor is unknown but a recombinant
fragment of CMG-2 was shown in a solid-phase assay to bind collagen IV,
laminin and, to a lesser extent, fibronectin (Bell et al., 2001).
All three molecules have conserved MIDAS motifs and
the cytoplasmic domain of ATR can be alternatively spliced (Bradley
et al., 2001). Potential homologues of these molecules
are present in the chicken and zebrafish EST databases. The
observations discussed above suggest that these proteins are a family
of vertebrate ECM receptors expressed by endothelial cells.
Complement Factors
Complement factors B and C2 both
contain three CCP or Sushi domains, a single VWA domain with a
conserved MIDAS motif, and a trypsin-type serine protease domain
(Figure 1). Orthologues of these molecules are found from echinoderms
to chordates (Smith et al., 1998) but are not found in
D. melanogaster or C. elegans, suggesting that they may be a deuterostome-specific invention. During
complement activation, the CCP domains are cleaved off, resulting in
the formation of an active protease that cleaves and activates
complement C3. Complement C2 is in the classical pathway and complement
factor B is in the alternative pathway. The interactions of C2 with C4
and of factor B with C3b are both dependent on Mg2+-binding
sites within the VWA domains, and the VWA domain of factor B has been
shown to mediate the binding of C3 (Tuckwell et al., 1997), consistent with the common inferred function of VWA domains as
magnesium-dependent protein interaction domains. Up to this point,
all of the molecules discussed have extracellular VWA domains, and most
seem likely to play roles in cell adhesion or in protein-protein interactions among ECM molecules. However, VWA domains, like FN3 and Ig
domains (in titin, etc.), can also be found in intracellular molecules.
The next group of proteins provides a useful transition to
consideration of intracellular VWA domain-containing proteins because
some are extracellular, one is intracellular, and some have not been characterized.
Trypsin Inhibitors and Their Relatives
Three classes
of human proteins contain a combination of VIT and VWA domains: seven
inter-α-trypsin inhibitor heavy chains (ITIH), three members of the
novel Q9BVH8 family, and a single poly(ADP-ribose) polymerase [vault
poly(ADP-ribose) polymerase, vPARP; Figures 1 and S4]. The VWA domains of these proteins
represent a discrete clade in a comparison with other human VWA domains (25% bootstrap support). The most closely related domains outside this
group are those of the calcium channel α2δ subunits. The function
of the VIT domain (vault protein inter-α-trypsin domain) is not
known. It is found only in chordates with the exception of a single
bacterial protein predicted in the cyanobacterium Anabaena (NP_488452), which clusters with vPARP and the Q9BVH8 family with 100%
bootstrap support (Figure S4). With the exception of the Mg chelatases
discussed below, this molecule is the only clear VWA domain-containing
orthologue shared between bacteria and eukaryotes and may represent a
recent horizontal gene transfer event.
Intracellular Proteins: The Primordial Group?
One of the more surprising results of our survey (at least to us)
was the presence of a set of intracellular VWA domain-containing proteins (Figure 4). Intracellular VWA
domains have been noted previously (Ponting et al., 1999).
As we discuss below, these proteins are more broadly distributed
phylogenetically than the metazoan extracellular proteins discussed
above, suggesting that the intracellular proteins may be the most
ancient.
Rpn10-26S Proteasome Regulatory Subunit
The 26S
proteasome is a complex of proteins involved in degradation of
ubiquitin-tagged proteins (reviewed in Voges et al., 1999). The subunit of the complex that recognizes chains of ubiquitin is Rpn10 (or S5a, PSD4). A single orthologue of this molecule is
encoded by all completed eukaryotic genomes. In addition to the
N-terminal VWA domain, Rpn10 proteins contain ubiquitin-interacting motifs that are involved in recognition of multiubiquitin. Yeast cells
deficient in Rpn10 are viable, suggesting that Rpn10 is not the only
multiubiquitin-binding protein in cells. However, the VWA domain in
Rpn10 may play a role in efficient 26S complex function (Fu et
al., 2001). MIDAS motifs are not found in any Rpn10 (Table S3).
Region 1 of the yeast Rpn10 VWA domain has the sequence DxSxY, and Fu
et al. (2001) demonstrated a requirement for an acidic
residue in the D position by using several functional assays,
suggesting that the intact VWA domain is required for 26S proteasome
function. The VWA domain can also mediate interaction with Id1, a
transcription regulator that is itself regulated by ubiquitin-mediated
proteolysis (Anand et al., 1997; Bounpheng et al., 1999).
TFIIHp44
TFIIH is a multiprotein complex that is one
of the five general transcription factors that bind the RNA polymerase
II holoenzyme (Orphanides et al., 1996; Myer and Young, 1998).
The p44 subunit of TFIIH is the human orthologue of yeast SSL1
(Humbert et al., 1994). Orthologues of these genes are
also found in all completed eukaryotic genomes and all these proteins
have VWA domains (Figure 4). TFIIHp44 functions as a DNA helicase in
RNA polymerase II transcription initiation and DNA repair, and its
transcriptional activity is dependent on its C-terminal Zn-binding
domains (Fribourg et al., 2000). The function of the VWA
domain is unclear, but it may be involved in complex assembly. MIDAS
motifs are not conserved except for the fly, which has an imperfect
MIDAS motif. During the course of these analyses, we performed
pairwise comparisons of isolated VWA domains from each species
considered. One outcome of this work was the identification of a strong
relationship between TFIIHp44 and Rpn10. When the complete VWA domain
collections of eukaryotes are compared, the VWA domains of TFIIHp44 and
Rpn10 form a clade with >90% bootstrap support. This relationship is supported by Aravind and Ponting (1998),
who recognized them as homologues. Both proteins function in intracellular multiprotein complexes, so these results may suggest an analogous role for each in
their respective complexes.
Ku70/80 DNA Helicase Family
In humans, the lupus
autoantigens Ku70 and Ku86 form heterodimers, bind DNA, and are
involved in repair of double-stranded breaks in DNA (Walker et
al., 2001, and references therein). SMART predicts that human
Ku70 and Ku86 have VWA domains. The Ku DNA helicases have orthologues
in all completed eukaryotic genomes and, therefore, seem to be a
pan-eukaryotic family of VWA domain-containing proteins (Figure 4). The VWA
domain, however, is not consistently predicted in many of these
orthologues (Figure S6). The structure of the human Ku70/Ku86
heterodimer has been solved and both subunits contain a Rossmann fold,
but whether this should be considered a true VWA domain is unclear. The
VWA domains/Rossmann folds are probably not involved in dimer formation
but may support interactions with additional molecules (Walker
et al., 2001).
ATPases Associated with Diverse Cellular Activities (AAA)-VWA Proteins
Two proteins in the human genome contain a
combination of AAA and VWA domains. AAA domains are found in all
branches of life and have ATPase activity (Patel and Latterich, 1998).
There is growing evidence that AAA proteins are important in assembly
and disassembly of macromolecular complexes (Maurizi and Li, 2001). Both of the AAA-VWA proteins in the human genome were previously uncharacterized. The first is a 5000 amino acid protein common to
all completed eukaryotic genomes (AAAVWA-euk; Figure 4). The yeast
orthologue is required for cell viability (Winzeler et al., 1999)
and coprecipitates with a protein complex thought to be the
transport intermediate of 60S ribosomal subunits (Bassler et
al., 2001). These proteins have multiple AAA domains at the N
terminus and a single VWA domain near the C terminus. Human and
C. elegans AAAVWA-euk have perfect and the other orthologues have imperfect MIDAS motifs, suggesting that divalent cations may play
a role in their function. In the databases, the yeast, D. melanogaster, and A. thaliana orthologues are full
length. The human and C. elegans sequences are fragments, and
concatenation of adjacent Genomescan-predicted proteins is required to
generate full-length sequence. These sequences are available in Table
S4. The second AAA-VWA domain protein in the human genome has
orthologues in D. melanogaster and C. elegans but
is not detectable in yeast and A. thaliana, suggesting it
may be a metazoan invention (AAAVWA-ani; Figures 1, 5, and S7). These
molecules have two AAA domains and a single C-terminal VWA domain.
Perfect MIDAS motifs are found in the human and D. melanogaster orthologues, and the C. elegans molecule
has an imperfect motif, suggesting that cations may be important in
their function. The human protein is a fragment in the database but
concatenation of adjacent Genomescan-predicted proteins results in an
intact molecule that is >30% identical to the C. elegans
orthologue over 1800 amino acids. The Drosophila protein is
also a fragment in the database, and two adjacent predicted genes in
Flybase were concatenated to generate a full-length molecule. The
sequences of these assembled molecules are available in Table S4. The
function of these proteins is unknown.
Sec-23
The S. cerevisiae protein Sec-23 has orthologues in all completed eukaryotic genomes, but only the S. cerevisiae molecule is predicted to have a VWA domain. Sec-23 is part of a multiprotein complex involved in transport of vesicles from the endoplasmic reticulum to the Golgi. It is possible that the Sec-23 VWA domain prediction is a false positive because the e-value is high (1.01) and all three MIDAS regions are unconserved. Q9ZQH3 is an uncharacterized A. thaliana molecule whose VWA domain is similar to that of yeast Sec-23 (63% bootstrap support). This molecule, however, is distinct from the A. thaliana orthologues of Sec-23, all of which lack a predicted VWA domain.
These five intracellular VWA proteins found in all eukaryotes (Figure 4) could be viewed as the primordial set of VWA proteins. All are involved in multiprotein complexes, and it is tempting to speculate that the role of VWA domains is in the protein-protein interactions contributing to these complexes. In most cases, divalent cations are key to the structure and/or function of these complexes, suggesting another potential role for the VWA domains, although most lack MIDAS motifs.
Copines
The copines are phospholipid-binding
proteins, originally identified in Paramecium (Creutz
et al., 1998). There are nine family members in human
(three groups of three) and three orthologues in both C.
elegans and A. thaliana, but none are detectable
in D. melanogaster or S. cerevisiae
(Figure S5). Two of the human copines (A and B) are previously
uncharacterized Genomescan-predicted proteins and their sequences are
available in Table S4. It seems that the ancestral organism common to
Homo sapiens, C. elegans, and A.
thaliana had a single copine that independently expanded into
three in C. elegans and A. thaliana and
nine in H. sapiens. The phylogenetic distribution
suggests that copines have been lost from some eukaryotic phyla (Figure
S5). Each copine contains two C2 domains followed by a single VWA
domain except for copine W3, which has a single C2 domain. Three
additional C. elegans molecules (Figure
5) have VWA domains that are closely
related to the copines but all lack C2 domains. No functional
properties have been assigned to the VWA domains of copines. They
contain a functional MIDAS motif based on preferential binding to
magnesium and manganese (Tomsig and Creutz, 2000) despite the fact that the MIDAS motif is not perfectly conserved. In all 15 cases, region 1 of the copine MIDAS consists of the sequence D-x-T-x-S. The Paramecium sequence is D-x-T-x-Q at this location. The C2
domains mediate calcium-dependent phospholipid binding (Davletov and
Sudhof, 1993) and support oligomerization (Tomsig and Creutz, 2000).
In A. thaliana, copines may play a role in growth regulation.
BONZAI1 mutants that lack copine 1 (Figure S5) produce miniature plants when grown at 22°C (Hua et al., 2001). Mutations in the
same gene described by another group result in abnormal cell death in
response to low-humidity conditions (Jambunathan et al., 2001).
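The region 1 variants quoted in this section (D-x-T-x-S in the copines, D-x-T-x-Q in Paramecium) can be told apart with a simple pattern check. The following Python sketch is illustrative only and is not part of the original analysis: the function name classify_region1 and the example pentapeptides are hypothetical, and the canonical D-x-S-x-S form is assumed from the general MIDAS literature rather than stated in this section.

```python
import re

# Region 1 patterns. The canonical D-x-S-x-S form is an assumption from the
# general MIDAS literature; the two copine variants are quoted in the text.
REGION1_PATTERNS = [
    ("canonical (D-x-S-x-S)", re.compile(r"D[A-Z]S[A-Z]S")),
    ("copine-type (D-x-T-x-S)", re.compile(r"D[A-Z]T[A-Z]S")),
    ("Paramecium copine-type (D-x-T-x-Q)", re.compile(r"D[A-Z]T[A-Z]Q")),
]


def classify_region1(pentapeptide: str) -> str:
    """Label a five-residue region 1 segment by the first pattern it matches."""
    segment = pentapeptide.strip().upper()
    if len(segment) != 5:
        raise ValueError("region 1 is expected to be exactly five residues")
    for label, pattern in REGION1_PATTERNS:
        if pattern.fullmatch(segment):
            return label
    return "other"


# Hypothetical pentapeptides, invented purely for illustration.
if __name__ == "__main__":
    for peptide in ("DGSGS", "DITGS", "DATKQ", "DASEY"):
        print(peptide, "->", classify_region1(peptide))
```

A segment of the DxSxY form reported for yeast Rpn10 falls into the "other" class here, in line with the statement that Rpn10 proteins lack MIDAS motifs.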
Unknown Conserved in Humans and Flies (Q9H0S5/Q9VPY0)
The final human VWA protein, Q9H0S5, is an uncharacterized molecule
with a clear MIDAS motif and widespread expression based on EST
frequency. Q9VPY0 is the Drosophila orthologue of this gene with several embryonic ESTs (TIGR Drosophila gene
index TC71868; Adams et al., 2000; Figure S7). Q9VPY0
does not have a conserved MIDAS motif. Both molecules consist of an
N-terminal VWA domain followed by ~400 amino acids of sequence that
lacks obvious distinguishing features, including a signal sequence. The
human and D. melanogaster molecules are ~33%
identical over their entire length. Orthologues of these molecules are
not detectable in the C. elegans, A.
thaliana, or S. cerevisiae genomes. This is the
only VWA domain-containing protein found in humans and D.
melanogaster but not in the C. elegans genome.
Additional C. elegans VWA Proteins
It has been noted (Hutter et al., 2000) that C. elegans contains a novel set of ECM and adhesion genes, and the
same can be said for VWA-domain proteins (Figure 5, marked in blue).
This genome encodes a large number of C-type lectin proteins, several of which also contain a VWA domain. This combination of domains is not
seen in any other organism, and many of the proteins contain other
domains as well or instead (Figure 5; CUB, ZP, and EGF). These all seem
to reflect separate evolution of VWA-domain proteins in C. elegans. It remains unclear why this species has elaborated such a
plethora of apparent adhesion proteins; perhaps it has something to do
with cuticle formation or with the highly reproducible arrangements of
cells in this organism, providing additional zip codes not required in
less determinative organisms.
The C. elegans proteins mup-4 and mua-3 are VWA
domain-containing transmembrane receptors for extracellular matrix
(Figure 5). Mup-4/mua-3 have functional and molecular similarities with the mammalian integrin
β4, suggesting that the molecules may
represent an example of convergent evolution (Bercher et
al., 2001; Hong et al., 2001). Like all
integrin β subunits, mup-4 and mua-3 have extracellular VWA
domains with conserved MIDAS motifs and EGF modules and, like
integrin β4, their cytoplasmic domains link to intermediate
filaments. In support of this idea, D. melanogaster does not
have intermediate filaments and lacks a homologue of either integrin
β4 or mup-4 (Bercher et al., 2001).
Additional A. thaliana VWA Proteins
In contrast with the elaboration of novel extracellular VWA domain
proteins in C. elegans, A. thaliana seems to have
elaborated the intracellular portion of the VWA domain-containing
protein set (Figure 6). There are no
integrins or ECM proteins in the plant set of VWA proteins.
Two groups of A. thaliana proteins contain a combination of VWA and ring finger domains. Ring finger domains frequently have E3 ubiquitin-protein ligase activity. Rice and A. thaliana are the only sequenced organisms containing this combination, suggesting that it is a plant-specific domain architecture. One group of three paralogues has a VWA-ring arrangement. A fourth paralogue, Q9LVN6, lacks the ring domain but its VWA domain is closely related to those of the other three. The VWA domains of these molecules are also closely related (100% bootstrap support) to the A. thaliana copine VWA domains, including the D-x-T-x-S sequence at region 1 of the MIDAS (Figure 6).
The second group, which seems to be unrelated to the first, has eight paralogues and the architecture ring-VWA. One-half of these have conserved MIDAS motifs (Figure 6). The VWA domains of these molecules are related to the human Q9BVH8 family of VIT-VWA molecules (70% bootstrap support) and a group of bacterial proteins (NP_442565 and relatives; 92% bootstrap support).
With the possible exception of two groups of homologues, one with two members and the other with three members (Figure 6, below purple line), all A. thaliana VWA domain proteins seem to be intracellular. No functional information is available for the molecules below the purple line, but all have at least imperfect MIDAS motifs and one has a predicted signal sequence, suggesting that it might be secreted.
Prokaryotic VWA Proteins
In general, the archaeal and bacterial VWA domain-containing proteins are not well characterized. Of the 148 prokaryotic VWA domains in the databases, 90 have the words hypothetical, putative, ORF, or unknown in their descriptions. More than 80% of prokaryotic VWA domains have at least an imperfect MIDAS motif, indicating that divalent cation coordination may play a role in protein function (Figure S1, a-c, and Table S3).
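A tally of the kind reported above (90 of 148 descriptions containing the words hypothetical, putative, ORF, or unknown) can be reproduced with a simple keyword filter. The Python sketch below is a hedged illustration only: the function names and the sample descriptions are hypothetical, and whole-word, case-insensitive matching is an assumption about how the words were counted, not a description of the authors' actual procedure.

```python
import re

# The four words named in the text; whole-word, case-insensitive matching
# is an assumption made for this sketch.
KEYWORDS = ("hypothetical", "putative", "ORF", "unknown")
_KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def is_uncharacterized(description: str) -> bool:
    """Return True if a database description contains any of the keywords."""
    return _KEYWORD_RE.search(description) is not None


def count_uncharacterized(descriptions) -> int:
    """Count descriptions that contain at least one of the keywords."""
    return sum(1 for text in descriptions if is_uncharacterized(text))


# Hypothetical descriptions, invented purely for illustration.
if __name__ == "__main__":
    sample = [
        "hypothetical protein",
        "Mg chelatase subunit",
        "putative membrane protein",
        "ORF 12",
    ]
    print(count_uncharacterized(sample), "of", len(sample))
```

Whole-word matching means that an identifier such as "ORF12" written without a space would not be counted; relaxing the word boundaries would change the tally.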
In Archaea, there are 16 VWA domain proteins from nine different species (Table S2). Five of these are Mg chelatases based on sequence homology. The remaining proteins are all described as hypothetical or ORF. Within this list of unknown proteins, there are two groups of clear homologues; one group contains four members, the other contains two. All archaeal genomes sequenced to date have at least one VWA protein, but there are no universally present VWA domain proteins and the only molecules clearly related to eukaryotic VWA domain proteins are the Mg chelatases (see below).
In bacteria, there are 132 VWA domain proteins from 49 different species (Table S2). Examples of intracellular, secreted, and membrane-associated proteins can be found in this list. There are 11 Mg and four Co chelatases in the list. Eight molecules are orthologues of n