Molecular Biology of the Cell

Originally published as MBC in Press, 10.1091/mbc.E02-05-0259 on August 6, 2002

Vol. 13, Issue 10, 3369-3387, October 2002

ESSAY
Distribution and Evolution of von Willebrand/Integrin A Domains: Widely Dispersed Domains with Roles in Cell Adhesion and Elsewhere

Charles A. Whittaker and Richard O. Hynes

Howard Hughes Medical Institute, Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Submitted May 7, 2002; Revised June 25, 2002; Accepted July 10, 2002
Monitoring Editor: Thomas D. Pollard

    ABSTRACT

The von Willebrand A (VWA) domain is a well-studied domain involved in cell adhesion, in extracellular matrix proteins, and in integrin receptors. A number of human diseases arise from mutations in VWA domains. We have analyzed the phylogenetic distribution of this domain and the relationships among ~500 proteins containing this domain. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are all intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport, and the proteasome. A common feature seems to be involvement in multiprotein complexes. Subsequent evolution involved deployment of VWA domains by Metazoa in extracellular proteins involved in cell adhesion such as integrin β subunits (all Metazoa). Nematodes and chordates separately expanded their complements of extracellular matrix proteins containing VWA domains, whereas plants expanded their intracellular complement. Chordates developed VWA-containing integrin α subunits, collagens, and other extracellular matrix proteins (e.g., matrilins, cochlin/vitrin, and von Willebrand factor). Consideration of the known properties of VWA domains in integrins and extracellular matrix proteins allows insights into their involvement in protein-protein interactions and the roles of bound divalent cations and conformational changes. These allow inferences about similar functions in novel situations such as protease regulators (e.g., complement factors and trypsin inhibitors) and intracellular proteins (e.g., helicases, chelatases, and copines).

    INTRODUCTION

The rapid accumulation of genomic sequences offers both the challenge of understanding the functions of proteins encoded by those genomes and the opportunity for drawing inferences about the evolution of functions in proteins in different phyla. Effective annotation of the genes and their products requires both analyses of sequence and structural homologies among genes and incorporation of biochemical and biological information about the proteins to make best use of the genomic information.

We have been interested in the structure, function, and evolution of proteins involved in cell adhesion and interaction (Hynes and Zhao, 2000). Annotation of these proteins represents a significant challenge because we estimate that there are >2000 such proteins encoded by mammalian genomes. Searching proteomes for conserved domains is a first step toward overcoming that challenge. Some conserved domains have been extensively studied and their presence within a protein suggests specific biological properties. In this essay, we present an analysis of a subset of proteins: those including the so-called von Willebrand A (VWA) domain (reviewed in Colombatti et al., 1993; Tuckwell, 1999). Proteins containing VWA domains are present in Eukaryota (Metazoa, fungi, plants, and protists), Eubacteria, and Archaea. VWA domains are Rossmann folds consisting of a β-sheet sandwiched by multiple α helices. Many VWA domains bind metal ions via a noncontiguous sequence motif called the metal ion-dependent adhesion site (MIDAS). Frequently, VWA domain-containing proteins function in multiprotein complexes. The eponymous VWA domains of von Willebrand factor play key roles in the linkage of platelets to collagen (see below). The homologous inserted or I domains of some integrin α subunits are also involved in interactions with collagen and other ligands. Our initial interest in VWA/I domains arose from considerations of the evolution of integrin subunits because VWA domain-containing integrin α subunits seem to be restricted to chordates (Hynes and Zhao, 2000). In the course of investigating the verity and significance of this supposition, we explored the universe of VWA/I domains, and we present herein the results of that inquiry, including a number of novel proteins and new insights.

The majority of well-characterized VWA domains are found in cell adhesion and extracellular matrix (ECM) proteins (Tuckwell, 1999). In many cases, it is clear, or plausible, that they are involved in protein-protein (e.g., receptor-ligand) interactions that frequently involve divalent cations. In exploring what might be the origin of this domain, we discovered that the VWA domains most widely distributed phylogenetically are found in intracellular proteins present in all eukaryotic genomes sequenced thus far. The roles of the VWA domains in these intracellular proteins are not clear, but many are components of multiprotein complexes and a plausible hypothesis is that the VWA domains mediate protein-protein interactions involved in the assembly or function of these complexes. It seems that the presumptive primordial VWA domains subsequently were deployed by metazoans in extracellular protein-protein interactions, with integrin β subunits as very early representatives. Later incorporations of VWA domains into integrin α subunits and into many ECM molecules seem to be predominantly chordate elaborations, presumably related to the large expansion of ECM complexity in chordates, although other eukaryotes, notably Caenorhabditis elegans, have also deployed VWA domains in ECM proteins. VWA domains also appear in other surprising and interesting contexts such as ion channel subunits, the anthrax toxin receptor, and protease regulators.

In this review, we consider first the best understood VWA domains: those in integrins and von Willebrand factor (vWF), from which one can infer their likely functions. We then consider the other contexts in which one finds these domains and discuss their possible functions and potential evolution.

    MATERIALS AND METHODS

As of 24 March 2002, the SMART nonredundant database (nrdb) (a merger of swissprot, swissnew, sptrembl, and sptremblnew) contained 948 proteins containing 1196 VWA domains, with new additions being deposited regularly (Letunic et al., 2002; http://smart.embl-heidelberg.de/). The survey presented in this essay is based largely on the SMART database and analysis because of the experience with integrin β subunits. The properties of integrin β subunits prompted speculation that they contained a VWA domain (Lee et al., 1995; Tozer et al., 1996; Loftus and Liddington, 1997; Tuckwell and Humphries, 1997), in spite of the fact that many domain prediction algorithms were unable to support this claim. However, the SMART analysis techniques predict the VWA domain in integrin β subunits with high confidence (Tuckwell, 1999; Ponting et al., 2000; Schultz et al., 2000). Recently, the crystal structure of the extracellular portion of the integrin αvβ3 was reported (Xiong et al., 2001), and the presence of a VWA domain in the β subunit was confirmed. Therefore, we feel that the SMART algorithm for prediction of VWA domains is currently the best available, and we have used SMART analysis as the basis for this essay. We have complemented these analyses with Interpro where helpful. In addition, we have used each human VWA domain predicted by SMART (as of March 2002) to query the Genomescan-predicted protein database at National Center for Biotechnology Information (Yeh et al., 2001; http://www.ncbi.nlm.nih.gov/genome/seq/HsBlast.html). The National Center for Biotechnology Information genome annotation effort is ongoing, and many novel proteins detected by Genomescan have not yet been assigned to the databases included in the SMART nonredundant database.
In many cases, particularly the novel collagens we report herein, the public and private human and mouse genome assemblies were also used extensively to arrive at plausible structures for uncharacterized molecules.

In addition to overall homology of domain structure and database queries with basic local alignment search tool (BLAST), we also made use of whole-molecule and individual VWA domain (extracted using SMART) alignments. Alignments, bootstrap analysis, and tree preparation were done using ClustalW, ClustalTree, and DrawTree provided by the Biology Workbench 3.2 (http://workbench.sdsc.edu/; Felsenstein, 1989) and the VectorNTI software package. Frequently, bootstrap numbers, expressed as a percentage of 1000 pseudoreplicates, are provided to give a confidence level for a given relationship (Efron et al., 1996). In the case of whole-molecule alignments, the resulting phylogenetic trees were used to provide a framework for discussion of families of molecules. In the case of the individual domain alignments, phylogenetic trees and bootstrap values were used to identify uncharacterized homologues by calculating relatedness among different domains. This type of analysis was more systematic and quantifiable than individual BLAST searches and was particularly useful in cases where the molecules in the databases were fragments or gene predictions. Due to the relatively large number of VWA domains in the database, it was sometimes necessary to create species-specific minimal sets where a single representative sequence was selected for each group of closely related sequences (usually paralogues; Table S1). This was done by first removing nearly identical sequences such as allelic variants and fragments to create a more strictly defined nonredundant set. The nonredundant sets were then aligned and subjected to bootstrap analysis and groups of closely related sequences were reduced by randomly selecting a single representative from each node with bootstrap support >94%. These minimal collections were then small enough to allow pairwise comparisons of the complement of VWA domains within a species or group.
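The reduction to a minimal set described above can be illustrated with a short Python sketch. It is not the authors' actual procedure or tooling (they used Biology Workbench and VectorNTI); the function names, the input layout, and the assumption that well-supported clades are disjoint are all hypothetical choices made for illustration. Only the >94% threshold and the 1000 pseudoreplicates come from the text.

```python
import random


def bootstrap_percent(times_recovered, pseudoreplicates=1000):
    """Express bootstrap support as a percentage of pseudoreplicates."""
    return 100.0 * times_recovered / pseudoreplicates


def minimal_set(sequences, clades, threshold=94.0, rng=None):
    """Collapse each well-supported clade to one randomly chosen member.

    sequences: list of sequence identifiers (hypothetical names).
    clades:    list of (members, bootstrap_percent) pairs, where members is
               a set of identifiers; clades are ASSUMED to be disjoint.
    threshold: clades with support strictly above this value are collapsed.
    """
    rng = rng or random.Random()
    kept = list(sequences)
    for members, support in clades:
        if support > threshold:
            # Sort first so the pick depends only on the RNG state,
            # not on the arbitrary iteration order of a set.
            representative = rng.choice(sorted(members))
            kept = [s for s in kept
                    if s not in members or s == representative]
    return kept


if __name__ == "__main__":
    seqs = ["a1", "a2", "a3", "b1", "b2", "c1"]
    groups = [({"a1", "a2", "a3"}, 98.0), ({"b1", "b2"}, 60.0)]
    print(minimal_set(seqs, groups, rng=random.Random(0)))
```

In this toy input, the three `a` sequences form a clade with 98% support and are reduced to a single representative, whereas the `b` pair (60% support) and the singleton `c1` are left untouched.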

We also investigated the occurrence of MIDAS motifs, first defined as metal ion-binding motifs in the VWA domains of integrin α subunits (Lee et al., 1995). Because metal ions play key roles in the functions of VWA domains in integrins, we scored the three noncontiguous elements of the MIDAS motif (D-x-S-x-S...T...D) in hand-edited alignments of all the VWA domains identified in the five completed eukaryotic genomes as well as all prokaryotic VWA domains and a subset of protist VWA domains. For the purpose of discussion, the D-x-S-x-S will be referred to as region 1 and the other conserved residues will be called T4 and D5 (Figure S1, a-c). Approximately 46% of VWA domains have a perfectly conserved MIDAS motif (see figures, red stars); others are missing one or more elements. Structural studies of integrin VWA domains and biochemical analysis of copines (Tomsig and Creutz, 2000) indicate that a perfectly conserved MIDAS motif is not required for metal ion binding. To accommodate these observations and emphasize the importance in metal binding of the region 1 D followed by spaced alcohol residues (S or T), we coined the term imperfect MIDAS (open red stars) to refer to VWA domains that lack a subset of MIDAS elements but are likely to bind metal ions. Examples of imperfect motifs include those with region 1 (D-x-S-x-S) but without one or both of T4 and D5 or those with conservative changes in region 1 (D-x-T-x-S in copines) with and without conservation of T4 and D5. It is likely that confident conclusions regarding the presence or absence of a functional MIDAS motif will require structural analysis of the VWA domain in question in both ligand-bound and -unbound states (Xiong et al., 2002).
In light of these considerations, we have presented our analysis of MIDAS motifs in an Excel (Microsoft, Redmond, WA) spreadsheet (Table S3) that will allow interested readers to sort VWA domains with respect to conservation and sequence of any or all of the three MIDAS elements.
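The perfect/imperfect distinction can be made concrete with a small Python sketch. The scoring in this study was done by hand on edited alignments, so the rule below is only one possible reading of the definitions given above: region 1 is taken to be a five-residue string already extracted from an alignment, and any region 1 of the form D-x-(S/T)-x-(S/T) is treated as at least imperfect. The function name, the three-way labels, and the example residues are assumptions made for illustration.

```python
import re

# Region 1 as defined in the text: D-x-S-x-S (x = any residue).
REGION1_PERFECT = re.compile(r"D.S.S")
# Relaxed form (an assumption of this sketch): D followed by spaced
# alcohol residues, S or T, e.g. D-x-T-x-S as in copines.
REGION1_RELAXED = re.compile(r"D.[ST].[ST]")


def classify_midas(region1, t4, d5):
    """Classify a MIDAS motif from its three noncontiguous elements.

    region1: five aligned residues at the D-x-S-x-S position.
    t4, d5:  single residues aligned with the conserved T and D.
    Returns "perfect", "imperfect", or "none".
    """
    if REGION1_PERFECT.fullmatch(region1) and t4 == "T" and d5 == "D":
        return "perfect"
    if REGION1_RELAXED.fullmatch(region1):
        # Region 1 (or a conservative variant) is present, but at least
        # one element deviates from the perfect motif.
        return "imperfect"
    return "none"


if __name__ == "__main__":
    # Residues at the x positions are made up for illustration.
    print(classify_midas("DGSGS", "T", "D"))  # all three elements conserved
    print(classify_midas("DGTGS", "D", "D"))  # D-x-T-x-S with D at T4
    print(classify_midas("AGSGS", "T", "D"))  # region 1 D is missing
```

A classification of this kind only flags sequence conservation; as noted above, whether a given motif actually binds metal ions is a structural question.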

    RESULTS

Our analysis of the human proteome uncovered 86 proteins containing 134 VWA domains (pseudogenes and splice variants excluded). Phylogenetic analysis and examination of domain architectures indicate that most of these proteins fall into 15 clearly paralogous groups (Figure 1). These molecules can be clustered into the following categories.


Figure 1. Cell adhesion and extracellular matrix molecules predominate in the diverse collection of human VWA domain-containing proteins. The domain architecture of all human VWA domain-containing proteins is indicated in this figure and the following ones as noted. Paralogous molecules are grouped together in the phylogenetic tree derived from a ClustalW alignment. The groups of paralogues or unrooted individual molecules have been shuffled along the vertical axis for clarity of presentation, so there is no information in the root of the tree (vertical gray bar). Molecules with known or likely roles in cell adhesion are marked by black stars in the gray bar on the left-hand side of the figure. Perfect and imperfect MIDAS motifs are indicated by solid and hollow red stars, respectively. All molecules above the purple bar seem to be extracellular; those below the pink bar are intracellular. Signal sequences are designated by a red line and transmembrane segments by a blue bar. Proteins that were assembled or originally characterized in this study are in red, and their sequences are available in Table S4. See Table S2 to cross-reference molecules in this figure with database identifiers. Figure S2 contains SMART or INTERPRO identifiers for the various domains shown in the diagram (see also http://web.mit.edu/ccrhq/hyneslab/vwapaper/FigureS2.html). The diagrams in this and other figures are extracted largely from SMART (http://smart.embl-heidelberg.de/) but supplemented in some cases with information from other sources (see text).

Integrins

Integrin β Subunits Integrin heterodimers are the major cell surface receptors for extracellular matrix and can also support cell-cell adhesion (Hynes, 1992). We have searched the available human genome assemblies and found no additional integrin β subunits beyond the eight already known. The same is true for the mouse genome with the exception of a molecule called pactolus (Chen et al., 1998). At the sequence level, murine pactolus is closely related to the integrin β2 subunit but it does not seem to associate with α subunits (Garrison et al., 2001) and is not detected in the human genome. We were unable to detect any credible homologues of β integrins in nonmetazoan phyla.

All integrin β subunits are predicted to have VWA domains (Tuckwell, 1999; Ponting et al., 2000; Figure 1). This prediction was recently confirmed by analyses of the crystal structure of the extracellular portion of the αvβ3 integrin heterodimer (Xiong et al., 2001), which shows a clear VWA domain in the β3 subunit. Integrin β subunits also contain an N-terminal PSI domain and repeated EGF-related domains termed I-EGF domains (Beglova et al., 2002). The integrin β VWA domain interfaces with the α subunit and ligand binds across the interface (Xiong et al., 2002). There is selectivity in α/β associations, and ligand specificity is conferred by the α/β combination. Sequences within the integrin β VWA domains are important determinants of both heterodimer and ligand specificity (Takagi et al., 1997, 2002). MIDAS motifs (at least imperfect) are present in all β subunit VWA domains (Table S3) and integrin-ligand interaction involves joint coordination of a metal ion by the integrin MIDAS and a carboxylate from the ligand (Xiong et al., 2002). As we shall see, these features are commonly observed in VWA domains, and the VWA domains of integrin β subunits are among the most ancient of those involved in cell adhesion that use these properties of the domain.

Integrin α Subunits Nine of the 18 known human integrin α subunits have a VWA domain (Figure 1). No new α subunits (with or without VWA domains) were detected in detailed searches of the complete human and mouse genomes. The VWA-positive α subunits can be divided into two groups. Integrins α1, α2, α10, and α11 associate with integrin β1 subunit to form receptors for collagen. Integrins αM, αX, αD, and αL complex with the β2 subunit, whereas αE complexes with β7, and all five are expressed on leukocytes where they mediate cell-cell adhesion. Like integrin β subunits, α subunits in general are restricted to Metazoa. VWA domain-containing α subunits, however, seem to be a chordate-specific radiation of the gene family because they have been found only in vertebrates and in the primitive chordate Halocynthia roretzi (Miyazawa et al., 2001) but are absent in the C. elegans and Drosophila melanogaster genomes. The H. roretzi α subunit is expressed on phagocytic hemocytes, perhaps suggesting that it is an orthologue of the leukocyte α subunits. Because H. roretzi contains homologues of the B and C3 components of complement (Nonaka and Azumi, 1999), it is possible that the H. roretzi VWA-containing α chain functions as a C3b receptor like αMβ2 in mammals. The β subunit VWA domains probably predate the α subunit VWA domains because β subunits are found in all metazoans, whereas the VWA domain-containing α subunits so far are known only in chordates.

Recombinant VWA domains from integrin α subunits retain the ligand-binding specificity and dependence on divalent cations observed in intact heterodimers (Randi and Hogg, 1994; Ueda et al., 1994; reviewed in Shimaoka et al., 2002). As a result, integrin α VWA domains are understood in detail and are informative reference points for considering the functions of other VWA domains.
The crystal structures of the isolated VWA domains of α1 (Nolte et al., 1999), α2 (Emsley et al., 1997), αL (Qu and Leahy, 1995), and αM (Lee et al., 1995; Baldwin et al., 1998) are known. It is clear that two α subunit VWA domain conformations (open and closed) represent high-affinity and low-affinity ligand-binding states, respectively (Shimaoka et al., 2001; Ma et al., 2002). This observation highlights the possibility that nonintegrin VWA domains may be subject to similar regulation.

All available data emphasize the importance of the VWA domain and MIDAS motif in integrin-ligand interaction (Xiong et al., 2002). The MIDAS motif is perfectly conserved in all integrin α subunit VWA domains and interaction with ligand involves mutual coordination of metal ion (Emsley et al., 2000). The β VWA domain MIDAS motif is critical for ligand binding in heterodimers that include VWA domain-containing α subunits. Mutations in the β2 and β7 VWA domain MIDAS motifs inhibit ligand binding despite being located a distance from the point of integrin-ligand contact (Bajt et al., 1995; Higgins et al., 2000). Given the high percentage of nonintegrin VWA domains with conserved MIDAS motifs (Figure S1, a-c, and Table S3), the coordination of metal ion by MIDAS and ligand may be a common feature of protein-protein interactions mediated by VWA domains.

Multidomain ECM Proteins (Figure 2)

von Willebrand Factor vWF is a plasma and ECM protein that mediates adhesion of platelets to fibrillar collagen underlying injured vascular endothelium (reviewed in Sadler, 1998). It is known only in vertebrates. Mutations in the vWF gene result in von Willebrand disease (vWD), a common human bleeding disorder (reviewed in Keeney and Cumming, 2001). There are three VWA domains in vWF (referred to as A1, A2, and A3); A1 and A2 are related and neither has a MIDAS motif. At the sequence level, A3 has an imperfect MIDAS. The crystal structures of A1 and A3 are known and neither contains a coordinated metal ion and the conserved MIDAS elements in A3 are not required for function (Bienkowska et al., 1997; Emsley et al., 1998). However, the VWA domains support the vWF-mediated linkage between platelets and fibrillar collagens. The platelet receptor GPIb/IX/V binds to domain A1 (Cruz et al., 2000) and mutations in A1 cause type 2B and type 2M vWD. These are characterized, respectively, by increased or decreased platelet interaction. Domain A3 supports cation-independent vWF interaction with fibrillar collagens (Cruz et al., 1995; Bienkowska et al., 1997; Romijn et al., 2001). A missense mutation in A3 causes vWD due to defective collagen binding (Ribba et al., 2001). The function of the vWF A2 domain is less well defined but missense mutations in A2 frequently cause type 2A vWD, a dominant disorder characterized by a decrease in high-molecular-weight multimers of vWF. This is due to improper secretion of vWF or increased proteolysis of the secreted mutant protein (Keeney and Cumming, 2001) or perhaps to a defect in assembly of multimers. Recombinant vWF lacking A2 is resistant to proteolysis after denaturation, suggesting that a protease sensitivity site within the domain may be exposed in misfolded mutant protein (Lankhof et al., 1997). 
Thus, like those in integrins, the vWF VWA domains are well-characterized domains clearly involved in protein-protein interactions important for cell adhesion. One difference is that, in contrast with the integrin domains, the VWA domains of vWF seem to be cation-independent, perhaps related to their imperfect MIDAS motifs.


Figure 2. Noncollagenous VWA domain-containing ECM proteins. All molecules shown are human except where designated (see text for discussion of orthologues/paralogues). All these proteins are found only in Metazoa and are secreted to the extracellular matrix with the possible exceptions of NG37 and DICE (see text), and most are clearly implicated in cell adhesion. Domains are designated as in Figures 1 and S2. Perfect and imperfect MIDAS motifs are indicated by solid and hollow red stars, respectively.

Hemicentins, NG37, and DICE1 Hemicentin is an adhesion molecule originally characterized in C. elegans, where it is secreted by body wall muscle cells and gonadal leader cells. Once secreted, hemicentin assembles into track-like matrix structures. C. elegans lacking hemicentin have defects in mechanosensory neuron development and germline cell migration (Vogel and Hedgecock, 2001). Both humans and mice have two homologues of C. elegans hemicentin but orthologues are not detectable in the D. melanogaster, yeast, and Arabidopsis thaliana genomes. Hemicentin is the only confirmed VWA domain-containing ECM protein found in both C. elegans and mammals (but see DICE1 below). The domain architecture is similar between C. elegans and humans with a single VWA domain near the N terminus followed by >40 Ig domains. All hemicentin VWA domains have imperfect MIDAS motifs and are highly conserved among themselves; region 1 has the sequence D-x-T-x-S, T4 is a D, and D5 is conserved.

In contrast with the C. elegans gene, mammalian hemicentins contain additional domains that are likely to be functionally important (Figure 2). Mammalian hemicentin 1 contains multiple TSP-1 domains. TSP-1 domains were originally identified in thrombospondin and are present in a wide range of proteins with roles in cell adhesion. The only other known proteins containing a combination of TSP-1 and VWA domains are proteins found in parasites from the protist kingdom of eukaryotes. These organisms cause malaria in humans and the TSP-1-VWA proteins seem to be secreted or transmembrane and function in adhesion and motility during the invasive phase of the parasitic life cycle (reviewed in Naitza et al., 1998; Figure S8). The Plasmodium proteins are not closely related to any of the metazoan VWA-containing proteins. Both mammalian hemicentins also contain a G2F domain.
The functions of G2F domains are unclear but they seem to be restricted to mammalian hemicentins and the metazoan nidogens, also adhesion proteins.

The VWA domain of a potentially secreted protein, termed NG37 in human and G7c in mouse, is closely related to the VWA domain of hemicentin. The residues at the MIDAS positions are identical with those in all three hemicentins, suggesting that they may have similar divalent cation-binding properties. NG37 and G7c were identified in genomic sequencing of the major histocompatibility complex class III region (Snoek et al., 1998, 2000). The close sequence relationship among the VWA domains of NG37 and hemicentins, coupled with the predicted signal sequence make it seem likely that NG37/G7c plays some role in cell adhesion.

Kumanovics and Lindahl (2001) and our results also suggest a relationship between the VWA domains of NG37/G7c and the domain found in DICE1, a protein encoded by a gene located in a human tumor suppressor locus (Wieland et al., 1999). A Genomescan-predicted protein on the X chromosome is a paralogue of DICE1 (see Table S4 for sequence). Mice have orthologues of both human DICE1 and DICE1-2, and single orthologues are present in the genomes of C. elegans and D. melanogaster (Wieland et al., 2001). Orthologues were not detected in Saccharomyces cerevisiae and A. thaliana. In all cases, the DICE VWA domains are located at the N terminus of the proteins. MIDAS motifs are perfectly conserved except for C. elegans where S substitutes for T4. The DICE proteins are being considered herein as potential ECM proteins because of their VWA domain similarities with the hemicentin/NG37 molecules. However, signal sequences are not predicted in any DICE orthologue and the mammalian molecules have a DEAD box motif that is found in RNA helicases (reviewed in Tanner and Linder, 2001). Because the DEAD box is not conserved in the C. elegans and D. melanogaster orthologues, we have weighted the relationship with hemicentin/NG37 but inclusion in the ECM group is tentative.

Polydom Polydom is a secreted protein originally identified in mice in a screen for molecules containing EGF domains (Gilges et al., 2000). The domain architecture consists of an N-terminal VWA domain and a central PTX domain; the remainder of the protein is made up of a large number of repeated CCP and EGF modules (Figure 2). The human orthologue of polydom was previously uncharacterized and has a similar domain architecture (see Table S4 for sequence). The VWA domain of polydom contains a conserved MIDAS motif, suggesting a potential role for divalent cations in polydom function.

Included in this list of enormous ECM molecules with proven or likely roles in cell adhesion is a predicted nematode protein of >1300 kDa (Figure 2). This molecule has four VWA domains (all lacking MIDAS motifs), three EGF-like, six Ig-like, one CCP, and 10 Fn3 domains. Its function, like that of polydom, is unknown but their domain composition, including VWA domains, strongly suggests roles in cell adhesion.

Matrilins The matrilins are a family of fibril-forming vertebrate ECM proteins with four paralogues in the human genome (reviewed in Deak et al., 1999). With the exception of matrilin 3, which has a single VWA domain in all orthologues examined, matrilins contain two VWA domains that flank a variable number of EGF domains (Figure 2). Matrilins 1 and 3 are expressed in cartilage and matrilins 2 and 4 have a widespread distribution (Deak et al., 1999). The matrilins form oligomers (Wu and Eyre, 1998), and both VWA domains of matrilin 1 play a role in oligomerization (Chen et al., 1999). All matrilin VWA domains have MIDAS motifs (imperfect in the matrilin 3 VWA domain) and mutation of the MIDAS motif in both matrilin 1 VWA domains blocks filamentous network formation, suggesting that cation binding by the VWA domains of matrilins may be required for function (Chen et al., 1999). Matrilin-1 supports integrin α1β1-mediated adhesion and spreading of chondrocytes (Makihira et al., 1999). Two groups observed normal development in mice lacking matrilin-1 despite abnormal type II collagen fibrillogenesis (Aszodi et al., 1999; Huang et al., 1999). Two different recessive mutations in the exon encoding the VWA domain of matrilin-3 found in unrelated families cause the EDM5 form of multiple epiphyseal dysplasia (Chapman et al., 2001). These mutations result in single amino acid changes of V194D or R121W. These residues are conserved in all matrilin family members, are not part of the MIDAS motif, and the disease-causing mechanism is unknown. Like the VWA collagens, matrilins seem to be a chordate invention.

Cochlin and Vitrin Cochlin and vitrin are proteins containing a single LCCL domain followed by two VWA domains (Figure 2). Cochlin is expressed by fibrocytes in the inner ear, localized to extracellular spaces, and missense mutations in the LCCL domain lead to the autosomal-dominant hearing disorder DFNA9 (Robertson et al., 2001). Vitrin was isolated from the vitreous of the bovine eye (Mayne et al., 1999). The LCCL is a domain found in Limulus Factor C, cochlins, and lgl1. The C-terminal VWA domains of cochlin and vitrin form a clade with those of matrilins (51% bootstrap support). Whole-molecule alignments also support the relationship between cochlin/vitrin and matrilins. The C-terminal VWA domains of both cochlin and vitrin have perfectly conserved MIDAS motifs. The functions of the VWA domains in these molecules are unknown but, given their extracellular localization and sequence similarities, they are likely similar to the roles in matrilins and VWA collagens (see below). All seem most likely to comprise ECM molecules and many, perhaps all, their VWA domains are involved in protein-protein interactions.

VWA Collagens (Figure 3) The collagens are a large family of extracellular matrix molecules defined by the presence of repeating (G-X-Y) sequences that form a triple helix (Ricard-Blum et al., 2000). Sixteen collagens also contain 57 of the 134 VWA domains found in the human proteome. Of these, 25 have perfectly conserved MIDAS motifs (Figure 3, solid red stars); four imperfect MIDAS motifs are also indicated (Figure 3, hollow red stars). Eight of the 16 collagens are well characterized and have been studied experimentally; the other eight are novel genes detected in this study (sequences available in Table S4). These 16 collagens can be considered in several related groups (Figure 3).


Figure 3.   Sixteen unconventional mammalian collagens contain VWA domains. The figure shows the predicted structures of the eight known unconventional collagens and eight related novel proteins (red), all containing VWA domains. The eight novel proteins have been given letter designations for the purpose of discussion, and their sequences are available in Table S4. Appropriate names should be given after cDNA analysis. On the left is a phylogenetic tree showing only the most confidently clustered groups of paralogues. The vertical gray bar replaces the root of the tree. The VWA domains of these collagens comprise a clade and are more closely related among themselves than with any other VWA domains. They also show familial relationships within the group, shown herein by shading and boxes designating the degree of similarity with percentage bootstrap scores for individual groups shown (yellow ovals). Perfect and imperfect MIDAS motifs are indicated by solid and hollow red stars, respectively. Also marked are all occurrences of RGD motifs (red arrows). Domain structures shown were predicted by SMART except for the collagen motifs (black boxes, each of which represents 20 G-X-Y repeats predicted by Pfam). SMART does not predict VWA 2 or the KU domain in collagen 7α1 but the domains are predicted by Pfam so they were added to the figure. The asterisk (*) indicates that colF is the mouse orthologue used to show C-terminal collagen motifs (see text). The dagger (†) indicates these molecules lack predicted collagen motifs but are included because of close homology of their VWA domains with those of known collagens, as is also true for collagens A-D (see text). (a) The human orthologue of chicken collagen 20 is KIAA1510, identified in a full-length cDNA sequencing project (Nagase et al., 2000). (b) Current assemblies are a bit ambiguous (see text).

Collagens 12, 14, 20, and 21 (Fitzgerald and Bateman, 2001; Tuckwell, 2002) are fibril-associated collagens with interrupted triple helices (FACIT collagens) that associate with fibrillar collagen via their C-terminal collagenous domains. Their N-terminal segments containing the VWA domains extend out from the surface of the collagen fibrils and are suggested or known to bind other ECM components. The newly predicted collagen F seems to fall into this same group. The gene predicted in the human genome assembly lacks C-terminal collagen repeats, possibly due to a gap in the sequence assembly near the 3' end of the gene. To present a more likely picture of this molecule, we have shown the mouse orthologue of collagen F. Both the human and the mouse sequences are available in Table S4. These five FACIT collagens share a related set (clade) of VWA domains (Figure 3, highlighted in yellow) as well as the universal presence of a TSPN domain just N-terminal of their C-terminal collagenous segments. Collagens 12, 14, and 20 also share tandem arrays of FN3 repeats interspersed with a closely related set of VWA domains (Figure 3). Both collagen 12 and collagen 14 are cell adhesion collagens and both have RGD sequences.

Collagen 7 shares many of the features and homologies of FACIT collagens but differs from them in having a C-terminal KU domain that forms a globular noncollagenous domain. Collagen 7 is a homotrimer with a cross-like structure. The short arms of the cross are formed by the three globular N-terminal noncollagenous domains consisting of the VWA and FN3 domains. The long arm is the collagen triple-helical region and a globular C-terminal domain. In the extracellular space, homotrimers associate in an antiparallel manner via the C-terminal globular domains. The N-terminal globular domains bind a variety of ECM/BM components, including collagens 1 and 4 and laminins 5 and 6. This bivalent ECM-binding feature of collagen 7 results in linkage of epidermal basal lamina and dermal ECM. Mutations in collagen 7 are involved in many types of epidermolysis bullosa, a human skin disease characterized by separation of dermal and epidermal layers (Pulkkinen and Uitto, 1999). The new predicted protein colG is included here because of its homologies, although it has no predicted collagen repeats and may not be a true collagen.

Another large group of collagens includes collagen 6 and related proteins (Figure 3). Collagen 6 (α1, α2, and α3 chains) is a heterotrimer that forms filamentous structures flanked by globular domains containing the VWA domains. These globular domains can associate with themselves or with other ECM molecules. The monomeric form of collagen 6 is a heterotrimer consisting of 6α1, 6α2, and 6α3 chains. Antiparallel association of collagen 6 monomers forms dimers and the dimers laterally associate to form tetramers. The tetramers associate end-to-end to form a beaded filament. Collagen 6 interacts with a wide range of ECM components (reviewed in Ricard-Blum et al., 2000). The VWA domains of col6α3 interact with fibrillar collagen 1 (Bonaldo et al., 1990) and the triple-helical part of collagen 6 interacts with the A1 VWA domain of vWF (Hoylaerts et al., 1997). The new collagen E is similar in overall structure to col6α1 and col6α2 but is not clearly related at the sequence level. Three new collagens (A, B, and C) are in a cluster on chromosome 3 in human, and their orthologues are on chromosome 9 in mouse. The gene predictions are a little ambiguous in the current genome assemblies, and the structures shown in Figure 3 represent a best estimate based on both the human and mouse assemblies (public and Celera). Collagens A-D show close homologies in domain arrangements and similarities among VWA domains with collagen 6α3 (Figure 3). Although the gene predictions shown represent open reading frames (ORFs) without stop codons, their domain organizations have some puzzling features such as lack of collagen repeats in collagens A and B and the anomalous position of a VWA-Ku domain pair at the N terminus of collagen C. Whether these represent genome assembly problems or pseudogenes will have to await further information, but it seems likely that there may exist a family of collagen subunits related to collagen 6α3.
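The collagen 6 assembly hierarchy described in this paragraph (a heterotrimeric monomer, an antiparallel dimer of monomers, and a lateral tetramer of two dimers) implies a simple chain count at each stage. The following Python sketch only tallies that arithmetic; the names `COLLAGEN6_CHAINS`, `MONOMERS_PER_ASSEMBLY`, and `chain_count` are illustrative and do not come from the original analysis.

```python
# Illustrative tally of polypeptide chains per collagen 6 assembly stage.
# All names here are hypothetical helpers, not part of the published methods.

# One collagen 6 "monomer" is a heterotrimer of these three chains.
COLLAGEN6_CHAINS = ("6α1", "6α2", "6α3")

# Number of heterotrimeric monomers in each assembly stage named in the text.
MONOMERS_PER_ASSEMBLY = {"monomer": 1, "dimer": 2, "tetramer": 4}


def chain_count(assembly: str) -> int:
    """Return the number of polypeptide chains in the named assembly stage."""
    return MONOMERS_PER_ASSEMBLY[assembly] * len(COLLAGEN6_CHAINS)


if __name__ == "__main__":
    for stage in MONOMERS_PER_ASSEMBLY:
        print(stage, chain_count(stage))
```

Under these assumptions a tetramer contains 12 chains; the beaded filament is then an open-ended, end-to-end array of such tetramers.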

It seems very likely that the VWA domains of these collagens are involved in protein-protein interactions with other matrix proteins and possibly with cells. This elaboration of VWA collagens seems to be chordate- or vertebrate-specific.

Other Extracellular VWA Domain Proteins

We turn next to a different class of VWA-domain proteins where the presence of VWA domains comes as something of a surprise (Figure 1).

Calcium Channel α2δ Family Voltage-gated calcium channels are a complex of five proteins: α1, β1, γ, α2, and δ. The α2 and δ subunits result from proteolytic processing of a single gene product (De Jongh et al., 1990). The α2 subunit consists of ~950 N-terminal amino acids and contains a VWA and a cache domain (Figure 1). In most cases, at least an imperfect MIDAS motif is conserved (Figure S3). Cache domains are only found in these molecules and in a family of prokaryotic chemotaxis receptors (Anantharaman and Aravind, 2000). The remainder of the molecule comprises the δ subunit. The α2 and δ subunits are disulfide-linked and Brickley et al. (1995) conclude that the δ subunit contains transmembrane domains and that the α2 subunit is entirely extracellular. When coexpressed with the pore-forming α1 subunit, the α2δ complex regulates various functional properties of the channel complex (reviewed in Hobom et al., 2000).

The α2δ gene family has orthologues in the D. melanogaster and C. elegans genomes, but none are detectable in A. thaliana or yeast. There are five paralogues in human, two in C. elegans, and four in D. melanogaster (Figure S4). In human, three paralogues were characterized previously (CACNA2D1-3); a fourth was deposited in the database from a full-length cDNA sequencing project and is the most divergent of the five. The fifth is a Genomescan-predicted protein identified in this study (see Table S4 for sequence). Both C. elegans orthologues were targeted with RNAi and no mutant phenotype was observed (Maeda et al., 2001). Mutant phenotypes have not been described for any of the Drosophila genes. Because the annotation of several of these molecules is uninformative in the databases, we have provided tentative names based on orthologous relationships with human molecules (Figure S3).

CLCA Family of Putative Chloride Channel Subunits The first member of the CLCA family (Pauli et al., 2000) was characterized in bovine aortic endothelial cells as a protein involved in attachment of melanoma cells to lung endothelium and was named lung-specific endothelial adhesion molecule (Lu-ECAM-1; Zhu et al., 1991). Lu-ECAM-1 was found to encode a protein 88% identical to a putative calcium-activated chloride channel from bovine trachea (Elble et al., 1997). Expression of calcium-activated chloride channel in Xenopus oocytes resulted in the appearance of anion-selective conductance (Cunningham et al., 1995). In the human genome, four paralogues are located in a cluster on chromosome 1p22. The transcript encoding the human orthologue of bovine Lu-ECAM-1 (hCLCA3) contains several stop codons and results in the production of a secreted variant corresponding to the N-terminal 262 amino acids (without the VWA domain) of the other family members (Gruber and Pauli, 1999). Human CLCA1 and CLCA4 have conserved MIDAS motifs. Orthologues of the CLCA family are not detectable in the C. elegans and D. melanogaster genomes but a possible homologue is present in the Xenopus expressed sequence tag (EST) collection (GB# AW767641), suggesting that the CLCA family is a chordate invention.

It is likely that CLCA proteins do not form channels by themselves. According to SMART predictions, CLCA family members consist of a central VWA domain and a single transmembrane domain near the C terminus (Figure 1). Published transmembrane predictions for these molecules (Cunningham et al., 1995; Gruber et al., 1999) require that the VWA domain spans the plasma membrane, a situation unlikely to occur based on VWA domain structural studies. The biochemical data presented by Gruber et al. (1999), however, are entirely consistent with the model of CLCAs derived from SMART analysis: a type I transmembrane protein with an extracellular VWA domain (Figure 1). Reminiscent of the α2δ proteins, CLCAs are expressed as 125-kDa precursor proteins that are subsequently processed into 90- and 35-kDa subunits (reviewed in Pauli et al., 2000). It seems to us more likely that the CLCAs are accessory molecules for chloride channels, analogous to the α2δ subunits of calcium channels. The roles of the VWA domain in both types of molecules are unknown but could involve modulation of channel activity by binding other proteins or divalent cations via the VWA domains.

Anthrax Toxin Receptor Family There are three members of the anthrax toxin receptor family, and the two that have been studied experimentally are type I transmembrane proteins with a single extracellular VWA domain. The third is a Genomescan-predicted protein that we have termed anthrax toxin receptor (ATR) 3 (Figure 1; see Table S4 for sequence). The ATR is the cellular receptor for the anthrax protective antigen and facilitates entry of the toxin into cells. The VWA domain of ATR mediates interaction with protective antigen and the binding is dependent on divalent cations (Bradley et al., 2001). The murine orthologue of this gene, TEM-8, was identified as a gene up-regulated in colon tumor endothelium (St Croix et al., 2000). The second member of this family, CMG-2, was identified in a similar screen for genes up-regulated during capillary morphogenesis (Bell et al., 2001). The normal cellular ligand for the anthrax receptor is unknown but a recombinant fragment of CMG-2 was shown in a solid-phase assay to bind collagen IV, laminin and, to a lesser extent, fibronectin (Bell et al., 2001). All three molecules have conserved MIDAS motifs and the cytoplasmic domain of ATR can be alternatively spliced (Bradley et al., 2001). Potential homologues of these molecules are present in the chicken and zebrafish EST databases. The observations discussed above suggest that these proteins are a family of vertebrate ECM receptors expressed by endothelial cells.

Complement Factors Complement factors B and C2 both contain three CCP or Sushi domains, a single VWA domain with a conserved MIDAS motif, and a trypsin-type serine protease domain (Figure 1). Orthologues of these molecules are found from echinoderms to chordates (Smith et al., 1998) but are not found in D. melanogaster and C. elegans, suggesting that they may be a deuterostome-specific invention. During complement activation, the CCP domains are cleaved off, resulting in the formation of an active protease that cleaves and activates complement C3. Complement C2 is in the classical pathway and complement factor B is in the alternative pathway. The interactions of C2 with C4 and of factor B with C3b are both dependent on Mg²⁺-binding sites within the VWA domains, and the VWA domain of factor B has been shown to mediate the binding of C3 (Tuckwell et al., 1997), consistent with the common inferred function of VWA domains as magnesium-dependent protein interaction domains.

Up to this point, all of the molecules discussed have extracellular VWA domains, and most seem likely to play roles in cell adhesion or in protein-protein interactions among ECM molecules. However, VWA domains, like FN3 and Ig domains (in titin, etc.), can also be found in intracellular molecules. The next group of proteins provides a useful transition to consideration of intracellular VWA domain-containing proteins because some are extracellular, one is intracellular, and some have not been characterized.

Trypsin Inhibitors and Their Relatives Three classes of human proteins contain a combination of VIT and VWA domains: seven inter-α-trypsin inhibitor heavy chains (ITIH), three members of the novel Q9BVH8 family, and a single poly(ADP-ribose) polymerase [vault poly(ADP-ribose) polymerase, vPARP; Figures 1 and S4]. The VWA domains of these proteins represent a discrete clade in a comparison with other human VWA domains (25% bootstrap support). The most closely related domains outside this group are those of the calcium channel α2δ subunits. The function of the VIT domain (vault protein inter-α-trypsin domain) is not known. It is found only in chordates with the exception of a single bacterial protein predicted in the cyanobacterium Anabaena (NP_488452), which clusters with vPARP and the Q9BVH8 family with 100% bootstrap support (Figure S4). With the exception of the Mg chelatases discussed below, this molecule is the only clear VWA domain-containing orthologue shared between bacteria and eukaryotes and may represent a recent horizontal gene transfer event.

The four characterized ITIH family members are extracellular molecules found in complexes with the Kunitz-type serine protease inhibitor bikunin (Bost et al., 1998), which confers the protease inhibitor function of the complex. The function of the heavy chains is unclear, but they are covalently bound to hyaluronic acid and may play a role in ECM binding and/or stability (reviewed in Bost et al., 1998). ITIH1-4 are well characterized and reviewed in Salier et al. (1996). The three new paralogues are fragments in the database or Genomescan-predicted proteins. The newly assembled protein sequences are available in Table S4. The VWA domains of ITIH proteins cluster with 100% bootstrap support. MIDAS motifs are conserved in all ITIH family members except for ITIH7.

Three proteins (Q9BVH8-1-3) also have the domain architecture VIT-VWA but are separate from the ITIH family (Figure S4). These molecules are poorly characterized and the conclusion that they contain the VIT-VWA domain arrangement is based on SMART analysis of the corresponding Genomescan-predicted proteins and clustalW alignments (see Figure S4 for details). These molecules are more closely related to vPARP (see below) than to the ITIH family, but they lack the BRCT domain found in vPARP and form their own subfamily of VIT-VWA proteins. The MIDAS motif is conserved in two of the three. Nothing is known about their subcellular localization.

The final protein that contains VIT and VWA domains is vPARP (Figure 1), the 193-kDa component of cytoplasmic vault ribonucleoprotein complexes (Jean et al., 1999). Poly(ADP-ribose) polymerase is a eukaryotic enzyme but vPARP seems to be restricted to mammals. We have identified a predicted protein from the mouse golden path genome assembly as the mouse orthologue of vPARP, and its sequence is available in Table S4. The vault complex is involved in DNA repair, nuclear transport, and multidrug resistance (Kickhoefer et al., 1999). The VWA domains in both human and mouse vPARP have conserved MIDAS motifs, implicating coordinated metal ions in their function. The function of the vPARP VWA domain is unknown, but vPARP is part of a multiprotein complex (Kickhoefer et al., 1999), raising the possibility that the VWA domain of vPARP plays a role in complex assembly.

Intracellular Proteins: The Primordial Group?

One of the more surprising results of our survey (at least to us) was the presence of a set of intracellular VWA domain-containing proteins (Figure 4). Intracellular VWA domains have been noted previously (Ponting et al., 1999). As we discuss below, these proteins are more broadly distributed phylogenetically than the metazoan extracellular proteins discussed above, suggesting that the intracellular proteins may be the most ancient.


Figure 4.   VWA domain-containing proteins common to all eukaryotes are intracellular and found in multiprotein complexes. The five proteins shown are found in all eukaryotes, and we show examples from Metazoa, A. thaliana, and yeast. VWA domains are predicted in orthologues from each of the five. The VWA domain is a particular form of a Rossmann fold and some of the Rossmann folds in orthologues of Ku70/80 and Sec23 do not give predictions for the VWA domain subclass. The VWA domains of TFIIHp44 and Rpn10 are closely related. Both TFIIHp44 and Ku70/80 exhibit helicase activity. A common feature of these proteins is that they are subunits of multiprotein complexes. Given the role of VWA domains in protein-protein interactions, a possible role for the VWA domains in these proteins is in mediating assembly of these complexes. Whatever the role of the VWA domains, these intracellular molecules seem to be the most ancient eukaryotic examples of this protein domain from which other (mostly extracellular) VWA proteins presumably evolved. Molecules that may also be included in this group but have been lost in some taxa include copines (Figure S5). Perfect and imperfect MIDAS motifs are indicated by solid and hollow red stars, respectively. See text for further discussion of these molecules and their relatives.

Rpn10-26S Proteasome Regulatory Subunit The 26S proteasome is a complex of proteins involved in degradation of ubiquitin-tagged proteins (reviewed in Voges et al., 1999). The subunit of the complex that recognizes chains of ubiquitin is Rpn10 (or S5a, PSD4). A single orthologue of this molecule is encoded by all completed eukaryotic genomes. In addition to the N-terminal VWA domain, Rpn10 proteins contain ubiquitin-interacting motifs that are involved in recognition of multiubiquitin. Yeast cells deficient in Rpn10 are viable, suggesting that Rpn10 is not the only multiubiquitin-binding protein in cells. However, the VWA domain in Rpn10 may play a role in efficient 26S complex function (Fu et al., 2001). MIDAS motifs are not found in any Rpn10 (Table S3). Region 1 of the yeast Rpn10 VWA domain has the sequence DxSxY, and Fu et al. (2001) demonstrated a requirement for an acidic residue in the D position by using several functional assays, suggesting that the intact VWA domain is required for 26S proteasome function. The VWA domain can also mediate interaction with Id1, a transcription regulator that is itself regulated by ubiquitin-mediated proteolysis (Anand et al., 1997; Bounpheng et al., 1999).

TFIIHp44 TFIIH is a multiprotein complex that is one of the five general transcription factors that bind the RNA polymerase II holoenzyme (Orphanides et al., 1996; Myer and Young, 1998). The p44 subunit of TFIIH is the human orthologue of yeast SSL1 (Humbert et al., 1994). Orthologues of these genes are also found in all completed eukaryotic genomes and all these proteins have VWA domains (Figure 4). TFIIHp44 functions as a DNA helicase in RNA polymerase II transcription initiation and DNA repair, and its transcriptional activity is dependent on its C-terminal Zn-binding domains (Fribourg et al., 2000). The function of the VWA domain is unclear, but it may be involved in complex assembly. MIDAS motifs are not conserved except in the fly orthologue, which has an imperfect MIDAS motif.

During the course of these analyses, we performed pairwise comparisons of isolated VWA domains from each species considered. One outcome of this work was the identification of a strong relationship between TFIIHp44 and Rpn10. When the complete VWA domain collections of eukaryotes are compared, the VWA domains of TFIIHp44 and Rpn10 form a clade with >90% bootstrap support. This relationship is supported by Aravind and Ponting (1998), who recognized them as homologues. Both proteins function in intracellular multiprotein complexes, so these results may suggest an analogous role for each in their respective complexes.

Ku70/80 DNA Helicase Family In humans, the lupus autoantigens Ku70 and Ku86 form heterodimers, bind DNA, and are involved in repair of double-stranded breaks in DNA (Walker et al., 2001, and references therein). SMART predicts that human Ku70 and Ku86 have VWA domains. The Ku DNA helicases have orthologues in all completed eukaryotic genomes and, therefore, seem to be pan-eukaryotic VWA domain-containing proteins (Figure 4). The VWA domain, however, is not consistently predicted in many of these orthologues (Figure S6). The structure of the human Ku70/Ku86 heterodimer has been solved and both subunits contain a Rossmann fold, but whether this should be considered a true VWA domain is unclear. The VWA domains/Rossmann folds are probably not involved in dimer formation but may support interactions with additional molecules (Walker et al., 2001).

ATPases Associated with Diverse Cellular Activities (AAA)-VWA Proteins Two proteins in the human genome contain a combination of AAA and VWA domains. AAA domains are found in all branches of life and have ATPase activity (Patel and Latterich, 1998). There is growing evidence that AAA proteins are important in assembly and disassembly of macromolecular complexes (Maurizi and Li, 2001). Both of the AAA-VWA proteins in the human genome were previously uncharacterized.

The first is a 5000-amino acid protein common to all completed eukaryotic genomes (AAAVWA-euk; Figure 4). The yeast orthologue is required for cell viability (Winzeler et al., 1999) and coprecipitates with a protein complex thought to be the transport intermediate of 60S ribosomal subunits (Bassler et al., 2001). These proteins have multiple AAA domains at the N terminus and a single VWA domain near the C terminus. Human and C. elegans AAAVWA-euk have perfect and the other orthologues have imperfect MIDAS motifs, suggesting that divalent cations may play a role in their function. In the databases, the yeast, D. melanogaster, and A. thaliana orthologues are full length. The human and C. elegans sequences are fragments, and concatenation of adjacent Genomescan-predicted proteins is required to generate full-length sequence. These sequences are available in Table S4.

The second AAA-VWA domain protein in the human genome has orthologues in D. melanogaster and C. elegans but is not detectable in yeast and A. thaliana, suggesting it may be a metazoan invention (AAAVWA-ani; Figures 1, 5, and S7). These molecules have two AAA domains and a single C-terminal VWA domain. Perfect MIDAS motifs are found in the human and D. melanogaster orthologues, and the C. elegans molecule has an imperfect motif, suggesting that cations may be important in their function. The human protein is a fragment in the database but concatenation of adjacent Genomescan-predicted proteins results in an intact molecule that is >30% identical to the C. elegans orthologue over 1800 amino acids. The Drosophila protein is also a fragment in the database, and two adjacent predicted genes in Flybase were concatenated to generate a full-length molecule. The sequences of these assembled molecules are available in Table S4. The function of these proteins is unknown.
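Statements such as ">30% identical over 1800 amino acids" depend on how identity is counted across an alignment. The text does not say which convention was used, so the Python sketch below shows one common choice as an assumption: identical residues divided by the number of alignment columns in which neither sequence has a gap. The function name `percent_identity` and the toy sequences are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: columns where either sequence has a gap are ignored,
    and identity is matches / compared columns. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a.upper(), aligned_b.upper()):
        if residue_a == gap or residue_b == gap:
            continue  # skip gapped columns
        compared += 1
        if residue_a == residue_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy, invented alignment: 5 matches over 6 ungapped columns.
    print(round(percent_identity("DGSGS-T", "DGTGSAT"), 1))
```

Dividing instead by the full alignment length or by the shorter sequence length would give lower values for the same alignment, which is why the denominator matters when comparing reported identities.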

The only other proteins that contain the combination of AAA and VWA domains are the D subunits of the protoporphyrin IX Mg chelatase complexes. These proteins are known as chloroplast Mg chelatase D (ChlD) in A. thaliana (Figure 6) and bacterial Mg chelatase D (BchD) in bacteria. They are also found in archaeal genomes, and it may be appropriate to name these molecules archaeal Mg chelatase D (AchD). The protoporphyrin IX enzymatic complex is essential for chlorophyll biosynthesis and is likely to be common to all photosynthetic organisms. Mg chelatases are complexes of three subunits that function to incorporate Mg into protoporphyrin IX in an ATP-dependent mechanism (Walker and Willows, 1997). Bacterial cobalt chelatases are also three-subunit complexes (Schubert et al., 1999) and one subunit, COBT, has a VWA domain but not an AAA domain. At least an imperfect MIDAS motif is present in 18 of the 19 chelatase subunits examined, and in 11 of the examples the MIDAS motif is perfectly conserved. In the cobalt chelatases, a G replaces the T4 in all three available examples, possibly reflecting different cation preferences of the chelatases. In any case, it is likely that the MIDAS motif in most chelatase D subunits has the potential to coordinate a cation. The functional significance of this coordination may be in presentation of the ion to the protoporphyrin IX or in complex assembly.

Sec-23 The S. cerevisiae protein Sec-23 has orthologues in all completed eukaryotic genomes, but only the S. cerevisiae molecule is predicted to have a VWA domain. Sec-23 is part of a multiprotein complex involved in transport of vesicles from the endoplasmic reticulum to the Golgi. It is possible that the Sec-23 VWA domain prediction is a false positive because the e-value is high (1.01) and all three MIDAS regions are unconserved. Q9ZQH3 is an uncharacterized A. thaliana molecule whose VWA domain is similar to yeast Sec23 (63% bootstrap support). This molecule, however, is distinct from the A. thaliana orthologues of Sec-23, all of which lack a predicted VWA domain.

These five intracellular VWA proteins found in all eukaryotes (Figure 4) could be viewed as the primordial set of VWA proteins. All are involved in multiprotein complexes, and it is tempting to speculate that the role of VWA domains is in the protein-protein interactions contributing to these complexes. In most cases, divalent cations are key to the structure and/or function of these complexes, suggesting another potential role for the VWA domains, although most lack MIDAS motifs.
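The caution about the Sec-23 prediction illustrates why domain hits are usually screened by e-value before being counted. The Python sketch below shows such a screen under stated assumptions: the `DomainHit` record, the `confident_hits` function, and the 0.01 cutoff are hypothetical choices for illustration, not the thresholds used by SMART or by this study. Only the Sec-23 e-value of 1.01 comes from the text; the second record is an invented placeholder.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DomainHit:
    """A single predicted domain on a protein (hypothetical record type)."""
    protein: str
    domain: str
    e_value: float


def confident_hits(hits: Iterable[DomainHit], max_e_value: float = 0.01) -> List[DomainHit]:
    """Keep hits whose e-value is at or below the cutoff.

    The default cutoff is an assumption chosen for illustration only.
    """
    return [hit for hit in hits if hit.e_value <= max_e_value]


if __name__ == "__main__":
    hits = [
        DomainHit("Sec-23 (S. cerevisiae)", "VWA", 1.01),  # e-value quoted in the text
        DomainHit("placeholder_protein", "VWA", 1e-20),    # invented example value
    ]
    for hit in confident_hits(hits):
        print(hit.protein, hit.domain, hit.e_value)
```

With this cutoff the Sec-23 hit is excluded, in line with the suggestion that it may be a false positive; an e-value near 1 means that about one hit of that score would be expected by chance in a search of that size.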

Copines The copines are phospholipid-binding proteins, originally identified in Paramecium (Creutz et al., 1998). There are nine family members in human (three groups of three) and three orthologues in both C. elegans and A. thaliana, but none are detectable in D. melanogaster or S. cerevisiae (Figure S5). Two of the human copines (A and B) are previously uncharacterized Genomescan-predicted proteins and their sequences are available in Table S4. It seems that the ancestral organism common to Homo sapiens, C. elegans, and A. thaliana had a single copine that independently expanded into three in C. elegans and A. thaliana and nine in H. sapiens. The phylogenetic distribution suggests that copines have been lost from some eukaryotic phyla (Figure S5).   Each copine contains two C2 domains followed by a single VWA domain except for copine W3, which has a single C2 domain. Three additional C. elegans molecules (Figure 5) have VWA domains that are closely related to the copines but all lack C2 domains. No functional properties have been assigned to the VWA domains of copines. They contain a functional MIDAS motif based on preferential binding to magnesium and manganese (Tomsig and Creutz, 2000) despite the fact that the MIDAS motif is not perfectly conserved. In all 15 cases, region 1 of the copine MIDAS consists of the sequence D-x-T-x-S. The Paramecium sequence is D-x-T-x-Q at this location. The C2 domains mediate calcium-dependent phospholipid binding (Davletov and Sudhof, 1993) and support oligomerization (Tomsig and Creutz, 2000). In A. thaliana, copines may play a role in growth regulation. BONZAI1 mutants that lack copine 1 (Figure S5) produce miniature plants when grown at 22°C (Hua et al., 2001). Mutations in the same gene described by another group result in abnormal cell death in response to low-humidity conditions (Jambunathan et al., 2001).
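The region 1 variants mentioned in this essay (D-x-T-x-S in copines, D-x-T-x-Q in the Paramecium sequence, and DxSxY in yeast Rpn10) can be told apart with simple pattern matching. The Python sketch below classifies a five-residue region 1 string. It assumes the canonical MIDAS region 1 consensus is D-x-S-x-S, which is not restated in this section, and the labels, the `classify_region1` function, and the test peptides are all invented for illustration.

```python
import re

# Region 1 patterns; "." stands for any residue (x).
# The canonical D-x-S-x-S consensus is an assumption made for this sketch.
REGION1_PATTERNS = {
    "canonical D-x-S-x-S": re.compile(r"D.S.S"),
    "copine-like D-x-T-x-S": re.compile(r"D.T.S"),
    "Paramecium copine D-x-T-x-Q": re.compile(r"D.T.Q"),
    "yeast Rpn10 D-x-S-x-Y": re.compile(r"D.S.Y"),
}


def classify_region1(pentapeptide: str) -> str:
    """Return the label of the first pattern matching a 5-residue region 1."""
    residues = pentapeptide.upper()
    if len(residues) != 5:
        raise ValueError("region 1 must be exactly five residues")
    for label, pattern in REGION1_PATTERNS.items():
        if pattern.fullmatch(residues):
            return label
    return "no listed pattern"


if __name__ == "__main__":
    # Invented pentapeptides, used only to exercise each pattern.
    for peptide in ("DVSGS", "DGTGS", "DGTGQ", "DASNY", "AAAAA"):
        print(peptide, "->", classify_region1(peptide))
```

A match here says nothing about whether a site actually binds a cation; as the copine data show, an imperfect motif can still be functional.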


Figure 5.   C. elegans has a large number of novel VWA domain-containing ECM proteins. The domain architecture of all C. elegans VWA domain-containing proteins is indicated. Paralogous molecules are grouped together in the phylogenetic tree derived from a clustalW alignment. The groups of paralogues or unrooted individual molecules have been shuffled along the vertical axis for clarity of presentation; so there is no information in the root of the tree (vertical gray bar). The molecules in blue lack close homologues in all other completed genomes. Note the novel domain associations in many of these proteins. All molecules below the purple bar seem to be extracellular or membrane-associated; the localizations of those below the red bar are unclear. The VWA domains of the C. elegans copines are closely related to the VWA domains of three uncharacterized molecules and these relationships are indicated by a red box. Other relationships are also indicated by boxes and the bootstrap numbers are provided in yellow ovals. Perfect and imperfect MIDAS motifs are indicated by solid and hollow red stars, respectively. See Table S2 to cross-reference molecules in this figure with database identifiers.

Unknown Conserved in Humans and Flies (Q9H0S5/Q9VPY0) The final human VWA protein, Q9H0S5, is an uncharacterized molecule with a clear MIDAS motif and widespread expression based on EST frequency. Q9VPY0 is the Drosophila orthologue of this gene with several embryonic ESTs (TIGR Drosophila gene index TC71868; Adams et al., 2000; Figure S7). Q9VPY0 does not have a conserved MIDAS motif. Both molecules consist of an N-terminal VWA domain followed by ~400 amino acids of sequence that lacks obvious distinguishing features, including a signal sequence. The human and D. melanogaster molecules are ~33% identical over their entire length. Orthologues of these molecules are not detectable in the C. elegans, A. thaliana, or S. cerevisiae genomes. This is the only VWA domain-containing protein found in humans and D. melanogaster but not in the C. elegans genome.

Additional C. elegans VWA Proteins

It has been noted (Hutter et al., 2000) that C. elegans contains a novel set of ECM and adhesion genes, and the same can be said for VWA-domain proteins (Figure 5, marked in blue). This genome encodes a large number of C-type lectin proteins, several of which also contain a VWA domain. This combination of domains is not seen in any other organism, and many of the proteins contain other domains as well or instead (Figure 5; CUB, ZP, and EGF). These all seem to reflect separate evolution of VWA-domain proteins in C. elegans. It remains unclear why this species has elaborated such a plethora of apparent adhesion proteins; perhaps it has something to do with cuticle formation or with the highly reproducible arrangements of cells in this organism, providing additional zip codes not required in less determinative organisms.

The C. elegans proteins mup-4 and mua-3 are VWA domain-containing transmembrane receptors for extracellular matrix (Figure 5). Mup-4/mua-3 have functional and molecular similarities with the mammalian integrin beta 4, suggesting that the molecules may represent an example of convergent evolution (Bercher et al., 2001; Hong et al., 2001). Like all integrin beta subunits, mup-4 and mua-3 have extracellular VWA domains with conserved MIDAS motifs and EGF modules and, like integrin beta 4, their cytoplasmic domains link to intermediate filaments. In support of this idea, D. melanogaster does not have intermediate filaments and lacks a homologue of either integrin beta 4 or mup-4 (Bercher et al., 2001).

Additional A. thaliana VWA Proteins

In contrast with the elaboration of novel extracellular VWA domain proteins in C. elegans, A. thaliana seems to have elaborated the intracellular portion of the VWA domain-containing protein set (Figure 6). There are no integrins or ECM proteins in the plant set of VWA proteins.


Figure 6.   A. thaliana has additional intracellular VWA domain proteins. The domain architecture of all A. thaliana VWA domain-containing proteins is indicated. Paralogous molecules are grouped together in the phylogenetic tree derived from a clustalW alignment. The groups of paralogues or unrooted individual molecules have been shuffled along the vertical axis for clarity of presentation; so there is no information in the root of the tree (vertical gray bar). The molecules in blue lack homologues in completed fungal and metazoan genomes. All molecules above the purple bar seem to be intracellular. No information is available for the molecules below the purple bar although one (Q9LSX2) has a predicted signal sequence (indicated by red bar), suggesting that that group of three molecules might be secreted. The VWA domains of the copines are closely related to the VWA domains of the VWA-RING proteins (Q9LVN6 lacks the RING domain). The relationship is indicated by the red box and the number in the yellow oval. Perfect and imperfect MIDAS motifs are indicated by solid and hollow red stars, respectively. See Table S2 to cross-reference molecules in this figure with database identifiers.

Two groups of A. thaliana proteins contain a combination of VWA and RING finger domains. RING finger domains frequently have E3 ubiquitin-protein ligase activity. Rice and A. thaliana are the only sequenced organisms containing this combination, suggesting that it is a plant-specific domain architecture. One group of three paralogues has a VWA-RING arrangement. A fourth paralogue, Q9LVN6, lacks the RING domain, but its VWA domain is closely related to those of the other three. The VWA domains of these molecules are also closely related (100% bootstrap support) to the A. thaliana copine VWA domains, including the D-x-T-x-S sequence at region 1 of the MIDAS (Figure 6).
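As an illustration of what a region 1 sequence such as D-x-T-x-S means in practice, the following minimal Python sketch scans a protein sequence for that pattern, where x stands for any residue. It is not the method used for the analysis in this essay. The canonical D-x-S-x-S form is included for comparison as an assumption not stated in this passage, and the function name and the toy sequence are hypothetical. A five-residue pattern will also occur by chance, so a match alone does not establish a functional MIDAS.

```python
import re

# Region 1 variants, with "x" written as "." (any single residue).
# D-x-T-x-S is the form named in the text; D-x-S-x-S is added here
# as an assumption for comparison.
REGION1_VARIANTS = {
    "D-x-T-x-S": "D.T.S",
    "D-x-S-x-S": "D.S.S",
}


def find_region1(sequence):
    """Return (variant, 0-based start, matched residues) for every hit."""
    hits = []
    for name, pattern in REGION1_VARIANTS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up toy sequence for illustration only; not a real protein.
toy_sequence = "MKLVFDGTGSAAPLDISGSMK"
for variant, start, residues in find_region1(toy_sequence):
    print(variant, start, residues)
```

A real survey would restrict the search to the aligned region 1 of each VWA domain rather than scanning whole sequences.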

The second group, which seems to be unrelated to the first, has eight paralogues and the architecture RING-VWA. One-half of these have conserved MIDAS motifs (Figure 6). The VWA domains of these molecules are related to the human Q9BVH8 family of VIT-VWA molecules (70% bootstrap support) and a group of bacterial proteins (NP_442565 and relatives; 92% bootstrap support).

With the possible exception of two groups of homologues, one with two members and the other with three members (Figure 6, below the purple bar), all A. thaliana VWA domain proteins seem to be intracellular. No functional information is available for the molecules below the purple bar, but all have at least imperfect MIDAS motifs and one has a predicted signal sequence, suggesting that it might be secreted.

Prokaryotic VWA Proteins

In general, the archaeal and bacterial VWA domain-containing proteins are not well characterized. Of the 148 prokaryotic VWA domains in the databases, 90 have the words hypothetical, putative, ORF, or unknown in their descriptions. More than 80% of prokaryotic VWA domains have at least an imperfect MIDAS motif, indicating that divalent cation coordination may play a role in protein function (Figure S1, a-c, and Table S3).
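The keyword tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example descriptions are hypothetical, and the matching rule (whole words, case-insensitive) is a guess at a reasonable criterion rather than the procedure actually used.

```python
import re

# The four words named in the text, matched as whole words, case-insensitively.
KEYWORDS = ("hypothetical", "putative", "ORF", "unknown")
KEYWORD_PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def is_uncharacterized(description):
    """True if a database description contains any of the keywords."""
    return KEYWORD_PATTERN.search(description) is not None


def tally(descriptions):
    """Return (number flagged as uncharacterized, total number)."""
    flagged = sum(is_uncharacterized(d) for d in descriptions)
    return flagged, len(descriptions)


# Invented example descriptions, not taken from any database.
examples = [
    "hypothetical protein",
    "Mg chelatase subunit",
    "putative membrane protein",
    "ORF 12",
    "cobalt chelatase",
]
print(tally(examples))

# The counts reported in the text, expressed as a percentage.
print(round(100 * 90 / 148, 1))
```

With the counts given in the text, 90 of 148 corresponds to roughly 61% of prokaryotic VWA domains carrying one of these uninformative labels.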

In Archaea, there are 16 VWA domain proteins from nine different species (Table S2). Five of these are Mg chelatases based on sequence homology. The remaining proteins are all described as hypothetical or ORF. Within this list of unknown proteins, there are two groups of clear homologues; one group contains four members, the other contains two. All archaeal genomes sequenced to date have at least one VWA protein, but there are no universally present VWA domain proteins and the only molecules clearly related to eukaryotic VWA domain proteins are the Mg chelatases (see below).

In bacteria, there are 132 VWA domain proteins from 49 different species (Table S2). Examples of intracellular, secreted, and membrane-associated proteins can be found in this list. There are 11 Mg and four Co chelatases in the list. Eight molecules are orthologues of n