An Elaborate Classification of SNARE Proteins Sheds Light on the Conservation of the Eukaryotic Endomembrane System
Proteins of the SNARE (soluble N-ethylmaleimide–sensitive factor attachment protein receptor) family are essential for the fusion of transport vesicles with an acceptor membrane. Despite considerable sequence divergence, their mechanism of action is conserved: heterologous sets assemble into membrane-bridging SNARE complexes, thereby driving membrane fusion. Within the cell, distinct functional SNARE units are involved in different trafficking steps. These functional units are conserved across species and probably reflect the conservation of the particular transport step. Here, we have systematically analyzed SNARE sequences from 145 different species and have established a highly accurate classification for all SNARE proteins. In principle, all SNAREs fall into four basic types, reflecting their position in the four-helix bundle complex. Within these four basic types, we established 20 SNARE subclasses that probably represent the original repertoire of a eukaryotic cenancestor. This repertoire has been modulated independently in different lineages. Our data are in line with the notion that the ur-eukaryotic cell was already equipped with the various compartments found in contemporary cells. Possibly, the development of these compartments is closely intertwined with episodes of duplication and divergence of a prototypic SNARE unit.
The elaborate endomembrane system of eukaryotes is thought to have evolved by invagination of the plasma membrane during the perfection of a phagotrophic lifestyle. Subsequently, primitive endomembranes differentiated into the various spatially and functionally separated compartments found in contemporary eukaryotes (Roger, 1999; Cavalier-Smith, 2002). Material exchange between compartments is mediated by cargo-loaded vesicles that bud from the donor and eventually fuse with the acceptor compartment. Through the endocytic pathway, cells take up nutrients; conversely, newly synthesized proteins and lipids are transported within the cell through the exocytic pathway. As each organelle must maintain its identity, vesicular trafficking is tightly regulated (Bonifacino and Glick, 2004). Although some protists possess highly divergent compartments, it is becoming clear that the underlying molecular machineries of vesicular trafficking are highly conserved among all eukaryotes. Key players in the concluding step, the fusion of a vesicle with its acceptor membrane, are the SNARE (soluble N-ethylmaleimide–sensitive factor attachment protein receptor) proteins (reviewed in Hong, 2005; Jahn and Scheller, 2006). They form a family of small, cytoplasmically oriented, membrane-associated proteins with a relatively simple domain architecture. Their hallmark is the SNARE motif, an extended segment arranged in heptad repeats. In most SNAREs, the motif is connected at its C-terminus to a single transmembrane domain by a short linker.
As a general mechanism, it is thought that the SNAREs on the transport vesicle form a tight complex with the SNAREs in the acceptor membrane. Directional complex assembly between the membranes, proceeding from the N-terminal tips of the SNARE motifs toward the C-terminal membrane anchors, is thought to pull the lipid bilayers together and to initiate membrane merger. The very similar crystal structures of three only distantly related SNARE units (Sutton et al., 1998; Antonin et al., 2002b; Zwilling et al., 2007) have strengthened the view that they all form similar “nano-machines,” consisting of an elongated, parallel four-helix bundle. In its interior, 16 layers of mostly hydrophobic residues are formed, and these are highly conserved. In particular, a hydrophilic layer in the center, consisting of three glutamine (Q) residues and one arginine (R) residue, is almost unchanged throughout the SNARE family, leading to a classification into Q- and R-SNAREs (Weimbs et al., 1997; Fasshauer et al., 1998). The existence of other asymmetric layers in the bundle implies that the three Q-SNARE helices are also prototypes that define distinct subfamilies. Accordingly, the SNARE motifs of a SNARE unit are classified into Qa-, Qb-, Qc-, and R-SNAREs (“QabcR-complex”; Bock et al., 2001). However, it is still not entirely clear whether all SNARE sequences can be classified into these four basic types. The analysis of SNARE sequences is further complicated by the fact that the current hidden Markov model (HMM) profiles in the Pfam and SMART databases often fail to recognize SNARE motifs and do not allow for an unambiguous classification of SNARE types.
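The Q/R distinction can be illustrated with a minimal sketch that inspects only the residue at the central (0) layer of an aligned SNARE motif. It assumes that the motif has already been aligned and that the column holding the 0 layer is known; the function name and the `zero_layer_index` parameter are hypothetical. As the paragraph above explains, this single position separates Q- from R-SNAREs but cannot tell Qa, Qb, and Qc apart, which requires information from the whole motif.

```python
def central_layer_type(aligned_motif: str, zero_layer_index: int) -> str:
    """Classify an aligned SNARE motif as 'Q' or 'R' from its central (0) layer.

    zero_layer_index is a hypothetical, alignment-specific 0-based column index;
    it must be set to wherever the 0 layer falls in the alignment being used.
    """
    residue = aligned_motif[zero_layer_index].upper()
    if residue == "Q":
        return "Q"  # glutamine at the 0 layer: Q-SNARE (Qa, Qb, or Qc)
    if residue == "R":
        return "R"  # arginine at the 0 layer: R-SNARE
    # The 0 layer is almost, but not entirely, invariant across the family.
    return "atypical"


# Toy motifs with the 0 layer placed at column 3 purely for illustration.
print(central_layer_type("LEAQLKE", 3))  # Q
print(central_layer_type("LEARLKE", 3))  # R
```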
An initial phylogenetic survey had indicated that all SNAREs form functional units that are in accord with the structural “QabcR-rule” (Bock et al., 2001). These SNARE units are thought to participate in different membrane traffic steps within the cell. Yet several SNARE proteins seem to function in more than one trafficking step, making it difficult to specify unique SNARE sets (Banfield, 2001; Pelham, 2001; Jahn and Grubmüller, 2002; Jahn and Scheller, 2006). In addition, the original classification of SNARE proteins (Bock et al., 2001) was based on only ∼100 sequences from a few organisms and did not include some more divergent SNARE types (Lewis et al., 1997; Lewis and Pelham, 2002; Burri et al., 2003; Dilcher et al., 2003). In subsequent years, additional complete genomes shed more light on the conservation of the SNARE machinery, yet some SNAREs, in particular from protists, appeared to be rather atypical (Sanderfoot et al., 2000; Dacks and Doolittle, 2002, 2004; Gupta and Brent Heath, 2002; Uemura et al., 2004; Besteiro et al., 2006; Schilde et al., 2006; Sutter et al., 2006; Ayong et al., 2007; Kissmehl et al., 2007; Sanderfoot, 2007). These studies, however, did not provide a universal classification scheme, as usually only a subset of sequences was examined. Thus, it remains unclear how the remarkable morphological diversity of eukaryotes is reflected in their SNARE repertoires. For example, the largest number of SNARE proteins found so far is in green plants (Sanderfoot et al., 2000; Sutter et al., 2006; Sanderfoot, 2007), but it is unclear whether the additional SNAREs mediate novel trafficking steps or are simply variations of the existing repertoire. This calls for an exhaustive comparison of SNARE repertoires in different organisms.
Evidently, a better understanding of the number, distributions, and interactions of SNAREs in different eukaryotic kingdoms might shed more light onto the organization, conservation, and possibly the origins of the eukaryotic endomembrane system.
Here, we present a detailed classification of the members of the SNARE family that reflects their participation in different trafficking steps. All 20 HMM profiles we generated reach a positive predictive rate of at least 95%, and 19 of them also reach at least 95% sensitivity; they can therefore be used as a highly reliable method to classify SNARE proteins. Finally, we provide an interactive Web interface to classify new SNARE sequences de novo and to access our collected information.
MATERIALS AND METHODS
We started with a set of ≈150 SNARE proteins from five species: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, and Arabidopsis thaliana. These were grouped into the four previously established subgroups (Bock et al., 2001). We aligned each subgroup using MUSCLE (Edgar, 2004) and verified the quality of each alignment by visual inspection. We extracted the 53-amino-acid SNARE motif of each subgroup and used HMMER, with standard settings and calibration, to build HMM profiles (hmmer.janelia.org; Durbin et al., 1998) and to search the nonredundant (nr) database at the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov). We gathered ∼800 sequences and used their SNARE motifs to perform a classification analysis. With the results of this classification analysis, we searched the nr database at NCBI again and gathered more than 2000 protein sequences. We finally added more than 1000 protein sequences from different genome projects (DOE Joint Genome Institute [JGI], genome.jgi-psf.org/; and Broad Institute, www.broad.mit.edu/seq/). After removing duplicate sequences, sequences that appeared to be misassembled, and sequences that failed visual inspection, we obtained a set of 2165 SNARE sequences. For these sequences, we removed insertions and added gaps according to the HMMER alignment.
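HMMER builds full profile hidden Markov models with insert and delete states. Because the SNARE motif used here is an ungapped 53-residue segment, the core idea of profile-based detection can be illustrated with a simpler position-specific log-odds matrix. The sketch below is such a simplified stand-in, not the HMMER algorithm; the uniform background frequencies, the pseudocount of 1, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(motifs, pseudocount=1.0):
    """Position-specific log2-odds matrix from equal-length, ungapped motifs.

    Assumes a uniform background of 1/20 per residue (a simplification).
    """
    width = len(motifs[0])
    if any(len(m) != width for m in motifs):
        raise ValueError("motifs must all have the same length (ungapped alignment)")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(motifs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(width):
        counts = Counter(m[pos] for m in motifs)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, segment):
    """Sum of per-position log-odds scores; residues outside the alphabet score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, segment))


def best_window(profile, protein):
    """Slide the profile along a full-length protein; return (start, score) of the best window."""
    width = len(profile)
    best = (None, float("-inf"))
    for start in range(len(protein) - width + 1):
        s = score(profile, protein[start:start + width])
        if s > best[1]:
            best = (start, s)
    return best


# Toy 4-residue "motifs" standing in for real 53-residue SNARE motifs.
profile = build_profile(["ACDQ", "ACEQ", "ACDQ"])
print(best_window(profile, "GGGACDQGG")[0])  # 3
```

A real search would additionally need a null model to judge significance (HMMER reports E-values for this purpose), which this sketch omits.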
We reconstructed phylogenetic trees at two different levels: the complete tree of all SNAREs and trees of each of the four main subgroups (data not shown). Because of the large number of sequences, we defined a core set of 11 representative species (abbreviations in parentheses), for which we calculated individual species trees and a general tree based on all 11 species: Arabidopsis thaliana (ArTh), Cryptococcus neoformans (CrNe), Danio rerio (DaRe), Homo sapiens (HoSa), Neurospora crassa (NeCr), Schistosoma japonicum (ScJa), Ostreococcus lucimarinus (OsLu), Phytophthora sojae (PhSo), Schizosaccharomyces pombe (ScPo), Trichomonas vaginalis (TrVa), and Trypanosoma cruzi (TrCr). For each set of sequences gathered, we started the analysis by building an IQPNNI (Important Quartet Puzzling and Nearest Neighbor Interchange) tree (Vinh le and Von Haeseler, 2004) from the SNARE motifs, using the Jones, Taylor, and Thornton (JTT) substitution matrix (Jones et al., 1992) and a gamma distribution with four categories to account for rate heterogeneity across sites. We then applied likelihood mapping (Strimmer and von Haeseler, 1997) to each edge of the IQPNNI tree to estimate the reliability of the topology. As an additional check, we used the PHYLIP package (Felsenstein, 1989) to run a distance-based bootstrap analysis with 1000 replicates: seqboot with standard settings; protdist with the JTT matrix (Jones et al., 1992) and a gamma distribution (with the parameter estimated by TREE-PUZZLE; Schmidt et al., 2002); and neighbor with standard options. Whenever a random seed was required, we used 9. Because bootstrap values have been shown to be systematically biased, we used the approximately unbiased (AU) test (Shimodaira, 2002) to correct for this bias.
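The seqboot step of a bootstrap analysis resamples alignment columns with replacement; trees are then rebuilt from each pseudo-alignment (here, protdist followed by neighbor), and the support of a branch is the fraction of replicate trees that contain it. The sketch below illustrates only the resampling and the support calculation. It assumes that the replicate trees have already been reduced to sets of splits, each written as the taxon set on one side of a branch; the function names are hypothetical, and Python's random generator will not reproduce PHYLIP's replicates even with the same seed of 9.

```python
import random


def bootstrap_alignments(alignment, replicates, seed=9):
    """Resample alignment columns with replacement to make pseudo-alignments.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must all have the same length")
    rng = random.Random(seed)  # reproducible, but not PHYLIP's generator
    pseudo = []
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        pseudo.append({n: "".join(alignment[n][c] for c in columns) for n in names})
    return pseudo


def split_support(replicate_splits, split):
    """Fraction of replicate trees whose split set contains `split`.

    Each split is a frozenset of the taxa on one side of a branch; all splits
    must be written consistently, e.g. as the side lacking a fixed reference taxon.
    """
    target = frozenset(split)
    return sum(target in splits for splits in replicate_splits) / len(replicate_splits)


alignment = {"s1": "ACDEF", "s2": "ACDEY", "s3": "SCDKF", "s4": "SCDKY"}
replicates = bootstrap_alignments(alignment, replicates=1000)
print(len(replicates))  # 1000
print(split_support([{frozenset({"s1", "s2"})}, {frozenset({"s1", "s2"})}, set()], {"s1", "s2"}))
```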
The site-wise log-likelihoods needed for the AU test were obtained using a modified version of PhyML (Guindon and Gascuel, 2003), and the test was performed using CONSEL (Shimodaira and Hasegawa, 2001). Finally, we estimated the root of each of the four main subgroups from the corresponding subgroup trees, using the Qa subgroups as outgroups for the Qb and R groups and the R subgroups as outgroups for the Qa and Qc groups.
On the basis of the phylogenetic analysis, biological knowledge, and the domain structure of the complete protein sequences, we assigned each SNARE motif to one functional subgroup. The resulting subgroup profiles were evaluated for their positive predictive rate (PPR) and sensitivity. Whenever a profile had a PPR below 95%, we merged its subgroup with the subgroup causing most of the false-positive hits. Subgroups whose profiles had a sensitivity below 95% were split into smaller, more sensitive subgroups.
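The refinement rule described above can be written as a small decision function. This is a sketch under stated assumptions: the counts of true positives (tp), false positives (fp), and false negatives (fn) for each subgroup profile are taken as given, an undefined rate (no hits or no members) is treated as 0, and when both thresholds fail the function reports both actions, since the text does not specify an order. All names and the example counts are hypothetical.

```python
from collections import Counter


def ppr(tp, fp):
    """Positive predictive rate: fraction of a profile's hits that are true members."""
    return tp / (tp + fp) if tp + fp else 0.0  # undefined rate treated as 0


def sensitivity(tp, fn):
    """Fraction of a subgroup's true members that its profile finds."""
    return tp / (tp + fn) if tp + fn else 0.0  # undefined rate treated as 0


def refinement_actions(tp, fp, fn, threshold=0.95):
    """List the refinement steps suggested for one subgroup profile."""
    actions = []
    if ppr(tp, fp) < threshold:
        actions.append("merge")  # join with the subgroup causing most false positives
    if sensitivity(tp, fn) < threshold:
        actions.append("split")  # break into smaller, more sensitive subgroups
    return actions


def merge_partner(false_positive_sources):
    """Subgroup that contributed the most false-positive hits (ties: first seen)."""
    return Counter(false_positive_sources).most_common(1)[0][0]


print(refinement_actions(tp=91, fp=0, fn=9))     # ['split']
print(merge_partner(["Qa.II", "Qa.I", "Qa.II"]))  # Qa.II
```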
A Web Interface for the De Novo Classification of SNAREs and Access to Our Results
We implemented a Web-based interface for access to our results (http://bioinformatics.mpibpc.mpg.de/snare/). It is divided into three sections. The first section provides access to our collected information, which can be searched by group, species, and protein name. We programmed two different views: an overview and a detailed view of the sequence information, such as the position of the detected motif and its expectation value. The second section provides an interface for submitting new sequences to be searched against our HMM profiles. We set an expectation-value (E-value) cutoff of 0.1 to minimize false-positive results. The results display the best three hits and the position of the motif in the alignment. The final section contains the SNARE tree in Nexus format (Maddison et al., 1997), which can be analyzed in detail with SplitsTree (Huson and Bryant, 2006).
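The filtering applied to query results can be sketched as follows: discard hits above the E-value cutoff of 0.1 and report the three best remaining hits. The `Hit` record, its fields, and the inclusive comparison (an E-value of exactly 0.1 is kept) are assumptions for illustration; the actual implementation behind the Web interface is not described here.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subgroup: str  # name of the matching subgroup profile
    evalue: float  # expectation value of the match
    start: int     # first motif residue in the query (1-based)
    end: int       # last motif residue in the query (inclusive)


def report_hits(hits: List[Hit], cutoff: float = 0.1, top: int = 3) -> List[Hit]:
    """Keep hits with E-value <= cutoff and return the `top` lowest-E-value hits."""
    kept = [h for h in hits if h.evalue <= cutoff]
    return sorted(kept, key=lambda h: h.evalue)[:top]


# Made-up search results for one query sequence.
hits = [
    Hit("Qa.IV", 1e-20, 200, 252),
    Hit("Qa.I", 1e-5, 190, 242),
    Hit("R.IV", 0.5, 10, 62),
    Hit("Qa.II", 0.01, 195, 247),
    Hit("Qa.III.b", 0.05, 198, 250),
]
print([h.subgroup for h in report_hits(hits)])  # ['Qa.IV', 'Qa.I', 'Qa.II']
```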
For our study, we used only the highly conserved SNARE motif, which allows for an ungapped alignment of 53 amino acids (Figure 1). We started with a known set of approximately 150 SNARE proteins (Bock et al., 2001). We built HMM profiles for the four main groups of the SNARE family and used them to search the nonredundant (nr) database at NCBI (www.ncbi.nlm.nih.gov). We gathered ∼800 protein sequences for a phylogenetic analysis, which allowed us to refine the four main groups into 20 distinct subgroups. We then built HMM profiles for these subgroups and gathered ≈3600 protein sequences from the nr database, the EST database, and several genome projects. After rigorously removing splice variants and sequences of low certainty, we retained a total of 2165 sequences. These sequences came from 145 species, and we are confident that we have found an almost complete set of SNAREs in about half of these species. In detail, our analysis included SNAREs from 59 animals, 41 fungi, 18 plants, 25 protists, and 2 viruses.
The discovery of SNAREs encoded in viral genomes was somewhat unexpected; these proteins might be used during host cell invasion. The R-SNARE (group IV) encoded by the Coccolithovirus, which infects the microalga Emiliania huxleyi, had been reported previously (Wilson et al., 2005). The other viral SNARE sequence was found in the genome of the mimivirus, which grows in the amoeba Acanthamoeba polyphaga (Raoult et al., 2004). It encodes a SNAP-25-like (25-kDa synaptosome-associated protein) protein containing two SNARE motifs (Qbc-SNARE).
Even though we were not aiming to find new SNAREs in well-studied genomes, we identified four new SNARE proteins in vertebrate genomes. One of these new SNAREs (SNAP-47) is a Qbc-SNARE and has been described recently by our group (Holt et al., 2006). The other three are Qa-SNAREs (often referred to as syntaxins [Syx]). We named them Syx20, a homologue of Syx7 (Qa.III.b), and Syx19 and Syx21, both homologues of Syx11 (Qa.IV). Detailed information on all collected SNARE proteins can be found on our publicly accessible Web-based interface (http://bioinformatics.mpibpc.mpg.de/snare/).
A high-quality data set is indispensable for discriminating the different types of SNARE proteins within a cell and for pinpointing evolutionary changes within the repertoires. An assessment of our HMM profiles showed that they clearly separate nearly all true positives from false positives (Figure 2A). Furthermore, for each HMM profile we estimated, by a resampling approach, the positive predictive rate (PPR), that is, the fraction of predicted members that are true members, and the sensitivity, that is, the fraction of true members that are correctly found. All profiles achieved at least 95% PPR and, with one exception, at least 95% sensitivity (Figure 2B). Some profiles, such as Qa.II, reach 100% PPR and 100% sensitivity. The only profile that did not meet our goal is that of the Qb.III.d class, with a sensitivity of only 91%. As this group contains the smallest number of sequences, its sensitivity may rise above 95% as new sequences become available.
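One way to estimate PPR and sensitivity by resampling is repeated random hold-out: train on part of the labeled sequences, classify the rest, and pool the counts over rounds. The text does not specify the resampling scheme, so the number of rounds, the 20% hold-out fraction, and the `train` callable interface below are assumptions. In practice, `train` would rebuild the subgroup profiles from the training sequences and return a function that assigns the best-scoring subgroup, or None if nothing passes the cutoff.

```python
import random
from collections import defaultdict


def resampled_rates(labeled, train, rounds=10, holdout=0.2, seed=0):
    """Estimate per-class PPR and sensitivity by repeated random hold-out.

    labeled: list of (sequence, class_label) pairs.
    train:   callable that takes training pairs and returns a classifier,
             i.e. a function mapping a sequence to a label or None.
    Returns {label: (ppr, sensitivity)}.
    """
    rng = random.Random(seed)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for _ in range(rounds):
        shuffled = list(labeled)
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * holdout))
        test, training = shuffled[:n_test], shuffled[n_test:]
        classify = train(training)
        for sequence, truth in test:
            predicted = classify(sequence)
            if predicted == truth:
                tp[truth] += 1
            else:
                fn[truth] += 1          # the true class missed this member
                if predicted is not None:
                    fp[predicted] += 1  # another class claimed it wrongly
    rates = {}
    for label in set(tp) | set(fp) | set(fn):
        called = tp[label] + fp[label]
        members = tp[label] + fn[label]
        rates[label] = (tp[label] / called if called else 0.0,
                        tp[label] / members if members else 0.0)
    return rates


# Toy classifier that uses the first character of the "sequence" as its label.
data = [("A1", "A"), ("A2", "A"), ("B1", "B"), ("B2", "B")] * 5
print(resampled_rates(data, train=lambda pairs: (lambda seq: seq[0])))
```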
It is very likely that these subgroups established by bioinformatic analysis correspond to functional orthologues, but the sequence information is not sufficient to pinpoint the exact transport step of the different SNARE types. With the available biological information, we have therefore tentatively assigned the 20 subgroups to the main routes of vesicle trafficking of a “typical” eukaryotic cell (Table 1). In addition, following the structural QabcR rule, we have grouped them into basic SNARE units. We started by allocating the five Qa-SNARE subgroups, which are thought to denote the target organelle, to the major compartments. These compartments were numbered according to the arrangement of the secretory pathway: endoplasmic reticulum (ER; I), Golgi apparatus (II), trans-Golgi network (TGN; III.a), digestive endosomal compartments (III.b), and plasma membrane (IV). On the basis of biological information, we assigned the other SNAREs to the trafficking steps toward the respective target compartments. For example, the SNARE set thought to be involved in retrograde transport from the Golgi to the ER has been combined into group I (Qa.I: Syx18/Ufe1; Qb.I: Sec20; Qc.I: Use1; R.I: Sec22). The consecutive transport steps, from the ER to the Golgi and within the Golgi, are mediated by a single Qa.II-SNARE (Syx5/Sed5). The interacting Qb.II and Qc.II groups usually contain two factors each, suggesting that each factor participates in only one transport step. Furthermore, the interaction partners of the TGN-localized Qa.III.a group (Syx16/Tlg2) have not been firmly established. As several SNAREs have been reported to engage in promiscuous interactions (Banfield, 2001; Pelham, 2001; Jahn and Scheller, 2006), it seems possible that Syx16/Tlg2 can interact with the SNAREs of the adjacent Golgi apparatus (II) and the endosomal compartments (III.b and III.c).
It should also be kept in mind that, for some SNAREs, it is still debated in which trafficking step and in which complex they participate. Hence, the groups listed in Table 1 might reflect the predominantly formed complexes, but interactions with SNAREs of neighboring compartments are possible as well.
| | Qa group | Qb group | Qc group | R group |
| I. Endoplasmic reticulum (ER) | Qa.I | Qb.I | Qc.I | R.I |
| II. Golgi apparatus | Qa.II | Qb.II | Qc.II | R.II |
| | f: Sed5 | m/p: Membrin | Bet1 | |
| | p: Syp3 | f: Bos1 | | |
| | | m: Gos28 | m: Gs15 | |
| | | f: Gos1 | f: Sft1 | |
| III.a. trans-Golgi network (TGN) | Qa.III.a | | | |
| III.b. Endosomal compartments | Qa.III.b | Qb.III.b | Qc.III.b | R.III |
| Early | m: Syx7/13 | Vti1 | | m/p: Vamp7 |
| | f: Pep12 | | m: Syx6 | f: Nyv1 |
| | p: Syp2 | | f: Tlg1 | |
| IV. Secretion (+ Cytokinesis/Sporulation) | Qa.IV | Qb.IV (SNAP.b) | Qc.IV (SNAP.c) | R.IV |
| | m: Syx1 | 1. Helix of Qbc | 2. Helix of Qbc | m: Syb1 |
| | f: Sso | m: SNAP-25 | m: SNAP-25 | f: Snc1 |
| | p: Syp1 | f: Sec9 | f: Sec9 | |
Notably, the SNAREs of the secretory routes involving the ER (group I) and the Golgi apparatus (group II), but also the TGN (Syx16/Tlg2), rarely show an increase in the number of genes, even in multicellular organisms. The high conservation of these SNARE groups suggests that these trafficking routes in particular are highly preserved among all eukaryotes. In contrast, several SNARE proteins involved in endosomal trafficking seem to have adapted much more to the different demands and feeding types of different eukaryotes. The original exocytotic SNARE set comprises only three SNARE proteins: a Qa-SNARE (Qa.IV), an R-SNARE (R.IV), and a SNAP-25-like protein that combines one Qb- and one Qc-motif in one protein (Qb.IV and Qc.IV, “Qbc-SNARE”). In many organisms, however, the number of secretory SNAREs has increased.
Almost all SNARE groups found are highly conserved throughout all inspected species. Several eukaryotes, in particular fungi and green algae, possess a relatively simple SNARE repertoire, often consisting of only one member per subgroup. This strongly suggests that the 20 subgroups in Table 1 represent the basic set of SNAREs of the eukaryotic ancestor. Changes in the SNARE repertoire in different eukaryotic organisms appear to have arisen by multiplication and diversification within this original set. The high conservation of the basic SNARE groups suggests that the earliest eukaryotes were probably already equipped with an elaborate endomembrane system in which the main routes of vesicle trafficking were established. Interestingly, owing to genome duplications (Aury et al., 2006), the unicellular ciliate Paramecium tetraurelia, with its stunningly complex subcellular organization, holds even more SNARE proteins (∼70) than multicellular organisms such as higher green plants (e.g., 62 in Arabidopsis) or vertebrates (e.g., 41 in Homo). This indicates that an increase in the number of SNARE proteins is not necessarily associated with the emergence of multicellularity. Although the Paramecium sequences are often clearly more divergent, they still fit into our classification scheme. Thus, it seems likely that the more complex membrane trafficking pathways of Paramecium are built on the basic subcellular organization known from “typical” eukaryotic cells.
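Comparing repertoires in this way amounts to counting members per subgroup for each species. The sketch below flags subgroups with more than one member (candidate expansions) and reference subgroups with no member (candidate losses, or divergent members that the profiles miss). The input format, the protein names, and the shortened reference list in the example are hypothetical; a real comparison would use the full set of 20 subgroups.

```python
from collections import Counter


def summarize_repertoire(assignments, reference_subgroups):
    """Summarize one species' SNARE repertoire.

    assignments: iterable of (protein_name, subgroup) pairs for a single species.
    Returns per-subgroup counts, expanded subgroups (>1 member), and missing subgroups.
    """
    counts = Counter(subgroup for _, subgroup in assignments)
    expanded = sorted(s for s, n in counts.items() if n > 1)
    missing = sorted(set(reference_subgroups) - set(counts))
    return counts, expanded, missing


# Hypothetical protein names; reference list shortened for illustration.
species = [("protA", "Qa.I"), ("protB", "Qa.IV"), ("protC", "Qa.IV"), ("protD", "R.IV")]
counts, expanded, missing = summarize_repertoire(species, ["Qa.I", "Qa.II", "Qa.IV", "R.IV"])
print(expanded)  # ['Qa.IV']
print(missing)   # ['Qa.II']
```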
Finally, we constructed 11 phylogenetic trees, each containing the complete or nearly complete set of SNAREs of a different representative eukaryotic species, and one tree containing the SNAREs of all 11 species. The tree obtained for the trematode Schistosoma japonicum is shown in Figure 3. The tree of this “lower” animal represents well the general outline of the SNARE trees obtained for most eukaryotic species. All other trees can be found in the supplemental section of our Web site (http://bioinformatics.mpibpc.mpg.de/snare/). These trees confirm the fundamental split of SNAREs into the four main branches: Qa-, Qb-, Qc-, and R-SNAREs (Bock et al., 2001). These main groups very likely reflect the principal structural arrangement of SNARE complexes, strengthening the view that all SNAREs assemble into canonical four-helix bundle structures. Each of the four fundamental branches segregates into distinct subgroups (Figure 3), corresponding to our HMM profiles (Table 1). The root of each subgroup branch usually consists of the SNAREs involved in secretion (group IV). Assuming that the different SNARE sets in present-day eukaryotes evolved by duplication and diversification of a prototypic set, this suggests that the secretory SNAREs have remained relatively unchanged, whereas the SNAREs involved in ER transport (group I) seem to have diverged more.
The conserved mechanism and structure of SNARE proteins have long been recognized (reviewed in Hong, 2005; Jahn and Scheller, 2006), but a comprehensive classification has so far been lacking. Our first attempt to classify SNAREs, using psi-BLAST together with hierarchical clustering, gave clearly inferior accuracy. A recently published survey of the SNARE family using a clustering approach supports this impression (Yoshizawa et al., 2006): several well-known SNAREs were not found, and sequences that may not be orthologues were clustered together.
We systematically identified and classified SNAREs using HMM profiles and a phylogenetic analysis. Because orthology is a good predictor of function, our classification scheme provides valuable insights into the transport steps that the SNAREs catalyze. In addition, our classification can be used as a highly reliable method to assign a newly identified SNARE sequence to a distinct class. As a tool for cell biologists, we have implemented a Web-based interface to access our results and to classify new sequences according to our scheme.
SNAREs from Protists with More Derived Endomembrane Systems Fit Well into Our Classification
As most SNARE sequences in our study stem from genomes of animals, plants, and fungi, the current classification scheme is especially suited to classify SNAREs from these kingdoms. For several orthologous SNAREs, the tree also reflects the phylogenetic relationships of species within these kingdoms quite accurately. However, morphological and functional adaptations of trafficking routes and organelles, mostly of the endosomal and secretory pathways, also influenced the pattern of the SNARE tree. It has been proposed that the deepest eukaryotic divide places plants together with most protists on one side and fungi, animals, and some amoebae on the other (Philippe et al., 2000; Stechmann and Cavalier-Smith, 2002; Richards and Cavalier-Smith, 2005). If this is correct, comparisons of the SNARE sets of animals, fungi, and plants would be largely sufficient to outline the bauplan of the secretory apparatus of all eukaryotes. Nevertheless, in the phylogenetic tree containing all 11 species (Supplementary Material), SNAREs from some protists are often isolated. We believe that this will change as more sequences from other protists are included in our classification. This effect is already visible for the apicomplexans, for which several species were included (Plasmodium, Theileria, and Cryptosporidium). Similarly, SNAREs of the kinetoplastid parasites Trypanosoma brucei, T. cruzi, and Leishmania major usually congregated, reflecting their close relationship. In most protists with entire genomes available, we discovered a reasonable collection of SNARE proteins, supplementing the repertoires established so far (Dacks and Doolittle, 2002, 2004; Besteiro et al., 2006; Schilde et al., 2006; Yoshizawa et al., 2006; Ayong et al., 2007; Kissmehl et al., 2007). For example, for Plasmodium falciparum we found 22 SNAREs, with almost all subgroups represented. For L. major we discovered 28 SNARE sequences, but so far we have found only one type (Sec20, Qb.I) of the more divergent Q-SNAREs involved in ER transport (group I). Before drawing conclusions, however, one has to keep in mind that, although several protist genomes have been completely sequenced, the available data still appear somewhat fragmentary.
A number of contemporary anaerobic eukaryotes were until recently thought to primitively lack mitochondria and to possess a more primitive endomembrane system. For example, the parasitic protist Giardia was thought to lack a morphologically identifiable Golgi apparatus, and it was believed to have diverged before the acquisition of such organelles. However, Giardia was found not only to contain rudimentary mitochondria (Embley and Martin, 2006) but also to possess several key factors of membrane trafficking, among them SNARE proteins (Dacks and Doolittle, 2002, 2004; Marti et al., 2003a,b). Nevertheless, with only 15 detected SNAREs, Giardia has a clearly smaller set than most other unicellular eukaryotes. In principle, the SNAREs of Giardia are compatible with our classification; however, SNAREs belonging to group II, which is involved in Golgi transport, appear to be missing. Because Giardia does not possess a conventional Golgi apparatus (Marti et al., 2003a,b), one possible explanation is that an entire SNARE unit, and hence the vesicular trafficking route mediated by these SNAREs, has been lost in this organism. Another possibility, however, is that the group II SNARE proteins of Giardia are too divergent to be detected by our HMM profiles.
In contrast to the compact set in Giardia, we found 31 SNAREs in the genome of Entamoeba histolytica. Remarkably, the Q-SNAREs involved in ER transport (group I) appear to be absent from this organism. As the sequences of this SNARE group are often more divergent, we again cannot rule out that our current HMM profiles fail to detect them in this organism. Because we found at least one member of the group I SNAREs in all major groups of eukaryotes represented in the current data set, it is likely that this vesicular trafficking step was also present in the common ancestor.
Thus, although the SNAREs of several protozoan parasites often form long branches in our trees, our analysis does not support the notion that their endomembrane system is primitive; rather, its apparent simplicity might reflect degeneration associated with their parasitic lifestyle. Nevertheless, because biological data for these protist groups are sparse, our classification of protist SNAREs should be verified in the future.
Evolution of the SNARE Apparatus Is Linked to the Development of the Endomembrane System
Our analysis suggests that the 20 basic SNARE subgroups discriminated by our HMM profiles represent the SNARE repertoire of the eukaryotic ancestor. Almost certainly, these 20 SNARE types already operated in the eukaryotic ancestor in vesicular trafficking steps between the major intracellular compartments that are conserved in almost all contemporary eukaryotes. It has been proposed that the different intracellular compartments of a eukaryotic cell, along with the molecular machineries involved in vesicular trafficking, emerged by events of duplication and diversification of a simpler endomembrane system of a more primitive ancestor (Roger, 1999; Cavalier-Smith, 2002). Possibly, the development of the 20 basic SNARE types is closely intertwined with the development of different intracellular compartments. In fact, it had been discussed earlier that the machineries involved in vesicular trafficking appear to be conserved not only across the phylogeny of species but also across the different compartments of the cell (Bock et al., 2001), although the species included in that study were far from representing the entire eukaryotic diversity. In subsequent studies, the presence of different conserved syntaxin subfamilies (i.e., mostly Qa-SNAREs) in several diverse eukaryotes supported the notion that these SNAREs diverged early in eukaryotic evolution (Dacks and Doolittle, 2002, 2004). The early diversification of SNARE proteins is also corroborated by the fact that other, distantly related eukaryotes possess relatively conserved SNARE sets (Besteiro et al., 2006; Schilde et al., 2006; Yoshizawa et al., 2006; Ayong et al., 2007; Kissmehl et al., 2007; and our analysis).
Notably, our study substantiates the view that all SNARE proteins principally split into four major phylogenetic classes, Qa, Qb, Qc, and R (Bock et al., 2001), suggesting that the prototypic unit was composed of four different SNARE proteins able to assemble into a tight four-helix bundle between two fusing membranes. Thus, it is likely that during the early stages of eukaryotic evolution entire SNARE units, rather than single subunits, were duplicated. These prototypic SNARE units diverged afterward. In fact, we detected some patterns of coevolution in the different SNARE units, in particular in the Q-SNAREs of group I, but promiscuous interactions between different SNARE subunits may have precluded further diversification. The phylogenetic trees obtained from SNARE proteins of different eukaryotic species (Figure 3 and Supplementary Material), together with the relatively simple domain architecture of the SNAREs (Figure 1), permit some tentative speculation about the nature of a prototypic SNARE machinery. Because all ancestral R-SNARE types appear to contain an N-terminal domain with a profilin-like fold (Figure 1; Gonzalez et al., 2001; Tochio et al., 2001; Wen et al., 2006), they may have originated from a common ancestor that contained this N-terminal extension. Note that this domain has been lost in the secretory R.IV-SNAREs of fungi and animals. Similarly, all Qa-SNAREs exhibit a very similar domain architecture, carrying an N-terminal three-helix bundle (Fernandez et al., 1998; Lerman et al., 2000; Misura et al., 2000; Munson et al., 2000; Dulubova et al., 2001). Interestingly, several of the Qb- and Qc-SNAREs possess an N-terminal three-helix bundle as well (Antonin et al., 2002a; Misura et al., 2002; Fridmann-Sirkis et al., 2006). This hints at a common origin of all three main Q-SNARE groups from a prototypic Q-SNARE.
Hence, a scenario can be envisioned in which a trimer of prototypic Q-SNAREs on the target membrane originally interacted with a prototypic R-SNARE on the vesicular membrane. However, this partition between the two membranes has so far been established only for the secretory SNAREs, whereas in other trafficking steps the distribution of the four SNARE subunits is still heavily debated. It is therefore challenging to bring our data into line with biological knowledge. For example, a different notion of the prototypic SNARE unit might be supported by the fact that in the species trees (e.g., Figure 3) the four basic SNARE groups partition into two elementary groups, one containing the R- and Qb-SNAREs and one containing the Qa- and Qc-SNAREs. This split can be observed in species trees of animals, protists, and plants. Interestingly, each of the two main branches unites two diagonally opposite helices of the four-helix bundle of the SNARE complex. Notably, the two main branches generally exhibit a comparable topology of subgroup splits, possibly reflecting coevolution.
The two trees obtained from fungal species show a different split, but they can be reconciled with this pattern, as can be seen from the tree containing all 11 species (data set in Supplementary Material). In this tree, the support for the inner split is somewhat lower than in a tree without the fungal species (data not shown). A more detailed phylogenetic analysis of SNARE evolution within the different eukaryotic kingdoms would be of interest but is beyond the scope of this article.
Analogous themes of duplication and divergence have been described for other important factors of intracellular membrane trafficking and implicated in the evolution of the eukaryotic endomembrane system, for example, for the organelle-specific Rab GTPases (Pereira-Leal and Seabra, 2001; Jekely, 2003) and for tethering factors involved in vesicular trafficking (Koumandou et al., 2007). Similarly, coat protein–based budding of transport vesicles is mediated by different but homologous protein machineries at different donor organelles: COPI, COPII, and clathrin (McMahon and Mills, 2004). Related protein machineries have even been implicated in the establishment of the nuclear envelope (Devos et al., 2004; Mans et al., 2004) and the eukaryotic cilium (Jekely and Arendt, 2006). Very likely, these factors evolved from a prototypic unit that was able to generate areas of highly curved membranes. Together, the development of such prototypic protein machineries provided the raw material for the intricate evolution of the endomembrane system of the eukaryotic cell.
This article was published online ahead of print in MBC in Press (http://www.molbiolcell.org/cgi/doi/10.1091/mbc.E07-03-0193) on June 27, 2007.
We thank H. Schmidt for help with IQPNNI, W. Hordijk for the modifications to phyml, and H. D. Schmitt, G. Fischer von Mollard, F. Varoqueaux, P. Burkhardt, R. Jahn, U. Winter, and K. Wiederhold for critical reading of the manuscript. We are greatly indebted to D. Huson for generously supporting this project.