Molecular Biology of the Cell

Originally published as MBC in Press, 10.1091/mbc.E03-06-0443 on January 23, 2004

Vol. 15, Issue 4, 1487-1505, April 2004


Essay

Phylogenetic Analysis of Sec7-Domain–containing Arf Nucleotide Exchangers

Randal Cox *, Roberta J. Mason-Gamer †, Catherine L. Jackson ‡, and Nava Segev §‖

* Departments of Biochemistry and Molecular Genetics, Laboratory for Molecular Biology, University of Illinois at Chicago, Chicago, Illinois 60607; † Department of Biological Sciences, Section of Ecology and Evolution, Laboratory for Molecular Biology, University of Illinois at Chicago, Chicago, Illinois 60607; § Department of Biological Sciences, Laboratory for Molecular Biology, University of Illinois at Chicago, Chicago, Illinois 60607; and ‡ Cell Biology and Metabolism Branch, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892

Submitted June 26, 2003; Revised December 29, 2003; Accepted December 31, 2003
Monitoring Editor: Thomas Pollard


ABSTRACT
The eukaryotic family of ADP-ribosylation factor (Arf) GTPases plays a key role in the regulation of protein trafficking, and guanine-nucleotide exchange is crucial for Arf function. Exchange is stimulated by members of another family of proteins characterized by a 200-amino acid Sec7 domain, which alone is sufficient to catalyze exchange on Arf. Here, we analyzed the phylogeny of Sec7-domain–containing proteins in seven model organisms, representing fungi, plants, and animals. The phylogenetic tree has seven main groups, of which two include members from all seven model systems. Three groups are specific for animals, whereas two are specific for fungi. Based on this grouping, we propose a phylogenetically consistent set of names for members of the Sec7-domain family. Each group, except for one, contains proteins with known Arf exchange activity, implying that all members of this family have this activity. Contrary to the current convention, the sensitivity of Arf exchange activity to the inhibitor brefeldin A probably cannot be predicted by group membership. Multiple alignment reveals group-specific domains outside the Sec7 domain and a set of highly conserved amino acids within it. Determination of the importance of these conserved elements in Arf exchange activity and other cellular functions is now possible.



INTRODUCTION
Arf GTPases are conserved molecular switches that regulate vesicular transport in all eukaryotic cells (Donaldson and Jackson, 2000; Randazzo et al., 2000). In this process, membrane-bound vesicles bud from one intracellular compartment and fuse with another. These vesicles mediate the different protein transport steps of the exocytic and endocytic pathways. The Arf family has been studied extensively in two model systems: yeast and mammals (Table 1). Saccharomyces cerevisiae has five Arf or Arf-related (Arl) proteins. The Arf1/Arf2 pair is required for cell viability and is implicated in all steps of the exocytic and endocytic pathways (Stearns et al., 1990; Gaynor et al., 1998; Yahara et al., 2001). The other three proteins, Arf3/Arl2, Arl1, and Arl3, are not required for cell viability, and their functions are less clear. The mammalian Arfs fall into three classes, I–III. Whereas class I Arfs are involved in transport through the exocytic pathway, class III is implicated in endocytosis and in actin dynamics at the plasma membrane (D'Souza-Schorey et al., 1995; Peters et al., 1995; Radhakrishna and Donaldson, 1997). Little is known about the roles of the class II Arfs. In addition, a number of mammalian Arls share ~35–55% identity with the Arfs (Proteome BioKnowledge Library, http://www.incyte.com/sequence/proteome/index.shtml, accessed 5/1/2003). The most extensively studied Arfs, yeast Arf1/2 and the mammalian class I Arfs, share ~80% identity and are thought to be important for vesicle budding (Randazzo et al., 2000).


Table 1. Substrate specificity of Arf GEFs

 

Switching between the GDP- and GTP-bound forms is crucial for Arf function (Dascher and Balch, 1994; Kahn et al., 1995). The switch from the GDP- to the GTP-bound form is stimulated by guanine-nucleotide exchange factors (GEFs), whereas GTPase-activating proteins stimulate the switch from the GTP- to the GDP-bound form. All Arf GEFs identified to date have a Sec7 domain (Jackson and Casanova, 2000), a 200-amino acid region that is sufficient for GEF activity (Chardin et al., 1996; Jones et al., 1999). The phylogeny of the Sec7-domain protein family is the subject of this study. The multiplicity of Arfs and Sec7-domain proteins suggests that different Sec7-domain proteins have substrate preferences toward specific Arfs. In yeast, substrate specificity has not been addressed, because only Arf1/2 have been tested as substrates for the Sec7-domain proteins. Mammalian Sec7-domain proteins have known substrate preferences in vitro (Table 1). For example, BIG1 acts preferentially as a GEF on class I Arfs and Golgi-specific brefeldin A (BFA) resistance factor 1 (GBF1) on class II, whereas BRAG2/Arf-GEP100 and EFA6 act on the class III Arf6. The cytohesins (CYH) can act on both class I and class III Arfs, but the Sec7 domains of CYH1 and ARNO/CYH2 do not catalyze exchange on Arl1–3 and Arl3, respectively (Chardin et al., 1996; Pacheco-Rodriguez et al., 1998). Whether this in vitro substrate preference is physiologically relevant is not clear. Different Sec7-domain proteins are known to exhibit different patterns of intracellular localization and biological function (Jackson and Casanova, 2000; Bonifacino and Jackson, 2003).

The three-dimensional structures of the Sec7 domains of several GEF proteins have been solved, both alone and in complex with Arf. The Arf and Arl proteins are unique among GTPases in having two regions that change conformation upon release of GDP. As in all GTPases, Switch I and Switch II undergo a major conformational change that allows effectors to bind these regions. Unique to the Arf/Arl subfamily is a second conformational change, which extends to the myristoylated N-terminal amphipathic helix and results in attachment of the GTPase to membranes (Randazzo et al., 1995; Antonny et al., 1997; Pasqualato et al., 2002). Therefore, Arf nucleotide switching also determines the cycling of Arf between the cytoplasm and membranes. A possible exception to this rule is Arf6, which under certain conditions seems to be constitutively bound to membranes (Cavenagh et al., 1996; Yang et al., 1998; Gaschet and Hsu, 1999). If so, GTP binding induces a conformational change in Arf6 without affecting its membrane association.

The 10 α-helices and connecting loops of the Sec7 domains of S. cerevisiae Gea2p and Homo sapiens ARNO/CYH2 are organized into two subdomains with a hydrophobic groove between them (Cherfils et al., 1998; Goldberg, 1998). In the complex of the Sec7 domain with nucleotide-free Arf, the two switch regions of Arf are positioned in the hydrophobic groove (Goldberg, 1998). Superimposing the unbound and Arf-bound Sec7 domain of Gea2p suggests that both proteins are flexible and that each alters the other's structure upon binding (Renault et al., 2002). Binding of Arf-GDP to the Sec7 domain closes the Sec7-domain hydrophobic groove. This closure causes the catalytic glutamate to protrude from the Sec7 domain, which in turn causes Arf to lose its nucleotide. The nucleotide-free form of Arf is already in a conformation very close to that of the GTP-bound form. In the GDP-bound form, Arf is largely soluble, with its amphipathic N-terminal helix tucked into a hydrophobic pocket of the protein. In the Arf–Sec7 domain complex, this pocket disappears because of movement of the β2 and β3 strands and the loop between them, forcing the myristoylated N-terminal α-helix to swing out and insert into the membrane (Beraud-Dufour et al., 1999; Pasqualato et al., 2002).

BFA, a fungal metabolite, has been widely used as a drug for studying protein transport. The main cellular targets of BFA have been identified as a subset of the Arf GEFs (Donaldson et al., 1992; Helms and Rothman, 1992; Randazzo et al., 1993; Peyroche et al., 1999). BFA blocks the GEF activity of the Sec7 domain itself (Sata et al., 1998; Jones et al., 1999) through an unusual mechanism in which BFA binds to both the Sec7 domain and Arf (Mansour et al., 1999; Peyroche et al., 1999). The target of BFA is an early intermediate in the exchange reaction, an Arf-GDP–Sec7 protein complex (Robineau et al., 2000). BFA binding interferes with closure of the Sec7-domain groove, freezing the Sec7 domain–Arf-GDP complex (Peyroche et al., 1999; Renault et al., 2002). The convention in the field is that small Sec7-domain proteins, e.g., H. sapiens CYH1, are BFA resistant, whereas large Sec7-domain proteins, e.g., BIG and GEA, are BFA sensitive (Jackson and Casanova, 2000). The analysis presented here challenges the generalization of this convention to other systems.

Phylogenetic analyses are now possible owing to the increasing number of fully sequenced genomes. The protein transport machinery and mechanisms are highly conserved from yeast to mammals (Mellman and Warren, 2000). Therefore, a phylogenetic analysis of its components should provide a basis for extrapolating what is known about the mechanisms in one model system to others. The phylogenetic analysis of the Sec7 proteins presented here places them in seven groups. Based on the distribution of proteins in these groups and the known intracellular functions of some Sec7-domain proteins, predictions can be made regarding the biological functions of other members of the same group in other organisms. Multiple alignments of the Sec7 domain suggest that both Arf binding and the conserved structure are crucial for the function of this domain. Multiple alignments of whole proteins reveal group-specific homologies, as well as a number of cross-group homologies, outside the Sec7 domain. The group-specific domains suggest that members of individual groups share group-specific functions, which might include intracellular localization and intermolecular interactions, in addition to their primary Arf-GEF activity.


MATERIALS AND METHODS
Databases
Sequences of the completely finished and curated proteomes of eight eukaryotes, 16 archaea, and 77 bacteria were acquired from the European Bioinformatics Institute (EBI) in April 2003 (Pruess et al., 2003). The annotated protein sequences from assembly version 3 of the Neurospora crassa genome were obtained from the Whitehead Institute Center for Genome Research (Galagan et al., 2003). Annotated protein sequences for the Legionella pneumophila genome were obtained from the Columbia Genome Center (Russo, 2003). The whole-genome nucleotide sequences of Drosophila melanogaster (Celniker et al., 2002), Caenorhabditis elegans (C. elegans Sequencing Consortium, 1998), H. sapiens (Lander et al., 2001), and 136 microbial genomes were obtained from the National Center for Biotechnology Information. Additional names and annotations were derived from the nonredundant protein database at the National Center for Biotechnology Information (Benson et al., 2003).

Identification of Sec7-Domain Proteins
Sec7-domain–containing proteins were identified in the annotated proteome databases by iteratively comparing these databases with known Sec7-domain sequences in three rounds of BLAST searches (Altschul et al., 1997) with an expectation value cutoff of 10⁻². In the first round, protein BLAST detected 69 sequences homologous to the Sec7 domain of the S. cerevisiae protein Sec7 (183 amino acids, from 838-FNNK to HQAM-1020), which has been shown to possess Arf-GEF activity (Jones et al., 1999). Custom Perl scripts trimmed each of these BLAST hits to just the region homologous to the query sequence. The original sequence plus the homologous hits served as query sequences for a second round of protein BLAST, which revealed nine more hits, for a total of 79 Sec7-domain–containing proteins. Subsequent rounds of iterative BLAST searching revealed no additional hits in the protein database. To ensure that no Sec7-domain–containing sequences were omitted during proteome curation at EBI and elsewhere, the entire published genomes of D. melanogaster, C. elegans, and H. sapiens were translated in all six reading frames using the standard (universal) genetic code. All open reading frames (ORFs) longer than 30 amino acids were searched with BLAST against the Sec7-domain–containing proteins defined above. None of the resulting hits from the translated ORFs revealed any novel sequences.
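The six-frame translation check described above can be illustrated with a short sketch. This is not the authors' script: it assumes the standard genetic code (NCBI translation table 1), treats an ORF as any stop-to-stop stretch of more than 30 residues without requiring a start methionine, and the function names are hypothetical.

```python
from itertools import product
from typing import List

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_orfs(dna: str, min_len: int = 31) -> List[str]:
    """Stop-to-stop peptide segments of at least min_len residues from all six frames."""
    dna = dna.upper()
    orfs = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append(segment)
    return orfs
```

In a pipeline like the one described, each returned peptide would then be used as a BLAST query against the known Sec7-domain set.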

Full-length versions of each sequence were retrieved from the protein databases for all further work. Inspection of these sequences revealed many to be truncated duplicates of other sequences; only the longest version of each was retained. Sequences from the then-incomplete Mus musculus genome were removed, except for mouse FBX8 (see below), because in subsequent phylogenetic analysis they were sufficiently similar to H. sapiens sequences that they offered little additional information. The remaining 52 protein sequences were compared with protein BLAST against the nonredundant database at the National Center for Biotechnology Information (Benson et al., 2003) to identify the most recent National Center for Biotechnology Information names, annotations, and alternative names, all of which are recorded in Table 2. Every sequence in our database corresponded to an identical sequence (same length and residues) at the National Center for Biotechnology Information.
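Retaining only the longest version of each truncated duplicate could be done roughly as follows. This sketch assumes that a "truncated duplicate" is an exact substring of a longer sequence; real fragments may differ by a few residues, which this check would not catch. The function name is hypothetical.

```python
from typing import Dict


def drop_truncated_duplicates(seqs: Dict[str, str]) -> Dict[str, str]:
    """Keep a sequence only if it is not contained in an already kept (longer or identical) one."""
    kept: Dict[str, str] = {}
    # Longest first, so each fragment is compared with its full-length version.
    for name, seq in sorted(seqs.items(), key=lambda item: -len(item[1])):
        if not any(seq in longer for longer in kept.values()):
            kept[name] = seq
    return kept
```

The pairwise substring test is quadratic, which is unproblematic for a set of fewer than a hundred sequences.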


Table 2. Sec7-domain–containing proteins in seven eukaryotic model systems and bacteria

 



Figure 1. Unrooted phylogenetic tree relating the Sec7-domain Arf-GEF family members. (A) The tree was constructed by neighbor joining with core Sec7-domain protein sequences. Pie sections demarcate seven phylogenetically related groups of the Sec7 family, plus one in mammals and one in bacteria. The scale bar at lower left indicates distance. (B) Confidence measurements of phylogenetic tree branches. Branches are as in A, except that lengths no longer indicate phylogenetic distance and branch color indicates bootstrap support (1000 iterations). Colors range from red, indicating 50% bootstrap support, to green, indicating 100% support.

 
Construction of the Phylogenetic Trees
The 51 core Sec7-domain protein sequences from the seven model systems and bacteria were aligned with Clustal (Thompson et al., 1994) using the default settings for slow/accurate alignments (gap opening penalty of 10, gap extension penalty of 0.2, 30% delay for divergent sequences, gap separation distance of four, no end-gap separation, residue-specific penalties, and the Gonnet series protein weight matrix). The aligned core Sec7 sequences, defined by the iterative BLAST searching described above, were manually trimmed at the N- and C-terminal ends to remove weak or ambiguous alignments. Phylogenetic analysis was performed with PAUP 4.0b10 (Swofford, 2003), using distances based on a globular protein substitution matrix (Whelan and Goldman, 2001). Distance trees were obtained by heuristic search for optimal trees: the initial tree was built by neighbor joining, and branch swapping used the tree-bisection-reconnection (TBR) algorithm. To estimate the reliability of this tree, bootstrap analysis was performed with 1000 replicates of full heuristic searches, using the same weighting parameters as in the initial analysis. The distance and bootstrap trees were drawn with TreeView version 1.6.6 (Page, 1996) and manually modified in a general-purpose graphics editor. For the bootstrap results, branches were color coded from red to green to reflect their presence in 50–100% of the bootstrap trees.
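The neighbor-joining step that produced the starting tree can be sketched as below. This shows only the classic Saitou–Nei agglomeration on a precomputed distance matrix; it does not reproduce PAUP's matrix-based distance correction or the TBR branch swapping that followed, and the function name and example matrix are illustrative.

```python
from itertools import combinations
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Return an unrooted neighbor-joining topology as Newick (no branch lengths).

    names: taxon labels; dist: symmetric distance matrix in the same order.
    """
    # Distances keyed by (label, label); each merged node gets a new label.
    d = {(a, b): dist[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r(x): summed distance from x to every other active node.
        r = {x: sum(d[x, y] for y in nodes) for x in nodes}
        # Q criterion: join the pair minimizing (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        u = f"({a},{b})"
        # Distance from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[u, c] = d[c, u] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        d[u, u] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # The last three nodes meet at one central node of the unrooted tree.
    return "(" + ",".join(nodes) + ");"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, matrix))  # (d,e,(c,(a,b)));
```

Ties in the Q criterion are broken by iteration order here; in the example, either tied join yields the same unrooted topology.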

Multiple Alignments within and across Groups
Alignment illustrations were based on the Clustal alignments described above and generated by a custom Perl script to compare the quality of alignments between and within groups of Sec7 family proteins. Each Sec7 family group was assigned a primary hue on the HSL color wheel, from red to purple, matching the colors used in Figure 1. For a given group of proteins, the alignment was divided into segments according to cross-family alignments (e.g., a run of CYH-only alignment, a run of CYH and BIG alignment, etc.). For regions where at least half of the group members had no aligned sequence, gray boxes were drawn in the middle of the region, sized to reflect the average length of the unrelated sequence in the members that had it.

For each remaining region, both intra- and intergroup similarity scores were computed. Intragroup scores were the PAM250 (Altschul et al., 1997) score averaged over all possible pairs of group members. Intergroup similarities were computed as the average PAM250 score over every possible pair consisting of one member from each of the two groups. Each comparison generated a bar colored with the primary hue of the target group, with an intensity reflecting the quality of the alignment. Average PAM250 scores below zero are drawn in black; positive scores are drawn in the target group's hue, with saturation scaled from 0 to 100% for PAM250 scores from 0 to 3. The pleckstrin homology (PH), coiled-coil (CC), and F-box domains were identified with the hidden Markov models used by SMART (Schultz et al., 1998; Letunic et al., 2002).
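The intra- and intergroup scoring and the color mapping can be sketched as follows. Two assumptions are labeled here: a pair's score is taken as the mean substitution score over columns where neither sequence has a gap, and `toy_score` is a hypothetical stand-in (+3 identity, -1 otherwise) used only because the full PAM250 table is too long to inline. Substitute a real PAM250 lookup for actual use.

```python
from itertools import combinations, product
from statistics import mean
from typing import Callable, List, Union

Score = Callable[[str, str], float]


def pair_score(x: str, y: str, score: Score) -> float:
    """Mean substitution score over aligned columns where neither sequence has a gap."""
    cols = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    return mean(score(a, b) for a, b in cols) if cols else 0.0


def intragroup(group: List[str], score: Score) -> float:
    """Average over all pairs within one group (needs at least two members)."""
    return mean(pair_score(x, y, score) for x, y in combinations(group, 2))


def intergroup(g1: List[str], g2: List[str], score: Score) -> float:
    """Average over all pairs with one member from each group."""
    return mean(pair_score(x, y, score) for x, y in product(g1, g2))


def bar_color(avg: float) -> Union[str, float]:
    """'black' below zero; otherwise saturation in [0, 1] (score 0 -> 0%, score >= 3 -> 100%)."""
    if avg < 0:
        return "black"
    return min(avg / 3.0, 1.0)


def toy_score(a: str, b: str) -> float:
    """Hypothetical stand-in for PAM250: +3 for identity, -1 otherwise."""
    return 3.0 if a == b else -1.0
```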


RESULTS
Identification of Sec7-Domain–containing Proteins
We searched the fully sequenced and annotated genomes of seven eukaryotic model systems spanning fungi, animals, and plants for Sec7-domain–containing proteins. The fungi are S. cerevisiae (Sc), Schizosaccharomyces pombe (Sp), and N. crassa (Nc); Arabidopsis thaliana (At) represents the plants; and the animals are C. elegans (Ce), D. melanogaster (Dm), and H. sapiens (Hs). We chose only completely sequenced and well-curated genomes to be reasonably sure that the resulting trees contain all Sec7-domain proteins from each organism. This allows us to draw conclusions about which groups are present or absent in those organisms. An iterative BLAST approach detected all sequences containing a Sec7 domain in the proteome databases, no matter how far diverged from one another. We used the Sec7 domain of S. cerevisiae Sec7p as the seed sequence because it has been shown to possess Arf-GEF activity on its own (Jones et al., 1999). Briefly, sequences homologous to this seed at an expectation value of 10⁻² were trimmed to the core Sec7 domain and used as BLAST queries against the databases; each round of new hits was searched in turn until no new sequences were found (see MATERIALS AND METHODS).
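The iterate-until-no-new-hits logic can be sketched independently of any particular BLAST installation. Here `search` is a hypothetical callable standing in for a BLAST run that has already applied the E-value cutoff and trimmed each hit to its homologous region; only its return type (hit ID mapped to trimmed domain sequence) is assumed.

```python
from typing import Callable, Dict

# Hypothetical: search(query) -> {hit_id: trimmed_domain_sequence}.
SearchFn = Callable[[str], Dict[str, str]]


def iterate_to_convergence(seed: str, search: SearchFn,
                           max_rounds: int = 20) -> Dict[str, str]:
    """Collect every hit reachable from the seed by repeated searching."""
    found: Dict[str, str] = {}
    queries = [seed]
    for _ in range(max_rounds):
        new: Dict[str, str] = {}
        for query in queries:
            for hit_id, domain in search(query).items():
                if hit_id not in found and hit_id not in new:
                    new[hit_id] = domain
        if not new:  # converged: the last round added nothing
            break
        found.update(new)
        # Only newly found domains need to be searched in the next round.
        queries = list(new.values())
    return found
```

Re-querying only the new domains is equivalent to re-running all queries each round, provided the search itself is deterministic.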

This method revealed a total of 49 Sec7-domain–containing protein sequences in the seven eukaryotic model systems (Table 2; http://www.uic.edu/~nava/papers/SEC7). Most of these sequences have not been well described in the literature; more than half have only automatically generated names. Of the well-studied ones, most are from S. cerevisiae or mammals. Table 2 proposes a consistent set of names for all identified Sec7-domain proteins based on the phylogenetic analysis shown in Figure 1A, using the published names of related, well-studied members of each group as templates. All seven searched eukaryotic model systems have Sec7-domain proteins. Interestingly, one fully sequenced eukaryotic genome, the nucleomorph (the vestigial nucleus of an algal endosymbiont) of Guillardia theta, has no recognizable Sec7-domain protein. However, evolutionary compaction has eliminated most of the genome of this nucleomorph (Douglas et al., 2001).

The A. thaliana BIG5 sequence found at the European Bioinformatics Institute has a 26-amino acid deletion relative to the other A. thaliana BIGs. This deletion corresponds to most of helix α7 and includes the catalytic glutamate and three other highly conserved amino acids (sites h to j; Figure 4B). Such a deletion would be expected to change the structure of the Sec7 domain of this protein substantially and to abolish Arf GEF activity. In addition, the deletion places A. thaliana BIG5 in an anomalous position in the phylogenetic tree. We found the nucleotide sequence encoding the missing 26 amino acids in the genomic DNA immediately after the intron–exon boundary defined by TAIR (The Arabidopsis Information Resource, www.arabidopsis.org, accessed October 1, 2003). The missing sequence is almost perfectly conserved with that of the other A. thaliana BIGs. Even though its inclusion requires the use of an unconventional splice site, this region is immediately followed by a stop codon, as is often the case for used splice sites (Li et al., 2002). It is possible that A. thaliana BIG5 evolved from a typical GEF, had a splice site changed to abolish its catalytic activity, and now acts as a dominant negative protein. However, such a change must have happened very recently, given the high sequence conservation of the 26 amino acids. We therefore used the corrected A. thaliana BIG5 protein sequence for all further analyses. When these amino acids were included, A. thaliana BIG5 grouped together with the other BIGs on the phylogenetic tree.



Figure 4. Multiple alignments of the Sec7 domain arranged by group. (A) Visual representation of the multiple alignment strength in the Sec7 catalytic domain. As in Figure 3, except that all cross-group homologies, including those of limited quality, are shown as colored areas. In addition, the gray areas indicating regions with no intragroup sequence similarity are embedded in white spaces whose width indicates the maximum insert size. (B) Positions of the highly conserved (HC) amino acids identified here (further described in Table 4). (C) Positions of previously characterized mutations that affect Arf GEF activity. The top row indicates the wild-type sequence and the bottom row shows the target mutations. Color from red to green indicates the proportion (from 0 to 100%) of wild-type activity retained by the mutant. Mutations that abolish Arf binding are indicated by *. CYH1 mutant coordinates: E157K, V179A, Y187A, M195A, D207A, K208A, K208E, R219A, and E157A (Betz et al., 1998). ARNO/CYH2 mutant coordinates: E117K, R152E, E156K, M194A, and N201A (Cherfils et al., 1998). (D) Arf contacts. The positions of the 15 amino acids that contact Arf in the cocrystal (Goldberg, 1998). See coordinates in the legend to F. (E) BFA sensitivity consensus. The positions of the amino acids proposed here to define a consensus for BFA sensitivity (Table 5). See structural positions in G. (F) Arf contact sites. The Arf-binding surface in the hydrophobic groove is shown on the three-dimensional structure (determined by x-ray crystallography) of the Sec7 domain of S. cerevisiae GEA2. Numbered green solid arrows indicate the 10 predicted α-helices. Light blue indicates predicted loops. Bright blue sequences correspond to the Arf-contact sites positioned in the hydrophobic groove as defined in the Arf–Sec7 domain complex (Renault et al., 2002). Yellow indicates the catalytic glutamic acid. The N-terminal side of the groove is composed of helices α1–5, the glutamate finger loop, and the N-terminal half of helix α7. The C-terminal side includes helix α6, the C-terminal half of helix α7, and helices α8–10. Positions of the amino acids, in S. cerevisiae Gea2 coordinates, are R650, L651, G653, S655, Q656, I658, D696, F699, I700, Y703, I706, M707, D711, and V717. (G) BFA sensitivity consensus. As in F, except that yellow sequences correspond to amino acids that are predicted to define BFA sensitivity, shown in E and Table 5. Positions of the consensus sites, in S. cerevisiae Gea2 coordinates, are Y703, S704, M707, D711, and M721. (H) HC amino acids. HC amino acids are positioned on both sides of the Sec7 domain. Two views of the Sec7 domain, rotated 180° relative to each other, are shown: front and back. Red indicates the HC sites shown in B and Table 4. Limited overlap (4/16) exists between HC amino acids and the Arf contact sites of the Sec7 domain.

 



Figure 3. Multiple alignments of whole proteins arranged by group. Visual representation of the multiple alignment strength in the Sec7-domain–containing proteins. Each group is shown in a different color and row. For a given row, colored bands indicate the strength of the alignment between members of that group and each other group for which there is significant alignment. For example, the orange areas in the BIG/SEC7 row indicate intragroup homologies, whereas the yellow areas represent BIG/SEC7-GBF/GEA intergroup homologies. Color intensities indicate average PAM scores between all pairs of sequences in the comparison, with gray indicating no sequence similarity (0 or less) to bright colors indicating strong sequence similarity (3). Tall gray areas indicate regions with no sequence similarity even within a group. The width of these gray areas is proportional to the average insert size. Boxes mark areas of large cross-group sequence similarity, including the central Sec7 domain. Names are attached to regions of sequence similarity either within or across groups. Actual sequences corresponding to these regions can be viewed at http://www.uic.edu/~nava/papers/SEC7.

 

Table 4. HC amino acids in the Sec7 domain in all Sec7-domain proteins

 

Table 5. BFA sensitivity as determined by alignment

 
We also searched the fully annotated proteomes of 77 bacteria and 16 archaea. Archaea are considered to share a common ancestor with eukaryotes (Iwabe et al., 1989; Brown and Doolittle, 1995; Baldauf et al., 1996), which could suggest that they have related proteins. However, no archaeal Sec7-domain proteins were found, implying that the Sec7-domain family evolved in eukaryotes after their divergence from the archaea. Instead, two bacterial proteomes, both from parasites of eukaryotes, each contain a single Sec7-domain protein: L. pneumophila (Lp) (Nagai et al., 2002) and Rickettsia prowazekii (Rp). Furthermore, a protein BLAST search against 136 translated microbial genomes at the National Center for Biotechnology Information (as of May 2003) revealed no hits to the Sec7 domain of S. cerevisiae Sec7 with E < 10⁻², except for the R. prowazekii protein already in our database (the L. pneumophila genome was not yet in the National Center for Biotechnology Information database). Both L. pneumophila and R. prowazekii belong to the phylum Proteobacteria. However, 23 other Proteobacteria, notably including Rickettsia conorii, a close relative of R. prowazekii, do not have Sec7-domain proteins. Based on these findings and the fact that both species are parasites of eukaryotes, it seems clear that the BACT group resulted from horizontal transfer, as was suggested previously for L. pneumophila RalF (Nagai et al., 2002).

Phylogenetic Tree for Sec7-Domain Proteins
Figure 1A shows a distance tree of the 52 identified Sec7-domain proteins (49 from model eukaryotic organisms, a single mouse protein noted below, and two bacterial proteins), constructed from the core Sec7-domain protein sequences with the programs described in MATERIALS AND METHODS. We used only the core sequence because tree construction is based on sequence alignment, and there is very little cross-group sequence similarity outside the Sec7 domain (Figure 3). The tree is unrooted because we could not find any sequences that are related, even at an extremely lenient BLAST cutoff (E < 10⁻²), and that are unambiguously diverged from all seven model eukaryotes.

The colored pie sections delineate groups of the Sec7 family. The rationale for this grouping comes from the topology of the tree and is based on known phylogenetic relationships (e.g., all animal or fungi members of a particular group are clustered together). Further justification of this grouping comes from the findings that the branches at the base of each group, except GBF/GEA, are strong (Figure 1B) and that the members of each group, including GBF/GEA, share similar overall structure (Figure 3). Two groups, BIG/SEC7 and GBF/GEA, include members from fungi, plants, and animals. Three groups, CYT, BRAG, and EFA, are specific for animals. Two groups, SYT1 and SYT2, are specific for fungi. One outlier is the human FBX8, which is found at the base of the tree, between the BRAG and the SYT/EFA branches. This protein is not included in another group because it does not meet the above-mentioned three considerations: phylogenetic relationship, strong branch position, and similar structure (discussed further below with respect to Figure 2). An almost identical protein is present in M. musculus, shown in Figure 1 for comparison.



Figure 2. Distribution of Sec7-domain Arf-GEFs in different organisms. Trees are as in Figure 1, except that the thick black lines show the members of the family in each organism. All fungi have Sec7 proteins in the same four groups. Animals each have members in five groups. Plants have members in just two groups. F, the human and mouse genomes also contain a novel gene, FBX8, which is only weakly related to any other group. B, in addition to the eukaryotic sequences, two related bacterial sequences were also found, which are not closely related to any other group.

 

The two bacterial Sec7-domain proteins branch from the center of the tree, not especially close to any other group. In addition, these two proteins show no substantial structural similarity to the eukaryotic Sec7-domain proteins outside the Sec7 domain, even though they share extensive similarity with each other (Figure 3). We have not used these bacterial sequences to root the eukaryotic tree, for two reasons. First, the absence of Sec7-domain proteins from most bacteria suggests horizontal transfer from eukaryotes, rather than evolution of the eukaryotic Sec7-domain family from a bacterial progenitor (see above). Second, the fact that the bacterial Sec7-domain proteins are not close to any of the eukaryotic groups does not necessarily mean that the horizontal transfer occurred before the divergence of the eukaryotic groups; it may instead reflect very different rates of evolution in the bacterial and eukaryotic proteins. Therefore, an unrooted tree is the most reasonable representation of the relationships among these Sec7-domain proteins.

Bootstrap Analysis of the Sec7-Domain Tree
To assess the reliability of the branch positions, we performed a distance-based bootstrap analysis in PAUP with 1000 replicates. This method builds many trees from pseudo-replicate alignments generated by randomly resampling alignment positions (amino acid sites) with replacement. Branches that are supported by few characters, or by characters that change multiple times, appear in only a small fraction of these trees, whereas robustly supported branches appear in a high percentage of them. Figure 1B shows the strength of each branch as the fraction of trees containing it.
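The resampling step described above can be illustrated with a small Python sketch. It assumes a toy, gap-free alignment held in a dictionary and uses a deliberately simplified support criterion: instead of rebuilding a full distance tree for every replicate, as PAUP does, it only counts the replicates in which two sequences are closer to each other (by uncorrected p-distance) than either is to any other sequence. The function names and toy sequences are hypothetical and serve only to show the resampling logic.

```python
import random
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of ungapped aligned sites that differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def resample_columns(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-replicate by drawing alignment columns with replacement."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}


def pair_support(aln: Dict[str, str], a: str, b: str,
                 replicates: int = 1000, seed: int = 0) -> float:
    """Fraction of replicates in which a and b are closer to each other than
    either is to any other sequence (a simplified stand-in for the fraction
    of bootstrap trees that contain the a+b branch)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(aln, rng)
        d_ab = p_distance(rep[a], rep[b])
        others = [n for n in rep if n not in (a, b)]
        if all(d_ab < p_distance(rep[a], rep[o]) and d_ab < p_distance(rep[b], rep[o])
               for o in others):
            hits += 1
    return hits / replicates


# Hypothetical toy alignment: seqA/seqB and seqC/seqD form two close pairs.
TOY = {
    "seqA": "MKLVEGARDTLQ",
    "seqB": "MKLVEGARDSLQ",
    "seqC": "MRIAEGSKNTFE",
    "seqD": "MRIAQGSKNTFE",
}

if __name__ == "__main__":
    print(pair_support(TOY, "seqA", "seqB"))  # expected to be high
    print(pair_support(TOY, "seqA", "seqC"))  # expected to be low
```

Fixing the random seed makes a replicate set reproducible, which is convenient when comparing support values across runs.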

In the bootstrap analysis of the Sec7-domain protein tree, many branches, especially within groups, show strong bootstrap support (>90%). Furthermore, the branches at the bases of the groups, except for GBF/GEA, all show 84–100% bootstrap support. Membership of the A. thaliana GBFs in the GBF/GEA group is only weakly supported (53%); however, their sequence similarity to the other GBF/GEA members outside the Sec7 domain (Figure 3) indicates that they belong to this group. We conclude that the seven groups shown in Figure 1 are well defined.

The center of the tree has weak bootstrap support (28–78%). The central branches are relatively short, and their poor support probably reflects the difficulty of resolving short internal branches with a limited amount of data (e.g., Saitou and Nei, 1986). Consequently, the exact relationships among the groups remain uncertain. The branch that separates the animal EFA and the fungal SYT groups from the other groups has 78% bootstrap support, suggesting that these groups are related. However, we do not define the animal EFAs and the fungal SYTs as one group, EFA/SYT. Unlike the two groups that do include both fungal and animal members, BIG/SEC7 and GBF/GEA, the EFAs and the SYTs share no sequence similarity either outside the Sec7 domain or at the N terminus of the Sec7 domain (helices α2 and α3), where they diverge from the other groups (Figures 3 and 4A).

Distribution of the Sec7-Domain Proteins in the Seven Model Systems
Figure 2 and Table 3 show the occurrence of the Sec7-domain proteins in the seven model systems analyzed here, with respect to the groups defined in Figure 1. Two groups, BIG/SEC7 and GBF/GEA, are present in all the model systems. The three fungi have four to seven Sec7-domain proteins representing four groups, two of which, SYT1 and SYT2, are fungal specific. Animals have five to 15 Sec7-domain proteins from five groups, three of which, CYH, BRAG, and EFA, are animal specific. N. crassa, C. elegans, and D. melanogaster each have a single representative of each group common to their respective kingdoms, whereas S. cerevisiae, S. pombe, and H. sapiens show some apparently recent duplications. The A. thaliana genome contains only BIG/SEC7 and GBF/GEA GEFs, with multiple duplicates of each. The two bacterial sequences form a separate group, as judged by BLAST, bootstrapping, and structural analysis.


Table 3. Summary of group members present in each organism

 

In addition to the five groups found in all animals, H. sapiens contains a novel member, FBX8. Almost identical FBX8 sequences are found in the human and mouse genomes, indicating that FBX8 is not a sequencing error in the H. sapiens genome. However, it is not found in the other model systems, including the animals C. elegans and D. melanogaster, despite a thorough search for potentially missed genes in unannotated ORFs (see MATERIALS AND METHODS). In the PAUP analysis, the two FBX8 proteins lie between the BRAG and EFA/SYT groups (Figure 1). However, because they are absent from lower animals, and because they differ in structure from all other Sec7-domain proteins, including those of the BRAG, EFA, SYT1, and SYT2 groups (Figure 3), we define them as a separate group. Sequencing the genomes of organisms that lie phylogenetically between D. melanogaster and H. sapiens should help determine when the FBX group arose during evolution.

Identification of Common and Group-specific Domains of Sec7-Domain Proteins
Multiple alignments of whole Sec7-domain proteins, arranged by group, are shown in Figure 3. Each group has extensive regions of sequence similarity shared by all members of the group across all model systems; these regions are shown in color (as opposed to gray). A few domains besides the Sec7 domain are common to several groups. Regions of sequence similarity common to all members of at least two groups are boxed, whereas regions common to all members of only one group are marked as group-specific domains. Sequences corresponding to these regions are viewable at http://www.uic.edu/~nava/papers/SEC7.

The seven eukaryotic groups of Sec7-domain proteins have a centrally located Sec7 domain flanked by a variable set of cross-group and group-specific domains. A PH domain occurs in five of the seven eukaryotic groups, as determined by hidden Markov models at SMART (Schultz et al., 1998). PH domains are important for the interaction of proteins with inositol phosphates and are thought to function in the association of proteins with specific membranes and/or in protein–protein interactions (Blomberg et al., 1999; Lemmon and Ferguson, 2001). However, although the PH domains are similar within each group, there is no significant BLAST sequence similarity between the PH domains of the five groups (at E < 10⁻²). The PH domains of representatives of the CYH and EFA groups have been shown to be functional (Chardin et al., 1996; Derrien et al., 2002). The putative PH domains in BRAG, SYT1, and SYT2 share no significant sequence similarity with each other or with those of the CYH or EFA groups, so further functional characterization of these domains is required. SMART also recognizes coiled-coil (CC) motifs in CYH and EFA, although these regions show no cross-group sequence similarity. The BIG/SEC7 and GBF/GEA groups share the common domains B/Gα, B/Gβ, and B/Gγ. In addition, there are many group-specific domains. Apart from the PH and CC domains, none of these cross-group or group-specific domains is similar to known motifs. Potential functions and binding partners for the cross-group and group-specific domains are discussed below.

The FBX and BACT groups have a C-terminal and an N-terminal Sec7 domain, respectively. The FBX proteins contain an F-box domain (identified by SMART), which is known to interact with ubiquitin ligases (Kipreos and Pagano, 2000). The BACTα domain lies on the carboxy side of the Sec7 domain. It shows no significant similarity (E < 10⁻²) to any sequence in the database (National Center for Biotechnology Information databases as of May 2003). However, BLAST does find weak alignments with several bacterial proteins of various identified functions, suggesting that this domain is of bacterial origin.

The group-specific and cross-group domains allow us to speculate about the evolution of the Sec7-domain family and the importance of these domains for their function (see DISCUSSION).

Sequence Similarity within the Sec7 Domain
A detailed multiple alignment of the Sec7 domains of the 52 identified Sec7-domain proteins, arranged by group, is shown in Figure 4A. For comparison, the crystal structure of the Sec7 domain of S. cerevisiae GEA2 is shown in Figure 4F (Renault et al., 2002). The domain contains 10 α-helices, although there is little or no sequence similarity in the first α-helix between different Sec7 groups. Interestingly, the breakpoints in the alignment shown in Figure 4A correspond almost exactly to the borders between α-helices (see DISCUSSION). The multiple alignments show excellent conservation in helices α4–α10 across all groups. Helices α2 and α3 are conserved across all groups, including FBX and BACT, with the following exceptions. Helix α2 is not conserved in EFA and the SYTs. Helix α3 is not conserved across groups for SYT1 and is only weakly conserved for SYT2. However, helix α3 is conserved within each of these two groups, that is, among all members of SYT1 and among all members of SYT2 (see DISCUSSION).

Sixteen amino acids that are highly conserved among at least 50 of the 52 Sec7-domain proteins identified here are shown in Figure 4B and Table 4. These highly conserved (HC) sites are labeled a to p. Also shown are amino acids in helices α6–α9 that were previously identified by mutagenesis as important for the interaction of the Sec7 domain of H. sapiens ARNO/CYH2 with Arf and for catalysis of exchange (Figure 4C) (Beraud-Dufour et al., 1998; Cherfils et al., 1998). Arf-contact sites, as identified by the Arf–Sec7 domain cocrystal structure derived from S. cerevisiae GEA2 (Goldberg, 1998), are shown in Figure 4, D and F (coordinates are derived from Renault et al., 2002). Most of the HC amino acids defined here (HC sites g to p) lie within the region of Sec7 domain–Arf contact, as delimited by the mutations and the cocrystal. One of these HC residues, site g (E654), is the catalytic glutamate, which has been shown to play a critical role in the exchange reaction by destabilizing the binding of Mg²⁺ and GDP to Arf1 (Beraud-Dufour et al., 1998). Only one of the GEFs, S. pombe SYT21, has a change at this position, to aspartic acid (see DISCUSSION).
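The HC criterion (the same residue in at least 50 of 52 aligned sequences) can be expressed as a scan over the columns of a multiple alignment. The Python sketch below is a minimal illustration under stated assumptions: it treats conservation as strict identity, never counts a gap as a conserved residue, and uses a hypothetical five-sequence toy alignment with a threshold of 4 in place of the real 52-sequence alignment with a threshold of 50. The function name is illustrative.

```python
from collections import Counter
from typing import List, Tuple


def highly_conserved_columns(seqs: List[str], min_count: int) -> List[Tuple[int, str, int]]:
    """Return (column, residue, count) for every alignment column whose most
    frequent non-gap residue occurs in at least `min_count` sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(length):
        counts = Counter(s[col] for s in seqs if s[col] != "-")
        if not counts:
            continue  # all-gap column
        residue, n = counts.most_common(1)[0]
        if n >= min_count:
            hits.append((col, residue, n))
    return hits


# Hypothetical toy alignment; with the 52 real sequences the threshold would be 50.
TOY_ALIGNMENT = ["AEGLK", "AEGIK", "SENLR", "AEDVK", "AQG-K"]

if __name__ == "__main__":
    print(highly_conserved_columns(TOY_ALIGNMENT, min_count=4))
    # [(0, 'A', 4), (1, 'E', 4), (4, 'K', 4)]
```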

Four of the 16 HC residues (sites g, k, m, and n) bind directly to Arf, as determined by the cocrystal structure. Mutations in two of these sites, in addition to the catalytic Glu residue, have been shown to significantly decrease Arf exchange activity. However, only these four amino acids are shared between the 16 HC amino acids identified here and the 15 Arf-contact sites of the Sec7 domain identified by the cocrystal structure. The remaining 11 contact sites are each conserved in at least 38 of the 52 sequences. This indicates that most of the amino acids in the Sec7 domain–Arf interface are less highly conserved and that most of the HC amino acids (12/16) fulfill other roles. Consistent with the latter point, the HC sites are not all on one surface of the Sec7 domain (Figure 4H). These HC amino acids are very likely to be crucial for the structure of the Sec7 domain as a whole (see DISCUSSION).
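The overlap arithmetic in this paragraph can be checked with Python sets. The HC site labels a to p follow Table 4, and the four shared sites g, k, m, and n come from the text; the 11 remaining Arf-contact sites are not named here, so this sketch represents them with hypothetical placeholder labels.

```python
import string

# HC sites a..p (16 labels, as in Table 4).
hc_sites = set(string.ascii_lowercase[:16])

# The four HC sites that also contact Arf in the cocrystal structure.
shared = {"g", "k", "m", "n"}

# 15 Arf-contact sites: the 4 shared ones plus 11 hypothetical placeholders.
contact_sites = shared | {f"contact_{i}" for i in range(1, 12)}

overlap = hc_sites & contact_sites        # HC sites at the Arf interface
hc_only = hc_sites - contact_sites        # HC sites presumably serving other roles
contact_only = contact_sites - hc_sites   # interface sites below the HC threshold

print(len(contact_sites), len(overlap), len(hc_only), len(contact_only))  # 15 4 12 11
print(f"{len(hc_only) / len(hc_sites):.0%} of HC sites lie outside the interface")  # 75%
```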

Eight of the 16 HC amino acids are conserved in all 52 Sec7-domain proteins identified here: HC site e in helix α6, site g in the α6–α7 loop (the catalytic site), sites h and i in helix α7, and sites l to o in helix α8 (Table 4). The other eight HC amino acids have one or two exceptions, as shown in Table 4. All the exceptions are in the fungal SYTs or in BACT. However, the SYTs and BACTs are probably functional Arf GEFs, because such activity has been shown for one member each of the SYT1 and BACT groups (Jones et al., 1999; Nagai et al., 2002). The threshold of conservation in 50 of 52 Sec7-domain proteins used to define HC sites is not arbitrary: for amino acids conserved in fewer than 50 proteins, the exceptions are distributed more randomly, including in groups other than the SYTs and the BACTs (our unpublished data).

BFA Sensitivity of the Different Sec7-Domain Protein Groups
The secretion-inhibiting drug BFA acts as an uncompetitive inhibitor of the exchange reaction, binding to a normally short-lived Arf–GDP–Sec7 domain complex (Mansour et al., 1999; Peyroche et al., 1999). However, only some Sec7 domains are direct targets of BFA (Chardin et al., 1996; Jones et al., 1999; Moss and Vaughan, 2002). Previous studies have identified amino acids within the Sec7 domain that confer BFA sensitivity or resistance. Five residues were identified whose mutation converted a BFA-sensitive Arf GEF into a BFA-resistant one, or vice versa, both in vitro and in vivo. In a BFA-sensitive GEF, these residues are YS-M-DM (Y703 S704-M707-D711 M721, S. cerevisiae Gea2p coordinates), in helix α8 and the α8–α9 loop (Figure 4, E and G). Based on these mutations, we suggest a consensus for BFA sensitivity of the Sec7 domain (Table 5A): the middle M, together with either the N-terminal YS or the C-terminal DM.
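The consensus can be written as a simple rule. The Python sketch below is illustrative only: the function name and the motif string format (the residues at positions equivalent to Gea2p Y703, S704, M707, D711, and M721, optionally hyphenated as in the text) are assumptions of this sketch, and residues outside the motif, as well as the Arf substrate used in an assay, may also influence the observed phenotype.

```python
def predict_bfa(motif: str) -> str:
    """Predict BFA sensitivity from the five consensus residues.

    `motif` lists the residues at positions equivalent to Gea2p
    Y703, S704, M707, D711 and M721, e.g. "YS-M-DM" or "YSMDM".
    Rule: sensitive if the middle residue is M and either the
    N-terminal pair is YS or the C-terminal pair is DM.
    """
    residues = motif.replace("-", "").upper()
    if len(residues) != 5:
        raise ValueError("expected exactly five residues")
    n_pair, middle, c_pair = residues[:2], residues[2], residues[3:]
    if middle == "M" and (n_pair == "YS" or c_pair == "DM"):
        return "sensitive"
    return "resistant"


if __name__ == "__main__":
    examples = {
        "Gea2p (YS-M-DM)": "YS-M-DM",
        "ARNO/CYH2 wild type (FA-M-SP)": "FA-M-SP",
        "ARNO/CYH2 FA-to-YS mutant": "YS-M-SP",
        "hypothetical M-to-L variant of YS-M-DM": "YS-L-DM",
        "H. sapiens GBF1 (YA-M-DM)": "YA-M-DM",
    }
    for name, motif in examples.items():
        print(f"{name}: {predict_bfa(motif)}")
```

As written, the rule calls GBF1 sensitive; that is the single case in which the reported phenotype (resistance on Arf5) differs from the prediction.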

This consensus is based on the following findings. First, the importance of the middle M residue for the BFA sensitivity of Gea2 and of the Sec7-domain protein of the malaria parasite Plasmodium falciparum was shown in screens for BFA-resistant mutants (M was changed to L or I). Moreover, the M-to-L mutant of the Gea1p Sec7 domain was resistant to BFA in vitro (Peyroche et al., 1999; Baumgartner et al., 2001). These findings indicate that the middle M residue is important for BFA sensitivity both in vivo and in vitro. Second, the roles of the N-terminal YS and the C-terminal DM were shown in a number of studies with the BFA-resistant CYHs, which normally have the sequence FA-M-SP (F190 A191-M194-S198 P208, ARNO/CYH2 coordinates). Mutating either the N-terminal FA of H. sapiens ARNO/CYH2 to YS, or the C-terminal SP of H. sapiens CYH1 to DM, converts these Sec7-domain proteins from BFA resistant to BFA sensitive (Peyroche et al., 1999; Sata et al., 1999). The strength of BFA binding to the ARNO/CYH2 Sec7 domain complex was measured using wild-type and mutant Sec7 domains. Binding of BFA to the wild-type ARNO/CYH2 Sec7 domain was undetectable, whereas the FA-to-YS and SP-to-DM double mutants each exhibited similar, detectable levels of BFA binding. The quadruple mutant bound twofold more BFA than either double mutant alone, and BFA dissociated from the complex 10-fold more slowly for the quadruple mutant than for either double mutant (Robineau et al., 2000).

Interestingly, the five amino acids of this consensus are positioned on one surface of the Sec7 domain, and three of them, Y703, M707, and D711, are also Arf-contact sites (Figure 4, F and G). BFA is large enough to cover all three contact points on helix α8 (Suh et al., 2002). Although other residues of the Sec7 domain may well contribute to the BFA sensitivity or resistance of a given Arf GEF, the importance of these five residues is well documented. Hence, we used the consensus above to predict the BFA sensitivity or resistance of the 52 Sec7-domain proteins identified here (Table 4B). This analysis simplifies the situation, because any given assay yields a range of BFA sensitivities, and "resistant" and "sensitive" represent points chosen near the two ends of that distribution.

Eleven of the 12 observed BFA phenotypes are as predicted by our consensus. In one case, however, our prediction does not match the reported phenotype: H. sapiens GBF1 has YA-M-DM and is predicted to be BFA sensitive because it has the middle M and the C-terminal DM, but its GEF activity on Arf5 was reported to be BFA resistant (Claude et al., 1999). One possible explanation for this discrepancy is that amino acids other than those in the consensus defined here affect the BFA phenotype. Alternatively, because BFA interacts not only with the Sec7 domain but also with Arf (Robineau et al., 2000), BFA may act on Arf1/3 but not on Arf5. Testing the BFA sensitivity of GBF1 GEF activity on Arf1/3 might help resolve this issue.

Our analysis indicates that the convention that members of the GBF/GEA and BIG/SEC7 groups are BFA sensitive, whereas members of the CYH, EFA, and BRAG groups are BFA resistant, holds for human Sec7-domain proteins. This convention, however, breaks down in other organisms, where group membership does not seem to predict BFA sensitivity or resistance.


    DISCUSSION
The phylogenetic analysis of the Sec7-domain protein family presented here identified 49 proteins in seven eukaryotic model systems spanning the fungal, plant, and animal kingdoms. Three additional sequences used in this study are the mouse homolog of H. sapiens FBX8 and two bacterial Sec7-domain proteins. These 52 sequences were used to generate multiple alignments and a phylogenetic tree, which in turn served four purposes. First, we defined seven family groups and suggested a consistent set of names for family members, which should also help in naming newly identified members in other organisms. Second, alignments of the Sec7 domain itself allowed us to identify elements that might be important for the Arf GEF activity of the domain and for its sensitivity to BFA. Third, alignments of entire proteins identified common as well as group-specific domains, which are likely to be important for aspects of Sec7-protein family function other than Arf GEF activity. Fourth, the tree and the alignments form the basis for speculation about the evolution of the Sec7-domain family.

Sec7-Domain Family Groups
The phylogenetic tree and the multiple alignments of the 52 proteins formed the basis for defining the seven eukaryotic Sec7-domain protein groups. Three criteria were used for group definition: known phylogenetic relationships (e.g., all animal or fungal members of a group cluster together), strong branches at the bases of groups, and similar structure within and outside the Sec7 domain. Two groups, BIG/SEC7 and GBF/GEA, include members from fungi, animals, and plants. The other groups are specific to fungi or to animals. FBX and BACT are defined as separate groups according to the same three criteria. These groups form the basis for a consistent naming of the Sec7-domain protein family (Table 2).

We propose that all Sec7-domain proteins identified here possess Arf GEF activity, based on two lines of evidence. First, most of the Sec7 domain is highly conserved in all identified family members (Figure 4A), and this domain acts as an Arf GEF by itself. Second, such activity has been shown for at least one representative of each group except SYT2. Group members shown to act as Arf GEFs are indicated in Table 2.

Members of the GBF/GEA and BIG/SEC7 groups, which act on class I Arfs, localize and function at the early and late Golgi, respectively (Franzusoff and Schekman, 1989; Peyroche et al., 2001; Spang et al., 2001; Kawamoto et al., 2002; Shinotsuka et al., 2002a,b; Zhao et al., 2002). EFA group members and their substrate, the class III Arf6, reside on the plasma membrane (PM) and function in PM ruffling (Franco et al., 1999; Derrien et al., 2002). BRAGs also act on class III Arf6, and at least one group member, BRAG2/GEP100, was suggested to localize to an endosomal compartment (Someya et al., 2001). The CYHs, which can act on class I and class III Arfs, are largely cytoplasmic in resting cells but are recruited to the PM upon PI 3-kinase stimulation and have also been localized to the Golgi (Jackson and Casanova, 2000; Klarlund et al., 2000; Lee et al., 2000; Claing et al., 2001). The S. cerevisiae SYT1 and SYT2 are not required for cell viability (Jones et al., 1999; Dolinski et al., 2003), and their localization and function are still unknown. Although cellular localization and function are conserved from yeast to human for the GBF/GEA and BIG/SEC7 groups, this does not necessarily mean that they are conserved among all members of each group. For example, in the plant A. thaliana, which has representatives only of the GBF/GEA and BIG/SEC7 groups, one member of the GBF/GEA group, GBF3/GNOM, localizes to endosomes (Geldner et al., 2003) rather than the Golgi. Therefore, members of each group probably share some, but not all, functions, namely those mediated by the group-specific domains.

Why are BIG/SEC7 and GBF/GEA conserved from yeast to plants and humans, whereas the other groups are not? One explanation is that at least some members of these two groups serve an essential function in transport through the secretory pathway, which is required in all eukaryotic cells. Members of the other groups function in endocytic pathways or in transport between the Golgi and endosomal/lysosomal compartments, which are more specialized in each organism. For example, certain animal cells can be stimulated to become highly motile, a change that results in dramatic PM alterations requiring Arf6 and its GEFs (Donaldson, 2003). Yeast cells, on the other hand, are nonmotile but have elaborate machinery to establish and maintain polarity during bud formation, a process that requires Arf3, and presumably its GEFs as well (Huang et al., 2003).

The Sec7 Domain and Arf GEF Activity
Alignments of the Sec7 domains of the 52 proteins identified here provide insights into elements that are important for the Arf GEF activity of the domain. Breakpoints in the conserved alignment correspond almost precisely to known structural borders of the Sec7 domain. This correspondence between structural and alignment data suggests that the three-dimensional structure of the Sec7 domain of S. cerevisiae GEA2 is conserved not only with that of H. sapiens CYH1/2 (Betz et al., 1998; Cherfils et al., 1998; Goldberg, 1998) but also among all family members. It also implies that the α-helices have selective value, presumably because of conserved biological roles, whereas the loops (except for the catalytic glutamate) are merely spacers. Inspection of the core Sec7 domain identified a set of HC amino acids that are likely to play a crucial role in maintaining the shape of the molecule, in interaction with Arf, in Arf GEF activity, and/or in interaction with other proteins. Most of the HC amino acids are not involved in Arf binding, as determined by previous structural and mutagenesis studies. Together, the positions of the breakpoints and the HC amino acids suggest that conservation of the structure of the whole Sec7 domain is as important for its Arf GEF function as the Arf contact points. Such conservation probably reflects the importance of the flexibility of the Sec7-domain groove and of the spatial arrangement necessary for the complex interactions with Arf during nucleotide exchange.

There are two cases in which family members might have lost Arf GEF activity. First, S. pombe SYT21 probably has significantly reduced Arf GEF activity, because it carries a change at the catalytic glutamate. Although E to D is a conservative substitution, the analogous mutation in H. sapiens ARNO/CYH2 reduced exchange activity 400-fold (Beraud-Dufour et al., 1998). This suggests either that S. pombe SYT21 is a weak Arf GEF or that the change is a sequencing error (possible with a single nucleotide substitution). If the former is correct, its cellular role should be explored. Second, the Sec7 domain of A. thaliana BIG5 either uses an unconventional splice site, which is our favored explanation, or contains a 26-amino acid deletion that includes the catalytic glutamate. Such a deletion is expected to yield a protein that cannot act as an Arf GEF. The biological function of a Sec7-domain protein that has significantly reduced Arf GEF activity, or has lost it completely, is not clear. One possibility is that it acts as a dominant inhibitory protein that sequesters Arf or other binding proteins and thereby blocks transport steps that depend on their function.
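The statement that the E-to-D change is possible with a single nucleotide substitution follows from the standard genetic code, in which Glu is encoded by GAA and GAG and Asp by GAT and GAC. A short Python check is sketched below; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, DNA sense-strand codons.
GLU_CODONS = {"GAA", "GAG"}
ASP_CODONS = {"GAT", "GAC"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(src: set, dst: set) -> int:
    """Fewest nucleotide changes needed to turn some codon in src into one in dst."""
    return min(hamming(s, d) for s, d in product(src, dst))


if __name__ == "__main__":
    print(min_substitutions(GLU_CODONS, ASP_CODONS))  # 1
    # Every Glu codon is one third-position change away from an Asp codon.
    print(all(min(hamming(g, d) for d in ASP_CODONS) == 1 for g in GLU_CODONS))  # True
```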

Although numerous studies have determined the in vitro substrate preference of various Sec7-domain proteins, little is known about their substrate specificity in cells. An important question is whether the substrate specificity observed in vitro is determined by the Sec7 domain itself or by other domains of these proteins. One study indicates that domains flanking the Sec7 domain are important for the substrate specificity of CYH1 (Pacheco-Rodriguez et al., 1998). A multiple alignment of the Sec7 domains shows strong consensus in helices α4–α10 and some diversity in helices α2–α3. The latter helices are conserved among most groups but differ in SYT1, SYT2, and EFA (Figure 4A). This diversity cannot explain the in vitro specificity of EFA for class III Arfs, because BRAG and CYH can also act on class III Arfs, yet they share sequence similarity in helices α2–α3 with the GBF and GEA groups, which act on class I Arfs (Table 1; see The Role of Common and Group-specific Domains, below). However, the Sec7 domain of the SYTs might be important for their substrate preference, because in addition to lacking conservation with the other groups in helices α2–α3, four of the eight SYTs have nonconserved substitutions in the HC sites (Table 4). Helices α2 and α3 are not adjacent to the Arf-contact points in the cocrystal structure, but they could be important indirectly for Arf-binding specificity. For example, these helices could be involved in the localization of the GEFs or in their interaction with other proteins, which in turn might affect substrate specificity. Therefore, the Sec7 domain itself can bind Arf, act as its GEF, and might also play a role in substrate preference.

Sec7-domain proteins have been shown to act as GEFs for Arfs that share >65% identity among themselves (yeast Arf1/2 and mammalian classes I–III) (Table 1). Most studies, even in vitro, do not address substrate specificity beyond the Arf group, namely toward the Arls, although the CYHs do not seem to act on Arl1–3 (Chardin et al., 1996; Pacheco-Rodriguez et al., 1998). We suggest that Sec7-domain proteins probably cannot act on Arf relatives that share only 35% identity with Arfs. This suggestion stems from the fact that a distant relative of the Arfs, Sar1, which shares 35% identity with the Arfs, has its own GEF, Sec12 (Barlowe and Schekman, 1993). Sec12 has no sequence similarity to the Sec7 domain, even at E < 10⁻². Therefore, the lower limit of substrate recognition by the Sec7-domain family probably lies at about 35% identity to the Arfs. Further in vitro and in vivo analyses are required to address the substrate preference of Sec7-domain proteins toward Arls that share 35–65% identity with Arfs. Subsequent structure–function analyses would determine the role played by specific Sec7 subdomains in this substrate preference.
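Percent identity, the measure used throughout this paragraph, is computed over a pairwise alignment. The Python sketch below assumes that the two sequences are already aligned to equal length and excludes gapped columns from the denominator, which is one common convention among several; the category boundaries merely restate the approximate 35% and 65% figures from the text and are not a validated classifier. Function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped aligned columns")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def substrate_range(identity_to_arf: float) -> str:
    """Place a small GTPase relative to the identity ranges discussed in the text."""
    if identity_to_arf > 65:
        return "Arf-like identity: range of demonstrated Sec7-domain substrates"
    if identity_to_arf > 35:
        return "35-65%: substrate status untested"
    return "about 35% or less (like Sar1): probably not a Sec7-domain substrate"


if __name__ == "__main__":
    pid = percent_identity("MKLV", "MKLI")  # toy aligned fragments
    print(pid, substrate_range(pid))
```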

BFA Sensitivity
BFA binds both to the Sec7 domain and to Arf and can inhibit the GEF activity of some Sec7-domain proteins on class I Arfs. Multiple alignments of the Sec7 domain were used to predict the BFA sensitivity of family members. We used information on three Sec7-domain mutations that change the BFA phenotype to define a BFA-sensitivity consensus and to predict the BFA phenotypes of all family members identified here. With the caveat of Arf specificity, group membership does not seem to be a reliable predictor of BFA sensitivity. This will be important to keep in mind as new Arf GEFs are identified.

Although the BFA-sensitivity phenotype has been used for research purposes, it is not clear whether it also has biological significance. BFA is a natural fungal product, probably produced to inhibit certain Arf GEFs that regulate key transport steps. If so, coevolution of BFA and the Sec7-domain family could have determined the current BFA phenotypes of family members. Because the C-terminal region containing the BFA sensitivity/resistance residues is ideally placed to play a regulatory role in GEF activity, it will be interesting to determine whether regulatory proteins or lipids bind to this region of Sec7-domain proteins.

The Role of Common and Group-specific Domains
The common and group-specific domains of the Sec7-domain proteins are probably important for interactions with partners other than Arfs. These interactions might determine the specific membrane localization, substrate specificity, and upstream regulation of family members. Group-specific domains suggest similar interactions, and therefore similar functions, for all members of a group. Determining the cellular functions and identifying the interactors of at least some members of each Sec7-domain group will allow further structure–function analysis to elucidate the roles of the various domains identified here. Although the precise functions of most of the cross-group domains flanking the Sec7 domain of the Arf GEFs are unknown, clues to the functions of certain domains are beginning to emerge.

One domain common to five of the seven groups is the PH domain, although there is no significant sequence similarity between the PH domains of the five groups. PH domains are involved in binding phosphoinositides (PIs). Therefore, one possible role of this domain in all Sec7-domain proteins that contain it is specific membrane attachment, as has been demonstrated for certain members of the CYH and EFA groups. The lack of sequence similarity between the PH domains of the different Sec7-domain groups probably reflects binding specificity for different PIs, and thereby for different cellular membranes. Interestingly, the presence of a PH domain distinguishes CYH, EFA, and BRAG, which act on class III Arfs, from BIG and GBF, which do not (Table 1). It is therefore possible that the PH domain indirectly determines the specificity of Sec7-domain proteins toward class III Arfs. Because PH domains interact with PIs (Lemmon and Ferguson, 2000), substrate specificity could be mediated by PI-dependent targeting of the GEFs to membranes where Arf6 resides.

The PI-binding specificity of the PH domains and the cellular localization have been determined for members of the EFA and CYH groups. Specifically, the PH domains of EFA6A and EFA6B probably interact with phosphatidylinositol bisphosphate and determine the colocalization of these proteins with Arf6 on actin-rich membrane ruffles and microvilli-like projections on the cell surface (Derrien et al., 2002). Deletion of the PH domain prevented membrane localization of EFA6A (Franco et al., 1999), and the isolated PH domain of EFA6A localizes to the same membranes as the whole protein (Derrien et al., 2002). The PH domains of CYHs interact with phosphatidylinositol bisphosphate and phosphatidylinositol trisphosphate and mediate the recruitment of these proteins to the PM and possibly the Golgi (Jackson and Casanova, 2000; Klarlund et al., 2000; Lee et al., 2000; Claing et al., 2001).

We cannot predict the PI-binding specificity and localization of the BRAG, SYT1, and SYT2 groups from the sequences of their putative PH domains, for the following reason. Among the putative PH domains of all SYT1, SYT2, and BRAG members, the best hit to the Conserved Domain Database is that of Sp SYT21 (E = 3 × 10⁻⁸) (Marchler-Bauer et al., 2003). This sequence is only weakly homologous to the PH domains of Hs EFA6A and Hs βIII spectrin (E = 0.002 for both). These two proteins are known to localize to different compartments: the plasma membrane (Derrien et al., 2002) and the Golgi (Stankewich et al., 1998), respectively. Therefore, no credible conclusion can be drawn about the membrane localization of Sp SYT21. The PH domains of all other members of the SYT1, SYT2, and BRAG groups match the PH domain consensus even more poorly, as determined by the Conserved Domain Database and SMART. Importantly, changing even a few amino acids can alter the PI-binding specificity of PH domains and thereby their intracellular localization (Klarlund et al., 2000). Therefore, the PI-binding specificity and localization of the PH domains of BRAG, SYT1, and SYT2 need to be determined empirically, as was done for the known PH domains.

The coiled-coil (CC) domain at the N-terminal region of the CYH proteins seems to function as a protein–protein interaction domain in a number of CYHs. Five partners have been identified for various members of the CYH family, all of which interact with the CC domain of these GEFs. These cytohesin partners are the presynaptic protein Munc13-1 (Neeb et al., 1999), GRP1-associated scaffold protein (Nevrivy et al., 2000), GRP1-binding partner (GRSP1) (Klarlund et al., 2001), the scaffolding protein CASP/Cybr (cytohesin binder and regulator) (Mansour et al., 2002; Tang et al., 2002), and interaction protein for cytohesin exchange factors 1 (IPCEF1) (Venkateswarlu, 2003). Two partners, GRSP1 and CASP/Cybr, have a coiled-coil domain themselves, which is responsible for their interaction with the CYH family members (Klarlund et al., 2001; Mansour et al., 2002; Tang et al., 2002). The functions of these partners are not known, although two of them, CASP/Cybr and IPCEF1, have been shown to stimulate Arf GEF activity (Tang et al., 2002; Venkateswarlu, 2003).

Members of the GBF/GEA and BIG/SEC7 groups share three common domains, B/Gα-γ, suggesting that these two families have at least partially overlapping interactions and/or cellular localization. Interacting proteins have been identified for all three B/Gα-γ domains. The N-terminal 331-amino-acid region of BIG1 has been shown to interact with FKBP13 (Padilla et al., 2003). This portion of BIG1 contains the B/Gα region, which shares homology with members of both the BIG/SEC7 and GBF/GEA subfamilies. B/Gα is also present in the first 246-aa region of Arabidopsis GNOM/GBF3, which has been dubbed the "DCB" domain, for dimerization and cyclophilin binding (Grebe et al., 2000). This region interacts with cyclophilin 5 and was also shown to be essential for interaction between GNOM/GBF3 monomers. Although the functional significance of these two interactions is not understood, it is interesting that FKBP13 and cyclophilin are targets of the immunosuppressive drugs FK506 and cyclosporin A, respectively. A likely hypothesis is that these interactions are involved in regulating GEF function.

Another possible regulatory interaction has been uncovered for BIG2, which binds the regulatory subunit of protein kinase A (PKA). Three potential PKA binding motifs were identified in the N-terminal region of BIG2 (Li et al., 2003). Two of these motifs lie in poorly conserved regions that are homologous only among animal BIG proteins, whereas the third is present in region B/Gβ, which is highly conserved among all GBF/GEA and BIG/SEC7 family members. It will be interesting to determine which of these potential interaction sites act as PKA binding sites in cells and to investigate PKA regulation of Arf GEF activity. The third region of homology common to both the GBF/GEA and BIG/SEC7 subfamilies, B/Gγ, is involved in the binding of Gea2p to the transmembrane protein Gmh1p (Chantalat et al., 2003). Gmh1p has homologues in all eukaryotic cells examined, and it will be interesting to determine whether other GBF/GEA and BIG/SEC7 proteins interact with Gmh1p family members, and whether these interactions serve to recruit the GEF proteins to membranes. A proline-rich region at the C-terminus of GBF1 was found to interact with the vesicle tethering factor p115. This interaction is likely unique to higher e