NSF Proposal
The Tree of Life: Phylogeny of Spiders
Project Summary
Our aim is to produce
a robust phylogeny of all the deepest branches within a mega-diverse
group, the spiders, by combining a massive amount of newly generated
comparative genomic data with a substantial set of new and re-assessed
data on morphology and behavior. Spiders are among the oldest and most
diverse groups of organisms on our planet, with fossils dating back
to the Devonian (c. 380 million years ago) and a current diversity of
over 37,500 described species placed in 3,471 genera and 109 families.
Among the few other mega-diverse groups that comprise similarly
large branches of the tree of life on Earth, spiders stand out because
of their ecological importance as the dominant non-vertebrate predators
in most terrestrial ecosystems. It
is probably no exaggeration to say that without spiders, human populations
would be greatly affected, as insect pests would devour even more than
the one-third of our crops they already destroy.
Spiders in many ways "replicate" the evolutionary experiment
insects represent.
In contrast to other
non-vertebrate groups of comparable size, the cornerstones for a comprehensive
phylogenetic study of spiders are at hand.
Spiders uniquely enjoy a completely up-to-date, on-line, species-level
taxonomic database extending from Linnaeus to the present -- essential
to taxonomic and phylogenetic research.
Deep branches of spider phylogeny have been investigated in over
50 modern, quantitative cladistic analyses that overlap to cover a surprising
proportion of total spider diversity (102 of 109 families, 23% of all
genera, almost 2,400 homology hypotheses), although the complete matrix
jointly implied by these studies has never been assembled, much less
analyzed. These studies provide an initial hypothesis
of relationships far more detailed than that available for any similarly
large and important non-amniote group; probably only fishes have received
comparable cladistic scrutiny.
However,
these analyses have been based almost entirely on morphological (and
a small amount of behavioral) data.
The insignificant amount of genomic work to date on spiders has
been uncoordinated and of little utility for broad-scale phylogenetic
investigation. The advent of high-throughput DNA sequencing,
however, makes it feasible to examine substantial parts of the genome
across a dense sampling of spider taxa.
We propose to sequence at least 50 "loci" (genome samples
of 500-1,000 or more base pairs that can be sequenced as single pieces
in both directions simultaneously) for representatives of at least 500
genera of spiders and their closest relatives (the whipscorpion orders
Amblypygi, Uropygi, and Schizomida).
These genera will be carefully selected by a sampling strategy
designed to maximize the resolution of deep branches within spider phylogeny,
and will purposefully include all the previously most-favored study
organisms of ethologists, ecologists, physiologists, and developmental
and molecular biologists, thus integrating and contextualizing their
research.
Data matrices will
be produced that combine the new genomic data with a new, comprehensive
survey of morphological and behavioral homologies, thus offering a unique
"index" to all comparative data on one large group. The more than 20 million entries in these matrices
will dwarf those of all previous studies taken together. The computational challenges posed by such huge
matrices were insoluble until recently.
New computer software, designed in large part by members of our
group and using massively parallel processing to achieve supercomputing
capability, makes such analyses feasible for the first time. We will use parsimony and maximum likelihood
methods of phylogenetic reconstruction to analyze our data. We will also quantitatively assess the robustness
of the results and the contribution of various data partitions to phylogenetic
patterns implied by these data. Many
of the leading researchers in phylogenetic systematics are arachnologists;
this proposal brings together an unusually integrated, collaborative, and informed
team of 5 PIs and 10 senior researchers, postdoctoral fellows,
and graduate students, working in 14 labs housed in 13 institutions
in 4 countries.
We propose to collect
a huge amount of genomic information in order to test and improve the
results achieved by over 50 detailed morphological cladistic analyses
conducted by more than 30 investigators during the past 15 years. For three decades, the lack of well-tested phylogenies,
rather than a lack of comparative data, has been the rate-limiting step in broad-scale
evolutionary research. We propose
to remove that obstacle for one large group entirely. These data and the resulting phylogeny will
have ramifications that extend far beyond systematics. Spiders are already model organisms in behavioral
(especially sexual and web-building behaviors) and ecological (foraging,
predator-prey systems, integrated pest management) research. A robust and comprehensive phylogeny for the
deepest branches of this large branch of the tree of life will greatly
aid expanded research in all areas of comparative biology.
Project Description
Results from Prior NSF Support
G. Hormiga and J.
Coddington, Monographic Research in Araneoid Spider Systematics, DEB-9712353,
$415,480, 1997-2002. Three Ph.D. students are working on the research projects
funded by this grant. Jeremy
(Zujko-) Miller (working on Neotropical Erigoninae) is expected to defend
his dissertation by September 02. Ingi
Agnarsson (revising Anelosimus; started Fall 98) has
been advanced to candidacy and is expected to complete his dissertation
by the end of 2003. Matjaz Kuntner
(revising Nephilinae; started his Ph.D. Spring 99) has completed his
coursework and will take his orals in Fall 02.
We have made important progress in understanding the taxonomy
and phylogenetics of our target groups of araneoid spiders.
Fieldwork was carried out in Colombia (1998), Myanmar (1998), Costa
Rica (1999), Guyana (1999), Chile (2000-01), South Africa (2001), Madagascar
(2001) and Australia (2002); we are currently planning fieldwork in
Thailand for Spring 2003. All
lab members have participated in most of these field trips.
The following products are available through our PEET project
web site (www.gwu.edu/~clade/spiders/peet.htm):
Neotropical linyphiid spider taxonomic catalog; on-line catalog
of the USNM spider collection; cladograms from past and upcoming papers
on araneoid systematics (linked to their phylogenetic databases); on-line
images of the linyphioid genera of the world (99% completed; copyright
permissions pending before upload).
Publications supported by this grant include: Agnarsson (2000);
Agnarsson (in press); Coddington & Colwell (2001); Griswold et al.
(1999); Griswold, Long & Hormiga (1999); Herberstein et al. (2000);
Hormiga (1998, 1999, 2000, in press); Hormiga & Coddington (2001);
Hormiga, Scharff & Coddington (2000); Hormiga, Arnedo & Gillespie
(in press); Kress et al. (1999); Kuntner & Hormiga (in press); Kuntner
(in press); Kuntner & Sereg (2002); Miller (submitted); and Zujko-Miller
(1999a, b).
G. Hormiga, Scanning Electron Microscope
for Systematic Biology, NSF DBI-0070362; G. Hormiga, PI; P. Herendeen,
D. Lipscomb, J. Clark, D. Lieberman, Co-PIs; $118,274, 2000-2001. This
grant provided funds to help establish a SEM facility at the Department
of Biological Sciences (GWU). A LEO 1430VP variable-pressure SEM and accessory
equipment (critical point drier, sputter coater) were purchased in 2000. Publications resulting from the use of this
equipment include: Hormiga (in press); Hormiga, Arnedo & Gillespie
(in press).
R. Gillespie and J. Coddington, Systematics of Spider Family Theridiidae, NSF DEB-9707744,
1997-2000, $200,900. This grant provided funds to estimate the phylogeny
of the spider family Theridiidae from morphological and molecular data,
based on a comprehensive sample of genera. The morphological work is
nearing completion and the molecular work is done, although not yet published.
Five gene fragments have been sequenced and 255 morphological characters
coded for 51 genera (143 terminals), and papers on molecular, morphological,
and combined analysis are in preparation. The grant has contributed
to the support of two post-docs and two graduate students. Publications
supported by this grant include: Agnarsson (2000), Agnarsson (in press),
Coddington & Colwell (2001), Gillespie & Oxford (1998), Griswold
et al. (1999), Herberstein et al. (2000), Hormiga & Coddington (2001),
Hormiga, Scharff & Coddington (2000), Hormiga, Arnedo & Gillespie
(in press), Scharff & Coddington (1997), Oxford & Gillespie
(1998, 2001), Sorensen et al. (2002) and Tan et al. (1999).
P. Sierwald, The Diplopoda: Research,
Taxonomic Training and Computerization, NSF DEB 97-12438, $740,000,
1998-2002. Co-PI: W. A. Shear, Hampden-Sydney College, VA; 2 grad
students, 1 post doc, 2 masters student interns, 6 undergraduate interns;
FMNH millipede collection completely computerized, type collection separated.
Web page: www.fmnh.org/research_collections/zoology/zoo_sites/millipeet/home.html;
Publications supported by this grant include:
Bond, J.E. & P. Sierwald (In press a, b); Shelley R, P. Sierwald,
S.B. Kiser & S. Golovatch (2000); Sierwald P. & S. I. Golovatch
(2001); Shear, W. A. & D. A. Hubbard (1998a); Shear, W. A. &
D. A. Hubbard (1998c); Shear, W. A. (1999a, 1999b, 1999c, 2000a, 2000b);
Shear, W. A., M. Harvey & H. Hoch (2000); Shear, W. A. & P.
Selden (2001). One on-line publication: 2001, Version 1.0, Editor: Petra
Sierwald, Nomenclator Generum Diplopodorum. A complete genus listing
of all genus-group names in the class Diplopoda from 1758 through 1999.
Authors: Jeekel, C. A.W., R. L. Hoffman, R. M. Shelley, P. Sierwald,
S. B. Kiser & S. I. Golovatch.
W. Wheeler (with R. T. Schuh), The Evolution and Phylogeny of the True Bugs (Heteroptera), DEB 97-26587,
$65,000, 1998-2001. During this three-year project, we attempted to
acquire and sequence the broadest possible sample of heteropteran taxa. Many of the specimens were obtained through
fieldwork conducted by Schuh in Australia during the grant period. Initially, new taxa were easy to acquire, and
within a relatively short period we made tremendous progress toward
having comparable sequences for most of the families and many of the
subfamilies within the Heteroptera, as well as a dense sampling of outgroups. The remaining 20% of the taxa were much harder
to acquire. Through continued
fieldwork and contacts with colleagues, we have now sequenced 76 family-level
taxa within the Heteroptera, this number being based on the revised
classification of the Lygaeoidea recently published by Henry. We have sequenced 17 outgroup families within
the Hemiptera, including the Coleorrhyncha.
Sampling at the subfamily level was most dense in the Cimicomorpha
and Pentatomomorpha. The total
sample includes about 445 taxa for which at least some sequence data
were acquired. The densest sampling
is within the Miridae, where we have a relatively complete set of sequences
for 170 taxa representing virtually all recognized suprageneric groupings.
We chose to sequence the following gene regions, known to contain phylogenetic
signal on the basis of prior studies: 18S rDNA (~1000 bases), CO1
(~1000 bases), 28S (~350 bases), and 16S (~650 bases), or about 3000
bases per taxon for a total of more than 1.2 million bases.
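The per-taxon and total figures above can be checked with a short calculation. This is only an illustrative sketch: the variable names are ours, the region lengths are the approximate values quoted above, and the "implied completeness" is our own inference from the reported total, not a figure from the project.

```python
# Approximate lengths (in bases) of the four gene regions named above.
region_lengths = {"18S rDNA": 1000, "CO1": 1000, "28S": 350, "16S": 650}

# Taxa for which at least some sequence data were acquired.
taxa_sampled = 445

# Bases per taxon if all four regions are sequenced in full.
bases_per_taxon = sum(region_lengths.values())

# Upper bound on the total, assuming every taxon had complete data.
upper_bound_total = bases_per_taxon * taxa_sampled

# Share of that upper bound represented by the reported 1.2 million bases
# (an inference for illustration only).
implied_completeness = 1_200_000 / upper_bound_total

print(bases_per_taxon)                  # 3000
print(upper_bound_total)                # 1335000
print(round(implied_completeness, 2))   # 0.9
```

Under these assumptions, a total of more than 1.2 million bases corresponds to roughly 90% of the maximum possible for 445 taxa, which is consistent with some taxa having only partial sequence data.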
Using these sequence data in concert with existing and newly
acquired morphological data allowed testing of the following phylogenetic
hypotheses: 1) suprafamilial relationships within the Hemiptera (with
densest sampling for the Heteroptera); 2) family-group relationships
within the Cimicomorpha; 3) family-group relationships within the Lygaeoidea;
4) family-group relationships within the Pentatomoidea; and 5) tribal-level
relationships within the Miridae. Preliminary
analyses for the Cimicomorpha and Lygaeoidea indicate corroboration
of the basic outlines of the hypotheses proposed by Schuh and Stys for
the Cimicomorpha and by Henry for the Lygaeoidea. Publications supported
by this grant: Wheeler, Whiting, Carpenter, and Wheeler (2001).
Introduction
Among the most fundamental
missions of biology are a complete global inventory of the species on
our planet, and a natural classification of those species on the basis
of their phylogenetic relationships; the importance of both missions
is well delineated in the reports and recommendations of Systematics
Agenda 2000 (1994). Phylogenetic
classifications are scientific hypotheses that are crucial to all aspects
of comparative biology; not only do they provide maximally efficient
descriptions of the data on organismic attributes already at hand, they
allow maximally effective predictions about the distributions of attributes
not yet studied in detail.
Imagine that we
find a newly discovered species, and are able to identify it as a spider
(for example, by discovering that it has abdominal silk glands and spinnerets,
features unique to spiders). From
that information alone, we can predict, for example, that this new species
will have male pedipalps that are modified for sperm transfer (another
feature unique to spiders). We
can also predict that it will have the features characteristic of the
larger groups to which spiders belong; as an arachnid, we can predict
that the newly discovered species will have two body regions and four
pairs of legs; as an arthropod, we can predict that it will have jointed
appendages, etc. Every grouping
of species in a hierarchical classification enables such predictions,
and the accuracy of the predictions depends on the degree to which the
classification reflects the evolutionary history of the groups (i.e.,
the phylogenetic interrelationships of their component taxa).
Groups of organisms
are not all equally well known, of course, either in terms of inventorying
all their component species, or of understanding the interrelationships
among those species already described. Estimates of species richness yet to be discovered
range from about 8 million to 100 million species (Hammond, 1992), and
only for the most conspicuous groups of large organisms (vertebrate
animals, green plants) are we at all close to having a complete global
inventory of species. Unfortunately,
vertebrate animals and green plants together represent only about 3%
of the world's biota (and quite possibly the least representative 3%
at that; Hammond, 1992; Platnick, 1999). This historical bias against
smaller and less conspicuous organisms is also evident in the phylogenetic
aspects of systematics, where it has severely hampered comparative biology. Groups whose interrelationships are poorly understood
are often actively avoided by the research community as model subjects
for inquiry, leading to a vicious circle of continuing, comparative
neglect.
It is for all these
reasons that the report of a recent NSF-sponsored workshop on "Assembling
the Tree of Life: Research Needs in Phylogenetics and Phyloinformatics"
calls for a major new initiative to resolve the basic outlines of the
Tree of Life, with emphasis on the deeper branches of the tree (i.e.,
the oldest and most diverse groups).
We propose here to focus on spiders (Araneae), as a group that
is an especially well-suited target for this initiative, by combining
a massive, comparative sampling of spider genomes -- something never
before undertaken, and only now achievable --
with an equally thorough synthesis of the existing
and new morphological and behavioral data on the same set of taxa.
Why Spiders?
Even among smaller
and less conspicuous organisms, some groups have fared better than others.
Spiders are among the oldest and most diverse of such groups.
The earliest spider fossils are from 380-million-year-old Devonian
deposits at Gilboa, New York (Shear et al., 1989), and the earliest
fossils of the most closely related groups of arachnids are Devonian
as well. At present, there are over 37,500
valid species of spiders, grouped into 3,471 genera and 109 families. By comparison, among the other animal groups
ranked as Orders, only the five largest insect groups (Coleoptera, Lepidoptera,
Hymenoptera, Diptera, and Heteroptera) and the mites (Acari) are larger. Current estimates of the world's total spider
diversity range from 76,000 (Platnick, 1999) to 170,000 (Coddington
and Levi, 1991) -- in other words, somewhere between 20 and 50% of the
world's total spider species have already been described and classified. This contrasts well with other non-vertebrate
taxa; the 8,000 known species of millipedes, for example, are thought
to represent at most 10% of the actual total diversity, and the figure
for mites would be much lower.
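The "between 20 and 50%" range follows directly from the described-species count and the two diversity estimates cited above. A minimal sketch of that arithmetic (variable names are ours and purely illustrative):

```python
# Described spider species and the two published estimates of total diversity.
described = 37_500
estimates = {
    "Platnick (1999)": 76_000,
    "Coddington and Levi (1991)": 170_000,
}

# Fraction of the estimated total that has already been described.
fractions = {source: described / total for source, total in estimates.items()}

for source, fraction in fractions.items():
    print(f"{source}: {fraction:.0%} described")
# Platnick (1999): 49% described
# Coddington and Levi (1991): 22% described
```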
Over recent decades,
spider systematics has advanced dramatically, through the efforts of
a relatively large number of specialists.
By way of comparison, both the Coleopterist's Society (which
covers all beetles) and the American Arachnological Society (which covers
all arachnids other than mites) have approximately 600 members (not
all of whom are systematists, of course), even though the number of
beetle species in the world is an order of magnitude greater than the
number of non-mite arachnids. This disparity among research communities is
also reflected in taxonomic activity; between 1978 and 1987, for example,
an average of 2,300 new beetle species were described per year, whereas
more than half as many (1,350) new arachnid species were described annually
(Hammond, 1992), with spiders representing the lion's share of those
new descriptions. In addition, unlike most groups of non-vertebrates,
our existing knowledge of spiders is well cataloged. The taxonomically important contents of a series
of 14 large volumes of printed catalogs (Roewer, 1942-55; Bonnet, 1945-59;
Brignoli, 1983; Platnick, 1989, 1993, 1998) are now available electronically
in "The World Spider Catalog" (Platnick, 2002), already on-line
as >13 megabytes of text (at http://research.amnh.org/entomology/spiders/catalog81-87/index.html),
and on CD-ROM in database format as well, with on-line database versions
to follow. The world catalog
provides fast and easy access to information on original and all subsequent
descriptions, synonymies, transfers, and geographical distribution. Mutual links are being installed between entries
in The World Spider Catalog and those for spiders in GenBank (Platnick
has had oversight responsibility for the systematics of the spider listings
in GenBank for several years).
Moreover, spider
diversity encompasses the taxonomic levels that are most crucial to
research in comparative and evolutionary biology.
In spiders, most natural history attributes (e.g., foraging styles
and ecological guilds, sexual dimorphism and sex-ratio characteristics,
suites of behavioral characters, and major adaptive attributes) characterize
genera or at most families. For
example, all members of the family Salticidae (jumping spiders) are
diurnal sight-hunters. Larger groups, such as orders, tend not to be
so coherent with regard to the biological attributes of their members
(i.e., a much wider variety of foraging modes, reproductive biology,
and habitats exists among the other arachnid orders).
Congeneric species, on the other hand, tend to share most such biological
attributes; for example, all members of the genus Deinopis (family Deinopidae) spin identical and unique webs. Genera and families often demarcate evolutionary
novelties, e.g., shifts in foraging mode or web-construction. Therefore, research on these "mid-level"
(familial and generic) phylogenies is an absolute necessity to place
most of the comparative data from other biological disciplines (especially
ecological, behavioral, physiological, and developmental studies) into
a predictive framework.
Why Now?
Despite the immense
size of the order, spiders have benefited from a relatively long history
of modern phylogenetic research (Table 1, below, at end of Project Description).
Focusing just at the generic level and above, explicit morphological
matrices analyzed by quantitative techniques cover 805 of the 3,471
described genera and 102 of the 109 currently recognized families.
Ignoring overlaps in characters, these studies involved 2329
morphological characters (when overlaps are taken into account we estimate
the number will reduce to perhaps 1,500 -- a rough indication of the number
of morphological homology hypotheses to date for spiders).
In contrast, molecular data are available for fewer than 50 taxa,
and with a few exceptions were gathered in order to exemplify Araneae
in higher-level studies on chelicerates or arthropods, or for intrageneric
studies.
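The coverage figures quoted above can be expressed as simple proportions. The sketch below uses only the counts given in the text; the variable names are ours, and the "overlap share" is a rough derived quantity based on the estimate of about 1,500 unique characters.

```python
# Counts taken from the paragraph above.
genera_in_matrices, genera_described = 805, 3471
families_in_matrices, families_recognized = 102, 109
characters_raw, characters_unique_est = 2329, 1500

# Proportion of described genera and recognized families already
# represented in quantitative morphological matrices.
genus_coverage = genera_in_matrices / genera_described
family_coverage = families_in_matrices / families_recognized

# Rough share of raw characters expected to be duplicates across studies.
overlap_share = 1 - characters_unique_est / characters_raw

print(f"genera covered:   {genus_coverage:.0%}")   # 23%
print(f"families covered: {family_coverage:.0%}")  # 94%
print(f"estimated character overlap: {overlap_share:.0%}")
```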
Taken together,
these studies occurred over a span of 15 years and involved over 30
different investigators, methodological approaches, and systematic goals
(Table 1). Only about 200 genera are shared between two
or more matrices. Character state definitions (even of the same homology
hypothesis) vary significantly among studies, depending on the taxon
sample used and the goal of the study.
If combined and edited for overlaps, these matrices can be the
basis for a comprehensive database of comparative morphological information
on spiders. Nevertheless, key deep nodes in spider phylogeny have not
been addressed by these previous studies, for example the relationship
between Palpimanoidea and the remaining entelegynes.
The internal structure of Palpimanoidea and Gnaphosoidea, as well as the placements
of Periegopidae, Cryptothelidae, and Zodariidae, have never been
tested quantitatively. Indeed,
except for Orbiculariae, no interfamilial relationships in spiders have
been tested by substantial taxon sampling of the contained genera; results
to date are based purely on exemplars and very sparse taxon sampling. Dionychan monophyly requires testing. Mygalomorph phylogeny is contentious: the classical
families Dipluridae, Nemesiidae, Theraphosidae, and Cyrtaucheniidae
seem to be para- or polyphyletic. The higher level phylogeny of the
suborder Mygalomorphae is currently being investigated by project participants
Bond and Hedin, funded by an NSF grant (see under "Plan of Work: Ingroup"). Within
Araneomorphae, the higher-level phylogeny of Dionycha (17 families)
is almost unknown (but is currently under study by project participant
Ramirez), and the important tropical family Ctenidae may be polyphyletic.
Seven spider families have never been included in any cladistic
quantitative study (although phylogenetic arguments exist for some):
Periegopidae, Cryptothelidae, Cybaeidae, Halidae, Chummidae, Hahniidae,
and Homalonychidae. Periegopids
are obviously haplogynes, the remainder entelegynes. Cybaeids and hahniids together comprise 36 genera
and show many critical character combinations that are sure to rearrange
the provisional topologies suggested by the few multi-family cladistic
studies published to date. Many
of the deeper nodes within Entelegynae, therefore, have only been superficially
explored, and will certainly change to some extent. Many families are
probably not monophyletic -- most obviously Ctenidae, but also Pisauridae,
Miturgidae, Liocranidae, Corinnidae, Clubionidae, Amaurobiidae, Dictynidae,
and Mysmenidae.
This proposal seeks
to produce a completely scored, internally consistent morphological
and molecular matrix for at least 500 carefully chosen generic taxa
that will sample all spider families; family and higher relationships
will emerge as a result of detailed analysis at lower levels.
In short, morphological analyses of spider interrelationships
have now advanced to the point where our current hypotheses need to
be severely tested, and refined, by an entirely separate source of data. Genomic information is the best available source
of that test, and now needs to be collected on a scale comparable with
that already achieved for morphology.
Plan of Work: Outgroups
A phylogenetic study
of any group must collect and analyze data on the closest relatives
(outgroups) of the study group (ingroup), in order to root the resulting
tree. Two competing hypotheses
on the sister group of spiders exist. One hypothesis maintains that
whipspiders (order Amblypygi) constitute the sister group (Weygoldt
and Paulus, 1979); the competing hypothesis maintains that Pedipalpi
(Amblypygi, Uropygi and Schizomida together) is the sister group (Shultz,
1990; Wheeler and Hayashi, 1998). If
the latter hypothesis is true, including all three orders might still
provide only one outgroup node. We therefore propose to obtain genomic
information on representatives of all three orders as well as Palpigradi
to ensure at least two outgroup nodes, so that homology hypotheses within spiders
can be polarized unambiguously. Work on both the morphological and molecular
characters of these outgroups will be under the primary direction of
Lorenzo Prendini.
Amblypygi. The order
Amblypygi includes ca. 141 species assigned to 19 genera and 5 families,
two of which have two subfamilies each.
The Amblypygi are the best studied of the outgroup taxa; a cladogram
based on morphological data is available for the families and genera,
most of which are monophyletic (Weygoldt, 1996, 1999, 2000). However, Charinus is seemingly paraphyletic and charinids are in serious need of revision
(Delle Cave, 1986; Weygoldt, 2000; Harvey, in prep.). The phrynichid subfamily Damoninae may also
be paraphyletic or, if not, Trichodamon should be transferred
to the Phrynichinae (P. Weygoldt, pers. comm.).
Our sampling strategy will minimally include representatives
of the Damoninae and Phrynichinae (Phrynichidae), Heterophryninae and
Phryninae (Phrynidae), Charinidae, and Charontidae.
Ideally, sampling would include representatives of as many of
the 19 genera as possible. The
larger genera, especially Charinus, would be represented
by two or more species, including (where possible) the type species. Paracharon, from Guinea-Bissau, presently
placed in a monotypic family and suborder, is considered to be the sister
group of all other amblypygids (Weygoldt, 1996, 2000) and would therefore
be an important (if perhaps elusive) target. Amblypygid genera and species
are thinly spread across tropical and subtropical countries, often with
only a single species recorded per country.
DNA samples are already in hand for eleven genera in four families
and three subfamilies. Neotropical
collecting could yield four additional genera.
Phrynichosarax can be collected in India,
Malaysia or Singapore, also important locales for schizomids and uropygids
(see below). The remaining genera
are geographically restricted and would require collecting in Myanmar
(Catageus) and South Africa (Phrynichodamon).
Schizomida. The
order Schizomida includes ca. 217 species assigned to 34 genera and
two families (one with two subfamilies).
The higher classification of schizomids is explicitly phylogenetic,
although not yet supported by quantitative analyses (Cokendolpher and
Reddell, 1992). However, the
monophyly of most schizomid genera remains untested; Schizomus is particularly problematic
because many older descriptions do not mention the characters now considered
diagnostic (Reddell and Cokendolpher, 1995; J. C. Cokendolpher, pers.
comm.). The Schizomida are the
least studied of the three outgroup orders; Harvey (unpublished) estimates
that over 500 species may eventually be recognized globally. The African and Asian schizomid faunas are the
most poorly known. Our sampling
strategy will minimally include representatives of the two genera of
the Protoschizomidae, the one genus of the hubbardiid subfamily Megaschizominae,
and two genera of the Hubbardiinae.
Ideally, sampling would include as many of the genera as possible,
with the larger genera represented by two or more species, including
the type species, where necessary, but our minimal strategy addresses
the most important areas of schizomid phylogeny.
Megaschizomus (from Mozambique and South Africa) is considered to be the
sister group of the Hubbardiinae (Cokendolpher and Reddell, 1992) and
is therefore an important target for resolving hubbardiine relationships. The endemic Mexican Protoschizomidae, which
comprise the sister group of the Hubbardiidae (Cokendolpher and Reddell,
1992), are also of considerable interest because the female genitalia
resemble those of diplurid spiders and charinid amblypygids (J. Cokendolpher,
pers. comm.). Schizomid genera and species are also tropical and subtropical,
and the optimal collecting strategy thus overlaps completely that for
spiders, amblypygids, and uropygids.
Collecting in Australia and Mexico could provide species from
14 genera and both families. Exemplars from 11 additional genera could be
added by collecting in Brazil, Costa Rica, Cuba, Indonesia, and Singapore. The remaining genera are each geographically
restricted and would generally not be cost-effective to secure.
Uropygi. The order Uropygi, generally considered the
sister group of the Schizomida (Weygoldt and Paulus, 1979; Shultz, 1990;
Wheeler and Hayashi, 1998) is the least speciose of the three orders,
comprising 16 genera and 102 species assigned to a single family with
four subfamilies. Uropygids are
poorly known and lack a phylogenetically sound classification. Harvey's unpublished preliminary analysis confirms
the finding by Haupt and Song (1996) that the Hypoctonidae are not monophyletic.
Dunlop and Horrocks (1996) have even provided a conflicting hypothesis
in which uropygid monophyly was violated by grouping the "hypoctonids"
with schizomids rather than the remaining uropygids.
Several large uropygid genera (e.g., Thelyphonus) lack supporting apomorphies (Harvey, unpublished) and the
status of some of the smaller genera (e.g., those erected by Speijer,
1933, 1936) is also dubious (Rowland and Cooke, 1973). Our sampling strategy will minimally include representatives of each
of the four subfamilies; ideally, as many of the 16 genera as possible
would be included, with problematic groups like Thelyphonus represented by two or
more species, including the type species. The two major areas of
uropygid endemism are in Asia and the Indo-Pacific (12 genera found
from India to Fiji) and in the Americas (three genera found from the
southern U.S. to Brazil). DNA
samples are already in hand for four genera from two subfamilies.
Collecting efforts in Indonesia, the Philippines, and Brazil
could provide exemplars from an additional nine genera and one subfamily. Exemplars from the remaining three genera and
one subfamily could be added by collecting in India (Labochirus, Uroproctus) and West Africa (Etiennius). Inclusion of the
African and Indian species is of considerable interest from a biogeographic
perspective and may be important in resolving relationships among the
American and remaining Asian genera.
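The genus-level sampling targets described for the three outgroup orders can be tallied as follows. This is a rough sketch under stated assumptions: the counts are read from the text above, "planned" sums the genera that the listed collecting trips could add, the genera already in hand are assumed not to overlap with those, and no schizomid samples are assumed to be in hand because none are reported. The dictionary layout and names are ours.

```python
# Genus counts per order: total known, DNA already in hand, and genera that
# the collecting described above could add (assumed non-overlapping).
outgroups = {
    # 4 Neotropical + Phrynichosarax + Catageus and Phrynichodamon
    "Amblypygi": {"genera": 19, "in_hand": 11, "planned": 4 + 1 + 2},
    # Australia/Mexico (14) + Brazil, Costa Rica, Cuba, Indonesia, Singapore (11)
    "Schizomida": {"genera": 34, "in_hand": 0, "planned": 14 + 11},
    # Indonesia/Philippines/Brazil (9) + India and West Africa (3)
    "Uropygi": {"genera": 16, "in_hand": 4, "planned": 9 + 3},
}

def potential_coverage(order: dict) -> int:
    """Genera that could be sampled if all planned collecting succeeds."""
    return order["in_hand"] + order["planned"]

for name, order in outgroups.items():
    covered = potential_coverage(order)
    print(f"{name}: {covered} of {order['genera']} genera")
```

Read this way, the plan could reach 18 of 19 amblypygid genera (all but Paracharon), 25 of 34 schizomid genera, and all 16 uropygid genera, although the minimal strategy stated in the text requires far fewer.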
Plan of Work: Ingroup
Responsibilities
for the various aspects of the ingroup analyses will be divided among
the investigators (see Management Plan).
Co-PIs Coddington, Hormiga, Prendini and Sierwald and senior
collaborators Arnedo, Bond, Griswold, Maddison, Ramirez, Scharff and
Shear will compile the morphological and behavioral parts of the matrices. Because so much basic work on spider anatomy
and behavior has already been organized cladistically, we will make
every effort to include and test all of it against the genomic data. For 500 taxa, just synthesizing and concatenating
the roughly 2,400 homology hypotheses to date will require roughly 800,000
novel entries, as only about 15% of the total implied matrix is currently
in hand. Four sources will augment
the estimated 1,500 unique homology hypotheses in the literature. First, many highly informative sources of morphological
data have been examined in some but not all spider families; among these
are spinneret spigot, setal, and tarsal organ morphology, studied through
scanning electron microscopy. Previous
surveys of these and other characters will be expanded to cover the
full taxon set. Second, the taxonomic
literature for many taxa suggests diagnostic characters and morphological
oddities not yet assessed cladistically, e.g. onychia, details of male
genitalia, male epiandrous spigots, female reproductive systems, spination
patterns, cheliceral modifications, and male sperm duct trajectories. Third, comparative biologists other than systematists
have proposed many homologies over the years never assessed by rigorous
systematic research (e.g. details of eye morphology, sperm ultrastructure,
musculature, various gland systems, mating postures, attack behaviors,
eggsac features, dragline/line climbing behavior, and especially ultrastructural
features such as stridulatory structures, pore fields, hair types, and
cuticular textures). Fourth, many groups at multiple hierarchical levels
have never been studied phylogenetically, and are sure to yield myriad
new discoveries. We will marshal
all of these morphological, behavioral, microanatomical, and ultrastructural
data and unite them with newly collected molecular data to create one
unified, consistent, modern encyclopedia of comparative, heritable information
on spiders and their closest relatives. All collaborators will be involved
in the data analysis, which will be spearheaded by Wheeler, Goloboff,
and Maddison, who have developed most of the software involved (see Data
Analysis section, below). The choice of taxonomic exemplars is obviously
crucial, but intermediate results will be required before we can identify
issues such as long branches that need to be broken up by taxon addition,
or important character optimizations that are made ambiguous by the
omission of critical taxa. We seek a robust phylogeny that will strongly
impact comparative biology, be used widely, and be broadly applicable. The following mix of theoretically and practically
driven criteria seems important to those goals:
1) The cladistically most crucial representatives of a group's groundplan
are the two most basal lineages; when the composition of this "first
doublet" is suggested by an existing analysis,
we will sample those lineages first.
Thus, for example, within the Theridiosomatidae, we would choose
Plato (Platoninae) and Epeirotypus (Epeirotypinae) over Wendilgarda and Epilineutes (Theridiosomatinae). Among filistatids, sampling Filistatinella and either Filistata or Kukulcania, but not both, is another
example of basal lineage selection.
2) Except for monotypic families, at least two non-congeneric species
will be sampled from every family, if possible, to ensure that putative
family-level synapomorphies are cladistically informative and tested
against the full dataset. For
small or dubiously monophyletic families (Liocranidae, Miturgidae, Nemesiidae,
etc.), the type genus will be sampled in addition to component lineages.
3) We will use existing cladistic information to select the most
basally branching genera from all significant clades. Where no such cladograms or modern classifications
exist, we will consult the most detailed classification available (Roewer's
1942-55 catalog arrangement, which included 183 subfamilies and 351
tribal groupings). Although the
family-level classification proposed by Roewer has been thoroughly refuted,
many of his lower-level groupings (often taken from the previous work
of Simon) may be monophyletic, and should provide a better-than-random
map of the internal cladistic structure of families that have not yet
been studied phylogenetically.
4) To ensure that we have sufficiently dense sampling of those genera
most crucial to establishing the deeper branches of spider phylogeny,
we will bias sampling against the seven largest families (Salticidae,
Linyphiidae, Araneidae, Theridiidae, Thomisidae, Lycosidae, and Gnaphosidae). Each of these large groups is currently considered
monophyletic. We will sample
them sufficiently to test current hypotheses of their internal structure
(and monophyly), but five of them already have (or will have, from on-going
work in our laboratories) supported phylogenies that are more detailed
than the average resolution we hope to achieve for deeper clades. For the purposes of this project, we are less
interested in the details of the distal branches of the internal phylogenies
of those families than in what those families have to tell us about
interfamilial relationships among araneoids, lycosoids, and dionychans
in general. These seven families jointly account for almost
half of the described spider genera (1,693 genera); by undersampling
their terminal branches, we can achieve very dense coverage of all remaining
lineages, and hence the deepest, most contentious, and most difficult
questions of spider phylogeny. We
will consult widely with colleagues working on the seven large families,
to achieve a choice of exemplars that will maximize synergy with their
efforts and on-going studies (as, of course, we will do for the smaller
families as well).
5) To maximize the impact of our results on related fields, we will
choose taxa that have been (or are likely to be) the subjects of detailed
study by behaviorists, ecologists, physiologists, and other non-systematic
biologists, so that their past and future results map easily to the
phylogenetic and evolutionary context we will provide. These taxa tend to be easy to find and abundant,
so choosing them is also pragmatic.
6) As the success of high-throughput sequencing depends on the availability
of high-quality DNA samples, we will attempt (through our own fieldwork
and that of our colleagues) to secure fresh, adult material of all taxa
if such is not already available. Newly
collected specimens, fixed in absolute ethanol, amplify much more successfully
than do standard museum specimens that have been stored in 70-80% ethanol
for extended periods. Newly collected
specimens also have the advantage that successful DNA amplification
is usually possible using only one or two legs, so that the remaining
parts of the specimen are fully useful as vouchers and for normal systematic
investigation (in all cases, the genitalic structures necessary for
specific-level identifications will be vouchered).
Using legs only also has the advantage of greatly reducing the
possibilities of contamination by sequencing prey DNA from the digestive
system of whole animals.
The drawback, of
course, is that fieldwork is required at multiple sites around the world. Our budgeted fieldwork will
"piggy-back" on existing projects wherever possible. Charles Griswold is funded to conduct field
surveys of spiders in Madagascar and China; he and his field crews will
collect material in absolute alcohol for sequencing.
Bond and Hedin are currently funded to travel to South Africa,
western Australia, and South America to collect mygalomorphs,
and will be preserving spiders from other families for DNA and morphological
work. Our continuing PEET projects allow sampling in the Neotropics,
southeast Asia, and Australia. The Smithsonian budget offers competitive funding
opportunities that have supplemented substantially prior NSF-funded
projects in which Coddington is a co-PI. In other cases, it will be cheaper to provide
funding to colleagues already working at target sites than to visit
them ourselves, and we will aggressively pursue all such opportunities
to secure needed specimens at the lowest possible cost.
Given the two ABI
3700 sequencers, the BIOMEK sequencing robot, and a single technician
line, the cost of sequencing supplies, not personnel or equipment, is
the rate-limiting factor. Were
supply costs not a factor, we would aim for sequencing 100 loci for
each of 1000 taxa, and we will attempt to find other sources of funding
to allow additional taxa to be included.
Bond and Hedin are
currently funded by NSF to conduct work on the systematics of the Mygalomorphae.
This work will combine morphological and molecular data for a comprehensive
sample of mygalomorph genera from all 15 families - the target sample
includes about 120 total taxa (about 110 different genera). Clearly,
this effort overlaps with the proposed goals of this grant, but we see
this overlap as generally synergistic in two obvious ways. First, the
Bond and Hedin phylogenetic sample will be a perfect forum to explore
gene utility for all spiders - mygalomorphs are an obvious clade with
several well-defined subclades. Furthermore, the group includes both
deep- and shallow-diverging lineages. Because Bond and Hedin will have
DNA samples available for key taxa/clades before most of the TOL work
begins, exploratory analyses of gene utility might best be conducted
in this group. Second, the genomics results of the TOL work will feed
back into the efforts of Bond and Hedin. New genes, found to be informative
for the broader spider sample, might be applied to the large taxon sample
of mygalomorphs. This feedback will greatly strengthen the molecular
systematics component of the mygalomorph research.
The list of the
109 spider families showing the current number of described genera in
each family, followed by the minimum number of genera we hope to sample,
is presented in the Management Plan of the proposal, under "Morphological
data" (see Supplementary Documentation).
Sequencing Techniques
Primer Search: We will take three approaches to generate a set
of at least 50 loci that will PCR-amplify and sequence from the spider
and outgroup taxa. These three approaches are: PCR-primer design
and genomic DNA probing, EST-cDNA library generation and overlap, and
a combination of the first two. Through
literature sources (e.g. Colgan et al., 1998; Damen, Weller & Tautz,
2000; Masta, 2000; Regier & Shultz, 1997; Tatarenkov et al., 1999;
Wheeler, 1989; Wheeler, Cartwright & Hayashi, 1993; Whiting et al.,
1997) we have designed primers that amplify and sequence 22 loci (18S-[1,800bp],
28S-[2,315bp], 16S-[550bp], CO1-[1,100bp], 12S-[300bp], H3a-[350bp],
beta actin-[3,600bp], ITS1-[500bp], ITS2-[500bp], RNAHel-[500bp], Ntid-[~900bp],
Amy-[500bp], Kuz-[800bp], C1-J-2309/C2-N-3389-[1,000bp-amplifies 717bp
of COI and 300bp of COII], U2-[150bp],
POLII-[600bp], DDC-[700bp], cmos-[441bp], Boss-[400bp], Hb16S/HbND1-[1,569bp-amplifies
50bp of 12S, 51bp of tRNA Val, 1020bp of 16S, 53bp of tRNA Leu[CUN],
and 395bp of ND1], EF-1a-[500bp], Runt gene-[400bp], Hunchback gene-[450bp]). Six of these loci (18S rDNA, 28S rDNA, 12S mtrDNA,
16S mtrDNA, Cytochrome Oxidase I, and Histone 3a) have been examined
in detail and results presented for over 100 taxa in the preliminary
data section of this proposal. Given
our success with this approach, we feel confident that these methods
will continue to yield loci amenable to genomic PCR amplification.
A second approach we will take is based on EST analysis and cDNA
generation. The general methodology
for generating new primers is to construct cDNA libraries for a group
of taxa representing the diversity of the targeted group (Carninci et
al., 2000). This targeted group may be spiders as a whole,
or sub-clades may be more intensively sampled. From these libraries we will generate expressed sequence tags (ESTs; random
sequences of the cDNA). Sequences common to multiple libraries are then
used to design primers for that specific locus. The major problem facing
this method is the frequency of overlap. Since sequences are generated
at random, they have a very low probability of overlapping across libraries.
Fortunately, there are several ways to improve overlap frequency (Piao
et al., 2001; Ko, 1990) such as basing libraries on specific tissue
thereby reducing the diversity of expressed genes.
Our third approach is to combine the first two by "fishing" the
libraries with primers derived from whole genome computational analysis
and literature-based primer design.
Tools exist within GenBank (ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/
and http://sea-urchin.caltech.edu:8000/genome/databases/) for such procedures
and should enrich our yield of homologous loci.
When candidate loci are identified, and suitable primers developed,
loci will be sequenced for a small subset of taxa (ten or so) including
broadly distributed representatives.
From these initial data, we will assess the level of variability
(or conservation) and decisions will be made as to the suitability of
continuing to collect data from that locus. Furthermore, confounding
issues such as paralogy can be explored through this initial foray.
Multiple PCR bands, wildly discordant sequences, or huge size variation
would all lead to suspicions of homology problems with that locus. Issues such as intron variation may be very
useful information systematically, but will make sequencing efforts
and primer design much more complex.
If the introns were small, and easily characterized, we would
attempt to make use of this information.
If intron variation is large, however, in size or complexity,
we would be unlikely to continue to invest time and energy in that system.
Isolation of DNA: Genomic
DNA samples are obtained from fresh, frozen, or ethanol-preserved tissues
in a solution of guanidinium thiocyanate homogenization buffer following
a modified protocol for RNA extraction (Chirgwin et al., 1979). Alternative automated DNA preparation is accomplished
using the Qiagen DNeasy Tissue Kit: DNeasy Protocol for Animal Tissues.
PCR amplification and Sequencing: Our molecular lab currently uses one ABI 3700
automated sequencing machine and has added a second (NASA-funded) to
accomplish its comparative sequencing projects.
A Biomek sequencing robot was recently added to the facility
to automate PCR purification and sequencing procedures.
The combination of these pieces of equipment has increased our
ability to sequence DNA by an order of magnitude.
The robotic sequencing machines interact directly with two Tetrad
4-head Thermocyclers. In general, amplification is carried out either in a
50 µl reaction volume, with 1.25 units of AmpliTaq® DNA Polymerase (Perkin Elmer), 200 µM of dNTPs, and
1 µM of each primer, or using Ready-To-Go PCR beads (Amersham
Pharmacia Biotech), to which we add 1 µl per reaction of each
10 µM primer, 23 µl of water, and 2 µl of DNA. The PCR program
consists of an initial denaturing step at 94°C for 60 seconds, 35 amplification
cycles (94°C for 15 sec, 49°C for 15 sec, 72°C for 15 sec), and a final
step at 72°C for 6 minutes in a GeneAmp® PCR System 9700 (Perkin Elmer)
or in Tetrad 4-head Thermocyclers. Specific
conditions are optimized for taxa and primer pairs.
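As a minimal sketch of the arithmetic behind the amplification program described above, the following Python snippet encodes the stated steps and sums their programmed hold times. The Step class and the function name are illustrative assumptions, and ramp times between temperatures, which depend on the thermocycler, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time in seconds


# Values taken from the amplification program described in the text.
INITIAL_DENATURE = Step(94, 60)
CYCLE = (Step(94, 15), Step(49, 15), Step(72, 15))
N_CYCLES = 35
FINAL_EXTENSION = Step(72, 6 * 60)


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times; ramping between steps is not modeled."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


total = hold_time_seconds(INITIAL_DENATURE, CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"programmed hold time: {total} s ({total / 60:.2f} min)")
```

Because conditions are optimized per taxon and primer pair, the annealing temperature and hold times here should be read as the general defaults only.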
PCR samples are
purified with the Qiagen QIAquick 96 PCR Purification Kit by eluting
PCR product into 60 µl buffer EB (on the Biomek Robot using a 96 well
format). The samples are then
dried about one hour in a speed vac and resuspended in 10 µl water with
the Biomek. The isolated products
are then directly sequenced using an automated ABI 3700 DNA sequencer. Cycle-sequencing with AmpliTaq® DNA Polymerase, FS (Perkin-Elmer) using dye-labeled terminators
(ABI PRISM™ BigDye™ Terminator Cycle Sequencing
Ready Reaction Kit) is performed in a GeneAmp® PCR System 9700 (Perkin
Elmer) and in Tetrad 4-head Thermocyclers. Each sequencing reaction combines 3 µl water, 2 µl Big Dye, 2 µl Big Dye Extender, 1 µl 3.2 µM primer, and 2 µl DNA, set up 96 at a time using the Biomek. The amplification program is as follows: 96°C
for 15 sec, 50°C for 15 sec x 25, 60°C for 4
min. Sequencing reactions are then cleaned using
Isopropanol/Ethanol Precipitation: 40
µl 70% isopropanol; spin 30 min at 3500 rpm; flip plate upside down
and spin 1 min at 500 rpm; add 40 µl 70% ethanol and repeat
spins; dry on bench 30 min; resuspend in 10 µl formamide. The cleaned products (in microtiter plate) are
then loaded directly onto the 3700, four plates at a time. Sequences
are edited and contigs assembled using "SEQUENCHER" (Gene
Codes Corporation).
The combination
of the Tetrad Thermocycler, Biomek robot, and ABI 3700 make it possible
for one technician to amplify or sequence several hundred reactions
in a day. The 3700s have the capacity (using POP5 buffer)
to sequence 8x96 (768) samples per day and the AMNH molecular lab has
two of these machines. The Biomek
allows the complete automation of PCR purification and sequencing (on
96 or 384 well micro titer plates), thus saving the technician thousands
of pipetting steps, improving accuracy and consistency (as well as state
of mind), and freeing the researcher to perform more intellectual tasks.
This level of automation is what makes such an ambitious sequencing
project possible; our lab has the capacity to perform approximately
8000 sequencing reactions per week.
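The capacity figures above can be cross-checked with a short calculation. The Python sketch below uses the per-machine and weekly figures stated in the text together with the 500-taxon, 50-locus target; the two reads (forward and reverse) per fragment and the absence of failed or repeated reactions are illustrative assumptions, not figures from the protocol.

```python
SAMPLES_PER_MACHINE_PER_DAY = 8 * 96   # stated in the text: 8 x 96 = 768
N_SEQUENCERS = 2                        # stated in the text
WEEKLY_REACTION_CAPACITY = 8000         # stated in the text (approximate)

READS_PER_FRAGMENT = 2                  # ASSUMPTION: forward + reverse read


def daily_sequencer_capacity(per_machine=SAMPLES_PER_MACHINE_PER_DAY,
                             machines=N_SEQUENCERS):
    """Samples per day across all sequencers at full load."""
    return per_machine * machines


def weeks_of_sequencing(n_taxa, n_loci, reads_per_fragment,
                        weekly_capacity=WEEKLY_REACTION_CAPACITY):
    """Weeks needed if every reaction succeeds the first time (optimistic)."""
    reactions = n_taxa * n_loci * reads_per_fragment
    return reactions / weekly_capacity


daily = daily_sequencer_capacity()
weeks = weeks_of_sequencing(500, 50, READS_PER_FRAGMENT)
print(f"{daily} samples/day on the sequencers; about {weeks:.2f} weeks "
      f"for 500 taxa x 50 loci under the stated assumptions")
```

Real throughput would be lower once failed amplifications, re-sequencing, and the exploratory locus screening are taken into account.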
Choice of genes. Our explorations of genes will cover many parts of the genome, but will
focus on ribosomal and nuclear protein coding genes. It is likely that some genes that pass our initial
assessment of utility will be evolving too quickly to be useful at this
deep phylogenetic level. Hence,
we will first sample a relatively small number of taxa (about 50) for
each of 50 genes, then run separate phylogenetic
analyses for each. Those genes whose trees show considerable concordance
with those of other genes will be judged as retaining sufficient historical
information to be made targets for the full 500 taxon sampling. Additional
"well-behaved" genes will be sought by a similar strategy
until the total number of genes with apparent deep phylogenetic signal
reaches at least 50.
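One simple way to quantify concordance between two gene trees is the fraction of clades they share. The Python sketch below is an illustration only: the text does not specify a concordance measure, the nested-tuple tree encoding and function names are assumptions, and the taxa "A" to "E" are placeholders rather than real data.

```python
def clades(tree):
    """Return (all_leaves, nontrivial_clades) for a rooted nested-tuple tree.

    Leaves are strings; internal nodes are tuples of subtrees.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by definition
    return all_leaves, found


def shared_clade_fraction(tree_a, tree_b):
    """Shared clades divided by all clades found in either tree (0 to 1)."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    union = clades_a | clades_b
    if not union:
        return 1.0
    return len(clades_a & clades_b) / len(union)


# Placeholder gene trees on five hypothetical taxa.
gene_1 = ((("A", "B"), "C"), ("D", "E"))
gene_2 = ((("A", "B"), "C"), ("D", "E"))
gene_3 = ((("A", "C"), "B"), ("D", "E"))

print(shared_clade_fraction(gene_1, gene_2))
print(shared_clade_fraction(gene_1, gene_3))
```

Genes whose trees score consistently high against the trees from other genes would be the natural candidates for the full taxon sample; in practice a measure that accounts for node support would likely be preferable.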
Archiving of Samples:
The AMNH has established a modern frozen tissue storage facility,
the Ambrose Monell Collection for Molecular and Microbial Research,
intended to become a core sample resource center for comparative genomics. The facility can store one million samples from
around the world, thus representing a comprehensive range of species,
both pure cultured samples of taxa under study as well as taxa that
cannot currently be cultured. These
samples are housed at liquid nitrogen temperatures so that the highest
quality, maximum stability conditions are maintained for biomolecules
indefinitely. Several thousand spider specimens will be added
to the tissue storage facility as a result of our proposed work. Ultimately,
these samples might form the kernel of an international effort to store
and disseminate the genomes of all described spider genera.
Data Analysis
Reconstructing the phylogenetic tree for spiders will not be an easy
task analytically, both because of the depth of time spanned by the
tree and the size of the data set.
Because of the time depth (deepest divergences probably extending
into the Paleozoic), some branches of the tree may be long and isolated,
having accumulated so many differences from other taxa that relationships
are obscured. With such noise
in the data set, methods are challenged to extract the correct signal. The size of the data set, in number of characters
but particularly in number of taxa, will provide perhaps the greatest
computational challenge. For
example, an analysis of 500 taxa must select, implicitly or explicitly,
among the 7.8 x 10^1275 possible trees.
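For a sense of scale, the number of distinct unrooted, fully bifurcating trees on n labeled taxa is (2n-5)!!, a standard result. The Python sketch below computes it exactly with integer arithmetic; the function name is an illustrative choice, and because published counts differ by convention (rooted versus unrooted trees, for instance), its output for 500 taxa need not coincide with the figure quoted above.

```python
def n_unrooted_binary_trees(n_taxa):
    """Number of unrooted, fully bifurcating labeled trees: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 50, 500):
    trees = n_unrooted_binary_trees(n)
    # len(str(x)) - 1 is the base-10 order of magnitude of a positive integer.
    print(f"{n:>3} taxa: about 10^{len(str(trees)) - 1} trees")
```

Whatever the convention, the count grows faster than exponentially with the number of taxa, which is why exhaustive search is out of the question at this scale.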
Our analyses will
therefore provide not exact but heuristic estimates, and require exploration
of varied optimality criteria and creative approaches to tree searches. Trees will be inferred under both the parsimony
(Farris, 1970, 1983; Kluge, 1984) and maximum likelihood (Cavalli-Sforza
and Edwards, 1967; Felsenstein, 1973, 1979, 1981a, b, 1983; Huelsenbeck
and Crandall, 1997) criteria using several programs (POY, Gladstein
and Wheeler, 1997; NONA and PEE-WEE, Goloboff, 1997a,b; TNT, Goloboff,
Farris, and Nixon, in prep.; PAUP*, Swofford, 2002; MrBayes, Huelsenbeck,
2000; Huelsenbeck & Ronquist, 2001).
We take this broad approach to take advantage of the diverse
skills of our team, and to cross-check the quality of our heuristic
estimates. Should varied approaches give substantially
similar conclusions, it will suggest that those results are robust against
both violation of assumptions and differing efficacies of the alternative
programs.
Available to us
are programmers experienced in phylogenetic computation, as well as
excellent computational facilities.
The presence of programmers on the research team -- Wheeler,
with the program POY; Goloboff, with PEE-WEE, NONA, and TNT; Maddison,
with MacClade (D. Maddison & W. Maddison 2000) and Mesquite (W.
Maddison & D. Maddison, 2001) -- will give us an unparalleled opportunity
to refine software in the context of a massive empirical project.
In addition to state-of-the-art desktop computers in many of
the participating laboratories, we will have access to a large parallel
cluster. In 1999, the AMNH installed a 256-processor (500 MHz Pentium
III) cluster and in 2001 upgraded this to 560 processors (through the
addition of 1 GHz Pentium III processors and 1 GB of RAM
per node) designed especially for phylogenetic analysis of genomic data.
This parallel cluster is the fastest installed in any evolutionary
biology laboratory to date. Its size is presently being doubled and its
capacity tripled, and we anticipate upgrading it again in 2003. Parallel applications that have been developed
by us include integer-intensive DNA sequence alignment and direct optimization
software (written in-house); column vector-based phylogenetic algorithms,
such as pNONA and a parallel version of TNT currently under development;
and simulated annealing modeling of gene circuits. The new hardware and the 2003 upgrade, along
with algorithmic improvements, should keep run times for 500 taxa within
50-100 hours. Speed is important
because we intend to perform many runs to address parameter sensitivity
and to explore thoroughly the analytical space implied by the data. The total time required for these analyses may
be months on this machine -- but it would be over a century on even
a current, state-of-the-art single-processor PC.
Searches for most
parsimonious trees will be undertaken with the programs POY, NONA, TNT,
and PAUP*, each of which has parallel versions that are operational
or under development. Members
of the team will divide use of the programs according to expertise (e.g.,
Wheeler and Goloboff, POY, NONA and TNT; Maddison and Hedin, PAUP*)
and compare results. One fundamental difference among these programs
concerns alignment of nucleotide sequences: NONA, TNT and PAUP* expect
pre-aligned sequences, while POY searches simultaneously for alignment
and tree. Simultaneous alignment
and tree-search is generally regarded as ideal in principle (Wheeler,
1994, 1996, 1998, 1999, 2000, 2001; Slowinski 1998; Giribet & Wheeler,
1999; Giribet et al., 2000, 2001; Wahlberg & Zimmermann, 2000),
although it carries a much higher computational cost.
For analyses by NONA/TNT/PAUP*, alignments will be provided either
by POY runs or by gene-by-gene application of CLUSTAL (Higgins &
Sharp, 1988, 1989; Higgins et al., 1992; Thompson et al., 1994, 1997;
Jeanmougin et al., 1998), using elision techniques (Wheeler et al.,
1995) to choose alignment parameters (Hedin & Maddison, 2001), supplemented
by manual alignment.
Maximum likelihood
analyses will be done by both PAUP* and the newly-implemented likelihood
routines in POY. Likelihood analyses
are computationally more intensive than parsimony analyses of pre-aligned
data, and thus we will restrict likelihood analyses to subsets of up
to 150 taxa. Subsets will be
chosen in some analyses to maximize dispersion over the expected relationships,
in other analyses to focus on detailed relationships within clades that
appear well-corroborated otherwise.
The resulting overlapping but partial trees will be combined
by supertree methods (Gordon, 1986; Baum, 1992; Ragan, 1992a, b; Bininda-Emonds
& Bryant, 1998; Steel et al., 2000; Semple & Steel, 2000; Bininda-Emonds
& Sanderson, 2001) and by "eye". In addition to conventional likelihood analyses,
we will use the closely related Bayesian methods (Rannala & Yang,
1996; Mau & Newton, 1997; Mao et al., 1999) as implemented in MrBayes
(Huelsenbeck, 2000; Huelsenbeck & Ronquist, 2001).
In addition to allowing us to explore an alternative criterion,
use of Bayesian methods will make analysis of the entire set of taxa,
by a likelihood-related method, computationally feasible.
Analyses of different
data partitions (morphology, different genes) will be done separately
and simultaneously. The simultaneous analysis approach, because it uses
all available evidence, will be given more weight than any other single
analysis in determining our primary estimate of the tree.
This analysis (and any other analysis involving morphological
data) will be restricted to parsimony methods, because likelihood's
usual assumption of uniform stochastic behavior is especially problematical
for morphological data. The data
partitions will be analyzed separately for several reasons.
First, it will allow more refined gene-specific estimation of
models for likelihood analysis. Second,
it will reveal to what extent different genetic regions (possibly evolving
by different processes) offer independent corroboration for the same
tree. Partitioned Bremer support (Baker & DeSalle 1997, Baker et
al. 1998) will be used to address the relative contributions of different
genes and different morphological character systems (genitalia vs. somatic)
to results of the simultaneous analysis. Third, concordance among genes
will indicate which are retaining the most historical information, and
thus should be the target of sequencing during denser taxon sampling.
The relative degree of support for nodes in all trees obtained will
be assessed with branch support indices (Bremer, 1988, 1994; Donoghue
et al., 1992) and bootstrap percentages (Felsenstein, 1985; Sanderson,
1989).
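The resampling step that underlies bootstrap percentages can be sketched briefly. In the Python example below, the dictionary-based alignment format, the function name, and the toy sequences are all illustrative assumptions; tree inference on each pseudo-replicate, which the phylogenetic programs named above would perform, is deliberately left out.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample aligned sites with replacement to build one pseudo-replicate.

    alignment: dict mapping taxon name to an aligned sequence string.
    rng: a random.Random instance (seeded for reproducibility).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}


# Toy alignment with placeholder taxa; not real data.
toy = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "ATGTTC"}
replicate = bootstrap_columns(toy, random.Random(0))
for taxon, seq in replicate.items():
    print(taxon, seq)
```

The bootstrap percentage for a node is then the share of pseudo-replicates whose inferred tree contains that clade.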
We noted above that
likelihood analyses will require decomposing the set of taxa into smaller
subsets, analyzing, then grafting the resulting
trees together. We anticipate
that this approach will be useful for some of the larger parsimony analyses
as well. Similarly, as some clades emerge as well-corroborated
by many characters and analyses, to simplify computation we may perform
subsequent analyses by constraining their monophyly, reducing the number
of their sampled representatives, or analyzing the clades (with outgroups)
in isolation. Because the large
number of taxa is one of the greatest burdens on the analyses, we will
also use the techniques of parametric bootstrapping to study how best
to increase taxon sampling. In
development in Mesquite are modules that can simulate increased taxon
sampling (randomly adding taxa to a skeletal tree, perhaps to specific
regions of the tree). Characters are simulated on the augmented tree,
the tree reconstructed from them, and the ability to recover the skeletal
tree is compared to simulations with the skeletal tree unaugmented. Such studies will help guide us as to how important
increased taxon sampling might be, and what regions of the tree most
need it.
These diverse analyses
will produce many, sometimes conflicting, results. How to reconcile them?
Points of agreement will, obviously, be considered especially
well supported. However, we expect to encounter some irreconcilable
differences whose resolution will await future study.
Preliminary results
The very preliminary molecular data (Fig. 1A) show a number
of classically recognized groups as well as numerous problematic groupings,
at least from a morphological point of view.
This result, based solely on sparse preliminary data for a very
broad range of taxa, is expected because to date we still lack about
30% of the sequences for the taxa in Fig. 1A, and because the range
of time represented in Fig. 1A clearly exceeds the ability of only six
genes (of which three are mitochondrial) to provide robust phylogenetic
signal on deep nodes. Obviously important genes for deep nodes, such
as EF1a and Pol II, are missing. In the few months available before
the TOL deadline, we produced an impressive amount of preliminary molecular
data: 617 sequenced fragments from six genes of 98 taxa. The choice
of taxa was largely drawn from the fresh material on hand during a north
temperate winter at one museum: lycosoids, symphytognathoids, and cribellate
groups were undersampled, and dionychans and trochanteriids oversampled.
We present results here not to argue that the choice of genes or taxa
in this preliminary data set was ideal, much less that our results are
definitive in any way, but simply to show that the ambitious DNA sequencing
schedule is feasible. Given the full range of genes envisaged above,
we anticipate that the concordance between molecules and morphology
will dramatically improve. Fig. 1A recovers many accepted shallow nodes
in spider phylogeny, and recovers many of the deeper nodes approximately. For example, the orders Uropygi and Amblypygi
are monophyletic, as are the families Austrochilidae, Archaeidae, Theridiidae,
Synotaxidae, Agelenidae, Desidae, Deinopidae, Zodariidae, Eresidae,
Liocranidae, Clubionidae, Anyphaenidae, Salticidae, and Oxyopidae. The doublets Scytodidae-Sicariidae, Oonopidae-Orsolobidae,
Diguetidae-Plectreuridae, Amphinectidae-Desidae, and Oecobiidae-Hersiliidae
are recovered. Malkarids group with pararchaeids, which is not unreasonable.
Corinnids may be polyphyletic. Thomisid, filistatid, and trochanteriid
genera largely group together, suggesting perhaps problems with the
individual sequences or taxa. Of course, the results are also very noisy.
To name a few problems: liphistiids are definitely spiders, archaeids are not
sister to austrochilids, linyphioids are not monophyletic, deinopids
and Tengella probably do not belong
in dionychans. Both Palpimanoidea and Deinopoidea are scattered throughout
the tree. Deeper nodes are less convincing and the large clades often
include taxa that clearly do not belong, but the molecular data do find
some previously hypothesized higher groups of spiders. Araneomorphae,
Haplogynae, Entelegynae, Divided Cribellum Clade, the RTA Clade, Fused
Paracribellar Clade, Araneoidea, and Gnaphosoidea are substantially
intact. The rooting of Araneoidea within Entelegynae, however, is upside
down.
The combined analysis
(Fig. 1B) corrects some of the problems with deeper nodes (Araneae,
Opisthothelae), but in fact the morphological data for this particular
set of taxa are quite preliminary as well, especially as regards dionychans.
Although being actively studied, the morphological evidence bearing
on dionychan relationships has never been assessed, and no one has attempted
to concatenate morphological homology hypotheses at this sampling density
across all spiders. The morphology dataset used here by itself yields
1664 most parsimonious trees and contains large polytomies. Given
more time to "stitch" together morphological
knowledge, we are confident that comparative morphology will supply
a very strong phylogenetic signal for spiders.
POY (vers. 3.0)
now supports maximum likelihood analysis with simultaneous dynamic sequence
alignment on combined data of any sort (morphological, fossil, molecular,
etc.). We treated the molecular data as nine complex characters (six
molecules, plus three distinct 18S regions) under a seven parameter
substitution model (GTR+indels) with adjustments for invariant sites
and 4 class gamma rate distribution, empirically estimating five "base"
frequencies (A, C, G, T, and "gap").
Each parameter was independently estimated for each of the nine
molecular "characters." Morphological
data were analyzed using the model of Tuffley & Steel (1997). The results are roughly comparable to the parsimony
results, recovering, for example, Amblypygi, Pedipalpi, Austrochilidae,
Deinopidae, Archaeidae, Theridiidae, Synotaxidae, Agelenidae, Desidae,
Eresidae, Liocranidae, Clubionidae, Anyphaenidae, Salticidae, Oxyopidae,
Scytodidae-Sicariidae, Oonopidae-Orsolobidae, and Amphinectidae-Desidae,
but losing, for example, zodariid monophyly. Problems are still evident
at deeper nodes. We believe these problems are data-dependent and will
be evanescent, but wish to emphasize that our computational capabilities
now include various implementations of maximum likelihood and parsimony,
whether of partitioned or simultaneously analyzed data, and constitute
uniquely powerful tools for exploratory data analysis.
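As a minimal illustration of the empirical frequency estimation described above, the following Python sketch counts the five states (A, C, G, T, and "gap") separately for each molecular partition. It is not POY's implementation: it assumes a fixed alignment in which gaps are already visible, whereas POY estimates parameters under dynamic alignment. The function names and the partition names are hypothetical.

```python
from collections import Counter

# Four nucleotides plus "gap" treated as a fifth state, as in the text.
STATES = ("A", "C", "G", "T", "-")


def empirical_frequencies(sequences):
    """Return the observed proportion of each of the five states.

    Characters outside STATES (e.g. ambiguity codes such as N) are ignored.
    Assumes the sequences are already aligned, so gaps appear as '-'.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in STATES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recognised states in input")
    return {state: counts[state] / total for state in STATES}


def per_partition_frequencies(partitions):
    """Estimate frequencies independently for each named partition."""
    return {name: empirical_frequencies(seqs) for name, seqs in partitions.items()}


if __name__ == "__main__":
    # Toy data only; partition names are placeholders, not the proposal's loci.
    toy = {
        "partition_1": ["ACGT-", "ACGTA"],
        "partition_2": ["GGCC", "GG-C"],
    }
    for name, freqs in per_partition_frequencies(toy).items():
        print(name, freqs)
```

Keeping the estimation per partition mirrors the statement that each parameter was estimated independently for each of the nine molecular "characters"; a pooled estimate would instead merge the counts before normalizing.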
Broad Impacts
Mega-diverse groups
like spiders are a major element of our planet's biocomplexity, performing
crucial roles in the ecological processes that support human life.
Because of the historical emphasis on larger, more conspicuous
organisms, groups like spiders have been comparatively neglected, and
a full appreciation of their role in the evolutionary history of life
on earth has been impossible to achieve. By combining a massive comparative genomic survey
of spiders with an equally thorough survey of new and existing morphological
and behavioral data, we will be able to elucidate the history of a major
chunk of the tree of life, on a global scale.
Through this project
we will help train at least three postdoctoral fellows and at least
three graduate students in all aspects of this interdisciplinary effort,
from morphological and molecular data collection to the details of modern
computational techniques for phylogenetic analysis. New and encyclopedic archives of all the available
comparative data on a wide range of taxa, and the results of our analyses
of those data, will be made available electronically to colleagues everywhere
through the World Wide Web.
Training and Education. This research proposal includes a strong training
and educational commitment covering a wide range of educational levels,
from high school students to postdoctoral trainees. Four postdoctoral
trainees will be trained over the course of the grant. We favor postdoctoral tenures of two years.
These trainees will have their "home base" at
the AMNH, CAS, GWU and the Smithsonian.
If this project were to be funded, CAS would provide matching funds to hire a
postdoc for two years to work on this research (see letter from Griswold
in Supplementary Documentation). We will implement a system of laboratory rotation
that will allow postdocs to work on project areas complementary to their
primary research responsibilities. For example, the postdoc at GWU will be
responsible primarily for collecting
the morphological data of a particular clade (or group of clades).
During his/her two-year tenure at GWU this trainee will carry
out a smaller project at a molecular lab (e.g., the AMNH's) so that he/she
can complement his/her expertise by becoming familiar with the collection
and analysis of sequence data. This
hands-on approach will help trainees to become familiar with the diversity
of data and analytical approaches involved in the project as a whole.
Several graduate students will be actively involved in this research project.
The following project participants are based at academic institutions
that offer graduate programs in systematics (e.g., www.gwu.edu/~clade)
and can act as primary graduate advisors: Arnedo (U. Barcelona), Bond
(ECU), Hedin (SDSU), Hormiga (GWU), Maddison (U. Arizona), and Scharff
(U. Copenhagen). All the remaining PIs and collaborators have
formal ties with academic institutions, such as adjunct professorships
at local universities, and can co-direct theses and dissertations (see
Biographical Sketches). The diversity of participating institutions
(universities, museums, research institutes, etc.) and approaches (morphologists,
molecular systematists, theoreticians, programmers, etc.) will provide
a fertile milieu for research exchanges that will be particularly beneficial
for students. In total, our group has more
than 106 years of experience as biology educators and has been
involved, in cumulative terms, in the training of more than 56 graduate
students and 19 postdoctoral associates.
Given the magnitude and scope of the TOL project, we favor training
doctoral students over M.S. students, but are open to the latter. As stated for the postdoctoral trainees, we
will implement a system of lab rotation for graduate students to ensure
training in all aspects of systematics.
In addition, we will try to integrate existing (or future) graduate
students in our labs into the TOL project by complementing the scope
of their doctoral projects. For
example, a graduate student working on species-level systematics for
his/her dissertation could contribute to the TOL project by placing
the genus in its higher-level cladistic context by using and contributing
to TOL data. Such an approach would be mutual |