Why do we care about phylogenies?
-All biological relationships can be defined by constructing phylogenies.
Phylogenies
show genetic and evolutionary relationships among groups and individuals
Sister taxa
groups that share an immediate common ancestor
pedigree
depiction of relations within a population
Parts of phylogeny: taxon
The ends of each branch
Parts of phylogeny: root
base of the phylogenetic tree: the common ancestor of all taxa in the tree
Parts of phylogeny: node
point where a speciation event has occurred and a new branch is created
Parts of phylogeny: internal branch (internode)
space in between different speciation events
monophyletic
clade consisting of an ancestral taxon and ALL of its descendants
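A minimal sketch of how the monophyly definition can be checked, assuming a rooted tree written as nested tuples with taxon names as strings. The helper names `leaves` and `is_monophyletic` and the example taxa are illustrative, not from the notes.

```python
def leaves(tree):
    """Return the set of taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the taxa in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

# Hypothetical example tree (assumed topology, for illustration only).
tree = ((("human", "chimp"), "gorilla"), "orangutan")
sister_pair = is_monophyletic(tree, {"human", "chimp"})     # a full clade
missing_chimp = is_monophyletic(tree, {"human", "gorilla"})  # leaves out a descendant
```

Here `{"human", "gorilla"}` fails because the node that contains both also contains chimp, so the group does not include ALL descendants.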
Molecular clock
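phylogenies rely on the molecular clock: mutations (on average) accumulate at a given rate, so the number of mutational differences between two species reflects the time since they diverged.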
If more mutational differences?
species branched a long time ago
If fewer mutational differences?
species branched more recently
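The clock reasoning in the cards above can be sketched as a small calculation. This assumes a strict clock (one constant rate in every lineage), and the rate and difference values are hypothetical numbers chosen only for illustration.

```python
def divergence_time(differences_per_site, rate_per_site_per_year):
    """Estimate years since two lineages split under a strict molecular clock.

    After the split, mutations accumulate along BOTH lineages, so the
    pairwise difference is roughly 2 * rate * time.
    """
    return differences_per_site / (2 * rate_per_site_per_year)

# Assumed substitution rate (per site per year); not from the notes.
rate = 1e-9

older = divergence_time(0.02, rate)    # more differences -> older split
recent = divergence_time(0.002, rate)  # fewer differences -> more recent split
```

With these assumed inputs the first pair diverged about ten times earlier than the second, which is the "more differences, longer ago" rule in numeric form.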
Problems with molecular clock
mutation rate can vary among species
Parsimony
uses discrete characters (like mutations or some trait)
How to choose which tree is most parsimonious?
Draw multiple candidate trees; the one that requires the fewest character-state transitions is the most parsimonious
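One common way to count character-state transitions on a given tree is Fitch's algorithm. The notes do not name a specific counting method, so treat this as an illustrative sketch. The tree encoding (nested tuples), the function names, and the three toy characters are all assumptions.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one character on a rooted binary tree.

    tree: a taxon name (str) or a 2-tuple (left_subtree, right_subtree).
    states: dict mapping taxon name -> observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one character-state transition is required.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, characters):
    """Total number of changes over all characters for one tree."""
    return sum(fitch(tree, states)[1] for states in characters)

# Toy data: three characters scored in four hypothetical taxa.
characters = [
    {"A": "G", "B": "G", "C": "T", "D": "T"},
    {"A": "C", "B": "C", "C": "A", "D": "A"},
    {"A": "T", "B": "A", "C": "T", "D": "T"},
]
candidate_trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(t, characters) for name, t in candidate_trees.items()}
best = min(scores, key=scores.get)
```

For this toy data the tree grouping A with B needs 3 changes and the other two need 5 each, so it is the most parsimonious of the three candidates.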
Pros of Parsimony:
-fastest and simplest method of phylogenetic reconstruction
Cons of Parsimony:
-May give misleading results if rates of evolution (mutation rates) differ among lineages. -Becomes less accurate as genetic distance increases (there are only 4 nucleotides, so the same mutations will occur repeatedly at the same site)
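The "only 4 nucleotides" problem can be illustrated with the Jukes-Cantor (JC69) correction, a standard model that is not mentioned in the notes and is used here only as an illustration. It estimates how many substitutions actually happened per site from the proportion of sites that look different. The function name is hypothetical.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor estimate of substitutions per site.

    p: observed proportion of sites that differ between two aligned sequences.
    Assumes all substitutions are equally likely (the JC69 model).
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)

close = jc69_distance(0.05)  # closely related: correction is small
far = jc69_distance(0.60)    # distantly related: many hidden repeat changes
```

At 5% observed difference the corrected value is barely larger (about 0.052), but at 60% it is about 1.21 substitutions per site, roughly double what is visible. Methods that count only visible changes, as parsimony does, miss those hidden repeats.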
Distance Matrix
calculate pairwise distances between taxa, then choose the tree that minimizes overall distances between taxa
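A minimal sketch of the first step, computing the pairwise distance matrix. It assumes the simplest distance measure (proportion of differing sites in an alignment) and uses made-up sequences. Tree building from the matrix (for example by clustering the closest pair first) is not shown.

```python
from itertools import combinations

def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)

# Hypothetical aligned sequences for four taxa.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGGAT",
    "D": "TCGAACGGTT",
}

# Distance for every unordered pair of taxa.
distances = {
    (x, y): p_distance(alignment[x], alignment[y])
    for x, y in combinations(sorted(alignment), 2)
}
closest_pair = min(distances, key=distances.get)
```

In this toy alignment A and B differ at 1 of 10 sites, the smallest distance in the matrix, so a distance method would place them close together on the tree.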
Distance Matrix in relation to parsimony
Generally more accurate than parsimony, but is also very quick like parsimony
Maximum Likelihood
probability of the data given a tree -searches for the single tree that makes the observed data most probable (returns one best estimate rather than a set of equally good trees)
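A minimal sketch of "probability of data given a tree" for the simplest possible case: two taxa joined by one branch of length d. It assumes the Jukes-Cantor model (all substitutions equally likely), which the notes do not specify, and the site counts are hypothetical. Real analyses also search over tree shapes, which is omitted here.

```python
import math

def log_likelihood(d, n_sites, n_diff):
    """Log P(data | branch length d) for two aligned sequences under JC69."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # site ends in the same base
    p_diff = 0.25 - 0.25 * decay   # site ends in one specific other base
    n_same = n_sites - n_diff
    return (n_sites * math.log(0.25)      # equal base frequencies at the start
            + n_same * math.log(p_same)
            + n_diff * math.log(p_diff))

# Assumed data: 100 aligned sites, 10 of which differ.
candidates = [i / 1000 for i in range(1, 2001)]  # try d = 0.001 ... 2.000
best_d = max(candidates, key=lambda d: log_likelihood(d, 100, 10))
```

The branch length with the highest likelihood (about 0.107 here) is the maximum likelihood estimate. The same idea of scoring how probable the data are extends to comparing whole trees.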
Maximum likelihood and other methods
More accurate than parsimony and distance matrix but requires much more computation -relies on an accurate model of which mutations are more probable
Bayesian inference
probability of tree given data (opposite of maximum likelihood) -uses prior information, doesn't assume there is only one correct tree
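A toy sketch of how a prior and a likelihood combine into "probability of tree given data" through Bayes' rule. The three candidate trees, the equal priors, and the likelihood values are all made up for illustration. Real Bayesian programs sample a vast number of trees rather than enumerating them.

```python
def posterior(priors, likelihoods):
    """P(tree | data) is proportional to P(data | tree) * P(tree)."""
    unnormalized = {t: priors[t] * likelihoods[t] for t in priors}
    total = sum(unnormalized.values())
    return {t: value / total for t, value in unnormalized.items()}

# Hypothetical inputs: equal prior belief in each tree, assumed likelihoods.
priors = {"((A,B),C)": 1 / 3, "((A,C),B)": 1 / 3, "((B,C),A)": 1 / 3}
likelihoods = {"((A,B),C)": 0.008, "((A,C),B)": 0.001, "((B,C),A)": 0.001}

post = posterior(priors, likelihoods)
```

The result is a probability for every tree (here 0.8, 0.1 and 0.1) instead of a single winner, which is what "doesn't assume there is only one correct tree" means in practice.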
Bayesian inference and other methods
more accurate than parsimony or distance (similar to likelihood) -more computationally intensive than parsimony or distance but less than maximum likelihood.
Problems of phylogenetic reconstruction
-Insufficient data: yields a tree lacking resolution (lacks statistical power) -Evolutionary histories of individual genes aren't necessarily the same (try to get data from many genes or the whole genome)
Sometimes the Molecular Clock (based on genetic data) conflicts with the Geological Record. Why would this happen?
a) Sometimes there are gaps in the geological record, because fossils do not form everywhere, and mutation rate might vary between different species
b) Radiometric dating relies on chance events in the preservation of isotopes, making the timing of events in the geological time scale less accurate than the molecular clock
c) Mutation rates slow down as you go back in time, making estimation of timing of events less accurate as you go back in time
d) The molecular clock is calculated from radioisotopes, while the geological record is obtained from fossil data. The two can conflict when fossils end up displaced from their original sedimentary layer
a) Sometimes there are gaps in the geological record, because fossils do not form everywhere, and mutation rate might vary between different species
Which of the following is most TRUE regarding phylogenetic reconstructions?
a) Phylogenetic reconstruction based on any gene would yield the same tree
b) Parsimony is the most accurate method for reconstructing phylogenies
c) Some DNA sequence data is better for phylogenetic reconstruction than others, such as those that tend to be less subjected to selection (3rd codon, introns)
d) Maximum likelihood relies on maximizing distances among taxa
c) Some DNA sequence data is better for phylogenetic reconstruction than others, such as those that tend to be less subjected to selection (3rd codon, introns)
Which of the following types of data would be most optimal for constructing a phylogeny?
a) Amino acid sequences
b) Intron sequences within rapidly evolving genes
c) Non-coding (and non-regulatory) sequences, or even better, whole genome sequences
d) Genes that were introduced to the genome through horizontal gene transfer
e) Genes that have undergone natural selection due to adaptation to different environments
c) Non-coding (and non-regulatory) sequences, or even better, whole genome sequences
Why would the type of data chosen in the question above be optimal for constructing phylogeny?
a) Because we want the phylogeny to most accurately reflect patterns of adaptation to the environment
b) Because we want the phylogeny to most accurately reflect patterns of protein evolution
c) Because we want the phylogeny to most accurately reflect patterns of rapid genome evolution
d) Because we want the phylogeny to most accurately reflect information from the pan genome
e) Because we want the phylogeny to most accurately reflect neutral genetic relationships (based on the molecular clock) among organisms
e) Because we want the phylogeny to most accurately reflect neutral genetic relationships (based on the molecular clock) among organisms