About: Phylogenetic invariants

Property	Value
dbo:abstract	Phylogenetic invariants are polynomial relationships between the frequencies of various site patterns in an idealized DNA multiple sequence alignment. They have received substantial study in the field of biomathematics, and they can be used to choose among phylogenetic tree topologies in an empirical setting. The primary advantage of phylogenetic invariants relative to other methods of phylogenetic estimation, such as maximum likelihood or Bayesian MCMC analyses, is that invariants can yield information about the tree without requiring the estimation of branch lengths or model parameters. The idea of using phylogenetic invariants was introduced independently by James Cavender and Joseph Felsenstein and by James A. Lake in 1987. At present, few programs allow empirical datasets to be analyzed using invariants. However, phylogenetic invariants may provide solutions to other problems in phylogenetics, and they represent an area of active research for that reason. Felsenstein stated it best when he said, "invariants are worth attention, not for what they do for us now, but what they might lead to in the future." (p. 390) If we consider a multiple sequence alignment with t taxa and no gaps or missing data (i.e., an idealized multiple sequence alignment), there are 4^t possible site patterns. For example, there are 256 possible site patterns for four taxa, with frequencies f_AAAA, f_AAAC, f_AAAG, …, f_TTTT, which can be written as a vector. This site pattern frequency vector has 255 degrees of freedom because the frequencies must sum to one. However, any set of site pattern frequencies that resulted from some specific process of sequence evolution on a specific tree must obey many constraints and therefore has many fewer degrees of freedom. Thus, there should be polynomials involving those frequencies that take on a value of zero if the DNA sequences were generated on a specific tree given a particular substitution model.
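The site pattern frequency vector described above can be sketched in a few lines of Python. This is only an illustration: the function name `site_pattern_frequencies` and the toy alignment are assumptions made for this example and do not come from any existing phylogenetics package.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def site_pattern_frequencies(alignment):
    """Map each of the 4^t site patterns to its observed frequency.

    `alignment` is a list of t equal-length strings over A, C, G, T
    (an idealized alignment: no gaps, no missing data).
    """
    t = len(alignment)
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same length")
    # Each alignment column is one site pattern, e.g. "GGGT" for four taxa.
    counts = Counter("".join(column) for column in zip(*alignment))
    if any(set(pattern) - set(BASES) for pattern in counts):
        raise ValueError("only A, C, G and T are allowed in an idealized alignment")
    # Enumerate all 4^t patterns so that unobserved patterns get frequency 0.
    all_patterns = ("".join(p) for p in product(BASES, repeat=t))
    return {pattern: counts[pattern] / n_sites for pattern in all_patterns}

# Hypothetical toy alignment of four taxa and six sites.
toy_alignment = ["ACGTAC", "ACGTAA", "ACGAAC", "ACTTAC"]
freqs = site_pattern_frequencies(toy_alignment)
```

For four taxa the returned dictionary has 256 entries that sum to one, which is why only 255 of them are free to vary.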
Invariants are formulas in the expected pattern frequencies, not the observed pattern frequencies. When they are computed using the observed pattern frequencies, we will usually find that they are not precisely zero even when the model and tree topology are correct. By testing whether such polynomials for various trees are 'nearly zero' when evaluated on the observed frequencies of patterns in real data sequences, one should be able to infer which tree best explains the data. Some invariants are straightforward consequences of symmetries in the model of nucleotide substitution, and they will take on a value of zero regardless of the underlying tree topology. For example, if we assume the Jukes-Cantor model of sequence evolution and a four-taxon tree, we expect site patterns that differ only by a relabeling of the bases to have equal frequencies, e.g. f_AAAA − f_CCCC = 0. This is a simple outgrowth of the fact that base frequencies are constrained to be equal under the Jukes-Cantor model. Such relationships are called symmetry invariants. The equation shown above is only one of a large number of symmetry invariants for the Jukes-Cantor model; in fact, there are a total of 241 symmetry invariants for that model. Symmetry invariants are non-phylogenetic in nature; they take on the expected value of zero regardless of the tree topology. However, they make it possible to determine whether a particular multiple sequence alignment fits the Jukes-Cantor model of evolution (i.e., by testing whether the site patterns of the appropriate types are present in equal numbers). More general tests for the best-fitting model using invariants are also possible. For example, Kedzierska et al. (2012) used invariants to select the best-fitting model from a specific set of models. In that work, the models are written JC69*, K80*, and K81*; the asterisk emphasizes the non-homogeneous nature of the models that can be examined using invariants. These non-homogeneous models include the commonly used continuous-time JC69, K80, and K81 models as submodels.
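One way to arrive at the count of 241 symmetry invariants is to group the 256 four-taxon site patterns into classes that the Jukes-Cantor model cannot distinguish, namely patterns that differ only by a relabeling of the bases. A class of k patterns contributes k − 1 independent equalities, so the total is 256 minus the number of classes. The sketch below checks this arithmetic; the helper name `canonical_form` is an assumption made for this illustration.

```python
from itertools import product

BASES = "ACGT"

def canonical_form(pattern):
    """Relabel bases in order of first appearance, e.g. 'GGTA' -> 'xxyz'."""
    labels = {}
    out = []
    for base in pattern:
        if base not in labels:
            labels[base] = "xyzw"[len(labels)]
        out.append(labels[base])
    return "".join(out)

# Group all 4^4 = 256 site patterns by their relabeling class.
patterns = ["".join(p) for p in product(BASES, repeat=4)]
classes = {}
for pattern in patterns:
    classes.setdefault(canonical_form(pattern), []).append(pattern)

n_classes = len(classes)
# Within a class of size k, all frequencies are equal: k - 1 independent equalities.
n_symmetry_invariants = sum(len(members) - 1 for members in classes.values())
```

The grouping yields 15 classes (xxxx, xxxy, xxyx, and so on), and 256 − 15 = 241. The same grouping also suggests how a fit test might proceed: the observed counts within each class should be roughly equal if the Jukes-Cantor model holds.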
The SSM (strand-specific model), also called the CS05 model, is a generalized non-homogeneous version of the HKY (Hasegawa-Kishino-Yano) model, constrained to have an equal distribution of the pairs of bases A,T and C,G at each node of the tree, with no assumption regarding a stable base distribution. All models listed above are submodels of the general Markov model (GMM). The ability to perform tests using non-homogeneous models represents a major benefit of invariant methods relative to the more commonly used maximum likelihood methods for phylogenetic model testing. Phylogenetic invariants, which are defined as the subset of invariants that take on a value of zero only when the sequences were (or were not) generated on a specific topology, are likely to be the most useful invariants for phylogenetic studies. (en)
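To make the idea of a topology-dependent invariant concrete, the sketch below uses a technique that is not described in the text above and is brought in purely for illustration: rank conditions on "flattenings" of the site pattern frequency vector. For a four-taxon tree with split 12|34, arranging the 256 expected frequencies as a 16 × 16 matrix (rows indexed by the bases at taxa 1 and 2, columns by the bases at taxa 3 and 4) gives a matrix of rank at most 4 under the general Markov model, so all of its 5 × 5 minors are polynomials that vanish on that tree. The example computes exact expected frequencies under a Jukes-Cantor-type model with arbitrarily chosen, hypothetical change probabilities and compares the ranks of the three possible flattenings. All function names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

STATES = range(4)  # 0..3 stand for A, C, G, T

def jc_matrix(q):
    """Jukes-Cantor-type transition matrix: probability q of changing to each other base."""
    return [[1 - 3 * q if i == j else q for j in STATES] for i in STATES]

def quartet_distribution(q_leaves, q_internal):
    """Exact expected site pattern frequencies on the tree ((1,2),(3,4)).

    Internal node u joins taxa 1 and 2, internal node v joins taxa 3 and 4,
    and the base at u is drawn uniformly.
    """
    leaf = [jc_matrix(q) for q in q_leaves]
    inner = jc_matrix(q_internal)
    dist = {}
    for x in product(STATES, repeat=4):
        total = Fraction(0)
        for u, v in product(STATES, repeat=2):
            total += (Fraction(1, 4) * leaf[0][u][x[0]] * leaf[1][u][x[1]]
                      * inner[u][v] * leaf[2][v][x[2]] * leaf[3][v][x[3]])
        dist[x] = total
    return dist

def flattening(dist, side_a):
    """16 x 16 matrix: rows indexed by the two taxa in side_a, columns by the other two."""
    side_b = [i for i in range(4) if i not in side_a]
    pairs = list(product(STATES, repeat=2))
    matrix = []
    for row_states in pairs:
        row = []
        for col_states in pairs:
            x = [None] * 4
            x[side_a[0]], x[side_a[1]] = row_states
            x[side_b[0]], x[side_b[1]] = col_states
            row.append(dist[tuple(x)])
        matrix.append(row)
    return matrix

def exact_rank(matrix):
    """Matrix rank by Gaussian elimination over the rationals (no rounding error)."""
    m = [row[:] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical per-branch change probabilities, chosen only for illustration.
dist = quartet_distribution(
    [Fraction(1, 20), Fraction(1, 10), Fraction(3, 40), Fraction(1, 8)],
    Fraction(1, 16),
)
# Taxa are 0-indexed here: (0, 1) is the split 12|34 that is actually on the tree.
ranks = {split: exact_rank(flattening(dist, split)) for split in [(0, 1), (0, 2), (0, 3)]}
```

With expected frequencies, the flattening for the true split 12|34 has rank 4, while the flattenings for 13|24 and 14|23 have larger rank. With observed frequencies the rank condition holds only approximately, which is where a measure of "nearly zero" becomes necessary.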
dbo:wikiPageExternalLink	https://www.worldcat.org/oclc/52127769
dbo:wikiPageID	65534930 (xsd:integer)
dbo:wikiPageLength	17855 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1076986897 (xsd:integer)
dbo:wikiPageWikiLink	dbr:Bayesian_inference_in_phylogeny dbr:Purine dbr:Pyrimidine dbr:David_Hillis dbr:Joseph_Felsenstein dbr:Degrees_of_freedom dbr:Substitution_model dbr:Maximum_likelihood_estimation dbr:Singular_value_decomposition dbr:PAUP* dbr:Mathematical_and_theoretical_biology dbr:Least_squares dbr:ENCODE dbr:PHYLIP dbr:James_A._Lake dbc:Phylogenetics dbr:Chi-squared_test dbr:Models_of_DNA_evolution dbr:Phylogenetics dbr:Neighbor_joining dbr:Multiple_sequence_alignment
dcterms:subject	dbc:Phylogenetics
rdfs:comment	Phylogenetic invariants are polynomial relationships between the frequencies of various site patterns in an idealized DNA multiple sequence alignment. They have received substantial study in the field of biomathematics, and they can be used to choose among phylogenetic tree topologies in an empirical setting. The primary advantage of phylogenetic invariants relative to other methods of phylogenetic estimation, such as maximum likelihood or Bayesian MCMC analyses, is that invariants can yield information about the tree without requiring the estimation of branch lengths or model parameters. The idea of using phylogenetic invariants was introduced independently by James Cavender and Joseph Felsenstein and by James A. Lake in 1987. (en)
rdfs:label	Phylogenetic invariants (en)
owl:sameAs	wikidata:Phylogenetic invariants https://global.dbpedia.org/id/FQwKB
prov:wasDerivedFrom	wikipedia-en:Phylogenetic_invariants?oldid=1076986897&ns=0
foaf:isPrimaryTopicOf	wikipedia-en:Phylogenetic_invariants
is dbo:wikiPageRedirects of	dbr:Phylogenetic_invariant
is dbo:wikiPageWikiLink of	dbr:Substitution_model dbr:PHYLIP dbr:Weyr_canonical_form dbr:Phylogenetic_invariant
is foaf:primaryTopic of	wikipedia-en:Phylogenetic_invariants