{"id":891,"date":"2011-06-22T11:08:46","date_gmt":"2011-06-22T10:08:46","guid":{"rendered":"http:\/\/www.monkeyshines.co.uk\/blog\/?p=891"},"modified":"2012-02-23T14:20:55","modified_gmt":"2012-02-23T14:20:55","slug":"tree-comparison","status":"publish","type":"post","link":"https:\/\/monkeyshines.co.uk\/blog\/tree-comparison\/","title":{"rendered":"Tree Comparison"},"content":{"rendered":"<p>\nThroughout the course of my degree I have found it useful to write summaries of the various aspects of phylogenetics and biology that I have learned. That these will be useful to others is perhaps a vain hope, in both senses of the word, but I thought I might as well publish some of them on my blog. (It also afforded me the chance to <a href='\/blog\/archives\/890'>try my hand at LyX\/LaTeX<\/a>.) For your reading convenience, <a href='\/doc\/Tree_Comparison.pdf'>this post is available as a pdf <img src='\/img\/pdf.gif' alt='pdf' style='padding: 0px;' \/><\/a>.\n<\/p>\n<p>\n<b>Tree Comparison<\/b><br \/>\nPhylogenetic trees have two properties that can usefully be compared: their topologies and their branch lengths. Usually, the desired outcome of a tree comparison is a single number, indicating how different the trees are from one another. Reducing multiple complex structures to a single interpretable number is<br \/>\ndifficult, even when just comparing two trees; a range of methods have been<br \/>\ndeveloped, most of which use (sometimes implicitly) graph-theoretical measures of distance. Note that this is different from the situation of tree evaluation,<br \/>\nwhere the aim is to determine whether some trees are a better representation<br \/>\nof evolutionary history than others. 
Tree comparison is often done after evaluation, to gauge how much credence and importance to give to the results of the evaluation (there is, as yet, no method to state formally that the difference between trees is significant).\n<\/p>\n<p>\nFelsenstein (2004, pp. 528-535) provides a historical overview of phylogenetic<br \/>\ntree comparison, starting with the symmetric difference metric, also known<br \/>\nas the Robinson-Foulds (RF) distance, which measures differences in topology between a pair of (possibly multifurcating) trees (Robinson and Foulds, 1981). The symmetric difference can be conceptualised as the minimum number of transformations that are required to convert one tree to the other, where a transformation corresponds to either removing a branch and merging the nodes<br \/>\nit connected, or splitting a node into two and inserting a branch between the<br \/>\nnew nodes. The symmetric difference is widely used, but can be highly sensitive;<br \/>\nthat is, it can have a high value for trees which are intuitively similar (Felsenstein, 2004).\n<\/p>\n<p>\nIncluding information on branch lengths in tree comparisons is potentially useful, particularly when the trees have a relatively wide range of branch lengths.<br \/>\nThe weighted Robinson-Foulds distance (Robinson and Foulds, 1979) and the<br \/>\nbranch score (Kuhner and Felsenstein, 1994) are two metrics that use branch<br \/>\nlength information, and both are based on the symmetric difference. The<br \/>\nweighted RF distance is the sum of the absolute differences between corresponding<br \/>\nbranch lengths; a branch length is considered to be zero if it does not exist in<br \/>\none of the trees. 
The branch score is similar, but squares the differences before adding them; the square root of this sum is called the branch-length<br \/>\ndistance (BLD) (Felsenstein, 2004).\n<\/p>\n<p>\nThe pair of trees being compared can be mapped to two points in tree space,<br \/>\nwhich suggests another distance metric, the geodesic distance, defined as the<br \/>\nshortest path between two points in tree space. In tree space, the weighted<br \/>\nRF distance and the BLD correspond to Manhattan and Euclidean distances, respectively (Kupczok et al., 2008). Calculating the geodesic distance may be<br \/>\ncomputationally prohibitive for large trees, but good approximations are available (Kupczok et al., 2008).\n<\/p>\n<p>\nAll of the distances that use branch lengths will produce relatively high values if<br \/>\nthe branches in one tree tend to be longer, even if the topologies are very similar; that is, if the evolutionary rate differs between the trees. This behaviour may or may not be desirable; to prevent differences in rate from having a disproportionate effect, Kuhner and Felsenstein (1994) suggested using relative<br \/>\nbranch lengths, dividing each branch length by the sum of all branch lengths.<br \/>\nAs far as I am aware, this has not been implemented in any publicly available<br \/>\nsoftware. The K score is a modification of the BLD that scales one tree to have<br \/>\nsimilar global divergence to the other before calculating the BLD, but the scaling means that the K score is no longer mathematically defined as a distance,<br \/>\nand its use is not always appropriate (Soria-Carrasco et al., 2007).\n<\/p>\n<p>\n<b>Citing this Document<\/b><br \/>\n[If referring to this document, please cite its location on the Monkeyshines<br \/>\nwebsite: <a href=\"\/blog\/archives\/891\">http:\/\/www.monkeyshines.co.uk\/blog\/archives\/891<\/a>]\n<\/p>\n<p>\n<b>References<\/b><\/p>\n<ul>\n<li>\nFelsenstein, J. 
(2004) <em>Inferring Phylogenies.<\/em> Sinauer, Sunderland, Massachusetts.\n<\/li>\n<li>\nKuhner, M.K. and Felsenstein, J. (1994) A simulation comparison of phylogeny<br \/>\nalgorithms under equal and unequal evolutionary rates. <em>Molecular Biology<br \/>\nand Evolution<\/em>, <strong>11<\/strong>, 459-468. <a href='http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/8015439'>Pubmed<\/a>\n<\/li>\n<li>\nKupczok, A. et al. (2008) An exact algorithm for the geodesic distance between<br \/>\nphylogenetic trees. <em>Journal of Computational Biology<\/em>, <strong>15<\/strong>, 577-591. <a href='http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/18631022'>Pubmed<\/a>\n<\/li>\n<li>\nRobinson, D.F. and Foulds, L.R. (1979) Comparison of weighted labelled trees.<br \/>\n<em>Lecture Notes in Mathematics<\/em>, <strong>748<\/strong>, 119-126.\n<\/li>\n<li>\nRobinson, D.F. and Foulds, L.R. (1981) Comparison of phylogenetic trees.<br \/>\n<em>Mathematical Biosciences<\/em>, <strong>53<\/strong>, 131-147.\n<\/li>\n<li>\nSoria-Carrasco, V. et al. (2007) The Ktree score: quantification of differences in<br \/>\nthe relative branch length and topology of phylogenetic trees. <em>Bioinformatics<\/em>, <strong>23<\/strong>, 2954-2956. <a href='http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17890735'>Pubmed<\/a>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Throughout the course of my degree I have found it useful to write summaries of the various aspects of phylogenetics and biology that I have learned. 
That these will be useful to others is perhaps a vain hope, in both senses of the word, but I thought I might as well publish some of them [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[],"class_list":["post-891","post","type-post","status-publish","format-standard","hentry","category-biology"],"_links":{"self":[{"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/posts\/891","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/comments?post=891"}],"version-history":[{"count":7,"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/posts\/891\/revisions"}],"predecessor-version":[{"id":922,"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/posts\/891\/revisions\/922"}],"wp:attachment":[{"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/media?parent=891"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/categories?post=891"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/monkeyshines.co.uk\/blog\/wp-json\/wp\/v2\/tags?post=891"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}