Measures of Sequence / Structure Similarity

To analyze the sequence and structure space of designed pools at the base level, we use two standard measures of distance between two RNAs: the Hamming distance and the tree edit distance.

The Hamming distance is the number of positions at which two equal-length RNA sequences, aligned end-to-end, differ (Hamming, 1987). The tree edit distance between two secondary structures in (full) tree representation is the minimum total cost of the edit operations (insertion, deletion, and replacement of nodes) along an edit path that converts one tree into the other (Hofacker, 2003). We use the tree edit distance as implemented in RNAdistance of the Vienna RNA package.
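As an illustration of the first measure, the Hamming distance can be sketched in a few lines of Python. The function name `hamming_distance` and the two example sequences are hypothetical, chosen only for this sketch. The tree edit distance is not reimplemented here, since it is taken from RNAdistance.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        # The Hamming distance is only defined for equal-length sequences.
        raise ValueError("sequences must have equal length")
    # Compare the sequences position by position and count mismatches.
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Hypothetical 9-nt sequences that differ at positions 4 and 9.
print(hamming_distance("GGCAGUACC", "GGCUGUACG"))  # prints 2
```

Raising an error on unequal lengths, rather than truncating to the shorter sequence, keeps the measure consistent with the end-to-end alignment described above.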


Contour plots of sequence/structure relationships, showing Hamming distance versus tree edit distance for pools generated by 22 mixing matrices, starting from a modified P5abc domain. The color bar of each plot indicates the frequency in the joint distance distribution. Each pool contains 10,000 sequences, as in the motif distribution calculation.
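The joint frequencies shown in such a contour plot can be tabulated as in the following minimal sketch. It assumes that each RNA comparison has already been reduced to a pair (Hamming distance, tree edit distance). The function name `joint_frequencies` and the sample pairs are hypothetical and are not data from the pools.

```python
from collections import Counter


def joint_frequencies(pairs):
    """Map each (Hamming, tree edit) distance pair to its relative frequency."""
    counts = Counter(pairs)  # number of occurrences of each distance pair
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical distance pairs, for illustration only.
pairs = [(2, 4), (2, 4), (3, 6), (5, 6)]
freq = joint_frequencies(pairs)
print(freq[(2, 4)])  # prints 0.5
```

The resulting dictionary can be passed to any plotting routine that draws contours over a two-dimensional grid of frequencies.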