Our pool design algorithm is based on analyses of sequence and structure spaces to allow design of specific structures,
including novel RNA-like motifs identified using
graph theory analysis.
The algorithm below assumes we have available reference data
that relate mixing matrices
and starting sequences to
motif distributions in resulting pools. By knowing the structural
distributions of various sequence space regions, we optimize the choice of starting sequences, mixing matrices, and
associated weights to approximate the target structured pool.|
Our pool design algorithm involves the following steps: |
(i) Specify a target distribution (T) of topologies/shapes.
(ii) Define candidates for starting sequences and mixing matrices that aim to cover the sequence space. In this web server, we use mainly 6 starting sequences and constructed 22 mixing matrices to cover the sequence space.
(iii) Compute motif distributions corresponding to all starting sequence / mixing matrix pairs. This step analyzes pool structural diversity.
(iv) Choose the number of mixing matrices to approximate the designed pool.
(v) Find an optimal combination of starting sequences (Si) and mixing matrices (Mi) and associated weights () to approximate the target RNA motif distribution. The mathematical procedures for this step are detailed below.
Optimization Procedures for Step (v)We approximate a target structural distribution by optimizing a set of starting sequence/mixing matrix pairs based on pool structural frequency data. Generally, we consider a designed pool composed of k subpools, each generated with a mixing matrix/starting pair and associated with a weight : , where and denotes synthesizing fraction of the pool sequences using starting sequence Si and mixing matrix Mi. Optimization of the three pool parameters Si, Mi and can be formulated as follows: If the nx1 matrix is the target distribution with Ti fractions of structures 1, 2,…, n and is the pool fraction of structure l generated by starting sequence Si and mixing matrix Mi, the pool parameters () can be optimized by the following equation:
The estimated pool fractions for the other shapes or topologies 2,3,…,n are derived from the known as follows: