Our pool design algorithm is based on analyses of sequence and structure spaces and allows the design of specific structures, including novel RNA-like motifs identified by graph-theory analysis. The algorithm below assumes that reference data are available relating mixing matrices and starting sequences to the motif distributions of the resulting pools. Because we know the structural distributions of different regions of sequence space, we can optimize the choice of starting sequences, mixing matrices, and associated weights to approximate the target structured pool.

Our pool design algorithm involves the following steps:

- (i) Specify a target structural distribution ($T$).
- (ii) Define candidate starting sequences and mixing matrices that aim to cover the sequence space. In this web server, we mainly use 6 starting sequences and 22 constructed mixing matrices for this purpose.
- (iii) Compute the motif distributions for all starting sequence/mixing matrix pairs. This step analyzes the structural diversity of the pools.
- (iv) Choose the number $k$ of mixing matrices used to approximate the designed pool.
- (v) Find an optimal combination of starting sequences ($S_i$), mixing matrices ($M_i$), and weights ($w_i$) that approximates the target distribution.

## Optimization Procedures for Step (v)

We approximate a target structural distribution by optimizing a set of starting sequence/mixing matrix pairs based on pool structural frequency data. In general, we consider a designed pool composed of $k$ subpools, each generated with a starting sequence/mixing matrix pair $(S_i, M_i)$ and associated with a weight $w_i$:

$$
\text{Pool} = w_1 (S_1, M_1) + w_2 (S_2, M_2) + \cdots + w_k (S_k, M_k),
$$

where $0 \le w_i \le 1$ and $w_i$ denotes the fraction of the pool sequences synthesized using starting sequence $S_i$ and mixing matrix $M_i$.
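To make the weighted-pool notation concrete, the following minimal Python sketch computes the structure distribution of a pool mixed from $k$ subpools, assuming each subpool's structural distribution is already known as a list of fractions over the same $n$ structures. The function name `pool_distribution` and the toy numbers are hypothetical illustrations, not part of the web server.

```python
from typing import List, Sequence


def pool_distribution(
    subpools: Sequence[Sequence[float]],
    weights: Sequence[float],
    tol: float = 1e-9,
) -> List[float]:
    """Structure distribution of a pool mixed from k subpools (hypothetical helper).

    subpools[i][l] is the fraction of structure l + 1 in the subpool made from
    pair (S_i, M_i); weights[i] is the fraction w_i of that subpool in the pool.
    """
    if len(subpools) != len(weights):
        raise ValueError("need exactly one weight per subpool")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(subpools[0])
    if any(len(p) != n for p in subpools):
        raise ValueError("all subpools must list fractions for the same n structures")
    # Structure by structure, take the weighted sum of the subpool fractions.
    return [sum(w * p[l] for w, p in zip(weights, subpools)) for l in range(n)]


# Made-up toy values: two subpools, three structures.
mix = pool_distribution([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]], [0.25, 0.75])
print([round(x, 3) for x in mix])  # [0.2, 0.525, 0.275]
```

Because the weights sum to 1 and each subpool distribution sums to 1, the mixed distribution also sums to 1.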
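Optimization of the three pool parameters $S_i$, $M_i$, and $w_i$ can be formulated as follows. If the $n \times 1$ matrix $T$ is the target distribution, with fractions $T_1, T_2, \ldots, T_n$ of structures $1, 2, \ldots, n$, and $P_l(S_i, M_i)$ is the pool fraction of structure $l$ generated by starting sequence $S_i$ and mixing matrix $M_i$, the pool parameters $(S_i, M_i, w_i)$ can be optimized by the following equation:

$$
\min_{(S_i,\, M_i,\, w_i)} \sum_{l=1}^{n} \Big[ T_l - \sum_{i=1}^{k} w_i \, P_l(S_i, M_i) \Big]^2 \tag{1}
$$

subject to $w_1 + w_2 + \cdots + w_k = 1$ and $w_i \ge 0$.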
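The text does not say how Eq. (1) is solved for general $k$. As one possible illustration only, the sketch below evaluates the objective of Eq. (1) and searches exhaustively over all sets of $k$ candidate pairs and over weight vectors on a coarse grid of the simplex $w_1 + \cdots + w_k = 1$, $w_i \ge 0$. The naive grid search, the `reference` dictionary layout, and all function names are assumptions made for this example.

```python
from itertools import combinations, product
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # hypothetical (starting sequence id, mixing matrix id)


def squared_error(target: Sequence[float],
                  subpools: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> float:
    """Objective of Eq. (1): sum_l (T_l - sum_i w_i * P_l(S_i, M_i))**2."""
    return sum(
        (target[l] - sum(w * p[l] for w, p in zip(weights, subpools))) ** 2
        for l in range(len(target))
    )


def simplex_grid(k: int, steps: int) -> Iterator[Tuple[float, ...]]:
    """Yield weight vectors with w_i >= 0 summing to 1, on a grid of spacing 1/steps."""
    for head in product(range(steps + 1), repeat=k - 1):
        if sum(head) <= steps:
            counts = head + (steps - sum(head),)
            yield tuple(c / steps for c in counts)


def brute_force_design(
    target: Sequence[float],
    reference: Dict[Pair, List[float]],
    k: int,
    steps: int = 20,
) -> Optional[Tuple[float, Tuple[Pair, ...], Tuple[float, ...]]]:
    """Try every set of k distinct pairs and every grid weight vector (hypothetical)."""
    best = None
    for pairs in combinations(sorted(reference), k):
        subpools = [reference[p] for p in pairs]
        for weights in simplex_grid(k, steps):
            err = squared_error(target, subpools, weights)
            if best is None or err < best[0]:
                best = (err, pairs, weights)
    return best  # (error, chosen pairs, weights), or None if fewer than k pairs


# Made-up toy values (not the web server's reference data): 3 structures.
reference = {
    ("S1", "M1"): [0.5, 0.3, 0.2],
    ("S1", "M2"): [0.3, 0.3, 0.4],
    ("S2", "M2"): [0.1, 0.6, 0.3],
}
target = [0.2, 0.525, 0.275]
err, pairs, weights = brute_force_design(target, reference, k=2)
print(pairs, weights)  # (('S1', 'M1'), ('S2', 'M2')) (0.25, 0.75)
```

The cost of this search grows combinatorially with $k$ and with the grid resolution `steps`, so it is only practical for small toy inputs.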
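Since pool synthesis is experimentally simpler with fewer mixing matrices, we consider below the solution of Eq. (1) for $k = 2$; the optimization procedure can be generalized. With only two mixing matrices $M_1$ and $M_2$, we have $w_2 = 1 - w_1$, and Formula (1) reduces to fixing the weights from the target fraction of structure 1:

$$
T_1 = w_1 P_1(S_1, M_1) + (1 - w_1) \, P_1(S_2, M_2)
\quad \Longrightarrow \quad
w_1 = \frac{T_1 - P_1(S_2, M_2)}{P_1(S_1, M_1) - P_1(S_2, M_2)}, \qquad w_2 = 1 - w_1 .
$$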
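As a worked example with made-up toy values (not data from this web server), assume that pair $(S_1, M_1)$ yields structure 1 at a fraction $P_1(S_1, M_1) = 0.5$, pair $(S_2, M_2)$ yields it at $0.1$, and the target is $T_1 = 0.2$. Solving $T_1 = w_1 P_1(S_1, M_1) + (1 - w_1) P_1(S_2, M_2)$ for $w_1$ gives:

$$
\begin{aligned}
0.2 &= 0.5 \, w_1 + 0.1 \, (1 - w_1) = 0.1 + 0.4 \, w_1 \\
w_1 &= \frac{0.2 - 0.1}{0.5 - 0.1} = 0.25, \qquad w_2 = 1 - w_1 = 0.75 .
\end{aligned}
$$

Because both weights must be non-negative, a combination is usable only if the two values of $P_1$ differ and $T_1$ lies between them; with the same pairs, a target of $T_1 = 0.6$ would give $w_1 = 1.25$ and $w_2 = -0.25$, which is infeasible.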
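The estimated pool fractions for the other shapes or topologies $2, 3, \ldots, n$ are derived from the known $w_1$ as follows:

$$
\hat{P}_l = w_1 P_l(S_1, M_1) + (1 - w_1) \, P_l(S_2, M_2), \qquad l = 2, 3, \ldots, n .
$$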
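A minimal Python sketch of this $k = 2$ step, assuming the fractions for each candidate pair are stored as plain lists ordered by structure number: it fixes $w_1$ from structure 1 and then estimates structures $2, \ldots, n$. The function name `two_pair_estimate` and the toy numbers are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def two_pair_estimate(
    target: Sequence[float],
    p1: Sequence[float],
    p2: Sequence[float],
) -> Optional[Tuple[float, List[float]]]:
    """k = 2 (hypothetical helper): fix w_1 from structure 1, then estimate 2..n.

    p1[l] and p2[l] are the fractions of structure l + 1 produced by the pairs
    (S_1, M_1) and (S_2, M_2); target[l] is T_{l+1}.
    Returns (w_1, [estimated fractions of structures 2..n]) or None if the
    combination cannot reproduce T_1 with non-negative weights.
    """
    denom = p1[0] - p2[0]
    if denom == 0:
        return None  # both pairs give the same fraction of structure 1
    w1 = (target[0] - p2[0]) / denom
    if not 0.0 <= w1 <= 1.0:
        return None  # would need w_1 < 0 or w_2 = 1 - w_1 < 0
    estimates = [w1 * a + (1.0 - w1) * b for a, b in zip(p1[1:], p2[1:])]
    return w1, estimates


# Made-up toy values: 3 structures.
w1, estimates = two_pair_estimate([0.2, 0.55, 0.25], [0.5, 0.3, 0.2], [0.1, 0.6, 0.3])
print(round(w1, 3), [round(x, 3) for x in estimates])  # 0.25 [0.525, 0.275]
```

Here structure 1 is matched exactly, while structures 2 and 3 are only approximated (0.525 versus 0.55, and 0.275 versus 0.25).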
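We then select the pairs $(S_1, M_1)$ and $(S_2, M_2)$ by minimizing the error

$$
E = \sum_{l=2}^{n} \Big[ T_l - w_1 P_l(S_1, M_1) - (1 - w_1) \, P_l(S_2, M_2) \Big]^2 .
$$

This procedure yields the optimized parameters $(S_i, M_i, w_i)$, $i = 1, 2$, for a target distribution $T$. The convergence of the procedure depends on the number of mixing matrices and starting sequences, that is, on how well they cover the sequence/structure space.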
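The pair selection for $k = 2$ can be sketched as an exhaustive loop over all combinations of two candidate pairs: for each combination, $w_1$ is fixed from structure 1, infeasible combinations are skipped, and the error $E$ over structures $2, \ldots, n$ is minimized. This is a minimal illustration assuming a hypothetical `reference` dictionary from (starting sequence, mixing matrix) identifiers to structure fractions; it is not the web server's implementation.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # hypothetical (starting sequence id, mixing matrix id)


def best_two_pair_design(
    target: List[float],
    reference: Dict[Pair, List[float]],
) -> Optional[Tuple[float, Pair, Pair, float]]:
    """Return (E, (S_1, M_1), (S_2, M_2), w_1) with the smallest error E, or None."""
    best = None
    # Unordered combinations suffice: swapping the two pairs only swaps w_1 and w_2.
    for pair1, pair2 in combinations(sorted(reference), 2):
        p1, p2 = reference[pair1], reference[pair2]
        denom = p1[0] - p2[0]
        if denom == 0:
            continue  # structure 1 cannot fix w_1 for this combination
        w1 = (target[0] - p2[0]) / denom
        if not 0.0 <= w1 <= 1.0:
            continue  # violates the constraint w_i >= 0
        # E = sum over structures 2..n of (T_l - estimated fraction)^2
        err = sum(
            (t - (w1 * a + (1.0 - w1) * b)) ** 2
            for t, a, b in zip(target[1:], p1[1:], p2[1:])
        )
        if best is None or err < best[0]:
            best = (err, pair1, pair2, w1)
    return best


# Made-up toy values (not the web server's reference data): 3 structures.
reference = {
    ("S1", "M1"): [0.5, 0.3, 0.2],
    ("S1", "M2"): [0.3, 0.3, 0.4],
    ("S2", "M2"): [0.1, 0.6, 0.3],
}
target = [0.2, 0.525, 0.275]
err, pair1, pair2, w1 = best_two_pair_design(target, reference)
print(pair1, pair2, round(w1, 3))  # ('S1', 'M1') ('S2', 'M2') 0.25
```

In this toy case the combination of ("S1", "M1") with ("S1", "M2") is skipped because it would require $w_1 = -0.5$, and ("S1", "M2") with ("S2", "M2") is feasible but gives a larger error.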