An algorithm for structured pool designs

Our pool design algorithm draws on analyses of sequence and structure spaces to allow the design of specific structures, including novel RNA-like motifs identified by graph theory analysis. The algorithm below assumes that reference data are available relating mixing matrices and starting sequences to the motif distributions of the resulting pools. Knowing the structural distributions of various regions of sequence space, we optimize the choice of starting sequences, mixing matrices, and associated weights to approximate the target structured pool.

Our pool design algorithm involves the following steps:

(i) Specify a target distribution (T) of topologies/shapes.

(ii) Define candidates for starting sequences and mixing matrices that aim to cover the sequence space. In this web server, we mainly use 6 starting sequences and 22 constructed mixing matrices to cover the sequence space.

(iii) Compute motif distributions corresponding to all starting sequence / mixing matrix pairs. This step analyzes pool structural diversity.

(iv) Choose the number of mixing matrices to approximate the designed pool.

(v) Find an optimal combination of starting sequences (S_i), mixing matrices (M_i), and associated weights to approximate the target RNA motif distribution. The mathematical procedures for this step are detailed below.
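
To make steps (i) through (v) concrete, the following is a minimal sketch, not the procedure used by the web server. It assumes that the reference data are available as a dictionary mapping each (starting sequence, mixing matrix) pair to a motif distribution, that the fit is measured by squared error, and that weights are searched on a coarse grid of nonnegative values summing to 1. The names `design_pool`, `simplex_grid`, and `reference`, and all numbers in the toy data, are hypothetical.

```python
from itertools import combinations, product


def simplex_grid(k, steps):
    """Yield length-k weight tuples in multiples of 1/steps that sum to 1."""
    for head in product(range(steps + 1), repeat=k - 1):
        rest = steps - sum(head)
        if rest >= 0:
            yield tuple(c / steps for c in head) + (rest / steps,)


def mixture(dists, weights):
    """Weighted average of several motif distributions."""
    n = len(dists[0])
    return [sum(w * d[j] for w, d in zip(weights, dists)) for j in range(n)]


def squared_error(p, q):
    """Assumed objective: sum of squared differences between distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def design_pool(target, reference, k, steps=20):
    """Exhaustively search k pairs and grid weights that best match target.

    reference maps (starting_sequence, mixing_matrix) -> motif distribution.
    Returns (error, chosen_pairs, weights).
    """
    best = None
    for pairs in combinations(sorted(reference), k):
        dists = [reference[p] for p in pairs]
        for weights in simplex_grid(k, steps):
            err = squared_error(mixture(dists, weights), target)
            if best is None or err < best[0]:
                best = (err, pairs, weights)
    return best


# Toy reference data (hypothetical): three candidate pairs, three motif classes.
reference = {
    ("S1", "M1"): [0.8, 0.2, 0.0],
    ("S1", "M2"): [0.1, 0.6, 0.3],
    ("S2", "M1"): [0.0, 0.1, 0.9],
}
target = [0.4, 0.15, 0.45]  # step (i): target distribution T

err, pairs, weights = design_pool(target, reference, k=2)  # steps (iv)-(v)
print(pairs, weights, err)
```

The exhaustive grid search is chosen here only for readability; its cost grows quickly with the number of candidate pairs and with k, so a constrained least-squares solver would be a more practical choice for larger problems.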

Optimization Procedures for Step (v)

We approximate a target structural distribution by optimizing a set of starting sequence/mixing matrix pairs based on pool structural frequency data. Generally, we consider a designed pool composed of k subpools, each generated with a starting sequence/mixing matrix pair and associated with a weight