©1996-2019 All Rights Reserved. Online Journal of Bioinformatics.
You may not store these pages in any form
except for your own personal use. All other usage or distribution is illegal
under international copyright treaties. Permission to use any of these pages in
any other way besides the aforementioned must
be gained in writing from the publisher. This article is exclusively
copyrighted in its entirety to OJB publications. This article may be copied
once but may not be reproduced or re-transmitted
without the express permission of the editors. This journal satisfies the
refereeing requirements (DEST) for the Higher Education Research Data Collection
(Australia).
OJB™
Online Journal of Bioinformatics ©
Volume 8 (1):30-40, 2007.
PIDA: A new algorithm for pattern identification
Putonti C¹,²,
¹Department of Computer Science, ²Department of Biology and Biochemistry, and ³Department of Chemistry,
ABSTRACT
Putonti C, Pettitt BM, Reid JG, Fofanov Y. PIDA: A new algorithm for
pattern identification. Onl J Bioinform, 8 (1):30-40, 2007.
Algorithms for motif
identification in sequence space have predominantly focused on recognizing
patterns of a fixed length containing regions of perfect conservation with
possible regions of unconstrained sequence. Such motifs can be found in
everything from proteins with distinct active sites, to non-coding RNAs with
specific structural elements that are necessary to maintain
functionality. If an insertion or deletion occurs
within an unconstrained portion of the pattern, the pattern may still
retain its functionality. In such a case the length of the pattern is
variable, and the pattern may be overlooked by existing motif detection
methods. The Pattern Island Detection Algorithm (PIDA) presented here has
been developed to recognize patterns whose occurrences vary in length,
within sequences over an alphabet of any size. PIDA first identifies all
regions of perfect conservation longer than a user-specified
threshold and then builds those conservation “islands” into fixed-length
patterns. Next, the algorithm modifies these fixed-length patterns by
identifying additional (and different) islands that can then be incorporated
into each pattern through insertions/deletions within the “water” separating
the islands. To provide some benchmarks for this analysis, PIDA was used
to search for patterns within randomly generated sequences as well as sequences
known to contain conserved patterns. For each of the patterns found, the
statistical significance is calculated based upon the pattern’s likelihood to
appear by chance, thus providing a means to determine those patterns which are
likely to have a functional role. The PIDA approach to motif finding is
designed to perform best when searching for patterns of variable length,
although it can also identify patterns of a fixed length. PIDA has
been designed to be as generally applicable as possible since there are a
variety of sequence problems of this type, from transcription factor binding
sites in DNA, to structural motifs in non-coding RNA, to high-contact-order
domains in certain proteins. The algorithm was implemented in C++ and is freely
available upon request from the authors.
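
The island-and-water idea described in the abstract can be illustrated with a small sketch. This is not the authors' C++ implementation, and every name in it (`find_islands`, `water_gaps`, `classify`) is hypothetical. It assumes, for simplicity, that an island is an exact substring of a fixed length `k` that occurs in every input sequence, whereas the abstract describes islands of any length above a threshold. The sketch then measures the "water" between two islands in each sequence: a gap length shared by all sequences suggests a fixed-length pattern, while differing gap lengths suggest an insertion or deletion in the water.

```python
from collections import defaultdict


def find_islands(sequences, k):
    """Map each length-k substring present in every sequence to its start
    positions: {island: [[starts in seq 0], [starts in seq 1], ...]}.
    Assumption: an island must occur in all sequences."""
    positions = defaultdict(lambda: [[] for _ in sequences])
    for i, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            positions[seq[start:start + k]][i].append(start)
    # Keep only substrings seen at least once in each sequence.
    return {isl: pos for isl, pos in positions.items() if all(pos)}


def water_gaps(islands, left, right):
    """For each sequence, list the distinct gap lengths between an occurrence
    of `left` and a later, non-overlapping occurrence of `right`."""
    k = len(left)
    gaps = []
    for left_starts, right_starts in zip(islands[left], islands[right]):
        found = {r - (l + k) for l in left_starts for r in right_starts
                 if r >= l + k}
        gaps.append(sorted(found))
    return gaps


def classify(gaps):
    """Return 'fixed' if one gap length is common to all sequences,
    'variable' if every sequence has a gap but none is common, else None."""
    if not all(gaps):
        return None
    common = set(gaps[0]).intersection(*gaps[1:])
    return "fixed" if common else "variable"


# Toy DNA sequences (made up for illustration, not data from the article).
seqs = ["TTACGTGGGCATAA", "CACGTTCAGCATC", "ACGTAGCATGG"]
islands = find_islands(seqs, 4)
gaps = water_gaps(islands, "ACGT", "GCAT")
print(sorted(islands), gaps, classify(gaps))
# prints: ['ACGT', 'GCAT'] [[2], [3], [1]] variable
```

In this toy input the two islands are separated by 2, 3, and 1 characters of water, so a search restricted to fixed-length patterns would miss the pairing. The sketch makes no attempt at the significance calculation mentioned in the abstract.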
KEY WORDS: pattern discovery, motif conservation, variable length patterns