Online Journal of Bioinformatics


Volume 8 (2):139-153, 2007.

Shape-to-String Mapping: A Novel Approach To Clustering Time-Index Biomics Data


Antoine W (1), Miernyk JA (1,2,3)


(1) Department of Biochemistry; (2) USDA, Agricultural Research Service, Plant Genetics Research Unit; and (3) Interdisciplinary Plant Group, University of Missouri, Columbia, USA




Antoine W, Miernyk JA, Shape-to-String Mapping: A Novel Approach To Clustering Time-Index Biomics Data, Onl J Bioinform., 8 (2):139-153, 2007.

Herein we describe a qualitative approach for clustering time-index biomics data. The intensity ratios between adjacent time points are transformed into angles. A code is then used to map the numerical time-index data to a qualitative representation that captures the features defining the shape of the expression pattern as a function of time. The problem of clustering time-index biomics data is thereby either solved directly or reduced to a problem similar to the well-studied task of clustering protein sequence data. For datasets with few time points, the words derived from the transformation are adequate to define clusters. For larger datasets, dissimilarities between the newly defined objects can be estimated, and the resulting distance matrix can be used for further clustering. Results from transcript profiling of developing soybean embryos are used to illustrate the utility of the method. Comparative mapping of the intensity ratios and the angles by multidimensional scaling and Procrustes analysis revealed otherwise cryptic information within the data. To compare the effectiveness of the code in clustering the data, Euclidean distance matrices were calculated from the words and the corresponding gene list using the PHYLogeny Inference Package (PHYLIP) algorithms and the Point Accepted Mutation (PAM) score matrix.
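The transformation summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how angles are computed or which code is used, so the arctangent of the log2 ratio (with a unit time step), the bin edges, the five-letter alphabet, and all function names below are assumptions chosen only for illustration. The sketch covers the simple case in which genes sharing the same word are placed in the same cluster.

```python
import math
from collections import defaultdict

# Hypothetical code: bin edges (degrees) and letters are illustrative
# assumptions, not the code defined in the paper.
BIN_EDGES = [-60.0, -20.0, 20.0, 60.0]
LETTERS = "ABCDE"  # A: steep decrease, C: roughly flat, E: steep increase


def ratios_to_angles(intensities):
    """Convert adjacent-time-point intensity ratios to angles in degrees.

    Assumption: angle = atan(log2(ratio)) with a unit time step.
    """
    angles = []
    for prev, curr in zip(intensities, intensities[1:]):
        log_ratio = math.log2(curr / prev)
        angles.append(math.degrees(math.atan(log_ratio)))
    return angles


def angles_to_word(angles):
    """Map each angle to one letter according to the bin it falls in."""
    word = []
    for angle in angles:
        # Count how many bin edges the angle meets or exceeds.
        index = sum(angle >= edge for edge in BIN_EDGES)
        word.append(LETTERS[index])
    return "".join(word)


def cluster_by_word(profiles):
    """Group gene identifiers whose time profiles yield the same word."""
    clusters = defaultdict(list)
    for gene, intensities in profiles.items():
        word = angles_to_word(ratios_to_angles(intensities))
        clusters[word].append(gene)
    return dict(clusters)
```

Under these assumptions, two profiles that double at every step (for example 1, 2, 4 and 2, 4, 8) receive the same word and fall into one cluster regardless of their absolute intensities, which is the sense in which the word encodes shape rather than magnitude.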


Key words:  String Map, Cluster, Biomics