25 August 2003 DNA sequence similarity search through content-based retrieval technique
Deoxyribonucleic acid (DNA) sequences are difficult to compare for similarity because of their length and complexity. The challenge lies in applying digital signal processing (DSP) to highly relevant problems in DNA sequence analysis. Here, we map a one-dimensional (1D) DNA sequence onto a two-dimensional (2D) pattern by using the Peano scan algorithm. Four complex values are assigned to the characters “A”, “C”, “T”, and “G”, respectively. The Fourier transform is then applied to obtain the far-field amplitude distribution of the 2D pattern. In this way, a 1D DNA sequence becomes a 2D image pattern. Features are extracted from the 2D image pattern with Principal Component Analysis (PCA), and a DNA sequence feature database can thus be established. Unfortunately, comparing features may take a long time when the database is large, since the features are often multi-dimensional. This problem is solved by building an indexing structure that acts as a filter, discarding non-relevant items and selecting a subset of candidate DNA sequences. Clustering algorithms organize the multi-dimensional feature data into the indexing structure for effective retrieval. Accordingly, the query sequence need only be compared against the candidates rather than against all sequences in the database. In effect, our algorithm provides a pre-processing method that accelerates the DNA sequence search process. Finally, experimental results demonstrate the efficiency of the proposed algorithm for DNA sequence similarity retrieval.
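The first stages of the pipeline described in the abstract (complex-valued encoding, space-filling scan, Fourier amplitude) can be illustrated with a minimal Python sketch. Several details are assumptions rather than the authors' method: the abstract does not state the four complex values, so `BASE_VALUES` is a hypothetical choice; the abstract does not specify the Peano scan variant, so a Hilbert curve, a closely related space-filling scan, is substituted here; and the function names are illustrative. The PCA and indexing stages are not shown.

```python
import cmath
import math

# Hypothetical mapping: the abstract says four complex values are used
# but does not give them. These unit-magnitude values are an assumption.
BASE_VALUES = {"A": 1 + 0j, "C": -1 + 0j, "T": 1j, "G": -1j}


def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. A Hilbert curve stands in for the
    Peano scan named in the abstract (assumption).
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive cells stay adjacent.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_grid(seq):
    """Place a DNA string on the smallest power-of-two square grid.

    Cells beyond the end of the sequence are left as zero (padding).
    """
    n = 1
    while n * n < len(seq):
        n *= 2
    grid = [[0j] * n for _ in range(n)]
    for d, base in enumerate(seq.upper()):
        x, y = hilbert_d2xy(n, d)
        grid[y][x] = BASE_VALUES[base]
    return grid


def dft2_amplitude(grid):
    """Return the amplitude of the 2D discrete Fourier transform.

    A naive O(n^4) sum, adequate only for small illustrative grids.
    """
    n = len(grid)
    amp = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    phase = -2 * math.pi * (u * y + v * x) / n
                    total += grid[y][x] * cmath.exp(1j * phase)
            amp[u][v] = abs(total)
    return amp


if __name__ == "__main__":
    pattern = dft2_amplitude(sequence_to_grid("ACGTACGTTGCA"))
    for row in pattern:
        print(" ".join(f"{value:6.2f}" for value in row))
```

A space-filling scan is used instead of a simple row-by-row layout because bases that are adjacent in the sequence remain adjacent in the 2D pattern. For realistic sequence lengths, a fast Fourier transform routine would replace the naive sum.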
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chia Hung Yeh, Po Yi Sung, Hsuan T. Chang, Chung Jung Kuo, "DNA sequence similarity search through content-based retrieval technique", Proc. SPIE 5096, Signal Processing, Sensor Fusion, and Target Recognition XII, (25 August 2003); doi: 10.1117/12.486714; https://doi.org/10.1117/12.486714
