Master of Science

E.coli Promoter Recognition Using Neural Networks

Completed in 2006

Abstract:

The size of biological sequence databases is increasing exponentially, making the use of sequence alignment algorithms to compare the similarity of newly discovered sequences to known sequences increasingly impractical. Alignment is impractical both in terms of computational time and because similar sequences do not necessarily reflect related functions or properties. Hence there is a need for a classification system that can be trained on databases of sequences to produce a model of the sequence data, and so allow rapid classification of the type and features of uncharacterised sequences. A problem that has received much attention over the years is how to identify promoters in Escherichia coli (E.coli) DNA sequences, where a promoter is a regulatory region of DNA that determines to what extent one or more genes are expressed.

This thesis presents a methodology for feature extraction from DNA sequences, feature selection and ultimately the application of different classifiers to the problem of E.coli promoter recognition. A comprehensive comparison study was carried out, in which multi-layer feed-forward neural networks, Support Vector Machines and Extreme Learning Machines were employed. Unlike most other approaches to classifying E.coli DNA, the system presented does not rely on specific background knowledge of the binding sites of E.coli promoter regions, and so does not need to perform binding site alignment procedures, making the system computationally more efficient. The system performance was found to be comparable to the results of previous researchers, even though their systems require specific knowledge of the binding sites to which the sequences are aligned. It was also found that the type of non-target training examples strongly influenced the system's ability to recognize promoters.
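The abstract does not say which features are extracted or how they are selected, so the following Python sketch is only an illustration of the general idea of alignment-free feature extraction followed by feature selection. It assumes normalised k-mer frequencies as features and a simple class-mean-difference ranking as the selection criterion; the function names `kmer_features` and `select_features`, and both of these choices, are hypothetical and are not taken from the thesis.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_features(sequence, k=2):
    """Return normalised k-mer frequencies for a DNA sequence.

    Assumption for illustration: k-mer frequencies stand in for the
    (unspecified) features used in the thesis. No binding site
    alignment is needed to compute them.
    """
    sequence = sequence.upper()
    # Fixed feature order: all 4**k k-mers in lexicographic order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def select_features(vectors, labels, top_n):
    """Return the indices of the top_n features, in ascending order.

    Hypothetical criterion: features are ranked by the absolute
    difference between their mean value over promoter examples
    (label 1) and over non-promoter examples (label 0).
    """
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]
    n_features = len(vectors[0])
    scores = []
    for j in range(n_features):
        mean_pos = sum(v[j] for v in pos) / len(pos)
        mean_neg = sum(v[j] for v in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_n])


if __name__ == "__main__":
    # Toy sequences invented for this example; they are not RegulonDB data.
    sequences = ["TTGACATATAAT", "TTGACTTATATT", "GCGCGGCCGCGC", "CCGGCGCGGCCG"]
    labels = [1, 1, 0, 0]
    vectors = [kmer_features(s, k=2) for s in sequences]
    kept = select_features(vectors, labels, top_n=4)
    reduced = [[v[j] for j in kept] for v in vectors]
    print(kept, len(reduced[0]))
```

The reduced vectors could then be passed to any of the classifiers named above. Which sequences are used as non-promoter examples is itself a design decision, since the abstract reports that the type of non-target training examples strongly influenced recognition performance.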

Acknowledgement:

This thesis would not have been possible without the professional guidance, mental encouragement and financial support from my supervisor, Dr. Dianhui Wang, in the Department of Computer Science and Computer Engineering, La Trobe University. When I started my Research Masters course two years ago, I had no understanding of the word "academic" and did not know what I could do for bioinformatics using neural networks. He has demonstrated great patience and dedication in teaching me and his other students. He has taught me skills needed to produce insightful and practical research that contributes to knowledge and understanding for both academia and the world at large. His enthusiasm for academic pursuit and his professional working attitude will greatly influence my further career development. I highly appreciate his encouragement, professional supervision and personal help for my Masters course.

This thesis is a result of the physical and mental training and support received from my great Chinese, Buddhist, Taoist, Master Sifu Chow Yuk Nen. I deeply appreciate and thank Master Sifu Chow Yuk Nen for his great life training, guidance, energy and wisdom that he has generously given to me over the past six and a half years. I would like to thank my parents for their financial support and encouragement of my studies.

I am grateful to Ms Jennifer Valente, also a student of Master Sifu Chow Yuk Nen, who spent a lot of her free time explaining the complexities of genetics and biology to me. Without her help, I would have been entangled in DNA. I would like to thank the Department of Computer Science and Computer Engineering for their support and for providing the resources with which I conducted this work. Finally, I would also like to thank Heladia Salgado from RegulonDB for supplying the E.coli promoter and gene sequence data.

Publications

Conilione, P.C. and Wang, D. "E-coli Promoter Recognition Using Neural Networks with Feature Selection", International Conference on Intelligent Computing, 2005, Vol. 2, pp 61-70.

Conilione, P.C. and Wang, D. "Neural Classification of E.coli Promoters Using Selected DNA Profiles", The Fourth IEEE International Workshop on Soft Computing and Transdisciplinary Science and Technology, Springer, 2005, pp 51-60.

Conilione, P.C. and Wang, D. "Effect of Non-Target Examples on E.coli Promoters Recognition Using Neural Networks", Proceedings of International Joint Conference on Neural Networks (IJCNN) 2005 IEEE, 2005, pp 310-315.

Conilione, P.C. and Wang, D. "A Comparative Study on Feature Selection for E.coli Promoter Recognition", International Journal of Information Technology, 2005, 11, pp 54-66.