The Royal Statistical Society
Introduction to Bioinformatics

A Short Course on Statistical Bioinformatics
 
presented by Wally Gilks (Centre for Statistical Bioinformatics, University of Leeds)
 
An organism's DNA, including its genes, holds almost all the information required for its development and function. Human understanding of this information is at an early stage, but is accumulating rapidly due to new high-throughput forms of experimentation. This has led to large and rapidly expanding databases of DNA sequence, and related databases of the structure and function of biomolecules such as proteins. Bioinformatics is concerned with the development of these databases, and tools for deciphering and exploiting the information they contain.

The links from bioinformatic data to underlying biology are noisy and uncertain. For example, knowing an organism's DNA sequence doesn't clearly identify where the genes are located within it; knowing the sequence of a gene doesn't tell us the 3-dimensional structure of the protein encoded by it; and knowing a protein's structure may give us only weak clues as to its function. In each case, however, bioinformatic methods have been developed to exploit such information.
Statistics is a science for extracting meaning from noisy, uncertain information. It would seem natural, therefore, to approach bioinformatic questions from a statistical perspective. Indeed, there is now a considerable statistical literature on gene-expression (microarray) data analysis. However, many other areas of bioinformatics have so far had little exposure to statistical thinking. This is partly a consequence of the shortage of statisticians with an adequate background in molecular and cell biology.

This short course will provide a brief introduction to some biological fundamentals, some bioinformatic questions, and some statistical solutions. These will be presented in four lectures:

- DNA sequence analysis: discovering what lies lurking in the genome
- Biomolecular structure: predicting the shape of proteins and RNA molecules
- Gene expression: finding out how genes control and are controlled
- Phylogenomics: uncovering evolutionary relationships between organisms or genes

The aim will be to convey some appreciation of the excitement and importance of this new scientific frontier, and the challenges and opportunities it presents to both methodological and applied statisticians.