Ideas from Biology and Biological Sciences Department
Local contact: Dr Alwyn Barry
These project ideas have been provided by staff members from the Department of Biology and Biochemistry. A student attempting one of these projects will be allocated a project supervisor from within the Department of Computer Science, but will work to the requirements of the sponsoring staff member in the Department of Biology and Biochemistry. Tutorial support for the biological background will be provided by the Biology Department tutor.
- 3-D Visualisation of Artemus Output
- Sponsor: Dr Alex Jeffries
- Last year a Computer Science project student provided a new piece of software which enabled genetic information relating to a number of related species of bacteria to be compared for commonalities. A basic visual comparison between the species was provided, but no real attempt was made to investigate or provide visualisation of the data.
- This new project will build upon the data made available by the previous work. It will investigate visualisation techniques that identify, and permit investigation of, the links between genes as they are expressed in the proteins that they encode. The visualisation will use 3D graphics so that the complexity of the relationships can be represented more readily, and should allow the viewer to "dig down" into the data to look in more detail at areas of interest. In addition, the software should provide a framework into which new visualisations of the data can be "plugged in", and should allow several visualisations to be displayed simultaneously.
- The amount of data involved and the computing power required for the visualisations may be large. The project should therefore also investigate GRID technologies and identify ways in which the GRID can be used to speed up the production of visualisations, provide access to more data and/or speed up dissemination of results.
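The "plug-in" framework described above could take many forms. As a minimal sketch only, the following Python fragment shows one possible shape: a registry to which visualisation functions are added by name, and a helper that runs several of them over the same data set. All names here (`register`, `render_all`, the two sample views and the layout of `data`) are hypothetical and are not taken from the existing software; the views return text rather than 3D graphics to keep the example self-contained.

```python
from typing import Callable, Dict, Iterable, Optional

# A visualisation is modelled as a function from the shared data set to
# some output. Text output stands in for real 3D rendering in this sketch.
Visualisation = Callable[[dict], str]

# Hypothetical registry mapping a visualisation name to its function.
_REGISTRY: Dict[str, Visualisation] = {}


def register(name: str) -> Callable[[Visualisation], Visualisation]:
    """Decorator that plugs a new visualisation into the registry."""
    def decorator(func: Visualisation) -> Visualisation:
        if name in _REGISTRY:
            raise ValueError(f"visualisation {name!r} is already registered")
        _REGISTRY[name] = func
        return func
    return decorator


@register("gene_count")
def gene_count_view(data: dict) -> str:
    # Summary view: how many genes are in the data set.
    return f"{len(data['genes'])} genes"


@register("link_list")
def link_list_view(data: dict) -> str:
    # Detail view: each gene-to-protein link, one after another.
    return "; ".join(f"{gene}->{protein}" for gene, protein in data["links"])


def render_all(data: dict, names: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Run the selected visualisations (default: all) over the same data."""
    selected = list(names) if names is not None else sorted(_REGISTRY)
    return {name: _REGISTRY[name](data) for name in selected}
```

A new view is added by writing one more decorated function; nothing in `render_all` needs to change, which is the property the project description asks for.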
- Protein Structure Threading
- Sponsor: Dr Alex Jeffries
- It is possible to identify the 3D structure of a protein by means of X-ray crystallography. Given a known genetic sequence with a known 3D structure, it is also possible to take a second sequence with an unknown structure and overlay (or "thread") the sequences to gain information about their similarity. Using standard Z statistics we can then identify the degree of "fit" of the sequences, and if the fit is good one can infer the 3D structure of the unknown protein without having to go to the expense of crystallography. One might expect, therefore, that similarly shaped proteins have similar sequences. Interestingly, we sometimes find two bacteria with a similar structure, which might therefore be inferred to have similar ancestry, whose genetic sequences are actually very different.
- Dr Jeffries has two proteins which need comparison. Unfortunately, since there are only two sequences available rather than two sets of sequences, it is not possible to compute Z scores. We therefore need to use another "threading" technique which will allow less similar sequences to be used as part of the comparison. A program to do this job has already been written, but because of the amount of data, and possibly because of its author's limited programming experience, it is very slow. It is also insufficiently general and does not provide all the required functionality. This program needs to be redesigned, rewritten and expanded, taking particular account of the fact that the computation is very processor intensive; so much so that the program may need to be implemented as a parallel program run in parts on a number of machines.
- This project is suitable for any student with good programming skills who would be interested in learning a bit about parallel processing, possibly using the PVM or MPI libraries.
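Two of the ideas above can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not the existing program or the technique the project must use. `z_score` applies the usual definition, (score minus mean) divided by standard deviation, to a score measured against a set of background scores; it refuses to run with fewer than two background values, which mirrors the reason Z scores cannot be computed from a single pair of sequences. `score_pairs` shows one way to keep the scoring step independent of how the work is distributed. `toy_fit` is a hypothetical placeholder that simply counts matching positions; a real threading score would evaluate a sequence against the known 3D structure.

```python
from statistics import mean, stdev
from typing import Callable, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]


def toy_fit(pair: Pair) -> float:
    """Placeholder score: fraction of overlaid positions that match.

    This is NOT a threading score; it only stands in for an expensive,
    independent computation on one pair of sequences.
    """
    query, template = pair
    length = min(len(query), len(template))
    if length == 0:
        return 0.0
    matches = sum(1 for a, b in zip(query, template) if a == b)
    return matches / length


def score_pairs(
    pairs: Iterable[Pair],
    score_fn: Callable[[Pair], float] = toy_fit,
    mapper: Callable = map,
) -> List[float]:
    """Score every pair. Each pair is independent, so `mapper` can be
    replaced by a parallel map without changing the scoring code."""
    return list(mapper(score_fn, pairs))


def z_score(score: float, background: Sequence[float]) -> float:
    """Standard Z score of `score` against a set of background scores."""
    if len(background) < 2:
        raise ValueError("need at least two background scores")
    spread = stdev(background)  # sample standard deviation
    if spread == 0:
        raise ValueError("background scores have zero spread")
    return (score - mean(background)) / spread
```

With the default `mapper=map` the scoring runs serially. Passing the `map` method of a `multiprocessing.Pool` or a `concurrent.futures.ProcessPoolExecutor` (created under an `if __name__ == "__main__":` guard) would spread the pairs over several processes on one machine; running across several machines, as the project anticipates, would need a library such as PVM or MPI instead.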
- Species Data Set Provision
- Sponsor: Dr Szekeley
- Bioinformatics includes the identification and analysis of biological data to infer new information about the origins of, and connections between, species. Often the information being processed is genetic data that has been discovered as part of on-going DNA decoding. Many research centres are discovering new information on a regular basis, but for new discoveries about the connections within this genetic information to be made, the data has to be collected and maintained in a location that is accessible to many researchers.
- This project will seek to identify the requirements for genetic data upload, cleansing, storage and access to help this research community. In particular, it needs to provide a storage solution that can integrate data that may physically reside on different servers in many different countries, so that it appears to be stored within a single accessible system. It needs to be aware of the ownership and access rights connected with data, and provide appropriate security to make the data accessible only in accordance with security and access policies which may change regularly. It needs to permit constant updating of the data in a system which is always accessible. The volumes of data it provides access to may grow very large, and so scalability and efficiency constraints are an issue. Finally, it may need to provide ways of answering common research queries which are seeking to find connections between behaviour, illnesses, or evolutionary paths between species.
- This is a vital piece of work for the bioinformatics research community, and successful completion of this project would make a significant research contribution.
- This project is most suitable for candidates with client-server database development experience and with a good previous academic record.
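As a rough illustration of two of the requirements above (data that resides on different servers appearing as a single system, and access governed by ownership and a changeable policy), the following Python sketch models each server as an in-memory `Source` and filters every query through an `AccessPolicy`. All class names, fields and the policy model are assumptions made for this example; a real solution would sit on networked databases and would also need to address updating, scalability and availability.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Record:
    accession: str   # hypothetical identifier for the sequence
    species: str
    sequence: str
    owner: str       # research group that owns the data


@dataclass
class Source:
    """Stands in for one remote server holding part of the data."""
    name: str
    records: List[Record]


@dataclass
class AccessPolicy:
    """Maps each owner to the users allowed to read that owner's data.

    The mapping is ordinary mutable state, so it can be changed at any
    time without touching the stored records.
    """
    readers: Dict[str, Set[str]] = field(default_factory=dict)

    def can_read(self, user: str, record: Record) -> bool:
        if user == record.owner:
            return True
        return user in self.readers.get(record.owner, set())


def federated_query(
    sources: Iterable[Source], policy: AccessPolicy, user: str, species: str
) -> List[Record]:
    """Return matching records from every source the user may read,
    as if they were stored in a single system."""
    results: List[Record] = []
    for source in sources:
        for record in source.records:
            if record.species == species and policy.can_read(user, record):
                results.append(record)
    return results
```

Because the policy is consulted at query time rather than stored with each record, a change to `readers` takes effect on the next query, which is one simple way to cope with access policies that change regularly.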