Text only

CM30076 / CM30082
Individual Project

Project Ideas

Dr John Collomosse


These projects all lie on a continuum between Computer Graphics and Computer Vision, and most are in line with my own research interests. All require a reasonable degree of competence in mathematics. I am also happy to supervise your own original ideas in these areas. For more details please call in to see me - I am also happy to answer questions via email.

Photo collection categorisation and search
Digital cameras have become common consumer devices, contributing to an explosion in amateur digital imagery. Commercial software to manage and index this imagery (e.g. home photo collections) is becoming increasingly popular, but remains quite rudimentary - photos are manually tagged with keywords, which can then be searched on (e.g. Flickr). There are two possible projects here:
  1. Write software that will automatically tag photos based on their visual content. For example, a facial recognition system that tags photographs with the names of the people present in them. The names can then be searched on by entering text. Other alternatives/augmentations might be a system capable of recognising day/night or indoor/outdoor scenes.
  2. Write software that will allow a user to sketch a desired image (either as line art, or using colour blobs), and then return the images in a collection most closely matching that sketch. This is an example of "Content Based Image Retrieval" (CBIR), i.e. searching on the visual content of an image, rather than on keywords annotated to it. This is a more challenging task than (1) and is the topic of a research grant starting this summer. Therefore in developing a successful system, you may have the opportunity to contribute to a live research project.
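To give a flavour of project (1): even a crude day/night classifier can be built from global image statistics. The sketch below (in Python, standing in for the Matlab prototyping suggested later; the 0.35 threshold and luma weights are illustrative assumptions, not part of any specified system) classifies a photo by its mean brightness. A real project would use richer features and a trained classifier.

```python
# Toy sketch: classify a photo as "day" or "night" from mean brightness.
# The threshold (0.35) is an arbitrary assumption for this example; a
# real system would learn it, and use richer features (colour, edges).

def mean_brightness(pixels):
    """pixels: list of (r, g, b) tuples, channels in [0, 1].
    Uses the standard Rec. 601 luma weights."""
    total = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)
    return total / len(pixels)

def classify_day_night(pixels, threshold=0.35):
    return "day" if mean_brightness(pixels) >= threshold else "night"

# Toy data: a uniformly bright scene vs. a uniformly dark one.
bright = [(0.9, 0.9, 0.8)] * 100
dark = [(0.05, 0.05, 0.1)] * 100
print(classify_day_night(bright))  # day
print(classify_day_night(dark))    # night
```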
Pre-requisite knowledge: Good mathematical background. Strong ability in graphics and programming (Matlab).
Indicative reading: Google for CBIR - many references online.
Augmented Reality
Augmented reality (AR) is a form of virtual reality in which text/graphics are superimposed over a live video feed, output through a pair of eye-goggles, giving the impression that virtual objects exist within the user's real world environment. Virtual objects move in accordance with the user's own movements, so adding to the sense of realism. The Department possesses a head mounted display (a pair of eye-goggles) with a built-in camera, enabling students to experiment with innovative project ideas in this developing area. Possibilities might include education/tutorials, navigation/interactive guides around tourist sites, etc.; however, you are encouraged to be creative and put forward novel ideas. Open source software exists to handle the Computer Vision issues of tracking and object immersion (see ARToolKit). Your implementation would interface with this library using the OpenGL graphics library, and C/C++.
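To make the registration idea concrete: a tracker such as ARToolKit reports a marker's pose as a 4x4 homogeneous transform, and placing a virtual object amounts to pushing its vertices through that matrix. The hand-rolled sketch below is in Python for clarity only (the pose values are invented); in the actual project you would hand the matrix straight to OpenGL rather than transform vertices yourself.

```python
# Sketch: apply a 4x4 marker-pose matrix (of the kind a tracker such as
# ARToolKit reports) to an object-space vertex, giving camera-space
# coordinates. In a real AR application this matrix would be loaded
# into OpenGL's model-view stack instead.

def transform(pose, v):
    """pose: 4x4 row-major matrix (list of lists); v: (x, y, z) vertex."""
    x, y, z = v
    p = [x, y, z, 1.0]
    out = [sum(pose[r][c] * p[c] for c in range(4)) for r in range(4)]
    # Homogeneous divide.
    return (out[0] / out[3], out[1] / out[3], out[2] / out[3])

# Toy pose: marker 0.5 units in front of the camera, no rotation.
pose = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, -0.5],
        [0, 0, 0, 1]]
print(transform(pose, (0.1, 0.0, 0.0)))  # (0.1, 0.0, -0.5)
```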
Pre-requisite knowledge: Strong ability in Computer Graphics. Able to program efficiently in C/C++ - code will need to run quickly for this real-time application. Experience with the OpenGL graphics library would be advantageous but a good programmer should pick this up quickly.
Indicative reading: Have a look at web resources on augmented reality for creative inspiration. Visit the ARToolKit webpage.
360 degree video camera with playback
Standard video is restricted in its field of view; you can only see so much in a frame, and the choice of what you see has already been made by the person shooting the video. This project will explore ways of recording 360 degree video using commodity cameras i.e. webcams. The idea is to stitch the output from multiple webcams together to record a larger view of the environment - up to 360 degrees. This video data would be played back later using a custom graphics application which projects the stitched recording onto the inside of a (virtual) sphere, with the user's "viewpoint" at the centre of the sphere (think of a planetarium). The user's viewpoint would be able to rotate, enabling them to look around the inside of the sphere, so enabling them to choose the part of the environment they wish to view at the time of playback (rather than at the time of shooting the video). The techniques used to stitch video would be similar to those used to stitch photographs into panoramas, as discussed in Foundations of Computer Graphics this year. The challenge will be detecting the areas of overlap in the stitched images to allow for seamless viewing. The likely language for implementation is C/C++ (with some prototyping in Matlab) but this is negotiable.
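A sketch of the playback side, under the assumption (mine, not the brief's) that the stitched video is stored as an equirectangular panorama: each view direction, given as yaw and pitch angles, maps to one pixel of that panorama. A real player would trace one such ray per screen pixel to texture the inside of the sphere; here is just the direction-to-pixel mapping in Python.

```python
import math

# Sketch: map a view direction (yaw, pitch, in radians) to pixel
# coordinates in an equirectangular panorama of size width x height.
# Convention assumed here: yaw in [-pi, pi) measured from the panorama
# centre; pitch in [-pi/2, pi/2], positive looking up.

def direction_to_pixel(yaw, pitch, width, height):
    u = (yaw + math.pi) / (2.0 * math.pi)  # 0..1 across the panorama
    v = (math.pi / 2.0 - pitch) / math.pi  # 0 at top, 1 at bottom
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return x, y

# Looking straight ahead (yaw=0, pitch=0) hits the panorama centre.
print(direction_to_pixel(0.0, 0.0, 3600, 1800))  # (1800, 900)
```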
Pre-requisite knowledge: Good mathematical background. Strong ability in graphics and programming (C/C++ or Matlab).
Indicative reading: Read up on quaternions in Alan Watt's Advanced Rendering text. For a review of image stitching techniques, read the start of David Capel's PhD thesis (Oxford University, 2001) or read Chapter 6 of my PhD thesis and follow the references.
Mobile image recognition for context discovery
The inclusion of cameras in mobile devices presents significant potential for the development of context-aware pervasive applications. In collaboration with colleagues at Bath on the CityWare project, I have developed a prototype image recognition system capable of identifying landmarks (typically, buildings and other urban structures) from photographs captured on camera phones. The image recognition process runs server-side, as a web service accessible over GPRS from a web enabled phone or PDA. Photographs of landmarks are pre-loaded into a central database, which is then queried by users submitting images from their camera phones. The image recognition algorithm is robust to variations in both illumination and point of view.
I am interested in improving the scalability of this prototype and in developing further demonstrator applications of this technology (currently we have a tourist information app). Please see me if you are interested in discussing possible projects.
Pre-requisite knowledge: Good mathematical background. Strong ability in programming. Depending on whether you choose to work on the client or server side of the system you will need good Java/C#, or C skills respectively.
Indicative reading: None set - see me to discuss.
Scripted Computer Vision Behaviour Detection
Computer Vision is often used commercially to detect and react to particular behaviour patterns. Consider a brief example. A moving object enters the car park zone. It is large, and so the system knows to classify it as a car. It stops within an image zone known to contain a parking bay. A smaller moving object breaks away from the stationary large object - a person - and exits the car park without visiting the "pay and display machine" zone. The car park attendant is automatically paged, and issues a fine. This example demonstrates how simple image processing (identification of moving blobs) can be combined with a priori specification of rules and image zones to create a reactive vision system. Many such systems exist, but are often custom built for particular applications - i.e. the rules are hard coded. The aim of this project is to develop a generalisation of such reactive systems that uses a simple scripting language of your design to specify rules and so describe the behaviour of the system.
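The image-processing core described above can be sketched in a few lines: label connected "moving" pixels in a binary motion mask (as produced by frame differencing), classify each blob by area, and test its centroid against a named rectangular zone. In this Python toy the mask, the area threshold, and the zone are all invented for illustration; in the project they would be driven by your scripting language.

```python
# Sketch: connected-component labelling of a binary motion mask,
# blob classification by area, and a point-in-zone test. Thresholds
# and zone coordinates are invented for this toy example.

def label_blobs(mask):
    """mask: 2D list of 0/1. Returns a list of blobs, each a list of
    (row, col) pixels, found by 4-connected flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, blob = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                blobs.append(blob)
    return blobs

def classify(blob, car_area=6):
    """Toy size rule: big blobs are cars, small ones are people."""
    return "car" if len(blob) >= car_area else "person"

def in_zone(blob, zone):
    """zone: (top, left, bottom, right) inclusive centroid bounds."""
    cy = sum(p[0] for p in blob) / len(blob)
    cx = sum(p[1] for p in blob) / len(blob)
    t, l, b, r = zone
    return t <= cy <= b and l <= cx <= r

mask = [[0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 1],
        [0, 0, 0, 0, 0, 0]]
car_park = (0, 0, 2, 3)  # toy zone covering the left of the frame
for blob in label_blobs(mask):
    print(classify(blob), in_zone(blob, car_park))
```

Running this prints "car True" for the 6-pixel blob inside the zone and "person False" for the stray pixel outside it; the scripting language you design would replace the hard-coded `classify` and `in_zone` calls with user-defined rules.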
Pre-requisite knowledge: Good mathematical background. A good mark in, or strong aptitude for, the Computer Graphics unit. Some background knowledge of language parsing/compiler design.
Indicative reading: There are numerous papers on vision and behaviour detection, but David Hogg's research at Leeds is a good reference point. Introductory material on image processing, e.g. "Feature Extraction and Image Processing" by Nixon/Aguado, or "Image Processing: The Fundamentals" by Petrou. Copies are in the library.