Speaker Attention for Video Conferencing using Multiple Visual Cues

Student thesis: Master's thesis and HD graduation project

  • Bjarke Andersen
  • Peter Zinck Nielsen
4th semester, Computer Science, Master's (Master's programme)
This report investigates methods for a video conferencing system in which a participant requests attention by raising his or her right hand. A system is designed and implemented, and a number of video recordings are made so that the methods can be evaluated experimentally.
To constrain the search area for hand raises, the faces in the videos are found and tracked. To find the faces, we first detect skin colours in the images using either lookup tables (LUTs) or Gaussian models. Methods for adapting to changes in illumination colour are also investigated. A list of face candidates is built, and each candidate is verified by its size, solidity, similarity to a nose-eye template, and elliptic shape. Based on the face list, existing face trackers are updated and new trackers are started. Different tracking methods are investigated, and a combination of the Mean Shift algorithm, ellipse fitting, and a Kalman filter is found to be suitable. The face trackers define the areas in which to search for hand raises. Hand raises are detected using accumulated difference pictures (ADPs). A hand raise leaves a vertical track in the ADPs and can therefore be distinguished from other skin-coloured objects passing by in the background or foreground.
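As an illustration of the ADP idea, the following is a minimal sketch, not the implementation used in the report. It assumes greyscale frames stored as nested lists, a fixed difference threshold, and a simple bounding-box aspect-ratio test for "vertical track"; the function names, the threshold of 20, and the minimum aspect ratio of 2.0 are all hypothetical choices made for this example.

```python
def accumulate_differences(frames, threshold=20):
    """Build an ADP: for each pixel, count the frames that differ
    from the first (reference) frame by more than `threshold`."""
    reference = frames[0]
    height, width = len(reference), len(reference[0])
    adp = [[0] * width for _ in range(height)]
    for frame in frames[1:]:
        for y in range(height):
            for x in range(width):
                if abs(frame[y][x] - reference[y][x]) > threshold:
                    adp[y][x] += 1
    return adp


def is_vertical_track(adp, min_aspect=2.0):
    """Assumed criterion: the bounding box of all non-zero ADP pixels
    is at least `min_aspect` times taller than it is wide."""
    rows = [y for y, row in enumerate(adp) if any(row)]
    cols = [x for x in range(len(adp[0])) if any(row[x] for row in adp)]
    if not rows or not cols:
        return False  # no motion was accumulated at all
    track_height = rows[-1] - rows[0] + 1
    track_width = cols[-1] - cols[0] + 1
    return track_height / track_width >= min_aspect


def make_frame(height, width, top, left, size=2, value=200):
    """Synthetic frame: dark background with one bright square 'hand'."""
    frame = [[0] * width for _ in range(height)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            frame[y][x] = value
    return frame


# A blob moving upwards in one column (a hand raise) ...
rising = [make_frame(10, 8, top, 3) for top in (8, 6, 4, 2, 0)]
# ... versus a blob moving sideways along one row (an object passing by).
passing = [make_frame(10, 8, 4, left) for left in (0, 2, 4, 6)]

rising_adp = accumulate_differences(rising)
passing_adp = accumulate_differences(passing)

print(is_vertical_track(rising_adp))   # True
print(is_vertical_track(passing_adp))  # False
```

In a real system the ADP would only be computed inside the search areas defined by the face trackers, and restricted to skin-coloured pixels, so that unrelated motion does not contribute to the track.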
Experiments are carried out to determine the best combination of methods and to find out how well the system handles situations such as illumination change, occlusion, and movement in the background. Finally, suggestions for future work are given, and conclusions are drawn from the investigations and results presented in the report.
Publication date: Jun. 2001
ID: 61080400