7 February 2010 - 10:14
Intelligent User Interfaces 2010 conference
I’m at IUI 2010, a mostly academic conference focused on using new techniques to make intelligent user interfaces. I’ll update this entry as the conference proceeds.
Day 1 (Sunday) – Workshops
I’m in the Eye Gaze for Intelligent Human Machine Interaction workshop; there’s a full breakdown of this session’s talks here. The talks focus on the use of eye-gaze tracking tools to let humans interact with computers in an intuitive and easy fashion.
Two talks have really caught my eye. Manuel Möller presented “The Text 2.0 Framework – Writing Web-Based Gaze-Controlled Realtime Applications Quickly and Easily” (via here). Text20.net is the background site; they’re offering a browser plug-in (Safari at present, Chrome/Firefox to come) that augments your browsing experience if you’ve got an eye tracker. They’ve added some new mark-up tags like:
- OnGazeOver – like OnMouseOver but fires if your gaze goes over the element (e.g. to make an image change or highlight)
- OnPerusal – fires if you quickly scan a piece of text
- OnRead – only fires if you start to properly read the text
They propose using a site like DBPedia to augment your browsing experience – perhaps bringing in additional text if your gaze rests on a block of text, bringing in alternative images if you look at an image or translating text that you re-read if it knows you’re a foreign-language user.
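To make the difference between the three gaze events listed above concrete, here is a minimal sketch of how a plug-in might tell them apart. This is my own illustration, not the Text 2.0 implementation: the `Fixation` type, the `classify_gaze` function and every threshold are assumptions, based only on the general idea that reading produces short hops along a line while skimming produces long jumps.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal gaze position in pixels
    y: float            # vertical gaze position in pixels
    duration_ms: float  # how long the eye rested here


def inside(fix, box):
    """True if the fixation falls within box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    return left <= fix.x <= right and top <= fix.y <= bottom


def classify_gaze(fixations, box, char_width=8.0, min_fixation_ms=80.0):
    """Return None, 'gazeover', 'perusal' or 'read' for one text element.

    All thresholds here are hypothetical values chosen for illustration.
    """
    hits = [f for f in fixations
            if inside(f, box) and f.duration_ms >= min_fixation_ms]
    if not hits:
        return None
    if len(hits) < 3:
        # Too few fixations to judge reading behaviour: just a glance.
        return "gazeover"
    # Horizontal distance between consecutive fixations, in characters.
    jumps = [abs(b.x - a.x) / char_width for a, b in zip(hits, hits[1:])]
    mean_jump = sum(jumps) / len(jumps)
    # Assumed rule: short hops look like reading, long hops like skimming.
    return "read" if mean_jump <= 12 else "perusal"
```

A page script would then call the handler that matches the returned label, in the same way a browser dispatches OnMouseOver.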
The above is only useful if you have a gaze-sensing device and these are a bit pricey (think: $10,000-$20,000). However…
Shortly before, Wen-Hung Liao presented “Robust Pupil Detection for Gaze-based User Interface” (via here), where he described a $60 device (the $60 refers to the cost of a standard 640×480 30fps webcam) that gives reasonable eye-gaze tracking on a desktop computer. Pretty much he’s describing a way to replace $20,000 worth of high-end eye-gaze tracking tools with the webcam in your laptop.
The resolution achieved is around 40×40 – pretty low but enough to support a lightly modified web browser that allows eye-gaze control. The modification is a zoom whenever the user’s gaze rests on an area – that section zooms so you can more accurately select a link.
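As a rough sketch of the zoom-on-dwell idea (my own illustration, not the code from the talk), the snippet below maps a cell of the 40×40 gaze grid to a screen rectangle and reports a dwell when the gaze has stayed in one cell for a number of consecutive frames. The function names and the 15-frame dwell (about half a second at 30fps) are assumptions.

```python
GRID = 40  # gaze resolution mentioned in the talk: roughly 40x40 cells


def cell_to_rect(col, row, screen_w, screen_h, grid=GRID):
    """Return (left, top, right, bottom) in pixels for one gaze cell."""
    cell_w = screen_w / grid
    cell_h = screen_h / grid
    return (col * cell_w, row * cell_h,
            (col + 1) * cell_w, (row + 1) * cell_h)


def dwell_target(samples, min_samples=15):
    """Return the (col, row) cell the gaze is resting on, or None.

    samples holds one (col, row) tuple per camera frame, oldest first.
    A dwell is assumed to mean the last min_samples frames all fall
    in the same cell.
    """
    if len(samples) < min_samples:
        return None
    recent = samples[-min_samples:]
    if all(s == recent[0] for s in recent):
        return recent[0]
    return None
```

On a 1280×800 screen each cell covers 32×20 pixels, which is why a zoom step is needed before a link can be picked reliably.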
Here’s a demo showing “eye typing” (see some more under VIPLpin):
There is a downside – natural light washes out too much detail (and casts shadows and reflections) so the camera needs a simple modification. By popping out the normal lens and using an IR lens the camera senses light in the infra-red range – for this algorithm the input is far cleaner. It is quite conceivable that we’ll have a second (IR style) webcam in our laptops and this second device could give us simple gaze control on our machines. This algorithm runs comfortably on a dual-core machine at 30fps (previous generation algorithms are laggy as they’re too CPU-intensive).
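To show why a cleaner IR image helps, here is a deliberately naive sketch of pupil localisation. It is not the algorithm from the talk; it simply assumes the pupil shows up as the darkest blob in a grayscale frame and takes the centroid of all pixels below a threshold. The `pupil_center` name and the threshold of 50 are my own assumptions.

```python
def pupil_center(image, threshold=50):
    """Estimate the pupil centre as the centroid of dark pixels.

    image is a list of rows, each a list of 0-255 grayscale values.
    Returns (x, y) as floats, or None if no pixel is dark enough.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value < threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)
```

With shadows and reflections in the frame, stray dark pixels would drag this centroid away from the pupil, which is the kind of noise the IR modification is meant to reduce.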
What happens if we combine this $60 device (free for me – I have a good webcam in my MacBook that could be modified…) with the Text 2.0 plug-in? I could probably navigate web pages purely using gaze when reading Wikipedia. If my gaze gets to the bottom of the screen then the page could auto-scroll, and I’d certainly like annotations from sites like Wikipedia augmenting my research experience.
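The auto-scroll part is simple enough to sketch. This is a hypothetical helper, not something either talk presented: it returns how many pixels to scroll per frame, ramping up as the gaze moves deeper into an assumed "scroll zone" at the bottom quarter of the viewport.

```python
def scroll_step(gaze_y, viewport_h, zone=0.25, max_step=40):
    """Return the number of pixels to scroll for the current gaze position.

    zone is the fraction of the viewport (measured from the bottom) that
    triggers scrolling; max_step is the scroll speed at the very bottom.
    Both defaults are arbitrary illustration values.
    """
    zone_top = viewport_h * (1 - zone)
    if gaze_y <= zone_top:
        return 0  # gaze is still in the main reading area
    depth = min(1.0, (gaze_y - zone_top) / (viewport_h * zone))
    return round(depth * max_step)
```

Scaling the step with depth means the page creeps along while you finish the last lines and speeds up only if your gaze is pinned to the bottom edge.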
The workshop is over and we’ve ended up having a further chat about pico projectors costing US$350 (apparently a bit dangerous – they’re laser-based and can burn the retina) and augmenting reality with said devices as you wander around (imagine strapping one to your chest).
In the poster session that followed, Stylianos Asteriadis showed a head pose detector that works with a desktop webcam and a published algorithm – this could be used in gaming and for hands-free control. It detects the attitude of the head on three axes by examining a bounding box around the head and the location of features like the eyes and the mouth.
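I don’t have the details of the published algorithm, but the general idea of reading head attitude from a bounding box plus eye and mouth positions can be sketched with simple geometry. Everything below is an assumption for illustration: roll comes from the tilt of the line between the eyes, while yaw and pitch are reported only as normalised offsets of the face centre within the box, not as true angles.

```python
import math


def head_pose(box, left_eye, right_eye, mouth):
    """Rough head attitude from 2D feature positions.

    box is (left, top, right, bottom); the features are (x, y) points.
    Returns (roll_degrees, yaw_offset, pitch_offset), where the offsets
    are about 0 for a frontal face and grow towards -1 or 1 as the
    features drift to an edge of the bounding box.
    """
    left, top, right, bottom = box
    centre_x = (left + right) / 2
    centre_y = (top + bottom) / 2
    half_w = (right - left) / 2
    half_h = (bottom - top) / 2

    # Roll: angle of the line joining the two eyes.
    roll = math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                   right_eye[0] - left_eye[0]))

    # Face centre: midway between the eye midpoint and the mouth.
    eyes_mid_x = (left_eye[0] + right_eye[0]) / 2
    eyes_mid_y = (left_eye[1] + right_eye[1]) / 2
    face_x = (eyes_mid_x + mouth[0]) / 2
    face_y = (eyes_mid_y + mouth[1]) / 2

    yaw = (face_x - centre_x) / half_w
    pitch = (face_y - centre_y) / half_h
    return roll, yaw, pitch
```

A game could feed these three numbers straight into a camera or cursor control loop, which is the hands-free use mentioned in the poster.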
Some interesting people met so far – Chuck Rich (cool robots), Isamu Nakao (Sony R&D), Wen-Hung Liao (National Chengchi Uni). Tweets are under #iui2010.
Day 2 (first day of conference talks)
The first talk of the day was Cortically Coupled Computer Vision by Paul Sajda. The intent was to speed up the search for a target image in a large database using fast brain recognition techniques. The user has a target image in mind and is shown tens of images in rapid succession, each for 100ms. By recording brain activity using non-invasive techniques like EEG and a custom labeling approach, they were able to significantly improve precision and recall in search problems.
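For anyone unfamiliar with the two search metrics mentioned above, here is a small sketch of how they are computed. The image names are made up for the example and have nothing to do with the data used in the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result.

    precision: fraction of retrieved items that are relevant.
    recall:    fraction of relevant items that were retrieved.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four images returned, three truly match the target.
p, r = precision_recall(["img1", "img2", "img3", "img4"],
                        ["img1", "img3", "img5"])
```

Here two of the four returned images are relevant and two of the three relevant images were found, so a system that improves both numbers is returning less junk and missing fewer targets.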

This was followed by the 1-minute madness session where 20 or so speakers introduced the posters that would be shown at the banquet the next night. Two that caught my eye were Henry Lieberman’s Why UI (he’s one of the creators of ConceptNet) and another chap’s $3 Gesture Recognizer (based on Android and Wii devices).
Amy Harrison gave an interesting talk on Automatically Identifying Targets Users Interact With During Real World Tasks. Given my background with screencasting and interest in scripted (automatic) screencasting, the ideas around taking screenshots and identifying screen targets (like buttons, scroll bars etc.) to extract additional information were very interesting. Her techniques using CRUMBs identify 89% of user interface features vs 74% for the Microsoft accessibility interface.
Ian produces professional screencasts (ProCasts), writes The Screencasting Handbook, programs Python, researches Artificial Intelligence (Mor Consulting) and is also a sea-side dweller and consumer of fine coffees.














