I’m at IUI 2010, a mostly academic conference focused on using new techniques to make intelligent user interfaces. I’ll update this entry as the conference proceeds.
Day 1 (Sunday) – Workshops
I’m in the Eye Gaze for Intelligent Human Machine Interaction workshop; there’s a full breakdown of this session’s talks here. The talks focus on using eye-gaze tracking tools to let humans interact with computers intuitively and easily.
Two talks have really caught my eye. Manuel Möller presented “The Text 2.0 Framework – Writing Web-Based Gaze-Controlled Realtime Applications Quickly and Easily” (via here). Text20.net is the background site; they’re offering a browser plug-in (Safari at present, Chrome/Firefox to come) that augments your browsing experience if you’ve got an eye tracker. They’ve added some new mark-up attributes like:
- OnGazeOver – like OnMouseOver but fires if your gaze goes over the element (e.g. to make an image change or highlight)
- OnPerusal – fires if you quickly scan a piece of text
- OnRead – only fires if you start to properly read the text
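To make the OnGazeOver idea concrete, here is a minimal sketch of how such an event might be dispatched from raw gaze samples. This is my own illustration and not the Text 2.0 API: the `Element` and `GazeDispatcher` names, the rectangle layout and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Element:
    """A rectangular page element with a hypothetical gaze-over callback."""
    name: str
    left: float
    top: float
    width: float
    height: float
    on_gaze_over: Callable[[str], None]

    def contains(self, x: float, y: float) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


class GazeDispatcher:
    """Fires on_gaze_over once each time the gaze enters an element."""

    def __init__(self, elements: list[Element]) -> None:
        self.elements = elements
        self.current: Optional[Element] = None

    def feed(self, x: float, y: float) -> None:
        # Find the first element under the gaze point (None if there is none).
        hit = next((e for e in self.elements if e.contains(x, y)), None)
        # Only fire on entry, not on every sample inside the element.
        if hit is not None and hit is not self.current:
            hit.on_gaze_over(hit.name)
        self.current = hit


fired: list[str] = []
page = [
    Element("logo", 0, 0, 100, 50, fired.append),
    Element("article", 0, 60, 400, 300, fired.append),
]
dispatcher = GazeDispatcher(page)
for sample in [(10, 10), (20, 12), (50, 55), (30, 100), (10, 10)]:
    dispatcher.feed(*sample)
print(fired)  # ['logo', 'article', 'logo']
```

OnPerusal and OnRead would need more than this (some model of how the gaze moves across lines of text), so they are left out of the sketch.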
They propose using a site like DBPedia to augment your browsing experience – perhaps bringing in additional text if your gaze rests on a block of text, bringing in alternative images if you look at an image or translating text that you re-read if it knows you’re a foreign-language user.
The above is only useful if you have a gaze-sensing device and these are a bit pricey (think: $10,000-$20,000). However…
Shortly before, Wen-Hung Liao presented “Robust Pupil Detection for Gaze-based User Interface” (via here), in which he described a $60 device (the $60 refers to the cost of a standard 640×480 30fps webcam) that gives reasonable eye-gaze tracking on a desktop computer. He’s essentially describing a way to replace $20,000 worth of high-end eye-gaze tracking tools with the webcam in your laptop.
The resolution achieved is around 40×40 – pretty low but enough to support a lightly modified web browser that allows eye-gaze control. The modification is a zoom whenever the user’s gaze rests on an area – that section zooms so you can more accurately select a link.
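A rough sketch of how the zoom-on-rest behaviour could work on a 40×40 gaze grid is below. The talk did not give these details, so the dwell rule (15 consecutive frames in one cell, about half a second at 30fps), the screen size and the function names are my assumptions.

```python
GRID = 40  # gaze resolution mentioned in the talk (40x40 cells)


def to_cell(x, y, screen_w, screen_h, grid=GRID):
    """Map a screen coordinate to a (row, col) cell of the gaze grid."""
    col = min(int(x * grid / screen_w), grid - 1)
    row = min(int(y * grid / screen_h), grid - 1)
    return row, col


def find_dwell(samples, screen_w, screen_h, min_frames=15):
    """Return the first cell the gaze stays in for min_frames consecutive
    samples, or None if the gaze never rests that long."""
    run_cell, run_len = None, 0
    for x, y in samples:
        cell = to_cell(x, y, screen_w, screen_h)
        if cell == run_cell:
            run_len += 1
        else:
            run_cell, run_len = cell, 1
        if run_len >= min_frames:
            return run_cell
    return None


def cell_rect(cell, screen_w, screen_h, grid=GRID):
    """Screen rectangle (left, top, width, height) to zoom for a cell."""
    row, col = cell
    w, h = screen_w / grid, screen_h / grid
    return (col * w, row * h, w, h)


# Hypothetical 1280x800 screen: the gaze wanders, then rests near (650, 410).
wander = [(100, 100), (300, 200), (500, 300), (600, 380)]
rest = [(650 + i % 3, 410 + i % 2) for i in range(15)]
cell = find_dwell(wander + rest, 1280, 800)
print(cell, cell_rect(cell, 1280, 800))  # (20, 20) (640.0, 400.0, 32.0, 20.0)
```

A browser would then magnify the returned rectangle so that the links inside it become large enough to pick out at this coarse resolution.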
Here’s a demo showing “eye typing” (see some more under VIPLpin):
There is a downside – natural light washes out too much detail (and casts shadows and reflections) so the camera needs a simple modification. By popping out the normal lens and using an IR lens the camera senses light in the infra-red range – for this algorithm the input is far cleaner. It is quite conceivable that we’ll have a second (IR style) webcam in our laptops and this second device could give us simple gaze control on our machines. This algorithm runs comfortably on a dual-core machine at 30fps (previous generation algorithms are laggy as they’re too CPU-intensive).
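To show why a cleaner IR image helps, here is a deliberately naive pupil locator. It is not the algorithm from the talk (which I haven’t reproduced); it assumes the pupil shows up as the darkest blob in the frame, and the threshold and the tiny synthetic frame are made up for illustration.

```python
def pupil_centre(image, threshold=50):
    """Return the (row, col) centroid of pixels darker than threshold,
    or None if no pixel is dark enough."""
    total = row_sum = col_sum = 0
    for r, line in enumerate(image):
        for c, value in enumerate(line):
            if value < threshold:
                total += 1
                row_sum += r
                col_sum += c
    if total == 0:
        return None
    return row_sum / total, col_sum / total


# Synthetic 8x8 "IR frame": bright background (200) with a dark 2x2 pupil (10).
frame = [[200] * 8 for _ in range(8)]
for r in (3, 4):
    for c in (5, 6):
        frame[r][c] = 10

print(pupil_centre(frame))  # (3.5, 5.5)
```

With natural light, shadows and reflections add extra dark and bright patches, so a simple threshold like this one would drift; that is the kind of noise the IR modification is meant to remove.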
What happens if we combine this $60 device (free for me – I have a good webcam in my MacBook that could be modified…) with the Text 2.0 plug-in? I could probably navigate web pages purely by gaze when reading Wikipedia. If my gaze gets to the bottom of the screen then the page could auto-scroll, and I’d certainly like annotations from sites like Wikipedia augmenting my research experience.
The workshop is over and we’ve ended up having a further chat about Pico projectors costing $350 USD (apparently a bit dangerous – they’re laser-based and can burn the retina) and augmenting reality with said devices as you wander around (imagine strapping one to your chest).
In the poster session that followed, Stylianos Asteriadis showed a head pose detector that runs on a desktop webcam and is based on a published algorithm – this could be used in gaming and for hands-free control. It detects the attitude of the head on 3 axes by examining a bounding box around the head and the location of features like the eyes and the mouth. See example videos and publications.
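As a toy illustration of getting pose cues from feature locations, the sketch below estimates head roll from the two eye positions and a crude left/right cue from where the eyes sit inside the head’s bounding box. This is my own simplification and not the published algorithm; the function names and coordinates are hypothetical.

```python
import math


def head_roll_degrees(left_eye, right_eye):
    """Roll: angle of the line joining the eyes, in image coordinates
    (y grows downward). The eyes are named by their position in the image."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def horizontal_offset(left_eye, right_eye, box):
    """Crude yaw cue: offset of the eye midpoint from the bounding-box
    centre, as a fraction of the box half-width (0 means centred)."""
    box_left, _box_top, box_w, _box_h = box
    mid_x = (left_eye[0] + right_eye[0]) / 2
    centre_x = box_left + box_w / 2
    return (mid_x - centre_x) / (box_w / 2)


# Hypothetical detections: (x, y) eye positions and a (left, top, w, h) box.
box = (80, 60, 100, 140)
print(head_roll_degrees((100, 120), (160, 120)))       # level eyes, no roll
print(head_roll_degrees((100, 100), (160, 160)))       # head tilted
print(horizontal_offset((100, 120), (160, 120), box))  # eyes centred in box
```

A real detector would also need the mouth position (for pitch) and some calibration to turn these ratios into angles.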
Some interesting people met so far – Chuck Rich (cool robots), Isamu Nakao (Sony R&D), Wen-Hung Liao (National Chengchi Uni), Marc Cavazza (Companions project), Elisabeth Andre (avatars and agents). Tweets are under #iui2010.
Day 2 (first day of conference talks)
The first talk of the day was Cortically Coupled Computer Vision by Paul Sajda. The intent was to speed up the search for a target image in a large database using fast brain recognition techniques. The user has a target image in mind and the researchers throw tens of images at them, showing each for 100ms. By recording brain activity using non-invasive techniques like EEG and a custom labeling approach, they were able to significantly improve precision and recall in search problems.
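For anyone who hasn’t met the terms, precision and recall can be computed as below. The image ids are invented for the example and have nothing to do with the results in the talk.

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: images 1-4 match the target, the system returns 2, 3 and 5.
precision, recall = precision_recall(retrieved=[2, 3, 5], relevant=[1, 2, 3, 4])
print(round(precision, 3), recall)  # 0.667 0.5
```

Improving both at once is the hard part: returning more images tends to raise recall but lower precision.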
This was followed by the 1-minute madness session where 20 or so speakers introduced the posters that would be shown at the banquet the next night. Two that caught my eye were Henry Lieberman’s Why UI (he’s one of the creators of ConceptNet) and another chap’s $3 Gesture Recognizer (based on Android and Wii devices):
Amy Harrison gave an interesting talk on Automatically Identifying Targets Users Interact With During Real World Tasks. Given my background with screencasting and interest in scripted (automatic) screencasting, the ideas around taking screenshots and identifying screen targets (like buttons, scroll bars etc) to extract additional information were very interesting. Her techniques using CRUMBs identify 89% of user interface features vs 74% for the Microsoft accessibility interface.
Day 3 (Second day of conference and Demos)
In “Intelligent Understanding of Hand Written Geometry Theorem Proving” a technique was displayed that lets a student draw geometric diagrams along with annotations using standard geometry algebra – the system then recognises the diagram and the annotations and tells you if your annotations match the diagram. They developed new visual recognition algorithms with 90% accuracy (an audience member pointed them at existing algorithms that offer 95% accuracy), with similar accuracy for the hand-written annotation recognition. I could really see this being developed into a tool to help students learn geometry – fab stuff:
“Usage Patterns and Latent Semantic Analysis for Task Goal Inferences” looked at the use of a multi-modal interface (speech and pen in this case) so the user could speak a question like “How do I go from here to here?” whilst drawing the locations on the map. The system learns to recognise various types of drawing (e.g. points, circles and strokes) that are coupled with various question types:
The demo and poster session was very interesting! Everyone migrated upstairs for the rather excellent food, drink and a mix of live demos and posters.
The Nao (wikipedia, video demo) humanoid robot was very cool – it was dancing to Thriller and doing Tai Chi for us. The robot has complex joints, can balance, has dual cameras for vision, an on-board Geode-based AMD CPU (a mini PC), support for off-line processing, vision and speech recognition on-board and 30-60 min battery life.
Sven Kratz was demoing his accelerometer-based gesture recognition library for the iPhone. The work is based on “Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes” and is called “$3 Gesture Recognizer – Simple Gesture Recognition for Devices Equipped with 3D Acceleration Sensors”. I got to play with the demo; it recognised some of my gestures on an iPhone 3GS, and Sven explained that a newer version is significantly faster and more reliable. One interesting feature is that it can recognise gestures even if the phone is turned, e.g. upside down.
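The general idea behind this family of recognisers is template matching: resample the recorded trace to a fixed number of points and pick the stored template that is closest. The sketch below is a heavily simplified version of that idea for 3D acceleration traces, not Sven’s code. It resamples by sample index rather than path length and skips the rotation and scale normalisation that the real recognisers rely on; the templates and the trace are invented.

```python
import math


def resample(trace, n=16):
    """Linearly interpolate a trace (list of (x, y, z)) to n samples,
    evenly spaced by sample index."""
    if len(trace) == 1:
        return [trace[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(trace) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(trace) - 1)
        frac = pos - lo
        out.append(tuple(a + (b - a) * frac
                         for a, b in zip(trace[lo], trace[hi])))
    return out


def distance(a, b, n=16):
    """Mean point-to-point Euclidean distance between two resampled traces."""
    ra, rb = resample(a, n), resample(b, n)
    return sum(math.dist(p, q) for p, q in zip(ra, rb)) / n


def classify(trace, templates):
    """Return the name of the template closest to the trace."""
    return min(templates, key=lambda name: distance(trace, templates[name]))


# Hypothetical templates: a side-to-side shake and an up-and-down lift.
templates = {
    "shake": [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (1, 0, 0), (0, 0, 0)],
    "lift": [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 1), (0, 0, 0)],
}
noisy_lift = [(0, 0, 0), (0.1, 0, 0.9), (0, 0.1, 2.1), (0, 0, 1.1), (0, 0, 0.1)]
print(classify(noisy_lift, templates))  # lift
```

Because this version has no rotation handling, it would fail on exactly the upside-down case mentioned above.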
Rush (no photo but I do have video 1, video 2) is a novel iPhone interface for music preference selection. You move your thumb around and it selects paths through music options – watch the videos for details. The slides from the talk are also available; I really should have taken a photo during the demo.
I had a play with a haptic interface with augmented reality that acts as a dental trainer. Without augmented reality I could see a virtual tooth on a screen, and using the pen (which is mounted in the grey haptic feedback device) I’d get force feedback whenever the pen tried to push ‘into’ the virtual tooth. If I pressed the pen’s button I’d activate the ‘drill’ and grind away some of the tooth; the feedback device then let me move into that groove. The force feedback was rather cool:
In addition I tried the augmented reality environment – using a pair of goggles and looking at a special card I could see a 3D (real-world) version of the tooth, the haptic interface again let me ‘feel my way’ around the tooth:
I also had a go on a simple game that uses a sensor strapped to the waist to measure ‘jumps’. In this open-source game you roll your marble to collect coins; when you run out of time you jump up and down to gain seconds. The project aims to encourage fitness through gaming, and they measured improvements in users’ aerobic fitness compared to a non-jumping control version of the game.
In Agents as Intelligent User Interfaces for the Net Generation avatars are controlled by the user to train autonomous agents to solve tasks. I believe this is a part of Miao Chunyan’s work (e.g. “Transforming Learning through Agent Augmented Virtual World”).
In this example you teach the avatar how a plant’s internal processes work – the aim is to enhance the user’s understanding by forcing them to clearly explain the processes to the avatar so it solves certain tasks:
Professor Tracy Hammond was demonstrating some of the work from her lab in sketch recognition – for this poster she explained their tool which uses an off-the-shelf face recogniser to help sketching students learn to draw better faces. The system performs facial recognition on the user’s sketch and compares it to the target image so it can give feedback on areas that are wrong.
Tracy is also the creator of the tech behind all the sketch-a-car-and-watch-it-move physics demos that appeared in the last year or so, see a video of her original approach here.
Peggy Chi’s poster talks about Raconteur, a system that helps a user construct a story using media elements with annotations using natural language. One of the tools underneath it is the common sense reasoning system ConceptNet. A focus of the software is the search for analogies between elements of the story or independent stories:
Henry Lieberman’s poster also uses ConceptNet; they’re mapping how people solve tasks by performing natural language processing at 43Things to build networks of goals. The aim is to automatically extract the steps required to achieve goals by analysing existing stories:
Day 4 (Final day of conference)
“A Code Reuse Interface for Non Programmer Middle School Students” was interesting; they’re using a visual programming environment where non-programmers create animated sequences. Animations can be copy/pasted between stories so the underlying code segments can be re-used. The goal is to teach non-programmers to re-use and improve existing code.
The topics covered were varied (some far too far from my own interests) but many contained interesting ideas – the real gold for me has been in the meeting of experts in the various fields I’m interested in. The organisers certainly did a fine job – the food was rather excellent, the service great and everything ran to time. Overall this has been a very good conference.
Update – there’s a nice slide version of the conference as IUI 2010: An Informal Summary.
Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight and in his Mor Consulting; sign up for Data Science tutorials in London. He also founded the image and text annotation API Annotate.io, lives in London and is a consumer of fine coffees.