Thursday 21 February 2008

21/02/08 - Progress Report

The schedule, as outlined in the Functional Specification, looks something like this:
  • Receive required hardware: October 2007 - January 2008
  • Interface sensors to the XBee: November 2007 - January 2008
  • Construct headset unit: December 2007 - March 2008
  • Daemon/components to gather sensor data: January 2008 - March 2008
  • Write audio feedback daemon: November 2007 - February 2008
  • Message router/state machine: November 2007 - March 2008
  • Demonstration applications: March 2008 - April 2008
  • Improvements and tuning: April 2008
  • System documentation: April 2008 - May 2008
The current state of each of those entries is as follows:
  • Receive required hardware: Originally planned hardware has arrived. Waiting on new hardware now (soundcard & wireless headphones).
  • Interface sensors to the XBee: Graham has completed this, see his blog for details.
  • Construct headset unit: Graham has completed this, see his blog for details.
  • Daemon/components to gather sensor data: Graham has completed this, see his blog for details.
  • Write audio feedback daemon: I did this in January and February. More details below.
  • Message router/state machine: I wrote this in November and December. It's approx. 85% complete, but will need some editing to reflect design changes made to other aspects. There are more details in this post.
  • Demonstration applications: The original intention was to write the demo applications once the framework was complete - in practice, we have found that writing them as we progress, in order to test each new component, is much better. The first demo application, described in posts here and here, has been written (February) and will continue to evolve over the next week or so. This puts us ahead of schedule in this area.
  • Improvements and tuning: We have both been making improvements as we think of them (Graham has been tuning the hardware in an attempt to get more accurate readings, while I have mostly been researching methods of producing better 3D audio output in FMOD), though the real work in this area will begin once the framework is fully constructed and tested.
  • System documentation: No work has been done in this area yet.
From the above, we can see that currently, we are a little ahead of schedule! Good stuff.

Details on the audio component:
I started looking into this in mid to late December. At that point, I was still trying to decide which audio API to use. The choice was between OpenAL and FMOD.
Back then, I was attempting to write this component in Python, since it is a much cleaner and nicer language to program in; however, I had no success getting either the OpenAL or the FMOD bindings for Python to work.
Around Christmas, with still no success getting either of these APIs to work in Python (and no luck finding any other Python audio library capable of producing 3D audio), I decided to write this component in C++ instead. I briefly thought about writing my own Python bindings, but ultimately it didn't seem worth the extra work.
At the beginning of January, before the exams, I wrote a test program in C++ using OpenAL. It worked, but the code was not very nice and the program was brittle and difficult to manage. This is when I decided to use FMOD instead - it has a higher-level API, a superior feature set and is very well supported.
No further work went into this until after the exams, but ever since I started writing the FMOD version, at the beginning of February, I have been constantly tuning the program and adding new features. When the first demo application was working (both the hardware and the GUI app), the audio component supported only a single sound source, which could be moved around 3D space using text commands over TCP/IP. The listener could also be moved and rotated to face any direction.
Within the next week, I added support for multiple sound sources (which could be created and destroyed at runtime, again, using the text commands over TCP/IP). Besides some minor tuning (mainly cleaning up the code), this is the current state of the program. Currently, it consists of 462 lines of C++ and I see this growing to over 1K before the program is fully complete.
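To give a flavour of how the audio daemon is driven, here is a minimal Python sketch of a client sending text commands over TCP. The host, port, command names and framing are placeholders of my own - the real command set is still being tuned - so treat this as an illustration of the protocol style rather than the actual syntax.

import socket

# Hypothetical address and command syntax - illustrative only.
HOST, PORT = "localhost", 4000

def send(sock, command):
    # Assumed framing: newline-terminated text commands over TCP.
    sock.sendall((command + "\n").encode("ascii"))

sock = socket.create_connection((HOST, PORT))
send(sock, "source create 1 beep.wav")   # create a sound source with id 1
send(sock, "source move 1 2.0 0.0 1.5")  # position it at (x, y, z)
send(sock, "listener move 0.0 0.0 0.0")  # place the listener at the origin
send(sock, "listener face 0.0 0.0 1.0")  # face along the positive z axis
sock.close()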

Wednesday 20 February 2008

20/02/08 - Research into sound cards

Yesterday and today, I was researching exactly which sound cards have hardware-accelerated HRTF support (and making sure that FMOD actually supported it). In the process, I came across a project at Carnegie Mellon University which aims to do some very interesting things with 3D sound. The basic idea is (taken verbatim from their website):
The goal of our project is to demonstrate that audio can successfully be the primary element of interactive entertainment. Through the use of 3D trackers, headphones, and a game engine, we plan to create an immersive experience that is unique and demonstrates both the creative potential and emotional power of an audio experience.
Sounds kinda familiar ;-) Definitely something to keep an eye on anyway!

While looking for a soundcard with good 3D support (specifically, for use with FMOD), as well as any other information on 3D sound, or methods of improving the sound quality, I've been digging through a number of forum and blog posts, especially on the FMOD support forums.
  • This thread on the FMOD forums shows that, from the poster's personal experience, the Creative X-Fi XtremeMusic soundcards perform considerably better than the older Creative Audigy soundcards.
  • This article talks about 3D sound for games and has a lot of useful information regarding soundcard technology, HRTF, sound occlusion/obstruction, volumetric sound sources and different audio APIs (FMOD included).
  • Wikipedia article on the Creative X-Fi cards. According to the article, the Creative X-Fi Xtreme Audio cards do not contain the new EMU20K1 chip, meaning they do not support 3D sound in hardware! (Citations: here and here).
  • Thread on the FMOD forums outlining the recommended startup sequence. Not related to any specific soundcard, but covers some potential problems and solutions - this may become an issue when switching from software mixed 3D sounds to hardware sound.
All of the soundcards advertise CMSS-3D, but according to the research above, some cards (the X-Fi Xtreme Audio) emulate it in the driver instead - something we definitely want to avoid!
The Xtreme Gamer soundcard (besides being marketed as a gaming soundcard) seems to be the cheapest of the X-Fi soundcards that has the features we need. The Xtreme Gamer Fatal1ty Pro soundcard is the next one up, which potentially has better performance and/or quality (though not guaranteed!).

So here's the summary:
I would, of course, prefer the Fatal1ty Pro, since it is potentially better, but as this is not guaranteed and it costs €40 more, it would probably be best to get the cheaper one (which seems like a capable card anyway). If I were paying for it myself, I'd probably get the other one though - I never have been that good at managing money when it comes to shiny toys hahaha!

Well, that's enough research into soundcards, I think. I certainly can't wait to try the headset with improved 3D sound when the soundcard arrives!

Monday 18 February 2008

15/02/08 - Meeting with supervisor

Today myself and Graham had a meeting with our supervisor, Prof. Alan Smeaton, in order to show him what we have done so far and discuss the direction of our project. We also used him as a guinea pig to test our demo app - he was able to localize the sounds very quickly, so this proves that the sound localization works!
Alan had some good news for us too - DCU will be installing their Ubisense system soon!! We hope to have our Ubisense component working before then, so that we can begin testing our framework with it as soon as it is installed.
He also told us about a project which Donal Fitzpatrick and his postdocs are working on: developing a virtual white cane, which would use tactile feedback to simulate an environment as it would appear to someone using such a cane. Our project has great potential for a future tie-in with theirs: we don't rely on anything visual, we will already have location tracking implemented (which their project will also require), and the sensors we are using (ultrasonic, compass, accelerometer/gyroscope) could be useful to them. Our audio feedback system may be useful to their project too! It's always nice to discover new uses for a project that weren't thought of at the start!

In the near future, we intend to meet with Donal Fitzpatrick to get some valuable feedback on how our system could be improved for use as a visual-less navigation aid, and on which sounds we should concentrate for the greatest effect. Also, as Alan has kindly agreed to obtain a sound card with HRTF support, we hope to significantly improve the quality of the 3D sound localization!

Something else we will work on over the coming weeks is modifying the GUI for the demo app to measure the user's performance and generate statistics about how well people can localize sounds, how quickly they adapt and learn, and so forth. This would be a valuable tool for testing our framework (providing some much-needed metrics of how successful the 3D sound system is) as well as for further research into which sounds work best.
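To give an idea of the kind of metrics I have in mind, here is a tiny Python sketch computing average angular error and response time over a list of trials; the data format and numbers are made up purely for illustration.

# Each trial: (target_bearing_deg, final_heading_deg, seconds_to_localize).
# Hypothetical sample data, for illustration only.
trials = [(90.0, 84.0, 3.2), (180.0, 200.0, 4.8), (270.0, 265.0, 2.9)]

def angular_error(target, heading):
    # Smallest absolute difference between two compass bearings.
    diff = abs(target - heading) % 360
    return min(diff, 360 - diff)

errors = [angular_error(t, h) for t, h, _ in trials]
times = [s for _, _, s in trials]
print("mean error: %.1f degrees" % (sum(errors) / len(errors)))
print("mean time:  %.1f seconds" % (sum(times) / len(times)))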

Once the Ubisense system is up and running, we will begin incorporating the ultrasonic sensor into the project so that we can detect physical objects, and will begin working on a means of signaling an object's existence to the user through audio.

Once all of the above is complete, it will be time to focus solely on the framework aspect of the project: integrating everything we have so far into the message router application (tuning the command set each component understands, having components connect through the router instead of directly to each other as they currently do, and writing the Python driver scripts) and working on a flexible but intuitive API for applications. I also want to start developing a graphical configuration tool (as explained in a previous entry) which would allow the user to specify the routing behavior visually. We will also need to spend some time testing the framework for usability and develop a number of demo apps.

All in all, the project has been moving along quite successfully, with no show-stopper problems (at least, none that we couldn't work around - thanks to UCD allowing us to use their Ubisense). The coming weeks look set to be exceptionally busy!

13/02/08 - Trip to UCD

Today both myself and Graham went to UCD to meet with Dr. Lorcan Coyle regarding Ubisense.
He introduced us to some of the other researchers working with the Ubisense system there and showed us some of the applications they have written for it. Although they were impressive, the applications did highlight some of the Ubisense system's fundamental shortcomings: it really isn't as accurate as advertised (in an ideal environment, a highly calibrated system would be accurate to approx. 15 cm; in reality, in a normal environment, it is accurate to about a meter and is quite jumpy in movement). It is still accurate enough for our purposes, though we will need to play around with it to see whether we can eliminate or smooth out the jumpiness, as it could cause problems.

We will be returning to UCD next week in order to begin coding the Ubisense component for our framework.


Besides this, we have been looking into soundcards which support hardware-accelerated HRTF, in the hope that we could get one to improve the quality of the 3D sound. Initial research shows that the high-end Creative X-Fi cards support this, but the low-end ones do not. Over the next week, I plan to choose a soundcard with HRTF support (basically, a price/features balance - but since we are only interested in HRTF and don't care about things like MIDI, this shouldn't be a problem - though, if I had the money, I would love one of those cards!).

08/02/08 - Improvements to the demo app

Over the past few days, I have worked to improve our first demonstration application.

Firstly, our original version, though it worked, had some problems. The compass module and 3D sound aren't as accurate as we would like. We spent a few hours working on the calibration of the compass module and we have it acceptably accurate, but the 3D sound could possibly be improved through more advanced use of HRTF. I will be looking into this over the coming days.

I have written a GUI tool (in Python) which allows us to place sounds around the listener by simply clicking on the GUI, to select which sound is audible, and to cycle through them. It also automatically advances to the next sound if you look at the current one (that is, you look in the direction the sound appears to be coming from) for five seconds.
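The "look at it for five seconds" behaviour boils down to comparing the compass heading against the bearing from the listener to the current sound, and timing how long the two stay within some tolerance. Here is a simplified Python sketch of that check - the tolerance, coordinate convention and function names are my own placeholders, not the actual GUI code:

import math, time

DWELL_SECONDS = 5.0
TOLERANCE_DEG = 15.0   # how close "looking at it" has to be - an assumption

def bearing_to(listener, sound):
    # Bearing (degrees) from the listener to the sound in the ground plane,
    # with positions given as (x, z) pairs.
    dx, dz = sound[0] - listener[0], sound[1] - listener[1]
    return math.degrees(math.atan2(dx, dz)) % 360

def is_looking_at(heading, listener, sound):
    diff = abs(heading - bearing_to(listener, sound)) % 360
    return min(diff, 360 - diff) < TOLERANCE_DEG

dwell_start = None

def update(heading, listener, sound):
    # Called with each new compass reading; returns True once the user has
    # faced the sound continuously for DWELL_SECONDS.
    global dwell_start
    if is_looking_at(heading, listener, sound):
        if dwell_start is None:
            dwell_start = time.time()
        return time.time() - dwell_start >= DWELL_SECONDS
    dwell_start = None
    return False

# e.g. listener at the origin, heading 90 degrees, sound three metres "east":
print(is_looking_at(90.0, (0.0, 0.0), (3.0, 0.0)))   # True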

The green dot represents the position of the listener (currently this is static, but when we have the Ubisense working, it should move around as the user walks around). The black line represents the direction the listener is facing and is driven by the digital compass.

The grey dots represent sounds earlier in the list than the current one, the blue dots sounds that are yet to come, and the red dot is the currently audible sound. Over the coming weeks, I intend to extend this program (and the FMOD audio daemon) to allow multiple different sounds to play in different positions at once, as well as to allow the Ubisense to drive the position of the listener.


Myself and Graham have been in contact with Dr. Lorcan Coyle from the UCD Complex & Adaptive Systems Lab with regard to using their Ubisense system, and he has confirmed that it is OK for us to do so. We will be visiting him in UCD next Wednesday (the 13th).

04/02/08 - First Demo App!

Myself and Graham (Graham on the hardware and the software that communicates with it, myself on the audio) have now had our first major success!
We can now play a sound in 3D without head movement affecting the apparent position of the sound.
Why is this significant? Well, normally a sound positioned somewhere in 3D space is placed relative to the forward direction the headphones are facing. That is, if the sound is coming directly from the right, it plays only in the right-hand speaker of the headphones. But if I now turn 90 degrees, it is still playing in the right speaker, so the apparent position of the sound has changed! This is obviously not what we want.
Using the digital compass, we can now detect the direction the user is facing and adjust the listener's forward vector in the FMOD 3D audio API, so that the 3D sound positions are recalculated to take the user's orientation into account.
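For the curious, converting a compass heading into a forward vector is just basic trigonometry. A small Python sketch of the idea - the coordinate convention (x to the right, y up, z forward, heading measured clockwise from north) is an assumption for illustration:

import math

def heading_to_forward(heading_deg):
    # Convert a compass heading (0 = north, increasing clockwise) into a
    # unit forward vector, assuming x right, y up, z forward.
    rad = math.radians(heading_deg)
    return (math.sin(rad), 0.0, math.cos(rad))

# Facing east (90 degrees) gives roughly (1.0, 0.0, 0.0).
fx, fy, fz = heading_to_forward(90.0)
# In our setup this vector is sent to the audio daemon (as a text command
# over TCP), which hands it to FMOD as the listener's forward vector.
print("forward vector: %.2f %.2f %.2f" % (fx, fy, fz))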

And the best part is... IT WORKS!

Take a look at Graham's photo of the headset!

We will now be spending a day or two testing this on the lab guinea pigs (err, I mean, people passing through the labs) and also testing different sounds to determine which sounds are easiest to localize in 3D (some sounds appear to be easier to localize than others).

After this, the next step is to combine this with the actual position of the user - that is, to add Ubisense support to the system. But there's a problem: DCU's Ubisense system has not yet been installed and could take a significant amount of time yet.
Luckily, myself and Graham have some contacts in UCD, which has had a Ubisense system installed for a few years now. Alan suggested we get in touch with Dr. Lorcan Coyle from the UCD Complex & Adaptive Systems Lab (myself and Graham also know Lorcan from the Odysseus program, which both of us participated in during the summer of 2006).

01/02/08 - 3D Sound

Since the last entry, before Christmas, many things have happened.
First of all, all of the hardware components we ordered have arrived and Graham has constructed the current head unit (see his blog for lots of photos). Also (in between exams :-/), I have the audio feedback system implemented - or, at least, a first version of it.

Before I move on to discuss this, I will talk a little about 3D sound.

The idea is that, through attenuation, panning and modification of a sound's frequencies, sounds can be made to appear to be positioned in 3D space, even though they are merely generated by a pair of stereo headphones. Crude models of 3D sound use only volume and panning to simulate a sound's positional properties. This works for simple left/right localization, but it makes it extremely difficult (if even possible) for a person to differentiate between sounds coming from in front and from behind. Up and down are also exceptionally difficult (or even impossible) to simulate using only volume and panning. FMOD supports this model of 3D sound, but also has a crude software implementation of HRTF, which filters the sound's frequencies when the sound is behind the listener, dampening it. This significantly improves the quality of the 3D-ness of the sound.
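To make the crude model concrete, here is a toy Python sketch of distance attenuation plus constant-power panning. The rolloff model and numbers are purely illustrative, not what FMOD actually does internally:

import math

def crude_3d_gains(azimuth_deg, distance, min_distance=1.0):
    # Distance attenuation: inverse rolloff, clamped at min_distance.
    attenuation = min_distance / max(distance, min_distance)
    # Constant-power pan: -90 degrees = hard left, +90 degrees = hard right.
    pan = max(-1.0, min(1.0, math.sin(math.radians(azimuth_deg))))
    angle = (pan + 1.0) * math.pi / 4.0   # 0 .. pi/2
    left, right = math.cos(angle), math.sin(angle)
    return left * attenuation, right * attenuation

# A sound 45 degrees to the right, two metres away:
print(crude_3d_gains(45.0, 2.0))

Note that azimuths of 45 and 135 degrees produce exactly the same gains, which is precisely the front/back ambiguity described above.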

As an introduction to 3D sound, a number of 3D soundscape audio clips can be found on YouTube. Two that I liked are below (you will need a pair of stereo headphones for the full effect). They demonstrate the goal, in terms of audio, that myself and Graham are hoping to achieve.

Virtual barber shop:


Various 3D sounds:



HRTF is a powerful means of simulating sounds as a human would hear them if they (the sounds) were positioned in 3D space around the listener. The basic idea is to alter the sound's frequencies to simulate the sound waves reflecting and refracting off and around the listener's head and into the ear.
The angle at which the sound enters the ear, and the slight delays before the reflected sounds are heard, change the frequency content of what is actually heard. The brain uses these changes to position the sound in 3D space.
As everyone's head and ears are different, HRTF will never be able to simulate 3D sound exactly, but it can come close enough to generate convincing 3D audio (similar to the YouTube videos above, though those use binaural recording, which produces vastly superior results but doesn't work with dynamic sounds like we require).
3D sound using HRTF is important for high-quality sound localization, but simulating it through HRTF is computationally expensive and not feasible to do in software in a real-time system. Luckily, some mid- and high-end soundcards (Creative X-Fi, for example) support hardware HRTF, which should vastly improve the quality of our 3D sound.
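To make the HRTF idea concrete, the Python sketch below filters a mono signal with a separate impulse response per ear. The impulse responses here are random placeholder data - a real system would use measured head-related impulse responses for the sound's direction - and the sketch also shows why this is expensive: one convolution per ear, per source, at audio rates.

import numpy as np

def apply_hrtf(mono, hrir_left, hrir_right):
    # HRTF filtering is a convolution of the source signal with a
    # head-related impulse response (HRIR) for each ear.
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)   # two-channel (stereo) output

# Placeholder data - one second of noise and two random 128-tap "HRIRs".
mono = np.random.randn(44100)
hrir_left = np.random.randn(128) * 0.1
hrir_right = np.random.randn(128) * 0.1
stereo = apply_hrtf(mono, hrir_left, hrir_right)
print(stereo.shape)   # (44227, 2)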

Some resources on 3D sound:

21/12/07 - Completion of the router

Since the last entry, I have almost completed the routing application. The only thing left to do is to determine exactly which commands the program accepts. This will need to wait until the rest of the system is in working order.

Basically, what works is:
  • Components can connect to the router and identify themselves.
  • A Python script can be loaded for each component, effectively acting as a driver for that component. This allows the router to handle component-specific commands.
  • Applications can be implemented as a Python script which is loaded and run by the router in an event-driven fashion (the application registers the commands and events it is interested in, and the router notifies it when these occur) - see the sketch after this list.
  • The state machine and state transition files are fully working.
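Below is a hypothetical sketch of what such an application script might look like. None of the names (register_event, send_command, the event names) are the real API - that interface is still being finalised - and the stubs at the top exist only to make the sketch self-contained.

# Hypothetical application script for the router; the API names are placeholders.

def send_command(component, command):
    # Stub: the real router would deliver this to the named component.
    print("-> %s: %s" % (component, command))

handlers = {}
def register_event(name, handler):
    # Stub: the real router would call these handlers as events arrive.
    handlers[name] = handler

def on_position(x, y, z):
    # Called whenever a new position reading arrives from the Ubisense component.
    send_command("audio", "listener move %.2f %.2f %.2f" % (x, y, z))

def on_state_change(old_state, new_state):
    # Called by the router's state machine on every transition.
    if new_state == "waypoint_reached":
        send_command("audio", "play chime")

register_event("position", on_position)
register_event("state_change", on_state_change)

# Simulate a couple of events to show the flow:
handlers["position"](1.0, 0.0, 2.5)
handlers["state_change"]("start", "waypoint_reached")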

I have also started looking at both OpenAL and FMOD, in preparation for writing the audio feedback daemon. It looks like FMOD is better supported, easier to use and more feature-rich than OpenAL, so I will use it to implement the audio server.
I have also been looking into HRTF and other means of improving the sound localization. More work will need to be done on this when the audio component is capable of generating 3D positional sound.

04/12/07 - Functional Specification

Today we submitted our functional specification, which can be obtained here. Last week, we met with our supervisor, Alan Smeaton, to discuss the direction of our project, as described in the functional specification.

Graham also received some more of the hardware components and has started constructing the headset. In the meantime, I am working on the central router application.

The image shows how my router program will interact with the rest of the system. Put simply, each component will, at some point, pass its data to the router program. The application can then specify how it wants to route the data through the system (should the user's position be passed to the audio system? Will the ultrasonic data generate sound? etc.). Eventually, I intend to write a graphical tool where you simply drag and drop components and connect them together with lines to route the information between them, as this is, in my opinion, the most user-friendly way to configure the system.

As well as the routing capabilities, I have created a simple state machine. A configuration file containing a list of state transitions can be loaded by the application, and the events triggered (or even the routing between components) can depend on the current state. Changing state when the user comes within a certain radius of a specified location is also planned for this part of the program, as we foresee location-based state changes being an integral part of a large number of potential applications. For example, the waypoint demonstration application could be implemented almost entirely as a set of state transitions.

The state transition configuration file is described in more detail in the functional specification. The basic layout is a list of entries (one per line), in the following syntax:
current_state x y z radius next_state
This means that a transition occurs from state current_state to next_state if the person is within radius of the position (x, y, z). To create the waypoint application, a state transition file would simply be set up to contain a sequential list of state transitions.
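As an illustration, parsing and applying such a file takes only a few lines of Python. This is a sketch of the idea, not the router's actual implementation:

import math

def load_transitions(path):
    # Each non-empty line: current_state x y z radius next_state
    transitions = []
    for line in open(path):
        parts = line.split()
        if len(parts) != 6:
            continue   # skip blank or malformed lines
        current, x, y, z, radius, nxt = parts
        transitions.append((current, (float(x), float(y), float(z)),
                            float(radius), nxt))
    return transitions

def next_state(state, position, transitions):
    # Fire the first transition whose trigger sphere contains the position.
    for current, centre, radius, nxt in transitions:
        if current != state:
            continue
        dist = math.sqrt(sum((p - c) ** 2 for p, c in zip(position, centre)))
        if dist <= radius:
            return nxt
    return state

# A waypoint application would then just be a chain of entries such as:
#   start     1.0 0.0 2.0 0.5 waypoint1
#   waypoint1 4.0 0.0 2.0 0.5 waypoint2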

The application also gets notified when a state transition occurs, so that it can perform some extra action when it happens.

09/11/07 - Architecture and first hardware.

Over the previous few weeks (yes, the next few blog entries are all back-dated, since I haven't been adding them to my blog as I went along - something I will have to do from now on) we have been ordering the required hardware components, researching 3D sound localization and designing the framework's general architecture.

So far, we have received the ultrasonic sensor and the XBee modules. Graham has some great pictures on his blog, like this one of us testing the ultrasonic sensor. We have also ordered more parts (digital compass, accelerometer) and Graham will begin constructing the headset as soon as they arrive. The XBee module will be used solely as a wireless serial connection between the headset and the host computer.

The framework's planned architecture:
The framework will be built in a highly modular architecture, with each component being completely self-contained. The components will communicate with each other through a central message router, over TCP/IP. Any complex subsystem will be implemented as a separate component. This allows us to easily add new components or remove old ones, change how messages are routed between them, monitor the messages, or control the components through external GUI programs.

The applications can use the framework in two ways:
  1. They can be implemented as a completely separate program (or set of programs) which communicate(s) with the framework over TCP/IP. It would, effectively, act the same as any other component in the framework, with the full power of the framework at its disposal.
  2. They can be implemented as an event-based Python script executed as part of the message router. How flexible or powerful this method is will depend on the API developed, but an interface to send and receive commands to/from the router (and therefore to and from the rest of the framework) will exist, making it almost as powerful as the alternative. This method can make full use of the router's integrated state machine (more on this in a later post) and, being highly event-driven, makes applications much easier to develop. Python is also a very convenient high-level language.
Option 2 would be the preferred method of developing applications, but it may also be useful (especially for the development of tools and new, additional components) to allow for option 1.

Introduction

Though I have been keeping track of progress, I haven't put it in my blog before now, so adding everything to the blog is in order. I guess I have a natural aversion to keeping blogs... Oh well, time to get over it... :-/

I guess an introduction to the project is as good a place to start as any.

My (and Graham's) project is to build an SDK or framework which would allow users to easily build Augmented Reality applications.
This means we are developing hardware and software components which can be used together or in isolation (depending on the application being built) to handle some aspect of an Augmented Reality application, tools to configure and control these components, and an API for building custom components which interact with the "stock" components we are developing. We will also be developing some sample applications to demonstrate the use of our framework.

The hardware/software components collectively would allow us to create virtual environments inside a real physical space. The sensors would provide the system with a stream of input which would then be processed in some application-dependent way to produce audio feedback for the user.
  • The Ubisense tags would allow the framework to know where in the physical environment the user is.
  • The digital compass and accelerometer will let the framework know which direction the user is facing. The importance of this will be discussed in a later post.
  • The ultrasonic sensor would act to detect physical objects which may be in the person's way.
  • The wireless headphones would provide the user with audio feedback. The audio will be 3D sound (see also here), generated with the FMOD audio API.

The planned components are:
  1. A headset consisting of a number of sensors (digital compass, accelerometer, ultrasonic, Ubisense tag) and a pair of wireless headphones. This will be the basic interface through which an end user would interact with the applications developed using this framework.
  2. Software components to match the various hardware components. To keep the design of both the software and the hardware modular, instead of controlling the hardware through a single monolithic piece of software, each distinct hardware component will receive its own software daemon to monitor and control it. Roughly speaking, this means there will be a software component for handling the digital compass, the ultrasonic sensor, the Ubisense and 3D sound generation.
  3. A central hub which controls the various components. This program would act as a router between all the other parts of the framework and allow for a central place to configure and manage how components are to interact.
  4. Monitoring and management tools. There should be a set of generic tools to monitor the state of the system at any given time, as well as to manage (and possibly recalibrate?) the system through a GUI. These would register themselves with the routing application to receive the commands which they are monitoring.

Each component will communicate over TCP/IP, allowing the framework to be restructured through the routing application at runtime. This also allows different parts of the system to run on different computers, which could be useful for spreading computationally intensive simulations over a number of machines for better performance.
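As a sketch of how small a component can be, here is a hypothetical compass component in Python: it connects to the router, identifies itself and streams readings as text lines. The handshake, message format and port are placeholders of mine (the real wire protocol will be defined by the router), and the compass reading itself is faked.

import random, socket, time

ROUTER = ("localhost", 5000)   # hypothetical router address

def read_compass():
    # Stand-in for reading the real digital compass over the XBee link.
    return random.uniform(0.0, 360.0)

sock = socket.create_connection(ROUTER)
sock.sendall(b"identify compass\n")   # placeholder identification handshake
while True:
    heading = read_compass()
    sock.sendall(("heading %.1f\n" % heading).encode("ascii"))
    time.sleep(0.1)                   # roughly ten readings per second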

Some ideas for demonstration applications:
  1. A sound localization test program which would play sounds in a number of different locations and test whether the user can determine "where" in the virtual space the sound is coming from, possibly by simply looking at it for a number of seconds.
  2. A simple waypoint-based navigation system where the user must navigate through a set of waypoints using only audio feedback to navigate.
We (myself and Graham) plan to implement a number of demonstration applications. More will be posted about them as we begin working on them.

The project also has some interesting potential future uses:
  • Helping blind people navigate
  • Improving the audio aspect of augmented reality (there has been a lot of work done in mixing the real and virtual visually, but 3D audio has not been explored as much as it should be)
  • Augmented Reality computer games
  • Since we are only using audio feedback (and nothing significantly visual), perhaps this could be developed into a set of computer games for blind people?
  • It could be combined with traditional Augmented Reality (for example, by adding a head-mounted display), perhaps creating a more realistic and immersive (audio-wise) version of ARQuake