Wednesday 28 May 2008
And it's all over!
I had the DCU Champagne Breakfast on Monday, and Graham and I got a nice photo with Bertie in the Irish Independent and the Metro. As Borat would say: GREAT SUCCESS.
Today I had the project demonstration and I'm fairly confident that I did well in that. I was able to talk about what we did and answer all the questions fairly easily. Then again, why wouldn't I? I did spend months working on this!
Since I'm too tired to write anything now, I will update the blog again in a few days to tie up loose ends. Also, I will put the technical spec online.
Sunday 4 May 2008
04/05/08 - The end is near
So this is it (OK, last Friday was it, but I needed time to recover, heh). The project's code has been submitted alongside the (rather lengthy, in our case) technical specification. The only thing left now is the demonstrations. I will post a final message to this blog after the demonstrations are over. I will also fill in the details I promised in previous posts but never got around to. Updating this blog has been slow going, what with all the assignments over the previous weeks and now exams (and the fact that I got a fairly crap result for this blog, which is somewhat disappointing since I have compared it to blogs that got significantly better grades, some of which have less content and are updated less often than mine. That doesn't exactly help motivation). Stress!
So now that the development aspect is over (details of development can be found elsewhere on this blog, architectural details, manuals and notes can be found in the technical spec and user manual), what have I learnt? What would I do differently next time?
As always, I firmly believe that some up-front design is necessary; without it, the implementation direction is too unstructured and things will eventually go wrong. However, it is also extremely important to be prepared for changes. Requirements will always change. Nothing in software development is static, and this project was probably more dynamic than most, as it was a research project just as much as it was a project to develop an end product. As new things were learnt or discovered, the design was impacted. For this reason, I am a firm believer that agile development methodologies are a necessity for dynamic software development. Unfortunately, I admit that I did not follow a well-defined agile development plan for this project. Lesson learnt.
I also learnt the importance of a flexible, decentralised communication model between components. In this project, we achieved this by separating each piece of functionality out into an (almost standalone) component which communicates with other components using a textual commandset over TCP/IP. This gave us enormous flexibility to add or remove components as we saw necessary, to redirect, intercept or monitor commands, and to distribute the framework across a number of computers - that last capability alone made this design decision worthwhile, especially when we realised that AudioD would need to run on Microsoft Windows to take advantage of hardware-accelerated HRTF, while the rest of the framework was designed to run on Linux.
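To give a feel for what this looks like in practice, here is a minimal sketch of a component in that style (illustrative only - the command names and port number are made up, and the real components are written in a mix of languages, not just Python):
import SocketServer

class ComponentHandler(SocketServer.StreamRequestHandler):
    """Handles one client connection; one textual command per line."""

    def handle(self):
        while True:
            line = self.rfile.readline()
            if not line:
                break                        # client disconnected
            command = line.strip()
            if command == "STATUS":          # invented command names
                self.wfile.write("OK running\n")
            elif command.startswith("PLAY "):
                self.wfile.write("OK playing %s\n" % command[5:])
            else:
                self.wfile.write("ERR unknown command\n")

# Any other component (or telnet, when debugging) can connect to this
# port and drive the component using plain text commands.
server = SocketServer.ThreadingTCPServer(("", 9000), ComponentHandler)
server.serve_forever()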
Given those two big lessons learnt, what would I do differently, were I given the opportunity to do this project again, from scratch?
There are a number of things which could be done to vastly improve the quality of this project, but which were out of our reach because of budget and time constraints. Most importantly, higher quality sensors could be bought (we used the absolute cheapest which we could find that would do the job). A lot of accuracy is lost because of the sensors and this accumulates to produce a significant drop in quality. Not nearly enough to render the framework useless, but enough to make it noticeable. I would love to see this framework as it would be with high-quality sensors!
Time and cost aside though, since that is not something we could have done differently, what other improvements could have been made, knowing what I know now?
Probably the biggest change I would make is the overall architecture. It would still be component based. The components would still communicate over TCP/IP. Those were good design decisions and not ones I would want changed. I would, however, define a standardised communications protocol to be used throughout the entire system. I would also implement a software library which manages not only the networking and threading aspects (Graham more or less did this in his C code), but also the protocol and commandset.
That is, every component would expose a number of common commands used for the overall management, configuration and querying of that component. The library would also contain a parsing system, which would parse incoming commands and pass them to the correct parts of the component. This would allow each component to focus entirely on the actual application logic, instead of dealing with maintenance tasks (a rough sketch of what such a library might look like follows the command list below).
The common commandset would consist of commands to:
- Configure which ports are used.
- Terminate, reset or restart the component.
- Query the component for connection information (how many clients are connected).
- Query the component for its current status.
- Query the component for a commandlist.
- Register a monitor or intercept callback: the component would forward all of its input or output (as requested) to the component making the request, either asynchronously to the normal processing/forwarding, or waiting for an "allow" command before proceeding. This would let other components monitor or intercept a component's commands, whether as part of an external tool, for debugging, or to implement some crazy complex features.
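And here is a rough sketch of the kind of shared command-dispatch library I have in mind (purely hypothetical - none of these class or command names exist in the current framework):
class BaseComponent(object):
    """Hypothetical shared base class: parses the common commandset and
    forwards anything it does not recognise to the application logic."""

    def __init__(self):
        # Common commands every component would understand.
        self._commands = {
            "STATUS":   self.cmd_status,
            "COMMANDS": self.cmd_commands,
            "SHUTDOWN": self.cmd_shutdown,
        }

    def register_command(self, name, handler):
        """Components add their application-specific commands here."""
        self._commands[name.upper()] = handler

    def dispatch(self, line):
        parts = line.strip().split()
        if not parts:
            return "ERR empty command"
        handler = self._commands.get(parts[0].upper())
        if handler is None:
            return "ERR unknown command: " + parts[0]
        return handler(parts[1:])

    # Default implementations of the common commands.
    def cmd_status(self, args):
        return "OK running"

    def cmd_commands(self, args):
        return "OK " + " ".join(sorted(self._commands))

    def cmd_shutdown(self, args):
        raise SystemExit
Each daemon would then subclass this, register its own commands and hand incoming lines to dispatch(), keeping the networking and parsing boilerplate in one place.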
Another decision I would revisit is merging Seadog (our message router/state machine) with ASEDIT. This was both a good move and a bad one: good because it allowed me to get more of the needed ASEDIT features implemented in a shorter time, and bad because it polluted ASEDIT's software architecture into a somewhat hackish state, while detracting from the framework's overall flexibility (that flexibility could be regained by creating a custom component, but then you'd simply be reimplementing Seadog).
To overcome the problems Seadog posed, I would make use of the commandset described above - if such a common interface existed, then a lot of the time-constraint issues with maintaining a core component separate from the editors which require its features would be alleviated. This brings me to the final changes:
The centralised message routing component would - unlike Seadog - not really be a message router as such, but rather a node in a more structured architecture than the pure TCP/IP architecture used in the rest of the components. By this I mean that it would not be a central component through which all messages must flow (so that it can route them accordingly), but rather a component which acts as a gateway to a higher-level interface: one written entirely in Python and using serialised Python objects as the communication protocol, instead of simple text commands.
This would mean that an ASEDIT-like editor could just as easily be developed as a sub-component of Seadog, without the issues I faced when I tried doing so. It would also provide a simple means of extending the framework, or of building applications at a higher level than writing "raw" components.
A Python library would, of course, need to be written to encapsulate the common maintenance code involved in this architecture into a simple API.
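To illustrate what I mean by serialised Python objects as the protocol (again just a sketch, not real code - the Message class and its fields are invented), messages could be ordinary picklable objects written to the socket with a length prefix:
import pickle
import socket
import struct

class Message(object):
    """An invented example message type - any picklable object would do."""
    def __init__(self, source, target, payload):
        self.source = source
        self.target = target
        self.payload = payload

def send_message(sock, msg):
    """Pickle the object and prefix it with its length, so the receiver
    knows how many bytes to read."""
    data = pickle.dumps(msg)
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_message(sock):
    """Read one length-prefixed pickled object from the socket."""
    header = _recv_exactly(sock, 4)
    (length,) = struct.unpack("!I", header)
    return pickle.loads(_recv_exactly(sock, length))

def _recv_exactly(sock, n):
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise IOError("connection closed")
        chunks.append(chunk)
        n -= len(chunk)
    return "".join(chunks)
The library would hide these details behind a simple send/receive API, so higher-level applications never deal with sockets or framing directly.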
Using this, I would build a number of applications, which I feel would make this framework infinitely better and more useful:
- An ASEDIT-like editor for defining the environment and placing sound sources around it.
- A graphical configuration tool, which would allow you to "drag and drop" components and connect them together visually - the common configuration commandset would then be leveraged to reconfigure the framework on the fly. This was actually something Graham and I talked about at great length, but it was dropped because of its complexity (though most of Graham's code does actually support this to a degree; a simpler interface for doing so was the missing piece).
- A graphical drag-and-drop editor for creating applications (using custom written python code) which could then be saved and reloaded.
Oh well, what could have been.. always seems simpler in hindsight!
After such a lengthy post on what I would do differently and what I wish I had done... on to something more positive: a self-evaluation of my work (well... mine and Graham's - it's much easier to talk about the framework as a whole than about specific parts, since it's reasonably tightly coupled):
Overall, I am quite satisfied with this project. It does pretty much what we envisioned. We have the ultrasonic object detection and the pathfinding navigation working - the two demo apps which started it all (sort of). We have a powerful and extensible architecture. We have an interesting mix of (custom) hardware and software. We got to play with interesting technologies (Ubisense, 3D audio, sensors). We have a nice mix of programming languages, interacting nicely (Synergy! hah). We have our system distributed over a number of computers - something which our architecture allowed us to achieve.
Overall, I feel this project was a huge, huge success! Given the sheer quantity of work and the many interesting outcomes, I feel that this is certainly the best project I've ever worked on (and I'd compare it to others too, but I hate to brag, though I will say this: few projects show such a large number of technologies working in concert, allow for a distributed computing model, have a reusable component architecture, are easily extended or make use of custom hardware - not all at once, anyway).
I just hope that our examiners see this... After the effort that went into it, I'd certainly not be pleased if some database-driven website (which I could implement in Django over a weekend) were to receive a better grade. I know I'm being mean, but a lot went into this project and I don't want it all to be in vain.
At the end of the day, the only thing I will get from this is the experience (actually, that alone makes it worth it) and the grade (I hope this will make it worth it too).
Hell, I'd love to do further work in this area (or a similar one), but unfortunately I do not see that happening (who would want to finance mine or Graham's crazy ideas, haha). Well, besides doing a PhD - something I intend on doing in maybe two years' time, after I've had a chance to fix my finances a bit.
Well, I guess there's another post on the way at the end of the month. So, stay tuned.
Monday 28 April 2008
22/04/08 - Showing our project to Donal
We finally met up with Donal Fitzpatrick and gave him a little demonstration. He gave us lots of positive feedback, which was certainly encouraging, since this is, to a degree, his area of work.
We had a nice chat about the project and what could be done with it before heading up to take a look at his own work in haptics, something I had not been exposed to before. It was certainly very cool and definitely something I will need to watch closely in the near future.
Placeholder
Placeholder post which I will edit when I get time, to add in details of other things that have happened...
Friday 11 April 2008
11/04/08 - Great Success
Yeah, it's been a while since my last post.. but I said I'd keep updating, so here we are.
Since my last post, I've been working on ASEDIT and meeting with Graham in the Ubisense area in DCU so that we can test everything. The plan was to take a load of measurements and metrics, but... there were problems, as always!
The problem that I spent the most time on was the "region editor" mode of ASEDIT. This mode allows the user to draw a shape which represents the Ubisense-equipped region of a room - the region where the application is to run - and to assign Ubisense coordinates to each corner. The idea was that the coordinates from Ubisense could be converted to screen coordinates, so that I can draw the user's location accurately within the shape representing the room. Also, when a sound source is placed in the room, the position of the sound source needs to be converted from screen coordinates to Ubisense coordinates.
Well... I thought this would be straightforward enough, but after AGES of struggling I gave up and decided on a simpler version - instead of drawing a potentially odd shape, you can now only draw a rectangle, and it is now guaranteed that the corners are in a certain order (e.g., corner 1 is always the top left-hand corner). Eventually I got it working. Mostly, I think I was making silly little mistakes, but this morning I managed to get it working pretty much perfectly!
Now you can use the editor to place sounds in the room and watch the user move around. It all works.
Graham and I tested it, and we got one of the other guys to test it too, as he was passing by. All three of us were able to successfully locate the position of the sound and walk up to it reasonably easily. Success! What does this mean? That our project works! Well, we knew that already, but now there's no doubt ;-)
Here is the coordinate system conversion code I wrote (to any Python programmer out there, I know it's horrible Python code, not all my code is like this!):
def _convert(self, x, y, to_screen):
    """Convert between Ubisense coordinates and screen coordinates.

    If to_screen is True, (x, y) is assumed to be a point in Ubisense
    space and is returned in screen space; otherwise a point in screen
    space is assumed and it is returned in Ubisense space.
    """
    # Corner coordinates of the region in both systems: corner 0 is the
    # top-left corner; corners 1 and 3 supply the right and bottom edges.
    ubi = self.get_ubisense_coords_cb()
    scr = self._region_corners
    u_left = ubi[0][0]
    u_right = ubi[1][0]
    u_top = ubi[0][1]
    u_bottom = ubi[3][1]
    s_left = scr[0][0]
    s_right = scr[1][0]
    s_top = scr[0][1]
    s_bottom = scr[3][1]
    # Offset of the point from the region's top-left corner
    # (recomputed below if converting to Ubisense space).
    left = x - u_left
    top = y - u_top
    sel_a, sel_b = max, min
    if to_screen:
        offset_x, offset_y = s_left, s_top
    else:
        offset_x, offset_y = u_left, u_top
        sel_a, sel_b = sel_b, sel_a
        left = x - s_left
        top = y - s_top
    # The Ubisense axes may run in the opposite direction to the screen's.
    if u_top > u_bottom:
        if to_screen:
            top = u_top - y
        else:
            offset_y *= -1
        u_top, u_bottom = u_bottom, u_top
    if u_left > u_right:
        if to_screen:
            left = u_left - x
        else:
            offset_x *= -1
        u_left, u_right = u_right, u_left
    # Scale by the ratio of the two regions' sizes, then apply the offset.
    u_width = u_right - u_left
    u_height = u_bottom - u_top
    s_width = s_right - s_left
    s_height = s_bottom - s_top
    x_ratio = sel_a([u_width, s_width]) / sel_b([u_width, s_width])
    y_ratio = sel_a([u_height, s_height]) / sel_b([u_height, s_height])
    nx = left * x_ratio
    ny = top * y_ratio
    return (abs(nx + offset_x), abs(ny + offset_y))
Other things have happened too - I'll write an entry about them later.
Sunday 30 March 2008
30/03/08 - Farewell!
I just took a brief glance at Graham's last post and realised that this is the last time the blog is checked for marking purposes!? That I know of, at least. So I figured I'd best say a few last words!
- Firstly, these won't really be a few last words; I fully intend (even though I hate blogging) to keep posting updates here until the project is fully completed (i.e., until we have to hand everything in). So, should anyone actually be interested, please feel free to continue reading!
- I completely forgot to mention that we decided my application editor should be called ASEDIT (short for Audio Spatial EDITor). It makes sense to us anyway ;-)
- THANKS VERY MUCH to anyone who was reading my blog purely for the interest in the project! I definitely appreciate it. If you would like to drop me some comments, feedback, criticism or anything else, please please leave a comment.
- As Graham said in his post - words, videos and images are not really enough to describe this project. It needs to be experienced. So, if any readers want a chance to be one of the guinea pigs (that is, if you want a go), please let either myself or Graham know. We need a good few people to test this and to tell us how well it works! Leave a comment or drop me an email (d kereten AT g mail DOT com).
- Both myself and Graham see this project as a great success. It was supposed to be a research project, to develop a framework to ease the development of audio- and location-based augmented reality applications. I believe we have managed to do just that. In fact, I hope that by the end of April we will have surpassed that. I hope the people marking our project can see that too; if not, please get in touch, so that we may take action to fix any potential problems (read: aspects which would cause a loss of marks). Thank you very much!
- I feel I learnt a lot through working on this project. If nothing else, I think that the learning experience made it all worthwhile.
- The system we built is crude compared to what it could have been. The hardware we used was the cheapest we could find that would do the job. The software was (or, should I say, still is) written to give sufficient results before the deadline. Unfortunately this means we were a little lax in certain areas: no unit testing, for example. Basically, there are a huge number of improvements which could be made. I feel that myself and Graham could, with sufficient time and money, design and develop a realistically usable augmented reality platform. There is so much more we wanted to add, but had to scrap due to time/money constraints.
30/03/08 - Framework and tools.
Admittedly, I posted the previous post five minutes ago, though it should have been posted on Friday. I got home late on Friday, so I never got a chance, and yesterday... yeah. Anyway, on to today's real post:
I didn't really do much since Friday, mainly because of time constraints - unfortunately I have a lot more needing doing than just the project... I did, however, get to plan out the featureset for the application editor some more, and have also decided upon an architecture for the integration of the application editor and the message router/state machine.
I have decided to merge both programs. If I had more time for the project, I would probably keep them separate, as that allows for greater flexibility in the long run, but since the deadline is fast approaching, I decided it was more important to simply get it running. Merging them also makes it a lot easier to expose functionality between the two - because they are now one and the same! The application editor also benefits from already having the Twisted networking code implemented in the message router, and it now has direct access to the state machine. In the morning, I hope to merge the two and refactor the resulting code to ensure there is no decrease in code quality. I expect this program to be an integral part of the framework: not only the central point for users to access the framework's built-in functionality through a graphical editor, routing messages between the various daemons and handling states and state transitions, but also a platform for Python code to access the framework's features for custom-coded applications which are not possible through the editor alone - and these applications should be able to add tools to the editor, for convenient access!
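For anyone curious what the Twisted side of this looks like, here is a heavily stripped-down sketch of a line-based message router in the same spirit (this is not the actual code, just an illustration of the approach; the port number is made up):
from twisted.internet import reactor
from twisted.internet.protocol import Factory
from twisted.protocols.basic import LineOnlyReceiver

class RouterProtocol(LineOnlyReceiver):
    """One connected component. Every line of text received is forwarded
    to all of the other connected components."""

    def connectionMade(self):
        self.factory.clients.append(self)

    def connectionLost(self, reason):
        self.factory.clients.remove(self)

    def lineReceived(self, line):
        for client in self.factory.clients:
            if client is not self:
                client.sendLine(line)

class RouterFactory(Factory):
    protocol = RouterProtocol

    def __init__(self):
        self.clients = []

reactor.listenTCP(9000, RouterFactory())
reactor.run()
The nice thing about Twisted here is that the reactor handles all the connections in a single thread, which avoids the blocking-socket problems I ran into with my earlier socket code.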
Hopefully I can manage to complete the code merger in a single day, in which case there should be another post here tomorrow. This will leave me to implement features on Tuesday and Wednesday. The goal now is to have all core functionality complete by Thursday, so that we can then build a few simple demonstration applications. I hope the demo apps will prove the framework and application editor to be intuitive and easy to work with. These demo apps can then also be used for our guinea pig testing :-)
If the rest of the project was a success so far, then the demo apps should be trivial to implement. After all, that is the whole point of implementing a framework.
A quick look at the calendar and schedule shows that we are still on track, though care must now be taken to ensure we don't fall behind. We need to have the demonstration applications complete within about a week and a half, meaning the framework needs to be in a useful state before then. That leaves us with a week or two for testing and evaluation and a week for documentation. The schedule should really be revised, since I don't think we are able to do anything after April. As it stands though, I believe we are about where we should be - perhaps a week behind, but nothing we cannot recover from.
At the start, most of the work was done by Graham, since there was little that could be done without the hardware and code that interfaced with said hardware. Now that it's mostly complete, it's my turn to have more work to do. So, now Graham is mostly testing and debugging, while I'm writing a bucketload of code. Almost IBM standards - millions of KLOCS! Ok, maybe not, but I wrote more code last week than I have any other week since the start of the project!
28/03/08 - Meeting with supervisor
Today we had a meeting with Alan to see if we are on track for the second milestone. For the meeting, we prepared a small demo, which showed him all the completed hardware working together with the application editor so that he could see the test environment we intend to use over coming weeks to test and evaluate our project. I'll take a moment to talk about the application editor first.
The application editor, in its current state, is a more powerful form of the compass test program written back in February - you can position sound sources in virtual space and the listener (now represented by a little graphic of a person, instead of simply a dot) gets positioned in such a way that it represents the position and orientation of the user wearing the headset. While the compass test only took orientation into account, the application editor also reacts to Ubisense data. Another difference is that multiple sound sources can be active (i.e., playing sound) at once, and each can be playing a different sound. This is accomplished by giving each sound its own set of properties, which can be adjusted separately. Finally, sound sources can now be repositioned/moved after having been placed, which could not be done in the compass test.
While feature-wise this is not a significant improvement, the infrastructure is now coded to allow more advanced (and useful) features to be implemented over the coming days - per-sound-source property support being the most important, but also the graphical widget for representing the environment. In the compass test code, I was simply drawing to a GTK Drawable widget, while this time I created my own custom widget, derived from both Drawable and the GtkGLExt OpenGL class. This allows me to handle events in a more localised fashion, as well as drawing more complex images. Hopefully I'll have a video online showing the program in action - the still screenshots don't really do it justice. Speaking of screenshots, here's one now:

Over the next few days, I plan to scrap my current socket-based networking code in favour of Twisted. This should make the networking aspect of the program much more robust and flexible. Currently, certain scenarios are somewhat error-prone, due to the use of blocking sockets in a multithreaded GUI application (it needs to be multithreaded so that a blocking socket does not stall the entire GUI, but this causes problems when a thread needs to terminate...). Twisted will solve all of these issues.
Besides the networking code, there are a lot of planned features that still need to be implemented. Currently the program is little more than a testing tool; when complete, I hope it will actually be an application editor capable of allowing one to design and build augmented reality applications with our framework. After all, what is a framework without intuitive tools?
So, that's basically what I've been working on since my last post. Other than that, I fixed some issues with the audio daemon - I added a command to reset the audio environment by removing all sound sources at once. This is needed so that each application can easily ensure it has a clean environment to work in. I also spent a couple of hours with Graham testing the system - that is, wearing the headset, walking around trying to find sounds and ensuring that my application editor actually did what it was intended to. Eventually, everything was working and we could test everything together, though the system still has a few bugs, making the whole thing a little brittle. In fact, during our demonstration to Alan, the application editor crashed... not good! But it was easily restarted and everything was fine. I guess I now need to make sure that doesn't happen again!
So, yes, there are still some bugs which need to be fixed, but overall, everything seems to be working. I guess that puts us on track.
To recap, what I want to (or need to) do next is:
- Debugging! I certainly don't want anything to crash during demo day.
- Adding more features to the application editor. I posted a "TODO" list a few days ago, and, even though I had hoped to have it completed by the end of the week, this didn't happen.
- Testing testing testing! Myself, Graham and Alan agreed that it would be beneficial to find some guinea pigs to test our project, so that we can evaluate and report on our findings, with respect to sound localisation, the usefulness of our hardware and the effectiveness of our techniques.
- Documentation. It's a large project with many aspects. Everything needs to be thoroughly documented before the project deadline, which is now approaching faster than we would like.
Wednesday 26 March 2008
26/03/08 - Application Editor
Currently, I am working on an editor which will allow you to position multiple potentially moving sound sources around the listener and set a number of properties for each. This will be similar to the demo program I wrote, except that it will provide a lot more functionality.
Each sound source will have a number of associated properties which can be edited. So far the planned properties are (a rough sketch of how these might be represented follows the list):
- Position
- Sound file to play
- Duration
- Which state the user must be in for the sound to play
- Radius
- State to change to if within radius
- Path along which the sound will move
- Speed of movement along path
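To make that a little more concrete, the sort of per-sound-source record I have in mind looks roughly like this (a sketch only - the final property names and defaults may well differ):
class SoundSource(object):
    """Planned per-sound-source properties (sketch; names may change)."""

    def __init__(self, position, sound_file):
        self.position = position        # (x, y) in Ubisense coordinates
        self.sound_file = sound_file    # path of the sound file to play
        self.duration = None            # how long the sound plays for
        self.required_state = None      # state the user must be in to hear it
        self.radius = 1.0               # trigger radius around the source
        self.next_state = None          # state to change to when within the radius
        self.path = []                  # list of points the source moves along
        self.speed = 0.0                # movement speed along the path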
This should help us to not only test the hardware better, but also to develop some simple, yet rich, demo applications, such as the virtual zoo or band.
This program will be used to either drive the router/state machine program written a few months back (after it is updated, later this week), or it will be integrated into it. I have yet to decide which approach I will take. For the time being, it will be a standalone application (and will not support the state properties until I have it working together with the state machine program, using whichever method I decide upon).
Here is a screenshot of the program so far:

Nothing terribly interesting there yet, besides the basic GUI. Over the next couple of hours I will be coding a custom OpenGL-based GTK widget for editing the sound sources.
The left-hand bar will be the toolbar, containing buttons for each type of action that can be performed. The existing buttons represent (from top to bottom): editing the dimensions of the environment (i.e., setting a rectangular area which represents the Ubisense-enabled area the application will run in), placing and editing sound sources, and running the application (the cogwheels at the bottom).
The black area in the middle is the OpenGL-enabled canvas. This is where all the magic will happen ;-)
The area to the right will contain a list of properties for the currently selected sound source.
The plan is to have the basic version of this working tonight and then to test it in DCU tomorrow morning. If all goes well, I will be adding the state properties, and whatever else I think of, to the program over Friday and the weekend. The hope is that it will be completed on or before Monday - before, if at all possible.
Ok, best get back to work :-)
Tuesday 25 March 2008
25/03/08 - Plans for this week
I figured I may as well post my plans for this week. It will help me remember what I want to do and provide me with a checklist, as well as showing whoever may be reading my blog what I intend to accomplish over the coming days.
- The audio feedback component needs some more work. Before the weekend, I'd very much like it to be more or less complete, which means I need to implement the ability to add sounds that decay over a given time, fading in volume until they become inaudible, at which point they are removed. The ability to reset a sound's timeout should also exist. This would allow for some interesting applications - one use for this which Graham and I have discussed is to have objects detected by the ultrasonic sensor decay over time: as there is no easy way to determine whether an object is static or moving, storing its position indefinitely would cause inaccurate results, but having it decay provides a useful representation to the user (there is a small sketch of this idea after the list).
- I want to refactor the router application to accommodate the advances made in the project so far. It is now time that the router/state machine application can be used (and indeed, it should be quite useful now), but its current state does not really reflect the rest of the project. I hope to have this completed and working by Friday, so that it may be used to implement the demo applications we have planned.
- Tools. I have a number of tools planned which would be used to configure the framework and develop applications. The tools are what I was hinting at in previous posts - one (or more, depending on how successful I am) of these tools will have a nice, pretty drag'n'drop interface, which I will write in Python and PyGTK using OpenGL. If all goes to plan, this should make using the framework much, much easier than the current voodoo needed to make everything work.
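To illustrate the decay idea from the first point above (just a sketch - the real audio component works in terms of FMOD sound sources, and the class and method names here are invented):
import time

class DecayingSound(object):
    """A sound source whose volume fades out over `lifetime` seconds and
    which should be removed once it becomes inaudible. Resetting the
    timeout (e.g. when the ultrasonic sensor sees the object again)
    makes it fully audible once more."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.reset_timeout()

    def reset_timeout(self):
        self._created = time.time()

    def volume(self):
        """1.0 when fresh, fading linearly down to 0.0 at the end of life."""
        age = time.time() - self._created
        return max(0.0, 1.0 - age / self.lifetime)

    def expired(self):
        return self.volume() <= 0.0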
Thursday 20 March 2008
20/03/08 - Neglecting to update blog
It is waaaay too easy to neglect the blog.. I really am not a blog person. GRR BLOGS...
Now that I have that out of my system, onto the real post.
Graham has posted about the hardware problems we've had, so I won't say any more about it now. Instead I'll write about the coding I've done.
Since my last post, I was still sick, so the Friday demo plan was completely ruined. After that, I ported my FMOD code to Windows, since FMOD cannot use hardware acceleration under Linux... boo! This mainly just involved minor editing of the socket code to make it work under Windows, and removing the Linux-specific headers. The FMOD code itself did not need any changes, besides changing the initialization flags to tell it to use hardware acceleration. Simple. Easy. Great.
... Except there were problems. There are always problems. Basically, on Windows, FMOD only supports the C++ API when compiling with Microsoft Visual C++. I wanted to compile using MinGW, because that's what I've always used on Windows, and since I use GCC on Linux, I figured why not use it on Windows too. To cut a long story short, I downloaded Visual Studio 2008 Express and tried to compile with that. Endless hassle. If I had used Visual Studio before, I'm sure I could have got it working, but it was taking longer than the time I had available, with no success. At first it kept trying to compile into managed .NET code; eventually I found the option to disable that so that it would produce unmanaged native code, though even then I couldn't get it compiling without problems. So, instead of spending yet more time trying to fix it, I decided to scrap the FMOD C++ API entirely and use the C API instead (which is supported by all compilers, since the C ABI is standard and the $*%& C++ ABI is not).
Luckily, porting FMOD from the C++ API to the C API was extremely easy and painless and now everything works. Yay.
Graham and I also ported the C# Ubisense code which Lorcan Coyle sent us to the new Ubisense 2.0 API. This wasn't difficult, but it took a while to match the old functions to the new ones, as the API seems to have been reorganized quite a bit. The documentation wasn't terribly useful in teaching you how to use the API, but it served as a good reference and the Ubisense code works now too. Success!
I have also been working on updating my demo app to allow for multiple sound sources to be playing at once and also to allow the listener to move around - controlled by Ubisense. This will allow us to test the complete system, once the hardware issues have been overcome.
Finally, I have been playing with GtkGLExt in PyGTK. I can already create a GUI using Python and PyGTK (the demo app, for example), but this also allows me to draw onto GTK widgets using OpenGL. This will be useful for some tools I am slowly working on, since they have a visual component which would be a lot easier to implement (and prettier) using OpenGL rather than GTK's native drawing functions. I should be ready to post about the tools about a week from now.
So, now our project contains some C, C++, PIC assembly, PIC BASIC, C# and Python code. Interesting how a nice TCP/IP-based modular design allows a nice mix of languages ;-)
Wednesday 5 March 2008
05/03/08 - Quick Update
Ok. It has been a while since my last post. Last week I was away, attending the Irish Web Technology Conference (thanks to the folks over at Python Ireland for the free ticket) and this week I've been sick.. Hopefully tomorrow I'll be better so as to get back to work, since our hardware has arrived. That leads me on to the first part of my update:
Hardware has arrived
As mentioned in a previous post, Alan has been kind enough to purchase a soundcard and wireless headphones for us. They arrived sometime yesterday evening, so we are keen to test them out. Graham has already modified our headset to incorporate the wireless headphones.
Ubisense is ready
On a related note, the Ubisense system is finally ready and we have received a Ubisense tag from Kirk Zhang, one of the techies here in the CDVP. Graham emailed Lorcan yesterday and he has kindly provided us with their C# code, which simply transmits the location data from the Ubisense system over TCP/IP. This fits in well with the rest of our system, as all our components communicate over TCP/IP.
Tomorrow morning, I plan to test the hardware-accelerated HRTF. Hopefully we will now have much improved 3D sound localization. Besides that, I will modify my test application so that the listener's position can be controlled by the Ubisense system. Once we have the Ubisense system working, the hardware side of our project will be more or less ready and we can work on demonstration applications and flashy development tools to ease the use of our framework.
Besides this, over the next week I plan on revising the message router code which I wrote before Christmas, and then designing an easy to use, event-driven API to allow applications to be written in Python. The foundation for this already exists; it just needs to be cleaned and refined (and the actual features of the other components need to be exposed to the applications). This will mean that applications can be developed in a single, unified place, rather than having to manually connect everything together via TCP/IP (though that option will still exist, just in case it is needed).
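To give a feel for the kind of event-driven API I am aiming for (entirely hypothetical at this point - none of these names exist yet), the core of it could be as simple as registering callbacks against named framework events:
class Application(object):
    """Hypothetical sketch of the planned event-driven application API."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        """Register a callback for a named framework event."""
        self._handlers.setdefault(event, []).append(handler)

    def fire(self, event, *args):
        """Would be called internally when a daemon sends us an event."""
        for handler in self._handlers.get(event, []):
            handler(*args)

def print_position(x, y):
    print "user moved to", x, y

app = Application()
app.on("position", print_position)
app.fire("position", 1.5, 2.0)    # simulate an incoming Ubisense update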
Finally, I want to develop a graphical configuration tool which would act as a convenient (and user-friendly) means of configuring and setting up the framework and applications. Graham knows what I mean, since we have discussed it in great detail, but I won't post more about it until I've started work on it - it's easier for me to write about something as I'm actually working on it.
Thursday 21 February 2008
21/02/08 - Progress Report
The schedule, as outlined in the Functional Specification, is something like this:
- Receive required hardware: October 2007 - January 2008
- Interface sensors to the XBee: November 2007 - January 2008
- Construct headset unit: December 2007 - March 2008
- Daemon/components to gather sensor data: January 2008 - March 2008
- Write audio feedback daemon: November 2007 - February 2008
- Message router/state machine: November 2007 - March 2008
- Demonstration applications: March 2008 - April 2008
- Improvements and tuning: April 2008
- System documentation: April 2008 - May 2008
And here is the current status of each item:
- Receive required hardware: The originally planned hardware has arrived. We are now waiting on the new hardware (soundcard and wireless headphones).
- Interface sensors to the XBee: Graham has completed this, see his blog for details.
- Construct headset unit: Graham has completed this, see his blog for details.
- Daemon/components to gather sensor data: Graham has completed this, see his blog for details.
- Write audio feedback daemon: I did this in January and February. More details below.
- Message router/state machine: I wrote this in November and December. It's approx. 85% complete, but will need some editing to reflect design changes made to other aspects. There are more details in this post.
- Demonstration applications: The original intention was to write the demo applications when the framework is complete - obviously, we have found that writing them as we progress, in order to test each new component, is much better. The first demo application, described in posts here and here, has been written (February) and will continue to evolve over the next week or so. This puts us ahead of schedule in this area.
- Improvements and tuning: We have both been improving things as we think of ways to do so (Graham has been tuning the hardware to get more accurate readings, while I have mostly been researching methods of producing better 3D audio output in FMOD), though the real work in this area will begin once the framework is fully constructed and tested.
- System documentation: No work has been done in this area yet.
Details on the audio component:
I started looking into this in mid to late December. At that point, I was still trying to decide which audio API to use. The choice was between OpenAL and FMOD.
Back then, I was attempting to write this component in Python, since it is a much cleaner and nicer language to program in; however, I had no success getting the OpenAL or FMOD bindings for Python to work.
Around Christmas, with still no success getting either of these APIs to work in Python (and no luck finding any other audio libraries for Python which could produce 3D audio), I decided to write this component in C++ instead. I briefly thought about writing my own Python bindings, but ultimately it didn't seem worth the extra work.
At the beginning of January, before the exams, I wrote a test program in C++ using OpenAL. It worked, but the code was not very nice and the program was brittle and difficult to manage. This is when I decided to use FMOD instead - it has a higher level API, a superior featureset and is very well supported.
No further work went into this until after the exams, but ever since I started writing the FMOD version, at the beginning of February, I have been constantly tuning the program and adding new features. When the first demo application was working (both the hardware and the GUI app), the audio component supported only a single sound source, which could be moved around 3D space using text commands over TCP/IP. The listener could also be moved and rotated to face any direction.
Within the next week, I added support for multiple sound sources (which could be created and destroyed at runtime, again, using the text commands over TCP/IP). Besides some minor tuning (mainly cleaning up the code), this is the current state of the program. Currently, it consists of 462 lines of C++ and I see this growing to over 1K before the program is fully complete.
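Just to make the text-command interface more concrete, a client session with the audio component looks roughly like the transcript below. The command names and argument order here are invented for illustration (the real command set is still being tuned), but the idea is that each line creates, moves or destroys a sound source, or moves and orients the listener:

    CREATE_SOURCE 1 chime.wav
    MOVE_SOURCE 1 2.0 0.0 -1.5
    MOVE_LISTENER 0.0 0.0 0.0
    SET_FORWARD 0.0 0.0 1.0
    DESTROY_SOURCE 1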
Wednesday 20 February 2008
20/02/08 - Research into sound cards
Yesterday and today, I was researching exactly which sound cards have hardware accelerated HRTF support (and making sure that FMOD actually supports it). In the process, I came across a project at Carnegie Mellon University which aims to do some very interesting things with 3D sound. The basic idea is (taken verbatim from their website):
"The goal of our project is to demonstrate that audio can successfully be the primary element of interactive entertainment. Through the use of 3D trackers, headphones, and a game engine, we plan to create an immersive experience that is unique and demonstrates both the creative potential and emotional power of an audio experience."
Sounds kinda familiar ;-) Definitely something to keep an eye on anyway!
While looking for a soundcard with good 3D support (specifically, for use with FMOD), as well as any other information on 3D sound, or methods of improving the sound quality, I've been digging through a number of forum and blog posts, especially on the FMOD support forums.
- This thread on the FMOD forums shows that, from the poster's personal experience, the Creative X-Fi XtremeMusic soundcards perform significantly better than the older Creative Audigy soundcards.
- This article talks about 3D sound for games and has a lot of useful information regarding soundcard technology, HRTF, sound occlusion/obstruction, volumetric sound sources and different audio APIs (FMOD included).
- Wikipedia article on the Creative X-Fi cards. According to the article, the Creative X-Fi Xtreme Audio cards do not contain the new EMU20K1 chip, meaning they do not support 3D sound in hardware! (Citations: here and here).
- Thread on the FMOD forums outlining the recommended startup sequence. Not related to any specific soundcard, but covers some potential problems and solutions - this may become an issue when switching from software mixed 3D sounds to hardware sound.
The Xtreme Gamer soundcard (besides being marketed as a gaming soundcard) seems to be the cheapest of the X-Fi soundcards that has the features we need. The Xtreme Gamer Fatal1ty Pro soundcard is the next one up, which potentially has better performance and/or quality (though not guaranteed!).
So here's the summary:
- The Creative X-Fi Xtreme Gamer soundcard (komplett.ie product page) - €89
- The Creative X-Fi Xtreme Gamer Fatal1ty Pro soundcard (komplett.ie product page) - €129
Well, that's enough research into soundcards, I think. I certainly can't wait to try the headset with improved 3D sound when the soundcard arrives!
Monday 18 February 2008
15/02/08 - Meeting with supervisor
Today myself and Graham had a meeting with our supervisor, Prof. Alan Smeaton, in order to show him what we have done so far and discuss the direction of our project. We also used him as a guinea pig to test our demo app - he was able to localize the sounds very quickly, so this proves that the sound localization works!
Alan had some good news for us too - DCU will be installing their Ubisense soon!! We hope that we will already have our Ubisense component working, before this is done, so that we can begin testing our framework with the Ubisense as soon as it is installed.
He also told us about a project which Donal Fitzpatrick and his postdocs are working on: a virtual white cane, which would use tactile feedback to simulate an environment as it would appear to someone using such a cane. Our project has great potential for a future tie-in with theirs: we don't rely on anything visual, we will already have location tracking implemented (which their project will also require), and our sensors (ultrasonic, compass, accelerometer/gyroscope) and audio feedback system could be useful to them as well. It's always nice to discover uses for a project that may not have been thought of at the start!
For the near future, we intend to meet with Donal Fitzpatrick to gain some valuable feedback on how we could improve our system for use as a vision-free navigation aid, and on which sounds we should concentrate on for the greatest effect. Also, as Alan has kindly agreed to obtain a sound card with HRTF support, we hope to be able to significantly improve the quality of the 3D sound localization!
Over the coming weeks, we will also modify the GUI for the demo app to measure the user's performance and generate statistics about how well people can localize sounds, how quickly they adapt and learn, and so forth. This would be a valuable tool for testing our framework (providing some much-needed metrics of how successful the 3D sound system is), as well as for further research into which sounds work best.
Once the Ubisense system is up and running, we will begin incorporating the ultrasonic sensor into the project, so that we can detect physical objects and work on a means of signaling an object's existence to the user using audio.
Once all of the above is complete, it will be time to focus solely on the framework aspect of the project: integrating everything we have so far into the message router application (by tuning the command set each component understands, having the components connect through the router instead of directly to each other, as they currently do, and writing driver Python scripts) and working on a flexible but intuitive API for applications. Also, I want to start developing a graphical configuration tool (as explained in a previous entry) which would allow the user to specify the routing behavior through a visual tool. We will also need to spend some time testing the framework for usability and develop a number of demo apps.
All in all, the project has been moving along quite successfully, with no show-stopper problems (at least, none that we couldn't work around - thanks to UCD allowing us to use their Ubisense). The coming weeks look set to be exceptionally busy!
13/02/08 - Trip to UCD
Today both myself and Graham went to UCD to meet with Dr. Lorcan Coyle regarding Ubisense.
He introduced us to some of the other researchers working with the Ubisense system there and showed us some of the applications they have written for it. Although they were impressive, the applications did highlight some of the Ubisense system's fundamental shortcomings: it really isn't as accurate as advertised (in an ideal environment, a highly calibrated system would be accurate to approx. 15 cm; in reality, in a normal environment, it is accurate to about a meter and is quite jumpy in movement). It is still accurate enough for our purposes, though we will need to play around with it to see whether we can eliminate or smooth out the jumpiness, as it could cause problems.
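One option we might try for the jumpiness (an idea at this stage, not something we have implemented) is to run the raw readings through a simple exponential moving average before passing them on to the rest of the framework. A minimal Python sketch:

    # Minimal sketch of exponential smoothing for jumpy position readings.
    # An alpha closer to 1.0 follows the raw data more closely; closer to 0.0
    # smooths more aggressively. The value here is just a starting guess.
    class PositionSmoother(object):
        def __init__(self, alpha=0.3):
            self.alpha = alpha
            self.smoothed = None

        def update(self, x, y):
            if self.smoothed is None:
                self.smoothed = (x, y)
            else:
                sx, sy = self.smoothed
                self.smoothed = (sx + self.alpha * (x - sx),
                                 sy + self.alpha * (y - sy))
            return self.smoothed

    smoother = PositionSmoother()
    for raw in [(1.0, 2.0), (1.4, 1.9), (0.9, 2.2)]:
        print(smoother.update(*raw))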
We will be returning to UCD next week in order to begin coding the Ubisense component for our framework.
Besides this, we have been looking into soundcards which support hardware accelerated HRTF, in the hope that we could get one to improve the quality of the 3D sound. Initial research shows that the high end Creative X-Fi cards support this, but the low end ones do not. Over the next week, I plan to choose a soundcard with HRTF support (basically, a balance of price and features - but since we are only interested in HRTF and don't care about things like MIDI, this shouldn't be a problem - though, if I had the money, I would love one of those cards!).
08/02/08 - Improvements to the demo app
Over the past few days, I have worked to improve our first demonstration application.
Firstly, our original version, though it worked, had some problems. The compass module and 3D sound aren't as accurate as we would like. We spent a few hours working on the calibration of the compass module and we have it acceptably accurate, but the 3D sound could possibly be improved through more advanced use of HRTF. I will be looking into this over the coming days.
I have written a GUI tool (in Python) which allows us to place sounds around the listener by simply clicking on the GUI, to select which sound is audible and to cycle through the sounds. It also automatically cycles to the next sound if you look at the current one (that is, you look in the direction the sound appears to be coming from) for five seconds.

The green dot represents the position of the listener (currently this is static, but when we have the Ubisense working, it should move around as the user walks around). The black line represents the direction the listener is facing and is driven by the digital compass.
The grey dots represent sounds earlier in the list than the current one, the blue dots represent sounds yet to come, and the red dot is the currently audible sound. Over the coming weeks, I intend to extend this program (and the FMOD audio daemon) to allow multiple different sounds to play in different positions at once, as well as allowing the Ubisense to drive the position of the listener.
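For the record, the automatic cycling mentioned above boils down to checking whether the angle between the direction the listener is facing and the bearing of the current sound stays within some tolerance for five seconds. Roughly like this (the tolerance value and the function names are my own for this sketch, not the demo app's actual code):

    import math, time

    LOOK_TOLERANCE_DEG = 15.0   # assumed tolerance around the sound's bearing
    LOOK_DURATION = 5.0         # seconds the user must keep looking

    def bearing_to(listener_pos, sound_pos):
        dx = sound_pos[0] - listener_pos[0]
        dy = sound_pos[1] - listener_pos[1]
        return math.degrees(math.atan2(dx, dy)) % 360.0

    def angle_diff(a, b):
        # Smallest absolute difference between two headings, in degrees.
        return abs((a - b + 180.0) % 360.0 - 180.0)

    class LookAtDetector(object):
        def __init__(self):
            self.started = None

        def update(self, heading_deg, listener_pos, sound_pos):
            # Returns True once the user has faced the sound for LOOK_DURATION.
            if angle_diff(heading_deg, bearing_to(listener_pos, sound_pos)) <= LOOK_TOLERANCE_DEG:
                if self.started is None:
                    self.started = time.time()
                return time.time() - self.started >= LOOK_DURATION
            self.started = None
            return False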
Myself and Graham have been in contact with Dr. Lorcan Coyle from the UCD Complex & Adaptive Systems Lab regarding the use of their Ubisense system, and he has confirmed that it is ok for us to do so. We will be visiting him in UCD next Wednesday (the 13th).
04/02/08 - First Demo App!
Myself and Graham (Graham on the hardware and the software that communicates with it, myself on the audio) have now managed our first major success!
We can now play a sound in 3D, and turning your head no longer affects the apparent position of the sound.
Why is this significant? Well, if a sound is playing at a certain location in 3D space, it is playing relative to the forward direction in which the headphones are facing. That is, if the sound is coming directly from the right, it is playing only on the right-hand speaker of the headphones. But if I now turn 90 degrees, it is still playing on the right speaker, so the apparent position of the sound has changed! This is obviously not what we want.
Using the digital compass, we can detect the direction in which the user is facing and adjust the forward vector of the listener in the FMOD 3D audio API, so that the 3D sound positions are recalculated to take that direction into account.
And the best part is... IT WORKS!
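For the curious, the adjustment amounts to turning the compass heading into a unit forward vector and handing it to the listener in the audio daemon. A rough Python sketch (the heading convention and the SET_FORWARD command name are assumptions for illustration, not our actual code):

    import math

    def heading_to_forward(heading_deg):
        # Assumes a heading of 0 degrees means "facing along +z" and that
        # headings increase clockwise when viewed from above.
        rad = math.radians(heading_deg)
        return (math.sin(rad), 0.0, math.cos(rad))

    fx, fy, fz = heading_to_forward(90.0)           # facing "east": roughly (1, 0, 0)
    print("SET_FORWARD %.3f %.3f %.3f" % (fx, fy, fz))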
Take a look at Graham's photo of the headset!
We will now be spending a day or two testing this on the lab guinea pigs (err, I mean, people passing through the labs) and also testing different sounds to determine which sounds are easiest to localize in 3D (some sounds appear to be easier to localize than others).
After this, the next step is to combine this with the actual position of the user - that is, adding Ubisense support to the system. But... problem: DCU's Ubisense system has not yet been installed and could take a significant amount of time yet.
Luckily, myself and Graham have some contact with people in UCD, who have had a Ubisense system installed for a few years now. Alan suggested we get in touch with Dr. Lorcan Coyle from the UCD Complex & Adaptive Systems Lab (Myself and Graham also know Lorcan from the Odysseus program, which both of us participated in, in the summer of 2006).
01/02/08 - 3D Sound
Since the last entry, before Christmas, many things have happened.
First of all, all of the hardware components we have ordered have arrived and Graham has constructed the current head unit (see his blog for lots of photos). Also (besides exams :-/) I have the audio feedback system implemented (or, at least, the first version).
Before I move on to discuss this, I will talk a little about 3D sound.
The idea is that, through attenuation, panning and modification of a sound's frequency, sounds can be made to appear to be positioned in 3D space, even though they are merely generated by a pair of stereo headphones. Crude models of 3D sound use only volume and panning to simulate a sound's positional properties. This works for simple left/right localization, but it is extremely difficult (if even possible) for a person to differentiate between sounds coming from in front or behind, and up and down are also exceptionally difficult (or even impossible) to simulate using only volume and panning. FMOD supports this model of 3D sound, but also has a crude software implementation of HRTF, which changes the sound's frequency when the sound is behind the listener, to dampen it. This significantly improves the quality of the 3D-ness of the sound.
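To make the crude volume-and-panning model concrete, here is a rough sketch of how a 2D source position might be reduced to a pan and a gain value. The conventions and the attenuation curve are arbitrary choices for illustration, and note that the result is identical for a source in front of or behind the listener - exactly the front/back ambiguity described above:

    import math

    def crude_pan_and_gain(listener_pos, facing_deg, source_pos):
        # facing_deg: 0 means facing along +y, increasing clockwise (assumed).
        dx = source_pos[0] - listener_pos[0]
        dy = source_pos[1] - listener_pos[1]
        distance = math.hypot(dx, dy)
        bearing = math.degrees(math.atan2(dx, dy))
        relative = (bearing - facing_deg + 180.0) % 360.0 - 180.0
        pan = math.sin(math.radians(relative))      # -1.0 = hard left, +1.0 = hard right
        gain = 1.0 / (1.0 + distance)               # arbitrary inverse-distance falloff
        return pan, gain

    # A source two metres directly to the right of a north-facing listener:
    print(crude_pan_and_gain((0.0, 0.0), 0.0, (2.0, 0.0)))   # pan ~ 1.0, gain ~ 0.33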
As an introduction to 3D sound, a number of 3D soundscape audio clips can be found on YouTube. Two that I liked are below (you will need a pair of stereo headphones for the full effect). They demonstrate the goal, in terms of audio, which myself and Graham are hoping to achieve.
Virtual barber shop:
Various 3D sounds:
HRTF is a powerful means of simulating sounds as a human would hear them if they (the sounds) were positioned in 3D around the listener. The basic idea is to alter the sound's frequencies to simulate the sound waves reflecting and refracting off and around the listener's head and into the ear.
The angle at which the sound enters the ear and the timing of the reflected sounds cause changes in the frequencies actually heard. These changes are used by the brain to position the sound in 3D space.
As everyone's head and ears are different, HRTF will never be able to simulate 3D sound exactly, but it can come close enough to generate convincing 3D audio (similar to the YouTube videos above, though those use binaural recording, which produces vastly superior results but doesn't work with dynamic sounds, like we require).
3D sound using HRTF is important for high quality sound localization, but simulating 3D sound through HRTF is computationally expensive and not feasible to do in software in a realtime system. Luckily, some mid- and high-end soundcards (Creative X-Fi, for example) support hardware HRTF, which should vastly improve the quality of our 3D sound.
Some resources on 3D sound:
- Localization of sound sources (Michigan State University)
- Audio and 3D sound links
- FMOD Website
- Blog Entry about 3D Sound, HRTF and Sound Cards
21/12/07 - Completion of the router
Since the last entry, I have almost completed the routing application. The only thing left to do is to determine exactly which commands the program accepts, and that will have to wait until the rest of the system is in working order.
Basically, what works is:
- Components can connect to the router and identify themselves.
- A Python script can be loaded for each component, effectively acting as a driver for that component. This is used to allow the router to handle component-specific commands.
- The applications can be implemented as a Python script which is loaded and run by the router in an event-driven fashion (the application registers the commands and events it is interested in, and the router notifies the application when these occur) - there is a rough sketch of what such a script might look like after this list.
- The state machine and state transition files are fully working.
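As mentioned in the list above, here is a rough sketch of the shape such an application script might take. The register_handler and send_command helpers (and the event names) are stand-ins I've invented for this example; the real API is exactly what still needs to be pinned down:

    # Hypothetical application script in the event-driven style described above.
    # register_handler and send_command are stubs standing in for whatever API
    # the router eventually exposes; their names and signatures are my invention.
    HANDLERS = {}

    def register_handler(event, func):
        HANDLERS[event] = func

    def send_command(component, command):
        # In the real router this would be forwarded over TCP/IP to the component.
        print("to %s: %s" % (component, command))

    def on_position(x, y, z):
        # Keep sound source 1 at the user's position (arbitrary example).
        send_command("audio", "MOVE_SOURCE 1 %.2f %.2f %.2f" % (x, y, z))

    register_handler("ubisense.position", on_position)

    # Simulate the router dispatching a single position event:
    HANDLERS["ubisense.position"](1.0, 0.0, 2.5)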
I have also started looking at both OpenAL and FMOD, in preparation for writing the audio feedback daemon. It looks like FMOD is better supported, easier to use and more feature rich than OpenAL is, so I will use that to implement the audio server.
I have also been looking into HRTF and other means of improving the sound localization. More work will need to be done on this when the audio component is capable of generating 3D positional sound.
04/12/07 - Functional Specification
Today we submitted our functional specification, which can be obtained here. Last week, we met with our supervisor, Alan Smeaton, to discuss the direction of our project, as described in the functional specification.
Graham also received some more of the hardware components and has started constructing the headset. In the meantime, I am working on the central router application.
The image shows how my router program will interact with the rest of the system. Put simply, each component will, at some point, pass its data to the router program. The application can then specify how it wants to route the data through the system (should the user's position be passed to the audio system? Will the ultrasonic data generate sound? etc.). Eventually, I intend to write a graphical tool where you simply drag and drop components and connect them together with lines to route the information between them, as this is, in my opinion, the most user-friendly way to configure the system.
As well as the routing capabilities, I have created a simple state machine. A configuration file containing a list of state transitions can be loaded by the application, and events triggered (or even the routing of components) can depend on the current state. Changing state when the user is within a certain radius of specified locations is also planned for this part of the program, as we foresee location-based state changes being an integral part of a large number of potential applications. For example, the waypoint demonstration application could be implemented almost entirely using a number of state transitions.
The state transition configuration file is described in more detail in the functional specification. The basic layout is a list of entries (one per line), in the following syntax:
    current_state x y z radius next_state
This means that a transition occurs from state current_state to next_state if the person is within radius of the position (x, y, z). To create the waypoint application, a state transition file would simply be set up to contain a sequential list of state transitions.
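For example (the coordinates and state names here are invented for illustration), a three-waypoint application might use a transition file like this, where coming within one metre of each waypoint advances the state:

    start       2.0  0.0  1.5  1.0  waypoint_1
    waypoint_1  5.0  0.0  1.5  1.0  waypoint_2
    waypoint_2  5.0  0.0  4.0  1.0  finished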
The application also gets notified when a state transition occurs, so that it can perform some extra action when it happens.
09/11/07 - Architecture and first hardware.
Over the previous few weeks (yes, the next few blog entries will be backdated, since I haven't been adding them to my blog as I went along - something I will have to do from now on) we have been ordering the required hardware components, researching 3D sound localization and designing the framework's general architecture.
So far, we have received the ultrasonic sensor and the XBee modules. Graham has some great pictures in his blog. Like this one of us testing the ultrasonic sensor. We have also ordered more parts (digital compass, accelerometer) and Graham will begin constructing the headset as soon as they arrive. The XBee module will be used solely as a wireless serial connection between the headset and host computer.
The framework's planned architecture:
The framework will be built in a highly modular architecture, with each component being completely self contained. The components will communicate with each other through a central message router, over TCP/IP. Any complex subsystem will be implemented in a separate component. This allows us to easily add new components or remove old ones, change how the messages are routed between them, monitor the messages or control the components through external GUI programs.
The applications can use the framework in two ways:
- They can be implemented as a completely separate program (or set of programs) which communicates with the framework over TCP/IP. It would, effectively, act the same as any other component in the framework and have the full power of the framework at its disposal.
- They can be implemented as an event-based Python script, executed as part of the message router. How flexible or powerful this method is will depend on the API developed, but an interface to send and receive commands to/from the router (and therefore to and from the rest of the framework) will exist, making it almost as powerful as the alternative option. This method would be able to make full use of the router's integrated state machine (more on this in a later post) and is highly event-based, making it much easier to develop applications with. Python is also a very convenient high-level language.
Introduction
Though I have been keeping track of progress, I haven't put it in my blog before now, so adding everything to the blog is in order. I guess I have a natural aversion to keeping blogs... Oh well, time to get over it... :-/
I guess an introduction to the project is as good a place to start as any.
My (and Graham's) project is to build an SDK or framework which would allow users to easily build Augmented Reality applications.
This means we are developing hardware and software components which can be used either together or in isolation (depending on the application being built) to handle some aspect of an Augmented Reality application, tools to configure and control these components, and an API to build custom components which interact with the "stock" components we are developing. We will also be developing some sample applications to demonstrate the use of our framework.
The hardware/software components collectively would allow us to create virtual environments, inside a real physical space. The sensors would provide the system with a stream of input which would then be processed in some application-dependent way to produce audio feedback back to the user.
- The Ubisense tags would allow the framework to know where in the physical environment the user is.
- The digital compass and accelerometer will let the framework know which direction the user is facing. The importance of this will be discussed in a later post.
- The ultrasonic sensor would act to detect physical objects which may be in the person's way.
- The wireless headphones would provide the user with audio feedback. The audio will be 3D sound (see also here), generated with the FMOD audio API.
The planned components are:
- A headset consisting of a number of sensors (digital compass, accelerometer, ultrasonic, Ubisense tag) and a pair of wireless headphones. This will be the basic interface through which an end user would interact with the applications developed using this framework.
- Software components to match the various hardware components. To keep the design of both the software and the hardware modular, instead of controlling the hardware through a single monolithic piece of software, each distinct hardware component will receive its own software daemon to monitor and control it. Roughly speaking, this means there will be a software component for handling the digital compass, the ultrasonic sensor, the Ubisense and 3D sound generation.
- A central hub which controls the various components. This program would act as a router between all the other parts of the framework and allow for a central place to configure and manage how components are to interact.
- Monitoring and management tools. There should be a set of generic tools to monitor the state of the system at any given time, as well as to manage (and possibly recalibrate?) the system through a GUI. These would register themselves with the routing application to receive the commands which they are monitoring.
Each component will interact through TCP/IP, allowing the framework to be restructured simply through the routing application, at runtime. This also allows for the possibility of running different parts of the system on different computers. This could be useful to spread computing intensive simulations out over a number of computers for better performance.
Some ideas for demonstration applications:
- A sound localization test program which would play sounds in a number of different locations and test whether the user can determine "where" in the virtual space the sound is coming from, possibly by simply looking at it for a number of seconds.
- A simple waypoint-based navigation system where the user must navigate through a set of waypoints using only audio feedback to navigate.
The project also has some interesting potential future uses:
- Helping blind people navigate
- Improving the audio aspect of augmented reality (there has been a lot of work done in mixing the real and virtual visually, but 3D audio has not been explored as much as it should be)
- Augmented Reality computer games
- Since we are only using audio feedback (and nothing significantly visual), perhaps this could be developed into a set of computer games for blind people?
- It could be combined with traditional Augmented Reality (for example, by adding a head mounted display), perhaps creating a more realistic and immersive (audio-wise) version of ARQuake.