Wednesday 28 May 2008

And it's all over!
I had the DCU Champagne Breakfast on Monday and myself and Graham got a nice photo with Bertie in the Irish Independent and the Metro. As Borat would say: GREAT SUCCESS.
Today I had the project demonstration and I'm fairly confident that I did well in that. I was able to talk about what we did and answer all the questions fairly easily. Then again, why wouldn't I? I did spend months working on this!

Since I'm too tired to write anything now, I will update the blog again in a few days to tie up loose ends. Also, I will put the technical spec online.

Sunday 4 May 2008

04/05/08 - The end is near

So this is it (OK, last Friday was it, but I needed time to recover, heh). The project's code has been submitted alongside the (rather lengthy, in our case) technical specification. The only thing left now is the demonstrations. I will post a final message to this blog after the demonstrations are over, and I will also fill in the details I promised in previous posts but never got around to. Updating this blog is slow going, what with all the assignments over the previous weeks and now exams (and the fact that I got a fairly crap result for this blog, which is somewhat disappointing since I have compared it to blogs which got significantly better grades despite having less content and fewer updates than mine. Doesn't exactly help motivation). Stress!

So now that the development aspect is over (details of development can be found elsewhere on this blog, architectural details, manuals and notes can be found in the technical spec and user manual), what have I learnt? What would I do differently next time?

As always, I firmly believe that some up-front design is necessary, as without it, the implementation direction is too unstructured and things will eventually go wrong. However, it is also extremely important to be prepared for changes. Requirements will always change. Nothing in software development is static, and this project was probably more dynamic than most, as it was a research project just as much as it was a project to develop an end product. As new things were learnt or discovered, the design was impacted. For this reason, I am a firm believer that agile development methodologies are a necessity for dynamic software development. Unfortunately, I admit that I did not follow a well-defined agile development plan for this project. Lesson learnt.

I also learnt the importance of a flexible, decentralised communication model between components. In this project, we achieved this by separating each piece of functionality out into an (almost standalone) component which communicates with other components using a textual commandset over TCP/IP. This gave us enormous flexibility to add or remove components as we saw necessary, to redirect, intercept or monitor commands, and to distribute the framework across a number of computers - that last capability alone made this design decision worthwhile, especially when we realised that AudioD would need to run on Microsoft Windows to take advantage of hardware-accelerated HRTF, while the rest of the framework was designed to run on Linux.
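
To give a flavour of just how thin this kind of component boundary is, here's a minimal sketch of one component driving another with text commands over a socket. The port number and the command name are made up for illustration - they are not our actual commandset:
import socket

# Minimal sketch of driving one component from another over a textual
# commandset. The host, port and command names here are purely
# illustrative - they are not the framework's real protocol.
def send_command(host, port, command):
    with socket.create_connection((host, port)) as sock:
        sock.sendall((command + "\n").encode("ascii"))
        # Read a single newline-terminated reply from the component.
        reply = sock.makefile().readline()
    return reply.strip()

# e.g. tell a hypothetical audio daemon to move sound source 1:
# send_command("localhost", 9000, "move_source 1 2.5 0.0 1.2")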

Given those two big lessons learnt, what would I do differently, were I given the opportunity to do this project again from scratch?

There are a number of things which could be done to vastly improve the quality of this project, but which were out of our reach because of budget and time constraints. Most importantly, higher quality sensors could be bought (we used the absolute cheapest we could find that would do the job). A lot of accuracy is lost because of the sensors, and this accumulates to produce a significant drop in quality - not nearly enough to render the framework useless, but enough to be noticeable. I would love to see this framework as it would be with high-quality sensors!

Time and cost aside, though, since those are not something we could have changed, what other improvements could have been made, knowing what I know now?

Probably the biggest change I would make is to the overall architecture. It would still be component based, and the components would still communicate over TCP/IP - those were good design decisions and not ones I would want changed. I would, however, define a standardised communication protocol to be used throughout the entire system. I would also implement a software library which manages not only the networking and threading aspects (Graham more or less did this in his C code), but also the protocol and commandset.
That is, every component would contain a number of common commands used for the overall management, configuration and querying of the component. This library would also contain a parsing system, which would parse the commands and pass them to the correct parts of the component. This would allow the components to focus entirely on the actual application logic, instead of dealing with maintenance tasks (a rough sketch of such a library follows the command list below).
The common commandset would consist of commands to:
  • Configure which ports are used.
  • Terminate, reset or restart the component.
  • Query the component for connection information (how many clients are connected).
  • Query the component for its current status.
  • Query the component for a commandlist.
  • Register a monitor or intercept callback (the component would forward all of its input or output (as requested) to the component making the request. This would either be done asynchronously to processing the input/sending the output to its destination, or it would wait for an "allow" command - this would allow other components to monitor or intercept a component's commands, either as part of an external tool, for debugging, or to implement some crazy complex features).
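
To make the idea a bit more concrete, here's the rough sketch of what such a shared library might look like - the command names and method names are hypothetical, not an actual implementation:
class BaseComponent:
    """Sketch of a shared base class that parses the common commandset
    and hands anything else to the component's own logic. The command
    names here are hypothetical."""

    def dispatch(self, line):
        parts = line.strip().split()
        if not parts:
            return ""
        name, args = parts[0], parts[1:]
        common = {
            "status": self.handle_status,
            "commands": self.handle_commands,
            "shutdown": self.handle_shutdown,
        }
        if name in common:
            return common[name](args)
        # Anything outside the common set is component-specific logic.
        return self.handle_command(name, args)

    def handle_status(self, args):
        return "OK"

    def handle_commands(self, args):
        return "status commands shutdown"

    def handle_shutdown(self, args):
        return "shutting down"

    def handle_command(self, name, args):
        raise NotImplementedError("component-specific commands go here")
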
I would also stick to the original design plan - but modify it a little. The reason why I changed focus was time constraints. I felt I did not have the time to follow through with the originally planned architecture of having the message router/state machine application (codenamed Seadog) at the center of the framework and connecting external tools and components to/through it. Instead I merged the existing Seadog code with ASEDIT, because I felt I did not have the time to expose all of the required features to ASEDIT, as they ended up needing to be very tightly coupled.
This was both a good move and a bad one: good because it allowed me to get more of the needed ASEDIT features implemented in a shorter time, and bad because it polluted ASEDIT's software architecture into a somewhat hackish state, while detracting from the framework's overall flexibility (that flexibility could be regained by creating a custom component, but then you'd simply be reimplementing Seadog).
To overcome the problems Seadog posed, I would make use of the commandset described above - if such a common interface existed, then a lot of the time constraint issues with maintaining a separate core component from the editors which require its features would be alleviated. This brings me to the final changes:

The centralised message routing component would, unlike Seadog, not really be a message router as such, but rather a node for a more structured architecture than the pure TCP/IP architecture used in the rest of the components. By this I mean that it would not be a central component through which all messages must flow (so it can route them accordingly), but rather a component which acts as a gateway to a higher-level interface: one written entirely in Python and using serialised Python objects as the communication protocol, instead of simple text commands.
This would mean that an ASEDIT-like editor could just as easily be developed as a sub-component of Seadog, without the issues I faced when I tried doing so. It also means there would be a simple means of extending the framework or building applications at a higher level than building "raw" components.
A Python library would, of course, need to be written to encapsulate the common maintenance code involved in this architecture into a simple API.
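
Purely as a sketch of the idea (the length-prefixed framing and the helper names are invented for illustration), the higher-level protocol could be as simple as pickled Python objects framed over the same TCP connections:
import pickle
import struct

# Illustrative only: length-prefixed pickled objects as the "higher level"
# protocol spoken between the gateway node and Python sub-components.
def send_message(sock, obj):
    payload = pickle.dumps(obj)
    # 4-byte big-endian length header followed by the pickled payload.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_message(sock):
    header = _recv_exact(sock, 4)
    (length,) = struct.unpack("!I", header)
    return pickle.loads(_recv_exact(sock, length))

def _recv_exact(sock, n):
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        data += chunk
    return data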

Using this, I would build a number of applications, which I feel would make this framework infinitely better and more useful:
  • An ASEDIT-like editor for defining the environment and placing sound sources around it.
  • A graphical configuration tool, which would allow you to "drag and drop" components and connect them together visually - the common configuration commandset would then be leveraged to reconfigure the framework on the fly. This was actually something myself and Graham talked about at great length, but it was dropped because of its complexity (though if the interface had allowed a simpler means of doing so... most of Graham's code actually does support this, to a degree).
  • A graphical drag-and-drop editor for creating applications (using custom written python code) which could then be saved and reloaded.
This would make the framework a lot simpler to use and potentially more powerful and useful, for those same reasons. As I'm writing this, I realise that I would not be able to implement all of those things in the given time, so I would probably drop the configuration tool and application builder. They would be damned cool to have, but the rest is more important, as it builds the infrastructure required for everything else. Feature-wise, it would then be more or less the same as what we have now, except that it would be a lot simpler and more convenient to extend in crazy and interesting ways.

Oh well, what could have been.. always seems simpler in hindsight!


After such a lengthy post on what I would do differently and what I wish I had done... on to something more positive: self-evaluation of my work (well... mine and Graham's; it's much easier to talk about the framework as a whole than just specific parts, since it's reasonably tightly coupled):
Overall, I am quite satisfied with this project. It does pretty much what we envisioned. We have the ultrasonic object detection and the pathfinding navigation working - the two demo apps which started it all (sort of). We have a powerful and extensible architecture. We have an interesting mix of (custom) hardware and software. We got to play with interesting technologies (Ubisense, 3D audio, sensors). We have a nice mix of programming languages, interacting nicely (Synergy! hah). We have our system distributed over a number of computers - something which our architecture allowed us to achieve.
Overall, I feel this project was a huge, huge success! Given the sheer quantity of work and the many interesting outcomes, I feel that this is certainly the best project I've ever worked on (and I'd compare it to others too, but I hate to brag, though I will say this: few projects show such a large number of technologies working together, allow for a distributed computing model, have a reusable component architecture, are easily extended or make use of custom hardware - not all at once, anyway).


I just hope that our examiners see this... After the effort that went into this, I'd certainly not be pleased if some database-driven website (which I'd implement in Django over a weekend) were to receive a better grade. I know I'm being mean, but a lot went into this project and I don't want it all to be in vain.
At the end of the day, the only thing I will get from this is the experience (actually, that alone makes it worth it) and the grade (I hope this will make it worth it too).
Hell, I'd love it if I got to do further work in this area (or a similar one), but unfortunately, I do not see that happening (who would want to finance mine or Graham's crazy ideas, haha). Well, besides doing a PhD - something I intend on doing in maybe two years' time, after I've had a chance to fix my finances a bit.


Well, I guess there's another post on the way at the end of the month. So, stay tuned.

Monday 28 April 2008

22/04/08 - Showing our project to Donal

We finally met up with Donal Fitzpatrick and gave him a little demonstration. He gave us lots of positive feedback, which was certainly encouraging, since this is, to a degree, his area of work.
We had a nice chat about the project and what could be done with it before heading up to take a look at his own work in haptics, something I had not been exposed to before. It was certainly very cool and definitely something I will need to watch closely in the near future.

Placeholder

Placeholder post which I will edit when I get time, to add in details of other things that have happened...

Friday 11 April 2008

11/04/08 - Great Success

Yeah, it's been a while since my last post.. but I said I'd keep updating, so here we are.

Since my last post, I've been working on ASEDIT and meeting with Graham in the Ubisense area in DCU so that we can test everything. The plan was to take a load of measurements and metrics, but... there were problems, as always!
The problem that I spent the most time on was the "region editor" mode of ASEDIT. This mode allows the user to draw a shape which represents the Ubisense-equipped region of a room - the region where the application is to run - and to assign Ubisense coordinates to each corner. The idea was that the coordinates from Ubisense could be converted to screen coordinates, so that I can draw the user's location accurately within the shape representing the room. Also, when a sound source is placed in the room, the position of the sound source needs to be converted from screen coordinates to Ubisense coordinates.
Well... I thought this would be straightforward enough, but after AGES of struggling I gave up and decided on a simpler version - instead of drawing a potentially odd shape, you can now only draw a rectangle, and it is now guaranteed that the corners are in a certain order (e.g. corner 1 is always the top left-hand corner). Eventually I got it working. Mostly, I think I was making silly little mistakes, but this morning I managed to get it working pretty much perfectly!

Now you can use the editor to place sounds in the room and watch the user move around. It all works.
Myself and Graham tested it and we got one of the other guys to test it too, as he was passing by. All three of us were able to successfully locate the position of the sound and walk up to it reasonably easily. Success! What does this mean? That our project works! Well, we knew that already, but now there's no doubt ;-)

Here is the coordinate system conversion code I wrote (to any Python programmer out there, I know it's horrible Python code, not all my code is like this!):
def _convert(self, x, y, to_screen):
    ubi = self.get_ubisense_coords_cb()
    scr = self._region_corners

    u_left = ubi[0][0]
    u_right = ubi[1][0]
    u_top = ubi[0][1]
    u_bottom = ubi[3][1]

    s_left = scr[0][0]
    s_right = scr[1][0]
    s_top = scr[0][1]
    s_bottom = scr[3][1]

    left = x - u_left
    top = y - u_top
    sel_a, sel_b = max, min
    if to_screen:
        offset_x, offset_y = s_left, s_top
    else:
        offset_x, offset_y = u_left, u_top
        sel_a, sel_b = sel_b, sel_a
        left = x - s_left
        top = y - s_top

    if u_top > u_bottom:
        if to_screen:
            top = u_top - y
        else:
            offset_y *= -1
        u_top, u_bottom = u_bottom, u_top
    if u_left > u_right:
        if to_screen:
            left = u_left - x
        else:
            offset_x *= -1
        u_left, u_right = u_right, u_left

    u_width = u_right - u_left
    u_height = u_bottom - u_top

    s_width = s_right - s_left
    s_height = s_bottom - s_top

    x_ratio = sel_a([u_width, s_width]) / sel_b([u_width, s_width])
    y_ratio = sel_a([u_height, s_height]) / sel_b([u_height, s_height])

    nx = left * x_ratio
    ny = top * y_ratio

    return (abs(nx + offset_x), abs(ny + offset_y))
If to_screen is True, then the x and y are assumed to represent a point in Ubisense space and the function will return them in screen space, otherwise a point in screen space is assumed and it is returned in Ubisense space.
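
For the curious: with the rectangle simplification in place, the whole thing boils down to a linear remapping between two axis-aligned rectangles. A cleaner equivalent would look something like the following (the ((left, top), (right, bottom)) corner layout is assumed here purely for the sake of the sketch):
# A cleaner linear remapping between two axis-aligned rectangles,
# roughly equivalent to _convert above. Rectangles are assumed to be
# ((left, top), (right, bottom)) tuples - that layout is an assumption
# made for this sketch.
def remap(x, y, src, dst):
    (sl, st), (sr, sb) = src
    (dl, dt), (dr, db) = dst
    nx = dl + (x - sl) / (sr - sl) * (dr - dl)
    ny = dt + (y - st) / (sb - st) * (db - dt)
    return nx, ny

# Ubisense -> screen: remap(ux, uy, ubi_rect, screen_rect)
# Screen -> Ubisense: remap(sx, sy, screen_rect, ubi_rect)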

Other things have happened too - I'll write an entry about them later.

Sunday 30 March 2008

30/03/08 - Farewell!

I just took a brief glance at Graham's last post and I realised that this is the last time the blog is checked for marking purposes!? That I know of, at least. So I figured I'd best say a few last words!

  • Firstly, these won't really be a few last words; I fully intend (even though I hate blogging) to keep posting updates here until the project is fully completed (ie, until we have to hand in everything). So, should anyone actually be interested, please feel free to continue reading!
  • I completely forgot to mention that we decided my application editor should be called ASEDIT (short for Audio Spatial EDITor). It makes sense to us anyway ;-)
  • THANKS VERY MUCH to anyone who was reading my blog purely for the interest in the project! I definitely appreciate it. If you would like to drop me some comments, feedback, criticism or anything else, please please leave a comment.
  • As Graham said in his post - words, videos and images are not really enough to describe this project. It needs to be experienced. So, if any readers want a chance to be one of the guinea pigs (that is, if you want a go), please let either myself or Graham know. We need a good few people to test this and to tell us how well it works! Leave a comment or drop me an email (d kereten AT g mail DOT com).
  • Both myself and Graham see this project as a great success. It was supposed to be a research project, to develop a framework to ease the development of audio and location based augmented reality applications. I believe we have managed to do just that. In fact, I hope that by the end of April we will have surpassed that. I hope the people marking our project can see that too; if not, please get in touch, so that we may take action to fix any potential problems (read: aspects which would cause a loss of marks). Thank you very much!
  • I feel I learnt a lot through working on this project. If nothing else, I think that the learning experience made it all worthwhile.
  • The system we built is crude compared to what it could have been. The hardware we used was the cheapest we could find that would do the job. The software was (or, should I say, still is) written to give sufficient results before the deadline. Unfortunately this means we were a little lax in certain areas: no unit testing, for example. Basically, there are a huge number of improvements which could be made. I feel myself and Graham could, with sufficient time and money, design and develop a realistically usable augmented reality platform. There is so much more we wanted to add, but had to scrap due to time/money constraints.
All in all, I enjoyed working on this project and feel it was an astounding success! I hope to be able to post more success stories over the last few weeks of the project. Check back in a few days, more is bound to have happened!

30/03/08 - Framework and tools.

Admittedly, I posted the previous post five minutes ago, though it should have been posted on Friday. I got home late on Friday, so never got a chance, and yesterday... yeah. Anyway, on to today's real post:

I didn't really do much since Friday, mainly because of time constraints - unfortunately I have a lot more that needs doing than just the project... I did, however, get to plan out the featureset for the application editor some more and have also decided upon an architecture for the integration of the application editor and the message router/state machine.
I have decided to merge both programs. If I had more time for the project, I would probably keep them separate, as that allows for greater flexibility in the long run, but since the deadline is fast approaching, I decided it was more important to simply get it running. Merging them also makes it a lot easier to expose functionality between the two - because they are now one and the same! The application editor also benefits from already having the Twisted networking code implemented in the message router, and it now has direct access to the state machine. In the morning, I hope to merge the two and refactor the resulting code to ensure there is no decrease in code quality. I expect this program to be an integral part of the framework: not only being the central point for users to access the framework's built-in functionality through a graphical editor, routing messages between the various daemons and handling states and state transitions, but also providing a platform for Python code to access the framework's features for custom-coded applications which are not possible through the editor alone - and these applications should be able to add tools to the editor, for convenient access!
Hopefully I can manage to complete the code merger in a single day, in which case there should be another post here tomorrow. This will leave me to implement features on tuesday and wednesday. The goal now is that any core functionality is complete for Thursday, so that we can then build a few simple demonstration applications. I hope the demo apps will prove the framework and application editor to be intuitive and easy to work with. These demo apps can then be also used for our guinea pig testing :-)

If the rest of the project was a success so far, then the demo apps should be trivial to implement. After all, that is the whole point of implementing a framework.


A quick look at the calendar and schedule shows that we are still on track, though care must now be taken to ensure we don't fall behind. We need to have the demonstration applications complete within about a week and a half, meaning the framework needs to be in a useful state before then. That leaves us with a week or two for testing and evaluation and a week for documentation. The schedule should really be revised, since I don't think we are able to do anything after April. As it stands though, I believe we are about where we should be - perhaps a week behind, but nothing we cannot recover from.
At the start, most of the work was done by Graham, since there was little that could be done without the hardware and the code that interfaced with said hardware. Now that that's mostly complete, it's my turn to have more work to do. So, now Graham is mostly testing and debugging, while I'm writing a bucketload of code. Almost IBM standards - millions of KLOCs! OK, maybe not, but I wrote more code last week than in any other week since the start of the project!

28/03/08 - Meeting with supervisor

Today we had a meeting with Alan to see if we are on track for the second milestone. For the meeting, we prepared a small demo, which showed him all the completed hardware working together with the application editor so that he could see the test environment we intend to use over coming weeks to test and evaluate our project. I'll take a moment to talk about the application editor first.

The application editor, in its current state, is a more powerful form of the compass test program written back in February - you can position sound sources in virtual space and the listener (now represented by a little graphic of a person, instead of simply a dot) is positioned in such a way that it represents the position and orientation of the user wearing the headset. While the compass test only took orientation into account, the application editor also reacts to Ubisense data. Another difference is that multiple sound sources can be active (ie, playing sound) at once, and each can be playing a different sound. This is accomplished by giving each sound its own set of properties, which can be adjusted separately. Finally, sound sources can now be repositioned/moved after having been placed, which could not be done in the compass test.

While, feature-wise, this is not a significant improvement, the infrastructure is now coded to allow more advanced (and useful) features to be implemented over the coming days - per-sound-source property support being the most important, but also the graphical widget for representing the environment. In the compass test code, I was simply drawing to a GTK Drawable widget, while this time I created my own custom widget, derived from both Drawable and the GtkGLExt OpenGL class. This allows me to handle events in a more localised fashion as well as to draw more complex images. Hopefully I'll have a video online showing the program in action; the still screenshots don't really do it justice. Speaking of screenshots, here's one now:


Over the next few days, I plan to scrap my current socket-based networking code in favour of Twisted code. This should make the networking aspect of the program much more robust and flexible. Currently, certain scenarios are somewhat error prone, due to the use of blocking sockets in a multithreaded GUI application (it needs to be multithreaded so that a blocking socket does not stall the entire GUI, but this causes problems when a thread needs to terminate...). Twisted will solve all of these issues.
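
As a rough sketch of the direction I have in mind (this is not the actual ASEDIT networking code, and the port is arbitrary), a Twisted line-based server for a textual commandset would look something like this:
# Rough sketch of a Twisted line-based server for a textual commandset.
# The command handling here is a placeholder, not the real ASEDIT code.
from twisted.internet import reactor
from twisted.internet.protocol import Factory
from twisted.protocols.basic import LineReceiver

class CommandProtocol(LineReceiver):
    def lineReceived(self, line):
        # Each incoming line is one command; reply with one line.
        command = line.decode("ascii").strip()
        self.sendLine(("ok " + command).encode("ascii"))

class CommandFactory(Factory):
    def buildProtocol(self, addr):
        return CommandProtocol()

if __name__ == "__main__":
    reactor.listenTCP(9001, CommandFactory())  # port chosen arbitrarily
    reactor.run()
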
Besides the networking code, there are a lot of planned features that still need to be implemented. Currently the program is little more than a testing tool; when complete, I hope it will actually be an application editor capable of allowing one to design and build augmented reality applications with our framework. After all, what is a framework without intuitive tools?

So, that's basically what I've been working on since my last post. Other than that, I fixed some issues with the audio daemon - I added a command to reset the audio environment by removing all sound sources at once. This is needed so that each application can easily ensure it has a clean environment to work in. I also spent a couple of hours with Graham testing the system - that is, wearing the headset, walking around trying to find sounds and ensuring that my application editor actually did what it was intended to. Eventually, everything was working and we could test everything together, though the system still has a few bugs, making the whole thing a little brittle. In fact, during our demonstration to Alan, the application editor crashed... not good! But it was easily restarted and everything was fine. I guess I now need to make sure that doesn't happen again!

So, yes, there are still some bugs which need to be fixed, but overall, everything seems to be working. I guess that puts us on track.

To recap, what I want to (or need to) do next is:
  • Debugging! I certainly don't want anything to crash during demo day.
  • Adding more features to the application editor. I posted a "TODO" list a few days ago, and, even though I had hoped to have it completed by the end of the week, this didn't happen.
  • Testing testing testing! Myself, Graham and Alan agreed that it would be beneficial to find some guinea pigs to test our project, so that we can evaluate and report on our findings, with respect to sound localisation, the usefulness of our hardware and the effectiveness of our techniques.
  • Documentation. It's a large project with many aspects. Everything needs to be thoroughly documented before the project deadline, which is now approaching faster than we would like.

Wednesday 26 March 2008

26/03/08 - Application Editor

Currently, I am working on an editor which will allow you to position multiple potentially moving sound sources around the listener and set a number of properties for each. This will be similar to the demo program I wrote, except that it will provide a lot more functionality.
Each sound source will have a number of associated properties which can be edited. So far the planned properties are:
  1. Position
  2. Sound file to play
  3. Duration
  4. Which state the user must be in for the sound to play
  5. Radius
  6. State to change to if within radius
  7. Path along which the sound will move
  8. Speed of movement along path
Other properties will be added as I think of them.
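
Just to illustrate the shape of the data (the field names here are invented for this sketch and may well not match what the editor ends up storing), the per-source property set could be modelled as a simple Python class:
# Illustrative model of a sound source's editable properties.
# Field names are invented for this sketch, not the editor's real ones.
class SoundSource:
    def __init__(self, position, sound_file, duration=None,
                 required_state=None, radius=0.0, next_state=None,
                 path=None, speed=0.0):
        self.position = position              # (x, y) in Ubisense coordinates
        self.sound_file = sound_file          # file to play
        self.duration = duration              # how long it plays, None = forever
        self.required_state = required_state  # state the user must be in
        self.radius = radius                  # trigger radius around the source
        self.next_state = next_state          # state to switch to when inside radius
        self.path = path                      # list of waypoints, or None if static
        self.speed = speed                    # movement speed along the path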

This should help us to not only test the hardware better, but also to develop some simple, yet rich, demo applications, such as the virtual zoo or band.

This program will be used to either drive the router/state machine program written a few months back (after it is updated, later this week), or it will be integrated into it. I have yet to decide which approach I will take. For the time being, it will be a standalone application (and will not support the state properties until I have it working together with the state machine program, using whichever method I decide upon).

Here is a screenshot of the program so far:

Nothing terribly interesting there yet, besides the basic GUI. Over the next couple of hours I will be coding a custom OpenGL-based GTK widget for editing the sound sources.

The left-hand bar will be the toolbar, containing buttons for each type of action that can be performed. The existing buttons represent (from top to bottom): editing the dimensions of the environment (ie, setting a rectangular area which represents the Ubisense-enabled area the application will be run in), placing and editing sound sources, and the cogwheels at the bottom will run the application.

The black area in the middle is the OpenGL-enabled canvas. This is where all the magic will happen ;-)
The area to the right will contain a list of properties for the currently selected sound source.

The plan is to have the basic version of this working tonight and then to test it in DCU tomorrow morning. If all goes well, I will be adding the state properties and adding whatever else I think of to the program over Friday and the weekend. The hope is that it will be completed on or before Monday - before, if at all possible.

Ok, best get back to work :-)

Tuesday 25 March 2008

25/03/08 - Plans for this week

I figured I may as well post my plans for this week. It will help me remember what I want to do and provide me with a checklist, as well as showing whoever may be reading my blog what I intend to accomplish over the coming days.

  1. The audio feedback component needs some more work. Before the weekend, I'd very much like for it to be more or less complete, which means I need to implement the ability to add sounds that decay over a given time, fading in volume until they are inaudible, at which point they will be removed. The ability to reset a sound's timeout should also exist. This would allow for some interesting applications - one use which myself and Graham have discussed is to have objects detected by the ultrasonic sensor decay over time: as there is no easy way to determine whether an object is static or moving, storing its position indefinitely would cause inaccurate results, but having it decay would provide a useful representation to the user (a rough sketch of this decay idea follows the list).
  2. I want to refactor the router application to accommodate the advances made in the project so far. It is now time that the router/state machine application can be used (and indeed, it should be quite useful now), but its current state does not really reflect the rest of the project. I hope to have this completed and working by Friday, so that it may be used to implement the demo applications we have planned.
  3. Tools. I have a number of tools planned which would be used to configure the framework and develop applications. The tools are what I was hinting at in previous posts - one (or more, depending on how successful I am) of these tools will have a nice pretty drag'n'drop interface, which I will write in Python and PyGTK using OpenGL. If all goes to plan, this should make using the framework much, much easier than the current voodoo needed to make everything work.
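
And here is the rough sketch of the decay idea from point 1 (names and timings are purely illustrative):
import time

# Rough sketch of the decay idea: each decaying source fades out over its
# lifetime and is removed once inaudible. Names and timings are illustrative.
class DecayingSource:
    def __init__(self, source_id, lifetime=10.0):
        self.source_id = source_id
        self.lifetime = lifetime
        self.created = time.time()

    def reset(self):
        # Re-detecting the object resets the timeout.
        self.created = time.time()

    def volume(self):
        # Linear fade from 1.0 down to 0.0 over the lifetime.
        elapsed = time.time() - self.created
        return max(0.0, 1.0 - elapsed / self.lifetime)

    def expired(self):
        return self.volume() <= 0.0
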
I guess we will find out soon enough whether I'm successful with this little todo list... today looks to be an exceptionally busy day though, so I don't think I'll get a chance to work on the project until tomorrow :-(

Thursday 20 March 2008

20/03/08 - Neglecting to update blog

It is waaaay too easy to neglect the blog.. I really am not a blog person. GRR BLOGS...

Now that I have that out of my system, onto the real post.
Graham has posted about the hardware problems we've had, so I won't say any more about it now. Instead I'll write about the coding I've done.

Since my last post, I was still sick, so the Friday demo plan was completely ruined. After that, I ported my FMOD code to Windows, since FMOD cannot use hardware acceleration under Linux... boo! This mainly just involved minor edits to the socket code to make it work under Windows and removing the Linux-specific headers. The FMOD code itself did not need any changes, besides changing the initialization flags to tell it to use hardware acceleration. Simple. Easy. Great.
... Except there were problems. There are always problems. Basically, on Windows, FMOD only supports the C++ API when compiling with Microsoft Visual C++. I wanted to compile using MinGW, because that's what I've always used on Windows, and since I use GCC on Linux, I figured why not use it on Windows too. To cut a long story short, I downloaded Visual Studio 2008 Express and tried to compile with that. Endless hassle. If I had used Visual Studio before, I'm sure I could have got it working, but it was taking longer than the time I had available, with no success. At first it kept trying to compile into managed .NET code; eventually I found the option to disable that so that it would produce unmanaged native code, though even then I couldn't get it compiling without problems. So, instead of spending yet more time trying to fix it, I decided to scrap the FMOD C++ API entirely and use the C API instead (which is supported by all compilers, since the C ABI is standard and the $*%& C++ ABI is not).
Luckily, porting FMOD from the C++ API to the C API was extremely easy and painless and now everything works. Yay.

Graham and I also ported the C# Ubisense code which Lorcan Coyle sent us to the new Ubisense 2.0 API. This wasn't difficult, but it took a while to match the old functions to the new ones, as the API seems to have been reorganized quite a bit. The documentation wasn't terribly useful in teaching you how to use the API, but it served as a good reference and the Ubisense code works now too. Success!

I have also been working on updating my demo app to allow for multiple sound sources to be playing at once and also to allow the listener to move around - controlled by Ubisense. This will allow us to test the complete system, once the hardware issues have been overcome.

Finally, I have been playing with GtkGLExt in PyGTK, so that I can create a GUI using Python and PyGTK (well, I already can and have done - the demo app, for example) while also being able to draw onto GTK widgets using OpenGL. This will be useful for some tools I am slowly working on, since they have a visual component which would be a lot easier to implement (and prettier) using OpenGL rather than GTK's native drawing functions. I should be ready to post about the tools about a week from now.

So, now our project contains some C, C++, PIC assembly, PIC BASIC, C# and Python code. Interesting how a nice TCP/IP-based modular design allows a nice mix of languages ;-)

Wednesday 5 March 2008

05/03/08 - Quick Update

Ok. It has been a while since my last post. Last week I was away, attending the Irish Web Technology Conference (thanks to the folks over at Python Ireland for the free ticket) and this week I've been sick... Hopefully tomorrow I'll be better and can get back to work, since our hardware has arrived. That leads me on to the first part of my update:

Hardware has arrived
As mentioned in a previous post, Alan has been kind enough to purchase a soundcard and wireless headphones for us. They arrived sometime yesterday evening, so we are keen to test them out. Graham has already modified our headset to incorporate the wireless headphones.

Ubisense is ready
On a related note, the Ubisense is finally ready and we have received a Ubisense tag from Kirk Zhang, one of the techies here in the CDVP. Graham emailed Lorcan yesterday and he has kindly provided us with their C# code, which simply transmits the location data from the Ubisense over TCP/IP. This fits in well with the rest of our system, as all our components communicate over TCP/IP.

Tomorrow morning, I plan to test the hardware-accelerated HRTF. Hopefully we will now have much improved 3D sound localization. Besides that, I will modify my test application so that the listener's position can be controlled by the Ubisense. Once we have the Ubisense working, the hardware side of our project is more or less ready and we can work on demonstration applications and flashy development tools to ease the use of our framework.
Besides this, over the next week, I plan on revising the message router code which I wrote before Christmas and then designing an easy to use, event-driven API to allow applications to be written in Python. The foundation for this already exists; it just needs to be cleaned and refined (and the actual features of the other components need to be exposed to the applications). This will mean that applications can be developed in a single unified place, rather than having to manually connect everything together via TCP/IP (though that option will still exist, just in case it is needed).
Finally, I want to develop a graphical configuration tool which would be used as a convenient (and user-friendly) means of configuring and setting up the framework and applications. Graham knows what I mean, since we have discussed it in great detail, but I won't post more about it until I've started work on it - since it's easier for me to write about something as I'm actually working on it.

Thursday 21 February 2008

21/02/08 - Progress Report

The schedule, as outlined in the Functional Specification, is something like this:
  • Receive required hardware: October 2007 - January 2008
  • Interface sensors to the XBee: November 2007 - January 2008
  • Construct headset unit: December 2007 - March 2008
  • Daemon/components to gather sensor data: January 2008 - March 2008
  • Write audio feedback daemon: November 2007 - February 2008
  • Message router/state machine: November 2007 - March 2008
  • Demonstration applications: March 2008 - April 2008
  • Improvements and tuning: April 2008
  • System documentation: April 2008 - May 2008
The current state of each of those entries is as follows:
  • Receive required hardware: Originally planned hardware has arrived. Waiting on new hardware now (soundcard & wireless headphones).
  • Interface sensors to the XBee: Graham has completed this, see his blog for details.
  • Construct headset unit: Graham has completed this, see his blog for details.
  • Daemon/components to gather sensor data: Graham has completed this, see his blog for details.
  • Write audio feedback daemon: I did this in January and February. More details below.
  • Message router/state machine: I wrote this in November and December. It's approx. 85% complete, but will need some editing to reflect design changes made to other aspects. There are more details in this post.
  • Demonstration applications: The original intention was to write the demo applications when the framework is complete - obviously, we have found that writing them as we progress, in order to test each new component, is much better. The first demo application, described in posts here and here, has been written (February) and will continue to evolve over the next week or so. This puts us ahead of schedule in this area.
  • Improvements and tuning: We have both been working to improve our work as we think of ways (Graham has been tuning the hardware in an attempt to get a more accurate reading, I have been mostly researching methods of producing better 3D audio output in FMOD), though the real work in this area will begin once the framework is fully constructed and tested.
  • System documentation: No work has been done in this area yet.
From the above, we can see that currently, we are a little ahead of schedule! Good stuff.

Details on the audio component:
I started looking into this in mid to late December. At that point, I was still trying to decide which audio API to use. The choice was between OpenAL and FMOD.
Back then, I was attempting to write this component in Python, since it is a much cleaner and nicer language to program in, however, I had no success getting the OpenAL or the FMOD bindings for Python to work.
Around Christmas, with still no success getting either of these APIs to work in Python (and no luck finding any other audio libraries for Python which could produce 3D audio), I decided to write this component in C++ instead. I briefly thought about writing my own Python bindings, but ultimately it didn't seem worth the extra work.
At the beginning of January, before the exams, I wrote a test program in C++ using OpenAL. It worked, but the code was not very nice and the program was very brittle and difficult to manage. This is when I decided to use FMOD instead - it has a higher-level API, a superior featureset and is very well supported.
No further work went into this until after the exams, but ever since I started writing the FMOD version, at the beginning of February, I have been constantly tuning the program and adding new features. When the first demo application was working (both the hardware and the GUI app), the audio component supported only a single sound source, which could be moved around 3D space using text commands over TCP/IP. The listener could also be moved and rotated to face any direction.
Within the next week, I added support for multiple sound sources (which could be created and destroyed at runtime, again, using the text commands over TCP/IP). Besides some minor tuning (mainly cleaning up the code), this is the current state of the program. Currently, it consists of 462 lines of C++ and I see this growing to over 1K before the program is fully complete.

Wednesday 20 February 2008

20/02/08 - Research into sound cards

Yesterday and today, I was researching exactly which sound cards have hardware-accelerated HRTF support (and making sure that FMOD actually supports it). In the process, I came across a project at Carnegie Mellon University which aims to do some very interesting things with 3D sound. The basic idea is (taken verbatim from their website):
The goal of our project is to demonstrate that audio can successfully be the primary element of interactive entertainment. Through the use of 3D trackers, headphones, and a game engine, we plan to create an immersive experience that is unique and demonstrates both the creative potential and emotional power of an audio experience.
Sounds kinda familiar ;-) Definitely something to keep an eye on anyway!

While looking for a soundcard with good 3D support (specifically, for use with FMOD), as well as any other information on 3D sound, or methods of improving the sound quality, I've been digging through a number of forum and blog posts, especially on the FMOD support forums.
  • This thread on the FMOD forums shows that, from the poster's personal experience, the Creative X-Fi XtremeMusic soundcards perform significantly better than the older Creative Audigy soundcards.
  • This article talks about 3D sound for games and has a lot of useful information regarding soundcard technology, HRTF, sound occlusion/obstruction, volumetric sound sources and different audio APIs (FMOD included).
  • Wikipedia article on the Creative X-Fi cards. According to the article, the Creative X-Fi Xtreme Audio cards do not contain the new EMU20K1 chip, meaning they do not support 3D sound in hardware! (Citations: here and here).
  • Thread on the FMOD forums outlining the recommended startup sequence. Not related to any specific soundcard, but covers some potential problems and solutions - this may become an issue when switching from software mixed 3D sounds to hardware sound.
All of the soundcards advertise CMSS-3D, but according to the research above, some cards (the X-Fi Xtreme Audio) emulate it in the driver instead - something we definitely want to avoid!
The Xtreme Gamer soundcard (besides being marketed as a gaming soundcard) seems to be the cheapest of the X-Fi soundcards that has the features we need. The Xtreme Gamer Fatal1ty Pro soundcard is the next one up, which potentially has better performance and/or quality (though not guaranteed!).

So heres the summary:
I would, of course, prefer the Fatal1ty Pro, since it is potentially better, but as this is not guaranteed and it costs €40 more, it would probably be best to get the cheaper one (which seems like a capable card anyway). If I were paying for it myself, I'd probably get the other one though - I never have been that good at managing money when it comes to shiny toys hahaha!

Well, that's enough research into soundcards, I think. I certainly can't wait to try the headset with improved 3D sound when the soundcard arrives!

Monday 18 February 2008

15/02/08 - Meeting with supervisor

Today myself and Graham had a meeting with our supervisor, Prof. Alan Smeaton, in order to show him what we have done so far and discuss the direction of our project. We also used him as a guinea pig to test our demo app - he was able to localize the sounds very quickly, so this proves that the sound localization works!
Alan had some good news for us too - DCU will be installing their Ubisense soon!! We hope that we will already have our Ubisense component working, before this is done, so that we can begin testing our framework with the Ubisense as soon as it is installed.
He also told us about a project which Donal Fitzpatrick and his postdocs are working on: developing a virtual white cane, which would use tactile feedback to simulate an environment as it would appear to someone using such a cane. Our project has great potential for a future tie-in with that project, as we are not relying on anything visual, we will already have location tracking implemented (which that project will also require), and we will be using various sensors (ultrasonic, compass, accelerometer/gyroscope) which could be useful to theirs. Also, our audio feedback system may be useful to their project too! It's always nice to discover new uses for a project that may not have been thought of at the start!

For the near future, we intend to meet with Donal Fitzpatrick to gain some valuable feedback on how we could improve our system for use as a non-visual navigation system, and also on which sounds we should concentrate on for the greatest effect. Also, as Alan has kindly agreed to obtain a sound card with HRTF support, we hope to be able to significantly improve the quality of the 3D sound localization!

Some other things we will work on over the coming weeks are modifying the GUI for the demo app and measuring the user's performance to generate statistics about how well people can localize sounds, how quickly they adapt and learn, and so forth. This would be a valuable tool for the testing of our framework (providing some much-needed metrics of how successful the 3D sound system is) as well as a tool for further research into which sounds work best.

Once the Ubisense system is up and running, we will begin incorporating the Ultrasonic sensor into the project so that we can detect physical objects and begin working on a means of signaling the objects existence to the user using audio.

Once all of the above is complete, it will be time to focus solely on the framework aspect of the project: integrating everything we have so far into the message router application (by tuning the command set each component understands, having them connect through the router instead of directly, as we are currently doing, and writing driver Python scripts) and working on a flexible but intuitive API for the applications. Also, I want to start developing a graphical configuration tool (as explained in a previous entry) which would allow the user to specify the routing behavior through a visual tool. We will also need to spend some time testing the framework for usability and develop a number of demo apps.

All in all, the project has been moving along quite successfully, with no show-stopper problems (at least, none that we couldn't work around - thanks to UCD allowing us to use their Ubisense). The coming weeks look to be exceptionally busy!