Thursday, November 15, 2012

Robot Life, Vol 3

Since the last post I've been working on some audio processing tests. The goal is to have Nao play along with improvised music jamming! I'll detail some of the experiments and progress thus far.

The high-level flow is to listen to nearby audio, determine tempo and musical scale, and then synthesize some melodies to match. The first step is to analyse the microphone audio to extract useful information using the Fast Fourier Transform (FFT).

The FFT lets us determine which frequencies exist in a raw audio waveform. The result is a set of linearly spaced frequency bins which tell us how much of each frequency range is present in the waveform.

The frequencies the FFT can detect are inherently limited by the duration and sample rate of the waveform data being processed. For example, if we read 20ms of waveform data we can process frequencies into (0.020 * SampleRate) discrete bins. With a sample rate of 44100 samples per second (44.1kHz), this gives us 882 bins of data. Each bin covers a frequency range of 50hz (44100 / 882). Due to the Nyquist Limit, only the lower half of these bins is usable, giving us 441 bins of usable data. We can use these bins to determine which musical notes are present based on how much of their frequency we detect.
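The bin arithmetic above is easy to sanity-check in a few lines of pure Python (no audio libraries involved):

```python
# Bin counts and spacing for a 20ms window at 44.1kHz.
SAMPLE_RATE = 44100            # samples per second
WINDOW_MS = 20                 # analysis window length

num_samples = SAMPLE_RATE * WINDOW_MS // 1000   # samples per window
bin_spacing = SAMPLE_RATE / num_samples         # hz covered by each bin
usable_bins = num_samples // 2                  # Nyquist: lower half only

def bin_for_frequency(freq_hz):
    """Index of the frequency bin that freq_hz falls into."""
    return int(freq_hz / bin_spacing)

print(num_samples, bin_spacing, usable_bins)    # 882 50.0 441
```

With these numbers, `bin_for_frequency(2093)` gives bin 41, which covers the 2050-2100hz range.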

frequency bins

Humans perceive sound on a logarithmic scale, which is not cleanly represented by the linearly spaced frequency bins of the FFT. Musical notes in low octaves are closer together in frequency than notes in higher octaves, which means we lose accuracy when estimating low notes.

For example, to determine C7 (2093hz), we examine the bin containing 2050-2100hz data. For the adjacent note C#7 (2217hz), we check the bin covering 2200-2250hz (reference). In our linearly spaced 50hz bins these are a few bins apart, so each note lands cleanly in its own bin.

When we examine lower octaves, the frequency difference between adjacent musical notes is smaller. For example, A2 and A#2 are 110hz and 116hz respectively; with a bin spacing of 50hz, both notes end up mostly in the same frequency bin - we cannot distinguish them.
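A quick sketch makes the collision concrete (frequencies rounded as in the text, 50hz bins as before):

```python
# Which 50hz bin does each note land in?
BIN_HZ = 50.0
notes = {"C7": 2093.0, "C#7": 2217.0, "A2": 110.0, "A#2": 116.0}

bins = {name: int(freq // BIN_HZ) for name, freq in notes.items()}
print(bins)  # C7 -> 41 and C#7 -> 44 are distinct; A2 and A#2 both -> 2
```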

spacing of notes across frequency bins

With our 50hz bins, it would seem that anything below A5 (880hz) or G#5 (830hz) - where adjacent notes become less than 50hz apart - is not accurately detectable. However, any given note also contributes smaller amounts to the adjacent bins, which means we can still detect lower notes if we interpolate between several adjacent bins. For a 50hz bin spacing, notes can still be accurately detected down to steps of 15-20hz or so (around C4).
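One standard way to do this interpolation (a sketch of the general technique, not necessarily the exact method used here) is to fit a parabola through a local-maximum bin and its two neighbours:

```python
def refine_peak(mags, k, bin_hz):
    """Parabolic interpolation: given bin magnitudes and a local
    maximum at index k, estimate where the true peak lies between
    bins and return the refined frequency."""
    y0, y1, y2 = mags[k - 1], mags[k], mags[k + 1]
    denom = y0 - 2.0 * y1 + y2
    offset = 0.0 if denom == 0 else 0.5 * (y0 - y2) / denom
    return (k + offset) * bin_hz

# A peak leaning toward the higher neighbour refines above the bin center:
print(refine_peak([0.0, 0.4, 1.0, 0.8, 0.0], 2, 50.0))  # -> 112.5
```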

We can increase the number of frequency bins (and thus reduce the spacing, giving increased resolution) by increasing the length of sample data. This will decrease responsiveness since we need to wait longer for data before processing, so a good balance of parameters is necessary.

There are a few details worth paying attention to when implementing your waveform processing. Apply a Hanning Window to your sample data to prevent Spectral Leakage. Take care that your data at every step stays within the value ranges you expect. Work with input data in the range -1.0f to 1.0f, and use log() to display your FFT results in a more natural form for visualization/debugging. Retrieve local maxima of curves from the data of several adjacent bins for better estimation of low-frequency notes.
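For reference, the Hanning (Hann) window itself is only a couple of lines; a pure-Python sketch:

```python
import math

def hanning(n):
    """Hann window coefficients for an n-sample frame (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]

def apply_window(samples):
    """Taper the frame toward zero at both ends to reduce
    spectral leakage before running the FFT."""
    w = hanning(len(samples))
    return [s * wi for s, wi in zip(samples, w)]
```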

To determine the musical scale from this data, simply tracking detected notes and checking them against candidate sets of scale notes for the best match is sufficient. Tracking notes over a much longer period than your sample window lets you converge on the correct musical scale within a reasonable time. As always, the parameters need tuning to balance responsiveness to scale changes against accuracy of scale detection.
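The best-match check can be as simple as scoring a histogram of detected pitch classes against each candidate scale. A hedged sketch, restricted to major scales, with made-up detection counts:

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of a major scale

def best_scale(note_counts):
    """note_counts maps pitch class (0-11) to how often that note was
    detected; returns the root whose major scale covers the most hits."""
    def score(root):
        return sum(note_counts.get((root + step) % 12, 0)
                   for step in MAJOR_STEPS)
    return max(range(12), key=score)

# Mostly C, E, G, B plus a little F: C major fits best.
counts = {0: 10, 4: 8, 7: 9, 5: 3, 11: 4}
print(NOTE_NAMES[best_scale(counts)])  # -> C
```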

So that's the general overview of how things are put together. The current musical note detection is fairly accurate for clean notes within a limited octave range. Human whistling and ocarina-style instruments give clean note detection. Guitars and pianos produce Harmonics over multiple frequencies, which need to be accounted for.

Next time I'll cover some of the trickery involved in generating synthesized audio data to play specific notes and melodies!

Wednesday, October 3, 2012

Robot Life, Vol 2

Robots ahoy! Some new image processing and OpenCV tests! Since the last post, I did some more manual testing and moved to OpenCV to experiment with more image filters for vision processing.

After performing basic image processing with Sobel and Bilateral filters, we have a good image to process with a Hough Transform. This allows us to extract lines from an image which we can then reconstruct into geometrical shapes. Here we have a sample picture (ref) showing what the algorithm produces, and an application of it to a low-res Random Internet Photo.

sample hough transform

hough transform test

The Hough Transform is really quite cool. It maps each edge point in cartesian space to a sinusoid in Hough Space, where every point on that sinusoid is a candidate line through the edge point expressed in polar coordinates. Where many of these sinusoids overlap you have a strong candidate polar coordinate for a line in your image. Super rad. Here are some of the better papers I found describing it: 1 2 3 4 5 6.
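The voting scheme can be sketched in a few lines (a toy accumulator in pure Python, nothing like a production implementation):

```python
import math

def hough_strongest_line(points, theta_steps=180):
    """Each edge point votes for every (theta, rho) line that could
    pass through it, tracing a sinusoid through parameter space;
    the cell with the most votes is the strongest line candidate."""
    acc = {}
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(t, rho)] = acc.get((t, rho), 0) + 1
    return max(acc, key=acc.get)

# A horizontal row of edge points at y = 3 votes hardest at
# theta = 90 degrees (index 90 of 180) with rho = 3.
print(hough_strongest_line([(x, 3) for x in range(60)]))  # -> (90, 3)
```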

However, the results of the Hough Transform require careful coefficient tuning and so are not as robust as we would hope. Some OpenCV testing lets us chain a much wider variety of filters and find a better set of image transforms to extract the desired polygonized shapes.

After experimenting with a lot of combinations I settled on a Bilateral + Canny filter to get shape outlines and then performed polygonization on this image. Having a larger number of iterations on the Bilateral filter was more beneficial than an additional Sobel filter or Box/Gaussian blur. The polygonization step with a low/high angle threshold is also shown here.
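For the curious, the polygonization step is essentially line simplification. Here is a pure-Python sketch of the classic Ramer-Douglas-Peucker algorithm (the same idea behind OpenCV's approxPolyDP; not necessarily an exact match for the pipeline here):

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: keep only the points that deviate from
    the current chord by more than the tolerance."""
    if len(points) < 3:
        return list(points)
    dists = [point_line_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    worst = max(range(len(dists)), key=dists.__getitem__)
    if dists[worst] <= tolerance:
        return [points[0], points[-1]]
    split = worst + 1
    return simplify(points[:split + 1], tolerance)[:-1] + \
           simplify(points[split:], tolerance)

# A staircase outline reduces to its three corners:
print(simplify([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1))
```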

opencv tests

Looks like a good start to the sort of data we want. Here it is running real-time on a low-res image feed. The colors indicate unique polygons, but they are randomized each frame, so everything flickers erratically!

Testing with OpenCV gave a much faster turnaround time, though manual implementation is fun and educational. There were a lot of combinations of filters and settings that gave poor results before reaching this. It is vital to have real-time parameter tuning so you can find the right set of filters+tunings for your needs.

Things are busy so I'm currently on a tangent doing some smaller audio processing experiments for some fun applications with Nao. Till next time!

Saturday, September 1, 2012

Galaxy Defenders

A long time ago, in a dark room far far away, I made a small 2d shooter. The project was one of my favorite old game projects from back when I had just started learning programming. More recently I did a small remake of it during a free Sunday to learn some XNA.

A day with XNA

I ended up working on it some more over weekends and eventually fleshing it out into a full shooting game. Many of the mechanics were based on an old DOS game called Tyrian (whose original artist DanC writes very good game design stuff over at lostgarden). The game took on more of its own flavor as it progressed.

The Code

The design of the game code is a very object-oriented approach. While this is very flexible for adding new entities and tweaking things, structurally it is one of the worst approaches for performance. The virtual function calls and branches in object-oriented code prevent the calculations from streamlining very well across a system of many objects (see: TCPPBDOD).

While C# generates reasonably well-optimized assembly, its restrictions on inlining functions mean that for core physics and collision objects it ends up being better to work on basic data types with minimal function calls. Vector types and helper functions are no good.

Keeping garbage objects to a minimum was more a performance issue than a memory issue, and a critical one to keep a game like this running at 60hz. I wrote about this earlier (link), so I won't repeat the details here.

The Game

The core game design based around Tyrian was reasonably fun, but not as well executed as Tyrian. The core feeling of destroying enemies and buildings in the game is key for shooting games.

Differentiating weapon types in behavior and usage was a key goal, rather than just visually different weapons. Melee weapons, shields, and high-energy-consumption alternate shot modes added a reasonably nice gameplay rhythm. The aim was to balance energy consumption and weapon usage timing to allow better skill to produce more effective results.

The UI turned out quite poor, especially the in-game shop interface. In retrospect doing more menu prototyping would have been worthwhile. Complicated user interfaces are easy, simple user interfaces are difficult.

With a lack of artists, I had to embrace programmer art and try to make it a part of the style. The game has a single color per player (ship + all weapons), two colors for enemies, and two colors for buildings. Working with these restrictions was a fun experiment in color design - having 12 stages with differing color themes that don't collide with the fixed game colors is tricky!

The first few stages are a bit slow, but things get more interesting after that as new enemies and gimmicks are introduced. It would have been better to make some more exciting early stages - the current flow has too much of a slow-start before introducing the more difficult areas.

The Music

There is a lot of free music around, but for a whole game a consistent musical theme is necessary. With this in mind, I narrowed down the most fitting tracks from rushjet1, who makes mountains of amazing NES music.

I collected ~15 tracks of reasonable length and then tweaked the stages to match the flow of the songs. I tried to theme the levels to the music where possible, and I think it turned out nicely!

Code + Game + Video:

Wednesday, August 29, 2012

Robot Life, Vol 1

Some weeks have passed since I received my NAO robot, so I thought I'd post on development progress. The goal of this research project is to create a robust and naturally behaving robot - something less 'robot' and more like a pet or animal.

After checking out the initial capabilities of the robot and testing some basic actions with the SDK tool for choreographing activities, I dived into programming custom behavior. The SDK uses a build system called qibuild, which is based on CMake (setup notes and helper scripts here: installation, environment).

For C++ work this is necessary, but doing development in Python is much less involved, and for simpler systems I would certainly recommend that approach.

Once everything was building, I did a few basic tests of speech recognition and output, did some collection of sensor data, and some basic body-part movement tests. The API is quite straightforward to use, though some care is necessary to avoid performance issues, particularly when running over TCP/IP.

With some basic C++ code running, I started integrating some game engine code to have GameMonkey script and a quick prototyping rendering environment available. Having this environment will make visualization of information structures much simpler. We can see here some debug UIs displaying sensor data and rendering camera data as an OpenGL texture.

Camera Render + Sensor Data

Working from there I started looking at some computer vision research and doing some image filtering tests. There is a lot of research around on this topic and fairly robust solutions have been found for many vision processing problems. Image filtering will quickly eat up your CPU, but we aren't working within the restrictions of 60hz, so there is a lot more leeway than in game programming.

NAO's CPU is an Intel Atom, which supports SIMD instructions up to SSE3. I re-implemented some filtering tests using the glm vector library and got some solid performance boosts from vectorizing core image processing algorithms. I highly recommend this library; it has a nice OpenGL-like API and good vector swizzling support.

Sobel + Bilateral Filter Tests

A solid world model is necessary to give NAO a natural level of awareness. Since NAO does not have a stereo camera or a depth camera, we need to find a robust depth-map generation technique from which to build a 3d voxel world. This will provide a strong core data set upon which we can build general navigation systems, memory and location awareness.

Next time I'll write about some more image transforms and OpenCV experiments!

Friday, August 3, 2012

magic intervals

It is desirable to have large numbers of entities processed per-frame on multiple threads and fibers in a balanced manner. This can be achieved by creating a manager to schedule updates across a frame in small slices to keep resources from being contended too aggressively.

We can use (abuse?) random initial sleep intervals to achieve this behavior intrinsically, without the need for any management processing. Consider a group of game entities which all sleep for one second (t) and then perform their update. We can imagine all the threads/fibers waking at the same time, causing a spike in our resource consumption at that point.

If we perform a sleep of a random interval between 0..t for the first update, we can effectively distribute the updates across subsequent frames, without any further management.
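A quick simulation shows the effect. Hypothetical numbers: 10000 entities, each updating once every 100 frames:

```python
import random

FRAMES = 100        # update interval measured in frames (our "t")
ENTITIES = 10000

def updates_per_frame(randomize_first_sleep):
    """Count how many entity updates land on each frame of the cycle."""
    counts = [0] * FRAMES
    for _ in range(ENTITIES):
        first = random.randrange(FRAMES) if randomize_first_sleep else 0
        counts[first] += 1   # the entity then repeats every FRAMES frames
    return counts

random.seed(1)
spiky = updates_per_frame(False)   # everyone wakes on the same frame
spread = updates_per_frame(True)   # random first sleep
print(max(spiky), max(spread))     # 10000 vs roughly ENTITIES / FRAMES
```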

explanatory graph of explaining

Our normal behavior in the top frame is imbalanced and spikes each frame as all the entities wake up for processing simultaneously. Using our random first sleep, we shuffle the processing order across the frame. We can see in the sorted view how this would give us well-distributed updates across the frame.

The caveat is that a large enough number of entities is required to balance out the updates evenly. The method is also indifferent to variable-length sleep intervals (variable-length sleeps would inevitably end up distributed like this anyway).

It's a trick only applicable in certain cases, but the required code change is very localized (a sleep() during init()) and can work nicely for game entities/enemies.

Saturday, July 21, 2012

robot, dreams

So this week I received my NAO robot. It's created by Aldebaran Robotics and is currently available to developers, with a public release sometime in the near future.

NAO has a wide range of features: dual HD cameras, various touch sensors, microphones and speakers, wifi, infrared, SD card memory, speech recognition, facial and object visual recognition, body position self-awareness and so on. He can perform moderately complex physical motions; sit down, stand up, fall over.

However, in my mind the most interesting part about NAO is that he is very accessible. With an easy-to-use API, drop-in applications, a growing developer community and good support (much like the iPhone), NAO is more of a platform that will allow developers to pursue their own ideas of what a humanoid robot is/does - in contrast to what has otherwise been the work of specialized research groups.

A generalized platform can provide interesting options for the future. Smaller and larger variations, different body shapes (quadruped, wheeled, flying, etc) and a multitude of special-purpose applications to fit small niche-needs. I feel this is a necessary step for consumer-level robotics and NAO seems to be a good first step towards that.

Exciting times ahead!

Monday, July 16, 2012

Soak it up

Soak testing is a brute-force method of software testing in which the application is run over a long period of time. It is effective for tracking down memory leaks, rare timing crashes and obscure bugs.

For games, the key is to find a simple method for running most of your game code without any user input. To emulate player behavior we create a simple AI that approximates a player, but simplifies some aspects to streamline the game cycle.

Let's consider how we might do this in a 2D top-down shooting game. We create an AI that moves randomly around the stage and fires its weapons randomly. The player's health is forcibly increased so that he does not die, but in a way that receive-damage processing is still performed. We choose weapons randomly to cover all combinations of enemy and weapon collision over a long enough time period. There is not much complexity in a 2D shooter, so this generally suffices to cover all activities a player might perform. Any special abilities or similar can be randomly triggered as necessary.
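As a sketch, one frame of such a soak-test AI might look like this (all class names, fields and constants here are hypothetical stand-ins, not code from the actual game):

```python
import random

class Stage:
    width, height = 640, 480

class Player:
    def __init__(self):
        self.x, self.y, self.speed = 320.0, 240.0, 4.0
        self.weapons = ["vulcan", "laser", "missile"]  # placeholder names
        self.weapon = self.weapons[0]
        self.firing = False
        self.health = 100

def soak_step(player, stage):
    """One frame of soak-test input: random movement clamped to the
    stage, random weapon switching and fire, and health forced high
    so receive-damage code still runs but the player never dies."""
    player.x = max(0.0, min(stage.width,
                            player.x + random.uniform(-1, 1) * player.speed))
    player.y = max(0.0, min(stage.height,
                            player.y + random.uniform(-1, 1) * player.speed))
    if random.random() < 0.1:                  # occasional weapon swap
        player.weapon = random.choice(player.weapons)
    player.firing = random.random() < 0.5      # fire roughly half the time
    player.health = max(player.health, 1000)   # damage runs, death doesn't
```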

For a first person shooting game, more complicated AI is necessary, but we should still simplify where appropriate. Similarly we can choose random weapons and shoot randomly at enemies (with some variance to emulate player inaccuracy). We can simplify by forcing player rotation and position. To get good map coverage we can simply pick random locations of interest (items, enemies, checkpoints) and fly along path-finding routes to get to them.

It is important to simplify past areas that are likely to result in 'stuck' situations. These situations are the biggest hindrance to effective soak-testing, so brute-force whatever is necessary to get past them.

Providing a way for your game to run at a higher speed allows for much more effective testing. You can disable vsync or frame-time waits and have your game run as fast as possible. Core rendering can be disabled to speed things up much more, but this will of course reduce testing of the rendering system. Reducing screen resolution to a few pixels is another easy win depending on your performance bottlenecks.

A further simplified stage-cycling system can get a lot of coverage with little programming effort. Set up your game to load a random stage, wait a second, and then do it again. This is a good first step for your soak-testing system, and can catch a surprising number of bugs. Try hooking this up to run automatically after every build commit to catch errors in a timely fashion.