After checking out the initial capabilities of the robot and testing some basic actions with the SDK tool for choreographing activities, I dived into programming custom behavior. The SDK uses a build system called qibuild, which is based on CMake (setup notes and helper scripts here: installation, environment).
For C++ work qibuild is necessary, but development in Python is much less involved, and for simpler projects I would certainly recommend that approach.
Once everything was building, I ran a few basic tests of speech recognition and output, collected some sensor data, and tried some simple body-part movements. The API is quite straightforward to use, though some care is needed to avoid performance issues, particularly when calling it remotely over TCP/IP.
With some basic C++ code running, I started integrating game engine code to get GameMonkey Script and a quick prototyping/rendering environment. Having this available will make visualizing information structures much simpler. Here you can see debug UIs displaying sensor data, with the camera feed rendered as an OpenGL texture.
|Camera Render + Sensor Data|
Working from there, I started looking at some computer vision research and running image filtering tests. There is a lot of research on this topic, and fairly robust solutions exist for many vision processing problems. Image filtering will quickly eat up your CPU, but we aren't working under the constraints of a 60 Hz frame budget, so there is a lot more leeway than in game programming.
NAO's CPU is an Intel Atom, which supports SIMD instructions up to SSE3. I re-implemented some filtering tests using the glm vector library and got solid performance boosts from vectorizing the core image processing algorithms. I highly recommend this library: it has a nice OpenGL-like API and good vector swizzling support.
|Sobel + Bilateral Filter Tests|
A solid world model is necessary to give NAO a natural level of awareness. Since NAO has neither a stereo camera nor a depth camera, we need a robust depth-map generation technique from which to build a 3D voxel world. This will provide a strong core data set on which to build general navigation, memory, and location awareness.
Next time I'll write about some more image transforms and OpenCV experiments!