Using the XNA Remote Performance Monitor to gather more detailed information, I saw that garbage collections were happening quite frequently. Digging into this with the CLR Profiler, I noticed a few things allocating every frame. The 360 collects garbage every time 1MB is allocated, so keeping the total allocation under 1MB during a stage would keep us free from any GC-related performance problems.
Firstly, the framerate counter text was being generated with String.Format(), which creates garbage objects on every call. A few object allocations per frame add up very fast, so avoiding this is vital to keep the next garbage collection away. Caching all the strings so that they are only recreated when the value changes removed most of the issues from the HUD.
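A minimal sketch of the string caching idea, assuming an integer framerate value. The FpsLabel class and its members are hypothetical names for illustration, not the game's actual HUD code:

```csharp
using System;

// Hypothetical HUD helper (illustrative names, not from the original game).
public sealed class FpsLabel
{
    private int lastFps = -1;
    private string text = string.Empty;

    // Returns the cached string and rebuilds it only when the value changes,
    // so a steady framerate produces no new string allocations.
    public string GetText(int fps)
    {
        if (fps != lastFps)
        {
            lastFps = fps;
            text = "FPS: " + fps.ToString(); // allocates only on change
        }
        return text;
    }
}
```

Calling GetText() every frame from the draw code then hands back the same string instance until the framerate actually changes.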
Secondly, a GetMethod() call in the collision resolution was causing an allocation -- given the number of colliding entities this was generating a lot of garbage objects! A simple brute-force approach of caching collision methods in a Dictionary means GetMethod() is called at most once for any collision pair type.
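Roughly, the cache can look like the sketch below. It assumes collision handlers are public methods named OnCollision that take the other entity as their only parameter. That convention, and the Player and Enemy types, are made up for the example. A nested dictionary is used so that a lookup does not need to allocate a key object:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical entity types, only here to make the sketch runnable.
public sealed class Enemy { }
public sealed class Player { public void OnCollision(Enemy other) { } }

public static class CollisionMethodCache
{
    private static readonly Dictionary<Type, Dictionary<Type, MethodInfo>> cache =
        new Dictionary<Type, Dictionary<Type, MethodInfo>>();

    // Counts real reflection lookups, purely for illustration.
    public static int ReflectionLookups;

    public static MethodInfo Get(Type a, Type b)
    {
        Dictionary<Type, MethodInfo> inner;
        if (!cache.TryGetValue(a, out inner))
        {
            inner = new Dictionary<Type, MethodInfo>();
            cache[a] = inner;
        }

        MethodInfo method;
        if (!inner.TryGetValue(b, out method))
        {
            // Reached only the first time this (a, b) pair is seen.
            ReflectionLookups++;
            method = a.GetMethod("OnCollision", new[] { b });
            inner[b] = method; // a null result is cached as well
        }
        return method;
    }
}
```

After the first collision between two given types, every later lookup for that pair is just two dictionary reads.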
The next target was all of the LINQ usage throughout the codebase. LINQ is significantly slower than an equivalent hand-written loop and allocates a few objects per query, so keep it far away from anything running every frame.
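As an illustration of the kind of replacement involved, here is a sketch that swaps a LINQ Count() for an indexed loop. The Entity type and IsActive field are hypothetical stand-ins, not the game's real classes:

```csharp
using System.Collections.Generic;

// Hypothetical entity type for the example.
public sealed class Entity
{
    public bool IsActive;
}

public static class EntityQueries
{
    // LINQ version this replaces: entities.Count(e => e.IsActive)
    // That form typically allocates an enumerator, plus a closure object
    // if the lambda captures local variables.
    public static int CountActive(List<Entity> entities)
    {
        int count = 0;
        // Indexing a List<T> in a for loop does not allocate.
        for (int i = 0; i < entities.Count; i++)
        {
            if (entities[i].IsActive)
            {
                count++;
            }
        }
        return count;
    }
}
```

It is more typing than the one-liner, but nothing is allocated per call, which is what matters for per-frame code.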
Using C# and XNA you can easily generate a lot of garbage without realizing it. The CLR Profiler has a very simplistic interface, but it can give you amazingly detailed data. You can dig through allocations over time, view the entire heap with each allocation shown by type, and see who allocated any block of memory. Very useful!
Here is the current histogram showing the allocations over a minute of entering the game and playing a little bit of a stage:
The large noticeable allocations are texture data, plus a large vertex buffer and sprite vertex allocation made by the XNA internal rendering system. Although this shows that a lot of the previous unnecessary allocations have been removed, it still shows room for improvement.
The dictionary of cached collision methods is still rather large -- we can write something to iterate all game entities in the assembly and precache the collision map before the game starts.
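One possible shape for that precaching step is sketched below. It assumes every entity derives from a common base class and that handlers follow an OnCollision naming convention. GameEntity, Ship, Rock and CollisionMap are all invented for the example, and it also assumes Assembly.GetTypes() is usable on the target platform:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical entity hierarchy, only here to make the sketch runnable.
public abstract class GameEntity { }
public sealed class Rock : GameEntity { }
public sealed class Ship : GameEntity { public void OnCollision(Rock other) { } }

public static class CollisionMap
{
    // Builds the whole collision map once, e.g. during loading.
    public static Dictionary<Type, Dictionary<Type, MethodInfo>> Build(Assembly assembly)
    {
        // Collect every concrete entity type in the assembly.
        var entityTypes = new List<Type>();
        foreach (Type t in assembly.GetTypes())
        {
            if (!t.IsAbstract && typeof(GameEntity).IsAssignableFrom(t))
            {
                entityTypes.Add(t);
            }
        }

        // Resolve a handler for every ordered pair up front.
        var map = new Dictionary<Type, Dictionary<Type, MethodInfo>>();
        foreach (Type a in entityTypes)
        {
            var inner = new Dictionary<Type, MethodInfo>();
            foreach (Type b in entityTypes)
            {
                MethodInfo handler = a.GetMethod("OnCollision", new[] { b });
                if (handler != null)
                {
                    inner[b] = handler; // store only pairs that have a handler
                }
            }
            map[a] = inner;
        }
        return map;
    }
}
```

All of the reflection garbage is then produced during loading, where a collection does not hurt, instead of in the middle of a stage.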
Given that there are not so many textures in this game, we can pre-load all of them at the start of the game (and find a way to precache the XNA sprite vertex data too), to keep all those allocations together.
There are better ways to architect your game so that it never has these sorts of GC problems, but if you just want to get things done fast then a few simple things like this can let you keep the GC at bay without spending all of your time writing things that aren't the game itself.