Here's a quick snippet for any devs still using Unity 3.5 who want the equivalent of the Profiler.GetMonoHeapSize() function newly added in Unity 4. It calls the Mono runtime's mono_gc_get_heap_size() function directly via P/Invoke. Tested on PC and Android.

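using System.Runtime.InteropServices;

static class ProfileUtils
{
    // Returns the size of the Mono managed heap in bytes, like Unity 4's
    // Profiler.GetMonoHeapSize(). The cast truncates heaps larger than 4 GB.
    public static uint GetMonoHeapSize()
    {
        return (uint)mono_gc_get_heap_size();
    }

    // On iOS the Mono runtime is linked statically into the app, so its
    // functions are imported from "__Internal"; elsewhere from the mono library.
#if UNITY_IPHONE && !UNITY_EDITOR
    [DllImport("__Internal")]
#else
    [DllImport("mono")]
#endif
    static extern System.Int64 mono_gc_get_heap_size();
}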
Category: Programming

During development of Slingshot Racing we found that the CPU was one of our main bottlenecks, particularly on the 600 MHz ARM Cortex-A8 in the iPhone 3GS. We made a number of optimisations to save CPU time, including but not limited to:

  • Dropping the fixed (physics) update rate to 30Hz.
  • Reducing the complexity of the physics scene.
  • Batching aggressively to reduce the number of draw calls.
  • Making both high and low level optimisations to our own script code.

This wasn't enough, so we still needed to find further ways to save CPU time. We don't have access to Unity's source code, so the only thing we could really optimise was our own script code. Fortunately, there were plenty of hotspots to target, as we'd written a number of bespoke systems for our game. Earlier attempts to optimise these scripts had made it clear that while high-level optimisations help, low-level optimisations in C# tended to produce less readable code and disappointing results compared with what we'd expect from C++. This was especially true of functions heavy in floating-point maths.

The solution was to bite the bullet and start porting parts of our C# scripts to C++. C# has an excellent mechanism called P/Invoke for calling native functions, but it was really important to ensure that the C#-to-native call overhead wouldn't wipe out the gains. The trick is to arrange data in C# so that it can be passed directly by pointer to the native code. Although C# makes it possible to specify a layout for classes, to keep things simple and transparent (and to avoid any surprise overhead) we stuck to structs, and to arrays of structs or simple types.

The code samples rely on a helper class to handle the fact that the import library name is "__Internal" on iOS (where our native code is linked into the app as a static library) and the dynamic library name on PC, Mac and Android. Our C# code imports the correct namespace based on the UNITY_IPHONE scripting define.

// SlingshotImport.cs
// Define the import library name to use for code in each namespace.

namespace SlingshotStaticLib
{
    public class SlingshotImport
    {
        public const string c_libName = "__Internal";
    }
}
namespace SlingshotDynamicLib
{
    public class SlingshotImport
    {
        public const string c_libName = "SlingshotNative";
    }
}
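On the native side, the same C++ source has to build both as a static library linked into the iOS app and as the SlingshotNative dynamic library elsewhere. Here is a minimal sketch of one way to mark the exports, assuming MSVC on Windows and GCC/Clang everywhere else; SLINGSHOT_EXPORT and AddInts are hypothetical names, not code from Slingshot Racing:

```cpp
// Hypothetical export macro so one C++ source tree can build both as the
// static library linked into the iOS app and as the dynamic library used on
// PC, Mac and Android.
#if defined(_WIN32)
    // Windows DLLs only export symbols that are explicitly marked.
    #define SLINGSHOT_EXPORT extern "C" __declspec(dllexport)
#else
    // GCC/Clang: keep the symbol visible even when building with
    // -fvisibility=hidden. Harmless in a static library.
    #define SLINGSHOT_EXPORT extern "C" __attribute__((visibility("default")))
#endif

// extern "C" turns off C++ name mangling, so the exported symbol is just
// "AddInts", which is the name a [DllImport] declaration looks up by default.
SLINGSHOT_EXPORT int AddInts(int a, int b)
{
    return a + b;
}
```

Because the symbol name is unmangled, a C# declaration such as "static extern int AddInts(int a, int b);" with [DllImport(SlingshotImport.c_libName)] would bind to it on every platform.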

C# makes passing references to structs (and other blittable types) both simple and efficient. There is no marshalling overhead in this call, and the C#-to-native transition is only a small part of the function's overall cost.

// Quaternion.cpp
#include "UnityStructs.h" // header with suitable definitions for Unity types

extern "C" void QuaternionMul(Vector3* v, Quaternion const* q)
{
    // fast NEON quaternion mul
    ...
}
// Quaternion.cs

using UnityEngine;
using System.Runtime.InteropServices;
#if UNITY_IPHONE
    using SlingshotStaticLib;
#else
    using SlingshotDynamicLib;
#endif

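// Not named Quaternion: a global class with that name would hide
// UnityEngine.Quaternion in every other script in the global namespace.
public static class NativeQuaternion
{
    [DllImport(SlingshotImport.c_libName)]
    public static extern void QuaternionMul(ref Vector3 v, ref Quaternion q);
}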
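The body of QuaternionMul is elided above. For reference, here is a plain scalar sketch (not the NEON version) together with one possible UnityStructs.h. It assumes QuaternionMul rotates v in place by q, the same as Unity's q * v, and that the native structs mirror Unity's field order (x, y, z for Vector3; x, y, z, w for Quaternion). The static_asserts are there because a layout mismatch would silently corrupt data passed by pointer.

```cpp
// Hypothetical contents of UnityStructs.h: plain structs whose memory layout
// matches UnityEngine.Vector3 (x, y, z) and UnityEngine.Quaternion (x, y, z, w).
struct Vector3 { float x, y, z; };
struct Quaternion { float x, y, z, w; };

static_assert(sizeof(Vector3) == 3 * sizeof(float), "Vector3 must be 12 bytes");
static_assert(sizeof(Quaternion) == 4 * sizeof(float), "Quaternion must be 16 bytes");

static Vector3 Cross(Vector3 a, Vector3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Scalar reference version, assuming QuaternionMul rotates *v in place by *q.
// With u = (q.x, q.y, q.z) and t = 2 (u x v): v' = v + w t + u x t.
extern "C" void QuaternionMul(Vector3* v, Quaternion const* q)
{
    const Vector3 u = { q->x, q->y, q->z };
    const Vector3 uv = Cross(u, *v);
    const Vector3 t = { 2.0f * uv.x, 2.0f * uv.y, 2.0f * uv.z };
    const Vector3 c = Cross(u, t);
    v->x += q->w * t.x + c.x;
    v->y += q->w * t.y + c.y;
    v->z += q->w * t.z + c.z;
}
```

A scalar version like this is also handy as a correctness reference when testing a hand-written NEON path.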

It's passing arrays from C# to C++ that starts to get tricky. The C# routines that were particularly expensive were those processing arrays of vertices or particles, so it was important to get this right. In theory, C# should also make passing arrays of structs simple and efficient, as the runtime should be able to obtain a pointer to the first array element and pass that to C++. In practice, my profiling showed that the Mono runtime on iOS instead performs hugely expensive marshalling copies of the whole array. For one subsystem the time wasted doing this was around 10 ms, making this approach completely impractical. Here are the declarations:

// Rope.cpp

extern "C" void UpdateRopeParticles(
    Vector3* oldNew,
    Vector3 const* curr,
    int numParticles)
{
    // perform integration and constraint step for rope particles
    ...
}
// Rope.cs

public static class Rope
{
    [DllImport(SlingshotImport.c_libName)]
    public static extern void UpdateRopeParticles(
        [In,Out] Vector3[] oldNew,
        [In] Vector3[] curr,
        int numParticles);
}
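The rope update body is also elided. The oldNew/curr naming suggests position Verlet integration, which would explain why oldNew is [In,Out] and curr only [In]. The sketch below is written under that assumption, covers only the integration half (no constraint step), and uses hypothetical gravity and time step constants:

```cpp
// Minimal stand-in for the Unity Vector3 layout (see UnityStructs.h).
struct Vector3 { float x, y, z; };

// Hypothetical constants; the real project's values are not known.
static const float kTimeStep = 1.0f / 30.0f;  // assumes the 30 Hz fixed update
static const float kGravityY = -9.81f;

// Integration step only, assuming position Verlet:
// oldNew holds previous positions on entry and new positions on exit.
extern "C" void UpdateRopeParticles(
    Vector3* oldNew,
    Vector3 const* curr,
    int numParticles)
{
    const float accelTerm = kGravityY * kTimeStep * kTimeStep;
    for (int i = 0; i < numParticles; ++i)
    {
        Vector3& p = oldNew[i];
        const Vector3& c = curr[i];
        // x_next = 2 * x_curr - x_prev + a * dt^2
        p.x = 2.0f * c.x - p.x;
        p.y = 2.0f * c.y - p.y + accelTerm;
        p.z = 2.0f * c.z - p.z;
    }
}
```

Writing the result back into oldNew avoids a third array, which matters when every byte crossing the C#/C++ boundary might be copied.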

One way to avoid the marshalling overhead is to use pointers in C#. However, this requires the 'unsafe' keyword, which is not supported in Unity C# scripts. Instead, we compiled the unsafe C# code into a separate assembly and copied it to the Assets/Plugins directory, where Unity picks it up like any other script code. To cope with the different import library names, this code is actually compiled twice, into two assemblies using different namespaces. The assembly's C# code looks like this:

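// Rope.cs
// Compiled twice: once with STATIC_LIB defined and once without.
// Requires compiling with the -unsafe compiler option.

using System.Runtime.InteropServices;
using UnityEngine;
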
#if STATIC_LIB
    namespace SlingshotStaticLib {
#else
    namespace SlingshotDynamicLib {
#endif

public static class Rope
{
    // C# function declaration to match C++
    [DllImport(SlingshotImport.c_libName)]
    static unsafe extern void UpdateRopeParticlesUnsafe(
        Vector3* oldNew,
        Vector3* curr,
        int numParticles);

    // C# wrapper function for Unity C# scripts (public so they can call it)
    public static void UpdateRopeParticles(
        Vector3[] oldNew,
        Vector3[] curr,
        int numParticles)
    {
        unsafe
        {
            fixed (Vector3* oldNewPtr = oldNew)
            fixed (Vector3* currPtr = curr)
            {
                UpdateRopeParticlesUnsafe(oldNewPtr, currPtr, numParticles);
            }
        }
    }
}

} // end namespace

Alternatively, if building a wrapper assembly seems like too much work, then hold on to your hat and simply pass a reference to the first array element. This abuse of C# appears to work correctly and efficiently (at least for our use cases), but requires making some assumptions about how the runtime is implemented. My investigations and discussions suggest this is probably fine, but I'd be interested to know of any situations where this would go wrong, so feel free to comment. Here's the C# code:

public class Rope
{
    ...

    // C# function declaration to match C++
    [DllImport(SlingshotImport.c_libName)]
    static extern void UpdateRopeParticlesByRef(
        ref Vector3 oldNew,
        ref Vector3 curr,
        int numParticles);

    // C# wrapper function for Unity C# scripts
    public static void UpdateRopeParticles(
        Vector3[] oldNew,
        Vector3[] curr,
        int numParticles)
    {
        // oldNew[0] would throw on an empty array, so skip the call entirely.
        if (numParticles > 0)
        {
            UpdateRopeParticlesByRef(ref oldNew[0], ref curr[0], numParticles);
        }
    }

    ...
}
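P/Invoke looks up native functions by the C# method name unless the DllImport attribute's EntryPoint field says otherwise, so the two wrappers above need native exports named UpdateRopeParticlesUnsafe and UpdateRopeParticlesByRef. Both receive plain Vector3 pointers (for the ref version, assuming the runtime passes the address of element 0, as observed above), so a sketch of the native side can forward both to one implementation. The implementation here is a hypothetical stand-in (force-free Verlet) just to give the forwarding something observable:

```cpp
// Minimal stand-in for the Unity Vector3 layout (see UnityStructs.h).
struct Vector3 { float x, y, z; };

// Shared implementation. Hypothetical stand-in body: force-free position
// Verlet, x_next = 2 * x_curr - x_prev, so the exports have a visible effect.
static void UpdateRopeParticlesImpl(Vector3* oldNew, Vector3 const* curr, int numParticles)
{
    for (int i = 0; i < numParticles; ++i)
    {
        oldNew[i].x = 2.0f * curr[i].x - oldNew[i].x;
        oldNew[i].y = 2.0f * curr[i].y - oldNew[i].y;
        oldNew[i].z = 2.0f * curr[i].z - oldNew[i].z;
    }
}

// Target of the 'unsafe' wrapper: fixed() hands over pointers to element 0.
extern "C" void UpdateRopeParticlesUnsafe(Vector3* oldNew, Vector3 const* curr, int numParticles)
{
    UpdateRopeParticlesImpl(oldNew, curr, numParticles);
}

// Target of the 'ref first element' wrapper: each ref Vector3 arrives as a
// Vector3 pointer, assumed to be the address of element 0 inside the array.
extern "C" void UpdateRopeParticlesByRef(Vector3* oldNew, Vector3 const* curr, int numParticles)
{
    UpdateRopeParticlesImpl(oldNew, curr, numParticles);
}
```

Alternatively, setting EntryPoint = "UpdateRopeParticles" on each C# DllImport would bind the differently named declarations to a single native export.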

In conclusion, porting expensive C# routines to C++ gave us performance improvements of up to 25x in those routines, effectively eliminating those bottlenecks. A future article will focus on tips for getting the best performance out of ARM processors using C++ and NEON intrinsics, and on a few reasons why it's possible to beat C# by such large margins!

Category: Tech

For Slingshot Racing we wanted moving objects in the scene (such as the vehicles) to look correctly lit with respect to their surroundings. This meant they needed shadows and, if possible, ambient occlusion (AO) and indirect lighting. The dynamic lighting techniques available in Unity (e.g. shadow maps and SSAO) were too computationally expensive to run on current mobile hardware, so we needed to write a custom solution*.

Unity has an excellent lightmapping tool, so we used this as a starting point. The initial idea was to allocate a section of lightmap for use on dynamic objects and sample it at runtime in our shaders. Fortunately, all our dynamic objects stay close to the splines we use for our track geometry. This constraint allowed us to use lightmap space efficiently by mapping rectangles of lightmap space onto the track splines, with the U axis running across the width of the track and the V axis running along its length.

To get Unity to allocate and render these sections of lightmap, we modified Unity's light baking process using a custom editor script. This script starts by creating track spline meshes with the lightmap UV mapping. These meshes are then positioned about 1 m above the surface of the actual track, since most of our dynamic objects sit slightly above the track. To these we apply a fully transparent material, so the geometry receives light but does not occlude or reflect it. All normal vectors in the meshes point up, since the most significant lighting contributions come from the upper hemisphere of our scenes. We then run Unity's built-in lightmap baking process to generate the track lighting data. When this is complete, we extract the information showing where the lightmapper rendered the meshes, store it in the track spline object and delete the meshes.
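The mesh-building step can be sketched as below. The original was a Unity editor script in C#; this is C++ for consistency with the other examples, and the spline representation (SplineSample) and every name are hypothetical. It follows the mapping described above: U across the track width, V along its length, every normal pointing up, and the strip lifted a fixed height along world up (+y in Unity).

```cpp
#include <cstddef>
#include <vector>

struct Vector3 { float x, y, z; };
struct Vector2 { float x, y; };

// One sample along the track spline (hypothetical representation).
struct SplineSample
{
    Vector3 centre;  // point on the track centre line
    Vector3 right;   // unit vector across the track, pointing right
};

struct LightingMesh
{
    std::vector<Vector3> positions;
    std::vector<Vector3> normals;
    std::vector<Vector2> uvs;   // lightmap UVs: U across the track, V along it
    std::vector<int> indices;   // triangle list, two triangles per segment
};

// Builds a strip following the spline, lifted 'height' above the track.
LightingMesh BuildTrackLightingMesh(const std::vector<SplineSample>& samples,
                                    float width, float height)
{
    LightingMesh mesh;
    const std::size_t n = samples.size();
    if (n < 2)
        return mesh;  // need at least one segment

    for (std::size_t i = 0; i < n; ++i)
    {
        const SplineSample& s = samples[i];
        const float v = static_cast<float>(i) / static_cast<float>(n - 1);
        for (int side = 0; side < 2; ++side)  // 0 = left edge, 1 = right edge
        {
            const float offset = (side == 0 ? -0.5f : 0.5f) * width;
            mesh.positions.push_back({ s.centre.x + s.right.x * offset,
                                       s.centre.y + s.right.y * offset + height,
                                       s.centre.z + s.right.z * offset });
            mesh.normals.push_back({ 0.0f, 1.0f, 0.0f });  // all normals point up
            mesh.uvs.push_back({ static_cast<float>(side), v });
        }
    }

    for (std::size_t i = 0; i + 1 < n; ++i)
    {
        const int a = static_cast<int>(2 * i);  // left edge, this sample
        const int b = a + 1;                    // right edge, this sample
        const int c = a + 2;                    // left edge, next sample
        const int d = a + 3;                    // right edge, next sample
        mesh.indices.insert(mesh.indices.end(), { a, c, b, b, c, d });
    }
    return mesh;
}
```

Unity's lightmapper then packs these UVs into a lightmap; in Unity the placement of each renderer's lightmap UVs is described by a scale and offset, which is the kind of information the script would store on the track spline object.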

A track lighting mesh in the Unity editor

At runtime we use the information from the lightmapper to construct a matrix that maps vertices on a mesh to points within the track lightmap**. We then use this transform to sample the lightmap and get a lighting value per pixel. Because of the way we construct the geometry in our baking process, these lighting values are less accurate than those other lighting techniques would give. Most significantly, we assume all normal vectors point up, which makes the shading look too flat. To improve this, we modulate the lighting value by a directional component based on the dot product of the main light direction and the surface normal of the object. This technique proved both convincing and fast enough to run on all our target hardware, including the iPhone 3GS. Its cost is independent of the number of lights, so in our night races we can efficiently light our vehicles with multiple shadowed spot lights.
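The exact matrix and modulation formula aren't given here, so the following is one plausible construction rather than the shipped shader code, written in C++ instead of a shading language. It assumes the lightmapper's placement is a scale and offset (scale in xy, offset in zw, the convention Unity uses for lightmap UVs) and builds an affine map that is only valid near one spline point, in the spirit of the "local approximation" footnote. directionalWeight is a hypothetical tuning parameter.

```cpp
// Minimal stand-ins for the Unity types.
struct Vector3 { float x, y, z; };
struct Vector2 { float x, y; };

static float Dot(Vector3 a, Vector3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Affine map from world position to lightmap UV: uv = M * (x, y, z, 1).
struct LightmapMatrix { float m[2][4]; };

// Local approximation around one point on the track spline.
//   origin, right, fwd : centre-line point and unit vectors across/along the track
//   width              : track width, so U spans 0..1 across it
//   vAtOrigin          : normalised V of 'origin'; trackLength: total spline length
//   scaleOffset        : placement reported by the lightmapper (xy scale, zw offset)
LightmapMatrix BuildLightmapMatrix(Vector3 origin, Vector3 right, Vector3 fwd,
                                   float width, float vAtOrigin, float trackLength,
                                   const float scaleOffset[4])
{
    // u = (dot(p - origin, right) / width + 0.5) * scaleU + offsetU
    // v = (dot(p - origin, fwd) / trackLength + vAtOrigin) * scaleV + offsetV
    const float su = scaleOffset[0] / width;
    const float sv = scaleOffset[1] / trackLength;
    LightmapMatrix M = {};
    M.m[0][0] = right.x * su;
    M.m[0][1] = right.y * su;
    M.m[0][2] = right.z * su;
    M.m[0][3] = (0.5f - Dot(origin, right) / width) * scaleOffset[0] + scaleOffset[2];
    M.m[1][0] = fwd.x * sv;
    M.m[1][1] = fwd.y * sv;
    M.m[1][2] = fwd.z * sv;
    M.m[1][3] = (vAtOrigin - Dot(origin, fwd) / trackLength) * scaleOffset[1] + scaleOffset[3];
    return M;
}

Vector2 ToLightmapUV(const LightmapMatrix& M, Vector3 p)
{
    return { M.m[0][0] * p.x + M.m[0][1] * p.y + M.m[0][2] * p.z + M.m[0][3],
             M.m[1][0] * p.x + M.m[1][1] * p.y + M.m[1][2] * p.z + M.m[1][3] };
}

// The baked value assumes an upward normal; re-introduce some shape from the
// real normal. directionalWeight: 0 = baked value only, 1 = full N.L falloff.
float ModulateBakedLight(float baked, Vector3 normal, Vector3 toLight, float directionalWeight)
{
    float nDotL = Dot(normal, toLight);
    if (nDotL < 0.0f)
        nDotL = 0.0f;
    return baked * ((1.0f - directionalWeight) + directionalWeight * nDotL);
}
```

Because the matrix is a single affine transform, it can be computed once per object on the CPU and applied per vertex in the shader, leaving only the lightmap fetch and the N.L modulation per pixel.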

A comparison of our shader (right) and the standard diffuse shader (left), with (top) and without (bottom) diffuse textures. Lighting is sampled per pixel to ensure accurate shadowing.

As an extension to this system, we added a lower-resolution copy of the track lightmaps that could be accessed from script. This allowed us to calculate a lighting value at any point on the track and use it to tint particle effects, skid marks and other special effects to improve their lighting.
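A script-side lookup into the low-resolution copy could be as simple as a bilinear sample. This sketch assumes a single-channel grid and maps UV 0 and 1 onto the first and last texels (a simplification of GPU texel-centre conventions); LowResLightmap is a hypothetical type:

```cpp
#include <vector>

// Hypothetical script-side copy of a track lightmap: one brightness value per
// texel, stored row-major. A real copy would likely hold RGB values.
struct LowResLightmap
{
    int width;
    int height;
    std::vector<float> texels;  // width * height entries

    float At(int x, int y) const { return texels[y * width + x]; }

    static float Clamp(float value, float lo, float hi)
    {
        return value < lo ? lo : (value > hi ? hi : value);
    }

    // Bilinear sample at lightmap UV. UV 0 and 1 land on the first and last
    // texels; coordinates outside [0, 1] are clamped to the edge.
    float Sample(float u, float v) const
    {
        const float fx = Clamp(u, 0.0f, 1.0f) * static_cast<float>(width - 1);
        const float fy = Clamp(v, 0.0f, 1.0f) * static_cast<float>(height - 1);
        const int x0 = static_cast<int>(fx);
        const int y0 = static_cast<int>(fy);
        const int x1 = x0 + 1 < width ? x0 + 1 : x0;
        const int y1 = y0 + 1 < height ? y0 + 1 : y0;
        const float tx = fx - static_cast<float>(x0);
        const float ty = fy - static_cast<float>(y0);
        const float top = At(x0, y0) * (1.0f - tx) + At(x1, y0) * tx;
        const float bottom = At(x0, y1) * (1.0f - tx) + At(x1, y1) * tx;
        return top * (1.0f - ty) + bottom * ty;
    }
};
```

The UV passed in would come from the same track-to-lightmap mapping used for rendering, so effects spawned anywhere on the track pick up the baked shadows.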

If you have constrained environments on mobile platforms, as we did in Slingshot Racing, this technique can give more accurate shadowing than Unity's more general light probe solution used on its own***.

* True at time of development. Unity now has a light probe solution that runs on mobile hardware.
** These matrices are local approximations of the actual transform but proved accurate enough not to cause any problems.
*** Unity 4 may give mobile developers the option to combine light probes with real time lighting and shadowing.

Category: Tech

We wanted the vehicles in Slingshot Racing to cast good-looking shadows (rather than just blobs), and we wanted this to work on all our supported hardware. On console or PC we'd build a generic solution using something like cascaded shadow mapping, but on mobile devices this can get very CPU and GPU intensive. An alternative is to ignore self-shadowing, render each shadow-casting object into a simple greyscale shadow map and project this onto the receiving geometry. Even this can get expensive, as we can have 4 cars on screen at any time.

We decided to go with a hybrid solution: we pre-calculated shadow maps for a range of angles and then projected the appropriate one onto the receiving geometry depending on the angle of the vehicle.

Here is the texture we used:

You can see that it is divided into 4 sections (one for each vehicle type) with 4 rows in each, giving a total of 64 angles for each vehicle type. In the shader we blend between the two closest shadow maps based on the vehicle angle to get a smooth transition; without this, you would see a visible pop whenever the vehicle rotated far enough to switch to a different shadow map. We packed the shadow maps for all the vehicles into a single texture so that all the shadows in the scene could be rendered in a single batch.
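The angle selection and blend can be sketched as follows, in C++ rather than shader code. The atlas layout is an assumption: the 4 vehicle sections are stacked vertically, each with 4 rows of 16 shadow maps (64 angles), and angle 0 is the first tile in a section. All names are hypothetical.

```cpp
#include <cmath>

// Hypothetical atlas layout: 4 vehicle sections stacked vertically,
// each holding 4 rows of 16 shadow maps.
const int kVehicleTypes = 4;
const int kRowsPerType = 4;
const int kColumns = 16;
const int kAnglesPerType = kRowsPerType * kColumns;  // 64

struct ShadowLookup
{
    int index0;   // pre-rendered angle at or below the vehicle's heading
    int index1;   // next angle round, wrapping 63 -> 0
    float blend;  // 0 = index0 only, 1 = index1 only
};

// yawRadians: vehicle heading, any value; wrapped into [0, 2*pi).
ShadowLookup SelectShadowMaps(float yawRadians)
{
    const float twoPi = 6.28318530718f;
    float a = std::fmod(yawRadians, twoPi);
    if (a < 0.0f)
        a += twoPi;
    const float f = a / twoPi * static_cast<float>(kAnglesPerType);
    const float whole = std::floor(f);
    const int i0 = static_cast<int>(whole) % kAnglesPerType;
    return { i0, (i0 + 1) % kAnglesPerType, f - whole };
}

// Top-left UV of one shadow map tile in the shared atlas texture.
void TileOrigin(int vehicleType, int angleIndex, float* u, float* v)
{
    const int row = vehicleType * kRowsPerType + angleIndex / kColumns;
    const int col = angleIndex % kColumns;
    *u = static_cast<float>(col) / kColumns;
    *v = static_cast<float>(row) / (kVehicleTypes * kRowsPerType);
}
```

The shader then samples both tiles and lerps by blend, so the shadow changes continuously as the vehicle turns.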

Unity is well suited to pre-calculation. Its extensive Editor features let you access all the game content and process it in many ways. This allowed us to create a new scene, load in each vehicle model, render the vehicles from all the angles and then pack all the images into a single texture, all within an automated process.

Another problem was calculating the geometry that the shadow intersects. In standard projected shadow mapping, all the polygons in the scene would be rendered in order to sample from the shadow map (although data structures can be used to optimise this process). We chose a faster approach: directly sampling the spline data for the track to quickly generate a best-fit quad that approximated the extent of the shadow cast onto the track. This was much faster and still gave good results for our smooth track surfaces.
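A sketch of the quad construction, assuming the track spline can be sampled for a surface point, a unit side vector and a unit surface normal at any distance along it (TrackFrame and the other names are hypothetical). Sampling at the rear and front of the shadow's extent lets the quad follow slopes and gentle curves; the small lift avoids z-fighting with the track surface.

```cpp
struct Vector3 { float x, y, z; };

// Hypothetical result of sampling the track spline at one distance.
struct TrackFrame
{
    Vector3 position;  // point on the track centre line, on the surface
    Vector3 right;     // unit vector across the track
    Vector3 up;        // unit surface normal
};

static Vector3 Offset(Vector3 p, Vector3 dir, float s)
{
    return { p.x + dir.x * s, p.y + dir.y * s, p.z + dir.z * s };
}

// lateralOffset: the vehicle's sideways offset from the centre line.
// halfWidth:     half the shadow's extent across the track.
// lift:          small raise along the surface normal.
// Corner order: rear-left, rear-right, front-right, front-left.
void BuildShadowQuad(const TrackFrame& rear, const TrackFrame& front,
                     float lateralOffset, float halfWidth, float lift,
                     Vector3 corners[4])
{
    const Vector3 rearCentre =
        Offset(Offset(rear.position, rear.right, lateralOffset), rear.up, lift);
    const Vector3 frontCentre =
        Offset(Offset(front.position, front.right, lateralOffset), front.up, lift);
    corners[0] = Offset(rearCentre, rear.right, -halfWidth);
    corners[1] = Offset(rearCentre, rear.right, halfWidth);
    corners[2] = Offset(frontCentre, front.right, halfWidth);
    corners[3] = Offset(frontCentre, front.right, -halfWidth);
}
```

Two spline samples per vehicle replace a search over the scene's polygons, which is where the saving over standard projected shadows comes from.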

Category: Tech
