Dealing with Scale in AR
During our talk at Unite Austin 2017, we brought up scaled content and why you would want it. We also discussed why you would not want to scale the content itself, but instead use “camera tricks” to do the scaling. We showed two options for scaling content: one uses a single camera, while the other uses two or more cameras. We want to provide the details of the implementations here.
Suppose you want to create a desktop game called Ragin’ Rooster, the story of an irradiated giant chicken that scratches and pecks cities apart. You could start with a city block environment from the Asset Store. You settle on one that has a 100 meter by 100 meter area for the city block. Next, you import a rooster asset into your city scene. A 0.3 meter high rooster isn’t going to terrorize anyone, so you set his scale to 100 on all three axes, making your rooster 30 meters high. Impressive!
How difficult is it to transition a third-person perspective game like this into an AR experience? When moving to any new platform, you always expect some common issues, such as user interface changes and different forms of input. AR has one additional aspect which you may not be anticipating: scale.
Transitioning the Ragin’ Rooster game over to ARKit / ARCore first involves importing the specific asset package for each AR technology into your game. With a few minor scene changes, you are ready to see how the game looks in AR. When you launch the game, you may only see a few giant pixels that take up most of the screen. Moving around, your screen becomes obscured with these giant pixels. No city. No rooster. What went wrong?
Unity’s editor works in distance units of meters. A 100 meter city block would take up 100 units. ARKit and ARCore also work natively in meters. Sounds like a perfect match. For some AR experiences, you may want to have assets that match up perfectly to the world. For example, a mannequin of your height showing you the latest fashions that you can walk around. For tabletop experiences, you will generally want your assets to size down to fit within the play area on your table. In this example, our Ragin’ Rooster city, when put into AR, is taking up 100 meters in physical space, just like a real city block. As most tables are around half a meter in size, we have a problem fitting our city onto the table.
As an experienced Unity developer, your first solution to scaling the city, rooster, and effects to fit on the table might be to place everything under one parent transform and scale that transform. Visually, the city and the rooster will scale down, but effects do not scale properly. Your rooster may still move at the original speed. In addition, gravity isn’t scaled, causing everything to fall at crazy fast speeds. You could spend a lot of time manually tweaking these areas to try and get it to match up, but if you want to support different table sizes and always have your city scale appropriately, this might become nearly impossible.
Fortunately, the new common AR scripting interfaces solve this problem without requiring you to manually adjust individual aspects within your game. We offer two different solutions depending upon your needs: one which is designed to apply a universal scale across the entire scene and one which allows different mixtures of scale.
The easiest thing to do is simply multiply all positional data coming from the device by a constant scale value. This includes the position of the device itself, as well as any data it generates, such as planes or feature points. Rotational information does not need to be scaled.
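As a minimal sketch of that idea, the snippet below scales the positional data of a plane while leaving its rotation untouched. It uses System.Numerics so it can run outside Unity (UnityEngine has equivalent vector and quaternion types), and `TrackedPlane` and `DeviceSpace` are hypothetical names for illustration, not types from ARKit, ARCore, or the AR interface project.

```csharp
using System.Numerics;

// Hypothetical container for a plane reported by the AR SDK (not a real SDK type).
public struct TrackedPlane
{
    public Vector3 Center;      // plane center, in meters
    public Quaternion Rotation; // plane orientation
    public Vector2 Size;        // plane extents (width, length), in meters
}

public static class DeviceSpace
{
    // Maps a device-space plane into content space with one uniform scale.
    // Positions and sizes are multiplied; the rotation is passed through unchanged.
    public static TrackedPlane ToContentSpace(TrackedPlane plane, float scale)
    {
        return new TrackedPlane
        {
            Center = plane.Center * scale,
            Rotation = plane.Rotation,
            Size = plane.Size * scale
        };
    }
}
```

The same multiplication would apply to the device position and to feature points, since they are all positions expressed relative to where tracking began.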
ARCore and ARKit produce information in meter scale, but it is relative to where it begins tracking. That is, (0, 0, 0) refers to the position at the time the app began. All the data it produces, e.g. planes, are also relative to this startup position. We’ll call this “device space.”
Unity also uses meters, but a common use case might be to scale those multi-meter tall objects down to the size of a few real-world centimeters. We’ll call this original space in which the scene was authored “content space.”
Multiplying positional data by a scale factor transforms from device space into content space. Scaling the content, by contrast, transforms content space into device space. The latter is much more difficult, since many systems within Unity change their behavior or are difficult to scale effectively.
For example, physical interactions will change under scale. Scaling down physics objects can cause instability, jitter, or other undesirable changes in behavior. Other systems, such as terrain and nav meshes, can’t be scaled or moved once created, so scaling them is not an option. Particle systems don’t have an overall scale factor, so the developer would need to tweak several individual settings in order to maintain the original look. It’s much easier to simply scale the camera.
For our Unite demos, we put the Unity Camera under a parent GameObject called “AR Root”. All device-generated GameObjects, such as planes, are instantiated as siblings to the camera GameObject. We can then apply scale and a positional offset to the AR Root GameObject, which will move and scale the camera and planes. At runtime, the hierarchy therefore consists of the AR Root GameObject with the camera and the generated planes as its children.
The AR Root GameObject should have an ARController component, which manages the lifecycle of the AR SDK. To set the scale, simply set the value of ARController.scale.
Say our scale factor is 10. When the app starts up, the device reports its position as (0, 0, 0), so our ARCamera will also be at (0, 0, 0). We step backward one meter in the real world, so the ARCamera now has a localPosition of (0, 0, -1). However, because the AR Root GameObject has a scale of (10, 10, 10), the camera’s world position will be (0, 0, -10). This means content at the origin will be ten times farther away from the camera, which makes it appear ten times smaller. However, it isn’t really smaller, so physics interactions remain the same, nav meshes continue to work as originally designed, etc.
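The arithmetic in this example can be sketched as a plain parent-to-world transform: scale the local position, rotate it, then add the parent's position. The snippet uses System.Numerics so it runs outside Unity, and `ArRootMath.LocalToWorld` is a hypothetical helper written for illustration; in Unity the transform hierarchy does this work for you.

```csharp
using System.Numerics;

public static class ArRootMath
{
    // World position of a child of AR Root, given the child's local position
    // and the root's position, rotation, and uniform scale.
    public static Vector3 LocalToWorld(
        Vector3 localPosition, Vector3 rootPosition, Quaternion rootRotation, float rootScale)
    {
        // Scale first, then rotate, then translate.
        Vector3 scaled = localPosition * rootScale;
        return rootPosition + Vector3.Transform(scaled, rootRotation);
    }
}
```

With the root at the origin, an identity rotation, and a scale of 10, a camera local position of (0, 0, -1) maps to a world position of (0, 0, -10), matching the numbers above.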
Both ARKit and ARCore provide APIs for hit testing against feature points and planes. However, they are not scale aware (that is, the hit test is performed in device space). We could transform from content space to device space, perform the hit test, then transform the result back into content space. For our needs, however, we simply add a mesh collider to our plane GameObjects and raycast against the mesh collider of the plane.
The plane prefab that we use has a plane primitive and mesh collider on it, arranged in such a way that unit scale will produce a 1×1 meter plane. Since the AR Root GameObject has a scale factor, it will also scale the planes. This means we can do normal physics raycasts without worrying about scale. For example, say we want to place an object on a plane when the user taps the screen. We can use code like this:
// Cast a ray from the tap position against the plane colliders only.
Ray ray = camera.ScreenPointToRay(Input.mousePosition);
RaycastHit rayHit;
if (Physics.Raycast(ray, out rayHit, float.MaxValue, planeLayerMask))
    m_ObjectToPlace.transform.position = rayHit.point;
See PlaceOnPlane.cs for a full example. Notice there is no special handling of scale here because the camera and planes are already in content space.
The above explains how to apply a particular scale factor, but what if you want to fit a specific asset or level geometry onto a surface whose size isn’t known until a plane is found? Imagine you have a complex scene, perhaps representing an entire city block, and you want to place it on a tabletop such that it exactly fits on the table. What scale factor should you choose? Simple! The correct scale factor is:
scale = levelSize / surfaceSize
Here, levelSize is the size (in meters) of the content, for example the length of one edge of its bounding box, and surfaceSize is the size (in real-world meters) of the plane, for example the length of its smallest dimension.
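A minimal sketch of that calculation, assuming the plane reports a width and a length in meters, might look like the following. `ScalePicker.MatchingScale` is a hypothetical helper written for illustration and is not taken from the sample project.

```csharp
using System;

public static class ScalePicker
{
    // levelSize: edge length of the content's bounding box, in meters.
    // planeWidth, planeLength: real-world extents of the detected plane, in meters.
    public static float MatchingScale(float levelSize, float planeWidth, float planeLength)
    {
        // Fit the content to the smaller of the two plane dimensions.
        float surfaceSize = Math.Min(planeWidth, planeLength);
        if (surfaceSize <= 0f)
            throw new ArgumentException("Plane extents must be positive.");
        return levelSize / surfaceSize;
    }
}
```

For the 100 meter city block on a plane measuring 0.5 by 0.8 meters, this gives a scale factor of 200.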
See MatchingScalePicker.cs for a complete example.
Positioning and orienting level geometry
Recall that some assets can’t be moved or rotated at runtime, e.g. terrain and nav meshes in Unity have this restriction. We know how to scale them at runtime, but what about positioning and orienting them? If you want to place your level geometry on a tabletop, the plane representing the table could be at any position and orientation. Since we can’t move the content, we instead have to position and orient the AR Root GameObject accordingly.
For example, if the center of our level geometry is at p1, and we want it to appear at p2 (say, a point on a detected plane), then we move the AR Root GameObject so that p2 ends up coinciding with p1. That is, rather than move the level geometry to p2, we shift the AR Root GameObject by the same amount in the opposite direction, p1 - p2. Since the planes and camera do not move relative to each other, it will appear as if only the level geometry has moved.
We can do the same thing with orientation. Since we cannot rotate the level geometry to match the orientation of the plane, we inverse rotate the AR Root GameObject. Another way to think about this is that if we want to rotate the level geometry in space, we actually orbit the AR Root around the level geometry in the opposite direction.
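The shift and the orbit can be sketched with a few lines of vector math. This is an illustration under stated assumptions rather than the ARController implementation: it uses System.Numerics so it runs outside Unity, the names `ArRootAlignment`, `ShiftRoot`, `OrbitPosition`, and `OrbitRotation` are hypothetical, and the target point is assumed to be a world-space position such as a raycast hit on a plane.

```csharp
using System.Numerics;

public static class ArRootAlignment
{
    // New root position that makes the world point `target` coincide with
    // the fixed content point `pointOfInterest`: shift by (p1 - p2).
    public static Vector3 ShiftRoot(Vector3 rootPosition, Vector3 pointOfInterest, Vector3 target)
    {
        return rootPosition + (pointOfInterest - target);
    }

    // New root position after orbiting around `pointOfInterest` by the
    // inverse of the rotation the content should appear to have.
    public static Vector3 OrbitPosition(Vector3 rootPosition, Vector3 pointOfInterest, Quaternion apparentRotation)
    {
        Quaternion inverse = Quaternion.Inverse(apparentRotation);
        return pointOfInterest + Vector3.Transform(rootPosition - pointOfInterest, inverse);
    }

    // New root orientation: the existing rotation followed by the inverse rotation.
    public static Quaternion OrbitRotation(Quaternion rootRotation, Quaternion apparentRotation)
    {
        return Quaternion.Concatenate(rootRotation, Quaternion.Inverse(apparentRotation));
    }
}
```

For instance, with the root at the origin, p1 = (5, 0, 5) and p2 = (1, 0, 2), the root moves to (4, 0, 3), which carries the tracked point that was at p2 onto p1.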
The ARController component has some helper methods to achieve these position and orientation changes:
public Vector3 ARController.pointOfInterest
The “point of interest” is the position in content space that we want to orbit the AR Root GameObject about. Typically, this is the pivot of the level geometry.
public Quaternion ARController.rotation
The rotation the content at pointOfInterest should appear to have. In reality, the AR Root GameObject rotates in the opposite direction by the same amount.
public void AlignWithPointOfInterest(Vector3 position)
Moves the AR Root GameObject such that the point of interest appears to be at “position”. This might be the result of a raycast against a plane.
See the DemoGUI.cs component for a code example.
Mixtures of Scale
The second option (two or more cameras) is outlined below; the code and project for it are available for the ARKit Plugin. It’s not available as part of the ARInterface example projects at this time, but will be ported over at a later date.
For the two-camera solution, we use one camera to keep track of the GameObjects in the current real-world coordinate system, e.g. the size and position of the table you are trying to display things on. Let’s call this camera the “TrackingDataCamera”, and keep it moving and rotating according to what the ARKit device does. It will also render any of the ARKit-generated GameObjects, like the debug planes and feature points.
The second camera, which we will call the “ContentCamera”, will have a parent transform that is positioned and rotated in such a way that rendering the content scene using this camera makes the scene (which, remember, is static) appear in the right place and at the right size. How is this achieved? We start with a “content anchor”, which is a point in real-world coordinates where you want your scaled scene to appear. We calculate the offset of our TrackingDataCamera from this point. We then multiply this offset by the inverse of the scale you desire for your scene (e.g. if you want your city scene to be 1/100 the size, we multiply the offset by 100). We then translate the parent transform of the ContentCamera to this multiplied offset from the anchor. The camera transform for the ContentCamera still follows the same position and rotation as the TrackingDataCamera, so its orientation matches what the device does. See the rough sketch for how the parent transform is moved.
How do you use this? In the UnityARContentScalingScene, replace the SkyscraperRoot GameObject with whatever scene objects contain your content. When you start the scene, it will look for a surface, and when you tap on the debug plane that describes the surface, it will use that hit position as the content anchor noted above. Your content will appear at that position. If you know what scale you want your content at, enter that value in the Content Scale parameter of the Content Scale Manager component on the ContentCameraParent GameObject. The GUI + and - buttons can help you find the right scale for your content if it isn’t correct.
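The offset arithmetic described above can be sketched as follows. This is one reading of the description, not the plugin's code: it assumes the content's pivot sits at the anchor position, it computes only the scaled viewpoint position and leaves out how that position is split between the parent transform and the camera's own transform, and `ContentCameraMath.ScaledCameraPosition` is a hypothetical name. System.Numerics is used so the snippet runs outside Unity.

```csharp
using System.Numerics;

public static class ContentCameraMath
{
    // anchor: content anchor in real-world (tracking) coordinates.
    // trackingCameraPosition: TrackingDataCamera position in the same coordinates.
    // contentScale: desired apparent scale of the content, e.g. 0.01 for 1/100 size.
    public static Vector3 ScaledCameraPosition(
        Vector3 anchor, Vector3 trackingCameraPosition, float contentScale)
    {
        // Offset of the tracking camera from the anchor.
        Vector3 offset = trackingCameraPosition - anchor;
        // Multiply by the inverse of the scale: 1/100 size means 100 times the offset.
        return anchor + offset / contentScale;
    }
}
```

With a content scale of 0.01, a device held half a meter above the anchor yields a viewpoint roughly 50 meters above it, so a full-size city block appears to fit on the table.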
The two camera (or multi-camera) solution can be useful in various situations:
- You have different pieces of content that need to be scaled, moved and/or rotated differently from one another (without actually touching any of the content itself)
- You need to maintain the ARKit generated objects at real-world scale, and have other objects appear to be scaled down
- You want to do special effects such as screen shake, where there are temporary changes to camera movement or scale
Join the Scaled Content Club!
As you can see, our AR experiences can be made more realistic and beautiful by taking content that has been authored in a specific scale and couching it in the real world at a different scale. We talked about some ways to do that without too much work on the authoring side, and provided two readymade solutions that you can use for your own apps. Please experiment with these and let us know how it works for you on the forums. Show us your results on Twitter — we can’t wait for the next #madewithunity Friday!