Use Unity’s computer vision tools to generate and analyze synthetic data at scale to train your ML models
Synthetic data alleviates the challenge of acquiring labeled data needed to train machine learning models. In this post, the second in our blog series on synthetic data, we will introduce tools from Unity to generate and analyze synthetic datasets with an illustrative example of object detection.
In our first blog post, we discussed the challenges of gathering a large volume of labeled images to train machine learning models for computer vision tasks. We also discussed state-of-the-art research from the likes of Google Cloud AI and OpenAI that demonstrated the efficacy of synthetic data for tasks such as object detection.
However, there are many intermediate steps between getting started with synthetic data and creating a dataset ready to train an ML model. In this process, developers often encounter similar problems and are compelled to write custom, one-off solutions that often do not yield the quality of data necessary to train a machine learning model. Today, we are introducing two new tools, the Unity Perception package and Dataset Insights, which remove a number of these redundant steps and make it easy to generate and analyze high-quality synthetic datasets.
Accelerating synthetic data creation with Unity Computer Vision tools
Unity Perception Package
The Unity Perception package enables a new workflow in Unity for generating synthetic datasets and supports both the Universal and High Definition Render Pipelines. In this first release, it provides tools for dataset capture and consists of four primary features: object labeling, labelers, image capture, and custom metrics. The package provides a simple interface for entering object-label associations, which are picked up automatically and fed to the labelers. A labeler uses this object information to generate ground truth data such as 2D bounding boxes or semantic segmentation masks. The produced ground truth is then captured, along with associated metrics, in JSON files.
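To make the capture output more concrete, here is a minimal Python sketch that reads 2D bounding boxes out of a JSON capture record. The layout and field names (`captures`, `annotations`, `values`, `label_name`, `x`, `y`, `width`, `height`) are assumptions made for illustration, not the package's documented schema, so check the Perception documentation for the actual format.

```python
import json

# Hypothetical capture record. The field names are illustrative assumptions,
# not the Perception package's documented schema.
SAMPLE_CAPTURES = """
{
  "captures": [
    {
      "filename": "rgb_1.png",
      "annotations": [
        {
          "type": "bounding_box_2d",
          "values": [
            {"label_name": "cereal_box", "x": 10, "y": 20, "width": 50, "height": 80},
            {"label_name": "paper_towels", "x": 100, "y": 40, "width": 60, "height": 90}
          ]
        }
      ]
    }
  ]
}
"""


def load_bounding_boxes(raw_json):
    """Return a flat list of (filename, label, x, y, width, height) tuples."""
    boxes = []
    for capture in json.loads(raw_json)["captures"]:
        for annotation in capture["annotations"]:
            if annotation["type"] != "bounding_box_2d":
                continue  # skip other ground-truth types, e.g. segmentation
            for value in annotation["values"]:
                boxes.append((
                    capture["filename"], value["label_name"],
                    value["x"], value["y"], value["width"], value["height"],
                ))
    return boxes


boxes = load_bounding_boxes(SAMPLE_CAPTURES)
for box in boxes:
    print(box)
```

Flattening the nested record into one row per box makes it straightforward to hand the result to a dataframe or a training data loader.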
In future releases, we plan to add more labelers such as instance segmentation to support other common computer vision tasks, tools for scene generation, the ability to configure and manage large sets of parameters for domain randomization, and scalability in the cloud.
Exploration and analysis of labeled data is critical for any ML practitioner. When working with synthetic data, the dataset size can become large very quickly due to the ability to generate millions of images with cloud-based simulation runs. With Dataset Insights, a Python package, we have made the process of computing statistics and generating insights from large synthetic datasets simple and efficient. It can consume the metrics, exported per frame locally or on our managed cloud service, and visualize statistics aggregated over the entire dataset.
In the next section, we will describe how we used the Unity Perception package and Dataset Insights to create synthetic datasets for training an object detection model that detects and labels a set of grocery products. The tools are designed to be general and extensible to other environments and computer vision tasks in the future, with the long-term goal of enabling more ML practitioners to adopt synthetic data and solve diverse problems.
3D asset creation
Recent research from Google Cloud AI uses 64 grocery products that are easily available in stores, such as cereal boxes and paper towels, to demonstrate the efficacy of object detection models trained purely on synthetic data. Inspired by this research, we chose an equal number of products that were either the same as or close approximations of the original products in size, shape, and textural diversity.
We created a library of 3D assets of the selected grocery products using Digital Content Creation (DCC) tools, scanned labels, and photogrammetry. Additionally, we created background and occluding assets using real-world imagery mapped onto simple primitives such as cubes, spheres, and cylinders. All of the grocery products used custom shaders created in the Unity Editor using Shader Graph, in the Universal Render Pipeline.
We defined the behaviors for placement of the 3D assets along with the background assets and other distractions in shape and texture to add complexity. Adding varied backgrounds helps ML models trained on this dataset to deal with a wide variety of backgrounds that can be encountered in the real world.
For each render loop, a new random placement of foreground, background, and occluding objects along with a randomization of lighting, object hue, blur and noise is generated. As shown below, the Perception package captures RGB images, object bounding boxes and other randomization parameters for each image in the dataset.
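The per-frame randomization described above can be sketched in Python as follows. This is only an illustration of the idea, not the Unity-side implementation: the parameter names, value ranges, and the `sample_frame_parameters` helper are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class FrameParameters:
    """Randomized settings for one rendered frame (all names hypothetical)."""
    foreground_objects: list
    light_azimuth_deg: float
    light_elevation_deg: float
    hue_offset: float
    blur_radius: float
    noise_level: float


def sample_frame_parameters(rng, product_names, max_objects=12):
    """Draw one independent set of randomization parameters."""
    count = rng.randint(1, min(max_objects, len(product_names)))
    return FrameParameters(
        # sample() draws without replacement, so no product repeats in a frame
        foreground_objects=rng.sample(product_names, count),
        light_azimuth_deg=rng.uniform(0.0, 360.0),
        light_elevation_deg=rng.uniform(10.0, 90.0),
        hue_offset=rng.uniform(-0.1, 0.1),
        blur_radius=rng.uniform(0.0, 2.0),
        noise_level=rng.uniform(0.0, 0.05),
    )


rng = random.Random(42)  # fixed seed so a run can be reproduced
products = [f"product_{i:02d}" for i in range(64)]
frames = [sample_frame_parameters(rng, products) for _ in range(5)]
for frame in frames:
    print(len(frame.foreground_objects), round(frame.light_azimuth_deg, 1))
```

Storing the sampled parameters alongside each image, as the Perception package does, is what later makes it possible to check the dataset for unintended patterns.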
Using Dataset Insights for exploratory data analysis
For large synthetic datasets, it is impractical to inspect the images manually and detect anomalies such as bias, missing objects, artifacts, or unintended patterns in placement and pose, any of which may result in sub-optimal performance of the ML model.
In this case, the dataset of objects generated previously was input to the Dataset Insights Python package for computing statistics and subsequently training the ML model. These insights proved to be very effective in ensuring that our image data was useful for the purpose of training an object detection model. Some examples of summary statistics that were generated for the dataset are shown below.
During our own testing, we encountered an anomaly in which some of the objects appeared multiple times in a frame. This became apparent from the chart visualizing the object count across image frames. We were then able to quickly fix the issue and ensure that our objects of interest were uniformly distributed in the dataset, thereby ensuring an equal likelihood of detection.
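A check of this kind can be sketched in a few lines of plain Python. The snippet below does not use the Dataset Insights API; the `frame_labels` data and both helper functions are hypothetical and only illustrate how repeated objects and per-label totals could be surfaced.

```python
from collections import Counter

# Hypothetical per-frame label lists. In practice these would come from the
# captured ground truth.
frame_labels = {
    "frame_000": ["cereal_box", "paper_towels", "soup_can"],
    "frame_001": ["cereal_box", "cereal_box", "soup_can"],  # repeated object
    "frame_002": ["paper_towels", "soup_can"],
}


def find_repeated_labels(frames):
    """Map each frame id to the labels that occur more than once in it."""
    repeated = {}
    for frame_id, labels in frames.items():
        counts = Counter(labels)
        duplicates = sorted(label for label, n in counts.items() if n > 1)
        if duplicates:
            repeated[frame_id] = duplicates
    return repeated


def label_totals(frames):
    """Count how often each label appears across the whole dataset."""
    totals = Counter()
    for labels in frames.values():
        totals.update(labels)
    return totals


repeated = find_repeated_labels(frame_labels)
totals = label_totals(frame_labels)
print(repeated)
print(totals.most_common())
```

Labels whose totals differ widely from the rest point to a non-uniform distribution of the kind described above.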
Although we want a uniform distribution of objects across the dataset, we expect the ML model to be able to detect multiple objects, whether the image has few or several objects of interest. The chart above shows that the number of labeled objects in each frame of the generated dataset follows a familiar normal distribution centered around seven objects per frame.
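As a rough illustration of the statistic behind such a chart, the following Python sketch summarizes objects per frame using only the standard library. The counts are made-up sample values, not figures from our dataset.

```python
import statistics
from collections import Counter

# Hypothetical object counts, one entry per frame (made-up sample values).
objects_per_frame = [5, 6, 6, 7, 7, 7, 7, 8, 8, 9]

histogram = Counter(objects_per_frame)
mean_count = statistics.mean(objects_per_frame)
stdev_count = statistics.stdev(objects_per_frame)

# Print a simple text histogram, one row per distinct object count.
for count in sorted(histogram):
    print(f"{count:2d} | {'#' * histogram[count]}")
print(f"mean={mean_count:.2f} stdev={stdev_count:.2f}")
```

A roughly symmetric, single-peaked histogram suggests that frames with few and with many objects are both represented.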
Visualization of different light source positions. Each point represents the position of the light source relative to the objects of interest.
To represent real-world lighting conditions, we also varied the direction and color of the light in the scene. As shown in the plot above, the light source was placed at various locations, ensuring that the images captured across the dataset have diverse lighting conditions and corresponding shadows.
Scaling synthetic datasets with Unity Simulation
In order to run simulations with a large number of possible permutations, we used Unity Simulation, our managed cloud service that can execute Unity projects and generate a complete dataset needed to train a modern computer vision model.
In the last installment of this blog series, we will share more on generating a large-scale dataset with Unity Simulation, training a machine learning model with synthetic data, and evaluating it on real data. We will also cover the economics of synthetic data and our key takeaways from this exercise.
Get started with computer vision tools for free. Check out our tools and object detection example on GitHub.
We would love to hear from you. Leave a comment below, or contact us at firstname.lastname@example.org with questions or feedback.
* All trademarks are the property of their respective owners