Agnostic Cloud Management
Hi, I am Karsten. I have been working behind the scenes at Unity since 2011 as an IT Manager, supporting our IT infrastructure.
IT at Unity does many things, from ordering hardware to operating services, both for internal use and for our customers. I like to say that our most important role is to make sure that everybody who relies on the services we provide is able to do their job.
Modern conveniences that you take for granted, like getting an Uber, require a lot of reliable IT infrastructure. We put together this blog post to give you some insight into the basic principles and tools we use to build the backbone of Unity’s IT.
Over the years of building and maintaining Unity’s IT infrastructure, we have tried to live by four simple guidelines:
- Use Open Source where possible.
- Design by the KISS (Keep It Simple, Stupid) principle.
- No Single Point Of Failure (NSPOF).
- If anything within the infrastructure can be done better or is not working optimally, address it and fix it, even if we just built it.
Today I am going to write about one of the building blocks of our infrastructure: OpenNebula. OpenNebula is a cloud management tool that supports a variety of virtualization technologies, including Xen, KVM and VMware, and has hybrid cloud functionality for Softlayer Cloud, Amazon EC2 and Azure. This lets us combine bare-metal servers with public clouds, so we can build our services without the risk of running out of resources. It also gives us the flexibility to use whichever technology fits a service best, while driving all of it through a single API.
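OpenNebula exposes its functionality through an XML-RPC API, and the Python standard library's `xmlrpc.client` is enough to talk to it. The sketch below is only an illustration of that single-API idea, not a description of Unity's own tooling. The front-end host name and the credentials are hypothetical placeholders. The method name `one.vmpool.info`, the default port 2633 and the meaning of the numeric arguments follow OpenNebula's XML-RPC documentation as I understand it, so check them against the OpenNebula version you run.

```python
import xmlrpc.client

# Hypothetical placeholders: replace with your own front-end and credentials.
ONE_ENDPOINT = "http://opennebula-frontend.example:2633/RPC2"  # 2633 is OpenNebula's default XML-RPC port
ONE_SESSION = "oneadmin:changeme"  # OpenNebula session strings have the form "user:password"

# Arguments for one.vmpool.info (assumed from the XML-RPC docs):
# filter flag -2 = all resources, start/end -1/-1 = whole pool,
# state -1 = any state except DONE.
VMPOOL_ARGS = (-2, -1, -1, -1)


def build_vmpool_info_request(session: str) -> str:
    """Serialize a one.vmpool.info call to XML without sending it."""
    return xmlrpc.client.dumps((session, *VMPOOL_ARGS), methodname="one.vmpool.info")


def list_vms(endpoint: str, session: str) -> str:
    """Send the same call to a live OpenNebula front-end and return the pool XML."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    # Attribute chaining on ServerProxy produces the dotted method name "one.vmpool.info".
    response = proxy.one.vmpool.info(session, *VMPOOL_ARGS)
    # OpenNebula replies with [success flag, result body or error message, error code, ...].
    ok, body = response[0], response[1]
    if not ok:
        raise RuntimeError(f"OpenNebula error: {body}")
    return body  # XML document describing the VM pool


request_xml = build_vmpool_info_request(ONE_SESSION)
print(request_xml)
```

Because hybrid VMs are managed as regular OpenNebula VMs, the same call should in principle list instances regardless of whether they run on a local hypervisor or on a public cloud host.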
When I first joined Unity, we used a very traditional virtualization strategy: we created VMs as needed, often a single VM per purpose. That worked well for a while; however, at some point our old setup stopped scaling, and we wanted a better way to manage the complete environment, from creating disks to deploying VMs.
We started looking for tools that supported our needs and our guidelines. We drew up a shortlist of products and evaluated them at a high level. It quickly became apparent that the only real choice for us was OpenNebula.
The main winning points were:
- OpenNebula manages the cloud with well-known technology: Linux, KVM, libvirt, etc.
- OpenNebula uses standard virtualization tools, so we did not have to learn new, complicated tech.
- Because OpenNebula uses standard virtualization tools, we can manage the complete cloud environment without it. If OpenNebula were to stop working for some reason, we could still manage and migrate existing VMs with standard tools such as libvirt.
- We can manage our virtual environment as well as Amazon EC2 and Softlayer Cloud the same way, through OpenNebula.
Third time’s the charm
We then started migrating our hosting to OpenNebula and have since run three different clouds managed this way. The first time, we used an older version of OpenNebula and a big EMC SAN for storage. We ran into challenges with this setup and eventually realized that our GFS2 cluster was not the best place to store images. The second time, we used OpenNebula 4.0 and replaced GFS2 with Ceph. This provided more flexibility, but we had to ‘hack’ parts of OpenNebula to make it support Ceph copy-on-write (CoW) cloning. The third try was an iteration on the second setup with a more mature (non-hacked!) OpenNebula. Throughout all three setups, we have had a clear vision of a hybrid cloud with a public-facing and a private-facing part.
IT evolves very fast, and some functionality that OpenNebula lacked when we first deployed it is now available. Features such as Virtual Data Centers mean that we no longer need to run three independent clouds but can run them all in a federated environment. As you all know, Unity is moving fast, and that requires our IT infrastructure to evolve at the same pace to keep up with the business.
To support our growing business, we just built a new cloud infrastructure. We involved OpenNebula Systems, the company behind OpenNebula, to help us finalize our design ideas and speed up the deployment phase. We mostly relied on OpenNebula’s existing functionality, but we also needed one additional feature, which we funded: Ceph snapshots.
Why we chose to fund the feature:
- To support and give back to the open-source community behind OpenNebula.
- To get the required extra functionality.
- To make sure that it is supported upstream so that the functionality will continue to be available.
Since Unity operates globally, we need a setup that supports that, so we created a truly global cloud with data centers in the US, EMEA and Asia regions.
As we grow, our model lets us add data centers easily, in line with guidelines #2 and #3 (KISS and NSPOF). Because our cloud infrastructure is built on the KISS principle, each data center can run interconnected with the others, fully autonomously, or anything in between.
Each data center consists of the following components:
- Compute (CPU+RAM).
- Hybrid scale-out to both Softlayer Cloud and Amazon EC2.
This enables us to create auto-scaling groups that initially use the resources on our bare-metal servers. If we run out of local resources, we can scale out into either Softlayer Cloud or Amazon EC2.
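The burst behaviour described above boils down to a preference-ordered placement rule: try local bare metal first, and only overflow into a public cloud when local capacity is exhausted. The plain-Python sketch below illustrates that rule under stated assumptions. It is not OpenNebula’s scheduler, and the `ResourcePool` class, the `place_vm` function, the pool names and all capacities are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical model of a capacity pool: local bare metal or a public cloud."""
    name: str
    free_cpu: float
    free_ram_gb: float
    placed: list = field(default_factory=list)

    def fits(self, cpu: int, ram_gb: int) -> bool:
        # A VM fits only if both CPU and RAM are available.
        return cpu <= self.free_cpu and ram_gb <= self.free_ram_gb

    def place(self, vm_name: str, cpu: int, ram_gb: int) -> None:
        # Reserve the capacity and remember which VM landed here.
        self.free_cpu -= cpu
        self.free_ram_gb -= ram_gb
        self.placed.append(vm_name)


def place_vm(vm_name: str, cpu: int, ram_gb: int, pools: list) -> str:
    """Place a VM in the first pool, in preference order, that has enough room."""
    for pool in pools:
        if pool.fits(cpu, ram_gb):
            pool.place(vm_name, cpu, ram_gb)
            return pool.name
    raise RuntimeError(f"no capacity left for {vm_name}")


# Preference order: bare metal first, then the public clouds.
# Capacities are made-up numbers; public clouds are treated as unbounded here.
pools = [
    ResourcePool("bare-metal", free_cpu=8, free_ram_gb=32),
    ResourcePool("softlayer", free_cpu=math.inf, free_ram_gb=math.inf),
    ResourcePool("ec2", free_cpu=math.inf, free_ram_gb=math.inf),
]

placements = [place_vm(f"web-{i}", cpu=4, ram_gb=16, pools=pools) for i in range(3)]
print(placements)  # ['bare-metal', 'bare-metal', 'softlayer']
```

The first two VMs exhaust the bare-metal pool, so the third overflows to the next pool in the list. With an unbounded Softlayer pool, EC2 is never reached; giving Softlayer a finite quota would make EC2 the next overflow target.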
Together with OpenNebula Systems, we got all the components running in just four weeks. To demonstrate the flexibility of the design, and that it works as expected, we created a new data center in just two days. This exercise made us confident that we can continue to scale at the pace the business requires.