Miniclip today has over 100 million players every month, a game library of more than 800 titles, and almost 200 employees across our offices in the UK, Portugal, Switzerland and the USA. In our new Tech Blog series, written by our DevOps Manager Dave Shanker, we’ll take a look at the challenges faced – and overcome – in growing Miniclip from its origins to what it is today.
In this first part, Transitioning to a DevOps Culture, Dave gives us an insight into the issues he faced and the quest for SA/SE efficiency as the Miniclip infrastructure began to grow rapidly around him…
As with all startups, Miniclip originally started with a small admin team. Very small, in fact: just one person, who handled most, if not all, aspects of our infrastructure. As that infrastructure grew, additional members were added and teams were created. Today, the main teams include the following:
System Engineering/Administration team
Each team handles its own duties, but the SE/SA team always seems to play an integral role in the success of the other teams! After all, the database team can’t run a database if they don’t have a server, the network team doesn’t have much to do if they have nothing to plug into the network and, well, PHP can’t run if there’s no web server to run it on.
The initial Systems Engineering / Systems Administration team structure and processes worked rather well: the team did a great job of supporting what we had, and going the extra mile whenever required. New requests came in daily (new servers, package installations, etc.) along with the normal maintenance tasks like security updates, performance issues, failed servers and so on. As Miniclip’s infrastructure expanded, it wasn’t long before this became unsustainable without a big investment in personnel or major changes.
For example, back then, we would receive a request for a package installation on a single server and then, a couple of days later, we’d get a request to install the same package on another server. Since the SA who handled the first request was different from the SA who received the second, the package was set up slightly differently on each server. If and when it broke, we’d have to deal with the differences in how that package had been implemented.
While requests like this were being processed, other SAs would be troubleshooting performance issues with servers, patching software, and attempting to push out new architecture to support new features, all of which was tracked in email, with updates passed along via Skype or phone.
We’d reached the stage where we couldn’t continue with the setup that had worked well for us in previous years. So, we pushed changes that decreased costs, and made day-to-day processes more efficient – starting with virtualization.
Saving time and money with virtualization
When I first started back in 2009, our data center was sprawling with servers, and we seemed to purchase new servers every quarter. Virtualization wasn’t trusted and was only implemented for small, non-production servers – anything production was running on a physical server.
Knowing how much money could be saved, and how we could increase administrative efficiency, one of my first projects was to implement virtualization – but first, it had to be sold to upper management. Luckily for me, the database manager was looking for new hardware for one of their DB products. It would not be easy though, as virtualization was pretty much distrusted at this point, so I thought we’d have a hard time setting up a Proof of Concept.
With the Database manager, we spec’d out hardware that was generic enough to be used to run the DBs as physical servers, but could also be used as virtualized test beds that would give us a way to justify the hardware cost. If virtualization didn’t end up working, then we could order more hardware of the same configuration (cutting any waste). But if it did work, then we would save a lot of money. Just one crucial drawback to this plan: no Storage Area Network – but more on that later.
We developed an architecture of 2 servers connected to a direct-attached storage array of 24 disks (12 disks per server) to run 6 virtual DB servers. There were huge doubts that this setup would perform, but perform it did. In testing, it actually outperformed a single non-virtualized configuration on the same hardware.
This setup proved to be so successful that it became our standard virtualized SQL stack, and we were able to run what would have taken up 18U of rack space and 18 amps of current in an envelope of 6U and 10 amps. We then took this template and applied it to other SQL products, which saved even more space and power.
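To put those figures in proportion, here is a minimal Python sketch that works out the percentage saved. The only inputs are the rack-unit and amp numbers quoted above; the function and variable names are made up for illustration and are not part of any Miniclip tooling.

```python
def reduction_percent(before: float, after: float) -> float:
    """Return the percentage saved when going from `before` to `after`."""
    return (before - after) / before * 100


# Figures quoted in the article: physical estimate vs. virtualized stack.
rack_units_before, rack_units_after = 18, 6
amps_before, amps_after = 18, 10

rack_saving = reduction_percent(rack_units_before, rack_units_after)
power_saving = reduction_percent(amps_before, amps_after)

print(f"Rack space saved: {rack_saving:.1f}%")    # 66.7%
print(f"Current draw saved: {power_saving:.1f}%")  # 44.4%
```

In other words, the virtualized stack needed roughly a third of the rack space and a little over half the current of the physical equivalent.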
But for me, that wasn’t enough. I was worried management would want to implement the free version of the virtualization suite (both Xen and ESXi were free), but I knew the benefit to efficiency that the management suite in the paid version would give us. So I set out to justify the cost of licensing these hypervisors and associated management software.
After documenting the cost of each scenario (physical servers, virtualized servers with free licenses and virtualized servers with management software), and showing the benefits of licensing the management software (such as centralized management, VMotion, an enabled ecosystem, etc), upper management clearly saw the benefit of paying for licenses.
The success of virtualization
As a direct result of implementing virtualization, over the past two years we’ve made only a single hardware purchase – and that was due only to a lack of available hardware for a new game database that we were implementing at the time.
Managing servers saw a huge improvement in efficiency as well: we could now take snapshots of virtual machines before attempting a scary upgrade. We could clone virtual machines to create other ones, cutting the amount of time necessary to provision new servers. If there was a kernel panic or any other issue requiring console access, we now only needed to log in to the VMware console instead of having physical hands in front of the machine. In total, implementing virtualization has probably saved us hundreds if not thousands of hours of work.
In the next installment, Dave looks at Performance & Availability Monitoring.