Why a private cloud is an exercise in futility


Every corporation and enterprise wants a private cloud these days. The arguments vary from company to company, usually revolving around security, cost, independence and, strangely enough, reliability. I could argue that, given the track record of most enterprise IT departments, it seems dubious they can improve even one of these parameters compared to a public cloud, but I won't. It turns out there's no point refuting those arguments, because, and I cannot emphasize this enough:

You are going to end up using a public cloud, even if it's more expensive, less secure, less reliable and less independent

The bottom line is clear: you are going to be on a public cloud no matter how it compares in terms of conventional IT parameters. This may sound ridiculous at first. Are we really going to ignore cost in our IT systems? Of course not. But we will be willing to take a 50% cost increase in return for some other crucial benefit, a trump card that makes the cost savings (if these even exist) of a private cloud seem almost irrelevant in comparison.

The problem lies in the way IT managers view the concept of a private cloud: for many of them, it's an automation layer that provides VMs and storage to users (IaaS is the common buzzword). Take a look at the service portfolios of AWS, GCP (Google Cloud Platform) and Azure and count what percentage of their services revolve around VMs, storage and networking. High-level services like messaging, data processing, serverless computation, machine learning, etc. take up a large and ever-growing part of those portfolios. Momentum has shifted from IaaS to SaaS, and it seems only VMware and OpenStack are still pushing IaaS as a selling point.

The cloud is no longer about VMs, storage and networking; it's about high-level software services

Think about services like AWS Lambda, or even AWS and GCP's famed load balancing solutions. No private cloud product provides these today, and it's doubtful any ever will. In the meantime, the number of "serverless" services keeps growing; by the time OpenStack ships a Lambda equivalent, AWS will have shipped 20 other new "serverless" services.
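
To make "serverless" concrete, here's a minimal sketch of what a Lambda-style function looks like (Python runtime; the handler name and event fields are illustrative only): you write the function, and the platform worries about provisioning, patching and scaling.

```python
# Minimal sketch of an AWS Lambda handler (Python runtime). The event shape
# shown is the API Gateway proxy format; field names are illustrative only.
import json

def handler(event, context):
    # The platform invokes this function on demand; there is no VM to
    # provision, patch, or scale -- that is what "serverless" buys you.
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```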

Public clouds provide services that will never be available offline

Perhaps your developers want to use DynamoDB, or perhaps your data team wants to use BigQuery; there are no offline equivalents of these services that you can install on a private cloud. If you go with a private cloud, you will simply have to make do without these technologies while your competitors happily use them to their advantage.
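
To see what you'd be giving up, here's a rough sketch of using DynamoDB through boto3. The table name and item attributes are hypothetical; the point is that there is no managed service inside a private cloud to aim this code at.

```python
# Rough sketch using boto3 against DynamoDB; the table name and item
# attributes are hypothetical.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("orders")  # assumes the table already exists

# Write and read an item -- no servers, sharding, or replication to manage.
table.put_item(Item={"order_id": "1001", "customer": "acme", "total": 42})
response = table.get_item(Key={"order_id": "1001"})
print(response.get("Item"))
```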

But although DynamoDB doesn't have an offline equivalent, why not use Cassandra to provide a similar solution? After all, there are numerous open source projects out there trying to mimic or rebuild AWS and GCP services, some backed by prominent technology companies. This suggests that any service on a public cloud will eventually make its way to our data centers, right?

Wrong.

Scale & Operations

Here's where things begin to deviate wildly from traditional IT. When a storage system crashes in a traditional data center, the systems using it have an outage. When a cloud storage system has issues, everything has an outage. Don't believe me? Go read AWS's DynamoDB and EBS outage post-mortems. The cloud is a spaghetti of interconnections, unexpected couplings and mind-boggling complexity. At scale.

It's important to understand that integration between services is inherent in building a cloud system. Without EBS there is no RDS; without DynamoDB there is no SQS. This is why projects like OpenStack, with its "freedom of choice" of components from various vendors, only make things worse: ironing out the integration problems and unexpected interactions of 20 different parts is hard. Doing the same for all possible combinations of vendors? Unrealistic.

Handling such systems requires a totally different approach and focus than what is commonly found in corporate IT, and thus SRE (Site Reliability Engineering) and PE (Production Engineering) were born. All major cloud providers are companies focused on the operation of services; development is integrated into that effort, not the other way around, as is the case with other companies. Don't delude yourself into thinking you can call EMC, Oracle or IBM tier 3 engineers to help you when you have an outage; they will stand helpless in the face of the unfamiliar mess presented to them. Every cloud is a unique beast that can only be petted after a long introduction.

Data and AI are game changers

Google's Vision API is a great example of the future of cloud services. The technology is based on machine learning and allows classifying and analyzing the content of images: things like automatically recognizing objects in an image (e.g. the Statue of Liberty or the face of your cat) and performing sentiment analysis on people's faces. All major public cloud providers offer such services (although this is obviously Google's home court for now) or plan to do so in the near future.
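
For the curious, here's a rough sketch of what calling the Vision API looks like over its public REST endpoint. It assumes an API key in a GOOGLE_API_KEY environment variable and a local image file; both are placeholders.

```python
# Rough sketch of a Google Cloud Vision API request over REST; assumes an API
# key in the GOOGLE_API_KEY environment variable and a local image file.
import base64
import os
import requests

with open("cat.jpg", "rb") as f:  # placeholder image path
    content = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "requests": [{
        "image": {"content": content},
        # Ask for label detection (objects/concepts in the image) and face
        # detection, which includes likelihoods for emotions like joy or sorrow.
        "features": [{"type": "LABEL_DETECTION"}, {"type": "FACE_DETECTION"}],
    }]
}

url = "https://vision.googleapis.com/v1/images:annotate"
resp = requests.post(url, json=payload, params={"key": os.environ["GOOGLE_API_KEY"]})
print(resp.json())
```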

The nature of machine learning services is that the models powering them require data to build: lots and lots of data. These enormous datasets undergo careful processing and cleanup before they are fed to giant clusters of compute nodes, which digest the data to incrementally improve the model. This process can take a long time, requires enormous amounts of computing power, and must be repeated whenever the model is rebuilt (perhaps some parameter was wrong, or the algorithm changed). It is a costly and complicated procedure.
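
As a toy illustration (plain NumPy, made-up data), here's why rebuilding hurts: the model improves only through repeated passes over all of the data, so changing even a single hyperparameter means redoing that work from scratch.

```python
# Toy illustration of why retraining is costly: the model only improves
# through repeated passes over the full dataset, so changing a single
# hyperparameter (here, the learning rate) means redoing all of that work.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))          # stand-in for a huge, cleaned dataset
y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=100_000)

def train(learning_rate, epochs=50):
    w = np.zeros(20)
    for _ in range(epochs):                 # every epoch touches every example
        grad = X.T @ (X @ w - y) / len(y)
        w -= learning_rate * grad
    return w

w = train(learning_rate=0.1)
# Decide 0.1 was the wrong choice? There is no shortcut -- train again from scratch.
w = train(learning_rate=0.05)
```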

Collecting large amounts of data and paying for its processing are the bread and butter of companies like Google, Microsoft and Amazon; no IT vendor has such capabilities and, needless to say, neither do enterprise IT departments. Where would you get photos of every landmark on the planet to train your computer vision model? Billions of voice recordings of humans speaking 5,000 languages to build a voice recognition model? Trillions of email messages to train your text analysis engine? Google and Microsoft are getting this data for free. Heck, we even pay them to collect it. But Oracle has to purchase this data… and I doubt it can fit in their budget.

An open secret in the machine learning world is that successful application of ML requires careful preparation of data and scrupulous tuning of model parameters: artisanal skills held by very few engineers. ML also generates technical debt in unexpected places, making long-term development harder. Though this may change in the future as we improve our algorithms, for now ML and AI are fields requiring hard-to-come-by expertise. So if you think you can have a "Machine Learning team" as part of your corporate IT, perhaps in the office opposite the DBA team, forget about it.

Bottom line

This synergy of cloud services is a tall wall blocking traditional IT vendors and corporate IT departments from building the IT services of the future. The ability to collect, clean, prepare and process gigantic amounts of data, and the ability to operate these systems at scale, are prerequisites for building modern cloud systems. I maintain that this is a fundamental economy of scale that will render any attempt to create a private cloud a costly mud pool of obsolete technology.

There are some organizations that will not be able to embrace the public cloud, albeit not as many as is commonly assumed; some can simply get AWS to dedicate a region to them. These organizations are facing a fundamental change in their IT strategy, but that's a topic for another article.
