Podcast

071 - The state of the cloud with Janani Ravi, David Tucker and Casey West

March 09, 2021

Experts Janani Ravi, David Tucker and Casey West discuss specific tools and strategies impacting cloud architecture, management and security, and provide actionable insight to help technologists and leaders strengthen their organization’s cloud muscle.


If you enjoy this episode, please consider leaving a review on Apple Podcasts or wherever you listen.

Please send any questions or comments to podcast@pluralsight.com.

Transcript

Daniel Blaser:

Hello, and welcome to All Hands on Tech. I'm Daniel Blaser. Today's episode brings you the audio from a webinar we had last week all about the state of the cloud. We brought three cloud experts together to share their expertise and point of view on the subject. And with that, I'll transition over to Jeremy to introduce the panel and get things started.

Jeremy Morgan:

Today we're talking about the state of the cloud. Now, as a technologist, you're probably already familiar with the big three cloud providers and the services they provide. But what emerging services and approaches are gaining momentum now, and what's crucial to your cloud transformation? That's what we're going to talk about today. So today we're going to talk about some specific tools and strategies for cloud transformation. I have several experts here that we're going to get together and talk about kind of the state of the cloud. So first up, Janani Ravi, do you want to introduce yourself?

Janani Ravi:

Thank you, Jeremy. Hi, I'm Janani. I'm a Pluralsight author and I have been one for many years now. I'm based out of Bangalore in India and I run a little business called Loonycorn out of there. Before Loonycorn, I worked in many different companies and countries including Google in New York and Singapore, and Microsoft in Redmond. And even before that, I was born and raised in Mumbai, India. And I attended grad school at Stanford.

Jeremy Morgan:

And next up we have David Tucker.

David Tucker:

Hey, thanks, Jeremy. So, as mentioned, I'm focused in the cloud just like everybody else here. And so, I spend kind of my daytime hours working with companies as generally an interim CTO or CTO consultant to come in and help them figure out how they can leverage the cloud with whatever they're building. I generally come at this from a bit of a software perspective. I spent about 10 years as the EVP of Technology at Universal Mind, guiding software implementations that leverage the cloud. And so, here, I've just kind of taken that on and continued to also author courses on Pluralsight.

Jeremy Morgan:

And Casey West?

Casey West:

Hi, I'm a developer advocate in the Google Cloud engineering group. That means that I'm an engineer that spends more of my time looking outside of our bubble than inside. I try to find out what's working in our industry in terms of software delivery and share that with as many of our end users as possible.

Jeremy Morgan:

So we're going to start out with the first question and it's, what is the state of the cloud? So just kind of the state of the cloud from your perspective and your lens. Janani, we'll start with you.

Janani Ravi:

The first thing I think of, the words that come to mind, are "important" and "urgent," with the "and" in big red font and shining font if it's possible, right? I'd say that cloud capabilities over the last few years have become a must-have rather than just a nice-to-have. Like, even about 18 months or 24 months ago, there was still a case to be made for an enterprise to not be laser-focused on cloud capabilities. They were like, "No, we are building all these other important applications. They take precedence over moving to the cloud or building on the cloud." But now, given everything that's happened over the last few years, everyone has got to be thinking about how to build up their cloud capabilities. So if you're not on the cloud, you've got to get onto the cloud now. And if you are on the cloud, you've got to worry about platform lock-in and optimizing cloud costs and being really good at cloud first and cloud native development, rather than just porting applications over.

Jeremy Morgan:

Casey, what is your view of the state of the cloud today?

Casey West:

I think it dovetails really nicely. The primary thing that I see cloud users focused on is consistency. So consistency regardless of your technical role, whether you're in operations or a security engineer or a developer. What I'm seeing is an emphasis on being able to deliver solutions where you want to deliver them, but still have consistency of experiences with that delivery process. So, open standards and open APIs are a big part of that. And I think that dovetails really nicely into the conversation around avoiding lock-in, but still getting maximum benefit from cloud adoption.

Jeremy Morgan:

And David?

David Tucker:

I think there's really two things. And I think that both of these are critical, but they're kind of looking at it from different perspectives. The first thing is, is I think really over the last two years, we've begun to see the shift of organizations having a technical strategy for the cloud, to many organizations having a business strategy for the cloud. So it's no longer simply, "Hey, we need to do something technically." It's, "We want to speed up our innovation" or, "We want to be able to get products to market faster." These kinds of things that are core to business drive how organizations use the cloud.

The second thing I think we're seeing in the cloud is that the different providers are both growing together and growing apart. So we're seeing common support for different approaches and different solutions. You can look at things like, for example, how we've seen multiple ways to run Kubernetes across all platforms and those things have increased. But we're also growing apart, because we have new services even maybe in the area of databases, for example, that do things dramatically differently across the different platforms. So it's left us in an interesting position where we have to decide how we're going to deal with some of those challenges and choices that organizations face when they're looking at the cloud in 2021.

Jeremy Morgan:

Janani, you've talked with CIOs in a previous Pluralsight webinar about multi-hybrid cloud strategy. Can you speak to why that's even more important today? Or is it more important today?

Janani Ravi:

Oh, I think that's the most important topic in... As you have webinars in the future, I'm sure you'll devote entire webinars to just multi-cloud and hybrid cloud. So I think the heart of the matter is that big cloud providers are not only in the cloud business. You can think AWS, you have Azure, and you have the GCP. They do a whole host of other things, right? If anything, they seem to be in every business nowadays. So, any enterprise has to ask themselves whether any amount of convenience is worth getting tied to one cloud platform. So is it really convenient? I mean, I know it's convenient, but just one cloud platform? That cloud platform may be your friend and cheerleader today and they might be like, "No, take all my services. I'll give you all this great support," but that same cloud platform might be your competitor tomorrow.

So these days, there's almost no enterprise which has a business model that's entirely online or entirely brick and mortar. So they generally have some kind of mix of the two. So they have their offline businesses and their online business models. Their businesses are a combination of both, but it's mostly the brick and mortar ones that are trying to move online. And they're in this intermediate state. They're also trying to move online reactively because they realize they cannot function completely offline now. And this makes these organizations very vulnerable, because if cloud providers decide to return the favor and expand into their line of business, they're in trouble. So they have to have a hybrid cloud strategy in place right from the very beginning, or a multi-cloud strategy.

Jeremy Morgan:

Yeah, absolutely. Is there any kind of key or magic metric to decide what to move to the cloud? So if you're doing a hybrid cloud and you've got a little bit on-prem and you've got a little bit in the cloud, how do you decide which of those services to move to the cloud first or whether to move them at all?

Janani Ravi:

If this is a cloud migration strategy, I would say move what you can first so that you get experience building on the cloud because you're not going to get good at building on the cloud right away, right? Your developers need some time to ramp up. But if you are kind of in a mature state and then you have to figure out what services to keep on your on-premises DC versus on the cloud, I would say keep your most critical services, the ones you can't live without and need complete control over. You don't want the cloud to manage your downtime. Keep your most critical services on-premises. That's hard to do because those are the ones you have to manage and you have to have the expertise to manage.

David Tucker:

Yeah, I think I'll just say one thing here and I'll talk about it a bit more later in the webinar. I think the interesting thing to understand with this is, needs are dramatically different whether or not you're talking about enterprise, versus medium size businesses, versus small and startups. And for many of those startups up to medium size businesses, for them, maybe speed of innovation is the most important metric. And in those cases you might make some trade offs and you might say, "Well I don't know what the future's going to hold in terms of this, but I'm going to tie myself to a cloud provider or maybe two cloud providers and maybe we'll split applications between them" because you're just trying to move as quickly as you can because you're in an industry that's changing and evolving so quickly.

Obviously, dramatically different needs when you're looking at a major grocery provider, right? And Janani's totally right, because all of a sudden you turn around one day and you see, "Oh, Amazon's competing with me because they bought out Whole Foods." That changes your approach. But if you're a startup, you probably don't have time to worry about that. Your goal is going to be to get to market as quickly as possible.

Janani Ravi:

And I'm sure there are lots of conversations in Netflix about being on AWS, right? And they're still on AWS and they seem to be fine, but I'm sure they're thinking about it every day.

Casey West:

The strategy that we see in some of the largest consumers is that they have a multi-cloud strategy. And that's a necessity also from just a business risk perspective. We don't want to go all in on one provider or one vendor for very real reasons when it comes to spreading out your liability. And as soon as you try to engage multiple providers or you try to find multiple deployment targets, now you have to solve an abstraction problem as well. I think Janani speaks to it really well about avoiding the lock-in and finding the opportunities where you can have more agnostic solutions. But then David also speaks to the point that the more abstraction you try to incorporate, the more complex your solutions become, the more time it takes to do it well.

Jeremy Morgan:

My next question is actually for you, Casey. You have a lot of expertise in architecture and security at Google Cloud. What are some of the biggest security considerations when you're doing these transformations?

Casey West:

Sure. I think there are a couple of factors. One of them is more of a philosophy. I think one of the most predominant things is changing your philosophy around your security posture from a perimeter-based security to security in depth. This is a common theme in data centers and more legacy environments, that we try to put a big wall around a complex system and hope that that wall holds. We know in practice that that wall does not hold and we know that as soon as you breach it, you have access to everything if you don't engage in security in depth.

Security in depth becomes particularly important as your solutions become more distributed and more network connected. And as we adopt hybrid cloud solutions, your architectures naturally become more distributed. So it's important to have a security perimeter around each service and have a clear understanding about what that perimeter's going to provide you. So that's the security in depth.

Then we have to manage the complexity of that architecture. And from a security perspective, there's nothing more important than observability. Just to be able to log telemetry and have actionable intelligence about what's happening, how traffic routes, that sort of thing is very important.

The last thing I'll mention: when it comes to the cloud, and especially the move into the cloud, as you migrate services your architectures become a little more hybrid, especially in that in-between time. Your network connections, network connectivity, and securing those, it's not as straightforward as I think we often want to believe. It's not just about standing up an API and connecting to it over the public internet. You have to deal with VPNs and dedicated network connections and there's cost and time associated with setting that up. But the security benefits are enormous so it's important to pay attention to it. So networking, observability, and security in depth.

Jeremy Morgan:

Yeah, I think that's still one of the biggest fears out there and it's been a fear since the beginning of the cloud is, "We have things in our data center and it's secure. And we know it's secure. We don't want to put it out there in the cloud publicly." Do you think that attitude still persists today?

Casey West:

I think that attitude does persist. I've got a background before Google working for security companies that do intrusion detection and analysis. The biggest surprise to me are the organizations that feel like if they own it all, it will inherently be more secure. I've often seen that in organizations with a staff of a dozen in IT who are working their butts off and doing amazing work. But to compare that to a staff of thousands that are dedicated to the security of an entire infrastructure stack, it's hard to compete. And I would ask, why would you?

David Tucker:

On that note, there was a stark change that happened in consulting for me. It was about three years ago. It was a combination of what we saw with both Anthem and Equifax, where organizations shifted from saying, "Oh my goodness, I can't put it in the cloud because it's not secure," to organizations saying, "Oh my goodness, I can't keep it in my own data center. It's not secure." And so, I think for the most part, many organizations that don't have special use cases, and there are some of them that are very unique use cases, but for general companies, even ones that have to abide by very strict regulatory compliance, in those situations I think they're all seeing there is a pathway to the cloud for them. Some of those old objections I think are starting to break down a bit.

Casey West:

Yeah. Every cloud provider and vendor of any significance has certification and governance that they have to adhere to and that they can show their attestations for. There's a lot of power in that. And just pulling it back to the business rationale, I think, David, you discussed this a little bit in your intro. Every organization is aiming to be the best in the world at something. That's their entire business strategy. The cloud providers, such as the one that I work for, we are aiming to be the best in the world at a secure global scale infrastructure. And if you're an insurance provider, maybe that isn't your core business. And that's okay.

Janani Ravi:

Just one thing about security at the platform level, there's often... How I feel about the cloud and what discussions I have with architects I work with, is basically everyone says "Learn from experience," but it's even better if you learn from somebody else's experience, right? So you don't have to face all of the pain yourself. That never happens in real life, of course. But on the cloud, you can, because I mean, there was the Capital One data breach. I think this was about two years ago and it was a big deal. It was something about misconfigured... I don't know, VPC. Something very, very trivial. But AWS realized it's a problem, and now they're going to fix it for everyone, right? Others don't have to worry about that. So I think that's really helpful to keep that in mind.

Jeremy Morgan:

Yeah, definitely. It seems like every time there's a security breach, the second thing mentioned after the company is the provider they're on. Which whether that's fair or not, it seems to be like, "There's X breach and they're on AWS or they're on Google." And so...

My next question is for David. You mentioned that you work with clients on creating solutions that are meant for the cloud. Can you talk a little bit about what the differences are between porting something to the cloud and something that you're building greenfield that will live in the cloud?

David Tucker:

Sure. A lot of organizations that I go into and I speak with, truth be told, they haven't fundamentally changed how they've built applications in the last seven years. So we're still seeing a lot of the same technologies, the same patterns, the same approaches. And here's the thing. I truly believe if you're going to get the full benefit of the cloud, you're going to have to change what you build. There's going to have to be fundamental changes. And this isn't just, "Oh, we need to implement new approaches and new things." It can be fundamentally different of even how you choose the solutions that you use.

So we've had a conversation here about lock-in. And there's no single answer for how you handle lock-in. Again, it's going to be very dependent on what you're looking to do. But again, if your organization is focused on speed of innovation, you're going to change: "Hey, instead of me spinning up all of these clusters where I maintain my own databases, I maintain my own data stores, and manage the security around that," you're now going to be choosing managed services provided by the cloud provider that are going to enable you to do that with minimal maintenance in the long term, which for many organizations is going to be a big shift. You're going to see more people in the business focused on adding new value, rather than just keeping the lights on.

Now, it doesn't change the fact that you're going to still need both of those, but hopefully, that shift is going to help you change the ratio of people that are doing one versus the other.

Jeremy Morgan:

I think that's important that you bring up the business case, because it seems like sometimes people get overly focused on the tech, especially us tech people where we're like, "We're going to spin up 10,000 servers." The business doesn't care that you spun up 10,000 servers. They don't care about anything other than the business metrics. So at a certain point, you have to decide what problem are you trying to solve, which was my favorite question as a consultant.

David Tucker:

Right.

Jeremy Morgan:

The very first one is, "What problem are you trying to solve?"

"Well, we're Netflix and we want to make it so if we have a hit TV show and 10 times the amount of traffic comes in, we can serve it" or, "We want to put out features faster" like you mentioned, David. Many companies are like, "We want to have an idea go to a feature as quick as possible." And so, does a lot of that shape the strategy in you all's experience as far as the why, why they're doing it in the first place?

Casey West:

Yeah. I could just mention that one of the things that we ground a lot of our work with our end users and our customers in is the research around what helps an organization perform better in the industry, right? So the DevOps Research and Assessment in particular is this large, long study on what creates elite performers and what separates them from high, medium, and low performers. And the key capabilities here are not just the technical, but also the cultural and business process aspects of software delivery and solution delivery. So I think it really does get to David's point around focusing on the business outcomes you're trying to achieve and trying to optimize for that.

Moving to the cloud isn't the same as being cloud native. Moving to the cloud isn't the same as realizing any of the gains of on-demand infrastructure and services with per second billing, right? But it will require architecture and the business process change in order to realize those gains.

David Tucker:

And it's important to mention too that you could move to the cloud and achieve virtually no benefits of the cloud, like you said, if you don't change what you do. And I've seen so many organizations that say, "Oh, cloud transformation, we've moved over." Well, you just moved servers from one location to another and nothing else. Again, there's some value in that, but that's nowhere near all the benefits that you can achieve.

Casey West:

Yeah. If you need to turn down a data center because your lease is up and that's the only objective you have, then yes, by all means, move to the cloud and do nothing else.

Jeremy Morgan:

What are the tangible differences in developing software? Traditionally, we're developing software within the constraints of a single server or maybe a set of servers that are real metal servers sitting on a rack, versus now we have these abstracted services. And so they're basically services instead of servers. And then we have to think about things like scaling, horizontal scaling versus vertical scaling, things like that. What are some of the big differences in the software development part of it?

Janani Ravi:

Maybe there are four points generally when I have this discussion with architects. There are four points which I bring up first and then of course, everything follows from here. Stateless applications, microservices, infrastructure as code, and automation for testing and releases. The thing is, these four points encompass everything about how you build your application. And it's all about like the DevOps science. It seems simple when you see this, but each of these is what goes into cloud native development, right? That's when you're already leveraging what the cloud has to offer, rather than just moving your machines over as David said, right? So design your applications to be stateless, which means you're not holding state within the components of your application. And this state doesn't need to be kept in sync, because that means your components constantly talk to each other, which is not what you want when you're scaling up and down in an elastic manner, right?

Building your application using containerization and microservices. If you have little components and every service is very, very narrow in scope, like as narrow as possible, in that case you're able to version and scale up individual services. Hopefully, that'll not affect the rest of your application, but that of course, requires major architectural changes in many, many cases.

The other one was infrastructure as code. The whole idea of not having to provision machines yourself is like a huge... I mean, you invest a lot of time in actually building up those YAML files or whatever provisioning infrastructure you're using, or CloudFormation templates. You name it. But that investment is one time. It may take a lot of time the first time around, but then once you've built your expertise, you're just running it over and over again. It's like learning to code, right?

And then finally, of course, the whole CI/CD, which is overhyped and everyone talks about it, but it's super, super important because that way you're going to be able to ship incremental releases to your customers and not totally mess up whatever service you're providing them by making huge changes, which go out once a month, or once in two months, or once in six months.
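
The stateless point above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions: `ExternalStore` and `handle_add_to_cart` are hypothetical names, and the in-memory dictionary stands in for whatever shared store (a managed cache or database, for example) an application would actually use. The handler keeps nothing between calls, so any instance can serve any request and instances can be added or removed freely.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalStore:
    """Hypothetical stand-in for a shared store such as a managed cache or database."""
    data: dict = field(default_factory=dict)

    def get(self, key, default=0):
        return self.data.get(key, default)

    def put(self, key, value):
        self.data[key] = value


def handle_add_to_cart(store: ExternalStore, user_id: str, quantity: int) -> int:
    """Stateless handler: all state is read from and written to the shared store."""
    key = f"cart:{user_id}"
    total = store.get(key) + quantity
    store.put(key, total)
    return total


store = ExternalStore()
# These two calls could be served by two different instances behind a load
# balancer, because neither call relies on memory held inside the handler.
first = handle_add_to_cart(store, "user-1", 2)
second = handle_add_to_cart(store, "user-1", 3)
print(first, second)
```

Because the handler holds no state of its own, there is nothing for instances to keep in sync with each other, which is the property that makes elastic scaling up and down straightforward.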

David Tucker:

Yeah. Absolutely. And I would just build onto that. I had one client one time. I'll be very vague for obvious reasons as to the specifics of them. But they did about $10 billion a year off of an e-commerce platform that they built and managed. Now, here was the challenge. They could only release code about every 10 weeks because of the extensive manual regression that had to happen with every release. Now, here's the reality of it. As businesses, you need to be able to release value at minimum every two weeks, right? So that you can go through and build and learn and test to evaluate and then rebuild and keep going through that cycle. That's the speed at which businesses need to evolve in today's climate with many organizations doing it multiple times per day.

So the truth is, with the cloud, sometimes, and this always frustrates me, it gets billed as, "Oh, you can build stuff so much easier." The truth is, when you build something the way that Janani's talking about here, it's going to take a large upfront investment to get those things in place. But when you're doing it this way, you're building for the long term. You're building to reduce the amount of time it takes you to add value to the organization each time you need to do it. So there will be a large upfront barrier.

Another thing is, sometimes people sell it as it's going to completely reduce costs. And in many cases, it won't, but you're going to have a lot of additional value. It goes back to the question, again, "What are you building for?" If you're looking to create new value for your business to increase your innovation, it's probably going to be able to do that for you if you follow a lot of those concepts that were just mentioned.

Casey West:

Absolutely. I'm in alignment with this. And I would have doubled down on the CI/CD and the automation. But from a particular perspective, if I had a specific piece of guidance to offer organizations that are at the beginning of this journey, which most are, I feel, it would be to invest in the areas where you have communities of practice around things like automation as a first principle. We focus a lot on the technical aspects of automation. We're going to use Puppet, and Chef, and Terraform. We're going to just build a pipeline and it's going to work. But the methods by which you build that automation will say a lot about how easy it is for you to deliver that value and how you can incrementally improve. When I say communities of practice, I mean, really investing in educating your staff and your workforce, giving them the opportunity to experiment and iterate, and seeing the reliability of the software delivery life cycle as a feature of your product because ultimately that's a feature that delivers value to your end users.

Jeremy Morgan:

Yeah, absolutely. I love the fact that all three of you have mentioned investment. I think that's language that we should be using, continually framing it that way as an investment, because every time you're doing these transformations, there's such a big push and a big lift. I'm sure a lot of people, especially people who aren't necessarily technical, would look at that and say, "Okay, why are we doing this? Why are we paying all of these engineers? What is the return on that?" So yeah, framing it as an investment is definitely the way to go for success, I think.

You mentioned cost. Each of you kind of mentioned cost. Cost management, of course, is one of the big, huge hurdles in cloud transformations. Even organizations that are like a hybrid cloud, they've got a little in the cloud, little on the data center, one of their biggest fears is, "We're going to push everything up to this service and it's going to be great. And then at the end of the month, we're going to have this giant bill that we have no idea how to pay." What are some good strategies for avoiding that kind of sticker shock and overspending?

David Tucker:

I think a few things from my perspective that I've seen work really well, is first, cost analysis and planning has to be a first class citizen in your build process, right? It doesn't need to be something where it comes after the fact. It needs to be something that every architect understands. However, that being said, most organizations don't start there. You don't start with people that are experts in this from day one. I think this is actually a phenomenal area where it helps to rely on either third party consultants that can come in and get some of your people up to speed, or some of the different solutions that are for additional cost analysis. There's some different SaaS platforms and things that you can leverage that help you with this if you're just beginning. I think the goal is, again, eventually you have that expertise in-house. But if you're starting, it's probably a great investment. It probably will pay for itself if you're bringing in some third party consultants that can potentially help you understand how you plan to minimize those costs in the long term.

Casey West:

I know one of the big conversations that I end up having is around just the visibility of cost and how that changes your attitude about it. In an on-prem data center scenario, it's very frequent that we over-provision and we pre-provision. And in the cloud, obviously, that entire model has flipped on its head. You consume à la carte and you consume as you need. Suddenly, you have a lot more visibility into the realistic costs of your architecture than you often do otherwise.

I agree with David's points about how to manage that, especially in the early days. Leverage the expertise of people who have already been there. One of the more fun folks is QuinnyPig on Twitter, Corey Quinn, who spent a huge amount of time with us and that's his business, right? There's a good reason why he's in business. It's because he really is saving people because he understands how to analyze that cost. The visibility is there and the à la carte nature is really attractive, but it can get out of hand really quickly, so it's about keeping it front of mind and paying attention to it. It kind of goes back to that operational model around security and observability. If you can observe the cost and make it actionable, then you should be doing that.

Janani Ravi:

I was in a webinar yesterday. There were a bunch of architects who were involved in cloud migration. So legacy applications such as the post office in the UK, they were moving that to the cloud. And then they had all these concerns about cost. That's the first thing all the business and finance teams ask about. And it's a big problem, right? Because technologists need to be able to answer these questions. I think very practically one thing they found helped to convince them, is they went in and fanatically tagged or labeled all of their resources. It's very manual because each time... So you have to embed this within your team where each time they provision something, they're tagging it or labeling it with, "This is the project" and "This is the feature" or "This is the prototype." It may seem manual initially, but then your automation scripts should take care of this, right? Within every project.

Once they were proactive about doing this, they were able to assign cost to individual projects. Often, the finance teams would come and say, "We discussed this cool new strategy, this project which has to go out," and they would say, "This is exactly how much it costs. So now, decide whether this is important or something else is important, or is the migration important." So I think doing that manual work up front will also help your engineers learn how much projects cost and how much it costs to keep things up.
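
The tagging approach described above can be sketched in a few lines. This is a hedged illustration, not the team's actual tooling: the resource names, tag keys, and monthly costs below are invented for the example, and a real setup would pull this data from the cloud provider's billing export rather than a hard-coded list. The sketch rolls cost up by a `project` tag and lists anything that was provisioned without one.

```python
from collections import defaultdict

# Hypothetical inventory: names, tags, and costs are made up for illustration.
resources = [
    {"name": "vm-1", "monthly_cost": 120.0, "tags": {"project": "checkout"}},
    {"name": "db-1", "monthly_cost": 300.0, "tags": {"project": "checkout"}},
    {"name": "vm-2", "monthly_cost": 80.0, "tags": {"project": "search-prototype"}},
    {"name": "bucket-1", "monthly_cost": 15.0, "tags": {}},
]


def cost_by_project(items):
    """Sum monthly cost per 'project' tag; untagged resources go to 'UNTAGGED'."""
    totals = defaultdict(float)
    for item in items:
        project = item["tags"].get("project", "UNTAGGED")
        totals[project] += item["monthly_cost"]
    return dict(totals)


def untagged_resources(items):
    """Return the names of resources that are missing a 'project' tag."""
    return [item["name"] for item in items if "project" not in item["tags"]]


totals = cost_by_project(resources)
missing = untagged_resources(resources)
print(totals)
print(missing)
```

A report like this is what lets a team answer "how much does this project cost?" directly, and the untagged list shows where the tagging discipline has slipped.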

Jeremy Morgan:

We'll shift gears a little bit here. I have a question for everyone, and it's about serverless. So there's a ton of hype around serverless as a concept. Is it a fad? Is it something worth looking at? What are some of the experiences you've all had and what are some of the pushbacks maybe that you've received for going serverless?

Janani Ravi:

I love serverless because everything we do because we're a small startup is cloud first, right? So serverless solutions is what we look at. We also find it more cost effective. I've left stuff running. For my organization, that's a huge bill, right? Whatever little it is. [inaudible 00:31:04] serverless has really like... Especially we use a lot of BigQuery for our data analysis, and I love BigQuery. I much prefer that over anything that has a cluster running.

But for larger organizations, I think there are still formidable challenges, because I don't know if you're quite at that point where large organizations who are actually in the process of cloud migration will choose serverless. The first and most important is they want to get to the cloud as quickly as possible. They often have these deadlines. "Oh, this account or this client that you're servicing needs to be on the cloud in a year." And then in that year, they only have time to firefight the must-haves, right?

Serverless involves re-architecting your application to work with the new technology, maybe work within the constraints of a few languages, whereas maybe they're using some arcane language and setup for their application. In the first phase, they want to be able to say "We are on the cloud." So that's lift and shift. Serverless is not going to come in there. So I think this is the most important challenge. I have other things to say about this, but I'm also curious about what David and Casey have to say so.

David Tucker:

Yeah. A huge portion of my time is spent talking about serverless. This really becomes one of the areas that I work with organizations on. First of all, I agree with the challenge here, right? If you're moving anything over, what I coach organizations is, "If you have time to re-architect something, then we can talk about it." But really the way you can look at it is anything net new that you do, I believe you should consider serverless as an approach. If it fits, wonderful. There are some situations where it doesn't fit, but the benefits here are huge.

Now, as I mentioned earlier, sometimes it gets billed as, "Oh, this is so easy. Just write your code and everything works." There's still a large upfront investment of time to get your teams up to speed. There's a large effort put in place to be sure you get your CD pipelines set in place for your serverless application. You're going to be using many different managed services instead of maybe just deploying things onto one cluster. And so, there is some complexity with that. Teams have to learn multiple services and how they leverage them, again, irrespective of which cloud platform you're looking at.

So I think there's a ton of value, because I've been a part of two organizations that have shifted global applications with millions of users over to a completely pure serverless approach. And for those use cases, it was a perfect fit. They've seen cost reduction. They've seen speed improvement. They've seen improved performance globally with minimal maintenance. They've cut their maintenance down in some cases by about 60%. So it does work. But again, if you're looking at situations where you have use cases where it doesn't fit, if you're trying to cram it in there, it's only going to cause you pain in the long term.

Casey West:

Absolutely. Yeah. It's one of these situations where not all that glitters is gold. I enter into a lot of conversations with tech leadership at large organizations who want to implement serverless because they read about it in CIO Magazine and they think that it will transform their businesses. And the truth of the matter is, architectures, they naturally develop to the context that they exist in. The reason that monoliths exist isn't because they're horrible and all the engineers in the '90s were terrible. It's because the hardware we had at the time optimized for that solution, or the other way around, that solution optimized for that hardware. Now we have on-demand, scale to zero, near infinite infrastructure for most business applications. And so, architectures like serverless have emerged to take advantage of that context.

When you try and take an organization from the former to the latter, that becomes a much bigger conversation about how you're going to get from a monolith to a microservice, or how are you going to get an organization that's used to building in a monolithic environment to building in a microservices or a serverless environment.

The one thing that I want to add in here a little bit is that serverless is an instance of microservices as an architectural pattern. Microservices as an architecture is something that is often hyped up to be a more simple to manage environment. And nothing could be further from the truth. Microservices are optimized for discrete teams to work on discrete problems like Janani mentioned. The smaller you can make it, the more independent you can make it, the better optimization for that. But for every one of those you have, you have interdependencies, interconnections of communication, both at the human level and at the computer level that you now have to manage. It increases the complexity of your architecture a great deal.

And that gets to something that David mentioned, which is you need to invest in technical enablement to make sure your organization can manage that footprint effectively and responsibly. You need the continuous delivery pipelines and you need the observability and the metrics analysis. If you don't have that, you don't know the environment you're going into. It can be pretty frightening. So I am often pulling people back from serverless to give them the bigger picture about what they're signing up for if they want to go in that direction. And it's often a bigger lift than I think folks realize.

Jeremy Morgan:

That brings up a good question. What are some common situations where it may not be right to make the move to the cloud? So if they're considering a cloud transformation, what are some instances where that wouldn't be the right thing for them to do?

Casey West:

This is something that's kind of near and dear to my heart. You all know that I now work for a cloud provider. But most of our largest organizations that we work with have multiple cloud providers that they're engaged with. They have on-premise data centers. Often, many of them are continuing to invest heavily in their data centers, and for very good business and strategic reasons.

Applications tend to live where their data lives. And moving their data is costly and risky. And so, that's one area where the opportunity to do things like modernize in place might really make the most sense. That if you have an application that does require modernization, try to incorporate modern infrastructure and tooling into the existing environment. So this might be using Kubernetes clusters in your own data center. Starting to stand those up and see about containerizing and migrating your applications in place and near the data that they have to manage and operate. That's one area.

Another area where we have a question about actually is essentially around COTS, commercial off-the-shelf software. When you have a closed vendor solution, how do you manage that? That's often an area where you are at the mercy of that vendor and that vendor's relationships with the cloud providers that you might be interested in. So making choices often comes down to more legal and licensing considerations than even technical considerations, but they're still important to think about.

David Tucker:

I'll say this. I mean, there obviously are still situations where it doesn't make sense to move to the cloud. That being said, I think what we've seen, especially with some of the different hybrid cloud enablement approaches that have come into play, the ability for organizations to consume large amounts of data from their own data centers, I think we're starting to see a lot of those break down. So again, even if you're shifting to a hybrid cloud model where you're still keeping several things in-house, there's the ability to integrate the cloud in some way. I feel, for most organizations, there is a way to leverage the cloud that will be beneficial if we look at the ability again to handle it for data warehousing, and then instantly plug that into your ML pipeline so that you can rebuild models that you can then use for inference.

There's things that it almost doesn't make sense for organizations to build on their own if they're having to start from scratch. So does everything in the cloud make sense for all organizations? No, but I do think for most all organizations, there is a portion of the cloud that would benefit their business.

Jeremy Morgan:

Janani, you are in the AI and ML space. I'm very curious if there are any cases in that world where someone would not want to move to the cloud, like they would want to have a specific set of tools or some reason that they would want to be on-prem.

Janani Ravi:

I actually think David said it perfectly, correctly. For AI and ML, most of their applications are going to be new. You're not already going to have all these legacy applications running. And if you have these legacy applications, they probably don't work very well, because AI and ML, they're evolving so quickly you want to take advantage of the new algorithms and the new infrastructure that the cloud provides. So I actually think that is one area where cloud platforms are really amazing, right? Or even a few years ago, like just three or four years ago, when I'd look at GCP or AWS, the ML offerings on the cloud were very, very limited. They had distributed training, prediction. Anytime you're building a custom model, that's what you'd go look at. And that was enough, right?

But if you look at ML ops on the cloud, they've thought about this, right? Okay. Maybe you want some critical applications to run on-premises so you have Kubeflow pipelines which can run on the cloud and on-premises. And there's no mindset change that you need to have to run Kubeflow on the cloud or on-premises. It's a Kubernetes cluster. The difference is whether you're managing the cluster yourself or whether a cloud platform is managing it for you. But the way you set up your components, your ML workflow, everything remains exactly the same. It's not like you have to adapt to a completely new world on the cloud.

And this is something that is different from legacy web applications where they're using all this proprietary stuff. So for ML workflows, you're going to build from scratch anyway. What I think, and I want to specifically bring this out, is ML offerings on the cloud now go beyond just training, and prediction, and deployment. Cloud platforms have a much more integrated approach to ML. They're like, "This is everything that you need for ML and here's everything in one place," right?

So let's consider GCP, that's the platform I'm most familiar with, and I'm sure Casey will attest to all of what I'm saying. The services are much more end to end. There's distributed training and prediction, but there is TensorFlow and this first class support for PyTorch, which did not exist like a few years ago. There's a data labeling service. You have Jupyter Notebooks to prototype your models. You can run them on VMs. Then you have the Vizier optimization service for hyperparameter tuning. So all of these services you know will be useful, just take advantage of them, right? You're not going to build all this in-house. It seems like a waste of time. And whatever I mentioned on the GCP, I looked this up on AWS because I was curious about it at some point and the same services exist. So it's not like you have to be on the GCP to take advantage of all of this.

One last thing and I'll shut up about this, is cloud platforms have really thought about democratization of data, right? You don't have to be a coder and a data scientist to actually take advantage of what they have to offer. So they have all this drag and drop ML such as Azure ML designer, BigQuery ML, SQL ML. Just give your data to analysts who don't know how to code and see them do cool things, right? Why focus your attention only on developers?

David Tucker:

I totally agree with all of that. I'll add on another piece to that. And that is for most organizations, they're starting off without people in-house that know how to do much with ML.

Janani Ravi:

Exactly.

David Tucker:

We know right now there's just a limited amount of people with this skillset. What I love about the cloud, and this is true whether we're talking about GCP, AWS, or Azure, is you can take gradual steps. Maybe your first step is you're leveraging on AWS, something like their AI services. Or if you're on Azure, you're leveraging their cognitive services. And these give you the ability to leverage machine learning just by getting developers that know how to call an API. You don't really have to know much more than that. And some of them provide a level of customization, but you don't have to go out and hire a data science team at the beginning.

But once you see that it adds value, then you can shift into using the more sophisticated platforms and you can go to either the AutoML or the SageMaker Autopilot type solutions that guide you. And then once you bring everybody in and you've got a full level of expertise, absolutely, leverage the same tools but now use the services that allow you to do everything, including like hyperparameter optimization. All of these things you can do with the platform. And like Janani said, why would you do this from scratch if there's a platform that already exists for you, and even the way now that I think most of the platforms have the ability to integrate human intelligence into the process? So if you need to do things like manual labeling or other things, there's a way to instantly integrate that in as well, even with people that you pick.

So there's just so much that you would have to build to get to the level of where the cloud providers are. And I think this is just one of many examples of how the cloud, if you're using it as a key part of your strategy for enabling new technology, can help you get a gradual ramp into those new technologies without having to start off on day one with massive levels of expertise.

Casey West:

There's a parallel here to compute in terms of multiple abstractions that serve different needs, right? So we have very high level compute platform as a service offerings where you give it code and then it runs your code, all the way down to managing virtual machines or bare metal. Well, the abstraction you choose will be highly dependent on your ability to use it and how it serves your needs. But start at the top and work your way down if you're able to. And we're seeing that in ML. I'm not a data scientist, but I can play one on TV with ML pre-trained APIs or with an AutoML solution like we have. And that's pretty great.

Janani Ravi:

I think the platform as a service offerings for ML are way cooler than the PaaS offerings for compute, right? With the drag and drop ML, that's... I mean, even my kid can build ML models if he wants to.

Jeremy Morgan:

We do have some questions here that I kind of want to get to. Richard is asking, "We're a midsize company and using a closed architecture on-premise retail management system." Casey kind of touched on this a little bit. But they have concerns about outages, especially for their point of sale system. What approach would each of you recommend for that scenario?

David Tucker:

I think there's a couple of things I'd mention with this. One of the things I'm always curious about is, what is the percentage of uptime that your organization currently has with what you've deployed? Because in most situations... And not all. I mean, there's some organizations that are rockstars and keep their things up 100% of the time. But I think if we compare cloud outages versus the outages you have, because of the way that things are managed at such a scale and there's so many best practices built into the cloud providers, generally, I find that outages in the cloud are going to be less than the outages that exist, even if you're not fully taking advantage of some maybe multi-region high availability approaches.

If you do need something that's always on 100% of the time, and I've worked with clients that fall into this, you can look at different approaches that are going to be different per platform on how you enable a high degree of high availability for your applications and how you build in fault tolerance. There's a lot of different approaches that can be taken. But again, in most cases, I find that the cloud would give an organization a better uptime than what you would get if you were trying to do it on your own.

Casey West:

I can corroborate that. I'm working with a global media company that moved from an on-premise data center into managed Kubernetes in GCP. I had the opportunity to have a bit of a retrospective with them around whether or not the move improved their outcomes with regard to software delivery. Did it increase their velocity and did it reduce failure rate? In particular, those two key metrics. What we found was that, in part, architecture played a role, in that they had to re-architect their application in order to live well in a containerized world. That's independent of any infrastructure provider. That's just good architecture leads to better outcomes. But one of the other major things that we saw is that they went from a 25% change fail rate. Specifically, every time they made a change, it failed 25% of the time because of infrastructure and platform issues. They went from 25% to 0% on a managed Kubernetes solution on Google Cloud. And I imagine that would've been the same if they were running on EKS or similar.
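
For readers unfamiliar with the metric, change fail rate is simply the share of changes that result in a failure. The short Python sketch below shows the arithmetic; the deployment counts are hypothetical and were chosen only so that the "before" case works out to the 25% figure quoted above.

```python
def change_fail_rate(total_changes, failed_changes):
    """Return the fraction of changes that failed (0.0 to 1.0)."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    if not 0 <= failed_changes <= total_changes:
        raise ValueError("failed_changes must be between 0 and total_changes")
    return failed_changes / total_changes


if __name__ == "__main__":
    # Hypothetical counts: 40 changes with 10 failures before the move,
    # 40 changes with no infrastructure-related failures after it.
    before = change_fail_rate(total_changes=40, failed_changes=10)
    after = change_fail_rate(total_changes=40, failed_changes=0)
    print(f"before: {before:.0%}, after: {after:.0%}")
```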

So I think that speaks to that. And we saw dramatic improvements there. That's on self-built and self-managed software. When it comes to this specific environment, and I'm also working with one of the larger retailers in North America, I would say it gets back to the same idea though. Architecture plays a role. If you have a single point of failure centralized system, then moving that single point of failure centralized system is likely to lead to failures and outages.

So, thinking about your architecture in a more distributed fashion and reducing the single points of failure would be an issue. Perhaps, each store location has its own micro data center or micro edge. And we can think about making them a little more independent. That gets to the points around automation and continuous delivery being paramount in managing that service. Secondly, though, and to David's other point around uptime is, "What is your SLA?" If a store is only open for 12 hours out of 24, that leaves you 12 hours to play. So just think about what that is realistically. So architecture and setting realistic goals I think is pretty key.
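
To make the SLA point concrete, here is a small Python sketch of the arithmetic. It assumes a hypothetical 99.9% availability target and a 30-day month; neither number comes from the discussion. It compares the downtime budget when the target applies around the clock with the budget when it applies only to a store's 12 trading hours, and shows how many closed hours remain each day for maintenance.

```python
def downtime_budget_minutes(availability_target, service_hours_per_day, days=30):
    """Minutes of allowed downtime within service hours over the given period."""
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    if not 0 <= service_hours_per_day <= 24:
        raise ValueError("service_hours_per_day must be between 0 and 24")
    service_minutes = service_hours_per_day * 60 * days
    return service_minutes * (1.0 - availability_target)


def maintenance_hours_per_day(service_hours_per_day):
    """Hours per day outside service hours, available for planned work."""
    return 24 - service_hours_per_day


if __name__ == "__main__":
    target = 0.999  # hypothetical target, for illustration only
    for hours in (24, 12):
        budget = downtime_budget_minutes(target, hours)
        window = maintenance_hours_per_day(hours)
        print(f"{hours}h/day service: {budget:.1f} min budget, {window}h/day maintenance window")
```

The design point is that the target should be measured against the hours the business actually needs the system, which changes both the downtime budget and how much room there is for planned changes.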

Janani Ravi:

Just one point to add to the architecture. If you're going to be re-architecting your solution anyway and you have to add containerization, use Kubernetes which is usually part of this, right? Even if you're just going to be running on-premises, this is something you may do anyway. One thing is to consider the hybrid cloud so that your team feels you have some level of control over the machines that you have in your data center, but you're also on the cloud. And then once you're confident that things on the cloud work as you want them to, you can move entirely to the cloud. Here is where I want to say I think Anthos is something that is very cool and very different. Anthos on the GCP, which allows you to run like a hybrid cloud and have one platform to manage this hybrid cloud. I mean, maybe I'm a little out of date, but I haven't seen this particular service available on other cloud platforms, though I'm sure they'll catch up very, very soon too.

Jeremy Morgan:

Yeah. AWS actually has Outposts for ECS so that you can use their same managed container service in your own data center now. That was an announcement from last year.

Janani Ravi:

Got it.

Casey West:

Yeah. And we have what we call Anthos which allows you to have managed GKE on any infrastructure provider you choose.

Jeremy Morgan:

I have one final question for everyone. Looking at 2021 or even a few years down the road, what do you think are the most interesting emerging technologies coming out of the cloud that folks should pay attention to? Let's start with Janani.

Janani Ravi:

I have already spoken about this so I'm just going to say a few lines. I think, basically, AI and ML applications. That's something you're going to start afresh. That's a great way to just get introduced to the cloud. And then once you're there and people are familiar, you can bring your other applications over. Now, if you're scared about making that investment for all of your legacy applications, anytime you're building something and using ML, just do it on the cloud.

ML models are very, very sensitive to how intensively you train them and you tune them. So this is one class of software where you throw more compute, you throw more resources, you might really end up getting a 10X or even 100X more effective solution, right? With web applications and other business applications, that's not necessarily the case. So it makes no sense to have all of that compute capacity in-house. I mean, that's going to be very, very expensive. Basically, the cloud is your only answer. So it's worth investing there now so you can build up to be the complex solution that you want.

Casey West:

With my focus on DevOps and operations, software delivery, there's a huge amount of toil still in this space, which I think is kind of interesting because DevOps and SRE as areas of practice are entirely designed around engineering away the monotony of delivering and managing services at scale. And yet, we're still here with just reams and reams of YAML files and get [inaudible 00:51:39]. And so, I'm seeing this continuous trend toward the simplification and automation of that, in particular, the continuous delivery aspect of the CI/CD pipeline being more and more managed by the platforms that you choose. The automation in health checks, canary deployments, rolling deploy, and then actionable metrics and observability. I think the abstractions are going to get more built up and more robust over time.

We've had deep abstractions in the industry for about 10 years, but they've been extremely purpose built, expensive to maintain, and often brittle. I think we're coming around the corner on solving some of those problems.

David Tucker:

From my perspective, it's very similar to what Casey mentioned. Because if you go throughout this whole talk, one of the things we mentioned was there's a lot of time to get things set up. There's a lot of time to get continuous delivery going. There's a lot of time to teach your team new things. I think what we're going to see is just like the entire history of the cloud, things that were once complex are going to become easier. I think we've seen a bit of that. AWS announced their service last year called Proton that was going to enable organizations to build these kind of serverless components or serverless mini applications that can completely be customized, but you can have one team within your organization say, "Oh, here's how we do this" and it's got continuous delivery built in from day one. And another team can say, "Oh, we're going to deploy one of those" or, "We're going to integrate that in with an existing application that we have." This is kind of a serverless service catalog in essence.

In doing that, you're now taking away some of the biggest points of pain that exist in getting started. And so, I think this is helpful in two ways, both in terms of, again, reducing that initial toil that you have when getting something going, but also enabling you to help spread cloud native knowledge across your organization more easily. Because that's the other huge challenge. It's just getting everybody up to speed with that. And so, I think we'll see it go far beyond just that. I think we're going to see a lot of new services and approaches that are going to be enabled that ease that pain point that organizations have right now.

Jeremy Morgan:

I think we've covered a lot of ground here and we've talked about some really good things for cloud transformations. Some really good information and I thank all of you for that. Does anyone have any final thoughts for our viewers today?

Casey West:

I would recommend taking time to pick your abstractions wisely. Don't necessarily try to become an organization that's expert at GCP or Amazon when maybe being expert at Terraform is the right move.

David Tucker:

I would say too, just to take advantage of the resources that you have. I realize this is a... I'm obviously a Pluralsight author. I think there's a lot of great content that you can go consume on the Pluralsight platform that people like several of us on here have authored that help lessen that learning curve for you. Don't feel like you have to immediately start and try to learn everything. Follow some of the courses and paths that are available. I believe that'll help you get started a little faster.

Janani Ravi:

And then this is the question I get asked really often especially by guys who are just starting out. They're 21, 22, or they're just 26, 27. They've worked for a while and this cloud has come along and they're not sure what to do or how to upskill themselves. They always ask me, "So which cloud platform should I learn about? Or what should I..." I have no right answer. I always tell them, "You start with one. Pick one which seems like you feel comfortable with, maybe you have some exposure. And as long as it's one of the big three, it does not matter. And then, don't just stop there. Learn the others as well. Because if you're going to be multi-cloud, your expertise in one cloud platform is going to be outdated very, very soon. It's not that different once you start." So this is something I would tell everyone who's thinking of moving to the cloud and learning about the cloud.

Daniel Blaser:

Thank you for listening to All Hands on Tech. To see show notes and more info, visit pluralsight.com/podcast.