Cloud Security Today

The Cloud Security Today podcast features expert commentary and personal stories on the “how” side of cybersecurity. This is not a news program but rather a podcast that focuses on the practical side of launching a cloud security program, implementing DevSecOps, cyber leadership, and understanding the threats most impacting organizations today.

Zombie identities: the hidden threat in your cloud

June 03, 2024 • Matthew Chiodi • Season 4 • Episode 7

Episode Summary

On this episode, Sandy Bird, CTO and Co-Founder of Sonrai Security, joins the show to discuss identity security in the Cloud. Prior to Sonrai Security, Sandy co-founded Q1 Labs, which was acquired by IBM. He then became the CTO and helped IBM Security grow to $2B in revenue.

Today, Sandy talks about his journey in cybersecurity and how to manage and eliminate dormant identities. Why should listeners be concerned about zombie identities? Hear about the permissions attack surface and where to start implementing zero trust policies.

Timestamp Segments

· [01:41] Getting into cybersecurity.

· [03:48] Key lessons from IBM.

· [08:40] Zombie identities.

· [12:53] Is it possible to manage and eliminate dormant identities?

· [16:17] Tying the process into a CI/CD pipeline.

· [21:01] The Dirty Dozen of Cloud Identity.

· [24:13] The permissions attack surface.

· [27:00] Zero Trust best practices.

· [30:08] Creating net new machine identities.

· [33:17] Prioritizing identity misconfigurations.

· [35:15] Sandy’s mentors and inspirations.

· [37:37] How does Sandy stay sharp?

Sound Bites

"Nothing is a straight path in starting companies in your career."
"Zombie identities are identities that were part of previous projects and never get cleaned up."
"Fix the low-hanging fruit first, such as getting rid of zombie identities and locking down sensitive identities."

Relevant Links

Website: sonraisecurity.com

LinkedIn: Sandy Bird

Quantifying Cloud Access: Overprivileged Identities and Zombie Identities

This is the Cloud Security Today podcast, where leaders learn how to get Cloud security done, and now, your host, Matt Chiodi.

[00:14] Matt Chiodi: Sandy Bird is the CTO and co-founder at Sonrai Security, and I had him on the show today to talk, specifically, about identity security in the Cloud. Now, this is a topic that I've actually followed pretty closely over the last three to five years, because I don't think there's enough discussion happening in the industry about identity security. So, I was really excited to have him on. This is an area that Sandy has gotten extremely deep in. Sonrai recently released a report around this. You can check the show notes for a link to that report. I think you're going to find this useful. One of the things that came up in our discussion was the whole idea, and I want you to pay attention to this, that we have a good process, generally speaking, for controlling human identities, but when it comes to machine identities, we don't typically have a good process because they're not associated with a human, and so we dive into that topic. We talk about zombie identities. Anytime you talk about zombies, you're going to get my attention. So, I hope you enjoy the show, and again, if you like what you're hearing, I would love for you to pause the show right now and give us a five-star review wherever you listen to your podcasts. Enjoy the show.

Sandy, thanks for coming on the show.

[01:30] Sandy Bird: Hey. Thanks for having me.

[01:32] Matt: Awesome. So, I want to jump right in, because I was looking at your background. I think it's fascinating, and I love what you're doing at Sonrai. So, tell me this. At what point did you consciously decide to get into cybersecurity? Tell us about your journey. What happened? If you’ve got a cool story, I always love hearing those.

[01:53] Sandy: Yeah, it's always interesting about starting companies in your career, and how nothing is a straight path. So, I spent a lot of time doing network stuff, initially, and this is 20 years ago. Even flow logs were almost unheard of at that time. It’s a long time ago, and we built this product that monitored network traffic. It was interesting, and the CEO of that company, at the time, as we were starting this thing, saying we have this neat thing, he was like, “Oh, this looks like X-rays of the network. It's very neat. We have two options. We can sell this as a network product or a security product.” He's like, “I think security products will be worth more. We should do that.” So, literally, that is the tipping point of, how did Sandy end up in security?

So, for years, we built QRadar, which started as network anomaly detection and became a great SIEM product over time, and that was acquired by IBM later, but the whole thing was about security analytics and finding weird patterns of, we call them threat actors today. Before, they were probably just script kiddies playing in networks, in the early days, when we started that, but that's what it was all about, and I just love that part of it. The analytics part of security is so much fun.

[03:04] Matt: Yeah. I have a somewhat similar story, where when I first got started, it was the early days of the internet, dialing up with a modem, which, for some of our listeners, they're going to think I'm really old for saying that, but it's true. Some of us who have been around for a while, we had dial-up modems. We actually got to hear that digital sound of the modem connecting. For me, it was just about exploring something new. This was a new thing. It was emerging, very much like today, AI is emerging. I think, listeners who are perhaps earlier in their career, that's something to jump into and be all over. It is going to be, in my view, much greater than even Cloud and things like that. I love hearing that. So, you mentioned QRadar. So, you spent time at Q1 Labs and later, at IBM Security. What were maybe some of the key lessons that you learned from your experiences there?

[03:58] Sandy: There are many, and they're different. Sometimes they say Q1 Labs was four companies before it became IBM. We learned a lot about, when we were doing a lot of this anomaly detection early on, we discovered that that's a great thing for a customer, as a bad event happens, and they detect it, and they resolve it, and it becomes amazing, but if there's not a bad event during that month, in those days, that actually was infrequent. In some ways, compliance was more important. You had to have some other thing to sell, and so we built a lot of workflows into the product that were about, how do you measure security? How do you prove security? How do you do those things? So, I think that was an important part of that. As time moved on, threats became more complex and more interesting to deal with, and in some ways, more prevalent, and I think that obviously helped a lot of Q1 Labs, as we went through that. A lot more threat research went into a lot more modeling and those things. So, I learned a lot about building a business as part of that, as much as building the product.

What's interesting, and this is an interesting thing between, you talked about the early days of Q1 Labs, or even coming into IBM, one thing that's very different between today and, say, 10 years ago, is just usability of the product. We used to be able to train people on these analytics platforms to do crazy complex queries with all these interesting variables in them, and all these things, and people learned how to do it. It was part of the way that they worked. Now, the product pretty much has to be self-led. It has to be intuitive for the person to use and know what to do, automatically, and a lot of the work we do now is actually about, how do you get that experience, more so than maybe even the analytics in the back of it?

IBM was interesting, though, in that I came out of a company that was a single product. Q1 Labs, QRadar, it was a single product. Lots of features, but it was one product, and I landed in IBM and became the CTO of the security division, and immediately, we built this division. They didn't really have a security division before. So, we built the security division, and all of a sudden, I forget how many it was, somewhere between 22 and 27 products I had the next day, and they went from application security to some of the ISS tooling for the prevention devices, to identity governance products, which have almost nothing to do with any of this threat stuff, and then the QRadar platform. I had the balance of, I remember this, I had an MDM product that released code about every 24 hours, and I had a mainframe product, zSecure, that released code every 18 months. So, from just understanding all of the aspects of security, you really have this visibility into all of it, but there was this triggering point for me that happened during that.

I was there for five years, and people say, “you were at IBM for five years?” Yeah, I actually was, and I had a good time. I had a lot of security research teams and stuff, and it was so much fun being there for the part that I was, but I saw this transition in our development teams, which I thought was super interesting. Before, you think about that IBM portfolio with 27 products, a lot of specialization. Somebody who's an identity specialist, they were a database specialist, they were an application security specialist. A new modern development team doesn't work that way. You build an application, you deploy it with infrastructure as code. The people that deploy it are the database admins. They're the people that assign the identities to the workloads. They're the people that actually resolve and patch the systems. They're the people that build the network infrastructure around it, and there, it becomes very much a generalist approach to deploying this thing, in some ways, because you're responsible for so much of it, and it makes those old tools not fit the modern model at all, and that was the premise, initially, for Sonrai Security. We said “data security is not really going to work the way that it used to work. It has to be done differently,” and so that was the initial spark that became Sonrai Security.

[07:50] Matt: It's a great story. I think you touched on this a little bit. Let's talk about identity security in the Cloud. This is one of my favorite topics that I've been going down the rabbit hole on for, probably, the last three to four years, and I still work with IANS Research there, and it's one of the things, my clients, they're always asking about Cloud security, even though Cloud is not new, and there are standards that weren't around three, five years ago. One of the things they always ask me is, “where do we start first?” and I almost always tell them, “focus, first, on identity, because it's so complex,” and from my research, you can have “defense in-depth,” but if you get your identity security in Cloud wrong, an attacker can very easily go right through all those defenses.

So, you guys did a report, and it's called Quantifying Cloud Access Risk: Overprivileged Identities and Zombie Identities, which is an awesome title, by the way, and we'll link to it in the show notes, but in that report, you mentioned that many identities in the Cloud are overprivileged or zombie identities. So, two questions there. What are zombie identities, and why should our listeners be concerned about them?

[09:07] Sandy: Zombies are the funnest one, and it's interesting, you said if you're starting in Cloud, what you should start with is identity, and we 100% agree with that. It's unfortunate, most of the customers that we end up working with have been in Cloud for five years, or much longer than just starting. So, they probably should have started with identity, but they didn't, and so what happens is that, over time, you build, we're going to call it cyber litter, because our SEs like to call it that. Our sales engineers call that cyber litter. You get all these identities that were part of previous projects that never get cleaned up, and some of them are as simple as a test function that you tried, and it worked, or it didn't work, and you left the permissions behind. Sometimes it’s, you built version one of the app, then you build version two of the app, but you never removed the identities from version one, and all the Cloud providers have allowed this to happen because they don't charge for identities. They charge you for compute. People clean those up, but they don't charge you for identity. So, they have a tendency to lay behind.

However, a threat actor trying to privilege escalate through these accounts can use those identities for bad purposes, and we have all these customers in our Cloud infrastructure entitlements management product that see these, and sometimes the numbers are overwhelming. You'll see 20,000 unused identities in 100 Amazon accounts, and you're like, “what do I do with this?” and we built, over the past four years at Sonrai, all this automation to clean them up, and I shouldn't say no one used it, people use them, but in reality, across my whole customer base, if I really looked at it, no one was using them, and you start to ask them why this is, and there's two reasons for these zombie identities.

One is that the person that put the zombie identity in the Cloud, however they did it, through infrastructure as code, whatever, no longer even works with the company, and people are super scared about removing those because they’re like, “if I remove that, and I have to put it back, I don't know how to put it back. It's impossible. I deleted it from here, but maybe it got there through a piece of Terraform. I don't even know where that is anymore,” and so they just won't do it. The second reason that they're there, that was actually for a good reason, which is, and we can have arguments if you should have break-glass accounts or not, but we do have them, and you have a break-glass account, and it's there for a valid purpose, and it's intentionally not used. So, it shows up on all the analytics as being a zombie, but you don't want to delete it. You don't want to turn it off. So, we spent a lot of time measuring this number.

There's a really neat stat in that report if you look under the covers of it. So, that report starts saying 20% of the identities in your Cloud property are humans and 80% of them are workload identities. It's not exactly that number, but it's super close to that, so we use that as a starting point, but then as we go down, and we look at zombie identities and sensitive permission usage, and all this, what you start to discover is that, when you get to the zombies, the number of that percentage that's left, say 61% of all the identities are zombies, only about 12 or 13% of them are human. The other 88% are machines, and it shows that we probably have a process to deal with humans. We know that when they leave the company, we take their privileges away. We should anyway, but we don't probably have that process with the machine identities. Zombies are really interesting to me, and they're certainly a problem we need to solve because they can be used by an attacker.
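To make the proportions concrete, the figures quoted above can be turned into counts. This is a minimal arithmetic sketch: the total of 10,000 identities is a made-up number for illustration, and the percentages are the approximate ones mentioned in the conversation, not exact values from the report.

```python
# Illustrative arithmetic only. TOTAL_IDENTITIES is a hypothetical estate size;
# the two shares are the approximate figures quoted in the conversation.
TOTAL_IDENTITIES = 10_000
ZOMBIE_SHARE = 0.61              # roughly 61% of all identities are unused
MACHINE_SHARE_OF_ZOMBIES = 0.88  # roughly 88% of those zombies are machines


def zombie_breakdown(total, zombie_share, machine_share):
    """Return (zombies, machine_zombies, human_zombies) as whole counts."""
    zombies = round(total * zombie_share)
    machine_zombies = round(zombies * machine_share)
    human_zombies = zombies - machine_zombies
    return zombies, machine_zombies, human_zombies


zombies, machine_zombies, human_zombies = zombie_breakdown(
    TOTAL_IDENTITIES, ZOMBIE_SHARE, MACHINE_SHARE_OF_ZOMBIES
)
print(f"zombies: {zombies}, machine: {machine_zombies}, human: {human_zombies}")
```

Under these assumptions, an estate of 10,000 identities would carry thousands of unused machine identities and only a few hundred unused human ones, which is the gap in process the speakers are pointing at.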

[12:28] Matt: That's interesting. I don't think I've ever heard anybody put it that way, in terms of, you're right, we typically have, because obviously humans are associated with something in HR records. Sometimes those can be connected back. If you're fancy, you can use SCIM protocols to automatically provision and deprovision accounts. You're right. That does not exist for machine identities. I never thought about that before. So, I think the numbers you were alluding to, you said that 61% of identities are unused, with 88% being machine identities. So, that makes me think, with such a high percentage, is it overly optimistic to believe that organizations can effectively manage and eliminate these dormant identities without disrupting DevOps? You touched on this DevOps piece when you were talking about how development has changed. So, with those two things, with the numbers being that high, I mean, is it possible to effectively manage and eliminate those dormant identities when DevOps moves so quickly?

[13:35] Sandy: Again, this is where history is so interesting. I always use the example of four years of my life, probably isn't four years, it was probably three, but for three years, we built all these automation processes to say, “Well, we've proved it hasn't been used for a year. So, we can go and delete that,” and we would just delete it, and no one was doing it, and then I was like, “Okay, why isn’t anyone doing it? We could have this answer because of the break-glass accounts,” and we said, “we have to think about a different way because people aren't going to do this, but we have to fix the attacker problem, still. We can't have them laterally moving through these identities.” Now, this is where Cloud identity gets super interesting. In all three Clouds, Amazon, Azure, and GCP, there are ways that you can take the identities, whether the human identities or machine identities, or some form of federation, or whatever it is, and they have methods in their privilege, where you can short circuit things. You can basically create the equivalent of, I'm going to call it, deny-first. You basically can just shut the thing off, so it hits a deny.

It's more complicated in Azure than it is in the other two, but we'll leave it at that for now. We’re actually using the, we'll call it, attribute-based controls. Again, it's different between the Clouds, but let's say that it's called attribute-based control. Using an attribute and a rule, centrally, we were able to basically short circuit these identities. We call it quarantine. So, you’ve got your zombie. Now, we're going to quarantine it, and you quarantine it, but you get to leave everything in place, so it has the same policies or roles attached to it. If it had an access key or something, that access key still exists. It's still there. We haven't destroyed the key material. All that's there. So, by doing this, you can basically put these zombies into this quarantine state, but when a month goes by, two months go by, whatever it is, and some team has to wake one of these things back up, we were able to instrument the Clouds to say, as soon as it tries to wake up, we don't know what happened. Maybe a serverless function tried to use the identity. Maybe a human tried to assume into it.

Whatever happens, when we see that wake up happen, we can send a message to the team responsible for that area of the Cloud and say, “this just happened. Hasn't been used in nine months, but it woke up today. Do you want to allow that to happen?” and when they hit “yes,” we actually remove that attribute, and now the thing works, and so it's a really simple process underneath. Obviously, there's lots of measurements to get there. You’ve got to find 19,000 of them. You’ve got to apply the attributes to them. You've got to put all this coordination and automation in place, but the risk of turning it off now is super low, because you can turn it back on, and so it's a new way of doing it. It's a new way of thinking about it. I actually think it's way more effective than what we tried to do for the last three years.
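The quarantine flow described above (tag the idle identity, deny it centrally while tagged, and remove the tag on approval) can be sketched in a few lines. This is a minimal in-memory model, not Sonrai's implementation. The tag key `quarantine`, the 270-day threshold, and all function and identity names are assumptions made for illustration. The policy document follows the AWS policy grammar and uses the `aws:PrincipalTag` condition key as one plausible way to express an attribute-based deny; the episode does not say which mechanism is used in each Cloud.

```python
import json
from dataclasses import dataclass, field
from datetime import date, timedelta

QUARANTINE_TAG = "quarantine"  # hypothetical tag key, not a vendor default


def quarantine_deny_policy(tag_key=QUARANTINE_TAG):
    """Build an AWS-style policy that denies every action to tagged principals."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyQuarantinedPrincipals",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"StringEquals": {f"aws:PrincipalTag/{tag_key}": "true"}},
        }],
    }


@dataclass
class Identity:
    name: str
    last_used: date
    tags: dict = field(default_factory=dict)  # roles, policies, keys stay untouched


def quarantine_zombies(identities, today, max_idle_days=270, exempt=()):
    """Tag identities idle for longer than max_idle_days; return their names."""
    cutoff = today - timedelta(days=max_idle_days)
    quarantined = []
    for ident in identities:
        if ident.name in exempt:      # e.g. intentionally unused break-glass accounts
            continue
        if ident.last_used < cutoff:
            ident.tags[QUARANTINE_TAG] = "true"
            quarantined.append(ident.name)
    return quarantined


def is_denied(ident):
    """Mirror the policy condition: a tagged principal hits the deny."""
    return ident.tags.get(QUARANTINE_TAG) == "true"


def on_wake_up(ident, approved_by_owner):
    """Handle a call from a quarantined identity; return True if it may proceed.

    In a real system the denial would notify the owning team. Here the team's
    answer is passed in directly, and approval simply removes the tag.
    """
    if is_denied(ident) and approved_by_owner:
        del ident.tags[QUARANTINE_TAG]
    return not is_denied(ident)


today = date(2024, 6, 3)
fleet = [
    Identity("old-batch-role", date(2023, 5, 1)),
    Identity("ci-deployer", date(2024, 6, 1)),
    Identity("break-glass-admin", date(2022, 1, 1)),
]
print(quarantine_zombies(fleet, today, exempt={"break-glass-admin"}))
print(json.dumps(quarantine_deny_policy(), indent=2))
```

Because quarantining only adds a tag, nothing is deleted and reversing it is a single tag removal, which is what keeps the risk of switching an identity off low.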

[16:16] Matt: Now, and I'm thinking about the operational impacts, because, like you said, usually, no one wants to touch these accounts, like a service account. People are like, “hey, this has been there for years. Bob set it up, and Bob hasn't been here for three years,” and it's the same thing in Cloud environments. Like we said, Cloud is not new. Some people have been operating in the Cloud since 2007, and if you've been operating in the Cloud for that many years, you are going to have, like you said, thousands, maybe even tens of thousands of these identities, and so there is always that hesitation of wanting to not break anything, because you just don't know. Is there a way, with this process that you're talking about, that it can be tied into a CI/CD pipeline? Because we're looking at infrastructure as code. Things may wake up, and it may not have been a human that called for that to wake up. There could have been a process. A human may not be in the loop with that. So, how would you tie that into a CI/CD pipeline? Have you seen that become programmatic?

[17:19] Sandy: Yeah, and again, I'll separate, in some ways, there's another part of that report that talks about sensitive permissions. So, really nefarious stuff. Create an Internet Gateway, poke holes in the infrastructure, those types of things, and then I'll separate that from zombies. Most of the identities that need the sensitive permissions, that don't use them, are awake all the time. So, they're running. However, and I'll use the Create Internet Gateway as an example, you really only do that once. You create the gateways there, and for the rest of your project, you never run that command again. So, it's quite possible that a piece of infrastructure as code, that laid that down the first time, may never trip one of those sensitive permissions again, and so depending on the window that we were looking at, we wouldn't have seen that, and so we would have taken that privilege away. We would have denied the sensitive permission. We would have left the rest of it alone.

So, it would have done its normal workload, but then you build version two of the app, and in version two of the app, you tear the gateway down and you put in some new gateway, or you create peering of a VPC for the first time, you do something else that's sensitive. If the team knows that upfront, they can provision for that in the CI/CD pipeline. So, there's APIs for doing all this, and you can call them and get the approval ahead of the game, but we know, teams are not going to do that. That's not how it's going to work, and so what really happens in life is that you're running this in the development environment where they're building version two of the app, they're testing the pipelines, and they're doing this, and what's going to happen is, one of those pipelines will fail because they didn't know that it was going to make this change, and the way these centralized controls work, they're sitting at the top level of the Cloud, so they bump into it, and immediately what happens is, and we actually recommend this, in most scenarios, for development zones, sandboxes, stuff like this, let the team self-approve their own work.

What happens is, we remove the 92% of things that were provisioned with star (wildcard) permissions that didn't need them, EC2 star, or something like that, and didn't need to create an Internet Gateway. So, those 92% just go away, but for the 8% that do need them, this Terraform thing that's deploying infrastructure, it will bump into that, the developer will be like, “that sucks,” and we know immediately when it happens, so we send the whole team a notification on Slack saying, “hey, Terraform role just was denied doing this. Do you want to approve it?” Yes, we do want to approve it. As they're doing that work, though, what's happened is they've built the list of approvals they need to go to prod now, because it says “this thing is going to bump into this when you put it in prod. We need to do that upfront before we go to the UAT environment or the staging environment, before we go to the prod environment,” and so by using the self-approval in the low zones, you end up with a much more restrictive policy in the high zones as you do it.

So, it's actually the way, and again, this is back to history, my four years would tell me, we should perfect least privilege policies on every identity. That's what everybody wants, but I've just never seen it happen. After four years of doing this, it takes forever for people to actually implement these least privilege policies. We can measure them perfectly. We can give you the perfect policy, but just the time to put it in, get the change control in place, get it tested, get it rolled to prod, takes time, and no one has time to do it, and it's not the priority, usually, and so this is a better way to do that, and it puts the controls back in the developers’ hands to actually approve them when they need them, but if they accidentally over-provision something, that's okay. The attacker can't use it against them.
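The low-zone self-approval idea can be modelled as a small data structure: sensitive actions are denied until approved, the denial is recorded (that record is what would drive a team notification), and whatever got approved in dev becomes the list to request before going to prod. This is a sketch under stated assumptions. The `Zone` class, the two-item `SENSITIVE_ACTIONS` set, and the role name are hypothetical. The action strings are real AWS IAM action names, but the choice of which ones count as sensitive is illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical subset of sensitive actions; the list discussed in the episode
# is far longer and is not reproduced here.
SENSITIVE_ACTIONS = {"ec2:CreateInternetGateway", "ec2:CreateVpcPeeringConnection"}


@dataclass
class Zone:
    name: str
    approvals: set = field(default_factory=set)  # approved (identity, action) pairs
    denied: list = field(default_factory=list)   # denied calls, in order seen

    def request(self, identity, action):
        """Return True if the call is allowed, False if it hits the central deny."""
        if action not in SENSITIVE_ACTIONS or (identity, action) in self.approvals:
            return True
        self.denied.append((identity, action))   # this is what would notify the team
        return False

    def approve(self, identity, action):
        """Record an approval so later calls by this identity succeed."""
        self.approvals.add((identity, action))


def approvals_to_request(low_zone, high_zone):
    """Approvals exercised in the low zone that the high zone still lacks."""
    return sorted(low_zone.approvals - high_zone.approvals)


dev, prod = Zone("dev"), Zone("prod")
pipeline_calls = [
    ("terraform-role", "ec2:RunInstances"),
    ("terraform-role", "ec2:CreateInternetGateway"),
]
for identity, action in pipeline_calls:
    if not dev.request(identity, action):
        dev.approve(identity, action)   # the team self-approves in the dev zone
        dev.request(identity, action)   # the rerun now succeeds

print(approvals_to_request(dev, prod))
```

The design choice worth noting is that prod never inherits approvals automatically. The dev run only produces the list of requests to raise ahead of promotion, so the higher zone stays more restrictive.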

[20:55] Matt: I think maybe we talked about this in our show prep, the concept that certain permissions are much more risky than others, and I remember years back, at one of the companies where I was running Cloud security, we had, and this is going back a couple of years, so things weren't as complex as they are today, but we had the concept of the Dirty Dozen. We had the Dirty Dozen. These 12 things should never occur in the Cloud environment. If they do, don't wait to send it to a SOC analyst for review. They should be immediately denied. Is there some equivalent of that in the world of Cloud identity?

[21:36] Sandy: There definitely is, and I actually think your Dirty Dozen still exists in some ways. I’ll use the example of Stop Logging. I don't know what to tell you. No one should ever shut the logging off in the Cloud. It shouldn't happen. No one should be ever allowed to do it, and you should just deny it, centrally, and there's very little exception to that. The exception to it should almost be the act of the highest-level person going in and making that change. So, you should have those, but the problem is the next tier down, below that, gets very gray, because it's things like, “Well, I do need to change the security group rules on the VM. I do need to actually attach the Lambda to the VPC. I really do need to peer these two networks together,” and that's part of normal operations. It happens all the time in these teams, and when you look at it in a development zone, it happens even more, because they're testing stuff and trying, and whatever.

So, again, we’ll use Amazon as the example. I think there's, I don't know, 12,000 or 13,000 permissions in Amazon now. Maybe it's more. It grows every day. Amazon Q just came out, and there's a whole bunch of new permissions and services with it that just showed up this week. Kind of nice. We took that, and we looked at it, and we came up with somewhere, it's between 800 and 1,000 permissions that we deemed were these grayer ones, where like, “I don't really want anyone just running that, but yet, when people give things star permissions, do they really need it?” and tons of examples, not just the Create Internet Gateway, but even things like, I’ll use read-only as an example.

So, you have read-only roles. They shouldn't be able to do anything sensitive, but there's a couple of quirks in that. When you deploy a Windows VM in one of the Amazon EC2 services, you can get the Windows password for the first time it spins up, so that you can log into it, which makes sense. However, did you intend that to happen for everybody that had read-only access in the Cloud? Probably not, but they'll probably never call it either. So, we can protect that one. It sits behind that Cloud permissions firewall, but no one ever calls it anyway, except for the person that needs it. So, having that little bit of approval flow for the first time the attacker tries to call it and you say, “I don't think that person needs the password” is a good thing, and so, you have to do that work. We have to keep it up to date all the time, as to what are the real important permissions to be looking at? But it's not a dozen anymore, unfortunately, and it's not even really a single-human-solvable problem. You can't go through 12,000 permissions on your own. We have a team of people that do that and keep track of it.
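The two tiers Sandy describes (a short list that is always denied, and a longer gray list that needs approval the first time an identity calls it) can be sketched as a simple evaluation function. This is an illustration, not the product's logic. The tier contents are hypothetical examples. `cloudtrail:StopLogging`, `cloudtrail:DeleteTrail`, and `ec2:GetPasswordData` are real AWS IAM action names, with the last one assumed here to be the call behind the Windows password example. The identity names are made up.

```python
import json

# Tier 1: never allowed, in the spirit of the "Dirty Dozen".
ALWAYS_DENY = ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"]

# Tier 2: hypothetical examples of gray actions that need a first-use approval.
APPROVAL_REQUIRED = {"ec2:GetPasswordData", "ec2:CreateInternetGateway"}


def always_deny_policy(actions=ALWAYS_DENY):
    """Build an AWS-style deny statement for the tier 1 actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyLoggingTamper",
            "Effect": "Deny",
            "Action": list(actions),
            "Resource": "*",
        }],
    }


def evaluate(identity, action, approvals):
    """Return 'deny', 'needs_approval', or 'allow' for one call.

    approvals is a set of (identity, action) pairs that a human has approved.
    """
    if action in ALWAYS_DENY:
        return "deny"                    # approvals never override tier 1
    if action in APPROVAL_REQUIRED and (identity, action) not in approvals:
        return "needs_approval"          # first use by this identity
    return "allow"


approvals = {("windows-admin", "ec2:GetPasswordData")}
for who, what in [
    ("auditor-readonly", "ec2:DescribeInstances"),
    ("auditor-readonly", "ec2:GetPasswordData"),
    ("windows-admin", "ec2:GetPasswordData"),
    ("windows-admin", "cloudtrail:StopLogging"),
]:
    print(who, what, evaluate(who, what, approvals))

print(json.dumps(always_deny_policy(), indent=2))
```

Keeping approvals per identity, rather than per action, is what lets the read-only role keep its everyday access while still tripping a review the first time it asks for a password.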

[24:13] Matt: In your blog, you mentioned a concept that I had not heard of before, and I think it relates directly to what we were just talking about, and that's the permissions attack surface. I think I know what that is, but maybe elaborate a little bit, and what are some of the implications for Cloud security?

[24:32] Sandy: Yeah, and this is where, again, I call it the purist view versus what we have to do to make things actually work. If I was to look at the permissions attack surface, I would take almost any permission that had the ability to make a change in the Cloud, create something, delete something, move data, those would all be attack surface, and they are. The reality of the situation is, though, that list of permissions is very long. It’s many thousands of permissions, and they're called by many identities in volume. There's a lot of frequency to them, and so if you were to actually try to protect every one of those, centrally, you'd be doing some form of approvals every day. It would be super annoying. Every time somebody wants to decrypt something, you’ve got to press a button. It's not going to work. However, when you do this subset of them, like I say, these 1,000, we did a lot of research on this, we basically figured out that these are infrequently called, or I shouldn't say infrequent, they may be called highly frequent, but they’re called by a very small number of unique identities.

The number of unique identities that calls them is low, and so what happens is, the number of requests you get to approve them is also very low, but they get over-permissioned all the time by people running workloads and not restricting them properly when they do it, and so you end up with too many of these permissions sitting out there, and so when we talk about the permissions as attack surface, we talk two sides of the coin. You, absolutely, for these, say the first 1000, should put something like the Cloud permissions firewall in place that actually denies them, by default, until somebody approves it, but it doesn't mean you get to forget about the rest of the permissions. I mean, we have a whole other product that's about getting to least privilege, and for a really sensitive workload, you really should get to a least privilege policy, but if you can't deal with the 19,000 zombies, and the 20,000 over-privileged identities, do the 1,000 first, because you can make a massive difference in a week in that world, and then start to focus on the right things on the other side, and again, attack surface has two parts to it, definitely. There's really nasty stuff that you poke holes through things that allow you to do anything that you want, and then there's the things that are like, “yeah, you could use that as an attack, but you probably have to do something else first.”
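The selection criterion here is the number of unique identities calling a permission, not the raw call volume. A minimal sketch of that measurement over a list of `(identity, action)` events follows. The event data, the threshold of two callers, the sensitive set, and the function names are all assumptions for illustration. In practice the events would come from Cloud audit logs.

```python
from collections import defaultdict


def unique_callers(events):
    """Map each action to the number of distinct identities that called it."""
    callers = defaultdict(set)
    for identity, action in events:
        callers[action].add(identity)
    return {action: len(ids) for action, ids in callers.items()}


def deny_by_default_candidates(events, sensitive, max_unique_callers=2):
    """Sensitive actions used by few (or no) identities, so approvals stay rare."""
    counts = unique_callers(events)
    return sorted(a for a in sensitive if counts.get(a, 0) <= max_unique_callers)


# Hypothetical audit events: decrypt is called by many identities, while the
# gateway call is frequent but comes from a single automation role.
events = [
    ("app-1", "kms:Decrypt"),
    ("app-2", "kms:Decrypt"),
    ("app-3", "kms:Decrypt"),
] + [("terraform-role", "ec2:CreateInternetGateway")] * 50

sensitive = {"kms:Decrypt", "ec2:CreateInternetGateway", "iam:CreateUser"}
print(unique_callers(events))
print(deny_by_default_candidates(events, sensitive))
```

With this data the gateway permission qualifies despite 50 calls, because only one identity makes them, and `iam:CreateUser` qualifies because nobody calls it at all. `kms:Decrypt` does not, which matches the point that gating decrypt would mean approvals every day.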

[27:00] Matt: A key component of Zero Trust is zero standing privileges, but this is, as we were just talking about, extremely difficult in the Cloud, specifically with DevOps. What are some best practices for organizations that are looking to implement least privilege policies in their Cloud environments? Where do they start? Many of the clients that I work with, they have Zero Trust initiatives. They are multi-year initiatives because you can't get there immediately. Where should they start?

[27:29] Sandy: Yeah. I've changed my opinion of this over years, too. We always used to say, “start with the most sensitive place in your environment and get that locked down.” Generate least privilege roles for all those workloads, make sure you have zero standing permissions, if that's what you want to use for humans, or just-in-time access. There's lots of different ways you can do it, but to keep the human access low, but then looking at this over the last little while, what I discovered was, the problem is so vast across all of this, and when we looked at when our customers would come back after a red team, and they would say, “we got popped. There was a GitHub Actions that was tied to this automation role, and the automation role allowed them to create a new user, and then they used that user to log in, and that's how they got in,” and you're like, “yeah, and where was it?” Well, it was in this development account. Okay, well, that was on the last of your list to fix, not the front of your list, and it was a really simple problem. It's not like that infrastructure role ever created a user before in its life. Why did we allow this to happen?

So, I've taken a little different approach, now. Let's fix the low-hanging fruit first. Let's get rid of those zombies and make it so you can bring them back if you need them but put a human in the middle of it. Let's take the really sensitive identities that are across the board, that are your Dirty Dozen, plus a bit more. Let's lock those down. Let's create a system where we can make the Cloud continue to run. Like I say, in that example of that red team exercise, that identity had never created a user before ever in its life, but it really was a live identity. The GitHub Actions thing was tied to automation and did deploy stuff. So, let's lock out the permissions it doesn't use, that it's provisioned to use, and the first time it tries to use them, let's just bump it up against the wall and make sure somebody hits “Okay.”

So, it becomes a bit of an alerting system, too, for malicious behavior on these identities, and so if you do that first, you end up with a massive reduction in the attack surface. You end up with an early warning system, but you can give anybody back, or any workload back, the permissions it needs within 20 seconds. That's pretty good, and so I think, as a best practice, you get to a better spot. Start with your 12, then do your 100, then get rid of your zombies, and then focus on that critical workload and get it locked down. It just seems to be more effective and not as overwhelming. Over the years, we've had customers that will just say to us, “this CIEM thing is amazing, but it's overwhelming. I don't know where to start. There's so many findings, there's so many things to fix.” So, I think this is a way where you can take a big chunk out of it in a hurry.
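The "bump it up against the wall until somebody hits Okay" flow, with its side effect of acting as an early warning system, can be sketched as a small in-memory gate. This is a toy model under stated assumptions: the `PermissionGate` class, its method names, and the `deploy-role` identity are hypothetical, and a real system would enforce this in the cloud provider's policy layer rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionGate:
    """Toy gate: previously used actions pass, first-time actions need approval."""

    used_before: dict  # identity name -> set of actions it has historically used
    approved: set = field(default_factory=set)  # (identity, action) pairs a human approved
    alerts: list = field(default_factory=list)  # blocked attempts, kept as an early warning log

    def check(self, identity, action):
        """Return True if the call may proceed; otherwise record an alert and block it."""
        known = action in self.used_before.get(identity, set())
        if known or (identity, action) in self.approved:
            return True
        self.alerts.append((identity, action))
        return False

    def approve(self, identity, action):
        """A human pressed the button: allow this action for this identity from now on."""
        self.approved.add((identity, action))


# Hypothetical automation role that deploys things but has never created a user.
gate = PermissionGate(used_before={"deploy-role": {"s3:PutObject", "lambda:UpdateFunctionCode"}})

print(gate.check("deploy-role", "s3:PutObject"))    # routine action passes
print(gate.check("deploy-role", "iam:CreateUser"))  # first-time sensitive action is blocked
print(gate.alerts)                                  # the blocked attempt doubles as an alert

gate.approve("deploy-role", "iam:CreateUser")       # only if the request was legitimate
print(gate.check("deploy-role", "iam:CreateUser"))
```

The design choice worth noting is that the block and the alert are the same event: a legitimate team gets its permission back after one approval, while an attacker using the role triggers a signal the first time they reach for something the role has never done.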

[30:08] Matt: Going back to almost where we started, where we were talking about the fact that when it comes to human identities, there's usually a pretty good process in place, but for machine identities, that's not the case. Is there a best practice that you guys have either created or stumbled upon, in terms of creating net new machine identities, how those should be handled?

[30:30] Sandy: I used to say, “provision them in the development environment with the permissions you think are enough to get the job done, and then monitor it during that process.” Have something to do it, whether it's our tool, or Amazon has IAM Access Analyzer, and Azure does it as part of their Entra package for permissions management. “Look at what it says it does, and then apply that as the policy before you go to staging and production.” That's a good best practice. Again, my success at actually getting people to do that is very low, though. So, as much as it's nice to say it, I'm not sure people are actually doing it. The best thing that you can do, though, I think, is that if you can give somebody enough ability to experiment and get the job done, but then centrally control it, I think it works in a much better way. But it's a scary thing to say in today's world: “we want to distribute everything. We want to distribute security to the end. We want to do validation of infrastructure as code, linting, before it ever becomes real code and we deploy it. We want to do all that, and we should do all that. It's really important.”

I don't want everything to have star (*) permissions, but then when I look at what people actually do, and they get in a hurry, and they do deploy stuff, I think we have to put a little bit of centralized control back into the identity in these Clouds. If we let the individual teams own it, it's too easy for them to slip up. Whereas, if there's a central team that can put some guardrails on it from the top, we end up in a better state five years in the future, versus today, where we're just in a big mess that we're trying to clean up.
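The practice of watching what a new machine identity actually does in development and then applying that as its policy can be illustrated with a short Python sketch. It assumes the observed activity is already available as (action, resource) pairs; how those are collected (for example, from audit logs) is out of scope. The function name, bucket, queue, and account ID are hypothetical. Managed services that generate policies from activity exist and would normally be used instead of hand-rolled code like this.

```python
from collections import defaultdict


def policy_from_observations(events):
    """Build an allow-only policy from observed (action, resource) pairs.

    Anything the identity was never seen doing is simply not granted.
    """
    by_resource = defaultdict(set)
    for action, resource in events:
        by_resource[resource].add(action)

    statements = [
        {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        for resource, actions in sorted(by_resource.items())
    ]
    return {"Version": "2012-10-17", "Statement": statements}


# Made-up activity recorded while the workload ran in a development account.
observed = [
    ("s3:GetObject", "arn:aws:s3:::app-bucket/*"),
    ("s3:PutObject", "arn:aws:s3:::app-bucket/*"),
    ("s3:GetObject", "arn:aws:s3:::app-bucket/*"),  # duplicates collapse
    ("sqs:SendMessage", "arn:aws:sqs:us-east-1:111122223333:jobs"),
]

policy = policy_from_observations(observed)
for statement in policy["Statement"]:
    print(statement["Resource"], statement["Action"])
```

A policy built this way is only as complete as the observation window, so rarely used code paths may need to be added by hand before promotion.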

[32:07] Matt: With your customers, do you see resistance from the DevOps teams to that?

[32:15] Sandy: I’ve found there's way more resistance from the DevOps teams when you generate hundreds of Jira tickets and say, “there are hundreds of identities that you need to now make least privilege that you didn't do the first time,” versus saying, “we put this Cloud permissions firewall in place. We've taken away the permissions that you didn't use anyway. We've made exemptions for the ones that you did use, and if you need a new one, you have to press a button in Slack.” They’re like “that seems easy. I can do that,” and so, in some ways, there's actually less resistance to doing it centrally. However, and this is where it's interesting, you don’t want to lose what we've already tried to train people to do: you do need to lint your Terraform before you deploy it, you do need to lint the policies. You still need to do that. It's not an escape to get out of that. However, if you don't do it, and you deploy everything with star (*) permissions, this is an amazing safety net.
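As one example of what "linting the policies" might catch, the sketch below flags wildcard actions in allow statements of an IAM-style JSON policy. It is a minimal check written for illustration, not a replacement for a real policy or Terraform linter. The `find_wildcard_actions` name and the sample policy are assumptions, and the check only looks at the `Action` element.

```python
def find_wildcard_actions(policy):
    """Return (statement_index, action) for every wildcard action in an Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]

    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a single string or a list
            actions = [actions]
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append((index, action))
    return findings


# Made-up policy of the kind that gets written in a hurry.
hurried_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject", "iam:*"], "Resource": "*"},
        {"Effect": "Deny", "Action": "*", "Resource": "*"},
    ],
}

for index, action in find_wildcard_actions(hurried_policy):
    print(f"statement {index}: wildcard action {action!r}")
```

A check like this could run in a CI step before deployment, with the centrally managed deny-by-default control acting as the safety net for whatever slips through.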

[33:13] Matt: I love that. That's a really good point, and again, I love this because I've studied Cloud identities for years, and I feel like we've talked a lot about, obviously, this is a sub-area of configuration management. We've talked about how, if you look at Cloud, it's typically misconfigurations that are getting organizations in the news and the headlines, but identity tends to be where I see, I call it, the most egregious mistakes around that. So, we focused, for many years, on vulnerability management, patching, and whatnot, and obviously, we still see breaches all the time that are because of that, but I feel like we're just starting to get there, where people realize, in Cloud, specifically around identity, that this has been a sweet spot for attackers.

[34:02] Sandy: I had a great conversation with a prospect we were working with. We had shown him how all this stuff works, and he said, “it's interesting. If I was to look at the actual priorities of my organization, it’s patch vulnerabilities, patch vulnerabilities, patch vulnerabilities, but when I look at the last four or five real problems we've had, they were all to do with identity misconfigurations. They weren't to do with vulnerabilities at all,” and so it's interesting. Even our own way that we prioritize stuff for our teams is focused off of the older way that it works. You have to patch. Of course we do, but this identity risk in Cloud is pretty real.

The story I sometimes tell, when I go back to my IBM career, is that at IBM, the lab that we built in was five boundary firewalls from the internet. So, there was nothing anyone could do to expose anything. No matter what you changed, you couldn't really hurt anything, and now, we put people on these Amazon accounts or GCP accounts, and literally every single thing is live on the internet with one API call. So, it's just a different world. We don't have the same protections that we had when we used to have everybody buried in the back corner of a lab at IBM.

[35:15] Matt: So, many of us have had people in our lives who helped us to get to where we are today, and you've been in security, now, for 20-plus years, and so you've certainly had people like that. I'm curious. Who was that for you? Who was somebody that you think of that really helped you to get where you are today?

[35:33] Sandy: Look, along the path, I've had so many amazing people as mentors. The current CEO of Sonrai was the CEO of Q1 Labs at the end of that thing, and he actually started doing marketing for that, and he's been super helpful. Brendan Hannigan has been a great mentor of mine, as we've gone through. I've always just had great appreciation for amazing sales engineers through my career, too. So, they were people that were selling our product and working with customers, building complex use cases, and I remember, I had this great friend, he's actually a CTO of IBM now, Adam Frank, and he literally could solve any problem that you ever put in front of him, and sometimes it would be very uncomfortable things. You would say, “we really need to build this, whatever, disaster recovery scenario, and we don't really have a product to do it,” and he’d be like “we're going to stitch this together,” but he always had this attitude, which I love, which was like, “We can do this. We can get it done,” and he's one example of many people, over the years, that I met that way, that just wanted to help customers and just get the things done. I thought that was amazing.

Again, if I go back through my entire career, sometimes it was, and I always say this, salespeople have the hardest jobs, and I know everyone out there listening to this is like, “I don't want a salesperson to ever call me. I never want to talk to them, I’m never going to answer the phone again. I hate them,” and I even say this to our own sales teams here today, like “I appreciate you so much, because I literally couldn't go into my job every day and just have people hang up on me for eight hours.”

[37:14] Matt: It is a certain skillset.

[37:17] Sandy: It is, and they’re not, in terms of influence, who makes you what you are, and did you learn the most from them? It's just appreciation for everybody in every team that contributes every day. The engineers that build amazing code. It's the team. It's always the teams that are the success. It's not the individuals.

[37:38] Matt: How do you stay sharp? I mean, Cloud security is constantly moving. What's your routine?

[37:44] Sandy: Yeah, there's definitely lots of learning that happens through, even podcasts, you've got your great blogs and stuff that you read, and all that stuff. I had this great experience with AWS the other day, where they were releasing a new feature, and we got on with the product management team as they were going to take us through this stuff, and I started to ask them a bunch of very technical questions, and they're like, “did you read the welcome document?” And I was like, “No, I read the API documentation. What do you mean?” They’re like, “all the answers to these are in the welcome document, not the API document.” I was like, “Oh, […].” So, again, there's just lots of learning. I actually find, for me, and everyone learns differently, I have to experiment. So, I have to actually go in and build the IAM policies, and I have to put the conditions on them and see how they interact. That's what I need to do to really learn it and really understand it, and that's how I stay sharp. Just experimenting with the latest tech and how it works, and all those things.

Super excited, like you are, about this generative AI stuff and some of the API calls and stuff you can make to that. The hallucinations just drive me, I just love them. I love them. I smile. I'm so excited about them, yet they're so painful, and you're saying, “how would I ever productize this thing? It could literally make stuff up.” So, anyway, I love all tech. I've always loved all tech, and so just learning it through experimentation is always my thing.

[39:02] Matt: Well, I've enjoyed having you on the show. Is there any question that I should have asked you?

[39:08] Sandy: I think you did a great job asking questions. I'm just happy to be here, Matt. Thanks a million for having me.

[39:15] Matt: This has been awesome. Thanks for coming on, Sandy.

[39:17] Sandy: Thank you.

Thank you for joining us for today's episode. To find out more, please visit us at Cloudsecuritytoday.com.