A Battle Plan for Disaster Recovery in the Wake of Ruthless Ransomware Attacks

Batel Zohar &
34:36 min
Oct 31, 2023

In the wake of ruthless ransomware attacks, organizations require a battle plan for effective disaster recovery. In this session, we will present possible ransomware attack scenarios, explore the preventive actions that help avoid an attack, and give an overview of the key elements of a DR plan, including the important assets, DR restore, and more. By following this battle plan, organizations can rebuild their foundations and emerge stronger in the face of ransomware adversity.

Batel Zohar

Developer Advocate for JFrog

Batel Zohar is a Developer Advocate for JFrog and has a background in DevOps support engineering, web development, and embedded software engineering. Prior to this, Batel served as an Enterprise Solutions Lead on a dedicated team that accompanies and assists large customers through the architectural implementation of the JFrog platform. She loves her dogs, plays guitar, and is a fan of Marvel’s movies.

Transcript

Batel Zohar 0:09
Thank you. Thank you very, very much, Melissa. And hi, everyone. Today we’re going to talk about ransomware DR: what it is, how we can recover from it, and more interesting information I think you should know. So first of all, let’s talk a bit about disaster recovery and disasters. Right? They can happen anytime, in any place, in any cloud provider, and I’m going to show you some examples in a second. But you know, it always happens when we’re drinking our hot coffee, or when we’re just leaving for vacation; it has to happen in a jinxed way. Right? So let’s talk about the clouds. I know that most of you, or all of you actually, are using clouds today. Over the years, we have seen a few examples of disasters that happened to the most familiar cloud providers. The first is Azure: there was a network fiber cut caused by severe weather conditions in the Netherlands, so the servers were down. And of course, it can happen everywhere. In GCP and AWS we also had some network disruptions during the last year, and I guess all of you remember North Virginia, which was down for a long time. Let me know in the chat if you want to share some crazy stories or any disasters, or whether you agree or disagree; it would be great. So disasters can happen everywhere and anytime, for all kinds of reasons, right? Sometimes the reason is not really important, but it can happen. And today we’re going to talk about disasters and ransomware DR, and how we can protect ourselves. So first of all, thanks again for joining. I love being here. My name is Batel Zohar, and I’ve been a developer advocate at JFrog for almost three years. I was an embedded engineer before, then moved to full stack, and then joined JFrog. I was part of the support team and a technical account manager, and I would love to talk about DevSecOps and AI.
So really, feel free to reach out to me whenever you have any questions or just want to chat. I would love to discuss with you. And as you can see in the picture, I have two crazy doggies, Ben Joe’s and Saul. And yes, that’s it about myself, so feel free to reach out later. And I think we can start.

Batel Zohar 2:35
So just to make sure we are all aligned, let’s explain what DR is. Disaster recovery, you have probably heard about it. It’s a set of processes, policies, and procedures that an organization implements to resume its critical IT systems, applications, and data after a natural or human-made disaster. It can be simple, right? I accidentally removed a file that is critical for my team. It can be much more than that, but anything that disrupts my day-to-day work can be an issue. So the goal of DR is to minimize downtime and data loss in the event of a catastrophe such as a hardware failure, a data center outage, a power outage, or a cyber attack. Or, as we said before, floods, an earthquake, or anything natural can affect us. So let’s say we have a horrible disaster in Peru, right? We want to make sure that the servers in Uruguay are still up and running and everything functions as usual. So usually we would like to create a similar environment in a different location, a different place, a different city; what we decide depends on the size of the company and things like that. We want to make sure, as I said before, that we are always up and running, right? No disruption.

Batel Zohar 4:09
Everything works perfectly. Of course, sometimes that’s a crazy dream, but we’ll do our best together. So what is ransomware? Ransomware is a cyber attack. As an example, you can’t reach your data because it’s encrypted. Right? So have you heard about ransomware before? The attacker will probably encrypt all of my data so I can’t reach my application, my servers, anything like that. So I’m kind of stuck outside, right? It’s as if the attacker just closed my door. And now what am I going to do? I just keep getting errors or requests for payment and things like that. So ransomware is a type of malicious software that encrypts the victim’s data, right? This is the official definition: it makes the data inaccessible until a ransom is paid to the attacker. Ransomware attacks have become a significant threat to organizations, right? And having a robust ransomware disaster recovery strategy is essential to mitigate the impact of such attacks. So let’s take a second and think about what we need to consider when we’re talking about cyber attacks, ransomware, DR, and things like that. First, let’s make sure that we are aligned on everything related to DR.

Batel Zohar 5:39
So let’s show the difference between DR and ransomware DR. All right, there are two main differences. Of course there are more, but these are the main ones, and they will help us put this information in order. First, our DR environment will also be exposed during the attack. So all our data, and our DR, will be vulnerable, right? If we have a DR environment, it could be exposed too. Second is the quiet attack, which also makes ransomware DR very, very different from a simple DR. When we’re talking about a quiet attack, the intruder can attack quietly and slowly, and then start to destroy or take out our data while the organization is not aware of any issue. So think about it: it could be someone from inside the organization or from outside it. And if we imagine our cloud services, it could be someone who just keeps gathering more and more information about our environment. It could be that in the beginning the attacker won’t have all the privileges, right? They can only reach a specific location with specific privileges, for example just listing the VMs, right? Pretty simple, read-only. And then after a while, he will find other places where he can reach more and more data and gain more privileges. And in that case it can be very, very hard to understand how long he has been in our environment, what he tried to achieve, and how much information he got while he was inside our organization.

Batel Zohar 7:25
So a ransomware DR happens intentionally, in most of the cases. Now let’s talk a bit about RPO. I guess some of you have heard about RPO, which is recovery point objective: how much data we are willing to lose during a DR event, and how much that loss will harm our business. Of course, we would love to cover everything and make sure that we have the best DR in the world, so that if something happens, we can just go back. But sometimes it’s not that easy, right? We need to measure the amount of data that can be lost without damaging business operations or finances. It’s really hard to work out where this point is, and sometimes it’s even impossible, right? Of course, everybody thinks that their DR is up and running and that they will be safe when they need it. But sometimes it takes a little more than that. Now, we have the RPO and the RTO. So we have the recovery point, as we discussed, and we have the recovery time: how long it will take us to get back to business and make sure that everything is up and running as before. How quickly do we need to recover? What is the cost of the downtime? It’s a little bit tricky, so try to think about your organization and what the best balance between the two would be, right? What amount of downtime is okay? Is five hours of downtime something horrible? Or if we lose just 20% of our data, is that okay? How should I plan for the future, right? What will happen if we have a disaster, and how can I prevent it, if I can prevent it at all? I’m not sure we can prevent it, so we’ll discuss that.
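The RPO and RTO trade-off described above can be made concrete with a small calculation. The following is a minimal sketch and was not shown in the talk; the target values, the backup interval, and the restore estimate are made-up numbers, and the names `DrTargets` and `meets_targets` are hypothetical. It assumes the worst case for data loss is one full backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrTargets:
    """Hypothetical targets an organization might agree on."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable downtime


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # If the disaster hits just before the next backup runs, everything
    # written since the previous backup is lost.
    return backup_interval


def meets_targets(targets: DrTargets,
                  backup_interval: timedelta,
                  estimated_restore_time: timedelta) -> dict:
    """Check a backup schedule and restore estimate against the targets."""
    return {
        "rpo_ok": worst_case_data_loss(backup_interval) <= targets.rpo,
        "rto_ok": estimated_restore_time <= targets.rto,
    }


# Assumed example: at most 1 hour of lost data, at most 5 hours of downtime.
targets = DrTargets(rpo=timedelta(hours=1), rto=timedelta(hours=5))
result = meets_targets(
    targets,
    backup_interval=timedelta(hours=6),        # backups every 6 hours
    estimated_restore_time=timedelta(hours=3)  # restore takes about 3 hours
)
print(result)  # {'rpo_ok': False, 'rto_ok': True}
```

In this assumed scenario the restore is fast enough, but backing up every six hours cannot satisfy a one-hour RPO, so the schedule (and its cost) would have to change.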

Batel Zohar 9:31
So, as we said, a cyber attack happens intentionally. Right? So who would want to harm our company? One case is an internal attack, an employee within the company, as you can see in this story over here. I’ll send you the link later on. An Amazon manager stole almost 10 million with fake invoices and used the money to buy a house, a Tesla, a Lamborghini, and went to jail, of course. But think about it: it can be someone from the company who already knows all the tricks and all the business details that you need to know. And that’s much easier than for someone outside the company who would like to harm it, right? And I guess most of you have heard stories about someone who just got fired or something like that, was really upset about the situation, and wanted to make sure the company is harmed somehow. So this is just one story. We have tons of stories and examples like that, where the attacker is an internal employee, an ex-employee, or something like that. The second case is someone external, right? This could be any external attacker, always looking for money, or resources, or whatever; it could be crypto mining, it could be anything they would like to achieve by getting to our resources. So it could be someone like the crazy dictator, if you saw the movie, who is trying to get in. The idea here is to gather resources in that case. So as we saw, we have the internal one and the external one. So you are probably now saying, what are the chances that this will happen? Right? And I’m doing all of these preventive actions, like running security tools to alert on intrusion, or strict policies and permissions to block access, or even using an SSO method, right? And there is much, much more, tons of things that we can do by the rules to stay secure. But, you know, eventually, we can’t be fully protected.
Even if we try our best, I believe that there is always a hole somewhere, so there is always a way to get in. So let’s try to be prepared for the ransomware attack. Right? Let’s consider what we need to think about when we have a ransomware attack. One of the most important things to take into consideration is decoupling. Maybe some of you have heard about it, maybe not, but decoupling is basically one of the most important things here that I want you to take with you. Basically, we decouple the important services from one another to prevent one attack from spreading across several areas. The way to do that is by using different accounts. So think about it: I’m running the production environment in one specific account. Testing will be in a different one. It could also be split by the services that are up and running: our website gets one specific account, and the service that handles payments, for example, will be in a different one. Then the chances are reduced, because the attacker will find one of the services. Of course, he could get through all of them, but it will be much more complicated when we have different SSO, different passwords, and different ways to get into those accounts. So for example, it will be good to separate the security tools that protect our environment into an account with minimal permissions. Because if the attacker reaches the security tools, he can disable all our preventive measures, such as the security alerts on intrusion.
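The effect of decoupling services into separate accounts can be illustrated by comparing what one stolen set of credentials exposes in each layout. This is a minimal sketch under stated assumptions: the service names, the account names, and the `blast_radius` helper are hypothetical and do not come from the talk or from any cloud provider API.

```python
# Hypothetical layouts: each service is mapped to the account that hosts it.
SINGLE_ACCOUNT = {
    "website": "main",
    "payments": "main",
    "security-tools": "main",
    "admin-ui": "main",
}

DECOUPLED = {
    "website": "web-prod",
    "payments": "payments-prod",
    "security-tools": "security",  # separate account with minimal permissions
    "admin-ui": "admin",
}


def blast_radius(layout: dict, compromised_account: str) -> set:
    """Services exposed when the credentials of one account are stolen."""
    return {svc for svc, acct in layout.items() if acct == compromised_account}


# Everything shares one account: a single compromise exposes all services.
print(sorted(blast_radius(SINGLE_ACCOUNT, "main")))

# Decoupled: compromising the website account exposes only the website.
print(sorted(blast_radius(DECOUPLED, "web-prod")))
```

The model is deliberately simplistic: it ignores cross-account roles and shared credentials, which in practice would have to be reviewed as well.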

Batel Zohar 13:50
Another example, which is pretty simple too: if we have an administrative interface in our product, right, something that’s related to our admins, we would like to separate it from the rest of the services. That reduces the risk that the attacker gets admin access, which would allow configuration changes, right? So the exposure stays limited to a specific component, and it won’t be so bad. So we’re trying to decouple our services. Now let’s talk a bit about the data layer. By data layer we refer to the database, which usually holds important data, for example our customers’ data. In order to be protected from a ransomware attack, the database snapshots should be copied to a different account. So in case of a cyber attack, the attacker will not be able to destroy them. We will be protected, and we’ll have a clean backup of the data. Nice. A good practice is to keep them for the longest period possible. But what is the longest period, right? Think about it: should I back up the data for a year, or for a few months? It’s really hard to decide, because it comes down to the retention of the snapshots: if it takes time to discover the intrusion, we will still have the original data in place. So what is a long time, again, right? Remember that a quiet attack can last a week, a month, or even a year. So am I going to just recover the full year? It will take time to recover that too, right? That’s also something we need to think about. Or we will just take a few months. But what happens if someone logged in or registered to the system just a little bit before our backup? What do we do in that case? So I’m sorry, you will have a lot of questions to think about when you’re leaving this session, and I’ll try to answer most of them. Another important thing is the storage, right? The storage data can be binaries, logs, files, and more that we would like to keep in our storage layer.
So to keep it safe, we need to replicate it to a different account. Once again, pretty simple, right? Just a replication, just another service that will be up and running. But here too, we need to take into consideration that a new version of a file could be written after the attack took place, and then the data is corrupted. So we need to keep the older version, the original version, of the file. And what do we do in that case, right? How much are we going to pay for it? And how is it going to work when we have a lot of data, right? Today it’s not just a few megabytes or gigabytes, or even terabytes. It’s much more than that. We have tons of data all over the world. And where will the red line be, right? When is it okay to just keep what we have and stop right there? Or should we try to keep more and more data? Do we need to back up a little bit more? So that makes it a little more complicated. And of course, eventually there is the workload: we will need to move our workload to the new account to work with the data that we already transferred. Right? All these processes of copying from one place to another can take some time. So again, we need to think about it. Should we prepare all the infrastructure ahead? Right, maybe it’s something that we need to do: on the first day that we create a new project, we also think about the DR and copy everything. Of course, today there are great tools, right? Especially in the cloud, there are a lot of tools that help you create that DR, and we can discuss some tools later on. But the question here is, again: should we just create an automation that builds everything from scratch in case we have a DR event? Which do you think will be the best way? If you would like to share it in the chat, that would be cool, and we can talk a little bit about that. But think about what you would do, right? Think about the cost and the time.
And the idea that we don’t have too much time when we have a ransomware attack. So what are we going to do, right? Think about the idea that the attacker can say something like, if you won’t pay in three days, we’re just removing all your data, all your encrypted data. What are you doing then, especially when we’re talking about larger institutions? So really, we need to think ahead about what the best approach will be, what we do in case we have something like that, and how to tackle it.
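The retention question raised above (how long to keep snapshots when a quiet attack may have started weeks ago) can be sketched as a search for the newest snapshot that is both still retained and older than the estimated start of the intrusion. This is a minimal illustration only: the weekly schedule, the dates, the ten-week dwell time, and the function names are assumptions, not figures from the talk.

```python
from datetime import datetime, timedelta


def usable_snapshots(snapshots, now, retention):
    """Snapshots that are still held under the retention window."""
    return [s for s in snapshots if now - s <= retention]


def last_clean_snapshot(snapshots, now, retention, intrusion_start):
    """Newest retained snapshot taken before the attacker got in, or None."""
    clean = [s for s in usable_snapshots(snapshots, now, retention)
             if s < intrusion_start]
    return max(clean, default=None)


now = datetime(2023, 10, 31)
# Assumed schedule: one snapshot per week, going back 26 weeks.
weekly = [now - timedelta(weeks=w) for w in range(1, 27)]
# Assumed quiet attack that began ten weeks before it was discovered.
intrusion = now - timedelta(weeks=10)

# With four weeks of retention, every retained snapshot is already tainted.
print(last_clean_snapshot(weekly, now, timedelta(weeks=4), intrusion))

# With 26 weeks of retention, a clean restore point still exists.
print(last_clean_snapshot(weekly, now, timedelta(weeks=26), intrusion))
```

The first call finds no clean snapshot, and the second returns the snapshot taken eleven weeks back. Restoring from it means losing or replaying eleven weeks of legitimate changes, which is the RPO cost the talk asks you to weigh against longer retention.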

Batel Zohar 18:53
Now, let’s try to summarize some of the things. I know I didn’t cover everything, but we can talk a little bit about it later, since we don’t have too much time. First, I think that decoupling the main services from one another, like I said before with the security tools, will help us reduce the risk. The second one is how we can make sure that the data stays immutable over a long period of time. So you know, we have our database, we have our storage, and we want to make sure that we keep them for a long time, for as long as we need. And let’s also think about maintaining DR environments: application upgrades, failover tests. Again, it’s not something that is so easy to maintain, right? Even if we think about the simplest scenario of just copying the same environment from one place to another, when we upgrade our services or our original servers, we need to make sure that we do the same in the DR environment. All right, so it’s easy, we’ll just create the automation to do that. But what happens if it works perfectly in one environment and doesn’t work in our DR? Even if they’re the same, that has happened before, right? I guess the viewers will agree with me that it can happen. And we need to decide what our RTO and RPO will be, taking into account the costs of the selected solution. Again, a lot of people that I talked with about DR told me they can’t justify spending, let’s say, thousands of dollars just in case something maybe happens. But what if something really does happen? Then we need to set up our DR as soon as possible, and that’s also an issue. We need to think about where we’re drawing the line. So what will be the exact point that we would like to recover to in that case? How long should it take to recover? And what will be the best solution here?
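The maintenance point above (an upgrade applied to the primary environment must also reach the DR environment) lends itself to an automated drift check. The sketch below is one possible shape for such a check and is an assumption, not a tool mentioned in the talk; the component names and version strings are invented, and `find_drift` is a hypothetical helper.

```python
def find_drift(primary: dict, dr: dict) -> dict:
    """Components whose version differs, or that exist on one side only."""
    drift = {}
    for name in sorted(set(primary) | set(dr)):
        p, d = primary.get(name), dr.get(name)
        if p != d:
            drift[name] = {"primary": p, "dr": d}
    return drift


# Invented inventories of deployed component versions.
primary = {"app": "2.4.1", "database": "15.3", "proxy": "1.25"}
dr = {"app": "2.3.0", "database": "15.3"}

drift = find_drift(primary, dr)
for name, versions in drift.items():
    print(name, versions)
```

Here the check reports that `app` lags behind in DR and that `proxy` is missing there entirely. A matching version list does not prove a failover will work, so a check like this complements the failover tests rather than replacing them.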
So I hope I made some order today. Thank you very, very much for being here today. Let’s see, I see that we have some Q&A. So okay, cool. Hey, Vito. Hey, Melissa.

Melissa McKay 21:29
Yeah, no, I started out listening to you, and I was immediately terrified. I think this is an excellent topic for the theme that we have, haunted DevOps. But you did a really good job of giving some real-life solutions for how to tackle it without just being frozen and not knowing what to do. I am not as familiar with all the terms for disaster recovery, and you mentioned RTO and RPO a couple of times. Can you go over those again, just for our audience? Really for me, I’m being selfish. Just remind me what those actually are, and how important they are.

Batel Zohar 22:10
Sure, thank you very much. So RTO is recovery time objective, which measures the duration of time an application can be down without damaging business operations. As I tried to show before, I’ll try to find my slides again, just in case. So we have the recovery time objective. And if we’re talking about RPO, we have recovery point objective, which measures the amount of data that can be lost without, again, damaging operations or finances. So if you don’t mind, I’ll share my screen again. And maybe you, Brian, can help me with that. Yes, great. So let’s think about that for a second again. We said, first, we have the data that we can afford to recreate or lose, over here with the recovery point. And then we have the recovery time. So it’s kind of a tricky game between the two. Again, what is more important to my organization: to recover as soon as possible from what we had a year ago, or do we need everything that we had up to a month ago, or something like that? Maybe we can just recover quickly rather than spending tons of money on keeping everything for a long time. It’s kind of a tricky game. So we have the recovery time, which is the RTO, and the recovery point, which is the RPO. Thanks for asking. Cool.

Melissa McKay 23:35
Yeah, this is really helpful. And I actually took a screenshot of this because it’s something I want to tweet about later and talk about. So have you yourself personally been a victim of an attack? Like, just personally, at a company you’ve worked for, or anything like that?

Batel Zohar 23:55
So not exactly personally, but at my husband’s company, I know that they had some; I think they actually help to fight attackers and things like that. He is more of a DevSecOps professional than me, but I try to be involved as much as I can to learn a little bit more about it. I learned, after I left one of the companies I worked at before, that they had an attack. I didn’t have all the details, I just know that it was ransomware. At that point it was just beginning, about six years ago or something like that. But I think it’s really interesting. I always like to get more and more information about it. And it’s pretty crazy to see how different organizations are dealing with it, right? There are so many ways to attack, and so many ways to try to prevent it. Whenever I talk with our DevOps or other DevOps engineers all over the world, it’s a very tricky and interesting question, because there are so many ways to think about it: what to do in case it’s happening, what the best approach will be, even after they got attacked, how to really talk to the attackers and things like that. It’s really crazy, so many things to think about, and you need to do it quickly.

Melissa McKay 25:17
Yeah, this is true. And it seems like, you know, we come up with solutions, and then attackers find a way around them. So it’s probably not something that will remain static. These solutions will grow and change over time. But some of the things that you talked about, especially decoupling, you know, things like that, seem like they will always be a good solution: making sure that you keep things separate, so that when things do get attacked, you know, maybe not everything goes down at the same time, or you have some other way of accessing your stuff. So same as you, I have not personally been involved in an attack like that in an organization at any level. But certainly, personally, I mean, I shopped at Target. So I’ve gotten several letters in the mail since then about various attacks on data. So I was really paying attention to, you know, what you were saying about data, and making sure that you have redundant data and things like that. But there’s probably some stuff that we need to do personally for our own disaster plans, right? Yeah, so we have a lot of devs in our audience. And I’m sure I’m not alone in saying that, like, even if we go down to just our local workstation, maybe you don’t have an attack per se on your system, but have you ever, you know, had your computer blow up on you?

Batel Zohar 26:57
Yeah, definitely. Especially now. I had something similar with my phone, actually. Yeah, it’s not exactly an attack, but it just died. And I was like, all right, what am I going to do right now? I didn’t prepare a backup. I was like, everything is in the cloud, but not really everything.

Melissa McKay 27:15
That has happened, yeah. So those are dangerous too, right? So it’s not even just attacks, but just things that happen that slow us down. Right. And that’s one thing. So as a developer, what would you suggest? You know, what can we do individually to protect ourselves?

Batel Zohar 27:34
Wow, all right, that’s a really hard question. But first of all, the basic stuff that you already know: don’t use the same password everywhere. Use something like a password vault, so that you can have a different password for each specific account. And two-step authentication. It’s so popular right now, everyone asks you to add two-step authentication, but not everybody is doing it so far. So please, for the important stuff, like your YouTube account, your authenticator account, or other accounts like that, make sure that it’s up and running and always enabled, right? We want to make sure that we are as secure as possible. And, of course, there is much, much more than that. But the basic one is, again, kind of the decoupling, decoupling of services for ourselves, because I don’t want to have my personal stuff on my work computer, right? Just in case one day I join another company or do something else, and where is all my data? I want to make sure that it’s on my personal computer. When it’s not mixed up with other stuff, in my opinion, it’s much safer. Yeah. And there are some regulations about what they can see and things like that. So just in case, keep your personal stuff in a different place. And I see that there is a question, actually, from the audience: what would be a good timeframe to recover? I think it’s really hard. Maybe you would like to share some thoughts too, but again, it depends on the organization. It depends on how big it is, how many PRs we’re doing every time, how many versions we’re releasing, and what the timeframe is. Right? If we’re talking about a specific application that was attacked, if we go one or two versions backwards, is that enough? Or do you think it has to be the last month, or whatever? I’m sorry, I don’t have the best answer, Mark.
But I think that we need to take it and check it across the whole organization, right? What do we have, and what is the timeframe? How many builds do we have there? What is the difference between the versions, and so on? Try to think about it. I think it’s really, really hard, or even impossible, to get to a perfect timeframe. It depends on how many developers we have, what we’re developing, and what the difference between the versions will be. Because even today, right, we would like to keep the application as is and just add small services each time. So maybe it’s not so bad if one service goes down or doesn’t work or something like that, and we can just rebuild it. But, again, it’s changing all the time, so sorry, I don’t have the perfect answer for this. But what do you think, Melissa? Do you think we have a good timeframe to recover?

Melissa McKay 30:31
Yeah, I mean, you talked about having that balance, you know, figuring out how much it is worth to you. And then, you know, you also have to ask these questions: is it better to go back or to move forward? So, perhaps, you know, let’s say an attack came in through a particular package, maybe remove it altogether, or maybe do something different. But Mark touches on something that is a sensitive subject for us in particular, because I think we probably get approached with these questions all the time, especially about how often should I update? Because it’s usually not the case that all of the stuff you have is proprietary; you might be using other libraries and packages and things like that for your software, right? So there’s one end of the spectrum where you update all the time to make sure that you don’t have, you know, new vulnerabilities creeping in, or you’re dealing with existing vulnerabilities, things like that. But on the other hand, you might be updating something so frequently that now you can’t get back to a point where you were safe. Do you know what I’m saying? Yeah. So many moving pieces now that it’s hard to really tell where to go back to.

Batel Zohar 31:54
So super hard.

Melissa McKay 31:56
Yeah, so managing those builds and stuff can get tricky. But I think at the very least, having a good list of, you know, everything that is in your software to begin with, you can at least begin that process of figuring out where an attack might have come from, if it turns out to be a vulnerability in your software, for example. Right. So yeah, lots to think about, Batel.
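The idea of keeping a list of everything that is in your software can be sketched as a simple dependency inventory that is searched when a bad package version is announced. This is an illustration under stated assumptions only: the application names, package names, versions, and the `affected_apps` helper are all invented, and a real inventory would typically come from a generated software bill of materials rather than a hand-written dictionary.

```python
# Invented inventory: application name -> {package name: pinned version}.
INVENTORY = {
    "storefront": {"web-framework": "4.2.0", "image-resizer": "1.0.3"},
    "billing": {"web-framework": "4.1.7", "pdf-writer": "2.8.1"},
    "reports": {"pdf-writer": "2.8.1"},
}


def affected_apps(inventory: dict, package: str, bad_versions: set) -> list:
    """Applications that ship one of the known-bad versions of a package."""
    return sorted(
        app for app, deps in inventory.items()
        if deps.get(package) in bad_versions
    )


# Assumed advisory: version 4.1.7 of "web-framework" is compromised.
hits = affected_apps(INVENTORY, "web-framework", {"4.1.7"})
print(hits)  # ['billing']
```

With the list in hand, the investigation can start from the one affected application instead of from every build the organization has ever produced.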

Batel Zohar 32:29
Yeah, this is a great story.

Melissa McKay 32:33
Yeah, definitely. I loved your conclusions, and I really liked, you know, especially what you said about decoupling and about redundancy in data. So yeah, I’m gonna go think about this for a while and think about, even just personally, at my level, just as a developer, what can I do, not only to protect myself and help my own disaster recovery plan, but how can I prevent myself from, you know, being inadvertently part of an attack or something on my own organization because of being lazy with passwords.

Batel Zohar 33:08
My crazy story about it, actually: I have my smart home automation with a Home Assistant system. And I was really, you know, silly in the beginning. I was like, everything is working perfectly, I don’t need a backup, everything is good. Of course, it took me a long time to develop and configure everything. So after I read about this, I created my own disaster recovery for my smart home. So in case one of them goes down, I have another Raspberry Pi that will be up and running, getting power and all of it. But it could be even simple stuff, right? Think about it: tomorrow, somehow, the router will be down, so we need another way to connect, or there’s a power outage and things like that. So it’s not an official DR, but think about how it can help you with something small that you’re already doing all the time, and for your organization.

Melissa McKay 34:03
Yeah, I didn’t even think about that with the smart homes and your internal network. And yeah, I was just thinking, you know, I have a daughter now that’s getting really interested in coding and stuff. And, you know, she’ll come visit and get on my network. And you know, I don’t know exactly what all she’s pulling down or what all she’s doing. So this is probably a conversation I need to be having with her as well. So, yeah, interesting, interesting stuff.

DevOps
