Open-source software supply chain risks beyond CVEs


Hey👋, I am back with another blog, this time on an interesting topic: open-source software supply chain risks, the techniques attackers use, and how to defend against them using an open-source tool. Let us get into it.


What is a software supply chain?

Let me explain this with an example. While building software or trying to solve a problem, we generally use packages. Using other people's software inside your own software in this way is what makes up a software supply chain.

Let's take a scenario in which you are trying to solve a problem and cannot work out the logic yourself. You search for the problem on Google, find a package/library that solves it, add it to your code, and check whether it works. If everything is okay, you push the code to production. This is the procedure most of us follow, and those packages are usually open source.

But when you add a package/library, you copy somebody else’s code into your application. So, how do you know that it’s not malicious?

In this blog, our primary focus is on analyzing packages to determine whether they could be malicious. We do this by first understanding the various software supply chain attacks and then the techniques for defending against them.

Open-source software is everywhere

Most of the software we use is built with open source. Here, open source means packages pulled from open-source package managers such as PyPI, NPM, RubyGems, Packagist and so on.

If we look at their statistics, NPM holds over 1.6 million packages and receives millions of downloads from its users every day. Similarly, PyPI holds over 300k packages and RubyGems holds 160k packages.


The problem:

Since we all use packages from these package managers, the main problem lies with the package managers themselves. They let anyone publish a package: an individual developer, a group of developers, or a company. They also provide a command-line tool that publishes a package with a single command, which makes publishing very easy.

However, there is only limited security vetting. The package managers cannot verify the information and code provided by the maintainers of every package, so you as a developer have to be concerned about security.

Source: https://imgs.xkcd.com/comics/dependency.png

Let's assume the structure above is the modern infrastructure of your organization. You can see that one small pillar is holding up almost the entire thing. Assume that this small pillar is one of the packages you use in your production systems. If something goes wrong with that pillar, your entire infrastructure may collapse. That's the reason we, as developers, have to be careful while using any package.

What is a software supply chain attack?

The attacker (a bad actor) targets a less secure package in a package manager and injects purposefully harmful code (which may contain malware) into it. This injected code is not an ordinary CVE, that is, an accidental vulnerability. It is written deliberately and can do anything, such as downloading something from the internet or stealing sensitive data from your system and exposing it to the internet. Because we cannot predict the behavior of this code, we cannot easily patch it. And since open-source software is adopted by a large number of people, the malicious code can spread widely.

Types of Software Supply Chain Attacks

Attackers whose motive is to steal people's sensitive information have many techniques available. Some of them are:

  • Typosquatting,

  • Social Engineering,

  • Dependency confusion,

  • Account Hijacking

Typosquatting: In a typosquatting attack, the bad actor publishes a new package whose name looks very similar to an existing package's name, differing only slightly.

For example, if there is a package named colorama, a bad actor publishes a new package with the similar name colourama by adding a 'u' to it. The bad actor can alter the name in several ways, such as changing the spelling, changing the order of the words in the name, or removing the separators.

Let's say the original package name is python-nmap. A bad actor can publish a new package named nmap-python. This is an example of changing the order of the words in the package name.
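The name tricks described above can be checked for mechanically. Here is a minimal Python sketch, assuming you keep a list of the packages you actually intend to use. The list, the function names, and the 0.85 threshold are all made up for illustration, and this is not how any particular tool does it.

```python
import re
from difflib import SequenceMatcher

# Hypothetical allow-list of packages you actually intend to use.
KNOWN_PACKAGES = ["colorama", "python-nmap", "requests"]


def normalize(name):
    """Lowercase the name and sort its separator-delimited parts,
    so that reordered names (nmap-python vs python-nmap) compare equal."""
    parts = re.split(r"[-_.]+", name.lower())
    return "".join(sorted(parts))


def looks_like_typosquat(candidate, threshold=0.85):
    """Return the known package that `candidate` closely resembles, or None."""
    lowered = candidate.lower()
    if lowered in KNOWN_PACKAGES:
        return None  # exact match: this is the real package
    for known in KNOWN_PACKAGES:
        # Reordered words in the name.
        if normalize(lowered) == normalize(known):
            return known
        # Small edits such as an extra letter or a dropped separator.
        if SequenceMatcher(None, lowered, known).ratio() >= threshold:
            return known
    return None


for name in ["colourama", "nmap-python", "requests", "numpy"]:
    print(name, "->", looks_like_typosquat(name))
```

A similarity threshold like this will produce false positives on legitimately similar names, so treat a hit as a reason to look closer, not as proof.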

Social Engineering: In this attack, the attacker first gains the trust of the maintainers and authors of a package by contributing code. After some time, the bad actor asks for access to the package and then pushes malicious code to it.

A real-world example is the event-stream incident in the JavaScript ecosystem: https://github.com/dominictarr/event-stream/issues/116

Dependency confusion: Large organizations often run internal mirrors of the package managers. By default, such a mirror may simply take the package with the higher version: it compares the version available in the internal mirror with the version in the public package manager and chooses whichever is higher.

Assume that there is a new, malicious version of a package in the public package manager and a good version in the internal mirror. When a developer uses that package in an application, the mirror skips the good internal package and pulls the malicious one from the public package manager, because its version is higher.

Source: https://images.app.goo.gl/7N4pGUZCj9NDidan9
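To see why "highest version wins" is risky, here is a small Python sketch of that resolution rule next to a safer variant. The package name acme-utils, the version numbers, and both function names are invented for this example, and real package managers resolve versions in more elaborate ways.

```python
def parse_version(version):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def naive_resolve(name, internal_index, public_index):
    """Mimic the risky default: take the highest version from either index."""
    candidates = []
    if name in internal_index:
        candidates.append(("internal", internal_index[name]))
    if name in public_index:
        candidates.append(("public", public_index[name]))
    if not candidates:
        raise LookupError(f"{name} not found in any index")
    # On a tie, max() keeps the first candidate, i.e. the internal one.
    return max(candidates, key=lambda c: parse_version(c[1]))


def pinned_resolve(name, internal_index, public_index, internal_only):
    """Safer variant: names listed in internal_only never come from public."""
    if name in internal_only:
        if name not in internal_index:
            raise LookupError(f"{name} missing from the internal index")
        return ("internal", internal_index[name])
    return naive_resolve(name, internal_index, public_index)


# Hypothetical indexes: the attacker published 99.0.0 under the same name.
internal = {"acme-utils": "1.4.0"}
public = {"acme-utils": "99.0.0", "requests": "2.31.0"}

print(naive_resolve("acme-utils", internal, public))
print(pinned_resolve("acme-utils", internal, public, {"acme-utils"}))
```

The design point is that the decision about where a name may come from is made before versions are compared, so a higher public version number cannot win.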

These are some of the techniques of software supply chain attacks.

Now let us look at some of the defending techniques by which you can avoid software supply chain attacks.

Defending techniques

First and foremost, package maintainers should be careful while maintaining their packages and take security measures such as enabling Two-Factor Authentication and choosing a clear, distinctive package name.
They should always ensure that only valid code is pushed into the package.

These measures fall short when the package maintainer turns into a bad actor. One form of this is known as protestware, where a maintainer deliberately sabotages their own package.

In that case, developers and organizations should analyze a package's code and behaviour before adopting it, and they should use pre-vetted packages.

Vetting a package is possible.

But...

Did you know that a package usually depends on other packages to complete its task?

A recent study shows that an average JavaScript application has 10 direct dependencies and 683 indirect/transitive dependencies. That means each package you add pulls in the packages it depends on, the packages those depend on, and so on.


Source: https://hpc.guix.info/static/images/blog/pytorch-dependency-graph.svg

This is the dependency graph of the python-pytorch package (PyTorch as packaged for GNU Guix, going by the source above). You can see how many other packages it depends on.
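To make the difference between direct and transitive dependencies concrete, here is a minimal Python sketch that walks a dependency graph breadth-first. The graph and its package names are invented for illustration. In practice you would build the graph from your lock file or your package manager's metadata.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it depends on directly.
GRAPH = {
    "my-app": ["web-framework", "http-client"],
    "web-framework": ["template-engine", "http-client"],
    "http-client": ["tls-lib", "url-parser"],
    "template-engine": ["url-parser"],
    "tls-lib": [],
    "url-parser": [],
}


def all_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # already visited; also protects against cycles
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    seen.discard(root)  # in case a cycle leads back to the root
    return seen


direct = set(GRAPH["my-app"])
indirect = all_dependencies(GRAPH, "my-app") - direct

print("direct:  ", sorted(direct))
print("indirect:", sorted(indirect))
```

Even in this toy graph, two direct dependencies bring in three more packages that the developer never chose explicitly.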

Manually vetting this many packages is infeasible and error-prone. However, vetting with a software tool is possible.

For this there is an open-source tool called Packj, which is maintained by Ossillate (a cybersecurity startup).

About Packj

Packj is a command-line tool that analyzes the details and features of a package. For example, it can:

  • Review the release history of a package and tell whether the package is old or abandoned.

  • Check whether the package's source code is available publicly or not.

  • Check whether the package reads data from files and sends it over the internet, by running static analysis on its code (if the source code is available publicly).

  • Combine all of these findings into a final report on the package.

Packj Demo

Packj can analyze a package in several ways. For the others, please refer to the Packj documentation. Here I will demonstrate it through the Docker image, since that is the recommended way.

To analyze a package through the Docker image, Docker must be installed on your machine.

The command to analyze a package looks like this:

docker run -v /tmp:/tmp/packj -it ossillate/packj:latest audit -p <Package_registry>:<Package_name>

You need to supply the package registry and package name while analyzing.
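If you want to audit several packages, the same command can be generated in a loop. The Python sketch below only builds and prints the command line shown above and does not run Docker. The helper name packj_audit_command is made up for this example.

```python
import shlex


def packj_audit_command(registry, package):
    """Build the docker invocation from this post as an argument list."""
    return [
        "docker", "run", "-v", "/tmp:/tmp/packj", "-it",
        "ossillate/packj:latest", "audit", "-p", f"{registry}:{package}",
    ]


# Packages mentioned earlier in this post, both hosted on PyPI.
for registry, package in [("pypi", "requests"), ("pypi", "colorama")]:
    print(shlex.join(packj_audit_command(registry, package)))
```

Building the command as a list rather than one string avoids shell-quoting surprises if you later pass it to something like subprocess.run.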

Let's try it on requests, one of the most used packages from the Python package manager (PyPI).

docker run -v /tmp:/tmp/packj -it ossillate/packj:latest audit -p pypi:requests

When you run this, you should get a report similar to the image below.

As you can see, Packj analyzes the package details from the package manager in depth and provides a complete report.

While analyzing, it does a lot of things, such as:

  • Checks the package name to determine whether it is an original or a typo-squatted name.

  • Checks whether a valid description has been provided.

  • Checks whether the version of the package is the latest or not.

  • Checks the release gap between versions and determines whether the package is abandoned.

  • Checks the validity of the author through their email.

  • Checks the readme and home page of the package.

  • Checks whether the source code is available publicly or not.

  • When it finds the source code/repo URL, it goes on to analyze the details of that repo and checks whether they match the details provided in the package manager.

  • Checks the dependencies used by the source code.

  • Statically analyzes the source code and checks whether it uses API calls such as open/read/write, which read and write files; socket/send/recv, which upload and download data over the network; and exec/eval/fork, which generate code or spawn processes at runtime.

  • The complete report can include the full details of these API calls, such as the file path, the API name, and the line number on which the API is called. Below is an example that shows the details of an API call.

        "reads files and dirs": [
            {
                "filepath": "requests-2.31.0/setup.py",
                "api_name": "open",
                "lineno": "78"
            },
            {
                "filepath": "requests-2.31.0/setup.py",
                "api_name": "open",
                "lineno": "81"
            }
        ]

      This example shows that in the requests-2.31.0/setup.py file at line number 78, an open API call is used.

  • If static analysis finds any such API calls, it goes on to runtime analysis and observes the code's behavior.

  • It also checks whether the package has any known CVEs.
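To give a feel for the static-analysis step in the list above, here is a minimal Python sketch that uses the standard ast module to find calls to the API names mentioned there. It is only an illustration of the idea and is not Packj's actual implementation. The function name, the category labels, and the sample source are assumptions, and the output merely mimics the shape of the report fragment shown earlier.

```python
import ast

# API names taken from the list above; the category labels are illustrative.
RISKY_CALLS = {
    "open": "files", "read": "files", "write": "files",
    "socket": "network", "send": "network", "recv": "network",
    "exec": "code", "eval": "code", "fork": "code",
}


def find_risky_calls(source, filepath):
    """Return one dict per call whose function name is in RISKY_CALLS."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):         # plain call, e.g. open(...)
            name = func.id
        elif isinstance(func, ast.Attribute):  # method call, e.g. fh.read()
            name = func.attr
        else:
            continue
        if name in RISKY_CALLS:
            findings.append({
                "filepath": filepath,
                "api_name": name,
                "lineno": str(node.lineno),
            })
    # ast.walk is not ordered by line, so sort for a readable report.
    return sorted(findings, key=lambda f: int(f["lineno"]))


# Hypothetical file contents to scan.
SAMPLE = (
    'with open("README.md") as fh:\n'
    '    text = fh.read()\n'
    'result = eval("1 + 1")\n'
)

for finding in find_risky_calls(SAMPLE, "example/setup.py"):
    print(finding)
```

Matching on names alone is crude: it flags any method called read or send, and it misses calls hidden behind aliases or getattr. That is one reason a runtime analysis step is useful as a follow-up.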

This is how you can use Packj to analyze the packages you use in your production systems and stay aware of open-source software supply chain attacks.


Thanks for reading. I hope you learned something from this blog.

Packj repo: https://github.com/ossillate-inc/packj

Packj website: https://packj.dev/
