Friday, May 10, 2019

What is the Scudo hardened allocator?

I am going to write a short series of posts about the Scudo hardened allocator, starting with some general considerations and then getting into technical details.

Scudo is a user-mode allocator which aims at providing additional mitigation against heap-based vulnerabilities while maintaining good performance. It’s open-source, part of LLVM’s compiler-rt project, and external contributions are welcome.

Scudo is currently the default allocator in Fuchsia, is enabled in some components in Android, and is used in some Google production services. While it was initially implemented on top of some of sanitizer_common’s components, it is being rewritten to be standalone, without dependencies on other compiler-rt parts, for easier use (and additional performance and security benefits).

Why another allocator?

The journey started a few years ago while exploring the landscape of user-mode allocators on Linux. It is no secret that Google uses tcmalloc, and in all honesty, the internal version blows everything else away. By a lot. But as noted by my esteemed former colleagues Sean and Agustin, its resilience to abuse is ... lackluster, to say the least.

To understand our options, let’s have a look at a fairly typical benchmark for production services at Google, involving a lot of asynchronous threading, protobufs, RPCs and other goodies, all of it running on a 72-core Xeon machine with 512GB of RAM (this is not meant to be the most rigorous of comparisons, only to give you an idea of what’s up). The first metric is the number of queries per second (QPS); the second is the peak RSS of the program (as reported by /usr/bin/time).


Allocator | QPS (higher is better) | Max RSS (lower is better)
tcmalloc (internal) | 410K | 357MB
… | 356K | 1359MB
dlmalloc (glibc) | 295K | 333MB
… | 142K | 710MB
… | 24K | 393MB
… | 18K | 458MB
… | FATALERROR** |
… | SIGSEGV*** |
scudo (standalone) | 400K | 318MB

* hardened_malloc is mostly targeting Android and currently only supports up to 4 arenas, so the comparison is less relevant, as this strongly impacts concurrency. Increasing that number leads to mmap() failures.
** Guarder only supports up to 128 threads by default; increasing that number results in mmap() failures. Limiting the number of threads is the only way I found to make it work, but then the results are not comparable to the others.
*** I really have no idea how real world payloads ever worked with those two.

tcmalloc & jemalloc are fast, but not resilient against heap-based vulnerabilities. dlmalloc is, well, sometimes more secure than others, but not as fast. The secure allocators underperform, when they work at all. I am not going to lie: some benchmarks are less favorable to Scudo, some are more, but this one is representative of one of our target use cases.
The idea of Scudo is to fall into the category of “as fast as possible while being resilient against heap-based bugs”. Scudo is not the most secure allocator, but it will (hopefully) make exploitation harder, with a variety of configurable options that allow for increased security at a cost in performance and memory footprint (the Quarantine being one example). It is also meant to be a good working ground for future mitigations (such as memory tagging, or GWP-ASan).

Origins

While various options for improving existing allocators were considered, a meeting with Kostya Serebryany led to the plan of record: building upon the existing sanitizer_common allocator to create a user-mode allocator that would be part of LLVM’s compiler-rt project.

The original sanitizer allocator, which serves as the base for the ASan, TSan and LSan allocators, was written by Kostya and Dmitry Vyukov, and featured some pretty neat tricks that made it fast and extensible.

A decent number of things had to be changed (memory was allocated from a fixed base address in a predictable fashion, overall memory consumption was on the higher side, etc.). The original version targeted Linux only; support then came for other Google platforms, Android first, then Fuchsia.

Aleksey Shlyapnikov did some work to port Scudo to Solaris with SPARC ADI for memory tagging research, but that work was never upstreamed. I will probably revisit that at some point. As for other platforms, they will be up to the community.

Fuchsia decided to adopt Scudo as their default libc allocator, which required rewriting the code to remove dependencies on sanitizer_common, and we are reaching the final stages of the upstreaming process.
