QR Code challenge part 1

First steps

In this challenge, we’re given a 64-bit ELF executable written in OCaml. Running the binary we see it’s validating an alledged QR code as such:

This QR code thing misled me for a while looking for special characters that would be used to encode the QR code. However, the fact that sending a line feed (’\n’) ended our input was a first hint that this program might not consume a standard QR code.

Let’s dig into the code. Loading the binary in IDA, 1344 fuctions were found during the analysis. This is a fairly big binary. We need some shortcuts to focus our efforts on the validation code. To do so, we can search for the error string that is printed when we enter an invalid input: “[!] Unable to read QR code.”.

By doing so, we land in sub_4022F0. At first sight, this function does the following things:

It assigns 278 entries in a list stored as a global variable
The challenge banner is printed
There are 4 unknown functions
There is a function returning an integer which must equal 0x24ED to get our flag

From there, I wanted to know where our input is read. I wanted to go fast (it’s a CTF after all), so I spawned a debugger and checked the return values of the 4 unknown functions. Turns out sub_4060D0 returns a char * containing our input. Tracking our input, we can see the pointer is assigned to a structure. This structure is then passed as a parameter to sub_403370 (validate_input in the screenshot) as we can see below:

This is where the fun begins.

Analyzing the input validation `sub_403370`

This function takes two arguments:

A structure containing our input, a callback and some other stuff
A pointer to a global variable

One could argue that r15 is also passed as an argument, but r15 is used as a global register by OCaml’s memory allocator, so we can ignore it.

So far, we didn’t really feel the fact that we were looking at an OCaml binary, but here is where that ends.

First thing we notice when reverse engineering sub_403370 is recursion. Yay! Let me add a screenshot of the CFG to share the fun:

validate_input calls itself with the next element of what seems to be a linked list and the exit condition of the recursive call is when rbx (the current element pointer) equals 1, which is very likely a null or a zero litteral encoded the OCaml way as we’ll see later.

I didn’t really try to understand what the linked list was about at this point, I wanted to get to the validation of the input itself. After the recursive call, we get into the function that was part of the structure passed to the function. Following the calls, we end up in the core of the validation in sub_403600.

Analyzing `sub_403600`

This function is where I spent most of my time working on this challenge. I wanted to go fast. I took shortcuts and made assumptions that lead me nowhere. At some point, I had to face the fact that I’d have to fully understand this function. Let’s look at it.

This function takes many parameters, but the most important ones are a pointer to our input and a hardcoded linked-list. It iterates over each character of our input while navigating in the linked-list using indexes. I believe the linked-list is just the OCaml implementation of an array. Let’s just look at it as an array. These are the operations that are performed in an iteration of the loop:

A character is read from our input based on an incremental index we’ll call i
This character is encoded the OCaml way (input[i] << 1) | 1
An item in the hardcoded array is retrieved using another index we’ll call j
A field from this item is retrieved and used to update j
A function is called which returns either j or j + 1 based on the character of our input

After iterating over all the characters of the input, the function retrieves the array item at the index corresponding to the last value of j. It then returns the field at offset 0x10 of this item.

At this point, we know each of our characters can cause the j value to be j or j+1. There’s also an interesting check in the code making sure that encoded input - 0xE2 <= 3.

What does this check mean? To satisfy this expression, our encoded input must be one of these values:

0xE2
0xE3
0xE4
0xE5

Since our encoded input is left shifted by one bit, we can decode these values by right shifting by one bit which gives us:

0xE2 » 1 == 0x71
0xE3 » 1 == 0x71
0xE4 » 1 == 0x72
0xE5 » 1 == 0x72

So there’s only two possible values in our input. If we convert these values to ASCII, we get ‘q’ and ‘r’. Remember the name of this challenge? QR Code… Wow, what a troll! It’s not looking for an actual QR code, it’s looking for a code made of ‘q’ and ‘r’. Nice one Towel!

Since only two values are valid, it would make sense that one of these two values is responsible for the j value and the other one for j+1 mentionned above at step 5.

At this point, we need to get a better understanding of the items that are retrieved from the hardcoded array. Looking at how the code is accessing this structure, we quickly find out that there are 4 8-byte fields in the structure. The field at offset 0 (the dereference of the return value in the screenshot) is compared to our encoded input.

So, it contains the value of the item (q or r).

Then, the field at offset 0x18 is used in as an index in a function call:

It is also used to update j so we know for sure it’s another index.

struct item
{
	uint64_t value;        // offset 0x00
	uint64_t unknown_01;   // offset 0x08
	uint64_t unknown_02;   // offset 0x10
	uint64_t next_index;   // offset 0x18
}

The field at offset 0x10 is the one that is returned at the end of the loop when the item at the last value of j is retrieved.

Then, looking at how this return value is used, we find out that if it is 1, the value returned will be 0xF and if the value is something else, the returned value will be 0x23.

At this point, I dumped the content of the hardcoded array to see if there were entries with a value different from 1 at offset 0x10. Turns out that 4 items had a different value, which was 3. It seems odd that there are only 2 values possible and it’s not a boolean that is used here. Once again, it’s because of OCaml. These values are encoded. Decoding these two values gives us:

3 » 1 = 1
1 » 1 = 0

So, this is a boolean value. Since only 4 of the items have this field set to true, we probably want the last item that we retrieve to have this field set to true. This is where things became clear for me. We’re navigating in a graph with our input and we need to end up in an exit node.

In fact, the items in the graph are edges. Each edge points to the next node which contains an array of two edges. You either take the ‘q’ edge which is the edge at item->next_index or the ‘r’ edge which is the edge at index item->next_index + 1. The next_index field of the edge points to the next node which also contains 2 edges. I illustrated this to make it clearer:

Finally, the field at offset 0x8 is used internally as a boolean flag which tells if this edge is the base of the node, in other words if it is the first edge of the node’s edges array. So, we have this structure:

struct edge
{
	uint64_t value;
	uint64_t is_node_base;
	uint64_t is_exit;
	uint64_t next_index;
}

This is it for sub_403600. Here’s a rough draft of how it would look like in Python:

def walk_graph():
    cur_index = 3
    for i in range(len(my_input)):
        my_encoded_byte = (my_input[i] << 1) | 1
        edge = get_item_at_index(graph, cur_index)
        next_idx = edge.next_index

        if get_item_at_index(graph, next_idx).value == 3 or my_encoded_byte in [0xE2, 0xE3, 0xE4, 0xE5]:
            error()

        # Get the index of either the 'q' or the 'r' edge
        actual_next_index = get_next_index(my_encoded_byte, next_idx)
        edge = get_item_at_index(graph, actual_next_index)
        if edge.value != my_encoded_byte:
            error()
        cur_index = actual_next_index

    if get_item_at_index(graph, cur_index).is_exit:
        return True
    return False

We need to provide an input such that the last edge has the is_exit field set to 3 which is 1 or true in the OCaml source code.

278 graphs

Having to find an input that would fulfill this requirement for a single graph would be way too easy. That’s where we have to go back to the first function we looked at. Remember the 278 entries that were stored in a global variable? Well, turns out these are 278 different graphs. The graph validation function we just analyzed will be called 278 times with each one of these graphs and our input needs to end up in an exit edge for each of them.

The recursive function sub_403370 is the one iterating over the 278 graphs and as an output it builds a linked-list that contains 278 elements for the results of each of these graphs (either 0xf or 0x23 as showed earlier):

Then, compute_result in the following screenshot:

computes a value based on the result list as such:

result[0] + result[1] - 1 + result[2] - 1 + ... + result[n] - 1

which gives 0x24ed if all the items of result are equals to 0x23.

At this point, it became more of an algorithm problem than a reverse engineering one. I extracted the graphs and explained the problem to a teammate who came so close to solving it. All of this is part 2 :)

Here’s the script I used to dump the graphs:

graph_heads = [
    0x5E6A48, 0x5ECF18, 0x5F21F0, 0x5F6E60, 0x5FB940, 0x600048, 0x604808, 0x608EC0, 0x60D5B8, 0x611C38,
    0x6162D8, 0x61A950, 0x61F020, 0x6236F0, 0x627DA8, 0x62C460, 0x630AC8, 0x635198, 0x639788, 0x63DE30,
    0x641B80, 0x646210, 0x64A888, 0x64EED8, 0x652D00, 0x657368, 0x65B9B8, 0x660038, 0x6646D8, 0x668D50,
    0x66D370, 0x6719D8, 0x676028, 0x67A528, 0x67EB48, 0x6831C0, 0x687868, 0x68BEE8, 0x690520, 0x694B70,
    0x699240, 0x69D8A8, 0x6A1F50, 0x6A63E0, 0x6AAA60, 0x6AF0E0, 0x6B3730, 0x6B7840, 0x6BBE60, 0x6C04C8,
    0x6C4B70, 0x6C91F0, 0x6CD840, 0x6D1E20, 0x6D5F60, 0x6D95F0, 0x6DDC68, 0x6E22B0, 0x6E6780, 0x6EAE10,
    0x6EF490, 0x6F3B10, 0x6F8178, 0x6FC738, 0x700D80, 0x705358, 0x7099A8, 0x70E060, 0x7126C8, 0x716D48,
    0x71B398, 0x71F838, 0x723EA0, 0x728520, 0x72CB70, 0x7311D8, 0x735840, 0x739C20, 0x73E0C0, 0x742750,
    0x746890, 0x74AEF8, 0x74F440, 0x753AE8, 0x757CD0, 0x75C1F0, 0x760510, 0x764B48, 0x7691B0, 0x76D818,
    0x771EF8, 0x776570, 0x77AAD0, 0x77F138, 0x7837B8, 0x787E20, 0x78C470, 0x790AF0, 0x794CF0, 0x7992E0,
    0x79D378, 0x7A1698, 0x7A5D18, 0x7AA260, 0x7AE880, 0x7B2EE8, 0x7B7520, 0x7BBA98, 0x7C0118, 0x7C4780,
    0x7C8DF8, 0x7CD460, 0x7D1A98, 0x7D60B8, 0x7DA708, 0x7DED58, 0x7E3390, 0x7E7638, 0x7EBCA0, 0x7F0320,
    0x7F4970, 0x7F8F90, 0x7FD5F8, 0x801C48, 0x806250, 0x80A858, 0x80EEA8, 0x8132D0, 0x817988, 0x81BFA8,
    0x8205F8, 0x824C00, 0x829268, 0x82D8E0, 0x831D80, 0x8363D0, 0x83A7E0, 0x83EE30, 0x843480, 0x847A28,
    0x84BFB8, 0x850338, 0x854958, 0x858FD8, 0x85D5F8, 0x861C60, 0x8662E0, 0x86A930, 0x86EB90, 0x873210,
    0x877860, 0x87BEC8, 0x880500, 0x884B50, 0x889050, 0x88D6D0, 0x891E18, 0x896288, 0x89A600, 0x89E7A0,
    0x8A2DF0, 0x8A6FA8, 0x8AB2C0, 0x8AF940, 0x8B3F90, 0x8B85D8, 0x8BC958, 0x8C0ED0, 0x8C5538, 0x8C9BB8,
    0x8CE1C0, 0x8D2828, 0x8D6E90, 0x8DB4F8, 0x8DFB18, 0x8E4168, 0x8E87B8, 0x8ECCD0, 0x8F1388, 0x8F59D8,
    0x8F9FF8, 0x8FE648, 0x902C08, 0x907258, 0x90B8D8, 0x90FF58, 0x9144B8, 0x918AF0, 0x91D158, 0x9217E8,
    0x925E50, 0x92A4A0, 0x92E7F0, 0x932840, 0x936A40, 0x93AD90, 0x93F260, 0x9438E0, 0x947960, 0x94BFC0,
    0x950608, 0x954C58, 0x959218, 0x95D6E8, 0x9619D8, 0x966090, 0x96A560, 0x96EBE0, 0x973218, 0x977838,
    0x97BC40, 0x9802A8, 0x984268, 0x988708, 0x98CDB0, 0x991430, 0x995900, 0x999F20, 0x99E510, 0x9A2B30,
    0x9A6FD0, 0x9AB578, 0x9AF448, 0x9B37C8, 0x9B7D10, 0x9BBFD0, 0x9C0648, 0x9C4C08, 0x9C92D8, 0x9CD850,
    0x9D1EB8, 0x9D64C8, 0x9DAB18, 0x9DF0C0, 0x9E3738, 0x9E7D80, 0x9EC3D0, 0x9F06C0, 0x9F4D40, 0x9F93E8,
    0x9FD9D8, 0xA01458, 0xA05A78, 0xA0A0F8, 0xA0E220, 0xA128A0, 0xA16F20, 0xA1B2B8, 0xA1F818, 0xA23E38,
    0xA28488, 0xA2CB08, 0xA31158, 0xA35508, 0xA39978, 0xA3DFF8, 0xA42378, 0xA466C8, 0xA4ABF8, 0xA4F128,
    0xA52FF8, 0xA575E8, 0xA5BC88, 0xA601A0, 0xA64858, 0xA68EA0, 0xA6D340, 0xA719A8, 0xA75FF8, 0xA7A520,
    0xA7E4B0, 0xA82800, 0xA86E38, 0xA8B458, 0xA8FAC0, 0xA93DE0, 0xA98400, 0xA9C6F0
]


def parse_graph(head):
    my_list = []
    next_node = head
    while next_node != 1:
        elem_ptr   = ida_bytes.get_64bit(next_node)
        next_node  = ida_bytes.get_64bit(next_node + 8)

        value        = ida_bytes.get_64bit(elem_ptr)
        is_node_base = ida_bytes.get_64bit(elem_ptr + 0x8)
        is_exit      = ida_bytes.get_64bit(elem_ptr + 0x10)
        next_index   = ida_bytes.get_64bit(elem_ptr + 0x18)

        my_list.append((value, is_node_base, exit_node, next_index))

    return my_list


graphs = []
for head in graph_heads[::-1]:
    graphs.append(parse_graph(head))


with open(your_file_path, "w") as f:
    f.write("graphs = [\n")
    for i, graph in enumerate(graphs):
        f.write("[\n".format(i))
        for value, is_node_base, is_exit, next_index in graph:
            f.write("({}, {}, {}, {}),\n".format(value, is_node_base, is_exit, next_index))
        f.write("],\n")
    f.write("]\n")

NSEC20 QR-Code. part 1/2 - Mon, Jun 8, 2020 - Frédéric Vachon

QR Code challenge part 1

First steps

Analyzing the input validation `sub_403370`

Analyzing `sub_403600`

278 graphs

Back to Home

NSEC20 QR-Code. part 1/2 - Mon, Jun 8, 2020 - Frédéric Vachon

QR Code challenge part 1

First steps

Analyzing the input validation sub_403370

Analyzing sub_403600

278 graphs

Back to Home

Analyzing the input validation `sub_403370`

Analyzing `sub_403600`