NSEC20 QR-Code. part 1/2 - Mon, Jun 8, 2020 - Frédéric Vachon
QR Code challenge part 1
First steps
In this challenge, we’re given a 64-bit ELF executable written in OCaml. Running the binary we see it’s validating an alledged QR code as such:
This QR code thing misled me for a while looking for special characters that would be used to encode the QR code. However, the fact that sending a line feed (’\n’) ended our input was a first hint that this program might not consume a standard QR code.
Let’s dig into the code. Loading the binary in IDA, 1344 fuctions were found during the analysis. This is a fairly big binary. We need some shortcuts to focus our efforts on the validation code. To do so, we can search for the error string that is printed when we enter an invalid input: “[!] Unable to read QR code.”.
By doing so, we land in sub_4022F0
. At first sight, this function does the following things:
- It assigns 278 entries in a list stored as a global variable
- The challenge banner is printed
- There are 4 unknown functions
- There is a function returning an integer which must equal 0x24ED to get our flag
From there, I wanted to know where our input is read. I wanted to go fast
(it’s a CTF after all), so I spawned a debugger and checked the return values
of the 4 unknown functions. Turns out sub_4060D0
returns a char *
containing our input. Tracking our input, we can see the pointer is
assigned to a structure. This structure is then passed as a parameter to
sub_403370
(validate_input
in the screenshot) as we can see below:
This is where the fun begins.
Analyzing the input validation sub_403370
This function takes two arguments:
- A structure containing our input, a callback and some other stuff
- A pointer to a global variable
One could argue that r15 is also passed as an argument, but r15 is used as a global register by OCaml’s memory allocator, so we can ignore it.
So far, we didn’t really feel the fact that we were looking at an OCaml binary, but here is where that ends.
First thing we notice when reverse engineering sub_403370
is recursion. Yay!
Let me add a screenshot of the CFG to share the fun:
validate_input
calls itself with the next element of what seems to be a linked
list and the exit condition of the recursive call is when rbx
(the current
element pointer) equals 1, which is very likely a null or a zero litteral encoded
the OCaml way as we’ll see later.
I didn’t really try to understand what the linked list was about at this
point, I wanted to get to the validation of the input itself. After the
recursive call, we get into the function that was part of the structure passed
to the function. Following the calls, we end up in the core of the validation
in sub_403600
.
Analyzing sub_403600
This function is where I spent most of my time working on this challenge. I wanted to go fast. I took shortcuts and made assumptions that lead me nowhere. At some point, I had to face the fact that I’d have to fully understand this function. Let’s look at it.
This function takes many parameters, but the most important ones are a pointer to our input and a hardcoded linked-list. It iterates over each character of our input while navigating in the linked-list using indexes. I believe the linked-list is just the OCaml implementation of an array. Let’s just look at it as an array. These are the operations that are performed in an iteration of the loop:
- A character is read from our input based on an incremental index we’ll call
i
- This character is encoded the OCaml way
(input[i] << 1) | 1
- An item in the hardcoded array is retrieved using another index we’ll call
j
- A field from this item is retrieved and used to update
j
- A function is called which returns either
j
orj + 1
based on the character of our input
After iterating over all the characters of the input, the function retrieves
the array item at the index corresponding to the last value of j
. It then
returns the field at offset 0x10
of this item.
At this point, we know each of our characters can cause the j
value to be
j
or j+1
. There’s also an interesting check in the code making sure that
encoded input - 0xE2 <= 3
.
What does this check mean? To satisfy this expression, our encoded input must be one of these values:
- 0xE2
- 0xE3
- 0xE4
- 0xE5
Since our encoded input is left shifted by one bit, we can decode these values by right shifting by one bit which gives us:
- 0xE2 » 1 == 0x71
- 0xE3 » 1 == 0x71
- 0xE4 » 1 == 0x72
- 0xE5 » 1 == 0x72
So there’s only two possible values in our input. If we convert these values to ASCII, we get ‘q’ and ‘r’. Remember the name of this challenge? QR Code… Wow, what a troll! It’s not looking for an actual QR code, it’s looking for a code made of ‘q’ and ‘r’. Nice one Towel!
Since only two values are valid, it would make sense that one of these two
values is responsible for the j
value and the other one for j+1
mentionned
above at step 5.
At this point, we need to get a better understanding of the items that are retrieved from the hardcoded array. Looking at how the code is accessing this structure, we quickly find out that there are 4 8-byte fields in the structure. The field at offset 0 (the dereference of the return value in the screenshot) is compared to our encoded input.
So, it contains the value of the item (q or r).
Then, the field at offset 0x18 is used in as an index in a function call:
It is also used to update j
so we know for sure it’s another index.
struct item
{
uint64_t value; // offset 0x00
uint64_t unknown_01; // offset 0x08
uint64_t unknown_02; // offset 0x10
uint64_t next_index; // offset 0x18
}
The field at offset 0x10 is the one that is returned at the end of the loop
when the item at the last value of j
is retrieved.
Then, looking at how this return value is used, we find out that if it is 1, the value returned will be 0xF and if the value is something else, the returned value will be 0x23.
At this point, I dumped the content of the hardcoded array to see if there were entries with a value different from 1 at offset 0x10. Turns out that 4 items had a different value, which was 3. It seems odd that there are only 2 values possible and it’s not a boolean that is used here. Once again, it’s because of OCaml. These values are encoded. Decoding these two values gives us:
- 3 » 1 = 1
- 1 » 1 = 0
So, this is a boolean value. Since only 4 of the items have this field set to true, we probably want the last item that we retrieve to have this field set to true. This is where things became clear for me. We’re navigating in a graph with our input and we need to end up in an exit node.
In fact, the items in the graph are edges. Each edge points to the next node
which contains an array of two edges. You either take the ‘q’ edge which is
the edge at item->next_index
or the ‘r’ edge which is the edge at index
item->next_index + 1
. The next_index
field of the edge points to the next
node which also contains 2 edges. I illustrated this to make it clearer:
Finally, the field at offset 0x8 is used internally as a boolean flag which tells if this edge is the base of the node, in other words if it is the first edge of the node’s edges array. So, we have this structure:
struct edge
{
uint64_t value;
uint64_t is_node_base;
uint64_t is_exit;
uint64_t next_index;
}
This is it for sub_403600
. Here’s a rough draft of how it would look like in Python:
def walk_graph():
cur_index = 3
for i in range(len(my_input)):
my_encoded_byte = (my_input[i] << 1) | 1
edge = get_item_at_index(graph, cur_index)
next_idx = edge.next_index
if get_item_at_index(graph, next_idx).value == 3 or my_encoded_byte in [0xE2, 0xE3, 0xE4, 0xE5]:
error()
# Get the index of either the 'q' or the 'r' edge
actual_next_index = get_next_index(my_encoded_byte, next_idx)
edge = get_item_at_index(graph, actual_next_index)
if edge.value != my_encoded_byte:
error()
cur_index = actual_next_index
if get_item_at_index(graph, cur_index).is_exit:
return True
return False
We need to provide an input such that the last
edge has the is_exit
field set to 3
which is 1
or true
in the
OCaml source code.
278 graphs
Having to find an input that would fulfill this requirement for a single graph would be way too easy. That’s where we have to go back to the first function we looked at. Remember the 278 entries that were stored in a global variable? Well, turns out these are 278 different graphs. The graph validation function we just analyzed will be called 278 times with each one of these graphs and our input needs to end up in an exit edge for each of them.
The recursive function sub_403370
is the one iterating over the 278 graphs
and as an output it builds a linked-list that contains 278 elements for the
results of each of these graphs (either 0xf or 0x23 as showed earlier):
Then, compute_result
in the following screenshot:
computes a value based on the result list as such:
result[0] + result[1] - 1 + result[2] - 1 + ... + result[n] - 1
which gives 0x24ed if all the items of result are equals to 0x23.
At this point, it became more of an algorithm problem than a reverse engineering one. I extracted the graphs and explained the problem to a teammate who came so close to solving it. All of this is part 2 :)
Here’s the script I used to dump the graphs:
graph_heads = [
0x5E6A48, 0x5ECF18, 0x5F21F0, 0x5F6E60, 0x5FB940, 0x600048, 0x604808, 0x608EC0, 0x60D5B8, 0x611C38,
0x6162D8, 0x61A950, 0x61F020, 0x6236F0, 0x627DA8, 0x62C460, 0x630AC8, 0x635198, 0x639788, 0x63DE30,
0x641B80, 0x646210, 0x64A888, 0x64EED8, 0x652D00, 0x657368, 0x65B9B8, 0x660038, 0x6646D8, 0x668D50,
0x66D370, 0x6719D8, 0x676028, 0x67A528, 0x67EB48, 0x6831C0, 0x687868, 0x68BEE8, 0x690520, 0x694B70,
0x699240, 0x69D8A8, 0x6A1F50, 0x6A63E0, 0x6AAA60, 0x6AF0E0, 0x6B3730, 0x6B7840, 0x6BBE60, 0x6C04C8,
0x6C4B70, 0x6C91F0, 0x6CD840, 0x6D1E20, 0x6D5F60, 0x6D95F0, 0x6DDC68, 0x6E22B0, 0x6E6780, 0x6EAE10,
0x6EF490, 0x6F3B10, 0x6F8178, 0x6FC738, 0x700D80, 0x705358, 0x7099A8, 0x70E060, 0x7126C8, 0x716D48,
0x71B398, 0x71F838, 0x723EA0, 0x728520, 0x72CB70, 0x7311D8, 0x735840, 0x739C20, 0x73E0C0, 0x742750,
0x746890, 0x74AEF8, 0x74F440, 0x753AE8, 0x757CD0, 0x75C1F0, 0x760510, 0x764B48, 0x7691B0, 0x76D818,
0x771EF8, 0x776570, 0x77AAD0, 0x77F138, 0x7837B8, 0x787E20, 0x78C470, 0x790AF0, 0x794CF0, 0x7992E0,
0x79D378, 0x7A1698, 0x7A5D18, 0x7AA260, 0x7AE880, 0x7B2EE8, 0x7B7520, 0x7BBA98, 0x7C0118, 0x7C4780,
0x7C8DF8, 0x7CD460, 0x7D1A98, 0x7D60B8, 0x7DA708, 0x7DED58, 0x7E3390, 0x7E7638, 0x7EBCA0, 0x7F0320,
0x7F4970, 0x7F8F90, 0x7FD5F8, 0x801C48, 0x806250, 0x80A858, 0x80EEA8, 0x8132D0, 0x817988, 0x81BFA8,
0x8205F8, 0x824C00, 0x829268, 0x82D8E0, 0x831D80, 0x8363D0, 0x83A7E0, 0x83EE30, 0x843480, 0x847A28,
0x84BFB8, 0x850338, 0x854958, 0x858FD8, 0x85D5F8, 0x861C60, 0x8662E0, 0x86A930, 0x86EB90, 0x873210,
0x877860, 0x87BEC8, 0x880500, 0x884B50, 0x889050, 0x88D6D0, 0x891E18, 0x896288, 0x89A600, 0x89E7A0,
0x8A2DF0, 0x8A6FA8, 0x8AB2C0, 0x8AF940, 0x8B3F90, 0x8B85D8, 0x8BC958, 0x8C0ED0, 0x8C5538, 0x8C9BB8,
0x8CE1C0, 0x8D2828, 0x8D6E90, 0x8DB4F8, 0x8DFB18, 0x8E4168, 0x8E87B8, 0x8ECCD0, 0x8F1388, 0x8F59D8,
0x8F9FF8, 0x8FE648, 0x902C08, 0x907258, 0x90B8D8, 0x90FF58, 0x9144B8, 0x918AF0, 0x91D158, 0x9217E8,
0x925E50, 0x92A4A0, 0x92E7F0, 0x932840, 0x936A40, 0x93AD90, 0x93F260, 0x9438E0, 0x947960, 0x94BFC0,
0x950608, 0x954C58, 0x959218, 0x95D6E8, 0x9619D8, 0x966090, 0x96A560, 0x96EBE0, 0x973218, 0x977838,
0x97BC40, 0x9802A8, 0x984268, 0x988708, 0x98CDB0, 0x991430, 0x995900, 0x999F20, 0x99E510, 0x9A2B30,
0x9A6FD0, 0x9AB578, 0x9AF448, 0x9B37C8, 0x9B7D10, 0x9BBFD0, 0x9C0648, 0x9C4C08, 0x9C92D8, 0x9CD850,
0x9D1EB8, 0x9D64C8, 0x9DAB18, 0x9DF0C0, 0x9E3738, 0x9E7D80, 0x9EC3D0, 0x9F06C0, 0x9F4D40, 0x9F93E8,
0x9FD9D8, 0xA01458, 0xA05A78, 0xA0A0F8, 0xA0E220, 0xA128A0, 0xA16F20, 0xA1B2B8, 0xA1F818, 0xA23E38,
0xA28488, 0xA2CB08, 0xA31158, 0xA35508, 0xA39978, 0xA3DFF8, 0xA42378, 0xA466C8, 0xA4ABF8, 0xA4F128,
0xA52FF8, 0xA575E8, 0xA5BC88, 0xA601A0, 0xA64858, 0xA68EA0, 0xA6D340, 0xA719A8, 0xA75FF8, 0xA7A520,
0xA7E4B0, 0xA82800, 0xA86E38, 0xA8B458, 0xA8FAC0, 0xA93DE0, 0xA98400, 0xA9C6F0
]
def parse_graph(head):
my_list = []
next_node = head
while next_node != 1:
elem_ptr = ida_bytes.get_64bit(next_node)
next_node = ida_bytes.get_64bit(next_node + 8)
value = ida_bytes.get_64bit(elem_ptr)
is_node_base = ida_bytes.get_64bit(elem_ptr + 0x8)
is_exit = ida_bytes.get_64bit(elem_ptr + 0x10)
next_index = ida_bytes.get_64bit(elem_ptr + 0x18)
my_list.append((value, is_node_base, exit_node, next_index))
return my_list
graphs = []
for head in graph_heads[::-1]:
graphs.append(parse_graph(head))
with open(your_file_path, "w") as f:
f.write("graphs = [\n")
for i, graph in enumerate(graphs):
f.write("[\n".format(i))
for value, is_node_base, is_exit, next_index in graph:
f.write("({}, {}, {}, {}),\n".format(value, is_node_base, is_exit, next_index))
f.write("],\n")
f.write("]\n")