Extracting the Stagecoach App's Word of the Day
Posted on 7th March 2024 at 16:22
A dive into Stagecoach's mobile app and extracting encrypted data relating to their ticketing system
Where This Started
I rely on my city’s bus services to get around, one of which is Stagecoach. I noticed something interesting about their ticketing system: when I bought a ticket via their app, it had a QR code on it. Makes sense, you scan that on the bus.
But no, actually, there’s no scanner on the buses here, so you end up just showing the driver your ticket. The ticket has a spinning graphic with the current time and a word that changes every day, which I can only assume is what they’re checking to make sure it’s not just a screenshot or something.
This made me curious: where were they getting this word from? Was it just some plain array in the app, or was it retrieved from a remote server that requires authentication?
Decompiling the App
This part is pretty easy, plenty of tools exist to do this for you. I chose to use jadx as it seemed the most popular.
I grabbed the apk off of an apk mirroring site and loaded it into jadx.
But what am I even looking for? I tried searching for today’s word, which is prize, but that yielded nothing. I then remembered that the ticket screen has a paragraph about reporting missing people, so I searched for some text from that and found a resource defining strings.
With this I searched for the name of the string and found AbstractQrActiveTicketView, the view used for showing tickets, which is where the word of the day should be displayed.
First thing I scan through is the imports and class fields, perhaps there’s some class that the words are retrieved from.
A database model called Word catches my eye.
Jumping into this file reveals this is what we were looking for: the class fields of day, word and colour make sense. Time to check the usages.
One of its usages leads to a class simply named DatabaseProvider and gives us this critical line of code.
So this is where the words come from. Figure out how to decrypt these bytes and we’ve got them.
The Word of the Day File
The WordOfTheDayFile class simply contains a series of static classes, each with a single field holding an array of bytes (presumably the words in JSON format), along with a method that writes them all into a single byte array.
This seems like a weird way of doing this, not sure if this is some weird decompilation artifact or what.
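To picture what that method ends up producing, here’s a minimal Python sketch of joining several byte chunks into one array. The chunk names and values are made up for illustration; the real class holds far more data, and I’m only assuming the chunks are concatenated in order.

```python
from itertools import chain

# Hypothetical stand-ins for the byte arrays held by the static classes.
# Java bytes are signed, so negative values are expected here.
chunk_a = [12, -45, 100]
chunk_b = [-128, 7]
chunk_c = [127]


def join_chunks(*chunks):
    """Concatenate the chunks, in order, into one flat list of byte values."""
    return list(chain.from_iterable(chunks))


combined = join_chunks(chunk_a, chunk_b, chunk_c)
print(len(combined))
```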
The getGeneratedKey Function
Initially the name of this function worried me, there are so many ways this key could be generated, this might just end here. Thankfully, it didn’t.
This function calls a function called unhide on the class HidingUtils. It passes a base64 encoded string to this unhide function.
This unhide function is actually a native Java method, meaning it uses the Java Native Interface (JNI) to call into a shared native library, which is loaded further up in the class. The native library is called ticket-ref-code.
So, they’re doing this “unhiding” logic within a native library, most likely written in C++. This is probably to make extracting these secrets harder and more time consuming.
We can find the location of this native library in the resources/lib folder, let’s dig into that.
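If you’d rather pull the library out without jadx, an APK is just a zip archive, so something like the sketch below should work. It assumes the usual APK layout where native libraries sit under lib/ followed by an ABI folder; the in-memory archive and its file names are fake and only there so the snippet runs on its own.

```python
import io
import zipfile


def find_native_libs(apk, suffix=".so"):
    """Return the archive entries under lib/ whose names end with suffix."""
    with zipfile.ZipFile(apk) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith("lib/") and name.endswith(suffix)
        ]


# Build a tiny fake "APK" in memory purely for demonstration.
fake_apk = io.BytesIO()
with zipfile.ZipFile(fake_apk, "w") as archive:
    archive.writestr("lib/arm64-v8a/libticket-ref-code.so", b"not a real library")
    archive.writestr("classes.dex", b"not real bytecode")

print(find_native_libs(fake_apk, suffix="ticket-ref-code.so"))
```

With a real APK you would pass the file path instead of the in-memory buffer, then use `ZipFile.extract` on whichever entry matches your target architecture.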
Decompiling ticket-ref-code.so
I’m not a C++ developer. I’m also not terribly experienced with reverse engineering decompiled C++ code. That’s why I called upon my partner Harry, who is more familiar with it, to help me with this part. This wouldn’t have been possible without them.
One thing I do know about C++ decompiling/reverse engineering is that Ghidra exists and that I should use that. After trekking on a journey to download JDK 17 I was ready to get going.
So, uh, yeah that’s definitely code.
This isn’t actually as bad as it looks, at least that’s what I realised after NoSharp explained it to me.
First off, this wizard knew what a .gdt (Ghidra Data Type) archive was. What these archives do is help Ghidra figure out the structure of datatypes. For example, if you provided a GDT archive that told Ghidra string types had a function called slice at offset 0x458, it would replace a call such as (*p + 0x458) with (*p->slice()), making the code more readable. Thankfully, someone had already created a GDT archive for us to use with JNI code.
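As a toy model of that idea (this is not how Ghidra is implemented, just the lookup it effectively gives you), here’s a Python sketch that rewrites raw offsets into names using a table. The table only contains the made-up slice example from above, not any real JNI offsets.

```python
import re

# Hypothetical offset-to-name table, standing in for what a data type
# archive tells the decompiler about a struct's members.
KNOWN_MEMBERS = {0x458: "slice"}

# Matches expressions shaped like "(*p + 0x458)".
OFFSET_CALL = re.compile(r"\(\*(\w+) \+ (0x[0-9a-fA-F]+)\)")


def annotate(expression):
    """Replace known pointer-plus-offset expressions with named member calls."""

    def replace(match):
        pointer = match.group(1)
        offset = int(match.group(2), 16)
        name = KNOWN_MEMBERS.get(offset)
        if name is None:
            # Unknown offset: leave the expression exactly as it was.
            return match.group(0)
        return f"(*{pointer}->{name}())"

    return OFFSET_CALL.sub(replace, expression)


print(annotate("(*p + 0x458)"))
```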
With that archive imported, we did some googling to find out how JNI functions are written in C++ and quickly found that the first argument of most functions is JNIEnv *param_1, effectively a pointer to a function table.
With the GDT archive, updating the first parameter’s type to JNIEnv * should reveal some function calls…
Awesome! Now we have a slightly better idea of what’s going on. Though it didn’t reveal anything too significant, it helps us parse out what is and isn’t useful.
We now went through and started naming variables to make it easier to track what’s going on.
I’m sure most of us can figure out what’s happening here, the only part we really had to take a second to understand was:
The ^ is the XOR operator in C++. All this code is doing is XOR’ing each character of the input string with the key string. This means that to get our decryption key, we just need to take the key from the getGeneratedKey function and the key found inside this native library, and XOR each character of one with the corresponding character of the other.
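Here’s a rough Python sketch of that byte-by-byte XOR, written as a plain loop. The values are made up, and I’m assuming both inputs are the same length; I’m not claiming this matches how the native code handles a length mismatch. It also shows why the trick works as a “hide”/“unhide” pair: XOR’ing twice with the same key gets you back where you started.

```python
def xor_bytes(data, key):
    """XOR each byte of data with the byte at the same position in key."""
    if len(data) != len(key):
        raise ValueError("this sketch assumes both inputs have the same length")
    result = bytearray()
    for data_byte, key_byte in zip(data, key):
        result.append(data_byte ^ key_byte)
    return bytes(result)


# Made-up values, NOT the real keys from the app or the native library.
hidden = bytes([0x12, 0x34, 0x56, 0x78])
native_key = bytes([0xAB, 0xCD, 0xEF, 0x01])

unhidden = xor_bytes(hidden, native_key)

# XOR is its own inverse, so applying the same key again restores the input.
assert xor_bytes(unhidden, native_key) == hidden
```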
There’s also the AES256Cipher class that’s used to do the decryption. There’s nothing special in it, but it does tell us the IV and the AES mode used.
At this point, we were ready to write a little Python script to do this for us.
Decryption Script
Let’s walk through this step by step:
Bytes in Java are signed, meaning a byte can range from -128 to 127. But in Python bytes are unsigned, meaning they range from 0 to 255. So the first thing we need to do is convert the bytes from signed to unsigned. This is what the byte % 256 is doing.
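A minimal sketch of that conversion on its own, with made-up input values rather than anything from the app:

```python
def to_unsigned_bytes(signed_values):
    """Map Java-style signed bytes (-128..127) onto Python bytes (0..255)."""
    # In Python, % with a positive divisor never returns a negative number,
    # so -1 % 256 is 255 and -128 % 256 is 128.
    return bytes(value % 256 for value in signed_values)


# Hypothetical Java byte values containing a two-byte UTF-8 character.
java_bytes = [123, 34, -61, -87, 34, 125]
print(to_unsigned_bytes(java_bytes).decode("utf-8"))
```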
The keys are base64 encoded strings, so we need to decode them and get the bytes.
This is a fancy way of doing the XOR’ing previously mentioned: each byte of one key is XOR’d with the corresponding byte of the other to get the real decryption key.
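Put together, the decode and combine steps could look something like this sketch. The function name and the placeholder keys are my own for illustration, and the real base64 strings are deliberately not reproduced here.

```python
import base64


def derive_key(java_key_b64, native_key_b64):
    """Base64-decode both keys and XOR them together byte by byte."""
    java_key = base64.b64decode(java_key_b64)
    native_key = base64.b64decode(native_key_b64)
    # zip() stops at the shorter input, so this assumes equal-length keys.
    return bytes(a ^ b for a, b in zip(java_key, native_key))


# Placeholder inputs built from made-up bytes, NOT the app's real keys.
placeholder_java_key = base64.b64encode(bytes(range(32))).decode()
placeholder_native_key = base64.b64encode(bytes([0xFF] * 32)).decode()

real_key = derive_key(placeholder_java_key, placeholder_native_key)
print(len(real_key))
```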
I’m using the cryptography library to do the decryption here.
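For reference, decrypting with the cryptography library looks roughly like the sketch below. The CBC mode here is an assumption for illustration, as are the parameter names; use whatever mode and IV the AES256Cipher class actually specifies.

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def aes_decrypt(ciphertext, key, iv):
    """Decrypt ciphertext with AES and return the still-padded plaintext."""
    # ASSUMPTION: CBC mode. A 32-byte key selects AES-256.
    cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
    decryptor = cipher.decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()
```

The key passed in would be the XOR’d key from the previous step, and the ciphertext would be the unsigned bytes from the word of the day file.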
Then finally, we need to remove the padding that was added to fill the last 16-byte AES block, and then decode the bytes as UTF-8.
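A sketch of that last step, assuming PKCS#7 padding (where every padding byte holds the padding length). That scheme is my assumption here; the sample data is made up.

```python
def strip_pkcs7(padded, block_size=16):
    """Remove PKCS#7 padding from decrypted data."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("data length is not a multiple of the block size")
    pad_length = padded[-1]
    if not 1 <= pad_length <= block_size:
        raise ValueError("invalid padding length")
    if padded[-pad_length:] != bytes([pad_length]) * pad_length:
        raise ValueError("padding bytes are inconsistent")
    return padded[:-pad_length]


# Made-up decrypted output: 17 bytes of JSON plus 15 bytes of padding.
sample = b'{"word": "prize"}' + bytes([15]) * 15
print(strip_pkcs7(sample).decode("utf-8"))
```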
aaaaaaannndddd…
Success! :D
We now have all of the words of the day for the Stagecoach app. It seems like they store the entire year’s worth of words.