Extracting the Stagecoach App's Word of the Day

Posted on 7th March 2024 at 16:22

A dive into Stagecoach's mobile app and extracting encrypted data relating to their ticketing system

Where This Started

I rely on my city’s bus services to get around, and one of the operators is Stagecoach. I noticed something interesting about their ticketing system: when I bought a ticket via their app, it had a QR code on it. Makes sense, you scan that on the bus.

But no, actually: there’s no scanner on the buses here, so you end up just showing the driver your ticket. The ticket has a spinning graphic with a different word on it every day, plus the current time. I can only assume that’s what the driver checks, to make sure it’s not just a screenshot or something.

Example of the ticket view in the Stagecoach app

This made me curious: where were they getting this word from? Was it just a plain array in the app, or was it retrieved from a remote server that requires authentication?

Decompiling the App

This part is pretty easy: plenty of tools exist to do this for you. I chose jadx as it seemed the most popular.

I grabbed the APK from an APK mirroring site and loaded it into jadx.

jadx successfully loading the Stagecoach apk
Success!

But what am I even looking for? I tried searching for today’s word, which is prize, but that yielded nothing. I then remembered that the ticket screen has a paragraph about reporting missing people, so I searched for some text from that and found a resource file defining strings.

res/values/strings.xml
<string name="minutes">minutes</string>
<string name="miss">Miss</string>
<string name="missing_people">Missing People</string>

With this I searched for the name of the string and found AbstractQrActiveTicketView, the view used for showing tickets, which is where the word of the day should be displayed.

First thing I scan through is the imports and class fields, perhaps there’s some class that the words are retrieved from.

A database model called Word catches my eye.

views.home.ticketview.activeticketview.AbstractQrActiveTicketView.java
import com.stagecoach.core.cache.SecureUserInfoManager;
import com.stagecoach.core.model.database.word.Word;
import com.stagecoach.core.model.secureapi.DynamicSettingsResponse;

Jumping into this file reveals that this is what we were looking for: the class fields day, word and colour make sense. Time to check the usages.

model.database.word.Word.java
public class Word {
    private String colour;
    private int day;
    private String word;

One of its usages leads to a class simply named DatabaseProvider, which gives us this critical line of code.

logic.DatabaseProvider.java
new Gson().parse(this.AES.decryptJsonWithKey(new WordOfTheDayFile().getBytesArray(), this.secureUserInfoManager.getGeneratedKey())

So this is where the words come from. Figure out how to decrypt these bytes and we’ve got them.

The Word of the Day File

The WordOfTheDayFile class simply contains a series of static nested classes, each with a single field holding an array of bytes, presumably the encrypted words in JSON format.

logic.WordOfTheDayFile.java
public class WordOfTheDayFile {
    public final byte[] bytes = {...};

    public static class WordOfTheDayFile0 {
        final byte[] bytes = {...};

        WordOfTheDayFile0() {
        }
    }

It also has a method that writes them all into a single byte array.

logic.WordOfTheDayFile.java
public byte[] getBytesArray() throws IOException {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(27000);
    byteArrayOutputStream.write(this.bytes);
    byteArrayOutputStream.write(new WordOfTheDayFile0().bytes);
    byteArrayOutputStream.write(new WordOfTheDayFile1().bytes);
    byteArrayOutputStream.write(new WordOfTheDayFile2().bytes);
    byteArrayOutputStream.write(new WordOfTheDayFile3().bytes);
    byteArrayOutputStream.write(new WordOfTheDayFile4().bytes);
    byteArrayOutputStream.write(new WordOfTheDayFile5().bytes);
    byteArrayOutputStream.write(new WordOfTheDayFile6().bytes);
    byteArrayOutputStream.write(new WordOfTheDayFile7().bytes);
    return byteArrayOutputStream.toByteArray();
}

This seems like a weird way of doing it. I’m not sure whether it’s a decompilation artifact or something else.

The getGeneratedKey Function

Initially the name of this function worried me. There are so many ways this key could be generated that the whole thing might have ended right here. Thankfully, it didn’t.

This function calls a function called unhide on the class HidingUtils. It passes a base64 encoded string to this unhide function.

cache.SecureUserInfoManager.java
public byte[] getGeneratedKey() {
    return new HidingUtils().unhide("<key>").getBytes();
}

This unhide function is actually a native method, meaning it uses the Java Native Interface (JNI) to call into a shared native library, which is loaded further up in the class. The native library is called ticket-ref-code.

com.lagoru.jnirealm.HidingUtils.java
public class HidingUtils {
    private static final String TAG = "HidingUtil";

    static {
        System.loadLibrary("ticket-ref-code");
    }

    static String[] generateKeyXorParts(String str) {...}

    public native String unhide(String str);
}

So, they’re doing this “unhiding” logic within a native library, most likely written in C++. This is probably to make extracting these secrets harder, or at least more time-consuming.

The file directory tree showing the shared library location

We can find this native library in the resources/lib folder. Let’s dig into that.

Decompiling ticket-ref-code.so

I’m not a C++ developer. I’m also not terribly experienced with reverse engineering decompiled C++ code. So I called upon my partner @nosharp, who is more familiar with this, to help me with this part. This wouldn’t have been possible without them.

One thing I do know about C++ decompiling/reverse engineering is that Ghidra exists and that I should use that. After trekking on a journey to download JDK 17 I was ready to get going.

libticket-ref-code.so
void Java_com_lagoru_jnirealm_HidingUtils_unhide(long *param_1,undefined8 param_2,undefined8 param_3)

{
  undefined8 uVar1;
  ulong uVar2;
  byte *pbVar3;
  long in_FS_OFFSET;
  byte local_358 [272];
  byte local_248 [272];
  byte local_138;
  byte abStack_137 [263];
  long local_30;

  pbVar3 = local_358;
  local_30 = *(long *)(in_FS_OFFSET + 0x28);
  uVar1 = (**(code **)(*param_1 + 0x548))(param_1,param_3,0);
  memset(&local_138,0,0x101);
  memset(local_248,0,0x101);
  Base64Decode(uVar1,&local_138,0x100);
  Base64Decode("...",local_248,0x100);
  memset(local_358,0,0x101);
  if (local_138 != 0) {
    uVar2 = 0;
    do {
      local_358[uVar2] = local_138 ^ local_248[uVar2];
      local_138 = abStack_137[uVar2];
      uVar2 = uVar2 + 1;
    } while (local_138 != 0);
    pbVar3 = local_358 + (uVar2 & 0xffffffff);
  }
  *pbVar3 = 0;
  (**(code **)(*param_1 + 0x550))(param_1,param_3,uVar1);
  (**(code **)(*param_1 + 0x538))(param_1,local_358);
}

So, uh, yeah that’s definitely code.

This isn’t actually as bad as it looks, at least that’s what I realised after NoSharp explained it to me.

First off, this wizard knew what a .gdt archive was, or a Ghidra Data Type. What these archives do is help Ghidra figure out the structure of datatypes. For example, if you provided a GDT archive that told Ghidra string types had a function called slice at offset 0x458, it would replace a call such as (*p + 0x458) with (*p->slice()), making the code more readable. Thankfully, someone had already created a GDT archive for us to use with JNI code.

With that archive imported, we did some googling to find out how JNI functions are written in C++ and quickly found that the first argument of a JNI native function is a JNIEnv *, effectively a pointer to a function table. With the GDT archive, updating the first parameter’s type to JNIEnv * should reveal some function calls…

libticket-ref-code.so
void Java_com_lagoru_jnirealm_HidingUtils_unhide(JNIEnv *param_1,undefined8 param_2,jstring param_3)

{
  char *chars;
  ulong uVar1;
  byte *pbVar2;
  long in_FS_OFFSET;
  byte local_358 [272];
  byte local_248 [272];
  byte local_138;
  byte abStack_137 [263];
  long local_30;

  pbVar2 = local_358;
  local_30 = *(long *)(in_FS_OFFSET + 0x28);

  // Before retyping param_1:
  uVar1 = (**(code **)(*param_1 + 0x548))(param_1,param_3,0);

  // After retyping param_1 as JNIEnv *:
  chars = (*(*param_1)->GetStringUTFChars)(param_1,param_3,(jboolean *)0x0);

Awesome! Now we have a slightly better idea of what’s going on. Though it didn’t reveal anything too significant, it helps us parse out what is and isn’t useful.

We then went through and renamed variables to make it easier to track what’s going on.

libticket-ref-code.so
void Java_com_lagoru_jnirealm_HidingUtils_unhide(JNIEnv *param_1,jobject param_2,jstring input)

{
  char *chars;
  ulong counter;
  byte *pbVar1;
  long in_FS_OFFSET;
  byte final_string [272];
  byte key_decoded [272];
  byte input_decoded;
  byte abStack_137 [263];
  long local_30;

  pbVar1 = final_string;
  local_30 = *(long *)(in_FS_OFFSET + 0x28);
  chars = (*(*param_1)->GetStringUTFChars)(param_1,input,(jboolean *)0x0);
  memset(&input_decoded,0,0x101);
  memset(key_decoded,0,0x101);
  Base64Decode(chars,&input_decoded,0x100);
  Base64Decode("...",key_decoded,0x100);
  memset(final_string,0,0x101);
  if (input_decoded != 0) {
    counter = 0;
    do {
      final_string[counter] = input_decoded ^ key_decoded[counter];
      input_decoded = abStack_137[counter];
      counter = counter + 1;
    } while (input_decoded != 0);
    pbVar1 = final_string + (counter & 0xffffffff);
  }
  *pbVar1 = 0;
  (*(*param_1)->ReleaseStringUTFChars)(param_1,input,chars);
  (*(*param_1)->NewStringUTF)(param_1,(char *)final_string);
}

I’m sure most of us can figure out what’s happening here; the only part we really had to take a second to understand was:

libticket-ref-code.so
memset(final_string,0,0x101);
if (input_decoded != 0) {
  counter = 0;
  do {
    final_string[counter] = input_decoded ^ key_decoded[counter];
    input_decoded = abStack_137[counter];
    counter = counter + 1;
  } while (input_decoded != 0);
  pbVar1 = final_string + (counter & 0xffffffff);
}

The ^ is the XOR operator in C++. All this code is doing is XOR’ing each byte of the base64-decoded input string with the corresponding byte of the base64-decoded key embedded in the library, stopping at the first zero byte of the input. That means that to get our decryption key we just need to take the string passed in by the getGeneratedKey function and the key found inside this native library, base64-decode both, and XOR them byte by byte.
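The loop above can be mirrored in a few lines of Python. This is only a sketch of the decompiled logic, not the library’s actual source: it assumes the third argument to Base64Decode is a maximum output length, and the function name, parameter names and sample values are made up for illustration (neither value is a real key from the app).

```python
import base64

MAX_LEN = 0x100  # assumed meaning of the 0x100 passed to Base64Decode


def unhide(input_b64: str, embedded_b64: str) -> bytes:
    """XOR the decoded input with the decoded embedded key, byte by byte."""
    data = base64.b64decode(input_b64)[:MAX_LEN]
    key = base64.b64decode(embedded_b64)[:MAX_LEN]
    out = bytearray()
    for i, byte in enumerate(data):
        if byte == 0:
            # The native loop stops at the first zero byte of the decoded input.
            break
        # The key buffer is zero-filled, so bytes past the key's end act as 0.
        key_byte = key[i] if i < len(key) else 0
        out.append(byte ^ key_byte)
    # NewStringUTF reads a C string, so a zero byte in the result would end it.
    return bytes(out).split(b"\x00", 1)[0]


# Made-up values for illustration only.
embedded = base64.b64encode(b"\x10\x20\x30\x40").decode()
hidden = base64.b64encode(bytes([0x71, 0x42, 0x53, 0x24])).decode()
print(unhide(hidden, embedded))  # b'abcd'
```

Because XOR is its own inverse, the same routine both hides and unhides a value, which is why knowing the embedded key is enough to recover the real one.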

There’s also the AES256Cipher class that’s used to do the decryption. There’s nothing special in here, other than it telling us the IV used (all zeros) and the AES mode (CBC with PKCS5 padding).

utils.AES256Cipher.java
public class AES256Cipher {
    private static final String AES_MODE = "AES/CBC/PKCS5PADDING";
    private static final String TAG = "com.stagecoach.core.utils.AES256Cipher";
    private static final byte[] ivBytes = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

At this point, we were ready to write a little Python script to do this for us.

Decryption Script

decryption.py
import base64
from Crypto.Util.Padding import unpad
from cryptography.hazmat.primitives.ciphers import Cipher
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers.algorithms import AES
from cryptography.hazmat.primitives.ciphers.modes import CBC

encrypted_bytes_list = [...]
encrypted_bytes = bytes([byte % 256 for byte in encrypted_bytes_list])

iv = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

private_key = base64.b64decode('<key-from-ticket-ref-lib>')
other_key = base64.b64decode('<key-from-get-generated-key-function>')

key = bytes(a ^ b for a, b in zip(private_key, other_key))

cipher = Cipher(AES(key), mode=CBC(iv), backend=default_backend())
result = cipher.decryptor().update(encrypted_bytes)

with open('output', 'w') as out:
    out.write(unpad(result, 16).decode())

Let’s walk through this step by step:

encrypted_bytes_list = [...]
encrypted_bytes = bytes([byte % 256 for byte in encrypted_bytes_list])

Bytes in Java are signed, meaning a byte can range from -128 to 127. But in Python, bytes are unsigned, meaning they range from 0 to 255. So the first thing we need to do is convert the bytes from signed to unsigned. This is what the byte % 256 is doing.
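To make the conversion concrete, here is a tiny standalone check using a handful of sample signed values (picked for illustration, not taken from the app). It also shows that masking with 0xFF is an equivalent way to write the same thing.

```python
java_bytes = [-128, -1, 0, 1, 127]  # sample signed values, as jadx prints them

unsigned = [b % 256 for b in java_bytes]
masked = [b & 0xFF for b in java_bytes]  # equivalent bit-mask form

print(unsigned)               # [128, 255, 0, 1, 127]
print(unsigned == masked)     # True
print(bytes(unsigned).hex())  # 80ff00017f
```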

private_key = base64.b64decode('<key-from-ticket-ref-lib>')
other_key = base64.b64decode('<key-from-get-generated-key-function>')

The keys are base64 encoded strings, so we need to decode them and get the bytes.

key = bytes(a ^ b for a, b in zip(private_key, other_key))

This is a compact way of doing the XOR’ing mentioned previously: each byte of one key is XOR’d with the corresponding byte of the other to get the real decryption key.

cipher = Cipher(AES(key), mode=CBC(iv), backend=default_backend())
result = cipher.decryptor().update(encrypted_bytes)

I’m using the cryptography library to do the decryption here.

with open('output', 'w') as out:
    out.write(unpad(result, 16).decode())

Then finally, we need to remove the padding that fills out the last 16-byte AES block and decode the bytes as UTF-8. The unpad helper comes from PyCryptodome rather than from the cryptography library.
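For the curious, the padding scheme is simple enough to undo by hand. Java’s PKCS5PADDING with AES is effectively PKCS#7 with a 16-byte block: the value of the last byte says how many padding bytes were appended. The following is a minimal standard-library sketch of that idea, with a made-up function name and sample message; it is not how the script or the unpad helper is actually implemented.

```python
def pkcs7_unpad(data: bytes, block_size: int = 16) -> bytes:
    """Strip PKCS#7 padding: the last byte says how many padding bytes to drop."""
    if not data or len(data) % block_size != 0:
        raise ValueError("data length must be a non-zero multiple of the block size")
    pad_len = data[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding length")
    if data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("padding bytes are inconsistent")
    return data[:-pad_len]


# A made-up 13-byte message padded to one 16-byte block with three 0x03 bytes.
padded = b'{"word": "x"}' + b"\x03" * 3
print(pkcs7_unpad(padded))  # b'{"word": "x"}'
```

A message that already fills its last block gets a whole extra block of padding, so there is always at least one padding byte to remove.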

aaaaaaannndddd…

Successfully decrypted JSON words data

Success! :D

We now have all of the words of the day for the Stagecoach app. It seems they store an entire year’s worth of words.
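With the data decrypted, looking up a given day’s word is straightforward. This is a rough sketch that assumes the output is a JSON array of objects with day, word and colour keys, matching the fields of the Word model; the exact layout, the colour format and the sample entries below are assumptions made up for illustration, not real data from the app.

```python
import json

# Made-up sample in the assumed shape; these are not real entries from the app.
sample = (
    '[{"day": 1, "word": "example", "colour": "#FF0000"},'
    ' {"day": 2, "word": "sample", "colour": "#00FF00"}]'
)

# Index the entries by their day number for quick lookups.
words_by_day = {entry["day"]: entry for entry in json.loads(sample)}


def word_for(day: int):
    """Return (word, colour) for a day number, or None if it is not listed."""
    entry = words_by_day.get(day)
    return (entry["word"], entry["colour"]) if entry else None


print(word_for(2))  # ('sample', '#00FF00')
```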