ELF function override

While reading up on the ELF spec (I have a fun life) I found myself wondering whether it was possible to override dynamically loaded library functions in some strange ways. For those wondering, yes I do know the LD_PRELOAD trick, and I do realise what I talk about here has no real use case. It's just fun.

To recap some of the characteristics of dynamic linking:

Direct Patch of Function Table

This first attempt was based on noticing how the program achieves the lazy loading of function addresses. When the linkage stub is called, it acts as if the address of the library function is already correct in the .got.plt. It reads this value and unconditionally jumps to it. In order to allow the linker to lazily fill the table in, each entry is initialised pointing at the code that calls the dynamic linker to set up that particular function table entry.

With this information, we can see that if we patch the binary after compilation so that the function table entry initially points to our alternate function, the dynamic linker will never be called to modify that entry and our function will be called instead. To do this, we need the position of our alternate function in memory and the position of the relevant function table entry in the file.

The position of the alternate function may be found after compilation with

      
vshcmd: > readelf -s testprog | grep altstrcmp
    48: 0000000000601038     8 OBJECT  GLOBAL DEFAULT   24 altstrcmp
fake_plt [22:19:05] $ 
      

Similarly, the position of the function table entry in the file may be found by finding the position of the dynamic relocations, and subtracting the difference between the memory address and file offset of the .got.plt section.

      
vshcmd: > readelf -r testprog

Relocation section '.rela.dyn' at offset 0x3a0 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600ff0  000200000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

Relocation section '.rela.plt' at offset 0x3d0 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0
000000601020  000300000007 R_X86_64_JUMP_SLO 0000000000000000 strcmp@GLIBC_2.2.5 + 0
fake_plt [22:16:47] $
vshcmd: > readelf --sections testprog | grep -A 1 .got.plt
  [23] .got.plt          PROGBITS         0000000000601000  00001000
       0000000000000028  0000000000000008  WA       0     0     8
elf [18:19:50] $
vshcmd: > offset=$(python -c 'print(0x601020 - (0x601000 - 0x1000))')
elf [18:21:42] $ 
      

Using this information, we can patch the binary with

      
vshcmd: > python -c 'import sys; sys.stdout.buffer.write(b"\x38\x10\x60")'  | dd of=testprog bs=1 seek=$offset conv=notrunc
3+0 records in
3+0 records out
3 bytes copied, 0.0732023 s, 0.0 kB/s
elf [22:19:10] $ 
      

and it will use altstrcmp instead of strcmp.
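
As a rough sketch of the arithmetic behind those two steps, the file offset and the bytes handed to dd can be computed as below. The helper names are made up for illustration and are not part of the test program; the numbers in the note afterwards are the ones from the readelf output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a virtual address inside a section to its
 * offset in the file, given the section's address and file offset as
 * printed by `readelf --sections`. */
static uint64_t vaddr_to_file_offset(uint64_t vaddr,
                                     uint64_t sec_addr, uint64_t sec_offset)
{
    assert(vaddr >= sec_addr);
    return vaddr - sec_addr + sec_offset;
}

/* Hypothetical helper: store `value` as `n` little-endian bytes, the byte
 * order x86-64 uses for the 8-byte entries of .got.plt. */
static void encode_le(uint64_t value, uint8_t *out, size_t n)
{
    assert(n <= 8);
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)(value >> (8 * i));
    }
}
```

With the values above, vaddr_to_file_offset(0x601020, 0x601000, 0x1000) gives 0x1020, and encoding 0x601038 starts with the bytes 38 10 60 followed by zeros. Writing only the first three bytes with dd should be enough here because the upper bytes of the existing entry are already zero for addresses this small.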

That worked nicely, but the code had no control over whether it was using the default or my alternate function. I wanted more control over when my function was used, to allow the user to tell the program to switch between functions.

This would allow hypothetical use cases such as the user turning on debug mode by sending a SIGUSR1 or some message over a communication channel.

Patch a Data Variable

For my second attempt, I made a global variable in the code to hold the position in program memory of the .got.plt entry for the interesting function.

      
// Need to initialise this so it's defined in .data instead of .bss.
// That means we can modify it in the file with `dd`.
int (**strcmpgot) (const char *, const char *) = (int (**) (const char *, const char *))100; 
      ...
      
if (check_password(argv[1])) {
    puts("Congratulations!!!");
} else {
    int (*origstrcmp) (const char *, const char *) = *strcmpgot;
    *strcmpgot = mystrcmp;
    if (check_password(argv[1])) {
        puts("So close ... !!");
    } else {
        puts("Sorry, that's the wrong password");
    }
    *strcmpgot = origstrcmp;
    // Double check we've reset it.
    assert(strcmp("hello", "hello") == 0);
}
      
I then found the position of that initialised variable in the program file by getting the position in memory with readelf -s testprog | grep strcmpgot and adjusting by the difference between the file offset and memory address of the .data section. With this address, I patched the value of that variable in the compiled binary with the position of the relevant function table entry in memory.
      
vshcmd: > python -c 'import sys; sys.stdout.buffer.write(b"\x20\x10\x60")'  | dd of=testprog bs=1 seek=$offset conv=notrunc
3+0 records in
3+0 records out
3 bytes copied, 0.0466237 s, 0.1 kB/s
elf [18:30:05] $
vshcmd: > # Check the patched binary behaves as expected
vshcmd: > ./testprog 'etmrhdr '
Congratulations!!!
elf [18:30:06] $
vshcmd: > ./testprog 'funsies!'
So close ... !!
elf [18:30:07] $
vshcmd: > ./testprog 'hello'
Sorry, that's the wrong password
elf [18:30:08] $ 
      

That was much better, with the code having full control over when the alternate function was used, but it still required patching the program after it had been compiled.
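
The save, replace, call, restore pattern in that snippet does not depend on the slot being a real .got.plt entry. Here is a minimal sketch of the same pattern using an ordinary function pointer variable as a stand-in for the slot. The names slot, first_char_cmp and compare_with_override are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const char *, const char *);

/* Stand-in for a .got.plt slot: an ordinary function pointer variable. */
static cmp_fn slot = strcmp;

/* Hypothetical alternate comparison: only the first character matters. */
static int first_char_cmp(const char *a, const char *b)
{
    return (unsigned char)a[0] - (unsigned char)b[0];
}

/* Call through the slot with a temporary replacement, then restore it. */
static int compare_with_override(cmp_fn *slotp, cmp_fn replacement,
                                 const char *a, const char *b)
{
    assert(slotp != NULL && replacement != NULL);
    cmp_fn saved = *slotp;
    *slotp = replacement;
    int result = (*slotp)(a, b);
    *slotp = saved;
    return result;
}
```

Calling compare_with_override(&slot, first_char_cmp, "hello", "help") reports a match while the override is active, and afterwards slot points at strcmp again. In the patched program the only difference is that the slot is the real function table entry, so every call to strcmp in between sees the replacement.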

Without Patching the Binary

All we need to know in the code is where in memory the strcmp function table entry is. We know this is encoded in the linker stub that our main code uses, because that's where the linker stub reads its destination address from. If we can find the linker stub in the program, then we should be able to decode it and find where we need to read from ourselves.

It turns out that finding the position of the relevant linker stub is pretty easy. Recalling that all of our main code uses that linker stub instead of the actual function we want, we can simply ask for the address with &strcmp in our main program.

      uint8_t *pltaddr = (uint8_t *)&strcmp;
      

Finding the position of the function table entry from the instructions we see there is a little more tricky. To do this, we disassemble the linker stub and look at the instruction encodings it uses.

      
vshcmd: > objdump -d testprog -j .plt

testprog:     file format elf64-x86-64


Disassembly of section .plt:

0000000000400460 <.plt>:
  400460:   ff 35 a2 0b 20 00       pushq  0x200ba2(%rip)        # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
  400466:   ff 25 a4 0b 20 00       jmpq   *0x200ba4(%rip)        # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
  40046c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000400470 <puts@plt>:
  400470:   ff 25 a2 0b 20 00       jmpq   *0x200ba2(%rip)        # 601018 <puts@GLIBC_2.2.5>
  400476:   68 00 00 00 00          pushq  $0x0
  40047b:   e9 e0 ff ff ff          jmpq   400460 <.plt>

0000000000400480 <__assert_fail@plt>:
  400480:   ff 25 9a 0b 20 00       jmpq   *0x200b9a(%rip)        # 601020 <__assert_fail@GLIBC_2.2.5>
  400486:   68 01 00 00 00          pushq  $0x1
  40048b:   e9 d0 ff ff ff          jmpq   400460 <.plt>

0000000000400490 <strcmp@plt>:
  400490:   ff 25 92 0b 20 00       jmpq   *0x200b92(%rip)        # 601028 <strcmp@GLIBC_2.2.5>
  400496:   68 02 00 00 00          pushq  $0x2
  40049b:   e9 c0 ff ff ff          jmpq   400460 <.plt>
playing_with_elf [11:33:08] $
      

Reading the opcodes and comparing them to the instruction specification, we can see this uses the absolute indirect form of the jmp instruction, with the location of the destination pointer given relative to the instruction pointer. There's more detail in the test program comments, but essentially this means we need to read the 32-bit offset that starts two bytes into the &strcmp stub, and add &strcmp + 6 to it (the address of the next instruction).

      
typedef int (**strcmpptr) (const char *, const char *);
strcmpptr getgot(void)
{
    uint8_t *pltaddr = (uint8_t *)&strcmp;
    assert(pltaddr[0] == 0xff);
    assert(pltaddr[1] == 0x25);
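    // The displacement is a signed 32-bit value relative to the end of the
    // six-byte instruction. memcpy (from <string.h>) avoids an unaligned read.
    int32_t disp;
    memcpy(&disp, pltaddr + 2, sizeof disp);
    uintptr_t offset = (uintptr_t)(pltaddr + 6) + disp;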
    return (strcmpptr)offset;
}
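
The same arithmetic can be checked offline against the objdump listing above. This is only a sketch: the helper name jmp_slot_address is made up, and it assumes a little-endian host and the ff 25 encoding seen in the stubs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: given the six bytes of a `jmp *disp32(%rip)`
 * instruction (opcode ff 25) and the address it is loaded at, return the
 * address of the memory slot that the jump reads its target from. */
static uint64_t jmp_slot_address(const uint8_t insn[6], uint64_t insn_addr)
{
    assert(insn[0] == 0xff && insn[1] == 0x25);
    int32_t disp;                         /* the displacement is signed */
    memcpy(&disp, insn + 2, sizeof disp); /* assumes a little-endian host */
    /* RIP-relative operands are relative to the next instruction. */
    return insn_addr + 6 + (uint64_t)(int64_t)disp;
}
```

Feeding it the strcmp@plt bytes ff 25 92 0b 20 00 at address 0x400490 gives 0x400496 + 0x200b92 = 0x601028, which matches the address objdump prints in its comment for that stub.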
      

After compilation and test, we see we now have a working example that doesn't require patching the binary! It is still very brittle, though, as it relies on decoding an instruction (which is clearly processor specific). Can we do anything about that?

Reading the In-Memory PHDR

If we want to find the address of the strcmp function pointer entry without relying on architecture specifics, then we're going to have to use the ELF data structures that were made specifically for that reason. We could do this by reading our own file, but in the ELF man page it mentions the PT_PHDR header that we can use instead.

This header tells the loader where to put the program header in memory, and with that information in memory, we have easy access for reading and parsing.

      
vshcmd: > readelf -d -l testprog

Elf file type is EXEC (Executable file)
Entry point 0x400570
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    0x8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000120c 0x000000000000120c  R E    0x200000
  LOAD           0x0000000000001e08 0x0000000000601e08 0x0000000000601e08
                 0x0000000000000248 0x0000000000000268  RW     0x200000
  DYNAMIC        0x0000000000001e20 0x0000000000601e20 0x0000000000601e20
                 0x00000000000001d0 0x00000000000001d0  RW     0x8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000001018 0x0000000000401018 0x0000000000401018
                 0x000000000000005c 0x000000000000005c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000001e08 0x0000000000601e08 0x0000000000601e08
                 0x00000000000001f8 0x00000000000001f8  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .init_array .fini_array .jcr .dynamic .got

Dynamic section at offset 0x1e20 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x4004f8
 0x000000000000000d (FINI)               0x400d84
 0x0000000000000019 (INIT_ARRAY)         0x601e08
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x601e10
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400298
 0x0000000000000005 (STRTAB)             0x400398
 0x0000000000000006 (SYMTAB)             0x4002c0
 0x000000000000000a (STRSZ)              104 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x602000
 0x0000000000000002 (PLTRELSZ)           120 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400480
 0x0000000000000007 (RELA)               0x400438
 0x0000000000000008 (RELASZ)             72 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x400418
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x400400
 0x0000000000000000 (NULL)               0x0
elf_override [15:46:10] $
      

We read this header to find where the _DYNAMIC array is in memory, and it is this array that contains all the information we need to find the position in memory of the strcmp pointer.

Each element in the _DYNAMIC array contains a tag that tells us what information this element contains (shown without their DT_ prefixes above). The JMPREL tag shows us where the relocations can be found, and we can associate each relocation with a symbol name using information in the STRTAB and SYMTAB elements.
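
Looking up a tag is a linear scan that stops at the DT_NULL terminator. A minimal sketch, assuming the Linux <elf.h> definitions and a made-up helper name find_dyn:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: scan a DT_NULL-terminated dynamic array for `tag`
 * and return the matching element, or NULL if the tag is absent. */
static const Elf64_Dyn *find_dyn(const Elf64_Dyn *dyn, Elf64_Sxword tag)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == tag) {
            return dyn;
        }
    }
    return NULL;
}
```

For the binary above, find_dyn(_DYNAMIC, DT_JMPREL)->d_un.d_ptr would be expected to hold 0x400480, and dividing the DT_PLTRELSZ value of 120 by sizeof(Elf64_Rela), which is 24, gives the five relocations to walk.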

       
uint16_t find_sym_index(const char * const target,
                      const Elf64_Sym * const symtab, const size_t num_symbols,
                      const char * const strtab, const size_t strsz)
{
    const Elf64_Sym *cursym;
    for (cursym = symtab;
         cursym < symtab + num_symbols;
         cursym += 1) {
        assert(cursym->st_name < strsz);
        if (strcmp(strtab + cursym->st_name, target) == 0) {
            break;
        }
    }
    // Not found == num_symbols, found == index
    return cursym - symtab;
}

strcmpptr find_rela_addr(const uint16_t sym_index,
        const Elf64_Rela * const rela, const size_t relasz)
{
    for (const Elf64_Rela * currel = rela;
         (const char *)currel < (const char *)rela + relasz;
         currel += 1) {
        if (ELF64_R_SYM(currel->r_info) == sym_index) {
            // XXX If this addend isn't 0 then the .got.plt is structured in a
            // way I don't understand, fail and alert me so I can investigate.
            assert(currel->r_addend == 0);
            return (strcmpptr) currel->r_offset;
        }
    }
    return NULL;
}
      

The only piece of information we still need is where in memory the PHDR program header is located. From observation it appears to always start at 0x400040 (the 0x400000 address the first LOAD segment is mapped at, plus the offset of 64 that readelf reports for the program headers), and this value works when hard-coded into the above program, but I haven't found that specified anywhere. I believe it's possible to specify that value with a linker script by using the PHDRS command, but according to the documentation for that command, once you specify one program header you have to specify them all. That would lose a lot of flexibility, and I refuse to resort to reading the file, so we'll try another tack...

When working on a GNU system, we have some extra niceties that can come in handy sometimes. One of these is the dl_iterate_phdr() function. This iterates over each of the program's currently loaded shared objects and calls a user-specified callback with a structure specifying, among other things, the position of the program headers in memory.

Using this function, and checking for the object that represents "us" by comparing info->dlpi_name to "", we finally have all the information we need for a processor-independent way to overwrite our dynamic library functions.

      
struct callback_data {
    uintptr_t addr;
    void *phdr;
};

int get_object_phdr(struct dl_phdr_info *info, size_t size, void *data)
{
    struct callback_data *c = (struct callback_data *)data;
    if (info->dlpi_name[0] == '\0') {
        c->phdr = info->dlpi_phdr;
        c->addr = info->dlpi_addr;
        return 1;
    }
    return 0;
}

  ...

struct callback_data cbdata;
dl_iterate_phdr(get_object_phdr, &cbdata);
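
For completeness, here is a self-contained sketch that goes one step further than the callback above and also picks out the PT_DYNAMIC entry. It assumes glibc, where the main program is reported with an empty dlpi_name. The struct main_dynamic type and the function names are invented for this illustration and are not the test program's code.

```c
#define _GNU_SOURCE /* for dl_iterate_phdr() on glibc */
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical result type: where the main program's dynamic array lives. */
struct main_dynamic {
    const ElfW(Dyn) *dynamic; /* NULL if no PT_DYNAMIC was seen */
    ElfW(Addr) base;          /* base address of the main program */
    int found;                /* set once the main program was visited */
};

static int find_main_dynamic_cb(struct dl_phdr_info *info, size_t size,
                                void *data)
{
    struct main_dynamic *out = data;
    (void)size;
    if (info->dlpi_name[0] != '\0') {
        return 0; /* not the main program: keep iterating */
    }
    out->found = 1;
    out->base = info->dlpi_addr;
    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC) {
            /* p_vaddr is relative to the object's base address. */
            out->dynamic = (const ElfW(Dyn) *)
                (info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
        }
    }
    return 1; /* a non-zero return stops the iteration */
}

static struct main_dynamic find_main_dynamic(void)
{
    struct main_dynamic out = {0};
    dl_iterate_phdr(find_main_dynamic_cb, &out);
    return out;
}
```

Calling find_main_dynamic() from main should report found as 1, and in a dynamically linked program the dynamic pointer can then be walked for the JMPREL, SYMTAB and STRTAB tags as described earlier. Adding dlpi_addr to p_vaddr makes no difference when the base address is 0, as it is for the EXEC-type binary above, but it matters for anything loaded at a non-zero base.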
      

... thinking on that dl_iterate_phdr() function ... it iterates over all shared objects ... can we override a function that one of our libraries uses?

Other Object Files

Seeing as dl_iterate_phdr() iterates over all loaded shared objects, I thought it might be possible to read the dynamic sections of other libraries. This would open up some actual applications, like a temporary LD_PRELOAD to replace commands with introspective counterparts when calling specific library functions. This appears to work, though there are a few places that my code doesn't match my interpretation of the manual, so I'm not about to use this in production :-)

There are a bunch of minor adjustments to be made to the program in the previous section, which amount to accounting for offsets from the base address of the dynamic library. I also had to remove the addition of a base offset to the pointers in the Elf64_Dyn structures. On reading the manual it appears that this addition should be there, but the manual refers to the format of the file, not the format of the program header in memory. From this I believe that the base offset always being 0 in my previous program was hiding a bug.

If you want to know the full extent of the differences you can diff the test programs from this section and the previous one. Suffice to say that with a little tweaking of the previous program, we get to temporarily overwrite what external functions a dynamically loaded library calls.

I think that's pretty cool.