FASM CON 2009, Myjava, Slovak republic
Turning off hypervisor and resuming OS in 100 instructions
by Feryno, Czechoslovakia

hypervisor (ring -1) and OS (ring0 + ring3) are running correctly (Intel IA-32e mode)
hypervisor uses its own private virtual memory translation tables (private CR3, not shared with OS)
how to turn off the hypervisor and resume the OS?

ring0 may initiate the shutdown using the VMCALL instruction (3-byte instruction) (ring0 privileged instruction)
ring3 may initiate the shutdown using the CPUID instruction (2-byte instruction)
both instructions cause an unconditional VM exit = transfer from ring0 or ring3 into ring -1

ring0_initialization:
    mov rax,shutdown_magic_number
    vmcall
    jbe failure
    call cleanup

; ring -1 part:
vm_exit_handler:
    push rax
    mov eax,4402h           ; VM-exit reason encoding
    vmread rax,rax          ; read VMCS field
    cmp ax,18               ; VMCALL instruction caused the VM exit
    pop rax
    jz vm_exit_handler_18

vm_exit_handler_18:
    cmp rax,shutdown_magic_number
    jz hypervisor_shutdown
    ...
vm_exit_handler_18_bad_request:
    push rax rcx rdx
    mov ecx,6820h           ; guest RFLAGS encoding
    vmread rax,rcx          ; read guest RFLAGS into RAX
    ; VMfailValid: CF=0, PF=0, AF=0, ZF=1, SF=0, OF=0
    and eax,not ( (1 shl 0) + (1 shl 2) + (1 shl 4) + (1 shl 7) + (1 shl 11) )
    or al,1 shl 6           ; rflags.ZF=1 (bit 6 of rflags)
    vmwrite rcx,rax         ; write guest RFLAGS into VMCS field
    mov eax,440Ch           ; VM-exit instruction length encoding
    vmread rcx,rax          ; instruction length, rcx=3 for the VMCALL instruction
    mov eax,681Eh           ; guest RIP encoding
    vmread rdx,rax          ; read guest RIP
    add rcx,rdx             ; point guest RIP to the instruction after VMCALL
    vmwrite rax,rcx         ; write guest RIP into VMCS field
    pop rdx rcx rax
    vmresume

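The flag and RIP arithmetic of vm_exit_handler_18_bad_request can be checked outside the hypervisor. The following is a minimal user-space sketch in Python, not part of the original slides, that models only the bit operations. The function names are made up for illustration.

```python
# User-space model of the RFLAGS/RIP arithmetic; all names are illustrative.
CF, PF, AF, ZF, SF, OF = 0, 2, 4, 6, 7, 11   # architectural RFLAGS bit positions

def signal_vmfail_valid(rflags):
    """Clear CF, PF, AF, SF, OF and set ZF, like the and/or pair on the slide."""
    cleared = (1 << CF) | (1 << PF) | (1 << AF) | (1 << SF) | (1 << OF)
    # The slide uses a 32-bit AND on EAX, which also zeroes bits 63:32 of RAX.
    return ((rflags & ~cleared) | (1 << ZF)) & 0xFFFFFFFF

def jbe_taken(rflags):
    """JBE (used by the guest after VMCALL) jumps when CF=1 or ZF=1."""
    return bool(rflags & ((1 << CF) | (1 << ZF)))

def skip_instruction(guest_rip, instruction_length):
    """Advance guest RIP past the instruction that caused the VM exit."""
    return (guest_rip + instruction_length) & 0xFFFFFFFFFFFFFFFF
```

With ZF set, the guest's `jbe failure` after VMCALL is taken, which is how a bad request is reported back to ring0.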
hypervisor_shutdown:
• prologue
• read necessary information using VMREAD instructions
• execute the VMXOFF instruction
• restore necessary registers
• epilogue and allow OS to run

; data and structure used by shutdown
; the count of words in data and qwords in structure is the same,
; the n-th word in data is the VMCS field encoding of the n-th qword in structure

; data:
VMCS_fields_encodings:
    dw 0800h
    dw 0802h
    dw 0804h
    dw 0806h
    dw 0808h
    dw 080Ah
    dw 080Ch
    dw 080Eh
    ...
    dw 6826h
NUMBER_OF_VMCS_FIELDS = ($ - VMCS_fields_encodings) / 2
; there are about 20-30 words of data

; structure:
struc VMCS_FIELDS {
    .guest_ES_selector        dq ?
    .guest_CS_selector        dq ?
    .guest_SS_selector        dq ?
    .guest_DS_selector        dq ?
    .guest_FS_selector        dq ?
    .guest_GS_selector        dq ?
    .guest_LDTR_selector      dq ?
    .guest_TR_selector        dq ?
    ...
    .guest_IA32_SYSENTER_EIP  dq ?
}
; there are about 20-30 qwords in structure

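The pairing rule (n-th word of data corresponds to n-th qword of structure) can be illustrated with two parallel lists. This is a small Python sketch, not the author's code. It contains only the eight entries visible on the slide, the elided ones are left out, and the helper names are hypothetical.

```python
# Only the entries shown on the slide; the elided part of the table is omitted.
ENCODINGS = [0x0800, 0x0802, 0x0804, 0x0806, 0x0808, 0x080A, 0x080C, 0x080E]
FIELDS = [
    "guest_ES_selector", "guest_CS_selector", "guest_SS_selector",
    "guest_DS_selector", "guest_FS_selector", "guest_GS_selector",
    "guest_LDTR_selector", "guest_TR_selector",
]
# Both tables must have the same number of entries.
assert len(ENCODINGS) == len(FIELDS)

def field_offset(name):
    """Byte offset of a field inside the structure (each field is one qword)."""
    return FIELDS.index(name) * 8

def encoding_of(name):
    """VMCS field encoding stored at the same index in the data table."""
    return ENCODINGS[FIELDS.index(name)]
```

Keeping the two tables index-aligned is what lets a single counter address both the encoding to read and the slot to store it in.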
; prologue
    push rax rcx rdx rbx rbp
; we need some stack frame
c = NUMBER_OF_VMCS_FIELDS * 8   ; stack frame for reading necessary VMCS fields
b = 16                          ; stack frame for IDT
a = 16                          ; stack frame for GDT
    sub rsp,a+b+c

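The resulting stack layout, as offsets from RSP after the `sub rsp,a+b+c`, can be sketched as follows. This Python model is an illustration only. The field count is a parameter because the slides give it only as "about 20-30", and the dictionary keys are hypothetical names.

```python
def frame_layout(number_of_vmcs_fields):
    """Offsets from RSP after the prologue; key names are illustrative."""
    a = 16                           # slot for GDT limit and base
    b = 16                           # slot for IDT limit and base
    c = number_of_vmcs_fields * 8    # one qword per VMCS field
    top = a + b + c                  # the five pushed registers start here
    return {
        "gdt_slot": 0,
        "idt_slot": a,
        "vmcs_fields": a + b,
        # push rax rcx rdx rbx rbp: rbp was pushed last, so it is lowest
        "saved_rbp": top,
        "saved_rbx": top + 8,
        "saved_rdx": top + 16,
        "saved_rcx": top + 24,
        "saved_rax": top + 32,
    }
```

For example, with an assumed count of 24 fields the VMCS area starts at offset 32 and the saved RBP sits at offset 224.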
; read VMCS fields into the stack frame
virtual at rsp + a + b
    sfsh VMCS_FIELDS
end virtual
    lea rdx,[VMCS_fields_encodings]
    mov ecx,NUMBER_OF_VMCS_FIELDS - 1   ; same (case-sensitive) constant as defined with the data
sd_read_all_fields:
    movzx eax,word [rdx + rcx*2]        ; n-th field encoding
    vmread qword [sfsh + rcx*8],rax     ; store into the n-th qword of the structure
    dec ecx
    jns sd_read_all_fields

; execute the VMXOFF instruction
    vmxoff
; Now we can't use VMxxx instructions anymore.
; This is the reason why we have already read
; everything necessary using vmread instructions.

loading OS virtual memory translation tables
• disabling long mode and paging as well (requires an identity-mapped memory page, which has the same physical and virtual address, necessary at the moment of disabling paging when virtual memory disappears), then restoring the CR3 of the OS, then enabling paging and long mode (hard to do if CR3 is 0000000100000000h or even higher)
• doing it on the fly using the Global pages feature
• (the same principle is used during task switching in a multitasking OS, processes have different CR3)

; loading OS paging tables using Global pages
; We are going to change CR3. We use the TLB (translation lookaside buffer)
; to have a valid translation of virtual into physical memory.
; Make all pages (translation tables, code, data, stack) of the hypervisor
; that has just been shut down global. We are going to execute MOV CR3,new_cr3
; and the global pages stay in the TLB, so we will be able to continue.
; The hypervisor also had the physical pages holding its translation tables
; mapped into its virtual memory to make them easily accessible.
    mov rax,cr4
    or al,1 shl 7           ; Page Global Enable, bit 7
    mov cr4,rax

host_virtual_address = 0FFFF800000000000h
number_of_PT_entries = 512
; (all PT entries with the above settings fit into
; 1 aligned physical memory page of 4 kB)
    lea rdx,[host_PT_tables]
    mov ecx,number_of_PT_entries - 1
make_global_pages:
    mov eax,[rdx+rcx*8]
    or ah,1 shl (8-8)       ; PTE.G (global), bit 8 of the PTE
    movnti [rdx+rcx*8],eax
    dec ecx
    jns make_global_pages

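The `or ah,1 shl (8-8)` trick sets bit 8 of EAX because AH holds bits 15:8. A short Python sketch, written for illustration with hypothetical helper names, confirms that this equals setting the G bit of a page table entry and shows how much virtual memory 512 entries cover.

```python
PTE_G = 1 << 8          # Global bit of a page table entry
PAGE_SIZE = 0x1000      # 4 kB pages

def or_ah(eax, imm8):
    """Model of `or ah,imm8`: AH is bits 15:8 of EAX."""
    return eax | ((imm8 & 0xFF) << 8)

def make_global(page_table):
    """Return a copy of the table with the G bit set in every entry."""
    return [pte | PTE_G for pte in page_table]

def covered_bytes(number_of_pt_entries):
    """Size of the virtual range mapped by that many 4 kB entries."""
    return number_of_pt_entries * PAGE_SIZE
```

One full page table of 512 entries maps 2 MB, so all hypervisor pages marked global here lie in a single 2 MB window starting at host_virtual_address.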
; Invalidate the TLB by copying CR3 into itself:
    mov rcx,cr3
    mov cr3,rcx
; The TLB is now empty. The first instruction accessing
; the code in a global page will put its virtual memory
; translation into the TLB. The first instruction accessing
; the stack page, which is global too, will fill the TLB with
; the translation of that 1 stack page. If the code
; of the hypervisor shutdown procedure fits into 1 global
; page and the stack into 1 global page, we may continue.
; If they span more pages, we must access all these
; pages (read from the stack page, execute an instruction in
; the code page) to load them into the TLB before continuing.

; control registers
; note the first instruction forces the 1 global page
; holding code and the 1 global page of stack
; (sfsh is a structure in the stack) to be loaded into the TLB
    mov rax,[sfsh.guest_CR4]
    mov rcx,[sfsh.guest_CR3]
    mov rdx,[sfsh.guest_CR0]
    or al,(1 shl 7) + (1 shl 5)         ; CR4.PGE, PAE
    or edx,(1 shl 31) + (1 shl 0)       ; CR0.PG, PE
    mov cr4,rax
    mov cr3,rcx
    mov cr0,rdx

; descriptor tables
    mov ax,word [sfsh.guest_GDTR_limit]
    mov cx,word [sfsh.guest_IDTR_limit]
    mov word [rsp + 8-2],ax
    mov word [rsp + a + 8-2],cx
    mov rdx,[sfsh.guest_GDTR_base]
    mov rax,[sfsh.guest_IDTR_base]
    mov [rsp + 8],rdx
    mov [rsp + a + 8],rax
    lgdt [rsp + 8-2]
    lidt [rsp + a + 8-2]

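In 64-bit mode LGDT and LIDT take a 10-byte memory operand: a 16-bit limit followed by a 64-bit base. The code above builds it at offset 8-2 = 6 of a 16-byte stack slot, so the base ends up 8-byte aligned. The Python sketch below is an illustration with a hypothetical function name and models only that layout.

```python
import struct

def pseudo_descriptor_slot(limit, base):
    """Build the 16-byte slot: limit at offset 6, base at offset 8."""
    slot = bytearray(16)
    struct.pack_into("<H", slot, 6, limit & 0xFFFF)   # mov word [rsp + 8-2],ax
    struct.pack_into("<Q", slot, 8, base)             # mov [rsp + 8],rdx
    return slot

def lgdt_operand(slot):
    """The 10 bytes that `lgdt [rsp + 8-2]` actually reads."""
    return bytes(slot[6:16])
```

The first 6 bytes of each slot stay unused, which is the price of keeping the base naturally aligned.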
; selectors
    mov es,word [sfsh.guest_ES_selector]
    mov ds,word [sfsh.guest_DS_selector]
    mov fs,word [sfsh.guest_FS_selector]
    mov gs,word [sfsh.guest_GS_selector]
    lldt word [sfsh.guest_LDTR_selector]
; fs base, gs base will be updated later,
; updating fs base, gs base before fs, gs
; selectors is useless (loading fs, gs always
; destroys the old fs, gs base)

; task register (at first make the busy TSS available)
; rdx = guest_GDT_base
    movzx eax,word [sfsh.guest_TR_selector]
    mov ecx,eax
    and al,not 111b                     ; strip RPL and TI, leaving index*8
;   test cl,100b                        ; TI (Table Indicator)
;   jz vm_exit_handler_18_L0
;   mov rdx,[sfsh.guest_LDTR_base]      ; TSS can't be in LDT because of #GP
; vm_exit_handler_18_L0:
    and byte [rdx+rax*1+5],not 0010b    ; clear the busy bit in the descriptor type
    ltr cx

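LTR raises #GP when the referenced TSS descriptor is already marked busy, which is why the busy bit is cleared first. The descriptor offset and bit manipulation can be modelled in user space on a byte array standing in for the guest GDT. This Python sketch is illustrative only, and the function name is hypothetical.

```python
def clear_tss_busy(gdt, tr_selector):
    """Mark the TSS descriptor selected by tr_selector as available."""
    offset = tr_selector & 0xFFFF & ~0b111     # and al,not 111b
    # Byte 5 of a descriptor holds P, DPL, S and the 4-bit type;
    # type 1011b is a busy 64-bit TSS, 1001b an available one.
    gdt[offset + 5] &= ~0b0010 & 0xFF          # and byte [rdx+rax+5],not 0010b
    return offset
```

For a selector of 40h the access byte at GDT offset 45h changes from 8Bh (present, busy TSS) to 89h (present, available TSS).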
; fs.base, gs.base (never before updating fs, gs)
    mov ecx,MSR_IA32_FS_BASE
    mov eax,dword [sfsh.guest_FS_base]
    mov edx,dword [sfsh.guest_FS_base+4]
    wrmsr
    mov ecx,MSR_IA32_GS_BASE
    mov eax,dword [sfsh.guest_GS_base]
    mov edx,dword [sfsh.guest_GS_base+4]
    wrmsr

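WRMSR takes the MSR index in ECX and the 64-bit value split across EDX:EAX, which is why each base is loaded as two dwords. A tiny Python sketch of that split, given here for illustration with hypothetical function names:

```python
MSR_IA32_FS_BASE = 0xC0000100   # architectural MSR indices
MSR_IA32_GS_BASE = 0xC0000101

def split_for_wrmsr(value):
    """Return (eax, edx): low and high 32 bits of a 64-bit MSR value."""
    return value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF

def join_from_rdmsr(eax, edx):
    """Inverse operation, as RDMSR would return the value."""
    return (edx << 32) | eax
```

Reading the two halves at offsets +0 and +4 of the saved qword performs the same split on a little-endian machine.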
; SYSENTER MSRs
; (explicit size overrides are needed because the structure fields are declared as qwords)
    mov ecx,MSR_IA32_SYSENTER_CS
    movzx eax,word [sfsh.guest_IA32_SYSENTER_CS]    ; the selector is in the low 16 bits
    xor edx,edx
    wrmsr
    mov ecx,MSR_IA32_SYSENTER_ESP
    mov eax,dword [sfsh.guest_IA32_SYSENTER_ESP]
    mov edx,dword [sfsh.guest_IA32_SYSENTER_ESP+4]
    wrmsr
    mov ecx,MSR_IA32_SYSENTER_EIP
    mov eax,dword [sfsh.guest_IA32_SYSENTER_EIP]
    mov edx,dword [sfsh.guest_IA32_SYSENTER_EIP+4]
    wrmsr

; debug registers
    test [sfsh.VM_exit_controls],1 shl 2        ; "save debug controls" VM-exit control
    jz after_restoring_guest_debug_state
; CPU saved guest debug state during VM exit
; into guest VMCS fields, we will restore them
    mov ecx,MSR_IA32_DEBUGCTL
    mov eax,dword [sfsh.guest_IA32_DEBUGCTL]
    mov edx,dword [sfsh.guest_IA32_DEBUGCTL + 4]
    wrmsr
    mov rax,[sfsh.guest_DR7]
    mov dr7,rax
after_restoring_guest_debug_state:

; preparing RIP, CS, RFLAGS, RSP, SS
    mov rbp,[sfsh.guest_RIP]
    add rbp,[sfsh.vm_exit_instruction_length]
    movzx ebx,word [sfsh.guest_CS_selector]
    mov edx,dword [sfsh.guest_RFLAGS]
    mov rcx,[sfsh.guest_RSP]
    movzx eax,word [sfsh.guest_SS_selector]
; signalling VMsucceed
; CF=0, PF=0, AF=0, ZF=0, SF=0, OF=0
    and edx,not ( (1 shl 0) + (1 shl 2) + (1 shl 4) + \
                  (1 shl 6) + (1 shl 7) + (1 shl 11) )

; procedure epilogue + resuming OS
    add rsp,a+b+c           ; discard stack frame
    xchg [rsp+8*0],rbp      ; restore RBP and store RIP
    xchg [rsp+8*1],rbx      ; restore RBX and store CS
    xchg [rsp+8*2],rdx      ; restore RDX and store rflags
    xchg [rsp+8*3],rcx      ; restore RCX and store RSP
    xchg [rsp+8*4],rax      ; restore RAX and store SS
    iretq                   ; db 48h,0CFh
                            ; restore: RIP, CS, RFLAGS, RSP, SS
                            ; (run the OS)

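The five XCHG instructions do two jobs at once: they restore the registers saved by the prologue and turn the same five stack slots into the frame that IRETQ pops (RIP, CS, RFLAGS, RSP, SS, from the lowest address up). The Python sketch below models this with a list for the stack and a dictionary for the registers. It is an illustration with a hypothetical function name, not the author's code.

```python
def epilogue_xchg(stack, regs):
    """Swap each stack slot with its register, lowest address first."""
    # Slot order matches `push rax rcx rdx rbx rbp`: rbp ends up at [rsp+0].
    for slot, reg in enumerate(["rbp", "rbx", "rdx", "rcx", "rax"]):
        stack[slot], regs[reg] = regs[reg], stack[slot]
    return stack, regs
```

Before the swaps the registers hold the guest's RIP, CS, RFLAGS, RSP and SS. Afterwards the stack holds them in IRETQ order and the registers hold their original values again.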
; cleanup
    mov rax,shutdown_magic_number
    vmcall
    jbe failure
    call cleanup

cleanup:
    mov rax,host_virtual_address
    mov ecx,(number_of_PT_entries-1)*1000h
remove_TLB_entries:
    invlpg [rax+rcx*1]
    sub ecx,1000h
    jnc remove_TLB_entries
    ret

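The cleanup loop walks the hypervisor's virtual range from the last page down to the first and stops when the 32-bit SUB borrows. A user-space Python model of the address sequence, given as an illustration with a hypothetical function name:

```python
def invlpg_addresses(base, number_of_pt_entries):
    """Addresses passed to INVLPG, in the order the loop issues them."""
    ecx = (number_of_pt_entries - 1) * 0x1000
    addresses = []
    while True:
        addresses.append(base + ecx)          # invlpg [rax+rcx*1]
        carry = ecx < 0x1000                  # SUB sets CF on borrow
        ecx = (ecx - 0x1000) & 0xFFFFFFFF     # sub ecx,1000h
        if carry:                             # jnc falls through when CF=1
            break
    return addresses
```

With the constants from the slides (base 0FFFF800000000000h, 512 entries) the loop issues 512 INVLPGs, ending with the base address itself.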
That was a way to turn off the hypervisor and resume the OS in about 100 instructions.
Good? No. It is VERY POOR !!!
Now a guy who is able to turn off a hypervisor in 1 instruction !!!

The guy is now thinking hard about how to resume the OS in 1 instruction !!!