This article discusses memory addressing in the Linux operating system, including the Page Global Directory, Kernel Page Tables, and the initialization of kernel page tables. It also covers the mapping of linear addresses to physical addresses in the first phase of paging.
Linux Operating System
許 富 皓
Memory Addressing
with the assistance of 江瑞敏 and 許齊顯
Entries of Page Global Directory
The content of the first entries of the Page Global Directory, which map linear addresses lower than 0xc0000000 (the first 768 entries with PAE disabled, or the first 3 entries with PAE enabled), depends on the specific process. Conversely, the remaining entries should be the same for all processes and equal to the corresponding entries of the master kernel Page Global Directory.
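The entry counts quoted above (768 and 3) follow from how many high-order bits of a linear address select a top-level entry. A minimal sketch, assuming the standard x86 split (top 10 bits without PAE, top 2 bits with PAE); the function and macro names are illustrative, not kernel identifiers:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed paging geometry (not stated on the slide):
 * without PAE the top 10 bits of a linear address index the PGD;
 * with PAE the top 2 bits index the 4-entry top-level table. */
#define PGDIR_SHIFT_NOPAE 22
#define PGDIR_SHIFT_PAE   30

/* Index of the top-level entry that maps linear address la, PAE disabled. */
unsigned pgd_index_nopae(uint32_t la) { return la >> PGDIR_SHIFT_NOPAE; }

/* Index of the top-level entry that maps linear address la, PAE enabled. */
unsigned pgd_index_pae(uint32_t la) { return la >> PGDIR_SHIFT_PAE; }
```

Calling `pgd_index_nopae(0xc0000000u)` gives the index of the first kernel entry, so every entry below it belongs to the process-specific part.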
Kernel Page Tables
The kernel maintains a set of page tables for its own use. This set of page tables is rooted at a so-called master kernel Page Global Directory. After system initialization, this set of page tables is never directly used by any process or kernel thread. Rather, the highest entries of the master kernel Page Global Directory are the reference model for the corresponding entries of the Page Global Directories of EVERY regular process in the system.
Duplicate the Content of MKPGD
Call chain: copy_process() -> copy_mm() -> dup_mm() -> mm_init() -> mm_alloc_pgd() -> pgd_alloc()
pgd_alloc() in turn calls pgd_prepopulate_pmd().
How Kernel Initializes Its Own Page Tables
A two-phase activity:
• In the first phase, the kernel creates a limited address space including:
  • the kernel's code segment
  • the kernel's data segments
  • the initial page tables
  • 128 KB for some dynamic data structures.
  This minimal address space is just large enough to install the kernel in RAM and to initialize its core data structures.
• In the second phase, the kernel takes advantage of all of the existing RAM and sets up the page tables properly.
The Special Dot Symbol [GNU]
The special symbol `.' refers to the current address that the assembler (as) is assembling into. Thus, the expression `melvin: .long .' defines melvin to contain its own address. Assigning a value to . is treated the same as a .org directive. Thus, the expression `.=.+4' is the same as saying `.space 4'.
initial_page_table and __brk_base
The provisional Page Global Directory is contained in the initial_page_table variable. The provisional Page Tables are stored starting from __brk_base.
Assumption
• The CPU architecture is x86_32.
• The size of vmlinux [wikipedia] is 7 MB. On Linux systems, vmlinux is a statically linked executable file that contains the Linux kernel in one of the object file formats supported by Linux, which include ELF, COFF and a.out.
• The boot loader puts the Linux kernel at physical address 0x01000000.
Phase One Mapping Size
In order to map 24 MB of RAM, 6 Page Tables are required, since each Page Table maps 4 MB (1024 entries of 4 KB each):
16 MB (reserved memory) + 7 MB (vmlinux size) + 1 MB (MAPPING_BEYOND_END) = 24 MB
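The count of 6 Page Tables can be checked with a short calculation. This is a sketch assuming 4 KB pages and 1024 entries per Page Table (PAE disabled); `page_tables_needed` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE      4096u            /* bytes per page */
#define PTES_PER_TABLE 1024u            /* entries per Page Table, PAE disabled */
#define MB             (1024u * 1024u)

/* Number of Page Tables needed to map `bytes` of RAM with 4 KB pages,
 * rounded up to a whole table. */
unsigned page_tables_needed(uint32_t bytes)
{
    uint32_t per_table = PAGE_SIZE * PTES_PER_TABLE;   /* 4 MB per table */
    return (bytes + per_table - 1) / per_table;
}
```

With the sizes from the slide, `page_tables_needed((16 + 7 + 1) * MB)` evaluates to 6.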
MAPPING_BEYOND_END
Besides mapping vmlinux, the Linux kernel maps additional memory for the bootmem allocator. On x86_32 with PAE disabled, the value of MAPPING_BEYOND_END is 1 MB.
bootmem allocator: when the system is being initialized, the buddy system and the slab allocator do not exist yet; hence, the bootmem allocator is responsible for memory management and memory allocation.
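One way to see where the 1 MB figure could come from: it is the space needed for the Page Tables that map the kernel's whole 1 GB linear range. The sketch below assumes that MAPPING_BEYOND_END is derived this way in head_32.S; the helper names are illustrative and the derivation is an assumption, not something the slides state:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define PAGE_OFFSET    0xc0000000ull
#define PTES_PER_TABLE 1024ull          /* PAE disabled */

/* Pages in the linear range from PAGE_OFFSET to 4 GB. */
uint64_t lowmem_pages(void)
{
    return ((1ull << 32) - PAGE_OFFSET) >> PAGE_SHIFT;
}

/* Pages of Page Tables needed to map `pages` pages (one PTE per page). */
uint64_t page_table_pages(uint64_t pages)
{
    return pages / PTES_PER_TABLE;
}

/* Bytes needed beyond _end so that those Page Tables fit (assumed derivation). */
uint64_t mapping_beyond_end(void)
{
    return page_table_pages(lowmem_pages()) << PAGE_SHIFT;
}
```

Under these assumptions, 262144 pages need 256 Page Tables of 4 KB each, which is 1 MB.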
Physical Address Layout
Physical address range        Size     Content
0x00000000 - 0x00ffffff       16 MB    reserved memory
0x01000000 - 0x016fffff       7 MB     vmlinux
0x01700000 - 0x017fffff       1 MB     mapping beyond end
initial_page_table in Phase One
The objective of this first phase of paging is to allow these 24 MB of RAM to be easily addressed in protected mode, both before and after paging is enabled. Therefore, the kernel must create a mapping from both the linear addresses 0x00000000 through 0x017fffff and the linear addresses 0xc0000000 through 0xc17fffff into the physical addresses 0x00000000 through 0x017fffff. In other words, during its first phase of initialization the kernel can address the first 24 MB of RAM either by linear addresses identical to the physical ones or by 24 MB worth of linear addresses starting from 0xc0000000.
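The double mapping described above can be modeled as a small translation function. This is only a sketch of the resulting address arithmetic (it ignores the fixmap entry and does not walk real page tables); `phase1_translate` and `PHASE1_LIMIT` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHASE1_LIMIT 0x01800000u   /* 24 MB, the example size used in the slides */
#define PAGE_OFFSET  0xc0000000u

/* Translate a linear address under the phase-one mapping.
 * Returns true and stores the physical address if la is mapped. */
bool phase1_translate(uint32_t la, uint32_t *pa)
{
    if (la < PHASE1_LIMIT) {                      /* identity mapping */
        *pa = la;
        return true;
    }
    if (la >= PAGE_OFFSET && la - PAGE_OFFSET < PHASE1_LIMIT) {
        *pa = la - PAGE_OFFSET;                   /* kernel mapping */
        return true;
    }
    return false;                                 /* not mapped in phase one */
}
```

Both 0x01000000 and 0xc1000000 translate to physical address 0x01000000, which is why code can keep running across the moment paging is switched on.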
[Figures: Mapping Linear Addresses to Physical Addresses in Phase One (1) and (2). The linear address ranges 0x00000000 - 0x017fffff and 0xc0000000 - 0xc17fffff (24 MB each) are both translated, through the pgd and the Page Tables (pt) with 4 KB pages, to the physical address range 0x00000000 - 0x017fffff.]
Contents of initial_page_table in Phase One
The kernel creates the desired mapping by filling all the initial_page_table entries with zeroes, except for entries 0 ~ 5 and 0x300 (decimal 768) ~ 0x305 (decimal 773); the latter six entries span all linear addresses between 0xc0000000 and 0xc17fffff.
Entries 0 ~ 5 and 0x300 ~ 0x305 are initialized as follows:
• The address field of entries 0 and 0x300 is set to the physical address of __brk_base.
• Each following pair of entries (1 and 0x301, 2 and 0x302, and so on) is set to the physical address of the next 4 KB Page Table after __brk_base.
Initialize initial_page_table

    page_pde_offset = (__PAGE_OFFSET >> 20);

        movl $pa(__brk_base), %edi
        movl $pa(initial_page_table), %edx
        movl $PTE_IDENT_ATTR, %eax
    10:
        leal PDE_IDENT_ATTR(%edi),%ecx      /* Create PDE entry */
        movl %ecx,(%edx)                    /* Store identity PDE entry */
        movl %ecx,page_pde_offset(%edx)     /* Store kernel PDE entry */
        addl $4,%edx
        movl $1024, %ecx
    11:
        stosl
        addl $0x1000,%eax
        loop 11b
        /*
         * End condition: we must map up to the end + MAPPING_BEYOND_END.
         */
        movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %ebp
        cmpl %ebp,%eax
        jb 10b
        addl $__PAGE_OFFSET, %edi
        movl %edi, pa(_brk_end)
        shrl $12, %eax
        movl %eax, pa(max_pfn_mapped)

        /* Do early initialization of the fixmap area */
        movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
        movl %eax,pa(initial_page_table+0xffc)

Notes:
• page_pde_offset is 0xc00 (= 0x300 * 4).
• 1024 is the number of entries in a Page Table.
• 0x1000 is 4 KB, the size of one page.
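For readers less familiar with the assembly, the loop can be modeled in C. This is a rough sketch, not the kernel's code: the attribute values and the function name are assumptions chosen for illustration, and the arrays stand in for physical memory.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES          1024u
#define KERNEL_PDE_INDEX 0x300u   /* page_pde_offset / 4 */
#define PTE_ATTR         0x003u   /* assumed stand-in for PTE_IDENT_ATTR */
#define PDE_ATTR         0x067u   /* assumed stand-in for PDE_IDENT_ATTR */

/* Fill pgd and the consecutive Page Tables pts[] so that physical
 * addresses [0, map_end) are mapped at linear 0 and at 0xc0000000.
 * brk_base_pa is the physical address of the first Page Table.
 * Returns the number of Page Tables used. */
unsigned fill_initial_tables(uint32_t pgd[ENTRIES], uint32_t pts[][ENTRIES],
                             uint32_t brk_base_pa, uint32_t map_end)
{
    uint32_t pte = PTE_ATTR;   /* models %eax: next PTE, starting at frame 0 */
    unsigned n = 0;            /* index of the Page Table being filled */

    do {
        uint32_t pde = (brk_base_pa + n * 4096u) + PDE_ATTR;   /* leal */
        pgd[n] = pde;                          /* identity PDE entry */
        pgd[KERNEL_PDE_INDEX + n] = pde;       /* kernel PDE entry */
        for (unsigned i = 0; i < ENTRIES; i++) {   /* stosl ... loop 11b */
            pts[n][i] = pte;
            pte += 0x1000u;                    /* next 4 KB page frame */
        }
        n++;
    } while (pte < map_end + PTE_ATTR);        /* cmpl %ebp,%eax ; jb 10b */

    return n;
}
```

With `map_end` set to 0x01800000 (24 MB) the loop body runs six times, matching entries 0 ~ 5 and 0x300 ~ 0x305.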
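[Figure: Phase 1: Page Table Layout. In physical memory (0x00000000 - 0x017fffff), the entries of initial_page_table (pgd) point to the Page Tables (pte) stored from __brk_base; each Page Table maps 4 MB, and together they map 24 MB.]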
Objectives of initial_page_table
When executing file kernel/head.S, values of eip are within the range between 0x00000000 and 0x017fffff.

     88  ENTRY(startup_32)                 /* protected mode code */
     99      lgdt pa(boot_gdt_descr)
      :
    211      movl $pa(initial_page_table), %edx
      :
    390  /* Enable paging */
    391      movl $pa(initial_page_table), %eax
    392      movl %eax,%cr3
    393      movl $CR0_STATE,%eax
    394      movl %eax,%cr0
    395      ljmp $__BOOT_CS,$1f
    396  1:
    398      addl $__PAGE_OFFSET, %esp
      :
    448      lgdt early_gdt_descr
    449      lidt idt_descr
      :
    468      jmp *(initial_code)
      :
    679  ENTRY(initial_page_table)
    680      .fill 1024,4,0
      :
    718  ENTRY(stack_start)
    719      .long init_thread_union+THREAD_SIZE
      :
    754  boot_gdt_descr:
    755      .word __BOOT_DS+7
      :
    759  idt_descr:
    760      .word IDT_ENTRIES*8-1
      …
    765  ENTRY(early_gdt_descr)
    766      .word GDT_ENTRIES*8-1

• Before paging is enabled (before cr0 is loaded at line 394), eip's values are equal to physical addresses: logical address = virtual address (the segment base address is 0) = physical address (paging is not enabled yet).
• After paging is enabled, the paging unit uses entry 0 to entry 5 of initial_page_table to translate eip's values (virtual addresses) into physical addresses.
• Function i386_start_kernel() is inside a pure C program (head32.c); hence, its address is above 0xc0000000. Therefore, after the jump at line 468, values of eip will be greater than 0xc0000000.
Enable the Paging Unit
The startup_32() assembly language function also enables the paging unit. This is achieved by loading the physical address of initial_page_table into the cr3 control register and by setting the PG flag of the cr0 control register, as shown in the following equivalent code fragment:

    movl $pa(initial_page_table), %eax
    movl %eax,%cr3
    movl $CR0_STATE,%eax
    movl %eax,%cr0
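The PG flag is bit 31 of cr0. The slides use CR0_STATE without giving its value; the sketch below assumes it is the usual combination of PE, MP, ET, NE, WP, AM and PG, and only checks that such a value has PG set. The macro and function names here are local to the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* cr0 flag bits as defined by the x86 architecture. */
#define CR0_PE (1u << 0)    /* Protection Enable */
#define CR0_MP (1u << 1)    /* Monitor Coprocessor */
#define CR0_ET (1u << 4)    /* Extension Type */
#define CR0_NE (1u << 5)    /* Numeric Error */
#define CR0_WP (1u << 16)   /* Write Protect */
#define CR0_AM (1u << 18)   /* Alignment Mask */
#define CR0_PG (1u << 31)   /* Paging */

/* Assumed composition of CR0_STATE (not given in the slides). */
#define CR0_STATE (CR0_PE | CR0_MP | CR0_ET | CR0_NE | CR0_WP | CR0_AM | CR0_PG)

/* True if a cr0 value has the paging unit switched on. */
bool paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}
```

Under that assumption CR0_STATE equals 0x80050033, so writing it to cr0 turns paging on while keeping protected mode enabled.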
Initial Page Global Directories
startup_32() -> i386_start_kernel() -> start_kernel() -> setup_arch()
• From startup_32() until setup_arch(), cr3 points to initial_page_table.
• From setup_arch() onward, cr3 points to swapper_pg_dir.
Phase 2
Change Page Global Directory
• setup_arch()
  • initial_page_table is copied to swapper_pg_dir first.
  • cr3 is set to point to swapper_pg_dir.
  • The content of swapper_pg_dir is changed.
  • After swapper_pg_dir has been initialized, its content is copied into initial_page_table.
  • cr3 continues to point to swapper_pg_dir.
Function Call Chain to kernel_physical_mapping_init()
setup_arch() -> init_mem_mapping() -> init_memory_mapping() -> kernel_physical_mapping_init()
• setup_arch() writes the physical address of swapper_pg_dir into the cr3 control register using load_cr3(swapper_pg_dir).
kernel_physical_mapping_init()
• Reinitializes swapper_pg_dir.
• Invokes __flush_tlb_all() to invalidate all TLB entries.
Function Call to paging_init
startup_32 -> start_kernel -> setup_arch -> paging_init [1][2][3][4]
paging_init is no longer in charge of initializing swapper_pg_dir, which was one of its major tasks in Linux versions around 2.6.16. Instead, the initialization of swapper_pg_dir is performed by kernel_physical_mapping_init().
paging_init()
• Invokes pagetable_init().
• Invokes __flush_tlb_all() to invalidate all TLB entries.

#ifdef CONFIG_HIGHMEM
pagetable_init(): invokes permanent_kmaps_init().
permanent_kmaps_init(): invokes page_table_range_init().
#else
pagetable_init(): does nothing.
#endif
[Figure: Important Function Calls in Phase 2. Call tree rooted at setup_arch():
setup_arch()
  init_mem_mapping
    init_memory_mapping
      kernel_physical_mapping_init (continuous linear mapping)
    early_ioremap_page_table_range_init
  x86_init.paging.pagetable_init
    paging_init
      pagetable_init
        permanent_kmaps_init
      kmap_init
      others]
How Kernel Initializes Its Own Page Tables --- Phase 2
Finish the Page Global Directory
The final mapping provided by the kernel Page Tables must transform virtual addresses starting from 0xc0000000 to physical addresses starting from 0x00000000.
Two different configurations affect the size of the linear mapping region:
• CONFIG_HIGHMEM
• CONFIG_NOHIGHMEM
CONFIG_NOHIGHMEM
If CONFIG_NOHIGHMEM is set, the kernel can only access physical memory below 1024 MB. There are 2 cases in this configuration:
• Case 1: RAM size is less than 895 MB. Why 895 MB?
• Case 2: RAM size is between 895 MB and 1024 MB.
CONFIG_HIGHMEM
If CONFIG_HIGHMEM is set, the kernel can access physical memory larger than 1024 MB. There are 3 cases in this configuration:
• Case 1: RAM size is less than 887 MB.
• Case 2: RAM size is between 887 MB and 4096 MB.
• Case 3: RAM size is larger than 4096 MB.
Assumption
We assume that the kernel is configured with CONFIG_HIGHMEM. The following three cases will be discussed:
• Case 1: RAM size is less than 887 MB.
• Case 2: RAM size is between 887 MB and 4096 MB.
• Case 3: RAM size is larger than 4096 MB.
P.S.: The operations performed in case 1 and case 2 of configuration CONFIG_NOHIGHMEM are the same as the ones in case 1 and case 2 of configuration CONFIG_HIGHMEM.
Phase 2 Case 1: When RAM Size Is Less Than 887 MB
Assumption
We assume that the CPU is an 80x86 microprocessor supporting 4 MB pages and "global" TLB entries.
Notice that the User/Supervisor flags in all Page Global Directory entries referencing linear addresses above 0xc0000000 are cleared, thus denying processes in User Mode access to the kernel address space.
Notice also that the Page Size flag is set so that the kernel can address the RAM by making use of large pages.
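The flags mentioned above correspond to individual bits in a Page Global Directory entry. A minimal sketch using the architectural bit positions; it sets only Present, Read/Write, Page Size and Global, whereas a real kernel entry may carry further flags. The macro and function names are local to this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page directory entry flag bits as defined by the x86 architecture. */
#define PDE_PRESENT 0x001u
#define PDE_RW      0x002u
#define PDE_USER    0x004u   /* User/Supervisor: set means User Mode may access */
#define PDE_PSE     0x080u   /* Page Size: entry maps a 4 MB page directly */
#define PDE_GLOBAL  0x100u   /* Global: TLB entry survives a cr3 reload */

/* Build a directory entry that maps a 4 MB kernel page at phys. */
uint32_t kernel_large_pde(uint32_t phys)
{
    assert((phys & 0x003fffffu) == 0);   /* a 4 MB page must be 4 MB aligned */
    return phys | PDE_PRESENT | PDE_RW | PDE_PSE | PDE_GLOBAL;
}

/* True if User Mode code is allowed to use this entry. */
bool user_accessible(uint32_t pde)
{
    return (pde & PDE_USER) != 0;
}
```

Because `kernel_large_pde` never sets `PDE_USER`, `user_accessible` returns false for every entry it builds, which is the protection the slide describes.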
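[Figure: Linear Address and Physical Address Mapping. Linear addresses starting at 0xc0000000 map to physical addresses starting at 0x00000000, up to 887 MB: the first 4 MB (0xc0000000 - 0xc0400000, physical 0x00000000 - 0x00400000) with 4 KB pages, the next 880 MB with 4 MB pages, and the rest with 4 KB pages. A hole follows the 887 MB mapping; other linear addresses marked in the figure are 0xff7fe000, 0xff800000, 0xffc00000, 0xfffa1000 and 0xfffff000.]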
Clearance of Page Global Directory Entries Created in Phase 1
The identity mapping of the first 24 megabytes of physical memory built by the startup_32() function is required to complete the initialization phase of the kernel. When this mapping is no longer necessary, the kernel clears the corresponding page table entries.
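[Figure: MKPGD Mapping. swapper_pg_dir maps physical memory through a first Page Table (pt) covering 4 MB = 1024 x 4 KB, a run of 4 MB large-page entries, and a last Page Table (pt) covering 3 MB = 768 x 4 KB; 34 entries remain.]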
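The numbers in the figure (4 MB + 880 MB + 3 MB, 34 entries left) can be reproduced with simple arithmetic. The sketch below is one reading of the figure, not kernel code: it assumes the first 4 MB use a Page Table, whole 4 MB chunks after that use large pages, and any remainder uses one more Page Table. The struct and function names are hypothetical.

```c
#include <assert.h>

#define PGD_ENTRIES  1024u
#define KERNEL_FIRST 768u    /* first PGD entry for 0xc0000000, PAE disabled */
#define MB_PER_ENTRY 4u      /* each PGD entry covers 4 MB */

struct lowmem_layout {
    unsigned head_mb;        /* mapped with 4 KB pages (first Page Table) */
    unsigned large_mb;       /* mapped with 4 MB pages */
    unsigned tail_mb;        /* mapped with 4 KB pages (last Page Table) */
    unsigned entries_used;   /* kernel PGD entries consumed */
    unsigned entries_left;   /* kernel PGD entries still free */
};

/* Split ram_mb megabytes of RAM the way the figure does. */
struct lowmem_layout layout_for(unsigned ram_mb)
{
    struct lowmem_layout l;

    assert(ram_mb >= MB_PER_ENTRY && ram_mb <= 887u);
    l.head_mb = MB_PER_ENTRY;
    l.large_mb = (ram_mb - l.head_mb) / MB_PER_ENTRY * MB_PER_ENTRY;
    l.tail_mb = ram_mb - l.head_mb - l.large_mb;
    l.entries_used = 1u + l.large_mb / MB_PER_ENTRY + (l.tail_mb ? 1u : 0u);
    l.entries_left = PGD_ENTRIES - KERNEL_FIRST - l.entries_used;
    return l;
}
```

For 887 MB this yields 880 MB of large pages, a 3 MB tail, 222 entries used and 34 entries left, matching the labels in the figure.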
Phase 2 Case 2: When RAM Size Is between 887 MB and 4096 MB
Phase 2 – Case 2
Final kernel page table when RAM size is between 887 MB and 4096 MB:
In this case, the RAM cannot be mapped entirely into the kernel linear address space, because that address space is only 1 GB. Therefore, during the initialization phase Linux maps only an 887 MB window of RAM into the kernel linear address space.
If a program needs to address other parts of the existing RAM, some other linear address interval (from the 888th MB up to the end of the 1st GB) must be mapped to the required RAM. This implies changing the value of some page table entries.
Phase 2 – Case 2 Code
To initialize the Page Global Directory, the kernel uses the same code as in the previous case.
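[Figure: Linear Address and Physical Address Mapping. Same layout as in Case 1: linear addresses from 0xc0000000 map to physical addresses from 0x00000000 up to 887 MB, using 4 KB pages for the first 4 MB, 4 MB pages for the next 880 MB, and 4 KB pages for the rest; a hole follows, with 0xff7fe000, 0xff800000, 0xffc00000, 0xfffa1000 and 0xfffff000 marked above it.]
[Figure: MKPGD Mapping. Same layout as in Case 1: a first Page Table (pt) covering 4 MB = 1024 x 4 KB, a run of 4 MB large-page entries, a last Page Table (pt) covering 3 MB = 768 x 4 KB, and 34 remaining entries.]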
Phase 2 Case 3: When RAM Size Is More Than 4096 MB
Assumption
Assume:
• The CPU model supports Physical Address Extension (PAE).
• The amount of RAM is larger than 4 GB.
• The kernel is compiled with PAE support.
RAM Mapping Principle
Although PAE handles 36-bit physical addresses, linear addresses are still 32-bit addresses. As in case 2, Linux maps an 887 MB RAM window into the kernel linear address space; the remaining RAM is left unmapped and handled by dynamic remapping.
Layouts of Translation Tables
Notice that all CPU models that support PAE also support large 2 MB pages and global pages. As in the previous case, whenever possible, Linux uses large pages to reduce the number of page tables.
• The first 443 (886/2 = 443) entries (entry 0 ~ entry 442) in the Page Middle Directory are filled with the physical addresses of the first 886 MB of RAM.
• Entry 443 points to a Page Table which contains 512 entries.
• There are 512 entries in the Page Middle Directory, but the last 68 (512 - 444 = 68) are reserved for noncontiguous memory allocation.
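The entry counts above follow directly from the 887 MB window and the 2 MB page size. A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>

#define PMD_ENTRIES   512u   /* entries in one Page Middle Directory (PAE) */
#define LARGE_PAGE_MB 2u     /* size of a large page with PAE */

/* Number of PMD entries that map a full 2 MB large page. */
unsigned large_page_entries(unsigned window_mb)
{
    return window_mb / LARGE_PAGE_MB;
}

/* Number of PMD entries left over once the window is mapped; a partial
 * last chunk costs one extra entry pointing to a Page Table. */
unsigned reserved_entries(unsigned window_mb)
{
    unsigned used = large_page_entries(window_mb)
                  + ((window_mb % LARGE_PAGE_MB) ? 1u : 0u);
    return PMD_ENTRIES - used;
}
```

For the 887 MB window, `large_page_entries(887)` is 443 and `reserved_entries(887)` is 68, as stated on the slide.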
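[Figure: Layouts of Translation Tables. swapper_pg_dir points to a pmd with entries 0 ~ 511. Entries 0 ~ 442 each map a 2 MB page; entry 443 points to a Page Table (pt) of 4 KB pages that maps the last 1 MB of the 887 MB window; entries 444 ~ 511 are the 68 reserved entries. empty_zero_page is also shown.]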
initial_page_table and swapper_pg_dir
After swapper_pg_dir has been initialized, its content is copied into initial_page_table, but cr3 continues to point to swapper_pg_dir.

    static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
    {
        memcpy(dst, src, count * sizeof(pgd_t));
    }

    clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
                    swapper_pg_dir + KERNEL_PGD_BOUNDARY,
                    KERNEL_PGD_PTRS);
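To see what this call copies, here is a user-space model. It assumes PAE is disabled, so that KERNEL_PGD_BOUNDARY is 768 and KERNEL_PGD_PTRS is 256, and it simplifies pgd_t to a plain 32-bit integer; `sync_kernel_half` is a hypothetical wrapper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t pgd_t;            /* simplified stand-in for the kernel type */

#define PTRS_PER_PGD        1024
#define KERNEL_PGD_BOUNDARY 768    /* assumed: entry that maps 0xc0000000 */
#define KERNEL_PGD_PTRS     (PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)

/* Copy count directory entries from src to dst. */
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
    memcpy(dst, src, count * sizeof(pgd_t));
}

/* Copy only the kernel part (entries 768 ~ 1023) of src into dst;
 * the lower entries of dst are left untouched. */
void sync_kernel_half(pgd_t *dst, pgd_t *src)
{
    clone_pgd_range(dst + KERNEL_PGD_BOUNDARY,
                    src + KERNEL_PGD_BOUNDARY,
                    KERNEL_PGD_PTRS);
}
```

Only the entries for linear addresses from 0xc0000000 upward are copied, so whatever the destination holds in entries 0 ~ 767 is preserved.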