T3-Memory

Index • Memory management concepts • Basic Services • Program loading in memory • Dynamic memory • HW support • To memory assignment • To address translation • Services to optimize physical memory usage • COW • Virtual memory • Prefetch • Linux on Pentium

Concepts • Physical memory vs. logical memory • Process address space • Address assignment to processes • Operating system tasks • Hardware support

Execution model • CPU can access only memory and registers • Data and code must be loaded in memory to be referenced • Program loading: allocate memory, copy the executable into that memory and pass execution control to the entry point of the program • [Figure: the CPU sends an address (@) to memory and gets back the memory content]

Multi-programmed systems • Several programs loaded simultaneously in physical memory • Eases concurrent execution and simplifies the context switch mechanism • 1 process on the CPU but N processes in physical memory • When performing a context switch it is not necessary to load again the process that gets assigned the CPU • OS must guarantee physical memory protection between processes • Each process can access only the physical memory assigned to it • It must be done by hardware • Memory Management Unit (MMU) • [Figure: without an MMU, the addresses issued by the CPU go straight to a memory that holds P1 and P2; with an MMU, every address is checked first and an invalid access raises an exception]

Physical memory vs. Logical memory • “Type” of addresses: • Logical addresses: the memory addresses generated by the CPU • Physical addresses: the memory addresses that reach memory • Are they different??? → They can be!!! • Current systems offer translation support based on the MMU, which provides: • Address translation • Address validation

Address Spaces • Address space: range of addresses [@first_address…@last_address] • This concept is applied to different contexts: • Processor address space • Process logical address space • Subset of logical addresses that a process can reference (the OS kernel decides which addresses are valid for each process) • Process physical address space • Relationship between logical addresses and physical addresses • Without translation: logical address space == physical address space • With translation: it can be done at different moments • Option 1, during program loading: the kernel decides where to place the process in memory and translates references at program loading • Option 2, during program execution: each issued reference is translated at runtime (this is the normal behavior in current systems)

Assignment of addresses to processes • There are other choices but… current general purpose systems translate the addresses (@) of instructions and data at runtime • Since logical addresses are decoupled from physical addresses: • We can have many processes with the same logical addresses without problem • FORK!!!!! → Parent and child have the same logical address space without conflict • The compiler can translate program references to memory without worrying about the references of other programs or about which physical addresses will be available when the process starts executing • Processes can change their position in physical memory without changing their logical address space • Example: Paging (explained in EC course)

Multi-programmed systems with MMU support • Collaboration between MMU (HW) and kernel (SW) • MMU • It implements the mechanism to detect illegal accesses • Address outside the process logical address space • Valid address but invalid type of access • It throws an exception to the OS if some problem is detected during memory address translation • Kernel • It configures the MMU • It manages the exception according to the situation • For example, if the logical address is not valid it can kill the process (SIGSEGV signal)

Multiprogrammed systems: whole picture • 1- Process A is running (but processes A and C are both loaded in memory) • 2- Context switch to C • [Figure: the CPU issues logical @; the MMU, loaded with the translation and protection information of the running process (A first, then C), produces physical @ into a physical memory that holds processes A, B and C, or raises an exception on an invalid access; memory returns its content to the CPU]

When does the OS need to update the MMU??? • Case 1: When assigning memory • Initialization when assigning new memory (process mutation, execlp) • Changes in the address space: it grows/shrinks • Case 2: When switching contexts • For the process that leaves the CPU: if it has not finished, keep in its data structures (PCB) the information needed to configure the MMU when it resumes execution • For the process that resumes execution: configure the MMU
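The two cases above can be sketched in a few lines of C. This is a minimal model, not real kernel code: `struct pcb`, `struct mmu`, the field `pt_base` and both functions are hypothetical names, and the whole MMU configuration is assumed to fit in one "page table base" value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process record: only the memory-related field is shown. */
struct pcb {
    int pid;
    uintptr_t pt_base;   /* saved MMU configuration (address of the page table) */
};

/* Hypothetical MMU with a single configuration register. */
struct mmu {
    uintptr_t ptbr;      /* page table currently used for translation */
};

/* Case 1: memory is assigned (for example on exec): record it in the PCB. */
void assign_memory(struct pcb *p, uintptr_t page_table) {
    assert(p != NULL);
    p->pt_base = page_table;
}

/* Case 2: context switch. 'leaving' may be NULL if no process was running. */
void context_switch(struct mmu *mmu, struct pcb *leaving, const struct pcb *resuming) {
    assert(mmu != NULL && resuming != NULL);
    if (leaving != NULL)
        leaving->pt_base = mmu->ptbr;   /* keep the configuration for later */
    mmu->ptbr = resuming->pt_base;      /* configure the MMU for the new process */
}
```

A real architecture would also have to invalidate cached translations at this point; that part is left out of the sketch.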

OS tasks in memory management • Program loading in memory • Once loaded we have already seen how it works!! • Allocate/Deallocate dynamic memory (requested through system calls) • Shared memory between processes • COW: transparent sharing of read-only regions between processes • Shared memory explicitly requested through system calls (out of the scope of this course) • Optimization services • COW • Virtual memory • Prefetch

OS basic services • Program loading • Dynamic memory • Memory assignment • Explicit shared memory between processes

Basic services: program loading • Executable file is stored in disk, but it has to be in memory in order to be executed (execlp or similar) • OS has to: • Read and interpret the format of the executable • Prepare in logical memory the process layout and assign physical memory to it • Initialize the PCB attributes related to memory management: Information to configure MMU each time the process resumes the execution • Initialize MMU • Read the program sections from disk and write them to memory • Load program counter register with the address of the entry point instruction, which is defined in the executable file

Program loading: executable format • STEP 1: Interpret the executable format on disk • If address translation is performed at runtime, which kind of address is in the executable file on disk? Logical or physical? • The header in the executable file defines sections: type of section, size and position in the file (try objdump -h program) • There are several executable file formats • ELF (Executable and Linkable Format) is the most widespread executable format in POSIX systems
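Before a loader can interpret any header it has to recognise the format. ELF files start with the four bytes 0x7F 'E' 'L' 'F', so a first check could look like the sketch below. The function name `looks_like_elf` is made up for this example, and a real loader would go on to validate the rest of the header.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the ELF magic number, 0 otherwise. */
int looks_like_elf(const unsigned char *buf, size_t len) {
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

In practice the buffer would hold the first bytes read from the executable file passed to exec.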

Program loading: Process layout in memory • STEP 2: Prepare the process layout in memory • Usual layout: code/data/heap/stack memory regions • [Figure: process layout from address 0 up to max: code (.text section), data (.data and .bss sections, global variables), heap (dynamic memory, runtime allocation with sbrk), an invalid region, and the stack (local variables, parameters and execution control); code and data come from the sections in the executable file]

Program loading • 1- Allocate memory (kernel data: free memory regions, process' PCB) • 2- Copy executable file • 3- Update MMU • [Figure: the .text, .data and .bss sections of the executable on disk are copied into the code and data regions of the process in memory, next to its stack; the kernel data holds the free memory regions and the PCB; the MMU sits between CPU and memory]

Optimizations on program loading (exec system call) • On-demand loading: not every line of code gets executed • Shared libraries and dynamic linking: many parts of executables are read-only and can be shared by more than one process • The goals are • To save time • by loading just parts of the executable • To save memory and storage space • by loading just parts of the executable in memory • by sharing parts of the executable (both on disk and in memory)

Optimizations on program loading • On-demand loading • Loading of routines is delayed until they are called • It requires a mechanism to detect whether an address is already in memory or not • The real (complete) MMU information is stored in the PCB • The kernel code that handles the MMU exception validates the memory address and, if it is correct: • loads the missing content into memory • updates the MMU and the PCB attributes • restarts the instruction

Optimizations on program loading • Shared libraries and dynamic linking • Libraries can be generated in two different versions: static and dynamic • Executables can use the static or the dynamic version of libraries (default is dynamic) • Static: library code is included in the executable file • Dynamic: executable files (on disk) do not contain the dynamic library code but just a reference to it (a stub) • That saves a lot of disk space! (Link phase is delayed until runtime) • When executed, the stub loads the library if it is not already loaded in memory and updates the process code to replace the call to the stub with the call to the routine in the shared library • Processes can share the memory areas that hold the same code (it is read-only), including the code of libraries → It saves a lot of memory space

Basic services: Dynamic memory

Dynamic memory allocation/deallocation • System call to ask for extra memory space or to reduce a previously reserved memory area • Heap area: region in the process address space that holds dynamic memory allocations • Required when the size of a variable depends on runtime parameters • In this situation, fixing sizes at compile time is not desirable: it causes over-allocation (wasted memory) or under-allocation (runtime error) • Optimization • Physical memory assignment can be delayed until the first write access to the region • Temporary assignment of a zero-filled region to serve read accesses (it depends on the interface)

Dynamic memory allocation/deallocation • Linux on Pentium • The traditional Unix interface is not user friendly • brk and sbrk (we will use sbrk) • void *sbrk(intptr_t size_in_bytes); • Both system calls just update the heap limit. The OS does not control which variables are stored in the heap, it just increases or decreases the heap size • The programmer is responsible for controlling the position of each variable in the heap. The man pages recommend using malloc • Returns: previous heap limit • size_in_bytes values • >0 increases the heap limit by that many bytes • <0 decreases the heap limit by that many bytes • ==0 does not modify the heap limit (it is used to get the current heap limit)
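The three cases of the sbrk argument can be wrapped in two small helpers. This is a sketch that assumes a Linux/glibc environment; `heap_limit` and `heap_move` are names invented here, and `_DEFAULT_SOURCE` is only there so that glibc declares sbrk under strict compiler modes.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes sbrk() in strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Current heap limit: an increment of 0 does not modify it. */
void *heap_limit(void) {
    return sbrk(0);
}

/* Moves the heap limit by delta bytes (positive grows, negative shrinks).
 * Returns the previous heap limit, or NULL if the kernel refuses. */
void *heap_move(intptr_t delta) {
    void *previous = sbrk(delta);
    return previous == (void *)-1 ? NULL : previous;
}
```

After `heap_move(4096)` the bytes between the returned address and the new limit belong to the process, but nothing records what is stored where; that bookkeeping is what the C library adds on top.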

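sbrk: example
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        int i;                                 /* loop index (was undeclared) */
        int procs_nb = atoi(argv[1]);
        int *pids;
        pids = sbrk(procs_nb * sizeof(int));   /* grow the heap: one int per child */
        for (i = 0; i < procs_nb; i++) {       /* bounded by the allocated size, not by 10 */
            pids[i] = fork();
            if (pids[i] == 0) {
                ….
            }
        }
        sbrk(-1 * procs_nb * sizeof(int));     /* give the memory back */
    }
[Figure: process address space from 0 to max: CODE, DATA, HEAP, STACK]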

C library: Dynamic memory allocation/deallocation • The C library offers programmers: • Fine-grained management of the heap: it knows which parts are “reserved” and which parts are “free” • Heap size management • Memory allocation: void *malloc(size_t size_in_bytes) • If possible, it “reserves” N consecutive unused bytes of the heap • Otherwise, it asks the kernel to increase the heap size • Implementation and optimizations • It tracks reserved/free areas • The C library tries to reduce the number of system calls to save time • When calling the kernel is unavoidable, it asks for more memory than the current request needs • Memory deallocation: void free(void *p) • Marks as “free to use” a previously “reserved” area
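The idea of asking the kernel for more than is needed, so that later requests are served without a system call, can be illustrated with a toy allocator. Everything here is an assumption made for the sketch: a static array stands in for the heap, `toy_grow` stands in for sbrk, the 4096-byte growth step is arbitrary, and freeing is left out entirely. It is not how any real malloc is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 4096u                  /* growth step (arbitrary choice) */
#define ARENA_BYTES (16u * CHUNK)

static _Alignas(16) unsigned char arena[ARENA_BYTES];  /* stands in for the heap */
static size_t heap_limit_off = 0;    /* bytes granted by the "kernel" so far */
static size_t used = 0;              /* bytes already handed to the program */
static int kernel_calls = 0;         /* times the heap limit had to move */

/* Stand-in for sbrk(): grows the heap limit, fails when the arena is exhausted. */
static int toy_grow(size_t bytes) {
    if (bytes > ARENA_BYTES - heap_limit_off) return -1;
    heap_limit_off += bytes;
    kernel_calls++;
    return 0;
}

void *toy_malloc(size_t n) {
    void *p;
    if (n == 0 || n > ARENA_BYTES) return NULL;
    n = (n + 15u) & ~(size_t)15u;                 /* keep blocks 16-byte aligned */
    if (n > heap_limit_off - used) {              /* reserved heap is not enough */
        size_t missing = n - (heap_limit_off - used);
        size_t grow = (missing + CHUNK - 1) / CHUNK * CHUNK;  /* whole chunks */
        if (toy_grow(grow) != 0) return NULL;
    }
    p = arena + used;
    used += n;
    assert(used <= heap_limit_off);
    return p;
}

int toy_kernel_calls(void) {
    return kernel_calls;
}
```

With these numbers, ten requests of 100 bytes move the heap limit only once; a later request of 8000 bytes moves it a second time.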

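malloc/free: example
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        int i;                                   /* loop index (was undeclared) */
        int procs_nb = atoi(argv[1]);
        int *pids;
        pids = malloc(procs_nb * sizeof(int));   /* one int per child */
        for (i = 0; i < procs_nb; i++) {         /* bounded by the allocated size, not by 10 */
            pids[i] = fork();
            if (pids[i] == 0) {
                ….
            }
        }
        free(pids);
    }
• The malloc interface is like the sbrk interface • The free interface needs as input parameter a pointer to the base address of the region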

Dynamic memory: examples • How does the heap change after executing the following examples? • Example 1: ... new = sbrk(1000); ... • Example 2: ... new = malloc(1000); ... • Does the heap size change in both examples?

Dynamic memory: examples • How does the heap change after executing the following examples? • Example 1: ... ptr = malloc(1000); ... • Example 2: ... for (i = 0; i < 10; i++) ptr[i] = malloc(100); ... • Do both examples allocate the same logical memory addresses? • Example 1: requires 1000 consecutive bytes • Example 2: requires 10 regions of 100 bytes each

Dynamic memory: examples • Which errors are in the following codes? • Code 1: What happens while executing the second iteration of the second loop? • Code 2: Does the access to "*x" always produce the same error?
Code 1:
    ...
    for (i = 0; i < 10; i++)
        ptr = malloc(SIZE);
    // memory usage
    // ...
    for (i = 0; i < 10; i++)
        free(ptr);
    ...
Code 2:
    int *x, *ptr;
    ...
    ptr = malloc(SIZE);
    ...
    x = ptr;
    ...
    free(ptr);
    sprintf(buffer, "...%d", *x);
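For reference, one way to read the two fragments: Code 1 overwrites ptr on every iteration, so nine of the ten blocks can never be freed, and from its second iteration on the second loop frees a block that was already freed (undefined behavior). Code 2 reads through x after the block was released, which is also undefined behavior, so the outcome may differ from run to run. A possible corrected form is sketched below; the function names and the fixed block count are choices made for this example only.

```c
#include <stdio.h>
#include <stdlib.h>

#define N_BLOCKS 10

/* Code 1, corrected: keep every pointer so each block is freed exactly once.
 * Returns 1 if all allocations succeeded, 0 otherwise. */
int alloc_use_release(size_t size) {
    char *blocks[N_BLOCKS];
    int i, ok = 1;
    for (i = 0; i < N_BLOCKS; i++) {
        blocks[i] = malloc(size);
        if (blocks[i] == NULL) ok = 0;
    }
    /* ... memory usage would go here ... */
    for (i = 0; i < N_BLOCKS; i++)
        free(blocks[i]);              /* free(NULL) is a no-op */
    return ok;
}

/* Code 2, corrected: use the alias before the block is released.
 * Returns what snprintf returns, or -1 if the allocation fails. */
int format_then_free(char *buffer, size_t buflen, int value) {
    int *ptr = malloc(sizeof *ptr);
    int *x;
    int written;
    if (ptr == NULL) return -1;
    *ptr = value;
    x = ptr;
    written = snprintf(buffer, buflen, "...%d", *x);
    free(ptr);                        /* x is dangling from here on: do not use it */
    return written;
}
```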

Basic services: memory assignment • Fixed partitions: Paging • Variable partitions: Segmentation

Basic services: memory assignment • It is executed each time a process needs physical memory: • In Linux: process creation (fork), loading of executable files (exec), dynamic memory usage, implementation of some optimizations (on-demand loading, virtual memory, COW…) • Steps • Select free physical memory and mark it in the OS data structures as in-use memory • Update the MMU with the mapping information logical @ → physical @ • Necessary to implement address translation

Basic services: memory assignment • First approach: contiguous assignment • Process physical address space is contiguous • The whole process is loaded in a partition which is selected at loading time • It is not flexible and makes it hard to apply optimizations (for example, on-demand loading) and services such as dynamic memory • Non-contiguous assignment • Process physical address space is not contiguous • Flexible • Increases the complexity of OS and MMU • Based on • Fixed partitions: Paging • Variable partitions: Segmentation • Combined schemes • For example, segmentation at a first level and paging at a second level (explained in EC course)

Memory assignment: fragmentation • Any non-contiguous scheme of allocation of space suffers from fragmentation • Fragmentation problem: it is not possible to satisfy a given memory request although the system has enough memory to do it • There is free memory but it cannot be assigned to a process • It appears in disk management too • Internal fragmentation: memory assigned to a process that is not going to use it • External fragmentation: free memory that cannot be used to satisfy any memory request because it is not contiguous • It can be avoided by compacting the free memory, which requires the system to support address translation at runtime • Compaction slows down applications
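Both kinds of fragmentation reduce to simple arithmetic. The sketch below uses invented helper names and illustrative sizes: internal fragmentation is the unused tail of the last fixed-size unit, and external fragmentation shows up when the total free memory is large enough but the largest hole is not.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-size units (for example pages) needed for a request, rounded up. */
size_t units_needed(size_t request, size_t unit) {
    assert(unit > 0);
    return (request + unit - 1) / unit;
}

/* Internal fragmentation: bytes assigned to the process but never requested. */
size_t internal_fragmentation(size_t request, size_t unit) {
    return units_needed(request, unit) * unit - request;
}

/* Sum of all free holes. */
size_t total_free(const size_t *holes, size_t n) {
    size_t i, sum = 0;
    for (i = 0; i < n; i++) sum += holes[i];
    return sum;
}

/* Largest single hole: the biggest contiguous request that can be served. */
size_t largest_hole(const size_t *holes, size_t n) {
    size_t i, best = 0;
    for (i = 0; i < n; i++)
        if (holes[i] > best) best = holes[i];
    return best;
}
```

For a 10000-byte request and 4096-byte units, 3 units are assigned and 2288 bytes are wasted. With free holes of 300, 200 and 400 bytes, a contiguous request of 500 bytes fails even though 900 bytes are free.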

Assignment: Paging • Paging based scheme • Logical address space is divided into fixed size partitions: pages • Physical memory is divided into partitions of the same size: frames • Easy to implement memory management since all the frames are equal • Global list of free frames • MMU per-process information stored at PCB • Page: working unit of the OS • Facilitates on-demand loading: 1 page each time • Enables page-level protection: at page level • Facilitates memory sharing between processes : at page level • Usually, a page belongs to just one memory region to match region protection requirements (code/data/heap/stack)

Assignment: Paging • MMU information: Page Table • One entry per page: validity, access permissions(rwx), associated frame, etc. • One table per process • Typically, architectures have a register that points to the current page table
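What the MMU does with one page table entry can be modelled in software. This is only a sketch: the entry layout, the names (`struct pte`, `translate`, the `PERM_*` and `XLATE_*` constants) and the 4 KB page size are assumptions for illustration, and real hardware packs these fields into bits of a single word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12u                     /* assumed page size: 4 KB */
#define PAGE_SIZE (1u << PAGE_BITS)

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };
enum xlate { XLATE_OK, XLATE_INVALID_PAGE, XLATE_BAD_ACCESS };

/* One entry per page: validity, access permissions, associated frame. */
struct pte {
    unsigned valid;
    unsigned perms;
    uint32_t frame;
};

/* Translates a logical address; on success stores the physical address. */
enum xlate translate(const struct pte *pt, size_t n_pages, uint32_t logical,
                     unsigned access, uint32_t *physical) {
    uint32_t page = logical >> PAGE_BITS;
    uint32_t offset = logical & (PAGE_SIZE - 1u);
    assert(pt != NULL && physical != NULL);
    if (page >= n_pages || !pt[page].valid)
        return XLATE_INVALID_PAGE;        /* outside the logical address space */
    if ((pt[page].perms & access) != access)
        return XLATE_BAD_ACCESS;          /* valid address, invalid access */
    *physical = (pt[page].frame << PAGE_BITS) | offset;
    return XLATE_OK;
}
```

With page 1 mapped read-only to frame 5, a read of logical address 0x1234 yields physical address 0x5234, while a write to the same address is rejected. Both failure results correspond to the exception the MMU would throw to the OS.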

Assignment: Paging • PROBLEM: Page table size (stored in memory) • Page size is usually a power of 2 • Typical size: 4 KB (2^12 bytes) • Page size affects • Internal fragmentation and management granularity • Page table size • Scheme to reduce the memory needed by the PT: multi-level PT • The PT is divided into sections, and more sections are added as the process address space grows
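The size problem is easy to quantify. The sketch below assumes 32-bit logical addresses, 4 KB pages and 4-byte entries, plus a two-level split of 10 + 10 + 12 bits as on 32-bit x86; these figures and the function names are assumptions for the example, not values given in the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed by a single-level page table that covers the whole address space. */
uint64_t single_level_pt_bytes(unsigned addr_bits, unsigned page_bits,
                               unsigned entry_bytes) {
    assert(addr_bits > page_bits && addr_bits - page_bits < 64);
    return ((uint64_t)1 << (addr_bits - page_bits)) * entry_bytes;
}

/* Pieces of a 32-bit logical address under a 10/10/12 two-level scheme. */
struct split {
    uint32_t dir;      /* index in the first-level table */
    uint32_t table;    /* index in the second-level table */
    uint32_t offset;   /* offset inside the page */
};

struct split split_two_level(uint32_t logical) {
    struct split s;
    s.dir = logical >> 22;
    s.table = (logical >> 12) & 0x3FFu;
    s.offset = logical & 0xFFFu;
    return s;
}
```

Under these assumptions a flat table costs 4 MB per process (2^20 entries of 4 bytes). With two levels, a second-level table only has to exist for the first-level entries that are in use, at the price of one extra memory access per translation.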

Multi-level page tables • It is a good solution in terms of space requirements, but many memory accesses are required to perform an address translation!!! • Current processors also have a TLB (Translation Lookaside Buffer) • Associative memory (cache) of faster access than RAM to keep translation for active pages • It is necessary to update/invalidate TLB for each change in the MMU • Hardware management / Software management (OS) • Dependent on the architecture
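A TLB behaves like a tiny cache of page-to-frame translations. The model below is a sketch with made-up names, four entries and round-robin replacement (an arbitrary choice); it only shows the hit/miss/flush behavior described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4

struct tlb_entry {
    int valid;
    uint32_t page;
    uint32_t frame;
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    size_t next;             /* next slot to overwrite (round-robin) */
    unsigned hits, misses;
};

/* Invalidate every cached translation (needed when the MMU mapping changes). */
void tlb_flush(struct tlb *t) {
    size_t i;
    assert(t != NULL);
    for (i = 0; i < TLB_ENTRIES; i++) t->e[i].valid = 0;
    t->next = 0;
}

/* Returns 1 and stores the frame on a hit, 0 on a miss. */
int tlb_lookup(struct tlb *t, uint32_t page, uint32_t *frame) {
    size_t i;
    for (i = 0; i < TLB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].page == page) {
            *frame = t->e[i].frame;
            t->hits++;
            return 1;
        }
    }
    t->misses++;             /* the page table in memory must be walked */
    return 0;
}

/* After a miss, cache the translation obtained from the page table. */
void tlb_insert(struct tlb *t, uint32_t page, uint32_t frame) {
    t->e[t->next].valid = 1;
    t->e[t->next].page = page;
    t->e[t->next].frame = frame;
    t->next = (t->next + 1) % TLB_ENTRIES;
}
```

Whether the insert and flush steps are done by hardware or by the OS depends on the architecture.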

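Assignment: Paging • [Figure: the CPU issues a logical @ made of page number (p) and offset (o); on a TLB hit the MMU gets the frame number (f) directly from the TLB, on a TLB miss it reads f and the rw permissions from the page table in memory; the physical @ is f followed by o; an invalid access raises an exception]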

Assignment: Segmentation • Logical address space is divided into variable size partitions (segments) that fit the size that is really needed • At least 3 segments: one for code, one for stack and one for data • References to memory are composed of segment and offset • Any contiguous range of free physical memory is an available partition • However, partitions are not all equal as in paging • Assignment: for each segment in a process • Look for a partition big enough to hold the segment • Possible policies: first fit, best fit, worst fit • Take from the partition just the amount of memory needed to hold the segment; the rest of the partition is kept in a list of free partitions • Can cause external fragmentation
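The first fit policy can be sketched over an array of free partitions. The structure and function names are invented for this example; a real kernel would use a linked list and would also merge neighbouring holes when memory is released.

```c
#include <assert.h>
#include <stddef.h>

/* A free partition: a contiguous range of physical memory. */
struct partition {
    size_t base;
    size_t size;
};

/* First fit: takes 'want' bytes from the first partition that is big enough.
 * The remainder of that partition stays in the list.
 * Returns the base address assigned, or (size_t)-1 if no partition fits. */
size_t first_fit(struct partition *free_list, size_t n, size_t want) {
    size_t i;
    assert(free_list != NULL && want > 0);
    for (i = 0; i < n; i++) {
        if (free_list[i].size >= want) {
            size_t base = free_list[i].base;
            free_list[i].base += want;
            free_list[i].size -= want;
            return base;
        }
    }
    return (size_t)-1;
}
```

Starting with holes of 100, 500 and 300 bytes, requests of 250 and 300 bytes succeed, but a third request of 300 bytes fails although 350 bytes remain free in two separate holes: that is external fragmentation.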

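Assignment: Segmentation • MMU • Segment table • For each segment: base @ and size • One table per process • [Figure: the CPU issues a logical @ made of segment (s) and offset (o); the MMU indexes the segment table with s to get base and limit; if o < limit the physical @ is base + o, otherwise it raises an exception: illegal @]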
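The check performed with the segment table (offset compared with the limit, then added to the base) can be written as a short function. The names below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One segment table entry: where the segment starts and how big it is. */
struct segment {
    uint32_t base;
    uint32_t limit;
};

/* Translates (segment, offset) into a physical address.
 * Returns 0 on success and -1 for an illegal address (the MMU exception). */
int seg_translate(const struct segment *table, size_t n_segments,
                  uint32_t s, uint32_t offset, uint32_t *physical) {
    assert(table != NULL && physical != NULL);
    if (s >= n_segments || offset >= table[s].limit)
        return -1;
    *physical = table[s].base + offset;
    return 0;
}
```

For a segment with base 0x8000 and limit 0x100, offset 0x20 translates to 0x8020, while offset 0x100 is rejected.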

Assignment: Mixed schemes • Mixed schemes: paged segmentation • Process logical address space is divided into segments • Segments are divided into pages • Segment size is a multiple of page size • Page is the OS working unit • [Figure: CPU → logical @ → segmentation unit → linear @ → paging unit → physical @ → physical memory]

Basic services: Explicit shared memory • Explicit memory sharing between processes • Useful as a method to share data between processes • OS must provide programmers with system calls to manage shared memory regions: allocate memory regions and mark them as sharable, so that other processes can map them into their address space

Services to optimize physical memory usage • COW • Virtual Memory • Prefetch

Optimizations: COW (Copy on Write) • Idea: to delay allocation/initialization of physical memory until it is really necessary • If a new zone is never accessed → it is not necessary to assign physical memory to it • If a copied zone is never written → it is not necessary to replicate it • Saves time and physical memory space • It can be applied • When asking for dynamic memory • When creating a new process (fork)

COW: Implementation • Kernel uses the MMU (exception mechanism) to detect write accesses to (speculatively) shared memory pages • MMU • New (logical) pages are initialized with existing (physical) frames, but permissions are set as write-protected (for both the source and the new page) • PCB • Real permissions are kept here to tell faults caused by COW apart from really invalid accesses • When a process tries to write on the new region or on the source region: • OS exception management code performs the actual allocation and copy • Updates the MMU with the real permission for both regions and restarts the instruction that generated the exception
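The mechanism can be simulated in user space to make the steps concrete. This is a toy model under several assumptions: frames are 16-byte arrays with a reference count, each page keeps the permission "seen by the MMU" next to the real one "kept in the PCB", and all names are invented. As a simplification, the other sharer recovers its write permission lazily, on its own first write, rather than at the same time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_FRAMES 8
#define FRAME_BYTES 16                /* tiny frames keep the sketch readable */

struct frame {
    unsigned char data[FRAME_BYTES];
    int refs;                         /* how many pages map this frame */
};
struct frame frames[N_FRAMES];

struct page {
    int frame;
    int writable_mmu;                 /* permission programmed in the MMU */
    int writable_real;                /* real permission, kept in the PCB */
};

static int alloc_frame(void) {
    int i;
    for (i = 0; i < N_FRAMES; i++)
        if (frames[i].refs == 0) { frames[i].refs = 1; return i; }
    return -1;
}

int page_create(struct page *pg, int writable) {
    int f = alloc_frame();
    if (f < 0) return -1;
    memset(frames[f].data, 0, FRAME_BYTES);
    pg->frame = f;
    pg->writable_mmu = pg->writable_real = writable;
    return 0;
}

/* fork time: the child page reuses the parent frame, both write-protected. */
void cow_share(struct page *parent, struct page *child) {
    *child = *parent;
    frames[parent->frame].refs++;
    parent->writable_mmu = 0;
    child->writable_mmu = 0;
}

/* Write access. Returns 0 on success, -1 on an invalid access or no memory. */
int cow_write(struct page *pg, size_t off, unsigned char value) {
    if (off >= FRAME_BYTES) return -1;
    if (!pg->writable_mmu) {                    /* the MMU would raise an exception */
        if (!pg->writable_real) return -1;      /* really invalid access */
        if (frames[pg->frame].refs > 1) {       /* still shared: copy now */
            int nf = alloc_frame();
            if (nf < 0) return -1;
            memcpy(frames[nf].data, frames[pg->frame].data, FRAME_BYTES);
            frames[pg->frame].refs--;
            pg->frame = nf;
        }
        pg->writable_mmu = 1;                   /* real permission restored, retry */
    }
    frames[pg->frame].data[off] = value;
    return 0;
}
```

After `cow_share`, parent and child point to the same frame; the first write by either of them triggers the copy, and the other one keeps the original content.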

COW: example • Compute: • how many pages are modified (and thus cannot be shared)? • how many pages are read-only (and thus can be shared)? • Process A physical memory assignment: • Code: 3 pages, Data: 2 pages, Stack: 1 page, Heap: 1 page • Let's consider that process A executes a fork system call. Just after fork: • Total physical memory: • Without COW: process A = 7 pages + child = 7 pages = 14 pages • With COW: process A = 7 pages + child = 0 pages = 7 pages • Later in the execution… it depends on the code executed by the processes, for example: • If the child executes an exec (and its new address space uses 10 pages): • Without COW: process A = 7 pages + child = 10 pages = 17 pages • With COW: process A = 7 pages + child = 10 pages = 17 pages • If the child does not execute an exec, at least the code will always be shared between both processes, and the rest of the address space depends on the code. If only the code (3 pages) is shared: • Without COW: process A = 7 pages + child = 7 pages = 14 pages • With COW: process A = 7 pages + child = 4 pages = 11 pages

Optimizations: Virtual memory

Optimizations: Virtual memory • Goal • To reduce the amount of physical memory assigned to a process • To increase the potential degree of multiprogramming • Idea: we don't need to have the whole process loaded in memory (we already know that) • What if we introduce a mechanism to move pages out of memory to… (where)? From memory to disk! (New!)

Optimizations: Virtual memory • First approach: swapping of the whole process • Too much penalty to swap in from disk • Next approach: use the MMU and the paging mechanism to offer virtual memory at page granularity • If a new frame is requested and no physical memory is available → we swap out one allocated frame and reuse the hole it leaves • We need a memory replacement algorithm to select a victim frame to move from memory to disk

Optimizations: Virtual memory • Memory replacement algorithm: executed when the OS needs to free frames • Selects a victim page and deletes its translation from the MMU • Tries to select victim pages that are no longer necessary or that will not be needed for a long time • Example: Least Recently Used (LRU) or approximations • Stores its contents in the swap area • Assigns the freed frame to the page that requires it • Page fault: when a non-present (but valid) page is referenced → the MMU throws an exception to the OS as it cannot perform the translation • Kernel exception code for page fault management • Checks if the access is valid (the PCB always contains the full information) • Assigns a free frame to the page (starts the memory replacement algorithm if necessary) • Searches for the content of the page in the swap area and writes it into the selected frame • Updates the MMU with the physical address assigned
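The LRU policy mentioned above can be tried out with a small simulation that counts page faults for a sequence of page references. It is a sketch with invented names: exact LRU is tracked with a timestamp per frame, whereas real kernels usually rely on cheaper approximations, and the swap area is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16

/* Counts the page faults produced by the reference string refs[0..n)
 * when n_frames physical frames are available and the victim is chosen by LRU. */
unsigned lru_page_faults(const int *refs, size_t n, size_t n_frames) {
    int page[MAX_FRAMES];          /* page held by each frame */
    size_t last_use[MAX_FRAMES];   /* time of the last reference to each frame */
    size_t loaded = 0, t, i;
    unsigned faults = 0;
    assert(n_frames > 0 && n_frames <= MAX_FRAMES);
    for (t = 0; t < n; t++) {
        size_t hit = loaded;                   /* 'loaded' means "not present" */
        for (i = 0; i < loaded; i++)
            if (page[i] == refs[t]) { hit = i; break; }
        if (hit == loaded) {                   /* page fault */
            size_t slot;
            faults++;
            if (loaded < n_frames) {
                slot = loaded++;               /* a free frame is available */
            } else {
                slot = 0;                      /* victim: least recently used */
                for (i = 1; i < loaded; i++)
                    if (last_use[i] < last_use[slot]) slot = i;
            }
            page[slot] = refs[t];
            hit = slot;
        }
        last_use[hit] = t;
    }
    return faults;
}
```

For the reference string 1, 2, 3, 1, 4, 2 the function reports 5 faults with 3 frames and 4 faults with 4 frames.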
