Reverse Engineered Architecture of the Linux Kernel Kristof De Vos
Why Linux Kernel? • Linux is a Unix-like operating system • Large system: 800 KLOC • Open source • no barriers to discussing the details of the system implementation • Has no fully documented architecture
Conceptual vs. Concrete Architecture • Conceptual: • how developers think about the system • meaningful relationships • Concrete: • as-built (as in the implementation) • might include dependencies for debugging, ...
6 Steps 1 Examine existing documentation 2 Form conceptual architecture 3 Group source files into subsystems based on: • directory structure • naming conventions • source code comments • examination of source code
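Step 3 can be partly mechanized when the directory structure is informative. The following is a minimal sketch, assuming a hypothetical mapping from top-level kernel directories to the subsystems named in these slides; the mapping itself and the function names are illustrative and not taken from the study.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from top-level directories to subsystems.
# The directory names exist in the kernel tree; the assignment is an assumption.
DIRECTORY_TO_SUBSYSTEM = {
    "kernel": "Process Scheduler",
    "mm": "Memory Manager",
    "fs": "File System",
    "drivers": "File System",
    "net": "Network Interface",
    "ipc": "Inter-Process Communication",
    "init": "Initialization",
    "lib": "Library",
}


def assign_subsystem(source_path):
    """Return the subsystem for a source file, based on its top-level directory."""
    parts = PurePosixPath(source_path).parts
    top_level = parts[0] if parts else ""
    return DIRECTORY_TO_SUBSYSTEM.get(top_level, "Unassigned")


def group_by_subsystem(source_paths):
    """Group source file paths into a dict keyed by subsystem name."""
    groups = {}
    for path in source_paths:
        groups.setdefault(assign_subsystem(path), []).append(path)
    return groups
```

Files that land in "Unassigned" are the ones that still need the other criteria from step 3: naming conventions, comments, or reading the code.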
6 Steps (continued) 4 Extract relations between source files 5 Use relations between source files to determine relations between subsystems 6 Form concrete architecture
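Step 5 amounts to "lifting" file-level relations to the subsystem level. A minimal sketch, assuming relations are given as (source file, target file) pairs and that each file has already been assigned to a subsystem; the data shapes and names are illustrative, not the format used by the actual tools.

```python
def lift_relations(file_relations, file_to_subsystem):
    """Turn (file, file) relations into a set of (subsystem, subsystem) relations.

    Relations inside a single subsystem and relations involving
    unassigned files are dropped.
    """
    subsystem_relations = set()
    for source_file, target_file in file_relations:
        source_subsystem = file_to_subsystem.get(source_file)
        target_subsystem = file_to_subsystem.get(target_file)
        if source_subsystem is None or target_subsystem is None:
            continue  # at least one file has no subsystem assignment
        if source_subsystem == target_subsystem:
            continue  # internal relation, not an architectural dependency
        subsystem_relations.add((source_subsystem, target_subsystem))
    return subsystem_relations
```

Using a set means that many file-level relations between the same two subsystems collapse into a single subsystem-level dependency.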
Conceptual Architecture • Based on descriptions of related operating systems and existing Linux documentation
7 major subsystems • Process Scheduler • responsible for multitasking • Memory Manager • separates memory spaces for each process • uses swapping to support more processes • File System • access to hardware devices
7 major subsystems • Network Interface • access to network devices • Inter-Process Communication (IPC) • allows communication between processes on the same processor • Initialization • responsible for initialization of the rest of the kernel • Library • routines, used by the whole kernel
File System sub-architecture • Extracted roles: • provides access to a variety of hardware devices • supports several logical file system formats • allows programs to be stored in several executable formats • Further investigation: Facade design pattern • subsystems are accessible through a single interface • subsystem interdependency is reduced
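The Facade idea can be illustrated with a small sketch: callers talk to one interface, which forwards to whichever logical file system is responsible. All class and method names below are hypothetical and only illustrate the pattern; they are not the kernel's actual interfaces.

```python
class Ext2FileSystem:
    """Stand-in for one logical file system (hypothetical)."""

    def read(self, relative_path):
        return f"ext2:{relative_path}"


class MsdosFileSystem:
    """Stand-in for another logical file system (hypothetical)."""

    def read(self, relative_path):
        return f"msdos:{relative_path}"


class FileSystemFacade:
    """Single entry point that hides the individual file systems from callers."""

    def __init__(self):
        self._mounts = {}

    def mount(self, prefix, file_system):
        self._mounts[prefix] = file_system

    def read(self, path):
        # Pick the most specific (longest) mount prefix that matches the path.
        matches = [prefix for prefix in self._mounts if path.startswith(prefix)]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)
        return self._mounts[prefix].read(path[len(prefix):])


facade = FileSystemFacade()
facade.mount("/", Ext2FileSystem())
facade.mount("/mnt/dos/", MsdosFileSystem())
```

Callers depend only on `FileSystemFacade`, so adding or replacing a logical file system does not add dependencies to other subsystems.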
File System subsystems • Main roles are implemented in 5 subsystems: • Device Drivers • perform all communication with hardware devices • Logical File Systems • implement several logical file systems • allow interoperability with other operating systems • encryption, compression, high performance, ...
File System subsystems • Executable File Formats • allow execution of different executable formats • File Quota • limits the amount of file storage for individual users • Buffer Cache • memory buffers for I/O operations • 2 other subsystems define facade interfaces • all information is extracted from other documentation
Extraction • Manual examination too costly (800 KLOC) • automated tools (grok): • manually define the subsystem hierarchy • manually assign source files to subsystems • let the beast loose: • grok examines all source files • finds relations between subsystems • output is not readable for humans • lsedit visually shows relations
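The slides do not say which kinds of relations the tool extracts. As a rough illustration of automated extraction only, the sketch below assumes that `#include` directives serve as the file-level relation; this is a simplification chosen for the example, not a description of how grok works.

```python
import re

# Matches lines such as:  #include <linux/fs.h>   or   #include "internal.h"
INCLUDE_PATTERN = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]', re.MULTILINE)


def extract_include_relations(sources):
    """Return a set of (file, included_header) pairs.

    `sources` maps a file name to the text of that file.
    """
    relations = set()
    for file_name, text in sources.items():
        for header in INCLUDE_PATTERN.findall(text):
            relations.add((file_name, header))
    return relations
```

The resulting pairs are file-level facts; they only become architectural information after each file has been assigned to a subsystem.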
Concrete Architecture • Combination of • conceptual architecture • subsystem hierarchy • results of automated tools
Concrete Architecture • Same subsystems, but different dependencies • 19 (conceptual) vs. 37 (concrete) dependencies between subsystems • reasons: • efficiency • exploration, maybe not really needed • possibly faulty
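Comparing the two architectures comes down to comparing two sets of subsystem dependencies. A minimal sketch with toy data; the dependencies listed here are invented for illustration and are not the 19 or 37 dependencies found in the study.

```python
def compare_architectures(conceptual, concrete):
    """Split dependencies into shared, unexpected (concrete only), and absent (conceptual only)."""
    return {
        "shared": conceptual & concrete,
        "unexpected": concrete - conceptual,
        "absent": conceptual - concrete,
    }


# Toy data, not the study's results.
conceptual_dependencies = {
    ("File System", "Memory Manager"),
    ("Network Interface", "File System"),
}
concrete_dependencies = {
    ("File System", "Memory Manager"),
    ("Network Interface", "File System"),
    ("Process Scheduler", "File System"),
}

comparison = compare_architectures(conceptual_dependencies, concrete_dependencies)
```

The "unexpected" set is the one to inspect by hand: each entry is a dependency that exists in the code but not in the developers' mental model.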