File Management
Files
• Used for the input and output of data
• Application output can be stored permanently in files
Terms Used with Files
• Field/Item
  • basic element of data
  • contains a single value
  • characterized by its length and data type
• Record
  • collection of related fields
  • treated as a unit
  • Example: employee record
Terms Used with Files
• File
  • collection of similar records
  • treated as a single entity
  • has a unique file name
  • access may be restricted
• Database
  • collection of related data
  • relationships exist among elements
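Types of Files
• By protection mode
  • read-only files
  • read-write files
  • unprotected files
• By organization in Linux
  • regular files
  • directory files
  • special files (e.g. terminals, printers, network devices)
• By purpose
  • system files
  • library files (*.DLL dynamic link libraries)
  • user files (*.doc)
• By usage
  • temporary files (*.tmp, ~xxx.doc)
  • archive files
  • permanent files
• By direction of information flow
  • input files
  • output files
  • input/output files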
File Naming
• Short file name format (8.3 format), e.g. fdisk.exe
• Long file name format: up to 255 characters
• Case sensitivity
  • Linux: case-sensitive
  • Windows: not case-sensitive
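File Operations
File operation functions in C
• fopen(), fclose(), fread(), fwrite(), fseek(), feof(), fgetc(), fgets(), fprintf(), fputc(), fputs(), fscanf()
• ftell(): get the current file position
• rewind(): return to the beginning of the file
General file operations
• Create: create a file
• Delete: delete a file
• Open: open a file
• Close: close a file
• Retrieve_All: retrieve the entire contents
• Retrieve_One: retrieve one record
• Retrieve_Next: retrieve the next record
• Retrieve_Previous: retrieve the previous record
• Insert_One: insert one record
• Delete_One: delete one record
• Update_One: update one record
• Seek: set the read/write position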
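The record-level operations listed above (Seek, Retrieve_One, Insert_One, Update_One) can be sketched on a file of fixed-length records. This is only an illustration in Python, not the C API from the slide: the record layout (a 4-byte id plus a 12-byte name) and the function names are assumptions made for the example, and an in-memory buffer stands in for a real file.

```python
import io
import struct

# Hypothetical record layout: 4-byte little-endian id + 12-byte name (NUL-padded).
RECORD = struct.Struct("<i12s")

def insert_one(f, rec_id, name):
    """Insert_One: append a record at the end of the file."""
    f.seek(0, io.SEEK_END)
    f.write(RECORD.pack(rec_id, name.encode("ascii")))

def retrieve_one(f, n):
    """Seek + Retrieve_One: read the n-th record (0-based)."""
    f.seek(n * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\0").decode("ascii")

def update_one(f, n, rec_id, name):
    """Update_One: overwrite the n-th record in place."""
    f.seek(n * RECORD.size)
    f.write(RECORD.pack(rec_id, name.encode("ascii")))

def demo():
    f = io.BytesIO()  # stands in for a file opened in binary read/write mode
    insert_one(f, 1, "alice")
    insert_one(f, 2, "bob")
    update_one(f, 1, 2, "robert")
    return retrieve_one(f, 0), retrieve_one(f, 1)
```

Because every record has the same size, the position of record n is simply n times the record size, which is what makes Seek followed by a single read sufficient.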
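File Management System
• Provides file access services to users
• Users do not need to develop their own file management software
• Other functions provided to users
  • create, read, write, and delete files
  • specify the access rights other users have to one's own files
  • controlled access to other users' files
  • restructure files
  • back up files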
File System Software Architecture
Layers, from top to bottom:
• User Program
• Access methods (logical file organizations): Pile, Sequential, Indexed Sequential, Indexed, Hashed (direct)
• Logical I/O
• Basic I/O Supervisor
• Basic File System
• Disk Device Driver, Tape Device Driver
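Functions of File Management
• Directory management
• Organization of file contents
• Management of file storage space
• File operations
• File sharing, protection, and confidentiality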
Criteria for File Organization
• Rapid access
  • needed when accessing a single record
  • not needed for batch mode
• Ease of update
  • a file on CD-ROM will not be updated, so this is not a concern
• Economy of storage
  • there should be minimum redundancy in the data
  • redundancy can be used to speed access, such as an index (an index adds redundancy in exchange for faster access)
• Simple maintenance
• Reliability
File Organization (Logical Organization of Files)
1. The Pile
• data are collected in the order they arrive
• purpose is to accumulate a mass of data and save it
• records may have different fields (each record, i.e. the data collected at one time, may have a different structure)
• no structure (the file has no uniform record structure)
• record access is by exhaustive search
[Figure: Records 1 to 6 stored one after another in arrival order]
File Organization
2. The Sequential File
• fixed format used for records
  • records are the same length
  • all fields the same (order and length)
  • field names and lengths are attributes of the file
• one field is the key field
  • uniquely identifies the record
  • records are stored in key sequence
• new records are first placed in a log file or transaction file
• a batch update is performed periodically to merge the log file with the master file
[Figure: Record 1, Record 2, Record 3 stored in key sequence]
File Organization
3. Indexed Sequential File
• index provides a lookup capability to quickly reach the vicinity of the desired record
  • each index entry contains a key field and a pointer into the main file
  • the index is searched to find the highest key value that is equal to or less than the desired key value
  • the search continues in the main file at the location indicated by the pointer
File Organization
• Indexed Sequential File
  • new records are added to an overflow file
  • the record in the main file that immediately precedes the new record in key sequence is updated to contain a pointer to the new record
  • the overflow file is merged with the main file during a batch update
  • multiple levels of indexing on the same key field can be set up to increase efficiency
[Figure: index levels n, ..., 2, 1 pointing into the Main File, which points into the Overflow File]
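The index lookup for an indexed sequential file can be sketched as follows. This is a minimal single-level illustration, assuming the main file is an in-memory list of (key, data) tuples sorted by key; the function names and the `step` parameter are hypothetical, and the overflow file is left out to keep the sketch short.

```python
import bisect

def build_index(main_file, step):
    """Create one index entry (key, position) for every `step`-th record."""
    return [(main_file[pos][0], pos) for pos in range(0, len(main_file), step)]

def lookup(main_file, index, key):
    """Return the record with `key`, or None if it is not in the main file."""
    keys = [k for k, _ in index]
    i = bisect.bisect_right(keys, key) - 1  # highest index key <= desired key
    if i < 0:
        return None  # desired key is smaller than every indexed key
    pos = index[i][1]
    # Continue sequentially in the main file from the indexed position.
    while pos < len(main_file) and main_file[pos][0] <= key:
        if main_file[pos][0] == key:
            return main_file[pos]
        pos += 1
    return None
```

For example, with records keyed 10, 20, ..., 190 and `step=5`, looking up key 130 starts the sequential scan at key 110 instead of at the start of the file.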
File Organization
• Comparison of sequential and indexed sequential files
  • Example: a file contains 1 million records
  • On average, 500,000 accesses are required to find a record in a sequential file
  • If an index contains 1000 entries, each entry covers 1000 records of the main file. It takes on average 500 accesses to find the key in the index, followed by 500 accesses in the main file, so the average drops to 1000 accesses
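The arithmetic in this comparison can be checked with a short calculation. The function names are made up for the example, and the model assumes a linear search through both the index and the covered part of the main file, as the slide does.

```python
def sequential_avg(n_records):
    """Average accesses for a linear search of a sequential file."""
    return n_records / 2

def indexed_sequential_avg(n_records, n_index_entries):
    """Average accesses with a single-level index that is searched linearly."""
    records_per_entry = n_records / n_index_entries
    return n_index_entries / 2 + records_per_entry / 2
```

With 1,000,000 records and a 1000-entry index, the two functions give 500,000 and 1000 accesses respectively.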
File Organization
4. Indexed File
• uses multiple indexes for different key fields
• may contain an exhaustive index that contains one entry for every record in the main file
• may contain a partial index
[Figure: two exhaustive indexes and one partial index pointing into the main file]
File Organization
5. The Direct, or Hashed, File
• directly access a block at a known address
• key field required for each record
[Figure: Key -> Hash Function f -> Primary File, with an Overflow File for records that do not fit in the primary file]
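A hashed file with a primary file and an overflow file might look like the sketch below. It is an assumption-laden illustration: keys are integers, the hash function f is taken to be `key % n_slots`, each primary address holds one record, and the class and method names are invented for the example.

```python
class HashedFile:
    def __init__(self, n_slots):
        self.primary = [None] * n_slots  # primary file: one record per address
        self.overflow = []               # overflow file: records whose address was taken

    def _address(self, key):
        return key % len(self.primary)   # assumed hash function f (integer keys)

    def insert(self, key, value):
        addr = self._address(key)
        slot = self.primary[addr]
        if slot is None or slot[0] == key:
            self.primary[addr] = (key, value)
            return
        # Collision: the record goes to the overflow file.
        for i, (k, _) in enumerate(self.overflow):
            if k == key:
                self.overflow[i] = (key, value)
                return
        self.overflow.append((key, value))

    def get(self, key):
        slot = self.primary[self._address(key)]
        if slot is not None and slot[0] == key:
            return slot[1]               # found with a single direct access
        for k, v in self.overflow:       # otherwise search the overflow file
            if k == key:
                return v
        return None
```

Keys 3 and 10 collide in a 7-slot file, so the second one inserted lands in the overflow file and costs an extra search on retrieval.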
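File Organization
6. Partitioned File
• compressed archives (.zip, .rar)
• function libraries (.lib)
[Figure: an index area with entries for Part A, Part B, and Part C, pointing into a file area that stores the parts in the order Part A, Part C, Part B]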
File Directories
• Contains information about files
  • attributes
  • location
  • ownership
• Directory itself is a file owned by the operating system
• Provides mapping between file names and the files themselves
• FCB (file control block)
Simple Structure for a Directory
• List of entries, one for each file
• Sequential file with the name of the file serving as the key
• Forces the user to be careful not to use the same name for two different files
Two-level Scheme for a Directory
• One directory for each user and a master directory
• Master directory contains an entry for each user
  • provides address and access control information
• Each user directory is a simple list of files for that user
• Still provides no help in structuring collections of files
Hierarchical, or Tree-Structured Directory
• Master directory with user directories underneath it
• Each user directory may have subdirectories and files as entries
• Files can be located by following a path from the root, or master, directory down various branches
  • this is the pathname for the file
• Can have several files with the same file name as long as they have unique path names
• Current directory is the working directory
• Files are referenced relative to the working directory (a relative path), e.g. ../Help/index.htm
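How a relative path is combined with the working directory can be sketched with Python's `posixpath` module. The working directory used in the usage note is a hypothetical value chosen for illustration, and the resolution is purely lexical (it does not consult a real file system or follow links).

```python
import posixpath

def resolve(working_dir, path):
    """Turn a path into an absolute pathname, relative to the working directory."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(working_dir, path))
```

With the working directory `/User B/Word`, the relative path `../Help/index.htm` resolves to `/User B/Help/index.htm`, and `Unit A/ABC` resolves to `/User B/Word/Unit A/ABC`.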
[Figure: tree-structured directory. The Master Directory has entries System, User A, User B, and User C, each leading to its own directory. Lower-level directories include Draw, Word, and Unit A, and two different files are both named ABC.]
Pathname: /User B/Word/Unit A/ABC
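Directory Operations
Operation                       DOS command   Linux command
Create a directory              md            mkdir
Remove a directory              rd            rmdir
List a directory                dir           ls
Change the current directory    cd            cd
Create a directory link
Remove a link
Show the current path           cd            pwd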
File Sharing
• Way to control access to a particular file
• Users or groups of users are granted certain access rights to a file
Access Rights
• None
  • user may not know of the existence of the file
  • user is not allowed to read the user directory that includes the file
• Knowledge
  • user can only determine that the file exists and who its owner is
• Execution
  • the user can load and execute a program but cannot copy it
• Reading
  • the user can read the file for any purpose, including copying and execution
• Appending
  • the user can add data to the file but cannot modify or delete any of the file contents
Access Rights
• Updating
  • the user can modify, delete, and add to the file's data. This includes creating the file, rewriting it, and removing all or part of the data
• Changing protection
  • user can change access rights granted to other users
• Deletion
  • user can delete the file
• Owners
  • have all rights previously listed
  • may grant rights to others using the following classes of users
    • specific user
    • user groups
    • all, for public files
Simultaneous Access
• User may lock the entire file when it is to be updated
• User may lock individual records during the update
• Mutual exclusion and deadlock are issues for shared access
Record Blocking Methods: Fixed Blocking (fixed block size)
[Figure: records R5 to R8 stored in fixed-size blocks on Track 2. Legend: data; waste due to record fit to block size; gaps due to hardware design; waste due to block size constraint from fixed record size; waste due to block fit to track size]
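The waste caused by fitting fixed-length records into fixed-size blocks can be estimated with a small calculation. The function name and the example sizes in the usage note are assumptions for illustration; hardware gaps and track-fit waste are not modeled.

```python
def fixed_blocking(block_size, record_size):
    """Return (records per block, unused bytes per block) for fixed blocking."""
    records_per_block = block_size // record_size
    wasted = block_size - records_per_block * record_size
    return records_per_block, wasted
```

For instance, 120-byte records in 512-byte blocks give 4 records per block with 32 bytes left unused in each block.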
Record Blocking Methods: Variable Blocking, Spanned (variable-length records that may span blocks)
[Figure: records R1 to R13 stored on Track 1 and Track 2, with R4, R6, and R9 each split across two blocks. Legend: data; waste due to record fit to block size; gaps due to hardware design; waste due to block size constraint from fixed record size; waste due to block fit to track size]
Record Blocking Methods: Variable Blocking, Unspanned (variable-length records that may not span blocks)
[Figure: records R1 to R10 stored on Track 1 and Track 2, with no record split across blocks. Legend: data; waste due to record fit to block size; gaps due to hardware design; waste due to block size constraint from fixed record size; waste due to block fit to track size]
Portion Size
• Contiguity of space increases performance (larger portions improve system performance)
• A large number of small portions increases the size of the tables needed to manage allocation
• Fixed-size portions simplify the reallocation of space
• Variable-size portions minimize waste from unused storage
File Allocation Methods
• Contiguous allocation (bitmap method)
  • a single set of blocks is allocated to a file at the time of creation (the file occupies one contiguous region)
  • only a single entry in the file allocation table
    • starting block and length of the file
  • Fragmentation will occur on the disk
    • it will become difficult to find contiguous blocks of sufficient length
Contiguous File Allocation
File Allocation Table
File Name   Start Block   Length
FileA       2             3
FileB       9             5
FileC       18            8
FileD       30            2
FileE       26            3
[Figure: disk blocks numbered 0 to 34, with the contiguous runs of FileA to FileE marked]
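Using the file allocation table above, the address calculation for contiguous allocation, and the fragmentation problem it leads to, can be sketched as follows. The function names are invented for the example, and first-fit is just one assumed policy for finding a free run; the slides do not name one.

```python
# (start block, length) per file, copied from the file allocation table above.
FAT = {"FileA": (2, 3), "FileB": (9, 5), "FileC": (18, 8),
       "FileD": (30, 2), "FileE": (26, 3)}

def physical_block(fat, name, logical_block):
    """Map a file's logical block number to a disk block: start + offset."""
    start, length = fat[name]
    if not 0 <= logical_block < length:
        raise IndexError("logical block outside the file")
    return start + logical_block

def first_fit(fat, disk_blocks, needed):
    """Return the start of the first free run of `needed` blocks, or None."""
    used = set()
    for start, length in fat.values():
        used.update(range(start, start + length))
    run_start, run_len = None, 0
    for b in range(disk_blocks):
        if b in used:
            run_len = 0
            continue
        if run_len == 0:
            run_start = b
        run_len += 1
        if run_len == needed:
            return run_start
    return None
```

On this 35-block disk, a new 3-block file fits at block 5, but a 5-block file cannot be placed at all even though 14 blocks are free in total, because no free run is longer than 4 blocks.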
File Allocation Methods
• Chained allocation
  • allocation is on the basis of individual blocks
  • each block contains a pointer to the next block in the chain
  • only a single entry in the file allocation table
    • starting block and length of the file
  • No external fragmentation
    • any free block can be added to the chain
  • No accommodation of the principle of locality
Chained File Allocation
File Allocation Table
File Name   Start Block   Length
...         ...           ...
FileB       1             5
...         ...           ...
[Figure: disk blocks numbered 0 to 34, with the blocks of FileB linked into a chain]
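Reading a chained file means following the pointer stored in each block. In the sketch below, the start block (1) and length (5) come from the table above, but the individual next-block pointers are assumed values chosen for illustration, since the transcript does not list them; the function name is also hypothetical.

```python
# Assumed next-block pointers for FileB (illustrative only; None ends the chain).
NEXT = {1: 8, 8: 3, 3: 14, 14: 28, 28: None}

def chain_blocks(next_ptr, start, length):
    """Return the disk blocks of a file by following its chain from `start`."""
    blocks, b = [], start
    for _ in range(length):
        if b is None:
            raise ValueError("chain is shorter than the recorded length")
        blocks.append(b)
        b = next_ptr[b]
    return blocks
```

Reaching logical block k requires following k pointers from the start block, which is why chained allocation suits sequential access better than direct access.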
File Allocation Methods
• Indexed allocation
  • the file allocation table contains a separate one-level index for each file
  • the index has one entry for each portion allocated to the file
  • the file allocation table contains the block number of the index
Indexed Allocation with Block Portions
File Allocation Table
File Name   Index Block
...         ...
FileB       24
...         ...
Index block 24 contains the block numbers 1, 8, 3, 14, 28.
[Figure: disk blocks numbered 0 to 34, with index block 24 pointing to the blocks of FileB]
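With block-sized portions, the index block acts as a direct lookup table from logical block number to disk block. A minimal sketch, using the contents of index block 24 from this slide (the function and variable names are assumptions):

```python
# Contents of index block 24 for FileB, as shown on the slide.
INDEX_BLOCK_24 = [1, 8, 3, 14, 28]

def indexed_lookup(index, logical_block):
    """Map a logical block number to a disk block via the file's index."""
    if not 0 <= logical_block < len(index):
        raise IndexError("logical block outside the file")
    return index[logical_block]
```

Unlike chained allocation, any logical block is reachable with one index lookup, so direct access is as cheap as sequential access.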
Indexed Allocation with Variable-Length Portions
File Allocation Table
File Name   Index Block
...         ...
FileC       24
...         ...
Index block 24 contains:
Start Block   Length
1             3
28            4
14            1
[Figure: disk blocks numbered 0 to 34, labeled FileB, with index block 24 pointing to three portions]
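With variable-length portions, each index entry is a (start block, length) pair, so a lookup has to walk the entries until it finds the portion containing the requested block. A sketch using the three entries from this slide (the function and variable names are assumptions):

```python
# (start block, length) entries from index block 24 on the slide.
PORTIONS = [(1, 3), (28, 4), (14, 1)]

def portion_lookup(portions, logical_block):
    """Map a logical block number to a disk block through (start, length) entries."""
    if logical_block < 0:
        raise IndexError("logical block outside the file")
    remaining = logical_block
    for start, length in portions:
        if remaining < length:
            return start + remaining
        remaining -= length
    raise IndexError("logical block outside the file")
```

The file has 8 blocks in total: logical blocks 0 to 2 map to disk blocks 1 to 3, logical blocks 3 to 6 map to disk blocks 28 to 31, and logical block 7 maps to disk block 14.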
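FAT32: Master Boot Record (MBR)
• The MBR is the master boot sector, located at head 0, track 0, sector 1 (the first sector)
[Figure: MBR (first sector) layout]
[Figure: structure of a partition table entry]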
FAT32
[Figure: structure of each partition]
[Figure: structure of a file directory entry in 8.3 format]
Chained Allocation in FAT32 under DOS
• FAT: File Allocation Table
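Overview of the Linux File System
(1) The file system is organized as a hierarchical tree. The Linux file system is essentially an inverted tree: the root of the tree is the root directory, every node is a directory, and the leaves are data files. Each user can build a file system of their own and mount it onto the Linux file system, forming a larger tree. A mounted file system can also be unmounted intact, which makes the file system as a whole flexible and convenient.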
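(2) The physical structure of a file is a mixed index structure. This means the physical structure may combine several forms of index, such as single-level, two-level, and multi-level indexes. This structure both speeds up file lookup and saves the space needed to store file addresses.
(3) Free disk blocks are managed with the grouped linking method. This method combines the free-table and free-list methods. It keeps the advantages of both while avoiding their shared drawback of an overly long table (or list), so it both speeds up the search for free blocks and saves the space needed to store block numbers.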
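(4) The index node (inode) is introduced. In Linux, the file name and the file's descriptive information are separated and stored as an entry in a directory file and an entry in the inode table, respectively. This speeds up file lookup, reduces the I/O load on the channel, and makes linking and sharing files much easier.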
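The mixed index structure described in item (2) can be illustrated by working out which level of index covers a given logical block. All parameters here are hypothetical values chosen for the example (10 direct pointers and 256 pointers per index block); the slides do not give the actual figures.

```python
def index_level(logical_block, n_direct=10, ptrs_per_block=256):
    """Return which index level addresses the given logical block."""
    if logical_block < 0:
        raise ValueError("negative block number")
    if logical_block < n_direct:
        return "direct"
    logical_block -= n_direct
    if logical_block < ptrs_per_block:
        return "single indirect"
    logical_block -= ptrs_per_block
    if logical_block < ptrs_per_block ** 2:
        return "double indirect"
    logical_block -= ptrs_per_block ** 2
    if logical_block < ptrs_per_block ** 3:
        return "triple indirect"
    raise ValueError("block number exceeds the maximum file size")

def max_blocks(n_direct=10, ptrs_per_block=256):
    """Largest file, in blocks, addressable with up to three levels of index."""
    return n_direct + ptrs_per_block + ptrs_per_block ** 2 + ptrs_per_block ** 3
```

Small files are reached through the direct pointers with no extra disk access, while only large files pay for the additional index levels; this is the trade-off between lookup speed and address storage mentioned in item (2).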
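2. Structure of the File System
Because Linux stores file names and file attributes (descriptions) separately, with the attributes forming the file's inode, a Linux directory entry differs from the directory entry of an ordinary file system, and the structure of the Linux file system differs accordingly. Figure 10-18 shows the structure of the Linux file system after the introduction of inodes. In the root directory, bin is the root of the binary system files, usr is the root of the user files, and dev is the root of the special files.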
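3. Resource Management in the File System
Keeping a file in the system consumes resources. While a file is not open, it occupies three kinds of resources:
(1) A directory entry, which records the file's name and the number of its inode.
(2) A disk inode, which records the file's attributes and descriptive information. These reside on disk.
(3) A number of disk blocks, which hold the file's contents.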
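The split between a directory entry (name plus inode number) and the inode itself can be observed with a hard link: two names that refer to the same inode. The sketch below assumes a POSIX-style file system that supports hard links (as on Linux); the file names and the function name are made up for the example.

```python
import os
import tempfile

def link_demo():
    """Create two directory entries for one inode and remove one of them."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        with open(a, "w") as f:
            f.write("hello")
        os.link(a, b)                  # second directory entry for the same inode
        sa, sb = os.stat(a), os.stat(b)
        same_inode = sa.st_ino == sb.st_ino
        link_count = sa.st_nlink       # number of directory entries naming the inode
        os.remove(a)                   # removes one directory entry only
        with open(b) as f:
            data = f.read()            # the inode and its disk blocks are still there
        return same_inode, link_count, data
```

Removing one name leaves the data reachable through the other, because the disk blocks belong to the inode rather than to any single directory entry.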
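When a file is referenced or opened, three more kinds of resources are needed:
(1) An in-memory inode, which resides in main memory.
(2) An entry in the file table.
(3) An entry in the user file descriptor table.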
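The difference between a file table entry and a user file descriptor table entry shows up in how read offsets are shared. The sketch below assumes POSIX semantics (as on Linux), where the current offset is kept in the file table entry; the file name and function name are invented for the example.

```python
import os
import tempfile

def offset_demo():
    """Compare the offsets seen through a duplicated and a separately opened descriptor."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"0123456789")
        fd1 = os.open(path, os.O_RDONLY)   # new file table entry + descriptor
        fd2 = os.dup(fd1)                  # new descriptor, same file table entry
        fd3 = os.open(path, os.O_RDONLY)   # second, independent file table entry
        try:
            os.read(fd1, 4)                # advances the offset of fd1's entry
            shared = os.lseek(fd2, 0, os.SEEK_CUR)
            independent = os.lseek(fd3, 0, os.SEEK_CUR)
        finally:
            for fd in (fd1, fd2, fd3):
                os.close(fd)
        return shared, independent
```

After reading 4 bytes through `fd1`, the duplicated descriptor reports offset 4 while the separately opened one still reports 0; all three descriptors refer to the same in-memory inode.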