1 / 20

Innodb 表空间数据文件格式

Innodb 表空间数据文件格式. xbjiang@corp.netease.com. 内容. 表空间数据文件内部格式介绍 记录格式介绍及 innodb 的 B+ 树应用。 实例演示,通过简单的工具来查看内部的实现。. innodb 数据文件组成. 表空间数据文件可以分为 2 种: 系统表空间: 保存了 innodb 数据库系统数据,也可以保存数据和索引。 ( 在 innodb 中以索引来保存数据 ) 2. 单独表空间: 当设置 innodb_file_per_table 参数时,数据库保存表的数据和索引到单独的文件,即表空间. 系统表空间的介绍.

Download Presentation

Innodb 表空间数据文件格式

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Innodb 表空间数据文件格式 xbjiang@corp.netease.com

  2. 内容 • 表空间数据文件内部格式介绍 • 记录格式介绍及innodb的B+树应用。 • 实例演示,通过简单的工具来查看内部的实现。

  3. innodb数据文件组成 • 表空间数据文件可以分为2种: • 系统表空间: 保存了innodb数据库系统数据,也可以保存数据和索引。(在innodb中以索引来保存数据) 2. 单独表空间: 当设置innodb_file_per_table参数时,数据库保存表的数据和索引到单独的文件,即表空间

  4. 系统表空间的介绍 • 系统表空间可以有多个文件组成,在配置中指定。 • 系统表空间既包括了innodb数据库的表定义元数据信息,又包括了表的数据和索引信息,还包括一些系统信息如:Insert Buffer, rollback segment, log文件信息.

  5. 单独表空间的介绍 • 单独表空间只有一个数据文件组成。保存了表的数据和索引。 • 单独表空间在数据库中被描述的元数据被保存在系统表空间中,所以innodb的单独表空间不能像myisam表一样简单的导入到别的数据库下。

  6. 表空间文件的物理组成 • 数据文件的最小单位是页,一页的默认大小是16K(8K 16K 32K 64K),可以在编译的时候修改源代码。 /* The universal page size of the database */ #define UNIV_PAGE_SIZE (2 * 8192) /* NOTE! Currently, this has to be a power of 2 */ /* The 2-logarithm of UNIV_PAGE_SIZE: */ #define UNIV_PAGE_SIZE_SHIFT 14

  7. 表空间文件的物理组成 • Innodb对64个page组成1个extent来管理。 • extent的使用可以分为2种:full extent, fragment extent. full extent 被分配的时候,64个page一起被分配。 fragment extent 可以从中分配单独的1个page。

  8. 表空间文件的逻辑组成 • innodb在其内部用segment为来组织数据的存储。 • 1个segment的物理页面有多个full extent和从fragment extent中申请的单独的page组成。 segment以3个链表来保存申请的full extent. 分别表示完全使用,部分使用和空闲。 在1个32slot的数组来保存从fragment extent申请的page,所以一个segment最多有32个从fragment extent分配的page。

  9. 不同类型的系统page

  10. old-style record /* Offsets of the bit-fields in an old-style record. NOTE! In the table the most significant bytes and bits are written below less significant. (1) byte offset (2) bit usage within byte downward from origin -> 1 8 bits pointer to next record 2 8 bits pointer to next record 3 1 bit short flag 7 bits number of fields 4 3 bits number of fields 5 bits heap number 5 8 bits heap number 6 4 bits n_owned 4 bits info bits */

  11. new-style record /* Offsets of the bit-fields in a new-style record. NOTE! In the table the most significant bytes and bits are written below less significant. (1) byte offset (2) bit usage within byte downward from origin -> 1 8 bits relative offset of next record 2 8 bits relative offset of next record the relative offset is an unsigned 16-bit integer: (offset_of_next_record - offset_of_this_record) mod 64Ki, where mod is the modulo as a non-negative number; we can calculate the the offset of the next record with the formula: relative_offset + offset_of_this_record mod UNIV_PAGE_SIZE 3 3 bits status: 000=conventional record 001=node pointer record (inside B-tree) 010=infimum record 011=supremum record 1xx=reserved 5 bits heap number 4 8 bits heap number 5 4 bits n_owned 4 bits info bits */

  12. DATA_TRX_PTR /*Builds a roll pointer dulint. */ UNIV_INLINE dulint trx_undo_build_roll_ptr( /* out: roll pointer */ ibool is_insert, /* in: TRUE if insert undo log */ ulint rseg_id, /* in: rollback segment id */ ulint page_no, /* in: page number */ ulint offset) /* in: offset of the undo entry within page */ { #if DATA_ROLL_PTR_LEN != 7 # error "DATA_ROLL_PTR_LEN != 7" #endif ut_ad(rseg_id < 128); return(ut_dulint_create(is_insert * 128 * 256 * 256 + rseg_id * 256 * 256 + (page_no / 256) / 256, (page_no % (256 * 256)) * 256 * 256 + offset)); }

  13. innodb数据库的元数据 • 被保存在系统表空间中,也是以表的结构方式来保存的。和自己创建的表数据的存储方式一样。 • 有以下几个表: SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS SYS_FOREIGN SYS_FOREIGN_COLS

  14. SYS_TABLES dict_mem_table_create("SYS_TABLES", DICT_HDR_SPACE, 8, 0); dict_mem_table_add_col(table, heap, "NAME", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "ID", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "N_COLS", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "TYPE", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "MIX_ID", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "MIX_LEN", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "CLUSTER_NAME", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "SPACE", DATA_INT, 0, 4); dict_mem_index_create("SYS_TABLES", "CLUST_IND", DICT_HDR_SPACE, DICT_UNIQUE | DICT_CLUSTERED, 1); dict_mem_index_add_field(index, "NAME", 0); dict_mem_index_create("SYS_TABLES", "ID_IND", DICT_HDR_SPACE, DICT_UNIQUE, 1); dict_mem_index_add_field(index, "ID", 0);

  15. SYS_COLUMNS dict_mem_table_create("SYS_COLUMNS", DICT_HDR_SPACE, 7, 0); dict_mem_table_add_col(table, heap, "TABLE_ID", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "POS", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "NAME", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "MTYPE", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "PRTYPE", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "LEN", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "PREC", DATA_INT, 0, 4); dict_mem_index_create("SYS_COLUMNS", "CLUST_IND", DICT_HDR_SPACE, DICT_UNIQUE | DICT_CLUSTERED, 2); dict_mem_index_add_field(index, "TABLE_ID", 0); dict_mem_index_add_field(index, "POS", 0);

  16. SYS_INDEXES dict_mem_table_create("SYS_INDEXES", DICT_HDR_SPACE, 7, 0); dict_mem_table_add_col(table, heap, "TABLE_ID", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "ID", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "NAME", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "N_FIELDS", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "TYPE", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "SPACE", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "PAGE_NO", DATA_INT, 0, 4); dict_mem_index_create("SYS_INDEXES", "CLUST_IND", DICT_HDR_SPACE, DICT_UNIQUE | DICT_CLUSTERED, 2); dict_mem_index_add_field(index, "TABLE_ID", 0); dict_mem_index_add_field(index, "ID", 0);

  17. SYS_FIELDS dict_mem_table_create("SYS_FIELDS", DICT_HDR_SPACE, 3, 0); dict_mem_table_add_col(table, heap, "INDEX_ID", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "POS", DATA_INT, 0, 4); dict_mem_table_add_col(table, heap, "COL_NAME", DATA_BINARY, 0, 0); dict_mem_index_create("SYS_FIELDS", "CLUST_IND", DICT_HDR_SPACE, DICT_UNIQUE | DICT_CLUSTERED, 2); dict_mem_index_add_field(index, "INDEX_ID", 0); dict_mem_index_add_field(index, "POS", 0);

  18. SYS_FOREIGN & SYS_FOREIGN_COLS error = que_eval_sql(NULL, "PROCEDURE CREATE_FOREIGN_SYS_TABLES_PROC () IS\n" "BEGIN\n" "CREATE TABLE\n" "SYS_FOREIGN(ID CHAR, FOR_NAME CHAR," " REF_NAME CHAR, N_COLS INT);\n" "CREATE UNIQUE CLUSTERED INDEX ID_IND" " ON SYS_FOREIGN (ID);\n" "CREATE INDEX FOR_IND" " ON SYS_FOREIGN (FOR_NAME);\n" "CREATE INDEX REF_IND" " ON SYS_FOREIGN (REF_NAME);\n" "CREATE TABLE\n" "SYS_FOREIGN_COLS(ID CHAR, POS INT," " FOR_COL_NAME CHAR, REF_COL_NAME CHAR);\n" "CREATE UNIQUE CLUSTERED INDEX ID_IND" " ON SYS_FOREIGN_COLS (ID, POS);\n" "COMMIT WORK;\n" "END;\n" , FALSE, trx);

  19. SYS_IBUF_TABLE_0 dict_mem_table_create(“SYS_IBUF_TABLE_0”, space, 2, 0); dict_mem_table_add_col(table, heap, "PAGE_NO", DATA_BINARY, 0, 0); dict_mem_table_add_col(table, heap, "TYPES", DATA_BINARY, 0, 0); table->id = ut_dulint_add(DICT_IBUF_ID_MIN, space); index = dict_mem_index_create( “SYS_IBUF_TABLE_0”, "CLUST_IND", space, DICT_CLUSTERED | DICT_UNIVERSAL | DICT_IBUF, 2); dict_mem_index_add_field(index, "PAGE_NO", 0); dict_mem_index_add_field(index, "TYPES", 0);

  20. B+树介绍 • 在innodb中,B+树的叶子节点和非叶子节点分别存在2个不同的segment中。 • index通过前缀压缩来提高索引存储量。 • 数据都保存在叶子节点。 • innodb的索引和数据的保存和myisam很不一样。处处都是索引,并且secondary index包括primary key,所以要注意primary key的选择。

More Related