1. HugePages Overview
1.1 Introduction to HugePages
HugePages is a feature integrated into the Linux kernel as of release 2.6. It provides an alternative to the default 4K page size (16K on IA64) by offering larger pages.
Several technical terms are associated with HugePages:
(1) Page Table: A page table is the data structure that a virtual memory system in an operating system uses to store the mapping between virtual addresses and physical addresses. On a virtual memory system, memory is therefore accessed by first consulting the page table and then, following the mapping stored there, implicitly accessing the actual physical memory location.
(2) TLB: A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. It is a fixed-size buffer used to make virtual address translation faster.
(3) hugetlb: This is an entry in the TLB that points to a HugePage (a page larger than the regular 4K and of a predefined size). HugePages are implemented via hugetlb entries, i.e. a HugePage is handled by a "hugetlb page entry". The term "hugetlb" is also (and mostly) used synonymously with HugePage (see MOS Note 261889.1). This document uses the term "HugePage", but keep in mind that "hugetlb" mostly refers to the same concept.
(4) hugetlbfs: This is a new in-memory filesystem, like tmpfs, introduced with the 2.6 kernel. Pages allocated on a hugetlbfs-type filesystem are allocated in HugePages.
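The relationship between the page table and the TLB described in terms (1) and (2) can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class name ToyMMU, the LRU eviction policy, the two-entry TLB, and the sample mappings are all hypothetical and do not reflect how any real CPU or kernel is implemented.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # regular 4K page


class ToyMMU:
    """Toy model: a page table (dict) plus a small fixed-size TLB (LRU cache)."""

    def __init__(self, page_table, tlb_entries=2):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = OrderedDict()      # cached subset of the page table
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        # Split the virtual address into page number and offset within the page.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.hits += 1
            self.tlb.move_to_end(vpn)             # mark as most recently used
        else:
            self.misses += 1
            self.tlb[vpn] = self.page_table[vpn]  # slow path: consult the page table
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)      # evict the least recently used entry
        return self.tlb[vpn] * PAGE_SIZE + offset


mmu = ToyMMU({0: 7, 1: 3, 2: 9})
print(hex(mmu.translate(0x1010)))  # first access to page 1: TLB miss
print(hex(mmu.translate(0x1020)))  # same page again: TLB hit
print(mmu.hits, mmu.misses)
```

Because the TLB has a fixed number of entries, the larger each page is, the more address space the same number of entries can cover.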
1.2 Common Misconceptions
WRONG: HugePages is a method to be able to use large SGA on 32-bit VLM systems
RIGHT: HugePages is a method to have larger pages, which is useful when working with very large memory. It is useful in both 32-bit and 64-bit configurations.
WRONG: HugePages cannot be used without USE_INDIRECT_DATA_BUFFERS
RIGHT: HugePages can be used without indirect buffers. 64-bit systems do not need indirect buffers to have a large buffer cache for the RDBMS instance, and HugePages can be used there too.
WRONG: hugetlbfs means hugetlb
RIGHT: hugetlbfs is a filesystem type **BUT** hugetlb is the mechanism employed behind it, and hugetlb can be employed WITHOUT hugetlbfs.
WRONG: hugetlbfs means hugepages
RIGHT: hugetlbfs is a filesystem type **BUT** HugePages is the mechanism employed behind it (synonymously with hugetlb), and HugePages can be employed WITHOUT hugetlbfs.
1.3 Regular Pages vs. HugePages
When a single process works with a piece of memory, the pages that the process uses are referenced in a local page table for that specific process. The entries in this table in turn contain references to the System-Wide Page Table, which holds the references to the actual physical memory addresses. So, in theory, a user-mode process (e.g. an Oracle process) follows its local page table to reach the system page table, and the system page table entry then points to the actual physical memory. As the diagram below shows, it is also possible (and very common for Oracle RDBMS because of SGA use) for two different O/S processes to point to the same entry in the system-wide page table.
When HugePages are in play, the usual page tables are still employed. The basic difference is that the entries in both the process page table and the system page table carry attributes about huge pages, so any page in a page table can be either a huge page or a regular page. The following diagram illustrates 4096K hugepages, but the diagram would be the same for any huge page size.
1.4 Some HugePages Facts/Features
(1) HugePages can be allocated on-the-fly but they must be reserved during system startup. Otherwise the allocation might fail, as most of the memory is already paged in 4K pages.
(2) HugePage sizes vary from 2MB to 256MB depending on kernel version and hardware architecture (see the related section below).
(3) HugePages are not subject to reservation / release after system startup unless there is system administrator intervention, basically a change to the hugepages configuration (i.e. the number of pages available or the pool size).
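On Linux, the configured pool (number of pages and page size) is reported in /proc/meminfo through fields such as HugePages_Total, HugePages_Free, HugePages_Rsvd and Hugepagesize. The following is a minimal sketch of reading those fields; the function name parse_hugepage_info and the sample values are illustrative assumptions, not output from a real system.

```python
def parse_hugepage_info(meminfo_text):
    """Extract the HugePages_* and Hugepagesize fields from /proc/meminfo-style text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.lower().startswith("hugepage"):
            info[key] = int(rest.split()[0])  # drop a trailing unit such as "kB"
    return info


# Made-up sample in the /proc/meminfo layout (values are hypothetical).
SAMPLE = """\
MemTotal:       65843220 kB
HugePages_Total:    1024
HugePages_Free:      512
HugePages_Rsvd:      100
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

info = parse_hugepage_info(SAMPLE)
# Pool size in MB = number of pages * page size in kB / 1024.
pool_mb = info["HugePages_Total"] * info["Hugepagesize"] // 1024
print(info)
print("pool size (MB):", pool_mb)
```

On a real system the same function could be fed the contents of /proc/meminfo, for example with open("/proc/meminfo").read().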
1.5 Advantages of HugePages Over Normal Sharing or AMM
(1) Not swappable
HugePages are not swappable. Therefore there is no page-in/page-out mechanism overhead. HugePages are universally regarded as pinned.
(2) Relief of TLB pressure
1) HugePages use fewer pages to cover the physical address space, so the amount of "bookkeeping" (mapping from virtual to physical addresses) decreases, and fewer entries are required in the TLB.
2) TLB entries cover a larger part of the address space when HugePages are used, so there are fewer TLB misses before all or most of the SGA is mapped.
3) Fewer TLB entries for the SGA also means more entries are left for other parts of the address space.
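The effect on the TLB can be made concrete with some simple arithmetic. The sketch below assumes an 8GB SGA and a TLB with 512 entries; both numbers are hypothetical and chosen only for illustration, and real TLBs are more complex (multiple levels, separate entries per page size).

```python
KB, MB, GB = 1024, 1024**2, 1024**3


def pages_needed(region_bytes, page_bytes):
    """Number of pages required to map a region (rounded up)."""
    return -(-region_bytes // page_bytes)


def tlb_reach(entries, page_bytes):
    """Amount of address space a TLB with the given number of entries can cover."""
    return entries * page_bytes


sga = 8 * GB       # hypothetical SGA size
tlb_entries = 512  # hypothetical TLB size

for label, page in (("4K", 4 * KB), ("2MB", 2 * MB)):
    print(label,
          "pages for SGA:", pages_needed(sga, page),
          "TLB reach (MB):", tlb_reach(tlb_entries, page) // MB)
```

With the same number of entries, the TLB covers 512 times more address space when each entry maps a 2MB page instead of a 4K page.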
(3) Decreased page table overhead
Each page table entry can be as large as 64 bytes, so if we are trying to handle 50GB of RAM, the page table will be approximately 800MB in size. In practice this will not fit in the 880MB lowmem (in 2.4 kernels; the page table is not necessarily in lowmem in 2.6 kernels), considering the other uses of lowmem. When 95% of memory is accessed via 256MB hugepages, the same 50GB can be handled with a page table of approximately 40MB in total.
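The 800MB and 40MB figures can be reproduced with a short calculation. This sketch assumes a flat model with one 64-byte entry per page and ignores the multi-level structure of real page tables; the function name page_table_bytes is illustrative.

```python
KB, MB, GB = 1024, 1024**2, 1024**3
PTE_BYTES = 64  # upper bound per page table entry, as stated above


def page_table_bytes(mem_bytes, page_bytes, pte_bytes=PTE_BYTES):
    """Page table size under the simplifying assumption of one entry per page."""
    return (mem_bytes // page_bytes) * pte_bytes


ram = 50 * GB

# All 50GB mapped with regular 4K pages.
regular_only = page_table_bytes(ram, 4 * KB)

# 95% of memory mapped with 256MB hugepages, the remaining 5% with 4K pages.
huge_part = page_table_bytes(ram * 95 // 100, 256 * MB)
regular_part = page_table_bytes(ram * 5 // 100, 4 * KB)

print("4K pages only (MB):", regular_only / MB)
print("hugepage entries (bytes):", huge_part)
print("mixed total (MB):", (huge_part + regular_part) / MB)
```

In the mixed case almost all of the remaining overhead comes from the 5% of memory that is still mapped with 4K pages; the hugepage entries themselves take only a few kilobytes.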
Dave's note:
In regular mode, each page is 4K, so the required entri
-- Enable output from dbms_output (SQL*Plus)
set serveroutput on

-- Create the test table
create table t01(id integer, name varchar2(10));

-- Create the test data
insert into t01(id, name) values (1, 'a');
insert into t01(id, name) values (2, 'b');
insert into t01(id, name) values (3, 'c');

-- Commit
commit;

-- Query the table data
select * from t01;

-- Assign a value to a variable with a dynamic statement
declare
  id   t01.id%type := 1;
  name t01.name%type;
begin
  execute immediate 'select name from t01 t where id=:1' into name using id;
  dbms_output.put_line(name);
end;
/

-- Insert data with a dynamic statement
declare
  id   t01.id%type := 4;
  name t01.name%type := 'd';
begin
  execute immediate 'insert into t01(id,name) values(:1,:2)' using id, name;
  commit;
end;
/

-- Update data with a dynamic statement
declare
  id   t01.id%type := 3;
  name t01.name%type := 'd';
begin
  execute immediate 'update t01 set name=:1 where id=:2' using name, id;
  commit;
end;
/

-- Delete data with a dynamic statement
declare
  id t01.id%type := 4;
begin
  execute immediate 'delete from t01 where id=:1' using id;
  commit;
end;
/

-- The examples below reuse the table t01 and the test data created above.

-- Create a stored procedure that returns a result set (system type)
create or replace procedure p_getdatas(v_cur out sys_refcursor) as
begin
  open v_cur for select id, name from t01;
end;
/

-- Verify the result
declare
  id    t01.id%type;
  name  t01.name%type;
  v_cur sys_refcursor;
begin
  p_getdatas(v_cur);
  loop
    fetch v_cur into id, name;
    exit when v_cur%notfound;
    dbms_output.put_line(name);
  end loop;
  close v_cur;
end;
/

-- Create a function that returns a result set (system type)
create or replace function f_getdatas return sys_refcursor as
  v_cur sys_refcursor;
begin
  open v_cur for select id, name from t01;
  return v_cur;
end;
/

-- Verify the result
select f_getdatas from dual;

-- Define the package (user-defined type)
create or replace package pk_getdatas as
  type t_cur is ref cursor;
  procedure p_result(v_cur out pk_getdatas.t_cur);
  function f_result return pk_getdatas.t_cur;
end;
/

-- Create the package body (user-defined type)
create or replace package body pk_getdatas as
  procedure p_result(v_cur out pk_getdatas.t_cur) as
  begin
    open v_cur for select id, name from t01;
  end;

  function f_result return pk_getdatas.t_cur as
    v_cur pk_getdatas.t_cur;
  begin
    open v_cur for select id, name from t01;
    return v_cur;
  end;
end;
/

-- Verify the result (calls the packaged procedure rather than the standalone p_getdatas)
declare
  id    t01.id%type;
  name  t01.name%type;
  v_cur pk_getdatas.t_cur;
begin
  pk_getdatas.p_result(v_cur);
  loop
    fetch v_cur into id, name;
    exit when v_cur%notfound;
    dbms_output.put_line(name);
  end loop;
  close v_cur;
end;
/

-- Verify the result
select pk_getdatas.f_result from dual;