Background:
1) In Oracle Database 10g, when a disk was taken offline because it was corrupted or the database could not read from or write to it, ASM rebalanced the contents of the offline disk onto the remaining disks. This was a relatively costly operation and could take hours to complete, even if the disk failure was only transient.
--In 10g, when an ASM disk went offline, the data on the offline disk was redistributed to the other disks; this operation was slow and expensive.
2) Oracle Database 11g introduces the ASM Fast Mirror Resync feature, which significantly reduces the time required to resynchronize a disk after a transient failure. When a disk goes offline, ASM does not rebalance the other disks; instead, it tracks the allocation units that are modified during the outage. Changes intended for the offline disk are applied to the mirror copies on the other available disks. Once the disk is repaired and brought back online, only the data that belongs to it and was modified during the outage is resynchronized. This avoids the heavy rebalancing activity.
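--11g introduces the fast mirror resync feature: if a disk goes offline, the other disks record and apply all changes intended for it, and the disk is resynchronized once it is accessible again. This assumes the data on the disk is not damaged; otherwise the disk must be dropped.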
3) ASM fast disk resync significantly reduces the time required to resynchronize a disk after a transient failure. When a disk goes offline following a transient failure, ASM tracks the extents that are modified during the outage. When the transient failure is repaired, ASM can quickly resynchronize only the ASM disk extents that were affected during the outage.
4) This feature assumes that the content of the affected ASM disks has not been damaged or modified.
5) When an ASM disk path fails, the ASM disk is taken offline but not dropped if you have set the DISK_REPAIR_TIME attribute for the corresponding disk group. The setting for this attribute determines the duration of a disk outage that ASM tolerates while still being able to resynchronize after you complete the repair.
Note: The tracking mechanism uses one bit for each modified allocation unit. This makes the tracking mechanism very efficient.
--ASM uses bits to track each modified AU; one bit corresponds to one AU.
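As a rough illustration of why one bit per allocation unit is cheap, the following query estimates an upper bound on the tracking bitmap for each disk in a disk group. This is only a back-of-the-envelope sketch: the disk group name DG1 is borrowed from the examples in this note, the aliases au_count and max_tracking_bytes are made up for illustration, and the query assumes every AU on the disk could be marked, which is not necessarily how ASM stores or reports this information.
-- AUs per disk = disk size / AU size; tracking cost = one bit per AU
SQL> SELECT d.name, d.total_mb, g.allocation_unit_size,
            CEIL(d.total_mb * 1048576 / g.allocation_unit_size) AS au_count,
            CEIL(d.total_mb * 1048576 / g.allocation_unit_size / 8) AS max_tracking_bytes
     FROM v$asm_disk d JOIN v$asm_diskgroup g ON g.group_number = d.group_number
     WHERE g.name = 'DG1';
For example, a 100 GB disk (102,400 MB) with a 1 MB AU has 102,400 allocation units, so at one bit each the bitmap needs at most 12,800 bytes (12.5 KB).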
ASM 11g New Features - How ASM Disk Resync Works.
Requirements:
1) This feature requires that the redundancy level of the disk group be NORMAL or HIGH.
2) compatible.asm and compatible.rdbms must both be 11.1.0.0.0 or higher.
3) You need to set the DISK_REPAIR_TIME disk group attribute, which specifies how long ASM waits for an offline disk to be repaired before dropping it. The default is 3.6 hours.
Examples:
SQL> ALTER DISKGROUP dgroupA SET ATTRIBUTE 'DISK_REPAIR_TIME'='3H';
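Requirements 1) to 3) can be checked from the ASM instance before relying on the feature. The queries below are a minimal sketch, assuming the disk group dgroupA from the example above (ASM stores the name in upper case) and that the attribute names appear in lower case in V$ASM_ATTRIBUTE; adjust both for your environment.
-- Redundancy (TYPE should be NORMAL or HIGH) and compatibility settings
SQL> SELECT name, type, compatibility, database_compatibility
     FROM v$asm_diskgroup
     WHERE name = 'DGROUPA';
-- Current DISK_REPAIR_TIME and compatibility attributes of the disk group
SQL> SELECT a.name, a.value
     FROM v$asm_attribute a JOIN v$asm_diskgroup g ON g.group_number = a.group_number
     WHERE g.name = 'DGROUPA'
       AND a.name IN ('disk_repair_time', 'compatible.asm', 'compatible.rdbms');
If the second query returns no rows, compatible.asm is probably still below 11.1, since disk group attributes are only exposed from that level on.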
4) The disk has to be offline (automatically due to the hardware failure or manually for maintenance operations) and should not be dropped.
To take the disk offline, use the ALTER DISKGROUP … OFFLINE DISKS command.
Example:
SQL> ALTER DISKGROUP dgroupA OFFLINE DISKS IN FAILGROUP controller2 DROP AFTER 5H;
The repair time is an attribute of the disk group. You can change it for the whole disk group with the following command; a DROP AFTER clause on the OFFLINE statement overrides it for the disks being taken offline:
SQL> ALTER DISKGROUP dgroupA SET ATTRIBUTE 'DISK_REPAIR_TIME'='3H';
Additional Manual Offline Disk Operations Examples:
SQL> ALTER DISKGROUP DG1 OFFLINE DISK DG1_0003;
SQL> ALTER DISKGROUP DG1 OFFLINE DISK DG1_0003 DROP AFTER 1H;
SQL> ALTER DISKGROUP DG1 OFFLINE DISKS IN FAILGROUP FG1;
SQL> ALTER DISKGROUP dgroupA OFFLINE DISKS IN FAILGROUP controller2 DROP AFTER 5H;
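After taking a disk offline, it can be useful to confirm its status and see how much of the repair window is left. The query below is a sketch that assumes the disk group DG1 from the examples above; REPAIR_TIMER in V$ASM_DISK reports the seconds remaining before the disk is dropped automatically.
-- Offline disks show MODE_STATUS = 'OFFLINE' and a non-zero REPAIR_TIMER
SQL> SELECT name, failgroup, mode_status, state, repair_timer
     FROM v$asm_disk
     WHERE group_number = (SELECT group_number FROM v$asm_diskgroup WHERE name = 'DG1');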
5) After the transient failure has been corrected on the affected disks, you must explicitly bring the disks back online.
Examples:
SQL> ALTER DISKGROUP DG1 ONLINE DISK DG1_0003;
SQL> ALTER DISKGROUP DG1 ONLINE DISKS IN FAILGROUP FG1 POWER 8 WAIT;
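While the disks are being brought online, the resync can be followed from a second session (with WAIT, the ALTER DISKGROUP statement itself does not return until the operation finishes). The queries below are a sketch using the disk name DG1_0003 from the example above; the operation code shown in V$ASM_OPERATION may vary by version, so no filter is applied to it here.
-- Long-running ASM operations and their estimated remaining time
SQL> SELECT group_number, operation, state, power, sofar, est_work, est_minutes
     FROM v$asm_operation;
-- Once the resync completes, the disk should be back to ONLINE with the timer cleared
SQL> SELECT name, mode_status, repair_timer
     FROM v$asm_disk
     WHERE name = 'DG1_0003';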
6) If you cannot repair a failure group that is in the offline state, you can use the ALTER DISKGROUP DROP DISKS IN FAILGROUP command with the FORCE option. This ensures that data originally stored on these disks is reconstructed from redundant copies of the data and stored on other disks in the same diskgroup.
Example:
SQL> ALTER DISKGROUP dgroupA DROP DISKS IN FAILGROUP controller2 FORCE;
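Because the data is rebuilt onto the remaining disks, it is worth checking that the disk group has enough free space before forcing the drop, and then watching the state of the dropped disks afterwards. The queries below are a sketch, assuming the disk group dgroupA and failure group controller2 from the example (both stored in upper case).
-- Free space versus the space ASM needs to restore redundancy after a failure
SQL> SELECT name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb
     FROM v$asm_diskgroup
     WHERE name = 'DGROUPA';
-- State of the disks in the failure group while they are being dropped
SQL> SELECT name, failgroup, state, mode_status
     FROM v$asm_disk
     WHERE failgroup = 'CONTROLLER2';
A negative USABLE_FILE_MB suggests the remaining disks may not have room to restore full redundancy.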