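Environment: a few days ago I set up a two-node, single-instance Linux + Oracle 11.2.0.3 + Data Guard environment running in maximum availability mode.
Symptom: today I found that the standby had stopped synchronizing. The alert log alert_orcl.log showed errors indicating that the MRP process could not start:
MRP0: Background Media Recovery terminated with error 328
ORA-00328: 8386238 , 8972415
ORA-00334: '/opt/oracle/fast_recovery_area/DG_BEI/archivelog/2014_09_28/o1_mf_1_830_b2h9mjmn_.arc'
(ORA-00328 means the archived log ends at the first change number while recovery needs the later one; ORA-00334 names the archived log involved.)
Before these errors, alert_orcl.log also reported a network disconnect from the primary database:
RFS[4]: Assigned to RFS process 20669
RFS[4]: Possible network disconnect with primary database
Sun Sep 28 13:23:48 2014
MRP0: Background Media Recovery terminated with error 328
The detailed error output is as follows: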
Mon Sep 29 14:32:28 2014
alter database recover managed standby database using current logfile disconnect from session nodelay
Attempt to start background Managed Standby Recovery process (orcl)
Mon Sep 29 14:32:28 2014
MRP0 started with pid=27, OS id=23710
MRP0: Background Managed Standby Recovery process started (orcl)
 started logmerger process
Mon Sep 29 14:32:33 2014
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 2 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Clearing online redo logfile 1 /opt/oracle/oradata/orcl/redo01.log
Clearing online log 1 of thread 1 sequence number 835
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /opt/oracle/oradata/orcl/redo02.log
Clearing online log 2 of thread 1 sequence number 834
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /opt/oracle/oradata/orcl/redo03.log
Clearing online log 3 of thread 1 sequence number 835
Clearing online redo logfile 3 complete
Media Recovery Log /opt/oracle/fast_recovery_area/DG_BEI/archivelog/2014_09_28/o1_mf_1_830_b2h9mjmn_.arc
Errors with log /opt/oracle/fast_recovery_area/DG_BEI/archivelog/2014_09_28/o1_mf_1_830_b2h9mjmn_.arc
MRP0: Background Media Recovery terminated with error 328
Errors in file /opt/oracle/diag/rdbms/dg_bei/orcl/trace/orcl_pr00_23712.trc:
ORA-00328: 8386238 , 8972415
ORA-00334: : '/opt/oracle/fast_recovery_area/DG_BEI/archivelog/2014_09_28/o1_mf_1_830_b2h9mjmn_.arc'
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Completed: alter database recover managed standby database using current logfile disconnect from session nodelay
MRP0: Background Media Recovery process shutdown (orcl)
Mon Sep 29 14:34:06 2014
RFS[1]: Assigned to RFS process 23726
RFS[1]: Opened log for thread 1 sequence 836 dbid 1356850190 branch 829069458
Archived Log entry 77 added for thread 1 sequence 836 rlc 829069458 ID 0x50df5e0e dest 2:
Mon Sep 29 14:34:08 2014
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Standby controlfile consistent with primary
RFS[2]: Assigned to RFS process 23728
RFS[2]: Selected log 4 for thread 1 sequence 838 dbid 1356850190 branch 829069458
Mon Sep 29 14:34:08 2014
RFS[3]: Assigned to RFS process 23730
RFS[3]: Selected log 5 for thread 1 sequence 837 dbid 1356850190 branch 829069458
Mon Sep 29 14:34:08 2014
Archived Log entry 78 added for thread 1 sequence 837 ID 0x50df5e0e dest 1:
Changing standby controlfile to MAXIMUM AVAILABILITY level
RFS[2]: Selected log 5 for thread 1 sequence 839 dbid 1356850190 branch 829069458
Mon Sep 29 14:34:11 2014
Archived Log entry 79 added for thread 1 sequence 838 ID 0x50df5e0e dest 1:
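Picking the relevant lines out of a long alert log by eye is tedious, so a small filter can help. The following is only a sketch: the function name scan_alert is made up for illustration, and the three patterns are simply the messages seen in this incident, not a complete list of Data Guard errors.

```shell
#!/usr/bin/env bash
# scan_alert ALERT_LOG
# Print, with line numbers, the lines that mattered in this incident:
# ORA- errors, MRP0 terminations and RFS network-disconnect warnings.
# (Hypothetical helper; extend the pattern as needed.)
scan_alert() {
    local alert_log="$1"
    grep -nE 'ORA-[0-9]{5}|MRP0: Background Media Recovery terminated|Possible network disconnect' "$alert_log"
}

# Example call (the path is assumed from the trace directory shown in the log):
#   scan_alert /opt/oracle/diag/rdbms/dg_bei/orcl/trace/alert_orcl.log
```

Seeing the network-disconnect warning a few lines above the first ORA-00328 is what points to an interrupted log transfer.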
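Troubleshooting:
1. On the standby, check the recovery-related processes. The MRP process is indeed missing.
SQL> select process,status,sequence# from v$managed_standby;
2. On the standby, check the archived log view.
SQL> select name,sequence#,applied from v$archived_log;
Oddly, the archived log named in the alert log error is shown as already applied (marked with a red box in the screenshot).
3. Compare the 2014_09_28 archived logs on the primary and the standby. Both the file count and the sizes differ: the standby has more log files than the primary, but they are smaller, as shown in the figure below. (Presumably the network dropped while the logs were being shipped, which left the sizes inconsistent.)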
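The comparison in step 3 can also be scripted instead of read off a directory listing. This is a rough sketch, assuming a copy of the primary's 2014_09_28 directory is reachable as a local path on the standby (for example staged in a temporary directory); compare_archives and its status words are hypothetical names, not an Oracle tool.

```shell
#!/usr/bin/env bash
# compare_archives PRIMARY_DIR STANDBY_DIR
# For every file name found in either directory print one status line:
#   SAME <name>                          contents are byte-identical
#   DIFF <name> primary=<n> standby=<m>  contents differ (sizes in bytes)
#   ONLY_PRIMARY <name> / ONLY_STANDBY <name>
compare_archives() {
    local primary_dir="$1" standby_dir="$2" name p s
    { ls -1 "$primary_dir"; ls -1 "$standby_dir"; } | sort -u |
    while IFS= read -r name; do
        p="$primary_dir/$name"
        s="$standby_dir/$name"
        if [ ! -f "$s" ]; then
            echo "ONLY_PRIMARY $name"
        elif [ ! -f "$p" ]; then
            echo "ONLY_STANDBY $name"
        elif cmp -s "$p" "$s"; then
            echo "SAME $name"
        else
            # $(( ... )) strips the padding some wc implementations add
            echo "DIFF $name primary=$(($(wc -c < "$p"))) standby=$(($(wc -c < "$s")))"
        fi
    done
}
```

A DIFF line where the standby size is smaller than the primary size would match the truncated-transfer explanation above.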
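4. A Google search suggested that the archived log transfer had gone wrong, and that the fix is to copy the logs from the primary to the standby and then run recovery again.
4.1 To be safe, first back up the standby's 2014_09_28 archived logs, then delete the standby's 2014_09_28 archive directory.
4.2 Use scp to copy the 2014_09_28 archive directory from the primary to the standby.
4.3 Run recovery on the standby.
SQL> alter database recover managed standby database using current logfile disconnect;
The alert log now looks normal.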
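Steps 4.1 and 4.2 could be sketched in shell roughly as follows. This is an illustration under assumptions, not the exact commands used: both function names are hypothetical, the backup moves the directory aside rather than copying and deleting it, and the host name and primary-side path in the example comments are placeholders.

```shell
#!/usr/bin/env bash
# backup_archive_dir DIR BACKUP_ROOT
# Move DIR to BACKUP_ROOT/<dirname>.<timestamp> and print the new location.
# Moving combines "back up" and "delete" from step 4.1 into one operation.
backup_archive_dir() {
    local dir="${1%/}" backup_root="${2%/}" backup
    backup="$backup_root/$(basename "$dir").$(date +%Y%m%d%H%M%S)"
    mv -- "$dir" "$backup" && echo "$backup"
}

# fetch_archive_dir SRC DEST_PARENT
# Copy SRC recursively into DEST_PARENT, preserving modification times.
# SRC may be a remote spec such as user@host:/path (step 4.2).
fetch_archive_dir() {
    scp -rp "$1" "$2"
}

# Example (host name and primary-side path are placeholders):
#   backup_archive_dir /opt/oracle/fast_recovery_area/DG_BEI/archivelog/2014_09_28 /tmp
#   fetch_archive_dir oracle@primary-host:/path/to/archivelog/2014_09_28 \
#       /opt/oracle/fast_recovery_area/DG_BEI/archivelog/
```

Keeping the old directory under a timestamped name makes it easy to put it back if the copied logs turn out not to help.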
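5. On the standby, confirm the state of the archived logs.
SQL> select name,sequence#,applied from v$archived_log;
Logs are now being applied normally, although one log (marked with a red box) has not been applied. My guess is that the directory copied from the primary did not contain that file (it had only 5 files, whereas the standby originally had 8). I checked the data and it is correct, so the problem can be considered solved.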