Oracle归档日志暴增排查优化

1、ORACLE归档日志介绍

归档日志暴增是oracle比较常见的问题,遇到归档日志暴增,我们该如何排查:
  • 归档日志暴增一般都是应用或者人为引起的
  • 理解归档日志存储的是什么
  • 如何排查归档日志暴增原因
  • 如何优化归档日志暴增

1.1 归档日志是什么

归档日志(Archive Log)是非活动的重做日志(redo)备份.
通过使用归档日志,可以保留所有重做历史记录,当数据库处于ARCHIVELOG模式并进行日志切换式,后台进程ARCH会将重做日志的内容保存到归档日志中.
当数据库出现介质失败时,使用数据文件备份,归档日志和重做日志可以完全恢复数据库。

1.2 归档日志存储的是什么

所有重做的历史记录,包括DML语句、数据改变等

1.3 归档日志暴增的原因

一般是DML操作大量的数据,导致归档日志暴增

1.4 排查归档日志暴增的方法

1.SQL语句
2.AWR
3.挖掘归档日志

2、归档日志暴增排查实战

2.1 制造归档日志暴增

create table scott.object as select * from dba_objects;

-- 执行10次
-- insert
insert into scott.object select * from scott.object;
select count(1) from scott.object;
-- 49384448

-- update
update SCOTT.object set owner='aa';

-- delete
delete from SCOTT.object;
truncate table SCOTT.object;

2.2 查看归档日志切换

SELECT
    THREAD# id,SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5)                          DAY
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'00',1,0)) H00
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'01',1,0)) H01
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'02',1,0)) H02
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'03',1,0)) H03
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'04',1,0)) H04
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'05',1,0)) H05
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'06',1,0)) H06
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'07',1,0)) H07
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'08',1,0)) H08
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'09',1,0)) H09
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'10',1,0)) H10
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'11',1,0)) H11
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'12',1,0)) H12
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'13',1,0)) H13
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'14',1,0)) H14
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'15',1,0)) H15
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'16',1,0)) H16
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'17',1,0)) H17
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'18',1,0)) H18
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'19',1,0)) H19
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'20',1,0)) H20
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'21',1,0)) H21
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'22',1,0)) H22
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'23',1,0)) H23
FROM
  v$log_history  a
GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5),THREAD#
ORDER BY id,SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) 
/
代表12月19号,H20(20-21时),共切换24个归档日志,如果每一个500M,那么总共约500M*24,对比其余时间,可以说该时间产生异常的归档日志,目标排查改时间段

2.3 SQL语句判断

with aa as 
(SELECT IID,
       USERNAME,
       to_char(BEGIN_TIME,'mm/dd hh24:mi') begin_time,
       SQL_ID,
       decode(COMMAND_TYPE,3,'SELECT',2,'INSERT',6,'UPDATE',7,'DELETE',189,'MERGE INTO','OTH') "SQL_TYPE",
       executions "EXEC_NUM",
       rows_processed "Change_NUM"
  FROM (SELECT s.INSTANCE_NUMBER IID,
               PARSING_SCHEMA_NAME USERNAME,COMMAND_TYPE,
               cast(BEGIN_INTERVAL_TIME as date) BEGIN_TIME,
               s.SQL_ID,
               executions_DELTA executions,
               rows_processed_DELTA rows_processed,
               (IOWAIT_DELTA) /
               1000000 io_time,
               100*ratio_to_report(rows_processed_DELTA) over(partition by s.INSTANCE_NUMBER, BEGIN_INTERVAL_TIME) RATIO,
               sum(rows_processed_DELTA) over(partition by s.INSTANCE_NUMBER, BEGIN_INTERVAL_TIME) totetime,
               elapsed_time_DELTA / 1000000 ETIME,
               CPU_TIME_DELTA / 1000000 CPU_TIME,
               (CLWAIT_DELTA+APWAIT_DELTA+CCWAIT_DELTA+PLSEXEC_TIME_DELTA+JAVEXEC_TIME_DELTA)/1000000 OTIME,
               row_number() over(partition by s.INSTANCE_NUMBER,BEGIN_INTERVAL_TIME order by rows_processed_DELTA desc) TOP_D
                   FROM dba_hist_sqlstat s, dba_hist_snapshot sn,dba_hist_sqltext s2
         where s.snap_id = sn.snap_id
           and s.INSTANCE_NUMBER = sn.INSTANCE_NUMBER 
           and rows_processed_DELTA is not null
           and s.sql_id = s2.sql_id and COMMAND_TYPE in (2,6,7,189)
           and sn.BEGIN_INTERVAL_TIME > sysdate - nvl(180,1)/1440         and PARSING_SCHEMA_NAME<>'SYS')
 WHERE TOP_D <= nvl(20,1)
  )
select aa.*,s.sql_fulltext "FULL_SQL" from aa left join  v$sql s on  aa.sql_id=s.sql_id ORDER BY IID, BEGIN_TIME desc,"Change_NUM" desc
查看2小时的数据该变量,可以看出Change_NUM数据该变量和执行次数EXEC_NUM和SQL语句,update回滚了,所以没有该变量。
此时可以判断大量插入数据导致归档日志暴增,此时并不能判断update。此语句不一定有数据,只能做参考。

2.4 AWR

 创建AWR报告
创建AWR报告
@?/rdbms/admin/awrrpt.sql
SQL> @?/rdbms/admin/awrrpt.sql

Current Instance
~~~~~~~~~~~~~~~~

   DB Id    DB Name     Inst Num Instance
----------- ------------ -------- ------------
 3830097027 .....        1 .....


Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Would you like an HTML report, or a plain text report?
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type: html

Type Specified:  html


Instances in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   DB Id     Inst Num DB Name       Instance    Host
------------ -------- ------------ ------------ ------------
* 3830097027        1 .....       .....    dbserver01

Using 3830097027 for database Id
Using           1 for instance number


Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing <return> without
specifying a number lists all completed snapshots.

Enter value for num_days: 1

Listing the last day's Completed Snapshots

                            Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
.....         .....         36 19 Dec 2021 14:03       1
                 37 19 Dec 2021 15:00       1
                 38 19 Dec 2021 16:00       1
                 39 19 Dec 2021 17:00       1
                 40 19 Dec 2021 18:00       1

                 41 19 Dec 2021 20:12       1
                 42 19 Dec 2021 21:03       1

Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 41
Begin Snapshot Id specified: 41

Enter value for end_snap: 42
End   Snapshot Id specified: 42

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrrpt_1_41_42.html.  To use this name,
press <return> to continue, otherwise enter an alternative.

Enter value for report_name: /tmp/awrrpt_1_41_42.html
解析AWR报告
 
可以看出大量redo,该时间段总该变量3762494/1024/1024=3674,每秒约产生3.5M
产生块最多的是scott用户,object对象,改变量是44684992,占比99%,说明是该对象产生的
根据对象可以在AWR报告中查看是否有怀疑的SQL,发现update语句。
其实根据SQL语句和AWR报告可以排查出大部分归档日志暴增的问题,如果无法排查可以继续进行挖掘归档日志。

2.5 挖掘归档日志

-rw-r-----. 1 oracle oinstall 794697216 Dec 19 20:37 1_66_1077902149.dbf
-rw-r-----. 1 oracle oinstall 794697216 Dec 19 20:37 1_67_1077902149.dbf
-rw-r-----. 1 oracle oinstall 794697216 Dec 19 21:03 1_68_1077902149.dbf
-rw-r-----. 1 oracle oinstall 733794304 Dec 19 21:03 1_69_1077902149.dbf
-rw-r-----. 1 oracle oinstall 756531200 Dec 19 21:03 1_70_1077902149.dbf
-rw-r-----. 1 oracle oinstall 761492480 Dec 19 21:14 1_71_1077902149.dbf
-rw-r-----. 1 oracle oinstall 794697216 Dec 19 21:14 1_72_1077902149.dbf
-rw-r-----. 1 oracle oinstall 265107968 Dec 19 21:14 1_73_1077902149.dbf
-- 最好sys或相关权限的用户,也可以使用toad工具
-- 第一次
@?/rdbms/admin/dbmslm.sql
@?/rdbms/admin/dbmslmd.sql

-- 开始执行
execute dbms_logmnr.add_logfile(logfilename=>'../../1_66_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.add_logfile(logfilename=>'../../1_67_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.add_logfile(logfilename=>'../../1_68_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.add_logfile(logfilename=>'../../1_69_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.add_logfile(logfilename=>'../../1_70_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.start_logmnr(options=>dbms_logmnr.dict_from_online_catalog); 
-- 依次类推小批量解析归档日志

-- 保存记录
create table scott.logmnr_contents as select * from v$logmnr_contents;

-- 分批执行...循环执行上面记录
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss'; 

-- 最后释放pga
execute dbms_logmnr.end_logmnr;
select sql_redo from scott.logmnr_contents where table_name='OBJECT';
select count(*) from scott.logmnr_contents where table_name='OBJECT';
可以从归档日志中查看大量的update语句,此时基本可以排查出归档日志暴增原因

2.6 归档日志暴增优化

1.delete是否可以改造成truncate分区表(ps: truncate需谨慎,无法恢复相关数据)
2.dml可以适量使用临时表
3.避免大事务
4.避免大量for循环dml
 
Tags: