详细解析truncate引发的object checkpoint

| No Comments

之前已经写过:

"详细解析9i10gdatafile header"

"详细解析LMTdatafile的物理结构"

"详细解析datafilestatus"

"详细解析oracle中的transaction"

 

这里是详细解析系列的第五篇文章,在这篇文章里,我们详细解析了oracletruncate时引发的object checkpoint具体做了什么事情,并阐明了可能会由此导致ORA-00600 [kcbz_check_objd_typ_3]

 

"log switch checkpoint" 这篇文章里我们已经提到过oracle里一共有7checkpoint,它们分别是:

1Full Checkpoint

2Thread Checkpoint

3File Checkpoint

4Object Checkpoint

5Parallel Query Checkpoint

6Incremental Checkpoint

7Log Switch Checkpoint

 

Oracle通常会在你执行如下操作的时候触发object checkpoint

Drop table XXX

Drop table XXX purge

Truncate table XXX

Drop index XXX

这么做的目的是为了缩短recovery的时间.

 

简单的说,oracletruncate时触发的object checkpoint做的事情用一句话来概括就是:writes block images to the database for all dirty buffers belonging to an object from all instances.

 

但是事情往往没有那么美好,oracle的上述机制在极端的情况下会导致ORA-00600 [kcbz_check_objd_typ_3]因为oracletruncate的时候并没有对这个object所在的dictionary cachecheckpoint

 

我们来看一个实例:

SQL> create table test_truncate1 as select * from dba_users;

 

Table created

 

SQL> select count(*) from test_truncate1;

 

  COUNT(*)

----------

        29

 

SQL> alter system checkpoint;

 

System altered

 

SQL> select object_id,data_object_id from dba_objects where object_name='TEST_TRUNCATE1';

 

 OBJECT_ID DATA_OBJECT_ID

---------- --------------

     30217          30217

 

SQL> select dataobj#,file#,block# from tab$ where obj#=30217;

 

  DATAOBJ#      FILE#     BLOCK#

---------- ---------- ----------

     30217          1      50601

 

SQL> select dataobj#,to_char(dataobj#,'XXXXXX') RAW_DATAOBJ#,dbms_rowid.rowid_relative_fno(rowid)||'_'||dbms_rowid.rowid_block_number(rowid) location from obj$ where obj#=30217;

 

  DATAOBJ# RAW_DATAOBJ# LOCATION

---------- ------------ --------------------------------------------------------------------------------

     30217    7609      1_30058

 

在没执行truncate之前,我们先去看一下obj$test_truncate1dataobj#test_truncate1的段头

先来看obj$test_truncate1dataobj#

BBED> set file 1

        FILE#           1

 

BBED> set block 30058

        BLOCK#          30058

 

BBED> find /x 544553545f5452554e4341544531

 File: /dras21/testdb/system01.dbf (1)

 Block: 30058            Offsets:  610 to 1121           Dba:0x0040756a

------------------------------------------------------------------------

 54455354 5f545255 4e434154 453102c1 02ff02c1 0307786e 010d0b0a 0f07786e

 010d0b0a 0f07786e 010d0b0a 0f02c102 ffff0180 ff02c107 02c1022c 011104c3

 ......省略显示部分内容

 0207786d 0c170b19 3707786d 0c170b1a 1107786d 0c170b19 3702c102 ffff0180

 

 <32 bytes per line>

 

通过上述offset 610,我们可以定位出这里应该是kdbr[54]

BBED> p kdbr

sb2 kdbr[0]                                 @110      8012

sb2 kdbr[1]                                 @112      7909

......省略显示部分内容

sb2 kdbr[53]                                @216      3642

sb2 kdbr[54]                                @218      502

......省略显示部分内容

sb2 kdbr[72]                                @254      1517

sb2 kdbr[73]                                @256      875

 

BBED> p *kdbr[54]

rowdata[0]

----------

ub1 rowdata[0]                              @594      0x2c

 

BBED> x /rnnnc

rowdata[0]                                  @594    

----------

flag@594:  0x2c (KDRHFL, KDRHFF, KDRHFH)

lock@595:  0x02

cols@596:    17

 

col    0[4] @597: 30217

col    1[4] @602: 30217

col    2[1] @607: 0

col   3[14] @609: TEST_TRUNCATE1

......省略显示部分内容

col   16[2] @666: ?

从上述结果中我们可以看到obj$test_truncate1dataobj#现在是30217

 

我们再来看test_truncate1的段头:

BBED> set file 1

        FILE#           1

 

BBED> set block 50601

        BLOCK#          50601

 

BBED> p ktech

struct ktech, 72 bytes                      @20     

   ub4 spare1_ktech                         @20       0x00000000

   ......省略显示部分内容

   struct hwmark_ktech, 32 bytes            @48     

      ub4 extno_ktehw                       @48       0x00000000

      ub4 blkno_ktehw                       @52       0x00000001

      ub4 extsize_ktehw                     @56       0x00000007

      ub4 blkaddr_ktehw                     @60       0x0040c5ab

      ub4 mapblk_ktehw                      @64       0x00000000

      ......省略显示部分内容

      ub4 kxidsqn                           @84       0x00000000

   ub4 flag_ktech                           @88       0x00000000 (NONE)

 

BBED> p ktemh

struct ktemh, 16 bytes                      @92     

   ub4 count_ktemh                          @92       0x00000001

   ub4 next_ktemh                           @96       0x00000000

   ub4 obj_ktemh                            @100      0x00007609

   ub4 flag_ktemh                           @104      0x40000000

可以看到现在test_truncate1的段头中的HWM0x0040c5abHWM下的block数量是0x00000001,它所指向的data object id0x00007609(即30217)。

 

现在我们来开始执行truncate操作:

SQL> truncate table test_truncate1;

 

Table truncated

 

SQL> select count(*) from test_truncate1;

 

  COUNT(*)

----------

         0

 

然后我们执行shutdown abort

$ sqlplus '/ as sysdba';

 

SQL*Plus: Release 9.2.0.6.0 - Production on Wed Jan 13 11:08:50 2010

 

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

 

Connected to:

Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production

With the Partitioning, OLAP and Oracle Data Mining options

JServer Release 9.2.0.6.0 - Production.

 

SQL>shutdown abort

ORACLE instance shut down.

 

现在我们再去看一下obj$test_truncate1dataobj#test_truncate1的段头

BBED> set file 1

        FILE#           1

 

BBED> set block 30058

        BLOCK#          30058

 

BBED> p *kdbr[54]

rowdata[0]

----------

ub1 rowdata[0]                              @594      0x2c

 

BBED> x /rnnnc

rowdata[0]                                  @594    

----------

flag@594:  0x2c (KDRHFL, KDRHFF, KDRHFH)

lock@595:  0x02

cols@596:    17

 

col    0[4] @597: 30217

col    1[4] @602: 30217

......省略显示部分内容

col   16[2] @666: ?

从上述结果中我们可以看到oracle并没有对test_truncate1dictionary cachecheckpoint,此时test_truncate1data object id还是30217

 

BBED> set file 1

        FILE#           1

 

BBED> set block 50601

        BLOCK#          50601

 

BBED> p ktech

struct ktech, 72 bytes                      @20     

   ub4 spare1_ktech                         @20       0x00000000

   ......省略显示部分内容

   struct hwmark_ktech, 32 bytes            @48     

      ub4 extno_ktehw                       @48       0x00000000

      ub4 blkno_ktehw                       @52       0x00000000

      ub4 extsize_ktehw                     @56       0x00000007

      ub4 blkaddr_ktehw                     @60       0x0040c5aa

      ub4 mapblk_ktehw                      @64       0x00000000

      ......省略显示部分内容

      ub4 kxidsqn                           @84       0x00000023

   ub4 flag_ktech                           @88       0x00000001 (KTE_LOCKED)

 

BBED> p ktemh

struct ktemh, 16 bytes                      @92     

   ub4 count_ktemh                          @92       0x00000001

   ub4 next_ktemh                           @96       0x00000000

   ub4 obj_ktemh                            @100      0x0000760a

   ub4 flag_ktemh                           @104      0x40000000

可以看到现在test_truncate1的段头的HWM0x0040c5ab变为了0x0040c5aaHWM下的block数量由0x00000001变为了0x00000000,它所指向的data object id0x00007609(即30217)变为了0x0000760a(即30218)。

 

也就是说当oracle在做truncate触发的object checkpoint的时候,会马上去改写目标object的段头。

 

朋友们看到这里应该已经明白了为什么我说----"在极端情况下可能会由此导致ORA-00600 [kcbz_check_objd_typ_3]?"

 

比如说,你这个时候对表A先做了truncate,接着又对表A插入了一些记录,接着库就crash了,而且crash的原因是因为current redo log损坏或者被rm掉了,那当你采用非常规手段恢复了上述数据库后,在10gR2里,如果你这个时候再想去selectA的时候,很有可能oracle这里就会报错ORA-00600 [kcbz_check_objd_typ_3]因为oracle这里检查到表A段头记录的data object id和数据字典里记录的data object id已经不一致了

Leave a comment