ORA-00600: [kccpb_sanity_check_2] During Instance Startup

January 17, 2016, 11:06 pm

If you cannot recover the data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

Applies to:

Oracle Database - Enterprise Edition - Version 10.2.0.1 and later

Information in this document applies to any platform.

***Checked for relevance on 18-Feb-2013***

Symptoms

The database is getting the following errors on Startup:

ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [3621501], [3621462], [0x000000000]

Changes

In this case, the customer moved the box from one data center to another.

Cause

ORA-600 [kccpb_sanity_check_2] indicates that the seq# of the last block read is higher than the seq# of the control file header block. This indicates a lost write of the header block during the commit of the previous control file transaction.
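As a reading aid, the arguments of the error shown under Symptoms can be pulled apart and compared. The Python sketch below assumes, based only on the description above, that the second and third arguments are the seq# of the last block read and the seq# of the header block; this note does not document the argument layout, and the helper name `parse_ora600` is made up for illustration.

```python
import re

def parse_ora600(line):
    """Return (first_argument, remaining_arguments) from an ORA-00600 line."""
    args = re.findall(r"\[([^\]]*)\]", line)
    if not args:
        raise ValueError("no bracketed arguments found")
    return args[0], args[1:]

line = ("ORA-00600: internal error code, arguments: "
        "[kccpb_sanity_check_2], [3621501], [3621462], [0x000000000]")

name, rest = parse_ora600(line)

# Assumption: rest[0] is the seq# of the last block read and
# rest[1] is the seq# stored in the control file header block.
block_seq, header_seq = int(rest[0]), int(rest[1])
gap = block_seq - header_seq

print(name, block_seq, header_seq, gap)
if gap > 0:
    print("header block is behind the last block read: possible lost header write")
```

Under that assumption, a positive gap means the header block is older than a block written in the same control file transaction.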

Solution

1) restore a backup of a controlfile and recover OR

2) recreate the controlfile OR

3) restore the database from last good backup and recover

NOTE: If you have no backup of the control file to restore, and your pfile/init.ora/spfile lists multiple control file copies, you can try to mount the database with each control file copy in turn. If the database mounts with any of these copies, you can then issue 'alter database backup controlfile to trace' to generate a script for recreating the control file.

↧

ORA-00600 [kcrf_resilver_log_1] on restart after system crash

January 17, 2016, 11:07 pm

Applies to:

Oracle Server - Enterprise Edition - Version 11.2.0.1.0 and later

Information in this document applies to any platform.

***Checked for relevance on 23-Nov-2012***

Symptoms

The database fails to open after a crash with:

ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x3B0E7AA68], [2]

From the trace file generated:

----- Current SQL Statement for this session (sql_id=1h50ks4ncswfn) -----

ALTER DATABASE OPEN

----- Call Stack Trace -----

kgeasnmierr

kcrf_write_zeroblks

kcrfis

kcrfais

kcrfr_read_disk

kcrfr_read

kcrfrgv

kcratr_scan

kcratr

kctrec

kcvcrv

Cause

Unpublished Bug 9056657: BOX REBOOT DURING UPGRADE CAUSED ORA-600 [KCRF_RESILVER_LOG_1]

There has been a lost write to the online redo log as a result of the crash.

The fix for this bug raises a more meaningful log corruption error rather than an ORA-00600 error.

Instance recovery is not possible - restore the database and do point in time recovery to the most recent archivelog.

Solution

Unpublished Bug 9056657 is included in 11.2.0.2 Patch Set Release.

Backports may be requested.

↧

ORA-600 [3020] "Stuck Recovery"

January 17, 2016, 11:10 pm

Format: ORA-600 [3020] [a] [b] [c] [d] [e]

VERSIONS:

version 6.0 and above

DESCRIPTION:

This is called a 'STUCK RECOVERY'.

There is an inconsistency between the information stored in the redo and the information stored in a database block being recovered.

ARGUMENTS:

For Oracle 9.2 and earlier:

Arg [a] Block DBA

Arg [b] Redo Thread

Arg [c] Redo RBA Seq

Arg [d] Redo RBA Block No

Arg [e] Redo RBA Offset.

For Oracle 10.1:

Arg [a] Absolute file number of the datafile.

Arg [b] Block number

Arg [c] Block DBA

FUNCTIONALITY:

kernel cache recovery parallel

IMPACT:

INSTANCE FAILURE during recovery.

SUGGESTIONS:

There have been cases of receiving this error when RECOVER has been issued, but either some datafiles were not restored to disk, or the restoration has not finished.

Therefore, ensure that the entire backup has been restored and that the restore has finished PRIOR to issuing a RECOVER database command.

If problems continue, consider restoring from a backup and doing a point-in-time recovery to a time PRIOR to the one implied by the ORA-600 [3020] error.

Example:

SQL> recover database until time 'YYYY-MM-DD:HH24:MI:SS';

This error can also be caused by a lost update.

During normal operations, block updates/writes are being performed to a number of files including database datafiles, redo log files, archived redo log files etc.

This error can be reported if any of these updates are lost for some reason.

Therefore, thoroughly check your operating system and disk hardware.

In the case of a lost update, restore an old copy of the datafile and attempt to recover and roll forward again.

If the Known Issues section below does not help in terms of identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further analysis.

Known Issues:

Related Articles

Note:1265884.1 Resolving ORA-752 or ORA-600 [3020] During Standby Recovery

KnownBugs

NB	Bug	Fixed	Description
	9847338		Session hang after applying the patch for Bug 9587912 which causes ORA-600 [30
+	13467683	11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP04, 12.1.0.0	Join of temp and permanent tables in RAC might cause corruption of permanent ta Regression by bug 10352368
	12831782	11.2.0.2.BP11, 11.2.0.3.BP01, 12.1.0.0	ORA-600 [3020] / ORA-333 Recovery of datafile or async transport do not read mi there is a stale block
	12582839	11.2.0.3, 12.1.0.0	ORA-8103/ORA-600 [3020] on RMAN recovered locally managed tablespace
	11689702	11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.0	ORA-600 [3020] during recovery after datafile RESIZE (to smaller size)
	10329146	11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.0	Lost write in ASM with multiple DBWs and a disk is offlined and then onlined
	10218814	11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.0	ORA-600 [3020] during recovery / on standby
+	10209232	11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.0	ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
*	10205230	11.2.0.1.6, 11.2.0.1.BP09, 11.2.0.2.2, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.0	ORA-600 / corruption possible during shutdown in RAC
	10094823	11.2.0.2.4, 11.2.0.2.BP09, 11.2.0.3, 12.1.0.0	Block change tracking on physical standby can cause data loss
	10071193	11.2.0.2.BP02, 11.2.0.3, 12.1.0.0	Lost write / ORA-600 [kclchkblk_3] / ORA-600 [3020] in RAC - superceded
	9587912	11.2.0.2, 12.1.0.0	ORA-600 [3020] in datafile that went offline/online in a RAC instance
	8774868	11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0	OERI[3020] reinstating primary
+	8769473	11.2.0.2, 12.1.0.0	ORA-600 [kcbzib_5] on multi block read in RAC. Invalid lock in RAC. ORA-600 [302 Recovery
P	8635179	10.2.0.5, 11.2.0.2, 12.1.0.0	Solaris: directio may be disabled for RAC file access. Corruption / Lost Write
+	8597106	11.2.0.1.BP06, 11.2.0.2, 12.1.0.0	Lost Write in ASM when normal redundancy is used
P	12330911	12.1	EXADATA LSI firmware for lost writes
+	10425010	11.2.0.3, 12.1	Stale data blocks may be returned by Exadata FlashCache
	8826708	10.2.0.5, 11.2.0.2	ORA-600 [3020] for block type 0x3a (58) during recovery for block restored by RM backup
	11684626	11.2.0.1	ORA-600 [3020] on standby involving "BRR" redo when db_lost_write_protect is e
	8230457	10.2.0.4.1, 10.2.0.5, 11.1.0.7.1, 11.2.0.1	Physical standby media recovery gets OERI[krr_media_12]
+	7680907	10.2.0.5, 11.1.0.7.1, 11.2.0.1	ORA-600 [kclexpandlock_2] in LMS / instance crash. Incorrect locks in RAC. ORA-6 [3020] in recovery
	4637668	10.2.0.3, 11.1.0.6	IMU transactions can produce out-of-order redo (OERI [3020] on recovery)
	4594917	9.2.0.8, 10.2.0.2, 11.1.0.6	Write IO error can cause incorrect file header checkpoint information
	4453449	10.2.0.2, 11.1.0.6	OERI:3020 / corruption errors from multiple FLASHBACK DATABASE
	7197445	10.2.0.4.1, 10.2.0.5	Standby Recovery session cancelled due to ORA-600 [3020] "CHANGE IN FUTURE BLOCK"
	5610267	10.2.0.5	MRP terminated by ORA-600[krr_media_12] / OERI:3020 after flashback
	3762714	9.2.0.7, 10.1.0.4, 10.2.0.1	ALTER DATABASE RECOVER MANAGED STANDBY fails with OERI[3020]
	3560209	10.2.0.1	OERI[3020] stuck recovery under RAC
	3397181	9.2.0.5, 10.1.0.3, 10.2.0.1	ALTER SYSTEM KILL SESSION of recovery slave causes stuck recovery
*	3381950	10.2.0.1	Backups from RAC DB before Data Guard Failover cannot be used
	3535712	9.2.0.6, 10.1.0.4	OERI[3020] / ORA-10567 from RAC with standby in max performance mode
	4594912	9.2.0.8, 10.1.0.2	Incorrect checkpoint possible in datafile headers
	3635331	9.2.0.6, 10.1.0.4	Stuck recovery (OERI:3020) / ORA-1172 on startup after a crash
	2322620	9.2.0.1	OERI:3020 possible on recovery of LOB DATA
P+	656370	7.3.3.4, 7.3.4.0, 8.0.3.0	AlphaNT only: Corrupt Redo (zeroed byte) OERI:3020

Note:190263.1

ORA-1172 OR ORA-600[3020] Quick Support Debugging Guide

Given that this error could be due to a lost update to either the datafile and/or the redo files, one thing to do would be to get dumps of both.

Refer to the following notes for information on how to do this :

Note:1031381.6 How to Dump Redo Log File Information

Note:45852.1 Taking BLOCKDUMPS on Oracle8 - The ALTER SYSTEM DUMP command **INTERNAL ONLY**

It is especially useful to focus on the particular datafile block implied by the ORA-600 [3020]. Dump all redo for that block, starting with the log sequence before the restored datafile, up to the point of failure.

Blockdumps of the datafile should be taken at various stages of the recovery process: for example right after doing the restore; then again after each redo log file has been applied; just before the SCN (or point in time) at which the ORA-600 was reported; just after the redo for the given SCN has been applied; and so on.

The idea is to narrow down the point at which something went wrong.

ORA-600 [3020] [a] [b] [c] [d] [e]

Versions: 7.0.X - 8.0.5 Source: knl/kcrp.c

===========================================================================

Meaning:

Recovering database and REDO entry has an INC/SEQUENCE number greater than that on the database block.

In Oracle8 where the block structure is different it still means the same basic thing - the redo record we have has an SCN / SEQ which does not match the database block we are wanting to apply it to.

This is called 'STUCK RECOVERY'.

---------------------------------------------------------------------------

Argument Description:

a. Block DBA

b. Redo Thread

c. Redo RBA Seq

d. Redo RBA Block No

e. Redo RBA Offset.
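To make the argument list concrete, the sketch below decodes the arguments of the Oracle8 example reported later in this note ([3020], [25179923], [1], [17902], [40075], [16]). It assumes the usual 32-bit DBA layout of a 10-bit relative file number followed by a 22-bit block number; the function names are illustrative and not part of any Oracle tool.

```python
def decode_dba(dba):
    """Split a 32-bit data block address into (file number, block number).

    Assumption: top 10 bits hold the file number, low 22 bits the block.
    """
    return dba >> 22, dba & 0x3FFFFF

def format_rba(seq, block, offset):
    """Render a redo byte address the way the trace file prints it."""
    return f"0x{seq:06x}.{block:08x}.{offset:04x}"

# Arguments [a]..[e] from the example ORA-00600 [3020] in this note.
arg_a, arg_b, arg_c, arg_d, arg_e = 25179923, 1, 17902, 40075, 16

file_no, block_no = decode_dba(arg_a)
rba = format_rba(arg_c, arg_d, arg_e)

print(f"DBA 0x{arg_a:08x} -> file {file_no}, block {block_no}")
print(f"thread {arg_b}, RBA {rba}")
```

The decoded values agree with the trace lines in that example: "STUCK AT BLOCK 14099 OF FILE 6" and "RBA: 0x0045ee.00009c8b.0010".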

---------------------------------------------------------------------------

Diagnosis:

There are many possible causes for this, most resulting from either invalid sets of commands or media corruption.

- Has the customer restored a backup, opened the DB, closed the DB and then tried to recover without re-loading the backup?

** If they say no GET THE ALERT LOG and prove it - it's easy to waste a lot of time when this was the real cause.

- If the problem was a lost update, restore of an OLDER copy of the datafile and a recovery may work.

- The quick option here is to restore and recover UP TO an SCN just before the problem. The customer will lose some data as this is an incomplete recovery, so you need to know the priority: a) TIME or b) Minimal Data Loss.

- Check the tracefile for the 3020 report. It is possible to signal OERI(3020) if the datafile block is corrupt.

Eg: OERI(3020) with Inc=0 Seq=1 reported for the disk block is possibly a zeroed out data-block on the datafile and NOT a redo issue.

- Is parallel server being used ?

If so another thread may have the required changes and they haven't been read for some reason. Check for OS and DLM errors. Try to make sure only ONE instance attempts any recovery by shutting down other instances.

- Are hot backups being used ??

Check that the backups are occurring correctly between BEGIN and END backup commands.

- Up to Oracle 8i you can try to skip the error using the hidden parameter:CORRUPT_BLOCKS_ON_STUCK_RECOVERY

Be aware that blocks will be marked corrupt if this is used so make sure the error is not on a dictionary object !!

- From 9i you can try to skip the error using the 'ALLOW .. CORRUPTION' clause of the RECOVER DATABASE command.

(Note that in 11g onwards you may need to set DB_LOST_WRITE_PROTECT=NONE for the "ALLOW 1 CORRUPTION" clause to work)

- For logging a bug you need:

(a) Where an error is reported, get any trace files produced and relevant redo log dumps if necessary.

Document completely the circumstances leading up to the error, including configuration; type of backup (manual, RMAN, incremental, etc.); the exact commands used to create the backups; and the exact commands used to do the restore and recovery.

(b) Provide a reproducible test case or dial-in information to development.

Articles:

Parameter:CORRUPT_BLOCKS_ON_STUCK_RECOVERY

---------------------------------------------------------------------------

Example OERI:3020 dump in Oracle8

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*** 1999.07.02.01.02.58.000

RECOVERY OF THREAD 1 STUCK AT BLOCK 14099 OF FILE 6

REDO RECORD - Thread:1 RBA: 0x0045ee.00009c8b.0010 LEN: 0x00e8 VLD: 0x01 SCN scn: 0x0000.0951d868 07/02/99 00:57:19

CHANGE #2 TYP:2 CLS: 1 AFN:6 DBA:0x01803713 SCN:0x0000.09519e84 SEQ: 1 OP:10.4

buffer tsn: 5 rdba: 0x01803713 (6/14099) scn:0x0000.0951b4d4 seq:0x01 flg:0x00 tail:0xb4d40601

frmt:0x02 chkval:0x0000 type:0x06=trans data

*** 1999.07.02.01.02.58.000

ksedmp: internal or fatal error

ORA-00600: internal error code, arguments:

[3020], [25179923], [1], [17902], [40075], [16], [], []

Breaking this up shows the following SCN information:

Redo SCN: 0x0000.0951d868

SCN expected on block: 0x0000.09519e84

SCN on Buffer: 0x0000.0951b4d4

In this case the actual SCN marked on the block in the buffer cache is _later_ than the expected SCN, but _before_ the SCN of the redo change vector. Normally, the SCN in the CHANGE line must match exactly the one on the block (in the buffer cache), and redo application brings that block to the (later) SCN/SEQ on the redo record. One possible explanation is that the system saw a stale copy of the datafile block when the redo was generated, so that the SCN in the CHANGE line is the wrong one. That would indicate a possible lost update to the datafile.

More commonly, the ORA-600 [3020] error indicates that the SCN on the block is BEHIND the SCN on the redo we want to apply, so there is a GAP. I.e., the REDO is ahead of the block.

However, in this example there is still a problem even though the block initially appears to be AHEAD of the REDO (normally OK).

Why? The SCN on the block is BELOW the most recent commit SCN.

If we applied the current redo record then the SCN on the block would advance to the more recent commit SCN, so if this block is truly ahead of this redo record it must have an SCN >= the most recent commit SCN. It does not, so something is wrong: most likely a lost datafile write occurred between two items of redo, causing two redo records to base their change on the same block SCN.
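The ordering of the three SCNs in this example can be checked mechanically. The Python sketch below is only an illustration of the comparison described above: it treats an SCN printed as wrap.base as a 48-bit number, and the category labels returned by `classify` are informal wording of my own rather than Oracle terminology.

```python
def parse_scn(text):
    """Convert an SCN printed as 'wrap.base' in hex into one integer."""
    wrap, base = text.split(".")
    return (int(wrap, 16) << 32) | int(base, 16)

def classify(expected, on_block, redo):
    """Describe where the block SCN sits relative to the redo record."""
    if on_block == expected:
        return "consistent"
    if on_block < expected:
        return "block behind expected SCN (gap)"
    if on_block < redo:
        return "block ahead of expected SCN but behind redo SCN"
    return "block at or beyond redo SCN"

# Values taken from the Oracle8 example dump in this note.
redo_scn = parse_scn("0x0000.0951d868")      # SCN of the redo record
expected_scn = parse_scn("0x0000.09519e84")  # SCN in the CHANGE line
buffer_scn = parse_scn("0x0000.0951b4d4")    # SCN found on the block

result = classify(expected_scn, buffer_scn, redo_scn)
print(result)
```

For the values in this dump the block falls between the expected SCN and the redo SCN, which is the unusual case discussed above.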

Known issues caused by 3rd party provider

1. Lost IO / Corruption caused by EMC. From JET SR: 3-1260172021. EMC bug ID: emc230687

ID: emc230687

Domain: EMC1 HP-UX 11v1

Solution Class: 3.X Compatibility

ORA-600 [3020] during recovery caused by LOST IO due to EMC bug ID: emc230687.

No errors were raised within the I/O stack at the host level nor from a Timefinder perspective. API ECA debug data is void of any anomalies. The Timefinder w/Oracle best practices process is being adhered to (recoverable business solution process).

This also caused some corruption errors like:

ORA-00600: internal error code, arguments: [kddummy_blkchk], [29], [2121334], [6108] kdbchk: xaction header lock count mismatch

No errors raised with the Symm as well. Corruption issue resolved by applying fix 44177, see the following Primus article for more i ETA emc204393

2. Lost IO by EMC. EMC solution # is emc251398

Fixed by the latest microcode version 5773.163.113 applied on the Symmetrix DMX (no changes on V-MAX cabins). EMC solution # is emc2

↧

Urgent Help needed with ASM Header Corruption - Q: When is an ASM disk header read and updated?

February 4, 2016, 6:42 pm

The 3rd instance of one of the databases (11.2.0.3 with ASM + External Redundancy) crashed with the errors reported below. It seems the customer added some disks to ASM and, midway through the rebalance, ASM picked up some underlying corruption, subsequently dismounting the ASM diskgroup and hence crashing the database.

ASM Alert entries

>>> Customer added some Disks here >>>

Thu Oct 25 14:16:09 2012
NOTE: disk validation pending for group 19/0x6bc90d3b (DBTCSTRNPA)
SUCCESS: validated disks for 19/0x6bc90d3b (DBTCSTRNPA)
NOTE: disk validation pending for group 19/0x6bc90d3b (DBTCSTRNPA)
NOTE: Assigning number (19,20) to disk (ORCL:DBTCSTRNPA21)
NOTE: Assigning number (19,21) to disk (ORCL:DBTCSTRNPA22)
NOTE: Assigning number (19,22) to disk (ORCL:DBTCSTRNPA23)

>> Rebalance started 14:29 PM as a result >>>

Thu Oct 25 14:29:02 2012
NOTE: Attempting voting file refresh on diskgroup DATCSTRNPA
NOTE: ASM did background COD recovery for group 10/0x6b190d32 (DATCSTRNPA)
NOTE: starting rebalance of group 10/0x6b190d32 (DATCSTRNPA) at power 1
Starting background process ARB0
Thu Oct 25 14:29:02 2012
ARB0 started with pid=45, OS id=11888
NOTE: assigning ARB0 to group 10/0x6b190d32 (DATCSTRNPA) with 1 parallel I/O

>>> ASM Header corruption notes Midway during Rebalance at 15:40 >>

Thu Oct 25 15:40:24 2012
WARNING: cache read a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 disk=72 (DATCSTRNPA84) incarn=3916037734 au=0 blk=48 count=1
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
NOTE: a corrupted block from group DATCSTRNPA was dumped to /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc
WARNING: cache read (retry) a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 disk=72 (DATCSTRNPA84) incarn=3916037734 au=0 blk=48 count=1
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ERROR: cache failed to read group=10(DATCSTRNPA) dsk=72 blk=48 from disk(s): 72(DATCSTRNPA84)
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
NOTE: cache initiating offline of disk 72 group DATCSTRNPA
NOTE: process _arb0_+asm3 (11888) initiating offline of disk 72.3916037734 (DATCSTRNPA84) with mask 0x7e in group 10
WARNING: Disk 72 (DATCSTRNPA84) in group 10 in mode 0x7f is now being taken offline on ASM inst 3
NOTE: initiating PST update: grp = 10, dsk = 72/0xe969fe66, mask = 0x6a, op = clear
Thu Oct 25 15:40:25 2012
GMON updating disk modes for group 10 at 115 for pid 45, osid 11888
ERROR: Disk 72 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 10)
Thu Oct 25 15:40:25 2012
NOTE: cache dismounting (not clean) group 10/0x6B190D32 (DATCSTRNPA)
WARNING: Offline of disk 72 (DATCSTRNPA84) in group 10 and mode 0x7f failed on ASM inst 3
Thu Oct 25 15:40:25 2012
NOTE: halting all I/Os to diskgroup 10 (DATCSTRNPA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 4739, image: oracle@itcccl180.it.express.tnt (B000)
Thu Oct 25 15:40:25 2012
NOTE: LGWR doing non-clean dismount of group 10 (DATCSTRNPA)
NOTE: LGWR sync ABA=231.134 last written ABA 231.134

>> Diskgroup Dismounted as a Result of this >>>

NOTE: cache dismounted group 10/0x6B190D32 (DATCSTRNPA)
SQL> alter diskgroup DATCSTRNPA dismount force /* ASM SERVER */
System State dumped to trace file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc
Thu Oct 25 15:40:27 2012
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
Thu Oct 25 15:40:39 2012

Thu Oct 25 15:40:39 2012
NOTE: AMDU dump of disk group DATCSTRNPA created at /oracle/diag/asm/+asm/+ASM3/trace
NOTE: cache deleting context for group DATCSTRNPA 10/0x6b190d32
ERROR: ORA-15130 thrown in ARB0 for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "" is being dismounted
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [4] [2] [27016521 != 27015521]
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [339] [2147483706] [4232823222 != 261167758]
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [2147483649] [81] [2397242929 != 2383392830]
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
ORA-15066: offlining disk "DATCSTRNPA84" in group "DATCSTRNPA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
Thu Oct 25 15:40:39 2012

NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "" is being dismounted
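The repeated ORA-15196 lines in the ASM alert log above can be broken into fields for easier comparison. The Python sketch below is a reading aid only: the field names (`source`, `check`, `obj`, `blk`, `found`, `expected`) are illustrative labels, and treating the third argument as a flag bit plus a disk number is an assumption inferred from the `dsk=72 blk=48` text in the same log.

```python
import re

ORA_15196 = re.compile(
    r"ORA-15196: invalid ASM block header "
    r"\[(?P<source>[^\]]+)\] \[(?P<check>[^\]]+)\] "
    r"\[(?P<obj>\d+)\] \[(?P<blk>\d+)\] "
    r"\[(?P<found>\d+) != (?P<expected>\d+)\]"
)

def parse_ora_15196(line):
    """Return the fields of an ORA-15196 line as a dict, or None."""
    match = ORA_15196.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("obj", "blk", "found", "expected"):
        fields[key] = int(fields[key])
    # Assumption: the high bit marks per-disk metadata and the low
    # 31 bits carry the disk number (2147483720 = 0x80000048 -> 72).
    obj = fields["obj"]
    fields["disk"] = obj & 0x7FFFFFFF if obj & 0x80000000 else None
    return fields

line = ("ORA-15196: invalid ASM block header [kfc.c:26076] "
        "[endian_kfbh] [2147483720] [48] [0 != 1]")
info = parse_ora_15196(line)
print(info)
```

Read this way, the line says that the endian check on block 48 of disk 72 found 0 where 1 was expected, which matches a block header that has been zeroed.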

DB Alert log has these entries

Thu Oct 25 02:22:42 2012
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
...
..
Thu Oct 25 02:52:58 2012
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM

>> CAN Be ignored as documented under "WARNING: ASM Communication Error: Op 0 State 0x0 (15055) (Doc ID 1469167.1)"

...
...

>>> ASM Disks added are reported here >>>

Thu Oct 25 14:16:18 2012
SUCCESS: disk DBTCSTRNPA21 (20.3916037823) added to diskgroup DBTCSTRNPA
SUCCESS: disk DBTCSTRNPA22 (21.3916037824) added to diskgroup DBTCSTRNPA
SUCCESS: disk DBTCSTRNPA23 (22.3916037825) added to diskgroup DBTCSTRNPA
Thu Oct 25 14:22:40 2012

>>> DB Crashes as the ASM Diskgroup was dismounted due to Corruptions >>>

Thu Oct 25 15:40:39 2012
Errors in file /oracle/diag/rdbms/cstrnpa/CSTRNPA3/trace/CSTRNPA3_lgwr_17853.trc:
ORA-00345: redo log write error block 35172 count 1
ORA-00312: online log 17 thread 3: '+DATCSTRNPA/cstrnpa/onlinelog/group_17.399.776096445'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/diag/rdbms/cstrnpa/CSTRNPA3/trace/CSTRNPA3_lgwr_17853.trc:
ORA-00346: log member marked as STALE and closed
ORA-00312: online log 17 thread 3: '+DATCSTRNPA/cstrnpa/onlinelog/group_17.399.776096445'
Thu Oct 25 15:40:48 2012
KCF: read, write or open error, block=0x9b online=1
file=123 '+DATCSTRNPA/cstrnpa/datafile/undotbs3.387.767891645'
error=15078 txt: ''
Errors in file /oracle/diag/rdbms/cstrnpa/CSTRNPA3/trace/CSTRNPA3_dbw0_17837.trc:
Errors in file /oracle/diag/rdbms/cstrnpa/CSTRNPA3/trace/CSTRNPA3_dbw0_17837.trc:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 123 (block # 155)
ORA-01110: data file 123: '+DATCSTRNPA/cstrnpa/datafile/undotbs3.387.767891645'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
DBW0 (ospid: 17837): terminating the instance due to error 63999

The hardware vendor, HP, has tried to push these issues onto Oracle and has asked us to explain exactly when and how an ASM disk header is read and updated, so can anyone help answer the questions below?
They believe the ASM rebalance caused these corruptions, but we don't think that was the reason.

1. When ASM rebalances the disks, does it read the block header first and then write the block? Can an ASM rebalance cause block corruption under any circumstances, or is this not possible within the ASM internal mechanism?

2. When is the ASM header read?

3. What causes ASM metadata to be updated? Is it updated immediately when the disk is added, or when the rebalancing occurs?

4. How is locking done on the ASM header between the RAC nodes, and how is a lock released on an Oracle instance failure?

5. Why did the database carry on when the header corruption error was first reported in the alert log? This has been partially answered by the fact that the error is only detected when the rebalance runs.

6. How can we determine when the last successful ASM header read occurred before the corruption?

Any help would be more than appreciated...

Answer:

ARB0 relocating file +DATCSTRNPA.256.666381297 (8 entries)

*** 2012-10-25 17:05:06.757

ARB0 relocating file +DATCSTRNPA.258.666381295 (76 entries)

*** 2012-10-25 17:07:14.274

WARNING: cache read a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 disk=72 (DATCSTRNPA84) incarn=3916037804 au=0 blk=48 count=1

*** 2012-10-25 17:07:14.274

dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=0, mask=0x0)

----- Error Stack Dump -----
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
Hex dump of disk block image:

Dump of memory from 0x00000000694FA000 to 0x00000000694FB000
0694FA000 00000000 00000000 00000000 00000000 [................]
        Repeat 63 times
0694FA400 003C0000 00780000 00060000 007571ED [..<...x......qu.]
0694FA410 003BFFF5 00000000 00000002 00000002 [..;.............]
0694FA420 00008000 00008000 00004000 5051A885 [.........@....QP]
0694FA430 50528195 001D0005 0003EF53 00000001 [..RP....S.......]
0694FA440 5048BD84 00ED4E00 00000000 00000001 [..HP.N..........]
0694FA450 00000000 0000000B 00000080 00000034 [............4...]
0694FA460 00000006 00000003 DB40F439 4643267C [........9.@.|&CF]
0694FA470 5AB9A4A6 6703FEB5 00000000 00000000 [...Z...g........]
0694FA480 00000000 00000000 00000000 00000000 [................]
        Repeat 3 times
0694FA4C0 00000000 00000000 00000000 03FE0000 [................]
0694FA4D0 00000000 00000000 00000000 00000000 [................]
0694FA4E0 00000008 00000000 00000000 6D4FD3DA [..............Om]
0694FA4F0 174CB0E0 9DA83EA6 62C7706F 00000102 [..L..>..op.b....]
0694FA500 00000000 00000000 5048BD84 00000609 [..........HP....]
0694FA510 0000060A 0000060B 0000060C 0000060D [................]
0694FA520 0000060E 0000060F 00000610 00000611 [................]
0694FA530 00000612 00000613 00000614 00000615 [................]
0694FA540 00000A16 00000000 00000000 08000000 [................]
0694FA550 00000000 00000000 00000000 00000000 [................]
        Repeat 170 times

OSM metadata block dump:
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfbtTraverseBlock: Invalid OSM block type 0

WARNING: cache read (retry) a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 disk=72 (DATCSTRNPA84) incarn=3916037804 au=0 blk=48 count=1

*** 2012-10-25 17:07:14.277

dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=0, mask=0x0)

----- Error Stack Dump -----
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ERROR: cache failed to read group=10(DATCSTRNPA) dsk=72 blk=48 from disk(s): 72(DATCSTRNPA84)
CE: (0x0x693e9018) group=10 (DATCSTRNPA) dsk=72 blk=48
    hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1
    mirror=0
    flags_kfcpba=0x49 copies=1 blockIndex=48 AUindex=0 AUcount=1 loctr fcn=0.0
    copy #0: disk=72 au=0 flags=01
BH: (0x0x69791290) bnum=2049 type=reading state=reading chgSt=not modifying pageIn=current
    flags=0x00000000 pinmode=excl lockmode=excl bf=0x694fa000
    kfbh_kfcbh.fcn_kfbh = 0.0 lowAba=0.0 highAba=0.0
    last kfcbInitSlot return code=null chgCount=815 cpkt lnk is null ralFlags=0x00000000
    PINS:
    (kfcbps) pin=25743 get by kfd.c line 23273 mode=excl
             dsk=72 blk=48 status=pinned
             flags=0x80000000 flags2=0x00000000
             class=1400 type=ALLOCTBL stateWanted=current
             bastCount=1 waitStatus=0x00000000 relocCount=0
             scanBastCount=2 scanBxid=64781 scanSkipCode=2
             last released by kfc.c 18264
LE: (0x724e36b0) le=2567 group=10 dsk=72 blk=48
    open=T kjStat=0 mode=EX closing=0 lop=(nil)
    flags=00000000 astFlags=00000000 rlsFlags=00000000
    rcvFlags=00000000 id=0x2a000048.30 bucket=1791
    lastScanWaiterMode=0 fcn=0.0



File_name :: +ASM2_arb0_23441.trc

NOTE: cache opening disk 71 of grp 10: DATCSTRNPA83 label:DATCSTRNPA83
NOTE: cache opening disk 72 of grp 10: DATCSTRNPA84 label:DATCSTRNPA84
NOTE: cache opening disk 73 of grp 10: DATCSTRNPA85 label:DATCSTRNPA85

00003e50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00004000 01 82 03 01 04 00 00 00 48 00 00 80 aa 4d 6f 80 |........H....Mo.|
00004010 01 b7 8c 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00004020 80 03 00 00 c0 01 00 00 08 00 08 00 00 00 c0 01 |................|

$ kfed read mpath16p1dump blknum=47|more

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                      47 ; 0x004: blk=47

kfbh.block.obj:              2147483720 ; 0x008: disk=72

kfbh.check:                  2309435824 ; 0x00c: 0x89a731b0

kfbh.fcn.base:                  9230112 ; 0x010: 0x008cd720

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

$ kfed read mpath16p1dump blknum=48|more

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

B7F14200 00000000 00000000 00000000 00000000 [................]

        Repeat 63 times

B7F14600 003C0000 00780000 00060000 007571ED [..<...x......qu.]

B7F14610 003BFFF5 00000000 00000002 00000002 [..;.............]

B7F14620 00008000 00008000 00004000 5051A885 [.........@....QP]

B7F14630 50528195 001D0005 0003EF53 00000001 [..RP....S.......]

B7F14640 5048BD84 00ED4E00 00000000 00000001 [..HP.N..........]

B7F14650 00000000 0000000B 00000080 00000034 [............4...]

B7F14660 00000006 00000003 DB40F439 4643267C [........9.@.|&CF]

B7F14670 5AB9A4A6 6703FEB5 00000000 00000000 [...Z...g........]

B7F14680 00000000 00000000 00000000 00000000 [................]

        Repeat 3 times

B7F146C0 00000000 00000000 00000000 03FE0000 [................]

B7F146D0 00000000 00000000 00000000 00000000 [................]

B7F146E0 00000008 00000000 00000000 6D4FD3DA [..............Om]

B7F146F0 174CB0E0 9DA83EA6 62C7706F 00000102 [..L..>..op.b....]

B7F14700 00000000 00000000 5048BD84 00000609 [..........HP....]

B7F14710 0000060A 0000060B 0000060C 0000060D [................]

B7F14720 0000060E 0000060F 00000610 00000611 [................]

B7F14730 00000612 00000613 00000614 00000615 [................]

B7F14740 00000A16 00000000 00000000 08000000 [................]

B7F14750 00000000 00000000 00000000 00000000 [................]

Repeat 170 times

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

$ kfed read mpath16p1dump blknum=49|more

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                      49 ; 0x004: blk=49

kfbh.block.obj:              2147483720 ; 0x008: disk=72

kfbh.check:                  2158685900 ; 0x00c: 0x80aaeecc

kfbh.fcn.base:                  4799766 ; 0x010: 0x00493d16

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

$ kfed read mpath16p1dump blknum=50|more

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                      50 ; 0x004: blk=50

kfbh.block.obj:              2147483720 ; 0x008: disk=72

kfbh.check:                  2158654602 ; 0x00c: 0x80aa748a

kfbh.fcn.base:                  4820845 ; 0x010: 0x00498f6d

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

kfdatb10.aunum:                   21504 ; 0x000: 0x00005400

kfdatb10.shrink:                    448 ; 0x004: 0x01c0

1. When ASM rebalances the disks, does it read the block header first and then write the block?
Can an ASM rebalance cause block corruption under any circumstances,
or is this not possible within the ASM internal mechanism?

===>> When a rebalance takes place, ASM verifies the checksum block by block. In your case the check failed for block 48 because ASM did not find an ASM-formatted block there.

      On 11.2.0.3 there is no bug reported at Oracle's end so far.



Interestingly, I see from the dd dump that only block 48 is unformatted, whereas the earlier and later blocks are formatted properly as ASM allocation table metadata.

And when I look at block 48:

$ kfed read mpath16p1dump blknum=48|more

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

B7F14200 00000000 00000000 00000000 00000000 [................]

        Repeat 63 times

B7F14600 003C0000 00780000 00060000 007571ED [..<...x......qu.]

B7F14610 003BFFF5 00000000 00000002 00000002 [..;.............]

B7F14620 00008000 00008000 00004000 5051A885 [.........@....QP]

B7F14630 50528195 001D0005 0003EF53 00000001 [..RP....S.......]

B7F14640 5048BD84 00ED4E00 00000000 00000001 [..HP.N..........]

B7F14650 00000000 0000000B 00000080 00000034 [............4...]

B7F14660 00000006 00000003 DB40F439 4643267C [........9.@.|&CF]

B7F14670 5AB9A4A6 6703FEB5 00000000 00000000 [...Z...g........]

B7F14680 00000000 00000000 00000000 00000000 [................]

        Repeat 3 times

B7F146C0 00000000 00000000 00000000 03FE0000 [................]

B7F146D0 00000000 00000000 00000000 00000000 [................]

B7F146E0 00000008 00000000 00000000 6D4FD3DA [..............Om]

B7F146F0 174CB0E0 9DA83EA6 62C7706F 00000102 [..L..>..op.b....]

B7F14700 00000000 00000000 5048BD84 00000609 [..........HP....]

B7F14710 0000060A 0000060B 0000060C 0000060D [................]

B7F14720 0000060E 0000060F 00000610 00000611 [................]

B7F14730 00000612 00000613 00000614 00000615 [................]

B7F14740 00000A16 00000000 00000000 08000000 [................]

B7F14750 00000000 00000000 00000000 00000000 [................]

Repeat 170 times

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

It seems that some external values were written over that block.

So, could you please check a few things:

1. Whether any OS-level application that is running could write such a string.

2. Whether any application-level process that is running could write such a string.

Remember that only one 4 KB block was affected here.
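
The check on the dd dump (one unformatted block between properly formatted neighbours) can be sketched in Python. This is only a sketch, not Oracle tooling: it assumes a little-endian dump with 4096-byte metadata blocks and uses the kfbh field offsets that kfed prints above (0x000 to 0x01c). The names parse_kfbh and find_invalid_blocks are hypothetical.

```python
import struct

BLOCK_SIZE = 4096  # metadata block size in this case; an assumption for other systems

# Layout taken from the kfed output above: four 1-byte fields, then seven
# 4-byte fields, all little-endian (32 bytes in total).
KFBH = struct.Struct("<BBBBIIIIIII")
FIELDS = ("endian", "hard", "type", "datfmt", "blk", "obj",
          "check", "fcn_base", "fcn_wrap", "spare1", "spare2")


def parse_kfbh(block: bytes) -> dict:
    """Decode the first 32 bytes of a metadata block into the kfbh fields."""
    return dict(zip(FIELDS, KFBH.unpack_from(block, 0)))


def find_invalid_blocks(dump: bytes, first: int, last: int) -> list:
    """Return block numbers in [first, last] whose header type is 0 (KFBTYP_INVALID)."""
    bad = []
    for n in range(first, last + 1):
        block = dump[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]
        # A short (truncated) block is reported as invalid as well.
        if len(block) < KFBH.size or parse_kfbh(block)["type"] == 0:
            bad.append(n)
    return bad


# Header bytes copied from the hex dump above (offset 0x4000, i.e. block 4).
sample = bytes.fromhex(
    "01 82 03 01 04 00 00 00 48 00 00 80 aa 4d 6f 80"
    "01 b7 8c 00 00 00 00 00 00 00 00 00 00 00 00 00"
)
header = parse_kfbh(sample)
print(header["type"], header["blk"], header["obj"])
```

Run against the dd dump of the affected disk, find_invalid_blocks(dump, 40, 60) would be expected to report only block 48 if the damage is really confined to a single 4 KB block.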

2. When is the ASM header read?

====>> This is not the ASM disk header; it is internal ASM metadata (the allocation table).

This kind of read generally happens in the following situations:

a. When the diskgroup is mounted and diskgroup-level recovery is performed.

b. When you add disks and a rebalance happens.

3. What causes ASM metadata to be updated? Is it updated immediately when the disk is added, or when the rebalancing occurs?

====>>> This kind of ASM metadata is updated when a new allocation or deallocation takes place at the database level.

4. How is locking done on the ASM header between the RAC nodes, and how is a lock released on an Oracle instance failure?

====>>> ASM keeps track of the changes of each thread at the ASM diskgroup level and performs the required recovery on the next mount.

5. Why did the database carry on when the header corruption error was first reported in the alert log?
This has been partially answered by the fact that the error is only detected when the rebalance runs.

===>>> Unless you read or write data that is addressed through that allocation table block, you will not see this issue.

       When a rebalance takes place, however, it touches all the blocks in order to read them and restripe the already existing allocation units evenly.

       Hence the problem surfaced this time.

       So this corruption took place between the last DML operation on that block and the start of the ASM rebalance.


6. How can we determine when the last successful ASM header read took place before the corruption?

===>>> The ASM diskgroup is getting mounted, so all ASM disk headers are fine.

The ASM disk header is different from the allocation table metadata.

even at the time disk all asm disk headers were read.

↧

Oracle ASM unable to find ASM disk header in some disks

February 4, 2016, 6:52 pm

ASM is unable to find the ASM disk header on some disks. The header area (at least 5 MB) is zero-filled on each of these disks (mpath218p1, mpath217).

I checked whether the disk start or end had been relocated, so that the headers could be found at a different offset on the same disk, but that does not seem to be the case. The data was not relocated; it is corrupted.

This is not an issue at the ASM/RDBMS level.

FURTHER PLANS:

================================================================

- ASM/RDBMS does not zero-fill disks, so this is not an issue at the ASM/RDBMS level. Given that more than 5 MB is corrupted, the cause must be external to Oracle.

- Please ask the customer to check for the following:

  - Any manual error, for example someone inadvertently running 'dd if=/dev/zero' against these disks.

  - Any tools, scripts, or third-party tools that might perform such writes. If no such application or tool is known, please suggest that the customer run monitoring tools that track writes to the disks (something like the fuser command should help to list the PIDs that have the disks open).

  - Given that the corruption occurred on the same storage (ETL420), it might be an issue at the storage level. Please involve storage support to see whether something is wrong at the storage level.
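
To quantify how much of a disk has been zero-filled, the leading run of zero bytes can be measured on a read-only copy. The following is a minimal Python sketch under that assumption; the function name zero_prefix_length is hypothetical, and the in-memory image only stands in for a dd copy of the affected device.

```python
import io


def zero_prefix_length(stream, chunk_size=1024 * 1024):
    """Return the number of leading zero bytes in a binary stream."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total  # end of stream: everything read so far was zero
        stripped = chunk.lstrip(b"\x00")
        if stripped:
            # The first non-zero byte lies inside this chunk.
            return total + (len(chunk) - len(stripped))
        total += len(chunk)


# Demo on an in-memory image: 6 MB of zeros followed by some non-zero data.
image = io.BytesIO(b"\x00" * (6 * 1024 * 1024) + b"\x01data")
print(zero_prefix_length(image))
```

On a real system the same function could be called as `with open(dump_path, "rb") as f: zero_prefix_length(f)`, where dump_path is a dd copy of the disk rather than the live device.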

↧

ASMdisk Status - Candidate disk after reboot [ recover ASM header files ]

February 4, 2016, 7:46 pm

The customer migrated Oracle databases running on an old SAN to a new SAN using an ASM rebalance operation. The customer is using external redundancy. After the rebalance completed, the customer rebooted all servers and removed the old SAN device entries over the weekend. The customer is now unable to bring the databases online on one of the 4 servers and gets the following errors:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

I have attached the asm-kfed results for your reference. Is it possible to recover the ASM disk headers so that the customer does not need to back up and restore the 5 TB database?

Total System Global Area  284008448 bytes
Fixed Size                  2158616 bytes
Variable Size             256684008 bytes
ASM Cache                  25165824 bytes
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
 

SQL> select group_number,disk_number,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,NAME,PATH from V$asm_disk;
 
GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    NAME                           PATH
------------ ----------- ------- ------------ ------- -------- ------------------------------ ----------------------------------------
           0           0 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d26s6
           0          23 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d22s6
           0           2 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM01
           0           3 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM12
           0           4 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d27s6
           0           5 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d25s6
           0           6 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM05
           0           7 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM02
           0           8 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM08
           0           9 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d24s6
           0          10 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM11
           0          11 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d30s6
           0          12 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM07
           0          13 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d31s6
           0          14 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d21s6
           0          15 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM10
           0          16 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d23s6
           0          17 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM09
           0          18 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM03
           0          19 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d29s6
           0          20 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d28s6
           0          21 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM04
           0          22 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d32s6
           0           1 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM06





/dev/rdsk/san03dp_dbs05dp_ASM03
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM04
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM05
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM06
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM07
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM08
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

The ASM disks ASM03/04/05/06/07/08 showing the status "CANDIDATE" is somewhat worrying. But if these devices were not part of the DATA diskgroup before, they are not the main cause of ORA-15063.
Please check the ASM alert.log to see whether these devices belonged to DATA.

I'm more concerned about the following devices, as they show the status "IGNORED", which indicates that other devices matching the asm_diskstring parameter present the same disk information.
- ASM01/02/09/10/11

Chances are that the devices below show the same disk information as ASM01/02/09/10/11, and there is a good chance that these different paths point to the same physical devices.
~~
/dev/rdsk/c0d22s6
/dev/rdsk/c0d30s6
/dev/rdsk/c0d31s6
/dev/rdsk/c0d21s6
/dev/rdsk/c0d29s6
/dev/rdsk/c0d32s6
~~

Please check all disk headers to find which disks show duplicate ASM disk information, using the Perl script from the note below.
If duplicate paths point to the same physical device, the additional device path should be disabled using "chmod 000 <device_path>".
- KFED.PL for diagnosing - ORA-15063 ORA-15042 ORA-15020 (Doc ID 1346190.1)
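
The idea behind the duplicate check can be illustrated with a short Python sketch. It is not the KFED.PL script from the note; it only assumes that the text output of "kfed read <device>" has already been collected per path, and that two paths with the same kfdhdb.grpname and kfdhdb.dskname describe the same ASM disk. The function names are hypothetical.

```python
import re
from collections import defaultdict

# kfed prints lines such as "kfdhdb.dsknum:   75 ; 0x024: 0x004b":
# field name, colon, value, then a semicolon and the offset details.
KFED_LINE = re.compile(r"^([\w.\[\]]+):\s*(.*?)\s*;")


def parse_kfed(text: str) -> dict:
    """Turn kfed text output into a {field name: value string} dictionary."""
    fields = {}
    for line in text.splitlines():
        match = KFED_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def find_duplicate_paths(kfed_by_path: dict) -> dict:
    """Group device paths by (grpname, dskname) and keep groups with more than one path."""
    by_identity = defaultdict(list)
    for path, text in kfed_by_path.items():
        fields = parse_kfed(text)
        group = fields.get("kfdhdb.grpname")
        name = fields.get("kfdhdb.dskname")
        if not group or not name:
            continue  # no usable disk header (for example a zeroed block)
        by_identity[(group, name)].append(path)
    return {key: sorted(paths) for key, paths in by_identity.items() if len(paths) > 1}
```

Each key in the result is one ASM disk identity and each value lists the paths that present it; all but one path per identity would be candidates for the "chmod 000" step, once it has been confirmed that they point to the same physical device.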

↧

Need urgent help on ASM issue – disk header status problem

February 4, 2016, 8:02 pm

ODA system
+ In order to work around a known issue (startup hang when using Hitachi disks), the FE/customer was in the process of replacing the Hitachi drives on the system.
+ They pulled 2 disks out simultaneously and put new disks in.
+ The diskgroups (DATA and RECO) dismounted, as the diskgroups are built with NORMAL redundancy.
+ Clusterware went down and, realizing the problem, the customer reinstated the original disks.

Current issue:
The disk groups are not mounting.
ASM disks from slot 0 are not being seen by ASM.
ASM disks from slot 1 are being seen, but are reported as new disks (header status = CANDIDATE).

Mounting the diskgroup with the FORCE option has not helped either (because 1 disk from slot 0 is missing and 1 disk from slot 1 is reported as a candidate).
** The customer has no backup and needs to find out whether this is fixable or whether the system needs to be rebuilt from scratch.

--------------------------------------------------------------------------------
 Disk          Size Header    Path                                     Disk Group   User     Group   
================================================================================
   1:     491520 Mb CANDIDATE /dev/mapper/HDD_E0_S01_717882548p1       #            grid     asmadmin
   2:      75080 Mb CANDIDATE /dev/mapper/HDD_E0_S01_717882548p2       #            grid     asmadmin
   3:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S04_717894368p1       DATA         grid     asmadmin
   4:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S04_717894368p2       RECO         grid     asmadmin
   5:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S05_717844560p1       DATA         grid     asmadmin
   6:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S05_717844560p2       RECO         grid     asmadmin
   7:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S08_717882264p1       DATA         grid     asmadmin
   8:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S08_717882264p2       RECO         grid     asmadmin
   9:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S09_717844480p1       DATA         grid     asmadmin
  10:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S09_717844480p2       RECO         grid     asmadmin
  11:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S12_717844976p1       DATA         grid     asmadmin
  12:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S12_717844976p2       RECO         grid     asmadmin
  13:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S13_717845048p1       DATA         grid     asmadmin
  14:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S13_717845048p2       RECO         grid     asmadmin
  15:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S16_717895116p1       DATA         grid     asmadmin
  16:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S16_717895116p2       RECO         grid     asmadmin
  17:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S17_717888848p1       DATA         grid     asmadmin
  18:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S17_717888848p2       RECO         grid     asmadmin
  19:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S02_717825396p1       DATA         grid     asmadmin
  20:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S02_717825396p2       RECO         grid     asmadmin
  21:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S03_717894252p1       DATA         grid     asmadmin
  22:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S03_717894252p2       RECO         grid     asmadmin
  23:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S06_717886840p1       DATA         grid     asmadmin
  24:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S06_717886840p2       RECO         grid     asmadmin
  25:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S07_717888592p1       DATA         grid     asmadmin
  26:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S07_717888592p2       RECO         grid     asmadmin
  27:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S10_717843708p1       DATA         grid     asmadmin
  28:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S10_717843708p2       RECO         grid     asmadmin
  29:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S11_717852256p1       DATA         grid     asmadmin
  30:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S11_717852256p2       RECO         grid     asmadmin
  31:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S14_717895376p1       DATA         grid     asmadmin
  32:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S14_717895376p2       RECO         grid     asmadmin
  33:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S15_717843800p1       DATA         grid     asmadmin
  34:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S15_717843800p2       RECO         grid     asmadmin
  35:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S18_717882696p1       DATA         grid     asmadmin
  36:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S18_717882696p2       RECO         grid     asmadmin
  37:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S19_717849420p1       DATA         grid     asmadmin
  38:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S19_717849420p2       RECO         grid     asmadmin
  39:      70005 Mb MEMBER    /dev/mapper/SSD_E0_S20_805725574p1       REDO         grid     asmadmin
  40:      70005 Mb MEMBER    /dev/mapper/SSD_E0_S21_805708282p1       REDO         grid     asmadmin
  41:      70005 Mb MEMBER    /dev/mapper/SSD_E1_S22_805706766p1       REDO         grid     asmadmin
  42:      70005 Mb MEMBER    /dev/mapper/SSD_E1_S23_805706623p1       REDO         grid     asmadmin
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME                                                          
================================================================================
     +ASM1 /u01/app/11.2.0.3/grid                                               
     +ASM2 /u01/app/11.2.0.3/grid

What is the status of the backup header block?

kfed read <device_name> aunum=1 blknum=254

Does it show a proper header? If so, run the kfed repair command.

If the other blocks are fine and only the header is damaged, this will work; otherwise the next mount will crash while doing COD recovery.
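
For anyone who wants to inspect that backup block with dd as well, the byte offset addressed by aunum=1 blknum=254 can be worked out as aunum * AU size + blknum * block size. The Python sketch below assumes a 1 MB allocation unit and a 4 KB metadata block; with 256 blocks per AU, block 254 is then the second-to-last block of AU 1. The function name block_offset is hypothetical, and other AU sizes would need a different block number.

```python
def block_offset(aunum: int, blknum: int,
                 au_size: int = 1048576, blk_size: int = 4096) -> int:
    """Byte offset of metadata block `blknum` inside allocation unit `aunum`."""
    blocks_per_au = au_size // blk_size
    if not 0 <= blknum < blocks_per_au:
        raise ValueError(f"blknum must be between 0 and {blocks_per_au - 1}")
    return aunum * au_size + blknum * blk_size


offset = block_offset(1, 254)
print(offset)          # byte offset of the backup header block
print(offset // 4096)  # the same position expressed in 4 KB blocks (for dd skip=)
```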

ASM log file info
==============

NOTE: cache closing disk 0 of grp 1: (not open) _DROPPED_0000_DATA
ERROR: Disk 1 cannot be offlined, since all the disks [1, 0] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)

Here disks 0 and 1 have been put back in their respective slots, but the issue remains.

This seems similar to the case described in ORA-15042: ASM disk is missing after add disk took place (Doc ID 1529397.1)

↧

[Urgent] ORA-15042: ASM disk “76” is missing

February 4, 2016, 8:12 pm

A customer has an ASM problem with ORA-15042.
O/S: Linux x86 64-bit 2.6.18-194.el5
DB Version : 10.2.0.5

Although we can access the ASM header using kfed and dd, the ASM instance cannot read these devices.
For example, the ASM instance can read the 75th disk, but cannot read the 76th disk.

Have you experienced this before?

# Environment
LGEDGDMS01:/engn001/orasvc01/product/10.2.0] uname -a
Linux LGEDGDMS01 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

# Error
SQL> startup
ASM instance started

Total System Global Area 130023424 bytes
Fixed Size                  2094544 bytes
Variable Size             102763056 bytes
ASM Cache                  25165824 bytes
ORA-15042: ASM disk "23" is missing
ORA-15042: ASM disk "22" is missing
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "88" is missing
…
ORA-15042: ASM disk "77" is missing
ORA-15042: ASM disk "76" is missing   ==> 76th device
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "88" is missing
ORA-15042: ASM disk "87" is missing
…
ORA-15042: ASM disk "81" is missing

SQL> show parameter asm_diskstring
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/mapper/mpath_asm*

# v$asm_disks results.
select name, group_number,disk_number, path, state, header_status from v$asm_disk order by disk_number
/
NAME       GROUP_NUMBER DISK_NUMBER PATH                                     STATE    HEADER_STATUS
---------- ------------ ----------- ---------------------------------------- -------- -------------
                      0          66 /dev/mapper/mpath_asm129p1               NORMAL   MEMBER
                      0          67 /dev/mapper/mpath_asm130p1               NORMAL   MEMBER
                      0          68 /dev/mapper/mpath_asm131p1               NORMAL   MEMBER
                      0          69 /dev/mapper/mpath_asm132p1               NORMAL   MEMBER
                      0          70 /dev/mapper/mpath_asm133p1               NORMAL   MEMBER
                      0          71 /dev/mapper/mpath_asm134p1               NORMAL   MEMBER
                      0          72 /dev/mapper/mpath_asm135p1               NORMAL   MEMBER
                      0          73 /dev/mapper/mpath_asm136p1               NORMAL   MEMBER
                      0          74 /dev/mapper/mpath_asm137p1               NORMAL   MEMBER
                      0          75 /dev/mapper/mpath_asm138p1               NORMAL   MEMBER
                      0          89 /dev/mapper/mpath_asm063p1               NORMAL   MEMBER   ==> Cannot see the 76th device
                      0          90 /dev/mapper/mpath_asm064p1               NORMAL   MEMBER
                      0          91 /dev/mapper/mpath_asm065p1               NORMAL   MEMBER
                      0          92 /dev/mapper/mpath_asm066p1               NORMAL   MEMBER
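
The gap in the listing can be made explicit with a few lines of Python. This is only an illustrative sketch: the disk numbers are typed in from the rows shown above, and the function name missing_disk_numbers is hypothetical.

```python
def missing_disk_numbers(seen):
    """Return the disk numbers absent between the lowest and highest number seen."""
    present = set(seen)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# DISK_NUMBER values from the v$asm_disk rows shown above (66..75 and 89..92).
listed = list(range(66, 76)) + list(range(89, 93))
print(missing_disk_numbers(listed))
```

The numbers it reports fall in the same range as the disks named in the ORA-15042 messages (for example 76, 77 and 88), which suggests comparing the missing numbers against the kfdhdb.dsknum values read by kfed from each device.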
# Permission – OK
* 75th asm file (Good Device)
LGEDGDMS01:/engn001/orasvc01/product/10.2.0] ls -al /dev/mapper/mpath_asm138p1
brw-rw---- 1 orasvc01 dba 253, 248 Jan 30 17:06 /dev/mapper/mpath_asm138p1

* 76th the asm file (Cannot read this device)
LGEDGDMS01:/engn001/orasvc01/product/10.2.0] ls -al /dev/mapper/mpath_asm175
brw-rw---- 1 orasvc01 dba 253, 197 Jan 30 17:06 /dev/mapper/mpath_asm175

# kfed result – OK
* 75th asm file (Good Device)
+ /engn001/orasvc01/product/10.2.0/bin/kfed read /dev/mapper/mpath_asm138p1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483723 ; 0x008: TYPE=0x8 NUMB=0x4b
kfbh.check:                  2774762225 ; 0x00c: 0xa56382f1
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:ORCLDISKASMDISK138 ; 0x000: length=18
kfdhdb.driver.reserved[0]:   1145918273 ; 0x008: 0x444d5341
kfdhdb.driver.reserved[1]:    827020105 ; 0x00c: 0x314b5349
kfdhdb.driver.reserved[2]:        14387 ; 0x010: 0x00003833
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                       75 ; 0x024: 0x004b                       ==> The 75th device
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:           DGDATA01_0075 ; 0x028: length=13
kfdhdb.grpname:                DGDATA01 ; 0x048: length=8
kfdhdb.fgname:            DGDATA01_0075 ; 0x068: length=13
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32973218 ; 0x0a8: HOUR=0x2 DAYS=0xd MNTH=0x8 YEAR=0x7dc
kfdhdb.crestmp.lo:           1898247168 ; 0x0ac: USEC=0x0 MSEC=0x13d SECS=0x12 MINS=0x1c
kfdhdb.mntstmp.hi:             32973219 ; 0x0b0: HOUR=0x3 DAYS=0xd MNTH=0x8 YEAR=0x7dc
kfdhdb.mntstmp.lo:           1163180032 ; 0x0b4: USEC=0x0 MSEC=0x12e SECS=0x15 MINS=0x11
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                   13892 ; 0x0c4: 0x00003644

* 76th asm file (Cannot read this device)
+ /engn001/orasvc01/product/10.2.0/bin/kfed read /dev/mapper/mpath_asm175
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483724 ; 0x008: TYPE=0x8 NUMB=0x4c
kfbh.check:                  2433973412 ; 0x00c: 0x91137ca4
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:ORCLDISKASMDISK175 ; 0x000: length=18
kfdhdb.driver.reserved[0]:   1145918273 ; 0x008: 0x444d5341
kfdhdb.driver.reserved[1]:    827020105 ; 0x00c: 0x314b5349
kfdhdb.driver.reserved[2]:        13623 ; 0x010: 0x00003537
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                       76 ; 0x024: 0x004c            ==> the 76th device, ASM instance cannot read this device.
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:           DGDATA01_0076 ; 0x028: length=13
kfdhdb.grpname:                DGDATA01 ; 0x048: length=8
kfdhdb.fgname:            DGDATA01_0076 ; 0x068: length=13
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32982981 ; 0x0a8: HOUR=0x5 DAYS=0x1e MNTH=0x1 YEAR=0x7dd
kfdhdb.crestmp.lo:            366295040 ; 0x0ac: USEC=0x0 MSEC=0x14e SECS=0x1d MINS=0x5
kfdhdb.mntstmp.hi:             32982981 ; 0x0b0: HOUR=0x5 DAYS=0x1e MNTH=0x1 YEAR=0x7dd
kfdhdb.mntstmp.lo:            366307328 ; 0x0b4: USEC=0x0 MSEC=0x15a SECS=0x1d MINS=0x5
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                   55572 ; 0x0c4: 0x0000d914
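
The hi/lo timestamp pairs in the two kfed outputs above can be decoded by hand, which makes it easier to compare when each disk header was created and last mounted. The following is a minimal Python sketch. The bit layout is an assumption inferred from the HOUR/DAYS/MNTH/YEAR and USEC/MSEC/SECS/MINS fields that kfed prints next to each raw value, and the function name is hypothetical.

```python
from datetime import datetime


def decode_kfed_timestamp(hi: int, lo: int) -> datetime:
    """Decode a kfed hi/lo timestamp pair into a datetime.

    Assumed layout (inferred from the kfed output, not from Oracle docs):
      hi: YEAR from bit 14 up, MNTH bits 10-13, DAYS bits 5-9, HOUR bits 0-4
      lo: MINS bits 26-31, SECS bits 20-25, MSEC bits 10-19, USEC bits 0-9
    """
    year = hi >> 14
    month = (hi >> 10) & 0xF
    day = (hi >> 5) & 0x1F
    hour = hi & 0x1F
    minute = (lo >> 26) & 0x3F
    second = (lo >> 20) & 0x3F
    millisecond = (lo >> 10) & 0x3FF
    microsecond = lo & 0x3FF
    return datetime(year, month, day, hour, minute, second,
                    millisecond * 1000 + microsecond)


# kfdhdb.crestmp of disk 75 and disk 76, copied from the kfed output
disk75_created = decode_kfed_timestamp(32973218, 1898247168)
disk76_created = decode_kfed_timestamp(32982981, 366295040)
print(disk75_created)  # 2012-08-13 02:28:18.317000
print(disk76_created)  # 2013-01-30 05:05:29.334000
```

Under this layout, disk 76 was created on 30 January 2013, months after disk 75, which fits the statement that it is a newly added disk.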

# Check the dd results – OK
* the 75th asm device (Good)
dd if=/dev/mapper/mpath_asm138p1 bs=4096|od -tx1z|more

0000000 01 82 01 01 00 00 00 00 4b 00 00 80 f1 82 63 a5 >……..K…..c.<
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
0000040 4f 52 43 4c 44 49 53 4b 41 53 4d 44 49 53 4b 31 >ORCLDISKASMDISK1<
0000060 33 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >38…………..<
0000100 00 00 10 0a 4b 00 02 03 44 47 44 41 54 41 30 31 >….K…DGDATA01<
0000120 5f 30 30 37 35 00 00 00 00 00 00 00 00 00 00 00 >_0075………..<
0000140 00 00 00 00 00 00 00 00 44 47 44 41 54 41 30 31 >……..DGDATA01<
0000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
0000200 00 00 00 00 00 00 00 00 44 47 44 41 54 41 30 31 >……..DGDATA01<
0000220 5f 30 30 37 35 00 00 00 00 00 00 00 00 00 00 00 >_0075………..<
0000240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
*
0000300 00 00 00 00 00 00 00 00 a2 21 f7 01 00 f4 24 71 >………!….$q<
0000320 a3 21 f7 01 00 b8 54 45 00 02 00 10 00 00 10 00 >.!….TE……..<
0000340 80 bc 01 00 44 36 00 00 02 00 00 00 01 00 00 00 >….D6……….<
0000360 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
0000400 00 00 10 0a 14 cd f6 01 00 2c 95 00 00 00 00 00 >………,……<
0000420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
*
0010000 01 82 02 01 01 00 00 00 4b 00 00 80 de 63 17 81 >……..K….c..<
0010020 af e0 35 00 00 00 00 00 00 00 00 00 00 00 00 00 >..5………….<
0010040 00 00 00 00 fe 00 20 00 c0 01 00 01 c0 01 00 01 >…… ………<
0010060 c0 01 00 01 c0 01 00 01 c0 01 00 01 c0 01 01 01 >…………….<
0010100 c0 01 00 01 c0 01 00 01 c0 01 00 01 c0 01 00 01 >…………….<
0010120 c0 01 00 01 c0 01 01 01 c0 01 01 01 c0 01 01 01 >…………….<
0010140 c0 01 01 01 c0 01 01 01 c0 01 01 01 c0 01 01 01 >…………….<
*
0010240 c0 01 01 01 04 00 01 01 00 00 00 00 00 00 00 00 >…………….<
0010260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
*
0020000 01 82 03 01 02 00 00 00 4b 00 00 80 ce 10 bd 80 >……..K…….<
0020020 df ad 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
0020040 00 00 00 00 c0 01 00 00 08 00 08 00 00 00 c0 01 >…………….<
0020060 10 00 10 00 00 00 00 00 18 00 18 00 00 00 00 00 >…………….<
0020100 20 00 20 00 00 00 00 00 00 00 00 00 00 00 80 00 > . ………….<
0020120 00 00 00 00 00 00 80 00 d9 0b 00 00 18 01 80 00 >…………….<

* 76th device (Read Failure)
dd if=/dev/mapper/mpath_asm175 bs=4096|od -tx1z|more

0000000 01 82 01 01 00 00 00 00 4c 00 00 80 a4 7c 13 91 >……..L….|..<
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
0000040 4f 52 43 4c 44 49 53 4b 41 53 4d 44 49 53 4b 31 >ORCLDISKASMDISK1<
0000060 37 35 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >75…………..<
0000100 00 00 10 0a 4c 00 02 03 44 47 44 41 54 41 30 31 >….L…DGDATA01<
0000120 5f 30 30 37 36 00 00 00 00 00 00 00 00 00 00 00 >_0076………..<
0000140 00 00 00 00 00 00 00 00 44 47 44 41 54 41 30 31 >……..DGDATA01<
0000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
0000200 00 00 00 00 00 00 00 00 44 47 44 41 54 41 30 31 >……..DGDATA01<
0000220 5f 30 30 37 36 00 00 00 00 00 00 00 00 00 00 00 >_0076………..<
0000240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
*
0000300 00 00 00 00 00 00 00 00 c5 47 f7 01 00 38 d5 15 >………G…8..<
0000320 c5 47 f7 01 00 68 d5 15 00 02 00 10 00 00 10 00 >.G…h……….<
0000340 80 bc 01 00 14 d9 00 00 02 00 00 00 01 00 00 00 >…………….<
0000360 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
0000400 00 00 10 0a 14 cd f6 01 00 2c 95 00 00 00 00 00 >………,……<
0000420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
*
0000660 00 00 00 00 00 00 00 00 02 ec 44 ff 00 00 00 00 >……….D…..<
0000700 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
*
0010000 01 82 02 01 01 00 00 00 4c 00 00 80 c3 62 4b 80 >……..L….bK.<
0010020 65 e0 35 00 00 00 00 00 00 00 00 00 00 00 00 00 >e.5………….<
0010040 00 00 00 00 fe 00 7d 00 c0 01 00 01 c0 01 00 01 >……}………<
0010060 c0 01 00 01 c0 01 00 01 c0 01 00 01 c0 01 00 01 >…………….<
*
0010460 c0 01 01 01 c0 01 01 01 c0 01 01 01 c0 01 01 01 >…………….<
*
0011020 c0 01 01 01 c0 01 01 01 14 00 01 01 00 00 00 00 >…………….<
0011040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
*
0020000 01 82 03 01 02 00 00 00 4c 00 00 80 f0 45 ff 80 >……..L….E..<
0020020 9a d8 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 >…………….<
0020040 00 00 00 00 c0 01 00 00 08 00 08 00 00 00 c0 01 >…………….<
0020060 10 00 10 00 00 00 00 00 18 00 18 00 00 00 00 00 >…………….<
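
The first bytes of the dd dump can be cross-checked against the kfed output. The sketch below parses the first 96 bytes of /dev/mapper/mpath_asm175 as copied from the dump above. The field offsets are taken from the offsets kfed prints, assuming the kfbh block header is 32 bytes and all integers are little-endian; the function and dictionary key names are hypothetical.

```python
import struct

# First 96 bytes of /dev/mapper/mpath_asm175, copied from the od dump.
HEADER_HEX = """
01 82 01 01 00 00 00 00 4c 00 00 80 a4 7c 13 91
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4f 52 43 4c 44 49 53 4b 41 53 4d 44 49 53 4b 31
37 35 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 10 0a 4c 00 02 03 44 47 44 41 54 41 30 31
5f 30 30 37 36 00 00 00 00 00 00 00 00 00 00 00
"""

KFBH_SIZE = 32  # assumption: kfbh.* fields occupy offsets 0x000-0x01f


def c_string(raw: bytes) -> str:
    """Return the ASCII text before the first NUL byte."""
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_disk_header(raw: bytes) -> dict:
    """Extract a few kfbh/kfdhdb fields from the start of an ASM disk."""
    # kfbh: endian, hard, type, datfmt (1 byte each), then blk, obj, check
    _endian, _hard, blk_type, _datfmt, _blk, obj, check = struct.unpack_from(
        "<4B3I", raw, 0)
    # kfdhdb: compat at 0x020, dsknum at 0x024, grptyp at 0x026, hdrsts at 0x027
    compat, dsknum, grptyp, hdrsts = struct.unpack_from(
        "<IHBB", raw, KFBH_SIZE + 0x20)
    return {
        "type": blk_type,
        "obj": obj,
        "check": check,
        "provstr": c_string(raw[KFBH_SIZE:KFBH_SIZE + 0x20]),
        "compat": compat,
        "dsknum": dsknum,
        "grptyp": grptyp,
        "hdrsts": hdrsts,
        "dskname": c_string(raw[KFBH_SIZE + 0x28:KFBH_SIZE + 0x48]),
    }


header = parse_disk_header(bytes.fromhex("".join(HEADER_HEX.split())))
print(header["dsknum"], header["hdrsts"], header["dskname"])
```

The parsed values agree with kfed: disk number 76, group type 2 (KFDGTP_NORMAL), header status 3 (KFDHDR_MEMBER), and check value 0x91137ca4. In other words, the raw header on the device looks like a normal member header.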

kfod status=true asm_diskstring='/dev/mapper/mpath_asm*' disk=ALL
--------------------------------------------------------------------------------
Disk          Size Header    Path
================================================================================
1:      13893 Mb CANDIDATE /dev/mapper/mpath_asm001
2:      13892 Mb MEMBER    /dev/mapper/mpath_asm001p1
3:      13893 Mb CANDIDATE /dev/mapper/mpath_asm002
4:      13892 Mb MEMBER    /dev/mapper/mpath_asm002p1
5:      13893 Mb CANDIDATE /dev/mapper/mpath_asm003
6:      13892 Mb MEMBER    /dev/mapper/mpath_asm003p1
7:      13893 Mb CANDIDATE /dev/mapper/mpath_asm004
…
274:      13892 Mb MEMBER    /dev/mapper/mpath_asm137p1
275:      13893 Mb CANDIDATE /dev/mapper/mpath_asm138
276:      13892 Mb MEMBER    /dev/mapper/mpath_asm138p1   ==> MEMBER
277:      13893 Mb CANDIDATE /dev/mapper/mpath_asm139
278:      13892 Mb MEMBER    /dev/mapper/mpath_asm139p1
…
343:      62400 Mb CANDIDATE /dev/mapper/mpath_asm172
344:      62393 Mb MEMBER    /dev/mapper/mpath_asm172p1
345:      62400 Mb CANDIDATE /dev/mapper/mpath_asm173
346:      62393 Mb MEMBER    /dev/mapper/mpath_asm173p1
347:      62400 Mb CANDIDATE /dev/mapper/mpath_asm174
348:      62393 Mb MEMBER    /dev/mapper/mpath_asm174p1
349:      55572 Mb CANDIDATE /dev/mapper/mpath_asm175    ==> CANDIDATE~!
350:      55572 Mb CANDIDATE /dev/mapper/mpath_asm176
351:      55572 Mb CANDIDATE /dev/mapper/mpath_asm177
352:      55572 Mb CANDIDATE /dev/mapper/mpath_asm178
353:      55572 Mb CANDIDATE /dev/mapper/mpath_asm179

I found one difference. The newly added devices do not have a partition table, whereas the original ASM disks do (they are used through their p1 partitions).
I suspect that, because of this mistake by the storage engineer, kfed reports the header as "MEMBER" while kfod reports the disk as "CANDIDATE".
I will replace the added disks with disks that have partition tables.
If that succeeds, I will report back.
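
One quick way to check whether a device carries an MBR-style partition table is to look for the boot signature bytes 55 aa at offset 510 of its first sector. The sketch below does that in Python; the function names are hypothetical, and the test only detects an MBR or a GPT protective MBR, so treat a positive result as a hint rather than proof. In the dd dump of /dev/mapper/mpath_asm175 above, everything from offset 0000700 up to 0010000 (octal) is zero, so offset 510 does not hold this signature.

```python
MBR_SIGNATURE_OFFSET = 510
MBR_SIGNATURE = b"\x55\xaa"


def has_mbr_signature(first_sector: bytes) -> bool:
    """Return True if the 512-byte sector ends with the MBR boot signature."""
    if len(first_sector) < 512:
        raise ValueError("need at least the first 512 bytes of the device")
    start = MBR_SIGNATURE_OFFSET
    return first_sector[start:start + 2] == MBR_SIGNATURE


def device_has_mbr_signature(path: str) -> bool:
    """Read sector 0 of a block device (needs read permission) and test it."""
    with open(path, "rb") as dev:
        return has_mbr_signature(dev.read(512))


# Synthetic sectors for illustration only
blank_sector = bytes(512)
partitioned_sector = bytes(510) + MBR_SIGNATURE
print(has_mbr_signature(blank_sector), has_mbr_signature(partitioned_sector))
```

On a live system, `device_has_mbr_signature("/dev/mapper/mpath_asm175")` would be run against the whole device, not against a p1 partition.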

# Reference
(Doc ID 580153.1) How To Setup ASM on Linux Using ASMLIB Disks, Raw Devices or Block Devices?
In order to use a disk (e.g. SAN) in Automatic Storage Management, the disk must have a partition table.

↧

ASM diskgroup can't mount and drop

February 4, 2016, 8:17 pm

I have a customer who encountered the error below after restarting the database and the storage.

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "3" is missing
 
We then found that /dev/raw/raw9 is in CANDIDATE status:
SQL> select path,HEADER_STATUS,MOUNT_STATUS,MODE_STATUS from v$asm_disk;
 
PATH            HEADER_STATU MOUNT_S MODE_ST
--------------- ------------ ------- -------
/dev/raw/raw9   CANDIDATE    CLOSED  ONLINE
/dev/raw/raw6   MEMBER       CLOSED  ONLINE
/dev/raw/raw7   MEMBER       CLOSED  ONLINE
/dev/raw/raw8   MEMBER       CLOSED  ONLINE
/dev/raw/raw1   FOREIGN      CLOSED  ONLINE
/dev/raw/raw4   FOREIGN      CLOSED  ONLINE
/dev/raw/raw3   FOREIGN      CLOSED  ONLINE
/dev/raw/raw2   FOREIGN      CLOSED  ONLINE
/dev/raw/raw5   FOREIGN      CLOSED  ONLINE 
 
When we checked the disk with kfed, we found that the blocks on /dev/raw/raw9 were invalid:
[oracle@DCSDB2 ~]$ kfed read /dev/raw/raw9 blkn=2 | grep KFBTYP
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
[oracle@DCSDB2 ~]$ kfed read /dev/raw/raw9 blkn=4 | grep KFBTYP
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
[oracle@DCSDB2 ~]$ kfed read /dev/raw/raw9 blkn=10 | grep KFBTYP
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
[oracle@DCSDB2 ~]$ kfed read /dev/raw/raw9 blkn=100 | grep KFBTYP
kfbh.type:
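
When many blocks have to be checked this way, it can help to parse the kfed output instead of reading it by eye. The following is a minimal Python sketch that extracts the numeric value and decoded name of kfbh.type from kfed output. The line format is taken from the output shown in this thread; the function name is hypothetical, and the sketch works on captured text rather than calling kfed itself.

```python
import re

# field: value ; offset: decoded text   (format as printed by kfed above)
KFED_LINE = re.compile(
    r"^(?P<field>[\w.\[\]]+):\s*(?P<value>[^;\s]*)\s*;"
    r"\s*(?P<offset>0x[0-9a-fA-F]+):\s*(?P<decoded>.*)$"
)


def kfed_block_type(output: str):
    """Return (value, name) of kfbh.type, or None if it is absent or truncated."""
    for line in output.splitlines():
        match = KFED_LINE.match(line.strip())
        if match and match.group("field") == "kfbh.type":
            return int(match.group("value")), match.group("decoded").strip()
    return None


sample = "kfbh.type:                            0 ; 0x002: KFBTYP_INVALID"
print(kfed_block_type(sample))        # (0, 'KFBTYP_INVALID')
print(kfed_block_type("kfbh.type:"))  # None, the line is truncated
```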

Right now we want to remove or drop /dev/raw/raw9 and bring the database up, but we cannot drop it in the normal way because the diskgroup containing /dev/raw/raw9 cannot be mounted.

My question is: how can we drop or remove /dev/raw/raw9 (the customer has made clear that there is no data, or no important data, on this device) and bring the database and ASM up?

[root@DCSDB1 ~]#   ls -l /dev/raw/*
crw------- 1 root   oinstall 162, 1 01-27 06:37 /dev/raw/raw1
crw------- 1 root   oinstall 162, 2 01-27 06:37 /dev/raw/raw2
crw------- 1 oracle oinstall 162, 3 01-27 06:37 /dev/raw/raw3
crw------- 1 oracle oinstall 162, 4 01-27 06:37 /dev/raw/raw4
crw------- 1 oracle oinstall 162, 5 01-27 06:37 /dev/raw/raw5
crw------- 1 oracle oinstall 162, 6 01-27 06:37 /dev/raw/raw6
crw------- 1 oracle oinstall 162, 7 01-27 06:37 /dev/raw/raw7
crw------- 1 oracle oinstall 162, 8 01-27 06:37 /dev/raw/raw8
crw------- 1 oracle oinstall 162, 9 01-27 06:37 /dev/raw/raw9

[root@DCSDB1 ~]# cat /etc/sysconfig/rawdevices
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw1      /dev/mapper/oravg-ocr1
/dev/raw/raw2      /dev/mapper/oravg-ocr2
/dev/raw/raw3      /dev/mapper/oravg-vot1
/dev/raw/raw4      /dev/mapper/oravg-vot2
/dev/raw/raw5      /dev/mapper/oravg-vot3
/dev/raw/raw6     /dev/mapper/oravg-data1
/dev/raw/raw7     /dev/mapper/oravg-data2
/dev/raw/raw8     /dev/mapper/oravg-data3
/dev/raw/raw9     /dev/mapper/oravg-data5

[root@DCSDB1 tmp]# cat /proc/partitions
major minor  #blocks  name

  8     0 1754880000 sda
  8     1     514048 sda1
  8     2 1754362260 sda2
  8    16  262144000 sdb
  8    32  262144000 sdc
  8    48  262144000 sdd
  8    64  262144000 sde
  8    80  262144000 sdf
  8    96  262144000 sdg
  8   112  262144000 sdh
  8   128  262144000 sdi
  8   144  262144000 sdj
  8   160  262144000 sdk
  8   176  262144000 sdl
  8   192  262144000 sdm
  8   208  262144000 sdn
  8   224  262144000 sdo
  8   240  262144000 sdp
 65     0  262144000 sdq
 65    16  262144000 sdr
 65    32  262144000 sds
 65    48  262144000 sdt
 65    64  262144000 sdu
 65    80  262144000 sdv
 65    96  262144000 sdw
 65   112  262144000 sdx
 65   128  262144000 sdy
 65   144  262144000 sdz
 65   160  262144000 sdaa
 65   176  262144000 sdab
 65   192  262144000 sdac
 65   208  262144000 sdad
 65   224  262144000 sdae
 65   240  262144000 sdaf
 66     0  262144000 sdag
 66    16  262144000 sdah
 66    32  262144000 sdai
 66    48  262144000 sdaj
 66    64  262144000 sdak
 66    80  262144000 sdal
 66    96  262144000 sdam
 66   112  262144000 sdan
 66   128  262144000 sdao
 66   144  262144000 sdap
 66   160  262144000 sdaq
 66   176  262144000 sdar
 66   192  262144000 sdas
 66   208  262144000 sdat
 66   224  262144000 sdau
 66   240  262144000 sdav
 67     0  262144000 sdaw
253     0    1048576 dm-0
253     1   52428800 dm-1
253     2   10485760 dm-2
253     3   10485760 dm-3
253     4   10485760 dm-4
253     5   10485760 dm-5
253     6   10485760 dm-6
253     7   33554432 dm-7
253     8 1073741824 dm-8
253     9  262144000 dm-9
253    10  262144000 dm-10
253    11  262144000 dm-11
253    12  262144000 dm-12
253    13  262144000 dm-13
253    14  262144000 dm-14
253    15  262144000 dm-15
253    16  262144000 dm-16
253    17  262144000 dm-17
253    18  262144000 dm-18
253    19  262144000 dm-19
253    20  262144000 dm-20
253    21     512000 dm-21
253    22     512000 dm-22
253    23     512000 dm-23
253    24     512000 dm-24
253    25     512000 dm-25
253    26  157286400 dm-26
253    27  157286400 dm-27
253    28  157286400 dm-28
253    29  157286400 dm-29
253    30  157286400 dm-30
253    31  157286400 dm-31
253    32  157286400 dm-32
253    33  157286400 dm-33
253    34  157286400 dm-34
253    35  157286400 dm-35
253    36  157286400 dm-36
253    37  157286400 dm-37
253    38  157286400 dm-38
253    39  157286400 dm-39
253    40  157286400 dm-40
253    41  157286400 dm-41
253    42  157286400 dm-42
253    43  157286400 dm-43
253    44  157286400 dm-44
 
[oracle@DCSDB2 dbs]$ cat /app/admin/+ASM/pfile/init.ora
 
 
##############################################################################
# Copyright (c) 1991, 2001, 2002 by Oracle Corporation
##############################################################################
 
###########################################
# Cluster Database
###########################################
cluster_database=true
 
###########################################
# Diagnostics and Statistics
###########################################
background_dump_dest=/app/admin/+ASM/bdump
core_dump_dest=/app/admin/+ASM/cdump
user_dump_dest=/app/admin/+ASM/udump
 
###########################################
# Miscellaneous
###########################################
instance_type=asm
 
###########################################
# Pools
###########################################
large_pool_size=12M
 
###########################################
# Security and Auditing
###########################################
remote_login_passwordfile=exclusive
 
 
asm_diskgroups='DATA'
 
+ASM2.instance_number=2
+ASM1.instance_number=1

I'm assuming the following are true:

- The DATA diskgroup redundancy is either NORMAL or HIGH.
- The redundancy is NOT external.
- You have a recent backup of the database.

If this is the case, then do the following:

1. Mount FORCE

alter diskgroup DATA mount force;

Let it mount.

2. Inspect that everything is there.

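3. Force-drop the disk. DROP DISK takes the ASM disk name (the NAME column of v$asm_disk), not the device path; <disk_name> below is a placeholder for that name.

alter diskgroup DATA drop disk <disk_name> force;

4. Issue a rebalance if one does not kick off automatically.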

alter diskgroup DATA rebalance;

and let it finish.

From the SQL language documentation for ALTER DISKGROUP:

In the FORCE mode, Oracle ASM attempts to mount the disk group even if it cannot discover all of the devices that belong to the disk group. This setting is useful if some of the disks in a normal or high redundancy disk group became unavailable while the disk group was dismounted. When MOUNT FORCE succeeds, Oracle ASM takes the missing disks offline.
If Oracle ASM discovers all of the disks in the disk group, then MOUNT FORCE fails. Therefore, use the MOUNT FORCE setting only if some disks are unavailable. Otherwise, use NOFORCE.
In normal- and high-redundancy disk groups, disks from one failure group can be unavailable and MOUNT FORCE will succeed. Also in high-redundancy disk groups, two disks in two different failure groups can be unavailable and MOUNT FORCE will succeed. Any other combination of unavailable disks causes the operation to fail, because Oracle ASM cannot guarantee that a valid copy of all user data or metadata exists on the available disks.
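
The rules quoted above can be written down as a small decision function. This is only a sketch of the documented wording, not of how ASM actually decides: the function name, the (disk, failure group) input format, and the disk names in the example are hypothetical, and the external redundancy case follows the "all disks must be located" rule quoted later in this thread.

```python
from typing import List, Tuple


def mount_force_can_succeed(redundancy: str,
                            missing: List[Tuple[str, str]]) -> bool:
    """Apply the documented MOUNT FORCE rules.

    missing: (disk_name, failure_group) pairs for disks ASM cannot discover.
    """
    redundancy = redundancy.upper()
    if not missing:
        # All disks discovered: MOUNT FORCE fails, NOFORCE should be used.
        return False
    if redundancy not in ("NORMAL", "HIGH"):
        # External redundancy: every disk must be located to mount.
        return False
    failure_groups = {failgroup for _, failgroup in missing}
    if len(failure_groups) == 1:
        # Disks from one failure group may be unavailable.
        return True
    if redundancy == "HIGH" and len(missing) == 2 and len(failure_groups) == 2:
        # High redundancy also tolerates two disks in two failure groups.
        return True
    return False


# Hypothetical example: one missing disk that is its own failure group
print(mount_force_can_succeed("NORMAL", [("DATA_0003", "DATA_0003")]))
print(mount_force_can_succeed("EXTERNAL", [("DATA_0003", "DATA_0003")]))
```

For the case in this thread, the answer therefore depends entirely on whether DATA is a normal/high redundancy diskgroup or an external one.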

Are you sure it's external? I don't mean to suggest you wouldn't know, but here is a sure way to find out.

There is a tool called amdu, and it's in your grid home. This is 11gR2, correct? If so, you can do the following:

$ORACLE_HOME/bin/amdu

It will create an amdu directory with the current date, and in that directory it creates a file called report.txt. The report lists all of the disks belonging to the DATA disk group. One of the fields for each disk is redundancy. If it's set to 0 or 1, I believe you're external. If it's set to 2 or 3, you're NORMAL or HIGH.

I don't know how an external redundancy ASM diskgroup can be recovered.

From the ASM doc:

External redundancy
Oracle ASM does not provide mirroring redundancy and relies on the storage system to provide RAID functionality. Any write error causes a forced dismount of the disk group. All disks must be located to successfully mount the disk group.

I will let someone else comment, but you may have to restore and recover the database.

If only the ASM disk header was corrupted, and not the whole disk, it might be worth trying to recover just the disk header. This makes sense in an external redundancy diskgroup, since you can't access the data anyway. Search MOS for how to do this.

↧

ORA-00600 [3020] when break remote mirror and startup database

February 4, 2016, 9:03 pm

1. The customer is using HDS remote mirroring as a DR solution. After breaking the mirror, some databases at the DR site cannot start up, with the following errors:

a1.
ORA-01122: database file 2 failed verification check
ORA-01110: data file 2: '+DATA07_AI401PO1/ai401po1/datafile/sysaux_01.dbf'
ORA-01207: file is more recent than control file - old control file

a2 (same database as a1, after some commands).
ORA-00600: internal error code, arguments: [3020], [5], [896], [20972416], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 5, block# 896, file offset is 7340032 bytes)
ORA-10564: tablespace UNDOTBS2
ORA-01110: data file 5: '+DATA07_AI401PO1/ai401po1/datafile/undotbs2_01.dbf'
ORA-10560: block type 'KTU UNDO BLOCK'

Resolved by “recover datafile 3”.

b.
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [],
[], [], [], [], [], [], []

Resolved by “recover database”.

Questions:
Q1. We understand from MOS notes 604683.1 and 784776.1 that the storage vendor (HDS) is responsible for meeting the Oracle requirements of "crash consistent" and "write ordering", and for the POC and procedure. However, given the above errors, how can we tell whether the broken mirror fulfills the Oracle requirements?

Q2. For (b) above, MOS note 393984.1 matches. It may happen even in a single-site crash recovery scenario. Is it an Oracle bug or expected behavior?

The database version is 11.2.0.2.

A1. The errors in a1 indicate that at least one datafile had a higher database checkpoint than the controlfile. It would have helped to get a controlfile dump and datafile header dumps to verify.

It’s not clear what commands were issued to get to the state of a2, but an ORA-600 [3020] probably means that a data block was behind the file header checkpoint info. In other words, based on file header info, we started recovery with logfile #N. But block 896 probably needed a redo record from logfile N-1 to be applied first. If recovering from an older backup of the data file worked, then that would give more weight to that theory. Note 30866.1 does list some bugs where you can still get ORA-600 [3020] during regular recovery though.
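
The arguments in the a2 error stack are consistent with each other, which can be verified with a little arithmetic. The sketch below decodes the third ORA-600 [3020] argument as a relative data block address, assuming the usual smallfile layout (upper 10 bits are the relative file number, lower 22 bits the block number) and an 8192-byte block size, which is inferred from the file offset in ORA-10567. The function name is hypothetical.

```python
RDBA_BLOCK_BITS = 22                      # assumption: smallfile tablespace
RDBA_BLOCK_MASK = (1 << RDBA_BLOCK_BITS) - 1


def decode_rdba(rdba: int):
    """Split a decimal RDBA into (relative file number, block number)."""
    return rdba >> RDBA_BLOCK_BITS, rdba & RDBA_BLOCK_MASK


file_no, block_no = decode_rdba(20972416)  # third argument of ORA-600 [3020]
block_size = 8192                          # assumption: 7340032 / 896
print(file_no, block_no)                   # 5 896
print(block_no * block_size)               # 7340032
```

The decoded pair (5, 896) matches the first two arguments of the error and the file# and block# reported by ORA-10567, and 896 * 8192 gives the reported file offset of 7340032 bytes.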

A2. You can also read bugs that reference ORA-600 [kcratr_scan_lastbwr]. The ORA-600 [kcratr1_lastbwr] seems to only be in 11.2.0.1 rather than in 11.2.0.2. Maybe the customer is really on 11.2.0.1 + PSU 2? Anyway it could be indicative of a stale mirror as noted in bug 9584943, but there are other bugs that I didn’t read in detail.

The customer has just told us that the EMC "consistent group" was not implemented, for some reason.

We are going to tell the customer that, in this break-remote-mirror DR solution, if the Oracle requirements of "crash consistent" and "write ordering" (MOS notes 604683.1 and 784776.1) cannot be met, then in the worst case the customer may not even be able to recover the database. Is this correct?

Recovery might work if they restore a prior backup and roll forward. :-)
But maybe full recovery from a backup could still result in transaction loss if the active online redo logs are also corrupt because of lost writes to the mirror those redo logs reside on.

↧

Oracle ORA-600 [25027]

February 20, 2016, 8:34 am


ERROR:

Format: ORA-600 [25027] [a] [b]

VERSIONS:
  versions 9.2 and above


ARGUMENTS:
  Arg [a]  Tablespace Number (TSN)
  Arg [b]  Decimal Relative Data Block Address (RDBA)

In 12c it includes Multitenant information:
  
  Arg [a]  0 if Multitenant is not enabled or 0 if there is not Root CDB session, 1 ROOT PDBID, otherwise PDBID top session
  Arg [b]  PDBID
  Arg [c]  Tablespace Number (TSN)
  Arg [d]  Decimal Relative Data Block Address (RDBA)



SUGGESTIONS:
  
 1. If Arg [b] (or Arg [d] in 12c), the RDBA, is 0 (zero), then this could be caused by fake indexes.

  The following query will list fake indexes:

     select do.owner,do.object_name, do.object_type,sysind.flags
     from dba_objects do, sys.ind$ sysind
     where do.object_id = sysind.obj#
     and bitand(sysind.flags,4096)=4096;

  If the above query returns any rows, check the objects involved and consider dropping them as they can cause this error. 

2. Run analyze table validate structure on the table referenced in the Current SQL statement in 
    the related trace file.

  If the Known Issues section below does not help in terms of identifying
  a solution, please submit the trace files and alert.log to Oracle 
  Support Services for further analysis.

  Known Issues:

NB Prob Bug Fixed Description
II 18878420 ORA-600 [25027] can occur with large datafiles using ASSM
I 18490543 12.1.0.2, 12.2.0.0 ORA-600 [25027][0][0] from ALTER TABLE .. MOVE with nosegment index
I 14576755 12.1.0.1.4, 12.1.0.2, 12.2.0.0 Corruption type ORA-600 errors from heavy concurrent DML on index cluster table
II 14010183 11.2.0.3.BP22, 11.2.0.4.2, 11.2.0.4.BP03, 12.1.0.1.4, 12.1.0.2, 12.2.0.0 ORA-600 [ktspfundo:objdchk_kcbgcur_3] in SMON after failed temp segment merge load
III 13503554 11.2.0.4, 12.2.0.0 Various ORA-600 errors crashing the apply process in a downstreams environment
II 13785716 11.2.0.4, 12.1.0.1 Intermittent ORA-600 [25027] during upgrade from 10.2 to 11.2
I 11661824 11.2.0.1.BP09 Assorted Dumps by SQL*LOADER using DIRECT and PARALLEL after exadata bp8 is applied
II 19171086 12.2.0.0 ORA-600 [25027] when local index has unusable index partitions
II 10067246 12.1.0.2, 12.2.0.0 ORA-600 [25027] ORA-7445 [kauxs_do_dml_cooperation] ORA-8102 during CREATE INDEX ONLINE
III 13505390 11.2.0.3.BP04, 11.2.0.4 ORA-600 [kkedsgettabblkcnt: null segment] / ORA-600 [25027] against PARTITION table with Delayed Segment Creation or Interval Partitioned Table
II 14138130 11.2.0.3.5, 11.2.0.3.BP13, 11.2.0.4, 12.1.0.1 SGA memory corruption / ORA-7445 when modifying uncompressed blocks of an HCC-compressed segment
II 13566938 11.2.0.3.4, 11.2.0.3.BP10, 11.2.0.4, 12.1.0.1 ORA-600 [kcbgtcr_1] / ORA-600 [kkpo_rcinfo_defstg:objnotfound] / ORA-600 [25027] against a Partitioned Table during Dynamic Sampling
II 13330018 11.2.0.4, 12.1.0.1 ora-600 [ktspfmb_add1], [4294959240] occurred, then cannot recover with ora-600[25027]
III 13103913 11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP03, 11.2.0.4, 12.1.0.1 ORA-600 [25027] [ts#] [1] or false ORA-1 during dml while index is being rebuilt online
II 12821418 11.2.0.3.8, 11.2.0.3.BP18, 11.2.0.4, 12.1.0.1 Direct NFS appears to be sending zero length windows to storage device. It may also cause Lost Writes
II 12619529 11.2.0.3.BP18, 11.2.0.4, 12.1.0.1 ORA-600[kdsgrp1] from SELECT on plugged in tablespace with FLASHBACK
II 12321309 11.2.0.4, 12.1.0.1 ORA-600 / ORA-8103 UNUSABLE state of partitioned index is not carried across by TABLESPACE transport using DataPump
III 10394825 11.2.0.3, 12.1.0.1 ORA-600[25027] [..] [0] inserting to ASSM segment
- 10329146 11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.1 Lost write in ASM with multiple DBWs and a disk is offlined and then onlined
+ II 10209232 11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.1 ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
+ IIII 9399991 11.1.0.7.5, 11.2.0.1.3, 11.2.0.1.BP04, 11.2.0.2, 12.1.0.1 Assorted Internal Errors and Dumps (mostly under kkpa*/kcb*) from SQL against partitioned tables
* III 9145541 11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.1 OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
E II 8837919 11.2.0.2, 12.1.0.1 DBV / RMAN enhanced to detect ASSM blocks with ktbfbseg but not ktbfexthd flag set as in Bug 8803762
III 8803762 11.1.0.7.6, 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 ORA-600[kdsgrp1], ORA-600[25027] or wrong results on 11g database upgrade from 9i
II 8716064 11.2.0.2, 12.1.0.1 Analyze Table Validate Structure fails on ADG standby with several errors
+ II 8597106 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 Lost Write in ASM when normal redundancy is used
II 7251049 11.2.0.1.BP08, 11.2.0.2, 12.1.0.1 Corruption in bitmap index introduced when using transportable tablespaces
- 8437213 10.2.0.4.3, 10.2.0.5, 11.1.0.7.7, 11.2.0.1 ASSM first level bitmap block corruption
III 8356966 11.2.0.1 ORA-7445 [kdr9ir2rst] by DBMS_ADVISOR or false ORA-1498 by ANALYZE on COMPRESS table
* III 8198906 10.2.0.5, 11.2.0.1 OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents
* III 7263842 10.2.0.4.2, 10.2.0.5, 11.1.0.7.1, 11.2.0.1 ORA-955 during CTAS / OERI [ktsircinfo_num1] / dictionary inconsistency for PARTITIONED Tables
- 6666915 10.2.0.5, 11.1.0.7, 11.2.0.1 OERI[25027] / dictionary corruption from concurrent partition DDL
- 6025993 10.2.0.5, 11.1.0.6 ORA-600 [25027] in flashback archiving queries
- 4925342 9.2.0.8, 10.2.0.3, 11.1.0.6 OERI [25027] / OERI [25012] on IOT analyze estimate statistics
* IIII 7190270 10.2.0.4.1, 10.2.0.5 Various ORA-600 errors / dictionary inconsistency from CTAS / DROP
- 4310371 9.2.0.8, 10.2.0.2 OERI [25027] from concurrent startup / shutdown in RAC
- 4177651 10.2.0.1 Row migration within a MERGE may OERI[25027]
- 4020195 10.1.0.5, 10.2.0.1 OERI 25027 can occur in RAC accessing transported tablespace
- 4000840 9.2.0.7, 10.1.0.4, 10.2.0.1 Update of a row with more than 255 columns can cause block corruption
II 3963135 10.1.0.5, 10.2.0.1 OERI[kcbgcur_3] / OERI:25027 during bitmap index updates
- 3829900 10.1.0.4, 10.2.0.1 OERI[25027] possible accessing index in 10g
- 2942185 9.2.0.6, 10.1.0.4, 10.2.0.1 Corruption occurs on direct path load into IOT with ADDED columns
II 3085057 10.1.0.2 ORA-600: [25027] from ALTER TABLE .. SHRINK SPACE CASCADE
- 2926182 9.2.0.5, 10.1.0.2 OERI[25027] / ORA-22922 accessing LOB columns in IOT in AFTER UPDATE trigger

↧

ASM Metadata Dump Utility (AMDU)

February 22, 2016, 7:48 pm

ASM Metadata Dump Utility (AMDU)

This is a functional description of a utility to quickly extract all the available metadata from one or more ASM disks and/or generate formatted printouts of individual blocks. The dump output can be shipped back to Oracle for analysis. The utility can be used at Oracle to generate formatted block printouts from the dump output. The utility does not require that any disk group is even mountable. It also has the ability to extract one or more files from an unmounted diskgroup and write them to the OS file system.

Operations

AMDU performs three different functions. A given execution of AMDU may perform one, two or all three of these functions.

1. Dump metadata from ASM disks to the OS file system for later analysis.

2. Extract the contents of an ASM file and write it to an OS file system even if the diskgroup is not mounted.

3. Print metadata blocks based on the C structures in the blocks, or in hex.

The input data may be the contents of the ASM disks, or it may be derived from a directory created by a previous run of AMDU. The options -diskstring and -exclude are used to specify ASM disks to read. The option -directory specifies a directory created by a previous run of AMDU. The directory may contain a copy of the original directory contents. These options are incompatible with each other.
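
The incompatibility between the two kinds of input can be made explicit when a wrapper script assembles the command line. The following Python sketch only builds the input-selection part of an amdu invocation and rejects the invalid combination. The option names come from the text above; the function name, the assumption that each option takes a single value, and the example values are hypothetical.

```python
from typing import List, Optional


def amdu_input_args(diskstring: Optional[str] = None,
                    exclude: Optional[str] = None,
                    directory: Optional[str] = None) -> List[str]:
    """Build the input-selection arguments for amdu.

    -diskstring/-exclude read live ASM disks; -directory reads a previous
    dump directory. The two kinds of input cannot be combined.
    """
    if directory is not None and (diskstring is not None or exclude is not None):
        raise ValueError(
            "-directory cannot be combined with -diskstring or -exclude")
    args = ["amdu"]
    if directory is not None:
        args += ["-directory", directory]
    if diskstring is not None:
        args += ["-diskstring", diskstring]
    if exclude is not None:
        args += ["-exclude", exclude]
    return args


print(amdu_input_args(diskstring="/dev/raw/*"))
```

Returning a list rather than a single string keeps the disk string from being expanded by a shell when the list is passed to a process launcher.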

Operational Phases

The basic steps of operation are listed in this section.

Command line options provide the ability to control which

phases are executed and how they operate.

1.Discover disks: This uses ASM discovery to find a set of

disks. The headers are read to determine which disks are

in which diskgroups. The disks to be scanned in the next

phase are chosen. The results of the discovery are put in

the report file. With the option -directory, reading the

existing report file rather than creating a new one

accomplishes this phase.

2.Scan disks: The allocation tables of disks are scanned.

Based on the allocation table entries and command line

options, interesting blocks are written to image files.

Map files are created describing the interesting AU's and

where they were written to the image files. If any files

are being extracted, their extent maps are constructed in

memory from the allocation table entries (the on-disk extent maps are

ignored). If any blocks are being printed, the location of

the blocks is saved in memory. With the -directory option

this phase is accomplished by reading the existing map

files rather than creating map and image files.

3.Extract files: The extent maps of files to extract are

sorted. The file data is read from the ASM disks and

written to output files. If -directory is specified for extraction of

an ASM metadata file, the map and image files are read to build

the extent maps.

4.Printout blocks: Formatted block printouts are written to

standard out along with information about how the block

data was read. A kfed command to dump the block on the

system where the report was generated is also printed.

With the -directory option the data is read from the

image files.

Output Files

Four types of output files are created by AMDU. They are

all placed in a new dump directory. The file names are

automatically generated by AMDU. A new dump directory is

created for each run so the output files can be easily

tarred and zipped to send back to Oracle. The name of the

directory is based on the time and date to one second

resolution. The directory name is written to standard out

before any files are created in the directory. Note that

the directory name is relative to the current directory

unless a full path name is specified on the command line

with the -parent option.

If AMDU is run with the -directory option then no dump

directory and no output files are created. Instead the

-directory option specifies the location of a previously

created dump directory. In this case -print can be

specified to generate formatted block printouts from the

previously created dump directory. The printouts are sent

to standard out rather than creating a new file. If -extract

is specified with -directory, -output is required to indicate

the location of the extracted file.

Extracted Files

One extracted file is created for every file listed under

the -extract option on the command line. Normally, the extracted

file is placed in the dump directory under the name

<group>_<number>.f where <group> is the diskgroup name in

uppercase, and <number> is the file number of the file specified

on the command line. The extracted file will appear to have the

same contents it would have if accessed through the

database. If some portion of the file is unavailable then

that portion of the output file will be filled with

0xBADFDA7A, and a message will appear on stderr.

The -output option can be used to extract a single file to

a specific file name rather than the dump directory. This

can be used in combination with -nodir option to avoid the

creation of a dump directory completely. If -directory is

specified, -output is required.
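
Since unavailable portions of an extracted file are filled with 0xBADFDA7A, an extracted file can be checked for that pattern after the fact. The following is a minimal Python sketch, not part of AMDU. It assumes the fill is written as aligned 32-bit words and checks both byte orders, because the byte order of the fill is not stated here. The function name find_fill_runs is made up for this illustration.

```python
import struct

FILL = 0xBADFDA7A  # fill value named in the description above


def find_fill_runs(data: bytes, word_size: int = 4):
    """Return (offset, length) runs of aligned words equal to the fill value.

    Assumption: the fill is written as aligned 32-bit words. Both byte
    orders are accepted because the byte order is not documented here.
    """
    patterns = {struct.pack(">I", FILL), struct.pack("<I", FILL)}
    runs = []
    start = None
    for off in range(0, len(data) - word_size + 1, word_size):
        if data[off:off + word_size] in patterns:
            if start is None:
                start = off          # a new run of fill words begins
        elif start is not None:
            runs.append((start, off - start))
            start = None
    if start is not None:            # run extends to the last whole word
        end = len(data) - len(data) % word_size
        runs.append((start, end - start))
    return runs


if __name__ == "__main__":
    good = b"\x00" * 8
    hole = struct.pack(">I", FILL) * 3
    print(find_fill_runs(good + hole + good))
```

A real data file could in principle contain these bytes legitimately, so a hit is a hint to compare against the message on stderr rather than proof of a missing extent.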

Image Files

Image files contain block images from the ASM disks. This

is the raw data that is copied from the disks. Since there

can be a lot of data, and some file systems have problems

with large files, an image file is always smaller than 2

gigabytes. When there is more than 2Gb of data, multiple

image files are created. An image file may contain data

from multiple disks, but only from disks that are part of

the same disk group (according to the disk's header). All

the data from one disk will be grouped together in the

image files (possibly spanning a file boundary). Blocks

from a single allocation unit will always be adjacent and

not span image files. Uninteresting data, such as empty

blocks, will not be dumped, so a partial AU might be in the

dump. Thus the size of a full image file is not constant.

Disks that have been dropped from a disk group will still

contain the group name in their header and may be included

in the image files for that disk group if the -former

option is specified. Note that, unlike mount, the PST is

not consulted to decide which disks are parts of the disk

group. Disks which were forcibly dropped will be included

even without the -former option.

Image file names are constructed from the group name and a

sequence number. The form is as follows where <group> is

the group name in uppercase, and <NNNN> is the sequence

number including leading zeroes. The first image file has

sequence number 0001.

<group>_<NNNN>.img

Map Files

Map files are ASCII files that describe the data in the

image files for a particular disk group. AMDU creates one

map file for each series of image files, i.e. one map file

per disk group. The map file contains one line for each

allocation unit that has contents dumped to an image file.

Some allocation units may have an entry in the map file

even though nothing was written to the image file. Every

line has the same fields of the same length. The lines are

in the order of the data in the image file, but contain

absolute references to the locations in the image file so

that they can be sorted into different orders without

losing track of where the AU is stored in the image files.

The following fields appear in each line. The fields are

separated by blanks. Each field starts with a unique letter

immediately followed by a decimal number with leading

zeroes. This should facilitate using sort and grep to

reorganize the map. In the following descriptions the

leading letter and the number of decimal digits are given

within parentheses. For example (D4) means the letter 'D'

followed by 4 decimal digits.

1.Disk Report Number (N4): Every disk discovered by shallow

discovery is assigned a disk report number. This number

is printed in the report file along with information

about the disk. Two disks from the same diskgroup with

the same disk number will still have different disk

report numbers. The first disk reported will have a disk

report number of 1.

2.Disk number (D4): This is the disk number field extracted

from the header. If the disk number is invalid or the

header unrecognizable this field is 9999.

3.Disk repeat (R2): Normally this is zero. It is possible

to find two disks for the same disk number in the same

disk group. The first repeat gets a repeat count of 1 for

its map file entries. If there are more than 100 disks

with the same number then extra digits will be printed

and the line sizes will be wrong. This is highly

unlikely.

4.Allocation Unit (A8): The AU within the disk where the

data was read. Note that this is different than the

extent number for physically addressed metadata since

extent 2 is near AU 113,000. If the disk is greater than

100 terabytes and the AU size is one megabyte, then this

field could exceed 8 digits.

5.File Number (F8): The ASM file that owns the extent. If

the number is less than 256 then this is ASM metadata or

an ASM registry. If this is physically addressed metadata

then the file number will be 00000000.

6.Indirect flag (I1): If this is a data extent for the file

then the indirect flag is 0. If this is an indirect

extent then this is 1.

7.Extent Number (E8): The physical extent number within the

file. This is the index in the file extent map that a

database instance would use to find this AU. If the file

was (two-way) mirrored then this is a primary extent if

the number is even, and a secondary copy if it is odd. If

this is an indirect extent then this is a value between 0

and 299 giving the index into the indirect extents. For

physically addressed metadata this is the extent within

the physically addressed metadata, not the AU within the

disk.

8.AU within extent (U2): Large extents are supported for

large files. Thus there could be multiple AU's dumped for

the same extent. Note that metadata files do not

currently use large extents so this only happens for user

file dumps to image files.

9.Block count (C5): The number of blocks copied to the

image file from the AU. A lot of space is saved by not

creating images of blocks that are just initialized

contents. This is particularly true for indirect extents

where most indirect extents will have only a few blocks

of extent pointers. If the extent is not dumped to the

image file then this is zero. The count is in ASM

metadata blocks, even if the file number is >256 and the

indirect flag is 0. This is normally 4K blocks, but could

be different in the future. With the -noimage option this

is always zero since no images are ever created.

10.Image File Sequence Number (S4): This is the NNNN

field of the image file name where blocks from the AU are

dumped. With the -noimage option this is always zero

since no image files are ever created.

11.Byte Offset in Image File (B10): This is the location

within the image file where the block images appear. It

is always a multiple of the ASM metadata block size.

Since the image file is always less than 2Gb this will

always fit in a 32 bit signed integer. Note that this

will be an offset to the end of the previously dumped AU

when the block count is zero. With the -noimage option

this is always zero since no images are ever created.

12.Corrupt Block Flag (X0): If any of the blocks in the

AU are corrupt, then the line will end with 'X'. Normally

this is a blank character so that the line ends in two

blanks.

This adds up to 56 digits, 12 letters, 11 blanks, and one

'\n' per line. This is a total of 80 characters including

the newline (79 without it).

The map files are named "<group>.map" where <group> is the

disk group name in uppercase.
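
The field layout described above is regular enough to parse with a few lines of code. The following Python sketch splits one map line into named fields and works out where the dumped blocks sit in the image files. The sample line is invented for illustration, the dictionary key names are my own, and the 4096-byte block size is the "normally 4K" value mentioned above rather than something read from the disk header.

```python
# (letter, name) pairs in the order the fields appear on a map line.
FIELDS = [
    ("N", "disk_report"), ("D", "disk"), ("R", "repeat"), ("A", "au"),
    ("F", "file"), ("I", "indirect"), ("E", "extent"), ("U", "au_in_extent"),
    ("C", "block_count"), ("S", "image_seq"), ("B", "byte_offset"),
]


def parse_map_line(line: str) -> dict:
    """Parse one map file line into a dict of integers plus a corrupt flag."""
    tokens = line.split()
    corrupt = bool(tokens) and tokens[-1] == "X"   # trailing 'X' marks corruption
    if corrupt:
        tokens.pop()
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    entry = {"corrupt": corrupt}
    for token, (letter, name) in zip(tokens, FIELDS):
        if token[0] != letter or not token[1:].isdecimal():
            raise ValueError(f"bad field {token!r}, expected letter {letter}")
        entry[name] = int(token[1:])
    return entry


def image_location(group: str, entry: dict, block_size: int = 4096):
    """Return (image file name, start byte, end byte), or None if nothing was dumped."""
    if entry["block_count"] == 0:
        return None
    name = f"{group.upper()}_{entry['image_seq']:04d}.img"
    start = entry["byte_offset"]
    return name, start, start + entry["block_count"] * block_size


if __name__ == "__main__":
    # Hypothetical line, built from the field widths listed above.
    sample = ("N0001 D0000 R00 A00000002 F00000001 I0 E00000000 "
              "U00 C00256 S0001 B0000000000  \n")
    entry = parse_map_line(sample)
    print(entry)
    print(image_location("data", entry))
```

Because every field carries its own letter, the same lines can also be filtered with grep or reordered with sort before parsing, as the description suggests.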

Report File

One report file is generated for every run of the utility

without the -directory option (except if -noreport is

specified). It is written to "report.txt" in the dump

directory. If -nodir is specified the report is written to

standard out instead of the dump directory name. Lines are

flushed to the report file as soon as they are generated so

tail -f can be used to monitor progress.

When AMDU is run with -print and -directory options then no

report is generated. Instead an existing report file must

be found and parsed. Information in the report file is used

instead of discovering the disks. The map file is used to

find the blocks to printout, and the block contents are

retrieved from the image files.

The report is divided into sections and subsections. Each

section begins with a title line. The title line has the

title centered and surrounded with '*'. There are always at

least three asterisks on either side of the title. A

subsection title is like a section title except that it is

surrounded with '-' rather than '*'.

Any errors reported by AMDU are also printed in the report

file. Warnings about unexpected conditions are printed in

upper case surrounded by "** ".
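
The title conventions above make it possible to pick section and subsection headers out of a report mechanically. The following Python sketch is one way to do that. It relies only on the rule that a title is surrounded by at least three '*' (section) or '-' (subsection) characters on each side. The sample title strings are invented, and real report lines may differ in detail.

```python
import re

# At least three identical fill characters, the title, then at least three more.
TITLE = re.compile(
    r"^(?P<fill>[*-])(?P=fill){2,}\s*(?P<title>.*?)\s*(?P=fill){3,}$"
)


def classify_title(line: str):
    """Return ("section" | "subsection", title) for a title line, else None."""
    match = TITLE.match(line.strip())
    if not match or not match.group("title"):
        return None
    kind = "section" if match.group("fill") == "*" else "subsection"
    return kind, match.group("title")


if __name__ == "__main__":
    # Hypothetical lines in the style described above.
    for text in ("******** AMDU Settings ********",
                 "----- DISK REPORT N0001 -----",
                 "Disk N0001 is ignored"):
        print(repr(text), "->", classify_title(text))
```

Warning lines surrounded by only two asterisks do not match, so they are not mistaken for section titles.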

The following describes the sections in the report file.

AMDU Setting

The first lines describe the environment where the dump was

created. This includes the time when the report was

generated and the endianness of the data in the image files.

The host name, platform, and software version are also

included.

The following subsections describe all the arguments from

the command line: operations, disk selection, reading

control, and output control. This is a report of the

settings that result from the command line parsing, not a

copy of the command line.

The CORE package LRM is used to parse the command line

arguments. No dump directory or report file is generated if

there are argument parsing errors or if the user is only

requesting help. Command line errors will result in an exit

status of 1 rather than 0. Problems reading disks or

extracting a file will be reported on stderr and the report

file. The exit status will be 5 in accordance with LPM

standards.

Discovery

This section describes every disk returned by discovery.

There is one subsection for each disk. The title contains

the disk report number. This is followed by the information

from shallow discovery. If deep discovery is done for the

disk, then the results of deep discovery are reported next.

A warning message may indicate that a disk is being

ignored.

If the -noscan option is specified then this is the end of

the report. If the -noread option is given then this is the

end of the report and there is no deep discovery

information for any of the disks.

Sleeping for Heartbeat

Unless the -noheart option is given, a section header is

reported containing the time sleeping for heartbeat

detection. This makes it likely that any disks which

contain a PST of a mounted diskgroup will have a heartbeat

detected. The section has no lines other than the section

header.

Diskgroup Scan

There is one section for every disk group encountered by

deep discovery and referenced in either a -dump, -extract

or -print option ("-dump all" references all diskgroups

mentioned in any valid disk headers). The name of the disk

group is in the section header. This is followed by

information gathered about the diskgroup during deep discovery. This

includes group wide parameters from the disk headers such

as AU size and creation time.

A disk scan subsection for each scanned disk in the

diskgroup follows the header. Disks that are ignored due to

deep discovery and/or command line options, do not have a

subsection. The subsection header includes the disk report

number. Some of the information from discovery is repeated

for convenience. This is reported before the scan begins.

Error messages and warnings, such as heartbeat detected,

may be reported during the scan. When the scan is complete

statistics from the scan are reported. This includes

information about data written to the map and image files.

Statistics such as space allocated and free are also

reported.

A group report subsection follows all the disk scan

subsections for the disks in the group. This subsection

gives cumulative statistics from all the disks in the disk

group.

Extracting File Sections

A section is reported for each file that is extracted. The

section header includes the diskgroup name and file number

from the -extract option. The name of the OS file created

by the extraction is on the first line of the section. Any

errors encountered are reported followed by statistics

about the extraction. If -directory is indicated, this info

will be written to stdout.

End of Report

The last line of a report is the end of report section

header.

Printing Blocks

The -print option can be used to generate a formatted

printout of blocks from a diskgroup that is scanned in this

run of AMDU or from a dump directory created by a previous

run of AMDU. Use the -directory option to print from a

previous AMDU run.

Output Format

The formatted output is sent to standard out rather than to

a file. A section header, as in a report file, is printed

for each -print option on the command line. The section

header includes the block specification for the printout.

There is one subsection for each count in the block

specification. The subsection title is "BLOCK n OF c" where

n is the number of this block (starting at one), and c the

count of blocks in the block specification.

There may be multiple blocks on disk that match the

criteria for printing in one subsection. This may be due to

multiple disks appearing to be the same ASM disk or it may

be due to the normal mirroring of data. With the -fullscan

option it is common to encounter old stale blocks that

match the same criteria. A block description is printed for

each block that matches the printing criteria. When the

block contents are identical, then multiple block

descriptions are printed before the formatted printout of

the block. If the blocks are different then there may be

multiple formatted printouts in one subsection.

A block description consists of three lines. The first line

is a separator of all dots. The second line gives the

location of the block both as (disk, AU, block) and (file,

extent, block). The third line is the kfed command that

would create the same formatted output. This is useful for

constructing a kfed command to patch the block. It includes

the device name of the disk on the system where the dump

was created. If the AMDU directory was copied from another

system then the kfed command will have to be run on the

other system.

Block Specification

There are five different kinds of <block_spec>'s for

specifying a range of blocks to printout. They all start

with a diskgroup name. The name is case insensitive but it

is converted to uppercase. The name is followed by values

specified by '.', letter, number. The letter indicates the

meaning of the number and may be upper or lower case. The

number is a decimal number less than 2^32. The last value

may be an optional count of blocks to print using the

letter 'C'. So if the last field is ".C4", then four blocks

will be printed starting at the first one specified by the

<block_spec>.

The five forms are as follows:

1.Report disk block: This form specifies a disk by its

discovery order and a block by AU and block within AU.

The disk report number is always unique, but it is hard

to know the number unless you have already run AMDU and

seen at least the shallow discovery report. The advantage

of this form is that it never refers to multiple blocks

since AMDU gives every disk a unique disk report number.

For example <block_spec> "DATA.N0001.A1.b0.c256" would

dump the entire PST AU from the first disk discovered

(providing it is in disk group DATA). Note that the

diskgroup name must match even though disk report numbers

are unique.

2.Group disk block: This is similar to report disk block

except that the ASM disk number is given rather than the

report disk number generated by AMDU. It is possible, but

a bad configuration, to see more than one disk with the

same ASM disk number for the same ASM disk group. If this

happens then this <block_spec> will refer to the blocks

on all the disks.

For example <block_spec> "Data.d2.A0.B0" would print the

disk header from disk 2 in diskgroup DATA. Also

<block_spec> "data.d2.a0.b256" and "data.d2.a1.b0" would

both print the PST header block of disk 2 in diskgroup

DATA (assuming an AU size of 1 MB and metadata block size

of 4096).

3.Extent file block: This form allows specification of a

block by a file physical extent number and block within

extent. When a file is mirrored there are two physical

extents for every virtual extent. This form allows

specification of only one mirror copy. It will support

printing of any file that is described by the map file.

However it is unlikely that a block dump will produce

anything but hex data for anything that is not an ASM

metadata file. Note that the block size is always the ASM

metadata block size no matter which file is being

printed. Note that any striping is not taken into account

when locating the block.

For example <block_spec> "flash.F3.X42.B0" would print

the secondary mirror copy of the checkpoint block of ACD

thread 2 in diskgroup FLASH. "Data.f3.x0.b0.c10752" would

print all the redo for thread 1 in diskgroup DATA (I hope

you have an empty file system).

4.Virtual file block: This form allows specification of a

block by its virtual block number within the file. Unless

this is an external redundancy disk group, all 3 copies

of the block are printed. If the copies are the same then

only one printout of the contents is generated. This form

is only allowed for ASM metadata files because the

redundancy can be determined from the diskgroup type, and

there is no striping.

For example <block_spec> "flash.F1.v2856" would print the

file directory block for file 2856 in diskgroup FLASH.

5.Extent map file block: This form allows specification of

a block in a file's extent map. The first 60 extent

pointers are in the file directory; the rest are in extent

map blocks with 480 pointers per map block. For example

<block_spec> "flash.f2856.m0.c427" would print the entire

extent map for a 200GB file number 2856.
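
Two of the examples above rest on simple arithmetic that is easy to check. With a 1 MB AU and 4096-byte metadata blocks there are 256 blocks per AU, which is why ".a0.b256" and ".a1.b0" name the same block. A 200GB file has 204800 one-megabyte extents; after the 60 pointers held in the file directory, the remainder needs 427 map blocks of 480 pointers each. The Python sketch below reproduces both numbers. It assumes one extent pointer per 1 MB AU, and the function names are my own.

```python
def blocks_per_au(au_size: int = 1 << 20, block_size: int = 4096) -> int:
    """Number of metadata blocks in one allocation unit."""
    return au_size // block_size


def normalize(au: int, block: int, per_au: int):
    """Fold a block number that runs past the end of an AU into (au, block)."""
    return au + block // per_au, block % per_au


def extent_map_blocks(extent_pointers: int,
                      in_directory: int = 60, per_block: int = 480) -> int:
    """Map blocks needed once the file directory's own pointers are used up."""
    remaining = max(0, extent_pointers - in_directory)
    return (remaining + per_block - 1) // per_block   # integer ceiling


if __name__ == "__main__":
    per_au = blocks_per_au()
    print(per_au)                               # blocks in a 1 MB AU
    print(normalize(0, 256, per_au))            # same block as AU 1, block 0
    extents = (200 * 2**30) // 2**20            # 200GB file, 1 MB per extent
    print(extents, extent_map_blocks(extents))
```

A mirrored file carries more pointers per virtual extent, so the count for a normal or high redundancy file would be larger than this unmirrored estimate.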

Command Line

AMDU uses the LRM package from CORE to parse its command

line. Thus it follows the LRM conventions. In particular it

follows the unix command style. The command line looks like

this:

amdu [ <option> ... ]

Some options require specification of a number or string

while others are boolean flags that do not require a value.

Some options may appear multiple times to provide multiple

values. String options are specified as follows:

-keyword string

Number options are specified as:

-keyword number

Note that a number may end in K, k, M, m, G, or g to

indicate kilo (2^10), mega(2^20), or giga (2^30).

Boolean flags are specified as:

-keyword

Note that the CORE package LRM is used to parse the command

line options. This means you can specify options as

keyword=value, but unless you are very clever and

understand completely how LRM works, you will get

unexpected results such as ignored parameters. Stick to

-keyword syntax and you will be fine.
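
The K/M/G suffix rule for number options can be illustrated with a short helper. This Python sketch only mirrors the rule as stated above (a trailing K, M or G in either case multiplies by 2^10, 2^20 or 2^30). It is not the LRM parser, and parse_number is a name made up for this example.

```python
SUFFIXES = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_number(text: str) -> int:
    """Convert a string such as '4K' or '1m' to an integer."""
    text = text.strip()
    if text and text[-1].lower() in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1].lower()]
    return int(text)


if __name__ == "__main__":
    for value in ("4K", "1m", "2G", "512"):
        print(value, "->", parse_number(value))
```

So an AU size of one megabyte could be written as 1M or 1048576.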

The options fall into four broad classes: operations, disk

selection, read control, and output control.

Operation

These parameters control the fundamental function of AMDU:

dumping metadata, extracting file contents, or printing

metadata blocks. If none of these are specified then only

discovery is performed (same as -noscan).

1.-dump <diskgroup>: This option specifies the name of a

diskgroup to have its metadata dumped. This option may be

specified multiple times to dump multiple diskgroups. If

the diskgroup name is "ALL" then all diskgroups

encountered will be dumped. The diskgroup name is not

case sensitive, but will be converted to uppercase for

all reports. If this option is not specified then no map

or image files will be created, but -extract and -print

may still work.

2.-extract <diskgroup>.<file>: This extracts the file (by name

or number) from the named diskgroup, case insensitive. This

option may be specified multiple times to extract

multiple files. The extracted file is placed in the dump

directory under the name <diskgroup>_<number>.f where

<diskgroup> is the diskgroup name in uppercase, and

<number> is the file number. The -output option may be

used to write the file to any location and is required

if -directory is specified. The extracted

file will appear to have the same contents it would have

if accessed through the database. If some portion of the

file is unavailable then that portion of the output file

will be filled with 0xBADFDA7A, and a message will appear

on stderr.

ASM metadata files                  Number  Name
FILE DIRECTORY                      1       FILE
ASM DISK DIRECTORY                  2       ASMDISK
ACTIVE CHANGE DIRECTORY             3       CHANGE
CONTINUING OPERATIONS DIRECTORY     4       CONTOP
TEMPLATE DIRECTORY                  5       TEMPLATE
ALIAS DIRECTORY                     6       ALIAS
AVD VOLUME FILE DIRECTORY           7       VOL
USED SPACE                          8       USEDSPC
ATTRIBUTES DIRECTORY                9       ATTRIBUTES
ASM USER DIRECTORY                  10      USER
ASM USER GROUP DIRECTORY            11      GROUP
STALENESS DIRECTORY                 12      STALENESS

Files which have fixed numbers but are not ASM metadata files

STALE BITMAP SPACE REGISTRY         254     STALEREG
ORACLE CLUSTER REPOSITORY REGISTRY  255     OCR

3.-print <block_spec>: This option prints one or more

blocks to standard out. This option may be specified

multiple times to print multiple <block_spec>'s. The

printout contains information about how each block was

found as well as a formatted printout. Multiple blocks

matching the same <block_spec> may be found when scanning

the disks. For example there may be multiple disks that

have headers for the same diskgroup and disk number. If

the block is from a mirrored file then multiple copies

should exist on different disks. If multiple copies of

the same block have identical contents then only one

formatted printout of the contents will be generated, but

a header will be printed for each copy. A <block_spec>

may include a count of sequential blocks to print. A

<block_spec> may specify a block either by disk or file.

<block_spec> ::= <single_block> | <single_block>.C<count>

<single_block> ::= <report_disk_block> | <group_disk_block> |

<extent_file_block> | <virtual_file_block> | <xmap_file_block>

<report_disk_block> ::=

<group_name>.N<report_number>.A<au_number>.B<block_number>

<group_disk_block> ::=

<group_name>.D<disk_number>.A<au_number>.B<block_number>

<extent_file_block> ::=

<group_name>.F<file_number>.X<physical_extent>.B<block_number>

<virtual_file_block> ::=

<group_name>.F<file_number>.V<virtual_block_number>

<xmap_file_block> ::=

<group_name>.F<file_number>.M<extent_map_block_number>
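
The grammar above can be turned into a small parser, which is a convenient way to check a <block_spec> before passing it to -print. The Python sketch below follows the five forms as written: letters are accepted in either case, the group name is uppercased, numbers must be below 2^32, and a trailing C value is the block count. Treating a missing count as 1 is my reading of <single_block>, and the returned dictionary layout is an assumption of this example.

```python
import re

# Letter sequences that identify each of the five forms.
FORMS = {
    ("N", "A", "B"): "report_disk_block",
    ("D", "A", "B"): "group_disk_block",
    ("F", "X", "B"): "extent_file_block",
    ("F", "V"): "virtual_file_block",
    ("F", "M"): "xmap_file_block",
}


def parse_block_spec(spec: str) -> dict:
    """Parse a block specification such as 'DATA.N0001.A1.b0.c256'."""
    group, *parts = spec.split(".")
    if not group or not parts:
        raise ValueError(f"malformed block spec: {spec!r}")
    values = []
    for part in parts:
        match = re.fullmatch(r"([A-Za-z])(\d+)", part)
        if not match:
            raise ValueError(f"bad value {part!r} in {spec!r}")
        number = int(match.group(2))
        if number >= 2**32:
            raise ValueError(f"number too large in {part!r}")
        values.append((match.group(1).upper(), number))
    count = 1
    if values[-1][0] == "C":          # optional trailing block count
        count = values.pop()[1]
    letters = tuple(letter for letter, _ in values)
    if letters not in FORMS:
        raise ValueError(f"unknown form {letters} in {spec!r}")
    return {"group": group.upper(), "form": FORMS[letters],
            "values": dict(values), "count": count}


if __name__ == "__main__":
    for text in ("DATA.N0001.A1.b0.c256", "flash.F1.v2856", "flash.f2856.m0.c427"):
        print(parse_block_spec(text))
```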

Disk Selection

These parameters control the disk discovery phase of

operations. They allow specification of which disks should

be scanned for AU's to dump. The operation options -dump,

-extract, and -print also limit scanning to disks in the

diskgroups specified by the options. The following options

can be specified to control how the disks are discovered

and scanned:

1.-diskstring <string>: By default the null string is used

for discovery. The null string should discover all disks

the user has access to. Many installations specify an

asm_diskstring parameter for their ASM instance. If so

that parameter value should be given here. Multiple

discovery strings can be specified by multiple

occurrences of -diskstring <string>. Beware of shell

syntax conflicts with discovery strings. Diskstrings are

usually the same syntax the shell uses for expanding path

names on command lines so they will most likely need to

be enclosed in single quotes.

2.-exclude <string>: Multiple exclude options may be

specified. These strings are used for discovery just like

the values for diskstring. Only shallow discovery is done

on these diskstrings. Any disks found in the exclude

discovery will not be accessed. If they are also

discovered using the -diskstring strings, then the report

will include the information from shallow discovery along

with a message indicating the disk was excluded.

3.-former: Normally disks marked as former are not scanned,

but this option will scan them and include their contents

in the output. This is useful when it is necessary to

look at the contents of a disk that was dropped. Note

that dropped normal disks will not have any entries in

their allocation tables and thus only the physically

addressed extents will be dumped. Force dropped disks

will not have status former in their disk headers and are

not affected by this option. However if DROP DISKGROUP is

used, the disks will have the contents as of the time of

the drop, and will be in status former. Thus this option

is useful for extracting files from a dropped diskgroup.

4.-baddisks <diskgroup>: Normally disks with bad disk

headers, or that look like they were never part of a disk

group, will not be scanned. This option forces them to be

scanned anyway and to be considered part of the given

diskgroup. This is most useful when a disk header has

been damaged. The disk will still need to have a valid

allocation table to drive the scan unless -fullscan is

used. If block 0 is damaged, AMDU will try to read the

backup disk header. If this fails, and AMDU needs to

construct a working disk header, at least one block in the

first two AUs must be valid so that the disk number can be

determined. The options -ausize and -blksize are required

since these values are normally fetched from the disk header.

If the diskgroup uses external redundancy then -external should

be specified. These values will be compared against any

valid disks found in the diskgroup and they must be the

same.

5.-directory <string>: This option completely eliminates

the discovery and disk scanning phases of operation. It

specifies the name of a dump directory from a previous

run of AMDU. The report file and map files are read

instead of doing a discovery and scan. The parsing of

these ASCII files is very dependent on them being exactly

as written by AMDU. AMDU is unlikely to work properly if

they have been modified by a text editor, or if some of

the files are missing or truncated. Note that the

directory may be a copy FTP'ed from another machine. The

other machine may even be a different platform with a

different endianness.
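
The warning about shell syntax conflicts with discovery strings matters whenever the command is assembled by a script. One way to sidestep the problem, sketched below in Python, is to build the argument list directly and let shlex do the quoting if a shell command line is needed. Only options described in this document are used. The device path /dev/rdsk/* is a placeholder, and build_amdu_command is a helper name invented for this example.

```python
import shlex


def build_amdu_command(diskstrings, dump=(), extract=(), output=None):
    """Assemble an amdu argument list from options described above."""
    argv = ["amdu"]
    for pattern in diskstrings:
        argv += ["-diskstring", pattern]     # one -diskstring per pattern
    for group in dump:
        argv += ["-dump", group]
    for file_spec in extract:
        argv += ["-extract", file_spec]
    if output is not None:
        argv += ["-output", output]
    return argv


if __name__ == "__main__":
    argv = build_amdu_command(["/dev/rdsk/*"], dump=["DATA"])
    # shlex.join quotes the wildcard so the shell does not expand it.
    print(shlex.join(argv))
```

Passing the list itself to subprocess.run without a shell avoids the quoting question entirely, since no shell ever sees the wildcard.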

Read Control

These parameters control which AU's on a disk are read and

how they are found. Every AU read from a -dump diskgroup is

dumped, unless the -noimage output option is set. Reading

still checks for I/O errors and corrupt blocks even if

-noimage is set. The default scanning algorithm is to look

at the allocation table and dump any extent that contains

ASM metadata according to its allocation table entry. The

registries are not considered metadata and are not dumped

by default. Registries are not modified through the ASM

buffer cache, and may not have ASM block headers on them.

If part of the AU contains metadata blocks that were never

modified, then the unmodified blocks are not dumped. The

most common case is the extra blocks in an indirect extent.

1.-fullscan: This option reads every AU on the disk and

looks at the contents of the AU rather than limiting the

AU's read based on the allocation table. This is useful

when the allocation table is corrupt or needs recovery.

An AU will be written to the image file if it starts with

a block that contains a valid ASM block header. The file

and extent information for the map will be extracted from

the block header. Physically addressed metadata will be

dumped regardless of its contents. This option is

incompatible with extracting a file. It is an error to

specify -extract with this option. Note that this option

is likely to find old garbage metadata in unallocated

AU's since there is no means of determining what is

allocated. Thus there may be many different copies of the

same block, possibly of different versions.

2.-ausize <bytes> -blksize <bytes>: Both of these options

must be set when -baddisks is set. They must both be a

power of 2. These sizes are required to scan a disk

looking for metadata, and it is normally read from the

disk header. The values apply to all disks that do not

have a valid header. The values from the disk header will

be used if a valid header is found.

3.-external: Normally amdu determines the diskgroup

redundancy from the disk headers. However this is not

possible with the -baddisks option. It is assumed that

the redundancy of diskgroup "none" is normal or high

unless this option is given to specify external

redundancy.

4.-compare: This option only applies to file extraction

from a normal or high redundancy disk group. Every extent

that is mirrored on more than one discovered disk will

have all sides of its mirror compared. If they are not

identical a message will be reported on standard error

and the report file. The message will indicate which copy

was extracted. A count of the blocks that are not

identical will be in the report file.

5.-registry: The ASM registries will be read and dumped to

the image file. There will be no block consistency checks

since these files do not have ASM cache headers. To dump

one specific registry specify -filedump and include the

file object for the registry (e.g. DATA.255)

6.-noheart: Normally the heartbeat block will be saved at

discovery time and checked when the disk is scanned. A

sleep is added between discovery and scanning to ensure

there is time for the heartbeat to be written. If the

heartbeat block changes then it is most likely that the

diskgroup containing this disk is mounted by an active

ASM instance. An error and warning is generated but

operation proceeds normally. This option suppresses this

check and avoids the sleep.

7.-noxmap: This option eliminates reading of the indirect

extents containing the file extent maps. This is the bulk

of the metadata in most diskgroups. Even the entries in

the map file will be eliminated.

8.-novirtual: This option eliminates reading of any virtual

metadata. Only the physically addressed metadata will be

read. This implicitly eliminates the ACD and extent maps

so -noacd and -noxmap will be assumed.

9.-noscan: This eliminates any reading of any disks after

deep discovery. This results in just doing a deep

discovery using the diskstring parameter. The report will

end after the discovery section. It is an error to

specify this option and specify a file to extract. It is

an error to specify this and -fullscan.

10.-noread: This eliminates any reading of any disks at

all. Only shallow discovery will be done. The report will

end after the discovery section. It is an error to

specify this option and specify a file to extract or

blocks to print. It is an error to specify this and

-fullscan.

Output control

Output control parameters change which output files are

created, where they are created, and how they are created.

The following options are supported.

1.-parent <path_name>: By default the dump directory is

created in the current directory, but another directory

can be specified using this option. The parent directory

for the dump directory must already exist.

2.-noacd: This option limits the dumping of the Active

Change Directory to just the control blocks that contain

the checkpoint. There is 126 MB of ACD per ASM instance

(42 MB for external redundancy). It is normally of no

interest if there has been a clean shutdown or no updates

for a while. This option avoids dumping a lot of

unimportant data. The blocks will still be read and

checked for corruption. The map file will still contain

entries for the ACD extents, but the block counts will be

zero.

3.-noimage: No image files will be created in the dump

directory. All the reads specified by the read options

will still be done. The map files may be used to find

blocks on the disks themselves. In the map file, the

count of blocks dumped, the image file sequence number,

and the byte offset in the image file will all always be

zero (C00000 S0000 B0000000000).

4.-nomap: No map file is created and no image file is

created. The only output is the report file. The -noimage

option is assumed if this is set since an image file

without a map is useless. The options -noscan and -noread

also result in no map or image files, but -nomap still

reads the metadata to check for I/O errors and corrupt

blocks.

5.-filedump: This option causes the file objects in the

command line to have their blocks dumped to the image

files rather than extracted. This can be combined with

the -novirtual option to selectively dump only some of

the metadata files. It may also be used to dump user

files (number >= 256) so that all mirrored copies can be

examined.

6.-output <file_name>: This option specifies a different

file for writing an extracted file. The file will be

overwritten if it already exists. This option requires

that exactly one file is extracted via the -extract

option. Required with -extract and -directory.

7.-noextract: This prevents files from being extracted to

an output file, but the file will be read and any errors

in selecting the correct output will be reported. This is

most useful in combination with the -compare option.

8.-nodir: No dump directory is created, and no files are

created in it. The directory name is not written to

standard out. The report file is written to standard out

before any block printouts from any -print options. This

option conflicts with -filedump. It is an error to

specify this and extract a file to the dump directory.

9.-noreport: This suppresses the generation of the report

file. It is most useful in combination with -nodir and

-print to get block printouts without a lot of clutter. It

is unnecessary to include this with -directory since no

report is generated then anyway.

10.-hex: This prints the block contents in hex without

attempting to print them as ASM metadata. This is useful

when the block is known to not be ASM metadata. It avoids

the ASM block header dump and ensures the block is not

accidentally interpreted as ASM metadata. This option

requires at least one -print option.

11.-noprint: This suppresses the printout of the block

contents for blocks printed with the -print option. It is

useful for getting just the block reports without a lot

of data. This option requires at least one -print option.

Inconsistencies

Since AMDU does not do all the checks required to mount a

diskgroup, it is possible for the disks to be inconsistent.

There may be missing disks or older stale disks. There

could be two different diskgroups with the same name. Since

the diskgroup may need crash recovery there could be

duplicate entries for the same file extent in the

allocation tables. Here is a list of the possible

inconsistencies and how they are dealt with:

1.There could be two paths to the same disk. If two disks

have identical headers it is assumed they are the same

disk. The second disk is ignored and a message appears in

place of its deep discovery report.

2.There could be disks from two different diskgroups with

the same diskgroup name. An error message is given and

the disk group is not scanned. No files will be extracted

from the diskgroup and no metadata will be dumped or

printed. Use the exclude parameter to eliminate the disks

from one disk group.

3.There could be two disks in the same diskgroup with the

same disk number. This happens if a disk is dropped with

the force option, another disk is added, and the old disk is

discovered by AMDU. Metadata will be dumped for both

disks. A file extraction will only look for extents on

the disk with the highest disk creation timestamp. The

other disk will be ignored even if it contains the only

copy of an extent.

4.There could be two AU's that are for the same file and

extent. This can happen if a relocation is incomplete.

For metadata dumping both extents are dumped. For file

extraction the contents will be compared. If they are the

same then there is no problem. If the contents differ

then the disk with the lowest disk report number will be

chosen. An error message will indicate the problem and

which disk was chosen.

5.With the -compare option the mirror copies of an extent

could differ. If this happens the primary extent will be

chosen. With high redundancy and a missing primary extent

the first secondary will be chosen. An error message will

be reported.
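
The rule in item 3 can be sketched in a few lines of Python. This is only an illustration of the selection logic described above, not AMDU's own code: the Disk record and its field names are hypothetical, and the timestamps are plain integers.

```python
from collections import namedtuple

# Hypothetical record; the field names are illustrative, not AMDU structures.
Disk = namedtuple("Disk", "report_number disk_number created path")

def pick_disks_for_extraction(disks):
    """For each disk number keep only the disk with the newest creation timestamp."""
    chosen = {}
    for disk in disks:
        current = chosen.get(disk.disk_number)
        if current is None or disk.created > current.created:
            chosen[disk.disk_number] = disk
    return chosen

disks = [
    Disk(report_number=1, disk_number=0, created=100, path="/dev/a"),
    Disk(report_number=2, disk_number=1, created=100, path="/dev/b"),  # stale disk, dropped with force
    Disk(report_number=3, disk_number=1, created=250, path="/dev/c"),  # disk added afterwards
]
chosen = pick_disks_for_extraction(disks)
print(chosen[1].path)
```

In this sketch the stale disk is skipped for extraction even if it holds the only copy of an extent, which mirrors the behavior described in item 3.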

BLOCK CORRUPTIONS ON ORACLE AND UNIX

March 1, 2016, 10:58 pm

PURPOSE
This article discusses block corruptions in Oracle and how they are related
to the underlying operating system and hardware. To better illustrate the
discussion, Unix is taken as the operating system of reference, although similar
situations can be observed on other operating systems as well.

SCOPE & APPLICATION
For users requiring further understanding as to how a block could become
corrupted.

Block corruption has been a common occurrence on most UNIX-based systems and
relational databases for many years. It is one of the most frequent ways to
lose data and cause serious business impact. Through a survey of the technical
literature, this document discusses several ways that block
corruptions can occur and provides conclusions and possible solutions.

To fully comprehend all the reasons for block corruptions, it is necessary to
understand how I/O device subsystems work, how memory buffers are used to
support the reading and writing of data blocks, how blocks are sized on both
UNIX and Oracle, and how these three objects work together to maintain data
consistency.

I/O devices are designed specifically for host machines and there have been
few attempts to standardize a particular interface across the industry. Most
software, including Oracle, on UNIX machines uses standard C program calls that
in turn perform system calls to support the reading and writing of data to
disk. These system calls access I/O device software that retrieves or writes
data on disk.

The UNIX system contains two types of devices, block devices and raw or
character devices. Block devices look like random access storage devices to
the rest of the system while character devices include all other devices such
as terminals and network media. (Bach, 1990 314). These device types are
important to understand because different combinations can increase the risk of corruption.

Device drivers are configured by the operating system and the configuration
procedure generates or populates tables that form part of the code of the
kernel. This kernel to device driver interface is described by the block
device switch table and the character device switch table. Each device type
has entries in these tables that direct the kernel to the appropriate driver
interfaces for the system calls. The open and close system calls of a device
file funnel through the two device switch tables, according to file type. The
mount and umount system calls also invoke the device open and close procedures
for block devices. Read and write system calls of character special files pass
through the respective procedures in the character device switch tables. Read
and write system calls of block devices and of files on mounted file systems
invoke the algorithms of the buffer cache, which invoke the device strategy
procedure. (Bach, 1990 314). This buffer cache plays an important role in
block corruptions since it is the location where data blocks are the most
vulnerable.

The difference between the two disk interfaces is whether they deal with the
buffer cache. When accessing the block device interface, the UNIX kernel
follows the same algorithm as for regular files, except that after converting
the logical byte offset into a logical block offset, it treats the logical
block offset as a physical block number in the file system. It then accesses
the data via the buffer cache and, ultimately, the driver strategy interface.
However, when accessing the disk via the raw interface, the kernel does not
convert the byte offset into the file but passes the offset immediately to the
driver. The driver's read or write routine converts the byte offset to a
block offset and copies the data directly to the user address space, bypassing
kernel buffers.

Thus, if one process writes a block device and a second process then reads a
raw device at the same address, the second process may not read the data that
the first process had written, because the data may still be in the buffer
cache and not on disk. However, if the second process had read the block
device, it would automatically pick up the new data, as it exists in the
buffer cache. (Bach, 1990 328).
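
The stale-read scenario above can be illustrated with a toy model. This is a minimal sketch, assuming a write-behind cache in front of the block interface and no cache at all in front of the raw interface; the class names are hypothetical and nothing here reflects a real kernel's data structures.

```python
class ToyDisk:
    """Stands in for the physical medium: a list of block contents."""
    def __init__(self, nblocks):
        self.blocks = [b"old"] * nblocks

class BlockInterface:
    """Reads and writes go through a write-behind buffer cache."""
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}     # block number -> buffered data
        self.dirty = set()  # blocks modified in cache but not yet on disk

    def write(self, blkno, data):
        self.cache[blkno] = data
        self.dirty.add(blkno)

    def read(self, blkno):
        if blkno not in self.cache:
            self.cache[blkno] = self.disk.blocks[blkno]
        return self.cache[blkno]

    def flush(self):
        for blkno in sorted(self.dirty):
            self.disk.blocks[blkno] = self.cache[blkno]
        self.dirty.clear()

class RawInterface:
    """Bypasses the cache and talks to the disk directly."""
    def __init__(self, disk):
        self.disk = disk

    def read(self, blkno):
        return self.disk.blocks[blkno]

disk = ToyDisk(4)
block_dev = BlockInterface(disk)
raw_dev = RawInterface(disk)

block_dev.write(2, b"new")      # first process writes via the block device
stale = raw_dev.read(2)         # second process reads the raw device: old data
fresh = block_dev.read(2)       # a block-device reader sees the new data
block_dev.flush()               # write-behind finally reaches the disk
after_flush = raw_dev.read(2)
print(stale, fresh, after_flush)
```

Until the flush happens, the two interfaces disagree about the contents of the same disk address, which is the window in which mixed access is dangerous.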

Use of the raw interface may also introduce strange behavior. If a process
reads or writes a raw device in units smaller than the block size, results are
driver-dependent. For instance, when issuing 1-byte writes to a tape drive,
each byte may appear in different tape blocks. (Bach 1990)

The advantage of using the raw interface is speed, assuming there is no
advantage to caching data for later access. Processes accessing block devices
transfer blocks of data whose size is constrained by the file system logical
block size. Furthermore, use of the block interface entails an extra copy of
data between user address space and kernel buffers, which is avoided in the
raw interface. For example, if a file system has a logical block size of 1K
bytes, at most 1K bytes are transferred per I/O operation. However, processes
accessing the disk as a raw device can transfer many disk blocks during a disk
operation, subject to the capabilities of the disk controller.

Disk controllers are hardware devices that control the I/O actions of one or
more disks. These controllers can also create a bottleneck in a system.
(Corey, Abbey, Dechichio 1995). Controllers are the piece of hardware that
most frequently has and causes problems on many systems. When a system has
multiple disks controlled by one controller, the results can be fatal. The
bottleneck on controllers is a common cause of write errors.

It is important to remember that Oracle and other products use these device
access methods to perform their work. It is also important to note the added
complexity that the Oracle kernel adds to the I/O game.

The Oracle Relational Database Management System (RDBMS) keeps its
information, including data, in block format. However, the Oracle data block
can be, and in most cases is, composed of several operating system blocks.

An Oracle database block is the physical unit of storage in which all Oracle
database data are stored in files. The Oracle database block size is
determined by setting a parameter called db_block_size when the
database is created. (Millsap, 1995).

The most common UNIX block is 512 bytes but the Oracle block size can range
from 512 bytes to 32K. The difference in block sizing between the operating system
and the Oracle kernel is beneficial for Oracle, boosting performance
while allowing UNIX to maintain small files with minimal wasted space. The
Oracle block can be considered a superset of the UNIX file system block size.

Each block of an Oracle data file is formatted with a fixed header that
contains information about the particular block. This information provides a
means to ensure the integrity for each block and in turn, the entire Oracle
database. One component of the fixed header of a data block is called a Relative
Data Block Address (RDBA). This RDBA is a 4-byte value that stores the relative file
number of the Oracle database file and the Oracle block number offset relative
to the beginning of the file. (Presley, 1993).
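
As a rough illustration, a 4-byte RDBA can be split into its two parts with a shift and a mask. The sketch below assumes the commonly described layout of 10 bits for the relative file number followed by 22 bits for the block number; the article itself does not state the split, but the values in the alert log excerpt under Case Two (0x56c07ac1, file 347, block 31425) are consistent with it. The function names are hypothetical.

```python
FILE_BITS = 10    # assumed width of the relative file number
BLOCK_BITS = 22   # assumed width of the block number
BLOCK_MASK = (1 << BLOCK_BITS) - 1

def decode_rdba(rdba):
    """Split a 32-bit RDBA into (relative file number, block number)."""
    if not 0 <= rdba < (1 << (FILE_BITS + BLOCK_BITS)):
        raise ValueError("RDBA must fit in 32 bits")
    return rdba >> BLOCK_BITS, rdba & BLOCK_MASK

def encode_rdba(file_no, block_no):
    """Combine a relative file number and block number into a 32-bit RDBA."""
    if not 0 <= file_no < (1 << FILE_BITS):
        raise ValueError("file number out of range")
    if not 0 <= block_no <= BLOCK_MASK:
        raise ValueError("block number out of range")
    return (file_no << BLOCK_BITS) | block_no

file_no, block_no = decode_rdba(0x56C07AC1)
print(file_no, block_no)
print(hex(encode_rdba(file_no, block_no)))
```

Decoding both the expected and the found RDBA from an error message in this way shows at a glance whether the file number, the block number, or both are wrong.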

Whenever there is a problem with the RDBA, Oracle may signal an Oracle error
ORA-1578: Data block corrupted in file # block #. This error provides information that points to where the
corruption exists.

Oracle uses the standard C system function calls to read and write blocks to
its database files. Once the block has been read it is mapped to shared
memory by the operating system. After the block has been read into shared
memory, the Oracle kernel does verification checks on the block to ensure the
integrity of the fixed header. The RDBA check is the first verification made
on the fixed header. So why do RDBAs become corrupt and how can we identify
and correct them?

Case One
--------

The first case of block corruption occurs when the block has been zeroed out. If the Oracle block
is completely zeroed out, SQL statements may generate an ORA-8103 as the block type=0 is invalid
and it is not formatted as an empty block. In this case the dbverify utility (dbv) can detect it
and will produce an error message. Dbv output example:

DBVERIFY - Verification starting : FILE = /oradata/data_01.dbf
Page 307161 is marked corrupt
***
Corrupt block relative dba: 0x0644afd9 (file 0, block 307161)
Completely zero block found during dbv:

Usually the first operating system block of an Oracle block is zeroed out when
there has been a software error on disk and the operating system has attempted to repair
its block. In addition, disk repair utility programs have been known to cause this zeroing-out effect.

Programs that read from and write to the disk directly can destroy the
consistency of file system data. The file system algorithms coordinate disk
I/O operation to maintain a consistent view of disk data structures, including
linked lists of free disk blocks and pointers from inodes to direct and
indirect data blocks. Processes that access the disk directly bypass these if
they run while other file system activity is going on. For this reason, these
programs should not be run on an active file system. (Bach, 1990 328).

Case Two
--------

The RDBA in the physical block on disk is incorrect. It can generate an ORA-1578 error
and an entry in the alert.log containing "Data in bad block", as follows:

***
Corrupt block relative dba: 0x56c07ac1 (file 347, block 31425)
Bad header found during buffer read
Data in bad block -
type: 6 format: 2 rdba: 0x06407ac1
last change scn: 0x0000.00a02808 seq: 0x1 flg: 0x02
consistency value in tail: 0x28080601
check value in block header: 0x0, block checksum disabled
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x56c07ac1 (file 347, block 31425) found same corrupted data

Blocks are sometimes written into the wrong places in the data file. This is
called "write blocks out of sequence." This typically happens when the operating system
I/O device driver fails to write the block in the proper location that Oracle
requested via the lseek() system call.

The lseek() system call is one of the most important calls related to block
corruption. The calculations that lseek() performs are often the cause of
block problems. To understand lseek() a brief discussion of byte positioning
is necessary.

Every open file has a "current byte position" associated with it. This is
measured as the number of bytes from the start of the file. The creat() system
call sets the file's position to the beginning of the file, as does the open()
system call. The read() and write() system calls update the file's position by
the number of bytes read or written. Before a read or write, an open file can
be positioned using lseek(). The format is:

lseek(int fildes, long offset, int whence);

The offset and whence arguments are interpreted as follows: If whence is 0,
the file's position is set to offset bytes from the beginning of the file. If
whence is 1, the file's position is set to its current position plus the
offset. If whence is 2, the file's position is set to the size of the file
plus the offset. The file's offset can be greater than the file's current
size, in which case the next write to the file will extend the file. Lseek()
returns a long integer byte offset of the file. (Stevens, 1990 40).
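
The three whence values can be tried out from Python, whose os.lseek() is a thin wrapper over the system call. This is a small sketch on a temporary file; the file contents and offsets are arbitrary, and os.SEEK_SET, os.SEEK_CUR and os.SEEK_END correspond to whence values 0, 1 and 2.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"0123456789")              # position is now 10
    pos_set = os.lseek(fd, 4, os.SEEK_SET)   # whence 0: 4 bytes from the start
    pos_cur = os.lseek(fd, 2, os.SEEK_CUR)   # whence 1: current position plus 2
    data = os.read(fd, 2)                    # reads 2 bytes and advances the position
    pos_end = os.lseek(fd, 5, os.SEEK_END)   # whence 2: file size plus 5
    os.write(fd, b"X")                       # writing past the end extends the file
    size = os.fstat(fd).st_size
finally:
    os.close(fd)
    os.remove(path)

print(pos_set, pos_cur, data, pos_end, size)
```

The last two steps show the case mentioned above where the offset lies beyond the current size: the seek itself changes nothing on disk, and the file only grows when the next write happens.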

There is great opportunity for miscalculation of an offset based on the
lseek() system call. Though lseek() is not the only system call culprit in the
block corruption problem, it is a major contributor.

An incorrect RDBA on disk may also result when the block was corrupted in memory and then written to disk.
This situation is quite rare and is usually caused by memory
faults that go undetected. The RDBA found in the block is usually garbage and
not a valid RDBA.

If there is a possibility of memory problems on the system, the database
administrator can enable further sanity block checking by placing the
following parameters in the database instance init.ora parameter file:

db_block_checking=TRUE
db_block_checksum=TRUE / FULL (10.2+)
_db_block_cache_protect=TRUE

db_block_checking forces the Oracle RDBMS kernel to call functions that check
the block. Oracle checks a block by going through the data on the block, making
sure it is self-consistent. Block checking can often prevent memory and data corruption.

db_block_checksum determines whether DBWn and the direct loader will calculate
a checksum (a number calculated from all the bytes stored in the block) and
store it in the cache header of every data block when writing it to disk.
Checksums are verified when a block is read only if this parameter is true and the
last write of the block stored a checksum. If set to FULL, DB_BLOCK_CHECKSUM also
catches in-memory corruptions and stops them from making it to the disk.
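
The general idea of a per-block checksum can be sketched as follows. This is not Oracle's algorithm or block layout: zlib.crc32 is used as a stand-in checksum function, the checksum is simply stored in the first 4 bytes of a toy block, and the names seal_block and verify_block are hypothetical.

```python
import zlib

BLOCK_SIZE = 8192     # illustrative block size
CHECKSUM_BYTES = 4    # toy layout: checksum occupies the first 4 bytes

def seal_block(payload):
    """Pad the payload to a full block and store a checksum in front of it."""
    if len(payload) > BLOCK_SIZE - CHECKSUM_BYTES:
        raise ValueError("payload does not fit in one block")
    body = payload.ljust(BLOCK_SIZE - CHECKSUM_BYTES, b"\x00")
    return zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big") + body

def verify_block(block):
    """Recompute the checksum on read and compare it with the stored value."""
    stored = int.from_bytes(block[:CHECKSUM_BYTES], "big")
    return stored == zlib.crc32(block[CHECKSUM_BYTES:])

good = seal_block(b"row data")
damaged = bytearray(good)
damaged[100] ^= 0xFF  # simulate a stray write that flips one byte

print(verify_block(good), verify_block(bytes(damaged)))
```

A check of this kind detects that a block changed between the write and the later read, but it cannot say which layer changed it.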

Setting _db_block_cache_protect=TRUE protects the cache layer from becoming corrupted.
This parameter will prevent certain corruption from getting to disk, although
it may crash the foreground process of the database instance. It will help catch
stray writes in the cache. When a process tries to write past the buffer size
in the SGA, it will fail first with a stack violation.

If the database writer process detects a corrupted block in cache prior to
writing the block to disk, it will signal an error and will crash the
database instance. The block that is corrupted is never written to disk.
After receiving such an error, simply attempt to restart the database instance.
There is no doubt that this can be a costly workaround to avoid block
corruptions. However, the workaround once a corruption has occurred can be
even costlier.

Case Three
----------

A third cause for block corruption is the requested I/O not being serviced by
the operating system. The calls that Oracle makes to lseek() and read() are checked for
return error codes. In addition, Oracle checks to see the number of bytes read in by the read()
system call to ensure that the block size or a multiple of the block size was
read. Since these checks appear to have been successful, Oracle assumes
that the direct read succeeded. Upon sanity checking, the RDBA is incorrect
and the database operation request fails. Therefore, the I/O read request
really never took place. In this case, the RDBA found can point to a block of
a different file.
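
The sequence of checks described above (seek, read, byte count, then a sanity check on the address stored in the block) might look like this in outline. It is a sketch under stated assumptions: the 512-byte block size, the toy header that holds only a 4-byte address, and the function names are all hypothetical and do not represent Oracle's block format.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 512
HEADER = struct.Struct(">I")  # toy header: just a 4-byte block address

def make_block(address):
    """Build a toy block whose header records the address it belongs at."""
    return HEADER.pack(address).ljust(BLOCK_SIZE, b"\x00")

def read_block(fd, block_no, expected_address):
    os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)  # raises OSError on failure
    data = os.read(fd, BLOCK_SIZE)                     # raises OSError on failure
    if len(data) != BLOCK_SIZE:
        raise IOError("short read: %d bytes" % len(data))
    (found,) = HEADER.unpack_from(data)
    if found != expected_address:                      # the sanity check on the header
        raise ValueError("expected address %d, found %d" % (expected_address, found))
    return data

fd, path = tempfile.mkstemp()
try:
    # Position 1 holds a block stamped with address 2, simulating wrong data.
    os.write(fd, make_block(0) + make_block(2) + make_block(2))
    ok = read_block(fd, 0, expected_address=0)
    try:
        read_block(fd, 1, expected_address=1)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
finally:
    os.close(fd)
    os.remove(path)

print(len(ok), mismatch_detected)
```

In the mismatch case both system calls succeed and the full byte count comes back, so only the header check reveals that the wrong data was delivered.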

Case Four
---------

Another reason for block corruption is reading the wrong block from the same
device. Typically, this is caused by a very busy disk. In some cases, the
block read is off by 1 block, but the error can range into several hundred blocks.
Since this occurs when the disk is very busy and under heavy
stress, try spreading datafiles across multiple disks and ensure that the disk
drive can support the load.

In the third and fourth situations, the database files will not be physically
corrupted and the operation can be tried again with success. Most diagnostic
testing will not reveal anything wrong with either the operating system or the
hardware. Nevertheless, the root cause lies in the operating system or the
hardware. (Velpuri, 1995).

So what causes the operating system calls to behave the way they do and how
can companies try to minimize their risk? To evaluate these questions,
another look into how UNIX works is required.

UNIX vendors, in an attempt to speed performance, have implemented many
features into the filesystem. The filesystem manages a large cache of I/O
buffers, called the buffer cache. This cache allows UNIX to optimize read and
write operations. When a program writes data, the filesystem stores the data
in a buffer rather than writing it to disk immediately. At some later point
in time, the system will send this data to the disk driver, together with
other data that has accumulated in the cache. In other words, the buffer
cache lets the disk driver schedule disk operations in batches. It can make
larger transfers and use techniques such as seek optimization to make disk
access more efficient. This is called write-behind.

When a program reads data, the system first checks the buffer cache to see if
the desired data is already there. If the data is already in the buffer
cache, the filesystem does not need to access the disk for those blocks. It
just gives the user the data it found in its buffer, eliminating the need to
wait for a disk drive. The filesystem only needs to read the disk if the data
isn't already in the cache. To increase efficiency even further, the
filesystem assumes the program will read the file consecutively and read
several blocks from the disk at once. This increases the likelihood that the
data for future read operations will already be in the cache. (Loukides, M.,
1990) This also increases the chance of block corruption.

As a filesystem gets busy and buffers are being read, modified, written, and
aged out of the cache, the chance of the kernel reading or writing the wrong
block increases. Also, the more complex the scheme to read from and write to
disk, the greater the likelihood of function failure.

The UNIX kernel uses the strategy interface to transmit data between the
buffer cache and a device, although the read and write procedures of character
devices sometimes use their block counterpart strategy procedure to transfer
data directly between the device and the user address space. The strategy
procedure may queue I/O jobs for a device on a work list or do more
sophisticated processing to schedule I/O jobs. Drivers can set up data
transmission for one physical address or many, as appropriate. The UNIX
kernel passes a buffer header address to the driver strategy procedure. The
header contains a list of addresses and sizes for transmission of data to or
from the device. This is also how the swapping operations work. For the
buffer cache, the kernel transmits data from one address; when swapping, the
kernel transmits data from many data addresses. If data is being copied to or
from the user's address space, the driver must lock the process in memory
until the I/O transfer is complete.

The kernel loses control over a buffer only when it waits for the completion
of I/O between the buffer and the disk. It is conceivable that a disk drive
is corrupt so that it cannot interrupt the CPU, preventing the kernel from
ever releasing the buffer. There are processes that monitor the hardware for
such cases and zero out the block and return an error to the kernel for a bad
disk job. (Bach, 1990 52).

On the UNIX level there are several utilities that will check for bad disk
blocks and zero out any blocks they find corrupted. These utilities do not
realize that the block in question may be an Oracle RDBMS block and zero out
the block by mistake.

In (Silberschatz, Galvin, 1994), the authors consider the possible effect of a
computer crash. In this case, the table of opened files is generally lost,
and with it any changes in the directories of opened files. This event can
leave the file system in an inconsistent structure. Frequently, a special
program is run at reboot time to check for and correct disk inconsistencies.

The consistency checker compares the data in the directory structure with the
data blocks on disk, and tries to fix any inconsistencies it finds.
(Silberschatz, Galvin, 1994) This will often result in the reformatting of
blocks which will cause the Oracle block information to be removed. This will
definitely cause Oracle corruption.

It is important to realize that monitoring of hardware is required for all
operating systems. Hardware monitors can sense electrical signals on the
busses and can accurately record them even at high speed. A hardware monitor
keeps observing the system even when it is malfunctioning, and thus, it can be
used to debug the system. (Jain, 1991 99) These tools can help determine the
cause of the problem and detect problems like controller error and media
faulting which are frequent corruption contributors.

In any case, there are many opportunities for blocks, either on disk or in the
buffer cache, to become corrupt. Fixing the corruption can sometimes provide
even greater opportunities.

Conclusion
----------

Data block corruption is an ongoing problem on all operating systems,
especially UNIX. There are many types and causes of corruptions to consider.
Advanced system configurations can increase the chance of corruption, and hardware problems
are a common source of corruptions. When receiving block corruption errors,
remember that some of them are not physical corruptions but memory
corruptions that are never written to disk.

Oracle Customer Support provides a number of bulletins on block corruption
problems that help recover what is left of the data once corruption has
occurred. If block corruption occurs on a machine, be sure to identify the
type of corruption and establish a plan for its correction.

[1] Bach, M. (1990). The Design of the UNIX Operating System. The I/O Subsystem. 328.
[2] Corey, M., Abbey, M., Dechichio, D. (1995). Tuning Oracle. 52.
[3] Jain, R. (1991). The Art of Computer Systems Performance Analysis. 99.
[4] Loney, K. (1994). Oracle DBA Handbook. 23.
[5] Loukides, M. (1990). System Performance Tuning. 161-162.
[6] Millsap, C. (1995). Oracle7 Server Space Management. 1-2.
[7] Presley, D. (1993). Data Block Corruption Detection. Oracle Corporation.
[8] Silberschatz, A., Galvin, P. (1994). Operating System Concepts. 404.
[9] Stevens, W. (1990). UNIX Network Programming. 163.
[10] Velpuri, R. (1995). Oracle Backup and Recovery Handbook. 286.

Oracle ORA-00600 [25027] ORA-600 [25027]

March 1, 2016, 11:24 pm

Format: ORA-600 [25027] [a] [b]

VERSIONS:
Versions 9.2 and above

DESCRIPTION:
An invalid Tablespace Number (TSN) and/or Relative File Number (RFN) has been found

ARGUMENTS:
Arg [a] Tablespace Number (TSN)
Arg [b] Decimal Relative Data Block Address (RDBA)

FUNCTIONALITY:
Kernel File management Tablespace component

IMPACT:
PROCESS FAILURE
POSSIBLE PHYSICAL CORRUPTION

SUGGESTIONS:

1. If the Arg [b] (the RDBA) is 0 (zero), then this could be due to fake indexes.

The following query will list fake indexes:

select do.owner,do.object_name, do.object_type,sysind.flags
from dba_objects do, sys.ind$ sysind
where do.object_id = sysind.obj#
and bitand(sysind.flags,4096)=4096;

If the above query returns any rows, check the objects involved and consider dropping them as they can cause this error.

Run ANALYZE TABLE ... VALIDATE STRUCTURE on the table referenced in the Current SQL statement in
the related trace file.

If the Known Issues section below does not help in terms of identifying
a solution, please submit the trace files and alert.log to Oracle
Support Services for further analysis.

Known Issues:
Known Bugs

NB	Bug	Fixed	Description
	14010183	11.2.0.3.BP22, 11.2.0.4.BP03, 12.1.0.2, 12.2.0.0	ORA-600 [ktspfundo:objdchk_kcbgcur_3] in SMON after failed temp segment merge load
	13503554	11.2.0.4, 12.2.0.0	Various ORA-600 errors crashing the apply process in a downstreams environment
	13785716	11.2.0.4, 12.1.0.1	Intermittent ORA-600 [25027] during upgrade from 10.2 to 11.2
	11661824	11.2.0.1.BP09	Assorted Dumps by SQL*LOADER using DIRECT and PARALLEL after exadata bp8 is applied
	10067246	12.2.0.0	ORA-600 [25027] ORA-7445 [kauxs_do_dml_cooperation] by CREATE INDEX ONLINE
	14138130	11.2.0.3.5, 11.2.0.3.BP13, 11.2.0.4, 12.1.0.1	SGA memory corruption / ORA-7445 when modifying uncompressed blocks of an HCC-compressed segment
	13330018	11.2.0.4, 12.1.0.1	ora-600 [ktspfmb_add1], [4294959240] occurred, then cannot recover with ora-600[25027]
	13103913	11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP03, 11.2.0.4, 12.1.0.1	ORA-600 [25027] [ts#] [1] or false ORA-1 during dml while index is being rebuilt online
	10394825	11.2.0.3, 12.1.0.1	ORA-600[25027] [..] [0] inserting to ASSM segment
	10329146	11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.1	Lost write in ASM with multiple DBWs and a disk is offlined and then onlined
+	10209232	11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.1	ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
+	9399991	11.1.0.7.5, 11.2.0.1.3, 11.2.0.1.BP04, 11.2.0.2, 12.1.0.1	Assorted Internal Errors and Dumps (mostly under kkpa/kcb) from SQL against partitioned tables
*	9145541	11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.1	OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
	8837919	11.2.0.2, 12.1.0.1	DBV / RMAN enhanced to detect ASSM blocks with ktbfbseg but not ktbfexthd flag set as in Bug 8803762
	8803762	11.1.0.7.6, 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1	ORA-600[kdsgrp1], ORA-600[25027] or wrong results on 11g database upgrade from 9i
	8716064	11.2.0.2, 12.1.0.1	Analyze Table Validate Structure fails on ADG standby with several errors
+	8597106	11.2.0.1.BP06, 11.2.0.2, 12.1.0.1	Lost Write in ASM when normal redundancy is used
	7251049	11.2.0.1.BP08, 11.2.0.2, 12.1.0.1	Corruption in bitmap index introduced when using transportable tablespaces
	8437213	10.2.0.4.3, 10.2.0.5, 11.1.0.7.7, 11.2.0.1	ASSM first level bitmap block corruption
	8356966	11.2.0.1	ORA-7445 [kdr9ir2rst] by DBMS_ADVISOR or false ORA-1498 by ANALYZE on COMPRESS table
*	8198906	10.2.0.5, 11.2.0.1	OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents
*	7263842	10.2.0.4.2, 10.2.0.5, 11.1.0.7.1, 11.2.0.1	ORA-955 during CTAS / OERI [ktsircinfo_num1] / dictionary inconsistency for PARTITIONED Tables
	6666915	10.2.0.5, 11.1.0.7, 11.2.0.1	OERI[25027] / dictionary corruption from concurrent partition DDL
	6025993	10.2.0.5, 11.1.0.6	ORA-600 [25027] in flashback archiving queries
	4925342	9.2.0.8, 10.2.0.3, 11.1.0.6	OERI [25027] / OERI [25012] on IOT analyze estimate statistics
*	7190270	10.2.0.4.1, 10.2.0.5	Various ORA-600 errors / dictionary inconsistency from CTAS / DROP
	4310371	9.2.0.8, 10.2.0.2	OERI [25027] from concurrent startup / shutdown in RAC
	4177651	10.2.0.1	Row migration within a MERGE may OERI[25027]
	4020195	10.1.0.5, 10.2.0.1	OERI 25027 can occur in RAC accessing transported tablespace
	4000840	9.2.0.7, 10.1.0.4, 10.2.0.1	Update of a row with more than 255 columns can cause block corruption
	3963135	10.1.0.5, 10.2.0.1	OERI[kcbgcur_3] / OERI:25027 during bitmap index updates
	3829900	10.1.0.4, 10.2.0.1	OERI[25027] possible accessing index in 10g
	2942185	9.2.0.6, 10.1.0.4, 10.2.0.1	Corruption occurs on direct path load into IOT with ADDED columns
	3085057	10.1.0.2	ORA-600: [25027] from ALTER TABLE .. SHRINK SPACE CASCADE
	2926182	9.2.0.5, 10.1.0.2	OERI[25027] / ORA-22922 accessing LOB columns in IOT in AFTER UPDATE trigger

↧

Summary of Bugs Containing ORA-00600 [2662] ORA-600 [2662]

March 1, 2016, 11:45 pm


Purpose

The purpose of this Note is to explain the bugs filed for the ORA-00600 [2662] error against specific Oracle database versions, and to describe the symptoms of each bug, the workarounds if any, and the patches available at the time this article was written.
Scope
This article is a consolidated effort to summarize the top bugs reported for this error which have been fixed. It is directed towards Oracle Support Analysts and Oracle Customers, to give an overview of the various bugs logged for the same error.
Error Description:

The ORA-600 [2662] is raised when data block SCN is ahead of the current SCN.
This is generally related to the redo application which is used to bring the database to a consistent state.
Summary of Bugs Containing ORA-00600 [2662]

Bug 4453449
Abstract: Flashback to guaranteed restore point in orphan inc may result in ORA-600[3020]
Versions affected: 10.2.0.1
Fixed in versions: 10.2.0.2 & 11.0
Backportable: Yes

Symptoms:

The symptoms of this bug include ORA-600[3020] and ORA-600[2662] after flashback
database, and ORA-600[flashback_validation] during flashback database.
There may also be other symptoms.

Details:
ORA-600[3020] / ORA-600 [2662] / ORA-600 [flashback_validation] can occur
after/during multiple flashback/recovery through multiple database resetlogs
without opening the database. There may also be other symptoms which appear as
recovery related corruption errors.
Workaround:
1. If you flashback a crashed primary database, follow flashback database with open
resetlogs. Alternatively, if you’d like to completely undo flashback database,
follow flashback database with recover database without shutting down the
instance first.
2. Restore backup and recover.
Patch details:
Currently there is no one-off patch available for any platform and versions.
Bug 2899477 (Unpublished)
Abstract:ORA-600[2662] CAUSES INSTANCE CRASH
Versions affected: 9.2.0.4
Fixed in versions: 9.2.0.5 & 10.1
Backportable: Yes
Symptoms:
When you have a corrupted SCN and if the corruption is found in selexe,
getting uninitialized selenv from opiexe, then this may be the bug.

Patch details:
One-off patch available for a few platforms on top of 9.2.0.4.
Check Metalink for Patch 2899477 availability.
Bug 2764106
Abstract: ORA-600 [2662] BRINGS THE DATABASE DOWN
Versions affected: 8.1.7.4 & 9.2.0.4
Fixed in versions: 9.2.0.5 & 10.1
Backportable: Yes
Symptoms:
OERI(2662) is raised even though the dependent SCNs present in the disk blocks are fine.
Details:
A false ORA-600 [2662] error can occur on SELECT operations
which can result in an instance crash even though there is no
underlying problem with the on disk SCN.
Workaround:
None
Patch details:
One-off patch available for a few platforms on top of 8.1.7.4 & 9.2.0.4.
Check Metalink for Patch 2764106 availability.
Bug 2216823 (Unpublished)
Abstract:OERI(2662) REPORTED WHEN REUSING TEMPFILE WITH RESTORED DB
Versions affected: 9.2.0
Fixed in versions: 10.1.0
Backportable: No
Symptoms:
Example reproduction:
1. Create a TEMP tablespace.
2. Shut down the database.
3. Copy the control file, data files, and log files to another directory
(but not the tempfile).
4. Restart the database.
5. Create a temporary table and insert into it, thereby causing the tempfile
to be updated.
6. Shut down the database.
7. Restore the database.
8. Restart the database.
9. Create a temporary table and insert into it.
10. Commit
^- ORA-600 [2662]
Details:
ORA-600 [2662] can occur when reusing a TEMPFILE with
a restored database.
Workaround:
The workaround is not to use the pre-existing tempfile.
Instead either backup the tempfile with rest of the database
or remove the tempfile then recreate a new tempfile once the
database is open.
Patch details:
Currently there is no one-off patch available for any platforms and versions
Bug 2054025 (Unpublished)

Abstract:ORA-600 [2662] RELATED TO KDIT.C
Versions affected: 9.0.1.2
Fixed in versions: 9.0.1.3 9.2.0.1
Backportable: No
Symptoms:
OERI:2662 possible on new TEMPORARY index block
Details:
ORA-600 [2662] possible on new TEMPORARY index block
Workaround:
None
Patch details:
Currently there is no one-off patch available for any platforms and versions
Bug 851959
Abstract : ORA-600 [2662] OCCURRED DURING CREATE SNAPSHOT AT MASTER SITE
Details :
It is possible to get ORA-600 [2662] caused by mis-adjustment of the Oracle7 SCN (in PARALLEL SERVER mode) when an Oracle8 instance selects from it over a DBLINK.
Version affected : 7.3.4.X
Fixed in version: 7.3.4.5
Workaround :
None
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 647927 (Unpublished)
Abstract : LOCK PROCESS DIES WITH ORA-600 [2662], [0], [40057943], [0], [40063994]
Version affected : 8.0.4.X
Fixed in version : 8.0.4.2 8.0.5.0
Symptoms :
Digital Unix ONLY: OERI:2662 could occur under heavy load
Workaround :
None
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 5612217 (Unpublished)
Abstract : ORA-7445 [KDKBIN] LEADING TO ORA-600 [2662] DUE TO BUFFER CORRUPTION
Version affected : 9.2.0.X
Workaround :
None
Patch details :
One-off patch available for a few platforms on top of 9.2.0.7.
Check Metalink for Patch 5612217 availability.
Bug 4599505 (Unpublished)
Abstract : ORA-600 [2662] error
Version affected : 10.2.0.X

Fixed in version : 11.0
Symptoms :
ORA-600[2662] after flashback database.
Workaround :
This problem may disappear by itself after the database has been open for a while and its SCN has passed the SCN of the problematic block. This is,
however, not a guaranteed workaround.
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 2998110
Abstract :ORA-600 [2662] LARGE QUERIES ON STANDBY WITH LOCALLY MANAGED TMP TBLSP
Version affected : 9.2.0.X 10.1.0.X
Fixed in version : 10.2
Symptoms :
The SCN of the tempfiles is advanced, but the SCN of the other files is not,
when the database is opened in read only mode.
Workaround :
1) Increase the sort_area_size to avoid sort on disk thus avoiding the use of the tempfiles
–OR–
2) After opening the database read only and BEFORE executing any queries
against the standby database, drop and recreate the tempfiles.
–OR–
3) If you are on 10.1 release you can set the following parameter:
_init_tempfile_on_open=TRUE
in your init.ora/spfile and bounce the database.
Setting this parameter will clear all tempfile bitmaps when the database is opened,
so the database open may take a little longer.
Patch details :
Currently there is no one-off patch available for any versions/platforms.
This bug is fixed in 10.2 and is not backportable to previous releases.
Note 356583.1 has been linked to this scenario.
Bug 3517013 (Unpublished)
Abstract :OPEN DB RESETLOG AFTER FLASHBACK DB FAILS ORA-600 [KCLCHKBLK_4], [1904]
Symptoms :
1) The database was restored from backup and an incomplete recovery was performed.
2) The database was opened with resetlogs.
3) After opening the database, the following errors start to appear:
ORA-00600 [kclchkblk_4]
ORA-00600 [2662]
4) The stack trace is: kclchkblk kcbzib kcbgcur ktfbhget ktftfcload
Cause :
1) ORA-600[KCLCHKBLK_4] is signaled because the SCN in a tempfile block
is too high. The same reason causes the ORA-600[2662] errors in the alert log.
2) This happens because the tempfiles may not get reinitialized during open
resetlogs.
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Note 275902.1 has been linked to this scenario and solution
given under this note.

Many other bugs were filed with development for this issue.
Those bugs were not progressed due to
- Lack of response from the customers
- One-time occurrences
- Vendor OS problems
Disclaimer :
This note contains the most frequently hit bugs that can throw the error ORA-00600 [2662]. However, the bugs mentioned above are not the complete list of
bugs that can generate this error.

↧

ORA-00600 [2662] ORA-600 [2662] “Block SCN is ahead of Current SCN”

March 2, 2016, 12:19 am


ERROR:

Format: ORA-600 [2662] [a] [b] [c] [d] [e]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:

A data block SCN is ahead of the current SCN.
The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN stored in a UGA variable.
If the SCN is less than the dependent SCN then we signal the ORA-600 [2662] internal error.

ARGUMENTS:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.
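
The argument layout above lends itself to a small worked calculation. The following Python sketch is only an illustration, not an Oracle tool: the function names are hypothetical, it assumes the arguments are printed in decimal unless they carry a 0x prefix, and it assumes the usual SCN layout in which the full SCN is wrap * 2^32 + base. The sample line reuses the argument values quoted for Bug 647927 in the bug summary.

```python
import re


def parse_2662_arguments(error_line):
    """Return the arguments that follow [2662] as a list of integers."""
    fields = [f.strip() for f in re.findall(r"\[([^\]]*)\]", error_line)]
    if not fields or fields[0] != "2662":
        raise ValueError("not an ORA-600 [2662] error line")
    values = []
    for field in fields[1:]:
        if field == "":
            continue  # unused trailing arguments are printed as []
        # Assumption: decimal unless the value is written with a 0x prefix.
        base = 16 if field.lower().startswith("0x") else 10
        values.append(int(field, base))
    return values


def scn_value(wrap, base):
    """Combine SCN wrap and base (assumed layout: wrap * 2**32 + base)."""
    return (wrap << 32) + base


def scn_gap(error_line):
    """Dependent SCN minus current SCN, i.e. [c].[d] minus [a].[b]."""
    args = parse_2662_arguments(error_line)
    if len(args) < 4:
        raise ValueError("expected at least four arguments after [2662]")
    current = scn_value(args[0], args[1])
    dependent = scn_value(args[2], args[3])
    return dependent - current


if __name__ == "__main__":
    line = "ORA-600 [2662], [0], [40057943], [0], [40063994]"
    print(scn_gap(line))  # prints 6051
```

A small positive gap such as this one means the dependent SCN is only slightly ahead of the current SCN.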

FUNCTIONALITY:

File and IO buffer management for redo logs

IMPACT:
INSTANCE FAILURE

POSSIBLE PHYSICAL CORRUPTION

SUGGESTIONS:

There are different situations where ORA-600 [2662] can be raised.

It can be raised on startup or during database operation.

If not using Parallel Server, check that 2 instances have not mounted the same database.
Check for SMON traces and have the alert.log and trace files ready to send to support.
Check the SCN difference [argument d]-[argument b].

If the SCNs in the error are very close, then try to shutdown and startup the instance several times.
In some situations, the SCN increment during startup may permit the database to open. Keep track of the number of times you attempted a startup.

If the Known Issues section below does not help in terms of identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further analysis.
Known Issues:

NB	Bug	Fixed	Description
	4453449	10.2.0.2, 11.1.0.6	OERI:3020 / corruption errors from multiple FLASHBACK DATABASE
	5889016		Corruption / OERI during recovery
	2899477	9.2.0.5, 10.1.0.2	Minimise risk of a false OERI[2662]
	2764106	9.2.0.5, 10.1.0.2	False OERI[2662] possible on SELECT which can crash the instance
	2216823	10.1.0.2	OERI [2662] reusing a TEMPFILE with a restored database
	2054025	9.0.1.3, 9.2.0.1	OERI:2662 possible on new TEMPORARY index block
P	647927	8.0.4.2, 8.0.5.0	Digital Unix ONLY: OERI:2662 could occur under heavy load
	851959	7.3.4.5	OERI:2662 possible from distributed OPS select

INTERNAL ONLY SECTION – NOT FOR PUBLICATION OR DISTRIBUTION TO CUSTOMERS
========================================================================
There were 2 forms of this error until 7.2.3:
Type I: 4/5 argument forms –
The SCN found on a block (dependent SCN) is ahead of the
current SCN. See below for this
Type II: 1 Argument (before 7.2.3 only):
Oracle is in the process of writing a block to a log file.
If the calculated block checksum is less than or equal to 1
(0 and 1 are reserved) ORA-600 [2662] is returned.
This is a problem generating an offline immediate log marker
(kcrfwg).
*NOT DOCUMENTED HERE*
Type I
~~~~~~
a. Current SCN WRAP
b. Current SCN BASE
c. dependent SCN WRAP
d. dependent SCN BASE
e. Where present this is the DBA where the dependent SCN came from.
From kcrf.h:
If the SCN comes from the recent or current SCN then a dba
of zero is saved. If it comes from undo$ because the undo segment is
not available then the undo segment number is saved, which looks like
a block from file 0. If the SCN is for a media recovery redo (i.e.
block number == 0 in change vector), then the dba is for block 0
of the relevant datafile. If it is from another database for a
distributed transaction then dba is DBAINF(). If it comes from a TX
lock then the dba is really usn<<16+slot.
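
The TX lock case can be illustrated with a short Python sketch. It is only an illustration under an assumption: that the expression usn<<16+slot is meant as (usn << 16) + slot, i.e. the undo segment number in the high bits and the transaction slot in the low 16 bits. The function names and the sample values (usn 1, slot 0x27) are hypothetical.

```python
def encode_tx_dba(usn, slot):
    """Pack undo segment number and slot, assuming (usn << 16) + slot."""
    if not 0 <= slot <= 0xFFFF:
        raise ValueError("slot must fit in 16 bits")
    return (usn << 16) + slot


def decode_tx_dba(value):
    """Split a packed value back into (usn, slot) under the same assumption."""
    return value >> 16, value & 0xFFFF


if __name__ == "__main__":
    packed = encode_tx_dba(1, 0x27)
    print(hex(packed))            # prints 0x10027
    print(decode_tx_dba(packed))  # prints (1, 39)
```

Note that in C, as in Python, + binds more tightly than <<, so the grouping has to be written explicitly to obtain this result.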
Type II
~~~~~~~
a. checksum -> log block checksum – zero if none (thread # in old format)
—————————————————————————
Diagnosis:
~~~~~~~~~~
In addition to different basic types from above, there are different
situations where ORA-600 [2662] type I can be raised.
Getting started:
~~~~~~~~~~~~~~~~
(1) is the error raised during normal database operations (i.e. when the
database is up) or during startup of the database?
(2) what is the SCN difference [d]-[b] ( subtract argument ‘b’ from arg ‘d’)?
(3) is there a fifth argument [e] ?
If so convert the dba to file# block#
Is it a data dictionary object? (file#=1)
If so find out object name with the help of reference dictionary
from second database
(4) What is the current SQL statement? (see trace)
Which table is referred to?
Does the table match the object you found in previous step?
Be careful at this point: there may be no relationship between DBA in [e]
and the real source of problem (blockdump).
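
Step (3) asks for the dba to be converted to file# and block#. A minimal Python sketch of that conversion follows. It assumes the common smallfile layout in which the top 10 bits of the 32-bit dba hold the relative file number and the low 22 bits hold the block number; the function name is hypothetical. Inside a running database the DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK functions perform the same split.

```python
BLOCK_BITS = 22  # assumption: low 22 bits of the dba are the block number


def decode_dba(dba):
    """Split a data block address into (file#, block#).

    Accepts an int, a decimal string, or a hex string with a 0x prefix.
    """
    if isinstance(dba, str):
        text = dba.strip()
        dba = int(text, 16) if text.lower().startswith("0x") else int(text)
    file_no = dba >> BLOCK_BITS
    block_no = dba & ((1 << BLOCK_BITS) - 1)
    return file_no, block_no


if __name__ == "__main__":
    # 0x0040007a is a sample rdba; file# 1 points at the SYSTEM datafile.
    print(decode_dba("0x0040007a"))  # prints (1, 122)
```

A file# of 1 is the hint, mentioned in step (3), that a data dictionary object may be involved.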
Deeper analysis:
~~~~~~~~~~~~~~~~
(1) investigate trace file:
this will be a user trace file normally but could be an smon trace too
(2) search for: ‘buffer’
(“buffer dba” in Oracle7 dumps, “buffer tsn” in Oracle8/Oracle9 dumps)
this will bring you to a blockdump which usually represents the
‘real’ source of OERI:2662
WARNING: There may be more than one buffer pinned to the process
so ensure you check out all pinned buffers.
-> does the blockdump match the dba from e.?
-> what kind of blockdump is it?
(a) rollback segment header
(b) datablock
(c) other
Check list and possible causes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If Parallel Server check both nodes are using the same lock manager
instance & point at the same control files.
Possible causes:
(1) doing an open resetlogs with _ALLOW_RESETLOGS_CORRUPTION enabled
(2) a hardware problem, like a faulty controller, resulting in a failed
write to the control file or the redo logs
(3) restoring parts of the database from backup and not doing the
appropriate recovery
(4) restoring a control file and not doing a RECOVER DATABASE USING BACKUP
CONTROLFILE
(5) having _DISABLE_LOGGING set during crash recovery
(6) problems with the DLM in a parallel server environment
(7) a bug

Solutions:
(1) if the SCNs in the error are very close, attempting a startup several
times will bump up the dscn every time we open the database even if
open fails. The database will open when dscn=scn.
(2) You can bump the SCN either on open or while the database is open
using Event:ADJUST_SCN (see Note:30681.1).
Be aware that you should rebuild the database if you use this
option.
Once this has occurred you would normally want to rebuild the
database via exp/rebuild/imp as there is no guarantee that some
other blocks are not ahead of time.
Articles:
~~~~~~~~~
Solutions:
Note:30681.1 Details of the ADJUST_SCN Event
Note:1070079.6 Alter System Checkpoint
Possible Causes:
Note:1021243.6 CHECK INIT.ORA SETTING _DISABLE_LOGGING
Note:41399.1 Forcing the database open with `_ALLOW_RESETLOGS_CORRUPTION`
Note:851959.9 OERI:2662 DURING CREATE SNAPSHOT AT MASTER SITE
Known Bugs:
~~~~~~~~~~~
Fixed In. Bug No. Description
———+————+—————————————————-
7.1.5 BUG:229873
7.1.3 Bug:195115 Miscalculation of SCN on startup for distributed TX ?
7.1.6.2.7 Bug:297197 Port specific Solaris OPS problem
7.3 Bug:336196 Port specific IBM SP AIX problem -> dlm issue
7.3.4.5 Bug:851959 OERI:2662 possible from distributed OPS select
Not fixed Bug:2216823 OERI:2662 reported when reusing tempfile with restored DB
8.1.7.4 Bug:2177050 OERI:729 space leak possible (with tags “define var info”/”oactoid info”)
can corrupt UGA and cause OERI:2662


↧

Oracle ORA-600[4000] ORA-00600[4000]

March 2, 2016, 1:24 am


Applies to:
Oracle Server – Enterprise Edition – Version: 8.1.7.4 to 11.1.0.7
Information in this document applies to any platform.

Purpose

Symptoms

Database fails to start because of ora-600[4000].

Alert.log will show:

Errors in file /oracle/admin/sdwh/udump/sdwh_ora_13186.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [1], [], [], [], [], [], []
Tue Sep 9 14:48:04 2008
Error 704 happened during db open, shutting down database
sdwh_ora_13186.trc shows:
*** 2008-09-09 15:33:26.194
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [1], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
..
..
row cache parent object: address=0xc9efb27c cid=3(dc_rollback_segments)
hash=35e74caf typ=5 transaction=(nil) flags=00000001
own=0xc9efb2f0[0xc7c83ba0,0xc7c83ba0] wat=0xc9efb2f8[0xc9efb2f8,0xc9efb2f8] mode=S
status=EMPTY/-/-/-/-/-/-/-/-
data=
00000001 ….
BH (0x0x6ffff4ac) file#: 1 rdba: 0x0040007a (1/122) class 1 ba: 0x0x6ff8a000
set: 17 dbwrid: 0 obj: 18 objn: 18
hash: [74ffdc70,c85d94cc] lru: [6ffffad4,c771aabc]
ckptq: [NULL] fileq: [NULL]
use: [c84043f0,c84043f0] wait: [NULL]
st: XCURRENT md: SHR rsop: 0x(nil) tch: 0

LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [255] RRBA: [0x0.0.0]
Using State Objects
—————————————-
SO: 0xc84043d0, type: 24, owner: 0xc722382c, flag: INIT/-/-/0x00
(buffer) (CR) PR: 0x0xc71d1440 FLG: 0x500400
lock rls: 0x(nil), class bit: 0x(nil)
kcbbfbp: [BH: 0x0x6ffff4ac, LINK: 0x0xc84043f0]
where: kdswh02: kdsgrp, why: 0
buffer tsn: 0 rdba: 0x0040007a (1/122)
scn: 0x0000.15ad85b0 seq: 0x01 flg: 0x06 tail: 0x85b00601
frmt: 0x02 chkval: 0xabfc type: 0x06=trans data
Block header dump: 0x0040007a
Object id on Block? Y
seg/obj: 0x12 csc: 0x00.15ad85ad itc: 1 flg: – typ: 1 – DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0001.027.000056dc 0x0080d065.16f2.14 –U- 1 fsc 0x0000.15ad85b0
The trace file shows that _SYSSMU1$ has a TX against obj$, and the SCN of the block touched by this TX is
scn: 0x0000.15ad85b0 (363693488 decimal).
The ORA-600 [4000] could be raised at startup if the above SCN is ahead of the database SCN.
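
The hex-to-decimal conversion of the block SCN can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, not an Oracle utility: the function name is hypothetical, and it assumes the dump format wrap.base in hexadecimal with the full SCN equal to wrap * 2^32 + base.

```python
def scn_from_dump(text):
    """Convert an SCN printed as 0xWRAP.BASE (hex) to a decimal integer."""
    value = text.strip().lower()
    if value.startswith("0x"):
        value = value[2:]
    wrap_hex, base_hex = value.split(".")
    # Assumed layout: 32-bit base in the low bits, wrap above it.
    return (int(wrap_hex, 16) << 32) + int(base_hex, 16)


if __name__ == "__main__":
    print(scn_from_dump("0x0000.15ad85b0"))  # prints 363693488
```

The result can then be compared directly with checkpoint_change# from v$database, which is reported in decimal.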
Last Review Date
October 3, 2008
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are
included in the document to assist in troubleshooting.
Troubleshooting Details
1) Find the database SCN:
SQL> startup mount
SQL> select checkpoint_change# from v$database;
2) Compute the value for _minimum_giga_scn from the expected SCN in decimal:
SQL> select ceil(&decimal_scn_expected/1024/1024/1024) from dual;
3) Set parameter _minimum_giga_scn=<result from 2> in the init.ora file.
Using the above trace file example, we found:
SQL> select checkpoint_change# from v$database;
355532971
As suspected the database scn = 355532971 is lower than TX scn=363693488.
SQL> select ceil(&decimal_scn_expected/1024/1024/1024) from dual;
Enter value for decimal_scn_expected: 363693488
old 1: select ceil(&decimal_scn_expected/1024/1024/1024) from dual
new 1: select ceil(363693488/1024/1024/1024) from dual

CEIL(363693488/1024/1024/1024)
——————————
1
For this example, step 3 therefore becomes: set parameter _minimum_giga_scn=1 in the init.ora file.
4) Startup database
SQL> startup mount
SQL> recover database
SQL> alter database open;
5) If database opens:
– remove parameter _minimum_giga_scn from init.ora and bounce database
SQL> shutdown immediate
SQL> startup
6) Investigate what could have caused the ORA-600 [4000]. One possible cause is that the database was forced
open using _allow_resetlogs_corruption; if this is the case, we strongly suggest taking a full export and
recreating the database from scratch.
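
The arithmetic in steps 1 to 3 can be checked outside the database with a short Python sketch. The function names are hypothetical, and the reading that the parameter value is expressed in units of 1024*1024*1024 SCNs is an assumption taken from the ceil(.../1024/1024/1024) formula above.

```python
GIGA = 1024 * 1024 * 1024  # divisor used by the SQL formula in step 2


def minimum_giga_scn(expected_scn):
    """Mirror ceil(expected_scn/1024/1024/1024) using integer arithmetic."""
    if expected_scn < 0:
        raise ValueError("SCN must be non-negative")
    return -(-expected_scn // GIGA)


def block_is_ahead(block_scn, database_scn):
    """True when the block SCN is higher than the database SCN."""
    return block_scn > database_scn


if __name__ == "__main__":
    tx_scn = 363693488   # block SCN from the trace file example
    db_scn = 355532971   # checkpoint_change# from the example
    print(block_is_ahead(tx_scn, db_scn))  # prints True
    print(minimum_giga_scn(tx_scn))        # prints 1
```

Integer floor division is used instead of floating point so that very large SCN values are rounded up exactly.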

ORA-600 [4000] “trying to get dba of undo segment header block from usn”

Format: ORA-600 [4000] [a]
VERSIONS:
version 6.0 to 9.2
DESCRIPTION:
This has the potential to be a very serious error.
It means that Oracle has tried to find an undo segment number in the
dictionary cache and failed.
ARGUMENTS:
Arg [a] Undo segment number
FUNCTIONALITY:
KERNEL TRANSACTION UNDO
IMPACT:
INSTANCE FAILURE – Instance will not restart
STATEMENT FAILURE
SUGGESTIONS:
As per Note 1371820.8, this can be seen when executing DML on tables residing
in tablespaces transported from another database.
It is fixed in 8.1.7.4, 9.0.1.4 and 9.2.0.1. The workaround, however, is to
create more rollback segments in the target database until the highest
rollback segment number (select max(US#) from sys.undo$;) is at least
as high as the equivalent max(US#) in the source database.
It has also been seen where memory has been corrupted, so try shutting
down and restarting the instance.
If the database will not start, contact Oracle Support Services
immediately, providing the alert.log and associated trace files.

NB	Bug	Fixed	Description
*	9145541	11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.0	OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
+	10425010	11.2.0.3, 12.1	Stale data blocks may be returned by Exadata FlashCache
	12353983		ORA-600 [4000] with XA in RAC
	7687856	11.2.0.1	ORA-600 [4000] from DML on transported ASSM tablespace
	2917441	11.1.0.6	OERI [4000] during startup
	3115733	9.2.0.5, 10.1.0.2	OERI[4000] / index corruption can occur during index coalesce
	2959556	9.2.0.5, 10.1.0.2	STARTUP after an ORA-701 fails with OERI[4000]
	1371820	8.1.7.4, 9.0.1.4, 9.2.0.1	OERI:4506 / OERI:4000 possible against transported tablespace
+	434596	7.3.4.2, 8.0.3.0	ORA-600[4000] from altering storage of BOOTSTRAP$

Bug 1362499
ORA-600 [4000] after migrating 7.3.4.3 to 8.0.6.1 on HP-UX 32-bit
Specific to HP-UX, fixed in one-off patch

Historic info on the Oracle 7.3.x issues re unlimited extents and bootstrap$
In 7.3.4, due to Bug:434596, this error can result from altering the
SYS.BOOTSTRAP$ table.
When a SHUTDOWN command follows this, the database will not start up again.
Example: Any of following modifications of SYS.BOOTSTRAP$
will cause this error:
ALTER TABLE BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED );
ALTER TABLE BOOTSTRAP$ STORAGE (NEXT 1024);
ALTER TABLE SYS.BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED);
ALTER TABLE sys.BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED);
A lock byte is now set on the SYS.BOOTSTRAP$ segment header and
following shutdown the database will not start.
A select from bootstrap$ before shutdown will clean out the lock on
the SYS.BOOTSTRAP$ segment header and prevent the errors from occurring.
Example: Issue the following BEFORE shutdown:
SQL> select count(*) from sys.bootstrap$;
Get a backup history of the Database/s and the exact sequence of steps performed.
Two possible options
a) Go back to backup before the storage clause on BOOTSTRAP$ was changed
b) Oracle Support may be able to patch bootstrap$. See Note:43132.1
Obviously, option a) is always the way to go if at all possible.
Articles:
ALERT about changing MAXEXTENTS to UNLIMITED Note:50380.1
Another cause of an ORA-600 [4000] is that a block SCN is ahead of the database SCN.
In that case the block with the high SCN could be printed in the trace file, and
Event ADJUST_SCN or parameter _MINIMUM_GIGA_SCN (Note:552438.1) can be used to bump the SCN.


↧

Oracle ORA-600 [4194] “Undo Record Number Mismatch While Adding Undo Record”

March 2, 2016, 1:35 am


ERROR:

Format: ORA-600 [4194] [a] [b]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:
A mismatch has been detected between Redo records and rollback (Undo) records.

We are validating the Undo record number relating to the change being
applied against the maximum undo record number recorded in the undo block.
This error is reported when the validation fails.

ARGUMENTS:
Arg [a] Maximum Undo record number in Undo block
Arg [b] Undo record number from Redo block

FUNCTIONALITY:
Kernel Transaction Undo called from Cache layer

IMPACT:
PROCESS FAILURE
POSSIBLE ROLLBACK SEGMENT CORRUPTION

NB	Bug	Fixed	Description
	8240762	10.2.0.5, 11.1.0.7.10, 11.2.0.1	Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction
	3210520	9.2.0.5, 10.1.0.2	OERI[kjccqmg:esm] / OERI[4194] / corruption possible in RAC
+	792610	8.0.6.0, 8.1.6.0	Rollback segment corruption OERI:4194 can occur if block checking detects a corrupt block

Historic information:
7.3.3 to 8.1.5
==============
Note:69863.1 ALERT: Apparent data corruptions involving Solaris 2.6,
ISM & DR on Starfire
Check USE_ISM parameter on SUN Solaris E10000 Platforms.
ORA-600 [4194] [a] [b]
Versions: 6.0 – 9.2 Source: ktuc.c
===========================================================================
Meaning:
Undo record number mismatch while adding an undo record to an undo
block. This is done by the application of redo.
—————————————————————————
Argument Description:
a. (ktubhcnt): undo record count - This is the maximum number of undo
   records that have ever existed within this Undo Block. In other words,
   it is the High Water Mark for undo records in that undo block.
   This is from the Undo Block.
b. (ktudbrec): redo record number - This is the record number for the
   new undo record that is to be added to the undo block. It should be
   one greater than the maximum in the undo block currently.
   This is from the Redo Record.
—————————————————————————
Diagnosis:

This error is raised in kturdb which handles the adding of undo records
by the application of redo.
When we try to apply redo to an undo block (forward changes are made by
the application of redo to a block), we check that the number of undo
records in the undo block +1 matches the record number in the redo
record. Because we are adding a new undo record, we know that the record
number in that undo block must be one greater than the maximum number in
that block.
So for UBA=0x08000592.00a0.0b
0x08000592 is the dba of the undo block.
0x00a0 is the seq# number that is in the block that THIS UNDO IS TO
BE APPLIED TO.
0x0b is the number of undo records in the undo block.
In the header this looks like:
UNDO BLK::
xid: 0x0004.00e.0000017f seq: 0x00a0 cnt: 0x0b ……..
Since we are adding a new undo record to our undo block, we would expect
that the new record number is equal to the maximum record number in the
undo block +1. If this is not the case, we get ORA-600 [4194].
This implies some kind of block corruption in either the redo or the
undo block. Look for other errors that would imply that a block is
corrupted.
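
The UBA breakdown above can be sketched in Python as follows. This is only an illustration of the parsing and of the "maximum record number + 1" rule described in this note; the class and function names are hypothetical, and the three dot-separated fields are assumed to be hexadecimal as in the example.

```python
from typing import NamedTuple


class Uba(NamedTuple):
    dba: int  # address of the undo block
    seq: int  # seq# of the block this undo is to be applied to
    rec: int  # number of undo records in the undo block


def parse_uba(text):
    """Parse a UBA written as 0xDBA.SEQ.REC (all fields hexadecimal)."""
    value = text.strip()
    if value.lower().startswith("0x"):
        value = value[2:]
    dba_hex, seq_hex, rec_hex = value.split(".")
    return Uba(int(dba_hex, 16), int(seq_hex, 16), int(rec_hex, 16))


def expected_next_record(uba):
    """Record number the next undo record should carry (maximum + 1)."""
    return uba.rec + 1


if __name__ == "__main__":
    uba = parse_uba("0x08000592.00a0.0b")
    print(hex(uba.dba), hex(uba.seq), uba.rec)  # prints 0x8000592 0xa0 11
    print(expected_next_record(uba))            # prints 12
```

If the record number found in the redo record differs from the value returned by expected_next_record, that is the mismatch this note associates with ORA-600 [4194].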
Note: If the ORA-600 [4194] follows another ORA-600 AND IF AND ONLY IF
the arguments [a] and [b] are the same, then this MAY be due
to Bug:792610 which can cause undo corruption following a
failed block change.
Note:452620.1 has a procedure to patch this inconsistency when the problem
is produced in the SYSTEM rollback segment
—————————————————————————
Known Bugs: (Those bugs that are fixed after version 7.0.12.0.0.
Bugs must be closed or hold useful information.)
Fixed In. Bug No. Description
———+————+—————————————————-
8.0.6/8.1.6 Bug:792610 ORA-600 during redo application to a block may
in turn cause an OERI:4194 on the undo block.
E.g., block checking noticing a corrupt index
block during a multi-row insert.
7.1.5 Bug:239671 Truncate (could possibly happen on other
operations too) on 16k+ block size can cause
the maximum number of undo records in a block
(255) to be exceeded.
Workarounds: Use < 16K blocksize, or avoid
using the TRUNCATE command with the DROP
STORAGE option (which is the default).

↧

Oracle ORA-00600 [4193] ORA-600 [4193] “seq# mismatch while adding undo record”

March 2, 2016, 1:40 am


Format: ORA-600 [4193] [a] [b]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:
A mismatch has been detected between Redo records and Rollback (Undo) records.
We are validating the Undo block sequence number in the undo block against the Redo block sequence number relating to the change being applied.

This error is reported when this validation fails.
ARGUMENTS:

Arg [a] Undo record seq number
Arg [b] Redo record seq number

FUNCTIONALITY:

KERNEL TRANSACTION UNDO

IMPACT:

PROCESS FAILURE
POSSIBLE ROLLBACK SEGMENT CORRUPTION

This error may indicate a rollback segment corruption.
This may require a recovery from a database backup depending on the situation.

NB  Bug       Fixed                            Description
    14034244  11.2.0.3.BP09, 12.1.0.0          Lost write type corruption using ASM in 11.2.0.3
    8240762   10.2.0.5, 11.1.0.7.10, 11.2.0.1  Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction

ORA-600 [4193] [a] [b] [ ] [ ] [ ]
Versions: 7.2.2 – 9.2.0 Source: ktuc.c
===========================================================================
Meaning: seq# mismatch while adding an undo record to an undo block. This
is done by the application of redo.
—————————————————————————
Argument Description:
a. (ktubhseq): undo record seq# – this is the seq# of the block that
               this undo record WILL BE APPLIED TO.
               This is from the Undo Block. It is
               NOT the seq# of the undo block itself.
b. (ktudbseq): redo RECORD seq# – this is the seq# in the block
               that this redo WILL BE APPLIED TO.
               This is from the Redo Record.
—————————————————————————
Diagnosis:
This error is raised in kturdb, which handles the adding of undo records
by the application of redo.

When we try to apply redo to an undo block (forward changes are made by
the application of redo to a block) we check that the seq# in the undo
record matches the seq# in the redo record. These seq# should be the
same because when we apply a redo record we must apply it to the
correct version of the block. We can only apply a redo record to a
block that contains the same seq# as in the redo record.

If the seq# do not match then this error is raised. This implies some
kind of block corruption in either the redo or the undo block.
7.3.x – 8.1.7.x:
    ASSERT2(ubh->ktubhseq == db->ktudbseq, OERI(4193), KSESVSGN,
            ubh->ktubhseq, db->ktudbseq);
9.2.x:
    ksesic2(OERI(4193), ksenrg(ubh->ktubhseq), ksenrg(db->ktudbseq));

struct ktubh
{
    kxid ktubhxid;    /* txid of tx currently using or last used this block */
    ub2  ktubhseq;    /* undo block sequence number */
    ub1  ktubhcnt;    /* high water mark record index, number of undo entries */
    ub1  ktubhirb;    /* rollback record index, rec index to start the rollback */
    ub1  ktubhicl;    /* collecting record index, rec index to start retrieving col info */
    ub1  ktubhflg;    /* dummy */
    ub2  ktubhidx[1]; /* byte offset of record in block, grows at runtime */
};

struct ktudb          /* Kernel Transaction Undo Data operation Block (redo) */
{
    ub2  ktudbsiz;    /* size of entry */
    ub2  ktudbspc;    /* verification: space left in undo block */
    ub2  ktudbflg;    /* flag to indicate the kind of redo operation */
    kxid ktudbxid;    /* current tx id */
    ub2  ktudbseq;    /* block sequence number */
    ub1  ktudbrec;    /* new record index for this change */
};
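
The comparison described in the Diagnosis can be illustrated with a small, self-contained sketch. The types below are simplified stand-ins that keep only the two seq# fields, `ub2` is assumed to be a 16-bit unsigned integer, and `seq_matches` and `apply_undo_redo` are hypothetical helpers. This is not Oracle source, only an illustration of the check that raises ORA-600 [4193] when it fails.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint16_t ub2; /* assumption: ub2 is a 16-bit unsigned integer */

/* Simplified stand-ins: only the seq# fields of ktubh and ktudb are kept. */
struct undo_block_header { ub2 ktubhseq; }; /* seq# stored in the undo block */
struct undo_redo_record  { ub2 ktudbseq; }; /* seq# carried by the redo record */

/* Hypothetical helper: returns 1 when the redo may be applied, 0 otherwise. */
static int seq_matches(const struct undo_block_header *ubh,
                       const struct undo_redo_record *db)
{
    return ubh->ktubhseq == db->ktudbseq;
}

/* Hypothetical helper: reports a mismatch in the documented argument order,
   [a] = undo seq#, [b] = redo seq#. Returns 0 on success, -1 on mismatch. */
static int apply_undo_redo(const struct undo_block_header *ubh,
                           const struct undo_redo_record *db)
{
    if (!seq_matches(ubh, db)) {
        fprintf(stderr, "ORA-600 [4193] [%u] [%u]\n",
                (unsigned)ubh->ktubhseq, (unsigned)db->ktudbseq);
        return -1;
    }
    /* The undo record would be added to the block here. */
    return 0;
}
```

In this sketch a redo record with seq# 13 applied to an undo block at seq# 12 is rejected, which mirrors the case where the block on disk is not the version the redo was generated against.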
Note 452620.1 has a procedure to patch this inconsistency when the problem
occurs in the SYSTEM rollback segment.
Articles:
None
—————————————————————————
Known Bugs: (Those bugs that are fixed after version 7.0.12.0.0)
(Bugs must be closed or hold useful information)
Fixed In. Bug No. Description
———+————+—————————————————-
7.X Bug:XXXXXX Desc

↧

Oracle ORA-600 [4097] ORA-00600 [4097] “Corruption”

March 2, 2016, 6:31 am

≫ Next: Oracle ORA-00600 [4000] ORA-600 [4000] “trying to get dba of undo segment header block from usn”

≪ Previous: Oracle ORA-00600 [4193] ORA-600 [4193] “seq# mismatch while adding undo record”

ERROR:

Format: ORA-600 [4097]

VERSIONS: versions 7.3 to

DESCRIPTION:

We are accessing a rollback segment header to see if a transaction has been committed.

However, the xid given is in the future of the transaction table.

This could be due to a rollback segment corruption issue OR you might be hitting the following known problem.

FUNCTIONALITY: Rollback

IMPACT:

If this is the known issue (see below), it might cause missing data.
Otherwise, this could be a rollback segment corruption issue.

Known Bugs

NB	Bug	Fixed	Description
13340388	11.2.0.3.3, 11.2.0.3.BP07, 12.1.0.0	ORA-600 [kzaxpopr14 -Error in decoding xml text] when querying V$XML_AUDIT_TRAIL
	11.2.0.3.3, 11.2.0.3.BP07, 12.1.0.0

OERI SECUREFILE TRANSPORT
10249791 11.2.0.2.BP02,on DMLS referencing SECUREFILE plugged

11.2.0.2.7,

11.2.0.3, 12.1.0.0 11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.0 11.1.0.7.2, 11.2.0.1.1,

11.2.0.2, 12.1.0.0

7687856 11.2.0.1 5653641 11.2.0.1

ORA-600 [4097] / ORA-600 [4000] reported using transportable tablespaces

* 9145541

OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g

8565708 11.2.0.1.BP04,

OERI[4097] after using distributed transactions in RAC

3613078

2628232

9.2.0.6,

ORA-600 [4000] from DML on transported ASSM tablespace
Corrupt dictionary from DROP TABLESPACE containing _offline_rollback_segments

OERI[4097] from DML on TRANSPORTED tables with ASSM

Block corruption possible on temp files

ORA-600’s from CR served block from a plugged in tablespace

OERI:4097 possible on objects in read only transported tablespace

Tru64: OERI:4097 possible on RAC / OPS

Drop of Rollback segments can cause OERI:4097 / missing data

10.1.0.3 3249755 9.2.0.5, 10.1.0.2 9.2.0.4,

10.1.0.2

8.1.7.4, 2165601 9.0.1.3, 9.2.0.1

P 1885251 * 427389

‘*’ against a bug indicates that an alert exists for that issue. ‘+’ indicates a particularly notable bug.
‘P’ indicates a port specific bug.
‘@’ indicates UNPUBLISHED information

Fixed versions use “BPnn” to indicate Exadata bundle nn. “OERI:xxxx” may be used as shorthand for ORA-600 [xxxx].

9015PSE, 9.2.0.1 7.3.3.3, 7.3.4.0, 8.0.3.0

Some historic info:

Upgrade or install a patchset to bring the database to one of the following levels: 7.3.3.3, 7.3.4.0, 8.0.3.0
To avoid encountering this bug, rollback segments should only be dropped and recreated after the instance has been shut down normally and restarted. If you have already encountered the bug, use the following workaround:
Possible workaround:
– Drop all rollback segments, except for SYSTEM
– Create the same number of rollback segments, small ones, with different names
– Recreate the original rollback segments
– Drop the small dummy rollback segments
Every time you need to add a rollback segment, first create all of the dummy segments again, to make sure they use up the old segment numbers. Then create the new segment, then drop all dummy segments.
If you are getting this error for a reason other than the above bug (see Description for how the bug can be hit), then you might have a rollback segment corruption issue. Typical causes are media corruption to the rollback segment blocks; check your hardware. To work around a rollback segment corruption problem (not caused by the known bug above), log the issue with Oracle Support.

ORA-600 [4097]
Versions: 7.1.3 - 7.3.2 Source: ktu.c

===========================================================================

Meaning:

We are accessing a rollback segment header to see if a transaction has been committed. However, the xid given is in the future of the transaction table. I.e., the WRAP of the XID is higher than
the current WRAP number on the RBS header.
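
As a rough illustration of the condition just described, the sketch below compares the wrap number carried by an XID with the wrap number recorded for the matching slot in the rollback segment header. The struct layouts, field names, table size, and the `xid_in_future` helper are hypothetical simplifications; they do not reflect the actual Oracle structures, and wraparound of the counter itself is ignored.

```c
#include <stdint.h>

#define TX_SLOTS 4 /* assumption: tiny transaction table for illustration */

/* Hypothetical, simplified XID: undo segment number, slot, wrap. */
struct xid {
    uint16_t usn;
    uint16_t slot;
    uint32_t wrap;
};

/* Hypothetical, simplified rollback segment header: one wrap# per slot. */
struct rbs_header {
    uint16_t usn;
    uint32_t slot_wrap[TX_SLOTS];
};

/* Returns 1 when the XID's wrap is ahead of the transaction table,
   which is the condition the note describes for ORA-600 [4097];
   returns 0 when it is not, and -1 if the XID does not address
   this header at all. */
static int xid_in_future(const struct xid *x, const struct rbs_header *h)
{
    if (x->usn != h->usn || x->slot >= TX_SLOTS)
        return -1;
    return x->wrap > h->slot_wrap[x->slot];
}
```

Under these assumptions, a rollback segment that was dropped and recreated would start again with low wrap numbers, while data blocks could still carry XIDs with higher wrap values from the old segment, so those XIDs would appear to lie in the future.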

—————————————————————————
Argument Description:

No arguments.

—————————————————————————
Diagnosis:

This should be considered as a corruption.

1. Try to identify which object has this TX in its ITL list. (see the trace file)

If this object is recreatable, that may be an option, but we cannot be sure whether it is the TX table that is too old or the block holding the ITL that is corrupt.
You MAY be able to recreate the RBS. It is safest to force
cleanout of all blocks before recreating the RBS (by FTS and recreating indexes).

Typical causes are media corruption to the data or RBS blocks, especially lost writes to the RBS header.
This is also possible if rollback segments are recreated after a shutdown abort. See Bug:427389 for details and options. In this case no data is corrupt; the rollback segments are just out of step.

Description and Workarounds for Bug:427389 Note:1011003.102


↧