Channel: Oracle and MySQL Database Recovery Repair Software recover delete drop truncate table corrupted datafiles dbf asm diskgroup blogs

ORA-00600: [kccpb_sanity_check_2] During Instance Startup


If you cannot recover the data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

 

Service Hotline:  +86 13764045638 E-mail: service@parnassusdata.com

 

Applies to:                                                                                                                                                                                  

Oracle Database - Enterprise Edition - Version 10.2.0.1 and later Information in this document applies to any platform.

***Checked for relevance on  18-Feb-2013***

 

Symptoms                                                                                                                                                                                   

 

The database is getting the following errors on Startup:

 

ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [3621501], [3621462], [0x000000000]

 

 

Changes                                                                                                                                                                                     

 

In this case, the customer moved the box from one data center to another.

 

Cause                                                                                                                                                                                         

 

ORA-600 [kccpb_sanity_check_2] indicates that the seq# of the last read block is higher than the seq# of the control file header block. This indicates a lost write of the header block during the commit of the previous control file (cf) transaction.

 

 

 

Solution                                                                                                                                                                                      

 

 

1) restore a backup of a controlfile and recover OR

2) recreate the controlfile OR

3) restore the database from last good backup and recover

 

 

NOTE: If you do not have a backup of the control file to restore, and multiple control file copies are listed in your pfile/init.ora/spfile, you can attempt to mount the database using each control file copy in turn. If the database mounts with any of these copies, you can then issue 'alter database backup controlfile to trace' to generate a script for recreating the control file.


ORA-00600 [kcrf_resilver_log_1] on restart after system crash


 

Applies to:

Oracle Server - Enterprise Edition - Version 11.2.0.1.0 and later

Information in this document applies to any platform.

***Checked for relevance on 23-Nov-2012***

Symptoms

Database fails to open after crash with

ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x3B0E7AA68], [2]

From the trace file generated:

 

----- Current SQL Statement for this session (sql_id=1h50ks4ncswfn) -----

ALTER DATABASE OPEN

----- Call Stack Trace -----

kgeasnmierr

kcrf_write_zeroblks

kcrfis

kcrfais

kcrfr_read_disk

kcrfr_read

kcrfrgv

kcratr_scan

kcratr

kctrec

kcvcrv

Cause

Unpublished Bug 9056657: BOX REBOOT DURING UPGRADE CAUSED ORA-600 [KCRF_RESILVER_LOG_1]

There has been a lost write to the online redo log as a result of the crash.

The fix for this bug raises a more meaningful log corruption error rather than an ORA-00600 error.

Instance recovery is not possible; restore the database and perform point-in-time recovery up to the most recent archived log.

Solution

Unpublished Bug 9056657 is included in 11.2.0.2 Patch Set Release.

Backports may be requested.

 

ORA-600 [3020] "Stuck Recovery"


 

Format: ORA-600 [3020] [a] [b] [c] [d] [e]

 

 

VERSIONS:

version 6.0 and above

DESCRIPTION:

This is called a 'STUCK RECOVERY'.

 

There is an inconsistency between the information stored in the redo and the information stored in a database block being  recovered.

 

ARGUMENTS:

 

For Oracle 9.2 and earlier:

Arg [a] Block DBA
Arg [b] Redo Thread
Arg [c] Redo RBA Seq
Arg [d] Redo RBA Block No
Arg [e] Redo RBA Offset

For Oracle 10.1:

Arg [a] Absolute file number of the datafile
Arg [b] Block number
Arg [c] Block DBA

 

FUNCTIONALITY:

kernel cache recovery parallel

 

IMPACT:

INSTANCE FAILURE during recovery.

 

SUGGESTIONS:

 

There have been cases of receiving this error when RECOVER has been issued, but either some datafiles were not restored to disk, or the restoration has not finished.

 

Therefore, ensure that the entire backup has been restored and that the restore has finished PRIOR to issuing a RECOVER database  command.

If problems continue, consider restoring from a backup and doing a point-in-time recovery to a time PRIOR to the one implied by the ORA-600 [3020] error.

Example:

SQL> recover database until time 'YYYY-MM-DD:HH24:MI:SS';

This error can also be caused by a lost update.

During normal operations, block updates/writes are performed to a number of files, including database datafiles, redo log files, archived redo log files etc.

 

This error can be reported if any of these updates are lost for some reason.

 

Therefore, thoroughly check your operating system and disk  hardware.

 

In the case of a lost update, restore an old copy of the datafile and attempt to recover and roll forward  again.

 

If the Known Issues section below does not help in terms of  identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further  analysis.

 

Known Issues:

 

Related Articles

 

Note:1265884.1      Resolving ORA-752 or ORA-600 [3020] During Standby  Recovery

 

KnownBugs

 


 

NB | Bug | Fixed | Description
-  | 9847338 | - | Session hang after applying the patch for Bug 9587912 which causes ORA-600 [30
+  | 13467683 | 11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP04, 12.1.0.0 | Join of temp and permanent tables in RAC might cause corruption of permanent ta Regression by bug 10352368
-  | 12831782 | 11.2.0.2.BP11, 11.2.0.3.BP01, 12.1.0.0 | ORA-600 [3020] / ORA-333 Recovery of datafile or async transport do not read mi there is a stale block
-  | 12582839 | 11.2.0.3, 12.1.0.0 | ORA-8103/ORA-600 [3020] on RMAN recovered locally managed tablespace
-  | 11689702 | 11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.0 | ORA-600 [3020] during recovery after datafile RESIZE (to smaller size)
-  | 10329146 | 11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.0 | Lost write in ASM with multiple DBWs and a disk is offlined and then onlined
-  | 10218814 | 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.0 | ORA-600 [3020] during recovery / on standby
+  | 10209232 | 11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.0 | ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
*  | 10205230 | 11.2.0.1.6, 11.2.0.1.BP09, 11.2.0.2.2, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.0 | ORA-600 / corruption possible during shutdown in RAC
-  | 10094823 | 11.2.0.2.4, 11.2.0.2.BP09, 11.2.0.3, 12.1.0.0 | Block change tracking on physical standby can cause data loss
-  | 10071193 | 11.2.0.2.BP02, 11.2.0.3, 12.1.0.0 | Lost write / ORA-600 [kclchkblk_3] / ORA-600 [3020] in RAC - superceded
-  | 9587912 | 11.2.0.2, 12.1.0.0 | ORA-600 [3020] in datafile that went offline/online in a RAC instance
-  | 8774868 | 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0 | OERI[3020] reinstating primary
+  | 8769473 | 11.2.0.2, 12.1.0.0 | ORA-600 [kcbzib_5] on multi block read in RAC. Invalid lock in RAC. ORA-600 [302 Recovery
P  | 8635179 | 10.2.0.5, 11.2.0.2, 12.1.0.0 | Solaris: directio may be disabled for RAC file access. Corruption / Lost Write
+  | 8597106 | 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0 | Lost Write in ASM when normal redundancy is used
P  | 12330911 | 12.1 | EXADATA LSI firmware for lost writes
+  | 10425010 | 11.2.0.3, 12.1 | Stale data blocks may be returned by Exadata FlashCache
-  | 8826708 | 10.2.0.5, 11.2.0.2 | ORA-600 [3020] for block type 0x3a (58) during recovery for block restored by RM backup
-  | 11684626 | 11.2.0.1 | ORA-600 [3020] on standby involving "BRR" redo when db_lost_write_protect is e
-  | 8230457 | 10.2.0.4.1, 10.2.0.5, 11.1.0.7.1, 11.2.0.1 | Physical standby media recovery gets OERI[krr_media_12]
+  | 7680907 | 10.2.0.5, 11.1.0.7.1, 11.2.0.1 | ORA-600 [kclexpandlock_2] in LMS / instance crash. Incorrect locks in RAC. ORA-6 [3020] in recovery
-  | 4637668 | 10.2.0.3, 11.1.0.6 | IMU transactions can produce out-of-order redo (OERI [3020] on recovery)
-  | 4594917 | 9.2.0.8, 10.2.0.2, 11.1.0.6 | Write IO error can cause incorrect file header checkpoint information
-  | 4453449 | 10.2.0.2, 11.1.0.6 | OERI:3020 / corruption errors from multiple FLASHBACK DATABASE
-  | 7197445 | 10.2.0.4.1, 10.2.0.5 | Standby Recovery session cancelled due to ORA-600 [3020] "CHANGE IN FUTURE BLOCK"
-  | 5610267 | 10.2.0.5 | MRP terminated by ORA-600[krr_media_12] / OERI:3020 after flashback
-  | 3762714 | 9.2.0.7, 10.1.0.4, 10.2.0.1 | ALTER DATABASE RECOVER MANAGED STANDBY fails with OERI[3020]
-  | 3560209 | 10.2.0.1 | OERI[3020] stuck recovery under RAC
-  | 3397181 | 9.2.0.5, 10.1.0.3, 10.2.0.1 | ALTER SYSTEM KILL SESSION of recovery slave causes stuck recovery
*  | 3381950 | 10.2.0.1 | Backups from RAC DB before Data Guard Failover cannot be used
-  | 3535712 | 9.2.0.6, 10.1.0.4 | OERI[3020] / ORA-10567 from RAC with standby in max performance mode
-  | 4594912 | 9.2.0.8, 10.1.0.2 | Incorrect checkpoint possible in datafile headers
-  | 3635331 | 9.2.0.6, 10.1.0.4 | Stuck recovery (OERI:3020) / ORA-1172 on startup after a crash
-  | 2322620 | 9.2.0.1 | OERI:3020 possible on recovery of LOB DATA
P+ | 656370 | 7.3.3.4, 7.3.4.0, 8.0.3.0 | AlphaNT only: Corrupt Redo (zeroed byte) OERI:3020

 

 

Note:190263.1   ORA-1172 OR ORA-600[3020] Quick Support Debugging Guide

 

Given that this error could be due to a lost update to either the datafile and/or the redo files, one thing to do would be to get dumps of both.

 

Refer to the following notes for information on how to do this  :

 

Note:1031381.6  How to Dump Redo Log File Information

Note:45852.1    Taking BLOCKDUMPS on Oracle8 - The ALTER SYSTEM  DUMP command **INTERNAL ONLY**

 

It is especially useful to focus on the particular datafile block implied by the ORA-600 [3020]. Dump all redo for that block, starting with the log sequence before the restored datafile, up to the point of failure.

Blockdumps of the datafile should be taken at various stages of the recovery process - for example right after doing the restore; and then again after each redo log file has been applied; and just before the SCN (or point in time) that the ORA-600 was reported; and just after redo for the given SCN has been applied; and so on.

 

The idea being that you may narrow down the point at which something went wrong.

 

ORA-600 [3020] [a] [b] [c] [d] [e]

Versions: 7.0.X  - 8.0.5                                  Source: knl/kcrp.c

===========================================================================

Meaning:

Recovering database and REDO entry has an INC/SEQUENCE number greater than that on the database  block.

In Oracle8 where the block structure is different it still means the same basic thing - the redo record we have has an SCN / SEQ which does not match the database block we are wanting to apply it to.

This is called 'STUCK RECOVERY'.

 

---------------------------------------------------------------------------

Argument Description:

 

a.   Block DBA

b.   Redo Thread

c.   Redo RBA Seq

d.   Redo RBA Block No

e.   Redo RBA Offset.

---------------------------------------------------------------------------
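As a rough illustration of how these arguments can be read, the sketch below decodes the sample values 25179923, 1, 17902, 40075 and 16, which are the arguments shown in the Oracle8 example dump later in this note. It assumes the common packing of a 32-bit DBA (upper 10 bits file number, lower 22 bits block number), which agrees with the `rdba: 0x01803713 (6/14099)` line of that dump; other ports or releases may differ. The function names are hypothetical helpers, not Oracle APIs.

```python
def decode_dba(dba):
    """Split a 32-bit data block address into (file number, block number).

    Assumption: 10 bits of file number followed by 22 bits of block number.
    """
    file_no = dba >> 22          # upper 10 bits
    block_no = dba & 0x3FFFFF    # lower 22 bits
    return file_no, block_no


def format_rba(seq, block, offset):
    """Render an RBA as seq.block.offset in hex, the way redo dumps print it."""
    return f"0x{seq:06x}.{block:08x}.{offset:04x}"


# Arguments [a]..[e] of a sample ORA-600 [3020] message.
dba, thread, rba_seq, rba_block, rba_offset = 25179923, 1, 17902, 40075, 16

file_no, block_no = decode_dba(dba)
print(f"DBA 0x{dba:08x} -> file {file_no}, block {block_no}")
print(f"thread {thread}, RBA {format_rba(rba_seq, rba_block, rba_offset)}")
```

Decoding the arguments this way gives the file and block to dump, and the RBA identifies the redo record to look for in the redo log dump.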

Diagnosis:

There are many possible causes for this, most resulting from either invalid sets of commands or media corruption.

 

-  Has the customer restored a backup, opened the DB, closed the DB and then tried to recover without re-loading the backup ??

** If they say no, GET THE ALERT LOG and prove it - it's easy to waste a lot of time when this was the real cause.

-  If the problem was a lost update, restoring an OLDER copy of the datafile and recovering may work.

-  The quick option here is to restore and recover UP TO an SCN just before the problem. The customer will lose some data as this is an incomplete recovery, so you need to know the priority:

a)  TIME   or  b) Minimal Data Loss.

 

-  Check the tracefile for the 3020 report. It is possible to signal OERI(3020) if the datafile block is  corrupt.

Eg: OERI(3020) with Inc=0 Seq=1 reported for the disk block is possibly a zeroed out data-block on the datafile and NOT a redo issue.

 

-  Is parallel server being used ?

If so another thread may have the required changes and they haven't been read for some reason. Check for OS and DLM errors. Try to make sure only ONE instance attempts any recovery by shutting down other instances.

 

-  Are hot backups being used ??

Check that the backups are occurring correctly between BEGIN and END backup commands.

-  Up to Oracle 8i you can try to skip the error using the hidden parameter: CORRUPT_BLOCKS_ON_STUCK_RECOVERY

Be aware that blocks will be marked corrupt if this is used so make sure the error is not on a dictionary object  !!

 

-  From 9i you can try to skip the error using the 'ALLOW .. CORRUPTION' clause of the RECOVER DATABASE  command.

(Note that in 11g onwards you may need to set DB_LOST_WRITE_PROTECT=NONE for the "ALLOW 1 CORRUPTION" clause to  work)

 

-  For logging a bug you need:

(a)  Where an error is reported, get any trace files produced and relevant redo log dumps if necessary. Document completely the circumstances leading up to the error, including configuration; type of backup (manual, RMAN, incremental, etc.); the exact commands used to create the backups; and the exact commands used to do the restore and recovery.

(b)  Provide a reproducible test case or dial-in information to development.

(c)  Where relevant, determine if generic or port-specific issue.

 

 

 

Articles:

Parameter:CORRUPT_BLOCKS_ON_STUCK_RECOVERY

 

---------------------------------------------------------------------------

 

Example OERI:3020 dump in Oracle8

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 

*** 1999.07.02.01.02.58.000

RECOVERY OF THREAD 1 STUCK AT BLOCK 14099 OF FILE  6

 

REDO RECORD - Thread:1 RBA: 0x0045ee.00009c8b.0010 LEN: 0x00e8 VLD: 0x01 SCN scn: 0x0000.0951d868 07/02/99  00:57:19

CHANGE #2 TYP:2 CLS: 1 AFN:6 DBA:0x01803713 SCN:0x0000.09519e84 SEQ:  1   OP:10.4

buffer tsn: 5 rdba: 0x01803713 (6/14099) scn:0x0000.0951b4d4 seq:0x01 flg:0x00  tail:0xb4d40601

frmt:0x02 chkval:0x0000 type:0x06=trans  data

 

*** 1999.07.02.01.02.58.000

ksedmp: internal or fatal error

ORA-00600: internal error code,  arguments:

[3020], [25179923], [1], [17902], [40075], [16], [],  []

 

 

Breaking this up shows the following SCN information:

Redo SCN:               0x0000.0951d868
SCN expected on block:  0x0000.09519e84
SCN on Buffer:          0x0000.0951b4d4

In this case the actual SCN marked in the block in the buffer cache is _later_ than the expected SCN, but _before_ the SCN level for the redo change vector. Normally, the SCN in the CHANGE line must match exactly the one on the block (in the buffer cache), and redo application brings that block to the (later) SCN/SEQ on the redo record. One possible explanation is that the system saw a stale copy of the datafile block when the redo was generated, so that the SCN in the CHANGE line is the wrong one. That would indicate a possible lost update to the datafile.

More commonly, the ORA-600 [3020] error indicates that the SCN on the block is BEHIND the SCN on the redo we want to apply, so there is a GAP. I.e., the REDO is ahead of the block.

However, in this example there is still a problem even though the block initially appears to be AHEAD of the REDO (normally OK). Why? The SCN on the block is BELOW the most recent commit SCN. If we applied the current redo record then the SCN on the block would advance to the more recent commit SCN, so if this block is truly ahead of this redo record it must have an SCN >= the most recent commit SCN. It hasn't, so something is wrong - most likely a lost datafile write which occurred between two items of redo, causing two redo records to base their change on the same block SCN.
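The comparison described above can be sketched in a few lines. This is only an illustration of the reasoning, using the three SCN values from the example dump; the function names and the wording of the categories are assumptions made for this sketch, not Oracle terminology.

```python
def parse_scn(text):
    """Convert 'wrap.base' hex notation (e.g. 0x0000.0951d868) to an integer."""
    wrap, base = text.lower().replace("0x", "", 1).split(".")
    return (int(wrap, 16) << 32) | int(base, 16)


def classify(redo_scn, expected_scn, block_scn):
    """Describe how the SCN found on the block relates to the redo record."""
    if block_scn == expected_scn:
        return "consistent: redo can be applied"
    if block_scn < expected_scn:
        return "gap: block is behind the SCN the redo expects"
    if block_scn < redo_scn:
        return "block is past the expected SCN but below the redo SCN: suspect lost write"
    return "block appears to be at or ahead of this redo record"


redo = parse_scn("0x0000.0951d868")      # SCN of the redo record
expected = parse_scn("0x0000.09519e84")  # SCN in the CHANGE line
on_block = parse_scn("0x0000.0951b4d4")  # SCN found on the buffer

print(classify(redo, expected, on_block))
```

With the values from the dump, the block SCN falls strictly between the expected SCN and the redo SCN, which is the lost-write case discussed above.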

 

 

  Known issues caused by 3rd party  provider

 

1.  Lost IO / Corruption caused by EMC. From JET SR: 3-1260172021.  EMC bug ID:   emc230687

 

 

ID: emc230687

Domain: EMC1 HP-UX 11v1

Solution Class: 3.X Compatibility

 

ORA-600 [3020] during recovery caused by LOST IO due to EMC bug ID:   emc230687.

 

 

No errors raised within the I/O stack at the host level nor from a Timefinder perspective, API ECA debug data void of any anomalies Timefinder w/Oracle best practices process is being adhered to ( recoverable business solution process   )

This also caused some corruption errors  like:

 

ORA-00600: internal error code, arguments: [kddummy_blkchk], [29], [2121334], [6108] kdbchk: xaction header lock count  mismatch

 

No errors raised with the Symm as well. Corruption issue resolved by applying fix 44177, see the following Primus article for more i ETA emc204393

2.  Lost IO by EMC.  EMC solution # is  emc251398

 

Fixed by the latest microcode version 5773.163.113 applied on the Symmetrix DMX (no changes on V-MAX cabins). EMC solution # is emc2

 

 

 


Urgent Help needed with ASM Header Corruption - Q: When is an ASM disk header read and updated?

 

 

 

 

 

 

The 3rd instance of one of the databases (11.2.0.3 with ASM + External Redundancy) crashed with the errors reported below. It seems the customer added some disks to ASM and, midway through the rebalance, ASM picked up some underlying corruption, subsequently dismounting the ASM diskgroup and hence crashing the database.

ASM Alert entries

>>> Customer added some Disks here >>>

Thu Oct 25 14:16:09 2012
NOTE: disk validation pending for group 19/0x6bc90d3b (DBTCSTRNPA)
SUCCESS: validated disks for 19/0x6bc90d3b (DBTCSTRNPA)
NOTE: disk validation pending for group 19/0x6bc90d3b (DBTCSTRNPA)
NOTE: Assigning number (19,20) to disk (ORCL:DBTCSTRNPA21)
NOTE: Assigning number (19,21) to disk (ORCL:DBTCSTRNPA22)
NOTE: Assigning number (19,22) to disk (ORCL:DBTCSTRNPA23)

>> Rebalance started 14:29 PM as a result >>>

Thu Oct 25 14:29:02 2012
NOTE: Attempting voting file refresh on diskgroup DATCSTRNPA
NOTE: ASM did background COD recovery for group 10/0x6b190d32 (DATCSTRNPA)
NOTE: starting rebalance of group 10/0x6b190d32 (DATCSTRNPA) at power 1
Starting background process ARB0
Thu Oct 25 14:29:02 2012
ARB0 started with pid=45, OS id=11888 
NOTE: assigning ARB0 to group 10/0x6b190d32 (DATCSTRNPA) with 1 parallel I/O

>>> ASM Header corruption notes Midway during Rebalance at 15:40 >>

Thu Oct 25 15:40:24 2012
WARNING: cache read  a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 disk=72 (DATCSTRNPA84) incarn=3916037734 au=0 blk=48 count=1
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
NOTE: a corrupted block from group DATCSTRNPA was dumped to /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc
WARNING: cache read (retry) a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 disk=72 (DATCSTRNPA84) incarn=3916037734 au=0 blk=48 count=1
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ERROR: cache failed to read group=10(DATCSTRNPA) dsk=72 blk=48 from disk(s): 72(DATCSTRNPA84)
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
NOTE: cache initiating offline of disk 72 group DATCSTRNPA
NOTE: process _arb0_+asm3 (11888) initiating offline of disk 72.3916037734 (DATCSTRNPA84) with mask 0x7e in group 10
WARNING: Disk 72 (DATCSTRNPA84) in group 10 in mode 0x7f is now being taken offline on ASM inst 3
NOTE: initiating PST update: grp = 10, dsk = 72/0xe969fe66, mask = 0x6a, op = clear
Thu Oct 25 15:40:25 2012
GMON updating disk modes for group 10 at 115 for pid 45, osid 11888
ERROR: Disk 72 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 10)
Thu Oct 25 15:40:25 2012
NOTE: cache dismounting (not clean) group 10/0x6B190D32 (DATCSTRNPA) 
WARNING: Offline of disk 72 (DATCSTRNPA84) in group 10 and mode 0x7f failed on ASM inst 3
Thu Oct 25 15:40:25 2012
NOTE: halting all I/Os to diskgroup 10 (DATCSTRNPA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 4739, image: oracle@itcccl180.it.express.tnt (B000)
Thu Oct 25 15:40:25 2012
NOTE: LGWR doing non-clean dismount of group 10 (DATCSTRNPA)
NOTE: LGWR sync ABA=231.134 last written ABA 231.134

>> Diskgroup Dismounted as a Result of this >>>

NOTE: cache dismounted group 10/0x6B190D32 (DATCSTRNPA) 
SQL> alter diskgroup DATCSTRNPA dismount force /* ASM SERVER */ 
System State dumped to trace file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc
Thu Oct 25 15:40:27 2012
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
Thu Oct 25 15:40:39 2012

Thu Oct 25 15:40:39 2012
NOTE: AMDU dump of disk group DATCSTRNPA created at /oracle/diag/asm/+asm/+ASM3/trace
NOTE: cache deleting context for group DATCSTRNPA 10/0x6b190d32
ERROR: ORA-15130 thrown in ARB0 for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_arb0_11888.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "" is being dismounted
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [4] [2] [27016521 != 27015521]
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [339] [2147483706[4232823222 != 261167758]
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [2147483649] [81] [2397242929 != 2383392830]
ORA-15130: diskgroup "DATCSTRNPA" is being dismounted
ORA-15066: offlining disk "DATCSTRNPA84" in group "DATCSTRNPA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
Thu Oct 25 15:40:39 2012

NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 in COD recovery for diskgroup 10/0x6b190d32 (DATCSTRNPA)
ERROR: ORA-15130 thrown in RBAL for group number 10
Errors in file /oracle/diag/asm/+asm/+ASM3/trace/+ASM3_rbal_16047.trc:
ORA-15130: diskgroup "" is being dismounted
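When an ASM alert log is long, it can help to pull out just the corrupt-block warnings and see which disk and block they point at. The sketch below is a hypothetical helper, assuming the warning lines have the same shape as the ones in the excerpt above; the regular expression is derived from those lines only and may need adjusting for other versions.

```python
import re

# Pattern based on the "cache read ... a corrupt block" lines shown above.
CORRUPT_RE = re.compile(
    r"cache read\s+(?:\(retry\)\s+)?a corrupt block: "
    r"group=(?P<group>\d+)\((?P<name>\w+)\) dsk=(?P<dsk>\d+) blk=(?P<blk>\d+)"
)


def corrupt_blocks(lines):
    """Return distinct (diskgroup name, disk number, block number) triples."""
    seen = []
    for line in lines:
        m = CORRUPT_RE.search(line)
        if m:
            key = (m["name"], int(m["dsk"]), int(m["blk"]))
            if key not in seen:
                seen.append(key)
    return seen


sample = [
    "WARNING: cache read  a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 "
    "disk=72 (DATCSTRNPA84) incarn=3916037734 au=0 blk=48 count=1",
    "WARNING: cache read (retry) a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 "
    "disk=72 (DATCSTRNPA84) incarn=3916037734 au=0 blk=48 count=1",
    "NOTE: cache initiating offline of disk 72 group DATCSTRNPA",
]
print(corrupt_blocks(sample))
```

In this excerpt every warning refers to the same location: disk 72, block 48 of diskgroup DATCSTRNPA.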

DB Alert log has these entries 

Thu Oct 25 02:22:42 2012
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
...
..
Thu Oct 25 02:52:58 2012
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM

>> CAN Be ignored as documented under "WARNING: ASM Communication Error: Op 0 State 0x0 (15055) (Doc ID 1469167.1)"

...
...

>>> ASM Disks added are reported here >>>

Thu Oct 25 14:16:18 2012
SUCCESS: disk DBTCSTRNPA21 (20.3916037823) added to diskgroup DBTCSTRNPA
SUCCESS: disk DBTCSTRNPA22 (21.3916037824) added to diskgroup DBTCSTRNPA
SUCCESS: disk DBTCSTRNPA23 (22.3916037825) added to diskgroup DBTCSTRNPA
Thu Oct 25 14:22:40 2012

>>> DB Crashes as the ASM Diskgroup was dismounted due to Corruptions >>>

Thu Oct 25 15:40:39 2012
Errors in file /oracle/diag/rdbms/cstrnpa/CSTRNPA3/trace/CSTRNPA3_lgwr_17853.trc:
ORA-00345: redo log write error block 35172 count 1
ORA-00312: online log 17 thread 3: '+DATCSTRNPA/cstrnpa/onlinelog/group_17.399.776096445'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/diag/rdbms/cstrnpa/CSTRNPA3/trace/CSTRNPA3_lgwr_17853.trc:
ORA-00346: log member marked as STALE and closed
ORA-00312: online log 17 thread 3: '+DATCSTRNPA/cstrnpa/onlinelog/group_17.399.776096445'
Thu Oct 25 15:40:48 2012
KCF: read, write or open error, block=0x9b online=1
        file=123 '+DATCSTRNPA/cstrnpa/datafile/undotbs3.387.767891645'
        error=15078 txt: ''
Errors in file /oracle/diag/rdbms/cstrnpa/CSTRNPA3/trace/CSTRNPA3_dbw0_17837.trc:
Errors in file /oracle/diag/rdbms/cstrnpa/CSTRNPA3/trace/CSTRNPA3_dbw0_17837.trc:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 123 (block # 155)
ORA-01110: data file 123: '+DATCSTRNPA/cstrnpa/datafile/undotbs3.387.767891645'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
DBW0 (ospid: 17837): terminating the instance due to error 63999

The hardware vendor, HP, has tried to shift these issues onto Oracle and has asked us to explain exactly when and how an ASM disk header is read and updated, so please can anyone help answer the questions below =>
They believe the ASM rebalance caused these corruptions, but we don't think that was the reason.

1. When ASM rebalances the disks, does it read the block header first and then write the block? Can an ASM rebalance cause block corruptions under any circumstances, or is this not possible within the ASM internal mechanism?

2. When is the ASM header read ?

3. What causes ASM metadata to be updated ? Is this updated when the disk is added immediately, or when the rebalancing occurs?

4. How is locking done on the ASM header between the RAC nodes and how is a lock released on an Oracle instance failure?

5. Why did the database carry on when the header corruption error was first reported in the alert log? This has been partially answered by the fact that the error is only detected when the rebalance runs.

6. How can we determine when the last successful ASM header read occurred before the corruption?
  

Any help would be more than appreciated...

 

 

Answer:

 

ARB0 relocating file +DATCSTRNPA.256.666381297 (8 entries)



*** 2012-10-25 17:05:06.757

ARB0 relocating file +DATCSTRNPA.258.666381295 (76 entries)



*** 2012-10-25 17:07:14.274

WARNING: cache read  a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 disk=72 (DATCSTRNPA84) incarn=3916037804 au=0 blk=48 count=1



*** 2012-10-25 17:07:14.274

dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=0, mask=0x0)

----- Error Stack Dump -----
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
Hex dump of disk block image:

Dump of memory from 0x00000000694FA000 to 0x00000000694FB000

0694FA000 00000000 00000000 00000000 00000000  [................]

        Repeat 63 times

0694FA400 003C0000 00780000 00060000 007571ED  [..<...x......qu.]

0694FA410 003BFFF5 00000000 00000002 00000002  [..;.............]

0694FA420 00008000 00008000 00004000 5051A885  [.........@....QP]

0694FA430 50528195 001D0005 0003EF53 00000001  [..RP....S.......]

0694FA440 5048BD84 00ED4E00 00000000 00000001  [..HP.N..........]

0694FA450 00000000 0000000B 00000080 00000034  [............4...]

0694FA460 00000006 00000003 DB40F439 4643267C  [........9.@.|&CF]

0694FA470 5AB9A4A6 6703FEB5 00000000 00000000  [...Z...g........]

0694FA480 00000000 00000000 00000000 00000000  [................]

        Repeat 3 times

0694FA4C0 00000000 00000000 00000000 03FE0000  [................]

0694FA4D0 00000000 00000000 00000000 00000000  [................]

0694FA4E0 00000008 00000000 00000000 6D4FD3DA  [..............Om]

0694FA4F0 174CB0E0 9DA83EA6 62C7706F 00000102  [..L..>..op.b....]

0694FA500 00000000 00000000 5048BD84 00000609  [..........HP....]

0694FA510 0000060A 0000060B 0000060C 0000060D  [................]

0694FA520 0000060E 0000060F 00000610 00000611  [................]

0694FA530 00000612 00000613 00000614 00000615  [................]

0694FA540 00000A16 00000000 00000000 08000000  [................]

0694FA550 00000000 00000000 00000000 00000000  [................]

  Repeat 170 times

OSM metadata block dump:

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

kfbtTraverseBlock:  Invalid OSM block type 0

WARNING: cache read (retry) a corrupt block: group=10(DATCSTRNPA) dsk=72 blk=48 disk=72 (DATCSTRNPA84) incarn=3916037804 au=0 blk=48 count=1



*** 2012-10-25 17:07:14.277

dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=0, mask=0x0)

----- Error Stack Dump -----
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483720] [48] [0 != 1]
ERROR: cache failed to read group=10(DATCSTRNPA) dsk=72 blk=48 from disk(s): 72(DATCSTRNPA84)

CE: (0x0x693e9018) group=10 (DATCSTRNPA) dsk=72 blk=48

    hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1

    mirror=0

    flags_kfcpba=0x49 copies=1 blockIndex=48 AUindex=0 AUcount=1 loctr fcn=0.0

    copy #0:  disk=72  au=0 flags=01

BH: (0x0x69791290) bnum=2049 type=reading state=reading chgSt=not modifying pageIn=current

    flags=0x00000000 pinmode=excl lockmode=excl bf=0x694fa000

    kfbh_kfcbh.fcn_kfbh = 0.0 lowAba=0.0 highAba=0.0

    last kfcbInitSlot return code=null chgCount=815 cpkt lnk is null ralFlags=0x00000000

    PINS:

    (kfcbps) pin=25743 get by kfd.c line 23273 mode=excl

             dsk=72 blk=48 status=pinned

             flags=0x80000000 flags2=0x00000000

             class=1400 type=ALLOCTBL stateWanted=current

             bastCount=1 waitStatus=0x00000000 relocCount=0

             scanBastCount=2 scanBxid=64781 scanSkipCode=2

             last released by kfc.c 18264

 LE: (0x724e36b0) le=2567 group=10 dsk=72 blk=48

    open=T kjStat=0 mode=EX closing=0 lop=(nil)

    flags=00000000 astFlags=00000000 rlsFlags=00000000

    rcvFlags=00000000 id=0x2a000048.30 bucket=1791

    lastScanWaiterMode=0 fcn=0.0

    

 File_name :: +ASM2_arb0_23441.trc

 

 NOTE: cache opening disk 71 of grp 10: DATCSTRNPA83 label:DATCSTRNPA83

 NOTE: cache opening disk 72 of grp 10: DATCSTRNPA84 label:DATCSTRNPA84

 NOTE: cache opening disk 73 of grp 10: DATCSTRNPA85 label:DATCSTRNPA85







00003e50  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

*

00004000  01 82 03 01 04 00 00 00  48 00 00 80 aa 4d 6f 80  |........H....Mo.|

00004010  01 b7 8c 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

00004020  80 03 00 00 c0 01 00 00  08 00 08 00 00 00 c0 01  |................|











]$ kfed read mpath16p1dump blknum=47|more

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                      47 ; 0x004: blk=47

kfbh.block.obj:              2147483720 ; 0x008: disk=72

kfbh.check:                  2309435824 ; 0x00c: 0x89a731b0

kfbh.fcn.base:                  9230112 ; 0x010: 0x008cd720

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000





$ kfed read mpath16p1dump blknum=48|more

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

B7F14200 00000000 00000000 00000000 00000000  [................]

        Repeat 63 times

B7F14600 003C0000 00780000 00060000 007571ED  [..<...x......qu.]

B7F14610 003BFFF5 00000000 00000002 00000002  [..;.............]

B7F14620 00008000 00008000 00004000 5051A885  [.........@....QP]

B7F14630 50528195 001D0005 0003EF53 00000001  [..RP....S.......]

B7F14640 5048BD84 00ED4E00 00000000 00000001  [..HP.N..........]

B7F14650 00000000 0000000B 00000080 00000034  [............4...]

B7F14660 00000006 00000003 DB40F439 4643267C  [........9.@.|&CF]

B7F14670 5AB9A4A6 6703FEB5 00000000 00000000  [...Z...g........]

B7F14680 00000000 00000000 00000000 00000000  [................]

        Repeat 3 times

B7F146C0 00000000 00000000 00000000 03FE0000  [................]

B7F146D0 00000000 00000000 00000000 00000000  [................]

B7F146E0 00000008 00000000 00000000 6D4FD3DA  [..............Om]

B7F146F0 174CB0E0 9DA83EA6 62C7706F 00000102  [..L..>..op.b....]

B7F14700 00000000 00000000 5048BD84 00000609  [..........HP....]

B7F14710 0000060A 0000060B 0000060C 0000060D  [................]

B7F14720 0000060E 0000060F 00000610 00000611  [................]

B7F14730 00000612 00000613 00000614 00000615  [................]

B7F14740 00000A16 00000000 00000000 08000000  [................]

B7F14750 00000000 00000000 00000000 00000000  [................]

  Repeat 170 times

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]







$ kfed read mpath16p1dump blknum=49|more

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                      49 ; 0x004: blk=49

kfbh.block.obj:              2147483720 ; 0x008: disk=72

kfbh.check:                  2158685900 ; 0x00c: 0x80aaeecc

kfbh.fcn.base:                  4799766 ; 0x010: 0x00493d16

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000





$ kfed read mpath16p1dump blknum=50|more

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                      50 ; 0x004: blk=50

kfbh.block.obj:              2147483720 ; 0x008: disk=72

kfbh.check:                  2158654602 ; 0x00c: 0x80aa748a

kfbh.fcn.base:                  4820845 ; 0x010: 0x00498f6d

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

kfdatb10.aunum:                   21504 ; 0x000: 0x00005400

kfdatb10.shrink:                    448 ; 0x004: 0x01c0

1. When ASM rebalances the disks, does it read the block header first and then write the block?
Can an ASM rebalance cause block corruption under any circumstances,
or is this not possible within the ASM internal mechanism?

===>> When a rebalance takes place, ASM performs a block-by-block checksum. In your case the checksum failed for block 48 because ASM did not find an ASM-formatted block there.

      On 11.2.0.3, no such bug has been reported to Oracle so far.

      

 Interestingly, I see from the dd dump that only block 48 is unformatted whereas earlier and later blocks were formatted properly for ASM allocation table metadata.

 

 And when I look at block 48:

 

 

$ kfed read mpath16p1dump blknum=48|more

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

B7F14200 00000000 00000000 00000000 00000000  [................]

        Repeat 63 times

B7F14600 003C0000 00780000 00060000 007571ED  [..<...x......qu.]

B7F14610 003BFFF5 00000000 00000002 00000002  [..;.............]

B7F14620 00008000 00008000 00004000 5051A885  [.........@....QP]

B7F14630 50528195 001D0005 0003EF53 00000001  [..RP....S.......]

B7F14640 5048BD84 00ED4E00 00000000 00000001  [..HP.N..........]

B7F14650 00000000 0000000B 00000080 00000034  [............4...]

B7F14660 00000006 00000003 DB40F439 4643267C  [........9.@.|&CF]

B7F14670 5AB9A4A6 6703FEB5 00000000 00000000  [...Z...g........]

B7F14680 00000000 00000000 00000000 00000000  [................]

        Repeat 3 times

B7F146C0 00000000 00000000 00000000 03FE0000  [................]

B7F146D0 00000000 00000000 00000000 00000000  [................]

B7F146E0 00000008 00000000 00000000 6D4FD3DA  [..............Om]

B7F146F0 174CB0E0 9DA83EA6 62C7706F 00000102  [..L..>..op.b....]

B7F14700 00000000 00000000 5048BD84 00000609  [..........HP....]

B7F14710 0000060A 0000060B 0000060C 0000060D  [................]

B7F14720 0000060E 0000060F 00000610 00000611  [................]

B7F14730 00000612 00000613 00000614 00000615  [................]

B7F14740 00000A16 00000000 00000000 08000000  [................]

B7F14750 00000000 00000000 00000000 00000000  [................]

  Repeat 170 times

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]



It seems that some external data has been written over that block.

So, could you please check a few things:

1. Whether any application running at the OS level could write such a string.

2. Whether any application-level process could write such a string.

Remember that only one 4K block was affected here.
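
To look for other blocks like this one in a dd dump, the 32-byte block header can be decoded directly. The following is a minimal Python sketch, not an Oracle tool. It assumes a little-endian layout, the field offsets printed by kfed above (0x000 endian, 0x002 type, 0x004 blk, 0x008 obj and so on) and a 4096-byte metadata block size. The function names are illustrative.

```python
import struct

# Layout follows the kfed output above: four 1-byte fields, then seven 4-byte fields.
KFBH = struct.Struct("<BBBBIIIIIII")  # 32 bytes, little-endian (assumed)
FIELDS = ("endian", "hard", "type", "datfmt", "blk", "obj",
          "check", "fcn_base", "fcn_wrap", "spare1", "spare2")


def decode_kfbh(block):
    """Decode the first 32 bytes of a metadata block into a dict."""
    return dict(zip(FIELDS, KFBH.unpack_from(block, 0)))


def unformatted_blocks(data, block_size=4096):
    """Return the numbers of the blocks whose header type is 0 (KFBTYP_INVALID)."""
    bad = []
    for blknum in range(len(data) // block_size):
        header = decode_kfbh(data[blknum * block_size:(blknum + 1) * block_size])
        if header["type"] == 0:
            bad.append(blknum)
    return bad


# Header bytes copied from the hexdump lines at offsets 00004000 and 00004010 above.
sample = bytes.fromhex(
    "01 82 03 01 04 00 00 00 48 00 00 80 aa 4d 6f 80"
    "01 b7 8c 00 00 00 00 00 00 00 00 00 00 00 00 00"
)
print(decode_kfbh(sample))
```

A type of 0 only shows that the header is not ASM-formatted; as with block 48, the rest of the block may still contain foreign data.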

2. When is the ASM header read?

====>> This is not the ASM disk header; it is internal ASM metadata.

This kind of read generally happens in the following situations:

a. When the diskgroup is mounted and diskgroup-level recovery is performed.

b. When you add disks and a rebalance takes place.

        
3. What causes ASM metadata to be updated? Is it updated immediately when the disk is added, or when the rebalance occurs?

====>>> This kind of ASM metadata is updated when space is allocated or deallocated at the database level.
        
4. How is locking done on the ASM header between the RAC nodes, and how is a lock released on an Oracle instance failure?

====>>> ASM keeps track of the changes made by each thread at the diskgroup level and performs the required recovery on the next mount.

       
5. Why did the database carry on when the header corruption error was first reported in the alert log?
This has been partially answered by the fact that the error is only detected when the rebalance runs.

===>>> Unless you read or write data that is addressed through that allocation table block, you will not see this issue.

       When a rebalance takes place, however, it touches all the blocks in order to redistribute the existing

       allocation units evenly across the disks.

       Hence the problem surfaced this time.

       So this corruption took place between the last DML operation on that block and the start of the ASM rebalance.
       
 
6. How can we determine when the last successful ASM header read took place before the corruption?

===>>> The ASM diskgroup is being mounted, so all ASM disk headers are fine.

       The ASM disk header is different from the allocation table metadata.

       Even at that time, all ASM disk headers were read.

Oracle ASM unable to find ASM disk header in some disks


 
 
 ASM is unable to find the ASM disk header on some disks. The header area (at least 
  5 MB) is zero-filled on each of these disks (mpath218p1, mpath217). 
 
  I checked whether the disk start and end had been relocated, so that the headers could be found 
  at a different offset on the same disk, but that does not seem to be the case. The data has not been relocated; it is corrupted. 
 
  This is not an issue at the ASM/RDBMS level. 
 
 FURTHER PLANS: 
 ================================================================ 
 -ASM/RDBMS does not zero-fill disks, so this is not an issue at the ASM/RDBMS level. 
  Given that more than 5 MB is corrupted, the cause must be external to Oracle. 
 
 
 -Please ask the customer to check the following: 
  -Any manual error, such as someone inadvertently running 'dd if=/dev/zero' against these 
   disks. 
  -Any tools, scripts, or third-party tools that might perform such writes. If 
   no such application or tool is known, please suggest that the customer run a 
   monitoring tool that shows which processes write to the disks (something like the fuser command 
   should help to list the PIDs that have the disks open). 
  -Given that the corruption occurred on the same storage (ETL420), it might be 
   an issue at the storage level. Please involve storage support to see whether something 
   is wrong at the storage level.   
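
To measure how far the zero-filled region extends, a read-only scan of the device (or of a dd copy of it) is enough. The sketch below is a minimal Python illustration, not an Oracle utility. The function name and the chunk size are assumptions chosen for the example.

```python
def zero_prefix_bytes(path, chunk_size=1024 * 1024):
    """Return how many bytes at the start of the file are zero (read-only scan)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file: everything read so far was zero.
                return total
            stripped = chunk.lstrip(b"\x00")
            if stripped:
                # First non-zero byte is inside this chunk.
                return total + len(chunk) - len(stripped)
            total += len(chunk)
```

Running it against a copy of the first few tens of megabytes of each affected disk would show whether the same amount was wiped on every disk, which can help when discussing the cause with storage support.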

ASMdisk Status - Candidate disk after reboot [ recover ASM header files ]


 

The customer has migrated Oracle databases from an old SAN to a new SAN using an ASM rebalance operation. The customer is using external redundancy. After completing the rebalance, the customer rebooted all servers and removed the old SAN device entries over the weekend. The customer is now unable to bring the databases online on one of the 4 servers and gets the following errors:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

 

 

I have attached the kfed output for your reference. Is it possible to recover the ASM disk headers so that the customer does not need to back up and restore the 5 TB database?

 

Total System Global Area  284008448 bytes
Fixed Size                  2158616 bytes
Variable Size             256684008 bytes
ASM Cache                  25165824 bytes
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
 

SQL> select group_number,disk_number,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,NAME,PATH from V$asm_disk;
 
GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    NAME                           PATH
------------ ----------- ------- ------------ ------- -------- ------------------------------ ----------------------------------------
           0           0 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d26s6
           0          23 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d22s6
           0           2 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM01
           0           3 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM12
           0           4 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d27s6
           0           5 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d25s6
           0           6 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM05
           0           7 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM02
           0           8 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM08
           0           9 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d24s6
           0          10 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM11
           0          11 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d30s6
           0          12 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM07
           0          13 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d31s6
           0          14 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d21s6
           0          15 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM10
           0          16 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d23s6
           0          17 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM09
           0          18 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM03
           0          19 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d29s6
           0          20 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/c0d28s6
           0          21 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM04
           0          22 IGNORED MEMBER       ONLINE  NORMAL                                  /dev/rdsk/c0d32s6
           0           1 CLOSED  CANDIDATE    ONLINE  NORMAL                                  /dev/rdsk/san03dp_dbs05dp_ASM06





/dev/rdsk/san03dp_dbs05dp_ASM03
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM04
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM05
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM06
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM07
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

/dev/rdsk/san03dp_dbs05dp_ASM08
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
10037FE00 00000000 00000000 00000000 00000000 [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

 

 

The ASM disks ASM03/04/05/06/07/08 showing the status "CANDIDATE" is somewhat worrying. But if these devices were not part of the DATA diskgroup before, they are not the main cause of ORA-15063.
Please check the ASM alert.log to see whether these 6 devices belonged to DATA.

I'm more concerned about the following devices, as they show the status "IGNORED", which indicates that other devices matching the asm_diskstring parameter present the same disk information.
- ASM01/02/09/10/11

Chances are that the devices below show the same disk information as ASM01/02/09/10/11, and there is a good chance that these different paths point to the same physical devices.
~~
/dev/rdsk/c0d22s6
/dev/rdsk/c0d30s6
/dev/rdsk/c0d31s6
/dev/rdsk/c0d21s6
/dev/rdsk/c0d29s6
/dev/rdsk/c0d32s6
~~

Please check all disk headers to find which disks show duplicate ASM disk information, using the Perl script from the note below.
If duplicate paths point to the same physical device, the additional device path should be disabled using "chmod 000 <device_path>".
- KFED.PL for diagnosing - ORA-15063 ORA-15042 ORA-15020 (Doc ID 1346190.1)
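
The idea behind the duplicate check can be sketched as follows. This is not the KFED.PL script from the note. It is a minimal Python illustration which assumes that the diskgroup name and disk number have already been read from each device header (for example from the kfdhdb.grpname and kfdhdb.dsknum lines of kfed output). The paths and values in the sample are hypothetical.

```python
from collections import defaultdict


def duplicate_paths(headers):
    """Group device paths that report the same (group name, disk number)."""
    by_identity = defaultdict(list)
    for path, identity in headers.items():
        by_identity[identity].append(path)
    return {ident: sorted(paths)
            for ident, paths in by_identity.items() if len(paths) > 1}


# Hypothetical header values; real ones would come from reading each device.
headers = {
    "/dev/rdsk/pathA": ("DATA", 2),
    "/dev/rdsk/pathB": ("DATA", 2),
    "/dev/rdsk/pathC": ("DATA", 7),
}
for ident, paths in duplicate_paths(headers).items():
    print(ident, paths)
```

Two paths reporting the same identity are only candidates; whether they really reach the same physical device still has to be confirmed at the OS or storage level before any path is disabled.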

Need urgent help on ASM issue – disk header status problem


 

 

 
ODA system
+ To work around a known issue (startup hang when using Hitachi disks), the FE/customer was in the process of replacing the Hitachi drives on the system.
+ They pulled 2 disks out simultaneously and put new disks in.
+ The diskgroups (DATA and RECO) dismounted, as the diskgroups were built with NORMAL redundancy.
+ The clusterware went down and, realizing the problem, the customer reinstated the original disks.

Current issue:
The diskgroups are not mounting.
ASM disks from slot 0 are not being seen by ASM.
ASM disks from slot 1 are being seen, but are reported as new disks (header status = CANDIDATE).

Mounting the diskgroup with the FORCE option has not helped either (because 1 disk from slot 0 is missing and 1 disk from slot 1 is reported as a candidate).
** The customer has no backup and needs to find out whether this is fixable or whether the system needs to be rebuilt from scratch.

 

--------------------------------------------------------------------------------
 Disk          Size Header    Path                                     Disk Group   User     Group   
================================================================================
   1:     491520 Mb CANDIDATE /dev/mapper/HDD_E0_S01_717882548p1       #            grid     asmadmin
   2:      75080 Mb CANDIDATE /dev/mapper/HDD_E0_S01_717882548p2       #            grid     asmadmin
   3:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S04_717894368p1       DATA         grid     asmadmin
   4:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S04_717894368p2       RECO         grid     asmadmin
   5:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S05_717844560p1       DATA         grid     asmadmin
   6:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S05_717844560p2       RECO         grid     asmadmin
   7:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S08_717882264p1       DATA         grid     asmadmin
   8:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S08_717882264p2       RECO         grid     asmadmin
   9:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S09_717844480p1       DATA         grid     asmadmin
  10:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S09_717844480p2       RECO         grid     asmadmin
  11:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S12_717844976p1       DATA         grid     asmadmin
  12:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S12_717844976p2       RECO         grid     asmadmin
  13:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S13_717845048p1       DATA         grid     asmadmin
  14:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S13_717845048p2       RECO         grid     asmadmin
  15:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S16_717895116p1       DATA         grid     asmadmin
  16:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S16_717895116p2       RECO         grid     asmadmin
  17:     491520 Mb MEMBER    /dev/mapper/HDD_E0_S17_717888848p1       DATA         grid     asmadmin
  18:      75080 Mb MEMBER    /dev/mapper/HDD_E0_S17_717888848p2       RECO         grid     asmadmin
  19:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S02_717825396p1       DATA         grid     asmadmin
  20:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S02_717825396p2       RECO         grid     asmadmin
  21:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S03_717894252p1       DATA         grid     asmadmin
  22:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S03_717894252p2       RECO         grid     asmadmin
  23:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S06_717886840p1       DATA         grid     asmadmin
  24:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S06_717886840p2       RECO         grid     asmadmin
  25:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S07_717888592p1       DATA         grid     asmadmin
  26:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S07_717888592p2       RECO         grid     asmadmin
  27:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S10_717843708p1       DATA         grid     asmadmin
  28:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S10_717843708p2       RECO         grid     asmadmin
  29:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S11_717852256p1       DATA         grid     asmadmin
  30:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S11_717852256p2       RECO         grid     asmadmin
  31:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S14_717895376p1       DATA         grid     asmadmin
  32:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S14_717895376p2       RECO         grid     asmadmin
  33:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S15_717843800p1       DATA         grid     asmadmin
  34:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S15_717843800p2       RECO         grid     asmadmin
  35:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S18_717882696p1       DATA         grid     asmadmin
  36:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S18_717882696p2       RECO         grid     asmadmin
  37:     491520 Mb MEMBER    /dev/mapper/HDD_E1_S19_717849420p1       DATA         grid     asmadmin
  38:      75080 Mb MEMBER    /dev/mapper/HDD_E1_S19_717849420p2       RECO         grid     asmadmin
  39:      70005 Mb MEMBER    /dev/mapper/SSD_E0_S20_805725574p1       REDO         grid     asmadmin
  40:      70005 Mb MEMBER    /dev/mapper/SSD_E0_S21_805708282p1       REDO         grid     asmadmin
  41:      70005 Mb MEMBER    /dev/mapper/SSD_E1_S22_805706766p1       REDO         grid     asmadmin
  42:      70005 Mb MEMBER    /dev/mapper/SSD_E1_S23_805706623p1       REDO         grid     asmadmin
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME                                                          
================================================================================
     +ASM1 /u01/app/11.2.0.3/grid                                               
     +ASM2 /u01/app/11.2.0.3/grid                                               

 
What is the status of the backup block?

kfed read <device_name> aunum=1 blknum=254

Does it show a proper header? If so, then run the kfed repair command.

If the other blocks are fine and only the header is damaged, this will work. Otherwise, the next mount will crash while doing COD recovery.
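
For cross-checking the same block with dd or a hex viewer, the (aunum, blknum) address can be turned into a byte offset. This is a minimal Python sketch which assumes a 1 MiB allocation unit and a 4 KiB metadata block size. Both are assumptions here and should be confirmed for the actual diskgroup. The function name is illustrative.

```python
def block_offset(aunum, blknum, au_size=1024 * 1024, block_size=4096):
    """Byte offset of the metadata block addressed as (aunum, blknum)."""
    blocks_per_au = au_size // block_size
    if not 0 <= blknum < blocks_per_au:
        raise ValueError("blknum is outside the allocation unit")
    return aunum * au_size + blknum * block_size


# Under the assumed sizes an AU holds 256 blocks, so block 254 is the
# second-to-last block of allocation unit 1.
print(block_offset(1, 254))  # 2088960
```

With a different allocation unit size the block number of the backup copy would differ, so the kfed command above should be adjusted rather than reused as is.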

 

ASM log file info
==============

NOTE: cache closing disk 0 of grp 1: (not open) _DROPPED_0000_DATA
ERROR: Disk 1 cannot be offlined, since all the disks [1, 0] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)

Here disks 0 and 1 have been put back in their respective slots, but the issue remains.

 

Seems similar to the one described in ORA-15042: ASM disk is missing after add disk took place (Doc ID 1529397.1)

 

[Urgent] ORA-15042: ASM disk “76” is missing


 

A customer has an ASM problem with ORA-15042.
O/S: Linux x86 64-bit 2.6.18-194.el5
DB Version : 10.2.0.5

Although we can access the ASM header using kfed and dd, the ASM instance cannot read these devices.
For example, the ASM instance can read the 75th disk, but cannot read the 76th disk.

Have you seen this before?

# Environment
LGEDGDMS01:/engn001/orasvc01/product/10.2.0] uname -a
Linux LGEDGDMS01 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

# Error
SQL> startup
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  2094544 bytes
Variable Size             102763056 bytes
ASM Cache                  25165824 bytes
ORA-15042: ASM disk "23" is missing
ORA-15042: ASM disk "22" is missing
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "88" is missing

ORA-15042: ASM disk "77" is missing
ORA-15042: ASM disk "76" is missing   ==> 76 th device
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "88" is missing
ORA-15042: ASM disk "87" is missing

ORA-15042: ASM disk "81" is missing

SQL> show parameter asm_diskstring
NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
asm_diskstring                       string                    /dev/mapper/mpath_asm*

# v$asm_disk results.
select name, group_number,disk_number, path, state, header_status from v$asm_disk order by disk_number
/
NAME       GROUP_NUMBER DISK_NUMBER PATH                                     STATE                    HEADER_STATUS
---------- ------------ ----------- ---------------------------------------- ------------------------ ------------------------------------
0          66 /dev/mapper/mpath_asm129p1               NORMAL                   MEMBER
0          67 /dev/mapper/mpath_asm130p1               NORMAL                   MEMBER
0          68 /dev/mapper/mpath_asm131p1               NORMAL                   MEMBER
0          69 /dev/mapper/mpath_asm132p1               NORMAL                   MEMBER
0          70 /dev/mapper/mpath_asm133p1               NORMAL                   MEMBER
0          71 /dev/mapper/mpath_asm134p1               NORMAL                   MEMBER
0          72 /dev/mapper/mpath_asm135p1               NORMAL                   MEMBER
0          73 /dev/mapper/mpath_asm136p1               NORMAL                   MEMBER
0          74 /dev/mapper/mpath_asm137p1               NORMAL                   MEMBER
0          75 /dev/mapper/mpath_asm138p1               NORMAL                   MEMBER
0          89 /dev/mapper/mpath_asm063p1               NORMAL                   MEMBER   ==> Cannot see the 76 th device
0          90 /dev/mapper/mpath_asm064p1               NORMAL                   MEMBER
0          91 /dev/mapper/mpath_asm065p1               NORMAL                   MEMBER
0          92 /dev/mapper/mpath_asm066p1               NORMAL                   MEMBER
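
The gap in the DISK_NUMBER column can be listed mechanically. The following is a minimal Python sketch. It only reports numbers missing between the lowest and highest disk number it is given, so on an excerpt like the one above it says nothing about disks outside that range. The function name is illustrative.

```python
def missing_disk_numbers(seen):
    """Return the disk numbers absent between the lowest and highest number seen."""
    present = set(seen)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# DISK_NUMBER values from the excerpt above: 66 to 75 and 89 to 92.
listed = list(range(66, 76)) + list(range(89, 93))
print(missing_disk_numbers(listed))
```

The result can then be compared with the disk numbers named in the ORA-15042 messages.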
# Permission – OK
* 75th asm file (Good Device)
LGEDGDMS01:/engn001/orasvc01/product/10.2.0] ls -al /dev/mapper/mpath_asm138p1
brw-rw---- 1 orasvc01 dba 253, 248 Jan 30 17:06 /dev/mapper/mpath_asm138p1

* 76th asm file (Cannot read this device)
LGEDGDMS01:/engn001/orasvc01/product/10.2.0] ls -al /dev/mapper/mpath_asm175
brw-rw---- 1 orasvc01 dba 253, 197 Jan 30 17:06 /dev/mapper/mpath_asm175

# kfed result – OK
* 75th asm file (Good Device)
+ /engn001/orasvc01/product/10.2.0/bin/kfed read /dev/mapper/mpath_asm138p1    
kfbh.endian:                          1 ; 0x000: 0x01                          
kfbh.hard:                          130 ; 0x001: 0x82                          
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD               
kfbh.datfmt:                          1 ; 0x003: 0x01                          
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0                      
kfbh.block.obj:              2147483723 ; 0x008: TYPE=0x8 NUMB=0x4b
kfbh.check:                  2774762225 ; 0x00c: 0xa56382f1                    
kfbh.fcn.base:                        0 ; 0x010: 0x00000000                    
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000                    
kfbh.spare1:                          0 ; 0x018: 0x00000000                    
kfbh.spare2:                          0 ; 0x01c: 0x00000000                    
kfdhdb.driver.provstr:ORCLDISKASMDISK138 ; 0x000: length=18                     
kfdhdb.driver.reserved[0]:   1145918273 ; 0x008: 0x444d5341                    
kfdhdb.driver.reserved[1]:    827020105 ; 0x00c: 0x314b5349                    
kfdhdb.driver.reserved[2]:        14387 ; 0x010: 0x00003833                    
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000                    
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000                    
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000                    
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000                    
kfdhdb.dsknum:                       75 ; 0x024: 0x004b                       ==> The 75th device
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL                 
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER                 
kfdhdb.dskname:           DGDATA01_0075 ; 0x028: length=13                     
kfdhdb.grpname:                DGDATA01 ; 0x048: length=8                      
kfdhdb.fgname:            DGDATA01_0075 ; 0x068: length=13                     
kfdhdb.capname:                         ; 0x088: length=0                      
kfdhdb.crestmp.hi:             32973218 ; 0x0a8: HOUR=0x2 DAYS=0xd MNTH=0x8 YEAR=0x7dc
kfdhdb.crestmp.lo:           1898247168 ; 0x0ac: USEC=0x0 MSEC=0x13d SECS=0x12 MINS=0x1c
kfdhdb.mntstmp.hi:             32973219 ; 0x0b0: HOUR=0x3 DAYS=0xd MNTH=0x8 YEAR=0x7dc
kfdhdb.mntstmp.lo:           1163180032 ; 0x0b4: USEC=0x0 MSEC=0x12e SECS=0x15 MINS=0x11
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200                        
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000                        
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000                    
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80                    
kfdhdb.dsksize:                   13892 ; 0x0c4: 0x00003644                    

* 76th asm file (ASM cannot read this device)
+ /engn001/orasvc01/product/10.2.0/bin/kfed read /dev/mapper/mpath_asm175      
kfbh.endian:                          1 ; 0x000: 0x01                          
kfbh.hard:                          130 ; 0x001: 0x82                          
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD               
kfbh.datfmt:                          1 ; 0x003: 0x01                          
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0                      
kfbh.block.obj:              2147483724 ; 0x008: TYPE=0x8 NUMB=0x4c
kfbh.check:                  2433973412 ; 0x00c: 0x91137ca4                    
kfbh.fcn.base:                        0 ; 0x010: 0x00000000                    
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000                    
kfbh.spare1:                          0 ; 0x018: 0x00000000                    
kfbh.spare2:                          0 ; 0x01c: 0x00000000                    
kfdhdb.driver.provstr:ORCLDISKASMDISK175 ; 0x000: length=18                     
kfdhdb.driver.reserved[0]:   1145918273 ; 0x008: 0x444d5341                    
kfdhdb.driver.reserved[1]:    827020105 ; 0x00c: 0x314b5349                    
kfdhdb.driver.reserved[2]:        13623 ; 0x010: 0x00003537                    
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000                    
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000                    
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000                    
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000                    
kfdhdb.dsknum:                       76 ; 0x024: 0x004c            ==> the 76th device, ASM instance cannot read this device.
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL                 
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER                 
kfdhdb.dskname:           DGDATA01_0076 ; 0x028: length=13                     
kfdhdb.grpname:                DGDATA01 ; 0x048: length=8                      
kfdhdb.fgname:            DGDATA01_0076 ; 0x068: length=13                     
kfdhdb.capname:                         ; 0x088: length=0                      
kfdhdb.crestmp.hi:             32982981 ; 0x0a8: HOUR=0x5 DAYS=0x1e MNTH=0x1 YEAR=0x7dd
kfdhdb.crestmp.lo:            366295040 ; 0x0ac: USEC=0x0 MSEC=0x14e SECS=0x1d MINS=0x5
kfdhdb.mntstmp.hi:             32982981 ; 0x0b0: HOUR=0x5 DAYS=0x1e MNTH=0x1 YEAR=0x7dd
kfdhdb.mntstmp.lo:            366307328 ; 0x0b4: USEC=0x0 MSEC=0x15a SECS=0x1d MINS=0x5
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200                        
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000                        
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000                    
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80                    
kfdhdb.dsksize:                   55572 ; 0x0c4: 0x0000d914                    
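
The packed creation and mount timestamps in the two kfed dumps above can be turned into readable dates. The following is a minimal sketch, not an Oracle tool: the function name kfed_ts is hypothetical, and the bit layout is an assumption inferred from the HOUR/DAYS/MNTH/YEAR and USEC/MSEC/SECS/MINS annotations that kfed prints next to each value. It reproduces the annotated fields for both disks shown here.

```shell
#!/bin/sh
# kfed_ts HI LO
# Decode a kfed timestamp pair such as kfdhdb.crestmp.hi / kfdhdb.crestmp.lo.
# Assumed layout (inferred from kfed's own annotations, not from Oracle docs):
#   hi: year in bits 14 and up, month in bits 10-13, day in bits 5-9, hour in bits 0-4
#   lo: minutes in bits 26-31, seconds in bits 20-25, milliseconds in bits 10-19
# The body runs in a subshell so the helper variables stay local.
kfed_ts() (
    hi=$1
    lo=$2
    year=$(( hi >> 14 ))
    mnth=$(( (hi >> 10) & 15 ))
    days=$(( (hi >> 5) & 31 ))
    hour=$(( hi & 31 ))
    mins=$(( (lo >> 26) & 63 ))
    secs=$(( (lo >> 20) & 63 ))
    msec=$(( (lo >> 10) & 1023 ))
    printf '%04d-%02d-%02d %02d:%02d:%02d.%03d\n' \
        "$year" "$mnth" "$days" "$hour" "$mins" "$secs" "$msec"
)

kfed_ts 32973218 1898247168   # disk 75 crestmp -> 2012-08-13 02:28:18.317
kfed_ts 32982981 366295040    # disk 76 crestmp -> 2013-01-30 05:05:29.334
```

Decoded this way, disk 76 was created on 30 January, the same date that ls reports for the device nodes, while disk 75 dates from the previous August.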

Check the dd results – OK
* the 75th asm device (Good)
dd if=/dev/mapper/mpath_asm138p1 bs=4096|od -tx1z|more                    
                              
0000000 01 82 01 01 00 00 00 00 4b 00 00 80 f1 82 63 a5  >……..K…..c.<
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
0000040 4f 52 43 4c 44 49 53 4b 41 53 4d 44 49 53 4b 31  >ORCLDISKASMDISK1<
0000060 33 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >38…………..<
0000100 00 00 10 0a 4b 00 02 03 44 47 44 41 54 41 30 31  >….K…DGDATA01<
0000120 5f 30 30 37 35 00 00 00 00 00 00 00 00 00 00 00  >_0075………..<
0000140 00 00 00 00 00 00 00 00 44 47 44 41 54 41 30 31  >……..DGDATA01<
0000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
0000200 00 00 00 00 00 00 00 00 44 47 44 41 54 41 30 31  >……..DGDATA01<
0000220 5f 30 30 37 35 00 00 00 00 00 00 00 00 00 00 00  >_0075………..<
0000240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
*                                                           
0000300 00 00 00 00 00 00 00 00 a2 21 f7 01 00 f4 24 71  >………!….$q<
0000320 a3 21 f7 01 00 b8 54 45 00 02 00 10 00 00 10 00  >.!….TE……..<
0000340 80 bc 01 00 44 36 00 00 02 00 00 00 01 00 00 00  >….D6……….<
0000360 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
0000400 00 00 10 0a 14 cd f6 01 00 2c 95 00 00 00 00 00  >………,……<
0000420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
*                                                           
0010000 01 82 02 01 01 00 00 00 4b 00 00 80 de 63 17 81  >……..K….c..<
0010020 af e0 35 00 00 00 00 00 00 00 00 00 00 00 00 00  >..5………….<
0010040 00 00 00 00 fe 00 20 00 c0 01 00 01 c0 01 00 01  >…… ………<
0010060 c0 01 00 01 c0 01 00 01 c0 01 00 01 c0 01 01 01  >…………….<
0010100 c0 01 00 01 c0 01 00 01 c0 01 00 01 c0 01 00 01  >…………….<
0010120 c0 01 00 01 c0 01 01 01 c0 01 01 01 c0 01 01 01  >…………….<
0010140 c0 01 01 01 c0 01 01 01 c0 01 01 01 c0 01 01 01  >…………….<
*                                                           
0010240 c0 01 01 01 04 00 01 01 00 00 00 00 00 00 00 00  >…………….<
0010260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
*                                                           
0020000 01 82 03 01 02 00 00 00 4b 00 00 80 ce 10 bd 80  >……..K…….<
0020020 df ad 1e 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
0020040 00 00 00 00 c0 01 00 00 08 00 08 00 00 00 c0 01  >…………….<
0020060 10 00 10 00 00 00 00 00 18 00 18 00 00 00 00 00  >…………….<
0020100 20 00 20 00 00 00 00 00 00 00 00 00 00 00 80 00  > . ………….<
0020120 00 00 00 00 00 00 80 00 d9 0b 00 00 18 01 80 00  >…………….<

* 76th device (Read Failure)                                                    
dd if=/dev/mapper/mpath_asm175 bs=4096|od -tx1z|more                    
                              
0000000 01 82 01 01 00 00 00 00 4c 00 00 80 a4 7c 13 91  >……..L….|..<
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
0000040 4f 52 43 4c 44 49 53 4b 41 53 4d 44 49 53 4b 31  >ORCLDISKASMDISK1<
0000060 37 35 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >75…………..<
0000100 00 00 10 0a 4c 00 02 03 44 47 44 41 54 41 30 31  >….L…DGDATA01<
0000120 5f 30 30 37 36 00 00 00 00 00 00 00 00 00 00 00  >_0076………..<
0000140 00 00 00 00 00 00 00 00 44 47 44 41 54 41 30 31  >……..DGDATA01<
0000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
0000200 00 00 00 00 00 00 00 00 44 47 44 41 54 41 30 31  >……..DGDATA01<
0000220 5f 30 30 37 36 00 00 00 00 00 00 00 00 00 00 00  >_0076………..<
0000240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
*                                                           
0000300 00 00 00 00 00 00 00 00 c5 47 f7 01 00 38 d5 15  >………G…8..<
0000320 c5 47 f7 01 00 68 d5 15 00 02 00 10 00 00 10 00  >.G…h……….<
0000340 80 bc 01 00 14 d9 00 00 02 00 00 00 01 00 00 00  >…………….<
0000360 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
0000400 00 00 10 0a 14 cd f6 01 00 2c 95 00 00 00 00 00  >………,……<
0000420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
*                                                           
0000660 00 00 00 00 00 00 00 00 02 ec 44 ff 00 00 00 00  >……….D…..<
0000700 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
*                                                           
0010000 01 82 02 01 01 00 00 00 4c 00 00 80 c3 62 4b 80  >……..L….bK.<
0010020 65 e0 35 00 00 00 00 00 00 00 00 00 00 00 00 00  >e.5………….<
0010040 00 00 00 00 fe 00 7d 00 c0 01 00 01 c0 01 00 01  >……}………<
0010060 c0 01 00 01 c0 01 00 01 c0 01 00 01 c0 01 00 01  >…………….<
*                                                           
0010460 c0 01 01 01 c0 01 01 01 c0 01 01 01 c0 01 01 01  >…………….<
*                                                           
0011020 c0 01 01 01 c0 01 01 01 14 00 01 01 00 00 00 00  >…………….<
0011040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
*                                                           
0020000 01 82 03 01 02 00 00 00 4c 00 00 80 f0 45 ff 80  >……..L….E..<
0020020 9a d8 1c 00 00 00 00 00 00 00 00 00 00 00 00 00  >…………….<
0020040 00 00 00 00 c0 01 00 00 08 00 08 00 00 00 c0 01  >…………….<
0020060 10 00 10 00 00 00 00 00 18 00 18 00 00 00 00 00  >…………….<
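
The dd/od dumps above can be cross-checked against the kfed output by hand. As a small sketch (the function name asm_hdr_fields is hypothetical), the eight bytes that od prints at octal offset 0000100 correspond to the kfed fields at kfdhdb offsets 0x020 to 0x027, assuming the kfdhdb structure starts right after the 32-byte kfbh block header, which is what the two dumps above suggest. All values are little-endian.

```shell
#!/bin/sh
# asm_hdr_fields B0 B1 B2 B3 B4 B5 B6 B7
# Takes the first eight hex bytes of the od line at offset 0000100 and prints
# compat (4 bytes), dsknum (2 bytes), grptyp (1 byte) and hdrsts (1 byte).
# Field positions are an assumption derived from the kfed dumps in this post.
asm_hdr_fields() (
    compat=$(( 0x$1 | (0x$2 << 8) | (0x$3 << 16) | (0x$4 << 24) ))
    dsknum=$(( 0x$5 | (0x$6 << 8) ))
    grptyp=$(( 0x$7 ))
    hdrsts=$(( 0x$8 ))
    printf 'compat=0x%08x dsknum=%d grptyp=%d hdrsts=%d\n' \
        "$compat" "$dsknum" "$grptyp" "$hdrsts"
)

# Bytes copied from the mpath_asm175 dump above.
asm_hdr_fields 00 00 10 0a 4c 00 02 03
# -> compat=0x0a100000 dsknum=76 grptyp=2 hdrsts=3
```

hdrsts=3 matches KFDHDR_MEMBER in the kfed output, so the raw bytes on the device agree with kfed and the header itself looks intact.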

 

kfod status=true asm_diskstring='/dev/mapper/mpath_asm*' disk=ALL
--------------------------------------------------------------------------------
Disk          Size Header    Path
================================================================================
1:      13893 Mb CANDIDATE /dev/mapper/mpath_asm001
2:      13892 Mb MEMBER    /dev/mapper/mpath_asm001p1
3:      13893 Mb CANDIDATE /dev/mapper/mpath_asm002
4:      13892 Mb MEMBER    /dev/mapper/mpath_asm002p1
5:      13893 Mb CANDIDATE /dev/mapper/mpath_asm003
6:      13892 Mb MEMBER    /dev/mapper/mpath_asm003p1
7:      13893 Mb CANDIDATE /dev/mapper/mpath_asm004

274:      13892 Mb MEMBER    /dev/mapper/mpath_asm137p1
275:      13893 Mb CANDIDATE /dev/mapper/mpath_asm138
276:      13892 Mb MEMBER    /dev/mapper/mpath_asm138p1   ==> MEMBER
277:      13893 Mb CANDIDATE /dev/mapper/mpath_asm139
278:      13892 Mb MEMBER    /dev/mapper/mpath_asm139p1

343:      62400 Mb CANDIDATE /dev/mapper/mpath_asm172
344:      62393 Mb MEMBER    /dev/mapper/mpath_asm172p1
345:      62400 Mb CANDIDATE /dev/mapper/mpath_asm173
346:      62393 Mb MEMBER    /dev/mapper/mpath_asm173p1
347:      62400 Mb CANDIDATE /dev/mapper/mpath_asm174
348:      62393 Mb MEMBER    /dev/mapper/mpath_asm174p1
349:      55572 Mb CANDIDATE /dev/mapper/mpath_asm175    ==> CANDIDATE~!
350:      55572 Mb CANDIDATE /dev/mapper/mpath_asm176
351:      55572 Mb CANDIDATE /dev/mapper/mpath_asm177
352:      55572 Mb CANDIDATE /dev/mapper/mpath_asm178
353:      55572 Mb CANDIDATE /dev/mapper/mpath_asm179

 

I found one difference. The added devices don’t have partition tables, but the original ASM disks do.
I suspect this is the storage engineer’s mistake and the reason why kfed reports the header as “MEMBER” while kfod reports the device as “CANDIDATE”.
I’ll replace the additional disks with ones that have partition tables.
If that succeeds, I will reply here.

# Reference
(Doc ID 580153.1) How To Setup ASM on Linux Using ASMLIB Disks, Raw Devices or Block Devices?
In order to use a disk (e.g. SAN) in Automatic Storage Management, the disk must have a partition table.
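
The pattern in the kfod listing above (a whole device with no matching p1 partition entry) can be picked out mechanically. This is only a sketch: the function name find_unpartitioned is hypothetical, and it assumes kfod lines of the form "N: SIZE Mb STATUS PATH" as printed above, with the path in the fifth field.

```shell
#!/bin/sh
# find_unpartitioned
# Reads kfod-style lines on stdin and prints every whole-device path
# that has no "<path>p1" entry in the same listing.
find_unpartitioned() {
    awk '
        # Remember every device path in the order it was listed.
        $5 ~ /^\/dev\// { seen[$5] = 1; order[++n] = $5 }
        END {
            for (i = 1; i <= n; i++) {
                p = order[i]
                q = p "p1"
                # Skip partition entries; report devices without a p1 sibling.
                if (p !~ /p[0-9]+$/ && !(q in seen)) print p
            }
        }'
}

# Sample lines copied from the kfod output above.
printf '%s\n' \
    '347:      62400 Mb CANDIDATE /dev/mapper/mpath_asm174' \
    '348:      62393 Mb MEMBER    /dev/mapper/mpath_asm174p1' \
    '349:      55572 Mb CANDIDATE /dev/mapper/mpath_asm175    ==> CANDIDATE~!' \
    '350:      55572 Mb CANDIDATE /dev/mapper/mpath_asm176' |
    find_unpartitioned
```

On a real system the input would be the full kfod output; the sample here prints only mpath_asm175 and mpath_asm176.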


ASM diskgroup can’t mount and drop


I have a customer (IHAC) who encountered the error below after restarting the database and storage.

 

 

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "3" is missing
 
We then found that /dev/raw/raw9 is in CANDIDATE status:
SQL> select path,HEADER_STATUS,MOUNT_STATUS,MODE_STATUS from v$asm_disk;
 
PATH            HEADER_STATU MOUNT_S MODE_ST
--------------- ------------ ------- -------
/dev/raw/raw9   CANDIDATE    CLOSED  ONLINE
/dev/raw/raw6   MEMBER       CLOSED  ONLINE
/dev/raw/raw7   MEMBER       CLOSED  ONLINE
/dev/raw/raw8   MEMBER       CLOSED  ONLINE
/dev/raw/raw1   FOREIGN      CLOSED  ONLINE
/dev/raw/raw4   FOREIGN      CLOSED  ONLINE
/dev/raw/raw3   FOREIGN      CLOSED  ONLINE
/dev/raw/raw2   FOREIGN      CLOSED  ONLINE
/dev/raw/raw5   FOREIGN      CLOSED  ONLINE 
 
Checking the status with kfed, we found that the blocks on /dev/raw/raw9 are invalid:
[oracle@DCSDB2 ~]$ kfed read /dev/raw/raw9 blkn=2 | grep KFBTYP
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
[oracle@DCSDB2 ~]$ kfed read /dev/raw/raw9 blkn=4 | grep KFBTYP
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
[oracle@DCSDB2 ~]$ kfed read /dev/raw/raw9 blkn=10 | grep KFBTYP
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
[oracle@DCSDB2 ~]$ kfed read /dev/raw/raw9 blkn=100 | grep KFBTYP
kfbh.type: 
 

We now want to drop /dev/raw/raw9 and bring the database up, but we can’t drop it the normal way because the diskgroup containing /dev/raw/raw9 can’t be mounted.

My question is: how can we drop or remove /dev/raw/raw9 (the customer states that there is no data, or at least no important data, on this device) and bring the database and ASM up?

 

[root@DCSDB1 ~]#   ls -l /dev/raw/*
crw------- 1 root   oinstall 162, 1 01-27 06:37 /dev/raw/raw1
crw------- 1 root   oinstall 162, 2 01-27 06:37 /dev/raw/raw2
crw------- 1 oracle oinstall 162, 3 01-27 06:37 /dev/raw/raw3
crw------- 1 oracle oinstall 162, 4 01-27 06:37 /dev/raw/raw4
crw------- 1 oracle oinstall 162, 5 01-27 06:37 /dev/raw/raw5
crw------- 1 oracle oinstall 162, 6 01-27 06:37 /dev/raw/raw6
crw------- 1 oracle oinstall 162, 7 01-27 06:37 /dev/raw/raw7
crw------- 1 oracle oinstall 162, 8 01-27 06:37 /dev/raw/raw8
crw------- 1 oracle oinstall 162, 9 01-27 06:37 /dev/raw/raw9

[root@DCSDB1 ~]# cat /etc/sysconfig/rawdevices
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw1      /dev/mapper/oravg-ocr1
/dev/raw/raw2      /dev/mapper/oravg-ocr2
/dev/raw/raw3      /dev/mapper/oravg-vot1
/dev/raw/raw4      /dev/mapper/oravg-vot2
/dev/raw/raw5      /dev/mapper/oravg-vot3
/dev/raw/raw6     /dev/mapper/oravg-data1
/dev/raw/raw7     /dev/mapper/oravg-data2
/dev/raw/raw8     /dev/mapper/oravg-data3
/dev/raw/raw9     /dev/mapper/oravg-data5

[root@DCSDB1 tmp]# cat /proc/partitions
major minor  #blocks  name

  8     0 1754880000 sda
  8     1     514048 sda1
  8     2 1754362260 sda2
  8    16  262144000 sdb
  8    32  262144000 sdc
  8    48  262144000 sdd
  8    64  262144000 sde
  8    80  262144000 sdf
  8    96  262144000 sdg
  8   112  262144000 sdh
  8   128  262144000 sdi
  8   144  262144000 sdj
  8   160  262144000 sdk
  8   176  262144000 sdl
  8   192  262144000 sdm
  8   208  262144000 sdn
  8   224  262144000 sdo
  8   240  262144000 sdp
 65     0  262144000 sdq
 65    16  262144000 sdr
 65    32  262144000 sds
 65    48  262144000 sdt
 65    64  262144000 sdu
 65    80  262144000 sdv
 65    96  262144000 sdw
 65   112  262144000 sdx
 65   128  262144000 sdy
 65   144  262144000 sdz
 65   160  262144000 sdaa
 65   176  262144000 sdab
 65   192  262144000 sdac
 65   208  262144000 sdad
 65   224  262144000 sdae
 65   240  262144000 sdaf
 66     0  262144000 sdag
 66    16  262144000 sdah
 66    32  262144000 sdai
 66    48  262144000 sdaj
 66    64  262144000 sdak
 66    80  262144000 sdal
 66    96  262144000 sdam
 66   112  262144000 sdan
 66   128  262144000 sdao
 66   144  262144000 sdap
 66   160  262144000 sdaq
 66   176  262144000 sdar
 66   192  262144000 sdas
 66   208  262144000 sdat
 66   224  262144000 sdau
 66   240  262144000 sdav
 67     0  262144000 sdaw
253     0    1048576 dm-0
253     1   52428800 dm-1
253     2   10485760 dm-2
253     3   10485760 dm-3
253     4   10485760 dm-4
253     5   10485760 dm-5
253     6   10485760 dm-6
253     7   33554432 dm-7
253     8 1073741824 dm-8
253     9  262144000 dm-9
253    10  262144000 dm-10
253    11  262144000 dm-11
253    12  262144000 dm-12
253    13  262144000 dm-13
253    14  262144000 dm-14
253    15  262144000 dm-15
253    16  262144000 dm-16
253    17  262144000 dm-17
253    18  262144000 dm-18
253    19  262144000 dm-19
253    20  262144000 dm-20
253    21     512000 dm-21
253    22     512000 dm-22
253    23     512000 dm-23
253    24     512000 dm-24
253    25     512000 dm-25
253    26  157286400 dm-26
253    27  157286400 dm-27
253    28  157286400 dm-28
253    29  157286400 dm-29
253    30  157286400 dm-30
253    31  157286400 dm-31
253    32  157286400 dm-32
253    33  157286400 dm-33
253    34  157286400 dm-34
253    35  157286400 dm-35
253    36  157286400 dm-36
253    37  157286400 dm-37
253    38  157286400 dm-38
253    39  157286400 dm-39
253    40  157286400 dm-40
253    41  157286400 dm-41
253    42  157286400 dm-42
253    43  157286400 dm-43
253    44  157286400 dm-44
 
[oracle@DCSDB2 dbs]$ cat /app/admin/+ASM/pfile/init.ora
 
 
##############################################################################
# Copyright (c) 1991, 2001, 2002 by Oracle Corporation
##############################################################################
 
###########################################
# Cluster Database
###########################################
cluster_database=true
 
###########################################
# Diagnostics and Statistics
###########################################
background_dump_dest=/app/admin/+ASM/bdump
core_dump_dest=/app/admin/+ASM/cdump
user_dump_dest=/app/admin/+ASM/udump
 
###########################################
# Miscellaneous
###########################################
instance_type=asm
 
###########################################
# Pools
###########################################
large_pool_size=12M
 
###########################################
# Security and Auditing
###########################################
remote_login_passwordfile=exclusive
 
 
asm_diskgroups='DATA'
 
+ASM2.instance_number=2
+ASM1.instance_number=1

 

 

I’m assuming the following are true:

–  that the DATA diskgroup redundancy is either NORMAL or HIGH.
–  the redundancy is NOT external
–  You have a recent backup of the database.

If this is the case, then do the following:

1.  Mount FORCE

alter diskgroup DATA mount force;

Let it mount.

2.  Inspect that everything is there.

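3.  Drop force the disk

alter diskgroup DATA drop disk <disk_name> force;

(DROP DISK takes the ASM disk name, as shown in the NAME column of v$asm_disk once the diskgroup is mounted, not the device path /dev/raw/raw9.)

4.  Issue a rebalance if one does not kick off automatically.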

alter diskgroup DATA rebalance;

and let it finish.

From the SQL language documentation for ALTER DISKGROUP:

  • In the FORCE mode, Oracle ASM attempts to mount the disk group even if it cannot discover all of the devices that belong to the disk group. This setting is useful if some of the disks in a normal or high redundancy disk group became unavailable while the disk group was dismounted. When MOUNT FORCE succeeds, Oracle ASM takes the missing disks offline.

    If Oracle ASM discovers all of the disks in the disk group, then MOUNT FORCE fails. Therefore, use the MOUNT FORCE setting only if some disks are unavailable. Otherwise, use NOFORCE.

    In normal- and high-redundancy disk groups, disks from one failure group can be unavailable and MOUNT FORCE will succeed. Also in high-redundancy disk groups, two disks in two different failure groups can be unavailable and MOUNT FORCE will succeed. Any other combination of unavailable disks causes the operation to fail, because Oracle ASM cannot guarantee that a valid copy of all user data or metadata exists on the available disks.
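
The rules quoted above can be written down as a small decision helper. This is only a sketch of the documentation text, not something ASM provides: the function name mount_force_expected and its two arguments are hypothetical, and it simplifies the rules to a count of failure groups that have unavailable disks. External redundancy is treated as "never", on the assumption that every disk of such a group must be present.

```shell
#!/bin/sh
# mount_force_expected REDUNDANCY MISSING_FAILGROUPS
#   REDUNDANCY          normal | high | external
#   MISSING_FAILGROUPS  number of failure groups with unavailable disks
# Prints whether MOUNT FORCE would be expected to succeed under the
# simplified reading of the quoted documentation.
mount_force_expected() (
    redundancy=$1
    missing=$2
    case "$redundancy" in
        normal) limit=1 ;;   # one failure group may be unavailable
        high)   limit=2 ;;   # up to two different failure groups
        *)      limit=0 ;;   # assumption: external needs every disk
    esac
    if [ "$missing" -eq 0 ]; then
        echo "fail: all disks discovered, use NOFORCE"
    elif [ "$missing" -le "$limit" ]; then
        echo "ok"
    else
        echo "fail: no guaranteed valid copy on the available disks"
    fi
)

mount_force_expected normal 1     # -> ok
mount_force_expected external 1   # -> fail: no guaranteed valid copy on the available disks
```

For the case in this thread (one missing disk, /dev/raw/raw9), the outcome therefore depends entirely on whether DATA is a normal/high or an external redundancy diskgroup.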

 

Are you sure it’s external?  I don’t mean to ask that like you wouldn’t know, but here is a sure way to know.

There is a tool called amdu and it’s in your grid home.  This is 11gR2, correct?  If so, you can do the following:

$ORACLE_HOME/bin/amdu

It will create an amdu directory with the current date, and in that directory it creates a file called report.txt.  It will report all of the disks belonging to the DATA disk group.  One of the fields for each disk
is redundancy.  If it’s set to 0 or 1, I believe you’re external. If it’s set to 2 or 3, you’re NORMAL or HIGH.

I don’t know how an external redundancy ASM diskgroup can be recovered.

From the ASM doc:

  • External redundancy

    Oracle ASM does not provide mirroring redundancy and relies on the storage system to provide RAID functionality. Any write error causes a forced dismount of the disk group. All disks must be located to successfully mount the disk group.

I will let someone else comment but, you may have to restore and recover the database.

If only the ASM disk header was corrupted, and not the whole disk, it might be worth trying to recover only the disk header. This does make sense in an external DG, since you can’t access the data anyway. Search MOS for how to do this.
 

 

ORA-00600 [3020] when break remote mirror and startup database


1. The customer is using HDS remote mirror as a DR solution. After breaking the mirror, some databases at the DR site cannot start up, with the following errors:

a1.
ORA-01122: database file 2 failed verification check
ORA-01110: data file 2: '+DATA07_AI401PO1/ai401po1/datafile/sysaux_01.dbf'
ORA-01207: file is more recent than control file - old control file

a2 (same database as a1, after some commands).
ORA-00600: internal error code, arguments: [3020], [5], [896], [20972416], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 5, block# 896, file offset is 7340032 bytes)
ORA-10564: tablespace UNDOTBS2
ORA-01110: data file 5: '+DATA07_AI401PO1/ai401po1/datafile/undotbs2_01.dbf'
ORA-10560: block type 'KTU UNDO BLOCK'

Resolved by “recover datafile 3”.

b.
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [],
[], [], [], [], [], [], []

Resolved by “recover database”.

Questions:
Q1. We understand from MOS notes 604683.1 and 784776.1 that the storage vendor (HDS) is responsible for meeting the Oracle requirements of “crash consistent” and “write ordering”, and for the POC and procedure. However, given the above errors, how can we tell whether the mirror break fulfills the Oracle requirements or not?

Q2. For (b) above, MOS note 393984.1 matches it. It may happen even in a single-site crash recovery scenario. Is it an Oracle bug or expected behavior?

The database version is 11.2.0.2.

 

A1. The errors in a1 indicate that a datafile had at least a higher database checkpoint than the controlfile. It may have helped to get a controlfile dump and data file header dumps to verify.

It’s not clear what commands were issued to get to the state of a2, but an ORA-600 [3020] probably means that a data block was behind the file header checkpoint info. In other words, based on file header info, we started recovery with logfile #N. But block 896 probably needed a redo record from logfile N-1 to be applied first. If recovering from an older backup of the data file worked, then that would give more weight to that theory. Note 30866.1 does list some bugs where you can still get ORA-600 [3020] during regular recovery though.
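
The arguments of the ORA-600 [3020] in a2 can be checked against the ORA-10567 line with a little arithmetic. The sketch below (the function name rdba_decode is hypothetical) assumes a smallfile tablespace, where the RDBA holds the relative file number in the upper 10 bits and the block number in the lower 22 bits, and an 8192-byte block size, which is what the reported file offset implies.

```shell
#!/bin/sh
# rdba_decode RDBA [BLOCK_SIZE]
# Split a decimal relative data block address into file# and block#, and
# compute the byte offset of that block within the datafile.
# Assumptions: smallfile tablespace (10-bit file#, 22-bit block#) and a
# default block size of 8192 bytes.
rdba_decode() (
    rdba=$1
    blksz=${2:-8192}
    file=$(( rdba >> 22 ))
    block=$(( rdba & 4194303 ))      # 4194303 = 2^22 - 1
    printf 'file#=%d block#=%d offset=%d\n' \
        "$file" "$block" "$(( block * blksz ))"
)

rdba_decode 20972416   # -> file#=5 block#=896 offset=7340032
```

The result agrees with "file# 5, block# 896, file offset is 7340032 bytes" in the ORA-10567 message. The same split applies to the decimal RDBA argument of other ORA-600 errors that report one.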

A2. You can also read bugs that reference ORA-600 [kcratr_scan_lastbwr]. The ORA-600 [kcratr1_lastbwr] seems to only be in 11.2.0.1 rather than in 11.2.0.2.  Maybe the customer is really on 11.2.0.1 + PSU 2?  Anyway it could be indicative of a stale mirror as noted in bug 9584943, but there are other bugs that I didn’t read in detail.

 

The customer just reported that the EMC “consistent group” was not implemented for some reason.

We are going to tell the customer that, in this break-remote-mirror DR solution, if the Oracle requirements of “crash consistent” and “write ordering” cannot be met (MOS notes 604683.1 and 784776.1), then in the worst case the customer may not even be able to recover the database.  Is this correct?

 

Recovery might work if they restore a prior backup and roll forward. :-)
But maybe full recovery from a backup could still result in transaction loss if the active online redo logs are also corrupt because of lost writes to the mirror those redo logs reside on.

Oracle ORA-600 [25027]


 

 


ERROR:
  Format: ORA-600 [25027] [a] [b]
VERSIONS:
  versions 9.2 and above


ARGUMENTS:
  Arg [a]  Tablespace Number (TSN)
  Arg [b]  Decimal Relative Data Block Address (RDBA)

In 12c it includes Multitenant information:
  
  Arg [a]  0 if Multitenant is not enabled or 0 if there is not Root CDB session, 1 ROOT PDBID, otherwise PDBID top session
  Arg [b]  PDBID
  Arg [c]  Tablespace Number (TSN)
  Arg [d]  Decimal Relative Data Block Address (RDBA)



SUGGESTIONS:
  
 1. If the Arg [b] (or Arg [d] in 12c), the RDBA, is 0 (zero), then this could be caused by fake indexes.

  The following query will list fake indexes:

     select do.owner,do.object_name, do.object_type,sysind.flags
     from dba_objects do, sys.ind$ sysind
     where do.object_id = sysind.obj#
     and bitand(sysind.flags,4096)=4096;

  If the above query returns any rows, check the objects involved and consider dropping them as they can cause this error. 

2. Run analyze table validate structure on the table referenced in the Current SQL statement in 
    the related trace file.

  If the Known Issues section below does not help in terms of identifying
  a solution, please submit the trace files and alert.log to Oracle 
  Support Services for further analysis.

  Known Issues:

               
 

 
NB | Prob | Bug      | Fixed | Description
   | II   | 18878420 |  | ORA-600 [25027] can occur with large datafiles using ASSM
   | I    | 18490543 | 12.1.0.2, 12.2.0.0 | ORA-600 [25027][0][0] from ALTER TABLE .. MOVE with nosegment index
   | I    | 14576755 | 12.1.0.1.4, 12.1.0.2, 12.2.0.0 | Corruption type ORA-600 errors from heavy concurrent DML on index cluster table
   | II   | 14010183 | 11.2.0.3.BP22, 11.2.0.4.2, 11.2.0.4.BP03, 12.1.0.1.4, 12.1.0.2, 12.2.0.0 | ORA-600 [ktspfundo:objdchk_kcbgcur_3] in SMON after failed temp segment merge load
   | III  | 13503554 | 11.2.0.4, 12.2.0.0 | Various ORA-600 errors crashing the apply process in a downstreams environment
   | II   | 13785716 | 11.2.0.4, 12.1.0.1 | Intermittent ORA-600 [25027] during upgrade from 10.2 to 11.2
   | I    | 11661824 | 11.2.0.1.BP09 | Assorted Dumps by SQL*LOADER using DIRECT and PARALLEL after exadata bp8 is applied
   | II   | 19171086 | 12.2.0.0 | ORA-600 [25027] when local index has unusable index partitions
   | II   | 10067246 | 12.1.0.2, 12.2.0.0 | ORA-600 [25027] ORA-7445 [kauxs_do_dml_cooperation] ORA-8102 during CREATE INDEX ONLINE
   | III  | 13505390 | 11.2.0.3.BP04, 11.2.0.4 | ORA-600 [kkedsgettabblkcnt: null segment] / ORA-600 [25027] against PARTITION table with Delayed Segment Creation or Interval Partitioned Table
   | II   | 14138130 | 11.2.0.3.5, 11.2.0.3.BP13, 11.2.0.4, 12.1.0.1 | SGA memory corruption / ORA-7445 when modifying uncompressed blocks of an HCC-compressed segment
   | II   | 13566938 | 11.2.0.3.4, 11.2.0.3.BP10, 11.2.0.4, 12.1.0.1 | ORA-600 [kcbgtcr_1] / ORA-600 [kkpo_rcinfo_defstg:objnotfound] / ORA-600 [25027] against a Partitioned Table during Dynamic Sampling
   | II   | 13330018 | 11.2.0.4, 12.1.0.1 | ora-600 [ktspfmb_add1], [4294959240] occurred, then cannot recover with ora-600[25027]
   | III  | 13103913 | 11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP03, 11.2.0.4, 12.1.0.1 | ORA-600 [25027] [ts#] [1] or false ORA-1 during dml while index is being rebuilt online
   | II   | 12821418 | 11.2.0.3.8, 11.2.0.3.BP18, 11.2.0.4, 12.1.0.1 | Direct NFS appears to be sending zero length windows to storage device. It may also cause Lost Writes
   | II   | 12619529 | 11.2.0.3.BP18, 11.2.0.4, 12.1.0.1 | ORA-600[kdsgrp1] from SELECT on plugged in tablespace with FLASHBACK
   | II   | 12321309 | 11.2.0.4, 12.1.0.1 | ORA-600 / ORA-8103 UNUSABLE state of partitioned index is not carried across by TABLESPACE transport using DataPump
   | III  | 10394825 | 11.2.0.3, 12.1.0.1 | ORA-600[25027] [..] [0] inserting to ASSM segment
   | -    | 10329146 | 11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.1 | Lost write in ASM with multiple DBWs and a disk is offlined and then onlined
+  | II   | 10209232 | 11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.1 | ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
+  | IIII | 9399991  | 11.1.0.7.5, 11.2.0.1.3, 11.2.0.1.BP04, 11.2.0.2, 12.1.0.1 | Assorted Internal Errors and Dumps (mostly under kkpa*/kcb*) from SQL against partitioned tables
*  | III  | 9145541  | 11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.1 | OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
E  | II   | 8837919  | 11.2.0.2, 12.1.0.1 | DBV / RMAN enhanced to detect ASSM blocks with ktbfbseg but not ktbfexthd flag set as in Bug 8803762
   | III  | 8803762  | 11.1.0.7.6, 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 | ORA-600[kdsgrp1], ORA-600[25027] or wrong results on 11g database upgrade from 9i
   | II   | 8716064  | 11.2.0.2, 12.1.0.1 | Analyze Table Validate Structure fails on ADG standby with several errors
+  | II   | 8597106  | 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 | Lost Write in ASM when normal redundancy is used
   | II   | 7251049  | 11.2.0.1.BP08, 11.2.0.2, 12.1.0.1 | Corruption in bitmap index introduced when using transportable tablespaces
   | -    | 8437213  | 10.2.0.4.3, 10.2.0.5, 11.1.0.7.7, 11.2.0.1 | ASSM first level bitmap block corruption
   | III  | 8356966  | 11.2.0.1 | ORA-7445 [kdr9ir2rst] by DBMS_ADVISOR or false ORA-1498 by ANALYZE on COMPRESS table
*  | III  | 8198906  | 10.2.0.5, 11.2.0.1 | OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents
*  | III  | 7263842  | 10.2.0.4.2, 10.2.0.5, 11.1.0.7.1, 11.2.0.1 | ORA-955 during CTAS / OERI [ktsircinfo_num1] / dictionary inconsistency for PARTITIONED Tables
   | -    | 6666915  | 10.2.0.5, 11.1.0.7, 11.2.0.1 | OERI[25027] / dictionary corruption from concurrent partition DDL
   | -    | 6025993  | 10.2.0.5, 11.1.0.6 | ORA-600 [25027] in flashback archiving queries
   | -    | 4925342  | 9.2.0.8, 10.2.0.3, 11.1.0.6 | OERI [25027] / OERI [25012] on IOT analyze estimate statistics
*  | IIII | 7190270  | 10.2.0.4.1, 10.2.0.5 | Various ORA-600 errors / dictionary inconsistency from CTAS / DROP
   | -    | 4310371  | 9.2.0.8, 10.2.0.2 | OERI [25027] from concurrent startup / shutdown in RAC
   | -    | 4177651  | 10.2.0.1 | Row migration within a MERGE may OERI[25027]
   | -    | 4020195  | 10.1.0.5, 10.2.0.1 | OERI 25027 can occur in RAC accessing transported tablespace
   | -    | 4000840  | 9.2.0.7, 10.1.0.4, 10.2.0.1 | Update of a row with more than 255 columns can cause block corruption
   | II   | 3963135  | 10.1.0.5, 10.2.0.1 | OERI[kcbgcur_3] / OERI:25027 during bitmap index updates
   | -    | 3829900  | 10.1.0.4, 10.2.0.1 | OERI[25027] possible accessing index in 10g
   | -    | 2942185  | 9.2.0.6, 10.1.0.4, 10.2.0.1 | Corruption occurs on direct path load into IOT with ADDED columns
   | II   | 3085057  | 10.1.0.2 | ORA-600: [25027] from ALTER TABLE .. SHRINK SPACE CASCADE
   | -    | 2926182  | 9.2.0.5, 10.1.0.2 | OERI[25027] / ORA-22922 accessing LOB columns in IOT in AFTER UPDATE trigger

 

ASM Metadata Dump Utility (AMDU)


 

 

     ASM Metadata Dump Utility (AMDU)
 
     This is a functional description of a utility to quickly 
     extract all the available metadata from one or more ASM 
     disks and/or generate formatted printouts of individual 
     blocks. The dump output can be shipped back to Oracle for 
     analysis. The utility can be used at Oracle to generate 
     formatted block printouts from the dump output. The utility 
     does not require that any disk group is even mountable. It 
     also has the ability to extract one or more files from an 
     unmounted diskgroup and write them to the OS file system.
     
     Operations
     
     AMDU performs three different functions. A given execution 
     of AMDU may perform one, two or all three of these 
     functions. 
     
     1.Dump metadata from ASM disks to the OS file system for 
     later analysis.
     
     2.Extract the contents of an ASM file and write it to an OS 
     file system even if the diskgroup is not mounted.
     
     3.Print metadata blocks based on the C structures in the 
     blocks, or in hex.
     
     The input data may be the contents of the ASM disks, or it 
     may be derived from a directory created by a previous run 
     of AMDU. The options -diskstring and -exclude are used to 
     specify ASM disks to read. The option -directory specifies 
     a directory created by a previous run of AMDU. The 
     directory may contain a copy of the original directory 
     contents. The -directory option cannot be combined with 
     -diskstring or -exclude.
     
     Operational Phases
     
     The basic steps of operation are listed in this section. 
     Command line options provide the ability to control which 
     phases are executed and how they operate.
     
     1.Discover disks: This uses ASM discovery to find a set of 
     disks. The headers are read to determine which disks are 
     in which diskgroups. The disks to be scanned in the next 
     phase are chosen. The results of the discovery are put in 
     the report file. With the option -directory, reading the 
     existing report file rather than creating a new one 
     accomplishes this phase.
     
     2.Scan disks: The allocation tables of disks are scanned. 
     Based on the allocation table entries and command line 
     options, interesting blocks are written to image files. 
     Map files are created describing the interesting AU's and 
     where they were written to the image files. If any files 
     are being extracted, their extent maps are constructed in 
     memory from the allocation table entries (the extent maps 
     stored on disk are ignored). If any blocks are being 
     printed, the location of the blocks is saved in memory. 
     With the -directory option 
     this phase is accomplished by reading the existing map 
     files rather than creating map and image files.
     
     3.Extract files: The extent maps of files to extract are 
     sorted. The file data is read from the ASM disks and 
     written to output files. If -directory is specified for extraction of
     an ASM metadata file, the map and image files are read to build
     the extent maps. 
     
     4.Printout blocks: Formatted block printouts are written to 
     standard out along with information about how the block 
     data was read. A kfed command to dump the block on the 
     system where the report was generated is also printed. 
     With the -directory option the data is read from the 
     image files.
     
     Output Files
     
     Four types of output files are created by AMDU. They are 
     all placed in a new dump directory. The file names are 
     automatically generated by AMDU. A new dump directory is 
     created for each run so the output files can be easily 
     tarred and zipped to send back to Oracle. The name of the 
     directory is based on the time and date to one second 
     resolution. The directory name is written to standard out 
     before any files are created in the directory. Note that 
     the directory name is relative to the current directory 
     unless a full path name is specified on the command line 
     with the -parent option.
     
     If AMDU is run with the -directory option then no dump 
     directory and no output files are created. Instead the 
     -directory option specifies the location of a previously 
     created dump directory. In this case -print can be 
     specified to generate formatted block printouts from the 
     previously created dump directory. The printouts are sent 
     to standard out rather than creating a new file. If -extract
     is specified with -directory, -output is required to indicate 
     the location of the extracted file. 
     
     Extracted Files
     
     One extracted file is created for every file listed under 
     the -extract option on the command line. Normally, the extracted
     file is placed in the dump directory under the name 
     <group>_<number>.f  where <group> is the diskgroup name in 
     uppercase, and <number> is the file number of the file specified 
     on the command line. The extracted file will appear to have the 
     same contents it would have if accessed through the 
     database. If some portion of the file is unavailable then 
     that portion of the output file will be filled with 
     0xBADFDA7A, and a message will appear on stderr.
     
     The -output option can be used to extract a single file to 
     a specific file name rather than the dump directory. This 
     can be used in combination with -nodir option to avoid the 
     creation of a dump directory completely. If -directory  is
     specified, -output is required. 
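
     As a rough illustration of how the 0xBADFDA7A filler might 
     be detected after an extraction, the following Python 
     sketch counts filler words in a byte buffer. The function 
     name is hypothetical. The text does not state the byte 
     order or alignment of the filler, so the sketch assumes 
     4-byte alignment and accepts either byte order.

```python
import struct

FILLER = 0xBADFDA7A  # fill pattern named in the text above


def count_filler_words(data: bytes) -> int:
    """Count 4-byte aligned words equal to the filler pattern.

    Assumption: the filler is written as aligned 32-bit words. Both
    byte orders are accepted because the text does not specify one.
    """
    big = struct.pack(">I", FILLER)
    little = struct.pack("<I", FILLER)
    count = 0
    for offset in range(0, len(data) - 3, 4):
        word = data[offset:offset + 4]
        if word == big or word == little:
            count += 1
    return count
```

     A non-zero count on an extracted file only suggests that 
     some portion was unavailable; the message on stderr 
     remains the authoritative indication.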
     
     Image Files
     
     Image files contain block images from the ASM disks. This 
     is the raw data that is copied from the disks. Since there 
     can be a lot of data, and some file systems have problems 
     with large files, an image file is always smaller than 2 
     gigabytes. When there is more than 2 GB of data, multiple 
     image files are created. An image file may contain data 
     from multiple disks, but only from disks that are part of 
     the same disk group (according to the disk's header). All 
     the data from one disk will be grouped together in the 
     image files (possibly spanning a file boundary). Blocks 
     from a single allocation unit will always be adjacent and 
     not span image files. Uninteresting data, such as empty 
     blocks, will not be dumped, so a partial AU might be in the 
     dump. Thus the size of a full image file is not constant. 
     Disks that have been dropped from a disk group will still 
     contain the group name in their header and may be included 
     in the image files for that disk group if the -former 
     option is specified. Note that, unlike mount, the PST is 
     not consulted to decide which disks are part of the disk 
     group. Disks which were forcibly dropped will be included 
     even without the -former option.
     
     Image file names are constructed from the group name and a 
     sequence number. The form is as follows where <group> is 
     the group name in uppercase, and <NNNN> is the sequence 
     number including leading zeroes. The first image file has 
     sequence number 0001.
     <group>_<NNNN>.img
     
     Map Files
     
     Map files are ASCII files that describe the data in the 
     image files for a particular disk group. AMDU creates one 
     map file for each series of image files, i.e. one map file 
     per disk group. The map file contains one line for each 
     allocation unit that has contents dumped to an image file. 
     Some allocation units may have an entry in the map file 
     even though nothing was written to the image file. Every 
     line has the same fields of the same length. The lines are 
     in the order of the data in the image file, but contain 
     absolute references to the locations in the image file so 
     that they can be sorted into different orders without 
     losing track of where the AU is stored in the image files.
     
     The following fields appear in each line. The fields are 
     separated by blanks. Each field starts with a unique letter 
     immediately followed by a decimal number with leading 
     zeroes. This should facilitate using sort and grep to 
     reorganize the map. In the following descriptions the 
     leading letter and the number of decimal digits are given 
     within parentheses. For example (D4) means the letter 'D' 
     followed by 4 decimal digits.
     
     1.Disk Report Number (N4): Every disk discovered by shallow 
     discovery is assigned a disk report number. This number 
     is printed in the report file along with information 
     about the disk. Two disks from the same diskgroup with 
     the same disk number will still have different disk 
     report numbers. The first disk reported will have a disk 
     report number of 1.
     
     2.Disk number (D4): This is the disk number field extracted 
     from the header. If the disk number is invalid or the 
     header unrecognizable this field is 9999.
     
     3.Disk repeat (R2): Normally this is zero. It is possible 
     to find two disks for the same disk number in the same 
     disk group. The first repeat gets a repeat count of 1 for 
     its map file entries. If there are more than 100 disks 
     with the same number then extra digits will be printed 
     and the line sizes will be wrong. This is highly 
     unlikely.
     
     4.Allocation Unit (A8): The AU within the disk where the 
     data was read. Note that this is different than the 
     extent number for physically addressed metadata since 
     extent 2 is near AU 113,000. If the disk is greater than 
     100 terabytes and the AU size is one megabyte, then this 
     field could exceed 8 digits.
     
     5.File Number (F8): The ASM file that owns the extent. If 
     the number is less than 256 then this is ASM metadata or 
     an ASM registry. If this is physically addressed metadata 
     then the file number will be 00000000.
     
     6.Indirect flag (I1): If this is a data extent for the file 
     then the indirect flag is 0. If this is an indirect 
     extent then this is 1.
     
     7.Extent Number (E8): The physical extent number within the 
     file. This is the index in the file extent map that a 
     database instance would use to find this AU. If the file 
     was (two-way) mirrored then this is a primary extent if 
     the number is even, and a secondary copy if it is odd. If 
     this is an indirect extent then this is a value between 0 
     and 299 giving the index into the indirect extents. For 
     physically addressed metadata this is the extent within 
     the physically addressed metadata, not the AU within the 
     disk.
     
     8.AU within extent (U2): Large extents are supported for 
     large files. Thus there could be multiple AU's dumped for 
     the same extent. Note that metadata files do not 
     currently use large extents so this only happens for user 
     file dumps to image files.
     
     9.Block count (C5): The number of blocks copied to the 
     image file from the AU. A lot of space is saved by not 
     creating images of blocks that are just initialized 
     contents. This is particularly true for indirect extents 
     where most indirect extents will have only a few blocks 
     of extent pointers. If the extent is not dumped to the 
     image file then this is zero. The count is in ASM 
     metadata blocks, even if the file number is >256 and the 
     indirect flag is 0. This is normally 4K blocks, but could 
     be different in the future. With the -noimage option this 
     is always zero since no images are ever created.
     
     10.Image File Sequence Number (S4): This is the NNNN 
     field of the image file name where blocks from the AU are 
     dumped. With the -noimage option this is always zero 
     since no image files are ever created.
     
     11.Byte Offset in Image File (B10): This is the location 
     within the image file where the block images appear. It 
     is always a multiple of the ASM metadata block size. 
     Since the image file is always less than 2 GB this will 
     always fit in a 32 bit signed integer. Note that this 
     will be an offset to the end of the previously dumped AU 
     when the block count is zero. With the -noimage option 
     this is always zero since no images are ever created.
     
     12.Corrupt Block Flag (X0): If any of the blocks in the 
     AU are corrupt, then the line will end with 'X'. Normally 
     this is a blank character so that the line ends in two 
     blanks.
     
     This adds up to 56 digits, 12 letters, 11 blanks, and one 
     '\n' per line. This is a total of 79 characters plus the 
     newline.
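
     The field layout above lends itself to a fixed pattern. 
     The following Python sketch parses one map line into named 
     fields. The class and function names are hypothetical, and 
     the pattern is derived only from the field descriptions in 
     this section, not from an actual map file. The R and A 
     fields accept extra digits, as the text warns can happen.

```python
import re
from typing import NamedTuple


class MapEntry(NamedTuple):
    report_number: int   # N4
    disk_number: int     # D4 (9999 when the header is unusable)
    repeat: int          # R2
    au: int              # A8
    file_number: int     # F8
    indirect: bool       # I1
    extent: int          # E8
    au_in_extent: int    # U2
    block_count: int     # C5
    image_seq: int       # S4
    byte_offset: int     # B10
    corrupt: bool        # trailing 'X' or blank


# Fields are separated by single blanks; the last position is 'X' or blank.
_MAP_LINE = re.compile(
    r"N(\d{4}) D(\d{4}) R(\d{2,}) A(\d{8,}) F(\d{8}) I(\d) "
    r"E(\d{8}) U(\d{2}) C(\d{5}) S(\d{4}) B(\d{10}) ([X ])"
)


def parse_map_line(line: str) -> MapEntry:
    """Parse one map file line; raise ValueError if it does not match."""
    match = _MAP_LINE.fullmatch(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a map line: {line!r}")
    *numbers, flag = match.groups()
    n, d, r, a, f, i, e, u, c, s, b = (int(x) for x in numbers)
    return MapEntry(n, d, r, a, f, bool(i), e, u, c, s, b, flag == "X")
```

     Note that the sketch depends on the trailing blanks being 
     preserved, so a map file that has passed through an editor 
     that strips trailing whitespace would need a looser 
     pattern.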
     
     The map files are named "<group>.map" where <group> is the 
     disk group name in uppercase. 
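
     The naming rules given for extracted, image, and map files 
     can be summarized in a few lines of Python. This is only a 
     sketch with hypothetical function names. It assumes the 
     file number in an extracted file name is not zero padded, 
     which the text does not state either way.

```python
def extracted_file_name(group: str, file_number: int) -> str:
    """<group>_<number>.f, with the group name in uppercase."""
    return f"{group.upper()}_{file_number}.f"


def image_file_name(group: str, sequence: int) -> str:
    """<group>_<NNNN>.img, where the first sequence number is 0001."""
    if sequence < 1:
        raise ValueError("image sequence numbers start at 1")
    return f"{group.upper()}_{sequence:04d}.img"


def map_file_name(group: str) -> str:
    """<group>.map, one map file per disk group."""
    return f"{group.upper()}.map"
```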
     
     Report File
     
     One report file is generated for every run of the utility 
     without the -directory option (except if -noreport is 
     specified). It is written to "report.txt" in the dump 
     directory. If -nodir is specified the report is written to 
     standard out instead of the dump directory name. Lines are 
     flushed to the report file as soon as they are generated so 
     tail -f can be used to monitor progress.
     
     When AMDU is run with -print and -directory options then no 
     report is generated. Instead an existing report file must 
     be found and parsed. Information in the report file is used 
     instead of discovering the disks. The map file is used to 
     find the blocks to printout, and the block contents are 
     retrieved from the image files.
     
     The report is divided into sections and subsections. Each 
     section begins with a title line. The title line has the 
     title centered and surrounded with '*'. There are always at 
     least three asterisks on either side of the title. A 
     subsection title is like a section title except that it is 
     surrounded with '-' rather than '*'.
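
     A script that post-processes a report could recognize 
     title lines along these lines. The Python sketch below is 
     an assumption-laden illustration: the function name is 
     hypothetical, and it assumes subsection titles also carry 
     at least three '-' characters on each side, which the text 
     states only for section titles.

```python
import re
from typing import Optional, Tuple

# A run of at least three '*' or '-' on each side of a non-empty title.
_TITLE = re.compile(r"^\s*(\*{3,}|-{3,})\s*(.*?)\s*(\*{3,}|-{3,})\s*$")


def classify_title(line: str) -> Optional[Tuple[str, str]]:
    """Return ("section" | "subsection", title), or None for other lines."""
    match = _TITLE.match(line)
    if match is None:
        return None
    left, title, right = match.groups()
    if left[0] != right[0] or not title:
        return None
    kind = "section" if left[0] == "*" else "subsection"
    return kind, title
```

     Warning lines surrounded by "** " are not matched because 
     they carry only two asterisks on each side.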
     
     Any errors reported by AMDU are also printed in the report 
     file. Warnings about unexpected conditions are printed in 
     upper case surrounded by "** ".
     
     The following describes the sections in the report file.
     
     AMDU Setting
     
     The first lines describe the environment where the dump was 
     created. This includes the time when the report was 
     generated and the endianness of the data in the image files. 
     The host name, platform, and software version are also 
     included.
     
     The following subsections describe all the arguments from 
     the command line: operations, disk selection, reading 
     control, and output control. This is a report of the 
     settings that result from the command line parsing, not a 
     copy of the command line.
     
     The CORE package LRM is used to parse the command line 
     arguments. No dump directory or report file is generated if 
     there are argument parsing errors or if the user is only 
     requesting help. Command line errors will result in an exit 
     status of 1 rather than 0. Problems reading disks or 
     extracting a file will be reported on stderr and the report 
     file. The exit status will be 5 in accordance with LPM 
     standards.
     
     Discovery
     
     This section describes every disk returned by discovery. 
     There is one subsection for each disk. The title contains 
     the disk report number. This is followed by the information 
     from shallow discovery. If deep discovery is done for the 
     disk, then the results of deep discovery are reported next. 
     A warning message may indicate that a disk is being 
     ignored.
     
     If the -noscan option is specified then this is the end of 
     the report. If the -noread option is given then this is the 
     end of the report and there is no deep discovery 
     information for any of the disks.
     
     Sleeping for Heartbeat
     
     Unless the -noheart option is given, a section header is 
     reported containing the time spent sleeping for heartbeat 
     detection. This makes it likely that any disks which 
     contain a PST of a mounted diskgroup will have a heartbeat 
     detected. The section has no lines other than the section 
     header.
     
     Diskgroup Scan
     
     There is one section for every disk group encountered by 
     deep discovery and referenced in either a -dump, -extract 
     or -print option ("-dump all" references all diskgroups 
     mentioned in any valid disk headers). The name of the disk 
     group is in the section header. This is followed by 
     information gathered about the diskgroup during deep discovery. This 
     includes group wide parameters from the disk headers such 
     as AU size and creation time.
     
     A disk scan subsection for each scanned disk in the 
     diskgroup follows the header. Disks that are ignored due to 
     deep discovery and/or command line options do not have a 
     subsection. The subsection header includes the disk report 
     number. Some of the information from discovery is repeated 
     for convenience. This is reported before the scan begins. 
     Error messages and warnings, such as heartbeat detected, 
     may be reported during the scan. When the scan is complete 
     statistics from the scan are reported. This includes 
     information about data written to the map and image files. 
     Statistics such as space allocated and free are also 
     reported.
     
     A group report subsection follows all the disk scan 
     subsections for the disks in the group. This subsection 
     gives cumulative statistics from all the disks in the disk 
     group. 
     
     Extracting File Sections
     
     A section is reported for each file that is extracted. The 
     section header includes the diskgroup name and file number 
     from the -extract option. The name of the OS file created 
     by the extraction is on the first line of the section. Any 
     errors encountered are reported followed by statistics 
     about the extraction. If -directory is specified, this 
     information is written to stdout. 
     
     End of Report
     
     The last line of a report is the end of report section 
     header.
     
     Printing Blocks
     
     The -print option can be used to generate a formatted 
     printout of blocks from a diskgroup that is scanned in this 
     run of AMDU or from a dump directory created by a previous 
     run of AMDU. Use the -directory option to print from a 
     previous AMDU run. 
     
     Output Format
     
     The formatted output is sent to standard out rather than to 
     a file. A section header, as in a report file, is printed 
     for each -print option on the command line. The section 
     header includes the block specification for the printout. 
     There is one subsection for each count in the block 
     specification. The subsection title is "BLOCK n OF c" where 
     n is the number of this block (starting at one), and c the 
     count of blocks in the block specification. 
     
     There may be multiple blocks on disk that match the 
     criteria for printing in one subsection. This may be due to 
     multiple disks appearing to be the same ASM disk or it may 
     be due to the normal mirroring of data. With the -fullscan 
     option it is common to encounter old stale blocks that 
     match the same criteria. A block description is printed for 
     each block that matches the printing criteria. When the 
     block contents are identical, then multiple block 
     descriptions are printed before the formatted printout of 
     the block. If the blocks are different then there may be 
     multiple formatted printouts in one subsection.
     
     A block description consists of three lines. The first line 
     is a separator of all dots. The second line gives the 
     location of the block both as (disk, AU, block) and (file, 
     extent, block). The third line is the kfed command that 
     would create the same formatted output. This is useful for 
     constructing a kfed command to patch the block. It includes 
     the device name of the disk on the system where the dump 
     was created. If the AMDU directory was copied from another 
     system then the kfed command will have to be run on the 
     other system.
     
     Block Specification
     
     There are five different kinds of <block_spec>'s for 
     specifying a range of blocks to printout. They all start 
     with a diskgroup name. The name is case insensitive but it 
     is converted to uppercase. The name is followed by values 
     specified by '.', letter, number. The letter indicates the 
     meaning of the number and may be upper or lower case. The 
     number is a decimal number less than 2^32. The last value 
     may be an optional count of blocks to print using the 
     letter 'C'. So if the last field is ".C4" then four blocks 
     will be printed starting at the first one specified by the 
     <block_spec>.
     
     The five forms are as follows:
     
     1.Report disk block: This form specifies a disk by its 
     discovery order and a block by AU and block within AU. 
     The disk report number is always unique, but it is hard 
     to know the number unless you have already run AMDU and 
     seen at least the shallow discovery report. The advantage 
     of this form is that it never refers to multiple blocks 
     since AMDU gives every disk a unique disk report number.
     <group>.N<report_number>.A<au_number>.B<block_number>
     For example <block_spec> "DATA.N0001.A1.b0.c256" would 
     dump the entire PST AU from the first disk discovered 
     (providing it is in disk group DATA). Note that the 
     diskgroup name must match even though disk report numbers 
     are unique.
     
     2.Group disk block: This is similar to report disk block 
     except that the ASM disk number is given rather than the 
     report disk number generated by AMDU. It is possible, but 
     a bad configuration, to see more than one disk with the 
     same ASM disk number for the same ASM disk group. If this 
     happens then this <block_spec> will refer to the blocks 
     on all the disks. 
     
     <group>.D<disk_number>.A<au_number>.B<block_number>
     
     For example <block_spec> "Data.d2.A0.B0" would print the 
     disk header from disk 2 in diskgroup DATA. Also 
     <block_spec> "data.d2.a0.b256" and "data.d2.a1.b0" would 
     both print the PST header block of disk 2 in diskgroup 
     DATA (assuming an AU size of 1 MB and metadata block size 
     of 4096).
     
     3.Extent file block: This form allows specification of a 
     block by a file physical extent number and block within 
     extent. When a file is mirrored there are two physical 
     extents for every virtual extent. This form allows 
     specification of only one mirror copy. It will support 
     printing of any file that is described by the map file. 
     However it is unlikely that a block dump will produce 
     anything but hex data for anything that is not an ASM 
     metadata file. Note that the block size is always the ASM 
     metadata block size no matter which file is being 
     printed. Note that any striping is not taken into account 
     when locating the block.
     
     <group>.F<file_number>.X<extent_number>.B<block_number>
     
     For example <block_spec> "flash.F3.X42.B0" would print 
     the secondary mirror copy of the checkpoint block of ACD 
     thread 2 in diskgroup FLASH. "Data.f3.x0.b0.c10752" would 
     print all the redo for thread 1 in diskgroup DATA (I hope 
     you have an empty file system).
     
     4.Virtual file block: This form allows specification of a 
     block by its virtual block number within the file. Unless 
     this is an external redundancy disk group, all 3 copies 
     of the block are printed. If the copies are the same then 
     only one printout of the contents is generated. This form 
     is only allowed for ASM metadata files because the 
     redundancy can be determined from the diskgroup type, and 
     there is no striping. 
     
     <group>.F<file_number>.V<virtual_block_number>
     
     For example <block_spec> "flash.F1.v2856" would print the 
     file directory block for file 2856 in diskgroup FLASH.
     
     5.Extent map file block: This form allows specification of 
     a block in a file's extent map. The first 60 extent 
     pointers are in the file directory; the rest are in extent 
     map blocks, with 480 pointers per map block.
     
     <group>.F<file_number>.M<extent_map_block_number>
     
     For example <block_spec> "flash.f2856.m0.c427" would print 
     the entire extent map for file number 2856, a 200 GB file.
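
     The arithmetic behind two of the examples above can be 
     checked with a short Python sketch. The function names are 
     hypothetical. The defaults use the 1 MB AU size and 
     4096-byte metadata block size mentioned in the text; the 
     200 GB calculation additionally assumes one AU per extent 
     and an unmirrored file, which the text does not state.

```python
def blocks_per_au(au_size: int = 1024 * 1024, block_size: int = 4096) -> int:
    """Number of metadata blocks in one allocation unit."""
    return au_size // block_size


def normalize_disk_block(au: int, block: int, per_au: int = 256) -> tuple:
    """Fold a block number that runs past the AU into (au, block)."""
    return au + block // per_au, block % per_au


def extent_map_blocks(extent_count: int,
                      direct_pointers: int = 60,
                      pointers_per_block: int = 480) -> int:
    """Extent map blocks needed beyond the pointers in the file directory."""
    indirect = max(0, extent_count - direct_pointers)
    return -(-indirect // pointers_per_block)  # ceiling division


# "data.d2.a0.b256" and "data.d2.a1.b0" address the same block.
same_block = normalize_disk_block(0, 256) == normalize_disk_block(1, 0)

# 200 GB at one 1 MB AU per extent is 204800 extents.
map_blocks_for_200gb = extent_map_blocks(200 * 1024)
```

     Under those assumptions map_blocks_for_200gb comes to 427, 
     which agrees with the ".c427" count in the example.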
     
     Command Line
     
     AMDU uses the LRM package from CORE to parse its command 
     line. Thus it follows the LRM conventions. In particular it 
     follows the unix command style. The command line looks like 
     this:
     
     amdu [ <option> ... ]
     
     Some options require specification of a number or string 
     while others are boolean flags that do not require a value. 
     Some options may appear multiple times to provide multiple 
     values. String options are specified as follows:
     -keyword string
     
     Number options are specified as:
     -keyword number
     
     Note that a number may end in K, k, M, m, G, or g to 
     indicate kilo (2^10), mega (2^20), or giga (2^30). 
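
     The suffix rule can be expressed as a small Python helper. 
     This is a sketch of the rule as described here, with a 
     hypothetical function name; it is not how LRM itself 
     parses numbers.

```python
_MULTIPLIERS = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_number(text: str) -> int:
    """Convert a number with an optional K/M/G suffix to an integer."""
    text = text.strip()
    if text and text[-1].lower() in _MULTIPLIERS:
        return int(text[:-1]) * _MULTIPLIERS[text[-1].lower()]
    return int(text)
```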
     
     Boolean flags are specified as:
     -keyword
     
     Note that the CORE package LRM is used to parse the command 
     line options. This means you can specify options as 
     keyword=value, but unless you are very clever and 
     understand completely how LRM works, you will get 
     unexpected results such as ignored parameters. Stick to 
     -keyword syntax and you will be fine.
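
     When AMDU is driven from a script, the -keyword style maps 
     naturally onto an argument list. The Python sketch below 
     builds such a list. The function name, the program name 
     "amdu" being on the PATH, and the paths in the usage 
     example are all assumptions for illustration; the option 
     names are the ones described in this document.

```python
from typing import Iterable, List, Optional, Tuple

Option = Tuple[str, Optional[object]]


def build_amdu_argv(options: Iterable[Option],
                    program: str = "amdu") -> List[str]:
    """Build an argument list using -keyword [value] syntax.

    A value of None marks a boolean flag. Repeating a keyword in
    `options` repeats it on the command line.
    """
    argv = [program]
    for keyword, value in options:
        argv.append(f"-{keyword}")
        if value is not None:
            argv.append(str(value))
    return argv


# Hypothetical usage: extract file 256 of DATA to a chosen location.
example = build_amdu_argv([
    ("diskstring", "/dev/mapper/asm*"),
    ("extract", "DATA.256"),
    ("output", "/tmp/DATA_256.f"),
    ("nodir", None),
])
```

     Passing a list like this to subprocess.run without a shell 
     keeps the discovery string from being expanded by the 
     shell, so no extra quoting is needed.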
     
     The options fall into four broad classes: operations, disk 
     selection, read control, and output control. 
     
     Operation
     
     These parameters control the fundamental function of AMDU: 
     dumping metadata, extracting file contents, or printing 
     metadata blocks. If none of these are specified then only 
     discovery is performed (same as -noscan).
     
     1.-dump <diskgroup>: This option specifies the name of a 
     diskgroup to have its metadata dumped. This option may be 
     specified multiple times to dump multiple diskgroups.  If 
     the diskgroup name is "ALL" then all diskgroups 
     encountered will be dumped. The diskgroup name is not 
     case sensitive, but will be converted to uppercase for 
     all reports. If this option is not specified then no map 
     or image files will be created, but -extract and -print 
     may still work.
     
     2.-extract <diskgroup>.<file>: This extracts the file (by name
     or number) from the named diskgroup, case insensitive. This 
     option may be specified multiple times to extract 
     multiple files. The extracted file is placed in the dump 
     directory under the name <diskgroup>_<number>.f where 
     <diskgroup> is the diskgroup name in uppercase, and 
     <number> is the file number. The -output option may be 
     used to write the file to any location and is required
     if -directory is specified. The extracted 
     file will appear to have the same contents it would have 
     if accessed through the database. If some portion of the 
     file is unavailable then that portion of the output file 
     will be filled with 0xBADFDA7A, and a message will appear 
     on stderr.
     
     ASM metadata files                  Number   Name
     FILE DIRECTORY                        1      FILE
     ASM DISK DIRECTORY                    2      ASMDISK
     ACTIVE CHANGE DIRECTORY               3      CHANGE
     CONTINUING OPERATIONS DIRECTORY       4      CONTOP
     TEMPLATE DIRECTORY                    5      TEMPLATE
     ALIAS DIRECTORY                       6      ALIAS
     AVD VOLUME FILE DIRECTORY             7      VOL
     USED SPACE                            8      USEDSPC
     ATTRIBUTES DIRECTORY                  9      ATTRIBUTES
     ASM USER DIRECTORY                   10      USER
     ASM USER GROUP DIRECTORY             11      GROUP
     STALENESS DIRECTORY                  12      STALENESS
     
     Files which have fixed numbers but are not ASM metadata files 
     STALE BITMAP SPACE REGISTRY         254      STALEREG
     ORACLE CLUSTER REPOSITORY REGISTRY  255      OCR
     
     3.-print <block_spec>: This option prints one or more 
     blocks to standard out. This option may be specified 
     multiple times to print multiple <block_spec>'s. The 
     printout contains information about how each block was 
     found as well as a formatted printout. Multiple blocks 
     matching the same <block_spec> may be found when scanning 
     the disks. For example there may be multiple disks that 
     have headers for the same diskgroup and disk number. If 
     the block is from a mirrored file then multiple copies 
     should exist on different disks. If multiple copies of 
     the same block have identical contents then only one 
     formatted printout of the contents will be generated, but 
     a header will be printed for each copy. A <block_spec> 
     may include a count of sequential blocks to print. A 
     <block_spec> may specify a block either by disk or file.
      
     <block_spec> ::= <single_block> | <single_block>.C<count>
     <single_block> ::= <report_disk_block> | <group_disk_block> |
          <extent_file_block> | <virtual_file_block> | <xmap_file_block>
     <report_disk_block> ::=
          <group_name>.N<report_number>.A<au_number>.B<block_number>
     <group_disk_block> ::=
          <group_name>.D<disk_number>.A<au_number>.B<block_number>
     <extent_file_block> ::=
          <group_name>.F<file_number>.X<physical_extent>.B<block_number>
     <virtual_file_block> ::=
          <group_name>.F<file_number>.V<virtual_block_number>
     <xmap_file_block> ::=
          <group_name>.F<file_number>.M<extent_map_block_number>
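
     To make the grammar concrete, here is a Python sketch that 
     parses a <block_spec> string into its parts. The class and 
     function names are hypothetical, and treating a missing 
     count as 1 is an assumption; the rest follows the grammar 
     and the rules given under Block Specification (case 
     insensitive letters, uppercase group name, numbers below 
     2^32).

```python
import re
from typing import Dict, NamedTuple

_FORMS = {
    ("N", "A", "B"): "report_disk_block",
    ("D", "A", "B"): "group_disk_block",
    ("F", "X", "B"): "extent_file_block",
    ("F", "V"): "virtual_file_block",
    ("F", "M"): "xmap_file_block",
}


class BlockSpec(NamedTuple):
    group: str
    kind: str
    values: Dict[str, int]
    count: int


def parse_block_spec(spec: str) -> BlockSpec:
    """Parse <group>.<letter><number>...[.C<count>] into a BlockSpec."""
    group, *fields = spec.split(".")
    if not group or not fields:
        raise ValueError(f"malformed block spec: {spec!r}")
    pairs = []
    for field in fields:
        match = re.fullmatch(r"([A-Za-z])([0-9]+)", field)
        if match is None or int(match.group(2)) >= 2**32:
            raise ValueError(f"bad field {field!r} in {spec!r}")
        pairs.append((match.group(1).upper(), int(match.group(2))))
    count = 1  # assumption: one block when no .C<count> is given
    if pairs[-1][0] == "C":
        count = pairs.pop()[1]
    kind = _FORMS.get(tuple(letter for letter, _ in pairs))
    if kind is None:
        raise ValueError(f"unknown block spec form: {spec!r}")
    return BlockSpec(group.upper(), kind, dict(pairs), count)
```

     For instance, parsing "DATA.N0001.A1.b0.c256" with this 
     sketch yields a report_disk_block in group DATA with N=1, 
     A=1, B=0 and a count of 256.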
     
     Disk Selection
     
     These parameters control the disk discovery phase of 
     operations. They allow specification of which disks should 
     be scanned for AU's to dump. The operation options -dump, 
     -extract, and -print also limit scanning to disks in the 
     diskgroups specified by the options. The following options 
     can be specified to control how the disks are discovered 
     and scanned:
     
     1.-diskstring <string>: By default the null string is used 
     for discovery. The null string should discover all disks 
     the user has access to. Many installations specify an 
     asm_diskstring parameter for their ASM instance. If so 
     that parameter value should be given here. Multiple 
     discovery strings can be specified by multiple 
     occurrences of -diskstring <string>. Beware of shell 
     syntax conflicts with discovery strings. Diskstrings are 
     usually the same syntax the shell uses for expanding path 
     names on command lines so they will most likely need to 
     be enclosed in single quotes.
     
     2.-exclude <string>: Multiple exclude options may be 
     specified. These strings are used for discovery just like 
     the values for diskstring. Only shallow discovery is done 
     on these diskstrings. Any disks found in the exclude 
     discovery will not be accessed. If they are also 
     discovered using the -diskstring strings, then the report 
     will include the information from shallow discovery along 
     with a message indicating the disk was excluded.
     
     3.-former: Normally disks marked as former are not scanned, 
     but this option will scan them and include their contents 
     in the output. This is useful when it is necessary to 
     look at the contents of a disk that was dropped. Note 
     that dropped normal disks will not have any entries in 
     their allocation tables and thus only the physically 
     addressed extents will be dumped. Force dropped disks 
     will not have status former in their disk headers and are 
     not affected by this option. However if DROP DISKGROUP is 
     used, the disks will have the contents as of the time of 
     the drop, and will be in status former. Thus this option 
     is useful for extracting files from a dropped diskgroup.
     
     4.-baddisks <diskgroup>:  Normally disks with bad disk 
     headers, or that look like they were never part of a disk 
     group, will not be scanned. This option forces them to be 
     scanned anyway and to be considered part of the given 
     diskgroup. This is most useful when a disk header has 
     been damaged. The disk will still need to have a valid 
     allocation table to drive the scan unless -fullscan is 
     used. If block 0 is damaged, AMDU will try to read the 
     backup disk header. If this fails, and AMDU needs to
     construct a working disk header, at least one block in the 
     first two AUs must be valid so that the disk number can be 
     determined. The options -ausize and -blksize are required 
     since these values are normally fetched from the disk header. 
     If the diskgroup uses external redundancy then -external should 
     be specified. These values will be compared against any 
     valid disks found in the diskgroup and they must be the 
     same.
     
     5.-directory <string>: This option completely eliminates 
     the discovery and disk scanning phases of operation. It 
     specifies the name of a dump directory from a previous 
     run of AMDU. The report file and map files are read 
     instead of doing a discovery and scan. The parsing of 
     these ASCII files is very dependent on them being exactly 
     as written by AMDU. AMDU is unlikely to work properly if 
     they have been modified by a text editor, or if some of 
     the files are missing or truncated. Note that the 
     directory may be a copy FTP'ed from another machine. The 
     other machine may even be a different platform with a 
     different endianness.
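
The warning about shell syntax under -diskstring can be made concrete with a short Python sketch. It builds an argument list from the -dump, -diskstring and -exclude options described above and prints a shell-safe rendering. The helper name and the device paths are hypothetical, and the sketch only constructs the command line; it does not run AMDU.

```python
import shlex


def amdu_command(dump_group, diskstrings=(), excludes=()):
    """Build an argv list for amdu (illustrative helper, not shipped with AMDU)."""
    argv = ["amdu", "-dump", dump_group]
    for pattern in diskstrings:
        argv += ["-diskstring", pattern]   # may be repeated
    for pattern in excludes:
        argv += ["-exclude", pattern]      # may be repeated
    return argv


# Hypothetical discovery strings; substitute the site's asm_diskstring value.
argv = amdu_command("DATA",
                    diskstrings=["/dev/rdsk/*"],
                    excludes=["/dev/rdsk/c1*"])

# shlex.join quotes each argument so the shell does not expand the wildcards.
print(shlex.join(argv))
```

Passing the list form to a process launcher avoids the shell entirely; the quoted string is only needed when the command is pasted into a terminal or a script.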
     
     Read Control
     
     These parameters control which AU's on a disk are read and 
     how they are found. Every AU read from a -dump diskgroup is 
     dumped, unless the -noimage output option is set. Reading 
     still checks for I/O errors and corrupt blocks even if 
     -noimage is set. The default scanning algorithm is to look 
     at the allocation table and dump any extent that contains 
     ASM metadata according to its allocation table entry. The 
     registries are not considered metadata and are not dumped 
     by default. Registries are not modified through the ASM 
     buffer cache, and may not have ASM block headers on them. 
     If part of the AU contains metadata blocks that were never 
     modified, then the unmodified blocks are not dumped. The 
     most common case is the extra blocks in an indirect extent.
     
     1.-fullscan: This option reads every AU on the disk and 
     looks at the contents of the AU rather than limiting the 
     AU's read based on the allocation table. This is useful 
     when the allocation table is corrupt or needs recovery. 
     An AU will be written to the image file if it starts with 
     a block that contains a valid ASM block header. The file 
     and extent information for the map will be extracted from 
     the block header. Physically addressed metadata will be 
     dumped regardless of its contents. This option is 
     incompatible with extracting a file. It is an error to 
     specify -extract with this option. Note that this option 
     is likely to find old garbage metadata in unallocated 
     AU's since there is no means of determining what is 
     allocated. Thus there may be many different copies of the 
     same block, possibly of different versions.
     
     2.-ausize <bytes> -blksize <bytes>: Both of these options 
     must be set when -baddisks is set. They must both be a 
     power of 2. These sizes are required to scan a disk 
     looking for metadata, and they are normally read from the 
     disk header. The values apply to all disks that do not 
     have a valid header. The values from the disk header will 
     be used if a valid header is found.
     
     3.-external: Normally amdu determines the diskgroup 
     redundancy from the disk headers. However this is not 
     possible with the -baddisks option. It is assumed that 
     the redundancy of diskgroup "none" is normal or high 
     unless this option is given to specify external 
     redundancy.
     
     4.-compare: This option only applies to file extraction 
     from a normal or high redundancy disk group. Every extent 
     that is mirrored on more than one discovered disk will 
     have all sides of its mirror compared. If they are not 
     identical a message will be reported on standard error 
     and the report file. The message will indicate which copy 
     was extracted. A count of the blocks that are not 
     identical will be in the report file.
     
     5.-registry: The ASM registries will be read and dumped to 
     the image file. There will be no block consistency checks 
     since these files do not have ASM cache headers. To dump 
     one specific registry specify -filedump and include the 
     file object for the registry (e.g. DATA.255)
     
     6.-noheart: Normally the heartbeat block will be saved at 
     discovery time and checked when the disk is scanned. A 
     sleep is added between discovery and scanning to ensure 
     there is time for the heartbeat to be written. If the 
     heartbeat block changes then it is most likely that the 
     diskgroup containing this disk is mounted by an active 
     ASM instance. An error and warning is generated but 
     operation proceeds normally. This option suppresses this 
     check and avoids the sleep.
     
     7.-noxmap: This option eliminates reading of the indirect 
     extents containing the file extent maps. This is the bulk 
     of the metadata in most diskgroups. Even the entries in 
     the map file will be eliminated.
     
     8.-novirtual: This option eliminates reading of any virtual 
     metadata. Only the physically addressed metadata will be 
     read. This implicitly eliminates the ACD and extent maps 
     so -noacd and -noxmap will be assumed.
     
     9.-noscan: This eliminates any reading of any disks after 
     deep discovery. This results in just doing a deep 
     discovery using the diskstring parameter. The report will 
     end after the discovery section. It is an error to 
     specify this option and specify a file to extract. It is 
     an error to specify this and -fullscan.
     
     10.-noread: This eliminates any reading of any disks at 
     all. Only shallow discovery will be done. The report will 
     end after the discovery section. It is an error to 
     specify this option and specify a file to extract or 
     blocks to print. It is an error to specify this and 
     -fullscan.
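
The combination rules scattered through the option descriptions above can be collected into one small checker. The Python sketch below encodes only the rules stated in this text (required sizes for -baddisks, the power-of-2 requirement, and the conflicts involving -fullscan, -noscan and -noread). The function names and the dictionary representation of the options are assumptions for illustration; AMDU's own validation may differ.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... and False for everything else."""
    return n > 0 and (n & (n - 1)) == 0


def check_read_options(opts):
    """Return a list of rule violations for a dict of option name -> value."""
    problems = []
    # -baddisks needs both sizes because no valid disk header supplies them.
    if "baddisks" in opts:
        for size in ("ausize", "blksize"):
            if size not in opts:
                problems.append(f"-baddisks requires -{size}")
    for size in ("ausize", "blksize"):
        if size in opts and not is_power_of_two(opts[size]):
            problems.append(f"-{size} must be a power of 2")
    # -fullscan cannot be combined with extraction or with the no-read modes.
    if "fullscan" in opts:
        for other in ("extract", "noscan", "noread"):
            if other in opts:
                problems.append(f"-fullscan conflicts with -{other}")
    # Neither -noscan nor -noread reads enough to extract a file.
    for mode in ("noscan", "noread"):
        if mode in opts and "extract" in opts:
            problems.append(f"-{mode} conflicts with -extract")
    # -noread reads nothing at all, so there are no blocks to print.
    if "noread" in opts and "print" in opts:
        problems.append("-noread conflicts with -print")
    return problems


print(check_read_options({"baddisks": "DATA", "ausize": 1048576, "blksize": 4096}))
print(check_read_options({"fullscan": True, "extract": "DATA.256"}))
```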
     
     Output control
     Output control parameters change which output files are 
     created, where they are created, and how they are created. 
     The following options are supported.
     
     1.-parent <path_name>: By default the dump directory is 
     created in the current directory, but another directory 
     can be specified using this option. The parent directory 
     for the dump directory must already exist. 
     
     2.-noacd: This option limits the dumping of the Active 
     Change Directory to just the control blocks that contain 
     the checkpoint. There is 126 MB of ACD per ASM instance 
     (42 MB for external redundancy). It is normally of no 
     interest if there has been a clean shutdown or no updates 
     for a while. This option avoids dumping a lot of 
     unimportant data. The blocks will still be read and 
     checked for corruption. The map file will still contain 
     entries for the ACD extents, but the block counts will be 
     zero.
     
     3.-noimage: No image files will be created in the dump 
     directory. All the reads specified by the read options 
     will still be done. The map files may be used to find 
     blocks on the disks themselves. In the map file, the 
     count of blocks dumped, the image file sequence number, 
     and the byte offset in the image file will all always be 
     zero (C00000 S0000 B0000000000).
     
     4.-nomap: No map file is created and no image file is 
     created. The only output is the report file. The -noimage 
     option is assumed if this is set since an image file 
     without a map is useless. The options -noscan and -noread 
     also result in no map or image files, but -nomap still 
     reads the metadata to check for I/O errors and corrupt 
     blocks.
     
     5.-filedump: This option causes the file objects in the 
     command line to have their blocks dumped to the image 
     files rather than extracted. This can be combined with 
     the -novirtual option to selectively dump only some of 
     the metadata files. It may also be used to dump user 
     files (number >= 256) so that all mirrored copies can be 
     examined.
     
     6.-output <file_name>: This option specifies a different 
     file for writing an extracted file. The file will be 
     overwritten if it already exists. This option requires 
     that exactly one file is extracted via the -extract 
     option. Required with -extract and -directory. 
     
     7.-noextract: This prevents files from being extracted to 
     an output file, but the file will be read and any errors 
     in selecting the correct output will be reported. This is 
     most useful in combination with the -compare option.
     
     8.-nodir: No dump directory is created, and no files are 
     created in it. The directory name is not written to 
     standard out. The report file is written to standard out 
     before any block printouts from any -print options.  This 
     option conflicts with -filedump. It is an error to 
     specify this and extract a file to the dump directory.
     
     9.-noreport: This suppresses the generation of the report 
     file. It is most useful in combination with -nodir and 
     -print to get block printouts without a lot of clutter. It 
     is unnecessary to include this with -directory since no 
     report is generated then anyway.
     
     10.-hex: This prints the block contents in hex without 
     attempting to print them as ASM metadata. This is useful 
     when the block is known to not be ASM metadata. It avoids 
     the ASM block header dump and ensures the block is not 
     accidentally interpreted as ASM metadata. This option 
     requires at least one -print option.
     
     11.-noprint: This suppresses the printout of the block 
     contents for blocks printed with the -print option. It is 
     useful for getting just the block reports without a lot 
     of data. This option requires at least one -print option.
     
     Inconsistencies
     Since AMDU does not do all the checks required to mount a 
     diskgroup, it is possible for the disks to be inconsistent. 
     There may be missing disks or older stale disks. There 
     could be two different diskgroups with the same name. Since 
     the diskgroup may need crash recovery there could be 
     duplicate entries for the same file extent in the 
     allocation tables. Here is a list of the possible 
     inconsistencies and how they are dealt with:
     
     1.There could be two paths to the same disk. If two disks 
     have identical headers it is assumed they are the same 
     disk. The second disk is ignored and a message appears in 
     place of its deep discovery report.
     
     2.There could be disks from two different diskgroups with 
     the same diskgroup name. An error message is given and 
     the disk group is not scanned. No files will be extracted 
     from the diskgroup and no metadata will be dumped or 
     printed. Use the exclude parameter to eliminate the disks 
     from one disk group. 
     
     3.There could be two disks in the same diskgroup with the 
     same disk number. This happens if a disk is dropped 
     force, another disk is added, and the old disk is 
     discovered by AMDU. Metadata will be dumped for both 
     disks. A file extraction will only look for extents on 
     the disk with the highest disk creation timestamp. The 
     other disk will be ignored even if it contains the only 
     copy of an extent.
     
     4.There could be two AU's that are for the same file and 
     extent. This can happen if a relocation is incomplete. 
     For metadata dumping both extents are dumped. For file 
     extraction the contents will be compared. If they are the 
     same then there is no problem. If the contents differ 
     then the disk with the lowest disk report number will be 
     chosen. An error message will indicate the problem and 
     which disk was chosen.
     
     5.With the -compare option the mirror copies of an extent 
     could differ. If this happens the primary extent will be 
     chosen. With high redundancy and a missing primary extent 
     the first secondary will be chosen. An error message will 
     be reported. 

BLOCK CORRUPTIONS ON ORACLE AND UNIX


PURPOSE
  This article discusses block corruptions in Oracle and how they are related 
  to the underlying operating system and hardware.  To better illustrate the 
  discussion, Unix is taken as the operating system of reference, although similar
  situations can be observed on other operating systems as well.

SCOPE & APPLICATION
  For users requiring further understanding as to how a block could become 
  corrupted.


Block corruption has been a common occurrence on most UNIX based systems and
relational databases for many years.  It is one of the most frequent ways to
lose data and cause serious business impact.  Drawing on a survey of the
technical literature, this document discusses several ways that block
corruptions can occur and offers conclusions and possible solutions.

To fully comprehend all the reasons for block corruptions, it is necessary to
understand how I/O device subsystems work, how memory buffers are used to
support the reading and writing of data blocks, how blocks are sized on both
UNIX and Oracle, and how these three objects work together to maintain data
consistency.

I/O devices are designed specifically for host machines and there have been
few attempts to standardize a particular interface across the industry.  Most
software, including Oracle, on UNIX machines uses standard C program calls that
in turn perform system calls to support the reading and writing of data to 
disk.  These system calls access I/O device software that retrieves or writes 
data on disk.

The UNIX system contains two types of devices, block devices and raw or 
character devices.  Block devices look like random access storage devices to
the rest of the system while character devices include all other devices such
as terminals and network media. (Bach, 1990 314).  These device types are
important to understand because different combinations can increase corruptions.

Device drivers are configured by the operating system and the configuration
procedure generates or populates tables that form part of the code of the
kernel.  This kernel to device driver interface is described by the block
device switch table and the character device switch table.  Each device type
has entries in these tables that direct the kernel to the appropriate driver
interfaces for the system calls.  The open and close system calls of a device
file funnel through the two device switch tables, according to file type.  The
mount and umount system calls also invoke the device open and close procedures
for block devices. Read and write system calls of character special files pass
through the respective procedures in the character device switch tables.  Read
and write system calls of block devices and of files on mounted file systems
invoke the algorithms of the buffer cache, which invoke the device strategy
procedure. (Bach, 1990 314).  This buffer cache plays an important role in
block corruptions since it is the location where data blocks are the most
vulnerable.

The difference between the two disk interfaces is whether they deal with the
buffer cache.  When accessing the block device interface, the UNIX kernel
follows the same algorithm as for regular files, except that after converting
the logical byte offset into a logical block offset, it treats the logical
block offset as a physical block number in the file system.  It then accesses
the data via the buffer cache and, ultimately, the driver strategy interface.
However, when accessing the disk via the raw interface, the kernel does not
convert the byte offset into the file but passes the offset immediately to the
driver.  The driver's read or write routine converts the byte offset to a
block offset and copies the data directly to the user address space, bypassing
kernel buffers.

Thus, if one process writes a block device and a second process then reads a
raw device at the same address, the second process may not read the data that
the first process had written, because the data may still be in the buffer
cache and not on disk.  However, if the second process had read the block
device, it would automatically pick up the new data, as it exists in the
buffer cache.  (Bach, 1990 328).
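
The stale-read scenario just described can be sketched with a toy model in Python. This is not kernel code: the class, its method names, and the dictionary standing in for the disk surface are all invented for illustration, and real buffer caches are far more involved.

```python
class ToyDisk:
    """Toy model of one disk reachable through a cached and an uncached path."""

    def __init__(self, nblocks):
        self.platter = {n: b"old" for n in range(nblocks)}  # what is on disk
        self.cache = {}  # block number -> data written but not yet flushed

    def block_write(self, n, data):
        # Block interface: the write lands in the buffer cache (write-behind).
        self.cache[n] = data

    def block_read(self, n):
        # Block interface: the cache is consulted first, then the disk.
        return self.cache.get(n, self.platter[n])

    def raw_read(self, n):
        # Raw interface: bypasses the cache and sees only what is on disk.
        return self.platter[n]

    def sync(self):
        # Flush all delayed writes to the disk.
        self.platter.update(self.cache)
        self.cache.clear()


disk = ToyDisk(4)
disk.block_write(2, b"new")
print(disk.block_read(2))  # the block interface sees the new data
print(disk.raw_read(2))    # the raw interface still sees the old data
disk.sync()
print(disk.raw_read(2))    # after the flush both paths agree
```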

Use of the raw interface may also introduce strange behavior.  If a process
reads or writes a raw device in units smaller than the block size, results are
driver-dependent.  For instance, when issuing 1-byte writes to a tape drive,
each byte may appear in different tape blocks. (Bach, 1990).

The advantage of using the raw interface is speed, assuming there is no
advantage to caching data for later access.  Processes accessing block devices
transfer blocks of data whose size is constrained by the file system logical
block size. Furthermore, use of the block interface entails an extra copy of
data between user address space and kernel buffers, which is avoided in the
raw interface.  For example, if a file system has a logical block size of 1K
bytes, at most 1K bytes are transferred per I/O operation.  However, processes
accessing the disk as a raw device can transfer many disk blocks during a disk
operation, subject to the capabilities of the disk controller.

Disk controllers are hardware devices that control the I/O actions of one or
more disks.  These controllers can also create a bottleneck in a system.
(Corey, Abbey, Dechichio 1995). On many systems, controllers are the piece of
hardware that most often develops or causes problems.  When a system has
multiple disks controlled by one controller, the results can be fatal.  A
bottleneck on the controller is a common cause of write errors.

It is important to remember that Oracle and other products use these device
access methods to perform their work.  It is also important to note the added
complexity that the Oracle kernel adds to the I/O game.

The Oracle Relational Database Management System (RDBMS) keeps its
information, including data, in block format.  However, the Oracle data block
can be, and in most cases is, composed of several operating system blocks.

An Oracle database block is the physical unit of storage in which all Oracle
database data are stored in files.  The Oracle database block size is
determined by setting a parameter called db_block_size when the
database is created. (Millsap, 1995).

The most common UNIX block is 512 bytes but the Oracle block size can range
from 512 to 32K.  The difference in block sizing between the operating system
and the Oracle kernel is beneficial for Oracle: it boosts performance while
allowing UNIX to maintain small files with minimal wasted space.  The
Oracle block can be considered a superset of the UNIX file system block.

Each block of an Oracle data file is formatted with a fixed header that
contains information about the particular block.  This information provides a
means to ensure the integrity for each block and in turn, the entire Oracle
database.  One component of the fixed header of a data block is the Relative 
Data Block Address (RDBA).  The RDBA is a 4-byte value that stores the relative 
file number of the Oracle database file and the Oracle block number offset 
relative to the beginning of the file. (Presley, 1993).
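
How a 4-byte address can carry both a file number and a block number is easiest to see in code. The Python sketch below assumes the commonly described split of 10 high bits for the relative file number and 22 low bits for the block number. The article does not state this split, but it reproduces the (file, block) pair printed next to the hex value in the alert.log excerpt under Case Two. The function names are hypothetical.

```python
FILE_BITS = 10    # assumed width of the relative file number
BLOCK_BITS = 22   # assumed width of the block number
BLOCK_MASK = (1 << BLOCK_BITS) - 1


def decode_rdba(rdba):
    """Split a 32-bit relative data block address into (file, block)."""
    return rdba >> BLOCK_BITS, rdba & BLOCK_MASK


def encode_rdba(file_no, block_no):
    """Pack a relative file number and block number into one 32-bit value."""
    return (file_no << BLOCK_BITS) | block_no


# Address Oracle asked for in the Case Two excerpt.
print(decode_rdba(0x56C07AC1))   # (347, 31425)
# Address actually stored in the header of the block that came back.
print(decode_rdba(0x06407AC1))   # (25, 31425)
```

Decoding both values from that excerpt shows the mismatch at a glance: the block number is the same, but the header belongs to file 25 while file 347 was requested.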

Whenever there is a problem with the RDBA, Oracle may signal the error
ORA-1578: Data block corrupted in file # block #.  This error provides
information that points to where the corruption exists.

Oracle uses the standard C system function calls to read and write blocks to
its database files.  Once the block has been read it is mapped to shared
memory by the operating system.  After the block has been read into shared
memory, the Oracle kernel does verification checks on the block to ensure the
integrity of the fixed header.  The RDBA check is the first verification made
on the fixed header.  So why do RDBAs become corrupt and how can we identify
and correct them?

Case One
--------

The first case of block corruption occurs when the block has been zeroed out. If the Oracle block 
is completely zeroed out, SQL statements may generate an ORA-8103, because block type 0 is invalid 
and the block is not formatted as an empty block. In this case the dbverify utility (dbv) can detect 
the problem and will produce an error message.  Dbv output example:

DBVERIFY - Verification starting : FILE = /oradata/data_01.dbf
Page 307161 is marked corrupt
***
Corrupt block relative dba: 0x0644afd9 (file 0, block 307161)
Completely zero block found during dbv:


Usually the first operating system block of an Oracle block is zeroed out when 
there was a software error on disk and the operating system attempted to repair 
its block.  In addition, disk repair utility programs have caused this zeroing out effect.

Programs that read from and write to the disk directly can destroy the
consistency of file system data.  The file system algorithms coordinate disk
I/O operation to maintain a consistent view of disk data structures, including
linked lists of free disk blocks and pointers from inodes to direct and
indirect data blocks.  Processes that access the disk directly bypass these if
they run while other file system activity is going on.  For this reason, these
programs should not be run on an active file system. (Bach, 1990 328).


Case Two
--------

The RDBA in the physical block on disk is incorrect.  This can generate an ORA-1578 error 
and an alert.log entry containing the message "Data in bad block", as in the following example:

***
Corrupt block relative dba: 0x56c07ac1 (file 347, block 31425)
Bad header found during buffer read
Data in bad block -
type: 6 format: 2 rdba: 0x06407ac1
last change scn: 0x0000.00a02808 seq: 0x1 flg: 0x02
consistency value in tail: 0x28080601
check value in block header: 0x0, block  checksum disabled
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x56c07ac1 (file 347, block 31425) found same corrupted data


Blocks are sometimes written into the wrong places in the data file.  This is
called "write blocks out of sequence."  This typically happens when the operating system
I/O device driver fails to write the block in the proper location that Oracle
requested via the lseek() system call.

The lseek() system call is one of the most important calls related to block
corruption.  The calculations that lseek() performs are often the cause of
block problems.  To understand lseek() a brief discussion of byte positioning
is necessary.

Every open file has a "current byte position" associated with it.  This is
measured as the number of bytes from the start of the file.  The creat system
call sets the file's position to the beginning of the file, as does the open
system call.  The read and write system calls update the file's position by
the number of bytes read or written.  Before a read or write, an open file can
be positioned using lseek().  The format is:

long lseek(int fildes, long offset, int whence);

The offset and whence arguments are interpreted as follows: If whence is 0,
the file's position is set to offset bytes from the beginning of the file.  If
whence is 1, the file's position is set to its current position plus the
offset.  If whence is 2, the file's position is set to the size of the file
plus the offset.  The file's offset can be greater than the file's current
size, in which case the next write to the file will extend the file.  lseek()
returns the resulting byte offset from the start of the file as a long
integer.  (Stevens, 1990 40).
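
The three whence values can be tried out on a scratch file. The Python sketch below uses os.lseek, which wraps the same system call; the constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END correspond to the values 0, 1 and 2 described above. The function name and the ten-byte test file are choices made for this example only.

```python
import os
import tempfile


def lseek_demo():
    """Exercise the three whence values on a scratch file and return what was seen."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")                # file is now 10 bytes long
        seen = {
            "set": os.lseek(fd, 4, os.SEEK_SET),   # whence 0: 4 bytes from the start
            "cur": os.lseek(fd, 3, os.SEEK_CUR),   # whence 1: current position (4) + 3
            "end": os.lseek(fd, -2, os.SEEK_END),  # whence 2: file size (10) - 2
        }
        seen["read"] = os.read(fd, 2)              # reads the last two bytes
        os.lseek(fd, 5, os.SEEK_END)               # position 15, past the end
        os.write(fd, b"X")                         # this write extends the file
        seen["size"] = os.fstat(fd).st_size
        return seen
    finally:
        os.close(fd)
        os.remove(path)


print(lseek_demo())
```

The final seek and write show the last point in the paragraph: positioning beyond the current end of file is legal, and the next write extends the file to cover the gap.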

There is great opportunity for miscalculation of an offset based on the
lseek() system call.  Though lseek is not the only system call culprit in the
block corruption problem, it is a major contributor.


An incorrect RDBA on disk may also result if the block was corrupted in memory and 
then written to disk.  This situation is quite rare and is usually caused by memory 
faults that go undetected.  The RDBA found in the block is usually garbage and 
not a valid RDBA.  

If there is a possibility of memory problems on the system, the database
administrator can enable further sanity block checking by placing the
following parameters in the database instance init.ora parameter file:

db_block_checking=TRUE
db_block_checksum=TRUE / FULL (10.2+)
_db_block_cache_protect=true

db_block_checking forces the Oracle RDBMS kernel to call functions that check
the block. Oracle checks a block by going through the data on the block, making 
sure it is self-consistent. Block checking can often prevent memory and data corruption.

db_block_checksum determines whether DBWn and the direct loader will calculate 
a checksum (a number calculated from all the bytes stored in the block) and 
store it in the cache header of every data block when writing it to disk. 
Checksums are verified when a block is read only if this parameter is true and the 
last write of the block stored a checksum.  If set to FULL, DB_BLOCK_CHECKSUM also 
catches in-memory corruptions and stops them from making it to the disk.
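
The write-time and read-time halves of a block checksum can be sketched in a few lines of Python. This is a generic illustration only: the article does not describe Oracle's checksum algorithm or the exact header layout, so the sketch uses zlib.crc32 as a stand-in and a made-up layout in which the first four bytes of each block hold the checksum. The function names are hypothetical.

```python
import struct
import zlib


def seal(payload):
    """Writer side: compute a checksum and store it in front of the payload."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def verify(block):
    """Reader side: recompute the checksum and compare it with the stored one."""
    (stored,) = struct.unpack(">I", block[:4])
    return stored == zlib.crc32(block[4:])


block = seal(b"x" * 100)
print(verify(block))                          # True: block is as written

# Simulate a stray write that zeroes one byte somewhere in the block.
damaged = block[:50] + b"\x00" + block[51:]
print(verify(damaged))                        # False: damage is detected
```

The check can only flag damage that happens after the checksum was computed, which is why the text notes that a block is verified on read only if the last write of that block stored a checksum.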

Setting _db_block_cache_protect=true protects the cache layer from becoming corrupted.
This parameter will prevent certain corruption from getting to disk, although
it may crash the foreground of the database instance.  It will help catch
stray writes in the cache. When a process tries to write past the buffer size
in the SGA, it will fail first with a stack violation.

If the database writer process detects a corrupted block in cache prior to
writing the block to disk, it will signal an error and will crash the
database instance.  The corrupted block is never written to disk.
After receiving such an error, simply attempt to restart the database instance.
There is no doubt that this can be a costly workaround to avoid block
corruptions.  However, the workaround once a corruption has occurred can be
even costlier.




Case Three
----------

A third cause of block corruption is a requested I/O that the operating system
never services.  Oracle checks the return codes of its calls to lseek() and
read().  In addition, Oracle checks the number of bytes returned by the read()
system call to ensure that the block size or a multiple of the block size was
read.  Because these checks appear to succeed, Oracle assumes that the read
succeeded.  On sanity checking, however, the RDBA is incorrect and the
database operation fails: the I/O read request never really took place.  In
this case, the RDBA found can point to a block of a different file.
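
The sequence of checks described here (seek result, byte count, then a sanity check on the address stored in the block) can be sketched in Python. The file layout is a deliberately simplified stand-in, not the Oracle block format: each 512-byte block simply stores its own block number in its first four bytes. All names are hypothetical.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 512  # toy block size chosen for this example


def make_toy_file(nblocks):
    """Create a scratch file whose blocks each start with their own number."""
    fd, path = tempfile.mkstemp()
    for n in range(nblocks):
        os.write(fd, struct.pack(">I", n).ljust(BLOCK_SIZE, b"\0"))
    return fd, path


def read_block(fd, block_no):
    """Read one block, checking the system calls first and the contents second."""
    offset = block_no * BLOCK_SIZE
    if os.lseek(fd, offset, os.SEEK_SET) != offset:
        raise OSError("seek landed at the wrong offset")
    data = os.read(fd, BLOCK_SIZE)
    if len(data) != BLOCK_SIZE:
        raise OSError("short read")
    # The calls looked fine; only the stored address reveals a wrong block.
    (stored_no,) = struct.unpack(">I", data[:4])
    if stored_no != block_no:
        raise ValueError(f"asked for block {block_no}, header says {stored_no}")
    return data


fd, path = make_toy_file(4)
try:
    print(len(read_block(fd, 2)))            # a healthy block passes every check
    os.lseek(fd, 3 * BLOCK_SIZE, os.SEEK_SET)
    os.write(fd, struct.pack(">I", 1))       # simulate block 1's header in slot 3
    try:
        read_block(fd, 3)
    except ValueError as err:
        print("sanity check failed:", err)
finally:
    os.close(fd)
    os.remove(path)
```

The point of the sketch is the ordering: both system-call checks pass for the damaged slot, and only the comparison against the stored address catches the problem.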

Case Four
---------

Another reason for block corruption is reading the wrong block from the same
device.  Typically, this is caused by a very busy disk.  The block read may be
off by a single block, but the error can range into several hundred blocks.
Since this occurs when the disk is very busy and under heavy stress, try
spreading datafiles across multiple disks and ensure that the disk drive can
support the load.


In the third and fourth situations, the database files will not be physically
corrupted and the operation can be tried again with success.  Most diagnostic
testing will not reveal anything wrong with either the operating system or the
hardware.  Nevertheless, the cause lies in the operating system or the
hardware. (Velpuri, 1995).

So what causes the operating system calls to behave the way they do and how
can companies try to minimize their risk?  To evaluate these questions,
another look into how UNIX works is required.

UNIX vendors, in an attempt to speed performance, have implemented many
features into the filesystem.  The filesystem manages a large cache of I/O
buffers, called the buffer cache. This cache allows UNIX to optimize read and
write operations.  When a program writes data, the filesystem stores the data
in a buffer rather than writing it to disk immediately.  At some later point
in time, the system will send this data to the disk driver, together with
other data that has accumulated in the cache.  In other words, the buffer
cache lets the disk driver schedule disk operations in batches.  It can make
larger transfers and use techniques such as seek optimization to make disk
access more efficient.  This is called write-behind.

When a program reads data, the system first checks the buffer cache to see if
the desired data is already there.  If the data is already in the buffer
cache, the filesystem does not need to access the disk for those blocks.  It
just gives the user the data it found in its buffer, eliminating the need to
wait for a disk drive.  The filesystem only needs to read the disk if the data
isn't already in the cache.  To increase efficiency even further, the
filesystem assumes the program will read the file consecutively and read
several blocks from the disk at once.  This increases the likelihood that the
data for future read operations will already be in the cache. (Loukides, M.,
1990)  This also increases the chance of block corruption.
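
The write-behind and read-ahead behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name ToyBufferCache, the read_ahead parameter, and the dictionary standing in for the disk are all hypothetical, and no real UNIX kernel is implemented this simply.

```python
class ToyBufferCache:
    """Toy model of a filesystem buffer cache (illustrative only)."""

    def __init__(self, disk, read_ahead=2):
        self.disk = disk              # dict: block number -> bytes (stands in for the disk)
        self.read_ahead = read_ahead  # extra consecutive blocks fetched on a cache miss
        self.buffers = {}             # cached copies of blocks
        self.dirty = set()            # blocks modified in cache but not yet on disk
        self.disk_reads = 0           # number of blocks actually read from disk

    def write(self, blkno, data):
        # Write-behind: only the cached copy changes; the disk is updated later.
        self.buffers[blkno] = data
        self.dirty.add(blkno)

    def read(self, blkno):
        if blkno in self.buffers:
            return self.buffers[blkno]  # cache hit: no disk access needed
        # Cache miss: fetch the block plus the next few consecutive blocks.
        for n in range(blkno, blkno + 1 + self.read_ahead):
            if n in self.disk and n not in self.buffers:
                self.buffers[n] = self.disk[n]
                self.disk_reads += 1
        return self.buffers[blkno]  # raises KeyError if the block does not exist

    def flush(self):
        # Send all accumulated dirty blocks to disk in one sorted batch.
        batch = sorted(self.dirty)
        for n in batch:
            self.disk[n] = self.buffers[n]
        self.dirty.clear()
        return batch
```

Between write() and flush() the disk and the cache disagree about the block contents. That window is where a crash, or a kernel error in matching buffers to blocks, can leave stale or misplaced data on disk.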

As a filesystem gets busy and buffers are being read, modified, written, and
aged out of the cache the chance of the kernel reading or writing the wrong
block increases.  Also, the more complex the scheme to read from and write to
disk, the greater the likelihood of function failure.

The UNIX kernel uses the strategy interface to transmit data between the
buffer cache and a device, although the read and write procedures of character
devices sometimes use their block counterpart strategy procedure to transfer
data directly between the device and the user address space.  The strategy
procedure may queue I/O jobs for a device on a work list or do more
sophisticated processing to schedule I/O jobs.  Drivers can set up data
transmission for one physical address or many, as appropriate.  The UNIX
kernel passes a buffer header address to the driver strategy procedure.  The
header contains a list of addresses and sizes for transmission of data to or
from the device.  This is also how the swapping operations work.  For the
buffer cache, the kernel transmits data from one address; when swapping, the
kernel transmits data from many data addresses.  If data is being copied to or
from the user's address space, the driver must lock the process in memory
until the I/O transfer is complete.

The kernel loses control over a buffer only when it waits for the completion
of I/O between the buffer and the disk.  It is conceivable that a disk drive
is corrupt so that it cannot interrupt the CPU, preventing the kernel from
ever releasing the buffer. There are processes that monitor the hardware for
such cases and zero out the block and return an error to the kernel for a bad
disk job. (Bach, 1990 52).

On the UNIX level there are several utilities that will check for  bad disk
blocks and zero out any blocks they find corrupted.  These utilities do not
realize that the block in question may be an Oracle RDBMS block and zero out
the block by mistake.

In (Silberschatz, Galvin, 1994), the authors consider the possible effect of a
computer crash.  In this case, the table of opened files is generally lost,
and with it any changes in the directories of opened files.  This event can
leave the file system in an inconsistent structure.  Frequently, a special
program is run at reboot time to check for and correct disk inconsistencies.

The consistency checker compares the data in the directory structure with the
data blocks on disk, and tries to fix any inconsistencies it finds.
(Silberschatz, Galvin, 1994)  This will often result in the reformatting of
blocks which will cause the Oracle block information to be removed.  This will
definitely cause Oracle corruption.

It is important to realize that monitoring of hardware is required for all
operating systems.  Hardware monitors can sense electrical signals on the
busses and can accurately record them even at high speed.  A hardware monitor
keeps observing the system even when it is malfunctioning, and thus, it can be
used to debug the system. (Jain, 1991 99)  These tools can help determine the
cause of the problem and detect problems like controller error and media
faulting which are frequent corruption contributors.

In any case, there are many opportunities for blocks, either on disk or in the
buffer cache, to become corrupt.  Fixing the corruption can sometimes provide
even greater opportunities.


Conclusion
----------

Data block corruption is an ongoing problem on all operating systems,
especially UNIX.  There are many types and causes of corruptions to consider.
Advanced system configurations can increase the chance of corruption, and
hardware problems are a common source of corruptions. When receiving block
corruption errors, remember that a couple of them are not physical corruptions
but memory corruptions that are never written to disk.

Oracle Customer Support provides a number of bulletins on block corruption
problems that help recover what is left of the data once corruption has
occurred.  If block corruption occurs on a machine, be sure to identify the
type of corruption and establish a plan for its correction.


[1]  Bach, M. (1990). The Design of the UNIX Operating System.
                      The I/O Subsystem  328.
[2]  Corey, M., Abbey, M., Dechichio, D. (1995).  Tuning Oracle 52.
[3]  Jain, R. (1991). The Art of Computer Systems Performance Analysis. 99
[4]  Loney, K. (1994).  Oracle DBA Handbook.   23.
[5]  Loukides, M., (1990) System Performance Tuning. 161-162.
[6]  Millsap, C. (1995). Oracle7 Server Space Management. 1-2.
[7]  Presley, D. (1993).  Data Block Corruption Detection. Oracle Corporation.
[8]  Silberschatz A., Galvin P. (1994)  Operating System Concepts. 404.
[9]  Stevens, W. (1990). UNIX Network Programming. 163.
[10] Velpuri, R. (1995).  Oracle Backup and Recovery Handbook. 286
 

Oracle ORA-00600 [25027] ORA-600 [25027]


Format: ORA-600 [25027] [a] [b]

VERSIONS:
versions 9.2 and above
DESCRIPTION:
An invalid Tablespace Number (TSN) and/or Relative File Number (RFN) has been found

ARGUMENTS:
Arg [a] Tablespace Number (TSN)
Arg [b] Decimal Relative Data Block Address (RDBA)

FUNCTIONALITY:
Kernel File management Tablespace component
IMPACT:
PROCESS FAILURE
POSSIBLE PHYSICAL CORRUPTION

 

SUGGESTIONS:

1. If the Arg [b] (the RDBA) is 0 (zero), then this could be due to fake indexes.

The following query will list fake indexes:

select do.owner,do.object_name, do.object_type,sysind.flags
from dba_objects do, sys.ind$ sysind
where do.object_id = sysind.obj#
and bitand(sysind.flags,4096)=4096;

 

If the above query returns any rows, check the objects involved and consider dropping them as they can cause this error.
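
The bitand(sysind.flags,4096)=4096 predicate in the query above is a plain bit test. A minimal Python sketch of the same check follows; the names FAKE_INDEX_FLAG and is_fake_index are hypothetical and exist only for illustration.

```python
FAKE_INDEX_FLAG = 4096  # the bit tested by bitand(sysind.flags, 4096) in the query


def is_fake_index(flags: int) -> bool:
    """Return True when the 4096 bit is set in an ind$.flags value."""
    return (flags & FAKE_INDEX_FLAG) == FAKE_INDEX_FLAG
```

Any flags value with that bit set matches, whatever other bits are present, so 4096 and 4097 both qualify while 8192 does not.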

Run analyze table validate structure on the table referenced in the Current SQL statement in
the related trace file.

If the Known Issues section below does not help in terms of identifying
a solution, please submit the trace files and alert.log to Oracle
Support Services for further analysis.
Known Issues:
Known Bugs

 

NB Bug Fixed Description
  14010183 11.2.0.3.BP22, 11.2.0.4.BP03, 12.1.0.2, 12.2.0.0 ORA-600 [ktspfundo:objdchk_kcbgcur_3] in SMON after failed temp segment merge load
  13503554 11.2.0.4, 12.2.0.0 Various ORA-600 errors crashing the apply process in a downstreams environment
  13785716 11.2.0.4, 12.1.0.1 Intermittent ORA-600 [25027] during upgrade from 10.2 to 11.2
  11661824 11.2.0.1.BP09 Assorted Dumps by SQL*LOADER using DIRECT and PARALLEL after exadata bp8 is applied
  10067246 12.2.0.0 ORA-600 [25027] ORA-7445 [kauxs_do_dml_cooperation] by CREATE INDEX ONLINE
  14138130 11.2.0.3.5, 11.2.0.3.BP13, 11.2.0.4, 12.1.0.1 SGA memory corruption / ORA-7445 when modifying uncompressed blocks of an HCC-compressed segment
  13330018 11.2.0.4, 12.1.0.1 ora-600 [ktspfmb_add1], [4294959240] occurred, then cannot recover with ora-600[25027]
  13103913 11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP03, 11.2.0.4, 12.1.0.1 ORA-600 [25027] [ts#] [1] or false ORA-1 during dml while index is being rebuilt online
  10394825 11.2.0.3, 12.1.0.1 ORA-600[25027] [..] [0] inserting to ASSM segment
  10329146 11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.1 Lost write in ASM with multiple DBWs and a disk is offlined and then onlined
+ 10209232 11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.1 ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
+ 9399991 11.1.0.7.5, 11.2.0.1.3, 11.2.0.1.BP04, 11.2.0.2, 12.1.0.1 Assorted Internal Errors and Dumps (mostly under kkpa*/kcb*) from SQL against partitioned tables
* 9145541 11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.1 OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
  8837919 11.2.0.2, 12.1.0.1 DBV / RMAN enhanced to detect ASSM blocks with ktbfbseg but not ktbfexthd flag set as in Bug 8803762
  8803762 11.1.0.7.6, 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 ORA-600[kdsgrp1], ORA-600[25027] or wrong results on 11g database upgrade from 9i
  8716064 11.2.0.2, 12.1.0.1 Analyze Table Validate Structure fails on ADG standby with several errors
+ 8597106 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 Lost Write in ASM when normal redundancy is used
  7251049 11.2.0.1.BP08, 11.2.0.2, 12.1.0.1 Corruption in bitmap index introduced when using transportable tablespaces
  8437213 10.2.0.4.3, 10.2.0.5, 11.1.0.7.7, 11.2.0.1 ASSM first level bitmap block corruption
  8356966 11.2.0.1 ORA-7445 [kdr9ir2rst] by DBMS_ADVISOR or false ORA-1498 by ANALYZE on COMPRESS table
* 8198906 10.2.0.5, 11.2.0.1 OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents
* 7263842 10.2.0.4.2, 10.2.0.5, 11.1.0.7.1, 11.2.0.1 ORA-955 during CTAS / OERI [ktsircinfo_num1] / dictionary inconsistency for PARTITIONED Tables
  6666915 10.2.0.5, 11.1.0.7, 11.2.0.1 OERI[25027] / dictionary corruption from concurrent partition DDL
  6025993 10.2.0.5, 11.1.0.6 ORA-600 [25027] in flashback archiving queries
  4925342 9.2.0.8, 10.2.0.3, 11.1.0.6 OERI [25027] / OERI [25012] on IOT analyze estimate statistics
* 7190270 10.2.0.4.1, 10.2.0.5 Various ORA-600 errors / dictionary inconsistency from CTAS / DROP
  4310371 9.2.0.8, 10.2.0.2 OERI [25027] from concurrent startup / shutdown in RAC
  4177651 10.2.0.1 Row migration within a MERGE may OERI[25027]
  4020195 10.1.0.5, 10.2.0.1 OERI 25027 can occur in RAC accessing transported tablespace
  4000840 9.2.0.7, 10.1.0.4, 10.2.0.1 Update of a row with more than 255 columns can cause block corruption
  3963135 10.1.0.5, 10.2.0.1 OERI[kcbgcur_3] / OERI:25027 during bitmap index updates
  3829900 10.1.0.4, 10.2.0.1 OERI[25027] possible accessing index in 10g
  2942185 9.2.0.6, 10.1.0.4, 10.2.0.1 Corruption occurs on direct path load into IOT with ADDED columns
  3085057 10.1.0.2 ORA-600: [25027] from ALTER TABLE .. SHRINK SPACE CASCADE
  2926182 9.2.0.5, 10.1.0.2 OERI[25027] / ORA-22922 accessing LOB columns in IOT in AFTER UPDATE trigger

 

Summary of Bugs Containing ORA-00600 [2662] ORA-600 [2662]

Purpose

The purpose of this Note is to explain bugs filed for the ORA-00600 [2662] error against specific Oracle database versions, and to explain the symptoms of each bug, workarounds if any, and the patch available at the time this article was written.
Scope
This article is a consolidated effort to summarize the top bugs reported for this error which have been fixed. It is directed towards Oracle Support Analysts and Oracle Customers to give an overview of the various bugs logged for the same error.
Error Description:

 

The ORA-600 [2662] is raised when data block SCN is ahead of the current SCN.
This is generally related to the redo application which is used to bring the database to a consistent state.
Summary of Bugs Containing ORA-00600 [2662]

Bug 4453449
Abstract: Flashback to guaranteed restore point in orphan inc may result in ORA-600[3020]
Versions affected: 10.2.0.1
Fixed in versions: 10.2.0.2 & 11.0
Backportable: Yes

Symptoms:

The symptoms of this bug include ORA-600[3020], ORA-600[2662] after flashback
database and ORA-600[flashback_validation] during flashback database.
There may also be other symptoms.

Details:
ORA-600[3020] / ORA-600 [2662] / ORA-600 [flashback_validation] can occur
after/during multiple flashback/recovery through multiple database resetlogs
without opening the database. There may also be other symptoms which appear as
recovery related corruption errors.
Workaround:
1. If you flashback a crashed primary database, follow flashback database with open
resetlogs. Alternatively, if you’d like to completely undo flashback database,
follow flashback database with recover database without shutting down the
instance first.
2. Restore backup and recover.
Patch details:
Currently there is no one-off patch available for any platform and versions.
Bug 2899477 (Unpublished)
Abstract:ORA-600[2662] CAUSES INSTANCE CRASH
Versions affected: 9.2.0.4
Fixed in versions: 9.2.0.4 & 10.1
Backportable: Yes
Symptoms:
When you have a corrupted SCN and if the corruption is found in selexe,
getting uninitialized selenv from opiexe, then this may be the bug.

 

One-off patch available for few platforms on top of 9.2.0.4
Check the Metalink for Patch 2899477 availability.
Bug 2764106
Abstract: ORA-600 [2662] BRINGS THE DATABASE DOWN
Versions affected: 8.1.7.4 & 9.2.0.4
Fixed in versions: 9.2.0.5 & 10.1
Backportable: Yes
Symptoms:
OERI(2662) is raised even though the dependent SCN present in the disk blocks is fine.
Details:
A false ORA-600 [2662] error can occur on SELECT operations
which can result in an instance crash even though there is no
underlying problem with the on disk SCN.
Workaround:
None
Patch details:
One-off patch available for few platforms on top of 8.1.7.4 & 9.2.0.4
Check the Metalink for Patch 2764106 availability.
Bug 2216823 (Unpublished)
Abstract:OERI(2662) REPORTED WHEN REUSING TEMPFILE WITH RESTORED DB
Versions affected: 9.2.0
Fixed in versions: 10.1.0
Backportable: No
Symptoms:
eg:
1. Create a TEMP tablespace.
2. Shutdown a database.
3. Copy control file, data files, and log files to another directory
(but not tempfile).
4. Restart a database.
5. Create a temporary table and insert into it, thereby causing tempfile
to be updated.
6. Shutdown a database.
7. Restore a database.
8. Restart a database.
9. Create a temporary table and insert into it.
10. Commit
^- ORA-600 [2662]
Details:
ORA-600 [2662] can occur when reusing a TEMPFILE with
a restored database.
Workaround:
The workaround is not to use the pre-existing tempfile.
Instead either backup the tempfile with rest of the database
or remove the tempfile then recreate a new tempfile once the
database is open.
Patch details:
Currently there is no one-off patch available for any platforms and versions
Bug 2054025 (Unpublished)
Abstract:ORA-600 [2662] RELATED TO KDIT.C
Versions affected: 9.0.1.2
Fixed in versions: 9.0.1.3 9.2.0.1
Backportable: No
Symptoms:
OERI:2662 possible on new TEMPORARY index block
Details:
ORA-600 [2662] possible on new TEMPORARY index block
Workaround:
None
Patch details:
Currently there is no one-off patch available for any platforms and versions
Bug 851959
Abstract : ORA-600 [2662] OCCURRED DURING CREATE SNAPSHOT AT MASTER SITE
Details :
It is possible to get ORA-600 [2662] caused by mis-adjustment of the Oracle7 SCN (in PARALLEL SERVER mode) when an Oracle8 instance selects from
it over a DBLINK
Version affected : 7.3.4.X
Fixed in version: 7.3.4.5
Workaround :
None
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 647927 (Unpublished)
Abstract : LOCK PROCESS DIES WITH ORA-600 [2662], [0], [40057943], [0], [40063994]
Version affected 8.0.4.X
Fixed in version : 8.0.4.2 8.0.5.0
Symptoms :
Digital Unix ONLY: OERI:2662 could occur under heavy load
Workaround :
None
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 5612217 (Unpublished)
Abstract : ORA-7445 [KDKBIN] LEADING TO ORA-600 [2662] DUE TO BUFFER CORRUPTION
Version affected : 9.2.0.X
Workaround :
None
Patch details :
One-off patch available for few platforms on top of 9.2.0.7
Check the Metalink for Patch 5612217 availability.
Bug 4599505 (Unpublished)
Abstract : ORA-600 [2662] error
Version affected : 10.2.0.X
Fixed in version : 11.0
Symptoms :
ORA-600[2662] after flashback database.
Workaround :
This problem may disappear by itself after the database has been opened for a while and its SCN has passed the SCN of the problematic block. This is
however not a guaranteed workaround
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 2998110
Abstract :ORA-600 [2662] LARGE QUERIES ON STANDBY WITH LOCALLY MANAGED TMP TBLSP
Version affected : 9.2.0.X 10.1.0.X
Fixed in version : 10.2
Symptoms :
The scn of the tempfiles is advanced but not on any other files
when the database is opened in read only mode.
Workaround :
1) Increase the sort_area_size to avoid sort on disk thus avoiding the use of the tempfiles
–OR–
2) After opening the database read only and BEFORE executing any queries
against the standby database, drop and recreate the tempfiles.
–OR–
3) If you are on 10.1 release you can set the following parameter:
_init_tempfile_on_open=TRUE
in your init.ora/spfile and bounce the database.
Setting this parameter will clear all tempfile bitmaps when the database is opened
so the database open may take a little longer.
Patch details :
Currently there is no one-off patch available for any versions/platforms.
This bug is fixed in 10.2 and is not backportable to previous releases.
Note 356583.1 has been linked to this scenario.
Bug 3517013 (Unpublished)
Abstract :OPEN DB RESETLOG AFTER FLASHBACK DB FAILS ORA-600 [KCLCHKBLK_4], [1904]
Symptoms :
1) The database was restored from backup and an incomplete recovery was performed.
2) The database was opened with resetlogs.
3) After the database is opened, the following errors start to appear:
ORA-00600 [kclchkblk_4]
ORA-00600 [2662]
4) Stack trace is:- kclchkblk kcbzib kcbgcur ktfbhget ktftfcload
Cause :
1)
Error, ORA-600[KCLCHKBLK_4], is signaled because the SCN in a tempfile block
is too high. The same reason caused the ORA-600[2662]s in the alert logs.
2)
This issue is because the tempfiles may not get reinitialized during open
resetlogs.
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Note 275902.1 has been linked to this scenario and solution
given under this note.

 

Many other bugs were filed with development for this issue.
Those bugs were not progressed due to:
- Lack of response from the customers
- One-time occurrences
- Vendor OS problem
Disclaimer :
This note contains most frequently hit bugs that can throw the error ORA-00600 [2662] . However the above mentioned are not the complete list of
bugs that can generate this error


ORA-00600 [2662] ORA-600 [2662] “Block SCN is ahead of Current SCN”


ERROR:

Format: ORA-600 [2662] [a] [b] [c] [d] [e]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:

A data block SCN is ahead of the current SCN.
The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN stored in a UGA variable.
If the SCN is less than the dependent SCN then we signal the ORA-600 [2662] internal error.

ARGUMENTS:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.

FUNCTIONALITY:

File and IO buffer management for redo logs

IMPACT:
INSTANCE FAILURE

POSSIBLE PHYSICAL CORRUPTION

SUGGESTIONS:

There are different situations where ORA-600 [2662] can be raised.

It can be raised on startup or during database operation.

If not using Parallel Server, check that 2 instances have not mounted the same database.
Check for SMON traces and have the alert.log and trace files ready to send to support.
Check the SCN difference [argument d]-[argument b].
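
The SCN difference can be worked out as follows. This is a sketch that assumes the classic layout in which the full SCN is wrap * 2^32 + base and that the error arguments are printed in decimal. The function names scn_value and scn_gap are hypothetical.

```python
def scn_value(wrap: int, base: int) -> int:
    """Combine SCN wrap and base into one number (assumes a 32-bit base)."""
    return (wrap << 32) + base


def scn_gap(cur_wrap: int, cur_base: int, dep_wrap: int, dep_base: int) -> int:
    """Return how far the dependent SCN ([c],[d]) is ahead of the current SCN ([a],[b])."""
    return scn_value(dep_wrap, dep_base) - scn_value(cur_wrap, cur_base)


# Arguments as they appear in an error such as
# ORA-600 [2662], [0], [40057943], [0], [40063994]
gap = scn_gap(0, 40057943, 0, 40063994)
print(gap)  # 6051
```

When both wrap values are equal, the result reduces to the simple [d]-[b] subtraction mentioned above. A small positive gap is the case where repeated startup attempts may be enough.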

If the SCNs in the error are very close, then try to shutdown and startup the instance several times.
In some situations, the SCN increment during startup may permit the database to open. Keep track of the number of times you attempted a startup.

If the Known Issues section below does not help in terms of identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further analysis.
Known Issues:

NB Bug Fixed Description
4453449 10.2.0.2, 11.1.0.6 OERI:3020 / corruption errors from multiple FLASHBACK DATABASE
5889016 Corruption / OERI during recovery
2899477 9.2.0.5, 10.1.0.2 Minimise risk of a false OERI[2662]
2764106 9.2.0.5, 10.1.0.2 False OERI[2662] possible on SELECT which can crash the instance
2216823 10.1.0.2 OERI [2662] reusing a TEMPFILE with a restored database
2054025 9.0.1.3, 9.2.0.1 OERI:2662 possible on new TEMPORARY index block
P 647927 8.0.4.2, 8.0.5.0 Digital Unix ONLY: OERI:2662 could occur under heavy load
851959 7.3.4.5 OERI:2662 possible from distributed OPS select

 

INTERNAL ONLY SECTION – NOT FOR PUBLICATION OR DISTRIBUTION TO CUSTOMERS
========================================================================
There were 2 forms of this error until 7.2.3:
Type I: 4/5 argument forms –
The SCN found on a block (dependent SCN) is ahead of the
current SCN. See below for this
Type II: 1 Argument (before 7.2.3 only):
Oracle is in the process of writing a block to a log file.
If the calculated block checksum is less than or equal to 1
(0 and 1 are reserved) ORA-600 [2662] is returned.
This is a problem generating an offline immediate log marker
(kcrfwg).
*NOT DOCUMENTED HERE*
Type I
~~~~~~
a. Current SCN WRAP
b. Current SCN BASE
c. dependent SCN WRAP
d. dependent SCN BASE
e. Where present this is the DBA where the dependent SCN came from.
From kcrf.h:
If the SCN comes from the recent or current SCN then a dba
of zero is saved. If it comes from undo$ because the undo segment is
not available then the undo segment number is saved, which looks like
a block from file 0. If the SCN is for a media recovery redo (i.e.
block number == 0 in change vector), then the dba is for block 0
of the relevant datafile. If it is from another database for a
distributed transaction then dba is DBAINF(). If it comes from a TX
lock then the dba is really usn<<16+slot.
Type II
~~~~~~~
a. checksum -> log block checksum – zero if none (thread # in old format)
—————————————————————————
Diagnosis:
~~~~~~~~~~
In addition to different basic types from above, there are different
situations where ORA-600 [2662] type I can be raised.
Getting started:
~~~~~~~~~~~~~~~~
(1) is the error raised during normal database operations (i.e. when the
database is up) or during startup of the database?
(2) what is the SCN difference [d]-[b] ( subtract argument ‘b’ from arg ‘d’)?
(3) is there a fifth argument [e] ?
If so convert the dba to file# block#
Is it a data dictionary object? (file#=1)
If so find out object name with the help of reference dictionary
from second database
(4) What is the current SQL statement? (see trace)
Which table is refered to?
Does the table match the object you found in previous step?
Be careful at this point: there may be no relationship between DBA in [e]
and the real source of problem (blockdump).
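
Converting the dba in argument [e] to file# and block# can be sketched as below. The sketch assumes the usual smallfile layout of a 10-bit relative file number followed by a 22-bit block number. It also decodes the TX-lock case quoted from kcrf.h above, where the value is usn<<16+slot. Both function names are hypothetical.

```python
def decode_dba(dba: int) -> tuple:
    """Split a data block address into (file#, block#).

    Assumes 10 bits of relative file number and 22 bits of block number.
    """
    return dba >> 22, dba & 0x3FFFFF


def decode_tx_lock_dba(value: int) -> tuple:
    """Split a TX-lock style value (usn<<16 + slot) into (usn, slot)."""
    return value >> 16, value & 0xFFFF


print(decode_dba(0x0040007A))  # (1, 122)
```

The example value is the rdba 0x0040007a (1/122) that appears in the ORA-600 [4000] trace excerpt elsewhere in this feed. A file# of 1 points at the SYSTEM tablespace, which is the data dictionary case mentioned in step (3).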
Deeper analysis:
~~~~~~~~~~~~~~~~
(1) investigate trace file:
this will be a user trace file normally but could be an smon trace too
(2) search for: ‘buffer’
(“buffer dba” in Oracle7 dumps, “buffer tsn” in Oracle8/Oracle9 dumps)
this will bring you to a blockdump which usually represents the
‘real’ source of OERI:2662
WARNING: There may be more than one buffer pinned to the process
so ensure you check out all pinned buffers.
-> does the blockdump match the dba from e.?
-> what kind of blockdump is it?
(a) rollback segment header
(b) datablock
(c) other
Check list and possible causes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If Parallel Server check both nodes are using the same lock manager
instance & point at the same control files.
Possible causes:
(1) doing an open resetlogs with _ALLOW_RESETLOGS_CORRUPTION enabled
(2) a hardware problem, like a faulty controller, resulting in a failed
write to the control file or the redo logs
(3) restoring parts of the database from backup and not doing the
appropriate recovery
(4) restoring a control file and not doing a RECOVER DATABASE USING BACKUP
CONTROLFILE
(5) having _DISABLE_LOGGING set during crash recovery
(6) problems with the DLM in a parallel server environment
(7) a bug

 

Solutions:
(1) if the SCNs in the error are very close, attempting a startup several
times will bump up the dscn every time we open the database even if
open fails. The database will open when dscn=scn.
(2)You can bump the SCN either on open or while the database is open
using Event:ADJUST_SCN (see Note:30681.1).
Be aware that you should rebuild the database if you use this
option.
Once this has occurred you would normally want to rebuild the
database via exp/rebuild/imp as there is no guarantee that some
other blocks are not ahead of time.
Articles:
~~~~~~~~~
Solutions:
Note:30681.1 Details of the ADJUST_SCN Event
Note:1070079.6 Alter System Checkpoint
Possible Causes:
Note:1021243.6 CHECK INIT.ORA SETTING _DISABLE_LOGGING
Note:41399.1 Forcing the database open with `_ALLOW_RESETLOGS_CORRUPTION`
Note:851959.9 OERI:2662 DURING CREATE SNAPSHOT AT MASTER SITE
Known Bugs:
~~~~~~~~~~~
Fixed In. Bug No. Description
———+————+—————————————————-
7.1.5 BUG:229873
7.1.3 Bug:195115 Miscalculation of SCN on startup for distributed TX ?
7.1.6.2.7 Bug:297197 Port specific Solaris OPS problem
7.3 Bug:336196 Port specific IBM SP AIX problem -> dlm issue
7.3.4.5 Bug:851959 OERI:2662 possible from distributed OPS select
Not fixed Bug:2216823 OERI:2662 reported when reusing tempfile with restored DB
8.1.7.4 Bug:2177050 OERI:729 space leak possible (with tags “define var info”/”oactoid info”)
can corrupt UGA and cause OERI:2662


 

Oracle ORA-600[4000] ORA-00600[4000]

Applies to:
Oracle Server – Enterprise Edition – Version: 8.1.7.4 to 11.1.0.7
Information in this document applies to any platform.

Purpose

Symptoms

Database fails to start because of ora-600[4000].

Alert.log will show:

Errors in file /oracle/admin/sdwh/udump/sdwh_ora_13186.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [1], [], [], [], [], [], []
Tue Sep 9 14:48:04 2008
Error 704 happened during db open, shutting down database
sdwh_ora_13186.trc shows:
*** 2008-09-09 15:33:26.194
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [1], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
..
..
row cache parent object: address=0xc9efb27c cid=3(dc_rollback_segments)
hash=35e74caf typ=5 transaction=(nil) flags=00000001
own=0xc9efb2f0[0xc7c83ba0,0xc7c83ba0] wat=0xc9efb2f8[0xc9efb2f8,0xc9efb2f8] mode=S
status=EMPTY/-/-/-/-/-/-/-/-
data=
00000001 ….
BH (0x0x6ffff4ac) file#: 1 rdba: 0x0040007a (1/122) class 1 ba: 0x0x6ff8a000
set: 17 dbwrid: 0 obj: 18 objn: 18
hash: [74ffdc70,c85d94cc] lru: [6ffffad4,c771aabc]
ckptq: [NULL] fileq: [NULL]
use: [c84043f0,c84043f0] wait: [NULL]
st: XCURRENT md: SHR rsop: 0x(nil) tch: 0

 

LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [255] RRBA: [0x0.0.0]
Using State Objects
—————————————-
SO: 0xc84043d0, type: 24, owner: 0xc722382c, flag: INIT/-/-/0x00
(buffer) (CR) PR: 0x0xc71d1440 FLG: 0x500400
lock rls: 0x(nil), class bit: 0x(nil)
kcbbfbp: [BH: 0x0x6ffff4ac, LINK: 0x0xc84043f0]
where: kdswh02: kdsgrp, why: 0
buffer tsn: 0 rdba: 0x0040007a (1/122)
scn: 0x0000.15ad85b0 seq: 0x01 flg: 0x06 tail: 0x85b00601
frmt: 0x02 chkval: 0xabfc type: 0x06=trans data
Block header dump: 0x0040007a
Object id on Block? Y
seg/obj: 0x12 csc: 0x00.15ad85ad itc: 1 flg: – typ: 1 – DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0001.027.000056dc 0x0080d065.16f2.14 –U- 1 fsc 0x0000.15ad85b0
Trace file shows _SYSSMU1$ has a TX against obj$, and the scn of the block touched by this TX is scn:
0x0000.15ad85b0 –> 363693488 decimal.
The ora-600[4000] could be raised at startup if the above scn is ahead of the database SCN.
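
The hexadecimal-to-decimal conversion of the block SCN can be checked with a short sketch. It assumes the wrap.base notation used in block dumps, with a 32-bit base. The function name parse_hex_scn is hypothetical.

```python
def parse_hex_scn(text: str) -> int:
    """Convert an SCN printed as 'wrap.base' in hex (e.g. 0x0000.15ad85b0) to decimal."""
    wrap_hex, base_hex = text.split(".")
    # The full SCN is wrap * 2^32 + base (assumes a 32-bit base).
    return (int(wrap_hex, 16) << 32) + int(base_hex, 16)


print(parse_hex_scn("0x0000.15ad85b0"))  # 363693488
```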
Last Review Date
October 3, 2008
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are
included in the document to assist in troubleshooting.
Troubleshooting Details
1) Find database SCN
SQL> startup mount
SQL> select checkpoint_change# from v$database;
2) SQL> select ceil(&decimal_scn_expected/1024/1024/1024) from dual;
3) set parameter _minimum_giga_scn=<result from 2> in the init.ora file.
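
The arithmetic in step 2 is a ceiling division by 1024^3. The sketch below reproduces it for the values from the trace example in this note (database SCN 355532971, TX SCN 363693488). The function name minimum_giga_scn is hypothetical and only mirrors the parameter it computes a value for.

```python
GIGA = 1024 ** 3


def minimum_giga_scn(expected_scn: int) -> int:
    """Equivalent of ceil(expected_scn/1024/1024/1024), using integer arithmetic."""
    return -(-expected_scn // GIGA)


database_scn = 355532971  # checkpoint_change# from v$database in the example
tx_scn = 363693488        # SCN of the block touched by the transaction

if database_scn < tx_scn:
    # The block SCN is ahead of the database SCN, so a bump is needed.
    print(minimum_giga_scn(tx_scn))  # 1
```

Integer arithmetic is used rather than floating-point division so that the result stays exact for very large SCN values.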
Using the above trace file example, we found:
SQL> select checkpoint_change# from v$database;
355532971
As suspected the database scn = 355532971 is lower than TX scn=363693488.
SQL> select ceil(&decimal_scn_expected/1024/1024/1024) from dual;
Enter value for decimal_scn_expected: 363693488
old 1: select ceil(&decimal_scn_expected/1024/1024/1024) from dual
new 1: select ceil(363693488/1024/1024/1024) from dual
CEIL(363693488/1024/1024/1024)
——————————
1
3) set parameter _minimum_giga_scn=1 in the init.ora file.
4) Startup database
SQL> startup mount
SQL> recover database
SQL> alter database open;
5) If database opens:
– remove parameter _minimum_giga_scn from init.ora and bounce database
SQL> shutdown immediate
SQL> startup
6) Investigate what could have caused the ora-600[4000]. One possibility is that the customer forced the database open
using _allow_resetlogs_corruption; if this is the case, we strongly suggest taking a full export and recreating the database
from scratch.

 

 

ORA-600 [4000] “trying to get dba of undo segment header block from usn”

Format: ORA-600 [4000] [a]
VERSIONS:
version 6.0 to 9.2
DESCRIPTION:
This has the potential to be a very serious error.
It means that Oracle has tried to find an undo segment number in the
dictionary cache and failed.
ARGUMENTS:
Arg [a] Undo segment number
FUNCTIONALITY:
KERNEL TRANSACTION UNDO
IMPACT:
INSTANCE FAILURE – Instance will not restart
STATEMENT FAILURE
SUGGESTIONS:
As per Note 1371820.8, this can be seen when executing DML on tables residing
in tablespaces transported from another database.
It is fixed in 8.1.7.4, 9.0.1.4 and 9.2.0.1. The workaround, however, is to
create more rollback segments in the target database until the highest
rollback segment number (select max(US#) from sys.undo$;) is at least
as high as the equivalent max(US#) from the source database.
It has also been seen where memory has been corrupted so try shutting
down and restarting the instance.
If the database will not start contact Oracle Support Services
immediately, providing the alert.log and associated trace files

 

NB Bug Fixed Description
* 9145541 11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.0 OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
+ 10425010 11.2.0.3, 12.1 Stale data blocks may be returned by Exadata FlashCache
12353983 ORA-600 [4000] with XA in RAC
7687856 11.2.0.1 ORA-600 [4000] from DML on transported ASSM tablespace
2917441 11.1.0.6 OERI [4000] during startup
3115733 9.2.0.5, 10.1.0.2 OERI[4000] / index corruption can occur during index coalesce
2959556 9.2.0.5, 10.1.0.2 STARTUP after an ORA-701 fails with OERI[4000]
1371820 8.1.7.4, 9.0.1.4, 9.2.0.1 OERI:4506 / OERI:4000 possible against transported tablespace
+ 434596 7.3.4.2, 8.0.3.0 ORA-600[4000] from altering storage of BOOTSTRAP$

 

Bug 1362499
ORA-600 [4000] after migrating 7.3.4.3 to 8.0.6.1 on HP-UX 32-bit
Specific to HP-UX, fixed in one-off patch

 

Historic info on the Oracle 7.3.x issues re unlimited extents and bootstrap$
In 7.3.4, due to Bug:434596, this error can result from altering the
SYS.BOOTSTRAP$ table.
When a SHUTDOWN command follows such a change, the database will not start up again.
Example: Any of the following modifications of SYS.BOOTSTRAP$
will cause this error:
ALTER TABLE BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED );
ALTER TABLE BOOTSTRAP$ STORAGE (NEXT 1024);
ALTER TABLE SYS.BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED);
ALTER TABLE sys.BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED);
A lock byte is then set on the SYS.BOOTSTRAP$ segment header and,
following shutdown, the database will not start.
A select from bootstrap$ before shutdown will clean out the lock on
the SYS.BOOTSTRAP$ segment header and prevent the errors from occurring.
Example: Issue the following BEFORE shutdown:
sql> select count(*) from sys.bootstrap$;
Get a backup history of the database(s) and the exact sequence of steps performed.
Two possible options:
a) Go back to a backup taken before the storage clause on BOOTSTRAP$ was changed
b) Oracle Support may be able to patch bootstrap$. See Note:43132.1
Option a) is always the way to go if at all possible.
Articles:
ALERT about changing MAXEXTENTS to UNLIMITED Note:50380.1
Another cause of an ORA-600 [4000] is that a block SCN is ahead of the database SCN.
In that case the block with the high SCN could be printed in the trace file, and
Event ADJUST_SCN or parameter _MINIMUM_GIGA_SCN (Note:552438.1) can be used to bump the SCN.


Oracle ORA-600 [4194] “Undo Record Number Mismatch While Adding Undo Record”


ERROR:

Format: ORA-600 [4194] [a] [b]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:
A mismatch has been detected between Redo records and rollback (Undo) records.

We are validating the Undo record number relating to the change being
applied against the maximum undo record number recorded in the undo block.
This error is reported when the validation fails.

ARGUMENTS:
Arg [a] Maximum Undo record number in Undo block
Arg [b] Undo record number from Redo block

FUNCTIONALITY:
Kernel Transaction Undo called from Cache layer

IMPACT:
PROCESS FAILURE
POSSIBLE ROLLBACK SEGMENT CORRUPTION

 

NB Bug Fixed Description
8240762 10.2.0.5, 11.1.0.7.10, 11.2.0.1 Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction
3210520 9.2.0.5, 10.1.0.2 OERI[kjccqmg:esm] / OERI[4194] / corruption possible in RAC
+ 792610 8.0.6.0, 8.1.6.0 Rollback segment corruption OERI:4194 can occur if block checking detects a corrupt block

 

Historic information:
7.3.3 to 8.1.5
==============
Note:69863.1 ALERT: Apparent data corruptions involving Solaris 2.6,
ISM & DR on Starfire
Check USE_ISM parameter on SUN Solaris E10000 Platforms.
ORA-600 [4194] [a] [b]
Versions: 6.0 – 9.2 Source: ktuc.c
===========================================================================
Meaning:
Undo record number mismatch while adding an undo record to an undo
block. This is done by the application of redo.
—————————————————————————
Argument Description:
a. (ktubhcnt): undo record count: This is the maximum number of undo
   records that have ever existed within this Undo Block. In other
   words, it is the High Water Mark for undo records in that undo block.
   This is from the Undo Block.
b. (ktudbrec): redo record number: This is the record number for the
   new undo record that is to be added to the undo block. It should be
   one greater than the maximum in the undo block currently. This is from
   the Redo Record.
—————————————————————————
Diagnosis:

 

This error is raised in kturdb which handles the adding of undo records
by the application of redo.
When we try to apply redo to an undo block (forward changes are made by
the application of redo to a block), we check that the number of undo
records in the undo block +1 matches the record number in the redo
record. Because we are adding a new undo record, we know that the record
number in that undo block must be one greater than the maximum number in
that block.
So for UBA=0x08000592.00a0.0b
0x08000592 is the dba of the undo block.
0x00a0 is the seq# number that is in the block that THIS UNDO IS TO
BE APPLIED TO.
0x0b is the number of undo records in the undo block.
In the header this looks like:
UNDO BLK::
xid: 0x0004.00e.0000017f seq: 0x00a0 cnt: 0x0b ……..
Since we are adding a new undo record to our undo block, we would expect
that the new record number is equal to the maximum record number in the
undo block +1. If this is not the case, we get ORA 600 [4194].
This implies some kind of block corruption in either the redo or the
undo block. Look for other errors that would imply that a block is
corrupted.
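
The UBA breakdown and the "+1" rule above can be sketched as follows. This is only an illustration of the arithmetic described in this note, not Oracle code; the names `UBA`, `parse_uba` and `undo_record_number_ok` are hypothetical.

```python
from typing import NamedTuple


class UBA(NamedTuple):
    dba: int  # data block address of the undo block
    seq: int  # seq# recorded in the undo block
    rec: int  # number of undo records in the undo block (cnt)


def parse_uba(text: str) -> UBA:
    """Split a UBA such as '0x08000592.00a0.0b' into its three hex fields."""
    dba, seq, rec = text.split(".")
    return UBA(int(dba, 16), int(seq, 16), int(rec, 16))


def undo_record_number_ok(block_cnt: int, redo_rec: int) -> bool:
    """True when the redo record number is exactly one above the block's
    current maximum; a False result corresponds to the ORA-600 [4194] case."""
    return redo_rec == block_cnt + 1


uba = parse_uba("0x08000592.00a0.0b")
print(hex(uba.dba), hex(uba.seq), uba.rec)
print(undo_record_number_ok(uba.rec, uba.rec + 1))
```

In terms of the error arguments, `block_cnt` plays the role of Arg [a] and `redo_rec` the role of Arg [b].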
Note: If the ORA-600 [4194] follows another ORA-600 AND IF AND ONLY IF
the arguments [a] and [b] are the same, then this MAY be due
to Bug:792610 which can cause undo corruption following a
failed block change.
Note:452620.1 has a procedure to patch this inconsistency when the problem
is produced in the SYSTEM rollback segment
—————————————————————————
Known Bugs: (Those bugs that are fixed after version 7.0.12.0.0.
Bugs must be closed or hold useful information.)
Fixed In. Bug No. Description
———+————+—————————————————-
8.0.6/8.1.6 Bug:792610 ORA-600 during redo application to a block may
in turn cause an OERI:4194 on the undo block.
E.g., block checking noticing a corrupt index
block during a multi-row insert.
7.1.5 Bug:239671 Truncate (could possibly happen on other
operations too) on 16k+ block size can cause
the maximum number of undo records in a block
(255) to be exceeded.

Workarounds: Use < 16K blocksize, or avoid
using the TRUNCATE command with the DROP
STORAGE option (which is the default).

Oracle ORA-00600 [4193] ORA-600 [4193] “seq# mismatch while adding undo record”


Format: ORA-600 [4193] [a] [b]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:
A mismatch has been detected between Redo records and Rollback (Undo) records.
We are validating the Undo block sequence number in the undo block against the Redo block sequence number relating to the change being applied.

This error is reported when this validation fails.
ARGUMENTS:

Arg [a] Undo record seq number
Arg [b] Redo record seq number

FUNCTIONALITY:

KERNEL TRANSACTION UNDO

IMPACT:

PROCESS FAILURE
POSSIBLE ROLLBACK SEGMENT CORRUPTION

 

This error may indicate a rollback segment corruption.
This may require a recovery from a database backup depending on the situation.

 

NB Bug Fixed Description
14034244 11.2.0.3.BP09, 12.1.0.0 Lost write type corruption using ASM in 11.2.0.3
8240762 10.2.0.5, 11.1.0.7.10, 11.2.0.1 Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction

 

ORA-600 [4193] [a] [b] [ ] [ ] [ ]
Versions: 7.2.2 – 9.2.0 Source: ktuc.c
===========================================================================
Meaning: seq# mismatch while adding an undo record to an undo block. This
is done by the application of redo.
—————————————————————————
Argument Description:
a. (ktubhseq): undo record seq#: this is the seq# of the block that
   this undo record WILL BE APPLIED TO. This is from the Undo Block.
   It is NOT the seq# of the undo block itself.
b. (ktudbseq): redo RECORD seq#: this is the seq# number in the block
   that this redo WILL BE APPLIED TO. This is from the Redo Record.
—————————————————————————
Diagnosis:
This error is raised in kturdb which handles the adding of undo records
by the application of redo.
When we try to apply redo to an undo block (forward changes are made by
the application of redo to a block) we check that the seq# in the undo
record matches the seq# in the redo record. These seq# should be the
same because when we apply a redo record we must apply it to the
correct version of the block. We can only apply a redo record to a
block that contains the same seq# as in the redo record.
If the seq# do not match then this error is raised. This implies some
kind of block corruption in either the redo or the undo block.
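
The validation described above can be modelled in a few lines. This is a minimal sketch, not Oracle source: the two classes are illustrative stand-ins that carry only the fields named in this note (ktubhseq, ktubhcnt, ktudbseq, ktudbrec), and running the seq# check before the record-number check (the one described for ORA-600 [4194]) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UndoBlockHeader:
    """Illustrative stand-in for the undo block header fields."""
    seq: int  # ktubhseq
    cnt: int  # ktubhcnt: high water mark of undo records


@dataclass
class RedoUndoOp:
    """Illustrative stand-in for the redo-side undo operation fields."""
    seq: int  # ktudbseq
    rec: int  # ktudbrec: record index of the undo record being added


def check_redo_against_undo(ubh: UndoBlockHeader,
                            db: RedoUndoOp) -> Optional[Tuple[int, int, int]]:
    """Return (error, arg_a, arg_b) for a mismatch, or None when consistent."""
    if ubh.seq != db.seq:
        return (4193, ubh.seq, db.seq)  # seq# mismatch
    if db.rec != ubh.cnt + 1:
        return (4194, ubh.cnt, db.rec)  # undo record number mismatch
    return None


ubh = UndoBlockHeader(seq=0xA0, cnt=0x0B)
print(check_redo_against_undo(ubh, RedoUndoOp(seq=0xA1, rec=0x0C)))
```

The returned tuple mirrors the layout of the error arguments, with the undo-block value first and the redo-record value second.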
7.3.x – 8.1.7.x
ASSERT2(ubh->ktubhseq == db->ktudbseq, OERI(4193), KSESVSGN,
ubh->ktubhseq, db->ktudbseq);
9.2.x
ksesic2(OERI(4193), ksenrg(ubh->ktubhseq), ksenrg(db->ktudbseq));
struct ktubh
{
  kxid ktubhxid;    /* txid of tx currently using or last used this block */
  ub2  ktubhseq;    /* undo block sequence number */
  ub1  ktubhcnt;    /* high water mark record index, number of undo entries */
  ub1  ktubhirb;    /* rollback record index, rec index to start the rollback */
  ub1  ktubhicl;    /* collecting record index, rec index to start retrieving col info */
  ub1  ktubhflg;    /* dummy */
  ub2  ktubhidx[1]; /* byte offset of record in block, grows at runtime */
};
struct ktudb /* Kernel Transaction Undo Data operation Block (redo) */
{
  ub2  ktudbsiz;    /* size of entry */
  ub2  ktudbspc;    /* verification: space left in undo block */
  ub2  ktudbflg;    /* flag to indicate the kind of redo operation */
  kxid ktudbxid;    /* current tx id */
  ub2  ktudbseq;    /* block sequence number */
  ub1  ktudbrec;    /* new record index for this change */
};
Note 452620.1 has a procedure to patch this inconsistency when the problem
is produced in the SYSTEM rollback segment
Articles:
None
—————————————————————————

Oracle ORA-600 [4097] ORA-00600 [4097] “Corruption”


ERROR:

Format: ORA-600 [4097]

VERSIONS: versions 7.3 to

DESCRIPTION:

We are accessing a rollback segment header to see if a transaction has been committed.

However, the xid given is in the future of the transaction table.

This could be due to a rollback segment corruption issue OR you might be hitting the following known problem.

FUNCTIONALITY: Rollback

IMPACT:

If known issue (see below) this might cause missing data.
Otherwise, this could be a possible rollback segment corruption issue.

Known Bugs

NB Bug Fixed Description
13340388 11.2.0.3.3, 11.2.0.3.BP07, 12.1.0.0 ORA-600 [kzaxpopr14 -Error in decoding xml text] when querying V$XML_AUDIT_TRAIL
10249791 11.2.0.2.BP02, 11.2.0.2.7, 11.2.0.3, 12.1.0.0 OERI SECUREFILE TRANSPORT on DMLS referencing SECUREFILE plugged
* 9145541 11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.0 OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
8565708 11.2.0.1.BP04, 11.1.0.7.2, 11.2.0.1.1, 11.2.0.2, 12.1.0.0 OERI[4097] after using distributed transactions in RAC
7687856 11.2.0.1 ORA-600 [4000] from DML on transported ASSM tablespace
P 1885251 9015PSE, 9.2.0.1 Tru64: OERI:4097 possible on RAC / OPS
* 427389 7.3.3.3, 7.3.4.0, 8.0.3.0 Drop of Rollback segments can cause OERI:4097 / missing data

Bugs listed with a fixed version only:
5653641 11.2.0.1
3613078 9.2.0.6, 10.1.0.3
3249755 9.2.0.5, 10.1.0.2
2628232 9.2.0.4, 10.1.0.2
2165601 8.1.7.4, 9.0.1.3, 9.2.0.1

Descriptions listed without a bug number:
ORA-600 [4097] / ORA-600 [4000] reported using transportable tablespaces
Corrupt dictionary from DROP TABLESPACE containing _offline_rollback_segments
OERI[4097] from DML on TRANSPORTED tables with ASSM
Block corruption possible on temp files
ORA-600's from CR served block from a plugged in tablespace
OERI:4097 possible on objects in read only transported tablespace

'*' against a bug indicates that an alert exists for that issue. '+' indicates a particularly notable bug.
'P' indicates a port specific bug.
'@' indicates UNPUBLISHED information

Fixed versions use "BPnn" to indicate Exadata bundle nn. "OERI:xxxx" may be used as shorthand for ORA-600 [xxxx].

Some historic info….

Upgrade/install a patchset to bring the database to one of the following levels: 7.3.3.3, 7.3.4.0, 8.0.3.0
To avoid encountering this bug, rollback segments should only be dropped and recreated after the instance has been shut down normal and restarted. If you have already encountered the bug, use the following workaround:
Possible workaround:
- Drop all rollback segments, except for SYSTEM
- Create the same number of rollback segments, small ones, with different names
- Recreate the original rollback segments
- Drop the small dummy rollback segments
Every time you need to add a rollback segment, first create all of the dummy segments again, to make sure they use up the old segment numbers. Then create the new segment, then drop all dummy segments.
If you are getting this error for a reason other than the above bug (see Description for how you could run into the bug), then you might have a rollback segment corruption issue. Typical causes are media corruption to the rollback segment blocks; check your hardware. To work around a rollback segment corruption problem (not caused by the known bug above), log the issue with Oracle Support.

ORA-600 [4097]
Versions: 7.1.3 - 7.3.2 Source: ktu.c
===========================================================================
Meaning:
We are accessing a rollback segment header to see if a transaction has
been committed. However, the xid given is in the future of the
transaction table, i.e. the WRAP of the XID is higher than
the current WRAP number on the RBS header.
---------------------------------------------------------------------------
Argument Description:
No arguments.
---------------------------------------------------------------------------
Diagnosis:
This should be considered as a corruption.

1. Try to identify which object has this TX in its ITL list (see the trace file).
2. If this object is recreatable, that may be an option, but we can't be sure whether it is the TX table that is too old or the block holding the ITL that is corrupt.
3. You MAY be able to recreate the RBS. It is safest to force cleanout of all blocks before recreating the RBS (by FTS and recreating indexes).

Typical causes are media corruption to the data or RBS blocks, especially lost writes to the RBS header.
This is also possible if rollback segments are recreated after a shutdown abort. See Bug:427389 for details & options. In this case no data is corrupt; the rollback segments are just out of step.

Description and Workarounds for Bug:427389: Note:1011003.102
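
The "xid in the future" condition given under Meaning can be sketched as a comparison of wrap numbers. This is an illustration only: it assumes the dotted XID notation seen in block dumps (usn.slot.wrap, all hexadecimal, e.g. 0x0004.00e.0000017f), and the names `Xid`, `parse_xid` and `xid_in_future` are hypothetical.

```python
from typing import NamedTuple


class Xid(NamedTuple):
    usn: int   # undo (rollback) segment number
    slot: int  # slot in the transaction table
    wrap: int  # wrap (sequence) number of that slot


def parse_xid(text: str) -> Xid:
    """Parse an XID written as 'usn.slot.wrap' in hexadecimal."""
    usn, slot, wrap = text.split(".")
    return Xid(int(usn, 16), int(slot, 16), int(wrap, 16))


def xid_in_future(xid: Xid, current_wrap: int) -> bool:
    """True when the XID's wrap is higher than the wrap currently recorded
    on the rollback segment header, i.e. the ORA-600 [4097] condition."""
    return xid.wrap > current_wrap


xid = parse_xid("0x0004.00e.0000017f")
# Hypothetical header wrap that is older than the XID's wrap.
print(xid_in_future(xid, current_wrap=0x17C))
```

A rollback segment that was dropped and recreated starts again from low wrap numbers, which is how an ITL entry written against the old segment can appear to be in the future.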
