Retrieve the data from Corrupted oracle table partition

March 1, 2017, 9:54 pm


User Case:

I would like to know whether we can recover data from corrupted partitions. In this case the whole partition is not corrupted; only some of the blocks in those partitions are. Unfortunately, no backup was enabled. Could you please let me know whether there is any option to recover those blocks? The data is very critical for us.

Could you also let me know whether there is any way to select the records from the table?

When I try to select fields from the table (which is partitioned, with corrupted blocks in those partitions), it throws the error "object no longer exists".

Is there any way I can select particular column values?

We only need to get the files from that table. Without running a select statement we cannot know which files we need to reload. That is why I asked whether there is any option to select particular records alone.

Feedback:

Our software PRM-DUL can load a corrupted partition, whether data blocks are corrupted or the segment header is lost or corrupted. We can provide PRM-DUL as a product or as a recovery service.

"object no longer exists" is the ORA-08103 error. It means your segment has wrong extent map information. Any select that touches the affected records raises the error and stops your query. You can unload these records with PRM-DUL and rebuild your partition.

You don't need to get files or run any select statement if you use PRM-DUL. The software loads data directly from the Oracle datafiles, without an Oracle instance or SQL statements.

↧

Oracle Data Recovery Query

March 1, 2017, 10:07 pm


user case:

We have worked with the demo (community edition) of your PRM-DUL Oracle database recovery/unloader software. We intend to purchase this product immediately if you can clear up the doubts below.

1. Can we unload the recovered data starting from the most recent data (or based on a date)?

2. We need more clarification on the term "per-database licensing". For example, our company has 3 distributed Oracle DB instances, and we use the same “Net Service Name” for all of them, but with different Oracle instances (e.g. TGORCL or TGORCL.COM). Can one DB license cover this case?

Kindly reply or call back as early as possible. Waiting for your reply.

Feedback:

1. Can we unload the recovered data starting from the most recent data (or based on a date)?

No, we can't. PRM-DUL can only unload a whole table, not a range.

2. We need more clarification on the term "per-database licensing". For example, our company has 3 distributed Oracle DB instances, and we use the same “Net Service Name” for all of them, but with different Oracle instances (e.g. TGORCL or TGORCL.COM). Can one DB license cover this case?

It depends on the DB_NAME parameter. For example, if the 3 databases have the same DB_NAME, they can share a license key.

↧

Need help with corrupted Oracle Datapump export

March 1, 2017, 10:12 pm


I have a corrupted Datapump export (Oracle 11g). How do I use your tool to extract 2 tables from it?

I have downloaded DUL4108.

feedback:

We can provide a data recovery service for corrupted Data Pump exports. Extracting data from a Data Pump file is not a packaged function in PRM-DUL 4108.

What is the size of the export? Could you please send us the Data Pump file?

↧

PRM-DUL capabilities

March 1, 2017, 10:24 pm


I am a technology consultant working in Israel.

One of my major customers is looking for a solution to unload tables from an Oracle 11g database to files in the most cost-effective way, in terms of product price, performance and flexibility (the ability to use the tool through configuration, without writing code).

Regrettably, all the documentation I found on your site is in Chinese. I need an English document that describes the capabilities.

Can you send one to me? If the solution looks good, we can proceed.

feedback :

All English documents for PRM-DUL are on the website: http://www.parnassusdata.com/en ; the links are at the bottom of the page.

You can download them for reference.

For example: Oracle PRM-DUL Whitepaper, ParnassusData Recovery Manager For Oracle Database User Guide V0.4

http://7xl1jo.com2.z0.glb.qiniucdn.com/ORACLE%20PRM-DUL%20data%20unloade...

↧

Oracle data recovery without any system file.

March 1, 2017, 10:36 pm


I have dbf files without any system file. I tried but could not recover the data properly from the attached files.

Can you please help?

feedback:

Thank you for visiting the Parnassusdata website and asking for help. This is PD services.

You can first download the PRM-DUL product and work on your dbf files in non-dictionary mode.

The guide below explains how to use the tool for dbf file recovery.

http://7xl1jo.com2.z0.glb.qiniucdn.com/ParnassusData%20Recovery%20Manager%20For%20Oracle%20Database%20User%20Guide%20V0.3.pdf

If you can rescue your data with our product, you can consider buying it.

Our remote online service is not free, but if you need our advanced services, we can discuss this further.

We can find some data in your dbf files via PRM-DUL (http://7xl1jo.com2.z0.glb.qiniucdn.com/DUL3206.zip); you can try it too.

Please let me know your country and your database character set. Once the database character set is known, you can avoid loading the data in the wrong format.

You can also find me on Skype: liu.maclean@gmail.com

↧

Help to recover Oracle database

March 14, 2017, 12:10 am


Question :

We lost our Oracle database and could not recover it using Oracle recovery methods. We tried your product in a test environment but could not recover anything; the tool did not work when we selected the datafiles and tried to recover.

Before buying the licence we need to know whether the database can be recovered. We need some help to run the test and decide whether to buy the product.

To explain the situation: the server hosting the database suffered disk damage, so the SYSTEM tablespace is corrupt; we cannot read it and cannot start up the database. In addition, the backup of the database was lost, and to top it off the export backup was damaged too. In conclusion, we only have the datafiles to recover data from.

Answer:

Please send us the prm.log and, if possible, your SYSTEM tablespace datafile.

PRM can at least process your SYSTEM datafile, as shown in this screenshot:

[Screenshot: PRM processing the SYSTEM datafile]

prm.log shows:

Error when process inserting data into prm_files or prm_work_mode: Database './dbinfo/parnassus_dbinfo_DB_20170312094809' not found.

I think you may not have permission to write to this directory (PRM_HOME/dbinfo/parnassus_dbinfo_DB_20170312094809). Could you please try as the root user?

Yes, we think PRM can recover this case. I will send you a payment link if you are happy to purchase.

All English documents for PRM-DUL are on the website: http://www.parnassusdata.com/en ; the links are at the bottom of the page.

↧

oracle Tablespace deleted issue

March 14, 2017, 12:16 am


A tablespace had 7 datafiles with the wrong extension (.dfb instead of .dbf). The DBA physically deleted these files, and the tablespace is now offline.

The problem now is that the tablespace cannot be brought online, and we don't have a backup of it.

How can we solve this issue?

Please guide.

Answer:

Yes, we can try to scan Oracle blocks directly from the filesystem with the prm-scan tool. Please shut down your database first, then take the filesystem offline or remount it read-only.

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

↧

How to Resolve ORA-00600 [3020] When Allow 1 Corruption Does Not Work

March 14, 2017, 8:15 pm


Oracle Server - Enterprise Edition - Version: 9.2.0.4 and later [Release: 9.2 and later ]

Information in this document applies to any platform.

Goal

This article helps users resolve ORA-00600 [3020] when:

> Restore and recovery of the datafile gives the same error.

> Allow 1 corruption does not work.

> The customer has no backup of the problematic datafile.

Warning :-

These steps should not be used on SYSTEM or UNDO datafiles, as they would cause data/dictionary inconsistency.

The option to resolve this issue is to corrupt the blocks (when recovering) and then use a salvage method to retrieve the remaining data for the affected segments.

ORA-00283: recovery session canceled due to errors

ORA-00600: internal error code, arguments: [3020], [385882742], [1], [330236],

[49015], [200], [], []

ORA-10567: Redo is inconsistent with data block (file# 92, block# 6774)

ORA-10564: tablespace TSPACE5

ORA-01110: data file 92: '/bill/oradata/data9/tspave5_07.dbf'

ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 172573

Solution

Step 1 :- Identify the datafile on which ORA-00600 [3020] is reported

The first step is to identify the datafile on which ORA-00600 [3020] is reported.

Taking the above example :-

ORA-00283: recovery session canceled due to errors

ORA-00600: internal error code, arguments: [3020], [385882742], [1], [330236],

[49015], [200], [], []

ORA-10567: Redo is inconsistent with data block (file# 92, block# 6774)

ORA-10564: tablespace TSPACE5

ORA-01110: data file 92: '/bill/oradata/data9/tspave5_07.dbf'

ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 172573

In the above example the datafile having the issue is

data file 92: '/bill/oradata/data9/tspave5_07.dbf'
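The second argument of the ORA-00600 [3020] error in this example appears to be the data block address (DBA) in decimal. As a minimal sketch, assuming the usual smallfile encoding (upper 10 bits for the relative file number, lower 22 bits for the block number), it can be decoded to cross-check the file# and block# printed by ORA-10567. The function names are illustrative and not part of any Oracle tool.

```python
from typing import Tuple


def decode_dba(dba: int) -> Tuple[int, int]:
    """Split a 32-bit data block address into (relative file#, block#)."""
    file_no = dba >> 22          # upper 10 bits: relative file number
    block_no = dba & 0x3FFFFF    # lower 22 bits: block number
    return file_no, block_no


def encode_dba(file_no: int, block_no: int) -> int:
    """Inverse of decode_dba, under the same encoding assumption."""
    return (file_no << 22) | block_no


# Second argument of the ORA-00600 [3020] error in the example above.
print(decode_dba(385882742))  # (92, 6774)
```

The result agrees with "file# 92, block# 6774" in the ORA-10567 line, which is a quick sanity check that the right block is being targeted before anything is patched.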

Step 2 :-

Try to recover the block using allow 1 corruption

ALTER DATABASE RECOVER datafile '/bill/oradata/data9/tspave5_07.dbf' allow 1 corruption

This would fail on the same block with the same error.

Step 3 :-

Take a backup of the existing state of the affected datafile.

Step 4 :- Configure BBED for usage

From 11g onwards BBED is not shipped, but dd can be used instead.

a. Generate the bbed executable:

cd $ORACLE_HOME/rdbms/lib

make -f ins_rdbms.mk `pwd`/bbed

mv bbed $ORACLE_HOME/bin

b. Create the file file.list, listing the datafile on which ORA-00600 [3020] is reported:

file.list has:

<file number> <file name>

In our session file.list contains:

$ cat file.list

92 /bill/oradata/data9/tspave5_07.dbf

c. Create file bbed.par

bbed.par has:

MODE=EDIT

LISTFILE=<File name created in step b>

BLOCKSIZE=<db_block_size>

In our session bbed.par contains:

$ cat bbed.par

MODE=EDIT

LISTFILE=file.list

BLOCKSIZE=8192
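As a minimal sketch, the two BBED input files from steps b and c could be generated with a short Python script. The helper name write_bbed_config and the use of a temporary directory are illustrative assumptions; the file contents mirror the example session above.

```python
import tempfile
from pathlib import Path


def write_bbed_config(workdir, file_no, datafile, block_size=8192):
    """Write file.list and bbed.par into workdir and return both paths."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # file.list: one "<file number> <file name>" pair per line.
    listfile = workdir / "file.list"
    listfile.write_text(f"{file_no} {datafile}\n")

    # bbed.par refers to the list file by relative name, so bbed
    # would have to be started from inside workdir.
    parfile = workdir / "bbed.par"
    parfile.write_text(
        "MODE=EDIT\n"
        f"LISTFILE={listfile.name}\n"
        f"BLOCKSIZE={block_size}\n"
    )
    return listfile, parfile


# Demo in a throwaway directory, using the values from the example session.
with tempfile.TemporaryDirectory() as tmp:
    lst, par = write_bbed_config(tmp, 92, "/bill/oradata/data9/tspave5_07.dbf")
    print(lst.read_text(), end="")
    print(par.read_text(), end="")
```

The block size should be taken from the database's db_block_size; 8192 is only the value used in this example.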

d. Run bbed. Use password blockedit:

$ bbed parfile=bbed.par

Password:

BBED: Release 2.0.0.0.0 - Limited Production on Mon Apr 13 11:20:42 2009

************* !!! For Oracle Internal Use only !!! ***************

BBED>

e. Go to Block where the Ora-00600[3020] is reported . In our example it is block 6774:

BBED> set block 6774

BLOCK# 6774

Verify that everything is set correctly:

BBED> show all

-> FILE# 92

BLOCK# 6774

OFFSET 0

DBA 0x17001a66 (385882726 92,6774)

-> FILENAME /bill/oradata/data9/tspave5_07.dbf

BIFILE bifile.bbd

-> LISTFILE /home/oracle/bbed/listfiles.txt

-> BLOCKSIZE 8192

-> MODE Browse

EDIT Unrecoverable

IBASE Dec

OBASE Dec

WIDTH 80

COUNT 512

LOGFILE log.bbd

SPOOL No

f. Run map to see the C structures for the block and the DBA:

BBED> map

File: /bill/oradata/data9/tspave5_07.dbf (92)

Block: 92 Dba:0x17001a66

------------------------------------------------------------

KTB Data Block (Table/Cluster)

struct kcbh, 20 bytes @0

struct ktbbh, 72 bytes @20

struct kdbh, 14 bytes @100

struct kdbt[1], 4 bytes @114

sb2 kdbr[519] @118

ub1 freespace[809] @1156

ub1 rowdata[6223] @1965

ub4 tailchk @8188

g. Print kcbh:

BBED> print kcbh

struct kcbh, 20 bytes @0

ub1 type_kcbh @0 0x06

ub1 frmt_kcbh @1 0x02

ub1 spare1_kcbh @2 0x00

ub1 spare2_kcbh @3 0x00

ub4 rdba_kcbh @4 0x17001a66

ub4 bas_kcbh @8 0x002eda83

ub2 wrp_kcbh @12 0x0000

ub1 seq_kcbh @14 0x9b

ub1 flg_kcbh @15 0x04 (KCBHFCKV)

ub2 chkval_kcbh @16 0x205f

ub2 spare3_kcbh @18 0x0000

h. Corrupt the block. As the output below shows, this marks the sequence as 0xFF and zeroes out the SCN base and wrap:

BBED> Corrupt dba

BBED> print kcbh

struct kcbh, 20 bytes @0

ub1 type_kcbh @0 0x06

ub1 frmt_kcbh @1 0x02

ub1 spare1_kcbh @2 0x00

ub1 spare2_kcbh @3 0x00

ub4 rdba_kcbh @4 0x17001a66

ub4 bas_kcbh @8 0x00000000 ----------------------->Zeroed out

ub2 wrp_kcbh @12 0x0000 ----------------------->Zeroed out

ub1 seq_kcbh @14 0xff ------->Sequence marked FF

ub1 flg_kcbh @15 0x04 (KCBHFCKV)

ub2 chkval_kcbh @16 0x2071

ub2 spare3_kcbh @18 0x0000

Step 5

======

ALTER DATABASE RECOVER datafile '/bill/oradata/data9/tspave5_07.dbf' allow 1 corruption

This would go through now.

However, if other blocks are also affected, ORA-00600 [3020] will be reported on the next corrupt block. Re-run the recovery with allow 1 corruption and check whether it passes beyond the next block; if it does, bring the datafile online. Otherwise, patch the next block using the steps above.

Step 6

=====

Once the blocks are patched, any access to the object that contains the corrupt block will fail with ORA-1578.

Salvage the good data, excluding the corrupt block, and recreate the object.

Run the following query against dba_extents for the datafile and block reported corrupt during the stuck recovery:

SQL> select segment_name, segment_type, owner from dba_extents where file_id = <file number> and <block id> between block_id and block_id + blocks - 1;

Event 10231 makes full table scans skip corrupt blocks, so the good rows can be copied out:

SQL> alter session set events '10231 trace name context forever, level 10';

SQL> create table salvage_table as select * from <original table>;

You can then truncate the original table and re-insert the good data from the salvage table.
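The dba_extents lookup in this step is a range test: an extent covers the blocks from block_id to block_id + blocks - 1 inclusive. The following sketch reproduces that test in Python on a few made-up rows. The Extent type, the sample owners and segment names are hypothetical and only illustrate the predicate.

```python
from typing import Iterable, NamedTuple, Optional


class Extent(NamedTuple):
    owner: str
    segment_name: str
    segment_type: str
    file_id: int
    block_id: int  # first block of the extent
    blocks: int    # number of blocks in the extent


def find_segment(extents: Iterable[Extent], file_id: int, block: int) -> Optional[Extent]:
    """Return the extent that contains the given block, or None."""
    for ext in extents:
        last_block = ext.block_id + ext.blocks - 1
        if ext.file_id == file_id and ext.block_id <= block <= last_block:
            return ext
    return None


# Hypothetical rows standing in for dba_extents content.
SAMPLE_EXTENTS = [
    Extent("APP", "ORDERS", "TABLE", 92, 6769, 8),     # blocks 6769..6776
    Extent("APP", "ORDERS_PK", "INDEX", 92, 6777, 8),  # blocks 6777..6784
]

hit = find_segment(SAMPLE_EXTENTS, 92, 6774)
print(hit.owner, hit.segment_name, hit.segment_type)
```

If the affected segment turns out to be an index rather than a table, rebuilding the index is normally simpler than salvaging rows.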

Please note :- From 11g onwards bbed is not shipped.

For an 11g database you can use the following RMAN command to mark the block as soft corrupt:

RMAN> BLOCKRECOVER DATAFILE <file#> BLOCK <block1#> CLEAR ;

Please refer to:

How to soft Corrupt Block using RMAN to produce ORA-01578

↧

ORA-00600 [kfcema02] prevents the diskgroup from mounting

March 14, 2017, 8:50 pm


We had a disk storage failure yesterday.

After recovering from the disk error, we found that RAC cannot be brought up because the diskgroup cannot be mounted.

SQL> ALTER DISKGROUP ALL MOUNT

...

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

alert__ASM3.log

===================

SQL> ALTER DISKGROUP ALL MOUNT

NOTE: cache registered group ASMREDO1 number=1 incarn=0x0a4827c2

NOTE: cache began mount (not first) of group ASMREDO1 number=1 incarn=0x0a4827c2

NOTE: cache registered group DG_ORA number=2 incarn=0x873827c3

NOTE: cache began mount (first) of group DG_ORA number=2 incarn=0x873827c3

WARNING::ASMLIB library not found. See trace file for details.

NOTE: Assigning number (1,0) to disk (/dev/raw/raw10)

NOTE: Assigning number (2,6) to disk (/dev/raw/raw9)

NOTE: Assigning number (2,5) to disk (/dev/raw/raw8)

NOTE: Assigning number (2,4) to disk (/dev/raw/raw7)

NOTE: Assigning number (2,3) to disk (/dev/raw/raw6)

NOTE: Assigning number (2,2) to disk (/dev/raw/raw5)

NOTE: Assigning number (2,1) to disk (/dev/raw/raw4)

NOTE: Assigning number (2,0) to disk (/dev/raw/raw3)

kfdp_query(ASMREDO1): 3

kfdp_queryBg(): 3

NOTE: cache opening disk 0 of grp 1: ASMREDO1_0000 path:/dev/raw/raw10

NOTE: F1X0 found on disk 0 fcn 0.0

NOTE: cache mounting (not first) group 1/0x0A4827C2 (ASMREDO1)

kjbdomatt send to node 0

NOTE: attached to recovery domain 1

NOTE: LGWR attempting to mount thread 2 for diskgroup 1

NOTE: LGWR mounted thread 2 for disk group 1

NOTE: opening chunk 2 at fcn 0.146305 ABA

NOTE: seq=6 blk=5782

NOTE: cache mounting group 1/0x0A4827C2 (ASMREDO1) succeeded

NOTE: cache ending mount (success) of group ASMREDO1 number=1 incarn=0x0a4827c2

NOTE: start heartbeating (grp 2)

kfdp_query(DG_ORA): 5

kfdp_queryBg(): 5

NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3

NOTE: F1X0 found on disk 0 fcn 0.0

NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4

NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5

NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6

NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7

NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8

NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9

NOTE: cache mounting (first) group 2/0x873827C3 (DG_ORA)

* allocate domain 2, invalid = TRUE

kjbdomatt send to node 0

NOTE: attached to recovery domain 2

NOTE: starting recovery of thread=1 ckpt=348.1542 group=2

NOTE: starting recovery of thread=2 ckpt=189.5027 group=2

NOTE: starting recovery of thread=3 ckpt=182.5380 group=2

Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc (incident=5754):

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc

Trace dumping is performing id=[cdmp_20120917220327]

Abort recovery for domain 2

NOTE: crash recovery signalled OER-600

ERROR: ORA-600 signalled during mount of diskgroup DG_ORA

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

ERROR: ALTER DISKGROUP ALL MOUNT

NOTE: cache dismounting group 2/0x873827C3 (DG_ORA)

NOTE: lgwr not being msg'd to dismount

kjbdomdet send to node 0

detach from dom 2, sending detach message to node 0

Please provide the following:

-- AMDU output

Placeholder for AMDU binaries and using with ASM 10g (Doc ID 553639.1)

-- Kfed read output of all the disks that are part of the diskgroup you are unable to mount.

-Let us use kfed to read the devices

Building and using the kfed utility

------------------------------------------------

* For releases 10.2.0.X and up execute:

1) Change to the rdbms/lib directory:

% cd $ORACLE_HOME/rdbms/lib

2) Generate the executable:

10.2.0.XX:

% make -f ins_rdbms.mk ikfed

Using kfed:

Reading a file:

kfed read <device>

example:

% kfed read /dev/rdsk/emcpower10a

-Please run the kfed read on the disks and provide me with the output

/dev/raw/raw1: bound to major 8, minor 16

/dev/raw/raw2: bound to major 8, minor 32

/dev/raw/raw3: bound to major 8, minor 48

/dev/raw/raw4: bound to major 8, minor 64

/dev/raw/raw5: bound to major 8, minor 80

/dev/raw/raw6: bound to major 8, minor 96

/dev/raw/raw7: bound to major 8, minor 112

/dev/raw/raw8: bound to major 8, minor 128

/dev/raw/raw9: bound to major 8, minor 144

/dev/raw/raw10: bound to major 8, minor 160

<<< from the above disks -- do they all belong to the diskgroup?

kfed read /dev/raw/raw4

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483649 ; 0x008: TYPE=0x8 NUMB=0x1

kfbh.check: 2061250939 ; 0x00c: 0x7adc317b

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 1 ; 0x024: 0x0001

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DG_ORA_0001 ; 0x028: length=11

kfdhdb.grpname: DG_ORA ; 0x048: length=6

kfdhdb.fgname: DG_ORA_0001 ; 0x068: length=11

kfdhdb.capname: ; 0x088: length=0

kfed read /dev/raw/raw5

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483650 ; 0x008: TYPE=0x8 NUMB=0x2

kfbh.check: 2061327740 ; 0x00c: 0x7add5d7c

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 2 ; 0x024: 0x0002

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DG_ORA_0002 ; 0x028: length=11

kfdhdb.grpname: DG_ORA ; 0x048: length=6

kfdhdb.fgname: DG_ORA_0002 ; 0x068: length=11

kfdhdb.capname: ; 0x088: length=0

kfed read /dev/raw/raw6

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483651 ; 0x008: TYPE=0x8 NUMB=0x3

kfbh.check: 2061320572 ; 0x00c: 0x7add417c

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 3 ; 0x024: 0x0003

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DG_ORA_0003 ; 0x028: length=11

kfdhdb.grpname: DG_ORA ; 0x048: length=6

kfdhdb.fgname: DG_ORA_0003 ; 0x068: length=11

kfdhdb.capname: ; 0x088: length=0

kfed read /dev/raw/raw7

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483652 ; 0x008: TYPE=0x8 NUMB=0x4

kfbh.check: 2061327740 ; 0x00c: 0x7add5d7c

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 4 ; 0x024: 0x0004

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DG_ORA_0004 ; 0x028: length=11

kfdhdb.grpname: DG_ORA ; 0x048: length=6

kfdhdb.fgname: DG_ORA_0004 ; 0x068: length=11

kfdhdb.capname: ; 0x088: length=0

kfed read /dev/raw/raw8

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483653 ; 0x008: TYPE=0x8 NUMB=0x5

kfbh.check: 2061320572 ; 0x00c: 0x7add417c

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 5 ; 0x024: 0x0005

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DG_ORA_0005 ; 0x028: length=11

kfdhdb.grpname: DG_ORA ; 0x048: length=6

kfdhdb.fgname: DG_ORA_0005 ; 0x068: length=11

kfdhdb.capname: ; 0x088: length=0

kfed read /dev/raw/raw9

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483654 ; 0x008: TYPE=0x8 NUMB=0x6

kfbh.check: 2059439481 ; 0x00c: 0x7ac08d79

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 6 ; 0x024: 0x0006

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DG_ORA_0006 ; 0x028: length=11

kfdhdb.grpname: DG_ORA ; 0x048: length=6

kfdhdb.fgname: DG_ORA_0006 ; 0x068: length=11

kfdhdb.capname: ; 0x088: length=0

kfed read /dev/raw/raw10

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0

kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0

kfbh.check: 4131885754 ; 0x00c: 0xf64792ba

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8

kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000

kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000

kfdhdb.compat: 168820736 ; 0x020: 0x0a100000

kfdhdb.dsknum: 0 ; 0x024: 0x0000

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: ASMREDO1_0000 ; 0x028: length=13

kfdhdb.grpname: ASMREDO1 ; 0x048: length=8

kfdhdb.fgname: ASMREDO1_0000 ; 0x068: length=13

kfdhdb.capname: ; 0x088: length=0

I have the kfed read outputs only from

/dev/raw/raw4

/dev/raw/raw5

/dev/raw/raw6

/dev/raw/raw7

/dev/raw/raw8

/dev/raw/raw9

/dev/raw/raw10
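With several disks to compare, it can help to reduce each kfed read dump to the few header fields that matter here. The sketch below parses the "name: value ; offset: decoded" lines shown above into a dictionary and extracts the group name, disk name, disk number and header status. The function names are illustrative, and the sample text is an excerpt of the /dev/raw/raw4 output.

```python
def parse_kfed(text):
    """Map each field name to a (raw value, decoded value) pair."""
    fields = {}
    for line in text.splitlines():
        left, sep, right = line.partition(";")
        if not sep or ":" not in left:
            continue  # not a "name: value ; offset: decoded" line
        name, _, raw = left.partition(":")
        _, _, decoded = right.partition(":")
        fields[name.strip()] = (raw.strip(), decoded.strip())
    return fields


def summarize(fields):
    """Pick out the header fields that identify the disk and its group."""
    return {
        "group": fields["kfdhdb.grpname"][0],
        "disk": fields["kfdhdb.dskname"][0],
        "disk_no": int(fields["kfdhdb.dsknum"][0]),
        "header_status": fields["kfdhdb.hdrsts"][1],
    }


# Excerpt of the kfed read output for /dev/raw/raw4 shown above.
SAMPLE = """\
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0001 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
"""

print(summarize(parse_kfed(SAMPLE)))
```

Running the same summary over every dump makes it easy to see that raw4 through raw9 are members of DG_ORA and raw10 belongs to ASMREDO1.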

From the AMDU output -- we see :

----------------------------- DISK REPORT N0009 ------------------------------

Disk Path: /dev/raw/raw2

Unique Disk ID:

Disk Label:

Physical Sector Size: 512 bytes

Disk Size: 2048 megabytes

** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **

----------------------------- DISK REPORT N0010 ------------------------------

Disk Path: /dev/raw/raw1

Unique Disk ID:

Disk Label:

Physical Sector Size: 512 bytes

Disk Size: 2048 megabytes

** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **

Do the above 2 disks belong to the diskgroup that you are trying to mount?
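One way to cross-check this question against the alert log quoted earlier is to collect the "Assigning number (group,disk) to disk (path)" lines and see which of the ten raw devices never appear. This is only a sketch: the regular expression is written against the log lines shown in this thread, and a device that ASM did not assign is not proven to be unrelated to the diskgroup, since a damaged header would also keep it from being discovered.

```python
import re
from collections import defaultdict

ASSIGN_RE = re.compile(r"Assigning number \((\d+),(\d+)\) to disk \(([^)]+)\)")


def disks_by_group(alert_text):
    """Return {group number: {disk number: device path}} from alert log text."""
    groups = defaultdict(dict)
    for match in ASSIGN_RE.finditer(alert_text):
        group_no, disk_no = int(match.group(1)), int(match.group(2))
        groups[group_no][disk_no] = match.group(3)
    return dict(groups)


# Lines copied from the alert log excerpt in this thread.
ALERT_SAMPLE = """\
NOTE: Assigning number (1,0) to disk (/dev/raw/raw10)
NOTE: Assigning number (2,6) to disk (/dev/raw/raw9)
NOTE: Assigning number (2,5) to disk (/dev/raw/raw8)
NOTE: Assigning number (2,4) to disk (/dev/raw/raw7)
NOTE: Assigning number (2,3) to disk (/dev/raw/raw6)
NOTE: Assigning number (2,2) to disk (/dev/raw/raw5)
NOTE: Assigning number (2,1) to disk (/dev/raw/raw4)
NOTE: Assigning number (2,0) to disk (/dev/raw/raw3)
"""

groups = disks_by_group(ALERT_SAMPLE)
assigned = {path for disks in groups.values() for path in disks.values()}
candidates = [f"/dev/raw/raw{i}" for i in range(1, 11)]  # raw1 .. raw10
unassigned = [path for path in candidates if path not in assigned]
print(unassigned)  # ['/dev/raw/raw1', '/dev/raw/raw2']
```

The two devices left over are the same two that AMDU reports as not having a valid ASM disk header.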

We encountered a disk storage failure yesterday.

After we recovered from this disk error, we found that our RAC cannot be brought up because the diskgroup cannot be mounted.

ORA-00600 is shown in the alert log file.

kjbdomatt send to node 0

NOTE: attached to recovery domain 2

NOTE: starting recovery of thread=1 ckpt=348.1542 group=2

NOTE: starting recovery of thread=2 ckpt=189.5027 group=2

NOTE: starting recovery of thread=3 ckpt=182.5380 group=2

Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc (incident=5754):

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc

Trace dumping is performing id=[cdmp_20120917220327]

Abort recovery for domain 2

NOTE: crash recovery signalled OER-600

ERROR: ORA-600 signalled during mount of diskgroup DG_ORA

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

Could you please share the generated trace and incident files with us:

cd /opt/oracle/db/diag/asm/+asm/+ASM3/trace

grep '2012-09-22 22' *trc | awk -F: '{print $1}' | uniq

and the incident file named below:

/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
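Where grep and awk are not convenient, roughly the same file selection can be sketched in Python: list the trace files in a directory that contain a given timestamp prefix. The function name and the default pattern are illustrative assumptions; the timestamp string passed in would be whatever hour is being investigated.

```python
from pathlib import Path


def trace_files_mentioning(trace_dir, needle, pattern="*trc"):
    """Return the sorted names of files in trace_dir that contain needle."""
    matches = []
    for path in sorted(Path(trace_dir).glob(pattern)):
        if not path.is_file():
            continue
        # Trace files can contain odd bytes, so replace undecodable ones.
        with path.open("r", errors="replace") as handle:
            if any(needle in line for line in handle):
                matches.append(path.name)
    return matches
```

For example, trace_files_mentioning("/opt/oracle/db/diag/asm/+asm/+ASM3/trace", "2012-09-17 22") would list the trace files written during the hour of the failed mount shown in the alert log.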

What exact steps did you take to recover from the disk error? Please let us know.

Research ::---------

==========

kfdp_query(ASMREDO1): 3

----- Abridged Call Stack Trace -----

<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922

*** 2012-09-17 22:03:22.816

<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----

*** 2012-09-17 22:03:26.954

kfdp_query(DG_ORA): 5

----- Abridged Call Stack Trace -----

2012-09-17 22:03:27.250989 : Start recovery for domain=2, valid=0, flags=0x4

NOTE: starting recovery of thread=1 ckpt=348.1542 group=2

NOTE: starting recovery of thread=2 ckpt=189.5027 group=2

NOTE: starting recovery of thread=3 ckpt=182.5380 group=2

WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128

WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6272

WARNING:Oracle process running out of OS kernel I/O resources

Incident 5754 created, dump file: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

Abort recovery for domain 2, flags = 0x4

kjb_abort_recovery: abort recovery for domain 2 @ inc 4

kjb_abort_recovery: domain flags=0x0, valid=0

kfdp_dismount(): 6

----- Abridged Call Stack Trace -----

File_name :: +ASM3_ora_13438.trc

Could you please share the already requested incident file with us:

/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc

I am looking for below file ,please share .

/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc

Research ::---------

==========

NOTE: start heartbeating (grp 2)

kfdp_query(DG_ORA): 5

kfdp_queryBg(): 5

NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3

NOTE: F1X0 found on disk 0 fcn 0.0

NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4

NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5

NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6

NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7

NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8

NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9

NOTE: cache mounting (first) group 2/0x873827C3 (DG_ORA)

* allocate domain 2, invalid = TRUE

kjbdomatt send to node 0

NOTE: attached to recovery domain 2

NOTE: starting recovery of thread=1 ckpt=348.1542 group=2

NOTE: starting recovery of thread=2 ckpt=189.5027 group=2

NOTE: starting recovery of thread=3 ckpt=182.5380 group=2

Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc (incident=5754):

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc

Trace dumping is performing id=[cdmp_20120917220327]

Abort recovery for domain 2

NOTE: crash recovery signalled OER-600

ERROR: ORA-600 signalled during mount of diskgroup DG_ORA

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

ERROR: ALTER DISKGROUP ALL MOUNT
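The two facts that matter in this alert log excerpt are the recovery checkpoints per thread and the ORA-00600 arguments. As a minimal sketch (the function names are illustrative, and the sample text is copied from the excerpt above), they can be pulled out of an alert log like this:

```python
import re

# Sample lines copied from the alert log excerpt above.
ALERT_LOG = """\
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
"""

CKPT_RE = re.compile(r"starting recovery of thread=(\d+) ckpt=(\d+)\.(\d+) group=(\d+)")
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")


def parse_checkpoints(text):
    """Return {thread: (seq, blk)} for every 'starting recovery' line."""
    return {int(t): (int(seq), int(blk)) for t, seq, blk, _grp in CKPT_RE.findall(text)}


def parse_ora600_args(text):
    """Return the non-empty arguments of the first ORA-00600 line, or []."""
    match = ORA600_RE.search(text)
    if not match:
        return []
    return [arg for arg in re.findall(r"\[([^\]]*)\]", match.group(1)) if arg]


checkpoints = parse_checkpoints(ALERT_LOG)
ora600_args = parse_ora600_args(ALERT_LOG)
print(checkpoints)
print(ora600_args)
```

Comparing the parsed checkpoints and the second numeric argument between mount attempts makes it easy to see whether anything changed from one attempt to the next.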


========= Dump for incident 5754 (ORA 600 [kfcema02]) ========

----- Beginning of Customized Incident Dump(s) -----

CE: (0x0x617be648) group=2 (DG_ORA) obj=4 (disk) blk=2115

hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1

flags_kfcpba=0x49 copies=1 blockIndex=67 AUindex=0 AUcount=1

copy #0: disk=4 au=910336

BH: (0x0x6178e798) bnum=10 type=ALLOCTBL state=rcv chgSt=not modifying

flags=0x00000000 pinmode=excl lockmode=null bf=0x0x61409000

kfbh_kfcbh.fcn_kfbh = 0.165046340 lowAba=0.0 highAba=0.0

last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000

-------------------------------------------------------------------------------

----- Invocation Context Dump -----

Address: 0x2b5faa6e8498

Phase: 3

flags: 0x18E0001

Incident ID: 5754

Error Descriptor: ORA-600 [kfcema02] [0] [165057275] [] [] [] [] []

Error class: 0

Problem Key # of args: 1

Number of actions: 8

----- Incident Context Dump -----

Address: 0x7fff6c8a42b8

Incident ID: 5754

Problem Key: ORA 600 [kfcema02]

Error: ORA-600 [kfcema02] [0] [165057275] [] [] [] [] []

[00]: dbgexExplicitEndInc [diag_dde]

[01]: dbgeEndDDEInvocationImpl [diag_dde]

[02]: dbgeEndDDEInvocation [diag_dde]

[03]: kfcema [ASM]<-- Signaling

[04]: kfrPass2 [ASM]

[05]: kfrcrv [ASM]

[06]: kfcMountPriv [ASM]

[07]: kfcMount [ASM]

[08]: kfgInitCache [ASM]

[09]: kfgFinalizeMount [ASM]

[10]: kfgscFinalize [ASM]

[11]: kfgForEachKfgsc [ASM]

[12]: kfgsoFinalize [ASM]

[13]: kfgFinalize [ASM]

[14]: kfxdrvMount [ASM]

[15]: kfxdrvEntry [ASM]

[16]: opiexe []

[17]: opiosq0 []

[18]: kpooprx []

[19]: kpoal8 []

[20]: opiodr []

[21]: ttcpip []

[22]: opitsk []

[23]: opiino []

[24]: opiodr []

[25]: opidrv []

[26]: sou2o []

[27]: opimai_real []

[28]: ssthrdmain []

[29]: main []

[30]: __libc_start_main []

[31]: _start []

MD [00]: 'SID'='115.3' (0x3)

MD [01]: 'ProcId'='19.1' (0x3)

MD [02]: 'PQ'='(50331648, 1347894201)' (0x7)

MD [03]: 'Client ProcId'='oraclemos5200db3 (TNS V1-V3).13438_47689880133216' (0x0)

Impact 0:

Impact 1:

Impact 2:

Impact 3:

Derived Impact:

File_name :: +ASM3_ora_13438_i5754.trc
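The incident dump names the failing block by disk number and AU, while kfed needs a device path. A minimal sketch of how the two can be tied together, assuming the "cache opening disk" lines of the alert log give the disk-number-to-path mapping (helper names are illustrative; the output file naming follows the command used in this case):

```python
import re

# Lines copied from the alert log and the incident dump in this case.
ALERT_LOG = """\
NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6
NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7
NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8
"""
INCIDENT_DUMP = """\
CE: (0x0x617be648) group=2 (DG_ORA) obj=4 (disk) blk=2115
copy #0: disk=4 au=910336
"""

OPEN_DISK_RE = re.compile(r"cache opening disk (\d+) of grp (\d+): (\S+) path:(\S+)")
CE_RE = re.compile(r"CE: \(\S+\) group=(\d+) \(\w+\) obj=\d+ \(disk\) blk=(\d+)")
COPY_RE = re.compile(r"copy #\d+: disk=(\d+) au=(\d+)")


def disk_paths(alert_text, group):
    """Map disk number -> device path for one diskgroup number."""
    return {int(d): path
            for d, g, _name, path in OPEN_DISK_RE.findall(alert_text)
            if int(g) == group}


def kfed_command(alert_text, dump_text):
    """Build the kfed read command for the block named in the dump."""
    ce = CE_RE.search(dump_text)
    copy = COPY_RE.search(dump_text)
    group, blk = int(ce.group(1)), int(ce.group(2))
    disk, au = int(copy.group(1)), int(copy.group(2))
    path = disk_paths(alert_text, group)[disk]
    short = path.rsplit("/", 1)[-1]
    return f"kfed read {path} aunum={au} blknum={blk} text=/tmp/kfed_{short}_{au}_{blk}.txt"


command = kfed_command(ALERT_LOG, INCIDENT_DUMP)
print(command)
```

The script only prints the command; it does not run kfed.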

1. Execute

kfed read /dev/raw/raw7 aunum=910336 blknum=2115 text=/tmp/kfed_raw7_910336_2115.txt

kfed read /dev/raw/raw7 text=/tmp/kfed_raw7.txt

2. Get the 'File 1 Block 1' location for the diskgroup as follows:

a. For each disk in the diskgroup, execute:

kfed read <DSK> | grep f1b1

3. You may get non-zero values for 'kfdhdb.f1b1locn', like:

kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002

4. For that disk, execute (replace <AUNUM> with the value from the previous step):

kfed read <DSK> aunum=<AUNUM> text=kfed_<DSK>_<AUNUM>.txt

kfed read <DSK> text=kfed_<DSK>_w_f1b1.txt

5. Set the event below in the ASM pfile and try to mount the diskgroup DG_ORA manually at the ASM level.

Reproduce the problem with the following setting on the instance:

event = "15199 trace name context forever, level 0x8007"

Then start the ASM instance using that pfile:

startup nomount pfile=<pfile name>;

Then try to mount each diskgroup manually, one by one, including DG_ORA:

SQL> alter diskgroup <diskgroup_name> mount;

and collect the traces from bdump/udump. The event will dump the redo until we get the error.

6. For each disk in the diskgroup, take a backup of the first 50 MB (replace <disk_name>):

dd if=<disk_name> of=/tmp/<disk_name>.dd bs=1048576 count=50

Later, compress those files and upload them to the bug.

7. At the end please upload:

a. Complete alert for all the ASM instances

b. traces produced when event was set

c. metadata dumps (files /tmp/kfed*)

d. OS logs /var/adm/messages* for each node which contains latest timestamp of those mount.

e. dd dumps
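Steps 2 to 4 of the plan above amount to reading 'kfdhdb.f1b1locn' from each disk header and, where it is non-zero, dumping that AU. A minimal sketch of that logic, working on already captured kfed output (the helper names are illustrative; the script builds the follow-up commands as strings and does not run kfed):

```python
import re

F1B1_RE = re.compile(r"kfdhdb\.f1b1locn:\s*(\d+)\s*;")


def f1b1_au(kfed_output):
    """Return the File 1 Block 1 AU number, or None if absent or zero."""
    match = F1B1_RE.search(kfed_output)
    if not match:
        return None
    au = int(match.group(1))
    return au if au != 0 else None


def followup_commands(device, au):
    """Build the two kfed commands of step 4 for one device."""
    short = device.rsplit("/", 1)[-1]
    return [
        f"kfed read {device} aunum={au} text=kfed_{short}_{au}.txt",
        f"kfed read {device} text=kfed_{short}_w_f1b1.txt",
    ]


# Sample header line in the form shown in step 3.
sample = "kfdhdb.f1b1locn:                     2 ; 0x0d4: 0x00000002"
au = f1b1_au(sample)
commands = followup_commands("/dev/raw/raw3", au)
print(commands)
```

Disks whose header reports 0 for this field return None and are skipped.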

----------------------------- DISK REPORT N0008 ------------------------------

Disk Path: /dev/raw/raw3

Unique Disk ID:

Disk Label:

Physical Sector Size: 512 bytes

Disk Size: 1047552 megabytes

Group Name: DG_ORA

Disk Name: DG_ORA_0000

Failure Group Name: DG_ORA_0000

Disk Number: 0

Header Status: 3

Disk Creation Time: 2012/03/01 15:31:59.955000

Last Mount Time: 2012/04/07 15:40:22.454000

Compatibility Version: 0x0a100000(10010000)

Disk Sector Size: 512 bytes

Disk size in AUs: 1047552 AUs

Group Redundancy: 1

Metadata Block Size: 4096 bytes

AU Size: 1048576 bytes

Stride: 113792 AUs

Group Creation Time: 2012/03/01 15:31:59.829000

File 1 Block 1 location: AU 2 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

File_name :: report.txt
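The figures in this disk report can be cross-checked with a little arithmetic. The sketch below uses only values quoted in this case. That 'blockIndex' in the cache dumps equals the block number modulo the number of metadata blocks per AU is an observation about these particular dumps, not a documented rule.

```python
# Values from the disk report above.
DISK_SIZE_MB = 1047552
AU_SIZE_BYTES = 1048576
METADATA_BLOCK_BYTES = 4096


def au_count(disk_size_mb, au_size_bytes):
    """Number of allocation units that fit on the disk."""
    return disk_size_mb * 1024 * 1024 // au_size_bytes


def blocks_per_au(au_size_bytes, block_bytes):
    """Number of metadata blocks in one allocation unit."""
    return au_size_bytes // block_bytes


def index_in_au(blk, per_au):
    """Position of a metadata block inside its allocation unit."""
    return blk % per_au


aus = au_count(DISK_SIZE_MB, AU_SIZE_BYTES)
per_au = blocks_per_au(AU_SIZE_BYTES, METADATA_BLOCK_BYTES)
print(aus, per_au)
# blk=2115 is dumped with blockIndex=67, blk=1767 with blockIndex=231.
print(index_in_au(2115, per_au), index_in_au(1767, per_au))
```

The computed AU count agrees with the 'Disk size in AUs' line of the report.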

The File 1 Block 1 location is available for diskgroup DG_ORA on disk /dev/raw/raw3.

Please execute that command on the same device.

Yes, we need it only for disk raw3, because this mount issue is related to the DG_ORA diskgroup. There should be at least one such location per diskgroup, which is why you see two of them, from two different diskgroups.

Hence, please run steps 2-4 of the action plan as shown here, then the rest of the steps.

2. get the 'File 1 Block 1' for the diskgroup following:

a. for each disk in the diskgroup execute:

kfed read /dev/raw/raw3 | grep f1b1

3. you may get non-zero values for 'kfdhdb.f1b1locn' like:

kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002

4. for that disk execute (Replace <AUNUM> using the one from previous step):

kfed read /dev/raw/raw3 aunum=2 text=kfed_raw3_2.txt

kfed read /dev/raw/raw3 text=kfed_raw3_w_f1b1.txt

It seems the file asmlog.part01.rar is broken. Could you please share it with us again?

1. Clarification of current patch level status:

=> from uploaded OPatch lsinventory output:

Oracle Home : /opt/oracle/db/product/11g/db_1

Installed Top-level Products (2):

Oracle Database 11g 11.1.0.6.0

Oracle Database 11g Patch Set 1 11.1.0.7.0

Interim patches (3) :

Patch 9549042 : applied on Thu Mar 01 09:25:24 WIT 2012

Patch 7272646 : applied on Thu Mar 01 08:48:05 WIT 2012

Patch 12419384 : applied on Thu Mar 01 08:42:47 WIT 2012

=> DATABASE PSU 11.1.0.7.8 (INCLUDES CPUJUL2011)

=> Note:

1) The prior mentioned bug 6163771 is already fixed in patchset 11.1.0.7

@@ PATCHSET REQUEST #70719 CREATED IN BUG 6712856 FOR FIX IN 11.1.0.7.0

2) I cannot find any information in this SR on whether the ASM and DB are common. This needs to be clarified.

2. From +ASM3_ora_13438_i5754.trc:

*** ACTION NAME:() 2012-09-17 22:03:27.432

Dump continued from file: /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

========= Dump for incident 5754 (ORA 600 [kfcema02]) ========

----- Beginning of Customized Incident Dump(s) -----

CE: (0x0x617be648) group=2 (DG_ORA) obj=4 (disk) blk=2115

hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1

flags_kfcpba=0x49 copies=1 blockIndex=67 AUindex=0 AUcount=1

copy #0: disk=4 au=910336

BH: (0x0x6178e798) bnum=10 type=ALLOCTBL state=rcv chgSt=not modifying

flags=0x00000000 pinmode=excl lockmode=null bf=0x0x61409000

kfbh_kfcbh.fcn_kfbh = 0.165046340 lowAba=0.0 highAba=0.0

last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000

...

*** 2012-09-17 22:03:27.553

----- Current SQL Statement for this session (sql_id=2pa6sbf4762ga) -----

ALTER DISKGROUP ALL MOUNT

----- Call Stack Trace -----

Function List:

skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp

<- PGOSF52_ksfdmp <- dbgexPhaseII <- dbgexExplicitEndInc <- dbgeEndDDEInvocationImpl

<- dbgeEndDDEInvocation <- kfcema <- kfrPass2 <- kfrcrv <- kfcMountPriv

<- kfcMount <- kfgInitCache <- kfgFinalizeMount <- kfgscFinalize

<- kfgForEachKfgsc <- kfgsoFinalize <- kfgFinalize <- kfxdrvMount <- kfxdrvEntry

<- opiexe <- opiosq0 <- kpooprx <- kpoal8 <- opiodr

<- ttcpip <- opitsk <- opiino <- opiodr <- opidrv

<- sou2o <- opimai_real <- ssthrdmain <- main <- __libc_start_main

@@ Bug 13407102: ORA-600 [KFCEMA02] AND ORA-600 [KFCMOUNT15] HAPPENED ON ASM INSTANCE

1. The prior engineer sent out an action plan to you with the target to patch the diskgroup.

Could we ask you for the results of this action plan, please? Were you able to mount the diskgroup after that plan?

+++ for Bug 13407102: ORA-600 [KFCEMA02] AND ORA-600 [KFCMOUNT15] HAPPENED ON ASM INSTANCE, there is no patch at all. also it happens on 11gR2

2. If the problem is still remaining verify that the affected diskgroup is dismounted on all nodes/asm instances.

After that please try to mount it on ASM instance 1 only (manually in SQLPLUS).

What is the result ? Do you still get the same ORA-600 error as before ?

Please re-upload the most current alertfile from ASM instance 1 together with the tracefiles which will be written.

++ no patch can be applied, do you still want us to do this?

3. If the internal error is still remaining and the patching of the diskgroup failed then we have to rebuild the diskgroup.

Do you have a full backup of the data within the affected diskgroup ? Please clarify.

+++ sorry, no backup

Unfortunately this is a misunderstanding.

The action plan provided to you by the prior engineer was to patch the bad blocks on the affected disks, not to apply a software patch. Even if a patch could be applied, it would probably only prevent new occurrences; it would not repair the current situation (if the diskgroup is corrupted).

Please note that if we cannot repair the diskgroup, you will have to rebuild the diskgroup and then restore and recover the lost data. Accordingly, do you have at least some kind of worst-case backup?

Your issue was transferred to me. My name is Pallavi and I will be helping you with your issue. I am currently reviewing/researching the situation and will update the SR / call you as soon as I have additional information. Thank you for your patience.

We can try to patch the diskgroup. If this doesn't work, you will have to recreate the diskgroup and restore data from a valid backup.

!!!!!! VERY IMPORTANT: Be sure you have a valid backup of data pertaining to ora_data diskgroup. !!!!!!

-----------------------------------------------------------------

You need to create kfed and amdu for further use.

1) kfed is a tool that allows you to read and write the ASM metadata. To create kfed, connect as the owner of the Oracle software and execute:

$cd $ORACLE_ASMHOME/rdbms/lib

$make -f ins_rdbms.mk ikfed

2) AMDU was released with 11g and is a tool used to get the location of the ASM metadata across the disks.

Like many other tools released with 11g, it can be used in 10g environments. Note 553639.1 is the placeholder for the different platforms. The note also includes instructions for the configuration.

* Transfer amdu and facp to a working directory and include that directory in LD_LIBRARY_PATH, PATH and other relevant variables.

There is no guarantee that the patching would work. It all depends on the status of the disk that we are trying to patch. We will only know what the status is when we try.

As the ASM software owner, execute facp:

$ ./facp 'diskstring' 'DISKGROUP NAME' ALL

e.g.:

$ ./facp '/dev/vg00/rraw*' 'DATAHP' ALL

Run this only ONCE -- and then please update the sr with all the files it has generated.

Did you execute the facp command as requested? If not, please do so and share the generated files with us.

As the ASM software owner, execute facp:

$ ./facp 'diskstring' '<DISKGROUP NAME>' ALL

$ ./facp '/dev/raw/raw*' 'DG_ORA' ALL

Then share the generated files named below:

facp_report

facp_dump_1

facp_dump_2

facp_dump_3

facp_restore

facp_patch_1 (one per node that uses the dg)

facp_adjust

facp_check

facp_patch

Note:: Run this only ONCE

We are waiting for the same.

Execute the below command and share generated logfile with us,

script /tmp/facp.log

# Run the following to lower all checkpoints by 10 blocks:

$ ./facp_adjust -10

# Then run facp_check.

$ ./facp_check

exit

Share the file named as /tmp/facp.log

Try to adjust by a value lower than 10 using the command below:

./facp_adjust -<integer>

Then validate:

$ ./facp_check

If facp_check reports "Valid Checkpoint" for all threads, that is the indication to proceed with the real patching, which means updating the ACD records on the disks with the records from the files facp_patch_*.

Do not continue with this step unless facp_check has returned "Valid Checkpoint" for all threads.

Then execute the command below to patch the ACDC:

./facp_patch

Then try to mount the diskgroup manually:

SQL> alter diskgroup dg_ora mount;

If the mount fails again with the same error, go back to the facp_adjust step, use a new argument for facp_adjust, and repeat until the diskgroup is mounted.
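The rule above (patch only when every thread reports "Valid Checkpoint") is easy to get wrong when reading the output by eye. A minimal sketch of that gate, using two facp_check outputs taken from this case as samples (the function names are illustrative and are not part of facp):

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) \((\d+),(\d+)\): (.+)$", re.MULTILINE)

# facp_check output after "facp_adjust -9" in this case.
GOOD = """\
--- Executing amdu to validate checkpoint target blocks ---
Thread 1 (348,1533): Valid Checkpoint
Thread 2 (189,5018): Valid Checkpoint
Thread 3 (182,5371): Valid Checkpoint
"""

# facp_check output after "facp_adjust -0" in this case.
BAD = """\
--- Executing amdu to validate checkpoint target blocks ---
Thread 1 (348,1542): WRONG SEQ NUMBER
Thread 2 (189,5027): WRONG SEQ NUMBER
Thread 3 (182,5380): Valid Checkpoint
DO NOT PATCH WITH THE CURRENT PATCH FILES
"""


def checkpoint_status(output):
    """Return {thread: status text} parsed from facp_check output."""
    return {int(t): status.strip() for t, _seq, _blk, status in THREAD_RE.findall(output)}


def safe_to_patch(output):
    """True only if at least one thread is listed and all are valid."""
    statuses = checkpoint_status(output)
    return bool(statuses) and all(s == "Valid Checkpoint" for s in statuses.values())


print(safe_to_patch(GOOD), safe_to_patch(BAD))
```

An empty or unparseable output is treated as unsafe, so a failed facp_check run cannot be mistaken for a pass.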

Instruction: Following note: How to fix error ORA-600 [KFCEMA02] (Doc ID 728884.1)

As per the above note, Ct tried to patch the ACD but is still not able to mount the diskgroup:

oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_adjust -9
3 patch files written
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_check
--- Executing amdu to validate checkpoint target blocks ---
Thread 1 (348,1533): Valid Checkpoint
Thread 2 (189,5018): Valid Checkpoint
Thread 3 (182,5371): Valid Checkpoint
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_patch
--- Executing amdu to check for heartbeat ---
Patching Thread 1
kfracdc.ckpt.seq: 348
kfracdc.ckpt.blk: 1533
Patching Thread 2
kfracdc.ckpt.seq: 189
kfracdc.ckpt.blk: 5018
Patching Thread 3
kfracdc.ckpt.seq: 182
kfracdc.ckpt.blk: 5371
Save files ./facp_* to document what was patched
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> export ORACLE_SID=+ASM1
Refer to the SQL*Plus User's Guide and Reference for more information.
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Wed Sep 19 15:12:37 2012
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount;
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2158992 bytes
Variable Size 256605808 bytes
ASM Cache 25165824 bytes
SQL> alter diskgroup DG_ORA mount;
alter diskgroup DG_ORA mount
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfcema02], [0], [165054516], [], [], [], [], [], [], [], [], []
SQL> host
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> exit

It seems Ct needs to recreate this diskgroup and restore the data from backup. If they do not have a backup, we need to log a bug to involve the development team further.
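The checkpoint values in this transcript are consistent with `facp_adjust -9` lowering each thread's checkpoint block by 9 and leaving the sequence number alone. The sketch below checks that arithmetic against the numbers in this case. It assumes the adjustment never crosses a sequence boundary, which holds for these values but is not something the transcript proves in general.

```python
# Checkpoints (seq, blk) per thread as first reported in the alert log.
ORIGINAL = {1: (348, 1542), 2: (189, 5027), 3: (182, 5380)}


def lower_checkpoints(checkpoints, blocks):
    """Lower every thread's checkpoint block by the same number of blocks."""
    return {t: (seq, blk - blocks) for t, (seq, blk) in checkpoints.items()}


def inferred_offset(original, observed):
    """Return the common block offset between two checkpoint sets, or None."""
    offsets = {original[t][1] - observed[t][1] for t in original}
    return offsets.pop() if len(offsets) == 1 else None


after_minus_9 = lower_checkpoints(ORIGINAL, 9)
print(after_minus_9)

# Checkpoints seen in the 15:25:25 trace of this case (348.1537, 189.5022, 182.5375).
TRACE_15_25 = {1: (348, 1537), 2: (189, 5022), 3: (182, 5375)}
print(inferred_offset(ORIGINAL, TRACE_15_25))
```

Working backwards like this shows which adjustment was in effect for a given mount attempt when several values were tried in a row.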

Activity Instruction

Created: 18-Sep-2012 03:28:29 PM GMT+00:00

Instruction Type: Severity 1 : End of Shift Note

Instruction: Currently we are trying to find out whether the diskgroup can be patched/repaired. Aritra Kundu has sent out an action plan for this. We are still waiting for the related customer feedback. If the diskgroup cannot be repaired, we have to rebuild it.

Activity Instruction

Created: 18-Sep-2012 08:08:23 AM GMT+00:00

Instruction Type: Severity 1 : End of Shift Note

Instruction: It seems Ct is on PSU 8, and the related known defect has already been RFI'ed into 11.1.0.7 (BUG 6712856 - RFI BACKPORT OF BUG 6163771 FOR INCLUSION IN 11.1.0.7.0). Waiting for Ct to share the requested information; after that we need to raise a defect with the development team and page BDE immediately to involve them.

2. TECHNICAL & BUSINESS IMPACT

Probably diskgroup corruption.

If we try to mount the affected diskgroup then we fail during the dg recovery with an internal error:

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

Currently we try to find out if the diskgroup can be patched/repaired.

Aritra Kundu has sent out an action plan therefor. We are still waiting for related customer feedback.


Again, the latest uploaded file 'facplog' is not readable on our side; it cannot be decompressed.

Please make sure that compressed files are readable and can be decompressed before you upload them. That way we can all save time.

To get the correct current status of the patching action, please upload the following information:

1. Re-upload the file 'facplog'.

2. Upload the most current ASM alert file from all instances.

3. Upload the trace file that was written during the most recent ORA-600 [kfcema02].

1. from asm alertfile (inst.1):

=> latest occurrence:

Wed Sep 19 15:33:40 2012

SQL> alter diskgroup DG_ORA mount

...

NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3

NOTE: F1X0 found on disk 0 fcn 0.0

NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4

NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5

NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6

NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7

NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8

NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9

NOTE: cache mounting (first) group 2/0x95BC2DFD (DG_ORA)

...

Wed Sep 19 15:33:45 2012

NOTE: attached to recovery domain 2

NOTE: starting recovery of thread=1 ckpt=348.1542 group=2

NOTE: starting recovery for thread 1 at

NOTE: seq=348 blk=1542

NOTE: starting recovery of thread=2 ckpt=189.5027 group=2

NOTE: starting recovery for thread 2 at

NOTE: seq=189 blk=5027

NOTE: starting recovery of thread=3 ckpt=182.5380 group=2

NOTE: starting recovery for thread 3 at

NOTE: seq=182 blk=5380

Errors in file /opt/oracle/db/diag/asm/+asm/+ASM1/trace/+ASM1_ora_2519.trc (incident=9775):

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

Abort recovery for domain 2

NOTE: crash recovery signalled OER-600

ERROR: ORA-600 signalled during mount of diskgroup DG_ORA

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []

ERROR: alter diskgroup DG_ORA mount

...

2. from +ASM1_ora_2519.trc:

2012-09-19 15:25:25.156129 : Start recovery for domain=2, valid=0, flags=0x4

NOTE: starting recovery of thread=1 ckpt=348.1537 group=2

NOTE: starting recovery of thread=2 ckpt=189.5022 group=2

NOTE: starting recovery of thread=3 ckpt=182.5375 group=2

...

*** 2012-09-19 15:25:25.172

kfrHtAdd: obj=0x1 blk=0x6e6 op=133 fcn:0.165051322 -> 0.165051323

kfrHtAdd: bcd: obj=1 blk=1766 from:0.165051322 to:0.165051323

...

=> recovery is running...

...

*** 2012-09-19 15:25:25.206

kfrHtAdd: obj=0x6e9 blk=0x80000000 op=161 fcn:0.165057973 -> 0.165057974

*** 2012-09-19 15:25:25.206

kfrHtAdd: obj=0x1 blk=0x6e9 op=133 fcn:0.165057974 -> 0.165057975

*** 2012-09-19 15:25:25.206

kfrHtAdd: obj=0x80000006 blk=0x60f op=65 fcn:0.165057967 -> 0.165057975

*** 2012-09-19 15:25:25.206

kfrHtAdd: obj=0x6e9 blk=0x80000000 op=161 fcn:0.165057974 -> 0.165057975

WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128

WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6400

WARNING:Oracle process running out of OS kernel I/O resources

*** 2012-09-19 15:25:25.212

kfrRcvSetRem: obj=0x1 blk=0x6e7 [set] = 284

block needed no recovery:

CE: (0x0x617be2b0) group=2 (DG_ORA) obj=1 blk=1767

hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1

flags_kfcpba=0x18 copies=1 blockIndex=231 AUindex=0 AUcount=0

copy #0: disk=3 au=762686

BH: (0x0x6178e360) bnum=5 type=FILEDIR state=rcv chgSt=not modifying

flags=0x00000000 pinmode=excl lockmode=null bf=0x0x61404000

kfbh_kfcbh.fcn_kfbh = 0.165054713 lowAba=0.0 highAba=0.0

last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000

...

=> from here it seems that the recovery was interrupted due to an I/O kernel limitation:

WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128

WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6400

WARNING:Oracle process running out of OS kernel I/O resources
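The two numeric warnings above tell different stories: one is about a single process, the other about the whole system. A minimal sketch that separates them follows. It assumes "MAXAIO for process=128" is the per-process limit and "pending aio=128" is that process's outstanding requests, which is how the message reads but is not confirmed in this case. The key names in the result are illustrative.

```python
import re

# Warning lines copied from the trace excerpt above.
WARNINGS = """\
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6400
"""

PROCESS_RE = re.compile(r"MAXAIO for process=(\d+) pending aio=(\d+)")
SYSTEM_RE = re.compile(r"AIO-MAX-NR=(\d+) AIO-NR=(\d+)")


def parse_aio_warnings(text):
    """Return the four numbers from the warnings, or None if either is missing."""
    proc = PROCESS_RE.search(text)
    system = SYSTEM_RE.search(text)
    if not proc or not system:
        return None
    return {
        "process_limit": int(proc.group(1)),
        "process_pending": int(proc.group(2)),
        "aio_max_nr": int(system.group(1)),
        "aio_nr": int(system.group(2)),
    }


info = parse_aio_warnings(WARNINGS)
process_exhausted = info["process_pending"] >= info["process_limit"]
system_headroom = info["aio_max_nr"] - info["aio_nr"]
print(process_exhausted, system_headroom)
```

On Linux the system-wide values can be compared against /proc/sys/fs/aio-max-nr and /proc/sys/fs/aio-nr. Here the system-wide counter is far below its maximum while the per-process figures are equal.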

=== Follow up ===

3. From the block patching actions:

=> regarding to the patched blocks we are here:

SQL> host

oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_adjust -0

3 patch files written

oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_check

--- Executing amdu to validate checkpoint target blocks ---

Thread 1 (348,1542): WRONG SEQ NUMBER

Thread 2 (189,5027): WRONG SEQ NUMBER

Thread 3 (182,5380): Valid Checkpoint

DO NOT PATCH WITH THE CURRENT PATCH FILES

oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_patch

--- Executing amdu to check for heartbeat ---

Patching Thread 1

kfracdc.ckpt.seq: 348

kfracdc.ckpt.blk: 1542

Patching Thread 2

kfracdc.ckpt.seq: 189

kfracdc.ckpt.blk: 5027

Patching Thread 3

kfracdc.ckpt.seq: 182

kfracdc.ckpt.blk: 5380

Save files ./facp_* to document what was patched

=> not sure why the './facp_adjust' command was used with zero (-0)?

SQL> alter diskgroup DG_ORA mount;

alter diskgroup DG_ORA mount

ERROR at line 1:

ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [],[], [], [], [], [], [], [], []

4. Patch level status of the instance:

Oracle Database 11g Patch Set 1 11.1.0.7.0

There are 2 products installed in this Oracle Home.

Interim patches (3) :

Patch 9549042 : applied on Thu Mar 01 09:25:24 WIT 2012

Patch 7272646 : applied on Thu Mar 01 08:48:05 WIT 2012

Patch 12419384 : applied on Thu Mar 01 08:42:47 WIT 2012

=> PSU 11.1.0.7.8

=> so we are on 11.1.0.7.8 here

Please see our latest analysis below.

Currently I can see two problems when we try to mount the corrupted diskgroup.

We get the known ORA-600, but also an error about an I/O kernel limitation during the block recovery.

I would like to avoid the I/O kernel limitation error during the recovery. Maybe the recovery can then complete and resolve the situation, instead of our patching blocks manually.

We know about the following bugs in connection with I/O kernel limitation errors (from note 868590.1):

"...

For 11gR1

The fix for unpublished Bug 6687381 is included in patch set 11.1.0.7

The fix for Bug 7523755 is available as overlay patch on Patch Set Update 11.1.0.7.10 ,

... apply patch set 11.1.0.7 and Patch 13343461 on top of that., then Apply fix for Bug 7523755...

Accordingly, I would suggest the following actions now:

Since you are currently on 11.1.0.7.8, you first need to apply PSU 11.1.0.7.10 in all Oracle homes (ASM & DB).

Afterwards, apply the fix for Bug 7523755.

Finally, after the patches are applied, restart the instance and try to mount the diskgroup again.

Verify whether the block recovery can now complete or whether we still fail with the same ORA-600.

At the very least, the I/O errors should no longer be reported.


↧

Collecting The Required Information For Support To Troubleshoot Oracle ASM/ASMLIB Issues.

March 14, 2017, 9:07 pm


If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

1) This document provides a list of steps for collecting the information that Support requires to troubleshoot and diagnose ASM/ASMLIB issues.

2) Obtain the most recent ASMLIB & ASM state from your current environment.

Solution

1) To check whether the ASMLIB API is correctly configured, please execute the following commands and provide us with the output (from each node if this is RAC):

$> cat /etc/*release
$> uname -a
$> rpm -qa |grep oracleasm
$> df -ha
$> /usr/sbin/oracleasm configure
$> /sbin/modinfo oracleasm

2) Check the discovery path (from each node if this is RAC):

$> /etc/init.d/oracleasm status 

	$> /usr/sbin/oracleasm-discover 

	$> /usr/sbin/oracleasm-discover 'ORCL:*'

3) Please check if the ASMLIB devices can be accessed (from each node if this is RAC):

$> /etc/init.d/oracleasm scandisks 

	$> /etc/init.d/oracleasm listdisks 

	$> /etc/init.d/oracleasm querydisk -p <each disk from previous output> 

	$> ls -l /dev/oracleasm/disks

	$> /sbin/blkid

4) Upload the following files from each node if this is RAC:

=)> /var/log/messages*
=)> /var/log/oracleasm
=)> /etc/sysconfig/oracleasm

5) Please show us the partition table (from each node if this is RAC):

$> cat /proc/partitions

6) If you are using multipath devices (mapper devices or emcpower) then show me the output of:

$> ls -l /dev/mpath/* 
	$> ls -l /dev/mapper/* 
	$> ls -l /dev/dm-*  
	$> ls -l /dev/emcpower*

Or if you have another multipath configuration then list the devices:

$> ls -l /dev/<multi path device name>*

7) Finally connect to your ASM instance, execute the next script and upload me the output file (from each node if this is RAC):

spool asm<#>.html
SET MARKUP HTML ON
set echo on
set pagesize 200
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
select 'THIS ASM REPORT WAS GENERATED AT: ==)> ' , sysdate " " from dual;
select 'HOSTNAME ASSOCIATED WITH THIS ASM INSTANCE: ==)> ' , MACHINE " " from v$session where program like '%SMON%';
select * from v$asm_diskgroup;
SELECT * FROM V$ASM_DISK ORDER BY GROUP_NUMBER,DISK_NUMBER;
SELECT * FROM V$ASM_CLIENT;
select * from V$ASM_ATTRIBUTE;
select * from v$asm_operation;
select * from gv$asm_operation;
select * from v$version;
show parameter asm
show parameter cluster
show parameter instance_type
show parameter instance_name
show parameter spfile
show sga
spool off
exit

Note: please compress those files in just one file (*.zip or *.tar) and upload it thru Metalink.

8) Also, if this is not a new ASM/ASMLIB implementation, please describe in detail what has changed since this last worked (OS patches, OS kernel upgrade, SAN migration, etc.)?
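Running the OS commands from steps 1 to 6 by hand on every RAC node is tedious, so they can be collected in one pass. The sketch below is only an illustration: the command list is copied from the steps above (commands with placeholders such as `<each disk from previous output>` are left out), while the function names and the `runner` hook are assumptions made for this example.

```python
import subprocess

# Commands copied from steps 1, 2, 3 and 5 above.
COMMANDS = [
    "cat /etc/*release",
    "uname -a",
    "rpm -qa | grep oracleasm",
    "df -ha",
    "/usr/sbin/oracleasm configure",
    "/sbin/modinfo oracleasm",
    "/etc/init.d/oracleasm status",
    "/usr/sbin/oracleasm-discover",
    "/usr/sbin/oracleasm-discover 'ORCL:*'",
    "/etc/init.d/oracleasm listdisks",
    "ls -l /dev/oracleasm/disks",
    "/sbin/blkid",
    "cat /proc/partitions",
]


def run_shell(command):
    """Run one command through the shell and return stdout followed by stderr."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr


def collect(commands, runner=run_shell):
    """Return one text report with each command followed by its output."""
    sections = []
    for command in commands:
        sections.append("$> " + command + "\n" + runner(command))
    return "\n".join(sections)


# Typical use on a node (the file name is a hypothetical choice):
#     with open("asmlib_diag.txt", "w") as out:
#         out.write(collect(COMMANDS))
```

The `runner` parameter exists so the report layout can be checked without touching a real system. Several of these commands need root, so the script would be run by the same user who would otherwise type them.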

Note: If you are installing UEK (Unbreakable Enterprise Kernel), the Oracle ASMLib kernel driver is already included in the kernel, so no driver package needs to be installed when using this kernel. The oracleasm-support and oracleasmlib packages still need to be installed from ULN (below):

Example:

# up2date -i oracleasm-support oracleasmlib oracleasm-`uname -r`

The above command installs only 2 packages (oracleasm-support and oracleasmlib):

[oracle@cstdb02 database]$ cat /etc/*release
Enterprise Linux Enterprise Linux Server release 5.7 (Carthage)
Oracle Linux Server release 5.7
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
[oracle@cstdb02 database]$ uname -a
Linux cstdb02.cstdi.com 2.6.32-200.20.1.el5uek #1 SMP Fri Oct 7 02:29:42 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[oracle@cstdb02 database]$ rpm -qa |grep oracleasm
oracleasm-support-2.1.7-1.el5
oracleasmlib-2.0.4-1.el5

This is because the driver package is now embedded in the UEK kernel:

[root@cstdb02 database]# modinfo oracleasm
filename: /lib/modules/2.6.32-200.20.1.el5/kernel/drivers/block/oracleasm/oracleasm.ko
description: Kernel driver backing the Generic Linux ASM Library.
author: Joel Becker <joel.becker@oracle.com>
version: 2.0.6
license: GPL
srcversion: BB13CDD65668CBDA51D0C25
depends:
vermagic: 2.6.32-200.20.1.el5 SMP mod_unload

↧

Understanding and fixing ORACLE ASM errors ORA-600 [kfcChkAio01] and ORA-15196.

March 14, 2017, 11:27 pm

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

Symptoms

Errors ORA-600 [kfcChkAio01] and ORA-15196 can be reported, after a NON-CLEAN dismount of the diskgroup, normally caused by a crash of the ASM instance.

When the ASM instance is restarted and the diskgroup is mounted, the following messages are reported in the alert.log of the ASM instance:

* Messages indicating recovery:

NOTE: starting recovery of thread=1 ckpt=201.9904 group=2
NOTE: starting recovery of thread=2 ckpt=139.4186 group=2

* The messages about the error ORA-600 and ORA-15196:

Tue Dec 16 03:00:51 2008
Errors in file /u01/app/oracle/product/10.2.0/asm/admin/+ASM/udump/+asm2_ora_15305.trc:
ORA-00600: internal error code, arguments: [kfcChkAio01], [], [], [], [], [], [], []
ORA-15196: invalid ASM block header [kfc.c:5552] [endian_kfbh] [2079] [2147483648] [1 != 0]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup FLASH

As a result, the diskgroup is dismounted. Subsequent mounts will report the same set of errors.

Bug 7589862 was created for this case.

Cause

To diagnose and identify the problem, several important pieces of information are dumped into the trace file generated by the errors.

The call stack on the trace

kfcChkAio <- kfcGet0 <- kfcGet1Priv <- kfcRcvGet <- kfcema <- kfrPass2 <- kfrcrv <- kfcMount <- kfgInitCache <- kfgFinalizeMount <-
kfgscFinalize <- kfgForEachKfgsc <- kfgsoFinalize <- kfgFinalize <- kfxdrvMount <- kfxdrvEntry

Functions on the call stack indicate the operations like mount diskgroup (kfxdrvMount) and Recovery (kfrcrv)

Description of the errors

ORA-00600: internal error code, arguments: [kfcChkAio01], [], [], [], [], [], [], []

kfcChkAio01 is signaled if the IO operation failed because of an invalid block.

ORA-15196: invalid ASM block header [kfc.c:5552] [endian_kfbh] [2079] [2147483648] [1 != 0]

This error is reported when a block fails validation. The values in this example mean:

endian_kfbh	The first field of the block header. This is the field that failed the validation.
2079	The ASM file number. Note that this value will be different in each case.
2147483648	The block number found in kfbh.block.blk, another field of the block header. Converted to hex it is 0x80000000; the bytes on the right hold the block number.
1 != 0	1 was the value found in the field named endian_kfbh, but 0 was the expected value.
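
The hex conversion mentioned for kfbh.block.blk can be sketched in Python. This is only an illustration, not Oracle code: the function name decode_block_blk is hypothetical, and the split into a top-bit T flag and a 31-bit NUMB is an assumption inferred from the "T=... NUMB=..." annotations that kfed prints in the dumps shown in this note.

```python
def decode_block_blk(value):
    """Split a kfbh.block.blk value into (T flag, NUMB).

    Assumption: T is the most significant bit of the 32-bit value and
    NUMB is the remaining 31 bits, as suggested by kfed's annotations.
    """
    t_flag = (value >> 31) & 0x1
    numb = value & 0x7FFFFFFF
    return t_flag, numb


# Values taken from the error message and from the kfed dump in this note.
for raw in (2147483648, 89088):
    t_flag, numb = decode_block_blk(raw)
    print(f"{raw} = 0x{raw:08x} -> T={t_flag} NUMB=0x{numb:x}")
# 2147483648 = 0x80000000 -> T=1 NUMB=0x0
# 89088 = 0x00015c00 -> T=0 NUMB=0x15c00
```

Under that assumption, the value 2147483648 reported in the error is simply block 0 with the T flag set.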

The trace file will have the information about the Cache Element and Buffer header affected by the error:

Start recovery for domain 2, valid = 0, flags = 0x4
NOTE: starting recovery of thread=1 ckpt=201.9904 group=2
NOTE: starting recovery of thread=2 ckpt=139.4186 group=2
CE: (0xc0000000153d0bb8) group=2 (FLASH) obj=2079 blk=0 (indirect)
hashFlags=0x0100 lid=0x0002 lruFlags=0x0000 bastCount=1
redundancy=0x11 fileExtent=0 AUindex=0 blockIndex=0
copy #0: disk=0 au=7492
BH: (0xc0000000153a54d0) bnum=322 type=rcv reading state=rcvRead chgSt=not modifying
flags=0x00000000 pinmode=excl lockmode=null bf=0xc000000015141000
kfbh_kfcbh.fcn_kfbh = -1.-1826817 lowAba=0.0 highAba=0.0
last kfcbInitSlot return code=null cpkt lnk is null

From the Cache Element, it is possible to identify the disk and allocation unit involved in the error:
copy #0: disk=0 au=7492
The alert.log makes it possible to identify the path of the disk. Go back through the file to the last time the diskgroup was mounted without errors and look for messages like:
NOTE: cache opening disk 0 of grp 2: FLASH_0000 path:/dev/rdsk/c29t1d4

* The third argument of error ORA-15196 indicates the ASM file number involved in the problem. This can also be validated with information printed in the trace file, by searching for the words "KSTDUMP: In-memory trace dump":

KSTDUMP: In-memory trace dump
TIME(usecs):SEQ# ORAPID SID EVENT OP DATA
========================================================================
88894E39:000E0839 16 255 10495 20 kfcMoveLRU: gn=2 fn=2079 indblk=218 src=5 dest=2 line=3201
88894E39:000E083A 16 255 10495 3 kfcAddPin: pin=267 kfc.c 3289 excl bnum=189 class=0
88894E3B:000E083B 16 255 10495 10 kfcbpInit: gn=2 fn=2079 indblk=219 pin=268 excl rcvRead kfr.c 5524
88894E3C:000E083C 16 255 10495 12 kfcFlush: bnum=190 kfc.c 3179
88894E3C:000E083D 16 255 10495 11 kfcMakeFree: bnum=190 flags=00000000 kfc.c 3180
88894E3D:000E083E 16 255 10495 19 kfcMoveBucket: [ gn=2 fn=2079 indblk=26 ] --> [ gn=2 fn=2079 indblk=219 ]

From this line:
88894E39:000E0839 16 255 10495 20 kfcMoveLRU: gn=2 fn=2079 indblk=218 src=5 dest=2 line=3201

gn=2 is the diskgroup number
fn=2079 is the ASM file Number
indblk=218 is the block where the indirect extent is stored



All the references on the In-memory trace dump will be for 256 blocks of the same file, in this case 2079.
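
A quick way to confirm that every reference in the in-memory trace dump points at the same file is to extract the gn/fn/indblk triples. The sketch below is a hypothetical helper (kst_references is not an Oracle tool) and assumes the trace lines keep the "gn=... fn=... indblk=..." layout shown above.

```python
import re

# Matches the "gn=2 fn=2079 indblk=218" triple as printed in the KSTDUMP lines.
KST_REF = re.compile(r"gn=(\d+) fn=(\d+) indblk=(\d+)")


def kst_references(dump_text):
    """Return (diskgroup number, file number, indirect block) tuples."""
    return [tuple(int(v) for v in m.groups()) for m in KST_REF.finditer(dump_text)]


sample = """\
88894E39:000E0839 16 255 10495 20 kfcMoveLRU: gn=2 fn=2079 indblk=218 src=5 dest=2 line=3201
88894E3B:000E083B 16 255 10495 10 kfcbpInit: gn=2 fn=2079 indblk=219 pin=268 excl rcvRead kfr.c 5524
88894E3D:000E083E 16 255 10495 19 kfcMoveBucket: [ gn=2 fn=2079 indblk=26 ] --> [ gn=2 fn=2079 indblk=219 ]
"""

refs = kst_references(sample)
file_numbers = {fn for _, fn, _ in refs}
print(file_numbers)  # {2079}
```

A single file number in the resulting set is consistent with the problem being confined to one ASM file.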

Validating the content of Allocation Unit, using kfed

Using kfed to dump the blocks on the Allocation Unit referenced on the Cache Element will show invalid data:

$ kfed read /dev/rdsk/c29t1d4 aunum=7492 blknum=0 ausz=1048576 | more

kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 66 ; 0x001: 0x42
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 89088 ; 0x004: T=0 NUMB=0x15c00
kfbh.block.obj: 11626 ; 0x008: TYPE=0x0 NUMB=0x2d6a
kfbh.check: 2182659237 ; 0x00c: 0x8218bca5
kfbh.fcn.base: 4293140479 ; 0x010: 0xffe41fff
kfbh.fcn.wrap: 4294967295 ; 0x014: 0xffffffff
kfbh.spare1: 4294967247 ; 0x018: 0xffffffcf
kfbh.spare2: 4294967295 ; 0x01c: 0xffffffff

All 256 blocks (0 through 255) will have similar content. The type KFBTYP_INVALID indicates that the content/type of the block is incorrect.

These errors occur because, during a file creation, ASM incorrectly commits the allocation of an indirect extent before pre-formatting the extent to contain valid blocks. If a crash occurs in the middle of this operation, recovery finds the blocks of the indirect extent unformatted (kfbh.type: 0 ; 0x002: KFBTYP_INVALID) and signals the errors already mentioned.

Solution

If the patch is not available, the block has to be manually modified. Please carefully follow the procedure described next.

1. Download file patch.zip and copy to any directory on the server running ASM.

(If the downloaded patch.sh gives any error for some reason, you can copy/paste the patch.sh script shown later in this document and run it after making the necessary modifications.)

The zip file contains two files:

empty_indirect.txt: the valid format of an indirect block.
patch.sh: a shell script used to patch the Allocation Unit containing the blocks with the incorrect format.

2. Edit file empty_indirect.txt to make the following changes:

The modifications to the file apply to a few fields of the block header.

kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 12 ; 0x002: KFBTYP_INDIRECT
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483648 ; 0x004: T=1 NUMB=0x0
kfbh.block.obj: 2901 ; 0x008: TYPE=0x0 NUMB=0xb55

kfbh.endian:

Possible values are:

1 for little endian processors
0 for big endian processors

Here is a list of the platforms:

PLATFORM_ID	PLATFORM_NAME	ENDIAN_FORMAT
4	HP-UX IA (64-bit)	Big
1	Solaris[tm] OE (32-bit)	Big
16	Apple Mac OS	Big
3	HP-UX (64-bit)	Big
9	IBM zSeries Based Linux	Big
6	AIX-Based Systems (64-bit)	Big
2	Solaris[tm] OE (64-bit)	Big
18	IBM Power Based Linux	Big
17	Solaris Operating System (x86)	Little
12	Microsoft Windows 64-bit for AMD	Little
13	Linux 64-bit for AMD	Little
8	Microsoft Windows IA (64-bit)	Little
15	HP Open VMS	Little
5	HP Tru64 UNIX	Little
10	Linux IA (32-bit)	Little
7	Microsoft Windows IA (32-bit)	Little
11	Linux IA (64-bit)	Little
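
If it is unclear which row of the table applies, the byte order of the host can be checked directly. The snippet below is a minimal sketch and assumes it is run on the same server (same processor architecture) as the ASM instance; the function name kfbh_endian_for_host is hypothetical.

```python
import sys


def kfbh_endian_for_host(byteorder=sys.byteorder):
    """Return the kfbh.endian value for a host: 1 for little endian, 0 for big endian."""
    return 1 if byteorder == "little" else 0


print(f"host is {sys.byteorder} endian -> kfbh.endian should be {kfbh_endian_for_host()}")
```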

kfbh.block.obj:

This is the number of the ASM file that was being created at the time of the failure. It is the third argument reported in error ORA-15196.

Because this example was on HP Itanium, with ASM file number 2079, the header of the block in file empty_indirect.txt should look like this:

kfbh.endian: 0 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 12 ; 0x002: KFBTYP_INDIRECT
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483648 ; 0x004: T=1 NUMB=0x0
kfbh.block.obj: 2079 ; 0x008: TYPE=0x0 NUMB=0xb55

When modifying files generated by kfed, it is required only to change the value on the left of the ';'.

2. Modify script patch.sh

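# Loop 1: overwrite all 256 blocks (0 through 255) of the Allocation Unit
# with the valid indirect block format stored in empty_indirect.txt.
i=0
while [ $i -le 255 ]
do
	echo "write block $i"
	kfed write ausz=1048576 blksz=4096 aunum=<AU#> blknum=$i dev=<path for ASM disk> text=/tmp/empty_indirect.txt
	i=`expr $i + 1`
done

# Loop 2: for blocks 1 through 255, set kfbh.block.blk to 2147483648 + block number.
i=1
while [ $i -le 255 ]
do
	echo "merge block $i"
	blk=`expr 2147483648 + $i`
	echo "kfbh.block.blk: $blk" > /tmp/merge
	kfed merge ausz=1048576 blksz=4096 aunum=<AU#> blknum=$i dev=<path for ASM disk> text=/tmp/merge
	i=`expr $i + 1`
done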

The code in file patch.sh makes two changes:

The first loop replaces all the blocks in the allocation unit with the valid format for an indirect block.
The second loop sets the correct value for field kfbh.block.blk, which includes the block number.

This script needs to be adapted for every particular case. The changes required are:

aunum=<AU#>.

The Allocation Unit number is reported in the trace file generated by errors ORA-600 and ORA-15196, in the CE and BH area. It is on the last line of the CE dump, just before the BH.

CE: (0xc0000000153d0bb8) group=2 (FLASH) obj=2079 blk=0 (indirect)
hashFlags=0x0100 lid=0x0002 lruFlags=0x0000 bastCount=1
redundancy=0x11 fileExtent=0 AUindex=0 blockIndex=0
copy #0: disk=0 au=7492

In this example it is Allocation Unit 7492.

dev=<path for ASM disk>

This is the full path of the ASM disk. The CE dump shows the disk number together with the Allocation Unit number. How to find the complete path of the disk by reviewing the alert.log of the ASM instance was explained earlier in this note. Using the v$asm* views is not an option because the diskgroup is dismounted.

ausz=1048576.

It is extremely important to specify the correct Allocation Unit size of the diskgroup.

For this example, the version of patch.sh will be:

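# Loop 1: overwrite all 256 blocks (0 through 255) of Allocation Unit 7492
# with the valid indirect block format stored in empty_indirect.txt.
i=0
while [ $i -le 255 ]
do
	echo "write block $i"
	kfed write ausz=1048576 blksz=4096 aunum=7492 blknum=$i dev=/dev/rdsk/c29t1d4 text=/tmp/empty_indirect.txt
	i=`expr $i + 1`
done

# Loop 2: for blocks 1 through 255, set kfbh.block.blk to 2147483648 + block number.
i=1
while [ $i -le 255 ]
do
	echo "merge block $i"
	blk=`expr 2147483648 + $i`
	echo "kfbh.block.blk: $blk" > /tmp/merge
	kfed merge ausz=1048576 blksz=4096 aunum=7492 blknum=$i dev=/dev/rdsk/c29t1d4 text=/tmp/merge
	i=`expr $i + 1`
done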
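
Because the script writes directly to the disk, it can be useful to list the exact kfed commands for review before anything is executed. The following Python sketch only builds the command strings and runs nothing. The function name patch_commands is hypothetical, and the kfed arguments are copied from the script in this note rather than from any other reference.

```python
def patch_commands(dev, aunum, ausz=1048576, blksz=4096,
                   template="/tmp/empty_indirect.txt", merge_file="/tmp/merge"):
    """Build (but do not run) the kfed commands issued by patch.sh.

    Returns two lists: the write commands for every block of the AU, and
    (merge file content, merge command) pairs for blocks 1 onwards.
    """
    blocks = ausz // blksz  # 256 blocks with a 1MB AU and 4096-byte blocks
    common = f"ausz={ausz} blksz={blksz} aunum={aunum}"
    writes = [
        f"kfed write {common} blknum={i} dev={dev} text={template}"
        for i in range(blocks)
    ]
    merges = [
        (f"kfbh.block.blk: {2147483648 + i}",
         f"kfed merge {common} blknum={i} dev={dev} text={merge_file}")
        for i in range(1, blocks)
    ]
    return writes, merges


writes, merges = patch_commands("/dev/rdsk/c29t1d4", 7492)
print(len(writes), "write commands,", len(merges), "merge commands")
print(writes[0])
print(merges[0][0], "->", merges[0][1])
```

Comparing this listing with the AU number and disk path taken from the trace file and alert.log is a cheap check against patching the wrong Allocation Unit.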

3. Execute script patch.sh

4. Validate that the blocks in the Allocation Unit now have the format of an indirect extent block

Continuing with the example used in this note:

kfed read ausz=1048576 blksz=4096 aunum=7492 blknum=0 dev=/dev/rdsk/c29t1d4 |more

The output should be like:

kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 12 ; 0x002: KFBTYP_INDIRECT
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483648 ; 0x004: T=1 NUMB=0x0
kfbh.block.obj: 2079 ; 0x008: TYPE=0x0 NUMB=0x81f
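
Checking 256 blocks by eye is error-prone, so the comparison can be scripted. The sketch below parses the text printed by kfed read and reports any header field that differs from the expected values of this example. The helper names are hypothetical, and the parser assumes the "name: value ; offset: hex" line layout shown in this note. For blocks 1 through 255 the expected kfbh.block.blk would be 2147483648 plus the block number.

```python
import re

# One header line looks like: "kfbh.type: 12 ; 0x002: KFBTYP_INDIRECT"
FIELD_LINE = re.compile(r"^(kf\w+(?:\.\w+)*):\s*(\d+)\s*;")

# Expected header of block 0 in this example (big endian host, ASM file 2079).
EXPECTED = {
    "kfbh.endian": 0,
    "kfbh.hard": 130,
    "kfbh.type": 12,
    "kfbh.datfmt": 1,
    "kfbh.block.blk": 2147483648,
    "kfbh.block.obj": 2079,
}


def parse_kfed_fields(text):
    """Return {field name: decimal value} for every field line in kfed output."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def mismatches(fields, expected):
    """Return {field: (found, expected)} for every field that differs."""
    return {name: (fields.get(name), want)
            for name, want in expected.items() if fields.get(name) != want}


sample = """\
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 12 ; 0x002: KFBTYP_INDIRECT
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483648 ; 0x004: T=1 NUMB=0x0
kfbh.block.obj: 2079 ; 0x008: TYPE=0x0 NUMB=0x81f
"""
print(mismatches(parse_kfed_fields(sample), EXPECTED))  # prints {} when every field matches
```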

5. After this, diskgroup should operate without problems.

↧

ORA-15196 Oracle ASM CASE STUDY: UNDERSTANDING ERROR ORA-15196

March 15, 2017, 1:24 am

This document explains error ORA-15196, including the details of each argument and suggestions for diagnosing the error, and closes with a case study based on a real problem reported by a customer.

Error Description

ORA-15196 is reported after a validation of an ASM metadata block has failed. The error will be reported in the following format:

ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]

Where the arguments indicate:

Argument	Meaning
1st	Function and line number in the code where the exception is raised
2nd	Field failing the validation
3rd	ASM object number stored in the block
4th	ASM block number stored in the block
5th	Value associated with the field referenced by argument 2
6th	Expected value for the field referenced by argument 2

Example:

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

Function and line number in the code where the exception is raised = kfc.c:7997
Field failing the validation = endian_kfbh
ASM object number stored in the block = 1
ASM block number stored in the block = 93
Value associated with field referenced by argument #2 = 211
Expected value for field referenced by argument #2 = 0

Arguments description

Function and line number in the code, where the exception is raised

In general terms, this argument will be the same in most cases, because the exception is always raised by the same routine.

#define kfbValid(data, len, type, bl) \
        kfbValidPriv(data, len, type, bl, __FILE__, __LINE__)

Field failing the validation

The ASM metadata is composed of many different structures, such as the file directory, disk directory, active change directory (ACD), etc., which are organized as files (ASM file numbers between 1 and 255). Each file is made of extents, which in turn are made of ASM blocks (4096 bytes). Each block has a generic block header (kfbh), and any of its fields can be validated.

kfbh.endian:                           0 ; 0x000: 0x00
kfbh.hard:                           130 ; 0x001: 0x82
kfbh.type:                             4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                           1 ; 0x003: 0x01
kfbh.block.blk:                       80 ; 0x004: T=0 NUMB=0x50
kfbh.block.obj:                        1 ; 0x008: TYPE=0x0 NUMB=0x1
kfbh.check:                   4268948098 ; 0x00c: 0xfe72fa82
kfbh.fcn.base:                         0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                         0 ; 0x014: 0x00000000
kfbh.spare1:                           0 ; 0x018: 0x00000000
kfbh.spare2:                           0 ; 0x01c: 0x00000000

A short description of each of the fields referenced above (file kf3.h):

kfbh.endian	endianness of the writer (big or little endian)
kfbh.hard	H.A.R.D. magic # and block size
kfbh.type	metadata block type (type of ASM metadata)
kfbh.datfmt	metadata block data format
kfbh.block.blk, kfbh.block.obj	block location of this block
kfbh.check	check value to verify consistency
kfbh.fcn.base, kfbh.fcn.wrap	change number of last change
kfbh.spare1	zero pad out to 32 bytes
kfbh.spare2	zero pad out to 32 bytes

The fields reported by this error across different SRs are:

endian_kfbh
obj_kfbl
hard_kfbh
type_kfbh
datfmt_kfbh
check_kfbh

ASM object number stored in the block

Every ASM metadata block belongs to a specific file associated with a specific ASM structure. That’s why ASM File numbers between 1 and 255 are used to identify the files storing those structures. The value on this field, references the ASM file number.

ASM File Number	ASM Metadata
1	File Directory
2	Disk Directory
3	Active Change Directory (ACD)
4	Continuing Operations Directory (COD)
5	Template Directory
6	Alias Directory
9	Attributes Directory
12	Staleness Directory

For other ASM metadata structures like PST, ATB, DISK HEADER, this field will have a static value 2147483648 (0x80000000)

ASM block number stored in the block

An ASM file allocates extents, which are associated with Allocation Units. Each extent is made of multiple ASM metadata blocks of 4096 bytes; with the default Allocation Unit size of 1MB, there are 256 blocks in each extent/AU.

The value stored on this field indicates the block number relative to a particular file. In this example, (93) is the block number, which will be stored in the first extent of the file. That extent will be allocated on a specific Allocation Unit of any of the disks in the diskgroup.

Value associated with field referenced by argument #2

This is the value found in the block for the field referenced in argument #2.

Expected value for field referenced by argument 2

This is the expected value for the field referenced by argument #2.

With the description of all the arguments for error ORA-15196, it should be possible to have a better understanding of the message:

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

In the previous example, the field failing the validation is endian_kfbh, and the block belongs to file 1 (File Directory). It is relative block 93, and the value found for endian_kfbh was 211 while the correct value should have been 0.
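
The same reading of the message can be done programmatically, which helps when many alert.log entries have to be compared. This is a minimal sketch: parse_ora_15196 and the dictionary keys are hypothetical names, the regular expression assumes the message layout shown in this document, and the file number table is the one listed earlier in this note.

```python
import re

ORA_15196 = re.compile(
    r"ORA-15196: invalid ASM block header "
    r"\[(?P<location>[^\]]+)\] \[(?P<field>[^\]]+)\] "
    r"\[(?P<obj>\d+)\] \[(?P<blk>\d+)\] "
    r"\[(?P<found>\d+) != (?P<expected>\d+)\]"
)

# ASM metadata file numbers as listed in this note.
ASM_METADATA_FILES = {
    1: "File Directory",
    2: "Disk Directory",
    3: "Active Change Directory (ACD)",
    4: "Continuing Operations Directory (COD)",
    5: "Template Directory",
    6: "Alias Directory",
    9: "Attributes Directory",
    12: "Staleness Directory",
}


def parse_ora_15196(message):
    """Return the arguments of an ORA-15196 message as a dict, or None if no match."""
    match = ORA_15196.search(message)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("obj", "blk", "found", "expected"):
        info[key] = int(info[key])
    info["metadata"] = ASM_METADATA_FILES.get(info["obj"], "not listed in this note")
    return info


msg = "ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]"
print(parse_ora_15196(msg))
```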

Diagnostics

Up to 10gR2, there are some bugs (patch included) related to this error.

5554692	Related to indirect extent allocation. Please read the bug description in WebIV, because not all cases of ORA-15196 are caused by this particular bug.
6027802	This was closed as not a bug, but was related to some IO issues caused by EMC Powerpath. Same type of data mismatch has been observed on other PP installations
6453944	ORA-15196 with ASM disks larger than 2TB using ASMLIB

Most occurrences of this error are associated with data changed outside of ASM. This includes:

Disks formatted at the OS level while in use by ASM
Disks assigned to a file system while in use by ASM
IO errors (stale writes)
Usage of third-party software

Once this error is reported, the diskgroup needs to be recreated. In some situations the diskgroup cannot be mounted; in others, any reference to the metadata (recursive or non-recursive) will signal the error and dismount the diskgroup.

Data Collection

To understand the extent of the problem and produce a correct diagnosis, it is essential to obtain the following data:

Alert.log and trace file associated with the error
First 300MB of the disk affected by the error

In the alert.log, review the line before the report of error ORA-15196:

WARNING: cache failed to read fn=1 blk=80 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

The line before the report of error ORA-15196 indicates the disk storing the block: from disk(s): 0.

To get the first 300MB:

$ dd if=<device path> of=/tmp/disk.dd bs=1048576 count=300

It may be necessary to provide a partial copy of other disks in the diskgroup.

Output from AMDU if available

AMDU will be explained in more detail in a different note (TBD).

This tool is one of the new features introduced with 11g. It reads the ASM disks and extracts information into different files. Those files contain a mapping of the ASM metadata and an image of the content of the disks; it is also possible to extract files from the diskgroup.

AMDU can extract the information even if the diskgroup is dismounted.

The mapping file is very important for the diagnostic of error ORA-15196. It has the specific location for each of the extents of each ASM metadata file.

Note 553639.1 is the placeholder for the AMDU binaries for some of the platforms.

Data Review

Always review the blocks surrounding the affected block. If more than one block has incorrect data (zeros), and they belong to different ASM structures (file directory, disk directory, etc.), the problem was most likely caused outside of ASM: disk reformatted, assigned to another volume manager, etc.

Use kfed to extract the content of the blocks.

Reviewing the trace file generated by the error.

The trace file will always print a dump of the ASM metadata block in memory, and also a short call stack. The output of the block is the same as that generated by kfed, which is readable by the user.

*** SERVICE NAME:() 2008-01-23 11:57:23.892
*** SESSION ID:(39.74) 2008-01-23 11:57:23.892
OSM metadata block dump:
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 80 ; 0x004: T=0 NUMB=0x50
kfbh.block.obj: 1 ; 0x008: TYPE=0x0 NUMB=0x1
kfbh.check: 4268948098 ; 0x00c: 0xfe72fa82
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
/* data removed on purpose */

After the OSM metadata block dump, the short call stack is printed:

----- Abridged Call Stack Trace -----

kfcReadBlk()+1276 kfcLoad()+2148 kffbScanNext()+252 kffbTableCb()+700 kfgTableCb()+1252 kffilTableCb()+240 qerfxFetch()+896 qersoFetch()+720 qerjotFetch()+184 opifch2()+8092 kpoal8()+4196 opiodr()+1548 ttcpip()+1284 opitsk()+1432 opiino()+1128 opiodr()+1548 opidrv()+896 sou2o()+80 opimai_real()+124 main()+152

Compare the data in the trace file with the data extracted from disk using kfed.

By comparing the block dumped in the trace file with the block on disk, it is possible to identify the exact cause of the validation failure. Every case will be different, but if the data stored on disk is zeros, always remember to validate the adjacent blocks as well. If more blocks report invalid data (zeros), this is an indication that the disk has been formatted outside ASM.

Example 1:

This is an example of a block with invalid data. The type of the block is KFBTYP_INVALID, which is reported when an incorrect type is stored.

kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 34 ; 0x001: 0x22
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 4290772992 ; 0x004: T=1 NUMB=0x7fc00000
kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 13879 ; 0x010: 0x00003637
kfbh.fcn.wrap: 512 ; 0x014: 0x00000200
kfbh.spare1: 978943 ; 0x018: 0x000eefff
kfbh.spare2: 2054913149 ; 0x01c: 0x7a7b7c7d

Example 2:

The entire block is filled with the byte 0xd4.

disk:0 au:2 block:253 file:1 physical extent:0 block:253
kfed read ausz=1048576 blksz=4096 aunum=2 blknum=253 dev=/dev/rdsk/c2t50060E8000C41384d2s6

kfbh.endian:	212 ; 0x000: 0xd4
kfbh.hard:	212 ; 0x001: 0xd4
kfbh.type:	212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:	212 ; 0x003: 0xd4
kfbh.block.blk:	3570717908 ; 0x004: T=1 NUMB=0x54d4d4d4 
kfbh.block.obj:	3570717908 ; 0x008: TYPE=0xd NUMB=0x4d4d4 
kfbh.check:	3570717908 ; 0x00c: 0xd4d4d4d4
kfbh.fcn.base:	3570717908 ; 0x010: 0xd4d4d4d4 
kfbh.fcn.wrap:	3570717908 ; 0x014: 0xd4d4d4d4 
kfbh.spare1:	3570717908 ; 0x018: 0xd4d4d4d4 
kfbh.spare2:	3570717908 ; 0x01c: 0xd4d4d4d4 
kfbtTraverseBlock: Invalid OSM block type 212
0000: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0020: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0040: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0060: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0080: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 00a0: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
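
When a copy of the disk has been taken with dd, blocks overwritten with a single repeated byte (zeros, or 0xd4 as in Example 2) can be spotted without reading every dump by hand. The sketch below is illustrative only: the function names are hypothetical, and it assumes the image starts at the beginning of the ASM disk, so that a block sits at byte offset aunum * ausz + blknum * blksz.

```python
def block_offset(aunum, blknum, ausz=1048576, blksz=4096):
    """Byte offset of a block inside a raw image taken from the start of the disk."""
    return aunum * ausz + blknum * blksz


def read_block(image_path, aunum, blknum, ausz=1048576, blksz=4096):
    """Read one ASM block from a dd image (for example /tmp/disk.dd)."""
    with open(image_path, "rb") as image:
        image.seek(block_offset(aunum, blknum, ausz, blksz))
        return image.read(blksz)


def fill_pattern(block):
    """Return the byte value if the whole block is one repeated byte, else None."""
    if block and block.count(block[0]) == len(block):
        return block[0]
    return None


# A block like the one in Example 2: 4096 bytes of 0xd4.
print(hex(fill_pattern(bytes([0xD4]) * 4096)))  # 0xd4
```

Running fill_pattern over the neighbouring blocks of the same Allocation Unit gives a quick indication of how far the overwritten region extends.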

CASE STUDY

The diskgroup, which held a copy of a database, had not been used for some months. For business reasons, that database had to be brought back into use. Mounting the diskgroup was possible, but when the database was mounted and the ASM metadata had to be read, error ORA-15196 was signaled and the diskgroup was dismounted.

The diskgroup was configured using external redundancy with a single disk and using the default Allocation Unit size of 1MB.

Data Collected

The messages in the alert.log:

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

The ASM block dumped in the trace file.

*** SESSION ID:(108.5) 2008-02-06 10:05:31.054
OSM metadata block dump:
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 7 ; 0x002: KFBTYP_ACDC
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 10752 ; 0x004: T=0 NUMB=0x2a00
kfbh.block.obj: 3 ; 0x008: TYPE=0x0 NUMB=0x3
kfbh.check: 1103194877 ; 0x00c: 0x41c16afd
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000

The AMDU output together with the first 300MB of the disk were collected.

Data Review

The error:

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

The error provides the following information:

o The field failing the validation is obj_kfbl

o The block belongs to file 1 (fn=1). File 1 is the File Directory.

o The block is block 256 (blk=256)

o The value for obj_kfbl found was 3 but the expected value should be 1.

File extents, allocation units and blocks in ASM are numbered from 0, and the block size is 4096 bytes. With the default AU size (1MB), each extent holds 256 blocks (0 through 255), so block 256 is the first block of the second extent.
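
The arithmetic behind that statement is a single integer division. The sketch below uses a hypothetical helper, locate_block, and assumes (as in this case) that each extent maps to exactly one Allocation Unit.

```python
def locate_block(block_number, au_size=1048576, block_size=4096):
    """Return (extent index, block index within the extent), both counted from 0."""
    blocks_per_extent = au_size // block_size  # 256 with the defaults
    return divmod(block_number, blocks_per_extent)


print(locate_block(256))  # (1, 0): second extent, first block
print(locate_block(93))   # (0, 93): first extent
```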

Although the diskgroup was mounted, any query referencing x$kffxp trying to get the extent mapping for file 1 failed. As a result, it was not possible to identify the AU used by block 256 from file 1 (the affected block).

Using AMDU

One of the files generated by AMDU is the mapping file (*.map) . That file contains the location on disk for every extent of the files stored in the diskgroup. The only record for file 1 was:

N0001 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152

This line indicates that for File 1 (F00000001), the first extent is stored in Allocation Unit 2 (A00000002) of disk 0 (D0000).
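
When the map file is large, the records for one file can be filtered with a few lines of Python. This is a sketch with a hypothetical helper name. Only the D (disk), A (Allocation Unit) and F (file) prefixes are interpreted, because those are the ones explained in this note; the other columns are parsed but their meaning is deliberately left open.

```python
def parse_map_record(line):
    """Split an AMDU map record into {prefix letter: integer value}."""
    record = {}
    for token in line.split():
        record[token[0]] = int(token[1:])
    return record


line = "N0001 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152"
rec = parse_map_record(line)
print(f"file {rec['F']}: disk {rec['D']}, allocation unit {rec['A']}")
# file 1: disk 0, allocation unit 2
```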

There was no other entry for file 1 in the mapping file, but AMDU was generating a core dump. It was discovered that AMDU was trying to read Allocation Unit 50.

One of the useful features of AMDU is the ability to dump the content of a complete extent of a particular file, redirecting the output into a text file.

$ amdu -diskstring '<path of device>' -dump '<diskgroup name>' -print 'DG.F1.X1.B0.C256'

The previous command will dump 256 blocks of File 1 Extent 1 starting at block 0.

The results of the last command were:

************************** PRINTING XYZ.F1.X1.B0.C2 **************************

-------------------------------- BLOCK 1 OF 2 --------------------------------

…………………………………………………………………

disk:0 au:50 block:0 file:1 physical extent:1 block:0

kfed read ausz=1048576 blksz=4096 aunum=50 blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd

At this point the conclusions were:

The ASM metadata shows that Allocation Unit 50 from disk 0 belongs to File 1.

-------------------------------- BLOCK 1 OF 2 --------------------------------

…………………………………………………………………

disk:0 au:50 block:0 file:1 physical extent:1 block:0

kfed read ausz=1048576 blksz=4096 aunum=50 blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd

If the block belongs to file 1, the value for kfbh.block.obj field should have been 1 together with the value for kfbh.type, which should have been KFBTYP_FILEDIR. But that was not the case:

The error ORA-15196:

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

The content dumped into the trace file was the same found on disk. The check validation failed because the data stored in the block was not part of the correct ASM metadata, in this case file directory.

The next step was to validate all the blocks in the same Allocation Unit. Those blocks belong to the same ASM metadata (KFBTYP_FILEDIR). One Allocation Unit is used exclusively by one unique file.

Example for block 1 from AU 50:

disk:0 au:50 block:1 file:1 physical extent:1 block:1

kfed read ausz=1048576 blksz=4096 aunum=50 blknum=1 dev=/emea/bde/home/users/jfiguer2/disk.dd

The solution

No backup was available for the database stored on the diskgroup, so the diskgroup had to be kept mounted. This was done by patching the ASM metadata, replacing the content of the first block of Allocation Unit 50 with valid data.

It was not possible to rebuild the real data for the block 0, so it was replaced with block

Additional patching was required, in order to adjust other fields in the block. Once the block was successfully patched, the diskgroup was mounted and queries on internal views did not dismount the diskgroup.

Opening the database reported errors while trying to identify one data file. The extent mapping for this file was stored in the patched block. Luckily that file was not relevant for the database. After setting the file offline, the database opened without errors.

Because it was not possible to guarantee the integrity of the diskgroup, it was recommended to take a backup of the database and rebuild the diskgroup.

↧

Database Restore after Server's storage crash

April 18, 2017, 2:03 am

So after a power failure the storage crashed for good and the server is "close to death"; there is no way services can be restored or anything. We have a daily backup of the entire system using Symantec Backup Exec. So we have the full storage restored to an external HD (where all the files for the DB are).

We got a new server and we are trying to replicate the Oracle database on it. We are using the cold backup restore way with no luck, because after all the steps, at alter database open resetlogs; we get a consistency error. The recover says to use backup controlfile; the problem is that there are no archivelogs in any of our backups.

Any suggestions?

We use:

Windows server 2003 R2

Oracle 11g

answer:

If the database was backed up just by using Symantec to back up the files in noarchivelog mode, and this wasn't integrated with a database shutdown, then none of your backups are going to be consistent. Symantec is no different from any other tool that backs up files at the OS level. You may be able to open your inconsistent database using some underscore parameters to bypass some of the checks the database is trying to do, or you may be able to get access to PRM-DUL ( http://www.parnassusdata.com/ ) or some other utility that can extract data from an unopened database, but you'll need to contact Oracle support for any of these. In any case, any data you do extract is likely to be inconsistent. If you know your application/database was doing nothing during the time the backups were taken, then forcing the database open may be an option, but you'll need to do a full export from it and load it into a new database to get back into a 'good' state.

Good luck - this is a situation no-one wants to be in.....

↧

extracted from data files

April 18, 2017, 2:13 am

Hello All,

DB version: 9i Rel 2

Which Oracle command line tools allow data to be extracted from data files that are NOT ATTACHED to a database?

Oracle Support has a tool called DUL, Data UnLoader, to extract data from data files. They charge a nice rate for the service as well. There are other groups that have such tools, and they charge for the service as well.

PRM-DUL is another dul option: http://www.parnassusdata.com/

↧

Test recovery with unsupported parameters

April 18, 2017, 2:16 am

> Our backup is overwritten with a corrupt backup too.

Restore from the next older backup

Most customers buy a Perpetual licence that does not terminate. What your organisation needs to be doing is to be paying for Support every year.

As regards the ORA-600s and recovery: your best bet is Oracle. There are third party vendors; search Google for "DUL" and "PRM-DUL". http://www.parnassusdata.com/

↧

ORACLE DBF Data Recovery ??

April 18, 2017, 2:18 am

Could someone please suggest whether, and how, it's possible to recover data from DBF files if the Oracle tablespace can't be run?

I have read a lot of mails, articles, etc. on how and when to do this, but just went from one error to another.

Here is what happened ...:

We have Oracle 10.2.0.1; the database is 90 GB in 5 files. The OS is Linux.

For some optimization testing we made a backup copy of our /HOME dir on Linux, but the guys who did the copy missed that the 5th DBF file was a link to the original file, which was outside of our /HOME, so the original file was not copied, just the link.

When we ran the backup copy, Oracle was working OK. We did some tests, including truncating one table, but after a few hours we decided to stick with our original DB.

When we then tried to run the original DB, it failed. The first error was "Oracle in shutdown ......etc...". Then we did a shutdown, unmount, and mount, and got the real error: "Inconsistency with log files for this 5th file".

So what happened is that the 5th file, because of the link, was never copied like the other 4 but was used first with our original DB and then also with our backup DB (of course the DBs were identical at that time), and because we did some table truncation, the 5th file was changed.

I know we lost data from our 5th file. So now the question is: how can we get data from our first 4 files, so that we at least get something out of it? Is there any software available for extracting "corrupted" data from DBF files?

It will run, but that is meant for dropping the tablespace after u open the database. U can also take the datafile offline and try to open the database, but I do not know if you will be able to read all tables in the tablespace, you should probably export the tables anyway.

Here are a few links that might help.

1. use DUL, I do not have experience with it, so u have to do some research.

http://www.parnassusdata.com/en/oracle-dul

2. Open the database by setting the hidden parameter _ALLOW_RESETLOGS_CORRUPTION=TRUE in the init.ora. There is no 100% guarantee that this will open the database. However, once the database is opened, you must immediately rebuild it. A database rebuild means: perform a full-database export, create a new and separate database, and import the recent export dump.

[http://www.dbspecialists.com/files/presentations/missing_logs.html]

3. Third, but it should be first, if u have Oracle Support, call them

↧

import data from a DBF to oracle

April 18, 2017, 2:24 am

≫ Next: Oracle Database Dead?

≪ Previous: ORACLE DBF Data Recovery ??

i want to import data from a DBF file to my database Oracle 10g.

how to do that?

In some very special cases (when you cannot restart the database instance that the DBF file belongs to), there are unloader tools such as DUL (contact Oracle Support) or PRM-DUL (http://www.parnassusdata.com/en/oracle-dul).

↧

Oracle Database Dead?

April 18, 2017, 2:27 am

≫ Next: Merge data from oracle .dbf datafile

≪ Previous: import data from a DBF to oracle

Hello. I'm not very knowledgeable when it comes to DBA work, but I've hit a snag and would appreciate a little advice. I have an Oracle XE install on my home computer for keeping track of business receipts and such. Recently I couldn't connect to APEX and started looking into it. When I tried to connect with SQL+, I would get the message:

ERROR:

ORA-01033: ORACLE initialization or shutdown in progress

So I stopped and started the DB, checked to see the listener service was going, and still have the issue. I checked my alert_xe.log file, and it's... huge. Attached at the bottom, I included a fragment of my log file that consists of the last recent timestamp.

I do have a script that runs on my computer that does an EXP every night on my schema, but if I try an IMP, I get the same message (which makes sense since my DB apparently isn't running).

What would be your suggestions? I wanted some advice before I uninstalled and re-installed then imported my dmp backups.

Thanks to any necromancers who can help me bring this back from the dead!

Dump file e:\oraclexe\app\oracle\admin\xe\bdump\alert_xe.log

Thu Oct 29 20:40:40 2009

ORACLE V10.2.0.1.0 - Production vsnsta=0

vsnsql=14 vsnxtr=3

Windows XP Version V5.1 Service Pack 2

CPU : 1 - type 586

Process Affinity : 0x00000000

Memory (Avail/Total): Ph:211M/511M, Ph+PgF:1012M/1247M, VA:1945M/2047M

Thu Oct 29 20:40:40 2009

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Picked latch-free SCN scheme 2

Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST

Autotune of undo retention is turned on.

IMODE=BR

ILAT =10

LICENSE_MAX_USERS = 0

SYS auditing is disabled

Thu Oct 29 20:40:52 2009

ksdpec: called for event 13740 prior to event group initialization

Starting up ORACLE RDBMS Version: 10.2.0.1.0.

System parameters with non-default values:

sessions = 49

__shared_pool_size = 104857600

__large_pool_size = 8388608

__java_pool_size = 4194304

__streams_pool_size = 0

spfile = E:\ORACLEXE\APP\ORACLE\PRODUCT\10.2.0\SERVER\DBS\SPFILEXE.ORA

sga_target = 146800640

control_files = E:\ORACLEXE\ORADATA\XE\CONTROL.DBF

__db_cache_size = 25165824

compatible = 10.2.0.1.0

db_recovery_file_dest = E:\oraclexe\app\oracle\flash_recovery_area

db_recovery_file_dest_size= 10737418240

undo_management = AUTO

undo_tablespace = UNDO

remote_login_passwordfile= EXCLUSIVE

dispatchers = (PROTOCOL=TCP) (SERVICE=XEXDB)

shared_servers = 4

job_queue_processes = 4

audit_file_dest = E:\ORACLEXE\APP\ORACLE\ADMIN\XE\ADUMP

background_dump_dest = E:\ORACLEXE\APP\ORACLE\ADMIN\XE\BDUMP

user_dump_dest = E:\ORACLEXE\APP\ORACLE\ADMIN\XE\UDUMP

core_dump_dest = E:\ORACLEXE\APP\ORACLE\ADMIN\XE\CDUMP

db_name = XE

open_cursors = 300

os_authent_prefix =

pga_aggregate_target = 41943040

PSP0 started with pid=3, OS id=2988

MMAN started with pid=4, OS id=2992

PMON started with pid=2, OS id=2984

DBW0 started with pid=5, OS id=3004

LGWR started with pid=6, OS id=3008

CKPT started with pid=7, OS id=3012

SMON started with pid=8, OS id=3016

RECO started with pid=9, OS id=3020

CJQ0 started with pid=10, OS id=3024

MMON started with pid=11, OS id=3028

MMNL started with pid=12, OS id=3032

Thu Oct 29 20:40:55 2009

starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

starting up 4 shared server(s) ...

Oracle Data Guard is not available in this edition of Oracle.

Thu Oct 29 20:41:00 2009

alter database mount exclusive

Thu Oct 29 20:41:05 2009

Setting recovery target incarnation to 2

Thu Oct 29 20:41:05 2009

Successful mount of redo thread 1, with mount id 2582644764

Thu Oct 29 20:41:05 2009

Database mounted in Exclusive Mode

Completed: alter database mount exclusive

Thu Oct 29 20:41:05 2009

alter database open

Thu Oct 29 20:41:06 2009

Beginning crash recovery of 1 threads

Thu Oct 29 20:41:06 2009

Started redo scan

Thu Oct 29 20:41:07 2009

Completed redo scan

9180 redo blocks read, 514 data blocks need recovery

Thu Oct 29 20:41:07 2009

Started redo application at

Thread 1: logseq 797, block 5520

Thu Oct 29 20:41:09 2009

Recovery of Online Redo Log: Thread 1 Group 2 Seq 797 Reading mem 0

Mem# 0 errs 0: E:\ORACLEXE\APP\ORACLE\FLASH_RECOVERY_AREA\XE\ONLINELOG\O1_MF_2_2MXYQN2G_.LOG

RECOVERY OF THREAD 1 STUCK AT BLOCK 28 OF FILE 2

Thu Oct 29 20:41:11 2009

Aborting crash recovery due to error 1172

Thu Oct 29 20:41:11 2009

Errors in file e:\oraclexe\app\oracle\admin\xe\udump\xe_ora_3080.trc:

ORA-01172: recovery of thread 1 stuck at block 28 of file 2

ORA-01151: use media recovery to recover block, restore backup if needed

ORA-1172 signalled during: alter database open...

Thu Oct 29 20:55:05 2009

db_recovery_file_dest_size of 10240 MB is 0.98% used. This is a

user-specified limit on the amount of space that will be used by this

database for recovery-related files, and does not reflect the amount of

space available in the underlying filesystem or ASM diskgroup.

Thu Oct 29 22:50:17 2009

WARNING: inbound connection timed out (ORA-3136)
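With an alert log this large, it can help to pull out just the ORA- error codes before reading the details. The following is a minimal Python sketch and not part of the original thread; the function name count_ora_errors and the zero-padding of codes to five digits (so that ORA-1172 and ORA-01172 are counted together) are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches codes such as "ORA-01172", "ORA-1172" or "(ORA-3136)".
ORA_CODE = re.compile(r"\bORA-(\d{1,5})\b")


def count_ora_errors(lines):
    """Count ORA- error codes in an iterable of log lines.

    Codes are normalised to five digits so that the short and
    zero-padded spellings of the same error share one counter.
    """
    counts = Counter()
    for line in lines:
        for match in ORA_CODE.finditer(line):
            counts["ORA-" + match.group(1).zfill(5)] += 1
    return counts
```

An open file object can be passed directly as the lines argument, so the log is read one line at a time rather than loaded into memory at once.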

While I would generally wholeheartedly concur with that, I believe that XE only allows you to install one instance per server. I haven't tried to see whether that is a licensing restriction or a technical restriction that is checked by the installer, but I'm not sure that you can get a second XE database running on the server.

And while there are tools that can recover data from old data files, given that the free version of Oracle is being used, those tools probably aren't an option-- I don't think Oracle's DUL is available unless you have a support contract which isn't available on XE and the competitors are pretty darned pricey. If the data isn't important enough to need a commercial version of the database with support, patches, etc., it's probably not important enough to license one of these tools.

You can also try PRM-DUL : http://www.parnassusdata.com/en/oracle-dul

↧

Merge data from oracle .dbf datafile

April 18, 2017, 2:29 am

≫ Next: Unable to open the database after data block corruption

≪ Previous: Oracle Database Dead?

Hi, i need a help.

I'm totally new to Oracle.

I currently have a database server which had an error, so I created another server that has been running for some time and has data in it too. Now I want to merge the data from the old database into the new database, which has exactly the same name, structure, and schema as the old one. The problem is that I only have the .dbf file from the old database, while the new one is running fine.

Can anyone help me? Thanks

In general it is not possible to work only with datafiles. You need to have a running database in order to be able to access the data and merge it.

If you have all datafiles and controlfiles from a consistent backup you should try to restore and restart this database.

If this is not possible, there are some specialized tools that work directly on datafiles, like PRM-DUL and DUL from specialized companies; Oracle Support can also help in this case.

PRM-DUL can be found here: http://www.parnassusdata.com/en/oracle-dul

↧

Unable to open the database after data block corruption

April 18, 2017, 2:34 am

≫ Next: My database is crash.I have only one datafile of billing.

≪ Previous: Merge data from oracle .dbf datafile

O.S-windows vista enterprise(Personal PC),

oracle 11.1.0.2.0

I am getting the following error, please help to fix:

I am unable to recover the data file and open the database.

SQL> conn sys as sysdba

Enter password:

Connected to an idle instance.

SQL> startup open

ORACLE instance started.

Total System Global Area 644468736 bytes

Fixed Size 1376520 bytes

Variable Size 251662072 bytes

Database Buffers 385875968 bytes

Redo Buffers 5554176 bytes

Database mounted.

ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-00604: error occurred at recursive SQL level 1

ORA-01578: ORACLE data block corrupted (file # 1, block # 665)

ORA-01110: data file 1: 'C:\ORACLE\ORADATA\GIRO\SYSTEM01.DBF'

Process ID: 6132

Session ID: 767 Serial number: 3

SQL> recover datafile 'C:\ORACLE\ORADATA\GIRO\SYSTEM01.DBF';

ERROR:

ORA-03114: not connected to ORACLE

SQL> shutdown immediate

ORA-24324: service handle not initialized

ORA-01041: internal error. hostdef extension doesn't exist

SQL>

Answer:

You cannot recover from media failure in noarchivelog mode. The best you can do is use a data recovery service like PRM-DUL (http://www.parnassusdata.com/en/oracle-dul) or Oracle's expensive service, unless you get lucky and Aman's suggestion works. Support might have some help since it is the system data file, but I wouldn't hold my breath.

This is why you take backups.

If all the data you need is character type, you might try the unix strings command.
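For reference, the idea behind the strings command can be sketched in a few lines of Python. This is an illustration only, not a recovery tool: the function name extract_strings is an assumption, the default minimum length of 4 mirrors the default of GNU strings, and the sketch only finds runs of single-byte printable ASCII. Numbers and dates, which Oracle stores in binary internal formats, and multi-byte character data will not show up.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII of at least min_len bytes.

    Roughly what the Unix strings command does: every maximal run
    of bytes in the range 0x20-0x7e that is long enough is kept,
    in the order it appears in the input.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

To try it on a datafile, read the file in binary mode and pass the bytes to extract_strings. The sketch holds the whole input in memory, so a large datafile would have to be processed in chunks, which is not shown here.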

↧