
MySQL Recover Table Structure From InnoDB Dictionary

When a table gets dropped, MySQL removes the respective .frm file. This post explains how to recover the table structure if the table was dropped.
 
You need the table structure to recover a dropped table from an InnoDB tablespace. The B+tree structure of an InnoDB index doesn’t contain any information about field types, and MySQL needs to know them in order to access records of an InnoDB table. Normally MySQL gets the table structure from the .frm file, but when MySQL drops a table the respective .frm file is removed too.
 
Fortunately there is one more place where MySQL keeps the table structure: the InnoDB dictionary.
 
The InnoDB dictionary is a set of tables where InnoDB keeps some information about the tables. I reviewed them in detail in a separate InnoDB Dictionary post earlier. After the DROP, InnoDB deletes the records related to the dropped table from the dictionary. So we need to recover the deleted records from the dictionary and then get the table structure.
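On a live server (MySQL 5.6 and later) you can peek at the same dictionary through INFORMATION_SCHEMA; a minimal illustration, assuming the sakila schema is installed:

# mysql -e "SELECT TABLE_ID, NAME, N_COLS FROM information_schema.INNODB_SYS_TABLES WHERE NAME = 'sakila/actor'"

Once the table is dropped this row disappears as well, which is why the deleted dictionary records have to be carved out of ibdata1.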
 
Compiling Data Recovery Tool
First, we need to get the source code. The code is hosted on LaunchPad.
 
 
# bzr branch lp:undrop-for-innodb
 
To compile it we need gcc, bison and flex.
 
 
# make
cc -g -O3 -I./include -c stream_parser.c
cc -g -O3 -I./include  -pthread -lm stream_parser.o -o stream_parser
flex  sql_parser.l
bison  -o sql_parser.c sql_parser.y
sql_parser.y: conflicts: 6 shift/reduce
cc -g -O3 -I./include -c sql_parser.c
cc -g -O3 -I./include -c c_parser.c
cc -g -O3 -I./include -c tables_dict.c
cc -g -O3 -I./include -c print_data.c
cc -g -O3 -I./include -c check_data.c
cc -g -O3 -I./include  sql_parser.o c_parser.o tables_dict.o print_data.o check_data.o -o c_parser -pthread -lm
cc -g -O3 -I./include -o innochecksum_changer innochecksum.c
 
 
Recover InnoDB Dictionary
Now let’s create the dictionary tables in database sakila_recovered. The data recovery tool comes with the structure of the dictionary tables.
 
 
 
 
# cat dictionary/SYS_* | mysql sakila_recovered
 
 
The dictionary is stored in the ibdata1 file. So, let’s parse it.
 
 
 
# ./stream_parser -f /var/lib/mysql/ibdata1
 
...
 
Size to process:                  79691776 (76.000 MiB)
Worker(0): 84.13% done. 2014-09-03 16:31:20 ETA(in 00:00:00). Processing speed: 7.984 MiB/sec
Worker(2): 84.21% done. 2014-09-03 16:31:20 ETA(in 00:00:00). Processing speed: 8.000 MiB/sec
Worker(1): 84.21% done. 2014-09-03 16:31:21 ETA(in 00:00:00). Processing speed: 4.000 MiB/sec
All workers finished in 2 sec
 
 
 
Now we need to extract the dictionary records from InnoDB pages. Let’s create a directory for table dumps.
 
 
 
# mkdir -p dumps/default
 
 
And now we can generate the table dumps and the LOAD DATA INFILE commands to load them. We also need to specify the -D option to c_parser because the records we need were deleted from the dictionary when the table was dropped.
 
SYS_TABLES
 
 
# ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page \
    -t dictionary/SYS_TABLES.sql \
    > dumps/default/SYS_TABLES \
    2> dumps/default/SYS_TABLES.sql
 
 
 
SYS_INDEXES
 
# ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page \
    -t dictionary/SYS_INDEXES.sql \
    > dumps/default/SYS_INDEXES \
    2> dumps/default/SYS_INDEXES.sql
 
 
SYS_COLUMNS
 
 
# ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000002.page \
    -t dictionary/SYS_COLUMNS.sql \
    > dumps/default/SYS_COLUMNS \
    2> dumps/default/SYS_COLUMNS.sql
 
 
 
and SYS_FIELDS
 
# ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000004.page \
    -t dictionary/SYS_FIELDS.sql \
    > dumps/default/SYS_FIELDS \
    2> dumps/default/SYS_FIELDS.sql
 
 
 
With the generated LOAD DATA INFILE commands it’s easy to load the dumps.
 
 
# cat dumps/default/*.sql | mysql sakila_recovered
 
 
Now we have the InnoDB dictionary loaded into normal InnoDB tables.
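A quick sanity check that the deleted dictionary records made it in; sakila tables serve as an example here (column names as defined in dictionary/SYS_TABLES.sql):

# mysql sakila_recovered -e "SELECT NAME, ID, N_COLS FROM SYS_TABLES WHERE NAME LIKE 'sakila/%'"

If the dropped table shows up in SYS_TABLES, sys_parser will be able to find it.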
 
Compiling sys_parser
sys_parser is a tool that reads the dictionary from tables stored in MySQL and generates a CREATE TABLE statement for a table.
 
To compile it we will need the MySQL libraries and development files. Depending on the distribution they may be in a -devel or -dev package. On a RedHat-based system you can find the right package with the command yum provides “*/mysql_config”. On my server it was the package mysql-community-devel.
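For example, the sequence on a RedHat-based box could look like this (the package name is whatever your repository provides; on Debian/Ubuntu it would be apt-get install libmysqlclient-dev instead):

# yum provides "*/mysql_config"
# yum install mysql-community-devel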
 
If all the necessary packages are installed, compilation boils down to a simple command:
 
 
# make sys_parser
/usr/bin/mysql_config
cc `mysql_config --cflags` `mysql_config --libs` -o sys_parser sys_parser.c
 
 
Recover Table Structure
Now sys_parser can do its magic; it prints the CREATE statement on standard output. Run it without arguments to see the usage:
 
 
# ./sys_parser
sys_parser [-h <host>] [-u <user>] [-p <passowrd>] [-d <db>] databases/table
 
 
We will use root as the username to connect to MySQL and qwerty as the password. The dictionary is stored in the SYS_* tables in database sakila_recovered. The table we want to recover is sakila.actor. InnoDB uses a slash ‘/’ as a separator between database name and table name, and so does sys_parser.
 
 
# ./sys_parser -u root -p qwerty  -d sakila_recovered sakila/actor
CREATE TABLE `actor`(
`actor_id` SMALLINT UNSIGNED NOT NULL,
`first_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_update` TIMESTAMP NOT NULL,
PRIMARY KEY (`actor_id`)
) ENGINE=InnoDB;
 
 
# ./sys_parser -u root -p qwerty  -d sakila_recovered sakila/customer
CREATE TABLE `customer`(
`customer_id` SMALLINT UNSIGNED NOT NULL,
`store_id` TINYINT UNSIGNED NOT NULL,
`first_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`email` VARCHAR(50) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci',
`address_id` SMALLINT UNSIGNED NOT NULL,
`active` TINYINT NOT NULL,
`create_date` DATETIME NOT NULL,
`last_update` TIMESTAMP NOT NULL,
PRIMARY KEY (`customer_id`)
) ENGINE=InnoDB;
 
 
There are a few caveats though.
 
InnoDB doesn’t store all the information you can find in the .frm file. For example, if a field is AUTO_INCREMENT, the InnoDB dictionary knows nothing about it, so sys_parser will not recover that property. If there were any field or table level comments, they’ll be lost.
sys_parser generates a table structure suitable for further data recovery. It could recover secondary indexes and foreign keys, but currently it does not.
InnoDB stores the DECIMAL type as a binary string, and it doesn’t store the precision of a DECIMAL field. So that information will be lost.
For example, table payment uses DECIMAL to store money.
 
 
 
# ./sys_parser -u root -p qwerty  -d sakila_recovered sakila/payment
CREATE TABLE `payment`(
        `payment_id` SMALLINT UNSIGNED NOT NULL,
        `customer_id` SMALLINT UNSIGNED NOT NULL,
        `staff_id` TINYINT UNSIGNED NOT NULL,
        `rental_id` INT,
        `amount` DECIMAL(6,0) NOT NULL,
        `payment_date` DATETIME NOT NULL,
        `last_update` TIMESTAMP NOT NULL,
        PRIMARY KEY (`payment_id`)
) ENGINE=InnoDB;
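If you know the original precision from the application, fix the generated statement before using it for data recovery. In stock sakila the column is DECIMAL(5,2), so assuming the recovered statement was saved to sakila/payment.sql (a hypothetical path), a quick fix could be:

# sed -i 's/`amount` DECIMAL(6,0)/`amount` DECIMAL(5,2)/' sakila/payment.sql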
 
 
Fortunately Oracle is planning to extend the InnoDB dictionary and finally get rid of .frm files. I salute that decision; having the structure in two places leads to inconsistencies.
 

 


MySQL UnDROP tool for InnoDB

The TwinDB data recovery toolkit is a set of tools that work with InnoDB tablespaces at a low level.
 
Incredible Performance of stream_parser
stream_parser is a tool that finds InnoDB pages in a stream of bytes. The stream can be a file, such as ibdata1 or a *.ibd file, or a raw partition.
stream_parser runs as many parallel workers as there are CPUs in the system. The performance of stream_parser is amazing! Compare how stream_parser outperforms page_parser on a four-CPU virtual machine running on my laptop:
 
 
 
# ./page_parser -f /dev/mapper/vg_twindbdev-lv_root -t 18G
Opening file: /dev/mapper/vg_twindbdev-lv_root
...
Size to process:               19327352832 (18.000 GiB)
1.00% done. 2014-06-23 03:03:48 ETA(in 00:18 hours). Processing speed: 17570320 B/sec
2.00% done. 2014-06-23 03:05:27 ETA(in 00:19 hours). Processing speed: 16106127 B/sec
3.00% done. 2014-06-23 03:02:11 ETA(in 00:16 hours). Processing speed: 19327352 B/sec
4.00% done. 2014-06-23 03:03:48 ETA(in 00:17 hours). Processing speed: 17570320 B/sec
...
 
 
So, it takes almost 20 minutes to parse an 18G partition.
 
 
Let’s check stream_parser
 
 
# ./stream_parser -f /dev/mapper/vg_twindbdev-lv_root -t 18G
 
...
Size to process:               19327352832 (18.000 GiB)
Worker(0): 1.91% done. 2014-06-23 02:51:41 ETA(in 00:00:56). Processing speed: 79.906 MiB/sec
Worker(2): 1.74% done. 2014-06-23 02:51:47 ETA(in 00:01:02). Processing speed: 72.000 MiB/sec
Worker(3): 3.30% done. 2014-06-23 02:51:15 ETA(in 00:00:30). Processing speed: 144.000 MiB/sec
Worker(1): 1.21% done. 2014-06-23 02:52:20 ETA(in 00:01:35). Processing speed: 47.906 MiB/sec
Worker(2): 5.38% done. 2014-06-23 02:51:11 ETA(in 00:00:25). Processing speed: 168.000 MiB/sec
Worker(3): 9.72% done. 2014-06-23 02:51:00 ETA(in 00:00:14). Processing speed: 296.000 MiB/sec
...
Worker(0): 88.91% done. 2014-06-23 02:52:06 ETA(in 00:00:02). Processing speed: 191.625 MiB/sec
Worker(0): 93.42% done. 2014-06-23 02:52:06 ETA(in 00:00:01). Processing speed: 207.644 MiB/sec
Worker(0): 97.40% done. 2014-06-23 02:52:06 ETA(in 00:00:00). Processing speed: 183.641 MiB/sec
All workers finished in 31 sec
 
 
So, 18 minutes versus 31 seconds. 34 times faster! Impressive, isn’t it?
 
c_parser Improvements
 
 
c_parser is a tool that reads one or more InnoDB pages, extracts records and stores them in tab-separated values dumps. An InnoDB page with user data doesn’t store information about the table structure, so you have to tell c_parser what fields you’re looking for. The command-line option -t specifies a file with a CREATE TABLE statement.
 
This is how it works. Here’s the CREATE statement (I took it from mysqldump):
 
# cat sakila/actor.sql
CREATE TABLE `actor` (
  `actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)
) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=utf8;
And now let’s fetch records of table actor from InnoDB pages:
 
# ./c_parser -6f pages-actor.ibd/FIL_PAGE_INDEX/0000000000001828.page -t sakila/actor.sql
-- Page id: 3, Format: COMPACT, Records list: Valid, Expected records: (200 200)
000000005313    970000013C0110  actor   1       "PENELOPE"      "GUINESS"       "2006-02-15 04:34:33"
000000005313    970000013C011B  actor   2       "NICK"  "WAHLBERG"      "2006-02-15 04:34:33"
000000005313    970000013C0126  actor   3       "ED"    "CHASE""2006-02-15 04:34:33"
...
000000005313    970000013C09D8  actor   199     "JULIA""FAWCETT"       "2006-02-15 04:34:33"
000000005313    970000013C09E4  actor   200     "THORA""TEMPLE"        "2006-02-15 04:34:33"
 
 
-- Page id: 3, Found records: 200, Lost records: NO, Leaf page: YES
 
 
MySQL 5.6 introduced a few format changes. Most of them were already supported; on top of that, c_parser fixes some bugs in the processing of temporal fields.
 
The new UnDROP tool for InnoDB is still no reason not to take backups :-), but at least you can be better armed if the inevitable happens.
 
How to Recover Table Structure
MySQL stores the table structure in a respective .frm file. When the table is dropped, the .frm file is gone. Fortunately InnoDB stores a copy of the structure in the dictionary. sys_parser is a tool that can read the dictionary and generate a CREATE TABLE statement. Check how you can Recover Table Structure From InnoDB Dictionary.
 
How to Install TwinDB Data Recovery Toolkit
Check out the source code from LaunchPad:
 
# bzr branch lp:undrop-for-innodb
Branched 33 revisions.
Or you can download an archive with the latest revision from download page.
 
Compile the source code, but first install the dependencies: make, gcc, flex, bison.
 
[root@twindb-dev undrop-for-innodb]# make
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe  -I./include -c stream_parser.c
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe  -I./include  -pthread -lm  stream_parser.o -o stream_parser
flex  sql_parser.l
bison  -o sql_parser.c sql_parser.y
sql_parser.y: conflicts: 6 shift/reduce
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe  -I./include -c sql_parser.c
lex.yy.c:3078: warning: ‘yyunput’ defined but not used
lex.yy.c:3119: warning: ‘input’ defined but not used
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe  -I./include -c c_parser.c
./include/ctype-latin1.c:359: warning: ‘my_mb_wc_latin1’ defined but not used
./include/ctype-latin1.c:372: warning: ‘my_wc_mb_latin1’ defined but not used
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe  -I./include -c tables_dict.c
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe  -I./include -c print_data.c
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe  -I./include -c check_data.c
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe   -I./include  sql_parser.o c_parser.o tables_dict.o print_data.o check_data.o -o c_parser -pthread -lm
cc -D_FILE_OFFSET_BITS=64 -Wall -g -O3 -pipe   -I./include -o innochecksum_changer innochecksum.c
[root@twindb-dev undrop-for-innodb]#
UPDATE:
 
The toolkit is tested on the following systems:
 
CentOS release 5.10 (Final) x86_64
CentOS release 6.5 (Final) x86_64
CentOS Linux release 7.0.1406 (Core) x86_64
Fedora release 20 (Heisenbug) x86_64
Ubuntu 10.04.4 LTS (lucid) x86_64
Ubuntu 12.04.4 LTS (precise) x86_64
Ubuntu 14.04 LTS (trusty) x86_64
Debian GNU/Linux 7.5 (wheezy) x86_64
32-bit operating systems are not supported
 
 
 
 
Indeed, an InnoDB index doesn’t carry information about the table structure. MySQL keeps the structure in .frm files and InnoDB stores the structure in the dictionary. When the table structure isn’t available from an external source (an old backup, an installation script, etc.), the possible ways to recover the structure are:
 
1) Recover from .frm files. There are some tools available. I prefer to create a dummy table, replace the .frm file and run SHOW CREATE TABLE (see the sketch after this list). This option however is useless when a DROP TABLE happens, because MySQL deletes the .frm file as well.
 
2) Recover the structure from the InnoDB dictionary. InnoDB stores almost all the necessary information about the table structure in the dictionary. When a user runs DROP TABLE, the respective records are deleted from the dictionary tables, so when recovering the dictionary tables you need to specify the -D option to c_parser (-D recovers records that are marked as deleted). The tables you need are SYS_TABLES, SYS_INDEXES, SYS_FIELDS and SYS_COLUMNS. Then load everything into a live instance of MySQL. The sys_parser tool from the toolkit reads the SYS_* tables from MySQL and generates the CREATE TABLE statement.
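Here is a rough sketch of the dummy-table trick from option 1 (the paths and the test schema are illustrative; MySQL may complain about a structure mismatch, but SHOW CREATE TABLE usually still reveals the definition):

# mysql -e "CREATE TABLE test.t1 (id int) ENGINE=InnoDB"
# /etc/init.d/mysql stop
# cp /path/to/backup/actor.frm /var/lib/mysql/test/t1.frm
# /etc/init.d/mysql start
# mysql -e "SHOW CREATE TABLE test.t1\G"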
 
 
 

Take image from corrupted hard drive

There are at least two cases when it makes sense to take an image from a corrupted hard drive as soon as possible: disk hardware errors and a corrupted filesystem. A faulty hard drive can give you just one chance to read a block, so there is no time for experiments. The picture is similar with corrupted filesystems: obviously something went wrong, and it’s hard to predict how the operating system will behave next second and whether it will cause even more damage.
 
Save disk image to local storage
Probably the best and fastest way is to plug the faulty disk into a healthy server and save the disk image locally:
 
# dd if=/dev/sdb of=/path/on/sda/faulty_disk.img  conv=noerror
Where /dev/sdb is the faulty disk and faulty_disk.img is the image on the healthy /dev/sda disk.
 
conv=noerror tells dd to continue reading even if the read() call exits with an error. Thus dd will skip bad areas and dump as much information from the disk as possible.
 
By default dd reads 512 bytes at a time, and that is a good value. Reading larger blocks would be faster, but a larger block will fail even if only a small portion of it is unreadable. An InnoDB page is 16k, so dd reads one page in 32 operations. It’s possible to extract information even if a page is partially corrupt. So, reading in 512-byte blocks seems optimal unless somebody convinces me otherwise.
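One refinement worth considering: adding sync to the conv option pads each failed 512-byte read with zeroes instead of silently shrinking the output, so all subsequent pages keep their correct offsets in the image. A sketch:

# dd if=/dev/sdb of=/path/on/sda/faulty_disk.img bs=512 conv=noerror,sync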
 
Save disk image to remote storage
If the faulty disk can’t be unplugged, the best (if not the only) way is to save the disk image on remote storage.
 
Netcat is an excellent tool for this purpose.
 
Start a server on the destination side:
 
# nc -l 1234 > faulty_disk.img
On the server with the faulty disk, take a dump and stream it over the network:
 
# dd if=/dev/sdb of=/dev/stdout  conv=noerror | nc a.b.c.d 1234
a.b.c.d is the IP address of the destination server.
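If the network is slow, the stream compresses well (unreadable areas come out as zeroes), so it can pay off to pipe it through gzip; a sketch, with the first command on the destination and the second on the source:

# nc -l 1234 | gunzip > faulty_disk.img
# dd if=/dev/sdb conv=noerror,sync | gzip -1 | nc a.b.c.d 1234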
 
Why dd is better for MySQL data recovery
There are a bunch of good file recovery and file undelete tools. However, they serve a slightly different purpose: in short, they try to reconstruct a file. They care about the file system.
 
For MySQL data recovery we don’t need files, we need data. An InnoDB page can be recognized by a short signature at the beginning of the page. At fixed places in every index page there are two internal records, infimum and supremum:
 
00000000  3f ff 6f 3d 00 00 11 e0  ff ff ff ff 00 00 11 e8  |?.o=............|
00000010  00 00 00 00 14 8f 8f 57  45 bf 00 00 00 00 00 00  |.......WE.......|
00000020  00 00 00 00 00 00 00 17  3b 58 00 af 1d 95 1d fb  |........;X......|
00000030  1d 3e 00 02 00 03 00 5a  00 00 00 00 00 00 00 00  |.>.....Z........|
00000040  00 00 00 00 00 00 00 00  00 01 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 08 01  |................|
00000060  00 00 03 00 8d 69 6e 66  69 6d 75 6d 00 09 03 00  |.....infimum....|
00000070  08 03 00 00 73 75 70 72  65 6d 75 6d 00 38 b4 34  |....supremum.8.4|
00000080  30 28 24 20 18 11 0b 00  00 10 15 00 d5 53 59 53  |0($ .........SYS|
00000090  5f 46 4f 52 45 49 47 4e  00 00 00 00 03 00 80 00  |_FOREIGN........|
If the header is good, then we know which table the page belongs to, how many records to expect, etc. Even if the rest of the page is heavily corrupted it’s possible to extract all the surviving records.
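The infimum signature also gives a quick way to estimate how much InnoDB data survived on an image before running the full toolkit; counting its occurrences is a rough index-page count (a sketch):

# strings -t d faulty_disk.img | grep -c infimum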
 
I had several cases where dd excelled.
Story #1.
 
It was a dying hard drive; InnoDB crashed all the time. When the customer figured out that the problem was with the disk, they tried to copy the MySQL files, but a simple copy failed. The customer then tried to read the files with some file recovery tool.
 
MySQL refused to start and reported checksum mismatches in the error log.
 
The customer provided the recovered files. The size of the ibdata1 file was reasonable, but stream_parser found only ~20MB of pages. ibdata1 was almost empty inside: just zeroes where the data should be. I doubt that even 40% of the data was recovered.
 
Then we took a dump of the disk and recovered the InnoDB tables from the image. First of all, ~200MB of pages were found. Many tables were 100% recovered, and around 80-90% of records were fetched from the corrupted tables.
 
Story #2.
 
A customer had dropped an InnoDB database. MySQL was running with innodb_file_per_table=ON, so the tables were in .ibd files that were deleted. It was a Windows server, and the customer used some tool to undelete the .ibd files from the NTFS filesystem. The tool restored the files, but the .ibd files were almost empty inside. The recovery rate was close to 20%.
 
Recovery from a disk dump gave around 70-80% of records.


RMAN-6026 RMAN-6023 During RESTORE Operation

Problem Description
-------------------
 
You are attempting to restore a database with Oracle Recovery
Manager (RMAN), using a 'set until time' parameter to do a point-in-time
recovery:
 
   run {
   set until time = '09-JUN-2000:10:30:00';
   allocate channel x type disk;
   restore database;
   recover database;
   }
 
However, this command fails with the following error stack:
 
   RMAN-03002: failure during compilation of command
   RMAN-03013: command type: restore
   RMAN-03002: failure during compilation of command
   RMAN-03013: command type: IRESTORE
   RMAN-06026: some targets not found - aborting restore
   RMAN-06023: no backup or copy of datafile 7 found to restore
   RMAN-06023: no backup or copy of datafile 6 found to restore
   RMAN-06023: no backup or copy of datafile 5 found to restore
   RMAN-06023: no backup or copy of datafile 4 found to restore
   RMAN-06023: no backup or copy of datafile 3 found to restore
   RMAN-06023: no backup or copy of datafile 2 found to restore
   RMAN-06023: no backup or copy of datafile 1 found to restore
   ...
 
A 'list backupset of database' command shows there to be multiple backups
of these files available.
 
 
Solution Description
--------------------
 
You have issued a 'resetlogs' after the last backup but before the time
given in the 'Until Time' clause of the RMAN script.
 
For instance:
- the last BACKUP of the database was taken on June 8, 2000.
- you opened the database with RESETLOGS on June 9, at 9:08 AM.
- then, due to complications, you decided to restore the database to a point in time on June 9, 10:30 AM.
 
Because you cannot roll forward through the resetlogs, RMAN cannot find any legitimate 
backups to restore from within this incarnation.
 
 
o The solution is to reset the database incarnation to the previous incarnation and
  set the 'until time' clause to a time before the resetlogs.
 
  
 
Explanation
-----------
 
You need to check the incarnation of the database:
 
rman>list incarnation of database;
 
RMAN-03022: compiling command: list
 
List of Database Incarnations
DB Key  Inc Key DB Name                 DB ID            CUR Reset SCN  Reset Time
------- ------- ------------------------------ ---------------- --- ---------- ----------
1       2         <backup name>    4094805351       NO  159907     28-apr-2000:10:24:43
1       461     <backup name>    4094805351       NO  220532     09-jun-2000:08:22:08
1       521     <backup name>    4094805351       YES 220693     09-jun-2000:09:08:20
 
If the current incarnation reset time falls between the last backup and
the time specified for 'set until time', then the recovery catalog acknowledges
that there are no backups that match the time criteria specified, and errors
out with RMAN-6023.
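Putting the two steps together, with the Inc Key values from the listing above (a sketch; run it while connected to the recovery catalog, and pick a time before the 09:08:20 resetlogs):

   reset database to incarnation 461;
   run {
   set until time = '09-JUN-2000:09:00:00';
   allocate channel x type disk;
   restore database;
   recover database;
   }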
 
 
Search Words
------------
 
BACKUP, COPY, RECOVER, TARGETS

Common Causes for RMAN-06023 and RMAN-06026

PURPOSE
This document describes known root causes for the error:
 
RMAN-06023: no backup or copy of datafile %s found to restore
RMAN-6023: no backup or copy of datafile %s found to restore
 
 
TROUBLESHOOTING STEPS
1) General description
 
RMAN-06023 "no backup or copy of datafile %d found to restore"
// *Cause: A datafile, tablespace, or database restore could not proceed
// because no backup or copy of the indicated file was found.
// It may be the case that a backup or copy of this file exists but
// does not satisfy the criteria specified in the user's restore
// operands.
 
The error RMAN-6023 means that RMAN cannot find a backup for that datafile in its repository. The RMAN repository is ALWAYS in the controlfile, but might be in an RMAN catalog database as well. So a good starting point for diagnosing the issue is the LIST BACKUP output.
Example :
RMAN> list backup of datafile 1;
 
--OR--
 
RMAN> list backup of archivelog sequence;
 
The backup needs to be marked AVAILABLE and there needs to be a channel allocated for the 'Device Type' reported in the 'LIST BACKUP' output.
Example :
BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ --------------------
4       Full    537.90M    DISK        00:00:25     17-JUN-2011 17:12:42
BP Key: 4 Status: AVAILABLE Compressed: NO Tag: <TAG_NAME>
Piece Name: /<DIR>/<DB_HOME>/dbs/<PIECE_NAME>
List of Datafiles in backup set 4
File LV Type Ckp SCN    Ckp Time             Name
---- -- ---- ---------- -------------------- ----
1       Full 975048     17-JUN-2011 17:11:39 /<DIR>/<DB_NAME>/datafile/<FILE_NAME>.dbf
 
However, if the above matches, then it might be one of the issues below that is causing the problem.
 
 
2) Backups available on disk / tape but not in the RMAN repository
It might be that RMAN cannot find any backup to restore from and none are shown in the 'LIST BACKUP' output,
but the backups are available on disk or tape.
In that case the backups have been removed from the RMAN repository (controlfile and/or catalog), but are still available on disk or tape.
There are configurations where this is intended behaviour.
 
Then the backups need to be cataloged again using the CATALOG command.
Backups can only be cataloged in 10g and later versions.
 
Example :
 
RMAN> catalog start with '<directory where the backups are>';
 
Afterwards the backups should be shown again in the 'LIST BACKUP' output
 
 
3) UNTIL TIME conversion
When SET UNTIL TIME is used, RMAN converts it to an UNTIL SCN. This is an estimate, as there is NO hard relation between a timestamp and an SCN. Especially when a timestamp is used which is close to the end time of the backup, this might be an issue. If the conversion generates an SCN which is BEFORE the end fuzziness of the datafiles in the backup, then the backup can NOT be used.
 
Example :
A backup starts at T1 (SCN=1000) and ends at T2 (SCN=1050); the backup can ONLY be used if the UNTIL SCN is 1050 or higher.
So if 'UNTIL TIME T2' is converted to SCN 1045, then this backup will NOT be used.
 
V$BACKUP_DATAFILE / RC_BACKUP_DATAFILE give more info on this.
CHECKPOINT_CHANGE# corresponds with T1.
ABSOLUTE_FUZZY_CHANGE# corresponds with T2. When ABSOLUTE_FUZZY_CHANGE# is NULL, it is the same as the CHECKPOINT_CHANGE#.
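A sketch of the check, using datafile 1 as an example:

SQL> select file#, checkpoint_change#, absolute_fuzzy_change#
     from v$backup_datafile
     where file# = 1
     order by checkpoint_change#;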
 
There is a known RMAN issue with an incorrect UNTIL TIME conversion due to skipping the TIME-part.
Bug 9128954 RMAN IS SELECTING WRONG BACKUP WITH 'SET UNTIL'
 
 
4) Inactive thread
A RAC database, but also a single-instance database, can have multiple threads enabled. Each thread will have its own set of redolog files and will archive them.
If a thread (instance) is idle, or has not been started for some time, then an RMAN DUPLICATE could fail on it, as it is looking for datafiles or archived redologs from the inactive instance which will not be there anymore.
 
In addition, RAC behaves differently from 11g onwards: if a user executes 'ALTER SYSTEM ARCHIVE LOG CURRENT' on node#1 while node#2 is stopped, the archivelog of thread#2 is NOT archived.
On 10g it is archived even if node#2 is stopped.
 
There is a known issue related to inactive/disabled threads, as handled in
    Bug 9044053 RMAN DUPLICATE CAUSES RMAN-06457 WHEN USING 'UNTIL SCN' UNTIL STARTUP NODE#2.
 
Best practice is to drop the threads which are not used at all anymore.
 
SQL> select thread#, status, enabled, instance
     from v$thread;
 
     select group#, thread# from v$log;
 
     alter database disable instance '<name>';
     alter database drop logfile group <group#>;
 
 
5) Incarnation issues
5a)  New incarnation added due to implicit resync
This issue is only relevant if a Flash | Fast Recovery Area (FRA) is being used.
 
If one or more restore and recovery attempts have been done for this database and the database has been opened with RESETLOGS,
then there might be archived redologs generated for this new incarnation of the database.
 
During the RMAN recovery phase, RMAN will do a catalog of all the files in the FRA, and will catalog the new archived redologs as well.
As they belong to another incarnation, the incarnation will be added (if not there) and will be marked as CURRENT.
The recovery will then look for archived redologs of a different incarnation than intended, as the CURRENT incarnation belongs to a prior RESETLOGS operation.
 
The best option is to remove from the FRA all the old files, e.g. flashback logs, archivelogs, backupsets, datafiles etc.,
belonging to an incarnation of a prior attempt.
 
 
Note 965122.1 RMAN RESTORE FAILS WITH RMAN-06023 BUT THERE ARE BACKUPS AVAILABLE
5b) Incarnations have the same RESETLOG_CHANGE#
There is an RMAN issue which causes different incarnations to have the same RESETLOGS_CHANGE#. So there are multiple records in V$DATABASE_INCARNATION / RC_DATABASE_INCARNATION
having the same RESETLOGS_CHANGE#. RMAN will lose track of which incarnation to use and might use an incorrect incarnation, resulting in unexpected errors.
 
Bug 5844752 RESTORE FAILS - CURRENT INCARNATION RESETLOGS SCN SAME AS PARENT INCARNATION
Note 727655.1 Despite Available Backups, Restore Fails with RMAN-03002:ORA-01180:Can Not Create Datafile 1
5c) Restoring from an none-current incarnation
The symptoms of this issue are closely related to the above issue (5b), but this time it is because the backups really belong to a different incarnation than the current one.
Possible errors are :
 
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 1 found to restore
 
OR
 
RMAN-06026: some targets not found - aborting restore
RMAN-06100: no channel to restore a backup or copy of datafile 1
 
OR
 
ORA-01180: can not create datafile 1
ORA-01110: data file 1: '+DATA/<DB_NAME>/<PATH>/<FILE_NAME>'
 
Check for more details :
RMAN RESTORE fails with RMAN-06023 or ORA-19505 or RMAN-06100 inspite of proper backups
Note 112248.1 RMAN-6026 RMAN-6023 During RESTORE Operation
Document 2038119.1 Resolving RMAN-06023 or RMAN-06025
 
 
6) Archivelog backup is missing
This issue is likely to occur during an RMAN duplicate.
An RMAN DUPLICATE will use an UNTIL SCN recovery on the auxiliary instance to recover the database.
The end point of the recovery is specified by the UNTIL SCN, which is derived from the last archived redolog on the TARGET.
If this archivelog is NOT backed up, then the recovery on the AUXILIARY, and therefore the DUPLICATE, will fail on it.
 
There are 2 solutions for this:
1. Make an explicit archivelog backup before you start the RMAN DUPLICATE. NOTE: this will only help if there are no additional archives created after the BACKUP and before the DUPLICATE starts.
2. Specify an explicit UNTIL clause, like UNTIL SEQUENCE (see the sketch after the query below). The following query might be useful in that case:
SQL> select thread#, max(sequence#) + 1 seq#, to_char(max(first_time), 'dd-mon-yyyy hh24:mi:ss') first_time
     from v$archived_log
     where backup_count >= 1
     group by thread#;
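The thread# and seq# returned can then be plugged into an explicit UNTIL SEQUENCE clause for the duplicate, e.g. (a sketch; the values and the auxiliary database name are placeholders):

run {
set until sequence <seq#> thread <thread#>;
duplicate target database to <aux_db>;
}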
 
RMAN RESTORE fails with RMAN-06023 or ORA-19505 or RMAN-06100 inspite of proper backups
 
 
7) Corrupted or missing backups on disk
RMAN automatically fails over to another backup if there is an issue with the backup piece during the restore.
Related errors might be :
 
ORA-19870: error reading backup piece
ORA-19587 error occurred reading %s bytes at block
ORA-19505: failed to identify file
 
If an older backup is found in the repository, then RMAN will continue the restore, but it will most likely require more (older) archived redologs during the recovery.
Especially when the restore is done on another host and not ALL the backups are accessible on that host, it may end up in a situation where RMAN tries to CREATE the datafile(s).
This is really an issue when it involves datafile 1, as that can NEVER be created: it is only created during a CREATE DATABASE.
 
Example :
creating datafile fno=1 name=+DATA/<DB_NAME>/<DIR>/<FILE_NAME>
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/28/2010 20:12:34
ORA-01180: can not create datafile 1
ORA-01110: data file 1: '+DATA/<DB_NAME>/<DIR>/<FILE_NAME>'
 
So double-check why the initial restore failed and resolve that issue, as the errors ORA-1180 or RMAN-6023 are just a result of the initial errors.
 
Note 429689.1 RMAN Restore Fails: ORA-19870 ORA-19587 ORA-27091 ORA-27067 RMAN-06026 RMAN-06023
Note 1271551.1 RMAN duplicate failing with ora-19870 ora-19612 then RMAN-06023
Note 1300586.1 RMAN-6026 RMAN-6023 when restoring to new host
 
 
8) Never backed up
8a) No backup
From 10g onwards, if RMAN starts a restore and uses a backup from before the CREATION_SCN of a datafile, then RMAN will automatically create the datafile.
This is the case when a datafile was added after the backup.
 
However, through 11g Release 1 this is still an issue during an RMAN DUPLICATE, which will then fail.
 
Note 782317.1 Rman-06023 encountered during duplicate to point in time after datafile was added
Note 135630.1 RMAN-6026 RMAN-6023 Restoring Database
Note 779558.1 Cannot restore incremental backups using tag when datafile has been added
 
The error can also occur if there is no valid backup available for the specific point in time.
8b) Plugged in Tablespace
The datafile is a plugged-in datafile, and there have been NO backups taken after the plug-in operation.
So you need to plug in the datafile again from its source.
 
Note 1453090.1 RMAN-06023 during restore of a plugged in datafile
 
 
 
 
 
9) Backup pieces are read-only
       The backup pieces are read-only at the operating system level.
       Make the RMAN backup pieces not only readable but also writable. See Bug 5412531 for other details.
 
       Fixed Version: 11gR1
 
       Applicable only for the IBM AIX on POWER Systems operating system.
 
 
 
10) Known Defects
For known issues reference bugs in the next articles:
 
Doc ID 48182.1 OERR: RMAN-6023 "no backup or copy of datafile %d found to restore"
 
Doc ID 48185.1 OERR: RMAN-6026 "some targets not found - aborting restore"
 

RMAN recover database fails RMAN-6025 - v$archived_log.next_change# is 281474976710655

SYMPTOMS
RMAN database recover failing with the following errors:
 
RMAN-06025: no backup of archived log for thread number with sequence number and starting SCN of string found to restore
Cause: An archived log restore could not proceed because no backup of the indicated archived log was found. It may be the case that a backup of this file exists but does not satisfy the criteria specified in the user's restore operands.
Action: None - this is an informational message. See message 6026 for further details.
 
 
 
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 05/17/2010 10:22:51
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of log thread 1 seq 2668 lowscn 1079975830 found to restore
 
The archivelog being requested is very old compared to the current log sequence.
Inspection of V$ARCHIVED_LOG shows:
 
CREATOR REGISTR  SEQ# FIRST_CHANGE# NEXT_CHANGE#        NAME
------- ------- -------------------- -------------------- -------------------- --------------------
RMAN     RMAN     2668     1079861514      281474976710655   /<path>/onlinelog/group_3.4508.718549667
To confirm if you have hit the same problem run this query against the controlfile :
 
SQL> select thread#, sequence#, creator, registrar, archived,
     to_char(first_change#), to_char(next_change#), name
     from v$archived_log
     where archived='NO';
 If you are using a catalog and the above query returns no rows then check the catalog:
 
 
SQL>select * from rc_database;
== note the dbinc_key of your target
 
SQL> select thread#, sequence#, creator, archived,
     to_char(first_change#), to_char(next_change#), name
     from rc_archived_log
     where archived='NO' and dbinc_key=<your dbinc_key>;
CAUSE
V$ARCHIVED_LOG or RC_ARCHIVED_LOG contains entries for online redo log files.
Online redo logs are temporarily cataloged by RMAN as 'archived logs' during
FULL media recovery; they are removed from the AL table when media
recovery completes successfully. One of the online redo logs will be 'current'
at the time so the SCN range for this log when cataloged will be low_SCN to 281474976710655 (FFFFFFFFFFFF(hex)).  When media recovery completes, these online entries in v$archived_log/rc_archived_log are deleted automatically by RMAN.  If media recovery fails and recovery is completed via SQLPlus, these entries in AL table are not removed.
 
During any subsequent recovery exercise, if the start SCN for recovery is
greater than the low_SCN of any of the cataloged online redo logs, the one with
an infinite next_SCN value will always be chosen as it will always fall within
the SCN range calculated for recovery - but this 'archived log' does not really exist so RMAN fails.
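The magic NEXT_CHANGE# is easy to verify: 281474976710655 is 2^48 - 1, i.e. FFFFFFFFFFFF in hex:

SQL> select to_char(281474976710655, 'XXXXXXXXXXXX') from dual;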
 
SOLUTION
There is no simple solution if a catalog database is not used.
 
Register the target in a catalog first and then proceed as shown below.
 
Take a backup of the recovery catalog BEFORE deleting the following rows:
 
SQL> select * from rc_database;
== Note the dbinc_key for your target database
 
SQL> delete from al
     where dbinc_key = <dbinc_key>
     and archived = 'N';
SQL> commit;
 

RMAN Command "RESTORE ARCHIVELOG ALL VALIDATE" Failing with RMAN-06025

SYMPTOMS
RMAN command 'RESTORE ARCHIVELOG ALL VALIDATE' failing with error:
 
RMAN-06025: no backup of archived log for thread number with sequence number and starting SCN of string found to restore
Cause: An archived log restore could not proceed because no backup of the indicated archived log was found. It may be the case that a backup of this file exists but does not satisfy the criteria specified in the user's restore operands.
Action: None - this is an informational message. See message 6026 for further details.
 
 
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/13/2012 11:38:39
RMAN-06026: some targets not found - aborting restore
RMAN-06025: no backup of log thread 1 seq 1 lowscn 1164241 found to restore
RMAN-06025: no backup of log thread 1 seq 58 lowscn 1164240 found to restore
RMAN-06025: no backup of log thread 1 seq 57 lowscn 1164238 found to restore
 
 
CAUSE
- The issue occurs when no catalog database is used, or no catalog connection is made.
 
- The "ALL" keyword in the "RESTORE ARCHIVELOG ALL VALIDATE" statement does not take the backup retention policy into account but tries to access all archived redo logs referenced in the RMAN repository.
 
 
 
 
RMAN> RESTORE ARCHIVELOG ALL VALIDATE;
 
Starting restore at 13-JAN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=32 devtype=DISK
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/13/2012 11:38:39
RMAN-06026: some targets not found - aborting restore
RMAN-06025: no backup of log thread 1 seq 1 lowscn 1164241 found to restore
RMAN-06025: no backup of log thread 1 seq 58 lowscn 1164240 found to restore
RMAN-06025: no backup of log thread 1 seq 57 lowscn 1164238 found to restore
RMAN-06025: no backup of log thread 1 seq 56 lowscn 1162285 found to restore
RMAN-06025: no backup of log thread 1 seq 55 lowscn 1162276 found to restore
RMAN-06025: no backup of log thread 1 seq 54 lowscn 1162274 found to restore
......
RMAN-06025: no backup of log thread 1 seq 3 lowscn 360493 found to restore
RMAN-06025: no backup of log thread 1 seq 2 lowscn 360490 found to restore
RMAN-06025: no backup of log thread 1 seq 1 lowscn 349389 found to restore
RMAN-06025: no backup of log thread 1 seq 21 lowscn 349388 found to restore
RMAN-06025: no backup of log thread 1 seq 20 lowscn 349382 found to restore
RMAN-06025: no backup of
RMAN>
 
 
 
SOLUTION
Option 1: Using only the controlfile, no catalog database used:
 
Use the syntax below from the RMAN command prompt for validating archivelog backups.
 
 
RMAN> restore archivelog from time='SYSDATE-<recovery window days>' validate;
 
Suppose you have set a recovery window of 7 days; then use the command below.
 
 
RMAN> show RETENTION POLICY;
 
RMAN configuration parameters are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
 
RMAN> restore archivelog from time='SYSDATE-7' validate;
 
Starting restore at 13-JAN-12
using channel ORA_DISK_1
 
channel ORA_DISK_1: starting validation of archive log backupset
channel ORA_DISK_1: reading from backup piece <path>\BACKUPSET\2012_01_13\O1_MF_ANNNN_TAG20120113T111853_7JZKG717_.BKP
channel ORA_DISK_1: restored backup piece 1
piece handle=<path>\BACKUPSET\2012_01_13\O1_MF_ANNNN_TAG20120113T111853_7JZKG717_.BKP tag=TAG20120113T111853
channel ORA_DISK_1: validation complete, elapsed time: 00:00:02
channel ORA_DISK_1: starting validation of archive log backupset
channel ORA_DISK_1: reading from backup piece <path>\BACKUPSET\2012_01_13\O1_MF_ANNNN_TAG20120113T112054_7JZKL018_.BKP
channel ORA_DISK_1: restored backup piece 1
piece handle=<path>\BACKUPSET\2012_01_13\O1_MF_ANNNN_TAG20120113T112054_7JZKL018_.BKP tag=TAG20120113T112054
channel ORA_DISK_1: validation complete, elapsed time: 00:00:03
Finished restore at 13-JAN-12
 
RMAN>
 
 
Option 2: If you have a recovery catalog configured, connect to the target database and the recovery catalog, and "RESTORE ARCHIVELOG ALL VALIDATE;" works without errors.
 
 
 
rman target / catalog <username>/<password>@<catalog_tns>
 
Recovery Manager: Release 10.2.0.4.0 - Production on Fri Jan 13 11:37:11 2012
 
Copyright (c) 1982, 2007, Oracle. All rights reserved.
 
connected to target database: <dbname> (DBID=<dbid>)
connected to recovery catalog database
 
RMAN> RESTORE ARCHIVELOG ALL VALIDATE;
 
Starting restore at 13-JAN-12
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=35 devtype=DISK
 
channel ORA_DISK_1: starting validation of archive log backupset
channel ORA_DISK_1: reading from backup piece <path>\BACKUPSET\2012_01_13\O1_MF_ANNNN_TAG20120113T111853_7JZKG717_.BKP
channel ORA_DISK_1: restored backup piece 1
piece handle=<path>\BACKUPSET\2012_01_13\O1_MF_ANNNN_TAG20120113T111853_7JZKG717_.BKP tag=TAG20120113T111853
channel ORA_DISK_1: validation complete, elapsed time: 00:00:02
channel ORA_DISK_1: starting validation of archive log backupset
channel ORA_DISK_1: reading from backup piece <path>\BACKUPSET\2012_01_13\O1_MF_ANNNN_TAG20120113T112054_7JZKL018_.BKP
channel ORA_DISK_1: restored backup piece 1
piece handle=<path>\BACKUPSET\2012_01_13\O1_MF_ANNNN_TAG20120113T112054_7JZKL018_.BKP tag=TAG20120113T112054
channel ORA_DISK_1: validation complete, elapsed time: 00:00:03
Finished restore at 13-JAN-12
 
RMAN>

RMAN-06054 While performing Duplicate

SYMPTOMS
 
 
RMAN duplicate fails with error: 
 
RMAN-06054 media recovery requesting unknown archived log for thread 1 with sequence XXXX and starting SCN of XXXXXX
 
 
RMAN-06054: media recovery requesting unknown archived log for thread string with sequence string and starting SCN of string
Cause: Media recovery is requesting a log whose existence is not recorded in the recovery catalog or target database control file.
Action: If a copy of the log is available, then add it to the recovery catalog and/or control file via a CATALOG command and then retry the RECOVER command. If not, then a point-in-time recovery up to the missing log is the only alternative and database can be opened using ALTER DATABASE OPEN RESETLOGS command.
 
 
 
CHANGES
 
 
CAUSE
The duplicate process takes the SCN (for UNTIL SCN) from the most recent backup controlfile checkpoint_change#. If the SCN of the controlfile
in the backup is higher than the SCN of the most recent archived log in the backup, then RMAN recovery will be looking for archives to recover
the database as per the UNTIL SCN condition, and it fails.
 
Until 11.2.0.3, it was using the highest SCN of the archived logs from the backup.
 
 
Bug 21868720 - RMAN-06054 ON BACKUP BASED DUPLICATE"
 
 
SOLUTION
* Ensure the controlfile checkpoint SCN is less than the last archivelog SCN in the backup (see the sketch below).
 
* While taking a backup for duplicate purposes, disable CONTROLFILE AUTOBACKUP and perform a complete database backup; this will make sure the controlfile is backed up in the middle of the backup.
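A rough way to compare the two SCNs on the target before taking the backup (a sketch; the first query shows the current controlfile checkpoint SCN, the second the highest SCN covered by backed-up archivelogs):

SQL> select checkpoint_change# from v$database;

SQL> select max(next_change#) from v$archived_log where backup_count >= 1;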
 
Option 1:
 
=======
 
Apply patch 21868720
 
 
 
Option 2:
 
======
 
Workaround : Connect to catalog while doing duplicate.
 
 
 
Option 3:
======
 
While doing the backup, if we disable controlfile autobackup, the controlfile backup will not be taken after the archivelog backup. Thus the archivelog SCN will be higher than the controlfile SCN.
 
1.  Turn off controlfile autobackup:
 
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP OFF;
2.  Perform database/archivelog backup:
 
run
{
allocate channel a1 device type disk format '/<path>/<datafilename>/%U';
allocate channel a2 device type disk format '/<path>/<datafilename>/%U';
allocate channel a3 device type disk format '/<path>/<datafilename>/%U';
allocate channel a4 device type disk format '/<path>/<datafilename>/%U';
allocate channel a5 device type disk format '/<path>/<datafilename>/%U';
allocate channel a6 device type disk format '/<path>/<datafilename>/%U';
backup as compressed backupset database plus archivelog;
}
3.  With the above backup, perform the duplicate.  
 
Option 4:
======
 
1.  While performing the duplicate, use a SET UNTIL clause, i.e. "SET UNTIL SCN" or "SET UNTIL SEQUENCE" explicitly. For example:
 
rman auxiliary /
run
{
 
set until SCN 1234455;
allocate auxiliary channel a1 device type disk;
allocate auxiliary channel a2 device type disk;
allocate auxiliary channel a3 device type disk;
allocate auxiliary channel a4 device type disk;
DUPLICATE DATABASE TO dev1 BACKUP LOCATION '/<path>/<datafilename>/';
}
 
Option 5:
=======
 
If we don't want to re-execute the duplicate command, complete the process manually using information in the following note:  
 
Manual Completion of a Failed RMAN Duplicate (Note 360962.1)
 
NOTE:  You may need to manually make more archivelog files available to satisfy the UNTIL SCN condition.

Can't Recover a Database Saved With the OEM Backup Facility RMAN-06169

APPLIES TO:
Oracle Database - Enterprise Edition - Version 8.1.6.0 to 10.1.0.2 [Release 8.1.6 to 10.1]
Information in this document applies to any platform.
SYMPTOMS
RMAN restore fails with RMAN-6169
RMAN-06169: could not read file header for datafile 1 error reason 15
 
CHANGES
 
 
CAUSE
 From kcv.c
/* if the controlfile is not a backup then the controlfile checkpoint count
** stored in the file header should be less than or equal the one in the
** controlfile. If it is not, then controlfile is an old restored image
** copy */
if (KCCFHX(&hx->kcvhxft.kccftbch)->kccfhxfh.kccfhtyp != KCCTYPBC
&& fhp->kcvfhccc > fe.kccfecpc)
{ /* wrong checkpoint count */
hx->kcvhxerr = KCVHXCPC;
goto got_header;
}
 
#define KCVHXCPC 15 /* wrong checkpoint count */
 
An incorrect version of the controlfile was used:
the controlfile was CURRENT but from an older COLD backup
than the restored datafiles.
 
SOLUTION
Restore a more recent controlfile.

Errors on rman blockrecover attempt RMAN-06026, RMAN-06023

APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.3 and later
Information in this document applies to any platform.
***Checked for relevance on 09-Feb-2011***
 
 
RMAN-06026: some targets not found - aborting restore
Cause: Some of the files specified for restore could not be found. Message 6023, 6024, or 6025 is also issued to indicate which files could not be found. Some common reasons why a file can not be restored are that there is no backup or copy of the file that is known to recovery manager, or there are no backups or copies that fall within the criteria specified on the RESTORE command, or some datafile copies have been made but not cataloged.
Action: The Recovery Manager LIST command can be used to display the backups and copies that Recovery Manager knows about. Select the files to be restored from that list.
 
 
 
 
SYMPTOMS
========
For the purposes of this document, the following fictitious environment is used as an example to describe the procedure:
 
Target:  DB Name: O1RMTNYP
========
 
 
 
Oracle 10.2.0.3
Sun Sparc Solaris
 
Errors when trying to do blockrecover:
 
RMAN> BLOCKRECOVER CORRUPTION LIST;
 
RMAN-03002: failure of blockrecover command at 08/17/2009 08:10:24
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 37 found to restore
 
CAUSE
There is corruption in the backup of the datafile
 
RMAN> list backup of datafile 37;
 
List of Backup Sets
===================
 
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ --------------------
47555 Incr 0 797.68M DISK 00:12:41 08-AUG-2009 17:19:53
BP Key: 47554 Status: AVAILABLE Compressed: YES Tag: TAG20090808T170710
Piece Name: /< backup directory >/O1RMTNYP_LEVEL_0_DB_BKUP_17:05:46-08-08-2009_1_47626.rman
List of Datafiles in backup set 47555
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- -------------------- ----
37 0 Incr 1386008078 08-AUG-2009 17:07:14 < directory >/O1RMTNYP/dbf02/rmtd18.dbf
 
 
 
SQL> select set_stamp from v$backup_piece where handle like '/< backup directory >/O1RMTNYP_LEVEL_0_DB_BKUP_17:05:46-08-08-2009_1_47626%';
 
SET_STAMP
----------
694372032
 
SQL> select * from v$backup_corruption where file# = 37 and set_stamp ='694372032';
 
RECID STAMP SET_STAMP SET_COUNT PIECE# FILE# BLOCK#
---------- ---------- ---------- ---------- ---------- ---------- ----------
BLOCKS CORRUPTION_CHANGE# MAR CORRUPTIO
---------- ------------------ --- ---------
164 694372793 694372032 47626 1 37 178472
90 0 YES ALL ZERO
 
165 694372793 694372032 47626 1 37 215037
5 0 YES ALL ZERO
 
SOLUTION
You cannot recover blocks 178472 and 215037 using that piece, as that backup set doesn't contain a good block image of the blocks you want to repair. You can repair blocks in file# 37 other than the above using the piece listed.
CORRUPTION LIST means all the blocks listed in v$database_block_corruption. If there are more than those two blocks, then you have to specify them individually in the BLOCKRECOVER command (enhanced in 11gR1 so that one can specify a range of blocks in the syntax; see the sketch below). If there are only those two blocks, then the user has to provide a backup (and catalog it in RMAN) that was not taken using 'set maxcorrupt'.
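For reference, once a backup that contains good images of those blocks is available and cataloged, specifying the blocks individually looks like this (a sketch):

RMAN> blockrecover datafile 37 block 178472, 215037;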
 

"RMAN-07517: The File Header Is Corrupted Restoring Files on Another Server

APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.4 to 12.2 BETA1 [Release 11.2 to 12.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.
 
 
SYMPTOMS
Using RMAN to backup/restore datafiles from one server to another (same OS, same endianness) fails with:
 
 
RMAN-07517: Reason: The file header is corrupted
 
 
 
CAUSE
The source production database has a 32KB tablespace and an initialisation parameter defined for db_32k_cache_size,
but the target pfile didn't have the parameter db_32k_cache_size defined.
 
SOLUTION
Set db_32k_cache_size in the pfile/spfile of the target.
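For example, on the target instance (the 64M size is only an illustration; size the cache appropriately for your system):

SQL> alter system set db_32k_cache_size = 64M scope=spfile;

When the target uses a plain pfile, add the line db_32k_cache_size=64M to it instead.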

Rman Backup Failed with RMAN-600 [6000]

APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Information in this document applies to any platform.
***Checked for relevance on 22-Feb-2013***
SYMPTOMS
The backup of the datafiles completes, but after it we see a failure showing:
 
 
RMAN-00600: internal error, arguments [string] [string] [string] [string] [string]
Cause: An internal error in recovery manager occurred.
Action: Contact Oracle Support Services.
 
 
 
DBGANY: 612 TEXTNOD = sys.dbms_backup_restore.setRmanStatusRowId(rsid=>0, rsts=>0);
DBGANY: 613 TEXTNOD = end;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00601: fatal error in recovery manager
RMAN-03004: fatal error during execution of command
RMAN-00600: internal error, arguments [6000] [] [] [] []
 
After the failure, the next step in the backup script is not executed; this could be an autobackup or a backup of archivelogs.
 
CAUSE
Taking a closer look at the RMAN output obtained from the backup we see earlier errors:
 
channel dsk1: starting piece 2 at 15-MAY-11
RMAN-03009: failure of backup command on dsk2 channel at 05/15/2011 07:29:07
ORA-19504: failed to create file "/<DIR>/<DB_NAME>/<FILE_NAME>"
ORA-27040: file create error, unable to create file
Linux-x86_64 Error: 2: No such file or directory
channel dsk2 disabled, job failed on it will be run on another channel
RMAN-03009: failure of backup command on dsk3 channel at 05/15/2011 07:29:07
ORA-19504: failed to create file "/<DIR>/FILE_NAME"
ORA-27040: file create error, unable to create file
Linux-x86_64 Error: 2: No such file or directory
channel dsk3 disabled, job failed on it will be run on another channel
 
There could be other errors, but in this case the output directory for writing backup pieces did not exist on disk for at least one of the channels.
 
This causes the channel to be disabled and the job to be executed on another available channel.
 
Then, once the backup is complete, the channel fails to be auto-allocated to execute the next backup command.
 
The issue has been raised on unpublished Bug:
Bug 6314281: RMAN-600: [6000] [] [] [] [] DOING "BACKUP RECOVERY AREA"
 
Fixed->12.2
 
 
SOLUTION
 
Workaround :-
------------------
 
run {
set autolocate off;
<... rest of backup commands ...>
}
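A more concrete version of the workaround (a sketch; the channel and the format string are placeholders, and the output directories must of course exist):

run {
set autolocate off;
allocate channel d1 device type disk format '/<DIR>/<DB_NAME>/%U';
backup database plus archivelog;
}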
 
 
 
Check for the availability of a one-off patch using the link for Patch 6314281.

RMAN RESTORE fails with RMAN-00600 [8064]

APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.1 to 10.2.0.4 [Release 10.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
SYMPTOMS
When an RMAN backup is restored to a new database with the same datafile paths as the production server, the RESTORE fails with RMAN-00600 [8064]. The error stack looks like the following:
 
RMAN> restore database preview;
Starting restore at 03-JUN-09
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=415 devtype=DISK
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00601: fatal error in recovery manager
RMAN-03012: fatal error during compilation of command
RMAN-03028: fatal error code for command restore : 600
RMAN-00600: internal error, arguments [8064] [1] [<path>\<db_unique_name of restored db>\SYSTEM01.DBF]
[<path>\<db_unique_name of prod>\SYSTEM01.DBF] []
 
CHANGES
The path of datafiles in controlfile of production server is different from the path stored in recovery catalog:
 
SQL> select name from v$datafile ;
 
NAME
------------------------------------------------
<path>\<db_unique_name of prod>\SYSTEM01.DBF
<path>\<db_unique_name of prod>\UNDOTBS01.DBF
<path>\<db_unique_name of prod>\UNDOTBS02.DBF
 
.............
.............
 
RMAN Connected WITHOUT recovery catalog
=================================
 
RMAN> report schema;
Report of database schema
 
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 2048 SYSTEM *** <path>\<db_unique_name of prod>\SYSTEM01.DBF
2 2048 UNDO *** <path>\<db_unique_name of prod>\UNDOTBS01.DBF
................
................
 
RMAN Connected WITH recovery catalog
=========================
 
RMAN> report schema;
Report of database schema
 
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 2048 SYSTEM YES <path>\<db_unique_name of previous clone activity>\SYSTEM01.DBF
2 2048 UNDO YES  <path>\<db_unique_name of previous clone activity>\UNDOTBS01.DBF
 
................
................
 
RMAN> list incarnation;
 
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 1 <db_unique_name prod> <dbid> CURRENT 1 15/AUG/08
 
The view RC_DATAFILE will also show a different path than the controlfile (V$DATAFILE) of the production database.
 
CAUSE
While cloning the database, when an RMAN RESTORE is attempted to a new location while connected to the recovery catalog, the catalog is resynced with the new path as a result of the SWITCH DATAFILE ALL command.
 
If the DBID/DBNAME is not changed, a backup of the production database (original database) will still complete successfully even if the locations in the controlfile (V$DATAFILE) and the recovery catalog (RC_DATAFILE or REPORT SCHEMA) differ. This is because RMAN reads File# instead of NAME for backup activities.
 
A subsequent cloning attempt with RMAN connected to the recovery catalog on a new server, this time with the original path structures as present in the production database controlfile (V$DATAFILE), will fail with RMAN-00600 [8064], because RMAN will try to restore the files to the path stored in the recovery catalog (RC_DATAFILE) rather than the path in V$DATAFILE of the production database.
 
Cloning a production database should not be attempted with RMAN connected to the recovery catalog, particularly when the path/name of the datafiles is changed with the SET NEWNAME ... SWITCH DATAFILE ... commands. This resyncs the catalog with the new datafile path/name, which makes further restores difficult. This is also mentioned in the Oracle documentation:
 
 
Now, the problem here is that we need to update the recovery catalog with the correct file names as present in the controlfile of the production database to avoid any confusion.
 
SOLUTION
Option 1:
=======
 
UNREGISTER / REGISTER the database and catalog all backups :
 
RMAN> UNREGISTER DATABASE ;
 
RMAN> REGISTER DATABASE ;
 
RMAN> CATALOG START WITH <backup_path>\ ;
 
Option 2:
======
 
Change any property of the datafiles, so that RMAN resyncs the schema information into the recovery catalog. For example, resizing a datafile will resync the controlfile information into the recovery catalog:
 
SQL> ALTER DATABASE DATAFILE <datafile_number> RESIZE <old_size+1> ;
 
In the above SQL, the size of the datafile is increased by just 1 byte.
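 
A concrete illustration (the file number and sizes below are invented for the example): look up the current size in bytes, then resize one byte larger so the change is propagated to the catalog on the next resync:
 
SQL> SELECT file#, bytes FROM v$datafile WHERE file# = 4;
     -- suppose this returns 524288000 (500 MB)
SQL> ALTER DATABASE DATAFILE 4 RESIZE 524288001;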
 
Now, connect to RMAN with the recovery catalog and resync the catalog:
 
RMAN> CONNECT TARGET /
 
RMAN> CONNECT CATALOG <UN>/<PWD>@<TNS>
 
RMAN> RESYNC CATALOG ;
 
RMAN> REPORT SCHEMA ;  # It should now show the correct path
 
A further RESTORE should succeed after updating the recovery catalog information with the correct path/name of the datafiles.

flashback query failed with ORA-01555?


If you deleted or updated data by mistake in Oracle and try to recover it with a flashback query, the query may fail with an ORA-01555 error:

 

SQL> l
  1  declare
  2  cursor c is select * from testt2 as of scn 5385449;
  3  begin
  4  for i in c loop
  5  null;
  6  end loop;
  7* end;
SQL> /
declare
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number  with name "" too small
ORA-06512: at line 4

 

If you have no backup, then you cannot recover the data in this case.

 

We provide a better flashback service that can recover part of the deleted data. For example:

 

SQL> set serveroutput on;

SQL> exec better_flashback_table_save('TEST2','TESTT2',2843925,'MYTVSAVE3');
table TEST2.TESTT2 @ scn 2843925 find   5568 rows , copied to  TEST2.MYTVSAVE3
 
PL/SQL procedure successfully completed.
 
 
The service steps:
 
1. First, expand your undo_retention parameter:
 
alter system set undo_retention=86400;
 
Then expand all undo datafiles:
 
alter database datafile 'undofile' resize BIGGER_SIZE;
 
 
This first step prevents undo extents from being reused by the system.
 
2. We provide the procedure better_flashback_table_query to check how many rows can be recovered (see the illustrative call after this list).
 
3. We provide the procedure better_flashback_table_save, which recovers the data and stores it in a new table.
 
4. We can also take advantage of the PRM-DUL undelete function to help you recover data. Please check https://youtu.be/hIYutqNcVBI
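 
A hypothetical invocation of better_flashback_table_query, assuming its signature mirrors the better_flashback_table_save call shown earlier (owner, table name, SCN):
 
SQL> set serveroutput on;
SQL> exec better_flashback_table_query('TEST2','TESTT2',2843925);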
 
 
 
We provide the above as a service. Contact service@parnassusdata.com
 
 
 

devos ransomware/malware encrypted oracle database datafiles

$
0
0

An Oracle database was encrypted by the Devos ransomware.
 
Users can recover data from the encrypted datafiles using PRM-DUL. Reference video: https://youtu.be/jOT6k-KF8Hg

 

 

Steps to troubleshoot Oracle RAC clusterware GIPCHA connection failure

PURPOSE
How do you troubleshoot when a local process (non-bootstrap client) fails to CONNECT to a peer process using the GIPCHA communication protocol?
 
 
 
TROUBLESHOOTING STEPS
Let's assume that process P1 runs on node-1, process P2 runs on node-2, and P1 attempted to connect to P2 (assume the connection string is gipcha://node-2:foo) using the GIPCHA communication protocol at timestamp t1, and this CONNECT attempt did not succeed. Root-cause the CONNECT failure as follows.
 
1. Check whether the P2 process and the GIPCDaemon process on node-2 were alive at timestamp t1. If they were not alive, then P1 tried to CONNECT to a non-existent process, and that is why the CONNECTION request failed.
 
2. If the P2 process is alive but GIPCD-2 (the GIPCDaemon that runs on node-2) was not, then we need to investigate why the GIPCDaemon was not spawned by the AGENT on node-2.
 
3. If the GIPCDaemon was not running on node-1, then we need to investigate why it was not spawned by the AGENT on node-1.
 
4. If P1, P2, GIPCD-1 (the GIPCDaemon that runs on node-1) and GIPCD-2 (the GIPCDaemon that runs on node-2) are all alive, then try the steps below.
 
(a) Investigate whether the P2 process was listening at connection string "gipcha://node-2:foo" at timestamp t1. The log files of GIPCD-2 will help identify this.
 
When the P2 process creates the listen endpoint gipcha://node-2:foo, you will see log messages in GIPCD-2 that look something like the following.
 
Example log of GIPCDaemon:
 
2016-05-13 03:15:22.887 :GIPCDCLT:889784064:  gipcdClientThread: req from local client of type gipcdmsgtypeCreateName, endp 0000000000000105
2016-05-13 03:15:22.887 :GIPCDCLT:889784064:  gipcdClientCreateName: Received type(gipcdmsgtypeCreateName), endp(0000000000000105), len(1008), buf(0x7fcd2c0ea1b8):[hostname(node-2), portstr: (foo), haname(d470-975b-4d7b-5095), retStatus(gipcretSuccess)]
2016-05-13 03:15:22.887 :GIPCDCLT:889784064:  gipcdInitPortEntry: port foo entry initialized. port memid : 0000050800000000
2016-05-13 03:15:22.887 :GIPCDCLT:889784064:  gipcdAddPortEntry: added port foo entry to shared memory. port mid: 0000050800000000client memid 000003c800000000, client 5983 incarnation 2
 
 
If node-2 deletes/closes the same listen endpoint, then you will see the log messages below.
 
 
Example log of GIPCDaemon:
 
2016-05-13 03:16:42.962 :GIPCDCLT:889784064:  gipcdClientDeleteName: Received type(gipcdmsgtypeDeleteName), endp(0000000000000105), len(1008), buf(0x7fcd2c0fb728):[hostname(node-2), portstr: (foo), haname(d470-975b-4d7b-5095), retStatus(gipcretSuccess)]
2016-05-13 03:16:42.962 :GIPCDCLT:889784064:  gipcdClientDeleteName: Name deleted(foo)
2016-05-13 03:16:42.962 :GIPCDCLT:889784064:  gipcdDelPortEntry: port foo entry deleted from shared memory. port memid: 0000050800000000
 
In the GIPCDaemon logs, if you see only LISTEN endpoint creation messages but no deletion/close messages before timestamp t1, then the P2 process was successfully listening at timestamp t1.
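 
As a quick way to scan for these entries, something like the following can be used; the log path is an assumption (pre-12.2 layout), so adjust it for your Grid Infrastructure home and release:
 
# grep -E 'gipcdAddPortEntry|gipcdDelPortEntry' <GRID_HOME>/log/node-2/gipcd/gipcd.log | grep 'port foo'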
 
(b) If the P2 process was successfully listening at the LISTEN endpoint, then we need to investigate whether GIPCD-1 successfully resolved the P2 connection string.
 
When the P1 process attempts to CONNECT to the P2 process, it needs to resolve the P2 connection string (gipcha://node-2:foo), and for that it sends a LOOKUP request to GIPCD-1.
 
Example log GIPCDaemon:
 
2016-05-13 03:15:52.204 :GIPCDCLT:889784064:  gipcdClientThread: req from local client of type gipcdmsgtypeLookupName, endp 0000000000000105
2016-05-13 03:15:52.204 :GIPCDCLT:889784064:  gipcdClientLookupName: Received type(gipcdmsgtypeLookupName), endp(0000000000000105), len(1008), buf(0x7fcd2c0e9898):[hostname(node2), portstr: (foo), haname(), retStatus(gipcretSuccess)]
 
If GIPCD-1 fails to resolve it locally, then it forwards the lookup request to GIPCD-2, and to forward it GIPCD-1 needs to CONNECT to GIPCD-2. If GIPCD-1 is already connected to GIPCD-2, it sends the LOOKUP request immediately; if not, it first connects to GIPCD-2 and then sends the lookup request, as below.
 
2016-05-13 03:15:52.204 :GIPCDCLT:889784064:  gipcdEnqueueMsgForNode: Enqueuing for NodeThread (gipcdReqTypeLookupName)
2016-05-13 03:15:52.204 :GIPCDNDE:887682816:  gipcdProcessClientRequest: Dequeued req for host (node-2), type(gipcdReqTypeLookupName), id (0000000000000105, 0000000000000000), cookie 0x7fcd2c2c5738
2016-05-13 03:15:52.204 :GIPCDNDE:887682816:  gipcdSendReq: recvd msg clnt header: (req: 0x7fcd2c0ea240 [hostname(node-2), id (0000000000000105, 0000000000000000), len(392), req cookie(00007fcd2c2c5738), type(gipcdReqTypeLookupName)])
 
2016-05-13 03:15:52.206 :GIPCDNDE:887682816:  gipcdEnqueueSendReq: Enqueuing the msg in the pending send request for host node-2, msg 0x7fcd2c0ea240
2016-05-13 03:15:52.206 :GIPCDNDE:887682816:  gipcdSendReq: Enqueued the request and waiting for connection to complete with host node-2
 
>>> Once GIPCD-1 successfully connects to GIPCD-2, it forwards the LOOKUP request to GIPCD-2. If GIPCD-1 fails to CONNECT to GIPCD-2, then please ask the GIPC team to investigate it.
 
2016-05-13 03:15:52.219 :GIPCDNDE:887682816:  gipcdNodeThread: Connection established with hostname node-2, endp: 0000000000001b3a
2016-05-13 03:15:52.220 :GIPCDNDE:887682816:  gipcdNodeSendReq: Sending using id 0000000000001b3a, (nodehdr: req:0x7fcd200b43d0  [version(4107), len(3087073280), type((uknown)), req cookie(0000000000000002)] flags 335544320 clnthdr: req: 0x7fcd2c0ea240 [hostname(node-2), id (0000000000000105, 0000000000000000), len(392), req cookie(00007fcd200b4148), type(gipcdReqTypeLookupName)])
 
>>> After the LOOKUP request is sent to GIPCD-2, GIPCD-2 normally comes back with a LOOKUP ACK request as below. If the LOOKUP ACK is not received, then please ask the GIPC team to investigate it.
 
2016-05-13 03:15:52.222 :GIPCDNDE:887682816:  gipcdNodeThread: Msg received from endp 0000000000001b3a, req:0x7fcd1c09f438  [version(185597952), len(600), type(gipcdReqTypeLookupNameAck), req cookie(0000000000000002)] flags 20
 
>>> If P2 is successfully listening at the LISTEN endpoint but GIPCD-2 still failed to resolve the connection string, then please ask the GIPC team to investigate it.
 
(c) If the P1 process successfully resolved the P2 connection string, then you will see a message in the P1 process log that looks as below.
 
2016-05-13 03:22:13.893 :GIPCHDEM:3670411008: gipchaDaemonCreateResolveResponse: creating resolveResponse for host: node-2, port:foo, haname:475b-13b9-0f5b-aa6e, ret:0
 
>>> If the P1 process successfully resolved the connection string, then "ret" should be ZERO.
 
(d) If the P1 process successfully resolved the P2 connection string but P1 still failed to CONNECT to P2, then we need to investigate whether P1 successfully fetched the P2 interfaces. Find the UDP interfaces of P1 and the UDP interfaces of P2. When a process creates UDP interfaces for GIPCHA communication, you will see a log message that looks as below.
 
2016-05-13 03:22:13.828 :GIPCHTHR:3671987968: gipchaWorkerCreateInterface: created local interface for node 'node-1', haName 'CSS_aime1adc00rqh', inf 'udp://10.232.130.185:54140' inf 0x7fcea02cffd0
 
   
If the P1 process fetches the interfaces of the P2 process, then you will see log messages that look as below.
 
2016-05-13 03:22:13.897 :GIPCHTHR:3671987968: gipchaWorkerCreateInterface: created remote interface for node 'node-2', haName '475b-13b9-0f5b-aa6e', inf 'udp://10.232.130.185:64368' inf 0x7fcea02bc540
2016-05-13 03:22:13.897 :GIPCHGEN:3671987968: gipchaWorkerAttachInterface: Interface attached inf 0x7fcea02bc540 { host 'node-2', haName '475b-13b9-0f5b-aa6e', local 0x7fcea02cffd0, ip '10.232.130.185:64368', subnet '10.232.128.0', mask '255.255.248.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 0, flags 0x6 }
 
(e) If the P1 process successfully resolved the P2 connection string and successfully fetched the correct interface of the P2 process, but the connection still failed, then investigate whether the P1 process SENT the CONNECT request, and similarly check whether the P2 process ACCEPTED the CONNECT request from P1.
 
If the P1 process successfully sent the CONNECT request, then you will see a log message that looks as below.
 
2016-05-13 03:29:08.591 :GIPCHAUP:1253738240: gipchaUpperConnect: initiated connect for umsg 0x7f3f2008cca0 { msg 0x7f3f2005a260, ret gipcretRequestPending (15), flags 0x6 }, msg 0x7f3f2005a260 { type gipchaMsgTypeConnect (3), srcPort '43a4-4456-3ec9-7c86', dstPort 'foo', srcCid 00000000-0000047e, cookie 00007f3f-2008cca0 } dataLen 0, endp 0x1b51380 [000000000000047e] { gipchaEndpoint : port '43a4-4456-3ec9-7c86', peer ':', srcCid 00000000-0000047e,  dstCid 00000000-00000000, numSend 0, maxSend 100, groupListType 1, hagroup 0x17ea3d0, priority 0, forceAckCount 0, usrFlags 0x4000, flags 0x0 } node 0x7f3f143029e0 { host 'aime1-adc00rqh-0', haName '64fb-4631-033f-fb7a', srcLuid dc93ae6c-9bd26476, dstLuid 00000000-00000000 numInf 0, sentRegister 1, localMonitor 0, baseStream 0x7f3f14302e20 type gipchaNodeType12001 (20), nodeIncarnation 395ca385-154924a4, incarnation 2, cssIncarnation 0, roundTripTime 4294967295 lastSeenPingAck 0 nextPingId 1 latencySrc 0 latencyDst 0 flags 0x80000}
 
>> Please check the status of the CONNECT umsg "0x7f3f2008cca0" in the P1 process log.
 
 
In the P2 process logs, check whether P2 accepted the P1 connect. If the P2 process accepted the CONNECT request, then you will see a log message that looks as below.
 
 
2016-05-13 03:38:45.986 :GIPCHAUP:2620307200: gipchaUpperProcessAccept: completed new hastream 0x7f7c44062a80 { host 'node-1', haName '6304-ac42-e1c3-579e' srcStreamId 00000000-00008526 dstStreamId 00000000-000006e2 , hendp (nil) haNode 0x7f7c440642c0 numInf 1, contigSeq 1, lastAck 0, lastValidAck 0, sendSeq [1 : 1], priority 0,  duplicate recv 0, completed recv 0, completed send 0, total send 0, total recv 1, flags 0x1}  for hendp 0x7f7c44065070 [0000000000008526] { gipchaEndpoint : port 'foo/1910-9a85-f387-c449', peer ':', srcCid 00000000-00008526,  dstCid 00000000-00000000, numSend 0, maxSend 100, groupListType 1, hagroup 0x20731e0, priority 0, forceAckCount 0, usrFlags 0x4000, flags 0x0 }
 
 
(f) If, even after trying all the above steps, it is still not clear why the CONNECT failed, then please ask the GIPC team to investigate it.

How to Recover Oracle Database from Loss Of Online Redo Log And ORA-312 And ORA-313

PURPOSE
This article aims at walking you through some of the common recovery scenarios after the loss of an online redo log.
 
 
SCOPE
All Oracle support Analysts, DBAs and Consultants who have a role to play in recovering an Oracle database
 
 
DETAILS
Recovering After the Loss of Online Redo Log Files: Scenarios
 
If a media failure has affected the online redo logs of a database, then the
appropriate recovery procedure depends on the following:
 
- The configuration of the online redo log: mirrored or non-mirrored
- The type of media failure: temporary or permanent
- The types of online redo log files affected by the media failure: CURRENT, ACTIVE, UNARCHIVED, or INACTIVE
- Whether the database was shut down normally before the loss of the redo log file
 
 
 
1) Recovering After Losing a Member of a Multiplexed Online Redo Log Group
 
 
If the online redo log of a database is multiplexed, and if at least one member of each online redo log group is not affected by the media failure, then the database continues functioning as normal, but error messages are written to the log writer trace file and the alert_SID.log of the database.
 
ACTION PLAN
 
If the hardware problem is temporary, then correct it. The log writer process accesses the previously unavailable online redo log files as if the problem never existed.
 
If the hardware problem is permanent, then drop the damaged member and add a new member by using the following procedure.
 
To replace a damaged member of a redo log group:
 
Locate the filename of the damaged member in V$LOGFILE. The status is INVALID if the file is inaccessible:
 
 
SQL> SELECT GROUP#, STATUS, MEMBER FROM V$LOGFILE WHERE STATUS='INVALID';
 
GROUP#    STATUS       MEMBER
-------   -----------  ---------------------
0002      INVALID      /<redo log path>/<redo log name>
 
+ Drop the damaged member.
  For example, to drop a redo log member from group 2, issue:
 
 
SQL> ALTER DATABASE DROP LOGFILE MEMBER '/<redo log path>/<redo log name>';
 
+ Add a new member to the group.
  For example, to add a new redo log member to group 2, issue:
 
 
SQL> ALTER DATABASE ADD LOGFILE MEMBER '/<redo log path>/<redo log name>' TO GROUP 2;
 + If the file you want to add already exists, then it must be the same size as the other group members, and you must specify REUSE. 
 
  For example:
 
SQL> ALTER DATABASE ADD LOGFILE MEMBER '/<redo log path>/<redo log name>' REUSE TO GROUP 2;
2) Losing an Inactive Online Redo Log Group
 
 
If all members of an online redo log group with INACTIVE status are damaged, then the procedure depends on whether you can fix the media problem that damaged the inactive redo log group.
 
If the failure is temporary: fix the problem. LGWR can reuse the redo log group when required.
If the failure is permanent: the damaged inactive online redo log group eventually halts normal database operation.
 
ACTION PLAN
 
Reinitialize the damaged group manually by issuing the ALTER DATABASE CLEAR LOGFILE statement.
You can clear an inactive redo log group when the database is open or closed.
The procedure depends on whether the damaged group has been archived.
 
To clear an inactive, online redo log group that has been archived:
 
If the database is shut down, then start a new instance and mount the database:
STARTUP MOUNT
 
Reinitialize the damaged log group.
For example, to clear redo log group 2, issue the following statement:
 
ALTER DATABASE CLEAR LOGFILE GROUP 2;
 
Clearing Inactive, Not-Yet-Archived Redo
 
Clearing a not-yet-archived redo log allows it to be reused without archiving it. This action makes backups unusable if they were started before the last change in the log, unless the file was taken
offline prior to the first change in the log. Hence, if you need the cleared log file for recovery of a backup, then you cannot recover that backup. It also prevents complete recovery from backups because of the missing log.
 
To clear an inactive, online redo log group that has not been archived:
 
If the database is shut down, then start a new instance and mount the database:
 
STARTUP MOUNT
 
Clear the log using the UNARCHIVED keyword. For example, to clear log group 2,
issue:
 
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2;
 
If there is an offline datafile that requires the cleared log to bring it online, then the keywords UNRECOVERABLE DATAFILE are required.   The datafile and its entire tablespace have to be dropped because the redo necessary to bring it online is being cleared, and there is no copy of it.
For example enter:
 
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2 UNRECOVERABLE DATAFILE;
Note: If this is performed on an ACTIVE (current) logfile, an error will occur.
 
Immediately back up the whole database, including the controlfile, so that you have a backup you can use for complete recovery without relying on the cleared log group.
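 
For example, a minimal RMAN backup that satisfies this (one illustrative option, not the only valid approach):
 
RMAN> BACKUP DATABASE INCLUDE CURRENT CONTROLFILE PLUS ARCHIVELOG;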
 
 
 
Failure of CLEAR LOGFILE Operation
 
The ALTER DATABASE CLEAR LOGFILE statement can fail with an I/O error due to media failure when it is not possible to:
 
* Relocate the redo log file onto alternative media by re-creating it under the currently configured redo log filename
* Reuse the currently configured log filename to re-create the redo log file because the name itself is invalid or unusable (for example, due to media failure)
 
In these cases, the ALTER DATABASE CLEAR LOGFILE statement (before receiving the I/O error) would have successfully informed the control file that the log was being cleared and did not require archiving.
 
The I/O error occurred at the step where the CLEAR LOGFILE statement attempts to create the new redo log file and write zeros to it. This fact is reflected in V$LOG (STATUS = CLEARING_CURRENT).
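 
To see which group, if any, is in this state, V$LOG can be queried directly (a simple illustrative check):
 
SQL> SELECT GROUP#, THREAD#, STATUS FROM V$LOG WHERE STATUS LIKE 'CLEARING%';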
 
3) Loss of online logs after normal shutdown 
 
You have a database in archivelog mode; the database was shut down with SHUTDOWN IMMEDIATE and one of the online redo logs was deleted. In this case there are only 2 groups with 1 log member in each. When you try to open the database you receive the following errors:
 
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '<filename>'
It is not possible to recover the missing log, so the following needs to be performed:
 
Mount the database and check v$log to see if the deleted log is current.
 
- If the missing log is not current, simply drop the log group (alter database drop logfile group N).
If there are only 2 log groups, then it will be necessary to add another group before dropping this one (see the illustrative sketch after the note below).
 
- If the missing log is current, simply perform a fake recovery and then open with RESETLOGS:
 
sql> connect <username>/<password> as sysdba
sql> startup mount
sql> recover database until cancel;
(cancel immediately)
sql> alter database open resetlogs;
 
Be sure the location (directory) for the online log files exists before trying to open the database. If it is not available, create it and rerun the RESETLOGS; otherwise this will give an error.
 
NOTE: If the current online log, needed for instance recovery, is lost, the database must be restored and recovered through the last available archivelog file (PITR).
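 
For the not-current case above, an illustrative sketch of adding a third group before dropping the damaged one (the group numbers, path, and size are placeholders):
 
sql> alter database add logfile group 3 ('/<redo log path>/<redo log name>') size <size>M;
sql> alter database drop logfile group 2;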

How Recover an Oracle Database Backup in Windows When Everything is Lost

GOAL
Which steps must be followed in order to recover a backup of a database on a Windows platform when everything is lost?
 
SOLUTION
NOTE: In the images and/or the document content below, the user information and environment data used represents fictitious data from the
Oracle sample schema(s), Public Documentation delivered with an Oracle database product or other training material. Any similarity to actual
environments, actual persons, living or dead, is purely coincidental and not intended in any manner.
For the purposes of this document, the following fictitious environment is used as an example to describe the procedure:
 
Database and SID Name:  YOURDB
 
************
 
First of all we need to install the Oracle Database software; it must be the same database release and the same patch set level. But we must not create any database: in the installer window you
must select the option to install software only.
 
Once the Oracle Database software is installed, to recover from an OS backup of your database on a Windows platform, it is necessary to:
 
 
    1: Create an Oracle Password File
    -------------------------------------------------------------
    For full details on how to create a password file please refer to Oracle9i Database
    Administrator's Guide.
   
      For example:  orapwd file=oraYOURDB.pwd password=<password> entries=10
   
   
   
    2: Create an Initialization Parameter File
    ----------------------------------------------------------------------
    Restore the init.ora file from the backup. If you don't have the init.ora,
    you can use an init.ora from another database and make the necessary changes.
    You need to set up the required parameters, e.g. DB_NAME, CONTROL_FILES, and
    the directories for bdump, udump, cdump, etc.
   
      Parameter file '<ORACLE_HOME>\DATABASE\initYOURDB.ORA'
   
    3: Restore all the database files
    ----------------------------------------------------------------------
    Restore all the database files to the same location that they were in production database
   
    You must restore:
    -> controlfiles     <to  the location indicated in control_files parameter  in the init...ora>
    -> database files  
    -> Archivelog files <to the log_archive_dest directory in the init...ora>
   
    Make sure that you have the necessary backups of database and archived redo logs
   
   
   
    4: Create the Oracle services
    --------------------------------
    Create a new NT service for the duplicate database YOURDB using oradim.
   
     C:\>oradim -new -sid YOURDB -intpwd <password> -maxusers 10 -startmode auto -pfile  '<your pfile location>'
   
    If you don't have at least one control file, you will need to recreate the control file.
    But be careful: you need to be sure that all the data files are included in
    the CREATE CONTROLFILE command and that they are all in the right location. Also make
    sure that the redo log files can be created in the indicated location.
       
    To recreate the control file 
   
      C:\> set ORACLE_SID=YOURDB
      C:\> sqlplus "sys/<password> as sysdba"
      SQL> startup nomount
      SQL> CREATE CONTROLFILE REUSE DATABASE YOURDB RESETLOGS  ARCHIVELOG
           MAXLOGFILES 16
           MAXLOGMEMBERS 3
           MAXDATAFILES 100
           MAXINSTANCES 8
           MAXLOGHISTORY 454
       LOGFILE
         GROUP 1 '<log_file_name_and_location>'  SIZE <size>M,
         GROUP 2 '<log_file_name_and_location>'  SIZE <size>M,
         GROUP 3 '<log_file_name_and_location>'  SIZE <size>M
      DATAFILE
      '<datafile_1_name_and_location>',
        .....
        '<datafile_1_name_and_location>'
      CHARACTER SET <your_db_charset>;
  
     You can change the CREATE control file options if you want:
    
     * CREATE CONTROLFILE SYNTAX:           
     This  information is fully documented in the Oracle SQL Reference Manual. 
     
                                               
       CREATE CONTROLFILE [REUSE]              
          DATABASE name                        
          [LOGFILE filespec [, filespec] ...]  
           RESETLOGS | NORESETLOGS             
          [MAXLOGFILES integer]                
          [DATAFILE filespec [, filespec] ...] 
          [MAXDATAFILES integer]               
          [MAXINSTANCES integer]               
          [ARCHIVELOG | NOARCHIVELOG]          
          [SHARED | EXCLUSIVE]                 
                                               
     
   
    5: Recover and Open database
    -------------------------------------
      C:\> set ORACLE_SID=YOURDB
      C:\> sqlplus "/ as sysdba"
      SQL> startup mount
      SQL> recover database until cancel using backup control file;
                    ===> apply all the available archivelogs; when there are no
                         more available, type CANCEL
   
      SQL> alter database open resetlogs;
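 
Optionally, once the database is open, a quick illustrative sanity check confirms the open mode and the new incarnation:
 
      SQL> SELECT name, open_mode, resetlogs_time FROM v$database;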