Archive for the ‘Disaster Recovery’ Category

The Problem

The server was on a VM and the system drive was backed up. We had separate drives for SQL Data, SQL Logs and SQL Backups. Only the backups went to tape.
As part of the decommissioning of a server we lost all the drives on our production box.
They were replaced and the system disk restored. The SQL Backups were also restored. The data and log drives were empty.

The Solution

The restore of the system drive gave us the SQL binaries but we lacked the system databases. In our support database I had a record of where the data and log files were stored, so I could recreate the directory structures. I also used SQL Server Configuration Manager to find out where the error logs went and created those directories too.
SQL wouldn’t start, however, as there was no master db. To get it going I found another instance of SQL which was at the same product level. I copied the master mdf and ldf into the appropriate drives (I had to stop this SQL instance to get the files as they can’t be copied while SQL is running). I was now able to get SQL to start for a second. It was now failing due to the lack of a tempdb.
At first I was a bit confused, as tempdb is created when SQL Server starts. Then I remembered that it is created from model. I went back to that other SQL install and took a copy of the model mdf and ldf.
I was now able to start SQL up and attempt a restore of the local master db.
At this point I encountered another issue. To restore master you need SQL to be running in single user mode. That’s fine but there was an app which was connecting to this box every half-second and it kept stealing that single session.
To stop the app I used IPSec (http://searchwindowsserver.techtarget.com/feature/Use-IPSec-to-manage-connections) to block the IP address the app was using – I got this from the SQL error logs. Thinking about it now, I might have been able to turn off TCP/IP and use shared memory instead.
With the app blocked I was able to restore master and then msdb. I didn’t restore model as the copy I had borrowed was from an identically set-up SQL instance. You may want to restore model to be on the safe side.
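For reference, here is a minimal sketch of that restore sequence – the paths are placeholders, not the ones from our box:

-- Start the instance in single-user mode first, e.g. NET START MSSQLSERVER /m,
-- then connect with sqlcmd and run:
RESTORE DATABASE master FROM DISK = N'M:\SQLBackups\master.bak' WITH REPLACE;
-- The instance shuts itself down once master is restored; restart it
-- normally, then restore msdb (and model if you want to be safe):
RESTORE DATABASE msdb FROM DISK = N'M:\SQLBackups\msdb.bak' WITH REPLACE;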
At this point I could start SQL up normally and from SSMS I did a restore of all the user databases.

Lessons

We are now creating an archive of master and model mdf and ldf files. We can then use these without stopping another SQL instance.
You could also copy the master and model files into the backup directory whenever you do a new build, apply a service pack or apply a hotfix (i.e. whenever the version number changes). You could then pull these from the backup directory.
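If you want to script that archive step, the physical file locations can be read from the instance itself. A quick query for this (2005 and later):

-- Lists the mdf/ldf paths for master and model so they can be copied off
SELECT name, physical_name
FROM sys.master_files
WHERE database_id IN (DB_ID('master'), DB_ID('model'));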

Following some 3rd party scripts to set up replication I got this error:

The Distributor has not been installed correctly. Could not enable database for publishing

To fix this I ran these three scripts.

sp_adddistributor 'NameofDistributorServer';
sp_adddistributiondb N'NameofDatabaseBeingReplicated';
sp_adddistpublisher @publisher = 'NameofDistributorServer', @distribution_db = N'NameofDatabaseBeingReplicated';

This then allowed me to run:

use master;
exec sp_replicationdboption N'NameofDatabaseBeingReplicated', N'publish', N'true';
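As a sanity check afterwards you can interrogate the distributor and the publish flag directly (the database name is the same placeholder as above):

EXEC sp_helpdistributor;
SELECT name, is_published
FROM sys.databases
WHERE name = N'NameofDatabaseBeingReplicated';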

When setting up replication you might get an error saying login failed.

This will show up in the SQL Error Logs as:

Login failed for domain\username

Followed by

Error: 18456, Severity: 14, State: 6

The error code means that a Windows login was entered in place of a SQL login. In my case it was in the Log Reader Agent security settings.

In the ‘Connect to Publisher’ section I had entered a Windows service account (although it says quite clearly there: using the following SQL Server login). I didn’t see this though, as I added replication through a script which I had adapted.
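A quick way to check what kind of login a name actually maps to (the names here are placeholders):

SELECT name, type_desc
FROM sys.server_principals
WHERE name IN (N'domain\username', N'someSqlLogin');
-- a WINDOWS_LOGIN used where a SQL Server login is expected gives 18456 state 6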

After dropping all replication objects and then recreating them, the agents were still failing.

The error showing stated: server execution failed.

So I ran the snapshot replication agent from the command line.

To do this, go to the SQL folder containing snapshot.exe – in this case C:\Program Files\Microsoft SQL Server\80\COM.

Then run snapshot.exe followed by the string found in the second step of the start agent job. In this case: -Publisher [Server2Name] -PublisherDB [replicatedDatabaseName] -Distributor [Server2Name] -Publication [nameOfPublication] -DistributorSecurityMode 1

This then generated the real error: ATL71.dll was missing from a series of directories, all defined by PATH.

I did a search and found this dll in the temp directory. I copied it to system32 and re-ran the agents.

They are now distributing transactions. It took a while to catch back up again though.

What it is and how it works.
Log shipping is the process of transferring transaction logs from one server to another – shipping them. It is used as a form of disaster recovery for a database. It does not have any automated failover – transfer to the new database will involve human intervention.

Two servers are involved: the primary server and the secondary server. The primary server is the server with the database that is being protected. The secondary server holds the backup copy of the database.

The database to be protected needs to be in Full or Bulk Logged recovery mode. This is to stop the transaction logs being truncated. A full backup should then be taken and applied to the secondary server. The option No Recovery is used when restoring on the secondary server. This leaves the database in a restoring state. Now log shipping is set up and transaction log backups are copied from the primary server to the secondary server and then applied to the restoring database. This has the effect of keeping the database in step with the primary version.
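The whole mechanism can be mimicked by hand, which is a good way to see what the wizard sets up. A sketch using the DataImports database and paths from the walkthrough below:

-- On the primary: full backup, then periodic log backups
BACKUP DATABASE DataImports TO DISK = N'\\serverA\m$\Backups\LS\dataImports\DataImports.bak';
BACKUP LOG DataImports TO DISK = N'\\serverA\m$\Backups\LS\dataImports\DataImports_1.trn';
-- On the secondary: restore WITH NORECOVERY so it stays in the restoring state
RESTORE DATABASE DataImports FROM DISK = N'F:\Maint\LS\dataImports\DataImports.bak' WITH NORECOVERY;
RESTORE LOG DataImports FROM DISK = N'F:\Maint\LS\dataImports\DataImports_1.trn' WITH NORECOVERY;
-- Only at failover time do you bring it online:
-- RESTORE DATABASE DataImports WITH RECOVERY;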

As an extra process, login accounts need to be transferred to the secondary server. This way, if the database ever has to be brought online and used, access will be the same as it is on the primary server.

How to set up Log Shipping
In this step by step guide I will set up log shipping on the DataImports database from serverA to serverB.

Create a directory on the primary server
On the primary server create a directory for the transaction log backups. The backups will be made into this directory and then copied across to a receiving directory on the secondary server.
I create this under the main maintenance or SQL backup directories.
If there is not one there already, create a folder called LS. Then create a sub-folder with the name of your database: \\serverA\m$\Backups\LS\dataImports

Create a directory on the secondary server
On the secondary server create a shared directory for receiving the log shipped files.
Again I create this under the main maintenance or SQL backup directories.
If there is not one there already, create a folder called LS. Then create a sub-folder with the name of your database: \\serverB\f$\Maint\LS\dataImports

Exclude the database from TLog backups
If you have any maintenance plans which carry out transaction log backups for this database you will need to exclude it.

Set recovery model of the database to Full
If the recovery model of the database is Simple, set it to Full. You can leave it if it is set to Bulk Logged.
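If you are scripting this rather than clicking through the GUI, it is a one-liner:

ALTER DATABASE DataImports SET RECOVERY FULL;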

Backup the database
Take a full backup of the database into the folder you created on the primary server and copy it to the new folder on the secondary server.

Create a database on the secondary server and then restore the backup
Create a new database with the same name as the one you are going to log ship.
Now restore the database using the backup you have just taken – make sure you set the file paths to restore over the files you have just created rather than using the paths stored in the backup.
You must select the option to restore with no recovery.

Click OK.

The database name will now have a green arrow next to it and say restoring …
You can also delete the backup file you used for the restore.

Configure Log Shipping
In SSMS right click on the database name and select properties.
Click on Transaction Log Shipping. Now tick the box which says “Enable this as a primary database in a log shipping configuration”
The button ‘Backup Settings’ will now be enabled. Click this and fill out the options.

You should enter the full path to the folder on the primary server. Underneath this you should also enter the local path. In my example the dataimports data rarely changes so I have set different values from the default. I am deleting files older than 24 hours – 72 is the default. I am alerting on 2 hours where 1 is the default. Next click on the schedule button. As the database I am using does not change very often I have scheduled my backups to take place once an hour. The default is every 15 minutes.

Click OK.

You are now on the configuration screen again. There is a white box where it says secondary server instances and databases. Underneath the box click add. Click on the button at the top right which says connect. Enter the secondary server details. The database should be selected by default.

Now you are back on the secondary database settings. Make sure the option ‘No, the secondary database is initialised’ has been selected. Now click the Copy Files tab. In this screen you need to enter the path for the directory on the secondary server: \\serverB\f$\Maint\LS\dataImports
I change the delete copied files option to 24 hours again. Now click schedule. I change this to run every hour again – the default is 15 minutes. This time I will also change the start time to 00:15. There is no point in it trying to copy the files on the hour as the actual backup is being taken on the hour. Click OK and that task will be scheduled.

Now click on the Restore Transaction Log tab. I change the delay option to 5 minutes and the alert option to 2 hours. This is because I have a slow rate of change and so only copy every hour. Click on schedule again. This time I will set the task to run at 30 minutes past the hour, hourly. Click OK and then on the next screen click OK again.

You are now on the properties page again.
Click OK again and the server will create the log shipping jobs. Once the two items have gone green you can click Close.

Run the Log Shipping Jobs.
On the primary server there will be a job named like LSBackup_DatabaseName.
Run the LSBackup_DatabaseName job and check that a trn backup file has appeared in the folder you created on the primary server.

On the secondary server look for a job like LSCopy_PrimaryServerName_DatabaseName and run this job. You should see the trn file appear in the folder you created on serverB. Now run LSRestore_PrimaryServerName_DatabaseName.

Finally Transfer Logins
You should use SSIS to create a Transfer Logins task to copy the logins from serverA to serverB. This way, if you have to fail the database over to serverB, you can enable the logins and won’t have any permissions problems.
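If you do fail over and find database users orphaned from their logins, they can be relinked one at a time. A sketch (the user name is a placeholder):

USE DataImports;
EXEC sp_change_users_login 'Auto_Fix', 'someUserName';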

I separate System database backups and User database backups – although the code is essentially the same.

On some servers extra databases are added to the NOT IN list – update statistics takes a long time to run on a large database, so I will run that separately once a week.

These steps are run nightly.

Each step is set to proceed to the next even if it fails – I don’t want to stop all backups just because one db failed something. I have emails sent to me and I also read the job history log each working day, so I pick up these failures.
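If you script the job rather than clicking through the Agent GUI, the carry-on-after-failure behaviour is set per step. A sketch with placeholder names:

EXEC msdb.dbo.sp_add_jobstep
    @job_name = N'Nightly Maintenance', -- placeholder
    @step_name = N'Integrity Check',    -- placeholder
    @subsystem = N'TSQL',
    @command = N'/* step body */',
    @on_success_action = 3, -- go to the next step
    @on_fail_action = 3;    -- go to the next step even on failure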

Step 1: Shrink log file if recovery is simple
This only shrinks the log if the database is in simple recovery mode. You shouldn’t really have to do this but I have had some problems which this cleared up for me.

DECLARE @dbname sysname;
DECLARE db_recovery_cursor CURSOR FOR
SELECT [name]
FROM sys.databases
WHERE [name] NOT IN ('master','tempdb','model','distribution','msdb')
AND is_read_only = 0
AND recovery_model_desc = 'Simple';
OPEN db_recovery_cursor;
FETCH NEXT FROM db_recovery_cursor INTO @dbname;
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @databaseId int;
SELECT @databaseId = database_id
FROM sys.databases
WHERE [name] = @dbname;

DECLARE @LogName varchar(250);
SELECT @LogName = name
FROM sys.master_files
WHERE type_desc = 'LOG'
AND database_id = @databaseId;

DECLARE @sql varchar(2000);
SET @sql = 'USE [' + @dbname + '];';
SET @sql = @sql + ' CHECKPOINT;';
SET @sql = @sql + ' DBCC SHRINKFILE([' + @LogName + '], 30);';

EXEC(@sql);
FETCH NEXT FROM db_recovery_cursor INTO @dbname;
END;
CLOSE db_recovery_cursor;
DEALLOCATE db_recovery_cursor;

Step 2: Update Usage and Checkpoint and turn off autoshrink
Update usage helps fix problems in space reporting on the database.
Some of our databases were created with auto shrink on – a big no-no!

USE master;
DECLARE @dbName varchar(255);
DECLARE @sql varchar(255);
DECLARE dbCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [name]
FROM master.sys.databases
WHERE [name] NOT IN ('master','msdb','tempdb','model','Distribution','Resource')
AND is_read_only = 0
AND state_desc = 'ONLINE'
ORDER BY [name];
OPEN dbCursor;
FETCH NEXT FROM dbCursor INTO @dbName;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'DBCC UPDATEUSAGE ([' + @dbName + ']) WITH NO_INFOMSGS;';
EXEC (@sql);
SET @sql = 'USE ' + QuoteName(@dbName) + '; CHECKPOINT;';
EXEC (@sql);
SET @sql = 'EXEC sp_dboption ''' + @dbName + ''', ''autoshrink'', ''False'';';
EXEC (@sql);
FETCH NEXT FROM dbCursor INTO @dbName;
END;
CLOSE dbCursor;
DEALLOCATE dbCursor;

Step 3: Integrity Check
You may not want to run this against every database every night, but if you can, you should. I use the DATA_PURITY option as we have a lot of databases that were imported from SQL 2000.

USE master;
DECLARE @dbName varchar(255);
DECLARE @sql varchar(255);
DECLARE dbCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [name]
FROM master.sys.databases
WHERE [name] NOT IN ('master','msdb','tempdb','model','Distribution','Resource')
AND is_read_only = 0
AND state_desc = 'ONLINE'
ORDER BY [name];
OPEN dbCursor;
FETCH NEXT FROM dbCursor INTO @dbName;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'DBCC CHECKDB ([' + @dbName + ']) WITH NO_INFOMSGS, DATA_PURITY;';
EXEC (@sql);
FETCH NEXT FROM dbCursor INTO @dbName;
END;
CLOSE dbCursor;
DEALLOCATE dbCursor;

Step 4: Update Stats
sp_updatestats will only update the stats SQL thinks need updating. UPDATE STATISTICS will update stats whether they need it or not. Updating stats causes stored procs to recompile – which is fine if the stats were out of date, but not if they weren’t.
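For comparison, the two calls look like this (the table name is a placeholder):

EXEC sp_updatestats; -- only touches stats SQL thinks are stale
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN; -- always rebuilds, for one table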

USE master;
DECLARE @dbName varchar(255);
DECLARE dbCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [name] FROM master.sys.databases
WHERE [name] NOT IN ('master','msdb','tempdb','model','Distribution','Resource')
AND is_read_only = 0
AND state_desc = 'ONLINE'
ORDER BY [name];
OPEN dbCursor;
FETCH NEXT FROM dbCursor INTO @dbName;
WHILE @@FETCH_STATUS = 0
BEGIN
EXEC ('USE [' + @dbName + ']; EXEC sp_updatestats;');
FETCH NEXT FROM dbCursor INTO @dbName;
END;
CLOSE dbCursor;
DEALLOCATE dbCursor;

Step 5: Backup and remove old backups
I delete files older than 96 hours – more accurate than x days. On some servers where space is more of a constraint this time is reduced. On our live servers backups are also copied to tape, but this process runs outside of SQL. Backups will only be removed if there is a corresponding live database. This means that if a database is dropped, its backups will stay on the server until separately deleted – I like having this in case someone wants to restore it. I’ll normally manually delete them after 6 months or so.

If a database is over a certain size (2 GB here) I stripe the backup across multiple files (two in the script below).

USE Master;

DECLARE @delDate varchar(50);
SET @delDate = CAST(DATEADD(hh, -96, GETDATE()) AS varchar(50));

DECLARE @rootDir varchar(255);
SET @rootDir = 'M:\Backups\User\'; /*Enter path to user backups here*/

DECLARE @dbName varchar(255);
DECLARE @sqlCreateDir varchar(255);

DECLARE @y char(4);
SET @y = CAST(YEAR(GETDATE()) AS char(4));
DECLARE @m char(2);
SET @m = RIGHT('0' + CAST(MONTH(GETDATE()) AS varchar(2)), 2);
DECLARE @d char(2);
SET @d = RIGHT('0' + CAST(DAY(GETDATE()) AS varchar(2)), 2);
DECLARE @date char(10);
SET @date = @y + '_' + @m + '_' + @d;

DECLARE @h char(2);
SET @h = RIGHT('0' + CAST(DATEPART(hh, GETDATE()) AS varchar(2)), 2);
DECLARE @mi char(2);
SET @mi = RIGHT('0' + CAST(DATEPART(mi, GETDATE()) AS varchar(2)), 2);
DECLARE @time char(5);
SET @time = @h + '_' + @mi;

DECLARE dbCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [name] FROM master.sys.databases
WHERE state_desc = 'ONLINE'
AND [name] NOT IN ('master','msdb','tempdb','model','Distribution','Resource')
ORDER BY [name];
OPEN dbCursor;
FETCH NEXT FROM dbCursor INTO @dbName;
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @dirPath varchar(250);
SET @dirPath = @rootDir + @dbName;

SET @sqlCreateDir = 'master.dbo.xp_create_subdir N''' + @dirPath + '''';
EXEC (@sqlCreateDir);

EXECUTE master.dbo.xp_delete_file 0, @dirPath, N'bak', @delDate;

DECLARE @size numeric(25,0);

SELECT @size = b.backup_size
FROM master..sysdatabases d WITH (NOLOCK)
LEFT JOIN msdb..backupset b WITH (NOLOCK)
  ON d.name = b.database_name
  AND b.backup_start_date = (SELECT MAX(backup_start_date)
                             FROM msdb..backupset b2
                             WHERE b.database_name = b2.database_name
                             AND b2.type = 'D')
WHERE d.name = @dbName;

DECLARE @BakName varchar(250);
DECLARE @BakCommand varchar(1024);
IF @size < 2147483648
BEGIN
    SET @BakName = @dbName + '_' + @date + '_' + @time + '.bak';
    --Now run a backup command
    SET @BakCommand = 'BACKUP DATABASE ' + QuoteName(@dbName);
    SET @BakCommand = @BakCommand + ' TO DISK=''' + @dirPath + '\' + @BakName + '''';
    SET @BakCommand = @BakCommand + ' WITH INIT, CHECKSUM;';
    EXEC (@BakCommand);
END
ELSE
BEGIN
    SET @BakName = @dbName + '_' + @date + '_' + @time;
    --Now run a backup command striped across two files
    SET @BakCommand = 'BACKUP DATABASE ' + QuoteName(@dbName);
    SET @BakCommand = @BakCommand + ' TO DISK=''' + @dirPath + '\' + @BakName + 'a.bak''';
    SET @BakCommand = @BakCommand + ', DISK=''' + @dirPath + '\' + @BakName + 'b.bak''';
    SET @BakCommand = @BakCommand + ' WITH INIT, CHECKSUM;';
    EXEC (@BakCommand);
END;
FETCH NEXT FROM dbCursor INTO @dbName;
END;
CLOSE dbCursor;
DEALLOCATE dbCursor;

References:
Update Usage: http://msdn.microsoft.com/en-us/library/ms188414.aspx
Check Db with Data_purity: http://www.sqlskills.com/BLOGS/PAUL/post/CHECKDB-From-Every-Angle-How-to-tell-if-data-purity-checks-will-be-run.aspx

I don’t like the system-generated maintenance plans so I use this instead.

I normally go from one step to the next with total disregard as to its outcome. This is because I check job histories and logs each morning, so if update stats or integrity checks failed I will know about it and fix it then. I also want my backups to run regardless – they stay on disk for 3 days but are also copied to tape, so if a problem with integrity creeps in and isn’t noticed for more than 3 days (say I am on holiday), and I can’t fix it, I can use the tape backups to repair almost as easily as the disk backups. If this doesn’t suit you, you could add an extra step which takes a backup but does not delete old backups – you just need the space for this if you can’t check for a while.

The most unsatisfying thing with this is detecting read-only databases – it is easy in 2005 but I haven’t found a satisfactory easy method for 2000.
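One option which should also work on 2000 (where DATABASEPROPERTYEX was introduced) is a per-database check; the database name here is a placeholder:

-- Returns READ_ONLY or READ_WRITE
SELECT DATABASEPROPERTYEX(N'yourDbName', 'Updateability');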

I have a separate job for system dbs where NOT IN is replaced with IN. This runs at a different time of the evening.

 

Step 1: Update Usage and Checkpoint

USE master;
DECLARE @dbName varchar(255);
DECLARE @sql varchar(255);
DECLARE dbCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [name]
FROM master..sysdatabases
WHERE [name] NOT IN ('master','msdb','tempdb','model','Distribution','Resource')
ORDER BY [name];
OPEN dbCursor;
FETCH NEXT FROM dbCursor INTO @dbName;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'DBCC UPDATEUSAGE ([' + @dbName + ']) WITH NO_INFOMSGS;';
EXEC (@sql);
SET @sql = 'USE ' + QuoteName(@dbName) + '; CHECKPOINT;';
EXEC (@sql);
FETCH NEXT FROM dbCursor INTO @dbName;
END;
CLOSE dbCursor;
DEALLOCATE dbCursor;

 

Step 2: Integrity Check

USE master;
DECLARE @dbName varchar(255);
DECLARE @sql varchar(255);
DECLARE dbCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [name]
FROM master..sysdatabases
WHERE [name] NOT IN ('master','msdb','tempdb','model','Distribution','Resource')
ORDER BY [name];
OPEN dbCursor;
FETCH NEXT FROM dbCursor INTO @dbName;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'DBCC CHECKDB ([' + @dbName + ']) WITH NO_INFOMSGS;';
EXEC (@sql);
FETCH NEXT FROM dbCursor INTO @dbName;
END;
CLOSE dbCursor;
DEALLOCATE dbCursor;

 

Step 3: Update Stats

USE master;
DECLARE @dbName varchar(255);
DECLARE dbCursor CURSOR LOCAL FAST_FORWARD FOR
/* 1048 represents read-only databases, but it is not the only status code
   that represents read-only - you will need to check your dbs */
SELECT [name]
FROM master..sysdatabases
WHERE [name] NOT IN ('master','msdb','tempdb','model','Distribution','Resource')
AND status <> 1048
ORDER BY [name];
OPEN dbCursor;
FETCH NEXT FROM dbCursor INTO @dbName;
WHILE @@FETCH_STATUS = 0
BEGIN
EXEC ('USE [' + @dbName + ']; EXEC sp_MSforeachtable ''UPDATE STATISTICS ? WITH SAMPLE 50 PERCENT''');
FETCH NEXT FROM dbCursor INTO @dbName;
END;
CLOSE dbCursor;
DEALLOCATE dbCursor;

 

Step 4: Backup databases

USE master;
DECLARE @rootDir varchar(255);
SET @rootDir = 'E:\Backup'; /*Enter path to user backups here*/
DECLARE @dbName varchar(255);
DECLARE dbCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [name] FROM master..sysdatabases
WHERE [name] NOT IN ('master','msdb','tempdb','model','Distribution','Resource')
ORDER BY [name];
OPEN dbCursor;
FETCH NEXT FROM dbCursor INTO @dbName;
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @BakCommand varchar(1024);
SET @BakCommand = 'master.dbo.xp_sqlmaint N''-S . -D ' + @dbName + ' -BkUpDB "' + @rootDir + '" -BkUpMedia DISK -CrBkSubDir -DelBkUps 3DAYS -BkExt "BAK"''';
--Print @BakCommand
EXEC (@BakCommand);
FETCH NEXT FROM dbCursor INTO @dbName;
END;
CLOSE dbCursor;
DEALLOCATE dbCursor;

http://lqqsql.blogspot.com/2009/09/sql-server-2000-maintenance-plan-non.html