I have 2 servers in a SQL Server failover cluster; in other words, I use Always On availability groups. I run full, differential and log backups regularly via SQL Agent on one server only, whichever is currently the primary. If there is a failover, backups continue on the other server. If I have to restore a database in an availability group, I would probably need some combination of full, diff and log backups from each server. Would that actually work? I test the backups weekly, but I just realized that I have never tested that scenario.
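One way to check whether the backups taken on both servers form a restorable sequence is to compare LSNs in the backup history. The query below is only a sketch: the database name MyAgDatabase is a placeholder, and because msdb is local to each instance it would have to be run on both servers and the results merged by hand.

```sql
-- Sketch: list backup history for one database in LSN order.
-- MyAgDatabase is a placeholder name. Run on EACH replica, because
-- msdb.dbo.backupset only records backups taken on the local instance.
SELECT  bs.server_name,
        bs.database_name,
        CASE bs.type
            WHEN 'D' THEN 'full'
            WHEN 'I' THEN 'differential'
            WHEN 'L' THEN 'log'
            ELSE bs.type
        END                       AS backup_type,
        bs.is_copy_only,
        bs.backup_finish_date,
        bs.first_lsn,
        bs.last_lsn,
        bs.database_backup_lsn,   -- LSN of the full backup this backup follows
        bs.differential_base_lsn  -- base full backup for a differential
FROM    msdb.dbo.backupset AS bs
WHERE   bs.database_name = N'MyAgDatabase'
ORDER BY bs.last_lsn;
```

In the merged list, each log backup's first_lsn should equal the previous log backup's last_lsn, whichever server took it. A gap would mean a log backup is missing from the chain.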
Let's say I have a two node AG, Server A and Server B. Server A is normally the primary and Server B is the replica. Whenever the primary fails over from A to B or from B to A, I'd like to automatically run a script that will restart the SQL Agent service on the new replica.
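A common pattern for reacting to a role change is a SQL Agent alert on message 1480, which is logged when a database changes roles because of a failover. This is a sketch only: the job name is hypothetical, the job must already exist, and the alert would have to be created on both servers.

```sql
-- Sketch: fire a job whenever a role change (message 1480) is logged.
-- 'Restart SQL Agent on replica' is a hypothetical job that must exist.
-- The job step has to decide for itself whether the local server is now
-- the secondary before doing anything.
EXEC msdb.dbo.sp_add_alert
    @name                         = N'AG role change (1480)',
    @message_id                   = 1480,
    @severity                     = 0,
    @include_event_description_in = 1,
    @job_name                     = N'Restart SQL Agent on replica';
```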
I've set up a two-node cluster (non-shared storage) with a file share witness. I'm testing some of the different failover scenarios to see that everything is working properly. Everything works fine until I try testing the failure of the SQL Server service. When I stop the SQL Server service on the primary server, it fails over to the secondary server as expected. I then start the service on the (now) secondary server and it comes back online as the secondary. I then try to test that the AG will fail back over when I stop the service on the new primary server.
However, when I stop the service, the secondary server now shows "resolving" and never comes back online. When I bring the service back up on the primary server, the secondary now shows as secondary instead of resolving. So to see if it's something about failing over from one server to another, I do a manual failover making the original primary server the primary again and everything is as it was originally.
I then stop the service on the primary server, but the secondary server now says resolving and the AG will not become available again until I start the service on the primary server.
It seems that when I first configured the quorum, the first failover worked fine and then it stopped working. I then added the file share witness, and failover worked the first time again, but not after that. For some reason, after the initial failover it won't fail over automatically again.
Config:
Servers: Windows Server 2012 Standard
SQL: SQL Server 2012 Enterprise SP1
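When a secondary stays in "resolving", the quorum configuration and vote distribution as SQL Server sees them are a reasonable first thing to look at. A minimal sketch using the cluster DMVs available from SQL Server 2012 onward:

```sql
-- Sketch: quorum type and state as reported to SQL Server.
SELECT  cluster_name,
        quorum_type_desc,
        quorum_state_desc
FROM    sys.dm_hadr_cluster;

-- Sketch: each cluster member (nodes and witness), its state and its votes.
SELECT  member_name,
        member_type_desc,
        member_state_desc,
        number_of_quorum_votes
FROM    sys.dm_hadr_cluster_members;
```

This only reports the current state. It does not explain by itself why a second automatic failover is refused.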
Recently, after turning on a trace flag, I restarted the SQL services on a box that is configured for automatic failover in an availability group. The AG did not fail over to the other node; the other node was in the resolving state. When the restarted server came back, the AG went back to that server. I checked failure_condition_level in sys.availability_groups; it's set to 1, which means service restarts should initiate the failover.
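The settings mentioned above can be read together with the per-replica failover configuration. A sketch, assuming it is run on the instance that currently hosts the primary replica (role and health columns are only fully populated there):

```sql
-- Sketch: AG-level failover settings plus per-replica mode, role and health.
SELECT  ag.name                        AS ag_name,
        ag.failure_condition_level,
        ag.health_check_timeout,       -- milliseconds
        ar.replica_server_name,
        ar.availability_mode_desc,
        ar.failover_mode_desc,
        ars.role_desc,
        ars.synchronization_health_desc
FROM    sys.availability_groups AS ag
JOIN    sys.availability_replicas AS ar
        ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
        ON ars.replica_id = ar.replica_id
ORDER BY ag.name, ar.replica_server_name;
```

Automatic failover needs both replicas in synchronous-commit mode with AUTOMATIC failover, and the secondary has to be synchronized at the time of the failure.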
I need to migrate a cluster with AG databases to a new data center cluster that also uses AGs.
I was wondering if it is possible to do mirroring on top of the AG configuration. Or what other options are there to migrate a 3-node cluster and set up the AG databases in a new data center?
Hi there, I am testing database mirroring, making sure it will fail over automatically. I stopped the SQL services on my principal, and when I looked at the mirror database it said it was restoring. It stayed like that for 10 minutes before I enabled mirroring again. Does anyone know why it's not failing over?
Here's my setup: SQL 2005 Standard, Server 1 Principal, Server 2 Mirror & Witness.
We've recently set up a Principal, Mirror and Witness configuration with the Mirror and Witness in a separate building from the Principal. All three are part of the same domain (DMZ) and are different servers; the buildings are connected via a fiber optic cable. All servers and SQL Server instances are logged in with the same domain admin account, DMZ\esAdmin.
Mirroring is all set-up and the databases are synchronized. Every once in a while some (not all, normally 6 out of 15) databases will switch roles and become active on the mirror. The SQL Server mirroring monitor job then reports:
Date: 25/01/2007 12:37:01
Log: Job History (Database Mirroring Monitor Job)
Step ID: 1
Server: DMZSQL01
Job Name: Database Mirroring Monitor Job
Step Name:
Duration: 00:00:02
Sql Severity: 16
Sql Message ID: 32038
Operator Emailed:
Operator Net sent:
Operator Paged:
Retries Attempted: 0
Message: Executed as user: DMZ\esadmin. An internal error has occurred in the database mirroring monitor. [SQLSTATE 42000] (Error 32038). The step failed.
I have no idea what causes the failover; it could be a slow network or a bad setup. Can anyone give me some ideas on how to track down the problem, or share any experience of what could be causing this? It happens randomly every day or three, with no warning, and if I go to the mirror and fail back to the principal then it's all just fine. However, I don't want half my databases working on one server and half on the other.
Any ideas?
Thanks Ed
UPDATE:
I've just been looking at the logs on my Mirror and at the same time it reports in this order
Error: 1479, Severity: 16, State: 1.
The mirroring connection to "TCP://DMZSQL01.dmz.local:5022" has timed out for database "WARCMedia" after 10 seconds without a response. Check the service and network connections.
Database mirroring is inactive for database 'WARCMedia'. This is an informational message only. No user action is required.
Recovery is writing a checkpoint in database 'WARCMedia' (41). This is an informational message only. No user action is required.
The mirrored database "WARCMedia" is changing roles from "PRINCIPAL" to "MIRROR" due to Failover.
Database mirroring is inactive for database 'WARCMedia'. This is an informational message only. No user action is required.
...
This looks like a timeout. Is there any way to set the timeout threshold for database mirroring, or to set retry intervals?
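The 10 seconds in the log matches the default partner timeout for database mirroring, which can be changed per database with ALTER DATABASE ... SET PARTNER TIMEOUT. A sketch using the database name from the log above; the value of 30 seconds is only an example, not a recommendation.

```sql
-- Sketch: raise the mirroring partner timeout (default is 10 seconds).
-- Run on the principal. 30 is an arbitrary example value.
ALTER DATABASE [WARCMedia] SET PARTNER TIMEOUT 30;

-- Check the current timeout and state of every mirrored database.
SELECT  DB_NAME(database_id)        AS database_name,
        mirroring_role_desc,
        mirroring_state_desc,
        mirroring_connection_timeout
FROM    sys.database_mirroring
WHERE   mirroring_guid IS NOT NULL;
```

A longer timeout makes spurious failovers on a briefly slow link less likely, at the cost of a slower reaction to a real outage.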
3 servers:
- PRINCIPAL IP: 10.2.5.31 - DNS Lookup: db-server-2.mosside.choruscall.com
- MIRROR IP: 10.2.5.30 - DNS Lookup: sql-mirror.mosside.choruscall.com
- WITNESS IP: 10.2.5.32 - DNS Lookup: sql-witness.mosside.choruscall.com
Each server is running Windows Server 2003 Enterprise Edition with SQL Server 2005 Enterprise Edition. All server instances are enabled for remote connections (by default they are not). All servers have trace flag 1400 turned on and have been restarted. Port 5022 is unrestricted on the network.
The server instances connect via certificates. Each server has an endpoint for the certificates to connect on.
Certificate Setup Procedure:
Principal_Host:
1. Create Master Key with Password
2. Create certificate with subject
3. Create endpoint for certificate (Listener_Port = 5022, Listener_ip = all) to connect on for database_mirroring
4. Backup Certificate (principal_cert.cer)
5. Take backed up certificate to Mirror_Host
(Repeat Steps 1-5 for Witness and Mirror)
Mirror_Host: Create Certificate on Mirror_Host for inbound connections from Principal:
6. (On Mirror_Host) Create Login for Principal using same password as in step 1 (principal_login)
7. Create user for login just created. (principal_user)
8. Create local certificate for Principal on Mirror using certificate generated by principal.
ex: Create Certificate Principal_cert Authorization Principal_user FROM FILE='c:\principal_cert.cer'
9. (If an endpoint has been created already on the mirror) Grant connection to the login:
ex: Grant connect on endpoint::mirror_endpoint to principal_login
Repeat Steps 6-9 for Principal and Witness Servers accordingly.
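Steps 1-9 above can be sketched in T-SQL roughly as follows. The passwords, the certificate subject and the endpoint name principal_endpoint are assumptions made for illustration; principal_cert, principal_login, principal_user and mirror_endpoint come from the steps above. Each section runs on a different instance, so this is not one script to execute in a single session.

```sql
-- ===== On Principal_Host (steps 1-4) =====
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'<strong password here>';

CREATE CERTIFICATE principal_cert
    WITH SUBJECT = N'Principal certificate for database mirroring';

-- principal_endpoint is an assumed name; ROLE would be WITNESS on the witness.
CREATE ENDPOINT principal_endpoint
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
    FOR DATABASE_MIRRORING (
        AUTHENTICATION = CERTIFICATE principal_cert,
        ENCRYPTION = REQUIRED,
        ROLE = PARTNER
    );

BACKUP CERTIFICATE principal_cert TO FILE = N'C:\principal_cert.cer';
GO

-- ===== On Mirror_Host (steps 6-9), after copying the .cer file over =====
USE master;
CREATE LOGIN principal_login WITH PASSWORD = N'<strong password here>';
CREATE USER principal_user FOR LOGIN principal_login;

-- Public key only: lets the mirror authenticate inbound connections
-- made by the principal.
CREATE CERTIFICATE principal_cert
    AUTHORIZATION principal_user
    FROM FILE = N'C:\principal_cert.cer';

GRANT CONNECT ON ENDPOINT::mirror_endpoint TO principal_login;
GO
```

For automatic failover the same exchange has to exist in every direction, including between the mirror and the witness, since those two must be able to reach each other when the principal is gone.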
10. Import Database to SQL Server 2005 Principal Instance
11. Backup Database to disk with format
12. Backup Database log file to disk with format
13. Copy backups to mirror
14. Restore Database and log file with norecovery on Mirror_Host
15. Configure Database for Database Mirroring on Principal Server
There are two ways to do this: via the wizard or via the Transact-SQL window. Using the wizard appears to work since I started using FQDNs.
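The Transact-SQL route for steps 11-15 might look like the sketch below. It assumes the database is APS_SQL_DEV (the name that appears in the witness log further down), that it uses the FULL recovery model, and that the backup paths shown are placeholders. The server names are the FQDNs listed earlier.

```sql
-- ===== On the principal (steps 11-12); paths are placeholders =====
BACKUP DATABASE [APS_SQL_DEV]
    TO DISK = N'C:\Backup\APS_SQL_DEV.bak' WITH FORMAT;
BACKUP LOG [APS_SQL_DEV]
    TO DISK = N'C:\Backup\APS_SQL_DEV.trn' WITH FORMAT;
GO

-- ===== On the mirror (step 14), after copying the files =====
RESTORE DATABASE [APS_SQL_DEV]
    FROM DISK = N'C:\Backup\APS_SQL_DEV.bak' WITH NORECOVERY;
RESTORE LOG [APS_SQL_DEV]
    FROM DISK = N'C:\Backup\APS_SQL_DEV.trn' WITH NORECOVERY;
GO

-- ===== On the mirror FIRST: point it at the principal =====
ALTER DATABASE [APS_SQL_DEV]
    SET PARTNER = N'TCP://db-server-2.mosside.choruscall.com:5022';
GO

-- ===== Then on the principal: point it at the mirror and the witness =====
ALTER DATABASE [APS_SQL_DEV]
    SET PARTNER = N'TCP://sql-mirror.mosside.choruscall.com:5022';
ALTER DATABASE [APS_SQL_DEV]
    SET WITNESS = N'TCP://sql-witness.mosside.choruscall.com:5022';
GO
```

Using the fully qualified names consistently in all three addresses avoids the partners and the witness knowing each other under different names.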
PROBLEM:
After configuration, everything appears to be correct. That is, the principal displays that it is the principal and that it is synchronized with the mirror. The mirror also displays that it is the mirror, that it is synchronized with the principal, and that it is in recovery. If I fail over manually, the mirror becomes the principal and the principal becomes the mirror (they form a quorum). If I disconnect the principal from the network, the mirror is supposed to form a quorum with the witness and promote itself to principal status. This is not what is happening. The witness recognizes that the principal is down and logs that in its log file. The mirror attempts to contact the witness but cannot log on to the machine. The mirror logs the following:
Error: 1438, Severity: 16, State: 2. The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
<<<<<<<MIRROR SERVER >>>>>>>>
2007-09-06 15:08:45.32 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:08:45.32 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:05.32 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:05.32 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:25.33 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:25.33 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:09:45.34 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:09:45.34 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:10:05.35 spid23s Error: 1438, Severity: 16, State: 2.
2007-09-06 15:10:05.35 spid23s The server instance Witness rejected configure request; read its error log file for more information. The reason 1451, and state 3, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.
2007-09-06 15:10:25.36 spid23s Error: 1438, Severity: 16, State: 2.
<<<<<<< WITNESS SERVER >>>>>>>>
2007-09-06 14:19:55.90 spid52 The Database Mirroring protocol transport is now listening for connections.
2007-09-06 15:07:11.64 spid24s Error: 1479, Severity: 16, State: 1.
2007-09-06 15:07:11.64 spid24s The mirroring connection to "TCP://db-server-2:5022" has timed out for database "APS_SQL_DEV" after 10 seconds without a response. Check the service and network connections.
2007-09-06 15:07:43.20 Server Error: 1474, Severity: 16, State: 1.
2007-09-06 15:07:43.20 Server Database mirroring connection error 4 '64(The specified network name is no longer available.)' for 'TCP://db-server-2:5022'.
2007-09-06 15:08:06.03 spid9s Error: 1474, Severity: 16, State: 1.
2007-09-06 15:08:06.03 spid9s Database mirroring connection error 2 'Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.' for 'TCP://db-server-2:5022'.
We're running a mirrored database in high-availability mode with automatic failover, including a witness instance, for a web application.
When doing a manual failover on the database in Management studio, the roles are switched correctly and the database is in "Principal, Synchronized" and "Mirror, Synchronized/Restoring" mode. The web application has no problems switching servers by using client failover with the jdbc driver. There is no problem accessing the database with Management Studio.
However, if we stop the SQL service on the Principal server the role is automatically failed over to the Mirror server by the Witness. The database is then in the mode "Principal, Disconnected" which should be fine. However, accessing the database from the web application or with Management Studio yields some strange results. It is not possible to write to the database, and reading from the database works inconsistently (the web application seems like it can do it, but not from the Management Studio).
Starting the SQL service on the former Principal server makes the database go into mode "Mirror, Synchronizing/Restoring" and "Principal, Synchronizing". And it will stay that way indefinitely. There are not that many updates/transactions made to the database that can make it stay in this state, especially if you can't write to the database in the first place.
The next step taken after being stuck in this state is to stop the SQL service on the Mirror (former Principal), restart the service on the Principal (former Mirror). Accessing the database now works. The database is in mode "Principal, Disconnected". Starting the SQL service on the Mirror (former Principal) makes the database go into the normal "Principal, Synchronized" and "Mirror, Synchronized/Restoring" mode. Access to database is normal.
The same erroneous behaviour can be observed by unplugging the network cable on the Principal server, so it seems like we can only get a smooth transition by doing a manual failover.
Any ideas on what might be the problem? Has anybody experienced a similar situation?
I'd like to understand why automatic failover of availability groups is not possible when the replicas are hosted on Failover Cluster Instances. I think it would be great for DR and HA. Do you understand why that limitation exists?
The link [URL] ....
SQL Server Failover Cluster Instances (FCIs) do not support automatic failover by availability groups, so any availability replica that is hosted by an FCI can only be configured for manual failover.
Hello, does anyone know of any software that can provide a solid failover / cluster / high availability solution for SQL Server 2000 databases? I have tried Incepto, but it requires an extra column in every single table involved in replication, and that is not going to work.
I have set up a mirror configuration with a witness to be able to use automatic failover. The principal is DBSP01, the mirror is DBSP02 and the witness is DBSP03. I have an application running on DBCP01. When the mirroring is working, the application can connect to the database on DBSP01. I disconnect DBSP01 from the network, so that DBSP02 becomes the principal. When I try to connect the application to the database on DBSP02, the login fails. Without the mirroring I was able to log on to DBSP02, but as soon as it is part of the mirroring, I'm not able to connect to it anymore, whatever the state of the database is. What could be the problem? Can anybody help?
I have a 3-node SQL Server 2014 AlwaysOn setup. The primary and secondary are set for automatic failover. The third node, of course, is manual (until 2016). The 2 nodes that are automatic sit in one datacenter; the third is in another. If the first datacenter were to go down, would I have to fail over manually to the third node? What's the normal process for having two datacenters and ensuring the availability group is always available?
In a SQL authentication environment with automatic failover in database mirroring, how do you manage new logins that have been created on the principal since the start of mirroring? Since master cannot be mirrored, and the mirror database cannot be read during mirroring (except as a snapshot) in order to find the missing logins, I assume that only after failover should a script run to create the new logins and then run sp_change_users_login. The questions are:
1) Should the script create a new login first and then run sp_change_users_login with option Update_One, or should sp_change_users_login with option Auto_Fix create the missing logins?
2) But what is the password of these users? Is it initially NULL, as a consequence of sp_change_users_login? What about the SIDs?
3) Or should we bypass sp_change_users_login altogether and use
CREATE LOGIN <loginname> WITH PASSWORD = <password>, SID = <sid for same login on principal server>,... as described in http://blogs.msdn.com/chadboyd/archive/2007/01/05/login-failures-connecting-to-new-principal-after-failover-using-database-mirroring.aspx
4) What is the event that would trigger this script to run after the automatic failover?
Is there a definitive, Microsoft-recommended method to tackle this?
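Option 3 can be sketched as follows. The login name app_login, the password and the SID literal are all placeholders; the real SID has to be copied from the principal's output. If the SIDs match on both servers, the database users map to the logins after failover without running sp_change_users_login.

```sql
-- ===== On the principal: read the SID of the SQL login =====
-- app_login is a hypothetical login name.
SELECT  name, sid
FROM    sys.sql_logins
WHERE   name = N'app_login';
GO

-- ===== On the mirror: create the same login with the same SID =====
-- The SID below is a made-up placeholder; paste the value returned above.
CREATE LOGIN [app_login]
    WITH PASSWORD = N'<same password as on the principal>',
         SID = 0x2F5B769F543973419BCEF78DE9FC1A64;
GO
```

Because master on the mirror server is an ordinary writable database, the logins can be created there ahead of time, whenever they are added on the principal, instead of waiting for a failover event.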
Approach 1:
Prod - shared storage between server 1 and 2
Server1: Clustered SQL instance with availability group as primary
Server2: Passive server for clustered instance of PROD
DR - shared storage between server 1 and 2
Server1: Clustered SQL instance with availability group as replica
Server2: Passive server for clustered instance of DR
Approach 2: Using replicated SAN
Prod -
Server 1: Standalone instance with availability group as primary
Server 2: Standalone instance with availability group as replica
DR -
Server 1: Offline until Disk group 1 (Prod server 1) has been broken and brought online at DR
Server 2: Offline until Disk group 2 (Prod server 2) has been broken and brought online at DR
Both these approaches will work, won't they? I have only built and played with normal availability groups across servers, not mixing them with a clustered instance or a replicated SAN.
I'm looking for a solution to have cross data center automatic failover in the event of a data center loss for highly critical databases. I would like to have local HA and also automatic failover to the DR site. This does not seem possible with AlwaysOn.
Is my only option for automatic cross data center failover to build a node in one data center and a node in the other data center, with a node or file share at a third data center in order to maintain quorum? I'd like to have local HA in the mix, but that doesn't seem possible. What is the pattern for the highest data security and also availability?
An automatic failover set exists. This set consists of a primary replica and a secondary replica (the automatic failover target) that are both configured for synchronous-commit mode and set to AUTOMATIC failover. I configured both availability group databases for automatic failover and synchronous-commit mode. But automatic failover failed, and the Cluster service did not start automatically on Node2. Connections through the AG listener only worked again after starting Node1. Below is the SQL error log from the shutdown of Node1:
Date,Source,Severity,Message
10/27/2015 10:44:20,spid37s,Unknown,AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to come online. This is an informational message only. No user action is required.
10/27/2015 10:44:20,spid37s,Unknown,AlwaysOn Availability Groups: Local Windows Server Failover Clustering node started.
I am a C++/C# developer and my SQL skills are very limited. I have a database set up for mirroring and I would like to get an e-mail notification whenever an automatic failover occurs. Can anyone show me how to do this using SQL Server 2005? Please provide a T-SQL script sample if possible!
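One documented way to do this is a SQL Agent alert on the DATABASE_MIRRORING_STATE_CHANGE WMI event, where state 8 means automatic failover. The sketch below assumes a default instance (hence MSSQLSERVER in the namespace), a hypothetical operator name and e-mail address, and that SQL Agent mail is already configured and Service Broker is enabled in msdb.

```sql
-- Sketch: e-mail an operator when an automatic failover occurs.
-- Operator name and address are hypothetical placeholders.
EXEC msdb.dbo.sp_add_operator
    @name          = N'DBA Team',
    @enabled       = 1,
    @email_address = N'dba-team@example.com';

-- State = 8 is "automatic failover" for this event class.
-- The namespace ends in the instance name; MSSQLSERVER = default instance.
EXEC msdb.dbo.sp_add_alert
    @name          = N'Mirroring automatic failover',
    @wmi_namespace = N'\\.\root\Microsoft\SqlServer\ServerEvents\MSSQLSERVER',
    @wmi_query     = N'SELECT * FROM DATABASE_MIRRORING_STATE_CHANGE WHERE State = 8';

EXEC msdb.dbo.sp_add_notification
    @alert_name          = N'Mirroring automatic failover',
    @operator_name       = N'DBA Team',
    @notification_method = 1;   -- 1 = e-mail
```

The alert would need to exist on both partners, since either server can be the one that takes over.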
I have set up a database mirroring session with a witness: MachineA is the principal, MachineB is the mirror, and MachineC is the witness. Each SQL Server instance is hosted on its own machine. The mirroring is working correctly. If I submit data to the database on MachineA, and then unplug the network cable on MachineA, MachineB automatically becomes the principal, and I can see the data that I originally submitted to MachineA on MachineB. All the settings are showing correctly in Management Studio.
My issue is with the SQL Native Client and a front-end application that needs to make use of this database. I have setup my front-end application to use the ODBC client and specified the failover server in both the ODBC setup and the connection string. Here is the connection string that I am using :
Everything works perfectly on my front-end application when MachineA is the principal. If I unplug the network cable on MachineA, MachineB becomes the principal, and the failover occurs correctly on the database side. The problem is that my front-end application is not able to query the database on MachineB.
BUT - if I plug the network cable back in on MachineA (making the database on MachineA the mirror), the front-end application now works and can access the principal database on MachineB. I wrote a quick tester application to verify what I am seeing, and I am convinced that this is what is happening. The mirroring is working perfectly, and everything is setup correctly. The SQL Native Client is setup correctly. The problem is that the automatic failover to MachineB that is built into the SQL Native Client only works if both servers are plugged in.
In this scenario, when I plug both servers in, I know that the front-end app is definitely pulling from MachineB (since the mirror database on MachineA is in recovery mode, it's unavailable, and the front-end app displays the server that it is pulling data from).
Am I using an out-dated SQL Native Client? The version number displayed in the ODBC configuration page is 2005.90.2047.00, and is dated 4/14/2006. Has anyone experienced this issue? I'm guessing that it's a problem in the SQL Native client, since the mirroring really seems to be working correctly.
We have a requirement to build a SQL environment that gives us local high availability and disaster recovery to a second site. We have two sites, Site A and Site B. We are planning to have two nodes at Site A and two nodes at Site B. All four nodes will be part of the same Windows failover cluster. We will build two SQL clusters: InstanceA will be clustered between the nodes at Site A, and InstanceB will be clustered between the nodes at Site B. We will enable Always On between InstanceA and InstanceB; InstanceA will be the primary, where data is written, and it will be replicated to InstanceB. URL.... Now we also want an InstanceC at Site B, to which the application at Site B writes, with its data replicated to an instance at Site A as a replica.
I'm looking at a setup where they have Server1 and Server2 in a mirroring relationship with automatic failover.
Server1 is the principal.
They are using transactional replication to replicate a single database to Server3 in AWS.
The distribution database is on Server1.
All agents (log reader, snapshot, distributor) run on Server1.
Server2 has not been set up for replication...
My understanding is that in this setup you would normally place the distribution database on a separate server and enable publication on the mirror, Server2.
What happens if they fail over? Replication would stop, and presumably records added while the mirror is the active database would not be marked for replication? How would they recover? Fail back and reinitialize?
I am preparing for Always On with multiple instances. Are there any considerations with multiple instances? Should I create a new availability group for each instance?
We're having an issue with an AG where the log backup does not appear to truncate the log. Symptoms:
Run full backup
Run transaction log backup
DBCC LOGINFO shows all VLFs with a status of 2
sys.databases.log_reuse_wait_desc says LOG_BACKUP
OPENTRAN indicates no open transactions.
All backups are being run on the primary.
SQL 2012 Enterprise
Primary server and 2 x secondary servers
Windows 2012 R2
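The checks listed above, plus the per-replica state of the database, can be gathered in one pass. This is a sketch with MyAgDatabase as a placeholder name, meant to be run on the primary.

```sql
-- Sketch: what is holding the log, according to SQL Server.
SELECT  name, recovery_model_desc, log_reuse_wait_desc
FROM    sys.databases
WHERE   name = N'MyAgDatabase';   -- placeholder name
GO

-- Oldest active transaction, if any, in the current database.
USE [MyAgDatabase];
DBCC OPENTRAN;
GO

-- How far each replica has hardened and redone the log. A secondary that
-- lags behind keeps the primary from reusing that part of its log.
SELECT  ar.replica_server_name,
        drs.synchronization_state_desc,
        drs.log_send_queue_size,   -- KB not yet sent to the secondary
        drs.redo_queue_size,       -- KB not yet redone on the secondary
        drs.last_hardened_lsn,
        drs.truncation_lsn
FROM    sys.dm_hadr_database_replica_states AS drs
JOIN    sys.availability_replicas AS ar
        ON ar.replica_id = drs.replica_id
WHERE   drs.database_id = DB_ID(N'MyAgDatabase');
```

After a log backup, the active portion of the log only moves on once a checkpoint has run and the VLFs wrap, so a second log backup is often needed before the status values change.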
I have removed one of the databases from an availability group by mistake. Luckily I am still operational on the primary server. The database on the secondary servers is in restoring mode.
I took a full backup of the database on the primary (prod) server and restored it on the secondary servers with NORECOVERY. When I add the database to the availability group I get an error message saying log is missing. What is the best way to add the database to the availability group again?
Note: I cannot restore the database on the primary server, as it is in production.
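A typical sequence for re-adding a database is sketched below, assuming an availability group called MyAg, a database called MyAgDatabase and a backup share path, all of which are placeholders. The secondary needs every log backup taken since the full backup it was restored from, including any taken by scheduled jobs in the meantime.

```sql
-- ===== On the primary =====
ALTER AVAILABILITY GROUP [MyAg] ADD DATABASE [MyAgDatabase];

-- Placeholder path. Any scheduled log backups taken after the full
-- backup must be restored on the secondary as well, in order.
BACKUP LOG [MyAgDatabase]
    TO DISK = N'\\backupshare\ag\MyAgDatabase_latest.trn';
GO

-- ===== On each secondary (full backup already restored WITH NORECOVERY) =====
RESTORE LOG [MyAgDatabase]
    FROM DISK = N'\\backupshare\ag\MyAgDatabase_latest.trn'
    WITH NORECOVERY;

ALTER DATABASE [MyAgDatabase] SET HADR AVAILABILITY GROUP = [MyAg];
GO
```

Nothing in this sequence restores over the primary database.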
I've got an availability group with multiple databases, replicating to multiple secondary servers. On one of the secondary servers, some of the databases are not synchronising, and when we try to re-establish the sync we get an LSN error. I can't see any obvious way to re-establish only one database on one secondary without affecting all databases on that secondary or affecting that database on all secondary nodes.
The options I seem to have are to either remove the database and then re-add it, in which case this affects all secondary replicas, or to remove the secondary replica and add it, in which case all the DBs are added.
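There is a narrower option: SET HADR OFF, run on the affected secondary, removes only that local copy from the availability group. A sketch with placeholder names and paths:

```sql
-- ===== On the ONE affected secondary only =====
-- Takes just this local secondary database out of the AG; the primary and
-- the other secondaries keep synchronizing.
ALTER DATABASE [MyAgDatabase] SET HADR OFF;

-- Re-seed from backups taken on the primary (placeholder paths).
RESTORE DATABASE [MyAgDatabase]
    FROM DISK = N'\\backupshare\ag\MyAgDatabase.bak'
    WITH NORECOVERY, REPLACE;
RESTORE LOG [MyAgDatabase]
    FROM DISK = N'\\backupshare\ag\MyAgDatabase.trn'
    WITH NORECOVERY;

-- Join this copy back to the group.
ALTER DATABASE [MyAgDatabase] SET HADR AVAILABILITY GROUP = [MyAg];
```

The same steps would be repeated for each database that is out of sync on that server.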
PROD1 (cluster 1) - Clustered SQL instance 1
PROD2 (cluster 1)
DR 1 (cluster 2) - Clustered SQL instance 2
DR 2 (cluster 2)
I have set up an availability group from the PROD instance to the DR instance. How does the AG behave if a SQL instance fails at PROD? Does it try to fail over to node 2 at PROD before going over to DR, or does it bring the replica at DR online straight away? Can we only use manual failover of the AG in this scenario, to make use of the high availability of the Windows cluster?
I have set up a couple of servers in a SQL 2012 AlwaysOn Availability Group (non FCI). I have also configured a Listener which enables SQL clients to connect to the server currently servicing the database, as expected.
I would also like non SQL clients to be able to connect to the server currently hosting the database so that they can run scripts sitting in a share. I don't have a shared disk so just have a directory share on each server with the same scripts in each directory.
I am able to ping and RDP to the listener IP address/name and end up on the correct server, but I am unable to connect to the share \\ListenerName\Share. Is that actually supported? If it is, any thoughts on what I need to do to get it going? If it isn't, what other options do I have?
So I have availability groups configured, as well as the availability group listener. If I want to change the port that the listener is listening on, do I need to reboot the server, or is the change dynamic across the board?
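The listener port is changed with ALTER AVAILABILITY GROUP ... MODIFY LISTENER on the primary replica. The names MyAg and MyListener and the port number below are placeholders. As far as I know this does not require a restart, but that is worth confirming on a test system first.

```sql
-- Sketch: change the listener port (run on the primary replica).
-- MyAg, MyListener and 14330 are placeholder values.
ALTER AVAILABILITY GROUP [MyAg]
    MODIFY LISTENER 'MyListener' (PORT = 14330);

-- Confirm what the listener is now configured to use.
SELECT  dns_name, port
FROM    sys.availability_group_listeners;
```

Client connection strings that specify the old port, and any firewall rules for it, would need to be updated to match.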