SQL Server Admin 2014 :: Handling AlwaysOn Failover And Failback To Preferred Node?
Aug 25, 2015
I am running a 2-node SQL Server 2014 Enterprise Edition AlwaysOn Availability Group in our environment, and 5 databases are part of the AG.
The problem is that the AG sometimes fails over to node2, but our preferred node is always node1 for business reasons; otherwise some of our jobs fail.
So what I am looking for is a SQL script that can handle the situation where, for some reason, the AG has failed over to node2: it should detect whether node1 is back online and, if so, fail back to node1. How can I do this using a T-SQL query, a stored procedure, or a SQL Agent job?
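One way to think about the failback check, as a minimal sketch rather than a finished job: the decision can be made from the columns role_desc and connected_state_desc (sys.dm_hadr_availability_replica_states) and synchronization_state_desc (sys.dm_hadr_database_replica_states), read on the current primary. The names ReplicaDb, failback_statement and AG1 below are hypothetical, and the sketch assumes node1 is a synchronous-commit replica, since a planned failover without data loss needs every database on the target to be SYNCHRONIZED. The generated statement would have to be executed on node1 itself, the failover target.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaDb:
    """One row per replica/database pair (hypothetical shape of a DMV query result)."""
    replica: str      # replica server name
    role: str         # role_desc: 'PRIMARY' or 'SECONDARY'
    connected: bool   # True when connected_state_desc is 'CONNECTED'
    sync_state: str   # synchronization_state_desc, e.g. 'SYNCHRONIZED'


def failback_statement(rows: List[ReplicaDb], preferred: str, ag_name: str) -> Optional[str]:
    """Return the T-SQL to run on the preferred node, or None if failback is not safe."""
    mine = [r for r in rows if r.replica.lower() == preferred.lower()]
    if not mine:
        return None  # preferred node is not reported at all
    if any(r.role == "PRIMARY" for r in mine):
        return None  # already on the preferred node, nothing to do
    if not all(r.connected and r.sync_state == "SYNCHRONIZED" for r in mine):
        return None  # node1 is offline or still catching up
    quoted = "[" + ag_name.replace("]", "]]") + "]"
    return "ALTER AVAILABILITY GROUP {} FAILOVER;".format(quoted)


rows = [
    ReplicaDb("node1", "SECONDARY", True, "SYNCHRONIZED"),
    ReplicaDb("node2", "PRIMARY", True, "SYNCHRONIZED"),
]
print(failback_statement(rows, "node1", "AG1"))
```

A SQL Agent job could apply the same three checks in T-SQL. Keep in mind that every failback is another brief outage for connected applications, so a schedule restricted to quiet hours may be preferable to failing back immediately.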
1. After a failover to the secondary replica, what happens to sessions connected to the primary node? Can a session fail over to the secondary seamlessly, or does it need to log in again? What happens to committed transactions that have not been written to disk?
2. Assume I have an AlwaysOn cluster with three nodes. If the primary fails, how does the second node become read/write?
3. After the failover to the second (secondary) node, which mode is it in for production (read-only or read/write)?
4. How do I roll back to the production primary? Will data changed on the secondary be applied to the primary?
If for any reason the AG fails over to an async node, how does replication behave? Since the data will not be in sync with the previous primary replica, how will replication work? I think we would have to reset replication from scratch, as there's a high chance the subscribers are more up to date than the current primary replica, because failing over to this node causes data loss. How can we keep replication in sync without setting it up again? Can we achieve this?
I have a 3-node 2014 AlwaysOn setup. The primary and secondary are set for automatic failover. The third node, of course, is manual (until 2016). The 2 nodes that are automatic are sitting in one datacenter; the third is in another. If the first datacenter were to go down, would I have to manually fail over to the third node? What's the normal process here for having two datacenters and ensuring the availability group is always available?
Is there a single T-SQL query that provides the following info: when did my AlwaysOn Availability Group fail over, and from which node to which new node (i.e. replica)?
We had to failover our primary db server for maintenance to our secondary replica. The primary was rebooted during maintenance. We failed back after the maintenance and one of the databases is not synchronizing.
I checked sys.dm_hadr_database_replica_states, and it is showing that it is INITIALIZING.
It has been in this state for more than 45 minutes now. The last_sent_time, last_received_time, last_hardened_time and last_redone_time are all stuck at a timestamp from 45 minutes ago.
They haven't changed. How do I resume this database and bring it back in sync?
I tried suspending and resuming the data movement, but that hasn't worked.
Currently we have a two-node A/P cluster residing on a flash array. We need to leverage AlwaysOn to offload processing. The replica server will have flash storage. The replica node has the same CPU and memory footprint. 10GB connection between nodes. Is anyone generating such a large transaction log over a 15/30 minute period?
The MSDN doc makes it sound like, after a failover of the primary, the CDC data won't "keep working" on the secondary unless you do the following: "To allow the logreader to proceed further and still have disaster recovery capacity, remove the original primary replica from the availability group using ALTER AVAILABILITY GROUP <group_name> REMOVE REPLICA. Then add a new secondary replica to the availability group."
We have a few CDC tracked tables that we use and the general idea of AlwaysOn I thought was to minimize all the overhead and let things "just work" so your apps just connect and the listener re-routes everything where it needs to go.
It looks like, to get this working properly, an automated job/trigger would have to wait for a failover event and then kick off tasks to remove and re-add the replica, and perhaps start up the CDC job on the secondary?
I would like to set up a replica of one of the databases for reporting. The current environment is a 2-node cluster (active/passive). I would like to add a 3rd node that can serve as a secondary replica. The secondary replica will be in asynchronous-commit mode.
The database that needs to have AlwaysOn set up has column-level encryption enabled.
Other questions:
* Do I need to back up and restore the service master key on the secondary server for column-level encryption to work there?
* What would be the preferred quorum settings?
* What is the setting for 'readable secondary' for the primary and replica db?
* What should be the setting for 'Connections in Primary Role' for the primary and replica db?
* We are trying to set this up without a listener. Do I need to set up an AG listener? Can the application exclusively use [secondary instance name].[replica DB name] without a listener?
I have a database that is part of AlwaysOn that is filling up the transaction log drive, even though I have a daily full backup and transaction log backups set for every 2 hours. The backups run from both the primary and the secondary replica, backing up to the shared disk, and I have the backup preference set to the primary.
When I try to shrink the log I get 'The transaction log for database 'DB' is full due to 'LOG_BACKUP''. I have to manually back up the transaction log and then shrink it. Why aren't the maintenance plan backups doing this, even though they are "working"?
We had a big issue today during maintenance work in our SQL environment.
So our environment:
- 2x SQL Server 2014 Enterprise on Windows Server 2012 R2 (SRV1 and SRV2)
-- Both Hyper-V VMs on different hosts
-- Both configured in a Windows Failover Cluster and an AlwaysOn Availability Group (AG1)
-- AG Listener: AG1_lis
-- No shared storage (each Hyper-V host has its own local storage)
-- Asynchronous mode
-- SRV1 is the primary, SRV2 is the secondary SQL node
What happened?
- Shut down Windows on SRV2 for hardware maintenance
- Cluster goes offline, AG1 goes offline
-- Error message: "Stopped listening on virtual network name 'AG1_lis'."
-- Error message: "The availability group database "DatabaseXY" is changing roles from "PRIMARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization."
Results?
- AG1_lis wasn't available for our applications, and they stopped working properly because the database connection was lost!
I think, I HOPE, this is not the normal behaviour when one node is shut down (especially the secondary node!)
I have 10 databases that are configured as principal in mirroring, and I need to fail over all of them as part of a failover. Instead of writing a partner failover query for each database, is there a script that will generate the failover statements for all databases that are currently principal?
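A rough sketch of the script-generation idea, assuming the input rows come from sys.database_mirroring (mirroring_role_desc, mirroring_state_desc) joined to sys.databases for the name. The function name mirroring_failover_script and the database names are made up for illustration. Only databases that are PRINCIPAL and SYNCHRONIZED are included, since a manual failover needs a synchronized session.

```python
from typing import Iterable, List, Tuple


def mirroring_failover_script(databases: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Build one failover statement per eligible database.

    Each input tuple is (database name, mirroring_role_desc, mirroring_state_desc).
    """
    statements = []
    for name, role, state in databases:
        if role == "PRINCIPAL" and state == "SYNCHRONIZED":
            # Escape a closing bracket so the name is safe inside [...]
            quoted = "[" + name.replace("]", "]]") + "]"
            statements.append("ALTER DATABASE {} SET PARTNER FAILOVER;".format(quoted))
    return statements


rows = [
    ("Sales", "PRINCIPAL", "SYNCHRONIZED"),
    ("Archive", "PRINCIPAL", "SYNCHRONIZING"),  # skipped: not yet synchronized
    ("Reports", "MIRROR", "SYNCHRONIZED"),      # skipped: already the mirror here
]
for stmt in mirroring_failover_script(rows):
    print(stmt)
```

The generated statements are meant to be reviewed and then run on the principal server from the master database.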
When I set up my listener, ListenerA, do I need to use the instance name in it?
ListenerA\Instance01 or ListenerA\Instance02, depending on which SQLNode hosts the "active" availability group?
Am I better off using the same instance name for both nodes, since my goal is to have all databases on both instances in the same availability group and synced? When SQLNode1 migrates over to SQLNode2, will I need to update the instance name in my connection string on the listener from ListenerA\Instance01 to Instance02? When I connect with SSMS, do I just use ListenerA\Instance01 (or 02)?
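As far as I understand it, the listener is a virtual network name with its own port, so clients normally connect with the listener name and port and leave the instance name out; the connection string then stays the same whichever node is primary. A small sketch that builds an ADO.NET-style connection string this way. The function name, the database name SalesDb and the default port 1433 are assumptions for illustration.

```python
def listener_connection_string(listener: str, database: str, port: int = 1433,
                               multi_subnet: bool = False, read_only: bool = False) -> str:
    """Build a SqlClient-style connection string that targets an AG listener."""
    parts = [
        "Server=tcp:{},{}".format(listener, port),  # listener name and port, no instance name
        "Database={}".format(database),
        "Integrated Security=SSPI",
    ]
    if multi_subnet:
        parts.append("MultiSubnetFailover=True")    # for listeners with IPs in several subnets
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")  # asks for read-only routing
    return ";".join(parts)


print(listener_connection_string("ListenerA", "SalesDb"))
print(listener_connection_string("ListenerA", "SalesDb", multi_subnet=True, read_only=True))
```

In SSMS the equivalent would be to enter the listener name, plus a comma and the port if it is not the default, in the server name box.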
We have a 2-node clustered instance (SQL 2014) with 26 databases, and we would like to enable AlwaysOn for one of the databases for reporting (only one secondary; we do not need a high availability setup). I'm wondering whether the reporting application/queries can explicitly connect to the secondary database (instance name\database name) without using a listener, with the secondary set up in asynchronous-commit mode. I have read about the REDO thread being blocked by reporting workloads. How does that affect things if I implement the secondary this way?
I am trying to build out an AlwaysOn AG with 2 nodes, each in a different subnet (in AWS, if that matters), on Windows 2012 R2 / SQL 2014 RTM.
I created an AG listener with 2 IP addresses, 1 for each subnet (I checked that neither IP address is in use). But whenever I fail over the AG to the secondary and try to connect via the listener, it fails.
I am trying to connect via SSMS from the primary instance, and it just times out. If I roll back over to the primary I can connect with no issues. I've tried playing with the connection settings, upping the timeout to 30 seconds, adding MultiSubnetFailover=true, etc., but I'm not getting any joy.
What I asked for: Three Windows Server 2012 R2 machines with independent storage running a SQL Server 2014 AlwaysOn Availability Group. DB1 would be the primary, DB2 would be a synchronous replica, and DB3 would be a remote asynchronous replica.
What I was given: a two-node Windows Server 2012 R2 WSFC to run SQL Server 2014 Enterprise with shared storage and a third (remote) Windows Server 2012 R2 machine with independent storage, also with SQL Server 2014 Enterprise, to host an AlwaysOn Availability Groups asynchronous replica.
DB1 and DB2 (as Cluster1) share an E: drive. The remote DB3 has its own E: drive. Initially, DB3’s E: drive was claimed as a cluster resource and I couldn’t even see it. I’ve had several ugly days trying to make this work and have temporarily given up, installing DB3 as a standalone SQL Server that is no longer part of the WSFC and pointing everything towards that (it was originally a third node in the WSFC).
Is it possible to create an AlwaysOn Availability Group with nested clusters (i.e. create the AOAG with Cluster1 and DB3 and somehow ignore the individual nodes that comprise Cluster1)?
Having an annoying AG/AO problem with the read only routing side of it.
Let me give some specifics first:
2 SQL Server Instances, Not Clustered. Availability Group is named 'Ireland'
There is a primary Replica and a Secondary Replica, named:
'IrelandPrimary' and 'IrelandSecondary'
There is a listener configured with the name 'ListenIreland' on Port 14330 (the two 3's are correct)
Read Only Routing URLs are configured as follows:
IrelandPrimary    tcp://Ireland.dom.local:49891    ALL
IrelandSecondary  tcp://Ireland.dom.local:49841    ALL
So now my problem:
When I try to connect using ApplicationIntent=ReadOnly; or even using -K ReadOnly in sqlcmd, I get an error telling me that my connection was actively refused.
This is connecting to the Listener, not the instance itself - that works fine. I'm at a bit of a loss now.
What I am trying to achieve is for a connection to be redirected to the secondary replica when it's set for read-intent.
I've just noticed that it only fails when I specify ApplicationIntent=ReadOnly; if I omit the intent, it connects to the read-write database instead.
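For comparison, here is a sketch of how read-only routing is usually scripted: each replica gets a READ_ONLY_ROUTING_URL that names that replica's own host and SQL Server port, and each replica gets a READ_ONLY_ROUTING_LIST to use while it is primary. The host names IrelandPrimary.dom.local and IrelandSecondary.dom.local are hypothetical (the post lists Ireland.dom.local for both replicas), and the function name is made up. A "connection refused" on read-intent connections can mean the routed-to host and port in the URL are not accepting connections, so the URLs are worth double-checking. Read-intent connections also need to name a database in the AG in the connection string.

```python
from typing import Dict, List


def read_only_routing_script(ag_name: str, routing_urls: Dict[str, str]) -> List[str]:
    """Generate ALTER AVAILABILITY GROUP statements for read-only routing.

    routing_urls maps each replica's server name to its own routing URL.
    """
    def literal(value: str) -> str:
        return "'" + value.replace("'", "''") + "'"

    ag = "[" + ag_name.replace("]", "]]") + "]"
    statements = []
    # The URL a replica advertises while it is in the secondary role.
    for replica, url in routing_urls.items():
        statements.append(
            "ALTER AVAILABILITY GROUP {ag} MODIFY REPLICA ON N{r} "
            "WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N{u}));".format(
                ag=ag, r=literal(replica), u=literal(url)))
    # The routing list a replica uses while primary: other replicas first, itself last.
    for replica in routing_urls:
        ordered = [r for r in routing_urls if r != replica] + [replica]
        statements.append(
            "ALTER AVAILABILITY GROUP {ag} MODIFY REPLICA ON N{r} "
            "WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ({lst})));".format(
                ag=ag, r=literal(replica), lst=", ".join(literal(r) for r in ordered)))
    return statements


urls = {
    "IrelandPrimary": "tcp://IrelandPrimary.dom.local:49891",      # hypothetical host
    "IrelandSecondary": "tcp://IrelandSecondary.dom.local:49841",  # hypothetical host
}
for stmt in read_only_routing_script("Ireland", urls):
    print(stmt)
```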
How are you handling the replication of the many instance-level objects/items (logins, linked servers, server roles, database mail, operators, and so on) to the replicas in an AlwaysOn topology?
I'm especially curious about DBAs managing larger SQL Server environments. In my current environment, we have approximately 80 production SQL instances containing about 650 databases that require high availability and disaster recovery.
We use mirroring today and have a solid, home-grown solution for replicating the instance-level items from production to disaster recovery. AlwaysOn changes things a bit since we'll have multiple replicas and of course the database could be active on any one of those at any time. So my concern is about instance-level items being created in one instance but never deployed to the other instances participating in the AG group.
I am planning to set up AlwaysOn Availability Groups between Server 1 and Server 2:
Server 1 --> Publisher --> SQL 2014 Enterprise edition --> Windows Std 2012 --> AlwaysOn primary replica
Server 2 --> Publisher (when DR happens) --> SQL 2014 Enterprise edition --> Windows Std 2012 --> AlwaysOn secondary replica
Server 4 as Subscriber
Server X as Remote Distributor
If I create publications on Server 1 (primary replica) for the subscriber, Server 4, will the publications be created automatically on the secondary replica, Server 2? Or do I have to create them manually using the GUI/T-SQL on both servers?
I have installed a 2-node Windows failover cluster successfully. But the quorum configuration is not appearing in Failover Cluster Manager; instead it shows "Witness: Disk (Disk Cluster 4)". I have also configured the quorum from "Configure Cluster Quorum Settings". I have attached snapshots of the Windows cluster configuration. Is this an issue or not? I did not get any warnings or errors during cluster validation while installing the Windows failover cluster. I am assuming it is okay and that I can move ahead with the SQL failover cluster installation.
Products used for installation in the virtual machine: Windows Server 2012 R2, SQL Server 2012 R2. Note: no service pack is installed.
We are not able to failover the AG to secondary replica. The process gets timed out and AG goes to resolving mode. Had to reboot the box in order to switch the AG back to primary node. We even rebuilt the whole AG from scratch but the issue remains.
Failed to bring availability group 'xxxx' online. The operation timed out. Verify that the local Windows Server Failover Clustering (WSFC) node is online. Then verify that the availability group resource exists in the WSFC cluster. If the problem persists, you might need to drop the availability group and create it again. [SQLSTATE 42000] (Error 41131). The step failed.
This is my first deployment of an AlwaysOn availability group for SQL 2014, and I'm trying to get my custom backup procedure to handle all databases appropriately depending on which node is primary. Basically, I want the system databases and all databases that don't participate in the availability group to be backed up on both nodes, and those that do participate backed up ONLY on the primary server. I've looked at the sys.fn_hadr_backup_is_preferred_replica function, but I would like to only have to test for a single database's existence in the availability group. If that one database is in the group, only back up the system databases and those that don't participate; otherwise back up every database. This would be the case for both full backups and transaction log backups.
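The selection logic described above can be sketched as follows, under the assumption that the procedure calls sys.fn_hadr_backup_is_preferred_replica once for a single "probe" database in the AG and reuses that answer for every AG database. The function name databases_to_back_up and the database names are hypothetical.

```python
from typing import Iterable, List, Set


def databases_to_back_up(all_dbs: Iterable[str], ag_dbs: Set[str],
                         probe_is_preferred: bool) -> List[str]:
    """Pick the databases this node should back up.

    probe_is_preferred is the result of checking ONE AG database with
    sys.fn_hadr_backup_is_preferred_replica (1 means this node should back it up).
    """
    ag_lower = {name.lower() for name in ag_dbs}
    selected = []
    for name in all_dbs:
        if name.lower() == "tempdb":
            continue  # tempdb cannot be backed up
        if name.lower() in ag_lower and not probe_is_preferred:
            continue  # AG database, and this node is not the preferred backup replica
        selected.append(name)  # system DBs and non-AG DBs are always included
    return selected


all_dbs = ["master", "model", "msdb", "tempdb", "Sales", "LocalOnly"]
print(databases_to_back_up(all_dbs, {"Sales"}, probe_is_preferred=False))
print(databases_to_back_up(all_dbs, {"Sales"}, probe_is_preferred=True))
```

The single-probe shortcut only holds while all AG databases on the instance belong to the same availability group; with several groups, each group would need its own probe.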
We have AlwaysOn set up in our environment with a read-only replica. The primary database has 2 schemas: one is dbo and the other is xyz. We have some stored procs created in the dbo schema and some in the xyz schema. These stored procs are used by SSRS reports to retrieve data (select only); no data changes are made.
When we run the stored procs from the read-only server, the ones in the dbo schema run fine, but the ones in the xyz schema fail with a message saying it failed to update the database because it is read-only...
We are looking at going down the AlwaysOn high availability route. However, we have some concerns about the lack of support for MSDTC. In short, we are concerned that developers may introduce functionality, either on purpose or by mistake, that uses MSDTC or escalates queries to it, as this could result in database splitting.
I understand that this will be a moot point in SQL 2016, but for 2012/2014, is it possible to disable MSDTC to protect against this and still run AlwaysOn high availability? Does it just need to be disabled on the SQL Server, or does it also need to be done on the application server?
Recently, after turning on a trace, I restarted the SQL services on a box that is configured for automatic failover in an availability group. The AG did not fail over to the other node; the other node was in the resolving state. When the restarted server came back, the AG went back to that server. I checked failure_condition_level in sys.availability_groups, and it's set to 1, which means a service restart should initiate a failover.
-MS Server 2012 R2.
-SQL 2014 EE.
-All Windows updates.
-Clean install of both OS and SQL; all 3 nodes are identical.
-SQL Server is running on an alternate port, which I've opened in the firewall. Connections from all network locations are working swimmingly, including connections between all 3 nodes.
-I've got the groups up and running; the listener is set up correctly. Connections work great.
-One node is synchronous, one is asynchronous. They show synchronized and synchronizing respectively.
-Data added at the primary node is moved across to all 3 with lightning speed.
When I attempt a manual failover it hangs..and hangs...then pops up an error 41131 and rolls back the failover. Leaving the cluster perfectly intact and working just as it did prior to the failover attempt. What I've checked so far:
-There is absolutely NOTHING in the cluster events log.
-Windows event log shows no errors, just the standard stuff of the primary node's state changing from primary to resolving and then back again.
-SQL event log has a few things in there, but nothing that's leading me to a solution. I've attached the log from start to finish on an attempted manual failover: