HELP - SOURCES

A data source represents access via JDBC to a back-end database server.  Each VDB definition must have at least one default data source associated with it, which will be used if no policy dictates what database should receive a query.  Source configuration includes the connection properties, connection pooling, and load balancing and cluster management characteristics.

Important: If using the connection pooling option in the data source tab, the default number of connections is 100.  To adjust this, set the “maxActive” property in the connection pool properties to the desired number of connections; please see below.

The driver field allows either a new or existing JDBC driver to be specified.  Heimdall comes with a small library of JDBC drivers that it has been tested with, but other drivers can be configured in the driver tab as needed.  When using the proxy functionality of a VDB, the tested drivers should in general be used, as the proxy often relies on internal and version-specific features of a driver to provide compatibility with the binary layer translation that occurs.

The JDBC URL will normally be built through the configuration wizard, or can be copied from an existing JDBC application.  Properties on the JDBC URL can be moved to the name-value list in the connection properties, or can remain in the JDBC URL–either works, although the name-value list is easier to read than a long URL.  The JDBC URL is also used internally for database type detection, so that per-database behavior adjustments can be made.
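To illustrate the two equivalent forms, the sketch below (Python, not Heimdall code; the URL and property names are examples only) splits the query-string properties of a JDBC URL into the kind of name-value list used in the connection properties:

```python
# Illustrative sketch: properties appended to a JDBC URL after "?" are
# equivalent to entries in the connection properties name-value list.
from urllib.parse import parse_qsl

def split_jdbc_url(jdbc_url):
    """Return (base_url, {property: value}) for a URL with ?k=v&k2=v2 props."""
    base, sep, query = jdbc_url.partition("?")
    props = dict(parse_qsl(query)) if sep else {}
    return base, props

base, props = split_jdbc_url(
    "jdbc:mysql://db.example.com:3306/tpch?useSSL=false&connectTimeout=5000")
# base  -> "jdbc:mysql://db.example.com:3306/tpch"
# props -> {"useSSL": "false", "connectTimeout": "5000"}
```

Either representation reaches the driver the same way; the name-value list simply keeps the URL short.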

The data source username and password are used for several purposes:

  1. For health checking the data source;
  2. When performing lag detection, to write to and read from the Heimdall schema;
  3. When doing cluster detection, reading cluster variables as appropriate;
  4. When no user or password is specified in JDBC mode, this user will be used to connect to the database.

The test query option is used in a variety of situations, including health checks, and verifying that a connection is still viable after an exception.  It is also used in a variety of conditions when pooling is configured, in order to validate the connection while idle, etc.

Connection properties are database and driver specific.  When in proxy mode, a default set of properties will be inherited by the source in order to provide compatibility with the proxy mode used.  In JDBC cases, no default properties will be set by Heimdall.

For Connection pool properties, Heimdall implements connection pooling compatible with the Tomcat Connection Pool, and accepts the parameters as Tomcat would. This includes the following parameters:

abandonWhenPercentageFull
accessToUnderlyingConnectionAllowed
alternateUsernameAllowed
commitOnReturn
connectionpool_prop
dataSource
dataSourceJNDI
defaultAutoCommit
defaultCatalog
defaultReadOnly
driverClassName
fairQueue
ignoreExceptionOnPreLoad
initialSize
jdbcInterceptors
jmxEnabled
logAbandoned
logValidationErrors
maxActive
maxAge
maxIdle
maxWait
minEvictableIdleTimeMillis
minIdle
numTestsPerEvictionRun
password or pass
propagateInterruptState
removeAbandoned
removeAbandonedTimeout
rollbackOnReturn
suspectTimeout
testOnBorrow
testOnConnect
testOnReturn
testWhileIdle
timeBetweenEvictionRunsMillis
url
useDisposableConnectionFacade
useEquals
useLock
username or user
validationInterval
validationQuery
validationQueryTimeout
validatorClassName
For detailed information about each pool parameter, please see the Tomcat pool attributes page (https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html#Attributes).
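As a sketch only (the values shown are illustrative, not recommendations), a pool configuration expressed as name-value pairs might look like the following, including the maxActive adjustment mentioned above:

```python
# Hypothetical pool settings as name-value pairs, using the Tomcat-style
# parameter names listed above.  Values here are examples only; maxActive
# defaults to 100 and is lowered purely for illustration.
pool_properties = {
    "maxActive": "50",                    # max concurrent connections (default 100)
    "initialSize": "5",                   # connections created at startup
    "minIdle": "2",                       # idle connections to keep warm
    "testOnBorrow": "true",               # validate before handing out
    "validationQuery": "SELECT 1",        # cheap test query
    "validationInterval": "30000",        # ms between validations per connection
    "timeBetweenEvictionRunsMillis": "5000",  # idle-sweep frequency
}
```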

Load balancing & High Availability

Heimdall provides a multi-tiered approach to supporting High Availability that supports a wider variety of database vendors and database topologies than any other product on the market.  This includes failover logic at the JDBC driver itself, and coordinated failover via the central manager.  With it, the following aspects of database behavior can be accounted for and made highly reliable, while accounting for the limits that may be inherent in any database synchronization technology:

  • True active+active load balancing
  • One or more write-only masters
  • One or more read-only slaves
  • Hot-standby servers
  • Cold-standby servers
  • Control of pre-emption on restoration of services
  • Scripted orchestration on failures with multi-phase configuration commits
  • Automatic disaster recovery activation
  • Custom query routing

How to account for each of these will be described, and a variety of configuration scenario examples explained in detail.

Basic Theory

While simple at a high level, high availability of database servers is an extremely complex topic, and the interaction that Heimdall has with the databases is likewise complex.  Below are a few questions that need to be asked when setting up the database cluster; the answers are important in evaluating what should be done with the Heimdall configuration:

  1. Should the infrastructure have a shared-nothing approach to the data processing of queries?  This implies that all disks and disk infrastructure should be redundant.  Solutions such as Oracle RAC leverage a shared-disk infrastructure in order to simplify the task of managing data, when load balancing of the database nodes is performed.  While an excellent solution for many environments, this infrastructure is very expensive, and still leaves the risk of a storage corruption resulting in a complete outage.
  2. If not sharing the disk infrastructure, will synchronous or asynchronous replication be performed between nodes?
    1. Synchronous replication indicates that between two or more nodes, all nodes must agree that a commit can be performed, and that it completed before a response is provided to the client.  This results in a significant overhead in writes, but guarantees that database nodes are in sync, and allows a more flexible active-active approach to load balancing of write traffic.  For the purposes of configuration, a shared disk infrastructure will be considered the same as synchronous replication.
    2. Asynchronous replication is the method of executing a complete transaction on a single server, then replicating the changes to a second server.  In most cases, this works well, but can have unfortunate side-effects.
  3. Will a single write-only node be acceptable, with reads spread across nodes, or will all activity take place on a single node, with a second node acting as a pure standby?
  4. Will the replication be unidirectional or bidirectional, i.e. will all changes be pushed from one node to one or more nodes, or will all changes on all nodes be pushed to all other nodes?

An excellent article on the issues with asynchronous replication and load balancing can be found at http://datacharmer.blogspot.com/2013/03/multi-master-data-conflicts-part-1.html; it covers much of what is necessary to answer these questions and gives an overview of the related issues.

Once the questions of synchronous vs. asynchronous replication and unidirectional vs. bidirectional replication are answered, the Heimdall configuration can be designed.

Basic Configuration

First, for load balancing to operate, the “enable load balancing” option must be selected.  If it is not, the only server Heimdall will connect to will be the primary JDBC URL.  If load balancing is enabled, the JDBC URL specified in the top section of the configuration will be ignored, although it will be used as the initially configured server in the load balancing configuration.

The next option, track cluster changes, enables Heimdall’s cluster detection and tracking logic.  The purpose of this is to allow Heimdall to reconfigure itself as the cluster’s configuration changes.  This works for the following cluster types:

  1. MySQL (stock) Master/slave replication
  2. Galera Cluster Master/Master replication
  3. Postgres Master/Read-slave replication (not PG 10 logical replication)
  4. SQL Server Always-on availability groups
  5. SQL Server Mirroring

When enabled, Heimdall will check the state of the database cluster on every failure event and every 30 seconds, and will reconfigure itself as necessary based on the cluster configuration, allowing for automatic role selection.  Please see below for the rules that determine which server takes which role (when it is not deterministic) via the target read and write capacity.

The node configurations contain a name, JDBC url, enabled and writeable flags, weight (for load balancing), and read and write capacity, as described below:

  • NAME:  A simple name used internally for tracking the individual server, and may be used in various logging operations.  When cluster tracking is used, the name will indicate the role and ID of the server.
  • URL: The URL is the JDBC URL that will be used to access this data source.  Note that when load balancing is configured, this JDBC URL overrides the normal JDBC URL.  In the case a database server has multiple ports, each can be used as an independent database in this configuration.
  • Enabled: The Enabled flag specifies if the server should be used or not.  If a configuration change is made from enabled to not enabled, then all active connections will be migrated away from the server as soon as transactions on each connection are completed.  Enabled (even if not used) also means that the server will be monitored with internal monitoring, so will allow state change scripts to operate properly.  Internal to the driver is another value similar to Enabled, called “Active”.  This may be observed in the logs during failures, with the state being changed automatically by the driver.  This value represents if the server is considered healthy or not, i.e. it is passing health checks.  This value is managed by the driver itself and can’t be set.
  • Writeable: Specifies if the server should be considered usable for queries that are flagged as writes.  If a server is not writable, then if and only if a query matches a “reader eligible” rule or specially flagged forwarded request will the server be used, i.e. read/write split.
  • Weight: Represents what share of connections each server should receive, as a random share based on the weight.  If one server was configured with a weight of 1, and another as a weight of 2, then the second will get 2x as many connections as server 1.  This is determined as a random weighted assignment–it does not account for the number of connections that a server already has, or the traffic load, and with a small number of connections may not be accurately represented.  In the case where some servers are read-only, the share allocation will only be computed including the read-only servers if the setReadOnly() method is used on a connection.  The act of using this method will force a reconnect based on the new weights, even if it was previously connected to another server.

A weight of 0 is considered a special case–it is used to designate a “server of last resort” for a data source, and allows a driver to perform an independent failover without coordinating with the central server, in configurations that allows this behavior.  In the case that several servers have a weight of 0, then all servers with a weight of zero will have traffic balanced to them during the failure in a random manner.

When a server of last resort is used, it is ONLY used until at least one server with a weight greater than 0 comes back online.  At that point, as connections complete transactions (or immediately if they are idle), the JDBC connections will revert back to using the weighted server or servers, and the weights will again be used to compute the share each is allocated.
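The weighted-selection behavior described above can be sketched as follows (Python, illustrative only; the tuple layout and names are assumptions, not Heimdall internals).  Servers with weight greater than 0 share traffic in proportion to their weights; weight-0 servers are only eligible when every weighted server is down:

```python
import random

def pick_server(servers, rng=random):
    """servers: list of (name, weight, healthy) tuples.
    Returns the name of the chosen server, or None if none is usable."""
    weighted = [(n, w) for n, w, ok in servers if ok and w > 0]
    if not weighted:
        # no weighted server is healthy: fall back to the servers of last
        # resort (weight 0), balanced randomly among them
        last_resort = [n for n, w, ok in servers if ok and w == 0]
        return rng.choice(last_resort) if last_resort else None
    # random weighted assignment: a server with weight 2 receives roughly
    # twice the connections of a server with weight 1
    total = sum(w for _, w in weighted)
    pick = rng.uniform(0, total)
    for name, w in weighted:
        if pick <= w:
            return name
        pick -= w
    return weighted[-1][0]
```

Note that, matching the text above, the assignment is random per connection and does not consider existing connection counts or load.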

Connection Hold Time: If no server of last resort is available, and all valid servers for a connection are in a non-active state, then the overall load balancing property of Connection Hold Time will take effect.  This value is specified in milliseconds, and represents how long we should wait until one of two events occurs:

  1. A server becomes active again;
  2. The data source configuration is changed via the central manager.

When a configuration change or a server state change takes place, all waiting threads will be woken, and will then check to see if a new connection can be made.  If not, they will go back to sleep until either of these events occurs again, or until the hold time has passed, at which point an exception will be passed to the calling application.
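The hold-time wait described above follows a standard wait/notify pattern, sketched below (Python; the class and method names are illustrative, not Heimdall internals):

```python
import threading
import time

class HoldTimeGate:
    """Sketch of connection hold time: callers block until a server becomes
    active or a configuration change makes one available, up to a deadline."""

    def __init__(self, hold_time_ms):
        self.hold_time_ms = hold_time_ms
        self.cond = threading.Condition()
        self.server_available = False

    def notify_change(self, server_available):
        # called on a server state change or a config push from the manager;
        # wakes every waiting thread so it can re-check
        with self.cond:
            self.server_available = server_available
            self.cond.notify_all()

    def await_server(self):
        deadline = time.monotonic() + self.hold_time_ms / 1000.0
        with self.cond:
            while not self.server_available:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # hold time exceeded: surface an error to the caller
                    raise TimeoutError("no server became active within hold time")
                self.cond.wait(timeout=remaining)
        return True
```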

Cluster Role Decision Making

The target read and write capacity is used to help select which server should be used for which role, in particular in a Galera cluster, where in theory any node can be a write node, but usually it is desirable to have only one node act as the write node, one or more nodes act as read nodes, and any remaining nodes serve as hot-standby servers or be used for maintenance or reporting activities.  The logic works as follows:

For each node, a write and read capacity will be associated.  The online nodes will be ordered for write consideration based on the highest write weight, then the next, etc.  The nodes will then be selected for a write role until the target write capacity is achieved.  These nodes will also be used as read nodes unconditionally.

For the read roles, IF the write nodes don’t satisfy the target read capacity, then again, the read weights will be ordered from highest to lowest, and additional read-only nodes will be selected for reading until the target read capacity (including the write nodes) has been achieved.

If any nodes are not selected as write or read nodes, then they will be flagged as disabled, and will not be used by Heimdall until after a failure happens, at which point they will be evaluated for their role again.
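The three steps above can be sketched as follows (Python; a simplified illustration of the described rule, not Heimdall’s implementation, with hypothetical field names):

```python
def assign_roles(nodes, target_write, target_read):
    """nodes: list of (name, write_capacity, read_capacity).
    Returns (writers, readers, disabled) per the rule described above."""
    writers, readers = [], []
    # 1. order by write capacity, promote writers until target write
    #    capacity is met
    acquired_write = 0
    for name, wc, rc in sorted(nodes, key=lambda n: -n[1]):
        if acquired_write >= target_write:
            break
        writers.append(name)
        acquired_write += wc
    # 2. writers also serve reads unconditionally; count their read capacity
    acquired_read = sum(rc for name, wc, rc in nodes if name in writers)
    # ...then add read-only nodes by read capacity until target read
    # capacity is met
    for name, wc, rc in sorted(nodes, key=lambda n: -n[2]):
        if acquired_read >= target_read:
            break
        if name not in writers:
            readers.append(name)
            acquired_read += rc
    # 3. anything left over is disabled until the next failure re-evaluation
    disabled = [n for n, _, _ in nodes if n not in writers and n not in readers]
    return writers, readers, disabled
```

For example, three nodes each offering 10 units of read capacity, with targets of 10 write and 20 read, yield one writer, one reader, and one disabled standby.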

The goal of this is to allow a customer to decide which servers in a cluster fill what role, based on their desires.  In order to support the DB changing the role at runtime without logging into the Heimdall Management console, a table named “heimdall.servers_info” is used to mirror the settings configured on the GUI, and if a change is detected, the changes will be used to reconfigure the settings in the GUI, and can trigger a role change as well.

State Change Scripting

In order to account for the variety of different scenarios that may be necessary, Heimdall provides a generic state change scripting mechanism to allow simple scripts to orchestrate with third-party tools as part of a failover.  The script should be named statechange-<source name>.(sh|ps1|bat|py|pl) and should be located in the install directory for the Heimdall Server component.  An example .bat script is provided in the default install package for the “dbdemo-mysql” data source.

The script is provided the state of the data source via the command line, i.e.

statechange-dbdemo-mysql.bat Primary:true,false,true,1,jdbc:mysql://mysql.heimdalldata.com:3307/tpch Secondary:false,true,false,0,jdbc:mysql://mysql.heimdalldata.com:3308/tpch

The values for each server entry are:

  • Server name
  • Enabled (true or false)
  • Healthy (true if healthy, or not monitored)
  • Writeable state (true or false)
  • Weight (0 or positive integer)
  • jdbc URL

The script needs to take these values into account and then print the desired changes to the data source configuration on the standard output of the script.  The valid commands that can be issued are:

  • debug
  • commit
  • enabled <server name> <true|false>
  • writeable <server name> <true|false>
  • weight <server name> <0 or positive integer>

The debug option results in all commands being printed as received.  Enabled, writeable and weight all adjust the respective options for the specified server.  The commit option enacts the new configuration.  The changes are made to the stored configuration as the output is returned, and made effective on a commit.  Multiple commits may be made in a single script execution, and this is generally recommended: first, the failed server should be disabled and a commit made; any orchestration that will take time should then be performed; and finally, a new server should be made active.
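A hypothetical state-change script following this convention might look like the sketch below (Python, e.g. statechange-<source name>.py; the parsing and planning logic is an illustration of the protocol above, and all server names are examples).  It disables any unhealthy writeable server, commits, then promotes a healthy server of last resort with a second commit:

```python
import sys

def parse_server(arg):
    """Parse one 'Name:enabled,healthy,writeable,weight,url' argument,
    following the field order documented above."""
    name, _, rest = arg.partition(":")
    enabled, healthy, writeable, weight, url = rest.split(",", 4)
    return {"name": name, "enabled": enabled == "true",
            "healthy": healthy == "true", "writeable": writeable == "true",
            "weight": int(weight), "url": url}

def plan(servers):
    """Emit the command list: disable failed writers, commit, then promote
    a healthy weight-0 (last resort) server, commit again."""
    commands = ["debug"]
    failed = [s for s in servers if s["writeable"] and not s["healthy"]]
    for s in failed:
        commands += ["enabled %s false" % s["name"], "commit"]
    if failed:
        # any slow orchestration (e.g. promoting a replica) would happen
        # here, between the two commits
        for s in servers:
            if s["healthy"] and s["weight"] == 0:
                commands += ["writeable %s true" % s["name"],
                             "weight %s 1" % s["name"], "commit"]
                break
    return commands

if __name__ == "__main__":
    for line in plan([parse_server(a) for a in sys.argv[1:]]):
        print(line)
```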

It is possible to leverage both the scripted failover logic and the cluster change logic.  In this case, the scripted failover logic will be executed first when a failover is detected, which can promote a cluster node; the cluster change logic will then execute, which can detect the changed servers based on the cluster state.  To receive notifications of health change events, please ensure the notification section in the Heimdall Management Console is configured, and use the test button to verify it is working properly.