HELP - RULES

A rule list is a set of rules that is processed in order on queries, and dictates the processing done on a particular query and its response.

The regex field allows a regex based on the re2j variation of regex (https://github.com/google/re2j).  (1.0.13) Additionally, the syntax of “tables:<table1>,<table2>,…” can be used to explicitly match based on a queries tables instead of the query match.  These should be fully qualified table names, i.e. catalog.table format–if in doubt, use the “printtables” option below to debug the table names detected by a rule.  If any table in the list matches any table in the query, the rule will be considered a match. Upon a match, each rule in the list can currently implement one of the following actions.

The release version of each action or parameter is noted beside its name.

  • Allow (1.0.4):  Matching queries will bypass the “Learn Pattern” and “Drop” policy types, even if they would otherwise match.  Used for white-listing queries so they are not processed by the “Learn Pattern” or “Drop” actions.
  • Async Execute (1.0):  When a DML/update query matches, processing will immediately be returned to the calling program, and the query will be executed on another thread and connection, reducing processing delay.  Async execution is table aware, and will insure a “first in, first out” processing against a table.  If another query (even a SELECT) is made against the same table, it will be buffered until the DML operation completes, as long as the query is on the same connection as the original DML operation.  This insures that the operation of “insert, select” will have the proper content returned to the user.  Care should be taken with this option, as it can’t report back the number of rows modified, which may be necessary, depending on the calling code.  This feature does not work with the MySQL proxy access mode.

    • source (1.0.13):  Like with the “forward” action, specifies the source to perform the query against
    • asyncsize (1.0):  The maximum size of a query to be allowed to be made async.  Defaults to 32768 bytes.
  • Cache (1.0):  Enable caching for this query (see below for details and options).  Note:  Heimdall will only cache queries that have at least one table associated with it, i.e. “SELECT 1” will not cache.  To override this, please use the “tables” option and specify the name of “none” if no tables are to be used for invalidation.

    • ttl (1.0):  Time in ms to cache, will expire sooner on an eviction.
  • Call (1.0.4):  Allows an alternate rule list to be executed (called).  Upon termination, the calling rule list will continue executing.
    • rule (1.0.4):  the name of the rule list to execute.  This must be specified in the VDB definition as well, or it will not be executed.
  • Drop (1.0):  Matching queries will receive a SQLException, and no other processing is done as a result of this query.
  • Forward (1.0):  Allows forwarding of matching queries to an alternate data source.  Upon a match, rule processing is terminated, and the query is forwarded to the desired data source.
    • source (1.0):  the name of the data source to forward the query to.  This source must be specified in the VDB definition, or the forward rule will be ignored.
    • setcatalog (1.0.13):  If the catalog of the current connection does not match the catalog on the target connection, if set to true, the catalog will be set to match the current (parent) connection before executing the query.  This should be set to false if the catalog on the target data source does not match.
    • readonly (1.0.13):  When forwarding, if a read-only server should be considered as part of load balancing decisions for the queries forwarded.  This is useful for read/write split handling.
  • Ignore (1.0.3):  Allows any following policy of the named type to be ignored for processing.  Useful to create exclusions to other rules, i.e. a no-cache policy for example.
    • action (1.0.3):  the action type to no longer match on.  This is a case-insensative value matching the action type in the GUI.  For example, to ignore any cache rules that match later, enter the action of “Cache”.  This allows exceptions to later cache rules to be added without creating complex regular expressions that exclude a subset of something that should otherwise match.  The names of any rule type to be later ignored can be entered, and the rules will be processed in the order listed.
  • Log (1.0):  Specify that the query should be logged.
  • Learn Pattern (1.0.4):  On a match, if no rule in the named rule list exactly matches the query pattern’s regular expression (as internally generated) and rule type, then a new rule will be created and pushed to all driver instances.  The learn action in conjunction with allow and drop actions allows the configuration of a high performance SQL firewall.  For a non-enforcing firewall, the notify parameter can be used to notify on unknown SQL patterns.

    • action (1.0.4):  the action type of the rule created, defaults to allow, even if not specified.
    • rule (1.0.4):  the name of the rule list to add the action to.  The rule will always be added to the end of the list.  If not specified, the rule will be added to the current rule list.
  • Stack Trace (1.0.5):  When a matching query completes execution, a stack trace will be generated to show the methods that resulted in it being called.
    • threshold (1.0.5):  Sets a response time limit (in microseconds), under which the stack trace will NOT be generated, to allow tracking the call path of slow queries.
  • Transform (1.0):  Allows a query to be modified/edited to a new query.
    • target (1.0):  specifies a string to replace the query with, based on regex grouping.  Each regex group can be specified in the target string in the format $1, $2, etc.  Example:  For a query of “select (.*)” a target string of “SELECT $1 limit 1” could be used.
  • Table Cache (1.0):  Specifies a table name and ttl value to override normal cache behavior (see below).
    • table (1.0):
  • Tag (1.0.3):  A no-op action for use with universal parameters only (see below).

Universal Parameters:

These parameters can be applied to any rule type:

  • capture (1.0.3):  When set, regular expression capture groups are used to edit the parameter values for the query, in order to dynamically adjust behavior.  A common use would be to include a comment at the front of a query, and then use the value in the comment part of or a parameter’s value.  Example:
    Regex:  /\* (.*) \*/
    Action:  Cache, ttl=$1
    Query:  /* ttl=30000 */ SELECT * from Table
    Result:  The query would be cached for 30 seconds
  • delay (1.0):  When a rule with a delay flag matches, induce a delay in processing. Useful to determine if a rule can have an overall impact on performance, i.e. if a delay doesn’t make a noticeable difference, then caching won’t either. Also useful in inducing delays to determine if moving the database further away will impact performance significantly.  Time units are in Microseconds starting with 1.0.13, or Milliseconds in earlier builds.
  • evicttables (1.0.4):  explicitly specify the tables to be evicted, or use the special word “all” or “*” to evict all tables on the current VDB.
  • exclude (1.0):  A ratio of queries should NOT match the policy, as expressed as 1/X. Useful to allow a small number of objects that would otherwise be a cache hit to be a cache miss instead.  Example:  A value of 10 would exclude 1 out of every 10 matches from actually matching the rule.
  • invalidate (1.4):  If a query is an update, should we invalidate the tables associated with it–defaults to true.  Set to false, cache objects will not be invalidated.
  • log (1.0.4):  Log matching query just like with a match to the log action.
  • logger (1.0.13):  Specify how log data should be logged.  Values allowed are “log4j” and “hdlog” (the default).  In the case of log4j, the log data will be output using the com.heimdall logger, which is the default used for other log information generated by the driver.  If no Log4j facility is present, it will fall-back to using stdout.  Multiple loggers can be specified, i.e. “log4j,hdlog” to log to both.
  • matchlimit (1.0.4):  Enforces a rate of X pattern matches, with up to Y seconds of burst.  To specify the burst value, use the parameter “maxburst”.  Useful to limit the amount of log data generated in a similar manner to using the sample parameter.
  • maxburst (1.0.4):  Specify the number of seconds of bursting that is allowed before clamping the rate for the ratelimit or matchlimit parameters.  Defaults to 10.
  • notify (1.0.4):  Add the ability to generate a “notify” record to the management server.  This will result in a message entry batched and e-mailed to the address configured in the admin notification section in the management server GUI.
  • notrack (1.0.13):  At times, when monitoring traffic, it is desired to ignore certain patterns from the dashboard display and statistics.  By specifying “notrack”
  • olderthan (1.0.3):  specify the time window in milliseconds to allow content to not be evicted.  Defaults to zero.  Negative values are allowed to prevent objects from being served from the cache for a period in the future.
  • printmatch (1.0.4):  Print the rule and SQL on a rule match for debugging regex matches.  Honors the various options that prevent a match such as sample or matchlimit.
  • printresults (1.0.12):  Print a representation of results that match–these may not be printed when cached via the proxy access modes.
  • printtables (1.0.3):  Trigger the driver to print the SQL and the tables associated with it on standard out, as the query is processed.  Useful for debugging table invalidation issues, and verify proper SQL parsing.  Accepted values are true or false.
  • printtiming (1.0.4):  Print the execution time of the rule.  Best used with Call rules to report the execution time of a large list of Allow rules, to determine the overall impact on performance.
  • ratelimit (1.0.4):  Enforces a rate of X queries per second, with up to Y seconds of burst.  To specify the burst value, use the parameter “maxburst”.
  • sample (1.0):  A ratio of queries should match the policy, as expressed as 1/X. Useful to limit logging to a small percentage of traffic.  Example:  A value of 10 would mean that 1 out of 10matches would be treated as a match.
  • stop (1.0.4):  Stop rule processing at this rule, if it is a match.  Accepts “true” or “yes”, all other values are considered false.
  • tables (1.0):  A setting to explicitly add to the list of tables associated with a query. Useful for tagging stored procedures with the tables they modify, or handling special cases that may not be parsed properly.  Example value “table1,table2”
  • update (1.0):  A flag to explicitly set that a matching rule is also updating a table, in case it is not auto-detected by Heimdall. Most useful for stored procedures.  Values:  true or false.

Additionally, in either a property value or the table name (for table matching), the following replacement values can be used to dynamically adjust behavior:

  • ${catalog} or ${database}:  the current catalog/(database in MySQL term) being used.
  • ${vdb}:  The VDB the traffic was received on
  • ${user}:  The JDBC user on a connection

In order to process a rule, the following steps are taken (updated for 1.0.13):

  1. Check if rulelist is enabled
  2. Iterate through rules
  3. Check if a rule is enabled
  4. Check if the in-transaction flag is set when in a transaction
  5. Check if the ruletype should be excluded for processing, due to a previous allow or ignore rule
  6. Check if the regular expression or table matches
  7. Process the capture parameter, if set
  8. Check if the rule rate matching limits are allowed
  9. Process the update, printmatch, printtables, tables, and notify flags (although no action is taken at this point)
  10. Process the rate limiting parameters (except delay–this is done only once, after caching, async and forwarding are executed)
  11. Per-action rule processing

In the case of multiple Allow rules in consecutive order, the rules internally will self-organize based on the frequency of hits observed by the driver.  The more often a rule has been hit, the further up the list it will be positioned to optimize the regex lookup performance.  Any other type of rule will be processed in the order specified, and no Allow rule will optimize around another other rule type.

Cache Behavior

Caching will trigger the query to generate a 64 bit hash of the query, and lookup in the cache index if there is a matching response. In the event there is, request processing is terminated, and the response is processed to the caller. On a cache miss, the processing continues, and on the response side of processing, the response will be stored in the cache store for later use, with a time to live based on the cache rule setting. The default cache provider is Hazelcast, and as such, the Hazelcast jar file will need to be available for class loading. This is provided in the grid library option of the install.

In the event that due to a previous cache configuration, an object was cached that would now no longer be cached, due to the order of operations, the old “stale” object will be thrown out after being pulled from the cache.  This applies to table exclusion rules, or TTL changes.  As such, unlike most systems, a change to the cache rules will not require a purge of the cache to prevent invalid data from being returned.

Each cache object is tagged with the tables the query was associated with, and internally, all tables are tracked with a last modified time, in milliseconds.  On a cache lookup, if the object was created prior to the last modified time of any table it is associated with, then the object is flushed from the cache, and repopulated.  This helps insure that even if cache policies are changing, the current policies ttl limit will be honored, even if the policy is created or changed after the object enters the cache.  Last modified data can come from any one of several sources:

  1. A DML was observed on the table directly by the driver
  2. An invalidation record with a newer last modified time was pulled from the grid cache
  3. A last modified time was pulled from the database via trigger notification

The first  two of these don’t require any additional configuration.  The third requires some configuration to setup and configure the database to support this.  Currently, sample trigger configurations are available for MySQL and MS-SQL, and can be developed as needed for other databases.  Please contact Heimdall support for assistance with this feature.

Table Caching provides a refinement to any cache policy in order to override a TTL of 0, or reduce the cache TTL on a match.  In the case of a cache value of TTL, table cache entries may override the base TTL if and only if all tables have a positive ttl specified. If a table cache entry specifies a ttl of zero, and the table is present in a query, the query will not be cached. Regex patterns do not apply on a table cache, nor are they ignored, even if the “ignore” action is used.

Logging

Log and trace entries are logged in the $serverhome/log directory, and follow the following format (CSV, and can be opened in a spreadsheet if desired):
Log timestamp (ms since epoch)
Event type, and the condition that triggers generation:
TRACE, Generated with trace actions
DONE_SQL, Generated with log actions
CONNECTION_OPEN, Generated if log connections set
CONNECTION_CLOSE, Generated if log connections set
METHOD_COMPLETE, Generated if log methods set
TRANSACTION_START, Generated if log methods set
TRANSACTION_END, Generated if log methods set
DEBUG, generated in rare error cases and if the debug query option is set on the Heimdall JDBC query string
Status flag (1 for cache hit, 0 for cache miss, 2 for cache miss (in transaction), 3 for cache hit (in transaction))
VDB name
IP address
Session id (uniquely assigned from server to identify different users)
Log entry source method type:
C: connection
S: statement
R: resultSet
P: prepared
X: callable
D: Driver level
B: Data Source
U: Unspecified type
Object id (semi-unique identifier for given object in this session)
Parent id (id of parent object. E.g. If ResultSet, parent id would be id of statement that created it)
Runtime in microseconds i.e. response time of operation generally. For connection close events is total time connection was open, and for TRANSACTION_END, it is the time since the last TRANSACTION_END or the TRANSACTION_START.
Database (name of data source)
Message – event text. For both the log (done_sql) and trace (trace) event types, all of the following fields will be inserted, but for the done_sql, many will be empty values, as the data is not tracked for this type to lower overhead. Other record types include test that represents what was done, i.e. the method, etc.
# of rows
# of rows not read
Estimate of raw byte size of Result
Estimate of Java byte size of Result (including object overhead)
Overall time (all rows processing time)
MD5 hash of data read by application
SQL statement & Parameters The SQL statement is logged in the format of “Statement~num parameters~param1~param2~”. Each parameter is replaced with a ? in the statement. The target is to simplify the parsing of query parameters, yet insure that if log records are sorted based on this portion of the message, than unique queries will be sorted together. This format also makes stripping potentially sensitive information from the records when needed.
Note: The log fields may change between major revisions of code. Each file contains a header that provides the position of each field, and should be used for parsing of the file to insure that additional or rearranged fields do not break any code depending on the format of the log file.

Example logs entries:
1429205700561, TRACE, 0, stash, 67.222.138.106, 1, R, 5, 6, 1771, mysql-stash, 407,0,39992,105112,38109,D41D8CD98F00B204E9800998ECF8427E,SELECT FILENAME,AUTHOR,ID,M D5SUM,DATEEXECUTED,ORDEREXECUTED,TAG,EXECTYPE FROM `stash`.`DATABASECHANGELOG` ORDER BY DATEEXECUTED ASC, ORDEREXECUTED ASC~0~ 1429205700575, DONE_SQL, 0, stash, 67.222.138.106, 1, S, 7, 12, 186, mysql-stash, ,,,,,,UPDATE `stash`.`DATABASECHANGELOGLOCK` SET `LOCKED` = ?, `LOCKEDBY` = NULL, `LOCKGRANTED` = NULL WHERE `ID` = ?~2~0~1~ 1429205700605, DEBUG, 0, stash, 67.222.138.106, 1, C, 12, 0, 0, mysql-stash, rollback() exception: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionExcept ion: Can’t call rollback when autocommit=true Time: -1 1429205700980, TRACE, 0, stash, 67.222.138.106, 1, R, 6, 8, 267, mysql-stash, 1,0,516,556,433,D41D8CD98F00B204E9800998ECF8427E,SELECT prop_key, prop_value FROM app_ property WHERE prop_key = ?~1~license~