Heimdall Data for Pivotal Greenplum
Heimdall Data is a SQL traffic manager that:
- Batches DML operations
- Intelligently routes diverse workloads (e.g. OLTP, OLAP)
- Improves response times up to 100x without code changes
- Improves application response times through fewer commits
- Improves DML scale
As explained in our technical blog, companies choose Pivotal Greenplum for its large-scale MPP (Massively Parallel Processing). However, there are some drawbacks:
- Higher latency: many nodes must coordinate to generate an answer
- No support for materialized views
- Frequent queries slow down processing
The ideal solution to these drawbacks, such as Heimdall Data, requires no code changes. Heimdall uses a separate Postgres database to maximize Pivotal Greenplum performance:
- Intelligently routes Analytics and OLTP traffic
- Supports materialized views via Postgres
- Batches DML operations (e.g. Singleton transactions) via Postgres
- Automatically caches frequent queries
Fast materialized views are very important in analytics environments. When reports are generated or dashboards are viewed, a subset of data is pulled from the back-end data store and processed. Heimdall provides the following functionality:
- Queries against a materialized view are routed to an alternate database (e.g. Postgres) acting on behalf of Pivotal Greenplum. Postgres answers the queries, offloading Greenplum.
- Heimdall triggers a refresh of the view automatically. Heimdall is aware of updated views from Pivotal Greenplum and of when data was loaded that may impact the view. The net result is faster reports and a lighter load on Greenplum.
With Heimdall, DML statements are aggregated in Postgres and sent to Greenplum efficiently in batches.
Visit our technical blog.
Heimdall Data for Pivotal Greenplum allows customers to deploy applications for both OLTP and OLAP workloads, often called HTAP (Hybrid Transactional/Analytic Processing).
Heimdall’s SQL Caching utilizes Pivotal GemFire as a look-aside cache. SQL result sets are cached in Pivotal GemFire and are invalidated upon writes to Greenplum. Ideal for low-latency performance improvement, Heimdall’s intelligent caching and invalidation logic is distributed across application servers. It is all automated, requiring zero code changes. View our SQL caching video.
Why Heimdall Data for Pivotal?
Heimdall Data is a Database Proxy that improves SQL performance for Pivotal Greenplum by up to 100x. Heimdall is deployed between the application and the data source, providing query optimization (e.g. batch processing, materialized view offload, SQL auto-caching).
How is Heimdall Data deployed?
Heimdall Data is a transparent Database Proxy deployed in two ways:
1) VM, Docker Container or Standalone instance between the application and database
2) Sidecar process, an agent, or JDBC driver installed on each application instance to maximize performance and reduce latency
Deployment requires no application or database changes. Just change the connection string or network settings of the application to route through the Heimdall Database Proxy. SQL performance visibility and optimization are managed from a Heimdall Data central console.
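As a rough illustration of the connection string change, the sketch below swaps only the host and port of a database URL and leaves the user, database name, and options untouched. The hostnames, the port 5433, and the helper name route_through_proxy are assumptions made for this example; they are not Heimdall defaults.

```python
from urllib.parse import urlsplit, urlunsplit


def route_through_proxy(db_url: str, proxy_host: str, proxy_port: int) -> str:
    """Return db_url with only its host and port replaced by the proxy's."""
    parts = urlsplit(db_url)
    # Keep any "user:password@" prefix exactly as it was.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{proxy_host}:{proxy_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical endpoints, used only for illustration.
direct = "postgresql://app_user@greenplum-master.example.internal:5432/analytics"
proxied = route_through_proxy(direct, "heimdall-proxy.example.internal", 5433)
print(proxied)
```

The application code that issues SQL stays the same; only the endpoint it connects to changes.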
What are other Heimdall for Pivotal Use Cases?
HTAP (Hybrid Transactional / Analytical Processing):
Problem: Queries issued by a client sometimes need to trigger actions that synchronize or update data. Combined with query routing, this can be used to aggregate or batch data for loading into a large-scale data warehouse. In this case, inserts are made against a small, fast front-end database such as Postgres; then, after a period during which no more updates occur, the data is synchronized as a whole into the large-scale data warehouse.
HTAP Solution: Heimdall is implementing a generic system whereby actions (SQL calls or external programs) can be triggered while processing a query, and the trigger can be executed before the matching SQL, after it, or in parallel. Here, “parallel” means in another thread. Further, the parallel execution can be delayed, with repeated calls being aggregated into a single call.
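The delayed, aggregated form of a parallel trigger can be sketched as a simple debounce. This is a minimal illustration of the idea and not Heimdall's implementation; the class name DelayedTrigger, the fire method, and the refresh_view callback are hypothetical.

```python
import threading


class DelayedTrigger:
    """Run `action` once, `delay` seconds after the most recent fire() call.

    Repeated calls inside the delay window are aggregated into one call,
    which runs on a separate timer thread.
    """

    def __init__(self, action, delay):
        self._action = action
        self._delay = delay
        self._lock = threading.Lock()
        self._timer = None
        self._pending = 0

    def fire(self):
        with self._lock:
            self._pending += 1
            if self._timer is not None:
                self._timer.cancel()  # restart the quiet period
            self._timer = threading.Timer(self._delay, self._run)
            self._timer.daemon = True
            self._timer.start()

    def _run(self):
        with self._lock:
            count, self._pending = self._pending, 0
        if count:  # skip if another timer already drained the calls
            self._action(count)


refreshes = []
done = threading.Event()


def refresh_view(call_count):
    # Stand-in for a SQL call or external program.
    refreshes.append(call_count)
    done.set()


trigger = DelayedTrigger(refresh_view, delay=0.2)
for _ in range(5):  # five matching queries arrive back to back
    trigger.fire()
done.wait(timeout=2.0)
```

In this sketch the five calls result in a single invocation of refresh_view, which receives the number of calls it stands for.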
Summary: Heimdall Data optimizes Greenplum performance for both query reads and writes.
Batching DML Operations: Data is inserted row by row and aggregated into a bulk insert into the data warehouse when there is a lull in insert activity. This keeps overhead on the back-end low even while large amounts of individual row data are being inserted, potentially by many clients. Use case: When IoT devices feed large amounts of data into a data warehouse, a front-end database can be used as a buffer, with a periodic flush of the data into the warehouse and no additional logic needed at the application layer.
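The buffering pattern can be illustrated with a short sketch. It makes several simplifying assumptions: an in-memory list stands in for the front-end database, an in-memory SQLite database stands in for the warehouse, and timestamps are passed in explicitly so the behavior is easy to follow. The class InsertBuffer and the readings table are hypothetical.

```python
import sqlite3


class InsertBuffer:
    """Collect single-row inserts and flush them as one bulk insert after a lull."""

    def __init__(self, warehouse, lull_seconds):
        self.warehouse = warehouse
        self.lull_seconds = lull_seconds
        self.rows = []
        self.last_insert = None

    def insert(self, row, now):
        # Cheap for the client: no round trip to the warehouse.
        self.rows.append(row)
        self.last_insert = now

    def maybe_flush(self, now):
        """Flush if no insert arrived for lull_seconds; return rows flushed."""
        if not self.rows or now - self.last_insert < self.lull_seconds:
            return 0
        with self.warehouse:  # one transaction, so one commit for the batch
            self.warehouse.executemany(
                "INSERT INTO readings (device_id, value) VALUES (?, ?)",
                self.rows,
            )
        flushed = len(self.rows)
        self.rows.clear()
        return flushed


warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
warehouse.execute("CREATE TABLE readings (device_id TEXT, value REAL)")

buf = InsertBuffer(warehouse, lull_seconds=5.0)
buf.insert(("sensor-1", 20.5), now=0.0)
buf.insert(("sensor-2", 21.0), now=1.0)
early = buf.maybe_flush(now=3.0)  # only 2 s since the last insert: no flush
late = buf.maybe_flush(now=6.5)   # 5.5 s of quiet: both rows go in one batch
```

The point of the design is that the warehouse sees one commit per batch rather than one per row.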
Automatic Materialized View Management:
Heimdall serves as a traffic manager:
- Queries against a materialized view are routed to an alternate database (e.g. Postgres) acting on behalf of Pivotal Greenplum. Postgres answers the queries, offloading Greenplum. Also, frequent queries against small data sets are better served by a general-purpose Postgres database than by Greenplum. Heimdall routes traffic to the appropriate data source for the best performance.
- Heimdall triggers a refresh of the view automatically. Heimdall is aware of updated views from Pivotal Greenplum and of when data was loaded that may impact the view. The net result is faster reports and a lighter load on Greenplum.
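The two behaviors in the list above, routing by materialized view and refreshing after a data load, can be sketched as follows. This is an illustration under stated assumptions: the regular expression is a crude stand-in for real SQL parsing, and the class ViewRouter, its methods, and the view and table names are all hypothetical.

```python
import re

# Crude table-name extraction; a real proxy would parse the SQL properly.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


class ViewRouter:
    def __init__(self, views):
        # views maps a materialized view name to the base tables it depends on.
        self.views = {
            name.lower(): {t.lower() for t in tables}
            for name, tables in views.items()
        }

    def route(self, sql):
        """Send a query to Postgres only if every relation it reads is a known view."""
        relations = {t.lower() for t in TABLE_REF.findall(sql)}
        if relations and relations.issubset(self.views):
            return "postgres"
        return "greenplum"

    def on_load(self, table):
        """Return the views that need a refresh after data is loaded into `table`."""
        table = table.lower()
        return sorted(v for v, deps in self.views.items() if table in deps)


router = ViewRouter({"daily_sales_mv": {"sales"}})
report_target = router.route("SELECT * FROM daily_sales_mv WHERE region = 'EU'")
raw_target = router.route("SELECT * FROM sales")
to_refresh = router.on_load("sales")
```

A query that joins a view with a table held only in Greenplum is deliberately left on Greenplum in this sketch, since the alternate database could not answer it alone.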
How does Heimdall help Greenplum with frequent singleton transactions?
Heimdall routes singleton transactions through Postgres. The transactions are then batched and sent to Greenplum to optimize and offload Greenplum processing.
How does Heimdall work with Pivotal GemFire?
As a look-aside SQL results cache, Heimdall can automatically and intelligently offload queries from Greenplum and cache the results in GemFire.
In this cache use case, Heimdall stores SQL results in GemFire and serves them when a matching query arrives.
- Heimdall treats GemFire as a key/value store: Key = query and Value = result set
- Heimdall does NOT write SQL into GemFire, nor is GemFire used as a read-through or write-through cache.
- No code changes are required for automatic cache invalidation.
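The look-aside pattern described above can be sketched in a few lines. The assumptions are significant: a Python dict stands in for GemFire, an in-memory SQLite database stands in for Greenplum, and invalidation uses a simple table-name match on the SQL text. The class LookAsideCache and its methods are hypothetical and do not reflect Heimdall's actual invalidation logic.

```python
import re
import sqlite3

# Crude table-name extraction for SELECT, INSERT, UPDATE, and DELETE statements.
TABLE_REF = re.compile(r"\b(?:from|join|into|update)\s+([A-Za-z_]\w*)", re.IGNORECASE)


def tables_in(sql):
    return {name.lower() for name in TABLE_REF.findall(sql)}


class LookAsideCache:
    def __init__(self, db):
        self.db = db
        self.results = {}  # key = SQL text, value = result set
        self.hits = 0

    def query(self, sql):
        if sql in self.results:  # served from the cache, database untouched
            self.hits += 1
            return self.results[sql]
        rows = self.db.execute(sql).fetchall()
        self.results[sql] = rows  # populate the cache on a miss
        return rows

    def write(self, sql, params=()):
        with self.db:
            self.db.execute(sql, params)
        written = tables_in(sql)
        # Drop every cached result that reads a table this write touched.
        for cached_sql in list(self.results):
            if tables_in(cached_sql) & written:
                del self.results[cached_sql]


db = sqlite3.connect(":memory:")  # stand-in for Greenplum
db.execute("CREATE TABLE orders (id INTEGER, total REAL)")
cache = LookAsideCache(db)

cache.write("INSERT INTO orders VALUES (?, ?)", (1, 9.5))
count_sql = "SELECT COUNT(*) FROM orders"
first = cache.query(count_sql)   # miss: goes to the database
second = cache.query(count_sql)  # hit: served from the dict
cache.write("INSERT INTO orders VALUES (?, ?)", (2, 3.0))  # invalidates the entry
third = cache.query(count_sql)   # miss again, returns the fresh count
```

Because the cache sits beside the database and not in front of it, a miss or an invalidation simply falls back to a normal query.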
How does Heimdall's Automated Failover help Pivotal Greenplum?
Heimdall supports failover of the Master to the Standby for unplanned and planned (e.g. maintenance upgrade) outages. Failover is configured at the SQL level. Heimdall monitors the health of each Greenplum instance. If Heimdall detects a failure on the primary node, it automatically fails over to the secondary node. Heimdall supports many databases, including Greenplum, SQL Server, MySQL, Oracle, and Postgres.
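The health-check-then-failover decision can be sketched as below. This is only an illustration: the function names, hostnames, and port are assumptions, and a plain TCP probe is a weaker check than the SQL-level check the text describes.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_node(nodes, is_healthy):
    """Return the first healthy node, checking the master before the standby."""
    for node in nodes:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy Greenplum node available")


# Hypothetical hosts, listed in priority order: master first, then standby.
nodes = ["gp-master.example.internal", "gp-standby.example.internal"]

# Simulated outage of the master so the example runs without a network.
down = {"gp-master.example.internal"}
target = choose_node(nodes, lambda host: host not in down)

# Against real hosts, the check could be: lambda host: tcp_probe(host, 5432)
```

Passing the health check in as a function keeps the selection logic independent of how health is actually measured.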
What concurrency levels can you expect with the Heimdall solution?
Application speed and concurrency depend on many factors, including hardware, software configuration, number of users, and data volumes. However, with Heimdall, we have seen Greenplum performance improve by up to 100x.
What transactional databases does Heimdall support?
Heimdall supports any SQL database (Postgres, MySQL, SQL Server, Oracle, etc.).
How is Heimdall priced?
Heimdall is priced by node. For specific Heimdall Data for Pivotal pricing, please email firstname.lastname@example.org
Can I demo Heimdall for Pivotal?
Yes, see this document for directions.