Heimdall Data for Pivotal Data
As explained in our detailed blog, large-scale MPP (Massively Parallel Processing) database systems like Pivotal Greenplum use parallel compute resources to answer queries over large analytics data sets quickly. However, there are some drawbacks:
- Higher latency: for distributed queries, many nodes must coordinate to generate an answer.
- No materialized views: if a query result feeds several subsequent queries, additional work is required to preserve it in a temporary table or similar storage; otherwise the same base query must be run repeatedly.
- No built-in query result cache.
These drawbacks can be resolved by modifying the application, but the ideal solution, such as Heimdall Data, requires no code changes. Heimdall provides Pivotal Greenplum users with three key features:
- Efficient use of analytics and OLTP traffic
- Fast materialized views for results from Pivotal Greenplum
- Automatic result set caching for repeated queries
Fast materialized views are very important in analytics environments. When reports are generated or dashboards are viewed, a subset of data is pulled from the back-end data store, and various operations are then performed on that data. Heimdall provides the following functionality:
- Queries against a materialized view can be routed to an alternate database, typically Postgres, which answers them on behalf of Pivotal Greenplum and so offloads the Greenplum cluster.
- Heimdall triggers a refresh of the view automatically. Heimdall is aware of updated views from Pivotal Greenplum and of when data that may impact a view was loaded.
The net result is faster reports and a lighter load on Greenplum, allowing other queries to be processed faster and at greater scale.
Visit our technical blog.
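The routing and refresh behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not Heimdall's implementation: the `ViewRouter` class, its method names, the `view_sources` mapping, and the rule of falling back to Greenplum while a view is stale are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ViewRouter:
    """Illustrative router; not Heimdall's actual implementation."""

    # Hypothetical mapping: materialized view name -> base tables it is built from.
    view_sources: dict
    # Views whose base tables have been loaded since the last refresh.
    stale: set = field(default_factory=set)

    def on_data_load(self, table):
        """Mark every view built on `table` as stale and return those views."""
        affected = {v for v, src in self.view_sources.items() if table in src}
        self.stale |= affected
        return affected

    def refresh(self, view):
        """Stand-in for re-materializing the view into the offload database."""
        self.stale.discard(view)

    def route(self, relations):
        """Pick a backend for a query that reads the given relation names."""
        # Assumption: only queries that read fresh views exclusively are offloaded.
        only_fresh_views = bool(relations) and all(
            r in self.view_sources and r not in self.stale for r in relations
        )
        return "postgres" if only_fresh_views else "greenplum"


router = ViewRouter({"daily_sales": {"orders", "items"}})
backend = router.route({"daily_sales"})      # view is fresh, so it can be offloaded
to_refresh = router.on_data_load("orders")   # a bulk load makes the view stale
```

In this sketch a data load only marks views as stale; a real deployment would follow it with the automatic refresh, represented here by `refresh`.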
SQL Results Caching is ideal for low-latency performance improvement. Heimdall’s intelligent auto-caching and auto-invalidation logic is distributed across application servers. Result sets are cached in Pivotal GemFire and are invalidated upon writes to Greenplum. Because analytics data is loaded into Greenplum in bulk, nearly all results cached between data loads remain valid until the next load. This can result in a significant performance improvement for both the Pivotal Greenplum cluster and the end user. View our SQL caching video.
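As a rough sketch of the cache-and-invalidate idea, the Python below keeps result sets keyed by SQL text and drops them when a table they read is written. It is an assumption-laden illustration: the `ResultCache` class and its methods are hypothetical, an in-process dictionary stands in for Pivotal GemFire, and the caller supplies the table names rather than having them parsed from the SQL.

```python
class ResultCache:
    """Illustrative result-set cache with write-based invalidation."""

    def __init__(self):
        self._rows = {}              # SQL text -> cached result set
        self._queries_by_table = {}  # table name -> SQL texts that read it

    def fetch(self, sql, tables, run_query):
        """Return cached rows, calling run_query(sql) only on a miss."""
        if sql not in self._rows:
            self._rows[sql] = run_query(sql)
            for table in tables:
                self._queries_by_table.setdefault(table, set()).add(sql)
        return self._rows[sql]

    def on_write(self, table):
        """Drop every cached result that depends on the written table."""
        for sql in self._queries_by_table.pop(table, set()):
            self._rows.pop(sql, None)


executed = []

def run_on_greenplum(sql):
    # Hypothetical stand-in for a real database call; returns fixed sample rows.
    executed.append(sql)
    return [("total", 42)]

cache = ResultCache()
query = "SELECT sum(amount) FROM orders"
cache.fetch(query, {"orders"}, run_on_greenplum)  # miss: runs the query
cache.fetch(query, {"orders"}, run_on_greenplum)  # hit: served from the cache
cache.on_write("orders")                          # bulk load invalidates the entry
```

Because loads arrive in bulk, `on_write` fires rarely in this model, so most calls to `fetch` between loads are served without touching the database.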