We are going to discuss how to optimize the Denodo query to make the most of the optimization in the Denodo platform. Below are the most powerful query optimization techniques that we can apply in Denodo platform manually which will increase the performance significantly.
Below is the diagram which simplifies how Denodo parses/optimizes a query.
Denodo developers/administrators can also improve the performance of the query using some features provided by Denodo platform.
In some specific scenarios, it could be advisable to disable the optimization of the query as the process of finding/applying the simplifications and evaluating the best execution plan using cost information might take some milli seconds. For fast queries which return few rows, it may not worth this overhead.
DATA MOVEMENT
Data movement optimization provides a way to execute the queries across distributed data source more efficiently. VDP can transfer the data of a smaller view to a data source of larger view and execute the operation in 2nd DS.
This may offer great performance improvements for the below operations
When a query involves a data movement is executed, execution engine does the following.
Only supported data source for cache is available for data movement. The option Allow temporary table for data movement is available under read and write tab of the DS.
It is also possible to enable the data movement to enable it at the execution time using the DATAMOVEMENTPLAN option in the CONTEXT clause.
Example: Select * from sample_viewname where (DATAMOVEMENTPLAN=samplesourceviewname:OFF target_viewname:JDBC ds1);
PRIMARY KEYS
The primary key of a view is a column or set of columns whose value is unique in the view. At runtime, clients can obtain the primary key definition of a view by invoking the appropriate methods of JDBC or ODBC interfaces of the server. VDP does not enforces the primary key definition.But it plays important role for applying optimizations in Denodo.
BRANCH PRUNING
While dealing with unions, sometimes query needs the result only from one branch. For example, data of current year is stored in one data source and historical data is stored in a different data source, the full union query can be replaced at the execution time by the branch that will be executed by the conditions of the query. This allows delegations of the operations to the DS on top of the union and do not execute an unneeded branch.
Developers have to configure the partitions union by adding Where conditions to the sub views. Each of the selection views defines a clause with conditions that match the data stored in the underlying partition. These conditions actually wont filter any of the rows in the DS when issuing full scan of the view. But when a query specifies a condition that are incompatible with the definition of a certain sub view, Denodo using branch pruning, can be known in advance that subview will return 0 records.
ALTERNATIVE WRAPPERS
When working with star schema, occasionally fact table is stored in one DB and dimension tables are stored in another DB. For increasing performance, the small dimension tables can be replicated on all the databases of star schema. This allows query optimizer to push down the operations to the DS. This optimization specifically applies only to JDBC Data sources. This functionality is available by going to the base view options -> Search methods
When to use Cache?
Sometimes, real time access to the source is not a good fit and vdp cache can be enabled. Here are some specific situations were enabling cache is advisable.
Memory usage
Query execution in Denodo platform works internally as follows:
For each query, VDP uses one thread for each DS involved in that query and one thread for combining the data obtained from the sources. Each row is returned to the client as soon as it is available, while in streaming mode for asynchronous operations like unions, joins (Not HASH), projections and Selections (not using DISTINCT). For synchronous operations such as GROUP BY, HASH JOIN, ORDER BY, DISTINCT etc., the entire records required to be realized in memory before processing them. For such scenarios following aspects are important when dealing with memory consumption.
STREAMING DATA
There are 2 options to import a DS with an XML structure which return compound values:
Do not stream output When base view is created, server receives the entire document and then parses it thus having to store the entire document in memory.
Stream output at specified level The server optimizes the processing, so it does not require the entire document to be realized in memory before processing it. The memory consumption is muh lower in this case.
CONCLUSIONS
In general, when dealing with performance, developers should take these same steps:
Taking time to fully understand the behavior of the queries is critical.
Denodo platform 7.0 reference manuals:
Manju is a senior consultant in KPI Partners. Her professional focus is on the world of ETL development, Support, and Implementation. She has extensively worked on Data warehousing applications using ETL tools like Data Stage and Informatica and Data virtualization tool like Denodo and also in SQL, PL/SQL and UNIX scripts.