Thursday, February 12, 2015

Improving the Performance of Calculated Fields and Formulas, Part 1

Stonefield Query has very fast database and calculation engines. Coupled with a high-performance database server like Microsoft SQL Server or Oracle, reports are often generated in Stonefield Query significantly faster than in other reporting solutions. We’ve heard of cases where a report took hours to run in (unnamed competitor) but only seconds in Stonefield Query.

However, occasionally we come across reports that run slower than expected. Frequently, the cause is a calculated field (defined by the developer in the data dictionary) or formula (defined by the user in the Formula Editor) that doesn’t perform well. This blog post and the next one discuss a couple of common causes and solutions. This post discusses the RunSQL function.

Note: this is an advanced topic intended for developers or technically-oriented users.

Using RunSQL

The RunSQL function has been in Stonefield Query for a long time. It was added to allow a formula to execute a SQL statement that retrieves additional data from the database. For example, suppose you want to show the amount of freight charged for each customer, but freight is a line item in an invoice details table rather than being a discrete field so you can’t just add a field for it to the report. In that case, a formula with an expression like this would calculate the freight charge for each customer:

RunSQL("select sum(AMOUNT) from INVOICE_DETAILS where ITEMNO = 'FREIGHT' and CUSTOMER_NO = ?CUST_ID")

The ?CUST_ID syntax means use the value of the CUST_ID field in the current record of the result set. In this case, the report displays the CUST_ID field from the CUSTOMERS table, then for each record, calculates the freight by executing the SQL statement in the RunSQL function.

The problem with RunSQL

The way Stonefield Query generates the result set for a report is as follows:

  • Determine the SQL statement needed to retrieve the data for “real” fields and for calculated fields and formulas that have the “Send formula to database engine” setting turned on.
  • Add to the SQL statement the fields necessary to perform the calculation of calculated fields and formulas that have the “Send formula to database engine” setting turned off.
  • Retrieve the data using the SQL statement into a temporary result set.
  • Go through every record in the temporary result set and process any calculated fields and formulas that have the “Send formula to database engine” setting turned off to give the final result set.

It’s the “go through every record” in the final point that causes the problem when you use RunSQL. Suppose you have 2,000 customers. As Stonefield Query processes the expression for the Freight formula, it sends a SQL statement to the database engine for every record. In this case, that’s 2,000 additional SQL statements. No wonder the report takes so long to run!

The solution

There are a couple of solutions to this problem. One is to use a new feature added in Stonefield Query version 5.0: grouping formulas. A grouping formula is like a regular formula but allows you to specify a summarization to be done on the expression (like the SUM function used in the SQL statement above), a field to group the summarized data on (like the implied grouping due to the filter on customer number in the SQL statement above), and optionally a filter to limit the records being retrieved (like the WHERE clause on ITEMNO in the SQL statement above). In many cases, a grouping formula can replace a formula using a RunSQL statement, but grouping formulas have one critical advantage: only a single SQL statement is sent to the database and the result set retrieved by it is cached and indexed for high-speed lookups. This results in orders of magnitude better performance, at the cost of using a little more memory.

GroupingFormula

Following the example above, you’d replace the expression of the Freight formula with just AMOUNT, click the Grouping button, select “Sum” for Summary, select CUST_ID for Grouping Field, and specify a filter of ITEMNO equals FREIGHT. The resulting formula is not only easier to create (you don’t have to know the SQL language or real table and field names) but performs a LOT faster.

For those few cases where a grouping formula can’t replace a RunSQL statement (although we haven’t encountered one yet), the other solution is to use the technique discussed in the next blog post: caching and indexing your own result set.

0 comments: