Thursday, February 12, 2015

Improving the Performance of Calculated Fields and Formulas, Part 2

In the previous blog post, I discussed one potential cause of slow performance of calculated fields and formulas: using the RunSQL function. In this post, I’ll discuss another common cause for performance issues: calling a script that retrieves multiple result sets from the database engine.

Note: this is an advanced topic intended for developers or technically-oriented users.

Calling a script from a formula

Sometimes, a formula needs to do more than just use an expression to calculate a value. In that case, it needs to call a script to execute some code. Often, that code retrieves additional data from the database, and that can lead to the same problem discussed in the previous blog post: multiple SQL statements sent to the database engine, slowing down performance.

For example, in Stonefield Query for Sage 300 (Sage 300 is an accounting system formerly known as Accpac), to determine the total quantity on hand for an inventory item, a script has to access the ICILOC table, which contains the quantity on hand at each location (such as each warehouse), and add up the various amounts for a particular item. Here’s a script that would do that:

lparameters tcItem
lcSelect = 'select sum(QTYONHAND) as QOH from ICILOC ' + ;
    'where ITEMNO=?tcItem'
loDB = SQApplication.DataEngine.Databases.GetMainDatabase()
loDB.ExecuteSQLStatement(lcSelect)
lnAmount = QOH
return lnAmount

If the name of this script is GetQOH, it would be used in a formula as follows:

GetQOH(ICITEM.ITEMNO)

The problem with this script is that if the report shows 1,000 inventory items, this code is called 1,000 times, and 1,000 additional SQL statements are sent to the database engine, which would likely be pretty slow.

The solution

The solution is to execute only a single SQL statement that sums the quantity on hand across all location for all items, not just the current one, keep that result set in memory (sometimes called caching), and then do an indexed lookup (a SEEK) to find the one we’re interested in. Here’s the code for the updated GetQOH script:

lparameters tcItem
if not used('_TotalQOH')
    lcSelect = 'select ITEMNO, sum(QTYONHAND) as QOH ' + ;
        '
from ICILOC group by ITEMNO'
    loDB = SQApplication.DataEngine.Databases.GetMainDatabase()
    loDB.ExecuteSQLStatement(lcSelect, , '_TotalQOH')
    index on ITEMNO tag KEY
endif
select _TotalQOH
seek tcItem
lnAmount = QOH
return lnAmount

There isn’t much extra code, but notice the SQL statement is only executed once, the first time the script is called, and that the result set it retrieves calculates the total for all items and indexes the result set on the item number. The second and subsequent times the script is called, it sees that the result set it needs is already in memory so it doesn’t bother retrieving it again, instead doing a SEEK to find the record for the current item. This can result in orders of magnitude improvement in performance, depending on the size of the result set retrieved for the report.

There are a couple of things to watch out for when writing a script that retrieves additional data.

Wrinkle #1: showing current data

After the report is run, the result set retrieved by the script is still cached. So, when you run the report again, it doesn’t bother retrieving a fresh set of data. That’s fine if the data doesn’t change but if users have added or edited records in the meantime and you want the most current data to be used, you have to tell Stonefield Query to close the cached data at the end of each report run. To do that, add the following line of code before the ENDIF statement in the code above:

oApp.oData.oCursorsToCloseAfterRun.Add('_TotalQOH')

Note: in the upcoming Stonefield Query version 5.1, this is a little simpler; you can use:

SQApplication.RegisterCursorToCloseAfterRun('_TotalQOH')

Wrinkle #2: handling multi-database reports

The second issue is a little more complicated, but only occurs if you have a multi-database report; that is, a report in which you’ve selected more than one database using the Database button on the Filter page (not all versions of Stonefield Query support this). Because Stonefield Query does all of the data retrieval it needs for the report before processing calculated fields and formulas, by the time your script is called, only the current database (the one shown in the status bar) is being used so the data your script retrieves comes from that database, not all of the databases for the report. In that case, your script has to handle getting the data from the correct database for the record it’s currently processing. A couple of things are required.

First, in order to know which database the record came from, you have to add the Source Database field to the report (this is a virtual field added to every table when multi-database queries are supported), even if you turn off its Display Field in Report setting so it doesn’t actually appear in the report. That way, your script can identify which database it needs to access to get the additional data for each record.

Second, your script has to retrieve more than one result set; it needs a different one from each database. That means it has to keep track of the current database, connect to the one used by the record if necessary, retrieve the data, and reconnect to the original database. Of course, we’ll cache the different result sets so this only happens once. Here’s the script above updated to support that:

lparameters tcItem
loDB = SQApplication.DataEngine.Databases.GetMainDatabase()
lcCurrent = loDB.CurrentDataSource.DataSource
if type('loReport') = 'O' and ;
    not empty(loReport.aDataSources[1]) and ;
    type('SOURCEDATABASE') = 'C'
    lcDataSource = trim(SOURCEDATABASE)
else
    lcDataSource = lcCurrent
endif
lcCursorSuffix = oUtility.GetValidName(lcDataSource)
lcCursor = '_TotalQOH' + lcCursorSuffix
if not used(lcCursor)
    llReopen = .F.

    if lcDataSource <> lcCurrent
        llReopen = .T.
        loDatabase.OpenDataSource(lcDataSource)
    endif

    lcSelect = 'select ITEMNO, sum(QTYONHAND) as QOH ' + ;
        'from ICILOC group by ITEMNO'
    loDB.ExecuteSQLStatement(lcSelect, , lcCursor)
    index on ITEMNO tag KEY
    if llReopen
        loDatabase.OpenDataSource(lcCurrent)
    endif
endif
select (lcCursor)
seek tcItem
lnAmount = QOH
return lnAmount

Whew! That’s a bit of complicated code, but you can use this example as a template for your own scripts. Just replace the values assigned to the lcCursor, lcSelect, and lnAmount variables, change the name of the variable used for the parameter and the SEEK statement (if desired; no harm if you leave it as “lcItem”), and change the INDEX ON statement to use the appropriate index expression.

Summary

Using the techniques in this blog post and the previous one can make your reports run significantly faster. They’re a bit of work to implement but the results are well worth the effort!

Improving the Performance of Calculated Fields and Formulas, Part 1

Stonefield Query has very fast database and calculation engines. Coupled with a high-performance database server like Microsoft SQL Server or Oracle, reports are often generated in Stonefield Query significantly faster than in other reporting solutions. We’ve heard of cases where a report took hours to run in (unnamed competitor) but only seconds in Stonefield Query.

However, occasionally we come across reports that run slower than expected. Frequently, the cause is a calculated field (defined by the developer in the data dictionary) or formula (defined by the user in the Formula Editor) that doesn’t perform well. This blog post and the next one discuss a couple of common causes and solutions. This post discusses the RunSQL function.

Note: this is an advanced topic intended for developers or technically-oriented users.

Using RunSQL

The RunSQL function has been in Stonefield Query for a long time. It was added to allow a formula to execute a SQL statement that retrieves additional data from the database. For example, suppose you want to show the amount of freight charged for each customer, but freight is a line item in an invoice details table rather than being a discrete field so you can’t just add a field for it to the report. In that case, a formula with an expression like this would calculate the freight charge for each customer:

RunSQL("select sum(AMOUNT) from INVOICE_DETAILS where ITEMNO = 'FREIGHT' and CUSTOMER_NO = ?CUST_ID")

The ?CUST_ID syntax means use the value of the CUST_ID field in the current record of the result set. In this case, the report displays the CUST_ID field from the CUSTOMERS table, then for each record, calculates the freight by executing the SQL statement in the RunSQL function.

The problem with RunSQL

The way Stonefield Query generates the result set for a report is as follows:

  • Determine the SQL statement needed to retrieve the data for “real” fields and for calculated fields and formulas that have the “Send formula to database engine” setting turned on.
  • Add to the SQL statement the fields necessary to perform the calculation of calculated fields and formulas that have the “Send formula to database engine” setting turned off.
  • Retrieve the data using the SQL statement into a temporary result set.
  • Go through every record in the temporary result set and process any calculated fields and formulas that have the “Send formula to database engine” setting turned off to give the final result set.

It’s the “go through every record” in the final point that causes the problem when you use RunSQL. Suppose you have 2,000 customers. As Stonefield Query processes the expression for the Freight formula, it sends a SQL statement to the database engine for every record. In this case, that’s 2,000 additional SQL statements. No wonder the report takes so long to run!

The solution

There are a couple of solutions to this problem. One is to use a new feature added in Stonefield Query version 5.0: grouping formulas. A grouping formula is like a regular formula but allows you to specify a summarization to be done on the expression (like the SUM function used in the SQL statement above), a field to group the summarized data on (like the implied grouping due to the filter on customer number in the SQL statement above), and optionally a filter to limit the records being retrieved (like the WHERE clause on ITEMNO in the SQL statement above). In many cases, a grouping formula can replace a formula using a RunSQL statement, but grouping formulas have one critical advantage: only a single SQL statement is sent to the database and the result set retrieved by it is cached and indexed for high-speed lookups. This results in orders of magnitude better performance, at the cost of using a little more memory.

GroupingFormula

Following the example above, you’d replace the expression of the Freight formula with just AMOUNT, click the Grouping button, select “Sum” for Summary, select CUST_ID for Grouping Field, and specify a filter of ITEMNO equals FREIGHT. The resulting formula is not only easier to create (you don’t have to know the SQL language or real table and field names) but performs a LOT faster.

For those few cases where a grouping formula can’t replace a RunSQL statement (although we haven’t encountered one yet), the other solution is to use the technique discussed in the next blog post: caching and indexing your own result set.