Reports with large data sets are slow

7 December, 2015

I've got a report that returns about 15k rows per day, and I'm looking at the last 30 days (~500,000 rows).

Aggregations on this data in reports take a long time, even simple counts, but only in Yellowfin. The exact same operation performed directly in Oracle takes about 250 ms.

A count on the data returns quickly, but displaying the rows pivoted by day takes 5 or more minutes. The report itself states that the average runtime is 0 seconds. Could there be a problem with the Yellowfin client preventing the display of the data?

Hi Justin,

Thanks for the question. Large cross-tab reports are known for having performance limitations, which become especially evident when multiple aggregations and sections are applied. This is because the report SQL is first passed to the underlying DB, and the returned data is then aggregated by Yellowfin (which can be process-intensive). Traditional column-based reports simply return data that has already been aggregated by the underlying DB.

The below forum post outlines this process:

What are cross tab report limits?

If you convert your cross-tab report into a traditional column-based report, you should see fairly similar processing times between your DB and Yellowfin. If this isn't the case, then we should probably take a deeper look.

As a general rule of thumb, we like to recommend customers use filters to limit the data returned in a cross-tab report to help speed up the report rendering process. Obviously this isn't always appropriate depending on how much data you are trying to see, but it is worth mentioning.

Lastly, I've gone ahead and attached our Yellowfin performance guide for your review.

Hopefully this information provides some more clarity on the performance issues you are experiencing.

Please let us know if you need any clarification on any of this information. Have a great day/weekend!

Kind Regards,

Dustin

What about performance for large data sets that are NOT cross-tabbed? The same 500K rows return in less than a second from my DB cluster, but Yellowfin cannot display these results (with no post-aggregation) in less than 30 seconds. Essentially, the same SQL statement takes my SQL*Plus or TOAD client under 1 second, but Yellowfin takes 30.

I'm not sure of Yellowfin's design paradigm from a development standpoint, but I would assume that report results are best generated by pushing that work to the data source DB rather than doing it within Yellowfin's engine and the Apache environment. I can issue a SQL PIVOT command and get the same results that Yellowfin would return, much faster.
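
To make the two approaches discussed in this thread concrete, here is a minimal sketch in Python. It is not Yellowfin's implementation. An in-memory SQLite database stands in for Oracle, the `events` table and its `day` and `status` columns are hypothetical, and conditional aggregation stands in for Oracle's PIVOT clause. The point is only that both paths give the same cross-tab, while the number of rows leaving the database differs.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create a tiny in-memory table (hypothetical schema, SQLite stands in for Oracle)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (day TEXT, status TEXT)")
    rows = [
        ("2015-12-01", "ok"), ("2015-12-01", "ok"), ("2015-12-01", "error"),
        ("2015-12-02", "ok"), ("2015-12-02", "error"), ("2015-12-02", "error"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def pivot_in_client(conn):
    """Fetch every detail row, then count per (day, status) in the application."""
    counts = defaultdict(lambda: {"ok": 0, "error": 0})
    fetched = 0
    for day, status in conn.execute("SELECT day, status FROM events"):
        counts[day][status] += 1
        fetched += 1
    return dict(counts), fetched


def pivot_in_db(conn):
    """Let the database aggregate; only one row per day is returned."""
    sql = """
        SELECT day,
               SUM(CASE WHEN status = 'ok' THEN 1 ELSE 0 END) AS ok,
               SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) AS error
        FROM events
        GROUP BY day
        ORDER BY day
    """
    result = conn.execute(sql).fetchall()
    pivot = {day: {"ok": ok, "error": err} for day, ok, err in result}
    return pivot, len(result)
```

With roughly 500,000 detail rows over 30 days, the first function would transfer every row before counting, while the second would transfer about 30 rows.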

Hi Justin,

30 seconds versus 1 second is certainly quite a difference! There could be many different reasons for it, so to narrow down the possibilities it would be great to see some thread snapshots taken during one of those 30-second periods while you're waiting on Yellowfin to execute a report. That way we will hopefully be able to see which thread is holding things up and what it is doing.

Thread snapshots are very easy to take in Yellowfin. All you have to do is open the following page in another browser tab while Yellowfin is hanging and save the results:

http://:/info_threads.jsp

If you could save a snapshot every 5 seconds during the 30 second interval then that would be most helpful. Please email the results to us and be sure to reference the subject of this forum post.
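
If saving the page by hand every 5 seconds is awkward, the capture could be scripted. The following is only a rough sketch, not a Yellowfin tool: the function names and output file names are made up for illustration, and it assumes the info_threads.jsp page can be fetched with a plain HTTP request. If your instance requires a logged-in session for that page, saving from the browser as described above is the safer route.

```python
import time
import urllib.request
from pathlib import Path


def fetch_page(url, timeout=10):
    """Download a page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def capture_snapshots(url, out_dir, interval=5, duration=30,
                      fetch=fetch_page, sleep=time.sleep):
    """Save one snapshot immediately, then one after each interval, up to `duration` seconds.

    `fetch` and `sleep` are injectable so the loop can be exercised without a server.
    Timing is approximate because the download time is not subtracted from the interval.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    count = duration // interval + 1  # e.g. 0s, 5s, ..., 30s -> 7 snapshots
    for i in range(count):
        if i:
            sleep(interval)
        path = out / f"threads_{i * interval:03d}s.html"
        path.write_bytes(fetch(url))
        saved.append(path)
    return saved


# Example call (fill in your own server address in place of THREADS_URL):
# capture_snapshots(THREADS_URL, "thread_snapshots")
```

Start the script right after kicking off the slow report, then attach the saved files to your email.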

Also, it would be useful if you could run this other information page (just the once will do nicely) and send those results as well.

http://:/info.jsp

regards,
Dave
