Is there a way to select the top N rows from each group
13 June, 2014
Hello!
I would like to make a report that selects the top N rows from each group. There is a blog post explaining how it is done in SQL here, but it uses user variables and I haven't had any luck creating a view in YellowFin that uses variables (probably with good reason).
My problem is also slightly more complex because I am not working with a single table like the example, and I want to be able to filter the underlying table before the summary is created... and I want to do it on a sum figure rather than a flat figure.
Let's say I've got three tables, using the fruit analogy from the blog.
Table 1 is STORE
Table 2 is FRUIT
Table 3 is SALES
Every SALE has a foreign key that references the STORE and another foreign key that references the FRUIT.
There are thousands and thousands of different types of fruit sold at each store (this is where the analogy begins to fall apart).
The report I'm trying to make would show something like this
[code]+-------------+--------------+---------------+
| Store | Fruit | Sale (sum) |
+-------------+--------------+---------------+
| Sydney | Watermelon | $3051.00 |
| | Apple | $2197.54 |
| | Banana | $998.12 |
| Melbourne | Banana | $6019.55 |
| | Apple | $5586.01 |
| | Peach | $4443.00 |
| Perth .... and so on[/code]
Is there any way I can achieve this in YellowFin?
I would like to make a report that selects the top N rows from each group. There is a blog post explaining how it is done in SQL here, but it uses user variables and I haven't had any luck creating a view in YellowFin that uses variables (probably with good reason).
My problem is also slightly more complex because I am not working with a single table like the example, and I want to be able to filter the underlying table before the summary is created... and I want to do it on a sum figure rather than a flat figure.
Let's say I've got three tables, using the fruit analogy from the blog.
Table 1 is STORE
Table 2 is FRUIT
Table 3 is SALES
Every SALE has a foreign key that references the STORE and another foreign key that references the FRUIT.
There are thousands and thousands of different types of fruit sold at each store (this is where the analogy begins to fall apart).
The report I'm trying to make would show something like this
[code]+-------------+--------------+---------------+
| Store | Fruit | Sale (sum) |
+-------------+--------------+---------------+
| Sydney | Watermelon | $3051.00 |
| | Apple | $2197.54 |
| | Banana | $998.12 |
| Melbourne | Banana | $6019.55 |
| | Apple | $5586.01 |
| | Peach | $4443.00 |
| Perth .... and so on[/code]
Is there any way I can achieve this in YellowFin?
Hi Max,
unless I have misunderstood something, Yellow actually makes this complex task very easy to accomplish. I will give an example using our Ski Team Demo Database:
Here is a basic report showing the total invoiced amount per Demographic per Region:
now, if I want to show only the top 3 Demographic groups per Region I've just got to add Region as a Section and then of course add the Top N Advanced Function to the Invoiced Amount column:
then the final output will look like this:
which I think fulfills your requirement. By the way, I forgot to mention that in the Region column I configured Suppress Duplicates = On so the repeated Region names are not shown. Also, if I wanted I could have chosen to Hide the Rank column so you only see the Invoiced Amount.
If I have misunderstood your requirement then please let me know.
regards,
Dave
unless I have misunderstood something, Yellow actually makes this complex task very easy to accomplish. I will give an example using our Ski Team Demo Database:
Here is a basic report showing the total invoiced amount per Demographic per Region:
now, if I want to show only the top 3 Demographic groups per Region I've just got to add Region as a Section and then of course add the Top N Advanced Function to the Invoiced Amount column:
then the final output will look like this:
which I think fulfills your requirement. By the way, I forgot to mention that in the Region column I configured Suppress Duplicates = On so the repeated Region names are not shown. Also, if I wanted I could have chosen to Hide the Rank column so you only see the Invoiced Amount.
If I have misunderstood your requirement then please let me know.
regards,
Dave
So simple! Obviously I should RTFM, thank you so much =)
nahh....then we'd be out of a job ;-)