Decile ranking performance in Netezza

N West picture N West · Sep 7, 2012 · Viewed 7.3k times · Source

I have a small(ish) aggregated data set in Netezza, about 10m rows, on a TwinFin 6.

To simplify the question a bit, I've cut down the number of columns:

CUSTOMER_SALES_AGG

CUSTOMER_ID
NUMBER_TRANS
TOTAL_DOLLARS
TOTAL_ITEMS

This table is distributed on CUSTOMER_ID, with 1 row per customer ID, collecting the total transactions the customer has made, the total dollars they've spent, and the # of items that they've purchased.

I'm attempting to calculate the decile ranking of each customer across all customers, by # transactions, total $ spent, and total items bought. E.G. if a customer spent >= 90% of other customers, they would rank in the 1st decile.

I've built a query:

SELECT
    CUSTOMER_ID, 
    NUMBER_TRANS,
    NTILE(10) OVER(ORDER BY NUMBER_TRANS DESC NULLS LAST) as TRANS_DECILE,
    TOTAL_DOLLARS,
    NTILE(10) OVER(ORDER BY TOTAL_DOLLARS DESC NULLS LAST) as DOLLARS_DECILE,
    TOTAL_ITEMS,
    NTILE(10) OVER(ORDER BY TOTAL_ITEMS DESC NULLS LAST) as ITEMS_DECILE
FROM CUSTOMER_SALES_AGG;

This works, but it's very slow, taking nearly 10-20 minutes to run.

Since doing a decile computation requires sorting the data and then dividing that sorted data into groups, it seems like the MPP structure of Netezza would handle this very well. If I was partitioning the deciles I could redistribute and do the ranking on each SPU, it could be even faster.

Any ideas on how to speed this up?

Answer

N West picture N West · Sep 7, 2012

It appears that the main problem stems from the use of multiple analytic functions (NTILE) in the same SQL statement (my actual statement ranks the customers in 7 different ways).

From what I can tell, Netezza does, as @GordonLinoff explains in the comments, a quicksort on each processor and a final quicksort on the controller system (the Netezza Host). However, it only does this once, and then, as he guessed - pushes it all to the controller system.

It proceeds to quicksort the data on the controller system for the remaining analytic functions, not using parallelism at all. I would expect that instead it ought to sort the data each way, on each processor, do the final sort on the host, then push the data back down to the processors for a final hash join of each column.

I ended up creating a query something like this.

WITH 
NT AS (
  select customer_id, 
         number_trans,
         ntile(10) over (order by number_trans) as trans_decile
),
TD AS (
  select customer_id, 
         total_dollars,
         ntile(10) over (order by total_dollars) as dollars_decile
),
NI AS (
  select customer_id, 
         total_items,
         ntile(10) over (order by total_items) as items_decile
)
SELECT
    NT.CUSTOMER_ID, NT.NUMBER_TRANS, NT.TRANS_DECILE,
    TD.TOTAL_DOLLARS, TD.DOLLARS_DECILE,
    NI.TOTAL_ITEMS, NI.ITEMS_DECILE
FROM NT
JOIN TD ON (NT.CUSTOMER_ID = TD.CUSTOMER_ID)
JOIN NI ON (NT.CUSTOMER_ID = NI.CUSTOMER_ID);

This query's plan is much more complex, but for my case where i was doing 7 analytic rankings, it took query time down from 12 minutes to a bit less than 5 minutes.