Sample of data within groups - Teradata

Donna Panzarotti picture Donna Panzarotti · Jun 8, 2012 · Viewed 13k times · Source

Found a similar question but without an answer that worked succuessfully.

I need to select a sample of 50 of each status type within a single table.

TABLE1

MEMBER  STATUS
1234       A
1324       A
3424       R
3432       S
3232       R
2783       A
2413       S
4144       R
2387       S

I tried:

SEL Member, status FROM TABLE1 Qualify Row_Number ( ) OVER (PARTITION BY status ORDER BY random (1,10000)) <=50

As suggested in the previous question/answer but Teradata does not like RANDOM in an Aggregate or Ordered Analytical Function.

Answer

mechanical_meat picture mechanical_meat · Jun 8, 2012

Since, as you say in the comments, you do not need to use random you can do this:

SEL     Member, status 
FROM    TABLE1 
QUALIFY ROW_NUMBER() OVER (
            PARTITION BY status 
            ORDER BY NULL) <= 50;

Edit: based on a question in the comments related to a spool-space error, an approach we can try when the table is large is to create a temporary table with a portion of the records from the original table.

CREATE MULTISET VOLATILE TABLE tmp_tbl AS (
    SEL     Member, status
    FROM    TABLE1
    WHERE   somefield = 'somecriterion'
) WITH DATA 
ON COMMIT PRESERVE ROWS;

And then try again:

SEL     Member, status 
FROM    tmp_tbl /* now from temporary table */
QUALIFY ROW_NUMBER() OVER (
            PARTITION BY status 
            ORDER BY NULL) <= 50;