I have some fairly simple SQL that should provide 1 row per quarter per asset1. Instead, I get multiple rows per group by.
Below is the SQL, a SAS data step, and some of the output data. The number of duplicate rows (in the below data, 227708) is equal to Num_borrowers, which is the asset1 number of rows.
proc sql outobs=max;
create table table1 as
select
case
when period_dt ='01DEC2003'd then '2003Q4'
when period_dt ='01DEC2004'd then '2004Q4'
when period_dt ='01DEC2005'd then '2005Q4'
when period_dt ='01DEC2006'd then '2006Q4'
when period_dt ='01DEC2007'd then '2007Q4'
when period_dt ='01DEC2008'd then '2008Q4'
when period_dt ='01DEC2009'd then '2009Q4'
when period_dt ='01DEC2010'd then '2010Q4'
when period_dt ='01DEC2011'd then '2011Q4'
when period_dt ='01DEC2012'd then '2012Q4'
when period_dt ='01DEC2013'd then '2013Q4'
when period_dt ='01JUN2014'd then '2014Q2'
end as QTR,
case
when MM_ASSET in ('C&I', 'Foreign', 'Leasing','Scored-WF','Scored-WB') THEN 'C&I'
when MM_ASSET='Construction' THEN 'Construction RE'
when MM_ASSET='Mortgage-IP' THEN 'Income Producing RE'
when MM_ASSET='Mortgage-OO' THEN 'Owner Occupied RE'
when MM_ASSET='Mortgage-SF' THEN 'Mortgage-SF'
when MM_ASSET='Unknown' THEN 'Other'
end as asset1,
count (period_dt) as Num_Borrowers,
exposure,
co_itd,
MM_NINEQTR_LOSS,
MM_LIFE_LOSS
from td_prod.OBLIGOR_COMBINED
where period_dt in ('01DEC2003'd,'01DEC2004'd,'01DEC2005'd,'01DEC2006'd,'01DEC2007'd,'01DEC2008'd, '01DEC2009'd,'01DEC2010'd,'01DEC2011'd,'01DEC2012'd,'01DEC2013'd,'01JUN2014'd)
and mm_asset in ('C&I','Foreign','Leasing','Construction','Mortgage-IP','Scored-WF','Scored-WB'
'Mortgage-OO','Mortgage-SF','Unknown')
group by 1,2
order by 1,2;
quit;
data table2; set table1;
Total_Exposure = exposure/1000000;
if total_exposure = 0 then total_exposure=.;
Total_Charge_Offs =co_itd/1000000;
Total_9Q_Losses = MM_NINEQTR_LOSS/1000000;
Total_Life_Losses = MM_LIFE_LOSS/1000000;
avg_borrower_exp = total_exposure/num_borrowers;
co_rate = total_charge_offs/total_exposure;
life_lossR = Total_life_losses/total_exposure;
nineQtr_lossR = total_9q_losses/total_exposure;
run;
*** sample of output data set ***;
qtr asset1 num_borrowers
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
2003Q4 C&I 227708
Realized my comment above is more like an answer.
In SAS SQL, in a query with a group by clause that includes extraneous columns on the select statement (i.e. columns not part of the group by and not derived from an aggregating function), SAS "remerges" the summary statistics back to the original data (with a note to that effect). Most SQLs would just throw an error. Below is an example:
data have;
input gender $ age score;
cards;
M 10 100
M 20 200
F 30 300
F 40 400
;
run;
proc sql;
select gender, mean(age) as AvgAge, SCore
from have
group by gender
;
quit;
returns:
gender AvgAge score
F 35 300
F 35 400
M 15 100
M 15 200
In your code, exposure, co_itd, MM_NINEQTR_LOSS, and MM_LIFE_LOSS are all extraneous columns, causing SAS to remerge.
Whenever remerging occures you will see the following message in SAS log:
NOTE: The query requires remerging summary statistics back with the original data.
See remerging data section in SAS documentation on summary-function for more details