I work with panel data which contain several companies (id
) and cover the period from 1.1.2008 to 1.1.2013 (date
, year
). I want to generate new variable (sum1
) which contains a sum of daily observation for var1
for each company and specific time interval. If the interval was equal to each year I would use the function total()
:
bysort id year: egen sum1=total(var1)
In my case however, the time interval is determined as an interval between two events. I have a special variable called event
, which takes value of 1 if the event has occurred on special date and missing otherwise. There are 5 to 10 events for each company. The intervals between events are not equal; hence the first interval can contain 60 observations, the next interval 360 observations. The intervals are also not equal for different companies. The starting date for the first interval for each company is 1.1.2008. The starting date for the second interval is the date of the first event + 1 day. Besides I would like to account for missing values, so if all values of var1
for the company x are missing variable, sum1
for company x and specific interval must contain missing values and not 0.
My panel looks like this:
id date year var1 event sum1(to gen) event_id(to gen)
1 1.1.2008 2008 25 . 95 (25+30+40) 1
1 2.1.2008 2008 30 . 95 (25+30+40) 1
...........................................................1
1 31.4.2008 2008 40 1 95 (25+30+40) 1
1 1.5.2008 2008 50 . 160 (50+50+60) 2
1 2.5.2008 2008 50 . 160 (50+50+60) 2
......................................... ................2
1 31.4.2009 2009 60 1 160 (50+50+60) 2
2 1.1.2008 2008 26 . 96 (26+30+40) 1
2 2.1.2008 2008 30 . 96 (26+30+40) 1
...........................................................1
2 31.6.2008 2008 40 1 96 (26+30+40) 1
2 1.5.2008 2008 51 . 161 (51+50+60) 2
2 2.5.2008 2008 50 . 161 (51+50+60) 2
...........................................................2
2 31.6.2009 2009 60 1 161 (51+50+60) 2
I tried to write different loops (while
, if
), but I failed to do it correctly. I cannot use rolling
as my intervals are not the same.
My other idea was to create the group identifier first (called event_id
), which contains the event_id
for each interval and each company. Then I could use bysort id event_id: egen sum1=total(var1)
, but unfortunately I do not have any idea how to do that. So, the variables event_id
and sum1
in my panel do not exist and serve as an example for output I want to achieve.
I can make sense of the example with the following changes:
That said, one trick of reversing time makes subdivision into spells easy. Given markers 1 for the ends of each spell, we can then cumulate backwards using sum()
. The crucial small detail here is that sum()
ignores missing values, or more precisely treats them as zero. Here that is entirely a feature, although not quite what the OP wants when applying egen, total()
.
Then reverse spell numbering, reverse time to the normal direction and apply egen
as in other answers. Reversing and reversing back are both just negation using -
. Sorting on date within panel is just cosmetic once we have a division into spells, but still the right thing to do.
For more on spells in Stata, see here
For hints from Statalist on how to provide data examples using dataex
(SSC), which apply here too with minor modification, see here
clear *
input id str10 date year var1 event DesiredSum
1 1.1.2008 2008 25 . 95
1 2.1.2008 2008 30 . 95
1 30.4.2008 2008 40 1 95
1 1.5.2008 2008 50 . 160
1 2.5.2008 2008 50 . 160
1 30.4.2009 2009 60 1 160
2 1.1.2008 2008 26 . 96
2 2.1.2008 2008 30 . 96
2 30.4.2008 2008 40 1 96
2 1.5.2008 2008 51 . 161
2 2.5.2008 2008 50 . 161
2 30.6.2009 2009 60 1 161
end
gen ddate = -daily(date, "DMY")
bysort id (ddate): gen EVENT = sum(event)
replace ddate = -ddate
by id: replace EVENT = EVENT[_N] - EVENT + 1
bysort id EVENT (ddate): egen Sum = total(var1), missing
assert Sum == DesiredSum
list, sepby(id EVENT)
+-----------------------------------------------------------------------+
| id date year var1 event Desire~m ddate EVENT Sum |
|-----------------------------------------------------------------------|
1. | 1 1.1.2008 2008 25 . 95 17532 1 95 |
2. | 1 2.1.2008 2008 30 . 95 17533 1 95 |
3. | 1 30.4.2008 2008 40 1 95 17652 1 95 |
|-----------------------------------------------------------------------|
4. | 1 1.5.2008 2008 50 . 160 17653 2 160 |
5. | 1 2.5.2008 2008 50 . 160 17654 2 160 |
6. | 1 30.4.2009 2009 60 1 160 18017 2 160 |
|-----------------------------------------------------------------------|
7. | 2 1.1.2008 2008 26 . 96 17532 1 96 |
8. | 2 2.1.2008 2008 30 . 96 17533 1 96 |
9. | 2 30.4.2008 2008 40 1 96 17652 1 96 |
|-----------------------------------------------------------------------|
10. | 2 1.5.2008 2008 51 . 161 17653 2 161 |
11. | 2 2.5.2008 2008 50 . 161 17654 2 161 |
12. | 2 30.6.2009 2009 60 1 161 18078 2 161 |
+-----------------------------------------------------------------------+