My df looks like this:
Id Task Type Freq
3 1 A 2
3 1 B 3
3 2 A 3
3 2 B 0
4 1 A 3
4 1 B 3
4 2 A 1
4 2 B 3
I want to restructure by Id and get:
Id A B … Z
3 5 3
4 4 6
I tried:
df_wide <- dcast(df, Id + Task ~ Type, value.var="Freq")
and got the error:
Aggregation function missing: defaulting to length
I can't figure out what to put in fun.aggregate. What's the problem?
The reason why you are getting this warning is in the description of fun.aggregate (see ?dcast):
aggregation function needed if variables do not identify a single observation for each output cell. Defaults to length (with a message) if needed but not specified
So, an aggregation function is needed when there is more than one value for one spot in the wide dataframe.
An explanation based on your data:
When you use dcast(df, Id + Task ~ Type, value.var="Freq") you get:
Id Task A B
1 3 1 2 3
2 3 2 3 0
3 4 1 3 3
4 4 2 1 3
Which is logical because for each combination of Id, Task and Type there is only one value in Freq. But when you use dcast(df, Id ~ Type, value.var="Freq") you get this (including a warning message):
Aggregation function missing: defaulting to length
Id A B
1 3 2 2
2 4 2 2
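The 2s in the output above are row counts per Id/Type cell. A minimal base R sketch to check this before reshaping, assuming the data frame is rebuilt from the question's table (the names cell_counts and needs_aggregation are illustrative only):

```r
# Data copied from the question.
df <- data.frame(
  Id   = c(3, 3, 3, 3, 4, 4, 4, 4),
  Task = c(1, 1, 2, 2, 1, 1, 2, 2),
  Type = c("A", "B", "A", "B", "A", "B", "A", "B"),
  Freq = c(2, 3, 3, 0, 3, 3, 1, 3)
)

# Count how many rows fall into each Id/Type cell of the wide layout.
cell_counts <- table(Id = df$Id, Type = df$Type)
print(cell_counts)

# Any count above 1 means dcast has to aggregate the values for that cell.
needs_aggregation <- any(cell_counts > 1)
print(needs_aggregation)
```

If every count were 1, the formula would identify a single observation per cell and no aggregation function would be needed.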
Now, looking back at the top part of your data:
Id Task Type Freq
3 1 A 2
3 1 B 3
3 2 A 3
3 2 B 0
You see why this is the case. For each combination of Id and Type there are two values in Freq (for Id 3: 2 and 3 for Type A, and 3 and 0 for Type B), while the wide dataframe has room for only one value per combination of Id and Type. Therefore dcast wants to aggregate these values into one value. The default aggregation function is length, but you can use other aggregation functions like sum, mean, sd or a custom function by specifying them with fun.aggregate.
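As a rough sketch of swapping in other aggregators, assuming dcast comes from the reshape2 package (data.table also provides a dcast) and that the data frame is rebuilt from the question's table. The helper count_nonzero is a made-up example of a custom function, not something from the question:

```r
library(reshape2)  # assumption: dcast is the reshape2 version

# Data copied from the question.
df <- data.frame(
  Id   = c(3, 3, 3, 3, 4, 4, 4, 4),
  Task = c(1, 1, 2, 2, 1, 1, 2, 2),
  Type = c("A", "B", "A", "B", "A", "B", "A", "B"),
  Freq = c(2, 3, 3, 0, 3, 3, 1, 3)
)

# Built-in aggregator: mean Freq per Id/Type cell.
wide_mean <- dcast(df, Id ~ Type, value.var = "Freq", fun.aggregate = mean)
print(wide_mean)

# Custom aggregator (hypothetical helper): how many tasks have a nonzero Freq.
count_nonzero <- function(x) sum(x > 0)
wide_nonzero <- dcast(df, Id ~ Type, value.var = "Freq",
                      fun.aggregate = count_nonzero)
print(wide_nonzero)
```

Whatever function is passed, it should reduce a vector of Freq values to a single value, since each cell of the wide dataframe holds only one.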
For example, with fun.aggregate = sum you get:
Id A B
1 3 5 3
2 4 4 6
Now there is no warning because dcast is being told what to do when there is more than one value: return the sum of the values.
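Put together, a call along these lines should give the layout asked for in the question. This is a sketch assuming dcast from reshape2 and the data rebuilt from the question's table; the xtabs line is only a base R cross-check added here, not part of the original approach:

```r
library(reshape2)  # assumption: dcast is the reshape2 version

# Data copied from the question.
df <- data.frame(
  Id   = c(3, 3, 3, 3, 4, 4, 4, 4),
  Task = c(1, 1, 2, 2, 1, 1, 2, 2),
  Type = c("A", "B", "A", "B", "A", "B", "A", "B"),
  Freq = c(2, 3, 3, 0, 3, 3, 1, 3)
)

# Drop Task from the formula and sum Freq within each Id/Type cell.
df_wide <- dcast(df, Id ~ Type, value.var = "Freq", fun.aggregate = sum)
print(df_wide)

# Base R cross-check: xtabs also sums Freq per Id/Type combination.
check <- xtabs(Freq ~ Id + Type, data = df)
print(check)
```

Both results should agree with the table shown above: 5 and 3 for Id 3, 4 and 6 for Id 4.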