How to get the mode of a group in summarize in R

drew · May 22, 2015

I want to compare costs of CPT codes from two different claims payers. Both have par and non-par priced providers. I am using dplyr and modeest::mlv, but it's not working out as anticipated. Here's some sample data:

source CPTCode ParNonPar Key         net_paid  PaidFreq seq
ABC   100       Y      ABC100Y  -341.00     6   1
ABC   100       Y      ABC100Y     0.00     2   2
ABC   100       Y      ABC100Y   341.00     6   3
XYZ   103       Y      XYZ103Y   740.28     1   1
XYZ   104       N      XYZ104N     0.00     2   1
XYZ   104       N      XYZ104N   401.82     1   2
XYZ   104       N      XYZ104N   726.18     1   3
XYZ   104       N      XYZ104N   893.00     1   4
XYZ   104       N      XYZ104N   928.20     2   5
XYZ   104       N      XYZ104N   940.00     2   6

and the code

str(data)
View(data)

## Expand frequency count to individual observations
n.times <- data$PaidFreq
dataObs <- data[rep(seq_len(nrow(data)), n.times),]

## Calculate mean for each CPTCode (for mode use modeest library)
library(dplyr)
library(modeest)
dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid),
            median=median(net_paid),
            mode = mlv(net_paid, method=mfv),
            total = sum(net_paid))
str(dataSummary)                     

I thought I could call modeest's mlv inside summarise alongside the mean and median, but this formulation errors out with: Error in as.character(x) : cannot coerce type 'closure' to vector of type 'character'. Without mlv I get a data frame like the one below, but what I want is all the stats for a payer and CPT code on one line. I plan to graph the result as boxplots, limiting the x and y ranges, once I have what I need in each row.

The inadequate result is this (I forgot to include the payer name here!):

ParNonPar   CPTCode mean          median(net_paid)  total
N           0513F   0.000000    0.000           0.00
N           0518F   0.000000    0.000           0.00 
N           10022   0.000000    0.000           0.00
N           10060   73.660000   90.120        294.64
N           10061   324.575000  340.500      1298.30
N           10081   312.000000  312.000       312.00

Thanks very much for your time and effort.

Answer

Ram Narasimhan · May 22, 2015

You need to make a couple of changes to your code for mlv to work.

  1. The method name has to be within quotes ('mfv'). Unquoted, mfv is passed as a function rather than a string, and that is what is causing your error.
  2. After you do that, since mlv returns a list, you have to feed one value to summarise(). Assuming that you want the mode ('M'), you pick that element from the list.
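
The error in point 1 can be reproduced without modeest. This is a minimal sketch, assuming (as the error message suggests) that mlv converts its method argument with as.character() internally; `mean` is used here only as a stand-in for any unquoted function name such as mfv.

```r
# as.character() cannot convert a function (a "closure") to a string,
# which matches the error message in the question.
msg <- tryCatch(
  as.character(mean),                      # stand-in for the unquoted `mfv`
  error = function(e) conditionMessage(e)  # capture the message instead of stopping
)
print(msg)

# A quoted name is already a character string, so the conversion succeeds.
quoted <- as.character("mfv")
print(quoted)
```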

Try:

dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid), 
            median=median(net_paid), 
            mode = mlv(net_paid, method='mfv')[['M']], 
            total = sum(net_paid))

to get:

> dataSummary
Source: local data frame [3 x 6]
Groups: ParNonPar

  ParNonPar CPTCode     mean median     mode   total
1         N     104 639.7111 893.00 622.7333 5757.40
2         Y     100   0.0000   0.00   0.0000    0.00
3         Y     103 740.2800 740.28 740.2800  740.28
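
If you would rather not depend on modeest, or if your installed version of mlv does not return a list with an 'M' element (worth checking with str() on its result), a small base R helper can stand in. The sketch below is an assumption-laden illustration: most_frequent is a hypothetical name, not part of modeest or dplyr, and it averages tied values, which appears to be how the mode column above was produced.

```r
# Hypothetical helper (not from modeest): most frequent value of x,
# averaging the candidates when several values tie for the top count.
most_frequent <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))   # occurrences of each unique value
  mean(ux[counts == max(counts)])
}

# The CPT 104 rows from the question, expanded by PaidFreq
net_paid  <- c(0, 401.82, 726.18, 893, 928.20, 940)
paid_freq <- c(2, 1, 1, 1, 2, 2)
obs <- rep(net_paid, times = paid_freq)

mode_104 <- most_frequent(obs)
print(mode_104)   # 622.7333: 0, 928.20 and 940 each occur twice, and their mean is returned
```

Inside summarise() the helper would be used as mode = most_frequent(net_paid) in place of the mlv call. Adding source to group_by() should also put the payer name on each row.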

Hope that helps you move forward.