NA values using sqldf

Roman Luštrik picture Roman Luštrik · Jan 14, 2012 · Viewed 13.5k times · Source

If I try to get an average of c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10) using AVG from SQL, I get a value of 5.2, instead of the expected 6.5.

# prepare data and write to file
write.table(data.frame(col1 = c(NA, NA, 3:10)),
        "my.na.txt", row.names = FALSE)

mean(c(NA, NA, 3:10), na.rm = TRUE) # 6.5

my.na <- read.csv.sql("my.na.txt", sep = " ",
        sql = "SELECT AVG(col1) FROM file") # 5.2

# this is identical to
sum(3:10)/10

unlink("my.na.txt") # remove file

Which leads me to believe that sql(df) treats NA values as zero. Is it possible to ignore (exclude) NA values in an SQL call as it can be done using na.rm argument (in R)?

Answer

mathematical.coffee picture mathematical.coffee · Jan 14, 2012

Modify your query to ignore the NA values:

SELECT AVG(col1)
FROM file
WHERE col1 IS NOT \"NA\"