Remove rows from dataframe that contains only 0 or just a single 0

Question 1

Remove rows from dataframe that contains only 0 or just a single 0

r filtering bioinformatics data-processing

KnightofniDK · Aug 8, 2014 · Viewed 21.9k times · Source

Answer

Answer

Using data.table (assuming df is your data set)

library(data.table)
setDT(df)[, .SD[!all(.SD[, -1, with = F] == 0)], by = GeneName]

#    GeneName ID DU145small DU145total  PC3small  PC3total
# 1:  MIR22HG  1  33221.500   1224.550  2156.430   573.315
# 2: MIRLET7E  2  87566.100   7737.990 25039.300 16415.600
# 3:   MIR612  3      0.000      0.000   530.068     0.000
# 4: MIR218-1  4      0.000      0.000  1166.880   701.253
# 5: MIR181B2  5  70723.200   3958.010  6209.850  1399.340
# 6:   MIR10B  7    787.516    330.556     0.000 20336.400

Or if you only want to remove rows with any zeroes

setDT(df)[, .SD[!any(.SD[, -1, with = F] == 0)], by = GeneName]

#    GeneName ID DU145small DU145total PC3small  PC3total
# 1:  MIR22HG  1    33221.5    1224.55  2156.43   573.315
# 2: MIRLET7E  2    87566.1    7737.99 25039.30 16415.600
# 3: MIR181B2  5    70723.2    3958.01  6209.85  1399.340

Question 2

I am trying to create a function in R that will allow me to filter my data set based on whether a row contains a single column with a zero in it. Furthermore, some times I only want to remove rows that is zero in all columns.

Also, and this is where it gets fun; not all columns contains numbers and the number of columns can vary.

I have tried to paste some of my data here with the results I want to obtain.

unfiltered:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    3   MIR612      0           0           530.068     0
    4   MIR218-1    0           0           1166.88     701.253
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34
    6   MIR218-2    0           0           0           0
    7   MIR10B      787.516     330.556     0           20336.4
    8   MIR3176     0           0           0           0

any rows with containing a zero removed:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34

only rows that is all zero is filtered away:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    3   MIR612      0           0           530.068     0
    4   MIR218-1    0           0           1166.88     701.253
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34
    7   MIR10B      787.516     330.556     0           20336.4

I did find a way of removing any rows that had at least 1 zero in it, but it was "cheating" by exchanging all zeros with NA and then using complete.cases to filter.

Also, by doing that it remove all rows where the GeneName had a zero in it (as for MIR10B).

I can solve it by using for loops, but I have been told that loops in R is very ineffective and would therefore like to avoid that solution.

EDIT: While Xin Yin's solution works perfectly well and kept the data in a data frame, the answer by David Arenburg is supposedly more efficient and should be used.

Remove rows from dataframe that contains only 0 or just a single 0

Answer

Related questions