How do I reference the row number of an observation? For example, if you have a data.frame
called "data" and want to create a variable data$rownumber
equal to each observation's row number, how would you do it without using a loop?
These are present by default as row names when you create a data.frame.
R> df = data.frame('a' = rnorm(10), 'b' = runif(10), 'c' = letters[1:10])
R> df
            a          b c
1   0.3336944 0.39746731 a
2  -0.2334404 0.12242856 b
3   1.4886706 0.07984085 c
4  -1.4853724 0.83163342 d
5   0.7291344 0.10981827 e
6   0.1786753 0.47401690 f
7  -0.9173701 0.73992239 g
8   0.7805941 0.91925413 h
9   0.2469860 0.87979229 i
10  1.2810961 0.53289335 j
You can access them with the rownames function:
R> rownames(df)
[1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
Note that they are character strings. If you need them as numbers, coerce them with as.numeric, as in as.numeric(rownames(df)).
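A minimal sketch of that behaviour, using a small made-up data frame instead of the random one above so the results are reproducible. The names rn and rn_num are only illustrative. It also shows what happens to as.numeric once the row names are no longer numeric:

```r
# Small deterministic data frame (made-up example data, no randomness)
df <- data.frame(a = c(2.5, -1, 0.3), c = c("x", "y", "z"))

rn <- rownames(df)
print(class(rn))        # "character": row names are stored as strings
rn_num <- as.numeric(rn)
print(rn_num)           # 1 2 3

# Custom row names that are not numbers coerce to NA (with a warning)
rownames(df) <- c("first", "second", "third")
suppressWarnings(print(as.numeric(rownames(df))))  # NA NA NA
```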
You don't have to add them, though: if you know what you are looking for (say, the item where df$c == 'i'), you can use the which function:
R> which(df$c =='i')
[1] 9
Or, if you don't know which column it is in:
R> which(df == 'i', arr.ind = TRUE)
     row col
[1,]   9   3
You can then access the element with df[9, 'c'] or df$c[9].
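A small sketch of the which approach on made-up data where the target value appears in more than one row. which then returns every matching position, and those positions can be used directly as row indices. The variable names rows and hits are illustrative only:

```r
df <- data.frame(a = c(0.5, -1.2, 3.1, 0.7),
                 c = c("g", "h", "i", "i"))

# which() returns the positions (row numbers) where the condition is TRUE
rows <- which(df$c == "i")
print(rows)             # 3 4

# Use those positions to pull values out of another column
print(df$a[rows])       # 3.1 0.7
print(df[rows, "a"])    # same values via [row, column] indexing

# Search every column at once: arr.ind = TRUE returns (row, col) pairs
hits <- which(df == "i", arr.ind = TRUE)
print(hits)
```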
If you do want to add them as a column, you could use df$rownumber <- as.numeric(rownames(df)), though this is less robust than df$rownumber <- 1:nrow(df). If the row names have been reassigned, or carried over from subsetting (df[df$a > 0, ] keeps the original row names), they will no longer match the row positions. The which function, by contrast, always returns positions, even if you have assigned to rownames.
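To illustrate how the row-name approach can drift, here is a sketch on a hypothetical data frame that is subset before the row number column is added. It uses seq_len(nrow(pos)) rather than 1:nrow(pos); the two agree for any non-empty data frame, but seq_len returns an empty vector when there are zero rows, whereas 1:0 gives c(1, 0). The names pos and from_rownames are made up for the example:

```r
df <- data.frame(a = c(5, -2, 8, -1, 4), b = letters[1:5])

# Keep only the positive rows; subsetting keeps the ORIGINAL row names
pos <- df[df$a > 0, ]
print(rownames(pos))                             # "1" "3" "5"

# Approach 1: derived from row names, so it reflects the original positions
pos$from_rownames <- as.numeric(rownames(pos))   # 1 3 5

# Approach 2: derived from the current position, so it is always 1..n
pos$rownumber <- seq_len(nrow(pos))              # 1 2 3

print(pos)

# which() also reports current positions, not row names
print(which(pos$a == 8))                         # 2
```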