Getting the first letter from each row of a certain column in a data.frame in R

Question 1

r extract dataframe rowname

R Question · Apr 30, 2011 · Viewed 15.6k times · Source

Answer

Won't substr(row.names(data), 1, 1) get you the vector of first letters you seem to be after?

EDIT: I initially wrongly wrote substr(row.names(data)), omitting the indices.
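As a minimal sketch of that call, here it is on a toy data frame whose row names are borrowed from the question's extract (the name `toy` and its values are made up for illustration only):

```r
# Toy data frame; row names are painter names from the question's extract.
toy <- data.frame(
  Composition = c(15, 15, 10, 16),
  School      = c("G", "G", "H", "H"),
  row.names   = c("Teniers", "Van Dyck", "Bourdon", "Le Brun")
)

# substr(x, start, stop) is vectorised over x, so a single call
# returns the first character of every row name, in row order.
firstletter <- substr(row.names(toy), 1, 1)
firstletter
# [1] "T" "V" "B" "L"
```

Because the result has one element per row, in the same order as the rows, element i always belongs to row i of the data frame.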

For the second part of your question, assuming firstletter is a vector:

table(firstletter) gives you the frequency table of the first letters. So a bit of manipulation gets what you want, for example:

names(sort(table(firstletter), decreasing=TRUE)[1:3])

Does this help? You may then want to keep only the rows of the original dataset that correspond to these three most frequent letters. One way to do this would be:

top3letters <- names(sort(table(firstletter), decreasing=TRUE)[1:3])
data <- subset(data, firstletter %in% top3letters)
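Put together, the steps might look like the sketch below. The data frame `toy`, its `score` column and the selection of names are hypothetical stand-ins for the real 54-row data; the counts are chosen so that there is no tie at the third place.

```r
# Hypothetical toy data: 1 name in A, 4 in B, 3 in C, 2 in D.
toy <- data.frame(
  score     = 1:10,
  row.names = c("Albani", "Barocci", "Bassano", "Bellini", "Bourdon",
                "Caravaggio", "Correggio", "Cortona", "Da Udine", "Da Vinci")
)
firstletter <- substr(row.names(toy), 1, 1)

# Frequency table of the first letters, most frequent first.
counts <- sort(table(firstletter), decreasing = TRUE)

# Names of the (at most) three most frequent letters.
top3letters <- names(counts)[seq_len(min(3, length(counts)))]

# Keep only the rows whose first letter is one of them;
# drop = FALSE keeps a data frame even with a single column.
kept <- toy[firstletter %in% top3letters, , drop = FALSE]
```

If several letters are tied at the cut-off, which of them ends up in the top three depends on the sort order, so it is probably worth checking `counts` by eye before relying on the result.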

Question 2

I have the following problem. I am given a data frame with 5 categories (a, b, c, d, e) for each name (there are 54 names). Here is a small extract from the whole data frame in R, just to give you a feel for it.

                          a       b      c           d      e
Teniers                  15      12     13          6      G
Van Dyck                 15      10     17         13      G
Bourdon                  10       8      8          4      H
Le Brun                  16      16      8         16      H
Le Suer                  15      15      4         15      H
Poussin                  15      17      6         15      H

I have succeeded in arranging the names alphabetically with the "sort" function, so that the 5 categories belonging to each name move along with it. So far, so good, but the task is to take the first letter of each name and select only those names whose first letters appear most often. I can get the first letters with the "strsplit" function, but each one is then printed on its own line with [1] to its left ([1] "the first letter", new line [1] "another first letter", and so on up to the 54th) rather than with its position in the data frame.

So, any ideas?

Here is an extract from the code...

library(MASS)
data(painters)
attach(painters)
painters
str(painters)
summary(painters)

y <- as.vector(rownames(painters))
is.vector(y)

sortnames <- painters[order(y), ]
as.data.frame(painters[order(y), ])   ## sorted in list; each name with its relevant criteria

rownames(sortnames)
z <- rownames(sortnames)
str(z)
is.vector(z)
strsplit(z, "")

as.list(strsplit(z, ""))

liste <- as.list(strsplit(z, ""))
matrix <- as.matrix(liste)
matrix
matrix[, 1]
matrix[1, ]
matrix[1, 1]
matrix[[1]][1]

first <- matrix(as.matrix(liste))
for (i in 1:54) { print(matrix[[i]][1]) }

str(first)
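If one wanted to stay with the strsplit route, the printing loop above could probably be replaced by collecting the first element of each list entry into a single character vector. A minimal sketch, where the short vector `z` is only a stand-in for rownames(sortnames):

```r
# Stand-in for rownames(sortnames); the real vector has 54 names.
z <- c("Albani", "Bassano", "Da Udine")

# strsplit() returns a list with one character vector per name.
# vapply() takes element 1 of each and returns a plain character vector,
# so position i in the result corresponds to name i in z.
first <- vapply(strsplit(z, ""), function(chars) chars[1], character(1))
first
# [1] "A" "B" "D"
```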

Regards and thanks for the fast response in advance!!

EDIT

what I need is:

to create a vector (or a matrix with dimension [54,1]) that contains only the first letter of each name in the "rownames" column. Each element should keep the row number it has in the sorted data frame, so that the position in the data frame is preserved.

e.g.

[1]"A"
[2]"B"
[3]"B"
[4]"C"
....

In other words, one has to extract a vector/matrix with only the first letter of the rownames (in the data frame, "rownames" holds just the painters' names, so it is the very first of the 6 columns ;) )

I appreciate your help.

substr(data, 1, 1)

I got them like this:

firstletter <- substr(rownames(sortnames), 1, 1)
firstletter <- as.data.frame(firstletter)   ## how should I define "firstletter" for later use??
firstletter
 

1            A
2            B
3            B
4            B
5            B
6            C
7            C
8            C
9            D
10           D
11           D
12           D
13           D
14           D
15           D
16           F
17           F
18           F
19           G
20           G
21           G
22           H
23           J
24           J
25           L
26           L
27           L
28           L
29           M
30           M
31           O
32           P
33           P
34           P
35           P
36           P
37           P
38           P
39           P
40           P
41           R
42           R
43           R
44           T
45           T
46           T
47           T
48           T
49           T
50           V
51           V
52           V
53           V
54           V

Worked like a charm. The first letter of each painter's name is extracted and the row number stays as it should.
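On how to define "firstletter" for later use: one option, sketched here on a hypothetical three-row data frame `toy`, is to store the letters as an extra column rather than as a separate data frame. They then travel with their rows through any later sorting or subsetting, and a plain character vector is still one `$` away.

```r
# Hypothetical toy data with painter names as row names.
toy <- data.frame(
  Composition = c(10, 15, 8),
  row.names   = c("Bourdon", "Teniers", "Del Piombo")
)

# Sort the rows alphabetically by name.
sorted <- toy[order(rownames(toy)), , drop = FALSE]

# Store the first letters as a column so they stay attached to their rows.
sorted$firstletter <- substr(rownames(sorted), 1, 1)

# The column is an ordinary character vector, usable with table(), %in%, etc.
sorted$firstletter
# [1] "B" "D" "T"
```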

So, thanks a lot !

P.S. One last question: is there a function or command in R that can take this "firstletter" object (vector, matrix, list or data.frame, depending on how we define it; which structure is best for later use?), find the 3 most frequently occurring first letters, and extract only those? Or would that be too complicated?

EDIT: All I need now is to delete the redundant last row from a certain matrix after a subsetting step (rbind command):

              firstletter Composition Drawing Colour Expression School
Da Udine      "D"         "10"        " 8"    "16"   " 3"       "A"   
Del Piombo    "D"         " 8"        "13"    "16"   " 7"       "A"   
Diepenbeck    "D"         "11"        "10"    "14"   " 6"       "G"   
Palma Giovane "P"         "12"        " 9"    "14"   " 6"       "D"   
Palma Vecchio "P"         " 5"        " 6"    "16"   " 0"       "D"   
Pordenone     "P"         " 8"        "14"    "17"   " 5"       "D"   
Teniers       "T"         "15"        "12"    "13"   " 6"       "G"   
The Carraci   "T"         "15"        "17"    "13"   "13"       "E"   
Tintoretto    "T"         "15"        "14"    "16"   " 4"       "D"   
Titian        "T"         "12"        "15"    "18"   " 6"       "D"   
Da Vinci      "D"         "15"        "16"    " 4"   "14"       "A"   
Domenichino   "D"         "15"        "17"    " 9"   "17"       "E"   
Poussin       "P"         "15"        "17"    " 6"   "15"       "H"   
The Carraci1  "T"         "15"        "17"    "13"   "13"       "E"
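For a matrix like the one above, dropping the last row can be done with a negative row index. The sketch below uses a small made-up character matrix `m` with only two of the columns; the row names mimic the duplicated "The Carraci1" entry.

```r
# Small stand-in for the character matrix shown above.
m <- rbind(
  "Titian"       = c(firstletter = "T", School = "D"),
  "The Carraci"  = c(firstletter = "T", School = "E"),
  "The Carraci1" = c(firstletter = "T", School = "E")
)

# A negative index removes that row; nrow(m) is the last one.
# drop = FALSE keeps the result a matrix even if one row remains.
trimmed <- m[-nrow(m), , drop = FALSE]

# Alternative, assuming the redundant row is an exact copy of an earlier one:
# duplicated() on a matrix flags rows whose contents repeat an earlier row.
deduped <- m[!duplicated(m), , drop = FALSE]
```

The first form removes the last row whatever it contains; the second only helps when the unwanted row really is a duplicate, but it does not depend on where that row sits.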

I have googled for a long time, and no function has worked for me so far.

Any suggestions?
