I am working with NCBI Reference Sequence accession numbers, like the variable a:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
To get information from the biomart package I need to remove the .1, .2, etc. after the accession numbers. I normally do this with this code:
b <- sub("..*", "", a)
# [1] "" "" "" "" "" ""
But as you can see, this isn't the correct way for this variable. Can anyone help me with this?
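You just need to escape the period. In a regular expression an unescaped . matches any single character, so the pattern "..*" matches the entire string and sub() replaces all of it with "". Written as \\. it matches a literal period only: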
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
gsub("\\..*","",a)
# [1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
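If you want to be a bit stricter, you can anchor the pattern so that only a trailing version number is removed. This is only a sketch, assuming the version suffix is always a period followed by digits at the end of the string; strip_version is a hypothetical helper name, not something from biomart. It also shows a character class as an alternative to the double backslash.

```r
a <- c("NM_020506.1", "NM_020519.1", "NM_001030297.2",
       "NM_010281.2", "NM_011419.3", "NM_053155.2")

# Hypothetical helper: drop a trailing ".<digits>" version suffix.
# "\\." is a literal period, "[0-9]+" is one or more digits,
# and "$" anchors the match to the end of the string.
strip_version <- function(x) sub("\\.[0-9]+$", "", x)

b <- strip_version(a)
print(b)

# Alternative that avoids the backslashes: inside a character class
# the period is taken literally.
b2 <- sub("[.].*", "", a)
print(identical(b, b2))
```

For these accession numbers both patterns give the same result as the gsub() call above. They would only differ for strings with more than one period or with a non-numeric suffix.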