I have a question about extracting part of a string. For example, I have a string like this:
a <- "DP=26;AN=2;DB=1;AC=1;MQ=56;MZ=0;ST=5:10,7:2;CQ=SYNONYMOUS_CODING;GN=NOC2L;PA=1^1:0.720&2^1:0"
I need to extract everything between GN= and the next ; so here the result would be NOC2L.
Is that possible?
Note: This is the INFO column from the VCF file format. GN is the gene name, so we want to extract the gene name from the INFO column.
Try this:
sub(".*?GN=(.*?);.*", "\\1", a)
# [1] "NOC2L"
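
One caveat with sub(): if a string has no GN= field, the pattern does not match and sub() returns the whole input unchanged. A possible alternative, sketched below, uses regexpr() and regmatches() with a Perl-style lookbehind and returns NA when GN is absent. The helper name extract_gn is hypothetical (it is not part of the original answer), and the sketch assumes that no other INFO key ends in "GN" and that the value never contains a semicolon.

```r
# Hypothetical helper, not from the original answer:
# returns the value of GN= for each INFO string, or NA if GN is missing.
extract_gn <- function(info) {
  # (?<=GN=) is a lookbehind: match starts right after "GN=";
  # [^;]+ then takes everything up to the next semicolon (or end of string).
  m <- regexpr("(?<=GN=)[^;]+", info, perl = TRUE)
  out <- rep(NA_character_, length(info))
  hit <- !is.na(m) & m != -1          # elements where a match was found
  out[hit] <- regmatches(info, m)     # regmatches returns only the matched ones
  out
}

a <- "DP=26;AN=2;DB=1;AC=1;MQ=56;MZ=0;ST=5:10,7:2;CQ=SYNONYMOUS_CODING;GN=NOC2L;PA=1^1:0.720&2^1:0"

extract_gn(a)
# [1] "NOC2L"

extract_gn(c(a, "DP=10;AN=2"))   # second string has no GN field
# [1] "NOC2L" NA
```

Because the function is vectorized, it can be applied directly to a whole INFO column of a data frame.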