R extract part of string

Lisann · Mar 15, 2012 · Viewed 33.7k times

I have a question about extracting a part of a string. For example I have a string like this:

a <- "DP=26;AN=2;DB=1;AC=1;MQ=56;MZ=0;ST=5:10,7:2;CQ=SYNONYMOUS_CODING;GN=NOC2L;PA=1^1:0.720&2^1:0"

I need to extract everything between GN= and the next ;. So here it will be NOC2L.

Is that possible?

Note: This is the INFO column from the VCF file format. GN is the gene name, so we want to extract the gene name from the INFO column.

Answer

kohske · Mar 15, 2012

Try this:

sub(".*?GN=(.*?);.*", "\\1", a)
# [1] "NOC2L"
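
Note that sub() returns its input unchanged when the pattern does not match, so the one-liner above gives back the whole string if GN= is missing or is the last field with no trailing ;. If that matters for your data, a slightly more defensive variant might look like the sketch below. The helper name extract_info_field is made up for illustration, and it assumes the key contains no regex metacharacters.

```r
# Hypothetical helper (not part of any package): pull the value of one
# key out of a VCF INFO string. Assumes `key` has no regex metacharacters.
extract_info_field <- function(info, key = "GN") {
  # Match key=value at the start of the string or right after a ";",
  # with the value running up to the next ";" or the end of the string.
  pattern <- paste0("(^|;)", key, "=([^;]*)")
  m <- regmatches(info, regexec(pattern, info))
  # Each element is c(full match, group 1, group 2), or character(0) if absent.
  vapply(m, function(x) if (length(x) == 3) x[3] else NA_character_, character(1))
}

a <- "DP=26;AN=2;DB=1;AC=1;MQ=56;MZ=0;ST=5:10,7:2;CQ=SYNONYMOUS_CODING;GN=NOC2L;PA=1^1:0.720&2^1:0"
extract_info_field(a)
# [1] "NOC2L"
```

It is vectorised over info, so it can be applied directly to a whole INFO column, and it returns NA for rows that have no GN entry.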