Multiple regex matches in Google Sheets formula

flo5783 picture flo5783 · Apr 16, 2017 · Viewed 20.2k times · Source

I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1), using a Google Sheets regex formula :

=REGEXEXTRACT(A1, "\d-")

My problem is that it only returns the first match... how can I get all matches?

Example text:

"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"

My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or concatenated text).

I know I could use a script or another function (like SPLIT) to achieve the desired result, but what I really want to know is how I could get a re2 regular expression to return such multiple matches in a "REGEX.*" Google Sheets formula. Something like the "global - Don't return after first match" option on regex101.com

I've also tried removing the undesired text with REGEXREPLACE, with no success either (I couldn't get rid of other digits not preceding a hyphen).

Any help appreciated! Thanks :)

Answer

Aurielle Perlmann picture Aurielle Perlmann · Apr 18, 2017

You can actually do this in a single formula using regexreplace to surround all the values with a capture group instead of replacing the text:

=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))

basically what it does is surround all instances of the \d- with a "capture group" then using regex extract, it neatly returns all the captures. if you want to join it back into a single string you can just use join to pack it back into a single cell:

enter image description here