Using match to find substrings in strings with only bash

Adesso · Mar 7, 2012 · Viewed 93.5k times

Although I am almost sure this has been covered, I can't seem to find anything specific to it. As I continue my journey of learning bash, I keep finding parts where I am baffled as to why things happen the way they do.

Searching and replacing, or just matching sub-strings in strings, is most likely one of the first things you do when writing scripts. But sticking to a single language or set of tools is difficult in bash, as you can solve most problems in multiple ways. I am doing my best to stay as low-level as possible with bash, and I have run into a snag that I need someone to explain to me.

Doing a sub-string search in bash with match gives me different results depending on the regular expression I use, and I am not sure why.

#!/bin/bash
Stext="Hallo World"
echo `expr "$Stext" : '^\(.[a-z]*\)'` # Hallo
echo `expr "$Stext" : '.*World'`      # 11

Both search for a word, I think, yet only the first returns what it found; the second returns a number. Why?

Answer

kev · Mar 7, 2012

You can use the BASH_REMATCH variable in bash to get the matched string:

$ Stext="Hallo World"
$ [[ $Stext =~ ^.[a-z]* ]] && echo $BASH_REMATCH
Hallo
$ [[ $Stext =~ ^(.[a-z]*) ]] && echo ${BASH_REMATCH[1]}
Hallo

Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.
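As for why the two expr calls in the question differ: expr prints the text captured by \( \) when the pattern contains such a group, and otherwise prints the number of characters matched, which is where the 11 comes from. The sketch below puts both behaviours next to the BASH_REMATCH equivalent. The variable names (word, count, second, re, whole, first, last) are made up for illustration, and the leading ^ from the question is dropped on the assumption that expr already anchors its pattern at the start of the string.

```shell
#!/bin/bash
Stext="Hallo World"

# With a \( \) group in the pattern, expr prints the captured text.
word=$(expr "$Stext" : '\(.[a-z]*\)')

# Without a group, expr prints the number of characters matched.
count=$(expr "$Stext" : '.*World')

# Wrapping the part of interest in \( \) makes expr return text again.
second=$(expr "$Stext" : '.* \(World\)')

# Pure bash: the whole match goes to index 0, each ( ) group to index n.
re='^(.[a-z]*) (World)$'
if [[ $Stext =~ $re ]]; then
    whole=${BASH_REMATCH[0]}
    first=${BASH_REMATCH[1]}
    last=${BASH_REMATCH[2]}
fi

printf '%s\n' "$word" "$count" "$second" "$whole" "$first" "$last"
```

Keeping the regular expression in a variable (re here) and expanding it unquoted on the right of =~ avoids having to escape spaces and parentheses inside [[ ]].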