I have a file that looks something like this:
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
I need to extract anything within the quotes that follow name=
, i.e., content_analyzer
, content_analyzer2
and content_analyzer_items
.
I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.
Since you need to match content without including it in the result (must
match name="
but it's not part of the desired result) some form of
zero-width matching or group capturing is required. This can be done
easily with the following tools:
With Perl you could use the n
option to loop line by line and print
the content of a capturing group if it matches:
perl -ne 'print "$1\n" if /name="(.*?)"/' filename
If you have an improved version of grep, such as GNU grep, you may have
the -P
option available. This option will enable Perl-like regex,
allowing you to use \K
which is a shorthand lookbehind. It will reset
the match position, so anything before it is zero-width.
grep -Po 'name="\K.*?(?=")' filename
The o
option makes grep print only the matched text, instead of the
whole line.
Another way is to use a text editor directly. With Vim, one of the
various ways of accomplishing this would be to delete lines without
name=
and then extract the content from the resulting lines:
:v/.*name="\v([^"]+).*/d|%s//\1
If you don't have access to these tools, for some reason, something similar could be achieved with standard grep. However, without the look around it will require some cleanup later:
grep -o 'name="[^"]*"' filename
In all of the commands above the results will be sent to stdout
. It's
important to remember that you can always save them by piping it to a
file by appending:
> result
to the end of the command.