Saturday, April 25, 2015

Extracting text between two keywords, or between a keyword and \n

I have a set of lines, most of which follow this format:

STARTKEYWORD some text I want to extract ENDKEYWORD

I want to find these lines and extract information from them.

Note that the text between the keywords can contain a wide range of characters (Latin and non-Latin letters, digits, spaces, special characters), but never \n.

ENDKEYWORD is optional: some lines end without it.

My attempts revolve around this regex:

STARTKEYWORD (.+)(?:\n| ENDKEYWORD)

However, the capturing group (.+) is greedy: it consumes as many characters as possible, so it also captures ENDKEYWORD, which I do not need.
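The problem can be reproduced with a minimal sketch. The post does not name a regex flavour, so Python's re module is an assumption here, and the sample lines are made up for illustration:

```python
import re

# Hypothetical sample input: one line with ENDKEYWORD, one without it.
text = (
    "STARTKEYWORD some text I want to extract ENDKEYWORD\n"
    "STARTKEYWORD text with no end keyword\n"
)

# The attempt from above. Because (.+) is greedy and "." stops only at \n,
# the group runs to the end of the line, and then (?:\n| ENDKEYWORD) matches the \n.
greedy = re.compile(r"STARTKEYWORD (.+)(?:\n| ENDKEYWORD)")

# findall returns the contents of the single capturing group for each match.
print(greedy.findall(text))
```

The first result is "some text I want to extract ENDKEYWORD": the end keyword ends up inside the group, because the \n alternative is satisfied after the greedy group has already swallowed it.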

Is there a way to capture just "some text I want to extract" using regular expressions alone?
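One possible approach, sketched in Python (again assuming the re module; the sample lines are hypothetical), is to make the group lazy and to stop either before " ENDKEYWORD" or at the end of the line:

```python
import re

# (.+?)              lazy: takes as few characters as possible
# (?= ENDKEYWORD|$)  lookahead: stop right before " ENDKEYWORD" or at end of line,
#                    without consuming either
# re.MULTILINE makes $ match just before every \n, not only at the end of the string.
pattern = re.compile(r"STARTKEYWORD (.+?)(?= ENDKEYWORD|$)", re.MULTILINE)

text = (
    "STARTKEYWORD some text I want to extract ENDKEYWORD\n"
    "STARTKEYWORD Привет, мир 123 !?\n"
    "a line that does not match\n"
    "STARTKEYWORD last line without a newline"
)

# findall returns only the capturing group, so ENDKEYWORD is never included.
for extracted in pattern.findall(text):
    print(extracted)
```

Using $ with re.MULTILINE instead of \n means the last line still matches when the input has no trailing newline. Since only group 1 is read, a non-capturing group (?: ENDKEYWORD|$) would give the same captures; the lookahead simply leaves the delimiter unconsumed. Files with \r\n line endings would leave a trailing \r in the capture, which this sketch does not handle.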
