Matching in Perl

  • [ ] (square brackets) match any one of a set of characters: put the options inside the brackets and Perl selects a single one of them, so [abc] matches one a, b or c
  • . (dot) matches any single character; by default that means any character whatsoever except a newline
  • + tells Perl to match one or more of the preceding character, so .+ matches one or more of any character, which is handy for matching several characters in the middle of a pattern
  • .* matches zero or more of any character
  • ? matches zero or one of the preceding character
  • \ (backslash) says that the following character is to be matched literally and not treated as some fancy control character, so \. matches an actual dot
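
The bullets above can be tried out with a small script. This is only a sketch: the sample strings (cat, hat, colour and so on) are made up for illustration, and each test simply records 1 for a match and 0 for no match.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only to illustrate each bullet.
my @results;
push @results, ("cat"   =~ /[bc]at/   ? 1 : 0);  # [bc] matches a single b or c
push @results, ("hat"   =~ /[bc]at/   ? 1 : 0);  # h is not in the set
push @results, ("cot"   =~ /c.t/      ? 1 : 0);  # . matches any one character
push @results, ("c\nt"  =~ /c.t/      ? 1 : 0);  # ... but not a newline
push @results, ("ct"    =~ /c.+t/     ? 1 : 0);  # .+ needs at least one character
push @results, ("ct"    =~ /c.*t/     ? 1 : 0);  # .* is happy with zero characters
push @results, ("color" =~ /colou?r/  ? 1 : 0);  # u? makes the u optional
push @results, ("3x14"  =~ /3\.14/    ? 1 : 0);  # \. matches only a literal dot

print "@results\n";   # prints: 1 0 1 0 0 1 1 0
```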

modifiers go after the closing slash: /test/i

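  • i – case insensitivity
  • s – lets . match a newline character as well, so /foo.*bar/s can match foo on one line and bar on the next
  • m – lets ^ and $ match just after and just before each newline inside the string, not only at the very start and end of the string
  • g – match globally: Perl keeps track of where in the string the last match left off, so the next match carries on from there. \G means the end of the previous match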
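
A short sketch of the four modifiers in action. The three-line string in $text and the variable names are assumptions made up for this example.

```perl
use strict;
use warnings;

my $text = "Foo\nbar\nbaz";   # hypothetical three-line string

my @flags;
push @flags, ($text =~ /foo/i     ? 1 : 0);  # i: "Foo" matches /foo/
push @flags, ($text =~ /Foo.bar/  ? 1 : 0);  # no s: . will not match the newline
push @flags, ($text =~ /Foo.bar/s ? 1 : 0);  # s: . matches the newline too
push @flags, ($text =~ /^bar$/    ? 1 : 0);  # no m: ^ only matches at the start of the string
push @flags, ($text =~ /^bar$/m   ? 1 : 0);  # m: ^ and $ match around each line

# g in list context returns every match at once.
my @words = $text =~ /(ba\w)/g;

# g in scalar context resumes where the previous match left off;
# pos() reports that position.
my @positions;
while ($text =~ /ba/g) {
    push @positions, pos($text);
}

print "@flags\n";       # prints: 1 0 1 0 1
print "@words\n";       # prints: bar baz
print "@positions\n";   # prints: 6 10
```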

extract information from part of a match – /alpha(.+)gamma/

  • “xxalphazzzgamma”
  • “alpha beta gamma delta”

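What do the (parentheses) achieve? The answer is simple – whatever the part of the pattern inside the parentheses matched is put into the Perl variable $1. (If you have a second set of parentheses, the contents of that set go into $2, and so on.) For the first string above $1 is zzz; for the second it is " beta ", spaces included.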
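
Here is a minimal sketch that runs the pattern against the two strings above. The second pattern, with two sets of parentheses, and its date-like string are made up for illustration.

```perl
use strict;
use warnings;

my @captured;
for my $string ("xxalphazzzgamma", "alpha beta gamma delta") {
    # $1 is only meaningful after a successful match, so test the match first.
    if ($string =~ /alpha(.+)gamma/) {
        push @captured, $1;
    }
}
print "[$_]\n" for @captured;   # prints [zzz] then [ beta ]

# Hypothetical example with two sets of parentheses: $1 and $2.
my ($year, $month);
if ("2024-05" =~ /(\d+)-(\d+)/) {
    ($year, $month) = ($1, $2);
}
print "$year / $month\n";       # prints: 2024 / 05
```

In list context the match also returns the captured pieces directly, so my ($middle) = $string =~ /alpha(.+)gamma/; is a common alternative to reading $1.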

 
  • \n newline (line feed)
  • \w a word character [a-zA-Z0-9_]
  • \W NOT a word character, that is [^a-zA-Z0-9_]
  • \s white space (newline, carriage return, space, tab, form feed)
  • \S NOT white space
  • \d a digit [0-9]
  • \D NOT a digit, i.e. [^0-9]
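
A quick sketch of these shorthand classes pulling apart one line of text. The string in $line is a made-up example.

```perl
use strict;
use warnings;

my $line = "Order 66:\tshipped_ok";   # hypothetical input with a space and a tab

my @numbers = $line =~ /(\d+)/g;      # runs of digits
my @words   = $line =~ /(\w+)/g;      # runs of word characters (note the underscore)
my @fields  = split /\s+/, $line;     # split on any run of white space
my $nonword = () = $line =~ /\W/g;    # count the non-word characters

(my $digits_only = $line) =~ s/\D//g; # delete everything that is not a digit

print "@numbers\n";       # prints: 66
print "@words\n";         # prints: Order 66 shipped_ok
print "@fields\n";        # prints: Order 66: shipped_ok
print "$nonword\n";       # prints: 3  (the space, the colon and the tab)
print "$digits_only\n";   # prints: 66
```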

  • \b Match a word boundary
  • \B Match a non-(word boundary)
  • \A Match only at beginning of string
  • \Z Match only at end of string, or before newline at the end
  • \z Match only at end of string
  • \G Match only where previous m//g left off (works only with /g)
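
The anchors are easiest to see side by side. In this sketch every string is invented for the example; each test records 1 for a match and 0 for no match.

```perl
use strict;
use warnings;

my @checks;
push @checks, ("a cat sat"   =~ /\bcat\b/ ? 1 : 0);  # cat as a whole word
push @checks, ("concatenate" =~ /\bcat\b/ ? 1 : 0);  # no word boundary around cat here
push @checks, ("concatenate" =~ /\Bcat\B/ ? 1 : 0);  # cat buried inside a word
push @checks, ("end\n"       =~ /end\Z/   ? 1 : 0);  # \Z allows a final newline
push @checks, ("end\n"       =~ /end\z/   ? 1 : 0);  # \z insists on the very end
push @checks, ("one\ntwo"    =~ /^two/m   ? 1 : 0);  # ^ with m matches after the newline
push @checks, ("one\ntwo"    =~ /\Atwo/m  ? 1 : 0);  # \A ignores m: start of string only

# \G ties each match to where the previous one ended,
# so the loop stops at the first character that is not an a.
my $str = "aaabaa";
my @ends;
while ($str =~ /\Ga/g) {
    push @ends, pos($str);
}

print "@checks\n";   # prints: 1 0 1 1 0 1 0
print "@ends\n";     # prints: 1 2 3
```

Without the \G the same loop would skip over the b and also find the last two a characters.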