CHAPTER 9 – REGULAR EXPRESSIONS – Escape Sequences
As shown in the previous table, the \ character is the general escape character. In combination with the character that follows it, the \ stands for a special group of characters. Table 9.2 shows the different cases.
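As a quick illustration of the two roles the backslash plays, the following sketch compares an unescaped dot with an escaped one, and a plain d with \d. The subject strings and variable names are made up for this illustration.

```php
<?php
// Illustrative subjects and variable names; not taken from the chapter.

// Unescaped, the dot is a metacharacter that matches any character.
$anyChar = preg_match('/^1.5$/', '1x5');       // 1 (match)

// Escaped, the dot only matches a literal dot.
$literalMiss = preg_match('/^1\.5$/', '1x5');  // 0 (no match)
$literalHit  = preg_match('/^1\.5$/', '1.5');  // 1 (match)

// The reverse also happens: a plain d is the letter d, \d is any digit.
$plainD  = preg_match('/^d$/', '7');           // 0 (no match)
$escaped = preg_match('/^\d$/', '7');          // 1 (match)

var_dump($anyChar, $literalMiss, $literalHit, $plainD, $escaped);
```

All patterns are written in single quotes, so PHP passes the single backslash through to PCRE unchanged.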
Table 9.2 Escape Sequences

Case    Description

\? \+ \* \[ \] \{ \}
        The first use of the escape character is to take away the special meaning of the other metacharacters. For example, if you need to match 4** in your pattern, you can use

        '/^4\*\*$/'

        Be careful when using double quotes around your patterns, because PHP gives a special meaning to the \ in there, too. The following pattern is therefore equal to the one above:

        "/^4\\*\\*$/"

        (Note: In this case, "/^4\*\*$/" would also have worked, because \* is not recognized by PHP as a valid escape sequence, but that is not the correct way to do it.)
\\      Escapes the \ so that it can be used in patterns.

        <?php
        $subject = 'PHP\5';
        $pattern1 = '/^PHP\\\5$/';
        $pattern2 = "/^PHP\\\\5$/";
        $ret1 = preg_match($pattern1, $subject, $matches1);
        $ret2 = preg_match($pattern2, $subject, $matches2);
        var_dump($matches1, $matches2);
        ?>

        Now you are probably wondering why we used three backslashes in $pattern1. PHP recognizes the \ as a special character inside single quotes when it parses the script, because you need the \ to escape a single quote in such a string ($str = 'derick\'s';). So the first \ escapes the second \ for the PHP parser, and the resulting single \ escapes the third \ for PCRE. The second pattern, inside double quotes, even has four backslashes. This is because inside double quotes \5 has a special meaning to PHP: it means "the character with octal code 5". That is not useful here and would break our pattern, so this backslash has to be escaped with another backslash, too.
\a      The BEL character (ASCII 7).
\e      The Escape character (ASCII 27).
\f      The Formfeed character (ASCII 12).
\n      The Newline character (ASCII 10).
\r      The Carriage Return character (ASCII 13).
\t      The Tab character (ASCII 9).
\xhh    Any character represented by its hexadecimal code (hh). Use \xdf for the ß (iso-8859-15), for example.
\ddd    Any character represented by its octal code (ddd).
\d      Any decimal digit, which is the same as specifying the character class [0-9] in a pattern.
\D      Any character that is not a decimal digit (the same as [^0-9]).
\s      Any whitespace character. (It is the same as [\t\f\r\n ], or in words: tab, formfeed, carriage return, newline, and space.)
\S      Any character that is not a whitespace character.
\w      Any character that is part of a word, meaning any letter or digit, or the underscore character. Letters are the letters used in the current locale (language-specific):

        <?php
        $subject = "Montréal";

        /* The 'default' locale */
        setlocale(LC_ALL, 'C');
        preg_match('/^\w+/', $subject, $matches);
        print_r($matches);

        /* Set the locale to Dutch, which has the é in its alphabet */
        setlocale(LC_ALL, 'nl_NL');
        preg_match('/^\w+/', $subject, $matches);
        print_r($matches);
        ?>

        outputs

        Array
        (
            [0] => Montr
        )
        Array
        (
            [0] => Montréal
        )

        Tip: For this example to work, you need to have the locale nl_NL installed. Names of locales are system-dependent, too; for example, on Windows, this locale is called nld_nld. See http://www.macmax.org/locales/index_en.html for locale names for MacOS X and http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_language_strings.asp for Windows.
\W      Any character that does not belong to the \w set.
\b      An anchor point for a word boundary. In simple words, this means a point in a string between a word character (\w) and a non-word character (\W). The following example matches only the letters and digits in the subject:

        <?php
        $string = "##Testing123##";
        preg_match('@\b.+\b@', $string, $matches);
        print_r($matches);
        ?>

        outputs

        Array
        (
            [0] => Testing123
        )
\B      The opposite of \b. It acts as an anchor either between two word characters from the \w set, or between two non-word characters from the \W set. The start and the end of the subject are word boundaries here, so the match can begin only after the first letter and must end before the last one. The following example therefore prints only estin:

        <?php
        $string = "Testing";
        preg_match('@\B.+\B@', $string, $matches);
        echo $matches[0] . "\n";
        ?>
\Q ... \E
        Can be used inside patterns to turn off the special meaning of metacharacters. The pattern '@\Q.+*?\E@' will therefore match the string '.+*?'.

9.3.1.6 Examples

'/\w+\s+\w+/'
        Matches two words separated by whitespace.

'/(\d{1,3}\.){3}\d{1,3}/'
        Matches (but does not validate) an IP address. The IP address may appear anywhere in the string.

        <?php
        $str = "My IP address is 212.187.38.47.";
        preg_match('/(\d{1,3}\.){3}\d{1,3}/', $str, $matches);
        print_r($matches);
        ?>

        outputs

        Array
        (
            [0] => 212.187.38.47
            [1] => 38.
        )

        Note that the second element contains only the last of the three repetitions of the subpattern.
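Because a repeated group keeps only its last iteration, one way to get at all four parts of the address is to give each part its own group. The sketch below does that and adds a range check that the pattern alone does not express. Note that find_ipv4 is a hypothetical helper name introduced for this illustration, not a PHP function, and the sketch only inspects the first candidate found in the text.

```php
<?php
// Hypothetical helper (not part of PHP or of the chapter's examples):
// returns the four octets of the first IPv4-like match, or null.
function find_ipv4($text)
{
    // One capturing group per octet, so every octet is kept.
    // The \b anchors keep the match from starting or ending inside a longer number.
    $pattern = '/\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/';

    if (!preg_match($pattern, $text, $m)) {
        return null;
    }

    // $m[0] is the whole match; $m[1] to $m[4] are the octets.
    $octets = array_map('intval', array_slice($m, 1));

    // \d{1,3} also accepts values such as 999, so check the range here.
    foreach ($octets as $octet) {
        if ($octet > 255) {
            return null;
        }
    }

    return $octets;
}

print_r(find_ipv4("My IP address is 212.187.38.47."));
var_dump(find_ipv4("Not an address: 999.1.1.1"));
```

The first call returns the four octets as integers; the second returns null because 999 is outside the valid range.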