Regular Expressions

In this lesson of the PHP tutorial, you will learn...
  1. To understand how regular expressions work.
  2. To use regular expressions for advanced form validation.

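Regular expressions are used for sophisticated pattern matching. PHP has historically supported two flavors of regular expressions: POSIX (the ereg() family of functions, deprecated in PHP 5.3 and removed in PHP 7.0) and Perl-compatible (the preg_ family of functions). The Perl-compatible style is more powerful, much more common, and the only one available in current versions of PHP, so it is the one we cover in this lesson.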

Perl-compatible Regular Expression Functions

preg_match()

The syntax for preg_match() is as follows.

preg_match(pattern, text_to_search);

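preg_match() returns 1 if pattern is found in text_to_search, 0 if it is not, and false if an error occurs (for example, if the pattern is invalid). It stops searching after the first match. The pattern must be enclosed in delimiters, usually forward slashes, as in '/foo/'. An optional third argument receives the text that matched.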
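Here is a minimal sketch of preg_match() with its optional third argument, $matches, which PHP fills with the matched text. The sample string is invented for illustration, and the \d and {4} notation is explained under Regular Expression Syntax.

```php
<?php
$text = 'Order 66 shipped on 2024-05-17';

// Look for a date written as YYYY-MM-DD; $matches receives the matched text
$found = preg_match('/\d{4}-\d{2}-\d{2}/', $text, $matches);
echo $found, "\n";      // 1: the pattern was found
echo $matches[0], "\n"; // 2024-05-17: the full matched text

// A pattern that does not occur returns 0 (false is reserved for errors)
$missing = preg_match('/\d{5}/', $text);
echo $missing, "\n";    // 0
```

Because 0 and false are both falsy, compare the result with === when you need to tell "no match" apart from "invalid pattern".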

preg_replace()

The syntax for preg_replace() is as follows.

preg_replace(pattern, replacement, text_to_search);

preg_replace() returns a copy of text_to_search in which every match of pattern has been replaced with replacement. The original string is not modified.
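The sketch below shows preg_replace() on an invented string. It also uses two features not described above: the optional fourth argument, which limits the number of replacements, and $1, $2, and so on in the replacement, which insert the text captured by parentheses (see Subpatterns below).

```php
<?php
$text = 'Call 555-1234 or 555-9876';

// Replace every run of digits with "#"
$masked = preg_replace('/[0-9]+/', '#', $text);
echo $masked, "\n"; // Call #-# or #-#

// The fourth argument limits how many matches are replaced
$first = preg_replace('/[0-9]+/', '#', $text, 1);
echo $first, "\n";  // Call #-1234 or 555-9876

// $1 and $2 insert the text captured by the first and second parentheses
$swapped = preg_replace('/(\w+), (\w+)/', '$2 $1', 'Doe, Jane');
echo $swapped, "\n"; // Jane Doe

// $text itself is unchanged
echo $text, "\n";   // Call 555-1234 or 555-9876
```

The replacement '$2 $1' is in single quotes so that PHP does not try to interpolate $2 and $1 as variables.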

Regular Expression Tester

We have created a simple PHP-based regular expression tester. The code for the tester is shown below.

Code Sample: RegExp/Demos/Tester.php

<?php
// Read the submitted values, defaulting to '' on the first visit
$Pattern = $_POST['Pattern'] ?? '';
$TextToSearch = $_POST['TextToSearch'] ?? '';
?>
<html>
<head>
 <title>Regular Expression Tester</title>
<style>
 .reg {font-family:Verdana; font-size: 14pt; font-weight:bold; color:darkblue; text-decoration:none; padding: 4px}
 .reg:hover {border: 2px solid red; padding: 2px}
</style>

<script>
function usePattern(PATTERN)
{
 document.formRE.Pattern.value=PATTERN;
 document.getElementById("display").innerHTML="<b>PATTERN: </b>" + PATTERN;
}
</script>
</head>

<body>
<h2><font face="Verdana, Arial, Helvetica, sans-serif">Regular Expression Tester</font></h2>
<form name="formRE" method="post">
<table>
<tr>
 <td align="right"><font size="+2" face="Arial, Helvetica, sans-serif">Text to search:</font></td>
 <td><font size="+3" face="Arial, Helvetica, sans-serif">
   <input type="text" name="TextToSearch" value="<?= htmlspecialchars($TextToSearch) ?>" size="50" maxlength="50">
   </font></td>
</tr>
<tr>
 <td align="right"><font size="+2" face="Arial, Helvetica, sans-serif" id="exp">Pattern:</font></td>
 <td><font size="+3" face="Arial, Helvetica, sans-serif">
   <input type="text" name="Pattern" size="50" value="<?= htmlspecialchars($Pattern) ?>" maxlength="100">
   </font></td>
</tr>
<tr>
 <td colspan="2" align="center" style="font-size:18pt; font-family:Arial, Helvetica, sans-serif; background: #cccccc;">
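  <?php
   if (empty($Pattern))
    echo '<font color="blue">Let\'s play!</font>';
   else
   {
    // An invalid pattern makes preg_match() return false (and raise a
    // warning, silenced here with @), so report it instead of "No Match"
    $Result = @preg_match($Pattern, $TextToSearch);
    if ($Result === false)
     echo '<font color="orange">Invalid Pattern</font>';
    elseif ($Result)
     echo '<font color="green">Match</font>';
    else
     echo '<font color="red">No Match</font>';
   }
  ?>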
 </td>
</tr>
<tr align="center">
<td colspan="2"><font size="+2" face="Arial, Helvetica, sans-serif">
      <input type="submit" value="Submit">
      <input type="reset">
</font></td>
</tr>

<tr>
 <td colspan="2">
 <table width="100%" border="0" cellpadding="4">
 <tr>
  <th><a href="javascript:usePattern('/^[a-zA-Z0-9_\\-\\.]+@[a-zA-Z0-9\\-]+\\.[a-zA-Z0-9\\-\\.]+$/');" class="reg">Email</a></th>
  <th><a href="javascript:usePattern('/^[0-9]{3}[\\- ]?[0-9]{2}[\\- ]?[0-9]{4}$/');" class="reg">SSN</a></th>
  <th><a href="javascript:usePattern('/^\\(?[2-9][0-9]{2}\\)?[\\- ]?[0-9]{3}[\\- ]?[0-9]{4}$/');" class="reg">Phone</a></th>
 </tr>
 </table>
 </td>
</tr>
</table>
</form>
<div id="display" style="font-size:18pt; font-family:Courier New"><b>PATTERN:</b> <?= htmlspecialchars($Pattern) ?></div>

</body>
</html>

Regular Expression Syntax

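A regular expression is a pattern that describes a set of strings: ordinary characters match themselves, while special characters such as ^, +, and [] control where and how often characters may appear. In this section, we will look at how those patterns are built. As we go through this section, we'll test some regular expressions in the browser using our regular expression tester at RegExp/Demos/Tester.php. The patterns below are written without delimiters; when you enter one in the tester, wrap it in slashes (for example, /^foo/).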

Start and End ( ^ $ )

A caret (^) at the beginning of a regular expression indicates that the string being searched must start with this pattern.

  • The pattern ^foo can be found in "food", but not in "barfood".

A dollar sign ($) at the end of a regular expression indicates that the string being searched must end with this pattern.

  • The pattern foo$ can be found in "curfoo", but not in "food".
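The anchor examples above can be checked with a short PHP sketch. It adds one case not in the list, ^foo$, to show that using both anchors forces the entire string to match.

```php
<?php
// Each pair is [pattern, text]; PHP requires the / delimiters around the pattern
$pairs = [
    ['/^foo/', 'food'],    // "food" starts with "foo"
    ['/^foo/', 'barfood'], // "foo" is not at the start
    ['/foo$/', 'curfoo'],  // "curfoo" ends with "foo"
    ['/foo$/', 'food'],    // "food" ends with "d"
    ['/^foo$/', 'foo'],    // with both anchors, the whole string must be "foo"
    ['/^foo$/', 'foods'],  // extra "s" after "foo"
];

$results = [];
foreach ($pairs as [$pattern, $text]) {
    $result = preg_match($pattern, $text);
    $results[] = $result;
    echo "$pattern in \"$text\": ", $result ? 'match' : 'no match', "\n";
}
```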

Number of Occurrences ( ? + * {} )

The following symbols affect the number of occurrences of the preceding character (or of the preceding subpattern, as described under Subpatterns below): ?, +, *, and {}.

A question mark (?) indicates that the preceding character should appear zero or one times in the pattern.

  • The pattern foo? can be found in "food" and "fod", but not "faod".

A plus sign (+) indicates that the preceding character should appear one or more times in the pattern.

  • The pattern fo+ can be found in "fod", "food" and "foood", but not "fd".

An asterisk (*) indicates that the preceding character should appear zero or more times in the pattern.

  • The pattern fo*d can be found in "fd", "fod" and "food".

Curly brackets with one parameter ( {n} ) indicate that the preceding character should appear exactly n times in the pattern.

  • The pattern fo{3}d can be found in "foood" , but not "food" or "fooood".

Curly brackets with two parameters ( {n1,n2} ) indicate that the preceding character should appear between n1 and n2 times in the pattern.

  • The pattern fo{2,4}d can be found in "food","foood" and "fooood", but not "fod" or "foooood".

Curly brackets with one parameter and an empty second parameter ( {n,} ) indicate that the preceding character should appear at least n times in the pattern.

  • The pattern fo{2,}d can be found in "food" and "foooood", but not "fod".
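To confirm the quantifier examples above, this sketch runs every pattern against the strings listed for it and counts any result that differs from what the text says (1 for "can be found", 0 for "not").

```php
<?php
// Pattern => [text => expected preg_match() result]
$cases = [
    '/foo?/'     => ['food' => 1, 'fod' => 1, 'faod' => 0],
    '/fo+/'      => ['fod' => 1, 'food' => 1, 'foood' => 1, 'fd' => 0],
    '/fo*d/'     => ['fd' => 1, 'fod' => 1, 'food' => 1],
    '/fo{3}d/'   => ['foood' => 1, 'food' => 0, 'fooood' => 0],
    '/fo{2,4}d/' => ['food' => 1, 'foood' => 1, 'fooood' => 1, 'fod' => 0, 'foooood' => 0],
    '/fo{2,}d/'  => ['food' => 1, 'foooood' => 1, 'fod' => 0],
];

$failures = 0;
foreach ($cases as $pattern => $expected) {
    foreach ($expected as $text => $shouldMatch) {
        $actual = preg_match($pattern, $text);
        if ($actual !== $shouldMatch) {
            $failures++;
        }
        printf("%-12s %-10s %s\n", $pattern, $text, $actual ? 'match' : 'no match');
    }
}
echo "$failures unexpected result(s)\n";
```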

Common Characters ( . \d \D \w \W \s \S )

A period ( . ) represents any character except a newline.

  • The pattern fo.d can be found in "food", "foad", "fo9d", and "fo*d".

Backslash-d ( \d ) represents any digit. It is the equivalent of [0-9].

  • The pattern fo\dd can be found in "fo1d", "fo4d" and "fo0d", but not in "food" or "fodd".

Backslash-D ( \D ) represents any character except a digit. It is the equivalent of [^0-9].

  • The pattern fo\Dd can be found in "food" and "foad", but not in "fo4d".

Backslash-w ( \w ) represents any word character (letters, digits, and the underscore (_) ).

  • The pattern fo\wd can be found in "food", "fo_d" and "fo4d", but not in "fo*d".

Backslash-W ( \W ) represents any character except a word character.

  • The pattern fo\Wd can be found in "fo*d", "fo@d" and "fo.d", but not in "food".

Backslash-s ( \s ) represents any whitespace character (space, tab, newline, and so on).

  • The pattern fo\sd can be found in "fo d", but not in "food".

Backslash-S ( \S ) represents any character except a whitespace character.

  • The pattern fo\Sd can be found in "fo*d", "food" and "fo4d", but not in "fo d".
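The sketch below tries each of the common characters above against the same five sample strings. It uses preg_grep(), which returns the elements of an array that match a pattern; array_values() renumbers the keys that preg_grep() preserves.

```php
<?php
$samples  = ['food', 'fo4d', 'fo_d', 'fo*d', 'fo d'];
// In single-quoted PHP strings, \d, \w, \s, etc. reach the regex engine unchanged
$patterns = ['/fo.d/', '/fo\dd/', '/fo\Dd/', '/fo\wd/', '/fo\Wd/', '/fo\sd/', '/fo\Sd/'];

$matched = [];
foreach ($patterns as $pattern) {
    $matched[$pattern] = array_values(preg_grep($pattern, $samples));
    echo str_pad($pattern, 9), implode(', ', $matched[$pattern]), "\n";
}
```

Note that the period also matches the space in "fo d", since it matches any character except a newline.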

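Character Classes ( [] )

Square brackets ( [] ) define a character class, which matches any one of the characters listed inside the brackets. A hyphen between two characters specifies a range, so [0-9] matches any digit and [a-z] matches any lowercase letter.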

  • The pattern f[aeiou]d can be found in "fad" and "fed", but not in "food", "faed" or "fd".
  • The pattern f[aeiou]{2}d can be found in "faed" and "feod", but not in "fod", "fed" or "fd".
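Here is a small sketch of character classes in PHP. The word list is invented, and ^ and $ are added to each pattern so that the whole word must fit, which makes the results easier to read.

```php
<?php
$words = ['fad', 'fed', 'fqd', 'f4d', 'food', 'faed', 'feod', 'fd'];

// Exactly one vowel between f and d
$oneVowel = array_values(preg_grep('/^f[aeiou]d$/', $words));
// Exactly two vowels, in any combination, between f and d
$twoVowels = array_values(preg_grep('/^f[aeiou]{2}d$/', $words));
// A range: any one lowercase letter between f and d
$oneLetter = array_values(preg_grep('/^f[a-z]d$/', $words));

echo implode(', ', $oneVowel), "\n";  // fad, fed
echo implode(', ', $twoVowels), "\n"; // food, faed, feod
echo implode(', ', $oneLetter), "\n"; // fad, fed, fqd
```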

Negation ( ^ )

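When used as the first character inside square brackets, the caret ( ^ ) negates the character class: it matches any single character that is not listed. Outside square brackets, the caret is the start anchor described above.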

  • The pattern f[^aeiou]d can be found in "fqd" and "f4d", but not in "fad" or "fed".
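The following sketch contrasts the two meanings of the caret. The sample strings are invented for illustration.

```php
<?php
$words = ['fqd', 'f4d', 'f d', 'fad', 'fed'];

// [^aeiou] matches any single character that is NOT a vowel (even a space)
$notVowel = array_values(preg_grep('/f[^aeiou]d/', $words));
echo implode(', ', $notVowel), "\n"; // fqd, f4d, f d

// Outside brackets, ^ anchors the pattern to the start of the string
$startsWithDigit = preg_match('/^[0-9]/', '4you'); // 1: starts with a digit
// Inside brackets, ^ negates: this looks for any non-digit character
$hasNonDigit = preg_match('/[^0-9]/', '2024');     // 0: every character is a digit
```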

Subpatterns ( () )

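Parentheses ( () ) group part of a pattern into a subpattern, so that a quantifier applies to the whole group, and they capture the text that the group matched.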

  • The pattern f(oo)?d can be found in "food" and "fd", but not in "fod".
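This sketch shows both jobs that parentheses do: applying a quantifier to a group, and capturing text into the $matches array of preg_match(). The phone-number string is invented for illustration.

```php
<?php
// The ? after (oo) makes the whole group optional, not just one "o"
$food = preg_match('/f(oo)?d/', 'food'); // 1
$fd   = preg_match('/f(oo)?d/', 'fd');   // 1
$fod  = preg_match('/f(oo)?d/', 'fod');  // 0

// $matches[0] is the whole match; $matches[1], [2], ... hold each
// parenthesized subpattern, counted from the left
preg_match('/(\d{3})-(\d{4})/', 'Call 555-1234 today', $matches);
echo $matches[0], "\n";                        // 555-1234
echo $matches[1], ' / ', $matches[2], "\n";    // 555 / 1234
```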

Alternatives ( | )

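The pipe ( | ) separates alternatives: the pattern matches if any one of the alternatives matches. The pipe has lower precedence than everything else in a pattern, so in the example below each anchor applies only to its own alternative.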

  • The pattern foo$|^bar can be found in "foo" and "bar", but not "foobar".
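The sketch below checks the example above and adds one invented case, gr(a|e)y, showing how parentheses limit the alternatives to part of the pattern.

```php
<?php
// /foo$|^bar/ means (foo at the end) OR (bar at the start)
$results = [];
foreach (['foo', 'bar', 'foobar'] as $text) {
    $results[$text] = preg_match('/foo$|^bar/', $text);
}
print_r($results);

// Parentheses restrict the alternatives to "a" or "e"
$gray = preg_match('/^gr(a|e)y$/', 'gray'); // 1
$grey = preg_match('/^gr(a|e)y$/', 'grey'); // 1
$groy = preg_match('/^gr(a|e)y$/', 'groy'); // 0
```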

Escape Character ( \ )

The backslash ( \ ) is used to escape special characters.

  • The pattern fo\.d can be found in "fo.d", but not in "food" or "fo4d".
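Escaping by hand works for fixed patterns. When part of a pattern comes from a variable, PHP's preg_quote() function can add the backslashes for you; its second argument is the delimiter, which it escapes as well. A minimal sketch with invented strings:

```php
<?php
// \. matches only a literal period
$withDot = preg_match('/fo\.d/', 'fo.d'); // 1
$withO   = preg_match('/fo\.d/', 'food'); // 0

// preg_quote() escapes every regex special character, plus the delimiter
$quoted = preg_quote('$5.00', '/');
echo $quoted, "\n"; // \$5\.00

// Unescaped, "." matches any character, so "5-00" slips through
$loose  = preg_match('/5.00/', 'Total: 5-00');             // 1
$strict = preg_match('/' . $quoted . '/', 'Total: 5-00');  // 0
$exact  = preg_match('/' . $quoted . '/', 'Total: $5.00'); // 1
```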

Form Validation Functions with Regular Expressions

Regular expressions can be used to write sophisticated form validation functions. For example, earlier in the course, we wrote a checkEmail() function that looked like this:

function checkEmail($Email)
{
 $Email = trim($Email);
 if (!checkLength($Email,6))
 {
  return false;
 }
 elseif (!strpos($Email,'@'))
 {
  return false;
 }
 elseif (!strpos($Email,'.'))
 {
  return false;
 }
 elseif (strrpos($Email,'.') < strpos($Email,'@'))
 {
  return false;
 }
 return true;
}

We can use a regular expression to make this function both simpler and more powerful:

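function checkEmail($Email)
{
 // Runs of word characters, each followed by "-" or ".", then a final run of
 // word characters before the @; after the @, one or more "name." labels and
 // an alphabetic top-level domain. "-" is allowed before the @ so that this
 // pattern accepts the same addresses as the JavaScript version.
 $EmailPattern = '/^(\w+[\-\.])*\w+@(\w+\.)+[A-Za-z]+$/';
 // preg_match() returns 1, 0, or false, so compare strictly with 1
 return preg_match($EmailPattern, $Email) === 1;
}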

A nice thing about this is that we can use virtually the same function to do client-side validation with JavaScript:

function checkEmail(EMAIL)
{
 var reEmail = /^(\w+[\-\.])*\w+@(\w+\.)+[A-Za-z]+$/;
 // test() already returns true or false
 return reEmail.test(EMAIL);
}

So, by using regular expressions in this way, you make it easy to create a similar function library on the client side.
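To see what this email pattern accepts and rejects, the sketch below runs a few invented addresses through the pattern from the JavaScript checkEmail() function. The pattern uses only syntax that PHP and JavaScript share, so it behaves the same with preg_match().

```php
<?php
// Pattern from the JavaScript checkEmail() function
$EmailPattern = '/^(\w+[\-\.])*\w+@(\w+\.)+[A-Za-z]+$/';

$addresses = [
    'jane.doe@example.com',       // valid
    'mary-jane@mail.example.org', // valid: "-" and "." are allowed before the @
    'bob@localhost',              // invalid: the domain needs at least one "."
    'bob@my-site.com',            // invalid: "-" is not allowed after the @
    '.bob@example.com',           // invalid: must start with a word character
    'bob@example.c0m',            // invalid: the last label must be letters only
];

$accepted = [];
foreach ($addresses as $address) {
    $ok = preg_match($EmailPattern, $address) === 1;
    if ($ok) {
        $accepted[] = $address;
    }
    echo str_pad($address, 28), $ok ? 'valid' : 'invalid', "\n";
}
```

The bob@my-site.com case shows a limitation worth knowing: hyphenated domain names are common, so a production pattern would need to allow them.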

Regular Expressions Conclusion

Regular expressions are supported to varying degrees in most modern programming languages. For further study, there is a good online reference at http://www.regular-expressions.info/reference.html.


Copyright Information

All pages and graphics on this Web site are the property of Webucator, Inc. unless otherwise specified.

None of the content on this website may be redistributed or reproduced in any way, shape, or form without written permission from Webucator, Inc.
