Recommendation
A lot of this may be better explained in perlre or perlfaq6 (Regexps), but if you like examples that go from simple to
advanced, you should still check this out.
IPv4 Address Matching
Let's look at a basic IPv4 address: 127.0.0.1
Here is the final regular expression we will end up with: /^(?:0*(?:2(?:[0-4]\d|5[0-5])|1?\d{1,2})(?:\.|$)){4}/
We break the IP down into its parts:
<127> <.> <0> <.> <0> <.> <1>
Since we know what an IP is shaped like (from 0.0.0.0 to 255.255.255.255), we can break the numbers down more to fit a regular
expression:
<0 - 255> <.> <0 - 255> <.> <0 - 255> <.> <0 - 255>
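The breakdown above can also be checked directly, without encoding the numeric range in a regular expression. The following is only a minimal sketch of that idea and not the approach this document builds: it splits the address on the dots and compares each part against 0 to 255. The sub name is_ipv4_by_split and the sample addresses are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split the address on dots and range-check each part.
sub is_ipv4_by_split {
    my ($ip) = @_;
    # The -1 limit keeps trailing empty fields, so "1.2.3.4." yields
    # five parts and is rejected instead of being silently trimmed.
    my @parts = split /\./, $ip, -1;
    return 0 unless @parts == 4;
    for my $part (@parts) {
        return 0 unless $part =~ /\A\d+\z/;   # digits only, at least one
        return 0 if $part > 255;              # numeric range check
    }
    return 1;
}

for my $ip ("127.0.0.1", "255.255.255.255", "256.0.0.1", "1.2.3") {
    printf "%-16s %s\n", $ip, is_ipv4_by_split($ip) ? "valid" : "not valid";
}
```

Like the 0* in the regular expression, this sketch accepts leading zeros such as 127.000.000.001.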
Looking at it more like a regexp:
(
  <2> ( <0 - 4> <0 - 9>  or  <5> <0 - 5> )
  or
  <0 - 1> <0 - 9> <0 - 9>
)
<.>  etc., etc.
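One way to convince yourself that this digit-by-digit breakdown covers exactly 0 to 255 is to try it against a range of numbers. The sketch below takes the single-number part of the final expression, anchors it on both ends, and counts which values from 0 to 999 it accepts. The variable names $octet and @accepted are chosen here for illustration.

```perl
use strict;
use warnings;

# The single-number part of the pattern, anchored on both ends so that
# it has to match the whole string.
my $octet = qr/^(?:2(?:[0-4]\d|5[0-5])|1?\d{1,2})$/;

# Collect every number from 0 to 999 that the pattern accepts.
my @accepted = grep { $_ =~ $octet } 0 .. 999;

print "accepted ", scalar(@accepted), " values, from $accepted[0] to $accepted[-1]\n";
```

This should report 256 accepted values, from 0 to 255: the first alternative covers 200 to 255 and the second covers 0 to 199.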
Now let's convert this into a regular expression. We'll assume the regexp we're building replaces 'REGEXP' in the
following code:

    $ip = "127.0.0.1";
    if ($ip =~ /REGEXP/x) {
        print "$ip is a valid IPv4 address.\n";
    }
    else {
        print "$ip is not a valid IPv4 address.\n";
    }

The reason there is an 'x' at the end of /REGEXP/ is so that we can put white space and comments inside the regular
expression.
    ^
    (?:
        0*
        (?:
            2 (?: [0-4] \d | 5 [0-5] )
          |
            1? \d{1,2}
        )
        (?: \. | $ )
    ){4}
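As a rough check of the expression, the sketch below writes it with comments under /x and runs it against a few sample strings. The sample addresses and the variable names $ipv4, $octet and $strict are chosen here for illustration. Note that the expression is only anchored at the start, so it also accepts input with a trailing dot or a fifth part. The $strict variant is one possible way to tighten that; it is not part of the original expression.

```perl
use strict;
use warnings;

# The expression from this document, laid out with the x modifier so
# that each piece can carry its own comment.
my $ipv4 = qr/
    ^                             # start of the string
    (?:                           # one number plus its separator
        0*                        # any amount of leading zeros
        (?:
            2(?:[0-4]\d|5[0-5])   # 200 to 255
          |
            1?\d{1,2}             # 0 to 199
        )
        (?: \. | $ )              # then a dot or the end of the string
    ){4}                          # exactly four of those
/x;

# Assumed stricter variant: three numbers each followed by a dot, then
# a fourth number, then the very end of the string.
my $octet  = qr/0*(?:2(?:[0-4]\d|5[0-5])|1?\d{1,2})/;
my $strict = qr/^(?:$octet\.){3}$octet\z/;

for my $ip ("127.0.0.1", "256.1.1.1", "1.2.3", "1.2.3.4.5", "1.2.3.4.") {
    printf "%-12s original: %-3s strict: %s\n",
        $ip,
        ($ip =~ $ipv4   ? "yes" : "no"),
        ($ip =~ $strict ? "yes" : "no");
}
```

The last two samples are the interesting ones: the original expression answers yes for both, while the stricter variant answers no.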
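The pieces of syntax used above mean the following:
$
matches at the end of the line (in Perl, at the very end of the string or just before a trailing newline).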
*
matches 0 or more times.
?
matches 0 or 1 times.
+
(not used in this) matches 1 or more times.
{}
is used for matching a specific number of times, in one of the three forms below.
{n}
matches n times.
{n,}
matches at least n times.
{n,m}
matches at least n times but not more than m times.
^
matches at the beginning of the line.
()
is for grouping (capturing and clustering data).
?:
at the beginning of () clusters but doesn't capture the data.
[]
defines a character class, which matches any one of the characters listed inside it.
-
in a character class matches anything from the character before it to the
character after it.
|
is for alternation. The alternatives are grouped because we want to alternate only what's
in the cluster and not the whole expression.
\.
is just the literal dot (period) character. The backslash quotes the metacharacter after it, in this case '.'.
It must be backslashed because an unescaped . in a regular expression matches any character (except newline,
unless specific regular expression options are added at the end).
\d
matches a digit.
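A few of the entries above are easier to see in action. The following sketch compares capturing with clustering, runs each quantifier against strings of zero to three digits, and contrasts a bare dot with a backslashed one. The test strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# () captures, (?:) only clusters. In list context a match returns
# whatever the capturing groups picked up.
my @captured  = "127.0.0.1" =~ /^(\d+)\.(\d+)/;
my @clustered = "127.0.0.1" =~ /^(?:\d+)\.(\d+)/;
print "captured:  @captured\n";
print "clustered: @clustered\n";

# Each quantifier, anchored so that the whole string has to match.
my @quantifiers = (
    [ '\d?',     qr/^\d?$/     ],
    [ '\d*',     qr/^\d*$/     ],
    [ '\d+',     qr/^\d+$/     ],
    [ '\d{2}',   qr/^\d{2}$/   ],
    [ '\d{2,}',  qr/^\d{2,}$/  ],
    [ '\d{1,2}', qr/^\d{1,2}$/ ],
);
my %matched;
for my $q (@quantifiers) {
    my ($label, $re) = @$q;
    $matched{$label} = [ grep { $_ =~ $re } ("", "1", "12", "123") ];
    printf "%-8s matches: %s\n", $label,
        join(", ", map { "'$_'" } @{ $matched{$label} });
}

# A bare dot matches any character, a backslashed dot only a literal one.
my $bare_dot    = "1a2" =~ /^1.2$/  ? "yes" : "no";
my $escaped_dot = "1a2" =~ /^1\.2$/ ? "yes" : "no";
print "1a2 against 1.2:  $bare_dot\n";
print "1a2 against 1\\.2: $escaped_dot\n";
```

The first match keeps both numbers (127 and 0), while the second keeps only 0 because the first group clusters without capturing.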
Changes
August 29, 2001 - I began writing this.
Author
That would be me, Samy Kamkar. You can reach me at CommPort5@LucidX.com or on
IRC on SUIDnet in #suid with the IRC name 'CommPort5'.