The Implementation
This article follows a guide for Compiler Construction. If this is your first article here, it might help to start at the Compiler Construction Guide.
Regular Expressions
With this implementation, we obviously need to have regular expressions. We will piggyback off of our previously defined regular expressions in our NFA Construction. They are repeated below.
def build-epsilon():
return ["Regular Expression", "Epsilon"]
def build-symbol(symbol):
return ["Regular Expression", "SYMBOL", symbol]
def build-concat(re-1, re-2):
return ["Regular Expression", "CONCAT", re-1, re-2]
def build-or(re-1, re-2):
return ["Regular Expression", "OR", re-1, re-2]
def build-kleene_star(re):
return ["Regular Expression", "KLEENE STAR", re]
def is-epsilon(re):
return type(re) == list and re[0] == "Regular Expression" and re[1] == "Epsilon"
def is-symbol(re):
return type(re) == list and re[0] == "Regular Expression" and re[1] == "SYMBOL"
def is-concat(re):
return type(re) == list and re[0] == "Regular Expression" and re[1] == "CONCAT"
def is-or(re):
return type(re) == list and re[0] == "Regular Expression" and re[1] == "OR"
def is-kleene-star(re):
return type(re) == list and re[0] == "Regular Expression" and re[1] == "KLEENE STAR"
We need to add two more things now still. We need a representation for the empty set. For this, we will do just as we did for epsilon. The other is an and regular expression. This isn’t used much, but it is still essential. For this, we will take after or.
def build-empty-set():
return ["Regular Expression", "Empty set"]
def is-empty-set(re):
return type(re) == list and re[0] == "Regular Expression" and re[1] == "Empty set"
def build-and(re-1, re-2):
return ["Regular Expression", "AND", re-1, re-2]
def is-or(re):
return type(re) == list and re[0] == "Regular Expression" and re[1] == "AND"
Nullability
Now rather than implementing an NFA, we want to implement the derivative. We discussed Regular Expression Derivatives before, but we will once again repeat the mathematical functions as a reference.
To begin we need to implement the nullable function.
$$\gamma( \epsilon ) = \epsilon $$
$$\gamma( \emptyset) = \emptyset $$
$$\gamma( \alpha) = \emptyset $$
$$\gamma( r^*) = \epsilon $$
$$\gamma( r \circ s ) = \gamma( r )\ \& \ \gamma( s ) $$
$$\gamma( r | s ) = \gamma( r )\ | \ \gamma( s ) $$
$$\gamma( r \& s ) = \gamma( r )\ \& \ \gamma( s ) $$
def nullable(re):
if is-epsilon(re):
return build-epsilon()
else if is-empty-set(re):
return build-empty-set()
else if is-symbol(re):
return build-empty-set()
else if is-kleene-star(re):
return build-epsilon()
else if is-concat(re):
return build-and( nullable(re[2]), nullable(re[3]) )
else if is-or(re):
return build-or( nullable(re[2]), nullable(re[3]) )
else if is-and(re):
return build-and( nullable(re[2]), nullable(re[3]) )
else:
raise Error("Unknown Regular Expression")
The Derivative
Now that we have the nullable operator in code, we move on to the derivative itself. Once again we did discuss this, but we will repeat it again as a reference.
$$\delta_\alpha ( \epsilon ) = \emptyset $$
$$\delta_\alpha ( \emptyset ) = \emptyset $$
$$\delta _\alpha ( \alpha ) = \epsilon $$
$$\delta _\alpha ( \beta ) = \emptyset $$
$$\delta _\alpha ( r^* ) = \delta _\alpha (\ r\ ) \circ r^* $$
$$ \delta_\alpha( r \circ s ) = [ \delta_\alpha( r) \circ s ] \ \ |\ \ [ \gamma(r) \circ \delta_\alpha(s) ] $$
$$ \delta_\alpha( r | s ) = \delta_\alpha( r )\ \ |\ \ \delta_\alpha( r ) $$
$$ \delta_\alpha( r \& s ) = \delta_\alpha( r )\ \ \& \ \ \delta_\alpha( r ) $$
def derivative(re, respect):
if is-epsilon(re):
return build-empty-set()
else if is-empty-set(re):
return build-empty-set()
else if is-symbol(re):
if re[2] == respect:
return build-epsilon()
else:
return build-empty-set()
else if is-kleene-star(re):
return build-concat(derivative(re[2], respect), re)
else if is-concat(re):
return build-or(
build-concat(derivative(re[2], respect), re[3]),
build-concat(nullable(re[2]), derivative(re[3], respect)
)
else if is-or(re):
return build-or(
derivative(re[2], respect),
derivative(re[2], respect)
)
else if is-and(re):
return build-and(
derivative(re[2], respect),
derivative(re[2], respect)
)
else:
raise Error("Unknown Regular Expression")
We can also add the the recursive structure for the respect parameter to simplify our implementation a bit. For this it would follow as…
def derivative(re, respect):
if respect.length > 1:
return derivative(respect[0], respect[1:])
if is-epsilon(re):
return build-empty-set()
else if is-empty-set(re):
return build-empty-set()
else if is-symbol(re):
if re[2] == respect:
return build-epsilon()
else:
return build-empty-set()
else if is-kleene-star(re):
return build-concat(derivative(re[2], respect), re)
else if is-concat(re):
return build-or(
build-concat(derivative(re[2], respect), re[3]),
build-concat(nullable(re[2]), derivative(re[3], respect)
)
else if is-or(re):
return build-or(
derivative(re[2], respect),
derivative(re[2], respect)
)
else if is-and(re):
return build-and(
derivative(re[2], respect),
derivative(re[2], respect)
)
else:
raise Error("Unknown Regular Expression")
Simplify
Before writing a matching function we need a way to simplify our regular expressions. This function is pretty straightforward, we will essentially simplify anything with a recursive case. So logical and, logical or, Kleene star and concatenation.
Logical and will return an empty set if either of its children is the empty set.
Logical or and Concatenation will simplify if one of their regular expressions is the empty set.
Kleene star will simplify if its regular expression is either the empty set or epsilon.
def simplify(re):
if is-or(re):
r1, r2 = simplify(re[2]), simplify(re[3])
if is-empty-set(r1):
return r2
if is-empty-set(r2):
return r1
if is-and(re) or is-concat(re):
r1, r2 = simplify(re[2]), simplify(re[3])
if is_empty-set(r1) or is-empty-set(r2):
return build_empty-set()
if is-kleene-star(re):
r = simplify(re[2])
if is_empty-set(r) or is-epsilon(r):
return r
return build-kleene-star(r)
return re
Matching
The last function we need to implement is the matching operator. This one is relatively easy to implement.
def match(re, string):
if string.length == 0:
return is-epsilon(nullable(re))
else:
return match(derivative(re, string[0]), string[1:])
This function is one-to-one with the math notation. We could be clearer and more concise with our code by writing this instead.
def match(re, string):
return is-epsilon(simplify(nullable(derivative(re, string))))
Books
Books
Listed below are a few of my favorite books that taught me a fair amount of compiler construction. If you have books that benefited you on this topic, please share them in the comments below and expand this list!
- Introduction to Languages and the Theory of Computation, By John Martin
- Regular Algebra and Finite Machines, By John Conway
- Compilers: Principles, Techniques, and Tools, By Alfred Aho, Monica Lam, Ravi Sethi, and Jeffrey Ullman
- Introduction to the Theory of Computation, By Michael Sipser
