What are Regular Expression Derivatives and How To Use Them?

Regular Expression Derivatives

Regular Expression Derivatives

This article is a section to the more extensive guide, The Ultimate Guide To Computational Computer Science Theory.

The idea of a regular expression derivative is to move through a regular expression as to say if we wanted to know if the string “aaba” matches the pattern aaba. As humans, we would immediately see that it does, but we would need to check each letter against the pattern as a computer. This gets tricky when we bring Kleene star and or operations into the mix.

Nullability

To start, we want to discuss an idea called nullability. Something is said to be nullable if it matches the empty string. We will write a recursive function now that will determine if a Regular Expression is nullable.

First, we recognize that we need some base cases. A symbol and the empty string are obvious base cases since they match regular expressions, but there is one more. We want to consider if an empty set is also nullable. Since regular expressions are patterns that describe sets, it’s possible that we can see an empty set here. That would mean that nothing matches the pattern.

With these cases we see that epsilon does match epsilon, both the other two do not so we have…

$$\gamma( \epsilon ) = \epsilon $$

$$\gamma( \emptyset) = \emptyset $$

$$\gamma( \alpha) = \emptyset $$

Now we will also consider Kleene star. We mentioned before that the Kleene star matches infinite concatenations of itself and the empty string. So with confidence, we can say.

$$\gamma( r^*) = \epsilon $$

Now we will look at the cases where we combine two regular expressions. These are concatenation and the logical or operator. These are recursive cases, so we will simply check both regular expressions to see if they are nullable and either and them if they’re concatenated or logical or them if they’re logical or.

$$\gamma( r \circ s ) = \gamma( r )\ \& \ \gamma( s ) $$

$$\gamma( r | s ) = \gamma( r )\ | \ \gamma( s ) $$

The Derivative

Now that we have discussed nullability, we can define the derivative operator for Regular Expressions. This idea is straightforward for a human, so we just have to break down the cases.

First, we have the regular expression of the empty string. If we compare a symbol to the empty string, they should not match. Since it does not match, we use the empty set to show nothing matches. That is our first case.

$$\delta_\alpha ( \epsilon ) = \emptyset $$

Now we should ask what happens when we try to match a symbol to the empty set. Clearly, this shouldn’t match since the empty set has nothing to match it against.

$$\delta_\alpha ( \emptyset ) = \emptyset $$

Next, we have a symbol. This one is easy. If the symbols match, then it’s a match. Otherwise, it doesn’t. If it matches what we have left is the empty string.

$$\delta _\alpha ( \alpha ) = \epsilon $$

$$\delta _\alpha ( \beta ) = \emptyset $$

Now we have the operations. First up is Kleene star. Since we can continually match a regular expression, our answer needs to keep the Kleene star format. But if we get a symbol that does not match that regular expression at some point, we need to account for that.

We should take a moment here and discuss what happens when you concatenate the empty set with a regular expression. Concatenate will take every element of the first set and prefix it to the second set. If there is nothing in one of the two sets, then concatenation will return the empty set. This is because there was nothing to be combined with another element.

That being said, if we concatenate the derivative of the Regular Expression within the Kleene star to the Kleene star itself, we should be handling the cases where the string does not match. If it does match, concatenating epsilon will simplify back to the Kleene star again.

$$\delta _\alpha ( r^* ) = \delta _\alpha (\ r\ ) \circ r^* $$

Next, we look at concatenation. At this point, I expect you to have been asking why to bother defining nullability if we were not going to use it. Well, your wait ends here; this is the case where it becomes important.

We have two cases with concatenation that can occur. The first is simple; if we have two regular expressions concatenated, we simply take the derivative of the first and concatenate it on the second. This is because the second will be unaffected when trying to match a symbol, and we will recur to the simple cases in the first regular expression.

This has an issue, though. What happens if our first regular expression is epsilon or a Kleene star. Our symbol that we are taking the derivate with respect to may not match it, but that doesn’t mean we’re wrong. This is where our nullable operator comes in handy. We can concatenate the nullable operator of the first regular expression to the derivative of the second.

Now we just or them together and we have…

$$ \delta_\alpha( r \circ s ) = [ \delta_\alpha( r) \circ s ] \ \ |\ \ [ \gamma(r) \circ \delta_\alpha(s) ] $$

Lastly we have the logical or to handle. This one is the simplest recursive case. It is either the derivative of the first or the derivative of the second.

$$ \delta_\alpha( r | s ) = \delta_\alpha( r )\ \ |\ \ \delta_\alpha( r ) $$

Extending the Regular Expression Derivative

We see what happens when we take the derivative with respect to one symbol but what happens if we have a string of symbols. We need a mathematical way to derive this. For that we simply say…

$$ \delta_\epsilon( r ) = r $$

$$ \delta_{\alpha\sigma}( r ) = \delta_{\sigma}( \delta_\alpha( r ) ) $$

This will allow us to recurs through the string and define if a pattern matchs.

We can then go further and define if a Regular Expression matches a string using the nullable operator. Since we know that epsilon marks the ending of our matches if it’s successful and otherwise, we see an empty set.

$$ M( r, \epsilon ) = \gamma(r) == \epsilon $$

$$ M(r, \alpha\sigma) = M( \delta_\alpha , \sigma ) $$

Now we have all of the tools needed to handle a Regular Expression.

Books

Here is a list of books that taught me a great deal about Computational Theory. If there are any books that you have benefited from, feel free to share them in the comments and help grow this list!

Research Papers

A couple of names were mentioned in this article of famous historical authors who I cited some works. Here is a list of these research papers, which are fairly easy reads once you start to understand the computational theory. I recommend reading them after this course.