A Guide to Better Understand NFA-Construction: The Theory

NFA Construction - The Theory

The Theory


This guide follows a guide for NFA Construction. If this is your first article here it might help to start at the Compiler Construction Guide.

If you haven’t already I recommend checking out my article talking about Non-deterministic Finite Automaton before reading this article.

Assuming you have this image of an NFA is meant to serve as a refresher to NFAs and how they can be represented. Primarily since NFA construction is code-based we will focus on the mathematical representation as opposed to the graphical representation for obvious reasons.

Figure 1: NFA Example

The idea we would like to follow is derived from the Compilers Principles, Techniques, and Tools book. We will take a recursive approach and build the NFA from regular expressions. The idea is to take multiple regular expressions and break them down into multiple NFAs. Then from there, we will glue all of those NFAs together with a new initial state and epsilon transitions. Now how do we do that?

Base Cases of Regular Expression to NFAs

First, we need to build some intuition behind what our code is going to do. For this, we will take the basic regular expressions and build them on our own. Let’s say we just have an epsilon, nothing more just a simple one-state NFA. This will be important as technically it is a base case of the recursive definition of regular expressions, as seen in my article about regular expressions. That being said the other base case is a simple symbol. Let’s use the example of the symbol a. The two regular expressions look like the following…

Figure 2: Simple Base Cases of Regular Expressions as NFAs

Now if we concern ourselves with the recursive cases, we have concatenation, logical or, and Kleene star.

Recursive Cases of Regular Expression to NFAs: Concatenation

With Concatenation, we have two regular expressions. To make things simple we will represent these two as a machine that has an initial state and a final state. In practice, these will be built recursively so we do not have to worry much about how they are represented.

To build them into one regular expression we can create a new initial state and a new final state. We can then add 3 epsilon-transitions, one from the new initial state to the initial state of the first machine, then one from the final state of the first machine to the initial state of the second, and finally one from the final state of the second machine to the new final state.

Figure 3: Regular Expression Concatenation as an NFA

Recursive Cases of Regular Expression to NFAs: Logical Or

Next, we have logical or. Just as concatenation we will represent our previous two regular expressions as a machine with the initial and final states. Now we will once again add a new initial state and a new final state with some epsilon transitions. The difference now is that the epsilon transitions will be from the new initial state to both of the machine’s initial states, then their final states to the new final state. This will allow us to enter both machines in “parallel” with each other, and accept if one or both are accepting.

Figure 4: Regular Expression Logical Or as an NFA

Recursive Cases of Regular Expression to NFAs: Kleene Star

Finally, we have Kleene star. This one is tricky as you have to worry about cycles when implementing it. To prevent it we will build the diagram such that it cannot cycle.

To begin we again create a new initial state and a new final state. This time we only have one machine to worry about so we again connect the initial states together and final states together with epsilon-transitions. Now we need to add two more, one from the machine’s final state to the machine’s initial state, and another from the new initial state to the new final state. This will allow us to repeat the machine as many times as needed without constantly transitioning from one state to the next all on epsilon transitions. That being said in theory that is not an issue but in practice it is.

Figure 5: Regular Expression Kleene star as an NFA

Putting it all together

All together now we have the following diagrams to work with. It may help to save this image as you will most likely reference it a lot in both implementing and debugging your solution.

Figure 6: Full NFA from Regular Expression diagram

From here we can now begin implementing this NFA construction. We will begin by implementing a regular expression and then take that regular expression and build an NFA from it.

Books

Listed below are a few of my favorite books that taught me a fair amount of compiler construction. If you have books that benefited you on this topic, please share them in the comments below and expand this list!