Regular Expressions

This article is a section to the more extensive guide, The Ultimate Guide To Computational Computer Science Theory.

In the last section, we talked about Regular Languages and Finite Automaton in a general sense. We now want to know more about these Finite automata. To do so first requires us to understand Regular Expressions.

Regular Expressions are a tool for any programmer. We use them all of the time for numerous different tasks. But many programmers lack the theory of what is actually happening when they use a language’s built-in package.

To begin, we will first state why we want Regular Expressions before Finite Automaton. This is because Regular Expressions are a form of Finite Automaton, just a simpler way a brilliant mathematician came up with describing them early on. This Mathematician’s name was Stephen Cole Kleene. He found that you can describe patterns with these basic rules, then later it was discovered that they can easily be re-written into Finite Automaton, as we will see later.

The Theory

The idea of a regular expression is to describe a pattern in the simplest way. Take the letter “a,” for example; if we wanted to describe the pattern which matches “a,” it’s actually very simple. It would just be the letter “a.” In a sense, we are just saying that “a” = “a.”

This gets slightly trickier when we say we want to match nothing. There isn’t a symbol in our spoken languages that matches nothing or an empty string. In programming, it’s often that we use double quotes with nothing within them, “”. In computational theory, we have two accepted symbols that match nothing. This depends on the class you’re taking or the book you’re reading. These symbols are epsilon(ε) and lambda (λ). For the purpose of this guide, we will use epsilon.

Now that we can represent a letter and nothing, we actually have the first two parts to our definition.

Regular Expressions are defined by…

α is a Regular Expression that matches only to α
ε is a Regular Expression which only matches to the empty string

We have a couple of things here to consider. Some books will list out the operations that Regular Expressions follow, but these books often leave out a lot because they don’t know better. That being said, we will follow a computer scientist named Janusz Brzozowski, who has a more open definition. He defines a property that says if we have two regular expressions, P and Q, then the concatenation of P and Q, Kleene star of P, and any Boolean function f of P and Q are all Regular Expressions.

Concatenation

Concatenation is the process of taking two symbols and joining them one after the other. For example, if we have the symbols “a,” “n,” and “d,” then we can concatenate them to build the word and.

Kleene Star

Kleene star is a process with one regular expression. You denote Kleene star by raising a regular expression to the power of *. That is the best way to describe the notation anyways. What is actually happening is a set is being built with a concatenation of the regular expression infinite times with itself. Epsilon is included within this set as well. That means if we have a symbol “a” concatenated, then the regular expression matches ε, a, aa, aaa, aaaa, and so on.

Logical Or

Logical or is a good example of a Boolean operation. It is not the only, but for the purpose of computational theory, it is one of the main 3. It works as you would expect. If you take two symbols and or them together, you can get either the first or the second to match the pattern.

Regular Expression Definition

A Regular Expression is defined recursively by the following

α is a Symbol that matches only to α
λ is a Regular Expression which only matches to the empty string
If P and Q are Regular Expressions then P ◦ Q, P | Q, and f(P,Q) where f is any Boolean function are all Regular Expressions.

Books

Here is a list of books that taught me a great deal about Computational Theory. If there are any books that you have benefited from, feel free to share them in the comments and help grow this list!

Introduction to Languages and the Theory of Computation, By John Martin
Regular Algebra and Finite Machines, By John Conway
Introduction to the Theory of Computation, By Michael Sipser

Research Papers

A couple of names were mentioned in this article of famous historical authors who I cited some works. Here is a list of these research papers, which are fairly easy reads once you start to understand the computational theory. I recommend reading them after this course.

Derivatives of Regular Expressions, By Janusz Brzozowski
Representation of Events in Nerve Nets and Finite Automata, by Stephen Cole Kleene

A Simple Explanation of Regular Expressions