What Are Regular Languages And Why Do We Need Them?

This article is the start to the guide The Ultimate Guide To Computational Computer Science Theory.

Think to yourself for a moment of what the English language consists of and give the most basic idea. Often people think of words as the most basic idea of the English language. Regular Languages are kind of like words. If we compare English words to Regular Languages, it becomes straightforward to break down the terminology. So let’s start by using the language to do so.

In a general sense, English words are to grammar as Regular Languages are to Context-Free Grammars. This is not exactly one-to-one, but it is close and builds an early understanding that Regular Languages are a building tool to use and that there are far more out there in the realm of computational theory. The diagram shows how these languages all work within one another and is a great reference for a student to look back to.

Language Diagram — Language in Computational Theory

Alphabets

In English, we have an alphabet the letters a-Z; we don’t include the numbers because we write the number out when we talk about numbers in writing. With Regular Languages, we also have an alphabet. We most often use the greek letter sigma to denote an alphabet. So if we were to attempt to argue that the English language is Regular, we would start by defining its alphabet, which we just did.

That being said, we need to be careful. Regular Languages are not nearly as restrictive as the alphabet for English. In fact, the English alphabet is simply a subset of all the symbols that can be used with Regular Languages.

Symbols

This raises the question of what are symbols? Symbols, in a general sense, are anything. They could be letters like you see with the English language or even greek letters. Technically speaking, you can even define a smiley face as a symbol.

State

The concept of state is often difficult to teach. The formal definition is along the lines of “The particular condition that someone or something is in at a specific time.” Once you understand the state, that becomes enough, but many struggles to understand what that exactly means.

To avoid confusion, let’s look at 2 examples of states.

The first is really simple, something that we use every day, a light bulb. When you first walk into a room, the light is off. This is to say that the state of the light bulb is off. When we turn it on, the state changes to on. That means the light bulb has two states on and off.

Another example that is a bit more complex is a text document. When you begin a text document, the state is empty. Let’s call this state 0. When we press a key, we change the text document’s state to another state. The question is what state. We have many keys to choose from, typically 101. This means we have at least 101 states that we can transition to, and then from each of those states another 101 states, and so on. Machines like this have tools that we will later discuss actually to be able to decipher all of the states. For now, we move on to what these machines are, Finite Automaton.

Finite Automaton

A Finite Automaton, often abbreviated as FA, is an important took regarding Regular Languages. Finite Automaton is a device that holds 6 important pieces of information. Before continuing that thought, I would like to point out that most people will say 5 pieces of information, but John Conway proposed the idea of 6 in his book Regular Algebra and Finite Machines. Conway’s idea sheds light on ideas that introductory students often feel are missing.

These 6 pieces of information include an alphabet, a set of states, a set of accepting states, a set of output functions, a transition function, and finally, an initial state. We have discussed alphabets and states, so let’s break down the other components next.

The set of accepting states is a set of states that we say that if reached and maintained, the machine is in an accepting state. What that exactly means is up to the interpretation of the machine’s design.

Next up is the output function that takes a state and performs a task based on your state. So if we look to the light bulb with the first state of off, when off is passed to the output function, it does nothing. But when on is passed to the output function, the function will send power to the lightbulb.

Following the output function we have, the transition function is arguably the most important part of this machine. This function takes a state and a symbol and outputs the next state. In the case of our second example, we would give state 0 the letter a then we would move to some state. The transition function dictates that move.

Lastly, we have the initial state. This tells us where to begin. In the case of the light, we said the first state was off. In the case of the text editor, the first state was 0. These are their initial states.

Regular Languages

With all of the background defined, we can now express Regular Languages more precisely. A Regular Language is any language that can be represented with a Finite Automaton. That means when we started with the English language, we were already talking about a Regular Language. In theory, you can construct the entire English language as a Regular Language. It would not be an easy task and pretty much a pointless one but you could do it.

We say that the English language is one of many Regular Languages. That means that Regular Languages are simply a set of languages. This also means that Regular Languages are closed under normal set operations. We can Complement, Concatenate, union, intersect, difference, so on and so on. This is important for a theory student to know, and as we move forward, you will see that these closure properties dwindle as you go up the chain of language types.

Why are Regular Languages important to us?

Computational Theory has built us the computers we run on today. Every computer is a Turing machine, and the Turing Machine would have never been discovered without Regular Languages.

If the history isn’t important to you, try this one. Regular Languages are used to define every single programming language. It’s the first step of developing a programming language. This means that without them, we couldn’t do any of the complex things we do today with computers.

If that STILL isn’t enough for you! Programmers on a daily basis use regular expressions to simplify programs, store data, search databases, analyze the human genome, and so much more! Without regular expressions, the world we live in today would cease to exist, and we would fall back to a life with little to no automation.

Books

Here is a list of books that taught me a great deal about Computational Theory. If there are any books that you have benefited from, feel free to share them in the comments and help grow this list!

Introduction to Languages and the Theory of Computation, By John Martin
Regular Algebra and Finite Machines, By John Conway
Introduction to the Theory of Computation, By Michael Sipser