A Guide to Better Understand Lexical Analysis

Lexical Analysis

Terms to know


A source program is the set of files, written in the source language, that is given to a compiler to be translated into another language.

Lexical Analysis is the first stage of the compilation process. During lexical analysis, the source program is given to a lexical analyzer, which converts it from a string of characters into a sequence of tokens.

A Token is a data structure that represents one lexical unit of the source program. Typically this is a two-tuple of a token name and an optional attribute value. The attribute value is used when we need to keep more information about the token, such as the specific identifier name or integer value that was matched.
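As a minimal sketch of the two-tuple described above, a token could be modeled in Python as follows. The token names (ID, ASSIGN, NUM) and the sample input are assumptions chosen for illustration, not a fixed standard.

```python
from typing import NamedTuple, Optional, Union


class Token(NamedTuple):
    """A two-tuple: a token name plus an optional attribute value."""
    name: str                                    # e.g. "ID", "NUM" (hypothetical names)
    attribute: Optional[Union[str, int]] = None  # extra information, if any is needed


# One possible token sequence for the input `count = 42`:
tokens = [
    Token("ID", "count"),  # attribute holds the specific identifier
    Token("ASSIGN"),       # the name alone is enough, so no attribute
    Token("NUM", 42),      # attribute holds the integer value
]
```

Only the identifier and the number carry an attribute here, because the token name alone does not say which identifier or which value was seen.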

A Pattern is a description of some sequence of symbols. Patterns typically come in the form of regular expressions.

A Lexeme is a sequence of symbols found within the source program that matches a pattern.
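To make the relationship between patterns and lexemes concrete, here is a small sketch using Python's re module. The pattern table and the helper name find_lexemes are hypothetical; a real language would define its own patterns.

```python
import re

# Hypothetical patterns: each token name is paired with a regular expression.
PATTERNS = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),  # whitespace is matched but never reported
]

# Combine the patterns into one regular expression with named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in PATTERNS))


def find_lexemes(source):
    """Return (token name, lexeme) pairs for each pattern match in source.

    Note: finditer silently skips characters that match no pattern;
    a real lexer would report those as errors.
    """
    pairs = []
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            pairs.append((match.lastgroup, match.group()))
    return pairs


print(find_lexemes("total = total + 1"))
```

In this sketch the lexeme "total" appears twice in the source, and both occurrences match the same ID pattern.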

Lexical Analysis


Lexical analysis is the first phase of compilation. During this phase, the lexical analyzer scans the source program and produces a sequence of tokens.

Figure 1: Process of lexical analysis

Lexical analysis is sometimes referred to as tokenization, since it primarily deals with turning source code into tokens. In practice, a lexical analyzer is often called a lexer or tokenizer.

Lexical analyzers work hand in hand with parsers and symbol tables to prepare the program for semantic analysis, the phase that follows parsing. In practice, lexers provide tokens to the parser via a function or method called getNextToken(). This lets the lexer produce tokens on demand, as the parser needs them, rather than tokenizing the whole program up front.
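The on-demand idea can be sketched as a small lexer class whose get_next_token method (a Python-style spelling of getNextToken()) scans only as far as it needs to produce one token. The Lexer class, its token names, and the tiny expression grammar are assumptions made for this example; the loop at the end stands in for a parser.

```python
import re

# Hypothetical token patterns for a tiny expression language.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/=()]))"
)


class Lexer:
    def __init__(self, source):
        self.source = source
        self.pos = 0  # how far into the source we have scanned

    def get_next_token(self):
        """Scan just far enough to return the next (name, lexeme) pair."""
        if self.source[self.pos:].strip() == "":
            return ("EOF", None)  # only whitespace (or nothing) is left
        match = TOKEN_RE.match(self.source, self.pos)
        if match is None:
            raise SyntaxError(f"unexpected input near position {self.pos}")
        self.pos = match.end()
        return (match.lastgroup, match.group(match.lastgroup))


# A stand-in for the parser: it pulls one token at a time, only when needed.
lexer = Lexer("x = 3 + 4")
token = lexer.get_next_token()
while token[0] != "EOF":
    print(token)
    token = lexer.get_next_token()
```

Because the lexer keeps only its current position, nothing past the requested token is scanned until the caller asks for it.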

Figure 2: Interaction between the Lexical Analyzer, Parser, and Symbol Table

Lexical analyzers are often not written by hand in an ad hoc manner; instead, they are produced by a lexical analyzer generator. To fully understand lexical analysis, we need to understand these generators. Continue reading about Lexical Analyzer Generators in the Compiler Construction course and build your skills in the subject!

Books

Listed below are a few of my favorite books that taught me a fair amount of compiler construction. If you have books that benefited you on this topic, please share them in the comments below and expand this list!