Optimization of DFA-Based Pattern Matchers

Minimizing the Number of State of a DFA

There can be many DFA's that recognize the same language. Not only do these automata have states with different names, but they don't even have the same number of states. If we implement a lexical analyzer as a DFA, we would generally prefer a DFA with as few states as possible.

Rules for minimization

For states and in a finite automata, they can be merged if and only if for all input symbol , states and both have transitions on to

non-final states
the same final state or merged final state

Example

Minimize the following DFA transition graph

unmin-dfa

In the following table, each entry shows whether two sets can be merged, while the upper-right triangle is redundant.

	a	b	c	d	e	f	g
b		—	—	—	—	—	—
c			—	—	—	—	—
d				—	—	—	—
e	✅				—	—	—
f				✅		—	—
g							—
h		✅

Consequently, we get these new state

min-dfa

More NFA-to-DFA Conversion

Consider the regular expression, NFA transition graph and the corresponding transition table

(+|-)?(d+|d+.d*|d*.d+)

signed-real-number-nfa

	+	-	.	d	ε
S	A	A			A
A				B, C	E
B				B	F
C			D	C
D				D	F
E			G	E
F
G				H
H				H	F

Using subset construction

signed-real-number-dfa

	+	-	.	d
S,A,E	A,E	A,E	G	B,F,C,E
A,E			G	B,F,C,E
G				H,F
H,F				H,F
B,F,C,E			D,F,G	B,F,C,E
D,F,G				D,F,G
D,F,H				D,F,G

Minimize DFA using the technique above

Since and , we can merge and into , which producing

	+	-	.	d
S,A,E	A,E	A,E	G	B,F,C,E
A,E			G	B,F,C,E
G				H,F
H,F				H,F
B,F,C,E			D,F,G,H	B,F,C,E
D,F,G,H				D,F,G,H

TIP

Now, the next minimization is tricky. For and , we can merge them if and only if they're merged.

In DFA minimization, when the effect is the cause, we shall always assume that the minimization stands, i.e., and can be merged!

After they're merged, we get.

	+	-	.	d
S,A,E	A,E	A,E	G	B,F,C,E
A,E			G	B,F,C,E
G				D,F,G,H
B,F,C,E			D,F,G,H	B,F,C,E
D,F,G,H				D,F,G,H

Tidy it up by renaming the merged state with the leading state

	+	-	.	d
S	A	A	G	B
A			G	B
G				D
B			D	B
D				D

signed-real-number-dfa-min

Eliminating Empty Cycles and Paths

We've seen how to transform the NFA into the DFA, however, the number of states in the DFA may grow exponentially with the number of states in the NFA. So we should do some processes on the NFA before using subset construction.

Eliminating empty cycles

Whenever there is an empty cycles, a state within the cycle can be eliminate. A empty cycle consists of solely two or more empty transitions. For example, the graph below contains two empty cycles, C-D-A and C-D-E. The gray state indicate that it can be eliminated.

empty-cycle-1

empty-cycle-2

empty-cycle-3

Optimization of DFA-Based Pattern Matchers #

Minimizing the Number of State of a DFA #

Example #

More NFA-to-DFA Conversion #

Using subset construction #

Minimize DFA using the technique above #

Eliminating Empty Cycles and Paths #

Eliminating empty cycles #