Relating software refactoring to algebra factoring

Saturday, March 12, 2011

Refactoring is a fairly common word in software development. Most people in the software industry have some ideas of it. However, it was not clear to me where the term comes from and how it relates to the factor we learnt in algebra. A Bing search indicates that the word Refactoring origin from Opdyke’s Ph.D. thesis used for restructuring in object oriented software, but is not clear how it relates to the factor we normally know. So I attempt to make some sense out of it after the fact.

In algebra, we know a factor is part that participated in a multiplication. For example, if C = A * B, we call A and B factors of C.

Factors are often used to simplify computation. Taking the reverse distribution identity of multiplication as an example, C * A + C * B can be rewritten as C * (A + B). This will improve computational efficiency if multiplication is far more expensive than addition. C in this case is called the common factor in the two terms, C * A and C * B. If there are no common factors between A and B, we call C the greatest common factor (GCF) the two terms.

To identify the GCF of two numbers, we usually find all the prime factors of numbers and include the greatest common count of those prime factors. For example, if we want to find the GCF of 72 and 120, we would like 72 as 2³ * 3² and 120 as 2³ * 3¹ * 5¹. Then the GCF would be 2³ * 3¹ = 24. (Note that in real computer program we usually calculate GCF using Euclid’s algorithm).

Now let us see an example of analogy to algebra factoring in Boolean algebra. There is an identity (A and B) or (A and C) = A and (B or C). If we relate “and” to * and “or” to +, the above identity would be very similar to distribution identity in algebra.

In software refactoring, we usually identify the longest common sequence that is duplicated (let me set parameterization aside for simplicity) and try to reuse the sequence as a single unit. How do we find an analogy to algebra? Let us consider an instruction in a sequence as a function that take the state of computer as input and return the modified state. Then a sequence of instructions can be written as f1(f2(f3(…fn()))). If we replace the () symbol with *. Then we can write above as f1 * f2 * f3 * … * fn. Now we use + to denote any situation that we need branching. For example, we could write if (cond) { A } then { B } as A + B. Now let use apply this notation to a more complex example:

if (cond) {
	A;
	B;
	C;
} else {
	D;
	B;
	E;
}

A to E above are block of code. So using the above notation, we can rewrite it as A * B * C + D * B * E. Now if we try to factor this expression, we get (A + D) * B * (C + E) (Note that this is a bit different to normal Algebra factoring). Now we expand the above expression into code, we would get something like:

if (cond) {
	A;
} else {
	D;
}

B;

if (cond) {
	C;
} else {
	E;
}

That is in fact one of fairly common types of refactoring.

No Comments