logo
Published on developer.* Blogs (http://www.developerdotstar.com/community)

Brian Kernighan's Beautiful Code?

By Edward G Nilges
Created 2007-12-26 01:06

When I had started working at Princeton University's Information Centers in 1987, my boss at a reception for alumni student Info Center workers said to me, Ed, this is *Brian Kernighan*.

It was all I could do to avoid making the yi er san kow-tow like Wayne and Garth in Wayne's World when they meet Alice Cooper, saying, I'm not worthy I'm not worthy I'm scum.

I'd been very impressed with Kernighan's work, in particular his literary and critical "take" on programming style in books he wrote in the 1970s.

However, I was disappointed by his essay "A Regular Expression Matcher", in Beautiful Code (O'Reilly Media, Sebastopol CA 2007) and moreover by the book.

Kernighan's essay, and the book in general, is post-Dijkstra, post-Algol, reflective in fact of an American centric anarcho-conservatism in which people retreat behind the flimsy yet obdurate walls of their favorite platforms without acknowledging that this serves corporate needs exclusively.

Mere structs are proudly labeled objects in Web hacks. In general, people are proud to be members of in-groups, not to evolve a global computing language. Dijkstra isn't even in the index despite the fact that he practically invented the very idea that code needed, as what Dijkstra called a matter of life and death, to be elegant syntactically and semantically over a lifetime of versions.

Turning to Kernighan's example of "beautiful" code, written in one hour by Rob Pike in 1998, we find the code I've placed at the bottom of this post.

[Note: Pike isn't in the index either. Oh well, he's just a programmer, right? The "programming style" movement that Kernighan helped start recentered the mere programmer as being a creative and intelligent Subject, but a new conservatism relegates him along with Dijkstra to a common grave.]

The privatized industrial *milieu* in which I fear Kernighan operates unquestioningly, preparing his students at Princeton for a lifetime in which there's never owing to profit pressures enough time to do it right, but plenty of time, and job opportunities, to do it over, is shown by the fact that to Kernighan, it's a Good Thing that Rob only took an hour.

Paradoxically, perhaps dialectically, the programmer's time, by becoming priced so high in an imaginary market, becomes worthless as becomes his authorship. A constant feature of programming is the after hours free labor of "valuable" programmers, where the worthlessness of their free time varies inversely to the market price of their billable time. I discovered at one firm that one of the Most Valuable Players had become an MVP by not ever reporting the extra hours he worked, even though he would have been paid for those hours by company policy. Rob's product is valued because it isn't "pretentious", he didn't have the right to work longer, only harder, producing, as we'll see, a flawed piece of code for posterity.

The most serious flaw is that it uses the test parameter of match as the index to the test "string" (which is only painfully a modern string, as we'll see, whence the scare quotes).

Yes, Brian, I understand that in C the programmer can use a name as a Von Neumann address to a byte now and forever, amen. I understand that test is passed as are all variables in C by value, therefore on a stack implementation (yes, the only possible implementation) the programmer is "free", o happy day, to use the stack copy, test itself, oh frabjous day, as the index itself.

"Freedom, hoy-day, freedom!" - Shakespeare, The Tempest

But I question the right of the C programmer to use something so idiomatic as confusing a value parameter, something which is not normally modifiable, as modifiable.

Brian, you cannot have your cake and eat it too, even on Christmas Day. You want to use C as a way to speak about Beautiful Code to Beautiful Minds.

But this requires the Beautiful Mind to call back to mind the fact that in most, but not all, stack virtual machines, value parameters on the stack can be slyly, in what I think to be an Ugly way, used as "work" variables.

Unless of course the C virtual machine is a piece of hardware, on which value parameters placed on the stack cannot be modified...or can be only slowly, while new values (function end results) can be pushed quickly. Such a machine, implemented in embedded hardware, is conceivable.

In the interests then of using C as you want to use it, as an Algol style publication language, you should have modified Rob's code to copy test to a private workplace. The "waste" of a cycle would save the time of the program reader.

C, as a language for talking about algorithms, is pernicious because it requires the reader to "understand" too much, viz., that it is a high level assembler language which obscures as much as it shows.

I understand your point that in Java and in C Sharp, the programmer would have been compelled (boo hoo) to use a string object, and this would be perhaps more time costly...at least on an unoptimized platform.

But you follow in the exposition after the code is presented the rule of the Duke of Wellington: never apologize, never explain. I looked for an apologetic explanation to people who don't use C that test may be used as an index in the way you use it.

You also fail to mention that the program doesn't even work for modern strings, only for what we call in .Net, sbyte, strings o' bytes. To call Rob's code in .Net, I have to create an Unsafe interface.

In 1998, it was painfully arguable that the world was still interested in scanning strings o' bytes using regular expressions that are also strings o' bytes.

Missing is Bjarne Stroustrup's Danish internationalism. In 2004 in Shenzen I read code which applied regular expressions to Chinese text in double-byte Unicode.

C programmers claim that their language is unicode-aware because they are, and they can hack it. This isn't the point. The point is information hiding, and the Java and C sharp strings can handle international input: Rob's code cannot, and will be keyed in, as Beautiful Code, by tyro programmers and it will fail. As in the case of the use of the test variable, you don't discuss this issue.

As a relatively minor cavil, I know that it has been very hip, since the days of the PDP8, to use lower case. The problem is that matchhere is ugly because it uses double h in a way that doesn't occur in English: it's Klingon, and why not bite the bullet and use matchHere?

In my opinion, camelCase is a thing of beauty in that it avoids the class bias of Proper case, refusing the first letter a special honor merely because of primogeniture, but preserving needed breaks between words. I understand that MATCH_HERE or match_here would be Coyote Ugly in a use of underscore common in the IBM, PL/I and Rexx tradition, but camelCase is to me an excellent compromise.

Also, why couldn't Rob have returned an index to the place where the regular expression returns, and, to be consistent with zero origin indexing, -1 for no hit? It's not elegant in a real timesaving way to do all the work you do, and return such a thin cold response to callers whose next question will be where the heck does the string start!

It's very confusing to have the code return 1 (affirmative) for regular expression C* in string DD until you remember "zero, one, or more" but there's no helping this. The reader has to understand the basics.

I like the way you show how to use recursion simply in the real world; too many programmers say "ooooo recursion, scarey and time wasting, don't make me think, aaaargh".

But in general, I think your C has become a lingua franca and as such a Frankfurt, a walled city with its own shibboleths.

This to me is contrary to the open, humanistic and critical spirit of Kernighan's early writings on programming style, which deconstructed bad Fortran and showed how it could be rewritten even in Fortran itself.

I may not have realized that my (computing) generation was engaged not in liberating computing from the shackles of privatized idiocy, but merely in an Oedipal destruction of the 1960s generation of the IBM fathers, creating, sad to say, its own shibboleths and orthodoxy...in which even such a brilliant man as Kernighan stays within a discourse boundary in which he cannot explain that "we use the value parameter test to index, don't worry, your copy of test will not be changed because test is called by value".

This is a computing culture dominated by American private corporations which use computing to obscure. I have learned, for example, that the securitized mortgages which are creating a global economic crisis cannot be reverse engineered in such a way as to find the original debtors, and reprice the security by determining the best risks.

This was probably because the programmers were under orders to do things in an hour or less, as Rob Pike was praised for, and they appeared to have discarded the idea of keeping keys in new tranches. I'm not an expert on this, but I have enough experience to somehow intuit that during the Big Party, yuppie scum who were getting rich were bullying programmers to emulate Rob, and discard "unnecessary features", whether international strings, or an audit trail.

Dijkstra is rolling in his grave. While he was alive, there was to my knowledge absolutely no two-way communication between Dijkstra and guys like Brian because in the American corporate context, Dijkstra's honest criticism was a speed bump.

Brian's essay, Rob's code, and the rest of the book is a disappointment, I'm afraid.

Here's the code wrapped in a value class for Microsoft C++ Visual Studio with a simple main().


public value class kernighanRegexC
{
public:
/* match: search for regexp anywhere in text */
static int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
return 0;
}
private:
/* matchhere: search for regexp at beginning of text */
static int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;
}

/* matchstar: search for c*regexp at beginning of text */
static int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instance */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}
};

int main(array ^args)
{
Console::WriteLine((kernighanRegexC::match("C", "DDC")).ToString());
return 0;
}