Software Development
Blogs and Discussion
developer.*

Request to my readership for clarification concerning C#

OK, I can see why the designers of C# did not implement the preprocessor of C++ and C. It creates obfuscated code, and its functionality, as I learned several years ago from Number One Son, is largely replaced by objects.

For me, furthermore, it doesn't go far enough. This is because the preprocessor of PL/I was Turing complete in that you could create loops in the preprocessor and, from a logical standpoint, the preprocessor was a full "computer", allowing you to ship seriously customizable (and, seriously obfuscated) solutions.

But I find the string model of C# supports the concept of "scan for any one of a SET of characters" that is found in C strspn, but FAILS to support (as far as I can determine) the concept of "scan for any one of the COMPLEMENT of a SET of characters" as seen in strcspn.

"All characters" form a finite set in ASCII, in extended ASCII, in Unicode, in DBCS and even in the mystical set of all possible characters, which my homey Bjarne Stroustrup says can be represented by an unsigned int. Why, on modern platforms, if you had to expand beyond THAT, you could represent all character sets in the local galaxy cluster using a modern long unsigned: all 2^64-1 of them.

Therefore, any complement of any set of characters is finite, and the string model could (but apparently does not) support the concept of character set complement.

The motivation for verify() in utilities.DLL as shipped with my book was to provide the orthogonal facility, provided in the era of small universal character sets including ASCII and EBCDIC, to scan for the complement of a small character set such as "all non letters". verify() knows about Unicode, and uses a rather brute force approach.

OK, I understand that the UNICODE complement of "all letters" is large, and that a crude implementation should not generate the complement as a string. But it is simple to generate the complement as an array of "runs" of adjacent characters, where each run is represented by its start character and length. That the need for such a search is small in nearly all cases simply does not, in my view, justify the failure of orthogonality.
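
The "runs" idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names CharRun, complement_runs and in_runs are hypothetical (they are not taken from utilities.DLL or any library), the input set is assumed to be sorted with no duplicates, and the universe is assumed to be the code points 0 through max_code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One run of adjacent code points: start, start + 1, ..., start + length - 1. */
typedef struct {
    unsigned long start;
    unsigned long length;
} CharRun;

/*
 * Hypothetical helper: writes the complement of a character set, within the
 * universe 0..max_code, as an array of runs.
 * Assumes set[] is sorted ascending with no duplicates and out[] has room
 * for n + 1 runs. Returns the number of runs written.
 */
size_t complement_runs(const unsigned long *set, size_t n,
                       unsigned long max_code, CharRun *out)
{
    size_t count = 0;
    unsigned long next = 0;  /* first code point not yet accounted for */

    assert(out != NULL);
    assert(max_code < ULONG_MAX);  /* keeps set[i] + 1 from wrapping */
    for (size_t i = 0; i < n; i++) {
        assert(set[i] <= max_code);
        assert(i == 0 || set[i] > set[i - 1]);
        if (set[i] > next) {  /* gap before this member is a run */
            out[count].start = next;
            out[count].length = set[i] - next;
            count++;
        }
        next = set[i] + 1;
    }
    if (next <= max_code) {  /* everything after the last member */
        out[count].start = next;
        out[count].length = max_code - next + 1;
        count++;
    }
    return count;
}

/* Returns 1 if code point c falls inside any of the runs, otherwise 0. */
int in_runs(const CharRun *runs, size_t count, unsigned long c)
{
    for (size_t i = 0; i < count; i++)
        if (c >= runs[i].start && c - runs[i].start < runs[i].length)
            return 1;
    return 0;
}
```

A set of n characters never produces more than n + 1 runs, so the complement of a small set stays small however large the universe is. The "every other character" worst case is exactly the one where the run count approaches the size of the set.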

The worst case would suck, if the caller of strcspn/verify used every other Unicode character: but note that "bad" to worst cases occur when the caller needs his head examined.

I post this call without due diligence in the sense of six hours of research into all .Net facilities to find if I am just wrong. This is because this blog needs to be more participatory and more readers need to toot their own horn by exposing my bone ignorance of the C# programming language...for somewhat the same reason that on French Hill in San Francisco, and in gay Paree, I speak demotic French in order to learn: for the same reason I dicker with cab drivers at Lo Wu in a Chinese of 0..20 words.

Learning a programming language, in my view, needs to be disambiguated completely from learning computer science: Princeton University does not teach ANY classes in programming languages *per se* for this reason, and when I was there, I taught C to an interesting assortment of Pascal-trained high school geniuses and prematurely aged graduate students. In industry I find the two issues are systematically merged.

This prevents the learning of a programming language as a CRITICAL venture, reminiscent of the bull in the china shop, in which the tyro learns as does the Marine recruit: by getting mad at the DI. In programming languages, you LEARN Fortran, in my experience, by learning its ridiculous overscoping. You LEARN PL/I, in my experience, by discovering just how obfuscated PL/I can be.

But in corporations, the critical gesture (including expressions of primitive dismay emanating from cubes) is strongly discouraged, with the result that when Dilbert clones "learn" new things, they perforce must do so in fear and awe, celebrating, in song and story, the putative genius of the designers while interspersing their threnody with choruses, that sing of their own inability, with the hypocrisy of the corporate world.

The false humility means, in my direct experience as an instructor, that many ace programmers (who in fact learn in a somewhat autistic fashion to perform in fact well, on the job and in its terms alone) NEVER LEARN the deepest semantic aporias (deficiencies, gaps) in their platform!

For example, in a recent class in VB.Net in Fiji, I presented a For statement as For intIndex = 1 To intEnd - 1. A very bright student, who was in my opinion almost an expert in VB COM, said, that's inefficient.

Why, I said.

"Because ya repeatedly evaluate intEnd - 1, Professor", he said (I always feel like the guy who plays piano in the whorehouse when computer students call me Professor).

But in ALL editions of VB, an effective standard is that the For limit (unlike the corresponding limit in C, C++, Java, and languages of the C family) is evaluated "by value": it is precalculated and read-only for the duration of the For.
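
The difference can be made visible with a small C sketch that counts how often the limit expression runs. Everything here is illustrative: limit(), count_c_style() and count_vb_style() are hypothetical names, and the second function only emulates VB's evaluate-once rule in C. It is not VB code.

```c
#include <assert.h>

/* Number of times the limit expression has been evaluated. */
static int limit_calls = 0;

/* Stand-in for the expression intEnd - 1; counts each evaluation. */
static int limit(int end)
{
    limit_calls++;
    return end - 1;
}

/* C-style loop: the condition, and so limit(), runs before every pass. */
int count_c_style(int end)
{
    int iterations = 0;

    assert(end >= 1);
    limit_calls = 0;
    for (int i = 1; i <= limit(end); i++)
        iterations++;
    return iterations;
}

/* VB-style loop, emulated: the limit is computed once and then held fixed. */
int count_vb_style(int end)
{
    int iterations = 0;

    assert(end >= 1);
    limit_calls = 0;
    const int last = limit(end);
    for (int i = 1; i <= last; i++)
        iterations++;
    return iterations;
}
```

Both loops run the same number of times. What differs is how often limit() is called: once per condition check in the C-style loop, and exactly once in the emulated VB form.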

He felt, however, that it is better to be belt and suspender "structured": he felt it is better to be safe than sorry: he felt that give them an inch and they will surely take a mile (boys). Which is to say that sadly in light of the genius of this guy (whose installers were things of beauty), his thinking was informed by what Marcuse calls surplus repression.

A CRITICAL learning of VB, one based on computer science, would upon being presented with the For, interrogate the language as to what sort of evaluation was in use: by value or "by reference". But in the corporate world, we have to hide, not only our love, but that form of anger, which is criticism...and which George Sand said is like love in the first place.

As instructors, we're all familiar with "Otto", the guy, either in the rear of the class or the first row, who asks "silly" questions. At university, I have seen "Otto"s flower. But in the corporation, I have seen them systematically humiliated in classes, staff meetings, and code reviews, and their treatment, even in light of their genuinely warped personalities, makes me sick. It is said that maturity is getting shut of the idea that life can be like university, or Spring Break for that matter: genuine maturity, for me, consists in a more complex idea: life is neither a seminar, nor Spring Break, but could be and is not.

This is why I am displaying what may well be mere ignorance, and challenging my C sharp homeys to expose it.

The designers of C# certainly seem to have learned from their fathers' mistakes in the abandonment of the preprocessor. Furthermore, a true preprocessor can and should be written completely independent of the preprocessed language. But here, as far as I can tell, C# is flawed, and I'd like to start a discussion.


Enter the Orthogon

Well, to be frank I understand the functionality you say is lacking, and it sounds cool. But I can't think of any situation I have ever encountered (or possibly will ever) where I'd need something like that. Can't say I really miss it. Give a concise example of how you've used it and how C#'s exclusion of said feature affects your acceptance or lack thereof.

Inefficiency and Computer Language

I'm with Edward in the importance of trying out new languages in life and in software (if there's really a difference). And I've also tried out my microscopic Chinese vocabulary on cab drivers. I once repeated the Chinese word for "train station" (it's really a mouthful) five or six times to an incomprehending cabbie outside of Taipei. Finally I just said "train station" in English. He understood right away and laughed and laughed (though the Chinese are supposed to be soooo polite) at my silly way of saying the word in his language.

When it comes to software development languages, I'm not sure exactly how necessary the sort of deep understanding one would require to know off-hand whether variables are passed by-value or by-reference into a for loop really is. In the case of VB, the by-val approach is more forgiving, and many developers in Java have to learn the hard way that if a variable evaluated in the for clause is modified in the loop, craziness can result.

Well, obviously it's important, then, but I think less important than being aware of the finer points of a particular language is being aware of the finer points of writing good code, and these finer points are universal.

I'm not going to look up the exact references, but Steve McConnell, in Code Complete, regularly disdains placing an emphasis on efficiency over clarity. In the for loop example cited by Edward's student as inefficient, even if he had been right on the facts he would have been wrong on the importance of the fact. Today's computers are (and have been for many years) so fast that an inefficiency like this would have to be a part of a very very big loop (on the order of millions or even billions of iterations) for there to be any noticeable delay.

The real issue is the clarity of the expression "intEnd - 1". Now, whether this is the clearest way of expressing this value in the method in question, I have no idea. But that's the point that we, as developers, need to focus on.

If efficiency were the only factor at play, we'd never create objects or perform method calls. More efficient to just copy all of the code into one super method. In fact, we might have to write all of our code as memory pointers expressed as 1s and 0s.

Reply to the Orthogon and also to Mac MacGrogan

Orthogon-man, you need the complement of the set when the clearest way of expressing your purpose is "I need to get to the end of a string of characters consisting of characters from a small set of characters". In my experience, this type of left-to-right moseying occurs all the time in programming.

[To Mosey is to advance, after the great Union General Cyrus Washington Booker Mosey who at the Battle of Quaker Meeting said to his troops, "men, let us Mosey forth without fear and meet the Rebs. On, Wisconsin! Union or Death!"]

Let's be serious. I am concerned with programming what I mean.

Mac: my experience is that understandability, as a psychological predicate having to do with the phenomenology of code, cannot be treated as a strictly technical issue.

Wow, wash my mouth out with soap after that one.

What I mean is that understandability, in an irreducible way, depends on reader response.

A mathematically oriented programmer will be delighted to see an expression in a for loop. A non-mathematical programmer will prefer to see that expression broken down.

Since I think programming is mathematical my tendency will be to place the expression in the For loop in VB and in Java, and, in languages of the C and Java family, simply avoid modifying any variable in the for terminus.

This is a normative style but in fact many programmers use it. They code what they think their readers SHOULD understand. If it so happens they share cultural background with their mates, then their code is celebrated by the lads as good code. But if their code seems, to the homes, as in any way fancy-schmancy and reflective of a foreign and estranged practice, then the coder is regarded at best as a sacred monster.

At worst, the brutal demotic enforcement of "proper" speech enters the code review, which becomes a sort of flaying of Marsyas based not on "objective" criteria but on local power structures.

Practices, which make life easier for the lads, are normed in an absolute sense. The question is whether life should be made easier for the lads as a general rule.

But best practice in Java-like languages is probably to precalculate the expression and even declare it, where possible, as const and therefore read-only.
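
A small C sketch of why hoisting the limit matters in the C family. The function names live_limit and hoisted_limit are hypothetical, and the end-- in the loop body is a contrived stand-in for any code that happens to modify a variable used in the loop limit.

```c
#include <assert.h>

/* Limit re-read on every pass: shrinking end in the body cuts the loop short. */
int live_limit(int end)
{
    int iterations = 0;

    assert(end >= 1);
    for (int i = 1; i <= end - 1; i++) {
        iterations++;
        end--;  /* body modifies the variable used in the limit */
    }
    return iterations;
}

/* Limit hoisted into a const: later changes to end no longer affect the loop. */
int hoisted_limit(int end)
{
    int iterations = 0;

    assert(end >= 1);
    const int last = end - 1;  /* computed once, read-only afterwards */
    for (int i = 1; i <= last; i++) {
        iterations++;
        end--;  /* same modification, now harmless to the loop */
    }
    return iterations;
}
```

Starting from end = 9, the first loop stops after 4 passes because the limit keeps moving. The second runs the full 8 passes that the expression promised at the point where the loop began.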

Basically, I think humanity needs a Character Set Object capable of representing, independent of any one standard, a set of integer values which represents a character set.

Orthogonality

In general, if you provide function a, and it is easy to provide its inverse, then you should, as a general, local-praxis-independent way of delivering true "clarity".

It is impossible to so write a document that all ordinary people world wide can understand it in their local patois: this is trivially true.

"Understandability" of code and ordinary text is an easily corruptible predicate which in Visual Basic praxis is used to excuse all sorts of bugs, potential bugs, and ugliness because "understandability" is interpreted introspectively as "I understand what I code".

It should be replaced by universality as seen in the writings of Stroustrup which MEANS that today, the 1970s discourse of "understandability" needs to be globalized.

In the 1970s, a common accusation was "dis code is written in Sanskrit". "Dis code is a Chinese Fire Drill".

Today, my homeys on the job in China at best only appreciate the image of coordinated mass energy constituted in the phrase "Chinese Fire Drill", and I ran the Shenzhen Fun Run last October. It was a Chinese Fire Drill only in the sense that out of a number of runners (at least 10000!) that would have created chaos in America, and out of what appeared to be in fact Chaos, a 10K run emerged with a great deal of order...in part because they gave everybody the same running attire.

Therefore, in search of that understandability constituted in universality, "orthogonality" is for me an important guide. I cannot be certain that the VB trained reader will initially see the need for the notion of "scanning, for any one of a set of characters" when VB has traditionally provided only the notion of "scanning, for a string"...which is completely different.

But once I see the need for the positive scan, the relative ease of the inverse scan almost forces its implementation, based not on localized cultural considerations but metaphorically by the very idea that if I can get from A to B I should be able to make the reverse journey.

strcspn

After reading your last couple comments, I thought for sure I had missed something in your original post. So then I went back and read it again. Then I went and Googled the function you refer to (strcspn) and got this definition:

[quote]The strcspn() function computes the length of the maximum initial segment of the string pointed to by s1 that consists entirely of characters that are not in the string pointed to by s2.[/quote]

So the way I understand this, in usage it would look something like this (excuse the C# pseudo-code):

string _s1 = "Enter the 49 gates of uncleanliness!";
string _s2 = "Xpg2!@l";
// String.strcspn is hypothetical here, not a real .NET method
int _iLength = String.strcspn(_s1, _s2);

The result of _iLength would then be 13, correct? That's the length of the initial segment in _s1 before any character in _s2 (in this case "g") is encountered. And you are right that there is no corollary in C# for this (though I can't speak to whether or not it's included in the 2.0 Framework). This is a little bit different than my original understanding of what you were describing, but I'll not muddy the waters by going into that.
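
The arithmetic can be checked against the real C library. In standard C, strcspn returns the length of the leading segment that contains none of the characters in the set, and strspn returns the length of the leading segment made up only of characters in the set. The wrapper names below are hypothetical and exist only to label the two directions. On the .NET side, String.IndexOfAny appears to cover the same ground as the first function whenever a match exists, though it returns -1 rather than the string length when nothing matches. That is worth confirming against the framework documentation.

```c
#include <assert.h>
#include <string.h>

/* Length of the leading part of s that contains none of the characters in set. */
size_t prefix_without(const char *s, const char *set)
{
    assert(s != NULL && set != NULL);
    return strcspn(s, set);
}

/* Length of the leading part of s made up only of characters in set. */
size_t prefix_within(const char *s, const char *set)
{
    assert(s != NULL && set != NULL);
    return strspn(s, set);
}
```

With the strings from the example above, prefix_without gives 13, which is the index of the first "g".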

Suffice it to say, I stand by my original post in that I've never had the pleasure to encounter a situation where I've needed something like this though I'll concede its use as plausible. However the way to do it in C# would be to write a function yourself to parse _s1 into an array of type char and loop through each character until you found a match...or not...which could be a total pain if the string you're scanning is on the order of, say, Moby Dick.
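
For what it is worth, the hand-written loop need not be painful even on a long string. The following is a minimal sketch in C rather than C#, with a hypothetical function name, and it assumes single-byte characters. It marks the set in a lookup table first, so the scan is one pass over the text however long it is, and a single flag selects either direction of the search.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical scan_prefix: walks s from the left and stops at the first
 * character whose membership in set equals stop_on_member.
 *   stop_on_member = 1: stop at the first character that IS in set
 *   stop_on_member = 0: stop at the first character that is NOT in set
 * Returns the index where the scan stopped, or the length of s if it never did.
 * Assumes single-byte characters.
 */
size_t scan_prefix(const char *s, const char *set, int stop_on_member)
{
    unsigned char in_set[UCHAR_MAX + 1] = {0};
    size_t i;

    assert(s != NULL && set != NULL);
    /* Mark every member of the set once, up front. */
    for (; *set != '\0'; set++)
        in_set[(unsigned char)*set] = 1;
    /* Single pass over the text with a constant-time membership test. */
    for (i = 0; s[i] != '\0'; i++)
        if (in_set[(unsigned char)s[i]] == (stop_on_member != 0))
            break;
    return i;
}
```

Once the table exists, the complement search costs nothing extra: it is the same loop with the test inverted.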

Dunno, there may be a more elegant approach but I'd have to think on it. I wouldn't consider this omission justification for labelling C# as "flawed" though. You should pose your question to Eric Gunnerson and see if he can speak to it.

http://blogs.msdn.com/ericgu

strcspn redux

I will ask Eric.

But you confirm my point.

I don't think it's possible for a language designer to claim (by implementing strspn and not its mathematical inverse) that he has any knowledge of what programmers need, for the very good reason that he is both sensing and creating needs.

It is in fact common to want to find the first non-letter, the first non-digit, etcetera and even in the era of Unicode, the boundaries of the universal character set are known.
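
One way to express "first non-letter" or "first non-digit" without writing out either the set or its complement is to pass a predicate. A minimal sketch in C follows, with a hypothetical function name, using the standard ctype.h classifiers. It assumes single-byte characters in the current C locale, so it shows the shape of the idea rather than a Unicode-aware implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical scan_while: index of the first character of s for which
 * pred returns 0, or the length of s if pred holds for every character.
 */
size_t scan_while(const char *s, int (*pred)(int))
{
    size_t i = 0;

    assert(s != NULL && pred != NULL);
    /* The cast keeps the argument in the range the ctype functions accept. */
    while (s[i] != '\0' && pred((unsigned char)s[i]))
        i++;
    return i;
}
```

Calling scan_while(text, isalpha) finds the first non-letter and scan_while(text, isdigit) the first non-digit. The character class is named once and the negation lives in the scan.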

The practical problem I think the C# designers faced is that, unlike with ASCII, it is not simple and efficient to represent character set complements by means of a string.

However, I maintain that mathematics is probably the only universal grammar in the pragmatic sense. In a given culture, a programmer may more naturally think in terms of searching for "anything except a favored set of characters".

Furthermore, providing only strspn will bias the code, of lazy coders, into providing false representations of inverse sets as strings of characters, easily available on their keyboards.

All views expressed by authors, bloggers, and commentors are their own and do not necessarily reflect the views of developer.* or its proprietors.

All content copyright ©2000-2005 by the individual specified authors (and where not specified, copyright by Read Media, LLC). Reprint or redistribute only with written permission from the author and/or developer.*.

www.developerdotstar.com