When a new developer joins a project which is already in progress, there is a steep
learning curve. If the new developer already knows the methodology and programming
language, some of this is reduced. If the new developer already knows the problem domain
fairly well, this also shortens the ramp-up time.
There is often a great deal of artificial curve which is added to a project by decree
or by accident. This has the opposite effect; it increases ramp-up time and can hurt the
new developer’s time-to-first-contribution considerably. And not only the first
contribution, but the next several.
The goal of this rule set is to help avoid creating one type of artificial learning
curve, that of deciphering or memorizing strange names.
The rules were developed in group discussions, largely by examining poor names and
dissecting them to determine the cause of their "badness".
If you can’t pronounce it, you can’t discuss it without sounding like an idiot. "Well,
over here on the bee cee arr three cee enn tee we have a pee ess zee kyew int, see?"
A company I know has genymdhms (generation date, year, month, day, hour, minute
and second) so they walked around saying "gen why emm dee aich emm ess".
I have an annoying habit of pronouncing everything as-written, so I started saying "gen-yah-mudda-hims".
Later a host of designers and analysts were calling it this too, and we still sounded
silly. But we were in on the joke, so it was fun. Fun or not, don’t do that.
It would have been so much better if it had been called generation_timestamp. "Hey,
Mikey, take a look at this record! The generation timestamp is tomorrow! How can that
be?"
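As a small sketch of the difference, here is the same field under a name you can say out loud. The BillingRecord type and the generated_in_future helper are invented for illustration; only genymdhms and generation_timestamp come from the story above.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical record type, invented for this illustration.
struct BillingRecord {
    using Timestamp = std::chrono::system_clock::time_point;

    std::string customer_name;
    // Was: Timestamp genymdhms;  (nobody can say that aloud)
    Timestamp generation_timestamp;
};

// True when the record claims to have been generated after `now`.
inline bool generated_in_future(const BillingRecord& record,
                                BillingRecord::Timestamp now) {
    return record.generation_timestamp > now;
}

// Usage sketch: a record stamped "tomorrow" is flagged.
inline void pronounceable_name_example() {
    const auto now = std::chrono::system_clock::now();
    BillingRecord record{"Mikey", now + std::chrono::hours(24)};
    assert(generated_in_future(record, now));
}
```

The code now reads the way the conversation sounds: "the generation timestamp is in the future".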
Encoded names require deciphering. This is true for Hungarian and other `type-encoded’
or otherwise encoded variable names. To allow any encoded prefixes or suffixes in code is
suspect, but to require it seems irresponsible inasmuch as it requires each new employee
to learn an encoding "language" in addition to learning the (usually
considerable) body of code that they’ll be working in.
When you worked in name-length-challenged programs, you probably violated this rule
with impunity and regret. Fortran forced it by basing type on the first letter, making the
first letter a `code’ for the type. Hungarian has taken this to a whole new level.
We’ve all seen bizarre encoded naming standards for files, producing (real name) cccoproi.sc
and SRD2T3. This is an artificially-created naming standard in the modern world
of long filenames, though it had its time.
This isn’t intended as an attack on Hungarian notation out of malice toward Microsoft or
Windows. It’s a simple rule of simplifying and clarifying names. HN was pretty important
back when everything was an integer handle or a long pointer, but in C++ we have (and
should have) a much richer type system. We don’t need HN any more. Besides, encoded
names are seldom pronounceable (see the rule on pronounceable names above).
Of course, you can get used to anything, but why create an artificial learning curve
for new hires? Avoid this if you can avoid it.
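A rough illustration of the point about C++ types, with made-up names throughout: the Hungarian-style declarations in the comment repeat what the compiler already knows, while the plain names leave the type where it belongs, in the declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hungarian-style, for comparison only (hypothetical names):
//   std::string              strTeamName;
//   std::vector<std::string> vecstrPlayerNames;
//   std::size_t              nPlayerCount;

// Plain names: the declaration carries the type, the name carries the meaning.
struct Roster {
    std::string team_name;
    std::vector<std::string> player_names;

    std::size_t player_count() const { return player_names.size(); }
};

// Usage sketch.
inline void roster_example() {
    Roster roster{"Knights", {"Arthur", "Lancelot"}};
    assert(roster.player_count() == 2);
}
```

If player_names later becomes a different container, no name has to change and no prefix starts lying.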
If the names are too clever, they will be memorable only to people who share your sense
of humor and remember the joke. Will the people coming after you really remember what HolyHandGrenade
is supposed to do in your program? Sure, it’s cute, but maybe in this case ListItemRemover
might be a better name. I’ve seen Monty Python and the Holy Grail, but it may
take me a while to realize what you mean to do.
I’ve seen other similar cutesy namings fail.
Given the choice, choose clarity over entertainment value. It’s a good practice.
Pick one word for one abstract function and stick with it. I hear that the Eiffel
libraries excel at this, and I know that the C++ STL is very consistent. Sometimes the
names seem a little odd (like pop_front for a list), but being consistent will
reduce the overall learning curve for the whole library.
For instance, it’s confusing to have fetch, retrieve and get
as same-acting methods of the different classes. How do you remember which method name
goes with which class? Sadly, you often have to remember who wrote the library in order to
remember which term was used. Otherwise, you spend an awful lot of time browsing through
headers and previous code samples. This is a considerably worse practice than the use of
encodings.
Likewise, it’s confusing to have a controller and a manager and a driver
in the same process. What is the essential difference between a DeviceManager and a
ProtocolController? Why are both not controllers, or both not managers? The name leads you
to expect two objects that have very different types as well as different classes.
We can take advantage of this to create consistent interfaces and simplify learning
dramatically.
Don’t use the same word for two purposes, if you can at all avoid it.
This is the inverse of the previous rule. When you use different terms, it leads one to
think that there are different types underlying them. If I use DeviceManager and
ProtocolManager, it leads one to expect the two to have very similar interfaces. If I can
call DeviceManager::add(), I should be able to call ProtocolManager::add().
Why? Because the name created an association between the two. I expect to see *Manager::add()
now.
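Here is a minimal sketch of that expectation being honored. The class bodies, count(), and add_all are assumptions made up for the example; only the names DeviceManager, ProtocolManager and add() come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Both classes are called *Manager, so both offer the same add()/count().
class DeviceManager {
public:
    void add(const std::string& device) { devices_.push_back(device); }
    std::size_t count() const { return devices_.size(); }
private:
    std::vector<std::string> devices_;
};

class ProtocolManager {
public:
    void add(const std::string& protocol) { protocols_.push_back(protocol); }
    std::size_t count() const { return protocols_.size(); }
private:
    std::vector<std::string> protocols_;
};

// Works with any *Manager because the shared name implies a shared interface.
template <typename Manager>
void add_all(Manager& manager, const std::vector<std::string>& names) {
    for (const auto& name : names) {
        manager.add(name);
    }
}

// Usage sketch.
inline void manager_example() {
    DeviceManager devices;
    ProtocolManager protocols;
    add_all(devices, {"printer", "scanner"});
    add_all(protocols, {"tcp"});
    assert(devices.count() == 2);
    assert(protocols.count() == 1);
}
```

If ProtocolManager had called the same operation registerProtocol(), add_all would not compile for it, and a reader would have one more name to look up.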
If you use the same word, but you have very different interfaces, this isn’t a total
evil (see #12), but it does cause some confusion. If your
system or your module is small enough, or your controls rigorous enough to prevent
synonyms, then that’s great.
If you’re learning a framework, though, you need to be most careful not to be fooled by
synonyms. While you should be able to count on the names denoting type, you
frequently cannot.
Remember also that it’s not polite at all to have the same name in two scopes.
Classes and objects should have noun or noun phrase names.
There are some methods (commonly called "accessors") which
calculate and/or return a value. These can and probably should have noun names. This way
accessing a person’s first name can read like:
string x = person.name();
Other methods (sometimes called "manipulators", but not so
commonly anymore) cause something to happen. These should have verb or verb-phrase names.
This way, changing a name would read like:
fred.changeNameTo("mike");
As a class designer, does this sound boringly unimportant? If so, then go write code
that uses your classes. The best way to test an interface is to use it and look for ugly,
contrived, or confusing text. This really helps.
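One possible Person class that supports both of the calls shown above. The constructor and the private member are assumptions; the text only gives name() and changeNameTo().

```cpp
#include <cassert>
#include <string>
#include <utility>

class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}

    // Accessor: returns a value, so it gets a noun name.
    const std::string& name() const { return name_; }

    // Manipulator: makes something happen, so it gets a verb-phrase name.
    void changeNameTo(const std::string& new_name) { name_ = new_name; }

private:
    std::string name_;
};

// Usage sketch: the calling code reads almost like a sentence.
inline void person_example() {
    Person fred("fred");
    fred.changeNameTo("mike");
    std::string x = fred.name();
    assert(x == "mike");
}
```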
Go ahead, use computer science (CS) terms, algorithm names, pattern names, math terms,
etc.
Yeah, it’s a bit heretical, but you don’t want your developers having to run back and
forth to the customer asking what every name means if they already know the
concept by a different name.
We’re talking about code here, so you’re more likely to have your code maintained by a
CS major or informed programmer than by a domain expert with no programming background.
End users of a system very seldom read the code, but the maintainers have to.
When there is no `programmer-ese’ for what you’re doing, use the name from the problem
domain. At least the programmer who maintains your code can ask his boss what it
means.
In analysis, of course, this is the superior rule to [Use
Solution Domain Names], because the end-user is the target audience.
Readers shouldn’t have to mentally translate your names into other names they already
know.
There are some unfortunate examples for this. One of them is Microsoft’s choice to call
the things that walk through a list Enumerators instead of Iterators.
This is sad because the term iterator is in common use in software circles and was
completely appropriate to the domain (see Pick One) and
also because the term enumeration typically has a very different meaning (see
Multiple Meanings). Between the two, most developers
have to translate enumerator to iterator mentally as the conversations about
such things go on.
This problem generally arises from a choice to use neither problem domain terms nor solution
domain terms.
Sadly, and in contradiction to the above, all names require some mental mapping, since
this is the nature of language. If you use a term which might not be known to your
audience, you must map it to the concept you’d like it to represent.
For this reason, most important names should be in a glossary or should be explained in
comments at least. Even if they’re parameters or local variables. Even if they’re inside
the static member of a class, unless the term is completely in harmony with all of these
naming rules.
Avoid words which already mean something else. For example, "hp",
"aix", and "sco" would be horrible variable names
because they are the names of Unix platforms or variants. Even if you are coding a
hypotenuse and "hp" looks like a good abbreviation, it violates too
many rules and also is disinformative.
Likewise don’t refer to a grouping of accounts as an AccountList unless it’s
actually a list. A list means something to CS people. It denotes a certain type of data
structure. If the container isn’t a list, you’ve disinformed the programmer who has to
maintain your code. AccountGroup or BunchOfAccounts would have been
better.
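To make that concrete, suppose (purely as an assumption for this sketch) that the accounts are kept in a std::set of account ids. Calling the class AccountList would promise ordering and duplicates that the container does not provide; AccountGroup promises nothing it cannot deliver.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Backed by a set, not a list: no duplicates, no insertion order.
class AccountGroup {
public:
    // Returns false if the account was already in the group.
    bool add(const std::string& account_id) {
        return accounts_.insert(account_id).second;
    }

    bool contains(const std::string& account_id) const {
        return accounts_.count(account_id) != 0;
    }

    std::size_t size() const { return accounts_.size(); }

private:
    std::set<std::string> accounts_;
};

// Usage sketch: adding the same id twice leaves one entry.
inline void account_group_example() {
    AccountGroup overdue;
    overdue.add("ACC-1");
    overdue.add("ACC-1");
    assert(overdue.size() == 1);
}
```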
The absolute worst example of this would be the use of lowercase L or uppercase O as
variable names, especially in combination. The problem, of course, is in code where such
things as this occur:
int a = l;
if ( O = l )
a = O1;
else
l = 0;
You think that I made this one up, right? Sorry. I’ve examined code this year (1997)
where such things were abundant. It’s a great technique for shrouding your code.
When I complained, one author told me that I should use a different font so that the
differences were more obvious. I think that the problem could be more easily and finally
corrected by search-and-replace than by publishing a requirement that all future readers
choose Font X.
There are a few names which are meaningful in and of themselves. Most, however, are not.
Instead, you need to place names in context for your reader by enclosing them in classes,
well-named functions, or comments.
The term `tree’ needs some disambiguation if, for example, the application is
a forestry application. You may have syntax trees, red-black or b-trees, and also elms,
oaks, and pines. The word `tree’ is a good word, and is not to be avoided, but it must be
placed in context every place it is used.
If you review a program or enter into a conversation where the word "tree"
could mean either, and you aren’t sure, then the author (speaker) will have to clarify.
In an imaginary application called "Gas Station Deluxe", it is a bad idea to
prefix every class with `GSD’ if there is a chance that the class might later be
used in "Inventory Manager" (at which time the prefix becomes meaningless).
Likewise, say you invented a `Mailing Address’ class in GSD’s accounting
module, and you named it AccountAddress. Later, you need a mailing address for
your customers. Do you use `AccountAddress’?
In both these cases, the naming reveals an earlier short-sightedness regarding reuse.
It shows that there was a failing at the design level to look for common classes across an
application.
Sadly, this is the standard being used by many Java authors. Even in C++, this is
becoming increasingly common. We need language support for this type of work. I’ve not had
too much trouble with it in Python, but I’m watching out. You should also.
The names `accountAddress’ and `customerAddress’ are fine names for
instances of the class.
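A sketch of what that might look like. MailingAddress, its fields, mailing_label, and the Account and Customer structs are all hypothetical; the point is only that the class name carries no application or module context, while the instance names do.

```cpp
#include <cassert>
#include <string>

// The class is named for what it is, not for who first needed it.
struct MailingAddress {
    std::string street;
    std::string city;
    std::string postal_code;
};

// One helper serves every owner of a MailingAddress.
inline std::string mailing_label(const MailingAddress& address) {
    return address.street + ", " + address.city + " " + address.postal_code;
}

// The context lives in the instance names.
struct Account {
    int number;
    MailingAddress accountAddress;
};

struct Customer {
    std::string name;
    MailingAddress customerAddress;
};

// Usage sketch: the same class and helper work for both owners.
inline void address_example() {
    Account account{42, {"1 Pump Lane", "Springfield", "12345"}};
    Customer customer{"Mikey", {"9 Elm Street", "Shelbyville", "54321"}};
    assert(mailing_label(account.accountAddress) == "1 Pump Lane, Springfield 12345");
    assert(mailing_label(customer.customerAddress) == "9 Elm Street, Shelbyville 54321");
}
```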
This is a problem that usually arises from writing code solely for the
compiler/interpreter. You can’t have the same name referring to two things in the same
scope, so you change one of them. Well, that’s better than misspelling one (I’ve seen code
that looks like this was intentional, and correcting the spelling prevented compiles due
to symbol clashes), but there should be some fundamental change in name that makes it clear
that they are different.
Imagine that you have a Product class. If you have another called ProductInfo
or ProductData, you have failed to make the names different. Info and Data
are like "stuff": basically meaningless. Likewise, using the words Class
or Object in an OO system is so much noise; can you imagine having CustomerObject
and Customer as two different class names?
MoneyAmount is no better than `money’. CustomerInfo is no
better than Customer. The word `variable’ should never appear in a
variable name. The word `table’ should never appear in a table name. How is NameString
better than Name? Would a Name ever be a floating point number? Probably
not. If so, it breaks an earlier rule about disinformation.
There is an application I know of where this is illustrated. I’ve changed the name of
the thing we’re getting to protect the guilty, but the exact form of the error is:
getSomething();
getSomethings();
getSomethingInfo();
The second tells you there are many of these things. The first lets you know you’ll get
one, but which? The third tells you nothing more than the first, but the compiler (and
hopefully the author) can tell them apart. You are going to have to work harder.
Try to disambiguate in such a way that the reader knows what the different versions
offer her, instead of merely that they’re different.
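As one hedged illustration, here is how three such getters might be renamed so that each name says what you get back. The Invoice and InvoiceStore types and every method name are invented for this sketch; they are not the real application's names.

```cpp
#include <cassert>
#include <vector>

struct Invoice {
    int id;
    long amount_due_cents;
};

class InvoiceStore {
public:
    void add(const Invoice& invoice) { invoices_.push_back(invoice); }

    // One specific invoice, or nullptr if the id is unknown.
    const Invoice* findInvoiceById(int id) const {
        for (const Invoice& invoice : invoices_) {
            if (invoice.id == id) return &invoice;
        }
        return nullptr;
    }

    // Every invoice that still has money owing.
    std::vector<Invoice> unpaidInvoices() const {
        std::vector<Invoice> unpaid;
        for (const Invoice& invoice : invoices_) {
            if (invoice.amount_due_cents > 0) unpaid.push_back(invoice);
        }
        return unpaid;
    }

    // A single figure summed over all invoices.
    long totalAmountDueCents() const {
        long total = 0;
        for (const Invoice& invoice : invoices_) {
            total += invoice.amount_due_cents;
        }
        return total;
    }

private:
    std::vector<Invoice> invoices_;
};

// Usage sketch.
inline void invoice_store_example() {
    InvoiceStore store;
    store.add({1, 0});
    store.add({2, 500});
    assert(store.unpaidInvoices().size() == 1);
    assert(store.totalAmountDueCents() == 500);
}
```

Compare getInvoice(), getInvoices() and getInvoiceInfo(): the compiler can tell those apart just as well, but the reader cannot tell which invoice, which set, or what information.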
The hardest thing about choosing good names is that it requires good descriptive skills
and a shared cultural background. This is a teaching issue, rather than a technical,
business, or management issue. As a result many people in this field don’t do it very
well.
Follow some of these rules, and see if you don’t improve the readability of your code.
If you are maintaining someone else’s code, make changes to resolve these problems. It
will pay off in the long run.