The art of naming


Post by Ursego » 21 Feb 2013, 10:22

Give classes, functions, variables etc. (but start with DB tables and their fields!) meaningful, descriptive, self-explanatory names that clearly reveal intent (i.e. make their purpose self-evident) and require as few comments as possible.

A name should tell a story: what a variable or a DB field contains, or which service a class or a method provides. If a variable name requires a comment, then the name does not reveal its intent! Choosing names that reveal intent makes code much easier to understand and change.

*** BAD code: ***

Code: Select all
int days;

*** GOOD code: ***

Code: Select all
int daysAfterLastPurchase;

Provide the "PER WHAT" information

Use the words "Per", "By", "In" and "Of". The variable name cowsPerFarm is better than cowsCount - that is the idea. These words especially simplify writing and reading financial calculations! Of course, they are not needed when the context is obvious. For example, if you are processing a record set row by row, the variable rowCount does the job perfectly. The name rowsInDataSet (or, better, dataSetRowCount) would make sense only if something else with rows is being processed in the same code fragment.

Don't use abstract words like "Actual", "Total" and "Real". They will madden you when you spend extra time trying to understand what is "actual", what is "not actual", and which grouping level a "total" belongs to. Is the totalCows field a total per barn? Per farm? Per village? Per province? Per country? Per the Universe? If it is per farm, then how will you name the total per village? It's also "total"! These words are meaningless without context. That context is obvious to the data architect who is creating the table ("it's easy - it's the total per what I am thinking about just now!"), but it becomes very foggy later. Sometimes it's very difficult to see that context when looking at a variable surrounded by hundreds of lines of code. So, don't write simply "total" - write "total PER WHAT"!

*** BAD code: ***

Code: Select all
totalSalary = actualSalary * totalHours;

*** GOOD code: ***

Code: Select all
salaryPerDay = salaryPerHour * hoursPerDay;
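
The same idea applies when one quantity is summed at several grouping levels. Below is a minimal sketch of my own; the Barn class, its fields and both method names are hypothetical and exist only for this illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: Barn, its fields and both methods are invented for this sketch.
class CowCounts {

    static final class Barn {
        final String farmName;
        final String villageName;
        final int cowsInBarn;

        Barn(String farmName, String villageName, int cowsInBarn) {
            this.farmName = farmName;
            this.villageName = villageName;
            this.cowsInBarn = cowsInBarn;
        }
    }

    // "Per farm" is part of the name, so nobody has to guess the grouping level.
    static Map<String, Integer> cowsPerFarm(List<Barn> barns) {
        Map<String, Integer> cowsPerFarm = new TreeMap<>();
        for (Barn barn : barns) {
            cowsPerFarm.merge(barn.farmName, barn.cowsInBarn, Integer::sum);
        }
        return cowsPerFarm;
    }

    // Same data, different grouping level, different name.
    static Map<String, Integer> cowsPerVillage(List<Barn> barns) {
        Map<String, Integer> cowsPerVillage = new TreeMap<>();
        for (Barn barn : barns) {
            cowsPerVillage.merge(barn.villageName, barn.cowsInBarn, Integer::sum);
        }
        return cowsPerVillage;
    }
}
```

Had both results been called totalCows, the reader would have to open each method to find out which grouping level it returns.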


Use the word "by" (or "of") in variables of the hash table (aka map, dictionary or associative array) type.

Just an example in Java:

Code: Select all
// Assumes Province is a class of String constants (ONTARIO, QUEBEC, MANITOBA) defined elsewhere.
private static final HashMap<String, String> capitalCityByProvince = new HashMap<>();
static {
   capitalCityByProvince.put(Province.ONTARIO, "Toronto");
   capitalCityByProvince.put(Province.QUEBEC, "Quebec City");
   capitalCityByProvince.put(Province.MANITOBA, "Winnipeg");
}

See how clear the code is when a <something>By<something> variable is used:

Code: Select all
String capitalCity = capitalCityByProvince.get(clientProvince);

You could argue that the same information is repeated, so a shorter name (like capitalCities) would be just as clear (since the key variable's name provides the "per what" information). However, that repetition does no harm, especially when the variable is buried in tons of other code or is part of a complex condition. Also, if the value part of the hash table is something more complicated than just a String, the <something>By<something> naming convention is very useful when you are deciding which class to use in your code. For example, the name employeeByEmployeeId tells you that the returned value should be stored in a variable of the type Employee. All that is very important because a real-life project can have a lot of hash tables which store complicated business entities (much less straightforward than Province or City).
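
To make the employeeByEmployeeId case concrete, here is a minimal sketch. The Employee and EmployeeDirectory classes, their fields and their methods are hypothetical, invented only to show how the map's name announces both the key and the value type:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: all class, field and method names are invented for this sketch.
class EmployeeDirectory {

    static final class Employee {
        final int employeeId;
        final String fullName;

        Employee(int employeeId, String fullName) {
            this.employeeId = employeeId;
            this.fullName = fullName;
        }
    }

    // <value>By<key>: the name says "give me an employee ID, get an Employee back".
    private final Map<Integer, Employee> employeeByEmployeeId = new HashMap<>();

    void register(Employee employee) {
        employeeByEmployeeId.put(employee.employeeId, employee);
    }

    // Returns null when no employee is registered under the given ID.
    Employee findEmployee(int employeeId) {
        return employeeByEmployeeId.get(employeeId);
    }
}
```

At the call site, a line like Employee employee = directory.findEmployee(employeeId); needs no comment, because the map name already told you which type comes back.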

Use only exact definitions that raise no (or minimal) questions, even if that results in longer names!

The common, obvious practice is to give short names. But, sometimes, there are situations when it makes sense to break that rule and use long, "real-English" sentences.

Do that only if it's absolutely necessary. But if you feel that it will simplify working with the code, go ahead! See the difference between

Code: Select all
String mainFilter = this.getMainFilter(); // bad...

and

Code: Select all
String filterBySelectedRowInSummaryScreen = this.getFilterBySelectedRowInSummaryScreen(); // good!

As you see, names of variables can tell a whole story, so code readers immediately understand what is going on here. Even if it makes the code line too long, it is no problem to break it into two lines - it's better than trying to guess what a method returns (or spending time and effort investigating its code) and what is stored in a variable.

From the book "Clean Code":

Don't be afraid to make a name long. A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment. Use a naming convention that allows multiple words to be easily read in the function names, and then make use of those multiple words to give the function a name that says what it does.

Don't be afraid to spend time choosing a name. Indeed, you should try several different names and read the code with each in place. Modern IDEs like Eclipse or IntelliJ make it trivial to change names. Use one of those IDEs and experiment with different names until you find one that is as descriptive as you can make it.

Choosing descriptive names will clarify the design of the module in your mind and help you to improve it. It is not at all uncommon that hunting for a good name results in a favorable restructuring of the code.


Names that describe actions must express whether the action is to be done in the future or has already been done in the past.

When naming variables and table fields, don't force readers to guess an action's timing. For example, in one of my projects two different tables had fields named calc_method, but one of them stored the method to be used in the next calculation, while the other contained the method already used in the last calculation. Why not call the columns calc_method_to_use and calc_method_used respectively, if that improves understanding of the business and the SQL?

Boolean names consisting of a noun only (without a verb) are another sad story. What do you think about a Boolean variable named isCalculation? Calculation - what??? :twisted: How should the condition if (isCalculation) be understood? As you see, no problem exists if the variable is named isCalculationDone (isCalculated) or, conversely, doCalculation, shouldBeCalculated or simply calculate.

So, the common advice is to use words do..., ...ToDo, perform..., execute..., ...ToApply, should... etc. for stuff which has to take place in the future, and ...Done, ...Performed, ...Executed, ...Occurred, ...Passed, ...Applied etc. for things which have taken place in the past.
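
A minimal sketch of these conventions in Java. The SalaryCalculation class and its members are hypothetical; only the names calc_method_to_use / calc_method_used and isCalculationDone come from the text above:

```java
// Hypothetical illustration: the class and its methods are invented for this sketch.
class SalaryCalculation {

    private final String calcMethodToUse;       // future: will be applied by the next calculation
    private String calcMethodUsed;              // past: null until a calculation has run
    private boolean isCalculationDone = false;  // past: has the calculation already happened?

    SalaryCalculation(String calcMethodToUse) {
        this.calcMethodToUse = calcMethodToUse;
    }

    void calculate() {
        if (isCalculationDone) {
            return; // the condition reads unambiguously
        }
        // ... the actual calculation would go here ...
        calcMethodUsed = calcMethodToUse;
        isCalculationDone = true;
    }

    boolean isCalculationDone() {
        return isCalculationDone;
    }

    String getCalcMethodUsed() {
        return calcMethodUsed;
    }
}
```

With a single field named calcMethod, the reader could not tell whether it describes what will happen or what has already happened.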

Declare separate variables for each distinct requirement.

This is just one entry of a more general category: "don't be lazy!" When you declare a variable, you should give it a name that accurately reflects its purpose in a program. If you then use that variable in more than one way ("recycling"), you create confusion and, very possibly, introduce bugs.

*** BAD code: ***

Code: Select all
DECLARE
   l_count INTEGER;
BEGIN
   l_count := list_of_books.COUNT;
   IF l_count > 0 THEN
      l_count := list_of_books(list_of_books.FIRST).page_count;
      analyze_book(l_count);
   END IF;
END;

*** GOOD code: ***

Code: Select all
DECLARE
   l_book_count INTEGER;
   l_page_count INTEGER;
BEGIN
   l_book_count := list_of_books.COUNT;
   IF l_book_count > 0 THEN
      l_page_count := list_of_books(list_of_books.FIRST).page_count;
      analyze_book(l_page_count);
   END IF;
END;

Now you can change one variable's usage without worrying about ripple effects on other areas of your code.

Be consistent throughout the application. Don't produce different versions of a name for the same entity.

When you use values, retrieved from database tables, name the corresponding variables exactly as the DB fields (adding naming convention prefixes where needed, of course). For example, if the DB field is empId, then your variable should be empId too - not empNo, empNum, employeeNo, employeeNum or employeeId.

But if the DB field names are not informative enough, you have more freedom, especially when you are coding financial calculations. For example, if the DB field totalHours contains the number of hours the employee has worked on a particular day, it's better to store the retrieved value in a variable named hoursPerDay. Understandability is more important than consistency.
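
Both rules can be sketched together. In the fragment below, a Map stands in for one retrieved DB row; the DailySalary class, the salaryPerDay method and the row layout are hypothetical, while empId, totalHours and hoursPerDay are the names discussed above:

```java
import java.util.Map;

// Hypothetical illustration: the class, the method and the Map-based "row" are invented for this sketch.
class DailySalary {

    // timesheetRow maps DB field names to the values of one retrieved record.
    static double salaryPerDay(Map<String, Number> timesheetRow, double salaryPerHour) {
        // Informative DB field name: the variable is named exactly like the field.
        int empId = timesheetRow.get("empId").intValue();

        // Uninformative DB field name: "totalHours" holds the hours worked on one day,
        // so the variable says "per what" instead of repeating the foggy name.
        double hoursPerDay = timesheetRow.get("totalHours").doubleValue();

        if (hoursPerDay < 0) {
            throw new IllegalArgumentException("Negative hours for empId " + empId);
        }
        return salaryPerHour * hoursPerDay;
    }
}
```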

Shorten words in names ONLY if the produced abbreviation is more than obvious.

Trying to decrease the size of our methods, we usually shorten words when naming our tables, columns, application objects, scripts and variables. Sometimes we shorten them slightly, sometimes significantly; it usually depends on the practice that exists in your company. At one of my workplaces I suffered from very strict shortening rules; they shortened everything they saw! They didn’t use vowels except leading ones (wrk - work, crtfc - certificate, insr - insurance etc.) and had other official rules on how to remove meat from bones (when I didn't understand one of those "wrds", I opened a special list of abbreviations! :o ). Those names were understandable only to ancient Egyptians, so I was happy when my contract ended.

In another project, I saw the opposite picture: no shortening at all! OK, sometimes (easy stuff like "id", "sys", "col" or "num"), but usually - full sentences used as names of tables, fields, functions etc. I was reading scripts as if they were an adventure book, not programming language code! Initially, I was slightly in shock: how do they do that? Don’t they know that everybody must try to keep their scripts shorter? It was not in line with what I had seen in different projects over many years of my career... But I want to tell you - it was a real pleasure to work on this project! So, accept this shocking and unusual, but great idea: try NOT to shorten words in your objects' names! Of course, only those who create new systems (including DB objects) can follow this advice, and only if the DBMS allows long names (unfortunately, the max identifier length in the Sybase database is 30, so you will become an ancient Egyptian :lol: ). If words in your system have already been mangled, you can do nothing about that - use the names as they already exist, and don't create new versions of names for the same entities (consistency is above all!).

Here are the only 4 cases in which abbreviations should appear in our code:

1. Abbreviations, used as conventional terms in the business. Such abbreviations (which have become regular words in the daily work) exist in any project, so, of course, they should be used in identifiers.

2. Abbreviations, used as conventional terms by the community of developers working with the programming language or technology. For example, every PowerBuilder programmer knows that DS stands for "DataStore", every C/C++ writer knows that "ptr" stands for "pointer", and every database specialist knows the meaning of "PK" and "FK".

3. Abbreviations for long words or word combinations, used in the name of a LOCAL variable which is mentioned in the function many times. In this case you have to add an explanatory comment to the variable's declaration line, as in the following example, taken from one of my projects:

Code: Select all
String ccm; // current calculation method
String pcm; // previous calculation method

4. Words from this list:

amt - amount
arg - argument
arr - array
asc - ascending
avg - average
buf - buffer
calc - calculate, calculation
col - column
cnt - count (not counter!)
ctr - counter
cust - customer
curr - current
def - definition
del - delete
desc - descending (not description!)
descr (dscr) - description
dest - destination
dif - difference
doc - document
elig - eligible, eligibility
emp - employee (not employer!)
env - environment
err - error
exp - expired, expiration (not expression!)
expr - expression
frag - fragment
id - identifier
idx - index
img - image
ind - indicator (not index!)
info - information
ini(t) - initial
ins - insert (not insurance!)
inv - invoice (not inventory!)
qty - quantity
len - length
lim - limit
max - maximum
min - minimum
misc - miscellaneous
msg - message (not monosodium glutamate :lol: )
num - number
obj - object
orig - original
parm - parameter
pos - position
prev - previous
ptr - pointer
ref - reference
res - result
ret - return(ed)
rc - return code
sel - select
src - source
sum - summary
svc - service
sys - system
temp - temporary
upd - update
val - value
win - window

If you don't create DB objects, print out this article and pin it in the cubicle of the data architect. :lol:

Post by Ursego » 27 Jun 2019, 12:26

From the book "97 Things Every Programmer Should Know":

Code in the Language of the Domain

Picture two codebases. In one, you come across:

Code: Select all
if (portfolioIdsByTraderId.get(trader.getId()).containsKey(portfolio.getId())) {...}

You scratch your head, wondering what this code might be for. It seems to be getting an ID from a trader object; using that to get a map out of a, well, map of maps, apparently; and then seeing if another ID from a portfolio object exists in the inner map. You scratch your head some more. You look for the declaration of portfolioIdsByTraderId and discover this:

Code: Select all
Map<int, Map<int, int>> portfolioIdsByTraderId;

Gradually, you realize it might have something to do with whether a trader has access to a particular portfolio. And of course you will find the same lookup fragment - or, more likely, a similar but subtly different code fragment - whenever something cares whether a trader has access to a particular portfolio. In the other codebase, you come across this:

Code: Select all
if (trader.canView(portfolio)) {...}

No head scratching. You don’t need to know how a trader knows. Perhaps there is one of these maps-of-maps tucked away somewhere inside. But that’s the trader’s business, not yours.

Now which of those codebases would you rather be working in?

Once upon a time, we only had very basic data structures: bits and bytes and characters (really just bytes, but we would pretend they were letters and punctuation). Decimals were a bit tricky because our base-10 numbers don’t work very well in binary, so we had several sizes of floating-point types. Then came arrays and strings (really just different arrays). Then we had stacks and queues and hashes and linked lists and skip lists and lots of other exciting data structures that don’t exist in the real world. “Computer science” was about spending lots of effort mapping the real world into our restrictive data structures. The real gurus could even remember how they had done it.

Then we got user-defined types! OK, this isn’t news, but it does change the game somewhat. If your domain contains concepts like traders and portfolios, you can model them with types called, say, Trader and Portfolio. But, more importantly than this, you can model relationships between them using domain terms, too.

If you don’t code using domain terms, you are creating a tacit (read: secret) understanding that this int over here means the way to identify a trader, whereas that int over there means the way to identify a portfolio. (Best not to get them mixed up!) And if you represent a business concept (“Some traders are not allowed to view some portfolios - it’s illegal”) with an algorithmic snippet - say, an existence relationship in a map of keys - you aren’t doing the audit and compliance guys any favors.

The next programmer to come along might not be in on the secret, so why not make it explicit? Using a key as a lookup to another key that performs an existence check is not terribly obvious. How is someone supposed to intuit that’s where the business rules preventing conflict of interest are implemented? Making domain concepts explicit in your code means other programmers can gather the intent of the code much more easily than by trying to retrofit an algorithm into what they understand about a domain. It also means that when the domain model evolves - which it will, as your understanding of the domain grows - you are in a good position to evolve the code. Coupled with good encapsulation, the chances are good that the rule will exist in only one place, and that you can change it without any of the dependent code being any the wiser.

The programmer who comes along a few months later to work on the code will thank you. The programmer who comes along a few months later might be you.
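
The excerpt shows only the call site trader.canView(portfolio). Below is one possible way the lookup could be tucked away inside Trader. It is a minimal sketch of my own, not code from the book: the field viewablePortfolioIds and the method grantViewAccess are hypothetical names invented for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: everything except the names Trader, Portfolio,
// getId and canView is invented for this sketch.
class Portfolio {
    private final int id;

    Portfolio(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }
}

class Trader {
    private final int id;

    // The lookup structure is a private detail; callers never see it.
    private final Set<Integer> viewablePortfolioIds = new HashSet<>();

    Trader(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    void grantViewAccess(Portfolio portfolio) {
        viewablePortfolioIds.add(portfolio.getId());
    }

    // The business rule lives in one place and is named in domain terms.
    boolean canView(Portfolio portfolio) {
        return viewablePortfolioIds.contains(portfolio.getId());
    }
}
```

If the access rule changes later (say, it starts to depend on the trader's desk), only canView has to change; every if (trader.canView(portfolio)) stays as it is.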