Why use prefixes on member variables in C++ classes

VoidPointer picture VoidPointer · Aug 4, 2009 · Viewed 111.5k times · Source

A lot of C++ code uses syntactical conventions for marking up member variables. Common examples include

  • m_memberName for public members (where public members are used at all)
  • _memberName for private members or all members

Others try to enforce using this->member whenever a member variable is used.

In my experience, most larger code bases fail at applying such rules consistently.

In other languages, these conventions are far less widespread. I see it only occasionally in Java or C# code. I think I have never seen it in Ruby or Python code. Thus, there seems to be a trend with more modern languages to not use special markup for member variables.

Is this convention still useful today in C++ or is it just an anachronism. Especially as it is used so inconsistently across libraries. Haven't the other languages shown that one can do without member prefixes?

Answer

Jason Williams picture Jason Williams · Aug 4, 2009

I'm all in favour of prefixes done well.

I think (System) Hungarian notation is responsible for most of the "bad rap" that prefixes get.

This notation is largely pointless in strongly typed languages e.g. in C++ "lpsz" to tell you that your string is a long pointer to a nul terminated string, when: segmented architecture is ancient history, C++ strings are by common convention pointers to nul-terminated char arrays, and it's not really all that difficult to know that "customerName" is a string!

However, I do use prefixes to specify the usage of a variable (essentially "Apps Hungarian", although I prefer to avoid the term Hungarian due to it having a bad and unfair association with System Hungarian), and this is a very handy timesaving and bug-reducing approach.

I use:

  • m for members
  • c for constants/readonlys
  • p for pointer (and pp for pointer to pointer)
  • v for volatile
  • s for static
  • i for indexes and iterators
  • e for events

Where I wish to make the type clear, I use standard suffixes (e.g. List, ComboBox, etc).

This makes the programmer aware of the usage of the variable whenever they see/use it. Arguably the most important case is "p" for pointer (because the usage changes from var. to var-> and you have to be much more careful with pointers - NULLs, pointer arithmetic, etc), but all the others are very handy.

For example, you can use the same variable name in multiple ways in a single function: (here a C++ example, but it applies equally to many languages)

MyClass::MyClass(int numItems)
{
    mNumItems = numItems;
    for (int iItem = 0; iItem < mNumItems; iItem++)
    {
        Item *pItem = new Item();
        itemList[iItem] = pItem;
    }
}

You can see here:

  • No confusion between member and parameter
  • No confusion between index/iterator and items
  • Use of a set of clearly related variables (item list, pointer, and index) that avoid the many pitfalls of generic (vague) names like "count", "index".
  • Prefixes reduce typing (shorter, and work better with auto-completion) than alternatives like "itemIndex" and "itemPtr"

Another great point of "iName" iterators is that I never index an array with the wrong index, and if I copy a loop inside another loop I don't have to refactor one of the loop index variables.

Compare this unrealistically simple example:

for (int i = 0; i < 100; i++)
    for (int j = 0; j < 5; j++)
        list[i].score += other[j].score;

(which is hard to read and often leads to use of "i" where "j" was intended)

with:

for (int iCompany = 0; iCompany < numCompanies; iCompany++)
    for (int iUser = 0; iUser < numUsers; iUser++)
       companyList[iCompany].score += userList[iUser].score;

(which is much more readable, and removes all confusion over indexing. With auto-complete in modern IDEs, this is also quick and easy to type)

The next benefit is that code snippets don't require any context to be understood. I can copy two lines of code into an email or a document, and anyone reading that snippet can tell the difference between all the members, constants, pointers, indexes, etc. I don't have to add "oh, and be careful because 'data' is a pointer to a pointer", because it's called 'ppData'.

And for the same reason, I don't have to move my eyes out of a line of code in order to understand it. I don't have to search through the code to find if 'data' is a local, parameter, member, or constant. I don't have to move my hand to the mouse so I can hover the pointer over 'data' and then wait for a tooltip (that sometimes never appears) to pop up. So programmers can read and understand the code significantly faster, because they don't waste time searching up and down or waiting.

(If you don't think you waste time searching up and down to work stuff out, find some code you wrote a year ago and haven't looked at since. Open the file and jump about half way down without reading it. See how far you can read from this point before you don't know if something is a member, parameter or local. Now jump to another random location... This is what we all do all day long when we are single stepping through someone else's code or trying to understand how to call their function)

The 'm' prefix also avoids the (IMHO) ugly and wordy "this->" notation, and the inconsistency that it guarantees (even if you are careful you'll usually end up with a mixture of 'this->data' and 'data' in the same class, because nothing enforces a consistent spelling of the name).

'this' notation is intended to resolve ambiguity - but why would anyone deliberately write code that can be ambiguous? Ambiguity will lead to a bug sooner or later. And in some languages 'this' can't be used for static members, so you have to introduce 'special cases' in your coding style. I prefer to have a single simple coding rule that applies everywhere - explicit, unambiguous and consistent.

The last major benefit is with Intellisense and auto-completion. Try using Intellisense on a Windows Form to find an event - you have to scroll through hundreds of mysterious base class methods that you will never need to call to find the events. But if every event had an "e" prefix, they would automatically be listed in a group under "e". Thus, prefixing works to group the members, consts, events, etc in the intellisense list, making it much quicker and easier to find the names you want. (Usually, a method might have around 20-50 values (locals, params, members, consts, events) that are accessible in its scope. But after typing the prefix (I want to use an index now, so I type 'i...'), I am presented with only 2-5 auto-complete options. The 'extra typing' people attribute to prefixes and meaningful names drastically reduces the search space and measurably accelerates development speed)

I'm a lazy programmer, and the above convention saves me a lot of work. I can code faster and I make far fewer mistakes because I know how every variable should be used.


Arguments against

So, what are the cons? Typical arguments against prefixes are:

  • "Prefix schemes are bad/evil". I agree that "m_lpsz" and its ilk are poorly thought out and wholly useless. That's why I'd advise using a well designed notation designed to support your requirements, rather than copying something that is inappropriate for your context. (Use the right tool for the job).

  • "If I change the usage of something I have to rename it". Yes, of course you do, that's what refactoring is all about, and why IDEs have refactoring tools to do this job quickly and painlessly. Even without prefixes, changing the usage of a variable almost certainly means its name ought to be changed.

  • "Prefixes just confuse me". As does every tool until you learn how to use it. Once your brain has become used to the naming patterns, it will filter the information out automatically and you won't really mind that the prefixes are there any more. But you have to use a scheme like this solidly for a week or two before you'll really become "fluent". And that's when a lot of people look at old code and start to wonder how they ever managed without a good prefix scheme.

  • "I can just look at the code to work this stuff out". Yes, but you don't need to waste time looking elsewhere in the code or remembering every little detail of it when the answer is right on the spot your eye is already focussed on.

  • (Some of) that information can be found by just waiting for a tooltip to pop up on my variable. Yes. Where supported, for some types of prefix, when your code compiles cleanly, after a wait, you can read through a description and find the information the prefix would have conveyed instantly. I feel that the prefix is a simpler, more reliable and more efficient approach.

  • "It's more typing". Really? One whole character more? Or is it - with IDE auto-completion tools, it will often reduce typing, because each prefix character narrows the search space significantly. Press "e" and the three events in your class pop up in intellisense. Press "c" and the five constants are listed.

  • "I can use this-> instead of m". Well, yes, you can. But that's just a much uglier and more verbose prefix! Only it carries a far greater risk (especially in teams) because to the compiler it is optional, and therefore its usage is frequently inconsistent. m on the other hand is brief, clear, explicit and not optional, so it's much harder to make mistakes using it.