C++ for C# programmers

One of the best ways to understand a computer language is to see another one. The nice thing about C++ is it's really a simpler language that C# was based on. Reading about it is a good way to understand C# better.

C++ has a reputation as being difficult. That's only because it has no safety features. It lets you make some impossible-to-find bugs (the kind that aren't errors, don't show up in unit tests, and only happen 10% of the time after running for 20 minutes). It also has more symbols and options. But that won't matter if you aren't writing a complete program with it.

Reading about C++ shows you what the computer can do, and the steps to do it. Then, in C#, you'll have a better idea of what commands are really shortcuts, or which errors and rules are for your protection.

This isn't a guide to actually using C++ -- just the ideas and some interesting syntax. But if you actually want to learn C++, it's a decent introduction to the changes and concepts.

General changes

No reflection, no serialization

When C++ was written, programs didn't have these. For serialization, you wrote your own file read/write for each class. Memory tended to be tight, so you hand-made the shortest binary format.

Reflection wasn't a thing back then, and no one requested it. Once it became something that might be nice to have, C++ had too many features that made reflection hard to add.

Casting

C++ casts work like functions -- the thing you want to cast goes in the parens. For example: int(5.4).

If you think about it, Convert.ToInt32(5.4) and constructors like p = Point(4,6) are basically casts. Casts should look like functions.

The backwards-looking cast, (int)5.4, comes from C. C++ still accepts it, and Java and C# kept only that form.
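As a rough sketch of what that looks like in a file you could compile (the names wholeSeconds and castDemo are made up for this example), here is the same conversion written a few ways:

```cpp
#include <cassert>

// Hypothetical helper: drop the fractional part of a time in seconds.
int wholeSeconds(double seconds) {
  return int(seconds); // function-style cast, truncates toward zero
}

// The same conversion spelled three ways. All three give the same int.
void castDemo() {
  double d = 5.4;
  int a = int(d);              // function-style
  int b = (int)d;              // the (int) form C# uses; C++ accepts it too
  int c = static_cast<int>(d); // a longer, more explicit C++ spelling
  assert(a == 5 && b == 5 && c == 5);
}
```

Calling wholeSeconds(5.4) gives 5, and a negative input like -2.7 gives -2, since the cast chops toward zero rather than rounding.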

Scope resolution operator

You may have noticed that C# double-uses the dot for two different, but similar, things. If you have frog.species, species could be a global, inside the class frog; or it could be a field of the variable frog.

C++ splits that into 2 symbols. The dot is only for variables. It uses a double-colon, called the scope resolution operator, for globals tucked inside of something. frog::species tells you that species is a static inside the frog class.

The advantage is you can tell by looking what something is. sheep::elder.name tells you elder is a static variable inside the class sheep, with member variable name. Whereas sheep::elder::name lets you know elder is a nested class.

All-in-all, combining both into one symbol makes it easier to get started, but then blurs them together, making it harder to realize it's 2 different things.
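Here is a small sketch of the two symbols side by side. The classes Animal, Sheep and Tag are invented for the example, and it's only meant to show which symbol goes where:

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, just to show :: versus dot.
struct Animal {
  std::string name;
};

struct Sheep {
  static Animal elder;               // one shared "global" tucked inside Sheep
  struct Tag { static int nextId; }; // nested class with its own static
  std::string name;                  // ordinary per-sheep field
};

Animal Sheep::elder{"Dolly"};
int Sheep::Tag::nextId = 100;

void scopeDemo() {
  Sheep s;
  s.name = "Shaun";                     // dot: field of a variable
  assert(s.name == "Shaun");
  assert(Sheep::elder.name == "Dolly"); // :: into the class, then dot into the variable
  assert(Sheep::Tag::nextId == 100);    // :: twice, since Tag is a nested class
}
```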

More flexible namespaces

C++ lets a namespace hold anything: classes (which C# allows) and also functions and variables.

C++ also allows a using anywhere. You can put using namespace CatColors; at the start of one function. You can even pick one thing out: using Math::Floor; lets you use Floor with no qualifier.

The nicest part is general understanding. If you want a global function or variable, you put it in a namespace. You don't need to use the C# trick where you make a static class; understand why it's not a real class, know how static stops it from being a member variable, or remember to put using static.
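A minimal sketch of a namespace holding a plain function and a plain variable, with a using inside a single function. The namespace CatMath and everything in it are made-up names:

```cpp
#include <cassert>

// Hypothetical namespace: no class needed to hold these.
namespace CatMath {
  int lives = 9;
  int half(int n) { return n / 2; }
}

int remainingLives() {
  using CatMath::half;         // pick one thing out, for this function only
  return half(CatMath::lives); // lives still needs its qualifier here
}

int remainingLivesAgain() {
  using namespace CatMath;     // or bring in everything, again only in here
  return half(lives);
}

void namespaceDemo() {
  assert(remainingLives() == 4); // 9 / 2 with integer division
  assert(remainingLivesAgain() == 4);
}
```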

Warnings

C++ uses the rule that compiler errors are only things that prevent the program from running. Everything else is a warning. If something will probably crash the program, but it can run -- that's a warning.

Back in the day, all warnings and errors were just white text and they all looked equally scary and you took them all seriously. Now with a GUI, the warnings are hidden in a tiny window. By the time C# was made, it was more common to ignore warnings, if you even saw them.

To fight this, C# turned several "probably a bad idea" warnings into errors. In the old days, people fixed them since they were WARNINGS; now in C# you fix them since you have to.

 

You may be used to the standard set of C# warnings, and the general rule to always fix them. C++ doesn't really do that. It has many, many, possible warnings -- more are being written all the time: one for comparing floats using ==, one for bad indentation. There are warnings for using perfectly legal commands (you turn them on so it can warn you about typos that turn into those commands by accident).

The result is that C++ doesn't have a "fix all warnings" idea like C# has. You pick the ones you want to hear about. You can even change the warning levels for different parts of the program. In fact, just like C#, you can even set some warnings to count as errors.

Uninitialized vars

C++ doesn't initialize any variables, and it's not an error to read from one. int n; print(n); will compile, and run, and will print a random value.

That's because it's not an actual error to read from an uninitialized variable. It's a terrible idea, but not an error. When memory is freed, it's not cleared to 0's -- that would waste time. Then when you get a new variable, you're given memory "as is". It has whatever was left over from before.

C++ compilers have two levels of warnings about this: using a definitely uninitialized variable, and maybe using one. The second one is when you have a lot of if's and it seems like there might be a way, but it can't tell. You may have seen that second type in C#. For example:

bool needsRedo=false;
int valueForRedo;
if( ... ) { needsRedo=true; valueForRedo= ...; } // flag redo and set value
  ...
if(needsRedo) {
  ... = valueForRedo; // uninitialized-variable error (which is incorrect)
}

The compiler can't tell that this code is fine. C# gives you errors until you set valueForRedo in a way it knows is safe.

 

C#'s extra-cautious must-initialize rule is there because uninitialized variable bugs are a top-10 C++ debugging nightmare. What happens is most fresh memory is all zeroes (for security, the operating system clears it at the start of your program, so you can't spy on the last program to use it.) If you forget to init a variable to 0, it will probably be 0 anyway. All your tests will run perfectly. But once the program has been running for a while, memory filling with junk, you call the function and it gets a non-0 garbage value. If you're lucky, that will crash the function. If not, you get a randomly appearing wrong answer, which takes forever to track down.

But it does allow fun tricks. You can write a function that prints out a dozen uninitialized variables, just to examine how memory is allocated (you run some other functions first, to put stuff there, have them quit, then run your testing function).

Auto-initializing

C++ never initializes anything. It's always up to you. That's a pretty simple rule.

For fun, this is not legal C#: Cat c; if(c==null). C# initializes all reference types, but only the part on the heap, not the reference variables themselves. But int[] A=new int[5]; print(A[0]) is legal. C# sets ints to 0, if they're on the heap. C#'s "initialize some things" rule seems to cause more confusion than picking one way for everything.

Initializing makes the program run a little slower, but new programmers assume everything is initialized. So C# compromised. Now it's easier to teach beginners, using one class and functions with no locals.

Nicer Generic functions

Generics (C++ calls them Templates) look the same, but are more flexible. You can write a template function using anything you want, and it will work with any type that can do it. For example, this will add everything in a list of ints, strings, Points, or any class which overloads +:

// assumes + is overloaded for T, and the list isn't empty
template <typename T>
T arraySum(vector<T> theArray) { // vector is C++'s name for List
  T sum = theArray[0];
  for(int i=1; i<theArray.size(); i++) // start at 1, item 0 is already in sum
    sum = sum + theArray[i]; // <- using + on type T
  return sum;
}

To do this in C#, you'd need to create an interface named something like Addable and have your classes officially implement it (which wouldn't work to let basic types use it).
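To see the template work with a home-made class, here is a complete sketch that should compile as-is. Point and its operator+ are invented for the example, and arraySum is spelled with the standard library's std::vector:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class that overloads +.
struct Point {
  int x, y;
  Point operator+(const Point& other) const { return Point{x + other.x, y + other.y}; }
};

// Works for any T that has a +. Assumes the list is not empty.
template <typename T>
T arraySum(const std::vector<T>& items) {
  assert(!items.empty());
  T sum = items[0];
  for (std::size_t i = 1; i < items.size(); i++)
    sum = sum + items[i]; // the only thing we ask of T
  return sum;
}
```

The same function then handles ints (adding), strings (gluing together) and Points (adding x's and y's), with no interface declared anywhere. If you try it on a type with no +, the compiler complains at the sum = sum + items[i] line.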

Changes for classes

Public/private

In C++, public and private variables are considered equally good options. For structs, the default is public. You set public/private by writing a line that applies to everything below:

// simple class:
struct Goat {
  string name; // these are both public
  int age;
};

// class with some of each:
struct Horse {
  private: // everything below this is private
  string name;
  int age;

  public: // everything below this is public
  void printMe() { ... }
};

It's a little clunky. It's based on the idea that you naturally put public stuff in one spot, as the interface, and your private stuff in another.

C++ added object-oriented features, making private an option. By the time Java was written, the plan was to make the whole language feel object-oriented, which means encapsulation, which means private as the default. C# borrowed that.

virtual function syntax

In C++, to make a function virtual, you only add virtual before it in the base class. You don't have to write override in front of it in every subclass -- they're automatically virtual.

This is an example where C# added the extra overrides to fix what its designers saw as a bug. It's tricky: suppose Cat inherits from Animal, and has a normal function smell(), which is how many seconds it smells new people. Later, Animal adds a virtual smell() function, which is a 1 to 10 rating. The smell() in Cat wasn't supposed to override this one -- it doesn't even mean the same thing, but in C++ it does. When we call a1.smell() when a1 is a Cat, we'll mistakenly call the Cat::smell function.

Most of the changes in C# are to make things easier for beginners. This one is the other way -- the problem it prevents can only happen in a big, multi-person program, and the fix makes things a little harder for beginners.
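The accidental override can be sketched like this. The numbers are arbitrary, and smellOf is a made-up helper that only knows about Animal:

```cpp
#include <cassert>

struct Animal {
  virtual int smell() { return 5; } // virtual is written only here: a 1 to 10 rating
  virtual ~Animal() {}
};

struct Cat : Animal {
  int smell() { return 30; } // meant as "seconds spent sniffing", but it silently overrides
};

int smellOf(Animal& a) { return a.smell(); } // sees only an Animal

void virtualDemo() {
  Animal plain;
  Cat c;
  assert(smellOf(plain) == 5);
  assert(smellOf(c) == 30); // Cat's version runs, wanted or not
}
```

Newer C++ (C++11 and later) lets you write override after the function as an optional check, but leaving it off still overrides.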

No default values for member variables

Before C++11, C++ only allowed you to set starting values in the constructor. It didn't have the shortcut where you can write int age=2; where you declare the field (C++11 added it):

class Zebra {
  int age=2; // error (in C++ before C++11)
    ...

If you didn't know, C# actually secretly sneaks age=2 as the first thing in all constructors. The rule really is just a shortcut, and constructors are the real way everything gets starting values, even in C#.

This is a real oddball shortcut. It's definitely simpler for coders who don't know how to use constructors yet. But a program with constructors should assign everything there. Using this trick to split them up is confusing.

Not important, but C++ has a shortcut for this, sort of. At the top of your constructor, you can call the one for any field: Zebra(): age(99) { body of constructor }.
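A small sketch of that constructor shortcut, with a second field added to show the list form. The name field and the value "Marty" are made up for the example:

```cpp
#include <cassert>
#include <string>

// Hypothetical Zebra: every field gets its starting value in the constructors.
class Zebra {
 public:
  Zebra() : age(99), name("Marty") {}                            // fields are set before the body runs
  explicit Zebra(int startAge) : age(startAge), name("Marty") {} // second constructor, same idea
  int age;
  std::string name;
};

void zebraDemo() {
  Zebra a;
  Zebra b(3);
  assert(a.age == 99 && b.age == 3);
  assert(a.name == "Marty" && b.name == "Marty");
}
```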

Multiple inheritance, no interfaces

C++ allows you to inherit from as many base classes as you want:

class Cat : public Animal, public Obstacle {

It looks the same as inheriting from a class and an interface, except everything is a class. It works in the obvious way: you get all the variables and functions from everything. Instead of using base-dot for the base class, you list it by name, but other than that the rules are the same.

C++ doesn't have interface classes, since it doesn't need them. You can make a class with only virtual functions, marked "subclass must implement", and it's an interface. C++ even allows you to add a few member variables and give the functions a default body. Obstacle could be a "real" class we also inherit, or mostly an interface type class, or a mix, and it's fine.

Java and C# banned multiple inheritance since it can make a mess. What if the base classes use the same variable names? What if they both inherit from the same thing -- do you get two copies? These are fixable, but figuring out the problem can be a pain. C# added interfaces as a way to fake multiple inheritance (that's why interfaces can't have anything in them).
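Here is one way the Cat example might look, as a sketch. The members (name, describe, height, blocksPath) are invented; the point is that Obstacle is an interface-style class with one "must implement" function and one default body:

```cpp
#include <cassert>
#include <string>

struct Animal {
  std::string name;
  std::string describe() const { return "animal " + name; }
};

// Interface-style class: "= 0" is C++ for "subclass must implement".
struct Obstacle {
  virtual int height() const = 0;
  virtual bool blocksPath() const { return height() > 10; } // a default body is allowed
  virtual ~Obstacle() {}
};

struct Cat : Animal, Obstacle {
  int height() const { return 25; }
  // the base class is listed by name, where C# would write base-dot
  std::string describe() const { return "cat, also an " + Animal::describe(); }
};

void inheritDemo() {
  Cat c;
  c.name = "Fred";
  Obstacle& o = c; // a Cat can be used wherever an Obstacle is wanted
  assert(o.blocksPath());
  assert(c.describe() == "cat, also an animal Fred");
}
```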

C++ version of reference types

C# Reference is split into pointer/reference

You may have noticed C# has two different meanings for the word reference. First you learn value-type and reference type, and get used to saying Cat c; is a reference. Then you see call-by-reference. You get to decipher things like passing a value by reference, a reference by value, and, the worst, a reference by reference.

The problem is there are two similar but different things, and C# accidentally used the same word for them both. C++ uses two words -- one word for each thing.

The most common is C# reference variables. C++ calls them pointers, after the way you use them to point to things. They're real variables, which can be changed, and you can check where they point. The second type, in call-by-reference, is a locked-in alias. In this picture, d1 is an actual Dog, r1 is a "type-2" reference, and p1 is a pointer:

     ----
 d1 |    |
     ----
   /    \
  /     --
r1   p1|  |
        --

r1 is a Dog. It counts as d1. p1 isn't a Dog -- it's a pointer to a Dog. It has its own memory, which happens to be pointing to d1, but that could change.

The nice thing about C++ is that when you come to call-by-reference, it's the first time you hear that word. We've been calling the other things pointers the whole time. Then you learn a reference is like a locked-in pointer. C++ can pass pointers by reference, which works the same as passing a reference by reference in C#, but the terms aren't as confusing.

C# didn't merge those two words on purpose. What happened was Java called them references. C# borrowed the term, then added call-by-reference, so the word got double-used.
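The picture of d1, r1 and p1 can be written out as a short sketch. Dog and its age field are placeholders, and d2 is added so the pointer has somewhere else to aim:

```cpp
#include <cassert>

struct Dog { int age; };

void pointerVsReferenceDemo() {
  Dog d1{3};
  Dog d2{8};

  Dog& r1 = d1;  // reference: another name for d1, locked in for good
  Dog* p1 = &d1; // pointer: its own variable, currently aimed at d1

  r1.age = 4;    // changes d1
  assert(d1.age == 4);

  p1 = &d2;      // re-aim the pointer; d1 is untouched
  (*p1).age = 9;
  assert(d2.age == 9 && d1.age == 4);

  r1 = d2;       // NOT a re-aim: this copies d2's contents into d1
  assert(d1.age == 9);
  assert(&r1 == &d1); // r1 is still d1
}
```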

No garbage collection

As a review: in C# you create classes and arrays on the heap, using new. They become memory-wasting garbage when you lose the last reference. When too much memory is wasted, the system runs garbage collection.

Things are the same in C++, except for the last part. When you're about to lose the last pointer, you need to use delete(c1) to clean it up yourself. That immediately frees the memory. If you forget, you've got permanent garbage. C++ never runs garbage collection.

That seems pretty bad, but in practice, many C++ programs never use new. The ones that do use it a lot less than a C# program would.

But even so, remembering to use delete is another of the top 10 reasons C++ is hard. Each time you forget, that part of the program generates some permanent memory-wasting garbage. It eventually runs you out of memory and crashes. It's hard to track down who forgot to use delete. Some C++ programs make garbage slowly enough that it's easier to restart them every few days than it is to find the problem.

As you might guess, the advantage is speed. Garbage collection is slow and potentially causes a little hiccup. The other is versatility. Many of the limits on C# are to make the garbage collector work.
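To make the new/delete bookkeeping visible, this sketch uses a Cat that counts how many of itself are alive. The counter is purely a teaching device, not something real programs need:

```cpp
#include <cassert>

// Hypothetical Cat that counts live instances, so a forgotten delete would show up.
struct Cat {
  static int alive;
  Cat() { alive++; }
  ~Cat() { alive--; }
};
int Cat::alive = 0;

void deleteDemo() {
  Cat* c1 = new Cat(); // heap Cat: we now owe a delete
  assert(Cat::alive == 1);
  delete c1;           // memory is freed right here, not "eventually"
  c1 = nullptr;        // the pointer variable still exists, so blank it
  assert(Cat::alive == 0);

  {
    Cat local;         // stack Cat: no new, so nothing to delete
    assert(Cat::alive == 1);
  }                    // cleaned up automatically at the closing brace
  assert(Cat::alive == 0);
}
```

Leaving out the delete line would leave alive stuck at 1 forever, which is the permanent garbage described above.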

new only means grabbing memory

In C++, new is only used to create something on the heap. That's the original meaning of new -- allocating new memory. To simply run the constructor to set the values, you leave it out:

c1 = Cat("Fred", 2); // simple assignment

d2 = new Dog("Blackie", 5); // create a fresh Dog for d2

// changed our mind, put something else in d2, re-using same space:
*d2 = Dog("Brownie", 3);

C# added dummy new's for structs or new int()'s as a simplification. Everything gets a new. That's easier for beginners. C++ likes the rule that commands say exactly what they do -- if you aren't allocating new memory from the heap, you don't write new.

Any variable can be value-type or reference type

C++ doesn't have value or reference types since anything can be either. In other words, when you declare a class variable, you can make it count as a struct or a class, by how you declare it. A star after the type means "pointer to" (which, remember, is like a reference in C#):

Cat c1, c2; // normal Cats ("value type")
c1.age=7; // legal

Cat* cp3 = nullptr; // pointer to a Cat
// will probably aim at c1 and c2

Because of this, C++ programs don't have to think about ownership as much. c1 and c2 own their Cats since they are their Cats. You only declare a pointer, like cp3, when you need to point to other people's.

The drawback is you need to understand stack vs. heap sooner. You also need to know how the star symbol works. C# doesn't need any stars.

Anything can be on the heap

This is the other half of "anything can be value or reference type". So far, classes can decide to be on the heap or stack. So can basic types. They can decide to be pointers, to heap data. You just create them with new:

int* np = new int(); // pointer to an integer

// example:
int* np2 = np; // both are sharing that heap integer

The rules are simple: star turns anything into a pointer, and new creates any type on the heap. And you always have those options with every type.

Can point to stack data

This is more of the same freely mixing pointers and stack/heap. A pointer in C++ can also aim at the stack. An ampersand before something means "make a pointer to":

Cat c1; // regular Cat
Cat *cp;
cp = &c1; // cp points to c1

In C++, the same pointer can aim at heap data, or stack data, and it works the same either way. Even better, suppose we have an array of Cats. We can put the Cats inside the array, like structs, and we can still aim pointers at them if we need to.

In C++ you can make a C#-style array of Cats -- make an array of pointers, using Cat*, and use new to create every Cat. But there's almost never a reason to do that (only if you plan to shuffle them around, using the pointers).
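Both kinds of array can be sketched in a few lines. Cat here is a bare-bones placeholder with just an age:

```cpp
#include <cassert>

struct Cat { int age; };

void arrayDemo() {
  Cat cats[3] = { {2}, {5}, {9} }; // the Cats live inside the array itself

  Cat* oldest = &cats[2];          // a pointer can still aim at one of them
  (*oldest).age = 10;
  assert(cats[2].age == 10);

  // C#-style: an array of pointers, each Cat made with new
  Cat* catPtrs[2] = { new Cat{1}, new Cat{4} };
  oldest = catPtrs[1];             // same pointer variable, heap target this time
  assert((*oldest).age == 4);

  delete catPtrs[0];               // heap Cats need cleaning up; the stack array doesn't
  delete catPtrs[1];
}
```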

An advantage, and a drawback, is you can tell by looking what sort of thing you're pointing to. Here aiming at value c1 needs the &, but aiming at cp2 can't have the & (since cp2 is already a pointer):

Cat c1;
Cat* cp2 = new Cat();

Cat* cp1;
cp1 = &c1; // pointer to value-type
cp1 = cp2; // pointer-to-pointer assignment

As you might guess, all these stars and ampersands scare away lots of people. The math works out, and the errors make sure you understand exactly why things aren't matching. But it's not something you learn in a few days.

Explicit dereference

This is about a C# shortcut where it condenses two steps into one. For a C# reference a1.age follows a1 to the actual object, then uses the dot to look it up. That's two steps. Compare to having a1 be a struct, where it's only the second step. There's no following.

In C++, you're required to write "follow the pointer" as a step. Put a star in front of a pointer. It works like this:

Cat* c1 = new Cat(); // sample Cat pointer
(*c1).age = 6; // follow c1 to the Cat, then go to the age

// compare:
Cat c2; // not a pointer
c2.age=3; // don't need the step to follow it

The advantage is you always know when we're using a pointer. (*cp).age lets us know something tricky is going on.

The drawback is obvious: you have to know what symbol to use where.

We need this rule for pointers to ints. Here we can change where np points, or change the contents of what it points to:

int n1=5, n2=9; // sample ints to work with
int* np; // pointer to int

np = &n1; // changing where np points (no star in front)
*np = n2; // follow np to the int, and change that int (need the star) 

A really nice thing about this rule is how it respects types. np = n1; is "clearly" an error, since you're assigning an int to a pointer-to-int, which are completely different types.

Summary

It turns out the stack and the heap are real -- not just part of C#. The stack is fast for creating and deleting, but not for things that change size or need to live after the function ends. Heap data is permanent, but slow and takes more space.

C++'s plan is simple: any type of data can be in either place, and a pointer can go to anything, anywhere. Some uses, or mistakes, can lead to hard-to-find bugs, so be careful.

The reference/value type system is strictly a limitation: you can't point to the stack, and you can't decide where to create something. That allows for a simpler language -- for example there's no symbol to declare a pointer since everything either has to be one or can't be one. But it's still the same basic system as C++.

The biggest oddity of reference types, to a C++ user, is sometimes being forced to use pointers and new: you want an array with Cats inside of it, but since Cat's a class, you're forced to make an array of references to Cats. But that's not too bad in exchange for free garbage collection and fewer bugs.

More changes

Arrays

C++'s array syntax is ugly. The brackets go after the variable, which means the name of the type "array of int" is broken up:

int A[]; // A is a C++ array of integers

// Gah! Can declare normal ints and arrays of ints on the same line:
int n1, N[10]; // n1 is not an array

The reason for that syntax is so we can create stack arrays. int N[10]; is part of the function stack, which means it's fast to make and auto-destroyed when the function ends. If we needed it to live on, we would have used new.

C++ arrays also don't range check and don't store their length. There's no A.Length function in C++.

The idea is that you should normally use the "good" arrays -- List's in C#, Vectors in C++. Those use tricks to let you add items, and have more functions, but are basically arrays and are almost as small and fast. Arrays are for special purposes, mostly speed. And no range-checking and no length is as fast as you get.

The problem is, think of all the times a loop went one past the end, you got a run-time error, and fixed the loop. In C++ there's no error. N[12] reads from the uninitialized data past the end, which is yet another top-10 C++ bug. C# decided to make arrays just a little slower and longer, in order to add safety.

Fun fact: C++'s Vector class can do either: one way range-checks, and another doesn't. You can test your program using the first way, hopefully fix all of the off-the-ends, then ship faster code the second way (which will have only 1 or 2 horrible bugs).
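In the standard library, the checked way is the at() function and the unchecked way is plain square brackets. A minimal sketch, where tryRead is a made-up helper wrapping the checked read:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: read v[i] with range-checking, reporting failure instead of crashing.
bool tryRead(const std::vector<int>& v, std::size_t i, int& out) {
  try {
    out = v.at(i); // at() checks the index and throws std::out_of_range if it's bad
    return true;
  } catch (const std::out_of_range&) {
    return false;  // out is left alone
  }
}

void rangeCheckDemo() {
  std::vector<int> v;
  v.push_back(10);
  v.push_back(20);
  v.push_back(30);

  int value = 0;
  assert(tryRead(v, 1, value) && value == 20);
  assert(!tryRead(v, 12, value)); // caught, where v[12] would just read junk
  assert(v[1] == 20);             // brackets: same answer, no check
}
```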

ref/out / call-by-reference

C++ has only one type of call-by-reference. You pass your variable and the one in the function becomes a hard link back to it. You can use it in a "change this variable" way, or a "fill these empty boxes" way.

In C#, ref and out are the same -- both are simple call-by-reference. The only difference is the error messages. Both make the function's parameter an alias for the variable you passed in.

ref has normal C# error messages -- you can't pass unassigned variables to a function. out rearranges errors to make sense for a "fill these boxes" function. The error about not setting them is mostly a reminder.

The other difference is that C++ doesn't require anything extra when you call the function, only when you write it. If clamp is call-by-reference, use it with clamp(n, 1, 10);.

C# added a safety rule requiring ref/out in the call. That makes call-by-reference stand out more, since new programmers often have trouble with it.
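A sketch of both uses. This clamp is a guess at what such a function might look like, and divide is a made-up "fill these boxes" example; the & after the type is the only marker, and it appears only where the function is written:

```cpp
#include <cassert>

// "Change this variable" use: the & makes n an alias for the caller's variable.
void clamp(int& n, int low, int high) {
  if (n < low) n = low;
  if (n > high) n = high;
}

// "Fill these empty boxes" use: C# would mark both outputs with out.
void divide(int a, int b, int& quotient, int& remainder) {
  quotient = a / b;
  remainder = a % b;
}

void callByReferenceDemo() {
  int n = 15;
  clamp(n, 1, 10);    // nothing at the call says n might change
  assert(n == 10);

  int q, r;           // unassigned on purpose; divide fills them
  divide(17, 5, q, r);
  assert(q == 3 && r == 2);
}
```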

Dispose / Finalize / Destructor

C++ and C# have special functions that run whenever a class instance is destroyed. In C# they are Finalize and Dispose, and you almost never use them. In C++ it's called the Destructor. Like Finalize, it's written using a tilde in front of the class name.

In C++, any class that uses new will also use delete in the destructor. Most normal C++ programs use classes that automatically clean up after themselves. The programs don't need new -- it's in the well-tested classes.

In C++ the destructor runs immediately when the object dies. In C# versions, Finalize runs whenever it's finally garbage-disposed. That could be a while. But it's fine since you rarely need a Finalize. C#'s Dispose is for when you need it to run right away. That's what the special using block is for -- it marks out when you want to destroy the object and immediately run the Dispose function.

In other words, C# is nicer than C++ since you never write destructors. But it's also worse since garbage collection makes them more complicated.
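Here is a sketch of a class that uses new on the inside and cleans up in its destructor. Scores is a made-up class, and the buffersInUse counter exists only so the example can watch the cleanup happen:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class that owns a heap array and frees it in its destructor.
class Scores {
 public:
  static int buffersInUse; // only here so the demo can observe cleanup
  explicit Scores(std::size_t count) : data(new int[count]()) { buffersInUse++; }
  ~Scores() { delete[] data; buffersInUse--; } // runs the moment a Scores dies
  Scores(const Scores&) = delete;              // keep the sketch simple: no copying
  Scores& operator=(const Scores&) = delete;
  int& at(std::size_t i) { return data[i]; }   // no range check in this sketch
 private:
  int* data;
};
int Scores::buffersInUse = 0;

void destructorDemo() {
  {
    Scores s(10); // the user of the class never writes new or delete
    s.at(0) = 42;
    assert(s.at(0) == 42);
    assert(Scores::buffersInUse == 1);
  }               // destructor runs here, immediately
  assert(Scores::buffersInUse == 0);
}
```

The closing brace plays the role of the end of a C# using block, except it works for every stack variable without any special syntax.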

char size

Very old computers used just enough bits to store upper-case letters and a little punctuation. By the time C was written, 8-bit bytes were standard, giving 256 options, which was considered plenty for every symbol on the keyboard and more. In C++, chars are one byte.

A funny side-effect is that C++ doesn't have a byte type, because char means byte. char-arrays are common in C++ and everyone knows it's really a byte array.

By the time C# was written, most programs obviously needed the new Unicode extended letters. So C# uses 2 bytes for characters.

A funny thing is that C#'s 2 bytes is now too small. Unicode symbols that need 3 or 4 bytes are becoming more common (for example, in names of heavy metal bands). But English still only needs one byte, so C# is in in-between land on characters.

Reinterpret casts

As we know, everything is stored in bytes, using 0's and 1's. C++ lets you look at it and read it however you want. You can look at anyone else's memory and read it as if it were some other type. The result is usually junk, but you can do it. As you might guess, Java/C# think this is risky, and it is, and banned it.

This lets you read from a single integer as if it was a size-4 char array (remember, those are bytes):

int n=21; // sample value
char* B = reinterpret_cast<char*>(&n); // B can be used like a char array (C++ arrays have weird syntax)
B[2]=99; // changing the 3rd byte of n

  n (a 4-byte int)
 --------
|        |
 --------
  3 2 1 0
  B (4 bytes)

n and B are both looking at the same 4 bytes, thinking of them in different ways. The last line junks n up pretty bad. But you can use it to see how memory works, and sometimes the trick is useful.
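A compilable sketch of the same trick, wrapped in a made-up function pokeByte. It assumes a 4-byte int, and uses unsigned char so the byte values don't go negative. Which end of the int byte 2 lands in depends on the machine:

```cpp
#include <cassert>

// Hypothetical helper: overwrite one byte of an int through a byte pointer, return the result.
unsigned int pokeByte(unsigned int n, int which, unsigned char value) {
  assert(which >= 0 && which < (int)sizeof(n)); // stay inside the int
  unsigned char* B = reinterpret_cast<unsigned char*>(&n); // view n's bytes
  B[which] = value; // change just that byte
  return n;
}

void reinterpretDemo() {
  unsigned int n = pokeByte(21, 2, 99);
  assert(n != 21); // n is junked up, exactly how depends on byte order

  unsigned int all = 0;
  for (int i = 0; i < (int)sizeof(all); i++)
    all = pokeByte(all, i, 0xFF); // fill every byte with 1's
  assert(all == ~0u);
}
```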

Typedefs

If you remember, C# allows you to pre-make a name for a function type, using the delegate keyword; then use that new name to declare variables. C++ allows you to do that for anything. For example:

typedef Vector<Cat> CatList; // defines the new type name CatList

// use it:
CatList C1, C2; // shortcut for: Vector<Cat> C1, C2;

void lookAtCats(CatList C) { // likewise -- function takes a Vector<Cat> input

C# banned it since it can be confusing when overused (you can have typedef using other typedefs).
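A sketch with the standard library spelling, assuming CatList is meant to be a list of Cats. Cat and oldestAge are made up for the example, and the static_assert is there to show the typedef is only a second name, not a separate type:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

struct Cat { int age; };

typedef std::vector<Cat> CatList; // CatList is now another name for std::vector<Cat>

static_assert(std::is_same<CatList, std::vector<Cat> >::value,
              "a typedef is just a second name for the same type");

// Hypothetical function taking the typedef'd name.
int oldestAge(const CatList& cats) {
  int best = 0;
  for (std::size_t i = 0; i < cats.size(); i++)
    if (cats[i].age > best) best = cats[i].age;
  return best;
}

void typedefDemo() {
  CatList c1;
  c1.push_back(Cat{3});
  c1.push_back(Cat{7});
  std::vector<Cat> plain = c1;   // freely interchangeable
  assert(oldestAge(c1) == 7);
  assert(oldestAge(plain) == 7);
}
```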

Directly creating references

In C++ you can make a call-by-reference type reference without calling a function. This makes g into another name for an age inside of Cats:

int& g = Cats[4].age; // establish g as a reference to that age

The & after the type means it's a reference. g isn't a pointer -- it's another name for that box, and links directly to it. g++; changes the age of that Cat, inside the array.

C# has a limited version of this, but you have to set it using a function. It's clunky compared to the C++ way (but you don't use this trick often).
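As a sketch, with a placeholder Cat and a list of six of them so that index 4 exists:

```cpp
#include <cassert>
#include <vector>

struct Cat { int age; };

void referenceDemo() {
  std::vector<Cat> cats(6, Cat{1}); // six Cats, all age 1

  int& g = cats[4].age;   // g is another name for that one age
  g++;
  assert(cats[4].age == 2);

  int copy = cats[4].age; // compare: a plain int is a separate box
  copy++;
  assert(copy == 3);
  assert(cats[4].age == 2); // the Cat didn't change this time
}
```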

No cast between int and bool

In C#, if(done=true) is a pretty bad bug. Forgetting the second = makes it always run, since it always changes done to true. There's no error message -- just code that mysteriously takes the if, even when done is false.

C++ has an even worse, more common version. if(n=3) accidentally changes n to 3, and always runs. That's because computers store bools as just ints, with 0=false, and anything else = true. 3 counts as true.

That rule can come in handy. In C++ you can write if(c), with a pointer, to check that it isn't null. You can also use if(n) as a shortcut for "if n not equal 0".

But if(n=0) is such a well-known potential bug in C++ that the C# designers specifically wanted to make it impossible (by making it an error.) They could have banned assignment statements inside conditions, or made assignment statements not have return values (Swift does that). Instead they banned casts between ints and bools.

That seems pretty extreme -- like nailing shut one of your windows since the factory makes bad smells on Wednesdays. But the if(n=3) bug is really bad, and they really wanted to be sure you got a compile error.
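Both sides of the int-to-bool rule can be sketched in C++. The function names are made up, and the buggy one spells the typo out in two steps so it compiles without a warning getting in the way:

```cpp
#include <cassert>

// The handy side: an int used directly as a condition.
int countNonZero(const int* values, int count) {
  int total = 0;
  for (int i = 0; i < count; i++)
    if (values[i]) total++; // same as: if (values[i] != 0)
  return total;
}

// The buggy side: what if(n=3) really does.
bool buggyIsThree(int n) {
  bool result = (n = 3); // assigns 3 to n, then 3 converts to true
  return result;         // always true, whatever n started as
}

void intBoolDemo() {
  int values[4] = {0, 5, 0, -2};
  assert(countNonZero(values, 4) == 2); // negative numbers count as true as well
  assert(buggyIsThree(3));
  assert(buggyIsThree(0));              // the bug: 0 "is three"
}
```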

Duplicate variable in nested block scopes

You may have written something to test how block scope works and gotten a confusing error. For example, in this code the loop's i should temporarily cover up the outer i:

int i=876; // outer i
for(int i=0; i<3; i++) { print(i); } // 0 1 2
print(i); // 876, inner i is gone, back to the first one

For real, this is totally fine. It's the same general idea as local variables covering up globals. It would run in C++.

But this is another thing C# thought often led to confusing code, so it was banned. In C++ it might be a low-level "this might be confusing" warning.
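The C++ version of that test, as a sketch you could compile. shadowDemo is a made-up name, and the sum stands in for the print calls:

```cpp
#include <cassert>

int shadowDemo() {
  int i = 876;                // outer i
  int sum = 0;
  for (int i = 0; i < 3; i++) // inner i covers up the outer one
    sum += i;                 // 0 + 1 + 2
  assert(sum == 3);
  return i;                   // inner i is gone, back to the outer one
}
```

Calling shadowDemo() returns 876: the loop never touched the outer variable.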