Wednesday, 10 December 2014

Let reason prevail

Hark! 'Tis the call of a higher-order language. It beckons with sweet words of reduced time solving ancient, low-level problems! The siren speaks true!

Seriously though...

Higher-order programming languages let the competent programmer solve more high-level problems in the same timeframe than she could with a lower-order language. Simply put, higher-order languages (like C++, C#, Python, Ruby, PHP, etc.) let the programmer skip over the low-level "move this byte to this address" kind of programming so that she can solve more interesting and/or more lucrative problems.

There is a cost though: iron. These higher-order languages require smarter compilers and, typically, more RAM and CPU. It's a cost we're happy to pay though -- iron is cheaper than development.

But when someone starts making outlandish claims that a higher-order language is more proficient than a lower-order one (see the string concatenation example in the first section), the reasonable programmer doesn't just gulp that down, even if it does come from the father of C++. Actually, especially if it does come from the father of a higher-order language, since he would have reason to pad out his results (or pretend to be lazy) to make his progeny all the more appealing.

Aside: before we go any further, a disclaimer: I like C++. I like C. It's OK to like both and recognise their strengths. What's NOT OK (imo) is to spread misinformation to highlight the language of your preference. Have honest reasons for preference -- by all means! -- and be objective about comparisons. And the discourse continues!

So let me make a really bold statement:

Any proficient code in a higher-order language can only hope to be (at best) as proficient as proficient code in a language of lower-order for solving the same problem.


Let's take the string vs char* example from above:

std::string has to be implemented (at some point) around a buffer of memory. I don't care if it's char*, wchar_t* or whatever. It's a buffer which, at some point, was obtained via malloc() (even if you want to say it was obtained by new char[], that still boils down to essentially a malloc, so let's stop arguing semantics). The C++ compiler affords us the ability to overload operators, such that we can do:
string1 + string2
and get another string. Under the hood, this is allocating a third string object and the associated memory and doing some memory copying. One way might be to malloc() on strlen(string1) + strlen(string2) + 1 char for the null terminator, then strcpy() and strcat() in the parts. There are quite a few ways this could be done, but this is one.
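To make that concrete, here is a minimal sketch of the malloc() + strcpy() + strcat() route just described, next to a memcpy() variant that reuses the lengths it has already measured. The function names concat_naive and concat_memcpy are made up for illustration; this is not the benchmark code and not how any particular std::string is implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the benchmark code): returns a newly
 * allocated string holding a followed by b, or NULL if malloc() fails.
 * The caller owns the result and must free() it. */
char *concat_naive(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* Room for both strings plus one char for the null terminator. */
    char *out = malloc(strlen(a) + strlen(b) + 1);
    if (out == NULL)
        return NULL;

    strcpy(out, a);  /* copy the first part, terminator included */
    strcat(out, b);  /* scan out for its end again, then append b */
    return out;
}

/* Same result, but the lengths measured for malloc() are reused, so
 * neither string is scanned a second time. */
char *concat_memcpy(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);

    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);  /* the +1 copies b's terminator too */
    return out;
}
```

Both return the same bytes; which one is faster on a given platform is exactly the kind of thing you'd have to measure rather than assume.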

Now the problem lies here: in an opaque, higher-order implementation of string concatenation (for example), the best outcome we could hope for is the one which is fastest in C, i.e. the one we would discover by trying all paths, including strcat, memcpy, strcpy, etc. So let's assume that the best path was chosen for std::string's + operator overload. That still makes it only as fast as the best implementation in C. Take a step back and realise that the + operator may also do clever things, like allocating more memory than required to save on a realloc() later, as well as rudimentary bounds-checking or what-have-you, and it's easy to see why the string variant is 10-20x slower. Oddly enough, the disparity was greater on a win32 version I did earlier today, where the sprintf() version took only 48ms for 32768 iterations vs over 600ms for the string version. The code linked here reports the following results on my Linux machine:

C++ function: 32768 runs took 5827 ms
C function: 32768 runs took 706 ms
C function (sprintf): 32768 runs took 4826 ms

Which still shows that the C++ version is roughly 8x slower than the plain C function. Some of the difference I've experienced between platforms may be due to Microsoft optimisations as well as stdlib being (in my experience) much slower on Windows than Linux (not counting boost, which I haven't used, but which benchmarks well). The C++ version may optimise for frequent use better than the quoted C version -- but the C programmer is free to update her code accordingly, as required. Again, remember, we're talking about proficient code solutions, so when you change the parameters of the argument, the code is free to change too.
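As a rough sketch of what "updating her code accordingly" might look like for repeated appends: the C programmer can track length and capacity herself and over-allocate geometrically, much like the clever allocation trick mentioned earlier. The struct strbuf type and its functions are hypothetical names for this example, and the starting capacity of 16 bytes is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration: a string buffer that remembers
 * its length and capacity, so an append neither re-scans the existing
 * text nor reallocates every time. Start from a zeroed struct. */
struct strbuf {
    char  *data;  /* null-terminated once something has been appended */
    size_t len;   /* characters in use, excluding the terminator */
    size_t cap;   /* bytes currently allocated */
};

/* Appends s to sb, doubling the capacity whenever it runs out.
 * Returns 0 on success, -1 if realloc() fails (sb is left unchanged). */
int strbuf_append(struct strbuf *sb, const char *s)
{
    assert(sb != NULL && s != NULL);

    size_t n = strlen(s);
    size_t need = sb->len + n + 1;  /* +1 for the null terminator */

    if (need > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 16;
        while (cap < need)
            cap *= 2;

        /* realloc(NULL, cap) behaves like malloc(cap) on first use. */
        char *p = realloc(sb->data, cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = cap;
    }

    memcpy(sb->data + sb->len, s, n + 1);  /* copies the terminator too */
    sb->len += n;
    return 0;
}

/* Releases the buffer and resets sb to the empty state. */
void strbuf_free(struct strbuf *sb)
{
    assert(sb != NULL);
    free(sb->data);
    sb->data = NULL;
    sb->len = 0;
    sb->cap = 0;
}
```

Usage is a zero-initialised `struct strbuf sb = {0};` followed by calls to strbuf_append(&sb, "...") and a final strbuf_free(&sb). Whether this beats a given std::string implementation is, again, something to measure.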

This shouldn't be surprising. The C++ code has to deal with a more generic case than the C version (and provides a lot of extra functionality which is probably worth the cost). But let me re-iterate:

Any proficient code in a higher-order language can only hope to be (at best) as proficient as proficient code in a language of lower-order for solving the same problem.

Note the use of the word "proficient". If I write shitty C code, I bet you can write good C++ that is faster. Same goes for any other pairing: if I write shitty low-order code, I'm sure you can write good high-order code which out-performs my shitty code.

The request I have to the programming community is this:

Please be honest and stop trying to win a "my language is better than yours" argument with pure lies and FUD. When speaking of well-written, proficient code, Ruby/Python/.NET IL/PHP/C++/whatever is NOT faster than C or assembly. The lower you go, the more specific your instructions can be and the more efficient the overall run can be.

We all accept the costs of higher-order languages because of what they offer us:
  • Quicker to code solutions to the problems which are interesting and/or lucrative, therefore cheaper (overall) than the iron required to run the output
  • Safe memory handling
  • Rich libraries
  • Abstraction from the iron
  • Easier-to-read (and therefore maintain) code
  • And a host of other benefits

I'm open to challenges on this though. Provide a problem which is relatively small to solve in a lower-order language and which you think is faster (not smaller or more elegant) in a higher-order language, and I'll see what I can do to prove my point. Remember: a small problem, like something mathematical or string manipulation.
