Significant performance difference of std clock between different machines

Testing something else, I stumbled across something that I haven't managed to figure out yet.

Let's look at this snippet:

#include <iostream>
#include <chrono>

int main() {
    int i = 0;
    using namespace std::chrono_literals;

    auto const end = std::chrono::system_clock::now() + 5s;
    while (std::chrono::system_clock::now() < end) {
        ++i;
    }
    std::cout << i;
}

I've noticed that the counts heavily depend on the machine I execute it on.

I've compiled with gcc 7.3, gcc 8.2, and clang 6.0, with -std=c++17 -O3.

On an i7-4790 (kernel 4.17.14-arch1-1-ARCH): ~3e8

but on a Xeon E5-2630 v4 (kernel 3.10.0-514.el7.x86_64): ~8e6
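
In per-call terms that is roughly 5 s / 3e8 ≈ 17 ns per system_clock::now() call on the i7 versus 5 s / 8e6 ≈ 600 ns on the Xeon. To check whether a single call is really that expensive in isolation, a small micro-benchmark along these lines could help (the call count of 10 million is an arbitrary choice of mine, and the empty asm statement is just a gcc/clang trick to keep the call from being optimized away):

#include <chrono>
#include <iostream>

int main() {
    // Rough micro-benchmark: average cost of one system_clock::now() call.
    constexpr int calls = 10'000'000;

    auto const start = std::chrono::steady_clock::now();
    for (int n = 0; n < calls; ++n) {
        auto t = std::chrono::system_clock::now();
        // Prevent the compiler from removing the call (gcc/clang only).
        asm volatile("" : : "g"(&t) : "memory");
    }
    auto const stop = std::chrono::steady_clock::now();

    auto const total_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::cout << "avg per call: " << total_ns / calls << " ns\n";
}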

Now this is a difference that I would like to understand, so I've checked with perf stat -d.

on the i7:

       4999.419546      task-clock:u (msec)       #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               120      page-faults:u             #    0.024 K/sec
    19,605,598,394      cycles:u                  #    3.922 GHz                      (49.94%)
    33,601,884,120      instructions:u            #    1.71  insn per cycle           (62.48%)
     7,397,994,820      branches:u                # 1479.771 M/sec                    (62.53%)
            34,788      branch-misses:u           #    0.00% of all branches          (62.58%)
    10,809,601,166      L1-dcache-loads:u         # 2162.171 M/sec                    (62.41%)
            13,632      L1-dcache-load-misses:u   #    0.00% of all L1-dcache hits    (24.95%)
             3,944      LLC-loads:u               #    0.789 K/sec                    (24.95%)
             1,034      LLC-load-misses:u         #   26.22% of all LL-cache hits     (37.42%)

   5.003180401 seconds time elapsed

   4.969048000 seconds user
   0.016557000 seconds sys

on the Xeon:

       5001.000000      task-clock (msec)         #    0.999 CPUs utilized
                42      context-switches          #    0.008 K/sec
                 2      cpu-migrations            #    0.000 K/sec
               412      page-faults               #    0.082 K/sec
    15,100,238,798      cycles                    #    3.019 GHz                      (50.01%)
       794,184,899      instructions              #    0.05  insn per cycle           (62.51%)
       188,083,219      branches                  #   37.609 M/sec                    (62.49%)
            85,924      branch-misses             #    0.05% of all branches          (62.51%)
       269,848,346      L1-dcache-loads           #   53.959 M/sec                    (62.49%)
           246,532      L1-dcache-load-misses     #    0.09% of all L1-dcache hits    (62.51%)
            13,327      LLC-loads                 #    0.003 M/sec                    (49.99%)
             7,417      LLC-load-misses           #   55.65% of all LL-cache hits     (50.02%)

   5.006139971 seconds time elapsed

What stands out is the low number of instructions per cycle on the Xeon, as well as the nonzero context switches, which I don't understand. However, I wasn't able to use these diagnostics to come up with an explanation.

And to add a bit more weirdness to the problem: while trying to debug this, I've also compiled statically on one machine and executed the binary on the other.

On the Xeon, the statically compiled executable gives a ~10% lower count, with no difference whether it was compiled on the Xeon or on the i7.

Doing the same thing on the i7, the counter actually drops from ~3e8 to ~2e7.

So in the end I'm now left with two questions:

  • Why do I see such a significant difference between the two machines?
  • Why does a statically linked executable perform worse, when I would expect the opposite?

Edit: after updating the kernel on the CentOS 7 machine to 4.18, the counter actually drops further, from ~8e6 to ~5e6.

Interestingly though, perf shows different numbers:

       5002.000000      task-clock:u (msec)       #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               119      page-faults:u             #    0.024 K/sec
       409,723,790      cycles:u                  #    0.082 GHz                      (50.00%)
       392,228,592      instructions:u            #    0.96  insn per cycle           (62.51%)
       115,475,503      branches:u                #   23.086 M/sec                    (62.51%)
            26,355      branch-misses:u           #    0.02% of all branches          (62.53%)
       115,799,571      L1-dcache-loads:u         #   23.151 M/sec                    (62.51%)
            42,327      L1-dcache-load-misses:u   #    0.04% of all L1-dcache hits    (62.50%)
                88      LLC-loads:u               #    0.018 K/sec                    (49.96%)
                 2      LLC-load-misses:u         #    2.27% of all LL-cache hits     (49.98%)

   5.005940327 seconds time elapsed

   0.533000000 seconds user
   4.469000000 seconds sys

It's interesting that there are no more context switches and instructions per cycle went up significantly, but the cycles (and therefore the clock rate) are super low!
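
Almost all of the elapsed time is now reported as sys time (0.533 s user vs. 4.469 s sys), so it looks like each now() call ends up doing a real syscall rather than going through the vDSO fast path. That is only a guess on my part, but it can be checked by printing the kernel's current clocksource and timing raw clock_gettime(CLOCK_REALTIME) calls, which is what system_clock::now() typically maps to on Linux with libstdc++:

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <time.h>

int main() {
    // Which clocksource is the kernel using? With "tsc" the vDSO can serve
    // clock_gettime() entirely in user space; other sources (e.g. hpet,
    // acpi_pm) typically force a syscall per call.
    std::ifstream cs("/sys/devices/system/clocksource/clocksource0/current_clocksource");
    std::string source;
    std::getline(cs, source);
    std::cout << "clocksource: " << source << '\n';

    constexpr long calls = 1'000'000;  // arbitrary sample size
    timespec ts{};

    auto const start = std::chrono::steady_clock::now();
    for (long n = 0; n < calls; ++n) {
        clock_gettime(CLOCK_REALTIME, &ts);
    }
    auto const stop = std::chrono::steady_clock::now();

    auto const total_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::cout << "clock_gettime avg: " << total_ns / calls << " ns/call\n";
}

If the clocksource on the Xeon box turns out to be something other than tsc, that alone could explain both the high per-call cost and the user/sys split, but I haven't verified this.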

C/C++ vs. Rust: A developer’s perspective

In this post, you'll see the differences between Rust and C/C++ from a developer’s perspective.

Originally published by Maourice Gonzalez at https://www.onmsft.com

C++ is an incredibly fast and efficient programming language. Its versatility knows no bounds and its maturity ensures support and reliability are second to none. Code developed in C++ is also extremely portable; all major operating systems support it. Many developers begin their coding journey with the language, and this is no coincidence. Being object-oriented means it does a very good job of teaching concepts like classes, inheritance, abstraction, encapsulation and polymorphism. Its concepts and syntax can be found in modern languages like C#, Java and Rust. It provides a great foundation that serves as a high-speed on-ramp to the more popular, easier to use and modern alternatives.

Now it’s not all rosy. C++ has a very steep learning curve and requires developers to apply best practices to the letter or risk ending up with unsafe and/or poorly performing code. The small footprint of the standard library, while most times considered a benefit, also adds to the level of difficulty. This means successfully using C++ to create useful complex libraries and applications can be challenging. There is also very little offered in terms of memory management; developers must do this themselves. Novice programmers could end up with debugging nightmares as their lack of experience leads to memory corruption and other sticky situations. This last point has led many companies to explore fast-performing, safe and equally powerful alternatives to C++. For today’s Microsoft that means Rust.

The majority of vulnerabilities fixed and with a CVE [Common Vulnerabilities and Exposures] assigned are caused by developers inadvertently inserting memory corruption bugs into their C and C++ code - Gavin Thomas, Microsoft Security Response Center

Rust began as a personal project by a Mozilla employee named Graydon Hoare sometime in 2006. This ambitious project was in pre-release development for almost a decade, finally launching version 1.0 in May 2015. In what seems to be the blink of an eye it has stolen the hearts of hordes of developers going as far as being voted the most loved language four years straight since 2016 in the Stack Overflow Developer Survey.

The hard work has definitely paid off. The end result is a very efficient language which is characteristically object-oriented. The fact that it was designed to be syntactically similar to C++ makes it very easy to approach. But unlike the aforementioned, it was also designed to be memory-safe, while also employing a form of memory management without the explicit use of garbage collection.

The ugly truth is software development is very much a trial-and-error endeavor. With that said, Rust has gone above and beyond to help us debug our code. The compiler produces extremely intuitive and user-friendly error messages, along with direct links to relevant documentation to aid with troubleshooting. This means if the problem is not evident, most times the answer is a click away. I’ve found myself rarely having to fire up my browser to look for solutions outside of what the Rust compiler offers in terms of explanation and documentation.

Rust does not have a garbage collector but most times still allocates and releases memory for you. It’s also designed to be memory-safe, unlike C++, which very easily lets you get into trouble with dangling pointers and data races. In contrast, Rust employs concepts which help you prevent and avoid such issues.

There are many other factors which have steered me away from C++ and onto Rust. But to be honest it has nothing to do with all the great stuff we’ve just explored. I came to Rust on a journey that began with WebAssembly. What started with me looking for a more efficient alternative to JavaScript for the web turned into figuring out just how powerful Rust turns out to be. From its seamless interop…

Automatically generate binding code between Rust, WebAssembly, and JavaScript APIs. Take advantage of libraries like web-sys that provide pre-packaged bindings for the entire web platform. – Rust website

To how fast and predictable its performance is. Everything in our lives evolves. Our smartphones, our cars, our home appliances, our own bodies. C++, while still incredibly powerful, fast and versatile, can only take us so far. There is no harm in exploring alternatives, especially one as exceptional and with as much promise as Rust.

What do you guys think? Have you or would you give Rust a try? Let us know your thoughts in the comments section below.

Thanks for reading

Why do C and C++ allow the expression (int) + 4*5?

(int) + 4*5;

Why is this (adding a type with a value) possible? (tried with g++ and gcc.)

I know that it doesn't make sense (and has no effect), but I want to know why this is possible.
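
For what it's worth, my current reading of the grammar (which is exactly what I'd like confirmed or corrected) is that the (int) is a cast applied to the unary-plus expression +4, with the result then multiplied by 5, and the whole line is just an expression statement whose value is discarded:

#include <iostream>

int main() {
    // Assumed parse: the cast applies to the unary-plus expression "+4",
    // and the result is then multiplied by 5.
    int a = (int) + 4 * 5;    // parsed as ((int)(+4)) * 5
    int b = ((int)(+4)) * 5;  // same expression with explicit grouping
    std::cout << a << ' ' << b << '\n';  // prints "20 20"

    // As a bare statement the value is simply discarded, which is why the
    // line compiles (possibly with an unused-value warning) but does nothing.
    (int) + 4 * 5;
}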