Foreword
I'm a self-taught programmer, so my methods and algorithms should not be expected to be state of the art, nor tuned to best design practices or performance. I wrote the code with the sole aim of getting the task done, and left optimization aside.
The results here are simply what I found. I think that sitting down and optimizing the code could turn the tables, but not everyone is able to write efficient code (that is, code optimized for best performance), especially a scientist who codes rather than a computer scientist.
Motivation
For several months I've been wondering whether C or C++ is more efficient at science-related tasks. As far as I know, the most common tasks in science involve linear algebra, and one of the simplest is the sum of two vectors. Since I'm preparing some study materials, I decided to test it, and here is what I've found so far.
I wrote programs that compare the performance of regular C arrays (also valid in C++) and STL arrays (exclusive to C++), taking as input two random vectors of up to 200000 components from a file.
One program run consists of:
- Read 1000 components of two random vectors from a given file.
- Sum the two vectors component by component. Report to standard output the number of components and the time in milliseconds that the whole process took.
- Increase the number of components by 1000, then repeat the read-sum-report procedure stated above.
- Repeat increasing size until the vectors reach 200000 components.
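The read-sum-report loop above can be sketched roughly as follows. This is not the original code: the names `random_vector`, `sum_c_style`, `sum_container`, `time_ms` and `run_benchmark` are hypothetical, the input is generated in memory instead of being read from a file, `std::vector` stands in for the STL container because the size changes at run time, and only the sum itself is timed rather than the whole process.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: pseudo-random input generated in memory.
// The original programs read their vectors from a file instead.
std::vector<int> random_vector(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000);
    std::vector<int> v(n);
    for (auto& x : v) x = dist(gen);
    return v;
}

// C-style sum: raw pointers plus an explicit length.
void sum_c_style(const int* a, const int* b, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Container-style sum: std::vector used as a stand-in (assumption).
void sum_container(const std::vector<int>& a, const std::vector<int>& b,
                   std::vector<int>& out) {
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
}

// Wall-clock time of a single call, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

struct Sample {
    std::size_t n;        // number of components
    double c_ms;          // time of the C-style sum
    double container_ms;  // time of the container-style sum
};

// Grow the vectors by `step` components (step must be > 0) until
// `max_n` is reached, timing both sums at every size.
std::vector<Sample> run_benchmark(std::size_t step, std::size_t max_n) {
    std::vector<Sample> samples;
    for (std::size_t n = step; n <= max_n; n += step) {
        const auto a = random_vector(n, 1);
        const auto b = random_vector(n, 2);
        std::vector<int> out(n);
        const double c_ms = time_ms(
            [&] { sum_c_style(a.data(), b.data(), out.data(), n); });
        const double k_ms = time_ms([&] { sum_container(a, b, out); });
        samples.push_back({n, c_ms, k_ms});
    }
    return samples;
}
```

Calling `run_benchmark(1000, 200000)` would produce one sample per vector size, which can then be printed or averaged over repeated runs. A single timing at these sizes is noisy, so repeating the measurement matters more than the exact timer used.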
The code is available here
Results
Well, using my unoptimized algorithm it seems that C++ is indeed a little bit faster than C.
The time-to-completion was measured 300 times on the same machine, and the resulting run-times were averaged to compensate for the effects of any other processes running.
A linear fit seemed suitable for the resulting data. The blue marks on the graph represent C++ run-times, whilst the purple marks represent those of C.
The fit was made using gnuplot with the fit functions f(x) for C and g(x) for C++ (details below).
Fitting for C with f(x)
f(x)=A*x+B
The relevant output of the fitting was
After 8 iterations the fit converged.
final sum of squares of residuals : 68.4475
relative change during last iteration : -1.00902e-13
degrees of freedom (FIT_NDF) : 197
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.589448
variance of residuals (reduced chisquare) = WSSR/ndf : 0.347449
Final set of parameters Asymptotic Standard Error
======================= ==========================
A = 0.000385433 +/- 7.274e-07 (0.1887%)
B = 1.36903 +/- 0.08389 (6.127%)
Fitting for C++ with g(x)
function used for fitting: g(x)
g(x)=C*x+D
After 8 iterations the fit converged.
final sum of squares of residuals : 48.4348
relative change during last iteration : -1.78975e-14
degrees of freedom (FIT_NDF) : 197
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.495845
variance of residuals (reduced chisquare) = WSSR/ndf : 0.245862
Final set of parameters Asymptotic Standard Error
======================= ==========================
C = 0.000367315 +/- 6.119e-07 (0.1666%)
D = 1.16317 +/- 0.07056 (6.067%)
Conclusions
For small vectors, the execution times of both languages are indistinguishable, but C++ seems to perform better than C as the size of the vectors increases. It could be a matter of design; more tests with different algorithms should be performed.
The difference between the fitted curves is the linear function
d(x) = f(x) - g(x) = (A-C)*x + (B-D) = 0.000018118*x + 0.20586
Extrapolating to N = 1 000 000, the difference between run-times would be just 18.32386 ms, so what's the point of using C++ over C if the performance difference is so small? Well, because of Object Oriented Programming, that's why. Besides, the C++ std::array is expected to be optimized at least as well as the regular int array[] of C.