Tuesday, May 24, 2016

Neovim: Next generation vim


Motivation

I was browsing the web looking for Vim plugins that provide Go syntax highlighting and autocompletion. I found out about the Neovim project and got interested in it, since I'm a Vim user. Here's what I've found.


Neovim?

The official Neovim webpage states:

Neovim is an extension of Vim: feature-parity and backwards compatibility are high priorities. If you are already familiar with Vim, see :help nvim-from-vim to learn about the differences.

After using it for more than a week I have to say it's worth using. Customization is really easy, and as for performance, Neovim loads faster than my Vim (this could be due to the installed plugins or other setups, though). Installing plugins is pretty easy with the aid of vim-plug, which can be installed with a single curl request from the command line.


Installation and use of Plugins

Detailed installation instructions are in my repo https://github.com/Daniel-M/nvimConfigFiles, where I store my Neovim configuration files. The README describes all the steps to get Neovim up and running on Debian.



Monday, May 9, 2016

C++ seems faster than C at summing big arrays!!

Foreword

I'm a sort of self-taught programmer, so it is to be expected that my methods and algorithms aren't state of the art, nor optimized for best design practices and performance. I wrote the code thinking only about getting the task done, leaving aside code optimization and all that stuff.

The results here are just what I have found. I think that sitting down and optimizing the code could turn the tables, but not everyone is capable of writing code efficiently (a.k.a. optimizing it for best performance), especially if you are a scientist who codes (and not a computer scientist).


Motivation

For several months I've been wondering whether C or C++ is more efficient at science-related tasks. As far as I know, the most common tasks in science involve linear algebra, and one of the simplest is the sum of vectors. Since I'm preparing some study materials, I decided to test it out, and here is what I've found so far.


C++ seems faster than C at summing big arrays of integers

I wrote programs that compare the performance of regular C arrays (also valid in C++) and STL arrays (exclusive to C++) by taking two random vectors of up to 200000 components as input from a file.

One program run consists of:

  • Read 1000 components of two random vectors from a given file.
  • Sum the pair of vectors component by component. Report the number of components and the milliseconds the whole process took to standard output.
  • Increase the number of components by 1000, then repeat the read-sum-report procedure stated above.
  • Keep increasing the size until the vectors reach 200000 components.
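
The loop described above can be sketched roughly as follows. This is not the code from the repository, just a minimal illustration under a few assumptions: the vectors are generated in memory with a pseudo-random generator instead of being read from a file, std::vector stands in for the arrays, and the names random_vector, sum_vectors, Sample, run_benchmark and report are made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ostream>
#include <random>
#include <vector>

// Assumption: pseudo-random integers stand in for the values read from a file.
std::vector<int> random_vector(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000);
    std::vector<int> v(n);
    for (auto& x : v) x = dist(gen);
    return v;
}

// Component-by-component sum of two vectors of equal length.
std::vector<int> sum_vectors(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
    return c;
}

// One measurement: number of components, elapsed milliseconds, and the last
// summed component (kept so the sum is actually used).
struct Sample {
    std::size_t n;
    double ms;
    int last;
};

// Time the generate-and-sum step for n = step, 2*step, ..., max_n (step > 0).
std::vector<Sample> run_benchmark(std::size_t step, std::size_t max_n) {
    assert(step > 0);
    std::vector<Sample> samples;
    for (std::size_t n = step; n <= max_n; n += step) {
        const auto start = std::chrono::steady_clock::now();
        const auto a = random_vector(n, 1);
        const auto b = random_vector(n, 2);
        const auto c = sum_vectors(a, b);
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        samples.push_back({n, elapsed.count(), c.back()});
    }
    return samples;
}

// Write one "components milliseconds" line per measurement.
void report(const std::vector<Sample>& samples, std::ostream& out) {
    for (const auto& s : samples) out << s.n << ' ' << s.ms << '\n';
}
```

Calling report(run_benchmark(1000, 200000), std::cout) from main (with <iostream> included) would print one line per vector size, which is the kind of data the fits in the Results section work on.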

The code is available here


Results

Well, using my unoptimized algorithm it seems that C++ is indeed a little bit faster than C.
The time to completion was measured 300 times on the same machine, and the resulting run times were averaged to compensate for the effects of any other processes running.
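
Averaging repeated measurements could look something like the sketch below. It assumes the repetitions happen inside a single process, which may not be how the 300 measurements were actually taken, and average_runtime_ms is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run `task` `repetitions` times and return the mean wall-clock time in ms.
double average_runtime_ms(const std::function<void()>& task, int repetitions) {
    assert(repetitions > 0);
    double total_ms = 0.0;
    for (int r = 0; r < repetitions; ++r) {
        const auto start = std::chrono::steady_clock::now();
        task();
        const auto stop = std::chrono::steady_clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / repetitions;
}
```

Passing a lambda that performs the vector sum together with 300 as the repetition count gives one averaged run time per vector size. A plain mean smooths out, but does not remove, spikes caused by other processes.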
A fit to linear functions seemed suitable for the resulting data. The blue marks on the graph represent C++ run times, while the purple marks represent those of C.
The fits were made with gnuplot's fit command, using f(x) for C and g(x) for C++ (details below).


C vs C++ Benchmarking


Fitting for C with f(x)

f(x)=A*x+B

The relevant output of the fitting was


After 8 iterations the fit converged.
final sum of squares of residuals : 68.4475
relative change during last iteration : -1.00902e-13

degrees of freedom (FIT_NDF) : 197
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.589448
variance of residuals (reduced chisquare) = WSSR/ndf : 0.347449

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
A               = 0.000385433      +/- 7.274e-07    (0.1887%)
B               = 1.36903          +/- 0.08389      (6.127%)

Fitting for C++ with g(x)


function used for fitting: g(x)
g(x)=C*x+D

After 8 iterations the fit converged.
final sum of squares of residuals : 48.4348
relative change during last iteration : -1.78975e-14

degrees of freedom (FIT_NDF) : 197
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.495845
variance of residuals (reduced chisquare) = WSSR/ndf : 0.245862

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
C               = 0.000367315      +/- 6.119e-07    (0.1666%)
D               = 1.16317          +/- 0.07056      (6.067%)
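
For a straight line, the same kind of fit can also be obtained in closed form with ordinary least squares. The sketch below is a stand-in for gnuplot's iterative fit, not what gnuplot runs internally, so small numerical differences in the parameters are to be expected. Line and fit_line are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Slope and intercept of the line y = slope*x + intercept.
struct Line {
    double slope;
    double intercept;
};

// Unweighted ordinary least-squares fit of a straight line to (x, y) points.
Line fit_line(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;  // zero only if all x are equal
    assert(denom != 0.0);
    const double slope = (n * sxy - sx * sy) / denom;
    const double intercept = (sy - slope * sx) / n;
    return {slope, intercept};
}
```

Here x would hold the vector sizes and y the averaged run times in milliseconds, so the returned slope and intercept play the roles of A and B (or C and D) in the fits above.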


Conclusions

At the beginning, the execution times of both languages are indistinguishable, but it seems that C++ performs better than C as the size of the vectors increases. It could be a matter of design; more tests with different algorithms should be performed.

The difference between the fitted curves is the linear function
D(x) = f(x) - g(x) = (A - C)*x + (B - D) = 0.000018118*x + 0.20586
Clearly, when N = 1 000 000 the difference between run times would be just 18.32386 ms, so what's the point of using C++ over C if the performance difference is so small? Well, because of Object Oriented Programming, that's why. Besides, C++'s std::array is expected to be optimized compared with the regular int array[] of C.
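
The gap between the two fitted lines can be evaluated directly from the parameters reported by gnuplot: subtracting term by term, f(x) - g(x) = (A - C)*x + (B - D). A minimal sketch, with hypothetical names (kA, kB, kC, kD, difference_ms) and the parameter values copied from the fit output:

```cpp
#include <cassert>

// Fitted parameters copied from the gnuplot output (run times in ms).
constexpr double kA = 0.000385433, kB = 1.36903;  // C:   f(x) = A*x + B
constexpr double kC = 0.000367315, kD = 1.16317;  // C++: g(x) = C*x + D

double f(double x) { return kA * x + kB; }
double g(double x) { return kC * x + kD; }

// Predicted run-time gap in ms between C and C++ for x components.
double difference_ms(double x) {
    assert(x >= 0.0);
    return f(x) - g(x);  // equals (A - C)*x + (B - D)
}
```

For example, difference_ms(1000000.0) comes out at roughly 18.32 ms. Keep in mind that this extrapolates the fits well beyond the 200000 components that were actually measured.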

Friday, May 6, 2016

Introduction to Python Programming for Biologists

I prepared some study materials and a few examples for an express Python course that I gave at the Institute of Biology of my alma mater.

The study materials, along with example code, are available in the repository https://github.com/Daniel-M/IntroPythonBiologos

The study materials consist of Jupyter notebooks and PDF documents, located in the docs/notes folder.

