You’ve done your data science work in Python. The numbers make sense; the results are nontrivial. Now you just need to put them into practice. You want to obtain the same outstanding results – only now on new data arriving in real time, twenty-four hours a day, seven days a week.
Should you just clean up your Python code from the Jupyter notebook (make it work, make it right, make it fast), package it as a collection of modules – as a library – and put it into production for real-time use by others?
Sometimes the answer is yes. More often than not, though, it is 'no.' Python is not an ideal language for production use. It is typically slower than compiled languages such as C++ and JIT-compiled ones such as Java. There’s that global interpreter lock (GIL), which in the standard CPython interpreter stops multiple threads from executing Python bytecode in parallel. Python’s duck and dynamic typing are a double-edged sword: those annoying checks that you were happy to forgo while prototyping could come back to bite you when a corner case surfaces in real-life use. And corner cases always do surface.
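To make the typing point concrete, here is a minimal sketch. The function names `notional` and `checked_notional`, and the scenario of a price arriving from a data feed as text, are hypothetical and chosen only for illustration.

```python
from numbers import Real


def notional(price, quantity):
    """Prototype version: trusts its inputs, as notebook code often does."""
    return price * quantity


def checked_notional(price, quantity):
    """Stricter version: rejects non-numeric inputs instead of guessing."""
    for name, value in (("price", price), ("quantity", quantity)):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(
                f"{name} must be a real number, got {type(value).__name__}"
            )
    return price * quantity


# Clean data behaves as expected.
print(notional(101.5, 2))      # 203.0

# A price that arrives as text does not fail: str * int repeats the string.
print(notional("101.5", 2))    # 101.5101.5

# The checked version surfaces the problem at the boundary instead.
try:
    checked_notional("101.5", 2)
except TypeError as exc:
    print(f"rejected: {exc}")
```

The unchecked call returns a plausible-looking value rather than raising an error, which is exactly the kind of corner case that tends to go unnoticed in a notebook and surface in production.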
So, you begin to ponder another question: C++ or Java?
C++: The hardest programming language of the lot
C++ is arguably the toughest programming language, fit only for the smartest people. It compiles to native machine code, so it runs close to the metal. C++ has been the favourite of hardcore developers (and quants) ever since Bjarne Stroustrup’s language was first released commercially in 1985.
A C++ pointer is a variable that holds the address of an object in memory. But with the great power of low-level memory access comes the great responsibility for managing the lifetime and ownership of objects. The side effect is this: you think you are debugging a Monte Carlo engine or PDE solver, whereas in actual fact you are spending most of your time debugging memory access. This is especially true if you are new to C++, and it takes time, potentially years, to get quick at it.
Smart pointers make the task of memory management somewhat easier but do not eliminate the problem completely.
Java: The dummies’ C++?
Java was created by James Gosling, originally with fancy television sets in mind, as a dumbed-down version of its ten-years-senior cousin.
Java’s garbage collector does memory management for you. In its simplest form, it works in two steps. During the mark step it identifies which objects are still reachable, and therefore in use, and which are not. During the sweep step it reclaims the memory of the objects identified during the mark step as garbage collectable.
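The two steps can be sketched with a toy model. This is an illustration of the mark-and-sweep idea only, not how any real JVM collector is implemented (production collectors are typically generational and largely concurrent). The `heap` dictionary, the object names, and the `mark` and `sweep` helpers are all hypothetical.

```python
def mark(heap, roots):
    """Mark step: return the set of object ids reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj_id = stack.pop()
        if obj_id in marked:
            continue
        marked.add(obj_id)
        stack.extend(heap[obj_id])  # follow outgoing references
    return marked


def sweep(heap, marked):
    """Sweep step: drop every unmarked object; return the freed ids."""
    garbage = sorted(obj_id for obj_id in heap if obj_id not in marked)
    for obj_id in garbage:
        del heap[obj_id]
    return garbage


# Toy heap: object id -> ids of the objects it references.
heap = {
    "order_book": ["bid", "ask"],
    "bid": [],
    "ask": [],
    "stale_quote": ["orphan"],  # nothing live points here
    "orphan": ["stale_quote"],  # a cycle, but still unreachable
}
roots = ["order_book"]

live = mark(heap, roots)
freed = sweep(heap, live)
print(sorted(live))  # ['ask', 'bid', 'order_book']
print(freed)         # ['orphan', 'stale_quote']
```

Because liveness is decided by reachability from the roots, the two objects that only reference each other are still collected.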
In most non-real-time applications the garbage collector shouldn’t be a problem. However, if you are writing a high-frequency trading system, the garbage collector may cause intrusive latency spikes.
There are options to help mitigate this. The garbage collectors that ship with Oracle’s JVM are highly configurable. Azul Systems’ Zing is an alternative Java Development Kit (JDK) designed to avoid GC pauses and application timeout issues without the need to keep tuning the Java Virtual Machine (JVM).
Nondeterministic latency is also a concern in Java when it comes to real-time systems. However, there have been advances. The high-performance inter-thread messaging library, LMAX Disruptor, has brought this latency down through mechanical sympathy for the hardware that it is running on and through being lock-free.
The fact that Java compiles to bytecode and runs on a virtual machine means that it is highly portable and can run on Windows, Linux, and macOS with minimal tweaking. C++ source code is, in theory, portable, but compilers may behave differently on different systems, so a lot more tweaking is required.
Java’s rigid syntax makes it easier for programming environments, such as Eclipse and IntelliJ IDEA, to support powerful refactoring. But this increase in productivity comes at a price. Java’s generics are closer to syntactic sugar (they are implemented by type erasure, so type parameters are not available at run time) than C++’s powerful templates, which generate specialised code for each type at compile time.
Both languages have advanced but legacy systems are stuck in the past
Both C++ and Java have advanced in recent years. C++ went through several revisions, namely C++03, C++11, C++14, and C++17. New features are being considered for inclusion into the forthcoming C++20. Among the recent C++ features are move semantics, uniform initialisation, lambda functions, multithreading and the memory model, regular expressions, standardised smart pointers, hash tables, std::array, reader-writer locks, fold expressions, structured binding declarations, parallel algorithms, and many more features that C++ programmers will find very exciting.
Post-C++11 C++ is very much a different language, sometimes referred to as modern, as opposed to classical, C++.
Java has also developed through versions 7, 8, 9, 10, 11, and 12. It now features lambda expressions, functional interfaces, default and static methods in interfaces, the Stream API for bulk data operations on collections, the java.time date and time API, improvements to the collections, concurrency, and I/O APIs, and more.
However, new language features may not be an option for you if you are faced with the all-too-common task of supporting legacy code on a legacy system. In this case, the real question is whether you should support the existing codebase or rewrite the product using modern versions of the languages and modern development tools. This is one for you to decide on a case-by-case basis.
Paul Bilokon is a founder of The Thalesians. The Thalesians are an Artificial Intelligence (AI) company specialising in neocybernetics, digital economy, quantitative finance, education, and consulting. They are experts in (and run courses in) the application of Machine Learning (ML) techniques to time series data, particularly Big Data and high-frequency data. Their areas of expertise also include the mathematics of ML, Deep Learning (DL), Python, and kdb+/q. A former quant and algorithmic trader at Deutsche Bank, Citi and Nomura, Paul also lectures part time at Imperial College London.