Calling Matlab from Python
For my latest experiments, I needed to run both Python functions and Matlab functions as part of the same program. As I noted earlier, Matlab includes the Matlab Engine for Python (MEfP), which can call Matlab functions from Python. Before I knew about this, I created Transplant, which does the very same thing. So, how do they compare?
Usage
As it's name suggests, Matlab is a matrix laboratory, and matrices are the most important data type in Matlab. Since matrices don't exist in plain Python, the MEfP implements it's own as matlab.double
et al., and you have to convert any data you want to pass to Matlab into one of those. In contrast, Transplant recognizes the fact that Python does in fact know a really good matrix engine called Numpy, and just uses that instead.
Matlab Engine for Python | Transplant ---------------------------------------|--------------------------------------- import numpy | import numpy import matlab | import transplant import matlab.engine | | eng = matlab.engine.start_matlab() | eng = transplant.Matlab() numpy_data = numpy.random.randn(100) | numpy_data = numpy.random.randn(100) list_data = numpy_data.tolist() | matlab_data = matlab.double(list_data) | data_sum = eng.sum(matlab_data) | data_sum = eng.sum(numpy_data)
Aside from this difference, both libraries work almost identical. Even the handling of the number of output arguments is (accidentally) almost the same:
Matlab Engine for Python | Transplant ---------------------------------------|--------------------------------------- eng.max(matlab_data) | eng.max(numpy_data) >>> 4.533 | >>> [4.533 537635] eng.max(matlab_data, nargout=1) | eng.max(numpy_data, nargout=1) >>> 4.533 | >>> 4.533 eng.max(matlab_data, nargout=2) | eng.max(numpy_data, nargout=2) >>> (4.533, 537635.0) | >>> [4.533 537635]
Similarly, both libraries can interact with Matlab objects in Python, although the MEfP can't access object properties:
Matlab Engine for Python | Transplant ---------------------------------------|--------------------------------------- f = eng.figure() | f = eng.figure() eng.get(f, 'Position') | eng.get(f, 'Position') >>> matlab.double([[ ... ]]) | >>> array([[ ... ]]) f.Position | f.Position >>> AttributeError | >>> array([[ ... ]])
There are a few small differences, though:
- Function documentation in the MEfP is only available as
eng.help('funcname')
. Transplant will populate a function's__doc__
, and thus documentation tools like IPython's?
operator just work. - Transplant converts empty matrices to
None
, whereas the MEfP represents them asmatlab.double([])
. - Transplant represents
dict
ascontainers.Map
, while the MEfP usesstruct
(the former is more correct, the latter arguable more useful). - If the MEfP does not know
nargout
, it assumesnargout=1
. Transplant usesnargout(func)
or returns whatever the function writes intoans
. - The MEfP can't return non-scalar structs, such as the return value of
whos
. Transplant can do this. - The MEfP can't return anonymous functions, such as
eng.eval('@(x, y) x>y')
. Transplant can do this.
Performance
The time to start a Matlab instance is shorter in MEfP (3.8 s) than in Transplant (6.1 s). But since you're doing this relatively seldomly, the difference typically doesn't matter too much.
More interesting is the time it takes to call a Matlab function from Python. Have a look:
This is running sum(randn(n,1))
from Transplant, the MEfP, and in Matlab itself. As you can see, the MEfP is a constant factor of about 1000 slower than Matlab. Transplant is a constant factor of about 100 slower than Matlab, but always takes at least 0.05 s.
There is a gap of about a factor of 10 between Transplant and the MEfP. In practice, this gap is highly significant! In my particular use case, I have a function that takes about one second of computation time for an audio signal of ten seconds (half a million values). When I call this function with Transplant, it takes about 1.3 seconds. With MEfP, it takes 4.5 seconds.
Transplant spends its time serializing the arguments to JSON, sending that JSON over ZeroMQ to Matlab, and parsing the JSON there. Well, to be honest, only the parsing part takes any significant time, overall. While it might seem onerous to serialize everything to JSON, this architecture allows Transplant to run over a network connection.
It is a bit baffling to me that MEfP manages to be slower than that, despite being written in C. Looking at the number of function calls in the profiler, the MEfP calls 25 functions (!) on each value (!!) of the input data. This is a shockingly inefficient way of doing things.
TL;DR
It used to be very difficult to work in a mixed-language environment, particularly with one of those languages being Matlab. Nowadays, this has thankfully gotten much easier. Even Mathworks themselves have stepped up their game, and can interact with Python, C, Java, and FORTRAN. But their interface to Python does leave something to be desired, and there are better alternatives available.
If you want to try Transplant, just head over to Github and use it. If you find any bugs, feature requests, or improvements, please let me know in the Github issues.