Integrating Python with MATLAB
From the series: Using MATLAB in Finance 2020
Financial engineers who rely only on Python may be challenged when generating C/C++ and CUDA code, building interactive dashboards, creating parallel applications, and implementing deep learning. MATLAB® is a full-stack advanced analytics platform that empowers domain experts to rapidly prototype ideas, validate models, and push the applications into production with ease. See examples of how to use these tools together, such as combining the MATLAB library of advanced analytics capabilities with supplemental models from the open source community, or piping data between different IT systems or the web.
Learn how to integrate MATLAB and Python, either as R&D tools or as scalable components of production infrastructure. The integration gives engineers immediate access to built-in analytics capabilities in MATLAB, such as deep learning, optimization, signal and image processing, computer vision, portfolio and risk management, data mining, time-series forecasting, code generation, and more.
Published: 18 Nov 2020
Welcome to Integrating Python and MATLAB. My name is Ian McKenna, and I'm a Senior Application Engineer at The MathWorks supporting the financial services industry. So before we get started today, just a little bit of history. For many years, MATLAB has provided direct APIs to many other languages to allow the MATLAB developer to take advantage of these other external libraries. And so we support interfaces to C/C++, Java, .NET, Fortran, COM, and others.
And in R2014b and newer releases of MATLAB, we also support a direct Python interface as well. In fact, there are three ways that MATLAB and Python can interact together. And this also just so happens to be our agenda for today. Those three ways are co-execution, where we have both environments calling each other, such as MATLAB calling Python libraries and vice versa.
Deployment, which is similar to co-execution, but we package up the MATLAB analytics so they can be called from a system that doesn't have a MATLAB license. And in this section, we're going to address packaging applications for desktop based deployment, as well as scalable enterprise deployment. And the third way is through data exchange. So exchanging data and models between MATLAB and Python.
So let's assume that we're responsible for building an enterprise scale analytic that other business critical applications can connect to and rely on as a service, ultimately driving the bottom line of the business. So let's take a look at such an example that we want to build out. It's ultimately going to be the thesis for what we're going to be doing today.
So here is such an application, which has been nicely wrapped up as a dashboard that business users can use, click on a button, and access the data they need when they need it without having to write any type of code or have any type of technical expertise in programming. And so here you see the results are nicely formatted in a report here.
So we can access the data, peruse the data, and see the results of the cryptocurrency price forecasts right here. So we can make our business decisions more quickly. And so the question at hand is, how do we build out such an application like this one? It requires a lot of different technologies at play. You can see here that there's typical web technologies, like HTML, probably JavaScript libraries, and so forth.
On the back end, there are probably servers, maybe running Python. There's predictive analytics in here that is probably powered by MATLAB. So you have a lot of these different tools and technologies that need to interact with each other, and the question is, how do you do that? So in building such an application there are a few caveats that we're going to also take into consideration.
First, we expect to use data libraries that are maintained by the IT team, as opposed to writing our own. And this is because our team doesn't have the manpower to maintain those libraries over time, and they're owned by the IT department, so we can't touch them. Second, some of the predictive models will need to be pushed back into Python, because that is where the trade execution and servers are located.
However, not only do we need to push it back into Python, but we'd also like to enable other applications including Python applications to connect into our application that we're creating via RESTful API hooks. And then finally, we'd like to make the entire application extensible so that it can be improved over time. So let's assume this is our current setup that we're working with.
So far IT has already created a pre-existing library for data importing that their team manages, and that connects to a cryptocurrency data source. And then this information is packaged up nicely for business users and managers to make decisions and help drive the bottom line of the business. So for example, here we see an application that shows the historical price movement of a particular cryptocurrency.
Then one day, one of the business managers comes to us and says, hey, I have an idea. If we had access to predicted forward looking data as opposed to the historical data, we could make additional profit beyond what we're currently making, even if the prediction isn't 100% accurate. So our organization has a few quants that have extensive MATLAB expertise. And they know exactly how to build out the kind of predictive models that the business users want and are looking for.
However, before we can get to that, our first challenge is to call the Python data scraping libraries and pull that data directly into MATLAB. So let's see how this would work. Let's take a look at a simple example of making a call to the Python libraries. And so that's going to be the first part of our session.
So with this first task at hand, let's say that we want to take a URL, such as the cryptocurrency URL that we are connecting to, and parse it so that we can just get out the domain name. And we want to use this function that's contained within Python libraries, but use it from within MATLAB. So that really comes down to just a couple of things.
First, we use this py dot notation. This signals that what follows is a Python package-- if it exists-- then a module, then a function. So in this case here, we are going to call a package, urllib, which is a Python package; the module within urllib, parse; and then the function, urlparse.
And you can see here if you've never used the MATLAB live editor before, it's a great way to do quick and rapid prototyping. Kind of similar to the Jupyter notebook, but it's interactive and far more powerful, and you're going to see some of those capabilities later on such as live editor tasks, controls, and also other things. So let's pass in that URL, and you can see it's pretty messy here.
So in this case, we're connecting to Coinbase Pro, and there's a lot of additional parameters in here that maybe we don't care about. We only want to get out api.pro.coinbase.com. So let's run this. And you can see down here that this is a Python ParseResult. Right. So in this Python object here, netloc looks like it has the domain name. So this is what we want to get out.
So let's just add that as the next line, and see if we can do just that. So I'll assign this a value-- let's call it urlparts. And we want just the domain, so this would be urlparts dot netloc.
And bingo. You see exactly what we got here. We got the domain name, that's all we wanted. And so this is a quick and easy way of using Python from within MATLAB here. So again, that's great and all, but that example was fairly simple.
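Pieced together, the whole thing is only a few lines. Here's a minimal sketch (the exact query URL is an assumption for illustration):

```matlab
% Calling Python's urllib from MATLAB (R2014b or newer): py.<package>.<module>.<function>
url = "https://api.pro.coinbase.com/products/ETH-USD/candles?granularity=3600";
urlparts = py.urllib.parse.urlparse(url);   % returns a Python ParseResult object
domain = string(urlparts.netloc)            % py.str -> MATLAB string: "api.pro.coinbase.com"
```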
So let's take a look at a more realistic case that will address our challenge at hand. And so in this case, what we'd like to do is call a custom library called dataLib and the functions that are contained within it. So first off, what is dataLib? So this is the dataLib.py library.
You can see that it's just typical Python code. So we have a couple of different functions inside here, such as getPriceData and parseJson, and we want to be able to use that within our MATLAB code. And everything else in between is just typical Python code. You can see here, returning data, we're using a variety of different data structures inside here from things like numpy arrays to lists and dictionaries, and even JSON.
So let's take a look at a more fleshed out script here. So this is what a live editor script might look like if you add in comments along with the code and annotations and so forth. And I've just simply put the results of the script on the right hand side instead of underneath. So we can see the code on the left and the results on the right hand side. And so you see here, it's kind of mimicking the same thing we just did from the command line.
So in this case here, we're pulling in and using a library-- the math library, and the sqrt function from within it. But you can also see down here that we can create Python data structures from within MATLAB, and even run things like methods on those data structures from the MATLAB side. So having that kind of baseline understanding of calling Python from within MATLAB, how do we go about calling the dataLib.py library?
Well, it's pretty much the same as what we did from before, right? So you can see here, py followed by the module-- because in this case, there is no package-- and then the name of the function contained within that module. The only thing that you need to be aware of when you're doing this is that if you're using a custom library, just make sure that it's on the Python path. If it's not on the Python path, it's not going to be picked up by the Python interpreter that's being executed on the back end.
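As a hedged sketch, both pieces might look something like this (the folder and the getPriceData argument are assumptions; only the dataLib function names come from the library we saw above):

```matlab
% Build Python objects from MATLAB, then call the custom dataLib module.
pl = py.list({"BTC", "ETH", "LTC"});       % MATLAB cell array -> Python list
pl.append("XRP");                          % run a Python method on the object
opts = py.dict(pyargs("coin", "ETH"));     % keyword arguments via pyargs

% Make sure dataLib.py's folder is on the Python path before calling it.
if count(py.sys.path, "C:\work\pylibs") == 0
    insert(py.sys.path, int32(0), "C:\work\pylibs");
end
raw = py.dataLib.getPriceData("ETH-USD");  % py.<module>.<function> -- no package here
```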
Also, you see here that we can pass in MATLAB data types. In this case, things like strings or character arrays, and it will automatically convert them over to the corresponding Python data type. And also, you probably notice something new here if you've never used the live editor before. And this is a control.
So in the MATLAB live scripts-- live editor scripts-- you can put in custom controls like this to point other people to areas where you may want to change parameters or selectable things to do scenario analysis and so forth. So this is a great way to help tell your story to other people who may not be as familiar with the application that's being built. And it's very easy to put in these controls. Just simply Insert, Control, and you have a variety of controls at your fingertips here.
And so that's exactly what we did here for selecting the product and parsing the results. But again, the key thing here is that we're able to use those functions that we built in Python directly in MATLAB by simply passing in inputs from the MATLAB side, doing things like parsing out the JSON and getting the data that we want.
Then if the data is not a MATLAB data structure-- in this case, we have a complex tuple that has lists inside of it, and it also has numpy arrays inside of it-- and we want to convert that over to the corresponding MATLAB data type, that's not a problem. We can easily split up the tuple by using syntax like this. And then we can cast over things like the numpy arrays, which you see right here on the right hand side, by using just the double command in MATLAB.
And likewise, we have a variety of other casting commands like cell, char, struct, and so forth. And once we've done that, we convert it over to a MATLAB table, which is basically the MATLAB equivalent of a data frame, but scalable. So if you haven't used a MATLAB table before, it's very similar to a data frame-- highly recommend it.
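A sketch of that unpacking, with an assumed tuple shape (continuing from the raw result above):

```matlab
% Split a Python tuple and cast each piece to the corresponding MATLAB type.
pyOut  = py.dataLib.parseJson(raw);   % assumed to return a tuple
parts  = cell(pyOut);                 % tuple -> MATLAB cell array
prices = double(parts{1});            % numpy array -> double, with one cast
stamps = double(parts{2});            % assumed epoch-second timestamps
T      = table(stamps', prices', 'VariableNames', ["Time", "Price"]);
```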
And then the last thing we're going to do is take advantage of things like timetable. So again, table, timetable, datetime, tall table, and so forth. These are all built in data constructs that have been added to MATLAB over the last couple of years to make our lives easier for the simple and complex tasks that we have to do over and over again.
So in this case here, if I want to deal with something like time zones and convert the times in this case-- which are with respect to universal time zone-- to a view of someone who is in New York, this is all we need to do. A single built in command allows us to do that conversion. And then finally, just as a quick litmus test, we want to preview that data and just see if it makes sense.
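A minimal sketch of that conversion and the preview, assuming the feed's timestamps are epoch seconds in UTC:

```matlab
% Convert universal-time stamps to a New York view, then take a quick look.
t = datetime(stamps, 'ConvertFrom', 'posixtime', 'TimeZone', 'UTC');
t.TimeZone = 'America/New_York';            % the single built-in conversion
TT = timetable(t', prices', 'VariableNames', "Price");
head(TT)                                    % quick litmus test of the data
```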
So in this case here, we see that we're getting the price of ethereum. And for this period of time, it looks like the value ranged from about 171 to about 174, which seems to make sense. It's nothing crazy like 9,000 or 2. So that's really all that it takes in order to call Python from MATLAB. Very easy-- just remember py dot, and make sure that any custom libraries are on the Python path.
OK. So now that we have a connection to Python, we can harvest the data as needed. And we can focus on rapidly building predictive models using MATLAB apps, and scaling to big data with tall tables. So let's take a look at a quick example here on how we can build out those models. So here we're working with the same data that we've been working with from before when we were scraping in data from Python-- or harvesting data from those Python libraries.
So when we load this in, we're loading it in as a timetable. And again, if you've never worked with a timetable in MATLAB, it's very useful for allowing you to synchronize data to the same frequency-- for example, daily or quarterly or monthly-- even if the frequencies of the different time series were different. And also, for aligning the time series to the same start and stop period.
And so in this case here, you see that we have four different cryptocurrency price level series that we're working with. And what we would like to do is attempt to predict the price of ethereum. So to do that, it's actually fairly easy in MATLAB. We're going to write pretty simple syntax here-- it's very easy to understand and read, pretty much just the English language-- where in this case we're lagging the price by two hours. Right. And you just simply write that here in straightforward English.
Likewise, if you want to do things like calculate a return series, there are built in commands for doing that, and for calculating things like the moving average or relative strength index. And these are all part of the financial toolboxes that we have available. So you don't have to write this code from scratch or reinvent the wheel. You can just use those pre-existing built in functions.
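Under stated assumptions (continuing with the TT timetable from the sketch above, with Financial Toolbox installed), the indicators might look like:

```matlab
% Built-in functions do the heavy lifting for each technical indicator.
lagged = lag(TT, hours(2));          % lag the price by two hours -- plain English
rets   = tick2ret(TT);               % return series
ma     = movavg(TT, 'simple', 16);   % simple moving average
rsi    = rsindex(TT);                % relative strength index
```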
And then we take all these technical indicators that we're building out, and what we want to do is build a wide table of these technical indicators, with the hope that a signal for driving the changes in the price is contained within one of these different predictors here. And so synchronize is our best friend in being able to take all these different predictors and aggregate them together into a wide table for us.
So synchronize is a built in function for timetables that allows us to take things like all these relative strength indexes and moving averages, and return series even, and combine them together. And the nice thing about this is that if your series don't have the same number of points-- so, for example, the return series and moving average have a different number of points-- that's not a problem, because everything is time stamped.
So it automatically takes the data, makes it the same size, and fills it in with NaNs, or even does interpolation if we so desire. And with that we have our wide table of technical indicators-- right here we see about 30 different predictors in our data set.
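A sketch of that aggregation, continuing the variables from above:

```matlab
% synchronize aligns everything on the shared timestamps, padding the gaps.
predictors = synchronize(lagged, rets, ma, rsi);    % wide, timestamped table
predictors = fillmissing(predictors, 'previous');   % or interpolate, if desired
```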
And when we have that, now it's time to build out a model that will give us some sort of prediction of the price in the future. And the typical approach is to break up the data set into testing versus training-- or in sample versus out of sample. And so in this case here, we're going to use-- looks like about a six day time range, actually-- for our training portion of the data, and a four day period for our testing.
And one of the great things, again, about using things like timetables is that they have these built in functions. So not only do we have things like synchronize, but we also have things like timerange, where you can specify a period of time. This makes it easier to write the code and index into our timetable for the portion of data that we need.
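For example (the split dates are assumptions to match the roughly six-day/four-day split):

```matlab
% Index into the timetable by time with timerange (zone matches the row times).
tz = 'America/New_York';
trainWindow = timerange(datetime(2020,2,1,'TimeZone',tz), datetime(2020,2,7,'TimeZone',tz));
testWindow  = timerange(datetime(2020,2,7,'TimeZone',tz), datetime(2020,2,11,'TimeZone',tz));
trainData   = predictors(trainWindow, :);
testData    = predictors(testWindow,  :);
```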
Once we've broken that up into testing versus training, now we want to build out our models. And we can do that, of course, by hand, but there are apps that allow us to facilitate the model development much more quickly. If you've never used an app inside MATLAB, there are many of them. So just to point out a couple that are useful: for example, the Classification Learner and Regression Learner are the machine learning apps that we typically would use.
There are also apps for building up neural nets, and even doing deep network design. And in fact, we're going to reference this Deep Network Designer later on when we're building out deep networks and doing things like working with TensorFlow models. There are also apps for connecting to databases, doing ARIMA modeling, building out credit scorecards, and deployment.
So there's a variety of these different apps. I highly recommend seeing if there's an app for the task that you're trying to do first, before trying to do it from scratch. And so in this case here, we want to build out a bunch of regression based machine learning models to see which one is best. So we're going to use the Regression Learner app to do that. All right. So here's what the Regression Learner looks like.
And again, if you're new to this, the nice thing about this is it's great for both beginners and advanced folks. Because it has this nice workflow designed to start on the left hand side and walk you through until you're on the right hand side of the workflow. So we're going to follow that and pull in some data.
In this case, we're going to be working with the training data set. We're going to pull all those predictors into the app. And the idea here is that we want to build out a variety of different models that allow us to, in this case, accurately fit the training data set here.
So we want to be able to take this historical data set and fit a model to it as accurately as possible. And so in MATLAB, we have a variety of different built in models to do that. Things like boosted and bagged decision trees, Gaussian process models, support vector machines. And simpler models-- like, for example, trees, or linear regression models in particular. All right.
So the nice thing about this is that you may not necessarily know which model is best, so you might want to simply train all of them. So let's train a few of these guys. And MATLAB will automatically queue them up for you here. If you have access to a multicore machine, you simply click on the Use Parallel button up here, and it will allow you to do this in parallel.
So if you've got a quad core machine, it would do these all at the same time. And what this is doing is actually ranking each one of the different individual models based on the root mean square error. The one with the lowest root mean square error is the one that fits best to the historical data set. In fact, if you want to try out different models, it's as simple as just clicking on an individual model that you want to train, and then it's going to give you the new assessment.
In this case here, the root mean square error is lower, so this is indicating this is probably the best fit model. So assuming we want to go with this, all we need to do is just simply take this, click on the checkbox button, and this will allow us to export the model to the MATLAB workspace that we can then use. And so how do we go about using this?
So the nice thing is that MATLAB gives us hints in terms of how to use these functions. In fact, it says, hey, all you need to do is pass in some new data here-- some new testing data-- and I will give you back, in this case, the predicted prices, yfit. Great. So that's exactly what we do right here.
So we just simply took that hint, pasted it directly into our live editor script, and now we're on our way. So we have our predicted prices right here. And then of course, what we want to do is we want to take those predicted prices and actually measure it against something, some sort of benchmark. In this case, we're going to use the out of sample data. So we just simply take our testing data, pass it into our predict function, and then visualize it so we get a sense of what it looks like, and also calculate some metrics around that.
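Following that hint, a sketch of the out of sample check might look like this (the Price variable name is carried over from the earlier sketches and is an assumption):

```matlab
% Use the exported model's predictFcn, then score against the test window.
yfit  = trainedModel.predictFcn(testData);                  % predicted prices
rmse  = sqrt(mean((testData.Price - yfit).^2, 'omitnan'))   % out of sample RMSE
stamp = testData.Properties.RowTimes;
plot(stamp, testData.Price, 'b', stamp, yfit, 'r')          % historical vs. predicted
legend(["Historical", "Predicted"])
```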
On the right hand side is the visual of the historical, which is blue, versus our prediction, which is red here. And you can see that we're capturing some of the dynamics of the model fairly well, but there are other areas that potentially could be improved. Overall, the root mean square error is seven. And the question is, can we do better? We trained this model on only six days of data.
What if we trained this on 30 days instead? Would it get a better accuracy on the out of sample data? So let's try it out and see what happens. OK. So, typically, one of the challenges to doing that is that you need to work with data sets that may not necessarily fit into memory. They could be gigantic-- maybe petabytes of data, in cases that I've seen before.
So if you have to deal with such large data, you obviously can't fit it into RAM. And so an alternative approach is using the datastore command in MATLAB. The datastore command allows us to connect to either a single file or multiple files, which can be a variety of different data types. So it actually allows you to connect to a repository of data and treat it as a repository, where you're not necessarily pulling in all the data all at once.
It allows you to connect to many different things, so it's actually a very powerful command. It allows you to connect to Excel files, flat files, databases, HDFS systems, Azure Blobs, S3 buckets. So you can connect to many different data sources using this datastore command.
The next command is tall. So once you've established the connection to the repository of data, you're going to use this tall command, which I mentioned before. There's tables, tall tables, timetables. So this is one of the core data types that's been introduced in MATLAB over the last five years. And for us as MATLAB developers, it's almost magical, because we don't need to worry about restructuring our code to make it big data friendly.
Instead we call this tall command, and then we can just write our MATLAB code like we always have, without having to refactor it and work with chunks of data that fit into memory at a time, and so forth. So what you're going to notice after this is that we can scale to big data without having to change our code at all.
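A hedged sketch of that pattern (the file pattern and variable name are assumptions):

```matlab
% A datastore points at the repository; tall defers work until gather.
ds = datastore("crypto_prices_*.csv");    % files, database, HDFS, S3, ...
tt = tall(ds);                            % tall table: same code, big data
avgPrice = mean(tt.Price, 'omitnan');     % nothing computed yet...
avgPrice = gather(avgPrice)               % ...until gather triggers the work
```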
And in fact, you can see this exact same code we were working with from before-- timerange, timetable to table, and so forth-- with one small change. And that one small change is right here: the function that trains the prediction model. So in the first step we used the Regression Learner app to build out a model and assess which model works best. Do we have to do that again? And the answer is, no.
We can go back to that Regression Learner app-- let's do that. And right here is a button to generate a function so that we don't have to do this again and again. Once we've found the model that works for us, we can just simply use that and pass in our new training data set, and that's going to give us our trained model. And what you see here is a white box, right? It's not closed off to you. You can see all the details of what's happening here and dive into the specifics, change it if you want to.
But the key thing is that you didn't have to write the code yourself. So ultimately, this is a tool that saves you a lot of time. OK. And so finally, once we've done that, we may want to visualize the data. And one key challenge when you're working with big data is simply visualizing a data set.
And so MATLAB has a variety of built in visualizations for big data, where you don't have to pull the entire data set into memory. Instead, you can do a quick rastering of the data set and pull in the portion of data that you need when you need it. So this is another time savings technique. And then finally, once we do the same visualization and calculate the root mean square error, here's our result on the right hand side.
So you can see our root mean square error dropped from about seven down to less than one, and you can see that visually by inspection. The historical and predicted prices are almost on top of each other. So in fact, it looks like using more data to train this model actually, indeed, worked. And so we're happy with this model, and now let's save the trained model that we created into a MAT-file, which is ultimately going to be called by our deployed MATLAB analytics, which is then in turn called by Python.
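That save is one line (the file name here is an assumption):

```matlab
% Save the trained model; the deployed analytic will load it at run time.
save("trainedPriceModel.mat", "trainedModel");
```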
So Python calling MATLAB calling this MAT-file. Great. So that next step has been completed. We've built out the model, and so now our next step is to validate or vet this model. So before we productionalize this, we want to validate the model in the environment that it's ultimately going to be run in.
So our next challenge is calling MATLAB from Python-- the reverse of what we just did before. So let's take a look at a simple example of calling MATLAB from Python. Let's say I have this command, copularnd, that I found, which is basically a random number generator for copulas.
Let's take a look at the documentation of this and see how it's used first. So copularnd is a built in function that allows us to perform correlated multivariate simulations. And it looks like all we need to do is simply specify the copula type, followed by this correlation parameter rho, right? And then the number of vectors to return. And we can see all that inside the documentation here.
So this is what we want to use. So let's try it out. Let's first open up a Python interpreter. All right. So there's only a couple of steps we need to do here. First, import the MATLAB engine.
Second, we need to start up this MATLAB engine that we just imported into Python. And now we can access any of the MATLAB functions directly from Python by just indexing into the variable, m, that we just created here.
So in this case, if we want to access the copularnd function, all we need to do is just do m.copularnd, and then pass in the inputs that we so desire. And let's just take a look at what that looks like. So there we have it. We've just made a connection to a MATLAB function, copularnd, and called it and saved the results into Python here.
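Pieced together, a minimal Python sketch of that exchange might look like this (the 100-draw example is an assumption; copularnd's signature comes from the documentation we just looked at):

```python
# Calling a MATLAB function from Python with the MATLAB Engine API.
import matlab.engine

m = matlab.engine.start_matlab()               # start a MATLAB session
rho = matlab.double([[1.0, 0.7], [0.7, 1.0]])  # correlation matrix for the copula
u = m.copularnd('Gaussian', rho, 100.0)        # copula type, rho, number of draws
print(u.size)                                  # a matlab.double array, 100-by-2
m.quit()
```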
But that, again, was a simple case, and we want to take a look at our scenario at hand. So first, we have successfully called a Python library-- in this case, dataLib-- inside of MATLAB. Now, we want to take the model we built in MATLAB-- the MATLAB build-predictor script that we were working with from before-- and wrap it up into a self-contained function called predictPrice. And then call that predictPrice from within Python.
So first, let's just take a look at what predictPrice looks like, and then we'll take a look at calling that predictPrice function from within Python. OK. So here's the predictPrice function. You can see it's just using a typical MATLAB function signature here, with varargin. If you've never heard of that before, it takes a variable number of inputs, so we get a dynamic function signature, which is great.
If we are calling this from a different environment, we don't have to hardcode it. And so looking down here, it's pretty much just doing the same stuff that we did from before. It's calling in, getting, and aggregating the price level data. And then the exact same code that we saw from before: the lagging, calculating the technical indicators here, combining them all together, and creating that wide table of predictors.
Loading in the mat file that contains the trained model that we built from before, so that's where this comes into play. And then doing the prediction, right? So it's all the same stuff that we did from before, just put together into this predictPrice function so we can call this as a function. And so now from the Python side, what does it look like to call that?
So, pretty much the same set of steps that we saw from before. So again, we are making a call to the MATLAB engine, starting up the MATLAB desktop here. And one thing that you notice in this case is that we can interactively call the desktop. We have a variety of different mechanisms that we can use to start the MATLAB session. We can either start up a hidden session, a full MATLAB desktop if you want to see the variables in the workspace, or even connect to your current MATLAB session if you want to.
Then from there, casting any data types over to the corresponding MATLAB data types, and then running our custom predictPrice function that we created. So let's take a look at this, let's run this.
And what you see here is that MATLAB has opened up a new MATLAB session right here, that is, in this case, connected to the Python environment. And we can see that the results from the MATLAB environment are passed back into Python here. And you can see the results, coin price, and validation data. And in fact, if we take a look at validation data, it is a structure.
And we can see the structure contains some validation metrics, such as the root mean square error, and the predicted and historical final prices. And if we take a look at the Python code, a couple of things stand out. First off, you notice that we can actually manage, debug, adjust, and edit the workspace-- the MATLAB workspace-- from Python.
And in fact, when we were looking at the validation data here, you notice that it wasn't JSON data, but it was a structure, right? And so we actually manipulated that data and managed it from the Python side. So you can actually call MATLAB functions and syntax directly from Python. And this is what allowed us to unpack the JSON directly into a structure.
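Pieced together, the Python side of that call might look something like this sketch (the output names and the struct field name are assumptions):

```python
# Calling our custom predictPrice function through the MATLAB Engine.
import matlab.engine

# "-desktop" opens a full MATLAB desktop so we can watch the workspace.
m = matlab.engine.start_matlab("-desktop")
coinPrice, validationData = m.predictPrice("ETH-USD", nargout=2)
print(validationData["rmse"])      # the MATLAB struct comes back as a dict
m.quit()
```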
So now, finally, once the model is vetted in the Python environment, we want to push the analytics across the organization. However, not everyone who would benefit from MATLAB's capabilities is a developer. Some, such as the business users, decision makers, and managers, happen to be consumers of the model, and thus a license to MATLAB would be overkill. But they would still benefit from the models themselves.
What would be ideal in this case is to be able to package up the MATLAB analytics, which could then be run against some lightweight runtime, with the results pushed back into Python and ultimately the web application-- thus allowing the business users to access the forward looking predictions of our MATLAB models.
So let's take a look at an example of what this would look like, and move on to the second part of our agenda, which is deployment of the MATLAB models that we've created. So in this case, we want to take our predictPrice and package it up.
So we can do just that using an app inside MATLAB-- not having to recode anything from scratch, it just creates our libraries for us. And the app to do that is the Library Compiler, right here. So if we click on this, it has a variety of different target outputs that we can deploy to.
So in this case, you would choose Python. And then we add in the functions that we want to make accessible to the Python environment. We can add in 1, 10, 1,000-- as many functions as we like-- to expose to the Python developer. And the third step is just to simply click on the Package button, and it will create a py library for us. And so, I've done just that and added in the relevant functions necessary for this to run.
So when I click on Package here, it's going to do three key things for us. First, it will run a dependency analysis to find the corresponding MATLAB code necessary for this to run. If you have MAT-files, it will find those as well for you. If you're using external things like Python libraries, you will have to add those yourself. But it will find most of the things that you need in that dependency analysis. So that's the first thing it does.
The second thing it does is encrypt your MATLAB code, so that your intellectual property remains safe. And then the third thing it does is build out the py interface, so that you as a MATLAB developer don't need to know Python in order to pass this over to the Python team.
It will create the py wrapper for you, so that the Python developers can just leverage the MATLAB analytics on their end without needing to know pretty much anything about MATLAB. All right, and it's done. And so here it's built the py library necessary to install the MATLAB library.
So when you run this, it's going to simply install this into your Python environment. And then you'll be able to import the MATLAB library. So you just run setup.py, and it'll copy the MATLAB analytics over to your Python environment. OK.
So once we've done that, now we can call the corresponding MATLAB library from Python. So let's take a look at the corresponding Python code. OK. So here it is. It's pretty much the same as we saw from before, right? You're doing some predictPrice commands the same way. You're casting it over to the corresponding data structure, like you did from before.
There are really only two key differences between the previous code and this one. One is that we are now importing this custom library that we created using that app, right? So the great thing about this is that there is no recoding needed whatsoever to do this. The MATLAB code is exactly the same as before. But instead of starting up a MATLAB session, we're actually initializing a runtime in the process. So that's what this next step is doing.
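As a sketch, with an assumed package name of cryptoPredict, the compiled-package version might look like:

```python
# Using the compiled package: initialize a runtime instead of a MATLAB session.
# Install it once from the generated folder with:  python setup.py install
import cryptoPredict

pkg = cryptoPredict.initialize()     # starts the lightweight MATLAB Runtime
price, validation = pkg.predictPrice("ETH-USD", nargout=2)
pkg.terminate()                      # shut the runtime down when finished
```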
So instead of starting up the full MATLAB session, you're initializing that thin lightweight runtime. And so this provides the additional benefit of improved performance, because the runtime is leaner to start up than a full MATLAB session. So let's execute this. And there we go.
We've got the same results back into Python, but now using the packaged up py library. So if someone doesn't have a MATLAB license, they can still use the models that we built in MATLAB from within Python. All right. So let's summarize the desktop deployment workflow.
On the left hand side is the MATLAB developer's machine. They have their license of MATLAB and the corresponding toolboxes. They've built out an application in MATLAB-- in this case, predictPrice. And then our one click deployment via the app will generate the py library. Then all we need to do is simply share the py library with an end user, who does not necessarily need to have a MATLAB license.
All that's needed on their end is to install the free MATLAB Runtime on the end user machine, and it will provide them with the same rich user experience that they would get by using MATLAB. Again, the one key important difference here is that the end user does not need to invest in a MATLAB license. Our deployment solution is royalty free, meaning that you can share that py library with the world and the cost to you is the same-- just simply the initial investment in the MATLAB Compiler tools.
So that's great, but now the question is, how do we scale this to the enterprise where we need to support massive application concurrency? Some of the approaches that I can think of include buying more MATLAB licenses, and that's going to be probably too costly. Performing manual language translation, that is both costly and time consuming. Or using the MATLAB compiler, which we just talked about.
However, let's assume that installing the MATLAB Runtime on end user machines is a no-go because of IT restrictions and regulations. Well, in that case, that rules out the MATLAB Compiler desktop deployment based approach. Additionally, there are some other requirements that we need to fulfill as well.
So in this case, reducing data transfers by bringing the analytics to the data is of interest. Being able to centralize the analytics to ensure users are always automatically accessing the latest version of the models that we've built out, instead of having to manually push the updates out to them. Providing multiple interfaces, so a single analytic can be accessed by multiple applications.
So in this case, if we have web based applications like JavaScript or Python, or we've got C# or C++ or MATLAB calling the application, we only have to have one version of it deployed. And of course, the last thing, and the most important, is that it needs to be scalable and robust.
So the MATLAB production server is the solution to this. It's server software that can sit anywhere. It can be installed on a server, or a cloud, a grid, a desktop, a laptop-- it can be literally installed anywhere. And what it does is provide a centralized repository of analytics that can be concurrently accessed by other applications and programming languages, thus providing a scalable architecture. This is achieved by essentially managing a pool of runtimes, which you see here on the right hand side.
This is the production server and these are the pool of MATLAB runtimes that are up and accessible at any point in time. So simply package up the MATLAB analytics into a library, using the MATLAB compiler SDK. And then place it into a directory within the production server instance.
Then these libraries become immediately accessible by applications or users. So for example, web servers, Excel add-ins, application servers, databases, and so forth, can all make calls to this production server, which is essentially the gold standard of all the analytics. The most up to date versions are always deployed to that production server.
And in fact, these libraries can be accessed through a variety of different interfaces too-- including, here in this case, Python and RESTful interfaces. The RESTful interface is an especially powerful way to connect to the production server, because no thin client is required at all to execute the deployed MATLAB analytics and get the results back into the calling environment.
So let's take that predictPrice function that we were working with before and now make it accessible as a RESTful call hosted on a server, and we're going to make that call from Python. OK. So just like before, when we were building out the py library using an app, there's an app for building out production server based deployment targets.
And so you can find that right here: the Production Server Compiler. If I click on that-- or, in this case, I already have a version that already includes the functions that I want to deploy. You can see that the interface is pretty much the same as before. The only difference is the targets that we can deploy to.
So in this case here, we create a CTF file instead. And you can think of this CTF file as a zipped up archive containing all of your analytics in a single file, which you then place into a directory of that MATLAB production server instance. And so typically the approach is that we would add in all the different functions we want to deploy, click on Package, and it creates that CTF file.
However, before we do that and we move the predictPrice function over to the MATLAB production server, we want to test it first from MATLAB to make sure everything runs smoothly. And to do this, we can use the MATLAB production server development and test environment right here.
So if we click on this here, it basically acts the same way as the MATLAB production server for making calls to your analytics, but instead it will use your MATLAB license-- your MATLAB session here. So it's using and tying up your current environment, and it doesn't have scalability; that's what the MATLAB production server is for. But otherwise, it looks and acts the same way as the MATLAB production server. So it gives us a great way to test and debug things.
So let's start this up. And one caveat: we need to make sure that we don't have anything else running on this particular port. No problem if you have multiple things that you want to run at the same time-- just make sure they're on different ports. All right. So we started this up. It's on port 9910. We can see here this is how we connect to it.
So in fact, all we need to do is specify the IP, the port number, followed by the name of the library that we just created, and then the function that we want to call. In this case, it would be predictPrice right after the MATLAB library. So let's take a look at an example of a client making a call to this RESTful service that's being hosted by the MATLAB test and development environment. OK.
So here's the example. It pretty much is just doing a typical RESTful call using the standard Python libraries to do that. So you see here, we're specifying the IP, port number, the location of the library followed by the function that we want to call, and then any parameters that we want to pass into that function. And then the rest of that is just basically taking the JSON and unpacking it.
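A hedged sketch of that client (the library name, inputs, and payload shape are assumptions; the MATLAB Production Server RESTful API documentation defines the exact JSON schema):

```python
# A plain HTTP POST against the test server on port 9910.
import requests

url = "http://localhost:9910/predictPriceLib/predictPrice"
payload = {"nargout": 2, "rhs": ["ETH-USD", "2020-02-01", "2020-02-11"]}
resp = requests.post(url, json=payload)
resp.raise_for_status()
results = resp.json()["lhs"]    # unpack the JSON results
print(results)
```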
So let's run this. Great. So it looks like it finished here. And we can see the results for the predictPrice is passed back into Python. And if we take a look at that test and development environment, you can see that it recorded that call right here. So we can see that it completed. And in fact, we can even see the inputs and outputs right here.
So if we take a look at the inputs here, we can see that we passed in ethereum for this start and stop point. And we can reference that back to the Python code that we had from before to make sure those are, in fact, correct. So if I take a look at the Python code, here are the inputs I passed in. So it's the same exact inputs as we expected. Things are looking good. OK.
So let's stop this now that we've tested it out and made sure that it works appropriately. And we want to harden this. So we want to take this and build out our CTF file here. So if I click on Package, it's going to do the same process as from before-- those three steps of finding the dependencies, encrypting the model, and packaging it all up into a single file. OK, great. Now it's finished.
Let's take a look at the output. All right. So here's our CTF file. So now all we need to do is just take this and put it on the MATLAB production server. So all we need to do, and again, we can do this two ways. We can do this programmatically or we can do it interactively using the MATLAB production server dashboard, which is part of the MATLAB production server.
So that's what I'm going to use here, but all we're really doing is literally dropping this into a directory of the MATLAB production server instance. So going back over here to our MATLAB production server dashboard, we can see the application needs to be started. So let's start up the MATLAB production server instance. And you could see here, it's a nice dashboard for easily managing the resources of the MATLAB production server.
So you can see here the number of workers, the requests that are coming in, as well as the memory usage and utilization that's available to us in this instance. To take our application and deploy it to the MATLAB production server, just simply Upload and Deploy, right here. Point to the location where that file is located and just add it in, and that's it.
It is hot deployable, meaning that you don't need to take the MATLAB production server down, it can automatically update for you. And it also has extensive IT infrastructure to go along with it to help manage those models, such as extensive logs. And also setting the configuration of the production server is very easy and straightforward too.
So if you need to do things like manage legacy applications that you've built out in previous versions of MATLAB, no problem. It's as simple as adding in the location of the corresponding runtime for the legacy applications that you have running. In this case, I'm supporting applications that have been built out in R2019b and R2019a. And if I want to add R2018b, R2018a, and so forth, I could just add them here too.
On top of that, if I need to scale this out, that's also not a problem. I just simply specify the number of workers here, and I can have as many as I like. In this case, my machine is only a dual core, so I'm just simply using two. Great. So now we've done pretty much all the pieces that we needed to do.
Now we can set up that RESTful web service using a Python server, like we did before, using the code snippet that we just had. And this allows users and applications to make the typical HTTP call, which will then return a JSON payload. And we have a Python server running on this machine, on port 3030 in this case.
So if we make a call-- let's say in this case to here-- we can specify the type of coin that we want to work with. Let's say we want to do the ethereum coin. And it's going to run against the MATLAB production server, and return results to us on the predicted price. Great. And you see here are all the results that we worked with from before, including the predicted prices into the future for us.
And if we take a look back at the corresponding MATLAB production server dashboard, we can see those calls being made right down here. So this is the number of requests, the requests being completed, and then we have two workers available for us again.
So what this means is now any web based application such as this website that you saw in the beginning here can take advantage of these web services like any typical web application would. Great. So now our application is up and running. We have a Python server, which is calling MATLAB, which is calling Python data libraries, essentially more or less like a Python sandwich where MATLAB is the meat in between the Python slices of bread.
So we're done. Well, not really. Applications need to be maintained and improved over time, and the same is true here. So suppose we just found out that another group within our company, other than us, has deep learning expertise. And they have some TensorFlow models that they have extensively trained with relevant data that we're interested in.
Well, we certainly could reinvent the wheel, get that data, and train models from scratch in MATLAB. But instead, let's reduce our time to market and reuse those existing TensorFlow models. Also, we should assume that each group probably wants to continue using its tool of choice: they're going to continue using Python, and we're going to continue using MATLAB moving forward. And it's clearly advantageous for the groups to collaborate on any new models built.
So in this scenario, we need an easy way to transfer data and models between those two different groups. There are two areas of future development here, which include general R&D and also using deep learning models to further enhance predictions. And for the general R&D, what would make our lives easier is a way to import and export data between the environments.
For the deep learning piece, we need a way to improve existing models, which includes using techniques such as transfer learning. And so this now brings us to the last section of our agenda, which is utilizing the various methods of exchanging data between MATLAB and Python. First, let's address sharing models between MATLAB and Python.
So MATLAB has direct importers and exporters for ONNX-based deep learning models. And this includes connections to PyTorch, Caffe, MXNet, and, of course, TensorFlow. So if you haven't done deep learning with MATLAB before, there are four key reasons why MATLAB can improve your experience with deep learning.
First, ease of use-- it lowers that barrier to on-ramping. Second, there are lots of built in visualizations and debugging tools-- and I'm going to show you that in a second. Third, augmentation and optimization of deep nets with hyperparameter tuning. And fourth, deployment, including generating CUDA code without having to recode. So deep learning can be used in many different places, but one place I've personally seen it used quite extensively in finance is with text based data.
So in general, deep learning has been used with text analytics with quite a bit of success because of its accuracy. So let's consider a case where we want to incorporate news sentiment into our cryptocurrency price prediction. Of course, we could build a model which includes a predictor column that gives a general sentiment of the cryptocurrency.
But to generate that predictor column, we would have to create a classifier which takes in free form text and returns back a sentiment score from news site information. With a robustly trained deep network, we can even account for things like word context, spelling, and grammatical errors in the text as well, which is a very alluring piece of using deep learning.
However, deep nets are often regarded as a black box and difficult to use for beginners. This is where MATLAB can help. Importing an ONNX-based model, such as one exported from TensorFlow, requires only a single command in MATLAB: importONNXNetwork. Once we have that model in MATLAB, the first thing we want to do is visualize all the layers of the deep network, and that can be done using the Deep Network Designer app in MATLAB, which I alluded to before.
Here we see a picture of a few of the connected layers. And the app also provides us an easy way to edit the network, such as adding or removing layers, as well as automatically generating code and exporting the modifications of the layers back to MATLAB, or even back down to Python. Second, we want to analyze the structure of the deep network to ensure the network is connected correctly and there aren't any issues. This ability to visualize and debug the network also extends to TensorFlow, since we can import TensorFlow models into MATLAB.
Therefore, not only can we get started more quickly, but we can even assist the TensorFlow team by running analysis reports and ensuring the integrity of the network structure. Finally, because price movements change very rapidly with respect to the news, the inference speed to generate sentiment scores is quite important. And this is where using MATLAB's GPU code generation can dramatically improve the deep network inference speed without having to do any type of recoding.
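Before moving on to code generation, here's a minimal sketch of those import-and-inspect steps (the ONNX file name is an assumption; the ONNX converter ships as a free support package):

```matlab
% Import an ONNX-exported TensorFlow model, then inspect and debug it.
net = importONNXNetwork("sentimentNet.onnx", ...
                        'OutputLayerType', 'classification');
analyzeNetwork(net)        % report on layer connectivity and potential issues
deepNetworkDesigner(net)   % open the app to visualize or edit the layers
```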
And speaking of automatic code generation, we have extensive capabilities in this area for generating both CNNs and LSTMs for inference. So with a single command, such as the one you see below here, you can generate CUDA code that can be executed outside of MATLAB. And in doing so, you can see performance improvements of, on average, about two times that of running inside MATLAB, and speeds of up to seven times faster than that are not unheard of.
And in fact, it's generally on average faster than tools like TensorFlow when you do the inference. And besides supporting multiple libraries, like the Intel Math Kernel Library and NVIDIA TensorRT, we even support various processors such as ARM. So now you have a direct way of taking a TensorFlow model and generating CUDA code from it, without needing to be a CUDA programmer yourself.
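A hedged sketch of what that looks like with GPU Coder (the entry-point function and input size are assumptions):

```matlab
% Generate CUDA code for deep network inference, no CUDA programming needed.
cfg = coder.gpuConfig('mex');                                % CUDA MEX target
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');  % or 'tensorrt'
% scoreSentiment.m is an assumed entry point that loads the network with
% coder.loadDeepLearningNetwork and calls predict on its input.
codegen -config cfg scoreSentiment -args {ones(1,100,'single')} -report
```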
So besides the sharing of models, sometimes it's necessary to share the data between MATLAB and Python. And this is especially true in the research phase of development, where it's helpful to do some initial validation of the results from multiple environments. One way to share data is using Apache Parquet. And this allows us to share data frames as tables inside MATLAB, and vice versa. For those that aren't very familiar with Parquet, it's a file format that allows you to store data to disk.
And specifically, Parquet is very efficient and fast, which makes it great for working with big data, but also any tabular, Excel-like data as well. Below here is an example of saving a data frame as a Parquet file, then loading the Parquet file into MATLAB as a table. A single command in MATLAB, which you see here, allows you to read and write Parquet files. Additionally, Parquet can be read asynchronously, meaning that you can read in parallel with MATLAB, further improving the performance.
So let's take a look at an example of creating a data frame in Python, loading that data into MATLAB as a table for further analysis, and the interop between table and data frame using Parquet. All right. So the first step here is getting the data into a Parquet file. And so you see here, we're just using the same dataLib library that we've been using the entire time to get the price data and parse the JSON.
And now we're creating a pandas data frame, and then taking this and saving it into a Parquet file that we're going to then load into MATLAB. So let's run this real quickly here. OK. So you see here that we're pulling in a single univariate time series that has closing price level data and some dates here, and we'd like to take this and pull it into MATLAB. All right.
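Pieced together, the Python write side might look something like this sketch (the column names and dataLib return shape are assumptions; pandas needs pyarrow or fastparquet installed to write Parquet):

```python
# Save a pandas DataFrame to a Parquet file for MATLAB to pick up.
import pandas as pd
import dataLib

raw  = dataLib.getPriceData("ETH-USD")
rows = dataLib.parseJson(raw)                 # assumed: list of (date, close)
df   = pd.DataFrame(rows, columns=["Date", "Close"])
df.to_parquet("data.parquet", index=False)
```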
So let's create a new live script. Right. So we saved the data right here as data.parquet, and we want to load this into our live script.
So to do that, we just need to use the command parquetread and the name of the Parquet file. And so here's that same data that we're working with in Python, but now we've loaded the corresponding data frame as a table inside MATLAB. You see this is stored as a table here.
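On the MATLAB side it's one command (the column names carry over from the sketch above):

```matlab
% Read the Parquet file straight into a MATLAB table.
T = parquetread("data.parquet");
head(T)    % the DataFrame's Date and Close columns arrive as table variables
```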
So first thing we want to do now that we're inside MATLAB is just quickly visualize it, which is very easy to do. So let's just do that. And let's plot these two guys. OK. And so here's our plot of what the data looks like for that period of time. It looks like it makes sense.
We've got the dates on the x-axis and the corresponding price level data on the y-axis, but we see here that the data is kind of ragged. If we were trying to make some predictions, it looks like it's got quite a bit of noise. So the question is, is there a particular smoothing window that we can use that would be relevant to potentially a prediction?
So when we're doing this analysis of the data, we can do that very rapidly inside MATLAB by leveraging not only the live editor scripts, but also live editor tasks too. So like controls, these are advanced tools that allow you to do rapid prototyping. So in this case here, I can do a lot of preprocessing techniques like cleaning data, finding outliers, removing trends, or, in this case, determining what smoothing factor is appropriate.
So if I click on the smoothing task, this will load in more or less what seems like a lightweight app that allows you to do a particular task at hand without having to write any type of code. So in this case here, I want to work with data. I want to smooth the closing value. And I want to use a moving window and try to figure out what moving window makes sense for this particular time series. All right.
So when I did a moving window of 16, it looks like this has removed a lot of the noise. This looks pretty good to me. So in fact, if I was going to then take this and do some analysis with it moving forward, I may want to include a moving average window of somewhere in the range of maybe 10 to 20, because 16 looks pretty good.
So the nice thing about these tasks is that they generate code for you right here. And in fact, when you're done with the task itself, you can leave it like this if you want, or you can remove the task and keep just the generated code, which it automatically puts in here for you. Finally, let's say that we want to take this altered data here and pass it back over to the Python team for further investigation on their end. Well, that's also easy to do.
Since this is a corresponding table in MATLAB, I can simply assign the close variable the new smoothed data that we just created. And that's easily done by just doing this.
And now here's our smoothed data-- 82.36, 82.38-- versus the original values right here. So we can see we've updated it. And now let's save this back as a Parquet file. Great. Now the data has been saved back as a Parquet file.
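Pieced together, the MATLAB side of the round trip might look like this (the window of 16 comes from the task; writing back to the same file name is an assumption):

```matlab
% Smooth the close with the code the live task generated, then write it back.
T.Close = smoothdata(T.Close, 'movmean', 16);
parquetwrite("data.parquet", T);    % one command to hand it back to Python
```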
Let's go back to Python, load that back in, make sure that the data is right. OK. So here we're just loading in that new data that we just saved.
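A quick sketch of that check on the Python side:

```python
# Verify the smoothed values survived the round trip.
import pandas as pd

df2 = pd.read_parquet("data.parquet")
print(df2.head())    # expect the smoothed closes: 82.36, 82.38, ...
```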
So let's run this. And there we have it. The corresponding same values, 82.36, 82.38 and so forth. So we pulled in all the same data that we made modifications to.
So to summarize things up: in production, and even in research environments, many systems need to integrate and play nicely together. Generally, this can be quite challenging for the solution architects who are in charge of building a robust back end infrastructure. And MATLAB has made our lives easier by creating a lot of built in interfaces to many of these systems. From data sources, such as Cassandra, MongoDB, Hive, and other NoSQL stores.
To business intelligence systems, like Power BI and Tableau, and streaming platforms like Kafka. And even to virtual machines, cloud, and mobile devices. And now, for the last six years, Python, too, has had its place in that analytics ecosystem. So hopefully today you've learned some new techniques to bridge together MATLAB and Python.
If you're still interested in learning more, such as additional information on integrating MATLAB and Python, building deep learning or reinforcement learning models, these links right here are fantastic and include free downloadable e-books and interactive online tools to get you started with a number of topics.
So to get started immediately, you can download a free trial of MATLAB directly from our website. Or if you would like to speak to a technical expert to discuss your team's specific needs, just reach out to your account manager or call us directly. We're always happy to help. So with that, we have come to the end of our session. Thanks for taking the time today to see how you can get the best out of both worlds by combining MATLAB and Python together. Thank you.