Coding for Finance

Yves Hilpisch on Derivatives Analytics with Python

In 2015 Jacob Bettany talked with Yves Hilpisch about his book Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration and Hedging. In a wide-ranging conversation, they covered the state of the art in Python and why institutions are increasingly choosing the language for financial applications.

Derivatives Analytics with Python was published by Wiley on 26 June 2016.

Jacob Bettany: Could you tell me a little bit about your background and how you came to be involved with Python?

Yves Hilpisch: Yes, well, the most important things are: I’m married, we have two children, two dogs and ten horses. My wife is a semi-professional horse rider. That’s the reason we live where we live in Germany and not in a big city; it all revolves around the family and the horse riding. I do need to travel to Frankfurt, London and New York quite often, but luckily much can be done online from home now.

In terms of education, I started out with a Diplom-Kaufmann degree, as it is called in Germany, the German equivalent of the MBA. Although my degree is close to a Master of Business Administration, nearly everything I did at university was in financial theory and banking. I wrote my diploma thesis on banking theory and my seminar thesis on Mergers & Acquisitions, so it was all about finance and not so much about business management, marketing, etc. Later on I also studied mathematical finance for my PhD, doing rather theoretical work on models and no programming at all at that time. During my PhD I started working in management consulting, which I did for two years, not on the quant side but rather on operational and strategic topics. After my PhD I returned to consulting, and in 2001 I founded my first company, a management consulting company, with two partners.

In 2004/5 I heard about Python and was fascinated. I thought it would be interesting to start using it for finance and tried it straight away. I started giving talks at conferences on what Python could do for financial algorithms. People were at first not convinced of its use in finance. In 2005/6 we started building some prototypes, starting with the Monte Carlo simulation and computationally demanding stuff. It took a while until the nice libraries like NumPy and later pandas came out, but now Python is used by the biggest financial institutions in the world.

Jacob Bettany: Why did you want to write Derivatives Analytics with Python, who is it for and why do you feel it was needed?

Yves Hilpisch: It all started with a lecture about numerical option pricing at Saarland University, where I also did my PhD. I was to give the lecture from a practitioner’s perspective, about good computational finance in practice. I gave this lecture for three winters and started writing small notes for the students about different topics. I thought it would be good to write a book out of that. Back then I didn’t intend to publish it with a publisher; rather, I wanted to write something I could use in sales and marketing and for the students, because there hasn’t been, and still isn’t, a book available covering similar topics.

Jacob Bettany: How much finance and Python background do you need to get the most out of the book?

Yves Hilpisch: Well, the finance part is explained to some extent; the Python part is not explained at all. The book explains how to apply various techniques with Python but doesn’t explain how to program in Python. That’s actually the reason why I wrote my Python for Finance book (O’Reilly, 2014). That book closes the gap: it explains how to use Python for finance and is more of a reference work for looking things up. In Derivatives Analytics with Python I assume readers already know how to program in Python and how to use the different libraries. I have an appendix showing a few examples, but it doesn’t replace my other book. The finance topics are covered but not explained in depth, so some prior knowledge is necessary.

Jacob Bettany: How is the book structured?

Yves Hilpisch: The basic structure has three parts. Part one is about the markets, with two chapters. Chapter one covers market-based valuation from a more process-oriented perspective. The second chapter is on market stylized facts, so I set the stage by considering what we need to take care of when we talk about market-based valuation: that there are different notions of volatility, that there is correlation, that there is a volatility surface, which characteristics we observe for stock index returns in the market, and so forth. The major examples we use are equity index options. The book is not about interest rate models, although the approach can be applied in many places other than equity options – this was a strategic decision we made given the limited space. The major focus is indeed on the process itself, providing all the tools needed for that. If you then move to other product categories, for example commodities or Forex, you just need to add the domain-specific knowledge and not much about the tool set itself. The tool set is developed fully, but the product spectrum covered is more or less restricted to equity options.
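
One of the notions of volatility mentioned here is historical (realised) volatility. As a minimal sketch, assuming daily closing prices and 252 trading days per year, it can be estimated from log returns as below. The function name `annualised_volatility` is hypothetical, and the price series is simulated from a geometric Brownian motion with 20% volatility rather than taken from market data.

```python
import numpy as np

def annualised_volatility(prices, periods_per_year=252):
    """Annualised historical volatility from a series of closing prices."""
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))        # r_t = ln(S_t / S_{t-1})
    return log_returns.std(ddof=1) * np.sqrt(periods_per_year)

# Simulated daily prices with a true volatility of 20% (illustration only,
# assumed drift of 5% p.a., ten years of daily observations)
rng = np.random.default_rng(42)
dt, sigma, n_days = 1 / 252, 0.2, 252 * 10
shocks = rng.standard_normal(n_days)
log_path = np.cumsum((0.05 - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * shocks)
prices = 100 * np.exp(np.concatenate([[0.0], log_path]))

vol = annualised_volatility(prices)
print(f"estimated annualised volatility: {vol:.4f}")
```

Comparing such an estimate with implied volatilities backed out of option quotes is one way to see that the two notions of volatility need not agree.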

Part two is about theoretical valuation. I start with the basic theory of risk-neutral valuation and the martingale approach, which provides a good basis for what I cover later on. Then I introduce complete market models, meaning the Black-Scholes-Merton model and the Cox-Ross-Rubinstein binomial option pricing model, the benchmark cases in our world. I then go on to cover Fourier-based option pricing, which many people have commented is a unique feature of the book, again with full Python code, from the very basics to the more advanced material, because this is used later on for the calibration. As a general guiding principle in all my books I always provide the full code and data sets so everybody can reproduce every single result – be it a table, a graphic, or a single number – by the use of the Python scripts, so it’s completely reproducible research if you like.
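
The two benchmark models can be compared directly. The following is a minimal sketch, not the book’s code: a closed-form Black-Scholes-Merton call price next to a Cox-Ross-Rubinstein binomial price computed by vectorised backward induction. The function names and the default of 500 time steps are illustrative assumptions.

```python
import math

import numpy as np

def bsm_call(S0, K, T, r, sigma):
    """Black-Scholes-Merton price of a European call option."""
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

def crr_call(S0, K, T, r, sigma, M=500):
    """European call in the Cox-Ross-Rubinstein binomial model with M steps."""
    dt = T / M
    u = math.exp(sigma * math.sqrt(dt))      # up factor
    d = 1.0 / u                              # down factor
    q = (math.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Terminal stock prices S0 * u^j * d^(M - j) for j = 0..M up-moves
    j = np.arange(M + 1)
    V = np.maximum(S0 * u**j * d**(M - j) - K, 0.0)
    # Backward induction: node with j ups moves to j + 1 (up) or j (down)
    for _ in range(M):
        V = disc * (q * V[1:] + (1.0 - q) * V[:-1])
    return V[0]

print(bsm_call(100.0, 100.0, 1.0, 0.05, 0.2), crr_call(100.0, 100.0, 1.0, 0.05, 0.2))
```

For S0 = K = 100, T = 1, r = 0.05 and sigma = 0.2, the closed form gives about 10.4506, and the binomial value approaches it as `M` grows, which is what makes the two models useful benchmarks for each other.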

Jacob Bettany: I know there is some desire in Computational Finance circles to do this, to publish results which are reproducible by supplying source code and source data.

Yves Hilpisch: Reproducibility is what you are looking for. When I looked back at my studies I saw columns of numbers and thought: you have no means to verify or falsify what others have written. So for me it’s a guiding principle to provide everything I do in the book in such a way that you not only see the code; we go further with the resources we offer. For example, we offer free access to our Quant Platform so you can run the code and see it reproduce the results as expected. This is only possible because we do this on a commercial basis; otherwise you wouldn’t want to program a platform for this very purpose. We have it already, so providing everything on the platform is only one step further.

Jacob Bettany: It’s certainly quite a novel approach to take with a book.

Yves Hilpisch: Yes it is, especially with our Python for Finance book; we had thousands of people on the website using the code on our platform.

To finish the overview of the book structure, there is one more chapter in the theoretical part, on the valuation of American options by simulation, which is a rather involved topic. Until roughly 2000/2001 it wasn’t even possible to value American options by Monte Carlo simulation, at least not properly or efficiently, so this chapter covers this very important topic with lots of code, and later on it is a major numerical method that is used quite regularly.
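
The Least-Squares Monte Carlo (LSM) algorithm of Longstaff and Schwartz (2001) is the standard way to value American options by simulation. The following is a minimal sketch, not the book’s implementation, for an American put under geometric Brownian motion. It regresses discounted future cash flows on a cubic polynomial in moneyness, using in-the-money paths only; the function name, the basis functions and the parameter values (the well-known test case S0 = 36, K = 40) are illustrative assumptions.

```python
import numpy as np

def lsm_american_put(S0, K, T, r, sigma, steps=50, paths=50_000, deg=3, seed=0):
    """American put value by Least-Squares Monte Carlo (Longstaff-Schwartz)."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    disc = np.exp(-r * dt)
    # Simulate geometric Brownian motion paths, shape (steps + 1, paths)
    z = rng.standard_normal((steps, paths))
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_S = np.vstack([np.zeros(paths), np.cumsum(log_increments, axis=0)])
    S = S0 * np.exp(log_S)
    payoff = np.maximum(K - S, 0.0)

    V = payoff[-1].copy()                  # cash flows if held to maturity
    for t in range(steps - 1, 0, -1):
        V *= disc                          # discount cash flows back to time t
        itm = payoff[t] > 0                # regress only on in-the-money paths
        if itm.sum() > deg + 1:
            x = S[t, itm] / K              # moneyness keeps the regression well scaled
            coeffs = np.polyfit(x, V[itm], deg)
            continuation = np.polyval(coeffs, x)
            exercise = payoff[t, itm] > continuation
            idx = np.flatnonzero(itm)[exercise]
            V[idx] = payoff[t, idx]        # early exercise replaces later cash flows
    return disc * V.mean()                 # discount from the first date to today

price = lsm_american_put(36.0, 40.0, 1.0, 0.06, 0.2)
print(f"American put (LSM): {price:.3f}")
```

With these parameters the estimate should come out around 4.47, above the corresponding European put value of about 3.84; the difference is the early exercise premium. Because the exercise rule is estimated on the same paths it is then applied to, the estimate carries a small bias, which an independent set of valuation paths would remove.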

Then we come to part three, the major part, about market-based valuation. It starts with a first overview example covering all the steps involved: the market model, the valuation using FFT and other integration approaches, the calibration, and the simulation to come up with values. This gives a good overview for a simple model, the Merton (1976) jump diffusion model that I use here, really the first step beyond the Black-Scholes world if you like. I then introduce a general market framework that is used afterwards, the model of Bakshi, Cao and Chen (1997), which involves a jump diffusion part, a stochastic volatility component as well as a stochastic short rate model. So this really combines quite a few features, and this is the model used for the rest of the book; within this framework I develop approaches for option pricing by different numerical methods. There is a separate chapter on Monte Carlo simulation algorithms where the focus is on accuracy given different discretisation schemes, quite a few of which are tested and compared. We then go on to model calibration, which is discussed in general and illustrated with multiple examples for the different components. Given the model I have just described, you have to calibrate the short rate component, the jump diffusion component and the stochastic volatility component, and at the end you want to do a complete calibration of the whole thing, so there’s a lot going on when it comes to the numerics of calibrating the model. Then, obviously, you simulate the model for the valuation of more exotic payoffs and so forth; this is chapter 12. The last chapter is about dynamic hedging, where I use this rather complex model and the simulation methods to derive hedging strategies. So the book goes in depth when it comes to numerical methods applied to these rather sophisticated equity option pricing models.
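
The integration-based valuation step for the Merton (1976) model can be sketched with the Lewis (2001) representation of a European call, C0 = S0 - sqrt(S0 K) exp(-rT) / pi * Integral_0^inf Re[exp(iuk) phi(u - i/2)] / (u^2 + 1/4) du, with k = ln(S0/K) and phi the risk-neutral characteristic function of ln(S_T/S0). This is a minimal sketch under stated assumptions: the integral is truncated at `u_max` and evaluated with a plain trapezoidal rule instead of adaptive quadrature or FFT, and the function names and parameters are hypothetical.

```python
import math

import numpy as np

def bsm_call(S0, K, T, r, sigma):
    """Black-Scholes-Merton call, used here as a reference value."""
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return S0 * N(d1) - K * math.exp(-r * T) * N(d1 - sigma * math.sqrt(T))

def m76_char_func(u, T, r, sigma, lamb, mu, delta):
    """Characteristic function of ln(S_T / S_0) in the Merton (1976) model."""
    # Drift correction that keeps the discounted stock price a martingale
    omega = r - 0.5 * sigma**2 - lamb * (np.exp(mu + 0.5 * delta**2) - 1.0)
    jump_part = lamb * (np.exp(1j * u * mu - 0.5 * u**2 * delta**2) - 1.0)
    return np.exp((1j * u * omega - 0.5 * u**2 * sigma**2 + jump_part) * T)

def m76_call_lewis(S0, K, T, r, sigma, lamb, mu, delta, u_max=100.0, n=20_001):
    """European call in the Merton (1976) model via the Lewis (2001) integral."""
    u = np.linspace(0.0, u_max, n)
    k = math.log(S0 / K)
    cf = m76_char_func(u - 0.5j, T, r, sigma, lamb, mu, delta)
    integrand = (np.exp(1j * u * k) * cf).real / (u**2 + 0.25)
    h = u[1] - u[0]
    # Trapezoidal rule on [0, u_max]
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return S0 - math.sqrt(S0 * K) * math.exp(-r * T) / math.pi * integral

print(m76_call_lewis(100.0, 105.0, 1.0, 0.05, 0.2, 0.75, -0.2, 0.1))
```

Setting the jump intensity `lamb` to zero recovers the Black-Scholes-Merton price, a useful sanity check; with jumps, the value can be cross-checked against Merton’s series of Poisson-weighted Black-Scholes-Merton prices. The truncation at `u_max` relies on the diffusion term damping the integrand, so a very small `sigma` would call for a larger cut-off.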

Jacob Bettany: What do you think makes Python attractive for quants generally and useful for this particular purpose?

Yves Hilpisch: To start with, it’s the syntax. Usually in my general Python for Finance talks, and also in my recent Wilmott magazine article, it’s about Python being so close to mathematical syntax. If you study any kind of formal discipline you might be used to writing LaTeX code, and LaTeX code and Python code are sometimes hardly distinguishable; many people say that Python code for algorithms looks like pseudo-code! It’s a natural computer language for describing and implementing your algorithms, and when people first encounter Python coming from compiled languages where you write explicit loops and so forth, they are surprised how concise the Python syntax can be. So it is very close to math and also to LaTeX.

With the syntax also comes vectorisation, something found in MATLAB and R as well, and vectorisation brings speed in and of itself: writing concise, vectorised code in Python that avoids any kind of loop at the Python level typically leads to speed-ups in the range of 20, 30 or 40 times, because what you write in vectorised fashion is executed at the speed of C code. So by writing concise, vectorised code you typically gain not only readability but also the speed you are looking for; there is a nice interplay between these two aspects.
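
The effect of vectorisation can be seen with a Monte Carlo estimator for a European call under geometric Brownian motion, written once with an explicit Python loop and once with NumPy. This is a minimal sketch; the function names are hypothetical, and the actual speed-up depends on the machine, the problem size and library versions, so the timing printed at the end is not a guaranteed figure.

```python
import math
import random

import numpy as np

def mc_call_loop(S0, K, T, r, sigma, n_paths, seed=0):
    """Monte Carlo European call with an explicit Python loop."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        ST = S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(ST - K, 0.0)
    return math.exp(-r * T) * total / n_paths

def mc_call_vectorised(S0, K, T, r, sigma, n_paths, seed=0):
    """The same estimator with NumPy: no loop at the Python level."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    return math.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

if __name__ == "__main__":
    import timeit
    args = (100.0, 100.0, 1.0, 0.05, 0.2, 200_000)
    t_loop = timeit.timeit(lambda: mc_call_loop(*args), number=1)
    t_vec = timeit.timeit(lambda: mc_call_vectorised(*args), number=1)
    print(f"loop: {t_loop:.3f}s  vectorised: {t_vec:.4f}s  ratio: {t_loop / t_vec:.0f}x")
```

Both functions estimate the same quantity (the Black-Scholes-Merton value is about 10.45 for these inputs); they differ only in where the loop runs, in the interpreter or inside NumPy’s compiled code.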

Secondly, and this might be even more significant, there are expert developers who say ‘well, I write any language, you know; I know five, I am learning a sixth one – I don’t really care’ – and what they really appreciate about Python is the ecosystem. I am talking about the “scientific stack”, as it’s called, or the “PyData stack”, which includes data and scientific functionality used in many disciplines. We now have a very well developed ecosystem of libraries that can be used for finance. I guess one of the most popular libraries is pandas, and many people come to Python especially for this library. Not only has it introduced many new things, it also does a good job of wrapping many existing things, like the matplotlib plotting library, into a nice API. So it’s about functionality that wasn’t there before. It’s also about efficiency since, if used correctly, it runs at about the speed of C code, and it’s about the convenience of wrapping many libraries into a single unified API. For me it is also about the syntax again, because when you do lots of teaching and training you really appreciate how quickly people get up to speed. And then an attractive factor these days is obviously the jobs market and the prospects the language offers for your career. The job market for Python experts has grown tremendously, with more and more financial companies applying Python.

Efficiency, development and maintenance are, I guess, major attractions for quants. Just last week I had a discussion with people who had issues maintaining their MATLAB code and wanted to migrate to Python. Many people come for this particular point alone, to be better able to maintain their applications and to have an efficient development and production cycle. Last but not least, these days it is also about the speed and performance that Python can offer. I still meet people, though much less often than 5-10 years ago, who say Python is slow.

Python has improved in so many ways that, when code is written with the right idioms and libraries, it reaches high performance levels and the speed disadvantage definitely isn’t there anymore.

Jacob Bettany: Is it right that people can use pandas without necessarily being aware they are using Python?

Yves Hilpisch: That is so. I was actually suggesting that you could teach pandas as a language of its own because, as I was saying, pandas wraps so many things that you are sometimes not aware of which underlying library is used for what purpose. For example, when you use HDFStore, which gives you database access based on the HDF5 format, a standard binary format for high performance I/O, you may not be aware of what is going on in the background. Under the hood it uses two libraries: PyTables, the Python wrapper around the HDF5 library, and the HDF5 library itself. But you don’t need to know anything about these technologies; you just call HDFStore, provide a filename, and you are done. You don’t need to learn anything about how to work with the database; pandas simply wraps it in a nice interface and you just notice that it’s rather efficient. One of the most impressive examples from my point of view is the plot method of the DataFrame object. The DataFrame is the major class in pandas; you can store megabytes of data in such an object and plot selections of your data in a fraction of a second by simply calling the method. So a single line of code gives you a nice plot of all the data you are looking for.
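
A minimal sketch of the HDFStore workflow, assuming pandas and PyTables (the `tables` package) are installed; the file name `data.h5` in a temporary directory and the random data are illustrative. Writing and reading a DataFrame takes one statement each, while PyTables and HDF5 do the work underneath.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data: 100,000 rows x 5 columns of floats (about 4 MB)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100_000, 5)), columns=list("ABCDE"))

path = os.path.join(tempfile.mkdtemp(), "data.h5")  # hypothetical file location

# Write: HDFStore hands the DataFrame to PyTables, which writes HDF5
with pd.HDFStore(path, mode="w") as store:
    store["df"] = df

# Read it back under the same key
with pd.HDFStore(path, mode="r") as store:
    df_loaded = store["df"]

print(df_loaded.shape)
```

With matplotlib installed, plotting a selection is likewise a single call, for example `df_loaded[["A", "B"]].iloc[:1000].cumsum().plot()`.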

Jacob Bettany: In terms of derivatives analytics, could you outline that from a quant perspective and how you approach that generally from a system architecture point of view?

Yves Hilpisch: In the book, as in our company, we focus on two major topics. The first is computational finance, which we’ve spoken about quite a bit: Monte Carlo simulation for pricing, for Value-at-Risk and for XVA, numerical methods for calibration, hedging procedures and so forth. The second major topic is financial data science, meaning how to work with the data: everything around data logistics, such as how to get the data, how to unify and manage it, how to use it, plot it, work with it and store it. The major topic in the book is obviously computational finance, but in practice I would say it is more about financial data science: perhaps 60% of problems are concerned with financial data science and only 40% with computational finance. Having said that, this changes all the time. If you are touching upon these things you might think you need 100 libraries out of the scientific stack, but what we use in practice, and what I use in the book, are more or less only the very standard libraries: NumPy, which provides the basic data structure for array-structured data, pandas as I mentioned before, and PyTables for data storage. These are more or less the only three; obviously something like matplotlib is needed for plotting, but there is nothing special that we use for that. You can get a long way with just these libraries. They are so powerful and flexible these days that you can do many amazing things. Of course, the more specialised your task, the more specialised tools you need: if, for example, you are doing calibration, you might use specialised optimisation routines from SciPy, which is also one of the major pillars of the scientific stack. But this again builds on top of NumPy.
So on a fundamental level we have the basic libraries NumPy and pandas, and if you want to store data you might have PyTables in the backend. On top of that come the more specialised libraries, which rely on the basic data structures provided by NumPy and pandas, such as SciPy for optimisation and matplotlib for plotting.
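
The calibration step mentioned here can be sketched with SciPy on top of NumPy. This is a minimal illustration under simplifying assumptions: a single-parameter Black-Scholes-Merton model is fitted to synthetic call quotes generated with a volatility of 25% by minimising the mean squared pricing error with `scipy.optimize.minimize_scalar`. The function names and all inputs are hypothetical; calibrating richer models with jump, stochastic volatility and short rate components involves more parameters and real market quotes.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def bsm_call(S0, K, T, r, sigma):
    """Black-Scholes-Merton call values; K may be a NumPy array of strikes."""
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def calibrate_volatility(S0, strikes, T, r, market_prices):
    """Volatility that minimises the mean squared error against market quotes."""
    def mse(sigma):
        model_prices = bsm_call(S0, strikes, T, r, sigma)
        return np.mean((model_prices - market_prices) ** 2)
    result = minimize_scalar(mse, bounds=(0.01, 1.0), method="bounded",
                             options={"xatol": 1e-8})
    return result.x

# Synthetic "market" quotes generated with sigma = 0.25 (not real data)
strikes = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
quotes = bsm_call(100.0, strikes, 0.5, 0.01, 0.25)
sigma_hat = calibrate_volatility(100.0, strikes, 0.5, 0.01, quotes)
print(f"calibrated volatility: {sigma_hat:.6f}")
```

Minimising the squared error rather than its square root keeps the objective smooth around the optimum, which suits the parabolic steps of the bounded Brent method.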

Jacob Bettany: It’s interesting what you say about the 60% data science. There’s a trend towards data scientists going into finance, do you see it as becoming a more popular area in the field?

Yves Hilpisch: Yes, for sure, but data science is a somewhat over-used term from my point of view. There is no computational finance without data science. It’s the basis for everything, no matter what you want to do, especially if you are doing something on the retail side of banking (where, in general, we are not active at all) and you do analytics on retail bank accounts – then there is lots of data science and data logistics involved. No matter what you do on the quant finance side there is always data involved, and with the emergence of data science, big data, machine learning, artificial intelligence and so forth, there’s a drive, a need, a will to get more of this into the quant finance world. When you look at the traditional big data players such as Google, LinkedIn etc., they have achieved remarkable results. Quants are thinking: wow, if they can achieve such leaps with these technologies, why not apply them to finance as well? More and more data scientists, machine learning and deep learning specialists and so forth are needed because banks, and especially hedge funds, are eager to find new sources of competitive advantage and alpha. I guess hedge funds are more or less the driving force here because they are usually principal-owned and freer and quicker in implementing things. It has become a general trend that you see data scientists coming more and more into the financial field.

Jacob Bettany: What considerations should developers take into account when using Python for finance?

Yves Hilpisch: When you have a look at our open source derivatives library DX Analytics, you will again find that not that many Python libraries are used in the code base. The major ones again are NumPy and pandas, plus a few others in a few different places. It’s not that you need 200 different things to put together in order to get something to work. These libraries typically already have the performance you need built in. Beyond the very basics of Python, my training programs typically cover NumPy and pandas first. After the first day, people attending my training classes have covered the most important 20% of what you need to accomplish probably 80% of your everyday tasks. Of course, in practice there is much more than that, and this is where the scientific stack itself comes into play. People generally don’t build their own stacks these days; they use standard Python distributions. I don’t know the recent numbers, but I think the most popular one is the Anaconda Python distribution from Continuum Analytics. With a single download and a single install you get the complete set of libraries generally needed. But you must be aware that it’s not that easy to maintain such an installation, although Anaconda with the Conda package manager does a great job here. Nevertheless, especially for larger teams, and for things we discussed before like reproducibility, we recommend our Quant Platform or similar Web-based environments. To make deployment much more efficient when using Python across teams and across a whole organisation, we provide the Quant Platform, which not only has a complete Python stack, complete in the sense that it includes 95% of what is generally needed, but also has R and Julia as languages and many development tools. Single accounts are free and it takes 20 seconds to set one up. And you have everything you need to do any kind of derivatives or other financial analytics work.
This is the most efficient way both for individuals who want to work on a professional infrastructure and for companies. Deploying the platform for companies is one of the pillars of our work these days: multiple people work together in a unified environment where they can maintain the whole structure in a sensible fashion, without taking care of gigabytes of source installations on Windows, Linux or whatever the diverse end users use.

Jacob Bettany: What are the prospects for C++ and will it always remain in the dominant position?

Yves Hilpisch: Of course C++ is still a force to be reckoned with in quant finance. Take Bank of America Merrill Lynch, for example: they have 14+ million lines of Python code in production for the Quartz risk management system. However, the analytics library for pricing and risk management itself is still in C++. There is one major reason for this: there is so much legacy code that works and is highly efficient that there is no incentive and no need to migrate it to Python. This is one reason why C++ is here to stay for quite a long time. As a compiled language it also has certain advantages compared to Python.

What I observe is that people who come out of these environments and set up their own shops, in which they have major stakes, in general no longer start off using C++ for anything, because Python is now available.
With no legacy code they can start from scratch; they follow a green field approach, and these decision makers and owners then decide on Python. Whenever they hit a performance bottleneck, it’s so easy to interface Python with C++ that they can still do whatever they want. As a general development language Python is much more efficient; I guess you typically need only one tenth, if not one fiftieth, of the lines of code compared to C++. So this is the situation. C++ has advantages when it comes to performance and the many experts available in the analytics space, but Python is by now the most popular language taught in the leading computer science departments, and one of the most popular languages for data scientists and quants these days. So the future will look different, but for the next ten or fifteen years other languages like Java and C++ will be here to stay in the analytics space of the big companies.

Jacob Bettany: How easy is parallelisation in Python?

Yves Hilpisch: Well, parallelisation is a topic in and of itself. When people say Python is slow because it is an interpreted language, they also mention the Global Interpreter Lock, the so-called GIL, which means that within one process only one thread can execute Python bytecode at a time. Code execution is constrained, if you like, by the two characteristics “single threaded” and “interpreted”. But this has changed, and those arguments are no longer valid. Python is open in all directions: no matter what kind of nice technology is around, you will typically find Python wrappers and APIs for it.

One example: when it comes to big data and in-memory analytics, everyone is talking about Spark. In that space, PySpark is available. Python is one of the major supported languages in this ecosystem, which so many companies use for their big data needs.

On the other hand, when it comes to GPUs, for example, Python is again one of the major languages supported by NVIDIA. There is, for example, the Numba library, which makes it really efficient to write pure Python code and compile it to CUDA-compatible kernels that you can then execute on NVIDIA GPUs. So there are two major approaches here: one is the big data approach, where you have clusters of many nodes across which you want to distribute your calculations; the other is the GPU, where you have highly parallelised processes on a single machine.

Python itself is also getting more and more parallelisation capabilities and libraries. There is even the multiprocessing module in the standard library, which makes it really efficient to parallelise code execution. We have built this into our DX Analytics library. For one of our clients I have a chart showing DX Analytics running on an AWS EC2 instance with 32 cores utilised to 95-100%. There are others too, like Numba, the dynamic compiling library mentioned before, where you can write pure Python code that is compiled to machine code and can be run in parallel on your CPU, similar to the GPU example I mentioned before. There are many, many others, like Celery, a distributed task queue that lets you distribute Python tasks across clusters, and so on. There are so many options these days that parallelisation has become a strength rather than something that isn’t available at all.
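
As a minimal sketch of how the standard library’s multiprocessing module can spread a Monte Carlo valuation across CPU cores: each worker simulates its own chunk of paths with a separate seed, and the chunk estimates are averaged. The function names, the seeds and the number of workers are illustrative assumptions, not DX Analytics code. The `if __name__ == "__main__"` guard is needed on platforms that start worker processes by re-importing the main module.

```python
import math
import multiprocessing as mp

import numpy as np

def mc_call_chunk(args):
    """European call estimate from one chunk of paths with its own seed."""
    S0, K, T, r, sigma, n_paths, seed = args
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    return math.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

def parallel_mc_call(S0, K, T, r, sigma, n_paths=1_000_000, n_workers=4, base_seed=2016):
    """Split the simulation into equal chunks, run them in worker processes, average."""
    chunk = n_paths // n_workers
    tasks = [(S0, K, T, r, sigma, chunk, base_seed + i) for i in range(n_workers)]
    with mp.Pool(processes=n_workers) as pool:
        estimates = pool.map(mc_call_chunk, tasks)
    return sum(estimates) / n_workers

if __name__ == "__main__":
    print(parallel_mc_call(100.0, 100.0, 1.0, 0.05, 0.2))
```

Because the chunks have equal size, the average of the chunk estimates equals the estimate from pooling all paths; distinct seeds keep the workers from reusing the same random numbers.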

Jacob Bettany: How about supercomputing?

Yves Hilpisch: It depends on how you define supercomputing. I gave a talk at the supercomputing centre in Juelich here in Germany and spoke to a guy who did his PhD there at the centre. That was two years ago, and he said he could run Python on a supercomputer with 1.95 million cores. Obviously compute time on these systems is very expensive, but he was very pleased to have had 5 minutes, and he ran a Python job on almost 2 million cores. So Python obviously works well when it comes to supercomputing.

On the other hand, I think there are limits with regard to performance. Python cannot cover some areas, for example the high frequency trading space, where different technologies are typically deployed and Python is hardly ever used, if at all then maybe for more administrative tasks; for the algorithms used there and for hardware like FPGAs and so forth, Python is hardly ever a choice. So this is one of the fields where Python doesn’t play a role yet.

In and of itself, Python has never been either a high performance language or a big data language, but it is a nice language with many handy libraries that might open up the worlds of high performance computing and even supercomputing.

Jacob Bettany: What are you doing generally to promote Python and are there any supporting resources for the book we haven’t mentioned already?

Yves Hilpisch: On my website I have fifty or sixty talks I have given on Python for finance. I more or less started out speaking about financial applications at Python conferences like EuroPython or PyData, which are general Python conferences, although PyData specialises in the data science side. In addition, I’m doing podcasts and webinars; I just recently did a webinar on Automated Trading with Python.

Community-wise, I think the biggest thing I’m doing is the Python for Quant Finance meetup group in London and also in New York. The London one is much bigger and more active as I’m here in Europe; we have 1400+ members as of now, 8 meetups per year with close to 100 people each, and a few smaller events. We have speakers on different topics and I usually give a talk there. In New York it’s the same thing; the last meetup I had was in January (May) when I was there. In addition to that, we run the For Python Quants Conference series, which as of now takes place once a year in New York and once a year in London. This started out two years ago, and the fifth one will come up in May in New York. It started as a one-day conference and has grown into a full week with four different bootcamps and one conference day, so there will be four different days of training covering many different aspects of the finance universe. This year, for the first time, we have added Python for Excel. There are many nice ways to integrate Python with Excel to replace VBA with Python analytics. On the conference day we usually have about ten talks during the day and an expert panel afterwards.

Then we provide various websites, one of them being the website for the book itself, where you can find links to many other resources, including a GitHub repository with all of the Jupyter Notebooks and the code of the book. I provide the link to the Quant Platform, where you can log in and execute the code directly, and a brief video illustrating how to use everything. I do this for every one of my books.

Jacob Bettany: You have another book coming out also, perhaps you could also tell us a little bit about that one?

Yves Hilpisch: Yes, that is Listed Volatility and Variance Derivatives: A Python-based Guide, which originated in a similar spirit to this one. This time it all started out with a client project for Eurex, the German derivatives exchange, rather than with a university lecture. We developed Python-based tutorials for them covering both listed volatility and variance products. I am grateful that they allowed me to use the content to write the new book. It is in the final editing stages at the moment and due to be published by Wiley in October or November.

Jacob Bettany: I’m always curious to talk to technologists about blockchain technology, so I’d be interested to hear your views and how you think it could affect the future of finance.

Yves Hilpisch: Yes, well, let me start by saying I am not an expert in blockchain. I have seen many startups pitching at big data conferences and I’ve seen talks about it, but I am not an expert. First of all, from a technological point of view a blockchain is nothing but an immutable database: another way to store information, with some things attached. Once something is written to the blockchain it is impossible, or let’s say really hard, to change it. There are then two areas where you can apply this. The one I find really charming is property rights in general. An immutable database might help in documenting who owns what, be it digital assets or real assets; there are so many things, from art to digital music rights and whatever else, and I can really relate to that. If you have this technology and it’s pretty easy to build up such a database, to document property rights and for everybody else to see who owns what, this is kind of a nice thing.
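
The “immutable database” idea can be illustrated with a toy hash chain in which every record stores the SHA-256 hash of the record before it, so changing an earlier entry breaks every later link. This is a minimal sketch for intuition only: it has no network, no consensus mechanism and no proof of work, and the function names and ownership records are hypothetical.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_record(chain, record):
    """Append a record that commits to the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "record": record, "prev_hash": prev})
    return chain

def is_valid(chain):
    """Every block must reference the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

# Hypothetical ownership records
ledger = []
append_record(ledger, {"asset": "painting #1", "owner": "Alice"})
append_record(ledger, {"asset": "painting #1", "owner": "Bob"})
append_record(ledger, {"asset": "music rights #7", "owner": "Carol"})
print(is_valid(ledger))
```

Rewriting the owner in the second record would make `is_valid` return `False`, because the third block still commits to the old hash. A change to the last record, however, goes unnoticed by this check alone, which is why real systems also spread the latest hash widely and let many parties hold copies of the chain.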

There’s also a lot of interest when it comes to the transactional element, where you need much faster technology, and this whole immutable database with its technological infrastructure was not designed to be very fast; indeed, everything in this regard is pretty slow. So when you think of financial markets, high frequency trading and so forth, there is no technology available yet to set up a blockchain that records high frequency transactions. At the moment it takes ten minutes or so to complete and document a transaction. But this might change: so many people are working on it, and this is more or less only a technological problem, so you would expect to see new processes and techniques speeding things up.

When it comes to the origin of everything, blockchain plus currency, which means bitcoin, I see partly tremendous benefits and partly no benefits at all. The tremendous benefits lie in the reduction of transaction costs, which everyone can relate to. If someone in a European country wants to transfer money to Africa, the Middle East or any other region that might be classified as emerging, doing it via Western Union may cost 25-30% in transaction costs. With bitcoin this can be done at almost no transaction cost at all. For these people there will be huge impacts. But in the end everybody is talking about disruption, and when I see all the VC funding going on and all the big banks putting so much money into this space, I haven’t yet seen anything where we could say this disrupts the financial industry as we know it today. It improves some things, it might bring efficiency gains, but from my point of view it has so far been more or less incremental. Even for this very good application, the currency bitcoin and the other cryptocurrencies around right now, the size of these currencies is so small compared to the real markets that there is no disruption. If it were truly disruptive for everybody in the world, it would grow much faster; it would already have captured 10 or 20% of Forex trading or whatever, but this is a long way from the case. Disruption from my point of view means that something fundamentally changes. If you say, well, we had 25 different database approaches and now we have 26, and in certain places we might get certain benefits out of the 26th option, that does not seem to be a fundamental change.

Everybody is afraid of missing out on something. All the banks are investing so much money now, opening labs in London, e.g. at Level 39, the tech incubator, or running a blockchain lab of their own. Some people say openly: well, we are testing all that stuff because we don’t want to miss anything. Others take more of a marketing angle and like to say that they are at the forefront of everything and will apply blockchain to whatever they find beneficial. So I guess most of them have no clue what they could do with it in the future that will prove useful at all; they just say they want to do something because everyone is talking about it, let’s do this… It’s like big data: everyone needs to do big data, and if you are the chief executive of a company and you say you’re not doing big data, then it might be bad for your stock price.
