Finally, The Secret to Credit Risk Modeling with Python

Author: Paul Thomas

I’m going to defer to a quote from the Banker Scene in Monty Python’s Flying Circus to start this blog:

“I’m glad to say we’ve got the go-ahead to lend you the money you required…

We will, of course, want as security the deeds of your house, of your aunt’s house, of your second cousin’s house, of your wife’s parents’ house, and of your granny’s bungalow. And we will, in addition, need a controlling interest in your new company, unrestricted access to your private bank account, the deposit in our vaults of your three children as hostages…”

It may seem a little odd to quote Monty Python at the start of a blog about credit risk modeling using Python, but I have two very good reasons:

  1. While the level of security the borrower in this scene needed to provide is absurd, it highlights the need for lenders to fully understand the risk of a loan in order to not require their customers to put their children up as collateral…
  2. You’ll find out in a second

When Monty Python was filmed back in 1969, credit risk modeling was in its infancy. In fact, it was barely crawling, let alone walking into adolescence. Understanding default risk was a complicated, long, and labor-intensive process. We can thank risk modeling pioneers and languages like Python for helping banks and other lenders make more informed decisions about loan applications.

A Brief Recap on the History of Python

It was the week of Christmas in 1989 in Amsterdam, 48°F and cloudy, when Guido van Rossum started tinkering on a small project to busy himself while his employer’s office was closed for the holiday. Van Rossum set out to create a language, based on a previous project called ABC, that drew on Unix infrastructure and conventions without being Unix-bound. He placed a high emphasis on readability and uniformity in an era that praised languages like C and Perl (PHP’s high-maintenance personality would crash the party later). The result was a gorgeously elegant open source language named after (here’s reason 2) Monty Python’s Flying Circus.

Since its first public release in 1991, Python has displayed its prowess across multiple industries and in widely varying use cases. It is now the most widely taught introductory language in the top computer science programs worldwide, was named the most in-demand programming language in the U.S. by Forbes’ fintech columnist, and hit #2 on the list of languages with the most pull requests on GitHub in 2017.

Python Risk Modeling in Finance

One increasingly popular application of Python is in credit risk modeling. Many brilliant data scientists and analysts take advantage of Python’s usability to implement machine learning and deep learning algorithms. From classification and regression methods like logistic regression, decision trees, random forests, and support vector machines to more advanced methods like clustering and neural networks, the many advantages of Python are opening the doors to applying artificial intelligence to credit risk challenges.
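To make that a little more concrete, here is a minimal sketch of the simplest method on that list, logistic regression, used to estimate a probability of default. Everything in it is an assumption made for illustration: the two features (debt-to-income ratio and number of late payments), the synthetic applicants, and the function names are all hypothetical, and the model is fit with plain gradient descent from the standard library rather than with a production-grade library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_toy_loans(n, seed=0):
    """Generate synthetic (made-up) applicants.

    Each row is [debt_to_income, late_payments]; each label is 1 for
    default and 0 otherwise. The relationship used to label the rows is
    hypothetical and exists only so the example has something to learn.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        dti = rng.uniform(0.0, 1.0)
        late = rng.randint(0, 5)
        p_default = sigmoid(4.0 * dti + 0.8 * late - 4.0)
        rows.append([dti, float(late)])
        labels.append(1 if rng.random() < p_default else 0)
    return rows, labels


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this applicant
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_default_probability(weights, bias, applicant):
    """Return the model's estimated probability of default."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, applicant)))


rows, labels = make_toy_loans(500, seed=42)
weights, bias = fit_logistic(rows, labels)

# Two hypothetical applicants: [debt_to_income, late_payments]
low_risk = predict_default_probability(weights, bias, [0.1, 0.0])
high_risk = predict_default_probability(weights, bias, [0.9, 5.0])
print(f"low-risk applicant:  {low_risk:.2f}")
print(f"high-risk applicant: {high_risk:.2f}")
```

In practice a team would more likely reach for an established library and real loan data, but the shape is the same: features in, a probability of default out.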

Yet, despite the promise that Python and other popular languages bring to financial services, nearly every day we talk to companies that are frustrated with the process of testing and deploying models to production, then maintaining or changing those models as business objectives evolve. It has been stated that “The next breakthrough in data analysis may not be in individual algorithms, but in the ability to rapidly combine, deploy, and maintain existing algorithms.” I would venture to guess that most of us in financial services (or any data-driven industry, for that matter) would say, “Of course.” Delays in model deployment are nothing new, and we’re ready for a change.

Challenges Deploying Credit Risk Models using Python

Ben Lorica, the Chief Data Scientist at O’Reilly, once called out the delays that exist between modeling and production, identifying two root causes for the long analytical lifecycle:

  1. Silos – Data science and production engineering teams are historically divided, and the wall between the two organizations causes inherent delays.
  2. Recoding – Models often have to be recoded before they can be deployed into production. So, while your data scientist may prefer Python risk modeling, your production system probably requires Java.

You’re probably squirming in your seat right now because you get the same familiar anxiety that I do when thinking about this dynamic. But stay with me.

It Doesn’t Have to Be This Way

If everybody is so frustrated with this problem, why doesn’t anyone fix it?

I’m so glad you asked.

Provenir tackled this challenge head on in two very distinct ways:

  1. Empowerment – Silos or no silos, your production system should be so easy to use that your data scientist can deploy a model autonomously. Deploying a model into a Provenir decisioning environment is as simple as attaching a document to an email. Just upload the document, visually map your values, and go.
  2. Native Operationalization – Recoding?! That sounds like re-work to me. Why not operationalize models in their native languages? Provenir supports the native operationalization of risk models in R, SAS, and Excel, along with PMML and MathML. We also include support for Python, not just because we value any opportunity to share a Monty Python joke, but because we know how beneficial it is to let your team create models in the language they’re most comfortable with!

We’ve put together a quick demo, so you can see for yourself how easy it is to deploy a Python model:
Watch the demo


1 https://queue.acm.org/detail.cfm?id=2431055
2 https://www.oreilly.com/ideas/data-scientists-and-the-analytic-lifecycle
