Finally, The Secret to Credit Risk Modeling with Python

January 9, 2023 | Jonathan Pryer

I’m going to defer to a quote from the Banker Scene in Monty Python’s Flying Circus to start this blog:

“I’m glad to say we’ve got the go-ahead to lend you the money you required…

We will, of course, want as security the deeds of your house, of your aunt’s house, of your second cousin’s house, of your wife’s parents’ house, and of your granny’s bungalow. And we will, in addition, need a controlling interest in your new company, unrestricted access to your private bank account, the deposit in our vaults of your three children as hostages…”

It may seem a little odd to quote Monty Python at the start of a blog about credit risk modeling using Python, but I have two very good reasons:

1. While the level of security the borrower in this scene needed to provide is absurd, it highlights the need for lenders to fully understand the risk of a loan so that they don’t have to ask their customers to put their children up as collateral…
2. You’ll find out in a second

When Monty Python was filmed back in 1969, credit risk modeling was in its infancy; in fact, it was barely crawling, let alone walking into adolescence. Understanding default risk was a complicated, long, and labor-intensive process. We can thank risk modeling pioneers and languages like Python for helping banks and other lenders make more informed decisions about loan applications.

A Brief Recap on the History of Python

It was the week of Christmas in 1989 in Amsterdam, 48°F and cloudy, when Guido van Rossum started tinkering with a small project to keep himself busy while his employer’s office was closed for the holiday. Van Rossum set out to create a language, based on a previous project called ABC, that drew on Unix infrastructure and conventions without being Unix-bound. He placed a high emphasis on readability and uniformity in an era that praised languages like C and Perl (PHP’s high-maintenance personality would crash the party later). The result was a gorgeously elegant open source language named after (here’s reason 2) Monty Python’s Flying Circus.

Since its first public release in 1991, Python has displayed its prowess across multiple industries and in widely varying use cases. It is now the most widely taught introductory language in the top computer science programs worldwide, was named the most in-demand programming language in the U.S. by Forbes’ fintech columnist, and hit #2 on the list of languages with the most GitHub pull requests in 2017.

Python Risk Modeling in Finance

One increasingly popular application of Python is in credit risk modeling. Many brilliant data scientists and analysts take advantage of Python’s usability to implement machine learning and deep learning algorithms. From simpler classification and regression algorithms like logistic regression, decision trees, random forests, and support vector machines to more advanced methods like clustering and neural networks, the many advantages of Python are opening the door to applying artificial intelligence to credit risk challenges.
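To make the simplest of these concrete, here is a minimal sketch of a logistic regression that estimates a probability of default (PD). Everything in it is an assumption made for illustration: the two features (debt_to_income and late_payments), the synthetic applicants, and the coefficients used to simulate outcomes are all made up. In practice a team would more likely reach for a library such as scikit-learn than hand-written gradient descent, but the standard-library version keeps the mechanics visible.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_loans(n, seed=0):
    """Simulate made-up applicants as (debt_to_income, late_payments) plus a default flag."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        dti = rng.uniform(0.0, 1.0)       # debt-to-income ratio
        late = rng.randint(0, 5) / 5.0    # late payments, scaled to [0, 1]
        # Assumed relationship, used only to generate illustrative outcomes.
        p_default = sigmoid(-4.0 + 4.0 * dti + 3.0 * late)
        rows.append((dti, late))
        labels.append(1 if rng.random() < p_default else 0)
    return rows, labels


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_pd(weights, bias, applicant):
    """Return the estimated probability of default for one applicant."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, applicant)))


rows, labels = make_synthetic_loans(1000)
weights, bias = fit_logistic(rows, labels)
low_risk = predict_pd(weights, bias, (0.1, 0.0))
high_risk = predict_pd(weights, bias, (0.9, 1.0))
print(f"low-risk PD:  {low_risk:.3f}")
print(f"high-risk PD: {high_risk:.3f}")
```

The applicant with a low debt-to-income ratio and no late payments should come out with a noticeably lower PD than the heavily indebted applicant with a history of late payments, which is the kind of ranking a lender needs before setting terms.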

Yet, despite the promise that Python and other popular languages bring to financial services, nearly every day we talk to companies that are frustrated with the process of testing and deploying models to production, and then maintaining or changing those models as business objectives evolve. It has been stated that “The next breakthrough in data analysis may not be in individual algorithms, but in the ability to rapidly combine, deploy, and maintain existing algorithms.” I would venture to guess that most of us in financial services (or any data-driven industry, for that matter) would say, “Of course.” Delays in model deployment are nothing new, and we’re ready for a change.

Challenges Deploying Credit Risk Models using Python

Ben Lorica, the Chief Data Scientist at O’Reilly, once called out the delays that exist between modeling and production, identifying two root causes for the long analytical lifecycle:

Silos – Data science and production engineering teams are historically divided, and the wall between the two organizations creates inherent delays.
Recoding – Models often have to be recoded before they can be deployed into production. So, while your data scientist may prefer Python risk modeling, your production system probably requires Java.

You’re probably squirming in your seat right now because you get the same familiar anxiety that I do when thinking about this dynamic. But stay with me.

It Doesn’t Have to Be This Way

If everybody is so frustrated with this problem, why doesn’t anyone fix it?

I’m so glad you asked.

Provenir tackled this challenge head-on in two very distinct ways:

Empowerment – Silos or no silos, your production system should be so easy to use that your data scientist can deploy a model autonomously. Deploying a model into a Provenir decisioning environment is as simple as attaching a document to an email. Just upload the document, visually map your values, and go.
Native Operationalization – Recoding?! That sounds like re-work to me. Why not operationalize models in their native languages? Provenir supports the native operationalization of risk models in R, SAS, and Excel, along with PMML and MathML. We also include support for Python, not just because we value any opportunity to share a Monty Python joke, but because we know how beneficial it is to let your team create models in the language they’re most comfortable with!
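As a rough illustration of what keeping a model in its native language can look like, the sketch below wraps a Python scoring function in a JSON-in, JSON-out handler, so a calling system only needs to exchange JSON and never has to reimplement the model. This is not Provenir’s interface: score_applicant, handle_request, the feature names, the coefficients, and the 0.2 cutoff are all hypothetical names and values chosen for the example.

```python
import json
import math


def score_applicant(features):
    """Hypothetical model from the data scientist, left in Python as written."""
    # Illustrative coefficients only; they are not fitted to real data.
    z = -4.0 + 4.0 * features["debt_to_income"] + 0.6 * features["late_payments"]
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(payload, model=score_applicant, cutoff=0.2):
    """Parse a JSON request, run the native model, and return a JSON decision."""
    features = json.loads(payload)
    pd_estimate = model(features)
    # Assumed policy: refer the application for review at or above the cutoff.
    decision = "refer" if pd_estimate >= cutoff else "approve"
    return json.dumps(
        {"probability_of_default": round(pd_estimate, 4), "decision": decision}
    )


request = json.dumps({"debt_to_income": 0.35, "late_payments": 1})
response = json.loads(handle_request(request))
print(response)
```

Because the model is passed in as an ordinary function, swapping in a retrained version means replacing one Python callable rather than recoding the logic in another language.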

We’ve put together a quick demo, so you can see for yourself how easy it is to deploy a Python model:

Watch the demo
