Elevate: On Driving Innovation in Credit-Scoring through Advanced Analytics

April 9, 2021 | Jonathan Pryer

Elevate Credit, an alternative credit provider that lends to customers currently underserved by mainstream finance, requires a robust data science team and industry-leading technology stack to originate more than $4.9 billion in credit to more than 1.8 million non-prime customers in the UK and US¹. The company is outspoken about its dedication to advanced analytical techniques as a means to comply with regulatory responsibilities and to benefit its growing customer base.

Because Elevate sets itself apart with its data-driven approach (it’s not uncommon for Elevate to use thousands of different facts in the process of underwriting a new customer), we knew we had to speak with one of the forward-thinking data scientists in Elevate’s Risk Management department. John Bartley has over ten years of experience in financial services and recently oversaw an effort to transition Elevate’s UK credit risk models from SAS to R. You will hear more about that in this interview.

Adi: John, thank you for taking the time to speak with us about Elevate’s impressive data science initiatives. Can you give us an overview of your recent work with Elevate?

John: Thank you, Adi. Absolutely. Of course, we’re excited about the recently launched Elevate Labs at our Advanced Analytics Center in San Diego, California. Elevate has always been committed to innovating the world of data science in credit risk, so this facility is just the next step in that constant evolution. It is a pleasure to work with the high caliber talent we’re able to attract because of that commitment.

On the day-to-day, we’re focused on continually improving our analytical models to serve the non-prime market in the US and the high-cost short-term credit market in the UK. For example, we have observed a huge uplift in one of our acquisition channels in the UK as a result of improvements in our modeling. The better our models are able to explain and predict consumer behaviour, the more of the alternative credit market we’re able to address.

A: What types of data is Elevate using in its underwriting process?

J: Our risk analytics stack utilizes a terabyte-scale Hadoop infrastructure composed of thousands of elements, customer records, and other wide-ranging data inputs including credit bureau data, web behavioral and performance data, bank transaction data and other non-traditional data. All of this works to give us a holistic view of the customer and helps us accurately assign risk to those applications.

Advanced machine learning techniques let us consider these factors in the development of algorithms which better predict behaviour and customer vulnerability. Actually, we recently moved to R because of the range of modeling techniques R is able to support. Using appropriate modeling techniques has allowed us to significantly simplify our underwriting and has led to more accurate predictions of likely loan performance.

A: What prompted the adoption of R?

Before moving to R, we used SAS to develop pretty sophisticated credit risk models. SAS has traditionally been the software of choice for many statisticians and credit risk professionals working in the banking and financial services sector. Although SAS is good for many applications in this sector, we find that it is far less flexible than an open source programming language like R.

To provide an example, a far more complex credit risk strategy (e.g. population segmentation) was required to get our historic linear models to provide the necessary lift to adequately underwrite a population. This is because many consumers in the high-cost short-term credit market have complex and varied credit histories. At Elevate, our goal is to provide our customers with an experience comparable to prime lending. In order to do this, we need to use tools (such as R) that allow us to build more complex models to adequately understand the complex financial histories of our consumers.
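
The segmentation point can be illustrated with a small sketch. It is written in Python for brevity rather than the R that Elevate uses, and everything in it is an assumption made for illustration: the data is synthetic, the segment labels "A" and "B" and the helpers `fit_line` and `sse` are hypothetical, and none of it reflects Elevate's actual models. It shows why a single linear model may need a segmentation strategy wrapped around it: when a predictor pushes the outcome up in one segment and down in another, one pooled line cannot represent both effects, while one line per segment can.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sse(xs, ys, a, b):
    """Sum of squared errors of the line a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Synthetic data (an assumption): the predictor raises the outcome in
# segment "A" and lowers it in segment "B".
random.seed(0)
rows = []
for _ in range(200):
    x = random.random()
    rows.append(("A", x, 0.2 + 0.6 * x))
    rows.append(("B", x, 0.8 - 0.6 * x))

# One pooled linear model over the whole population.
all_x = [x for _, x, _ in rows]
all_y = [y for _, _, y in rows]
a, b = fit_line(all_x, all_y)
pooled_mse = sse(all_x, all_y, a, b) / len(rows)

# Explicit segmentation: one linear model per segment.
segmented_sse = 0.0
for segment in ("A", "B"):
    seg_x = [x for s, x, _ in rows if s == segment]
    seg_y = [y for s, _, y in rows if s == segment]
    seg_a, seg_b = fit_line(seg_x, seg_y)
    segmented_sse += sse(seg_x, seg_y, seg_a, seg_b)
segmented_mse = segmented_sse / len(rows)

print(f"pooled MSE:    {pooled_mse:.4f}")
print(f"segmented MSE: {segmented_mse:.4f}")
```

On this toy data the two opposing effects cancel in the pooled fit, so its error stays well above that of the per-segment fits, which is essentially zero. The cost is that every segment has to be identified and maintained by hand.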

R has a number of packages for powerful machine learning algorithms such as random forests and XGBoost. While SAS does support some of these modeling techniques, we have found it is far quicker to build and implement some of the newer techniques using R. In my experience, R also provides better support for multi-threading, which often helps us to train our models in far shorter periods of time. In addition, the range of algorithms SAS has developed which utilize their high performance technology is limited in comparison to the options I have when considering a modeling challenge using R.

And, of course, you know we deploy our models through your platform. Provenir gives us the capability we need to test and operationalize our advanced analytical models so we can make strategic changes quickly. So, we felt comfortable making big modeling changes from that perspective.

A: Moving away from linear models, what techniques are you currently focusing on?

Linear models have been used extensively in credit risk because they are relatively simple to construct and easy to understand. However, given the limitations of some credit risk models that we discussed and the complexity of our datasets, we now utilize a combination of both linear and nonlinear modeling techniques.

A: Are you interested in throwing your experience into the linear vs. nonlinear discussion?

Sure. In a situation where there is a simple linear relationship between predictors and outcomes, linear models work very well. However, linear models have many limitations because they often struggle with complexity and nonlinear relationships.

A linear model may look like this:

Image source: https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Linear_regression.svg/2000px-Linear_regression.svg.png

The correlation between the predicted and actual outcome of a tree-based model on a complex non-linear dataset may look something like this.

By contrast, a tree-based model does a much better job at approximating the complexity of the dataset.

Tree-based models afford many advantages. For example, tree-based models are quite good at mapping non-linear relationships which simply can’t be modeled by linear regression. Tree-based models are typically highly accurate and very stable, but they can be more difficult to explain.

(To note: It is important to consider that tree-based models contain built-in segmentation when using boosting and bagging techniques.)

With complex data sources where different segments may exhibit very different behaviour (holding everything else constant), a tree-based model is often better at predicting an outcome. Utilizing tree-based models in conjunction with including more characteristics has helped us to significantly improve our customer underwriting.
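
A minimal sketch of that contrast follows, again in Python rather than R and under loudly stated assumptions: the data is a synthetic step function, and `best_threshold`, `build_tree`, and `predict` are a hand-rolled toy regression tree written for this illustration, not the random forest or XGBoost implementations mentioned earlier and not Elevate's models. It fits a straight line and a depth-2 tree to an outcome that differs across three ranges of a single predictor. The tree finds the range boundaries on its own, which is the sense in which segmentation is built in.

```python
def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_threshold(xs, ys):
    """Threshold on x that minimises the squared error of a two-leaf split."""
    pairs = sorted(zip(xs, ys))
    best_thr, best_err = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        err = (sum_sq_dev([y for _, y in pairs[:i]])
               + sum_sq_dev([y for _, y in pairs[i:]]))
        if err < best_err:
            best_thr, best_err = (pairs[i - 1][0] + pairs[i][0]) / 2, err
    return best_thr


def build_tree(xs, ys, depth):
    """Leaf = mean outcome; internal node = (threshold, left, right)."""
    if depth == 0 or len(set(xs)) < 2:
        return mean(ys)
    thr = best_threshold(xs, ys)
    left = [(x, y) for x, y in zip(xs, ys) if x < thr]
    right = [(x, y) for x, y in zip(xs, ys) if x >= thr]
    return (thr,
            build_tree([x for x, _ in left], [y for _, y in left], depth - 1),
            build_tree([x for x, _ in right], [y for _, y in right], depth - 1))


def predict(tree, x):
    while isinstance(tree, tuple):
        thr, left, right = tree
        tree = left if x < thr else right
    return tree


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


# Synthetic outcome (an assumption): three ranges of x with different levels.
xs = [i / 200 for i in range(200)]
ys = [0.1 if i < 60 else 0.7 if i < 120 else 0.2 for i in range(200)]

a, b = fit_line(xs, ys)
linear_mse = mean([(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)])

tree = build_tree(xs, ys, depth=2)
tree_mse = mean([(y - predict(tree, x)) ** 2 for x, y in zip(xs, ys)])

print(f"linear MSE: {linear_mse:.4f}")
print(f"tree MSE:   {tree_mse:.4f}")
print(f"first split at x = {tree[0]:.4f}")
```

The first split lands on the boundary between the first two ranges without anyone specifying it. Boosting and bagging combine many such trees, so the single tree here is only the smallest building block of the techniques discussed above.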

A: Wow. So, you’ve presumably improved the accuracy of your models. How has that impacted the business strategy challenges you mentioned?

Using a combination of both linear and nonlinear modeling techniques gives us the flexibility to significantly simplify our business strategy. For example, with our new machine learning models, we only need to have a handful of strategies in place. We get a simplified strategy and a model that is more adept at explaining different types of people, some of whom we weren’t able to underwrite before.

A: Have you seen an uplift in approval rates since you deployed this new R model in production?

Although it is still too early to tell, initial results indicate that our new model is performing significantly better than the prior model. We’ve seen an increase in our approval rates and, as our recently underwritten vintages continue to develop over time, we continue to dial up performance. Obviously, that has significant implications for our customers. At Elevate, we feel strongly about helping our customers find financial relief and, as we improve our modeling, we improve our ability to serve a population which is underserved by mainstream finance.

A: Changing direction a little bit, I have one last question before you go. You have an impressive history in data sciences and financial services. What are your thoughts on the future of data science in this industry?

Much has changed in analysis and data science in the last 10 years. Statisticians and data scientists have always worked to predict the probability of default, but the techniques that statisticians and data scientists use have evolved significantly.

Ten years ago, for example, nonlinear models were challenging because many organizations didn’t have the computational power or technical skills in place to effectively use these advanced techniques. Fast forward ten years and that has completely changed. This movement toward nonlinear models provides better accuracy while empowering a simplified risk strategy.

That’s where the future begins. Now that the industry has begun to accept more complex modeling techniques, it is in a better position to accept non-conventional data sources.

Currently, most organizations have both summary and tradeline variables from the credit bureaus. Many in the industry are very reliant on summary variables, though there is a trend toward using tradeline variables. That’s where the next big change is: it’s not just around modeling techniques, but around data sources. I believe we will see the need to bring in different and more granular data sources.

As capacity expands, there will be more emphasis placed on non-traditional variables, which is something we already do at Elevate. Organizations will want to be able to analyze things such as an individual’s bank transactions, especially for thin bureau file applications, to allow them to decision an application with varying data sources.

A: John, thank you for taking the time to share your expertise today. Looking forward to speaking again soon.

J: Cheers!

About John

John Bartley is a Data Scientist at Elevate’s offices in London, UK. John has over 10 years’ experience as an analyst and data scientist, predominantly in the banking and financial services sector.
