Understanding Machine Learning

5th Jun 2020, By Navarun Jain

Shining some Light on the Black Box that is ML

You’ve probably heard that Machine Learning (ML from here) can be highly predictive and sophisticated. Unlike traditional stochastic models, ML algorithms rely on few explicit assumptions about the data and can learn adaptively. They can also capture hidden patterns and trends in data that such models tend to miss. This is what gives them the edge. The Lux Profitability Analysis Tool is one such example, promising high predictive power and tailor-made sophistication.

But let’s leave prediction aside for a minute and address the elephant in the room – ML is a black box, and this can open up a big communication gap. We know that ML algorithms can learn adaptively from data and are extremely robust, but is that enough for them to really be useful?

Let’s consider a banking example. Say we’re trying to predict customer profitability and have built a model that performs extremely well, so we can be reasonably confident that it has captured the hidden nuances and patterns in your data. Now I know what you’re thinking – what exactly are these hidden patterns and how can we see them? How can we use the sophistication provided by ML to generate information that can help the business?

This is where Lux can help.

Let’s stick to the profitability example. The Lux Profitability Analysis Tool not only creates robust models, but also helps you leverage them to deliver intelligent and valuable insights into your data. Here’s how this works:

1. Variable Importance

This one is fairly self-explanatory. We look at which features in the data have the most influence on the response or outcome, which in this case is profitability.

Chart 1: Variable Importance

The chart shows that CLIENT_TERM (which is the amount of time someone has been a customer), Contractual_Interest_Rate (of a given loan), TERM (of the loan), Account Status (which tells us whether a given loan account is open, in default or in recovery) and LOAN_AMT_BAND are the top 5 key drivers of profitability for our bank here.
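The article does not say how the importance scores behind Chart 1 are computed, so the following is only a sketch of one common approach, permutation importance: shuffle one feature at a time and measure how much the model’s error grows. The model, the data and the function names here are hypothetical stand-ins, not part of the Lux tool.

```python
import random

def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in mean squared error when each feature column is shuffled."""
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)

    baseline_error = mse(rows)
    scores = []
    for j in range(n_features):
        # Shuffle column j only, leaving every other column untouched.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mse(shuffled) - baseline_error)
    return scores

# Hypothetical toy model: depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
def toy_predict(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [toy_predict(r) for r in rows]

importances = permutation_importance(toy_predict, rows, targets, n_features=3)
ranking = sorted(range(3), key=lambda j: importances[j], reverse=True)
```

Sorting the features by these scores gives a ranking of the kind a variable importance chart displays. A feature the model ignores scores zero because shuffling it leaves the predictions unchanged.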

2. Interaction Effect Analysis

We mentioned earlier that ML algorithms also model hidden relationships in data. But not all of these relationships are actually useful. So which ones are? Using Interaction Effect Analysis, we simulate our model on all possible interactions between the top 10 variables (as given by the Variable Importance Analysis above).

The way this works is very simple – if we have 3 variables (say X, Y and Z), this yields 3 possible two-way interactions between them (XY, YZ and XZ). Our framework takes each of these interactions and automatically simulates the model on these to tell us which ones are more important.
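The counting in the X, Y, Z example can be sketched in a few lines. This only enumerates the candidate pairs; how each pair is then scored is not described in the article, so that step is left out. The function name is illustrative.

```python
from itertools import combinations
from math import comb

def two_way_interactions(variables):
    """Return every unordered pair of variables."""
    return list(combinations(variables, 2))

# The three-variable example from the text.
pairs = two_way_interactions(["X", "Y", "Z"])

# Number of two-way interactions among the top 10 variables.
n_pairs_top10 = comb(10, 2)
```

For three variables this yields the pairs XY, XZ and YZ. If only two-way interactions are considered, the top 10 variables give 45 candidate pairs to evaluate.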

Chart 2: Top Interaction Effects

Looking at the chart above, we can see that the relationships between CLIENT_TERM & Contractual_Interest_Rate and CLIENT_TERM & LOAN_AMT_BAND are key to predicting profitability.

Traditional statistical or actuarial models are bound by very specific assumptions and cannot operate beyond the realm of these assumptions. However, with ML, there is only one assumption – there are no assumptions about the form of the relationships in the data. It’s precisely because of this that we are able to leverage the power these algorithms have to detect relationships like the ones described above.

What we have described so far is interpretation at a macro level – this is Global Interpretation. We can also explore how our model is thinking at a micro (i.e., record) level – this is Local Interpretation.

3. Local Interpretation

Let’s pick a random record from our data. We have 17 predictors in this dataset. The goal of Local Interpretation is to find out – for that particular record – what drives our predictions?

We illustrate this here for a case where our predicted profitability is not too good (i.e. our model predicts high loss-making behavior for this client).

Chart 3: Local Interpretation

The premise behind this is simple – we make all the variables in this record play a game in which the overall payout is the predicted profitability. We then measure the contribution that each player in this game makes to the payout. As the above chart shows, all variables except the number of loans the client has taken out have a negative impact on profitability. CLIENT_TERM and TERM (of the loan, in years) have the worst impacts on profitability, with Contractual_Interest_Rate and LOAN_AMT_BAND also having high negative impacts. We can infer from this chart that the client in question is new and has taken out a high-value, long-term loan at a relatively high interest rate, which does not seem to be very good for business. The only thing that works in this client’s favor is that she has only one loan with the bank.
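The game described above reads like Shapley-value attribution from cooperative game theory, although the article does not name the method. Assuming that is the idea, here is a minimal sketch that computes exact Shapley values for a tiny hypothetical model. "Absent" players are represented by replacing their value with a baseline, which is one of several possible conventions. The exact computation is only feasible for a handful of variables; practical tools approximate it.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, record, baseline):
    """Exact Shapley contribution of each feature to predict(record)."""
    n = len(record)

    def value(coalition):
        # Features in the coalition take the record's value; the rest
        # fall back to the baseline (an assumed convention).
        mixed = [record[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions

# Hypothetical toy model with one interaction term.
def toy_predict(x):
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[2]

record = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_predict, record, baseline)
```

The contributions always add up to the difference between the prediction for the record and the prediction for the baseline, which is what allows a chart of this kind to split a single prediction into per-variable effects.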

These conclusions, however, only apply to the one client we are studying here. If we were to repeat these analyses for, say, the 50 best and 50 worst cases, patterns could start to emerge. We could potentially see what is common among these best and worst cases and make generalizations (to some extent) about which risk groups are good or bad for business. The Lux Profitability Analysis Tool is capable of performing such analyses. Combined with global trend analysis, Lux can give you a very clear and accurate picture of your business through your data.
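As a rough illustration of how repeating the local analysis over the best and worst cases might surface patterns, the sketch below averages per-variable contributions within each group. The record structure, the feature names "a" and "b" and the numbers are made up for the example; they are not output from the Lux tool.

```python
from collections import defaultdict

def mean_contributions(cases):
    """Average each feature's contribution over a list of contribution dicts."""
    totals = defaultdict(float)
    for contributions in cases:
        for feature, value in contributions.items():
            totals[feature] += value
    return {feature: total / len(cases) for feature, total in totals.items()}

def best_and_worst(records, k):
    """records: list of (predicted_profitability, contributions) tuples."""
    ranked = sorted(records, key=lambda r: r[0])
    worst = [contributions for _, contributions in ranked[:k]]
    best = [contributions for _, contributions in ranked[-k:]]
    return mean_contributions(best), mean_contributions(worst)

# Hypothetical toy records: (prediction, per-feature contributions).
records = [
    (-5.0, {"a": -4.0, "b": -1.0}),
    (-3.0, {"a": -2.0, "b": -1.0}),
    (4.0, {"a": 3.0, "b": 1.0}),
    (6.0, {"a": 5.0, "b": 1.0}),
]
best_avg, worst_avg = best_and_worst(records, k=2)
```

Comparing the two averaged profiles shows which variables consistently push predictions up in the best cases and down in the worst ones.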