“Not just GenAI” Leadership Series — Part III
Your team is planning to build an AI/ML model to predict which borrowers will default on their loans.
The historical dataset has one row per borrower, capturing the features (characteristics) of that borrower. These are called the independent variables. The target (dependent) variable is also captured: Defaulter, with values Yes or No.
For the sake of simplicity and ease of explanation, let’s just go with 2 features – Employment Status and Amount.
So, for each borrower, you have the employment status and amount values, and the target variable will indicate whether the borrower has defaulted or not. These are all historical/factual data.
Your team plots all these points on a graph, with one point representing one borrower (obviously, the Employment Status and Amount become the 2 axes of the graph).
Your team’s goal is to draw a line on the chart that separates the defaulters from the non-defaulters in the ‘cleanest’ manner. Cleanest here means that one side of the line should have mostly defaulters, and the other side mostly non-defaulters. Perfectly clean groups on either side would be ideal, but they are highly unlikely, since you are using a straight line.
There are infinitely many lines that can be drawn on this graph, but the goal is to draw the one that gives you the cleanest separation of the 2 classes: Defaulters and Non-Defaulters.
Deciphering Loan Approval: Linear Discriminants in AI/ML Analysis
When a new person applies for a loan, your team collects the applicant’s Employment Status and loan Amount, and plots that point on the graph. Depending on which side of the line the new applicant falls (defaulter side or non-defaulter side), your team can decide whether to give the loan or not.
The job of the AI/ML algorithm is to arrive at that line, which is the model.
Since these models use a straight line to distinguish, or discriminate, between 2 or more classes, they are called Linear Discriminants.
2 important points:
- Again, this is a very simplistic example with just 2 variables. In reality, there will be a large number of variables considered for such modeling.
- Also, practically speaking, your team does not need to plot such graphs (indeed it is impractical when there are several variables). Instead, they will resort to ML code/tools for this analysis. But this explanation should give an idea about what happens under the hood.
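
To give a flavour of what such ML code does under the hood, here is a minimal sketch in Python. It is only an illustration: the records in HISTORY are made up, the function names are hypothetical, and the perceptron update rule used here is just one simple way of finding a separating line, not necessarily the algorithm your team would pick.

```python
# Toy, made-up history: (employed 1/0, loan amount in thousands, defaulted 1/0).
HISTORY = [
    (1, 10, 0), (1, 20, 0), (1, 30, 0), (1, 60, 1), (1, 70, 1),
    (0, 5, 0), (0, 10, 0), (0, 30, 1), (0, 40, 1), (0, 50, 1),
]


def features(employed, amount):
    # Scale the amount so that both features sit on a similar range.
    return (float(employed), amount / 100.0)


def score(weights, bias, x):
    # The line itself: a weighted sum of the features plus a constant.
    # Positive scores fall on the defaulter side, negative on the other side.
    return weights[0] * x[0] + weights[1] * x[1] + bias


def train_perceptron(history, max_epochs=10_000):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for employed, amount, defaulted in history:
            x = features(employed, amount)
            y = 1 if defaulted else -1
            if y * score(weights, bias, x) <= 0:  # on the wrong side, or on the line
                # Nudge the line towards classifying this borrower correctly.
                weights[0] += y * x[0]
                weights[1] += y * x[1]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every historical borrower is on the correct side
            break
    return weights, bias


def predicts_default(weights, bias, employed, amount):
    return score(weights, bias, features(employed, amount)) > 0


weights, bias = train_perceptron(HISTORY)
print(predicts_default(weights, bias, employed=1, amount=15))
print(predicts_default(weights, bias, employed=0, amount=45))
```

On this toy data, an employed applicant asking for 15 (thousand) lands on the non-defaulter side of the learned line, while an unemployed applicant asking for 45 lands on the defaulter side. Real data is rarely this cleanly separable, which is why production tools use more robust methods.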
Linear Discriminants are an example of the Discriminative AI/ML that we referred to in the previous posts. Another type of Discriminative AI/ML algorithm is the Logical Discriminant. Decision Trees and their variations are examples of Logical Discriminants.
Unraveling Decision Trees: Splitting Data for Loan Approval
Decision Trees and their ensembles/variations are among the most widely used algorithms in AI. Let’s use the same business example as above to understand Decision Trees.
The Decision Tree algorithm’s job is to figure out which variable — Employment status or Amount — can be used to split the data into 2 clean groups — Defaulters and non-defaulters.
Important: For the Linear Discriminant model above, you had to consider both the variables Employment Status and Amount, together, for building the model. Whereas in the case of Decision Trees, you would consider these variables one at a time.
Let’s say the data is split into 2 groups based on Employment status. In other words, the variable Employment Status is used to discriminate the data into 2 groups (hence the name Logical Discriminants). Sure, you will very likely not get 2 clean groups. The group of Employed folks, for example, will likely have both defaulters and non-defaulters (hopefully more of the latter).
Now the algorithm will take each of the above 2 groups and further divide them based on the second variable, Amount. The group where Amount is greater than some threshold value will contain both defaulters and non-defaulters, and so will the other group.
Basically, the job of the Decision Tree algorithm is to use each variable to keep breaking the data into Defaulters and non-defaulters, thus forming a tree, a Decision Tree.
When a new person applies for a loan, the applicant is placed into the appropriate group based on the values of Employment Status and Amount. If that group happens to be a predominantly non-defaulter group, great, the person is given the loan.
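
The splitting procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the records in HISTORY are made up, the function names are hypothetical, and "cleanness" is measured with Gini impurity, which is one common choice rather than the only one. The sketch also lets the algorithm pick whichever variable and threshold gives the cleanest split at each step, so the order of the splits may differ from the walkthrough.

```python
# Toy, made-up history: (employed 1/0, loan amount in thousands, defaulted 1/0).
HISTORY = [
    (1, 10, 0), (1, 20, 0), (1, 30, 0), (1, 60, 1), (1, 70, 1),
    (0, 5, 0), (0, 10, 0), (0, 30, 1), (0, 40, 1), (0, 50, 1),
]


def gini(rows):
    # 0.0 means a perfectly clean group; 0.5 means an even mix of both classes.
    if not rows:
        return 0.0
    p = sum(r[2] for r in rows) / len(rows)
    return 2 * p * (1 - p)


def best_split(rows):
    # Try every variable (one at a time) and every threshold,
    # and keep the split that yields the cleanest pair of groups.
    best = None
    for col in (0, 1):  # 0 = Employment Status, 1 = Amount
        for threshold in sorted({r[col] for r in rows}):
            left = [r for r in rows if r[col] <= threshold]
            right = [r for r in rows if r[col] > threshold]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, col, threshold, left, right)
    return best


def build_tree(rows, depth=0, max_depth=3):
    split = None
    if gini(rows) > 0.0 and depth < max_depth:
        split = best_split(rows)
    if split is None:
        # Leaf: label the group by its majority class (True = defaulter).
        return 2 * sum(r[2] for r in rows) > len(rows)
    _, col, threshold, left, right = split
    return (col, threshold,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predicts_default(tree, employed, amount):
    # Walk down the tree until the applicant lands in a leaf group.
    while isinstance(tree, tuple):
        col, threshold, left, right = tree
        tree = left if (employed, amount)[col] <= threshold else right
    return tree


tree = build_tree(HISTORY)
print(predicts_default(tree, employed=1, amount=15))
print(predicts_default(tree, employed=0, amount=45))
```

Each internal node of the resulting tree tests a single variable against a threshold, which is the "one variable at a time" behaviour that distinguishes it from the single line of a Linear Discriminant.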
A Linear Discriminant creates one single divider line/plane to segregate the data into relatively clean classes, using a single numeric formula.
A Logical Discriminant (e.g. Decision Trees) will break the data into logical groups based on the values of the variables, till it gets near-clean groups.
Both of these are examples of Discriminative AI/ML. They focus on the boundaries between the classes. The question that they ask is: what is the probability of getting a particular value of the target variable, given a set of values of the independent variables? That is, they look at Conditional Probability, as discussed earlier.
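
As a small worked illustration of that conditional probability, the sketch below estimates the chance of default given Employment Status by simple counting. The records and the function name are made up for illustration, and real models estimate this quantity in more sophisticated ways.

```python
from fractions import Fraction

# Toy, made-up records: (employment status, defaulter).
HISTORY = [
    ("Employed", "No"), ("Employed", "No"), ("Employed", "No"),
    ("Employed", "Yes"), ("Employed", "Yes"),
    ("Unemployed", "No"), ("Unemployed", "No"),
    ("Unemployed", "Yes"), ("Unemployed", "Yes"), ("Unemployed", "Yes"),
]


def p_default_given(status, history):
    """Estimate P(Defaulter = Yes | Employment Status = status) from counts."""
    matching = [defaulter for s, defaulter in history if s == status]
    if not matching:
        raise ValueError(f"no records with status {status!r}")
    return Fraction(matching.count("Yes"), len(matching))


print(p_default_given("Employed", HISTORY))    # 2/5
print(p_default_given("Unemployed", HISTORY))  # 3/5
```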