“Not just GenAI” Leadership Series — Part III

Your team is planning to build an AI/ML model to predict which borrowers will default on their loans.

The historical dataset has one row per borrower, capturing that borrower’s features (characteristics). These are called the independent variables. The target (dependent) variable is also captured: Defaulter, with values Yes or No.

For the sake of simplicity and ease of explanation, let’s just go with 2 features – Employment Status and Amount.

So, for each borrower, you have the employment status and amount values, and the target variable will indicate whether the borrower has defaulted or not. These are all historical/factual data.

Your team plots all these points on a graph, with one point representing one borrower (obviously, the Employment Status and Amount become the 2 axes of the graph).

Your team’s goal is to draw a line on the chart that separates the defaulters from the non-defaulters in the ‘cleanest’ manner. Cleanest here means that one side of the line should have mostly defaulters, and the other side mostly non-defaulters. Perfectly clean groups on either side would be ideal, but that is highly unlikely with a straight line.

There are infinitely many lines that can be drawn on this graph, but the goal is to find the one that gives you the cleanest separation of the 2 classes: Defaulters and Non-Defaulters.

Deciphering Loan Approval: Linear Discriminants in AI/ML Analysis

When a new person applies for a loan, your team collects their Employment Status and loan Amount, and plots that point on the graph. Depending on which side of the line the new applicant falls (the defaulter side or the non-defaulter side), your team can decide whether to grant the loan.

The job of any AI/ML algorithm would be to arrive at that line/model. 

Since these models use a straight line (a linear boundary) to distinguish or discriminate between 2 or more classes, they are called Linear Discriminants.
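To make "arriving at that line" a little more concrete, here is a minimal sketch in plain Python. It uses a perceptron, one simple textbook way to learn a separating line; it is not necessarily what your team's ML tools would use. The borrower records, the 0/1 encoding of Employment Status, the Amount units, and all function names are made up for illustration.

```python
# Made-up history: (employed 1/0, amount in units of 10,000) -> defaulted (1) or not (0).
borrowers = [
    ((1, 1.0), 0), ((1, 2.0), 0), ((1, 3.0), 0), ((1, 4.0), 0),
    ((0, 1.0), 0), ((0, 2.0), 0),
    ((0, 3.0), 1), ((0, 4.0), 1), ((1, 5.0), 1), ((1, 6.0), 1),
]

def score(weights, bias, features):
    # The "single numeric formula": w1 * employed + w2 * amount + bias.
    return sum(w * x for w, x in zip(weights, features)) + bias

def predict(weights, bias, features):
    # Positive side of the line -> defaulter (1); otherwise non-defaulter (0).
    return 1 if score(weights, bias, features) > 0 else 0

def train_perceptron(data, max_epochs=5000):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)
            if error != 0:
                # Nudge the line toward classifying this borrower correctly.
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:
            break  # every historical borrower is on the correct side
    return weights, bias

weights, bias = train_perceptron(borrowers)

# Two hypothetical new applicants, placed on one side of the learned line.
print(predict(weights, bias, (1, 1.5)))  # employed, small loan
print(predict(weights, bias, (0, 3.5)))  # not employed, larger loan
```

This toy data was chosen so that a straight line can separate the two classes perfectly. Real loan data rarely allows that, which is why practical tools settle for the cleanest available line rather than a perfect one.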

2 important points:

  1. Again, this is a very simplistic example with just 2 variables. In reality, there will be a large number of variables considered for such modeling. 
  2. Also, practically speaking, your team does not need to plot such graphs (indeed it is impractical when there are several variables). Instead, they will resort to ML code/tools for this analysis. But this explanation should give an idea about what happens under the hood.

Linear Discriminants are an example of the Discriminative AI/ML that we referred to in the previous posts. Another type of Discriminative AI/ML algorithm is the Logical Discriminant. Decision Trees and their variations are examples of Logical Discriminants.

Unraveling Decision Trees: Splitting Data for Loan Approval

Decision Trees and their ensembles/variations are among the most widely used algorithms in AI. Let’s use the same business example as above to understand Decision Trees.

The Decision Tree algorithm’s job is to figure out which variable — Employment status or Amount — can be used to split the data into 2 clean groups — Defaulters and non-defaulters. 
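One way to picture how the algorithm might "figure out" which variable to split on is to score every candidate split by how mixed the resulting groups are, and keep the least mixed one. The sketch below assumes Gini impurity as the measure of cleanliness; the article does not say which measure is used, and the data, field names, and function names are invented for illustration.

```python
# Made-up history: (employed 1/0, amount in thousands, defaulted 1/0).
raw = [
    (1, 10, 0), (1, 20, 0), (1, 30, 0), (1, 40, 0), (1, 50, 1),
    (0, 10, 0), (0, 20, 1), (0, 30, 1), (0, 40, 1), (0, 50, 1),
]
rows = [{"employed": e, "amount": a, "defaulted": d} for e, a, d in raw]

def gini(labels):
    # 0.0 means a perfectly clean group; 0.5 means an even mix of both classes.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_impurity(rows, feature, threshold):
    # Size-weighted average impurity of the two groups produced by the split.
    low = [r["defaulted"] for r in rows if r[feature] <= threshold]
    high = [r["defaulted"] for r in rows if r[feature] > threshold]
    return (len(low) * gini(low) + len(high) * gini(high)) / len(rows)

def candidate_thresholds(rows, feature):
    # Midpoints between consecutive distinct values of the feature.
    values = sorted({r[feature] for r in rows})
    return [(a + b) / 2 for a, b in zip(values, values[1:])]

def best_split(rows, features):
    candidates = [
        (split_impurity(rows, f, t), f, t)
        for f in features
        for t in candidate_thresholds(rows, f)
    ]
    return min(candidates)  # lowest impurity wins

impurity, feature, threshold = best_split(rows, ["employed", "amount"])
print(feature, threshold)  # the variable chosen for the first split
```

On this toy data, splitting on Employment Status gives cleaner groups than any single cut on Amount, so it is chosen first. Note that each candidate split looks at one variable at a time.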

Important: For the Linear Discriminant model above, you had to consider both the variables Employment Status and Amount, together, for building the model. Whereas in the case of Decision Trees, you would consider these variables one at a time. 

Let’s say the data is split into 2 groups based on Employment status. In other words, the variable Employment Status is used to discriminate the data into 2 groups (hence the name Logical Discriminants). Sure, you will very likely not get 2 clean groups. The group of Employed folks, for example, will likely have both defaulters and non-defaulters (hopefully more of the latter). 

Now the algorithm will take each of the above 2 groups and further divide them based on the second variable, Amount. The group where Amount is greater than some value will contain both Defaulters and non-defaulters, and so will the other group.

Basically, the job of the Decision Tree algorithm is to use each variable to keep breaking the data into Defaulters and non-defaulters, thus forming a tree, a Decision Tree.

When a new person applies for a loan, they are placed into the appropriate group based on their values of Employment Status and Amount. If that group happens to be the non-defaulter group, great, the person is given the loan.
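Placing a new applicant into a group amounts to walking down the tree, one question at a time. The sketch below hand-writes a small tree rather than learning it from data; the thresholds, the 0/1 encoding of Employment Status, and the names `tree` and `classify` are hypothetical and only meant to show the mechanics.

```python
# A hand-written two-level tree: split on Employment Status first, then on Amount.
# Amount is in thousands; every threshold here is an illustrative assumption.
tree = {
    "feature": "employed", "threshold": 0.5,
    "low": {  # not employed
        "feature": "amount", "threshold": 15,
        "low": {"decision": "non-defaulter"},
        "high": {"decision": "defaulter"},
    },
    "high": {  # employed
        "feature": "amount", "threshold": 45,
        "low": {"decision": "non-defaulter"},
        "high": {"decision": "defaulter"},
    },
}

def classify(node, applicant):
    # Follow one branch per question until a final group (leaf) is reached.
    while "decision" not in node:
        value = applicant[node["feature"]]
        node = node["low"] if value <= node["threshold"] else node["high"]
    return node["decision"]

print(classify(tree, {"employed": 1, "amount": 30}))
print(classify(tree, {"employed": 0, "amount": 30}))
```

The same Amount leads to different groups depending on Employment Status, because each branch of the tree applies its own cut-off.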

Summary

A Linear Discriminant creates one single divider line/plane to segregate the data into relatively clean classes, using a single numeric formula.

A Logical Discriminant (e.g. Decision Trees) will break the data into logical groups based on the values of the variables, till it gets near-clean groups. 

Both of these are examples of Discriminative AI/ML. They focus on the boundaries between the classes. The question they ask is: what is the probability of getting a particular value of the target variable, given a set of values of the independent variables? That is, they look at Conditional Probability, as discussed earlier.
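As a rough illustration of that conditional-probability question, the sketch below estimates the probability of Defaulter = Yes given Employment Status by simple counting. The records and the function name are made up, and real models estimate this in more sophisticated ways.

```python
# Made-up history: (employment status, defaulter Yes/No).
records = [
    ("Employed", "No"), ("Employed", "No"), ("Employed", "No"),
    ("Employed", "No"), ("Employed", "Yes"),
    ("Unemployed", "No"), ("Unemployed", "Yes"),
    ("Unemployed", "Yes"), ("Unemployed", "Yes"),
]

def p_default_given(status, records):
    # Share of defaulters among borrowers who have the given employment status.
    outcomes = [defaulter for s, defaulter in records if s == status]
    return outcomes.count("Yes") / len(outcomes)

print(p_default_given("Employed", records))    # 1 defaulter out of 5
print(p_default_given("Unemployed", records))  # 3 defaulters out of 4
```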

Jayaprakash Nair
