Case Study

Payment Recovery ML (End-to-End System)

Predicting which failed / unpaid card transactions are likely to be recovered, and prioritising collections outreach by expected recovered revenue.

Business Context

The business had a large volume of failed card payments and unpaid invoices handled manually by the credit control team. Every day, new failed transactions came in, and the team had to decide who to call or email first with limited time.

The goal was to use machine learning to estimate the probability of recovery for each failed transaction, and then rank them by expected recovered amount (amount × probability). This allows the team to focus effort where it generates the most value.
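As a minimal sketch of the ranking rule above (expected recovered amount = amount × probability), the snippet below sorts a few made-up transactions. The `FailedTransaction` record, its field names, and the example values are assumptions for illustration, not the project's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class FailedTransaction:
    # Hypothetical record shape; field names are illustrative only.
    transaction_id: str
    amount: float        # outstanding amount
    p_recovery: float    # model-estimated probability of recovery, in [0, 1]

    @property
    def expected_recovery(self) -> float:
        # Expected recovered amount = amount x probability
        return self.amount * self.p_recovery


def rank_by_expected_recovery(transactions):
    """Return transactions sorted by expected recovered amount, highest first."""
    return sorted(transactions, key=lambda t: t.expected_recovery, reverse=True)


# Made-up example values, not real data.
transactions = [
    FailedTransaction("T1", amount=1000.0, p_recovery=0.10),
    FailedTransaction("T2", amount=200.0, p_recovery=0.90),
    FailedTransaction("T3", amount=50.0, p_recovery=0.95),
]
ranked = rank_by_expected_recovery(transactions)
for t in ranked:
    print(t.transaction_id, round(t.expected_recovery, 2))
```

Here T2 ranks above T1 even though T1 is the larger amount, because 200 × 0.90 = 180 exceeds 1000 × 0.10 = 100.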

Data & Feature Engineering

I combined transactional and customer data to build features around:

  • Payment amount, currency, provider, and card type
  • Historical behaviour: past successful / failed payments per customer
  • Invoice age (time since the invoice was issued) and days outstanding
  • Aggregated risk signals (e.g. previous write-offs, chargebacks, disputes)

The features were engineered in SQL, then exported to Python for modelling. Care was taken to avoid target leakage by using only information available at the time of the failure.
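The leakage rule can be illustrated with a point-in-time feature: when counting a customer's past payments, only events strictly before the failure being scored are included. This is a sketch in plain Python rather than the SQL used in the project; the `prior_payment_counts` helper, the event dictionary keys, and the sample history are all hypothetical.

```python
from datetime import datetime


def prior_payment_counts(history, customer_id, failure_time):
    """Count a customer's succeeded and failed payments strictly before failure_time."""
    succeeded = failed = 0
    for event in history:
        if event["customer_id"] != customer_id:
            continue
        if event["timestamp"] >= failure_time:
            # At or after the failure being scored: not known at scoring time.
            continue
        if event["status"] == "succeeded":
            succeeded += 1
        elif event["status"] == "failed":
            failed += 1
    return {"prior_succeeded": succeeded, "prior_failed": failed}


# Made-up payment history, not real data.
history = [
    {"customer_id": "C1", "timestamp": datetime(2024, 1, 5), "status": "succeeded"},
    {"customer_id": "C1", "timestamp": datetime(2024, 2, 5), "status": "failed"},
    # This payment happens after the failure scored below, so it must be ignored.
    {"customer_id": "C1", "timestamp": datetime(2024, 3, 9), "status": "succeeded"},
    {"customer_id": "C2", "timestamp": datetime(2024, 1, 20), "status": "failed"},
]
features = prior_payment_counts(history, "C1", datetime(2024, 3, 5))
print(features)
```

Including the later March payment would leak the outcome into the features, since a successful payment after the failure is closely related to the recovery being predicted.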

Modelling Approach

I chose a Logistic Regression classifier as a strong, interpretable baseline and prioritised well-calibrated probabilities over a black-box model with marginally higher raw accuracy.

  • Train / validation split with time-aware folds
  • Evaluation focused on PR AUC due to class imbalance
  • Brier Score and calibration curves to validate probability estimates
  • Lift analysis by decile to understand business impact (how recoveries concentrate in the top X% of scored accounts)
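Two of the evaluation steps above can be sketched in plain Python. The fold scheme shown is one common reading of "time-aware" (an expanding window over rows sorted oldest first); it is an assumption, not necessarily the exact scheme used here. Both function names and the toy labels and scores are hypothetical.

```python
def expanding_window_folds(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for rows sorted oldest first.

    Each fold validates on the next block of rows and trains on all rows before it,
    so the model is never validated on data older than its training data.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = block * k
        # The last fold absorbs any remainder rows.
        val_end = n_samples if k == n_folds else block * (k + 1)
        yield list(range(train_end)), list(range(train_end, val_end))


def top_fraction_lift(y_true, scores, fraction=0.1):
    """Recovery rate among the top-scored fraction divided by the overall recovery rate."""
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no recovered cases")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = max(1, int(len(scores) * fraction))
    top_rate = sum(y_true[i] for i in order[:k]) / k
    return top_rate / base_rate


folds = list(expanding_window_folds(n_samples=10, n_folds=4))

# Toy labels (1 = recovered) and model scores, not real results.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.8, 0.1, 0.3, 0.4, 0.05, 0.15, 0.25, 0.35]
lift = top_fraction_lift(y_true, scores, fraction=0.2)
print(folds[0], round(lift, 2))
```

A lift above 1 for the top fraction means that working the highest-scored accounts first finds recoveries at a higher rate than working accounts in arbitrary order.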

The final model produced well-calibrated probabilities that can be used directly to compute expected recovered revenue per transaction.
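A calibration check of this kind can be sketched as follows: the Brier score is the mean squared difference between predicted probability and outcome, and a binned table compares mean predicted probability with the observed recovery rate per bin. The function names, bin count, and toy inputs are assumptions for illustration; scikit-learn provides `brier_score_loss` and `calibration_curve` for the same purpose.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def calibration_table(y_true, p_pred, n_bins=5):
    """Return (mean_predicted, observed_rate, count) for each non-empty probability bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for _, p in members) / len(members)
        observed_rate = sum(y for y, _ in members) / len(members)
        rows.append((mean_predicted, observed_rate, len(members)))
    return rows


# Toy labels (1 = recovered) and predicted probabilities, not real results.
y_true = [0, 0, 1, 1]
p_pred = [0.1, 0.3, 0.7, 0.9]
score = brier_score(y_true, p_pred)
table = calibration_table(y_true, p_pred, n_bins=2)
print(round(score, 4), table)
```

For a well-calibrated model, the mean predicted probability and the observed rate in each bin are close, which is what justifies multiplying the probability by the amount to get an expected recovered value.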

Business Impact & Usage

In the Streamlit app, each failed transaction appears with:

  • Probability of recovery (0–100%)
  • Amount and expected recovered revenue (amount × probability)
  • Ranked priority within the daily worklist

This allows the credit control team to:

  • Focus on accounts with both high probability and high amount
  • Quickly see “easy wins” vs. low-probability / low-value cases
  • Better justify prioritisation decisions with data, not intuition alone
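One way to make these groupings explicit is a simple two-threshold segmentation on probability and amount. This is only a sketch: the `segment` function, the labels other than "easy win", the reading of "easy win" as a high-probability but lower-value case, and both threshold values are assumptions rather than the app's actual rules.

```python
def segment(amount, p_recovery, min_probability=0.6, min_amount=500.0):
    """Assign a worklist segment from amount and recovery probability.

    Thresholds are hypothetical defaults; in practice they would be agreed
    with the credit control team.
    """
    high_probability = p_recovery >= min_probability
    high_amount = amount >= min_amount
    if high_probability and high_amount:
        return "focus"          # high probability and high amount
    if high_probability:
        return "easy win"       # likely to recover, smaller amount
    if high_amount:
        return "long shot"      # large amount, unlikely to recover
    return "low priority"       # low probability and low value


# Made-up example cases, not real data.
cases = [(1200.0, 0.85), (150.0, 0.90), (2000.0, 0.15), (40.0, 0.10)]
labels = [segment(amount, p) for amount, p in cases]
print(labels)
```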