Source Attribution for RAG

October 2025

Final project for the Trustworthy Deep Learning class (CPSC 5710). We explored approaches to source attribution for retrieval-augmented generation (RAG) systems, i.e., identifying the informational sources an LLM relied on most when generating an answer. We implemented and evaluated six source attribution methods, comparing Shapley-based approaches (Leave-One-Out, Monte Carlo, Permutation) with white-box methods (gradient, integrated gradients, attention-based), using Llama-3.2-1B.
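To make the Shapley-based side concrete, here is a minimal sketch of Leave-One-Out and permutation-sampling (Monte Carlo) Shapley attribution over a set of retrieved sources. It is a generic textbook version, not the project's code. The utility `v(S)` is an assumption: in a RAG setting it might be the log-probability of the generated answer when the model sees only the sources in `S`. `toy_utility` is a hypothetical stand-in in which two sources are redundant.

```python
import itertools
import random
from typing import Callable, FrozenSet, List, Set

# Utility of a subset of source indices. In a RAG pipeline this could be the
# log-probability of the answer given only those sources (an assumption here).
Utility = Callable[[FrozenSet[int]], float]


def leave_one_out(n: int, v: Utility) -> List[float]:
    """Credit for source i = utility drop when only source i is removed."""
    full = frozenset(range(n))
    base = v(full)
    return [base - v(full - {i}) for i in range(n)]


def _accumulate(order: List[int], v: Utility, scores: List[float]) -> None:
    """Add each source's marginal contribution along one ordering."""
    seen: Set[int] = set()
    prev = v(frozenset())
    for i in order:
        seen.add(i)
        cur = v(frozenset(seen))
        scores[i] += cur - prev
        prev = cur


def exact_shapley(n: int, v: Utility) -> List[float]:
    """Average marginal contribution over all n! orderings (small n only)."""
    scores = [0.0] * n
    perms = list(itertools.permutations(range(n)))
    for perm in perms:
        _accumulate(list(perm), v, scores)
    return [s / len(perms) for s in scores]


def monte_carlo_shapley(n: int, v: Utility, num_samples: int = 200,
                        seed: int = 0) -> List[float]:
    """Estimate Shapley values by averaging over randomly sampled orderings."""
    rng = random.Random(seed)
    scores = [0.0] * n
    order = list(range(n))
    for _ in range(num_samples):
        rng.shuffle(order)
        _accumulate(order, v, scores)
    return [s / num_samples for s in scores]


def toy_utility(subset: FrozenSet[int]) -> float:
    """Hypothetical utility: sources 0 and 1 are redundant copies of the key
    fact, source 2 is irrelevant, source 3 adds a small independent boost."""
    value = 0.0
    if 0 in subset or 1 in subset:
        value += 1.0
    if 3 in subset:
        value += 0.2
    return value


if __name__ == "__main__":
    print("LOO:        ", leave_one_out(4, toy_utility))
    print("Shapley:    ", exact_shapley(4, toy_utility))
    print("Monte Carlo:", monte_carlo_shapley(4, toy_utility))
```

With redundant sources, Leave-One-Out gives both of them zero credit, because removing either one alone changes nothing, while Shapley values split the credit between them. The price is cost: Leave-One-Out needs n + 1 utility evaluations in total, whereas every sampled permutation needs another n + 1.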

I worked on the white-box attribution methods, which use model internals such as gradients and attention patterns to compute attributions in a single pass. The full writeup is available here.
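The gradient-based white-box methods can be illustrated with gradient × input and integrated gradients on a toy differentiable scorer. This is a hedged sketch, not the project's implementation. `score` is a hypothetical logistic function standing in for the LLM's answer probability. Its gradient is written out by hand, whereas a real implementation would use autograd over the input embeddings. Token-level scores are summed per source through an assumed `source_of_token` mapping. Attention-based attribution is not sketched because it needs the model's actual attention weights.

```python
import math
from typing import List, Optional, Sequence


def score(x: Sequence[float], w: Sequence[float]) -> float:
    """Hypothetical stand-in for the model's answer probability: sigmoid(w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def grad(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Analytic gradient of score w.r.t. x; a real model would use autograd."""
    s = score(x, w)
    return [s * (1.0 - s) * wi for wi in w]


def gradient_x_input(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """One forward/backward pass: gradient at x multiplied by the input."""
    return [g * xi for g, xi in zip(grad(x, w), x)]


def integrated_gradients(x: Sequence[float], w: Sequence[float],
                         baseline: Optional[Sequence[float]] = None,
                         steps: int = 64) -> List[float]:
    """Midpoint Riemann approximation of integrated gradients from baseline to x."""
    if baseline is None:
        baseline = [0.0] * len(x)
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for b, xi in zip(baseline, x)]
        for j, g in enumerate(grad(point, w)):
            total[j] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def per_source(token_scores: Sequence[float], source_of_token: Sequence[int],
               num_sources: int) -> List[float]:
    """Sum token-level attributions into one score per retrieved source."""
    out = [0.0] * num_sources
    for s, src in zip(token_scores, source_of_token):
        out[src] += s
    return out


if __name__ == "__main__":
    # Hypothetical example: tokens 0-1 come from source 0, tokens 2-3 from source 1.
    x = [1.0, 0.5, 2.0, -1.0]
    w = [1.5, 0.3, -0.2, 0.5]
    sources = [0, 0, 1, 1]
    print("grad x input:", per_source(gradient_x_input(x, w), sources, 2))
    print("IG:          ", per_source(integrated_gradients(x, w), sources, 2))
```

Integrated gradients satisfies completeness: the per-token (and hence per-source) scores sum to score(x) - score(baseline), up to the Riemann approximation error. This gives a cheap sanity check for any implementation.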