AI4Contracts: LLM & RAG-Powered Encoding of Financial Derivative Contracts

Maruf Ahmed Mridul, Ian Sloyan, Aparna Gupta, Oshani Seneviratne

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
AI4Tech: AI Enabling Technologies. Pages 9305-9312. https://doi.org/10.24963/ijcai.2025/1034

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) are reshaping how AI systems extract and organize information from unstructured text. A key challenge is designing AI methods that can incrementally extract, structure, and validate information while preserving hierarchical and contextual relationships. We introduce CDMizer, a template-driven, LLM- and RAG-based framework for structured text transformation. By leveraging depth-based retrieval and hierarchical generation, CDMizer ensures a controlled, modular process that aligns generated outputs with predefined schemas. Its template-driven approach guarantees syntactic correctness, schema adherence, and improved scalability, addressing key limitations of direct generation methods. Additionally, we propose an LLM-powered evaluation framework to assess the completeness and accuracy of structured representations. Demonstrated in the transformation of Over-the-Counter (OTC) financial derivative contracts into the Common Domain Model (CDM), CDMizer establishes a scalable foundation for AI-driven document understanding, structured synthesis, and automated validation in broader contexts.
Keywords:
Domain-specific AI4Tech: AI4Finance
Advanced AI4Tech: Generative and LLMs-driven AI4Tech
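
Illustrative sketch (not the authors' implementation): the abstract's claim that a template-driven approach yields schema adherence by construction can be seen in a toy example. The code below walks a fixed nested template depth-first and asks an extractor to supply only the leaf values, so the output always has the template's shape. Everything named here is an assumption for illustration: the template fields are hypothetical and are not the real CDM schema, `fill_template`, `stub_extract`, and `same_shape` are invented helper names, and a regex lookup stands in for the LLM and retrieval step.

```python
import re
from typing import Any, Callable, Optional, Tuple

# Hypothetical, heavily simplified template. NOT the real CDM schema.
TEMPLATE = {
    "trade": {
        "tradeDate": None,
        "product": {
            "notional": {"amount": None, "currency": None},
        },
    }
}

Path = Tuple[str, ...]
Extractor = Callable[[Path, str], Optional[str]]


def fill_template(node: Any, text: str, extract: Extractor, path: Path = ()) -> Any:
    """Traverse the template depth-first and fill leaves only.

    Keys and nesting are copied from the template, so the generator
    (here `extract`) can never add, drop, or rename a field.
    """
    if isinstance(node, dict):
        return {
            key: fill_template(child, text, extract, path + (key,))
            for key, child in node.items()
        }
    return extract(path, text)


# Stand-in for the LLM + retrieval step: one regex per leaf name (assumption).
PATTERNS = {
    "tradeDate": r"Trade Date:\s*(\d{4}-\d{2}-\d{2})",
    "amount": r"Notional Amount:\s*[A-Z]{3}\s*([\d,]+)",
    "currency": r"Notional Amount:\s*([A-Z]{3})",
}


def stub_extract(path: Path, text: str) -> Optional[str]:
    """Return the value for the leaf at `path`, or None if it is not found."""
    pattern = PATTERNS.get(path[-1])
    match = re.search(pattern, text) if pattern else None
    return match.group(1) if match else None


def same_shape(a: Any, b: Any) -> bool:
    """Check that two nested structures have identical keys at every level."""
    if isinstance(a, dict) != isinstance(b, dict):
        return False
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(same_shape(a[k], b[k]) for k in a)
    return True


CONTRACT = "Trade Date: 2024-03-15\nNotional Amount: USD 10,000,000\n"
filled = fill_template(TEMPLATE, CONTRACT, stub_extract)
```

In this sketch, `same_shape(TEMPLATE, filled)` holds for any extractor, which is the sense in which a fixed template constrains the structure of the output; the correctness of the leaf values still depends entirely on the extractor.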