ID: 2427030730

LegalBERT-Based Multi-Label IPC Section Recommendation Engine

"To design and evaluate a LegalBERT-based multi-label classification system for automatically recommending relevant IPC sections from unstructured crime descriptions."

Problem Statement

Determining the correct Indian Penal Code (IPC) sections from an unstructured crime description is a complex, manual process that requires legal expertise. Without an automated, context-aware system, this step is prone to delays, inconsistencies, and misclassification. This project addresses the problem by developing a legal text classification system that recommends IPC sections accurately and systematically.

Literature Review / Market Research

LEGAL-BERT: The Muppets straight out of Law School (Chalkidis et al., 2020) – Introduced the domain-specific LegalBERT, demonstrating improved performance over generic BERT on legal tasks.

EURLEX57K Multi-Label Legal Classification Benchmark – Established large-scale standards for multi-label legal classification using transformer models.

InLegalBERT (LAW-AI) – Domain-adapted transformer model trained on Indian legal corpora, improving statute prediction in the Indian context.

Research Gap / Innovation

Unlike most existing legal NLP studies that evaluate models on a single dataset, this project introduces a cross-dataset generalization framework by training on the large-scale ILSI dataset (66,074 samples, 98 IPC sections) and evaluating on the unseen NyayaAnumana dataset.

The approach focuses specifically on the Indian legal domain, covering 98 IPC sections and using real unstructured crime descriptions rather than curated text segments.

Through detailed metric analysis (Precision, Recall, Micro-F1), the work prioritizes high-precision legal recommendation and identifies recall as the main area for future optimization.

System Methodology

Dataset / Input

• Training Data: ILSI/LSI – 42,835 cases (66,074 samples)
• Labels: 98 IPC sections (multi-label classification)
• Unseen Evaluation: IndianBailJudgments (~1,200 cases)
• Processing Pipeline: Text cleaning → LegalBERT tokenization → Multi-label encoding → Train/Validation split
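
The processing pipeline above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the cleaning rules, the helper names (clean_text, build_label_index, multi_hot, train_val_split), the 80/20 split, and the toy records are all hypothetical. The tokenization step is left out so the sketch runs with the standard library alone; in practice it would likely be handled by a tokenizer loaded for the chosen LegalBERT checkpoint.

```python
import random
import re


def clean_text(text):
    """Lowercase, drop characters outside a small whitelist, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s.,]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_label_index(label_sets):
    """Assign each distinct section label a fixed column index."""
    labels = sorted({label for labels in label_sets for label in labels})
    return {label: i for i, label in enumerate(labels)}


def multi_hot(labels, label_index):
    """Encode a list of section labels as a 0/1 vector (multi-label target)."""
    vector = [0] * len(label_index)
    for label in labels:
        vector[label_index[label]] = 1
    return vector


def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle deterministically and hold out a validation fraction."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = max(1, int(len(rows) * val_fraction))
    return rows[n_val:], rows[:n_val]


# Hypothetical toy records: (crime description, applicable IPC sections).
records = [
    ("The accused   stole a motorcycle!!", ["IPC 379"]),
    ("Victim was assaulted and robbed at night.", ["IPC 323", "IPC 392"]),
    ("Complainant was cheated of Rs. 50,000.", ["IPC 420"]),
    ("The deceased was stabbed by the accused.", ["IPC 302"]),
    ("A scuffle caused minor injuries.", ["IPC 323"]),
]

label_index = build_label_index(labels for _, labels in records)

# Tokenization would sit between cleaning and encoding; it is omitted here.
encoded = [
    (clean_text(text), multi_hot(labels, label_index))
    for text, labels in records
]

train, val = train_val_split(encoded)
print(len(train), "training rows,", len(val), "validation rows")
```

In the real setting the label index would have 98 columns, one per IPC section, and each target vector may contain several ones because a single description can fall under multiple sections.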

Model / Architecture

• LegalBERT fine-tuned for 98-label IPC multi-label classification.
• Crime descriptions encoded using transformer-based contextual embeddings.
• Sigmoid activation layer enables independent multi-label prediction.
• Evaluated using Micro-F1, Precision and Recall with cross-dataset testing.
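
To illustrate how a sigmoid output layer yields independent per-label decisions and how micro-averaged metrics are computed from them, here is a small sketch. It is not the project's implementation: the function names, the 0.5 default threshold, and the three-label toy logits are assumptions chosen for readability (the actual model has 98 outputs).

```python
import math


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Decide each label independently: 1 if its probability clears the threshold."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_scores(y_true, y_pred):
    """Micro-averaged precision, recall and F1: pool counts over all labels and samples."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical logits for three samples over three labels, with gold labels.
logits = [[2.0, -1.0, 0.3], [-0.5, 1.5, -2.0], [-3.0, -0.2, -1.0]]
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]

for threshold in (0.3, 0.5):
    y_pred = [predict_labels(row, threshold) for row in logits]
    precision, recall, f1 = micro_scores(y_true, y_pred)
    print(f"threshold={threshold}: P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because each label is thresholded on its own, the decision threshold is a natural lever for the precision/recall trade-off: on this toy data, lowering it from 0.5 to 0.3 raises recall at no cost to precision, though on real data a lower threshold usually costs some precision.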

Results & Analysis

Precision 83.02%

Academic Credits

Project Guide

Ms. Neha

Team Member 1

Pankhudi Raina

2427030730