LegalBERT-Based Multi-Label IPC Section Recommendation Engine
"To design and evaluate a LegalBERT-based multi-label classification system for automatically recommending relevant IPC sections from unstructured crime descriptions."
Problem Statement
Determining the correct Indian Penal Code (IPC) sections from unstructured crime descriptions is a complex, manual process that requires legal expertise. The absence of an automated, context-aware system often leads to delays, inconsistencies, and potential misclassification. This project addresses that challenge by developing a legal text classification system for accurate and systematic IPC section recommendation.
Literature Review / Market Research
LEGAL-BERT: The Muppets straight out of Law School (Chalkidis et al., 2020) – Introduced the domain-specific LegalBERT, demonstrating improved performance over generic BERT on legal tasks.
EURLEX57K Multi-Label Legal Classification Benchmark – Established large-scale legal multi-label classification standards using transformer models.
InLegalBERT (LAW-AI) – Domain-adapted transformer model trained on Indian legal corpora, improving statute prediction in the Indian context.
Research Gap / Innovation
Unlike most existing legal NLP studies, which evaluate models on a single dataset, this project introduces a cross-dataset generalization framework: it trains on the large-scale ILSI dataset (66,074 samples, 98 IPC sections) and evaluates on the unseen NyayaAnumana dataset. The approach focuses specifically on the Indian legal domain with full IPC coverage, using real unstructured crime descriptions rather than curated text segments. With detailed metric analysis (Precision, Recall, Micro-F1), the work emphasizes high-precision legal recommendation while identifying areas for future recall optimization.
System Methodology
Dataset / Input
• Training Data: ILSI/LSI – 42,835 cases (66,074 samples)
• Labels: 98 IPC sections (multi-label classification)
• Unseen Evaluation: IndianBailJudgments (~1,200 cases)
• Processing Pipeline: Text cleaning → LegalBERT tokenization → Multi-label encoding → Train/Validation split
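The processing pipeline above can be illustrated with a minimal sketch. It covers only the cleaning, multi-label encoding, and train/validation split steps using the Python standard library; LegalBERT tokenization is left to the tokenizer that ships with the model. The function names, the toy records, the 80/20 split, and the cleaning rules are all assumptions made for illustration, not the project's actual code.

```python
import random
import re

def clean_text(text):
    """Collapse whitespace and lowercase; a stand-in for the text-cleaning step."""
    return re.sub(r"\s+", " ", text).strip().lower()

def multi_hot(sections, label_index):
    """Encode a list of IPC section labels as a 0/1 vector over all known labels."""
    vec = [0] * len(label_index)
    for section in sections:
        vec[label_index[section]] = 1
    return vec

def train_val_split(rows, val_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out val_fraction of them for validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]

# Hypothetical toy records (description, IPC sections); the real data has 98 labels.
records = [
    ("The accused  struck the victim\nwith a rod.", ["323", "324"]),
    ("Gold ornaments were stolen from the house.", ["379"]),
    ("The accused cheated the complainant of money.", ["420"]),
    ("The victim was threatened and beaten.", ["323", "506"]),
    ("A motorcycle was stolen at night.", ["379"]),
]

# Build a stable label -> column mapping from every section seen in the data.
labels = sorted({s for _, sections in records for s in sections})
label_index = {s: i for i, s in enumerate(labels)}

encoded = [(clean_text(text), multi_hot(sections, label_index))
           for text, sections in records]
train_rows, val_rows = train_val_split(encoded, val_fraction=0.2, seed=0)
```

Each description ends up paired with a fixed-length 0/1 vector, so a case that cites several sections simply has several ones in its target vector.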
Model / Architecture
• LegalBERT fine-tuned for 98-label IPC multi-label classification.
• Crime descriptions encoded using transformer-based contextual embeddings.
• Sigmoid activation layer enables independent multi-label prediction.
• Evaluated using Micro-F1, Precision, and Recall with cross-dataset testing.
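To make the sigmoid output layer and the micro-averaged metrics concrete, here is a small standard-library sketch. It applies a sigmoid to each label's logit independently, thresholds the result, and pools true positives, false positives, and false negatives across all samples and labels. The 0.5 threshold, the function names, and the toy logits are assumptions for illustration; the source does not state which decision threshold the project uses.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def predict(logits, threshold=0.5):
    """Score each label independently and keep those at or above the threshold."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]

def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 over all (sample, label) pairs."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical logits for 2 cases over 4 labels (the real model has 98 outputs).
logits = [[2.0, -1.0, 0.3, -3.0],
          [-0.5, 1.5, -2.0, -0.1]]
y_true = [[1, 0, 0, 0],
          [0, 1, 0, 1]]

y_pred = [predict(row) for row in logits]
precision, recall, f1 = micro_prf(y_true, y_pred)
```

Because every label is thresholded on its own, raising the threshold generally trades recall for precision, which is one simple lever for the high-precision behaviour the project emphasizes.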
Results & Analysis
Quantifiable outcomes and evaluation metrics compared to baselines.
Academic Credits
Project Guide
Ms. Neha
Team Member 1
Pankhudi Raina
2427030730