The goal of this research is to use Natural Language Processing (NLP) to create a systematic and computationally sound method for recognising and categorising stop words in Sanskrit. Identifying function words such as conjunctions, negations, and discourse markers is essential for precise computational linguistics tasks like parsing, translation, and information retrieval because of Sanskrit's intricate morphology and syntactic structure. More than 100 high-frequency functional terms were extracted from digital archives and traditional Sanskrit manuscripts. Two primary methods were employed: a statistical model that ranked and validated word frequencies using Zipf's Law, and a rule-based linguistic approach grounded in Paninian grammar and POS tagging. To guarantee linguistic accuracy, tools such as morphological analysers and Sanskrit-specific taggers were incorporated, and their outputs were then validated by experts. Evaluation used standard metrics: precision, recall, and F1-score.
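As a rough illustration of the statistical and evaluation steps, the sketch below ranks tokens by frequency, uses the Zipfian rank-frequency product as a sanity check on candidates, and scores a candidate list against an expert gold list with precision, recall, and F1. It is a minimal sketch under stated assumptions: the tokeniser, corpus, and gold list are hypothetical inputs, not the specific tools or data used in the study.

```python
from collections import Counter

def zipf_candidates(tokens, top_k=100):
    """Rank tokens by frequency. Under Zipf's Law, frequency is roughly
    proportional to 1/rank, so the product rank * frequency hovers near a
    constant; high-frequency tokens at the top ranks are stop-word candidates."""
    ranked = Counter(tokens).most_common()
    for rank, (word, freq) in enumerate(ranked[:top_k], start=1):
        yield word, freq, rank * freq  # rank-frequency product for validation

def evaluate(predicted, gold):
    """Precision, recall, and F1 of a candidate stop-word set against an
    expert-validated gold list."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical usage: tokens would come from a Sanskrit tokeniser or
# morphological analyser, and gold_list from the expert validation step.
# candidates = [w for w, _, _ in zipf_candidates(tokens)]
# p, r, f1 = evaluate(candidates, gold_list)
```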
Both approaches produced useful and complementary outcomes. The rule-based method offered high precision, while Zipf's Law improved recall by surfacing statistically significant function words. The hybrid approach produced a standardised, machine-readable list of Sanskrit stop words, organised by grammatical function and compatible with contemporary NLP pipelines. This work is one of the first to systematically combine statistical modelling with linguistic theory for the categorisation of Sanskrit stop words. By offering a domain-specific, validated NLP resource designed for an ancient language, it opens up new possibilities for machine translation, speech recognition, and semantic analysis in classical Indian languages.
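A minimal sketch of how the two candidate lists might be merged into a machine-readable resource grouped by grammatical function follows; the category labels, sample words (given in romanisation), and JSON layout are illustrative assumptions, not the published format.

```python
import json

# Hypothetical inputs: rule-based candidates tagged with a grammatical
# function (e.g. by a Paninian POS tagger), and statistically ranked
# candidates from a Zipf frequency analysis.
rule_based = {"ca": "conjunction", "na": "negation", "hi": "discourse_marker"}
statistical = ["ca", "na", "eva", "tu"]

def merge_stopwords(rule_based, statistical):
    """Union of both sources, keyed by grammatical function; statistically
    found words without a rule-based tag default to 'unclassified'."""
    merged = {}
    for word in set(rule_based) | set(statistical):
        category = rule_based.get(word, "unclassified")
        merged.setdefault(category, []).append(word)
    return {cat: sorted(words) for cat, words in merged.items()}

# Write the merged list as a machine-readable JSON resource.
with open("sanskrit_stopwords.json", "w", encoding="utf-8") as f:
    json.dump(merge_stopwords(rule_based, statistical), f,
              ensure_ascii=False, indent=2)
```

Keying the output by grammatical function keeps the resource usable both as a flat stop-word list (by flattening the values) and as a categorised lexicon for downstream parsing or retrieval tasks.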