Projects

Language Identification Analysis

Python · NLP · Naive Bayes · TF-IDF

Comparative study of language classifiers for Hindi/Marathi Devanagari text identification (CL2 course project).

  • Multiple feature-extraction methods: character frequency, word length, morphological analysis, n-grams, POS tagging, TF-IDF
  • Naive Bayes classifiers trained on linguistic features, compared against baseline n-gram models
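
As a rough illustration of what a baseline character n-gram Naive Bayes model for this task could look like, here is a minimal standard-library sketch. It is not the project's actual code: the class name `NaiveBayesLangID`, the helper `char_ngrams`, the n-gram range (1 to 3), the Laplace smoothing value, and the six toy training sentences are all assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams (n_min..n_max) of a space-padded string."""
    padded = f" {text.strip()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NaiveBayesLangID:
    """Multinomial Naive Bayes over character n-grams (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha          # Laplace smoothing constant (assumed value)
        self.counts = {}            # label -> Counter of n-gram counts
        self.totals = {}            # label -> total n-gram count
        self.doc_counts = Counter() # label -> number of training sentences
        self.vocab = set()          # every n-gram seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log likelihood for each label."""
        n_docs = sum(self.doc_counts.values())
        # n-grams never seen in training are skipped rather than scored
        grams = [g for g in char_ngrams(text) if g in self.vocab]
        scores = {}
        for label, counter in self.counts.items():
            denom = self.totals[label] + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[label] / n_docs)
            for g in grams:
                score += math.log((counter[g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this example only
train_texts = [
    "यह मेरा घर है", "मैं स्कूल जा रहा हूँ", "तुम क्या कर रहे हो",
    "हे माझे घर आहे", "मी शाळेत जात आहे", "तू काय करत आहेस",
]
train_labels = ["hindi"] * 3 + ["marathi"] * 3

model = NaiveBayesLangID().fit(train_texts, train_labels)
print(model.predict("मैं घर जा रहा हूँ"))
print(model.predict("मी घर जात आहे"))
```

Character n-grams are a natural baseline here because Hindi and Marathi share the Devanagari script, so the signal has to come from sequences such as function-word endings rather than from the character set itself. A real comparison would use a proper corpus with a held-out test split instead of these few sentences.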