01

Low-Resource Machine Translation

In production

Fine-tuning multilingual foundation models (NLLB) for translation between English and underserved Indian languages. Our evaluation framework uses held-out test sets co-built with native speakers and reports BLEU, chrF, and human evaluation scores.

Khasi machine translation is our flagship research track, currently shipping at BLEU 48.0 through Bha-Kha V4 on Bhasaflow, a Medharvix Systems platform.
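For readers unfamiliar with chrF, the following Python sketch shows the idea behind the metric: it compares character n-grams between a system translation and a reference, then combines precision and recall into a recall-weighted F-score. This is a simplified, sentence-level illustration only. The function names, the choice to average precision and recall across n-gram orders, and the toy strings are assumptions made for this example; it is not the scoring code behind the figures reported above, and its values will not match established tools exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF in the range [0, 1].

    Assumption for this sketch: precision and recall are averaged over
    n-gram orders 1..max_n, then combined into an F-beta score.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r)


# Toy strings, not real evaluation data.
identical = chrf("good morning", "good morning")
partial = chrf("good morning", "good evening")
```

Because it works on characters rather than whole words, a metric of this kind gives partial credit for near-miss word forms, which is one reason it is commonly reported alongside BLEU.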

02

Evaluation Infrastructure

Active

Building structured evaluation pipelines for low-resource language AI. We treat evaluation as a first-class research problem: curating benchmarks, designing human evaluation protocols, and creating test sets that reflect real-world language use rather than synthetic data.

03

Speech Systems for Indian Languages

Under research

Developing text-to-speech and automatic speech recognition systems for languages with limited existing coverage. Our speech work addresses acoustic modelling, phonological adaptation, and prosody for languages such as Khasi, where standard speech models underperform.

Khasi TTS is in active research. Khasi ASR is under development.

04

Document Intelligence and OCR

Under research

Research on reading and understanding printed and handwritten text in Indian scripts. This work supports digitisation of institutional records, preservation of historical documents, and extraction of structured information from unstructured visual inputs.

05

Multilingual Pipeline Architecture

Active

Designing modular AI pipelines that combine translation, speech, and document understanding into integrated systems. This research underpins the Bhasaflow platform architecture and enables us to add new languages and modalities without rebuilding from scratch.
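One way to picture a modular design of this kind is a registry that maps a task and language to an interchangeable stage, so that adding a language means registering a new stage rather than changing the pipeline itself. The Python sketch below is purely illustrative: `PipelineRegistry`, its methods, the task names, and the stub stages are hypothetical and do not describe the actual Bhasaflow architecture.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A stage is any function from text to text (hypothetical simplification).
Stage = Callable[[str], str]


@dataclass
class PipelineRegistry:
    """Hypothetical registry of stages keyed by (task, language)."""

    stages: Dict[Tuple[str, str], Stage] = field(default_factory=dict)

    def register(self, task: str, language: str, stage: Stage) -> None:
        self.stages[(task, language)] = stage

    def build(self, steps: List[Tuple[str, str]]) -> Stage:
        # Look up every stage first, so a missing one raises KeyError early.
        chain = [self.stages[key] for key in steps]

        def run(payload: str) -> str:
            for stage in chain:
                payload = stage(payload)
            return payload

        return run


registry = PipelineRegistry()

# Stub stages standing in for real models; "kha" is the ISO 639-3 code for Khasi.
registry.register("ocr", "kha", lambda text: text.strip())
registry.register("translate", "kha-en", lambda text: f"[en] {text}")

pipeline = registry.build([("ocr", "kha"), ("translate", "kha-en")])
result = pipeline("  scanned text  ")
```

In this sketch, supporting another language or modality only requires one more `register` call and a new list of steps passed to `build`; the existing stages stay untouched.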

Research collaboration.

We are open to research partnerships with universities, language communities, and institutions working on low-resource AI, multilingual systems, or Indian language technology.