Research & Open Source
Omneity Labs is a private R&D lab focused on low-resource language AI. It produces open-source tools, datasets, and models covering 340+ languages: the data pipelines, tokenization methods, and evaluation infrastructure that don't yet exist for underrepresented languages.
Publications
Sawtone: A universal framework for phonetic similarity and alignment across languages and scripts
Lingua Posnaniensis, Vol. 67, Issue 1, pp. 165-200 (2025)
Processing text across different scripts presents significant hurdles in natural language processing, especially when dealing with non-standardized orthographies and informal writing systems common in low-resource languages. To address this, we introduce Sawtone, an integrated framework designed to enable consistent cross-script phonetic alignment and text normalization. At its heart is an architecture built for interoperability, combining a unified phonological feature space rooted in linguistic principles with modular, language-specific adapters. This structure allows for robust mapping and comparison between any pair of scripts.
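The architecture described above, a shared phonological feature space with one adapter per script, can be illustrated with a small sketch. This is not Sawtone's actual API. The names `TableAdapter`, `to_segments`, and `phonetic_similarity`, the toy feature labels, and the choice of a feature-weighted edit distance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Features = FrozenSet[str]


def feats(*names: str) -> Features:
    return frozenset(names)


@dataclass(frozen=True)
class TableAdapter:
    """Hypothetical per-script adapter: maps graphemes to shared feature sets."""
    table: Dict[str, Features]

    def to_segments(self, text: str) -> List[Features]:
        # Characters missing from the toy table are skipped.
        return [self.table[ch] for ch in text if ch in self.table]


def substitution_cost(a: Features, b: Features) -> float:
    # 1 - Jaccard overlap: identical feature sets cost 0, disjoint sets cost 1.
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def phonetic_similarity(x: List[Features], y: List[Features]) -> float:
    """Edit distance with feature-based substitution costs, scaled to [0, 1]."""
    if not x and not y:
        return 1.0
    prev = [float(j) for j in range(len(y) + 1)]
    for i, sx in enumerate(x, 1):
        curr = [float(i)]
        for j, sy in enumerate(y, 1):
            curr.append(min(prev[j] + 1.0,          # delete a segment of x
                            curr[j - 1] + 1.0,      # insert a segment of y
                            prev[j - 1] + substitution_cost(sx, sy)))
        prev = curr
    return 1.0 - prev[-1] / max(len(x), len(y))


# Toy feature inventory (an assumption, heavily simplified).
K, T, B = (feats("stop", "velar", "voiceless"),
           feats("stop", "alveolar", "voiceless"),
           feats("stop", "bilabial", "voiced"))
A, I = feats("vowel", "open"), feats("vowel", "close", "front")

latin = TableAdapter({"k": K, "t": T, "b": B, "a": A, "i": I})
arabic = TableAdapter({"\u0643": K,   # kaf
                       "\u062a": T,   # teh
                       "\u0628": B,   # beh
                       "\u0627": A})  # alef, treated here as an open vowel

score = phonetic_similarity(latin.to_segments("kitab"),
                            arabic.to_segments("\u0643\u062a\u0627\u0628"))
print(round(score, 2))  # 0.8: the Latin spelling has one extra vowel segment
```

Because both adapters emit segments in the same feature space, the comparison function needs no knowledge of either script. Supporting another script in this sketch means adding one more table.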
Conference Presentations
Moroccan Darija and Generative AI
7th International Congress for Moroccan Arabic
University of Navarra, Spain
TIM'24 Presentation
TIM'24 Conference
University Hassan II, Morocco
Open Source Projects
Low-Resource Language Data
NLP Tooling
LLM Training
Dev Tooling
Interested in collaboration?
Research partnerships, compute collaborations, or contributing to low-resource language AI.