Gemma-3n-Swahili: On-Device Swahili AI Assistant
Bringing state-of-the-art Swahili LLMs to 200+ million Swahili speakers

Introduction
Although Swahili is spoken by over 200 million people across Sub-Saharan Africa, current AI models struggle with Swahili comprehension and instruction following. These limitations manifest in several critical ways: models fail to maintain the cultural understanding essential for meaningful communication, cannot handle the code-switching that characterizes modern Swahili usage, lack specialized vocabulary for emerging technological concepts, and perform poorly on both technical and creative writing tasks in Swahili. This gap in language support effectively excludes a significant portion of the African continent from accessing advanced AI capabilities in their native language.
The Model Suite
We are excited to announce the preview release of Gemma-3n-Swahili, a suite of multimodal Gemma-3n language models specifically adapted for Swahili language understanding and generation.
Our model suite includes:
- Gemma-3n-Swahili-E2B-it
- Gemma-3n-Swahili-E4B-it
Dataset and Training
We leveraged the Bactrian-X dataset, carefully extracting and validating high-quality Swahili instruction-response pairs. From the original 67,017 pairs, we fine-tuned on 10,000 samples selected for quality and diversity.
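The selection step can be pictured as a simple filter-and-cap pass over the corpus. This is an illustrative sketch only: the field names (`instruction`, `output`) follow the Bactrian-X convention, but the length thresholds and the `select_samples` helper are assumptions, not the exact criteria used for these models.

```python
# Hypothetical sketch of selecting ~10,000 training samples from
# Bactrian-X style instruction-response pairs. Thresholds are assumptions.

def select_samples(pairs, target=10_000, min_len=20, max_len=2_000):
    """Keep pairs whose response length is in a sane range, up to `target`."""
    selected = []
    for pair in pairs:
        response = pair.get("output", "")
        if min_len <= len(response) <= max_len:
            selected.append(pair)
        if len(selected) == target:
            break
    return selected

# Toy usage with dummy data standing in for the real dataset:
pairs = [{"instruction": f"swali {i}", "output": "jibu " * 10} for i in range(15)]
subset = select_samples(pairs, target=10)
print(len(subset))  # 10
```

In practice the real pipeline would also deduplicate and balance across the content categories listed below; this sketch shows only the basic shape of the pass.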
Key dataset characteristics:
- Diverse content spanning technical, creative, and professional domains
- Natural language flow with appropriate cultural context
- Content distribution across instructional queries, analysis tasks, creative writing, technical content, and cultural knowledge
Key Capabilities
- Pure Swahili Generation: Maintains language consistency without code-switching unless requested
- Cultural Understanding: Preserves East African context and idiomatic expressions
- Technical Proficiency: Handles technical and professional content in Swahili
- Instruction Following: Superior understanding and execution of complex instructions
- Creative Writing: Generates stories, poems, and creative content with natural flow
- On-Device: The models can run locally, maximizing user privacy and enabling access in internet-constrained locations
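To query the models, prompts are wrapped in Gemma's turn-based chat format. A minimal sketch of building such a prompt by hand is shown below; in practice the tokenizer's built-in chat template handles this, and the exact template an on-device runtime applies may differ.

```python
# Sketch of Gemma's chat-turn prompt format. Building it manually like this
# is for illustration; tokenizer.apply_chat_template is the usual route.

def build_prompt(user_message: str) -> str:
    """Wrap a single user message in Gemma-style chat-turn markers."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_prompt("Andika shairi fupi kuhusu umoja.")  # "Write a short poem about unity."
print(prompt)
```

The trailing `<start_of_turn>model\n` cues the model to generate the assistant turn next.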
Sample Response Comparison
Base Model
Tujue mwanafunzi ana kalamu ngapi:
Mwanafunzi ana kalamu 15.
Tujue ana kalamu ngapi zaidi ya daftari:
Ana kalamu 7 zaidi ya daftari.
Tupange equation:
Kila daftari lina kalamu 7 kidogo. Kuna kalamu 15, na anahitaji kalamu 7 zaidi ya daftari. Hiyo inamaanisha daftari ni: 15 - 7 = 8.
Jibu:
Mwanafunzi ana daftari 8.
Naf sikuza:
Mwanafunzi ana kalamu 15, na anahitaji kalamu 7 zaidi ya daftari. Hiyo inamaanisha kuwa daftari lina kalamu 15 - 7 = 8.
Our Model
Hatua 2: Mwanafunzi ana kalamu 15.
Hatua 3: Mwanafunzi ana kalamu 7 zaidi ya daftari lake, kwa hiyo daftari lake linakuwa 15 - 7 = 8.
Hatua 4: Mwanafunzi ana daftari ngapi?
Jibu: Mwanafunzi ana daftari 8.
Notice how the base model mixes languages ("equation" instead of "mlingano") and creates non-existent Swahili words like "Naf sikuza". Our fine-tuned model maintains pure Swahili throughout while providing clear, logical steps to solve the problem.
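A toy heuristic illustrates how code-switching like the base model's "equation" can be flagged automatically: check tokens against a small English word list. The word list and helper here are illustrative assumptions, not the evaluation method used for these models.

```python
# Toy code-switching detector: flag English words in an ostensibly
# Swahili response. The word list is a small illustrative sample.

ENGLISH_WORDS = {"equation", "answer", "step", "because", "therefore"}

def flag_code_switching(text: str):
    """Return English words found in the text, lowercased and sorted."""
    tokens = {t.strip(".,:;!?").lower() for t in text.split()}
    return sorted(tokens & ENGLISH_WORDS)

base_output = "Tupange equation: 15 - 7 = 8"
tuned_output = "Hatua 3: daftari lake linakuwa 15 - 7 = 8."
print(flag_code_switching(base_output))   # ['equation']
print(flag_code_switching(tuned_output))  # []
```

A real evaluation would need a far larger lexicon and handling for loanwords that are legitimate in modern Swahili, but the sketch captures the idea.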
Performance Benchmarks
Swahili MMLU
We evaluated on 500 samples from the Swahili MMLU benchmark, measuring comprehension across diverse subjects:

Translation Benchmarks
Using 50 samples split evenly between Swahili-to-English and English-to-Swahili translation tasks, our models show significant improvements:


The benchmarks demonstrate substantial gains in cross-lingual understanding: for English-to-Swahili translation, perplexity improves from 70.5 to 21.6, a more than 3x reduction.
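The headline ratio can be checked directly from the two perplexity figures quoted above:

```python
# Verifying the perplexity ratio for English-to-Swahili translation.
base_ppl, tuned_ppl = 70.5, 21.6   # figures from the text
improvement = base_ppl / tuned_ppl
print(round(improvement, 2))  # 3.26
```

Lower perplexity means the model assigns higher probability to the reference translations, so the ratio is a reasonable (if indirect) proxy for translation quality.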
Mobile Demo
For detailed instructions on running these models locally on mobile apps or desktop, please visit our GitHub repository.
Conclusion
This is a preview release fine-tuned on 10,000 samples. We will be fine-tuning on the complete dataset and making further improvements in the coming weeks to deliver even better Swahili language understanding and generation capabilities.
We believe that in the near future, powerful general-purpose models will run locally on-device, maximizing both user privacy and agency while augmenting human intelligence.