G42's Inception launches JAIS 70B, 20 other models

Democratising AI: Here’s how G42’s new LLM JAIS 70B will help

Developed through training on 370 billion tokens – 330 billion of which are Arabic – the new LLM draws on the largest Arabic dataset ever used for an open-source foundational model

Gulf Business

Inception, a subsidiary of the G42 group specialising in advanced AI models and applications, has launched its latest large language model (LLM), JAIS 70B.

This 70-billion parameter model is designed to enhance Arabic-based natural language processing (NLP) solutions, aiming to accelerate the integration of Generative AI services across industries such as customer service, content creation, and data analysis.

JAIS 70B offers unparalleled Arabic-English bilingual capabilities, making it a significant milestone for the open-source community.

The model boasts an enhanced capacity for handling complex tasks and processing intricate datasets.

Developed through continuous training on 370 billion tokens – 330 billion of which are Arabic – JAIS 70B draws on the largest Arabic dataset ever used for an open-source foundational model.

In addition to JAIS 70B, Inception has introduced a comprehensive suite of JAIS foundation and fine-tuned models, encompassing 20 models across eight sizes, ranging from 590 million to 70 billion parameters.

These models, fine-tuned for chat applications and trained on up to 1.6 trillion tokens of Arabic, English and code data, address feedback from the Arabic NLP community.

The suite includes the first Arabic-centric model small enough to run on a laptop, offering both compute-efficient models for specific applications and advanced models for enterprise-level precision.

Dr Andrew Jackson, CEO of Inception, emphasised the significance of the release: “AI is now a proven value-adding force and large language models have been at the forefront of the AI adoption spike. JAIS was created to preserve Arabic heritage, culture, and language, and to democratise access to AI. Releasing JAIS 70B and this new family of models reinforces our commitment to delivering the highest quality AI foundation model for Arabic-speaking nations.”

Series of JAIS versions launched

Inception’s previous releases include JAIS-13B and JAIS-13B-chat in August 2023, followed by the JAIS-30B and JAIS-30B-chat models.

The newly launched JAIS 70B and JAIS 70B-chat models have outperformed their predecessors on both English and Arabic benchmarks.

Neha Sengupta, principal applied scientist at Inception, highlighted the efficiency gains achieved with JAIS 70B: “For models up to 30 billion parameters, we successfully trained JAIS from scratch, consistently outperforming adapted models in the community. However, for models with 70 billion parameters and above, the computational complexity and environmental impact of training from scratch were significant.

“We chose to build JAIS 70B on the Llama2 model, allowing us to leverage the extensive knowledge base of an existing English model and develop a more efficient and sustainable solution.”

The LLM retains and, in specific cases, exceeds the high-quality English-language processing capabilities of Llama2 while performing far better in Arabic.

Inception’s development team expanded the Llama2 tokeniser to enhance Arabic text processing efficiency, doubling the model’s base vocabulary.

Sengupta noted that this approach “splits Arabic words less aggressively and makes training and inferencing cheaper” compared to the standard Llama2 model.
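To illustrate why a larger vocabulary splits words less aggressively, here is a minimal sketch using a toy greedy longest-match tokeniser. It is not Inception's implementation or the Llama2 tokeniser; the function `tokenize`, the tiny vocabularies, and the example word are all hypothetical and chosen only to show the effect.

```python
def tokenize(text, vocab):
    """Toy greedy longest-match segmentation (illustrative only).

    At each position, take the longest substring found in `vocab`;
    if nothing matches, fall back to a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking towards one character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Hypothetical vocabularies: the base one has no Arabic pieces at all.
prefix = "ال"   # Arabic definite article
stem = "كتاب"   # "book"
base_vocab = {"the", "book"}
expanded_vocab = base_vocab | {prefix, stem}

word = prefix + stem  # "the book" written as one Arabic word

base_tokens = tokenize(word, base_vocab)
expanded_tokens = tokenize(word, expanded_vocab)

print(len(base_tokens), "tokens with the base vocabulary")
print(len(expanded_tokens), "tokens with the expanded vocabulary")
```

With the base vocabulary the word falls apart into one token per character, while the expanded vocabulary covers it in two pieces. Fewer tokens per word means shorter sequences for the same text, which is the general reason a better-fitting vocabulary can make training and inference cheaper.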

© 2021 MOTIVATE MEDIA GROUP. ALL RIGHTS RESERVED.