Developed through training on 370 billion tokens, 330 billion of which are Arabic, the new LLM draws on the largest Arabic dataset ever used for an open-source foundation model.
At its core lies Jais 30B, touted as the world's highest-performing Arabic large language model.