The new LLM was trained on 370 billion tokens, 330 billion of which are Arabic, making this the largest Arabic dataset ever used to train an open-source foundational model.