Tooka, a BERT-based language model for the Persian (Farsi) language, has been developed by the Part Artificial Intelligence Research Center, a first in Iran. To facilitate the development of smart products, it has been released as open source to enthusiasts and businesses. In addition to the Large version of the model, the Part knowledge-based group has also open-sourced its Base version, with the aim of keeping the country's artificial intelligence ecosystem at the leading edge of global technology.
According to Iran digital economy annotation, this language model, built on 500 GB of data, equivalent to 90 billion tokens, is regarded as one of the best-suited models for Persian language services, and thanks to its accuracy and quality it is reported to rank first among comparable Persian models. Because the Tooka language model can also be run and trained on less powerful hardware, it is considered a particularly suitable option for research, individual developers, and start-ups.
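As a rough sanity check on the figures quoted above, the reported data volume and token count imply an average number of bytes per token. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes), which the article does not specify; the function name `bytes_per_token` is hypothetical and used only for illustration.

```python
def bytes_per_token(total_bytes: int, total_tokens: int) -> float:
    """Average number of bytes of raw data per token."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_bytes / total_tokens


# Figures reported in the article; decimal GB is an assumption.
DATA_BYTES = 500 * 10**9   # 500 GB
TOKENS = 90 * 10**9        # 90 billion tokens

print(f"{bytes_per_token(DATA_BYTES, TOKENS):.2f} bytes per token")
```

Under that assumption the result is roughly 5.6 bytes per token. Since Persian letters typically occupy two bytes each in UTF-8, this would correspond to only a few characters per token on average, though the exact ratio depends on the tokenizer and on how the data volume was measured.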
Releasing an open source version of this language model gives businesses and programmers easy, free access to it and lets them collaborate with other developers. It also creates an opportunity on two fronts: developers and programmers can build more powerful and practical tools for Persian-speaking users, and businesses can create customized products that meet the needs of their own users. As a result, all components of Iran's technology ecosystem can grow and progress together.
Over the past years, the Tooka language model has been used in Part products such as the Sahab smart cloud services, the speech-to-text service "Avanegar", the text-to-speech service "Avasho", the smart chatbot "Danabot", and the image-to-text service "Nevisenegar". It has performed well in these products and has laid the groundwork for interactive smart tools used by millions of Persian speakers. More recently, the Part knowledge-based group has announced the development of Dorna, a large language model with 13 billion parameters, and commercial use of the Dorna model in the company's products is expected this year.