The Center for Research in Urdu Language Processing (CRULP)

The Center for Research in Urdu Language Processing (CRULP) is the first of its kind in Pakistan. The Center’s main objective is to conduct research on developing computational models of Urdu and Pakistan’s other regional languages. Research at CRULP is carried out within projects, each with a well-defined list of deliverables. The projects currently under way at CRULP are:

PAN Localization Project

This project is an initiative of the International Development Research Centre (IDRC), Canada, and the Center for Research in Urdu Language Processing (CRULP). Its objective is to build local language computing capacity in regional institutions of Asia. Phase II of the PAN Localization Project will investigate the digital literacy challenges that end users face when using localized technology to communicate and to produce local language content. The project will also continue to mature the language technology in the target languages. The project is led by researchers at CRULP, NUCES. CRULP will coordinate efforts across Asia through ICT researchers, practitioners, linguists and policymakers from government agencies, universities and the private sector. The countries (and languages) included in the second phase of the project are Afghanistan (Pashto), Bangladesh (Bangla), Bhutan (Dzongkha), Cambodia (Khmer), China (Tibetan), Laos (Lao), Mongolia (Mongolian), Nepal (Nepali), Pakistan (Urdu) and Sri Lanka (Sinhala, Tamil).
http://www.panl10n.net

PAN Localization Pakistan Component

This is the Pakistan Country Component of the PAN Localization Project. This project consists of three main activities:
1. Localizing open source software into Urdu
2. Developing training material and imparting ICT literacy training for using selected localized open source software
3. Evaluating the imparted ICT literacy training

Microsoft Vista Urdu Language Interface Pack

The main objective of this project is to develop a Language Interface Pack (LIP) for Microsoft Windows Vista and Microsoft Office. The pack will provide an Urdu language interface for Microsoft's upcoming Vista operating system, and will enable people who cannot understand English to use Microsoft Windows and Microsoft Office in Urdu. To this end, the scope of the project includes the translation of 300,000 English words into Urdu. Microsoft provides the translation tools (i.e. LocStudio) and the translation files (i.e. *.edb); the translation itself is done at CRULP. This project is sponsored by Microsoft Corporation, USA.

Lexicon for Urdu Language

This project aims to develop a lexicon of the Urdu language for Nokia, to be used for the future development of speech and language technology. The lexicon covers commonly used Urdu words, some domain-specific words and proper nouns. It will also contain basic grammatical and pronunciation information for these words, and will provide almost complete corpus (and language) coverage. The lexicon will be the fundamental building block for other applications in script, speech and language technologies to be developed in the future, ranging from basic user services (e.g., SMS support, address book) to more advanced user assistance applications (e.g., text-to-speech, speech recognition, spoken language translation and handwriting recognition technologies). Nokia has already indicated that follow-up work on Urdu speech synthesis will be undertaken using this lexicon (based on the unit selection technique, which CRULP has not yet undertaken). This project is sponsored by Nokia Research, Beijing, China.

Urdu Localization Project

The Urdu Localization Project aims to bring the benefits of the information age to the vast majority of people in Pakistan who are not literate in English, the lingua franca of the Internet, and who are therefore deprived of the immense possibilities offered by this revolution. It will also usher the Urdu language, the national language of Pakistan, spoken and understood by the masses, into the information age.

Urdu Component Development

SpellChecker, Collation and Normalization are basic language utilities. The purpose of this project is to provide APIs for these utilities for the Urdu language. The SpellChecker utility will check words for spelling errors and, if an error is found, suggest a ranked list of replacement words. The Collation utility will provide a language-sensitive comparison of two strings for sorting purposes. Normalization is the process of converting multiple equivalent representations of the same data into a single, consistent underlying normal form.
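
The three utilities described above can be illustrated with a minimal Python sketch. It is not CRULP's API: the function names, the alphabet table and the tiny word list are all hypothetical, Unicode NFC stands in for whatever normalization rules the project defines, and plain edit distance stands in for its suggestion ranking.

```python
import unicodedata

# Assumed Urdu alphabet order, for illustration only (not CRULP's collation
# table). Characters missing from it sort after all listed letters.
URDU_ALPHABET = (
    "\u0627\u0628\u067e\u062a\u0679\u062b\u062c\u0686\u062d\u062e"
    "\u062f\u0688\u0630\u0631\u0691\u0632\u0698\u0633\u0634\u0635"
    "\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u06a9\u06af\u0644"
    "\u0645\u0646\u0648\u06c1\u06be\u0621\u06cc\u06d2"
)
RANK = {ch: i for i, ch in enumerate(URDU_ALPHABET)}


def normalize(text):
    """Map canonically equivalent sequences to one form (Unicode NFC)."""
    return unicodedata.normalize("NFC", text)


def collation_key(word):
    """Sort key that follows URDU_ALPHABET rather than code point order."""
    return [RANK.get(ch, len(RANK)) for ch in normalize(word)]


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest(word, lexicon, limit=3):
    """Return [] for a known word, else the closest lexicon entries."""
    word = normalize(word)
    if word in lexicon:
        return []
    ranked = sorted(lexicon,
                    key=lambda w: (edit_distance(word, w), collation_key(w)))
    return ranked[:limit]


# Hypothetical sample data.
PANI = "\u067e\u0627\u0646\u06cc"    # "pani" (water), starts with pe
TAAR = "\u062a\u0627\u0631"          # "taar" (wire), starts with te
KITAB = "\u06a9\u062a\u0627\u0628"   # "kitab" (book)
KUTUB = "\u06a9\u062a\u0628"         # "kutub" (books)
KAAM = "\u06a9\u0627\u0645"          # "kaam" (work)
KITAP = "\u06a9\u062a\u0627\u067e"   # misspelling of "kitab"
LEXICON = {KITAB, KUTUB, KAAM}

if __name__ == "__main__":
    # Alef followed by combining madda composes to the single letter U+0622.
    print(normalize("\u0627\u0653") == "\u0622")
    # Pe precedes te in the alphabet although its code point is higher.
    print(sorted([TAAR, PANI], key=collation_key))
    print(suggest(KITAP, LEXICON))
```

Sorting by `collation_key` places the word beginning with pe before the word beginning with te, whereas a plain code point sort reverses them, which is the kind of difference a language-sensitive collation utility exists to handle. A production collation utility would more likely build on the Unicode Collation Algorithm with Urdu tailoring than on a single-level table like this one.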