The Center for Research in Urdu Language Processing (CRULP)

The Center for Research in Urdu Language Processing (CRULP) is the first of its kind in Pakistan. The Center’s main objective is to conduct research for the evolution of computational models of Urdu and Pakistan’s other regional languages. The research at CRULP is carried out within the context of projects, each having a well-defined list of deliverables. Current projects being conducted at CRULP are:

PAN Localization Project

This project is an initiative of the International Development Research Centre (IDRC), Canada, and the Center for Research in Urdu Language Processing (CRULP). The objective of this project is to build local language computing capacity in regional institutions of Asia. Phase II of the PAN Localization project will research the challenges associated with the digital literacy of end users who use localized technology to communicate and to produce local language content. The project will also continue to mature the language technology in the target languages. This project is led by researchers at CRULP, NUCES. CRULP will coordinate efforts across Asia through ICT researchers, practitioners, linguists and policymakers from government agencies, universities and the private sector. The countries (and languages) included in the second phase of the project are Afghanistan (Pashto), Bangladesh (Bangla), Bhutan (Dzongkha), Cambodia (Khmer), China (Tibetan), Laos (Lao), Mongolia (Mongolian), Nepal (Nepali), Pakistan (Urdu) and Sri Lanka (Sinhala, Tamil).
http://www.panl10n.net

PAN Localization Pakistan Component

This is the Pakistan Country Component of the PAN Localization Project. This project consists of three main activities:
1. Urdu localization of open source software
2. Developing training material and imparting ICT literacy training for selected localized open source software
3. Evaluating the imparted ICT literacy training

Microsoft Vista Urdu Language Interface Pack

The main objective of this project is to develop a Language Interface Pack (LIP) for Microsoft Windows Vista and Microsoft Office. This language pack will provide an Urdu language interface for Microsoft's upcoming Vista operating system. It will also enable people who cannot understand English to use Microsoft Windows and Microsoft Office in Urdu. For this purpose, the scope of the project includes the translation of 300,000 English words into Urdu. Microsoft provides the translation tools (i.e. LocStudio) and translation files (i.e. *.edb). The translation is done at CRULP. This project is sponsored by Microsoft Corporation, USA.

Lexicon for Urdu Language

This project aims to develop an Urdu lexicon for Nokia, to be used for future development of speech and language technology. It covers commonly used Urdu words, some domain-specific words and proper nouns. The lexicon will also contain basic grammatical and pronunciation information for these words, and will provide almost complete corpus (and language) coverage. It will be the fundamental building block for other applications in script, speech and language technologies to be developed in the future, ranging from basic user services (e.g., SMS support, address book) to more advanced user assistance applications (e.g., text-to-speech, speech recognition, spoken language translation and handwriting recognition technologies). Nokia has already indicated that follow-up work on Urdu speech synthesis will be undertaken using this lexicon (based on the unit selection technique, which CRULP has not yet worked on). This project is sponsored by Nokia Research, Beijing, China.

Urdu Localization Project

The Urdu localization project envisages bringing the benefits of the information age to the vast majority of Pakistanis who are not literate in English, the lingua franca of the Internet, and are thus deprived of the immense possibilities offered by this revolution. It will also usher the Urdu language, the national language of Pakistan, spoken and understood by the masses, into the information age.

Urdu Component Development

SpellChecker, Collation and Normalization are basic language utilities. The purpose of this project is to provide APIs for these utilities for the Urdu language. The SpellChecker utility checks words for spelling errors and suggests a ranked list of words if a spelling error is found. The Collation utility provides a language-sensitive comparison of two strings for sorting. Normalization is the process of converting multiple equivalent representations of data to a consistent underlying normal form.
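
As a rough illustration of what these three utilities do (not the CRULP APIs themselves, whose interfaces are not described here), the following Python sketch uses Unicode NFC normalization from the standard library, a custom sort key, and plain edit distance for ranking suggestions. The function names, the simplified alphabet order and the tiny word list are all assumptions made for the example.

```python
from __future__ import annotations

import unicodedata

# Simplified Urdu alphabet order, assumed for illustration only; a real
# collation utility would also handle diacritics, digits and other symbols.
URDU_ALPHABET = "آابپتٹثجچحخدڈذرڑزژسشصضطظعغفقکگلمنںوہھءیے"
RANK = {ch: i for i, ch in enumerate(URDU_ALPHABET)}


def normalize(text: str) -> str:
    """Map canonically equivalent sequences to one form (Unicode NFC)."""
    return unicodedata.normalize("NFC", text)


def collation_key(word: str) -> list[int]:
    """Sort key that follows URDU_ALPHABET instead of raw code points."""
    # Characters outside the assumed alphabet sort after all known letters.
    return [RANK.get(ch, len(RANK)) for ch in normalize(word)]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: list[str], limit: int = 3) -> list[str]:
    """Return [] for a known word, else the closest lexicon entries, best first."""
    word = normalize(word)
    if word in lexicon:
        return []
    ranked = sorted(lexicon, key=lambda w: (edit_distance(word, w), collation_key(w)))
    return ranked[:limit]
```

For example, `normalize` maps alef followed by a combining maddah (U+0627 U+0653) to the single character U+0622, and sorting with `key=collation_key` places a word beginning with پ before one beginning with ت, whereas sorting by raw code points would reverse them.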

Urdu Components:
Urdu Spell Checker Utility v1.0
Urdu Collation Utility v1.0
Urdu Normalization Utility v1.0

Text to Speech:
Urdu Text to Speech System

Urdu POS Tagger:
Annotator
Urdu Part of Speech Tagset
Statistical Part of Speech Tagger for Urdu v1.0

Machine Translation:
English to Urdu Machine Translation System

IPA to SAMPA:
Urdu Letters to IPA and IPA to SAMPA

Morphological Analyzer:
Urdu Finite-State Morphological Analyzer

Transliteration:
Hindi to Urdu Transliterator

Urdu Localization Terminology Glossary

The Urdu Localization Terminology Glossary is the glossary being developed and used at CRULP by the PAN Localization Project Pakistan Country Component team for the localization of open source software. It is based on the Electronic Dictionary of Localization of Computer Applications (English-Urdu), 2005, by the Center of Excellence for Urdu Informatics, National Language Authority, Islamabad (Pakistan). Other resources used in the development of this glossary include:

Existing localized open source software:
Mozilla Urdu Language Pack
OpenOffice, Firefox & Thunderbird (for Urdu-India)

Online technical terminology translations:
Urdu Word Bank
Urdu Dictionary

Dictionaries:
All major English to Urdu translation dictionaries have also been consulted in the process, e.g. the Qaumi English-Urdu Dictionary published by the National Language Authority of Pakistan.

For more details about the glossary, please see the localization process report.

The glossary is available online and also as a tab-delimited, UTF-8 encoded text file. It is updated regularly.

Online Urdu Localization Terminology Glossary
Download
Urdu Localization Terminology Glossary (tab delimited, utf-8 encoded text file)
Updated: February 07, 2008.
License Details:

A major portion of this glossary is based on the Electronic Dictionary of Localization of Computer Applications (English-Urdu) developed by the National Language Authority (NLA). The complete glossary is available here, but the part of the glossary that is derived from the NLA remains under NLA copyright, while terminology translations that have been added by our team are released under the Creative Commons License.

The tags "NLA" and "EXT" are used in both the released forms to differentiate between translations taken from the NLA (NLA) dictionary and those added by the team (EXT).
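
A file in this form can be read with a few lines of Python. The sketch below is only illustrative: the column order (English term, Urdu translation, source tag) and the sample rows are assumptions, since the actual layout of the released file is not described here. Only the tab delimiter, the UTF-8 encoding and the NLA/EXT tags come from the description above.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the assumed layout: English term, Urdu term, source tag.
# The tags on these rows are arbitrary and do not reflect the real glossary.
SAMPLE = "File\tفائل\tNLA\nBrowser\tبراؤزر\tEXT\nFolder\tفولڈر\tNLA\n"


def load_glossary(stream):
    """Group (english, urdu) pairs by their source tag (NLA or EXT)."""
    by_tag = defaultdict(list)
    for row in csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        english, urdu, tag = (field.strip() for field in row)
        by_tag[tag].append((english, urdu))
    return dict(by_tag)


# For a real file, pass open(path, encoding="utf-8", newline="") instead.
glossary = load_glossary(io.StringIO(SAMPLE))
ext_terms = [english for english, _ in glossary.get("EXT", [])]
```

Grouping by tag makes it straightforward to treat the NLA-derived entries and the team-added entries separately, which matters here because the two parts carry different licensing terms.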

Center for Research in Urdu Language Processing

Center for Research in Urdu Language Processing (CRULP) is pleased to release the Urdu-Nepali-English Parallel Corpus. 29/09/2008

Center for Research in Urdu Language Processing is pleased to announce a workshop on First Internationalized Domain Name for Pakistani Languages. 19/04/2008

Center for Research in Urdu Language Processing is pleased to release the source code (VOLT project) of Nafees Tahreer Naskh. 18/03/2008

Center for Research in Urdu Language Processing is pleased to release Urdu Games. 04/03/2008

Center for Research in Urdu Language Processing has identified an Urdu Closed Class Words List to support further localized research. 02/03/2008

Open Source Software Urdu Localization: Urdu Localization Terminology Glossary updated. 07/02/2008

Center for Research in Urdu Language Processing is pleased to release the source code (VOLT project) of Nafees Fonts. 08/11/2007

Urdu Components Development: Applications and Source Code of Urdu Collation, Urdu Normalization and Urdu SpellChecker Utilities are released. 01/11/2007

Open Source Software Urdu Localization: Windows Installer for Urdu NVu released. 01/11/2007

Urdu Localization: Updated version of Phonetic Keyboard released. 23/10/2007

Open Source Software Urdu Localization: SeaMonkey Urdu Language Pack released. 10/10/2007

Center for Research in Urdu Language Processing is pleased to announce the beta release of the character-based Nafees Riqa OpenType font. 05/09/2007

The Conference on Language and Technology (CLT07) was held at Bara Gali Campus, University of Peshawar, from 7 to 11 August 2007. CRULP collaborated in organizing the workshop.

Sony Ericsson G705 is now official

The Sony Ericsson G705 is another cellphone in Sony’s G series. It was initially set to be released on Sept 8, but was delayed by a day, and it has now arrived.


The SE G705 is a good-looking slider loaded with nice mid-range features. As a member of the G series, it’s meant to be used as a personal assistant in your overly busy daily life. The SE G705 doesn’t run on Symbian UIQ or any other mobile OS; instead, it’s on a Java platform.

The phone sports a 2.4-inch TFT display with 262K colors and supports auto-rotation. You’ll find A-GPS, DLNA, Bluetooth 2.0, FM radio, 120MB of internal memory, a 1GB M2 card and a 3.2-megapixel camera with flash on this cellphone. The camera is also capable of geotagging and video recording.

The phone runs on quad-band GSM cellular networks. It’s been loaded with some Google features and equipped with a full HTML browser, a shortcut key for quick access to Google Maps, and video viewing and direct uploading on YouTube. It also supports Exchange ActiveSync for your email.

Since the phone comes with a capable browser and works with Google Maps, YouTube and email, it needs high-speed Internet connectivity. It has it: the phone works on tri-band HSDPA, which Sony describes as “turbo 3G speeds”.

The US version will be available in early 2009, whereas the G705u, SE’s first UMA-enabled phone, will be available in the UK. The price remains unknown.

RIM Blackberry Pearl Flip phone aka Kickstart gets official

The BlackBerry Pearl Flip, also known as the Kickstart Clamshell, came to our attention in early May, and now it’s official. Disappointingly, it has no 3G high-speed data support, but you’ll find WiFi, a 2-megapixel camera and a full-on document editor on RIM’s first flip phone.


This flip phone will debut on T-Mobile sometime in spring this year, but the pricing details are still unknown. The Pearl Flip measures 3.9 x 1.9 x 0.7 inches, which makes it thicker than the original Pearl, which is only 0.57 inches thick. Even so, the Pearl Flip looks relatively compact.

The Pearl Flip carries an external display and an internal display with a resolution of 240 x 320 pixels. The SureType keypad lies directly beneath the internal screen. As mentioned, this flip phone tops out at EDGE, with no 3G support, so you can only do simple emailing and Internet browsing on it. Heavy downloads and streaming video will mean a long wait.

The Pearl Flip’s WiFi comes with UMA support, which might mean the phone is compatible with T-Mobile’s HotSpot @Home Wi-Fi voice calling. There appears to be no built-in GPS module, as the press release says that the Pearl lets you connect to a GPS receiver via Bluetooth.

As for the full-on document editor, it allows you to edit Office documents (Word, Excel, PowerPoint etc.) on the go. Other goodies are a video/music player, an “enhanced” HTML browser, a 2 MP camera that’s capable of video, voice commands and microSD memory expansion up to 16GB.