CMU-Q professor compiles decades of research into Turkish Natural Language Processing

Published: 06 Aug 2019 - 09:54 am | Last Updated: 01 Nov 2021 - 05:02 pm

Kemal Oflazer, a faculty member at Carnegie Mellon University in Qatar, Associate Dean for Research and Area Head of Computer Science.

The Peninsula

A professor at Carnegie Mellon University in Qatar (CMU-Q) has compiled decades of research into Turkish Natural Language Processing.

Many people may not be familiar with the term “Natural Language Processing” (NLP), but English speakers experience it whenever they ask Siri or Alexa a question or use Google Translate to translate text from one language to another.

There have been many noteworthy advances in natural language processing in English, and often these techniques can be applied to other languages. Some languages, however, can pose significant computational challenges.

Kemal Oflazer, a faculty member at Carnegie Mellon University in Qatar, Associate Dean for Research and Area Head of Computer Science, completed his bachelor’s and master’s degrees at Middle East Technical University in Ankara. He then pursued his PhD in computer science at Carnegie Mellon University, studying and then working in the US for a decade.

When he returned to Turkey to teach at Bilkent University, he found that his time away had given him a new perspective. “You rarely get the chance to see what your first language looks like from an external point of view. I wrote a Turkish document, and I realized, there is no Turkish spell-checker,” said Oflazer.

It was the early 1990s, and this observation raised questions that would guide Oflazer’s research interests for the next three decades.

Turkish is an agglutinative language, which means that suffixes attach to a root word like beads on a string. One complex Turkish word with several suffixes could express the same meaning as an entire sentence in English.
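As a rough illustration of the beads-on-a-string idea, the Python sketch below builds one Turkish word by attaching suffixes to a root one at a time. The example word, its glosses, and the function name are chosen here for illustration and do not come from Oflazer's work. Real Turkish suffixes also change shape through vowel harmony, which this sketch sidesteps by using a root whose suffixes already have the right form.

```python
# Toy illustration: a Turkish word as a root plus a chain of suffixes.
# The root and suffixes are illustrative; vowel harmony is not modeled.
ROOT = ("ev", "house")
SUFFIXES = [
    ("ler", "plural"),
    ("im", "my"),
    ("de", "in/at"),
]

def build_word(root, suffixes):
    """Return every intermediate form as suffixes are attached in order."""
    forms = [root[0]]
    for suffix, _gloss in suffixes:
        forms.append(forms[-1] + suffix)
    return forms

forms = build_word(ROOT, SUFFIXES)
print(" -> ".join(forms))  # ev -> evler -> evlerim -> evlerimde
```

The final form, evlerimde, corresponds to the English phrase "in my houses."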

“In English, the computer can check spelling against a finite list of words,” said Oflazer. “In Turkish, a given verb root can give rise to about 1.5 million different word forms. It is rather amazing.”
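The contrast Oflazer describes can be sketched in a few lines of Python. When the set of word forms is too large to store as a list, one option is to accept a word only if it breaks down into a known root followed by suffixes in a permitted order. The root, the suffix slots, and the function names below are hypothetical and deliberately tiny, vowel harmony and other sound changes are ignored, and the sketch is not a description of how Oflazer's own tools work.

```python
from itertools import product

# Hypothetical toy lexicon: one noun root and three optional suffix slots
# that must appear in this order (plural, possessive, case).
ROOTS = {"ev"}                 # "house"
SLOTS = [
    ["", "ler"],               # plural
    ["", "im", "imiz"],        # my / our
    ["", "de", "den"],         # in / from
]

def all_forms(root):
    """Enumerate every form the toy slots allow for one root."""
    return {root + "".join(choice) for choice in product(*SLOTS)}

def is_valid(word, slot=0, at_root=True):
    """Accept a word if it is a known root followed by suffixes in slot order."""
    if at_root:
        return any(word.startswith(r) and is_valid(word[len(r):], 0, False)
                   for r in ROOTS)
    if slot == len(SLOTS):
        return word == ""
    return any(word.startswith(s) and is_valid(word[len(s):], slot + 1, False)
               for s in SLOTS[slot])

forms = all_forms("ev")
print(len(forms))               # 18 forms from just three small slots
print(is_valid("evlerimizde"))  # True: "in our houses"
print(is_valid("evlrimde"))     # False: a typo the toy model rejects
```

Each extra slot, or extra choice within a slot, multiplies the number of forms, which is why a stored word list quickly becomes impractical while a rule-based check stays small.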

Turkish also has other interesting properties, such as free word order, in which the subject, object and verb can be arranged in any order. In English, by comparison, the order is rather fixed.

In the early 1990s, there was no work being done in the area of Turkish NLP. With funding from the NATO Science for Stability Program, the European Union and the Turkish Scientific and Technological Research Council, Oflazer and his graduate students carried out research and development on Turkish natural language processing.

In 2012, Oflazer was invited to deliver a talk at the Language Resources and Evaluation Conference (LREC) in Istanbul on the challenges of Turkish NLP. After the lecture, he was approached by Springer Verlag with a proposal to compile a book on the state of the art of Turkish NLP. Along with co-editor Murat Saraçlar of Boğaziçi University in Istanbul, Oflazer spent more than four years working with researchers, many of them their former graduate students, to bring together 25 years of work. The book was published in 2018 in both hard-copy and online versions, and so far more than 2,000 copies of the various chapters have been downloaded.