A group of researchers at Cornell University and Tel Aviv University has created a new algorithm that lets a computer scan text in any of several languages and extract its grammar rules. Those rules are then used to generate new, meaningful sentences in the same language. The method reportedly also works on sheet music and protein sequences. The algorithm, called Automatic Distillation of Structure (ADIOS), relies on recognizing patterns in text and can reproduce those patterns so that the generated sentences are grammatically correct.
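To give a feel for the general idea of learning patterns from raw text and then generating new sentences, here is a minimal Python sketch. It is not ADIOS: it is a much simpler word-to-word transition (bigram) model, swapped in purely for illustration. The toy corpus and the names `learn_transitions` and `generate` are made up for this example.

```python
import random
from collections import defaultdict

# Hypothetical markers for sentence boundaries (not part of ADIOS).
START, END = "<s>", "</s>"

def learn_transitions(sentences):
    """Record, for each word, the words seen right after it in the corpus."""
    transitions = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            transitions[current].append(following)
    return transitions

def generate(transitions, rng, max_words=20):
    """Walk the learned transitions to produce one new sentence."""
    word, output = START, []
    while len(output) < max_words:
        word = rng.choice(transitions[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)

# Toy corpus, invented for this sketch.
corpus = [
    "the cat sees the dog",
    "the dog sees the bird",
    "the bird sees the cat",
]
model = learn_transitions(corpus)
rng = random.Random(0)  # seeded so repeated runs give the same sentences
for _ in range(3):
    print(generate(model, rng))
```

Every generated sentence only uses word pairs that appeared in the corpus, yet the sentence as a whole may be new. A model this simple knows nothing about longer-range structure, which is exactly the kind of thing a real grammar-induction algorithm has to capture.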
“The algorithm — the computational method — for language learning and processing that we have developed can take a body of text, abstract from it a collection of recurring patterns or rules and then generate new material,” explained Shimon Edelman, a computer scientist who is a professor of psychology at Cornell and co-author of a new paper, “Unsupervised Learning of Natural Languages,” published in the Proceedings of the National Academy of Sciences (PNAS, Vol. 102, No. 33).
“This is the first time an unsupervised algorithm is shown capable of learning complex syntax, generating grammatical new sentences and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics,” he said.
This opens up a whole range of applications. One might use the algorithm in a spam filter (or a spam generator!) to recognize random text. The researchers are looking to use it to analyze speech from parents to young children, which may give insight into how children learn new languages.
However, since the algorithm is patent pending, we might never see all the applications that could emerge from it.
September 7th, 2005 | General Science
It must be my Master’s project rubbing off on me (I work on inverse planning in radiation therapy), but I can’t help but imagine an *inverse* application. I wonder if we could specify a set of grammar rules and, say, the probability of finding a given letter in a word, so that the program could invent a language of its own, decipherable only by a similar program. It could be useful for data encryption. Or just for fun.
Comment by Jason — September 7, 2005 @ 9:30 am