The researchers created a neural network system that can decipher text from lost languages. (Image credit: Sharon Mollerus via Wikipedia)


Researchers from MIT’s CSAIL (Computer Science and Artificial Intelligence Laboratory) and Google Brain have developed a neural network that is capable of deciphering dead languages, including Linear B- a syllabic script that was used for writing Mycenaean Greek, the most ancient form of the Greek language. In a recent paper titled “Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B,” the research team detail how they adapted deep learning to solve the puzzle of translation.


Their platform differs from traditional AI methods of language translation, as there is a lack of data when it comes to lost ancient languages. To get around this issue, the researchers based their translation matrixes based on a principle that languages evolve over time, and understanding that words are related to each other in similar ways.


Their machine-learning platform starts the process of translating by mapping-out those relationships for that specific language, which requires an enormous amount of text. It then searches to see how often each word appears next to other words, which creates a pattern using unique signatures that define a word as a vector in space. The researchers explain that those vectors follow simple mathematical rules, for example- King – Man + Woman = Queen, so a sentence can be thought of as a set of vectors, following one after another to produce a trajectory.


This principle is applied to words in two separate languages, which occupy the same vectors in their own parameters of space, making it possible to map one language on to another using a one-to-one comparison. It’s those trajectory similarities through those respective spaces that allow for translation, and the system doesn’t even need to know what the words mean to figure it out.   


Have a story tip? Message me at: cabe(at)element14(dot)com