Teaching Computers to Understand Language

With the rise of GPU computing and increased computational power, machine learning and neural networks have made a dramatic comeback and are now at the forefront of efforts to automate tasks that once required a human. One of the biggest areas of research for this approach has been understanding the nuances of language. Computers have traditionally struggled with language because it involves thousands of rules and even more exceptions to those rules. Simple rule-based approaches fail to take context and interpretation into account and rarely interpret sentences and paragraphs accurately.

In the past decade, researchers have increasingly applied recurrent neural networks to understanding text. Neural networks are collections of artificial neurons loosely modeled on neurons in the human brain. These networks adjust the strength of the connections between neurons based on the training data they are given. For example, if a neural network receives pictures of apples and oranges along with a label for each picture, over time it can tune these connections and learn to distinguish the two fruits.
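To make "tuning connections" concrete, here is a minimal sketch of a single artificial neuron (logistic regression) trained with gradient descent. It assumes each picture has already been reduced to two hypothetical numeric features, redness and skin texture; the data values, learning rate, and epoch count are made up for illustration.

```python
import math
import random

# Hypothetical toy data: each fruit is described by two made-up features,
# (redness, skin_texture), both scaled to 0..1. Label 1 = apple, 0 = orange.
DATA = [
    ((0.9, 0.1), 1), ((0.8, 0.2), 1), ((0.7, 0.15), 1),
    ((0.2, 0.8), 0), ((0.3, 0.9), 0), ((0.25, 0.7), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Output of one artificial neuron: weighted sum passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def train(data, epochs=1000, lr=0.5, seed=0):
    """Adjust connection strengths (weights) by gradient descent on log loss."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(2)]
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            # For a sigmoid output with log loss, dLoss/dz = prediction - label.
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

weights, bias = train(DATA)
print(predict(weights, bias, (0.85, 0.2)))  # close to 1, i.e. "apple"
print(predict(weights, bias, (0.2, 0.85)))  # close to 0, i.e. "orange"
```

Real image classifiers learn their own features from raw pixels with many layers of neurons; this single neuron only shows the weight-update idea.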

Recurrent neural networks, frequently abbreviated as RNNs, extend this idea by feeding information from previous steps back into the network. When an RNN processes a sentence, it reads one word at a time and carries forward a hidden state, a summary of the words it has seen so far, which it combines with the current word. This makes RNNs particularly effective at handling sequential and time-correlated data. Since sentences are sequential and earlier words affect how the current word should be interpreted, RNNs can better pick up context and the nuances of language.
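A minimal sketch of the recurrence, assuming a vanilla (Elman-style) RNN with made-up 2-d word vectors and hand-picked, untrained weights. The names `EMBED`, `W_x`, `W_h`, and `b` are hypothetical; the point is only that the hidden state `h` carries earlier words forward.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_step(x, h_prev, W_x, W_h, b):
    """One step of a vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    pre = [a + c + d for a, c, d in zip(matvec(W_x, x), matvec(W_h, h_prev), b)]
    return [math.tanh(p) for p in pre]

def rnn_forward(inputs, W_x, W_h, b):
    """Run the RNN over a sequence, returning the hidden state after each word."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states

# Hypothetical 2-d word vectors and hand-picked (untrained) weights.
EMBED = {"bank": [0.5, 0.5], "river": [1.0, -1.0], "money": [-1.0, 1.0]}
W_x = [[0.8, -0.3], [0.2, 0.9]]
W_h = [[0.5, 0.1], [-0.2, 0.4]]
b = [0.0, 0.0]

def encode(sentence):
    return rnn_forward([EMBED[w] for w in sentence.split()], W_x, W_h, b)

# The same final word "bank" ends in different hidden states because the
# earlier word ("river" vs. "money") was carried forward in h.
print(encode("river bank")[-1])
print(encode("money bank")[-1])
```

In a trained network these weights would be learned from data; here they only demonstrate that context changes the state the network is in when it reaches a word.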

However, this approach still has some issues. First, a basic RNN keeps only a single hidden state, and in practice information from many words back tends to fade during training (the vanishing gradient problem), so it struggles with long-range context. Most modern architectures instead use LSTMs (Long Short-Term Memory networks), a variant of RNNs that adds a separate memory cell and learned gates that decide what to store, what to forget, and what to output at each step. Another common modification is the BRNN (Bidirectional RNN). A BRNN combines two RNNs running in opposite directions, one reading the sentence left to right and the other right to left, so it can extract context from both before and after a target word. This way, if the network is looking at a noun, it can get descriptive information such as adjectives, which in English usually come before the noun, and information about its state and actions, which usually come after the noun. For example, if the network read “A red cat sits here,” the bidirectional approach would let it extract what the object (cat) looked like (red) and what it was doing (sitting).
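The bidirectional idea can be sketched with two tiny scalar-state RNNs, assuming hypothetical one-number word encodings (`WORD_VALUE`) and arbitrary untrained weights. This is a vanilla BRNN rather than an LSTM; it only checks which half of the output at “cat” can see which words.

```python
import math

# Hypothetical one-number "embeddings" so the arithmetic stays readable.
WORD_VALUE = {"a": 0.1, "red": 0.9, "black": -0.9, "cat": 0.5,
              "sits": 0.7, "sleeps": -0.7, "here": 0.2}

def rnn_states(inputs, w_x, w_h):
    """Vanilla RNN with a scalar state: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def birnn(sentence, fwd=(1.0, 0.5), bwd=(0.8, -0.4)):
    """Run one RNN left-to-right and a second RNN right-to-left, then pair
    their states so each word gets both its left and right context."""
    xs = [WORD_VALUE[w] for w in sentence.lower().split()]
    forward = rnn_states(xs, *fwd)
    backward = rnn_states(xs[::-1], *bwd)[::-1]  # re-reverse so positions line up
    return list(zip(forward, backward))

CAT = 2  # position of "cat" in the sentences below
base = birnn("A red cat sits here")[CAT]
color = birnn("A black cat sits here")[CAT]
action = birnn("A red cat sleeps here")[CAT]
print(base[0] != color[0])   # True: forward half reflects the adjective before "cat"
print(base[0] == action[0])  # True: forward half cannot see the verb after "cat"
print(base[1] != action[1])  # True: backward half reflects the verb after "cat"
```

Concatenating the two halves is what gives each position access to both the adjective and the verb; a real BRNN would typically use vector states and LSTM cells in each direction.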

So now we have a tool that can potentially learn and understand text. But what exactly can we do with it? It turns out that while no one has yet built a system that understands everything about language, we can build specialized models that extract specific characteristics of text.

For example, RNNs can determine the part of speech of each word, sorting words into categories such as noun, verb, and adjective. This serves as a foundation for grammatical analysis and other insights. Google’s Cloud Natural Language API builds on this kind of analysis: it can identify the entities mentioned in a piece of text, such as people, places, and organizations, along with their relative importance (which it calls salience) and the sentiment expressed about them. This kind of information can help identify the key parts of a piece of writing and separate them out automatically.
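Part-of-speech tagging is a sequence-labeling task: one tag per word. RNN taggers are trained on large annotated corpora, which is beyond a short example, so the sketch below swaps in a much simpler most-frequent-tag baseline (not an RNN) trained on a tiny hypothetical corpus, purely to show the input and output of the task.

```python
from collections import Counter, defaultdict

# A tiny hand-tagged corpus (hypothetical); real taggers learn from far more data.
TAGGED = [
    [("a", "DET"), ("red", "ADJ"), ("cat", "NOUN"), ("sits", "VERB"), ("here", "ADV")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("runs", "VERB"), ("here", "ADV")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sits", "VERB")],
]

def train_baseline(corpus):
    """Count how often each word appears with each tag; keep the most frequent."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(sentence, lexicon, default="NOUN"):
    """Label every word; unseen words fall back to a default guess."""
    return [(w, lexicon.get(w, default)) for w in sentence.lower().split()]

lexicon = train_baseline(TAGGED)
print(tag("The red dog sleeps here", lexicon))
# [('the', 'DET'), ('red', 'ADJ'), ('dog', 'NOUN'), ('sleeps', 'VERB'), ('here', 'ADV')]
```

Because this baseline looks at each word in isolation, it cannot tell that “book” is a verb in “book a flight” but a noun in “read a book”; resolving that kind of ambiguity from surrounding words is where an RNN's hidden state helps.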

Another approach has been to encode words and sentences as vectors. Techniques such as word2vec learn to map each word to a vector of numbers, allowing computers to represent words in mathematical terms. From these vectors, computers can automatically pick up relationships and patterns: for example, the vector from “man” to “woman” has a similar length and direction to the vector from “king” to “queen”. In this way, the vectors capture some of the relationships between words that people understand intuitively.
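A small sketch of the analogy arithmetic, assuming hand-picked 3-d vectors rather than ones actually learned by word2vec (real embeddings typically have a few hundred dimensions). Cosine similarity compares directions, and the analogy is solved by finding the word closest to king - man + woman.

```python
import math

# Hand-picked 3-d vectors (hypothetical, NOT learned by word2vec).
# Dimensions loosely mean: (royalty, masculinity, is-a-person).
VECTORS = {
    "man":   [0.1,  0.9, 0.9],
    "woman": [0.1, -0.9, 0.9],
    "king":  [0.9,  0.8, 0.9],
    "queen": [0.9, -0.8, 0.9],
    "apple": [0.0,  0.0, -0.8],
}

def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def offset(a, b):
    """Vector pointing from word a to word b."""
    return [y - x for x, y in zip(VECTORS[a], VECTORS[b])]

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by finding the word nearest to c + (b - a)."""
    target = [z + d for z, d in zip(VECTORS[c], offset(a, b))]
    candidates = (w for w in VECTORS if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

print(cosine(offset("man", "woman"), offset("king", "queen")))  # close to 1: same direction
print(analogy("man", "woman", "king"))  # queen
```

Excluding the three input words from the candidates is a common convention when evaluating analogies, since the nearest vector to the target is often one of the inputs.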

This approach of encoding information has been extended to other applications, such as translation. The idea is that if you can encode different languages into the same vector space, that shared space can serve as a go-between for translating among them. One RNN (the encoder) maps a sentence in the source language into this space, and another (the decoder) takes that representation and generates a sentence in a different language. This encoder-decoder design is closely related to the neural system Google Translate adopted in 2016.
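The payoff of a shared space can be shown with something much simpler than an RNN encoder-decoder: word-by-word nearest-neighbor lookup. The sketch assumes hypothetical English and Spanish vectors that have already been aligned into one space; a real sequence-to-sequence model would instead encode the whole sentence and generate the output one word at a time, which lets it handle word order and grammar that this lookup ignores.

```python
import math

# Hypothetical shared embedding space: English and Spanish words with the same
# meaning are assumed to sit at nearby points. Real systems learn this mapping.
EN = {"the": [0.9, 0.1, 0.0], "cat": [0.1, 0.9, 0.1], "sleeps": [0.0, 0.2, 0.9]}
ES = {"el": [0.85, 0.15, 0.05], "gato": [0.15, 0.85, 0.1],
      "duerme": [0.05, 0.1, 0.95], "perro": [0.2, 0.6, -0.5]}

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def encode(sentence):
    """Simplified 'encoder': map each English word into the shared space."""
    return [EN[w] for w in sentence.lower().split()]

def decode(points):
    """Simplified 'decoder': for each point, pick the closest Spanish word."""
    return " ".join(max(ES, key=lambda w: cosine(ES[w], p)) for p in points)

print(decode(encode("The cat sleeps")))  # el gato duerme
```

This works here only because English and Spanish share the same word order for this sentence; a learned decoder is needed for reordering, agreement, and words with no one-to-one equivalent.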

From all these applications, higher-level features and characteristics of a text can be extracted, giving greater insight into its content. This is essential to a wide range of problems, from chatbots to translators to text editors, and can greatly help automate complex, repetitive work so that it scales efficiently.