What is Natural Language Processing?
Natural Language Processing, or NLP, is a field derived from artificial intelligence, computer science, and computational linguistics that focuses on the interactions between human (natural) languages and computers. The main goal of NLP is to program computers to successfully process and analyze linguistic data, whether written or spoken.
Operations in the field of NLP can prove extremely challenging due to the intricacies of human languages, but when done well, NLP can accomplish impressive tasks, sometimes with better-than-human accuracy. These include translating text from one language to another, recognizing speech, and categorizing text.
Because any given language contains varying speech patterns, accents, and idioms, clear challenges arise in NLP tasks such as speech recognition, natural language understanding, and natural language generation.
What can NLP accomplish?
The following applications are driven by NLP:
- Spelling / grammar accuracy checks, as used in word processing software such as Microsoft Word and Grammarly
- Speech recognition, which gives computers a way to understand spoken instructions
- Smart search-engine predictions, as used by Google and the Amazon web store
- Messenger bots and interactive voice response systems used for hands-free customer support
- Digital assistants, such as Google Assistant, Alexa, and Siri
While extremely powerful, NLP models can also be difficult to build because of the dynamic and nuanced nature of human languages. Some of these challenges include:
- Human language is extremely dynamic. The meaning of words changes subtly over time, and new words are constantly introduced into use. This means that NLP models must follow word trends, and understand how those tie into concepts and messages.
- Human language is nuanced. Complex characteristics of human languages, such as sarcasm and suffixes, cause problems for NLP. High-level emotive constructs, like sarcasm, are subtle and abstract for a machine to pick up on. Low-level problems, like suffixes, can be a bit easier for a machine to decipher, but still present difficulties, as the machine may confuse variations of one word with contractions or endings of another.
Even with these challenges, there are many powerful computer algorithms that can be used to extract and structure information from text.
How does NLP work?
NLP makes use of several algorithmic techniques to parse text. These include lexical analysis, syntactic analysis, and semantic analysis.
Lexical analysis is the process of trying to understand what words mean, intuit their context, and note the relationship of one word to others. It is often the entry point to many NLP data pipelines. Lexical analysis can come in many forms and varieties. It is used as the first step of a compiler, for example, and takes a source code file and breaks down the lines of code to a series of "tokens", removing any whitespace or comments. In other types of analysis, lexical analysis might preserve multiple words together as an "n-gram" (or a sequence of items).
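To make tokenization and n-grams concrete, here is a minimal sketch in plain Python. The regular expression and function names are illustrative choices, not a standard; real tokenizers handle many more edge cases (hyphens, abbreviations, Unicode).

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation and whitespace."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return all n-grams: tuples of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("The batter ran to the baseball field.")
print(tokens)             # ['the', 'batter', 'ran', 'to', 'the', 'baseball', 'field']
print(ngrams(tokens, 2))  # bigrams, including ('baseball', 'field')
```

Preserving bigrams like `('baseball', 'field')` is what lets later stages treat "baseball field" as a single unit rather than two unrelated words.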
After tokenization, the computer will proceed to look up words in a dictionary and attempt to extract their meanings. For a compiler, this would involve finding keywords and associating operations or variables with the tokens. In other contexts, such as a chatbot, the lookup may involve using a database to match intent. As noted above, there are often multiple meanings for a specific word, which means that the computer has to decide what meaning the word has in relation to the sentence in which it is used.
This second task is often accomplished by associating each word in the dictionary with the context of the target word. For example, the phrase "baseball field" may be tagged in the machine as LOCATION for syntactic analysis (see below).
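A toy sketch of such a dictionary lookup, assuming a small hand-built lexicon (the entries and tag names here are illustrative, not from any real lexical database):

```python
# Illustrative lookup table mapping known phrases to semantic tags.
LEXICON = {
    "baseball field": "LOCATION",
    "pitcher": "AMBIGUOUS",  # a player, or a container for liquids
}

def tag_phrase(phrase):
    """Look a phrase up in the lexicon; unknown phrases get the tag UNKNOWN."""
    return LEXICON.get(phrase.lower(), "UNKNOWN")

print(tag_phrase("Baseball Field"))  # LOCATION
print(tag_phrase("zebra"))           # UNKNOWN
```

Real systems resolve ambiguous entries like "pitcher" by scoring each candidate meaning against the surrounding sentence, rather than by a single fixed tag.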
The syntax of the input string refers to the arrangement of words in a sentence so that they make grammatical sense. NLP uses syntactic analysis to assess whether the natural language aligns with grammatical or other logical rules.
To apply these grammar rules, a collection of algorithms is utilized to describe words and derive meaning from them. Syntax techniques that are frequently used in NLP include the following:
- Lemmatization / Stemming - reduces word complexity to simpler forms that have less variation. Lemmatization uses a dictionary to reduce the natural language to its root words. Stemming uses simple matching patterns to strip away suffixes such as 's' and 'ing'.
- Parsing - This is the process of undergoing grammatical analysis of a given sentence. A common method is called Dependency Parsing, which assesses the relationships between words in a sentence.
- Word Segmentation - This is the separation of continuous text into individual words. In English this is easy because words are usually separated by spaces, but some languages, such as Japanese and Chinese, do not mark word boundaries with spaces. In those languages, word segmentation becomes essential.
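The stemming and lemmatization techniques above can be sketched with a naive suffix-stripping stemmer plus a tiny lemma dictionary. The suffix list and lemma table are illustrative assumptions; production systems use algorithms like Porter stemming and full dictionaries such as WordNet.

```python
# Illustrative suffix list and lemma table, not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")
LEMMAS = {"ran": "run", "geese": "goose"}  # irregular forms need a dictionary

def stem(word):
    """Strip the first matching suffix. Crude: 'running' becomes 'runn'."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def lemmatize(word):
    """Dictionary lookup first, falling back to the stemmer."""
    return LEMMAS.get(word, stem(word))

print(stem("walking"))   # walk
print(lemmatize("ran"))  # run
```

The contrast between `stem("running")` (which yields the non-word "runn") and a dictionary lookup shows why lemmatization, though more expensive, produces cleaner root words.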
Semantics refers to the meaning conveyed by the input text. Semantic analysis is one of the most difficult tasks in NLP, as it requires algorithms to understand the meaning and interpretation of words in addition to the overall structure of a sentence. Semantic analysis techniques include:
- Entity Extraction - This means identifying and extracting categorical entities such as people, places, companies, or things. It is essential to simplifying the contextual analysis of natural language.
- Machine Translation - This is used to automatically translate text from one human language to another.
- Natural Language Generation - This is the process of converting the computer's internal semantic representation into readable human language. It is used by chatbots to respond to users effectively and realistically.
- Natural Language Understanding - This involves converting pieces of text into logically structured representations that computer programs can easily manipulate.
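Entity extraction, the first technique above, can be sketched as a dictionary-based scan over the tokens of a sentence. The entity table here is an illustrative assumption; real extractors use trained statistical models rather than fixed lookup tables.

```python
import re

# Illustrative entity table mapping known names to categories.
ENTITIES = {
    "google": "COMPANY",
    "amazon": "COMPANY",
    "japan": "PLACE",
}

def extract_entities(text):
    """Return (token, category) pairs for every known entity in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, ENTITIES[tok]) for tok in tokens if tok in ENTITIES]

print(extract_entities("Google opened an office in Japan."))
# [('google', 'COMPANY'), ('japan', 'PLACE')]
```

Even this crude version shows how tagging entities simplifies later contextual analysis: the rest of the pipeline can reason about a COMPANY and a PLACE instead of raw strings.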
Natural Language Processing is an incredibly powerful tool that is critical in supporting machine-to-human interactions. Although the technology is still evolving at a rapid pace, it has made incredible breakthroughs and enabled wide varieties of new human computer interfaces. As machine learning techniques become more sophisticated, the pace of innovation is only expected to accelerate.