Most of the data we use today is in text format; indeed, text is estimated to represent more than 70% of the data on the Internet. We must therefore find ways to handle it, and especially to draw useful information and applications from it. The set of techniques used to achieve this goal is grouped under the name NLP (Natural Language Processing), which combines machine learning, textual data preprocessing, and some traditional computational linguistics techniques.
We manipulate textual data on a daily basis: emails, social networks, web pages, and text messages, to name just a few. This huge amount of data is just waiting to be exploited.
This textual data can be used to build several types of useful applications, such as:
- Sentence or document classification (e.g. sentiment analysis)
- Automatic translation
- Speech recognition
- Conversational agents (e.g. customer service)
- Whatever you want, or should I say, whatever you can do
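To make the first item on the list concrete, here is a minimal sentiment-classification sketch. It assumes scikit-learn is installed, and the tiny toy dataset is invented purely for illustration; a real system would of course be trained on far more data.

```python
# A minimal sentiment-classification sketch: raw text -> TF-IDF
# features -> logistic regression, using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: sentences labeled 1 (positive) or 0 (negative).
texts = [
    "I love this movie, it was great",
    "What a wonderful experience",
    "Absolutely fantastic service",
    "I hate this movie, it was terrible",
    "What an awful experience",
    "Absolutely horrible service",
]
labels = [1, 1, 1, 0, 0, 0]

# Pipeline: vectorize the text, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Predict the sentiment of an unseen sentence.
prediction = model.predict(["I love this wonderful service"])[0]
```

The same pipeline shape (vectorizer plus classifier) carries over to most document-classification tasks; only the labels change.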
As you can see, the list of applications that can be built with text data is as long as the size of the available text data. Well, maybe I’m going too far.
But what is Natural Language Processing? Well, let’s ask the all-knowing:
Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data. (Wikipedia)
So, Natural Language Processing is here to save the day, by offering techniques and strategies to build these different applications and allowing machines to be as efficient as humans in understanding the languages we use on a daily basis.
An impossible mission, you will tell me!
Well, guess what: we have a lot of Tom Cruises in the NLP kingdom. However, even Tom Cruise could fail in such a perilous mission, and this for several reasons, such as the complexity and diversity of languages (grammar, conjugation, word composition, you name it).
To make matters even worse, there is the difficulty of automating the evaluation of such applications: we usually need a human to perform an empirical evaluation of a system’s performance. For example, the best way to evaluate a conversational agent is simply to let humans converse with it (Amazon Mechanical Turk, I see you).
Due to all these difficulties, we were inspired to propose a series called The NLP Kingdom, with three seasons that explain in detail some of the techniques used to create applications exploiting textual data, so that everyone can get their hands dirty in such a vast kingdom full of challenges to solve.
The first season is divided into three episodes, in which we will talk about the preprocessing techniques used in the NLP field. I named it Flirting with NLP.
Season 1:
The second season, also composed of three episodes, will take us to a city where words are represented not by characters but by continuous vectors capturing their semantic and syntactic meaning. I named it Word Embedding.
Season 2:
- Episode 1: Word2Vec
  - Part 1: Skip-Gram
  - Part 2: Continuous Bag of Words (CBOW)
- Episode 2: Global Vectors (GloVe)
- Episode 3: Other Techniques
The third and final season will build on the stories of the two previous ones to immerse us in the heart of a deep network of brilliant minds who work in the shadows to give ordinary people like us the means to shine in the NLP kingdom. I named it Deep NLP.
Season 3:
- Episode 1: RNNs
- Episode 2: LSTMs
- Episode 3: GRUs
- Episode 4: Attention
- Episode 5: Seq2Seq models
- Episode 6: Beam Search vs. Greedy Search
Shall we begin? Let’s have fun.