In this lesson, you will learn what vector embeddings are and their main applications in natural language processing, from token embeddings to sentence embeddings, and how sentence embeddings are used in RAG. Let's dive in.

Vector embeddings map real-world entities such as a word, a sentence, or an image into vector representations, that is, points in some vector space. A key characteristic of this representation is that points that are close to each other in the vector space have similar semantic meaning. Word2vec was the pioneering work on learning token or word embeddings that preserve semantic meaning, demonstrating that dense vector-based representations of words can capture what those words mean. The really cool thing is that these word embedding vectors behave like vectors in a vector space, so you can apply algebraic operations to them. There is a very famous example showing that the closest vector to queen minus woman plus man is king. Here is one of my favorite examples: if you train word2vec on Star Wars text, you can show that the closest vector to Yoda minus good plus evil is Vader.

A sentence embedding model applies the same principle to complete sentences. It converts a sentence into a vector of numbers that represents the semantic meaning of the whole sentence. In this course, we focus on text embeddings, but embeddings are a generic concept that can be applied in other domains. For example, you can train deep learning models to learn embeddings for images, videos, and even audio clips. One of my favorite examples of multimodal embeddings is CLIP, a model developed by OpenAI that aligns images and text in a shared representation space, enabling a variety of tasks, for example, text generation from images.

Let's look at a few applications of vector embeddings. In building LLMs, token embeddings are used in transformer models to represent tokens. Sentence embeddings are used to power semantic search, also known as vector or neural search, as well as the retrieval engine in a RAG pipeline. Another interesting application is product recommendations, where embedding vectors represent products and recommendations are made based on the similarity between those vectors. Finally, embedding vectors can also be used for anomaly detection, by applying typical anomaly detection approaches in the embedding space.

A critical component of any good RAG pipeline is the retrieval engine. How does this work? Given a user query, you rank order all candidate facts or text chunks by their relevance to the query before sending the most relevant ones to the LLM for generating a response. Now you might ask, "What algorithms can we use for optimal retrieval?" One approach for ranking text chunks by relevance is to use a cross encoder. A cross encoder is a transformer-based neural network model that is used as a classifier to determine relevance. Basically, you take an encoder like BERT, concatenate the question and answer with a separator token, and ask the cross encoder to compute the relevance of the answer to the question. But there is one big problem: cross encoders are very slow, and this approach requires you to run this classification operation for every text chunk in your dataset. So this does not scale. We'll look at a small code sketch of this scoring step in a moment, right after the embedding-based retrieval sketch below.

Sentence embedding models provide an alternative. This works as follows. During indexing, you create an embedding for each text segment and put it into a vector database. When a user issues a query, you embed the query with the same model and use similarity search to identify the best matching chunks to retrieve.
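To make this concrete, here is a minimal sketch of both phases using the sentence-transformers library. This is not the course's reference implementation: the model name is just one publicly available example, and the small in-memory index stands in for a real vector database.

```python
# Minimal embedding-based retrieval sketch: index a few chunks, then rank them
# against a query by cosine similarity. A real system would use a vector database.
import numpy as np
from sentence_transformers import SentenceTransformer

# Example sentence embedding model; any comparable model could be used here.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Indexing phase: embed every text chunk once and store the vectors.
chunks = [
    "Word2vec learns dense vector representations of words.",
    "A cross encoder scores a query and a document jointly.",
    "Vector databases support fast nearest-neighbor search.",
]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)  # shape (n_chunks, dim)

# Query phase: embed the query and rank chunks by cosine similarity.
query = "How do I find the most relevant document for a question?"
query_vector = model.encode([query], normalize_embeddings=True)[0]

scores = chunk_vectors @ query_vector  # cosine similarity, since vectors are normalized
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {chunks[i]}")
```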
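For comparison, here is the cross-encoder scoring step we discussed a moment ago, sketched with the same library under the same caveats; the model name is again just one publicly available example.

```python
# Hedged cross-encoder sketch: every (query, chunk) pair is passed through the
# full transformer together, which is accurate but must be repeated per chunk.
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I find the most relevant document for a question?"
chunks = [
    "Word2vec learns dense vector representations of words.",
    "A cross encoder scores a query and a document jointly.",
    "Vector databases support fast nearest-neighbor search.",
]

# Score each chunk against the query, then sort by relevance.
scores = cross_encoder.predict([(query, chunk) for chunk in chunks])
for chunk, score in sorted(zip(chunks, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {chunk}")
```

Notice that every chunk requires its own full forward pass together with the query, which is exactly why this approach does not scale to large collections.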
Embedding-based retrieval like this is often somewhat less accurate than a cross encoder, but it is much faster and provides a realistic and practical implementation path.

All right. In this lesson, we introduced the concepts of word embeddings and sentence embeddings, and we saw how sentence embeddings can be used in RAG. In the next lesson, we will learn about contextualized word embeddings and how they work with the BERT model. See you there!