In this lesson, you'll dive into techniques for reordering documents to improve the relevance and quality of information retrieval. You'll learn how to use specific metadata values to determine a document's position in the results. Let's go.
There are scenarios where a document contains other fields that should affect its position within the search results. Take, for example, an Airbnb listing with rating and number-of-reviews fields. These fields are qualitative and quantitative measures that can contribute to the relevance of a document with respect to a user query and search criteria. Taking the values of these fields into consideration in order to affect the position of a document in the list of returned search results is referred to as boosting.
Why should you consider adding a boosting technique to your search queries? Vector search is an effective method of ranking documents based on semantic similarity. Although vector search scores and rankings are effective, metadata values can also contribute to a document's relevance, which can affect the ordering within search results. Using additional qualitative and quantitative measures to rank documents helps ensure that database operation results are credible and relevant to user queries and their search criteria. Boosting can also be used to make sure results meet user-specific requirements, which introduces personalization within search results.
In the coding section, you're going to go through some familiar steps. The first is to set up a RAG pipeline and add the relevant stages. Then, you'll add boosting logic, which uses some of the mathematical operators available in MongoDB. And as usual, you'll handle the user query and visualize the results. Let's code.
Start by importing your custom utils module, like you've done in the previous lessons. Move on to loading the data, also like you've done in previous lessons. You can also take some time to view the attributes of each data point.
Move on to the document modeling, which loads each listing into a conformed model. This is similar to the process that you've carried out in previous lessons. The next step is to get an object for your database and your collection. Then start off with a clean collection by calling delete_many on the collection to remove any existing records. This is similar to the process carried out in previous lessons. Go through the data ingestion process and move on to the vector search index definition. All of this is similar to the previous lesson.
Now, you define a search result item model for the results shown in this lesson. The attributes for each result need to include a combined score, the number of reviews, and the average review score. These new attributes will be explained later. Again, just like in the previous lesson, you have the handle user query function with the exact same code.
Now, we can get to the main part of this lesson. You'll be implementing boosting logic and adding it to the vector search operations conducted in the aggregation pipeline. Here, you are assigning a stage to a variable named "review average stage". In this cell, we are adding two new fields to every document returned from the database operation.
The first field is the average review score. This is the qualitative measure I was talking about. The average review score goes through every review component of a document, sums them, and divides the sum by the number of components. Within every document we can see the accuracy, the cleanliness, the check-in, and other review attributes of a listing. Using the $add operator, which performs addition, we can get the sum of all the review components, and then we can divide it by the number of review components, which, for the listings in our dataset, is six. This gives us an idea of what the average rating of a listing is.
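As a rough sketch of what the review average stage described above might look like, here is the stage written as a plain Python dictionary, ready to be passed to an aggregation pipeline. The field paths (review_scores.review_scores_accuracy and so on), the three component names beyond accuracy, cleanliness, and check-in, and the output field names averageReviewScore and reviewCountBoost are assumptions about the listing schema, not something stated in this lesson. The helper average_review_score is a hypothetical pure-Python equivalent that lets you check the arithmetic by hand.

```python
# Assumed names of the six review components; adjust to match your documents.
REVIEW_COMPONENTS = [
    "accuracy",
    "cleanliness",
    "checkin",
    "communication",
    "location",
    "value",
]

# $addFields stage: adds two new fields to every document that reaches it.
review_average_stage = {
    "$addFields": {
        # Qualitative measure: sum of the review components divided by their count.
        "averageReviewScore": {
            "$divide": [
                {
                    "$add": [
                        f"$review_scores.review_scores_{name}"
                        for name in REVIEW_COMPONENTS
                    ]
                },
                len(REVIEW_COMPONENTS),
            ]
        },
        # Quantitative measure: copy the value of an existing field (assumed name).
        "reviewCountBoost": "$number_of_reviews",
    }
}


def average_review_score(listing: dict) -> float:
    """Pure-Python equivalent of the averageReviewScore expression (hypothetical helper)."""
    scores = listing["review_scores"]
    total = sum(scores[f"review_scores_{name}"] for name in REVIEW_COMPONENTS)
    return total / len(REVIEW_COMPONENTS)
```

For example, a listing that scores 9 on five components and 3 on the sixth has a sum of 48, so its average review score is 8.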
That explains the first new field that we're adding to every document, the average review score. The second field added to every document is the review count boost. This is a quantitative measure, and this field takes the value of the number-of-reviews attribute in each document. This is how you can pass the value of one field to a new field: simply use the dollar sign followed by the name of the field. To add these new fields to every document in the database operation, you add this process as a new stage, specifically the $addFields stage. That concludes adding the qualitative measure and the quantitative measure.
In the next step, you'll need to add weights and determine how the qualitative measure and the quantitative measure should each affect the ranking of a document after a vector search operation. This is done by adding a new stage to our pipeline: the weighting stage. The weighting stage comes right after the review average stage, so it can reference the average review score and the review count boost that were added to each document in the review average stage. This is how you can reference the values of these fields from the documents.
To implement the weighting logic, you will use two operators provided by MongoDB for mathematical operations: the $add operator and the $multiply operator. With the $multiply operator, you multiply the value of the average review score, which is the qualitative measure, by a weight. I'm using a number between 0 and 1 as the weight. Then you do the same for the review count boost, the quantitative measure that will be considered when ranking the document after the vector search operation. You then use the $add operator to combine the results of the two multiplications, and assign this sum to the combined score field.
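The weighting logic above can be sketched as follows. This is a minimal illustration, not the exact code from the notebook: the weights 0.9 and 0.1 are assumed values (the lesson only says the weights are numbers between 0 and 1), the field names averageReviewScore, reviewCountBoost, and combinedScore are assumed, and make_weighting_stage and combined_score are hypothetical helpers.

```python
def make_weighting_stage(score_weight: float = 0.9, count_weight: float = 0.1) -> dict:
    """Build an $addFields stage that combines the two measures into one score.

    The stage must run after the stage that adds averageReviewScore and
    reviewCountBoost, because it references both fields.
    """
    return {
        "$addFields": {
            "combinedScore": {
                "$add": [
                    # Weighted qualitative measure.
                    {"$multiply": ["$averageReviewScore", score_weight]},
                    # Weighted quantitative measure.
                    {"$multiply": ["$reviewCountBoost", count_weight]},
                ]
            }
        }
    }


def combined_score(
    average_review_score: float,
    review_count: int,
    score_weight: float = 0.9,
    count_weight: float = 0.1,
) -> float:
    """Pure-Python equivalent of the combinedScore expression, for checking by hand."""
    return average_review_score * score_weight + review_count * count_weight


weighting_stage = make_weighting_stage()
```

With these assumed weights, a listing averaging 9.8 with 5 reviews gets a combined score of about 9.32, while a listing averaging 9.0 with 40 reviews gets about 12.1. The review count is not normalized, so even a small weight on it can outweigh a difference in rating.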
The combined score is the sum of the two weighted values, and we add it to each document within the database operation by using the $addFields operator. This is the weighting stage.
There is one more stage to complete this process: the sorting stage. The sorting stage is very simple. Using the $sort operator, we can rerank the documents based on their combined score or any other field. In this case, you are using the combined score and reranking in descending order. Descending order is indicated by -1; ascending order would be indicated by 1.
Now that you have implemented all the additional stages to add to the vector search operation, you can create a new variable called additional stages that holds a list of all the defined stages. The first is the review average stage, where we conduct a mathematical operation to obtain the qualitative and quantitative measures and add them as new fields to the documents after the vector search operation. Then there is the weighting stage, and then the sorting stage. All the stages are executed sequentially after the vector search operation. Remember, the vector search operation we're using in this lesson is a vector search with a pre-filter, similar to the ones you've created in previous lessons.
Now it's time for you to see the results of the boosting logic. Using the same query and the same handle user query function from previous lessons, you will pass in the additional stages, making sure to use the vector index with filter. Here, you can observe that the vector search operation stage was conducted in a fraction of a millisecond. Now, let's observe the documents returned from this operation, which included a combination of stages to simulate boosting logic. Here you can see the results of the database operation that included multiple stages.
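To make the ordering of stages concrete, here is a small sketch of the sorting stage and of how the additional stages might be appended after the vector search stage. How the handle user query function actually assembles the pipeline is not shown in this lesson, so assemble_pipeline is a hypothetical stand-in, as are build_additional_stages and sort_like_pipeline. The combinedScore field name is also assumed.

```python
# $sort stage: -1 sorts in descending order, 1 would sort in ascending order.
sorting_stage = {"$sort": {"combinedScore": -1}}


def build_additional_stages(review_average_stage: dict, weighting_stage: dict) -> list:
    """Return the post-search stages in the order they must execute."""
    return [review_average_stage, weighting_stage, sorting_stage]


def assemble_pipeline(vector_search_stage: dict, additional_stages: list) -> list:
    """Place the vector search stage first, followed by the additional stages.

    This mirrors the sequential execution described above; the real helper
    used in the course may build its pipeline differently.
    """
    return [vector_search_stage, *additional_stages]


def sort_like_pipeline(docs: list) -> list:
    """Reorder already-scored documents locally the way the $sort stage would."""
    direction = sorting_stage["$sort"]["combinedScore"]
    return sorted(docs, key=lambda doc: doc["combinedScore"], reverse=(direction == -1))
```

For instance, three documents with combined scores of 9.32, 12.1, and 10.0 come back from sort_like_pipeline in the order 12.1, 10.0, 9.32.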
The average review score is included along with the number of reviews and the combined score. Remember, the combined score takes the weights into account. The documents shown are ordered by the combined score. You can pause the video here and observe the combined score of the other documents. One thing to note is that, because of the weighting logic we added, this document is ranked lower than the documents above it despite having a high rating, because it has a lower number of reviews. This is the impact of adding weights to the components you're considering in your boosting logic.
One more thing. You can play with the weights and adjust the numbers to see how they affect the results. To do this, simply go back to the weighting stage and adjust the weights. Now I'm giving a higher weight to the review count and a lower one to the average review score. Once you've changed the weights, you can observe the results again. As you can see from the results, because you've given more weight to the number of reviews, a document with a high number of reviews is ranked higher than one with a high rating. Pause the video here to observe the results.
In this lesson, you've implemented a typical RAG system and conducted vector search as before. But this time, you've added multiple stages to the aggregation pipeline to simulate boosting logic, which adds more relevance and context to the ranking of your documents after a database operation. In the next lesson, you'll learn how you can use prompt compression to reduce the size of the prompts sent to large language models in order to reduce operational costs. See you then!
