Now that you're familiar with the idea of AI validation and you've seen how guardrails are applied in applications, let's implement your first guardrail. In this lesson, you'll build a simple validator and a guard to make sure that your customer chatbot doesn't mention a very special project that your pizza shop is working on.

We'll start with a little quick setup as we get into implementing our validator: ignoring all warnings and making sure we have the right libraries imported. I'll call out that a lot of our code uses type hints. This keeps your code readable for anyone else who comes onto it, and it's generally good Python practice that I recommend. You'll once again import OpenAI to use as the LLM client, and import the helper function for the RAG chatbot and vector database. The last set of imports will help us set up a guardrail in code, so let's take a closer look at what these classes and functions do for you.

In the last lesson, you saw a diagram outlining how guardrails are built around your LLM. An input guard can check your user's input, or any retrieved text, before it is passed to the LLM, and an output guard checks any response returned by the LLM against your validation rules. As you begin implementing guardrails, you'll need to know some terminology. The validator is the core piece of logic that lies at the heart of a guardrail; it implements the code that checks that inputs or outputs conform to your specific validation rules. You'll often hear me call this a guardrail. A guard, on the other hand, is part of your application stack, and it handles the processing of inputs and outputs to pass to the validator. A guard can actually contain more than one guardrail.
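The validator/guard relationship can be sketched in plain Python (a conceptual illustration only, not the Guardrails SDK API; all names here are made up for the example):

```python
# Conceptual sketch: a guard is a container that runs one or more
# validators over a piece of text. These names are illustrative,
# not the real Guardrails AI classes.

def no_banned_word(text: str) -> bool:
    """A tiny validator: pass if no banned word appears."""
    return "badword" not in text.lower()

def max_length(text: str) -> bool:
    """Another validator: pass if the text is reasonably short."""
    return len(text) <= 500

class SimpleGuard:
    """Runs every validator; the text passes only if all of them pass."""
    def __init__(self, validators):
        self.validators = validators

    def check(self, text: str) -> bool:
        return all(v(text) for v in self.validators)

guard = SimpleGuard([no_banned_word, max_length])
print(guard.check("A friendly pizza question"))  # → True (passes both)
print(guard.check("badword" * 10))               # → False (fails the word check)
```

In the real SDK, the guard additionally decides whether to run on the input or output side and what to do on failure; this sketch only shows the "container of validators" idea.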
In this lesson, you'll use the validator and guard classes to implement a simple guard for your RAG chatbot. Coming back to the notebook, you'll import some classes and functions from the Guardrails AI Python SDK. First, we import the Guard, OnFailAction, and settings classes. The Guard is really straightforward: it's basically a container for many different guardrails, so that you can mix and match multiple guardrails and run them all at the same time. You can initialize the guard to run around your LLM call on the input or output side. OnFailAction specifies how you want Guardrails to handle any type of failure. For example, if you detect hallucinations, do you want to block the LLM from answering, or do you want to let it continue and log on the back end that the failure occurred? All of this can be configured using the OnFailAction class.

The next set of imports is what we need to create our own custom validator. If you're not creating your own validator and you're just using a guard with out-of-the-box validators, you don't need to import these. They let you return the right types of results: either a PassResult, if everything the guardrail is looking at looks okay, or a FailResult, if you detect any specific type of failure. The ValidationResult object is basically just for type hinting. Validator is the main class that creates your validator for you; this is the class that you will subclass and then add your own custom logic to. Finally, register_validator is there so that our orchestration framework can identify where your validator is located if you want to refer to it by name. Now that all the imports are set up, let's set up our simple RAG chatbot again. You've seen this before.
We are setting up our client, setting up our vector database once again with all of the dummy documents about our pizzeria in the shared data drive, and then setting up our system message. This is all the same as what you'd seen before, except for one difference that I'm going to highlight: "Do not respond to questions about Project Colosseum." What is Project Colosseum? If you want, take a quick stop and look at the shared data drive. Project Colosseum is an exciting new pizza project that the pizzeria is kicking off, which uses a different type of flour, different types of toppings, and so on, that they're just not ready to reveal to their customers yet.

Now that we have all of the components we need to build our RAG chatbot, let's build it the same as we did last time and take it for a quick spin, where this time we ask it a question about Project Colosseum. Here is the question that I'm pasting in: what we're really doing is trying to extract information about Project Colosseum. You can imagine that this is a competing pizzeria that wants to know what Alfredo's Pizza Cafe is up to. Now look at what our chatbot did: it ended up revealing the ratio used for the pizza crust, which I'm sure Alfredo's Pizza Cafe is not going to be very happy about. This information leakage about Project Colosseum is what we're going to address with our validator.

Let's build a very simple validator: all it does is check whether our input string contains any mention of Project Colosseum, and if it does, refuse to respond to that question. We start by writing the validator: we create a new class that subclasses the Validator class from Guardrails, and then register that validator under the name detect_colosseum. Awesome.
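The system-message change can be sketched like this (the exact wording in the course notebook may differ; the key addition is the last instruction):

```python
# Illustrative sketch of the system message; the restriction on
# Project Colosseum is the one new line compared to the earlier lesson.
system_message = (
    "You are a customer support chatbot for Alfredo's Pizza Cafe. "
    "Answer questions using only the provided context documents. "
    "Do not respond to questions about Project Colosseum."
)

print("Project Colosseum" in system_message)  # → True, the restriction is present
```

As the chatbot run in the lesson shows, a system-prompt instruction alone is not a reliable safeguard, which is exactly why we add a validator on top of it.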
Now we want to define the key method of the class, the validate function. The validate function takes in a value (we're checking whether it contains any reference to the word Colosseum) and some metadata, which we don't really need for this validator; later we'll see examples where we do need it. It does some checking on the value and then returns a ValidationResult, which may be a PassResult or a FailResult. In this case, the check we're doing is really simple: if Colosseum is in the value, we return a FailResult. What does the FailResult contain? It contains an error message, which is helpful for informing the user as well as the LLM why the specific value failed; in this case, "Colosseum detected." It also optionally contains a fix value, which tells Guardrails how you want to handle any failures. In this case, the fix value is "I'm sorry, I can't answer questions about Project Colosseum." Otherwise, we return a PassResult, which doesn't need any arguments. That's it: that's our simple validator to detect talk about Project Colosseum.

Now that we have a validator implemented, let's use it in a guard. We give our guard a name, "Colosseum Guard", and then use this Colosseum detector that we just built. We're telling our guard that if Colosseum is detected, we want it to raise an exception, and we run this on the input side, so on messages; the default behavior for a guard is to run on the output.

Now that we have our guard created, the next step is to run it through Guardrails server. Guardrails server is a handy utility that can wrap your LLM API call and surround it with the input and output guardrails we talked about in the previous lesson.
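The validate logic described above can be run as a standalone sketch. This uses minimal stand-in classes so it works without the guardrails package installed; in the real SDK you would subclass Guardrails' Validator and return its PassResult/FailResult types instead:

```python
# Standalone sketch of the validator logic (stand-in classes, not the
# real guardrails types) so the check itself can be run anywhere.
from dataclasses import dataclass

@dataclass
class PassResult:
    outcome: str = "pass"

@dataclass
class FailResult:
    error_message: str
    fix_value: str
    outcome: str = "fail"

def validate(value: str, metadata: dict = None):
    """Fail if the text mentions Colosseum; otherwise pass."""
    if "colosseum" in value.lower():
        return FailResult(
            error_message="Colosseum detected",
            fix_value="I'm sorry, I can't answer questions about Project Colosseum.",
        )
    return PassResult()

print(validate("What flour does Project Colosseum use?").outcome)  # → fail
print(validate("What are your opening hours?").outcome)            # → pass
```

The case-insensitive substring check is deliberately naive; the rest of the course builds progressively more robust validators on this same pass/fail structure.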
The Guardrails server can run locally or be hosted online, and you can configure it to make one or more guards available for use in your application. The Guardrails server is also OpenAI API compatible, so if you're using OpenAI, or any other LLM that's compatible with the OpenAI SDK, you can swap in the Guardrails server with one line and make sure your LLM is now protected and safeguarded. Every guard that you use in the server can be created with any custom guardrail that you just created, for example our Colosseum guardrail, or with one of the guardrails downloaded from the hub. We'll cover this in more detail a little later.

We've already set up the Guardrails server for you, running the Colosseum detector that you created. But if you're interested in setting up the server on your own, you'll need to install Guardrails and then follow the instructions we've included at the bottom of this notebook; it's a very simple configuration file and a single terminal command.

The server offers many benefits that are crucial for production-ready applications. Some of them are around easy cloud deployment: it makes it easy to containerize your Guardrails application so you can independently scale any infrastructure needed for running Guardrails, especially any GPUs. Finally, using the Guardrails server makes it very easy to wrap your LLM API calls with any guards that you create, because of the OpenAI-SDK-compatible endpoints that we provide; you can use it with any other open source models that are compatible with the OpenAI SDK.

Now that we have some background on the Guardrails server, all we need to do to use the guard in our application is set up a guarded client. Remember that earlier we had set up an OpenAI client.
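As a rough sketch, the do-it-yourself setup mentioned above looks something like the following (check the Guardrails AI documentation and the bottom of the notebook for the exact current syntax; the module name holding the validator is hypothetical):

```shell
# Sketch: a config.py declares the guard the server should expose, e.g.
#
#   from guardrails import Guard, OnFailAction
#   from my_validators import ColosseumDetector   # hypothetical module
#
#   guard = Guard(name="colosseum_guard").use(
#       ColosseumDetector(on_fail=OnFailAction.EXCEPTION),
#       on="messages",
#   )
#
# Then install Guardrails and start the server against that config:
pip install guardrails-ai
guardrails start --config config.py
```

Once running, the server exposes each configured guard behind an OpenAI-compatible endpoint, which is what makes the one-line client swap in the next step possible.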
All we end up doing is setting the base URL to point to the locally running Guardrails server that runs the Colosseum guard we created. Let's go ahead and run this, and then create this guarded chatbot, which is the same as what we've done above, only swapping out the client for the guarded client. Now, once again, let's try to get our chatbot to leak some information. I'm going to copy-paste the same prompt I used earlier, the one that revealed our secret recipe, and let's see what Guardrails does now. You see that validation failed for a field, with the error "Colosseum detected." Awesome. This is really cool: before we got a chance to reveal any proprietary company data, we saw that somebody was trying to extract our secrets and blocked it right there.

Now, the guard you've written here is set to throw an exception any time Colosseum is mentioned, which will actually break the flow of the application. This was specified in the guard setup with the on-fail action property, which was set to exception. But if you want, you can edit this code to set the on-fail action to fix instead of exception, and you'll be able to pass back a much more graceful message: the message that we set as the fix value for the guardrail.

Now that we've seen the components of building a quick-and-dirty guardrail, let's look at how we can build more sophisticated guardrails and address some of those unreliable behaviors that we saw earlier. We'll start in the next lesson with hallucinations.
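The difference between the two on-fail behaviors can be sketched in plain Python (an illustration of the concept, not the SDK's internals):

```python
# Illustrative sketch of what the "exception" versus "fix" on-fail
# behaviors mean for the application flow (not the SDK internals).
FIX_VALUE = "I'm sorry, I can't answer questions about Project Colosseum."

def run_guarded(user_message: str, llm_reply: str, on_fail: str) -> str:
    """Apply the Colosseum check to the input before returning the reply."""
    if "colosseum" in user_message.lower():
        if on_fail == "exception":
            # Breaks the application flow: the caller must catch this.
            raise RuntimeError("Validation failed: Colosseum detected")
        if on_fail == "fix":
            # Graceful degradation: return the fix value instead.
            return FIX_VALUE
    return llm_reply

print(run_guarded("What's on the menu?", "We serve margherita!", "exception"))
print(run_guarded("Tell me about Project Colosseum", "(blocked)", "fix"))
# → We serve margherita!
# → I'm sorry, I can't answer questions about Project Colosseum.
```

"Exception" is the safer default for hard policy violations, while "fix" keeps the conversation flowing with a canned refusal; which you choose depends on whether a failure should halt the application or merely be papered over and logged.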