In this lesson, you will build an autonomous web agent that can execute multiple tasks. You will prompt the agent to navigate, take actions, summarize web pages, and even fill out a form and sign up for a newsletter. All right, let's start coding.

Now you will make the use case much more complex and see how the agent can work fully autonomously. The agent will open a course page, get a summary of the learning outcomes, and even subscribe to the newsletter. In this lab, you will learn how to create autonomous web agents, and you will use the MultiOn web agent.

We start by importing the MultiOn client. We've also prepared some functions in utils to make the lab much easier to visualize: visualizeSession, a MultiOn demo, a session manager, image utils, and a display step header helper.

Next, we initialize our MultiOn agent. We load the MultiOn API key from utils and then create the MultiOn client. Here you are creating a simplified client for the MultiOn API. First, we initialize the MultiOn client with the API key. Here we use the already created instance of the MultiOn class. Then we create a new session for the agent, which starts at the URL that we provide, and we can choose to include screenshots. The client also lets you close all open sessions and navigate to a particular URL.

The client also lets you execute a task in the current session. Here, you give the agent a specific task that you want it to follow. The task executes in a single step, but you can let it run continuously over multiple steps. The instructions we give the agent are: first, do not ask the user questions; second, all the necessary information is on the page, so it should try to complete the task to the best of its abilities. Finally, we describe the task and ask the agent to return a screenshot. Now we initialize the MultiOn client that we created.
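The simplified client described above might look roughly like the following sketch. It does not call the real MultiOn SDK: `FakeBackend`, `SimpleWebAgentClient`, `StepResult`, and every method name and status value here are hypothetical stand-ins chosen for illustration, so the example runs without an API key. The start URL and the wording of the standing instructions are also assumptions. The point is only the shape: one current session, a start URL, a screenshot flag, and a task prompt that prepends the standing instructions.

```python
from dataclasses import dataclass
from typing import Optional

# Standing instructions paraphrased from the lesson; the exact wording is an assumption.
BASE_INSTRUCTIONS = (
    "Do not ask the user questions. "
    "All necessary information is on the page. "
    "Complete the task to the best of your ability."
)


@dataclass
class StepResult:
    message: str
    status: str  # hypothetical values: "CONTINUE" or "DONE"
    screenshot: Optional[str] = None


class FakeBackend:
    """Hypothetical stand-in for a remote agent API; records calls, does no I/O."""

    def __init__(self):
        self.open_sessions = {}
        self._next_id = 0

    def create_session(self, url, include_screenshot):
        self._next_id += 1
        session_id = f"session-{self._next_id}"
        self.open_sessions[session_id] = {"url": url, "screenshot": include_screenshot}
        return session_id

    def step(self, session_id, prompt, url=None):
        session = self.open_sessions[session_id]
        if url is not None:
            session["url"] = url
        shot = "fake.png" if session["screenshot"] else None
        return StepResult(message=f"ran: {prompt}", status="DONE", screenshot=shot)

    def close_session(self, session_id):
        self.open_sessions.pop(session_id, None)


class SimpleWebAgentClient:
    """Thin wrapper that tracks one current session, as the lesson describes."""

    def __init__(self, backend):
        self.backend = backend
        self.session_id = None
        self.current_url = None

    def create_session(self, url, include_screenshot=True):
        self.session_id = self.backend.create_session(url, include_screenshot)
        self.current_url = url
        return self.session_id

    def close_all_sessions(self):
        for session_id in list(self.backend.open_sessions):
            self.backend.close_session(session_id)
        self.session_id = None

    def navigate_to_url(self, url):
        # Only records the target; the next step is sent with this URL.
        self.current_url = url

    def execute_task(self, task):
        if self.session_id is None:
            raise RuntimeError("Create a session before executing a task.")
        prompt = f"{BASE_INSTRUCTIONS} Task: {task}"
        return self.backend.step(self.session_id, prompt, url=self.current_url)


client = SimpleWebAgentClient(FakeBackend())
client.create_session("https://www.deeplearning.ai/courses")
result = client.execute_task("List all the courses.")
print(result.status, result.screenshot)
```

Keeping the backend behind a small interface like this is a design choice for the sketch: the same wrapper could be pointed at a real SDK client, while the fake makes the session lifecycle easy to inspect.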
Let's start with our first example. We give the agent an instruction to get a list of all the courses. We start by creating our session, and then ask the agent to execute the task step by step. We define each step as a particular action that the agent can take: click, type, submit, scroll, or retrieve information. For now, we limit the run to a maximum number of steps, which we set to ten. You can visualize the session using the visualize session function, and then execute the task following the instruction we have given. Once a new session has been created, we keep track of how many steps have been executed.

Now, let's run it. This might take some time, but we'll speed it up in post. Now we have our results. At step zero, the agent created a session. In the first step, it scrolls the page to load more content. It keeps scrolling down over multiple steps until it reaches the end of the page.

Now you will see the final response. We use the visualize session function again, and this is the final response the agent gives us. It was able to find all the courses on the DeepLearning.AI website, in full detail. You can see that the output does not follow a particular structure, because the agent is more of a conversational assistant and answers as it would in a conversation. But if we had instructed it to give structured output, like JSON or Markdown, it would have followed that format. You can also see the final screenshot, where it stopped and started giving us the course listing.

Now you will create your own browser UI. Here, you initialize the MultiOn client as part of our session manager, whose implementation we have already provided in the utils folder. We will ask the agent to follow certain instructions: find the course on a subject and open it, summarize the course, and get the detailed course lessons.
And finally, we have a more complex example: go to the DeepLearning.AI homepage, subscribe to The Batch newsletter, use a particular name and email, and fill in any other required fields. We also give the agent a guideline to make sure that it selects the proper dropdown values and that it clicks the subscribe button once it sees it. Now, let's define our variables, such as the course subject, email, and name. Here we set the subject to RAG, the name to Div Garg, and the email to
[email protected]. Now we will create our own MultiOn browser UI and see the results alongside the page the agent is interacting with.

Now our MultiOn browser UI is fully loaded, and you can see it started on the DeepLearning.AI courses page. We will start with the first example: we ask the agent to find the RAG course and open it. You can see it entered RAG in the search bar and is waiting for the search results to load. Here it is scrolling to find the course that it wants to open and examine. It scrolls further to decide which RAG course to open. It then asks a question: it has seen multiple courses on RAG and wants us to give it a specific title. As you can see in the browser UI, there is a course on Multimodal RAG, so I will ask it to open that one. As you can see, the agent is able to interact with the browser and open the course on Multimodal RAG.

As the next instruction, we ask it to summarize the course. Here it has given us a summary of the course. Next, you can ask it for more detailed course lessons. Here it improves the format and gives us the variety of topics that the lessons cover.

Let's go to the course website and check whether we have the details right. You can see this is Multimodal RAG: Chat with Videos, with Vasudev Lal. Let us look at the course outline. You can see it has eight lessons and six code examples, and our agent was able to find the course lesson outline in a structured order. These are the basic capabilities of autonomous AI agents.

Let's do something even more exciting. We will ask the agent to go to the DeepLearning.AI homepage, subscribe to The Batch newsletter, and use the name Div Garg and the email
[email protected]. It should choose the other required fields, with country as United States and job title as software engineer. We also give it a guideline to make sure the proper dropdown values are selected by typing and then clicking on them. And finally, when it sees the subscribe button, it clicks it.

The agent is navigating to the DeepLearning.AI homepage. This might be a bit small to see, but it is able to enter Div Garg as the name and also the email. It might have reached the maximum number of steps, so we will ask it to continue. Now it has selected United States as the country. It missed one step, so we will ask it to also select the job title. The agent seems to be asking more questions just to make sure that it follows the correct guidelines, and I can reply: yes, that is correct. You can see that it is now entering the proper job title and trying to select it from the dropdown menu. It has finally selected it, and it clicked the subscribe button to subscribe to the newsletter.

So in this lesson, you learned how to create autonomous web agents: how you can give an agent simple instructions that it follows, and how it can produce a much more structured response without us manually describing it in code. You also saw how it can interact with the browser. You also saw that there are some challenges: it might forget some steps, or it might need more clarification or guidelines to follow. But agents are able to follow these steps fully autonomously in a multi-step manner. Now I encourage you to interact with this amazing browser UI, and if you are not subscribed to the DeepLearning.AI Batch newsletter, try it and see how it goes.
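The multi-step pattern used throughout this lesson, including the moment where the run hits the step limit and is asked to continue, can be sketched as follows. This is a minimal illustration under stated assumptions, not the lab's implementation: `ScriptedAgent`, `run_task`, the `"DONE"`/`"CONTINUE"` status strings, and the template wording are all hypothetical, and the email is a placeholder rather than the address used in the lesson. Only the limit of ten steps and the field values (name, country, job title) come from the narration.

```python
from string import Template

MAX_STEPS = 10  # the lesson caps each run at ten steps

# Instruction template; wording paraphrased from the lesson.
SUBSCRIBE_TASK = Template(
    "Go to the DeepLearning.AI homepage and subscribe to The Batch newsletter. "
    "Use the name $name and the email $email. "
    "Select country $country and job title $job_title from the dropdowns "
    "by typing and then clicking the matching value. "
    "When you see the subscribe button, click it."
)


class ScriptedAgent:
    """Hypothetical agent that needs a fixed number of steps to finish a task."""

    def __init__(self, steps_needed):
        self.steps_needed = steps_needed
        self.steps_done = 0

    def step(self, instruction):
        self.steps_done += 1
        status = "DONE" if self.steps_done >= self.steps_needed else "CONTINUE"
        return {"status": status, "message": f"step {self.steps_done}"}


def run_task(agent, instruction, max_steps=MAX_STEPS):
    """Call agent.step until it reports DONE or the step budget is used up."""
    history = []
    for _ in range(max_steps):
        result = agent.step(instruction)
        history.append(result)
        if result["status"] == "DONE":
            return True, history
    return False, history


task = SUBSCRIBE_TASK.substitute(
    name="Div Garg",
    email="user@example.com",  # placeholder, not the address used in the lesson
    country="United States",
    job_title="Software Engineer",
)

agent = ScriptedAgent(steps_needed=13)
second_run = []
finished, first_run = run_task(agent, task)
if not finished:
    # Mirrors the lesson: the step limit was hit, so ask the agent to continue.
    finished, second_run = run_task(agent, "Continue with the previous task.")
print(finished, len(first_run), len(second_run))  # prints: True 10 3
```

Returning a finished flag alongside the step history makes the "ask it to continue" decision explicit, which is the part of the demo where a human stepped in.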