In this lesson, you will build a simple web agent capable of scraping the web and returning results as structured output based on natural-language instructions. All right, let's dive into the code. So, in this lesson we are going to build a simple web agent; we are calling it a learning recommender agent. We start by navigating to the DeepLearning.AI website, and we ask the agent to scrape the courses, list a specific course on a specific topic, and then read the details and learning objectives of that course. Now let's go to the lab.

We start by importing our libraries. In this lesson we will use pandas, Playwright's async API, the OpenAI client, and image utilities from PIL and IPython to display our screenshot data. Then we bring in our helper functions. Next, we initialize our OpenAI client with the API key, and we also make sure our notebook can run asynchronously. I'm going to run this. The key is already set up for you on the DeepLearning.AI platform.

We start by creating a simple web scraping agent. Here we describe a web scraper agent and initialize it. We also initialize the browser from which we scrape our HTML content, add the ability to take screenshots (converting them into a screenshot buffer for better display), and finally add a way to close the browser.

Then we define our structured data format. Here we describe a DeepLearning.AI course, which has a title, a description, the presenter, an image URL, and the course URL, and then a container for multiple DeepLearning.AI courses. Next we set up our OpenAI call. Here we specify the model, gpt-4o-mini, pinned to a specific version that works well for returning structured responses.
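The structured course format described above can be sketched as a simple schema. The lab most likely defines Pydantic models for OpenAI's structured outputs; here is a stdlib-only sketch using dataclasses, where the class names, field names, and example values are illustrative assumptions rather than the lesson's exact code.

```python
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class DeeplearningCourse:
    # One course card scraped from the DeepLearning.AI site
    # (field names are illustrative; the lesson's actual model may differ).
    title: str
    description: str
    presenter: str
    image_url: str
    course_url: str

@dataclass
class DeeplearningCourseList:
    # The agent returns multiple courses in one structured response.
    courses: List[DeeplearningCourse] = field(default_factory=list)

# Placeholder values for illustration only.
course = DeeplearningCourse(
    title="Long-Term Agentic Memory with LangGraph",
    description="Build agents that remember across sessions.",
    presenter="Example Presenter",
    image_url="https://example.com/course.png",
    course_url="https://example.com/courses/example",
)
catalog = DeeplearningCourseList(courses=[course])
print(asdict(catalog)["courses"][0]["title"])
```

With a Pydantic version of the same schema, the model class itself can be passed to the OpenAI client as the expected response format, which is what makes the "structured output" part of the lesson work.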
Next we describe our system prompt, which defines the role of the agent. Here, we ask it to act as a web scraping agent whose task is to find the relevant information and convert it from HTML to JSON. We also allow you to pass in your own instructions. Its objective is to return the title, description, presenter, image URL, and course URL of all the courses on the DeepLearning.AI website, and it is instructed to return only JSON, with no other formatting. Then you give the HTML content to the agent, along with the response format you want your structured output to follow, and it returns the structured response.

The web scraper function is where everything comes together. You first get the HTML content, then take a screenshot, and finally pass it all to the LLM to process and return a structured response. To clean up properly, we also close our scraper.

Now let's work on our first example. We start with our target URL, the web page we are scraping, and we also define a base URL that we need for displaying the final course URL. Then we give it an instruction. It's a simple one: we ask it to get all the courses, and we get the result and screenshot back from the web scraper. We run this. The agent starts by extracting the HTML content, takes a screenshot, and finally processes the results into our scraped data. This might take some time, but we will speed it up in post. And finally, we have a structured response. Using the visualize-courses helper function (the details are in the helper file), you can visualize your results. And here's our output. As you can see, it scraped all the courses on the DeepLearning.AI website. It starts with Long-Term Agentic Memory with LangGraph: a description, who the presenter is, the image URL, and finally the course URL that you can navigate to directly.
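The prompt assembly described above can be sketched as follows. The exact prompt wording and function names here are assumptions, and the actual API call is stubbed out in a comment, since the lab's version passes the course model as the response format to OpenAI's structured-output endpoint.

```python
import json

# Illustrative system prompt mirroring the lesson's description:
# scrape relevant information from HTML, emit only JSON.
SYSTEM_PROMPT = (
    "You are a web scraping agent. Extract the relevant information from the "
    "HTML and convert it to JSON. For every course on the page, return the "
    "title, description, presenter, image url, and course url. "
    "Return only JSON, with no other formatting."
)

def build_messages(html_content: str, instructions: str) -> list:
    # Combine the user's natural-language instructions with the raw HTML,
    # in the chat-message structure the OpenAI client expects.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{instructions}\n\nHTML:\n{html_content}"},
    ]

messages = build_messages("<html>...</html>", "Get all the courses")
# In the lab, these messages would then be sent to the model together with
# the structured response format (the course-list schema), e.g. via the
# OpenAI client's structured-output parsing, and the parsed result returned.
print(json.dumps(messages, indent=2)[:60])
```

Keeping the instruction text separate from the system prompt is what lets you reuse the same scraper for "get all the courses," "only RAG courses," and so on, by swapping a single string.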
The same goes for LlamaIndex, Windsurf, Arize, and many more DeepLearning.AI courses. You can also visualize the screenshot of the page the web scraper agent got these courses from.

Now let's work on a second example. What you can do is have the LLM read the description of each course, understand what the course is about, and retrieve only the courses you are interested in. You can ask the agent to read the full description of each course: the title, a course summary, who the presenter is, and overall details that are not visible on the page. That way the LLM has information that is not visible to you, giving it an inherent understanding of what each course is about. So we give it an instruction to read the description of each course and return only three courses about a given subject, and we make sure it does not output any other course. The subject we are choosing here is "Retrieval Augmented Generation (RAG)." Now let's process our results. With the response to our instruction generated, we can use the visualize function again to see our structured response. Now the courses our agent returns are all about RAG, and you can see that it found the RAG courses, with their descriptions, tags, and labels.

Now let's look at some of the challenges we face with our web agents. Suppose we give it a slightly more complex example: get the summary of each course and the learnings it provides, given the subject. Let's see what it returns. We use our visualization function again to visualize what the agent returned. The instruction was to give us the summary of each course and the learnings we get from it; let's check whether it followed that correctly.
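To make the filtering idea above concrete, here is a deterministic stand-in for what the LLM does: read each course's text and keep only the ones about a chosen subject, capped at three results. A keyword match is far cruder than the LLM's semantic understanding, and the mini-catalog below is made up purely for illustration, but the shape of the task is the same.

```python
def filter_courses(courses: list, subject: str, limit: int = 3) -> list:
    # Keep at most `limit` courses whose title or description
    # mentions the subject (case-insensitive substring match).
    subject = subject.lower()
    matches = [
        c for c in courses
        if subject in c["title"].lower() or subject in c["description"].lower()
    ]
    return matches[:limit]

# Made-up example catalog, for illustration only.
scraped = [
    {"title": "Building RAG Apps", "description": "Retrieval Augmented Generation basics."},
    {"title": "Intro to Diffusion", "description": "Image generation models."},
    {"title": "Advanced RAG", "description": "Evaluating retrieval pipelines."},
]
print([c["title"] for c in filter_courses(scraped, "RAG")])
# → ['Building RAG Apps', 'Advanced RAG']
```

The LLM version is strictly more powerful: it can match a course whose description implies RAG without ever using the word, which is exactly the "information not visible to you" advantage described above.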
But the agent was not able to follow our instructions correctly: it returned all the courses available on the website. Now, you might be able to improve the agent by having it go deeper into each course, opening the course page to get the learnings, description, summary, and all the teachings, and then outputting that as a structured result. But then you would be overfitting to this particular data. What we really want is an agent that understands our task well and generalizes. In this lesson, we learned how to build simple web agents that are capable of scraping the web and following our instructions. We also saw the challenges: sometimes the instructions are unclear, or the agent simply does not have the capability to follow them, and it gives us the wrong output. In the next lesson, we'll go over how we can overcome some of the challenges we've seen in our current web agent.