In this lesson, you'll learn how to use the Google Cloud Carbon Footprint tool. This tool provides a more comprehensive measure of your carbon footprint by automatically estimating greenhouse gas emissions from all of your usage of Google Cloud. So let's dive into the code. We're going to be using Google Cloud, so, like before, you'll need to import the authenticate function so you can get access to the credentials and the project ID. Every Google Cloud project has a Carbon Footprint overview page in the Cloud console, which shows you a monthly carbon footprint estimate associated with a particular billing account, broken down across project, region, and product. If you have a personal Google Cloud project that you've been using, you can type "carbon footprint" into the search bar and you'll see something like this. If you want to do some more in-depth analysis, the carbon footprint data can be exported to BigQuery, which is Google Cloud's data warehouse tool. Once it's in BigQuery, you can use SQL to analyze the data or export it to JSON or CSV. In this lesson, we're going to explore carbon footprint data in this notebook using Python and SQL. Now, before we can actually get to coding, you'll need to understand just a little bit more about how carbon is usually measured and quantified. The Greenhouse Gas Protocol Corporate Standard defines three categories called scope one, scope two, and scope three. These categories help organizations measure their carbon impact, and what exactly contributes to each category will differ across organizations depending on the type of business they run and the type of activities they carry out. Scope two is what we've talked about so far in this course: it captures all of the indirect emissions from the purchase of electricity, heating, and cooling. When we talk about carbon and computing, scope two is the most relevant, because it includes the emissions related to electricity usage in Google Cloud data centers as a result of being connected to the grid. However, purchased electricity is not the only way we produce emissions. There's also scope one, which includes the direct emissions from sources controlled by an organization, like a gas range in a kitchen, or the fuel used to run shuttles if you were to operate a fleet of them. In the context of a Google Cloud data center, scope one could include something like an on-site backup generator. Then there's scope three, which captures all of the indirect emissions from assets not controlled by your organization. This includes categories like items bought from suppliers, waste disposal, and any kind of business travel. So in the case of Google Cloud, this could include the emissions associated with producing the GPUs in the data centers. Carbon impact therefore encompasses more than just the scope two electricity generation emissions we've focused on so far in this course, and by including scopes one and three in our estimates, we get a more holistic view of our impact. When we talk about this Carbon Footprint tool specifically, and the data we're going to look at in this lesson, what you're seeing is Google's scope one, two, and three emissions from data center operations. And as a developer, your usage of Google Cloud would actually fall under your own scope three emissions. Now, that was a lot of new jargon, so let's make all of this more concrete by looking at the data and seeing these numbers in action.
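As a quick refresher, the setup cell looks something like this (a sketch that assumes the course-provided helper module from earlier lessons is named `helper` and that `authenticate` returns the credentials and the project ID):

```python
# Load credentials and the project ID; the `helper` module name is
# an assumption based on earlier lessons in this course.
from helper import authenticate

credentials, PROJECT_ID = authenticate()
```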
The first thing we need to do is import the BigQuery library, and we'll also import pandas. So we'll do from google.cloud import bigquery, and then we can import pandas. For your own projects outside of this course, you're going to need to set up an export of the carbon footprint data to BigQuery. I've already done this for this course, so you won't need to set that up, but let me quickly show you how you would do it in the Cloud console, in case you want to try this out for your own projects and get a sense of your own carbon data for any work you might have been doing in Google Cloud. In the Cloud console, you would type in "carbon footprint", and then from here, you would click this export button, select the project you want to export the data to, and click Configure Export. Once you've done that, there are a few different customizations you can set. You can leave everything as the default; the only thing you need to set is the name of the dataset in BigQuery where you want all of the data to get transferred to. So I'm going to select the dataset called carbon_footprint, which is a dataset I made, and then you can click Save. Once you've done that, this will set up an export of the data and you'll be able to access it. But back to the notebook. Using the BigQuery Python client, we're going to write a function that executes a BigQuery query. This function will take as input a SQL query as a string, execute it in BigQuery, and return the results as a pandas dataframe. Let's define this function. We're going to call it run_bq_query, and it takes in sql, which will be a string. We'll use this function in just a minute. The first thing we need to do is create our BigQuery client. This is similar to the previous lesson, where we created the Cloud Storage client, except this time, instead of a storage client, we'll call bigquery.Client, and we need to pass in our credentials and our project. So we'll pass in the project and we'll pass in the credentials. Once that's done, we need to actually run the query. To do that, we're going to set up something called a job config, which is where you can pass any specific configurations, though we don't actually have any that we need here. Then we will actually query the BigQuery client, and to that we'll pass in the SQL and this job config. I'm also going to go ahead and grab the job ID, just so we can print it out, because that's nice to see. The very last thing we'll do here is wait for the job to finish, and we'll extract everything into a pandas dataframe using to_dataframe. Then we'll print out a nice little message saying that the job finished, print the job ID, and also return the dataframe. So, let's run this cell to define the function.
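Putting those steps together, a minimal sketch of the function might look like this (the variable names are illustrative, and `credentials` and `PROJECT_ID` come from the authenticate step earlier; `bigquery.Client`, `QueryJobConfig`, `job_id`, and `to_dataframe` are standard parts of the google-cloud-bigquery API):

```python
from google.cloud import bigquery
import pandas as pd

def run_bq_query(sql: str) -> pd.DataFrame:
    # Create the BigQuery client, passing in the project and the
    # credentials, much like the Cloud Storage client last lesson.
    bq_client = bigquery.Client(project=PROJECT_ID, credentials=credentials)

    # An empty job config; we don't need any special settings here.
    job_config = bigquery.QueryJobConfig()

    # Run the query, grabbing the job ID so we can print it later.
    client_result = bq_client.query(sql, job_config=job_config)
    job_id = client_result.job_id

    # Wait for the job to finish, then load the rows into pandas.
    df = client_result.result().to_dataframe()
    print(f"Finished job_id: {job_id}")
    return df
```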
Now that we have our function, let's try some queries so we can investigate the data. We're going to use SQL to do this, but don't worry if you're not a SQL expert; I'm not either. We're just going to walk through a few basic SQL commands that you can use to get a lot of information out of this data. The first thing we're going to do is take a look at a subset of the data: we're just going to pull the first five rows of the dataset, so we get a feel for what's actually going on in it and what all this scope one, scope two, scope three business actually means. So let's start defining our query. We're going to write this in SQL, but we'll format it as a string. We'll say SELECT *, and if you aren't familiar with SQL, that's going to pull all of the columns. Then we need to pass in the name of the BigQuery table where we have our data. I've gone ahead and put all of the carbon footprint data in a BigQuery dataset called carbon_footprint, in a table called sample_data, and this is the name of the project where it's located. So this is where all the carbon footprint data currently lives; I set this up beforehand. Then I'm going to say LIMIT 5 so that we only get the first five rows, rather than a whole bunch of data, because we just want to get a look and feel for it. Once we've defined our query, we can call this run_bq_query function from up here and get the data back. So let's go ahead and do that. Let's call the returned dataframe sample_df, so we'll say sample_df equals run_bq_query, and we'll pass in our query right here. Let's execute that, and when it's done, we'll see this little note saying the job finished, along with the job ID. So let's now print out this dataframe. Each row in this table represents the emissions for a particular Google Cloud service in a specific project over the time frame of one month. This is actually all real data from three of my projects in Google Cloud, so in it you're going to see different Google Cloud services that I used at different times for all of the demos and machine learning content and courses that I create. If we scroll through these columns, you'll notice that unfortunately there's not one single column called "emissions". If only it were so easy. That's because there are a few different ways to calculate and report this number. So let's zoom in on this column called carbon_footprint_kgCO2e, and in fact, let's go ahead and print out its value for this first row. You'll see that this carbon_footprint_kgCO2e field has three nested columns: scope one, scope two, and scope three, which are the scopes we talked about earlier. So what are these numbers telling us? Well, this is carbon footprint data I compiled from some of my Google Cloud projects, and this row specifically, if we print out the value of the service column, is from something called Cloud Run, which is a particular Google Cloud service. So these scope one, scope two, and scope three numbers are telling us the emissions that were produced by my Cloud Run usage that particular month. If we look at scope two here, you can see that 4.966 times 10 to the negative 5 kilograms are the emissions attributed to scope two, and this is estimated from the greenhouse gas emissions produced by the electricity the local grid provided where this compute workload was executed. Here is the value for scope one, and then this value for scope three. In the Carbon Footprint tool, these two numbers are calculated by taking Google Cloud's total scope one and scope three emissions and then apportioning them based on your specific Google Cloud usage.
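Here's a rough sketch of those exploration steps in the notebook. The project ID is a placeholder for wherever your export lives, and the dataset, table, and nested column names follow the Carbon Footprint BigQuery export schema described above, so double-check them against your own export:

```python
# Pull the first five rows to get a feel for the dataset.
query = """
SELECT *
FROM `your-project-id.carbon_footprint.sample_data`
LIMIT 5
"""

sample_df = run_bq_query(query)
sample_df.head()

# carbon_footprint_kgCO2e is a nested record holding all three scopes;
# printing it for the first row shows them side by side.
print(sample_df["service"].iloc[0])                  # e.g. Cloud Run
print(sample_df["carbon_footprint_kgCO2e"].iloc[0])  # scope1 / scope2 / scope3
```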
So to tie all this together, when we look at my usage of Cloud Run, it's not just the scope two emissions, which were the emissions produced by the electricity generated for that data center; this Carbon Footprint tool also gives us an estimate of the additional emissions that were created as a result of operating and running the data center where I actually ran this workload. Now, just a quick note. You might have noticed that scope two here has this extra key called location_based. This refers to the actual amount of CO2 equivalent emitted from the electricity consumed in the usage of a particular Google Cloud service. The reason this is called out as a specific key is that, in the future, there's going to be a market_based value here as well, which takes into account Google Cloud's renewable energy purchases for the workload. That number doesn't exist in the data right now, so you don't need to worry about it; you can ignore it. But in case you were wondering why scope two has a specific key here and you don't see that for scope one and scope three, that's why. The number is still the same, and the number here is what you care about. Now, in addition to this carbon_footprint_kgCO2e column, you'll see if we scroll over that there is also carbon_footprint_total_kgCO2e. So, let's print this out. If you want to know the total emissions for all three scopes, and not just the scope two electricity usage, this is the field to look at. And we can actually test this out and make sure the math is correct, because if we add up scope one, scope two, and scope three from here, we should get this total location-based number right here. So let's go ahead and do that: let's add up the value for scope one plus the value for scope two plus the value for scope three. And if we do that, there we go, the number is the same. So even though in this course so far we've really just talked about scope two emissions, if we want a full picture of our impact, adding in scope one and scope three and looking at this total number gives us a better estimate of our carbon emissions. Now that we've taken a quick look at the data, what types of interesting queries can we run? Well, if we want to know the total emissions from electricity generation for one particular service, let's say BigQuery, here's what you could run. We'll define a query and say SELECT, and then we'll do a SUM, adding up all of the scope two values in our data. We'll say FROM and specify our same BigQuery table where I store all the data, and then we'll add this little WHERE clause at the end, because we only want to add up scope two values for rows that are related to BigQuery. If we look back up here, we only want to grab the emissions from the rows that are for the BigQuery service, and not for other products I may have used on Google Cloud. Now that we've defined our query, we can pass it into our run_bq_query function, and we'll see a little note when the job is completed. Let's take a look. So here are the emissions. Just to recap: across the three Google Cloud projects we're looking at data from, 0.199 kilograms of CO2e were produced by all of my usage of BigQuery, which is pretty low. But then, BigQuery is not a product that I use too much.
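In code, the sanity check and the BigQuery-only sum might look like this (same placeholder project ID; the `service.description` field name follows the export schema):

```python
# Sanity check: scope 1 + scope 2 + scope 3 should match the total.
scopes = sample_df["carbon_footprint_kgCO2e"].iloc[0]
summed = (scopes["scope1"]
          + scopes["scope2"]["location_based"]
          + scopes["scope3"])
print(summed)
print(sample_df["carbon_footprint_total_kgCO2e"].iloc[0]["location_based"])

# Total scope 2 (electricity) emissions from the BigQuery service only.
query = """
SELECT SUM(carbon_footprint_kgCO2e.scope2.location_based) AS scope2_kgCO2e
FROM `your-project-id.carbon_footprint.sample_data`
WHERE service.description = 'BigQuery'
"""

print(run_bq_query(query))
```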
So let's try a different query now. What if you wanted to see the breakdown of carbon emissions across all three scopes, not just scope two, for a particular project, across month, region, and service? Here's a more complicated query that you might run. This time we're going to type SELECT, and the SELECT is just defining the columns from our data that we want to pull. I want to pull usage_month, which is the month the data is from; the service description, which tells us which Google Cloud service that row refers to; and also the location, which tells us which data center that workload was running in. Then we'll also grab the total kgCO2e field, which is all of the carbon that was produced across all three scopes. Again we'll say FROM, and this will be from our same table, the sample_data table where I put all the carbon footprint data. Now we need to define a WHERE clause to specify the project number we care about. This data has three different projects in it, and we're going to look at project number 11111. Obviously my project number wasn't actually 11111; to anonymize the data I made up a new billing account number, some project numbers, and a few other things, and that's what that number is. Then we want to order this data, and we'll order it by two fields: the usage month and then the service description. Finally, we need to close out our string. So let's run our same run_bq_query function, pass in this new query, and print out the results once it completes. Again, this new table is basically a filtered version of what we were looking at earlier, but just for one particular project. So each row here tells me that in this specific project, 11111, I was using Cloud Build in June 2021, along with the resulting carbon emissions that were produced, and it also tells me the region where I actually ran that workload. Let's take a look at one more query. If you wanted to know the total amount of emissions across all three scopes for a specific project, here's what you would run. First of all, define the query, and we'll format this as an f-string. Then we'll do our SELECT, and we're going to do a SUM over the carbon_footprint_total_kgCO2e location_based values, and we'll alias that as carbon_emissions so the result comes back with a nice column name instead of an auto-generated one. We also want to select the project number column. Then we'll define our FROM statement and specify the carbon footprint table we've been getting all this data from, and lastly we'll group this by our project number. Then we can execute this query using our run_bq_query function and print out the results. So now, for the three projects in this dataset, you can see the total amount of carbon emissions. Just to give you a little bit of context: according to figures from the German nonprofit organization atmosfair, flying from London to New York and back generates about 986 kg of CO2 per passenger. So this one project here is just a little bit less than two flights from London to New York and back, and that's all just from training machine learning models and building demos, which is what I do in these projects. Now, if we were to actually sum these three numbers using this query here, we could see the total amount of emissions from all of my Google Cloud activity in these projects. Let me just put my SQL skills to the test here and make sure this is actually accurate. So here I will take this first number and add the others. Let's hope these numbers actually add up. Okay. Phew. They do. All right, so I was right in defining my queries.
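As a sketch, those two queries could look like this (placeholder project ID again; `usage_month`, `project.number`, and `location.location` follow the export schema, and 11111 is the anonymized project number from above):

```python
# Month-by-month breakdown for one project, split by service and region.
query = """
SELECT
  usage_month,
  service.description AS service_description,
  location.location AS location,
  carbon_footprint_total_kgCO2e.location_based AS total_kgCO2e
FROM `your-project-id.carbon_footprint.sample_data`
WHERE project.number = 11111
ORDER BY usage_month, service_description
"""

breakdown_df = run_bq_query(query)
print(breakdown_df)

# Total emissions across all three scopes, grouped by project.
query = """
SELECT
  SUM(carbon_footprint_total_kgCO2e.location_based) AS carbon_emissions,
  project.number AS project_number
FROM `your-project-id.carbon_footprint.sample_data`
GROUP BY project_number
"""

totals_df = run_bq_query(query)
print(totals_df)
```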
So if we take this number right here and divide it by 986, which was the number of kilograms of CO2 per passenger flying from London to New York and back, we get 28, almost 29. So that's close to 29 round trips between London and New York that I produced, just from training ML models and building demos and all of those other things that I do in these Google Cloud projects. This number was pretty surprising to me. Personally, I think it's easy to see the way we have an impact on the environment when we use single-use plastic or put gasoline in our car, something a little bit more tangible. But executing cells in a notebook and running code can feel pretty abstract sometimes, and so it's difficult to tie what we do when we're writing code to the actual impact we can have on the environment. Being able to see my carbon footprint data like this, with the numbers laid out so clearly, was really eye-opening for me. So now it's your turn to experiment with the data and see if there are other interesting queries you would like to run. And if you're like me and not super familiar with SQL, this data is small enough that you can load all of it into a pandas dataframe in this notebook, so you can use pandas instead if you'd rather analyze the data that way. Let me quickly show you how to do that. We'll define a query that just takes all of the data from our table, so we'll pull everything, and if we pass this query into our handy helper function, what we get back is a dataframe that has all of the BigQuery data, but now in pandas, so you can analyze it directly using Python.
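In code, that might look something like this (same placeholder project ID as before):

```python
# Pull the entire table into a pandas dataframe for local analysis.
query = """
SELECT *
FROM `your-project-id.carbon_footprint.sample_data`
"""

full_df = run_bq_query(query)
full_df.info()
```

So that's all for this lesson, where we saw how to get a complete picture of the carbon emissions produced by all of our activities on Google Cloud. In the next lesson, we'll talk about some next steps and further reading, as well as some concrete steps you can take in Google Cloud to lower your carbon footprint. I'll see you there.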