As AI Ubiquity Looms, CAST AI Balances Cloud Efficiency & Environmental Impact

Episode 34: As AI Ubiquity Looms, CAST AI Balances Cloud Efficiency & Environmental Impact

Today, we are witnessing the price of progress. As generative AI evolves swiftly amid a boom in adoption, the marvels of artificial intelligence come with astounding costs and challenges. The VC community and tech giants have poured billions of dollars into startups specializing in generative AI without fully reckoning with the underlying costs that threaten the current boom.

As of June 2023, ChatGPT was receiving 60 million visits per day, with 10 million queries per day. As of April 2023, running ChatGPT was estimated to cost $700,000 per day, at an average cost of $0.36 per question. In June, however, Tom Goldstein, an AI/ML professor at the University of Maryland, estimated the daily cost of running ChatGPT at approximately $100,000 and the monthly cost at US$3 million.

A recent article profiled one startup, Latitude, which found itself grappling with exorbitant bills as its AI-powered games, like AI Dungeon, gained popularity. Latitude’s text-based role-playing game used OpenAI’s GPT language technology, so costs soared in proportion to the game’s usage. Content marketers’ unexpected use of AI Dungeon to generate promotional copy further strained the startup’s finances.

One of the primary reasons for the high cost of generative AI is the substantial computing power required for “training and inference.”

I met with Laurent Gil, former lead of Oracle’s Internet Intelligence Group and current co-founder of CAST AI, an ML-powered cloud optimization platform that analyzes millions of data points in search of the optimal balance of high performance at the lowest cost. CAST AI determines how much you can save, then reallocates your cloud resources in real time to hit that target with no impact on performance.

Transcript

Hessie Jones

Hi everyone. I’m Hessie Jones, and welcome to Tech Uncensored. This is a special edition: we’re live at Collision for the next three days, and we’re going to showcase some amazing companies. Today we’re going to talk about generative AI and the cost of embracing it from a cloud perspective. I’m here with Laurent Gil, who formerly led Oracle’s Internet Intelligence Group and is now the co-founder of CAST AI. Laurent is here at Collision talking about cloud computing, especially in the age of generative AI. Welcome, Laurent!

Laurent

Thank you. It’s a pleasure to be here as usual and it’s a really cool show as well.

Hessie Jones

I’m going to start with a bunch of stats, especially from the perspective of ChatGPT. As of June 2023, ChatGPT received 60 million visits a day and 10 million queries a day. How much does it cost to run? About 700 thousand U.S. dollars per day, or $0.36 per question. The report estimated that we’ll need 30,000 more GPUs just to maintain ChatGPT’s current trajectory through 2023. So from that perspective, it seems like the future is going to draw heavily on our electrical grid. Let’s talk about that. Can we start with cloud usage today? Is it fair to say, from an infrastructure perspective, that before ChatGPT came onto the scene, a lot of it was used for storage?

Laurent

Oh, it’s interesting what you say. We see it a little bit differently. If you analyze cloud in general, and for AWS, Azure and Google it’s about the same, roughly half of your bill will be for compute. When I say compute, I mean mostly CPUs and memory; that’s like 90% of it. The other half is services, which includes storage, database services, everything else. What we see for AI companies is that the bill usually squeezes more toward compute and less toward the rest, because most of the cost of running these models is on compute: GPUs and, as you mentioned, CPUs and memory. They are very heavy users of that.

What is interesting, though, listening to your stats on ChatGPT, is how recent this is. We have many customers on the cloud, very good visibility across all three providers in all regions all over the world, and we are managing and optimizing millions of CPUs every day. This is the trend we see, and three months ago my answer would have been different; that’s why I say it’s recent. A lot of AI companies are now spending a lot of dollars training their models. ChatGPT was trained six months ago over a long period of time, but we now see many companies training specialized AI models. You can tell, because their CPU usage might be zero, or like 50 CPUs, almost nothing, and then it goes up to 50,000 CPUs and GPUs, huge applications for like three hours, and then back down to zero. That is training; that is the cost of doing training for machine learning. It’s not inference, it’s not yet using the model, it’s training it. And these are the extremely huge, heavy uses of compute that we see.

Hessie Jones

Can I interject for a sec? The companies that can afford to train the models, are they later-stage startups that can afford the compute cost? Because to me, if generative AI is the future, then there’s already going to be a disparity between those that can actually use this type of technology and those that cannot.

 

Laurent

There are two kinds of AI engines. The first, let’s call them the generic ones, and OpenAI is one of them. Generic means they need a huge amount of compute for a very long period of time, because they train on a huge amount of data; that’s where the cost comes in. I’m talking about thousands, tens of thousands, hundreds of thousands of CPUs reserved for those generic, huge companies. I think there will probably be fewer players there, because it’s so expensive to run. You see some VC rounds, Series A and Series B, in Europe and elsewhere, where they raise $100 million, and usually all of it goes to pay for those machines; either they rent them, or they buy them. That’s one kind of machine learning company.

The second kind, which I am a lot more excited about, is all the startups and enterprises building specialized models. A specialized model is the same idea as GPT: I want something friendly to ask a question, whether an electronic question or a human question, but the model is very specialized in solving one thing, and it solves it very, very well. In my mind, this is where the industry is going: you sell a specialized model that does one thing very, very well. It doesn’t need to know what time your flight is leaving; it just needs to solve the question you are asking. This is the kind of company where you need a huge amount of compute for a short period of time, not six months or more as ChatGPT required. This is where the interesting piece of the action is. And the interesting value is not so much in the models, which are very well known, but in the data. If I own data that is very unique to me and I train an engine on it, that means I will have the best answers for the industry I’m active in. I think that’s a new economy that is coming.

 

Hessie Jones

If we’re talking about accessibility, then from that perspective, I don’t know if you can say, but what would an average monthly cost or usage be for a start-up that is doing something very specific?

 

Laurent

I’ll give you the statistics we have, and they’re mind-blowing. We optimize cloud cost, and we do it in real time, so we see our customers before they come to us and after we optimize, and we know a lot about the cost of running an AI engine. For the examples I just gave you, where you need to train a model for a few hours or a few days and then usage goes down to zero, the average cost optimization we provide is in the range of 80 percent. Think of it this way: before they come to us, they will spend $100 to train their model, on average. After we optimize, that 100 becomes 20. The reason is that for these models and companies, it is very hard to estimate by hand, which is how they do it today, what the true amount of compute needed is, and then, once you know that number, to provision it only for the period when you need it. We built an AI engine that understands exactly that. It provisions only what you need, nothing less, but also nothing more. And as soon as the model is done, or as soon as the demand for compute goes down, we delete all those machines in real time, very fast. That’s where the 80% cost optimization comes from. There’s no magic there; it’s the idea that we built an AI that optimizes AI. You can see it this way: we built an engine that knows exactly what you need, supplies it from the cloud provider, and then shuts it down gradually as soon as you stop using it. We help a lot of these young startups that need to train an engine that is very expensive. We tell them: look, it’s very expensive, but it’s going to be five times less, because you don’t need all of this; our engine knows exactly what you need, and we supply it in real time. This is universal across all the users we see coming to us, and we have a lot of them. Those are the statistics we see for AI training.
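The before-and-after economics Laurent describes can be sketched as a comparison between static peak provisioning and demand-following provisioning. Everything below, including the function names, the price, and the demand profile, is illustrative and not CAST AI’s actual API:

```python
# Hypothetical sketch: static peak provisioning vs. demand-following provisioning.
# Names, prices, and the demand profile are illustrative, not CAST AI's API.

def static_cost(peak_cpus: int, total_hours: float, price_per_cpu_hour: float) -> float:
    """Reserve peak capacity for the whole window, used or not."""
    return peak_cpus * total_hours * price_per_cpu_hour

def elastic_cost(demand_by_hour: list[int], price_per_cpu_hour: float) -> float:
    """Pay only for the CPUs actually demanded in each hour."""
    return sum(cpus * price_per_cpu_hour for cpus in demand_by_hour)

# A bursty training job: idle, a 3-hour spike to 10,000 CPUs, then idle again.
demand = [0] * 10 + [10_000] * 3 + [0] * 11   # one 24-hour day
price = 0.05                                  # illustrative $/CPU-hour

static = static_cost(10_000, 24, price)       # pay for the peak all day
elastic = elastic_cost(demand, price)         # pay only during the spike
print(f"static ${static:,.0f} vs elastic ${elastic:,.0f}, saving {1 - elastic / static:.0%}")
```

With a 3-hour spike inside a 24-hour window, the elastic approach here saves close to 90 percent, in the same ballpark as the 80% figure Laurent cites for bursty training workloads.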

 

Hessie Jones

OK, so I’m going to throw something at you from, I would say, an environmental sustainability perspective. We know one of the reasons Bitcoin’s decentralized ledger hasn’t been widely adopted is the amount of computing it demands from our electrical grid. That’s an environmental cost, and now we’re seeing the same thing coming out of generative AI. I realize that you are trying to create a lot more efficiency from that perspective. But if every new startup that has to deal with data eventually defaults to some kind of generative AI, what do you think that means for how the industry needs to change or adapt to that kind of demand?

 

Laurent

On one side, the obvious one: huge investments. You can see news about it almost every day. VCs are funding it, and NVIDIA’s GPU sales come from that for the most part. Now, these GPUs must go somewhere; they will go to one of the three big cloud providers, or to all three of them. Great, the economy benefits, because once these GPUs are there and available, we can start to use them. However, what I see as fundamental in this is the energy consumption. It’s not just that you need to build new buildings to house these machines; it’s that they need a huge amount of electricity, especially the GPUs, and this is where we come in. When I give you the statistic of 80% cost reduction, yes, it’s a cost reduction in dollars, but it’s truly a reduction in utilization. If you mobilize 10,000 CPUs for four hours versus mobilizing 10,000 CPUs for 30 minutes, your energy consumption is very, very different. So the same optimization you have on dollars of CPU usage translates directly into energy consumption, and that’s a great byproduct we see across all our clients. We’re actually trying to measure it now. It’s fascinating when you see it on the dashboard. It’s nice to see a dollar sign, so you know how much you are saving, and we are starting to add the CO2 savings to that as well, as a byproduct.
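Laurent’s 10,000-CPU example is back-of-the-envelope arithmetic over CPU-hours; a minimal sketch, where the 10 W per-CPU draw is an assumed figure, not one from the interview:

```python
# Energy scales with CPU-hours, so shortening utilization cuts consumption
# proportionally. The 10 W per-CPU draw is an assumed, illustrative figure.

def energy_kwh(cpus: int, hours: float, watts_per_cpu: float = 10.0) -> float:
    """Rough estimate: CPU count x hours x per-CPU draw, converted to kWh."""
    return cpus * hours * watts_per_cpu / 1000

before = energy_kwh(10_000, 4.0)   # 10,000 CPUs mobilized for four hours
after = energy_kwh(10_000, 0.5)    # the same fleet, but for 30 minutes
print(f"{before:.0f} kWh -> {after:.0f} kWh, a {1 - after / before:.0%} reduction")
```

Whatever wattage you assume, cutting the runtime from four hours to 30 minutes cuts the energy by the same ratio as the dollar cost, which is the direct correlation Laurent describes.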

 

Hessie Jones

Oh, that’s good. You are counting carbon footprint savings!

 

Laurent

We have a few clients for whom it is actually mandatory, so we are calculating this number, and it’s staggering, because there is a direct correlation. Of course, a CPU that is idle doesn’t consume as much as one running at 100%, so you must factor that in. But effectively we have an impact in two ways. First, energy consumption is much lower, because you don’t need these machines for as long. Second, these machines become available for someone else, and that’s a great byproduct. If you use them for two hours instead of four, then for the other two hours the cloud provider can resell them to someone else, which means they don’t need to build more, because they have the capacity. We prevent a lot of the waste in usage, so that’s a nice byproduct for the cloud provider itself. We talk with some owners of data centers, and they are starting to realize it. We tell them: you have 10,000 servers in your data center; it will feel the same as having 16,000, because we’re going to use them so well that your existing customers’ footprint will shrink, and the capacity that frees up you can now sell again to other players. Don’t build a new data center; just use the machines you have better.
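The “10,000 servers feel like 16,000” claim is, at bottom, a utilization ratio. A minimal sketch, where the baseline and improved utilization figures are assumptions rather than numbers from the interview:

```python
# The "10,000 feel like 16,000" claim as a utilization ratio. The baseline and
# improved utilization figures are assumptions, not numbers from the interview.

def effective_servers(physical: int, old_util: float, new_util: float) -> float:
    """Physical fleet size scaled by the utilization improvement."""
    return physical * new_util / old_util

# Raising average utilization from an assumed 50% to 80% makes 10,000 servers
# deliver the workload-hours of a 16,000-server fleet at the old rate.
print(effective_servers(10_000, old_util=0.50, new_util=0.80))
```

Any pair of utilization figures in the same 1.6x ratio yields the same result; the point is that reselling freed capacity multiplies the effective size of an existing fleet.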

 

Hessie Jones

Yeah, that seems to be almost like the future, and I see this kind of innovation even with the Internet, where they say utilization usually peaks at night, and during the day, if capacity is available, you could actually give it to other people. That’s the same premise.

 

Laurent

It’s exactly the same. It’s exactly the same idea, but for compute, and the realization that cloud was invented for a great reason that we love: it’s elastic. You buy when you need it and return it when you don’t, and if you have tools that follow the utilization, like I just said, then effectively you save in energy and in cost, because you only pay for what you use. In the past you would say, well, I don’t really know what I need, so I will add all of it, just in case. Here we say no: just pay for what you need, when you need it, and use some smart technology to do it. We are one such tool, but we have a lot of colleagues in the same field, and it’s great that there are other players doing the same thing, because together we help solve the shortage of machines, we help solve the energy consumption problem, and we help the data centers better utilize their investment, because now they can resell what they own a few times over. In the end, it’s better for the economy and the industry.

 

Hessie Jones

That’s amazing. One last question: what are your goals for this year, and are there specific milestones you need to hit to get to those goals?

 

Laurent

Yes, so a little bit selfish here. We have a fantastic team; about two-thirds are in Europe, in Vilnius, Lithuania. There’s another big player in Vilnius, which is super cool, and a lot of talented people there. It’s a pleasure when I go to my main office over there in Lithuania, a Baltic state in Europe, about 100 people. Now we’re going to triple the size of the company. We just raised a big Series A two months ago, so we’re going to triple the team. We have a lot of users and customers coming to us, including big organizations, and a lot of them are AI-based. They need to find compute and they cannot find it; we help them find exactly what they need, so it’s great, it’s fantastic for us. This industry is booming, and we are surfing the wave the same way as others. For us, the more GPUs NVIDIA sells, the better the world is; that’s how we see it, and the better the consumption and utilization of these resources will be. It’s really the beginning of it. A lot of people tell me, but NVIDIA is at a high, the stock goes up and it will come down. No. You should think of it this way: right now we are in the training phase. We see that with our clients, meaning huge consumption for short periods of time. That means training; we’re making these models knowledgeable enough to solve a problem. The next phase is using them, and we haven’t had a lot of that yet; ChatGPT was the first one. Using them is where the economy is, where the industry will grow. It’s hiding behind the hype of buying GPUs to train, but just behind that is using them, and that’s where the industry will explode.

 

Hessie Jones

I’m looking forward to seeing it and congratulations on the raise.

 

Laurent

Thank you!

Host Information
Hessie Jones

Hessie Jones is an Author, Strategist, Investor and Data Privacy Practitioner, advocating for human-centred AI, education and the ethical distribution of AI in this era of transformation. 

She currently serves as the Innovations Manager at Altitude Accelerator, where she supports Altitude Accelerator’s programs, including the Incubator and Investor Readiness, and acts as the liaison among key stakeholders to provide operational support and ultimately drive founder success.

LinkedIn

You can also listen to this podcast on Transistor.

Please subscribe to our weekly LinkedIn Live newsletters.