Is open source AI catching up to proprietary models like GPT-4?

Yes, per Galileo CTO Atin Sanyal. He pointed to Llama 3.1 and 3.2 outperforming GPT-4o on many benchmarks and noted the cost of intelligence has dropped enough to run these models on a laptop. The bigger advantage is open weights: you can fine-tune them with far more freedom than a closed fine-tuning API. Vikram Chatterji and Yash Sheth agreed both have a place — open source acts as a distribution channel and commoditizes the expensive training layer.

Why do enterprises need a trust layer for AI applications?

Because, as Yash Sheth put it, AI development has gone “back to the stone ages of the SDLC” — manual, no clear process, “vibe checks” instead of the IDEs, CI/CD, monitoring, and firewalls that let traditional software ship safely the same day. Atin Sanyal noted you can’t manually check every one of a million production queries, so without a deterministic evaluation and trust layer, teams can’t guarantee an app won’t damage their reputation or erode user trust. Galileo raised a $45M Series B to build that layer.
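
The trust-layer argument rests on scale: nobody can hand-check a million production queries, so the checks themselves have to be code. Here is a minimal sketch of an automated release gate. It assumes a hypothetical logged-record schema and three deliberately simple deterministic checks: non-empty output, a banned-term list, and a word-overlap proxy for groundedness. None of these names come from Galileo's product or API, and real evaluators are far more sophisticated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Record:
    """One logged production interaction (hypothetical schema)."""
    query: str
    context: str   # text retrieved or supplied to the model
    response: str  # what the model actually said


# A check returns True when a record passes.
Check = Callable[[Record], bool]


def non_empty(r: Record) -> bool:
    return bool(r.response.strip())


def no_banned_terms(r: Record, banned: tuple[str, ...] = ("ssn", "password")) -> bool:
    text = r.response.lower()
    return not any(term in text for term in banned)


def grounded(r: Record, min_overlap: float = 0.5) -> bool:
    # Crude groundedness proxy: share of response words that also appear in the context.
    words = r.response.lower().split()
    if not words:
        return False
    ctx = set(r.context.lower().split())
    return sum(w in ctx for w in words) / len(words) >= min_overlap


def evaluate(records: list[Record], checks: dict[str, Check]) -> dict[str, float]:
    """Run every check on every record and return each check's pass rate."""
    if not records:
        raise ValueError("no records to evaluate")
    n = len(records)
    return {name: sum(check(r) for r in records) / n for name, check in checks.items()}


def release_gate(rates: dict[str, float], threshold: float = 0.95) -> bool:
    """Block the release unless every check passes at or above the threshold."""
    return all(rate >= threshold for rate in rates.values())


records = [
    Record("What is the capital of France?", "paris is the capital of france",
           "Paris is the capital of France"),
    Record("How do I reset my login?", "use the reset link on the sign-in page",
           "Your password is hunter2"),
]
rates = evaluate(records, {
    "non_empty": non_empty,
    "no_banned_terms": no_banned_terms,
    "grounded": grounded,
})
print(rates)                # {'non_empty': 1.0, 'no_banned_terms': 0.5, 'grounded': 0.5}
print(release_gate(rates))  # False
```

Because every check is a plain function, the same gate could run in CI against a test set before launch and against sampled production traffic afterwards.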

How much of Google’s code is written by AI?

About 25%, a stat Sundar Pichai shared on Google’s earnings call. The Galileo founders framed it as real proof of value rather than hype: Atin Sanyal said it likely automates the most obvious, boilerplate “grunt work” — the code developers are too lazy to write — freeing engineers for more nuanced problems. Conor Bronsdon added a parallel from Amazon’s Andy Jassy, who cited saving 4,500 developer-years upgrading code from Java 11 to Java 17.

How does Writer get enterprise AI accuracy without making customers fine-tune?

CEO May Habib says Writer ships domain-specific models — for financial services, customer support, creative/marketing, and medical — preconfigured and close to the use cases vertical by vertical, so customers don’t need to fine-tune. In its AI Studio, Writer captures business logic by pairing it with examples rather than data, and examples need far less data than fine-tuning. Writer also runs its own QA teams staffed with PhDs to build evals even when the customer hasn’t.
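
Habib's point that examples need far less data than fine-tuning can be illustrated with few-shot prompting. The business rule and a handful of worked examples go into the prompt at inference time, and no model weights change. This is a generic sketch under that assumption. The `Example` and `BusinessRule` types, the prompt layout, and the ticket labels are hypothetical and are not Writer's AI Studio API.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    text: str      # an input the business actually sees
    expected: str  # the output the business wants for it


@dataclass
class BusinessRule:
    instruction: str
    examples: list[Example] = field(default_factory=list)


def build_prompt(rule: BusinessRule, user_input: str) -> str:
    """Assemble a few-shot prompt: the instruction, the worked examples, then the new input."""
    lines = [rule.instruction, ""]
    for i, ex in enumerate(rule.examples, start=1):
        lines += [f"Example {i}", f"Input: {ex.text}", f"Output: {ex.expected}", ""]
    lines += [f"Input: {user_input}", "Output:"]
    return "\n".join(lines)


rule = BusinessRule(
    instruction="Classify each support ticket as REFUND, SHIPPING, or OTHER.",
    examples=[
        Example("My package never arrived.", "SHIPPING"),
        Example("I was charged twice, please send my money back.", "REFUND"),
    ],
)
prompt = build_prompt(rule, "Where is my order?")
print(prompt)
```

The design trade-off: two examples stand in for a labeled training set, but they are re-sent with every request, so this approach spends per-call context in exchange for skipping a training run.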

Should AI regulation target foundation models or applications?

The Galileo founders favored guardrails at the application layer. Yash Sheth warned that restricting foundation models upfront would “drastically reduce the number of use cases” and stifle innovation; instead, model makers should disclose training data to avoid harmful or biased content, while application developers handle domain-specific compliance (e.g., healthcare). They noted regulation is fragmenting across state and federal levels, and that AI policies jumped from 25 in 2023 to 82 by March 2024 and near 100 by year-end.


The State of AI: Open-Source Models & Enterprise Trust | May Habib

Nov 6, 2024 · May Habib, Writer · 49 min

Open Source AI · AI Evaluation & Reliability · AI Observability · Enterprise AI


Key takeaways

Atin Sanyal argued open source is overlooked relative to proprietary hype: Llama 3.1 and 3.2 now outperform GPT-4o on many benchmarks, the “cost of intelligence” has fallen enough to run models on a laptop, and open weights give far more degrees of freedom to fine-tune than a closed fine-tuning API.
Vikram Chatterji reframed corporate open source as a distribution channel, not altruism: Meta open-sourced Llama as a “lead generation mechanism” and brand play after lagging. He contrasted it with Google giving BERT away for free while OpenAI invested in more parameters and larger context windows and “won” the early language-model war.
Yash Sheth said Meta spending hundreds of millions to train open models commoditizes the costliest part — training and data gathering — so banks, healthcare, defense, and finance teams can build proprietary AI on top, massively furthering adoption that the proprietary-only world couldn’t match at the same rate.
The founders argue AI development has regressed to the “stone ages of the SDLC” — back to “vibe checks” and eyeballing — because there’s no trust layer. Atin noted you can’t manually check every one of a million production queries, making a deterministic evaluation/trust layer the critical bottleneck for shipping, which is why Galileo raised its $45M Series B.
Yash and Atin tied the ChatGPT moment to two human-driven factors — RLHF (human feedback as reward signal) and data quality — without which GPT-3.5 would have been only “sublinearly” better than GPT-3. The same human-in-the-loop principles now need to be baked into evaluation pipelines: one founder wagered that ~90% of the time to ship a new GPT version goes to human annotation and evaluation, not GPU training (his own bet, not an OpenAI figure).
May Habib said Writer’s “secret sauce” is abstracting complex engineering away: its RAG is “zero engineering rag” where the word “rag” appears nowhere in the product, guardrails are built with LLMs rather than naive rejects, and domain-specific models (financial services, customer support, creative, medical) ship preconfigured so customers don’t need to fine-tune — capturing business logic via examples, which need far less data than fine-tuning.

Concepts in this episode

AI terms discussed here:

Open Weights · Foundation Model · Retrieval-Augmented Generation (RAG) · Accuracy · AI Evaluation · Artificial General Intelligence (AGI) · Explainability · Tool Use (Function Calling) · AI Alignment · AI Guardrails

Chapters

00:00 Introduction to Chain of Thought Podcast
01:27 Big News in AI: ChatGPT and Anthropic
06:34 Open Source vs Proprietary AI
12:17 The Importance of Trust in AI
20:12 Challenges in AI Development and Deployment
22:07 The Role of Human Input in AI Development
28:45 The Future of AI Regulation
34:41 Interview with May Habib, co-founder & CEO of Writer
40:01 What's Writer's secret sauce?
43:31 Challenges in productionizing GenAI
48:08 Conclusion

Visual explainer

The argument the founders made: open weights commoditized the expensive training layer, so the bottleneck for shipping moved up the stack to evaluation and trust.

Show notes

From ChatGPT's search engine to Google's AI-powered code generation, artificial intelligence is transforming how we build and deploy technology.

In this inaugural episode of Chain of Thought, the co-founders of Galileo explore the state of AI, from open-source models to establishing trust in enterprise applications. Plus, tune in for a segment on the impact of the presidential election on AI regulation. The episode culminates with an interview with May Habib, CEO of Writer, who shares practical insights on implementing generative AI at scale.

Follow:

Vikram Chatterji: https://www.linkedin.com/in/vikram-chatterji/

Atin Sanyal: https://www.linkedin.com/in/atinsanyal/

Yash Sheth: https://www.linkedin.com/in/yash-sheth-/

May Habib: https://www.linkedin.com/in/may-habib/

Connect with Chain of Thought host Conor Bronsdon:

Newsletter: https://newsletter.chainofthought.show/
Twitter/X: https://x.com/ConorBronsdon
LinkedIn: https://www.linkedin.com/in/conorbronsdon/
YouTube: https://www.youtube.com/@ConorBronsdon

Show notes: Watch all of Productionize 2.0: https://www.galileo.ai/genai-productionize-2-0

Transcript

Conor Bronsdon 0:08 Welcome to episode one of Chain of Thought, everybody. This is the podcast that demystifies AI and positions builders to effectively put GenAI into production. I'm your host and moderator for today's episode, Conor Bronsdon, head of developer awareness for Galileo. For the last several years, you may have heard me hosting the Dev Interrupted podcast, so you may recognize my voice. I'm also delighted to be joined today by three expert panelists to talk about the state of AI. And afterwards, we'll be interviewing May Habib, CEO and cofounder of Writer, on how enterprises are getting ROI out of AI and much more. But first, we're gonna talk about the state of AI today in November 2024.

Conor Bronsdon 0:44 And joining me for that conversation are the cofounders of Galileo: CTO Atindriyo Sanyal, aka Atin; CEO Vikram Chatterji; and COO Yash Sheth. You'll get to know each of them as they co-host episodes and interview some of the amazing guests we have lined up on Chain of Thought. There's so much more coming for you during the rest of season one, and season two in January is gonna be incredible. But, gentlemen, episode one, great to have all three of you here. Thanks, Conor. Excited to get started.

Conor Bronsdon 1:09 I really am excited for this conversation because the three of you have shared so many insights with me as I have joined Galileo and as we've started to talk about the AI space. And a common theme I'm hearing from each of you is the complexity and rapid pace of development in AI today, as I think anyone paying attention to the space sees. To nobody's surprise, this means that we have some big news items from the past week that I do wanna highlight. First off, ChatGPT,

Conor Bronsdon 1:33 officially an AI-powered search engine, at least in demo form, and it's giving Perplexity a run for its money with its new prelaunch demo. And meanwhile, you know, rival Anthropic's Claude can now control your computer. It's already been caught slacking off during a presentation, and some folks I follow on Twitter are even having it play Civilization. Vikram, based on your experience as a founder and as a former AI product leader at Google,

Conor Bronsdon 1:56 what's your take on these developments?

Speaker 1:58 First of all, thanks, Conor, for doing this. Super exciting to have this podcast. Well, my honest take on a lot of this stuff is you'll see a lot of news coming out of these startups, especially the ones which have a lot to lose, like Anthropic, OpenAI, which are highly valued, where they wanna be ahead of the news cycle. I feel like a lot of their DNA relies on being the first in the news cycle. And so I think a lot of what you'll see from them is, oh, we're gonna launch this soon and showing

Speaker 2:28 people what is possible. So when I read and see all of this, you can, you know, in my head, I feel like I kind of know that companies like Google and Meta and AWS, they're already probably thinking about this, probably already have a team of 100 people that are working on a lot of this. The search piece, right? Like I think OpenAI has packaged that in an interesting way and

Speaker 2:49 the world likes to see a David versus Goliath battle. So that's interesting. Anthropic's thing about computer use, it's interesting, but you know, if you search even a little bit on Google, you'll hear that Google itself is working on its own computer action model, right? It's called Jarvis. I believe that's the code name. If you just search a little bit, you'll be able to see that. But, you know, Anthropic wanted to be first to market, perhaps, with that. So they went ahead with it. So my take on all of this is I think it's really exciting because it keeps the I think it helps people's imaginations

Speaker 3:23 with thinking about what's possible with AI beyond just chatbots, because there's a lot more you can do with these brains, these operating systems that can think now. But it's not necessarily something that's gonna be, you know, super widely adopted in the next, like, two to five months. Right? It's kind of like multimodal: it came up a long, long time ago, but it's still not prime time yet. But it's really great to see. It gives you a glimpse into where things

Speaker 3:51 can be in maybe one or two years. I also read somewhere that Sundar said that 25% of code at Google is written by AI now, which is insane. Like, you know, we all know how much code Google is built on. That scale is incredible. I mean, these are the wins that like really make the hype kind of real in many ways. While, you know, all these new announcements that we keep seeing, like I've also seen them kind of promote false expectations

Speaker 4:20 amongst leaders, amongst developers as well that, Hey, like this thing can, this model can do X, Y, and Z. I'm going to build an application that can take advantage of that. But then it's oftentimes too early for that technology to mature. And, you know, you get into these bad expectation cycles where you build a POC and, you know, you just can't launch it even a year later.

Conor Bronsdon 4:47 Yeah. And, Atin, I know you've obviously been an engineering leader at Uber and Apple working on different ML and AI systems before. What's your take on this announcement from Google, which they shared during their recent earnings call, that 25% of Google's code is AI-generated today? Like, how is that going to impact

Speaker 5:05 all the different processes that go into enterprise code? I think it only highlights the fact that in the short and medium term, you can use these pieces of technology as a great efficiency uplifter. So at the very least, it can help you generate better code or write boilerplate code for you. I'm sure if you investigate the 25% of the code, it's likely the most obvious code to write, the code our developers are too lazy to write, like for loops; they would just quickly, you know,

Speaker 5:37 boilerplate code. And it's really great because you're automating a lot of the grunt work and really focusing on the more nuanced problems. That said, I think there's a lot more to be achieved in the longer run as you start building more complex systems on top of these LLMs with agents and tools. Today, you'll see the grunt work get automated, and before you know it, you'll see a lot more advanced things also get automated. So I'm very optimistic and bullish on this. Yeah. To your point, I know,

Conor Bronsdon 6:07 Andy Jassy over at Amazon has talked about how they saved 4,500 developer-years of work on code upgrades from Java 11 to Java 17, stuff that most engineers don't wanna spend a ton of time doing. So that's certainly an area of opportunity right now for AI. I think you and Yash are both alluding to that with efficiency in these coding opportunities.

Conor Bronsdon 6:27 But I also know that while you've both talked about what well-known players like OpenAI and Anthropic are doing, we're not thinking as much about the open source ecosystem.

Speaker 6:38 What do you think the media might be missing on the open source side of things? Yes, ma'am. I'd love to hear from my cofounders as well. My personal take is that open source is very overlooked right now just given the amount of hype that's there on the proprietary side of things, but there's tons of innovation happening on the open source side of things, especially given a lot of the benchmarks that you measure and test these models on. You've seen Llama's latest 3.1,

Speaker 7:05 3.2 models outperform GPT-4o on many of the benchmarks, and it only shows how the cost of intelligence has come down to the point where you can literally take these models and run them on your laptop. And more, I think the most powerful thing about open source is that the weights of these models are open and you can fine-tune them and do a lot more. The degrees of freedom with which you can operate the open source models are a lot more than, say,

Speaker 7:32 OpenAI giving a fine-tuning API. And the large businesses will always sort of work on any kind of innovation that's more catered to their business, whereas the open source models are just free ingredients and you can do whatever you want. In the long run, you'll see a lot more innovation happen on the open source front compared to proprietary. Vikram, what's your take on Yeah. Agree. I mean, I do think there's a place in the world for both. Obviously, not saying like open source is everything and closed source should just go away completely. I feel like they both have their own place in the world because, I mean, they're both at the end of the day, I think of them as distribution

Speaker 8:08 channels. Right? Open source is oftentimes not just for like, especially when it's led by a for-profit enterprise. Open source is never a means to just do good for the world. It's actually a lead generation mechanism, right? That's why Meta is doing it. It's for brand awareness. It's not for, you know, for the betterment of society. It's mostly because they have to play and they were lagging behind. They made it open source so that people start showing them some love. And that's happening. People are loving it and they're kind of riding that wave.

Speaker 8:40 That being said, I feel like because at Google, we kinda saw this firsthand with BERT, where Google in general is not very good at monetization of their products. Right? Google is generally good at building great products and giving them away for free, and that's what happened with BERT. And what we didn't see as a result of that was a massive amount of investment in making BERT bigger and bigger, faster and faster,

Speaker 9:03 which OpenAI has done: more weights, more parameters, and larger context windows. And now all of a sudden, they kind of won that language model war, at least the beginning of it. But as of now, what we see is larger models and this investment almost solely by Meta, for the most part, in open source so far, which is scary; there should be other players coming in as well on the open source side to match them and give them a run for their money. But what happens as a result of that is

Speaker 9:31 there's a massive proliferation of AI apps that are built out on opposite ends of the spectrum in terms of the developers, right? You have developers in a garage who maybe know something about accounting and they'll build an AI accounting application very quickly with this open source model. And then you have these enterprises where, name your large Fortune 50 bank, they can't use OpenAI. They can't have the data go to a different service. So they want to be able to fine-tune that model. So on both sides, it has massive advantages.

Speaker 10:01 From where we sit, at least, the big thing is like, how can we get to the point where we remove all obstacles from the way towards building AI apps? Obviously, Galileo is in the business of removing trust as an obstacle, but there is also cost and there's also latency. And I think open source goes a long way for that. Yash, you obviously led AI engineering teams at Google. Do you think that AI engineering teams today are

Conor Bronsdon 10:28 experiencing

Speaker 10:29 cost as a real measure for them right now? Or do you see it in different areas? Yeah. I mean, I think, to the open source point that Vikram and Atin made as well, right? By Meta training these open source models and investing millions, if not hundreds of millions of dollars in this, it is actually making it possible for application teams, for engineering teams, to build on top of those and almost removing cost as a big factor. The biggest cost factor there

Speaker 10:59 for having any proprietary language model is the training and data gathering piece. Once you commoditize that, which Meta has done a tremendous job in doing that, every team, you know, in a bank, in a healthcare organization, in the defense industry, in the financial world, every team can now build proprietary AI using that, which is, you know, I think more from a distribute, like, you know, not only just a distribution mechanism for Meta, I believe that it's truly

Speaker 11:34 furthering the adoption of AI massively. If we were just in the proprietary world today, a lot of these businesses, a lot of these applications wouldn't have been able to adopt AI at the same rate at which they are today. So now cost being, you know, pretty much eliminated from that perspective, from the adoption perspective, there is definitely a scaling factor to that cost when you're trying to deploy these models

Speaker 12:05 and run them efficiently. But that's, I think, a good problem to have. If a team is getting millions of requests per day for their AI application, it is definitely worth investing more in that. That aside, I think the biggest obstacle, you know, cost aside, is not having a trust layer in their applications. Like even when, you know,

Speaker 12:30 even with my teams at Google, like, you know, one of the biggest bottlenecks in productionizing new applications, new models, new features was not having a deterministic process to almost have a guarantee that this application will not screw up in production to damage our reputation, to erode user trust. And every team building an AI application

Speaker 12:57 does not want to erode their users' trust. For that, trust is an absolutely essential ingredient of their AI stack. We have this today with software already, right? I think everyone resonates with this by now: software can be deployed the same day if you wanted to because of an inherent trust layer. The rich IDEs, the CI/CD pipelines and the monitoring and tracing abilities in production, as well as firewalls

Speaker 13:27 for cybersecurity, make it such that software can be deployed in a safe, trustworthy manner and do that very effectively. With AI, you know, it feels like we're back in the stone ages of the SDLC where it's extremely manual. It's, we don't have a clear cut process. We don't know what to measure. And that's where a trust layer in the AI powered software world is absolutely critical.

Speaker 13:53 Harping on the importance of evaluation and observability and how we are sort of even with the MLOps revolution that happened in 2017 and we kind of hit ground zero back again when the ChatGPT moment happened and kind of went back to caveman tools of eyeballing and vibe checks and that was the case maybe ten, fifteen years ago with traditional ML and then we spent so many years getting to some degree of explainability around the smaller models and the ground got swept off our feet. So we're back to ground zero, which is where a lot of the innovation that we ourselves are doing at Galileo is kind of just setting the foundations of how you build

Speaker 14:36 a trust layer and evaluations layer to make sure these systems are a little more deterministic or at the very least you're able to sort of pinpoint why a certain thing was said by a complex AI system when they said it. But it's only scratching the surface and there's a lot more innovation on the evaluation front to be had in the coming years.

Conor Bronsdon 15:01 Absolutely. With how rapidly AI is evolving, I mean, look at agents. Right? Like, a year ago, we were barely talking about agents, and now they're seemingly everywhere. There's so much money flowing into the space. There's so much talent here that is rapidly changing the technology. And it can be really easy to get caught up in the hype versus actually getting real ROI from effectively putting AI applications into production, which, hey, maybe Google's doing that with that 25%

Conor Bronsdon 15:24 of their code now being written by AI. And I know that's part of why the three of you just raised your $45 million Series B. There's this real need for this trust layer. Vikram, I've heard you talk about how enterprises need to mature their systems with regard to AI and establish this trust layer. And this phrase evaluation intelligence has started to become part of the lexicon

Conor Bronsdon 15:47 here. How would you explain to the audience how this can solve this trust layer challenge that Yash and Atin and you are all referencing?

Speaker 15:55 I mean, I think first of all, like when you think about AI in the enterprise and in general AI adoption across different use cases, across different kinds of workflows to just make people's lives easier, we're all in the very, very first innings. Right? And I think in general, as humans, we all have a tendency that as soon as we see a bright and shiny object, we try to put that everywhere all at the same time and hope for that to be the silver bullet that's gonna solve all of our woes.

Speaker 16:22 Not gonna happen. And so I think that's what's led to, like, a little bit of this question mark around, like, what's the ROI of AI? It makes for great headlines. I think it helps to dampen the mood a little bit. I think that's the right question to ask in enterprise AI right now for very basic use cases that are coming about. And sure, there's a here and now, which I can talk about, right? Past is important.

Speaker 16:44 But, you know, we started the company three and a half years ago with the idea that language models will get bigger, and when they get bigger, then, you know, there is going to be this big challenge. And so we've always taken this very long-term view about stuff. And when we think of this right now, again, like if you really draw this out and see where things are gonna be maybe a couple of years from now, what's gonna happen is these models are

Speaker 17:07 essentially thinking tools by themselves. Right? And you can apply this in different places. Right? Now you're applying them in basic software and helping getting better answers to the word, better search. That's very basic. Next step, like press a button and the model is going to press a different button for you. That's an agent going to do some action. That's great. Now we're also starting to see like AI at the edge just about take over and AI and robotics kind of becoming just about a thing on the horizon.

Speaker 17:33 Like, and you'd see like three, four or five years from now, right? It's basically a brain that you can put into a system to make it do things. So like, you have an intelligent dishwasher. You have an intelligent refrigerator. Like, all of that is gonna be powered by AI, right? Now, when it's everywhere, trust becomes extremely important. You know, it's and this is not like the world of Terminator or something like that. It's just more like it just shouldn't malfunction. It shouldn't do things which

Speaker 18:00 it's not coded to do. So it's very similar to how in software engineering, you would always put in some basic checks and balances, right? You would always have some level of checking, like have a kill switch, let's say, or be able to check for what's going on in the system at all points in time and where things can go wrong. And if there are, then you can have a flag that is set for that. So we've done these things before in software engineering and it's similar with this new form of AI application development, right, where if things go wrong in their system, you've got to be able to lay that out before it goes wrong, right, in the building stage. What does that mean? What are the tools you need for that? What's the mindset that you need to have to be able to create that workflow for yourself before you even launch anything? That's where software engineering for AI application development is at right now. That basically needs

Speaker 18:49 a workflow, but it also needs a set of tools to do that. And that's where collectively you can solve that trust problem and then take that at scale to see where this goes. So I feel like this trust issue is just becoming a very critical need in the AI application development workflow. But as AI becomes more pervasive and as we move towards your dishwasher using AI,

Speaker 19:13 and I'm not making this up, there was actually a company that was funded recently to help you do that: $400 million for Physical Intelligence. So Okay. Was gonna ask, like, what are we doing with the dishwasher and anything? So I'm glad you brought it up. That's actually I was just talking to some friends over the weekend, and they were saying the same thing. Like, ah, you know, like, this is it's nice that I can chat with my app about the weather now instead of just looking up the weather in Google search. But is that ROI for AI? Like what I really want is for it to do my dishes and for it to do my laundry. And that's kind of basically what AI and robotics means. Right? So that's getting funded a lot more now. So that's, and that's what's coming, right? And that's what Galileo has been built for. It's not just for the here and now of today for

Speaker 19:54 like you to be able to build smarter chatbots. It's for you to be able to build a world which is ubiquitous AI everywhere. We help make sure that that's going to be safe and secure and trustworthy. So that's how we perceive the world of AI. That's where we're here for the next like ten, twenty years to see that coming along. But we're very much in the first innings and we're already seeing this huge need for moving away from humans, and Yash, Atin, and I have seen this and painfully been a part of this human evaluation flow where labeling companies throw humans at the problem and just make a fortune out of it. They're the bane of everyone's existence, seriously. And we hated labeling companies. We still do. Software engineers don't like that.

Speaker 20:34 And that's the reason Galileo started. It's like, how do we get away from this? How do we automate this? And thankfully, as soon as LLMs became a thing, everyone was like, hooray, we don't have to label things as much anymore. Very few people are fine-tuning LLMs as a result. But at the same time, it still means we are stuck in the world of vibe checks, trying to figure out where things are going wrong. And

Speaker 20:54 that's one of the biggest bottlenecks right now for launching anything, because no one can be sure. And what are you gonna do in production? Check every single one of the million queries coming in? You just can't. So that's actually a really big problem, which I feel people aren't talking about as much. And the other side of that coin is, you know, while we,

Speaker 21:12 my teams at Google, faced huge delays because of large annotation needs. And that's the truth even today. At one point, I think OpenAI threw out a stat that it takes them about fourteen months to release the next version of GPT. Where is that time going? I can

Speaker 21:38 put money on it and say that 90% of the time is going into human annotation and evaluation. They have all the GPUs in the world; it's not like they're waiting on GPUs to train the next model. It's collecting, annotating, and evaluating the data. On that note, in order to make AI mission critical, there are two things that still need to be in place.

Speaker 22:01 One is that the involvement of humans is still critical to this workflow. You can't just remove the subject matter experts who actually shape these models in some ways. So human input is gonna be necessary, while algorithms should take that human input and help scale human activity. We had an executive briefing with one of our customers, and one of the executives rightfully

Speaker 22:32 said that in the traditional machine learning world, the rigor and effort went into training models, fine-tuning models, and collecting the data for them. Now these models are general purpose, and we want them to be general purpose. Hallucination is not necessarily a bad thing, but now the rigor has to be in the evaluation and in setting up the trust layer

Speaker 22:55 for AI systems. That's where the shift has gone. Atin talked about the maturity of the MLOps lifecycle; we've got to bring that maturity to the trust layer in the GenAI stack. Yeah, totally. I'll also add to the importance of human evals. One example is that the reason GPT-3.5 and that family of models is a step function better than earlier versions of GPT

Speaker 23:25 was because of newer elements added to the end-to-end training stack of these foundation models, including RLHF, where you use human feedback as a reward signal. RLHF and data quality were the two biggest driving factors in creating the ChatGPT moment. Without these two, that moment would not have happened, and GPT-3.5 would maybe be only marginally, or sublinearly, better than GPT-3. These generative models weren't as great before.

Speaker 23:56 So, applying the same practices to evaluation, one thing we think deeply about at Galileo is how to scale this human feedback involvement. To Yash's point, to scale this manual feedback, you need a human feedback pipeline baked into the platform, and we apply those same RLHF-esque principles to evaluation, where we offer certain metrics that allow you to quantify

Speaker 24:25 errors in your inputs and outputs. But you can only create a certain baseline level of accuracy that way, and then it comes down to the nuances of the data and the nuances of the feedback you receive. So the problem now becomes how you bake this pipeline into the workflow so that these evaluation metrics evolve with your system and with your data. I think what you're also seeing, Conor, in everything Yash and Atin are talking about: if you talk to a data scientist about this stuff, look at your data, give it feedback, iterate constantly, those are data science workflows. But if you look at the people who are now starting to build out these AI applications, they're software engineers. So a lot of what's also happening is

Speaker 25:05 this shift in workflows, where software engineers need to adopt data science principles in order to build better applications. What we're seeing across our customers is that the ones who are more mature in that lifecycle are the ones who realize this: they can't just slap together a model and some queries and expect a great answer. It's very nuanced. It's an entire system they have to put in place, and they have to iterate on it. It's a lot about how you bring the best practices from data science

Speaker 25:36 to software engineering. The organizations that do that successfully, with the right tools and the right workflows in place, are the ones that are gonna lead and actually succeed in productionizing safe and trustworthy AI applications at scale. The rest of them are gonna have a harder time because they're just not following those principles.

Conor Bronsdon 25:54 Speaking of trust, we're recording this the night before the US presidential election, and regulation around AI is obviously a key topic that's come up in 2024. How do the three of you see regulation affecting the industry as we move forward, regardless of whoever wins tomorrow?

Speaker 26:12 I can take a stab really quick, and Atin, Vikram, please add your thoughts as well. This is right before the election. We've talked about trust, about how agents are becoming important, and about how the critical fabric of the world is going to be imbued with AI. For example, at Galileo, we've been working with financial services, telecom companies, retail, and healthcare.

Speaker 26:38 All of these are critical infrastructure for our nation. As AI starts automating or augmenting all of these critical workflows, we basically can't afford to have mishaps at that scale, because that can have a direct implication on society, far beyond a fancy chatbot that gives me fancy answers. As we head in that direction, which we clearly are, regulations are going to be paramount.

Speaker 27:08 We know from President Biden's executive order, and we talked about SMEs being part of the critical workflow here, that even on the security side of regulating AI, the necessity for red teams is going to be key. The necessity for monitoring key aspects of AI-powered software is going to be key, and we're seeing that across all of the regulations. The common theme, at least, is that we must have the right

Speaker 27:38 explainability and visibility into these AI systems. And while there is contention over who should be regulated, whether it's the foundation model providers or the application teams leveraging AI in their end user applications, I think everyone agrees that the trust layer is still going to play an important role. Now, if the Democrats win, we already have a flavor

Speaker 28:05 through the executive order. We also know there'll be some AI bills coming out very soon. If the Republicans win, there have been a lot of concepts and no clear plans around AI yet, but one thing we've seen from the campaign is that Republicans are low on regulation in general, and they will be more open-ended towards AI. But

Speaker 28:36 either way, here's at least one bipartisan alignment: the need for regulation. Atin? Yeah, agnostic to who wins or loses, what we've seen is that, number one, the jury is still out on a cohesive understanding of where the regulation needs to be, whether it's at the foundation model layer or at the application layer, like Yash said. But there also seems to be a bit of fragmentation,

Speaker 29:05 per my understanding, at the state level versus the federal level, on whether there should be curbs and guardrails around the big companies training these models and the data going into the foundation layer, or whether the guardrails should be at the output layer, which apparently seems to be more of the federal take: that the output shouldn't be biased.

Speaker 29:27 But irrespective of where the buck falls in the coming months and years, like Yash said, I think guardrailing will be important both at the foundation model layer, where models are being trained by super large companies at a cost of hundreds of millions of dollars, and at the application layer, where that technology emanates outward and smaller players are building on top of it. What we'll see is this proliferation

Speaker 29:55 of applications, almost a rapid spread of this technology across the country and eventually across the globe. And there will be some form of guardrailing needed at each layer to make faithful, safe applications. This feels like an opportunity for us to have a further in-depth discussion. We're definitely gonna do an episode on this later this year. Any other thoughts you guys wanna add around this? Yeah. Just one last thing: at a very, very macro level, it is interesting that AI is getting regulated this way, right? Because software engineering was never regulated this way in our careers over the last couple of decades. The fact that it is getting regulated at the state and federal level is, again, a testament to the fact that it is a really powerful technology. Everyone accepts that. But again, from a trust perspective, the reason we've been touting this word so often is because it means not just

Speaker 30:49 accuracy and data privacy; it also has a lot to do with whether you can even ship a product that uses AI in the state of California while adhering to all of the laws of the land, right? That becomes really, really important. And if you look at the last couple of years: in 2023, there were just 25 AI-related regulations. But by March 2024,

Speaker 31:12 there were 82 AI policies, and by the end of the year, they're now closer to 100. When it comes to the different kinds of bills being passed around this, there are 616 bills across 45 states and another 108 congressional bills around AI. So the flurry of bills and AI regulations coming is going to be crazy. It's all coming. It's going to become really hard for

Speaker 31:37 the software engineer at a company to ship their product, because they have to adhere to all of these rules and laws. That's another big part of trust, because at the end of the day, the government is there to have checks and balances on any kind of technology that can go off the rails. And so, again, to Atin and Yash's point, there's this need for guardrails.

Speaker 31:57 This isn't just for hallucinations. It's not just for data privacy. It's also for regulatory compliance, and you have to make it easier for developers to ship while also having these regulatory guardrails in place. Just one last take on where I think we should be in terms of guardrails, right? There's been a lot of debate on whether we should have guardrails around foundation models, and from what I'm seeing, that would greatly

Speaker 32:24 stifle innovation. Today, application developers can leverage these foundation models in any way they see fit. If we add guardrails to what these large language models can support and the way they behave, it can drastically reduce the number of use cases we can apply them to. And that's definitely not what we want to see. While the foundation model companies and teams should

Speaker 32:54 ideally expose what data is going into those models, to make sure there's no harmful or biased content the models are learning from, don't restrict these models up front. Have those guardrails at the application layer, so that if I'm building an application that supports the healthcare industry, I, as a developer, can adhere to those guidelines

Speaker 33:18 for my application. So that's where I hope the regulation lands.

Conor Bronsdon 33:25 I love that you're all thinking about this, and hopefully we'll have a great regulatory regime that gets figured out over the next couple of years. Great conversation, guys. Really appreciate you all coming on. And speaking of companies that are successfully productionizing GenAI, we have to give credit to Writer. They are one of the companies doing a fantastic job working with enterprises

Conor Bronsdon 33:45 to solve their challenges with AI. We were delighted to have their cofounder and CEO, May Habib, join us for Productionize 2.0, our wonderful online conference earlier this week, alongside luminaries from NVIDIA, Databricks, Twilio, HP, Cohere, Weaviate, and many more. If you want to check those out, you can watch all those sessions on galileo.ai. And Yash sat down with May for an extended conversation on how Writer has found success.

Conor Bronsdon 34:10 We hope you enjoy their conversation. And if you've enjoyed this episode, please take a brief moment before the interview to rate and review the podcast in your podcasting app of choice, like Spotify or Apple Podcasts. If you're on YouTube, hit that subscribe and like button. This all helps. Engaging with the show, subscribing, sharing with your friends, and especially leaving reviews provide crucial signals that help our episodes get discovered by listeners. And since we're just starting out, we especially need the help. Alright. Without further ado, we hope you enjoy this interview with Writer's May Habib.

Speaker 34:41 Hi, May. How are you doing? I'm good, Yash. How are you doing? Good. Thanks for joining us today. Lovely to have you here. Thanks for having me. It's been an exciting day so far with some very interesting chats, and I'd love to share some of the learnings at Writer as well. I know Writer is doing a tremendous job applying generative AI and coming up with technology

Speaker 35:03 that actually solves real-world problems for the enterprise, so I'd love to kick off our discussion with the fact that we see so much variability in enterprise use cases. How does Writer engage with enterprises today to deliver value across so many different use cases? Yeah, totally. Especially in large enterprise, right, it can really feel like

Speaker 35:27 the use cases are all so different and all require custom builds. The way we square the circle is as follows. For folks who don't know us, we are a full-stack generative AI platform. So think of us as no-code, low-code for building generative AI apps and workflows. As Yash intimated, use cases are the lifeblood of how we build.

Speaker 35:51 What we do is take this dual approach to the enterprise. For mission critical use cases, think of these as killer apps industry by industry, even sub-vertical by sub-vertical within these industries. Think of these mission critical apps as almost pre-fine-tuned in Writer, prebuilt with our components. And the components of our full-stack platform

Speaker 36:18 are LLMs, built-in RAG, and guardrails. Then we put those same building blocks in an AI Studio so that the organization's own data scientists, PMs, and AI builders can build these use cases themselves. We call them AI builders because it's really a range of technology skill sets that can contribute. So it is almost like a build-operate-transfer

Speaker 36:49 on the biggest, most mission critical use cases, and then there's this toolkit that's very easy to use across the business, which we roll out and train and certify people on. And, you know, having a purview into so many different use cases and verticals, where do you all see GenAI making the biggest impact? Yeah. Wow. I mean, where do we start? In financial services,

Speaker 37:23 there have just been incredible use cases. Bankers are using Writer to generate drafts of S-1s, analyze earnings, build pitch books. I started my career as a junior investment banker, and the job really can be very, very different. In capital markets and in hedge funds, similarly, there's being able to build knowledge assistants that quickly answer questions on industry news, trends,

Speaker 37:51 company deep dives, investment details, and being able to write IC memos. So anything related to insights, both the short form and the long form of going deep on big, meaty topics and company information: Writer is really doing some mission critical work in financial services. In insurance, claims adjudication is a huge set of use cases for us. Those are the big

Speaker 38:27 $100,000-plus claims where the initial automated passes don't get into all of the details. Again, and this is important for us, we work only to support human decision makers, but we're able to give them really quick answers to questions and initial readouts of evidence, for example, in various claims and electronic health records. So in financial services, insurance, and healthcare,

Speaker 39:01 there's incredible complexity and a requirement for high accuracy. The full-stack approach, where you really need to build a compound AI system around the LLM to get the results, is where we really flourish. Amazing. Yeah, and I'm sure on the marketing and creative side as well, the generative use cases are pretty phenomenal. We've heard

Speaker 39:28 some great things about the Writer platform from many of our enterprise customers as well, so congratulations on the success so far. It's great to learn from you, and you and I have at least one thing in common: we both started our careers as junior investment bankers, so I can relate to the pitch books and comms and slides we had to prepare for every single deal.

Speaker 39:56 In terms of what's driving GenAI adoption and Writer's success in this space, what's the secret sauce? What's the recipe? Yeah, so let me answer technically, and then from a go-to-market perspective. On the technical side, it's two things. Number one is the really complex engineering efforts we've abstracted for the customer. Our RAG is zero-engineering RAG, right? The customer doesn't need to preprocess their data.

Speaker 40:23 We're the ones doing that for them, automatically. The word RAG actually appears nowhere in the product or the platform. Certainly, there are a ton of tie-ins to the API, so folks who need to scale and send, you know, 3,500 contracts to Writer are able to use the API. But in the UI, we've made it so that folks are connecting data and data repos, and we're doing all of the data processing for RAG under the hood. We do the same thing for AI guardrails. It's not just simple

Speaker 40:56 naive regex for guardrails. We're using LLMs to build these guardrails in a really high-trust way, and all of that, again, is abstracted for the customer. It's a huge part of our secret sauce, because when we say full-stack platform, folks can go to a hyperscaler's website and be like, well, that looks like full stack to me too, right? They've got LLMs. They've got RAG. They've got guardrails.

Speaker 41:21 With Writer, every layer talks to each other. Every layer is preconnected, so you don't need as much engineering. And especially as the LLMs get more and more powerful, ours and others', the gulf between flat-out DIY and what you can get with Writer only increases. The second big technical reason for the success is that the domain-specific

Speaker 41:48 models are really close to the use cases, vertical by vertical, so the customer doesn't need to fine-tune to get that kind of accuracy. In AI Studio, we're able to break down a use case and really capture the business logic that needs to go into the model, and we do that in a way that pairs parts of the logic with examples. And it's examples versus

Speaker 42:16 data that really makes the big difference, and it's a quantitative measure, right? To fine-tune, and fine-tune appropriately, you need a lot more data than examples need. And the domain-specific models, be that our financial services model, our customer support model, our creative model for marketing, or our medical model, all of those are

Speaker 42:44 preconfigured, think of it that way, for the types of use cases folks want to do. And you can mix any of our model families within a single use case. So we've really abstracted away a lot of the complexity, even down to the model level. That's awesome. So what I hear is that vertical integration, capturing the business value, and minimizing the data required to get

Speaker 43:09 started and set up have been the key drivers here. Just a follow-up on verticalization in the product: let's say in financial services, you talked about a few use cases. Do you typically cover pretty much every generative AI use case there? Are there any challenges in productionizing some use cases versus others?

Speaker 43:35 Yeah, absolutely. It's the use cases where a customer is not aligned on what good looks like. When we say we align to business logic, folks need to agree on what that logic is, and especially for the data that's in people's minds, in the subject matter experts' brains, there's quite a bit of work to align on how you're going to do something if you're now going to trust AI to do that thing.

Speaker 44:04 There are just a ton of use cases in the realm of curation, or where taste is required. A lot of marketing use cases fall in that bucket: what is our brand voice? What does good look like? There, the complexity is less about whether AI can do it and more about whether we have agreed on the how. And then where we do find

Speaker 44:30 that the next step function change in LLMs is required is around function calling and tool use: the use of tools in parallel, and being able to assure and ensure that what you are moving from system to system is high quality. You're not just moving garbage from one place to another. There's still a lot of work that needs to be done at the model layer to get to the agentic nirvana that folks are

Speaker 45:02 excited about. Amazing. Yeah. And on the topic of productionization, I'd love to pick your brain: is there a best practice that has led to Writer productionizing so many use cases with so many enterprise customers? You mentioned expectation setting, what the expected behavior is from a business logic perspective, and then realizing that in production reliably and in a trustworthy manner. Any best practices you've seen, like, this is what we advise

Speaker 45:37 our end users to adopt? Yeah, totally. We'll go in and ask for two things, right? Number one, a recognition that AI is a team sport. If we're brought in by IT rather than the line of business, we are really insistent on being close to the business and not building science projects that aren't necessarily connected to reality.

Speaker 46:01 The second thing we really insist on is dedicated generative AI experts. We will make them experts, but the folks who end up owning this internally have to be dedicated to the program. A range of technology skill sets is excellent: folks who really get the tech. They don't necessarily have to be engineers, but they understand how the engineers are getting these outcomes. We don't want AI to be magic inside of a company; we need it to be just software that people really understand. Once we have those two things, we can really get down to work. And in terms of the nuts and bolts of how we go about it, it's a very focused effort to get to accuracy.

Speaker 46:45 And it's everything from us having our own QA teams. I think we're probably the only generative AI company that employs QA people with PhDs, who make sure these applications are excellent and that we're building the evals even if the customer doesn't have them. Because it is really important to be able to take a v1 to the business user that's really excellent. We have passed the era where you could take a prototype

Speaker 47:15 as a technologist to the end user. And we're now almost exclusively working with the IT teams, the internal generative AI teams, to build for their business. So it's a very high bar for ourselves for what we take to the business. This has been awesome. Thank you so much, May, for coming on board and sharing so many on-the-ground learnings

Speaker 47:36 that make enterprise generative AI applications successful today. So glad to have you here, and wishing Writer all the best. Thank you so much, Yash. Lots to come. I'm May at Writer, for anybody who wants to reach out. We're hiring for every position you could imagine, lots of technical architects, both pre- and post-sales. So if you're excited to work on these complex

Speaker 47:59 applications and rollouts, definitely get in touch. The enterprise needs you. Thank you, Yash. Talk soon, everybody. Thanks, May. Thanks. Bye.

Conor Bronsdon 48:07 That is a wrap for our first episode of Chain of Thought. We'll be back every Wednesday with more insights and conversations from the world of AI, so make sure to like and follow the show wherever you get your podcasts. Don't forget to check out Galileo's YouTube channel, where you can watch every episode as well as events like Productionize. Plus, we'd love to hear your thoughts on the show. Connect with us on Twitter or LinkedIn at @rungalileo and let us know what you think. Thanks for listening, everyone.