Why does Aurimas Griciūnas decide against building another observability and evals company?

When Aurimas Griciūnas left Neptune AI, building something similar in the observability and evals space was the natural next move, so he researched it. Within the first few weeks he found 20-plus companies already doing it — plus the hyperscalers, and probably another 20 in stealth, half of them open source and free to self-host. Because it’s hard to pinpoint what will matter in the next few months, all of them try to cover end to end: traces, evals, experiments, prompt registries, even routers. He concluded there was no unique space left, so he called the market too packed and didn’t build.

What does Aurimas Griciūnas think about vibe coding your way to a startup?

Aurimas Griciūnas thinks vibe-coded tools can mostly succeed in B2C, where a new idea quickly captures the broad public’s attention. For enterprise products, he still expects VC-backed companies with large cash reserves to win — unless you build something really great really fast, raise a lot of money, then hire hundreds of engineers to refactor the vibe-coded foundation. He adds a key caveat: if a really strong engineer is doing it — and he notes they’re usually doing assisted coding, not pure vibe coding — that approach can work, because the engineering talent is already inside the founding team.

What fundamental gaps does Aurimas Griciūnas see in how teams build AI today?

Aurimas Griciūnas points to several. Evals are still a gap — eval-driven development is crucial but not widely adopted. Overreliance on orchestrators causes problems as systems mature, forcing teams back to base software engineering without wrappers. People take too long to ship: first MVPs aren’t rolled out soon enough and human feedback isn’t fed back in fast enough. He also sees weak business understanding — teams build something shiny instead of solving a business problem, then build agentic systems “in the basement” only to ship something that solves nothing.

Why does Aurimas Griciūnas call evals the hardest part of building agentic systems?

Aurimas Griciūnas says observability tooling in general is great but isn’t the hardest problem. The hardest problem when building agentic systems is creating the eval datasets themselves, which he calls really, really hard — sometimes 70% of the entire project goes into figuring them out. He frames this as the piece of the puzzle that’s currently missing: too many hours go into eval datasets, and that, not the observability layer, is where the real difficulty lives.

What does Aurimas Griciūnas say about data engineering in the AI era?

Aurimas Griciūnas — a data engineer for four or five years who also led data engineering teams — says it’s consistently underrepresented even though data engineers do most of the work to make these systems run. He pushes back on the idea that AI engineering is just data engineering: data engineering is about piping data to where it needs to live, while AI engineering is about building agentic system designs on top of that data. He’s candid that he doesn’t know how to keep it in the spotlight — it’ll never be hot, in his words, because data engineering saves costs rather than producing revenue.

Episodes · S2 E38 ← Prev Next →

From Demo to Defensibility: How to Build an AI Business that Lasts | Aurimas Griciūnas

Aug 27, 2025 · Aurimas Griciūnas , SwirlAI · 52 min

AI Evaluation & Reliability AI Observability Context Management AI Hardware AI Energy & Data Centers Enterprise AI

Listen on any app

Key takeaways

Aurimas Griciūnas argues there are only three ways to easily start a successful startup today: build something that grabs attention fast and reaches escape velocity within the first months, have really strong backing from the start — enough money to build a big team and roll out enterprise operations properly — or have distribution on day one.
Open source is hard to turn into a profitable business, and enterprises don’t work well with it — they need a very mature solution. That’s also why infrastructure companies are hard: you need real engineering, not vibe coding, because security, stability, on-prem deployment, support, and enterprise features all have to be top notch.
The technology moat is genuinely less defensible now, Aurimas says. With a strong engineering team — maybe five strong engineers using AI — you can very quickly build an observability and eval tool that rivals Langsmith or Langfuse. So no one pays because you’re a known person; they pay because the product is better and more efficient than competitors.
When advising founders, Aurimas would weight the founder over the product: the first idea is usually not a great idea, so he’d probe how good their pivoting ability is. He calls distribution and reach key — how you sell and market the product is probably even more important than the product itself, at least at the very beginning.
For agent builders, context engineering equals prompt engineering — you can’t build agentic systems without it. Aurimas warns the context window explodes to a few hundred thousand tokens per run; five runs at 200,000 input tokens each can take fifty seconds and “that’s a chatbot.” His favorite lever: compress the conversation history and offload discarded actions to a scratch pad.
Aurimas predicts a slowdown — no big leaps in AI in the next six months, and no distributed multi-agent systems in production yet despite the A2A hype, because it’s too hard to instrument long-running distributed agentic systems. He doesn’t buy that LLMs alone reach AGI either, suspecting a model-architecture problem more than a hardware one, while still expecting we’ll need this kind of compute for inference. What excites him: coding CLI agents and self-improving agents that rewrite their own code, evolutionary-algorithm style.

Frequently asked questions

Why does Aurimas Griciūnas decide against building another observability and evals company?: When Aurimas Griciūnas left Neptune AI, building something similar in the observability and evals space was the natural next move, so he researched it. Within the first few weeks he found 20-plus companies already doing it — plus the hyperscalers, and probably another 20 in stealth, half of them open source and free to self-host. Because it’s hard to pinpoint what will matter in the next few months, all of them try to cover end to end: traces, evals, experiments, prompt registries, even routers. He concluded there was no unique space left, so he called the market too packed and didn’t build.
What does Aurimas Griciūnas think about vibe coding your way to a startup?: Aurimas Griciūnas thinks vibe-coded tools can mostly succeed in B2C, where a new idea quickly captures the broad public’s attention. For enterprise products, he still expects VC-backed companies with large cash reserves to win — unless you build something really great really fast, raise a lot of money, then hire hundreds of engineers to refactor the vibe-coded foundation. He adds a key caveat: if a really strong engineer is doing it — and he notes they’re usually doing assisted coding, not pure vibe coding — that approach can work, because the engineering talent is already inside the founding team.
What fundamental gaps does Aurimas Griciūnas see in how teams build AI today?: Aurimas Griciūnas points to several. Evals are still a gap — eval-driven development is crucial but not widely adopted. Overreliance on orchestrators causes problems as systems mature, forcing teams back to base software engineering without wrappers. People take too long to ship: first MVPs aren’t rolled out soon enough and human feedback isn’t fed back in fast enough. He also sees weak business understanding — teams build something shiny instead of solving a business problem, then build agentic systems “in the basement” only to ship something that solves nothing.
Why does Aurimas Griciūnas call evals the hardest part of building agentic systems?: Aurimas Griciūnas says observability tooling in general is great but isn’t the hardest problem. The hardest problem when building agentic systems is creating the eval datasets themselves, which he calls really, really hard — sometimes 70% of the entire project goes into figuring them out. He frames this as the piece of the puzzle that’s currently missing: too many hours go into eval datasets, and that, not the observability layer, is where the real difficulty lives.
What does Aurimas Griciūnas say about data engineering in the AI era?: Aurimas Griciūnas — a data engineer for four or five years who also led data engineering teams — says it’s consistently underrepresented even though data engineers do most of the work to make these systems run. He pushes back on the idea that AI engineering is just data engineering: data engineering is about piping data to where it needs to live, while AI engineering is about building agentic system designs on top of that data. He’s candid that he doesn’t know how to keep it in the spotlight — it’ll never be hot, in his words, because data engineering saves costs rather than producing revenue.

Concepts in this episode

AI terms discussed here — each links to a plain-language definition.

Context Engineering Vibe Coding Model Context Protocol (MCP)AI Evaluation Artificial General Intelligence (AGI)Inference AI Agent Prompt Engineering Tokenization Context Window

Show notes

The technological moat is eroding in the AI era, what new factors separate a successful startup from the rest?

Aurimas Griciūnas, CEO of SwirlAI, joins the show to break down the realities of building in this new landscape. Startup success now hinges on speed, strong financial backing, or immediate distribution. Aurimas warns against the critical mistake of prioritizing shiny tools over fundamental engineering and the market gaps this creates.

Discover the new moats for AI companies, built on a culture of relentless execution, tight feedback loops, and the surprising skills that define today's most valuable engineers.The episode also looks to the future, with bold predictions about a slowdown in LLM leaps and the coming impact of coding agents and self-improving systems.

Connect with Chain of Thought host Conor Bronsdon:

Newsletter: https://newsletter.chainofthought.show/
Twitter/X: https://x.com/ConorBronsdon
LinkedIn: https://www.linkedin.com/in/conorbronsdon/
YouTube: https://www.youtube.com/@ConorBronsdon

Follow Today's Guest(s)

Connect with Aurimas on⁠ ⁠⁠LinkedIn⁠

Aurimas' Course: ⁠End-to-End AI Engineering Bootcamp

Check out Galileo

⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Try Galileo⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠

⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Agent Leaderboard

Transcript

137 segments

Aurimas Griciūnas 0:00 Enterprises do not work well with open source. Enterprises need a very mature solution. When you build an infrastructure company, you need real engineering. Right? You cannot wipe code. There are many factors and requirements by enterprise companies like security, stability, and everything needs to be top notch.

Conor Bronsdon 0:24 We are back on Chain of Thought. I am your host, Conor Bronsdon. Today, we're joined by a guest that many of you may know and who I've been following for quite a while, Aramis Gritsunis. Aramis is someone you've probably seen on LinkedIn, maybe on X. If you're interested in an AI, you've absolutely seen some of his incredible charts that he shares, these graphics

Conor Bronsdon 0:45 and the insights that he brings around everything from the observability stack for AI to how AI agents are being built at the enterprise. He makes some really unique and interesting content and has a deep background in the trenches of data and AI, having been everything from a data analyst, a machine learning engineer, to ML Ops engineer, and chief product officer at Neptune AI. Today, he is the CEO and cofounder of Swirl AI, consulting and building agentic systems for clients. He's a prolific content creator, as we've already mentioned, and he's launched his own course, the end to end AI engineering boot camp to train the next wave of builders.

Conor Bronsdon 1:26 Armas, it's a pleasure to have you on the show. Welcome to Chain of Thought. Hi, Conor, and thank you for having me here, and super glad to have the conversation with you. I know we'd plan to start on some more general topics, but given that you just told me that your first cohort of your end to end AI engineering bootcamp is wrapping up right now, I'd love to hear from you. How has the cohort been? How's the new

Conor Bronsdon 1:53 course going?

Aurimas Griciūnas 1:54 I do believe that it is going really well. There are a lot of learnings that I'm bringing away from the first cohort, and I'm definitely bringing into the second one. So first learning probably is that there is a lot to cover when it comes to AI engineering, especially end to end AI engineering. And you're covering systems from the very simplest ones, Wing to Rag, Agent to Rag, Agents, World Agents, communication

Aurimas Griciūnas 2:21 protocols, deployment, observability, evaluation, right? So probably eight weeks is not enough if you are also working full time job at the same time. So now their their realization is that, yes, you can deliver the material in eight weeks, but a commitment for a learner who is actually learning those materials should probably probably be around, I don't know, six months. So

Aurimas Griciūnas 2:50 the materials should be reviewed after the boot camp ends. So I think that's one of the realizations. But in general, it is a very hands on boot camp, and it seems like people like it. And I think that everyone is bringing a lot of hands on experience from it.

Conor Bronsdon 3:10 Do you think there would be a challenge with the pace of innovation and change that is occurring though in AI if you were to do that type of six month course where, oh, look, so much has changed around the frameworks you're applying, and there may be new tools you wanna bring in. How do you approach this given that you have, like, your own day to day approach to learning and then these cohorts that you're also working through?

Aurimas Griciūnas 3:35 So what what I actually meant is that the course, the cohort itself would still be eight weeks, so it will not span eight months. But if you want to get deep into the topics and properly apply them in practice, you should probably take around six months and spend another four months on top of the eight weeks of the boot camp to properly learn that. Since I am kind of providing

Aurimas Griciūnas 4:00 also lifetime access to the materials, so the next cohorts, there are I update the materials, are also available for the previous learners, and go back, get up to speed with all of the changes in the industry.

Conor Bronsdon 4:14 And you've mentioned observability and evaluations as key areas within the AI space. And increasingly, I think we're seeing more and more conversation about this. And you in fact mentioned to me that you've considered starting a company in that area, but ultimately decided against it, calling the market too packed. Could you walk us through that thought process,

Conor Bronsdon 4:37 and how you were thinking about the AI infrastructure market today?

Aurimas Griciūnas 4:41 So when I decided to try and look into the space, I was leaving Neptune AI. It was kind of natural for me to try and maybe build something very similar in the similar space, more in the application layer. That's why the initial decision to actually research the space. But then kind of even after a few few of the first weeks, we kind of found 20 plus companies that are doing

Aurimas Griciūnas 5:10 observability and DDoS. Right? And apart from the hyperscalers who are also doing that as well. Right? So there are quite a lot, and all of them are covering also trying try to cover end to end because it's really hard to pinpoint what will be really important in the next few months. So I think that's why all of those companies are trying to do observability and evals and experiments and prompt registries and

Aurimas Griciūnas 5:39 maybe some of them routers as well. Right? So connecting the end to end traces and having end to end observability and evals of the system. So there was really no, probably no unique space to tackle that hasn't been already kind of picked up. And then probably there are also 20 companies in stealth building those solutions. And half of them are open source and available for free to host.

Conor Bronsdon 6:05 Speaking of open source, you are one of the folks who has correctly anticipated the need for agent interconnection. I've seen you talking about it for months now, maybe even back into 2024. And that space is obviously now being tackled by open standards like A2A and Agency, which have been donated to the Linux Foundation, obviously, MCP. How do you see the open source movement within AI changing the calculus for founders

Conor Bronsdon 6:32 who are trying to build venture backed companies?

Aurimas Griciūnas 6:35 Open source. Few thoughts here. So currently, in the first place, currently it is not easy to find a company and be successful, right? So you either build something that grabs the attention really quickly and you kind of reach some sort of escape velocity within the first months once the after you start building or you have really strong backing from the sea. So you have a lot of money and then you can actually build a big team and roll enterprise operations properly roll out enterprise operations properly.

Aurimas Griciūnas 7:08 Or you have distribution day one. So I guess those are the only ways how you can easily start a successful startup today. When it comes to open source, I'm a strong believer in open source, but it is really also hard to make an open source product a profitable business. Yeah. And enterprises do not work well with open source. Right? Enterprises need a very mature solution.

Aurimas Griciūnas 7:45 And that's also the reason why it is hard to build infrastructure company. Because when you build an infrastructure company, you need real engineering. Right? You cannot wipe code because there are many, many factors and requirements by enterprise companies like security, stability, being able to deploy on prem, say the support that comes with it, enterprise features.

Aurimas Griciūnas 8:14 And everything needs to be top notch. And then for some of the companies also need the ability to do hyperscaling on your own side because they are big companies. They might be ingesting a lot of data. And if your infrastructure is not specifically meant for that, then you will not be able to succeed

Conor Bronsdon 8:34 in enterprise space. So do you see this differentiation between the capital rich folks or the folks who have at least raised a lot of capital to take on infrastructure companies, kind of taking a very different approach from people who are, as you put it, vibe coding their way to success and maybe using their own built in distribution to try to quickly generate revenue.

Conor Bronsdon 8:58 How do you see these dynamics playing out in the market? Yeah, we'd love to explore that with you.

Aurimas Griciūnas 9:05 So when it comes to vibe coded tools and products, I think this can mostly be successful in b two c type of products because you're quickly capturing attention with some sort of a new idea from the broad public. And when it comes to building enterprise products, I still think that VC backed companies with large amounts of cash will be the winners. Unless you really build something really, really great really fast, and then you get a lot of money, and then you hire hundreds of engineers to refactor your vibe coded

Conor Bronsdon 9:42 I don't know. What what's your what's your take on this? Good question. I I think you're spot on that it really depends on the space. Every time I see someone trying to vibe code their way to a business solution, I just assume, maybe unfairly, that it's not gonna scale. That, okay, sure, this may work for a certain DevTools segment, then or or or maybe a a single

Conor Bronsdon 10:06 iCPU, if you have the ability to just kind of put a credit card in. But once you start going up against competitors in larger deals, and there's actual frameworks being applied about, okay, like, how's your security, how are your security and compliance protocols? Are you meeting our needs in these specific areas? Do we have the role based access control we need? All the things that enterprises, or even just larger scale ups are looking for,

Conor Bronsdon 10:29 I would expect you to see some some major challenges. And I do think there's a potentially viable path, we're maybe seeing this play out a bit, Vibe code your way to a cool demo, try to raise money off that cool demo, and then actually hire engineers to create the whole thing. And I wouldn't be surprised if there's quite a few companies doing that today. I'm not gonna name names. But that also creates a lot of hidden risks for founders

Conor Bronsdon 10:55 who choose this kind of high intensity path where it may help you get to that raise and maybe that's what you need, but there's a lot of pressure that comes with that as well.

Aurimas Griciūnas 11:06 But I guess if a really strong engineer is doing vibe coding, and usually we are not doing vibe coding, we are doing assisted coding in Fair. Efficient way, then maybe this kind of a tool coded in this kind of way could actually succeed. Right? You can build something good and you hire a team quickly once you get VC money and then you scale out. I think it's a viable approach if you have that engineering talent within the If founding

Conor Bronsdon 11:33 you if you are coming in and you're a nontechnical founder and you're expecting to be able to just vibe your code vibe code your way to initial success, I would be hesitant because I feel like you'll induce, or you'll introduce so many issues. Because to your point, I think you need to treat it like a partner. You can't simply just say yes to everything. It's very easy to refactor

Conor Bronsdon 11:56 things in the wrong direction and introduce a ton of long term challenges. So yeah, I do agree with you, though. If someone comes in and has technical expertise and understands what they're doing and they want to use AI as a partner today, I think that's a fantastic use case for it. And I think we'll see, and are already seeing, but we'll continue to see, a lot of

Conor Bronsdon 12:19 folks take on founding with AI as a key partner in their initial build out, their demo. And then I think the challenge will be, okay, how do you translate that to, we're a scaling company? Obviously there's a million people who've written books on that. I'm not gonna try to pretend I'm Paul Graham and say, oh, here's the approach you wanna take with But the inception point of going from

Conor Bronsdon 12:43 idea to MVP feels like it needs to just move so fast today. And I think it's a big opportunity for founders, but of course brings a lot of pressure as you start bringing in those VC dollars and another backing. Even if you have Raised and, you know, a successful vertical app, maybe even get to a million to an ARR, there's also this copy risk that's being introduced of, oh,

Conor Bronsdon 13:09 this could be copied easily now. It's a lot easier to just say, oh, great. Like, let's let's take it what our our rivals doing, and we're gonna do the same thing. How do you create that defensible moat? That's where my mind's at now of It feels like having a clever idea, I mean, or an early traction. I mean, obviously it's never been enough on its own. You still have to execute. There's many things that have to go right.

Conor Bronsdon 13:30 But I wonder if the moat of technology is actually less defensible today.

Aurimas Griciūnas 13:37 It is. I guess it is. And especially you you touched this point previously, like when you're doing an enterprise sale, right? So it's not like you're just coming in solo, a single company and trying to sell. Enterprise sales process is a very kind of known, has very known patterns where you would be benchmarked against ten hours and the vest would be chosen,

Aurimas Griciūnas 14:04 And then there are a few risks, either you're already entering a very hot market. So then how do you become better than others? Like even if you, like no one will pay just because you are a known person, right? They will pay because the product is good and better than ours. We have, need to have all of those features that we need and implement it in a more efficient way, way than your

Aurimas Griciūnas 14:30 competitors. And now when this copy risk exists, then a new product can very quickly be kind of coded. You could say white coded, but if you have a very strong engineering team with AI at their side, so maybe five strong engineers, you know, it's very easy to build even, for example, observability infra tool. Right? If you're a really great engineer and you're not the a single engineer in the company and then you use AI to some extent, you can very quickly build a observability and eval tool that

Aurimas Griciūnas 15:07 rivals tools like Langsmith, for example, or Langfuse. Right. Or yours. Not I don't know. I I never used Galileo,

Conor Bronsdon 15:15 but maybe Try it out. Let's know what you think. Yeah. So I guess let's make this practical then. If you were advising another founder today and they have an idea and they're like, how should I get started? How should I approach this? You know, we've talked a bit about a couple of these flash points that are now occurring where it's easier to get to MVP, it's maybe easier to copy. What would your advice to that founder be,

Conor Bronsdon 15:43 who is coming in with a unique idea, trying to think through how should they should approach it?

Aurimas Griciūnas 15:48 So I think this is also how many VCs think, is that it's not all about the product. Right? It's very it's it's a lot about the founder herself or himself. And what this really means is that can that person very quickly pivot and adjust to the changes in the market? So usually the first idea is not a great idea. So I would really try and maybe figure out how good their pivoting ability is.

Aurimas Griciūnas 16:21 Then this is the first one. The next one is how do you sell the product? How do you market the product? Because that's probably even more important than the product itself, at least at the very beginning, right? The distribution and reach is key. So I wouldn't even be too strict on the idea that the person is trying to build and rather see which part of the market

Aurimas Griciūnas 16:45 those founders are targeting and maybe looking a little bit back into their histories, what they have been doing before, and how they think about the industry in the first place.

Conor Bronsdon 16:58 I completely agree. I'll say my most successful agent investments so far have both been instances where the founders have pivoted and said, we didn't quite have this right initially, but we saw the potential of how smart and driven and thoughtful these people were, and they've found the path. And I think that's true of most folks who aren't doing investing, is the

Conor Bronsdon 17:23 founder first. Who are they? Will they actually take you down this path? Do they have the the grit and determination and the, you know, mental ability to think outside the box, but then also bring order to the chaotic ideas that they are putting out in the

Aurimas Griciūnas 17:40 And, no, the real kind of decision points probably come once you actually put out your first idea into a market, and then you get the feedback, then you get maybe some paying customers,

Conor Bronsdon 17:50 and then you kind of figure out what needs to be done next. I think that's one of the really exciting parts for a lot of entrepreneurs in the space right now, which is that it's so much easier to get to MVP and start getting that feedback faster. So you can say, Oh, I did this completely wrong. Or, Oh, great. We've got something here. Let's see where this goes.

Conor Bronsdon 18:09 And it's creating this intense market pressure for speed, both within larger companies and for entrepreneurs. How have you integrated this focus on teaching what's important and this idea of focus, which I would argue is increasingly important in a world where there's just so much happening and so much information and the cost of generating code or generating content is drastically decreasing.

Conor Bronsdon 18:40 How do you bring that into your course and your your work as you help advise and and understand what folks should be focusing on.

Aurimas Griciūnas 18:51 So my course is really focusing on fundamentals. So it's not tool focused course, even though we are using popular tools, of course. But taking a simple example, like I'm using Lang graph throughout the entire build out of, let's say, the capstone project. But at the same time, I'm not using Lang Smith. I'm using instructor for structured outputs, and I'm using instructor

Aurimas Griciūnas 19:21 wrappers within inside of LangRefNotes. So it's kind of teaching people that structured outputs are important. This is how you can achieve that. This is how it works. You shouldn't rely on those abstractions that hide the structured outputs that are actually achieved. Also, not using tool bindings, use the frameworks. Right? Actually prompting the LLM itself by produce tool suggestions,

Aurimas Griciūnas 19:52 by giving tool descriptions, etcetera. So anyway, so the main point is that the course is specifically about teaching fundamentals and real infrastructural patterns along the way. For example, observability is being taught day one, and then we move with observability and evals throughout the eight weeks. And I teach how to evaluate each different system. Yeah. So I think fundamentals are really important.

Aurimas Griciūnas 20:20 Now we are also seeing with the flop of GPT-five, I think it is a flop now already. Right? It's not as great as promised. Yeah. So I think that this iterative improvement of LLMs is happening, is starting to happen. Right? So AGI might not be so close. And I think the old old which are old things which are two years old are still very important in building agentic systems. Properly understanding how to context engineer.

Aurimas Griciūnas 20:51 And by the way, context engineering, I think, is a very, very important topic and very often be overlooked. Than learning how to build agentic systems, because it's not as easy as it looks like when you're building demos. You're not doing any context engineering usually. And, yeah. So these fundamental things, I think, are very important.

Conor Bronsdon 21:14 Are there particular gaps that you're noticing in the experiences or fundamental skills of folks who are either trying to grow their skills working with you or people who are out in the market today? You mentioned observability and evaluation. Obviously, we share the viewpoint. Those are crucial and need to be day one instrumentation pieces that continue throughout the entire life cycle of your AI application or agent.

Conor Bronsdon 21:40 But I'm curious if there are particular gaps that you're noticing out in the market today where people aren't really paying attention to fundamental skills.

Aurimas Griciūnas 21:49 So it depends on what you, which part you're referring to. Is it the boot camp itself? Because the boot camp is naturally not for the

Conor Bronsdon 21:58 top, top engineers. Yeah. Think I'm looking more broadly here of like, like, what are you seeing in the market as far as potential gaps?

Aurimas Griciūnas 22:05 Definitely, evals are is still a gap, right? So this is probably it's not an emerging topic. It's already an old topic, but not everyone is adopting the practice of eval driven development yet, even though it is crucial in building these systems. Then I think overreliance on some orchestrator orchestrators is bringing some problems eventually once the systems are starting to are starting to mature because then you need to go back to

Aurimas Griciūnas 22:40 base software engineering without using any wrappers. So people are taking too long to ship in some cases. I think the first MVPs are not being rolled out soon enough. The human feedback is not getting brought back into the system soon enough. Yeah. Definitely business understanding, people are not people building those systems are not always very close to business, and

Aurimas Griciūnas 23:10 they want to build something shiny and adopting some cool tech, right, but not necessarily solving a business problem. And then projects and products start being deprioritized because we are not showing any business value. Teams are building agentic systems in the basement for

Conor Bronsdon 23:33 five months, then they come out and the system doesn't solve a real business. Mean, like we just said, you have to just get out there faster and start getting that feedback. Yeah. Or else you're creating a risk point for yourself that you could be building in a silo. Most companies don't succeed that way. One

Aurimas Griciūnas 23:49 more thing. So I think that we are still early in MCP days, and sometimes I think MCP is being overused. It brings most value when you have remote MCP servers, right? But I don't think that we are, at any point we are not yet there where we can actually utilize them remote MCP servers properly, at least without significant engineering. Yeah. So sometimes just using tools within your code is also a good idea. You don't need to have MCP for everything.

Conor Bronsdon 24:21 Yeah. You don't always have to be using the new hotness. Doesn't have to be shiny. Sometimes fundamentals are fundamentals for a reason. You mentioned context engineering as one point of emphasis in the market today. I think there's been quite a bit of conversation around, I mean, everyone, the folks who are saying, Oh, prompt engineering is the way, and a lot of folks think, Yeah, I mean, it's a short term solve.

Conor Bronsdon 24:47 And then vibe coding and context engineering. There's been all these terms thrown about and different approaches that have been discussed as this is the new approach that we should be taking on here. And I'm curious from your perspective, you talk about context engineering as important and using a system focused lens. What would be your advice to engineers who are maybe under utilizing this and

Conor Bronsdon 25:15 or haven't explored it yet?

Aurimas Griciūnas 25:18 Okay. So the first very important point is probably that I when I was thinking prompt engineering, I was always thinking context engineering from day one. Because if you are building agentic systems, you cannot build agentic systems without context engineering. Right. So for agent agent builders, context engineering equals prompt engineering, because you need to store the actions somewhere, you need to compress the actions because the context window is just exploding if you are building systems of multi turn conversations.

Aurimas Griciūnas 25:53 I don't even have like, okay, I I do have suggestions on what needs to be implemented while performing context engineering, but if you are building a multi turn agenting system, you have felt the pain, I believe. Like the agent running for too long, the context window exploding to a few 100,000 tokens per single run. And then you need to you know, have maybe five runs

Aurimas Griciūnas 26:23 each 200,000 tokens for input, then it takes fifty seconds to complete and that's a chatbot. Yeah. So where to focus is probably the main ideas that I love in context engineering is the ability to compress the conversation history, I think, because that is definitely being able to also discard unnecessary actions or store them in so called scratch pad where you can later on pick it

Aurimas Griciūnas 26:59 up from. Maybe writing all of the state files to the disk, but then picking only what you really need for specific nodes in your agent system as needed. Then when it when it comes to tool usage, it's just regular patterns. We cannot avoid adding those additional tokens inside of your prompt because if you don't do that, then your tool calls will start erroring out. So you will not be able to properly

Aurimas Griciūnas 27:27 retrieve

Conor Bronsdon 27:29 structured outputs correctly. Right? Yeah. Tool optimization has been a big area of focus for us, I'll say, when we've been looking at agentic reliability and observability is understanding how can we better suggest opportunities to improve tool usage within apps, because it's a super common problem, as you're alluding to here, like one of the key places what'll fall is like an agent will just try to use the wrong tool over and over, or it will get stuck in this loop of trying to solve this problem without going back to first principles and thinking it through. So, yeah, there's there's some really obvious failure patterns to address there.

Aurimas Griciūnas 28:05 And and what kind of low hanging fruits do you have to suggest to the audience?

Conor Bronsdon 28:13 Yeah. Well, I'll say check out galileo.ai, and we can maybe maybe we'll maybe we'll contribute a we'll contribute a lecture here at your next boot camp, actually. I'm happy to happy to chat more about that. That'd be that'd be fun. Of course, let's do it. Yeah, we've been doing stuff around tool error optimization in terms of the platform. So basically looking to see if we can identify

Conor Bronsdon 28:36 from an agent graph or from a project with a bunch of traces, like, okay, what's consistently happening here if we aggregate these different traces together? Can we identify and basically use our inference engine on the back end to suggest fixes to agents where we can say, oh, we're seeing this tool issue. Maybe, for example, your application is supposed to be booking you a trip, and it is looking at, you know, trivago and Expedia, and it's always trying to use trivago first even if it fails, instead it's not thinking about it's, you know, secondary option to book a hotel.

Conor Bronsdon 29:14 How do we change the weights in that? So, we've been doing some of that through, like, automation within the platform through the what we we're calling our insights engine, where we're basically feeding the kind of evals dataset that's people have established in the platform. So the their metrics they've created, the traces, their logs, annotations they may have,

Conor Bronsdon 29:34 into our our judge, and the judge is then suggesting stuff. And then you can provide human feedback on the suggestions. So it's working pretty well so far. I think there's a lot more potential there, honestly. We're just scratching the surface. It's not something where I it's like, I think our long term goal would be something where it's creating this automated feedback loop where it's just like, oh, yes, let me go

Conor Bronsdon 29:53 Vibe code my app through this. I'm a little not Vibe code, Vibe improve, I guess. So was just like, okay, great. Like, here's the eval, here's the improvement. Let me take a reader really quick. Cool. Great. Check. But that's

Aurimas Griciūnas 30:04 the longer term dream, I think. So this is almost like a automated

Conor Bronsdon 30:11 error analysis, right? Yeah, we're trying to automate recalls analysis for errors. Good question. And it's not a 100% solved yet by any means, but we're starting to make some real strides with it. And you'll you'll see us start to do it through an MCP lens too, where you can access this, like, these different catalogs of different error types and, you know, pull in it through MZP and just do it through IDE too, where it's like, oh, here, great. We ran went and ran the eval. Here's a suggestion.

Conor Bronsdon 30:41 You know, awesome. Let me approve it. But that's all that's all very much on our our beta testing side of things right now. So Because the because I would say that

Aurimas Griciūnas 30:50 the observability tooling in general is great. Right? But it's not the hardest problem to solve. Right? When you're building agentic systems, the hardest problem to solve is to actually create those eval data sets. Yeah. Really, really hard. So from what I hear is here that you're not only targeting the actual improvements to the system given the eval data set, but also somehow

Aurimas Griciūnas 31:17 clustering the traces themselves.

Conor Bronsdon 31:20 Yeah, we've got a couple different ways we're doing it. Part of it's through just like new views. So we have like an aggregated graph view, for example, where you can look at like multiple traces at once and kind of see, I wish I had a good example handy right now, but you can just see like, okay, what paths did the agent take throughout this? How much are they overlaying? Where are their problems?

Conor Bronsdon 31:44 And then we're trying to do that in a much more automated way, as you point out. So I'm really interested to see where it goes. It's the stuff that gets me excited about the platform. It's like, okay, observability is a base layer, get that right, okay, and then try to do evals really well, and then ideally that should feed an improvement mechanism. And right now where that improvement mechanism is AI engineers going in and kind of tuning things themselves, but okay, the more we can do to just make it really easy for them to go, oh, yes, great. We see this. It's it's identified very quickly. We can go solve it very quickly. I think that's where this whole

Conor Bronsdon 32:20 evaluation and durability space is going to really expand to driving improvement.

Aurimas Griciūnas 32:25 I agree completely. Like, this is really the piece of a puzzle which is currently kind of missing because just It's nebulous. Yeah. Yeah. Too much, too much hours are going in into figuring out the eval data sets, like 70% of the entire project sometimes.

Conor Bronsdon 32:44 Yeah. We're we're getting faster at it, and I think our the fact that we have our our Luna two small language models fueling some of our, like, eval metrics. I I mean, the the challenge with those right now is maybe not by the time this episode comes out, but, like, for the moment, like, we have to fine tune those models to get them really accurate, but they're much cheaper and faster than if we're using an LLM call all across the board. So you can do this much more cheaply and much more effectively.

Conor Bronsdon 33:10 And I'll tell you, though, Adam may have to edit this out depending on when this episode comes out. Like, are going to go live with anyone can just fine tune their metrics using SLMs on platform, which will make things go way faster, hopefully. But this is all, again, is the edge of the platform. We're like, oh, we're not quite done with it yet. We're hopefully figuring it out. So that's that's the exciting stuff. That's the fun stuff. And what is your take on

Aurimas Griciūnas 33:36 all of these OpenAI open source models coming out now so you can actually pick that one up and fine tune it? I'm a big fan. Yeah. I I think I think that we should

Conor Bronsdon 33:46 I'm I'm very pro open source models. And in part because, like, I think if we don't open source most models, we're gonna have a standpoint where a couple of companies are just gonna monopolize in the long run. And I think that could be really negative for the broader economic picture and broader ecosystem of software. So I I think the the opportunity with smaller models to do more specialized tasks and for fine tune open source models is huge. And I'll say, like our Luna two models were originally based off of like LAMA models that we took and redid and fine tuned. So

Conor Bronsdon 34:21 yeah, I think the feature idea of having a model that is cheap to run and can run on your own hardware and really enable people to have cheap, excellent inference that is fine tuned to their tasks is very exciting to me because while I know it's not gonna necessarily, it's not gonna solve AGI, right? Much more reasoning, much more inference, way more GPUs thrown at that problem.

Conor Bronsdon 34:47 I think it can tactically solve a lot of problems as long as it is fed initially by these broader frontier models, where we're spending all this money to get them right. But So Yeah. I don't know. What's your take? Yeah. Yeah.

Aurimas Griciūnas 34:59 I believe that there's a need for fine tuning even in agentic systems. We need to fine tune for specific routes, but there's no buts, but I think that these open source models by OpenAI will be a great kind of leap forward for all of this research. Yeah. I mean, even thinking, getting my hands on one of those NVIDIA sparks, maybe putting it on my table and playing around open source myself.

Conor Bronsdon 35:31 Also say we've been oh, sorry. Go ahead. Go ahead. Yeah. No. No. I'd say so we we did this. I don't if you saw our research about our agent leaderboard. So we did an original version of this back in February, and then we just did an update recently, basically looking at tool selection quality for different LLMs across a variety of agentic scenarios that were aligned towards enterprises.

Conor Bronsdon 35:54 So it was like, okay, here's like a finance scenario. Here's banking, healthcare, insurance, telecoms. The idea being, okay, let's try to actually identify, like, these LMs effective within customer support agents that have to be very specialized for these different areas? And so we looked at both tool selection quality and then action completion with the idea of being like, did they actually complete what you want and solve your problem?

Conor Bronsdon 36:19 And honestly, one of the most impressive models we looked and most recent round was Kimi K2, which came out a few weeks ago as we're recording this for Moonshot. And yeah, like Quen 2.5, they're 72B, and then Kimi K2 both did really well in our analysis. So I think there is a big opportunity for open source models to feed a lot of this. And I'll say, probably by the time this episode comes out, we'll definitely have added these new open source OpenAI models too, because it's very exciting to see that ecosystem catching up. I know Gemini and Claude and GBT are gonna jump ahead again, but it's like, okay, let's

Conor Bronsdon 36:57 make sure that we're not leaving the boats behind, so to speak. Well, I've taken us completely off track here as we talk about agents, but it's been a ton of fun here. I do wanna ask you a bit more about how you're thinking through the future of the space. Obviously we've been talking about open source a bit. We've been talking about the fundamental skills an AI engineer needs.

Conor Bronsdon 37:19 But honestly, one of the things that really made me wanna have you on the show is how good the graphics are that you make on LinkedIn. And you make these complex AI concepts accessible. And I'd love to understand, as you peek into the future and think about what's ahead, what are the big misunderstood ideas or emerging trends that you're excited to showcase to the community next? What are you thinking about?

Aurimas Griciūnas 37:46 So there are a few areas that I think are underrepresented, especially in content creation. And one of them maybe is a step back, but it is data engineering for AI applications. So connecting the data layer with the actual application layer because, no, there are there's a lot of talks. There are a lot of talks about data engineering being kind of left behind, even though data engineers are doing most of the work to make these systems actually run.

Aurimas Griciūnas 38:19 And then there is no supporting content up on that. So maybe I was thinking maybe I should actually step into that direction a bit because talking just about agentic system designs, everyone is doing today, right? It's really, really, becomes, it's becoming really, really boring.

Conor Bronsdon 38:39 Yeah. Okay. Well, let's, let's dive in there a bit. I mean, personally, I'll say one of the things I've thought about with data engineering is that I almost feel like we are just recreating names for subtasks of what data engineering does with so much of what we've been saying about AI for the last year. What's your take on data engineering and what needs to be done to

Conor Bronsdon 39:04 make sure it's getting the attention deserves and also that it's being effective?

Aurimas Griciūnas 39:09 It has always always been a problem. I I was data engineer in my career for four or five years, I think, also leading data engineering teams. And even back then, machine learning was taking the center stage. Data engineering was never. So it's either system design or machine learning or now AI and AI engineering. Yeah. But now I I don't agree that I don't agree with people who are saying that AI engineering is just data engineering. That's not true. Data engineering is about

Aurimas Griciūnas 39:41 piping the data to where it needs to live, and AI engineering is about building those agentic system designs on top of the data that you have. So it's definitely not the same discipline. How to keep data engineering in the spotlight? I don't know. To be honest, I don't know. This is a long problem that we are facing and no one is really end up talking about data engineering enough.

Aurimas Griciūnas 40:09 Maybe education, just general education, maybe just running boot camps about it because people are interested in data engineering. For some reason, it's simply not taking the spotlight because it will never be hot, unfortunately. Data engineering is not the money machine that VCs are looking. That

Conor Bronsdon 40:30 that does leave it a little out of the spotlight. It's true. You've gotta find that money machine to to really get the attention you deserve.

Aurimas Griciūnas 40:38 Yeah. Because the data data engineers are saving costs. We are not really producing revenue

Conor Bronsdon 40:43 in a sense. Okay. So we need to need to more clearly show, I guess, more graphs showing the money saved. But help, maybe. But yeah. But but money saved is the money saved is also not hot. Money made is hot. Right? No. It's really not hot today at all. Looking at the burn rate of some of these companies, it's like, what? Yeah. Alright. What what other predictions do you have about let's call it the the next six months of AI. AI? I don't want to make you think too far out because it starts to get really blurry at that point. But

Conor Bronsdon 41:12 as you think through this massive wave of agentic conversations people are having, and I would argue some of the overhype that's happening on agents. Because I'll say personally, I'm kind of with Carpathian, this idea of like, yeah, this is the year of agents, but also it's gonna be a decade of agents. Like, we're not solving this tomorrow. But what are the things you're thinking about, though, as we head towards this next stage of AI development?

Aurimas Griciūnas 41:37 So six months a few months ago, I would have said that six months is really a very, very short amount of time. Like, a lot of things could happen, but now we are seeing the slowdown of improvement of LLM. So I think less and less stuff will be happening as we move forward in this amount of time. So in general, I think that what we will not see is definitely we will not see any big leaps in TI.

Aurimas Griciūnas 42:07 I think what we will not see, we will not see distributed multi agent systems in production yet, even though everyone is talking about A2A and how it will change the world because companies will start exposing agents as services. Not not in six months. I don't I don't believe in that. It's too hard to build multi agent systems for various reasons, but one of them being just regular observability.

Aurimas Griciūnas 42:37 It's it's really hard to instrument a distributed system, especially when it is long running agentic systems behind those distributed services. So what I'm really looking forward to is coding agency, coding CLI agents improving, Because we are already quite good. I and I I mean the agents that do not require writing any line of code. Like call call code for Yeah. Exactly. Because we are already quite good. So I'm really looking forward on how this develops. And I had a chat with a very brilliant engineer

Aurimas Griciūnas 43:16 a few days ago about this idea of writing specifications and allowing your agents to write your code and then, you know, throwing really making microservices ideas come to life where code is useless. Right? You can throw it away and you can just rebuild your entire service with the next iteration. So I'm looking forward on the next iteration of these coding agents because I think there's something here, and it will definitely change

Aurimas Griciūnas 43:46 software engineering as it is. Yeah. But the industry is now I think it will start moving slower, so six months is not that long the time frame. Do you think we need to basically take a new leap in scaling as far as massively

Conor Bronsdon 44:00 more amounts of energy, tons more GPUs, to kind of take the next step? Or do you think this is an inherent challenge with AI hardware today, and there's a need for a non transistor based architecture, or a new architecture? What's your kind of take on what's gonna get us past this, maybe a little bit of a wall we're hitting?

Aurimas Griciūnas 44:19 So we need to take one of two sides, Right? Either you believe that LLMs will allow us to kind of move forward through this barrier that we are facing, or you need to take a side that you will even need a different kind of architecture in on the model side, which will not be LLM based. So I'm rather on the second one. I think LLMs will not bring us to AGI, it might not even be hardware problem.

Aurimas Griciūnas 44:45 It might be the actual model problem, model architecture problem. Does it mean that this new breakthrough model architecture that we will find will require as much of compute as we currently are building for? I don't know. But I think that we would need this kind of competing file power for inference anyway. It's even with these kinds of models that we currently have.

Aurimas Griciūnas 45:13 Agree. What's your take? That's a good question.

Conor Bronsdon 45:17 I think I agree. I think we I think it's two I would I would name three challenges. So one, I mentioned energy. I think we're gonna need simply a lot more chips. I think we're gonna see a barrier in the next year or two where we just realize, hey, we need more nuclear reactors. We need more, you know, energy sources here. Like, we simply can't build the amount of data centers we want to

Conor Bronsdon 45:43 and have the type of power grid we want to with our current setup. So I think there's a massive investment needed there. We're starting to see that with Microsoft, for example, investing in reopening the recently shuttered 3 Mile Island reactor. We're seeing folks talking about small nuclear, we're seeing people talking about investing in natural gas in different areas, we're seeing solar being brought up, but I think that's a limiting factor we have to consider, and

Conor Bronsdon 46:10 outside of the hyperscalers, I don't see talked about as much as maybe it should be. Maybe it's because it's not really our problems if we're not on a hyperscaler, we're not on a data center, but like, we should be cognizant of that, and that obviously aligns to the secondary problem of like, yeah, we're we are still gonna need more inference. We are still gonna need more chips. We're still we're gonna need,

Conor Bronsdon 46:27 you know, more data centers. And I I definitely think that's an inherent challenge today with LLMs, but I've always been a skeptic skeptic about this idea of getting to AGI just by brute forcing with LLMs. And I look forward to be prove being proven wrong. We'll we'll see, but I I agree with you. I think we're gonna need to fundamentally make some change with the architecture.

Conor Bronsdon 46:51 The current LLM movement is incredibly useful. SLMs are very useful. We've made massive advances, but I I don't I mean, if we if we look at the basics of it, we're not truly creating thinking machines the way that I think has been built at times. Like, we're creating machines that are doing fantastic things for prediction and have incredible memories and data sets and do unique things, but

Conor Bronsdon 47:20 mostly they're predicting what should go next versus, I think, creating that new. And so it it just feels to me it's like to make a true breakthrough here and have a fundamentally different paradigm, we'll take just that, like a a new way of exploring the problem. We're gonna create a lot of business value. We're gonna create really interesting systems out of this. We can solve a lot of problems.

Conor Bronsdon 47:45 But I, I don't know that we're going to truly redefine how thinking works. And that to me feels like it is a different step. So I guess it's also a question of how do you define AGI? I think an LLM, I don't know, GPT six, or Gemini six, or whatever, could become a strong enough knowledge worker that it can just solve most business problems that are kind of inherent today? Sure. I think that's that's possible in the current architecture. And I think that's a huge amount of value and worth shooting for. But I don't think it's gonna create a new model of physics, I guess is how I'd put it. Yeah.

Aurimas Griciūnas 48:23 There's by the way, there's one more area that I'm kind of think that it could work, but I forgot to mention. So self improving agents, right? So agents still kind of seem like they are the way to go, at least short term, as with the LLMs that we do have, right? How do we make the agent rewrite its own code? And I think this is where we also will need a lot of inference,

Aurimas Griciūnas 48:51 and this is where a lot of the nuclear reactors should be going into. But how do we make agent? So we can make it the right new agents upon new agents like evolutionary algorithms do, right? Yeah. That's that's one potential. But I think this will be an area of research, active research in the next few years, at least two years maybe. Not six months, a little bit later.

Aurimas Griciūnas 49:16 I'm definitely looking forward to this one. Yeah. I think it's really easy,

Conor Bronsdon 49:21 because there's so many exciting things happening in this space, particularly the last couple of years, to expect that, oh, we're gonna solve this immediately. It's gonna be solved tomorrow. And in some cases, I've been surprised by the problems that are being solved, but agent swarms, and again, maybe I'll be proven wrong on this by the time this episode comes out,

Conor Bronsdon 49:37 it feels like truly having self improving, self growing agent swarms that are very successful for solving an actual enterprise's problems, not doing a cool demo, it's gonna take a bit of time still to get right, because there are these inherent challenges that we're talking about, can vibe credit our way to our demo on this. We can quickly pull something together. But actually solving a business problem in a way that makes money is a different challenge entirely. So Armas, I just want to say thank you so much for joining me today. It's been such a fun conversation. And I really appreciated you

Conor Bronsdon 50:07 kind of bringing us through your thought process and talking about so many of the things you're thinking about. Where can listeners go to in order to follow you and learn more about all the great stuff you're creating and thinking about?

Aurimas Griciūnas 50:21 So you can find me on LinkedIn, You can find me on X, and you can also find me on my newsletter, which is newsletter.swirlai.com. And very soon, I will be start also starting posting out YouTube videos, so you can also start checking my YouTube channel, which is still empty. It's already there for more than a year, I think. It's still empty. So in the upcoming month, for sure, there will will be some fresh videos coming in. Fantastic.

Conor Bronsdon 50:53 Well, I am excited to watch those, and I highly recommend to all of our listeners, definitely follow RMS on whatever platforms you're active on. His work's deeply valuable, and his his thought process is, I think, excellent. And while you're at it, make sure you're subscribed to the Chain of Thought podcast on whatever platform you're interested in here. Whether that's LinkedIn, you follow Galileo or Chain of Thought podcasts,

Conor Bronsdon 51:16 Spotify. If you're listening to Apple Apple Podcasts, we love our rating or review. Makes a makes a huge difference. If you're on YouTube right now, you can see Armis and I's smiling faces as we discuss the future of AGI and everything else. And just wanted to say thank you so much, everyone, for listening. And and Armis, thank you so much for joining us today. It's been an absolute pleasure. Thank you for having me.