Episodes · S2 E31

Architecting Reliable Agentic AI | Cisco’s Giovanna Carofiglio on the AGNTCY Collective

Giovanna Carofiglio, Cisco · 41 min

AI Agents · Open Source AI · MCP (Model Context Protocol) · AI Evaluation & Reliability · AI Observability · Enterprise AI

Key takeaways

  • Giovanna Carofiglio describes AGNTCY as an open-source collective Cisco, Galileo, and LangChain launched as founding members “in March.” The premise: for agents built in different frameworks and deployed remotely to collaborate, the ecosystem needs interoperable, distributed agentic communication — what she calls a new internet revolution.
  • She lays out AGNTCY’s pillars as discovery, compose-and-deploy, then communication. Agents get found by skills, publishers, tool compatibility, and reputation (an area she says “Galileo really helps build”), via a discovery service launched “a week ago.” Once located, they collaborate where they already run, often served as a service.
  • On protocols, Giovanna separates layers: MCP handles agent-to-tool interaction, especially for remote tools behind MCP servers, while agent-to-agent is trickier — AGNTCY worked on ACP with LangChain, and Google has since proposed A2A. She wants AGNTCY to support all of them, over a lower-level transport layer akin to TCP.
  • Cisco’s transport protocol SLIM was “recently announced” for group-based, low-latency, secure, interactive communication. Giovanna argues point-to-point isn’t enough: when communication is driven by natural-language questions, multiple agents collaborate, so the foundation should be secure by design, low latency, group based, and data centric.
  • For observability, Giovanna says AGNTCY — with Galileo, LangChain, Traceloop, Pydantic, and 50-plus partners — is defining an interoperable standard schema for agentic communication, extending OpenTelemetry so MELT telemetry stays open. An SDK supporting the schema released “two days ago,” and she wants to reconstruct the agentic graph.
  • Giovanna frames predictability as explainability: even stochastic, “magical” model behavior can be explained. Working in the collective and with Splunk at Cisco, she wants to test the space of solutions and keep output consistent for similar inputs. Her frontier is “active evaluation” — quantifying the margin for improvement, then remediating.
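As a rough illustration of the discovery pillar in the takeaways above, the following Python sketch filters a small in-memory directory by skill, publisher, tool compatibility, and reputation. It is not AGNTCY's discovery service or API: `AgentRecord`, `discover`, the field names, the sample agents, and the 0-to-1 reputation scale are all hypothetical, chosen only to make the four search criteria concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical directory entry; not the AGNTCY schema."""
    name: str
    publisher: str
    skills: frozenset
    tools: frozenset
    reputation: float  # assumed scale: 0.0 (unknown) to 1.0 (trusted)


def discover(directory, skill, publisher=None, tool=None, min_reputation=0.0):
    """Return agents offering `skill`, highest reputation first."""
    matches = [
        agent for agent in directory
        if skill in agent.skills
        and (publisher is None or agent.publisher == publisher)
        and (tool is None or tool in agent.tools)
        and agent.reputation >= min_reputation
    ]
    return sorted(matches, key=lambda agent: agent.reputation, reverse=True)


# Made-up sample data for illustration only.
DIRECTORY = [
    AgentRecord("summarizer", "acme", frozenset({"summarize"}), frozenset({"search"}), 0.9),
    AgentRecord("digest-bot", "globex", frozenset({"summarize", "translate"}), frozenset(), 0.6),
    AgentRecord("router", "acme", frozenset({"route"}), frozenset({"search"}), 0.8),
]

found = discover(DIRECTORY, "summarize", min_reputation=0.5)
print([agent.name for agent in found])
```

Optional filters default to "no constraint", so a caller can search by skill alone and narrow by publisher, tool, or reputation only when needed.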

Frequently asked questions

What is the AGNTCY collective and who founded it?
Giovanna Carofiglio describes AGNTCY as an open-source collective launched “in March,” with Cisco, Galileo, and LangChain as founding members and 50-plus partners. Her premise: for agents built in different frameworks and deployed remotely to collaborate, the ecosystem needs interoperable, distributed agentic communication — a new internet revolution like the one Cisco pioneered years ago. Its pillars: agent discovery (by skills, publisher, tool compatibility, and reputation), compose-and-deploy (agents collaborating where they run), then communication, observability, and evaluation.
How does Giovanna distinguish MCP, ACP, and A2A, and why a transport layer beneath them?
Giovanna says MCP and ACP aim at very different objectives. MCP targets agent-to-tool interaction — valuable especially when tools are remote rather than integrated, exposed like an API behind MCP servers — and she notes it’s popular because it’s simpler. Agent-to-agent is trickier: AGNTCY started from LangChain’s agentic protocol (ACP), and Google has since proposed A2A. AGNTCY wants to support them all. Her key argument: these application-layer protocols need a lower-level transport layer beneath them — secure by design, low latency, group based — like TCP underpins the internet.
How is AGNTCY approaching agentic observability, evaluation, and standards?
Giovanna says the foundation is defining an interoperable standard schema for agentic communication — the layer that lets agents be instrumented so metrics, events, logs, and traces (MELT telemetry) reconnect — with Galileo, LangChain, Traceloop, Pydantic, and 50-plus partners by extending OpenTelemetry to keep it open. An SDK supporting the schema released “two days ago.” Beyond visibility, she wants to reconstruct the agentic graph (noting Galileo released a way to do this) so developers and enterprises can see how agents communicate, how data passes between them, and how tools are called.
How does AGNTCY think about agent identity and security at scale?
Giovanna says AGNTCY recently released an agent identity component. It started from a schema for identifying agents — she calls it “OSF” [AGNTCY’s published schema is OASF] — meant to do for agents what OCSF (Open Cybersecurity Schema Framework) did for security: become a lingua franca. AGNTCY pushed a first version and welcomes contributions on what defines an agent, since even that can be challenged. The goal: agents uniquely identified, with clear provenance for agent and data, spelling out their skills to be discovered and rated. She frames identity as the start of broader security work.
What does Giovanna want to see next from Galileo and the collective?
Giovanna says she wants Galileo to bring its work on defining and recommending metrics — especially for multi-agent systems, agentic context, and communication — into AGNTCY, including sample metrics and a metrics computation engine to guide developers through assembling an application. She praises Galileo’s Luna small language models, saying evaluation should be constrained in budget and time, and that deep evaluation with SLMs is “amazing.” She names evaluation that needs less ground-truth data as an exciting frontier, and points listeners to the observability and evaluation group.
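To make the "reconstruct the agentic graph" idea from the observability answer above more concrete, here is a minimal Python sketch that rebuilds a caller-to-callee graph from trace-like records. It relies only on the general tracing notion that each span carries its own ID and its parent's ID. The dictionary keys (`span_id`, `parent_id`, `kind`, `name`), the `build_agent_graph` helper, and the sample trace are assumptions for illustration; they are not the AGNTCY schema, the OpenTelemetry API, or Galileo's implementation.

```python
from collections import defaultdict


def build_agent_graph(spans):
    """Map each caller name to the (kind, name) pairs it invoked."""
    by_id = {span["span_id"]: span for span in spans}
    graph = defaultdict(list)
    for span in spans:
        # Root spans have parent_id None, so the lookup returns None.
        parent = by_id.get(span["parent_id"])
        if parent is not None:
            graph[parent["name"]].append((span["kind"], span["name"]))
    return dict(graph)


# Made-up trace: a planner delegates to a researcher (which calls a tool)
# and then to a writer.
SPANS = [
    {"span_id": "1", "parent_id": None, "kind": "agent", "name": "planner"},
    {"span_id": "2", "parent_id": "1", "kind": "agent", "name": "researcher"},
    {"span_id": "3", "parent_id": "2", "kind": "tool", "name": "web_search"},
    {"span_id": "4", "parent_id": "1", "kind": "agent", "name": "writer"},
]

for caller, callees in build_agent_graph(SPANS).items():
    print(caller, "->", callees)
```

Keeping the `kind` on each edge is what lets a viewer distinguish agent-to-agent hand-offs from tool calls, the two interaction types the episode separates.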

Chapters

  1. 00:00 Introduction
  2. 01:00 Overview of Agent Interoperability
  3. 02:20 What is AGNTCY
  4. 03:45 Agent Discovery and Composition
  5. 04:38 Agent Protocols and Communication
  6. 05:45 Observability and Evaluation
  7. 07:00 Metrics and Standards for Agents
  8. 09:45 Challenges in Agent Evaluation
  9. 14:15 Low Latency and Active Evaluation
  10. 23:34 Synthetic Data and Ground Truth
  11. 25:07 Interoperable Agent Schema
  12. 26:37 MCP & A2A
  13. 30:17 Future of Agent Communication
  14. 32:03 Security and Agent Identity
  15. 34:37 Collaboration and Community Involvement
  16. 38:28 Conclusion

Show notes

The Internet of Agents is rapidly taking shape, necessitating innovative foundational standards, protocols, and evaluation methods for its success.

Recorded at Cisco's office in San Jose, we welcome Giovanna Carofiglio, Distinguished Engineer and Senior Director at Outshift by Cisco. As a leader of the AGNTCY Collective (an open-source initiative by Cisco, Galileo, LangChain, and many other participating companies), Giovanna outlines the vision for agents to collaborate seamlessly across the enterprise and the internet. She details the collective's pillars, from agent discovery and deployment using new agentic protocols like SLIM, to ensuring a secure, low-latency communication transport layer. This groundbreaking work aims to make distributed agentic communication a reality.

The conversation then explores the critical role of observability and evaluation in building trustworthy agent applications, including defining an interoperable standard schema for communications. Giovanna highlights the complex challenges of scaling agents to thousands or millions, emphasizing the need for robust security (agent identity with the OASF schema) and predictable agent behavior through extensive testing and characterization. She distinguishes between protocols like MCP (agent-to-tool) and A2A (agent-to-agent), advocating for open standards and underlying transport layers akin to TCP.



Follow the hosts

Follow Atin

Follow Conor

Follow Vikram

Follow Yash


Follow Today's Guest(s)

AGNTCY Collective: agntcy.org

Connect with Giovanna on LinkedIn

Learn more about Outshift: outshift.cisco.com


Check out Galileo

Try Galileo

Agent Leaderboard

Transcript

99 segments

Conor Bronsdon 0:05 Welcome back to Chain of Thought, the podcast that positions builders to put AI into production. I'm your host, Conor Bronsdon, Head of Developer Awareness at Galileo. You may notice if you are watching on YouTube, we are not at home today. I'm not sitting in my office. I am delighted to be at Cisco's office on Santana Road in San Jose. And we're going to be diving deep into the fascinating world of agent interoperability

Conor Bronsdon 0:31 and the infrastructure that will power the next generation of AI systems. I'm thrilled to be joined by Giovanna Carofiglio. She's a distinguished engineer and senior director at Outshift by Cisco, where she's at the forefront of building the Internet of Agents. And we're delighted to be part of their AGNTCY collective effort with LangChain and with Galileo leading the steering committee alongside Outshift by Cisco.

Conor Bronsdon 0:56 Giovanna brings an in-depth perspective to this conversation. She's one of Cisco's youngest distinguished engineers. She has an incredible research background, a background in mathematics, originally from Italy, based in Paris today, very jealous, with a remarkable background that spans from reinventing internet protocols with information centric networking,

Conor Bronsdon 1:16 to now architecting the open standards that will enable AI agents to collaborate seamlessly across the enterprise and across what we think will be an Internet of Agents. Now, she's working to reinvent how the Internet's communication protocols will work with agents in particular. Giovanna, welcome to Chain of Thought. My pleasure. Thank you for having me. We really appreciate you making the time. I know you're only in the States for a couple weeks, and so it's fantastic that we got a chance to sit down with you. The timing really couldn't be better for this conversation

Conor Bronsdon 1:47 coming off the heels of Cisco Live and some of the really cool announcements there. Though this recording will likely go out in a couple of weeks, so if you already heard the announcements, pretend it's happening live for you right now. And there's so much momentum building around the AGNTCY collective, the open source collective that you are helping to lead.

Conor Bronsdon 2:06 We're witnessing the foundation for this emerging agentic stack for a world where agents are embedded throughout the enterprise, throughout organizations. Giovanna, let's start with that big picture. What is AGNTCY? What is this collective we're talking about? And why is an Internet of Agents what you think the future will be?

Giovanna Carofiglio 2:27 Sure. So AGNTCY is an open source collective that Cisco, Galileo and LangChain as founding members launched in March. And that's because we believe that in order to have agents coming together and collaborate, even when developed with different agentic frameworks and deployed remotely, that really calls for a new internet revolution as the one Cisco pioneered

Giovanna Carofiglio 2:58 many years ago, where we really provide the developers or the enterprise that deploys these agents the tooling for interoperable, large scale, distributed, agentic communication. And this is what AGNTCY is about. If you want to go through the main pillars of AGNTCY, essentially it's around identifying and discovering agents. We think that there will be agents specialized,

Giovanna Carofiglio 3:31 a bit like the subject matter experts that we have in our teams, in given domains or for doing given tasks or attached to tools. And for that to come together, what's important is that we enable a discovery. So this is what we call in AGNTCY agent discovery, and we have recently launched, a week ago, the service that powers agent discovery, and this is where we're going to be able to go and search agents

Giovanna Carofiglio 4:03 based on skills, based on publishers, based on compatibility with given tools, and based on their reputation, which is something that Galileo really helps build. So this is discovery phase. The second one, which is very important, is the compose and deploy. So once you have located these agents, what's really important is that they come together, again, where they are, they can be served as a service, they don't have to be deployed, which is most of the case these days.

Giovanna Carofiglio 4:38 And they will interact again across the internet. For that to happen, what's important is that we rethink the agentic protocols. So we have seen a lot of protocols coming out these days, especially for agent to agent. ACP is a protocol we have worked on with LangChain as an extension of agentic protocols. A2A has been proposed recently by Google, and we want to support all of them. MCP

Giovanna Carofiglio 5:06 is a very popular protocol these days for agent to tool interaction. This happens at application layer, but for that to be efficient, and when I say efficient means secure, low latency, interactive, that also calls for a transport layer protocol, which is funny because this is what we were working on days ago with information centric networking. And this is what we are doing in AGNTCY

Giovanna Carofiglio 5:34 with a protocol named SLIM that was recently announced in the past days, where that enables group based low latency secure interactive communication. I love it. That's one aspect, but I really want to go into observability and evaluation. Our favorite topic. Our topic that we work on with Galileo. I think this is really key to unleash the power of agentic application, because we need to help developers. So the first step there, we are working together again with Galileo,

Giovanna Carofiglio 6:09 LangChain, Traceloop, Pydantic and many others of the 50-plus AGNTCY partners is to define an interoperable standard schema for agentic communications. This is the underlying layer for agents to be instrumented and to reconnect all the metrics, events, logs, traces, so the MELT telemetry that comes out of them, and we are trying to do that extending OpenTelemetry standards, because of course agents will come within application

Giovanna Carofiglio 6:53 and we want that to be as open as possible. And so there we are driving the communication standards. We recently released two days ago, the SDK that supports the schema and we are working with partners such as Galileo and the collective. That's the foundation, but what really excites us is that it really opened then to doing a bit more than just providing visibility.

Giovanna Carofiglio 7:25 So the next step, and this is something that we really think is important, is to be able to explain this agentic communication. So today, we have seen a lot of in agent observability or even evaluation. What we want to do next is really to provide the visibility of the agentic graph. I know Galileo has released these days a way to reconstruct the agentic graph.

Giovanna Carofiglio 8:02 This is very important. We want to have that in AGNTCY, released as one of our next components, because this will give both the developer and the enterprise deploying the application in production the way to understand how the agents communicate, how the data are passed between one agent and another, how the tools are called, and this opens to much deeper analysis.

Conor Bronsdon 8:32 Yeah. Absolutely agree. I think it's crucial that we make debugging and understanding agent interactions much easier. It's why we released our graph view so that you can see the basic trace and understand it in a different way much more visually, which I think is especially great in order to enable not just developers, but business users and other folks who are building agents or seeing agent interactions.

Conor Bronsdon 8:55 It's also why we're adding other views like timeline view, which we recently released. So you can actually see on a timeline, as different agents are working together, how are they communicating? Where are they calling tools? How do they work together? And then messages. So you can see the full trace within how that agent will experience it and how an individual will experience it at the other end. And I think it's really important we have all these views and more to come because

Conor Bronsdon 9:20 the interface for agents is so based around natural language right now, but it's rapidly expanding to include multiple different mediums. And I would expect we're gonna have a variety of ways that we need to interface with these agents and which they need to interface with each other. And so being able to understand that communication, make it more predictable, make it more reliable,

Giovanna Carofiglio 9:42 is crucial. And it's why we've released our AI reliability features and are excited to bring a lot of that to the AGNTCY collective as well. Totally. I think this is a key point if we really want to see this application in production. So these days we see a lot of single agent or some talk about multi agent. We want to go to the next level or really see business application

Giovanna Carofiglio 10:05 adopting these agents in a trustworthy way. And so for building that confidence, it's really important that the evaluation goes deep into explaining how this system works because they are much more complex than normal application. And this is something that we want to drive in AGNTCY and that also companies such as Galileo are driving. So there I would say that there are a few aspects that are really important. So first of all, the selection of good metrics

Giovanna Carofiglio 10:43 and not too many, because we don't want to confuse the users. We really need those metrics that are key to capture the dynamic of these multi agentic systems. I want to recommend the user which one to use and think this is one of the themes that we are also working on. We want that evaluation to be done in a cost effective way. So today it can be a lengthy task.

Giovanna Carofiglio 11:11 It can involve a lot of models and dataset creation. We want to have that as a seamless process that is really integrated into the development and that builds this trust on the agentic system. Absolutely agreed.

Conor Bronsdon 11:31 We need to start bringing the metrics for agents, the observability for agents to developers versus forcing them to always go somewhere else and, you know, pull that information into their platform of choice. It's why we're doing things like it may be live by the time this releases actually, using MCP to bring agentic metrics and visibility into the IDE. So you can simply prompt your way through using Galileo

Conor Bronsdon 11:58 by ingesting our docs, ingesting the AGNTCY docs, and any AGNTCY-related agent can easily add observability with OpenTelemetry with Galileo. That's why we're really excited about the AI reliability features that we are rolling out as part of our AI reliability platform specifically built for agents. And I know that's something you're really passionate about is this idea of creating predictability

Conor Bronsdon 12:21 for agents because, obviously, the nondeterminism is the magic. Right? It is the opportunity. But we have to have structures within which it operates. We have to have some predictability. We have to have some explainability and reliability. I'm using a ton of "-ability" phrases here, but it matters, I swear. Tell me more about your perspective on how to create predictable agents that do what we need them to do and

Giovanna Carofiglio 12:48 have enough trust to actually go into production. Yeah. Yeah. We all love the fact that this model seems kind of magically doing their job. But maybe because of my background, I think it's really important we get to explain because even the stochastic nature can be explained. And so what we are working on and in the collective, also at Cisco with Splunk, it's really to make

Giovanna Carofiglio 13:18 the agent or the agentic collaboration predictable. That means that we want to explore by testing all the possible state of solution or space of solutions that these agents will work in and try to characterize patterns, make sure the output is consistent where the agents are provided with similar input. It also goes into recreating, reconstructing the normal behavior

Giovanna Carofiglio 13:56 that for these systems is not so trivial. Doing that I think is going to be very important, for enterprise to have confidence about the output of the systems.

Conor Bronsdon 14:12 And so this is something that is really important in my in my view. Yes. Absolutely. And there's also the challenge of latency, which we were talking about a little bit before we started recording. Because we have this expectation of of real time, often LLMs will not fulfill all the needs of evaluation systems. Often, human feedback will not fulfill all the needs of evaluation systems.

Conor Bronsdon 14:38 Now there are places for those. LLM as a judge is a, you know, at this point, kind of well known technique, continuous learning through human feedback. Galileo has it. Many other folks are starting to bring in human feedback. I know Cisco's leveraging it. And leveraging those SMEs while also creating your digitized SMEs with an LLM as a judge can provide a lot of excellent observability and evaluation feedback to AI systems and agents in particular.

Conor Bronsdon 15:05 However, if you need them operating in real time, you also need guardrailing. You also need lower latency metrics. Do you have any insights onto how you think that should be approached by teams?

Giovanna Carofiglio 15:19 Sure, sure. I think you're touching upon very interesting aspects. So we I would say that we have been working on the low latency from an agent communication perspective. As I was mentioning before, we're working on this group communication where you can think that multiple agents with different skills will, when provided with a question will be able to answer and

Giovanna Carofiglio 15:51 combine their output and that really calls for a low latency communication protocol because otherwise it's really unusable. So this is very important. Think in terms of evaluation of this system, we also want to monitor latency, but I would say that the step farther that I found really exciting is to be able to provide recommendation for improvement. And so doing what we can call an active evaluation. So not just observe and score

Giovanna Carofiglio 16:24 the existing, but really trying to quantify the margin for improvement and provide recommendation, whether it's to the developer, the composer of this application, or even the enterprise running in production to optimize, optimize for latency for sure, optimize for costs and costs, as you mentioned before, related to running this application and costs related to the evaluation. I think what they're doing is absolutely

Giovanna Carofiglio 16:55 awesome and key if you want to have this three sixty degree evaluation, must be cheap and fast if you want to have it. And so yes, to me, the frontier which I found really exciting is that to go beyond just evaluating and really providing remediation and helping root cause analysis and fixing errors even in real time when the application is running. So I think this is where we should go next. Completely agreed. And obviously,

Conor Bronsdon 17:28 we've done a lot of research on that with our team here at Galileo as part of creating our Luna family of evaluation models. Folks who have checked those out may know that they enable lower latency evaluations. They enable real time guardrailing for things like PII, toxicity, prompt injection attacks, and more. And we have actually now released Luna specific

Conor Bronsdon 17:58 metrics for agents as well. So tool selection quality, action advancement, is the agent actually taking actions to move forward, action completion, it could actually fulfill the task you want, tool errors. We now have a family of Luna metrics that are fine tuned with these SLMs we've developed so that we can have significantly lower latency agentic evaluations and feedback as well as that guardrailing to enable that production challenge that you're talking about here. And we're really excited about that and excited about how that's also fueling our insights engine,

Conor Bronsdon 18:29 where we can say, hey, great. Like, here, based off of these, extremely low latency real time metrics, here are the insights that we have. Here are suggestions for changes you can make. And to me, that's the magic. I see this huge opportunity to keep leaning in there.

Giovanna Carofiglio 18:43 Yeah. Especially think about it if you're you're gonna be able to do it live. Yeah. If you're gonna do to be able to embed this evaluation with the systems and provide recommendation as it goes, that would really, I think, check all the boxes from giving transparency and control and even the margin for improving these systems. I really

Conor Bronsdon 19:06 hope that they're getting there. I think we're there, which is really, really exciting. The next step for me is gonna be, okay. How do we now enable rapid implementation of that? Do we do it, like, in the IDE through MCP that we talked about earlier, where you can say, okay, here's my, like, I'm building this application. I'm experimenting.

Conor Bronsdon 19:28 Oh gosh, like here's a problem. Great. Let's hit complete. We're good to go. Do we do it through integrations with agent providers and folks who are building agents? Like, an example of this might be like n8n, where you can look at our agent graph and say, oh, like, here's where tool erroring is happening. Suggestion is to move this step later. Great. Just hit accept, it flows back and then your agent changes.

Conor Bronsdon 19:51 Now we're not quite there yet, but I think that's where things are going. And I'm really excited about that, what that future looks like, especially once we can solve these protocol challenges around actually having great communication. Because imagine these self improving agents that folks are working on. And I think there's so much opportunity for self improving evaluations, self improving agents that say, hey. Let me take the inputs out of these these evals,

Conor Bronsdon 20:15 these observations that are coming through these small language models, these Luna models that we have, and then apply them in real time to improve myself based off of whatever guardrails have been suffering. Which implies

Giovanna Carofiglio 20:25 other AI. Yeah. And reinforcement learning. Yes. That that that's also great. Yeah. You were mentioning about the the the Luna models. I think what's also interesting there is that Galileo calls for custom metrics. It's really important that we do this evaluation in a way that is really targeted to what is the intent of the agentic system. So this is something that we also want to open source as example to the And

Giovanna Carofiglio 20:54 so one of the things that again is coming in AGNTCY is this metrics computation engine, where a few examples of important metrics related to the evaluation of the agentic communication, agentic framework, the task delegation, workflow efficiency

Conor Bronsdon 21:10 will be there. I absolutely agree with you. And it's the same reason we've added automatic insights and metric suggestions within our platform as well. And we're really excited to work with AGNTCY to bring some of that to the collective because, I think it's so essential that if I come in and I have a customer support agent, like, yeah, I can suggest four or five metrics from the Galileo platform

Conor Bronsdon 21:33 are already in there and likely will work. And probably a couple open source ones with AGNTCY that, yeah, you should probably have context adherence and tool selection quality. But maybe we have four custom metrics that we've developed or that we can suggest that we develop with you, that you can do live with an LLM as a judge or an SLM or by code. And I think that is where we can really start this data flywheel moving.

Conor Bronsdon 21:59 As we get customization around these different systems, I mean, enterprises have such complex internal systems that not everything is gonna run on open source. Not everything is gonna be simple. So how can we enable the base standards to work together and then customization based off of the direction that each system goes. Absolutely. What else are you thinking about when it comes to agents right now? What is top of mind for you when it comes to making sure they succeed? Beyond evaluation?

Giovanna Carofiglio 22:26 Well, I mean, we can dive deeper into evaluation I'm just so that's really thinking about bringing the evaluation in a different way, Observability and evaluation. So today we are going through the standard route of instrumenting and collecting data and computing upon, which is required. And as we said, there are a lot of things already there to do, a lot of challenges.

Giovanna Carofiglio 22:50 But I'm thinking whether it's possible, I mean, is just a thought these days to really embed these observability and evaluation as agents within the agentic that would be able to work in a localized manner while the application is running. I think there is a lot of AI that will help not just in terms of what we observe, but how we observe. So yeah, I think the future will

Giovanna Carofiglio 23:28 bring a lot of new opportunities for this drafting also how Another we're

Conor Bronsdon 23:34 doing

Giovanna Carofiglio 23:35 thing that really passions me is the capability to do evaluation without a lot of ground truth data. I mean, you know this better than mine, Galileo, you're trying to remove this need for good ground truth data set because in many cases you

Conor Bronsdon 23:54 just don't have them. Well, it's also why we have synthetic dataset. Yeah. So, I mean, I a, I agree. Like, we're not always gonna have ground truth to work off of. And it's part of why we've added synthetic dataset generation to our platform so you can generate your ground truth. Especially when you're creating these custom metrics, it may be like, oh, we want this idea.

Conor Bronsdon 24:13 But I think part of this too, it comes down to continued fine tuning of your metrics and of your ground truth. Because what you may initially establish may change over time. You make an SME feedback. You can use auto tune metrics, Galileo, for example, to provide that human feedback and auto tune the metrics, go back to the SLM or the LLM and say, hey. Great. Like, adjust based off of this.

Conor Bronsdon 24:34 But I think it's a really interesting challenge to your point where we increasingly have, I mean, like datasets built on datasets and AI built on AI datasets. And I'm really optimistic about what we're doing with synthetic data sets, not just at Galileo, but all over the world. Yeah. Also to reduce the need for human feedback. Yes. Because this is there, but it doesn't scale as much as we need. Right. It's crucial to have at certain points, but if you try to do it across the board,

Conor Bronsdon 25:03 not enough time, not enough humans. What else are you thinking about as far as observability and evaluations when it comes to AGNTCY? Where do you see the next six to twelve months going? I think,

Giovanna Carofiglio 25:13 first of all, I think we should succeed in making this interoperable agentic schema an instrumentation standard, and this is something that is happening in OpenTelemetry. To me this is the key, because for sure agents will be developed with multiple frameworks and interconnected; the entire Internet of Agents principle relies upon that. So this is something that I see happening because

Giovanna Carofiglio 25:45 again, accelerated by the AGNTCY collective, we see a lot of partners and companies in general pushing for that. And the other thing that, again, companies such as Galileo are promoting is really the capability to have this evaluation done in a compact, effective way. This will, I think, foster the development of more complex applications, because people will be less scared about the complexity of these systems. So this is to me something that is hopefully coming in the next six months,

Giovanna Carofiglio 26:27 while of course after that, we'll see the evolution, and we will try to adjust observability and evaluation as well

Conor Bronsdon 26:35 with the very rapid development of these agentic systems that we are observing. Let's get specific for our audience. I know a lot of folks listening are familiar with the Agent Connect Protocol. A lot of folks listening are familiar with the Model Context Protocol. But some aren't. Some may only know one. And I think it's important to set the ground truth of this conversation.

Conor Bronsdon 26:58 What are the differences between the Agent Connect Protocol and the Model Context Protocol? How can they work together, and when should they be used?

Giovanna Carofiglio 27:07 Yeah. First, let me say that with protocols, and this is true in general, beyond agentic systems, you see multiple protocols emerge trying to achieve the same thing before a few of them really become the reference. So I see this happening today, and in AGNTCY we want to support them all. Now, you mentioned specifically MCP and ACP. Well, they're trying to achieve two very different objectives. So MCP is really aimed at

Giovanna Carofiglio 27:44 agent-to-tool interaction, which is very important, especially when the tools are remote and not integrated within the agent. So this is a protocol that we want to support; we are already integrating and supporting it in AGNTCY, also in terms of observability. Now, again, this is to me related to agent-to-tool interaction; it's coming sooner and is very popular

Giovanna Carofiglio 28:19 because it's much simpler for these agents to use tools, and for tools to provide a way, like an API, to be used by agents behind MCP servers. That's MCP; it's really becoming popular. And again, we want to support it. Agent-to-agent communication is a trickier one. Even there, we tried to start from the protocols existing at the time, the agent protocol from

Giovanna Carofiglio 28:52 LangChain. Now we see the emergence of other protocols, A2A for instance, but there are many others. I think that there we really need to progress and see what's really required. But to me what's really important is that we power that with a lower-level transport layer that supports them all. It's a bit like what TCP has done in the internet protocol stack for a long time. We want to have that

Giovanna Carofiglio 29:32 good foundation. And this time, I would say, again based on my past work, we want to make it secure by design, low latency, and group based. Because I think with agents we're really beyond this agent-to-agent communication in most cases; especially when the communication is driven by natural language questions, multiple agents will collaborate.

Giovanna Carofiglio 29:59 And this is also one of the frontiers of agentic communication. We want these agents, which today we are stitching into an agentic graph, to be able to autonomously collaborate. And that really calls for a bit more than just one protocol or point-to-point connectivity.

Conor Bronsdon 30:22 It's a great point. And, Giovanna, I'd love to understand more about that vision of the future. As we move from a couple of agents interacting to thousands or millions of agents, what are the biggest technical challenges that you're anticipating, and how is AGNTCY seeking to solve them?

Giovanna Carofiglio 30:40 Well, you know, at Cisco we always say this in threes. The first is connectivity, and this is what we were just talking about: the transport layer protocol that we are pushing with SLIM should be able to scale. That's why it's going to be a data-centric protocol that by definition is not connecting points. So this is something that has proven to be a good fundamental principle for the design of protocols.
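As a rough illustration of the group-based, data-centric idea described here, a minimal Python sketch might address messages to a named topic rather than to a specific endpoint, so that any number of agents can take part. This is not SLIM's API, which the conversation does not detail; `TopicBus`, the topic name, and the handlers are hypothetical names chosen for this example, and everything runs in memory with no security or networking.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler receives (topic, payload). Hypothetical signature for this sketch.
Handler = Callable[[str, str], None]


class TopicBus:
    """Hypothetical in-memory bus: senders name the data (a topic), not a peer."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Any number of agents may join the group for a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        # Deliver to every current subscriber; return how many were reached.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


received: List[str] = []
bus = TopicBus()
bus.subscribe("trip/plan", lambda t, p: received.append(f"flights:{p}"))
bus.subscribe("trip/plan", lambda t, p: received.append(f"hotels:{p}"))
delivered = bus.publish("trip/plan", "Paris, 3 nights")
```

The publisher never learns which agents exist or where they run; adding a third collaborator is one more `subscribe` call, which is the contrast with wiring up point-to-point connections.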

Giovanna Carofiglio 31:14 The second one is security. So there as well, there is a lot to come. We haven't touched upon that, but there is definitely a lot of work, which starts with identity. It goes into being able to connect in a secure way, even in the enterprise, with zero-trust concepts applied to agents. And three, we are always coming back to the observability one,

Conor Bronsdon 31:43 that we need to make it work at scale. We did not pay her to say this. I promise.

Giovanna Carofiglio 31:48 It's really something that is close to my heart these days, because we are working on it and I think there is a lot of potential there. But, as we said, it is also a requirement. We need to make observability and evaluation, which are both part of the same dimension, scalable.

Conor Bronsdon 32:09 You mentioned security. I actually had that as my next question, so I'm glad you brought it up. One of the most compelling things happening around security is agent identification and frameworks around ensuring agent identity. Otherwise, hey, how good is a communication protocol if agents are lying to one another and stealing information? Why is identity such a critical challenge for autonomous agents? And how do Cisco and the AGNTCY collective plan to solve that problem?

Giovanna Carofiglio 32:45 Yeah, it's good to mention. We just released an agent identity component in AGNTCY. So it all started for us, first of all, with defining a schema for identifying agents. We call it OASF, and it is one of the components of AGNTCY. We call it that because we want to replicate what OCSF, the Open Cybersecurity Schema Framework, has done for security, becoming this lingua franca for defining cybersecurity data. And so in this case, we want to really characterize,

Giovanna Carofiglio 33:19 we pushed a first version of OASF, and we really welcome contributions there, what defines an agent. Also because the definition of agent itself, as we were mentioning before, can be challenged. And so we want these agents to be uniquely identified, to have a clear provenance, whether it's for the agent or the data, to have a clear way to spell out their skills,

Giovanna Carofiglio 33:49 so that they can be discovered based on skills, and so that they can be evaluated, or you can attach a reputation to the skills. And beyond that, we want to have an identification of agents and of tasks. Then there's a lot more work coming, but this is definitely one of the important

Conor Bronsdon 34:14 dimensions for AGNTCY today. Absolutely. And we're also excited to release a new agent leaderboard. We have an open source one; you may have seen our first version. The new one is going to include a lot of focus on agent success within certain verticals and tasks, because we agree there's a huge opportunity here to understand success, to define what we want out of predictable,

Conor Bronsdon 34:37 hopefully, agents, and there's a huge need to secure and observe that. I'm curious, what do you wanna see from Galileo as we build for the next few months? Like, what are the things that we could build or are building that would be exciting to you and useful?
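As a rough illustration of the identity requirements Giovanna lists in this exchange (a unique identifier, clear provenance, declared skills, and a reputation attached to them), a small record type with skill-based lookup could look like the Python sketch below. This is not the collective's actual schema; `AgentRecord`, its fields, `find_by_skill`, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical directory entry; field names are assumptions for this sketch."""
    agent_id: str            # unique identifier for the agent
    publisher: str           # provenance: who published the agent
    skills: FrozenSet[str]   # declared skills used for discovery
    reputation: float = 0.0  # score attached by some evaluation process


def find_by_skill(directory: Iterable[AgentRecord], skill: str) -> List[AgentRecord]:
    """Return agents declaring `skill`, highest reputation first."""
    matches = [agent for agent in directory if skill in agent.skills]
    return sorted(matches, key=lambda agent: agent.reputation, reverse=True)


directory = [
    AgentRecord("agent-a", "example-publisher", frozenset({"summarize"}), 0.72),
    AgentRecord("agent-b", "example-publisher", frozenset({"translate", "summarize"}), 0.91),
    AgentRecord("agent-c", "other-publisher", frozenset({"translate"}), 0.40),
]
summarizers = [agent.agent_id for agent in find_by_skill(directory, "summarize")]
```

Keeping the record immutable (`frozen=True`) reflects the idea that an identity and its provenance should not be silently altered after publication; a real system would also need cryptographic verification, which this sketch omits.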

Giovanna Carofiglio 34:55 Well, you know that. I want Galileo to push some of this awesome work you are doing in defining metrics, especially for multi-agent systems and for agentic context and communication, into AGNTCY. I think this is very important. We can have a few sample metrics and the metrics computation engine as one of the elements that we want to provide to developers to

Giovanna Carofiglio 35:24 start, to help them through this journey of putting together an agentic application. So this is coming, and I know we are working on that. Yeah, we are really excited about Galileo bringing all this expertise in defining and recommending metrics into AGNTCY, so that together we can really have a more predictable and reliable agentic system.

Conor Bronsdon 35:54 How can Galileo's Luna models unlock more opportunities for lower latency evaluations?

Giovanna Carofiglio 36:02 Well, I think evaluation should be constrained in budget, because you don't want it to be too expensive, and constrained in time. So what Galileo is doing with the small language models in Luna is amazing. This is a key step toward having a deep, thorough evaluation of these systems. So kudos for that. Well, we're hopefully excited to

Conor Bronsdon 36:29 bring some of that to open source as well. So I think there's gonna be a lot of fun coming here. Thank you so much, Giovanna, for the conversation and for bearing with me as we knock out a couple last things here. I think we're all set, honestly. Absolutely agreed. I think there's a major opportunity for us to collaborate with Cisco and others within the collective,

Conor Bronsdon 36:48 LlamaIndex, LangChain, so many other folks who are there to create systems for AI agents that help them to stay as reliable as we need, stay on task, and succeed. Because, really, in the end, we have a trust problem to solve, and we want to enable every enterprise, every business, every, hopefully, builder around the world to leverage AI agents in a way that helps make their lives easier.

Conor Bronsdon 37:14 And as you've pointed out throughout this conversation, there are keys to that. There are basic layers we have to solve. And I think we're starting to get there, but there's so much opportunity to build upon that. You know, we talked a bit earlier about the small language models Galileo's using, the Luna models, which we just released new ones of.

Conor Bronsdon 37:31 And I'm really excited to see, like, are there open source opportunities around that? Are there opportunities to extend that? How can we make things more real time so that guardrailing is more successful? And, you know, some of this I'm sure will stay within the paid platform. Some of it will come to AGNTCY and be fully open source, but it's just such an exciting time. There's really nowhere else I'd rather be than part of this as we build, I mean, I think the future of the Internet and the future of knowledge work, which

Giovanna Carofiglio 38:01 I see your excitement about it too. Really agree. Let's go back to work.

Conor Bronsdon 38:06 Giovanna, it's been such a pleasure. Thank you for joining us all the way from Paris. We're so glad we could snag you while you were here in the Bay Area. This has been such a fascinating deep dive into the future of agents, frankly, the current situation with agents, and the infrastructure that is going to power the next generation of AI systems. Thank you for sharing both the technical vision for AGNTCY

Conor Bronsdon 38:29 and the practical lessons that you're learning from building these systems. Where can our listeners go to learn more about the AGNTCY Collective and to see the work of the steering committee with Outshift, Galileo, and LangChain?

Giovanna Carofiglio 38:41 Well, it was my pleasure. First of all, thanks a lot for this very interesting conversation. I think it really shows that we have a path ahead of us and we want to get there very quickly. So for everyone who wants to know about our work in the collective, I would say go to agntcy.org, and specifically on some aspects such as observability and evaluation, we have a working group. So I would encourage people to just go there and participate in the meetings

Giovanna Carofiglio 39:12 and take this journey with us. Absolutely.

Conor Bronsdon 39:15 And you can find links to all that in the show description, as well as at galileo.ai, if you wanna try any of the features we've talked about today, and very much check out the GitHub as well. agntcy.org will have links to all that. There's a lot of opportunities to contribute, and we're so excited to continue to include developers from around the world, and very excited about the opportunities to keep building these protocols and collaborating with A2A, MCP, and everyone else. To our audience, if you're building agents or thinking about agent interoperability,

Conor Bronsdon 39:44 evaluations, observability, predictability, reliability, all these phrases, all the abilities: this is a community we want you to join. And the Internet of Agents isn't this distant future. I know it kind of feels that way sometimes as we talk about the future and what it looks like. It's being built right now by teams like Cisco, by folks like Giovanna. And the Open Source Collective

Conor Bronsdon 40:06 needs you to help shape the future. So thank you so much for tuning into this special open source focused episode of Chain of Thought. Be sure to subscribe wherever you get your podcasts. And if you know a developer, a builder, whether they're a data scientist, someone else, someone who can contribute and you think would be interested in The Collective, send them a link to this episode. Have them check it out. We'd love their thoughts, their feedback,

Conor Bronsdon 40:28 their PRs if they wanna make a submission. And you can always find way more content like this, much more in-depth, everything from AMD's perspective on open source to so much more, on the Galileo YouTube channel for more episodes and deep dives into the world of productionizing AI. This has been fantastic. Really, really appreciate you sitting down, and thanks for letting us steal your time. Nice meeting you. Nice meeting you as well. Thank you. Yeah. We'll definitely have to stay on time.