How Startups Can Succeed at Productionizing AI Without Breaking at Scale
- Jörn Menninger
- Aug 21, 2025
- 37 min read
Updated: Apr 8

What Is This About?
Productionizing AI without breaking at scale is the challenge that separates successful AI startups from failed experiments. This guide covers the practical engineering decisions — from model serving and monitoring to data pipelines and failover — that make AI systems reliable in production.
Introduction
Moving AI from prototype to production is where most startup AI projects fail. This practical guide addresses the engineering, organizational, and financial challenges of productionizing AI systems without breaking the bank — covering infrastructure decisions, model deployment strategies, monitoring requirements, and the operational discipline needed to run AI reliably at scale in production environments.
Executive Summary
Productionizing AI without breaking the bank requires strategic decisions about infrastructure, model deployment, and monitoring that most guides written by big tech companies fail to address for startup budgets. The most common failure point is underestimating the ongoing operational cost of running AI in production, which typically exceeds training costs within 6 months. Key strategies include starting with managed inference services before building custom infrastructure and implementing automated model performance monitoring from day one. The guide provides specific cost benchmarks and architecture decisions mapped to different funding stages.
Why most AI prototypes fail in production — and how startups can succeed.
This founder interview is part of our ongoing coverage of Scaleup Founder Interviews from Germany, Austria, and Switzerland.
🚀 Management Summary
Startuprad.io brings you independent coverage of the key developments shaping the startup and venture capital landscape across Germany, Austria, and Switzerland.
Most startups underestimate the leap from AI prototypes to production-ready systems. What works in a demo often collapses under real-world usage. In this blog, we explore the hard truths of Productionizing AI, featuring insights from Dennis Traub, AI Engineering Specialist at AWS.
Founders, scaleup execs, and investors will learn why security, scalability, observability, and simplicity are critical for success in 2025. We also explain the role of agentic workflows, Model Context Protocol (MCP), LLM Ops, and AWS Bedrock in building reliable AI architectures.
👉 For startup founders scaling in Germany, DACH, and globally, this is your playbook for making AI production-ready.
📚 Table of Contents
Why Productionizing AI Matters for Startups
The Hard Truth: From Prototype to Production
Agentic Workflows & MCP: Connecting AI to the Real World
AI Architecture 2025: What Startups Need
AI Engineering & LLM Ops: The New Discipline
Counterintuitive Lesson: Less Tech, More Success
Further Reading & Resources
FAQs: Productionizing AI for Startups
🚀 Meet Our Sponsor
AWS Startups is a proud sponsor of this week’s episode of Startuprad.io. Visit startups.aws to find out how AWS can help you prove what’s possible in your industry.
The AWS Startups team comprises former founders and CTOs, venture capitalists, angel investors, and mentors ready to help you prove what’s possible.
Since 2013, AWS has supported over 280,000 startups across the globe and provided $7 billion in credits through the AWS Activate program.
Big ideas feel at home on AWS, and with access to cutting-edge technologies like generative AI, you can quickly turn those ideas into marketable products.
Want your own AI-powered assistant? Try Amazon Q.
Want to build your own AI products? Privately customize leading foundation models on Amazon Bedrock.
Want to reduce the cost of AI workloads? AWS Trainium is the silicon you’re looking for.
Whatever your ambitions, you’ve already had the idea, now prove it’s possible on AWS.
Visit aws.amazon.com/startups to get started.
🔹 Why Productionizing AI Matters for Startups
Building an AI demo is easy. Scaling it into production is not. Startups quickly face challenges like:
Security risks (exposing sensitive data through LLMs)
Scalability bottlenecks (apps breaking under user load)
Observability gaps (no monitoring = runaway costs; see the budget-alarm sketch below)
API complexity (manually wiring dozens of integrations)
👉 Productionizing AI means transforming prototypes into secure, scalable, and observable systems. For startups, this requires handling security, scalability, monitoring, and real-world API connections before launch.
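On the observability point above: a cheap first line of defense against runaway costs is a hard budget alert, set up before launch. The sketch below uses the AWS Budgets API via boto3; the account ID, budget amount, and email address are placeholders you would replace with your own.

```python
# Hedged sketch: a monthly cost budget with an alert at 80% of the limit,
# so a runaway LLM loop pages a human before the invoice arrives.
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "llm-inference-monthly",
        "BudgetLimit": {"Amount": "200", "Unit": "USD"},  # placeholder limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # percent of the budget limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "ops@example.com"}
            ],
        }
    ],
)
```

This is no substitute for per-request token monitoring (more on that below), but it caps the blast radius of surprises.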
🔹 The Hard Truth: From Prototype to Production
Dennis Traub notes that most GenAI prototypes fail when exposed to customers. Why? Because they ignore production realities.
Top pitfalls founders face:
Underestimating cost blow-ups from non-deterministic LLM loops
Poorly defined service boundaries between agents
Lack of compliance frameworks (GDPR, SOC2, HIPAA)
No evaluation pipeline for regression testing (a minimal harness is sketched below)
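To make that last pitfall concrete, here is a minimal regression-eval harness, assuming a Bedrock-served model called through boto3's Converse API. The golden cases, pass threshold, and model ID are illustrative assumptions, not recommendations.

```python
# Minimal LLM regression-test sketch (pytest-style): run a fixed "golden set"
# of prompts and fail if the pass rate drops below a threshold.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

# Hypothetical golden set: prompt -> keywords the answer must contain.
GOLDEN_CASES = [
    ("Classify this message: 'My order never arrived.'", ["complaint"]),
    ("Classify this message: 'I'd like to buy 3 more units.'", ["order"]),
]
THRESHOLD = 0.9  # fail the suite if fewer than 90% of cases pass

def run_case(prompt: str, expected_keywords: list[str]) -> bool:
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    text = response["output"]["message"]["content"][0]["text"].lower()
    return all(kw in text for kw in expected_keywords)

def test_no_regression():
    passed = sum(run_case(p, kws) for p, kws in GOLDEN_CASES)
    assert passed / len(GOLDEN_CASES) >= THRESHOLD
```

Wired into CI, a harness like this catches regressions whenever a prompt, a data source, or the upstream model version changes.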
🔹 Agentic Workflows & MCP: Connecting AI to the Real World
The Model Context Protocol (MCP) is emerging as an industry standard for connecting LLMs to APIs, databases, and SaaS tools.
Why MCP matters:
Standardizes how LLMs call APIs
Reduces developer burden of manual integrations
Already supported by Anthropic, Google, Microsoft, and AWS
Enables multi-agent workflows without chaos
👉 Agentic workflows allow LLMs to plan, reason, and act via tools. MCP standardizes these connections, making production AI more reliable and scalable.
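To make this concrete, here is a minimal MCP server sketch using the official Python SDK (the `mcp` package). The weather tool is a hypothetical stand-in for whatever internal API you would actually expose.

```python
# Minimal MCP server: wraps one function as a tool any MCP client can call.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-demo")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a (stubbed) weather report for a city."""
    # A real server would call your weather API here, with proper auth.
    return f"Sunny, 24°C in {city}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Any MCP-capable agent framework can now discover and call `get_weather` without bespoke integration code.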
🔹 AI Architecture 2025: What Startups Need
According to AWS, a modern AI stack should include:
Model serving (via Bedrock, OpenAI API, or Ollama; a minimal serving-plus-retrieval call is sketched after this list)
Orchestration frameworks (LangGraph, LlamaIndex, Strands Agents)
Data pipelines (vector DBs, semantic search, retrieval-augmented generation)
Monitoring & evaluation (LLM regression testing, observability dashboards)
Security & compliance layers (identity, guardrails, GDPR controls)
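As a rough illustration of how the serving and data-pipeline layers meet, here is a stripped-down retrieval-augmented call against Bedrock via boto3. The `retrieve` stub stands in for a real vector-store query, and the model ID is an assumption you would swap for your own.

```python
# Sketch of a RAG-style request: retrieve context, then ground the answer in it.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

def retrieve(query: str) -> list[str]:
    # Stand-in for a vector-store lookup (OpenSearch, pgvector, S3 Vectors, ...).
    return ["Our return policy allows refunds within 30 days of purchase."]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

print(answer("Can I return a product after two weeks?"))
```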
🔹 AI Engineering & LLM Ops: The New Discipline
A new role is emerging: AI Engineer.
Unlike traditional ML engineers, AI engineers:
Don’t train models — they integrate them into apps
Focus on orchestration, evaluation, and compliance
Work at the intersection of dev, ops, and data science
Dennis describes this as “DevOps + AI” — a discipline where evaluation pipelines and observability are as important as coding features.
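A first LLM Ops habit that pays off immediately is logging token usage per call. The sketch below reads the usage block that Bedrock's Converse API returns; the per-token prices are placeholders, not actual Bedrock rates.

```python
# Token and cost logging sketch for Converse API responses.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-ops")

PRICE_PER_1K_INPUT = 0.00025   # placeholder USD rate, not a real price
PRICE_PER_1K_OUTPUT = 0.00125  # placeholder USD rate, not a real price

def log_usage(response: dict) -> float:
    usage = response["usage"]  # Converse responses report token counts here
    cost = (usage["inputTokens"] / 1000) * PRICE_PER_1K_INPUT + \
           (usage["outputTokens"] / 1000) * PRICE_PER_1K_OUTPUT
    log.info("tokens in=%d out=%d est_cost=$%.6f",
             usage["inputTokens"], usage["outputTokens"], cost)
    return cost
```

Aggregated per feature or per customer, these logs are what turn "the bill exploded" into "this loop exploded".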
🔹 Counterintuitive Lesson: Less Tech, More Success
One of Dennis’s strongest points: don’t over-engineer AI systems.
Sometimes, a simple LLM interface is better than a multi-agent stack. Complexity adds risk, latency, and cost. Founders should ask (a triage sketch follows the list):
Can this be solved with deterministic workflows?
Do we really need an agent, or is one API call enough?
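A minimal sketch of that triage, with hypothetical helpers standing in for the cheap classification call and the agent loop:

```python
# Prefer the deterministic path; reserve the agent for genuinely open-ended work.
# classify(), lookup_answer(), and run_agent() are hypothetical stand-ins.

def classify(request: str) -> str:
    # In practice: one cheap LLM call. Keyword stub here to keep the sketch runnable.
    return "faq" if "refund" in request.lower() else "complex"

def lookup_answer(request: str) -> str:
    return "Refunds are processed within 30 days."  # plain, deterministic code path

def run_agent(request: str) -> str:
    return "(an agent loop with tools would run here)"

def handle(request: str) -> str:
    if classify(request) == "faq":
        return lookup_answer(request)  # one call or none: no agent needed
    return run_agent(request)

print(handle("How do I get a refund?"))
```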
🧵 Further Resources
🚪 Connect with Us
Partner with us: partnerships@startuprad.io
Subscribe: https://linktr.ee/startupradio
Feedback: https://forms.gle/SrcGUpycu26fvMFE9
Follow Joe on LinkedIn: Jörn Menninger
Frequently Asked Questions
What is “How Startups Can Succeed at Productionizing AI Without Breaking at Scale” about?
Productionizing AI without breaking at scale is the challenge that separates successful AI startups from failed experiments. This guide covers the practical engineering decisions — from model serving and monitoring to data pipelines and failover — that make AI systems reliable in production.
What are the main takeaways from this discussion?
Moving AI from prototype to production is where most startup AI projects fail. This practical guide addresses the engineering, organizational, and financial challenges of productionizing AI systems without breaking the bank — covering infrastructure decisions, model deployment strategies, monitoring requirements, and the operational discipline needed to run AI reliably at scale in production environments.
How does this topic connect to the broader startup ecosystem?
Productionizing AI without breaking the bank requires strategic decisions about infrastructure, model deployment, and monitoring that most guides written by big tech companies fail to address for startup budgets. The most common failure point is underestimating the ongoing operational cost of running AI in production, which typically exceeds training costs within 6 months. Key strategies include starting with managed inference services before building custom infrastructure and implementing automated model performance monitoring from day one. The guide provides specific cost benchmarks and architecture decisions mapped to different funding stages.
About the Host
Joern "Joe" Menninger is the host of the Startuprad.io podcast and covers founders, investors, and policy developments across the DACH startup ecosystem. Through more than 1,300 interviews and nearly a decade of reporting, he documents the evolution of the European startup landscape. Follow Joern on LinkedIn.
Support Startuprad.io
This expert analysis was produced by Startuprad.io, the leading independent source for European startup intelligence. If this deep dive on AI productionization helped your technical strategy, subscribe to our podcast and newsletter for more expert perspectives on scaling technology in the DACH ecosystem.
Automated Transcript
Ah, well, the traditional problems that you have, I think, with every piece of software that you're trying to productionize. It's fairly easy to build something that works, that solves a problem. But as soon as you put it out into the world, you have a few things. First of all, you need to make sure it's secure. Second of all, you need to make sure that it scales. Because, by all means, I hope for you as a startup that it actually works and people enjoy using it; the next step really is: how does it scale? How does it not fall apart under the load of people wanting to try it? And the third thing really is observability: being able to really get telemetry, to look into what's actually happening.

Startuprad.io: your podcast and YouTube blog covering the German startup scene with news, interviews, and live events.

AWS is proud to sponsor this week's episode of Startuprad.io. The AWS Startups team comprises former founders and CTOs, venture capitalists, angel investors, and mentors ready to help you prove what's possible. Since 2013, AWS has supported over 280,000 startups across the globe and provided US$7 billion in credits through the AWS Activate program. Big ideas feel at home on AWS, and with access to cutting-edge technologies like generative AI, you can quickly turn those ideas into marketable products. Want your own AI-powered assistant? Try Amazon Q. Want to build your own AI products? Privately customize leading foundation models on Amazon Bedrock. Want to reduce the cost of AI workloads? AWS Trainium is the silicon you're looking for. Whatever your ambitions, you've already had the idea. Now prove it's possible on AWS. Visit aws.amazon.com/startups to get started.
So you build a chatbot. Cool. But now your team is stuck wondering how to connect it to real APIs, make it reliable, or roll it out to a thousand users. That's where this episode comes in. AWS's Dennis Traub walks us through how to productionize AI securely, at scale, and without breaking your app. We dive into agentic workflows, MCP, and how real startups go from MVP to market. Let's dive into our episode today, in cooperation with AWS.

Dennis Traub is a developer advocate at AWS, focused on helping startups and enterprises take their GenAI experiments into real-world deployment. With decades of experience in secure infrastructure and developer productivity, Dennis has helped teams across industries automate, integrate, and scale their projects using modern cloud-native tools. Today we talk about Model Context Protocol, agent-based architecture, and what it really takes to make AI work in production. Dennis, welcome back to the show.

Oh, thanks, thanks for having me again.

Totally my pleasure. For everybody who's not aware of this: this is number two of a series of two interviews, but one of your colleagues will join us as well, so in total we have four interviews with you guys here. Dennis, productionizing AI. Tell us about it. What's the hard truth? What's the biggest gap you see between GenAI prototypes and actual production systems?

Well, the traditional problems that you have, I think, with every piece of software that you're trying to productionize. It's fairly easy to build something that works, that solves a problem. But as soon as you put it out into the world, you have a few things. First of all, you need to make sure it's secure. Second of all, you need to make sure that it scales. Because, by all means, I hope for you as a startup that it actually works and people enjoy using it; the next step really is: how does it scale? How does it not fall apart under the load of people wanting to try it? And the third thing really is observability: being able to really get telemetry, to look into what's actually happening. Does it do what it's supposed to be doing? Am I running risks in terms of, for instance, running up a large bill because it does something that I didn't expect it to do? You may run into edge cases that you didn't have in your prototyping environment. So those are the traditional problems that you usually have when you're building something small and put it into the market.

The second thing that very often happens as well: most pieces of software are not an island. They have to connect to something, to third-party APIs, to your own APIs, or to customers' internal APIs and data sources. And that can be hard as well, because how do you make sure that the agent first of all securely connects to these APIs and, second, doesn't mess with them, doesn't do anything that it's not supposed to be doing? That is something that you really need to look at. As soon as you run into a production situation, you will most likely have security, scalability, and connectivity issues with internal and third-party tools.
Talking a little bit here about Model Context Protocol: it's getting a lot of traction in the AI engineering world. What is it, and why does it matter now?

One of the big questions that everybody had in regards to LLMs, to language models, when they came out a few years ago, really was a certain amount of unreliability, especially when it comes to them producing facts. These models have been trained on an incredible amount of data, but they don't really understand the data. So by the nature of how these models work, they don't really know whether what they're saying is actually true or not. They just look at whether it matches a certain pattern that they have seen quite often. And that's a really big limitation, because in many applications, in many use cases, it's really important to work with factual data and not with assumptions or made-up information that doesn't actually represent the facts. That can be pretty dangerous. So most people and companies who've been working with LLMs, especially in their early days, ran into this problem: well, it doesn't really work, because it isn't really reliable in what it says; it makes up data, it confuses things, and so forth.

There are basically two ways to solve this problem. One way is adding more information to the model: fine-tuning the model, giving it access to a vector store so that it can do semantic search to retrieve actual data, to basically either double-check against actual data or just use this data as part of its process, which is called RAG, retrieval-augmented generation. I don't want to go too deep into RAG, and I have a very specific opinion of RAG, not a bad one, but about what it actually is; I don't look at it from the perspective of a data scientist. Actually, I think RAG belongs to the second thing that we can do, which is basically connecting the AI to the real world at runtime, at the moment in time when I actually use the model. Pre-training, fine-tuning, and these things all happen before I deploy the model into production, and everything after that happens when the model is in production, when my application runs. RAG, tool use, and other mechanisms basically solve this problem of connecting the AI to the real world. That could be a database, that could be a vector store with semantic search, but it could also be something where the AI can actually act, like calling an API to trigger a process, or sending an email, or things like that.

And the challenge was that every API works differently. Every API has a different authentication mechanism, every database has a slightly different SQL dialect, and so forth. So all the tools that I wanted to use, from a simple calculator all the way to a weather API or a third-party SaaS provider's API, they all have their own proprietary API. So it was my task as a developer to manually code the connection to all of these APIs, to be able to get the data or send a request or do whatever, to get it into the model or to have the model act on my process.

MCP, the Model Context Protocol, is an open-source protocol that has been published and recommended by Anthropic. It's being widely adopted by Google, Amazon, Microsoft, and many others. It looks like it's actually turning into an industry standard, which is a good thing, because it simplifies that connection layer between the actual tool or API and your agent. It's a fairly simple protocol that contains a few primitives, like tools: what can I, as a model, actually do with that API? That tool could be a browser engine, could be a runtime for me to run code in a sandbox environment, could be an API to a third-party provider, could be a piece of code that I wrote myself, could be basically anything. And with MCP, you're able to just wrap this as a so-called server, an MCP server. Then you can use the client library of the SDK inside of your own application to connect to the server and include it in your agent, so that you don't have to rebuild it for every single API that you want to use. That's MCP. It has a few more primitives: it's able to send notifications to the client, it's able to provide static resources, and a few more. But the most important and most interesting use case for most AI applications really is the fact that it can expose its capabilities, or the capabilities of the underlying API, as tools that the LLM can understand and use.

You said before that AI systems are only as powerful as their connections. What does it take to connect models to real-world APIs or workflows?
I'm not sure how to answer this question. It takes a few things, on a few different levels. Well, first of all, you should be aware of what you want to connect to, and also the implications of connecting to these things. For instance, if you connect to your email account, you give the model effectively access to all your data, possibly including PII, most likely including sensitive and personally identifiable information. That is something that you just need to be aware of. So once you connect your AI to this system and connect it to another system, you need to be aware of the fact that, technically, these two systems are connected through an intermediary, but they are effectively connected. So it takes thinking about: what do I connect, and do I really want these things to be connected with each other?

We're talking about boundaries, we're talking about service isolation, something that we've been talking about in software architecture for a very long time: service isolation, just to make sure that a service, which could be an AI agent, only does what it's supposed to do and doesn't accidentally expose information to something else, even though it shouldn't. And this is even more important with LLMs, because LLMs are by nature non-deterministic. You cannot just code the model so that it doesn't cross this boundary. You could try, but it's really hard, and I wouldn't suggest you do that.

So looking at this: let's say I have two systems and I need a connection between them, one being my email, the other being maybe a chatbot that I provide to a customer as part of my website. I wouldn't want this to be one thing. I wouldn't want this to be one agent. I would want to separate these two, isolate these two. One could be an agent that talks to the customer through a chatbot interface and classifies the use case: is this a complaint, or is this an order, or is it a general inquiry? And then, if it's a complaint, this agent may send a message to a different agent that is responsible for processing customer complaints, and that one does that. And they don't share data between each other. It's more of a handoff situation, where the classifying agent says: well, I call the complaint agent, tell them there's a complaint from this customer. And this agent comes back with, maybe: okay, thanks, somebody's going to call them. And it tells the agent that's talking to the customer, and that agent tells the customer: well, okay, somebody's going to call you. Or: could you give me your email address or phone number? Would you like to be called, or would you like to get an email? How would you like us to contact you? So the negotiation with the customer, and the negotiation with the CRM with access to sensitive information, should be isolated. That is something that I would have in mind from the start if I wanted to do something like this. I'm not sure, did I answer your question?
Well, partially. I totally do understand that we are very, very early in this whole world of AI agents and how the systems work, and currently, without global standards, I don't believe you can really completely answer that question. But let's talk about agentic workflows: how are they different from simple prompt chains or RAG pipelines?

So RAG pipelines are a completely different beast. RAG pipelines themselves... well, they actually involve AI in a certain way, but not in the way that we're talking about right now. RAG pipelines are more about data preparation: how do you prepare your data, your siloed information, your CRM, your product database, whatever you have? How do you prepare that so it can be used in a useful way by an AI? The other thing, prompt chains: prompt chains can be part of an agent. But very often you actually don't even need an agent; a prompt chain can be enough. A prompt chain just being: you have a fairly deterministic workflow where you send something to an LLM, you get a response, maybe you send a second prompt based on the response, and then at some point the thing is done.

The way I look at it is, I distinguish between three types of agentic applications; or, not agentic, three types of LLM- or generative-AI-enhanced applications. The first being non-agentic. Those are all the use cases where you send something to an LLM and the LLM responds, and that's it. Or a chatbot: you chat with the LLM, the LLM responds, and you have a back-and-forth between the person and the model. Those are non-agentic workflows. They can become quite complex and complicated, but they are mostly predetermined: either they're a loop, like in a chat, or there's a sequence of steps that needs to be done, and this sequence is almost always the same, or maybe has some decisions in between that you can handle with traditional conditional steps.

So the first of the three is non-agentic. The second is agentic AI. This is where the AI, the model, actually makes decisions, plans, does reasoning, understands that it doesn't have all the information it needs, asks you for this information, or reaches out to one of the tools it has available, via MCP, to get the information it needs for this specific use case, and is able to adapt the workflow based on the interaction, based on the available information, based on its own reasoning. So very often an agentic system starts by analyzing the task, coming up with a plan, maybe even storing that plan somewhere in the file system as a checklist for itself, using a tool, again via MCP for instance, that gives it access to a contained file system, a temporary directory basically, where it can store intermediate information. So it puts its checklist, its plan, in there, and then it does something, and then it goes back to the checklist, checks this thing off, and then it realizes: well, for this I need some more information from the customer. It goes back, chats with the customer to receive this information, and so forth. So an agent basically perceives and acts: it gets information, either from the user through the prompt, or through a tool, a database, or something; decides based on this information; and then acts within certain bounds. And these boundaries are basically defined by the use case and the capabilities of this agent.

So: non-agentic, agentic, and the third basically being multi-agent systems. This is where multiple agents interact with each other to provide for an even more complex use case. This is something I talked about before, where you may have an agent that interacts with the customer through the website and is able to kick off different processes depending on the customer and what they want. And this agent then communicates with an agent that's responsible for complaints, and another agent that's responsible for ingesting orders, and so forth. So it can become infinitely complicated. I basically think about agents like I think about microservices.

Actually, I would say an agent is when the AI deviates from simple if-this-then-that rules. Would that be a good definition?

Even complex if-this-then-that rules. There are fairly complex workflows that can be deterministically defined, where the entire work process, no matter how complex it is, is in itself deterministic and algorithmic. You can do it with Step Functions, or with an orchestration engine, or something like that. In that case I wouldn't necessarily use an AI agent. I might use AI, for instance, as a front end to that process, something that understands natural language. So if I want to have the possibility to do anything through Slack, I have an agent in Slack, and I'm able to tell the agent: please do this for me. In the past, I would have to use a very specific command format for Slack ops. Now, with LLMs, I can just use natural language. Then I have the LLM as basically the client, and the LLM is not an agent: it just takes the request, has a list of processes, classifies the request, extracts the relevant information, and kicks off the processes. It becomes agentic as soon as this thing may have to make more involved decisions, like: maybe we need more information, or maybe we have to call somebody. It becomes fairly complicated fairly quickly, so it's really hard to talk about it. But what I'm trying to say really is: don't build an agent for everything. First of all, don't use AI if the problem can be solved without it in a fairly easy way. Second, don't build an agent if it can be done with a simple deterministic workflow, even if it involves an LLM. And don't build complicated... well, let me say it this way: if the solution to your problem is more complicated than the problem you're trying to solve, you are probably doing it wrong, if that makes sense.

I see, I see. What does a typical production AI stack look like in 2025, especially for startups that are scaling fast?
Well, you need a few things. One is obviously model serving. You need the model somewhere. It could be local, could be something that you host yourself; as a startup, I wouldn't recommend doing that. I would really just suggest you use an existing model provider that provides the model through an API. That could be Amazon Bedrock, for instance. We have lots of different models, and the list is growing. We have open-weights models like Llama, Mistral, DeepSeek, and others. We have commercial models like Claude, our own Nova model family, and, now I'm blanking out, we have Cohere, many different models that you can use for different use cases, including many general-purpose language models. So you could use Amazon Bedrock to just talk to a model through a secure API, so you don't have to worry about what the model provider does with your data. We don't do anything with it. We don't even use it for model training. We just provide the model to you so that you can use it in a secure way, so that you can even build GDPR-compliant systems. So that's model serving.

You may need databases, either your own databases or maybe a vector store if you want to do semantic search, but that's advanced; I wouldn't start with that. And you need something that orchestrates the process, something that basically takes the input and calls the actual language model. Because the language model itself cannot do anything on its own. It only takes text (we're talking about language models right now; depending on the modality it could be something else) and produces something based on that text. It doesn't do anything beyond that. So the orchestration engine connects the MCP tools, the front end, whatever you want to use, and the LLM. The orchestration framework could be one of the many open-source frameworks out there: LangGraph is one, LlamaIndex is another one, CrewAI is one. Strands Agents is one that we open-sourced about two months ago, which is model-agnostic, even provider-agnostic: you can use Strands Agents with models from OpenAI, or directly with the Llama API, or even with Ollama on your own machine. So you need an orchestration tool or engine, you need a model somewhere, and you may need a database or some data for the model to work with.

You might want to think about primary and secondary models; that's a bit more advanced as well. The primary model is the general-purpose model that does the majority of the work, and then you may want to use secondary models, for instance, for very simple use cases, so you don't need to use the expensive ones. You can use very cost-effective models for simple summarization tasks, while you may want to use a more expensive reasoning model for the overall orchestration, for instance.

Another thing that's part of the stack is evaluation and monitoring. And that is something where I really would say, as a startup, you should put that in place as early as possible. Monitoring and observability are self-explanatory: you should be able to see what's happening. And you should also implement cost monitoring very early. Because if something goes wrong, especially in a non-deterministic system like an agentic AI system, if it runs into a loop, it may run up a big context that it recursively sends to the LLM, and all of a sudden it becomes very expensive. You wouldn't want that. So please set up cost monitoring very early.

I would also recommend implementing an evaluation mechanism. Evaluation is basically testing, but for LLMs. You take a specific model, and you have a number of prompts for your system, and some data that you get from your database or through RAG or through MCP, and you plug these things together and test them in different scenarios, maybe with different user inputs, to a point where, in most cases, you are satisfied with the response. So you reach a certain threshold of reliability for your system to do what it's supposed to do. Then, all of a sudden, a model provider deploys an update of their model, a new version that has been trained on different data or fine-tuned in a different way, which could break your system apart, because a very important variable has changed. Or maybe you change the prompts that you use as part of the pipeline, or your data changes, the structure of your data changes. This could all lead to your overall system degrading in reliability, in terms of how good the results are. And you can solve that by implementing an evaluation pipeline, so that whenever you change anything, you run a number of prompts or use cases against the system to see if the reliability drops beneath your threshold. If it does, the test fails, and you have to go look at it. That's very important. If you implement something like this as early as possible, just like with testing in general, you will be able to iterate much quicker than if, every time there's a new model update, or the data structure changes, or you update your own prompts, the system falls apart because it no longer reliably creates the responses that you were looking for.

Apart from that, well, we were at the question of what's part of the stack. So: model serving, data access, and orchestration; then mostly a primary model to start with, and I would suggest just starting with a primary model; then evaluation and monitoring; then maybe a data pipeline, if you actually want to use live data that changes over time, but again, that's a fairly advanced topic; and obviously security and compliance. Whenever you use sensitive data, whenever you use proprietary information, make sure that you comply with your internal compliance frameworks, your customers' compliance frameworks, and legal compliance frameworks. Make sure that you use proper authentication. Make sure that your agent can only do what it's supposed to do, that your agent doesn't have access to your customer database while also chatting on the internet with random people, perhaps by accident giving them access to your customer database. That's important. Security and compliance are part of any production stack, and should be, because these are the basic things you need so that you don't run into problems, most likely sooner rather than later.
How do you guys at AWS support this kind of production-grade AI stack, from Bedrock to Step Functions to vector DBs?

Well, first of all, we have a number of services around the Bedrock family of services. There is Amazon Bedrock itself, which is first of all model serving, where we provide secure and private access to models from different providers, including the current frontier models of most providers, where you can just use models and be sure that your data is not being used for training or anything else. We basically run these models inside of our own escrow accounts. They are air-gapped. Everything you send to the model is not stored or reused for anything. It's just sent to the model; the model is basically brought to life, loaded into the GPU cluster, it runs, it returns the response, and then the model basically goes back down and all the data is gone, apart from the actual model weights themselves, because they need to be used for subsequent calls. So that is one thing: Amazon Bedrock, which provides access to models, including our own family of models, and including the capability to actually fine-tune certain models or distill models into smaller models. Let's say you want to use Llama 4, but you don't want to use the version that Meta provides; you want to distill it into a smaller model. You can do that with Bedrock. Again, these are very advanced features. I would not suggest starting with them. It's time-intensive, it's costly; it's a use case for enterprises, and it may be a use case for you once you are further ahead on the adoption road.

The second thing that we provide is Bedrock Guardrails, along with a few other capabilities like Bedrock Knowledge Bases. Guardrails is basically there to mask sensitive data, or to block requests or responses that contain sensitive data, or that violate ethical codes that you've defined, or something like that. And Knowledge Bases is basically direct access to vector stores. We also, obviously, provide services for vector stores, with OpenSearch or with Postgres on RDS. We've just released Amazon S3 Vectors, so you can even store your vectors on S3 as objects, which is extremely cost-effective if you compare it to traditional vector stores. Because traditional vector stores are basically database servers: they have to run, and they cost money. S3 Vectors stores your vectors on S3, so you only pay for storage, not for a machine that's running all the time. You pay for storage and then, of course, for access. You can save up to 90% of the cost compared to database-based vector stores.

Then we have Bedrock Agents, which is an out-of-the-box system that provides agents in a fairly opinionated way. You can just build an agent using Bedrock Agents; you don't have to do that much, but these agents are self-contained inside of AWS.

And the third thing, something that has only been in preview for a few weeks at the time of recording, maybe GA (generally available) by the time you listen to this episode; it's in public preview right now: Amazon Bedrock AgentCore, and that's a family of services. It's very interesting because it gives you all the individual capabilities as building blocks that you can use. It has access to Bedrock models, obviously, but you can also use it with models anywhere. So you can also use it with OpenAI, with the Llama API, with your own Ollama, or whatnot. And it's framework-agnostic: you can deploy your CrewAI agents or your LangGraph agents. You don't have to do it the AWS way.

The next capability is memory, because very often it's important to maintain information across sessions. So when I talk to the agent right now, I want it to retain information about previous conversations. That capability is called AgentCore Memory. Again, AgentCore Memory can just be plugged into an agent that you run on AgentCore, but it can also be plugged into an agent that you run somewhere else; again, it's provider- and framework-agnostic.

The third capability is tools. As of now, we provide direct access to a code-environment sandbox, which is completely isolated. So if your agent creates code, or if the user of your agent sends code, the agent can just use one of these sandboxes to run that code in a completely secure and isolated environment. Right now it provides a Python runtime and TypeScript, most likely more in the future. It also provides access to a browser environment, so that your agent can use the internet, again in an isolated environment.

Another capability, of course, is security and identity. You can do everything using IAM, Identity and Access Management, with AWS, but you can also use OAuth with any kind of OAuth provider, or the corporate or commercial identity provider that you're using anyway, to make sure only the people who should access your agents are actually able to access them.

And then, and I realize I've been talking a lot and it's a lot of stuff, I'm going to summarize in a second, one or two more things. Observability out of the box, using OpenTelemetry, so you can use your existing observability stack if you want. And AgentCore Gateway, which allows you to just wrap any API that you may already have and expose it as an MCP server, including discovery, including even the ability to sell your own agent or your own MCP server on the AWS Marketplace to other AWS customers if you want.

So, in summary, what AgentCore provides is all the building blocks that you might need to build an agent: memory, a runtime for the agents and the MCP servers, a gateway if you already have your own server and just want to wrap it as MCP, identity, observability, and tools. It's a lot, I realize.
Yeah, it is. For our audience, I was wondering: have you built an AI prototype that almost made it into production? What blocked you? Tag us with your war story. We'll be back after a very, very short ad break.

Dennis, some startups are using multiple foundation models at once. What's AWS's approach to multi-model orchestration, and how do you manage that securely?

Multi-model usage is one of the core premises, because we say it doesn't make sense to use one model for everything, which is why we started Bedrock the way we did in the first place. It's not one model that you can use; you can use models from many different providers with many different capabilities. In many use cases you may want to use a general-purpose large language model, but you may also want to use a model to create your embeddings, or to create images, and that could be from a completely different provider. Or you want to use a reasoning model for involved tasks and a much less expensive small model for basic tasks like summarization or classification, and these can be from different providers. Then there are models specialized in language translation, and models that may be specialized in creating code. So Bedrock, and AWS, has always looked at it through the lens of: different customers need different things and different models, and individual customers may need different models for different use cases, or even inside the same use case.

I wouldn't start that way, though. If I were just building a prototype, I wouldn't start with multiple models; I would start with one, to start understanding the moving parts, how it works, and the limitations. But you can certainly use multiple models, and in any kind of production application I probably would, because that helps me reduce cost and reduce latency, because the larger the model, the longer it takes for the model to respond. So yes, multi-model, using different models, even from different providers, is certainly something that I would suggest looking into once you have your basic use case down and once you get into: how can I optimize cost, how can I optimize latency, is there a specialized model that helps me with certain tasks inside the workflow?

How do we make sure it's secure? Well, just like everything on AWS, everything goes through the AWS API. So every model invocation goes through the AWS API, which is protected through IAM, through Identity and Access Management. So you can have really fine-grained mechanisms to say who, or which service, or which third party, or which agent, or which process is allowed to interact with individual models, with individual data stores, or with the tools that you provide.

For our audience, I was wondering: what's one tool or pattern that helped you finally scale your project? Share it on Threads or LinkedIn and tag us. Dennis, what's your take on LLM Ops, or GenAI Ops? Is it the same as traditional MLOps, or something new?
I'm getting into hot water when I start talking about that, because it's not the same. And I'm not sure, what did you say? GenAI Ops, LLM Ops, MLOps?

The definitions are in flux.

Well, I'm pretty sure there is a definition for MLOps, and there's probably also a definition for LLM Ops. But the thing is that, with the democratization of generative AI since the ChatGPT moment, effectively, when everybody wants to build on top of generative AI, there's a new kind of discipline emerging, which sits at the intersection of software developers, data scientists, and machine learning engineers. And that's what's emerging as AI engineers; that's the term that's being used increasingly for this. You don't go build the models yourself, you don't even necessarily fine-tune the models and then deploy them somewhere and run them. You use models; you build applications that use these models, combining the intelligence of a language model, or any kind of generative AI model, with the capabilities of the piece of software you build. So the AI engineer understands how LLMs work, understands the limitations, understands the differences between models, but the AI engineer usually doesn't deploy these models, doesn't build and train these models. That's what the ML engineer does.

And when it comes to operations, I think it's very similar. LLM Ops, or, more specifically defined, probably MLOps, is really the operational aspect of building, deploying, and running models, training models, and everything around that. And GenAI Ops, or AI Ops if you will, is DevOps, but now it includes AI as another very important component, which requires its own capabilities, like evaluation. You test AI differently than you test a front end, or than you do load tests on an environment. You have to approach it in a slightly different way. It's the same thing in terms of what you have to do: you have to make sure it works. Every time you change something in your application, you have to make sure that it still works, that you don't have any regression, that you didn't introduce any bugs. Now there's a new class of regression, a new class of bugs, a new class of things that may introduce latency or additional cost. And that class is based on the integration of AI. In my opinion, the operational aspect of this becomes a native part of DevOps over time. There are a lot of tools right now, and there will be more in the future, but it's very different from what the data scientist and the ML engineer do.

Have you seen any counterintuitive success story, where there was less tech and that actually led to better performance of the AI in production?
The most counterintuitive thing is something that I see fairly often, really: you approach it saying, well, let's do this with AI, and you realize, actually, we don't need AI for this. Or: let's build an AI agent, because everybody's talking about agents right now, which is a good thing, because it's an evolving space, but you realize, actually, I don't really need an agent, because I can simply use an LLM for this. So the most counterintuitive thing is something that I have seen throughout my entire career in software engineering: less complex, very often, is more effective. So whenever you build something, I encourage you to try and experiment with AI and AI agents, but I also encourage you not to try to solve everything with AI. That may be counterintuitive advice, but it has always been sound advice in my experience.

Last question for us here in the second interview, and thank you for sticking around with me, because we're together here in a session for more than two hours now. Zoom out for us, Dennis. What's the future of AI architecture? In something like two to three years, do you see MCP and agent frameworks becoming the new standard?

I have no idea. Literally, I have no idea. If you look back through the last two to three years, since the ChatGPT moment, basically everything has changed so dramatically: the technology, the infrastructure, the capabilities of the models themselves, the availability, the open-source frameworks, the work that the community is doing, the many, many startups that are around. Certainly, many still try to solve old problems with new tools, but there are also so many niches where something incredible is actually happening, and there's so much innovation happening. I'm going to go on vacation a week from now, for three weeks, and I don't even know what the world will look like when I'm back.

It's really hard. I think, well, first of all, AI is not like a flu; it won't go away. It's going to stick around. Agentic AI is being hyped right now, but I also think it is a very important topic that either sticks around or evolves into something even more capable. The thing is, the best time to get involved is now, because it's never going to be as simple as it is today. And I realize it is really hard. I'm lucky to be able to work with this stuff every day, all day long, and I'm still overwhelmed. I've subscribed to so many newsletters, and there's so much news, and so many tools to look at, and so many frameworks. I don't even know where to start, until I realized that most of these newsletters and most of the experts that are around all of a sudden just copy from each other. Many of them, not all of them, but many really just copy from each other. And I'm fairly convinced that many of them really are just AI tools creating content on the socials, in newsletters, and so forth. So it's really hard to distill the actual signal from the noise right now. But at the same time, it has never been as easy as today, because it's only getting more complicated.

So what's really important for you is to get started now, and at the same time try to understand the fundamentals. Not necessarily the math behind these models; you don't need a PhD in science or in math or anything. I certainly don't. I'm a developer. I don't understand AI, to be honest.
But what I do understand very well by now is: how can I use AI in a software application? What impact does it have on the capabilities of what I build, but also, what impact does it have on the way I work? Those are two different levels. And I'm able to do that because I did the work to at least understand the fundamentals of what these models actually are and how they work in terms of their capabilities. Why do they get things wrong? Why do they have what we call hallucinations? Why do they have a hard time doing basic math while being able to talk for hours? These are the things. And I invite you to listen to Joe's podcast. I invite you to have a look at the stuff that we put out at AWS, and at the things that I put out on the socials. Follow me on LinkedIn; it's just Dennis Traub. I think Joe's going to put my contact details in.

Ask questions, talk to people. Figure out how this stuff works. Experiment, play around with it. Don't be stupid: don't connect a random piece of AI to your email, don't put something out on the internet and then run up a bill because somebody DDoSes you. Experiment in an isolated environment, maybe inside of an AWS account or on your local machine, where everything's isolated and protected and you don't have to worry about external influences and maybe threats. Experiment, play around with it, and at the same time think about the things that you might want to solve for yourself. Think about the things that you need to do manually every day, because it was too hard or impossible to automate, or too costly, or you just didn't get around to automating it. Maybe AI can help you with a small problem that you have every day, that you're trying to solve every day, and it bothers you and it's so annoying.

That is what I did. That's how I got started. That's how I learned. I looked at what I'm doing every day, and there's so much stuff that I never got around to doing, and I was complaining about it all the time, and it bothered me all the time. And all of a sudden, I realized I can build a small agent that just does it for me. And it doesn't even need to connect to sensitive data; it doesn't even connect to the internet or anything. It is just a small CLI tool that I run automatically every day, and it takes care of some stuff for me. It pulls some statistics, or looks if there were new conversations on Slack that I need to know about. These are the small things that I built. And by building these small things, I learned how they work, I learned how they fail, I learned about all the things that can go wrong. And then I started being able to build larger things, more complex applications, actual agents, actual agentic systems that I now run for more and more things. I have to admit, though, I'm not running anything in production, because I'm not building production software anymore. I haven't for a few years, unfortunately. But the great thing is, in my role, I get to experiment with that stuff a lot.

Dennis, awesome last words. Thank you very much for being such a good guest and telling us so much about AI and AWS and how they work together.

Thank you so much for having me. It was a great time. Thank you.

That's all, folks. Find more news, streams, events, and interviews at www.startuprad.io. And remember: sharing is caring.



