How Startups Can Succeed at Productionizing AI Without Breaking at Scale
- Jörn Menninger
- Aug 21, 2025
- 37 min read
Updated: Apr 8

What Is This About?
Productionizing AI without breaking at scale is the challenge that separates successful AI startups from failed experiments. This guide covers the practical engineering decisions — from model serving and monitoring to data pipelines and failover — that make AI systems reliable in production.
Introduction
Moving AI from prototype to production is where most startup AI projects fail. This practical guide addresses the engineering, organizational, and financial challenges of productionizing AI systems without breaking the bank — covering infrastructure decisions, model deployment strategies, monitoring requirements, and the operational discipline needed to run AI reliably at scale in production environments.
Executive Summary
Productionizing AI without breaking the bank requires strategic decisions about infrastructure, model deployment, and monitoring that most guides written by big tech companies fail to address for startup budgets. The most common failure point is underestimating the ongoing operational cost of running AI in production, which typically exceeds training costs within 6 months. Key strategies include starting with managed inference services before building custom infrastructure and implementing automated model performance monitoring from day one. The guide provides specific cost benchmarks and architecture decisions mapped to different funding stages.
Why most AI prototypes fail in production — and how startups can succeed.
This founder interview is part of our ongoing coverage of Scaleup Founder Interviews from Germany, Austria, and Switzerland.
🚀 Management Summary
Startuprad.io brings you independent coverage of the key developments shaping the startup and venture capital landscape across Germany, Austria, and Switzerland.
Most startups underestimate the leap from AI prototypes to production-ready systems. What works in a demo often collapses under real-world usage. In this blog, we explore the hard truths of Productionizing AI, featuring insights from Dennis Traub, AI Engineering Specialist at AWS.
Founders, scaleup execs, and investors will learn why security, scalability, observability, and simplicity are critical for success in 2025. We also explain the role of agentic workflows, Model Context Protocol (MCP), LLM Ops, and AWS Bedrock in building reliable AI architectures.
👉 For startup founders scaling in Germany, DACH, and globally, this is your playbook for making AI production-ready.
📚 Table of Contents
Why Productionizing AI Matters for Startups
The Hard Truth: From Prototype to Production
Agentic Workflows & MCP: Connecting AI to the Real World
AI Architecture 2025: What Startups Need
AI Engineering & LLM Ops: The New Discipline
Counterintuitive Lesson: Less Tech, More Success
Further Reading & Resources
FAQs: Productionizing AI for Startups
🚀 Meet Our Sponsor
AWS Startups is a proud sponsor of this week’s episode of Startuprad.io. Visit startups.aws to find out how AWS can help you prove what’s possible in your industry.
The AWS Startups team comprises former founders and CTOs, venture capitalists, angel investors, and mentors ready to help you prove what’s possible.
Since 2013, AWS has supported over 280,000 startups across the globe and provided $7 billion in credits through the AWS Activate program.
Big ideas feel at home on AWS, and with access to cutting-edge technologies like generative AI, you can quickly turn those ideas into marketable products.
Want your own AI-powered assistant? Try Amazon Q.
Want to build your own AI products? Privately customize leading foundation models on Amazon Bedrock.
Want to reduce the cost of AI workloads? AWS Trainium is the silicon you’re looking for.
Whatever your ambitions, you’ve already had the idea, now prove it’s possible on AWS.
Visit aws.amazon.com/startups to get started.
🔹 Why Productionizing AI Matters for Startups
Building an AI demo is easy. Scaling it into production is not. Startups quickly face challenges like:
Security risks (exposing sensitive data through LLMs)
Scalability bottlenecks (apps breaking under user load)
Observability gaps (no monitoring = runaway costs; see the budget-alarm sketch below)
API complexity (manually wiring dozens of integrations)
👉 Productionizing AI means transforming prototypes into secure, scalable, and observable systems. For startups, this requires handling security, scalability, monitoring, and real-world API connections before launch.
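On the observability point above: a cheap first line of defense against runaway costs is a hard budget alert, set up before launch. The sketch below uses the AWS Budgets API via boto3; the account ID, budget amount, and email address are placeholders you would replace with your own.

```python
# Hedged sketch: a monthly cost budget with an alert at 80% of the limit,
# so a runaway LLM loop pages a human before the invoice arrives.
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "llm-inference-monthly",
        "BudgetLimit": {"Amount": "200", "Unit": "USD"},  # placeholder limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # percent of the budget limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "ops@example.com"}
            ],
        }
    ],
)
```

This is no substitute for per-request token monitoring (more on that below), but it caps the blast radius of surprises.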
🔹 The Hard Truth: From Prototype to Production
Dennis Traub notes that most GenAI prototypes fail when exposed to customers. Why? Because they ignore production realities.
Top pitfalls founders face:
Underestimating cost blow-ups from non-deterministic LLM loops
Poorly defined service boundaries between agents
Lack of compliance frameworks (GDPR, SOC2, HIPAA)
No evaluation pipeline for regression testing (a minimal harness is sketched below)
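To make that last pitfall concrete, here is a minimal regression-eval harness, assuming a Bedrock-served model called through boto3's Converse API. The golden cases, pass threshold, and model ID are illustrative assumptions, not recommendations.

```python
# Minimal LLM regression-test sketch (pytest-style): run a fixed "golden set"
# of prompts and fail if the pass rate drops below a threshold.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

# Hypothetical golden set: prompt -> keywords the answer must contain.
GOLDEN_CASES = [
    ("Classify this message: 'My order never arrived.'", ["complaint"]),
    ("Classify this message: 'I'd like to buy 3 more units.'", ["order"]),
]
THRESHOLD = 0.9  # fail the suite if fewer than 90% of cases pass

def run_case(prompt: str, expected_keywords: list[str]) -> bool:
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    text = response["output"]["message"]["content"][0]["text"].lower()
    return all(kw in text for kw in expected_keywords)

def test_no_regression():
    passed = sum(run_case(p, kws) for p, kws in GOLDEN_CASES)
    assert passed / len(GOLDEN_CASES) >= THRESHOLD
```

Wired into CI, a harness like this catches regressions whenever a prompt, a data source, or the upstream model version changes.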
🔹 Agentic Workflows & MCP: Connecting AI to the Real World
The Model Context Protocol (MCP) is emerging as an industry standard for connecting LLMs to APIs, databases, and SaaS tools.
Why MCP matters:
Standardizes how LLMs call APIs
Reduces developer burden of manual integrations
Already supported by Anthropic, Google, Microsoft, and AWS
Enables multi-agent workflows without chaos
👉 Agentic workflows allow LLMs to plan, reason, and act via tools. MCP standardizes these connections, making production AI more reliable and scalable.
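To make this concrete, here is a minimal MCP server sketch using the official Python SDK (the `mcp` package). The weather tool is a hypothetical stand-in for whatever internal API you would actually expose.

```python
# Minimal MCP server: wraps one function as a tool any MCP client can call.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-demo")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a (stubbed) weather report for a city."""
    # A real server would call your weather API here, with proper auth.
    return f"Sunny, 24°C in {city}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Any MCP-capable agent framework can now discover and call `get_weather` without bespoke integration code.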
🔹 AI Architecture 2025: What Startups Need
According to AWS, a modern AI stack should include:
Model serving (via Bedrock, OpenAI API, or Ollama; a minimal serving-plus-retrieval call is sketched after this list)
Orchestration frameworks (LangGraph, LlamaIndex, Strands Agents)
Data pipelines (vector DBs, semantic search, retrieval-augmented generation)
Monitoring & evaluation (LLM regression testing, observability dashboards)
Security & compliance layers (identity, guardrails, GDPR controls)
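As a rough illustration of how the serving and data-pipeline layers meet, here is a stripped-down retrieval-augmented call against Bedrock via boto3. The `retrieve` stub stands in for a real vector-store query, and the model ID is an assumption you would swap for your own.

```python
# Sketch of a RAG-style request: retrieve context, then ground the answer in it.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

def retrieve(query: str) -> list[str]:
    # Stand-in for a vector-store lookup (OpenSearch, pgvector, S3 Vectors, ...).
    return ["Our return policy allows refunds within 30 days of purchase."]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

print(answer("Can I return a product after two weeks?"))
```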
🔹 AI Engineering & LLM Ops: The New Discipline
A new role is emerging: AI Engineer.
Unlike traditional ML engineers, AI engineers:
Don’t train models — they integrate them into apps
Focus on orchestration, evaluation, and compliance
Work at the intersection of dev, ops, and data science
Dennis describes this as “DevOps + AI” — a discipline where evaluation pipelines and observability are as important as coding features.
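A first LLM Ops habit that pays off immediately is logging token usage per call. The sketch below reads the usage block that Bedrock's Converse API returns; the per-token prices are placeholders, not actual Bedrock rates.

```python
# Token and cost logging sketch for Converse API responses.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-ops")

PRICE_PER_1K_INPUT = 0.00025   # placeholder USD rate, not a real price
PRICE_PER_1K_OUTPUT = 0.00125  # placeholder USD rate, not a real price

def log_usage(response: dict) -> float:
    usage = response["usage"]  # Converse responses report token counts here
    cost = (usage["inputTokens"] / 1000) * PRICE_PER_1K_INPUT + \
           (usage["outputTokens"] / 1000) * PRICE_PER_1K_OUTPUT
    log.info("tokens in=%d out=%d est_cost=$%.6f",
             usage["inputTokens"], usage["outputTokens"], cost)
    return cost
```

Aggregated per feature or per customer, these logs are what turn "the bill exploded" into "this loop exploded".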
🔹 Counterintuitive Lesson: Less Tech, More Success
One of Dennis’s strongest points: don’t over-engineer AI systems.
Sometimes, a simple LLM interface is better than a multi-agent stack. Complexity adds risk, latency, and cost. Founders should ask (a triage sketch follows the list):
Can this be solved with deterministic workflows?
Do we really need an agent, or is one API call enough?
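A minimal sketch of that triage, with hypothetical helpers standing in for the cheap classification call and the agent loop:

```python
# Prefer the deterministic path; reserve the agent for genuinely open-ended work.
# classify(), lookup_answer(), and run_agent() are hypothetical stand-ins.

def classify(request: str) -> str:
    # In practice: one cheap LLM call. Keyword stub here to keep the sketch runnable.
    return "faq" if "refund" in request.lower() else "complex"

def lookup_answer(request: str) -> str:
    return "Refunds are processed within 30 days."  # plain, deterministic code path

def run_agent(request: str) -> str:
    return "(an agent loop with tools would run here)"

def handle(request: str) -> str:
    if classify(request) == "faq":
        return lookup_answer(request)  # one call or none: no agent needed
    return run_agent(request)

print(handle("How do I get a refund?"))
```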
🧵 Further Resources
🚪 Connect with Us
Partner with us: partnerships@startuprad.io
Subscribe: https://linktr.ee/startupradio
Feedback: https://forms.gle/SrcGUpycu26fvMFE9
Follow Joe on LinkedIn: Jörn Menninger
Frequently Asked Questions
What is “How Startups Can Succeed at Productionizing AI Without Breaking at Scale” about?
Productionizing AI without breaking at scale is the challenge that separates successful AI startups from failed experiments. This guide covers the practical engineering decisions — from model serving and monitoring to data pipelines and failover — that make AI systems reliable in production.
What are the main takeaways from this discussion?
Moving AI from prototype to production is where most startup AI projects fail. This practical guide addresses the engineering, organizational, and financial challenges of productionizing AI systems without breaking the bank — covering infrastructure decisions, model deployment strategies, monitoring requirements, and the operational discipline needed to run AI reliably at scale in production environments.
How does this topic connect to the broader startup ecosystem?
Productionizing AI without breaking the bank requires strategic decisions about infrastructure, model deployment, and monitoring that most guides written by big tech companies fail to address for startup budgets. The most common failure point is underestimating the ongoing operational cost of running AI in production, which typically exceeds training costs within 6 months. Key strategies include starting with managed inference services before building custom infrastructure and implementing automated model performance monitoring from day one. The guide provides specific cost benchmarks and architecture decisions mapped to different funding stages.
About the Host
Joern "Joe" Menninger is the host of the Startuprad.io podcast and covers founders, investors, and policy developments across the DACH startup ecosystem. Through more than 1,300 interviews and nearly a decade of reporting, he documents the evolution of the European startup landscape. Follow Joern on LinkedIn.
Support Startuprad.io
This expert analysis was produced by Startuprad.io, the leading independent source for European startup intelligence. If this deep dive on AI productionization helped your technical strategy, subscribe to our podcast and newsletter for more expert perspectives on scaling technology in the DACH ecosystem.
Automated Transcript
Ah, well, the traditional problems that you have, I think, with every piece of software that you're trying to productionize. It's fairly easy to build something that works, that solves a problem. But as soon as you put it out into the world, you have a few things. First of all, you need to make sure it's secure. Second of all, you need to make sure that it scales. Because, by all means, I hope for you as a startup that it actually works and people enjoy using it; the next step really is: how does it scale? How does it not fall apart under the load of people wanting to try it? And the third thing really is observability: being able to really get telemetry, to look into what's actually happening.

Startuprad.io: your podcast and YouTube blog covering the German startup scene with news, interviews, and live events.

AWS is proud to sponsor this week's episode of Startuprad.io. The AWS Startups team comprises former founders and CTOs, venture capitalists, angel investors, and mentors ready to help you prove what's possible. Since 2013, AWS has supported over 280,000 startups across the globe and provided US$7 billion in credits through the AWS Activate program. Big ideas feel at home on AWS, and with access to cutting-edge technologies like generative AI, you can quickly turn those ideas into marketable products. Want your own AI-powered assistant? Try Amazon Q. Want to build your own AI products? Privately customize leading foundation models on Amazon Bedrock. Want to reduce the cost of AI workloads? AWS Trainium is the silicon you're looking for. Whatever your ambitions, you've already had the idea. Now prove it's possible on AWS. Visit aws.amazon.com/startups to get started.
So you build a chatbot. Cool. But now your team is stuck wondering how to connect it to real APIs, make it reliable, or roll it out to a thousand users. That's where this episode comes in. AWS's Dennis Traub walks us through how to productionize AI securely, at scale, and without breaking your app. We dive into agentic workflows, MCP, and how real startups go from MVP to market. Let's dive into our episode today, in cooperation with AWS.

Dennis Traub is a developer advocate at AWS, focused on helping startups and enterprises take their GenAI experiments into real-world deployment. With decades of experience in secure infrastructure and developer productivity, Dennis has helped teams across industries automate, integrate, and scale their projects using modern cloud-native tools. Today we talk about Model Context Protocol, agent-based architecture, and what it really takes to make AI work in production. Dennis, welcome back to the show.

Oh, thanks, thanks for having me again.

Totally my pleasure. For everybody who's not aware of this: this is number two of a series of two interviews, but one of your colleagues will join us as well, so in total we have four interviews with you guys here. Dennis, productionizing AI. Tell us about it. What's the hard truth? What's the biggest gap you see between GenAI prototypes and actual production systems?

Well, the traditional problems that you have, I think, with every piece of software that you're trying to productionize. It's fairly easy to build something that works, that solves a problem. But as soon as you put it out into the world, you have a few things. First of all, you need to make sure it's secure. Second of all, you need to make sure that it scales. Because, by all means, I hope for you as a startup that it actually works and people enjoy using it; the next step really is: how does it scale? How does it not fall apart under the load of people wanting to try it? And the third thing really is observability: being able to really get telemetry, to look into what's actually happening. Does it do what it's supposed to be doing? Am I running risks in terms of, for instance, running up a large bill because it does something that I didn't expect it to do? You may run into edge cases that you didn't have in your prototyping environment. So those are the traditional problems that you usually have when you're building something small and put it into the market.

The second thing that very often happens as well: most pieces of software are not an island. They have to connect to something, to third-party APIs, to your own APIs, or to customers' internal APIs and data sources. And that can be hard as well, because how do you make sure that the agent first of all securely connects to these APIs and, second, doesn't mess with them, doesn't do anything that it's not supposed to be doing? That is something that you really need to look at. As soon as you run into a production situation, you will most likely have security, scalability, and connectivity issues with internal and third-party tools.
Talking a little bit here about Model Context Protocol: it's getting a lot of traction in the AI engineering world. What is it, and why does it matter now?

One of the big questions that everybody had in regards to LLMs, to language models, when they came out a few years ago, really was a certain amount of unreliability, especially when it comes to them producing facts. These models have been trained on an incredible amount of data, but they don't really understand the data. So by the nature of how these models work, they don't really know whether what they're saying is actually true or not. They just look at whether it matches a certain pattern that they have seen quite often. And that's a really big limitation, because in many applications, in many use cases, it's really important to work with factual data and not with assumptions or made-up information that doesn't actually represent the facts. That can be pretty dangerous. So most people and companies who've been working with LLMs, especially in their early days, ran into this problem: well, it doesn't really work, because it isn't really reliable in what it says; it makes up data, it confuses things, and so forth.

There are basically two ways to solve this problem. One way is adding more information to the model: fine-tuning the model, giving it access to a vector store so that it can do semantic search to retrieve actual data, to basically either double-check against actual data or just use this data as part of its process, which is called RAG, retrieval-augmented generation. I don't want to go too deep into RAG, and I have a very specific opinion of RAG, not a bad one, but about what it actually is; I don't look at it from the perspective of a data scientist. Actually, I think RAG belongs to the second thing that we can do, which is basically connecting the AI to the real world at runtime, at the moment in time when I actually use the model. Pre-training, fine-tuning, and these things all happen before I deploy the model into production, and everything after that happens when the model is in production, when my application runs. RAG, tool use, and other mechanisms basically solve this problem of connecting the AI to the real world. That could be a database, that could be a vector store with semantic search, but it could also be something where the AI can actually act, like calling an API to trigger a process, or sending an email, or things like that.

And the challenge was that every API works differently. Every API has a different authentication mechanism, every database has a slightly different SQL dialect, and so forth. So all the tools that I wanted to use, from a simple calculator all the way to a weather API or a third-party SaaS provider's API, they all have their own proprietary API. So it was my task as a developer to manually code the connection to all of these APIs, to be able to get the data or send a request or do whatever, to get it into the model or to have the model act on my process.

MCP, the Model Context Protocol, is an open-source protocol that has been published and recommended by Anthropic. It's being widely adopted by Google, Amazon, Microsoft, and many others. It looks like it's actually turning into an industry standard, which is a good thing, because it simplifies that connection layer between the actual tool or API and your agent. It's a fairly simple protocol that contains a few primitives, like tools: what can I, as a model, actually do with that API? That tool could be a browser engine, could be a runtime for me to run code in a sandbox environment, could be an API to a third-party provider, could be a piece of code that I wrote myself, could be basically anything. And with MCP, you're able to just wrap this as a so-called server, an MCP server. Then you can use the client library of the SDK inside of your own application to connect to the server and include it in your agent, so that you don't have to rebuild it for every single API that you want to use. That's MCP. It has a few more primitives: it's able to send notifications to the client, it's able to provide static resources, and a few more. But the most important and most interesting use case for most AI applications really is the fact that it can expose its capabilities, or the capabilities of the underlying API, as tools that the LLM can understand and use.

You said before that AI systems are only as powerful as their connections. What does it take to connect models to real-world APIs or workflows?
I'm not sure how to answer this question. It takes a few things, on a few different levels. Well, first of all, you should be aware of what you want to connect to, and also the implications of connecting to these things. For instance, if you connect to your email account, you give the model effectively access to all your data, possibly including PII, most likely including sensitive and personally identifiable information. That is something that you just need to be aware of. So once you connect your AI to this system and connect it to another system, you need to be aware of the fact that, technically, these two systems are connected through an intermediary, but they are effectively connected. So it takes thinking about: what do I connect, and do I really want these things to be connected with each other?

We're talking about boundaries, we're talking about service isolation, something that we've been talking about in software architecture for a very long time: service isolation, just to make sure that a service, which could be an AI agent, only does what it's supposed to do and doesn't accidentally expose information to something else, even though it shouldn't. And this is even more important with LLMs, because LLMs are by nature non-deterministic. You cannot just code the model so that it doesn't cross this boundary. You could try, but it's really hard, and I wouldn't suggest you do that.

So looking at this: let's say I have two systems and I need a connection between them, one being my email, the other being maybe a chatbot that I provide to a customer as part of my website. I wouldn't want this to be one thing. I wouldn't want this to be one agent. I would want to separate these two, isolate these two. One could be an agent that talks to the customer through a chatbot interface and classifies the use case: is this a complaint, or is this an order, or is it a general inquiry? And then, if it's a complaint, this agent may send a message to a different agent that is responsible for processing customer complaints, and that one does that. And they don't share data between each other. It's more of a handoff situation, where the classifying agent says: well, I call the complaint agent, tell them there's a complaint from this customer. And this agent comes back with, maybe: okay, thanks, somebody's going to call them. And it tells the agent that's talking to the customer, and that agent tells the customer: well, okay, somebody's going to call you. Or: could you give me your email address or phone number? Would you like to be called, or would you like to get an email? How would you like us to contact you? So the negotiation with the customer, and the negotiation with the CRM with access to sensitive information, should be isolated. That is something that I would have in mind from the start if I wanted to do something like this. I'm not sure, did I answer your question?
Well, partially. I totally do understand that we are very, very early in this whole world of AI agents and how the systems work, and currently, without global standards, I don't believe you can really completely answer that question. But let's talk about agentic workflows: how are they different from simple prompt chains or RAG pipelines?

So RAG pipelines are a completely different beast. RAG pipelines themselves... well, they actually involve AI in a certain way, but not in the way that we're talking about right now. RAG pipelines are more about data preparation: how do you prepare your data, your siloed information, your CRM, your product database, whatever you have? How do you prepare that so it can be used in a useful way by an AI? The other thing, prompt chains: prompt chains can be part of an agent. But very often you actually don't even need an agent; a prompt chain can be enough. A prompt chain just being: you have a fairly deterministic workflow where you send something to an LLM, you get a response, maybe you send a second prompt based on the response, and then at some point the thing is done.

The way I look at it is, I distinguish between three types of agentic applications; or, not agentic, three types of LLM- or generative-AI-enhanced applications. The first being non-agentic. Those are all the use cases where you send something to an LLM and the LLM responds, and that's it. Or a chatbot: you chat with the LLM, the LLM responds, and you have a back-and-forth between the person and the model. Those are non-agentic workflows. They can become quite complex and complicated, but they are mostly predetermined: either they're a loop, like in a chat, or there's a sequence of steps that needs to be done, and this sequence is almost always the same, or maybe has some decisions in between that you can handle with traditional conditional steps.

So the first of the three is non-agentic. The second is agentic AI. This is where the AI, the model, actually makes decisions, plans, does reasoning, understands that it doesn't have all the information it needs, asks you for this information, or reaches out to one of the tools it has available, via MCP, to get the information it needs for this specific use case, and is able to adapt the workflow based on the interaction, based on the available information, based on its own reasoning. So very often an agentic system starts by analyzing the task, coming up with a plan, maybe even storing that plan somewhere in the file system as a checklist for itself, using a tool, again via MCP for instance, that gives it access to a contained file system, a temporary directory basically, where it can store intermediate information. So it puts its checklist, its plan, in there, and then it does something, and then it goes back to the checklist, checks this thing off, and then it realizes: well, for this I need some more information from the customer. It goes back, chats with the customer to receive this information, and so forth. So an agent basically perceives and acts: it gets information, either from the user through the prompt, or through a tool, a database, or something; decides based on this information; and then acts within certain bounds. And these boundaries are basically defined by the use case and the capabilities of this agent.

So: non-agentic, agentic, and the third basically being multi-agent systems. This is where multiple agents interact with each other to provide for an even more complex use case. This is something I talked about before, where you may have an agent that interacts with the customer through the website and is able to kick off different processes depending on the customer and what they want. And this agent then communicates with an agent that's responsible for complaints, and another agent that's responsible for ingesting orders, and so forth. So it can become infinitely complicated. I basically think about agents like I think about microservices.

Actually, I would say an agent is when the AI deviates from simple if-this-then-that rules. Would that be a good definition?

Even complex if-this-then-that rules. There are fairly complex workflows that can be deterministically defined, where the entire work process, no matter how complex it is, is in itself deterministic and algorithmic. You can do it with Step Functions, or with an orchestration engine, or something like that. In that case I wouldn't necessarily use an AI agent. I might use AI, for instance, as a front end to that process, something that understands natural language. So if I want to have the possibility to do anything through Slack, I have an agent in Slack, and I'm able to tell the agent: please do this for me. In the past, I would have to use a very specific command format for Slack ops. Now, with LLMs, I can just use natural language. Then I have the LLM as basically the client, and the LLM is not an agent: it just takes the request, has a list of processes, classifies the request, extracts the relevant information, and kicks off the processes. It becomes agentic as soon as this thing may have to make more involved decisions, like: maybe we need more information, or maybe we have to call somebody. It becomes fairly complicated fairly quickly, so it's really hard to talk about it. But what I'm trying to say really is: don't build an agent for everything. First of all, don't use AI if the problem can be solved without it in a fairly easy way. Second, don't build an agent if it can be done with a simple deterministic workflow, even if it involves an LLM. And don't build complicated... well, let me say it this way: if the solution to your problem is more complicated than the problem you're trying to solve, you are probably doing it wrong, if that makes sense.

I see, I see. What does a typical production AI stack look like in 2025, especially for startups that are scaling fast?
Well, you need a few things. One is obviously model serving. You need the model somewhere. It could be local, could be something that you host yourself; as a startup, I wouldn't recommend doing that. I would really just suggest you use an existing model provider that provides the model through an API. That could be Amazon Bedrock, for instance. We have lots of different models, and the list is growing. We have open-weights models like Llama, Mistral, DeepSeek, and others. We have commercial models like Claude, our own Nova model family, and, now I'm blanking out, we have Cohere, many different models that you can use for different use cases, including many general-purpose language models. So you could use Amazon Bedrock to just talk to a model through a secure API, so you don't have to worry about what the model provider does with your data. We don't do anything with it. We don't even use it for model training. We just provide the model to you so that you can use it in a secure way, so that you can even build GDPR-compliant systems. So that's model serving.

You may need databases, either your own databases or maybe a vector store if you want to do semantic search, but that's advanced; I wouldn't start with that. And you need something that orchestrates the process, something that basically takes the input and calls the actual language model. Because the language model itself cannot do anything on its own. It only takes text (we're talking about language models right now; depending on the modality it could be something else) and produces something based on that text. It doesn't do anything beyond that. So the orchestration engine connects the MCP tools, the front end, whatever you want to use, and the LLM. The orchestration framework could be one of the many open-source frameworks out there: LangGraph is one, LlamaIndex is another one, CrewAI is one. Strands Agents is one that we open-sourced about two months ago, which is model-agnostic, even provider-agnostic: you can use Strands Agents with models from OpenAI, or directly with the Llama API, or even with Ollama on your own machine. So you need an orchestration tool or engine, you need a model somewhere, and you may need a database or some data for the model to work with.

You might want to think about primary and secondary models; that's a bit more advanced as well. The primary model is the general-purpose model that does the majority of the work, and then you may want to use secondary models, for instance, for very simple use cases, so you don't need to use the expensive ones. You can use very cost-effective models for simple summarization tasks, while you may want to use a more expensive reasoning model for the overall orchestration, for instance.

Another thing that's part of the stack is evaluation and monitoring. And that is something where I really would say, as a startup, you should put that in place as early as possible. Monitoring and observability are self-explanatory: you should be able to see what's happening. And you should also implement cost monitoring very early. Because if something goes wrong, especially in a non-deterministic system like an agentic AI system, if it runs into a loop, it may run up a big context that it recursively sends to the LLM, and all of a sudden it becomes very expensive. You wouldn't want that. So please set up cost monitoring very early.

I would also recommend implementing an evaluation mechanism. Evaluation is basically testing, but for LLMs. You take a specific model, and you have a number of prompts for your system, and some data that you get from your database or through RAG or through MCP, and you plug these things together and test them in different scenarios, maybe with different user inputs, to a point where, in most cases, you are satisfied with the response. So you reach a certain threshold of reliability for your system to do what it's supposed to do. Then, all of a sudden, a model provider deploys an update of their model, a new version that has been trained on different data or fine-tuned in a different way, which could break your system apart, because a very important variable has changed. Or maybe you change the prompts that you use as part of the pipeline, or your data changes, the structure of your data changes. This could all lead to your overall system degrading in reliability, in terms of how good the results are. And you can solve that by implementing an evaluation pipeline, so that whenever you change anything, you run a number of prompts or use cases against the system to see if the reliability drops beneath your threshold. If it does, the test fails, and you have to go look at it. That's very important. If you implement something like this as early as possible, just like with testing in general, you will be able to iterate much quicker than if, every time there's a new model update, or the data structure changes, or you update your own prompts, the system falls apart because it no longer reliably creates the responses that you were looking for.

Apart from that, well, we were at the question of what's part of the stack. So: model serving, data access, and orchestration; then mostly a primary model to start with, and I would suggest just starting with a primary model; then evaluation and monitoring; then maybe a data pipeline, if you actually want to use live data that changes over time, but again, that's a fairly advanced topic; and obviously security and compliance. Whenever you use sensitive data, whenever you use proprietary information, make sure that you comply with your internal compliance frameworks, your customers' compliance frameworks, and legal compliance frameworks. Make sure that you use proper authentication. Make sure that your agent can only do what it's supposed to do, that your agent doesn't have access to your customer database while also chatting on the internet with random people, perhaps by accident giving them access to your customer database. That's important. Security and compliance are part of any production stack, and should be, because these are the basic things you need so that you don't run into problems, most likely sooner rather than later.
How do you guys at AWS support this kind of production-grade AI stack, from Bedrock to Step Functions to vector DBs?

Well, first of all, we have a number of services around the Bedrock family of services. There is Amazon Bedrock itself, which is first of all model serving, where we provide secure and private access to models from different providers, including the current frontier models of most providers, where you can just use models and be sure that your data is not being used for training or anything else. We basically run these models inside of our own escrow accounts. They are air-gapped. Everything you send to the model is not stored or reused for anything. It's just sent to the model; the model is basically brought to life, loaded into the GPU cluster, it runs, it returns the response, and then the model basically goes back down and all the data is gone, apart from the actual model weights themselves, because they need to be used for subsequent calls. So that is one thing: Amazon Bedrock, which provides access to models, including our own family of models, and including the capability to actually fine-tune certain models or distill models into smaller models. Let's say you want to use Llama 4, but you don't want to use the version that Meta provides; you want to distill it into a smaller model. You can do that with Bedrock. Again, these are very advanced features. I would not suggest starting with them. It's time-intensive, it's costly; it's a use case for enterprises, and it may be a use case for you once you are further ahead on the adoption road.

The second thing that we provide is Bedrock Guardrails, along with a few other capabilities like Bedrock Knowledge Bases. Guardrails is basically there to mask sensitive data, or to block requests or responses that contain sensitive data, or that violate ethical codes that you've defined, or something like that. And Knowledge Bases is basically direct access to vector stores. We also, obviously, provide services for vector stores, with OpenSearch or with Postgres on RDS. We've just released Amazon S3 Vectors, so you can even store your vectors on S3 as objects, which is extremely cost-effective if you compare it to traditional vector stores. Because traditional vector stores are basically database servers: they have to run, and they cost money. S3 Vectors stores your vectors on S3, so you only pay for storage, not for a machine that's running all the time. You pay for storage and then, of course, for access. You can save up to 90% of the cost compared to database-based vector stores.

Then we have Bedrock Agents, which is an out-of-the-box system that provides agents in a fairly opinionated way. You can just build an agent using Bedrock Agents; you don't have to do that much, but these agents are self-contained inside of AWS.

And the third thing, something that has only been in preview for a few weeks at the time of recording, maybe GA (generally available) by the time you listen to this episode; it's in public preview right now: Amazon Bedrock AgentCore, and that's a family of services. It's very interesting because it gives you all the individual capabilities as building blocks that you can use. It has access to Bedrock models, obviously, but you can also use it with models anywhere. So you can also use it with OpenAI, with the Llama API, with your own Ollama, or whatnot. And it's framework-agnostic: you can deploy your CrewAI agents or your LangGraph agents. You don't have to do it the AWS way.

The next capability is memory, because very often it's important to maintain information across sessions. So when I talk to the agent right now, I want it to retain information about previous conversations. That capability is called AgentCore Memory. Again, AgentCore Memory can just be plugged into an agent that you run on AgentCore, but it can also be plugged into an agent that you run somewhere else; again, it's provider- and framework-agnostic.

The third capability is tools. As of now, we provide direct access to a code-environment sandbox, which is completely isolated. So if your agent creates code, or if the user of your agent sends code, the agent can just use one of these sandboxes to run that code in a completely secure and isolated environment. Right now it provides a Python runtime and TypeScript, most likely more in the future. It also provides access to a browser environment, so that your agent can use the internet, again in an isolated environment.

Another capability, of course, is security and identity. You can do everything using IAM, Identity and Access Management, with AWS, but you can also use OAuth with any kind of OAuth provider, or the corporate or commercial identity provider that you're using anyway, to make sure only the people who should access your agents are actually able to access them.

And then, and I realize I've been talking a lot and it's a lot of stuff, I'm going to summarize in a second, one or two more things. Observability out of the box, using OpenTelemetry, so you can use your existing observability stack if you want. And AgentCore Gateway, which allows you to just wrap any API that you may already have and expose it as an MCP server, including discovery, including even the ability to sell your own agent or your own MCP server on the AWS Marketplace to other AWS customers if you want.

So, in summary, what AgentCore provides is all the building blocks that you might need to build an agent: memory, a runtime for the agents and the MCP servers, a gateway if you already have your own server and just want to wrap it as MCP, identity, observability, and tools. It's a lot, I realize.
Yeah, it is. For our audience, I was wondering: have you built an AI prototype that almost made it into production? What blocked you? Tag us with your war story. We'll be back after a very, very short ad break.

Dennis, some startups are using multiple foundation models at once. What's AWS's approach to multi-model orchestration, and how do you manage that securely?

Multi-model usage is one of the core premises, because we say it doesn't make sense to use one model for everything, which is why we started Bedrock the way we did in the first place. It's not one model that you can use; you can use models from many different providers with many different capabilities. In many use cases you may want to use a general-purpose large language model, but you may also want to use a model to create your embeddings, or to create images, and that could be from a completely different provider. Or you want to use a reasoning model for involved tasks and a much less expensive small model for basic tasks like summarization or classification, and these can be from different providers. Then there are models specialized in language translation, and models that may be specialized in creating code. So Bedrock, and AWS, has always looked at it through the lens of: different customers need different things and different models, and individual customers may need different models for different use cases, or even inside the same use case.

I wouldn't start that way, though. If I were just building a prototype, I wouldn't start with multiple models; I would start with one, to start understanding the moving parts, how it works, and the limitations. But you can certainly use multiple models, and in any kind of production application I probably would, because that helps me reduce cost and reduce latency, because the larger the model, the longer it takes for the model to respond. So yes, multi-model, using different models, even from different providers, is certainly something that I would suggest looking into once you have your basic use case down and once you get into: how can I optimize cost, how can I optimize latency, is there a specialized model that helps me with certain tasks inside the workflow?

How do we make sure it's secure? Well, just like everything on AWS, everything goes through the AWS API. So every model invocation goes through the AWS API, which is protected through IAM, through Identity and Access Management. So you can have really fine-grained mechanisms to say who, or which service, or which third party, or which agent, or which process is allowed to interact with individual models, with individual data stores, or with the tools that you provide.

For our audience, I was wondering: what's one tool or pattern that helped you finally scale your project? Share it on Threads or LinkedIn and tag us. Dennis, what's your take on LLM Ops, or GenAI Ops? Is it the same as traditional MLOps, or something new?
I'm getting into hot water when I start talking about that, because it's not the same. And I'm not sure, what did you say? GenAI Ops, LLM Ops, MLOps?

The definitions are in flux.

Well, I'm pretty sure there is a definition for MLOps, and there's probably also a definition for LLM Ops. But the thing is that, with the democratization of generative AI since the ChatGPT moment, effectively, when everybody wants to build on top of generative AI, there's a new kind of discipline emerging, which sits at the intersection of software developers, data scientists, and machine learning engineers. And that's what's emerging as AI engineers; that's the term that's being used increasingly for this. You don't go build the models yourself, you don't even necessarily fine-tune the models and then deploy them somewhere and run them. You use models; you build applications that use these models, combining the intelligence of a language model, or any kind of generative AI model, with the capabilities of the piece of software you build. So the AI engineer understands how LLMs work, understands the limitations, understands the differences between models, but the AI engineer usually doesn't deploy these models, doesn't build and train these models. That's what the ML engineer does.

And when it comes to operations, I think it's very similar. LLM Ops, or, more specifically defined, probably MLOps, is really the operational aspect of building, deploying, and running models, training models, and everything around that. And GenAI Ops, or AI Ops if you will, is DevOps, but now it includes AI as another very important component, which requires its own capabilities, like evaluation. You test AI differently than you test a front end, or than you do load tests on an environment. You have to approach it in a slightly different way. It's the same thing in terms of what you have to do: you have to make sure it works. Every time you change something in your application, you have to make sure that it still works, that you don't have any regression, that you didn't introduce any bugs. Now there's a new class of regression, a new class of bugs, a new class of things that may introduce latency or additional cost. And that class is based on the integration of AI. In my opinion, the operational aspect of this becomes a native part of DevOps over time. There are a lot of tools right now, and there will be more in the future, but it's very different from what the data scientist and the ML engineer do.

Have you seen any counterintuitive success story, where there was less tech and that actually led to better performance of the AI in production?
The most counterintuitive thing is something that I see fairly often, really: you approach it saying, well, let's do this with AI, and you realize, actually, we don't need AI for this. Or: let's build an AI agent, because everybody's talking about agents right now, which is a good thing, because it's an evolving space, but you realize, actually, I don't really need an agent, because I can simply use an LLM for this. So the most counterintuitive thing is something that I have seen throughout my entire career in software engineering: less complex, very often, is more effective. So whenever you build something, I encourage you to try and experiment with AI and AI agents, but I also encourage you not to try to solve everything with AI. That may be counterintuitive advice, but it has always been sound advice in my experience.

Last question for us here in the second interview, and thank you for sticking around with me, because we're together here in a session for more than two hours now. Zoom out for us, Dennis. What's the future of AI architecture? In something like two to three years, do you see MCP and agent frameworks becoming the new standard?

I have no idea. Literally, I have no idea. If you look back through the last two to three years, since the ChatGPT moment, basically everything has changed so dramatically: the technology, the infrastructure, the capabilities of the models themselves, the availability, the open-source frameworks, the work that the community is doing, the many, many startups that are around. Certainly, many still try to solve old problems with new tools, but there are also so many niches where something incredible is actually happening, and there's so much innovation happening. I'm going to go on vacation a week from now, for three weeks, and I don't even know what the world will look like when I'm back.

It's really hard. I think, well, first of all, AI is not like a flu; it won't go away. It's going to stick around. Agentic AI is being hyped right now, but I also think it is a very important topic that either sticks around or evolves into something even more capable. The thing is, the best time to get involved is now, because it's never going to be as simple as it is today. And I realize it is really hard. I'm lucky to be able to work with this stuff every day, all day long, and I'm still overwhelmed. I've subscribed to so many newsletters, and there's so much news, and so many tools to look at, and so many frameworks. I don't even know where to start, until I realized that most of these newsletters and most of the experts that are around all of a sudden just copy from each other. Many of them, not all of them, but many really just copy from each other. And I'm fairly convinced that many of them really are just AI tools creating content on the socials, in newsletters, and so forth. So it's really hard to distill the actual signal from the noise right now. But at the same time, it has never been as easy as today, because it's only getting more complicated.

So what's really important for you is to get started now, and at the same time try to understand the fundamentals. Not necessarily the math behind these models; you don't need a PhD in science or in math or anything. I certainly don't. I'm a developer. I don't understand AI, to be honest.
But what I do understand very well by now is: how can I use AI in a software application? What impact does it have on the capabilities of what I build, but also, what impact does it have on the way I work? Those are two different levels. And I'm able to do that because I did the work to at least understand the fundamentals of what these models actually are and how they work in terms of their capabilities. Why do they get things wrong? Why do they have what we call hallucinations? Why do they have a hard time doing basic math while being able to talk for hours? These are the things. And I invite you to listen to Joe's podcast. I invite you to have a look at the stuff that we put out at AWS, and at the things that I put out on the socials. Follow me on LinkedIn; it's just Dennis Traub. I think Joe's going to put my contact details in.

Ask questions, talk to people. Figure out how this stuff works. Experiment, play around with it. Don't be stupid: don't connect a random piece of AI to your email, don't put something out on the internet and then run up a bill because somebody DDoSes you. Experiment in an isolated environment, maybe inside of an AWS account or on your local machine, where everything's isolated and protected and you don't have to worry about external influences and maybe threats. Experiment, play around with it, and at the same time think about the things that you might want to solve for yourself. Think about the things that you need to do manually every day, because it was too hard or impossible to automate, or too costly, or you just didn't get around to automating it. Maybe AI can help you with a small problem that you have every day, that you're trying to solve every day, and it bothers you and it's so annoying.

That is what I did. That's how I got started. That's how I learned. I looked at what I'm doing every day, and there's so much stuff that I never got around to doing, and I was complaining about it all the time, and it bothered me all the time. And all of a sudden, I realized I can build a small agent that just does it for me. And it doesn't even need to connect to sensitive data; it doesn't even connect to the internet or anything. It is just a small CLI tool that I run automatically every day, and it takes care of some stuff for me. It pulls some statistics, or looks if there were new conversations on Slack that I need to know about. These are the small things that I built. And by building these small things, I learned how they work, I learned how they fail, I learned about all the things that can go wrong. And then I started being able to build larger things, more complex applications, actual agents, actual agentic systems that I now run for more and more things. I have to admit, though, I'm not running anything in production, because I'm not building production software anymore. I haven't for a few years, unfortunately. But the great thing is, in my role, I get to experiment with that stuff a lot.

Dennis, awesome last words. Thank you very much for being such a good guest and telling us so much about AI and AWS and how they work together.

Thank you so much for having me. It was a great time. Thank you.

That's all, folks. Find more news, streams, events, and interviews at www.startuprad.io. And remember: sharing is caring.



