November 2024 - Artificial Intelligence

Understanding AI Application Architecture

Lori MacVittie from F5 analyzes the intricate components and operational challenges of AI applications, highlighting the importance of containerization, APIs, and emerging technologies in shaping the future of application development and security.



What does an AI application look like, really? Having built a few, I think it's a good time to break one down into its (many) parts and explore the operational implications.

We’ve previously explored what’s the same (a lot) and what’s different (a lot) about AI applications. It’s easy enough to define an AI application as “an application that leverages AI, either predictive or generative.” 

That’s true, but for the folks who must build, deploy, deliver, secure, and operate these new kinds of applications, what does that mean?

It means new moving parts, for one. And not just inferencing services. There are other new components, like vector databases, that are becoming standard additions to the application architecture. It also means increased reliance on APIs (Application Programming Interfaces).

It is also true that most of these components are containerized. When I set up a new AI application project, I fire up a container running PostgreSQL to use as a vector data store. That supports RAG (Retrieval Augmented Generation), which a significant percentage of AI applications use, according to multiple industry reports (this one from Databricks, for example). RAG lets me augment a stock AI model with any data I like without having to fine-tune or train a model myself. For a significant number of use cases, that’s the best way to implement an AI application.
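For illustration, here's a minimal sketch of that vector-store step, assuming a local PostgreSQL container with the pgvector extension and the psycopg2 driver; the table name, embedding dimension, and query are placeholders rather than a reference implementation:

```python
# Sketch: a pgvector-backed store for RAG document chunks.
# Assumes PostgreSQL with the pgvector extension is reachable on localhost.
import psycopg2

conn = psycopg2.connect("dbname=rag user=postgres host=localhost")
cur = conn.cursor()

# One-time setup: enable pgvector and create a table for document chunks.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id serial PRIMARY KEY,
        content text,
        embedding vector(384)  -- dimension depends on your embedding model
    )
""")
conn.commit()

def top_k(query_embedding: list[float], k: int = 5) -> list[str]:
    """Return the k stored chunks nearest to the query embedding (L2 distance)."""
    literal = "[" + ",".join(map(str, query_embedding)) + "]"
    cur.execute(
        "SELECT content FROM chunks ORDER BY embedding <-> %s::vector LIMIT %s",
        (literal, k),
    )
    return [row[0] for row in cur.fetchall()]
```

The retrieved chunks are what get stuffed into the model’s prompt, which is how RAG augments a stock model without retraining it.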

I may also fire up a knowledge graph, typically by running a containerized graph database like Neo4J. Graph databases are particularly useful when working with graph-affine data, like social networks, recommendation engines, and fraud detection. It turns out they’re also useful for mitigating hallucinations related to policy generation, as we learned early in the year. Including a graph database can add GraphQL, a new query language, to the list of “new things that need securing.”
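A knowledge-graph lookup might look something like this, using the official neo4j Python driver against a containerized instance; the node labels and relationship below are hypothetical illustrations:

```python
# Sketch: querying a containerized Neo4j knowledge graph to ground model output.
# The Entity/Policy labels and GOVERNED_BY relationship are illustrative only.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def related_policies(entity_name: str) -> list[str]:
    """Fetch policy titles linked to an entity, to check generated text against."""
    with driver.session() as session:
        result = session.run(
            "MATCH (e:Entity {name: $name})-[:GOVERNED_BY]->(p:Policy) "
            "RETURN p.title AS title",
            name=entity_name,
        )
        return [record["title"] for record in result]
```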

Then I’m going to decide which inferencing service I want to use. I have options. I could use a cloud provider service or AI as a service (à la ChatGPT) or I could run a local service. For example, my latest tinkering has used Ollama and phi3 on my MacBook. In this case, I’m only running one copy of the inferencing server because, well, running multiple would consume more resources than I have. In production, of course, it’s likely there would be more instances to make sure it can support demand. 
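Calling that local server is just another API call. Here's a minimal sketch against Ollama's default REST endpoint; the prompt and timeout are arbitrary:

```python
# Sketch: prompting a local phi3 model via Ollama's REST API on its default port.
import requests

def generate(prompt: str) -> str:
    """Send a prompt to the local phi3 model and return the completion text."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "phi3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("Summarize retrieval augmented generation in one sentence."))
```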

Because I’m going to use phi3, I choose phidata as my framework for accessing the inferencing service. I’ve also used langchain, a popular choice, when taking advantage of AI as a service. The good news is that, from an operational perspective, the framework runs inside the core AI application itself; it isn’t another component that has to be deployed and operated separately.
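The framework mostly hides that plumbing. As a sketch, here is the langchain variant (the community Ollama wrapper; import paths vary across langchain versions):

```python
# Sketch: the framework layer, using langchain's community Ollama wrapper.
# Import paths differ across langchain versions; this assumes langchain_community.
from langchain_community.llms import Ollama

llm = Ollama(model="phi3")  # talks to the local Ollama server by default
print(llm.invoke("What operational components does a RAG application need?"))
```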

Operational Impact of AI Applications

I haven’t even written a line of code yet, and I’ve already got multiple components running, each accessed via API and running in its own container. That’s why we start with the premise that an AI application is a modern application and brings with it all the same familiar challenges for delivery, security, and operation. It’s also why we believe deploying AI models will radically increase the number of APIs in need of delivery and security.

Operational challenges are additive: each new architecture adds new challenges or amplifies existing ones.

AI applications add another tier to an already complex environment, which, of course, increases complexity. That means AI application architecture is going to increase the load on all operations functions, from security to SRE (Site Reliability Engineering) to the network. Each of these components also has its own scaling profile; that is, some components will need to scale faster than others, and how requests are distributed will vary simply because most of them leverage different APIs and, in some cases, protocols.

What’s more, AI application architecture is going to be highly variable internally. That is, the AI application I build is likely to exhibit different local traffic characteristics and needs than the AI application you build. And more heterogeneity naturally means greater complexity because it’s the opposite of standardization, which promotes homogeneity.

Standardization has been, for decades, the means through which enterprises accelerate delivery of new capabilities to market and achieve greater operational efficiencies, freeing up the human resources needed to address the variability in AI application architectures.

This is why we see some shared services, particularly app and API security, shifting to the edge, à la security as a service. Such a set of shared services can not only better protect workloads across all environments (core, cloud, and edge) but also provide a common set of services that serves as a standard across AI applications.

Understanding the anatomy of AI applications will help determine not just the types of application delivery and security services you need—but where you need them. 


Lori MacVittie is the principal technical evangelist for cloud computing, cloud and application security, and application delivery, and is responsible for education and evangelism across F5’s entire product suite. MacVittie has extensive development and technical architecture experience in both high-tech and enterprise organizations. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she authored articles on a variety of topics aimed at IT professionals. She holds a BS in information and computing science and an MS in computer science.

Please note: The opinions expressed in Industry Insights published by dotmagazine are the author’s or interview partner’s own and do not necessarily reflect the view of the publisher, eco – Association of the Internet Industry.