Artificial intelligence (AI) continues to evolve at a rapid pace, and so do the platforms that make it usable in enterprise environments. Since the release of Kong AI Gateway 3.10 and my introductory blog on its AI capabilities, the landscape has shifted from simple large language model (LLM) integrations toward more complex, agent-driven architectures. With versions 3.11 through 3.14, Kong has significantly expanded its AI capabilities and introduced more advanced mechanisms, including Model Context Protocol (MCP) access controls. It is time to revisit the AI Gateway and explore what “reloaded” means in practice for this new phase of AI adoption.
Overview
When Kong AI Gateway 3.10 was released, the journey that began with the first AI Gateway features in Kong Gateway 3.6 had reached an important milestone, establishing a strong foundation for integrating LLMs into enterprise environments. The focus was clear: LLM-agnostic integration, secure AI access, centralized traffic management, and support for ethical AI safeguards and risk mitigation through AI plugins.
These capabilities are integrated directly into the existing Kong Gateway platform, avoiding the need to introduce specialized SDK frameworks or standalone LLM intermediary platforms that are separate from the existing enterprise architecture.
The continued evolution from Kong Enterprise versions 3.11 to 3.14 can be observed across four key dimensions:
- broader AI ecosystem integration,
- enhanced routing, load balancing, and reliability for production AI workloads,
- deeper support for agentic architectures through Model Context Protocol (MCP) enhancements, and
- emerging agent-to-agent (A2A) communication capabilities enabled by the A2A proxy plugin.
The following sections revisit the Kong AI Gateway concepts introduced in the previous blog and explore how recent developments effectively “reload” the architecture.
Reloaded AI gateway architecture
Kong AI Gateway is built on the mature Kong API Gateway and enhanced with AI-specific plugins. Some of these plugins are available in the open-source edition, while others are Enterprise capabilities that require an appropriate license. The layered architecture below illustrates how AI and MCP plugins extend the existing API Gateway stack, which in turn builds on the proven NGINX-based runtime foundation. Still an emerging capability, WebAssembly (WASM) is also supported in Kong Gateway as an alternative to LuaJIT.
This layered view highlights that AI capabilities are embedded directly into the established Kong Gateway runtime. By extending the existing plugin framework and core traffic management layer, the AI Gateway inherits the stability, performance and operational maturity of the underlying API platform.
Building on this technical foundation, the expanded AI plugin ecosystem can be organized into clearly defined control domains. Each domain represents a structured grouping of related capabilities, including observability, guardrails, performance and cost optimization, access governance, and agentic protection, as illustrated in the updated overview below.
At the heart of this architectural model is the “Prompt and Response Engineering Controls” domain. Positioned at the center of the diagram, it directly addresses ethical AI principles and structured AI risk management by governing how prompts are validated, how responses are filtered, and how guardrail mechanisms are applied.
By positioning these controls at the core of the AI Gateway, the architecture emphasizes that responsible AI usage is not an optional add-on but a foundational design principle. Surrounding domains such as observability, performance and cost optimization, unified LLM access, and MCP protection build upon this core, together forming a modular AI access and control framework. Additionally, the new A2A proxy plugin supports observability and control for agent-to-agent traffic by transparently handling A2A protocol requests through the gateway.
The following section examines how recent enhancements and newly introduced plugins strengthen these control domains and further mature the AI Gateway architecture.
Advancing AI plugin capabilities
With the foundation established in 3.10, the subsequent releases focused on deepening and refining the AI plugin landscape. The evolution of AI Proxy, AI Proxy Advanced, and newly introduced AI plugins reflects a clear shift toward greater operational control, enhanced visibility, improved performance management, and stronger support for agent-driven architectures. This is further complemented by emerging support for agent-to-agent (A2A) communication.
Within the domain of AI Observability and Alerting Controls, the AI Gateway introduced enhanced logging, metrics exposure, and tracing capabilities for AI traffic, enabling deeper insight into prompt execution, token usage, and provider interactions. These improvements allow teams to monitor AI behavior more transparently, define alerting thresholds, and detect anomalies or misuse in production environments.
The Unified LLM Access Controls category is primarily shaped by the capabilities of the AI Proxy Advanced plugin. This plugin introduces advanced traffic management mechanisms for LLM integrations. In addition to semantic routing, which enables request distribution based on contextual similarity rather than traditional balancing algorithms, the gateway now provides cost-aware provider selection, controlled retry policies, and explicit failover criteria. Usage-based load steering, configurable failure thresholds, and automated health-driven rerouting across multiple LLM providers transform AI consumption into production-grade traffic management.
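The idea behind failure thresholds and health-driven rerouting can be illustrated with a minimal Python sketch. It is not Kong's implementation or configuration: the `Provider` class, the `route_with_failover` function, and the `failure_threshold` parameter are hypothetical names chosen for illustration, and the two providers are simple stand-ins for real LLM endpoints.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Provider:
    """Hypothetical stand-in for one configured LLM target."""
    name: str
    call: Callable[[str], str]  # sends a prompt, returns a completion
    failures: int = 0           # consecutive failures observed so far


def route_with_failover(prompt: str, providers: Sequence[Provider],
                        failure_threshold: int = 3) -> tuple[str, str]:
    """Try providers in priority order and skip those considered unhealthy."""
    for provider in providers:
        if provider.failures >= failure_threshold:
            continue  # unhealthy until its counter is reset elsewhere
        try:
            answer = provider.call(prompt)
        except Exception:
            provider.failures += 1  # count the failure, then fail over
            continue
        provider.failures = 0  # a success resets the counter
        return provider.name, answer
    raise RuntimeError("no healthy provider available")


def flaky(_prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


primary = Provider("primary", flaky)
backup = Provider("backup", stable)
print(route_with_failover("hello", [primary, backup]))
# prints ('backup', 'echo: hello')
```

In this sketch, a provider that keeps failing is skipped once it reaches the threshold, so later requests go straight to the next healthy target. A production setup would also need a way to reset the counter, for example after a cool-down period or a successful health check.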
The Prompt and Response Engineering Controls domain has been significantly strengthened through the introduction of additional guardrail integrations from hyperscalers and independent vendors. Expanding beyond the previously available Azure Content Safety integration, recent releases introduced dedicated plugins such as AI AWS Guardrails, AI GCP Model Armor, and the AI Lakera Guard plugin, which integrates with the independent Lakera Guard SaaS solution. To provide maximum flexibility, the new AI Custom Guardrail plugin enables fully customizable validation and enforcement logic for both prompts and responses.
In addition, the AI Semantic Response Guard plugin extends protection capabilities by evaluating model responses based on semantic characteristics and configurable policies directly within the gateway. Together with the AI Semantic Prompt Guard plugin, which focuses on semantic prompt control, it establishes a complementary mechanism: one plugin governs prompt input, while the other governs model output.
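The principle of a semantic check that applies equally to prompts and responses can be sketched in a few lines of Python. This is an illustration only, under strong simplifying assumptions: the `embed` function uses plain word counts instead of a real embedding model, and `is_allowed`, `deny_topics`, and `threshold` are hypothetical names that do not reflect the plugins' actual configuration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding based on word counts; a real setup would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_allowed(text: str, deny_topics: list[str], threshold: float = 0.5) -> bool:
    """Reject text that is too similar to any denied topic."""
    vector = embed(text)
    return all(cosine(vector, embed(topic)) < threshold for topic in deny_topics)


deny_topics = ["internal salary data"]

# The same check can run on the way in (prompt) and on the way out (response).
print(is_allowed("show internal salary data", deny_topics))
print(is_allowed("what is the weather today", deny_topics))
```

The first call is rejected because the text overlaps heavily with the denied topic, while the second passes. With real embeddings, the comparison would capture meaning rather than shared words, which is what makes the approach semantic.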
These plugins enable both external and gateway-level evaluation and filtering of prompts and model responses, supporting policy enforcement, toxicity detection, data leakage prevention, and the implementation and verification of compliance-related control requirements. By integrating these guardrail mechanisms directly into the gateway layer, Kong supports ethical AI practices and structured risk management without requiring application-level modifications.
The Cost and Performance Controls domain has been further enhanced through the introduction of the AI Prompt Compressor plugin. In long-running conversational and agent-driven workloads, context windows can continuously grow, increasing token consumption, latency, and operational cost. At the same time, excessive context can contribute to what is often referred to as “context rot” — a phenomenon in which large language models may gradually lose accuracy and reliability as the volume of input data expands and relevant signals become diluted.
The AI Prompt Compressor addresses this challenge by reducing prompt size while preserving semantic intent, helping to maintain response quality, control token usage, and stabilize performance characteristics. By optimizing prompt payloads directly at the gateway layer, Kong contributes to more efficient and sustainable AI consumption without requiring application-level redesign.
The Response Accuracy and Hallucination Mitigation Controls domain has been expanded through the introduction of the AI LLM-as-Judge plugin. This plugin enables automated evaluation of prompt-response pairs using a dedicated LLM acting as a reviewer. It assigns a numerical score to generated responses on a scale from 1 to 100, where lower values indicate incorrect or irrelevant outputs and higher values reflect stronger alignment with the expected response quality.
By introducing a scoring-based evaluation mechanism directly within the AI plugin suite, Kong makes it straightforward to integrate automated validation and retry activities for LLM interactions. Responses can be assessed against configurable quality criteria and, based on the evaluation outcome, re-generated or routed to an alternative model. This assessment feature enables automated quality control and can help reduce the risk of unchecked hallucinations.
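A score-based quality gate of this kind can be sketched as follows. The sketch follows the convention described above, where higher scores indicate better responses, and uses hypothetical names throughout (`answer_with_quality_gate`, `min_score`); the model and judge functions are simple stand-ins, not calls to the plugin or to a real LLM.

```python
from typing import Callable, Sequence

Generate = Callable[[str], str]    # prompt -> response
Judge = Callable[[str, str], int]  # (prompt, response) -> score from 1 to 100


def answer_with_quality_gate(prompt: str, models: Sequence[Generate],
                             judge: Judge, min_score: int = 70) -> tuple[int, str]:
    """Ask each model in turn until the judge's score reaches min_score.

    Returns the best (score, response) pair seen, even if none passed the gate.
    """
    best = None
    for generate in models:
        response = generate(prompt)
        score = judge(prompt, response)
        if not 1 <= score <= 100:
            raise ValueError(f"judge returned out-of-range score: {score}")
        if best is None or score > best[0]:
            best = (score, response)
        if score >= min_score:
            break  # good enough, no need to try further models
    if best is None:
        raise ValueError("no models configured")
    return best


def weak_model(_prompt: str) -> str:
    return "bad"


def strong_model(_prompt: str) -> str:
    return "good"


def toy_judge(_prompt: str, response: str) -> int:
    return 90 if response == "good" else 20


print(answer_with_quality_gate("What is MCP?", [weak_model, strong_model], toy_judge))
# prints (90, 'good')
```

Returning the best response seen, rather than failing when no response passes, is a design choice of this sketch. A stricter policy could reject the request outright if the threshold is never reached.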
The MCP Proxy and Protection Controls domain addresses the growing need to securely expose tools and services to agent-based AI systems. The Model Context Protocol (MCP) defines a standardized way for AI agents to interact with external tools and data sources, enabling structured tool usage within agent workflows.
To support this emerging ecosystem, Kong introduced dedicated MCP capabilities through the AI MCP Proxy and AI MCP OAuth2 plugins. The AI MCP Proxy plugin allows MCP-based tool interactions to be routed and managed through the gateway, applying the same traffic control and policy enforcement mechanisms used for API traffic. Complementing this, the AI MCP OAuth2 plugin enables secure authentication and authorization for MCP tool access, ensuring that agent interactions follow established identity and access control policies. Together, these plugins extend the AI Gateway from managing LLM access to governing agent-to-tool interactions within MCP-based architectures.
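To make the agent-to-tool governance idea more concrete, the following Python sketch checks whether a caller may invoke a given MCP tool. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method with the tool name in `params.name`. Everything else here is an assumption for illustration: the scope names, the `tool_scopes` mapping, and the `authorize_tool_call` function are hypothetical and do not describe how the Kong plugins are configured or implemented.

```python
import json


def authorize_tool_call(raw_message: str, granted_scopes: set[str],
                        tool_scopes: dict[str, str]) -> bool:
    """Decide whether an MCP message may pass, based on the caller's scopes.

    Only tools/call requests are gated in this sketch; unknown tools are denied.
    """
    message = json.loads(raw_message)
    if message.get("jsonrpc") != "2.0" or message.get("method") != "tools/call":
        return True  # other MCP methods are out of scope for this sketch
    tool = message.get("params", {}).get("name")
    required = tool_scopes.get(tool)  # hypothetical tool-to-scope mapping
    return required is not None and required in granted_scopes


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_invoice", "arguments": {"invoice_id": "42"}},
})

tool_scopes = {"get_invoice": "billing:read", "delete_invoice": "billing:write"}

print(authorize_tool_call(request, {"billing:read"}, tool_scopes))
print(authorize_tool_call(request, {"catalog:read"}, tool_scopes))
```

The first call passes because the caller holds the scope mapped to the requested tool, and the second is denied. In a real deployment, the granted scopes would come from a validated OAuth2 access token rather than being passed in directly.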
Taken together, these enhancements highlight how the AI Gateway is evolving toward a broader operational layer for AI-driven systems. As agentic workflows and multi-model environments continue to emerge, the gateway will increasingly act as the point where AI interactions are governed, secured, and optimized. The new A2A proxy plugin extends this role to agent-to-agent communication.
Conclusion
With the continued evolution of its AI plugin ecosystem, Kong AI Gateway has matured from an LLM integration layer into a broader platform for governing modern AI workloads. The expanding set of plugins shows how traditional API Gateway capabilities naturally extend into the AI domain.
One prediction from the previous article has already materialized: the importance of the Model Context Protocol (MCP). As agent-based systems increasingly rely on structured access to tools and data sources, MCP support in the Kong AI Gateway provides a standardized and secure way to govern these interactions at the gateway layer.
This reinforces the broader architectural idea that AI Gateways are not a replacement for API Gateways, but their natural evolution. As AI systems become more deeply integrated into enterprise architectures, the gateway remains the logical control point for managing access, enforcing policies, and protecting critical infrastructure.
As these AI features and extensions show once again, Kong AI Gateway can help reduce architectural limitations and provide the flexibility needed to meet the demands of the evolving AI era.
Credits
A special note of thanks to Johannes Reim for reviewing every article in this series and for the invaluable feedback.