5 AI Agents #
Introduction to agentic AI #
Retrieval-Augmented Generation (RAG), which we discussed in the previous chapter, is one approach to augmenting the capabilities of an LLM. In the case of RAG, the augmentation is triggered outside of the (main) LLM: a separate RAG mechanism takes the prompt, selects appropriate document chunks, retrieves them, and feeds prompt + context to the LLM (which then generates a “grounded” answer).
The characteristic new feature of AI agents, or agentic AI, is that the (main) LLM extends its capabilities by using one or more tools; such tool use is triggered by commands that the (main) LLM generates itself. In other words: the whole process between user prompt and completion is controlled by an LLM.
Typical tools include
- Web search
- Database operations
- Web services (e.g., Google Maps, booking platforms)
- Local file operations
- PDF extraction or generation
- Other LLMs
Obviously, some mechanisms outside the LLM are still needed (as in RAG): they separate the tool commands issued by the LLM from the text output intended for the user, and they handle the interaction with the tools (sending commands + context, receiving tool output).
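To make this concrete, the following is a minimal sketch of such an agent loop against an OpenAI-compatible chat endpoint. The endpoint URL, the model name, and the trivial get_time tool are placeholders invented for this illustration; only the tool-calling message flow itself follows the standard OpenAI-compatible API.
Python sketch of a minimal agent loop
import os
from datetime import datetime, timezone
from openai import OpenAI  # any OpenAI-compatible endpoint can be used

# Placeholder endpoint and key -- replace with a real OpenAI-compatible service.
client = OpenAI(base_url="https://example.org/v1", api_key=os.environ.get("API_KEY", "dummy"))

def get_time() -> str:
    """A trivial local 'tool' used only for this illustration."""
    return datetime.now(timezone.utc).isoformat()

tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current UTC time",
        "parameters": {"type": "object", "properties": {}},
    },
}]

messages = [{"role": "user", "content": "What time is it (UTC)?"}]
while True:
    reply = client.chat.completions.create(model="some-model", messages=messages, tools=tools)
    msg = reply.choices[0].message
    if not msg.tool_calls:              # no tool command: this is the answer for the user
        print(msg.content)
        break
    messages.append(msg)                # keep the LLM's tool-call request in the history
    for call in msg.tool_calls:         # execute each tool command outside the LLM
        result = get_time() if call.function.name == "get_time" else "unknown tool"
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})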
Sometimes, all multi-stage LLM operations are regarded as agentic AI, which then includes all applications of reasoning LLMs.
Model Context Protocol (MCP) #
In November 2024, the company Anthropic introduced the Model Context Protocol (MCP) as a standard for tool discovery and tool use by LLMs. Definitive information about MCP can be found at https://modelcontextprotocol.io/ (from which we quote in the following).

Figure 5.: MCP architecture (source: https://modelcontextprotocol.io/docs/getting-started/intro)
MCP provides a standardized way for applications to:
- Share contextual information with language models
- Expose tools and capabilities to AI systems
- Build composable integrations and workflows
The protocol uses JSON-RPC 2.0 messages (via stdio, HTTP, or SSE connections; an example message pair is sketched after the following list) to establish communication between:
- Hosts: LLM applications that initiate connections
- Clients: Connectors within the host application
- Servers: Services that provide context and capabilities
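To illustrate the wire format, the following sketch shows what a tools/call request and its result look like as JSON-RPC 2.0 messages (written here as Python dictionaries). The tool name and arguments are taken from the DeepWiki example used later in this chapter; the rest of the shape follows our reading of the MCP specification.
Python sketch of an MCP tools/call request and result
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "ask_question",
        "arguments": {
            "repoName": "gwdg/chat-ai",
            "question": "Explain what this GitHub Repo is about",
        },
    },
}

# A (successful) response carries the same id and a list of content items.
tool_call_result = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "This GitHub repository ..."}]},
}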
MCP Features #
Servers offer any of the following features to clients:
- Prompts: Pre-defined templates or instructions that guide language model interactions
- Resources: Structured data or content that provides additional context to the model
- Tools: Executable functions that allow models to perform actions or retrieve information
MCP requires stateful connections between clients and servers; via these connections, clients may also offer features to the servers (sampling, roots, elicitation).
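With the official MCP Python SDK (published on PyPI as mcp), a client can discover these server features programmatically. The following is a sketch that assumes a server reachable via SSE at a placeholder URL; servers that do not implement prompts or resources may answer the corresponding list requests with an error.
Python sketch for listing MCP server features
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

SERVER_URL = "https://example.org/sse"  # placeholder SSE endpoint

async def discover():
    async with sse_client(SERVER_URL) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            # Each list call returns a result object with the corresponding items.
            print("tools:", [t.name for t in (await session.list_tools()).tools])
            print("prompts:", [p.name for p in (await session.list_prompts()).prompts])
            print("resources:", [r.name for r in (await session.list_resources()).resources])

asyncio.run(discover())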
Using Chat-AI (GWDG) with MCP #
As briefly explained in https://docs.hpc.gwdg.de/services/chat-ai/features/mcp/, the GWDG AI interface Chat-AI now supports MCP. Following the GWDG example, we use the MCP server at https://mcp.deepwiki.com (cf. https://cognition.ai/blog/deepwiki-mcp-server) with the prompt
Explain what this Github Repo: https://github.com/gwdg/chat-ai is about
using the model Qwen 3 30B A3B Thinking 2507.

Figure 5.: Using MCP with the GWDG AI interface Chat-AI
According to the Chat-AI interface, this prompt triggers two reasoning stages before an answer is given:
Qwen 3 30B A3B Thinking 2507 - thinking stage 1
Calling MCP server: https://mcp.deepwiki.com/mcp
function: ask_question with args: {"repoName": "gwdg/chat-ai", "question": "Explain what this GitHub Repo is about"}
MCP response received of type text
Qwen 3 30B A3B Thinking 2507 - thinking stage 2
Okay, let’s see. The user asked about the GitHub repo https://github.com/gwdg/chat-ai. I called the ask_question function with the repo name and the question. The response came back with a detailed explanation.
First, I need to parse the tool response. The repo is called chat-ai, part of a larger system for LLMs on HPC. The main components are frontend (React/Vite) and backend (Node.js/Express). They mention it’s a web interface that wraps OpenAI-compatible APIs. Key features include memory system, tools integration (web search, image gen, etc.), customizable models, and Slurm-native for HPC.
Wait, the user just wants an explanation. The tool response already gives a thorough answer. My job is to present this in a clear, concise way without adding extra info. Let me check if all the important points are covered.
The response mentions Docker services, configuration files, and some notes about the project structure. The user might not need all the technical details, but since they asked for an explanation, including the main points is good.
I should structure the answer to highlight the purpose, key components, features, and maybe a note on deployment. Avoid jargon where possible, but since it’s a technical repo, some terms are necessary.
Wait, the tool response says it’s part of a larger architecture with server and HPC repos. So the chat-ai repo is specifically the web interface. Need to clarify that it’s not the entire system but the frontend/backend for the web part.
Also, the memory system stores locally in the browser, which is a nice feature. Tools like web search and image generation are integrated. Slurm-native integration is important for HPC, so that’s a key point for the target audience (HPC users).
Check if the answer should mention the Docker setup, but maybe briefly. The user might be a developer looking to deploy it, so configuration via Docker and JSON files is relevant.
Make sure to mention it’s a standalone app that can wrap any OpenAI-compatible API, so it’s flexible.
Avoid including the DeepWiki link or the citation file details unless they’re crucial. The main explanation should focus on what the repo does, not the supplementary files.
Putting it all together: Start with the main purpose, then components, key features, and deployment. Keep it to a few paragraphs. Let me draft that.
Qwen 3 30B A3B Thinking 2507 - answer
The GitHub repository gwdg/chat-ai is a web-based interface for interacting with Large Language Models (LLMs), designed specifically for deployment on High-Performance Computing (HPC) systems. It serves as the frontend and backend layer for a larger architecture (with server/HPC components in separate repositories) and functions as a wrapper for OpenAI-compatible API endpoints.
🔑 Core Components #
- Frontend (`front`)
  - Built with React + Vite, runs in the browser.
  - Handles user interactions, UI features (e.g., collapsible sidebars, model selector, audio input), and local memory storage for conversation context.
- Backend (`back`)
  - A Node.js/Express proxy that routes API requests to LLM providers (e.g., OpenAI, custom endpoints).
  - Manages document processing, memory updates, and tool integrations.
💡 Key Features #
- Memory System: Stores conversation context locally in the browser (via the `generateMemory` API).
- Tool Integration: Supports web search, image generation, speech synthesis, and RAG (Retrieval-Augmented Generation) via Arcana.
- Customization: Adjustable LLM parameters (e.g., `temperature`, `top_p`), model selection, and default settings via `secrets/front.json`/`back.json`.
- HPC Integration: Designed as a Slurm-native platform for deploying LLM services on HPC infrastructure.
- Standalone Deployment: Works as a self-contained Dockerized app (frontend + backend) with configuration via JSON secrets.
🛠️ Deployment #
- Uses Docker and Docker Compose for containerization.
- Configuration is managed through environment variables injected from `secrets/*.json` files.
- The frontend reads `front.json` to set `VITE_*` variables (e.g., API endpoints, model defaults).
📌 Why It Matters #
This repo enables researchers and developers to access LLMs via a user-friendly web interface directly on HPC systems (e.g., university clusters), avoiding the need for manual API calls. It’s part of the SAIA software suite, which emphasizes Slurm-based orchestration for HPC workloads.
For details, check the Wiki or the CITATION.cff for academic reference.
The interesting question remains: what exactly was the answer of the MCP server? Unfortunately, this answer is not available in the Chat-AI interface.
We therefore use some Python code (generated using Gemini 3, after attempts with curl as well as with Python and the requests and httpx modules had failed):
Python code for inspecting MCP answer
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

# The URL where the server "listens" (SSE endpoint)
# The SDK will automatically learn the POST URL from the handshake.
MCP_SERVER_URL = "https://mcp.deepwiki.com/sse"

async def main():
    print(f"Connecting to {MCP_SERVER_URL}...")

    # sse_client manages the background connection automatically
    async with sse_client(MCP_SERVER_URL) as streams:
        read_stream, write_stream = streams

        # ClientSession binds the read/write streams together
        async with ClientSession(read_stream, write_stream) as session:

            # 1. Standard MCP Initialization
            await session.initialize()
            print("✅ Connected and Initialized!")

            # 2. Call the Tool
            # We don't need to manually construct JSON-RPC; the method does it.
            print("Running query...")
            result = await session.call_tool(
                name="ask_question",
                arguments={
                    "repoName": "gwdg/chat-ai",
                    "question": "Explain what this GitHub Repo is about"
                }
            )

            # 3. Print the Result
            # The result object contains the structured content we wanted
            print("\n--- RESPONSE CONTENT ---\n")
            for content in result.content:
                if content.type == "text":
                    print(content.text)

if __name__ == "__main__":
    asyncio.run(main())
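Note that this script requires the official MCP Python SDK (published on PyPI as mcp, i.e. pip install mcp), which provides the ClientSession and sse_client used above; running it prints the raw text content returned by the server.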
MCP answer via Python client
This GitHub repository, gwdg/chat-ai, contains the standalone web interface for a privacy-focused web application called Chat AI. This application allows users to interact with Large Language Models (LLMs) through OpenAI-compatible APIs. It is designed to be deployed independently but also functions as the web interface component of a larger SAIA (Slurm-Native AI Architecture) ecosystem.
Purpose and Architecture #
The primary purpose of this repository is to provide the frontend and a proxy backend for the Chat AI system. The system is composed of two main services:
- Frontend (`front`): A React-based single-page application built with Vite. This part of the application handles the user interface, state management using Redux, and local data storage in IndexedDB.
- Backend (`back`): A Node.js/Express proxy service. This service acts as a wrapper for API requests, preventing CORS errors and providing more control over requests to the OpenAI-compatible API endpoint. It also integrates with `docling` for document processing, such as PDF conversion.
The architecture follows a three-tier design: a Browser Layer, a Backend Proxy Layer, and an External Services Layer. All user data is stored locally in the browser using IndexedDB, ensuring privacy.
Key Features #
Chat AI offers several features to enhance user interaction with LLMs:
- Multiple Conversations: Users can manage an unlimited number of conversations, which are stored locally.
- Streaming Responses: Provides real-time message streaming from the LLM.
- User Memory System: Allows the AI to learn and remember relevant details across conversations, with options for “Recall” and “Learn” modes. Memories are stored locally in the browser.
- File Attachments: Supports various file types including images, audio, video, PDFs, and CSV files.
- GWDG Tools Integration: Extends LLM capabilities with tools like web search, image/audio generation, and RAG (Retrieval Augmented Generation).
- Markdown & LaTeX Rendering: Full support for formatted text and mathematical notation in responses.
- Personas: Users can import pre-configured system prompts from GitHub to define the AI’s persona. The `ImportPersonaModal.jsx` component handles fetching and processing these persona files from a GitHub repository.
- Internationalization: The interface is available in English and German.
Deployment and Configuration #
The application uses Docker Compose for orchestration, with both frontend and backend services running in separate containers. Configuration is managed through two JSON files: `secrets/front.json` for frontend settings and `secrets/back.json` for backend settings. These files allow customization of ports, API endpoints, API keys, and default conversation settings.
Notes #
The CITATION.cff file indicates that this software is part of a larger project named SAIA, which focuses on providing a Slurm-native solution for HPC-based services. The README.md further clarifies that this repository specifically contains the standalone web interface, while other components of the SAIA architecture (server components and HPC components) are located in separate GitHub repositories. The front/package.json file provides details about the frontend’s dependencies and scripts. The CHANGELOG.md file details the version history and highlights new features and improvements over time.
Wiki pages you might want to explore:
View this search on DeepWiki: https://deepwiki.com/search/explain-what-this-github-repo_30b68fb2-deb7-40a2-8046-3d3e26e702e8
In this case, the MCP server already answered the question contained in the prompt (in Markdown). The (main) LLM (in this case: Qwen 3) merely summarized and rephrased the MCP answer.
MCP inspector #
Interactive exploration of MCP servers is possible using the MCP inspector (https://modelcontextprotocol.io/docs/tools/inspector). Here, we first connect it to the same MCP server (which does not require authentication).

Figure 5.: MCP inspector (https://modelcontextprotocol.io/docs/tools/inspector)
A click on the “List tools” panel leads to the following listing (with the entries read_wiki_structure, read_wiki_contents, and ask_question)

Figure 5.: MCP inspector (https://modelcontextprotocol.io/docs/tools/inspector)
A click on a tool not only gives more information about that specific tool, but also allows using it (via input forms that are generated from the tool descriptions and schemas received during MCP discovery). In particular, we can ask the same question about the Chat-AI GitHub repository again, via this GUI:

Figure 5.: MCP inspector (https://modelcontextprotocol.io/docs/tools/inspector)
In addition to the text answer (in green), we also get structured information (“Server Notifications”). Interestingly, the text/markdown answer now contains a Mermaid diagram:
MCP answer obtained via the MCP inspector (including Mermaid)
This GitHub repository, gwdg/chat-ai, contains the standalone web interface for a privacy-focused web application called Chat AI. This application allows users to interact with Large Language Models (LLMs) through OpenAI-compatible APIs. It is designed to be deployable independently, but also functions as a component of the larger SAIA (Slurm-Native AI Architecture) ecosystem.
Purpose and Architecture #
The primary purpose of Chat AI is to provide a web-based interface for LLMs, emphasizing user privacy by storing all user data locally in the browser using IndexedDB.
The system is composed of two main services:
- Frontend (`front`): A React-based single-page application built with Vite.
- Backend (`back`): A Node.js/Express proxy service that handles API requests and document processing.
The architecture follows a three-tier design:
- Browser Layer: The React application (`App.jsx`) manages global state via Redux and persists conversation data to IndexedDB.
- Backend Proxy Layer: An Express server (`back/src/index.js`) handles CORS, authentication, and routes requests to the OpenAI-compatible API, also processing document uploads.
- External Services Layer: This layer interacts with LLM inference services and optional tool-enabled gateways for advanced capabilities like web search and RAG.
graph TB
    subgraph "Browser Environment"
        User["User"]
        FrontApp["front/src/App.jsx<br/>React Application"]
        Redux["Redux Store<br/>interface_settings<br/>user_settings"]
        IndexedDB["IndexedDB (Dexie)<br/>conversations<br/>messages<br/>files_data"]
        i18n["i18next<br/>en.js / de.js"]
    end
    subgraph "Backend Service"
        BackServer["back/src/index.js<br/>Express Server<br/>Port 8081"]
        ChatProxy["/chat/completions<br/>POST endpoint"]
        ModelsProxy["/models<br/>GET endpoint"]
        DocProcessor["docling integration<br/>PDF processing"]
    end
    subgraph "External Services"
        OpenAIAPI["OpenAI-compatible API<br/>apiEndpoint from back.json"]
        DoclingService["Docling Service<br/>Document conversion"]
        GWDGTools["GWDG Tools<br/>saia-openai-gateway"]
    end
    subgraph "Configuration"
        FrontConfig["secrets/front.json"]
        BackConfig["secrets/back.json"]
    end
    User -->|"Interacts"| FrontApp
    FrontApp -->|"Reads/Writes"| Redux
    FrontApp -->|"useSyncConversation"| IndexedDB
    FrontApp -->|"useTranslation"| i18n
    FrontApp -->|"API Calls :8080"| BackServer
    BackServer --> ChatProxy
    BackServer --> ModelsProxy
    BackServer --> DocProcessor
    ChatProxy -->|"Streaming"| OpenAIAPI
    ModelsProxy -->|"List models"| OpenAIAPI
    DocProcessor -->|"Convert PDFs"| DoclingService
    ChatProxy -->|"enable_tools=true"| GWDGTools
    FrontConfig -.->|"Configure"| FrontApp
    BackConfig -.->|"Configure"| BackServer
    Redux -.->|"redux-persist"| LocalStorage["localStorage"]
Key Features #
Chat AI offers various features, including:
- Multiple Conversations: Management of unlimited conversations with local storage.
- Streaming Responses: Real-time message streaming from LLMs.
- User Memory System: Contextual memory across conversations with recall modes.
- File Attachments: Support for various file types including images, audio, video, PDFs, and CSV files.
- GWDG Tools Integration: Features like web search, image/audio generation, RAG, and MCP servers.
- Markdown & LaTeX Rendering: Full support for formatted text and mathematical notation.
- Personas: Ability to import pre-configured system prompts from GitHub.
- Internationalization: English and German interface support.
- Export Functionality: Conversations can be exported in JSON, PDF, and plain text formats.
Technology Stack #
The project utilizes a modern web development stack:
- Frontend: React 19.1.1 with React Router, Vite 7.1.2, Redux Toolkit 2.8.2, Dexie 4.0.11 (for IndexedDB), Tailwind CSS, Headless UI, Framer Motion, react-markdown, rehype-katex, and i18next.
- Backend: Node.js with Express framework, and docling integration for document processing.
- Development Tools: npm for package management, Docker with Docker Compose for orchestration, and Git for version control.
Data Persistence #
Chat AI employs a dual-layer persistence strategy:
- Redux Store → localStorage: Global UI settings, user memories, and the last conversation ID are persisted to `localStorage` using `redux-persist`.
- Component State → IndexedDB: Conversation data, including metadata, messages, content items, and file attachments, is stored in IndexedDB via Dexie. The `useSyncConversation` hook manages synchronization and conflict resolution.
This approach ensures data privacy, performance, reliability, and portability.
Configuration #
The system is configured using two JSON files:
- `secrets/front.json`: Contains frontend settings such as API paths, default models, and UI configuration.
- `secrets/back.json`: Holds backend settings including the OpenAI-compatible API URL, API key, and service name.
Notes #
The CITATION.cff file indicates that this software is part of a larger project named SAIA, which focuses on providing a Slurm-native solution for HPC-based services. The README.md further clarifies that this repository specifically contains the web interface, while other components of the SAIA architecture (server and HPC components) are located in separate repositories. The front/src/i18n/en.js file provides English translations for various UI elements and help texts, including explanations for features like Arcana, MCP, models, memory, system prompt, temperature, tools, web search, and top_p. The front/src/db/dbTypes.ts and front/src/db/index.ts files define the data structures and database schema for storing conversations, messages, content items, and file data using Dexie (IndexedDB).
Wiki pages you might want to explore:
View this search on DeepWiki: https://deepwiki.com/search/explain-what-this-github-repo_ac5e4b34-af75-4e8c-8901-fc5222c40bf6

Figure 5.: Mermaid diagram contained in MCP answer
A full list of available MCP servers can be found in the official registry at https://registry.modelcontextprotocol.io.
A somewhat curated list, including the reference servers, is available at https://github.com/modelcontextprotocol/servers (see also https://modelcontextprotocol.io/examples).
Interesting entries in the latter list are
- DeepL - Translate or rewrite text with DeepL’s very own AI models using the DeepL API
- GitHub - GitHub’s official MCP Server
Unfortunately, the DeepL MCP server is only available for local installation and only allows connections via stdio (not HTTP); on the backend, it connects to the DeepL servers via proprietary APIs. In order to explore the DeepL MCP features, we connect it directly to the MCP inspector (the API key requires registration with a credit card and is only partially shown here):
export DEEPL_API_KEY="84e84575-123c-..."
npx @modelcontextprotocol/inspector npx -y deepl-mcp-server
After setting the Transport Type to STDIO and hitting the Connect button, we can list the tools and get, e.g., a list of the source languages:

Figure 5.: DeepL MCP server: get-source-languages
Using the translate-text tool, we can translate, e.g., the above paragraph (showing only the right-hand column of the MCP inspector window, split in half for better layout):

Figure 5.: DeepL MCP server: translate-text
In principle, it should be possible to connect the DeepL MCP server to local LLMs via OpenWebUI as well; however, this is not straightforward (as it requires a stdio-to-HTTP proxy).
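Alternatively, the stdio transport can be driven programmatically with the MCP Python SDK. The following sketch starts the same deepl-mcp-server via npx and calls its translate-text tool; the argument names ("text", "targetLang") are assumptions made for this illustration and should be checked against the tool schema shown in the inspector.
Python sketch for calling the DeepL MCP server via stdio
import asyncio
import os
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Start the DeepL MCP server as a subprocess (DEEPL_API_KEY must be exported, see above).
params = StdioServerParameters(
    command="npx",
    args=["-y", "deepl-mcp-server"],
    env=dict(os.environ),
)

async def main():
    async with stdio_client(params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            print("tools:", [t.name for t in (await session.list_tools()).tools])
            # Argument names below are assumed; compare with the listed tool schema.
            result = await session.call_tool(
                name="translate-text",
                arguments={"text": "Hello world", "targetLang": "DE"},
            )
            for content in result.content:
                if content.type == "text":
                    print(content.text)

asyncio.run(main())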
GitHub provides a public MCP server at https://api.githubcopilot.com/mcp/; registered users can create personal access tokens (PATs) at https://github.com/settings/tokens. Connecting (via Streamable HTTP and with an “Authorization” header containing “Bearer <PAT>”), we can again list the tools in the MCP inspector (only partially shown here; the full list is quite long)

Figure 5.: GitHub MCP server: tools
and, e.g., get information about the API user:

Figure 5.: GitHub MCP server: get-me
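The same steps can also be performed from Python. The sketch below assumes that the SDK’s Streamable HTTP client accepts custom headers, that the PAT is available in the environment variable GITHUB_PAT, and that the user-information tool is named get_me (as suggested by the screenshot above).
Python sketch for querying the GitHub MCP server via Streamable HTTP
import asyncio
import os
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

URL = "https://api.githubcopilot.com/mcp/"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_PAT']}"}  # personal access token

async def main():
    # streamablehttp_client yields read/write streams plus a session-id callback.
    async with streamablehttp_client(URL, headers=HEADERS) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            tools = (await session.list_tools()).tools
            print(len(tools), "tools, e.g.:", [t.name for t in tools[:5]])
            result = await session.call_tool(name="get_me", arguments={})
            for content in result.content:
                if content.type == "text":
                    print(content.text)

asyncio.run(main())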
It is straightforward to connect OpenWebUI to the GitHub MCP server. However, attempts to get Qwen3 8B to use any of the GitHub tools were not successful.
Other agentic protocols #
MCP is probably the most widely used open protocol for agentic AI (beyond reasoning LLMs). A Google AI search brings up some alternatives:
- Agent-to-Agent (A2A) Protocol: A protocol developed by Google for structured peer-to-peer communication between AI agents. It handles message passing, role negotiation, shared context, and task delegation, allowing agents to collaborate in multi-agent systems.
- Agent Communication Protocol (ACP): An IBM-developed protocol that standardizes interfaces for agent interactions, lifecycle management, and workflow orchestration.
- Agent Gateway Protocol (AGP): An industry-standard protocol that acts as a bridge for communication between AI agents and external systems, handling message transformation and access controls.
- Tool Abstraction Protocol (TAP): Developed by LangChain, this protocol provides a standardized way for agents to interact with various tools.
- Open Agent Protocol (OAP): A community-developed protocol aimed at creating a standardized framework for agent communication.
Note that some agentic AI applications certainly use other means, e.g. proprietary REST APIs or local integrations.