Gaia Agentic System

GAIA: General Artificial Intelligence Agent System

This project implements a Dockerized FastAPI server for easy deployment and environment consistency. The AI agent system is designed to tackle complex questions drawn from the GAIA benchmark (Level 1). It uses LangChain for agent creation and LangGraph for workflow orchestration, supports interchangeable LLM backends (OpenRouter cloud models or local Ollama models via LiteLLM), and includes web search and code execution tools.

This Minimum Viable Product (MVP) demonstrates the core concepts of building modular AI agent systems.

Process Workflow

Here's a high-level overview of the GAIA Pathfinder Agent API process workflow:

  1. Receive Question: The API receives a question from a user through an HTTP POST request or via the HTML UI.
  2. Parse Question: The question is parsed to determine the type of response required (e.g., text, code execution).
  3. Invoke Tool: Based on the question type, the corresponding tool is invoked:
    • text_generation_tool for simple text-based questions
    • code_execution_tool for code-related questions (secure sandboxing)
  4. Execute Tool:
    • For text generation: Use a language model to generate a response.
    • For code execution: Execute the provided code in a secure sandbox environment and return the output.
  5. Format Response: The tool's output is formatted into a standardized GaiaAnswer format, including answer text, metadata, and optional attachments (e.g., images).
  6. Return Response: The formatted response is returned to the user through an HTTP response.
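The dispatch logic in steps 2-5 can be sketched in a few lines of dependency-free Python. The tool names mirror those in this README, but the routing heuristic and the response shape below are illustrative assumptions, not the project's actual implementation:

```python
def text_generation_tool(question: str) -> str:
    return f"LLM answer to: {question}"   # stands in for a real LLM call

def code_execution_tool(code: str) -> str:
    return "sandboxed output"             # stands in for sandboxed execution

def handle_question(question: str) -> dict:
    # Step 2: crude question-type detection (the real parser is richer)
    if "```" in question or question.lstrip().startswith("run:"):
        tool, output = "code_execution_tool", code_execution_tool(question)
    else:
        tool, output = "text_generation_tool", text_generation_tool(question)
    # Step 5: shape the output like the GaiaAnswer response format
    return {"answer": output, "metadata": {"tool": tool}}

result = handle_question("What is the capital of France?")
```

In the real system the dict would be a validated GaiaAnswer model rather than a plain dictionary.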

Features

  • AI Agent: Uses LangChain's AgentExecutor for reasoning and LangGraph for workflow orchestration.
  • LLM Flexibility: Supports OpenRouter and local Ollama models via LangChain's integration with LiteLLM. Configurable via environment variables.
  • Tool Usage: Includes basic web search (Tavily) and Python code execution tools implemented as LangChain tools.
  • Structured Output: Leverages Pydantic models (GaiaAnswer) with multiple approaches to ensure structured responses from the LLM.
  • API Server: Built with FastAPI, providing asynchronous request handling and automatic OpenAPI documentation (/docs).
  • Configuration: Uses Pydantic Settings (config.py) to manage configuration via .env files and environment variables.
  • Containerization: Dockerized for consistent builds and deployment (Dockerfile).
  • Modularity: Code is organized into distinct components (API, Agent, Tools, Config, Schemas).
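As a concrete illustration of the structured-output feature, a Pydantic model like GaiaAnswer can validate the LLM's raw output before it is returned. The field names below (answer, metadata, attachments) are assumptions inferred from the workflow description; the real definitions live in the project's schemas module:

```python
from pydantic import BaseModel, Field

class Attachment(BaseModel):
    # Hypothetical attachment shape (e.g., for images)
    filename: str
    content_type: str

class GaiaAnswer(BaseModel):
    answer: str
    metadata: dict = Field(default_factory=dict)
    attachments: list[Attachment] = Field(default_factory=list)

# Raw LLM/tool output is validated into a typed object;
# malformed output raises a ValidationError instead of passing through.
raw = {"answer": "42", "metadata": {"tool": "text_generation_tool"}}
parsed = GaiaAnswer.model_validate(raw)
```

Validating at this boundary is what lets the API guarantee a consistent response schema regardless of which tool or LLM backend produced the answer.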

Architecture

Key Architectural Points

  • Agent Framework: Utilizes LangChain for agent creation and LangGraph for workflow orchestration, leveraging Pydantic for structured data handling (tools and responses).
  • Configuration: Centralized and type-safe configuration loading using Pydantic Settings.
  • API Design: FastAPI with Pydantic schemas ensures robust request/response handling and automatic documentation.
  • Modularity: Separation of concerns between API, agent logic, tools, and configuration.
  • LLM Abstraction: Uses LangChain's integration with LiteLLM to support different LLM providers (OpenRouter, Ollama) through configuration.
  • Workflow Management: Uses LangGraph for creating a directed graph of agent and tool nodes, enabling complex reasoning flows.
  • State Tracking: Maintains conversation state and tracks intermediate steps for debugging and transparency.
  • Termination Logic: Implements proper end conditions to ensure the agent workflow terminates correctly.
  • Memory Management: Uses LangGraph's memory checkpointer to maintain state between steps and across sessions.
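The workflow-management, state-tracking, and termination points above describe a loop that LangGraph expresses as a directed graph of agent and tool nodes. The dependency-free sketch below shows that same loop in plain Python; it is a simplified stand-in for LangGraph's StateGraph, not the project's actual graph:

```python
def agent_node(state):
    # Decide the next hop: call a tool until an observation exists,
    # then emit a final answer (the termination condition).
    if "observation" not in state:
        return {"next": "tool", "tool_input": state["question"]}
    return {"next": "end", "answer": f"final: {state['observation']}"}

def tool_node(state):
    # Placeholder tool node; records an observation into shared state.
    return {"observation": f"searched({state['tool_input']})"}

def run(question, max_steps=5):
    state = {"question": question, "steps": []}   # tracked intermediate steps
    for _ in range(max_steps):                    # guard against non-termination
        decision = agent_node(state)
        state["steps"].append(decision["next"])
        if decision["next"] == "end":
            return decision["answer"], state["steps"]
        state.update(tool_node({**state, **decision}))
    raise RuntimeError("workflow did not terminate")

answer, steps = run("capital of France?")
```

In the real system, LangGraph's checkpointer persists this shared state between steps and across sessions, which a plain loop like this does not do.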

Quick Start

Get up and running with the GAIA Pathfinder Agent API in minutes:

  1. Clone the repository
  2. Create a minimal .env file
  3. Build and run with Docker
  4. Test the API
  5. Open the web interface
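For step 2, a minimal .env might look like the fragment below. The variable names here are assumptions based on the features described above (provider selection, OpenRouter, and Tavily search); check config.py for the names the project actually expects:

```
# Hypothetical variable names -- see config.py for the real ones
LLM_PROVIDER=openrouter          # or a local Ollama model via LiteLLM
OPENROUTER_API_KEY=your-key-here
TAVILY_API_KEY=your-key-here
```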

The system provides a comprehensive framework for building and deploying AI agents with advanced reasoning capabilities and tool integration.