AI Integration

Alien Giraffe is designed to safely bridge the gap between AI systems and your data infrastructure. This guide explains how LLMs interact with your data and how to configure AI integration securely.

Alien Giraffe acts as a secure environment in which your software can query your data sources and interact with AI models. Instead of giving LLMs direct database access, Alien Giraffe provides a controlled interface that enforces security policies and data contracts.

Using Alien Giraffe as Your Safe Data Room

Alien Giraffe creates a secure, controlled environment where LLMs and AI agents can access your data without compromising security. Think of it as a “safe data room” that sits between your AI systems and your actual data infrastructure.

There are multiple ways AI systems can work with your data through Alien Giraffe:

  1. Direct LLM Integration - LLMs generate queries that Alien Giraffe validates and executes
  2. MCP (Model Context Protocol) Connection - LLMs connect through MCP to query data sources via Alien Giraffe
  3. AI Agent Workflows - Agents use Alien Giraffe as a tool to fetch and analyze data, then share results with LLMs (see the sketch after this list)
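
The third pattern is straightforward to picture in code. The sketch below is illustrative only: run_natural_language_query is a hypothetical method name standing in for whatever query entry point your Alien Giraffe client exposes, and the agent framework itself is left abstract.

# Sketch of pattern 3: an agent uses Alien Giraffe as a data-fetching tool.
# NOTE: run_natural_language_query is a hypothetical method name used for
# illustration; substitute the query entry point your client actually exposes.
import alien_giraffe

a10e = alien_giraffe.Client()

def data_tool(question: str) -> str:
    """Tool the agent registers with its LLM: Alien Giraffe validates and
    executes the request, so the LLM only ever sees filtered results."""
    result = a10e.run_natural_language_query(question)  # hypothetical API call
    return str(result)

# The agent framework registers data_tool and forwards its output (never a raw
# database connection) to the LLM for analysis.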

What AI Systems Can Access:

  • Schema Metadata - Column names, data types, and relationships that your A10e instance has access to (illustrated after this list)
  • Sample Data - Random samples across columns to understand data patterns and distributions
  • Query Results - Only data that passes through your security filters and contracts
  • Aggregated Insights - Statistical summaries and patterns within approved boundaries
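
To make the first two categories concrete, the dictionaries below show roughly what an AI system might receive. They are illustrative shapes only, not Alien Giraffe's exact response format; the column names are borrowed from the customer_analytics example later in this guide.

# Illustrative payload shapes only; not Alien Giraffe's exact response format.

# Schema metadata: structure and types, never content.
schema_metadata = {
    "table": "customer_analytics",
    "columns": [
        {"name": "customer_id", "type": "integer"},
        {"name": "region", "type": "string"},
        {"name": "purchase_amount", "type": "number"},
        {"name": "category", "type": "string"},
    ],
}

# Sample data: a small random slice that has already passed the security filters.
sample_rows = [
    {"customer_id": 1042, "region": "EMEA", "purchase_amount": 87.50, "category": "books"},
    {"customer_id": 2311, "region": "APAC", "purchase_amount": 12.99, "category": "toys"},
]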

What AI Systems Never See:

  • Raw Database Connections - No direct access to your underlying data sources
  • Blocked Columns - Fields explicitly marked as off-limits in your security configuration
  • Full Datasets - Complete data dumps are prevented through sampling and result limits
  • Infrastructure Details - Database credentials, connection strings, or internal architecture

Enhanced Security:

  • Zero Trust Architecture - AI systems can’t bypass security controls to access raw data
  • Granular Permissions - Control exactly which columns and rows AI can see
  • Audit Trail - Complete logging of what data AI systems accessed and when
  • Breach Containment - Even if an AI system is compromised, attackers can’t access your core data

Improved AI Performance:

  • Context-Aware Queries - AI understands your data structure without seeing sensitive content
  • Intelligent Sampling - AI gets representative data samples for better query generation
  • Schema Understanding - Models learn your data patterns for more accurate analysis
  • Iterative Refinement - AI can improve queries based on result feedback within safe boundaries

Operational Benefits:

  • Consistent Interface - Same API for all AI integrations regardless of underlying data sources
  • Cost Control - Prevent expensive full-table scans and runaway AI queries
  • Performance Optimization - Built-in caching and query optimization for AI workloads
  • Scalable Access - Multiple AI systems can safely access data simultaneously

Inner Workings: Source-Level Security Pipeline

When an AI system requests data through Alien Giraffe:

  1. Request Validation - Check if the AI system has permission to access requested schemas
  2. Source-Level Filtering - Apply security rules at the data source before any data is pulled into memory
  3. Query Generation - Generate SQL queries that exclude blocked columns and apply masking at source
  4. Contract Validation - Ensure queries comply with predefined data contracts and security policies
  5. Execution Control - Apply resource limits, timeouts, and result size restrictions
  6. Audit Logging - Record the interaction for compliance and monitoring
  7. Response Delivery - Return pre-filtered, secure data to the AI system

Key Security Principle: Nothing that violates security configurations is ever loaded into memory. All filtering, masking, and column blocking happens at the data source level, ensuring that sensitive data never enters the Alien Giraffe processing environment.
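
The sketch below walks through those seven steps as plain, self-contained Python. It is a simplified illustration of the flow, not Alien Giraffe's implementation; the rules, contracts, and database call are all stand-ins.

# Simplified, self-contained illustration of the request pipeline.
# Not Alien Giraffe's implementation; every value here is a stand-in.
import time

SECURITY_RULES = {"blocked_columns": {"email", "ssn"}}
CONTRACTS = {"customer_analytics": {"customer_id", "region", "purchase_amount", "category"}}
AUDIT_LOG = []

def handle_ai_request(client_name, allowed_schemas, schema, wanted_columns):
    # 1. Request validation: may this AI client read the schema at all?
    if schema not in allowed_schemas:
        raise PermissionError(f"{client_name} may not read {schema}")

    # 2-3. Source-level filtering and query generation: blocked columns are
    #      dropped before the SQL is built, so they never leave the source.
    safe_columns = [c for c in wanted_columns
                    if c not in SECURITY_RULES["blocked_columns"]]
    sql = f"SELECT {', '.join(safe_columns)} FROM {schema} LIMIT 100"

    # 4. Contract validation: only contract-approved columns may be queried.
    if not set(safe_columns) <= CONTRACTS[schema]:
        raise ValueError("query violates the data contract")

    # 5. Execution control: in the real system, row limits and timeouts apply here.
    rows = []  # placeholder for the actual database call

    # 6. Audit logging: record who asked for what, and when.
    AUDIT_LOG.append({"client": client_name, "sql": sql, "at": time.time()})

    # 7. Response delivery: only pre-filtered data reaches the AI system.
    return rows

handle_ai_request("analytics-agent", {"customer_analytics"},
                  "customer_analytics", ["customer_id", "region", "email"])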

Use hosted AI services for convenience and access to the latest model capabilities:

[ai.model]
provider = "openai"
model = "gpt-4"
api_key = "${OPENAI_API_KEY}"
# Optional: custom endpoint for Azure OpenAI, etc.
# endpoint = "https://your-instance.openai.azure.com/"

Run models on your infrastructure for maximum data privacy:

[ai.model]
provider = "local"
model = "llama2-13b"
endpoint = "http://localhost:11434" # Local Ollama instance
# Or point at any local API-compatible server instead:
# endpoint = "http://your-local-gpu-server:8080"

Configure your A10e instance so that it never has access to sensitive fields:

# PostgreSQL configuration with limited access
[[datasources.postgres]]
name = "analytics-safe"
host = "postgres.company.com"
database = "analytics"
username = "readonly_filtered_user" # User with no access to PII tables
# Explicitly blocked sensitive columns
[security.global_rules]
blocked_columns = [
  "ssn", "credit_card", "passport_number",
  "full_name", "email", "phone_number"
]
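
The database side of this configuration matters as much as the Alien Giraffe side: the readonly_filtered_user role should only ever be granted the columns it is allowed to read. The sketch below shows one way to set that up with PostgreSQL column-level grants; psycopg is used purely for illustration (any Postgres client would do), and the table, schema, and admin credentials are placeholders.

# Illustrative least-privilege setup for the readonly_filtered_user role.
# Requires psycopg (v3); table, schema, and admin credentials are placeholders.
import psycopg

with psycopg.connect("host=postgres.company.com dbname=analytics user=admin") as conn:
    # Start from zero privileges on the table...
    conn.execute("REVOKE ALL ON analytics.customers FROM readonly_filtered_user")
    # ...then grant SELECT only on the non-sensitive columns.
    conn.execute(
        "GRANT SELECT (customer_id, region, purchase_amount, category) "
        "ON analytics.customers TO readonly_filtered_user"
    )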

For maximum security, avoid LLM exposure entirely by using deterministic contracts:

# Deterministic schema - no LLM involved
[[schemas.customer_analytics]]
schema = """
{
"type": "object",
"properties": {
"customer_id": {"type": "integer"},
"region": {"type": "string"},
"purchase_amount": {"type": "number"},
"category": {"type": "string"}
}
}
"""
# Deterministic queries - predefined, no AI generation
[[queries.top_customers]]
sql = "SELECT customer_id, SUM(purchase_amount) FROM customer_analytics GROUP BY customer_id ORDER BY SUM(purchase_amount) DESC LIMIT 10"

Generate secure contracts using AI, then deploy them deterministically:

import alien_giraffe
a10e = alien_giraffe.Client()
# Use LLM to generate the contract (one-time, in development)
schema = a10e.nl_schema(
    "Customer analytics data with anonymized IDs, geographic regions, "
    "purchase amounts, and product categories. No PII or payment details."
)
# Save the generated schema to your config file for deterministic use
print("Generated schema:", schema)
# Copy this to your a10e.toml file for production deployment

Beyond source-level blocking, you can control exactly which columns and result values AI models are exposed to:

[ai.exposure_control]
# Columns that AI can see for query generation
allowed_columns = ["customer_id", "region", "purchase_amount", "category"]
# Columns completely hidden from AI models
hidden_columns = ["email", "phone", "address", "payment_method"]
# Columns shown to AI but with sample data only
sample_only_columns = ["customer_notes", "preferences"]
[ai.result_filtering]
# Maximum rows AI can see in result samples
max_sample_rows = 100
# Redact values from AI-visible results that could otherwise bypass de-identification
redaction_patterns = [
  "\\b[A-Z]{2}\\d{8}\\b",                     # Internal employee ID patterns
  "\\bTICKET-\\d{6}\\b",                      # Support ticket references
  "\\b(?:Account|Acct)\\s*#?\\s*\\d{6,12}\\b" # Account number references in free text
]
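
These result-filtering settings translate into straightforward post-processing. The sketch below shows how the redaction patterns and the row cap could be applied to a result sample before an AI model sees it; it is illustrative only, not Alien Giraffe's internal code.

# Apply the [ai.result_filtering] rules to a result sample before AI sees it.
# Illustrative only; not Alien Giraffe's internal implementation.
import re

MAX_SAMPLE_ROWS = 100
REDACTION_PATTERNS = [
    r"\b[A-Z]{2}\d{8}\b",                     # internal employee ID patterns
    r"\bTICKET-\d{6}\b",                      # support ticket references
    r"\b(?:Account|Acct)\s*#?\s*\d{6,12}\b",  # account numbers in free text
]

def filter_sample(rows):
    redacted = []
    for row in rows[:MAX_SAMPLE_ROWS]:        # enforce max_sample_rows
        clean = {}
        for key, value in row.items():
            if isinstance(value, str):
                for pattern in REDACTION_PATTERNS:
                    value = re.sub(pattern, "[REDACTED]", value)
            clean[key] = value
        redacted.append(clean)
    return redacted

print(filter_sample([{"customer_notes": "Escalated via TICKET-123456, Acct # 12345678"}]))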

Security Best Practices:

  1. Principle of Least Privilege - Configure data source access to exclude sensitive fields entirely
  2. Use Deterministic Contracts - Pre-define schemas and queries when possible to avoid AI exposure
  3. Layer Security Controls - Combine database-level, application-level, and AI-level restrictions
  4. Test with Non-Production Data - Validate AI behavior with synthetic or anonymized datasets first

Ready to implement AI integration? Start with our Getting Started guide to set up your data sources, then configure AI models based on your security requirements.