AI Integration

Alien Giraffe is designed to safely bridge the gap between AI systems and your data infrastructure. This guide explains how LLMs interact with your data and how to configure AI integration securely.

Alien Giraffe acts as a secure environment in which your software can query your data sources and interact with AI models. Instead of giving LLMs direct database access, Alien Giraffe provides a controlled interface that enforces security policies and data contracts.

Using Alien Giraffe as Your Safe Data Room

Alien Giraffe creates a secure, controlled environment where LLMs and AI agents can access your data without compromising security. Think of it as a “safe data room” that sits between your AI systems and your actual data infrastructure.

There are multiple ways AI systems can work with your data through Alien Giraffe:

  1. Direct LLM Integration - LLMs generate queries that Alien Giraffe validates and executes
  2. MCP (Model Context Protocol) Connection - LLMs connect through MCP to query data sources via Alien Giraffe
  3. AI Agent Workflows - Agents use Alien Giraffe as a tool to fetch and analyze data, then share results with LLMs (see the sketch after this list)
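
The third pattern is straightforward to picture in code. The sketch below is illustrative only: run_natural_language_query is a hypothetical method name standing in for whatever query entry point your Alien Giraffe client exposes, and the agent framework itself is left abstract.

# Sketch of pattern 3: an agent uses Alien Giraffe as a data-fetching tool.
# NOTE: run_natural_language_query is a hypothetical method name used for
# illustration; substitute the query entry point your client actually exposes.
import alien_giraffe

a10e = alien_giraffe.Client()

def data_tool(question: str) -> str:
    """Tool the agent registers with its LLM: Alien Giraffe validates and
    executes the request, so the LLM only ever sees filtered results."""
    result = a10e.run_natural_language_query(question)  # hypothetical API call
    return str(result)

# The agent framework registers data_tool and forwards its output (never a raw
# database connection) to the LLM for analysis.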

What AI Systems Can Access:

  • Schema Metadata - Column names, data types, and relationships that your A10e instance has access to (illustrated after this list)
  • Sample Data - Random samples across columns to understand data patterns and distributions
  • Query Results - Only data that passes through your security filters and contracts
  • Aggregated Insights - Statistical summaries and patterns within approved boundaries
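
To make the first two categories concrete, the dictionaries below show roughly what an AI system might receive. They are illustrative shapes only, not Alien Giraffe's exact response format; the column names are borrowed from the customer_analytics example later in this guide.

# Illustrative payload shapes only; not Alien Giraffe's exact response format.

# Schema metadata: structure and types, never content.
schema_metadata = {
    "table": "customer_analytics",
    "columns": [
        {"name": "customer_id", "type": "integer"},
        {"name": "region", "type": "string"},
        {"name": "purchase_amount", "type": "number"},
        {"name": "category", "type": "string"},
    ],
}

# Sample data: a small random slice that has already passed the security filters.
sample_rows = [
    {"customer_id": 1042, "region": "EMEA", "purchase_amount": 87.50, "category": "books"},
    {"customer_id": 2311, "region": "APAC", "purchase_amount": 12.99, "category": "toys"},
]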

What AI Systems Never See:

  • Raw Database Connections - No direct access to your underlying data sources
  • Blocked Columns - Fields explicitly marked as off-limits in your security configuration
  • Full Datasets - Complete data dumps are prevented through sampling and result limits
  • Infrastructure Details - Database credentials, connection strings, or internal architecture

Enhanced Security:

  • Zero Trust Architecture - AI systems can’t bypass security controls to access raw data
  • Granular Permissions - Control exactly which columns and rows AI can see
  • Audit Trail - Complete logging of what data AI systems accessed and when
  • Breach Containment - Even if an AI system is compromised, attackers can’t access your core data

Improved AI Performance:

  • Context-Aware Queries - AI understands your data structure without seeing sensitive content
  • Intelligent Sampling - AI gets representative data samples for better query generation
  • Schema Understanding - Models learn your data patterns for more accurate analysis
  • Iterative Refinement - AI can improve queries based on result feedback within safe boundaries

Operational Benefits:

  • Consistent Interface - Same API for all AI integrations regardless of underlying data sources
  • Cost Control - Prevent expensive full-table scans and runaway AI queries
  • Performance Optimization - Built-in caching and query optimization for AI workloads
  • Scalable Access - Multiple AI systems can safely access data simultaneously

Inner Workings: Source-Level Security Pipeline

When an AI system requests data through Alien Giraffe:

  1. Request Validation - Check if the AI system has permission to access requested schemas
  2. Source-Level Filtering - Apply security rules at the data source before any data is pulled into memory
  3. Query Generation - Generate SQL queries that exclude blocked columns and apply masking at source
  4. Contract Validation - Ensure queries comply with predefined data contracts and security policies
  5. Execution Control - Apply resource limits, timeouts, and result size restrictions
  6. Audit Logging - Record the interaction for compliance and monitoring
  7. Response Delivery - Return pre-filtered, secure data to the AI system

Key Security Principle: Nothing that violates security configurations is ever loaded into memory. All filtering, masking, and column blocking happens at the data source level, ensuring that sensitive data never enters the Alien Giraffe processing environment.
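
The sketch below walks through those seven steps as plain, self-contained Python. It is a simplified illustration of the flow, not Alien Giraffe's implementation; the rules, contracts, and database call are all stand-ins.

# Simplified, self-contained illustration of the request pipeline.
# Not Alien Giraffe's implementation; every value here is a stand-in.
import time

SECURITY_RULES = {"blocked_columns": {"email", "ssn"}}
CONTRACTS = {"customer_analytics": {"customer_id", "region", "purchase_amount", "category"}}
AUDIT_LOG = []

def handle_ai_request(client_name, allowed_schemas, schema, wanted_columns):
    # 1. Request validation: may this AI client read the schema at all?
    if schema not in allowed_schemas:
        raise PermissionError(f"{client_name} may not read {schema}")

    # 2-3. Source-level filtering and query generation: blocked columns are
    #      dropped before the SQL is built, so they never leave the source.
    safe_columns = [c for c in wanted_columns
                    if c not in SECURITY_RULES["blocked_columns"]]
    sql = f"SELECT {', '.join(safe_columns)} FROM {schema} LIMIT 100"

    # 4. Contract validation: only contract-approved columns may be queried.
    if not set(safe_columns) <= CONTRACTS[schema]:
        raise ValueError("query violates the data contract")

    # 5. Execution control: in the real system, row limits and timeouts apply here.
    rows = []  # placeholder for the actual database call

    # 6. Audit logging: record who asked for what, and when.
    AUDIT_LOG.append({"client": client_name, "sql": sql, "at": time.time()})

    # 7. Response delivery: only pre-filtered data reaches the AI system.
    return rows

handle_ai_request("analytics-agent", {"customer_analytics"},
                  "customer_analytics", ["customer_id", "region", "email"])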

Use hosted AI services for convenience and access to the latest model capabilities:

[ai.model]
provider = "openai"
model = "gpt-4"
api_key = "${OPENAI_API_KEY}"
# Optional: custom endpoint for Azure OpenAI, etc.
# endpoint = "https://your-instance.openai.azure.com/"

Run models on your infrastructure for maximum data privacy:

[ai.model]
provider = "local"
model = "llama2-13b"
endpoint = "http://localhost:11434" # Local Ollama instance
# Or point at any local API-compatible server instead:
# endpoint = "http://your-local-gpu-server:8080"

Configure your A10e instance so that it never has access to sensitive fields:

# PostgreSQL configuration with limited access
[[datasources.postgres]]
name = "analytics-safe"
host = "postgres.company.com"
database = "analytics"
username = "readonly_filtered_user" # User with no access to PII tables
# Explicitly blocked sensitive columns
[security.global_rules]
blocked_columns = [
  "ssn", "credit_card", "passport_number",
  "full_name", "email", "phone_number"
]
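
The database side of this configuration matters as much as the Alien Giraffe side: the readonly_filtered_user role should only ever be granted the columns it is allowed to read. The sketch below shows one way to set that up with PostgreSQL column-level grants; psycopg is used purely for illustration (any Postgres client would do), and the table, schema, and admin credentials are placeholders.

# Illustrative least-privilege setup for the readonly_filtered_user role.
# Requires psycopg (v3); table, schema, and admin credentials are placeholders.
import psycopg

with psycopg.connect("host=postgres.company.com dbname=analytics user=admin") as conn:
    # Start from zero privileges on the table...
    conn.execute("REVOKE ALL ON analytics.customers FROM readonly_filtered_user")
    # ...then grant SELECT only on the non-sensitive columns.
    conn.execute(
        "GRANT SELECT (customer_id, region, purchase_amount, category) "
        "ON analytics.customers TO readonly_filtered_user"
    )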

For maximum security, avoid LLM exposure entirely by using deterministic contracts:

# Deterministic schema - no LLM involved
[[schemas.customer_analytics]]
schema = """
{
"type": "object",
"properties": {
"customer_id": {"type": "integer"},
"region": {"type": "string"},
"purchase_amount": {"type": "number"},
"category": {"type": "string"}
}
}
"""
# Deterministic queries - predefined, no AI generation
[[queries.top_customers]]
sql = "SELECT customer_id, SUM(purchase_amount) FROM customer_analytics GROUP BY customer_id ORDER BY SUM(purchase_amount) DESC LIMIT 10"

Generate secure contracts using AI, then deploy them deterministically:

import alien_giraffe
a10e = alien_giraffe.Client()
# Use LLM to generate the contract (one-time, in development)
schema = a10e.nl_schema(
    "Customer analytics data with anonymized IDs, geographic regions, "
    "purchase amounts, and product categories. No PII or payment details."
)
# Save the generated schema to your config file for deterministic use
print("Generated schema:", schema)
# Copy this to your a10e.toml file for production deployment

Beyond source-level blocking, you can control exactly which columns and result values AI models are exposed to:

[ai.exposure_control]
# Columns that AI can see for query generation
allowed_columns = ["customer_id", "region", "purchase_amount", "category"]
# Columns completely hidden from AI models
hidden_columns = ["email", "phone", "address", "payment_method"]
# Columns shown to AI but with sample data only
sample_only_columns = ["customer_notes", "preferences"]
[ai.result_filtering]
# Maximum rows AI can see in result samples
max_sample_rows = 100
# Redact values from AI-visible results that could otherwise bypass de-identification
redaction_patterns = [
  "\\b[A-Z]{2}\\d{8}\\b",                     # Internal employee ID patterns
  "\\bTICKET-\\d{6}\\b",                      # Support ticket references
  "\\b(?:Account|Acct)\\s*#?\\s*\\d{6,12}\\b" # Account number references in free text
]
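
These result-filtering settings translate into straightforward post-processing. The sketch below shows how the redaction patterns and the row cap could be applied to a result sample before an AI model sees it; it is illustrative only, not Alien Giraffe's internal code.

# Apply the [ai.result_filtering] rules to a result sample before AI sees it.
# Illustrative only; not Alien Giraffe's internal implementation.
import re

MAX_SAMPLE_ROWS = 100
REDACTION_PATTERNS = [
    r"\b[A-Z]{2}\d{8}\b",                     # internal employee ID patterns
    r"\bTICKET-\d{6}\b",                      # support ticket references
    r"\b(?:Account|Acct)\s*#?\s*\d{6,12}\b",  # account numbers in free text
]

def filter_sample(rows):
    redacted = []
    for row in rows[:MAX_SAMPLE_ROWS]:        # enforce max_sample_rows
        clean = {}
        for key, value in row.items():
            if isinstance(value, str):
                for pattern in REDACTION_PATTERNS:
                    value = re.sub(pattern, "[REDACTED]", value)
            clean[key] = value
        redacted.append(clean)
    return redacted

print(filter_sample([{"customer_notes": "Escalated via TICKET-123456, Acct # 12345678"}]))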

Security Best Practices:

  1. Principle of Least Privilege - Configure data source access to exclude sensitive fields entirely
  2. Use Deterministic Contracts - Pre-define schemas and queries when possible to avoid AI exposure
  3. Layer Security Controls - Combine database-level, application-level, and AI-level restrictions
  4. Test with Non-Production Data - Validate AI behavior with synthetic or anonymized datasets first

Ready to implement AI integration? Start with our Getting Started guide to set up your data sources, then configure AI models based on your security requirements.