Getting Started

This guide will help you get started with Alien Giraffe in just a few minutes. You’ll learn how to install the client, connect to a data source, and run your first query.

Before you begin, make sure you have:

  • Python 3.8 or higher
  • Access to a data source (AWS S3, PostgreSQL, etc.)

Install the client with pip:

pip install alien-giraffe

Here’s a complete example using data stored in AWS S3:

Create an a10e.toml file in your project directory:

[[datasources.s3]]
name = "my-data"
region = "us-west-2"
bucket = "my-company-data"
access_key = "${AWS_ACCESS_KEY_ID}" # Uses environment variable
secret_key = "${AWS_SECRET_ACCESS_KEY}"
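
The `${...}` placeholders mean the credentials are read from environment variables, so an unset variable is an easy first mistake. The following is a minimal sketch, using only the standard library, of a check you could run before starting the client. The helper name `missing_env_vars` is hypothetical, and the code only scans text for `${NAME}` patterns; it does not reproduce how Alien Giraffe itself parses or expands the file.

```python
import os
import re

# Matches ${NAME} placeholders such as ${AWS_ACCESS_KEY_ID}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, env=None):
    """Return placeholder names in config_text that are unset or empty in env."""
    env = os.environ if env is None else env
    names = PLACEHOLDER.findall(config_text)
    # dict.fromkeys de-duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(names) if not env.get(name)]


# Hypothetical sample: the two credential lines from the config above,
# checked against a fake environment in which only one variable is set.
sample = '''
access_key = "${AWS_ACCESS_KEY_ID}"
secret_key = "${AWS_SECRET_ACCESS_KEY}"
'''
print(missing_env_vars(sample, env={"AWS_ACCESS_KEY_ID": "example"}))
# prints ['AWS_SECRET_ACCESS_KEY']
```

To check a real file, pass the contents of `a10e.toml` and omit `env` so that the current process environment is used.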

Then define a schema, load the data, and query it from Python:

import alien_giraffe
import json

# Initialize client
a10e = alien_giraffe.Client()

# Define schema for patient data (automatically excludes PII)
schema = {
    "type": "object",
    "properties": {
        "patient_id": {
            "type": "integer",
            "description": "Unique anonymous identifier for the patient"
        },
        "diagnosis": {
            "type": "string",
            "description": "Patient's diagnosis code"
        },
        "medicine": {
            "type": "string",
            "description": "Prescribed medicine for the patient"
        },
        "income": {
            "type": "integer",
            "description": "Annual income (converted from string to int)"
        },
        "income_bracket": {
            "type": "string",
            "description": "Income range: low (<$70k), medium ($70k-$120k), high (>$120k)",
            "enum": ["low", "medium", "high"]
        }
    },
    "required": ["patient_id", "diagnosis"]
}

# Add the schema
a10e.add_schema("patient_data", json.dumps(schema))

# Load data matching your schema
a10e.load("patient_data")

# Get a DataFrame handle
df = a10e.df("patient_data")

# Run queries using pandas-like syntax
high_income_patients = df[df["income_bracket"] == "high"]
print(high_income_patients.head())

# Or use SQL
results = a10e.sql("""
    SELECT diagnosis, COUNT(*) AS patient_count, income_bracket
    FROM patient_data
    WHERE income > 50000
    GROUP BY diagnosis, income_bracket
    ORDER BY patient_count DESC
    LIMIT 10
""")

# Export to pandas DataFrame
pandas_df = high_income_patients.to_pandas()

# Save to CSV
high_income_patients.to_csv("high_income_patients.csv")
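
To see what the SQL query above returns without connecting to a data source, you can try it against a few made-up rows. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine. This is an assumption made for illustration: the guide does not say which SQL dialect Alien Giraffe uses, so behavior may differ in places. The sample rows and the `run_sample_query` helper are hypothetical.

```python
import sqlite3

# Made-up rows for illustration only:
# (patient_id, diagnosis, medicine, income, income_bracket)
rows = [
    (1, "E11", "metformin", 135000, "high"),
    (2, "E11", "metformin", 128000, "high"),
    (3, "I10", "lisinopril", 82000, "medium"),
    (4, "I10", "lisinopril", 45000, "low"),
    (5, "J45", "albuterol", 64000, "low"),
]

# The query from the example above, unchanged apart from whitespace.
QUERY = """
SELECT diagnosis, COUNT(*) AS patient_count, income_bracket
FROM patient_data
WHERE income > 50000
GROUP BY diagnosis, income_bracket
ORDER BY patient_count DESC
LIMIT 10
"""


def run_sample_query(sample_rows):
    """Load sample_rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE patient_data ("
            "patient_id INTEGER, diagnosis TEXT, medicine TEXT, "
            "income INTEGER, income_bracket TEXT)"
        )
        conn.executemany(
            "INSERT INTO patient_data VALUES (?, ?, ?, ?, ?)", sample_rows
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for diagnosis, patient_count, income_bracket in run_sample_query(rows):
    print(diagnosis, patient_count, income_bracket)
```

Patient 4 is dropped by the `income > 50000` filter. The remaining rows form three groups, and the group with two patients sorts first. The order of the two single-patient groups is not defined by the query.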

Don’t want to write JSON schemas? Let Alien Giraffe generate them:

# Describe your data in plain English
schema = a10e.nl_schema(
"Patient data with diagnosis and income information, excluding PII like names, emails, addresses"
)
# Use the generated schema
a10e.add_schema("patient_data", schema)
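
Because a generated schema comes from a free-text description, it can be worth inspecting before you register it. The sketch below assumes that `nl_schema` returns a JSON Schema as a JSON string, which is the form passed to `add_schema` in the earlier example; the guide does not confirm this. The helper `flag_pii_properties` and its keyword list are hypothetical and far from exhaustive.

```python
import json

# Illustrative, non-exhaustive name fragments that often indicate PII.
PII_HINTS = ("name", "email", "address", "phone", "ssn")


def flag_pii_properties(schema_json):
    """Return property names in a JSON Schema string that look like PII."""
    schema = json.loads(schema_json)
    properties = schema.get("properties", {})
    return sorted(
        prop for prop in properties
        if any(hint in prop.lower() for hint in PII_HINTS)
    )


# Stand-in for a generated schema; a real one would come from a10e.nl_schema(...).
generated = json.dumps({
    "type": "object",
    "properties": {
        "patient_id": {"type": "integer"},
        "diagnosis": {"type": "string"},
        "income": {"type": "integer"},
        "email_address": {"type": "string"},
    },
})

print(flag_pii_properties(generated))
# prints ['email_address']
```

Substring matching is deliberately crude and will produce false positives (for example, a `medicine_name` column), so treat the result as a prompt for manual review, not as a guarantee.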

Next steps:

  • Multi-Source Queries - Join data across S3, databases, and data warehouses
  • Column-Level Security - Mask or block sensitive data automatically
  • Performance Optimization - Handle datasets of any size efficiently

Need help?

  • 📖 Check the Data Sources section for detailed configuration guides
  • 📧 Contact Support for assistance

Ready to dive deeper? Explore our Data Sources documentation for comprehensive guides on connecting to your specific data infrastructure.