Getting Started
This guide will help you get started with Alien Giraffe in just a few minutes. You’ll learn how to install the client, connect to a data source, and run your first query.
Prerequisites
- Python 3.8 or higher
- Access to a data source (AWS S3, PostgreSQL, etc.)
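If you're not sure which interpreter your project uses, you can check it against the Python 3.8 minimum before installing. This is a generic stdlib sketch, not something Alien Giraffe ships. The helper name meets_python_requirement is our own.

```python
import sys
from typing import Optional, Sequence

# Minimum version from the Prerequisites list above.
MINIMUM_PYTHON = (3, 8)


def meets_python_requirement(
    version: Optional[Sequence[int]] = None,
    minimum: Sequence[int] = MINIMUM_PYTHON,
) -> bool:
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only as many components as `minimum` has, e.g. (major, minor).
    return tuple(version[: len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_python_requirement():
        sys.exit(f"Python 3.8 or higher is required; found {sys.version.split()[0]}")
    print("Python version OK:", sys.version.split()[0])
```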
Install Alien Giraffe

    pip install alien-giraffe

Quick Example
Here’s a complete example using data stored in AWS S3:
1. Create Configuration File
Create an a10e.toml file in your project directory:

    [[datasources.s3]]
    name = "my-data"
    region = "us-west-2"
    bucket = "my-company-data"
    access_key = "${AWS_ACCESS_KEY_ID}"  # Uses environment variable
    secret_key = "${AWS_SECRET_ACCESS_KEY}"

2. Define Your Data Schema
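Before defining the schema, you can confirm that the environment variables referenced in a10e.toml are actually set. The sketch below is a hypothetical, stdlib-only pre-flight check. It is not part of the Alien Giraffe API. It assumes the config loader expands ${VAR} placeholders from the process environment, as the config comment suggests. The helper names resolve_placeholders and missing_variables are our own.

```python
import os
from string import Template
from typing import Iterable, List, Mapping


def resolve_placeholders(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Expand ${VAR} placeholders in `value` using `env`.

    Hypothetical helper: mimics how a config loader *might* resolve the
    values in a10e.toml. Raises KeyError (with the variable name) if a
    placeholder is not defined, so missing credentials fail fast.
    """
    return Template(value).substitute(env)


def missing_variables(values: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the first undefined placeholder name found in each of `values`."""
    missing = []
    for value in values:
        try:
            resolve_placeholders(value, env)
        except KeyError as exc:
            missing.append(exc.args[0])
    return missing


if __name__ == "__main__":
    # The two placeholders used in the a10e.toml example above.
    placeholders = ["${AWS_ACCESS_KEY_ID}", "${AWS_SECRET_ACCESS_KEY}"]
    unset = missing_variables(placeholders)
    if unset:
        print("Set these environment variables first:", ", ".join(unset))
    else:
        print("All credentials referenced in a10e.toml are set.")
```

Using string.Template keeps the placeholder syntax identical to the config file. Its substitute method raises on a missing variable instead of silently leaving the placeholder in place.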

    import alien_giraffe
    import json

    # Initialize client
    a10e = alien_giraffe.Client()

    # Define schema for patient data (automatically excludes PII)
    schema = {
        "type": "object",
        "properties": {
            "patient_id": {
                "type": "integer",
                "description": "Unique anonymous identifier for the patient"
            },
            "diagnosis": {
                "type": "string",
                "description": "Patient's diagnosis code"
            },
            "medicine": {
                "type": "string",
                "description": "Prescribed medicine for the patient"
            },
            "income": {
                "type": "integer",
                "description": "Annual income (converted from string to int)"
            },
            "income_bracket": {
                "type": "string",
                "description": "Income range: low (<$70k), medium ($70k-$120k), high (>$120k)",
                "enum": ["low", "medium", "high"]
            }
        },
        "required": ["patient_id", "diagnosis"]
    }

    # Add the schema
    a10e.add_schema("patient_data", json.dumps(schema))

3. Query Your Data
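Step 3 loads the data that matches your schema and then filters on income_bracket. The small stdlib-only sketch below makes both ideas concrete:

- income_bracket applies the thresholds from the schema's description. Our reading is that exactly $70k and exactly $120k both fall in medium.
- validate_record is a toy check of required fields, types, and enum values.

Both are illustrative helpers of our own, not Alien Giraffe functions. The library's actual matching rules are not described in this guide and may differ.

```python
from typing import Any, Dict, List

# Thresholds as written in the income_bracket description:
# low (<$70k), medium ($70k-$120k), high (>$120k).
# Assumption: exactly $70k and exactly $120k both count as "medium".
LOW_CEILING = 70_000
HIGH_FLOOR = 120_000

# JSON Schema type names used in this guide, mapped to Python types.
JSON_TYPES = {"integer": int, "string": str}


def income_bracket(income: int) -> str:
    """Classify an annual income using the thresholds above."""
    if income < LOW_CEILING:
        return "low"
    if income <= HIGH_FLOOR:
        return "medium"
    return "high"


def validate_record(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record matches.

    Toy check of "required", the "integer"/"string" types, and "enum" only;
    it is not a full JSON Schema validator.
    """
    problems = []
    for field in schema.get("required", []):
        if field not in record:
            problems.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field not in record:
            continue  # optional fields may be absent
        value = record[field]
        expected = JSON_TYPES.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it for "integer".
        if expected is not None and (
            not isinstance(value, expected) or (expected is int and isinstance(value, bool))
        ):
            problems.append(
                f"{field}: expected {rules['type']}, got {type(value).__name__}"
            )
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{field}: {value!r} not in {rules['enum']}")
    return problems
```

Here's how the sketch behaves with the step 2 schema dict:

- validate_record({"patient_id": 7, "diagnosis": "E11"}, schema) returns an empty list.
- A record without patient_id reports "missing required field: patient_id".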

    # Load data matching your schema
    a10e.load("patient_data")

    # Get a DataFrame handle
    df = a10e.df("patient_data")

    # Run queries using pandas-like syntax
    high_income_patients = df[df["income_bracket"] == "high"]
    print(high_income_patients.head())

    # Or use SQL
    results = a10e.sql("""
        SELECT diagnosis,
               COUNT(*) AS patient_count,
               income_bracket
        FROM patient_data
        WHERE income > 50000
        GROUP BY diagnosis, income_bracket
        ORDER BY patient_count DESC
        LIMIT 10
    """)

4. Export Results
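Before exporting, you may want to check what the SQL query from step 3 actually computes. Alien Giraffe runs that query for you. The sketch below instead runs the same statement against a throwaway in-memory SQLite table, purely to illustrate the filtering, grouping, and ordering.

- The rows are made up and are not real patient data.
- We assume, without having confirmed it, that Alien Giraffe's SQL dialect behaves like SQLite for this query.

```python
import sqlite3

# Made-up rows for illustration only:
# (patient_id, diagnosis, medicine, income, income_bracket)
ROWS = [
    (1, "E11", "metformin", 130000, "high"),
    (2, "E11", "metformin", 90000, "medium"),
    (3, "I10", "lisinopril", 150000, "high"),
    (4, "E11", "insulin", 140000, "high"),
    (5, "I10", "amlodipine", 40000, "low"),
]

# Same statement as the a10e.sql() call in step 3.
QUERY = """
    SELECT diagnosis, COUNT(*) AS patient_count, income_bracket
    FROM patient_data
    WHERE income > 50000
    GROUP BY diagnosis, income_bracket
    ORDER BY patient_count DESC
    LIMIT 10
"""


def run_demo():
    """Load ROWS into an in-memory table and return the query's result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE patient_data ("
            "patient_id INTEGER, diagnosis TEXT, medicine TEXT, "
            "income INTEGER, income_bracket TEXT)"
        )
        conn.executemany("INSERT INTO patient_data VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for diagnosis, count, bracket in run_demo():
        print(diagnosis, count, bracket)
```

The WHERE clause drops the 40,000 row before grouping. Each remaining (diagnosis, income_bracket) pair becomes one output row. Rows with equal counts have no guaranteed order, because ORDER BY only sorts on patient_count.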

    # Export to pandas DataFrame
    pandas_df = high_income_patients.to_pandas()

    # Save to CSV
    high_income_patients.to_csv("high_income_patients.csv")

Natural Language Schema Generation
Don’t want to write JSON schemas? Let Alien Giraffe generate them:

    # Describe your data in plain English
    schema = a10e.nl_schema(
        "Patient data with diagnosis and income information, excluding PII like names, emails, addresses"
    )

    # Use the generated schema
    a10e.add_schema("patient_data", schema)

What’s Next?
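If you use nl_schema, it's worth reviewing the generated schema before relying on it, since the PII exclusion comes from your plain-English description. The sketch below flags top-level property names that look like PII.

- The name pii_like_fields and the pattern list are hypothetical and purely heuristic.
- The guide passes the generated schema straight to add_schema, while the hand-written one goes through json.dumps. From that, we assume nl_schema returns a JSON string. The helper also accepts a dict in case that assumption is wrong.

```python
import json
import re
from typing import Any, Dict, List, Union

# Hypothetical heuristic: property names containing these fragments are
# treated as likely PII. Adjust the list for your own data.
PII_PATTERN = re.compile(r"name|email|address|phone|ssn|birth", re.IGNORECASE)


def pii_like_fields(schema: Union[str, Dict[str, Any]]) -> List[str]:
    """Return top-level property names that look like PII, sorted.

    Accepts a JSON string (assumed nl_schema output) or an already-parsed
    dict. Nested objects are not inspected.
    """
    parsed = json.loads(schema) if isinstance(schema, str) else schema
    properties = parsed.get("properties", {})
    return sorted(name for name in properties if PII_PATTERN.search(name))


if __name__ == "__main__":
    # Toy schema standing in for a generated one.
    generated = json.dumps({
        "type": "object",
        "properties": {
            "patient_id": {"type": "integer"},
            "diagnosis": {"type": "string"},
            "email": {"type": "string"},
        },
    })
    flagged = pii_like_fields(generated)
    if flagged:
        print("Review before add_schema; PII-like fields:", flagged)
```

A name-based check can miss PII stored under innocuous names and can flag harmless fields such as "filename". Treat it as a prompt for manual review, not as a guarantee.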
Learn More About Data Sources
- PostgreSQL Configuration - Set up secure database connections
- AWS S3 Advanced Guide - Optimize S3 performance and costs
- More Data Sources - Connect to Databricks, Snowflake, and more
Deploy in Production
- Kubernetes Deployment - Deploy with Helm charts
- Security Best Practices - Configure access controls and data masking
Advanced Features
- Multi-Source Queries - Join data across S3, databases, and data warehouses
- Column-Level Security - Mask or block sensitive data automatically
- Performance Optimization - Handle datasets of any size efficiently
Getting Help
- 📖 Check the Data Sources section for detailed configuration guides
- 📧 Contact Support for assistance
Ready to dive deeper? Explore our Data Sources documentation for comprehensive guides on connecting to your specific data infrastructure.