Resources
Resources represent the data that can be accessed in the Alien Giraffe access control model. This component manages the catalog of databases, object storage, data warehouses, and datasets. Policies reference resources in their resources: field to identify the data that subjects may access.
Relationship to Policies
Resources are one of the five core components that policies coordinate. When you define a policy, the resources: field specifies what data is being protected. This component provides the infrastructure for cataloging and classifying those data systems: registering databases, discovering datasets, and tracking data sensitivity.
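The full policy schema belongs to the Policies component. As a rough sketch, a policy referencing a resource might look like the following; note that only the resources: field is described on this page, while kind: Policy and the remaining field names are illustrative assumptions:

```yaml
# Hypothetical policy sketch: only the resources: field is documented
# on this page; the kind and other fields are illustrative assumptions.
apiVersion: v1
kind: Policy
metadata:
  name: analysts-read-data-lake
spec:
  subjects: [data-analysts]            # who may access (see Subjects)
  resources: [production-data-lake]    # which data is protected (this page)
  channels: [sql-readonly]             # how it is accessed (see Channels)
```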
Overview
Instead of managing access separately for each database, object store, or data warehouse, Alien Giraffe provides a centralized resource registry. Each resource is configured once with connection details, credentials, and metadata, then referenced in policies and access requests.
Why Catalog Resources?
- Centralized Management - One place to manage all data systems
- Data Discovery - Understand what data exists and where
- Access Control - Unified policies across heterogeneous systems
- Data Classification - Track data criticality and sensitivity
- Credential Rotation - Automate credential management
Supported Resource Types
Databases
Relational Databases:
- PostgreSQL - Version 10+
- MySQL - Version 5.7+
- MariaDB - Version 10.3+
- Microsoft SQL Server - 2017+
- Oracle - 12c+
NoSQL Databases:
- MongoDB - Version 4.0+
- Redis - Version 5.0+
- Cassandra - Version 3.0+
- DynamoDB - AWS managed
- Elasticsearch - Version 7.0+
Graph Databases:
- Neo4j - Version 4.0+ (Coming Soon)
- ArangoDB - Version 3.7+ (Coming Soon)
Object Storage
Cloud Object Storage:
- Amazon S3 - Including S3-compatible (MinIO, Ceph)
- Google Cloud Storage - Standard and regional buckets
- Azure Blob Storage - All tiers
- HDFS - Hadoop Distributed File System
File System Storage:
- NFS - Network File System shares
- SFTP - SSH File Transfer Protocol
- SSH - Direct SSH access to filesystem paths
Data Warehouses
- Snowflake - All editions
- Google BigQuery - Standard and enterprise
- Amazon Redshift - Provisioned and Serverless
- Databricks - SQL warehouses
- Azure Synapse - Dedicated and serverless
Analytics & BI
- Apache Druid - Real-time analytics (Coming Soon)
- ClickHouse - OLAP database (Coming Soon)
- Presto/Trino - Distributed SQL queries (Coming Soon)
- Apache Spark - Via JDBC/Thrift (Coming Soon)
Key Concepts
Resource Metadata
Every resource includes descriptive metadata, combined into a single sketch after the field groups below:
Identification:
- name - Unique identifier for the resource
- namespace - Organizational grouping (production, staging, finance)
- description - Human-readable description
- owner - Team or individual responsible
Classification:
- criticality - Business impact (critical, high, medium, low, auto)
- dataTypes - Categories of data (pii, financial, logs, analytics)
- retention - Data retention period (30d, 1y, 7y, indefinite)
When classification is set to auto, Alien Giraffe automatically classifies the resource by scanning data patterns, column names, and content to identify sensitive data types and determine appropriate criticality levels.
Technical:
- type - Database engine or storage system
- version - Software version
- region - Geographic location or cloud region
- environment - Environment type (production, staging, development)
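Taken together, these fields map onto a source definition like the ones in the configuration examples below. A minimal sketch; the placement of region and environment varies by source type and is an assumption here:

```yaml
metadata:
  name: orders-db              # unique identifier
  namespace: production        # organizational grouping
  description: Order processing database
  owner: commerce-team
spec:
  type: postgresql             # database engine or storage system
  version: "15.3"
  region: us-west-2            # placement assumed; the S3 example nests it under connection
  environment: production
  classification:
    criticality: high          # business impact
    dataTypes: [pii, financial]
    retention: 7y
```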
Dataset Organization
Resources contain datasets (tables, collections, buckets):
For Databases:
- Schema/Database - Logical grouping (e.g., public, sales, analytics)
- Tables/Collections - Individual data containers
- Views - Derived datasets
- Schemas - Structural definitions
For Object Storage:
- Buckets/Containers - Top-level organization
- Prefixes/Paths - Hierarchical organization (e.g., /logs/2025/)
- Objects/Files - Individual data files
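Discovery settings scope against this hierarchy. For example, using the discovery fields from the configuration examples later on this page:

```yaml
# Database source: scope discovery at the schema and table level
spec:
  discovery:
    enabled: true
    includeSchemas: [public, sales]
    excludeTables: [temp_*]

# Object storage sources use prefix-level scoping instead, e.g.:
#   includePrefix: [/logs/2025/]
```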
Connection Methods
Direct Connection:
- Alien Giraffe connects directly to the data source
- Requires network connectivity and credentials
- Best for cloud databases and managed services
Proxy Connection:
- Alien Giraffe connects via a proxy/bastion host
- Useful for on-premises databases or private networks
- Supports SSH tunneling and jump hosts (sketched below)
Agent-Based:
- Lightweight agent runs in the same network as data source
- Agent communicates with Alien Giraffe control plane
- Best for air-gapped or highly restricted environments
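A proxy connection might be declared roughly as follows; the sshTunnel block and its field names are illustrative assumptions, not a documented schema:

```yaml
# Hypothetical proxy connection sketch: the sshTunnel block and its
# field names are illustrative assumptions, not documented schema.
spec:
  type: postgresql
  connection:
    host: db.internal.corp        # reachable only through the bastion
    port: 5432
    sshTunnel:
      bastionHost: jump.example.com
      username: alien-giraffe
      privateKeyRef: bastion-ssh-key
```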
Configuration Examples
PostgreSQL Database
```yaml
apiVersion: v1
kind: Source
metadata:
  name: production-postgres
  namespace: production
  description: Main application database
  owner: backend-team
spec:
  type: postgresql
  version: "15.3"

  connection:
    host: db.production.internal
    port: 5432
    database: app_production
    ssl: true
    sslMode: require

  credentials:
    secretRef: postgres-credentials   # Reference to secret store
    rotation:
      enabled: true
      period: 30d

  classification:
    criticality: critical
    dataTypes: [pii, financial, customer-data]
    retention: 7y

  discovery:
    enabled: true                     # Automatically discover schemas/tables
    schedule: "0 2 * * *"             # Daily at 2 AM
    includeSchemas: [public, sales, support]
    excludeTables: [temp_*, migration_*]
```

Amazon S3 Bucket
```yaml
apiVersion: v1
kind: Source
metadata:
  name: production-data-lake
  namespace: production
  description: Analytics data lake
  owner: data-engineering
spec:
  type: s3

  connection:
    bucket: company-data-lake-prod
    region: us-west-2
    endpoint: s3.us-west-2.amazonaws.com   # Optional, for S3-compatible

  credentials:
    awsCredentials:
      assumeRole: arn:aws:iam::123456789012:role/AlienGiraffeAccess
      externalId: unique-external-id

  classification:
    criticality: high
    dataTypes: [analytics, logs, customer-data]
    retention: 2y

  discovery:
    enabled: true
    includePrefix: [/analytics/, /logs/]
    excludePrefix: [/temp/, /_scratch/]
```

NFS File System
```yaml
apiVersion: v1
kind: Source
metadata:
  name: shared-file-storage
  namespace: production
  description: Shared NFS storage for data exports
  owner: data-engineering
spec:
  type: nfs

  connection:
    host: nfs.production.internal
    port: 2049
    exportPath: /exports/data
    version: nfs4        # NFS protocol version

  paths:
    - /exports/data/analytics
    - /exports/data/reports
    - /exports/data/backups

  credentials:
    mountOptions:
      - ro               # Read-only mount
      - noexec
      - nosuid

  classification:
    criticality: medium
    dataTypes: [analytics, reports]
    retention: 90d

  discovery:
    enabled: true
    includePattern: ["*.csv", "*.parquet", "*.json"]
    excludePattern: ["*.tmp", "*.lock"]
```

SFTP Server
```yaml
apiVersion: v1
kind: Source
metadata:
  name: sftp-data-transfer
  namespace: production
  description: SFTP server for secure file transfers
  owner: data-ops
spec:
  type: sftp

  connection:
    host: sftp.company.com
    port: 22
    username: alien-giraffe

  paths:
    - /data/incoming
    - /data/processed
    - /data/archive

  credentials:
    authMethod: ssh-key
    privateKeyRef: sftp-private-key     # Reference to SSH private key
    passphrase: sftp-key-passphrase     # Optional passphrase for key

  classification:
    criticality: high
    dataTypes: [customer-uploads, file-transfers]
    retention: 30d

  discovery:
    enabled: true
    schedule: "0 */6 * * *"             # Every 6 hours
    followSymlinks: false
```

SSH Filesystem Access
```yaml
apiVersion: v1
kind: Source
metadata:
  name: remote-file-server
  namespace: production
  description: Direct SSH access to remote filesystem
  owner: platform-team
spec:
  type: ssh-fs

  connection:
    host: files.production.internal
    port: 22
    username: data-access

  paths:
    - /var/data/logs
    - /var/data/exports
    - /mnt/backup/datasets

  credentials:
    authMethod: ssh-key
    privateKeyRef: ssh-access-key

  classification:
    criticality: medium
    dataTypes: [logs, system-data]
    retention: 60d

  discovery:
    enabled: true
    maxDepth: 3          # Limit directory traversal depth
    excludePattern: ["/var/data/logs/debug/*"]
```

Snowflake Warehouse
```yaml
apiVersion: v1
kind: Source
metadata:
  name: analytics-warehouse
  namespace: production
  description: Enterprise data warehouse
  owner: data-team
spec:
  type: snowflake
  version: enterprise

  connection:
    account: xy12345.us-east-1
    warehouse: COMPUTE_WH
    database: ANALYTICS
    schema: PUBLIC

  credentials:
    secretRef: snowflake-credentials
    rotation:
      enabled: true
      period: 90d

  classification:
    criticality: high
    dataTypes: [analytics, aggregated-metrics]
    retention: indefinite

  resourceManagement:
    warehouseSize: MEDIUM
    autoSuspend: 300     # Auto-suspend warehouse after 5 minutes idle
    autoResume: true
```

MongoDB Cluster
```yaml
apiVersion: v1
kind: Source
metadata:
  name: session-store
  namespace: production
  description: User session database
  owner: backend-team
spec:
  type: mongodb
  version: "6.0"

  connection:
    connectionString: mongodb+srv://cluster.mongodb.net
    database: sessions
    replicaSet: rs0
    readPreference: secondaryPreferred   # Read from replicas

  credentials:
    secretRef: mongodb-credentials

  classification:
    criticality: high
    dataTypes: [session-data, user-preferences]
    retention: 90d

  discovery:
    enabled: true
    includeCollections: [user_sessions, api_tokens]
```

Read Replica Configuration
Separate read/write access using replicas:
```yaml
apiVersion: v1
kind: Source
metadata:
  name: production-db-replica
  namespace: production
  description: Read-only replica for analytics
  owner: data-team
spec:
  type: postgresql
  version: "15.3"

  connection:
    host: db-replica.production.internal
    port: 5432
    database: app_production
    readOnly: true       # Enforce read-only at connection level

  credentials:
    secretRef: postgres-readonly-credentials

  primarySource: production-postgres   # Link to primary source

  classification:
    criticality: medium
    dataTypes: [analytics, reporting]
    retention: 7y
```

Automatic Classification
Let Alien Giraffe automatically classify data sensitivity:
```yaml
apiVersion: v1
kind: Source
metadata:
  name: new-database
  namespace: production
  description: Database with unknown data sensitivity
  owner: data-team
spec:
  type: postgresql
  version: "15.3"

  connection:
    host: db.production.internal
    port: 5432
    database: new_app

  credentials:
    secretRef: postgres-credentials

  classification: auto   # Enable automatic classification

  discovery:
    enabled: true        # Required for auto-classification
    schedule: "0 2 * * *"

    classification:
      enabled: true
      rules:
        - pattern: ".*email.*|.*e_mail.*"
          type: pii
          confidence: high

        - pattern: ".*ssn.*|.*social_security.*"
          type: pii-sensitive
          confidence: high

        - pattern: ".*credit_card.*|.*payment.*|.*card_number.*"
          type: financial
          confidence: high

        - pattern: ".*password.*|.*secret.*|.*api_key.*"
          type: credentials
          confidence: high

    profiling:
      enabled: true
      sampleSize: 10000
      pii_detection:
        enabled: true
        methods: [pattern-matching, statistical-analysis]
```

When classification: auto is set:
- Discovery scans all tables and columns
- Pattern matching identifies sensitive data types
- Statistical analysis detects PII patterns
- Criticality is calculated based on findings
- Classification is updated automatically
- Changes trigger policy re-evaluation
Auto-Classification Results:
```yaml
# After auto-classification completes, the resource is updated:
classification:
  criticality: high                      # Automatically determined
  dataTypes: [pii, financial]            # Detected from data patterns
  retention: 7y                          # Based on detected data types
  lastClassified: 2025-11-19T02:00:00Z
  confidence: high
```

Discovery and Cataloging
Automatic Discovery
Alien Giraffe can automatically discover and catalog datasets:
Discovery Process:
- Connect to data source
- Query metadata tables/APIs
- Enumerate schemas, tables, columns
- Extract statistics (row counts, sizes)
- Classify data based on patterns
- Update catalog
What’s Discovered:
- Database schemas and tables
- Column names and types
- Primary/foreign keys
- Indexes and constraints
- Row counts and sizes
- Last modified timestamps
For Object Storage:
- Bucket/container names
- Directory structure
- File types and sizes
- Object counts
- Last modified times
Data Classification
Automatically classify data sensitivity:
```yaml
spec:
  discovery:
    classification:
      enabled: true
      rules:
        - pattern: ".*email.*"
          type: pii
          confidence: high

        - pattern: ".*ssn.*|.*social_security.*"
          type: pii-sensitive
          confidence: high

        - pattern: ".*credit_card.*|.*payment.*"
          type: financial
          confidence: high

        - pattern: ".*password.*|.*secret.*"
          type: credentials
          confidence: high
```

Sampling and Profiling
Profile data to understand contents:
```yaml
spec:
  discovery:
    profiling:
      enabled: true
      sampleSize: 10000        # Sample 10k rows
      schedule: "0 3 * * 0"    # Weekly on Sunday at 3 AM

      metrics:
        - uniqueValues: true
        - nullPercentage: true
        - dataDistribution: true
        - valueRanges: true

      pii_detection:
        enabled: true
        methods: [pattern-matching, statistical-analysis]
```

Best Practices
Organize with Namespaces
Use namespaces to group related sources:
- production - Production data sources
- staging - Staging/QA environments
- development - Development databases
- finance - Finance-specific sources
- analytics - Analytics and reporting sources
Use Read Replicas
Separate read and write access:
- Configure read replicas for analytics workloads
- Prevent analytics queries from impacting production
- Enable longer-running queries without locks
- Provide better availability for reporting
Implement Credential Rotation
Regularly rotate database credentials, as sketched after this list:
- Enable automatic rotation (30-90 day cycles)
- Store credentials in secret managers (AWS Secrets Manager, HashiCorp Vault)
- Use short-lived credentials when possible
- Audit credential access and usage
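Rotation uses the credentials block already shown in the configuration examples above, for instance:

```yaml
credentials:
  secretRef: postgres-credentials   # resolved from your secret manager
  rotation:
    enabled: true
    period: 30d                     # rotate on a 30-day cycle
```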
Classify Data Sensitivity
Tag sources and datasets by sensitivity:
- pii - Personally identifiable information
- pii-sensitive - SSN, passport numbers, biometrics
- financial - Payment data, bank accounts
- confidential - Trade secrets, strategic plans
- public - Publicly available data
This enables risk-based policies and appropriate access controls.
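As an illustration of a risk-based policy keyed off these tags, a selector might look like the sketch below; matchDataTypes is a hypothetical syntax, not a documented field:

```yaml
# Hypothetical sketch: matchDataTypes is an assumed selector syntax,
# shown only to illustrate risk-based policies over sensitivity tags.
spec:
  resources:
    matchDataTypes: [pii, pii-sensitive]   # all sources tagged as PII
  constraints:
    maxDuration: 1h                        # tighter limits for sensitive data
```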
Enable Discovery Carefully
Balance automation with performance (a configuration sketch follows this list):
- Schedule discovery during low-traffic periods
- Exclude temporary tables and scratch spaces
- Limit scope to relevant schemas/databases
- Monitor discovery job performance
- Cache results to reduce repeated scans
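Most of these knobs map directly onto the discovery block used throughout the examples above:

```yaml
discovery:
  enabled: true
  schedule: "0 2 * * 0"                  # weekly, during low-traffic hours
  includeSchemas: [public, sales]        # limit scope to relevant schemas
  excludeTables: [temp_*, migration_*]   # skip temporary and scratch tables
```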
Document Ownership
Assign clear ownership:
- Team responsible for the data source
- Contact for access requests
- Escalation path for incidents
- Data steward for governance
Monitor Source Health
Track data source availability and performance (a hypothetical configuration sketch follows this list):
- Connection health checks
- Query performance metrics
- Credential validity
- Discovery job success rates
- Access patterns and anomalies
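Alien Giraffe's schema for this is not shown on this page; the healthCheck block below and all of its fields are illustrative assumptions:

```yaml
# Hypothetical monitoring sketch: healthCheck and its fields are
# illustrative assumptions, not a documented schema.
spec:
  healthCheck:
    enabled: true
    interval: 60s                # periodic connection health checks
    alertOn:
      - connection-failure
      - credential-expiry
      - discovery-job-failure
```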
Common Patterns
Multi-Region Setup
Configure sources across geographic regions:
```yaml
---
apiVersion: v1
kind: Source
metadata:
  name: production-db-us
  namespace: production
spec:
  type: postgresql
  connection:
    host: db.us-west-2.internal
    region: us-west-2

---
apiVersion: v1
kind: Source
metadata:
  name: production-db-eu
  namespace: production
spec:
  type: postgresql
  connection:
    host: db.eu-west-1.internal
    region: eu-west-1
```

Development vs Production
Separate configurations for different environments:
```yaml
# Development - more permissive
metadata:
  name: dev-db
  namespace: development
spec:
  classification:
    criticality: low
    dataTypes: [test-data]

---
# Production - strict controls
metadata:
  name: prod-db
  namespace: production
spec:
  classification:
    criticality: critical
    dataTypes: [pii, financial]
```

Data Lake Organization
Organize object storage by purpose:
```yaml
spec:
  type: s3
  connection:
    bucket: data-lake
  discovery:
    includePrefix:
      - /raw/            # Raw ingestion
      - /processed/      # Cleaned data
      - /analytics/      # Analytics-ready
      - /archive/        # Long-term storage
    excludePrefix:
      - /temp/
      - /_spark/
```

Related Components
Section titled “Related Components”- Policies - Centralize resource definitions with other access control components
- Subjects - Define who can access resources
- Constraints - Set temporal limits on resource access
- Channels - Specify how resources are accessed
- Context - Provide organizational context for resource classification