Resources

Resources represent what data can be accessed in the Alien Giraffe access control model. This component manages the catalog of databases, object storage, data warehouses, and datasets. Policies reference resources in their resources: field to specify which data a subject is allowed to access.

Resources are one of the five core components that policies coordinate. When you define a policy, the resources: field specifies what data is being protected. This component provides the infrastructure for cataloging and classifying those data systems—registering databases, discovering datasets, and tracking data sensitivity.

Instead of managing access separately for each database, object store, or data warehouse, Alien Giraffe provides a centralized resource registry. Each resource is configured once with connection details, credentials, and metadata, then referenced in policies and access requests.
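
As a sketch of how that reference works, a minimal policy might point at a registered resource like this; only the resources: field is described on this page, so the Policy shape and the other field names are assumptions:

# Minimal policy sketch; everything except resources: is assumed.
apiVersion: v1
kind: Policy
metadata:
  name: analytics-read
  namespace: production
spec:
  subjects: [data-team]             # who is granted access (assumed field)
  resources: [production-postgres]  # registered resource names
  channels: [sql]                   # how access occurs (assumed field)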

Why Catalog Resources?

  • Centralized Management - One place to manage all data systems
  • Data Discovery - Understand what data exists and where
  • Access Control - Unified policies across heterogeneous systems
  • Data Classification - Track data criticality and sensitivity
  • Credential Rotation - Automate credential management

Supported Resource Types

Relational Databases:

  • PostgreSQL - Version 10+
  • MySQL - Version 5.7+
  • MariaDB - Version 10.3+
  • Microsoft SQL Server - 2017+
  • Oracle - 12c+

NoSQL Databases:

  • MongoDB - Version 4.0+
  • Redis - Version 5.0+
  • Cassandra - Version 3.0+
  • DynamoDB - AWS managed
  • Elasticsearch - Version 7.0+

Graph Databases:

  • Neo4j - Version 4.0+ (Coming Soon)
  • ArangoDB - Version 3.7+ (Coming Soon)

Cloud Object Storage:

  • Amazon S3 - Including S3-compatible (MinIO, Ceph)
  • Google Cloud Storage - Standard and regional buckets
  • Azure Blob Storage - All tiers
  • HDFS - Hadoop Distributed File System

File System Storage:

  • NFS - Network File System shares
  • SFTP - SSH File Transfer Protocol
  • SSH - Direct SSH access to filesystem paths

Data Warehouses:

  • Snowflake - All editions
  • Google BigQuery - Standard and enterprise
  • Amazon Redshift - Provisioned and Serverless
  • Databricks - SQL warehouses
  • Azure Synapse - Dedicated and serverless
  • Apache Druid - Real-time analytics (Coming Soon)
  • ClickHouse - OLAP database (Coming Soon)
  • Presto/Trino - Distributed SQL queries (Coming Soon)
  • Apache Spark - Via JDBC/Thrift (Coming Soon)

Every resource includes descriptive metadata:

Identification:

  • name - Unique identifier for the resource
  • namespace - Organizational grouping (production, staging, finance)
  • description - Human-readable description
  • owner - Team or individual responsible

Classification:

  • criticality - Business impact (critical, high, medium, low, auto)
  • dataTypes - Categories of data (pii, financial, logs, analytics)
  • retention - Data retention period (30d, 1y, 7y, indefinite)

When classification is set to auto, Alien Giraffe automatically classifies the resource by scanning data patterns, column names, and content to identify sensitive data types and determine appropriate criticality levels.

Technical:

  • type - Database engine or storage system
  • version - Software version
  • region - Geographic location or cloud region
  • environment - Environment type (production, staging, development)
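
Putting the three groups together, a resource definition might carry these fields as follows; values are illustrative, and the placement of region and environment is assumed, since the examples below only show region for object storage:

metadata:
  name: orders-db                  # identification
  namespace: production
  description: Order management database
  owner: commerce-team
spec:
  type: postgresql                 # technical
  version: "15.3"
  region: us-west-2                # placement assumed
  environment: production          # placement assumed
  classification:                  # classification
    criticality: high
    dataTypes: [pii, financial]
    retention: 7y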

Resources contain datasets (tables, collections, buckets):

For Databases:

  • Schema/Database - Logical grouping (e.g., public, sales, analytics)
  • Tables/Collections - Individual data containers
  • Views - Derived datasets
  • Schemas - Structural definitions

For Object Storage:

  • Buckets/Containers - Top-level organization
  • Prefixes/Paths - Hierarchical organization (e.g., /logs/2025/)
  • Objects/Files - Individual data files
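
This page does not define a syntax for referencing an individual dataset inside a resource; a path-style form is one plausible shape, shown here purely as a hypothetical:

# Hypothetical dataset-path references; not a documented format.
resources:
  - production-postgres                # entire resource
  - production-postgres/sales/orders   # one table (assumed form)
  - production-data-lake/logs/2025/*   # one object prefix (assumed form)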

Connection Methods

Direct Connection:

  • Alien Giraffe connects directly to the data source
  • Requires network connectivity and credentials
  • Best for cloud databases and managed services

Proxy Connection:

  • Alien Giraffe connects via a proxy/bastion host
  • Useful for on-premises databases or private networks
  • Supports SSH tunneling and jump hosts
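
A sketch of what a proxied connection could look like; the proxy block and its field names are assumptions for illustration, not documented configuration:

spec:
  type: postgresql
  connection:
    host: db.onprem.internal
    port: 5432
    proxy:                       # assumed field, shown for illustration
      type: ssh-tunnel
      host: bastion.corp.internal
      port: 22
      credentials:
        privateKeyRef: bastion-ssh-key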

Agent-Based:

  • Lightweight agent runs in the same network as data source
  • Agent communicates with Alien Giraffe control plane
  • Best for air-gapped or highly restricted environments

Example Configurations

PostgreSQL Database:

apiVersion: v1
kind: Source
metadata:
  name: production-postgres
  namespace: production
  description: Main application database
  owner: backend-team
spec:
  type: postgresql
  version: "15.3"
  connection:
    host: db.production.internal
    port: 5432
    database: app_production
    ssl: true
    sslMode: require
  credentials:
    secretRef: postgres-credentials # Reference to secret store
    rotation:
      enabled: true
      period: 30d
  classification:
    criticality: critical
    dataTypes: [pii, financial, customer-data]
    retention: 7y
  discovery:
    enabled: true # Automatically discover schemas/tables
    schedule: "0 2 * * *" # Daily at 2 AM
    includeSchemas: [public, sales, support]
    excludeTables: [temp_*, migration_*]

Amazon S3 Data Lake:

apiVersion: v1
kind: Source
metadata:
  name: production-data-lake
  namespace: production
  description: Analytics data lake
  owner: data-engineering
spec:
  type: s3
  connection:
    bucket: company-data-lake-prod
    region: us-west-2
    endpoint: s3.us-west-2.amazonaws.com # Optional, for S3-compatible
  credentials:
    awsCredentials:
      assumeRole: arn:aws:iam::123456789012:role/AlienGiraffeAccess
      externalId: unique-external-id
  classification:
    criticality: high
    dataTypes: [analytics, logs, customer-data]
    retention: 2y
  discovery:
    enabled: true
    includePrefix: [/analytics/, /logs/]
    excludePrefix: [/temp/, /_scratch/]

NFS File Storage:

apiVersion: v1
kind: Source
metadata:
  name: shared-file-storage
  namespace: production
  description: Shared NFS storage for data exports
  owner: data-engineering
spec:
  type: nfs
  connection:
    host: nfs.production.internal
    port: 2049
    exportPath: /exports/data
    version: nfs4 # NFS protocol version
  paths:
    - /exports/data/analytics
    - /exports/data/reports
    - /exports/data/backups
  credentials:
    mountOptions:
      - ro # Read-only mount
      - noexec
      - nosuid
  classification:
    criticality: medium
    dataTypes: [analytics, reports]
    retention: 90d
  discovery:
    enabled: true
    includePattern: ["*.csv", "*.parquet", "*.json"]
    excludePattern: ["*.tmp", "*.lock"]

SFTP Server:

apiVersion: v1
kind: Source
metadata:
  name: sftp-data-transfer
  namespace: production
  description: SFTP server for secure file transfers
  owner: data-ops
spec:
  type: sftp
  connection:
    host: sftp.company.com
    port: 22
    username: alien-giraffe
  paths:
    - /data/incoming
    - /data/processed
    - /data/archive
  credentials:
    authMethod: ssh-key
    privateKeyRef: sftp-private-key # Reference to SSH private key
    passphrase: sftp-key-passphrase # Optional passphrase for key
  classification:
    criticality: high
    dataTypes: [customer-uploads, file-transfers]
    retention: 30d
  discovery:
    enabled: true
    schedule: "0 */6 * * *" # Every 6 hours
    followSymlinks: false

SSH Filesystem Access:

apiVersion: v1
kind: Source
metadata:
  name: remote-file-server
  namespace: production
  description: Direct SSH access to remote filesystem
  owner: platform-team
spec:
  type: ssh-fs
  connection:
    host: files.production.internal
    port: 22
    username: data-access
  paths:
    - /var/data/logs
    - /var/data/exports
    - /mnt/backup/datasets
  credentials:
    authMethod: ssh-key
    privateKeyRef: ssh-access-key
  classification:
    criticality: medium
    dataTypes: [logs, system-data]
    retention: 60d
  discovery:
    enabled: true
    maxDepth: 3 # Limit directory traversal depth
    excludePattern: ["/var/data/logs/debug/*"]

Snowflake Data Warehouse:

apiVersion: v1
kind: Source
metadata:
  name: analytics-warehouse
  namespace: production
  description: Enterprise data warehouse
  owner: data-team
spec:
  type: snowflake
  version: enterprise
  connection:
    account: xy12345.us-east-1
    warehouse: COMPUTE_WH
    database: ANALYTICS
    schema: PUBLIC
  credentials:
    secretRef: snowflake-credentials
    rotation:
      enabled: true
      period: 90d
  classification:
    criticality: high
    dataTypes: [analytics, aggregated-metrics]
    retention: indefinite
  resourceManagement:
    warehouseSize: MEDIUM
    autoSuspend: 300 # Auto-suspend after 5 minutes idle
    autoResume: true

MongoDB Database:

apiVersion: v1
kind: Source
metadata:
  name: session-store
  namespace: production
  description: User session database
  owner: backend-team
spec:
  type: mongodb
  version: "6.0"
  connection:
    connectionString: mongodb+srv://cluster.mongodb.net
    database: sessions
    replicaSet: rs0
    readPreference: secondaryPreferred # Read from replicas
  credentials:
    secretRef: mongodb-credentials
  classification:
    criticality: high
    dataTypes: [session-data, user-preferences]
    retention: 90d
  discovery:
    enabled: true
    includeCollections: [user_sessions, api_tokens]

Separate read/write access using replicas:

apiVersion: v1
kind: Source
metadata:
  name: production-db-replica
  namespace: production
  description: Read-only replica for analytics
  owner: data-team
spec:
  type: postgresql
  version: "15.3"
  connection:
    host: db-replica.production.internal
    port: 5432
    database: app_production
    readOnly: true # Enforce read-only at connection level
  credentials:
    secretRef: postgres-readonly-credentials
  primarySource: production-postgres # Link to primary source
  classification:
    criticality: medium
    dataTypes: [analytics, reporting]
    retention: 7y

Let Alien Giraffe automatically classify data sensitivity:

apiVersion: v1
kind: Source
metadata:
  name: new-database
  namespace: production
  description: Database with unknown data sensitivity
  owner: data-team
spec:
  type: postgresql
  version: "15.3"
  connection:
    host: db.production.internal
    port: 5432
    database: new_app
  credentials:
    secretRef: postgres-credentials
  classification: auto # Enable automatic classification
  discovery:
    enabled: true # Required for auto-classification
    schedule: "0 2 * * *"
    classification:
      enabled: true
      rules:
        - pattern: ".*email.*|.*e_mail.*"
          type: pii
          confidence: high
        - pattern: ".*ssn.*|.*social_security.*"
          type: pii-sensitive
          confidence: high
        - pattern: ".*credit_card.*|.*payment.*|.*card_number.*"
          type: financial
          confidence: high
        - pattern: ".*password.*|.*secret.*|.*api_key.*"
          type: credentials
          confidence: high
    profiling:
      enabled: true
      sampleSize: 10000
      pii_detection:
        enabled: true
        methods: [pattern-matching, statistical-analysis]

When classification: auto is set:

  • Discovery scans all tables and columns
  • Pattern matching identifies sensitive data types
  • Statistical analysis detects PII patterns
  • Criticality is calculated based on findings
  • Classification is updated automatically
  • Changes trigger policy re-evaluation

Auto-Classification Results:

# After auto-classification completes, the resource is updated:
classification:
  criticality: high # Automatically determined
  dataTypes: [pii, financial] # Detected from data patterns
  retention: 7y # Based on detected data types
  lastClassified: 2025-11-19T02:00:00Z
  confidence: high

Alien Giraffe can automatically discover and catalog datasets:

Discovery Process:

  1. Connect to data source
  2. Query metadata tables/APIs
  3. Enumerate schemas, tables, columns
  4. Extract statistics (row counts, sizes)
  5. Classify data based on patterns
  6. Update catalog

What’s Discovered:

For Databases:

  • Database schemas and tables
  • Column names and types
  • Primary/foreign keys
  • Indexes and constraints
  • Row counts and sizes
  • Last modified timestamps

For Object Storage:

  • Bucket/container names
  • Directory structure
  • File types and sizes
  • Object counts
  • Last modified times
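
As an illustration of what lands in the catalog, a discovered table entry might look roughly like this; the Dataset kind and its fields are assumptions:

# Hypothetical catalog entry; kind and fields are illustrative only.
kind: Dataset
metadata:
  name: production-postgres/public/users
spec:
  columns:
    - { name: id, type: bigint, primaryKey: true }
    - { name: email, type: text }
  rowCount: 1204331
  sizeBytes: 524288000
  lastModified: 2025-11-18T23:41:00Z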

Automatically classify data sensitivity:

spec:
  discovery:
    classification:
      enabled: true
      rules:
        - pattern: ".*email.*"
          type: pii
          confidence: high
        - pattern: ".*ssn.*|.*social_security.*"
          type: pii-sensitive
          confidence: high
        - pattern: ".*credit_card.*|.*payment.*"
          type: financial
          confidence: high
        - pattern: ".*password.*|.*secret.*"
          type: credentials
          confidence: high

Profile data to understand contents:

spec:
  discovery:
    profiling:
      enabled: true
      sampleSize: 10000 # Sample 10k rows
      schedule: "0 3 * * 0" # Weekly on Sunday at 3 AM
      metrics:
        - uniqueValues: true
        - nullPercentage: true
        - dataDistribution: true
        - valueRanges: true
      pii_detection:
        enabled: true
        methods: [pattern-matching, statistical-analysis]

Use namespaces to group related sources:

  • production - Production data sources
  • staging - Staging/QA environments
  • development - Development databases
  • finance - Finance-specific sources
  • analytics - Analytics and reporting sources

Separate read and write access:

  • Configure read replicas for analytics workloads
  • Prevent analytics queries from impacting production
  • Enable longer-running queries without locks
  • Provide better availability for reporting

Regularly rotate database credentials:

  • Enable automatic rotation (30-90 day cycles)
  • Store credentials in secret managers (AWS Secrets Manager, HashiCorp Vault)
  • Use short-lived credentials when possible
  • Audit credential access and usage

Tag sources and datasets by sensitivity:

  • pii - Personally identifiable information
  • pii-sensitive - SSN, passport numbers, biometrics
  • financial - Payment data, bank accounts
  • confidential - Trade secrets, strategic plans
  • public - Publicly available data

This enables risk-based policies and appropriate access controls.
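
For example, a policy could then target everything tagged pii instead of enumerating individual sources; the selector and constraint fields shown here are hypothetical:

# Hypothetical tag-based selector; field names are assumed.
spec:
  resources:
    matchDataTypes: [pii, pii-sensitive]  # assumed selector field
  constraints:
    maxDuration: 4h                       # assumed constraint field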

Balance automation with performance:

  • Schedule discovery during low-traffic periods
  • Exclude temporary tables and scratch spaces
  • Limit scope to relevant schemas/databases
  • Monitor discovery job performance
  • Cache results to reduce repeated scans

Assign clear ownership:

  • Team responsible for the data source
  • Contact for access requests
  • Escalation path for incidents
  • Data steward for governance

Track data source availability and performance:

  • Connection health checks
  • Query performance metrics
  • Credential validity
  • Discovery job success rates
  • Access patterns and anomalies
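
This page does not show a monitoring schema; as a sketch, such configuration could look like the following, with every field name assumed:

# Hypothetical monitoring block; all field names are assumptions.
spec:
  monitoring:
    healthCheck:
      enabled: true
      interval: 60s        # probe connectivity every minute
    alertOn:
      - credential-expiry
      - discovery-failure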

Configure sources across geographic regions:

---
apiVersion: v1
kind: Source
metadata:
  name: production-db-us
  namespace: production
spec:
  type: postgresql
  connection:
    host: db.us-west-2.internal
    region: us-west-2
---
apiVersion: v1
kind: Source
metadata:
  name: production-db-eu
  namespace: production
spec:
  type: postgresql
  connection:
    host: db.eu-west-1.internal
    region: eu-west-1

Separate configurations for different environments:

# Development - more permissive
metadata:
  name: dev-db
  namespace: development
spec:
  classification:
    criticality: low
    dataTypes: [test-data]
---
# Production - strict controls
metadata:
  name: prod-db
  namespace: production
spec:
  classification:
    criticality: critical
    dataTypes: [pii, financial]

Organize object storage by purpose:

spec:
  type: s3
  connection:
    bucket: data-lake
  discovery:
    includePrefix:
      - /raw/        # Raw ingestion
      - /processed/  # Cleaned data
      - /analytics/  # Analytics-ready
      - /archive/    # Long-term storage
    excludePrefix:
      - /temp/
      - /_spark/

Related Components:

  • Policies - Coordinate resources with the other access control components
  • Subjects - Define who can access resources
  • Constraints - Set temporal limits on resource access
  • Channels - Specify how resources are accessed
  • Context - Provide organizational context for resource classification