Cloud Simulations (S3)

This example demonstrates running simulations with their results stored in Amazon S3, a pattern suited to distributed cloud workflows.

Prerequisites

Install S3 support:

pip install idfkit[s3]

Configure AWS credentials:

# Option 1: Environment variables
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1

# Option 2: AWS credentials file (~/.aws/credentials)
# Option 3: IAM role (on EC2, ECS, Lambda)
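
To confirm that credentials resolve before launching simulations, a quick check with boto3 can help. This is an optional sketch, not part of idfkit; it assumes boto3 is installed (it typically backs S3 support) and uses the bucket name from the example below.

import boto3

# Optional sanity check: confirm which identity the credentials resolve to
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(f"Authenticated as: {identity['Arn']}")

# Confirm the results bucket is reachable with these credentials
s3 = boto3.client("s3")
s3.head_bucket(Bucket="my-simulations")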

Basic S3 Usage

from idfkit import load_idf
from idfkit.simulation import simulate, S3FileSystem

# Create S3-backed filesystem
fs = S3FileSystem(
    bucket="my-simulations",
    prefix="project-x/",
)

# Run simulation with S3 storage
model = load_idf("building.idf")
result = simulate(
    model,
    "weather.epw",
    output_dir="run-001",  # Required with fs
    fs=fs,
)

# Results are now in s3://my-simulations/project-x/run-001/
print(f"Results stored at: {result.run_dir}")

Cloud Workflow Pattern

For large-scale simulations on AWS Batch, Kubernetes, or similar:

Step 1: Create Jobs Locally

from idfkit.simulation import SimulationJob, S3FileSystem

fs = S3FileSystem(bucket="simulations", prefix="study-001/")

# Create one job spec per model variant (model_variants is a list of models prepared earlier)
jobs = []
for i, variant in enumerate(model_variants):
    jobs.append(
        SimulationJob(
            model=variant,
            weather="weather.epw",
            label=f"case-{i:04d}",
            output_dir=f"case-{i:04d}",
        )
    )

# Save job specs (e.g., as JSON or pickle)
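
How the job specs are persisted is up to you. The sketch below writes a plain JSON manifest of labels, output directories, and weather paths; it assumes SimulationJob exposes label and output_dir as attributes, and it leaves the model files to be distributed separately (for example, baked into the worker image or uploaded to S3 as IDF text).

import json

# Persist a lightweight manifest that workers can read back in Step 2
manifest = [
    {"label": job.label, "output_dir": job.output_dir, "weather": "weather.epw"}
    for job in jobs
]
with open("jobs.json", "w") as f:
    json.dump(manifest, f, indent=2)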

Step 2: Run on Cloud Workers

Each worker runs a subset of jobs:

# worker.py (runs on AWS Batch, Kubernetes, etc.)
from idfkit.simulation import simulate, S3FileSystem

fs = S3FileSystem(bucket="simulations", prefix="study-001/")

# Run a single job; model, weather_path, and job_id come from this worker's job spec
result = simulate(
    model,
    weather_path,  # Must be local
    output_dir=f"case-{job_id}",
    fs=fs,
)

# Results uploaded to S3 automatically
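
How a worker selects its job depends on the scheduler. AWS Batch array jobs expose the index as AWS_BATCH_JOB_ARRAY_INDEX, and Kubernetes indexed Jobs use JOB_COMPLETION_INDEX. A minimal sketch, assuming the JSON manifest from Step 1 is available on the worker (the model itself is obtained however the study distributes it):

import json
import os

# Pick this worker's job using the scheduler-provided array index
job_index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))

with open("jobs.json") as f:
    manifest = json.load(f)

job_spec = manifest[job_index]
job_id = f"{job_index:04d}"          # matches the case-NNNN naming from Step 1
weather_path = job_spec["weather"]   # must be a local path on the worker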

Step 3: Collect Results

From any machine with S3 access:

from idfkit.simulation import SimulationResult, S3FileSystem

fs = S3FileSystem(bucket="simulations", prefix="study-001/")

# Reconstruct results
results = []
for i in range(num_cases):
    result = SimulationResult.from_directory(f"case-{i:04d}", fs=fs)
    results.append(result)

# Analyze
for i, result in enumerate(results):
    ts = result.sql.get_timeseries(
        "Zone Mean Air Temperature",
        "ZONE 1",
    )
    print(f"Case {i}: max temp = {max(ts.values):.1f}°C")

Batch Processing with S3

from idfkit.simulation import simulate_batch, SimulationJob, S3FileSystem

fs = S3FileSystem(bucket="my-bucket", prefix="batch-42/")

jobs = [
    SimulationJob(
        model=variant,
        weather="weather.epw",
        label=f"case-{i}",
        output_dir=f"case-{i}",
    )
    for i, variant in enumerate(variants)
]

batch = simulate_batch(jobs, max_workers=4, fs=fs)

# All results stored in S3
for i, result in enumerate(batch):
    print(f"Case {i}: s3://my-bucket/batch-42/case-{i}/")

S3-Compatible Services

S3FileSystem also works with MinIO, LocalStack, and other S3-compatible services:

MinIO

fs = S3FileSystem(
    bucket="local-bucket",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

LocalStack

fs = S3FileSystem(
    bucket="test-bucket",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
)

DigitalOcean Spaces

fs = S3FileSystem(
    bucket="my-space",
    endpoint_url="https://nyc3.digitaloceanspaces.com",
    region_name="nyc3",
)
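
These services typically require the bucket to exist before the first simulation. A sketch creating it with boto3 against a local MinIO endpoint; the endpoint and credentials mirror the MinIO example above.

import boto3

# Create the bucket once before pointing S3FileSystem at it
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)
s3.create_bucket(Bucket="local-bucket")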

Weather File Handling

Important: Weather files must be on the local filesystem. Download them before simulating:

from idfkit.weather import StationIndex, WeatherDownloader

# Download weather file locally
index = StationIndex.load()
station = index.search("chicago")[0].station
downloader = WeatherDownloader()
files = downloader.download(station)

# Then use local weather with S3 output
fs = S3FileSystem(bucket="results", prefix="study/")
result = simulate(
    model,
    files.epw,  # Local path
    output_dir="run-001",
    fs=fs,
)

Performance Considerations

Minimize S3 Round-Trips

# Query results once, process locally
result = SimulationResult.from_directory("run-001", fs=fs)

# This downloads the SQL file
sql = result.sql

# Multiple queries are local (file is cached)
ts1 = sql.get_timeseries("Zone Mean Air Temperature", "ZONE 1")
ts2 = sql.get_timeseries("Zone Air Relative Humidity", "ZONE 1")

Batch Downloads

For heavy analysis, download everything locally first:

import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Download all files for a run
    for obj in fs.glob("run-001", "*"):
        data = fs.read_bytes(obj)
        local_path = Path(tmp) / Path(obj).name
        local_path.write_bytes(data)

    # Now use local result
    result = SimulationResult.from_directory(tmp)
    # Multiple queries without network calls

Cost Optimization

  • Store only necessary output files (filter before upload)
  • Use S3 lifecycle policies to move old results to Glacier (see the sketch after this list)
  • Consider S3 Intelligent Tiering for varying access patterns
  • Use regional buckets close to compute resources
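
A lifecycle rule is a one-time bucket setting. The sketch below transitions objects under a study prefix to Glacier after 90 days using boto3; the bucket, prefix, and retention period are placeholders.

import boto3

# Archive results under this prefix to Glacier after 90 days
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-simulations",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-results",
                "Status": "Enabled",
                "Filter": {"Prefix": "project-x/"},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)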

Security

  • Use IAM roles instead of access keys when possible
  • Apply bucket policies to restrict access
  • Enable S3 versioning for important results
  • Consider server-side encryption (see the sketch after this list)
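
Default encryption is likewise a one-time bucket setting. A sketch enabling SSE-S3 with boto3; swap in aws:kms and a key ID if your policy requires KMS-managed keys.

import boto3

# Enable default server-side encryption (SSE-S3) for the results bucket
s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="my-simulations",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)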

See Also