Cloud Simulations (S3)¶
This example demonstrates running simulations with results stored in Amazon S3, suitable for distributed cloud workflows.
Prerequisites¶
Install S3 support:
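The exact command depends on how the optional dependency is packaged; a common pattern, assuming the project exposes an s3 extra that pulls in boto3, is pip install "idfkit[s3]". The extra name here is an assumption, so check the package's installation notes.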
Configure AWS credentials:
# Option 1: Environment variables
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
# Option 2: AWS credentials file (~/.aws/credentials)
# Option 3: IAM role (on EC2, ECS, Lambda)
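Credentials can also be passed explicitly to S3FileSystem, using the same keyword arguments shown in the MinIO example further down this page. A minimal sketch, best kept to local testing rather than real secrets:

from idfkit.simulation import S3FileSystem

# Explicit credentials (same keywords as in the MinIO example below);
# prefer IAM roles or environment variables for production workloads.
fs = S3FileSystem(
    bucket="my-simulations",
    prefix="project-x/",
    aws_access_key_id="AKIA...",
    aws_secret_access_key="...",
)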
Basic S3 Usage¶
from idfkit import load_idf
from idfkit.simulation import simulate, S3FileSystem
# Create S3-backed filesystem
fs = S3FileSystem(
    bucket="my-simulations",
    prefix="project-x/",
)
# Run simulation with S3 storage
model = load_idf("building.idf")
result = simulate(
    model,
    "weather.epw",
    output_dir="run-001",  # Required with fs
    fs=fs,
)
# Results are now in s3://my-simulations/project-x/run-001/
print(f"Results stored at: {result.run_dir}")
Cloud Workflow Pattern¶
For large-scale simulations on AWS Batch, Kubernetes, or similar platforms:
Step 1: Create Jobs Locally¶
from idfkit.simulation import SimulationJob, S3FileSystem
fs = S3FileSystem(bucket="simulations", prefix="study-001/")
# Create job specifications
jobs = []
for i, variant in enumerate(model_variants):
    jobs.append(
        SimulationJob(
            model=variant,
            weather="weather.epw",
            label=f"case-{i:04d}",
            output_dir=f"case-{i:04d}",
        )
    )
# Save job specs (e.g., as JSON or pickle)
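One way to persist the specs, a minimal sketch that writes only the plain fields to JSON; it assumes SimulationJob exposes its constructor arguments as attributes of the same name and that the model variants are written to their own IDF files separately:

import json
from pathlib import Path

# Persist only the serialisable fields; model files are staged separately.
specs = [
    {"label": job.label, "weather": job.weather, "output_dir": job.output_dir}
    for job in jobs
]
Path("jobs.json").write_text(json.dumps(specs, indent=2))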
Step 2: Run on Cloud Workers¶
Each worker runs a subset of jobs:
# worker.py (runs on AWS Batch, Kubernetes, etc.)
from idfkit.simulation import simulate, S3FileSystem
fs = S3FileSystem(bucket="simulations", prefix="study-001/")
# Run single job
result = simulate(
    model,
    weather_path,  # Must be local
    output_dir=f"case-{job_id}",
    fs=fs,
)
# Results uploaded to S3 automatically
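The snippet above assumes model, weather_path, and job_id are already defined; how a worker obtains them is platform-specific. A minimal sketch for AWS Batch array jobs, which expose the task index through the AWS_BATCH_JOB_ARRAY_INDEX environment variable (the per-case IDF naming below is an assumption about how the variants were staged):

import os

from idfkit import load_idf

# AWS Batch array jobs set this index; Kubernetes indexed Jobs provide
# JOB_COMPLETION_INDEX instead.
job_id = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])

# Assumed staging layout: each variant shipped to the worker as case-XXXX.idf.
model = load_idf(f"case-{job_id:04d}.idf")
weather_path = "weather.epw"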
Step 3: Collect Results¶
From any machine with S3 access:
from idfkit.simulation import SimulationResult, S3FileSystem
fs = S3FileSystem(bucket="simulations", prefix="study-001/")
# Reconstruct results
results = []
for i in range(num_cases):
    result = SimulationResult.from_directory(f"case-{i:04d}", fs=fs)
    results.append(result)
# Analyze
for i, result in enumerate(results):
    ts = result.sql.get_timeseries(
        "Zone Mean Air Temperature",
        "ZONE 1",
    )
    print(f"Case {i}: max temp = {max(ts.values):.1f}°C")
Batch Processing with S3¶
from idfkit.simulation import simulate_batch, SimulationJob, S3FileSystem
fs = S3FileSystem(bucket="my-bucket", prefix="batch-42/")
jobs = [
    SimulationJob(
        model=variant,
        weather="weather.epw",
        label=f"case-{i}",
        output_dir=f"case-{i}",
    )
    for i, variant in enumerate(variants)
]
batch = simulate_batch(jobs, max_workers=4, fs=fs)
# All results stored in S3
for i, result in enumerate(batch):
    print(f"Case {i}: s3://my-bucket/batch-42/case-{i}/")
S3-Compatible Services¶
Works with MinIO, LocalStack, and other S3-compatible APIs:
MinIO¶
fs = S3FileSystem(
    bucket="local-bucket",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)
LocalStack¶
fs = S3FileSystem(
    bucket="test-bucket",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
)
DigitalOcean Spaces¶
fs = S3FileSystem(
    bucket="my-space",
    endpoint_url="https://nyc3.digitaloceanspaces.com",
    region_name="nyc3",
)
Weather File Handling¶
Important: Weather files must be local. Download them before simulating:
from idfkit.weather import StationIndex, WeatherDownloader
# Download weather file locally
index = StationIndex.load()
station = index.search("chicago")[0].station
downloader = WeatherDownloader()
files = downloader.download(station)
# Then use local weather with S3 output
fs = S3FileSystem(bucket="results", prefix="study/")
result = simulate(
    model,
    files.epw,  # Local path
    output_dir="run-001",
    fs=fs,
)
Performance Considerations¶
Minimize S3 Round-Trips¶
# Query results once, process locally
result = SimulationResult.from_directory("run-001", fs=fs)
# This downloads the SQL file
sql = result.sql
# Multiple queries are local (file is cached)
ts1 = sql.get_timeseries("Zone Mean Air Temperature", "ZONE 1")
ts2 = sql.get_timeseries("Zone Air Relative Humidity", "ZONE 1")
Batch Downloads¶
For heavy analysis, download everything locally first:
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as tmp:
    # Download all files for a run
    for obj in fs.glob("run-001", "*"):
        data = fs.read_bytes(obj)
        local_path = Path(tmp) / Path(obj).name
        local_path.write_bytes(data)

    # Use the local copy inside the with block (tmp is deleted on exit)
    result = SimulationResult.from_directory(tmp)
    # Multiple queries without network calls
Cost Optimization¶
- Store only necessary output files (filter before upload)
- Use S3 lifecycle policies to move old results to Glacier (see the sketch after this list)
- Consider S3 Intelligent Tiering for varying access patterns
- Use regional buckets close to compute resources
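Lifecycle rules live on the bucket itself rather than in idfkit; a sketch with boto3, assuming results under the study prefix should move to Glacier after 90 days:

import boto3

s3 = boto3.client("s3")

# Transition objects under the study prefix to Glacier after 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="simulations",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-results",
                "Status": "Enabled",
                "Filter": {"Prefix": "study-001/"},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)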
Security¶
- Use IAM roles instead of access keys when possible
- Apply bucket policies to restrict access
- Enable S3 versioning for important results
- Consider server-side encryption (see the sketch after this list)
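Default encryption is likewise a bucket-level setting; a boto3 sketch that enables SSE-S3 (switch the algorithm to aws:kms if you manage your own keys):

import boto3

s3 = boto3.client("s3")

# Encrypt every new object in the results bucket with S3-managed keys.
s3.put_bucket_encryption(
    Bucket="simulations",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)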
See Also¶
- Cloud Storage Concepts — Architecture details
- Batch Processing — Parallel execution
- Caching — Local result caching