Download API

Weather file downloading and caching.

WeatherDownloader

idfkit.weather.download.WeatherDownloader

Download and cache weather files from climate.onebuilding.org.

Downloaded ZIP archives are extracted and cached locally so that subsequent requests for the same station and dataset are served from disk without a network call.

Examples:

from idfkit.weather import StationIndex, WeatherDownloader

station = StationIndex.load().search("chicago ohare")[0].station
downloader = WeatherDownloader()
files = downloader.download(station)
print(files.epw)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cache_dir | Path \| None | Override the default cache directory. | None |
| max_age | timedelta \| float \| None | Maximum age of cached files before re-downloading. Can be a timedelta or a number of seconds. If None (default), cached files never expire. | None |
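
The max_age handling can be tried with the standard library alone. The sketch below mirrors the logic visible in `__init__` and `_is_stale` in the class source; the helper names `to_seconds` and `is_stale` are illustrative and are not part of idfkit.

```python
from __future__ import annotations

import os
import tempfile
import time
from datetime import timedelta
from pathlib import Path


def to_seconds(max_age: timedelta | float | None) -> float | None:
    """Normalise max_age to seconds; None means 'never expires'."""
    if max_age is None:
        return None
    if isinstance(max_age, timedelta):
        return max_age.total_seconds()
    return float(max_age)


def is_stale(path: Path, max_age_seconds: float | None) -> bool:
    """Return True when path is missing or older than max_age_seconds."""
    if max_age_seconds is None:
        return False
    if not path.exists():
        return True
    return time.time() - path.stat().st_mtime > max_age_seconds


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "station.zip"
    cached.write_bytes(b"placeholder")
    # Backdate the file so it looks like it was downloaded two days ago.
    two_days_ago = time.time() - 2 * 24 * 3600
    os.utime(cached, (two_days_ago, two_days_ago))

    stale_after_one_day = is_stale(cached, to_seconds(timedelta(days=1)))
    stale_after_one_week = is_stale(cached, to_seconds(7 * 24 * 3600))
    stale_without_limit = is_stale(cached, to_seconds(None))
```

With a one-day limit the two-day-old file counts as stale, with a one-week limit it does not, and with no limit it never expires.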
Note

Extracted .ddy files are rewritten in place to drop SizingPeriod:DesignDay objects whose numeric fields contain non-numeric placeholder tokens (e.g. N, N/A). These appear in some upstream archives when source data is unavailable and would otherwise cause EnergyPlus to fail with a type-constraint error.
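
The kind of filtering described in this note can be sketched as follows. This is a simplified illustration, not the library's `sanitize_ddy_file`: the function name `drop_placeholder_design_days` is hypothetical, the placeholder set is limited to the two tokens named above, and the sketch checks every field of a design-day object rather than only the numeric ones.

```python
PLACEHOLDERS = {"N", "N/A"}  # tokens named in the note; assumed exhaustive here


def drop_placeholder_design_days(text: str) -> str:
    """Drop SizingPeriod:DesignDay objects that contain a placeholder field."""
    kept: list[str] = []
    chunk: list[str] = []   # raw lines of the object currently being read
    fields: list[str] = []  # comment-free field values of that object
    for line in text.splitlines(keepends=True):
        chunk.append(line)
        code = line.split("!", 1)[0]  # IDF comments start with '!'
        parts = code.replace(";", ",").split(",")
        fields.extend(part.strip() for part in parts if part.strip())
        if ";" in code:  # ';' terminates an IDF object
            is_design_day = bool(fields) and fields[0].lower() == "sizingperiod:designday"
            has_placeholder = any(f.upper() in PLACEHOLDERS for f in fields[1:])
            if not (is_design_day and has_placeholder):
                kept.extend(chunk)
            chunk, fields = [], []
    kept.extend(chunk)  # trailing comments or blank lines
    return "".join(kept)


# Abbreviated objects for illustration only; not valid EnergyPlus input.
sample = """\
SizingPeriod:DesignDay,
  Good Day,   !- Name
  1,          !- Month
  -20.0;      !- Maximum Dry-Bulb Temperature {C}
SizingPeriod:DesignDay,
  Bad Day,    !- Name
  7,          !- Month
  N/A;        !- Maximum Dry-Bulb Temperature {C}
"""
cleaned = drop_placeholder_design_days(sample)
```

In this sketch, comment lines that precede a dropped object are dropped with it, which may differ from what the library does.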

Note

The cache has no size limit. For CI/CD environments with limited disk space, consider using clear_cache periodically or setting a max_age to force re-downloads of stale files.
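
For CI jobs, one option is to measure the cache before deciding whether to clear it. The sketch below assumes the `files/` layout shown in the class source; `cache_size_bytes` and `clear_files` are hypothetical helpers, and `clear_files` only imitates what `clear_cache` does according to its listing.

```python
import shutil
import tempfile
from pathlib import Path


def cache_size_bytes(cache_dir: Path) -> int:
    """Total size of everything under <cache_dir>/files, in bytes."""
    files_dir = cache_dir / "files"
    if not files_dir.exists():
        return 0
    return sum(p.stat().st_size for p in files_dir.rglob("*") if p.is_file())


def clear_files(cache_dir: Path) -> None:
    """Remove the files/ subdirectory, as clear_cache does in the listing."""
    files_dir = cache_dir / "files"
    if files_dir.exists():
        shutil.rmtree(files_dir)


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    # Placeholder WMO number and archive stem, used only to build a layout.
    station_dir = cache_dir / "files" / "123456" / "example"
    station_dir.mkdir(parents=True)
    (station_dir / "example.zip").write_bytes(b"x" * 1024)
    (station_dir / "example.epw").write_bytes(b"y" * 2048)

    size_before = cache_size_bytes(cache_dir)
    clear_files(cache_dir)
    size_after = cache_size_bytes(cache_dir)
```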

Source code in src/idfkit/weather/download.py
class WeatherDownloader:
    """Download and cache weather files from climate.onebuilding.org.

    Downloaded ZIP archives are extracted and cached locally so that
    subsequent requests for the same station and dataset are served from
    disk without a network call.

    Examples:
        ```python
        from idfkit.weather import StationIndex, WeatherDownloader

        station = StationIndex.load().search("chicago ohare")[0].station
        downloader = WeatherDownloader()
        files = downloader.download(station)
        print(files.epw)
        ```

    Args:
        cache_dir: Override the default cache directory.
        max_age: Maximum age of cached files before re-downloading.
            Can be a [timedelta][datetime.timedelta] or a number of seconds.
            If ``None`` (default), cached files never expire.

    Note:
        Extracted ``.ddy`` files are rewritten in place to drop
        ``SizingPeriod:DesignDay`` objects whose numeric fields contain
        non-numeric placeholder tokens (e.g. ``N``, ``N/A``). These appear
        in some upstream archives when source data is unavailable and
        would otherwise cause EnergyPlus to fail with a type-constraint
        error.

    Note:
        The cache has no size limit. For CI/CD environments with limited disk
        space, consider using [clear_cache][idfkit.weather.download.WeatherDownloader.clear_cache] periodically or setting
        a ``max_age`` to force re-downloads of stale files.
    """

    __slots__ = ("_cache_dir", "_max_age_seconds")

    def __init__(
        self,
        cache_dir: Path | None = None,
        max_age: timedelta | float | None = None,
    ) -> None:
        self._cache_dir = cache_dir or default_cache_dir()
        if max_age is None:
            self._max_age_seconds: float | None = None
        elif isinstance(max_age, timedelta):
            self._max_age_seconds = max_age.total_seconds()
        else:
            self._max_age_seconds = float(max_age)

    def _is_stale(self, path: Path) -> bool:
        """Check if a cached file is older than max_age."""
        if self._max_age_seconds is None:
            return False
        if not path.exists():
            return True
        age = time.time() - path.stat().st_mtime
        return age > self._max_age_seconds

    @overload
    def download(self, station: WeatherStation) -> WeatherFiles: ...
    @overload
    def download(self, station: WeatherStation, *, only: None) -> WeatherFiles: ...
    @overload
    def download(self, station: WeatherStation, *, only: Iterable[str]) -> PartialWeatherFiles: ...
    def download(
        self,
        station: WeatherStation,
        *,
        only: Iterable[str] | None = None,
    ) -> WeatherFiles | PartialWeatherFiles:
        """Download and extract weather files for *station*.

        If the files are already cached and not stale, no network request is made.

        Args:
            station: The weather station to download files for.
            only: If given, extract only members whose suffix matches one of
                these values (e.g. ``{".epw"}`` or ``[".epw", ".ddy"]``).
                Each entry is normalised to a lowercase suffix with a leading
                dot (``"epw"`` and ``".EPW"`` both match ``.epw`` members).
                When ``None`` (default), every member of the archive is
                extracted and the result is required to contain a ``.epw``
                and a ``.ddy``.

        Returns:
            [WeatherFiles][idfkit.weather.download.WeatherFiles] for a full
            extraction, or
            [PartialWeatherFiles][idfkit.weather.download.PartialWeatherFiles]
            when ``only=`` is set.

        Raises:
            RuntimeError: If the download or extraction fails, or if a full
                extraction is missing a required ``.epw`` or ``.ddy`` file.
        """
        # Derive a cache subdirectory from the ZIP filename
        zip_filename = station.url.rsplit("/", maxsplit=1)[-1]
        stem = zip_filename.removesuffix(".zip")
        station_dir = self._cache_dir / "files" / str(station.wmo) / stem
        zip_path = station_dir / zip_filename

        # Download if not cached or if stale
        if not zip_path.exists() or self._is_stale(zip_path):
            station_dir.mkdir(parents=True, exist_ok=True)
            logger.info("Downloading weather data for %s (WMO %s)", station.display_name, station.wmo)
            try:
                req = Request(station.url, headers={"User-Agent": _USER_AGENT})  # noqa: S310
                with urlopen(req, timeout=120) as resp:  # noqa: S310
                    zip_path.write_bytes(resp.read())
            except (HTTPError, URLError, TimeoutError, OSError) as exc:
                msg = f"Failed to download weather data from {station.url}: {exc}"
                raise RuntimeError(msg) from exc
        else:
            logger.debug("Cache hit for station %s (WMO %s)", station.display_name, station.wmo)

        only_set = _normalise_suffixes(only)
        self._ensure_extracted(zip_path, station_dir, only_set)

        epw_path = self._find_file(station_dir, ".epw")
        ddy_path = self._find_file(station_dir, ".ddy")
        stat_path = self._find_file(station_dir, ".stat")

        if ddy_path is not None:
            sanitize_ddy_file(ddy_path)

        if only_set is not None:
            return PartialWeatherFiles(
                epw=epw_path,
                ddy=ddy_path,
                stat=stat_path,
                zip_path=zip_path,
                station=station,
            )

        # Full-extract path: EPW and DDY are required.
        if epw_path is None:
            msg = f"No .epw file found in downloaded archive for {station.display_name}"
            raise RuntimeError(msg)
        if ddy_path is None:
            msg = f"No .ddy file found in downloaded archive for {station.display_name}"
            raise RuntimeError(msg)
        return WeatherFiles(
            epw=epw_path,
            ddy=ddy_path,
            stat=stat_path,
            zip_path=zip_path,
            station=station,
        )

    @staticmethod
    def _ensure_extracted(
        zip_path: Path,
        station_dir: Path,
        only: frozenset[str] | None,
    ) -> None:
        """Extract members from *zip_path* into *station_dir*.

        If *only* is ``None``, every member is extracted (matching the
        historical ``extractall`` behaviour). Otherwise, only members whose
        lowercased suffix is in *only* are extracted. A member is skipped if
        an up-to-date copy already exists on disk (mtime ≥ ZIP mtime).
        """
        try:
            with zipfile.ZipFile(zip_path) as zf:
                # Compare against the ZIP's mtime rather than ``_is_stale`` —
                # ``zipfile`` preserves archive-internal timestamps, so the
                # extracted file's mtime can be arbitrarily old.
                zip_mtime = zip_path.stat().st_mtime
                for member in zf.namelist():
                    suffix = Path(member).suffix.lower()
                    if only is not None and suffix not in only:
                        continue
                    target = station_dir / Path(member).name
                    if target.exists() and target.stat().st_mtime >= zip_mtime:
                        continue
                    zf.extract(member, station_dir)
        except zipfile.BadZipFile as exc:
            msg = f"Downloaded file is not a valid ZIP archive: {zip_path}"
            raise RuntimeError(msg) from exc

    def get_epw(self, station: WeatherStation) -> Path:
        """Download and return the path to the EPW file.

        Extracts the full archive. To skip extraction of unwanted members,
        call ``download(station, only={".epw"}).epw`` directly.
        """
        return self.download(station).epw

    def get_ddy(self, station: WeatherStation) -> Path:
        """Download and return the path to the DDY file.

        Extracts the full archive. To skip extraction of unwanted members,
        call ``download(station, only={".ddy"}).ddy`` directly.
        """
        return self.download(station).ddy

    def _resolve_filename(self, filename: str, index: StationIndex | None) -> WeatherStation:
        """Resolve an EPW filename to a station, raising on failure."""
        if index is None:
            from .index import StationIndex as _StationIndex

            index = _StationIndex.load()
        stations = index.get_by_filename(filename)
        if not stations:
            msg = f"No weather station found for filename: {filename!r}"
            raise ValueError(msg)
        return stations[0]

    def get_epw_by_filename(
        self,
        filename: str,
        *,
        index: StationIndex | None = None,
    ) -> Path:
        """Download and return the EPW path for an EPW filename.

        Resolves the canonical EPW filename to a station via
        [StationIndex.get_by_filename][idfkit.weather.index.StationIndex.get_by_filename]
        and downloads the corresponding weather files.

        Args:
            filename: EPW filename or stem (with or without extension).
            index: A pre-loaded station index.  If ``None``, loads the
                default index via
                [StationIndex.load][idfkit.weather.index.StationIndex.load].

        Raises:
            ValueError: If the filename does not match any station.
        """
        return self.get_epw(self._resolve_filename(filename, index))

    def get_ddy_by_filename(
        self,
        filename: str,
        *,
        index: StationIndex | None = None,
    ) -> Path:
        """Download and return the DDY path for an EPW filename.

        Same as
        [get_epw_by_filename][idfkit.weather.download.WeatherDownloader.get_epw_by_filename]
        but returns the DDY file path.

        Args:
            filename: EPW filename or stem (with or without extension).
            index: A pre-loaded station index.

        Raises:
            ValueError: If the filename does not match any station.
        """
        return self.get_ddy(self._resolve_filename(filename, index))

    def clear_cache(self) -> None:
        """Remove all cached weather files.

        This removes the entire ``files/`` subdirectory within the cache,
        which contains all downloaded ZIP archives and extracted files.
        """
        files_dir = self._cache_dir / "files"
        if files_dir.exists():
            shutil.rmtree(files_dir)

    @staticmethod
    def _find_file(directory: Path, suffix: str) -> Path | None:
        """Find the first file with the given suffix in *directory*."""
        for p in directory.iterdir():
            if p.suffix.lower() == suffix.lower() and p.is_file():
                return p
        return None
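
The selective extraction performed by `_ensure_extracted` in the listing above can be tried in isolation. The sketch below follows the same suffix filter and mtime comparison, but it is a standalone approximation: `extract_matching` is a hypothetical name, and returning the list of extracted members is an addition made here for demonstration.

```python
from __future__ import annotations

import os
import tempfile
import time
import zipfile
from pathlib import Path


def extract_matching(zip_path: Path, dest: Path, only: frozenset[str] | None) -> list[str]:
    """Extract members whose lowercased suffix is in `only` (None = all)."""
    extracted: list[str] = []
    zip_mtime = zip_path.stat().st_mtime
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if only is not None and Path(member).suffix.lower() not in only:
                continue
            target = dest / Path(member).name
            # Skip members that already have an up-to-date copy on disk.
            if target.exists() and target.stat().st_mtime >= zip_mtime:
                continue
            zf.extract(member, dest)
            extracted.append(member)
    return extracted


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp)
    archive = dest / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        for name in ("site.epw", "site.ddy", "site.stat"):
            zf.writestr(name, "placeholder")
    # Backdate the archive so freshly extracted files are clearly newer.
    an_hour_ago = time.time() - 3600
    os.utime(archive, (an_hour_ago, an_hour_ago))

    first = extract_matching(archive, dest, frozenset({".epw"}))
    second = extract_matching(archive, dest, frozenset({".epw"}))
    everything = extract_matching(archive, dest, None)
```

The second call extracts nothing because the `.epw` copy is newer than the archive, and the final call extracts only the two members that are not yet on disk.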

download(station, *, only=None)

download(station: WeatherStation) -> WeatherFiles
download(
    station: WeatherStation, *, only: None
) -> WeatherFiles
download(
    station: WeatherStation, *, only: Iterable[str]
) -> PartialWeatherFiles

Download and extract weather files for station.

If the files are already cached and not stale, no network request is made.
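
The cache location that is checked for a hit is derived in the source listing from the station's URL and WMO number. A minimal sketch of that derivation, with a hypothetical helper name (`cache_paths`) and a made-up URL:

```python
from pathlib import Path


def cache_paths(cache_dir: Path, url: str, wmo: str) -> tuple[Path, Path]:
    """Return (station_dir, zip_path) for a station archive URL."""
    zip_filename = url.rsplit("/", maxsplit=1)[-1]
    stem = zip_filename.removesuffix(".zip")  # requires Python 3.9+
    station_dir = cache_dir / "files" / str(wmo) / stem
    return station_dir, station_dir / zip_filename


# Hypothetical URL and WMO number, used only to show the resulting layout.
station_dir, zip_path = cache_paths(
    Path("cache"),
    "https://example.invalid/region/Example_Station.123456_TMYx.zip",
    "123456",
)
print(station_dir.as_posix())  # cache/files/123456/Example_Station.123456_TMYx
print(zip_path.name)           # Example_Station.123456_TMYx.zip
```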

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| station | WeatherStation | The weather station to download files for. | required |
| only | Iterable[str] \| None | If given, extract only members whose suffix matches one of these values (e.g. {".epw"} or [".epw", ".ddy"]). Each entry is normalised to a lowercase suffix with a leading dot ("epw" and ".EPW" both match .epw members). When None (default), every member of the archive is extracted and the result is required to contain a .epw and a .ddy. | None |
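
The normalisation of `only` described above could look roughly like the following. The library's own `_normalise_suffixes` is not shown on this page, so this is an assumption-laden sketch under a different name (`normalise_suffixes`); skipping empty entries is a choice made here, not documented behaviour.

```python
from __future__ import annotations

from collections.abc import Iterable


def normalise_suffixes(only: Iterable[str] | None) -> frozenset[str] | None:
    """Lowercase each entry and make sure it starts with a dot."""
    if only is None:
        return None  # None means "extract everything"
    normalised: set[str] = set()
    for entry in only:
        suffix = entry.strip().lower()
        if not suffix:
            continue  # assumption: ignore empty entries
        if not suffix.startswith("."):
            suffix = "." + suffix
        normalised.add(suffix)
    return frozenset(normalised)


print(normalise_suffixes(["epw", ".EPW"]))  # frozenset({'.epw'})
print(normalise_suffixes(None))             # None
```

Note that passing a bare string such as "epw" to this sketch would iterate over its characters, so callers should wrap single values in a list or set.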

Returns:

| Type | Description |
| --- | --- |
| WeatherFiles \| PartialWeatherFiles | WeatherFiles for a full extraction, or PartialWeatherFiles when only= is set. |

Raises:

| Type | Description |
| --- | --- |
| RuntimeError | If the download or extraction fails, or if a full extraction is missing a required .epw or .ddy file. |
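
The way network errors surface as RuntimeError can be reproduced offline with `file://` URLs. The sketch below follows the try/except shape in the source listing; `fetch` and `USER_AGENT` are stand-ins, since the library's own `_USER_AGENT` value is not shown here.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

USER_AGENT = "example-client/0.1"  # hypothetical value for illustration


def fetch(url: str, dest: Path, timeout: float = 120) -> Path:
    """Download url to dest, wrapping transport errors in RuntimeError."""
    try:
        req = Request(url, headers={"User-Agent": USER_AGENT})
        with urlopen(req, timeout=timeout) as resp:
            dest.write_bytes(resp.read())
    except (HTTPError, URLError, TimeoutError, OSError) as exc:
        msg = f"Failed to download weather data from {url}: {exc}"
        raise RuntimeError(msg) from exc
    return dest


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.zip"
    source.write_bytes(b"archive bytes")
    # A local file URL stands in for the remote archive.
    copied = fetch(source.as_uri(), Path(tmp) / "copy.zip").read_bytes()

    error_message = ""
    error_cause = None
    try:
        fetch((Path(tmp) / "missing.zip").as_uri(), Path(tmp) / "never.zip")
    except RuntimeError as exc:
        error_message = str(exc)
        error_cause = exc.__cause__  # the original urllib/OS error
```

Because the original exception is chained with `from exc`, callers can still inspect `__cause__` to tell a timeout from a missing resource.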

WeatherFiles

idfkit.weather.download.WeatherFiles dataclass

Paths to a fully extracted weather bundle.

Returned by WeatherDownloader.download(station) (no only=). epw and ddy are guaranteed to be non-None: if the archive lacks either file, download raises RuntimeError instead of returning.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| epw | Path | Path to the .epw file. |
| ddy | Path | Path to the .ddy file. |
| stat | Path \| None | Path to the .stat file, or None if not included. |
| zip_path | Path | Path to the original downloaded ZIP archive. |
| station | WeatherStation | The station this download corresponds to. |

Source code in src/idfkit/weather/download.py
@dataclass(frozen=True)
class WeatherFiles:
    """Paths to a fully extracted weather bundle.

    Returned by ``WeatherDownloader.download(station)`` (no ``only=``).
    ``epw`` and ``ddy`` are guaranteed non-``None`` — a missing one raises
    during download.

    Attributes:
        epw: Path to the ``.epw`` file.
        ddy: Path to the ``.ddy`` file.
        stat: Path to the ``.stat`` file, or ``None`` if not included.
        zip_path: Path to the original downloaded ZIP archive.
        station: The station this download corresponds to.
    """

    epw: Path
    ddy: Path
    stat: Path | None
    zip_path: Path
    station: WeatherStation

epw instance-attribute

ddy instance-attribute

stat instance-attribute

zip_path instance-attribute

station instance-attribute

PartialWeatherFiles

idfkit.weather.download.PartialWeatherFiles dataclass

Paths to a selectively extracted weather bundle.

Returned by WeatherDownloader.download(station, only=...). Any field whose suffix was not requested and not already cached on disk will be None.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| epw | Path \| None | Path to the .epw file, or None if not extracted. |
| ddy | Path \| None | Path to the .ddy file, or None if not extracted. |
| stat | Path \| None | Path to the .stat file, or None if not extracted. |
| zip_path | Path | Path to the original downloaded ZIP archive. |
| station | WeatherStation | The station this download corresponds to. |
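
Because every file field on a partial result may be None, callers usually need to check before using a path. The sketch below uses a stand-in dataclass (`PartialFilesSketch`) with the same three optional fields rather than the real class, and the helpers `available` and `require_epw` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PartialFilesSketch:
    """Stand-in with the same optional file fields as PartialWeatherFiles."""

    epw: Path | None
    ddy: Path | None
    stat: Path | None


def available(files: PartialFilesSketch) -> dict[str, Path]:
    """Map field name to path for every file that is present."""
    found: dict[str, Path] = {}
    for name in ("epw", "ddy", "stat"):
        path = getattr(files, name)
        if path is not None:
            found[name] = path
    return found


def require_epw(files: PartialFilesSketch) -> Path:
    """Return the EPW path or fail loudly when it was not extracted."""
    if files.epw is None:
        raise ValueError("No .epw file available; request it via only=")
    return files.epw


partial = PartialFilesSketch(epw=Path("site.epw"), ddy=None, stat=None)
present = available(partial)
epw_path = require_epw(partial)
```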

Source code in src/idfkit/weather/download.py
@dataclass(frozen=True)
class PartialWeatherFiles:
    """Paths to a selectively extracted weather bundle.

    Returned by ``WeatherDownloader.download(station, only=...)``. Any field
    whose suffix was not requested *and* not already cached on disk will be
    ``None``.

    Attributes:
        epw: Path to the ``.epw`` file, or ``None`` if not extracted.
        ddy: Path to the ``.ddy`` file, or ``None`` if not extracted.
        stat: Path to the ``.stat`` file, or ``None`` if not extracted.
        zip_path: Path to the original downloaded ZIP archive.
        station: The station this download corresponds to.
    """

    epw: Path | None
    ddy: Path | None
    stat: Path | None
    zip_path: Path
    station: WeatherStation

epw instance-attribute

ddy instance-attribute

stat instance-attribute

zip_path instance-attribute

station instance-attribute