Surfe Diem - Groundswell Direction (Cos Component) Forecast v1 (USA Southwest, 12h)

Model Description

A LightGBM regression model trained to predict the cos component of groundswell direction 12 hours in advance, using real-time buoy observations from NOAA's National Data Buoy Center (NDBC). The cos component is one half of a circular (sin/cos) decomposition that eliminates the 0/360° discontinuity in compass direction.

Developed by: Surfe Diem
Model type: Gradient Boosted Decision Trees (LightGBM)
Language: Python
License: MIT

Intended Use

Primary Use Case

Predict the cos component of groundswell direction. Pair with the ground_dir_sin model to reconstruct full direction in degrees. Forecast horizon: 12 hours.

Out-of-Scope Use

  • Horizons other than 12 hours (separate models exist for 6h, 24h, and 48h)
  • Predicting wave height or period, or using the cos output alone as a direction (it must be paired with ground_dir_sin for meaningful direction output)
  • Regions outside the California coast (model trained on USA Southwest NDBC stations only)
  • Real-time safety-critical applications without human oversight

Training Data

Source: NOAA NDBC Buoy Spectral Wave Density Data

Stations: 15 NDBC buoys along the California coast: 46011, 46012, 46013, 46014, 46022, 46025, 46026, 46027, 46028, 46042, 46047, 46053, 46054, 46069, 46086

Records: ~2.08M observations (259 Parquet files with aligned stdmet and spectral columns)

Features:

  • Meteorological: wave height, period, direction, wind speed/direction, pressure, temperature
  • Spectral compression: 9 physics-informed features derived from ~150 raw spectral bands
    • Ground swell energy, direction, quality (< 0.08 Hz)
    • Mid-range energy, direction, quality (0.08–0.12 Hz)
    • Wind wave energy, direction, quality (> 0.12 Hz)
  • Circular decomposition: sin/cos encoding for all direction columns
  • Temporal lag features: 1h, 3h, 6h, 12h lags across all features
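The circular decomposition and lag features listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the helper names, the `_sin`/`_cos`/`_lagNh` column suffixes, and the assumption of one row per hour for a single station are all hypothetical.

```python
import numpy as np
import pandas as pd


def add_circular_features(df, direction_cols):
    """Add <col>_sin and <col>_cos columns for each direction column given in degrees."""
    out = df.copy()
    for col in direction_cols:
        radians = np.deg2rad(out[col])
        out[f"{col}_sin"] = np.sin(radians)
        out[f"{col}_cos"] = np.cos(radians)
    return out


def add_lag_features(df, feature_cols, lags_hours=(1, 3, 6, 12)):
    """Add lagged copies of each feature, assuming one row per hour in time order."""
    out = df.copy()
    for col in feature_cols:
        for lag in lags_hours:
            # shift(lag) moves values down by `lag` rows; the first `lag` rows become NaN
            out[f"{col}_lag{lag}h"] = out[col].shift(lag)
    return out


# Hypothetical hourly series for a single station (mwd in degrees, wvht in metres).
obs = pd.DataFrame({
    "mwd": [350.0, 355.0, 0.0, 5.0, 10.0],
    "wvht": [2.0, 2.1, 2.2, 2.3, 2.4],
})
encoded = add_circular_features(obs, ["mwd"])
lagged = add_lag_features(encoded, ["wvht", "mwd_sin", "mwd_cos"], lags_hours=(1, 3))
```

Note how the directions 355° and 5° sit on opposite ends of the raw degree scale but have nearly identical cos values, which is the point of the encoding. With several stations in one frame, the shift would presumably need to be applied per station (for example via a group-by) so that lags do not leak across buoys.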

Split: 80/20 train/test, time-series ordered (no shuffle)
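A time-ordered 80/20 split of this kind might look like the sketch below. The function name and the `timestamp` column are assumptions made for illustration; the card does not say how the split is implemented.

```python
import pandas as pd


def time_ordered_split(df, time_col="timestamp", train_fraction=0.8):
    """Split rows chronologically: the earliest train_fraction for training, the rest for testing."""
    ordered = df.sort_values(time_col).reset_index(drop=True)
    cutoff = int(len(ordered) * train_fraction)
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]


# Hypothetical hourly frame with ten observations.
start = pd.Timestamp("2024-01-01")
frame = pd.DataFrame({
    "timestamp": start + pd.to_timedelta(list(range(10)), unit="h"),
    "value": list(range(10)),
})
train, test = time_ordered_split(frame)
```

Because no shuffling takes place, every test row is later in time than every training row, so lag features in the test set never draw on information from the future.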

Model Performance

Test MAE: 0.1314 (cos component, range [-1, 1])

The MAE is measured on the cos component, which ranges over [-1, 1]; it is not an error in degrees. Combine with the sin model via atan2(sin, cos) to recover degrees.
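As a rough sketch of that reconstruction, assuming the components were encoded as sin and cos of the direction in degrees, the two predictions can be combined like this. The function names are illustrative, and `angular_error_deg` is an added helper for judging the result in degrees, not a metric reported on this card.

```python
import numpy as np


def direction_from_components(sin_pred, cos_pred):
    """Convert predicted sin/cos components into compass degrees in [0, 360)."""
    angle = np.degrees(np.arctan2(sin_pred, cos_pred))
    return np.mod(angle, 360.0)


def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two directions, in degrees (0 to 180)."""
    diff = np.abs(np.asarray(pred_deg) - np.asarray(true_deg)) % 360.0
    return np.minimum(diff, 360.0 - diff)
```

Since atan2 depends only on the ratio and signs of its arguments, the pair of predictions does not need to lie exactly on the unit circle; a regression output slightly outside [-1, 1] still yields a valid angle.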

Evaluated on held-out data with proper time-series validation (train on past, test on future).

Training Details

Algorithm: LightGBM
Objective: Regression (MAE / L1 loss)
Learning rate: 0.05
Num leaves: 31
Feature fraction: 0.9
Bagging fraction: 0.8
Max iterations: 2000 (early stopping, patience=50)
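One plausible way to express these settings with the LightGBM Python API is sketched below. The mapping to parameter names is an assumption, as are `bagging_freq` (the card does not list it, but LightGBM only applies `bagging_fraction` when `bagging_freq` is non-zero) and the `train_model` helper itself.

```python
# Assumed mapping of the listed settings onto LightGBM parameter names.
PARAMS = {
    "objective": "regression_l1",  # MAE / L1 loss
    "metric": "l1",
    "learning_rate": 0.05,
    "num_leaves": 31,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,  # assumption: not stated on the card
}
MAX_ITERATIONS = 2000
EARLY_STOPPING_PATIENCE = 50


def train_model(X_train, y_train, X_valid, y_valid):
    """Train a booster with early stopping on a held-out validation set."""
    # Imported here so the settings above can be inspected without LightGBM installed.
    import lightgbm as lgb

    train_set = lgb.Dataset(X_train, label=y_train)
    valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)
    return lgb.train(
        PARAMS,
        train_set,
        num_boost_round=MAX_ITERATIONS,
        valid_sets=[valid_set],
        callbacks=[lgb.early_stopping(stopping_rounds=EARLY_STOPPING_PATIENCE)],
    )
```

With early stopping, 2000 acts as an upper bound: training halts once the validation L1 metric has failed to improve for 50 consecutive rounds.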

Feature engineering:

  • Station IDs encoded as fixed CategoricalDtype for inference consistency
  • Lag features filled with 0 for single-observation inference
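The two points above could be handled along these lines. This is a hedged sketch: the category order, the string representation of station IDs, the lag column names, and the `prepare_single_observation` helper are assumptions, and the real pipeline in the repository may differ.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Station list taken from the Training Data section; the ordering here is assumed.
STATION_IDS = [
    "46011", "46012", "46013", "46014", "46022",
    "46025", "46026", "46027", "46028", "46042",
    "46047", "46053", "46054", "46069", "46086",
]
STATION_DTYPE = CategoricalDtype(categories=STATION_IDS)


def prepare_single_observation(row, lag_columns):
    """Build a one-row frame with a fixed station category and zero-filled lags."""
    frame = pd.DataFrame([row])
    # A fixed dtype keeps each station's integer code identical at training and inference time.
    frame["station_id"] = frame["station_id"].astype(str).astype(STATION_DTYPE)
    for col in lag_columns:
        if col not in frame.columns:
            frame[col] = 0.0  # no history available for a single observation
    return frame


single = prepare_single_observation(
    {"station_id": "46025", "wvht": 2.5},
    lag_columns=["wvht_lag1h", "wvht_lag3h"],  # hypothetical lag column names
)
```

Without a fixed dtype, pandas would infer the categories from the single row, so every station would receive code 0 and the model would see the wrong station.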

How to Use

import lightgbm as lgb
import pandas as pd
import numpy as np
from huggingface_hub import hf_hub_download

# Load model
model_path = hf_hub_download(
    repo_id="surfe-diem/surfe-diem-v1-usa-southwest-ground-dir-cos-12h-model",
    filename="surfe_diem_v1_usa_southwest_ground_dir_cos_12h_model.txt",
)
model = lgb.Booster(model_file=model_path)

# Prepare observation with engineered features + lags + station_id
# See full inference pipeline in the GitHub repo
obs = pd.DataFrame({
    'wvht': [2.5], 'dpd': [12.0], 'apd': [8.5],
    'mwd': [270], 'wspd': [15.0], 'wdir': [280],
    'pres': [1013.0], 'atmp': [18.0], 'wtmp': [16.0],
    # ... + spectral band features + lag features + station_id
})

prediction = model.predict(obs)[0]  # cos component, nominally in [-1, 1]

Full inference pipeline available in the GitHub repo.

Limitations

  • No history for single observations: Lag features set to 0 for real-time single-row inference (slight accuracy degradation vs. buffered inference)
  • Regional specificity: Trained only on California coast buoys
  • Forecast horizon: 12 hours only; separate models cover 6h, 24h, and 48h
  • Spectral dependency: Full accuracy requires spectral band data; older buoy files without spectral data contribute only standard met features

Citation

@misc{surfediem2026wave,
  author = {Surfe Diem},
  title = {Wave Forecasting Models v1 - USA Southwest},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/surfe-diem}}
}

Model Card Contact

For questions or issues, please open an issue in the GitHub repository.
