Architecture: Physics-Informed Hybrid ML Models
Dec 16, 2025 • Machine Learning, Architecture • 12 min read
This entry documents our hybrid physics-ML architecture, inspired by Raissi et al. (2019) on Physics-Informed Neural Networks. The core insight: physics constraints prevent unreasonable predictions while ML corrects systematic biases.
The Hybrid Formula
y_final = y_physics + f_ML(x, y_physics)
Where y_physics is the physics-based baseline prediction (dispersion relation, shoaling, refraction) and f_ML is the ML residual correction model that learns systematic biases from historical data.
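The formula can be sketched in a few lines of Python. This is a minimal illustration, not the production code: `hybrid_predict`, `HybridPrediction`, `toy_physics`, `toy_residual`, and the input keys are hypothetical names, and the toy physics model (exponential attenuation over distance) only stands in for the real dispersion, shoaling, and refraction baseline.

```python
import math
from dataclasses import dataclass
from typing import Callable, Mapping

Features = Mapping[str, float]


@dataclass
class HybridPrediction:
    physics: float     # y_physics: baseline from the physics model
    correction: float  # f_ML(x, y_physics): learned residual

    @property
    def final(self) -> float:
        # y_final = y_physics + f_ML(x, y_physics)
        return self.physics + self.correction


def hybrid_predict(
    x: Features,
    physics_model: Callable[[Features], float],
    residual_model: Callable[[Features, float], float],
) -> HybridPrediction:
    """Run the physics baseline, then let the ML model correct it."""
    y_physics = physics_model(x)
    correction = residual_model(x, y_physics)
    return HybridPrediction(physics=y_physics, correction=correction)


def toy_physics(x: Features) -> float:
    # Hypothetical stand-in: offshore height decayed exponentially with distance.
    return x["hs_offshore_m"] * math.exp(-x["attenuation_per_km"] * x["distance_km"])


def toy_residual(x: Features, y_physics: float) -> float:
    # Hypothetical stand-in for a trained model: the baseline runs 10% high.
    return -0.10 * y_physics


x = {"hs_offshore_m": 2.0, "attenuation_per_km": 0.001, "distance_km": 100.0}
pred = hybrid_predict(x, toy_physics, toy_residual)
print(f"physics={pred.physics:.2f} m, correction={pred.correction:+.2f} m, "
      f"final={pred.final:.2f} m")
```

Keeping the two terms separate in the return value is what makes the decomposition inspectable: the physics contribution and the ML correction can be reported side by side.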
Why Hybrid?
- Physics constraints prevent unreasonable predictions (negative wave heights, impossible arrival times)
- ML corrects systematic biases that pure physics models miss (local bathymetry effects, station-specific errors)
- Better extrapolation to unseen conditions than pure ML
- Interpretable decomposition: users can see physics vs. correction contributions
Model Components
Our ensemble consists of five specialized components, each targeting a specific prediction task:
- Binary Classification (XGBoost): Surfability prediction with 300 estimators, max_depth=6
- Wave Height Regression (Random Forest): Residual correction targeting RMSE < 0.15m
- Wave Period Regression (Random Forest): Period prediction targeting RMSE < 0.5s
- Quality Score (Ordinal Classifier): 5-class surf quality from Flat to Excellent
- Uncertainty Estimation: Ensemble variance from tree prediction spread
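One plausible reading of "ensemble variance from tree prediction spread" is to collect each tree's prediction from a fitted random forest and take the standard deviation across trees. The sketch below assumes scikit-learn's `RandomForestRegressor`; the helper name `predict_with_spread`, the synthetic data, and the small `n_estimators=50` are illustrative choices, not the production setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def predict_with_spread(forest, X):
    """Return (mean, std) of the per-tree predictions for each row of X."""
    # estimators_ holds the individual fitted decision trees of the forest.
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0.0, 0.05, size=200)

forest = RandomForestRegressor(n_estimators=50, max_depth=6, random_state=0)
forest.fit(X, y)

mean, std = predict_with_spread(forest, X[:5])
for m, s in zip(mean, std):
    print(f"correction = {m:+.3f} m, tree spread = {s:.3f} m")
```

The spread measures disagreement between trees, so it is a rough confidence signal rather than a calibrated predictive interval.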
Feature Engineering: 83 Hybrid Features
Features span eight categories, combining physical parameters with derived ML features:
- Source Wave (8): Hs, Tp, direction, steepness, spectrum width
- Propagation (9): Distance, bearing, alignment, travel time
- Bathymetry (4): Mean/min depth, gradient, shallow crossings
- Physics (5): Attenuation coefficient, period survival, exposure index
- Local Conditions (4): Shore wind speed and direction
- Tidal (12): Height, phase, spring/neap, currents, storm surge
- Temporal (3): Hour, day of week, month
- Spectral Partitions (28): Multi-modal sea state features (NEW)
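To make a few of these features concrete, here is a hedged sketch that derives steepness, travel time, and alignment from standard deep-water linear wave theory. The function names, the dictionary keys, and the cosine definition of alignment are assumptions made for illustration; the actual feature definitions may differ.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def deep_water_wavelength(tp_s: float) -> float:
    """Deep-water dispersion relation: L = g * T^2 / (2 * pi), in metres."""
    return G * tp_s ** 2 / (2 * math.pi)


def steepness(hs_m: float, tp_s: float) -> float:
    """Wave steepness as height over deep-water wavelength (dimensionless)."""
    return hs_m / deep_water_wavelength(tp_s)


def travel_time_hours(distance_km: float, tp_s: float) -> float:
    """Swell travel time at the deep-water group speed c_g = g * T / (4 * pi)."""
    group_speed_ms = G * tp_s / (4 * math.pi)
    return distance_km * 1000.0 / group_speed_ms / 3600.0


def alignment(swell_dir_deg: float, bearing_deg: float) -> float:
    """Cosine of the angle between swell travel direction and path bearing.

    Assumes both angles use the same convention (direction of travel).
    """
    return math.cos(math.radians(swell_dir_deg - bearing_deg))


def build_features(hs_m, tp_s, swell_dir_deg, distance_km, bearing_deg):
    """Assemble a small, hypothetical subset of the feature vector."""
    return {
        "hs_m": hs_m,
        "tp_s": tp_s,
        "steepness": steepness(hs_m, tp_s),
        "travel_time_h": travel_time_hours(distance_km, tp_s),
        "alignment": alignment(swell_dir_deg, bearing_deg),
    }


features = build_features(hs_m=2.0, tp_s=10.0, swell_dir_deg=270.0,
                          distance_km=1000.0, bearing_deg=270.0)
print(features)
```

Longer-period swell travels faster in deep water, which is why the travel-time feature depends on peak period as well as distance.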
Optimal Hyperparameters
n_estimators: 300
max_depth: 6
physics_weight: 0.3
min_samples_split: 5
min_samples_leaf: 2
Cross-Validation Methodology
Time-series cross-validation prevents future-to-past data leakage:
- Time-Split CV with expanding window
- 24-hour gap period between train/test to prevent temporal leakage
- Lead-time stratified analysis: 6h, 12h, 18h, 24h, 36h, 48h horizons
- Performance tracked per buoy station for regional calibration
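The expanding-window split with a gap can be sketched in plain Python. This is an illustrative implementation under stated assumptions: `expanding_window_splits` is a hypothetical helper, timestamps are given in hours and sorted ascending, and folds are equal-sized blocks of samples.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_window_splits(
    times_h: Sequence[float], n_splits: int, gap_h: float = 24.0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with a growing training window.

    Training samples must end at least gap_h hours before the first test
    sample, so no training row sits inside the gap period.
    """
    n = len(times_h)
    fold = n // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        test_start = k * fold
        # The last fold absorbs any leftover samples.
        test_end = n if k == n_splits else test_start + fold
        test_idx = list(range(test_start, test_end))
        cutoff = times_h[test_start] - gap_h
        train_idx = [i for i in range(test_start) if times_h[i] <= cutoff]
        if not train_idx:
            continue  # the gap consumed the whole training window for this fold
        yield train_idx, test_idx


# Ten days of hourly observations, for illustration only.
times = [float(h) for h in range(240)]
splits = list(expanding_window_splits(times, n_splits=3, gap_h=24.0))
for train_idx, test_idx in splits:
    print(f"train: 0..{train_idx[-1]} h, test: {test_idx[0]}..{test_idx[-1]} h")
```

scikit-learn's `TimeSeriesSplit` offers similar behaviour through its `gap` parameter, with the gap counted in samples rather than hours.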
References
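- Raissi, M., Perdikaris, P., & Karniadakis, G.E. (2019). Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. Journal of Computational Physics.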
- Karniadakis, G.E. et al. (2021). Physics-Informed Machine Learning. Nature Reviews Physics.
- Gneiting, T. & Katzfuss, M. (2014). Probabilistic Forecasting. Annual Review of Statistics and Its Application.