Note

This page was generated from a Jupyter Notebook. Download the notebook (.ipynb)

Time Series Forecasting with ARIMA Models


This notebook demonstrates how to model and forecast time series data using the ARIMA-family methods (AR, MA, ARMA, ARIMA/SARIMAX) available on the gwexpy TimeSeries class.

Setup

Import the necessary libraries.

[1]:
import warnings

import matplotlib.pyplot as plt
import numpy as np

warnings.filterwarnings(
    "ignore", message=r".*force_all_finite.*", category=FutureWarning
)
warnings.filterwarnings("ignore", category=FutureWarning, module=r"sklearn\..*")
warnings.filterwarnings("ignore", category=FutureWarning, module=r"pmdarima\..*")

from gwexpy.timeseries import TimeSeries

Creating Sample Data

To see how ARIMA models handle each component, we create synthetic data that combines a trend, seasonality (periodicity), and noise.

[2]:
# Trend + Seasonality + Noise
t0 = 0
dt = 0.1
duration = 30
t = np.arange(0, duration, dt)
n_samples = len(t)

# 1. Linear Trend
trend = 0.3 * t

# 2. Periodic component (Sine)
seasonality = 2.0 * np.sin(2 * np.pi * 0.2 * t)

# 3. Noise
np.random.seed(42)
noise = np.random.normal(0, 0.8, n_samples)

data = trend + seasonality + noise

# Create TimeSeries object
ts = TimeSeries(data, t0=t0, dt=dt, unit="V", name="Sample Data")

ts.plot()
plt.title("Original Data (Trend + Seasonality + Noise)")
plt.show()

1. AR (AutoRegressive) Model

The AR model assumes that the current value depends on past values. Use the ts.ar(p) method, where p is the order.

[3]:
# Fit with order p=3
model_ar = ts.ar(p=3)

# Display model summary
# print(model_ar.summary())

# Plot results
# forecast_steps specifies how many steps to forecast into the future
model_ar.plot(forecast_steps=50, alpha=0.5)
plt.title("AR(3) Model Forecast")
plt.show()
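The "depends on past values" idea can be made concrete without any library: an AR(3) process is generated by its defining recursion. The coefficients below are illustrative, not values fitted from our sample data.

```python
import numpy as np

# AR(3): x[t] = phi1*x[t-1] + phi2*x[t-2] + phi3*x[t-3] + eps[t]
rng = np.random.default_rng(0)
phi = np.array([0.5, -0.2, 0.1])  # illustrative (stationary) coefficients
n = 500
eps = rng.normal(0.0, 1.0, n)

x = np.zeros(n)
for t in range(3, n):
    # most recent value first, to line up with phi1, phi2, phi3
    x[t] = phi @ x[t - 3:t][::-1] + eps[t]
```

Fitting ts.ar(p=3) estimates coefficients of exactly this form (plus the noise variance) from the data.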

2. MA (Moving Average) Model

The MA model assumes that the current value depends on past prediction errors (white noise). Use the ts.ma(q) method.

[4]:
# Fit with order q=3
model_ma = ts.ma(q=3)

# Plot results
model_ma.plot(forecast_steps=50, alpha=0.5)
plt.title("MA(3) Model Forecast")
plt.show()
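By contrast, an MA(q) process is a weighted sum of only the last few noise terms, which NumPy expresses as a convolution of white noise with the coefficient vector (1, θ1, …, θq). The θ values here are illustrative, not fitted.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, 0.6, 0.3, 0.1])  # (1, theta1, theta2, theta3), illustrative
eps = rng.normal(0.0, 1.0, 500)

# MA(3): x[t] = eps[t] + 0.6*eps[t-1] + 0.3*eps[t-2] + 0.1*eps[t-3]
x = np.convolve(eps, theta, mode="full")[: len(eps)]

# Spot-check one sample against the definition
t = 10
manual = eps[t] + 0.6 * eps[t - 1] + 0.3 * eps[t - 2] + 0.1 * eps[t - 3]
```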

3. ARMA (AutoRegressive Moving Average) Model

The ARMA model combines AR and MA. Use the ts.arma(p, q) method.

[5]:
# Fit with p=2, q=2
model_arma = ts.arma(p=2, q=2)

model_arma.plot(forecast_steps=50, alpha=0.5)
plt.title("ARMA(2, 2) Model Forecast")
plt.show()

4. ARIMA / Auto-ARIMA

The ARIMA model removes non-stationarity (such as trends) by differencing, then applies the ARMA model.

Use the ts.arima() method. With the auto=True option, the pmdarima library automatically searches for the optimal parameters (p, d, q). This is the recommended approach.

[6]:
try:
    print("Searching for optimal ARIMA parameters... (this may take a few seconds)")

    # Seasonality can also be considered
    # m is the number of steps in a seasonal period
    model_auto = ts.arima(auto=True, auto_kwargs={"seasonal": True, "m": 50})

    # Show info of the optimal model found
    print(model_auto.summary())

    # Plot results (Forecasting for a longer period)
    ax = model_auto.plot(forecast_steps=100, alpha=0.5)
    plt.title("Auto-ARIMA Model Forecast (Seasonal)")
    plt.show()
except ImportError as e:
    print(f"Skipping Auto-ARIMA: {e}")

Searching for optimal ARIMA parameters... (this may take a few seconds)
                                      SARIMAX Results
===========================================================================================
Dep. Variable:                                   y   No. Observations:                  300
Model:             SARIMAX(1, 1, 2)x(1, 0, [], 50)   Log Likelihood                -404.141
Date:                             Thu, 26 Feb 2026   AIC                            820.282
Time:                                     20:48:31   BIC                            842.485
Sample:                                          0   HQIC                           829.169
                                             - 300
Covariance Type:                               opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept      0.0044      0.006      0.695      0.487      -0.008       0.017
ar.L1          0.8693      0.053     16.310      0.000       0.765       0.974
ma.L1         -1.6277      0.052    -31.464      0.000      -1.729      -1.526
ma.L2          0.7409      0.043     17.235      0.000       0.657       0.825
ar.S.L50       0.1604      0.071      2.263      0.024       0.021       0.299
sigma2         0.8667      0.070     12.431      0.000       0.730       1.003
===================================================================================
Ljung-Box (L1) (Q):                   0.13   Jarque-Bera (JB):                 0.69
Prob(Q):                              0.71   Prob(JB):                         0.71
Heteroskedasticity (H):               1.37   Skew:                            -0.10
Prob(H) (two-sided):                  0.12   Kurtosis:                         3.14
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
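The d in ARIMA's (p, d, q) counts differencing passes (the search above settled on d=1). A quick NumPy check shows why a single difference removes a linear trend, and two remove a quadratic: differencing turns the trend into a constant, which is stationary.

```python
import numpy as np

dt = 0.1
t = np.arange(0, 30, dt)

trend = 0.3 * t          # linear trend, as in the sample data
d1 = np.diff(trend)      # one difference -> constant 0.3*dt = 0.03

quad = 0.05 * t**2       # a quadratic trend would need d=2
d2 = np.diff(quad, n=2)  # two differences -> constant 2*0.05*dt**2 = 0.001
```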

5. Obtaining Forecast Data

Beyond plotting, you can directly obtain forecast values as a TimeSeries object using the .forecast() method. This also includes confidence intervals.

[7]:
if "model_auto" not in globals():
    print("Auto-ARIMA model not available (pmdarima may be missing); skipping forecast.")
else:
    steps = 50

    # Get forecast values and confidence intervals
    forecast_ts, intervals = model_auto.forecast(
        steps=steps, alpha=0.5
    )  # 50% confidence interval

    print("Forecast Data:")
    print(forecast_ts)

    print("\nConfidence Interval (Upper):")
    print(intervals["upper"])

    # Forecasting data can be saved or used for other analysis
    # forecast_ts.write("forecast.h5")
Forecast Data:
TimeSeries([ 9.02063095,  9.60946546, 10.04247608, 10.15366724,
            10.05259678, 10.38953409, 10.79815095, 10.71027742,
            11.00660771, 11.18205177, 11.08230487, 11.29883005,
            10.98339795, 11.34916921, 11.51885662, 11.45196033,
            11.87327376, 11.52379092, 11.68683663, 11.78939387,
            11.98098394, 11.6301629 , 11.97773482, 11.84042886,
            11.72116368, 11.91254849, 11.88390198, 11.78617506,
            11.8774454 , 11.82552207, 11.89784228, 11.97900471,
            12.11133126, 11.76628355, 12.22002271, 11.72134704,
            11.98239983, 12.11216074, 12.11237757, 12.04095272,
            12.1435007 , 12.16098299, 12.2071204 , 12.45443501,
            12.45772066, 12.39288272, 12.67001284, 12.66918601,
            12.8110433 , 12.8657357 ],
           unit: V,
           t0: 30.0 s,
           dt: 0.1 s,
           name: Sample Data_auto_forecast,
           channel: None)

Confidence Interval (Upper):
TimeSeries([ 9.64857296, 10.25546653, 10.7195876 , 10.87456042,
            10.82818582, 11.22844355, 11.7067511 , 11.69299174,
            12.06629413, 12.32035964, 12.29997326, 12.59592369,
            12.35948866, 12.80347361, 13.05033984, 13.05941406,
            13.55537472, 13.27914473, 13.51401064, 13.68694154,
            13.94746299, 13.66414849, 14.07782921, 14.0052681 ,
            13.9494223 , 14.20294246, 14.23519045, 14.19716117,
            14.34697639, 14.35248881, 14.48117848, 14.61768579,
            14.80437308, 14.51274095, 15.01898802, 14.57194853,
            14.88380013, 15.06355532, 15.11299318, 15.09004592,
            15.24035641, 15.30491312, 15.39746251, 15.69055103,
            15.73899569, 15.71872387, 16.03984813, 16.08246334,
            16.26722947, 16.36431549],
           unit: V,
           t0: 30.0 s,
           dt: 0.1 s,
           name: upper_ci,
           channel: None)
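For intuition about what multi-step forecasting does: for a pure AR model, the fitted recursion is iterated with future noise replaced by its expectation (zero), so a zero-mean AR(1) forecast decays geometrically toward the mean. A hand-rolled sketch with an illustrative φ (the real computation is handled by the underlying SARIMAX results object, here exposed as model.res):

```python
import numpy as np

phi = 0.9      # illustrative AR(1) coefficient, not a fitted value
x_last = 2.0   # last observed value
steps = 50

fc = np.empty(steps)
prev = x_last
for h in range(steps):
    prev = phi * prev  # E[eps] = 0, so the noise term drops out
    fc[h] = prev

# Closed form of the same recursion: phi**h * x_last, h = 1..steps
closed = phi ** np.arange(1, steps + 1) * x_last
```

The widening confidence intervals seen above reflect the accumulated variance of the noise terms that were dropped at each step.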

6. Application in Gravitational Wave Data Analysis: Noise Removal (Whitening) with ARIMA

In gravitational wave data analysis, a process called whitening is used to extract burst signals or ringdown waveforms from stationary “colored noise”. By fitting an ARIMA model to the background noise and subtracting its predictions from the measured values (i.e., taking the residuals), the noise can be removed and the signal enhanced.
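The core trick can be sketched with plain NumPy before bringing in ARIMA: if the noise is AR(1) with a known coefficient φ, the residual r[t] = x[t] − φ·x[t−1] recovers the driving white noise exactly, i.e. it flattens (whitens) the spectrum. ARIMA's job in the steps below is to estimate this structure rather than assume it.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4096
white = rng.normal(0.0, 1.0, n)

# AR(1) colored noise, same recursion as the mock data below
colored = np.zeros(n)
for i in range(1, n):
    colored[i] = 0.95 * colored[i - 1] + white[i]

# Whitening: subtract the one-step prediction phi * x[t-1]
resid = colored[1:] - 0.95 * colored[:-1]  # equals white[1:] exactly
```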

Step 1: Generating Colored Noise and Injecting a Signal

First, we create colored noise with strong low-frequency components (AR(1) process), and embed a weak signal (sine-Gaussian) that is difficult to distinguish otherwise. Note: We use “strain” as the unit, which is recognizable by Astropy.

[8]:
import matplotlib.pyplot as plt
import numpy as np

from gwexpy.timeseries import TimeSeries


def generate_mock_data(duration=10, fs=1024):
    t = np.linspace(0, duration, int(duration * fs))

    # 1. Background Noise (AR(1) process with strong autocorrelation)
    np.random.seed(42)
    white_noise = np.random.normal(0, 1, len(t))
    colored_noise = np.zeros_like(white_noise)
    for i in range(1, len(t)):
        colored_noise[i] = 0.95 * colored_noise[i - 1] + white_noise[i]

    # 2. Injected Signal (Sine-Gaussian)
    center_time = 5.0
    sig_freq = 50.0
    width = 0.1
    signal = (
        2.0
        * np.exp(-((t - center_time) ** 2) / (2 * width**2))
        * np.sin(2 * np.pi * sig_freq * t)
    )

    data = colored_noise + signal
    # Use a standard unit string that astropy/gwpy can recognize
    return TimeSeries(data, t0=0, dt=1 / fs, unit="strain", name="MockData")


ts = generate_mock_data()

plt.figure(figsize=(10, 4))
plt.plot(ts, color="gray")
plt.title("Original Noisy Data (Signal is hidden here)")
plt.xlabel("Time [s]")
plt.grid(True, linestyle=":")
plt.show()

Step 2: Noise Modeling with ARIMA

Using auto=True, we search for the optimal model that expresses the autocorrelation structure of the background noise.

[9]:
print("Searching for optimal noise model parameters...")
# We prefer a pure AR structure for whitening, so we allow up to max_p=5 and force max_q=0.
model = ts.arima(auto=True, max_p=5, max_q=0)
print(f"Best model order: {model.res.model.order}")
print(model.summary())
Searching for optimal noise model parameters...
Best model order: (5, 1, 0)
                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                10240
Model:               SARIMAX(5, 1, 0)   Log Likelihood              -14694.968
Date:                Thu, 26 Feb 2026   AIC                          29403.936
Time:                        20:48:38   BIC                          29454.574
Sample:                             0   HQIC                         29421.057
                              - 10240
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept     -0.0003      0.010     -0.033      0.973      -0.020       0.019
ar.L1         -0.0379      0.010     -3.794      0.000      -0.057      -0.018
ar.L2         -0.0413      0.010     -4.150      0.000      -0.061      -0.022
ar.L3         -0.0271      0.010     -2.692      0.007      -0.047      -0.007
ar.L4         -0.0303      0.010     -3.078      0.002      -0.050      -0.011
ar.L5         -0.0216      0.010     -2.195      0.028      -0.041      -0.002
sigma2         1.0330      0.014     72.089      0.000       1.005       1.061
===================================================================================
Ljung-Box (L1) (Q):                   0.01   Jarque-Bera (JB):                 0.54
Prob(Q):                              0.94   Prob(JB):                         0.76
Heteroskedasticity (H):               1.03   Skew:                             0.01
Prob(H) (two-sided):                  0.35   Kurtosis:                         3.03
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

Step 3: Extracting and Visualizing Residuals

By subtracting the “stationary noise” predicted by the model, we obtain the residuals \(x_r = x - \hat{x}\). This is the whitened data.

[10]:
ts_whitened = model.residuals()
ts_whitened.name = "Whitened Data"

fig, ax = plt.subplots(2, 1, figsize=(10, 8), sharex=True)

ax[0].plot(ts, color="gray", alpha=0.5, label="Original")
ax[0].set_title("Original Noisy Data")
ax[0].legend(loc="upper right")

ax[1].plot(ts_whitened, color="red", label="Residuals (Whitened)")
ax[1].set_title("Whitened Data (Signal is now visible!)")
ax[1].axvspan(4.8, 5.2, color="orange", alpha=0.3, label="Injected Signal Position")
ax[1].legend(loc="upper right")

plt.tight_layout()
plt.show()

Step 4: Evaluation in the Frequency Domain

To quantify the effect of whitening, we compute and compare the power spectral densities (PSD) of the original and whitened data.

[11]:
fft_original = ts.psd()
fft_whitened = ts_whitened.psd()

plt.figure(figsize=(10, 5))
plt.loglog(
    fft_original.frequencies.value,
    fft_original.value,
    label="Original Noisy PSD",
    alpha=0.7,
)
plt.loglog(
    fft_whitened.frequencies.value,
    fft_whitened.value,
    label="Whitened PSD (Residuals)",
    alpha=0.7,
)
plt.title("PSD Comparison: Effect of ARIMA Whitening")
plt.xlabel("Frequency [Hz]")
plt.ylabel("PSD [1/Hz]")
plt.legend()
plt.grid(True, which="both", linestyle=":")
plt.show()
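As an independent sanity check on any PSD routine, a one-sided periodogram can be computed directly with NumPy's rfft; Parseval's theorem then requires the PSD integrated over frequency to equal the mean square of the time series. (This is a generic check, not part of the gwexpy API.)

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1024.0
x = rng.normal(0.0, 1.0, 4096)

n = len(x)
X = np.fft.rfft(x)
psd = (np.abs(X) ** 2) / (fs * n)  # periodogram in units^2/Hz
psd[1:-1] *= 2                     # fold in negative frequencies (n even)
freqs = np.fft.rfftfreq(n, d=1 / fs)
df = fs / n

power_freq = psd.sum() * df        # integral of the PSD over frequency
power_time = np.mean(x**2)         # mean square in the time domain
```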

Summary

  • Using model.residuals(), whitening can be easily performed without writing complex mathematical formulas.

  • This technique is very powerful for searching for burst signals in gravitational wave analysis and for extracting ringdown waveforms (exponentially decaying sinusoids) immediately after events.