Note
This page was generated from a Jupyter Notebook. Download the notebook (.ipynb)
[1]:
# Skipped in CI: Colab/bootstrap dependency install cell.
Case Study: Bruco and ICA Noise Reduction
This notebook demonstrates an end-to-end noise reduction workflow used in real interferometer commissioning:
Bruco – brute-force coherence scan to identify the most correlated auxiliary channels.
ICA – Independent Component Analysis to separate noise sources.
Noise subtraction – remove noise contributions from the DARM channel and compare ASDs.
This reproduces the workflow used in O4b commissioning (e.g., DARM 116 Hz line investigation). The physical intent is to move from “which sensor sees the line” to “which latent source can be subtracted from DARM without destroying unrelated structure”.
Prerequisites: Familiarity with
Bruco tutorial — Bruco basics
PCA/ICA tutorial — decomposition basics
Setup
[2]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)
import warnings
with warnings.catch_warnings():
warnings.simplefilter('ignore')
# ruff: noqa: I001
import matplotlib.pyplot as plt
import numpy as np
from astropy import units as u
from gwexpy.analysis import Bruco
from gwexpy.timeseries import TimeSeries, TimeSeriesDict, TimeSeriesMatrix
1. Mock Data Generation
We simulate a scenario where the DARM channel contains:
Broadband Gaussian noise (the “signal” floor)
A 116 Hz line noise leaking from environmental sensors
Four PEM (Physical Environment Monitor) auxiliary channels each contain varying amounts of the same 116 Hz line. This mirrors the real O4b commissioning case.
[3]:
rng = np.random.default_rng(42)
fs = 512 # sample rate [Hz]
T = 64.0 # duration [s]
n = int(fs * T)
t = np.arange(n) / fs
FREQ_LINE = 116.0 # Hz — the line noise we want to remove
# ----- Independent sources -----
src_line = np.sin(2 * np.pi * FREQ_LINE * t) # coherent line noise
src_broad = rng.normal(0, 1, n) # broadband floor noise
# ----- DARM: broadband + line leakage -----
DARM_LINE_COUPLING = 0.5
darm_noise = src_broad + DARM_LINE_COUPLING * src_line
target = TimeSeries(
darm_noise, dt=1 / fs, unit=u.dimensionless_unscaled,
name="K1:CAL-CS_PROC_DARM_STRAIN_DBL_DQ", t0=0,
)
# ----- 4 auxiliary channels (different line content) -----
aux_configs = {
"K1:PEM-ACC_PSL_TABLE_PSL1_Y": (0.9, 0.1), # 90% line, 10% noise
"K1:PEM-ACC_PSL_TABLE_PSL2_X": (0.7, 0.3),
"K1:PEM-MIC_PSL_TABLE_PSL1_Z": (0.5, 0.5),
"K1:PEM-MIC_PSL_TABLE_PSL2_Z": (0.2, 0.8), # mostly noise
}
aux_dict = TimeSeriesDict({
name: TimeSeries(
a_line * src_line + a_noise * rng.normal(0, 1, n),
dt=1 / fs, unit=u.dimensionless_unscaled, name=name, t0=0,
)
for name, (a_line, a_noise) in aux_configs.items()
})
print(f"DARM sample rate: {target.sample_rate}")
print(f"Aux channels: {list(aux_dict.keys())}")
DARM sample rate: 512.0 Hz
Aux channels: ['K1:PEM-ACC_PSL_TABLE_PSL1_Y', 'K1:PEM-ACC_PSL_TABLE_PSL2_X', 'K1:PEM-MIC_PSL_TABLE_PSL1_Z', 'K1:PEM-MIC_PSL_TABLE_PSL2_Z']
[4]:
# Visualize the line noise in the ASD before cleaning
fig, ax = plt.subplots(figsize=(10, 4))
asd = target.asd(fftlength=4, overlap=2)
ax.loglog(asd.frequencies.value, asd.value, label="DARM (original)", color="steelblue")
for name, ts in aux_dict.items():
a = ts.asd(fftlength=4, overlap=2)
ax.loglog(a.frequencies.value, a.value, alpha=0.5, lw=0.8, label=name.split(":")[-1])
ax.axvline(FREQ_LINE, color="red", ls="--", label=f"{FREQ_LINE} Hz line")
ax.set_xlim(1, fs / 2)
ax.set_xlabel("Frequency [Hz]")
ax.set_ylabel("ASD [1/√Hz]")
ax.set_title("ASD before cleaning — line noise visible at 116 Hz")
ax.legend(fontsize=7, ncol=2)
plt.tight_layout()
plt.show()
3. ICA: Separate Noise Sources
From the Bruco scan we know which channels are highly coherent with DARM near 116 Hz. We now stack these channels into a TimeSeriesMatrix and apply ICA to unmix the underlying independent sources.
The ICA model gives us a mixing matrix A such that:
X = S · A^T (X: observed channels, S: independent sources)
We can then subtract the noise-source contributions from DARM.
Failure mode to watch for: ICA components are not automatically ordered by physical importance, and the sign / scale of each component is arbitrary. Always inspect spectra and channel loadings before deciding which component represents the line you want to subtract.
Note — ICA convergence: FastICA may not converge within the default number of iterations on some datasets. If a
ConvergenceWarningappears, consider increasingmax_iterortolin the ICA parameters. Results may still be usable even without full convergence.A second common failure mode is rank deficiency: if you feed nearly identical witnesses into ICA, the decomposition becomes unstable and the subtraction weights are hard to trust. Prefer a small set of informative but not perfectly redundant channels.
[8]:
# Pick the top-2 channels from Bruco result
TOP_CHANNELS = df_line["channel"].dropna().unique()[:2].tolist()
print("Selected channels for ICA:", TOP_CHANNELS)
# Stack DARM + top channels into a TimeSeriesMatrix (shape: n_ch × 1 × n_samples)
channels = [target] + [aux_dict[ch] for ch in TOP_CHANNELS]
data_3d = np.stack([ch.value for ch in channels], axis=0)[:, np.newaxis, :]
tsm = TimeSeriesMatrix(
data_3d, dt=1 / fs, unit=u.dimensionless_unscaled, t0=0,
)
print(f"TimeSeriesMatrix shape: {tsm.shape} (n_channels × 1 × n_samples)")
Selected channels for ICA: ['K1:PEM-ACC_PSL_TABLE_PSL1_Y', 'K1:PEM-ACC_PSL_TABLE_PSL2_X']
TimeSeriesMatrix shape: (3, 1, 32768) (n_channels × 1 × n_samples)
[9]:
import warnings
with warnings.catch_warnings():
warnings.simplefilter('ignore')
# Run ICA
n_components = len(channels)
ica_sources, ica_model = tsm.ica(n_components=n_components, return_model=True)
sk = ica_model.sklearn_model
print(f"ICA stopped after {sk.n_iter_} iterations")
print(f"Mixing matrix A shape: {sk.mixing_.shape}") # (n_channels, n_components)
# Visualize ICA sources in frequency domain
fig, axes = plt.subplots(1, n_components, figsize=(4 * n_components, 4), sharey=True)
for k in range(n_components):
src_ts = TimeSeries(
ica_sources.value[k, 0, :], dt=1 / fs,
unit=u.dimensionless_unscaled, t0=0,
)
asd_k = src_ts.asd(fftlength=4, overlap=2)
axes[k].semilogy(asd_k.frequencies.value, asd_k.value)
axes[k].axvline(FREQ_LINE, color="red", ls="--", lw=0.8)
axes[k].set_title(f"ICA source #{k}")
axes[k].set_xlim(1, fs / 2)
axes[k].set_xlabel("Frequency [Hz]")
axes[0].set_ylabel("ASD [arb/√Hz]")
plt.suptitle("ICA independent components — one should peak at 116 Hz")
plt.tight_layout()
plt.show()
ICA stopped after 200 iterations
Mixing matrix A shape: (3, 3)
4. Noise Subtraction and ASD Comparison
Identify which ICA components contain the 116 Hz line (by inspecting the ASD), then subtract their contribution from DARM using the mixing matrix.
DARM_clean = DARM - Σ_k A[0, k] · S_k(t) (summed over noise components k)
Common mistake: subtracting every ICA component that looks non-Gaussian. Only subtract components whose spectral content and witness loadings match the targeted coupling mechanism; otherwise you risk removing unrelated detector structure along with the line.
[10]:
# Identify the noise component: the one with the largest ASD at FREQ_LINE
fftlength = 4.0
overlap = 2.0
freqs = np.fft.rfftfreq(int(fftlength * fs), d=1 / fs)
line_bin = np.argmin(np.abs(freqs - FREQ_LINE))
A = sk.mixing_ # (n_channels, n_components)
X_raw = data_3d[:, 0, :].T # (n_samples, n_channels)
S = sk.transform(X_raw) # (n_samples, n_components)
# ASD of each component at the line frequency
asd_at_line = []
for k in range(n_components):
s_ts = TimeSeries(S[:, k], dt=1 / fs, unit=u.dimensionless_unscaled, t0=0)
asd_k = s_ts.asd(fftlength=fftlength, overlap=overlap)
val = float(np.interp(FREQ_LINE, asd_k.frequencies.value, asd_k.value))
asd_at_line.append(val)
noise_component = int(np.argmax(asd_at_line))
print(f"Dominant noise component: #{noise_component} (ASD at {FREQ_LINE} Hz = {asd_at_line[noise_component]:.4f})")
# Subtract noise component from DARM (channel index 0)
darm_clean = X_raw[:, 0] - A[0, noise_component] * S[:, noise_component]
target_clean = TimeSeries(
darm_clean, dt=1 / fs, unit=u.dimensionless_unscaled,
name="DARM_cleaned", t0=0,
)
Dominant noise component: #2 (ASD at 116.0 Hz = 1.5764)
[11]:
# Compare original vs cleaned ASD
asd_orig = target.asd(fftlength=fftlength, overlap=overlap)
asd_clean = target_clean.asd(fftlength=fftlength, overlap=overlap)
fig, ax = plt.subplots(figsize=(10, 5))
ax.loglog(asd_orig.frequencies.value, asd_orig.value, label="DARM original", color="steelblue", lw=1.2)
ax.loglog(asd_clean.frequencies.value, asd_clean.value, label="DARM cleaned", color="orangered", lw=1.2)
ax.axvline(FREQ_LINE, color="gray", ls="--", lw=0.8, label=f"{FREQ_LINE} Hz")
# Suppression factor at line frequency
ratio = float(np.interp(FREQ_LINE, asd_orig.frequencies.value, asd_orig.value)) / float(np.interp(FREQ_LINE, asd_clean.frequencies.value, asd_clean.value))
ax.set_xlim(50, 200)
ax.set_ylim(1e-4, 10)
ax.set_xlabel("Frequency [Hz]")
ax.set_ylabel("ASD [1/√Hz]")
ax.set_title(f"Noise reduction at {FREQ_LINE} Hz: ×{ratio:.1f} suppression")
ax.legend()
plt.tight_layout()
plt.show()
print(f"Suppression factor at {FREQ_LINE} Hz: {ratio:.2f}×")
Suppression factor at 116.0 Hz: 2.75×
Summary
Step |
Tool |
Output |
|---|---|---|
|
|
Top correlated channels at each frequency |
|
|
Independent components + mixing matrix |
|
Mixing matrix algebra |
Cleaned DARM channel |
Key takeaways
Bruco efficiently narrows down thousands of channels to the few that matter.
ICA goes beyond coherence: it separates independent source contributions even when multiple sensors share a common noise.
The mixing matrix
Aprovides a direct subtraction formula without time-domain filtering.Stable subtraction depends on witness selection and component identification, not on the ICA solver alone.
Next steps
Replace mock data with real NDS/GWF data (
TimeSeries.read()or NDS2).Run on longer segments and cross-validate with Bruco’s projection estimate.
Combine with
Spectrogram.normalize()(SNR spectrogram) to track the line over time.