Analysis Utilities (Analysis)
- class gwexpy.analysis.Bruco(target_channel: str, aux_channels: list[str], excluded_channels: list[str] | None = None)[source]
Bases: object
Brute-force Coherence (Bruco) scanner.
- target
The name of the target channel (e.g., DARM).
- Type:
str
- aux_channels
List of auxiliary channels to scan.
- Type:
List[str]
- excluded
List of channels to exclude from analysis.
- Type:
List[str]
- compute(start: int | float | None = None, duration: int | None = None, fftlength: float = 2.0, overlap: float = 1.0, parallel: int = 4, batch_size: int = 100, top_n: int = 5, block_size: int | str | None = None, target_data: TimeSeries | None = None, aux_data: TimeSeriesDict | Iterable[TimeSeries] | None = None, preprocess_batch: Callable[[TimeSeriesDict], TimeSeriesDict] | None = None) BrucoResult[source]
Execute the coherence scan.
- Parameters:
start (int or float, optional) – GPS start time. Required if not inferable from data.
duration (int, optional) – Duration of data in seconds. Required if not inferable.
fftlength (float) – FFT length in seconds.
overlap (float) – Overlap in seconds.
parallel (int) – Number of parallel jobs for reading data and computing coherence.
batch_size (int) – Channels per batch.
top_n (int) – Number of top channels to keep per frequency bin.
block_size (int or 'auto', optional) – Channels per block in Top-N updates.
target_data (TimeSeries, optional) – Pre-loaded target channel data.
aux_data (TimeSeriesDict or Iterable[TimeSeries], optional) – Pre-loaded auxiliary channels data. Can be a dictionary-like object or an iterable/generator yielding TimeSeries.
preprocess_batch (Callable, optional) – Batch preprocessing callback.
- Returns:
Object containing frequency-wise analysis results.
- Return type:
BrucoResult
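The interplay of duration, fftlength and overlap determines how many FFT segments are averaged per channel. The following is a minimal sketch of that arithmetic, assuming standard Welch segmentation; the helper name welch_segment_count is hypothetical and is not part of gwexpy, and the library's internal segmentation may differ.

```python
import math


def welch_segment_count(duration: float, fftlength: float, overlap: float) -> int:
    """Number of full FFT segments that fit in `duration` seconds.

    Hypothetical helper: assumes plain Welch segmentation in which each new
    segment starts (fftlength - overlap) seconds after the previous one.
    """
    if not 0 <= overlap < fftlength:
        raise ValueError("overlap must satisfy 0 <= overlap < fftlength")
    if duration < fftlength:
        return 0
    step = fftlength - overlap
    return int(math.floor((duration - fftlength) / step)) + 1


# With the documented defaults (fftlength=2.0 s, overlap=1.0 s, i.e. 50% overlap)
n_default = welch_segment_count(duration=64, fftlength=2.0, overlap=1.0)
# Same data without overlap
n_no_overlap = welch_segment_count(duration=64, fftlength=2.0, overlap=0.0)
print(n_default, n_no_overlap)
```

A larger segment count lowers the variance of the coherence estimate, at the cost of coarser frequency resolution when it is obtained by shortening fftlength.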
- class gwexpy.analysis.BrucoResult(frequencies: ndarray, target_name: str, target_spectrum: ndarray, top_n: int = 5, metadata: Mapping[str, str | int | float | bool] | None = None, block_size: int | str | None = None)[source]
Bases: object
Hold and analyze Bruco results with Top-N coherence per frequency bin.
- coherence_for_channel(channel: str, asd: bool = True) ndarray[source]
Get the coherence spectrum for a specific channel. Values are NaN where the channel is not in the Top-N.
- Parameters:
channel – Channel name.
asd – If True, return Amplitude Coherence. If False, Squared Coherence.
- Returns:
Coherence spectrum (same length as frequencies).
- dominant_channel(rank: int = 0) str | None[source]
Return the most frequent channel name at a given rank.
- generate_report(output_dir: str, max_rows: int = 2000, coherence_threshold: float = 0.5, plot_ranks: int = 3, asd: bool = True) str[source]
Generate an HTML report with plots and data summary.
- Parameters:
asd – If True (default), report and plots use ASD units.
- Returns:
Path to the generated HTML file.
- get_noise_projection(rank: int = 0, asd: bool = True, coherence_threshold: float = 0.0) tuple[ndarray, ndarray][source]
Calculate noise projection for the channel at a specific rank (0 = highest coherence).
- Parameters:
asd – If True (default), return ASD projection. If False, return PSD projection.
coherence_threshold – Frequencies with coherence below this value contribute zero noise.
- Returns:
(projection, coherence)
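A noise projection of this kind is commonly computed by scaling the target spectrum with the coherence. The sketch below assumes the usual Bruco convention (projected ASD = target ASD × sqrt(Coh^2), projected PSD = target PSD × Coh^2) and assumes the threshold is compared on the same scale as the returned coherence. Neither assumption is confirmed by this page, and noise_projection is a hypothetical stand-alone function, not the gwexpy method.

```python
import numpy as np


def noise_projection(target_asd, coh2, asd=True, coherence_threshold=0.0):
    """Return (projection, coherence) for one auxiliary channel.

    target_asd : amplitude spectral density of the target channel
    coh2       : squared coherence between target and auxiliary channel
    """
    target_asd = np.asarray(target_asd, dtype=float)
    coh2 = np.asarray(coh2, dtype=float)
    coh_amp = np.sqrt(coh2)

    # Assumption: the threshold applies to whichever coherence is returned.
    coherence = coh_amp if asd else coh2
    keep = coherence >= coherence_threshold

    if asd:
        projection = target_asd * coh_amp
    else:
        projection = target_asd**2 * coh2

    # Bins below the threshold contribute zero noise.
    return np.where(keep, projection, 0.0), coherence


proj_asd, coh = noise_projection([2.0, 4.0, 1.0], [0.25, 0.01, 0.81],
                                 coherence_threshold=0.5)
proj_psd, _ = noise_projection([2.0, 4.0, 1.0], [0.25, 0.01, 0.81], asd=False)
```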
- get_ranked_channels(limit: int = 5, band: tuple[float, float] | None = None) list[str][source]
Get a list of channels ranked by their total coherence contribution.
- Parameters:
limit – Maximum number of channels to return.
band – Optional (f_low, f_high) tuple (Hz). When given, only frequency bins inside the band contribute to the per-channel score. Bins outside the band that happen to appear in the Top-N arrays are ignored. NaN bins (outside Top-N) are always excluded via numpy.nanmax().
- Returns:
List of channel names sorted by importance.
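The band-limited ranking can be illustrated with plain NumPy. This sketch assumes the Top-N records are stored as two arrays of shape (top_n, n_bins), one with channel names and one with coherence values, and that each channel's score is its maximum in-band coherence. The storage layout, the scoring rule and the function name rank_channels are all assumptions for illustration; the actual gwexpy scoring may differ.

```python
import numpy as np


def rank_channels(frequencies, top_names, top_coh, limit=5, band=None):
    """Rank channels by their maximum in-band coherence (illustrative only)."""
    frequencies = np.asarray(frequencies, dtype=float)
    top_names = np.asarray(top_names, dtype=object)
    top_coh = np.asarray(top_coh, dtype=float)

    in_band = np.ones(frequencies.shape, dtype=bool)
    if band is not None:
        f_low, f_high = band
        in_band = (frequencies >= f_low) & (frequencies <= f_high)

    scores = {}
    for name in {n for n in top_names.ravel() if n is not None}:
        # Only bins where this channel is in the Top-N, inside the band, non-NaN
        mask = (top_names == name) & in_band & ~np.isnan(top_coh)
        if mask.any():
            scores[name] = float(top_coh[mask].max())
    return sorted(scores, key=scores.get, reverse=True)[:limit]


freqs = [10.0, 20.0, 30.0, 40.0]
names = [["A", "A", "B", "C"], ["B", "C", "A", None]]
coh = [[0.9, 0.8, 0.7, 0.95], [0.3, 0.6, 0.5, np.nan]]

ranked_all = rank_channels(freqs, names, coh)
ranked_band = rank_channels(freqs, names, coh, limit=2, band=(10.0, 30.0))
```

In this toy data, channel C ranks first over the full range only because of its 40 Hz bin; restricting the band to 10–30 Hz moves it out of the top two.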
- plot_coherence(ranks: Sequence[int] | None = None, channels: Sequence[str] | None = None, max_channels: int = 3, asd: bool = True, coherence_threshold: float = 0.0, save_path: str | None = None) Figure[source]
Plot coherence spectrum for selected ranks or channels.
- Default behavior (ranks=None, channels=None):
Plots the Top-K contributors (per-channel mode).
- Parameters:
asd – If True (default), plot Amplitude Coherence (sqrt(Coh^2)). If False, plot Squared Coherence (Coh^2).
coherence_threshold – Draw a horizontal line at this value (default 0.0=off).
- plot_projection(ranks: Sequence[int] | None = None, channels: Sequence[str] | None = None, max_channels: int = 3, asd: bool = True, coherence_threshold: float = 0.0, save_path: str | None = None) Figure[source]
Plot target spectrum and noise projections for selected ranks or channels.
- Default behavior (ranks=None, channels=None):
Plots the Top-K contributors (per-channel mode).
- plot_ranked(top_k: int = 3, band: tuple[float, float] | None = None, asd: bool = True, coherence_threshold: float = 0.0, save_path: str | None = None) Figure[source]
Plot coherence spectra for the top-ranked channels.
Selects channels via topk() (optionally band-limited) and delegates to plot_coherence().
- Parameters:
top_k – Number of top channels to plot.
band – Optional (f_low, f_high) frequency band (Hz) for band-limited ranking.
asd – If True (default), plot Amplitude Coherence sqrt(Coh^2).
coherence_threshold – Draw a horizontal reference line at this value (default 0.0 = off).
save_path – Optional file path to save the figure.
- Returns:
matplotlib.figure.Figure
- projection_for_channel(channel: str, asd: bool = True, coherence_threshold: float = 0.0) ndarray[source]
Calculate projection spectrum for a specific channel where it appears in Top-N.
- to_dataframe(ranks: Sequence[int] | None = None, stride: int = 1, asd: bool = True, coherence_threshold: float = 0.0) DataFrame[source]
Convert results to a long-form DataFrame.
- topk(n: int = 5, band: tuple[float, float] | None = None) list[str][source]
Return the top-n channels ranked by coherence.
This is a convenience alias for get_ranked_channels().
- Parameters:
n – Number of channels to return.
band – Optional (f_low, f_high) frequency band (Hz) for band-limited scoring.
- Returns:
List of up to n channel names, most coherent first.
- update_batch(channel_names: Sequence[str], coherences: ndarray) None[source]
Update the Top-N records with a new batch of results.
- Parameters:
channel_names – List of channel names in this batch.
coherences – Coherence matrix of shape (n_channels, n_bins). Must align to self.frequencies.
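A per-bin Top-N update can be sketched as "stack the current records with the new batch, sort each frequency bin, keep the best N". The code below is one possible NumPy formulation under that assumption. The function update_top_n and its array layout (shape (top_n, n_bins), NaN for empty slots) are hypothetical and do not describe the gwexpy internals or its block_size handling.

```python
import numpy as np


def update_top_n(top_coh, top_names, channel_names, coherences):
    """Merge a batch of coherence spectra into per-bin Top-N records."""
    top_n, n_bins = top_coh.shape
    batch_names = np.asarray(channel_names, dtype=object)
    coherences = np.asarray(coherences, dtype=float)
    if coherences.shape != (len(batch_names), n_bins):
        raise ValueError("coherences must have shape (n_channels, n_bins)")

    # Candidate pool: existing records followed by the new batch
    all_coh = np.vstack([top_coh, coherences])
    all_names = np.vstack(
        [top_names, np.broadcast_to(batch_names[:, None], coherences.shape)]
    )

    # Sort each bin in descending coherence; NaN (empty slot) sorts last
    sort_key = np.where(np.isnan(all_coh), -np.inf, all_coh)
    order = np.argsort(-sort_key, axis=0, kind="stable")[:top_n]
    return (np.take_along_axis(all_coh, order, axis=0),
            np.take_along_axis(all_names, order, axis=0))


# Start with empty Top-2 records over 3 frequency bins
top_coh = np.full((2, 3), np.nan)
top_names = np.full((2, 3), None, dtype=object)

top_coh, top_names = update_top_n(
    top_coh, top_names, ["X", "Y", "Z"],
    [[0.1, 0.9, 0.5], [0.8, 0.2, 0.4], [0.3, 0.3, 0.6]],
)
top_coh, top_names = update_top_n(top_coh, top_names, ["W"], [[0.5, 0.1, 0.7]])
```

Because only top_n rows per bin are retained between batches, memory use stays bounded regardless of how many auxiliary channels are scanned.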
- gwexpy.analysis.estimate_coupling(data_inj: TimeSeriesDict, data_bkg: TimeSeriesDict, fftlength: float, witness: str | None = None, frange: tuple[float, float] | None = None, threshold_witness: ThresholdStrategy | float = 25.0, threshold_target: ThresholdStrategy | float = 4.0, n_jobs: int | None = None, **kwargs: Any) CouplingResult | dict[str, CouplingResult][source]
Helper function to estimate a coupling function (CF).
- Parameters:
frange (tuple of float, optional) – Frequency range (fmin, fmax) to evaluate CF and CF upper limit. Values outside the range are set to NaN.
- class gwexpy.analysis.CouplingFunctionAnalysis[source]
Bases: object
Analysis class to estimate Coupling Functions (CF).
- compute(data_inj: TimeSeriesDict, data_bkg: TimeSeriesDict, fftlength: float, witness: str | None = None, frange: tuple[float, float] | None = None, overlap: float = 0, threshold_witness: ThresholdStrategy = <gwexpy.analysis.coupling.RatioThreshold object>, threshold_target: ThresholdStrategy = <gwexpy.analysis.coupling.RatioThreshold object>, n_jobs: int | None = None, memory_limit: float = 2147483648.0, **kwargs: object) CouplingResult | dict[str, CouplingResult][source]
Compute Coupling Function(s) from TimeSeriesDicts.
- Parameters:
data_inj (TimeSeriesDict) – Injection data (Witness + Targets).
data_bkg (TimeSeriesDict) – Background data (Witness + Targets).
fftlength (float) – FFT length in seconds.
witness (str, optional) – The name (key) of the witness channel. If None, the FIRST channel in data_inj is used.
frange (tuple of float, optional) – Frequency range (fmin, fmax) to evaluate CF and CF upper limit. Values outside the range are set to NaN.
overlap (float, optional) – Overlap in seconds (default 0).
threshold_witness (ThresholdStrategy) – Strategy to determine if Witness is excited.
threshold_target (ThresholdStrategy) – Strategy to determine if Target is excited.
n_jobs (int, optional) – Number of jobs for parallel processing. None means 1 unless in a joblib.parallel_config context. -1 means using all processors.
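For orientation, a coupling function in PEM-injection studies is conventionally the square root of the excess target power divided by the excess witness power. The sketch below applies that convention to plain PSD arrays, using simple ratio tests for "excited" bins and NaN outside frange. The function coupling_function, the ratio defaults (borrowed from the estimate_coupling signature) and the choice to return NaN where either channel is not excited are assumptions for illustration, not the gwexpy implementation.

```python
import numpy as np


def coupling_function(freqs, wit_inj, wit_bkg, tgt_inj, tgt_bkg,
                      frange=None, ratio_witness=25.0, ratio_target=4.0):
    """CF(f) = sqrt((P_tgt_inj - P_tgt_bkg) / (P_wit_inj - P_wit_bkg)).

    All spectral inputs are PSD arrays on the same frequency grid.
    """
    freqs, wit_inj, wit_bkg, tgt_inj, tgt_bkg = (
        np.asarray(a, dtype=float)
        for a in (freqs, wit_inj, wit_bkg, tgt_inj, tgt_bkg)
    )

    # Assumed ratio-style excitation tests on witness and target
    valid = (wit_inj > ratio_witness * wit_bkg) & (tgt_inj > ratio_target * tgt_bkg)
    if frange is not None:
        fmin, fmax = frange
        valid &= (freqs >= fmin) & (freqs <= fmax)

    cf = np.full(freqs.shape, np.nan)
    cf[valid] = np.sqrt(
        (tgt_inj[valid] - tgt_bkg[valid]) / (wit_inj[valid] - wit_bkg[valid])
    )
    return cf


freqs = [10.0, 20.0, 30.0]
cf_all = coupling_function(freqs, [100, 100, 10], [1, 1, 1], [10, 10, 10], [1, 1, 1])
cf_band = coupling_function(freqs, [100, 100, 10], [1, 1, 1], [10, 10, 10], [1, 1, 1],
                            frange=(15.0, 40.0))
```

In the example, the 30 Hz bin is NaN because the witness is only 10 times above background, below the assumed factor of 25.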
- class gwexpy.analysis.RatioThreshold(ratio: float = 2.0)[source]
Bases: ThresholdStrategy
Checks if P_inj > ratio * P_bkg_mean.
- Statistical Assumptions:
No specific statistical distribution is assumed.
Tests if injection power exceeds the background level by a fixed factor.
- Usage:
Best for simple, physical excess screening where precise statistical significance is less critical.
Extremely fast as it requires no variance estimation.
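The ratio test amounts to a single element-wise comparison. A minimal NumPy sketch, where ratio_check is a hypothetical stand-in operating on plain arrays rather than FrequencySeries objects:

```python
import numpy as np


def ratio_check(psd_inj, psd_bkg_mean, ratio=2.0):
    """True in bins where the injection PSD exceeds ratio * background PSD."""
    return np.asarray(psd_inj, dtype=float) > ratio * np.asarray(psd_bkg_mean, dtype=float)


excited = ratio_check([1.0, 2.0, 5.0], [1.0, 1.0, 1.0], ratio=2.0)
```

The comparison is strict, so a bin sitting exactly at ratio * P_bkg_mean is not flagged.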
- check(psd_inj: FrequencySeries, psd_bkg: FrequencySeries, raw_bkg: TimeSeries | None = None, **kwargs: object) ndarray[source]
- threshold(psd_inj: FrequencySeries, psd_bkg: FrequencySeries, raw_bkg: TimeSeries | None = None, **kwargs: object) ndarray[source]
- class gwexpy.analysis.SigmaThreshold(sigma: float = 3.0)[source]
Bases: ThresholdStrategy
Checks if P_inj > P_bkg + sigma * std_error.
Statistical Assumptions
Background Power Spectral Density (PSD) at each bin approximately follows a Gaussian distribution (valid when n_avg is sufficiently large).
The parameter n_avg represents the number of independent averages (e.g., in Welch’s method).
Assumes standard deviation of the noise reduces as 1 / sqrt(n_avg).
Meaning of Threshold
threshold = mean + sigma * (mean / sqrt(n_avg))
This is a statistical significance test, NOT a physical upper limit.
It identifies frequencies where the injection is statistically distinguishable from background variance.
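The threshold formula mean + sigma * (mean / sqrt(n_avg)) can be worked through numerically. In this sketch, sigma_threshold is a hypothetical helper on plain arrays, and n_avg is passed explicitly, whereas the library presumably derives it from the data.

```python
import numpy as np


def sigma_threshold(psd_bkg_mean, n_avg, sigma=3.0):
    """threshold = mean + sigma * (mean / sqrt(n_avg)), per frequency bin."""
    mean = np.asarray(psd_bkg_mean, dtype=float)
    return mean + sigma * mean / np.sqrt(n_avg)


bkg = np.array([4.0, 4.0])
thr = sigma_threshold(bkg, n_avg=16, sigma=3.0)   # 4 + 3 * 4 / 4 = 7 in each bin
significant = np.array([6.0, 8.0]) > thr
```

With 16 averages, the injection must exceed the background by 75% to pass a 3-sigma test; with 100 averages the same test requires only a 30% excess.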
Gaussian Approximation Validity
Welch PSD estimates follow a χ² distribution with 2K degrees of freedom (K = n_avg). The Gaussian approximation is valid when K ≥ 10 (approximately).
For K < 10, consider:
- Using PercentileThreshold (empirical distribution, no Gaussian assumption)
- Increasing FFT averaging by using longer data or shorter fftlength
References
Welch, P.D. (1967): PSD estimation via overlapped segment averaging
Bendat & Piersol, Random Data (4th ed., 2010), Ch. 11
Warning
This method relies heavily on the Gaussian and stationary assumptions. It may be unreliable if:
- The background contains significant non-Gaussian features (glitches)
- n_avg is small (< ~10), where the central limit theorem has not converged
- There are strong spectral lines (non-stationary or deterministic signals)
In such cases, PercentileThreshold is recommended as it uses the empirical distribution.
- check(psd_inj: FrequencySeries, psd_bkg: FrequencySeries, raw_bkg: TimeSeries | None = None, **kwargs: object) ndarray[source]
- threshold(psd_inj: FrequencySeries, psd_bkg: FrequencySeries, raw_bkg: TimeSeries | None = None, **kwargs: object) ndarray[source]
- class gwexpy.analysis.PercentileThreshold(percentile: float = 99.7, factor: float = 2.6)[source]
Bases: ThresholdStrategy
Threshold strategy based on empirical percentile of background distribution.
This strategy follows Appendix B of the PEM injection paper, using the 99.7th percentile of background segments and a correction factor to account for finite-averaging and χ² distribution scaling.
- Parameters:
percentile (float, default=99.7) – The percentile of the background distribution (0-100). 99.7% is equivalent to 3-sigma for Gaussian noise.
factor (float, default=2.6) – Correction factor (multiplier) for the percentile value. The value 2.6 is recommended in Appendix B.1 to set reduced χ² ≈ 1.
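An empirical percentile threshold can be sketched by taking, in each frequency bin, the chosen percentile across background segments and multiplying by the correction factor. The sketch assumes the background is available as a 2-D array of per-segment PSDs with shape (n_segments, n_bins). That layout and the helper name percentile_threshold are assumptions, and how gwexpy derives the segments from raw_bkg is not shown on this page.

```python
import numpy as np


def percentile_threshold(bkg_segment_psds, percentile=99.7, factor=2.6):
    """Per-bin threshold = factor * percentile of background segment PSDs."""
    segments = np.asarray(bkg_segment_psds, dtype=float)
    return factor * np.percentile(segments, percentile, axis=0)


# Three background segments, two frequency bins
bkg_segments = np.array([[1.0, 10.0],
                         [2.0, 20.0],
                         [3.0, 30.0]])
thr_max = percentile_threshold(bkg_segments, percentile=100.0)   # uses the per-bin maximum
thr_flat = percentile_threshold(np.ones((5, 2)))                  # constant background
```

Unlike the sigma-based test, this uses the observed spread of the background directly, so glitches in the background raise the threshold instead of silently violating a Gaussian assumption.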
- check(psd_inj: FrequencySeries, psd_bkg: FrequencySeries, raw_bkg: TimeSeries | None = None, **kwargs: object) ndarray[source]
- threshold(psd_inj: FrequencySeries, psd_bkg: FrequencySeries, raw_bkg: TimeSeries | None = None, **kwargs: Any) ndarray[source]
- gwexpy.analysis.association_edges(target: Any, matrix: Any, *, method: str = 'pearson', parallel: int | None = None, threshold: float | None = None, threshold_mode: str = 'abs', topk: int | None = None, return_dataframe: bool = True) Any[source]
Compute association edges between a target TimeSeries and a TimeSeriesMatrix.
Returns a DataFrame (default) with columns: ["source", "target", "score", "row", "col", "channel"].
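To make the idea of association edges concrete, the following sketch scores each auxiliary series against a target with the Pearson correlation coefficient, filters by absolute score, and keeps the strongest few. It works on plain NumPy arrays and returns a list of dicts. The function pearson_edges, its arguments and the edge direction (auxiliary channel as "source") are hypothetical, and the row, col and channel columns of the real output are omitted.

```python
import numpy as np


def pearson_edges(target, channels, names, target_name="target",
                  threshold=None, topk=None):
    """Score each channel against the target and return filtered edges."""
    target = np.asarray(target, dtype=float)
    edges = []
    for name, series in zip(names, channels):
        score = float(np.corrcoef(target, np.asarray(series, dtype=float))[0, 1])
        # Assumed 'abs' threshold mode: compare |score| with the threshold
        if threshold is None or abs(score) >= threshold:
            edges.append({"source": name, "target": target_name, "score": score})
    # Strongest associations first, regardless of sign
    edges.sort(key=lambda e: abs(e["score"]), reverse=True)
    return edges if topk is None else edges[:topk]


t = np.arange(8.0)
aux = [2.0 * t + 1.0,                       # perfectly correlated
       -t,                                  # perfectly anti-correlated
       np.where(t % 2 == 0, 1.0, -1.0)]     # weakly correlated alternating series
edges = pearson_edges(t, aux, ["a", "b", "c"], threshold=0.5)
best = pearson_edges(t, aux, ["a", "b", "c"], topk=1)
```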
- gwexpy.analysis.build_graph(edges: Any, *, backend: str = 'networkx', directed: bool = False, weight: str = 'score') Any[source]
Build a graph object from association edges.
If backend="none", returns edges unchanged.