Note

This page was generated from a Jupyter Notebook.

[1]:
# Skipped in CI: Colab/bootstrap dependency install cell.

Segment Analysis: Basic Pipeline

Goal: Learn how to use SegmentTable to manage time-keyed data analysis.

SegmentTable is a container for metadata and payload data (such as TimeSeries or PSDs) associated with specific time segments. It supports lazy loading, so large payloads are read only when they are needed.

1. Creating a SegmentTable

We’ll start by loading sample data from a CSV file. This CSV defines segments with GPS start and end times.

[2]:
import warnings

warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)

from pathlib import Path

from gwexpy.table import SegmentTable

# Resolve the sample path robustly for both repo-root and notebook-dir execution.
csv_candidates = []
for root in [Path.cwd(), *Path.cwd().parents]:
    csv_candidates.extend([
        root / "docs" / "_static" / "samples" / "sample_segment_data.csv",
        root / "_static" / "samples" / "sample_segment_data.csv",
    ])

sample_csv = next((path for path in csv_candidates if path.exists()), None)
if sample_csv is None:
    tried = [str(path) for path in csv_candidates]
    raise FileNotFoundError(f"Could not find sample_segment_data.csv. Tried: {tried}")

st = SegmentTable.read(str(sample_csv))

print(st)
st.display().head()
   start  end label          span
0      0    4     A    (0.0, 4.0)
1      4    8     B    (4.0, 8.0)
2     10   13     C  (10.0, 13.0)
3     15   20     D  (15.0, 20.0)
4     22   25     E  (22.0, 25.0)
[2]:
   start  end label          span
0      0    4     A    (0.0, 4.0)
1      4    8     B    (4.0, 8.0)
2     10   13     C  (10.0, 13.0)
3     15   20     D  (15.0, 20.0)
4     22   25     E  (22.0, 25.0)
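The sample file itself is not reproduced on this page. As a minimal sketch, assuming it has `start`, `end`, and `label` columns as the printed table suggests, the same rows can be parsed with the standard library alone. The CSV text below is reconstructed from the first five rows of the output (the real file may contain more), and `read_segments` is a hypothetical helper, not part of gwexpy.

```python
import csv
import io

# Hypothetical file contents, reconstructed from the table printed above.
SAMPLE_CSV = """start,end,label
0,4,A
4,8,B
10,13,C
15,20,D
22,25,E
"""


def read_segments(text):
    """Parse CSV text into dicts holding a label and a (start, end) span."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        start, end = float(record["start"]), float(record["end"])
        # Reject empty or reversed segments early.
        if end <= start:
            raise ValueError(f"segment {record['label']!r} has non-positive duration")
        rows.append({"label": record["label"], "span": (start, end)})
    return rows


segments = read_segments(SAMPLE_CSV)
for seg in segments:
    print(seg["label"], seg["span"])
```

The derived `span` tuple mirrors the `span` column that SegmentTable shows next to `start` and `end`.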

2. Visualizing Segments

You can quickly visualize the timeline of your segments.

[3]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
st.segments(ax=ax, label="Tutorial Segments")
plt.title("SegmentTable Timeline")
plt.show()
[Figure: SegmentTable Timeline]
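A timeline plot mainly shows which stretches of time are covered and where the gaps fall. As a rough, library-independent sketch (plain Python, not the SegmentTable API), the same information can be computed from the spans printed earlier. `find_gaps` is a hypothetical helper, and the code assumes the segments do not overlap.

```python
# Spans copied from the table printed in section 1.
spans = [(0.0, 4.0), (4.0, 8.0), (10.0, 13.0), (15.0, 20.0), (22.0, 25.0)]


def find_gaps(spans):
    """Return the uncovered intervals between consecutive, non-overlapping spans."""
    ordered = sorted(spans)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


gaps = find_gaps(spans)
livetime = sum(end - start for start, end in spans)

print(gaps)      # [(8.0, 10.0), (13.0, 15.0), (20.0, 22.0)]
print(livetime)  # 19.0
```

Segments A and B touch at t = 4, so no gap is reported between them.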

3. Lazy Loading Payloads

SegmentTable allows you to attach “loaders” to columns. Data is loaded only when it is actually accessed.

[4]:
from gwexpy.noise.wave import gaussian


def noise_loader(segment):
    # Generate synthetic noise for the segment
    duration = float(segment[1] - segment[0])
    return gaussian(duration=duration, sample_rate=1024, t0=float(segment[0]))

# Note: Use add_series_column for lazy-loadable payload data (kind='timeseries', etc.)
st.add_series_column("noise", loader=noise_loader, kind="timeseries")

# Accessing the first row's noise (triggers loading)
data_0 = st.row(0)["noise"]
print(f"Loaded {len(data_0)} samples starting at GPS {data_0.t0.value}")
Loaded 4096 samples starting at GPS 0.0
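To make the idea of a loader concrete, here is a minimal sketch of lazy loading in plain Python. It is not how SegmentTable is implemented. `LazyColumn` and `fake_loader` are hypothetical names, and caching the payload after the first access is an assumption made for this illustration.

```python
class LazyColumn:
    """Stores a loader and calls it for a row only when that row is accessed."""

    def __init__(self, spans, loader):
        self._spans = spans
        self._loader = loader
        self._cache = {}
        self.load_count = 0  # number of times the loader actually ran

    def __getitem__(self, index):
        if index not in self._cache:
            self.load_count += 1
            self._cache[index] = self._loader(self._spans[index])
        return self._cache[index]


def fake_loader(span, sample_rate=1024):
    """Stand-in payload: a list of zeros with duration * sample_rate entries."""
    start, end = span
    return [0.0] * int((end - start) * sample_rate)


spans = [(0.0, 4.0), (4.0, 8.0), (10.0, 13.0)]
column = LazyColumn(spans, fake_loader)

print(column.load_count)  # 0: nothing has been loaded yet
first = column[0]         # first access runs the loader
first_again = column[0]   # second access is served from the cache
print(len(first), column.load_count)  # 4096 1
```

The sample count matches the tutorial output: a 4 s segment at 1024 Hz yields 4 × 1024 = 4096 samples.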

4. Processing Rows

You can iterate over rows or use apply to process data.

[5]:
# Calculate RMS for each noise segment
# Use add_column for lightweight metadata results
st.add_column("rms", data=[row["noise"].rms().value for row in st])
st.display()
[5]:
   start  end label          span       rms                       noise
0      0    4     A    (0.0, 4.0)  1.004290  <timeseries: 4096 samples>
1      4    8     B    (4.0, 8.0)  1.018152  <timeseries: 4096 samples>
2     10   13     C  (10.0, 13.0)  1.000509  <timeseries: 3072 samples>
3     15   20     D  (15.0, 20.0)  0.998584  <timeseries: 5120 samples>
4     22   25     E  (22.0, 25.0)  0.982923  <timeseries: 3072 samples>
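The `rms` values above all sit close to 1.0, which is what unit-variance Gaussian noise would produce. The sketch below spells out the root-mean-square calculation with the standard library only. It assumes zero-mean, unit-variance noise and uses `random.gauss` as a stand-in for the tutorial's `gaussian` generator, so the numbers will not match the table exactly.

```python
import math
import random


def rms(samples):
    """Root mean square: sqrt of the mean of the squared samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Small hand-checkable case: sqrt((3**2 + 4**2) / 2) = sqrt(12.5)
print(rms([3.0, 4.0]))

# Stand-in for one 4 s segment at 1024 Hz (assumed unit-variance noise).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
noise_rms = rms(noise)
print(round(noise_rms, 3))  # expected to land near 1.0
```

With a few thousand samples the estimate fluctuates by about a percent around 1.0, consistent with the spread seen in the table.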

5. Quick Check (NBMAKE)

[6]:
assert "noise" in st.columns
assert len(st) > 0
print("Validation successful!")
Validation successful!