File I/O Supported Formats Guide
A comprehensive guide to all file formats supported by gwexpy, including how to read and write each format.
This page covers only the end-user API (.read() / .write() class methods) and does not expose internal implementation details.
Supported Formats Overview
| Format | Extension | Read | Write | Recommended Class / Method | Dependencies | Notes |
|---|---|---|---|---|---|---|
| GWF | | Y | Y | | - (gwpy built-in) | gwpy standard format |
| HDF5 | | Y | Y | | - (gwpy built-in) | gwpy standard |
| LIGO_LW XML (DTTXML) | | Y | N | | - | |
| CSV / TXT | | Y | Y | | - (gwpy built-in) | ASCII format. Supports directory loading |
| Pickle | | Y | Y | | - | Python serialization |
| WAV | | Y | Y | | scipy | |
| MiniSEED | | Y | Y | | ObsPy | Seismic waveform format |
| SAC | | Y | Y | | ObsPy | Seismic waveform format |
| GSE2 | | Y | Y | | ObsPy | Seismic waveform format |
| KNET | | Y | N | | ObsPy | K-NET strong motion records |
| GBD | | Y | N | | - | |
| WIN / WIN32 | | Y | N | | ObsPy | NIED WIN format (improved parser) |
| MTH5 | | Y | Y | | mth5 | Magnetometer data (design-level support) |
| ATS | | Y | N | | - | Metronix binary parser |
| ROOT | | Y | Y | | - (via gwpy) | CERN ROOT tables |
| SQLite / SDB | | Y | N | | - | WeeWX / Davis weather data |
| NetCDF4 | | Y | Y | | xarray, netcdf4 | Auto-detects time dimension |
| Zarr | | Y | Y | | zarr | Cloud-optimized chunked arrays |
| Audio (MP3, FLAC, etc.) | | Y | Y | | pydub (+ffmpeg) | |
| NDS2 | (network) | Y | N | | nds2-client | Network data server |
| TDMS | | Y | N | | npTDMS | National Instruments |
| ORF | | N | N | - | - | Not implemented (stub) |
Note: Formats marked “gwpy built-in” (GWF, HDF5, CSV/TXT, Pickle) are handled through gwpy’s built-in IO pathways. Since gwexpy extends gwpy, these are available out of the box.
Format Details
GBD — GRAPHTEC Data Logger .gbd
Extension: .gbd
Read/Write: Read Y / Write N
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/logger.gbd", timezone="Asia/Tokyo")
Required arguments:
timezone (str or tzinfo): Logger’s local timezone (IANA name, e.g. "Asia/Tokyo", or UTC offset). Required. Omitting it raises ValueError.
Key optional arguments:
channels (iterable[str], optional): List of channel names to load. Defaults to all channels.
digital_channels (iterable[str], optional): Channel names to treat as digital. Defaults to auto-detecting ALARM, ALARMOUT, PULSE*, LOGIC*.
unit (str or Unit, optional): Physical unit override. Default 'V'.
epoch (float or datetime, optional): Epoch (GPS seconds) override. Datetimes are converted to GPS.
pad (float, optional): Padding value. Default NaN.
Dependencies: None (native implementation)
Notes:
Digital channels (ALARM, PULSE*, etc.) are binarized to 0/1.
If the HeaderSiz field is missing from the header, a ValueError is raised.
Scale factors are automatically extracted from the AMP section.
Implementation: gwexpy/timeseries/io/gbd.py
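The digital-channel handling described in the notes above can be illustrated without gwexpy. The following is a minimal sketch, assuming shell-style wildcard matching for the PULSE* and LOGIC* defaults and a simple "nonzero becomes 1" rule for binarization. The helper names is_digital and binarize are hypothetical and are not part of gwexpy's API.

```python
from fnmatch import fnmatchcase

# Default patterns transcribed from the digital_channels description above.
DEFAULT_DIGITAL_PATTERNS = ("ALARM", "ALARMOUT", "PULSE*", "LOGIC*")


def is_digital(name, patterns=DEFAULT_DIGITAL_PATTERNS):
    """Return True if a channel name matches any digital pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def binarize(samples):
    """Map nonzero samples to 1 and zero samples to 0 (assumed rule)."""
    return [1 if value != 0 else 0 for value in samples]


# Hypothetical channel data standing in for a decoded logger file.
channels = {
    "CH1": [0.12, 0.15, 0.11],
    "PULSE1": [0, 3, 0],
    "ALARM": [0, 0, 255],
}
decoded = {
    name: binarize(data) if is_digital(name) else data
    for name, data in channels.items()
}
print(decoded)
```

Analog channels pass through unchanged, while any channel matching a digital pattern ends up holding only 0 and 1.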
ATS — Metronix .ats
Extension: .ats
Read/Write: Read Y / Write N
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.ats")
Reading via the mth5 library (for .atss files):
from gwexpy.timeseries import TimeSeries
ts = TimeSeries.read("path/to/data.atss", format="ats.mth5")
Required arguments: None
Key optional arguments: None (metadata is automatically extracted from the binary header)
Dependencies:
Standard reader: None (native binary parser)
ats.mth5 format: Requires the mth5 library. Raises ImportError if not installed.
Notes:
Supports ATS header versions 80/81. CEA/sliced headers (version 1080) are not supported (NotImplementedError).
LSB values (mV/count) are automatically converted to Volts (V).
Channel names are auto-generated from header info in the format Metronix_{system}_{serial}_{type}_{sensor}_{serial}.
The ats.mth5 format requires mth5’s filename conventions. Use the default binary parser if file names do not conform.
Implementation: gwexpy/timeseries/io/ats.py
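The count-to-volt scaling and the channel naming pattern from the notes above can be sketched as follows. This is an illustration only: the function names are hypothetical, the header values are made up, and it assumes that the first {serial} in the pattern is the system serial and the second is the sensor serial.

```python
def counts_to_volts(counts, lsb_mv_per_count):
    """Scale raw ADC counts by the LSB (mV/count), then convert mV to V."""
    return [c * lsb_mv_per_count / 1000.0 for c in counts]


def channel_name(system, system_serial, channel_type, sensor, sensor_serial):
    """Build a name following the documented pattern (field meaning assumed)."""
    return (
        f"Metronix_{system}_{system_serial}_"
        f"{channel_type}_{sensor}_{sensor_serial}"
    )


# Made-up header values for illustration.
raw_counts = [0, 1000, -2000]
volts = counts_to_volts(raw_counts, lsb_mv_per_count=0.5)
name = channel_name("ADU07", "123", "Ex", "EFP06", "456")
print(name, volts)
```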
SDB — WeeWX / Davis Weather Station .sdb
Extension: .sdb, .sqlite, .sqlite3
Read/Write: Read Y / Write N
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/weewx.sdb")
Required arguments: None
Key optional arguments:
table (str, optional): Table name to read. Default 'archive'.
columns (list[str], optional): Column names to read. Defaults to known weather columns (barometer, outTemp, windSpeed, etc.).
Dependencies: None (uses stdlib sqlite3 + pandas)
Notes:
Automatic conversion from Imperial to SI units (e.g. degF to degC, inHg to hPa, mph to m/s, inch to mm).
The table must have a dateTime column (UNIX timestamp).
Sample rate is auto-estimated from the median dateTime interval.
Implementation: gwexpy/timeseries/io/sdb.py
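To make the sample-rate estimate and the unit conversion concrete, here is a small self-contained sketch that uses only the standard library. It builds an in-memory table shaped like a WeeWX archive (the rows are invented), takes the median dateTime interval, and converts two Imperial columns to SI. The conversion helpers are hypothetical names, not gwexpy functions.

```python
import sqlite3
from statistics import median


def fahrenheit_to_celsius(deg_f):
    return (deg_f - 32.0) * 5.0 / 9.0


def inhg_to_hpa(inhg):
    return inhg * 33.8639  # 1 inHg is approximately 33.8639 hPa


# Tiny in-memory stand-in for an archive table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archive (dateTime INTEGER, outTemp REAL, barometer REAL)"
)
rows = [
    (1700000000, 50.0, 29.92),
    (1700000300, 59.0, 29.95),
    (1700000600, 68.0, 30.00),
    (1700001200, 77.0, 30.01),  # one record is missing before this row
]
conn.executemany("INSERT INTO archive VALUES (?, ?, ?)", rows)
records = conn.execute(
    "SELECT dateTime, outTemp, barometer FROM archive ORDER BY dateTime"
).fetchall()
conn.close()

times = [r[0] for r in records]
intervals = [later - earlier for earlier, later in zip(times, times[1:])]
dt = median(intervals)   # the median is not thrown off by the single long gap
sample_rate = 1.0 / dt   # Hz

out_temp_c = [fahrenheit_to_celsius(r[1]) for r in records]
baro_hpa = [inhg_to_hpa(r[2]) for r in records]
print(dt, sample_rate, out_temp_c)
```

Using the median rather than the mean keeps an occasional dropped record from skewing the estimated interval.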
TDMS — National Instruments .tdms
Extension: .tdms
Read/Write: Read Y / Write N
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.tdms")
Required arguments: None
Key optional arguments:
channels (list[str], optional): Channels to read. Channel names use the format "GroupName/ChannelName". Defaults to all channels.
unit (str, optional): Physical unit override.
Dependencies: npTDMS — Raises ImportError if not installed, with a message suggesting pip install nptdms.
Notes:
Channel names are stored in "GroupName/ChannelName" format.
wf_increment (sample interval) and wf_start_time (start time) are read from TDMS properties.
Start times as numpy.datetime64 or datetime are automatically converted to GPS time.
Implementation: gwexpy/timeseries/io/tdms.py
WAV — Audio File .wav
Extension: .wav
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/audio.wav")
Required arguments: None
Key optional arguments: None
Dependencies: scipy (scipy.io.wavfile) — included in gwexpy’s recommended dependencies.
Notes:
t0 is always set to 0.0 (GPS seconds). WAV files do not carry absolute timestamps.
Multi-channel files have channel names channel_0, channel_1, etc.
Mono files are automatically loaded as a single channel.
Writing uses the gwpy standard WAV writer pathway.
Implementation: gwexpy/timeseries/io/wav.py
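The channel naming and fixed start time described above can be illustrated with the standard library alone. The sketch below is not gwexpy's reader (which, per the dependency note, goes through scipy.io.wavfile). It uses Python's wave module only so that the example is self-contained: it writes a three-frame stereo file, reads it back, and splits the interleaved samples into channel_0 and channel_1.

```python
import os
import struct
import tempfile
import wave

# Write a tiny stereo 16-bit WAV file so the example needs no input data.
frames = [(0, 100), (1000, -100), (-1000, 200)]  # (left, right) samples
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    wav.setframerate(8000)
    wav.writeframes(
        b"".join(struct.pack("<hh", left, right) for left, right in frames)
    )

# Read it back and de-interleave the samples into named channels.
with wave.open(path, "rb") as wav:
    n_channels = wav.getnchannels()
    sample_rate = wav.getframerate()
    raw = wav.readframes(wav.getnframes())

samples = struct.unpack("<" + "h" * (len(raw) // 2), raw)
channels = {
    f"channel_{i}": list(samples[i::n_channels]) for i in range(n_channels)
}
t0 = 0.0  # the file carries no absolute timestamp, so the start is fixed
print(sample_rate, t0, channels)
```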
MiniSEED — Seismic Waveform .mseed
Extension: .mseed
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
# Read
tsd = TimeSeriesDict.read("path/to/data.mseed", format="miniseed")
# Write
tsd.write("output.mseed", format="miniseed")
Required arguments: None
Key optional arguments:
channels (list[str], optional): Channels to read. Specify by trace ID (NET.STA.LOC.CHA) or channel code.
unit (str, optional): Physical unit override.
epoch (float or datetime, optional): Epoch override.
timezone (str, optional): Timezone specification.
pad (float, optional): Gap fill value. Default NaN.
gap (str, optional): Gap handling method. "pad" (default) or "raise".
Dependencies: ObsPy — Raises ImportError if not installed, with a message suggesting pip install obspy.
Notes:
Data is read via ObsPy’s read() function.
Gaps are padded with NaN by default. Use gap="raise" to raise an error instead.
Automatic trace merging (merge(method=1)) is applied.
Implementation: gwexpy/timeseries/io/seismic.py
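The difference between gap="pad" and gap="raise" can be sketched in plain Python. This is a simplified illustration of the idea, not ObsPy's merge algorithm or gwexpy's code: the function name merge_segments is hypothetical, and the sketch assumes the segments share one sample interval and do not overlap.

```python
import math


def merge_segments(segments, dt, gap="pad", pad=math.nan):
    """Merge (start_time, samples) segments onto one regular grid.

    gap="pad" fills missing samples with ``pad``; gap="raise" raises
    ValueError at the first gap. Overlapping segments are not handled.
    """
    segments = sorted(segments, key=lambda seg: seg[0])
    t0 = segments[0][0]
    merged = []
    for start, samples in segments:
        expected_index = round((start - t0) / dt)
        missing = expected_index - len(merged)
        if missing > 0:
            if gap == "raise":
                raise ValueError(f"gap of {missing} samples before t={start}")
            merged.extend([pad] * missing)
        merged.extend(samples)
    return t0, merged


# Two segments with two missing samples between them (invented data).
segments = [(0.0, [1.0, 2.0]), (4.0, [5.0, 6.0])]
t0, merged = merge_segments(segments, dt=1.0)
print(t0, merged)
```

Padding keeps the result on a single regular time axis, which is why NaN is a convenient default: downstream code can mask the filled samples instead of dealing with several disjoint traces.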
SAC — Seismic Waveform .sac
Extension: .sac
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.sac", format="sac")
tsd.write("output.sac", format="sac")
Required arguments: None
Key optional arguments: Same as MiniSEED (channels, unit, epoch, timezone, pad, gap).
Dependencies: ObsPy
Notes:
SAC is typically one trace per file. Multi-trace writing behavior depends on ObsPy.
Implementation: gwexpy/timeseries/io/seismic.py
GSE2 — Seismic Waveform .gse2
Extension: .gse2
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.gse2", format="gse2")
tsd.write("output.gse2", format="gse2")
Required arguments: None
Key optional arguments: Same as MiniSEED (channels, unit, epoch, timezone, pad, gap).
Dependencies: ObsPy
Implementation: gwexpy/timeseries/io/seismic.py
KNET — K-NET Strong Motion Records .knet
Extension: .knet
Read/Write: Read Y / Write N
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.knet", format="knet")
Required arguments: None
Key optional arguments: Same as MiniSEED (channels, unit, epoch, timezone, pad, gap).
Dependencies: ObsPy
Notes:
Read-only. No writer is registered.
Implementation: gwexpy/timeseries/io/seismic.py
WIN / WIN32 — NIED WIN Format .win / .cnt
Extension: .win, .cnt
Read/Write: Read Y / Write N
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.win", format="win")
# or
tsd = TimeSeriesDict.read("path/to/data.cnt", format="win32")
Required arguments: None
Key optional arguments:
century (str, optional): Century part of the year. Default "20".
Dependencies: ObsPy — If ObsPy is not installed, the reader is not registered and an ImportError is raised.
Notes:
Uses gwexpy’s improved parser over ObsPy’s standard WIN reader, with the following fixes:
0.5-byte (4-bit) delta decode: fixed lower nibble sign handling; skip unused nibble on odd delta counts.
3-byte (24-bit) delta decode: fixed operator precedence and sign-preserving unpack/shift.
Gaps are merged with NaN.
Implementation: gwexpy/timeseries/io/win.py
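The two decoding fixes listed above concern sign handling in packed delta values. The sketch below shows one way to decode signed 4-bit and 24-bit deltas correctly. It is not the gwexpy parser: the function names are hypothetical, and it assumes big-endian byte order with the high nibble stored first.

```python
def sign_extend(value, bits):
    """Interpret the low ``bits`` bits of ``value`` as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_nibble_deltas(payload, count):
    """Decode ``count`` signed 4-bit deltas, high nibble first (assumed)."""
    deltas = []
    for byte in payload:
        for nibble in (byte >> 4, byte & 0x0F):
            if len(deltas) == count:
                break  # odd count: the final low nibble is unused
            deltas.append(sign_extend(nibble, 4))
    return deltas


def decode_int24_deltas(payload):
    """Decode big-endian signed 24-bit deltas, preserving the sign."""
    usable = len(payload) - len(payload) % 3
    return [
        int.from_bytes(payload[i:i + 3], "big", signed=True)
        for i in range(0, usable, 3)
    ]


def integrate(first_sample, deltas):
    """Rebuild the samples by cumulatively summing the deltas."""
    samples = [first_sample]
    for delta in deltas:
        samples.append(samples[-1] + delta)
    return samples


nibble_deltas = decode_nibble_deltas(bytes([0x1F, 0x80]), count=3)
int24_deltas = decode_int24_deltas(bytes([0xFF, 0xFF, 0xFE, 0x00, 0x00, 0x02]))
print(integrate(100, nibble_deltas), int24_deltas)
```

Passing the delta count explicitly is what lets the decoder drop the padding nibble when the count is odd, and int.from_bytes with signed=True avoids hand-written shifts where operator precedence is easy to get wrong.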
DTTXML — Diag DTT XML (Time Series & Frequency Series)
Extension: .xml, .xml.gz
Read/Write: Read Y / Write N
Recommended API (time series):
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/dtt_output.xml", format="dttxml", products="TS")
Recommended API (frequency series):
from gwexpy.frequencyseries.collections import FrequencySeriesDict
fsd = FrequencySeriesDict.read("path/to/dtt_output.xml", format="dttxml", products="PSD")
from gwexpy.frequencyseries.matrix import FrequencySeriesMatrix
fsm = FrequencySeriesMatrix.read("path/to/dtt_output.xml", format="dttxml", products="TF")
Required arguments:
products (str): Product type to extract. Required. Raises ValueError if omitted.
Time series: "TS"
Frequency series: "PSD", "ASD", "FFT"
Matrix: "TF", "STF", "CSD", "COH"
Key optional arguments:
channels (iterable[str], optional): List of channels to read.
unit (str, optional): Physical unit override.
epoch (float or datetime, optional): Epoch override.
timezone (str, optional): Timezone specification.
native (bool, optional): If True, uses gwexpy’s native XML parser. Recommended for complex TF data (subtype 6 phase loss fix). Default False. (FrequencySeriesDict / FrequencySeriesMatrix only)
rows, cols, pairs: Matrix filtering (FrequencySeriesMatrix only).
Dependencies: None (native implementation)
Notes:
Supports both time-series and frequency-domain data. The output type is determined by the products value.
The .xml extension is auto-identified (format="dttxml" can be omitted).
Implementation: gwexpy/timeseries/io/dttxml.py, gwexpy/frequencyseries/io/dttxml.py
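Because the products value decides which container class to read with, a small lookup helper can serve as a reminder of the pairing. The mapping below is transcribed from the products list in this section. The helper itself (container_for) is hypothetical and is not provided by gwexpy, and class names are kept as strings so the sketch runs without gwexpy installed.

```python
# Product-to-container pairing, taken from the products list in this section.
PRODUCT_TO_CONTAINER = {
    "TS": "TimeSeriesDict",
    "PSD": "FrequencySeriesDict",
    "ASD": "FrequencySeriesDict",
    "FFT": "FrequencySeriesDict",
    "TF": "FrequencySeriesMatrix",
    "STF": "FrequencySeriesMatrix",
    "CSD": "FrequencySeriesMatrix",
    "COH": "FrequencySeriesMatrix",
}


def container_for(products=None):
    """Return the container class name to use for a products value."""
    if products is None:
        # Mirrors the documented behavior: products is required.
        raise ValueError("products is required for DTTXML files")
    if products not in PRODUCT_TO_CONTAINER:
        raise ValueError(f"unknown product type: {products!r}")
    return PRODUCT_TO_CONTAINER[products]


print(container_for("TS"), container_for("PSD"), container_for("TF"))
```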
GWF — Gravitational Wave Frame .gwf
Extension: .gwf
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.gwf", format="gwf")
tsd.write("output.gwf", format="gwf")
Required arguments: None (follows gwpy standard arguments)
Dependencies: — (gwpy standard. Internally uses python-ldas-tools-framecpp, etc.)
Notes:
Handled through gwpy’s standard IO pathway. No custom reader/writer in gwexpy.
Implementation: gwpy standard (gwpy/timeseries/io/gwf.py)
HDF5 — General Scientific Data .h5 / .hdf5
Extension: .h5, .hdf5
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.h5", format="hdf5")
tsd.write("output.h5", format="hdf5")
Required arguments: None
Dependencies: h5py (included in gwexpy’s required dependencies)
Notes:
gwexpy’s TimeSeriesDict.read() has extended HDF5 loading logic.
Supports automatic layout detection (LAYOUT_DATASET / LAYOUT_GROUP).
Supports keymap and ordering restoration.
Implementation: gwpy standard (gwpy/timeseries/io/hdf5.py) + gwexpy extension (gwexpy/timeseries/collections.py)
CSV / TXT — ASCII Text .csv / .txt
Extension: .csv, .txt
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
# Single file
tsd = TimeSeriesDict.read("path/to/data.csv", format="csv")
# Load all files from a directory
tsd = TimeSeriesDict.read("path/to/data_dir/")
Required arguments: None
Dependencies: None (gwpy standard)
Notes:
gwexpy supports loading all CSV/TXT files from a directory as a single TimeSeriesDict.
Implementation: gwpy standard (gwpy/timeseries/io/ascii.py) + gwexpy extension (gwexpy/timeseries/collections.py)
Pickle — Python Serialization .pkl
Extension: .pkl
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries import TimeSeries
ts = TimeSeries.read("path/to/data.pkl", format="pickle")
ts.write("output.pkl", format="pickle")
Required arguments: None
Dependencies: None
Notes:
Uses gwpy’s standard serialization pathway. Only use with files from trusted sources (standard pickle security considerations apply).
Implementation: gwpy standard
ROOT — CERN ROOT .root
Extension: .root
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.table import EventTable
table = EventTable.read("path/to/data.root", format="root")
Dependencies: — (via gwpy’s table/root pathway)
Notes:
gwexpy’s table/io/root.py is a re-export of gwpy’s module of the same name.
Primarily used for event table data.
Implementation: gwpy standard (gwpy/table/io/root.py) — via gwexpy: gwexpy/table/io/root.py
NDS2 — Network Data Server
Extension: None (network protocol)
Read/Write: Read Y / Write N
Recommended API:
from gwexpy.timeseries import TimeSeries
ts = TimeSeries.fetch("channel_name", start, end)
Dependencies: nds2-client (Python bindings)
Notes:
Network-based data retrieval, not file I/O.
Uses gwpy’s standard fetch() method.
Implementation: gwpy standard (gwpy/timeseries/io/nds2.py)
Audio — MP3 / FLAC / OGG / M4A (via pydub)
Extension: .mp3, .flac, .ogg, .m4a
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
# Read (auto-detected by extension)
tsd = TimeSeriesDict.read("path/to/audio.mp3")
# Explicit format
tsd = TimeSeriesDict.read("path/to/audio.dat", format="flac")
# Write
tsd.write("output.flac", format="flac")
Required arguments: None
Key optional arguments:
channels (iterable[str], optional): Channel names to read (e.g. "channel_0").
unit (str, optional): Physical unit override.
Dependencies: pydub — Raises ImportError if not installed, with a message suggesting pip install pydub. MP3/M4A encoding also requires ffmpeg (apt install ffmpeg or equivalent). FLAC may work without ffmpeg.
Notes:
t0 is always set to 0.0 (GPS seconds). Audio files carry no absolute timestamps (same behavior as WAV).
On read, sample values are normalized to the [-1.0, 1.0] range.
Multi-channel files use channel names channel_0, channel_1, etc.
On write, data is rescaled to peak-normalized 16-bit PCM.
Implementation: gwexpy/timeseries/io/audio.py
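The read-side normalization and write-side rescaling mentioned in the notes can be sketched as two small functions. This illustrates the general technique under stated assumptions (division by 32768 on read, scaling the peak to 32767 on write, rounding to the nearest integer). The exact constants and rounding used by gwexpy are not stated here, and the function names are hypothetical.

```python
def pcm16_to_float(samples):
    """Scale signed 16-bit integers into the [-1.0, 1.0] range."""
    return [s / 32768.0 for s in samples]


def float_to_peak_normalized_pcm16(samples):
    """Rescale so the largest absolute value maps to full-scale 16-bit."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [0 for _ in samples]  # silence stays silence
    scale = 32767.0 / peak
    return [int(round(s * scale)) for s in samples]


as_float = pcm16_to_float([-32768, 0, 16384])
as_pcm = float_to_peak_normalized_pcm16([0.1, -0.5, 0.0])
print(as_float, as_pcm)
```

Under a scheme like this, peak normalization on write discards the absolute amplitude scale, so a write followed by a read preserves the waveform shape but not the original physical units.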
NetCDF4 — Scientific Data .nc (via xarray)
Extension: .nc
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.nc", format="netcdf4")
tsd.write("output.nc", format="netcdf4")
Required arguments: None
Key optional arguments:
channels (list[str], optional): Variable names to read. Defaults to all variables with a time dimension.
unit (str, optional): Physical unit override. Defaults to the file’s units attribute.
time_coord (str, optional): Name of the time coordinate. Auto-detected if omitted (looks for "time").
Dependencies: xarray + netcdf4 — Raises ImportError if not installed, with a message suggesting pip install xarray netcdf4.
Notes:
Only variables with a time dimension (time) are converted to TimeSeries. Variables without a time dimension are skipped.
Time coordinates are automatically converted from datetime64 to GPS seconds.
Multi-dimensional variables (time + spatial, etc.) are flattened along non-time dimensions, resulting in varname_0, varname_1, etc.
On write, the datetime64 time coordinate is reconstructed from t0 and dt.
Implementation: gwexpy/timeseries/io/netcdf4_.py
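The flattening rule and the reconstructed time axis can be shown with nested lists in place of xarray arrays. This is a minimal sketch, assuming time is the first dimension and that the remaining dimensions are flattened in row-major order. Both function names are hypothetical, and the GPS-to-UTC conversion (which involves leap seconds) is deliberately left out, so the start time is given directly as a UTC datetime.

```python
from datetime import datetime, timedelta, timezone


def flatten_non_time(name, values):
    """Split values[time][...] into one series per non-time element."""
    def flat(item):
        if isinstance(item, (list, tuple)):
            for sub in item:
                yield from flat(sub)
        else:
            yield item

    rows = [list(flat(step)) for step in values]  # one flat row per time step
    n_series = len(rows[0])
    return {f"{name}_{k}": [row[k] for row in rows] for k in range(n_series)}


def time_axis(start_utc, dt, n_samples):
    """Rebuild evenly spaced timestamps from a start time and interval."""
    return [start_utc + timedelta(seconds=i * dt) for i in range(n_samples)]


# Invented variable with shape (time=3, lat=2, lon=2).
temperature = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
series = flatten_non_time("temperature", temperature)
axis = time_axis(datetime(2024, 1, 1, tzinfo=timezone.utc), dt=0.5, n_samples=3)
print(sorted(series), series["temperature_3"], axis[-1])
```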
Zarr — Cloud-Optimized Chunked Arrays .zarr
Extension: .zarr (directory store)
Read/Write: Read Y / Write Y
Recommended API:
from gwexpy.timeseries.collections import TimeSeriesDict
tsd = TimeSeriesDict.read("path/to/data.zarr", format="zarr")
tsd.write("output.zarr", format="zarr")
Required arguments: None
Key optional arguments:
channels (list[str], optional): Array names to read. Defaults to all arrays in the root group.
unit (str, optional): Physical unit override.
Dependencies: zarr — Raises ImportError if not installed, with a message suggesting pip install zarr.
Notes:
gwexpy Zarr convention: each array in the root group corresponds to one channel. Array attrs store sample_rate (Hz) and t0 (GPS seconds).
If sample_rate is not set, the inverse of dt is used; if neither is present, defaults to 1 Hz. t0 defaults to 0.0.
Supports all store types supported by the zarr library (directory stores, zip stores, etc.).
On write, sample_rate, t0, dt, and unit are saved as array attributes.
Implementation: gwexpy/timeseries/io/zarr_.py
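The attribute fallback order described in the notes is easy to express as a function over a plain dictionary. The sketch below stands in for the array attributes with an ordinary dict so that it runs without zarr. The function name resolve_timing is hypothetical.

```python
def resolve_timing(attrs):
    """Return (sample_rate, t0) using the documented fallback order."""
    if "sample_rate" in attrs:
        sample_rate = float(attrs["sample_rate"])
    elif "dt" in attrs:
        sample_rate = 1.0 / float(attrs["dt"])  # inverse of the interval
    else:
        sample_rate = 1.0  # neither attribute present: default to 1 Hz
    t0 = float(attrs.get("t0", 0.0))
    return sample_rate, t0


print(resolve_timing({"sample_rate": 256, "t0": 1000000000}))
print(resolve_timing({"dt": 0.5}))
print(resolve_timing({}))
```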
Design-Level Support (No Dedicated Implementation in gwexpy)
The following formats are listed in the design specification (io_support.csv) but do not have dedicated reader/writer implementations in the gwexpy repository at this time.
| Format | Extension | Designed Read/Write | Notes |
|---|---|---|---|
| MTH5 | | Read Y / Write Y | Via |
Unimplemented (Stub) Formats
The following formats are registered as placeholders (stubs) in the IO registry. Calling .read() raises an UnimplementedIOError or NotImplementedError. These are reserved for future implementation pending specification or sample files.
Time Series Stubs (gwexpy/timeseries/io/stubs.py)
| Format Name | Registered Classes | Notes |
|---|---|---|
| | TimeSeries, TimeSeriesDict, TimeSeriesMatrix | ORF format |
| | TimeSeries, TimeSeriesDict, TimeSeriesMatrix | MEM format |
| | TimeSeries, TimeSeriesDict, TimeSeriesMatrix | WVF format |
| | TimeSeries, TimeSeriesDict, TimeSeriesMatrix | WDF format |
| | TimeSeries, TimeSeriesDict, TimeSeriesMatrix | TAFFMAT format |
| | TimeSeries, TimeSeriesDict, TimeSeriesMatrix | LSF format |
| | TimeSeries, TimeSeriesDict, TimeSeriesMatrix | LI format |
Frequency Series Stubs (gwexpy/frequencyseries/io/stubs.py)
| Format Name | Registered Classes | Notes |
|---|---|---|
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Implemented for time series but not frequency domain |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Same as above |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Implemented for time series but not frequency domain |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Not implemented |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Not implemented |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Not implemented |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Not implemented |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Not implemented |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Not implemented |
| | FrequencySeries, FrequencySeriesDict, FrequencySeriesMatrix | Not implemented |
Basic Usage of .read() / .write()
All gwexpy data classes use gwpy’s IO registry. Here is the basic usage pattern.
Reading
from gwexpy.timeseries.collections import TimeSeriesDict
from gwexpy.timeseries import TimeSeries
# Auto-detect format from extension
tsd = TimeSeriesDict.read("path/to/file.gbd", timezone="Asia/Tokyo")
# Explicitly specify format
tsd = TimeSeriesDict.read("path/to/file.dat", format="miniseed")
# Read a single channel
ts = TimeSeries.read("path/to/file.gbd", timezone="Asia/Tokyo")
Writing
# Write to a supported format
tsd.write("output.mseed", format="miniseed")
tsd.write("output.h5", format="hdf5")
Frequency Series
from gwexpy.frequencyseries.collections import FrequencySeriesDict
from gwexpy.frequencyseries.frequencyseries import FrequencySeries
fsd = FrequencySeriesDict.read("path/to/dtt.xml", format="dttxml", products="PSD")
How the format Argument Works
Omitted: The format is auto-detected from the file extension using identifier functions registered via io_registry.register_identifier(...).
Specified: The reader/writer registered for that format name is called directly.
When auto-detection fails (e.g. .xml is used by formats other than DTTXML): specify format explicitly.
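The dispatch rules above can be modelled with a toy registry. This is a deliberately simplified sketch, not the gwpy or astropy registry API: the class MiniRegistry, the helper has_extension, and the "ligolw" entry are hypothetical, and real identifier functions take more arguments than a bare path. It shows why an explicit format is needed when two formats claim the same extension.

```python
import os


class MiniRegistry:
    """Toy stand-in for an I/O registry (hypothetical, simplified)."""

    def __init__(self):
        self._identifiers = {}
        self._readers = {}

    def register_identifier(self, fmt, identifier):
        self._identifiers[fmt] = identifier

    def register_reader(self, fmt, reader):
        self._readers[fmt] = reader

    def read(self, path, format=None, **kwargs):
        if format is None:
            # Auto-detection: ask every identifier whether it claims the path.
            matches = [f for f, ident in self._identifiers.items() if ident(path)]
            if len(matches) != 1:
                raise ValueError(
                    f"cannot auto-detect format for {path!r}: candidates={matches}"
                )
            format = matches[0]
        # Explicit (or resolved) format: call the registered reader directly.
        return self._readers[format](path, **kwargs)


def has_extension(*extensions):
    """Build an identifier that matches on the lower-cased file extension."""
    def identifier(path):
        return os.path.splitext(path)[1].lower() in extensions
    return identifier


registry = MiniRegistry()
formats = {"gbd": (".gbd",), "dttxml": (".xml",), "ligolw": (".xml",)}
for name, exts in formats.items():
    registry.register_identifier(name, has_extension(*exts))
    # Each dummy reader just reports which format handled the call.
    registry.register_reader(name, lambda path, _name=name, **kw: (_name, path, kw))

print(registry.read("logger.gbd", timezone="Asia/Tokyo"))
print(registry.read("dtt.xml", format="dttxml", products="PSD"))
```

In this toy setup, calling registry.read("dtt.xml") without a format raises ValueError because two identifiers match, which is the situation where passing format explicitly resolves the ambiguity.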
Reference: Source Implementation Files
Implementation files referenced in the creation of this document.
| Module Path | Summary |
|---|---|
| | Time series IO module registration entry point |
| | GBD reader. |
| | ATS reader (binary parser). |
| | SDB reader (WeeWX SQLite). Registered under |
| | TDMS reader. Depends on |
| | WAV reader. |
| | MiniSEED / SAC / GSE2 / KNET reader/writer. Depends on ObsPy |
| | WIN/WIN32 reader. Requires ObsPy. Improved 4bit/24bit delta decode |
| | DTTXML time series reader. |
| | Audio reader/writer (MP3/FLAC/OGG/M4A). Depends on pydub. |
| | NetCDF4 reader/writer. Depends on xarray. Auto-detects time dimension |
| | Zarr reader/writer. Depends on zarr. gwexpy store convention |
| | Time series stubs ( |
| | HDF5 IO (gwpy re-export) |
| | ASCII IO (gwpy re-export) |
| | NDS2 IO (gwpy re-export) |
| | Cache IO (gwpy re-export) |
| | GWOSC IO (gwpy re-export) |
| | Core IO (gwpy re-export) |
| | Frequency series IO module registration entry point |
| | DTTXML frequency series reader. |
| | Frequency series stubs ( |
| | HDF5 IO (gwpy re-export) |
| | ASCII IO (gwpy re-export) |
| | LIGO_LW IO (gwpy re-export) |
| | ROOT IO (gwpy re-export) |
| | |
| | Design data: supported format list |