File I/O Supported Formats Guide
Page Role: Guide
This is the end-user I/O guide for gwexpy.
This page only covers the public .read() / .write() / fetch() style APIs that users call directly to read, write, or fetch data.
It does not cover to_*() / from_*() conversions or object bridges to xarray, ROOT objects, or Zarr arrays. If the question is “how do I convert this object into another library or container?”, that belongs to interop instead. For those topics, see the interop tutorial and the interop API reference.
At a Glance

| Item | Details |
|---|---|
| Audience | Users choosing a direct file or network I/O path for |
| Prerequisites | Basic familiarity with |
| Use Cases | Pick a supported format, decide when to set |
| Search Hints | file I/O, direct I/O, read, write, fetch, HDF5, GWF, MiniSEED, Zarr, NDS2, GWOSC |
Warning
Security Warning: Pickle Files
Pickle is convenient, but reading Pickle files from untrusted sources is dangerous. A malicious Pickle file can execute arbitrary code on your system.
For data sharing and long-term storage, prefer structured formats such as HDF5, GWF, or Zarr.
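This risk is structural, not a bug. A Pickle stream can name any importable callable and instruct the loader to call it. The harmless demonstration below uses only the standard library. `Payload` is a made-up class standing in for attacker-controlled data. When `pickle.loads` "restores" the object, it runs `eval` on an attacker-chosen string.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled class, used only for this demonstration."""

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling eval("1 + 1").
        # A real attack would name something like os.system instead.
        return (eval, ("1 + 1",))


blob = pickle.dumps(Payload())   # what an attacker would ship as a ".pkl" file
restored = pickle.loads(blob)    # eval runs here, during loading
print(type(restored), restored)  # <class 'int'> 2: not a Payload at all
```

Nothing in the file format stops the named callable from being something destructive. This is why structured formats such as HDF5, GWF, or Zarr are preferred for anything you did not produce yourself.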
First: Decision Rules

- If you need a default GW storage format, start with HDF5.
- For existing seismic or geophysical assets, start with MiniSEED / SAC / WIN / ATS.
- For general interchange, start with CSV / NetCDF4 / Zarr.
- For logger- or device-specific data, start with GBD / TDMS / SDB / WAV / Audio.
- For MTH5, the only current public direct-I/O path is the single `ats.mth5` path. A generic standalone `format="mth5"` route is not published yet.
- Auto-detect is fine when the extension uniquely selects one reader.
- For generic HDF5, set `format="hdf5"` explicitly. The `.h5` / `.hdf5` extensions overlap multiple HDF5-backed families, and auto-identification is not uniform across classes.
- Set `format=` explicitly for ambiguous extensions such as `.xml`, for custom lab extensions, or whenever auto-detection is unclear.
- Pass `timezone` explicitly when the file stores local wall-clock time without embedded UTC/GPS. In the current user-facing guide, GBD is the main required case.
- Read-only / write-only matters: `✓ / ✗` means a format can be read but not written.
- For richer direct-I/O objects beyond plain Series, start with HDF5 for `Spectrogram`, `Histogram`, and `EventTable`. Field-class direct `.read()` / `.write()` is still under audit and is not published as a stable contract on this page.
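The `timezone` rule matters because a naive wall-clock stamp maps to a different absolute instant depending on the zone it is read in. The sketch below uses only the standard library and is not gwexpy's conversion code. It makes three assumptions:

- A fixed +09:00 offset stands in for `Asia/Tokyo`. This is exact because Japan observes no daylight saving time.
- The 18 s GPS - UTC leap-second offset is hard-coded. That value is only valid for instants on or after 2017-01-01.
- The helper name `wallclock_to_gps` is made up for illustration.

For real work, prefer a helper that tracks the leap-second table, such as gwpy's `gwpy.time.to_gps`.

```python
from datetime import datetime, timedelta, timezone

# Asia/Tokyo has no daylight saving time, so a fixed +09:00 offset is exact for it.
JST = timezone(timedelta(hours=9))
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# Assumption: GPS - UTC = 18 s, valid only for instants on or after 2017-01-01.
GPS_MINUS_UTC = 18


def wallclock_to_gps(naive, tz):
    """Interpret a naive logger timestamp in tz and return GPS seconds (hypothetical helper)."""
    aware = naive.replace(tzinfo=tz)  # attach the zone the logger clock was set to
    utc = aware.astimezone(timezone.utc)
    return (utc - GPS_EPOCH).total_seconds() + GPS_MINUS_UTC


stamp = datetime(2020, 1, 1, 9, 0, 0)  # as written by a logger running on JST
print(wallclock_to_gps(stamp, JST))           # 1261872018.0 (2020-01-01 00:00 UTC)
print(wallclock_to_gps(stamp, timezone.utc))  # 1261904418.0, off by 9 hours
```

The same file content yields two GPS times 32400 s apart. A reader cannot guess which one is right, so `timezone` has to be passed explicitly.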
Quick Selection Table

| Group | Start here when… | First format | Formats covered here |
|---|---|---|---|
| A. GW Standards | You want standard GW storage, exchange, or acquisition paths | HDF5 | GWF, HDF5, hdf.ndscope, xml.diaggui, NDS2, GWOSC |
| B. Seismic and Geophysical Observation | You need to read existing seismic or EM observation data | mseed | mseed, SAC, GSE2, K-NET, WIN / WIN32, ATS, ATS.MTH5 (MTH5 standalone is status-only here) |
| C. General Analysis and Exchange | You need general-purpose storage or external analysis exchange | CSV / TXT or Zarr | CSV / TXT, NetCDF4, Zarr, ROOT |
| D. Loggers and Instrument Formats | You are working with device- or logger-specific time series | GBD or TDMS | GBD, TDMS, SDB / SQLite / SQLite3, WAV, MP3, FLAC, OGG, M4A |
Note:
`NDS2` and `GWOSC` are not file formats. They are included in A. GW Standards because they are common GW data entry points. In the tables below, they are labeled as network path.
Basic .read() / .write() / fetch() Usage

Purpose: show the baseline direct-I/O entry points before format-specific details
Input: file paths, an explicit `format=` when needed, or a detector/network query
Output: `TimeSeries`, `TimeSeriesDict`, or other direct-I/O return objects
```python
from gwexpy.timeseries.collections import TimeSeriesDict
from gwexpy.timeseries import TimeSeries

# Auto-detect from extension
tsd = TimeSeriesDict.read("path/to/data.mseed")

# Explicit format
tsd = TimeSeriesDict.read("path/to/data.dat", format="mseed")

# Write out
tsd.write("output.h5", format="hdf5")

# Network path
ts = TimeSeries.fetch_open_data("H1", 1126259446, 1126259478)
```
- `.read()` / `.write()` use the gwpy-style I/O registry.
- `.xml` is ambiguous, so use `format="xml.diaggui"` explicitly for DiagGUI XML data.
- `NDS2` and `GWOSC` are not file readers, so they use `fetch()` / `fetch_open_data()` instead of `.read()`.
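Identify-or-explicit dispatch is why auto-detection works for some extensions and fails for others. gwpy builds this on astropy's unified I/O registry. The toy model below is a hypothetical sketch, not gwpy's or gwexpy's registry code:

- Each format registers an "identify" check, here extension-only.
- An explicit `format=` always wins.
- Otherwise a reader is chosen only when exactly one check matches.

The format names `mseed`, `gwf`, and `xml.diaggui` come from this page. `xml.other` is an invented second XML family used to show ambiguity.

```python
# Hypothetical toy registry: format name -> "does this path look like mine?" check.
IDENTIFIERS = {
    "mseed": lambda path: path.endswith(".mseed"),
    "gwf": lambda path: path.endswith(".gwf"),
    "xml.diaggui": lambda path: path.endswith(".xml"),
    "xml.other": lambda path: path.endswith(".xml"),  # invented second XML family
}


def resolve_format(path, format=None):
    """Return the reader name to use, mimicking identify-or-explicit dispatch."""
    if format is not None:  # an explicit format= always wins
        if format not in IDENTIFIERS:
            raise ValueError(f"no reader registered for format={format!r}")
        return format
    matches = sorted(name for name, identify in IDENTIFIERS.items() if identify(path))
    if len(matches) == 1:  # the extension uniquely selects one reader
        return matches[0]
    if not matches:
        raise ValueError(f"cannot identify {path!r}; pass format= explicitly")
    raise ValueError(f"{path!r} is ambiguous between {matches}; pass format= explicitly")


print(resolve_format("path/to/data.mseed"))                # mseed
print(resolve_format("path/to/data.dat", format="mseed"))  # mseed
print(resolve_format("diag.xml", format="xml.diaggui"))    # xml.diaggui
```

Calling `resolve_format("diag.xml")` without `format=` raises, because two readers claim `.xml`. This mirrors the reason the page asks for `format="xml.diaggui"`.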
Supported Classes at a Glance
If the main question is whether a format is for a single channel or multiple channels, use this table first.
| Format / Family | Single | Multi | Other classes |
|---|---|---|---|
| GWF / mseed / SAC / GSE2 / K-NET / WIN / WIN32 / ATS / SDB / SQLite / SQLite3 / WAV / Audio | | | Baseline end-user direct I/O pattern |
| CSV | | | |
| TXT | | | Multi-channel direct I/O uses collection directories |
| nc / Zarr / GBD / TDMS | | | Includes matrix-style direct I/O |
| HDF5 | | | Also covers |
| hdf.ndscope | - | | ndscope-compatible schema; aliases: |
| xml.diaggui | - | | Requires |
| NDS2 / GWOSC | | - | Use |
| ATS.MTH5 | | - | Partial single-path support |
| ROOT | | - | Direct I/O is limited to EventTable |
- If you are unsure, start by thinking in terms of `TimeSeries` and `TimeSeriesDict`.
- `TimeSeriesMatrix` mainly matters for NetCDF4, Zarr, GBD, and TDMS.
- If you need to preserve richer objects beyond Series classes, start with HDF5.
Optional Dependency Matrix
Most direct-I/O routes work in a base GWexpy install. The formats below depend on optional packages or optional metadata helpers.
| Format / family | Optional dependency | GWexpy extra | Missing-dependency behavior |
|---|---|---|---|
| WAV metadata | | | |
| MP3 / FLAC / OGG / M4A | pydub | | Audio read/write raises |
| TDMS | nptdms | | Reader raises |
| mseed / SAC / GSE2 / K-NET | | | Registered reader/writer raises |
| WIN / WIN32 | | | Uses conditional registration: when ObsPy is unavailable, the |
| ATS.MTH5 | | | Reader raises |
| nc / NetCDF4 | | | Reader/writer raises |
| Zarr | | | Reader/writer raises |
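Before picking a backend, you can check which optional packages are importable without actually importing them. The sketch below uses only `importlib.util.find_spec` and `shutil.which` from the standard library. The format-to-module mapping is an assumption drawn from notes elsewhere on this page: ObsPy for the seismic readers, `nptdms` for TDMS, `pydub` (plus `ffmpeg` for MP3/M4A) for compressed audio, and `uproot` for ROOT. It may not match gwexpy's exact extras.

```python
import importlib.util
import shutil

# Assumed mapping from format family to the module that backs it (illustrative only).
OPTIONAL_BACKENDS = {
    "mseed / SAC / GSE2 / WIN (ObsPy)": "obspy",
    "TDMS": "nptdms",
    "MP3 / FLAC / OGG / M4A": "pydub",
    "ROOT EventTable": "uproot",
}


def probe_optional_backends():
    """Report which optional backends are importable, without importing them."""
    report = {label: importlib.util.find_spec(module) is not None
              for label, module in OPTIONAL_BACKENDS.items()}
    # MP3/M4A commonly also need the ffmpeg executable on PATH.
    report["ffmpeg executable"] = shutil.which("ffmpeg") is not None
    return report


for label, available in probe_optional_backends().items():
    print(f"{label:35s} {'available' if available else 'missing'}")
```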
A. GW Standards
These are the standard GW storage, exchange, and acquisition paths. If you are unsure, start with HDF5. Use GWF when you need external standard compatibility, and DTTXML for diagnostic tool output.
| Format / Path | R / W | Main entry | Best for | Notes |
|---|---|---|---|---|
| GWF ( | ✓ / ✓ | | Standard LIGO/KAGRA frame exchange | Standard format, via gwpy |
| HDF5 ( | ✓ / ✓ | | Long-term storage with metadata | Prefer explicit |
| hdf.ndscope ( | ✓ / ✓ | | ndscope-compatible HDF5 | |
| xml.diaggui ( | ✓ / ✗ | | DiagGUI / DTT outputs | |
| NDS2 | ✓ / ✗ | | Detector data server access | Network path |
| GWOSC | ✓ / ✗ | | Open data access | Network path |
Purpose: compare the main GW-oriented direct-I/O and network entry points
Input: HDF5, GWF, DTTXML, or detector/open-data access parameters
Output: `TimeSeries`, `TimeSeriesDict`, or fetched open data
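```python
from gwexpy.timeseries.collections import TimeSeriesDict
from gwexpy.timeseries import TimeSeries

# Generic HDF5: name the format explicitly (.h5 is shared by several families)
tsd = TimeSeriesDict.read("data.h5", format="hdf5")
# Single GWF frame file
frame = TimeSeriesDict.read("data.gwf", format="gwf")
# Multi-file GWF read: channels are passed as a list; gaps are filled with NaN
merged = TimeSeriesDict.read(["part0.gwf", "part1.gwf"], ["H1:STRAIN"], pad=float("nan"))
# DiagGUI / DTT XML: .xml is ambiguous, so format= is required; behavior depends on products
dtt = TimeSeriesDict.read("diag.xml", format="xml.diaggui", products="TS")
# GWOSC open data for 32 s around GW150914 (network path, not a file reader)
open_data = TimeSeries.fetch_open_data("H1", 1126259446, 1126259478)
```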
- HDF5 is the safest general recommendation for structured GW data.
- GWF reads accept a list or tuple of `.gwf` files for `TimeSeries` and `TimeSeriesDict`. Files are merged in time-span order. Contiguous spans join normally. Gaps raise by default; `pad=<value>` or `gap="pad"` fills them, and `gap="ignore"` concatenates without filling. Overlapping spans are rejected by default or with `gap="raise"`, while `gap="ignore"` concatenates files in span order and permits overlap concatenation. If `start` or `end` extends beyond the available data, the default `gap="raise"` behavior rejects the request; use `pad=<value>` or `gap="pad"` to fill the outer interval. `gap="ignore"` never pads missing samples, including outer `start`/`end` ranges. When channel names are not supplied for multi-file reads, auto-discovery uses the first file and assumes the remaining files expose compatible channels.
- DTTXML changes behavior depending on `products`. Keep public direct reads on `TimeSeriesDict.read(..., format="xml.diaggui", products=...)`.
- Frequency-domain DTTXML direct shims and registry adapters are implementation-only and not part of the public direct-I/O contract. Advanced internal users handling complex transfer functions can prefer `native=True` there.
- NDS2 / GWOSC are shown inside group A but are explicitly marked as network path rather than file formats.
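The multi-file GWF gap and overlap rules above amount to a small merge algorithm. The pure-Python sketch below is hypothetical. It is not gwexpy's implementation, and `merge_spans` is a made-up name. It models each file as a `(t0, samples)` pair at a common sample rate and covers only inter-file gaps and overlaps; outer `start`/`end` handling is omitted. It also rests on three assumptions not spelled out above:

- A `pad=<value>` argument selects padding regardless of `gap`.
- A bare `gap="pad"` fills with NaN.
- Overlaps are rejected under every policy except `gap="ignore"`.

```python
import math


def merge_spans(segments, sample_rate, gap="raise", pad=None):
    """Merge (t0, samples) spans in time order under a raise/pad/ignore gap policy.

    Hypothetical illustration only; outer start/end handling is not modelled.
    """
    if not segments:
        raise ValueError("no spans to merge")
    if pad is not None:
        gap = "pad"  # assumption: pad=<value> selects padding regardless of gap
    if gap not in ("raise", "pad", "ignore"):
        raise ValueError(f"unknown gap policy {gap!r}")
    fill = math.nan if pad is None else float(pad)  # assumption: NaN for bare gap="pad"

    ordered = sorted(segments, key=lambda seg: seg[0])  # time-span order, not argument order
    t0, first = ordered[0]
    merged = [float(x) for x in first]
    expected = t0 + len(first) / sample_rate  # start time of a contiguous successor

    for start, data in ordered[1:]:
        offset = round((start - expected) * sample_rate)  # >0 gap, <0 overlap, in samples
        if offset > 0:
            if gap == "raise":
                raise ValueError(f"{offset}-sample gap before t={start}")
            if gap == "pad":
                merged.extend([fill] * offset)
            # gap == "ignore": concatenate without filling
        elif offset < 0 and gap != "ignore":
            raise ValueError(f"{-offset}-sample overlap at t={start}")
        merged.extend(float(x) for x in data)
        expected = start + len(data) / sample_rate
    return t0, merged


print(merge_spans([(2.0, [3, 4]), (0.0, [1, 2])], 1.0))             # (0.0, [1.0, 2.0, 3.0, 4.0])
print(merge_spans([(0.0, [1, 2]), (4.0, [5])], 1.0, pad=0.0))       # (0.0, [1.0, 2.0, 0.0, 0.0, 5.0])
print(merge_spans([(0.0, [1, 2]), (4.0, [5])], 1.0, gap="ignore"))  # (0.0, [1.0, 2.0, 5.0])
```

With `gap="ignore"`, the output is shorter than the wall-clock span it covers. That is why the guide says this mode never pads missing samples.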
B. Seismic and Geophysical Observation
This group is for existing seismic and electromagnetic observation formats. In practice, MiniSEED is the easiest starting point when you need to place a format in context.
| Format | R / W | Main entry | Best for | Notes |
|---|---|---|---|---|
| mseed ( | ✓ / ✓ | | Standard seismic waveform exchange | |
| SAC ( | ✓ / ✓ | | Seismic waveform analysis | Via ObsPy |
| GSE2 ( | ✓ / ✓ | | Seismic waveform exchange | Via ObsPy |
| K-NET ( | ✓ / ✗ | | Strong-motion records | Read-only |
| WIN / WIN32 ( | ✓ / ✗ | | Japanese WIN datasets | Improved parser, read-only |
| ATS ( | ✓ / ✗ | | Metronix observation data | Native binary reader |
| ATS.MTH5 ( | ✓ / ✗ | | Single MTH5-backed path | Partial support |
| MTH5 standalone ( | In progress | Dedicated | Future general MTH5 direct I/O | Not currently a public direct-I/O format. The only direct path today is |
Purpose: compare common seismic and geophysical readers without overstating MTH5 support
Input: existing waveform files such as MiniSEED, WIN/WIN32, or the limited `ats.mth5` path
Output: `TimeSeries` or `TimeSeriesDict` objects depending on the reader
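```python
from gwexpy.timeseries.collections import TimeSeriesDict
from gwexpy.timeseries import TimeSeries

# MiniSEED: gap="pad" fills gaps with NaN (this is also the default)
tsd = TimeSeriesDict.read("data.mseed", format="mseed", gap="pad")
# WIN32: read-only
win = TimeSeriesDict.read("data.cnt", format="win32")
# The single MTH5-backed ATS path (partial support)
ats = TimeSeries.read("data.atss", format="ats.mth5")
```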
- MiniSEED pads gaps with `NaN` by default. Use `gap="raise"` if you want failures instead.
- K-NET and WIN / WIN32 are intentionally read-only.
- ATS.MTH5 is the limited current direct path.
- MTH5 standalone is still in design/publication cleanup. Read this as “`ats.mth5` has partial support”, not as “MTH5 direct I/O is generally complete.”
C. General Analysis and Exchange
These formats are useful for analysis notebooks, interchange, and general storage. The key rule here is not to mix up “format choice” with “library conversion.”
| Format | R / W | Main entry | Best for | Notes |
|---|---|---|---|---|
| CSV ( | ✓ / ✓ | | Lightweight exchange and inspection | Auto-identifies |
| TXT ( | ✓ / ✓ | | Plain-text exchange | Multi-channel direct I/O uses collection directories |
| nc ( | ✓ / ✓ | | Scientific storage for time-series-oriented data | Direct I/O here is centered on TimeSeries classes; legacy format alias: |
| Zarr ( | ✓ / ✓ | | Chunked storage and parallel workflows | Direct I/O here is centered on TimeSeries classes |
| ROOT ( | ✓ / ✓ | | EventTable I/O | Auto-identifies |
Purpose: show the general-purpose direct-I/O routes without mixing them with interop-only bridges
Input: CSV, Zarr, ROOT, or other general exchange formats
Output: `TimeSeriesDict`, `TimeSeriesMatrix`, or `EventTable`
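```python
from gwexpy.timeseries.collections import TimeSeriesDict
from gwexpy.table import EventTable

# CSV: auto-identified from the extension; metadata-light
ascii_data = TimeSeriesDict.read("data.csv")
# Zarr store: each array needs sample_rate (or dt) timing metadata
chunked = TimeSeriesDict.read("data.zarr", format="zarr")
# ROOT: EventTable direct I/O only; requires uproot
events = EventTable.read("events.root")
```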
- CSV remains useful for lightweight exchange and inspection. Treat simple CSV files as metadata-light: use HDF5, GWF, Zarr, NetCDF, or a manifest-backed collection directory when name, channel, and unit metadata must be preserved.
- TXT direct I/O is more limited: single-series paths use an explicit `format="txt"`, and multi-channel paths use collection directories.
- Pickle portability notes still exist in class references, but Pickle is not a published direct `.read()` / `.write()` format on this page.
- NetCDF4 / Zarr are treated here only as direct TimeSeries-style I/O. Field/xarray bridges belong to interop. For NetCDF, `netcdf4` is a legacy format token alias for `nc`; `.netcdf4` is not a documented auto-identified extension alias.
- Zarr direct I/O now expects per-array timing metadata explicitly. `sample_rate` is the primary key and `dt` is accepted as a fallback. Reads raise `ValueError` if neither is present, unless you intentionally recover a legacy store with `sample_rate_override=...` or `dt_override=...`.
- ROOT object-level export/import belongs to interop. This page only covers EventTable direct I/O, which requires `uproot`.
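The Zarr timing rule above is a short precedence chain. The sketch below models it with a plain dict standing in for a Zarr array's attributes. `resolve_sample_rate` is a hypothetical helper, not gwexpy's reader. It also assumes the overrides are consulted only when the stored metadata lacks both keys, and that `sample_rate_override` beats `dt_override`. gwexpy's actual precedence may differ.

```python
def resolve_sample_rate(attrs, sample_rate_override=None, dt_override=None):
    """Pick a sample rate from per-array metadata (hypothetical helper).

    `attrs` stands in for a Zarr array's attribute mapping.
    """
    if "sample_rate" in attrs:  # primary key
        return float(attrs["sample_rate"])
    if "dt" in attrs:  # accepted fallback
        return 1.0 / float(attrs["dt"])
    # Legacy store with no timing metadata: only an explicit override recovers it.
    if sample_rate_override is not None:
        return float(sample_rate_override)
    if dt_override is not None:
        return 1.0 / float(dt_override)
    raise ValueError("array has neither 'sample_rate' nor 'dt' metadata")


print(resolve_sample_rate({"sample_rate": 256}))  # 256.0
print(resolve_sample_rate({"dt": 0.5}))           # 2.0
print(resolve_sample_rate({}, dt_override=0.25))  # 4.0
```

Failing loudly on missing metadata, rather than silently assuming a default rate, keeps a mis-timed legacy store from producing plausible-looking but wrong time axes.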
D. Loggers and Instrument Formats
This group is for logger and instrument-specific time-series formats.
Time handling, units, and audio t0 semantics are the main points to watch.
| Format | R / W | Main entry | Best for | Notes |
|---|---|---|---|---|
| GBD ( | ✓ / ✗ | | GRAPHTEC loggers | |
| TDMS ( | ✓ / ✗ | | National Instruments data | Read-only; requires |
| SDB / SQLite / SQLite3 ( | ✓ / ✗ | | WeeWX and similar archives | Same reader family; public direct I/O is read-only |
| WAV ( | ✓ / ✓ | | Uncompressed audio | Public write is single-series only; does not preserve absolute time |
| MP3 / FLAC / OGG / M4A | ✓ / ✓ | | Compressed audio | Uses |
Purpose: highlight logger-specific and audio-specific direct-I/O requirements
Input: logger data, SQLite-family archives, or audio files
Output: `TimeSeries`, `TimeSeriesDict`, or `TimeSeriesMatrix`
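```python
from gwexpy.timeseries.collections import TimeSeriesDict

# GBD stores local wall-clock time, so timezone is required
logger = TimeSeriesDict.read("data.gbd", timezone="Asia/Tokyo")
# SQLite-family archive (read-only)
weather = TimeSeriesDict.read("archive.sqlite3", format="sqlite3")
# Compressed audio: requires pydub; no absolute timestamp is preserved
audio = TimeSeriesDict.read("sound.flac", format="flac")
```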
- GBD requires `timezone`.
- TDMS requires the optional `nptdms` dependency.
- MP3 / FLAC / OGG / M4A require the optional `pydub` dependency. MP3/M4A commonly also need `ffmpeg`.
- SDB, SQLite, and SQLite3 are all listed explicitly so you do not need to infer aliases.
- WAV / compressed-audio formats do not preserve absolute timestamps. Reading with `t0=0.0` is a convenience convention, not a claim that the source had an absolute epoch.
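The stdlib-only sketch below shows why audio files force a convention like `t0=0.0`. It writes one second of silence to an in-memory WAV, then reads the header back with Python's `wave` module. That header exposes channel count, sample width, rate, frame count, and compression. It has no start-time field, so only relative time is recoverable. Extra RIFF chunks some tools add are outside what `wave` reports and are not considered here.

```python
import io
import wave

# Write one second of 16-bit mono silence at 8 kHz into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # bytes per sample
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000)

# Read it back: the header carries layout and rate, but no start timestamp.
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    params = wav.getparams()

print(params._fields)
# ('nchannels', 'sampwidth', 'framerate', 'nframes', 'comptype', 'compname')
duration = params.nframes / params.framerate  # relative length only: 1.0 s
t0 = 0.0  # convenience origin as described above, not an absolute epoch
```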
Developer Notes
Most users can skip this section. It exists mainly to collect not-yet-prominent implementations and placeholders in one place.
Managed in design, but not prominent in the public page

| Format | Status | Notes |
|---|---|---|
| | Implemented, not yet prominent | |
| | Implemented with partial scope | Current public direct path backed by MTH5 |
| | In progress | Dedicated |
Unimplemented Formats (Stubs)
These entries exist as placeholders only. Calling .read() on them is expected to fail because they are not ready for end users yet.
TimeSeries stubs

| Format | Status |
|---|---|
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
FrequencySeries stubs

| Format | Status |
|---|---|
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
| | Planned |
Next to Read

- Interop / Conversion Guide for `to_*()` / `from_*()` bridges and object-level conversion
- GPS Time Utility Functions if your I/O workflow needs timezone or GPS-time handling
- Installation guide if you need optional dependencies before using a format backend