Database schema

The table definitions here establish the fundamental data structures of the Sparrow system.

The user model parallels the researcher model but is used only for application authentication. We can reset all application access by truncating this table, without losing data.
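A minimal sketch of why this separation is safe, using an in-memory SQLite database as a stand-in for the real schema. The column names (`username`, `password`, `researcher_id`) are assumptions made for illustration, and SQLite's `DELETE` without a `WHERE` clause stands in for truncation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified versions of the two tables
CREATE TABLE researcher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE "user" (
  username TEXT PRIMARY KEY,
  password TEXT,
  researcher_id INTEGER REFERENCES researcher(id)
);
INSERT INTO researcher (id, name) VALUES (1, 'Example Researcher');
INSERT INTO "user" VALUES ('example', 'not-a-real-hash', 1);
""")

# SQLite has no TRUNCATE statement; an unqualified DELETE plays that role here
conn.execute('DELETE FROM "user"')

users_left = conn.execute('SELECT count(*) FROM "user"').fetchone()[0]
researchers_left = conn.execute("SELECT count(*) FROM researcher").fetchone()[0]
```

Because the reference points from the user row to the researcher row and not the other way around, clearing the user table removes logins only; the researcher record is untouched.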

Projects

Researchers on a project who have application user accounts can see that project's data even if it is embargoed (not yet implemented).

Descriptors for types of measurements/techniques

Sample

An object to be measured

Order-of-magnitude precision (in meters) with which this position is known

A representative named location

The elevation column could potentially be recast as a datum tied directly to the sample.

Potential issues:

  • Samples potentially have several levels of abstraction (e.g. different replicates or drillings from the same rock sample); a self-referential ‘part_of’ relation on sample could capture this
  • Samples might belong to several projects

Should the session table, rather than the sample table, contain the link to project? This might be more correct, and samples could still be linked to projects through a table relationship.
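The two ideas raised above (a self-referential ‘part_of’ relation and samples shared between projects) could be sketched as follows. This is an illustration in SQLite and not the current schema: the `project_sample` link table and all column names other than `part_of` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE sample (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  -- Self-reference: a replicate or drilling points at its parent sample
  part_of INTEGER REFERENCES sample(id)
);
-- Hypothetical link table allowing a sample to belong to several projects
CREATE TABLE project_sample (
  project_id INTEGER NOT NULL REFERENCES project(id),
  sample_id INTEGER NOT NULL REFERENCES sample(id),
  PRIMARY KEY (project_id, sample_id)
);
INSERT INTO project VALUES (1, 'Project A'), (2, 'Project B');
INSERT INTO sample VALUES (1, 'rock-1', NULL), (2, 'rock-1-drilling-a', 1);
INSERT INTO project_sample VALUES (1, 1), (2, 1);
""")

# All projects that include the parent rock sample
projects_for_rock = [row[0] for row in conn.execute(
    "SELECT p.title FROM project p "
    "JOIN project_sample ps ON ps.project_id = p.id "
    "WHERE ps.sample_id = 1 ORDER BY p.title"
)]

# All sub-samples taken from the parent rock sample
sub_samples = [row[0] for row in conn.execute(
    "SELECT name FROM sample WHERE part_of = 1"
)]
```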

Session

A set of analyses made on the same instrument with the same technique, at the same time or closely spaced in time.

Examples:

  • An entire Ar/Ar step heating experiment
  • A set of detrital single-crystal zircon measurements
  • A multi-grain igneous age determination

UUID column to provide a globally unique, immutable reference to an analytical dataset. When combined with a lab-specific namespace (not yet implemented), this provides an identifier that can be traced back to the origin facility, maintaining data provenance. This fulfills similar functions to IGSNs and DOIs, and the preliminary implementation here can be changed for interoperability without affecting the internal organization of the Sparrow database.
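Since the lab-specific namespace is not yet implemented, the following is only one possible way such a scheme could work: name-based (version 5) UUIDs derived from a per-lab namespace, using Python's standard `uuid` module. The domain `lab.example.org` and the local identifiers are placeholders.

```python
import uuid

# Hypothetical lab namespace, derived from a placeholder domain name
LAB_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "lab.example.org")


def session_uuid(local_id: str) -> uuid.UUID:
    """Derive a stable identifier for a session from the lab namespace."""
    return uuid.uuid5(LAB_NAMESPACE, local_id)


a = session_uuid("session-0001")
b = session_uuid("session-0001")
c = session_uuid("session-0002")
```

Under this assumption, the same lab and local identifier always yield the same UUID, while different sessions or different labs yield different ones. A randomly generated UUID stored alongside a separate namespace string would serve the same purpose.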

A field to store extra, semi-structured session data in a key/value format
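As a rough illustration of such a field, the sketch below stores a small dictionary as JSON text in SQLite and reads it back. The column name `data` and the keys shown are hypothetical; a production database would more likely use a native JSON column type.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical session table with a free-form key/value column
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, data TEXT)")

extra = {"operator": "example", "notes": "placeholder values only"}
conn.execute(
    "INSERT INTO session (id, data) VALUES (?, ?)",
    (1, json.dumps(extra)),
)

raw = conn.execute("SELECT data FROM session WHERE id = 1").fetchone()[0]
stored = json.loads(raw)
```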

Single analyses

These two tables will end up needing data-type-specific columns.

Analysis

Set of data measured together at one time on one instrument

Examples:

  • A single EPMA analytical spot (different oxides measured on an EPMA are data in a single analysis)
  • A heating step for Ar/Ar age determination
  • A single-crystal zircon age measurement

If session_index is not set, analysis_type allows the unique identification of a record within the session
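One way this rule could be enforced is with a pair of partial unique indexes, sketched here in SQLite. The index names and the exact uniqueness rules are assumptions for illustration, not the constraints actually defined in the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analysis (
  id INTEGER PRIMARY KEY,
  session_id INTEGER NOT NULL,
  session_index INTEGER,
  analysis_type TEXT
);
-- When session_index is set, it identifies the record within the session
CREATE UNIQUE INDEX analysis_by_index
  ON analysis (session_id, session_index)
  WHERE session_index IS NOT NULL;
-- When it is not set, analysis_type must be unique within the session
CREATE UNIQUE INDEX analysis_by_type
  ON analysis (session_id, analysis_type)
  WHERE session_index IS NULL;
""")

insert = (
    "INSERT INTO analysis (session_id, session_index, analysis_type) "
    "VALUES (?, ?, ?)"
)
# Ordered heating steps share a type but differ by index
conn.execute(insert, (1, 0, "heating step"))
conn.execute(insert, (1, 1, "heating step"))
# An unindexed record is identified by its type alone
conn.execute(insert, (1, None, "plateau age"))

try:
    conn.execute(insert, (1, None, "plateau age"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT count(*) FROM analysis").fetchone()[0]
```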

Not really sure that “material” is the best parameterization of this concept…

Some analytical results can be interpreted from other data, so we should explicitly state that this is the case.

Examples:

  • a detrital zircon age spectrum (the datum table would contain individual probability values at each age)
  • a multi-zircon igneous age (the datum table would include jointly-fitted age determinations for each relevant system)
  • a calculated plateau age for a step-heating Ar/Ar experiment

If set, this means that this is an “accepted” value among several related measurements.

Examples:

  • Accepted system for U-Pb single-zircon age
  • Heating steps accepted in the final age analysis

Analytical constants

Constants and related parameters used in measurements, and their relationships to individual analytical sessions and similar entities.

Right now, we support linking these parameters at the session level. Some coarser (e.g. a table for analytical process) or finer (linked parameters for each datum) abstraction might be desired.

In many ways, the column layout mirrors that of the datum table, with the exception that there is a many-to-many link on the data.

Analytical parameters, calibration types, etc. that remain constant across many sessions (e.g. decay constants, assumed physical parameters).
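The session-level, many-to-many link described above could look roughly like the following SQLite sketch. Table and column names are hypothetical, and the constant's value is a placeholder, not a real decay constant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE session (id INTEGER PRIMARY KEY, name TEXT);
-- Columns loosely mirror a datum: a parameter, a value, and a unit
CREATE TABLE constant (
  id INTEGER PRIMARY KEY,
  parameter TEXT NOT NULL,
  value REAL NOT NULL,
  unit TEXT
);
-- Many-to-many: one constant is shared by many sessions, and vice versa
CREATE TABLE session_constant (
  session_id INTEGER NOT NULL REFERENCES session(id),
  constant_id INTEGER NOT NULL REFERENCES constant(id),
  PRIMARY KEY (session_id, constant_id)
);
INSERT INTO session VALUES (1, 'session A'), (2, 'session B');
INSERT INTO constant VALUES (1, 'example_decay_constant', 1.0e-10, '1/yr');
INSERT INTO session_constant VALUES (1, 1), (2, 1);
""")

# The constant is stored once but applies to both sessions
constants_stored = conn.execute("SELECT count(*) FROM constant").fetchone()[0]
sessions_using_it = conn.execute(
    "SELECT count(*) FROM session_constant WHERE constant_id = 1"
).fetchone()[0]
```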

Data files

Original measurement data file information

Foreign key columns that link to the data imported from this file; the link should be made at the level (e.g. sample, analysis, session) that fits the data file in question.

The linked data file is then considered the primary source for all data corresponding to this model.

Note: this table and its assumptions are part of the import process and could change significantly at this early stage.

Only one of the linked data file columns should be set at a time: a data file can only be packaged at one level, even if it encompasses information about other entities.
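A rule like this can be expressed as a CHECK constraint. The SQLite sketch below assumes a hypothetical `data_file_link` table with three nullable foreign-key columns and reads "only one" as "exactly one", which is an interpretation and not something the schema is confirmed to enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_file_link (
  id INTEGER PRIMARY KEY,
  file_hash TEXT NOT NULL,
  sample_id INTEGER,
  session_id INTEGER,
  analysis_id INTEGER,
  -- Each IS NOT NULL test yields 0 or 1, so the sum counts the columns set
  CHECK (
    (sample_id IS NOT NULL)
    + (session_id IS NOT NULL)
    + (analysis_id IS NOT NULL) = 1
  )
);
""")

# Linking a file at the session level only is accepted
conn.execute(
    "INSERT INTO data_file_link (file_hash, session_id) VALUES (?, ?)",
    ("placeholder-hash", 1),
)

# Linking the same file at two levels at once violates the constraint
try:
    conn.execute(
        "INSERT INTO data_file_link (file_hash, sample_id, session_id) "
        "VALUES (?, ?, ?)",
        ("placeholder-hash", 1, 1),
    )
    double_link_rejected = False
except sqlite3.IntegrityError:
    double_link_rejected = True

link_count = conn.execute("SELECT count(*) FROM data_file_link").fetchone()[0]
```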