Version 3 (modified by alisdair, 2 years ago)

--

VSO Data Model - Version 1.8

Introduction

The VSO Data Model provides a set of template descriptions for the information required to describe, access, and search solar data sets in a variety of archives. It is an abstract model, not a suggested set of keywords to be used in data or in databases. Because of the ubiquity of the FITS standard and the wide use of certain conventions, we provide illustrative values of FITS keywords for certain data elements; but neither the adoption of any particular set of keywords nor the use of the FITS data model itself is required for a data description to conform to the model. The VSO Element Names are used at a level of abstraction once removed from the search parameters of the data providers. They should be completely internal to the VSO procedures for decoding information from user interfaces, and should not be sent in queries to data providers. We have deliberately avoided the use of FITS-compatible keyword names to emphasize this point.

VSO Search Parameters

VSO search parameters are those data descriptors for which queries are supported by the VSO on behalf of client applications or requests. These are the parameters that can best discriminate among a large collection of heterogeneous data. They must therefore be supported by the data providers as search parameters applicable to a large subset of the data archives. They must map to parameters in the server data dictionaries in a well-defined and meaningful way. They must also be selected so that the number of data sets meeting a particular selection criterion is small compared to the total number: for the VSO, an astronomical-type search parameter such as Object (Sun) is not particularly useful as a discriminator.
The VSO search parameters are divided into a few groups, each described under one of the major subsections. These categories are understood to be orthogonal, in the sense that they can be used to construct non-trivial AND queries. Of course they are not strictly orthogonal: selection of a particular data source (instrument) may automatically restrict the available observing times, for example, and vice versa. Nonetheless it is useful to treat the major categories as if they were orthogonal and to treat any dependencies as implicit selections or limits.
No particular set of search parameters is required. In the absence of a relevant element or group of elements in its data description, a dataset is assumed to match all queries. For example, if no wavelength information is supplied, then the server will return all records for any selected (or deselected) wavelength interval. If a parameter is not searchable but has a default value, then that value can be supplied directly in the data description. For example, an archive of data all taken at the same wavelength is unlikely to have wavelength as a searchable key in its database, but could (and should) supply that wavelength as a fixed value in its data description to avoid inappropriate satisfaction of client queries.
The current parameter list is not intended to be exhaustive, and it may be useful to add additional search elements and categories in future. The categories chosen are those for which the VSO either has attempted to implement a search service or contemplates doing so. So far, only a few of the parameters can be searched in the VSO, and these are marked with asterisks in the following list. The elements are described in detail by group under the following sections.

  • Observing Time
    • Observation_Time*
    • Duration
    • Time_Step
  • Target Location
    • Observation_Center_West
    • Observation_Center_North
    • Bounding_Radius
  • Observer Location
  • Spectral Range
    • Wave_Type
    • Wave_Bands (may be deleted in future versions)
    • Wave_Minimum*
    • Wave_Maximum*
    • Wave_Step (delete?)
  • Physical_Observable*
  • Data Organization
  • Wave Mode Sampling
    • Degree_Minimum
    • Degree_Maximum
    • Degree_Step (delete?)
  • Data Source
    • Observatory*
    • Instrument*
    • Provider*
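
The matching rule described above (an element absent from a data description matches every query, and a non-searchable archive can declare a fixed value) can be illustrated with a minimal Python sketch. The dictionary representation, the `matches` function, and the range form of the query are assumptions made for illustration only; they are not part of the VSO Data Model or of any VSO interface.

```python
from math import inf
from typing import Any, Dict, Tuple

# Hypothetical representation: a data description maps VSO element names to
# fixed values; a query maps element names to closed (low, high) ranges.
Description = Dict[str, Any]
Query = Dict[str, Tuple[float, float]]


def matches(description: Description, query: Query) -> bool:
    """Return True if the description satisfies every range in the query.

    An element missing from the description is assumed to match, following
    the rule that absent elements match all queries.
    """
    for element, (low, high) in query.items():
        if element not in description:
            continue  # no information supplied: assumed to match
        if not (low <= description[element] <= high):
            return False
    return True


# A query for data whose bandpass lies within 6558-6568 Angstrom.
query = {"Wave_Minimum": (6558.0, inf), "Wave_Maximum": (-inf, 6568.0)}

# An archive that supplies no wavelength information matches any interval.
no_wavelength = {"Observatory": "GONG"}
# An archive taken at one wavelength declares it as a fixed value instead.
fixed_wavelength = {"Wave_Minimum": 6768.0, "Wave_Maximum": 6768.0}
```

With these definitions, `matches(no_wavelength, query)` is true even though the archive may hold no data in that interval, whereas `matches(fixed_wavelength, query)` is false. This is the "inappropriate satisfaction" that supplying a fixed value is meant to avoid.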

1. Observing Time

Observing time is by general consensus the most likely parameter to be used as a first case for searches, the most ubiquitous indexing parameter for data, and one on which there is widespread agreement and understanding of representations, scales, and units. Most of the complexity involved is in the descriptions of data translation. Here it is sufficient to specify a simple uniform description.
(Most observational data are expected to be associated with observing times, and so far all VSO query structures have been assumed to include a time search parameter. It is possible however that some data may not be; model data are an example. As described above, such data would automatically satisfy any time interval query, and at least one additional parameter would be required to make them selectable.)

Observation_Time

type: time
FITS keyword: T_OBS
The time at which the data comprising an atomic data set were originally recorded. If the duration of the data in the atomic data unit is large compared with the search time resolution, the Observation_Time is to be understood to correspond to the center (mid-point) of the observation(s), weighted as appropriate. For purposes of the Data Model, Observation_Time is given in calendar-clock form, e.g. 2004.03.08_16:25. Times are assumed to be UTC. The time resolution is one minute, so for much data the conversion from, say, the start time of an exposure to Observation_Time should not matter. Likewise the conversions between UTC and other time scales such as ET, TAI, and GPS should not be a matter of much concern. A data match is assumed to include all data from 30 seconds before the target time to 30 seconds after, inclusive (closed at both ends), so that a data Observation_Time can in principle fall into two adjacent target times. Note that since Jan 1, 1999, TAI = UTC + 32 sec, and GPS = UTC + 13 sec.
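
The calendar-clock form and the closed 30-second matching window can be sketched in Python as follows. The function names and the use of the standard `datetime` module are illustrative assumptions; the Data Model does not prescribe an implementation.

```python
from datetime import datetime, timedelta, timezone

# Format string for the calendar-clock form, e.g. 2004.03.08_16:25.
CALENDAR_CLOCK = "%Y.%m.%d_%H:%M"
HALF_WINDOW = timedelta(seconds=30)


def parse_target_time(text: str) -> datetime:
    """Parse a calendar-clock string; times are assumed to be UTC."""
    return datetime.strptime(text, CALENDAR_CLOCK).replace(tzinfo=timezone.utc)


def time_matches(observed: datetime, target: datetime) -> bool:
    """True if observed lies within 30 s of target, closed at both ends."""
    return target - HALF_WINDOW <= observed <= target + HALF_WINDOW


target = parse_target_time("2004.03.08_16:25")
next_target = target + timedelta(minutes=1)
# An observation exactly 30 s after the target lies on the shared boundary.
boundary = target + timedelta(seconds=30)
```

Because the window is closed at both ends, `boundary` matches both `target` and `next_target`, which is the case in which an Observation_Time falls into two adjacent target times.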

Duration

type: number
unit: second
FITS keyword: T_LENGTH
The interval between the start and end of observation in the atomic data unit. For a single image or spectrum, this is simply the exposure time; for a movie, it is the time difference between the start of the first image and the end of the last.

Time_Step

type: number
unit: second
FITS keyword: T_STEP
The interval between successive time samples (data records) in a dataset.

2. Target Location

Target location, by which is meant the spatial location of the target region of imaged or pointed observations on or around the Sun or in the heliosphere, has not yet been built into any VSO query models, although it is a fairly natural selection criterion for observations with a restricted field of view. It may suffice to specify a simple uniform description, although the multi-dimensionality of space makes this harder than one for time. For two-dimensional image data we assume a bounding circle as the simplest model. For this model it is sufficient to specify the center location and radius of the bounding circle. Most real image data are actually described by a bounding rectangle, but this requires specifying at least five parameters (e.g. the coordinates of opposite corners and a position angle).

Observation_Center_West

type: number
unit: arc-second
FITS keyword: CENT_WST

Observation_Center_North

type: number
unit: arc-second
FITS keyword: CENT_NRT
A pair of coordinates specifying the location of the center of the image data circle with respect to the Earth-Sun line at the nominal Observation_Time. This origin is close to the center of the apparent solar image for Earth-based or near-Earth observers, but not necessarily for deep space observations. The North coordinate is measured in the direction of the Carrington axis (RA 286°.13, δ 63°.87 J2000.0), and the West coordinate in the direction of solar rotation.

Bounding_Radius

type: number
unit: arc-second
FITS keyword: R_BOUND
The radius of the bounding circle about the Observation_Center. For the VSO Data Model the bounding circle is to be understood as either the maximum inscribed circle in the bounding data rectangle (polygon), or the minimum circumscribed circle, depending on whether the query is for included data (presumably the normal default) or excluded data, respectively.
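
As an illustration of the two readings of Bounding_Radius, the following Python sketch computes both radii for a rectangular field of view. It assumes the rectangle is centered on the Observation_Center, which the text does not state explicitly; the function name and the `excluded` flag are hypothetical.

```python
import math


def bounding_radius(width: float, height: float, excluded: bool = False) -> float:
    """Bounding radius (arc-seconds) for a rectangle centered on the target.

    For included-data queries, use the largest inscribed circle; for
    excluded-data queries, use the smallest circumscribed circle.
    """
    if excluded:
        # Half the diagonal: the circle passing through all four corners.
        return math.hypot(width, height) / 2.0
    # Half the shorter side: the circle touching the two nearer edges.
    return min(width, height) / 2.0
```

For a 600 by 800 arc-second field, this gives 300 arc-seconds for the inscribed case and 500 arc-seconds for the circumscribed case. For a centered rectangle neither value depends on the position angle.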

3. Observer Location

No Search Parameters have been defined to describe observer location. Two classes of description are appropriate, one for ground-based observations and one for space-based data, particularly in situ measurements. For Earth observatories, a straightforward geographic latitude / longitude / altitude description should suffice, but it is not clear how useful this would be as a discriminator for data searches. For space platforms, where the description of location for in situ data is especially important, we defer to the model (to be) adopted by the VSPO. It should be noted, though, that as stereoscopic imaging of the Sun from space observatories becomes more important, search parameters associated with observer location with respect to solar coordinate frames may have to be introduced.

4. Spectral Range

The electromagnetic wavelength interval or equivalent over which observations are made is the fundamental discriminator among many types of solar image and other data. The model needs to apply to both narrow-band ("monochromatic" or single-line) and broad-band data. Different branches of the field use different units depending on their spectral band -- frequency at the lowest ranges (of frequency), wavelength at intermediate ranges, energy at the highest. Again for the sake of simplicity we define a single model, assuming that the necessary conversions can be simply made.
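
The conversions assumed above are straightforward. A minimal Python sketch, with hypothetical function names, converts frequency and photon energy to the Ångström scale used by the model:

```python
# Physical constants in SI units (exact by definition since 2019).
SPEED_OF_LIGHT = 299792458.0       # m / s
PLANCK = 6.62607015e-34            # J s
ELECTRON_VOLT = 1.602176634e-19    # J
ANGSTROM_PER_METRE = 1.0e10


def frequency_to_angstrom(frequency_hz: float) -> float:
    """Wavelength in Angstrom for a frequency in Hz (lambda = c / nu)."""
    return SPEED_OF_LIGHT / frequency_hz * ANGSTROM_PER_METRE


def energy_to_angstrom(energy_ev: float) -> float:
    """Wavelength in Angstrom for a photon energy in eV (lambda = h c / E)."""
    joules = energy_ev * ELECTRON_VOLT
    return PLANCK * SPEED_OF_LIGHT / joules * ANGSTROM_PER_METRE
```

For example, a radio observation at 2.8 GHz corresponds to about 1.07 × 10^9 Å (10.7 cm), and a 1 keV photon to about 12.4 Å.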

Wave_Type

type: menu
FITS keyword: WV_TYPE
The class of spectral data, relating to both the nominal spectral bandpass and the spectral target. Three values are recognized:

broad

Indicates that the spectral range of the measurement is large compared to the width of absorption/emission lines within the range, and encompasses multiple lines as well as continuum (unless blanketed).

line

The spectral range of the measurement is of the same order or less than the width of the target line, and is centered on a wavelength within the wings of the line.

narrow

The spectral range of the measurement is of the same order or less than the typical width of lines in the neighborhood, but is centered on a continuum wavelength, outside of any significant lines. This designation is used to distinguish narrow-band continuum (or "white-light") data from true broad-band data. For data of this description, the matching spectral range should be much broader than the instrumental bandpass, on the understanding that the data are proxies for broadband measurements.

The exact definition of the bandpass (e.g. FWHM) is not prescribed, but is left up to the terminology of the data provider. In the absence of a provider definition, FWHM should be used.

Wave_Minimum

type: number
unit: Ångström (0.1 nm)
FITS keyword: WV_MIN

Wave_Maximum

type: number
unit: Ångström (0.1 nm)
FITS keyword: WV_MAX
The nominal minimum (maximum) of the observing spectral bandpass associated with the data. As discussed above, for narrowband continuum data, the range should be much larger than the instrumental bandpass; it should correspond to the spectral range over which the data are useful as a proxy, typically an octave or more.

Wave_Bands

type: number
FITS keyword: WV_NBAND
The number of wavelength bands in the observation.

Wave_Step

type: number
unit: Ångström (0.1 nm) / pixel
FITS keyword: WV_STP
The spectral dispersion.

5. Observable

It is in the description of the dependent variables, what the data in fact measure, that there is the greatest variation in terminology among data archives. Most solar observational data consist of direct measurements of the intensity of radiation as a function of time, direction (location), wavelength, and polarization, or combinations of intensities associated with different independent variables (e.g. line shifts and splittings, Stokes parameters). These data may be interpreted as measurements of certain physical observables, such as temperature, velocity, emission measure, etc. via models. There are of course some important exceptions: some solar data archives include in situ measurements of such observables as particle fluxes and compositions and magnetic field strengths; some solar data sets represent not direct observation but the results of complex inversions or modeling, such as the frequencies of acoustic modes, or the interior structure; and there are catalogs, histories, and descriptions of features and events. As long as the various observable classes are orthogonal, however, these additional cases should present no problem.

The model of describing observables in terms of particular combinations of intensity measurements or the associated physical parameters to be derived from them is a natural one for data deriving from imaging spectrographs, such as magnetographs and helioseismic instruments. For cameras or radiometers measuring only intensity or flux at selected wavelengths, it is not so natural. People dealing with data from such instruments tend to think of the observables as being associated with the spectral wavelength or band selected, or for monochromatic instruments, even the spatial-temporal target of the observations. It is important to understand that the meaning of the term "observable" in the VSO Search Parameter model may not at all agree with the meaning of the term as used by the data providers.

Physical_Observable

type: menu
FITS keyword: PHYS_OBS
The following values are currently recognized:

intensity

the direct intensity, either integrated over the spectral observing range or as a function of wavelength (spectral density)

equivalent_width

differences between intensities measured at nearby wavelengths, typically in line cores, wings, and nearby continuum, whether measured as an intensity difference or an equivalent width

polarization_vector

the net linear polarization

LOS_magnetic_field

the frequency/wavelength Zeeman splitting between opposite circular polarizations of a magnetically-sensitive line

vector_magnetic_field

field strengths and directions inferred from Stokes polarimetry

LOS_velocity

the displacement of line center from rest wavelength/frequency in an arbitrary polarization state

vector_velocity

Two- or three-dimensional velocities, typically inferred from helioseismic inversion or from directly measured velocities transverse to the line of sight, possibly combined with Doppler velocities

wave_power
wave_phase
oscillation_mode_parameters

These all refer to solar internal or atmospheric acoustic-gravity wave measurements. The mode parameters could include frequencies, splittings, amplitudes, widths, etc.

number_density
particle_flux
composition
particle_velocity
thermal_velocity

in-situ observations

In addition to the above, the following classes have been suggested:

  • Electric Field Strength - the Stark effect splitting
  • Transverse Magnetic Field Strength - Hanle effect measurements
  • Stokes Parameters (I, Q, U, V) - equivalent to observables of net circular, linear and total polarization, and polarization angle
  • in situ Magnetic Field
  • Differential Emission Measure
  • Model Parameters - Interior, Atmosphere, Solar Wind

6. Data Organization

The data organization describes the physical meaning of the independent variable(s) with respect to which the observables are measured. This is useful for knowing whether and how different data sets can be directly compared, overlaid, mapped, or otherwise transformed.

Data_Layout

type: menu
FITS keyword: DATA_ORG
The following values are recognized:

image

data organized by two dimensions corresponding to angular displacement along the axes; examples include photograms (digital or digitized photographs), spectroheliograms, magnetograms, and Dopplergrams

map

data organized by two dimensions corresponding to spatial displacement along the axes; examples include synoptic charts

time_series

data organized by one dimension corresponding to temporal displacement along the axis; note that this is not the same as a time-tagged set of data records, since it implies sampling uniformity and provision for data gaps

movie

data organized by three dimensions corresponding to spatial or angular displacement along two axes and temporal displacement along the principal (most slowly varying) axis

spectrum

data organized by one dimension corresponding to displacement in electromagnetic wavelength or frequency along the axis

mode_spectrum

data organized by one or more dimensions corresponding to the quantum numbers of oscillations

spectral_temporal

data organized by two dimensions corresponding to displacement in wavelength or frequency along one axis and temporal displacement along the other

spatial_spectral

data organized by three dimensions, two corresponding to spatial or angular image axes and one corresponding to electromagnetic spectral displacement

7. Wave Mode Sampling

These parameters relate to data sets derived from helioseismic analysis of solar image data, specifically to global-mode analysis. No such data sets are currently available from any of the providers, so these search parameters have not yet been implemented.

Degree_Minimum

type: number
FITS keyword: L_MIN

Degree_Maximum

type: number
FITS keyword: L_MAX
The nominal minimum (maximum) of the spherical harmonic degree range associated with the data.

Degree_Step

type: number
FITS keyword: L_STP
The spacing between spherical harmonic degrees in the data.

8. Data Source

Observatory

type: menu
FITS keyword: OBSERVTY
An identifier of the observatory, space platform, or network of observatories (or spacecraft) from which the data originate. In the case of networks such as GONG or CLUSTER, the particular observatory site or spacecraft may be identified by Instrument if each member is single-instrument. In the case of multi-instrument multi-site networks, another Data Source search parameter (Site or Instance or Platform or Network) may be required. Note that network is used in the sense of functionally identical instruments deployed in different locations, and not coordinated data collections from distinct instruments, such as the H-alpha Network; that is considered a Provider. The recognized values are those in the data registry, and the list is subject to modification whenever the data registry is modified. At the time of writing, they include the following:

  • BBSO : Big Bear Solar Observatory
  • Evans Solar Telescope, Sacramento Peak
  • GONG : Global Oscillations Network Group
  • JSPO : Jeffreys South Pole Observatory
  • KANZ : Kanzelhöhe Solar Observatory
  • KPVT : Kitt Peak Vacuum Tower Telescope
  • McMath Solar Telescope, Kitt Peak
  • MEDN : Observatoire de Paris, Meudon
  • MLSO : Mauna Loa Solar Observatory
  • MtWilson : Mt. Wilson 60ft Tower Telescope
  • Nançay Radio Observatory
  • OACT : Osservatorio Astrofisico di Catania
  • PicMidi : Observatoire du Pic du Midi
  • SOHO : Solar and Heliospheric Observatory
  • SOLIS : Synoptic Optical Long-term Investigations of the Sun
  • OBSPM : Observatoire de Paris, Meudon
  • OVRO : Owens Valley Radio Observatory
  • TON : Taiwan Oscillations Network
  • YNAO : Yunnan Astronomical Observatory
  • Yohkoh

    For the current list, consult the Registry.

Instrument

type: menu
FITS keyword: INSTRUMT
For multi-instrument space observatories, an identifier of the particular instrument from which the data originate. For observatories, the Instrument may refer to a particular telescope or to one of multiple standard configurations of telescope plus detectors. For the list of instruments registered, consult the Registry.

Provider

type: menu
The identifier of the data archive providing search and retrieval functions for the data in question. The same data may of course be mirrored at two or more archives. Since the provider id is at least implicit in a data registry, this just means that the same data set would appear in multiple registries. Some data providers may be virtual; that is, the query services (but not the archive and distribution services) may be handled by other servers that act as proxies with access to the providers' database information.
Recognized values at the time of writing:

  • HANET : H-alpha Network, Big Bear Lake
  • HAO : High-Altitude Observatory, Boulder
  • MSU : Montana State University, Bozeman
  • NSO : National Solar Observatory, Tucson
  • OBSPM : Observatoire de Paris, Meudon
  • OVRO : Owens Valley Radio Observatory
  • SDAC : Solar Data Analysis Center, Greenbelt
  • SHA : Stanford Helioseismology Archive

    For the current list, consult the Registry.

9. Suggestions for Additional Search Parameters

The following search parameters or categories are under consideration for possible inclusion in future versions of the VSO Data Model:

  • Data processing information - menu?
  • Data format - menu? Possible data formats may include: ASCII, FITS, JPEG, GIF, PNG, MPEG, TIFF

10. Nicknames

Nicknames for common combinations of Search Parameters were introduced in Version 1.7 of the Data Model in a separate table. Here they are incorporated in the defining document. Certain problems remain to be resolved. For example, mechanisms are required for designating a logical OR of menu-type parameters, and for specifying whether a Bounding_Radius is an inner radius or an outer radius.

White-light image

Observable=intensity, Data_Layout=image, Wave_Type={ broad | narrow }, Wave_Minimum≥3000, Wave_Maximum≤10000

coronagraph image

Observable=intensity, Data_Layout=image, |Observation_Center_West|≤20, |Observation_Center_North|≤20, Bounding_Radius≥950 (excluded)

H-alpha image

Observable=intensity, Data_Layout=image, Wave_Type=line, Wave_Minimum≥6558, Wave_Maximum≤6568

Ca-K image

Observable=intensity, Data_Layout=image, Wave_Type=line, Wave_Minimum≥3919, Wave_Maximum≤3952

He 10830 image

Observable=intensity, Data_Layout=image, Wave_Type=line, Wave_Minimum≥10825, Wave_Maximum≤10833

Na-D image

Observable=intensity, Data_Layout=image, Wave_Type=line, Wave_Minimum≥5888, Wave_Maximum≤5898

Hard X-ray image

Observable=intensity, Data_Layout=image, Wave_Minimum≥0.2, Wave_Maximum≤10

Soft X-ray image

Observable=intensity, Data_Layout=image, Wave_Minimum≥5, Wave_Maximum≤150

EUV image

Observable=intensity, Data_Layout=image, Wave_Minimum≥100, Wave_Maximum≤1250

UV image

Observable=intensity, Data_Layout=image, Wave_Minimum≥900, Wave_Maximum≤3800

10.7 cm image

Observable=intensity, Data_Layout=image, Wave_Type=narrow, Wave_Minimum≥1.06*10^9, Wave_Maximum≤1.08*10^9

Continuum image

Observable=intensity, Data_Layout=image, Wave_Type=narrow

Full-disk magnetogram

Wave_Type=line, Data_Layout=image, Observable={ LOS_magnetic_field | vector_magnetic_field }, |Observation_Center_West|≤20, |Observation_Center_North|≤20, Bounding_Radius≥800

LOS magnetogram

Observable=LOS_magnetic_field, Data_Layout=image, Wave_Type=line

vector magnetogram

Observable=vector_magnetic_field, Data_Layout=image, Wave_Type=line

Full-disk dopplergram

Observable=LOS_velocity, Data_Layout=image, Wave_Type=line, |Observation_Center_West|≤20, |Observation_Center_North|≤20, Bounding_Radius≥800

Na-D dopplergram

Observable=LOS_velocity, Data_Layout=image, Wave_Type=line, Wave_Minimum≥5888, Wave_Maximum≤5898

Ni-6768 dopplergram

Observable=LOS_velocity, Data_Layout=image, Wave_Type=line, Wave_Minimum≥6767, Wave_Maximum≤6769

K-7699 dopplergram

Observable=LOS_velocity, Data_Layout=image, Wave_Type=line, Wave_Minimum≥7698, Wave_Maximum≤7700

EUV Spectrum

Observable=intensity, Data_Layout=spectrum, Wave_Type=broad, Wave_Minimum≥100, Wave_Maximum≤1250

UV Spectrum

Observable=intensity, Data_Layout=spectrum, Wave_Type=broad, Wave_Minimum≥900, Wave_Maximum≤3800

Visible Spectrum

Observable=intensity, Data_Layout=spectrum, Wave_Type=broad, Wave_Minimum≥3500, Wave_Maximum≤10000

IR Spectrum

Observable=intensity, Data_Layout=spectrum, Wave_Type=broad, Wave_Minimum≥7000, Wave_Maximum≤3.5*10^6

Atlas Spectrum

Observable=intensity, Data_Layout=spectrum, Wave_Type=broad (?)

Helioseismic Time series

Observable={ wave_power | wave_phase | oscillation_mode_parameters }

Light Curve Time series

Observable=intensity, Data_Layout=time_series
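
One possible way to represent nicknames, including a logical OR over menu values, is sketched below in Python. The `NICKNAMES` table, the `satisfies` function, and the encoding of constraints (a set for menu-type parameters, a closed range for numeric ones) are assumptions for illustration; the Data Model leaves these mechanisms unresolved.

```python
from math import inf
from typing import Any, Dict

# Hypothetical encoding: a frozenset lists the allowed menu values (logical
# OR); a (low, high) tuple is a closed numeric range, with inf for open ends.
NICKNAMES: Dict[str, Dict[str, Any]] = {
    "White-light image": {
        "Observable": frozenset({"intensity"}),
        "Data_Layout": frozenset({"image"}),
        "Wave_Type": frozenset({"broad", "narrow"}),
        "Wave_Minimum": (3000.0, inf),
        "Wave_Maximum": (-inf, 10000.0),
    },
    "H-alpha image": {
        "Observable": frozenset({"intensity"}),
        "Data_Layout": frozenset({"image"}),
        "Wave_Type": frozenset({"line"}),
        "Wave_Minimum": (6558.0, inf),
        "Wave_Maximum": (-inf, 6568.0),
    },
}


def satisfies(description: Dict[str, Any], nickname: str) -> bool:
    """True if a data description meets every constraint of the nickname."""
    for element, constraint in NICKNAMES[nickname].items():
        if element not in description:
            continue  # absent elements are assumed to match any query
        value = description[element]
        if isinstance(constraint, frozenset):
            if value not in constraint:
                return False
        else:
            low, high = constraint
            if not (low <= value <= high):
                return False
    return True


halpha = {
    "Observable": "intensity",
    "Data_Layout": "image",
    "Wave_Type": "line",
    "Wave_Minimum": 6562.3,
    "Wave_Maximum": 6563.3,
}
```

Here `satisfies(halpha, "H-alpha image")` is true, while `satisfies(halpha, "White-light image")` is false because the Wave_Type value `line` is not among the allowed alternatives.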