Last modified on 06/12/13 16:13:13

Minimum Information for Solar Observations

draft v3.1

After months of trying to come up with various recommendations for 'best practices', this document is an attempt at a 'Minimum Information' standard in the spirit of the bioinformatics community's MIBBI project.

If you have any comments, please contact Joe Hourclé.


Documenting Solar Data

Goals:

  • Ensure that current and future researchers can use your data.
  • Ensure that researchers will properly acknowledge your data.
  • Reduce the amount of time needed to support researchers.
  • Reduce the likelihood of data or metadata being misunderstood.
  • Reduce the chance of improper use of the data.
  • Reduce the amount of effort needed to use the data.

Well-thought-out documentation, organization, file naming, and metadata (FITS headers) will make a difference.

Both solar physicists and non-discipline scientists should be able to easily understand what is in a data file from an instrument that they have never dealt with before, and quickly determine if it is useful for their purposes.

The following questions should be answered by the documentation or the data files themselves. Where possible, individual files should provide a link to where additional documentation can be found. Active missions should review this information annually, or whenever significant updates occur.

NOTE : Because URLs to documentation may change over time, the SDAC is looking into providing stable URLs (under http://data.virtualsolar.org/... or http://solardata.org/...) that would collect up relevant information and links that may change. We're also looking into registering DOIs for these documents. See Recommendations for Data & Software Citation in Solar Physics (2012 SPD poster) and Guidelines / Recommendations for Citing Data for more information.


The Overall Collection (High Level)

  • What is the name of the experiment?
  • Who ran the experiment?
    • (organization/institution, PIs)
  • If a researcher has questions:
    • Where can they get documentation?
      • (website; published papers)
    • How can they get help or report possible problems?
      • (website with contact info or a generic email like 'instrument@...')
  • How should the experiment be acknowledged in published research?
  • What was the goal of the experiment?
  • What instruments were used to perform the experiment?
    • (names, acronyms/abbreviations)
    • Where were they?
      • (spacecraft or observatory name, general location (eg, 'near L1', 'near Earth', 'Tenerife, Canary Islands'))
  • When did the experiment run?
  • What type of observations were collected?
  • What type of derived products are available?
    • (eg, 'white light coronograms', 'EUV images', 'X-ray spectroscopy', 'daily plots', 'Carrington maps')
  • Are there caveats or other warnings for potential users of the data?
    • (eg, issues with the collection process that might make the data unsuitable for specific uses; known environmental conditions that introduce error during certain periods; known biases introduced in the calibration or other processing; known misleading metadata, such as clock drift; any other potential sources of error)
  • Is there software in SolarSoft to use the data?
    • If so, where can we get documentation on using it?
  • Is there any other recommended software to use the data?

Dataset Details (Mid Level)

  • What different datasets are in the overall collection?
    • ... different sensors / detectors / cameras
    • ... different observing modes (filters, polarization, cadences, exposure times)
    • ... different processed forms

(for background, see Wynholds, "Linking to Scientific Data: Identity Problems of Unruly and Poorly Bounded Digital Objects")

For each specific dataset:

  • Is there a name or title to distinguish it from the other available datasets?
  • What type of data is it?
    • (eg, intensity, magnetic field, temperature)
  • What are the defining characteristics of the dataset?
    • (eg, level of processing, detector used, calibration version, observing mode (filters, exposure time, cadence, etc.))
  • What is the purpose / intended use of this specific dataset?
    • (why was the dataset created?)
  • Is there a contact for this dataset different from the larger collection?
  • Should it be cited or acknowledged differently from the rest of the collection?
  • Are there specific or additional caveats?
    • (eg, DATE_OBS is a coordinated time, not the spacecraft time)
  • Is this final data, or is there a chance it will be revised?
  • Is this quicklook data or otherwise unsuited for science use?
  • How has the dataset been processed?
    • (flat fielded, limb darkened, reduced, correction for point-spread, compressed, etc.)
  • Is the calibration reversible?
  • What dataset is this derived from (or is it the lowest level available)?
  • What time reference are you using?
    • (UTC, GPS, UNIX, TAI, spacecraft clock)
  • What is the volume of the dataset?
    • Total number of images or data records?
    • Overall volume on disk (in GB or TB)?
    • (if still in planning stages, how quickly is it expected to grow?)
  • Which datasets are considered to be 'level0'?
    • (or the lowest level available on the ground)
  • How are the different datasets related?
    • (eg, calibrated version of ..., reduced form of ..., repackaging of ..., etc.)
  • How is the data organized?
    • What does each file represent?
    • How are the filenames constructed?
    • Are the filenames unique, or is directory location significant?
    • How are the directories structured?
      • (eg, 'year/month/day/instrument' vs. 'instrument/year/...')
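
As one illustration of the organization questions above, here is a minimal sketch of a naming scheme in which the filename alone is unique and the directory can be derived from it. The instrument name `xyzcam`, the level tag `lev1`, the function `build_path`, and the `instrument/year/month/day` layout are all hypothetical choices made for this example, not conventions required by this document.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def build_path(instrument: str, level: str, obs_time: datetime) -> PurePosixPath:
    """Return a relative path of the form instrument/YYYY/MM/DD/<filename>.fits.

    Hypothetical scheme: the filename repeats the instrument, the processing
    level and the UTC observation time, so it stays unique even when the
    file is copied out of its directory.
    """
    if obs_time.tzinfo is None:
        # Refuse naive times so the time reference is never ambiguous.
        raise ValueError("obs_time must be timezone-aware")
    utc = obs_time.astimezone(timezone.utc)
    filename = f"{instrument}_{level}_{utc:%Y%m%dT%H%M%S}.fits"
    return PurePosixPath(instrument, f"{utc:%Y}", f"{utc:%m}", f"{utc:%d}", filename)


example = build_path(
    "xyzcam", "lev1", datetime(2013, 6, 12, 16, 13, 13, tzinfo=timezone.utc)
)
print(example)
```

Whatever scheme is chosen, documenting it in this much detail lets a user reconstruct a file's location from its name, and vice versa.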

File & Observation Specific Details (Low Level)

NOTE : FITS allows for all of this metadata to be included in each individual file, and where possible, this is the recommended practice. If reprocessing the files would be a burden, or for other file formats, you might instead distribute tarballs or BagIt archives that include README and checksum files. There may also be a catalog of the data, in FITS tables, CSV, or some other format, that includes this information.
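
For the checksum files mentioned in the note, the following is a minimal sketch of generating one checksum line per file, loosely in the style of a BagIt payload manifest (checksum, whitespace, relative path). It is not a complete BagIt implementation: it writes no `bagit.txt` and assumes no `data/` directory. The function names are hypothetical, and SHA-256 is simply one reasonable choice of algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(root: Path) -> list[str]:
    """Return '<checksum>  <relative path>' lines for every file under root."""
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}")
    return lines
```

The resulting lines could be written to a text file shipped alongside the data, so that users can verify their copies after transfer.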

  • Is the file in a self-describing scientific format?
  • Does it mention how to uniquely reference this file or observation to report problems or check for an updated calibration?
  • Does it mention which specific dataset it's a member of?
  • Does it provide a URL or other reference to the documentation?
  • Have you provided:
    • The time of the observation?
    • The duration (exposure) of the observation?
    • The location of the detector?
    • The pointing of the detector (if appropriate)?
    • Any other details of the observing mode that may vary between observations?
    • A checksum to verify file integrity?
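
The questions above lend themselves to an automated check before files are released. The sketch below assumes the file's metadata has already been read into a plain dictionary. The mapping from checklist items to keywords (`DATE-OBS`, `EXPTIME`, `OBSRVTRY`, `CRVAL1`, `CRVAL2`, `CHECKSUM`) is illustrative only; each mission would substitute the keywords it actually uses.

```python
# Hypothetical mapping from checklist questions to header keywords.
# Real datasets choose their own keyword names; adjust this table to match.
CHECKLIST = {
    "time of the observation": ["DATE-OBS"],
    "duration (exposure) of the observation": ["EXPTIME"],
    "location of the detector": ["OBSRVTRY"],
    "pointing of the detector": ["CRVAL1", "CRVAL2"],
    "checksum to verify file integrity": ["CHECKSUM"],
}


def missing_items(header: dict) -> list[str]:
    """Return the checklist items whose keywords are absent or blank."""
    missing = []
    for item, keywords in CHECKLIST.items():
        # An item counts as answered only if every keyword has a value.
        if not all(str(header.get(key, "")).strip() for key in keywords):
            missing.append(item)
    return missing


example_header = {
    "DATE-OBS": "2013-06-12T16:13:13",
    "EXPTIME": 2.5,
    "OBSRVTRY": "Example Observatory",
    "CRVAL1": 0.0,
}
print(missing_items(example_header))
```

A check like this only confirms that values are present; whether they are correct and well documented still needs human review.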

FITS Specific issues

  • Does the file clearly state that it's a FITS file?
    • Is there a reference to the FITS standard, or a link to the FITS website?
  • Do the headers include the assigned filename (FILENAME)?
  • Do the headers include units in the comments?
  • Do the headers spell out abbreviations or other coded values?
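
The question about units in the comments can also be checked mechanically. The FITS standard recommends placing the units in square brackets at the start of a keyword's comment, and the sketch below looks for that pattern. It is a hand-rolled parser that handles only simple 80-character cards, written to keep the example self-contained; an established FITS library is the better choice for real work. The header contents and the helper names are made up for illustration.

```python
import re

CARD_LENGTH = 80  # every FITS header card is exactly 80 characters
UNIT_PREFIX = re.compile(r"^\[[^\]]+\]")  # e.g. "[s] exposure time"


def make_card(keyword, value, comment):
    """Build a simple 80-character card (demo helper, not a full FITS writer)."""
    field = value.ljust(20) if value.startswith("'") else value.rjust(20)
    return f"{keyword:<8}= {field} / {comment}".ljust(CARD_LENGTH)


def split_card(card):
    """Split one card into (keyword, value, comment); value is None if absent."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        return keyword, None, card[8:].strip()  # COMMENT, HISTORY, END, blank
    rest = card[10:].lstrip()
    if rest.startswith("'"):
        # Quoted string: find the closing quote, skipping doubled quotes.
        i = 1
        while i < len(rest):
            if rest[i] == "'":
                if rest[i + 1:i + 2] == "'":
                    i += 2
                    continue
                break
            i += 1
        value, tail = rest[:i + 1], rest[i + 1:]
    else:
        value, slash, tail = rest.partition("/")
        tail = slash + tail
    comment = tail.partition("/")[2].strip()
    return keyword, value.strip(), comment


def keywords_without_units(header_text, keywords):
    """Return those keywords whose comment does not begin with '[unit]'."""
    comments = {}
    for start in range(0, len(header_text), CARD_LENGTH):
        keyword, value, comment = split_card(header_text[start:start + CARD_LENGTH])
        if keyword == "END":
            break
        if value is not None:
            comments[keyword] = comment
    return [k for k in keywords if not UNIT_PREFIX.match(comments.get(k, ""))]


header_text = "".join([
    make_card("SIMPLE", "T", "conforms to the FITS standard"),
    make_card("FILENAME", "'xyzcam_lev1_20130612T161313.fits'", "assigned filename"),
    make_card("EXPTIME", "2.50", "[s] exposure time"),
    make_card("WAVELNTH", "171", "wavelength of observation"),
    "END".ljust(CARD_LENGTH),
])
print(keywords_without_units(header_text, ["EXPTIME", "WAVELNTH"]))
```

Here the `WAVELNTH` card would be flagged, because its comment describes the quantity but never states the units.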

References, Recommended Reading & Other Related Stuff: