| 1 | = Minimum Information for Solar Observations = |
| 2 | |
| 3 | == draft v3.1 == |
| 4 | |
| 5 | After months of trying to come up with various recommendations for 'best practices', this document is an attempt at a 'Minimum Information' standard in the spirit of the bioinformatics community's [http://mibbi.sourceforge.net/about.shtml MIBBI project]. |
| 6 | |
| 7 | If you have any comments, please contact [People/JoeHourcle Joe Hourclé]. |
| 8 | |
| 9 | ---- |
| 10 | |
| 11 | = Documenting Solar Data = |
| 12 | |
| 13 | Goals: |
| 14 | |
| 15 | * Ensure that current and future researchers can use your data. |
| 16 | * Ensure that researchers will properly acknowledge your data. |
| 17 | * Reduce the amount of time needed to support researchers. |
| 18 | * Reduce the likelihood of data or metadata being misunderstood. |
| 19 | * Reduce the chance of improper use of the data. |
| 20 | * Reduce the amount of effort needed to use the data. |
| 21 | |
| 22 | Well-thought-out documentation, organization, file naming and metadata (FITS headers) will make a difference. |
| 23 | |
| 24 | Both solar physicists and non-discipline scientists should be able to easily understand what is in a data file from an instrument that they have never dealt with before, and quickly determine if it is useful for their purposes. |
| 25 | |
| 26 | The following questions should be answered by the documentation or the data files themselves. Where possible, individual files should provide a link to additional documentation. Active missions should review this information annually, or whenever there are significant updates. |
| 27 | |
| 28 | NOTE : Because URLs to documentation may change over time, the SDAC is looking into providing stable URLs (under !http://data.virtualsolar.org/... or !http://solardata.org/...) that would collect up relevant information and links that may change. We're also looking into registering DOIs for these documents. See [http://vso1.nascom.nasa.gov/spd2012/2012_SPD_citation.pdf Recommendations for Data & Software Citation in Solar Physics] (2012 SPD poster) and [/wiki/Citation Guidelines / Recommendations for Citing Data] for more information. |
| 29 | |
| 30 | ----- |
| 31 | |
| 32 | == The Overall Collection (High Level) == |
| 33 | |
| 34 | * What is the name of the experiment? |
| 35 | * Who ran the experiment? |
| 36 | * (organization/institution, PIs) |
| 37 | * If a researcher has questions: |
| 38 | * Where can they get documentation? |
| 39 | * (website; published papers) |
| 40 | * How can they get help or report possible problems? |
| 41 | * (website w/ contact info or a generic email like 'instrument@...' ) |
| 42 | * How should the experiment be acknowledged in published research? |
| 43 | * What was the goal of the experiment? |
| 44 | * What instruments were used to perform the experiment? |
| 45 | * (names, acronyms/abbreviations) |
| 46 | * Where were they? |
| 47 | * (spacecraft or observatory name, general location (eg, 'near L1', 'near Earth', 'Tenerife, Canary Islands')) |
| 48 | * When did the experiment run? |
| 49 | * What type of observations were collected? |
| 50 | * What type of derived products are available? |
| 51 | * (eg, 'white light coronograms', 'EUV images', 'X-ray spectroscopy', 'daily plots', 'Carrington maps') |
| 52 | * Are there caveats or other warnings for potential users of the data? |
| 53 | * (eg, issues with the collection process that might make the data unsuitable for specific uses; known environmental conditions that introduce error during certain periods? Known biases introduced in the calibration or other processing? Known misleading metadata (eg, clock drift)? Any other potential sources of error?) |
| 54 | * Is there software in SolarSoft to use the data? |
| 55 | * If so, where can we get documentation on using it? |
| 56 | * Is there any other recommended software to use the data? |
| 57 | |
| 58 | ----- |
| 59 | |
| 60 | == Dataset Details (Mid Level) == |
| 61 | |
| 62 | * What different datasets are in the overall collection? |
| 63 | * ... different sensors / detectors / cameras |
| 64 | * ... different observing modes (filters, polarization, cadences, exposure times) |
| 65 | * ... different processed forms |
| 66 | |
| 67 | (for background, see [http://dx.doi.org/10.2218/ijdc.v6i1.183 Wynholds, "Linking to Scientific Data: Identity Problems of Unruly and Poorly Bounded Digital Objects"]) |
| 68 | |
| 69 | For each specific dataset: |
| 70 | * Is there a name or title to distinguish it from the other available datasets? |
| 71 | * What type of data is it? |
| 72 | * (eg, intensity, magnetic field, temperature) |
| 73 | * What are the defining characteristics of the dataset? (eg, level of processing, detector used, calibration version, observing mode (filters, exposure time, cadence, etc.)) |
| 74 | * What is the purpose / intended use of this specific dataset? |
| 75 | * (why was the dataset created?) |
| 76 | * Is there a contact for this dataset different from the larger collection? |
| 77 | * Should it be cited or acknowledged differently from the rest of the collection? |
| 78 | * Are there specific or additional caveats? |
| 79 | * (eg, DATE_OBS is a coordinated time, not the spacecraft time) |
| 80 | * Is this final data, or is there a chance it will be revised? |
| 81 | * Is this quicklook data or otherwise unsuited for science use? |
| 82 | * How has the dataset been processed? |
| 83 | * (flat fielded, limb darkened, reduced, correction for point-spread, compressed, etc.) |
| 84 | * Is the calibration reversible? |
| 85 | * What dataset is this derived from (or is it the lowest level available?) |
| 86 | * What time reference are you using? |
| 87 | * (UTC, GPS, UNIX, TAI, spacecraft clock) |
| 88 | * What is the volume of the dataset? |
| 89 | * Total number of images or data records? |
| 90 | * Overall volume on disk (in GB or TB)? |
| 91 | * (if still in planning stages, how quickly is it expected to grow?) |
| 92 | |
| 93 | * Which datasets are considered to be 'level0'? |
| 94 | * (or the lowest level available on the ground) |
| 95 | * How are the different datasets related? |
| 96 | * (eg, calibrated version of ..., reduced form of ..., repackaging of ..., etc.) |
| 97 | * How is the data organized? |
| 98 | * What does each file represent? |
| 99 | * How are the filenames constructed? |
| 100 | * Are the filenames unique, or is directory location significant? |
| 101 | * How are the directories structured? |
| 102 | * (eg, 'year/month/day/instrument' vs. 'instrument/year/...') |
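
As one way of answering the filename and directory questions above, here is a minimal Python sketch. The naming pattern, the instrument name `xyz` and the level label `lev1` are all hypothetical, chosen only for illustration and not taken from any mission. The idea it shows is that a filename carrying the instrument, processing level and observation time stays unique even after the file is copied out of its directory.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def build_path(instrument, level, obs_time, extension="fits"):
    """Return a 'year/month/day/instrument' path ending in a unique filename.

    The pattern is a hypothetical example, not a community standard.
    """
    utc_time = obs_time.astimezone(timezone.utc)
    # Compact ISO 8601 timestamp in UTC, so filenames sort chronologically.
    stamp = utc_time.strftime("%Y%m%dT%H%M%SZ")
    filename = f"{instrument}_{level}_{stamp}.{extension}"
    directory = PurePosixPath(utc_time.strftime("%Y/%m/%d")) / instrument
    return directory / filename


example = build_path(
    "xyz", "lev1", datetime(2012, 6, 10, 14, 30, 5, tzinfo=timezone.utc)
)
print(example)
```

Whatever pattern is actually used, the documentation should spell out each field of the filename and state which time reference the timestamp follows.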
| 103 | |
| 104 | ----- |
| 105 | |
| 106 | == File & Observation Specific Details (Low Level) == |
| 107 | |
| 108 | NOTE : FITS allows for all of this metadata to be included in each individual file, and where possible, this is the recommended practice. If reprocessing the files would be a burden, or for other file formats, you might have tarballs or [http://tools.ietf.org/html/draft-kunze-bagit-09 BagIt archives] which include README and checksum files. There may also be a catalog of the data in FITS tables, CSV or some other format, that includes this information. |
| 109 | |
| 110 | * Is the file in a self-describing scientific format? |
| 111 | * Does it mention how to uniquely reference this file or observation to report problems or check for an updated calibration? |
| 112 | * Does it mention which specific dataset it's a member of? |
| 113 | * Does it provide a URL or other reference to the documentation? |
| 114 | |
| 115 | * Have you provided: |
| 116 | * The time of the observation? |
| 117 | * The duration (exposure) of the observation? |
| 118 | * The location of the detector? |
| 119 | * The pointing of the detector (if appropriate)? |
| 120 | * Any other details of the observing mode that may vary between observations? |
| 121 | |
| 122 | * A checksum to verify file integrity? |
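
For the checksum item above, a minimal sketch of how a data user might verify file integrity with Python's standard `hashlib` module. The choice of SHA-256 is an assumption made for this example; this document does not prescribe an algorithm, and the function names are illustrative only.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's digest matches the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Whichever algorithm is used, the published checksum should say which one it is, so that users know how to reproduce it.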
| 123 | |
| 124 | ----- |
| 125 | |
| 126 | == FITS Specific issues == |
| 127 | |
| 128 | * Does the file clearly state that it's a FITS file? |
| 129 | * Is there a reference to the FITS standard, or a link to the FITS website? |
| 130 | |
| 131 | * Do the headers include the assigned filename (FILENAME)? |
| 132 | * Do the headers include units in the comments? |
| 133 | * Do the headers spell out abbreviations or other coded values? |
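
To illustrate the header questions above, here is a simplified Python sketch that formats 80-character FITS header cards with the unit in square brackets at the start of the comment, which is the placement the FITS standard recommends. It is not a full implementation of the standard (it ignores long strings, complex values and CONTINUE cards), and the function name and the example values are hypothetical.

```python
def format_card(keyword, value, comment="", unit=None):
    """Format one 80-character FITS header card (simplified sketch)."""
    if len(keyword) > 8:
        raise ValueError(f"keyword {keyword!r} is longer than 8 characters")
    if unit:
        # Units go in square brackets at the start of the comment.
        comment = f"[{unit}] {comment}".strip()
    if isinstance(value, bool):
        field = ("T" if value else "F").rjust(20)
    elif isinstance(value, (int, float)):
        # Right-justify numbers so they end in column 30; uppercase the exponent.
        field = repr(value).upper().rjust(20)
    else:
        # Quote strings, double embedded quotes, pad to at least 8 characters.
        text = str(value).replace("'", "''")
        field = f"'{text:<8}'".ljust(20)
    card = f"{keyword.upper():<8}= {field}"
    if comment:
        card += f" / {comment}"
    if len(card) > 80:
        raise ValueError(f"card for {keyword} exceeds 80 characters")
    return card.ljust(80)


# Hypothetical values: the assigned filename, an exposure time with units,
# and a coded value whose abbreviation is spelled out in the comment.
print(format_card("FILENAME", "xyz_lev1_20120610T143005Z.fits"))
print(format_card("EXPTIME", 2.9, "exposure duration", unit="s"))
print(format_card("POLAR", "LCP", "LCP = left circular polarization"))
```

In practice an established FITS library would write the headers; the sketch only shows what a self-explanatory card can look like.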
| 134 | |
| 135 | ---- |
| 136 | |
| 137 | == References, Recommended Reading & Other Related Stuff: == |
| 138 | |
| 139 | * [http://dx.doi.org/10.2218/ijdc.v6i1.183 Wynholds, "Linking to Scientific Data: Identity Problems of Unruly and Poorly Bounded Digital Objects"] |
| 140 | * [/wiki/Citation Guidelines / Recommendations for Citing Data] |
| 141 | * [/wiki/Checklists Checklists for documenting solar physics data & catalogs] |
| 142 | * [http://vso1.nascom.nasa.gov/spd2012/2012_SPD_FITS_headers.pdf Recommendation for FITS Headers], poster from 2012 SPD meeting. |
| 143 | |
| 144 | |