Last modified on 09/19/13 06:17:07

Notes about Vocabulary / Terminology on Data Citation

(these notes were made late at night, September 18th, 2013 after 3 days of meetings ... might not be fully coherent, especially not without the context of the "Data Citation Principles" proposal we were discussing -Joe H.)

After thinking about it while stuck in DC traffic on the way home, I realized that there might be as many as five[!1] different things that we've been calling "Data Citation" :

  1. There is the reference from a given item to another item.

example : "SOHO/EIT level 1 data" is derived from "SOHO/EIT level 0 data" and "SOHO/EIT calibration".

  2. There is the string that is a serialized reference to the data.

example : Hutyra, L., S. Wofsy and S. Saleska. 2007. LBA-ECO CD-10 CO2 and H2O Eddy Fluxes at km 67 Tower Site, Tapajos National Forest. Data set. Available online at from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee. U.S.A. doi:10.3334/ORNLDAAC/860

  3. There is the document containing information about the data. (ie, the 'landing page' that we're not trying to mention.)

example :

  4. There is the information contained in #3. (the DataCite metadata, plus whatever other metadata that community may deem useful)
  5. There is the whole process as a more abstract concept.

example : data citation should be part of tenure & promotion considerations.

  6. There is the action of citing data. (I might be starting to stretch things a bit ... but a specific instance of citation; the action vs. the linkage or the record of the action)


If we know what 'potato' is, and we know what 'pancakes' are, we're likely to make an assumption that when someone says 'potato pancakes' they mean something closer to matafan, boxty or a potato fritter rather than latkes.[!2]

Because of the different definitions for 'citation', it's likely that most people will assume 1, 2 or 5, and might get those from context. I believe it's unlikely that they're going to know we're talking about 3 or 4 from the 'Data Citation Principles' document.

I think we need some new term for 3 & 4 ... 'metadata' is likely too generic for #4, and there were problems with just 'description' as we're dealing with a much more formalized item.

I don't think I have a good term yet, but here are some notes that might spur someone to think of something:

  * Data Record (we're trying to establish a record for data ... Ruth thought this is likely to be confused with Data Granule)
  * Data Description (maybe capitalized, to suggest that it's got a more formal definition? ... although I don't want to get into the 'publication' vs. 'Publication' discussions all over again)
  * Front Matter
  * Bibliographic Record (the 'BR' in 'FRBR')

I mentioned to Maryann in a break that I think it's important to distinguish 1/2 vs. 3/4:

  * in 3/4, we only have attributes about the data, and the publication of the data (eg, when it was released)
  * in 1/2, we also have information about the linkage itself. (subsetting of the data being cited and the [http:/ CITO] relationships)
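
To make the 1/2 vs. 3/4 split a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the class names, the field names and the serialization format are made up, and none of them come from the 'Data Citation Principles' document or the DataCite schema. The `relation` value is meant as a CiTO-style label, and the citing item is hypothetical. The only point is that the description (3/4) has no place to hold anything about the link, while the reference (1) does, and the string (2) is just one flattening of it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataDescription:
    """Senses 3/4: attributes of the data and of its publication, nothing else."""
    creators: str
    year: int
    title: str
    publisher: str
    identifier: str  # e.g. a DOI


@dataclass(frozen=True)
class DataReference:
    """Sense 1: a link from one item to another, plus facts about the link itself."""
    source: str                   # the citing item (hypothetical name below)
    target: DataDescription       # the data being cited
    relation: str                 # assumed CiTO-style relationship label
    subset: Optional[str] = None  # which part of the data was used, if known

    def serialize(self) -> str:
        """Sense 2: flatten the reference into a human-readable string."""
        t = self.target
        text = f"{t.creators}. {t.year}. {t.title}. {t.publisher}. {t.identifier}"
        if self.subset:
            text += f" [subset: {self.subset}]"
        return text


# The description reuses the citation quoted as the example for sense 2.
description = DataDescription(
    creators="Hutyra, L., S. Wofsy and S. Saleska",
    year=2007,
    title="LBA-ECO CD-10 CO2 and H2O Eddy Fluxes at km 67 Tower Site, "
          "Tapajos National Forest",
    publisher="Oak Ridge National Laboratory Distributed Active Archive Center",
    identifier="doi:10.3334/ORNLDAAC/860",
)

# The citing item and the relation label are placeholders, not real records.
ref = DataReference(
    source="some-hypothetical-paper",
    target=description,
    relation="cito:citesAsDataSource",
)

print(ref.serialize())
```

Two references to the same description can differ in `relation` or `subset` without the description changing at all, which is the distinction I was trying to get at above.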


[!1] I had three when I started writing this, but then realized I had to differentiate 1&2 and 3&4. And now I just added a 6th. Blah.

[!2] Not all of you have seen my various pancake presentations ... see or for the 5 min video