Last modified on 01/16/15 17:49:37

DRMS Storage Units

As you read through the DRMS / SUMS documentation, you'll often see mention of 'sunums'. A 'sunum' (Storage Unit NUMber) is the identifier of a 'Storage Unit', but the term is used both for the identifiers and for the general concept of storage units themselves. Within the SUMS database, the storage unit number is typically given the field name 'ds_index'.

Storage units are directories on disk, but they don't have a one-to-one relationship with records within DRMS (which are identified by a series name and the 'recnum', or RECord NUMber).

If you look in the SUMS table 'sum_main', you will see records corresponding to each storage unit on disk. As a remote site, we generally only care about the following fields::

  ds_index      : corresponds to the 'sunum' field in DRMS
  owning_series : the series that the storage unit was registered under
  online_loc    : full path to the storage unit on disk.
  bytes         : size of the storage unit, in bytes
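
A lookup against these fields might be sketched as below. This is only an illustration: it uses an in-memory SQLite table as a stand-in for the real SUMS database, the column types are guesses, and the row values (including the partition name '/SUM0') are invented placeholders. The helper name lookup_storage_unit is hypothetical.

```python
import sqlite3

# Stand-in for the SUMS database. A real site would connect to its own
# database server instead; the column types here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sum_main ("
    "ds_index INTEGER, owning_series TEXT, online_loc TEXT, bytes INTEGER)"
)
# Placeholder row, invented purely for illustration.
conn.execute(
    "INSERT INTO sum_main VALUES (?, ?, ?, ?)",
    (12345, "aia.lev1", "/SUM0/D12345", 1000),
)


def lookup_storage_unit(conn, sunum):
    """Return (owning_series, online_loc, bytes) for one sunum, or None."""
    cur = conn.execute(
        "SELECT owning_series, online_loc, bytes "
        "FROM sum_main WHERE ds_index = ?",
        (sunum,),
    )
    return cur.fetchone()


row = lookup_storage_unit(conn, 12345)
```

The query is parameterized on ds_index because that is the field that corresponds to the sunum on the DRMS side.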

There will also be a corresponding record in sum_partn_alloc, with the following fields:

  wd             : the full path to the storage unit (corresponds to sum_main.online_loc)
  sumid          : I have no idea what this is, but it is *not* the sunum.
  ds_index       : the storage unit identifier (corresponds to sunum in the DRMS series)
  bytes          : size of the storage unit, in bytes
  effective_date : a stringified date (YYYYMMDDH24mmSS) giving the earliest time at which 'sum_rm' is allowed to delete this storage unit
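
Since effective_date is stored as a string, comparing it against the current time needs a parsing step. The sketch below assumes the value is exactly the 14-digit layout described above (four-digit year through two-digit seconds, 24-hour clock). The time zone is not stated here, so the result is a naive datetime. Both function names are hypothetical.

```python
from datetime import datetime


def parse_effective_date(value):
    """Parse an effective_date string, assuming a 14-digit YYYYMMDDHHMMSS layout."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


def may_be_deleted(value, now):
    """True once 'now' has reached the earliest allowed deletion time."""
    return now >= parse_effective_date(value)
```

For example, may_be_deleted("20150116174937", datetime.now()) compares the stored date against the local clock. If your site's values turn out to have a different length, the format string would need adjusting.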

Note that 'bytes' seems to be found by summing the sizes of the files together, so it will always be lower than the amount of space taken on the disks (as there's no check for how many file blocks were allocated).
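
To see the difference on a particular storage unit, you can compare the summed file sizes with the blocks actually allocated. This is a minimal sketch and not how SUMS itself computes 'bytes'. It relies on st_blocks, which POSIX defines in 512-byte units and which is missing on some platforms (treated as 0 here). It also ignores the space taken by the directories themselves.

```python
import os


def storage_unit_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for all files under path."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            # st_blocks counts 512-byte units; absent on some platforms.
            allocated += getattr(st, "st_blocks", 0) * 512
    return apparent, allocated
```

The first value is the one to compare against the 'bytes' field. The second is closer to what tools such as 'du' report.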

The 'online_loc'/'wd' field will likely be of the format 'partition/D#####', where ##### corresponds to an sunum (ds_index) and 'partition' corresponds to an entry from 'sum_partn_avail'. There is an alternate format that you may see in JMD log records, which is 'partition/D####/D#####'. We have been told that these are storage units that were restored from tape.
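
Both path formats can be picked apart with a single regular expression. The sketch below assumes the last path component is always 'D' followed by the sunum, and that the nested form has exactly one extra 'D' directory above it. The partition name '/SUM0' in the usage note is a made-up placeholder, and the names SuPath and parse_online_loc are hypothetical.

```python
import re
from collections import namedtuple

SuPath = namedtuple("SuPath", ["partition", "sunum", "nested"])

# partition, an optional intermediate D-directory, then D<sunum>.
_SU_PATH = re.compile(
    r"^(?P<partition>.+?)(?P<parent>/D\d+)?/D(?P<sunum>\d+)/?$"
)


def parse_online_loc(path):
    """Split an online_loc/wd path; return None if it fits neither format."""
    m = _SU_PATH.match(path)
    if m is None:
        return None
    return SuPath(
        partition=m.group("partition"),
        sunum=int(m.group("sunum")),
        nested=m.group("parent") is not None,
    )
```

With this, parse_online_loc("/SUM0/D12345") gives partition '/SUM0' and sunum 12345, and the 'nested' flag distinguishes the 'partition/D####/D#####' form.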

Within a given storage unit will be one or more directories and a file called Records.txt. Records.txt will contain something such as:

series=aia.lev1
slot	record number
0	149593002

Each 'slot' corresponds to one of the directories within the storage unit directory. Slot 0 corresponds to 'S00000' and they increment from there. This particular file tells us that the files in S00000 were created because of the creation of aia.lev1 recnum 149593002. (Note that due to the journaling nature of DRMS series, there may be a more recent recnum that should be used to extract the metadata when serving the file to the public; this is handled by the program 'drms_export'.)
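
Reading Records.txt programmatically could look like the sketch below. It assumes the layout shown in the sample above: a 'series=' line, a header line, then one whitespace-separated 'slot recnum' pair per line. It also assumes slot directory names stay zero-padded to five digits as in 'S00000'. The function names are hypothetical.

```python
def parse_records_txt(text):
    """Return (series, {slot: recnum}) from the contents of a Records.txt."""
    series = None
    slots = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("series="):
            series = line[len("series="):]
            continue
        fields = line.split()
        # Data lines are two integers; the header line is skipped.
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            slots[int(fields[0])] = int(fields[1])
    return series, slots


def slot_dirname(slot):
    """Directory name for a slot number, e.g. 0 -> 'S00000'."""
    return f"S{slot:05d}"
```

Remember that the recnum found here is the one that created the slot, which is not necessarily the most recent recnum for that data.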

Although a given storage unit will be created specifically for a single series, multiple series may refer to a single storage unit, or even to a single slot within a storage unit. There is typically one slot per aia.lev1 storage unit, but 32 slots per hmi.*_45s storage unit and 30 slots per hmi.*_720s storage unit. The exception is hmi.s_720s, which only has two slots. You will sometimes find other exceptions to this rule, such as aia.lev1 sunums with more than one slot.

Within a given slot's directory, there will be one or more files. The file names vary per DRMS series, but will be consistent within a given series. For example, files in hmi.m_45s are named 'magnetogram.fits'. Files in aia.lev1 are named 'image_lev1.fits' and 'spikes.fits'. There are 24 files in each hmi.s_720s slot. Within DRMS documentation, they refer to the files as 'segments'.
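
To check what a given storage unit actually holds, you can walk its slot directories and list the segment files in each. This is a minimal sketch that assumes slot directories are named 'S' followed by digits, as described above. Anything else at the top level (such as Records.txt) is ignored. The function name is hypothetical.

```python
import os
import re

_SLOT_DIR = re.compile(r"^S\d+$")


def list_segments(su_path):
    """Return {slot_dirname: [segment file names]} for one storage unit."""
    segments = {}
    for entry in sorted(os.listdir(su_path)):
        full = os.path.join(su_path, entry)
        if _SLOT_DIR.match(entry) and os.path.isdir(full):
            segments[entry] = sorted(os.listdir(full))
    return segments
```

For a typical aia.lev1 storage unit this would be expected to return a single 'S00000' entry listing 'image_lev1.fits' and 'spikes.fits'.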

...

As best I can figure out, they did all this work so that SUMS didn't have to track as many directories. In the JSOC development telecons, there was a lot of talk about trying to avoid individually tracking lots of small files.

However, when AIA got folded into using DRMS, they planned on storing 8 images per storage unit. When we were told that they wouldn't be allowed to retrieve partial storage units (and that the JSOC wouldn't support the data if we forked their code), we insisted that the AIA data be stored as individual records, so that we could retrieve a single wavelength for a period of time without having to download eight times the data and trash the majority of it. As such, SUMS went from handling a few storage units per minute for HMI to almost a storage unit per second for aia.lev1. (aia.lev0 may be 'slotted' as they had planned, resulting in 5 storage units per minute.)