
Version 4 (modified by niles, 4 years ago)


General Description of NetDRMS software and discussion of some issues

1.0 Introduction

The Solar Dynamics Observatory (SDO) spacecraft (see https://sdo.gsfc.nasa.gov) was launched in 2010. At the time of writing (April 2021) it is still observing the Sun with several instruments, including the Atmospheric Imaging Assembly (AIA) and the Helioseismic and Magnetic Imager (HMI). The data rate is high, with approximately 70,000 images acquired daily. These data are stored at, and served from, the Joint Science Operations Center (JSOC) at Stanford University (see http://jsoc.stanford.edu), where they are managed by software developed at Stanford known as NetDRMS. This document gives a broad overview of NetDRMS and presents some issues that the system has.

SDO data are stored on disk at the JSOC in the Flexible Image Transport System (FITS) file format. This format supports storage of both the image data and a header of metadata. As stored on disk, the image data have only very minimal header information, such as the dimensions of the image. The information needed to provide a complete FITS header is stored in a separate database.

When a user requests an image, possibly through a Common Gateway Interface (CGI) script available on the internet, the NetDRMS software combines the stored FITS image data with the information in the database to produce a FITS file with a complete FITS header containing all metadata pertinent to the image. This process is known as exporting the image. This approach allows the header information to be edited without having to overwrite existing FITS files.
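The idea behind exporting can be sketched in a few lines of Python. This is only an illustration of the merge step, not the NetDRMS implementation: the function name export_header, the use of plain dictionaries, and all keyword values shown are assumptions made for the example.

```python
def export_header(minimal_header, database_keywords):
    """Combine the minimal on-disk header with keywords from the database.

    Both arguments are plain dicts standing in for a FITS header and a
    database row (an assumption for this sketch).
    """
    header = dict(minimal_header)      # copy, so the stored header is untouched
    header.update(database_keywords)   # database values complete the header
    return header


# Hypothetical values, for illustration only.
on_disk = {"NAXIS": 2, "NAXIS1": 4096, "NAXIS2": 4096}
database_row = {"T_OBS": "2021-04-01T00:00:00Z", "WAVELNTH": 171}

exported = export_header(on_disk, database_row)

# Editing metadata only touches the database row; the stored file is unchanged.
database_row["WAVELNTH"] = 193
re_exported = export_header(on_disk, database_row)
```

Because the complete header is built at export time, a correction made in the database shows up in the next export without rewriting anything on disk.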

SDO data are organized by series, with a series typically being data from a certain instrument that have been processed in a certain way. Popular series include aia.lev1 (level one AIA data), hmi.m_45s (HMI 45 second magnetograms), and hmi.ic_45s (HMI 45 second continuum images).

There is a difference in the way that AIA and HMI data are organized. Each AIA image is an item on its own (also, AIA data are actually stored on disk with complete data headers, but still need to go through the export process in order for the data filename to be meaningful). HMI data, however, are bundled together, so that several HMI images together form the minimum unit of HMI data. This bundling of several HMI images into a single unit is a distinguishing trait of HMI data.

2.0 Remote NetDRMS sites

To try to spread the load of network requests, in addition to the NetDRMS system at the JSOC in California, the Virtual Solar Observatory (VSO) operates remote NetDRMS sites. At the time of writing there are two remote sites, one at the National Solar Observatory (NSO) in Boulder, Colorado and one at the NASA Goddard Solar Data Analysis Center (SDAC) located in Greenbelt, Maryland. While these remote sites do have significant storage attached to them, they do not store the complete set of SDO images on disk as the JSOC does. Rather, they store a buffer of SDO data, with data aging off over time. The remote sites do maintain a database that is a mirror of the complete database at the JSOC used to generate FITS header information for exporting FITS files.

Thus the remote sites have the complete database of information used to export FITS data to users, but only a subset of stored image data. Image data can be copied from the JSOC to the remote sites in one of two ways.

The first way that data are copied is known as a mirror copy. This is simply an attempt to mirror recent data at the remote sites on the assumption that recent data will be more likely to be downloaded by users, and so it is desirable to have it "staged" at the remote sites for users to download.

The second way that data are copied from the JSOC to the remote sites is known as a user copy. This occurs when a user requests data from a remote site and those data are not in the remote site's buffer. In this case, a copy is initiated, and the user who made the request is obliged to wait while the data are copied from the JSOC to the remote site, and then to wait for the export of the requested FITS file(s) before the download can begin.
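The user copy path amounts to a check of the local buffer followed, on a miss, by a copy and then the export. A minimal sketch follows; the function serve_request, its callback parameters, and the record names are hypothetical and only mirror the steps described above.

```python
def serve_request(record_id, local_buffer, copy_from_jsoc, export):
    """Serve one request at a remote site.

    local_buffer is a dict of record_id -> stored image data.
    copy_from_jsoc and export are callables supplied by the caller
    (hypothetical stand-ins for the real copy and export steps).
    Returns the exported result and whether a user copy was needed.
    """
    copied = False
    if record_id not in local_buffer:
        # Buffer miss: the user waits here while the data are copied.
        local_buffer[record_id] = copy_from_jsoc(record_id)
        copied = True
    return export(local_buffer[record_id]), copied


# Toy stand-ins so the sketch runs on its own.
buffer = {"rec-1": "image-1"}
fetch = lambda rid: "image-from-jsoc-" + rid
export = lambda image: "exported:" + image

hit = serve_request("rec-1", buffer, fetch, export)
miss = serve_request("rec-2", buffer, fetch, export)
```

In this sketch a record that has been copied once stays in the buffer, so a repeat request for it is served without another copy; aging data out of the buffer is not modeled.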

For both user and mirror data copies, only a certain number of copies from the JSOC can take place at any one time. If the limit on the number of copies is reached, then subsequent data copy requests are queued until a copy in progress finishes. The intent is to avoid a network bottleneck due to there being too many copies taking place at once.
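The limit-and-queue behavior can be illustrated with a small scheduler. This is a sketch under assumptions, not the NetDRMS code: the class name CopyScheduler, the first-in-first-out ordering of queued requests, and the limit of two used in the example are all hypothetical.

```python
from collections import deque


class CopyScheduler:
    """Allow at most max_active copies at once; queue the rest (FIFO assumed)."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = set()      # copies currently in progress
        self.pending = deque()   # requests waiting for a free slot

    def request(self, item):
        """Start the copy if a slot is free, otherwise queue it."""
        if len(self.active) < self.max_active:
            self.active.add(item)
            return "started"
        self.pending.append(item)
        return "queued"

    def finish(self, item):
        """Mark a copy as done and start the oldest queued request, if any."""
        self.active.remove(item)
        if self.pending:
            next_item = self.pending.popleft()
            self.active.add(next_item)
            return next_item
        return None


scheduler = CopyScheduler(max_active=2)
statuses = [scheduler.request(name) for name in ("a", "b", "c")]
promoted = scheduler.finish("a")
```

Here the third request is queued because two copies are already active, and it starts as soon as one of them finishes.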

Note that when AIA data are copied to a remote site, only one image needs to be copied to answer a user request. Because of the bundling of HMI images, however, all the images in an HMI bundle may need to be copied to serve out a single HMI image.

3.0 Distribution of data served out by series

The bulk of the SDO data served out is for the AIA level one series. The pie chart below shows a breakdown of the SDO series served out by the NSO remote site for the first quarter of 2021. This is a period during which the system was running normally without other problems, such as storage being offline at the JSOC, or an issue at NSO, or a network outage. Note that the pie chart is based on the count of data items served out, as opposed to the number of bytes delivered. Approximately 80% of requests are for AIA data.

Pie chart of SDO downloads by series

The bar chart below presents essentially the same data. The six series downloaded during the first quarter of 2021 from the NSO remote site are hmi.s_720s, hmi.m_720s, hmi.v_45s, hmi.ic_45s, hmi.m_45s, and aia.lev1. For each series, the green bar represents the number of data download requests for that series that succeeded (the web server returned status 200), while the red bar represents the total number of requests (regardless of the status the web server returned).

Bar chart of count, by series, of SDO data requested from NSO in first quarter of 2021.
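The counts behind a chart like this can be tallied from web server records as sketched below. The input format (a list of series name and HTTP status pairs) and the sample records are assumptions made for illustration; they are not taken from the NSO logs.

```python
from collections import Counter


def tally_by_series(requests):
    """Return {series: (succeeded, total)} from (series, http_status) pairs."""
    total = Counter()
    succeeded = Counter()
    for series, status in requests:
        total[series] += 1
        if status == 200:          # only status 200 counts as a success
            succeeded[series] += 1
    return {series: (succeeded[series], total[series]) for series in total}


# Made-up records, for illustration only.
sample = [
    ("aia.lev1", 200),
    ("aia.lev1", 200),
    ("hmi.m_45s", 200),
    ("hmi.m_45s", 500),
]
counts = tally_by_series(sample)
```

For each series, the first number corresponds to the green bar and the second to the red bar.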

This again shows that most requests are for AIA data. It also shows that while most AIA downloads succeed, HMI downloads are less likely to succeed. The plot below shows the number of failed downloads (status other than 200) on the X axis against the number of requests for HMI data on the Y axis for this time period. The two appear correlated. This suggests that the system is having problems when users request HMI data.
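One way to put a number on "appear correlated" is the Pearson correlation coefficient between the two counts over matching time bins. The sketch below computes it in plain Python; the choice of Pearson correlation, the binning, and the sample values are assumptions for illustration, not the analysis behind the plot.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up counts per time bin, for illustration only.
failed_downloads = [0, 10, 20, 30]
hmi_requests = [5, 25, 45, 65]
r = pearson(failed_downloads, hmi_requests)
```

A value of r near 1 indicates that the two counts rise and fall together. On its own this does not show that HMI requests cause the failures, only that the two are associated.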

Attachments

  • pie.png (23.1 KB) - added by niles 4 years ago. Pie chart of SDO downloads by series
  • count.png (19.3 KB) - added by niles 4 years ago. Bar chart of count, by series, of SDO data requested from NSO in first quarter of 2021.
  • cor.png (40.4 KB) - added by niles 4 years ago.
  • corBad.png (23.8 KB) - added by niles 4 years ago. Similar to cor.png but only for hours when failed downloads exceed 1000