General Description of NetDRMS software and discussion of some issues
1.0 Introduction
The Solar Dynamics Observatory (SDO) spacecraft (see https://sdo.gsfc.nasa.gov) was launched in 2010. As of this writing (April 2021) it is still observing the Sun with several instruments, including the Atmospheric Imaging Assembly (AIA) and the Helioseismic and Magnetic Imager (HMI). The data rate is high, with approximately 70,000 images acquired daily. These data are stored at, and served out from, the Joint Science Operations Center (JSOC) at Stanford University (see http://jsoc.stanford.edu), where they are managed by software developed at Stanford known as NetDRMS. This document gives a broad overview of NetDRMS and presents some issues that the system has.
SDO data are stored on disk at the JSOC in the Flexible Image Transport System (FITS) file format. This format supports storage of both the image data and meta data header information. As stored on disk, the image data have only very minimal header information, such as the dimensions of the image. The information needed to provide a complete FITS header is stored in a separate database.
When a user requests an image, possibly through a Common Gateway Interface (CGI) script available on the internet, the NetDRMS software combines the stored FITS image data with the information in the database to produce a FITS file with a complete FITS header containing all meta data pertinent to the image. This process is known as exporting the image. This approach allows the header information to be edited without having to overwrite existing FITS files.
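The export step described above can be pictured with a minimal sketch. This is not the NetDRMS implementation; the function name `export_header`, the use of plain dictionaries, and the keyword names and values are all assumptions made for illustration.

```python
def export_header(stored_header, db_keywords):
    """Overlay database keywords on the minimal header stored with the image.

    Both arguments are plain dicts standing in for FITS header cards
    (hypothetical; NetDRMS does not necessarily work this way).
    """
    merged = dict(stored_header)   # copy, so the stored header is untouched
    merged.update(db_keywords)     # database values complete the header
    return merged


# Minimal header as stored on disk: little more than the image dimensions.
stored = {"NAXIS": 2, "NAXIS1": 4096, "NAXIS2": 4096}

# Illustrative meta data row from the database (values are made up).
db_row = {"T_OBS": "2021-01-01T00:00:00Z", "WAVELNTH": 171}

header = export_header(stored, db_row)

# Editing the database row changes the next export; the file on disk
# (represented here by `stored`) never has to be rewritten.
db_row["WAVELNTH"] = 193
header_after_edit = export_header(stored, db_row)
```

The point of the sketch is the last three lines: a correction to the meta data is made in one place, the database, and every later export picks it up.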
SDO data are organized by series, with a series typically being data from a certain instrument that has been processed in a certain way. Popular series include aia.lev1 (level one AIA data), hmi.m_45s (HMI 45 second magnetograms), and hmi.ic_45s (HMI 45 second continuum images).
There is a difference in the way that AIA and HMI data are organized. Each AIA image is an item on its own (also, AIA data are actually stored on disk with complete data headers, but still need to go through the export process in order for the data filename to be meaningful). HMI data, however, are bundled together, so that several HMI images together form the minimum unit of HMI data. This bundling of several HMI images into a single unit is a defining trait of HMI data.
2.0 Remote NetDRMS sites
To try to spread the load of network requests, in addition to the NetDRMS system at the JSOC in California, the Virtual Solar Observatory (VSO) operates remote NetDRMS sites. At the time of writing there are two remote sites, one at the National Solar Observatory (NSO) in Boulder, Colorado and one at the NASA Goddard Solar Data Analysis Center (SDAC) located in Greenbelt, Maryland. While these remote sites do have significant storage attached to them, they do not store the complete set of SDO images on disk as the JSOC does. Rather, they store a buffer of SDO data, with data aging off over time. The remote sites do maintain a database that is a mirror of the complete database at the JSOC used to generate FITS header information for exporting FITS files.
Thus the remote sites have the complete database of information used to export FITS data to users, but only a subset of stored image data. Image data can be copied from the JSOC to the remote sites in one of two ways.
The first way that data are copied is known as a mirror copy. This is simply an attempt to mirror recent data at the remote sites on the assumption that recent data will be more likely to be downloaded by users, and so it is desirable to have it "staged" at the remote sites for users to download.
The second way that data are copied from the JSOC to the remote sites is known as a user copy. This occurs when a user requests data from a remote site that is not in the remote site buffer of data. In this case, a copy is initiated, and the user who made the request is then obliged to wait while the copy of the data from the JSOC to the remote site takes place, and then wait for the export of the FITS file(s) requested to take place before the download can begin.
For both user and mirror data copies, only a certain number of copies from the JSOC can take place at any one time. If the limit on the number of copies is reached, then subsequent data copy requests are queued until a copy in progress finishes. The intent is to avoid a network bottleneck due to there being too many copies taking place at once.
Note that when AIA data are copied to a remote site, only one image needs to be copied to answer a user request. Because of the bundling of HMI images, however, all the images in an HMI bundle may need to be copied to serve out a single HMI image.
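The copy limit and queueing behaviour can be sketched as follows. This is an illustration only: the class name `CopyScheduler`, its methods, and the limit of two concurrent copies are assumptions, not values or interfaces taken from NetDRMS.

```python
from collections import deque


class CopyScheduler:
    """Allow at most `max_active` copies at once; queue the rest in order."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = set()      # copies currently in progress
        self.waiting = deque()   # copy requests waiting for a free slot

    def request(self, item):
        """Start the copy if a slot is free, otherwise queue it."""
        if len(self.active) < self.max_active:
            self.active.add(item)
            return "started"
        self.waiting.append(item)
        return "queued"

    def finish(self, item):
        """Mark a copy as done and start the oldest queued request, if any."""
        self.active.remove(item)
        if self.waiting:
            self.active.add(self.waiting.popleft())


sched = CopyScheduler(max_active=2)   # hypothetical limit
states = [sched.request(name) for name in ("copy_a", "copy_b", "copy_c")]
sched.finish("copy_a")                # frees a slot, so copy_c starts
```

In this sketch user and mirror copies share one queue, which matches the description above that the limit applies to both kinds of copy.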
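The difference in copy cost between AIA and HMI requests can be made concrete with a small sketch. The class `RemoteSite`, the image identifiers, and the bundle size of three are hypothetical; the actual bundle size is not stated in this document.

```python
class RemoteSite:
    """Toy model of a remote site's disk buffer (illustrative only)."""

    def __init__(self, bundles):
        # Maps an image id to the tuple of ids that are stored as one unit.
        # Ids missing from the map are treated as standalone images.
        self.bundles = bundles
        self.on_disk = set()

    def serve(self, image_id):
        """Return how many images must be copied from the JSOC to serve image_id."""
        if image_id in self.on_disk:
            return 0                                  # already staged locally
        unit = self.bundles.get(image_id, (image_id,))
        missing = [i for i in unit if i not in self.on_disk]
        self.on_disk.update(missing)                  # the whole unit is copied
        return len(missing)


hmi_bundle = ("hmi_1", "hmi_2", "hmi_3")              # hypothetical bundle
site = RemoteSite({image: hmi_bundle for image in hmi_bundle})

copied_for_aia = site.serve("aia_1")      # standalone image: one copy
copied_for_hmi = site.serve("hmi_2")      # pulls in the whole bundle
copied_on_repeat = site.serve("hmi_1")    # bundle is already on disk
```

The repeat request needs no copy at all, which is the same effect that makes a second request for the same data fast once the first has staged it.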
3.0 Distribution of data served out by series
The bulk of the SDO data served out is for the AIA level one series. The pie chart below shows a breakdown of the SDO series served out by the NSO remote site for the first quarter of 2021. This is a period during which the system was running normally without other problems, such as storage being offline at the JSOC, or an issue at NSO, or a network outage. Note that the pie chart is based on the count of data items served out, as opposed to the number of bytes delivered. Approximately 80% of requests are for AIA data.
The bar chart below presents essentially the same data. The six series downloaded during the first quarter of 2021 from the NSO remote site are hmi.s_720s, hmi.m_720s, hmi.v_45s, hmi.ic_45s, hmi.m_45s, and aia.lev1. For each series, the green bar represents the number of data download requests for that series that succeeded (the web server had status 200) while the red bar represents the total number of requests (whatever the status the web server had).
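Counts of this kind can be tallied as in the sketch below. It assumes the web server log has already been reduced to (series, HTTP status) pairs; the function name `tally` and the sample records are made up for illustration and are not the NSO figures.

```python
from collections import Counter


def tally(records):
    """Return {series: (successful, total)} from (series, status) pairs."""
    total = Counter()
    ok = Counter()
    for series, status in records:
        total[series] += 1
        if status == 200:          # success as defined for the green bars
            ok[series] += 1
    return {series: (ok[series], total[series]) for series in total}


# Made-up sample records, not real log data.
sample = [
    ("aia.lev1", 200),
    ("aia.lev1", 200),
    ("hmi.m_45s", 200),
    ("hmi.m_45s", 504),
]
counts = tally(sample)
```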
This again shows that most requests are for AIA data. It also shows that while most AIA downloads succeed, HMI downloads are less likely to do so. The plot below shows the number of failed downloads (status other than 200) in an hour on the X axis against the number of requests for HMI data in the same hour on the Y axis, again over the first quarter of 2021 at NSO. The two appear correlated. This suggests that the system has problems when users request HMI data.
The same data, plotted in a similar way but restricted to hours in which more than 1,000 requests returned a non-200 status (i.e. the system was really failing), show an increased correlation (below).
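The strength of such a relationship can be quantified with a Pearson correlation coefficient. The document does not say which statistic, if any, was computed, so the sketch below is an assumption about one reasonable approach. The helper names and the hourly counts are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def hours_above(failed, hmi, threshold):
    """Keep only the hours whose failed-download count exceeds threshold."""
    kept = [(f, h) for f, h in zip(failed, hmi) if f > threshold]
    return [f for f, _ in kept], [h for _, h in kept]


# Invented hourly counts: failed downloads and HMI requests per hour.
failed_per_hour = [10, 1500, 2000, 3000]
hmi_per_hour = [5, 700, 1000, 1500]

r_all = pearson(failed_per_hour, hmi_per_hour)
bad_failed, bad_hmi = hours_above(failed_per_hour, hmi_per_hour, 1000)
r_bad = pearson(bad_failed, bad_hmi)
```

Comparing `r_all` with `r_bad` mirrors the comparison between the two scatter plots: the second is computed only over hours in which the system was clearly failing.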
4.0 Discussion of issues
The above data suggest that SDO data downloads begin failing when users request HMI data. This is not always the case: there are times when users request HMI data but the count of failed downloads remains low (points in the top left of the first scatter plot).
There are several mechanisms in operation. First, if a user requests HMI data from a remote site, and the remote site happens to have the data locally (i.e. already copied to the remote site, either by a recent mirror request or a recent previous user request), then the downloads likely succeed (hence the top left points in the scatter plot).
If the data need to be copied to the remote site to answer the user request, then it becomes possible that the user's connection to the remote site will time out before the copy from the JSOC completes. This is likely happening for HMI data because the bundled nature of HMI data makes the copy to the remote site take longer. Worse, if there are many requests for HMI data in a short period, then the limit on the number of copies from the JSOC to the remote site may be reached, and copy requests will start to be queued. If this occurs, then requests for AIA data that happen to be made while the remote site is busy with many requests for HMI data will also start to time out due to the bottleneck in copy requests. In that event, the requests for HMI data will have caused problems not only for HMI data, but for AIA data as well.
There is some evidence that users have become aware that they need to ask for HMI data twice, with the first request failing due to timeout but succeeding at getting the data copied to the remote site so that subsequent requests will succeed. Examination of the Apache server logs after one incident at NSO, during which the system was struggling to deliver HMI data, seemed to support this: the same user was coming back to retrieve the data requested on the first attempt. Users may not be aware of the underlying mechanism, but at least some experienced users have become aware that they may need to request data multiple times.
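The way slow HMI copies can cause AIA requests to time out can be reproduced with a small first-come, first-served simulation. It is a sketch under stated assumptions: the function name `simulate`, the two copy slots, the 60 second timeout, and the copy durations are all invented, and the real system may schedule copies differently.

```python
import heapq


def simulate(requests, slots, timeout):
    """Simulate queued copies with a fixed number of copy slots.

    requests: list of (arrival_time, copy_duration, label), sorted by arrival.
    Returns a list of (label, seconds_waited, timed_out) in arrival order.
    """
    free_at = [0.0] * slots              # time at which each slot becomes free
    heapq.heapify(free_at)
    results = []
    for arrival, duration, label in requests:
        slot_free = heapq.heappop(free_at)   # earliest available slot
        start = max(arrival, slot_free)      # wait in the queue if none is free
        done = start + duration
        heapq.heappush(free_at, done)
        waited = done - arrival
        results.append((label, waited, waited > timeout))
    return results


# Two slow HMI bundle copies occupy both slots, then a quick AIA copy arrives.
busy = simulate([(0, 90, "hmi"), (0, 90, "hmi"), (5, 10, "aia")],
                slots=2, timeout=60)

# The same AIA request with no HMI copies ahead of it.
quiet = simulate([(5, 10, "aia")], slots=2, timeout=60)
```

In the busy case the AIA copy itself takes only 10 seconds but cannot start until a slot frees up, so the user waits well past the timeout. In the quiet case the same request completes comfortably.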
These mechanisms have been observed in near real time from web pages that monitor the status of downloads from the NSO remote site. Bad (non-200 status) downloads often correlate with requests for HMI data, with AIA data also being affected in more severe cases.
5.0 Possible solution
Having the VSO serve HMI data out from a machine situated at the JSOC would obviate the need for the copy of data to the remote site prior to exporting. This would avoid timeouts due to the copying of HMI data taking longer than AIA data. This would also have the advantage that linked series, in which one series relies on the meta data of another series, could be served out. This is currently problematic for remote sites. Because AIA data comprise the majority of requests, serving only AIA data out from remote sites would mean that the remote sites are still contributing to sharing the network load, avoiding the JSOC being overloaded.
On another note, mirror requests for AIA data (i.e. copying them to the remote sites for staging) could perhaps be dispensed with, since the first user request for the data would stage the data for subsequent user requests. This would avoid AIA data being mirrored in instances when there are no subsequent user requests. This problem is relatively minor, however.
Niles Oien April 2021.
Attachments
- pie.png: Pie chart of SDO downloads by series.
- count.png: Bar chart of count, by series, of SDO data requested from NSO in first quarter of 2021.
- cor.png
- corBad.png: Similar to cor.png, but only for hours when failed downloads exceed 1,000.