Changes between Version 5 and Version 6 of drmsGeneralAndHMIissue


Timestamp:
04/12/21 16:25:42 (3 years ago)
Author:
niles

  • drmsGeneralAndHMIissue

    v5 v6  
    4545[[Image(cor.png)]] 
    4646 
     47The same data, plotted in a similar way but restricted to hours with more than 1,000 requests whose downloads returned a non-200 status (i.e. hours when the system was really failing), show an increased correlation (below). 
     48 
     49[[Image(corBad.png)]] 
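
The hourly selection behind this second plot can be sketched as follows. This is only an illustration, not the script used to produce the figure: the function names, the per-hour tuple layout, and the choice of a Pearson coefficient are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant data")
    return cov / sqrt(var_x * var_y)


def correlation_for_failing_hours(hours, min_failures=1000):
    """Correlate HMI requests with failed downloads for bad hours only.

    `hours` is a hypothetical list of (hmi_requests, failed_downloads)
    tuples, one per hour. Only hours with more than `min_failures`
    non-200 downloads are kept, mirroring the 1,000-request threshold
    described in the text.
    """
    failing = [(hmi, failed) for hmi, failed in hours if failed > min_failures]
    return pearson([hmi for hmi, _ in failing],
                   [failed for _, failed in failing])
```

Hours below the threshold are dropped before the coefficient is computed, so quiet hours no longer dilute the relationship.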
    4750 
    4851 
     52=== 4.0 Discussion of issues === 
     53 
     54The data above suggest that SDO data downloads begin failing when users request HMI data. This is not always the case: at times users request HMI data but the count of failed downloads remains low (points in the top left of the first scatter plot). 
     55 
     56There are several mechanisms in operation. First, if a user requests HMI data from a remote site, and the remote site already holds the data locally (i.e. it was copied there by a recent mirror request or a recent previous user request), then the downloads are likely to succeed (hence the top left points in the scatter plot). 
     57 
     58If the data need to be copied to the remote site to answer the user request, then the user's connection to the remote site may time out before the copy from the JSOC completes. Worse, if there are many requests for HMI data in a short period, the limit on the number of simultaneous copies from the JSOC to the remote site may be reached, and copy requests will start to be queued. If this occurs, requests for AIA data made while the remote site is busy with many requests for HMI data will also start to time out because of the bottleneck in copy requests. In that event, the requests for HMI data will have caused problems not only for HMI data, but for AIA data as well. 
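
The bottleneck described above can be illustrated with a small queueing sketch. It is a toy model under stated assumptions, not a description of the actual JSOC copy mechanism: the function name, the first-come-first-served ordering, the slot count, the copy durations and the timeout are all hypothetical values chosen for the example.

```python
import heapq


def simulate_copy_queue(requests, max_copies, timeout):
    """Toy model of a remote site with a limited number of copy slots.

    `requests` is a list of (arrival_time, copy_duration, label) tuples.
    Copies are started in arrival order on whichever slot frees up first.
    A request counts as timed out when the time from its arrival to the
    end of its copy exceeds `timeout`. The copy still occupies its slot
    until it finishes, even if the user has already given up.
    Returns a list of (label, elapsed, timed_out) in arrival order.
    """
    slot_free_at = [0.0] * max_copies  # min-heap of slot release times
    heapq.heapify(slot_free_at)
    results = []
    for arrival, duration, label in sorted(requests, key=lambda r: r[0]):
        earliest_slot = heapq.heappop(slot_free_at)
        start = max(arrival, earliest_slot)  # wait if every slot is busy
        finish = start + duration
        heapq.heappush(slot_free_at, finish)
        elapsed = finish - arrival
        results.append((label, elapsed, elapsed > timeout))
    return results


# Four slow HMI copies arrive just before one quick AIA copy.
busy = [(0, 50, "HMI"), (1, 50, "HMI"), (2, 50, "HMI"), (3, 50, "HMI"),
        (4, 10, "AIA")]
for label, elapsed, timed_out in simulate_copy_queue(busy, max_copies=2, timeout=60):
    print(label, elapsed, "timed out" if timed_out else "ok")
```

In this toy run the AIA request would finish well inside the timeout on an idle site, but it times out when queued behind the HMI copies, which is the knock-on effect described in the paragraph above.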
     59 
     60There is some evidence that users have become aware that they need to ask for HMI data twice, with the first request failing due to timeout but succeeding at getting the data copied to the remote site so that subsequent requests will succeed. Examination of the Apache server logs after one incident at NSO, during which the system was struggling to deliver HMI data, seemed to support this: the same user was coming back to retrieve the data requested on the first attempt. Users may not be aware of the underlying mechanism, but at least some experienced users have become aware that they may need to request data multiple times. 
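
A log check of this kind can be sketched as below. The sketch assumes the server writes the standard Apache Common or Combined Log Format; the actual log layout at NSO is not given here, and the function name and the rule used to flag a retry (a non-200 response followed later by a 200 for the same client and path) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# host ident user [timestamp] "METHOD path protocol" status ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) ')


def retried_after_failure(lines):
    """Return sorted (host, path) pairs that failed first and later succeeded.

    A pair is reported when the same client requested the same path more
    than once, with at least one non-200 status followed by a 200.
    Lines that do not match the assumed log format are skipped.
    """
    history = defaultdict(list)
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            host, path, status = match.groups()
            history[(host, path)].append(int(status))

    retried = []
    for key, statuses in history.items():
        failure_seen = False
        for status in statuses:
            if status != 200:
                failure_seen = True
            elif failure_seen:
                retried.append(key)
                break
    return sorted(retried)
```

Matching on client address and path is a rough proxy for "the same user asking for the same data"; clients behind a shared address, or requests with differing query strings, would need a more careful key.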
     61 
     62=== 5.0 Possible solution === 
     63 
     64Having the VSO serve HMI data from a machine situated at the JSOC would obviate the need to copy data to the remote site prior to exporting. This would avoid timeouts caused by the copying of HMI data taking longer than that of AIA data. It would also allow linked series, in which one series relies on the metadata of another series, to be served; this is currently problematic for remote sites. Because AIA data comprise the majority of requests, serving only AIA data from remote sites would mean that the remote sites still share the network load, so the JSOC would not be overloaded. 
     65 
     66On another note, mirror requests for AIA data (i.e. copying them to the remote sites for staging) could perhaps be dispensed with, since the first user request for the data would stage the data for subsequent user requests. This may avoid AIA data being mirrored in instances when there are no subsequent user requests. This problem is relatively minor, however. 
     67 
     68Niles Oien April 2021. 
    4969 
    5070 