Ticket #325 (assigned problem)

Opened 3 months ago

Last modified 7 weeks ago

Tom Bridgman reports an XML(?) error trying to retrieve data from GONG data provider.

Reported by: alisdair Owned by: ed
Priority: high Milestone:
Component: IDLClient Version: 1.4
Severity: major Keywords: User problem report, GONG, IDL
Cc:

Description

Tom Bridgman reports an XML(?) error trying to retrieve data from GONG data provider.

Hello,

I'm trying to retrieve some imagery from the NSO GONG network.  This basic structure has worked for other time intervals, such as

    startDate='2011/06/07 03:00'
    endDate='2011/06/07 06:00'

Yet for the range defined in the code
outdirBase='/svs/data/solar/'

startDate='2014/08/08 06:00'
endDate='2014/08/10 21:00'
sampleStep=10
wavelength='6562'
outDir=outdirBase+'BBSO/HAlpha/20140808/'
searchfile=vso_search(date=startDate+' - '+endDate, source='GONG', min_wave='6560', max_wave='6570')

print, 'Retrieving data...'
getfile=vso_get(searchfile,filelist=filelist,out_dir=outdir)

it fails with this message suggesting it could not retrieve a usable XML file:

IDL> .run August2014fromVSO
% Compiled module: $MAIN$.
Records Returned : NSO : 1981/6717
Retrieving data...
% IDLFFXMLDOMDOCUMENT::LOAD: Parser fatal error: File: IDL STRING, line: 1, column: 50 : Expected whitespace
% XMLPARSER::DOM: IDLFFXMLDOMDOCUMENT::LOAD: Error encountered during the parse operation.
% XMLPARSER::FINDELEMENT: No DOM tree passed in
% Attempt to call undefined method: 'IDL_INT::GETFIRSTCHILD'.
% Execution halted at: SOAP::DESERIALIZE  223 /svs/projects/SunEarthConn/ssw/gen/idl/clients/vso/soap__define.pro
%                      SOAP::SEND        175 /svs/projects/SunEarthConn/ssw/gen/idl/clients/vso/soap__define.pro
%                      VSO::GETDATA      804 /svs/projects/SunEarthConn/ssw/gen/idl/clients/vso/vso__define.pro
%                      VSO_GET           153 /svs/projects/SunEarthConn/ssw/gen/idl/clients/vso/vso_get.pro
%                      $MAIN$             13 /svs/projects/SunEarthConn/BBSO/idl/August2014fromVSO.pro


Suggestions?

Thanks,
Tom
-- 
Dr. William T."Tom" Bridgman               Scientific Visualization Studio
Global Science & Technology, Inc.          NASA/Goddard Space Flight Center
Email: William.T.Bridgman@nasa.gov         Code 606.4
Phone: 301-286-1346                        Greenbelt, MD 20771
FAX:   301-286-1634                        http://svs.gsfc.nasa.gov/

Change History

comment:1 Changed 3 months ago by alisdair

I don't have a fix, but I think I have a workaround. Rather than submitting the whole array of results to vso_get,

IDL> getfile=vso_get(searchfile,filelist=filelist,out_dir=outdir)

loop over the results submitting one at a time to vso_get. i.e.

IDL> for i=0,n_elements(searchfile)-1 do getfile=vso_get(searchfile[i],filelist=filelist,out_dir=outdir)
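A middle ground between one giant request and ~2000 single-record calls would be fixed-size batches. The sketch below is Python pseudocode for the idea, not SSWIDL; `fetch_batch` is a hypothetical stand-in for a vso_get call on a sub-array of the search results, and the 500-record batch size is a guess at something safely under the server's limit:

```python
def chunked(seq, size):
    """Yield successive slices of at most `size` elements."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def fetch_all(records, fetch_batch, batch_size=500):
    """Fetch records in batches small enough to stay under the server's
    request-size limit. `fetch_batch` is a hypothetical stand-in for
    something like vso_get applied to a sub-array of search results."""
    results = []
    for batch in chunked(records, batch_size):
        results.extend(fetch_batch(batch))
    return results

# Example with a dummy fetcher that just echoes its input:
got = fetch_all(list(range(1981)), lambda batch: batch, batch_size=500)
```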
Last edited 3 months ago by alisdair

comment:2 Changed 3 months ago by jacob

I've drilled down to where the query goes into the GONG DB. This is the code that sends the query and stores the results.

In /srv/www/cgi-bin/VSO/PROD/Physics/Solar/VSO/DataProvider/NSO.pm:

my @result = $db->conn->execQuery($countSql, $querySql, (exists $params->{field})? $params->{field}: undef);

my @resarr = @{ $result[0] };

my $productsRet = $result[1];

my $productsFound = $result[2];

I can only imagine that $productsRet & $productsFound are exactly what they sound like: products returned & found. What immediately jumps out at me is that a scalar variable $result does not seem to exist: @result is what comes back from the db, trying to print or dump $result gives me an error, and the string $result appears nowhere else in the module except the three instances above. That is actually normal Perl, though: $result[0] is how you access element 0 of the array @result (the sigil reflects what is being pulled out, not the variable's name), so perl is treating it correctly and I'm not changing it.

The dumped contents of $querySql are:

$VAR1 = 'SELECT \'11014\' "RESOURCE_ID" , \'GONG\' "TELESCOPE" , \'GONG\' "SHORT" , \'LEARMONTH\' "INSTRUMENT" , \'INTENSITY\' "OBSERVABLE" , TO_CHAR(OBSERVATION_START,\'YYYYMMDDHH24MISS\') "OBS_START" , TO_CHAR(OBSERVATION_STOP,\'YYYYMMDDHH24MISS\') "OBS_END" , PRODUCT_COUNT , TO_CHAR(FILESIZE/1024,\'999999999\') , FILE_LOCATION , \'FULLDISK\' "EXTENT" , \'LINE\' "WAVE_TYPE" , \'656.2\' "WAVE_MIN" , \'656.3\' "WAVE_MAX" , \'N/A\' "COMMENTS" , FILE_NAME "FILE_DETAILS" , THUMBNAIL_LR , THUMBNAIL_HR , \'N \' "THUMBNAIL_EXT" , \'GONG_HALPHA_DAILY\' "TABLE_NAME" , \'IMAGE\' "DATA_TYPE" , 0 "CAR_ROT" , 0 "FILE_TOTAL" FROM NSODBO.GONG_HALPHA_DAILY WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 00:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 23:59:59\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 14 and SPECTRAL_KEY = 5

UNION

SELECT \'11017\' "RESOURCE_ID" , \'GONG\' "TELESCOPE" , \'GONG\' "SHORT" , \'CERRO TOLOLO\' "INSTRUMENT" , \'INTENSITY\' "OBSERVABLE" , TO_CHAR(OBSERVATION_START,\'YYYYMMDDHH24MISS\') "OBS_START" , TO_CHAR(OBSERVATION_STOP,\'YYYYMMDDHH24MISS\') "OBS_END" , PRODUCT_COUNT , TO_CHAR(FILESIZE/1024,\'999999999\') , FILE_LOCATION , \'FULLDISK\' "EXTENT" , \'LINE\' "WAVE_TYPE" , \'656.2\' "WAVE_MIN" , \'656.3\' "WAVE_MAX" , \'N/A\' "COMMENTS" , FILE_NAME "FILE_DETAILS" , THUMBNAIL_LR , THUMBNAIL_HR , \'N \' "THUMBNAIL_EXT" , \'GONG_HALPHA_DAILY\' "TABLE_NAME" , \'IMAGE\' "DATA_TYPE" , 0 "CAR_ROT" , 0 "FILE_TOTAL" FROM NSODBO.GONG_HALPHA_DAILY WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 00:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 23:59:59\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 13 and SPECTRAL_KEY = 5

UNION

SELECT \'11016\' "RESOURCE_ID" , \'GONG\' "TELESCOPE" , \'GONG\' "SHORT" , \'BIG BEAR\' "INSTRUMENT" , \'INTENSITY\' "OBSERVABLE" , TO_CHAR(OBSERVATION_START,\'YYYYMMDDHH24MISS\') "OBS_START" , TO_CHAR(OBSERVATION_STOP,\'YYYYMMDDHH24MISS\') "OBS_END" , PRODUCT_COUNT , TO_CHAR(FILESIZE/1024,\'999999999\') , FILE_LOCATION , \'FULLDISK\' "EXTENT" , \'LINE\' "WAVE_TYPE" , \'656.2\' "WAVE_MIN" , \'656.3\' "WAVE_MAX" , \'N/A\' "COMMENTS" , FILE_NAME "FILE_DETAILS" , THUMBNAIL_LR , THUMBNAIL_HR , \'N \' "THUMBNAIL_EXT" , \'GONG_HALPHA_DAILY\' "TABLE_NAME" , \'IMAGE\' "DATA_TYPE" , 0 "CAR_ROT" , 0 "FILE_TOTAL" FROM NSODBO.GONG_HALPHA_DAILY WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 00:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 23:59:59\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 12 and SPECTRAL_KEY = 5

UNION

SELECT \'11013\' "RESOURCE_ID" , \'GONG\' "TELESCOPE" , \'GONG\' "SHORT" , \'UDAIPUR\' "INSTRUMENT" , \'INTENSITY\' "OBSERVABLE" , TO_CHAR(OBSERVATION_START,\'YYYYMMDDHH24MISS\') "OBS_START" , TO_CHAR(OBSERVATION_STOP,\'YYYYMMDDHH24MISS\') "OBS_END" , PRODUCT_COUNT , TO_CHAR(FILESIZE/1024,\'999999999\') , FILE_LOCATION , \'FULLDISK\' "EXTENT" , \'LINE\' "WAVE_TYPE" , \'656.2\' "WAVE_MIN" , \'656.3\' "WAVE_MAX" , \'N/A\' "COMMENTS" , FILE_NAME "FILE_DETAILS" , THUMBNAIL_LR , THUMBNAIL_HR , \'N \' "THUMBNAIL_EXT" , \'GONG_HALPHA_DAILY\' "TABLE_NAME" , \'IMAGE\' "DATA_TYPE" , 0 "CAR_ROT" , 0 "FILE_TOTAL" FROM NSODBO.GONG_HALPHA_DAILY WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 00:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 23:59:59\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 17 and SPECTRAL_KEY = 5

UNION

SELECT \'11018\' "RESOURCE_ID" , \'GONG\' "TELESCOPE" , \'GONG\' "SHORT" , \'EL TEIDE\' "INSTRUMENT" , \'INTENSITY\' "OBSERVABLE" , TO_CHAR(OBSERVATION_START,\'YYYYMMDDHH24MISS\') "OBS_START" , TO_CHAR(OBSERVATION_STOP,\'YYYYMMDDHH24MISS\') "OBS_END" , PRODUCT_COUNT , TO_CHAR(FILESIZE/1024,\'999999999\') , FILE_LOCATION , \'FULLDISK\' "EXTENT" , \'LINE\' "WAVE_TYPE" , \'656.2\' "WAVE_MIN" , \'656.3\' "WAVE_MAX" , \'N/A\' "COMMENTS" , FILE_NAME "FILE_DETAILS" , THUMBNAIL_LR , THUMBNAIL_HR , \'N \' "THUMBNAIL_EXT" , \'GONG_HALPHA_DAILY\' "TABLE_NAME" , \'IMAGE\' "DATA_TYPE" , 0 "CAR_ROT" , 0 "FILE_TOTAL" FROM NSODBO.GONG_HALPHA_DAILY WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 00:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 23:59:59\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 16 and SPECTRAL_KEY = 5

UNION

SELECT \'11015\' "RESOURCE_ID" , \'GONG\' "TELESCOPE" , \'GONG\' "SHORT" , \'MAUNA LOA\' "INSTRUMENT" , \'INTENSITY\' "OBSERVABLE" , TO_CHAR(OBSERVATION_START,\'YYYYMMDDHH24MISS\') "OBS_START" , TO_CHAR(OBSERVATION_STOP,\'YYYYMMDDHH24MISS\') "OBS_END" , PRODUCT_COUNT , TO_CHAR(FILESIZE/1024,\'999999999\') , FILE_LOCATION , \'FULLDISK\' "EXTENT" , \'LINE\' "WAVE_TYPE" , \'656.2\' "WAVE_MIN" , \'656.3\' "WAVE_MAX" , \'N/A\' "COMMENTS" , FILE_NAME "FILE_DETAILS" , THUMBNAIL_LR , THUMBNAIL_HR , \'N \' "THUMBNAIL_EXT" , \'GONG_HALPHA_DAILY\' "TABLE_NAME" , \'IMAGE\' "DATA_TYPE" , 0 "CAR_ROT" , 0 "FILE_TOTAL" FROM NSODBO.GONG_HALPHA_DAILY WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 00:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 23:59:59\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 15 and SPECTRAL_KEY = 5 ';

A dump of @result yields an even longer wall of text, with all of the parameters for each fileid returned in $result[0] as expected. $result[1] & $result[2] are the records returned & found:

$VAR2 = 7103;

$VAR3 = 6717;

Will update w/ more information.

Last edited 3 months ago by jacob

comment:3 Changed 3 months ago by jacob

Contents of $countSql:

'select sum(C1) from (SELECT 0 as IDX, count(*) as C1 FROM NSODBO.GONG_HALPHA WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 06:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 21:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 14 and SPECTRAL_KEY = 5

UNION

SELECT 1 as IDX, count(*) as C1 FROM NSODBO.GONG_HALPHA WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 06:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 21:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 13 and SPECTRAL_KEY = 5

UNION

SELECT 2 as IDX, count(*) as C1 FROM NSODBO.GONG_HALPHA WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 06:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 21:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 12 and SPECTRAL_KEY = 5

UNION

SELECT 3 as IDX, count(*) as C1 FROM NSODBO.GONG_HALPHA WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 06:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 21:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 17 and SPECTRAL_KEY = 5

UNION

SELECT 4 as IDX, count(*) as C1 FROM NSODBO.GONG_HALPHA WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 06:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 21:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 16 and SPECTRAL_KEY = 5

UNION

SELECT 5 as IDX, count(*) as C1 FROM NSODBO.GONG_HALPHA WHERE OBSERVATION_START between TO_DATE(\'08-AUG-2014 06:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND TO_DATE(\'10-AUG-2014 21:00:00\',\'DD-MON-YYYY HH24:MI:SS\') AND INSTRUMENT_KEY = 15 and SPECTRAL_KEY = 5 )';

comment:4 Changed 3 months ago by niles

After looking at it for a while, it was established that there is a reason you get two different numbers for products found and products returned.

One part of what happens when the query comes in is pretty straightforward. The table the data live in looks like this :

SQL> describe NSODBO.GONG_HALPHA;
 Name					   Null?    Type
 ----------------------------------------- -------- ----------------------------
 TIMESTAMP				   NOT NULL DATE
 OBSERVATION_START			   NOT NULL DATE
 OBSERVATION_STOP				    DATE
 INSTRUMENT_KEY 			   NOT NULL NUMBER(2)
 RESOLUTION_KEY 			   NOT NULL NUMBER(2)
 SPECTRAL_KEY				   NOT NULL NUMBER(2)
 FILESIZE				   NOT NULL NUMBER(15)
 FILE_LOCATION				   NOT NULL VARCHAR2(255)
 FILE_NAME				   NOT NULL VARCHAR2(255)
 THUMBNAIL_LR					    VARCHAR2(255)
 THUMBNAIL_HR					    VARCHAR2(255)
 AVAILABLE				   NOT NULL CHAR(1)

So you query that with the user-supplied times to get the number of files, with a query that is essentially something like :

SELECT COUNT(*) FROM NSODBO.GONG_HALPHA WHERE  
 OBSERVATION_START between TO_DATE('08-AUG-2014 06:00:00','DD-MON-YYYY HH24:MI:SS') 
  AND TO_DATE('10-AUG-2014 21:00:00','DD-MON-YYYY HH24:MI:SS') 
  AND SPECTRAL_KEY = 5;

Which in this case gave 6717 (which lined up with what was on disk, give or take a couple of files that were on disk but probably came in too late to get picked up by the data "spiders" that update the database).

There's a wrinkle to the above query in that the search has to be done over all the GONG instruments (there are 6 of them, spread out all over the globe). So the above becomes a UNION of similar queries, one for each site, differing only in that each has its own INSTRUMENT_KEY; that UNION was posted in an earlier comment. But the gist of it is that the query given above is used to do a fine-grained count of the files available.
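The shape of that per-site UNION can be sketched mechanically. This Python fragment is an illustration, not the actual Perl in NSO.pm; the site-to-INSTRUMENT_KEY map is taken from the queries dumped above, and SPECTRAL_KEY = 5 is the H-alpha line:

```python
# Site -> INSTRUMENT_KEY mapping, as seen in the dumped queries in this ticket.
SITES = {
    "LEARMONTH": 14,
    "CERRO TOLOLO": 13,
    "BIG BEAR": 12,
    "UDAIPUR": 17,
    "EL TEIDE": 16,
    "MAUNA LOA": 15,
}

def build_count_sql(start, stop, spectral_key=5):
    """Build the UNION-of-counts query, one SELECT per GONG site.
    `start`/`stop` are 'DD-MON-YYYY HH24:MI:SS' strings."""
    parts = []
    for idx, key in enumerate(SITES.values()):
        parts.append(
            f"SELECT {idx} as IDX, count(*) as C1 FROM NSODBO.GONG_HALPHA "
            f"WHERE OBSERVATION_START between "
            f"TO_DATE('{start}','DD-MON-YYYY HH24:MI:SS') AND "
            f"TO_DATE('{stop}','DD-MON-YYYY HH24:MI:SS') "
            f"AND INSTRUMENT_KEY = {key} and SPECTRAL_KEY = {spectral_key}"
        )
    return "select sum(C1) from (" + " UNION ".join(parts) + ")"

sql = build_count_sql("08-AUG-2014 06:00:00", "10-AUG-2014 21:00:00")
```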

However, there is also a table that is a daily summary of these products. This table looks like this :

SQL> describe NSODBO.GONG_HALPHA_DAILY
 Name					   Null?    Type
 ----------------------------------------- -------- ----------------------------
 TIMESTAMP				   NOT NULL DATE
 OBSERVATION_START			   NOT NULL DATE
 OBSERVATION_STOP				    DATE
 INSTRUMENT_KEY 			   NOT NULL NUMBER(2)
 RESOLUTION_KEY 			   NOT NULL NUMBER(2)
 SPECTRAL_KEY				   NOT NULL NUMBER(2)
 PRODUCT_COUNT				   NOT NULL NUMBER(15)
 FILESIZE				   NOT NULL NUMBER(15)
 FILE_LOCATION					    VARCHAR2(255)
 FILE_NAME					    VARCHAR2(255)
 THUMBNAIL_LR					    VARCHAR2(255)
 THUMBNAIL_HR					    VARCHAR2(255)
 AVAILABLE				   NOT NULL CHAR(1)

This query is pretty indicative of what's in that table :

SELECT 
 TO_CHAR(OBSERVATION_START, 'yyyy-mm-dd hh24:mi:ss'),
 TO_CHAR(OBSERVATION_STOP, 'yyyy-mm-dd hh24:mi:ss'),
 PRODUCT_COUNT, INSTRUMENT_KEY, FILESIZE, FILE_LOCATION, FILE_NAME
FROM NSODBO.GONG_HALPHA_DAILY WHERE  
 OBSERVATION_START between TO_DATE('08-AUG-2014 00:00:00','DD-MON-YYYY HH24:MI:SS') 
  AND TO_DATE('10-AUG-2014 23:59:59','DD-MON-YYYY HH24:MI:SS') 
  AND SPECTRAL_KEY = 5;

TO_CHAR(OBSERVATION TO_CHAR(OBSERVATION PRODUCT_COUNT INSTRUMENT_KEY   FILESIZE  FILE_LOCATION           FILE_NAME
------------------- ------------------- ------------- -------------- ----------

2014-08-08 00:00:00 2014-08-08 23:59:00 	  697		  12 2075929920 /HA/haf/201408/20140808/
2014-08-08 00:00:00 2014-08-08 23:59:00 	  537		  13 1584138240 /HA/haf/201408/20140808/
2014-08-08 00:00:00 2014-08-08 23:59:00 	  321		  14  944953920 /HA/haf/201408/20140808/
2014-08-08 00:00:00 2014-08-08 23:59:00 	  683		  16 2012457600 /HA/haf/201408/20140808/
2014-08-09 00:00:00 2014-08-09 23:59:00 	  687		  12 2059032960 /HA/haf/201408/20140809/
2014-08-09 00:00:00 2014-08-09 23:59:00 	  529		  13 1568064960 /HA/haf/201408/20140809/
2014-08-09 00:00:00 2014-08-09 23:59:00 	  566		  14 1647054720 /HA/haf/201408/20140809/
2014-08-09 00:00:00 2014-08-09 23:59:00 	  689		  16 2018842560 /HA/haf/201408/20140809/
2014-08-10 00:00:00 2014-08-10 23:59:00 	  597		  12 1791213120 /HA/haf/201408/20140810/
2014-08-10 00:00:00 2014-08-10 23:59:00 	  540		  13 1597250880 /HA/haf/201408/20140810/
2014-08-10 00:00:00 2014-08-10 23:59:00 	  570		  14 1655881920 /HA/haf/201408/20140810/
2014-08-10 00:00:00 2014-08-10 23:59:00 	  687		  16 2004768000 /HA/haf/201408/20140810/

12 rows selected.

Note that while there is a FILE_NAME column in this table, that column is not used. So, to generate a daily summary of counts, the user-supplied times :

08-AUG-2014 06:00:00 to 10-AUG-2014 21:00:00

are blatted out to include full days, like this :

08-AUG-2014 00:00:00 to 10-AUG-2014 23:59:00

(that last time should really be 23:59:59, but Oh Well).
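That expansion step can be sketched as follows; this is a Python illustration, not the provider code, and it uses the corrected 23:59:59 end, matching the TO_DATE bounds in the dumped queries:

```python
from datetime import datetime

FMT = "%d-%b-%Y %H:%M:%S"  # Oracle-style 'DD-MON-YYYY HH24:MI:SS'

def blat_to_full_days(start, stop):
    """Expand user-supplied times to full-day bounds, the way the
    daily-summary path 'blats out' the range."""
    s = datetime.strptime(start, FMT).replace(hour=0, minute=0, second=0)
    e = datetime.strptime(stop, FMT).replace(hour=23, minute=59, second=59)
    return s.strftime(FMT).upper(), e.strftime(FMT).upper()

print(blat_to_full_days("08-AUG-2014 06:00:00", "10-AUG-2014 21:00:00"))
# -> ('08-AUG-2014 00:00:00', '10-AUG-2014 23:59:59')
```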

And the upshot is that a query like this is done on the daily table :

SELECT SUM(PRODUCT_COUNT) FROM NSODBO.GONG_HALPHA_DAILY WHERE  
 OBSERVATION_START between TO_DATE('08-AUG-2014 00:00:00','DD-MON-YYYY HH24:MI:SS') 
  AND TO_DATE('10-AUG-2014 23:59:59','DD-MON-YYYY HH24:MI:SS') 
  AND SPECTRAL_KEY = 5;

And again, it's really a UNION across the different instrument sites with different settings for INSTRUMENT_KEY, but that query is the gist of it. And that query returns a higher number, 7103 in our case, because it covers more time.
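As a sanity check, summing the PRODUCT_COUNT column of the 12 daily rows dumped above does reproduce the larger number:

```python
# PRODUCT_COUNT values from the 12 GONG_HALPHA_DAILY rows dumped above
# (instrument keys 12, 13, 14, 16 over 2014-08-08 .. 2014-08-10):
product_counts = [697, 537, 321, 683,   # 2014-08-08
                  687, 529, 566, 689,   # 2014-08-09
                  597, 540, 570, 687]   # 2014-08-10
print(sum(product_counts))  # -> 7103, the "products returned" figure
```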

So, in summary :

The product counts can differ depending on whether they're done at the fine-grained level or at the daily summary level. That may not be a good thing, and down the road, rather than keeping a pre-packaged daily table, an on-the-fly GROUP BY over the user-supplied times could be used to generate the daily summaries.

The use of numeric keys (SPECTRAL_KEY, INSTRUMENT_KEY) is pretty dated and should probably not be continued in a re-design.

This does not explain why we have to loop through individual records instead of getting all the data at once in IDL (or does it? Someone who knows IDL better should take a look). But it does explain the differing counts.

comment:5 Changed 3 months ago by niles

Also, we can derive some schadenfreude from knowing that Tom also pursued this query through another, non-VSO portal at NSO. That other portal also capsized, and needed attention to figure out how to stop the runaway jobs it had started.

comment:6 Changed 3 months ago by niles

Idle thought : doing the daily summaries on the fly, as opposed to using a static daily table, allows different time periods to be summarized. Hourly, daily, weekly and monthly summaries might be approachable. Not sure about yearly...

comment:7 Changed 3 months ago by jacob

We've demystified this for the most part: NSO starts to truncate results past a certain limit of query results. The truncated results returned are tar archives of all files for the entire days spanned by the query meeting the specified criteria, disregarding any hh:mm:ss the user specified. For instance, if I had a query spanning a few days (too big to return just fits files) starting on hour 23 of the first day, I would still get tar archives containing all of the files for that first day.

This is a pretty good MO in my opinion: it takes a lot of time & resources to process queries approaching 2000 results, and anybody who deliberately requests thousands of images probably has code to parse the fits metadata and extract only exactly the files they need. We should certainly make users aware that they are getting more data than they asked for, though.

It should also be possible to explicitly request entire days of data tarred up directly, rather than having to write a large-enough query to make the vso start returning the archives. Otherwise it could take somebody a lot of time and bandwidth to download a dataset that's really too big to be efficiently transported as individual files.

Also, the reason more files are returned than found is that only the files within the specified time period are counted for the "results found" number, while "results returned" is the sum of what the vso gives you back: in this case, files from the entire span of the days requested.

comment:8 Changed 8 weeks ago by joe

Have him run the query with '/debug', which will make it dump both the XML sent and received; then look at the first line, character 50 (you then do a '.continue' to resume).

comment:9 Changed 8 weeks ago by joe

  • Owner changed from alisdair to ed
  • Status changed from new to assigned

After running under /debug, I found :

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>413 Request Entity Too Large</title> </head><body> <h1>Request Entity Too Large</h1>
The requested resource<br />/cgi-bin/vsoi_tabdelim<br /> does not allow request data with POST requests, or the amount of data provided in
the request exceeds the capacity limit. </body></html>
<?xml version="1.0" encoding="UTF-8"?><soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><soap:Fault><faultcode>soap:Client</faultcode><faultstring>Application failed during request deserialization:
no element found at line 1, column 0, byte -1 at /opt/vso/lib/perl5/darwin-thread-multi-2level/XML/Parser.pm line 187.
</faultstring></soap:Fault></soap:Body></soap:Envelope>

So, that setting needs to be tweaked on sdo5.nascom to raise the limit. I'm fairly sure there was a higher limit on vso.nascom, and I think it was in the mod_security configuration (which would explain why there's a message *before* the SOAP response, and why SOAP is complaining that nothing was sent).
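For reference, these are the kinds of directives involved. The block below is purely illustrative; the actual location and value of the limit on sdo5.nascom are unknown, and whether it lives in core Apache or in mod_security is exactly what's in question:

```apacheconf
# Hypothetical settings, for illustration only; not read from sdo5.nascom.
# Core Apache cap on request bodies for the VSO CGI endpoint (10 MB here):
<Location "/cgi-bin/vsoi_tabdelim">
    LimitRequestBody 10485760
</Location>

# Or, if the limit lives in mod_security (as suspected for vso.nascom):
SecRequestBodyLimit 10485760
```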

comment:10 Changed 7 weeks ago by ed

The issue seems to be that vso_get cannot directly handle a record list of Summary row data. When a vso_search returns Summary row data, the subsequent use of vso_get on the entire record list fails with the error Tom noted.

This occurs with other instruments such as AIA, and is not restricted to just GONG data.

When a vso_search is done over a small datetime range that does not trigger the generation of Summary row data, both vso_search and vso_get work as expected.

AIA data search for near='2017-12-17':

IDL> aiafile=vso_search(near='2017-12-17', inst='aia')

IDL> results=vso_get(aiafile,filelist=filelist,out_dir=outdir)
% Compiled module: VSO_GET.
% Compiled module: UNIQ.
% Compiled module: DEFAULT.
% Loaded DLM: XML.
% VSO_GET: This will download 1 file(s)
1 : http://sdo4.nascom.nasa.gov/cgi-bin/drms_export.cgi?series=aia__lev1;record=335_1292544039-1292544039

AIA datetime search w/start='2017-12-17 03:00' end='2017-12-18 22:00':

IDL> aiafile2=vso_search(start='2017-12-17 03:00', end='2017-12-18 22:00', inst='aia')
Records Returned : SDAC_AIA : 0/0
Records Returned : JSOC : 0/

JSOC : Large Query (over 5000 records); Summary rows returned; Each record describes multiple observations

Records Returned : JSOC : 103121/103121

IDL> results2=vso_get(aiafile2,filelist=filelist,out_dir=outdir)
% IDLFFXMLDOMDOCUMENT::LOAD: Parser fatal error: File: IDL STRING, line: 1, column: 50 : Expected whitespace
% Compiled module: ERR_STATE.
% XMLPARSER::DOM: IDLFFXMLDOMDOCUMENT::LOAD: Error encountered during the parse operation.
% XMLPARSER::FINDELEMENT: No DOM tree passed in
% Attempt to call undefined method: 'IDL_INT::GETFIRSTCHILD'.
% Execution halted at: SOAP::DESERIALIZE  223 /usr/local/ssw/gen/idl/clients/vso/soap__define.pro
%                      SOAP::SEND        175 /usr/local/ssw/gen/idl/clients/vso/soap__define.pro
%                      VSO::GETDATA      804 /usr/local/ssw/gen/idl/clients/vso/vso__define.pro
%                      VSO_GET           152 /usr/local/ssw/gen/idl/clients/vso/vso_get.pro
%                      $MAIN$
IDL>

When Alisdair's workaround is used to step through the Summary row data returned, vso_get does go out and get the files:

Getting the first 10 records is slow, but it does proceed:

IDL> print,n_elements(aiafile2)

2372

IDL> for i=0,10 do r=vso_get(aiafile2[i],filelist=filelist,out_dir=outdir)
% VSO_GET: This will download 1 file(s)
1 : http://sdo4.nascom.nasa.gov/cgi-bin/drms_export.cgi?series=aia__lev1;record=304_1292653808-1292654396
% SOCK_GET_MAIN: 362405569 bytes of aialev1_4k_304A_1292653808-1292654396.tar copied in 902.21 seconds.
% VSO_GET: Downloading completed

A similar vso_search of GONG data for a restricted datetime range results in vso_get working as expected when no Summary row data is returned.

IDL> gongfile=vso_search(near='2014-08-06', source='GONG')
Records Returned : NSO : 11/11

IDL> results3=vso_get(gongfile,filelist=filelist,out_dir=outdir)
% VSO_GET: This will download 11 file(s)
1 : ftp://gong2.nso.edu/oQR/bqa/201408/bbbqa140806/bbbqa140806t0004.fits.gz
% Compiled module: SOCK_CHECK.
% Compiled module: MKLOG.
% Compiled module: SOCK_CONTENT_FTP.
% SOCK_GET_MAIN: 570672 bytes of bbbqa140806t0004.fits.gz copied in 2.17 seconds.
2 : ftp://gong2.nso.edu/oQR/bqa/201408/lebqa140806/lebqa140806t0004.fits.gz
% SOCK_GET_MAIN: 575179 bytes of lebqa140806t0004.fits.gz copied in 2.35 seconds.
3 : ftp://gong2.nso.edu/oQR/bqa/201408/mlbqa140806/mlbqa140806t0004.fits.gz
% SOCK_GET_MAIN: 560993 bytes of mlbqa140806t0004.fits.gz copied in 2.32 seconds.
4 : ftp://gong2.nso.edu/oDM/vzi/201408/mrvzi140806/mrvzi140806t0000.fits.gz
% SOCK_GET_MAIN: 970831 bytes of mrvzi140806t0000.fits.gz copied in 3.67 seconds.
5 : ftp://gong2.nso.edu/HA/haf/201408/20140806/20140806000034Lh.fits.fz
% SOCK_GET_MAIN: 2911680 bytes of 20140806000034Lh.fits.fz copied in 6.00 seconds.
6 : ftp://gong2.nso.edu/HA/haf/201408/20140806/20140806000014Mh.fits.fz
% SOCK_GET_MAIN: 2900160 bytes of 20140806000014Mh.fits.fz copied in 6.82 seconds.
7 : ftp://gong2.nso.edu/HA/haf/201408/20140805/20140805235954Bh.fits.fz
% SOCK_GET_MAIN: 3000960 bytes of 20140805235954Bh.fits.fz copied in 6.80 seconds.
8 : ftp://gong2.nso.edu/oDM/bzi/201408/bbbzi140806/bbbzi140806t0000.fits.gz
% SOCK_GET_MAIN: 684101 bytes of bbbzi140806t0000.fits.gz copied in 2.41 seconds.
9 : ftp://gong2.nso.edu/oDM/bzi/201408/lebzi140806/lebzi140806t0004.fits.gz
% SOCK_GET_MAIN: 702408 bytes of lebzi140806t0004.fits.gz copied in 2.97 seconds.
10 : ftp://gong2.nso.edu/oDM/bzi/201408/mlbzi140806/mlbzi140806t0001.fits.gz
% SOCK_GET_MAIN: 697099 bytes of mlbzi140806t0001.fits.gz copied in 2.50 seconds.
11 : ftp://gong2.nso.edu/oDM/bzi/201408/mrbzi140806/mrbzi140806t0000.fits.gz
% SOCK_GET_MAIN: 776904 bytes of mrbzi140806t0000.fits.gz copied in 2.67 seconds.
% VSO_GET: Downloading completed

The GONG downloads are much faster because the files are considerably smaller than the AIA files.

I could not reproduce the error when using /DEBUG in vso_search for GONG data; the search proceeded without error.

comment:11 Changed 7 weeks ago by joe

Ed, it's not the search that's erroring ... it's the call to vso_get, because it's trying to request almost 2000 fileids at the same time, and SDO5 is rejecting it. If you've set VSO_DEFAULT_SERVER, you might be using an API that doesn't have that problem. Here's what I'm seeing, using Tom's query:

IDL> startDate='2014/08/08 06:00'
IDL> endDate='2014/08/10 21:00'
IDL> sampleStep=10
IDL> wavelength='6562'
IDL> outDir='/tmp'
IDL> searchfile=vso_search(date=startDate+' - '+endDate, source='GONG', min_wave='6560', max_wave='6570')
Records Returned : NSO : 1981/6717
IDL> getfile=vso_get(searchfile,filelist=filelist,out_dir=outdir,/nodownload)
% IDLFFXMLDOMDOCUMENT::LOAD: Parser fatal error: File: IDL STRING, line: 1,
                        column: 50 : Expected whitespace
% XMLPARSER::DOM: IDLFFXMLDOMDOCUMENT::LOAD: Error encountered during the
              parse operation.
% XMLPARSER::FINDELEMENT: No DOM tree passed in
% Attempt to call undefined method: 'IDL_INT::GETFIRSTCHILD'.
% Execution halted at: SOAP::DESERIALIZE  223   /usr/local/ssw/gen/idl/clients/vso/soap__define.pro
%                      SOAP::SEND        175   /usr/local/ssw/gen/idl/clients/vso/soap__define.pro
%                      VSO::GETDATA      804   /usr/local/ssw/gen/idl/clients/vso/vso__define.pro
%                      VSO_GET           153   /usr/local/ssw/gen/idl/clients/vso/vso_get.pro
%                      SOAP::POST        160   /usr/local/ssw/gen/idl/clients/vso/soap__define.pro
%                      SOAP::SEND        172   /usr/local/ssw/gen/idl/clients/vso/soap__define.pro
%                      VSO::GETDATA      804   /usr/local/ssw/gen/idl/clients/vso/vso__define.pro
%                      VSO_GET           153   /usr/local/ssw/gen/idl/clients/vso/vso_get.pro
%                      $MAIN$

With /debug in the vso_get call, I get the message I mentioned before:

<html><body><title>413 Request Entity Too Large</title> </head><body> <h1>Request Entity Too Large</h1>
The requested resource<br />/cgi-bin/vsoi_tabdelim<br /> does not allow request data with POST requests, or the amount of data provided in
the request exceeds the capacity limit. </body></html>

Asking the IDL session to retrieve a single summary record works just fine :

IDL> getfile=vso_get(searchfile[1],filelist=filelist,out_dir=outdir,/nodownload)
IDL> print_struct, getfile
% Compiled module: PRINT_STRUCT.
PROVIDER                                                                               FILEID                                                                  URL  INFO        SIZE
       NSO  pptid=11014;url=ftp://gong2.nso.edu/HA/haf/201408/20140808/20140808060134Lh.fits.fz  ftp://gong2.nso.edu/HA/haf/201408/20140808/20140808060134Lh.fits.fz             0.000
IDL>

It's even fine for 1001 records, but as we get larger, it craps out:

IDL> getfile=vso_get(searchfile[0:500],filelist=filelist,out_dir=outdir,/nodownload)
IDL> getfile=vso_get(searchfile[0:1000],filelist=filelist,out_dir=outdir,/nodownload)
IDL> getfile=vso_get(searchfile[0:1100],filelist=filelist,out_dir=outdir,/nodownload)
% IDLFFXMLDOMDOCUMENT::LOAD: Parser fatal error: File: IDL STRING, line: 1, column: 50 : Expected whitespace
% XMLPARSER::DOM: IDLFFXMLDOMDOCUMENT::LOAD: Error encountered during the parse operation.
% XMLPARSER::FINDELEMENT: No DOM tree passed in
% Attempt to call undefined method: 'IDL_INT::GETFIRSTCHILD'.
% Execution halted at: SOAP::DESERIALIZE  223 /usr/local/ssw/gen/idl/clients/vso/soap__define.pro
%                      SOAP::SEND        175 /usr/local/ssw/gen/idl/clients/vso/soap__define.pro
%                      VSO::GETDATA      804 /usr/local/ssw/gen/idl/clients/vso/vso__define.pro
%                      VSO_GET           153 /usr/local/ssw/gen/idl/clients/vso/vso_get.pro
%                      $MAIN$

.... and I'd also say that comparing *anything* to errors that you get from SDO data is just wrong, as it's running a really hacked up dataprovider so it can interface w/ NetDRMS.
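The manual narrowing above (500 OK, 1000 OK, 1100 fails) could be automated with a bisection. A Python sketch of the idea, where `ok` is a hypothetical stand-in for issuing a vso_get with /nodownload on the first n records and checking whether it parses cleanly:

```python
def max_batch(ok, lo=1, hi=4096):
    """Binary-search the largest n in [lo, hi] for which ok(n) is True.
    Assumes ok is monotone: True up to some threshold, False beyond it."""
    if not ok(lo):
        return 0
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the search converges
        if ok(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# Dummy "server" that rejects requests larger than 1042 records:
print(max_batch(lambda n: n <= 1042))  # -> 1042
```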

comment:12 Changed 7 weeks ago by alisdair

If you

setenv VSO_SERVER http://vso03.nispdc.nso.edu/cgi-bin/vso/vsoi_tabdelim

before kicking off SSWIDL, then both the original query from Tom Bridgman and Ed's AIA example above work without an issue.
