wiki:jmdBulkLoad

Version 3 (modified by joe, 10 years ago) (diff)

--

Bulk Loading SUNUMS into the JMD

There are times when scientists want to retrieve a lot of data. If we can insure that we have all of the data already local, their downloading will go much faster.

But to do this, we need to get the list of SUNUMs to the JMD.

There are at least three ways to do this:

Using the JMD admin tools

In your JMD install directory, there should be a bin directory with the script jmd_admin.pl. It allows you to do many things, including request specific sunums be retrieved:

   ## To make a data request
     ./jmd_admin.pl --request -- --series=<series_name> --sunums=<comma separated sunum list> [--priority=<a number from 0 to 1000>]
       e.g.
         ./jmd_admin.pl  --request -- --series=hmi_test.v_45s --sunums=16983416,16977656 --priority=20

Typically, you won't know what the list of sunums are that need to be retrieved, so you'll need to use one of the other methods.

Using show_info

If you call the DRMS command show_info with the argument -p, it will attempt to retrieve the records identified by the query. This is useful for contiguous time ranges, but I'm not certain if the logic for sampling is the same between DRMS and the VSO. If the user is going to be using the DRMS for their retrieval, just use their query. if they're going to be using the VSO with the IDL sample keyword, use the next method.

Using the tables in DRMS

If we have a list of VSO fileids, we can write queries that will put the appropriate values into the table public.sunum_queue, which is used to track new observations that we'd like the JMD to retrieve.

warning : something is wrong with this method. See note at end

From IDL, we can get a list of VSO fileids:

IDL> a=vso_search('2012/12/29','2015/03/01', inst='aia', sample=3600*6., wave=304, prov='jsoc')
Records Returned : JSOC : 3112/3112
IDL> print, a.fileid
aia__lev1:304:1167588044 aia__lev1:304:1136808044 aia__lev1:304:1138406444 aia__lev1:304:1173830444 aia__lev1:304:1150243244
aia__lev1:304:1192989644 aia__lev1:304:1180418444 aia__lev1:304:1159056044 aia__lev1:304:1136527244 aia__lev1:304:1153893644
aia__lev1:304:1187503244 aia__lev1:304:1200312044 aia__lev1:304:1200376844 aia__lev1:304:1160870444 aia__lev1:304:1177221644
aia__lev1:304:1142316044 aia__lev1:304:1195495244 aia__lev1:304:1153008044 aia__lev1:304:1146528044 aia__lev1:304:1185300044
...

AIA fileids have three parts separated by colons: the series identifier, the wavelength, and the time (T_REC_INDEX). HMI fileids have two parts : the series identifier, and the time.

Use your preferred tools to extract the list of times and turn it into a comma seperated list, then issue an INSERT similar to:

SELECT nextval('sunum_queue_key'::regclass), lev1.sunum,  'aia.lev1 bulk' AS series_name, now() as timestamp, lev1.recnum  
FROM vso.aia__lev1 LEFT OUTER JOIN sunum_queue ON lev1.sunum = sunum_queue.sunum 
WHERE sunum_queue.sunum IS NULL and WAVELNTH=304 and T_REC_INDEX IN (
1135814444,1135836044,1135857644,1135879244,1135900844,1135922444,1135944044
...
));

Note that this query does not have 'prime key logic', and if used for times from 2010 could cause multiple images to be retrieved for each observation. If you have the VSO shadow tables built, you can do:

INSERT INTO sunum_queue ( 
SELECT nextval('sunum_queue_key'::regclass), aia__lev1.sunum,  'aia.lev1 bulk' AS series_name, now() as timestamp, aia__lev1.recnum  
FROM vso.aia__lev1 LEFT OUTER JOIN sunum_queue ON aia__lev1.sunum = sunum_queue.sunum 
WHERE sunum_queue.sunum IS NULL and wave=304 and T_REC_INDEX IN (
1135814444,1135836044,1135857644,1135879244,1135900844,1135922444,1135944044
...
));

Note that this technique should only be used after the other options. Due to the way that the JMD pulls records out of the queue table, more recent observations will take precidence.

note : somehow, the queue went down by ~5k, and then stopped processing. I had to manually move the sunums from DRMS to the JMD:

alias drms='psql -U postgres -d sdac_drms'
drms -c "select sunum from sunum_queue where series_name like '%bulk'"' > /tmp/for_mk
perl -e 'my $i=100,@sunums; while (my $s = <>) { next if $s!~m/^ ?\d+$/; $s=~s/\s//g; push (@sunums, $s); if ( !$s or !--$i) {  my $sunums = join(",", @sunums); print qq{/opt/JMD/bin/jmd_admin.pl --request -- --series="aia.lev1 bulk2" --sunums=$sunums\n}; last if !$s; $i=100; @sunums=() }}' /tmp/for_mk  > /tmp/jmd_load_mk
sh /tmp/jmd_load_mk
drms -c '"delete from sunum_queue where series_name like '%bulk%'"