Ticket #302 (closed problem: fixed)

Opened 3 months ago

Last modified 3 months ago

NGDC data provider issues

Reported by: niles
Owned by: niles
Priority: highest
Milestone:
Component: DP:NGDC
Version: 1.4
Severity: blocker
Keywords:
Cc:

Description

Basically, there are oddities with the NGDC data provider that I need to get to the bottom of.

Alisdair noticed about 7K FITS files in vso/DataProviders/NGDC/indexes/ in CVS. They have names like INDEX_20060731_A_13.FTS and appear to be daily file lists; for an example, see

https://satdat.ngdc.noaa.gov/sxi/archive/fits/goes15/2018/03/08/
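Working through the archive day by day needs two small pieces: the URL of a satellite-day directory and the fields of an INDEX file name. A minimal sketch, assuming the `goesNN/YYYY/MM/DD/` layout seen in the single example URL above holds for every satellite and day (this is inferred, not a documented NOAA interface). `daily_dir_url` and `parse_index_name` are hypothetical helpers, and what the letter and trailing number in an INDEX name actually encode is not established in this ticket.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: base URL and goesNN/YYYY/MM/DD/ layout inferred from the one
# example URL in this ticket; not a documented NOAA interface.
ARCHIVE_BASE = "https://satdat.ngdc.noaa.gov/sxi/archive/fits"

# INDEX_<YYYYMMDD>_<letter>_<number>.FTS, as in INDEX_20060731_A_13.FTS.
# The meaning of the letter and the number is not documented here.
INDEX_NAME = re.compile(r"^INDEX_(\d{8})_([A-Z])_(\d+)\.FTS$")


def daily_dir_url(satellite: str, day: date) -> str:
    """Archive directory for one satellite-day, e.g. goes15 on 2018-03-08."""
    return f"{ARCHIVE_BASE}/{satellite}/{day:%Y/%m/%d}/"


def parse_index_name(name: str) -> Optional[dict]:
    """Split an INDEX file name into its fields; None if it doesn't match."""
    m = INDEX_NAME.match(name)
    if m is None:
        return None
    return {
        "date": datetime.strptime(m.group(1), "%Y%m%d").date(),
        "letter": m.group(2),
        "number": int(m.group(3)),
    }


# Reproduces the example directory URL quoted in this ticket.
print(daily_dir_url("goes15", date(2018, 3, 8)))
```

Looping `daily_dir_url` over a date range and the satellite names of interest gives the list of directories a scraper would have to visit.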

While trying to establish whether we need the FITS files in CVS, we noticed that they are out of date: they stop several years ago, which made us wonder what is going on with the NGDC data provider. There is a script in CVS named download_indexes.pl that, with minor updating, downloads the FITS files. It is also unclear whether the FITS files we have in CVS are current, in the sense of matching what we would get if we re-ran the download script.

Also, searching for SXI data in the web interface gives the message "VSO-D503 Unavailable - Not able to access NGDC at this time. We are working on a solution".

In that CVS directory is a file, notes.txt, that contains "Emails between Joe H & Dan Wilkinson". It may well be that I need to contact Dan to figure out the state of this data provider. It is not at all clear whether the INDEX files are even used, or whether they were generated for us on a bespoke basis and should therefore be retained in CVS.

Attachments

ngdc_error.png (13.8 KB) - added by niles 3 months ago.
Error when searching NGDC in the web interface
naming_convention.txt (2.2 KB) - added by niles 3 months ago.
notes.txt (4.0 KB) - added by niles 3 months ago.

Change History

Changed 3 months ago by niles

Error when searching NGDC in the web interface

comment:1 Changed 3 months ago by niles

  • Owner changed from joe to niles
  • Status changed from new to assigned

comment:2 Changed 3 months ago by niles

Dan's email is daniel.c.wilkinson@…

comment:3 Changed 3 months ago by alisdair

  • Component changed from (unfiled) to DP:NGDC

comment:4 Changed 3 months ago by alisdair

Supersedes ticket 193.

comment:5 Changed 3 months ago by alisdair

Niles, you should check out the following tickets relevant to NGDC: 40, 41, 154 and 220.

comment:6 Changed 3 months ago by ed

The NGDC data provider connects to the goes_12 MySQL database at hondo.ngdc.noaa.gov.

I sent an email today (5/17/18) to Dan Wilkinson at NOAA asking what the new hostname is.

Note: We should probably rename the NGDC data provider to NCEI (National Centers for Environmental Information), since they changed their name several years ago.

comment:7 Changed 3 months ago by niles

It's becoming clear what needs to be done with the NGDC data provider:

[1] Get it working again by getting it to connect to the database correctly, and

[2] Rename it to NCEI rather than NGDC, and

[3] Look at adding the SXIs on other GOES platforms; right now we only have GOES-12 (GOES-12's SXI was built by Marshall and differs from the others, which came from Lockheed), and

[4] See if the Wendelstein observatory data can be added; the broken link we have is http://www.ngdc.noaa.gov/idb/struts/results?op_0=eq&v_0=Wendelstein&t=102827&s=18&d=160 and

[5] Establish that the INDEX*.FTS files in CVS under vso/DataProviders/NGDC/indexes/ can be removed from CVS (I *strongly* suspect they were committed in error or were part of an earlier attempt to "scrape" the data).

This ticket has really snowballed from its "Gee, what are those INDEX*.FTS files doing in CVS?" origin. I'm emailing ncei.info@… (the email given at the above broken link) to see where the Wendelstein data are now. Ed has an email in with Dan asking where the database is now.

Niles.

comment:8 Changed 3 months ago by niles

It looks like the Wendelstein data are at:

https://www.ngdc.noaa.gov/stp/solar/fsundraw.html

The web page describes them as "lovely images", and they are, but possibly this should be a separate ticket from the NCEI (formerly NGDC) issues.

comment:9 Changed 3 months ago by niles

Ed got an email response from Dan at NOAA:

"We did move goes databases to another machine this year, however, we disabled all non-noaa access several years before that and I'm pretty sure that policy is getting tighter with time rather than the reverse. All of the data are online and available for unlimited use, just not via the database. I hope you can make use of them in that form."

I think this probably knocks us back to scraping their online data.

Also I'm going to open another ticket for the "lovely images" in the sun drawings. That is a separate issue.

Changed 3 months ago by niles

Changed 3 months ago by niles

comment:10 Changed 3 months ago by niles

Looking at the online documentation at:

https://sxi.ngdc.noaa.gov/images/section06.pdf

and:

https://sxi.ngdc.noaa.gov/docs/GOES_N_Series_Databook_rev-D_ch6_SXI.pdf

Also I attached a couple of small text files as notes.

We're going to have to scrape their online archive: get all the files, save the headers, and note the size of each FTS file. We'll also need to be able to keep doing that going forward. This is going to take a while.
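A minimal sketch of the per-file bookkeeping (save the header, note the size), assuming each file's bytes have already been downloaded; fetching and directory walking are left out. It reads the primary FITS header as a run of 80-character cards ending at an END card, as the FITS standard lays them out. `FileRecord`, `parse_primary_header` and `record_file` are hypothetical names, not existing VSO code, and a real scraper would probably lean on a FITS library such as astropy.io.fits instead of this hand-rolled parser.

```python
import re
from dataclasses import dataclass

CARD = 80  # a FITS header is a sequence of 80-character ASCII cards

# A FITS string value: single-quoted, with '' standing for an embedded quote.
FITS_STRING = re.compile(r"'((?:[^']|'')*)'")


def parse_primary_header(data: bytes) -> dict:
    """Keyword -> value for the primary header, with values kept as strings.

    Minimal sketch: cards without '= ' in columns 9-10 (COMMENT, HISTORY,
    blank) are skipped, and the trailing '/ comment' of a card is dropped.
    """
    header = {}
    for offset in range(0, len(data) - CARD + 1, CARD):
        card = data[offset:offset + CARD].decode("ascii", errors="replace")
        key = card[:8].strip()
        if key == "END":
            return header
        if card[8:10] != "= ":
            continue
        raw = card[10:].strip()
        m = FITS_STRING.match(raw)
        if m:
            header[key] = m.group(1).replace("''", "'").rstrip()
        else:
            header[key] = raw.split("/", 1)[0].strip()
    raise ValueError("no END card found; not a complete FITS header")


@dataclass
class FileRecord:
    """What we keep per archived file: its name, size in bytes and header."""
    name: str
    size: int
    header: dict


def record_file(name: str, data: bytes) -> FileRecord:
    return FileRecord(name=name, size=len(data), header=parse_primary_header(data))
```

Keeping only name, size and header per file is enough to answer search queries and to notice later when a file in the archive has changed size.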

comment:11 Changed 3 months ago by alisdair

Joe G's email.

    Folks -

    Basically, we have it all: they start in 1974, and if the directory listing 
    for 2016 is any indication,  we have all the data from all the GOES 
    spacecraft that were in operation during the year.

    The names of the earliest files conform to an 8.3 format, so we might 
    want to think about changing those (though in some sense that would 
    be tinkering with the archive), and the FITS structure changed from 
    simple to binary table at some point as well.

    Please see the top-level directory listing, directory listing for the 2016 
    files, and a sample of a binary FITS file’s header can be found at:

	https://www.dropbox.com/sh/n0fk08hu0nb92xq/AAANpWqIvjmzKgZEh6hRl5j0a?dl=0 .

    Best,

						Joe
Last edited 3 months ago by alisdair
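Joe G's two format notes (8.3 file names for the earliest files, and a switch from simple FITS to binary tables) could be checked file by file during a scrape. A minimal sketch, assuming "8.3" means 1-8 name characters and a 1-3 character extension drawn from letters, digits, `_` and `-`. The layout check walks the HDUs using the FITS rule that data occupy |BITPIX| x GCOUNT x (PCOUNT + NAXIS1 x ... x NAXISn) bits, padded to 2880-byte blocks, and reports whether any extension is a BINTABLE. `is_8_3_name` and `fits_layout` are hypothetical helpers, and random-groups files are not handled.

```python
import re

CARD, BLOCK = 80, 2880  # FITS card length and block size, in bytes

# Assumption: an "8.3" name is 1-8 name characters, a dot, 1-3 extension
# characters, drawn from letters, digits, '_' and '-'.
EIGHT_THREE = re.compile(r"^[A-Za-z0-9_-]{1,8}\.[A-Za-z0-9]{1,3}$")


def is_8_3_name(name: str) -> bool:
    return EIGHT_THREE.match(name) is not None


def read_header(data: bytes, start: int) -> tuple:
    """Parse the header at `start`; return (cards, offset where its data begins).

    Only good enough for NAXIS*, BITPIX, PCOUNT, GCOUNT and XTENSION values.
    """
    cards, pos = {}, start
    while pos + CARD <= len(data):
        card = data[pos:pos + CARD].decode("ascii", errors="replace")
        pos += CARD
        key = card[:8].strip()
        if key == "END":
            # headers are padded out to a whole number of 2880-byte blocks
            return cards, ((pos + BLOCK - 1) // BLOCK) * BLOCK
        if card[8:10] == "= ":
            cards[key] = card[10:].split("/", 1)[0].strip().strip("'").strip()
    raise ValueError(f"no END card in header starting at byte {start}")


def data_size(cards: dict) -> int:
    """Padded size in bytes of the data after a header (no random groups)."""
    naxis = int(cards.get("NAXIS", 0))
    if naxis == 0:
        return 0
    elements = 1
    for i in range(1, naxis + 1):
        elements *= int(cards[f"NAXIS{i}"])
    nbits = abs(int(cards["BITPIX"])) * int(cards.get("GCOUNT", 1)) * (
        int(cards.get("PCOUNT", 0)) + elements)
    return ((nbits // 8 + BLOCK - 1) // BLOCK) * BLOCK


def fits_layout(data: bytes) -> str:
    """'binary table' if any extension is a BINTABLE, otherwise 'simple'."""
    pos = 0
    while pos < len(data):
        cards, data_start = read_header(data, pos)
        if cards.get("XTENSION") == "BINTABLE":
            return "binary table"
        pos = data_start + data_size(cards)
    return "simple"
```

Tallying `fits_layout` results by year would show roughly when the archive changed format, which matters for deciding how a scraper extracts header metadata from each era.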

comment:12 Changed 3 months ago by niles

I got an email from Joe H (which I initially misread as a comment on a ticket, but it was just an email). Part of the exchange is below. I think it means we can take the .FTS files out of CVS, at a minimum. Thanks to Joe H for chiming in!

> In trying to ascertain if we need the FITS files in CVS, it was noted that the files are not up to date in that they stop several years ago, which made us wonder what is up with the NGDC data provider. There is a script in CVS named download_indexes.pl that, with minor updating, downloads the FITS files. Nor is it clear if the FITS files we have in CVS are up to date in the sense that they are the same as what we'd get if we re-ran the download script.

The indexes were corrupted, so I stopped downloading them, as I was going to have to re-download everything once they were fixed. It’s a really resource-intensive process, as it tries grabbing files until it can’t (basically, there might be an ‘A’ file; an ‘A’ and a ‘B’; or an ‘A’, ‘B’ and ‘C’).

The problem was that the dates ended in ‘.0’, throwing off the fields by 2 columns.
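Joe H's description of the download loop and of the ‘.0’ problem can be sketched as follows. This is a minimal illustration, not the logic of download_indexes.pl (which is Perl and is not shown in this ticket). `fetch_parts` and `normalize_times` are hypothetical names; `fetch` stands in for whatever request function is used and is assumed to return None for a missing part. It is also an assumption that the stray ‘.0’ is a fractional-seconds suffix on an HH:MM:SS time, since no index line appears here.

```python
import re
from string import ascii_uppercase
from typing import Callable, Dict, Optional


def fetch_parts(fetch: Callable[[str], Optional[bytes]]) -> Dict[str, bytes]:
    """Request parts 'A', 'B', 'C', ... in turn, stopping at the first miss.

    `fetch` is a hypothetical callable: given a part letter it returns the
    file's bytes, or None if that part does not exist (e.g. an HTTP 404).
    """
    parts = {}
    for letter in ascii_uppercase:
        body = fetch(letter)
        if body is None:
            break
        parts[letter] = body
    return parts


# Assumption: the stray '.0' is a fractional-seconds suffix on an HH:MM:SS
# time. Removing it restores the column positions a fixed-width parser expects.
TRAILING_ZERO = re.compile(r"(\d{2}:\d{2}:\d{2})\.0\b")


def normalize_times(line: str) -> str:
    return TRAILING_ZERO.sub(r"\1", line)
```

Normalizing each line before slicing fixed-width fields keeps the existing column offsets valid; splitting on whitespace instead would avoid the offset problem altogether, at the cost of breaking on any field that contains spaces.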

> Also, searching for SXI data in the web interface gives the message "VSO-D503 Unavailable - Not able to access NGDC at this time. We are working on a solution".

That message has been up since the SDAC moved IP networks and NOAA wouldn’t give us the same firewall access (2009?). The index parsing was the solution I was working on.

comment:13 Changed 3 months ago by niles

  • Status changed from assigned to closed
  • Resolution set to fixed

This ticket snowballed into issues that are now better covered by tickets 305, 306 and 307. I'm resolving it because the basic issue, the many FTS files in CVS under vso/DataProviders/NGDC/indexes, has been addressed: it has been established that the files are not useful, and they have been removed, although I did retain 10 of them (out of the ~7K we had in CVS).
