Ticket #235 (closed new task: fixed)

Opened 4 years ago

Last modified 2 weeks ago

Find notes from past tech meetings.

Reported by: joe Owned by: joe
Priority: normal Milestone:
Component: Undefined Version: 1.2
Severity: normal Keywords:
Cc:

Description

For #166 (cartAPI) & catalogs.

Change History

comment:1 Changed 2 weeks ago by joe

From Nov 2006 Tech Meeting (w/ notes re: scientist responses):

Shopping Cart API

What functions?

Using it as a scratch pad -- save to come back later

Should be able to save catalog info, too, not just data

Anyone can find carts by just guessing random numbers -- does that make it a problem for people wanting to make private notes (eg, things to look at later) ... even placing items into a cart might give people an idea of what people are looking at (eg. the folks who got scooped on the finding of Xena?)

Might have to handle authentication for the carts.

keep private until it's been scooped

BETTER SOLUTION : download it to your local system ... it gets purged off the server, but they can publish it later.

JOE G : wait 'till people complain ... just do it on the server for now, as it requires less effort for the scientists

Types of Annotations:

public vs. personal vs. shared w/ small group

informational (general notes?)

problem reports

citation (article usage)

Place the carts into SVN ?

So that people have history & can revert back?

Or, have to publish the modified version, and give a note on the previous one that it's been updated, and refer to new location

Joe G seems not thrilled.

lots of extra work; might need isolated server; only a new one if it's substantially different

Create (Publish) ... are they two different functions?

One works locally, one saves on the server

Update (add records; delete records; annotate record, annotate catalog) (status of ordered products -- for async GetData?)

(personal, don't worry about it?) does have some info about if they tried to get the data, not just used it to make a movie

Delete (flag cart as not available?)

Should other people be able to annotate the catalog / records?

records themselves are a new cart

comments about another cart/catalog -- is it a new cart, or is it linked to one being commented on?

PROBLEM : would we need authentication to keep out spammers/griefers/etc, if we allow posting like this?

(allow them to give a link to a new cart, and only a cart?) (might have a cart w/ no records, just annotations?)

POSSIBLE SOLUTION : we store an e-mail address (not public); when they want to edit, we send them e-mail with a one time or time sensitive password.

JOE G: NO -- mail someone @ vso, they can pass along stuff.

(creates lots of problems with that nasa security plan)

Needed : way to replay the query (use it to save the search terms)

YES

(maybe a persistent search agent ... for later)

if so, need e-mail to notify them / RSS feed / etc.

NO -- good idea, but not as part of VSO -- can be a value add later.

Find : Search and Browse

What metadata do we search on? Search on views/usage of cart, not just cart itself

Permanence : store as XML, in SVN

Issue -- this should be at the central place, so it doesn't bog down an individual instance

POSTER or ARTICLE on cart/catalog merged functionality ?

comment:2 Changed 2 weeks ago by joe

  • Status changed from new to closed
  • Resolution set to fixed

Notes from Feb 2008 (possibly sent in advance of the meeting, but has some additions to the Nov 2006 tech meeting)

This was also the basis for the poster "Finding Data Using Event and Feature Catalogs in the VSO" at the 2009 SPD meeting : https://sdac.virtualsolar.org/catalogs/jhourcle_SPD_2009.pdf

============================
Questions for the Scientists
============================

  * Catalogs
    * Which Catalogs are most important to implement first?
    * What types of questions would you like to ask?

  * 'New' Parameters
    * How would you word questions for ...
      * Spatial searches
      * Cadence searches
      * Plate scale searches?
      * Resolution (number of pixels) searches?

  * What sort of results would you expect back from those searches?

=======================
Technical Meeting Notes
=======================

What current catalog features do we want to keep?

        Ability to use catalog to extract dates, then use that range to search on
        varying the window (eg, 2 hrs before)
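
A minimal sketch of the date-window idea above, assuming a catalog record carries start and end times as Python datetimes. The function name `search_window` and its parameters are hypothetical and not part of any VSO API.

```python
from datetime import datetime, timedelta

def search_window(event_start, event_end, before=timedelta(0), after=timedelta(0)):
    """Return (start, end) of the data search range for one catalog event."""
    if event_end < event_start:
        raise ValueError("event ends before it starts")
    # Widen the event's own time range by the requested padding.
    return event_start - before, event_end + after

# Made-up event; look for data starting 2 hrs before it.
start, end = search_window(
    datetime(2006, 11, 1, 12, 0),
    datetime(2006, 11, 1, 12, 30),
    before=timedelta(hours=2),
)
```

The resulting pair could then be used as the time range of an ordinary data search.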


        Items to use from the catalog to search for data:

                fileids (to call getdata directly?) -- or a VSO cart id?

                times

                spectral ranges (where it was seen in?)

                spatial location (region of interest)

                (can you correlate angles in the partial halo CMEs in LASCO to
                where TRACE was observing?)
                        (maybe ... may be easier for some locations of TRACE, and
                        more difficult if it's not in the limb ... need location of
                        the origin of the CME)

                        * need coordinate transformations

                Need to be able to handle 'paths' -- as the instrument is
                tracking across, don't maintain ALL of the data points.


        need to be able to take a given position, translate to
        another time, based on the photospheric rotation
                * for now, only worry about across the front ...14 day window
                * do we assume average rate, or deal with differential speeds
                  between hemisphere & poles?
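
A rough sketch of the position translation above, covering both options in the last bullet (one fixed rate vs. differential rotation). Assumptions: longitude is measured from the central meridian with west positive, the coefficients are the commonly quoted sidereal fit A + B*sin^2(lat) + C*sin^4(lat) in degrees/day, and the non-differential case simply uses the equatorial rate. All names are hypothetical; the notes do not settle on a rotation model.

```python
import math

# Assumed sidereal coefficients (deg/day); swap in whatever model is adopted.
A, B, C = 14.713, -2.396, -1.787
EARTH_ORBIT_DEG_PER_DAY = 0.9856  # subtracted to get the rate as seen from Earth

def synodic_rate(lat_deg, differential=True):
    """Apparent rotation rate (deg/day) at a given heliographic latitude."""
    s2 = math.sin(math.radians(lat_deg)) ** 2 if differential else 0.0
    return A + B * s2 + C * s2 * s2 - EARTH_ORBIT_DEG_PER_DAY

def rotate_longitude(lon_deg, lat_deg, days, differential=True):
    """Longitude after `days` (may be negative); latitude is left unchanged.

    Returns None when the point would be off the front of the disk (|lon| > 90).
    """
    new_lon = lon_deg + synodic_rate(lat_deg, differential) * days
    return new_lon if abs(new_lon) <= 90.0 else None
```

With these numbers a point takes roughly two weeks to cross the front of the disk, which matches the 14 day window mentioned above.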

        standardize on a coordinate system
                * should movement with time be a factor on the choice?
                * TRACE, Swedish Telescope, RHESSI, CDS, etc. and other
                  partial_sun instruments should get precedence over fulldisk
                  instruments.


[CENSORED -- discussion of something to propose that's only
peripherally VSO related ... Alisdair to coordinate]

** Add to registry (or in that format) all known data collections, even
   if not yet in VSO
        (so we can tell people where to go, even if we don't have it yet, or
        we can get insight into what's of interest, or get letters of
        support for us to get funding to bring them online & integrate)

        NOTE -- scientists didn't seem thrilled with the idea.


Catalogs -- Types of fields:
        Time :
                UTC ; Carrington Rotation
        Time Range
        Spatial :
                coordinate system specific (Carrington longitude, cartesian)
                Specific points  OR area/range
        Enumerations:
                Orderable (greatest -> least) vs. Unorderable (list of
                instruments)
                        Orderable may have specific values (eg, Rhessi spectral bands)
                Mutually Exclusive vs. not  [flags?]
                May be two-dimensional -- should it be broken down into each
                dimension?

        Boolean :
                True / False / NULL
                Need to know how it's recorded in their system
        Numeric :
                integer  or float
                (signed or not?)
                unitless (count / ratio) vs. units
                what is this a measurement of ?  (so we know what we can join
                against or derive)
                        (this may be an issue for CoSEC.  ontologies?)
                should we store min/max info, so we can prompt users if bad
                values?
        Numeric Ranges
        Comments / Free Text
        Primary Key
                eg. NOAA Active Region #; RHESSI flare #; Bright Point #
        Foreign Key
                (and need to know the catalog it refers to)
        Foreign Key to Data
                (eg, SDAC fileids to get to EIT data)
                (note -- may have a prefix / format for the column, and this is
                only the variable part)
        Flags
                (boolean, but need info from extracting it from text...)
                (the list may grow with time?)

        URL
                to thumb / browse data
                (may be derivable -- eg, need prefix to ftp server)



        Summary data -- rollups into useful sized chunks

                every catalog should have a default style of rollup
                        (should be based on user context, but we don't have that)
                (need an 'expand all' button)

                For each field, there may be a normal way to summarize it:
                        min : eg, start time
                        max : eg, end time
                        count : primary key, just show # of records in rollup
                        sum : (if the column is a count of something)
                        avg
                        range extents (for orderable enumerations)
                        histogram : if it was an enum ( 15 x; 17 y; 3z )
                        (nothing)
                        ratio : boolean : (3 of 8) or 40%
                        flags : (if true, show it?)
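
The per-field summary styles above could be sketched as follows. This assumes records are plain dicts and that each field has been assigned one style name; the `rollup` function, the style names and the sample records are all illustrative, not an existing interface.

```python
from collections import Counter

def rollup(records, rules):
    """Summarize a list of dict records; `rules` maps field -> summary style."""
    out = {}
    for field, style in rules.items():
        # NULL / missing values are left out of every summary.
        values = [r[field] for r in records if r.get(field) is not None]
        if style == "nothing":
            continue
        elif style == "count":
            out[field] = len(values)
        elif style == "sum":
            out[field] = sum(values)
        elif style == "histogram":
            out[field] = dict(Counter(values))
        elif style == "ratio":
            out[field] = (sum(1 for v in values if v), len(values))
        elif not values:
            out[field] = None
        elif style == "min":
            out[field] = min(values)
        elif style == "max":
            out[field] = max(values)
        elif style == "avg":
            out[field] = sum(values) / len(values)
        elif style == "range":
            out[field] = (min(values), max(values))
        else:
            raise ValueError("unknown summary style: %r" % style)
    return out

# Made-up records; ISO time strings sort correctly as text.
flares = [
    {"id": 1, "start": "2006-11-01T10:00", "end": "2006-11-01T10:20", "goes_class": "C", "has_cme": False},
    {"id": 2, "start": "2006-11-01T12:00", "end": "2006-11-01T12:45", "goes_class": "M", "has_cme": True},
    {"id": 3, "start": "2006-11-02T01:00", "end": "2006-11-02T01:10", "goes_class": "C", "has_cme": None},
]
summary = rollup(flares, {
    "id": "count", "start": "min", "end": "max",
    "goes_class": "histogram", "has_cme": "ratio",
})
```

Here the NULL boolean is excluded from the ratio, which is one possible answer to the True / False / NULL question raised earlier.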


        ** The Cart is a catalog!
                catalogs also have annotation (record & catalog level)
                as well as arbitrary fields


        Get URL Query -- go from the catalog to the VSO form filled out (or
        to another catalog)

        Have metadata that describes the whole catalog (annotations, etc.)

        Need ways to visualize data that's applied across more than one
        record (eg, FK to thumbnail image, only show once, not for all
        records)




* Need to give them ways to derive values for each record ... so that
  they can cross-correlate.


* Test Alisdair's BP catalog -- if we can implement that, should be able
  to do all.
        ( & get back to the original EIT obs through VSO)


Catalog Protocols:

        Ingest / Update  (may need to delete, not just insert)
        Query
        Export ( various formats -- SQLite, CSV, FITS, VOTable, ASCII )
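
A small sketch of the export step for two of the formats listed (CSV and SQLite), using only the Python standard library. It assumes records are dicts keyed by column name; the function names and the default table name `catalog` are hypothetical. FITS and VOTable would need third-party libraries and are not shown.

```python
import csv
import io
import sqlite3

def to_csv(records, columns):
    """Render records as CSV text with a header row, in the given column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_sqlite(records, columns, table="catalog", path=":memory:"):
    """Write records into a new SQLite table and return the open connection."""
    def quote(name):
        # Double-quote identifiers so odd column names are still valid SQL.
        return '"' + name.replace('"', '""') + '"'
    cols = ", ".join(quote(c) for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE %s (%s)" % (quote(table), cols))
    conn.executemany(
        "INSERT INTO %s (%s) VALUES (%s)" % (quote(table), cols, marks),
        [tuple(r.get(c) for c in columns) for r in records],
    )
    conn.commit()
    return conn

rows = [{"id": 1, "goes_class": "C"}, {"id": 2, "goes_class": "M"}]
csv_text = to_csv(rows, ["id", "goes_class"])
db = to_sqlite(rows, ["id", "goes_class"])
```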

+       Catalog API
+               request last modified date for catalog
+               request records since date (or between dates?)
                + (include intersections -- things occurring at that time)
                ? will we ever have a catalog that is NOT time based?
+               request the most recent record (special case of closest to (x))
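
The three requests above might look like this against an in-memory stand-in for a time-based catalog. The class and method names are hypothetical, and records are assumed to be dicts with naive `start` / `end` datetimes.

```python
from datetime import datetime

class Catalog:
    """In-memory stand-in for a time-based catalog (hypothetical interface)."""

    def __init__(self, records, last_modified):
        self.records = sorted(records, key=lambda r: r["start"])
        self.last_modified = last_modified  # answers "last modified date"

    def records_between(self, since, until=None):
        """Records whose time range intersects [since, until].

        A record that started before `since` but is still going on at that
        time is included (the 'intersections' case).
        """
        return [
            r for r in self.records
            if r["end"] >= since and (until is None or r["start"] <= until)
        ]

    def closest_to(self, when):
        """Record whose start is nearest `when`; None if the catalog is empty."""
        return min(self.records, key=lambda r: abs(r["start"] - when), default=None)

    def most_recent(self):
        """Special case of closest_to(): nearest to the far future."""
        return self.closest_to(datetime.max)

cat = Catalog([
    {"id": "a", "start": datetime(2008, 2, 1, 0, 0), "end": datetime(2008, 2, 1, 1, 0)},
    {"id": "b", "start": datetime(2008, 2, 1, 6, 0), "end": datetime(2008, 2, 1, 9, 0)},
    {"id": "c", "start": datetime(2008, 2, 2, 0, 0), "end": datetime(2008, 2, 2, 0, 30)},
], last_modified=datetime(2008, 2, 2, 1, 0))
ongoing = cat.records_between(datetime(2008, 2, 1, 8, 0))
```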

+ NOTE -- if someone wants to use this for replication, they can track the old
  state, and get the new state, and run the diff themselves -- the data provider
  does not need to track modifications/deletions of each record.
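
The replication note above amounts to a client-side diff of two snapshots. A minimal sketch, assuming each snapshot is a dict of primary key -> record; `diff_catalog` is a hypothetical name and the sample values are made up.

```python
def diff_catalog(old, new):
    """Compare two snapshots and report what the replica has to change."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    deleted = sorted(old.keys() - new.keys())
    modified = {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, modified, deleted

old_state = {1: {"class": "C1.0"}, 2: {"class": "M2.0"}, 3: {"class": "X1.0"}}
new_state = {1: {"class": "C1.0"}, 2: {"class": "M2.5"}, 4: {"class": "B9.0"}}
added, modified, deleted = diff_catalog(old_state, new_state)
```

The provider only has to serve the current state; the bookkeeping stays with whoever is replicating.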



        RSS feed for new entries
                Maybe with a filter? ... just want X class flares, etc.

                Could also be used by replicas to keep in sync (maybe RDF, not
                RSS) might need to track modified date on each record in the
                catalog replication -- look at specific database (or LDAP)
                replication

        What happens if a catalog gets out of sync, or the lag between
        catalog updates

What info do we need about them?
        Location of authoritative source
                Location of lowest level machine readable source (for mirroring)
        + any documentation for the catalog
                + any peer-reviewed article that explains the catalog
        Who maintains it (contact info, name, etc.)
        List of columns.
        Default way of sorting it
                + fields, asc vs. desc, etc.
        Column order (view)
                + NO -- we store a relationship to 'default view' (and the default
                  sorting may be a function of the view).  Each catalog may have 
                  multiple views, and some views may be generated from combining
                  multiple catalogs.
                + also store a 'long view' that includes all values for a catalog
                  (may be the same as the default)

        Is it active? (still being added to?)
                Last modified date
                Note -- just because not active, doesn't mean that it won't
                be updated again in the future, if the data is found to be
                suspect.
        Overall time range (if applicable)
        + primary key (might be time-dependent, might not)

        published vs. dynamic catalog?
                + Need to be able to query catalog provider to get last modified date


        For each column:
                what it means (human readable)
                what it means (machine readable)   *later
                        + UCD+ (IVOA) or ontology
                what the values mean (eg, enums, how they define terms used)
                units
                precision
                what does it mean if it's empty?
                maybe:
                        min/max allowed
                        validation info
                what type it is (enum/numeric/comment/see above list)
                how to best summarize it?
                        (maybe that's a function of the view?)

                How to search on it (more than just enum vs. numeric --
                        nicknames for normal searching)

+       For each 'concept' that a field embodies:
                standard ways to present that info
                        eg, Flare Magnitude : Ergs/cm-2 vs. X#.#
                        (european date vs. american date, etc)
+       For each 'view':
                the catalogs that it requires
                a default sort order
                        (relationship w/ fields?)
                column order
                For each field:
                        how to present / format the field
                                (relationship w/ 'concept' formatting)

        SPASE:
                What it's derived from?
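
One way to hold the per-column and per-view metadata listed above is a pair of small record types. This is only a sketch: the class and field names are hypothetical, and the machine-readable meaning (UCD / ontology) is left out since the notes mark it for later.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ColumnInfo:
    name: str
    meaning: str                        # what it means (human readable)
    kind: str                           # enum / numeric / comment / ...
    units: Optional[str] = None
    precision: Optional[int] = None
    empty_means: Optional[str] = None   # what does it mean if it's empty?
    min_allowed: Optional[float] = None
    max_allowed: Optional[float] = None
    summary: str = "nothing"            # how to best summarize it

    def out_of_range(self, value):
        """True if a non-empty value falls outside the allowed min/max."""
        if value is None:
            return False
        if self.min_allowed is not None and value < self.min_allowed:
            return True
        return self.max_allowed is not None and value > self.max_allowed

@dataclass
class View:
    name: str
    catalogs: List[str]                 # the catalogs that it requires
    columns: List[str]                  # column order
    sort_by: List[Tuple[str, str]] = field(default_factory=list)  # (column, "asc" | "desc")

lat = ColumnInfo(name="latitude", meaning="Heliographic latitude of the event",
                 kind="numeric", units="deg", min_allowed=-90, max_allowed=90,
                 summary="avg")
default_view = View("default", ["flares"], ["start", "latitude"], [("start", "desc")])
```

Keeping the sort order on the view rather than on the catalog follows the "NO" answer under Column order above.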

What do we do if no Primary Key?
        synthesize it from the rows
        we may have to assign something
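
A sketch of the first option, deriving a key from the row contents. It assumes a row is a dict of column -> value; the name `synthesize_key` and the choice of a truncated SHA-1 are illustrative only.

```python
import hashlib
import json

def synthesize_key(row, length=12):
    """Derive a stable key from a row's contents (a dict of column -> value)."""
    # Sort the columns so the key does not depend on column order.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:length]

key_a = synthesize_key({"time": "2008-02-01T06:00", "x": 120.5, "y": -33.0})
key_b = synthesize_key({"y": -33.0, "x": 120.5, "time": "2008-02-01T06:00"})
```

Two identical rows get the same key, and the key changes whenever any value in the row is corrected, which is where assigning something ourselves may be needed instead.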



Things that should be possible with the catalogs:
        SolarMonitor.org
        Alisdair's BP catalog
        SSW latest events ?

Look at EGSO's catalogs
Look at Astro folks for catalog standards (Astro::Catalog in CPAN)

Igor likes the zoom thing in SSW.

