wiki:SortingResults

Version 3 (modified by alisdair, 6 years ago) (diff)

--

In a search for, in this case, AIA data using "sample=7200", the returned results are not date ordered.

IDL> a=vso_search('2014-09-19T01:00:00', '2014-09-19T21:00:00',instrument='AIA', sample=7200)
Records Returned : JSOC : 10/10
Records Returned : SDAC_AIA : 0/0
IDL> print, a.time
{ 2014-09-19T11:00:01 2014-09-19T11:00:02}{ 2014-09-19T03:00:01
2014-09-19T03:00:02}{ 2014-09-19T15:00:01 2014-09-19T15:00:02}{
2014-09-19T01:00:01 2014-09-19T01:00:02}{ 2014-09-19T07:00:01
2014-09-19T07:00:02}{ 2014-09-19T19:00:01 2014-09-19T19:00:02}{
2014-09-19T09:00:01 2014-09-19T09:00:02}{ 2014-09-19T13:00:01
2014-09-19T13:00:02}{ 2014-09-19T05:00:01 2014-09-19T05:00:02}{
2014-09-19T17:00:01 2014-09-19T17:00:02}

However when doing a straight, non-sampled, search for AIA data, it is returned date ordered.

IDL> a=vso_search('2014-09-19T01:00:00', '2014-09-19T01:00:5',instrument='AIA'))
Records Returned : JSOC : 24/24
Records Returned : SDAC_AIA : 0/0
IDL> print, a.time                                                              
{ 2014-09-19T01:00:01 2014-09-19T01:00:02}{ 2014-09-19T01:00:02
2014-09-19T01:00:03}{ 2014-09-19T01:00:06 2014-09-19T01:00:07}{
2014-09-19T01:00:07 2014-09-19T01:00:08}{ 2014-09-19T01:00:07
2014-09-19T01:00:08}{ 2014-09-19T01:00:08 2014-09-19T01:00:09}{
2014-09-19T01:00:11 2014-09-19T01:00:12}{ 2014-09-19T01:00:11
2014-09-19T01:00:12}{ 2014-09-19T01:00:13 2014-09-19T01:00:14}{
2014-09-19T01:00:14 2014-09-19T01:00:15}{ 2014-09-19T01:00:16
2014-09-19T01:00:17}{ 2014-09-19T01:00:18 2014-09-19T01:00:19}{
2014-09-19T01:00:19 2014-09-19T01:00:20}{ 2014-09-19T01:00:20
2014-09-19T01:00:21}{ 2014-09-19T01:00:23 2014-09-19T01:00:24}{
2014-09-19T01:00:23 2014-09-19T01:00:24}{ 2014-09-19T01:00:25
2014-09-19T01:00:26}{ 2014-09-19T01:00:26 2014-09-19T01:00:27}{
2014-09-19T01:00:30 2014-09-19T01:00:31}{ 2014-09-19T01:00:30
2014-09-19T01:00:31}{ 2014-09-19T01:00:31 2014-09-19T01:00:32}{
2014-09-19T01:00:32 2014-09-19T01:00:33}{ 2014-09-19T01:00:35
2014-09-19T01:00:36}{ 2014-09-19T01:00:35 2014-09-19T01:00:36}

The sample query is a follows

SELECT
   date_obs,
   date_obs as date_end,
   t_rec_index,
   fsn,
   recnum,
   sunum,
   wave as wavemin,
   wave as wavemax,
   clusterid,
   fileid
FROM vso.%s
WHERE 1=1 %%s
  AND t_rec_index IN (
    SELECT hashed.d
    FROM (
      SELECT MIN(t_rec_index) as d,
        FLOOR( ( EXTRACT(EPOCH FROM date_obs) - EXTRACT(EPOCH FROM CAST(? AS TIMESTAMP) ) ) / ? ) AS hash
      FROM vso.%s
      WHERE 1=1 %%s
      GROUP BY hash
    ) AS hashed
  )
%%s => where clause ; %s => table name

Why does this happen?

Simply the results set is returned as the database returns it to the query. So if the query is returned in date order then the results will be date ordered. However the DB will return the results in the most efficient way possible and this may not in fact be time ordered.

Why are they not just sorted?

Sorting is an expensive operation on complex queries especially when the DB is potentially returning hundreds of thousands of results. In the past we have had issues with data providers timing out for these very large queries, especially those which specify a sample time, as they bypass the normal Data Provider limits. Sorting is left to the client to do.