Last modified on 02/12/14 07:36:25

NetDRMS Useful Debugging Checks

This is a collection of short checks that can be run against the SUMS or DRMS databases to debug problems. It should be fairly clear from the context whether a query should be run on the DRMS database or on the SUMS one. If in doubt, try one and, if the query fails, try the other.

Check sunum_queue size

This checks the size of sunum_queue, the table of sunums waiting to be processed. Ideally this is 0 unless a lot of sunums have come in at once. On a busy system, the value may hover around a few hundred.

select count(*) from sunum_queue;

Check sunum_queue entries older than 1 day

This checks the number of entries in sunum_queue that are older than a day. This should be 0.

select count(*) from sunum_queue where timestamp < now() - interval '1 days';
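If the count above is not 0, it can help to look at the stale entries themselves. The following is a minimal sketch that reuses only the table and the timestamp column from the query above; the limit of 20 is an arbitrary choice to keep the output short.

```sql
-- List the oldest queue entries that are more than a day old.
-- Uses only sunum_queue and its timestamp column, as in the count query;
-- the limit of 20 is arbitrary.
select *
  from sunum_queue
 where timestamp < now() - interval '1 day'
 order by timestamp
 limit 20;
```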

See what partitions SUMS has available

This shows what partitions SUMS has available. The last entry in each row, pds_set_num, should be 0. If it is not, then perhaps the disk is unmounted, or SUMS sees it as having filled up (note that SUMS treats a disk as full slightly before the disk reaches 100% use). You will have to work with sum_rm to clear up some space and then set pds_set_num back to 0.

select * from sum_partn_avail;
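On a system with many partitions, it may be quicker to list only the problem partitions and then reset them once space has been freed. This is a sketch under assumptions: the column name partn_name and the partition path '/SUM1' are illustrative and not taken from this page, so check the output of the query above for the real column name and values before running the update.

```sql
-- Show only the partitions that SUMS currently considers unavailable.
select *
  from sum_partn_avail
 where pds_set_num <> 0;

-- After freeing space with sum_rm, mark one partition as available again.
-- ASSUMPTION: partn_name is the column holding the partition path, and
-- '/SUM1' is a hypothetical partition; substitute your own values.
update sum_partn_avail
   set pds_set_num = 0
 where partn_name = '/SUM1';
```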

Temporal coverage

When data are written to disk, they have an "effective date" - a date after which they can be deleted by sum_rm. This returns the earliest effective date among the data still on disk.

select min(effective_date) from sum_partn_alloc;
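To see how much data shares the earliest effective dates, the same column can be grouped and counted. This is only a sketch: it assumes that effective_date values sort chronologically, which this page does not state, and the limit of 10 is arbitrary.

```sql
-- Count allocations per effective date, earliest first.
-- ASSUMPTION: effective_date sorts in chronological order.
select effective_date, count(*)
  from sum_partn_alloc
 group by effective_date
 order by effective_date
 limit 10;
```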

Slony updates

This shows the time the last Slony update was created and the time it was applied. Both should be very recent, and at the very least fall on the current day.

select * from _jsoc.sl_archive_tracking;
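To avoid comparing timestamps by eye, the elapsed time since the last applied update can be computed in the query. This sketch assumes the column is named at_applied, as in the standard Slony-I schema; confirm the name against the output of the query above.

```sql
-- Show the tracking row plus the time elapsed since the last applied update.
-- ASSUMPTION: at_applied is the "last applied" column (standard Slony-I name).
select *, now() - at_applied as time_since_applied
  from _jsoc.sl_archive_tracking;
```

An interval of more than a day here would correspond to the "not on the current day" condition described above.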

Show data on disk

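This shows the data that are on disk, as the total number of bytes and the number of entries per series, smallest series first. Note that you can be subscribed to a series and yet have no data for it on disk if nothing has triggered retrieval of the data.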

select owning_series, sum(bytes), count(*) from sum_main group by owning_series order by sum(bytes);
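Raw byte counts can be hard to read, so a variant with human-readable sizes and the largest series first may be more convenient. This is a sketch that uses the PostgreSQL built-in pg_size_pretty; the aliases total_size and n_entries and the limit of 20 are illustrative choices.

```sql
-- Largest series first, with sizes formatted as kB/MB/GB/TB.
-- The cast to bigint keeps pg_size_pretty usable on older PostgreSQL
-- versions that do not accept a numeric argument.
select owning_series,
       pg_size_pretty(sum(bytes)::bigint) as total_size,
       count(*) as n_entries
  from sum_main
 group by owning_series
 order by sum(bytes) desc
 limit 20;
```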