| Version 7 (modified by joe, 11 years ago) (diff) |
|---|
Managing disk use with sum_rm
sum_rm is the program that ages data off disk in the NetDRMS system. The configuration file for sum_rm resides in the same directory as the log files for the program. A highly commented configuration file for sum_rm is included here, as it pre-dates the wiki and documents most of the relevant points.
Errors like the following are sometimes written to the sum_rm log:
Error in SUM_StatOffline no data found on line 76
Err: SUM_StatOffline(844424933554303, ...)
Removing sum_main for ds_index = 844424933554303
Error in DS_SumMainDelete no data found on line 24
**Err: DS_SumMainDelete(844424933554303)
**Err: DS_RmNowX() can't open /SUM02/D844424933554303/Records.txt
This happens when the SUMS database has an entry for a sunum in the sum_partn_alloc table but not in the sum_main table, which generally indicates that a download failed. The error is generally not serious. If the failed downloads were caused by a transient event that only lasted a short time, sum_rm should work its way through the data from that period and get back to deleting data soon.
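The mismatch described above can be looked for with an anti-join between the two tables. The following is only a sketch: it assumes that both sum_partn_alloc and sum_main carry a ds_index column (the log output suggests this for sum_main, but check your schema), and it uses an in-memory SQLite database with made-up rows as a stand-in for the real PostgreSQL SUMS database. The helper names `find_orphans` and `demo_connection` are hypothetical.

```python
import sqlite3


def find_orphans(conn):
    """Return ds_index values present in sum_partn_alloc but absent from sum_main."""
    rows = conn.execute(
        """
        SELECT a.ds_index
        FROM sum_partn_alloc AS a
        LEFT JOIN sum_main AS m ON m.ds_index = a.ds_index
        WHERE m.ds_index IS NULL
        ORDER BY a.ds_index
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection():
    """Build a tiny stand-in for the two SUMS tables (assumed columns, made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sum_main (ds_index INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE sum_partn_alloc (ds_index INTEGER, effective_date TEXT)")
    conn.executemany("INSERT INTO sum_main VALUES (?)", [(101,), (102,)])
    conn.executemany(
        "INSERT INTO sum_partn_alloc VALUES (?, ?)",
        [(101, "201301010000"), (102, "201301020000"), (103, "201301030000")],
    )
    return conn


# ds_index 103 has an allocation row but no sum_main row.
print(find_orphans(demo_connection()))
```

The SELECT statement itself is plain SQL and should run unchanged in psql against the SUMS database, provided the column assumption holds.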
sum_rm is also *very* quiet when there are no data whose effective date is in the past: it prints nothing to indicate that this is the case. You will have to run this query:
select effective_date from sum_partn_alloc order by effective_date limit 30;
to check whether this is the case. To count the entries whose effective date has already passed, you can also try:
select count(*) from sum_partn_alloc where effective_date < to_char( now(), 'YYYYMMDDHH24MI' );
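The count query above compares effective_date with a formatted timestamp as text. That works because 'YYYYMMDDHH24MI' yields a fixed-width, zero-padded string with the most significant field first, so string order matches time order. A small sketch of the same comparison, assuming effective_date is stored as text in that format (the helper names are hypothetical):

```python
from datetime import datetime

# strftime equivalent of PostgreSQL's 'YYYYMMDDHH24MI' template
FMT = "%Y%m%d%H%M"


def cutoff_string(now):
    """Format a datetime the way to_char(now(), 'YYYYMMDDHH24MI') would."""
    return now.strftime(FMT)


def is_expired(effective_date, now):
    """True when effective_date sorts strictly before the current time string."""
    return effective_date < cutoff_string(now)


now = datetime(2013, 6, 15, 12, 30)
print(cutoff_string(now))               # 201306151230
print(is_expired("201306151229", now))  # True
print(is_expired("201401010000", now))  # False
```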
The configuration file for sum_rm is below:
#
# Configuration file for sum_rm program
#
# The sum_rm program does disk data age off (ie. deletes old data) to stop
# disk partitions from filling up. It is started as
# a service along with the rest of sums. It works as follows.
#
# In the sum_partn_alloc table of the DRMS database,
# there is an 'effective_date' column. After the
# effective_date, the data product may be deleted by sum_rm.
#
# sum_rm runs repeatedly. For each run, sum_rm does the following :
#
# [1] It reads this configuration file (allowing
# dynamic adjustment of these parameters, ie. you do not
# need to restart the sum_rm program for a config change
# to take effect, which is very handy).
# [2] Accesses the database and orders results by effective_date.
# [3] Starts deleting products from disk if the effective_date is in the past,
# until one of three conditions is met :
# (a) The disk has at least PART_PERCENT_FREE percent
# of free space on it (see below), or
# (b) There are no more cases in which effective_date is in the past, or
# (c) At least 600 deletions have taken place (this is defined in
# the LIMIT statement in the file SUMLIB_RmDoX.pgc).
# [4] Sleeps for the time defined by the SLEEP parameter (see below) before the next run,
# then goes back to [1] for the next run.
#
############# sum_rm parameters ##########################
#
# Number of seconds to sleep between runs of sum_rm
SLEEP=60
#
# Integer percent of each disk partition to be kept free. Applies to all partitions.
# Defaults to 3 if not specified. Setting PART_PERCENT_FREE=5 will allow
# partitions to fill to 95%. Note that the math the 'df' command uses
# tends to round up, but dividing the number of used blocks by the total
# number of blocks should give the result specified by PART_PERCENT_FREE.
#
# NOTE : This parameter used to be specified as MAX_FREE_0, MAX_FREE_1...
# ...MAX_FREE_n, which specified the number of free MB on different partitions,
# but this is no longer the case and specifying
# MAX_FREE_{n} will have no effect as of NetDRMS 7.0. If
# MAX_FREE_{n} is specified and PART_PERCENT_FREE is not,
# then the default PART_PERCENT_FREE=3 will be used and
# disk partitions will fill to 97%.
PART_PERCENT_FREE=4
#
# Log file (only opened at startup and pid gets appended to this name)
LOG=/usr/local/logs/SUM/sum_rm.log
#
# Who to email when there's a notable problem
MAIL=webmaster@mydomain.edu
#
# To prevent sum_rm from doing anything set NOOP to non-0.
NOOP=0
#
# sum_rm can only be enabled for a single user as defined by USER.
USER=sumsUser
#
# Don't run sum_rm between these NORUN hours of the day (0-23).
# Comment out to ignore, or set them both to the same hour.
# The NORUN_STOP must be >= NORUN_START.
#
# Don't run when the hour first hits NORUN_START
NORUN_START=7
#
# Start running again when the hour first hits NORUN_STOP
NORUN_STOP=7
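The run cycle and parameters described in the configuration file above can be modelled roughly as follows. This is an illustrative sketch, not the sum_rm implementation: it handles a single partition, represents each deletion as freeing a fixed `gain_per_delete` percent of the disk (a made-up simplification), and treats the NORUN window as the half-open range from NORUN_START up to but not including NORUN_STOP, which is one reading of the comments. All function names are hypothetical.

```python
def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def in_norun_window(hour, params):
    """True when hour (0-23) is at or past NORUN_START and before NORUN_STOP."""
    if "NORUN_START" not in params or "NORUN_STOP" not in params:
        return False  # commented out: no restriction
    start, stop = int(params["NORUN_START"]), int(params["NORUN_STOP"])
    return start <= hour < stop  # equal values give an empty window


def one_run(rows, now_str, percent_free, params, gain_per_delete=1.0, max_deletions=600):
    """Simulate one pass; return (deleted ds_index values, resulting percent free)."""
    if int(params.get("NOOP", 0)) != 0:
        return [], percent_free
    target = int(params.get("PART_PERCENT_FREE", 3))  # default of 3 per the comments
    deleted = []
    for effective_date, ds_index in sorted(rows):  # step [2]: order by effective_date
        if percent_free >= target:                 # condition (a)
            break
        if effective_date >= now_str:              # condition (b)
            break
        if len(deleted) >= max_deletions:          # condition (c)
            break
        deleted.append(ds_index)
        percent_free += gain_per_delete
    return deleted, percent_free


SAMPLE = """\
# Number of seconds to sleep between runs of sum_rm
SLEEP=60
PART_PERCENT_FREE=4
NOOP=0
NORUN_START=7
NORUN_STOP=7
"""

params = parse_config(SAMPLE)
rows = [("201301030000", 3), ("201301010000", 1),
        ("201301020000", 2), ("209901010000", 4)]
# Starting at 2% free with a 4% target, two deletions are enough.
print(one_run(rows, "201306010000", 2.0, params))
```

In the real program, step [1] rereads the file on every run, so a sketch of the outer loop would call `parse_config` each time before sleeping for SLEEP seconds.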
If you are setting SUMS up on a test server that doesn't have much disk space, the following trigger can be used to set effective_date to six hours after insertion for every new row in sum_partn_alloc, keeping it low enough that sum_rm can clean up:
CREATE OR REPLACE FUNCTION no_retention ( ) RETURNS TRIGGER LANGUAGE PLPGSQL AS $$
BEGIN
  NEW.effective_date := to_char( (now()+INTERVAL '6 hours'), 'YYYYMMDDHH24MI' );
  RETURN NEW;
END
$$;
CREATE TRIGGER no_retention BEFORE INSERT ON sum_partn_alloc
  FOR EACH ROW EXECUTE PROCEDURE no_retention();
