Version 7 (modified by joe, 10 years ago)
Managing disk use with sum_rm
sum_rm is the program that ages data off disk in the NetDRMS system. The configuration file for sum_rm resides in the same directory as the log files for the program. A highly commented configuration file for sum_rm is included here, as it pre-dates the wiki and documents most of the relevant points.
It is worth noting that errors like this are sometimes written to the sum_rm log:
Error in SUM_StatOffline no data found on line 76
Err: SUM_StatOffline(844424933554303, ...)
Removing sum_main for ds_index = 844424933554303
Error in DS_SumMainDelete no data found on line 24
**Err: DS_SumMainDelete(844424933554303)
**Err: DS_RmNowX() can't open /SUM02/D844424933554303/Records.txt
This happens when the SUMS database has an entry for a sunum in the sum_partn_alloc table but not in the sum_main table, which generally indicates that a download failed. The error is generally not serious. Failed downloads are usually caused by a transient event that lasts only a short time, so sum_rm should work its way through the data from that period and then get back to deleting data.
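To see which entries were affected, the ds_index values can be pulled out of the log text. The following is a minimal Python sketch, not part of NetDRMS. It assumes the "Removing sum_main for ds_index = ..." message keeps the format shown in the excerpt above, and the function name `removed_ds_indexes` is hypothetical.

```python
import re

# Matches the "Removing sum_main for ds_index = N" message from the log excerpt.
# Assumption: the message format is stable across sum_rm versions.
REMOVING_RE = re.compile(r"Removing sum_main for ds_index = (\d+)")


def removed_ds_indexes(log_text):
    """Return the distinct ds_index values from 'Removing sum_main' messages, in order seen."""
    seen = []
    for match in REMOVING_RE.finditer(log_text):
        ds_index = int(match.group(1))
        if ds_index not in seen:
            seen.append(ds_index)
    return seen


# Sample text copied from the log excerpt in this page.
sample = (
    "Error in SUM_StatOffline no data found on line 76\n"
    "Err: SUM_StatOffline(844424933554303, ...)\n"
    "Removing sum_main for ds_index = 844424933554303\n"
    "Error in DS_SumMainDelete no data found on line 24\n"
    "**Err: DS_SumMainDelete(844424933554303)\n"
)
print(removed_ds_indexes(sample))
```

A long list of distinct values from a short time span would be consistent with a burst of failed downloads, although the log alone does not prove that.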
It's also worth noting that sum_rm is *very* quiet when there are no data for which the effective date is in the past: it does not print anything to indicate that this is the case. The user will have to run this query:
select effective_date from sum_partn_alloc order by effective_date limit 30;
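to ascertain if this may be the case; it lists the 30 earliest effective dates. To count the entries whose effective date has already passed, you can also try: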
select count(*) from sum_partn_alloc where effective_date < to_char( now(), 'YYYYMMDDHH24MI' );
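The comparison in the second query works because the timestamp pattern sorts as text in the same order as it sorts in time. The Python sketch below mirrors that logic outside the database. It assumes effective_date is stored as text in the 'YYYYMMDDHH24MI' layout, as the query implies. The function names and sample values are made up for illustration.

```python
from datetime import datetime

# strftime equivalent of the PostgreSQL to_char pattern 'YYYYMMDDHH24MI'.
EFFECTIVE_DATE_FORMAT = "%Y%m%d%H%M"


def format_effective_date(moment):
    """Format a datetime the way the query formats now()."""
    return moment.strftime(EFFECTIVE_DATE_FORMAT)


def count_expired(effective_dates, now):
    """Count values strictly earlier than 'now', using plain string comparison."""
    cutoff = format_effective_date(now)
    return sum(1 for value in effective_dates if value < cutoff)


# Hypothetical values, not taken from a real SUMS database.
now = datetime(2015, 3, 1, 12, 30)
dates = ["201502281200", "201503011229", "201503011230", "201504010000"]
print(format_effective_date(now))  # 201503011230
print(count_expired(dates, now))   # 2
```

A count of zero corresponds to the quiet case described above, in which sum_rm has nothing it is allowed to delete.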
The configuration file for sum_rm is below :
#
# Configuration file for sum_rm program
#
# The sum_rm program does disk data age off (ie. deletes old data) to stop
# disk partitions from filling up. It is started as
# a service along with the rest of sums. It works as follows.
#
# In the sum_partn_alloc table of the DRMS database,
# there is an 'effective_date' column. After the
# effective_date, the data product may be deleted by sum_rm.
#
# sum_rm runs repeatedly. For each run, sum_rm does the following :
#
# [1] It reads this configuration file (allowing
#     dynamic adjustment of these parameters, ie. you do not
#     need to restart the sum_rm program for a config change
#     to take effect, which is very handy).
# [2] Accesses the database and orders results by effective_date.
# [3] Starts deleting products from disk if the effective_date is in the past,
#     until one of three conditions is met :
#     (a) The disk has at least PART_PERCENT_FREE percent
#         of free space on it (see below), or
#     (b) There are no more cases in which effective_date is in the past, or
#     (c) At least 600 deletions have taken place (this is defined in
#         the LIMIT statement in the file SUMLIB_RmDoX.pgc).
# [4] Sleeps for the time defined by the SLEEP parameter (see below) before the next run,
#     then goes back to [1] for the next run.
#
############# sum_rm parameters ##########################
#
# Number of seconds to sleep between runs of sum_rm
SLEEP=60
#
# Integer percent of each disk partition to be kept free. Applies to all partitions.
# Defaults to 3 if not specified. Setting PART_PERCENT_FREE=5 will allow
# partitions to fill to 95%. Note that the math the 'df' command uses
# tends to round up, but dividing the number of used blocks by the total
# number of blocks should give the result specified by PART_PERCENT_FREE.
#
# NOTE : This parameter used to be specified as MAX_FREE_0, MAX_FREE_1...
# ...MAX_FREE_n, which specified the number of free MB on different partitions,
# but this is no longer the case and specifying
# MAX_FREE_{n} will have no effect as of NetDRMS 7.0. If
# MAX_FREE_{n} is specified and PART_PERCENT_FREE is not,
# then the default PART_PERCENT_FREE=3 will be used and
# disk partitions will fill to 97%.
PART_PERCENT_FREE=4
#
# Log file (only opened at startup and pid gets appended to this name)
LOG=/usr/local/logs/SUM/sum_rm.log
#
# Who to email when there's a notable problem
MAIL=webmaster@mydomain.edu
#
# To prevent sum_rm from doing anything set NOOP to non-0.
NOOP=0
#
# sum_rm can only be enabled for a single user as defined by USER.
USER=sumsUser
#
# Don't run sum_rm between these NORUN hours of the day (0-23).
# Comment out to ignore, or set them both to the same hour.
# The NORUN_STOP must be >= NORUN_START.
#
# Don't run when the hour first hits NORUN_START
NORUN_START=7
#
# Start running again when the hour first hits NORUN_STOP
NORUN_STOP=7
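The Python sketch below makes the configuration semantics concrete by showing one way to read the KEY=VALUE format and apply two of the parameters. It is not how sum_rm itself is implemented. The function names are hypothetical. Treating the NORUN window as the half-open range from NORUN_START up to but not including NORUN_STOP is an assumption based on the comments in the file.

```python
def parse_sum_rm_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


def needs_cleanup(used_blocks, total_blocks, settings):
    """True if the partition has less than PART_PERCENT_FREE percent free (default 3)."""
    target_free = int(settings.get("PART_PERCENT_FREE", 3))
    free_percent = 100.0 * (total_blocks - used_blocks) / total_blocks
    return free_percent < target_free


def in_norun_window(hour, settings):
    """Assumed semantics: paused while NORUN_START <= hour < NORUN_STOP."""
    if "NORUN_START" not in settings or "NORUN_STOP" not in settings:
        return False  # both lines commented out: never paused
    return int(settings["NORUN_START"]) <= hour < int(settings["NORUN_STOP"])


# A shortened copy of the configuration file shown above.
sample = """\
# Number of seconds to sleep between runs of sum_rm
SLEEP=60
PART_PERCENT_FREE=4
NORUN_START=7
NORUN_STOP=7
"""
settings = parse_sum_rm_config(sample)
print(settings["SLEEP"])                   # 60
print(needs_cleanup(970, 1000, settings))  # True: only 3% free, 4% wanted
print(needs_cleanup(950, 1000, settings))  # False: 5% free
print(in_norun_window(7, settings))        # False: start == stop disables the window
```

The free-space check divides block counts directly rather than reading the percentage printed by df, following the note in the file that df tends to round up.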
If you are setting SUMS up on a test server that doesn't have much disk space, the following trigger sets the effective_date of every new sum_partn_alloc entry to six hours after insertion, which keeps it low enough for sum_rm to clean up:
CREATE OR REPLACE FUNCTION no_retention ( ) RETURNS TRIGGER LANGUAGE PLPGSQL AS $$
BEGIN
  NEW.effective_date := to_char( (now()+INTERVAL '6 hours'), 'YYYYMMDDHH24MI' );
  RETURN NEW;
END
$$;
CREATE TRIGGER no_retention BEFORE INSERT ON sum_partn_alloc
FOR EACH ROW EXECUTE PROCEDURE no_retention();
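To check what value the trigger writes for a given insertion time, the same arithmetic can be reproduced in Python. This is only an illustration of the computation and does not interact with SUMS. The function name and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def no_retention_effective_date(inserted_at, hours=6):
    """Mirror the trigger: insertion time plus 6 hours, formatted as YYYYMMDDHH24MI."""
    return (inserted_at + timedelta(hours=hours)).strftime("%Y%m%d%H%M")


# A row inserted at 20:15 on 1 March becomes eligible for deletion at 02:15 on 2 March.
print(no_retention_effective_date(datetime(2015, 3, 1, 20, 15)))  # 201503020215
```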