Ticket #336 (closed enhancement: fixed)

Opened 10 days ago

Last modified 9 days ago

Server monitoring

Reported by: jacob Owned by: jacob
Priority: normal Milestone: Continuing Development
Component: Website Version: 1.4
Severity: major Keywords: monitoring, server, outage, downtime, offline
Cc:

Description

I wrote a brief Python script that checks a couple of sample URLs on many of our servers. The script runs from a cron job every thirty minutes, and its results are published on the VSO website at https://vso.nascom.nasa.gov/new/vso_web/VSO_Status.html

I'm also working on folding this into my monitoring system on vso1, and then changing that script so that it emails vsotechs instead of just me. I'll make that change as soon as I'm certain that nobody else will be getting an email every thirty minutes about something they have no power to fix.
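For readers who want a picture of what such a check might look like, here is a minimal sketch. It is not the script from this ticket: the function names, the server-to-URL mapping, and the rule that a server counts as "up" only when every sample URL returns HTTP 200 are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (404, 500, ...).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def check_servers(sample_urls, fetch=fetch_status):
    """Map each server name to "up" or "down".

    sample_urls is a hypothetical dict of server name -> list of URLs.
    A server is "up" only if every one of its sample URLs returns 200.
    """
    report = {}
    for server, urls in sample_urls.items():
        ok = all(fetch(url) == 200 for url in urls)
        report[server] = "up" if ok else "down"
    return report


def format_report(report):
    """Render the report as one "server: state" line per server, sorted."""
    return "\n".join(
        f"{server}: {state}" for server, state in sorted(report.items())
    )
```

Passing the fetch function as a parameter keeps the check testable without network access; a cron job would call check_servers with the real server list and write format_report's output wherever the status page reads it.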

Change History

comment:1 Changed 9 days ago by jacob

  • Status changed from new to closed
  • Resolution set to fixed

Testing of the monitoring scripts went well, so I'm going to change the target email to help@…. The scripts run from a cron job every 30 minutes, and an email only goes out if the message has changed since the last one, for example if you fix something or something else breaks. I'll be updating the readme for these files, which are centralized on vso1 in /usr/local/scripts/sys_test/
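The "only email on change" behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the code in sys_test: the state-file approach, the function names, the sender address, and the local SMTP host are all hypothetical.

```python
import os
import smtplib
from email.message import EmailMessage


def send_email(body, recipient, sender="monitor@localhost", host="localhost"):
    """Send body as a plain-text email via an SMTP server (assumed local)."""
    msg = EmailMessage()
    msg["Subject"] = "Server status changed"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)


def notify_if_changed(message, state_path, send):
    """Call send(message) only if message differs from the last one sent.

    The previous message is kept in the file at state_path (a hypothetical
    location). Returns True if a notification went out, False otherwise.
    """
    previous = None
    if os.path.exists(state_path):
        with open(state_path, encoding="utf-8") as fh:
            previous = fh.read()

    if message == previous:
        return False

    send(message)
    # Record the message only after sending, so a failed send is retried
    # on the next cron run.
    with open(state_path, "w", encoding="utf-8") as fh:
        fh.write(message)
    return True
```

With this arrangement the very first run always sends, since there is no stored message to compare against. After that, a thirty-minute cron run stays silent until a server goes down or comes back.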

comment:2 Changed 9 days ago by jacob

Added the monitoring script code to CVS under vso/diagnostics/sys_test
