Changes between Initial Version and Version 1 of jmdDebug


Timestamp:
02/24/15 10:35:32 (10 years ago)
Author:
joe
Comment:

partial, just in case I dump.

= Debugging the JMD =

Every once in a while, things get stuck in the JMD.  Here are some tips for figuring out what's going wrong.

**Note**: Joe H. wrote up some notes a while back on the queues, the sum service, etc.

== Restarting the JMD ==

== Sniffing Incoming Requests ==

When nothing else works, we can try monitoring the traffic to make sure that the JMD processes are all happy.

To make sure that user requests are getting from DRMS to the JMD, we can monitor the traffic on port 8080.  You'll need to be on the same machine as the JMD, because the port is only open to localhost, not to the world.  First, install wireshark, then run the following (with `sudo` if your account isn't allowed to capture on its own):

{{{
tshark -i lo -f 'tcp port 8080' -d tcp.port==8080,http | grep '/JMD'
}}}

Then issue a request via the VSO IDL client, making sure to specify 'site' on the vso_get call.

You should see something like:

{{{
[oneiros@sdo4 scripts]$ sudo tshark -i lo -f 'tcp port 8080' -d tcp.port==8080,http | grep '/JMD'
Running as user "root" and group "root". This could be dangerous.
Capturing on lo
  0.000176    127.0.0.1 -> 127.0.0.1    HTTP 172 GET /JMD/JMD?type=query&sessionid=baa20705-b50b-4ac6-a7b8-b099e5f2850a HTTP/1.1
  2.875262    127.0.0.1 -> 127.0.0.1    HTTP 172 GET /JMD/JMD?type=query&sessionid=c3aa5a5e-95b7-45b0-9de4-df85fa56d4c7 HTTP/1.1
  9.370789    127.0.0.1 -> 127.0.0.1    HTTP 231 POST /JMD/JMD HTTP/1.1  (application/x-www-form-urlencoded)
 10.012264    127.0.0.1 -> 127.0.0.1    HTTP 172 GET /JMD/JMD?type=query&sessionid=baa20705-b50b-4ac6-a7b8-b099e5f2850a HTTP/1.1
 10.820344    127.0.0.1 -> 127.0.0.1    HTTP 172 GET /JMD/JMD?type=query&sessionid=a4d74aab-2017-47a8-8014-b1476de52f6c HTTP/1.1
 12.887373    127.0.0.1 -> 127.0.0.1    HTTP 172 GET /JMD/JMD?type=query&sessionid=c3aa5a5e-95b7-45b0-9de4-df85fa56d4c7 HTTP/1.1
 20.023336    127.0.0.1 -> 127.0.0.1    HTTP 172 GET /JMD/JMD?type=query&sessionid=baa20705-b50b-4ac6-a7b8-b099e5f2850a HTTP/1.1
 20.829574    127.0.0.1 -> 127.0.0.1    HTTP 172 GET /JMD/JMD?type=query&sessionid=a4d74aab-2017-47a8-8014-b1476de52f6c HTTP/1.1
}}}

The POST is the initial request, and the subsequent GET requests poll to see whether the retrieval has completed.  (Note that there were multiple requests running concurrently in this example.)  If you see bogus characters in the sessionid rather than a valid UUID, you need to upgrade your JMD; that bug has since been fixed.

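Eyeballing session IDs in a busy capture is error-prone, so here is a minimal sketch of a filter that only prints the suspicious lines.  The function name `bad_sessionids` is made up for this page, and it assumes the one-line-per-packet tshark output format shown above and the usual 8-4-4-4-12 hex UUID layout:

```shell
#!/bin/sh
# Hypothetical helper (not part of the JMD): read tshark summary lines on
# stdin and print every line whose sessionid is NOT a well-formed UUID.
bad_sessionids() {
    # Canonical UUID text form: 8-4-4-4-12 hexadecimal digits.
    uuid='[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
    # Keep only lines that carry a sessionid, then drop the ones where the
    # value is a UUID followed by a space, '&', or end of line.
    grep -F 'sessionid=' | grep -Ev "sessionid=${uuid}( |&|\$)"
}

# Example use on the JMD host (tshark's -l flushes output per packet):
#   sudo tshark -l -i lo -f 'tcp port 8080' -d tcp.port==8080,http | bad_sessionids
```

If this prints nothing while requests are flowing, the session IDs look sane; anything it does print is worth a closer look.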
== Sniffing Inter-server Communications ==

When things are 'stuck' in the JMD's queue, we can also monitor the traffic between NetDRMS nodes to make sure that we're getting good messages.

First, we'll need to identify the name of our external interface using either `ifconfig` or `ip`:

{{{
[oneiros@sdo4 scripts]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether bc:30:5b:ee:fb:48 brd ff:ff:ff:ff:ff:ff
    inet 198.118.[CENSORED]/24 brd 198.118.248.255 scope global em1
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether bc:30:5b:ee:fb:49 brd ff:ff:ff:ff:ff:ff
4: em3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether bc:30:5b:ee:fb:4a brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether bc:30:5b:ee:fb:4b brd ff:ff:ff:ff:ff:ff
[oneiros@sdo4 scripts]$ ifconfig -a
em1       Link encap:Ethernet  HWaddr BC:30:5B:EE:FB:48
          inet addr:198.118.[CENSORED]  Bcast:198.118.248.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:25800945176 errors:0 dropped:6701 overruns:95689 frame:0
          TX packets:19038424359 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:23062736230177 (20.9 TiB)  TX bytes:23905987935940 (21.7 TiB)
          Memory:dcb00000-dcbfffff
...
}}}

In this case, our external interface is 'em1', so that's the interface to sniff.  The partially censored IP is that interface's address.  Unfortunately, because we accept connections on port 80 and we're also trying to monitor outbound traffic to port 80, it's a bit messy: the filter has to match both the destination port and our own source address.  Also, this won't show us the response payload:

{{{
[oneiros@sdo4 scripts]$ sudo tshark -i em1 -f 'tcp dst port 80 and src 198.118.[CENSORED]' -d tcp.port==80,http | grep HTTP
Running as user "root" and group "root". This could be dangerous.
Capturing on em1
  0.013785 198.118.[CENSORED] -> 131.142.[CENSORED] HTTP 166 POST /cgi-bin/VSO/DRMS/vso_jsoc_fetch.cgi HTTP/1.1  (application/x-www-form-urlencoded)
  0.313581 198.118.[CENSORED] -> 140.252.[CENSORED] HTTP 166 POST /cgi-bin/VSO/vso_jsoc_fetch.cgi HTTP/1.1  (application/x-www-form-urlencoded)
  0.686565 198.118.[CENSORED] -> 171.64.[CENSORED] HTTP 166 POST /cgi-bin/ajax/jsoc_fetch_VSO HTTP/1.1  (application/x-www-form-urlencoded)
  2.138692 198.118.[CENSORED] -> 131.142.[CENSORED] HTTP 166 POST /cgi-bin/VSO/DRMS/vso_jsoc_fetch.cgi HTTP/1.1  (application/x-www-form-urlencoded)
  2.439454 198.118.[CENSORED] -> 140.252.[CENSORED] HTTP 166 POST /cgi-bin/VSO/vso_jsoc_fetch.cgi HTTP/1.1  (application/x-www-form-urlencoded)
  2.807359 198.118.[CENSORED] -> 171.64.[CENSORED] HTTP 166 POST /cgi-bin/ajax/jsoc_fetch_VSO HTTP/1.1  (application/x-www-form-urlencoded)
 11.453624 198.118.[CENSORED] -> 131.142.[CENSORED] HTTP 166 POST /cgi-bin/VSO/DRMS/vso_jsoc_fetch.cgi HTTP/1.1  (application/x-www-form-urlencoded)
 11.753938 198.118.[CENSORED] -> 140.252.[CENSORED] HTTP 166 POST /cgi-bin/VSO/vso_jsoc_fetch.cgi HTTP/1.1  (application/x-www-form-urlencoded)
 12.118593 198.118.[CENSORED] -> 171.64.[CENSORED] HTTP 166 POST /cgi-bin/ajax/jsoc_fetch_VSO HTTP/1.1  (application/x-www-form-urlencoded)
 13.580818 198.118.[CENSORED] -> 131.142.[CENSORED] HTTP 166 POST /cgi-bin/VSO/DRMS/vso_jsoc_fetch.cgi HTTP/1.1  (application/x-www-form-urlencoded)
 13.878723 198.118.[CENSORED] -> 140.252.[CENSORED] HTTP 166 POST /cgi-bin/VSO/vso_jsoc_fetch.cgi HTTP/1.1  (application/x-www-form-urlencoded)
}}}

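Looking up the interface and typing the address into the filter by hand gets tedious, so the two steps above could be scripted roughly as follows.  This is only a sketch: the function names are invented for this page, it assumes an `ip` command that supports the one-line `-o` output, and it simply takes the first global IPv4 address it finds, which may not be the right one on a multi-homed host:

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of NetDRMS or the JMD.

# Read `ip -o -4 addr show scope global` output on stdin and print
# "<interface> <address>" for the first global IPv4 address found.
first_global_addr() {
    # One-line format: "2: em1    inet 192.0.2.10/24 brd ... scope global em1"
    awk '$3 == "inet" { sub(/\/.*/, "", $4); print $2, $4; exit }'
}

# Build the capture filter used above: traffic to a given TCP port that
# originates from our own address.
# $1 = our address, $2 = destination port (defaults to 80)
outbound_filter() {
    printf 'tcp dst port %s and src %s\n' "${2:-80}" "$1"
}

# Example use on the JMD host:
#   set -- $(ip -o -4 addr show scope global | first_global_addr)
#   sudo tshark -i "$1" -f "$(outbound_filter "$2")" -d tcp.port==80,http | grep HTTP
```

Keeping the filter in its own function makes it easy to reuse the same expression for a capture-to-file run later.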
Actually sniffing the contents of the connection requires more work.  You might be able to use the full wireshark command if you can get an X console to the machine, or you can dump the traffic and then move the dump to another machine to analyze:

{{{

}}}
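
The dump-and-analyze route might look something like the sketch below.  None of this comes from the original notes: the function names, the 60-second limit and the file name are assumptions, and the `-Y` display-filter option needs a reasonably recent tshark (some older releases spelled it `-R`).  Matching on the remote host rather than on our own source address captures both directions, so the response payload ends up in the dump too:

```shell
#!/bin/sh
# Sketch only: names, the 60-second limit and file locations are assumptions.

# Capture filter for both directions of the HTTP conversation with one
# remote NetDRMS node.  $1 = remote host or address
conversation_filter() {
    printf 'tcp port 80 and host %s\n' "$1"
}

# On the JMD host: write raw packets to a file, stopping after 60 seconds.
# Needs capture privileges.  $1 = interface, $2 = capture filter, $3 = output file
dump_traffic() {
    tshark -i "$1" -f "$2" -a duration:60 -w "$3"
}

# On the analysis machine: print the HTTP packets from the dump in full detail.
# $1 = capture file
show_http() {
    tshark -r "$1" -d tcp.port==80,http -Y http -V
}

# Example use (REMOTE is a placeholder for the node you are debugging):
#   sudo sh -c '. ./thisfile.sh; dump_traffic em1 "$(conversation_filter REMOTE)" /tmp/jmd.pcap'
#   scp /tmp/jmd.pcap analysis-host:    # then, on analysis-host:
#   show_http jmd.pcap | less
```

The resulting file is a regular capture file, so it can also be opened in the wireshark GUI on the other machine instead of using `show_http`.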