No, there aren't.
All attempts come from one IP (logs from iLO):
123030 Informational iLO 4 05/03/2017 18:16 05/03/2017 18:16 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
123029 Informational iLO 4 05/03/2017 18:16 05/03/2017 18:16 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
123028 Informational iLO 4 05/03/2017 18:05 05/03/2017 18:05 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
123027 Informational iLO 4 05/03/2017 18:05 05/03/2017 18:05 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
123022 Informational iLO 4 05/03/2017 17:54 05/03/2017 17:54 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
123021 Informational iLO 4 05/03/2017 17:54 05/03/2017 17:54 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
123020 Informational iLO 4 05/03/2017 17:49 05/03/2017 17:49 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
123019 Informational iLO 4 05/03/2017 17:49 05/03/2017 17:49 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
123016 Informational iLO 4 05/03/2017 17:40 05/03/2017 17:40 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
123015 Informational iLO 4 05/03/2017 17:40 05/03/2017 17:40 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
123010 Informational iLO 4 05/03/2017 17:26 05/03/2017 17:26 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
123009 Informational iLO 4 05/03/2017 17:26 05/03/2017 17:26 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
122994 Informational iLO 4 05/03/2017 15:56 05/03/2017 15:56 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
122993 Informational iLO 4 05/03/2017 15:56 05/03/2017 15:56 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
122990 Informational iLO 4 05/03/2017 15:44 05/03/2017 15:44 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
122989 Informational iLO 4 05/03/2017 15:44 05/03/2017 15:44 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
122988 Informational iLO 4 05/03/2017 15:36 05/03/2017 15:36 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
122987 Informational iLO 4 05/03/2017 15:36 05/03/2017 15:36 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
122986 Informational iLO 4 05/03/2017 15:30 05/03/2017 15:30 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
122985 Informational iLO 4 05/03/2017 15:30 05/03/2017 15:30 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
122980 Informational iLO 4 05/03/2017 15:15 05/03/2017 15:15 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
122979 Informational iLO 4 05/03/2017 15:15 05/03/2017 15:15 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
122974 Informational iLO 4 05/03/2017 15:06 05/03/2017 15:06 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
122973 Informational iLO 4 05/03/2017 15:06 05/03/2017 15:06 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
122970 Informational iLO 4 05/03/2017 14:52 05/03/2017 14:52 1 IPMI/RMCP logout: root - 10.10.114.30(xcat-sn1.mlan).
122967 Informational iLO 4 05/03/2017 14:52 05/03/2017 14:52 1 IPMI/RMCP login by root - 10.10.114.30(xcat-sn1.mlan).
On 3 May 2017 at 19:19:26, Jarrod Johnson (***@lenovo.com) wrote:
Hmm, and there isn't anything like conserver or another confluent trying to run at the same time to the same node?

From: banuchka [mailto:***@gmail.com]
Sent: Wednesday, May 03, 2017 2:10 PM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

Hi,

One more strange thing about confluent:
May  3 12:57:28 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 12:57:26 console connected]
May  3 13:02:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:02:06 console disconnected]
May  3 13:10:32 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:10:30 console connected]
May  3 13:12:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:12:06 console disconnected]
May  3 13:21:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:21:06 console connected]
May  3 13:22:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:22:03 console disconnected]
May  3 13:26:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:26:00 console connected]
May  3 13:32:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:32:03 console disconnected]
May  3 13:33:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:33:15 console connected]
May  3 14:22:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:22:00 console disconnected]
May  3 14:23:11 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:23:09 console connected]
May  3 14:32:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:32:00 console disconnected]
May  3 14:39:44 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:39:42 console connected]
May  3 14:52:07 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:52:05 console disconnected]
May  3 14:52:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:52:15 console connected]
May  3 15:02:15 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:02:13 console disconnected]
May  3 15:06:40 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:06:38 console connected]
May  3 15:12:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:12:15 console disconnected]
May  3 15:15:30 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:15:28 console connected]
May  3 15:22:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:22:15 console disconnected]
May  3 15:30:28 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:30:26 console connected]
May  3 15:32:21 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:32:19 console disconnected]
May  3 15:36:42 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:36:40 console connected]
May  3 15:41:59 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:41:57 console disconnected]
May  3 15:45:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:45:15 console connected]
May  3 15:51:59 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:51:57 console disconnected]
May  3 15:57:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:57:03 console connected]
May  3 17:22:12 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:22:10 console disconnected]
May  3 17:26:38 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:26:36 console connected]
May  3 17:32:15 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:32:13 console disconnected]
May  3 17:41:26 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:41:24 console connected]
May  3 17:42:01 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:41:59 console disconnected]
May  3 17:49:32 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:49:30 console connected]
May  3 17:52:07 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:05 console disconnected]
May  3 17:52:42 xcat-sn1.mlan confluent[4102]: audit :May 03 17:52:40 {"operation": "start", "allowed": true, "target": "/nodes/unreg25/console/session", "user": "xcat_console"}
May  3 17:52:42 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:40 connection by xcat_console]
May  3 17:52:45 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:43 console disconnected]
May  3 17:56:09 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:56:07 console connected]

It isn't a Dell BMC…

I think I've written about that behaviour here before, anyway. The times here are so random that it doesn't look like a timeout issue somewhere.
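
One way to check the "random times" impression is to measure how long each session lasted between a "console connected" and the next "console disconnected" event. The following is only a minimal sketch, assuming syslog lines in the format pasted above; the function name session_lengths and the default year are illustrative and not part of confluent.

```python
import re
from datetime import datetime

# Matches the bracketed confluent event, e.g. "[05/03 12:57:26 console connected]".
EVENT_RE = re.compile(
    r"\[(\d\d/\d\d \d\d:\d\d:\d\d) console (connected|disconnected)\]")


def session_lengths(lines, year=2017):
    """Return the length in minutes of each connected -> disconnected span.

    The log carries no year, so one is supplied (assumption: all events
    fall within the same year).
    """
    lengths = []
    started = None
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue  # audit lines, "connection by ..." lines, etc.
        when = datetime.strptime("%d/%s" % (year, match.group(1)),
                                 "%Y/%m/%d %H:%M:%S")
        if match.group(2) == "connected":
            started = when
        elif started is not None:
            lengths.append((when - started).total_seconds() / 60.0)
            started = None
    return lengths
```

Feeding it the lines above (for example via open(path).readlines()) gives one duration per session; widely scattered values would support the view that this is not a fixed timeout.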

I need some advice before rolling back :) Thanks

On 14 April 2017 at 20:59:04, Jarrod Johnson (***@lenovo.com) wrote:
Yeah, there will be a big push in the coming weeks; it will have at least an "events" log along with a lot more function.

Then some more fleshed out documentation (beyond the preliminary stuff on hpc.lenovo.com).

Let me know if the firmware exploration works out. That particular change line suggests firmware upgrades, but it is possible they could have some high BMC CPU usage that could manifest in such a way. The "works with ipmitool" though has me scratching my head.

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 2:54 PM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

The last idea doesn't work for me. The idea itself works as intended, by the way: confluent does disconnect/reconnect after the constant interval. But for now it is 100% correct to say that it is a problem with the iDRAC firmware.
From the release notes for the latest firmware:
===
- Fix for occasional iDRAC unresponsiveness caused by upgrades via Firmware RACADM or
have an active SOL or SSH sessions while firmware upgrade is in progress.
===
I'm not sure, but maybe it's something like what I have here. So I did the upgrade on a few hosts and will give them plenty of time to show me results.
Thanks for your answers, help and time… it is a very interesting quest :)

A bit more about Confluent:
- Interesting ambitions
- Python vs. Perl, that's good
- I think log files (not just trace, stderr, stdout) and documentation (the source on GitHub is the best doc I know, but…) are things that I would like to see in Confluent

On 14 April 2017 at 19:27:20, Jarrod Johnson (***@lenovo.com) wrote:
Very interested in the outcome. And thank you for working through it. Also interested what you have liked, would like, and have disliked about confluent.

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 12:01 PM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

Thank you Jarrod, I'll try to add the patch and let you know afterwards. I hope 90 minutes is enough, yes.

On 14 April 2017 at 16:57:24, Jarrod Johnson (***@lenovo.com) wrote:
Hmm, this is going to be very difficult to root cause (I only have Lenovo equipment as one might expect).

I'm loath to do a workaround, but in console.py (find /usr -name console.py), it might be interesting to see how a change like the following behaves:
diff --git a/pyghmi/ipmi/console.py b/pyghmi/ipmi/console.py
index 95e8551..a5f6062 100644
--- a/pyghmi/ipmi/console.py
+++ b/pyghmi/ipmi/console.py
@@ -42,6 +42,7 @@ class Console(object):
     def __init__(self, bmc, userid, password,
                  iohandler, port=623,
                  force=False, kg=None):
+        self.keepalivecount = 0
         self.keepaliveid = None
         self.connected = False
         self.broken = False
@@ -70,6 +71,7 @@ class Console(object):
         if 'error' in response:
             self._print_error(response['error'])
             return
+        self.keepalivecount = 0
         #Send activate sol payload directive
         #netfn= 6 (application)
         #command = 0x48 (activate payload)
@@ -150,11 +152,12 @@ class Console(object):
             return
         currowner = struct.unpack(
             "<I", struct.pack('4B', *response['data'][:4]))
-        if currowner[0] != self.ipmi_session.sessionid:
+        if currowner[0] != self.ipmi_session.sessionid or self.keepalivecount > 180:
             # the session is deactivated or active for something else
             self.activated = False
             self._print_error('SOL deactivated')
             return
+        self.keepalivecount += 1
         # ok, still here, that means session is alive, but another
         # common issue is firmware messing with mux on reboot
         # this would be a nice thing to check, but the serial channel

If it pans out, it should cause the console session to disconnect itself roughly every 90 minutes and trigger a reconnect (is 90 minutes short enough in your case?). It would require a service confluent restart to see if it had the desired effect.
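
The arithmetic behind the 180 threshold can be sketched as follows. This is not pyghmi code: the class name KeepaliveBudget and the helper are hypothetical, and the 30-second keepalive interval is an assumption inferred only from "180 keepalives is roughly 90 minutes"; the real interval may differ.

```python
# Assumption: one SOL keepalive check roughly every 30 seconds. This value is
# inferred from "180 checks ~ 90 minutes", not read from the pyghmi source.
KEEPALIVE_INTERVAL_S = 30
MAX_KEEPALIVES = 180  # threshold used in the patch above


class KeepaliveBudget(object):
    """Hypothetical stand-in for the counter the patch adds to Console."""

    def __init__(self, limit=MAX_KEEPALIVES):
        self.limit = limit
        self.count = 0

    def reset(self):
        # Mirrors resetting keepalivecount when the SOL payload is activated.
        self.count = 0

    def tick(self, owner_ok=True):
        """Return True if the session should stay up after this check."""
        if not owner_ok or self.count > self.limit:
            # The patched code would report 'SOL deactivated' here,
            # which in turn triggers the reconnect.
            return False
        self.count += 1
        return True


def keepalives_before_drop(budget):
    """Count how many checks pass before the forced disconnect."""
    passed = 0
    while budget.tick():
        passed += 1
    return passed


if __name__ == "__main__":
    passed = keepalives_before_drop(KeepaliveBudget())
    print("checks passed: %d, minutes: %.1f"
          % (passed, passed * KEEPALIVE_INTERVAL_S / 60.0))
```

Because the comparison is "> 180" and the counter starts at zero, 181 checks pass before the drop, which under the assumed interval is slightly more than 90 minutes.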

Sorry I haven't tested and can't think of a root cause, but I'm going to take some time off for the weekend.

I would be curious whether the same ipmitool process is still running when you check a day later (e.g. whether ipmitool is exiting and getting restarted). I don't have the time at the moment to see if they do some other interesting thing to avoid the behavior.

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 11:45 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

cloud53.ulan:/home/banuchka # ipmitool sol info 1
Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
Set in progress                 : set-complete
Enabled                         : true
Force Encryption                : true
Force Authentication            : false
Privilege Level                 : ADMINISTRATOR
Character Accumulate Level (ms) : 50
Character Send Threshold        : 255
Retry Count                     : 7
Retry Interval (ms)             : 480
Volatile Bit Rate (kbps)        : 38.4
Non-Volatile Bit Rate (kbps)    : 115.2
Payload Channel                 : 1 (0x01)
Payload Port                    : 623
cloud53.ulan:/home/banuchka # ipmitool sol set volatile-bit-rate 115.2 1
cloud53.ulan:/home/banuchka # ipmitool sol info 1
Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
Set in progress                 : set-complete
Enabled                         : true
Force Encryption                : true
Force Authentication            : false
Privilege Level                 : ADMINISTRATOR
Character Accumulate Level (ms) : 50
Character Send Threshold        : 255
Retry Count                     : 7
Retry Interval (ms)             : 480
Volatile Bit Rate (kbps)        : 115.2
Non-Volatile Bit Rate (kbps)    : 115.2
Payload Channel                 : 1 (0x01)
Payload Port                    : 623
cloud53.ulan:/home/banuchka # echo 123 > /dev/console

and nothing happened

In the console's log:
---
[04/14 12:49:12 console disconnected][04/14 12:49:29 console connected][04/14 13:01:02 console disconnected][04/14 13:01:02 console connected][04/14 13:03:54 console disconnected][04/14 13:04:15 console connected][04/14 13:38:37 console connected][04/14 15:31:47 console disconnected][04/14 15:36:24 console connected][04/14 15:42:08 connection by xcat_console]
---

On 14 April 2017 at 16:39:35, Jarrod Johnson (***@lenovo.com) wrote:
If you do have any in a corrupted state, I would be interested to see what happens if you do:
ipmitool sol set volatile-bit-rate 115.2 1

That changes the volatile bit rate to match the non-volatile bit rate, to see if the corruption goes away.
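
To spot such a mismatch across many nodes, the "ipmitool sol info" text can be parsed and the two bit rates compared. This is a minimal sketch that assumes the "Key : value" layout pasted in this thread; parse_sol_info and bit_rates_match are hypothetical helper names, not ipmitool or xCAT functions.

```python
def parse_sol_info(text):
    """Turn the 'Key : value' lines of `ipmitool sol info` into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip blank lines and the leading "Info: ..." notice.
        if not sep or key.strip() == "Info":
            continue
        info[key.strip()] = value.strip()
    return info


def bit_rates_match(info):
    """True when the volatile and non-volatile bit rates are identical."""
    volatile = info.get("Volatile Bit Rate (kbps)")
    non_volatile = info.get("Non-Volatile Bit Rate (kbps)")
    return volatile is not None and volatile == non_volatile
```

The text itself would come from running ipmitool (for example through subprocess) against each BMC; that part is left out because the credentials and host names are site-specific.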

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 11:36 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

115200

idracadm7 get iDRAC.IPMISerial
[Key=iDRAC.Embedded.1#IPMISerial.1]
BaudRate=115200
ChanPrivLimit=4
ConnectionMode=Terminal
DeleteControl=Disabled
EchoControl=Enabled
FlowControl=RTS/CTS
HandshakeControl=Enabled
InputNewLineSeq=1
LineEdit=Enabled
NewLineSeq=CR-LF

That is strange, right?

On 14 April 2017 at 16:31:27, Jarrod Johnson (***@lenovo.com) wrote:
Hmm, what's the baud rate the console is actually running at? Odd to see the volatile and non-volatile bit rates not be the same.

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 11:28 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

On 14 April 2017 at 16:15:16, Jarrod Johnson (***@lenovo.com) wrote:
And to be clear, the corruption only starts after a long period of time of being continuously connected?
Yes, that is correct.

I might be interested in seeing ipmitool sol info 1 output against a system while it is working versus showing corrupted info.
Corrupted:
# ipmitool -I lanplus -H cloud2manage -U root -a sol info 1
Password:
Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
Set in progress                 : set-complete
Enabled                         : true
Force Encryption                : true
Force Authentication            : false
Privilege Level                 : ADMINISTRATOR
Character Accumulate Level (ms) : 50
Character Send Threshold        : 255
Retry Count                     : 7
Retry Interval (ms)             : 480
Volatile Bit Rate (kbps)        : 38.4
Non-Volatile Bit Rate (kbps)    : 115.2
Payload Channel                 : 1 (0x01)
Payload Port                    : 623

Working:
# ipmitool -I lanplus -H cloud2manage -U root -a sol info 1
Password:
Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
Set in progress                 : set-complete
Enabled                         : true
Force Encryption                : true
Force Authentication            : false
Privilege Level                 : ADMINISTRATOR
Character Accumulate Level (ms) : 50
Character Send Threshold        : 255
Retry Count                     : 7
Retry Interval (ms)             : 480
Volatile Bit Rate (kbps)        : 38.4
Non-Volatile Bit Rate (kbps)    : 115.2
Payload Channel                 : 1 (0x01)
Payload Port                    : 623

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 11:09 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

Yes, reopening causes it to work again, without any garbage… so it looks like a normal console :)
Hitting <enter> causes garbage output at first (ᅵᅵ Porᅵlo) and then a *normal console*...

On 14 April 2017 at 16:02:09, Jarrod Johnson (***@lenovo.com) wrote:
So reopen causes it to work again, and before, it's not *hung*, but erratic with garbage characters and occasional blips of sanity?

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 11:00 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

Reopening the console did the trick as well...

On 14 April 2017 at 15:54:03, Jarrod Johnson (***@lenovo.com) wrote:
"ctrl-e, then c, then o" to reconnect.

Was conserver ondemand or full logging?

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 10:52 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

The console starts showing garbage after <enter> inside rcons.
What do you mean by "restarting the console"?
The console continues to work after:
- <enter> inside rcons/confetty
- bmc reset (console disconnected/console connected)

You're absolutely right: with ipmitool and conserver on the same servers we had no such trouble.
On 14 April 2017 at 15:47:14, Jarrod Johnson (***@lenovo.com) wrote:
So the console starts showing garbage? Restarting the console causes the garbage to go away?

You said that ipmitool with a certain configuration did not trigger this?

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 9:29 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: Re: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

I'm out of ideas, so let me show you everything I see.

Inside rcons I see:

MONITORING_TEST dbb54 1492160401 <= last message I've sent from the OS (more detailed log below)

tcpdump (keepalive?):
13:23:42.342886 IP (tos 0x0, ttl 64, id 16448, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:23:42.345504 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80

…

13:24:09.422491 IP (tos 0x0, ttl 64, id 17060, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:24:09.425045 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80

Hit <enter> in rcons:
---
MONITORING_TEST dbb54 1492160401

ᅵᅵ
 Porᅵ
---

tcpdump:
13:24:35.727671 IP (tos 0x0, ttl 64, id 19582, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:24:35.731533 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
13:24:47.390367 IP (tos 0x0, ttl 64, id 20347, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:24:47.392799 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 64
13:24:47.408312 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
13:24:47.409797 IP (tos 0x0, ttl 64, id 20349, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:03.127774 IP (tos 0x0, ttl 64, id 21818, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:03.131561 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
13:25:27.269696 IP (tos 0x0, ttl 64, id 26284, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:27.272204 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
13:25:47.410313 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 64
13:25:47.413754 IP (tos 0x0, ttl 64, id 28210, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:48.709947 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 204)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 176
13:25:48.712033 IP (tos 0x0, ttl 64, id 28355, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:52.564080 IP (tos 0x0, ttl 64, id 29103, offset 0, flags [DF], proto UDP (17), length 92)
  10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:52.566810 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
  10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80

And magic, rcons:
---
 Porᅵlo]0;console: dbb54 [13:25]


dbb54 login:
---

On 14 April 2017 at 12:42:03, Jarrod Johnson (***@lenovo.com) wrote:
If you ctrl-e, c, o, does it restore the console after the time?

Can you tell that it goes after exactly 24 hours on the dot?

When the console is hung, does "ipmitool sol activate" say "session already active"?
Yes,
# ipmitool -I lanplus -H 10.10.106.155 -U root -a sol activate
Password:
Info: SOL payload already active on another session

Does /var/log/confluent/consoles/<nodename> have any interesting events crop up?
[04/13 15:17:21 console connected]
… many of our own messages
^MMONITORING_TEST dbb54 1492160401 | <== This is the last message from OS/ # date -***@1492160401 (Fri Apr 14 09:00:01 UTC 2017)
^M
[04/14 09:05:13 console connected]
[04/14 09:11:59 console connected]
[04/14 09:13:38 console disconnected]
[04/14 09:14:54 console connected]
[04/14 10:15:13 connection by xcat_console]
[04/14 10:15:14 disconnection by xcat_console]
[04/14 13:14:30 connection by xcat_console]

Pyghmi will do keepalive as well, and if that's the problem, it should be much shorter than 24 hours. In fact, it should be checking if the SOL payload is active and owned by confluent specifically every couple of minutes.
Yes, that's correct.

From: banuchka [mailto:***@gmail.com]
Sent: Friday, April 14, 2017 5:55 AM
To: xcat-***@lists.sourceforge.net
Subject: Re: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

My last reply was incorrect. The problems are still here. I'm trying to find something useful in between confluent/pyghmi...
A confluent restart clears the hangs and reopens all connections.
I don't think restarting confluent once or twice every 24 hours is the best option.
--
banuchka
On 13 April 2017 at 17:03:19, banuchka (***@gmail.com) wrote:
It is a Dell-related problem, not 100% sure, but…
Confluent from current master is doing things well :)
Thanks for the pretty nice tool "confluentdbutil".

On 13 April 2017 at 11:30:14, banuchka (***@gmail.com) wrote:
Looks like that problem existed before… The fix was to use ipmitool with keepalive (the one from the xCAT repos).
Here pyghmi is used; maybe that's the reason?

On 13 April 2017 at 08:22:28, banuchka (***@gmail.com) wrote:
Hi,

I'm trying to migrate completely from conserver to confluent, but I've run into some strange behaviour.
Some of my consoles hang after ~24 hours, so there are no new messages in their logs or in rcons.
I send timestamped messages from the OS to /dev/console every 30-60 min and check for them for monitoring purposes (console availability monitoring).
I can open rcons and hit enter, and after a few seconds the console wakes up (strange). I didn't see this happen with conserver, or maybe I'm wrong...
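
The monitoring check described above can be sketched like this. It assumes the marker format that appears elsewhere in this thread ("MONITORING_TEST <node> <epoch>"); the function names and the two-hour threshold are hypothetical choices, not part of confluent or xCAT.

```python
import re

# Marker written to /dev/console by the OS, e.g.
# "MONITORING_TEST dbb54 1492160401".
MARKER_RE = re.compile(r"MONITORING_TEST (\S+) (\d+)")


def last_heartbeat(log_text):
    """Return (node, epoch) of the newest marker in the log, or None."""
    matches = MARKER_RE.findall(log_text)
    if not matches:
        return None
    node, epoch = matches[-1]
    return node, int(epoch)


def console_is_stale(log_text, now, max_age_s=2 * 3600):
    """True when no marker was logged within max_age_s seconds of now.

    max_age_s is an assumed threshold: twice the 30-60 minute send
    interval, so that one missed message does not raise an alarm.
    """
    heartbeat = last_heartbeat(log_text)
    return heartbeat is None or now - heartbeat[1] > max_age_s
```

In practice log_text would be read from the node's file under /var/log/confluent/consoles/ and now would come from time.time().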
Some details:
- As far as I can see, most of the consoles that hang are Dell iDRAC. It doesn't matter whether RacSerial or IPMISerial is in use.
- racreset hard / ipmitool bmc reset didn't do the trick
- Hitting enter in the console wakes it up (for example, I can send \r\n\f with expect, but that looks bad)
- I didn't try to clean confluent's config and restart it. I'm not sure it would help.
- HP consoles work well, with the same IPMI
- A few consoles with a custom plugin work well too

So maybe my question is not really about confluent, but if any of you know about similar problems, please share! ;)

--
banuchka
_______________________________________________
xCAT-user mailing list
xCAT-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user