Discussion:
[xcat-user] xcatd service failed (while actually still up)
Gilad Berman
2017-06-07 09:32:28 UTC
Permalink
Hello,

We are seeing some strange behavior with the xCAT service.

systemctl status xcatd returns the following output:

s08:~ # systemctl status xcatd
● xcatd.service - LSB: xcatd
Loaded: loaded (/etc/init.d/xcatd; bad; vendor preset: disabled)
Active: failed (Result: timeout) since Tue 2017-06-06 19:12:36 CEST; 16h ago
Docs: man:systemd-sysv-generator(8)
Process: 30751 ExecStop=/etc/init.d/xcatd stop (code=exited, status=0/SUCCESS)
Process: 7630 ExecStart=/etc/init.d/xcatd start (code=killed, signal=TERM)
Tasks: 8 (limit: 512)
CGroup: /system.slice/xcatd.service
├─23978 /usr/sbin/in.tftpd -v -l -s /tftpboot -m /etc/tftpmapfile4xcat.conf
├─28343 xcatd: SSL listener
├─28344 xcatd: DB Access
├─28345 xcatd: UDP listener
├─28346 xcatd: install monitor
├─28347 xcatd: Discovery worker
├─28348 xcatd: Command log writer
└─28727 xcatd: DB Access


Meanwhile, xcatd is still running and operational (tabdump and every other function seem to work).
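
The mismatch described here (a unit reported as failed while its cgroup still lists live xcatd processes) can be spotted mechanically from the status text. The following is only a minimal sketch, assuming the output layout shown above; the function names parse_status and looks_stale and the shortened SAMPLE text are made up for illustration and are not part of xCAT or systemd.

```python
import re

# Shortened copy of the status output from this thread (illustrative only).
SAMPLE = """\
● xcatd.service - LSB: xcatd
   Loaded: loaded (/etc/init.d/xcatd; bad; vendor preset: disabled)
   Active: failed (Result: timeout) since Tue 2017-06-06 19:12:36 CEST; 16h ago
   CGroup: /system.slice/xcatd.service
           ├─23978 /usr/sbin/in.tftpd -v -l -s /tftpboot -m /etc/tftpmapfile4xcat.conf
           ├─28343 xcatd: SSL listener
           └─28727 xcatd: DB Access
"""


def parse_status(text):
    """Return (active_state, [(pid, command), ...]) from `systemctl status` text."""
    state = None
    procs = []
    in_cgroup = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Active:"):
            # "Active: failed (Result: timeout) since ..." -> "failed"
            parts = line.split()
            state = parts[1] if len(parts) > 1 else None
        elif line.startswith("CGroup:"):
            in_cgroup = True
        elif in_cgroup:
            if line == "":
                # A blank line ends the process tree (log lines may follow).
                in_cgroup = False
                continue
            # Process lines look like "├─28343 xcatd: SSL listener".
            m = re.match(r"[^0-9A-Za-z]*(\d+)\s+(.*)$", line)
            if m:
                procs.append((int(m.group(1)), m.group(2)))
    return state, procs


def looks_stale(state, procs):
    """True when systemd says 'failed' but xcatd workers are still listed."""
    return state == "failed" and any(cmd.startswith("xcatd:") for _, cmd in procs)


if __name__ == "__main__":
    state, procs = parse_status(SAMPLE)
    print(state, len(procs), looks_stale(state, procs))
```

On a live system the text would come from running systemctl status xcatd rather than from the embedded sample; a check like this only reports the inconsistency and does not explain why the start was marked as timed out.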

Running systemctl restart xcatd works immediately and seems to fix the issue.

Any idea what could make the service think it failed? Is there anything we should look at?

THX!!

Gilad Berman
HPC Architect
Lenovo EMEA

Phone: +972-52-2554262
Email: ***@lenovo.com



Lenovo.com <http://www.lenovo.com/>
Twitter<http://twitter.com/lenovo> | Facebook<http://www.facebook.com/lenovo> | Instagram<https://instagram.com/lenovo> | Blogs<http://blog.lenovo.com/> | Forums<http://forums.lenovo.com/>


Xiao Peng Wang
2017-06-07 12:52:37 UTC
Permalink
The start of xcatd looks like it failed to finish:
Process: 7630 ExecStart=/etc/init.d/xcatd start (code=killed, signal=TERM)

Did you restart xcatd 16h ago?
Active: failed (Result: timeout) since Tue 2017-06-06 19:12:36 CEST; 16h ago

Best Regards
Wang Xiaopeng (王晓朋)

Manager for HPC SW Dev: xCAT, ESSL, SMI, Test
IBM China Systems Laboratory (CSL)

Tel: 86-10-82453455
Email: ***@cn.ibm.com
Gilad Berman
2017-06-07 14:26:16 UTC
Permalink
Probably yes. But the service has been OK since then (we are monitoring the xCAT service status).
Xiao Peng Wang
2017-06-07 14:44:37 UTC
Permalink
Someone from the xCAT team will try to recreate this. I am curious how you monitor the xCAT service.
Gilad Berman
2017-06-07 15:13:28 UTC
Permalink
We use Icinga, but simply to make sure the service is alive.
