Discussion:
[xcat-user] problems booting redhat 7.3 on NextScale 360M5
Gilad Berman
2017-05-25 18:52:22 UTC
Permalink
Do you have consoleondemand set to yes in the site table (or on the specific node)?
If so, remove “hard” from the node's serial flow control setting, run nodeset again, and retry.

If this is a similar case, the cause is hardware flow control: with it enabled, the OS waits for something to connect to the serial console before it continues writing (that is what flow control does).


From: Damir Krstic [mailto:***@gmail.com]
Sent: Thursday, May 25, 2017 9:44 PM
To: xCAT Users Mailing list <xcat-***@lists.sourceforge.net>
Subject: [xcat-user] problems booting redhat 7.3 on NextScale 360M5

We are installing Red Hat 7.3 on NextScale nodes, and after the install the node reboots. The problem is that the node seems to get "stuck" and only continues booting once I rcons into it. So far I can't pinpoint the exact spot where it gets stuck; it just sits there until I open a remote console (rcons). Here is where the last one was stuck when I connected:


[***@mgt ~]# rcons qnode5118
[Enter `^Ec?' for help]
Info: SOL payload already de-activated
[SOL Session operational. Use ~? for help]
??6?+
?+#6cK??6?[?+s?
[ 131.345026] systemd[1]: Created slice Root Slice.
[ 131.350307] systemd[1]: Starting Root Slice.
[ OK ] Listening on Journal Socket.
[ 131.360103] systemd[1]: Listening on Journal Socket.
[ 131.365676] systemd[1]: Starting Journal Socket.
[ OK ] Listening on udev Control Socket.
[ 131.377102] systemd[1]: Listening on udev Control Socket.
[ 131.383152] systemd[1]: Starting udev Control Socket.
[ OK ] Listening on udev Kernel Socket.
[ 131.395101] systemd[1]: Listening on udev Kernel Socket.
[ 131.401055] systemd[1]: Starting udev Kernel Socket.
[ OK ] Reached target Sockets.
[ 131.411102] systemd[1]: Reached target Sockets.
[ 131.416180] systemd[1]: Starting Sockets.
[ OK ] Created slice System Slice.
[ 131.426105] systemd[1]: Created slice System Slice.
[ 131.431578] systemd[1]: Starting System Slice.
[ 131.437357] systemd[1]: Starting Apply Kernel Variables...
Starting Apply Kernel Variables...
[ 131.448652] systemd[1]: Starting Journal Service...

Any help is appreciated.
Thanks,
Damir
Gilad Berman
2017-05-25 19:24:16 UTC
Permalink
Does your /tftpboot/xcat/xnba/nodes/<node> file (I am writing from memory, so the path might be slightly different) still contain something like 115200n8r, or only 115200?

It should be the latter. Maybe running nodeset with the osimage again will regenerate it?

I am writing without access to my system, so sorry for any mistakes 😊
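
For anyone who wants to check this on many nodes at once, here is a minimal Python sketch of the check described above. It assumes the usual Linux serial console option layout (speed, parity, data bits, then an optional "r" for RTS flow control), and the command line in the usage example is hypothetical, not taken from a real node file. The function name serial_consoles is made up for this illustration.

```python
import re

# Serial console options follow the layout BBBBPNF:
# speed, parity (n/o/e), data bits, optional flow control ("r" = RTS).
CONSOLE_RE = re.compile(
    r"(?<!\S)console=(?P<dev>ttyS\d+)"
    r"(?:,(?P<baud>\d+)(?P<parity>[noe])?(?P<bits>\d)?(?P<flow>r)?)?"
    r"(?=\s|$)"
)


def serial_consoles(cmdline):
    """Return one dict per serial console= argument found in cmdline."""
    found = []
    for m in CONSOLE_RE.finditer(cmdline):
        found.append({
            "device": m.group("dev"),
            "baud": int(m.group("baud")) if m.group("baud") else None,
            # True only when the option string ends in "r" (hardware flow control)
            "hw_flow": m.group("flow") == "r",
        })
    return found


# Hypothetical kernel command line for illustration only.
print(serial_consoles("quiet console=tty0 console=ttyS0,115200n8r"))
```

If hw_flow comes back True for a node, that node still has the "r" suffix the message above asks about.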

From: Damir Krstic [mailto:***@gmail.com]
Sent: Thursday, May 25, 2017 10:16 PM
To: xCAT Users Mailing list <xcat-***@lists.sourceforge.net>
Subject: Re: [xcat-user] problems booting redhat 7.3 on NextScale 360M5

So I removed it from the node’s definition in the nodehm table:

[***@mgt pxelinux.cfg]# nodels qnode5118 nodehm
qnode5118: nodehm.mgt: ipmi
qnode5118: nodehm.serialport: 0
qnode5118: nodehm.node: qnode5118
qnode5118: nodehm.serialspeed: 115200
qnode5118: nodehm.serialflow:
qnode5118: nodehm.cmdmapping:
qnode5118: nodehm.termport:
qnode5118: nodehm.comments:
qnode5118: nodehm.consoleondemand:
qnode5118: nodehm.cons:
qnode5118: nodehm.conserver:
qnode5118: nodehm.getmac:
qnode5118: nodehm.termserver:
qnode5118: nodehm.power:
qnode5118: nodehm.disable:


Set it to boot:
[***@mgt pxelinux.cfg]# nodeset qnode5118 boot
qnode5118: boot


Rebooted it and it still hangs:


[***@mgt pxelinux.cfg]# rpower qnode5118 boot
qnode5118: reset




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
xCAT-user mailing list
xCAT-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
Jarrod Johnson
2017-05-25 18:49:41 UTC
Permalink
nodels <nodename> nodehm

If serialflow is hard, I suggest clearing it. With that setting the interpretation is rather strict: if no one is connected to the console, CTS is not asserted.

In general, clearing it has no real practical downside. At worst some output is missed, but that is better than the kernel hanging while it waits for CTS.
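
As a rough aid for checking more than one node, the following Python sketch scans saved nodels output for nodes whose nodehm.serialflow is set to hard. It assumes the "<node>: nodehm.<column>: <value>" line shape that nodels prints elsewhere in this thread; the function name hard_flow_nodes and the sample text are hypothetical.

```python
def hard_flow_nodes(nodels_output):
    """Return the names of nodes whose nodehm.serialflow value is 'hard'."""
    nodes = []
    for line in nodels_output.splitlines():
        # Expected shape: "<node>: nodehm.<column>: <value>"
        parts = [p.strip() for p in line.split(":", 2)]
        if len(parts) == 3 and parts[1] == "nodehm.serialflow" and parts[2] == "hard":
            nodes.append(parts[0])
    return nodes


# Hypothetical sample; real input would be the captured output of
# "nodels <noderange> nodehm".
sample = """qnode5118: nodehm.serialspeed: 115200
qnode5118: nodehm.serialflow: hard
qnode5119: nodehm.serialflow:
"""
print(hard_flow_nodes(sample))
```

Nodes with an empty serialflow value are skipped, since those already have flow control cleared.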

Jarrod Johnson
2017-05-26 02:35:01 UTC
Permalink
Or rather, remove n8r from the ttyS0 console argument...

Removing the tty0 console is fine too, but anyone glancing at the local video may presume bad things when all they see is a blinking cursor.
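
A small Python sketch of that edit, for illustration: it strips the parity/bits/flow suffix (such as n8r) from serial console arguments and leaves console=tty0 alone. The function name drop_serial_options and the example command line are assumptions, and this only rewrites a string; it does not touch any boot configuration file.

```python
import re

# Capture "console=ttySN,<baud>" and match the trailing parity/bits/flow suffix.
_SERIAL = re.compile(r"(?<!\S)(console=ttyS\d+,\d+)[noe]\d?r?(?=\s|$)")


def drop_serial_options(cmdline):
    """Strip the parity/bits/flow suffix (e.g. 'n8r') from serial console args."""
    return _SERIAL.sub(r"\1", cmdline)


# Hypothetical kernel command line for illustration only.
print(drop_serial_options("console=tty0 console=ttyS0,115200n8r ro"))
```

Arguments that already carry only the baud rate are returned unchanged.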

On Thu, May 25, 2017 at 8:59 PM -0400, "Er Tao Zhao" <***@cn.ibm.com<mailto:***@cn.ibm.com>> wrote:

Could you please try removing the "console=tty0" kernel parameter from /boot/grub/grub.cfg in the host OS? Then reboot the node and take a look.

Thx!
Best Regards,
-----------------------------------
Zhao Er Tao


