Discussion:
[xcat-user] Upgraded to xCAT 2.8.4, nodeset not working as expected.
Viviano, Brad
2014-07-28 13:29:09 UTC
Good morning,

I upgraded xCAT via yum last week from 2.8.3 to 2.8.4. Since then, if I put a node (in this case a dx360M3) through discovery, discovery completes, but if I then try to use nodeset to boot toolscenter or set the diskless image on the node (nodeset XXX osimage=), it doesn't work.

I'm noticing that the nodeset and rsetboot commands are taking > 30 seconds to complete, and when I reboot the node, it just tries to boot from disk. After a little digging, it appears that nodeset isn't updating the underlying files in /tftpboot/xcat/xnba/nodes/* as expected:


# /usr/bin/time -p nodeset c4u34 netboot=toolscenter-x86_64-bomc
c4u34: netboot toolscenter-x86_64-bomc
real 31.20
user 0.11
sys 0.03


# cat /tftpboot/xcat/xnba/nodes/c4u34
#!gpxe
#netboot toolscenter-x86_64-bomc
LOCALBOOT 0


# /usr/bin/time -p nodeset c4u34 osimage=rhel6_c-x86_64-netboot-compute
c4u34: netboot rhel6_c-x86_64-compute
real 49.85
user 0.12
sys 0.01


# cat /tftpboot/xcat/xnba/nodes/c4u34
#!gpxe
#netboot rhel6_c-x86_64-compute
LOCALBOOT 0



The long time makes me think it's hitting a timeout value and aborting. Did I miss something with the upgrade to 2.8.4? I have no problems with 2.8.3 and discovery is working on 2.8.4.
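
The symptom above can be checked mechanically: after a netboot request, the node file should no longer end in LOCALBOOT 0. The following is a minimal sketch, assuming the file layout shown in the output above. The function name xnba_is_localboot is hypothetical and is not an xCAT command.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): succeed if the given xNBA node
# file still contains a LOCALBOOT directive, i.e. the node will boot from
# disk regardless of what nodeset reported.
xnba_is_localboot() {
    grep -q '^LOCALBOOT' "$1"
}

# Usage (path as shown in the post above):
#   if xnba_is_localboot /tftpboot/xcat/xnba/nodes/c4u34; then
#       echo "c4u34 is still set to local boot"
#   fi
```

Running this right after nodeset makes it obvious whether the command changed the boot instructions or only the comment line.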


Thanks,

-Brad Viviano


===================================================
Brad Viviano
High Performance Computing & Scientific Visualization
Lockheed Martin, Supporting the EPA
Research Triangle Park, NC
919-541-2696

HSCSS Task Order Lead - Ravi Nair
919-541-5467 - ***@epa.gov
High Performance Computing Subtask Lead - Durward Jones
919-541-5043 - ***@epa.gov
Environmental Modeling and Visualization Lead - Heidi Paulsen
919-541-1834 - ***@epa.gov
Jarrod B Johnson
2014-07-28 13:53:53 UTC
Odd that rsetboot should take a long time. I assume something like rpower
stat also takes a long time?



Viviano, Brad
2014-07-28 14:16:57 UTC
Jarrod,

Sorry. The rsetboot isn't taking a long time. It's just nodeset that is. I cut/pasted a group of commands into the terminal for the system I was working on (nodeset, rsetboot, rpower) and it hung on nodeset. So, there is something going wrong with nodeset. I have another xCAT iDataplex cluster (my testing cluster) which I upgraded to 2.8.4 and it isn't having a problem:


# cat /tftpboot/xcat/xnba/nodes/c0u13
#!gpxe
#netboot rhel6_c-x86_64-compute
imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/rhel6_c-x86_64-netboot-compute/kernel
imgload kernel
imgargs kernel imgurl=http://172.20.0.1:80//install/netboot/rhel6_c/x86_64/compute/rootimg.gz XCAT=${next-server}:3001 NODE=c0u13 FC=0 console=tty0 console=ttyS0,115200 BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}/tftpboot/xcat/osimage/rhel6_c-x86_64-netboot-compute/initrd-stateless.gz
imgexec kernel


# /usr/bin/time -p nodeset c0u13 boot
c0u13: boot
real 0.47
user 0.07
sys 0.01


# cat /tftpboot/xcat/xnba/nodes/c0u13
#!gpxe
#boot
exit


# /usr/bin/time -p nodeset c0u13 osimage=rhel6_c-x86_64-netboot-compute
c0u13: netboot rhel6_c-x86_64-compute
real 1.01
user 0.07
sys 0.03


# cat /tftpboot/xcat/xnba/nodes/c0u13
#!gpxe
#netboot rhel6_c-x86_64-compute
imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/rhel6_c-x86_64-netboot-compute/kernel
imgload kernel
imgargs kernel imgurl=http://172.20.0.1:80//install/netboot/rhel6_c/x86_64/compute/rootimg.gz XCAT=${next-server}:3001 NODE=c0u13 FC=0 console=tty0 console=ttyS0,115200 BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}/tftpboot/xcat/osimage/rhel6_c-x86_64-netboot-compute/initrd-stateless.gz
imgexec kernel



Granted my test cluster is only a few nodes. But still, it's updating everything correctly. I also noticed that the node I put through discovery on 2.8.4 showed discovery method as "UNDEF" according to nodediscoverls:


# nodediscoverls | grep c4u34
18F64EE1-3CC1-DF11-B526-E41F134D13C2 c4u34 undef 7323AC1 23DF198


Whereas my other nodes show it as "switch":


# nodediscoverls | grep c4u33
0ED7839B-3FC1-DF11-B752-E41F134D42D8 c4u33 switch 7323AC1 23DF167
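
To find every node affected by this, the nodediscoverls output can be filtered on the method column. This is a sketch only: it assumes the column order UUID, node, method, MTM, serial seen in the two rows above, and the function name undef_method_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read nodediscoverls-style rows on stdin and print the
# node name (column 2) of every row whose discovery method (column 3) is
# "undef". The column order is an assumption taken from the rows above.
undef_method_nodes() {
    awk '$3 == "undef" { print $2 }'
}

# Usage:
#   nodediscoverls | undef_method_nodes
```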


-Brad


Jarrod B Johnson
2014-07-28 14:37:49 UTC
Does anything conspicuous appear in /var/log/messages?



Viviano, Brad
2014-07-28 14:50:46 UTC
Just what I would expect:


# date ; /usr/bin/time -p nodeset c4u34 osimage=rhel6_c-x86_64-netboot-compute
Mon Jul 28 10:44:49 EDT 2014
c4u34: netboot rhel6_c-x86_64-compute
real 31.77
user 0.11
sys 0.03
Viviano, Brad
2014-07-28 15:21:34 UTC
Jarrod,

%bphash isn't coming into xnba.pm::setstate as expected. Data::Dumper shows:


BPHash: $VAR1 = 'c4u34';
$VAR2 = [
undef
];



Compared to my working 2.8.4 cluster:


BPHash: $VAR1 = 'c0u13';
$VAR2 = [
{
'kcmdline' => 'imgurl=http://172.20.0.1:80//install/netboot/rhel6_c/x86_64/compute/rootimg.gz XCAT=!myipfn!:3001 NODE=c0u13 FC=0 console=tty0 console=ttyS0,115200',
'kernel' => 'xcat/osimage/rhel6_c-x86_64-netboot-compute/kernel',
'initrd' => 'xcat/osimage/rhel6_c-x86_64-netboot-compute/initrd-stateless.gz'
}
];


So, it feels like I have a corruption in the database; maybe a hash or key is screwed up, causing the hang and problems. I have a backup of the database files from dumpxCATdb. What's the best way to clear out and restore the database files from the backup?

I tried to stop xcatd, remove all the /etc/xcat/*.sqlite files and restart xcatd, but it complains. What is the approved procedure to restore sqlite files from backup?


-Brad


Lissa Valletta
2014-07-28 16:18:21 UTC
All you have to do to restore your database is run restorexCATdb -p <path
to your backup directory>. It will replace all the tables. You should
never remove any of the files manually, and you do not have to stop
xcatd. But since it looks like you have already removed files, do
the restore in bypass mode:
service xcatd stop (then run ps -ef | grep xcatd and make sure all processes are gone)
export XCATBYPASS=y
restorexCATdb -p <path to your backup>
unset XCATBYPASS
service xcatd start
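
The bypass-mode steps above could be wrapped in a small script so they are run in the same order each time. This is only a sketch: the function names and the DRY_RUN switch are hypothetical, and setting XCATBYPASS through env for the single restorexCATdb call is a substitution for the export/unset pair, intended to have the same effect. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper around the restore steps quoted above.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
_run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restore_xcat_db() {
    backup_dir=$1
    if [ -z "$backup_dir" ]; then
        echo "usage: restore_xcat_db <path to backup>" >&2
        return 2
    fi
    _run service xcatd stop || return 1
    # Before a real run, confirm by hand that no xcatd processes remain:
    #   ps -ef | grep xcatd
    # XCATBYPASS=y is set only for this one command.
    _run env XCATBYPASS=y restorexCATdb -p "$backup_dir" || return 1
    _run service xcatd start
}

# Usage (prints the three commands; set DRY_RUN=0 to execute them):
#   restore_xcat_db /path/to/backup
```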



Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102





Viviano, Brad
2014-07-28 17:01:41 UTC
Lissa/Jarrod,

As I suspected, there was a corruption of some type in the xCAT SQLite files. I made a backup of /etc/xcat before I upgraded from 2.8.3 to 2.8.4. So, I stopped xCATd, deleted /etc/xcat, restored it from backup created at 2.8.3 and then started xCATd back up. When I did, xCATd re-ran the schema updates and now everything is working correctly. Not sure if this is something in the RPM post section that messed up, or if the RPM ran the Schema updates without stopping xcatd, or what.

I have copies of my 2.8.3 database, the 2.8.4 database that was broken after the RPM update, and the working 2.8.4 database. If either of you is interested in debugging the problems I experienced, I can bundle them up and upload them to you in some secure fashion.

Just one more reason to make sure you have a complete and correct backup before performing any updates :).


However, that doesn't fix the problem of nodediscoverls reporting newly discovered nodes via SNMP as "undef" instead of "switch". It's still doing that, doesn't seem to have an impact on anything, but I am just letting you know in case it might impact some other piece of xCAT.


Thanks,

-Brad Viviano



Lissa Valletta
2014-07-28 18:43:06 UTC
You should not need to back up directories like /etc/xcat. Just run
dumpxCATdb to back up your database and restorexCATdb to restore it if you
think you have problems with your database. You should not even need to
do that normally. When you update the xCAT code, you should not stop
xcatd. xCAT will take care of any xcatd stops and starts needed. In fact,
the schema updates take place with xcatd running, and your database should
be preserved. Making a backup of /etc/xcat and restoring it on a new
version of xCAT probably did not hurt, but it could cause problems. You
also should not delete any files in /etc/xcat. Those files are created
by xCAT, and what we expect in that directory might change from release to
release. Just let the xCAT commands do the work for you. All you should
have to do is a yum or zypper update of the xCAT code.

Did you have any errors on your yum update?
I guess you could check to see if the rpms are all at the correct level:
rpm -qa | grep xCAT
lsxcatd -a
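
The rpm level check can be reduced to a list of distinct versions, which makes a half-applied update easier to spot. This is a sketch only: the function name distinct_versions is hypothetical, and the input is assumed to be "name version" pairs such as rpm's --qf option can produce. More than one version in the output is not necessarily an error, but it is worth a look.

```shell
#!/bin/sh
# Hypothetical helper: read "name version" pairs on stdin and print each
# distinct version once, sorted.
distinct_versions() {
    awk '{ print $2 }' | sort -u
}

# Usage:
#   rpm -qa --qf '%{NAME} %{VERSION}\n' | grep xCAT | distinct_versions
```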



