Discussion:
[xcat-user] upgrading xCAT onto new servers
David D. Johnson
2017-02-03 19:02:50 UTC
We’re upgrading cluster management node hardware and software at the same time, going from xCAT 2.8.3 to 2.13.1
and from CentOS 6.7 to RHEL 7.2. I have the new frontend installed and somewhat functional.
Right now I need to clone the DNS/named setup from “mgt1”, which is still authoritative for the production cluster.
I could just tabdump the hosts and nodelist tables and run makedns on “mgt5”, or I’m thinking there might be a way to make
the new mgt5 a slave to the existing named running on mgt1. Any pros/cons? What would you do?
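
For the slave route, a minimal sketch, assuming BIND on both ends; the zone name is taken from the node FQDNs later in this thread, and the IPs are placeholders. One con to weigh: xCAT's makedns manages named.conf, so a hand-added slave stanza may not survive a later makedns run.

    # on mgt5, append a slave zone to /etc/named.conf:
    cat >> /etc/named.conf <<'EOF'
    zone "oscar.ccv.brown.edu" {
        type slave;
        file "slaves/oscar.ccv.brown.edu.db";
        masters { <ip_of_mgt1>; };
    };
    EOF
    # on mgt1, the matching zone needs: allow-transfer { <ip_of_mgt5>; };
    systemctl reload named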

Thanks,

— ddj
Er Tao Zhao
2017-02-07 09:32:33 UTC
Hi, David

Will you please try 'chdef -t site forwarders=<ip_of_mgt1>' and then 'makedns' to use mgt1 as your remote DNS server?
Please feel free to let me know if there are any more issues.

Thanks!
Best Regards,
Zhao Er Tao
IBM China System and Technology Laboratory, Beijing
David D Johnson
2017-02-07 11:35:56 UTC
That was already the case (the IPs of mgt1 and mgt2 are the forwarders), but I don't believe named
will forward requests for zones it is itself authoritative for.
I ended up using tabdump to recreate the hosts and nodelist tables. Mostly good.
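
For reference, that table-copy route looks roughly like this (a sketch; the temp file names are illustrative):

    # on mgt1, dump the tables:
    tabdump hosts    > /tmp/hosts.csv
    tabdump nodelist > /tmp/nodelist.csv
    # copy the CSVs to mgt5, then load them and rebuild DNS:
    tabrestore /tmp/hosts.csv
    tabrestore /tmp/nodelist.csv
    makedns -n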

Now the problem of the day is fixing the SSH credentials so that all the diskless nodes booting off the
new frontend can get root access to all the nodes still booted off the old frontend. We need this
especially for GPFS. I've been trying to follow what's going on in the remoteshell postscript,
and I'm wondering if my "sitespecific" postscript is running before "remoteshell" is completed.
Is there a way to determine/force the order in which the postscripts are executed? sitespecific comes after
remoteshell both alphabetically and in the lsdef output.
The basic problem is that mmsdrrestore fails during sitespecific, but works fine when I try it again later by hand.
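
Per the reply further down the thread, postscripts run one by one in the order listed in the node's postscripts attribute, not alphabetically. A sketch of checking or forcing the order (node name and list taken from the lsdef output later in the thread):

    # show the current run order for one node:
    lsdef gpu002 -i postscripts
    # force an explicit order by rewriting the comma-separated list:
    chdef gpu002 postscripts="syslog,remoteshell,syncfiles,setupntp,ipoib,sitespecific"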

-- ddj
Dave Johnson
Brown University

David D. Johnson
2017-02-07 19:02:09 UTC
Drilling down deeper, this seems to be two different situations.

On nodes without X11, the remoteshell script takes less than a second, and the resulting key files look correct:
-rw------- 1 root root 1231 Feb 7 2017 authorized_keys
-rw------- 1 root root 1675 Feb 7 2017 id_rsa
-rw------- 1 root root 410 Feb 7 2017 id_rsa.pub
7cb5ab60ff42ede791c823afd016997d /root/.ssh/authorized_keys
13f430f0001adff42dc250f818eabbd1 /root/.ssh/id_rsa
3f5101404ac152d4aaea6c62f7eb6e30 /root/.ssh/id_rsa.pub

However, later in the boot, when my sitespecific postscript tries to set up GPFS, I get this message:

Install: recovering gpfs sdr
Tue Feb 7 18:20:28 UTC 2017: mmsdrrestore: Processing node gpu002
mmsdrrestore: Run the command from an active terminal or enable global passwordless access.
mmsdrrestore: Unable to retrieve GPFS cluster files from node ut002.oscar.ccv.brown.edu
mmsdrrestore: File /var/mmfs/ssl/stage/genkeyData1 not found.
Use mmauth genkey to recover the file, or to generate and commit a new key.
mmsdrrestore: Unexpected error from updateMmfsEnvironment. Return code: 1
mmsdrrestore: Command failed. Examine previous error messages to determine cause.


If I copy/paste the command from the postscript file and run it from an ssh login, I get:
[***@gpu002 xcat]# /usr/lpp/mmfs/bin/mmsdrrestore -p ut003 -R /usr/bin/scp
Tue Feb 7 18:26:10 UTC 2017: mmsdrrestore: Processing node gpu002
Warning: Permanently added 'ut002.oscar.ccv.brown.edu' (RSA) to the list of known hosts.
mmsdrrestore: Node gpu002 successfully restored.

There is no difference in the /root/.ssh files before or after. Why does it work by hand, but not from inside the script?

I found that on nodes with X11, the remoteshell script was taking 12 minutes to run to “completion”,
and the result was a zero-length id_rsa.pub file:

-rw------- 1 root root 821 Feb 7 12:59 authorized_keys
-rw------- 1 root root 0 Feb 7 13:09 id_rsa.pub
-rw-r--r-- 1 root root 183 Feb 7 13:02 known_hosts
4cd344ed6d3721a283f442977862b981 /root/.ssh/authorized_keys
d41d8cd98f00b204e9800998ecf8427e /root/.ssh/id_rsa.pub
a178f5a553c74d99590b2047d9517363 /root/.ssh/known_hosts

I thought it was NetworkManager, but it turns out it was firewalld.
(chroot . systemctl disable firewalld)
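
An alternative sketch, if you'd rather keep firewalld than disable it: open the xcatd port offline inside the image (port 3001 is the xcatd port referenced later in this thread; firewall-offline-cmd works without the daemon running, which matters inside a chroot):

    # inside the image root:
    chroot . firewall-offline-cmd --add-port=3001/tcp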

— ddj

David D Johnson
2017-02-07 22:35:52 UTC
Historical details: the cluster was originally set up with GPFS adminMode=allToAll, back in 2009.
All nodes still have the ability to ssh as root to any other node, but we changed
the GPFS cluster configuration to adminMode=central a few years ago.
Some day we may want to tighten it down, but not now.

Originally /var/mmfs/gen was copied into the diskless boot image, but this
became painful as we continued to add a few dozen new nodes every few months
and would have to respin the diskless images to update the SDR.

Nodes in the old cluster continue to work fine with xCAT 2.8.x, CentOS 6.7, and GPFS 4.2.2-1.
New cluster nodes are exhibiting this problem with xCAT 2.13.1, RHEL 7.2, and GPFS 4.2.2-1.

So, did xCAT formerly run postscripts in a pseudo-tty environment, and no longer does?

The goal is to have each compute node be able to boot up diskless and rejoin GPFS at will
without any manual intervention.
Christopher Samuel
2017-02-08 22:39:28 UTC
On 07/02/17 22:35, David D Johnson wrote:

> Now the problem of the day is fixing the SSH credentials so that all
> the diskless nodes booting off the new frontend can get root access
> to all the nodes still booted off the old frontend. Need this
> especially for GPFS.

For our diskless nodes we run them as statelite and have this in our
litefile config:

"ALL","/var/adm/","persistent",,
"ALL","/var/mmfs/","persistent",,

This means that, once set up, our diskless nodes need no manual
intervention to bring GPFS up.
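
For reference, one way to load those rows on the management node, a sketch assuming the stock litefile columns (check the header that tabdump litefile prints):

    # dump the current table, append the two rows, and load it back:
    tabdump litefile > /tmp/litefile.csv
    cat >> /tmp/litefile.csv <<'EOF'
    "ALL","/var/adm/","persistent",,
    "ALL","/var/mmfs/","persistent",,
    EOF
    tabrestore /tmp/litefile.csv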

Hope this helps!
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
David D. Johnson
2017-05-25 14:15:03 UTC
It looks like I forgot to follow up with the resolution to this problem.

The actual workaround was to add “< /dev/console” to the command line.

/usr/lpp/mmfs/bin/mmsdrrestore -p ut003 -R /usr/bin/scp < /dev/console

The problem seems to be that under RHEL 7, the postscripts do not have a /dev/tty and stdin is not a tty.
I did not check under CentOS 6, but I presume it had worked (and still works) that way because
the scripts had always been run under a pseudoterminal. If the cluster is set up with central-mode
ssh authentication (adminMode=central), the GPFS code refuses to attempt an operation that might require typing a password
when there is no tty.
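
In postscript terms, the distinction GPFS is making can be reproduced with the shell's tty test; a sketch using the workaround above (paths and peer node as in the earlier messages):

    # inside a postscript such as sitespecific:
    if [ -t 0 ]; then
        # stdin is a tty (interactive run): works as-is
        /usr/lpp/mmfs/bin/mmsdrrestore -p ut003 -R /usr/bin/scp
    else
        # RHEL 7 postscript environment: borrow the console as stdin
        /usr/lpp/mmfs/bin/mmsdrrestore -p ut003 -R /usr/bin/scp < /dev/console
    fi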

Hope this is useful.

— ddj
Dave Johnson
Brown University

Er Tao Zhao
2017-02-08 09:52:05 UTC
Hi, David

Can you show me the node definition?
The postscripts will be run one by one on the CN, regardless of alphabetical order.

Thanks!
Best Regards,
Zhao Er Tao
IBM China System and Technology Laboratory, Beijing
David D Johnson
2017-02-08 10:44:02 UTC
The problem with the /root/.ssh credentials (zero-length files, 12-minute timeouts)
was because adding GNOME to the image pulled in and enabled firewalld.
This prevented access on ports 3001 .....
Disabling firewalld in the image resolved that problem.

At the moment the only problem is that the mmsdrrestore command that always
worked for us in xCAT 2.8, from the sitespecific postscript, no longer works in xCAT 2.13.1.
This is because the GPFS command mmsdrfsdef, called from mmsdrrestore, does not try
to copy the credentials when adminMode=central and fd 0 is not a tty.
I found the relevant line by grepping /usr/lpp/mmfs/bin for "passwordless".

We had changed the GPFS adminMode years ago in anticipation of tightening access down,
but never got around to actually revoking "all to all" access on the nodes, so root ssh still works.

I have a feeling older versions of xCAT ran postscripts in a pseudo-tty environment,
but I'm not familiar enough with the internals to know for sure.

I'm trying to come up with other ways of pulling down the SDR, maybe wget, that don't need ssh as root.
The goal is for the diskless node to be able to reboot and rejoin the GPFS cluster without any manual intervention.
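
One possibility along those lines, a sketch assuming mmsdrrestore's -F option (restore from a local mmsdrfs copy) is available at this GPFS level; the published path and URL are hypothetical, reusing the http server the kcmdline below already points at:

    # on the GPFS primary config server, publish the current SDR copy under
    # the /install tree the xCAT http server exports (hypothetical spot):
    cp /var/mmfs/gen/mmsdrfs /install/mmsdrfs
    # then in the node postscript, no root ssh needed:
    wget -q http://172.20.0.6/install/mmsdrfs -O /tmp/mmsdrfs
    /usr/lpp/mmfs/bin/mmsdrrestore -F /tmp/mmsdrfs

The published copy would need refreshing whenever the cluster configuration changes, which is the old image-respin problem in a milder form.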

Thanks,
-- ddj

# lsdef gpu002 --osimage
Object name: gpu002
arch=x86_64
bmc=gpu002-bmc
bmcport=0
currstate=netboot rhels7.2-x86_64-cave
groups=gpu,debug,ipmi,rackA4,rackA4B,all,chs
initrd=xcat/osimage/rhels7.2-x86_64-netboot-cave/initrd-stateless.gz
installnic=eno1
ip=172.20.105.82
kcmdline=imgurl=http://172.20.0.6:80//install/netboot/rhels7.2/x86_64/cave/rootimg.cpio.gz XCAT=!myipfn!:3001 NODE=gpu002 FC=0 netdev=eno1 console=tty0 console=ttyS0,115200
kernel=xcat/osimage/rhels7.2-x86_64-netboot-cave/kernel
mac=e4:1f:13:84:55:f2!gpu002
mgt=ipmi
netboot=xnba
nfsserver=172.20.0.6
os=rhels7.2
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles,setupntp,ipoib,sitespecific
power=ipmi
primarynic=eno1
profile=cave
provmethod=rhels7.2-x86_64-netboot-cave
serialport=0
serialspeed=115200
status=netbooting
statustime=02-07-2017 13:19:41
tftpserver=172.20.0.6
profile=cave
pkglist=/install/custom/netboot/rh/cave.rhels7.x86_64.pkglist
osname=Linux
postinstall=/install/custom/netboot/rh/cave.rhels7.x86_64.postinstall
exlist=/install/custom/netboot/rh/cave.rhels7.x86_64.exlist
osdistroname=rhels7.2-x86_64
osvers=rhels7.2
objtype=osimage
osarch=x86_64
provmethod=netboot
rootimgdir=/install/netboot/rhels7.2/x86_64/cave
imagetype=linux
otherpkgdir=/install/post/otherpkgs/rhels7.2/x86_64
pkgdir=/install/rhels7.2/x86_64
-- ddj
Dave Johnson