Discussion:
[xcat-user] statelite vs stateless
David D. Johnson
2017-04-03 16:21:31 UTC
We have several hundred stateless compute nodes, but we’re starting to wonder
if we should be using statelite provisioning instead. The primary issue we would hope
to address is having a place to save Kerberos host credentials. Of course we could
do this at boot time using some kind of post install script (anybody have one they’d
be willing to share?). Are there other advantages that any of you folks have found
to state-lite that would make me want to convert? Do you use local storage or something
like GPFS or NAS to hold the node-specific partitions?

Thanks for any advice / suggestions / caveats!

Regards,
— ddj
Dave Johnson
Brown University
Gilad Berman
2017-04-05 15:34:34 UTC
David hello,

Because this has the potential to be a long debate, allow me to start with a few quick notes:
PROS:
All the statelite pros you can think of:
- Single image to manage (no Puppet etc.).
- Small RAM footprint (equivalent or close to equivalent to stateful).
- Unlike stateless, no RAM 'penalty' on image size.
On top of that I would add: once deployed, management is easy; much of the management burden is simply not there.

CONS:
- Harder to set up and deploy.
- Not many installations use statelite, so naturally you can expect more bugs and less field experience to draw on.
- NFS traffic: you need the right infrastructure to provide sufficient performance.
- The infrastructure is yet another component that can fail. Usually you would want HA, which is another layer that adds complexity.

We used GPFS cNFS as the HA mechanism, holding both the image and the node-specific files. This turned out to be a solid and reliable solution.
We used the service nodes as the GPFS cNFS servers, with /install mounted as a GPFS volume on all of them. This way we could also export the image itself with cNFS.

Bottom line: before we deployed our first statelite cluster I was a bit sceptical, but it is actually working very well and I would recommend it (well, depending on the specifics, of course).

I am pretty sure you would like some more details, so just let me know and I will be happy to provide them.


Gilad Berman
HPC Architect
Lenovo EMEA
+972-52-2554262
***@lenovo.com
 

Christopher Samuel
2017-04-05 22:56:41 UTC
Hiya,
Post by David D. Johnson
> Are there other advantages that any of you folks have found
> to state-lite that would make me want to convert? Do you use local storage or something
> like GPFS or NAS to hold the node-specific partitions?

We use statelite for all our x86 clusters and it's worked well for us.

The NFS server is just the management node (10gigE to the management
switch) and RAID5 or RAID6 (depending on how old the management node
is). There's not a lot of NFS load at all - we use a RAMdisk for
statelite boot.

Other advantages we've found are that we can:

1) keep /var/log on NFS for persistent logging across reboots for
non-syslog managed logs like Slurm.
2) keep the GPFS config directory on NFS so we don't need to
re-provision the node by hand on a reboot (we don't allow root logins
back to the management node in our xCAT config).

Here is our current litefile config for one of our clusters; the entries
marked as "persistent" are the ones stored on NFS:

***@merri-m:~# tabdump litefile
#image,file,options,comments,disable
"ALL","/etc/adjtime","tmpfs",,
"ALL","/etc/securetty","tmpfs",,
"ALL","/etc/lvm/","tmpfs",,
"ALL","/etc/ntp.conf","tmpfs",,
"ALL","/etc/rsyslog.conf","tmpfs",,
"ALL","/etc/rsyslog.conf.XCATORIG","tmpfs",,
"ALL","/etc/udev/","tmpfs",,
"ALL","/etc/ntp.conf.predhclient","tmpfs",,
"ALL","/etc/resolv.conf","tmpfs",,
"ALL","/etc/yp.conf","tmpfs",,
"ALL","/etc/resolv.conf.predhclient","tmpfs",,
"ALL","/etc/sysconfig/","tmpfs",,
"ALL","/etc/ssh/","tmpfs",,
"ALL","/etc/inittab","tmpfs",,
"ALL","/tmp/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/opt/xcat/","tmpfs",,
"ALL","/xcatpost/","tmpfs",,
"ALL","/root/.ssh/","tmpfs",,
"ALL","/var/adm/","persistent",,
"ALL","/var/mmfs/","persistent",,
"ALL","/var/spool/abrt/","persistent",,
"ALL","/var/lib/random-seed","persistent",,
"ALL","/var/run/slurm/","persistent",,
"ALL","/var/log/","persistent",,
"ALL","/var/spool/slurm/","persistent",,
"ALL","/var/crash/","persistent",,
"ALL","/boot/",,,
"ALL","/etc/kdump.conf",,,
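
[Editor's sketch, not part of Chris's config] For anyone comparing a table like the one above against their own, here is a minimal Python sketch that reads `tabdump litefile` style output and groups the paths by their options value. It assumes the output is plain CSV with a `#`-prefixed header line, as shown above. The names `SAMPLE`, `parse_litefile` and `group_by_option` are hypothetical helpers made up for this illustration, and the sample holds only a few of the rows listed above.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the tabdump output above (subset, for illustration).
SAMPLE = '''#image,file,options,comments,disable
"ALL","/etc/ssh/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/var/mmfs/","persistent",,
"ALL","/var/log/","persistent",,
"ALL","/boot/",,,
'''


def parse_litefile(text):
    """Return one dict per litefile row, keyed by the column names in the header."""
    lines = text.strip().splitlines()
    # The header line starts with '#'; strip it to recover the column names.
    header = lines[0].lstrip("#").split(",")
    body = io.StringIO("\n".join(lines[1:]))
    return list(csv.DictReader(body, fieldnames=header))


def group_by_option(rows):
    """Map each options value ('' when the column is empty) to its file paths."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["options"]].append(row["file"])
    return dict(groups)


if __name__ == "__main__":
    for option, files in group_by_option(parse_litefile(SAMPLE)).items():
        print(option or "(unset)", files)
```

In practice you would feed it the real output of `tabdump litefile` (for example, saved to a file) instead of `SAMPLE`; the "persistent" group is then the list of paths that end up on NFS.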

Best of luck!
Chris
--
Christopher Samuel Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545