Diskless Linux Cluster      
January 27, 2000
Presented by Stephen Inkpen and Michael Rayment

Reasons for Using Diskless Linux:

1. To manage a large number of workstations in a consistent and efficient manner. Management includes software updates, bug fixes, system configuration, filesystem integrity, security, etc.

2. To control a group of computers from a central location. These computers could be embedded processors or large numbers of computers ganged together to solve compute-intensive problems (Beowulf clusters).


Motivation for the use of diskless Linux within the Department of Computer Science:

1. Lab computers are subject to a wide range of abuses that can corrupt the system disk, for example powering off or resetting a computer without shutting it down properly.

2. System software can be upgraded without worrying about whether each machine is turned on.

3. System security is improved through the use of read-only file systems and verification programs such as Tripwire, which ensure the integrity of the image that is presented to the user.


Design Considerations:

1. We did not want to sacrifice the overall workstation performance as a result of moving to a diskless solution.

2. We wanted to build in robustness so that we would not be incapacitated by a single point of failure.


Final Design:

1. Swapping over NFS was felt to compromise overall workstation performance drastically, so we opted for 128 megabytes of memory in each workstation. The system does perform moderately well on as little as 64 megabytes as long as there are not too many applications or windows running simultaneously. (Note: we used xosview to determine this value.)

2. A 100 megabit/sec switched Ethernet networking infrastructure was chosen to interconnect the clients, and 1000 megabit/sec was chosen to connect to the servers.

3. Two system software servers equipped with 700 megahertz AMD processors and 256 megabytes of memory were chosen to deliver high-speed file services to the desktop. Essentially these servers are intelligent network disk controllers. Two identical systems were chosen so that we would be protected against a major system failure. The system disks on the servers use software RAID 1 to further enhance robustness.

4. User files are kept on separate servers to prevent the memory cache on the system software servers from being recycled.
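As a rough cross-check of the memory figures in item 1, the amount of memory actually in use by applications can be read from /proc/meminfo on a running workstation. This is only a sketch: xosview shows the same kind of information graphically, and the function name and the choice to subtract buffers and cache are assumptions made for illustration, not the method used for the measurements above.

```shell
#!/bin/sh
# Print the memory in use by applications, in kilobytes.
# Argument: a meminfo-style file (defaults to /proc/meminfo).
# Buffers and page cache are subtracted because the kernel can
# reclaim them when applications need the space.
mem_in_use_kb() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END { print total - free - buffers - cached }
    ' "${1:-/proc/meminfo}"
}

# Usage on a workstation:
#   mem_in_use_kb
```

If the figure stays well below the installed memory while a typical set of applications is open, the machine should not need to swap.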


Installation and Testing

The following steps were undertaken to bring up our diskless workstations:

1. Boot PROMs were made for the Ethernet cards in our lab computers. They were then tested to make sure that they sent out the proper DHCP/BOOTP requests.

2. A shared kernel image was compiled with the necessary options turned on. In particular you need to enable automatic configuration via BOOTP and the NFS root file system. Clearly you also need at least one network adaptor driver and TCP/IP support. You do not need ext2 filesystem support, but you may want to include support for CD-ROM filesystems or other peripherals.

3. The DHCP configuration table was created with an entry for each workstation consisting of the MAC address of its Ethernet card, the host name, the IP address, the root file system and the system image. Note that you can have a different image loaded for each system, but we chose to use a common image since all the computers are the same. Similarly you can use different root file systems or a shared root file system.

4. The workstations were tested to see that they loaded the image properly. You need to make sure that tftp is running and that the directories leading to the image and the image itself are accessible to anybody. To enable tftp, look at the inetd.conf file in /etc. Make sure that the directory that tftp serves is the one where you put your image.

5. Make sure that you have nfs support compiled into the kernel of the server.

6. Edit the file that allows NFS clients to connect to your server. This file is /etc/exports and contains lines indicating which client systems can mount which directories. Make sure that you do not allow unlimited access to sensitive information, and export read-only those filesystems that the clients do not need to write to.

7. Reboot the clients and see if they mount up the proper root file system.

8. Configure the fstab and the start up scripts to mount the various user partitions as required. User partitions and the var partition must be mounted read/write, whereas the root file system and the software under /usr can be mounted read-only, except for some configuration information that has to be stored individually in /etc.

9. The following code is included at the very beginning of the start up scripts to complete the configuration of the network before carrying on with the setup. Note that you can disable the normal network configuring script as the network has already been configured automatically by bootp.

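# The host name selects the per-host /var directory below
HOSTNAME=`hostname`
echo configuring system ${HOSTNAME} ....

# Bring up the loopback interface and its route
echo doing ifconfig for lo
ifconfig lo 127.0.0.1 netmask 255.0.0.0 broadcast 127.255.255.255
echo doing route for lo
route add -net 127.0.0.0 netmask 255.0.0.0 lo

# The portmapper must be running before the RPC services below
echo running the portmapper...
portmap

# Mount the per-host /var from the server (-n: do not write to /etc/mtab)
echo mounting /var partition
mount -n -t nfs 134.153.1.9:/diskless/vars/${HOSTNAME} /var

echo mounting /proc partition
mount -n -t proc /proc /proc

# Status daemon used for NFS file locking
echo running rpc.statd ...
rpc.statd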

10. Get rid of superfluous functions in the crontab entries.
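For step 2, a quick way to confirm that a kernel configuration has the required options built in is to look for them in the .config file. The sketch below assumes the usual option names from the kernel source (CONFIG_INET, CONFIG_IP_PNP, CONFIG_IP_PNP_BOOTP, CONFIG_NFS_FS, CONFIG_ROOT_NFS); the function name is made up for illustration, and the driver for the network adaptor is not checked because it depends on the card.

```shell
#!/bin/sh
# Report whether a kernel .config enables what a diskless client needs.
# Argument: path to the .config file (defaults to ./.config).
# Returns 0 if every option is built in, 1 if any is missing,
# 2 if the file cannot be read.
check_diskless_config() {
    config=${1:-.config}
    [ -r "$config" ] || { echo "cannot read $config" >&2; return 2; }
    missing=0
    for opt in CONFIG_INET CONFIG_IP_PNP CONFIG_IP_PNP_BOOTP \
               CONFIG_NFS_FS CONFIG_ROOT_NFS; do
        # Only "=y" counts: a module could not be loaded before
        # the root file system is mounted.
        if grep -q "^$opt=y\$" "$config"; then
            echo "ok       $opt"
        else
            echo "MISSING  $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the top of the kernel source tree:
#   check_diskless_config .config
```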

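For step 3, the per-workstation entries can be generated from a short table rather than typed by hand. The sketch below assumes the ISC dhcpd server and its host, hardware ethernet, fixed-address, filename and option root-path statements. The host names, MAC addresses, IP addresses, image path and root path are placeholders chosen for illustration, not the values of the installation described here.

```shell
#!/bin/sh
# Placeholder values: adjust to the local setup.
IMAGE=/tftpboot/vmlinuz-diskless   # kernel image served by tftp (assumed path)
ROOT=/diskless/root                # shared NFS root file system (assumed path)

# Print one dhcpd "host" entry.
# Arguments: host name, MAC address, IP address.
dhcp_host_entry() {
    printf 'host %s {\n' "$1"
    printf '    hardware ethernet %s;\n' "$2"
    printf '    fixed-address %s;\n' "$3"
    printf '    filename "%s";\n' "$IMAGE"
    printf '    option root-path "%s";\n' "$ROOT"
    printf '}\n'
}

# Read "name mac ip" lines from standard input and print an entry for each.
dhcp_host_table() {
    while read -r name mac ip; do
        # Skip blank lines and comment lines.
        case "$name" in ''|'#'*) continue ;; esac
        dhcp_host_entry "$name" "$mac" "$ip"
    done
}

# Example table with two made-up workstations.
dhcp_host_table <<'EOF'
# name  MAC address        IP address
ws01    00:50:56:00:00:01  192.168.1.101
ws02    00:50:56:00:00:02  192.168.1.102
EOF
```

Because every entry names the same image and root path, this matches the common-image, shared-root choice described in step 3; a per-host image would only need an extra column in the table.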

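For step 6, the lines for /etc/exports can be produced the same way. In this sketch the shared root is exported read-only to every client, and each client gets its own /var directory read/write under /diskless/vars, the path used in the start up script. The root path, the client names and the no_root_squash option are assumptions made for illustration.

```shell
#!/bin/sh
ROOT=/diskless/root   # assumed location of the shared root file system
VARS=/diskless/vars   # per-host /var directories, as in the start up script

# Print /etc/exports lines for the clients named as arguments.
exports_lines() {
    # Shared root: one line naming every client, read-only.
    printf '%s' "$ROOT"
    for host in "$@"; do
        printf ' %s(ro,no_root_squash)' "$host"
    done
    printf '\n'
    # Private /var: read/write, exported to the owning client only.
    for host in "$@"; do
        printf '%s/%s %s(rw,no_root_squash)\n' "$VARS" "$host" "$host"
    done
}

# Example with two made-up workstations.
exports_lines ws01 ws02
```

Listing each client by name, rather than exporting to everyone, follows the advice in step 6 not to allow unlimited access.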
Conclusions Based on our Experience

1. Performance of the lab workstations is noticeably improved, especially under loaded conditions such as the start of a lab and the launching of many large applications.

2. Software is extremely easy to install.

3. More work is needed in configuring rsync to ensure that the redundant images are always kept in sync.
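As a starting point for item 3, a mirror from one server to the other could be built around a single rsync command. This is a sketch under stated assumptions, not a tested configuration: the peer name server2 and the /diskless/ directory are placeholders, and it assumes rsync can reach the peer through a remote shell. The options used are -a (archive), -H (preserve hard links), --delete (remove files that no longer exist on the source), -n (dry run) and -v (verbose).

```shell
#!/bin/sh
# Placeholder names: the second server and the image directory.
PEER=server2
IMAGE_DIR=/diskless/

# Print the rsync command line. With "check" as the first argument,
# -n (dry run) is added so that nothing is changed on the peer.
sync_command() {
    opts="-aH --delete"
    if [ "${1:-}" = "check" ]; then
        opts="$opts -n -v"
    fi
    echo "rsync $opts $IMAGE_DIR $PEER:$IMAGE_DIR"
}

# Show both forms; run one with: eval "$(sync_command)"
sync_command check
sync_command
```

Running the check form first shows what would be transferred or deleted, which is a cheap way to see whether the two images have drifted apart.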



Useful Links

PVM3 http://www.epm.ornl.gov/pvm/pvm_home.html
LAM MPI http://www.mpi.nd.edu/lam/
MPICH http://www-unix.mcs.anl.gov/mpi/mpich/
MPI FORUM http://www.mpi-forum.org/docs/docs.html
SETI@home http://www.setiathome.ssl.berkeley.edu/
distributed.net http://www.distributed.net



Remotely Controlled Embedded Processors

Examples:

Home Automation
The server could be located somewhere in the house and would control all devices through the X-10 protocol. Everything could be controlled that way, such as the VCR, TV, lighting, heating, telephone, etc.

Airplane Industry
All displays and controls are routed through the central server in the plane. If something breaks, the server switches control to another display. The new display downloads its operating instructions from the server and then begins performing the new service.



Photos from the seminar

[Photos: Mike and Steve; the auditorium; the cluster; Steve with his cluster]
Photos by Sheldon Andrews