CSG Computing Cluster

Physical Description

The CSG Computing Cluster is a group of machines dedicated to CPU-intensive applications for individuals working with Dr. Goncalo Abecasis and Dr. Michael Boehnke. It consists of a few "gateway" machines and a set of dedicated nodes.

The machines run MOSIX and use uniqnames from the umich.edu AFS domain. OpenAFS is available on all MOSIX gateways, so users may use AFS passwords or Unix passwords. If you choose the latter, you will need to manage your cluster password in addition to your AFS and Novell passwords.

The gateway machines (fantasia, snowwhite and peterpan) are multi-processor machines running at 3.0+ GHz, each with far more real memory than the 'client' nodes. Each 'client' node in the cluster has two to eight processors (cores) running at 2.8-3.2 GHz and several GB of real memory. Some pictures of the cluster are available.

Details on how to use the MOSIX cluster are described here.

The overall status of the clusters can be seen here.

Performance details on the cluster can be seen here.

Policies

This cluster is not administered by SPH Computing Services. Problems or questions should be addressed to Terry Gliedt or Goncalo Abecasis. Please do not contact the SPH Help desk.

  • Security - users are required to take all reasonable precautions to prevent unauthorized access to this cluster. This includes choosing a secure password, keeping your password private, and keeping SSH sessions secure so that no one else can use your session.

  • HOME - the user's HOME will be on a local disk.

  • Shell - the user will get a 'tcsh' (smart C shell) by default. If you want another shell, just ask.

  • Backups - data in /home on this machine resides on a local RAID device to provide maximum protection against a single disk failure. Data will also be copied on a regular basis to another device for an additional level of protection. If your data on this machine is important, you are responsible for copying it somewhere (like AFS) where it will be safe.

  • Known Problems
    • MOSIX does not support shared memory system calls. You most likely never code these yourself, but the tools you use might. For instance, SAS always invokes a shared memory system call, so SAS does not work on the cluster nodes. Similarly, some R functions use shared memory, so an R application sometimes cannot run under runon. If you need to run SAS or such an R job, run it directly on the gateway node (without runon). If you have exceedingly long SAS or R tasks to run, consider running them elsewhere.
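
The choice described under Known Problems can be sketched in Python. This is only an illustration under stated assumptions: it assumes runon is used as a plain prefix to the command (the usage page linked above is the authority on its actual syntax), and the names SHARED_MEMORY_TOOLS and build_command are hypothetical helpers, not part of MOSIX or the cluster tooling. Because only some R functions use shared memory, listing R here is a deliberately conservative assumption.

```python
import os
from typing import List, Sequence

# Assumption: tools that may issue shared memory system calls.
# SAS always does; R is listed conservatively (only some functions do).
SHARED_MEMORY_TOOLS = {"sas", "R"}

# Assumption: runon is invoked as a simple prefix to the command.
RUNON_PREFIX = ["runon"]


def build_command(argv: Sequence[str]) -> List[str]:
    """Return the command to execute from a gateway machine.

    Shared-memory tools are returned unchanged so they run directly on
    the gateway; everything else is prefixed with runon.
    """
    if not argv:
        raise ValueError("empty command")
    tool = os.path.basename(argv[0])
    if tool in SHARED_MEMORY_TOOLS:
        return list(argv)
    return RUNON_PREFIX + list(argv)


if __name__ == "__main__":
    # Hypothetical job names, used only for illustration.
    print(build_command(["sas", "analysis.sas"]))
    print(build_command(["./simulate", "--reps", "1000"]))
```

The resulting list could be passed to subprocess.run; keeping the decision in one function makes it easy to adjust the tool list as you learn which of your programs fail on the nodes.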




$RCSfile: intro.html,v $
$Date: 2008-02-27 15:31:29 $
$Revision: 1.1 $