Running on frontends

The group has access to a number of frontends for interactive usage. These are used for data analysis, compiling code, and shorter test runs. The priority is interactive data analysis of large data sets.

We are all dependent on these machines, so please:

  • Make sure there is enough memory available before you start a large job.
  • Release memory-demanding software as soon as it is no longer required.
  • Execute background jobs using "nice +19 command", so that interactive jobs can access the CPU power when needed (see the example below).
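
For example, in a bash shell (where the stand-alone GNU nice program takes the -n option; "nice +19" is the csh/tcsh builtin syntax), a low-priority background job could be started as follows -- the script and file names are just placeholders:

  nice -n 19 ./postprocess.sh input.dat > postprocess.log 2>&1 &   # start the job at the lowest CPU priority, detached from the terminal
  ps -u $USER -o pid,ni,cmd                                        # verify that it runs with niceness 19 (NI column)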

There are two different types of machines:

  • astro04
    • Is only accessible from inside the HPC system (through astro06-09).
    • It has 247 GB of shared memory and 48 cores.
    • There are two locally mounted file systems, /sc1 (66 TB) and /sc2/astro (77 TB), where you can make your own data directory.
    • This machine is used for data analysis and simple visualisation (no graphics card); the locally mounted file systems give much higher I/O speeds than can be obtained on astro06-09.
  • astro06-09
    • Are frontends for the HPC system and are accessible from the IP addresses on your positive list.
    • Each machine has 768 GB of shared memory and 32 cores (64 threads).
    • All data disks are NFS-mounted and therefore give relatively slow I/O.
      • Instead, part of the RAM (379 GB) may be used for temporary data storage through /dev/shm.
      • astro06 has an SSD drive, /scratch, with 2.7 TB. This can be used as temporary storage for enhanced I/O performance when doing data analysis; see the example after this list.
      • NOTE: Please remove your data from /scratch and /dev/shm at the end of the day (/scratch may be used for slightly longer work sessions).
    • All nodes have access to a graphics card, which allows you to run 3D visualisation tools with graphics acceleration through a VNC client -- cf. the Software pages.
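
As an illustration of using the fast temporary areas, a /scratch session on astro06 might look like the sketch below (the same pattern applies to /dev/shm); the data path, program name, and file names are placeholders:

  mkdir -p /scratch/$USER/run1                        # create a personal working directory on the local SSD
  cp /astro/$USER/bigdata.fits /scratch/$USER/run1/   # stage the input data onto the fast local disk
  cd /scratch/$USER/run1
  ./analyse bigdata.fits                              # run the (hypothetical) analysis program on the local copy
  cp results.dat /astro/$USER/                        # copy the results back to permanent, NFS-mounted storage
  rm -rf /scratch/$USER/run1                          # clean up at the end of the day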

Running MPI jobs on frontends and interactive reservations

When you compile your code with the default setup (Intel compiler + Intel MPI), the queue-system library plugin is automatically enabled. This is a good thing, because it ensures optimal performance and task distribution on the compute cluster. The drawback is that the resulting programs cannot be run interactively (e.g. from the command line) on the frontends.

There are two workarounds:

  • (recommended) Use srun and execute through the front-end queue astro_fe (a full compile-and-run example follows this list). This is as simple as:
    • srun -p astro_fe -n 12 -t 1:00:00 ./exe arguments [run on any frontend machine for 1 hour with 12 cores]
    • srun -p astro_fe -w astro06 -n 12 -t 1:00:00 ./exe arguments [run on astro06 for 1 hour with 12 cores]
  • (not recommended) Recompile the application with another mpi library, such as mvapich2
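
As a sketch of the recommended route, a complete compile-and-run cycle with the default Intel toolchain could look as follows; the source file, executable name and arguments are placeholders, and it assumes the standard Intel MPI compiler wrappers are on your path:

  mpiifort -O2 -o exe mycode.f90                      # compile with the Intel Fortran compiler and Intel MPI
  srun -p astro_fe -n 12 -t 1:00:00 ./exe arguments   # run interactively through the front-end queue, 12 cores for 1 hour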

Notice that srun will happily run on more than one frontend if necessary, and that this works equally well for running on the compute nodes. E.g.:

  • srun -p astro_short -n 12 -t 1:00:00 ./exe arguments [run on any cluster machine for 1 hour with 12 cores]

is also fine, and will launch the process on any cluster node.

The advantage of running on the frontend nodes is that it is possible to access larger amounts of memory, and it is easy to attach e.g. a debugger if needed. The advantage of running on the compute nodes is that the nodes are reserved exclusively for the program, and the compute node cores are faster.
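
For example, assuming gdb is installed on the frontends, attaching a debugger to one of your running processes is just a matter of finding its process id:

  ps -u $USER -o pid,cmd | grep exe   # find the process id of the rank you want to inspect
  gdb -p <PID>                        # attach gdb to that running process (replace <PID> with the actual id)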

If interactive shell access is required, this can be accomplished with:

  • srun -p astro_long -t 1:00:00 -n 12 --pty bash [get a bash shell on a compute node, and have 12 cores reserved for your program]

Inside this shell all SLURM environment variables are set up correctly, and in this environment it is possible to use e.g. mpiexec for launching MPI programs. Use astro_fe instead of astro_long to get a reservation on a frontend node.
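
Putting it together, a typical interactive session could look like this (the executable name and arguments are again placeholders):

  srun -p astro_fe -t 1:00:00 -n 12 --pty bash   # get an interactive shell on a frontend with 12 cores reserved
  mpiexec -n 12 ./exe arguments                  # inside that shell, launch the MPI program on the reserved cores
  exit                                           # leave the shell and release the reservation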