Debugging and profiling

Debugging

A number of debuggers are available on the cluster (a short usage sketch follows the list):

  • The standard debugger for the GNU compilers is gdb
  • The debugger supplied with the Intel compiler suite is a modified version of gdb, called gdb-ia
  • The Portland Group Fortran compiler ships with the graphical debugger pgdbg
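
As a minimal sketch (program and source file names here are only examples), compile with debug symbols and start the matching debugger:

    # GNU toolchain: compile with debug symbols, then debug with gdb
    gfortran -g -O0 -o myprog main.f90
    gdb ./myprog

    # Intel toolchain: same idea, but with gdb-ia
    ifort -g -O0 -o myprog main.f90
    gdb-ia ./myprog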

OpenMP correctness checking

To check a program for race conditions, memory errors and many other memory- or OpenMP-related problems, run it through the Intel Inspector tool. The steps to follow are listed below (a command sketch is given after the list):

  • Load an Intel compiler and the corresponding version of Intel Inspector using the module command, e.g. module load intel intel/inspector will work.
  • Recompile the program.
  • Prepare a small test setup. Intel Inspector runs the program 5-50 times slower, because it checks every memory reference.
  • Launch the Intel Inspector GUI with inspxe-gui, set up a project for your executable, and run the analysis. The GUI is heavy; it is recommended to run it in a VNC session [see Software-VNC].
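
A minimal sketch of this workflow, assuming a small Fortran test case in test_case.f90 (file and executable names are examples):

    module load intel intel/inspector
    ifort -g -O0 -qopenmp -o mytest test_case.f90   # small test case, no optimization
    inspxe-gui &                                    # create a project for ./mytest and run the analysis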

Profiling

Intel provides some useful profiling tools that are installed on the cluster. The most interesting are Advisor (for vectorization analysis) and Amplifier (for general profiling).

Advisor

To run Advisor, load a module intel/advisor/* and open the GUI by typing advixe-gui (assuming that you ssh-ed with the -X or -Y option). Compile your code with ifort using the debug option -g and full optimization (e.g. -ip -ipo -O3). Don't forget to force vectorization with -vec -simd, and add -qopenmp if you want to use OpenMP. Then start a new project and select your executable. It is important that your code runs "gently" until the end, i.e. without crashing or being interrupted manually; for this reason it is probably better not to test a code that runs for 10 hours. Press the Collect ▶ button and wait. The results page shows where vectorization worked properly, where it did not, and where it is worth spending some time changing your code to a vectorized version. More information here.
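
For example (a sketch only; the module version, source file and executable names should be adapted to your case):

    module load intel intel/advisor
    ifort -g -O3 -ip -ipo -vec -simd -qopenmp -o mycode mycode.f90
    advixe-gui &   # create a new project pointing at ./mycode, then press Collect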

Amplifier

This tool allows you to profile your code. Load the module intel/vtune/* and then start the GUI by typing amplxe-gui. Compile your code with ifort using the debug option -g and full optimization (e.g. -ip -ipo -O3). Start a new project and add the name of your executable. Then select "Basic Hotspots", press the Start ▶ button, and wait until the code "gently" ends (i.e. without crashes). You will get several tabs containing the results. The "Summary" tab lists the most expensive parts of your code, while the "Bottom-up" tab shows who is calling whom. The other tabs are intuitive. Double-clicking on a table item shows the source code, with more information on the time spent on the bottleneck code lines. More information here.
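
For example (again a sketch; module version and file names are examples, and the command-line collector can be used instead of the GUI):

    module load intel intel/vtune
    ifort -g -O3 -ip -ipo -qopenmp -o mycode mycode.f90
    amplxe-gui &                                           # new project -> ./mycode -> "Basic Hotspots" -> Start
    # or, without the GUI:
    amplxe-cl -collect hotspots -r ~/r_hotspots -- ./mycode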

Using the Intel tools with MPI

If you want to use Intel Inspector/Advisor on the cluster with MPI, some extra work is required. The steps are listed below, and a sketch combining them is given at the end.

  1. Compile with mpiifort.org rather than the usual mpiifort.
    • Generally, Intel recommends that you compile your program with -g -check none -O0 -qopenmp for Inspector, and -g -O2 -vec -simd -qopenmp for Advisor; VTune TBD.
  2. Ask SLURM for resources. For example:
    srun -p astro_devel -t 1:00:00 --nodes=2 --pty bash
    • Note that you can also use the front-ends (astro_fe).
  3. By using mpiifort.org, we are circumventing the SLURM libraries. This has the benefit that we can use mpiexec.hydra directly, but the downside is that the MPI-related environment variables set by SLURM get in the way and MPI doesn't know which hosts it should use.
    • Invoke scontrol show hostnames > hosts; if you run on a front-end, your host file should only have one entry.
    • Unset the following environment variables:
      unset I_MPI_HYDRA_JMI_LIBRARY
      unset I_MPI_HYDRA_BOOTSTRAP
      unset I_MPI_PMI_LIBRARY
  4. Run the command-line version of the Intel tool through mpiexec.hydra. If you don't know what options to set, launch the GUI version on a front-end, set up your experiment, and then click the "command line" button and copy-paste.
    • For example:
      mpiexec.hydra -f hosts inspxe-cl -collect mi1 -knob stack-depth=8 -r ~/r109mi1 -- hostname
    • Note that there is an outstanding issue with the result folders created by the Intel tools (e.g., r109mi1) and the Lustre setup. It is thus recommended that you set the result directory to somewhere in your home directory (using, e.g., -r ~/foldername) until the issue is resolved.
    • Additional note: if you run on a front-end, you might see an error at the end of the experiment stating that the results directory is busy or cannot be accessed. This is because more than one MPI rank is trying to write to the same place at the same time; in the end, you only have access to the results from one of your MPI processes, but you will still have some result to look at. If, on the other hand, you run on the compute nodes, the Intel tools are smart enough to create a folder for each host/node.
  5. Open the results file(s) in the Intel Inspector/Advisor GUIs on a front-end.
  6. Start fixing bugs and improving performance! ;)
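
Putting these steps together, a complete Inspector run over MPI might look like the following sketch (the partition, module names, source file and result directory are examples; adapt them to your case):

    # 1. load modules and compile with the site wrapper
    module load intel intel/inspector
    mpiifort.org -g -check none -O0 -qopenmp -o mycode mycode.f90

    # 2. get an interactive allocation
    srun -p astro_devel -t 1:00:00 --nodes=2 --pty bash

    # 3. build the host file and clean up the environment
    scontrol show hostnames > hosts
    unset I_MPI_HYDRA_JMI_LIBRARY
    unset I_MPI_HYDRA_BOOTSTRAP
    unset I_MPI_PMI_LIBRARY

    # 4. run the command-line collector through mpiexec.hydra
    mpiexec.hydra -f hosts inspxe-cl -collect mi1 -knob stack-depth=8 -r ~/r_mycode -- ./mycode

    # 5. open the result directory (here ~/r_mycode) in inspxe-gui on a front-end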