NVIDIA GPU HPL Benchmark: Important Commands and Attributes | Benchmark Errors and Solutions

1) List the NVIDIA devices.

$ lspci -kv | grep -i -A 10 nVidia
02:00.0 3D controller: nVidia Corporation Device 1028 (rev a1)
        Subsystem: nVidia Corporation Device 1015
        Flags: bus master, fast devsel, latency 0, IRQ 32
        Memory at dd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3c0fe0000000 (64-bit, prefetchable) [size=256M]
        Memory at 3c0ff0000000 (64-bit, prefetchable) [size=32M]
        [virtual] Expansion ROM at de000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidia, nouveau, nvidiafb
03:00.0 3D controller: nVidia Corporation Device 1028 (rev a1)
        Subsystem: nVidia Corporation Device 1015
        Flags: bus master, fast devsel, latency 0, IRQ 40
        Memory at db000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3c0fc0000000 (64-bit, prefetchable) [size=256M]
        Memory at 3c0fd0000000 (64-bit, prefetchable) [size=32M]
        [virtual] Expansion ROM at dc000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidia, nouveau, nvidiafb

84:00.0 3D controller: nVidia Corporation Device 1028 (rev a1)
        Subsystem: nVidia Corporation Device 1015
        Flags: bus master, fast devsel, latency 0, IRQ 64
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3c1fe0000000 (64-bit, prefetchable) [size=256M]
        Memory at 3c1ff0000000 (64-bit, prefetchable) [size=32M]
        [virtual] Expansion ROM at fb000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidia, nouveau, nvidiafb

2) Get detailed information about the devices.
# nvidia-smi -a

3) Monitor the processes running on the devices.
# nvidia-smi

Important Attributes of the HPL Make.<arch> and HPL.dat Files.

1) In Make.<arch>, prefix header (include) directories with -I and library directories with -L.
MPinc        = -I/shared/apps/openmpi/intel-compilers/openmpi-1.6.3/include/
LAlib        = -L $(TOPdir)/src/cuda  -L/shared/apps/cuda/cuda-5.0/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5

2) System hang.
If the Ns value in HPL.dat is too large for the available physical memory, HPL starts using swap memory, which can hang the system.
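
A common rule of thumb is to size Ns so that the N x N matrix of 8-byte doubles fills only part of physical memory. The sketch below follows that idea; the function name, the 0.8 fraction and the rounding to a multiple of NB are assumptions for illustration, not values taken from this setup.

```python
import math

def max_problem_size(total_mem_bytes, nb, fraction=0.8):
    """Largest N (a multiple of nb) whose N x N double-precision matrix
    fits in `fraction` of the given memory.

    total_mem_bytes: physical memory summed over all nodes in the run
    nb:              the NBs block size planned for HPL.dat
    fraction:        share of memory given to the matrix (assumed 0.8)
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # N*N elements of 8 bytes each must fit in the memory budget.
    elements = int(total_mem_bytes * fraction) // 8
    n = math.isqrt(elements)
    # Round down so that N is a whole number of NB blocks.
    return (n // nb) * nb
```

For example, `max_problem_size(64 * 1024**3, 768)` gives a starting Ns for a single node with 64 GiB of RAM. Lower the fraction if the system still touches swap during the run.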

NVIDIA HPL Benchmark Error Messages and Solutions.

Error Message 1: The following error appears even though the correct number of MPI processes was requested.
HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 8 processes for these tests <<<
HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 8 processes for these tests <<<
HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<
........................................
.........................................
HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<
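
HPL prints this message when the largest P x Q process grid in HPL.dat needs more MPI ranks than the job actually started. The sketch below reads the Ps and Qs lines from HPL.dat-style text and reports how many ranks are needed; the function names are hypothetical and the parser assumes the usual "values followed by a label" layout of HPL.dat.

```python
def grid_sizes(hpl_dat_text):
    """Return the (P, Q) pairs listed on the 'Ps' and 'Qs' lines."""
    ps, qs = [], []
    for line in hpl_dat_text.splitlines():
        values = []
        for token in line.split():
            if token.isdigit():
                values.append(int(token))
            else:
                label = token  # first non-numeric token names the line
                break
        else:
            continue  # blank line or no label: nothing to record
        if label == "Ps":
            ps = values
        elif label == "Qs":
            qs = values
    return list(zip(ps, qs))

def required_ranks(hpl_dat_text):
    """Number of MPI ranks needed for the largest grid in the file."""
    return max(p * q for p, q in grid_sizes(hpl_dat_text))
```

Compare the result with the process count passed to mpirun. If the two already match, as in this case, the launcher itself is the likely cause.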
SOLUTION: Load the correct MPI environment module with the module command, so that the mpirun that launches the job belongs to the same MPI installation xhpl was built against.

Error Message 2: Not enough GPUs on node
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0
[compute022:23207] 5 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
[compute022:23207] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
!!! ERROR: Not enough GPUs on node compute0xx, 3 GPUs found, 6 GPUs required !!!
--------------------
SOLUTION: The HPL job assigns its MPI processes to the NVIDIA GPU cards. If more processes are started on a node than it has GPU cards, the run stops with this message. Here the node has only 3 GPU cards, so start at most 3 processes on it.
If this setting is not recognized, the xhpl job does not run on the GPU cards but only on the CPUs, and the result will not be the expected one.
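
As a pre-flight check, the number of processes per node can be compared with the GPU count before submitting the job. This is a minimal sketch that assumes `nvidia-smi -L` is available on the compute node and prints one line per GPU beginning with "GPU "; the helper names are hypothetical.

```python
import subprocess

def count_gpus(listing):
    """Count GPUs in the text printed by `nvidia-smi -L`."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))

def ranks_fit(ranks_per_node, listing):
    """True if every MPI process on the node can get its own GPU."""
    return ranks_per_node <= count_gpus(listing)

def query_gpu_listing():
    """Run `nvidia-smi -L` on the local node and return its output."""
    result = subprocess.run(["nvidia-smi", "-L"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the node from the log above, `ranks_fit(6, query_gpu_listing())` would be false and `ranks_fit(3, query_gpu_listing())` true.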

Error Message 3: Header File Not Found.

In file included from /home/hcl/code/hpl-2.0_FERMI_v15/include/hpl.h:90,
 from ../HPL_dlacpy.c:50:
/home/hcl/code/hpl-2.0_FERMI_v15/include/hpl_ptimer.h:84: error: expected ‘)’ before ‘comm’
make[2]: *** [HPL_dlacpy.o] Error 1
make[2]: Leaving directory `/home/hcl/code/hpl-2.0_FERMI_v15/src/auxil/CUDA'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory `/home/hcl/code/hpl-2.0_FERMI_v15'
make: *** [build] Error 2
SOLUTION: Export the MPI and CUDA header directories in C_INCLUDE_PATH so that the compiler can find the header files.
export C_INCLUDE_PATH=/shared/apps/openmpi/openmpi-1.6.3/include/:/home/shared/apps/cuda/cuda-5.5/include/ 
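
The "expected ')' before 'comm'" error is consistent with mpi.h not being found, which leaves the MPI_Comm type undeclared. Before running make again, it may help to confirm that the exported path really contains the header. The sketch below does that; `find_header` is a hypothetical helper.

```python
import os

def find_header(header, search_path):
    """Return the first directory in a colon-separated search path
    that contains `header`, or None if no directory has it."""
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, header)):
            return directory
    return None
```

Calling `find_header("mpi.h", os.environ.get("C_INCLUDE_PATH", ""))` returns None when none of the exported directories holds mpi.h, which usually points to a wrong or mistyped path.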

