HPC Performance Calculator HPL Benchmark

HPC performance is measured in flops (floating-point operations per second).
Rpeak - Theoretical peak performance.
Rmax - Practical (measured) performance value.
Mflops = 10^6 flops (Millions of Flops)
Gflops = 10^9 flops (Billions of Flops)
Tflops = 10^12 flops (Trillions of Flops)
Pflops = 10^15 flops (Peta Flops)
Eflops = 10^18 flops (Exa Flops)
Zflops = 10^21 flops (Zetta Flops)
Yflops = 10^24 flops (Yotta Flops)
Performance depends on:
CPU - Memory - Cluster interconnect network.
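As a small illustration of the unit prefixes listed above, here is a minimal Python sketch that turns a raw flops figure into a readable string. The function name format_flops and the two-decimal formatting are choices made for this example only.

```python
# Unit prefixes from the list above, largest first.
UNITS = [
    ("Yflops", 10**24),
    ("Zflops", 10**21),
    ("Eflops", 10**18),
    ("Pflops", 10**15),
    ("Tflops", 10**12),
    ("Gflops", 10**9),
    ("Mflops", 10**6),
]


def format_flops(flops):
    """Return a readable string using the largest prefix that fits."""
    for name, scale in UNITS:
        if flops >= scale:
            return f"{flops / scale:.2f} {name}"
    return f"{flops:.2f} flops"


print(format_flops(1.5e12))  # 1.50 Tflops
print(format_flops(2.5e9))   # 2.50 Gflops
```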

HPL - High Performance Linpack.
HPL calculator sheet for tuning HPL.dat:
http://www.advancedclustering.com/faq/how-do-i-tune-my-hpldat-file.html
HPL calculator:
http://hpl-calculator.sourceforge.net/
 
HPL.dat Important Parameters

P*Q - Processor grid (P rows by Q columns of processes).
Ns - Problem size; it depends on memory (Ns should be the largest problem that fits into memory to get the best performance).
N - Number of problem sizes to run.
NB - Block size.
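For the P*Q processor grid, a commonly quoted rule of thumb (an assumption here, not something stated in these notes) is to keep the grid close to square with P no larger than Q. The sketch below picks such a pair for a given process count; the function name process_grid is hypothetical.

```python
import math


def process_grid(num_procs):
    """Return (P, Q) with P * Q == num_procs, P <= Q, and P as large as possible.

    Assumption: a near-square grid with P <= Q is a reasonable starting point;
    the best grid for a given cluster should still be found by testing.
    """
    p = math.isqrt(num_procs)      # largest integer not above sqrt(num_procs)
    while num_procs % p != 0:      # step down until P divides the process count
        p -= 1
    return p, num_procs // p


print(process_grid(16))  # (4, 4)
print(process_grid(8))   # (2, 4)
```

A prime process count falls back to a 1 x N grid, which is usually a sign that a different process count is worth considering.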


LINPACK - Not efficient for matrix problems on machines with memory hierarchies.
LAPACK - Solves matrix problems in linear algebra; it uses block operations, which suit memory hierarchies.

HPL depends on
BLAS (Basic Linear Algebra Subprograms)
(or)
VSIPL (Vector Signal Image Processing Library)
 Download link: HPL Benchmark - http://netlib.org/benchmark/hpl/
 After extracting the package:
1) Inside the setup folder, choose the makefile that matches your architecture. Ex: Make.Linux_PII_CBLAS
2) Inside this file, edit the variables according to your current setup.
For example:
a) Set the HPL architecture
ARCH         = HPL_FBLAS
TOPdir       = /opt/cluster/benchmark/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
HPLlib       = $(LIBdir)/libhpl.a
b) Set the MPI environment variables
MPdir        = /opt/cluster/mpi/mvapich1_intel
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
Library file name to use in MPlib:
For an MPI library - libmpi.a
For the MPICH library - libmpich.a

c) Set the BLAS library
If a BLAS library is not installed, install one with
#yum install blas blas-devel or #yum install atlas atlas-devel
LAdir        =
LAinc        =
LAlib        = /opt/cluster/benchmark/GotoBLAS/libgoto.a
d) Specify the compiler
CC           = /opt/cluster/mpi/mvapich1_intel/bin/mpicc

3) Run make arch=<arch>, where <arch> is the suffix of the makefile name.
For example, for Make.Linux_PII_CBLAS:
Install HPL
#make arch=Linux_PII_CBLAS 2>&1 | tee OUTPUT.log


How to Calculate the HPL.dat Values

N value (problem size)
It depends on
the total memory size of the cluster.
Refer to the link below.
http://netlib.org/benchmark/hpl/faqs.html#pbsize
Following that FAQ, for example:
Nodes - 4
Memory per node - 256 MB
Total memory = 4*256 MB = 1 GB; in bytes, 1*1024*1024*1024 = 1073741824
Each double-precision matrix element takes 8 bytes, so N = sqrt((4*256*1024*1024)/8) = sqrt(134217728) = 11585 (rounded down).
The calculator at http://www.advancedclustering.com/faq/how-do-i-tune-my-hpldat-file.html gives the same value.
Full memory = 11585
For 88% of that value = 11585*0.88 = 10194 (rounded down)
This value, 10194, is then divided by the block size, 128.
The result should be an integer. In this case
10194/128 = 79.64,
so round up to 80 and multiply: 80*128 = 10240.
Reason: N should be an integer multiple of the block size NB so that the matrix divides evenly into blocks when the HPL job runs.
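The arithmetic above can be reproduced with a short Python sketch. The function name hpl_problem_size and its parameter names are made up for this example; the 88% fraction, the block size of 128 and the round-up step simply follow the worked example above.

```python
import math


def hpl_problem_size(nodes, mem_per_node_mb, nb=128, fraction=0.88):
    """Return (n_full, n_target, n_final) for the given cluster memory.

    n_full   - largest N whose N*N matrix of 8-byte doubles fits in total memory
    n_target - n_full scaled by `fraction` (0.88 in the example above)
    n_final  - n_target rounded up to a multiple of the block size nb
    """
    total_bytes = nodes * mem_per_node_mb * 1024 * 1024
    n_full = math.isqrt(total_bytes // 8)   # 8 bytes per double-precision element
    n_target = int(n_full * fraction)       # truncate, as in the worked example
    blocks = math.ceil(n_target / nb)       # 79.64 -> 80
    return n_full, n_target, blocks * nb


print(hpl_problem_size(4, 256))  # (11585, 10194, 10240)
```

Rounding up follows the example above; replacing math.ceil with math.floor gives 79*128 = 10112, a slightly smaller N that stays under the 88% target.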

To calculate the HPC theoretical performance (Rpeak):
HPC Cluster Performance = <Total No of Nodes> * <No of Cores Per Node> * <CPU Frequency in GHz> * <Operations Per Cycle>
= Answer in GFLOPS
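The formula above is easy to sketch in Python. The cluster figures used below (4 nodes, 8 cores per node, 2.5 GHz, 4 operations per cycle) and the measured value of 240 GFLOPS are hypothetical numbers chosen only to show the calculation, and the efficiency helper (Rmax divided by Rpeak) is an addition for illustration.

```python
def rpeak_gflops(nodes, cores_per_node, freq_ghz, ops_per_cycle):
    """Theoretical peak in GFLOPS: nodes * cores * GHz * operations per cycle."""
    return nodes * cores_per_node * freq_ghz * ops_per_cycle


def efficiency(rmax_gflops, rpeak):
    """Fraction of the theoretical peak actually reached by a benchmark run."""
    return rmax_gflops / rpeak


# Hypothetical cluster: 4 nodes, 8 cores each, 2.5 GHz, 4 operations per cycle.
peak = rpeak_gflops(4, 8, 2.5, 4)
print(peak)                     # 320.0
print(efficiency(240.0, peak))  # 0.75
```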

Note: HPL performance depends heavily on the processor parameters above (core count, frequency, operations per cycle).


1) HPL.dat file editing
Line 4 : device out (6=stdout,7=stderr,file) [ set this to 6 to get the output on stdout ]
Line 8 : 128 NBs [ a block size of 128 is a good choice; line 7 holds the number of block sizes ]
For lines 14 to 19:
The even-numbered lines (14, 16, 18) give the number of values to try.
The odd-numbered lines (15, 17, 19) give the values themselves.
Line 14 : 3 # of panel fact [ better to set this count to 1 ]
Line 15 : 0 1 2 PFACTs (0=left, 1=Crout, 2=Right) [ if the count is 1, list only one factorization, e.g. 0 ]
Line 16 : 4 # of recursive stopping criterium
Line 17 : 1 2 4 8 NBMINs (>= 1)
Line 18 : 3 # of panels in recursion
Line 19 : 2 3 4 NDIVs
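HPL runs one test for every combination of the values listed on these lines, so long lists multiply the run time quickly. The following Python sketch enumerates the combinations for the values shown above; the function name parameter_combinations is hypothetical, and other HPL.dat lines (problem sizes, block sizes, grids and so on) would multiply the count further.

```python
from itertools import product


def parameter_combinations(pfacts, nbmins, ndivs):
    """Return every (PFACT, NBMIN, NDIV) combination that would be tried."""
    return list(product(pfacts, nbmins, ndivs))


# Values from lines 15, 17 and 19 above.
combos = parameter_combinations([0, 1, 2], [1, 2, 4, 8], [2, 3, 4])
print(len(combos))  # 36
print(combos[0])    # (0, 1, 2)
```

Setting each count to 1, as suggested for line 14, reduces this to a single combination.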

GotoBLAS -
1) Fortran and assembly code implementation of the BLAS interface.
It is one of the fastest BLAS libraries.
2) Supports multi-threading.
