OpenCL Big-Bang (N-body) benchmark for multi-device systems (heterogeneous or not)

Hi, here's a benchmark for all of you benchmarkers. It needs Java and OpenCL drivers installed.

New version! Vector-type (VLIW, MMX, AVX) workload resolution increased.

Benchmark in RAR file (English): http://www.fileswap.com/dl/zaZqomw3bk/
If WINDOWS.BAT does not work, here is an alternative: http://www.fileswap.com/dl/h69qAKCfd7/
Benchmark with the alternative batch file (English): http://www.fileswap.com/dl/qJiBMxfYzw/
YouTube video showing usage: http://www.youtube.com/watch?v=YNj10f-7yTA
If you don't have Java: http://java.com/en/download/index.jsp

Drivers:
INTEL (HD3000 has problems, HD4000 works, no problems for the CPU part, Xeon Phi included): http://software.intel.com/en-us/vcsource/tools/opencl-sdk
AMD (AMD APP SDK, may or may not be needed): http://developer.amd.com/tools-and-...erated-parallel-processing-app-sdk/downloads/
NVIDIA OpenCL: http://www.nvidia.com/Download/index.aspx?lang=en-us
https://developer.nvidia.com/cuda-toolkit-31-downloads

Just extract the archive to a folder, then run WINDOWS.BAT or LINUX.SH to start the benchmark. If those don't work, double-click the .jar file in the extraction folder. If you have Mac OS X or Solaris, just run the .jar file.

I wrote this benchmark in Java using JOCL and jMonkeyEngine 3.0. The program uses OpenCL to simulate a big bang of 25600, 51200 or 102400 particles, much like a supernova (without a relativistic treatment), on any OpenCL-capable device (or combination of devices), and awards benchmark points based on compute performance. Points scale linearly with a device's GFLOPS but inversely with the square of the particle count, because this is a full O(N^2) all-pairs simulation. For example, if you get 1000 points in the 51200-particle test, you will likely get about 250 points in the 102400-particle test. Of course, GPUs perform relatively better with more particles, so you may get 300 instead of 250.
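The O(N^2) rule of thumb above can be turned into a quick estimate. This is a minimal Java sketch, assuming the score is simply inversely proportional to N^2 on the same device; the class and method names are hypothetical and are not part of the benchmark's code.

```java
// Hypothetical helper, not the benchmark's own code.
// Assumption: on the same device, score ~ 1 / N^2 (full all-pairs N-body step).
final class ScoreScaling {

    // Predicts the score at targetN from a score measured at measuredN,
    // assuming pure N^2 scaling (doubling N quarters the score).
    static double estimatePoints(double measuredPoints, long measuredN, long targetN) {
        double ratio = (double) measuredN / targetN;
        return measuredPoints * ratio * ratio;
    }

    public static void main(String[] args) {
        // 1000 points at 51200 particles -> about 250 points at 102400 particles
        System.out.println(estimatePoints(1000, 51200, 102400)); // prints 250.0
        // HD7870 submission: 3982 points at 25600 particles -> predicted score at 51200
        System.out.println(estimatePoints(3982, 25600, 51200)); // prints 995.5
    }
}
```

For example, the HD7870 submission listed in this thread scored 3982 points at 25600 particles, so pure N^2 scaling predicts about 996 points at 51200; its actual 51200-particle score was 1140, in line with GPUs doing relatively better at larger particle counts.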
The 25600-particle test is the easiest and depends on host (CPU), device (CPU, GPU, accelerator, coprocessor) and RAM performance. VLIW4 (and VLIW5), SSE, AVX, GCN, CUDA and float16 (Xeon Phi) capabilities are supported.

OpenCL is slower than CUDA on the same NVIDIA hardware, so NVIDIA scores will be multiplied by 1.2 to give an idea of what CUDA performance would be. The source for this multiplier is old, but it is the only one I could find: http://arxiv.org/ftp/arxiv/papers/1005/1005.2581.pdf

It scales to at least 4 cards, or even to heterogeneous device combinations such as NVIDIA + AMD + Intel HD4000. I will add any scores posted with screenshots of the score results plus CPU-Z and GPU-Z.

Warning: this will heat your hardware like Prime95 or FurMark. Don't pair a low-GFLOPS CPU with a high-GFLOPS GPU: the GPU will periodically sit idle waiting for the CPU, so its temperature will repeatedly rise and then suddenly drop (if you do this, you may end up needing a soldering gun; the GTX 700 series may be immune because it has a temperature-lock mechanism, I don't know). There is absolutely no harm for heterogeneous devices of similar performance.
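The NVIDIA adjustment is plain multiplication. Below is a minimal Java sketch, assuming the device vendor string is available (for example from OpenCL's CL_DEVICE_VENDOR device-info query, which typically reports "NVIDIA Corporation" on NVIDIA devices). The helper names are hypothetical, and 1.2 is only the rough factor quoted above, not a per-card measurement.

```java
import java.util.Locale;

// Hypothetical helper: applies the rough OpenCL-to-CUDA factor to NVIDIA scores only.
final class CudaEstimate {

    // Rough multiplier quoted in the post (from an old paper), not a measured per-card value.
    static final double NVIDIA_OPENCL_TO_CUDA = 1.2;

    static double estimatedCudaScore(String vendor, double openClScore) {
        // Locale.ROOT keeps the case conversion stable in every locale (e.g. Turkish).
        if (vendor.toUpperCase(Locale.ROOT).contains("NVIDIA")) {
            return openClScore * NVIDIA_OPENCL_TO_CUDA;
        }
        // AMD, Intel and other vendors keep their plain OpenCL score.
        return openClScore;
    }

    public static void main(String[] args) {
        // GTX480 submission (2601 points) and HD7870 submission (3982 points)
        System.out.printf(Locale.ROOT, "%.1f%n", estimatedCudaScore("NVIDIA Corporation", 2601));
        System.out.printf(Locale.ROOT, "%.1f%n", estimatedCudaScore("Advanced Micro Devices, Inc.", 3982));
    }
}
```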
Example: HD7850 + GTX660, or 2 × Xeon plus an HD4830.

When two or more devices are used, you can set any workload ratio, such as 50-50 (CrossFire or two cards), 30-35-35 (triple SLI where the first card also drives the display, so it gets less OpenCL load) or 25-25-25-25 (4 × Opteron).

Some scores from another forum for 25600 particles:
GT650M (1000/1500) -----> 1371 points (i7 3610QM 3.1GHz)
Intel HD4000 (1100/800) -----> 482 points (i7 3610QM 3.1GHz)
Xeon E3-1270 V2 (3500MHz, 7 threads) -----> 268 points
i7-3770K (4500MHz, 7 threads) -----> 207 points (iGPU off)
FX8150 (4300MHz, 7 threads) -----> 142 points
Full list: http://forum.donanimhaber.com/m_75692871/tm.htm

Submissions from this thread:

__________25600 particles (for high-end CPUs or low-end GPUs) (1 point = 1 calculation cycle) (1 cycle = 0.375 GFLOP)__________
TwoPlusTwo: Titan SLI @ 1150/3213 & 1137/3213 ------> 11156 points (3930K @ 4.6GHz, DDR-2133) 4183 GFLOP/s
---TK---: GTX680 4GB SLI (1254/7200) ------> 8692 points (i7 2600K @ 4.7GHz HT off, DDR3-2133) 3259 GFLOP/s
Tugrul_512bit: HD7870 (1200/1320) (13.4 WHQL) -----> 3982 points (FX8150 @ 4300MHz, DDR3-1866) 1493 GFLOP/s (Win 7 x64 SP1)
ricardonuno1980: GTX480 (701/1848) (320.14 Beta) ------> 2601 points (i5-2500K @ 4.5GHz, DDR3-1600) 975 GFLOP/s (Win 7 x64 SP1)

__________51200 particles (for 2-3 GPUs) (1 point = 1 calculation cycle) (1 cycle = 1.5 GFLOP)__________
Tugrul_512bit: HD7870 (1200/1320) (13.4 WHQL) -----> 1140 points (FX8150 @ 4300MHz, DDR3-1866) 1710 GFLOP/s (Win 7 x64 SP1)

__________102400 particles (for many GPUs) (1 point = 1 calculation cycle) (1 cycle = 6.0 GFLOP)__________
Tugrul_512bit: HD7870 (1200/1320) (13.4 WHQL) -----> 323 points (FX8150 @ 4300MHz, DDR3-1866) 1938 GFLOP/s (Win 7 x64 SP1)

Important note: If you benchmark on your CPU, you can change how many cores are used for computing versus data moving/rendering with the slider under the benchmark buttons (slide left for the most cores computing, right for the fewest cores computing).

Thanks.
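The workload ratios and the points-to-GFLOP/s figures in the score lists both come down to simple arithmetic. This Java sketch assumes a ratio such as 30-35-35 splits the particle count among devices (how the benchmark actually distributes work is not stated, so that part is hypothetical), and that GFLOP/s = points × GFLOP per cycle, which matches every submission listed to within rounding (e.g. 1140 × 1.5 = 1710). All names are illustrative only.

```java
import java.util.Arrays;

// Hypothetical helpers, not the benchmark's own code.
final class WorkloadAndScore {

    // Assumption: a workload ratio divides the particle count among devices.
    // Any rounding remainder goes to the last device so the shares sum to the total.
    static int[] splitParticles(int totalParticles, int[] percentages) {
        int sum = Arrays.stream(percentages).sum();
        if (sum != 100) {
            throw new IllegalArgumentException("ratios must add up to 100, got " + sum);
        }
        int[] shares = new int[percentages.length];
        int assigned = 0;
        for (int i = 0; i < percentages.length - 1; i++) {
            shares[i] = (int) ((long) totalParticles * percentages[i] / 100);
            assigned += shares[i];
        }
        shares[percentages.length - 1] = totalParticles - assigned;
        return shares;
    }

    // GFLOP per calculation cycle, taken from the submission headers above.
    static double gflopPerCycle(int particles) {
        switch (particles) {
            case 25600: return 0.375;
            case 51200: return 1.5;
            case 102400: return 6.0;
            default: throw new IllegalArgumentException("unknown test size: " + particles);
        }
    }

    // Converts a benchmark score into GFLOP/s: points x GFLOP per cycle.
    static double gflopsFromPoints(int particles, double points) {
        return points * gflopPerCycle(particles);
    }

    public static void main(String[] args) {
        // 30-35-35 split of the 25600-particle test
        System.out.println(Arrays.toString(splitParticles(25600, new int[] {30, 35, 35}))); // prints [7680, 8960, 8960]
        // HD7870, 51200-particle test
        System.out.println(gflopsFromPoints(51200, 1140)); // prints 1710.0
    }
}
```

Putting the remainder on the last device is just one simple way to keep the shares summing to the total; for the ratios listed above with these particle counts there is no remainder anyway.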