Learning BLAS

The objective of this set of programs is to become familiar with the APIs and to compare the implementation, performance, and results of linear algebra algorithms implemented with BLAS (CPU), cuBLAS (CUDA on GPU), and GSL, and of the Fast Fourier Transform (FFT) implemented with fftw3 (CPU) and cuFFT (CUDA on GPU).

All programs have been tested on either an Intel Xeon W-2295 CPU @ 3.00 GHz with its 18 cores (36 threads), or an NVIDIA T400 GPU, with deviceQuery reporting

```
Device 0: "NVIDIA T400 4GB"
  CUDA Driver Version / Runtime Version          12.0 / 12.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 3897 MBytes (4086366208 bytes)
  (006) Multiprocessors, (064) CUDA Cores/MP:    384 CUDA Cores
...
```

All subdirectories include a Makefile for compiling the programs and a description of the expected output. The focus is on least-squares optimization, hence on pseudo-inverse matrix computation, reached step by step: from handling square matrices, to rectangular matrices, to inverting such matrices. The FFTs are computed in the context of correlation, assessed as the inverse Fourier transform of the product of the Fourier transforms of the signal and the pattern.

Start with `octave` to get the basics of the signal processing involved in this demonstration and the expected outcome.

Then `blas` demonstrates the use of the BLAS linear algebra package to achieve the same result in C++ on a general-purpose CPU.

Then `cuda` demonstrates the use of cuBLAS and cuFFT for processing on an NVIDIA GPU.

Finally, `gsl` wraps the BLAS function calls in interfaces that remove some of the FORTRAN flavor of the original library.
