OpenMP
This page contains information on GCC's implementation of the OpenMP specification and related functionality like the auto parallelizer (-ftree-parallelize-loops).
As of GCC 4.2, the compiler implements version 2.5 of the OpenMP specification; GCC 4.4 implements version 3.0, and GCC 4.7 and later support the OpenMP 3.1 specification. GCC 4.9 supports OpenMP 4.0 for C/C++, and GCC 4.9.1 also for Fortran. GCC 5 adds support for offloading. OpenMP 4.5 is supported for C/C++ since GCC 6 and for Fortran since GCC 7 (with omissions; the largest missing item is structure element mapping). Since GCC 9, there is initial OpenMP 5.0 support (essentially for C/C++ only). GCC 10 added more features, mainly for C/C++ but also for Fortran.
GCC 11 extended the Fortran compiler to fully support OpenMP 4.5 and to support more of OpenMP 5.0; additionally, nonrectangular loops are now supported. GCC 12 has growing support for OpenMP 5.0 and the first support for OpenMP 5.1 features.
GCC 13 further extends the OpenMP 5.0 support, extends the support for OpenMP 5.1 features, and adds very limited first support for OpenMP 5.2 features. GCC 14 extends the OpenMP 5.x support, mostly by integrating features from the OG12/OG13 branches.
GCC 15 extends the OpenMP 5.0 and 5.1 support and contains the first OpenMP 6.0 features.
OG15 branch (branch: 'devel/omp/gcc-15'): Based on the GCC 15 release branch, OG15 contains several commits extending the OpenMP, OpenACC, and offloading support. Those additions are either only in GCC mainline/16 or still have to be upstreamed.
See also
Documentation on libgomp (GNU Offloading and Multi Processing Runtime Library).
GOMP Project homepage — includes by-release implementation status
Release Notes
GCC 9 Changes – contains a TODO list
OpenMP Documentation
Documentation on libgomp (GNU Offloading and Multi Processing Runtime Library).
OpenMP Specification page – includes the specifications (also available as a book), the additional definition documents, and the official OpenMP example documents as well as technical reports
Automatic Parallelization (-ftree-parallelize-loops) - Streamization
Test Suites and Benchmarks
OpenMP Validation Suite by HLRS, Univ. Stuttgart and Univ. of Houston (2007 version at UH: README, Download)
OpenMP task test suite by BSC
Rodinia Benchmark suite (2): OpenMP, OpenCL, CUDA benchmark
TODO List
(Outdated!)
Feel free to add new items to this list as you run into issues or features that would be interesting to add. Send mail to the list and/or the GCC OpenMP maintainers if any item in this list sounds interesting but is hard to understand.
Fine tune the auto scheduling feature for parallel loops.
Implement untied tasks (no compliance issue; needs to be well tuned to be actually faster; cf. page 53 of pdf) - see also next item
Tasks need some tuning in taskwait. Cf. GCC email, comparison. Algorithms: PDF 1, PDF 2, PDF 3
A comparison for tasks between libgomp, another library (gcc+libKOMP) and Intel (cf. benchmark on p. 10); see also another IOMP 2012 comparison
taskyield is a stub, and mergeable task cloning could be optimized
OpenMP 4.0 (specification released in July 2013). Cf. slides at the IWOMP, the International Workshop for OpenMP (slides and tutorials) in June 2010 and the OWOMP 2010 proceedings. There is also a blog entry. See also IWOMP 2012's talks and the committee report. SC2011 (November 2011): OpenMP Lang Committee Report, CEO report
And the 2012 slides: IWOMP program, Language Committee report
And the OpenMP 4.0 release candidate documents: OpenMP v4.0rc2 specification (March 2013) (OpenMP 4.0 API forum, rc2 changes)
And a comment prior to the final release: OpenMP 4.0 about to be released and IWOMP 2013, by Michael Wong (May 2013)