
Allow custom default address spaces and parse p- specifications in the datalayout string #143182

Open
wants to merge 5 commits into
base: master
from

Conversation

xdoardo

@xdoardo xdoardo commented Jun 29, 2025

Some targets, such as CHERI, use a default address space other than the "normal" default address space 0 (CHERI uses 200). Currently, rustc does not allow specifying a custom default address space and does not take the p- specifications in the datalayout string into account.

This patch tries to mitigate these problems by allowing targets to define a custom default address space (which still defaults to address space 0) and by adding code to parse the p- specifications in rustc_abi. The main change is that TargetDataLayout now uses functions to refer to pointer-related information, instead of having dedicated fields for the size and alignment of pointers in the default address space; furthermore, the pointer_size and pointer_align fields in TargetDataLayout are replaced with an FxHashMap that holds the info for every address space parsed from the p- specifications.
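
The shape of the change described above can be sketched roughly as follows. This is an illustration only, not the PR's actual code: `PointerSpec`, `DataLayoutSketch`, `set_pointer_spec` and the local `AddressSpace` newtype are hypothetical names, sizes are kept in bits for simplicity, and `std::collections::HashMap` stands in for rustc's `FxHashMap`. The `pointer_size_in` / `pointer_align_in` names follow the diff fragments quoted in the review; the no-argument `pointer_size` accessor is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for rustc's `AddressSpace` newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct AddressSpace(pub u32);

/// Pointer properties for a single address space (in bits, for simplicity).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PointerSpec {
    pub size_bits: u64,
    pub abi_align_bits: u64,
}

/// Sketch of a data layout that keeps one entry per address space
/// instead of two dedicated fields for the default address space.
pub struct DataLayoutSketch {
    pub default_address_space: AddressSpace,
    specs: HashMap<AddressSpace, PointerSpec>,
}

impl DataLayoutSketch {
    pub fn new(default_address_space: AddressSpace) -> Self {
        Self { default_address_space, specs: HashMap::new() }
    }

    /// Records the pointer properties parsed from one `p-` specification.
    pub fn set_pointer_spec(&mut self, a: AddressSpace, spec: PointerSpec) {
        self.specs.insert(a, spec);
    }

    /// Pointer size in address space `a`, if a spec was recorded for it.
    pub fn pointer_size_in(&self, a: AddressSpace) -> Option<u64> {
        self.specs.get(&a).map(|s| s.size_bits)
    }

    /// Pointer ABI alignment in address space `a`, if a spec was recorded for it.
    pub fn pointer_align_in(&self, a: AddressSpace) -> Option<u64> {
        self.specs.get(&a).map(|s| s.abi_align_bits)
    }

    /// Pointer size in the target's default address space.
    pub fn pointer_size(&self) -> Option<u64> {
        self.pointer_size_in(self.default_address_space)
    }
}
```

In this sketch every lookup for the default address space goes through the map, which is the kind of cost the perf runs in this thread are meant to measure.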

The potential performance drawbacks of not having ad-hoc fields for the default address space will be tested in this PR's CI run.

r? workingjubilee

@rustbot
Collaborator

rustbot commented Jun 29, 2025

workingjubilee is currently at their maximum review capacity.
They may take a while to respond.

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 29, 2025
@rustbot
Collaborator

rustbot commented Jun 29, 2025

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

These commits modify compiler targets.
(See the Target Tier Policy.)

Some changes occurred in compiler/rustc_ast_lowering/src/format.rs

cc @m-ou-se

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

Some changes occurred in cfg and check-cfg configuration

cc @Urgau

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

@workingjubilee
Member

@bors2 try @rust-timer queue

@rust-bors

rust-bors bot commented Jun 29, 2025

⌛ Trying commit ef6e4ef with merge a6d8fc2

To cancel the try build, run the command @bors2 try cancel.

rust-bors bot added a commit that referenced this pull request Jun 29, 2025
Allow custom default address spaces and parse `p-` specifications in the datalayout string

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 29, 2025

@rust-bors

rust-bors bot commented Jun 29, 2025

☀️ Try build successful (CI)
Build commit: a6d8fc2 (a6d8fc279459764136b16847f5b4411d4298525d, parent: 5ca574e85b67cec0a6fc3fddfe398cbe676c9c69)

@rust-timer
Collaborator

Finished benchmarking commit (a6d8fc2): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [0.7%, 1.7%] | 8 |
| Regressions ❌ (secondary) | 2.0% | [0.9%, 4.7%] | 38 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.0% | [0.7%, 1.7%] | 8 |

Max RSS (memory usage)

Results (primary -2.6%, secondary -8.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.6% | [-3.6%, -1.5%] | 2 |
| Improvements ✅ (secondary) | -8.0% | [-8.0%, -8.0%] | 1 |
| All ❌✅ (primary) | -2.6% | [-3.6%, -1.5%] | 2 |

Cycles

Results (secondary 2.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.6% | [2.4%, 2.7%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 695.013s -> 695.668s (0.09%)
Artifact size: 371.77 MiB -> 372.01 MiB (0.06%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 29, 2025
@rustbot
Collaborator

rustbot commented Jun 29, 2025

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@workingjubilee
Member

@bors2 try

@rust-bors

rust-bors bot commented Jun 29, 2025

⌛ Trying commit fce06e3 with merge 92687cf

To cancel the try build, run the command @bors2 try cancel.

rust-bors bot added a commit that referenced this pull request Jun 29, 2025
Allow custom default address spaces and parse `p-` specifications in the datalayout string

@rust-bors

rust-bors bot commented Jun 29, 2025

☀️ Try build successful (CI)
Build commit: 92687cf (92687cf86ed4b58afb2653d78a1c735272295f02, parent: ed2d759783dc9de134bbb3f01085b1e6dbf539f3)

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 30, 2025

@workingjubilee
Member

@rust-timer
Collaborator

Finished benchmarking commit (92687cf): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.4%, 0.6%] | 4 |
| Regressions ❌ (secondary) | 1.1% | [0.3%, 1.9%] | 29 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.4% | [0.4%, 0.6%] | 4 |

Max RSS (memory usage)

Results (primary -2.3%, secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.1% | [2.1%, 2.1%] | 1 |
| Improvements ✅ (primary) | -2.3% | [-3.1%, -1.4%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.3% | [-3.1%, -1.4%] | 2 |

Cycles

Results (primary -1.1%, secondary 0.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.9% | [1.9%, 1.9%] | 1 |
| Improvements ✅ (primary) | -1.1% | [-1.2%, -0.9%] | 2 |
| Improvements ✅ (secondary) | -1.2% | [-1.2%, -1.2%] | 1 |
| All ❌✅ (primary) | -1.1% | [-1.2%, -0.9%] | 2 |

Binary size

Results (secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Bootstrap: 694.715s -> 695.8s (0.16%)
Artifact size: 371.77 MiB -> 371.86 MiB (0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 30, 2025
@xdoardo xdoardo force-pushed the more-addrspace branch 2 times, most recently from defc7a8 to 3f8b20b Compare June 30, 2025 20:04
@RalfJung
Member

RalfJung commented Jul 1, 2025

@bors try
@rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 1, 2025
bors added a commit that referenced this pull request Jul 1, 2025
Allow custom default address spaces and parse `p-` specifications in the datalayout string

@bors
Collaborator

bors commented Jul 1, 2025

⌛ Trying commit afec259 with merge d439e37...

@bors
Collaborator

bors commented Jul 1, 2025

☀️ Try build successful - checks-actions
Build commit: d439e37 (d439e37333c92ecce0cd62a651f35235fd04cb18)

@rust-timer
Collaborator

Finished benchmarking commit (d439e37): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.0% | [-1.1%, -0.9%] | 4 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary 0.2%, secondary 0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.1% | [1.2%, 3.2%] | 3 |
| Regressions ❌ (secondary) | 2.8% | [2.7%, 2.9%] | 3 |
| Improvements ✅ (primary) | -1.6% | [-2.3%, -1.1%] | 3 |
| Improvements ✅ (secondary) | -1.9% | [-3.0%, -0.6%] | 3 |
| All ❌✅ (primary) | 0.2% | [-2.3%, 3.2%] | 6 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 461.796s -> 461.853s (0.01%)
Artifact size: 372.30 MiB -> 371.85 MiB (-0.12%)

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Jul 1, 2025

- dl.pointer_size = parse_size(s, p)?;
- dl.pointer_align = parse_align(a, p)?;
+ [p, s, a @ ..] if p.starts_with("p") => {
+ // Some targets, such as CHERI, use the 'f' suffix in the p- spec to signal that
Contributor

I am planning to add more flags upstream for non-integral pointer properties so ignoring any letter for now is great for future proofing.

Member

We generally try to reduce future-proofing in this code: deviation from the LLVM datalayout string will cause rustc to error, quite deliberately.

Member

If we're making rustc correct with respect to address spaces, then if you add a character that would be relevant for how the platform is handled by codegen, why should we ignore it? And if we don't know what it means, why should we assume it is irrelevant?

Member

Yeah, just ignoring things we don't know about is generally bad since we don't know what we are ignoring, so there's a high risk we are ignoring something important.

Author

I will change it so that only p and pf are stripped. Do you think that would make sense? Once more flags are added we can take them into account accordingly.

Member

Yes, that seems fine. I wonder if there's something we should do in response to "pf" but nothing immediately occurs to me.

Author

On CHERI targets, the f signals that the pointer is non-integral (or fat). In practice, upstream LLVM just ignores it and sets the corresponding field in the pointer spec to false.

Another possibility would be to not handle f at all until a target that uses it is introduced.

Contributor

That's fair, and it probably makes sense to wait until such uses exist. I am planning to introduce 'e', 'n', 'u' flags, which are basically finer-grained properties for the current ni:<addrspace> (llvm/llvm-project#105735), but to allow building rustc against the CHERI LLVM fork, ignoring 'f' would be quite nice since it doesn't affect the rest of rustc.

Author

@xdoardo xdoardo Jul 3, 2025

ccc2083 removes the stripping of f. Instead, when a pX specification is encountered, an error is returned telling the user that the given pointer specification is unknown.
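
The strict behaviour agreed on in this thread (accept only `p` and `p<n>`, reject any unknown letter) could look roughly like the sketch below. It is not the code from ccc2083: `parse_pointer_spec_head` and its error messages are hypothetical. Treating a bare `p` as address space 0 follows the LLVM LangRef.

```rust
/// Parses the head of a pointer specification (`p`, `p0`, `p200`, ...) and
/// returns the address space it names. Anything else, including flag letters
/// such as the `f` in `pf200`, is rejected rather than silently ignored.
pub fn parse_pointer_spec_head(head: &str) -> Result<u32, String> {
    let rest = head
        .strip_prefix('p')
        .ok_or_else(|| format!("not a pointer specification: `{}`", head))?;

    // A bare `p` refers to address space 0.
    if rest.is_empty() {
        return Ok(0);
    }

    // Require plain ASCII digits: `str::parse` alone would also accept a
    // leading `+`, which is not part of the datalayout syntax.
    if !rest.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("unknown pointer specification: `{}`", head));
    }

    rest.parse::<u32>()
        .map_err(|_| format!("unknown pointer specification: `{}`", head))
}
```

With this shape, adding support for a new flag later means adding an explicit case instead of loosening a catch-all.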

}
}
}
[p, s, _pr, i, a @ ..] if p.starts_with("p") => {
Contributor

Suggested change
- [p, s, _pr, i, a @ ..] if p.starts_with("p") => {
+ [p, s, _pr, a, idc @ ..] if p.starts_with("p") => {

Index size is the last component?

Author

Thanks, I missed that. Do you think it should be, then,

[p, s, a, _pr, i @ ..] if p.starts_with("p") => {

instead?

For reference, the relevant bit from LLVM's datalayout string spec:

p[n]:<size>:<abi>[:<pref>[:<idx>]]

I can also add the pref field to AddressSpaceInfo, which perhaps would make even more sense.

Author

Commit ccc2083 adds parsing of :<pref>, together with functions to retrieve that value; it defaults to <abi> when not specified.

I also had to slightly change the utility function parse_align, which took a &[&str] parameter and whose actual semantics were to ignore any :<pref>:<more...> following <abi>.
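
For readers following the field-order discussion, here is a minimal sketch of parsing one `p[n]:<size>:<abi>[:<pref>[:<idx>]]` entry in the order given by the LLVM spec quoted above. `ParsedPointerSpec`, `parse_bits` and `parse_pointer_spec` are hypothetical names and this is not the PR's implementation. Defaulting `<pref>` to `<abi>` matches the comment above; defaulting `<idx>` to `<size>` follows the LLVM LangRef. Error handling is deliberately minimal.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedPointerSpec {
    pub address_space: u32,
    pub size_bits: u64,
    pub abi_align_bits: u64,
    pub pref_align_bits: u64,
    pub index_bits: u64,
}

/// Parses one numeric field, naming the field in the error message.
fn parse_bits(field: &str, what: &str) -> Result<u64, String> {
    field
        .parse::<u64>()
        .map_err(|e| format!("invalid {} `{}`: {}", what, field, e))
}

/// Parses a single `p[n]:<size>:<abi>[:<pref>[:<idx>]]` entry.
pub fn parse_pointer_spec(spec: &str) -> Result<ParsedPointerSpec, String> {
    let parts: Vec<&str> = spec.split(':').collect();
    match parts.as_slice() {
        // <size> and <abi> are mandatory; at most two optional fields follow.
        [p, size, abi, rest @ ..] if p.starts_with('p') && rest.len() <= 2 => {
            let n = &p[1..];
            // A bare `p` refers to address space 0.
            let address_space = if n.is_empty() {
                0
            } else {
                n.parse::<u32>()
                    .map_err(|_| format!("unknown pointer specification `{}`", p))?
            };
            let size_bits = parse_bits(size, "size")?;
            let abi_align_bits = parse_bits(abi, "ABI alignment")?;
            // <pref> falls back to <abi>, <idx> falls back to <size>.
            let pref_align_bits = match rest.first() {
                Some(f) => parse_bits(f, "preferred alignment")?,
                None => abi_align_bits,
            };
            let index_bits = match rest.get(1) {
                Some(f) => parse_bits(f, "index size")?,
                None => size_bits,
            };
            Ok(ParsedPointerSpec {
                address_space,
                size_bits,
                abi_align_bits,
                pref_align_bits,
                index_bits,
            })
        }
        _ => Err(format!("malformed pointer specification `{}`", spec)),
    }
}
```

Binding the optional tail as a slice, rather than as fixed positions, keeps `<pref>` and `<idx>` from being confused with `<abi>`, which is the mix-up pointed out in the review.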

@@ -1104,7 +1287,7 @@ impl Primitive {
// FIXME(erikdesjardins): ignoring address space is technically wrong, pointers in
// different address spaces can have different sizes
// (but TargetDataLayout doesn't currently parse that part of the DL string)
- Pointer(_) => dl.pointer_size,
+ Pointer(a) => dl.pointer_size_in(a),
Contributor

Seems like this Todo can be removed now?

Author

Removed.

@@ -1118,7 +1301,7 @@ impl Primitive {
// FIXME(erikdesjardins): ignoring address space is technically wrong, pointers in
// different address spaces can have different alignments
// (but TargetDataLayout doesn't currently parse that part of the DL string)
- Pointer(_) => dl.pointer_align,
+ Pointer(a) => dl.pointer_align_in(a),
Contributor

Remove Todo?

Author

Removed.

@@ -298,6 +323,7 @@ impl TargetDataLayout {
/// determined from llvm string.
pub fn parse_from_llvm_datalayout_string<'a>(
input: &'a str,
default_address_space: AddressSpace,
Contributor

IMO this should be part of the data layout but currently none of alloca/globals/program AS can be used for this, so passing it in seems reasonable.
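
To illustrate why the default address space has to be passed in: the datalayout string lists per-address-space pointer entries but does not say which one is the default. A rough sketch, with the hypothetical function name `default_pointer_size_bits` and a made-up datalayout string in the usage note (neither comes from the PR):

```rust
use std::collections::HashMap;

/// Collects the `p[n]:<size>:...` entries of a datalayout string and returns
/// the pointer size (in bits) for the caller-supplied default address space.
pub fn default_pointer_size_bits(datalayout: &str, default_address_space: u32) -> Option<u64> {
    let mut sizes: HashMap<u32, u64> = HashMap::new();

    for spec in datalayout.split('-') {
        let mut fields = spec.split(':');
        let head = fields.next().unwrap_or("");

        // Only lowercase `p` entries describe pointers; skip everything else.
        let n = match head.strip_prefix('p') {
            Some(n) => n,
            None => continue,
        };

        // A bare `p` refers to address space 0.
        let address_space = if n.is_empty() {
            0
        } else {
            match n.parse::<u32>() {
                Ok(v) => v,
                // A real parser would report an error here instead of skipping.
                Err(_) => continue,
            }
        };

        if let Some(Ok(size)) = fields.next().map(|s| s.parse::<u64>()) {
            sizes.insert(address_space, size);
        }
    }

    // The string itself never names the default, so the caller supplies it.
    sizes.get(&default_address_space).copied()
}
```

For the made-up string `e-p:64:64-p200:128:128-i64:64`, the same input gives a 128-bit pointer size when the caller passes 200 and a 64-bit one when it passes 0.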

@rustbot
Collaborator

rustbot commented Jul 3, 2025

Some changes occurred in coverage instrumentation.

cc @Zalathar

Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
8 participants