1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
|
.\" Copyright, the authors of the Linux man-pages project
.\"
.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
.\" may be freely modified and distributed
.\" %%%LICENSE_END
.\"
.TH FUTEX_LOCK_PI 2const (date) "Linux man-pages (unreleased)"
.SH NAME
FUTEX_LOCK_PI \- lock a priority-inheritance futex
.SH LIBRARY
Standard C library
.RI ( libc ,\~ \-lc )
.SH SYNOPSIS
.nf
.BR "#include <linux/futex.h>" " /* Definition of " FUTEX_* " constants */"
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.P
.BI "long syscall(SYS_futex, uint32_t *" uaddr ", FUTEX_LOCK_PI, 0,"
.BI " const struct timespec *" timeout );
.fi
.SH DESCRIPTION
This operation is used after an attempt to acquire
the lock via an atomic user-mode instruction failed
because the futex word has a nonzero value\[em]specifically,
because it contained the (PID-namespace-specific) TID of the lock owner.
.P
The operation checks the value of the futex word at the address
.IR uaddr .
If the value is 0, then the kernel tries to atomically set
the futex value to the caller's TID.
If the futex word's value is nonzero,
the kernel atomically sets the
.B FUTEX_WAITERS
bit, which signals the futex owner that it cannot unlock the futex in
user space atomically by setting the futex value to 0.
.\" tglx (July 2015):
.\" The operation here is similar to the FUTEX_WAIT logic. When the user
.\" space atomic acquire does not succeed because the futex value was non
.\" zero, then the waiter goes into the kernel, takes the kernel internal
.\" lock and retries the acquisition under the lock. If the acquisition
.\" does not succeed either, then it sets the FUTEX_WAITERS bit, to signal
.\" the lock owner that it needs to go into the kernel. Here is the pseudo
.\" code:
.\"
.\" lock(kernel_lock);
.\" retry:
.\"
.\" /*
.\" * Owner might have unlocked in user space before we
.\" * were able to set the waiter bit.
.\" */
.\" if (atomic_acquire(futex) == SUCCESS) {
.\" unlock(kernel_lock());
.\" return 0;
.\" }
.\"
.\" /*
.\" * Owner might have unlocked after the above atomic_acquire()
.\" * attempt.
.\" */
.\" if (atomic_set_waiters_bit(futex) != SUCCESS)
.\" goto retry;
.\"
.\" queue_waiter();
.\" unlock(kernel_lock);
.\" block();
.\"
After that, the kernel:
.IP (1) 5
Tries to find the thread which is associated with the owner TID.
.IP (2)
Creates or reuses kernel state on behalf of the owner.
(If this is the first waiter, there is no kernel state for this
futex, so kernel state is created by locking the RT-mutex
and the futex owner is made the owner of the RT-mutex.
If there are existing waiters, then the existing state is reused.)
.IP (3)
Attaches the waiter to the futex
(i.e., the waiter is enqueued on the RT-mutex waiter list).
.P
If more than one waiter exists,
the enqueueing of the waiter is in descending priority order.
(For information on priority ordering, see the discussion of the
.BR SCHED_DEADLINE ,
.BR SCHED_FIFO ,
and
.B SCHED_RR
scheduling policies in
.BR sched (7).)
The owner inherits either the waiter's CPU bandwidth
(if the waiter is scheduled under the
.B SCHED_DEADLINE
policy) or the waiter's priority (if the waiter is scheduled under the
.B SCHED_RR
or
.B SCHED_FIFO
policy).
.\" August 2015:
.\" mtk: If the realm is restricted purely to SCHED_OTHER (SCHED_NORMAL)
.\" processes, does the nice value come into play also?
.\"
.\" tglx: No. SCHED_OTHER/NORMAL tasks are handled in FIFO order
This inheritance follows the lock chain in the case of nested locking
.\" (i.e., task 1 blocks on lock A, held by task 2,
.\" while task 2 blocks on lock B, held by task 3)
and performs deadlock detection.
.P
The
.I timeout
argument provides a timeout for the lock attempt.
If
.I timeout
is not NULL, the structure it points to specifies
an absolute timeout.
If
.I timeout
is NULL, the operation will block indefinitely.
.SH RETURN VALUE
On error,
\-1 is returned,
and
.I errno
is set to indicate the error.
.P
On success,
.B FUTEX_LOCK_PI
returns 0 if the futex was successfully locked.
.SH ERRORS
See
.BR futex (2).
.TP
.B EAGAIN
The futex owner thread ID of
.I uaddr
is about to exit,
but has not yet handled the internal state cleanup.
Try again.
.TP
.B EDEADLK
The futex word at
.I uaddr
is already locked by the caller.
.TP
.B EFAULT
.I timeout
did not point to a valid user-space address.
.TP
.B EINVAL
The supplied
.I timeout
argument was invalid
.RI ( tv_sec
was less than zero, or
.I tv_nsec
was not less than 1,000,000,000).
.TP
.B EINVAL
The kernel detected an inconsistency between the user-space state at
.I uaddr
and the kernel state.
This indicates either state corruption
or that the kernel found a waiter on
.I uaddr
which is waiting via
.BR FUTEX_WAIT (2const)
or
.BR FUTEX_WAIT_BITSET (2const).
.TP
.B ENOMEM
The kernel could not allocate memory to hold state information.
.TP
.B ENOSYS
A run-time check determined that the operation is not available.
The PI-futex operations are not implemented on all architectures and
are not supported on some CPU variants.
.TP
.B EPERM
The caller is not allowed to attach itself to the futex at
.IR uaddr .
(This may be caused by a state corruption in user space.)
.TP
.B ESRCH
The thread ID in the futex word at
.I uaddr
does not exist.
.TP
.B ETIMEDOUT
The timeout expired before the operation completed.
.\"
.SH STANDARDS
Linux.
.SH HISTORY
Linux 2.6.18.
.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
.SH CAVEATS
Unlike other
.BR futex (2)
operations,
the timeout is measured against the
.B CLOCK_REALTIME
clock.
.\" 2016-07-07 response from Thomas Gleixner on LKML:
.\" From: Thomas Gleixner <tglx@linutronix.de>
.\" Date: 6 July 2016 at 20:57
.\" Subject: Re: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
.\"
.\" On Thu, 23 Jun 2016, Michael Kerrisk (man-pages) wrote:
.\" > On 06/23/2016 08:28 PM, Darren Hart wrote:
.\" > > And as a follow-on, what is the reason for FUTEX_LOCK_PI only using
.\" > > CLOCK_REALTIME? It seems reasonable to me that a user may want to wait a
.\" > > specific amount of time, regardless of wall time.
.\" >
.\" > Yes, that's another weird inconsistency.
.\"
.\" The reason is that phtread_mutex_timedlock() uses absolute timeouts based on
.\" CLOCK_REALTIME. glibc folks asked to make that the default behaviour back
.\" then when we added LOCK_PI.
.\"
.SH SEE ALSO
.BR futex (2)
|