Talk:Floating-point arithmetic

This is the talk page for discussing improvements to the Floating-point arithmetic article.
This is not a forum for general discussion of the subject of the article.

Computing: CompSci Top‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Top-importance on the project's importance scale.

This article is supported by WikiProject Computer science (assessed as Top-importance).

Things you can help WikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (see List of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles in Category:Computer science and sub-categories with {{WikiProject Computer science}}

Computer science Top‑importance

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

Digits of precision, a confusing early statement

I have removed the portion after the ellipsis from the following text formerly found in the article: "12.345 is a floating-point number in a base-ten representation with five digits of precision...However, 12.345 is not a floating-point number with five base-ten digits of precision." I recognize the distinction made (a number with 5 base-ten digits of precision vs. a base-ten representation of a number with five digits of precision) and I suspect the author intended to observe that a binary representation of 12.345 would not have five base-ten digits of precision, but I can't divine what useful thing was meant to be communicated there, so I've removed it. If I'm missing something obvious in the interpretation of this line, I suspect many others could miss it too, and I encourage a more direct explanation if the text is restored. john factorial (talk) 18:44, 24 July 2023 (UTC)

The sentence was made nonsensical by this revision by someone who mistook 12.3456 for a typo rather than a counterexample: https://en.wikipedia.org/w/index.php?title=Floating-point_arithmetic&diff=prev&oldid=1166821013

I have reverted the changes, and added a little more verbiage to emphasize that 12.3456 is a counterexample. Taylor Riastradh Campbell (talk) 20:56, 24 July 2023 (UTC)
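
For illustration only, the five-digit example can be sketched in Python with the standard decimal module. The helper name fits_in_five_digits is made up here and does not come from the article; the sketch assumes a base-ten system with precision 5 and round-half-even rounding.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# A toy base-ten floating-point system with 5 significant digits
# (the rounding mode is an assumption made for this sketch).
ctx = Context(prec=5, rounding=ROUND_HALF_EVEN)

def fits_in_five_digits(text):
    """Return True if the decimal literal is exactly representable with 5 digits."""
    exact = Decimal(text)        # Decimal(str) is exact, independent of any context
    rounded = ctx.plus(exact)    # unary plus applies the context precision and rounding
    return rounded == exact

print(fits_in_five_digits("12.345"))    # True: five significant digits suffice
print(fits_in_five_digits("12.3456"))   # False: six significant digits are needed
print(ctx.plus(Decimal("12.3456")))     # 12.346, the nearest five-digit value
```

In this reading, 12.345 is a member of the five-digit system, while 12.3456 is the counterexample: it has to be rounded before it can be stored.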

Counterexample of what?

I have no idea why we're trying to fit a number in 5 digits. ~2025-38812-55 (talk) 06:45, 3 January 2026 (UTC)

5 is the precision of the floating-point system in the example. But the presentation may not be very good. — Vincent Lefèvre (talk) 08:04, 3 January 2026 (UTC)

Patriot missile incident

The other day I made an edit clarifying the nature of the Patriot missile incident, based on the public sources already cited. User:Vincent Lefèvre reverted two parts of them:

First, I replaced the link to loss of significance by the simpler word ‘error’, because loss of significance now just redirects to catastrophic cancellation since the old article was deleted. I was loosely involved in this deletion but I don't feel strongly about this; I think the term ‘loss of significance’ is unnecessarily fancy without saying anything more than ‘error’ does, but it's fine, and the error is essentially catastrophic cancellation after all.

Second, I added the text:

The error arose not from the use of floating-point, but from the use of two different unit conversions when representing time in different parts of a calculation.

This text was deleted on the grounds that:

The intent is that there is a single time unit: 0.1s. The issue is that the software assumed that its accuracy did not matter; Skeel says: "this time difference should be in error by only 0.0001%, a truly insignificant amount". Something that may remain true... until a cancellation occurs like here.

But I don't think that is the whole story. The Skeel citation [1] says (emphasis added):

When Patriot systems were brought into the Gulf conflict, the software was modified (several times) to cope with the high speed of ballistic missiles, for which the system was not originally designed.

At least one of these software modifications was the introduction of a subroutine for converting clock-time more accurately into floating-point. This calculation was needed in about half a dozen places in the program, but the call to the subroutine was not inserted at every point where it was needed. Hence, with a less accurate truncated time of one radar pulse being subtracted from a more accurate time of another radar pulse, the error no longer cancelled.

The designers certainly didn't assume that its accuracy did not matter—if they did assume that, why would they have written a new conversion subroutine for more accurate conversion?

Suppose the floating-point system on the control computer had 30-bit precision (a low estimate for a 48-bit floating-point format). The logic computed something like $C_{1}(t_{1})-C_{0}(t_{0})$, where $C_{1}(t)$ is (say) the new higher-precision conversion from fixed-point to floating-point giving $0.1\times t\times (1-2^{-30})$, and $C_{0}(t)$ is (say) the old lower-precision conversion giving $0.1\times t\times (1-2^{-20})$. There may be an additional floating-point rounding error of about one ulp, but that pales in comparison to the discrepancy between conversion subroutines of about $2^{10}\approx 1000$ ulps in this hypothesis of 30-bit precision (if it were 40-bit precision, then it would be $2^{20}\approx 10^{6}$ ulps, and so on).

In brief, this was a much more mundane software engineering mistake—updating a unit conversion subroutine call in one place but not another, so the units are no longer commensurate—rather than anything you can rightly blame floating-point for.

It's possible that, after long enough uptime, computing $C(t_{1})-C(t_{0})$ rather than $C(t_{1}-t_{0})$ with the same conversion subroutine $C$ could lose enough significant bits due to floating-point rounding error to cause the same problem. But in this case, the problem was using different conversion subroutines $C_{1}$ and $C_{0}$. And, with at least 30-bit precision, the floating-point rounding error would take a thousand times as long to cause the same problem—over twenty thousand hours before a problem, or about two years and four months of continuous uptime. (I would also guess the format has >30 bits of precision, so it's likely much longer than that.)

This cautionary tale is often used to blame the designers for using floating-point to represent time and to argue that floating-point numbers are incomprehensible black magic where reasoning goes out the window (e.g., on Hacker News and Reddit), even though the underlying story justifies neither of these conclusions. So that's why I think it is important to spell out the actual bug here—incomplete software change caused subtraction of incommensurate (but similar) units. Taylor Riastradh Campbell (talk) 04:16, 17 July 2025 (UTC)

References

[1] Skeel, Robert (July 1992), "Roundoff Error and the Patriot Missile" (PDF), SIAM News, 25 (4): 11, retrieved 2024-11-15
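
The hypothesis in the comment above can be sketched numerically. This is only an illustration under that comment's stated assumptions: the factors $(1-2^{-20})$ and $(1-2^{-30})$ are the hypothetical ones from the comment, not values from Skeel's article, and the names c0, c1, and interval_errors are invented for the sketch. Python's binary64 floats stand in for the control computer's format.

```python
TICK = 0.1               # seconds per clock tick, the single time unit in the discussion
TICKS_PER_HOUR = 36_000  # 3600 s / 0.1 s

def c0(ticks):
    """Hypothetical older conversion: relative error of about 2**-20."""
    return TICK * ticks * (1 - 2.0**-20)

def c1(ticks):
    """Hypothetical newer conversion: relative error of about 2**-30."""
    return TICK * ticks * (1 - 2.0**-30)

def interval_errors(uptime_hours, interval_ticks=1):
    """Absolute error (in seconds) of a short interval measured after some uptime.

    Returns (mixed, consistent): mixed subtracts the old conversion from the
    new one; consistent uses the old conversion for both timestamps.
    """
    t0 = uptime_hours * TICKS_PER_HOUR
    t1 = t0 + interval_ticks
    true_interval = TICK * interval_ticks
    mixed = abs((c1(t1) - c0(t0)) - true_interval)
    consistent = abs((c0(t1) - c0(t0)) - true_interval)
    return mixed, consistent

for hours in (1, 20, 100):
    mixed, consistent = interval_errors(hours)
    print(f"{hours:4d} h  mixed: {mixed:.3e} s  consistent: {consistent:.3e} s")
```

Under these assumptions the mixed error grows in proportion to uptime, while the consistent error stays near the size of the conversion error on the interval itself.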

@Taylor Riastradh Campbell:

Just saying "error" would be misleading because in general, one has an error at almost each floating-point operation, and this is often not a major issue (with carefully designed code). What matters here is that the (relative) error is very large due to a catastrophic cancellation as described in the document.

Saying that there are "two different unit conversions" is incorrect, as the time unit is the same in both routines (0.1s), contrary to the Mars Climate Orbiter failure, where the pound-force second and newton-second units were mixed up (so, even with an infinite precision, the failure would still have occurred); the issue here is that there are different approximations in the time calculation, i.e. with different accuracy (see the term "accurate" used by Skeel). This is really related to error analysis (with an infinite precision, there would have been no issues).

— Vincent Lefèvre (talk) 12:30, 17 July 2025 (UTC)
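
As a small illustration of the distinction drawn above between an ordinary rounding error and a catastrophic cancellation, the following sketch uses binary64 floats and exact rational arithmetic to measure both. The value 1e-15 is an arbitrary choice for the example and has nothing to do with the Patriot software.

```python
from fractions import Fraction

x = 1e-15
s = 1.0 + x    # one rounded addition: a tiny relative error
d = s - 1.0    # the subtraction itself is exact, but it exposes the earlier error

# Fraction(float) converts exactly, so these are the true errors.
exact_sum = 1 + Fraction(x)
rel_err_sum = abs(Fraction(s) - exact_sum) / exact_sum
rel_err_diff = abs(Fraction(d) - Fraction(x)) / Fraction(x)

print(f"relative error of 1.0 + x:       {float(rel_err_sum):.3e}")
print(f"relative error of (1.0 + x) - 1: {float(rel_err_diff):.3e}")
```

The addition is accurate to within half an ulp, yet after the cancellation the result is off by roughly 11% relative to x.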

Just saying "error" would be misleading because in general, one has an error at almost each floating-point operation, and this is often not a major issue (with carefully designed code). What matters here is that the (relative) error is very large due to a catastrophic cancellation as described in the document.

Right, I have no objection to your reverting that part of the change. I was just explaining why I changed the text/link ‘loss of significance’ in the first place.

Saying that there are "two different unit conversions" is incorrect, as the time unit is the same in both routines (0.1s)…the issue here is that there are different approximations in the time calculation, i.e. with different accuracy (see the term "accurate" used by Skeel).

I'm not attached to phrasing it in terms of unit conversions (though I think a better analogy is yards/meters, which differ by about 9%, or quarts/liters, which differ by about 6%, rather than pounds/newtons, which differ by a factor of four). The part that is important to emphasize is the error from subtracting different approximations to the time conversion—because of an incomplete software change that didn't update all the subroutine calls—rather than the error from using floating-point.

Had the control computer used either $C_{0}(t_{1})-C_{0}(t_{0})$ or $C_{1}(t_{1})-C_{1}(t_{0})$ consistently, whether the older and worse approximation or the newer and better approximation, the incident likely wouldn't have happened even though it used floating-point (until several years of uptime, rather than several dozen hours of uptime). And subtracting different conversion approximations $C_{0}$ and $C_{1}$ even in exclusively fixed-point arithmetic, or infinite-precision arithmetic, would also have caused the same error. Taylor Riastradh Campbell (talk) 13:13, 17 July 2025 (UTC)
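
The fixed-point claim above can be sketched with exact rational arithmetic, so that no rounding occurs in any operation. The model is an assumption made for illustration: each conversion multiplies the tick count by 0.1 chopped to a fixed number of binary fraction bits, and the widths 20 and 30 are hypothetical, not taken from Skeel. The function names are likewise invented.

```python
from fractions import Fraction

TICKS_PER_HOUR = 36_000  # 0.1 s ticks

def truncated_tenth(frac_bits):
    """0.1 chopped to frac_bits binary fraction bits, as an exact rational."""
    return Fraction((1 << frac_bits) // 10, 1 << frac_bits)

def convert(ticks, frac_bits):
    """Fixed-point style conversion from ticks to seconds, computed exactly."""
    return ticks * truncated_tenth(frac_bits)

def interval_error_exact(uptime_hours, bits_start, bits_end, interval_ticks=1):
    """Exact absolute error (seconds) of an interval measured after some uptime."""
    t0 = uptime_hours * TICKS_PER_HOUR
    t1 = t0 + interval_ticks
    measured = convert(t1, bits_end) - convert(t0, bits_start)
    return abs(measured - Fraction(interval_ticks, 10))

print(float(interval_error_exact(20, 20, 30)))  # mixed approximations
print(float(interval_error_exact(20, 20, 20)))  # older approximation throughout
print(float(interval_error_exact(20, 30, 30)))  # newer approximation throughout
```

In this model the mixed case is wrong by a sizeable fraction of a second after 20 hours even though every arithmetic operation is exact, while either consistent case is wrong by less than a microsecond. The sketch holds the two approximations fixed; it does not model what the routines would return on a machine with more precision.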

The old code had to be rewritten to use a more accurate time computation to cope with the high speed of ballistic missiles. So, anyway, even the old code, which used a consistent conversion, was globally not accurate enough. And you do not necessarily need to use the same conversion; two different conversion routines with sufficient accuracy would be enough. With infinite precision, you would have $C_{0}=C_{1}$, so there would be no errors. — Vincent Lefèvre (talk) 13:46, 17 July 2025 (UTC)

"Fast math" listed at Redirects for discussion

The redirect Fast math has been listed at redirects for discussion to determine whether its use and function meet the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2026 January 5 § Fast math until a consensus is reached. consarn (talck) (contirbuton s) 19:29, 5 January 2026 (UTC)

Mention 0 as a special case

It should be mentioned that 0 is a special case, just like ∞, because the underlying scheme (normalized mantissa + exponent) cannot represent it.

AFAIU 0 can be argued to be an edge case of Subnormal numbers, but 0 is supported even when subnormals aren't. Musaran (talk) 12:05, 3 February 2026 (UTC)

Whether 0 is regarded as a special case depends on the context. With the main definition of floating-point numbers, normalization is not used. In practice, IEEE 754 decimal formats do not use normalization (it is just required when a result is inexact). — Vincent Lefèvre (talk) 15:09, 3 February 2026 (UTC)
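
For the binary interchange case specifically, the encoding of zero can be inspected directly. The sketch below assumes IEEE 754 binary64 (Python's float on common platforms) and splits a value into its sign, biased exponent, and fraction fields; the helper name fields is made up for the example.

```python
import struct

def fields(x):
    """Split a binary64 value into (sign, biased exponent, fraction) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction

print(fields(1.0))           # (0, 1023, 0): normal number, implicit leading 1
print(fields(0.0))           # (0, 0, 0): exponent field 0, fraction 0
print(fields(-0.0))          # (1, 0, 0): same fields with the sign bit set
print(fields(5e-324))        # (0, 0, 1): smallest subnormal shares exponent field 0
print(fields(float("inf")))  # (0, 2047, 0): infinity uses the all-ones exponent
```

In this encoding, zero shares the reserved exponent field 0 with the subnormals, and infinity uses the other reserved exponent field. That is one way to read the "special case" remark for normalized binary formats; it does not apply to the decimal formats mentioned in the reply.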
