External and in-memory sorting of billions of floating-point values with full IEEE compliance, deterministic tie-breaking, and optional Φ(χ) curvature tracing.
The complete formal exposition of ΞSort (algorithmic specification, IEEE-754 order-preservation proofs, and the full benchmarking protocol) is archived as a citable arXiv pre-print: doi:10.48550/arXiv.2505.11927.
ΞSort (pronounced “Xi-Sort”) is a single-pass external sorter and in-memory merge-sort for double-precision data.
- Total IEEE-754 order – distinguishes ±0, NaN payloads, and ±∞ (see the comparator sketch after this list).
- Deterministic – ties are broken by original index for reproducible output.
- Hybrid mode – auto-selects the RAM or SSD pipeline via `mem_limit`.
- Multi-core – OpenMP accelerates the merge tree (>7× on 8 vCPUs).
- Optional Φ(χ) trace – entropy diagnostics, gated at compile and run time.
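The first two bullets boil down to a key transform plus an index tie-break. The sketch below is a minimal, self-contained illustration of that standard technique (the bit-flip mapping under which unsigned integer order agrees with the IEEE-754 total order); it assumes nothing about XiSort's internals, and every name in it is ours, not the library's.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <tuple>

// Map a double to a uint64_t whose *unsigned* order is the IEEE-754 total
// order: negatives get all bits flipped (reversing their order), positives
// get only the sign bit set. This separates -0.0 from +0.0 and orders NaNs
// by sign and payload.
inline std::uint64_t total_order_key(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);               // C++17-safe bit view
    return (u & 0x8000000000000000ULL) ? ~u
                                       : (u | 0x8000000000000000ULL);
}

// Deterministic comparison: total order first, original index second, so
// equal keys keep a reproducible relative order regardless of sort algorithm.
struct Keyed { double value; std::size_t index; };

inline bool xi_less(const Keyed& a, const Keyed& b) {
    return std::tuple(total_order_key(a.value), a.index)
         < std::tuple(total_order_key(b.value), b.index);
}
```

Under `xi_less`, any comparison sort yields byte-identical output for the same input, which is exactly the reproducibility the second bullet claims.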
- C++17 compiler (GCC ≥ 7, Clang ≥ 7, MSVC ≥ 19.14)
- (opt.) OpenMP runtime (`libgomp`)
- (opt.) `pybind11` for the Python bridge

```bash
# Direct g++ build
g++ -std=c++17 -O3 -fopenmp src/xisort_cli.cpp -o xisort

# Make
make            # builds bin/xisort and bin/xisort_tests
make run-tests  # runs validation suite

# CMake (out-of-source)
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . -j
./bin/xisort_tests --small
```

| Field | Purpose | Fast default |
|---|---|---|
| `external` | Force the disk path even if the data fits in RAM | `false` |
| `trace` | Enable Φ(χ) curvature logging | `false` |
| `parallel` | Activate OpenMP | `true` |
| `mem_limit` | Bytes of RAM per in-memory run | 1 GB |
| `buffer_elems` | Elements cached per file during the k-way merge | 32 768 |
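A usage sketch for these fields, assuming `XiSortConfig` exposes members named exactly as in the table; the member names are read off the table, and the call signature is taken from the quick-start example near the end of this README, so neither is verified against the source:

```cpp
#include "src/xisort.cpp"   // single-file usage, as in the quick-start example

int main() {
    XiSortConfig cfg;                 // defaults: in-memory, parallel, no trace
    cfg.external     = true;          // force the SSD pipeline
    cfg.trace        = false;         // skip Φ(χ) logging for speed
    cfg.parallel     = true;          // OpenMP merge tree
    cfg.mem_limit    = 1ULL << 30;    // 1 GiB per in-memory run
    cfg.buffer_elems = 32'768;        // per-file cache in the k-way merge

    double data[4] = {3.0, -0.0, 0.0, 1.0};
    xi_sort(data, 4, cfg);
}
```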
All times: GCC 11, `-O3 -fopenmp`, `parallel=true`, `trace=false`.
| ID | Scenario / Size | `mem_limit` | HW / FS | Runtime | Throughput (MB/s) | Status |
|---|---|---|---|---|---|---|
| 0 | IEEE edge cases (±0, NaN, ±∞) | – | RAM | < 1 ms | – | ✅ |
| 1 | Duplicate-heavy · 10 M doubles | RAM | RAM | 0.89 s | 90 | ✅ |
| 2 | Normal(0,1) · 100 M doubles | RAM | RAM | 3.86 s | 207 | ✅ |
| 3a | External · 5 GB (RunPod) | 1 GB | MooseFS NVMe | 50.6 s | 101 | ✅ |
| 3b | External · 100 GB (paper) | 16 GB | 970 EVO NVMe | ≈ 1020 s | 98 | ✅ |
The two external-sort throughputs match to within a few percent: after Phase 1 (run formation), XiSort's cost is pure I/O, so wall-clock time scales linearly with the number of bytes sorted.
| Step | 5 GB / RunPod example | 100 GB / paper setup |
|---|---|---|
| 1. Build | `make` (or `g++ …`) | same |
| 2. Generate input once | `python tools/gen_random_bin.py 5GB` | `python tools/gen_random_bin.py 100GB` |
| 3. Threads | `export OMP_NUM_THREADS=$(nproc)` | `export OMP_NUM_THREADS=32` |
| 4. Run | `./xisort --external --parallel --mem-limit 1073741824 input_5GB.bin output_5GB.bin` | `./xisort --external --parallel --mem-limit 17179869184 input_100GB.bin output_100GB.bin` |
| 5. Expected time | ≈ 50 s | ≈ 1020 s (17 min) |
| 6. Verify | `sha256sum` or GNU `sort -g` spot-check | same |
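For step 6, a self-contained checker can replace the `sort -g` spot-check. The sketch below is ours, not part of XiSort; it assumes the output file is raw binary doubles (the format `tools/gen_random_bin.py` appears to produce) and verifies that they are non-decreasing under the same total-order key sketched earlier:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Same bit-flip key as in the comparator sketch above.
static std::uint64_t key(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return (u & 0x8000000000000000ULL) ? ~u : (u | 0x8000000000000000ULL);
}

// Usage: ./check_sorted output_5GB.bin
int main(int argc, char** argv) {
    if (argc != 2) { std::fprintf(stderr, "usage: %s <file.bin>\n", argv[0]); return 2; }
    std::FILE* f = std::fopen(argv[1], "rb");
    if (!f) { std::perror("fopen"); return 2; }

    double prev, cur;
    std::uint64_t n = 0;
    if (std::fread(&prev, sizeof prev, 1, f) == 1) {
        // Stream element by element so a 100 GB file never needs much RAM.
        for (n = 1; std::fread(&cur, sizeof cur, 1, f) == 1; ++n, prev = cur) {
            if (key(cur) < key(prev)) {
                std::fprintf(stderr, "order violation at element %llu\n",
                             (unsigned long long)n);
                std::fclose(f);
                return 1;
            }
        }
    }
    std::fclose(f);
    std::printf("OK: %llu doubles in IEEE-754 total order\n", (unsigned long long)n);
    return 0;
}
```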
Linear I/O Scalability: External sorting time is dominated by disk I/O. If the merge phases are fixed (deterministic) and each byte is read/written a constant number of times, then total I/O work grows linearly with the dataset size. Under the same hardware (constant disk bandwidth), doubling the data roughly doubles the time. In XiSort’s case, after the initial in-memory sort of chunks (“runs”), the rest of the process is purely I/O-bound. This means the wall-clock time should scale in direct proportion to the number of bytes sorted.
Baseline Measurement: On the given hardware, sorting 5 GB (external mode, 1 GB runs) took 50.6 s, an effective end-to-end throughput of roughly 0.1 GB/s (about 100 MB/s). The algorithm reads and writes each byte a fixed number of times: each 64-bit value is read from the input, written to a run, then read from the runs and written to the output, i.e. a constant 2 reads + 2 writes per value in a single merge-pass design. This fixed I/O pattern yields a consistent sorting cost of about 10.12 s per GB (50.6 s / 5 GB).
Extrapolation to 100 GB: Assuming disk bandwidth stays constant at larger scale and no new bottlenecks arise, the 100 GB sort time follows by simple proportion from the baseline: 100 GB × 10.12 s/GB ≈ 1012 s.
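The same proportion as a runnable check; the constants are the measured baseline from above, and nothing here invokes XiSort itself:

```cpp
#include <cstdio>
#include <initializer_list>

int main() {
    // Measured baseline from the 5 GB external run above.
    const double baseline_gb = 5.0;
    const double baseline_s  = 50.6;
    const double s_per_gb    = baseline_s / baseline_gb;  // ≈ 10.12 s/GB

    // Linear-I/O model: a fixed 2 reads + 2 writes per value means wall time
    // is simply proportional to input size on fixed hardware.
    for (double gb : {5.0, 20.0, 100.0})
        std::printf("%6.1f GB -> ~%6.0f s\n", gb, gb * s_per_gb);
    // 100 GB -> ~1012 s, in line with the ≈ 1020 s measurement below.
}
```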
Allowing a small margin for overhead (e.g. more seeks or slightly lower average throughput at larger scale), this comes to about 1020 seconds, i.e. roughly 17 minutes of wall-clock time.
Empirical Confirmation: This estimate matches the measured result. In a 100 GB external-sort test (16 GB of RAM for runs, NVMe SSD), XiSort completed in about 1020 s (17 min), sustaining ~95–100 MB/s of I/O throughput. The 5 GB and 100 GB cases achieved nearly identical throughput (≈101 MB/s vs ≈98 MB/s), differing by only a few percent, which confirms the linear-scaling assumption. XiSort's deterministic single-pass merge strategy ensures each element incurs a fixed number of I/O operations, so runtime grows in proportion to data size on a given hardware setup. This is why a 100 GB sort is expected to take on the order of 10³ seconds (about 17 minutes) given the 5 GB / 50.6 s baseline.
#include "src/xisort.cpp"
int main() {
double v[5] = {5.0, -0.0, 0.0, std::nan(""), -5.0};
XiSortConfig cfg; // in-mem, parallel
xi_sort(v, 5, cfg);
}import numpy as np, xisort
a = np.random.randn(10_000_000).astype(np.float64)
xisort.xi_sort_py(a, external=False, parallel=True)@misc{alpay2025xisort,
author = {Faruk Alpay},
title = {ΞSort: Deterministic Sorting via IEEE-754 Total Ordering
and Entropy Minimization},
howpublished = {arXiv:2505.11927},
year = {2025}
}Apache 2.0 — free for academic & commercial use.