
Apache DataSketches

DataSketches are highly efficient algorithms for analyzing big data quickly.

Counting Distincts

  • Distinct Counting
    • HyperLogLog (HLL)
    • Compressed Probabilistic Counting (CPC)
    • Theta Sketch
    • Tuple Sketch

Quantile Estimation

  • Quantiles Sketches
    • KLL Sketch
    • Relative Error Quantiles (REQ) Sketch
    • t-digest
    • Quantiles Sketch (Deprecated)

Frequency Sketches

This problem is also known as heavy hitters or TopK.

  • Frequency Sketches
    • Frequent Items
    • CountMin Sketch

Vector Sketches

  • Vector Sketches
    • Density Sketch

Random Sampling

  • Random Sampling Sketches
    • Variance Optimal Sampling (VarOpt)
    • Exact and Bounded, Probability Proportional to Size (EBPPS) Sampling

Helper Classes

  • Helper Classes
    • Serialize/Deserialize (SerDe)
    • Jaccard Similarity
    • Tuple Policy
    • Kolmogorov-Smirnov Test
    • Kernel Function

Note

This project is under active development.

Indices and tables

  • Index
  • Module Index
  • Search Page

© Copyright 2023.