Write a nearestpdf finder and local cache for CORE API #629
For the final combination of the title match scores and the document cosine similarities:
Fitting a logistic regression to the above scatterplot yields the equation: [equation not preserved]

After adding a hundred manual examples from PDF similarities as well, the fit was more conservative, giving a cutoff equation of [equation not preserved].

After playing around on Desmos, I found a good line between the two regressions at [equation not preserved].

Playing around with normalizing the logistic curve, I found that dividing the Z score by 3 gives reasonable P values in the range [0.5, 0.953) over the output domain.
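To illustrate the normalization step, here is a minimal sketch of turning the two signals into a probability. The weights and bias are hypothetical placeholders, not the fitted coefficients from this PR, and `match_probability` is an illustrative name. The only part taken from the comment above is dividing the Z score by 3 before applying the logistic function.

```python
import math

# Hypothetical coefficients for illustration only; the fitted values
# from the regressions discussed above are not reproduced here.
W_TITLE = 4.0
W_CONTENT = 5.0
BIAS = 0.0


def match_probability(title_score: float, cosine_sim: float, scale: float = 3.0) -> float:
    """Combine a title match score and a document cosine similarity.

    Both inputs are assumed to lie in [0, 1]. The linear score z is
    divided by `scale` (3, as in the comment above) before the logistic
    function is applied, which flattens the curve.
    """
    z = W_TITLE * title_score + W_CONTENT * cosine_sim + BIAS
    return 1.0 / (1.0 + math.exp(-z / scale))


if __name__ == "__main__":
    for title, content in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
        print(title, content, round(match_probability(title, content), 3))
```

Since the logistic function gives 0.5 at 0 and about 0.953 at 3, a P range of [0.5, 0.953) corresponds to a scaled score between 0 and 3, that is, a raw Z score between 0 and 9. The placeholder weights were chosen only so the sketch covers that same range.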

In #626 we added a title-to-filename matching algorithm. This PR adds a content matching algorithm and the code that uses it to deduplicate files. It then combines the filename signal with the content signal into an overall matching algorithm that can tell whether we already have a given CORE API work.
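As a rough illustration of the content signal, the sketch below computes a bag-of-words cosine similarity between two extracted texts. This is an assumption about the general technique, not the implementation in this PR; `tokenize` and `cosine_similarity` are hypothetical names, and the real code may use a different tokenizer or weighting.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_a * norm_b)
```

Two files whose similarity is close to 1 are candidates for deduplication; where to draw the cutoff is a separate decision.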
This PR also adds the local SQLite manager which will oversee pulling data from the CORE API.
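A local cache of this kind could look roughly like the sketch below. The class name `WorkCache`, the table schema, and the `fetch` callback are all hypothetical and stand in for whatever the SQLite manager in this PR actually does; no real CORE API calls are made here.

```python
import json
import sqlite3
from typing import Callable, Optional


class WorkCache:
    """Hypothetical SQLite-backed cache keyed by CORE work ID."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS works ("
            "core_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def get(self, core_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT payload FROM works WHERE core_id = ?", (core_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, core_id: str, work: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO works (core_id, payload) VALUES (?, ?)",
            (core_id, json.dumps(work)),
        )
        self.conn.commit()

    def get_or_fetch(self, core_id: str, fetch: Callable[[str], dict]) -> dict:
        """Return the cached work, calling `fetch` only on a cache miss."""
        cached = self.get(core_id)
        if cached is not None:
            return cached
        work = fetch(core_id)
        self.put(core_id, work)
        return work
```

Passing the fetcher in as a callback keeps the cache testable without network access: a test can supply a stub and check that a second lookup for the same ID does not call it again.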