Dylan search function #648
…hist-uni.github.io into dylan-search-function
…hist-uni.github.io into dylan-search-function
…hist-uni.github.io into dylan-search-function
…hist-uni.github.io into dylan-search-function
✅ Deploy Preview for obu ready!
khemarato left a comment
Okay. A good start! Comments inline.
    });
    });
    }
    finalResults = tokenResults.length ? tokenResults : results;
Only run findOneWordTitleMatches if the normal search returns no results and there is sutta in the query.
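One way to read this suggestion, as a minimal sketch only: the title fallback is gated on both conditions. The names `searchWithFallback`, `runNormalSearch` and `findTitleMatches` are hypothetical stand-ins passed in by the caller, not the site's actual functions.

```javascript
// Hypothetical sketch: gate the one-word title fallback behind two checks.
// runNormalSearch and findTitleMatches are stand-ins supplied by the caller;
// they are assumptions for illustration, not the site's real functions.
function searchWithFallback(query, runNormalSearch, findTitleMatches) {
  const results = runNormalSearch(query);
  // The normal search found something: never touch the fallback.
  if (results.length > 0) return results;
  // Fall back only when the query mentions "sutta" (case-insensitive).
  if (!/sutta/i.test(query)) return results;
  return findTitleMatches(query);
}
```

Written this way, the title matcher never runs for queries that the normal search already answers.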
    for (var i in store){
      const item = store[i];
      const title = (item && item.title) ? item.title : "";
      const titleMatch = title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").replace(/^\s*(?:DN|MN|SN|AN|SNP|DHP|ITI|THAG|THIG|UD)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "").replace(/(\bsutta\b).*$/i, "$1").toLowerCase().replace(/[^a-z0-9]/g, "");
Split this normalization off into a function and add tests for it showing how you expect it to work. Then use that function to precompute these normalized strings in the index build. This will make searches faster and will allow us to inspect the normalized strings for errors.
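As a rough sketch of what that split could look like: the regexes below are copied from the line under review, while the function name `normalizeSuttaTitle` is an assumption for illustration.

```javascript
// Sketch only: the chained replaces from the reviewed line, pulled into one
// function so it can be tested on its own. The name normalizeSuttaTitle is
// hypothetical; the regexes are the ones shown in the diff.
function normalizeSuttaTitle(title) {
  return (title || "")
    .normalize("NFD")
    // Strip the combining diacritics left behind by NFD decomposition.
    .replace(/[\u0300-\u036f]/g, "")
    // Drop a leading collection reference such as "MN 35".
    .replace(/^\s*(?:DN|MN|SN|AN|SNP|DHP|ITI|THAG|THIG|UD)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "")
    // Cut everything after the word "sutta".
    .replace(/(\bsutta\b).*$/i, "$1")
    .toLowerCase()
    // Join the remaining words by removing everything but letters and digits.
    .replace(/[^a-z0-9]/g, "");
}
```

With this version, `normalizeSuttaTitle('MN 35 Cūḷa Saccaka Sutta: The Shorter Discourse With Saccaka')` returns `'culasaccakasutta'`, and a missing title yields an empty string.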
    var BMAX = 250; // Max blurb size in characters
    var RMAX = 100; // Max number of results to display

    const suttaFinder = '<a href="https://name.readingfaithfully.org/" class="btn" target="_blank">Sutta Finder</a>'
remove this unused variable.
    }
    }

    function findOneWordTitleMatches(query, store) {
Let's rename this to findOneWordSuttaTitleMatches to make it clear it's only looking at Pāli suttas
    });
    });
    }
    finalResults = results.length ? results : tokenResults = findOneWordSuttaTitleMatches(data.q.trim(), store);
Why are you calling findOneWordSuttaTitleMatches twice?
    function findOneWordSuttaTitleMatches(query, store) {
      var tokenResults = [];
      const normalizedQuery = query.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().replace(/[^a-z0-9]/g, "");
      for (var i in store){
This loop should be done at index build not on message reply.
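A minimal sketch of the idea, assuming the normalized titles are stored in a `Map` built once when the index is built. `buildTitleIndex`, `lookupTitle` and the `normalize` callback are hypothetical names, not code from the PR.

```javascript
// Hypothetical sketch: do the loop over the store once, at index build time.
// normalize is whatever title-normalizing function the caller supplies.
function buildTitleIndex(store, normalize) {
  const index = new Map();
  for (const id of Object.keys(store)) {
    const item = store[id];
    const key = normalize((item && item.title) || "");
    if (!key) continue; // skip items with no usable title
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(id);
  }
  return index;
}

// At query time the work is a single Map lookup instead of a loop.
function lookupTitle(index, query, normalize) {
  return index.get(normalize(query)) || [];
}
```

The precomputed keys can also be dumped and read through by hand, which is one way to spot normalization errors.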
… store items with new normalized titles
…hist-uni.github.io into dylan-search-function
khemarato left a comment
Okay. This is getting quite close to merge-worthy. Will take a closer look at your parsing logic tomorrow.
    }
    });

    joinedTitles = normalizeSuttaTitles(store);
Oh yeah. That's what I meant! You got it 🙂
yay!
…On Wed, Apr 15, 2026 at 10:22 AM Khemarato Bhikkhu ***@***.***> wrote:
In assets/js/search_index.js:
> @@ -69,6 +69,8 @@ var idx = lunr(function () {
}
});
+joinedTitles = normalizeSuttaTitles(store);
    const item = obj[i];
    if (!item || item.type !== "content" || item.category !== "canon") continue;
    const title = item.title || "";
    const titleJoin = title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").replace(/^\s*(?:DN|MN|SN|AN|SNP|DHP|ITI|THAG|THIG|UD)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "").replace(/(\bsutta\b).*$/i, "$1").toLowerCase().replace(/[^a-z0-9]/g, "");
So right now, there are some bugs with this function, for a quick sampling:
- ma220 is right now returning ma220aritthasutrathediscourseonknowingthebetterwaytocatchasnake when it should be aritthasutra
- ma128 is giving ma128upasakasutradiscourseonthewhitecladdisciple instead of upasakasutra
- thequestionsofkingmalindaanabridgementofthemilindapanha should probably be skipped
- themahasatipatthanasutta and theuppatipatikasutta and theyogasutra shouldn't include the "the" at the beginning
- The Thig / Thag entries are not working right. For example: subhajivakambavanikatherigathasubhaofjivakasmangogrove or punnatherigathapunnika
- for ma80, we should just filter it out instead of giving ma80theroughcloth
- lal26 has lal26dharmacakrapravartanasutrathediscoursethatsetthedharmawheelrolling instead of dharmacakrapravartanasutra
Add tests for these cases and fix the implementation so that these cases pass. In our call tomorrow morning, I can show you how I found these.
perfect I will document this and I look forward to our meeting. Thank you
for your time - (this one is a tough cookie but I'll get there)
…On Wed, Apr 15, 2026 at 1:21 PM Khemarato Bhikkhu ***@***.***> wrote:
In assets/js/search_functions.js:
> @@ -1,6 +1,24 @@
// Parameters
var BMAX = 250; // Max blurb size in characters
var RMAX = 100; // Max number of results to display
+var joinedTitles = []
+
+function normalizeSuttaTitles (obj) {
+ var joinedTitleDatabase = []
+
+ for (var i in obj){
+ const item = obj[i];
+ if (!item || item.type !== "content" || item.category !== "canon") continue;
+ const title = item.title || "";
+ const titleJoin = title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").replace(/^\s*(?:DN|MN|SN|AN|SNP|DHP|ITI|THAG|THIG|UD)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "").replace(/(\bsutta\b).*$/i, "$1").toLowerCase().replace(/[^a-z0-9]/g, "");
…hist-uni.github.io into dylan-search-function
…te: remove all text after : for scale
…hist-uni.github.io into dylan-search-function
…hist-uni.github.io into dylan-search-function
…w my feature works
      assert.equal(result[0].title, 'upasakasutra');
    });

    it('integrated more nikaya indexes for parsing - lets test lal', () => {
the it() strings are supposed to read like an English sentence. Obviously that's not required by the machine. It doesn't care. But as a courtesy to your human readers (in this case, me), the it() string should describe the behavior your test addresses as if you were speaking to them (which, of course, you are).
To quote the programming "bible":
a computer language is not just a way of getting a computer to perform operations but rather it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.
      assert.equal(result[0].title, 'culasaccakasutta');
    });

    it('remove words after sutta and sutra but using : as a reference', () => {
This test is just testing sutra and you don't need to mention the implementation details. You just describe what you are testing. In this case, I'd just say 'also handles sutras'
      assert.equal(result[0].title, 'dharmacakrapravartanasutra');
    });

    it('it returns a joined title from a thig nikaya leading discourse', () => {
'can parse Therigathas'
      assert.equal(result[0].title, 'bhalliyatheragatha');
    });

    it('handles "the" and removes it from a string if it appears at the beginning', () => {

    const title = item.title || "";
    const titleJoin = title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").replace(/^\s*(?:DN|MN|SN|AN|KN|LAL|DA|MA|SA|EA|SNP|DHP|ITI|THAG|THIG|UD|NIDD|CV|BV|AP|JA|PV|VV|KP|PTS)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "").replace(/\s*[:\-–]\s*.*$/, "").toLowerCase().replace(/[^a-z0-9]/g, "");
    const removedTheOnJoin = titleJoin.replace(/^\s*(?:the)\s*/i, "");
    joinedTitleDatabase.push({
For filtering out the non-Sanskrit/Pali titles, you could probably just add a test here, something like if (removedTheOnJoin.includes('sutta') || removedTheOnJoin.includes('sutra') || removedTheOnJoin.includes('gatha')) { and then the .push({ call. This will make sure we aren't adding anything that doesn't have one of the "approved" name types.
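The guard described here could be sketched roughly as follows. The three name types come from the comment itself; the helper names `hasApprovedNameType` and `collectJoinedTitles` are assumptions for illustration, and the real code pushes richer objects than this.

```javascript
// Sketch only: keep a joined title only if it contains an approved name type.
// The list comes from the review comment; both helper names are hypothetical.
const APPROVED_NAME_TYPES = ["sutta", "sutra", "gatha"];

function hasApprovedNameType(joinedTitle) {
  return APPROVED_NAME_TYPES.some((name) => joinedTitle.includes(name));
}

// Only titles that pass the guard are pushed into the database.
function collectJoinedTitles(joinedTitles) {
  const joinedTitleDatabase = [];
  for (const title of joinedTitles) {
    if (hasApprovedNameType(title)) joinedTitleDatabase.push({ title });
  }
  return joinedTitleDatabase;
}
```

Keeping the list in one constant means a further name type can be approved later without touching the loop.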
…hist-uni.github.io into dylan-search-function
khemarato left a comment
Okay. Just a few minor nits on the test cases and then this is good to merge :)
    });

  - it('integrated more nikaya indexes for parsing - lets test lal', () => {
  + it('handles the filtering out of a wide range of nikaya indexes', () => {
This test doesn't test a wide range of nikaya indexes. The test is fine, but say what it tests. It tests that it can parse a Lal sutra.
    id1: {
  -   title: 'DN 22 The Mahāsatipaṭṭhāna Sutta: The Long Discourse about the Ways of Attending to Mindfulness',
  +   title: 'Audio/Video',
      type: 'av',
av is a category, not a type. It still has type: 'content'. Also, to make this test truly a test of this filter, you have to give it a title that looks like a sutta!
    });

  - it('handles the filtering out of a wide range of nikaya indexes', () => {
  + it('tests that it can parse a lal sutra', () => {
The it is describing the function being tested 😅 so just write it('can parse a lal sutra', ... We know it's a test already!
oh my..... took 3 years but we got it!
…On Mon, Apr 27, 2026 at 10:41 AM Khemarato Bhikkhu ***@***.***> wrote:
Merged #648 into main.
Congratulations 😊
Added the ability to check one-word queries against database titles.
For example, culasaccakasutta will fetch -> MN 35 Cūḷa Saccaka Sutta: The Shorter Discourse With Saccaka
The function comes with an extensive regex for parsing both the store object titles and user queries.
It is a good start on this issue, and I hope it keeps people engaged with the site.