Dylan search function #648

Merged
khemarato merged 28 commits into buddhist-uni:main from DylanAustin-TheDreamer:dylan-search-function
Apr 27, 2026

@DylanAustin-TheDreamer
Contributor

Added the ability to check one-word queries against database titles.

culasaccakasutta will fetch -> MN 35 Cūḷa Saccaka Sutta: The Shorter Discourse With Saccaka

The function comes with an extensive regex for parsing both the store object titles and user queries.

It is a good start on this issue, and I hope it keeps people engaged with the site.

@netlify

netlify Bot commented Apr 11, 2026

Deploy Preview for obu ready!

Name Link
🔨 Latest commit a2d8a89
🔍 Latest deploy log https://app.netlify.com/projects/obu/deploys/69ef290534d59b000752264a
😎 Deploy Preview https://deploy-preview-648--obu.netlify.app

Collaborator

@khemarato left a comment

Okay. A good start! Comments inline.

Comment thread assets/js/search_functions.js Outdated
});
});
}
finalResults = tokenResults.length ? tokenResults : results;
Collaborator

Only run findOneWordTitleMatches if the normal search returns no results and "sutta" appears in the query.
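
A minimal sketch of that gating, assuming the regular search and the title matcher can be passed in as functions. The names `searchWithFallback`, `normalSearch`, and `titleFallback` are hypothetical and not taken from the PR:

```javascript
// Sketch only: `normalSearch` stands in for the regular index search and
// `titleFallback` for findOneWordTitleMatches. Both are passed in so the
// example runs on its own.
function searchWithFallback(query, store, normalSearch, titleFallback) {
  const results = normalSearch(query, store);
  // The normal search found something, so the fallback is never touched.
  if (results.length > 0) return results;
  // No results: only try the one-word title match when the query mentions "sutta".
  if (!/sutta/i.test(query)) return results;
  return titleFallback(query.trim(), store);
}
```

With this shape the title matcher is called at most once per query, and never when the normal search already has hits.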

Comment thread assets/js/search_functions.js Outdated
for (var i in store){
const item = store[i];
const title = (item && item.title) ? item.title : "";
const titleMatch = title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").replace(/^\s*(?:DN|MN|SN|AN|SNP|DHP|ITI|THAG|THIG|UD)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "").replace(/(\bsutta\b).*$/i, "$1").toLowerCase().replace(/[^a-z0-9]/g, "");
Collaborator

Split this normalization off into a function and add tests for it showing how you expect it to work. Then use that function to precompute these normalized strings in the index build. This will make searches faster and will allow us to inspect the normalized strings for errors.
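
As a rough sketch of the first step, the chain from the reviewed line can be lifted into a named function unchanged. The name `normalizeSuttaTitle` is an assumption for illustration, not the name the PR ended up using:

```javascript
// Sketch: the same replace chain as the reviewed line, one step per line.
// `normalizeSuttaTitle` is a hypothetical name.
function normalizeSuttaTitle(title) {
  return (title || "")
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining diacritics (ū -> u, ḷ -> l)
    .replace(/^\s*(?:DN|MN|SN|AN|SNP|DHP|ITI|THAG|THIG|UD)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "") // drop the leading collection reference
    .replace(/(\bsutta\b).*$/i, "$1") // drop everything after the word "sutta"
    .toLowerCase()
    .replace(/[^a-z0-9]/g, ""); // join what is left into one word
}

// The example from the PR description:
console.assert(
  normalizeSuttaTitle("MN 35 Cūḷa Saccaka Sutta: The Shorter Discourse With Saccaka") === "culasaccakasutta"
);
```

Once the logic lives in one function, each expected input/output pair can become its own test case.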

Comment thread assets/js/search_functions.js Outdated
var BMAX = 250; // Max blurb size in characters
var RMAX = 100; // Max number of results to display

const suttaFinder = '<a href="https://name.readingfaithfully.org/" class="btn" target="_blank">Sutta Finder</a>'
Collaborator

Remove this unused variable.

Comment thread assets/js/search_functions.js Outdated
}
}

function findOneWordTitleMatches(query, store) {
Collaborator

Let's rename this to findOneWordSuttaTitleMatches to make it clear it's only looking at Pāli suttas.

Comment thread assets/js/search_functions.js Outdated
});
});
}
finalResults = results.length ? results : tokenResults = findOneWordSuttaTitleMatches(data.q.trim(), store);
Collaborator

Why are you calling findOneWordSuttaTitleMatches twice?

Comment thread assets/js/search_functions.js Outdated
function findOneWordSuttaTitleMatches(query, store) {
var tokenResults = [];
const normalizedQuery = query.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().replace(/[^a-z0-9]/g, "");
for (var i in store){
Collaborator

This loop should run once at index build, not on every message reply.
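
One way to picture the split, as a sketch only: the loop over the store runs once when the index is built, and each query then normalizes only itself. The names `buildJoinedTitles` and `lookupJoinedTitle` are hypothetical, and exact string equality is an assumed matching rule:

```javascript
// Sketch: run once at index build. `normalize` is whatever title-normalizing
// function the project uses; it is passed in to keep the example standalone.
function buildJoinedTitles(store, normalize) {
  const joined = [];
  for (const id of Object.keys(store)) {
    const title = (store[id] && store[id].title) || "";
    joined.push({ id, title: normalize(title) });
  }
  return joined;
}

// Run per query: no loop over raw titles, only a comparison against the
// precomputed strings. Exact equality is an assumption for this sketch.
function lookupJoinedTitle(query, joined, normalize) {
  const needle = normalize(query);
  return joined.filter((entry) => entry.title === needle);
}
```

Because the precomputed array is plain data, it can also be dumped and inspected for bad normalizations.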

Collaborator

@khemarato left a comment

Okay. This is getting quite close to merge-worthy. Will take a closer look at your parsing logic tomorrow.

Comment thread assets/js/search_index.js
}
});

joinedTitles = normalizeSuttaTitles(store);
Collaborator

Oh yeah. That's what I meant! You got it 🙂

@DylanAustin-TheDreamer
Contributor Author

DylanAustin-TheDreamer commented Apr 15, 2026 via email

Comment thread assets/js/search_functions.js Outdated
const item = obj[i];
if (!item || item.type !== "content" || item.category !== "canon") continue;
const title = item.title || "";
const titleJoin = title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").replace(/^\s*(?:DN|MN|SN|AN|SNP|DHP|ITI|THAG|THIG|UD)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "").replace(/(\bsutta\b).*$/i, "$1").toLowerCase().replace(/[^a-z0-9]/g, "");
Collaborator

Right now there are some bugs in this function. A quick sampling:

  • ma220 currently returns ma220aritthasutrathediscourseonknowingthebetterwaytocatchasnake when it should return aritthasutra
  • ma128 gives ma128upasakasutradiscourseonthewhitecladdisciple instead of upasakasutra
  • thequestionsofkingmalindaanabridgementofthemilindapanha should probably be skipped
  • themahasatipatthanasutta, theuppatipatikasutta, and theyogasutra shouldn't include the leading "the"
  • The Thig / Thag entries are not parsed correctly. For example: subhajivakambavanikatherigathasubhaofjivakasmangogrove or punnatherigathapunnika
  • ma80 should just be filtered out instead of giving ma80theroughcloth
  • lal26 gives lal26dharmacakrapravartanasutrathediscoursethatsetthedharmawheelrolling instead of dharmacakrapravartanasutra

Add tests for these cases and fix the implementation so that they pass. In our call tomorrow morning, I can show you how I found these.
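
A hedged sketch of what tests for some of these cases could look like. The raw titles below are reconstructed guesses at the store's format (only the DN 22 title appears verbatim elsewhere in this thread), and `joinSuttaTitle` is a hypothetical helper, not the PR's implementation. The Thag / Thig cases are left out because their raw title format is not shown here:

```javascript
// Hypothetical helper: returns the joined name, or null when the title
// should be skipped. The prefix list is an assumed subset.
const PREFIX = /^\s*(?:DN|MN|SN|AN|MA|LAL|SNP|DHP|ITI|THAG|THIG|UD)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i;

function joinSuttaTitle(title) {
  const joined = (title || "")
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // strip diacritics
    .replace(PREFIX, "")             // drop "MA 220", "LAL 26", ...
    .replace(/\s*:\s.*$/, "")        // drop the English subtitle after the colon
    .replace(/^\s*the\s+/i, "")      // drop a leading "the"
    .toLowerCase()
    .replace(/[^a-z0-9]/g, "");
  // Skip anything that does not end in a recognised name type.
  return /(sutta|sutra|gatha)$/.test(joined) ? joined : null;
}

// [assumed raw title, expected result]
const cases = [
  ["MA 220 Ariṭṭha Sūtra: The Discourse on Knowing the Better Way to Catch a Snake", "aritthasutra"],
  ["MA 128 Upāsaka Sūtra: Discourse on the White-Clad Disciple", "upasakasutra"],
  ["The Questions of King Malinda: An Abridgement of the Milindapañha", null],
  ["DN 22 The Mahāsatipaṭṭhāna Sutta: The Long Discourse about the Ways of Attending to Mindfulness", "mahasatipatthanasutta"],
  ["MA 80 The Rough Cloth", null],
  ["LAL 26 Dharmacakrapravartana Sūtra: The Discourse That Set the Dharma-Wheel Rolling", "dharmacakrapravartanasutra"],
];

for (const [title, expected] of cases) {
  console.assert(joinSuttaTitle(title) === expected, title);
}
```

Keeping the cases in a table makes it cheap to add another row each time a bad normalized string turns up.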

@DylanAustin-TheDreamer
Contributor Author

DylanAustin-TheDreamer commented Apr 15, 2026 via email

Collaborator

@khemarato left a comment

Getting close! 😸

Comment thread assets/js/tests/search-worker.test.js Outdated
assert.equal(result[0].title, 'upasakasutra');
});

it('integrated more nikaya indexes for parsing - lets test lal', () => {
Collaborator

the it() strings are supposed to read like an English sentence. Obviously that's not required by the machine. It doesn't care. But as a courtesy to your human readers (in this case, me), the it() string should describe the behavior your test addresses as if you were speaking to them (which, of course, you are).

To quote the programming "bible":

a computer language is not just a way of getting a computer to perform operations but rather it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.
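
To illustrate the point about it() strings, here is a small sketch with a stand-in `it` so that it runs without a test framework. The `joinTitle` function and the stand-in runner are both hypothetical:

```javascript
// Minimal stand-in for a test runner's it(), only so this sketch runs alone.
const report = [];
function it(description, fn) {
  try {
    fn();
    report.push(`ok - it ${description}`);
  } catch (err) {
    report.push(`FAIL - it ${description}`);
  }
}

// Hypothetical function under test.
const joinTitle = (title) => title.toLowerCase().replace(/[^a-z]/g, "");

// Reads as a sentence about behavior: "it joins the words of a title".
it('joins the words of a title', () => {
  if (joinTitle('Upasaka Sutra') !== 'upasakasutra') throw new Error('mismatch');
});
```

Read aloud, the report line is a plain statement of what the function does, which is the property being asked for.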

Comment thread assets/js/tests/search-worker.test.js Outdated
assert.equal(result[0].title, 'culasaccakasutta');
});

it('remove words after sutta and sutra but using : as a reference', () => {
Collaborator

This test only covers sutras, and you don't need to mention the implementation details. Just describe what you are testing. In this case, I'd just say 'also handles sutras'.

Comment thread assets/js/tests/search-worker.test.js Outdated
assert.equal(result[0].title, 'dharmacakrapravartanasutra');
});

it('it returns a joined title from a thig nikaya leading discourse', () => {
Collaborator

'can parse Therigathas'

Comment thread assets/js/tests/search-worker.test.js Outdated
assert.equal(result[0].title, 'bhalliyatheragatha');
});

it('handles "the" and removes it from a string if it appears at the beginning', () => {
Collaborator

👍 Great!

Comment thread assets/js/search_functions.js Outdated
const title = item.title || "";
const titleJoin = title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").replace(/^\s*(?:DN|MN|SN|AN|KN|LAL|DA|MA|SA|EA|SNP|DHP|ITI|THAG|THIG|UD|NIDD|CV|BV|AP|JA|PV|VV|KP|PTS)\s*\d+(?:\.\d+)?\s*[:.-]?\s*/i, "").replace(/\s*[:\-–]\s*.*$/, "").toLowerCase().replace(/[^a-z0-9]/g, "");
const removedTheOnJoin = titleJoin.replace(/^\s*(?:the)\s*/i, "");
joinedTitleDatabase.push({
Collaborator

To filter out the non-Sanskrit/Pāli titles, you could probably just add a check here, something like if (removedTheOnJoin.includes('sutta') || removedTheOnJoin.includes('sutra') || removedTheOnJoin.includes('gatha')) { and then do the .push({ inside it. This will make sure we aren't adding anything that doesn't have one of the "approved" name types.
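
A small sketch of that check, pulled into a helper so it can be tested on its own. The helper name `hasApprovedNameType` and the sample array are assumptions for illustration; the sample strings are joined titles quoted earlier in this thread:

```javascript
// Sketch: keep a joined title only if it contains an "approved" name type.
const APPROVED_NAME_TYPES = ["sutta", "sutra", "gatha"];

function hasApprovedNameType(joinedTitle) {
  return APPROVED_NAME_TYPES.some((type) => joinedTitle.includes(type));
}

// Joined titles taken from examples in this review, for illustration only.
const candidates = ["upasakasutra", "ma80theroughcloth", "bhalliyatheragatha"];
const approved = candidates.filter(hasApprovedNameType);
```

Here `approved` keeps the sutra and the gatha entry and drops the one with no recognised name type.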

Collaborator

@khemarato left a comment

Okay. Just a few minor nits on the test cases and then this is good to merge :)

Comment thread assets/js/tests/search-worker.test.js Outdated
});

- it('integrated more nikaya indexes for parsing - lets test lal', () => {
+ it('handles the filtering out of a wide range of nikaya indexes', () => {
Collaborator

This test doesn't test a wide range of nikaya indexes. The test is fine, but say what it tests. It tests that it can parse a Lal sutra.

Comment thread assets/js/tests/search-worker.test.js Outdated
id1: {
title: 'DN 22 The Mahāsatipaṭṭhāna Sutta: The Long Discourse about the Ways of Attending to Mindfulness',
title: 'Audio/Video',
type: 'av',
Collaborator

@khemarato Apr 27, 2026

av is a category, not a type. The item still has type: 'content'. Also, to make this test truly a test of this filter, you have to give it a title that looks like a sutta!
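
A sketch of a fixture along those lines: the title looks like a sutta and the type is 'content', so only the category check can reject the item. The `isCanonContent` helper is a hypothetical stand-in whose condition mirrors the filter quoted earlier in this review:

```javascript
// Fixture: a sutta-looking title with type 'content' but category 'av'.
const store = {
  id1: {
    title: 'DN 22 The Mahāsatipaṭṭhāna Sutta: The Long Discourse about the Ways of Attending to Mindfulness',
    type: 'content',
    category: 'av',
  },
};

// Stand-in for the filter in the reviewed loop: keep only canon content.
function isCanonContent(item) {
  return Boolean(item) && item.type === 'content' && item.category === 'canon';
}

const keptItems = Object.values(store).filter(isCanonContent);
```

If the category check were removed, this item would slip through, which is what makes the fixture a real test of the filter.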

Comment thread assets/js/tests/search-worker.test.js Outdated
});

- it('handles the filtering out of a wide range of nikaya indexes', () => {
+ it('tests that it can parse a lal sutra', () => {
Collaborator

The "it" refers to the function being tested 😅 so this should be it('can parse a lal sutra', ...). We know it's a test already!

@khemarato merged commit dee7f08 into buddhist-uni:main on Apr 27, 2026
4 checks passed
@DylanAustin-TheDreamer
Contributor Author

DylanAustin-TheDreamer commented Apr 27, 2026 via email

@khemarato
Collaborator

Congratulations 😊
