Skip to content

Conversation

SamSalvatico
Copy link

Summary

Please note: I am opening this PR replacing the #7718 to ensure that all commits are signed

As discussed through the Discord channel, I've added a GIN index on the identities column of the users table to speed up this query

const findUserByIdentity = async (target: string, userId: string) =>
    pool.maybeOne<User>(
      sql`
        select ${sql.join(Object.values(fields), sql`,`)}
        from ${table}
        where ${fields.identities}::json#>>'{${sql.identifier([target])},userId}' = ${userId}
      `
    );

That starts slowing down while users are incrementing

Testing

Tested locally running tests and manually invoking APIs

Checklist

  • .changeset
  • unit tests
  • integration tests
  • necessary TSDoc comments
Copy link

github-actions bot commented Sep 15, 2025

COMPARE TO master

Total Size Diff 📈 +502 Bytes

Diff by File
Name Diff
packages/schemas/alterations/next-1757660909-add-identities-index.ts 📈 +428 Bytes
packages/schemas/tables/users.sql 📈 +74 Bytes
@github-actions github-actions bot added size/s and removed size/s labels Sep 15, 2025
@SamSalvatico
Copy link
Author

My fault this time, I named the file as sql instead of ts. Sorry

@SamSalvatico
Copy link
Author

I'm sorry but I can't understand what's going on
I see

/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/feat/add-identities-index*:refs/remotes/origin/feat/add-identities-index* +refs/tags/feat/add-identities-index*:refs/tags/feat/add-identities-index*

That is not able to retrieve the branch
But I didn't add the branch to the original repo, I've just added it to my fork

Probably I'm missing something, but I can't understand what is it

@simeng-li
Copy link
Contributor

Hi, @SamSalvatico thanks for your contribution. The PR itself looks straightforward to me. However, I’m not sure about the effectiveness of creating a GIN index on the entire user.identities JSONB field.

In the findUserByIdentity query method, we use the json#>> operator, which doesn’t seem to take advantage of a GIN (JSONB) index. Instead, it always runs a sequential scan. Do you have any references or test results showing that creating a GIN index on the entire identities JSONB field could improve query performance?

@simeng-li
Copy link
Contributor

simeng-li commented Sep 24, 2025

I'm sorry but I can't understand what's going on I see

/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/feat/add-identities-index*:refs/remotes/origin/feat/add-identities-index* +refs/tags/feat/add-identities-index*:refs/tags/feat/add-identities-index*

That is not able to retrieve the branch But I didn't add the branch to the original repo, I've just added it to my fork

Probably I'm missing something, but I can't understand what is it

You need to bump the timestamp in the alteration script file you added. Since other alteration scripts have already been merged into master, the CI job compares the head with the master branch to determine which scripts to run. The timestamp in this PR may be outdated and could be rejected.

@simeng-li simeng-li closed this Sep 24, 2025
@simeng-li simeng-li reopened this Sep 24, 2025
@github-actions github-actions bot added size/s and removed size/s labels Sep 24, 2025
@SamSalvatico
Copy link
Author

SamSalvatico commented Sep 24, 2025

I'm sorry but I can't understand what's going on I see

/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/feat/add-identities-index*:refs/remotes/origin/feat/add-identities-index* +refs/tags/feat/add-identities-index*:refs/tags/feat/add-identities-index*

That is not able to retrieve the branch But I didn't add the branch to the original repo, I've just added it to my fork
Probably I'm missing something, but I can't understand what is it

You need to bump the timestamp in the alteration script file you added. Since other alteration scripts have already been merged into master, the CI job compares the head with the master branch to determine which scripts to run. The timestamp in this PR may be outdated and could be rejected.

I did update the timestamp and it was the latest in the list, before the merge, but was still failing.

Anyway, we introduced that index in our fork, but we noticed it wasn't hit as expected.
After further analysis, we've discovered that it was because of the Row Security Policies that exist for the users table, that links it to the tenants one.

Given that we had no luck with different indexes, we tried another approach.
As you can see, the query is containing a ::json cast.

where ${fields.identities}::json#>>'{${sql.identifier([target])},userId}' = ${userId}

But we noticed that the identities column, on the db, is declared as JSONB, so it is forcing the casting to the JSON type at runtime, with no specific needs.

We tried removing that cast both in findUserByIdentity and in hasUserWithIdentity, and, executing the findUserByIdentityquery on the same db (about 50k users, all of them containing one identity each), with or without the casting, we noticed that the response time was more than 10X faster (~600ms with cast, ~14/15ms without casting).

What do you think? Would be an idea to "fix" that query before moving on with the indexes?

Or is there a reason for the ::json cast to happen?

@simeng-li
Copy link
Contributor

I'm sorry but I can't understand what's going on I see

/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/feat/add-identities-index*:refs/remotes/origin/feat/add-identities-index* +refs/tags/feat/add-identities-index*:refs/tags/feat/add-identities-index*

That is not able to retrieve the branch But I didn't add the branch to the original repo, I've just added it to my fork
Probably I'm missing something, but I can't understand what is it

You need to bump the timestamp in the alteration script file you added. Since other alteration scripts have already been merged into master, the CI job compares the head with the master branch to determine which scripts to run. The timestamp in this PR may be outdated and could be rejected.

I did update the timestamp and it was the latest in the list, before the merge, but was still failing.

Anyway, we introduced that index in our fork, but we noticed it wasn't hit as expected. After further analysis, we've discovered that it was because of the Row Security Policies that exist for the users table, that links it to the tenants one.

Given that we had no luck with different indexes, we tried another approach. As you can see, the query is containing a ::json cast.

where ${fields.identities}::json#>>'{${sql.identifier([target])},userId}' = ${userId}

But we noticed that the identities column, on the db, is declared as JSONB, so it is forcing the casting to the JSON type at runtime, with no specific needs.

We tried removing that cast both in findUserByIdentity and in hasUserWithIdentity, and, executing the findUserByIdentityquery on the same db (about 50k users, all of them containing one identity each), with or without the casting, we noticed that the response time was more than 10X faster (~600ms with cast, ~14/15ms without casting).

What do you think? Would be an idea to "fix" that query before moving on with the indexes?

Or is there a reason for the ::json cast to happen?

Thanks for sharing — that’s a very good point. Since this query is quite old, I might not have enough context at the moment, l et me check with the team and get back to you shortly.

By the way, regarding the failing CI job, I suspect it may be related to repository access rules. Once we’ve agreed on the code updates, I’ll take a closer look. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2 participants