From: Zi Yan <ziy@nvidia.com>
To: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Song Liu <songliubraving@fb.com>
Cc: Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Lorenzo Stoakes <ljs@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org,
Liam Howlett <liam@infradead.org>
Subject: [PATCH v6 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files
Date: Sun, 17 May 2026 09:54:15 -0400
Message-ID: <20260517135416.1434539-14-ziy@nvidia.com>
In-Reply-To: <20260517135416.1434539-1-ziy@nvidia.com>

collapse_file() is capable of collapsing pagecache folios from writable
files to PMD folios. Now enable clean pagecache folio collapse in
addition to read-only pagecache folio collapse by removing the
inode_is_open_for_write() check from file_thp_enabled() and performing
filemap_flush() only if the file is read-only (not open for write).

This means userspace needs to explicitly flush the content of pagecache
folios before khugepaged can collapse them, or use
madvise(MADV_COLLAPSE), which does the flush in the retry. The reason is
that blindly enabling collapse of dirty pagecache folios from writable
files would make khugepaged flush these folios all the time. Causing
system-wide pagecache flushes is undesirable.
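
As an illustration of the userspace side described above, here is a minimal sketch, not part of the patch. It assumes fsync() is the explicit flush a program would use, that the PMD size is 2 MiB, and that MADV_COLLAPSE has the UAPI value 25 when the libc headers do not define it. The helper name flush_then_collapse() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* assumption: value from the Linux UAPI headers */
#endif

/* Assumption: 2 MiB PMD size (e.g. x86-64 or arm64 with 4 KiB base pages). */
#define ASSUMED_PMD_SIZE (2UL << 20)

/*
 * Hypothetical helper: write back the file's dirty pagecache, map the file
 * at a PMD-aligned address and request a synchronous collapse.
 *
 * Returns 0 if the setup (fsync + mmap) succeeded, -1 otherwise.
 * *collapse_errno is set to 0 if madvise(MADV_COLLAPSE) succeeded, or to
 * the errno it reported (e.g. EINVAL on kernels without MADV_COLLAPSE).
 */
static int flush_then_collapse(int fd, size_t len, int *collapse_errno)
{
	/* Make the folios clean so that khugepaged does not skip them. */
	if (fsync(fd) != 0)
		return -1;

	/* Reserve address space large enough to contain an aligned range. */
	size_t span = len + ASSUMED_PMD_SIZE;
	char *reserve = mmap(NULL, span, PROT_NONE,
			     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (reserve == MAP_FAILED)
		return -1;

	uintptr_t mask = ASSUMED_PMD_SIZE - 1;
	char *aligned = (char *)(((uintptr_t)reserve + mask) & ~mask);

	/* Place the file mapping on top of the reservation. */
	void *addr = mmap(aligned, len, PROT_READ, MAP_SHARED | MAP_FIXED,
			  fd, 0);
	if (addr == MAP_FAILED) {
		munmap(reserve, span);
		return -1;
	}

	/* Best effort: the result depends on kernel and filesystem support. */
	*collapse_errno = madvise(addr, len, MADV_COLLAPSE) == 0 ? 0 : errno;

	/* Unmapping the reservation also drops the file mapping inside it. */
	munmap(reserve, span);
	return 0;
}
```

A caller would open the file, finish its writes, and then call flush_then_collapse(fd, len, &err) with len a multiple of the PMD size. A program that relies on khugepaged rather than MADV_COLLAPSE would only need the fsync() step and would keep its own mapping in place.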

To properly support dirty pagecache folio collapse, filemap_flush() needs
to be avoided. Potentially, merging the associated buffers instead of
dropping them with filemap_release_folio() might be needed.

NOTE: this breaks the khugepaged selftests for writable-file pagecache
collapse, which are set to expect failure every time. The next commit
fixes them.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: David Sterba <dsterba@suse.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
---
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 15 +++++++++------
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d055f53be8502..c565b2a651e06 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -97,7 +97,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	if (!mapping_pmd_folio_support(vma->vm_file->f_mapping))
 		return false;
 
-	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
+	return S_ISREG(inode->i_mode);
 }
 
 /* If returns true, we are unable to access the VMA's folios. */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c743ec41a7b8b..395c40c24dbc5 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2342,18 +2342,21 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			} else if (folio_test_dirty(folio)) {
 				/*
 				 * This page is dirty because it hasn't
-				 * been flushed since first write. There
-				 * won't be new dirty pages.
+				 * been flushed since first write.
 				 *
-				 * Trigger async flush here and hope the
-				 * writeback is done when khugepaged
-				 * revisits this page.
+				 * Trigger async flush for read-only files and
+				 * hope the writeback is done when khugepaged
+				 * revisits this page. Writable files can have
+				 * their folios dirty at any time; blindly
+				 * flushing them would cause undesirable
+				 * system-wide writeback.
 				 *
 				 * This is a one-off situation. We are not
 				 * forcing writeback in loop.
 				 */
 				xas_unlock_irq(&xas);
-				filemap_flush(mapping);
+				if (!inode_is_open_for_write(mapping->host))
+					filemap_flush(mapping);
 				result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
 				goto xa_unlocked;
 			} else if (folio_test_writeback(folio)) {
--
2.53.0
Thread overview: 19+ messages
2026-05-17 13:54 [PATCH v6 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files Zi Yan
2026-05-17 13:54 ` [PATCH v6 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-05-17 13:54 ` [PATCH v6 02/14] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
2026-05-17 13:54 ` [PATCH v6 03/14] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-05-17 13:54 ` [PATCH v6 04/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled() Zi Yan
2026-05-17 13:54 ` [PATCH v6 05/14] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-05-17 13:54 ` [PATCH v6 06/14] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-05-17 13:54 ` [PATCH v6 07/14] fs: remove nr_thps from struct address_space Zi Yan
2026-05-17 13:54 ` [PATCH v6 08/14] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-05-17 13:54 ` [PATCH v6 09/14] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-05-17 13:54 ` [PATCH v6 10/14] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-05-17 13:54 ` [PATCH v6 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-05-17 13:54 ` [PATCH v6 12/14] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
2026-05-17 13:54 ` Zi Yan [this message]
2026-05-17 13:54 ` [PATCH v6 14/14] selftests/mm: add writable-file collapse tests for khugepaged Zi Yan
2026-05-18 22:21 ` [PATCH v6 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files Andrew Morton
2026-05-18 23:39 ` Zi Yan
2026-05-19 0:45 ` Andrew Morton
2026-05-19 0:59 ` Zi Yan