mm/mglru: simplify and improve dirty writeback handling

Right now the flusher wakeup mechanism for MGLRU is less responsive and
less likely to trigger than the one in the classical LRU. The classical
LRU wakes the flusher as soon as one batch of folios passed to
shrink_folio_list() cannot be evicted because the folios are dirty and
still waiting for writeback. MGLRU instead checks and handles this only
after the whole reclaim loop is done.

We previously even saw OOM problems caused by the passive flusher. They
were fixed, but the result is still not perfect [1].

Now that the dirty folio counting and activation routine have been
unified, move the dirty flush into the loop, right after
shrink_folio_list(). This improves performance a lot for workloads
involving heavy writeback, and it prepares for throttling too.

A test with YCSB workloadb showed a major performance improvement:

Before this series:
  Throughput(ops/sec): 62485.02962831822
  AverageLatency(us): 500.9746963330107
  pgpgin 159347462
  workingset_refault_file 34522071

After this commit:
  Throughput(ops/sec): 80857.08510208207
  AverageLatency(us): 386.653262968934
  pgpgin 112233121
  workingset_refault_file 19516246

Performance is a lot better, with significantly fewer refaults. We also
observed a similar or higher performance gain for other real-world
workloads.

We were concerned that the dirty flush could cause more wear on SSDs.
That should not be a problem here: the wakeup only happens once dirty
folios have been pushed to the tail of the LRU, which indicates that
memory pressure is so high that writeback is already blocking the
workload.

Link: https://lore.kernel.org/20260428-mglru-reclaim-v7-11-02fabb92dc43@tencent.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Link: https://lore.kernel.org/linux-mm/20241026115714.1437435-1-jingxiangzeng.cas@gmail.com/ [1]
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chen Ridong <chenridong@huaweicloud.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Stevens <stevensd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Leno Hou <lenohou@gmail.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yafang <laoar.shao@gmail.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
author: Kairui Song <kasong@tencent.com> 2026-04-28 02:07:02 +0800
committer: Andrew Morton <akpm@linux-foundation.org> 2026-05-28 21:31:30 -0700
commit: 7eb22f9f5795e2fd41c5df0d447eaeff03ba76b6 (patch)
tree: 56c689a46cf425382a1f28c98891dcb108d5c471 /mm
parent: 695557f8956cc5018ee48335322552ec4881ec72 (diff)
download: linux-next-history-7eb22f9f5795e2fd41c5df0d447eaeff03ba76b6.tar.gz
1 files changed, 16 insertions, 25 deletions
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e699425c5b064..d26c89546542c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4728,8 +4728,6 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
 				scanned, skipped, isolated,
 				type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
-	if (type == LRU_GEN_FILE)
-		sc->nr.file_taken += isolated;
 
 	*isolatedp = isolated;
 	return scanned;
@@ -4842,12 +4840,27 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 		return scanned;
 retry:
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false, memcg);
-	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
 	sc->nr_reclaimed += reclaimed;
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
 			type_scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
+	/*
+	 * If too many file cache in the coldest generation can't be evicted
+	 * due to being dirty, wake up the flusher.
+	 */
+	if (stat.nr_unqueued_dirty == isolated) {
+		wakeup_flusher_threads(WB_REASON_VMSCAN);
+
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
+
 	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
 		DEFINE_MIN_SEQ(lruvec);
 
@@ -5004,28 +5017,6 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		cond_resched();
 	}
 
-	/*
-	 * If too many file cache in the coldest generation can't be evicted
-	 * due to being dirty, wake up the flusher.
-	 */
-	if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken) {
-		struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
-		wakeup_flusher_threads(WB_REASON_VMSCAN);
-
-		/*
-		 * For cgroupv1 dirty throttling is achieved by waking up
-		 * the kernel flusher here and later waiting on folios
-		 * which are in writeback to finish (see shrink_folio_list()).
-		 *
-		 * Flusher may not be able to issue writeback quickly
-		 * enough for cgroupv1 writeback throttling to work
-		 * on a large system.
-		 */
-		if (!writeback_throttling_sane(sc))
-			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
-	}
-
 	return need_rotate;
 }
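
The difference between checking the wakeup condition once after the whole reclaim loop and checking it after every batch can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct batch, wakeups_after_loop() and wakeups_per_batch() are hypothetical names made up for the illustration, the model ignores the file/anon distinction that the removed sc->nr.file_taken counter tracked, and the explicit "isolated != 0" guard is an assumption standing in for evict_folios() returning early when nothing was isolated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one shrink_folio_list() batch. */
struct batch {
	unsigned long isolated;        /* folios taken off the LRU */
	unsigned long unqueued_dirty;  /* of those, dirty and not yet queued */
};

/*
 * Old behaviour (modelled): accumulate over the whole loop and test once.
 * Returns the number of flusher wakeups, which is 0 or 1.
 */
static unsigned int wakeups_after_loop(const struct batch *b, size_t n)
{
	unsigned long taken = 0, dirty = 0;

	for (size_t i = 0; i < n; i++) {
		taken += b[i].isolated;
		dirty += b[i].unqueued_dirty;
	}
	return (dirty && dirty == taken) ? 1 : 0;
}

/*
 * New behaviour (modelled): test right after each batch, so a single
 * fully dirty batch is enough to wake the flusher.
 */
static unsigned int wakeups_per_batch(const struct batch *b, size_t n)
{
	unsigned int wakeups = 0;

	for (size_t i = 0; i < n; i++) {
		/* Assumed guard: an empty batch never reaches the check. */
		bool all_dirty = b[i].isolated &&
				 b[i].unqueued_dirty == b[i].isolated;

		if (all_dirty)
			wakeups++;
	}
	return wakeups;
}
```

With batches of {32, 32}, {32, 0} and {32, 32}, the aggregated check never fires because the totals differ (64 dirty out of 96 taken), while the per-batch check fires twice. That matches the commit message's point that the old placement made the wakeup unlikely to trigger.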