summary refs log tree commit diff
path: root/kernel
diff options
context:
space:
mode:
authorMuchun Song <songmuchun@bytedance.com>2021-04-29 22:56:02 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2021-04-30 11:20:37 -0700
commit27faca83a7e955e4e0b831d75a8a9a840fe9bae4 (patch)
tree25fb75d43422c6e5520e631696e1c74a1efc3f88 /kernel
parent2840d498e30ce53a3a7cb482a5445efd892c7697 (diff)
downloadlinux-27faca83a7e955e4e0b831d75a8a9a840fe9bae4.tar.gz
mm: memcontrol: fix kernel stack account
For simplification commit 991e7673859e ("mm: memcontrol: account kernel
stack per node") changed the per zone vmalloc backed stack pages
accounting to per node.

By doing that we have lost a certain precision because those pages might
live in different NUMA nodes.  In the end NR_KERNEL_STACK_KB exported to
the userspace might be over estimated on some nodes while underestimated
on others.  But this is not a real world problem, just a problem found
by reading the code.  So there is no actual data to showing how much
impact it has on users.

This doesn't impose any real problem to correctnes of the kernel
behavior as the counter is not used for any internal processing but it
can cause some confusion to the userspace.

Address the problem by accounting each vmalloc backing page to its own
node.

Link: https://lkml.kernel.org/r/20210303151843.81156-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'kernel')
-rw-r--r--kernel/fork.c13
1 files changed, 8 insertions, 5 deletions
diff --git a/kernel/fork.c b/kernel/fork.c
index 0a5d28fe9990..771e0ea90499 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -380,14 +380,17 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
 	void *stack = task_stack_page(tsk);
 	struct vm_struct *vm = task_stack_vm_area(tsk);
 
+	if (vm) {
+		int i;
 
-	/* All stack pages are in the same node. */
-	if (vm)
-		mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB,
-				      account * (THREAD_SIZE / 1024));
-	else
+		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
+			mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB,
+					      account * (PAGE_SIZE / 1024));
+	} else {
+		/* All stack pages are in the same node. */
 		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
 				      account * (THREAD_SIZE / 1024));
+	}
 }
 
 static int memcg_charge_kernel_stack(struct task_struct *tsk)