author Linus Torvalds <torvalds@linux-foundation.org> 2021-11-10 16:10:47 -0800
committer Linus Torvalds <torvalds@linux-foundation.org> 2021-11-10 16:10:47 -0800
commit a41b74451b35f7a6529689760eb8c05241feecbc
tree 3a5985890703e5ef36a698a3284ddecb6d1086c8
parent 6752de1aebee8e73ee9cc31263407fdf0e29c274
parent 61bc346ce64a3864ac55f5d18bdc1572cda4fb18
Merge tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull prctl updates from Christian Brauner:
 "This contains the missing prctl uapi pieces for PR_SCHED_CORE.

  In order to activate core scheduling the caller is expected to specify
  the scope of the new core scheduling domain.

  For example, passing 2 in the 4th argument of

     prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, <pid>,  2, 0);

  would indicate that the new core scheduling domain encompasses all
  tasks in the process group of <pid>. Specifying 0 would only create a
  core scheduling domain for the thread identified by <pid> and 1 would
  encompass the whole thread-group of <pid>.

  Note, the values 0, 1, and 2 correspond to PIDTYPE_PID, PIDTYPE_TGID,
  and PIDTYPE_PGID. A first version tried to expose those values
  directly, to which I objected because:

   - PIDTYPE_* is an enum that is kernel internal which we should not
     expose to userspace directly.

   - PIDTYPE_* indicates what a given struct pid is used for; it doesn't
     express a scope.

  But what the 4th argument of PR_SCHED_CORE prctl() expresses is the
  scope of the operation, i.e. the scope of the core scheduling domain
  at creation time. So Eugene's patch now simply introduces three new
  defines PR_SCHED_CORE_SCOPE_THREAD, PR_SCHED_CORE_SCOPE_THREAD_GROUP,
  and PR_SCHED_CORE_SCOPE_PROCESS_GROUP. They simply express what
  happens.

  This has been on the mailing list for quite a while with all relevant
  scheduler folks Cc'ed. I announced multiple times that I'd pick this up
  if I don't see or hear anyone else doing it. None of this touches
  proper scheduler code but only concerns uapi so I think this is fine.

  With core scheduling being quite common now for VM managers (e.g.
  moving individual vCPU threads into their own core scheduling domain)
  and container managers (e.g. moving the init process into its own core
  scheduling domain and letting all created children inherit it), having
  to rely on raw numbers passed as the 4th argument in prctl() is a bit
  annoying, and everyone is starting to come up with their own defines."

* tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument
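
For illustration only, here is a minimal userspace sketch of how a caller
might use the new scope macros instead of raw numbers. It is not part of
this merge. The helper names core_sched_create() and
core_sched_scope_name() are hypothetical, and the fallback values are an
assumption for toolchains whose headers predate these definitions; they
mirror the values in the uapi header.

```c
#include <assert.h>     /* static_assert (C11) */
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>

/* Fallbacks (assumption: headers older than the core scheduling uapi). */
#ifndef PR_SCHED_CORE
# define PR_SCHED_CORE          62
# define PR_SCHED_CORE_CREATE   1
#endif
#ifndef PR_SCHED_CORE_SCOPE_THREAD
# define PR_SCHED_CORE_SCOPE_THREAD         0
# define PR_SCHED_CORE_SCOPE_THREAD_GROUP   1
# define PR_SCHED_CORE_SCOPE_PROCESS_GROUP  2
#endif

/* Userspace analogue of the kernel's BUILD_BUG_ON checks: fail the build
 * if the scope values ever stop being the contiguous range 0..2. */
static_assert(PR_SCHED_CORE_SCOPE_THREAD == 0, "unexpected thread scope");
static_assert(PR_SCHED_CORE_SCOPE_THREAD_GROUP == 1, "unexpected thread-group scope");
static_assert(PR_SCHED_CORE_SCOPE_PROCESS_GROUP == 2, "unexpected process-group scope");

/* Hypothetical helper: human-readable name for a scope value. */
const char *core_sched_scope_name(unsigned long scope)
{
    switch (scope) {
    case PR_SCHED_CORE_SCOPE_THREAD:        return "thread";
    case PR_SCHED_CORE_SCOPE_THREAD_GROUP:  return "thread group";
    case PR_SCHED_CORE_SCOPE_PROCESS_GROUP: return "process group";
    default:                                return "invalid";
    }
}

/* Hypothetical helper: create a new core scheduling domain covering the
 * given scope of pid. Returns 0 on success or a negative errno value. */
int core_sched_create(pid_t pid, unsigned long scope)
{
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE,
              (unsigned long)pid, scope, 0UL) == -1)
        return -errno;
    return 0;
}
```

A caller could then write core_sched_create(pid,
PR_SCHED_CORE_SCOPE_THREAD_GROUP) rather than passing a bare 1. As the
kernel hunk in this merge shows, the call can fail with -ENODEV when SMT
is not present and with -EINVAL for an out-of-range scope, so callers
should check the return value.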
-rw-r--r-- Documentation/admin-guide/hw-vuln/core-scheduling.rst 5
-rw-r--r-- include/uapi/linux/prctl.h 3
-rw-r--r-- kernel/sched/core_sched.c 4
3 files changed, 10 insertions, 2 deletions
diff --git a/Documentation/admin-guide/hw-vuln/core-scheduling.rst b/Documentation/admin-guide/hw-vuln/core-scheduling.rst
index 0febe458597c..cf1eeefdfc32 100644
--- a/Documentation/admin-guide/hw-vuln/core-scheduling.rst
+++ b/Documentation/admin-guide/hw-vuln/core-scheduling.rst
@@ -61,8 +61,9 @@ arg3:
     ``pid`` of the task for which the operation applies.
 
 arg4:
-    ``pid_type`` for which the operation applies. It is of type ``enum pid_type``.
-    For example, if arg4 is ``PIDTYPE_TGID``, then the operation of this command
+    ``pid_type`` for which the operation applies. It is one of
+    ``PR_SCHED_CORE_SCOPE_``-prefixed macro constants.  For example, if arg4
+    is ``PR_SCHED_CORE_SCOPE_THREAD_GROUP``, then the operation of this command
     will be performed for all tasks in the task group of ``pid``.
 
 arg5:
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index de45fcd2dcbe..bb73e9a0b24f 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -268,5 +268,8 @@ struct prctl_mm_map {
 # define PR_SCHED_CORE_SHARE_TO		2 /* push core_sched cookie to pid */
 # define PR_SCHED_CORE_SHARE_FROM	3 /* pull core_sched cookie to pid */
 # define PR_SCHED_CORE_MAX		4
+# define PR_SCHED_CORE_SCOPE_THREAD		0
+# define PR_SCHED_CORE_SCOPE_THREAD_GROUP	1
+# define PR_SCHED_CORE_SCOPE_PROCESS_GROUP	2
 
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index 48ac72696012..517f72b008f5 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -135,6 +135,10 @@ int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
 	if (!static_branch_likely(&sched_smt_present))
 		return -ENODEV;
 
+	BUILD_BUG_ON(PR_SCHED_CORE_SCOPE_THREAD != PIDTYPE_PID);
+	BUILD_BUG_ON(PR_SCHED_CORE_SCOPE_THREAD_GROUP != PIDTYPE_TGID);
+	BUILD_BUG_ON(PR_SCHED_CORE_SCOPE_PROCESS_GROUP != PIDTYPE_PGID);
+
 	if (type > PIDTYPE_PGID || cmd >= PR_SCHED_CORE_MAX || pid < 0 ||
 	    (cmd != PR_SCHED_CORE_GET && uaddr))
 		return -EINVAL;