We don't actually need the full rculist.h header in sched.h anymore,
we will be able to include the smaller rcupdate.h header instead.
But first update code that relied on the implicit header inclusion.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.
Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull namespace updates from Eric Biederman:
"There is a lot here. A lot of these changes result in subtle user
visible differences in kernel behavior. I don't expect anything will
care but I will revert/fix things immediately if any regressions show
up.
From Seth Forshee there is a continuation of the work to make the vfs
ready for unpriviled mounts. We had thought the previous changes
prevented the creation of files outside of s_user_ns of a filesystem,
but it turns we missed the O_CREAT path. Ooops.
Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
children that are forked after the prctl are considered and not
children forked before the prctl. The only known user of this prctl
systemd forks all children after the prctl. So no userspace
regressions will occur. Holding earlier forked children to the same
rules as later forked children creates a semantic that is sane enough
to allow checkpoing of processes that use this feature.
There is a long delayed change by Nikolay Borisov to limit inotify
instances inside a user namespace.
Michael Kerrisk extends the API for files used to maniuplate
namespaces with two new trivial ioctls to allow discovery of the
hierachy and properties of namespaces.
Konstantin Khlebnikov with the help of Al Viro adds code that when a
network namespace exits purges it's sysctl entries from the dcache. As
in some circumstances this could use a lot of memory.
Vivek Goyal fixed a bug with stacked filesystems where the permissions
on the wrong inode were being checked.
I continue previous work on ptracing across exec. Allowing a file to
be setuid across exec while being ptraced if the tracer has enough
credentials in the user namespace, and if the process has CAP_SETUID
in it's own namespace. Proc files for setuid or otherwise undumpable
executables are now owned by the root in the user namespace of their
mm. Allowing debugging of setuid applications in containers to work
better.
A bug I introduced with permission checking and automount is now
fixed. The big change is to mark the mounts that the kernel initiates
as a result of an automount. This allows the permission checks in sget
to be safely suppressed for this kind of mount. As the permission
check happened when the original filesystem was mounted.
Finally a special case in the mount namespace is removed preventing
unbounded chains in the mount hash table, and making the semantics
simpler which benefits CRIU.
The vfs fix along with related work in ima and evm I believe makes us
ready to finish developing and merge fully unprivileged mounts of the
fuse filesystem. The cleanups of the mount namespace makes discussing
how to fix the worst case complexity of umount. The stacked filesystem
fixes pave the way for adding multiple mappings for the filesystem
uids so that efficient and safer containers can be implemented"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc/sysctl: Don't grab i_lock under sysctl_lock.
vfs: Use upper filesystem inode in bprm_fill_uid()
proc/sysctl: prune stale dentries during unregistering
mnt: Tuck mounts under others instead of creating shadow/side mounts.
prctl: propagate has_child_subreaper flag to every descendant
introduce the walk_process_tree() helper
nsfs: Add an ioctl() to return owner UID of a userns
fs: Better permission checking for submounts
exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
vfs: open() with O_CREAT should not create inodes with unknown ids
nsfs: Add an ioctl() to return the namespace type
proc: Better ownership of files for non-dumpable tasks in user namespaces
exec: Remove LSM_UNSAFE_PTRACE_CAP
exec: Test the ptracer's saved cred to see if the tracee can gain caps
exec: Don't reset euid and egid when the tracee has CAP_SETUID
inotify: Convert to using per-namespace limits
Pull security layer updates from James Morris:
"Highlights:
- major AppArmor update: policy namespaces & lots of fixes
- add /sys/kernel/security/lsm node for easy detection of loaded LSMs
- SELinux cgroupfs labeling support
- SELinux context mounts on tmpfs, ramfs, devpts within user
namespaces
- improved TPM 2.0 support"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (117 commits)
tpm: declare tpm2_get_pcr_allocation() as static
tpm: Fix expected number of response bytes of TPM1.2 PCR Extend
tpm xen: drop unneeded chip variable
tpm: fix misspelled "facilitate" in module parameter description
tpm_tis: fix the error handling of init_tis()
KEYS: Use memzero_explicit() for secret data
KEYS: Fix an error code in request_master_key()
sign-file: fix build error in sign-file.c with libressl
selinux: allow changing labels for cgroupfs
selinux: fix off-by-one in setprocattr
tpm: silence an array overflow warning
tpm: fix the type of owned field in cap_t
tpm: add securityfs support for TPM 2.0 firmware event log
tpm: enhance read_log_of() to support Physical TPM event log
tpm: enhance TPM 2.0 PCR extend to support multiple banks
tpm: implement TPM 2.0 capability to get active PCR banks
tpm: fix RC value check in tpm2_seal_trusted
tpm_tis: fix iTPM probe via probe_itpm() function
tpm: Begin the process to deprecate user_read_timer
tpm: remove tpm_read_index and tpm_write_index from tpm.h
...
With previous changes every location that tests for
LSM_UNSAFE_PTRACE_CAP also tests for LSM_UNSAFE_PTRACE making the
LSM_UNSAFE_PTRACE_CAP redundant, so remove it.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
I am still tired of having to find indirect ways to determine
what security modules are active on a system. I have added
/sys/kernel/security/lsm, which contains a comma separated
list of the active security modules. No more groping around
in /proc/filesystems or other clever hacks.
Unchanged from previous versions except for being updated
to the latest security next branch.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The kernel build bot turned up a bad config combination when
CONFIG_SECURITY_APPARMOR is y and CONFIG_SECURITY_APPARMOR_HASH is n,
resulting in the build error
security/built-in.o: In function `aa_unpack':
(.text+0x841e2): undefined reference to `aa_g_hash_policy'
Signed-off-by: John Johansen <john.johansen@canonical.com>
If this sysctl is set to non-zero and a process with CAP_MAC_ADMIN in
the root namespace has created an AppArmor policy namespace,
unprivileged processes will be able to change to a profile in the
newly created AppArmor policy namespace and, if the profile allows
CAP_MAC_ADMIN and appropriate file permissions, will be able to load
policy in the respective policy namespace.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Allow a profile to carry extra data that can be queried via userspace.
This provides a means to store extra data in a profile that a trusted
helper can extract and use from live policy.
Signed-off-by: William Hua <william.hua@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
apparmor should be checking the SECURITY_CAP_NOAUDIT constant. Also
in complain mode make it so apparmor can elect to log a message,
informing of the check.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Allow turning off the computation of the policy hashes via the
apparmor.hash_policy kernel parameter.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Moving the use of fqname to later allows learning profiles to be based
on the fqname request instead of just the hname. It also allows cleaning
up some of the name parsing and lookup by allowing the use of
the fqlookupn_profile() lib fn.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The aad macro can replace aad strings when it is not intended to. Switch
to a fn macro so it is only applied when intended.
Also at the same time cleanup audit_data initialization by putting
common boiler plate behind a macro, and dropping the gfp_t parameter
which will become useless.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Having ops be an integer that is an index into an op name table is
awkward and brittle. Every op change requires an edit for both the
op constant and a string in the table. Instead switch to using const
strings directly, eliminating the need for the table that needs to
be kept in sync.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Trying to update the task cred while the task current cred is not the
real cred will result in an error at the cred layer. Avoid this by
failing early and delaying the update.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Verify that profiles in a load set specify the same policy ns and
audit the name of the policy ns that policy is being loaded for.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Store loaded policy and allow introspecting it through apparmorfs. This
has several uses from debugging, policy validation, and policy checkpoint
and restore for containers.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Policy management will be expanded beyond traditional unconfined root.
This will require knowning the profile of the task doing the management
and the ns view.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Prepare for a tighter pairing of user namespaces and apparmor policy
namespaces, by making the ns to be viewed available.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Prepare for a tighter pairing of user namespaces and apparmor policy
namespaces, by making the ns to be viewed available and checking
that the user namespace level is the same as the policy ns level.
This strict pairing will be relaxed once true support of user namespaces
lands.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Borrow the special null device file from selinux to "close" fds that
don't have sufficient permissions at exec time.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Commit 9f834ec18d ("binfmt_elf: switch to new creds when switching to new mm")
changed when the creds are installed by the binfmt_elf handler. This
affects which creds are used to mmap the executable into the address
space. Which can have an affect on apparmor policy.
Add a flag to apparmor at
/sys/kernel/security/apparmor/features/domain/fix_binfmt_elf_mmap
to make it possible to detect this semantic change so that the userspace
tools and the regression test suite can correctly deal with the change.
BugLink: http://bugs.launchpad.net/bugs/1630069
Signed-off-by: John Johansen <john.johansen@canonical.com>
Instead of testing whether a given dfa exists in every code path, have
a default null dfa that is used when loaded policy doesn't provide a
dfa.
This will let us get rid of special casing and avoid dereference bugs
when special casing is missed.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Newer policy will combine the file and policydb dfas, allowing for
better optimizations. However to support older policy we need to
keep the ability to address the "file" dfa separately. So dup
the policydb as if it is the file dfa and set the appropriate start
state.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The dfa is currently setup to be shared (has the basis of refcounting)
but currently can't be because the count can't be increased.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Newer policy encodes more than just version in the version tag,
so add masking to make sure the comparison remains correct.
Note: this is fully compatible with older policy as it will never set
the bits being masked out.
Signed-off-by: John Johansen <john.johansen@canonical.com>
When possible its better to name a learning profile after the missing
profile in question. This allows for both more informative names and
for profile reuse.
Signed-off-by: John Johansen <john.johansen@canonical.com>
prepare_ns() will need to be called from alternate views, and namespaces
will need to be created via different interfaces. So refactor and
allow specifying the view ns.
Signed-off-by: John Johansen <john.johansen@canonical.com>