Commit graph

22 commits

Author SHA1 Message Date
Andrea Arcangeli
30e0fca6c1 [PATCH] ptrace/coredump/exit_group deadlock
I could seldom reproduce a deadlock with a task not killable in T state
(TASK_STOPPED, not TASK_TRACED) by attaching a NPTL threaded program to
gdb, by segfaulting the task and triggering a core dump while some other
task is executing exit_group and while one task is in ptrace_attached
TASK_STOPPED state (not TASK_TRACED yet).  This originated from a gdb
bugreport (the fact gdb was segfaulting the task wasn't a kernel bug), but
I just incidentally noticed the gdb bug triggered a real kernel bug as
well.

Most threads hangs in exit_mm because the core_dumping is still going, the
core dumping hangs because the stopped task doesn't exit, the stopped task
can't wakeup because it has SIGNAL_GROUP_EXIT set, hence the deadlock.

To me it seems that the problem is that the force_sig_specific(SIGKILL) in
zap_threads is a noop if the task has PF_PTRACED set (like in this case
because gdb is attached).  The __ptrace_unlink does nothing because the
signal->flags is set to SIGNAL_GROUP_EXIT|SIGNAL_STOP_DEQUEUED (verified).

The above info also shows that the stopped task hit a race and got the stop
signal (presumably by the ptrace_attach, only the attach, state is still
TASK_STOPPED and gdb hangs waiting the core before it can set it to
TASK_TRACED) after one of the thread invoked the core dump (it's the core
dump that sets signal->flags to SIGNAL_GROUP_EXIT).

So beside the fact nobody would wakeup the task in __ptrace_unlink (the
state is _not_ TASK_TRACED), there's a secondary problem in the signal
handling code, where a task should ignore the ptrace-sigstops as long as
SIGNAL_GROUP_EXIT is set (or the wakeup in __ptrace_unlink path wouldn't be
enough).

So I attempted to make this patch that seems to fix the problem.  There
were various ways to fix it, perhaps you prefer a different one, I just
opted to the one that looked safer to me.

I also removed the clearing of the stopped bits from the zap_other_threads
(zap_other_threads was safe unlike zap_threads).  I don't like useless
code, this whole NPTL signal/ptrace thing is already unreadable enough and
full of corner cases without confusing useless code into it to make it even
less readable.  And if this code is really needed, then you may want to
explain why it's not being done in the other paths that sets
SIGNAL_GROUP_EXIT at least.

Even after this patch I still wonder who serializes the read of
p->ptrace in zap_threads.

Patch is called ptrace-core_dump-exit_group-deadlock-1.

This was the trace I've got:

test          T ffff81003e8118c0     0 14305      1         14311 14309 (NOTLB)
ffff810058ccdde8 0000000000000082 000001f4000037e1 ffff810000000013
       00000000000000f8 ffff81003e811b00 ffff81003e8118c0 ffff810011362100
       0000000000000012 ffff810017ca4180
Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff80141677>{finish_stop+87}
       <ffffffff8014367f>{get_signal_to_deliver+1359} <ffffffff8010d3ad>{do_signal+157}
       <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80111575>{sys_ptrace+2293}
       <ffffffff80131810>{default_wake_function+0} <ffffffff80196399>{sys_ioctl+73}
       <ffffffff8010dd27>{sysret_signal+28} <ffffffff8010e00f>{ptregscall_common+103}

test          D ffff810011362100     0 14309      1         14305 14312 (NOTLB)
ffff810053c81cf8 0000000000000082 0000000000000286 0000000000000001
       0000000000000195 ffff810011362340 ffff810011362100 ffff81002e338040
       ffff810001e0ca80 0000000000000001
Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173}
       <ffffffff80131810>{default_wake_function+0} <ffffffff80137435>{exit_mm+149}
       <ffffffff801381af>{do_exit+479} <ffffffff80138d0c>{do_group_exit+252}
       <ffffffff801436db>{get_signal_to_deliver+1451} <ffffffff8010d3ad>{do_signal+157}
       <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+2

       <ffffffff8014208a>{force_sig_info+186} <ffffffff804479a0>{do_int3+112}
       <ffffffff8010e308>{retint_signal+61}
test          D ffff81002e338040     0 14311      1         14716 14305 (NOTLB)
ffff81005ca8dcf8 0000000000000082 0000000000000286 0000000000000001
       0000000000000120 ffff81002e338280 ffff81002e338040 ffff8100481cb740
       ffff810001e0ca80 0000000000000001
Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173}
       <ffffffff80131810>{default_wake_function+0} <ffffffff80137435>{exit_mm+149}
       <ffffffff801381af>{do_exit+479} <ffffffff80142d0e>{__dequeue_signal+558}
       <ffffffff80138d0c>{do_group_exit+252} <ffffffff801436db>{get_signal_to_deliver+1451}
       <ffffffff8010d3ad>{do_signal+157} <ffffffff8013deee>{ptrace_check_attach+222}
       <ffffffff80140850>{specific_send_sig_info+208} <ffffffff8014208a>{force_sig_info+186}
       <ffffffff804479a0>{do_int3+112} <ffffffff8010e308>{retint_signal+61}

test          D ffff810017ca4180     0 14312      1         14309 13882 (NOTLB)
ffff81005d15fcb8 0000000000000082 ffff81005d15fc58 ffffffff80130816
       0000000000000897 ffff810017ca43c0 ffff810017ca4180 ffff81003e8118c0
       0000000000000082 ffffffff801317ed
Call Trace:<ffffffff80130816>{activate_task+150} <ffffffff801317ed>{try_to_wake_up+893}
       <ffffffff8044677d>{wait_for_completion+173} <ffffffff80131810>{default_wake_function+0}
       <ffffffff8018cdc3>{do_coredump+819} <ffffffff80445f52>{thread_return+82}
       <ffffffff801436d4>{get_signal_to_deliver+1444} <ffffffff8010d3ad>{do_signal+157}
       <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+2

       <ffffffff804472e5>{_spin_unlock_irqrestore+5} <ffffffff8014208a>{force_sig_info+186}
       <ffffffff804476ff>{do_general_protection+159} <ffffffff8010e308>{retint_signal+61}

Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:22 -08:00
Vadim Lobanov
6dd69f1061 [PATCH] Unify sys_tkill() and sys_tgkill()
The majority of the sys_tkill() and sys_tgkill() function code is
duplicated between the two of them.  This patch pulls the duplication out
into a separate function -- do_tkill() -- and lets sys_tkill() and
sys_tgkill() be simple wrappers around it.  This should make it easier to
maintain in light of future changes.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:19 -08:00
Oleg Nesterov
19a4fcb531 [PATCH] kill sigqueue->lock
This lock is used in sigqueue_free(), but it is always equal to
current->sighand->siglock, so we don't need to keep it in the struct
sigqueue.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:19 -08:00
Oleg Nesterov
8d027de54c [PATCH] fix ->signal->live leak in copy_process()
exit_signal() (called from copy_process's error path) should decrement
->signal->live, otherwise forking process will miss 'group_dead' in
do_exit().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 10:28:13 -07:00
Roland McGrath
25f407f0b6 [PATCH] Call exit_itimers from do_exit, not __exit_signal
When I originally moved exit_itimers into __exit_signal, that was the only
place where we could reliably know it was the last thread in the group
dying, without races.  Since then we've gotten the signal_struct.live
counter, and do_exit can reliably do group-wide cleanup work.

This patch moves the call to do_exit, where it's made without locks.  This
avoids the deadlock issues that the old __exit_signal code's comment talks
about, and the one that Oleg found recently with process CPU timers.

[ This replaces e03d13e985, which is why
  it was just reverted. ]

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-21 15:38:08 -07:00
Harald Welte
46113830a1 [PATCH] Fix signal sending in usbdevio on async URB completion
If a process issues an URB from userspace and (starts to) terminate
before the URB comes back, we run into the issue described above.  This
is because the urb saves a pointer to "current" when it is posted to the
device, but there's no guarantee that this pointer is still valid
afterwards.

In fact, there are three separate issues:

1) the pointer to "current" can become invalid, since the task could be
   completely gone when the URB completion comes back from the device.

2) Even if the saved task pointer is still pointing to a valid task_struct,
   task_struct->sighand could have gone meanwhile.

3) Even if the process is perfectly fine, permissions may have changed,
   and we can no longer send it a signal.

So what we do instead, is to save the PID and uid's of the process, and
introduce a new kill_proc_info_as_uid() function.

Signed-off-by: Harald Welte <laforge@gnumonks.org>
[ Fixed up types and added symbol exports ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 16:16:33 -07:00
Al Viro
dd0fc66fb3 [PATCH] gfp flags annotations - part 1
- added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 15:00:57 -07:00
Oleg Nesterov
788e05a67c [PATCH] fix do_coredump() vs SIGSTOP race
Let's suppose we have 2 threads in thread group:
	A - does coredump
	B - has pending SIGSTOP

thread A						thread B

do_coredump:						get_signal_to_deliver:

  lock(->sighand)
  ->signal->flags = SIGNAL_GROUP_EXIT
  unlock(->sighand)

							lock(->sighand)
							signr = dequeue_signal()
								->signal->flags |= SIGNAL_STOP_DEQUEUED
								return SIGSTOP;

							do_signal_stop:
							    unlock(->sighand)

  coredump_wait:

      zap_threads:
          lock(tasklist_lock)
          send SIGKILL to B
              // signal_wake_up() does nothing
          unlock(tasklist_lock)

							    lock(tasklist_lock)
							    lock(->sighand)
							    re-check sig->flags & SIGNAL_STOP_DEQUEUED, yes
							    set_current_state(TASK_STOPPED);
							    finish_stop:
							        schedule();
							            // ->state == TASK_STOPPED

      wait_for_completion(&startup_done)
         // waits for complete() from B,
         // ->state == TASK_UNINTERRUPTIBLE

We can't wake up 'B' in any way:

	SIGCONT will be ignored because handle_stop_signal() sees
	->signal->flags & SIGNAL_GROUP_EXIT.

	sys_kill(SIGKILL)->__group_complete_signal() will choose
	uninterruptible 'A', so it can't help.

	sys_tkill(B, SIGKILL) will be ignored by specific_send_sig_info()
	because B already has pending SIGKILL.

This scenario is not possbile if 'A' does do_group_exit(), because
it sets sig->flags = SIGNAL_GROUP_EXIT and delivers SIGKILL to
subthreads atomically, holding both tasklist_lock and sighand->lock.
That means that do_signal_stop() will notice !SIGNAL_STOP_DEQUEUED
after re-locking ->sighand. And it is not possible to any other
thread to re-add SIGNAL_STOP_DEQUEUED later, because dequeue_signal()
can only return SIGKILL.

I think it is better to change do_coredump() to do sigaddset(SIGKILL)
and signal_wake_up() under sighand->lock, but this patch is much
simpler.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 14:53:31 -07:00
Roland McGrath
5acbc5cb50 [PATCH] Fix task state testing properly in do_signal_stop()
Any tests using < TASK_STOPPED or the like are left over from the time
when the TASK_ZOMBIE and TASK_DEAD bits were in the same word, and it
served to check for "stopped or dead".  I think this one in
do_signal_stop is the only such case.  It has been buggy ever since
exit_state was separated, and isn't testing the exit_state value.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-29 15:20:47 -07:00
Linus Torvalds
188a1eafa0 Make sure SIGKILL gets proper respect
Bhavesh P. Davda <bhavesh@avaya.com> noticed that SIGKILL wouldn't
properly kill a process under just the right cicumstances: a stopped
task that already had another signal queued would get the SIGKILL
queued onto the shared queue, and there it would remain until SIGCONT.

This simplifies the signal acceptance logic, and fixes the bug in the
process.

Losely based on an earlier patch by Bhavesh.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-23 13:22:21 -07:00
Nishanth Aravamudan
75bcc8c5e1 [PATCH] kernel: fix-up schedule_timeout() usage
Use schedule_timeout_{,un}interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:37 -07:00
Oleg Nesterov
e752dd6cc6 [PATCH] fix send_sigqueue() vs thread exit race
posix_timer_event() first checks that the thread (SIGEV_THREAD_ID case)
does not have PF_EXITING flag, then it calls send_sigqueue() which locks
task list.  But if the thread exits in between the kernel will oops
(->sighand == NULL after __exit_sighand).

This patch moves the PF_EXITING check into the send_sigqueue(), it must be
done atomically under tasklist_lock.  When send_sigqueue() detects exiting
thread it returns -1.  In that case posix_timer_event will send the signal
to thread group.

Also, this patch fixes task_struct use-after-free in posix_timer_event.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:33 -07:00
Oleg Nesterov
bc505a478d [PATCH] do_notify_parent_cldstop() cleanup
This patch simplifies the usage of do_notify_parent_cldstop(), it lessens
the source and .text size slightly, and makes the code (in my opinion) a
bit more readable.

I am sending this patch now because I'm afraid Paul will touch
do_notify_parent_cldstop() really soon, It's better to cleanup first.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:31 -07:00
Bhavesh P. Davda
dd12f48d4e [PATCH] NPTL signal delivery deadlock fix
This bug is quite subtle and only happens in a very interesting
situation where a real-time threaded process is in the middle of a
coredump when someone whacks it with a SIGKILL.  However, this deadlock
leaves the system pretty hosed and you have to reboot to recover.

Not good for real-time priority-preemption applications like our
telephony application, with 90+ real-time (SCHED_FIFO and SCHED_RR)
processes, many of them multi-threaded, interacting with each other for
high volume call processing.

Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-17 12:52:04 -07:00
Christoph Lameter
3e1d1d28d9 [PATCH] Cleanup patch for process freezing
1. Establish a simple API for process freezing defined in linux/include/sched.h:

   frozen(process)		Check for frozen process
   freezing(process)		Check if a process is being frozen
   freeze(process)		Tell a process to freeze (go to refrigerator)
   thaw_process(process)	Restart process
   frozen_process(process)	Process is frozen now

2. Remove all references to PF_FREEZE and PF_FROZEN from all
   kernel sources except sched.h

3. Fix numerous locations where try_to_freeze is manually done by a driver

4. Remove the argument that is no longer necessary from two function calls.

5. Some whitespace cleanup

6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
   cleared before setting PF_FROZEN, recalc_sigpending does not check
   PF_FROZEN).

This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 17:10:13 -07:00
Kirill Korotaev
4fea2838aa [PATCH] Software suspend and recalc sigpending bug fix
This patch fixes recalc_sigpending() to work correctly with tasks which are
being freezed.

The problem is that freeze_processes() sets PF_FREEZE and TIF_SIGPENDING
flags on tasks, but recalc_sigpending() called from e.g.
sys_rt_sigtimedwait or any other kernel place will clear TIF_SIGPENDING due
to no pending signals queued and the tasks won't be freezed until it
recieves a real signal or freezed_processes() fail due to timeout.

Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-Off-By: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:27 -07:00
David Woodhouse
1c3f45ab2f Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-06-02 16:39:11 +01:00
Kirill Korotaev
c33880aadd [PATCH] sigkill priority fix
If SIGKILL does not have priority, we cannot instantly kill task before it
makes some unexpected job.  It can be critical, but we were unable to
reproduce this easily until Heiko Carstens <Heiko.Carstens@de.ibm.com>
reported this problem on LKML.

Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-Off-By: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-24 20:08:13 -07:00
Steve Grubb
c2f0c7c356 The attached patch addresses the problem with getting the audit daemon
shutdown credential information. It creates a new message type 
AUDIT_TERM_INFO, which is used by the audit daemon to query who issued the 
shutdown. 

It requires the placement of a hook function that gathers the information. The 
hook is after the DAC & MAC checks and before the function returns. Racing 
threads could overwrite the uid & pid - but they would have to be root and 
have policy that allows signalling the audit daemon. That should be a 
manageable risk.

The userspace component will be released later in audit 0.7.2. When it 
receives the TERM signal, it queries the kernel for shutdown information. 
When it receives it, it writes the message and exits. The message looks 
like this:

type=DAEMON msg=auditd(1114551182.000) auditd normal halt, sending pid=2650 
uid=525, auditd pid=1685

Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-05-06 12:38:39 +01:00
Jesper Juhl
7ed20e1ad5 [PATCH] convert that currently tests _NSIG directly to use valid_signal()
Convert most of the current code that uses _NSIG directly to instead use
valid_signal().  This avoids gcc -W warnings and off-by-one errors.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:14 -07:00
Christoph Hellwig
6cae60feb6 [PATCH] kill #ifndef HAVE_ARCH_GET_SIGNAL_TO_DELIVER in signal.c
Now that no architectures defines HAVE_ARCH_GET_SIGNAL_TO_DELIVER anymore
this can go away.  It was a transitional hack only.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:25:47 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00