PM / reboot: Eliminate race between reboot and suspend

At present, "systemctl suspend" and "shutdown" can run in parrallel. A
system can suspend after devices_shutdown(), and resume. Then the shutdown
task goes on to power off. This causes many devices are not really shut
off. Hence replacing reboot_mutex with system_transition_mutex (renamed
from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as
system_transition_mutex can be better to reflect the purpose of the mutex.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This commit is contained in:
Pingfan Liu 2018-07-31 16:51:32 +08:00 committed by Rafael J. Wysocki
parent 82837ad5bd
commit 55f2503c3b
10 changed files with 40 additions and 36 deletions

View file

@ -204,26 +204,26 @@ VI. Are there any precautions to be taken to prevent freezing failures?
Yes, there are.
First of all, grabbing the 'pm_mutex' lock to mutually exclude a piece of code
First of all, grabbing the 'system_transition_mutex' lock to mutually exclude a piece of code
from system-wide sleep such as suspend/hibernation is not encouraged.
If possible, that piece of code must instead hook onto the suspend/hibernation
notifiers to achieve mutual exclusion. Look at the CPU-Hotplug code
(kernel/cpu.c) for an example.
However, if that is not feasible, and grabbing 'pm_mutex' is deemed necessary,
it is strongly discouraged to directly call mutex_[un]lock(&pm_mutex) since
However, if that is not feasible, and grabbing 'system_transition_mutex' is deemed necessary,
it is strongly discouraged to directly call mutex_[un]lock(&system_transition_mutex) since
that could lead to freezing failures, because if the suspend/hibernate code
successfully acquired the 'pm_mutex' lock, and hence that other entity failed
successfully acquired the 'system_transition_mutex' lock, and hence that other entity failed
to acquire the lock, then that task would get blocked in TASK_UNINTERRUPTIBLE
state. As a consequence, the freezer would not be able to freeze that task,
leading to freezing failure.
However, the [un]lock_system_sleep() APIs are safe to use in this scenario,
since they ask the freezer to skip freezing this task, since it is anyway
"frozen enough" as it is blocked on 'pm_mutex', which will be released
"frozen enough" as it is blocked on 'system_transition_mutex', which will be released
only after the entire suspend/hibernation sequence is complete.
So, to summarize, use [un]lock_system_sleep() instead of directly using
mutex_[un]lock(&pm_mutex). That would prevent freezing failures.
mutex_[un]lock(&system_transition_mutex). That would prevent freezing failures.
V. Miscellaneous
/sys/power/pm_freeze_timeout controls how long it will cost at most to freeze

View file

@ -32,7 +32,7 @@ More details follow:
sysfs file
|
v
Acquire pm_mutex lock
Acquire system_transition_mutex lock
|
v
Send PM_SUSPEND_PREPARE
@ -96,10 +96,10 @@ execution during resume):
* thaw tasks
* send PM_POST_SUSPEND notifications
* Release pm_mutex lock.
* Release system_transition_mutex lock.
It is to be noted here that the pm_mutex lock is acquired at the very
It is to be noted here that the system_transition_mutex lock is acquired at the very
beginning, when we are just starting out to suspend, and then released only
after the entire cycle is complete (i.e., suspend + resume).

View file

@ -414,7 +414,7 @@ static inline bool hibernation_available(void) { return false; }
#define PM_RESTORE_PREPARE 0x0005 /* Going to restore a saved image */
#define PM_POST_RESTORE 0x0006 /* Restore failed */
extern struct mutex pm_mutex;
extern struct mutex system_transition_mutex;
#ifdef CONFIG_PM_SLEEP
void save_processor_state(void);

View file

@ -15,7 +15,9 @@
atomic_t system_freezing_cnt = ATOMIC_INIT(0);
EXPORT_SYMBOL(system_freezing_cnt);
/* indicate whether PM freezing is in effect, protected by pm_mutex */
/* indicate whether PM freezing is in effect, protected by
* system_transition_mutex
*/
bool pm_freezing;
bool pm_nosig_freezing;

View file

@ -338,7 +338,7 @@ static int create_image(int platform_mode)
* hibernation_snapshot - Quiesce devices and create a hibernation image.
* @platform_mode: If set, use platform driver to prepare for the transition.
*
* This routine must be called with pm_mutex held.
* This routine must be called with system_transition_mutex held.
*/
int hibernation_snapshot(int platform_mode)
{
@ -500,8 +500,9 @@ static int resume_target_kernel(bool platform_mode)
* hibernation_restore - Quiesce devices and restore from a hibernation image.
* @platform_mode: If set, use platform driver to prepare for the transition.
*
* This routine must be called with pm_mutex held. If it is successful, control
* reappears in the restored target kernel in hibernation_snapshot().
* This routine must be called with system_transition_mutex held. If it is
* successful, control reappears in the restored target kernel in
* hibernation_snapshot().
*/
int hibernation_restore(int platform_mode)
{
@ -806,13 +807,13 @@ static int software_resume(void)
* name_to_dev_t() below takes a sysfs buffer mutex when sysfs
* is configured into the kernel. Since the regular hibernate
* trigger path is via sysfs which takes a buffer mutex before
* calling hibernate functions (which take pm_mutex) this can
* cause lockdep to complain about a possible ABBA deadlock
* calling hibernate functions (which take system_transition_mutex)
* this can cause lockdep to complain about a possible ABBA deadlock
* which cannot happen since we're in the boot code here and
* sysfs can't be invoked yet. Therefore, we use a subclass
* here to avoid lockdep complaining.
*/
mutex_lock_nested(&pm_mutex, SINGLE_DEPTH_NESTING);
mutex_lock_nested(&system_transition_mutex, SINGLE_DEPTH_NESTING);
if (swsusp_resume_device)
goto Check_image;
@ -900,7 +901,7 @@ static int software_resume(void)
atomic_inc(&snapshot_device_available);
/* For success case, the suspend path will release the lock */
Unlock:
mutex_unlock(&pm_mutex);
mutex_unlock(&system_transition_mutex);
pm_pr_dbg("Hibernation image not present or could not be loaded.\n");
return error;
Close_Finish:

View file

@ -15,17 +15,16 @@
#include <linux/workqueue.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>
#include <linux/suspend.h>
#include "power.h"
DEFINE_MUTEX(pm_mutex);
#ifdef CONFIG_PM_SLEEP
void lock_system_sleep(void)
{
current->flags |= PF_FREEZER_SKIP;
mutex_lock(&pm_mutex);
mutex_lock(&system_transition_mutex);
}
EXPORT_SYMBOL_GPL(lock_system_sleep);
@ -37,8 +36,9 @@ void unlock_system_sleep(void)
*
* Reason:
* Fundamentally, we just don't need it, because freezing condition
* doesn't come into effect until we release the pm_mutex lock,
* since the freezer always works with pm_mutex held.
* doesn't come into effect until we release the
* system_transition_mutex lock, since the freezer always works with
* system_transition_mutex held.
*
* More importantly, in the case of hibernation,
* unlock_system_sleep() gets called in snapshot_read() and
@ -47,7 +47,7 @@ void unlock_system_sleep(void)
* enter the refrigerator, thus causing hibernation to lockup.
*/
current->flags &= ~PF_FREEZER_SKIP;
mutex_unlock(&pm_mutex);
mutex_unlock(&system_transition_mutex);
}
EXPORT_SYMBOL_GPL(unlock_system_sleep);

View file

@ -556,7 +556,7 @@ static int enter_state(suspend_state_t state)
} else if (!valid_state(state)) {
return -EINVAL;
}
if (!mutex_trylock(&pm_mutex))
if (!mutex_trylock(&system_transition_mutex))
return -EBUSY;
if (state == PM_SUSPEND_TO_IDLE)
@ -590,7 +590,7 @@ static int enter_state(suspend_state_t state)
pm_pr_dbg("Finishing wakeup.\n");
suspend_finish();
Unlock:
mutex_unlock(&pm_mutex);
mutex_unlock(&system_transition_mutex);
return error;
}

View file

@ -216,7 +216,7 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (!mutex_trylock(&pm_mutex))
if (!mutex_trylock(&system_transition_mutex))
return -EBUSY;
lock_device_hotplug();
@ -394,7 +394,7 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
}
unlock_device_hotplug();
mutex_unlock(&pm_mutex);
mutex_unlock(&system_transition_mutex);
return error;
}

View file

@ -294,7 +294,7 @@ void kernel_power_off(void)
}
EXPORT_SYMBOL_GPL(kernel_power_off);
static DEFINE_MUTEX(reboot_mutex);
DEFINE_MUTEX(system_transition_mutex);
/*
* Reboot system call: for obvious reasons only root may call it,
@ -338,7 +338,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off)
cmd = LINUX_REBOOT_CMD_HALT;
mutex_lock(&reboot_mutex);
mutex_lock(&system_transition_mutex);
switch (cmd) {
case LINUX_REBOOT_CMD_RESTART:
kernel_restart(NULL);
@ -389,7 +389,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
ret = -EINVAL;
break;
}
mutex_unlock(&reboot_mutex);
mutex_unlock(&system_transition_mutex);
return ret;
}

View file

@ -155,16 +155,17 @@ static inline void set_pcppage_migratetype(struct page *page, int migratetype)
* The following functions are used by the suspend/hibernate code to temporarily
* change gfp_allowed_mask in order to avoid using I/O during memory allocations
* while devices are suspended. To avoid races with the suspend/hibernate code,
* they should always be called with pm_mutex held (gfp_allowed_mask also should
* only be modified with pm_mutex held, unless the suspend/hibernate code is
* guaranteed not to run in parallel with that modification).
* they should always be called with system_transition_mutex held
* (gfp_allowed_mask also should only be modified with system_transition_mutex
* held, unless the suspend/hibernate code is guaranteed not to run in parallel
* with that modification).
*/
static gfp_t saved_gfp_mask;
void pm_restore_gfp_mask(void)
{
WARN_ON(!mutex_is_locked(&pm_mutex));
WARN_ON(!mutex_is_locked(&system_transition_mutex));
if (saved_gfp_mask) {
gfp_allowed_mask = saved_gfp_mask;
saved_gfp_mask = 0;
@ -173,7 +174,7 @@ void pm_restore_gfp_mask(void)
void pm_restrict_gfp_mask(void)
{
WARN_ON(!mutex_is_locked(&pm_mutex));
WARN_ON(!mutex_is_locked(&system_transition_mutex));
WARN_ON(saved_gfp_mask);
saved_gfp_mask = gfp_allowed_mask;
gfp_allowed_mask &= ~(__GFP_IO | __GFP_FS);