Enable hardware memory error handling for NFS
Truncation of data pages at runtime should be safe in NFS,
even when it doesn't support migration so far.
Trond tells me migration is also queued up for 2.6.32.
Acked-by: Trond.Myklebust@netapp.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Enable removing of corrupted pages through truncation
for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
These should cover most server needs.
I chose the set of migration aware file systems for this
for now, assuming they have been especially audited.
But in general it should be safe for all file systems
on the data area that support read/write and truncate.
Caveat: the hardware error handler does not take i_mutex
for now before calling the truncate function. Is that ok?
Cc: tytso@mit.edu
Cc: hch@infradead.org
Cc: mfasheh@suse.com
Cc: aia21@cantab.net
Cc: hugh.dickins@tiscali.co.uk
Cc: swhiteho@redhat.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Add the high level memory handler that poisons pages
that got corrupted by hardware (typically by a two bit flip in a DIMM
or a cache) on the Linux level. The goal is to prevent everyone
from accessing these pages in the future.
This done at the VM level by marking a page hwpoisoned
and doing the appropriate action based on the type of page
it is.
The code that does this is portable and lives in mm/memory-failure.c
To quote the overview comment:
High level machine check handler. Handles pages reported by the
hardware as being corrupted usually due to a 2bit ECC memory or cache
failure.
This focuses on pages detected as corrupted in the background.
When the current CPU tries to consume corruption the currently
running process can just be killed directly instead. This implies
that if the error cannot be handled for some reason it's safe to
just ignore it because no corruption has been consumed yet. Instead
when that happens another machine check will happen.
Handles page cache pages in various states. The tricky part
here is that we can access any page asynchronous to other VM
users, because memory failures could happen anytime and anywhere,
possibly violating some of their assumptions. This is why this code
has to be extremely careful. Generally it tries to use normal locking
rules, as in get the standard locks, even if that means the
error handling takes potentially a long time.
Some of the operations here are somewhat inefficient and have non
linear algorithmic complexity, because the data structures have not
been optimized for this case. This is in particular the case
for the mapping from a vma to a process. Since this case is expected
to be rare we hope we can get away with this.
There are in principle two strategies to kill processes on poison:
- just unmap the data and wait for an actual reference before
killing
- kill as soon as corruption is detected.
Both have advantages and disadvantages and should be used
in different situations. Right now both are implemented and can
be switched with a new sysctl vm.memory_failure_early_kill
The default is early kill.
The patch does some rmap data structure walking on its own to collect
processes to kill. This is unusual because normally all rmap data structure
knowledge is in rmap.c only. I put it here for now to keep
everything together and rmap knowledge has been seeping out anyways
Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu,
Nick Piggin (who did a lot of great work) and others.
Cc: npiggin@suse.de
Cc: riel@redhat.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
This allows processes to override their early/late kill
behaviour on hardware memory errors.
Typically applications which are memory error aware is
better of with early kill (see the error as soon
as possible), all others with late kill (only
see the error when the error is really impacting execution)
There's a global sysctl, but this way an application
can set its specific policy.
We're using two bits, one to signify that the process
stated its intention and that
I also made the prctl future proof by enforcing
the unused arguments are 0.
The state is inherited to children.
Note this makes us officially run out of process flags
on 32bit, but the next patch can easily add another field.
Manpage patch will be supplied separately.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
The dirtying of page and set_page_dirty() can be moved into the page lock.
- In shmem_write_end(), the page was dirtied while the page lock was held,
but it's being marked dirty just after dropping the page lock.
- In shmem_symlink(), both dirtying and marking can be moved into page lock.
It's valuable for the hwpoison code to know whether one bad page can be dropped
without losing data. It mainly judges by testing the PG_dirty bit after taking
the page lock. So it becomes important that the dirtying of page and the
marking of dirtiness are both done inside the page lock. Which is a common
practice, but sadly not a rule.
The noticeable exceptions are
- mapped pages
- pages with buffer_heads
The above pages could go dirty at any time. Fortunately the hwpoison will
unmap the page and release the buffer_heads beforehand anyway.
Many other types of pages (eg. metadata pages) can also be dirtied at will by
their owners, the hwpoison code cannot do meaningful things to them anyway.
Only the dirtiness of pagecache pages owned by regular files are interested.
v2: AK: Add comment about set_page_dirty rules (suggested by Peter Zijlstra)
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Truncating metadata pages is not safe right now before
we haven't audited all file systems.
To enable truncation only for data address space define
a new address_space callback error_remove_page.
This is used for memory_failure.c memory error handling.
This can be then set to truncate_inode_page()
This patch just defines the new operation and adds documentation.
Callers and users come in followon patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Add a simple way to invalidate a single page
This is just a refactoring of the truncate.c code.
Originally from Fengguang, modified by Andi Kleen.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Extract out truncate_inode_page() out of the truncate path so that
it can be used by memory-failure.c
[AK: description, headers, fix typos]
v2: Some white space changes from Fengguang Wu
Signed-off-by: Andi Kleen <ak@linux.intel.com>
If memory corruption hits the free buddy pages, we can safely ignore them.
No one will access them until page allocation time, then prep_new_page()
will automatically check and isolate PG_hwpoison page for us (for 0-order
allocation).
This patch expands prep_new_page() to check every component page in a high
order page allocation, in order to completely stop PG_hwpoison pages from
being recirculated.
Note that the common case -- only allocating a single page, doesn't
do any more work than before. Allocating > order 0 does a bit more work,
but that's relatively uncommon.
This simple implementation may drop some innocent neighbor pages, hopefully
it is not a big problem because the event should be rare enough.
This patch adds some runtime costs to high order page users.
[AK: Improved description]
v2: Andi Kleen:
Port to -mm code
Move check into separate function.
Don't dump stack in bad_pages for hwpoisoned pages.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
When a page has the poison bit set replace the PTE with a poison entry.
This causes the right error handling to be done later when a process runs
into it.
v2: add a new flag to not do that (needed for the memory-failure handler
later) (Fengguang)
v3: remove unnecessary is_migration_entry() test (Fengguang, Minchan)
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
try_to_unmap currently has multiple modi (migration, munlock, normal unmap)
which are selected by magic flag variables. The logic is not very straight
forward, because each of these flag change multiple behaviours (e.g.
migration turns off aging, not only sets up migration ptes etc.)
Also the different flags interact in magic ways.
A later patch in this series adds another mode to try_to_unmap, so
this becomes quickly unmanageable.
Replace the different flags with a action code (migration, munlock, munmap)
and some additional flags as modifiers (ignore mlock, ignore aging).
This makes the logic more straight forward and allows easier extension
to new behaviours. Change all the caller to declare what they want to
do.
This patch is supposed to be a nop in behaviour. If anyone can prove
it is not that would be a bug.
Cc: Lee.Schermerhorn@hp.com
Cc: npiggin@suse.de
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Add VM_FAULT_HWPOISON handling to the x86 page fault handler. This is
very similar to VM_FAULT_OOM, the only difference is that a different
si_code is passed to user space and the new addr_lsb field is initialized.
v2: Make the printk more verbose/unique
Cc: x86@kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Bail out early when hardware poisoned pages are found in page fault handling.
Since they are poisoned they should not be mapped freshly into processes,
because that would cause another (potentially deadly) machine check
This is generally handled in the same way as OOM, just a different
error code is returned to the architecture code.
v2: Do a page unlock if needed (Fengguang Wu)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
- Add a new VM_FAULT_HWPOISON error code to handle_mm_fault. Right now
architectures have to explicitely enable poison page support, so
this is forward compatible to all architectures. They only need
to add it when they enable poison page support.
- Add poison page handling in swap in fault code
v2: Add missing delayacct_clear_flag (Hidehiro Kawai)
v3: Really use delayacct_clear_flag (Hidehiro Kawai)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Add new SIGBUS codes for reporting machine checks as signals. When
the hardware detects an uncorrected ECC error it can trigger these
signals.
This is needed for telling KVM's qemu about machine checks that happen to
guests, so that it can inject them, but might be also useful for other programs.
I find it useful in my test programs.
This patch merely defines the new types.
- Define two new si_codes for SIGBUS. BUS_MCEERR_AO and BUS_MCEERR_AR
* BUS_MCEERR_AO is for "Action Optional" machine checks, which means that some
corruption has been detected in the background, but nothing has been consumed
so far. The program can ignore those if it wants (but most programs would
already get killed)
* BUS_MCEERR_AR is for "Action Required" machine checks. This happens
when corrupted data is consumed or the application ran into an area
which has been known to be corrupted earlier. These require immediate
action and cannot just returned to. Most programs would kill themselves.
- They report the address of the corruption in the user address space
in si_addr.
- Define a new si_addr_lsb field that reports the extent of the corruption
to user space. That's currently always a (small) page. The user application
cannot tell where in this page the corruption happened.
AK: I plan to write a man page update before anyone asks.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Memory migration uses special swap entry types to trigger special actions on
page faults. Extend this mechanism to also support poisoned swap entries, to
trigger poison handling on page faults. This allows follow-on patches to
prevent processes from faulting in poisoned pages again.
v2: Fix overflow in MAX_SWAPFILES (Fengguang Wu)
v3: Better overflow fix (Hidehiro Kawai)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Needed for later patch that walks rmap entries on its own.
This used to be very frowned upon, but memory-failure.c does
some rather specialized rmap walking and rmap has been stable
for quite some time, so I think it's ok now to export it.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Hardware poisoned pages need special handling in the VM and shouldn't be
touched again. This requires a new page flag. Define it here.
The page flags wars seem to be over, so it shouldn't be a problem
to get a new one.
v2: Add TestSetHWPoison (suggested by Johannes Weiner)
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-next-2.6:
ide: fixup for fujitsu disk
ide: convert to ->proc_fops
at91_ide: remove headers specific for at91sam9263
IDE: palm_bk3710: convert clock usage after clkdev conversion
ide: fix races in handling of user-space SET XFER commands
ide: allow ide_dev_read_id() to be called from the IRQ context
ide: ide-taskfile.c fix style problems
drivers/ide/ide-cd.c: Use DIV_ROUND_CLOSEST
ide-tape: fix handling of postponed rqs
ide-tape: convert to ide_debug_log macro
ide-tape: fix debug call
ide: Fix annoying warning in ide_pio_bytes().
IDE: Save a call to PageHighMem()
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
powerpc/nvram: Enable use Generic NVRAM driver for different size chips
powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
powerpc/ps3: Workaround for flash memory I/O error
powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
powerpc/perf_counters: Reduce stack usage of power_check_constraints
powerpc: Fix bug where perf_counters breaks oprofile
powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
powerpc/irq: Improve nanodoc
powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
powerpc/book3e: Add missing page sizes
powerpc/pseries: Fix to handle slb resize across migration
powerpc/powermac: Thermal control turns system off too eagerly
powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
powerpc/405ex: support cuImage via included dtb
powerpc/405ex: provide necessary fixup function to support cuImage
powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
powerpc/44x: Update Arches defconfig
powerpc/44x: Update Arches dts
...
Fix up conflicts in drivers/char/agp/uninorth-agp.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...
Fix trivial conflict as by Tejun Heo in kernel/sched.c
Due to problems at cam.org, my nico@cam.org email address is no longer
valid. FRom now on, nico@fluxnic.net should be used instead.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix buffer overflow in perf_copy_attr()
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, pat: Fix cacheflush address in change_page_attr_set_clr()
mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
x86: Fix earlyprintk=dbgp for machines without NX
x86, pat: Sanity check remap_pfn_range for RAM region
x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
x86, pat: Add lookup_memtype to get the current memtype of a paddr
x86, pat: Use page flags to track memtypes of RAM pages
x86, pat: Generalize the use of page flag PG_uncached
x86, pat: Add rbtree to do quick lookup in memtype tracking
x86, pat: Add PAT reserve free to io_mapping* APIs
x86, pat: New i/f for driver to request memtype for IO regions
x86, pat: ioremap to follow same PAT restrictions as other PAT users
x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
x86, mtrr: make mtrr_aps_delayed_init static bool
x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
x86: Fix an incorrect argument of reserve_bootmem()
x86: Fix system crash when loading with "reservetop" parameter
* 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, intel_txt: clean up the impact on generic code, unbreak non-x86
x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE
x86, intel_txt: Fix typos in Kconfig help
x86, intel_txt: Factor out the code for S3 setup
x86, intel_txt: tboot.c needs <asm/fixmap.h>
intel_txt: Force IOMMU on for Intel TXT launch
x86, intel_txt: Intel TXT Sx shutdown support
x86, intel_txt: Intel TXT reboot/halt shutdown support
x86, intel_txt: Intel TXT boot support
* 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp/intel: remove restore in resume
agp: fix uninorth build
intel-agp: Set dma mask for i915
agp: kill phys_to_gart() and gart_to_phys()
intel-agp: fix sglist allocation to avoid vmalloc()
intel-agp: Move repeated sglist free into separate function
agp: Switch agp_{un,}map_page() to take struct page * argument
agp: tidy up handling of scratch pages w.r.t. DMA API
intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU
agp: Add generic support for graphics dma remapping
agp: Switch mask_memory() method to take address argument again, not page
If we pass a big size data over perf_counter_open() syscall,
the kernel will copy this data to a small buffer, it will
cause kernel crash.
This bug makes the kernel unsafe and non-root local user can
trigger it.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org>
LKML-Reference: <4AAF37D4.5010706@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
SELinux: inline selinux_is_enabled in !CONFIG_SECURITY_SELINUX
KEYS: Fix garbage collector
KEYS: Unlock tasklist when exiting early from keyctl_session_to_parent
CRED: Allow put_cred() to cope with a NULL groups list
SELinux: flush the avc before disabling SELinux
SELinux: seperate avc_cache flushing
Creds: creds->security can be NULL is selinux is disabled
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (23 commits)
at_hdmac: Rework suspend_late()/resume_early()
PM: Reset transition_started at dpm_resume_noirq
PM: Update kerneldoc comments in drivers/base/power/main.c
PM: Add convenience macro to make switching to dev_pm_ops less error-prone
hp-wmi: Switch driver to dev_pm_ops
floppy: Switch driver to dev_pm_ops
PM: Trivial fixes
PM / Hibernate / Memory hotplug: Always use for_each_populated_zone()
PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
PM/Hibernate: Rework shrinking of memory
PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
PM: Run-time PM platform device bus support
PM: Introduce core framework for run-time PM of I/O devices (rev. 17)
Driver Core: Make PM operations a const pointer
PM: Remove platform device suspend_late()/resume_early() V2
USB: Rework musb suspend()/resume_early()
I2C: Rework i2c-s3c2410 suspend_late()/resume() V2
I2C: Rework i2c-pxa suspend_late()/resume_early()
DMA: Rework txx9dmac suspend_late()/resume_early()
...
Fix trivial conflict in drivers/base/platform.c (due to same
constification patch being merged in both sides, along with some other
PM work in the PM branch)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig:
kconfig: add missing dependency of conf to localyesconfig
kconfig: test if a .config already exists
kconfig: make local .config default for streamline_config
kconfig: test for /boot/config-uname after /proc/config.gz in localconfig
kconfig: unset IKCONFIG_PROC and clean up nesting
kconfig: search for a config to base the local(mod|yes)config on
kconfig: keep config.gz around even if CONFIG_IKCONFIG_PROC is not set
kconfig: have extract-ikconfig read ELF files
kconfig: add check if end exists in extract-ikconfig
kconfig: enable CONFIG_IKCONFIG from streamline_config.pl
kconfig: do not warn about modules built in
kconfig: streamline_config.pl do not stop with no depends
kconfig: add make localyesconfig option
kconfig: make localmodconfig to run streamline_config.pl
kconfig: add streamline_config.pl to scripts
Added Kworld DVD Maker 2
Thanks to C Western <l@c-m-w.me.uk> for reporting this board.
Signed-off-by: Douglas Schilling Landgraf <dougsland@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Without this patch building a kernel emits millions of warning like:
include/linux/selinux.h:92: warning: ?selinux_is_enabled? defined but not used
When it is build without CONFIG_SECURITY_SELINUX. This is harmless, but
the function should be inlined, so it gets compiled out.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (52 commits)
Input: bcm5974 - silence uninitialized variables warnings
Input: wistron_btns - add keymap for AOpen 1557
Input: psmouse - use boolean type
Input: i8042 - use platform_driver_probe
Input: i8042 - use boolean type where it makes sense
Input: i8042 - try disabling and re-enabling AUX port at close
Input: pxa27x_keypad - allow modifying keymap from userspace
Input: sunkbd - fix formatting
Input: i8042 - bypass AUX IRQ delivery test on laptops
Input: wacom_w8001 - simplify querying logic
Input: atkbd - allow setting force-release bitmap via sysfs
Input: w90p910_keypad - move a dereference below a NULL test
Input: add twl4030_keypad driver
Input: matrix-keypad - add function to build device keymap
Input: tosakbd - fix cleaning up KEY_STROBEs after error
Input: joydev - validate axis/button maps before clobbering current ones
Input: xpad - add USB ID for the drumkit controller from Rock Band
Input: w90p910_keypad - rename driver name to match platform
Input: add new driver for Sentelic Finger Sensing Pad
Input: psmouse - allow defining read-only attributes
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: completely remove apple mightymouse from blacklist
HID: support larger reports than 64 bytes in hiddev
HID: local function should be static
HID: ignore Philips IEEE802.15.4 RF Dongle
HID: ignore all recent SoundGraph iMON devices
HID: fix memory leak on error patch in debug code
HID: fix overrun in quirks initialization
HID: Drop NULL test on list_entry result
HID: driver for Twinhan USB 6253:0100 remote control
HID: adding __init/__exit macros to module init/exit functions
HID: add rumble support for Thrustmaster Dual Trigger 3-in-1
HID: ntrig tool separation and pen usages
HID: Avoid double spin_lock_init on usbhid->lock
HID: add force feedback support for Logitech WingMan Formula Force GP
HID: Support new variants of Samsung USB IR receiver (0419:0001)
HID: fix memory leak on error path in debug code
HID: fix debugfs build with !CONFIG_DEBUG_FS
HID: use debugfs for events/reports dumping
HID: use debugfs for report dumping descriptor
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
block: use blkdev_issue_discard in blk_ioctl_discard
Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
block: don't assume device has a request list backing in nr_requests store
block: Optimal I/O limit wrapper
cfq: choose a new next_req when a request is dispatched
Seperate read and write statistics of in_flight requests
aoe: end barrier bios with EOPNOTSUPP
block: trace bio queueing trial only when it occurs
block: enable rq CPU completion affinity by default
cfq: fix the log message after dispatched a request
block: use printk_once
cciss: memory leak in cciss_init_one()
splice: update mtime and atime on files
block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
cfq-iosched: get rid of must_alloc flag
block: use interrupts disabled version of raise_softirq_irqoff()
block: fix comment in blk-iopoll.c
block: adjust default budget for blk-iopoll
block: fix long lines in block/blk-iopoll.c
block: add blk-iopoll, a NAPI like approach for block devices
...
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (209 commits)
[SCSI] fix oops during scsi scanning
[SCSI] libsrp: fix memory leak in srp_ring_free()
[SCSI] libiscsi, bnx2i: make bound ep check common
[SCSI] libiscsi: add completion function for drivers that do not need pdu processing
[SCSI] scsi_dh_rdac: changes for rdac debug logging
[SCSI] scsi_dh_rdac: changes to collect the rdac debug information during the initialization
[SCSI] scsi_dh_rdac: move the init code from rdac_activate to rdac_bus_attach
[SCSI] sg: fix oops in the error path in sg_build_indirect()
[SCSI] mptsas : Bump version to 3.04.12
[SCSI] mptsas : FW event thread and scsi mid layer deadlock in SYNCHRONIZE CACHE command
[SCSI] mptsas : Send DID_NO_CONNECT for pending IOs of removed device
[SCSI] mptsas : PAE Kernel more than 4 GB kernel panic
[SCSI] mptsas : NULL pointer on big endian systems causing Expander not to tear off
[SCSI] mptsas : Sanity check for phyinfo is added
[SCSI] scsi_dh_rdac: Add support for Sun StorageTek ST2500, ST2510 and ST2530
[SCSI] pmcraid: PMC-Sierra MaxRAID driver to support 6Gb/s SAS RAID controller
[SCSI] qla2xxx: Update version number to 8.03.01-k6.
[SCSI] qla2xxx: Properly delete rports attached to a vport.
[SCSI] qla2xxx: Correct various NPIV issues.
[SCSI] qla2xxx: Correct qla2x00_eh_wait_on_command() to wait correctly.
...
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (257 commits)
[ARM] Update mach-types
ARM: 5636/1: Move vendor enum to AMBA include
ARM: Fix pfn_valid() for sparse memory
[ARM] orion5x: Add LaCie NAS 2Big Network support
[ARM] pxa/sharpsl_pm: zaurus c3000 aka spitz: fix resume
ARM: 5686/1: at91: Correct AC97 reset line in at91sam9263ek board
ARM: 5640/1: This patch modifies the support of AC97 on the at91sam9263 ek board
ARM: 5689/1: Update default config of HP Jornada 700-series machines
ARM: 5691/1: fix cache aliasing issues between kmap() and kmap_atomic() with highmem
ARM: 5688/1: ks8695_serial: disable_irq() lockup
ARM: 5687/1: fix an oops with highmem
ARM: 5684/1: Add nuc960 platform to w90x900
ARM: 5683/1: Add nuc950 platform to w90x900
ARM: 5682/1: Add cpu.c and dev.c and modify some files of w90p910 platform
ARM: 5626/1: add suspend/resume functions to amba-pl011 serial driver
ARM: 5625/1: fix hard coded 4K resource size in amba bus detection
MMC: MMCI: convert realview MMC to use gpiolib
ARM: 5685/1: Make MMCI driver compile without gpiolib
ARM: implement highpte
ARM: Show FIQ in /proc/interrupts on CONFIG_FIQ
...
Fix up trivial conflict in arch/arm/kernel/signal.c.
It was due to the TIF_NOTIFY_RESUME addition in commit d0420c83f ("KEYS:
Extend TIF_NOTIFY_RESUME to (almost) all architectures") and follow-ups.
My 353d5c30c6 "mm: fix hugetlb bug due to
user_shm_unlock call" broke the CONFIG_SYSVIPC !CONFIG_MMU build of both
2.6.31 and 2.6.30.6: "undefined reference to `user_shm_unlock'".
gcc didn't understand my comment! so couldn't figure out to optimize
away user_shm_unlock() from the error path in the hugetlb-less case, as
it does elsewhere. Help it to do so, in a language it understands.
Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits)
MAINTAINERS: update KVM entry
KVM: correct error-handling code
KVM: fix compile warnings on s390
KVM: VMX: Check cpl before emulating debug register access
KVM: fix misreporting of coalesced interrupts by kvm tracer
KVM: x86: drop duplicate kvm_flush_remote_tlb calls
KVM: VMX: call vmx_load_host_state() only if msr is cached
KVM: VMX: Conditionally reload debug register 6
KVM: Use thread debug register storage instead of kvm specific data
KVM guest: do not batch pte updates from interrupt context
KVM: Fix coalesced interrupt reporting in IOAPIC
KVM guest: fix bogus wallclock physical address calculation
KVM: VMX: Fix cr8 exiting control clobbering by EPT
KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp
KVM: Document KVM_CAP_IRQCHIP
KVM: Protect update_cr8_intercept() when running without an apic
KVM: VMX: Fix EPT with WP bit change during paging
KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors
KVM: x86 emulator: Add adc and sbb missing decoder flags
KVM: Add missing #include
...
console_print() is an old legacy interface mostly unused in the entire
kernel tree. It's best to clean up its existing use and let developers
use their own implementation of it as they feel fit.
Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slub: fix slab_pad_check()
slub: release kobject if sysfs_create_group failed in sysfs_slab_add
SLUB: fix ARCH_KMALLOC_MINALIGN cases 64 and 256
SLUB: Fix some coding style issues
SLUB: Drop write permission to /proc/slabinfo
slab: remove duplicate kmem_cache_init_late() declarations
slub: change kmem_cache->align to record the real alignment
slub: use size and objsize orders to disable debug flags
slub: add option to disable higher order debugging slabs
Fix a number of problems with the new key garbage collector:
(1) A rogue semicolon in keyring_gc() was causing the initial count of dead
keys to be miscalculated.
(2) A missing return in keyring_gc() meant that under certain circumstances,
the keyring semaphore would be unlocked twice.
(3) The key serial tree iterator (key_garbage_collector()) part of the garbage
collector has been modified to:
(a) Complete each scan of the keyrings before setting the new timer.
(b) Only set the new timer for keys that have yet to expire. This means
that the new timer is now calculated correctly, and the gc doesn't
get into a loop continually scanning for keys that have expired, and
preventing other things from happening, like RCU cleaning up the old
keyring contents.
(c) Perform an extra scan if any keys were garbage collected in this one
as a key might become garbage during a scan, and (b) could mean we
don't set the timer again.
(4) Made key_schedule_gc() take the time at which to do a collection run,
rather than the time at which the key expires. This means the collection
of dead keys (key type unregistered) can happen immediately.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
When we exit early from keyctl_session_to_parent because of permissions or
because the session keyring is the same as the parent, we need to unlock the
tasklist.
The missing unlock causes the system to hang completely when using
keyctl(KEYCTL_SESSION_TO_PARENT) with a keyring shared with the parent.
Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
put_cred() will oops if given a NULL groups list, but that is now possible with
the existence of cred_alloc_blank(), as used in keyctl_session_to_parent().
Added in commit:
commit ee18d64c1f
Author: David Howells <dhowells@redhat.com>
Date: Wed Sep 2 09:14:21 2009 +0100
KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'osync_cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
fsync: wait for data writeout completion before calling ->fsync
vfs: Remove generic_osync_inode() and sync_page_range{_nolock}()
fat: Opencode sync_page_range_nolock()
pohmelfs: Use new syncing helper
xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()
ocfs2: Update syncing after splicing to match generic version
ntfs: Use new syncing helpers and update comments
ext4: Remove syncing logic from ext4_file_write
ext3: Remove syncing logic from ext3_file_write
ext2: Update comment about generic_osync_inode
vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
vfs: Rename generic_file_aio_write_nolock
ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
vfs: Export __generic_file_aio_write() and add some comments
vfs: Introduce filemap_fdatawait_range