kernel-fxtec-pro1x

Author	SHA1	Message	Date
Johannes Weiner	a756cf5908	mm: try to distribute dirty pages fairly across zones The maximum number of dirty pages that exist in the system at any time is determined by a number of pages considered dirtyable and a user-configured percentage of those, or an absolute number in bytes. This number of dirtyable pages is the sum of memory provided by all the zones in the system minus their lowmem reserves and high watermarks, so that the system can retain a healthy number of free pages without having to reclaim dirty pages. But there is a flaw in that we have a zoned page allocator which does not care about the global state but rather the state of individual memory zones. And right now there is nothing that prevents one zone from filling up with dirty pages while other zones are spared, which frequently leads to situations where kswapd, in order to restore the watermark of free pages, does indeed have to write pages from that zone's LRU list. This can interfere so badly with IO from the flusher threads that major filesystems (btrfs, xfs, ext4) mostly ignore write requests from reclaim already, taking away the VM's only possibility to keep such a zone balanced, aside from hoping the flushers will soon clean pages from that zone. Enter per-zone dirty limits. They are to a zone's dirtyable memory what the global limit is to the global amount of dirtyable memory, and try to make sure that no single zone receives more than its fair share of the globally allowed dirty pages in the first place. As the number of pages considered dirtyable excludes the zones' lowmem reserves and high watermarks, the maximum number of dirty pages in a zone is such that the zone can always be balanced without requiring page cleaning. As this is a placement decision in the page allocator and pages are dirtied only after the allocation, this patch allows allocators to pass __GFP_WRITE when they know in advance that the page will be written to and become dirty soon. The page allocator will then attempt to allocate from the first zone of the zonelist - which on NUMA is determined by the task's NUMA memory policy - that has not exceeded its dirty limit. At first glance, it would appear that the diversion to lower zones can increase pressure on them, but this is not the case. With a full high zone, allocations will be diverted to lower zones eventually, so it is more of a shift in timing of the lower zone allocations. Workloads that previously could fit their dirty pages completely in the higher zone may be forced to allocate from lower zones, but the amount of pages that "spill over" are limited themselves by the lower zones' dirty constraints, and thus unlikely to become a problem. For now, the problem of unfair dirty page distribution remains for NUMA configurations where the zones allowed for allocation are in sum not big enough to trigger the global dirty limits, wake up the flusher threads and remedy the situation. Because of this, an allocation that could not succeed on any of the considered zones is allowed to ignore the dirty limits before going into direct reclaim or even failing the allocation, until a future patch changes the global dirty throttling and flusher thread activation so that they take individual zone states into account. Test results 15M DMA + 3246M DMA32 + 504 Normal = 3765M memory 40% dirty ratio 16G USB thumb drive 10 runs of dd if=/dev/zero of=disk/zeroes bs=32k count=$((10 << 15)) seconds nr_vmscan_write (stddev) min\| median\| max xfs vanilla: 549.747( 3.492) 0.000\| 0.000\| 0.000 patched: 550.996( 3.802) 0.000\| 0.000\| 0.000 fuse-ntfs vanilla: 1183.094(53.178) 54349.000\| 59341.000\| 65163.000 patched: 558.049(17.914) 0.000\| 0.000\| 43.000 btrfs vanilla: 573.679(14.015) 156657.000\| 460178.000\| 606926.000 patched: 563.365(11.368) 0.000\| 0.000\| 1362.000 ext4 vanilla: 561.197(15.782) 0.000\|2725438.000\|4143837.000 patched: 568.806(17.496) 0.000\| 0.000\| 0.000 Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Tested-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00
Johannes Weiner	ccafa2879f	mm: writeback: cleanups in preparation for per-zone dirty limits The next patch will introduce per-zone dirty limiting functions in addition to the traditional global dirty limiting. Rename determine_dirtyable_memory() to global_dirtyable_memory() before adding the zone-specific version, and fix up its documentation. Also, move the functions to determine the dirtyable memory and the function to calculate the dirty limit based on that together so that their relationship is more apparent and that they can be commented on as a group. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mel@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00
Johannes Weiner	ab8fabd46f	mm: exclude reserved pages from dirtyable memory Per-zone dirty limits try to distribute page cache pages allocated for writing across zones in proportion to the individual zone sizes, to reduce the likelihood of reclaim having to write back individual pages from the LRU lists in order to make progress. This patch: The amount of dirtyable pages should not include the full number of free pages: there is a number of reserved pages that the page allocator and kswapd always try to keep free. The closer (reclaimable pages - dirty pages) is to the number of reserved pages, the more likely it becomes for reclaim to run into dirty pages: +----------+ --- \| anon \| \| +----------+ \| \| \| \| \| \| -- dirty limit new -- flusher new \| file \| \| \| \| \| \| \| \| \| -- dirty limit old -- flusher old \| \| \| +----------+ --- reclaim \| reserved \| +----------+ \| kernel \| +----------+ This patch introduces a per-zone dirty reserve that takes both the lowmem reserve as well as the high watermark of the zone into account, and a global sum of those per-zone values that is subtracted from the global amount of dirtyable pages. The lowmem reserve is unavailable to page cache allocations and kswapd tries to keep the high watermark free. We don't want to end up in a situation where reclaim has to clean pages in order to balance zones. Not treating reserved pages as dirtyable on a global level is only a conceptual fix. In reality, dirty pages are not distributed equally across zones and reclaim runs into dirty pages on a regular basis. But it is important to get this right before tackling the problem on a per-zone level, where the distance between reclaim and the dirty pages is mostly much smaller in absolute numbers. [akpm@linux-foundation.org: fix highmem build] Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00
KOSAKI Motohiro	25bd91bd27	vmscan: add task name to warn_scan_unevictable() messages If we need to know a usecase, caller program name is critical important. Show it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> David Rientjes <rientjes@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00
David Rientjes	f6d7e0cb3e	mm, debug: test for online nid when allocating on single node Calling alloc_pages_exact_node() means the allocation only passes the zonelist of a single node into the page allocator. If that node isn't online, it's zonelist may never have been initialized causing a strange oops that may not immediately be clear. I recently debugged an issue where node 0 wasn't online and an allocator was passing 0 to alloc_pages_exact_node() and it resulted in a NULL pointer on zonelist->_zoneref. If CONFIG_DEBUG_VM is enabled, though, it would be nice to catch this a bit earlier. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00
Shawn Bohrer	ad8a1b558e	fadvise: only initiate writeback for specified range with FADV_DONTNEED Previously POSIX_FADV_DONTNEED would start writeback for the entire file when the bdi was not write congested. This negatively impacts performance if the file contains dirty pages outside of the requested range. This change uses __filemap_fdatawrite_range() to only initiate writeback for the requested range. Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00
Stanislaw Gruszka	fc8d8620d3	slub: min order when debug_guardpage_minorder > 0 Disable slub debug facilities and allocate slabs at minimal order when debug_guardpage_minorder > 0 to increase probability to catch random memory corruption by cpu exception. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00
Stanislaw Gruszka	c6968e73b9	PM/Hibernate: do not count debug pages as savable When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder > 0, we have lot of free pages that are not marked so. Snapshot code account them as savable, what cause hibernate memory preallocation failure. It is pretty hard to make hibernate allocation succeed with debug_guardpage_minorder=1. This change at least make it possible when system has relatively big amount of RAM. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
Stanislaw Gruszka	c0a32fc5a2	mm: more intensive memory corruption debugging With CONFIG_DEBUG_PAGEALLOC configured, the CPU will generate an exception on access (read,write) to an unallocated page, which permits us to catch code which corrupts memory. However the kernel is trying to maximise memory usage, hence there are usually few free pages in the system and buggy code usually corrupts some crucial data. This patch changes the buddy allocator to keep more free/protected pages and to interlace free/protected and allocated pages to increase the probability of catching corruption. When the kernel is compiled with CONFIG_DEBUG_PAGEALLOC, debug_guardpage_minorder defines the minimum order used by the page allocator to grant a request. The requested size will be returned with the remaining pages used as guard pages. The default value of debug_guardpage_minorder is zero: no change from current behaviour. [akpm@linux-foundation.org: tweak documentation, s/flg/flag/] Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
David Daney	1399ff86f2	kernel.h: add BUILD_BUG() macro We can place this in definitions that we expect the compiler to remove by dead code elimination. If this assertion fails, we get a nice error message at build time. The GCC function attribute error("message") was added in version 4.3, so we define a new macro __linktime_error(message) to expand to this for GCC-4.3 and later. This will give us an error diagnostic from the compiler on the line that fails. For other compilers __linktime_error(message) expands to nothing, and we have to be content with a link time error, but at least we will still get a build error. BUILD_BUG() expands to the undefined function __build_bug_failed() and will fail at link time if the compiler ever emits code for it. On GCC-4.3 and later, attribute((error())) is used so that the failure will be noted at compile time instead. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: David Rientjes <rientjes@google.com> Cc: DM <dm.n9107@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
KAMEZAWA Hiroyuki	1e16a539ac	mm/hugetlb.c: fix virtual address handling in hugetlb fault handle_mm_fault() passes 'faulted' address to hugetlb_fault(). This address is not aligned to a hugepage boundary. Most of the functions for hugetlb pages are aware of that and calculate an alignment themselves. However some functions such as copy_user_huge_page() and clear_huge_page() don't handle alignment by themselves. This patch make hugeltb_fault() fix the alignment and pass an aligned addresss (to address of a faulted hugepage) to functions. [akpm@linux-foundation.org: use &=] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
Michal Hocko	ef009b25f4	hugetlb: clarify hugetlb_instantiation_mutex usage Let's make it clear that we cannot race with other fault handlers due to hugetlb (global) mutex. Also make it clear that we want to keep pte_same checks anayway to have a transition from the global mutex easier. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Hillf Danton <dhillf@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
Hillf Danton	a734bcc812	hugetlb: detect race upon page allocation failure during COW Currently we are not rechecking pte_same in hugetlb_cow after we take ptl lock again in the page allocation failure code path and simply retry again. This is not an issue at the moment because hugetlb fault path is protected by hugetlb_instantiation_mutex so we cannot race. The original page is locked and so we cannot race even with the page migration. Let's add the pte_same check anyway as we want to be consistent with the other check later in this function and be safe if we ever remove the mutex. [mhocko@suse.cz: reworded the changelog] Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
Konstantin Khlebnikov	5f8aefd44e	mm: account reaped page cache on inode cache pruning Inode cache pruning indirectly reclaims page-cache by invalidating mapping pages. Let's account them into reclaim-state to notice this progress in memory reclaimer. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
Mel Gorman	f90ac3982a	mm: avoid livelock on !__GFP_FS allocations Colin Cross reported; Under the following conditions, __alloc_pages_slowpath can loop forever: gfp_mask & __GFP_WAIT is true gfp_mask & __GFP_FS is false reclaim and compaction make no progress order <= PAGE_ALLOC_COSTLY_ORDER These conditions happen very often during suspend and resume, when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL allocations into __GFP_WAIT. The oom killer is not run because gfp_mask & __GFP_FS is false, but should_alloc_retry will always return true when order is less than PAGE_ALLOC_COSTLY_ORDER. In his fix, he avoided retrying the allocation if reclaim made no progress and __GFP_FS was not set. The problem is that this would result in GFP_NOIO allocations failing that previously succeeded which would be very unfortunate. The big difference between GFP_NOIO and suspend converting GFP_KERNEL to behave like GFP_NOIO is that normally flushers will be cleaning pages and kswapd reclaims pages allowing GFP_NOIO to succeed after a short delay. The same does not necessarily apply during suspend as the storage device may be suspended. This patch special cases the suspend case to fail the page allocation if reclaim cannot make progress and adds some documentation on how gfp_allowed_mask is currently used. Failing allocations like this may cause suspend to abort but that is better than a livelock. [mgorman@suse.de: Rework fix to be suspend specific] [rientjes@google.com: Move suspended device check to should_alloc_retry] Reported-by: Colin Cross <ccross@android.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
Mel Gorman	938929f14c	mm: reduce the amount of work done when updating min_free_kbytes When min_free_kbytes is updated, some pageblocks are marked MIGRATE_RESERVE. Ordinarily, this work is unnoticable as it happens early in boot but on large machines with 1TB of memory, this has been reported to delay boot times, probably due to the NUMA distances involved. The bulk of the work is due to calling calling pageblock_is_reserved() an unnecessary amount of times and accessing far more struct page metadata than is necessary. This patch significantly reduces the amount of work done by setup_zone_migrate_reserve() improving boot times on 1TB machines. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:42 -08:00
Jacobo Giralt	937a94c9db	mm: migrate: one less atomic operation migrate_page_move_mapping() drops a reference from the old page after unfreezing its counter. Both operations can be merged into a single atomic operation by directly unfreezing to one less reference. The same applies to migrate_huge_page_move_mapping(). Signed-off-by: Jacobo Giralt <jacobo.giralt@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:41 -08:00
Konstantin Khlebnikov	90a5d5af74	mm-tracepoint: fix documentation and examples We renamed the page-free mm tracepoints. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:41 -08:00
Konstantin Khlebnikov	b413d48aa7	mm-tracepoint: rename page-free events Rename mm_page_free_direct into mm_page_free and mm_pagevec_free into mm_page_free_batched Since v2.6.33-5426-gc475dab the kernel triggers mm_page_free_direct for all freed pages, not only for directly freed. So, let's name it properly. For pages freed via page-list we also trigger mm_page_free_batched event. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:41 -08:00
Konstantin Khlebnikov	da066ad357	mm: remove unused pagevec_free It not exported and now nobody uses it. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:41 -08:00
Konstantin Khlebnikov	cc59850ef9	mm: add free_hot_cold_page_list() helper This patch adds helper free_hot_cold_page_list() to free list of 0-order pages. It frees pages directly from list without temporary page-vector. It also calls trace_mm_pagevec_free() to simulate pagevec_free() behaviour. bloat-o-meter: add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28) function old new delta free_hot_cold_page_list - 264 +264 get_page_from_freelist 2129 2132 +3 __pagevec_free 243 239 -4 split_free_page 380 373 -7 release_pages 606 510 -96 free_page_list 188 - -188 Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:41 -08:00
Konstantin Khlebnikov	c909e99364	vmscan: activate executable pages after first usage Logic added in commit `8cab4754d2` ("vmscan: make mapped executable pages the first class citizen") was noticeably weakened in commit `6457474624` ("vmscan: detect mapped file pages used only once"). Currently these pages can become "first class citizens" only after second usage. After this patch page_check_references() will activate they after first usage, and executable code gets yet better chance to stay in memory. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:41 -08:00
Konstantin Khlebnikov	34dbc67a64	vmscan: promote shared file mapped pages Commit `6457474624` ("vmscan: detect mapped file pages used only once") greatly decreases lifetime of single-used mapped file pages. Unfortunately it also decreases life time of all shared mapped file pages. Because after commit `bf3f3bc5e7` ("mm: don't mark_page_accessed in fault path") page-fault handler does not mark page active or even referenced. Thus page_check_references() activates file page only if it was used twice while it stays in inactive list, meanwhile it activates anon pages after first access. Inactive list can be small enough, this way reclaimer can accidentally throw away any widely used page if it wasn't used twice in short period. After this patch page_check_references() also activate file mapped page at first inactive list scan if this page is already used multiple times via several ptes. I found this while trying to fix degragation in rhel6 (~2.6.32) from rhel5 (~2.6.18). There a complete mess with >100 web/mail/spam/ftp containers, they share all their files but there a lot of anonymous pages: ~500mb shared file mapped memory and 15-20Gb non-shared anonymous memory. In this situation major-pagefaults are very costly, because all containers share the same page. In my load kernel created a disproportionate pressure on the file memory, compared with the anonymous, they equaled only if I raise swappiness up to 150 =) These patches actually wasn't helped a lot in my problem, but I saw noticable (10-20 times) reduce in count and average time of major-pagefault in file-mapped areas. Actually both patches are fixes for commit v2.6.33-5448-g6457474, because it was aimed at one scenario (singly used pages), but it breaks the logic in other scenarios (shared and/or executable pages) Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:41 -08:00
Johannes Weiner	1edf223485	mm/page-writeback.c: make determine_dirtyable_memory static again The tracing ring-buffer used this function briefly, but not anymore. Make it local to the writeback code again. Also, move the function so that no forward declaration needs to be reintroduced. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:41 -08:00
Linus Torvalds	54c2c5761f	Ext4 commits for 3.3 merge window -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABCAAGBQJPDIkfAAoJENNvdpvBGATwWhwQAJ/rVsDAVSH7rbELW7jjdRRw 9vsDRbuCkBKsAi+r1nsfpjGnYTux1BnVU8BMGO8DnL6e43r9ihnm5o78amM9m8aL EDvCqogIY3hzi31gsuIs43DBPs5LwjPPQemgbeE+m/Zbpno/Ju+fDHL286Uvitj3 X22cC691/Ce7vXhnTPS2+p2fZtT17kvwmkxkVyTs3E3gXcKKtwI3WUrwyF1iwCXF +1WlKt1/YLdUs+OXPNNBNvLN5nF3b4UYOU7duQXDiY6ubro666VFIbRrIWVMRDFx o4bySo3RBie4AAOJq3IALPDFdRhnkk/qxpBkWvgwqDnMiusogJP6yWh9RdR8OB07 8nrkobxFwkkDnOFU4KaVjVmPtLmeq4ksiQfDFPnYRg/SQ2+BsdoxV8jpnZvugzKV XN+ABWqNucBuM/HoeinJF/UgT+nxpTr2QvH5ueNdvS3Kb9YsbrgJ/fX1941DWzMv r9P//ZUOueSGhqUB0IopvXSrjAbRMF1SjuAP9oLEJZkJ/+/dCPMohJTyusZ8qAaY MPb1QrQQej5dOb2jqe7BUe0eNKPT9eaZxCW3+hF5QMtlcLlJbHhDL5ijyyWJvTAZ GZh5RwdwANxOEfl0atXZcniBvxD2FA6hi2P/d4nBXGa3EkZRk5bDsCj12cPGPoGa irKRs7QRBiZGLYi0+W2a =3oE5 -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Ext4 commits for 3.3 merge window * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits) ext4: fix undefined behavior in ext4_fill_flex_info() ext4: make more symbols static ext4: make local symbol ext4_initxattrs static jbd2: fix hung processes in jbd2_journal_lock_updates() ext4: reserve new feature flag codepoints ext4: Report max_batch_time option correctly ext4: add missing ext4_resize_end on error paths ext4: let ext4_group_add() use common code ext4: let ext4_group_extend() use common code ext4: add new online resize interface ext4: add a new function which adds a flex group to a fs ext4: add a new function which allocates bitmaps and inode tables ext4: pass verify_reserved_gdb() the number of group decriptors ext4: add a function which updates the super block during online resizing ext4: add a function which sets up a block group descriptors of a flex bg ext4: add a function which sets up group blocks of a flex bg ext4: add a structure which will be used by 64bit-resize interface ext4: add a function which adds a new group descriptors to a fs ext4: add a function which extends a group without checking parameters ext4: use proper little-endian bitops ...	2012-01-10 15:51:48 -08:00
Linus Torvalds	609eac1c15	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: iattr_valid flags are kernel internal flags map them to 9p values. fs/9p: We should not allocate a new inode when creating hardlines. fs/9p: v9fs_stat2inode should update suid/sgid bits. 9p: Reduce object size with CONFIG_NET_9P_DEBUG fs/9p: check schedule_timeout_interruptible return value Fix up trivial conflicts in fs/9p/{vfs_inode.c,vfs_inode_dotl.c} due to debug messages having changed to use p9_debug() on one hand, and the changes for umode_t on the other.	2012-01-10 15:09:01 -08:00
Linus Torvalds	57eccf1c2a	Merge branch 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs * 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Change the default setting of the nfs4_disable_idmapping parameter NFSv4: Save the owner/group name string when doing open NFS: Remove pNFS bloat from the generic write path pnfs-obj: Must return layout on IO error pnfs-obj: pNFS errors are communicated on iodata->pnfs_error NFS: Cache state owners after files are closed NFS: Clean up nfs4_find_state_owners_locked() NFSv4: include bitmap in nfsv4 get acl data nfs: fix a minor do_div portability issue NFSv4.1: cleanup comment and debug printk NFSv4.1: change nfs4_free_slot parameters for dynamic slots NFSv4.1: cleanup init and reset of session slot tables NFSv4.1: fix backchannel slotid off-by-one bug nfs: fix regression in handling of context= option in NFSv4 NFS - fix recent breakage to NFS error handling. NFS: Retry mounting NFSROOT SUNRPC: Clean up the RPCSEC_GSS service ticket requests	2012-01-10 14:57:40 -08:00
Linus Torvalds	5c395ae703	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBI: fix use-after-free on error path UBI: fix missing scrub when there is a bit-flip UBIFS: Use kmemdup rather than duplicating its implementation	2012-01-10 14:57:19 -08:00
Linus Torvalds	49d41bae46	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: add recovery callbacks dlm: add node slots and generation dlm: move recovery barrier calls dlm: convert rsb list to rb_tree	2012-01-10 14:55:55 -08:00
Al Viro	b3f2a92447	hfsplus: creation of hidden dir on mount can fail Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-10 17:48:52 -05:00
Linus Torvalds	7b3480f8b7	MTD pull for 3.3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iEYEABECAAYFAk8Mq/MACgkQdwG7hYl686PeFACfZCgbdDWD9A/JL+i1RMfExVu6 Pi0An3Hmc3PTCp0yQ21KtcKhpF9CAMEu =NfuL -----END PGP SIGNATURE----- Merge tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6 MTD pull for 3.3 * tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6: (113 commits) mtd: Fix dependency for MTD_DOC200x mtd: do not use mtd->block_markbad directly logfs: do not use 'mtd->block_isbad' directly mtd: introduce mtd_can_have_bb helper mtd: do not use mtd->suspend and mtd->resume directly mtd: do not use mtd->lock, unlock and is_locked directly mtd: do not use mtd->sync directly mtd: harmonize mtd_writev usage mtd: do not use mtd->lock_user_prot_reg directly mtd: mtd->write_user_prot_reg directly mtd: do not use mtd->read__prot_reg directly mtd: do not use mtd->get__prot_info directly mtd: do not use mtd->read_oob directly mtd: mtdoops: do not use mtd->panic_write directly romfs: do not use mtd->get_unmapped_area directly mtd: do not use mtd->get_unmapped_area directly mtd: do use mtd->point directly mtd: introduce mtd_has_oob helper mtd: mtdcore: export symbols cleanup mtd: clean-up the default_mtd_writev function ... Fix up trivial edit/remove conflict in drivers/staging/spectra/lld_mtd.c	2012-01-10 13:45:22 -08:00
Linus Torvalds	1c8106528a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits) iommu/amd: Set IOTLB invalidation timeout iommu/amd: Init stats for iommu=pt iommu/amd: Remove unnecessary cache flushes in amd_iommu_resume iommu/amd: Add invalidate-context call-back iommu/amd: Add amd_iommu_device_info() function iommu/amd: Adapt IOMMU driver to PCI register name changes iommu/amd: Add invalid_ppr callback iommu/amd: Implement notifiers for IOMMUv2 iommu/amd: Implement IO page-fault handler iommu/amd: Add routines to bind/unbind a pasid iommu/amd: Implement device aquisition code for IOMMUv2 iommu/amd: Add driver stub for AMD IOMMUv2 support iommu/amd: Add stat counter for IOMMUv2 events iommu/amd: Add device errata handling iommu/amd: Add function to get IOMMUv2 domain for pdev iommu/amd: Implement function to send PPR completions iommu/amd: Implement functions to manage GCR3 table iommu/amd: Implement IOMMUv2 TLB flushing routines iommu/amd: Add support for IOMMUv2 domain mode iommu/amd: Add amd_iommu_domain_direct_map function ...	2012-01-10 11:08:21 -08:00
Linus Torvalds	1a464cbb3d	Merge branch 'drm-core-next' of git://people.freedesktop.org/~airlied/linux * 'drm-core-next' of git://people.freedesktop.org/~airlied/linux: (307 commits) drm/nouveau/pm: fix build with HWMON off gma500: silence gcc warnings in mid_get_vbt_data() drm/ttm: fix condition (and vs or) drm/radeon: double lock typo in radeon_vm_bo_rmv() drm/radeon: use after free in radeon_vm_bo_add() drm/sis\|via: don't return stack garbage from free_mem ioctl drm/radeon/kms: remove pointless CS flags priority struct drm/radeon/kms: check if vm is supported in VA ioctl drm: introduce drm_can_sleep and use in intel/radeon drivers. (v2) radeon: Fix disabling PCI bus mastering on big endian hosts. ttm: fix agp since ttm tt rework agp: Fix multi-line warning message whitespace drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages. drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool. drm/radeon/kms: sync across multiple rings when doing bo moves v3 drm/radeon/kms: Add support for multi-ring sync in CS ioctl (v2) drm/radeon: GPU virtual memory support v22 drm: make DRM_UNLOCKED ioctls with their own mutex drm: no need to hold global mutex for static data drm/radeon/benchmark: common modes sweep ignores 640x480@32 ... Fix up trivial conflicts in radeon/evergreen.c and vmwgfx/vmwgfx_kms.c	2012-01-10 11:04:36 -08:00
Linus Torvalds	dbe950f201	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (64 commits) Input: tc3589x-keypad - add missing kerneldoc Input: ucb1400-ts - switch to using dev_xxx() for diagnostic messages Input: ucb1400_ts - convert to threaded IRQ Input: ucb1400_ts - drop inline annotations Input: usb1400_ts - add __devinit/__devexit section annotations Input: ucb1400_ts - set driver owner Input: ucb1400_ts - convert to use dev_pm_ops Input: psmouse - make sure we do not use stale methods Input: evdev - do not block waiting for an event if fd is nonblock Input: evdev - if no events and non-block, return EAGAIN not 0 Input: evdev - only allow reading events if a full packet is present Input: add driver for pixcir i2c touchscreens Input: samsung-keypad - implement runtime power management support Input: tegra-kbc - report wakeup key for some platforms Input: tegra-kbc - add device tree bindings Input: add driver for AUO In-Cell touchscreens using pixcir ICs Input: mpu3050 - configure the sampling method Input: mpu3050 - ensure we enable interrupts Input: mpu3050 - add of_match table for device-tree probing Input: sentelic - document the latest hardware ... Fix up fairly trivial conflicts (device tree matching conflicting with some independent cleanups) in drivers/input/keyboard/samsung-keypad.c	2012-01-10 10:55:52 -08:00
Linus Torvalds	f62f61917d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (68 commits) hid-input/battery: add FEATURE quirk hid-input/battery: remove battery_val hid-input/battery: power-supply type really is a battery hid-input/battery: make the battery setup common for INPUTs and FEATUREs hid-input/battery: deal with both FEATURE and INPUT report batteries hid-input/battery: add quirks for battery hid-input/battery: remove apparently redundant kmalloc hid-input: add support for HID devices reporting Battery Strength HID: hid-multitouch: add support 9 new Xiroku devices HID: multitouch: add support for 3M 32" HID: multitouch: add support of Atmel multitouch panels HID: usbhid: defer LED setting to a workqueue HID: usbhid: hid-core: submit queued urbs before suspend HID: usbhid: remove LED_ON HID: emsff: use symbolic name instead of hardcoded PID constant HID: Enable HID_QUIRK_MULTI_INPUT for Trio Linker Plus II HID: Kconfig: fix syntax HID: introduce proper dependency of HID_BATTERY on POWER_SUPPLY HID: multitouch: support PixArt optical touch screen HID: make parser more verbose about parsing errors by default ... Fix up rename/delete conflict in drivers/hid/hid-hyperv.c (removed in staging, moved in this branch) and similarly for the rules for same file in drivers/staging/hv/{Kconfig,Makefile}.	2012-01-10 10:48:28 -08:00
Linus Torvalds	d04baa157d	SCSI updates for post 3.2 merge window -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPBh1uAAoJEDeqqVYsXL0MnxIIAJl0SLxgX3Vo18jhv7epNaUy Akm8VcLTjW99IAZm1x166pGjLvdeZJC5A50DxW3jQMknKYZyyxEmTGOOMVA/LuCS J3V18tMrsEA7i1kEGx2MauRRNAvReAZl4a/nHuRc+hpVmfyQegBv1v4V0v0gzD5I MDZSSksqtXpJhsHt2B4g/jao7RhuJYXw7NidRGzEtksax3NMyWzaIb/75Uq6eenE HwJwCUgZZxxfRKksj/T8ShRE6BKL9wcvrm8SVNjBYF2OpnMUNCXtfLQ4fqbHtGz2 otWvoQxVERehPZhTWHk6QnLgPwBWYyIUK7ErSFMTb9EK8b3FsEZkw6/LlS/lXnI= =Zopk -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 SCSI updates for post 3.2 merge window * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (67 commits) [SCSI] lpfc 8.3.28: Update driver version to 8.3.28 [SCSI] lpfc 8.3.28: Add Loopback support for SLI4 adapters [SCSI] lpfc 8.3.28: Critical Miscellaneous fixes [SCSI] Lpfc 8.3.28: FC and SCSI Discovery Fixes [SCSI] lpfc 8.3.28: Add support for ABTS failure handling [SCSI] lpfc 8.3.28: SLI fixes and added SLI4 support [SCSI] lpfc 8.3.28: Miscellaneous fixes in sysfs and mgmt interfaces [SCSI] mpt2sas: Removed redundant calling of _scsih_probe_devices() from _scsih_probe [SCSI] mac_scsi: Remove obsolete IRQ_FLG_* users [SCSI] qla4xxx: Update driver version to 5.02.00-k10 [SCSI] qla4xxx: check for FW alive before calling chip_reset [SCSI] qla4xxx: Fix qla4xxx_dump_buffer to dump buffer correctly [SCSI] qla4xxx: Fix the IDC locking mechanism [SCSI] qla4xxx: Wait for disable_acb before doing set_acb [SCSI] qla4xxx: Don't recover adapter if device state is FAILED [SCSI] qla4xxx: fix call trace on rmmod with ql4xdontresethba=1 [SCSI] qla4xxx: Fix CPU lockups when ql4xdontresethba set [SCSI] qla4xxx: Perform context resets in case of context failures. [SCSI] iscsi class: export pid of process that created [SCSI] mpt2sas: Remove unused duplicate diag_buffer_enable param ...	2012-01-10 10:36:08 -08:00
Linus Torvalds	88266917b5	Merge git://www.linux-watchdog.org/linux-watchdog * git://www.linux-watchdog.org/linux-watchdog: watchdog: omap_wdt.c: fix the WDIOC_GETBOOTSTATUS ioctl if not implemented. watchdog: new driver for VIA chipsets watchdog: ath79_wdt: flush register writes drivers/watchdog/lantiq_wdt.c: drop iounmap for devm_ allocated data watchdog: documentation: describe nowayout in coversion-guide watchdog: documentation: update index file watchdog: Convert wm831x driver to devm_kzalloc() watchdog: add nowayout helpers to Watchdog Timer Driver Kernel API watchdog: convert drivers/watchdog/* to use module_platform_driver() watchdog: Use DEFINE_SPINLOCK() for static spinlocks watchdog: Convert Wolfson drivers to module_platform_driver	2012-01-10 10:29:23 -08:00
Linus Torvalds	269d430131	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (40 commits) regulator: set constraints.apply_uV to 0 in of_get_fixed_voltage_config regulator: max8925: fix enabled/disabled judgement mistake regulator: add regulator_bulk_force_disable function regulator: pass regulator_register of_node in fixed voltage driver regulator: add regulator_force_disable() definition for !CONFIG_REGULATOR regulator: Enable supply regulator if child rail is enabled. regulator: mc13892: Convert to devm_kzalloc() regulator: mc13783: Convert to devm_kzalloc() regulator: Fix checking return value of create_regulator regulator: Fix the error handling if create_regulator fails regulator: Export regulator_is_supported_voltage() regulator: mc13892: add device tree probe support regulator: mc13892: remove the unnecessary prefix from regulator name regulator: Convert wm831x regulator drivers to devm_kzalloc() regulator: da9052: Staticize non-exported symbols regulator: Replace kzalloc with devm_kzalloc and if-else with a switch-case for da9052-regulator regulator: Update da9052-regulator for DT changes regulator: DA9052/53 Regulator support regulator: pass device_node to of_get_regulator_init_data() regulator: If a single voltage is set with device tree then set apply_uV ...	2012-01-10 10:20:34 -08:00
Linus Torvalds	d52739c62e	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (31 commits) pinctrl: remove unnecessary max pin number pinctrl: correct a offset while enumerating pins pinctrl: some typo fixes pinctrl: rename U300 and SIRF pin controllers pinctrl: pass name instead of device to pin_config_* pinctrl: add "struct seq_file;" to pinconf.h pinctrl: conjure names for unnamed pins pinctrl: add a group-specific hog macro pinctrl: don't create a device for each pin controller arm/u300: don't use PINMUX_MAP_PRIMARY* pinctrl: implement PINMUX_MAP_SYS_HOG pinctrl: add a pin config interface pinctrl/coh901: driver to request its pins pinctrl: u300-pinmux: register proper GPIO ranges pinctrl: move the U300 GPIO driver to pinctrl ARM: u300: localize GPIO assignments pinctrl: make it possible to add multiple maps pinctrl: make a copy of pinmux map pinctrl: GPIO direction support for muxing pinctrl: print pin range in GPIO range debugs ...	2012-01-10 10:19:57 -08:00
Linus Torvalds	abce00f962	Merge branch 'upstream-linus' of git://github.com/jgarzik/libata-dev * 'upstream-linus' of git://github.com/jgarzik/libata-dev: ahci: support the STA2X11 I/O Hub pata_bf54x: fix BMIDE status register emulation ata: add ata port hibernate callbacks ata: update ata port's runtime status during system resume [SCSI] runtime resume parent for child's system-resume ahci: platform support for suspend/resume libata-core: kill duplicate statement in ata_do_set_mode() pata_of_platform: remove direct dependency on OF_IRQ SATA/PATA: convert drivers/ata/* to use module_platform_driver() pata_cs5536: forward port changes from cs5536 libata-sff: use ATAPI_{COD\|IO} ata: add ata port runtime PM callbacks ata: add ata port system PM callbacks [SCSI] sd: check runtime PM status in sd_shutdown [SCSI] check runtime PM status in system PM [SCSI] add flag to skip the runtime PM calls on the host ata: make ata port as parent device of scsi host ahci: start engine only during soft/hard resets	2012-01-10 10:19:17 -08:00
Linus Torvalds	90160371b3	Merge branch 'stable/for-linus-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/for-linus-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (37 commits) xen/pciback: Expand the warning message to include domain id. xen/pciback: Fix "device has been assigned to X domain!" warning xen/pciback: Move the PCI_DEV_FLAGS_ASSIGNED ops to the "[un\|]bind" xen/xenbus: don't reimplement kvasprintf via a fixed size buffer xenbus: maximum buffer size is XENSTORE_PAYLOAD_MAX xen/xenbus: Reject replies with payload > XENSTORE_PAYLOAD_MAX. Xen: consolidate and simplify struct xenbus_driver instantiation xen-gntalloc: introduce missing kfree xen/xenbus: Fix compile error - missing header for xen_initial_domain() xen/netback: Enable netback on HVM guests xen/grant-table: Support mappings required by blkback xenbus: Use grant-table wrapper functions xenbus: Support HVM backends xen/xenbus-frontend: Fix compile error with randconfig xen/xenbus-frontend: Make error message more clear xen/privcmd: Remove unused support for arch specific privcmp mmap xen: Add xenbus_backend device xen: Add xenbus device driver xen: Add privcmd device driver xen/gntalloc: fix reference counts on multi-page mappings ...	2012-01-10 10:09:59 -08:00
Linus Torvalds	ae5cfc0546	Merge branch 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/mmu: Fix compile errors introduced by x86/memblock mismerge.	2012-01-10 10:09:48 -08:00
Sergey Senozhatsky	ace8577aeb	block_dev: Suppress bdev_cache_init() kmemleak warninig Kmemleak reports the following warning in bdev_cache_init() [ 0.003738] kmemleak: Object 0xffff880153035200 (size 256): [ 0.003823] kmemleak: comm "swapper/0", pid 0, jiffies 4294667299 [ 0.003909] kmemleak: min_count = 1 [ 0.003988] kmemleak: count = 0 [ 0.004066] kmemleak: flags = 0x1 [ 0.004144] kmemleak: checksum = 0 [ 0.004224] kmemleak: backtrace: [ 0.004303] [<ffffffff814755ac>] kmemleak_alloc+0x21/0x3e [ 0.004446] [<ffffffff811100ba>] kmem_cache_alloc+0xca/0x1dc [ 0.004592] [<ffffffff811371b1>] alloc_vfsmnt+0x1f/0x198 [ 0.004736] [<ffffffff811375c5>] vfs_kern_mount+0x36/0xd2 [ 0.004879] [<ffffffff8113929a>] kern_mount_data+0x18/0x32 [ 0.005025] [<ffffffff81ab9075>] bdev_cache_init+0x51/0x81 [ 0.005169] [<ffffffff81ab8abf>] vfs_caches_init+0x101/0x10d [ 0.005313] [<ffffffff81a9bae3>] start_kernel+0x344/0x383 [ 0.005456] [<ffffffff81a9b2a7>] x86_64_start_reservations+0xae/0xb2 [ 0.005602] [<ffffffff81a9b3ad>] x86_64_start_kernel+0x102/0x111 [ 0.005747] [<ffffffffffffffff>] 0xffffffffffffffff [ 0.008653] kmemleak: Trying to color unknown object at 0xffff880153035220 as Grey [ 0.008754] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc0-dbg-04200-g8180888-dirty #888 [ 0.008856] Call Trace: [ 0.008934] [<ffffffff81118704>] ? find_and_get_object+0x44/0x118 [ 0.009023] [<ffffffff81118fe6>] paint_ptr+0x57/0x8f [ 0.009109] [<ffffffff81475935>] kmemleak_not_leak+0x23/0x42 [ 0.009195] [<ffffffff81ab9096>] bdev_cache_init+0x72/0x81 [ 0.009282] [<ffffffff81ab8abf>] vfs_caches_init+0x101/0x10d [ 0.009368] [<ffffffff81a9bae3>] start_kernel+0x344/0x383 [ 0.009466] [<ffffffff81a9b2a7>] x86_64_start_reservations+0xae/0xb2 [ 0.009555] [<ffffffff81a9b140>] ? early_idt_handlers+0x140/0x140 [ 0.009643] [<ffffffff81a9b3ad>] x86_64_start_kernel+0x102/0x111 due to attempt to mark pointer to `struct vfsmount' as a gray object, which is embedded into `struct mount' returned from alloc_vfsmnt(). Make `bd_mnt' static, avoiding need to tell kmemleak to mark it gray, as suggested by Al Viro. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-10 13:08:55 -05:00
Miklos Szeredi	eaf5f90735	fix shrink_dcache_parent() livelock Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may cause shrink_dcache_parent() to loop forever. Here's what appears to happen: 1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1 2 - CPU1: select_parent(P) locks P->d_lock 3 - CPU0: shrink_dentry_list() locks C->d_lock dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock 4 - CPU1: select_parent(P) locks C->d_lock, moves C from dispose list being processed on CPU0 to the new dispose list, returns 1 5 - CPU0: shrink_dentry_list() finds dispose list empty, returns 6 - Goto 2 with CPU0 and CPU1 switched Basically select_parent() steals the dentry from shrink_dentry_list() and thinks it found a new one, causing shrink_dentry_list() to think it's making progress and loop over and over. One way to trigger this is to make udev calls stat() on the sysfs file while it is going away. Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick: ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true" Then execute the following loop: while true; do echo -bond0 > /sys/class/net/bonding_masters echo +bond0 > /sys/class/net/bonding_masters echo -bond1 > /sys/class/net/bonding_masters echo +bond1 > /sys/class/net/bonding_masters done One fix would be to check all callers and prevent concurrent calls to shrink_dcache_parent(). But I think a better solution is to stop the stealing behavior. This patch adds a new dentry flag that is set when the dentry is added to the dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a new reference just before being pruned. If the dentry has this flag, select_parent() will skip it and let shrink_dentry_list() retry pruning it. With select_parent() skipping those dentries there will not be the appearance of progress (new dentries found) when there is none, hence shrink_dcache_parent() will not loop forever. Set the flag is also set in prune_dcache_sb() for consistency as suggested by Linus. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-10 13:06:32 -05:00
Linus Torvalds	3dcf6c1b6b	Merge branch 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (74 commits) KVM: PPC: Whitespace fix for kvm.h KVM: Fix whitespace in kvm_para.h KVM: PPC: annotate kvm_rma_init as __init KVM: x86 emulator: implement RDPMC (0F 33) KVM: x86 emulator: fix RDPMC privilege check KVM: Expose the architectural performance monitoring CPUID leaf KVM: VMX: Intercept RDPMC KVM: SVM: Intercept RDPMC KVM: Add generic RDPMC support KVM: Expose a version 2 architectural PMU to a guests KVM: Expose kvm_lapic_local_deliver() KVM: x86 emulator: Use opcode::execute for Group 9 instruction KVM: x86 emulator: Use opcode::execute for Group 4/5 instructions KVM: x86 emulator: Use opcode::execute for Group 1A instruction KVM: ensure that debugfs entries have been created KVM: drop bsp_vcpu pointer from kvm struct KVM: x86: Consolidate PIT legacy test KVM: x86: Do not rely on implicit inclusions KVM: Make KVM_INTEL depend on CPU_SUP_INTEL KVM: Use memdup_user instead of kmalloc/copy_from_user ...	2012-01-10 09:57:11 -08:00
Theodore Ts'o	ff9cb1c4ee	Merge branch 'for_linus' into for_linus_merged Conflicts: fs/ext4/ioctl.c	2012-01-10 11:54:07 -05:00
Xi Wang	d50f2ab6f0	ext4: fix undefined behavior in ext4_fill_flex_info() Commit `503358ae01` ("ext4: avoid divide by zero when trying to mount a corrupted file system") fixes CVE-2009-4307 by performing a sanity check on s_log_groups_per_flex, since it can be set to a bogus value by an attacker. sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex; groups_per_flex = 1 << sbi->s_log_groups_per_flex; if (groups_per_flex < 2) { ... } This patch fixes two potential issues in the previous commit. 1) The sanity check might only work on architectures like PowerPC. On x86, 5 bits are used for the shifting amount. That means, given a large s_log_groups_per_flex value like 36, groups_per_flex = 1 << 36 is essentially 1 << 4 = 16, rather than 0. This will bypass the check, leaving s_log_groups_per_flex and groups_per_flex inconsistent. 2) The sanity check relies on undefined behavior, i.e., oversized shift. A standard-confirming C compiler could rewrite the check in unexpected ways. Consider the following equivalent form, assuming groups_per_flex is unsigned for simplicity. groups_per_flex = 1 << sbi->s_log_groups_per_flex; if (groups_per_flex == 0 \|\| groups_per_flex == 1) { We compile the code snippet using Clang 3.0 and GCC 4.6. Clang will completely optimize away the check groups_per_flex == 0, leaving the patched code as vulnerable as the original. GCC keeps the check, but there is no guarantee that future versions will do the same. Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-01-10 11:51:10 -05:00
Al Viro	f4947fbce2	coda: switch coda_cnode_make() to sane API as well, clean coda_lookup() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-10 11:13:16 -05:00
Al Viro	0b2c4e39c0	coda: deal correctly with allocation failure from coda_cnode_makectl() lookup should fail with ENOMEM, not silently make dentry negative. Switched to saner calling conventions, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-10 11:13:13 -05:00
Al Viro	3e25eb9c4b	securityfs: fix object creation races inode needs to be fully set up before we feed it to d_instantiate(). securityfs_create_file() does not do so; it sets ->i_fop and ->i_private only after we'd exposed the inode. Unfortunately, that's done fairly deep in call chain, so the amount of churn is considerable. Helper functions killed by substituting into their solitary call sites, dead code removed. We finally can bury default_file_ops, now that the final value of ->i_fop is available (and assigned) at the point where inode is allocated. Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-10 10:20:35 -05:00

... 2 3 4 5 6 ...

283433 commits