kernel-fxtec-pro1x

Author	SHA1	Message	Date
Christoph Lameter	b5fab14e5d	Add VM_BUG_ON in case someone uses page_mapping on a slab page Detect slab objects being passed to the page oriented functions of the VM. It is not sufficient to simply return NULL because the functions calling page_mapping may depend on other items of the page_struct also to be setup properly. Moreover slab object may not be properly aligned. The page oriented functions of the VM expect to operate on page aligned, page sized objects. Operations on object straddling page boundaries may only affect the objects partially which may lead to surprising results. It is better to detect eventually remaining uses and eliminate them. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:02 -07:00
Christoph Lameter	81cda66261	Slab allocators: Cleanup zeroing allocations It becomes now easy to support the zeroing allocs with generic inline functions in slab.h. Provide inline definitions to allow the continued use of kzalloc, kmem_cache_zalloc etc but remove other definitions of zeroing functions from the slab allocators and util.c. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:01 -07:00
Christoph Lameter	0c71001320	SLUB: add some more inlines and #ifdef CONFIG_SLUB_DEBUG Add #ifdefs around data structures only needed if debugging is compiled into SLUB. Add inlines to small functions to reduce code size. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:01 -07:00
Christoph Lameter	6cb8f91320	Slab allocators: consistent ZERO_SIZE_PTR support and NULL result semantics Define ZERO_OR_NULL_PTR macro to be able to remove the checks from the allocators. Move ZERO_SIZE_PTR related stuff into slab.h. Make ZERO_SIZE_PTR work for all slab allocators and get rid of the WARN_ON_ONCE(size == 0) that is still remaining in SLAB. Make slub return NULL like the other allocators if a too large memory segment is requested via __kmalloc. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:01 -07:00
Rusty Russell	8e1f936b73	mm: clean up and kernelify shrinker registration I can never remember what the function to register to receive VM pressure is called. I have to trace down from __alloc_pages() to find it. It's called "set_shrinker()", and it needs Your Help. 1) Don't hide struct shrinker. It contains no magic. 2) Don't allocate "struct shrinker". It's not helpful. 3) Call them "register_shrinker" and "unregister_shrinker". 4) Call the function "shrink" not "shrinker". 5) Reduce the 17 lines of waffly comments to 13, but document it properly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: David Chinner <dgc@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:00 -07:00
Andy Whitcroft	5ad333eb66	Lumpy Reclaim V4 When we are out of memory of a suitable size we enter reclaim. The current reclaim algorithm targets pages in LRU order, which is great for fairness at order-0 but highly unsuitable if you desire pages at higher orders. To get pages of higher order we must shoot down a very high proportion of memory; >95% in a lot of cases. This patch set adds a lumpy reclaim algorithm to the allocator. It targets groups of pages at the specified order anchored at the end of the active and inactive lists. This encourages groups of pages at the requested orders to move from active to inactive, and active to free lists. This behaviour is only triggered out of direct reclaim when higher order pages have been requested. This patch set is particularly effective when utilised with an anti-fragmentation scheme which groups pages of similar reclaimability together. This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms the foundation. Credit to Mel Gorman for sanitity checking. Mel said: The patches have an application with hugepage pool resizing. When lumpy-reclaim is used used with ZONE_MOVABLE, the hugepages pool can be resized with greater reliability. Testing on a desktop machine with 2GB of RAM showed that growing the hugepage pool with ZONE_MOVABLE on it's own was very slow as the success rate was quite low. Without lumpy-reclaim, each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages. With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical. [akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup] [bunk@stusta.de: static declarations for internal functions] [a.p.zijlstra@chello.nl: initial lumpy V2 implementation] Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Mel Gorman	ed7ed36517	handle kernelcore=: generic This patch adds the kernelcore= parameter for x86. Once all patches are applied, a new command-line parameter exist and a new sysctl. This patch adds the necessary documentation. From: Yasunori Goto <y-goto@jp.fujitsu.com> When "kernelcore" boot option is specified, kernel can't boot up on ia64 because of an infinite loop. In addition, the parsing code can be handled in an architecture-independent manner. This patch uses common code to handle the kernelcore= parameter. It is only available to architectures that support arch-independent zone-sizing (i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will ignore the boot parameter. [bunk@stusta.de: make cmdline_parse_kernelcore() static] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Mel Gorman	396faf0303	Allow huge page allocations to use GFP_HIGH_MOVABLE Huge pages are not movable so are not allocated from ZONE_MOVABLE. However, as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it can be used to satisfy hugepage allocations even when the system has been running a long time. This allows an administrator to resize the hugepage pool at runtime depending on the size of ZONE_MOVABLE. This patch adds a new sysctl called hugepages_treat_as_movable. When a non-zero value is written to it, future allocations for the huge page pool will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not introduce additional external fragmentation of note as huge pages are always the largest contiguous block we care about. [akpm@linux-foundation.org: various fixes] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Mel Gorman	2a1e274acf	Create the ZONE_MOVABLE zone The following 8 patches against 2.6.20-mm2 create a zone called ZONE_MOVABLE that is only usable by allocations that specify both __GFP_HIGHMEM and __GFP_MOVABLE. This has the effect of keeping all non-movable pages within a single memory partition while allowing movable allocations to be satisfied from either partition. The patches may be applied with the list-based anti-fragmentation patches that groups pages together based on mobility. The size of the zone is determined by a kernelcore= parameter specified at boot-time. This specifies how much memory is usable by non-movable allocations and the remainder is used for ZONE_MOVABLE. Any range of pages within ZONE_MOVABLE can be released by migrating the pages or by reclaiming. When selecting a zone to take pages from for ZONE_MOVABLE, there are two things to consider. First, only memory from the highest populated zone is used for ZONE_MOVABLE. On the x86, this is probably going to be ZONE_HIGHMEM but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64. Second, the amount of memory usable by the kernel will be spread evenly throughout NUMA nodes where possible. If the nodes are not of equal size, the amount of memory usable by the kernel on some nodes may be greater than others. By default, the zone is not as useful for hugetlb allocations because they are pinned and non-migratable (currently at least). A sysctl is provided that allows huge pages to be allocated from that zone. This means that the huge page pool can be resized to the size of ZONE_MOVABLE during the lifetime of the system assuming that pages are not mlocked. Despite huge pages being non-movable, we do not introduce additional external fragmentation of note as huge pages are always the largest contiguous block we care about. Credit goes to Andy Whitcroft for catching a large variety of problems during review of the patches. This patch creates an additional zone, ZONE_MOVABLE. This zone is only usable by allocations which specify both __GFP_HIGHMEM and __GFP_MOVABLE. Hot-added memory continues to be placed in their existing destination as there is no mechanism to redirect them to a specific zone. [y-goto@jp.fujitsu.com: Fix section mismatch of memory hotplug related code] [akpm@linux-foundation.org: various fixes] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Mel Gorman	769848c038	Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated It is often known at allocation time whether a page may be migrated or not. This patch adds a flag called __GFP_MOVABLE and a new mask called GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated using the page migration mechanism or reclaimed by syncing with backing storage and discarding. An API function very similar to alloc_zeroed_user_highpage() is added for __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The flags used by alloc_zeroed_user_highpage() are not changed because it would change the semantics of an existing API. After this patch is applied there are no in-kernel users of alloc_zeroed_user_highpage() so it probably should be marked deprecated if this patch is merged. Note that this patch includes a minor cleanup to the use of __GFP_ZERO in shmem.c to keep all flag modifications to inode->mapping in the shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of Hugh Dickens. Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the concept. Credit to Hugh Dickens for catching issues with shmem swap vector and ramfs allocations. [akpm@linux-foundation.org: build fix] [hugh@veritas.com: __GFP_ZERO cleanup] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Martin Schwidefsky	e21ea246bc	mm: remove ptep_test_and_clear_dirty and ptep_clear_flush_dirty Nobody is using ptep_test_and_clear_dirty and ptep_clear_flush_dirty. Remove the functions from all architectures. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Martin Schwidefsky	f0e47c229b	mm: remove ptep_establish() The last user of ptep_establish in mm/ is long gone. Remove the architecture primitive as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:22:59 -07:00
Linus Torvalds	489de30259	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (209 commits) [POWERPC] Create add_rtc() function to enable the RTC CMOS driver [POWERPC] Add H_ILLAN_ATTRIBUTES hcall number [POWERPC] xilinxfb: Parameterize xilinxfb platform device registration [POWERPC] Oprofile support for Power 5++ [POWERPC] Enable arbitary speed tty ioctls and split input/output speed [POWERPC] Make drivers/char/hvc_console.c:khvcd() static [POWERPC] Remove dead code for preventing pread() and pwrite() calls [POWERPC] Remove unnecessary #undef printk from prom.c [POWERPC] Fix typo in Ebony default DTS [POWERPC] Check for NULL ppc_md.init_IRQ() before calling [POWERPC] Remove extra return statement [POWERPC] pasemi: Don't auto-select CONFIG_EMBEDDED [POWERPC] pasemi: Rename platform [POWERPC] arch/powerpc/kernel/sysfs.c: Move NUMA exports [POWERPC] Add __read_mostly support for powerpc [POWERPC] Modify sched_clock() to make CONFIG_PRINTK_TIME more sane [POWERPC] Create a dummy zImage if no valid platform has been selected [POWERPC] PS3: Bootwrapper support. [POWERPC] powermac i2c: Use mutex [POWERPC] Schedule removal of arch/ppc ... Fixed up conflicts manually in: Documentation/feature-removal-schedule.txt arch/powerpc/kernel/pci_32.c arch/powerpc/kernel/pci_64.c include/asm-powerpc/pci.h and asked the powerpc people to double-check the result..	2007-07-16 17:58:08 -07:00
Linus Torvalds	1f1c2881f6	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (37 commits) forcedeth bug fix: realtek phy forcedeth bug fix: vitesse phy forcedeth bug fix: cicada phy atl1: reorder atl1_main functions atl1: fix excessively indented code atl1: cleanup atl1_main atl1: header file cleanup atl1: remove irq_sem cdc-subset to support new vendor/product ID 8139cp: implement the missing dev->tx_timeout myri10ge: Remove nonsensical limit in the tx done routine gianfar: kill unused header EP93XX_ETH must select MII macb: Add multicast capability macb: Use generic PHY layer s390: add barriers to qeth driver s390: scatter-gather for inbound traffic in qeth driver eHEA: Introducing support vor DLPAR memory add Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c [PATCH] softmac: Fix ESSID problem ...	2007-07-16 17:48:54 -07:00
frederic RODO	6c36a70744	macb: Use generic PHY layer Convert the macb driver to use the generic PHY layer in drivers/net/phy. Signed-off-by: Frederic RODO <f.rodo@til-technologies.fr> Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-16 18:28:04 -04:00
Linus Torvalds	2e27afb300	Revert "[NET]: Fix races in net_rx_action vs netpoll." This reverts commit `29578624e3`. Ingo Molnar reports complete breakage with his e1000 card (no networking, card reports transmit timeouts), and bisected it down to this commit. Let's figure out what went wrong, but not keep breaking machines until we do. Cc: Ingo Molnar <mingo@elte.hu> Cc: Olaf Kirch <olaf.kirch@oracle.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 14:31:08 -07:00
Linus Torvalds	add096909d	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits) [PATCH] ocfs2: zero_user_page conversion ocfs2: Support xfs style space reservation ioctls ocfs2: support for removing file regions ocfs2: update truncate handling of partial clusters ocfs2: btree support for removal of arbirtrary extents ocfs2: Support creation of unwritten extents ocfs2: support writing of unwritten extents ocfs2: small cleanup of ocfs2_write_begin_nolock() ocfs2: btree changes for unwritten extents ocfs2: abstract btree growing calls ocfs2: use all extent block suballocators ocfs2: plug truncate into cached dealloc routines ocfs2: simplify deallocation locking ocfs2: harden buffer check during mapping of page blocks ocfs2: shared writeable mmap ocfs2: factor out write aops into nolock variants ocfs2: rework ocfs2_buffered_write_cluster() ocfs2: take ip_alloc_sem during entire truncate ocfs2: Add "preferred slot" mount option [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c ...	2007-07-16 10:52:55 -07:00
Linus Torvalds	e245befce7	Merge branch 'bsg' of git://git.kernel.dk/data/git/linux-2.6-block * 'bsg' of git://git.kernel.dk/data/git/linux-2.6-block: (25 commits) bsg: Kconfig updates bsg: add SCSI transport-level request support bsg: add bidi support add a struct request pointer to the request structure bsg: fix the deadlock on discarding done commands bsg: fix a blocking read bug bsg: minor bug fixes improve bsg device allocation bind bsg to all SCSI devices bsg: bind bsg to request_queue instead of gendisk bsg: add a request_queue argument to scsi_cmd_ioctl() bsg: simplify __bsg_alloc_command failpath bsg: add cheasy error checks for sysfs stuff Add queue resizing support Replace s32, u32 and u64 with __s32, __u32 and __u64 in bsg.h for userspace bsg: silence a bogus gcc warning bsg: style cleanup bsg: use u32 etc instead of uint32_t bsg: add SG_IO to SG v4 bsg: replace SG v3 with SG v4 ...	2007-07-16 10:50:19 -07:00
Linus Torvalds	14dc524972	Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block * 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: splice: direct splicing updates ppos twice more ACSI removal umem: Fix match of pci_ids in umem driver umem: Remove references to dead CONFIG_MM_MAP_MEMORY variable remove the documentation for the legacy CDROM drivers	2007-07-16 10:48:20 -07:00
Linus Torvalds	02b2318e07	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: (26 commits) [SPARC64]: Fix UP build. [SPARC64]: dr-cpu unconfigure support. [SERIAL]: Fix console write locking in sparc drivers. [SPARC64]: Give more accurate errors in dr_cpu_configure(). [SPARC64]: Clear cpu_{core,sibling}_map[] in smp_fill_in_sib_core_maps() [SPARC64]: Fix leak when DR added cpu does not bootup. [SPARC64]: Add ->set_affinity IRQ handlers. [SPARC64]: Process dr-cpu events in a kthread instead of workqueue. [SPARC64]: More sensible udelay implementation. [SPARC64]: SMP build fixes. [SPARC64]: mdesc.c needs linux/mm.h [SPARC64]: Fix build regressions added by dr-cpu changes. [SPARC64]: Unconditionally register vio_bus_type. [SPARC64]: Initial LDOM cpu hotplug support. [SPARC64]: Fix setting of variables in LDOM guest. [SPARC64]: Fix MD property lifetime bugs. [SPARC64]: Abstract out mdesc accesses for better MD update handling. [SPARC64]: Use more mearningful names for IRQ registry. [SPARC64]: Initial domain-services driver. [SPARC64]: Export powerd facilities for external entities. ...	2007-07-16 10:45:23 -07:00
Linus Torvalds	b91cba52e9	Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (68 commits) sh: sh-rtc support for SH7709. sh: Revert __xdiv64_32 size change. sh: Update r7785rp defconfig. sh: Export div symbols for GCC 4.2 and ST GCC. sh: fix race in parallel out-of-tree build sh: Kill off dead mach.c for hp6xx. sh: hd64461.h cleanup and added comments. sh: Update the alignment when 4K stacks are used. sh: Add a .bss.page_aligned section for 4K stacks. sh: Don't let SH-4A clobber SH-4 CFLAGS. sh: Add parport stub for SuperIO ports. sh: Drop -Wa,-dsp for DSP tuning. sh: Update dreamcast defconfig. fb: pvr2fb: A few more __devinit annotations for PCI. fb: pvr2fb: Fix up section mismatch warnings. sh: Select IPR-IRQ for SH7091. sh: Correct __xdiv64_32/div64_32 return value size. sh: Fix timer-tmu build for SH-3. sh: Add cpu and mach links to CLEAN_FILES. sh: Preliminary support for the SH-X3 CPU. ...	2007-07-16 10:32:02 -07:00
Badari Pulavarty	5e70030d4c	ext4: statfs speed up This is a patch that speeds up statfs. It is very simple - the "overhead" calculation, which takes a huge amount of time for large filesystems, never changes unless the size of the filesystem itself changes. That means we can store it in memory and only recalculate if the filesystem has been resized (almost never). It also fixes a minor problem that we never update the on-disk superblock free blocks/inodes counts until the filesystem is unmounted. While not fatal, we may as well update that on disk when we have the information, and it makes things like debugfs and dumpe2fs report a bit more accurate info. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Badari Pulavarty	a71ce8c6c9	ext3: statfs speed up This is a patch that speeds up statfs. It is very simple - the "overhead" calculation, which takes a huge amount of time for large filesystems, never changes unless the size of the filesystem itself changes. That means we can store it in memory and only recalculate if the filesystem has been resized (almost never). It also fixes a minor problem that we never update the on-disk superblock free blocks/inodes counts until the filesystem is unmounted. While not fatal, we may as well update that on disk when we have the information, and it makes things like debugfs and dumpe2fs report a bit more accurate info. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Badari Pulavarty	2235219b77	ext2: statfs speed up This is a patch that speeds up statfs. It is very simple - the "overhead" calculation, which takes a huge amount of time for large filesystems, never changes unless the size of the filesystem itself changes. That means we can store it in memory and only recalculate if the filesystem has been resized (almost never). It also fixes a minor problem that we never update the on-disk superblock free blocks/inodes counts until the filesystem is unmounted. While not fatal, we may as well update that on disk when we have the information, and it makes things like debugfs and dumpe2fs report a bit more accurate info. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Adrian Bunk	8f8a68ee48	remove mm/backing-dev.c:congestion_wait_interruptible() congestion_wait_interruptible() is no longer used. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:52 -07:00
Oleg Nesterov	1f1f642e2f	make cancel_xxx_work_sync() return a boolean Change cancel_work_sync() and cancel_delayed_work_sync() to return a boolean indicating whether the work was actually cancelled. A zero return value means that the work was not pending/queued. Without that kind of change it is not possible to avoid flush_workqueue() sometimes, see the next patch as an example. Also, this patch unifies both functions and kills the (unlikely) busy-wait loop. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:51 -07:00
Oleg Nesterov	f5a421a450	rename cancel_rearming_delayed_work() to cancel_delayed_work_sync() Imho, the current naming of cancel_xxx workqueue functions is very confusing. cancel_delayed_work() cancel_rearming_delayed_work() cancel_rearming_delayed_workqueue() // obsolete cancel_work_sync() This looks as if the first 2 functions differ in "type" of their argument which is not true any longer, nowadays the difference is the behaviour. The semantics of cancel_rearming_delayed_work(dwork) was changed significantly, it doesn't require that dwork rearms itself, and cancels dwork synchronously. Rename it to cancel_delayed_work_sync(). This matches cancel_delayed_work() and cancel_work_sync(). Re-create cancel_rearming_delayed_work() as a simple inline obsolete wrapper, like cancel_rearming_delayed_workqueue(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:51 -07:00
Jesper Juhl	d52988023a	Remove the last few UMSDOS leftovers The UMSDOS filesystem was removed back in 2.6.11, but some tiny bits stuck around. This patch removes the few remaining leftovers. The only things left behind after this are the entries in the CREDITS file and the ioctl number in Documentation/ioctl-number.txt as documentation. This third (hopefully final) version of the patch doesn't edit the arch/um/config.release file, since Jeff Dike pointed out to me that it should die completely, and asked me to remove it from my patch as he'll send in a seperate patch removing the file completely. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:51 -07:00
Heiko Carstens	608e261968	generic bug: use show_regs() instead of dump_stack() The current generic bug implementation has a call to dump_stack() in case a WARN_ON(whatever) gets hit. Since report_bug(), which calls dump_stack(), gets called from an exception handler we can do better: just pass the pt_regs structure to report_bug() and pass it to show_regs() in case of a warning. This will give more debug informations like register contents, etc... In addition this avoids some pointless lines that dump_stack() emits, since it includes a stack backtrace of the exception handler which is of no interest in case of a warning. E.g. on s390 the following lines are currently always present in a stack backtrace if dump_stack() gets called from report_bug(): [<000000000001517a>] show_trace+0x92/0xe8) [<0000000000015270>] show_stack+0xa0/0xd0 [<00000000000152ce>] dump_stack+0x2e/0x3c [<0000000000195450>] report_bug+0x98/0xf8 [<0000000000016cc8>] illegal_op+0x1fc/0x21c [<00000000000227d6>] sysc_return+0x0/0x10 Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Andi Kleen <ak@suse.de> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:51 -07:00
Andrew Morton	cc2ea416b2	uninline check_signature() This is a rather bizarre thing to have inlined in io.h. Stick it in lib/ instead. While we're there, despaghetti it a bit, and fix its off-by-one behaviour when passed a zero length. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Andrea Arcangeli	cf99abace7	make seccomp zerocost in schedule This follows a suggestion from Chuck Ebbert on how to make seccomp absolutely zerocost in schedule too. The only remaining footprint of seccomp is in terms of the bzImage size that becomes a few bytes (perhaps even a few kbytes) larger, measure it if you care in the embedded. Signed-off-by: Andrea Arcangeli <andrea@cpushare.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Andrea Arcangeli	1d9d02feee	move seccomp from /proc to a prctl This reduces the memory footprint and it enforces that only the current task can enable seccomp on itself (this is a requirement for a strightforward [modulo preempt ;) ] TIF_NOTSC implementation). Signed-off-by: Andrea Arcangeli <andrea@cpushare.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Robert P. J. Day	132e4b0a04	cdrom: replace hard-coded constants by kernel.h macro. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Eric W. Biederman	213dd266d4	namespace: ensure clone_flags are always stored in an unsigned long While working on unshare support for the network namespace I noticed we were putting clone flags in an int. Which is weird because the syscall uses unsigned long and we at least need an unsigned to properly hold all of the unshare flags. So to make the code consistent, this patch updates the code to use unsigned long instead of int for the clone flags in those places where we get it wrong today. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Robert P. J. Day	f489592597	Remove final two references to "__obsolete_setup" macro Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Arnd Bergmann	4b7775870b	Introduce compat_u64 and compat_s64 types One common problem with 32 bit system call and ioctl emulation is the different alignment rules between i386 and 64 bit machines. A number of drivers work around this by marking the compat structures as 'attribute((packed))', which is not the right solution because it breaks all the non-x86 architectures that want to use the same compat code. Hopefully, this patch improves the situation, it introduces two new types, compat_u64 and compat_s64. These are defined on all architectures to have the same size and alignment as the 32 bit version of u64 and s64. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vasily Tarasov <vtaras@openvz.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Nathan Lynch	dcf5008db1	remove unused lock_cpu_hotplug_interruptible definition `aa95387774` removed the implementation of lock_cpu_hotplug_interruptible and all users of it. This stub definition for !CONFIG_HOTPLUG_CPU was left over -- kill it now. Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Robert P. J. Day	ea5a3dcfda	COBALT: remove all references to Cobalt NVRAM Remove not only the references to Cobalt NVRAM, but the header file as well. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Tim Hockin <thockin@hockin.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Serge E. Hallyn	77ec739d8d	user namespace: add unshare This patch enables the unshare of user namespaces. It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which resets the current user_struct and adds a new root user (uid == 0) For now, unsharing the user namespace allows a process to reset its user_struct accounting and uid 0 in the new user namespace should be contained using appropriate means, for instance selinux The plan, when the full support is complete (all uid checks covered), is to keep the original user's rights in the original namespace, and let a process become uid 0 in the new namespace, with full capabilities to the new namespace. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Andrew Morgan <agm@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Cedric Le Goater	acce292c82	user namespace: add the framework Basically, it will allow a process to unshare its user_struct table, resetting at the same time its own user_struct and all the associated accounting. A new root user (uid == 0) is added to the user namespace upon creation. Such root users have full privileges and it seems that theses privileges should be controlled through some means (process capabilities ?) The unshare is not included in this patch. Changes since [try #4]: - Updated get_user_ns and put_user_ns to accept NULL, and get_user_ns to return the namespace. Changes since [try #3]: - moved struct user_namespace to files user_namespace.{c,h} Changes since [try #2]: - removed struct user_namespace* argument from find_user() Changes since [try #1]: - removed struct user_namespace* argument from find_user() - added a root_user per user namespace Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Andrew Morgan <agm@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Cedric Le Goater	7d69a1f4a7	remove CONFIG_UTS_NS and CONFIG_IPC_NS CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only deactivate the unshare of the uts and ipc namespaces and do not improve performance. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Miloslav Trmac	522ed7767e	Audit: add TTY input auditing Add TTY input auditing, used to audit system administrator's actions. This is required by various security standards such as DCID 6/3 and PCI to provide non-repudiation of administrator's actions and to allow a review of past actions if the administrator seems to overstep their duties or if the system becomes misconfigured for unknown reasons. These requirements do not make it necessary to audit TTY output as well. Compared to an user-space keylogger, this approach records TTY input using the audit subsystem, correlated with other audit events, and it is completely transparent to the user-space application (e.g. the console ioctls still work). TTY input auditing works on a higher level than auditing all system calls within the session, which would produce an overwhelming amount of mostly useless audit events. Add an "audit_tty" attribute, inherited across fork (). Data read from TTYs by process with the attribute is sent to the audit subsystem by the kernel. The audit netlink interface is extended to allow modifying the audit_tty attribute, and to allow sending explanatory audit events from user-space (for example, a shell might send an event containing the final command, after the interactive command-line editing and history expansion is performed, which might be difficult to decipher from the TTY input alone). Because the "audit_tty" attribute is inherited across fork (), it would be set e.g. for sshd restarted within an audited session. To prevent this, the audit_tty attribute is cleared when a process with no open TTY file descriptors (e.g. after daemon startup) opens a TTY. See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a more detailed rationale document for an older version of this patch. [akpm@linux-foundation.org: build fix] Signed-off-by: Miloslav Trmac <mitr@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Fulghum <paulkf@microgate.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Alan Cox	4f27c00bf8	Improve behaviour of spurious IRQ detect Currently we handle spurious IRQ activity based upon seeing a lot of invalid interrupts, and we clear things back on the base of lots of valid interrupts. Unfortunately in some cases you get legitimate invalid interrupts caused by timing asynchronicity between the PCI bus and the APIC bus when disabling interrupts and pulling other tricks. In this case although the spurious IRQs are not a problem our unhandled counters didn't clear and they act as a slow running timebomb. (This is effectively what the serial port/tty problem that was fixed by clearing counters when registering a handler showed up) It's easy enough to add a second parameter - time. This means that if we see a regular stream of harmless spurious interrupts which are not harming processing we don't go off and do something stupid like disable the IRQ after a month of running. OTOH lockups and performance killers show up a lot more than 10/second [akpm@linux-foundation.org: cleanup] Signed-off-by: Alan Cox <alan@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Dave Jones	5216184571	fix typo in prefetch.h Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Maxim Uvarov	b663a79c19	taskstats: add context-switch counters Make available to the user the following task and process performance statistics: * Involuntary Context Switches (task_struct->nivcsw) * Voluntary Context Switches (task_struct->nvcsw) Statistics information is available from: 1. taskstats interface (Documentation/accounting/) 2. /proc/PID/status (task only). This data is useful for detecting hyperactivity patterns between processes. [akpm@linux-foundation.org: cleanup] Signed-off-by: Maxim Uvarov <muvarov@ru.mvista.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Jonathan Lim <jlim@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Robert P. J. Day	dcae56ea66	Drop an empty isicom.h from being exported to user space. Drop <linux/isicom.h> from being exported to user space since it would be only an empty file. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Adrian Bunk	c289dca379	remove sonypi_camera_command() Remove the no longer used sonypi_camera_command(). Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Mattia Dongili <malattia@linux.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Jan Engelhardt	759448f459	Kernel utf-8 handling This patch fixes dead keys and copy/paste of non-ASCII characters in UTF-8 mode on Linux console. See more details about the original patch at: http://chris.heathens.co.nz/linux/utf8.html Already posted on (Oldest) http://lkml.org/lkml/2003/5/31/148 http://lkml.org/lkml/2005/12/24/69 (Recent) http://lkml.org/lkml/2006/8/7/75 [bunk@stusta.de: make drivers/char/selection.c:store_utf8() static] Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: Alexander E. Patrakov <patrakov@ums.usu.ru> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:46 -07:00
Alexey Dobriyan	aa0ac36518	Remove capability.h from mm.h I forgot to remove capability.h from mm.h while removing sched.h! This patch remedies that, because the only inline function which was using CAP_something was made out of line. Cross-compile tested without regressions on: all powerpc defconfigs all mips defconfigs all m68k defconfigs all arm defconfigs all ia64 defconfigs alpha alpha-allnoconfig alpha-defconfig alpha-up arm i386 i386-allnoconfig i386-defconfig i386-up ia64 ia64-allnoconfig ia64-defconfig ia64-up m68k mips parisc parisc-allnoconfig parisc-defconfig parisc-up powerpc powerpc-up s390 s390-allnoconfig s390-defconfig s390-up sparc sparc-allnoconfig sparc-defconfig sparc-up sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up um-x86_64 x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Ulrich Drepper	4a19542e5f	O_CLOEXEC for SCM_RIGHTS Part two in the O_CLOEXEC saga: adding support for file descriptors received through Unix domain sockets. The patch is once again pretty minimal, it introduces a new flag for recvmsg and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit is not used otherwise but the networking people will know better. This new flag is not recognized by recvfrom and recv. These functions cannot be used for that purpose and the asymmetry this introduces is not worse than the already existing MSG_CMSG_COMPAT situations. The patch must be applied on the patch which introduced O_CLOEXEC. It has to remove static from the new get_unused_fd_flags function but since scm.c cannot live in a module the function still hasn't to be exported. Here's a test program to make sure the code works. It's so much longer than the actual patch... #include <errno.h> #include <error.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <unistd.h> #include <sys/socket.h> #include <sys/un.h> #ifndef O_CLOEXEC # define O_CLOEXEC 02000000 #endif #ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0x40000000 #endif int main (int argc, char argv[]) { if (argc > 1) { int fd = atol (argv[1]); printf ("child: fd = %d\n", fd); if (fcntl (fd, F_GETFD) == 0 \|\| errno != EBADF) { puts ("file descriptor valid in child"); return 1; } return 0; } struct sockaddr_un sun; strcpy (sun.sun_path, "./testsocket"); sun.sun_family = AF_UNIX; char databuf[] = "hello"; struct iovec iov[1]; iov[0].iov_base = databuf; iov[0].iov_len = sizeof (databuf); union { struct cmsghdr hdr; char bytes[CMSG_SPACE (sizeof (int))]; } buf; struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1, .msg_control = buf.bytes, .msg_controllen = sizeof (buf) }; struct cmsghdr cmsg = CMSG_FIRSTHDR (&msg); cmsg->cmsg_level = SOL_SOCKET; cmsg->cmsg_type = SCM_RIGHTS; cmsg->cmsg_len = CMSG_LEN (sizeof (int)); msg.msg_controllen = cmsg->cmsg_len; pid_t child = fork (); if (child == -1) error (1, errno, "fork"); if (child == 0) { int sock = socket (PF_UNIX, SOCK_STREAM, 0); if (sock < 0) error (1, errno, "socket"); if (bind (sock, (struct sockaddr ) &sun, sizeof (sun)) < 0) error (1, errno, "bind"); if (listen (sock, SOMAXCONN) < 0) error (1, errno, "listen"); int conn = accept (sock, NULL, NULL); if (conn == -1) error (1, errno, "accept"); (int ) CMSG_DATA (cmsg) = sock; if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0) error (1, errno, "sendmsg"); return 0; } / For a test suite this should be more robust like a barrier in shared memory. / sleep (1); int sock = socket (PF_UNIX, SOCK_STREAM, 0); if (sock < 0) error (1, errno, "socket"); if (connect (sock, (struct sockaddr ) &sun, sizeof (sun)) < 0) error (1, errno, "connect"); unlink (sun.sun_path); (int ) CMSG_DATA (cmsg) = -1; if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0) error (1, errno, "recvmsg"); int fd = (int ) CMSG_DATA (cmsg); if (fd == -1) error (1, 0, "no descriptor received"); char fdname[20]; snprintf (fdname, sizeof (fdname), "%d", fd); execl ("/proc/self/exe", argv[0], fdname, NULL); puts ("execl failed"); return 1; } [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch] [akpm@linux-foundation.org: build fix] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Buesch <mb@bu3sch.de> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Ulrich Drepper	f23513e8d9	Introduce O_CLOEXEC The problem is as follows: in multi-threaded code (or more correctly: all code using clone() with CLONE_FILES) we have a race when exec'ing. thread #1 thread #2 fd=open() fork + exec fcntl(fd,F_SETFD,FD_CLOEXEC) In some applications this can happen frequently. Take a web browser. One thread opens a file and another thread starts, say, an external PDF viewer. The result can even be a security issue if that open file descriptor refers to a sensitive file and the external program can somehow be tricked into using that descriptor. Just adding O_CLOEXEC support to open() doesn't solve the whole set of problems. There are other ways to create file descriptors (socket, epoll_create, Unix domain socket transfer, etc). These can and should be addressed separately though. open() is such an easy case that it makes not much sense putting the fix off. The test program: #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <unistd.h> #ifndef O_CLOEXEC # define O_CLOEXEC 02000000 #endif int main (int argc, char *argv[]) { int fd; if (argc > 1) { fd = atol (argv[1]); printf ("child: fd = %d\n", fd); if (fcntl (fd, F_GETFD) == 0 \|\| errno != EBADF) { puts ("file descriptor valid in child"); return 1; } return 0; } fd = open ("/proc/self/exe", O_RDONLY \| O_CLOEXEC); printf ("in parent: new fd = %d\n", fd); char buf[20]; snprintf (buf, sizeof (buf), "%d", fd); execl ("/proc/self/exe", argv[0], buf, NULL); puts ("execl failed"); return 1; } [kyle@parisc-linux.org: parisc fix] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Venki Pallipadi	c5c061b8f9	Add a flag to indicate deferrable timers in /proc/timer_stats Add a flag in /proc/timer_stats to indicate deferrable timers. This will let developers/users to differentiate between types of tiemrs in /proc/timer_stats. Deferrable timer and normal timer will appear in /proc/timer_stats as below. 10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) 10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) Also version of timer_stats changes from v0.1 to v0.2 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Christoph Hellwig	e080706190	remove odd and misleading comments from uio.h Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Dan Williams	1b0fac4587	dma-mapping: prevent dma dependent code from linking on !HAS_DMA archs Continuing the work started in `411f0f3edc` ... This enables code with a dma path, that compiles away, to build without requiring additional code factoring. It also prevents code that calls dma_alloc_coherent and dma_free_coherent from linking whereas previously the code would hit a BUG() at run time. Finally, it allows archs that set !HAS_DMA to delete their asm/dma-mapping.h file. Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: <geert@linux-m68k.org> Cc: <zippel@linux-m68k.org> Cc: <spyro@f2s.com> Cc: <ysato@users.sourceforge.jp> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
Stefan Richter	9e7bf24b1b	fs: clarify "dummy" member in struct inodes_stat_t Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:45 -07:00
David Howells	e8d6c55412	AFS: implement file locking Implement file locking for AFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:43 -07:00
Robert P. J. Day	0a3021f4e2	Remove unnecessary includes of spinlock.h under include/linux Remove the obviously unnecessary includes of <linux/spinlock.h> under the include/linux/ directory, and fix the couple errors that are introduced as a result of that. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
OGAWA Hirofumi	9aacd59934	fat: gcc 4.3 warning fix This patch fixes the following warnings. fs/fat/dir.c: In function 'fat_parse_long': include/linux/msdos_fs.h:294: warning: array subscript is above array bounds include/linux/msdos_fs.h:295: warning: array subscript is above array bounds include/linux/msdos_fs.h:295: warning: array subscript is above array bounds The ->name is defined as "name[8], ext[3]", but fat_checksum() uses those as name[11]. There is no actual problem, but it's not a good manner. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
Andrew Morton	c67ad917cb	percpu_counters(): use cpu notifiers per-cpu counters presently must iterate over all possible CPUs in the exhaustive percpu_counter_sum(). But it can be much better to only iterate over the presently-online CPUs. To do this, we must arrange for an offlined CPU's count to be spilled into the counter's central count. We can do this for all percpu_counters in the machine by linking them into a single global list and walking that list at CPU_DEAD time. (I hope. Might have race windows in which the percpu_counter_sum() count is inaccurate?) Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Andrew Morton	21f3da95da	fuse warning fix gcc-4.3: fs/fuse/dir.c: In function 'parse_dirfile': fs/fuse/dir.c:833: warning: cast from pointer to integer of different size fs/fuse/dir.c:835: warning: cast from pointer to integer of different size [miklos@szeredi.hu: use offsetof] Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Matthias Kaehlcke	9ac162521c	Use mutexes instead of semaphores in I2O driver The I2O driver uses two semaphores as mutexes. Use the mutex API instead of the (binary) semaphores. Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Alan Cox	9c1729db3e	Prevent an O_NDELAY writer from blocking when a tty write is blocked by the tty atomic writer mutex Without this a tty write could block if a previous blocking tty write was in progress on the same tty and blocked by a line discipline or hardware event. Originally found and reported by Dave Johnson. Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Tomas Janousek	924b42d5a2	Use boot based time for process start time and boot time in /proc Commit `411187fb05` caused boot time to move and process start times to become invalid after suspend. Using boot based time for those restores the old behaviour and fixes the issue. [akpm@linux-foundation.org: little cleanup] Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Tomas Janousek	7c3f1a5732	Introduce boot based time The commits `411187fb05` (GTOD: persistent clock support) `c1d370e167` (i386: use GTOD persistent clock support) changed the monotonic time so that it no longer jumps after resume, but it's not possible to use it for boot time and process start time calculations then. Also, the uptime no longer increases during suspend. I add a variable to track the wall_to_monotonic changes, a function to get the real boot time and a function to get the boot based time from the monotonic one. [akpm@linux-foundation.org: remove exports, add comment] Signed-off-by: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:41 -07:00
Zhang, Yanmin	ed4aaadb1a	fix jvc cdrom drive lockup Before calling init_hwif_default, ide_unregister gets lock ide_lock and disables irq. init_hwif_default calls ide_default_io_base which calls pci_get_device and later pci_get_subsys tries to apply for semaphore pci_bus_sem and goes to sleep. Mostly, pci_get_device should be called when irq is turned on. ide_default_io_base just needs find if list pci_devices is empty. Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Cc: Greg KH <greg@kroah.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:40 -07:00
Satyam Sharma	e1f4a88c5a	introduce write_trylock_irqsave() Introduce a write_trylock_irqsave() implementation. Similar in style to the implementation of spin_trylock_irqsave() in mainline. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: Sripathi Kodi <sripathik@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:40 -07:00
Alexey Dobriyan	786d7e1612	Fix rmmod/read/write races in /proc entries Fix following races: =========================================== 1. Write via ->write_proc sleeps in copy_from_user(). Module disappears meanwhile. Or, more generically, system call done on /proc file, method supplied by module is called, module dissapeares meanwhile. pde = create_proc_entry() if (!pde) return -ENOMEM; pde->write_proc = ... open write copy_from_user pde = create_proc_entry(); if (!pde) { remove_proc_entry(); return -ENOMEM; /* module unloaded / } boom* ========================================== 2. bogo-revoke aka proc_kill_inodes() remove_proc_entry vfs_read proc_kill_inodes [check ->f_op validness] [check ->f_op->read validness] [verify_area, security permissions checks] ->f_op = NULL; if (file->f_op->read) /* ->f_op dereference, boom */ NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's see how this scheme behaves, then extend if needed for directories. Directories creators in /proc only set ->owner for them, so proxying for directories may be unneeded. NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write, ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release. If your in-tree module uses something else, yell on me. Full audit pending. [akpm@linux-foundation.org: build fix] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:39 -07:00
Alan Cox	5568b0e802	v850: enable arbitary speed tty ioctls Adding the defines/macros activates the existing code in the tty layer and allows this platform to use the arbitary speed ioctl setting layer Signed-off-by: Alan Cox <alan@redhat.com> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:39 -07:00
Jeff Dike	e18eecb8b3	Add generic exit-time stack-depth checking to CONFIG_DEBUG_STACK_USAGE Add generic exit-time stack-depth checking to CONFIG_DEBUG_STACK_USAGE. This also adds UML support. Tested on UML and i386. [akpm@linux-foundation.org: cleanups, speedups, tweaks] Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:38 -07:00
Jeff Dike	84812217e3	uml: use get_free_pages to allocate kernel stacks For some reason, I was using kmalloc instead of get_free_pages for kernel stacks. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:38 -07:00
Alan Cox	e30afd5119	etrax: enable arbitary speed setting on tty ports Add the needed constants and bits. The actual code is already in the tty layer and turned on by the definitions Signed-off-by: Alan Cox <alan@redhat.com> Cc: Mikael Starvik <starvik@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:38 -07:00
Alan Cox	86245b8507	m32r: enable arbitary speed tty rate setting Add the defines and constants needed for the M32R platform to support the arbitary speed tty ioctls. Signed-off-by: Alan Cox <alan@redhat.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:38 -07:00
Alan Cox	787ea0ef70	ARM26: enable arbitary speed tty ioctls and split input/output speed Add the ioctls and values needed for this to the ARM26/ARM32 ports. The actual code has been in the base kernel for a while and automatically turns on when a port sets the required defines. Signed-off-by: Alan Cox <alan@redhat.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:38 -07:00
Ivan Kokshaysky	5c6af69abe	fix alpha ISA support isa_bus_to_virt() is still needed in a few places (lance.c, at least). When we switch the kernel to using -Werror-implicit-function-declaration, the lack of isa_bus_to_virt() breaks alpha allmodconfig builds. Add isa_bus_to_virt() and deprecate the ezisting ISA APIs, though it might be better to define these functions as BUG(), since virt_to_bus/bus_to_virt just do wrong things on a number of machines. [akpm@linux-foundation.org: build fix] Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:37 -07:00
Yoshinori Sato	2fea299f74	h8300 entry.S update Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:37 -07:00
Alan Cox	f79224ca27	h8300: enable arbitary speed tty port setup Add the needed constants and defines to activate the new tty code on this platform Signed-off-by: Alan Cox <alan@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:37 -07:00
David Howells	c1a39e0505	FRV: Connect up new syscalls Connect up new system calls. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:37 -07:00
Miklos Szeredi	0165ab4435	split mmap This is a straightforward split of do_mmap_pgoff() into two functions: - do_mmap_pgoff() checks the parameters, and calculates the vma flags. Then it calls - mmap_region(), which does the actual mapping Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:37 -07:00
Paul Mundt	6193a2ff18	slob: initial NUMA support This adds preliminary NUMA support to SLOB, primarily aimed at systems with small nodes (tested all the way down to a 128kB SRAM block), whether asymmetric or otherwise. We follow the same conventions as SLAB/SLUB, preferring current node placement for new pages, or with explicit placement, if a node has been specified. Presently on UP NUMA this has the side-effect of preferring node#0 allocations (since numa_node_id() == 0, though this could be reworked if we could hand off a pfn to determine node placement), so single-CPU NUMA systems will want to place smaller nodes further out in terms of node id. Once a page has been bound to a node (via explicit node id typing), we only do block allocations from partial free pages that have a matching node id in the page flags. The current implementation does have some scalability problems, in that all partial free pages are tracked in the global freelist (with contention due to the single spinlock). However, these are things that are being reworked for SMP scalability first, while things like per-node freelists can easily be built on top of this sort of functionality once it's been added. More background can be found in: http://marc.info/?l=linux-mm&m=118117916022379&w=2 http://marc.info/?l=linux-mm&m=118170446306199&w=2 http://marc.info/?l=linux-mm&m=118187859420048&w=2 and subsequent threads. Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:36 -07:00
Jan Beulich	8f0accc862	kill vmalloc_earlyreserve This symbol got orphaned quite a while ago. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:36 -07:00
Jan Beulich	45e98cdb6d	page table handling cleanup Kill pte_rdprotect(), pte_exprotect(), pte_mkread(), pte_mkexec(), pte_read(), pte_exec(), and pte_user() except where arch-specific code is making use of them. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:36 -07:00
Andrew Morton	fc9a07e7bf	invalidate_mapping_pages(): add cond_resched invalidate_mapping_pages() can sometimes take a long time (millions of pages to free). Long enough for the softlockup detector to trigger. We used to have a cond_resched() in there but I took it out because the drop_caches code calls invalidate_mapping_pages() under inode_lock. The patch adds a nasty flag and puts the cond_resched() back. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:36 -07:00
Robert P. J. Day	698827fa9f	Remove the deprecated "kmem_cache_t" typedef from slab.h. Given that there is no remaining usage of the deprecated kmem_cache_t typedef anywhere in the tree, remove that typedef. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:35 -07:00
KAMEZAWA Hiroyuki	f0c0b2b808	change zonelist order: zonelist order selection logic Make zonelist creation policy selectable from sysctl/boot option v6. This patch makes NUMA's zonelist (of pgdat) order selectable. Available order are Default(automatic)/ Node-based / Zone-based. [Default Order] The kernel selects Node-based or Zone-based order automatically. [Node-based Order] This policy treats the locality of memory as the most important parameter. Zonelist order is created by each zone's locality. This means lower zones (ex. ZONE_DMA) can be used before higher zone (ex. ZONE_NORMAL) exhausion. IOW. ZONE_DMA will be in the middle of zonelist. current 2.6.21 kernel uses this. Pros. * A user can expect local memory as much as possible. Cons. * lower zone will be exhansted before higher zone. This may cause OOM_KILL. Maybe suitable if ZONE_DMA is relatively big and you never see OOM_KILL because of ZONE_DMA exhaution and you need the best locality. (example) assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL. node(0)'s memory allocation order: node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL. node(1)'s memory allocation order: node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA. [Zone-based order] This policy treats the zone type as the most important parameter. Zonelist order is created by zone-type order. This means lower zone never be used bofere higher zone exhaustion. IOW. ZONE_DMA will be always at the tail of zonelist. Pros. * OOM_KILL(bacause of lower zone) occurs only if the whole zones are exhausted. Cons. * memory locality may not be best. (example) assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL. node(0)'s memory allocation order: node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA. node(1)'s memory allocation order: node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA. bootoption "numa_zonelist_order=" and proc/sysctl is supporetd. command: %echo N > /proc/sys/vm/numa_zonelist_order Will rebuild zonelist in Node-based order. command: %echo Z > /proc/sys/vm/numa_zonelist_order Will rebuild zonelist in Zone-based order. Thanks to Lee Schermerhorn, he gives me much help and codes. [Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order] [akpm@linux-foundation.org: build fix] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: "jesse.barnes@intel.com" <jesse.barnes@intel.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:35 -07:00
Yinghai Lu	18a8bd949d	serial: convert early_uart to earlycon for 8250 Beacuse SERIAL_PORT_DFNS is removed from include/asm-i386/serial.h and include/asm-x86_64/serial.h. the serial8250_ports need to be probed late in serial initializing stage. the console_init=>serial8250_console_init=> register_console=>serial8250_console_setup will return -ENDEV, and console ttyS0 can not be enabled at that time. need to wait till uart_add_one_port in drivers/serial/serial_core.c to call register_console to get console ttyS0. that is too late. Make early_uart to use early_param, so uart console can be used earlier. Make it to be bootconsole with CON_BOOT flag, so can use console handover feature. and it will switch to corresponding normal serial console automatically. new command line will be: console=uart8250,io,0x3f8,9600n8 console=uart8250,mmio,0xff5e0000,115200n8 or earlycon=uart8250,io,0x3f8,9600n8 earlycon=uart8250,mmio,0xff5e0000,115200n8 it will print in very early stage: Early serial console at I/O port 0x3f8 (options '9600n8') console [uart0] enabled later for console it will print: console handover: boot [uart0] -> real [ttyS0] Signed-off-by: <yinghai.lu@sun.com> Cc: Andi Kleen <ak@suse.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:35 -07:00
Kristian Hoegsberg	23936cc0b5	lib: add idr_remove_all Remove all ids from the given idr tree. idr_destroy() only frees up unused, cached idp_layers, but this function will remove all id mappings and leave all idp_layers unused. A typical clean-up sequence for objects stored in an idr tree, will use idr_for_each() to free all objects, if necessay, then idr_remove_all() to remove all ids, and idr_destroy() to free up the cached idr_layers. Signed-off-by: Kristian Hoegsberg <krh@redhat.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:34 -07:00
Kristian Hoegsberg	96d7fa421e	lib: add idr_for_each() This patch adds an iterator function for the idr data structure. Compared to just iterating through the idr with an integer and idr_find, this iterator is (almost, but not quite) linear in the number of elements, as opposed to the number of integers in the range covered by the idr. This makes a difference for sparse idrs, but more importantly, it's a nicer way to iterate through the elements. The drm subsystem is moving to idr for tracking contexts and drawables, and with this change, we can use the idr exclusively for tracking these resources. [akpm@linux-foundation.org: fix comment] Signed-off-by: Kristian Hoegsberg <krh@redhat.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:34 -07:00
Nitin Gupta	f2a11b158a	LZO1X: fix lzo1x_worst_compress This is a correction for a macro which gives worst case compressed data size by LZO1X. This patch was provided by the LZO author (Markus Oberhumer). Signed-off-by: Nitin Gupta <nitingupta910@gmail.com> Cc: "Markus F.X.J. Oberhumer" <markus@oberhumer.com> Cc: "Richard Purdie" <rpurdie@openedhand.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:34 -07:00
Adrian Bunk	56a68a500f	more ACSI removal This patch removes some code that became dead code after the ATARI_ACSI removal. It also indirectly fixes the following bug introduced by commit `c2bcf3b897`: config ATARI_SLM tristate "Atari SLM laser printer support" - depends on ATARI && ATARI_ACSI!=n + depends on ATARI Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 15:02:47 +02:00
David S. Miller	e0204409df	[SPARC64]: dr-cpu unconfigure support. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:05:32 -07:00
David S. Miller	8b99cfb8cc	[SPARC64]: More sensible udelay implementation. Take a page from the powerpc folks and just calculate the delay factor directly. Since frequency scaling chips use a system-tick register, the value is going to be the same system-wide. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:05:02 -07:00
David S. Miller	b14f5c100c	[SPARC64]: Fix build regressions added by dr-cpu changes. Do not select HOTPLUG_CPU from SUN_LDOMS, that causes HOTPLUG_CPU to be selected even on non-SMP which is illegal. Only build hvtramp.o when SMP, just like trampoline.o Protect dr-cpu code in ds.c with HOTPLUG_CPU. Likewise move ldom_startcpu_cpuid() to smp.c and protect it and the call site with SUN_LDOMS && HOTPLUG_CPU. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:04:49 -07:00
David S. Miller	4f0234f4f9	[SPARC64]: Initial LDOM cpu hotplug support. Only adding cpus is supports at the moment, removal will come next. When new cpus are configured, the machine description is updated. When we get the configure request we pass in a cpu mask of to-be-added cpus to the mdesc CPU node parser so it only fetches information for those cpus. That code also proceeds to update the SMT/multi-core scheduling bitmaps. cpu_up() does all the work and we return the status back over the DS channel. CPUs via dr-cpu need to be booted straight out of the hypervisor, and this requires: 1) A new trampoline mechanism. CPUs are booted straight out of the hypervisor with MMU disabled and running in physical addresses with no mappings installed in the TLB. The new hvtramp.S code sets up the critical cpu state, installs the locked TLB mappings for the kernel, and turns the MMU on. It then proceeds to follow the logic of the existing trampoline.S SMP cpu bringup code. 2) All calls into OBP have to be disallowed when domaining is enabled. Since cpus boot straight into the kernel from the hypervisor, OBP has no state about that cpu and therefore cannot handle being invoked on that cpu. Luckily it's only a handful of interfaces which can be called after the OBP device tree is obtained. For example, rebooting, halting, powering-off, and setting options node variables. CPU removal support will require some infrastructure changes here. Namely we'll have to process the requests via a true kernel thread instead of in a workqueue. workqueues run on a per-cpu thread, but when unconfiguring we might need to force the thread to execute on another cpu if the current cpu is the one being removed. Removal of a cpu also causes the kernel to destroy that cpu's workqueue running thread. Another issue on removal is that we may have interrupts still pointing to the cpu-to-be-removed. So new code will be needed to walk the active INO list and retarget those cpus as-needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:04:40 -07:00
David S. Miller	b3e13fbeb9	[SPARC64]: Fix setting of variables in LDOM guest. There is a special domain services capability for setting variables in the OBP options node. Guests don't have permanent store for the OBP variables like a normal system, so they are instead maintained in the LDOM control node or in the SC. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:04:36 -07:00
David S. Miller	83292e0a9c	[SPARC64]: Fix MD property lifetime bugs. Property values cannot be referenced outside of mdesc_grab()/mdesc_release() pairs. The only major offender was the VIO bus layer, easily fixed. Add some commentary to mdesc.h describing these rules. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:04:33 -07:00
David S. Miller	43fdf27470	[SPARC64]: Abstract out mdesc accesses for better MD update handling. Since we have to be able to handle MD updates, having an in-tree set of data structures representing the MD objects actually makes things more painful. The MD itself is easy to parse, and we can implement the existing interfaces using direct parsing of the MD binary image. The MD is now reference counted, so accesses have to now take the form: handle = mdesc_grab(); ... operations on MD ... mdesc_release(handle); The only remaining issue are cases where code holds on to references to MD property values. mdesc_get_property() returns a direct pointer to the property value, most cases just pull in the information they need and discard the pointer, but there are few that use the pointer directly over a long lifetime. Those will be fixed up in a subsequent changeset. A preliminary handler for MD update events from domain services is there, it is rudimentry but it works and handles all of the reference counting. It does not check the generation number of the MDs, and it does not generate a "add/delete" list for notification to interesting parties about MD changes but that will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:04:28 -07:00
David S. Miller	133f09a169	[SPARC64]: Use more mearningful names for IRQ registry. All of the interrupts say "LDX RX" and "LDX TX" currently which is next to useless. Put a device specific prefix before "RX" and "TX" instead which makes it much more useful. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:04:24 -07:00
David S. Miller	13077d8028	[SPARC64]: Export powerd facilities for external entities. Besides the existing usage for power-button interrupts, we'll want to make use of this code for domain-services where the LDOM manager can send reboot requests to the guest node. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:04:16 -07:00
David S. Miller	cb48123584	[SPARC64]: Assorted LDC bug cures. 1) LDC_MODE_RELIABLE is deprecated an unused by anything, plus it and LDC_MODE_STREAM were mis-numbered. 2) read_stream() should try to read as much as possible into the per-LDC stream buffer area, so do not trim the read_nonraw() length by the caller's size parameter. 3) Send data ACKs when necessary in read_nonraw(). 4) In read_nonraw() when we get a pure ACK, advance the RX head unconditionally past it. 5) Provide the ACKID field in the ldcdgb() packet dump in read_nonraw(). This helps debugging stream mode LDC channel problems. 6) Decrease verbosity of rx_data_wait() so that it is more useful. A debugging message each loop iteration is too much. 7) In process_data_ack() stop the loop checking when we hit lp->tx_tail not lp->tx_head. 8) Set the seqid field properly in send_data_nack(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:04:09 -07:00
David S. Miller	e53e97ce3c	[SPARC64]: Add LDOM virtual channel driver and VIO device layer. Virtual devices on Sun Logical Domains are built on top of a virtual channel framework. This, with help of hypervisor interfaces, provides a link layer protocol with basic handshaking over which virtual device clients and servers communicate. Built on top of this is a VIO device protocol which has it's own handshaking and message types. At this layer attributes are exchanged (disk size, network device addresses, etc.) descriptor rings are registered, and data transfers are triggers and replied to. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-16 04:03:18 -07:00

1 2 3 4 5 ...

14851 commits