kernel-fxtec-pro1x

Author	SHA1	Message	Date
Magnus Damm	386d9a7edd	[PATCH] elf: Always define elf_addr_t in linux/elf.h Define elf_addr_t in linux/elf.h. The size of the type is determined using ELF_CLASS. This allows us to remove the defines that today are spread all over .c and .h files. Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Cc: Daniel Jacobowitz <drow@false.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:38 -08:00
suzuki	651971cb72	[PATCH] Fix the size limit of compat space msgsize Currently we allocate 64k space on the user stack and use it the msgbuf for sys_{msgrcv,msgsnd} for compat and the results are later copied in user [ by copy_in_user]. This patch introduces helper routines for sys_{msgrcv,msgsnd} as below: do_msgsnd() : Accepts the mtype and user space ptr to the buffer along with the msqid and msgflg. do_msgrcv() : Accepts a kernel space ptr to mtype and a userspace ptr to the buffer. The mtype has to be copied back the user space msgbuf by the caller. These changes avoid the need to allocate the msgsize on the userspace ( thus removing the size limt ) and the overhead of an extra copy_in_user(). Signed-off-by: Suzuki K P <suzuki@in.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:38 -08:00
Thomas Gleixner	cfd1893477	[PATCH] ktime: Fix signed / unsigned mismatch in ktime_to_ns The 32 bit implementation of ktime_to_ns returns unsigned value, while the 64 bit version correctly returns an signed value. There is no current user affected by this, but it has to be fixed, as ktime values can be negative. Pointed-out-by: Helmut Duregger <Helmut.Duregger@student.uibk.ac.at> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:37 -08:00
Andrew Morton	0490366432	[PATCH] remove HASH_HIGHMEM It has no users and it's doubtful that we'll need it again. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:37 -08:00
Peter Zijlstra	d5abe66917	[PATCH] debug: workqueue locking sanity Workqueue functions should not leak locks, assert so, printing the last function ran. Use macros in lockdep.h to avoid include dependency pains. [akpm@osdl.org: build fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:36 -08:00
Ingo Molnar	ece8a684c7	[PATCH] sleep profiling Implement prof=sleep profiling. TASK_UNINTERRUPTIBLE sleeps will be taken as a profile hit, and every millisecond spent sleeping causes a profile-hit for the call site that initiated the sleep. Sample readprofile output on i386: 306 ps2_sendbyte 1.3973 432 call_usermodehelper_keys 1.9548 484 ps2_command 0.6453 790 __driver_attach 4.7879 1593 msleep 44.2500 3976 sync_buffer 64.1290 4076 do_lookup 12.4648 8587 sync_page 122.6714 20820 total 0.0067 (NOTE: architectures need to check whether get_wchan() can be called from deep within the wakeup path.) akpm: we need to mark more functions __sched. lock_sock(), msleep(), others.. akpm: the contention in do_lookup() is a surprise. Presumably doing disk reads for directory contents while holding i_mutex. [akpm@osdl.org: various fixes] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:36 -08:00
Peter Zijlstra	6cfd76a26d	[PATCH] lockdep: name some old style locks Name some of the remaning 'old_style_spin_init' locks Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:36 -08:00
Andrew Morton	8984d137df	[PATCH] ext4: uninline large functions Saves nearly 4kbytes on x86. Cc: Arnaldo Carvalho de Melo <acme@mandriva.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:35 -08:00
Andrew Morton	3a229b39eb	[PATCH] ext3: uninline large functions Saves nearly 4kbytes on x86. Cc: Arnaldo Carvalho de Melo <acme@mandriva.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:35 -08:00
Paul B Schroeder	e0980dafa3	[PATCH] Exar quad port serial This is on our "Envoy" boxes which we have, according to the documentation, an "Exar ST16C554/554D Quad UART with 16-byte Fifo's". The box also has two other "on-board" serial ports and a modem chip. The two on-board serial UARTs were being detected along with the first two Exar UARTs. The last two Exar UARTs were not showing up and neither was the modem. This patch was the only way I could the kernel to see beyond the standard four serial ports and get all four of the Exar UARTs to show up. [akpm@osdl.org: build fix] Signed-off-by: Paul B Schroeder <pschroeder@uplogix.com> Cc: Lennart Sorensen <lsorense@csclub.uwaterloo.ca> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:35 -08:00
Alexey Dobriyan	9774a1f54f	[PATCH] Compile-time check re world-writeable module params One of the mistakes a module_param() user can make is to supply default value of module parameter as the last argument. module_param() accepts permissions instead. If default value is, say, 3 (-------wx), parameter becomes world-writeable. So far, the only remedy was to apply grep(1) and read drivers submitted to -mm. BTDT. With this patch applied, compiler will finally do some job. ) bounds checking on permissions ) world-writeable bit checking on permissions *) compile breakage if checks trigger First version of this check (only "& 2" part) directly caught 4 out of 7 places during my last grep. Subject: Neverending module_param() bugs [X] drivers/acpi/sbs.c:101:module_param(capacity_mode, int, CAPACITY_UNIT); [X] drivers/acpi/sbs.c:102:module_param(update_mode, int, UPDATE_MODE); [ ] drivers/acpi/sbs.c:103:module_param(update_info_mode, int, UPDATE_INFO_MODE); [ ] drivers/acpi/sbs.c:104:module_param(update_time, int, UPDATE_TIME); [ ] drivers/acpi/sbs.c:105:module_param(update_time2, int, UPDATE_TIME2); [X] drivers/char/watchdog/sbc8360.c:203:module_param(timeout, int, 27); [X] drivers/media/video/tuner-simple.c:13:module_param(offset, int, 0666); Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:34 -08:00
Oleg Nesterov	34ec12349c	[PATCH] taskstats: cleanup ->signal->stats allocation Allocate ->signal->stats on demand in taskstats_exit(), this allows us to remove taskstats_tgid_alloc() (the last non-trivial inline) from taskstat's public interface. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:34 -08:00
Oleg Nesterov	115085ea07	[PATCH] taskstats: cleanup do_exit() path do_exit: taskstats_exit_alloc() ... taskstats_exit_send() taskstats_exit_free() I think this is not good, let it be a single function exported to the core kernel, taskstats_exit(), which does alloc + send + free itself. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:34 -08:00
Andrew Morton	20aa7b21b1	[PATCH] probe_kernel_address() needs to do set_fs() probe_kernel_address() purports to be generic, only it forgot to select KERNEL_DS, so it presently won't work right on all architectures. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:34 -08:00
Ryan Underwood	c140e11001	[PATCH] parport_pc: Add support for OX16PCI952 parallel port Add support for the parallel port (implemented as separate PCI function) on the Oxford Semiconductor OX16PCI952. Signed-off-by: Ryan Underwood <nemesis@icequake.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:34 -08:00
Jan Engelhardt	5ec68b2e31	[PATCH] pull in necessary header files for cdev.h linux/cdev.h uses struct kobject and other structs and should therefore include them. Currently, a module either needs to add the missing includes itself, or, in case a module includes other headers already, needs to put <linux/cdev.h> last, which goes against a alphabetically-sorted include list. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:32 -08:00
Ingo Molnar	e59e2ae2c2	[PATCH] SysRq-X: show blocked tasks Add SysRq-X support: show blocked (TASK_UNINTERRUPTIBLE) tasks only. Useful for debugging IO stalls. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:32 -08:00
Miklos Szeredi	0ec7ca41f6	[PATCH] fuse: add DESTROY operation Add a DESTROY operation for block device based filesystems. With the help of this operation, such a filesystem can flush dirty data to the device synchronously before the umount returns. This is needed in situations where the filesystem is assumed to be clean immediately after unmount (e.g. ejecting removable media). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:32 -08:00
Miklos Szeredi	b2d2272fae	[PATCH] fuse: add bmap support Add support for the BMAP operation for block device based filesystems. This is needed to support swap-files and lilo. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:32 -08:00
Miklos Szeredi	e9168c189f	[PATCH] fuse: update userspace interface to version 7.8 Add a flag to the RELEASE message which specifies that a FLUSH operation should be performed as well. This interface update is needed for the FreeBSD port, and doesn't actually touch the Linux implementation at all. Also rename the unused 'flush_flags' in the FLUSH message to 'unused'. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:31 -08:00
Jan Engelhardt	48ed214d10	[PATCH] constify inode accessors Change the signature of i_size_read(), IMINOR() and IMAJOR() because they, or the functions they call, will never modify the argument. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:31 -08:00
Peter Korsgaard	238b8721a5	[PATCH] serial uartlite driver Add a driver for the Xilinx uartlite serial controller used in boards with the PPC405 core in the Xilinx V2P/V4 fpgas. The hardware is very simple (baudrate/start/stopbits fixed and no break support). See the datasheet for details: http://www.xilinx.com/bvdocs/ipcenter/data_sheet/opb_uartlite.pdf See http://thread.gmane.org/gmane.linux.serial/1237/ for the email thread. Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Olof Johansson <olof@lixom.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:30 -08:00
Mike Miller	799202cbd0	[PATCH] cciss: add support for 1024 logical volumes Add the support for a large number of logical volumes. We will soon have hardware that support up to 1024 logical volumes. Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:30 -08:00
Rafael J. Wysocki	341a595850	[PATCH] Support for freezeable workqueues Make it possible to create a workqueue the worker thread of which will be frozen during suspend, along with other kernel threads. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: David Chinner <dgc@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:29 -08:00
Rafael J. Wysocki	a9b6f562f1	[PATCH] swsusp: Untangle thaw_processes Move the loop from thaw_processes() to a separate function and call it independently for kernel threads and user space processes so that the order of thawing tasks is clearly visible. Drop thaw_kernel_threads() which is never used. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:28 -08:00
Nigel Cunningham	ff39593ad0	[PATCH] swsusp: thaw userspace and kernel space separately Modify process thawing so that we can thaw kernel space without thawing userspace, and thaw kernelspace first. This will be useful in later patches, where I intend to get swsusp thawing kernel threads only before seeking to free memory. Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:28 -08:00
Nigel Cunningham	7dfb71030f	[PATCH] Add include/linux/freezer.h and move definitions from sched.h Move process freezing functions from include/linux/sched.h to freezer.h, so that modifications to the freezer or the kernel configuration don't require recompiling just about everything. [akpm@osdl.org: fix ueagle driver] Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:27 -08:00
Rafael J. Wysocki	8357376d3d	[PATCH] swsusp: Improve handling of highmem Currently swsusp saves the contents of highmem pages by copying them to the normal zone which is quite inefficient (eg. it requires two normal pages to be used for saving one highmem page). This may be improved by using highmem for saving the contents of saveable highmem pages. Namely, during the suspend phase of the suspend-resume cycle we try to allocate as many free highmem pages as there are saveable highmem pages. If there are not enough highmem image pages to store the contents of all of the saveable highmem pages, some of them will be stored in the "normal" memory. Next, we allocate as many free "normal" pages as needed to store the (remaining) image data. We use a memory bitmap to mark the allocated free pages (ie. highmem as well as "normal" image pages). Now, we use another memory bitmap to mark all of the saveable pages (highmem as well as "normal") and the contents of the saveable pages are copied into the image pages. Then, the second bitmap is used to save the pfns corresponding to the saveable pages and the first one is used to save their data. During the resume phase the pfns of the pages that were saveable during the suspend are loaded from the image and used to mark the "unsafe" page frames. Next, we try to allocate as many free highmem page frames as to load all of the image data that had been in the highmem before the suspend and we allocate so many free "normal" page frames that the total number of allocated free pages (highmem and "normal") is equal to the size of the image. While doing this we have to make sure that there will be some extra free "normal" and "safe" page frames for two lists of PBEs constructed later. Now, the image data are loaded, if possible, into their "original" page frames. The image data that cannot be written into their "original" page frames are loaded into "safe" page frames and their "original" kernel virtual addresses, as well as the addresses of the "safe" pages containing their copies, are stored in one of two lists of PBEs. One list of PBEs is for the copies of "normal" suspend pages (ie. "normal" pages that were saveable during the suspend) and it is used in the same way as previously (ie. by the architecture-dependent parts of swsusp). The other list of PBEs is for the copies of highmem suspend pages. The pages in this list are restored (in a reversible way) right before the arch-dependent code is called. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:27 -08:00
Rafael J. Wysocki	3aef83e0ef	[PATCH] swsusp: use block device offsets to identify swap locations Make swsusp use block device offsets instead of swap offsets to identify swap locations and make it use the same code paths for writing as well as for reading data. This allows us to use the same code for handling swap files and swap partitions and to simplify the code, eg. by dropping rw_swap_page_sync(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:27 -08:00
Rafael J. Wysocki	915bae9ebe	[PATCH] swsusp: use partition device and offset to identify swap areas The Linux kernel handles swap files almost in the same way as it handles swap partitions and there are only two differences between these two types of swap areas: (1) swap files need not be contiguous, (2) the header of a swap file is not in the first block of the partition that holds it. From the swsusp's point of view (1) is not a problem, because it is already taken care of by the swap-handling code, but (2) has to be taken into consideration. In principle the location of a swap file's header may be determined with the help of appropriate filesystem driver. Unfortunately, however, it requires the filesystem holding the swap file to be mounted, and if this filesystem is journaled, it cannot be mounted during a resume from disk. For this reason we need some other means by which swap areas can be identified. For example, to identify a swap area we can use the partition that holds the area and the offset from the beginning of this partition at which the swap header is located. The following patch allows swsusp to identify swap areas this way. It changes swap_type_of() so that it takes an additional argument representing an offset of the swap header within the partition represented by its first argument. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:27 -08:00
Nick Piggin	7cf9c2c76c	[PATCH] radix-tree: RCU lockless readside Make radix tree lookups safe to be performed without locks. Readers are protected against nodes being deleted by using RCU based freeing. Readers are protected against new node insertion by using memory barriers to ensure the node itself will be properly written before it is visible in the radix tree. Each radix tree node keeps a record of their height (above leaf nodes). This height does not change after insertion -- when the radix tree is extended, higher nodes are only inserted in the top. So a lookup can take the pointer to what is now the root node, and traverse down it even if the tree is concurrently extended and this node becomes a subtree of a new root. "Direct" pointers (tree height of 0, where root->rnode points directly to the data item) are handled by using the low bit of the pointer to signal whether rnode is a direct pointer or a pointer to a radix tree node. When a reader wants to traverse the next branch, they will take a copy of the pointer. This pointer will be either NULL (and the branch is empty) or non-NULL (and will point to a valid node). [akpm@osdl.org: cleanups] [Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications] [clameter@sgi.com: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Arnaldo Carvalho de Melo	36de643786	[PATCH] Save some bytes in struct mm_struct Before: [acme@newtoy net-2.6.20]$ pahole --cacheline 32 kernel/sched.o mm_struct /* include2/asm/processor.h:542 / struct mm_struct { struct vm_area_struct mmap; /* 0 4 / struct rb_root mm_rb; / 4 4 / struct vm_area_struct mmap_cache; /* 8 4 / long unsigned int (get_unmapped_area)(); /* 12 4 / void (unmap_area)(); /* 16 4 / long unsigned int mmap_base; / 20 4 / long unsigned int task_size; / 24 4 / long unsigned int cached_hole_size; / 28 4 / / ---------- cacheline 1 boundary ---------- / long unsigned int free_area_cache; / 32 4 / pgd_t pgd; /* 36 4 / atomic_t mm_users; / 40 4 / atomic_t mm_count; / 44 4 / int map_count; / 48 4 / struct rw_semaphore mmap_sem; / 52 64 / spinlock_t page_table_lock; / 116 40 / struct list_head mmlist; / 156 8 / mm_counter_t _file_rss; / 164 4 / mm_counter_t _anon_rss; / 168 4 / long unsigned int hiwater_rss; / 172 4 / long unsigned int hiwater_vm; / 176 4 / long unsigned int total_vm; / 180 4 / long unsigned int locked_vm; / 184 4 / long unsigned int shared_vm; / 188 4 / / ---------- cacheline 6 boundary ---------- / long unsigned int exec_vm; / 192 4 / long unsigned int stack_vm; / 196 4 / long unsigned int reserved_vm; / 200 4 / long unsigned int def_flags; / 204 4 / long unsigned int nr_ptes; / 208 4 / long unsigned int start_code; / 212 4 / long unsigned int end_code; / 216 4 / long unsigned int start_data; / 220 4 / / ---------- cacheline 7 boundary ---------- / long unsigned int end_data; / 224 4 / long unsigned int start_brk; / 228 4 / long unsigned int brk; / 232 4 / long unsigned int start_stack; / 236 4 / long unsigned int arg_start; / 240 4 / long unsigned int arg_end; / 244 4 / long unsigned int env_start; / 248 4 / long unsigned int env_end; / 252 4 / / ---------- cacheline 8 boundary ---------- / long unsigned int saved_auxv[44]; / 256 176 / unsigned int dumpable:2; / 432 4 / cpumask_t cpu_vm_mask; / 436 4 / mm_context_t context; / 440 68 / long unsigned int swap_token_time; / 508 4 / / ---------- cacheline 16 boundary ---------- / char recent_pagein; / 512 1 / / XXX 3 bytes hole, try to pack / int core_waiters; / 516 4 / struct completion core_startup_done; /* 520 4 / struct completion core_done; / 524 52 / rwlock_t ioctx_list_lock; / 576 36 / struct kioctx ioctx_list; /* 612 4 / }; / size: 616, sum members: 613, holes: 1, sum holes: 3, cachelines: 20, last cacheline: 8 bytes / After: [acme@newtoy net-2.6.20]$ pahole --cacheline 32 kernel/sched.o mm_struct / include2/asm/processor.h:542 / struct mm_struct { struct vm_area_struct mmap; /* 0 4 / struct rb_root mm_rb; / 4 4 / struct vm_area_struct mmap_cache; /* 8 4 / long unsigned int (get_unmapped_area)(); /* 12 4 / void (unmap_area)(); /* 16 4 / long unsigned int mmap_base; / 20 4 / long unsigned int task_size; / 24 4 / long unsigned int cached_hole_size; / 28 4 / / ---------- cacheline 1 boundary ---------- / long unsigned int free_area_cache; / 32 4 / pgd_t pgd; /* 36 4 / atomic_t mm_users; / 40 4 / atomic_t mm_count; / 44 4 / int map_count; / 48 4 / struct rw_semaphore mmap_sem; / 52 64 / spinlock_t page_table_lock; / 116 40 / struct list_head mmlist; / 156 8 / mm_counter_t _file_rss; / 164 4 / mm_counter_t _anon_rss; / 168 4 / long unsigned int hiwater_rss; / 172 4 / long unsigned int hiwater_vm; / 176 4 / long unsigned int total_vm; / 180 4 / long unsigned int locked_vm; / 184 4 / long unsigned int shared_vm; / 188 4 / / ---------- cacheline 6 boundary ---------- / long unsigned int exec_vm; / 192 4 / long unsigned int stack_vm; / 196 4 / long unsigned int reserved_vm; / 200 4 / long unsigned int def_flags; / 204 4 / long unsigned int nr_ptes; / 208 4 / long unsigned int start_code; / 212 4 / long unsigned int end_code; / 216 4 / long unsigned int start_data; / 220 4 / / ---------- cacheline 7 boundary ---------- / long unsigned int end_data; / 224 4 / long unsigned int start_brk; / 228 4 / long unsigned int brk; / 232 4 / long unsigned int start_stack; / 236 4 / long unsigned int arg_start; / 240 4 / long unsigned int arg_end; / 244 4 / long unsigned int env_start; / 248 4 / long unsigned int env_end; / 252 4 / / ---------- cacheline 8 boundary ---------- / long unsigned int saved_auxv[44]; / 256 176 / cpumask_t cpu_vm_mask; / 432 4 / mm_context_t context; / 436 68 / long unsigned int swap_token_time; / 504 4 / char recent_pagein; / 508 1 / unsigned char dumpable:2; / 509 1 / / XXX 2 bytes hole, try to pack / int core_waiters; / 512 4 / struct completion core_startup_done; /* 516 4 / struct completion core_done; / 520 52 / rwlock_t ioctx_list_lock; / 572 36 / struct kioctx ioctx_list; /* 608 4 / }; / size: 612, sum members: 610, holes: 1, sum holes: 2, cachelines: 20, last cacheline: 4 bytes / [acme@newtoy net-2.6.20]$ codiff -V /tmp/sched.o.before kernel/sched.o /pub/scm/linux/kernel/git/acme/net-2.6.20/kernel/sched.c: struct mm_struct \| -4 dumpable:2; from: unsigned int / 432(30) 4(2) / to: unsigned char / 509(6) 1(2) */ < SNIP other offset changes > 1 struct changed [acme@newtoy net-2.6.20]$ I'm not aware of any problem about using 2 byte wide bitfields where previously a 4 byte wide one was, holler if there is any, I wouldn't be surprised, bitfields are things from hell. For the curious, 432(30) means: at offset 432 from the struct start, at offset 30 in the bitfield (yeah, it comes backwards, hellish, huh?) ditto for 509(6), while 4(2) and 1(2) means "struct field size(bitfield size)". Now we have a 2 bytes hole and are using only 4 bytes of the last 32 bytes cacheline, any takers? :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Andy Whitcroft	33f2ef89f8	[PATCH] mm: make compound page destructor handling explicit Currently we we use the lru head link of the second page of a compound page to hold its destructor. This was ok when it was purely an internal implmentation detail. However, hugetlbfs overrides this destructor violating the layering. Abstract this out as explicit calls, also introduce a type for the callback function allowing them to be type checked. For each callback we pre-declare the function, causing a type error on definition rather than on use elsewhere. [akpm@osdl.org: cleanups] Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Andrew Morton	1b1cec4bbc	[PATCH] slab: deprecate kmem_cache_t Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Lameter	441e143e95	[PATCH] slab: remove SLAB_DMA SLAB_DMA is an alias of GFP_DMA. This is the last one so we remove the leftover comment too. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
Christoph Lameter	e94b176609	[PATCH] slab: remove SLAB_KERNEL SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
Christoph Lameter	54e6ecb239	[PATCH] slab: remove SLAB_ATOMIC SLAB_ATOMIC is an alias of GFP_ATOMIC Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
Christoph Lameter	f7267c0c07	[PATCH] slab: remove SLAB_USER SLAB_USER is an alias of GFP_USER Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
Christoph Lameter	e6b4f8da3a	[PATCH] slab: remove SLAB_NOFS SLAB_NOFS is an alias of GFP_NOFS. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Christoph Lameter	55acbda096	[PATCH] slab: remove SLAB_NOIO SLAB_NOIO is an alias of GFP_NOIO with a single instance of use. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Christoph Lameter	a06d72c1dc	[PATCH] slab: remove SLAB_LEVEL_MASK SLAB_LEVEL_MASK is only used internally to the slab and is and alias of GFP_LEVEL_MASK. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Christoph Lameter	6e0eaa4b05	[PATCH] slab: remove SLAB_NO_GROW It is only used internally in the slab. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Andy Whitcroft	25ba77c141	[PATCH] numa node ids are int, page_to_nid and zone_to_nid should return int NUMA node ids are passed as either int or unsigned int almost exclusivly page_to_nid and zone_to_nid both return unsigned long. This is a throw back to when page_to_nid was a #define and was thus exposing the real type of the page flags field. In addition to fixing up the definitions of page_to_nid and zone_to_nid I audited the users of these functions identifying the following incorrect uses: 1) mm/page_alloc.c show_node() -- printk dumping the node id, 2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison against numa_node_id() which returns an int from cpu_to_node(), and 3) mm/mpolicy.c check_pte_range -- used as an index in node_isset which uses bit_set which in generic code takes an int. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Christoph Lameter	ebe29738f3	[PATCH] Remove uses of kmem_cache_t from mm/* and include/linux/slab.h Remove all uses of kmem_cache_t (the most were left in slab.h). The typedef for kmem_cache_t is then only necessary for other kernel subsystems. Add a comment to that effect. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Christoph Lameter	b86c089b83	[PATCH] Move names_cachep to linux/fs.h The names_cachep is used for getname() and putname(). So lets put it into fs.h near those two definitions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Christoph Lameter	aa362a83e7	[PATCH] Move fs_cachep to linux/fs_struct.h fs_cachep is only used in kernel/exit.c and in kernel/fork.c. It is used to store fs_struct items so it should be placed in linux/fs_struct.h Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Christoph Lameter	8b7d91eb7f	[PATCH] Move filep_cachep to include/file.h filp_cachep is only used in fs/file_table.c and in fs/dcache.c where it is defined. Move it to related definitions in linux/file.h. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Christoph Lameter	5d6538fcf2	[PATCH] Move files_cachep to include/file.h Proper place is in file.h since files_cachep uses are rated to file I/O. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Christoph Lameter	c43692e85f	[PATCH] Move vm_area_cachep to include/mm.h vm_area_cachep is used to store vm_area_structs. So move to mm.h. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Christoph Lameter	298ec1e2ac	[PATCH] Move sighand_cachep to include/signal.h Move sighand_cachep definitioni to linux/signal.h The sighand cache is only used in fs/exec.c and kernel/fork.c. It is defined in kernel/fork.c but only used in fs/exec.c. The sighand_cachep is related to signal processing. So add the definition to signal.h. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Christoph Lameter	54cc211ce3	[PATCH] Remove bio_cachep from slab.h Remove bio_cachep from slab.h - it no longer exists. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Christoph Hellwig	b30973f877	[PATCH] node-aware skb allocation Node-aware allocation of skbs for the receive path. Details: - __alloc_skb gets a new node argument and cals the node-aware slab functions with it. - netdev_alloc_skb passed the node number it gets from dev_to_node to it, everyone else passes -1 (any node) Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Christoph Hellwig	873481367e	[PATCH] add numa node information to struct device For node-aware skb allocations we need information about the node in struct net_device or struct device. Davem suggested to put it into struct device which this patch does. In particular: - struct device gets a new int numa_node member if CONFIG_NUMA is set - there are two new helpers, dev_to_node and set_dev_node to transparently deal with the non-numa case - for pci devices the node-info is set to the value we get from pcibus_to_node. Note that for some architectures pcibus_to_node doesn't work yet at the time we call it currently. This is harmless and will just mean skb allocations aren't node-local on this architectures until the implementation of pcibus_to_node on these architectures have been updated (There are patches for x86 and x86_64 floating around) [akpm@osdl.org: cleanup] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Christoph Hellwig	8b98c1699e	[PATCH] leak tracking for kmalloc_node We have variants of kmalloc and kmem_cache_alloc that leave leak tracking to the caller. This is used for subsystem-specific allocators like skb_alloc. To make skb_alloc node-aware we need similar routines for the node-aware slab allocator, which this patch adds. Note that the code is rather ugly, but it mirrors the non-node-aware code 1:1: [akpm@osdl.org: add module export] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Peter Zijlstra	ad76fb6b5a	[PATCH] mm: k{,um}map_atomic() vs in_atomic() Make kmap_atomic/kunmap_atomic denote a pagefault disabled scope. All non trivial implementations already do this anyway. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Peter Zijlstra	a866374aec	[PATCH] mm: pagefault_{disable,enable}() Introduce pagefault_{disable,enable}() and use these where previously we did manual preempt increments/decrements to make the pagefault handler do the atomic thing. Currently they still rely on the increased preempt count, but do not rely on the disabled preemption, this might go away in the future. (NOTE: the extra barrier() in pagefault_disable might fix some holes on machines which have too many registers for their own good) [heiko.carstens@de.ibm.com: s390 fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Chen, Kenneth W	39dde65c99	[PATCH] shared page table for hugetlb page Following up with the work on shared page table done by Dave McCracken. This set of patch target shared page table for hugetlb memory only. The shared page table is particular useful in the situation of large number of independent processes sharing large shared memory segments. In the normal page case, the amount of memory saved from process' page table is quite significant. For hugetlb, the saving on page table memory is not the primary objective (as hugetlb itself already cuts down page table overhead significantly), instead, the purpose of using shared page table on hugetlb is to allow faster TLB refill and smaller cache pollution upon TLB miss. With PT sharing, pte entries are shared among hundreds of processes, the cache consumption used by all the page table is smaller and in return, application gets much higher cache hit ratio. One other effect is that cache hit ratio with hardware page walker hitting on pte in cache will be higher and this helps to reduce tlb miss latency. These two effects contribute to higher application performance. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Dave McCracken <dmccr@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Nick Piggin	cc10250907	[PATCH] mm: add arch_alloc_page Add an arch_alloc_page to match arch_free_page. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Ashwin Chaugule	7602bdf2fd	[PATCH] new scheme to preempt swap token The new swap token patches replace the current token traversal algo. The old algo had a crude timeout parameter that was used to handover the token from one task to another. This algo, transfers the token to the tasks that are in need of the token. The urgency for the token is based on the number of times a task is required to swap-in pages. Accordingly, the priority of a task is incremented if it has been badly affected due to swap-outs. To ensure that the token doesnt bounce around rapidly, the token holders are given a priority boost. The priority of tasks is also decremented, if their rate of swap-in's keeps reducing. This way, the condition to check whether to pre-empt the swap token, is a matter of comparing two task's priority fields. [akpm@osdl.org: cleanups] Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Paul Jackson	7253f4ef04	[PATCH] memory page_alloc zonelist caching reorder structure Rearrange the struct members in the 'struct zonelist_cache' structure, so as to put the readonly (once initialized) z_to_n[] array first, where it will come right after the zones[] array in struct zonelist. This pretty much eliminates the chance that the two frequently written elements of 'struct zonelist_cache', the fullzones bitmap and last_full_zap times, will end up on the same cache line as the performance sensitive, frequently read, never (after init) written zones[] array. Keeping frequently written data off frequently read cache lines is good for performance. Thanks to Rohit Seth for the suggestion. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Paul Jackson	9276b1bc96	[PATCH] memory page_alloc zonelist caching speedup Optimize the critical zonelist scanning for free pages in the kernel memory allocator by caching the zones that were found to be full recently, and skipping them. Remembers the zones in a zonelist that were short of free memory in the last second. And it stashes a zone-to-node table in the zonelist struct, to optimize that conversion (minimize its cache footprint.) Recent changes: This differs in a significant way from a similar patch that I posted a week ago. Now, instead of having a nodemask_t of recently full nodes, I have a bitmask of recently full zones. This solves a problem that last weeks patch had, which on systems with multiple zones per node (such as DMA zone) would take seeing any of these zones full as meaning that all zones on that node were full. Also I changed names - from "zonelist faster" to "zonelist cache", as that seemed to better convey what we're doing here - caching some of the key zonelist state (for faster access.) See below for some performance benchmark results. After all that discussion with David on why I didn't need them, I went and got some ;). I wanted to verify that I had not hurt the normal case of memory allocation noticeably. At least for my one little microbenchmark, I found (1) the normal case wasn't affected, and (2) workloads that forced scanning across multiple nodes for memory improved up to 10% fewer System CPU cycles and lower elapsed clock time ('sys' and 'real'). Good. See details, below. I didn't have the logic in get_page_from_freelist() for various full nodes and zone reclaim failures correct. That should be fixed up now - notice the new goto labels zonelist_scan, this_zone_full, and try_next_zone, in get_page_from_freelist(). There are two reasons I persued this alternative, over some earlier proposals that would have focused on optimizing the fake numa emulation case by caching the last useful zone: 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems) have seen real customer loads where the cost to scan the zonelist was a problem, due to many nodes being full of memory before we got to a node we could use. Or at least, I think we have. This was related to me by another engineer, based on experiences from some time past. So this is not guaranteed. Most likely, though. The following approach should help such real numa systems just as much as it helps fake numa systems, or any combination thereof. 2) The effort to distinguish fake from real numa, using node_distance, so that we could cache a fake numa node and optimize choosing it over equivalent distance fake nodes, while continuing to properly scan all real nodes in distance order, was going to require a nasty blob of zonelist and node distance munging. The following approach has no new dependency on node distances or zone sorting. See comment in the patch below for a description of what it actually does. Technical details of note (or controversy): - See the use of "zlc_active" and "did_zlc_setup" below, to delay adding any work for this new mechanism until we've looked at the first zone in zonelist. I figured the odds of the first zone having the memory we needed were high enough that we should just look there, first, then get fancy only if we need to keep looking. - Some odd hackery was needed to add items to struct zonelist, while not tripping up the custom zonelists built by the mm/mempolicy.c code for MPOL_BIND. My usual wordy comments below explain this. Search for "MPOL_BIND". - Some per-node data in the struct zonelist is now modified frequently, with no locking. Multiple CPU cores on a node could hit and mangle this data. The theory is that this is just performance hint data, and the memory allocator will work just fine despite any such mangling. The fields at risk are the struct 'zonelist_cache' fields 'fullzones' (a bitmask) and 'last_full_zap' (unsigned long jiffies). It should all be self correcting after at most a one second delay. - This still does a linear scan of the same lengths as before. All I've optimized is making the scan faster, not algorithmically shorter. It is now able to scan a compact array of 'unsigned short' in the case of many full nodes, so one cache line should cover quite a few nodes, rather than each node hitting another one or two new and distinct cache lines. - If both Andi and Nick don't find this too complicated, I will be (pleasantly) flabbergasted. - I removed the comment claiming we only use one cachline's worth of zonelist. We seem, at least in the fake numa case, to have put the lie to that claim. - I pay no attention to the various watermarks and such in this performance hint. A node could be marked full for one watermark, and then skipped over when searching for a page using a different watermark. I think that's actually quite ok, as it will tend to slightly increase the spreading of memory over other nodes, away from a memory stressed node. =============== Performance - some benchmark results and analysis: This benchmark runs a memory hog program that uses multiple threads to touch alot of memory as quickly as it can. Multiple runs were made, touching 12, 38, 64 or 90 GBytes out of the total 96 GBytes on the system, and using 1, 19, 37, or 55 threads (on a 56 CPU system.) System, user and real (elapsed) timings were recorded for each run, shown in units of seconds, in the table below. Two kernels were tested - 2.6.18-mm3 and the same kernel with this zonelist caching patch added. The table also shows the percentage improvement the zonelist caching sys time is over (lower than) the stock -mm kernel. number 2.6.18-mm3 zonelist-cache delta (< 0 good) percent GBs N ------------ -------------- ---------------- systime mem threads sys user real sys user real sys user real better 12 1 153 24 177 151 24 176 -2 0 -1 1% 12 19 99 22 8 99 22 8 0 0 0 0% 12 37 111 25 6 112 25 6 1 0 0 -0% 12 55 115 25 5 110 23 5 -5 -2 0 4% 38 1 502 74 576 497 73 570 -5 -1 -6 0% 38 19 426 78 48 373 76 39 -53 -2 -9 12% 38 37 544 83 36 547 82 36 3 -1 0 -0% 38 55 501 77 23 511 80 24 10 3 1 -1% 64 1 917 125 1042 890 124 1014 -27 -1 -28 2% 64 19 1118 138 119 965 141 103 -153 3 -16 13% 64 37 1202 151 94 1136 150 81 -66 -1 -13 5% 64 55 1118 141 61 1072 140 58 -46 -1 -3 4% 90 1 1342 177 1519 1275 174 1450 -67 -3 -69 4% 90 19 2392 199 192 2116 189 176 -276 -10 -16 11% 90 37 3313 238 175 2972 225 145 -341 -13 -30 10% 90 55 1948 210 104 1843 213 100 -105 3 -4 5% Notes: 1) This test ran a memory hog program that started a specified number N of threads, and had each thread allocate and touch 1/N'th of the total memory to be used in the test run in a single loop, writing a constant word to memory, one store every 4096 bytes. Watching this test during some earlier trial runs, I would see each of these threads sit down on one CPU and stay there, for the remainder of the pass, a different CPU for each thread. 2) The 'real' column is not comparable to the 'sys' or 'user' columns. The 'real' column is seconds wall clock time elapsed, from beginning to end of that test pass. The 'sys' and 'user' columns are total CPU seconds spent on that test pass. For a 19 thread test run, for example, the sum of 'sys' and 'user' could be up to 19 times the number of 'real' elapsed wall clock seconds. 3) Tests were run on a fresh, single-user boot, to minimize the amount of memory already in use at the start of the test, and to minimize the amount of background activity that might interfere. 4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM. 5) Notice that the 'real' time gets large for the single thread runs, even though the measured 'sys' and 'user' times are modest. I'm not sure what that means - probably something to do with it being slow for one thread to be accessing memory along ways away. Perhaps the fake numa system, running ostensibly the same workload, would not show this substantial degradation of 'real' time for one thread on many nodes -- lets hope not. 6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs) ran quite efficiently, as one might expect. Each pair of threads needed to allocate and touch the memory on the node the two threads shared, a pleasantly parallizable workload. 7) The intermediate thread count passes, when asking for alot of memory forcing them to go to a few neighboring nodes, improved the most with this zonelist caching patch. Conclusions: This zonelist cache patch probably makes little difference one way or the other for most workloads on real numa hardware, if those workloads avoid heavy off node allocations. * For memory intensive workloads requiring substantial off-node allocations on real numa hardware, this patch improves both kernel and elapsed timings up to ten per-cent. * For fake numa systems, I'm optimistic, but will have to leave that up to Rohit Seth to actually test (once I get him a 2.6.18 backport.) Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: David Rientjes <rientjes@cs.washington.edu> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Christoph Lameter	89689ae7f9	[PATCH] Get rid of zone_table[] The zone table is mostly not needed. If we have a node in the page flags then we can get to the zone via NODE_DATA() which is much more likely to be already in the cpu cache. In case of SMP and UP NODE_DATA() is a constant pointer which allows us to access an exact replica of zonetable in the node_zones field. In all of the above cases there will be no need at all for the zone table. The only remaining case is if in a NUMA system the node numbers do not fit into the page flags. In that case we make sparse generate a table that maps sections to nodes and use that table to to figure out the node number. This table is sized to fit in a single cache line for the known 32 bit NUMA platform which makes it very likely that the information can be obtained without a cache miss. For sparsemem the zone table seems to be have been fairly large based on the maximum possible number of sections and the number of zones per node. There is some memory saving by removing zone_table. The main benefit is to reduce the cache foootprint of the VM from the frequent lookups of zones. Plus it simplifies the page allocator. [akpm@osdl.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Andrew Morton	676dcb8bc2	[PATCH] add bottom_half.h With CONFIG_SMP=n: drivers/input/ff-memless.c:384: warning: implicit declaration of function 'local_bh_disable' drivers/input/ff-memless.c:393: warning: implicit declaration of function 'local_bh_enable' Really linux/spinlock.h should include linux/interrupt.h. But interrupt.h includes sched.h which will need spinlock.h. So the patch breaks the _bh declarations out into a separate header and includes it in both interrupt.h and spinlock.h. Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Michael Chan	9d26e21342	[TG3]: Add TG3_FLG2_IS_NIC flag. Add Tg3_FLG2_IS_NIC flag to unambiguously determine whether the device is NIC or onboard. Previously, the EEPROM_WRITE_PROT flag was overloaded to also mean onboard. With the separation, we can support some devices that are onboard but do not use eeprom write protect. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-07 00:21:14 -08:00
Michael Chan	676917d488	[TG3]: Add 5787F device ID. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-07 00:20:22 -08:00
Joy Latten	161a09e737	audit: Add auditing to ipsec An audit message occurs when an ipsec SA or ipsec policy is created/deleted. Signed-off-by: Joy Latten <latten@austin.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 20:14:22 -08:00
Yasuyuki Kozakai	9ee0779e99	[NETFILTER]: nf_conntrack: fix warning in PPTP helper Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:39:04 -08:00
Adrian Bunk	cc44215eaa	[CRYPTO] api: Remove unused functions This patch removes the following no longer used functions: - api.c: crypto_alg_available() - digest.c: crypto_digest_init() - digest.c: crypto_digest_update() - digest.c: crypto_digest_final() - digest.c: crypto_digest_digest() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2006-12-06 18:38:54 -08:00
Kazunori MIYAZAWA	7cf4c1a5fd	[IPSEC]: Add support for AES-XCBC-MAC The glue of xfrm. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2006-12-06 18:38:51 -08:00
Jamal Hadi Salim	334c29a645	[GENETLINK]: Move command capabilities to flags. This patch moves command capabilities to command flags. Other than being cleaner, saves several bytes. We increment the nlctrl version so as to signal to user space that to not expect the attributes. We will try to be careful not to do this too often ;-> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:38:41 -08:00
Artiom Myaskouvskey	bf7e6a1963	[PATCH] i386: Preserve EFI run time regions with memmap parameter When using memmap kernel parameter in EFI boot we should also add to memory map memory regions of runtime services to enable their mapping later. AK: merged and cleaned up the patch Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:11 +01:00
Artiom Myaskouvskey	e1cccf48b1	[PATCH] i386: call efi_get_time during suspend Function efi_get_time called not only during init kernel phase but also during suspend (from get_cmos_time). When it is called from get_cmos_time the corresponding runtime service should be called in virtual and not in physical mode. Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: "Narayanan, Chandramouli" <chandramouli.narayanan@intel.com> Cc: "Jiossy, Rami" <rami.jiossy@intel.com> Cc: "Satt, Shai" <shai.satt@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:11 +01:00
Siddha, Suresh B	72486f1f8f	[PATCH] i386: change the 'no_control' field to 'hotpluggable' in the struct cpu Change the 'no_control' field in the cpu struct to a more positive and better term 'hotpluggable'. And change(/cleanup) the logic accordingly. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:10 +01:00
Rusty Russell	d7cd56111f	[PATCH] i386: cpu_detect extraction Both lhype and Xen want to call the core of the x86 cpu detect code before calling start_kernel. (extracted from larger patch) AK: folded in start_kernel header patch Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:08 +01:00
Andi Kleen	2fff0a4841	[PATCH] Generic: Move __user cast into probe_kernel_address Caller of probe_kernel_address shouldn't need to know that pka is internally implemented with __get_user. So move the __user cast into pka. Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:05 +01:00
Eric W. Biederman	968de4f026	[PATCH] i386: Relocatable kernel support This patch modifies the i386 kernel so that if CONFIG_RELOCATABLE is selected it will be able to be loaded at any 4K aligned address below 1G. The technique used is to compile the decompressor with -fPIC and modify it so the decompressor is fully relocatable. For the main kernel relocations are generated. Resulting in a kernel that is relocatable with no runtime overhead and no need to modify the source code. A reserved 32bit word in the parameters has been assigned to serve as a stack so we figure out where are running. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:04 +01:00
Andrew Morton	bb81a09e55	[PATCH] x86: all cpu backtrace When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu backtrace, so we get to see which CPU is holding the lock, and where. Cc: Andi Kleen <ak@muc.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:01 +01:00
Chuck Lever	5847e1f4d0	SUNRPC: Remove pprintk() from net/sunrpc/xprt.c These appear to be deprecated. Removing them also gets rid of some sparse noise. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:55 -05:00
Chuck Lever	dd4564715e	SUNRPC: Rename skb_reader_t and friends Clean-up: hch suggested that the RPC client shouldn't pollute the name space used by the generic skb manipulation routines in net/core/skbuff.c. Rename a couple of types in xdr.h to adhere to this convention. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:52 -05:00
Chuck Lever	9d29231690	SUNRPC: skb_read_bits is the same as xs_tcp_copy_data Clean-up: eliminate xs_tcp_copy_data -- it's exactly the same logic as the common routine skb_read_bits. The UDP and TCP socket read code now share the same routine for copying data into an xdr_buf. Now that skb_read_bits() is exported, rename it to avoid confusing it with a generic skb_* function. As these functions are XDR-specific, they should not have names that suggest they are of generic use. Also rename skb_read_and_csum_bits() to be consistent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:52 -05:00
Chuck Lever	7559c7a28f	SUNRPC: Make address format buffers more generic For now we will assume that all transports will use the address format buffers in the rpc_xprt struct to store their addresses. Change rpc_peer2str() to be a generic routine to handle this, and get rid of the print_address() op in the rpc_xprt_ops vector. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:51 -05:00
Chuck Lever	314dfd7987	SUNRPC: move saved socket callback functions to a private data structure Move the three fields for saving socket callback functions out of the rpc_xprt structure and into a private data structure maintained in net/sunrpc/xprtsock.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:51 -05:00
Chuck Lever	7c6e066ec2	SUNRPC: Move the UDP socket bufsize parameters to a private data structure Move the socket-specific buffer size parameters for UDP sockets to a private data structure maintained in net/sunrpc/xprtsock.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:50 -05:00
Chuck Lever	c847546182	SUNRPC: Move rpc_xprt socket connect fields into private data structure Move the socket-specific connection management fields out of the generic rpc_xprt structure into a private data structure maintained in net/sunrpc/xprtsock.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:50 -05:00
Chuck Lever	e136d0926e	SUNRPC: Move TCP state flags into xprtsock.c Move "XPRT_LAST_FRAG" and friends from xprt.h into xprtsock.c, and rename them to use the naming scheme in use in xprtsock.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:50 -05:00
Chuck Lever	51971139b2	SUNRPC: Move TCP receive state variables into private data structure Move the TCP receive state variables from the generic rpc_xprt structure to a private structure maintained inside net/sunrpc/xprtsock.c. Also rename a function/variable pair to refer to RPC fragment headers instead of record markers, to be consistent with types defined in sunrpc/*.h. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:49 -05:00
Chuck Lever	ee0ac0c227	SUNRPC: Remove sock and inet fields from rpc_xprt The "sock" and "inet" fields are socket-specific. Move them to a private data structure maintained entirely within net/sunrpc/xprtsock.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:49 -05:00
J. Bruce Fields	717757ad10	rpcgss: krb5: ignore seed We're currently not actually using seed or seed_init. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:47 -05:00
J. Bruce Fields	d922a84a8b	rpcgss: krb5: sanity check sealalg value in the downcall The sealalg is checked in several places, giving the impression it could be either SEAL_ALG_NONE or SEAL_ALG_DES. But in fact SEAL_ALG_NONE seems to be sufficient only for making mic's, and all the contexts we get must be capable of wrapping as well. So the sealalg must be SEAL_ALG_DES. As with signalg, just check for the right value on the downcall and ignore it otherwise. Similarly, tighten expectations for the sealalg on incoming tokens, in case we do support other values eventually. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:47 -05:00
J. Bruce Fields	ca54f89645	rpcgss: simplify make_checksum We're doing some pointless translation between krb5 constants and kernel crypto string names. Also clean up some related spkm3 code as necessary. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:46 -05:00
J. Bruce Fields	e678e06bf8	gss: krb5: remove signalg and sealalg We designed the krb5 context import without completely understanding the context. Now it's clear that there are a number of fields that we ignore, or that we depend on having one single value. In particular, we only support one value of signalg currently; so let's check the signalg field in the downcall (in case we decide there's something else we could support here eventually), but ignore it otherwise. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:44 -05:00
Olga Kornievskaia	adeb8133dd	rpc: spkm3 update This updates the spkm3 code to bring it up to date with our current understanding of the spkm3 spec. In doing so, we're changing the downcall format used by gssd in the spkm3 case, which will cause an incompatilibity with old userland spkm3 support. Since the old code a) didn't implement the protocol correctly, and b) was never distributed except in the form of some experimental patches from the citi web site, we're assuming this is OK. We do detect the old downcall format and print warning (and fail). We also include a version number in the new downcall format, to be used in the future in case any further change is required. In some more detail: - fix integrity support - removed dependency on NIDs. instead OIDs are used - known OID values for algorithms added. - fixed some context fields and types Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:44 -05:00
Olga Kornievskaia	37a4e6cb03	rpc: move process_xdr_buf Since process_xdr_buf() is useful outside of the kerberos-specific code, we move it to net/sunrpc/xdr.c, export it, and rename it in keeping with xdr_* naming convention of xdr.c. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:44 -05:00
J. Bruce Fields	8fc7500bb8	rpc: gss: eliminate print_hexl()'s Dumping all this data to the logs is wasteful (even when debugging is turned off), and creates too much output to be useful when it's turned on. Fix a minor style bug or two while we're at it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:43 -05:00
Trond Myklebust	61822ab5e3	NFS: Ensure we only call set_page_writeback() under the page lock Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:40 -05:00
Trond Myklebust	e261f51f25	NFS: Make nfs_updatepage() mark the page as dirty. This will ensure that we can call set_page_writeback() from within nfs_writepage(), which is always called with the page lock set. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:39 -05:00
Trond Myklebust	4d770ccf42	NFS: Ensure that nfs_wb_page() calls writepage when necessary. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:39 -05:00
Trond Myklebust	1a54533ec8	NFS: Add nfs_set_page_dirty() We will want to allow nfs_writepage() to distinguish between pages that have been marked as dirty by the VM, and those that have been marked as dirty by nfs_updatepage(). In the former case, the entire page will want to be written out, and so any requests that were pending need to be flushed out first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:38 -05:00
Trond Myklebust	200baa2112	NFS: Remove nfs_writepage_sync() Maintaining two parallel ways of doing synchronous writes is rather pointless. This patch gets rid of the legacy nfs_writepage_sync(), and replaces it with the faster asynchronous writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:38 -05:00
Trond Myklebust	1c75950b9a	NFS: cleanup of nfs_sync_inode_wait() Allow callers to directly pass it a struct writeback_control. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:35 -05:00
Trond Myklebust	3f442547b7	NFS: Clean up nfs_scan_dirty() Pass down struct writeback control. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:35 -05:00
Chuck Lever	c8541ecdd5	SUNRPC: Make the transport-specific setup routine allocate rpc_xprt Change the location where the rpc_xprt structure is allocated so each transport implementation can allocate a private area from the same chunk of memory. Note also that xprt->ops->destroy, rather than xprt_destroy, is now responsible for freeing rpc_xprt when the transport is destroyed. Test plan: Connectathon. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:34 -05:00
Chuck Lever	e744cf2e3a	SUNRPC: minor optimization of "xid" field in rpc_xprt Move the xid field in the rpc_xprt structure to be in the same cache line as the reserve_lock, since these are used at the same time. Test plan: None. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:34 -05:00
Trond Myklebust	1e78957e0a	SUNRPC: Clean up argument types in xdr.c Converts various integer buffer offsets and sizes to unsigned integer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:32 -05:00
Trond Myklebust	bbd5a1f9fc	SUNRPC: Fix up missing BKL in asynchronous RPC callback functions Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:29 -05:00
Trond Myklebust	3e32a5d99a	SUNRPC: Give cloned RPC clients their own rpc_pipefs directory Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:29 -05:00
Trond Myklebust	8aca67f0ae	SUNRPC: Fix a potential race in rpc_wake_up_task() Use RCU to ensure that we can safely call rpc_finish_wakeup after we've called __rpc_do_wake_up_task. If not, there is a theoretical race, in which the rpc_task finishes executing, and gets freed first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:26 -05:00
Trond Myklebust	e6b3c4db6f	Fix a second potential rpc_wakeup race... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:25 -05:00
David Howells	6d5aefb8ea	WorkQueue: Fix up arch-specific work items where possible Fix up arch-specific work items where possible to use the new work_struct and delayed_work structs. Three places that enqueue bits of their stack and then return have been marked with #error as this is not permitted. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 19:36:26 +00:00
David Howells	9db7372445	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 17:01:28 +00:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Matthew Wilcox	e62438630c	[PATCH] Centralise definitions of sector_t and blkcnt_t CONFIG_LBD and CONFIG_LSF are spread into asm/types.h for no particularly good reason. Centralising the definition in linux/types.h means that arch maintainers don't need to bother adding it, as well as fixing the problem with x86-64 users being asked to make a decision that has absolutely no effect. The H8/300 porters seem particularly confused since I'm not aware of any microcontrollers that need to support 2TB filesystems. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-04 19:41:15 -08:00
Linus Torvalds	15a4cb9c25	Merge master.kernel.org:/pub/scm/linux/kernel/git/paulus/powerpc * master.kernel.org:/pub/scm/linux/kernel/git/paulus/powerpc: (194 commits) [POWERPC] Add missing EXPORTS for mpc52xx support [POWERPC] Remove obsolete PPC_52xx and update CLASSIC32 comment [POWERPC] ps3: add a default zImage target [POWERPC] Add of_platform_bus support to mpc52xx psc uart driver [POWERPC] typo fix and whitespace cleanup on mpc52xx-uart driver [POWERPC] Fix debug printks for 32-bit resources in the PCI code [POWERPC] Replace kmalloc+memset with kzalloc [POWERPC] Linkstation / kurobox support [POWERPC] Add the e300c3 core to the CPU table. [POWERPC] ppc: m48t35 add missing bracket [POWERPC] iSeries: don't build head_64.o unnecessarily [POWERPC] iSeries: stop dt_mod.o being rebuilt unnecessarily [POWERPC] Fix cputable.h for combined build [POWERPC] Allow CONFIG_BOOTX_TEXT on iSeries [POWERPC] Allow xmon to build on legacy iSeries [POWERPC] Change ppc64_defconfig to use AUTOFS_V4 not V3 [POWERPC] Tell firmware we can handle POWER6 compatible mode [POWERPC] Clean images in arch/powerpc/boot [POWERPC] Fix OF pci flags parsing [POWERPC] defconfig for lite5200 board ...	2006-12-04 19:22:33 -08:00
Linus Torvalds	ff51a98799	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (82 commits) [PATCH] pata_ali: small fixes [PATCH] pata_via: VIA 8251 bridged systems are now out and about [PATCH] trivial piix: swap bogus dot for comma space [PATCH] sata_promise: PHYMODE4 fixup [PATCH] libata: always use polling IDENTIFY [libata] pata_cs5535: fix build [PATCH] ahci: do not powerdown during initialization [PATCH] libata: prepare ata_sg_clean() for invocation from EH [PATCH] libata: separate out rw ATA taskfile building into ata_build_rw_tf() [PATCH] libata: implement ata_exec_internal_sg() [PATCH] libata: make sure IRQ is cleared after ata_bmdma_freeze() [PATCH] libata: move BMDMA host status recording from EH to interrupt handler [PATCH] libata: make sure sdev doesn't go away while rescanning [PATCH] libata: don't request sense if the port is frozen [PATCH] libata: fix READ CAPACITY simulation [PATCH] libata: implement ATA_FLAG_SETXFER_POLLING and use it in pata_via, take #2 [PATCH] libata: set IRQF_SHARED for legacy PCI IDE IRQs [PATCH] libata: remove unused HSM_ST_UNKNOWN [PATCH] libata: kill unnecessary sht->max_sectors initializations [PATCH] libata: add missing sht->slave_destroy ...	2006-12-04 13:12:29 -08:00
Al Viro	a80958f484	[PATCH] fix fallout from header dependency trimming OK, that seems to be enough to deal with the mess. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-04 12:45:29 -08:00
Linus Torvalds	0c789ff64e	netfilter.h needs rcuupdate.h for RCU locking functions This was exposed by Al's recent header file dependency reduction patches.. Cc: Al Viro <viro@ftp.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-04 10:52:29 -08:00
Linus Torvalds	9b8ab9f6c3	Merge branch 'for-linus4' of master.kernel.org:/pub/scm/linux/kernel/git/viro/bird * 'for-linus4' of master.kernel.org:/pub/scm/linux/kernel/git/viro/bird: [PATCH] severing poll.h -> mm.h [PATCH] severing skbuff.h -> mm.h [PATCH] severing skbuff.h -> poll.h [PATCH] severing skbuff.h -> highmem.h [PATCH] severing uaccess.h -> sched.h [PATCH] severing fs.h, radix-tree.h -> sched.h [PATCH] severing module.h->sched.h	2006-12-04 10:37:06 -08:00
Dwayne Grant McConnell	bf1ab978be	[POWERPC] coredump: Add SPU elf notes to coredump. This patch adds SPU elf notes to the coredump. It creates a separate note for each of /regs, /fpcr, /lslr, /decr, /decr_status, /mem, /signal1, /signal1_type, /signal2, /signal2_type, /event_mask, /event_status, /mbox_info, /ibox_info, /wbox_info, /dma_info, /proxydma_info, /object-id. A new macro, ARCH_HAVE_EXTRA_NOTES, was created for architectures to specify they have extra elf core notes. A new macro, ELF_CORE_EXTRA_NOTES_SIZE, was created so the size of the additional notes could be calculated and added to the notes phdr entry. A new macro, ELF_CORE_WRITE_EXTRA_NOTES, was created so the new notes would be written after the existing notes. The SPU coredump code resides in spufs. Stub functions are provided in the kernel which are hooked into the spufs code which does the actual work via register_arch_coredump_calls(). A new set of __spufs_<file>_read/get() functions was provided to allow the coredump code to read from the spufs files without having to lock the SPU context for each file read from. Cc: <linux-arch@vger.kernel.org> Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>	2006-12-04 20:40:19 +11:00
Al Viro	f23f6e08c4	[PATCH] severing poll.h -> mm.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:36 -05:00
Al Viro	d7fe0f241d	[PATCH] severing skbuff.h -> mm.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:34 -05:00
Al Viro	bd01f843c3	[PATCH] severing skbuff.h -> poll.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:31 -05:00
Al Viro	a1f8e7f7fb	[PATCH] severing skbuff.h -> highmem.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:29 -05:00
Al Viro	914e26379d	[PATCH] severing fs.h, radix-tree.h -> sched.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:24 -05:00
Al Viro	f6a570333e	[PATCH] severing module.h->sched.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:22 -05:00
Paul Mackerras	79acbb3ff2	Merge branch 'linux-2.6' into for-linus	2006-12-04 15:59:07 +11:00
Jeff Garzik	d916faace3	Remove long-unmaintained ftape driver subsystem. It's bitrotten, long unmaintained, long hidden under BROKEN_ON_SMP, etc. As scheduled in feature-removal-schedule.txt, and ack'd several times on lkml. Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-03 22:22:41 -05:00
Tejun Heo	800b399669	[PATCH] libata: always use polling IDENTIFY libata switched to IRQ-driven IDENTIFY when IRQ-driven PIO was introduced. This has caused a lot of problems including device misdetection and phantom device. ATA_FLAG_DETECT_POLLING was added recently to selectively use polling IDENTIFY on problemetic drivers but many controllers and devices are affected by this problem and trying to adding ATA_FLAG_DETECT_POLLING for each such case is diffcult and not very rewarding. This patch makes libata always use polling IDENTIFY. This is consistent with libata's original behavior and drivers/ide's behavior. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-03 07:58:10 -05:00
Tejun Heo	3d3cca3755	[PATCH] libata: implement ATA_FLAG_SETXFER_POLLING and use it in pata_via, take #2 This patch implements ATA_FLAG_SETXFER_POLLING and use in pata_via. If this flag is set, transfer mode setting performed by polling not by interrupt. This should help those controllers which raise interrupt before the command is actually complete on SETXFER. Rationale for this approach. * uses existing facility and relatively simple * no busy sleep in the interrupt handler * updating drivers is easy While at it, kill now unused flag ATA_FLAG_SRST in pata_via. Signed-off-by: Tejun Heo <htejun@gmail.com>	2006-12-03 17:56:23 +09:00
Tejun Heo	582982e699	[PATCH] libata: remove unused HSM_ST_UNKNOWN HSM_ST_UNKNOWN is not used anywhere. Its value is zero and supposed to serve sanity check purpose but HSM_ST_IDLE is used for that purpose. This unused state causes confusion. After a port is initialized but before the first command is executed, the idle hsm state is UNKNOWN. However, once a command has completed, the idle hsm state is IDLE. This defeats sanity check in ata_pio_task() for the first command. This patch removes HSM_ST_UNKNOWN and consequently make HSM_ST_IDLE the default state. Signed-off-by: Tejun Heo <htejun@gmail.com>	2006-12-03 17:56:23 +09:00
Jamal Hadi Salim	2b5f6dcce5	[XFRM]: Fix aevent structuring to be more complete. aevents can not uniquely identify an SA. We break the ABI with this patch, but consensus is that since it is not yet utilized by any (known) application then it is fine (better do it now than later). Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:22:25 -08:00
Patrick McHardy	a536df35b3	[NETFILTER]: nf_conntrack/nf_nat: add TFTP helper port Add IPv4 and IPv6 capable nf_conntrack port of the TFTP conntrack/NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:10:18 -08:00
Patrick McHardy	9fafcd7b20	[NETFILTER]: nf_conntrack/nf_nat: add SIP helper port Add IPv4 and IPv6 capable nf_conntrack port of the SIP conntrack/NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:09:57 -08:00
Patrick McHardy	f09943fefe	[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port Add nf_conntrack port of the PPtP conntrack/NAT helper. Since there seems to be no IPv6-capable PPtP implementation the helper only support IPv4. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:09:41 -08:00
Patrick McHardy	869f37d8e4	[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port Add nf_conntrack port of the IRC conntrack/NAT helper. Since DCC doesn't support IPv6 yet, the helper is still IPv4 only. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:09:06 -08:00
Patrick McHardy	f587de0e2f	[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port Add IPv4 and IPv6 capable nf_conntrack port of the H.323 conntrack/NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:08:46 -08:00
Patrick McHardy	1695890057	[NETFILTER]: nf_conntrack/nf_nat: add amanda helper port Add IPv4 and IPv6 capable nf_conntrack port of the Amanda conntrack/NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:08:26 -08:00
Jozsef Kadlecsik	55a733247d	[NETFILTER]: nf_nat: add FTP NAT helper port Add FTP NAT helper. Split out from Jozsef's big nf_nat patch with a few small fixes by myself. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:07:44 -08:00
Jozsef Kadlecsik	5b1158e909	[NETFILTER]: Add NAT support for nf_conntrack Add NAT support for nf_conntrack. Joint work of Jozsef Kadlecsik, Yasuyuki Kozakai, Martin Josefsson and myself. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:07:13 -08:00
Patrick McHardy	9a7c9337a0	[NET]: Accept wildcard delimiters in in[46]_pton Accept -1 as delimiter to abort parsing without an error at the first unknown character. This is needed by the upcoming nf_conntrack SIP helper, where addresses are delimited by either '\r' or '\n' characters. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:04:04 -08:00
Al Viro	1e419cd995	[EBTABLES]: Split ebt_replace into user and kernel variants, annotate. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:32:05 -08:00
Bart De Schuymer	d12cdc3ccf	[NETFILTER]: ebtables: add --snap-arp option The attached patch adds --snat-arp support, which makes it possible to change the source mac address in both the mac header and the arp header with one rule. Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:32 -08:00
Patrick McHardy	baf7b1e112	[NETFILTER]: x_tables: add NFLOG target Add new NFLOG target to allow use of nfnetlink_log for both IPv4 and IPv6. Currently we have two (unsupported by userspace) hacks in the LOG and ULOG targets to optionally call to the nflog API. They lack a few features, namely the IPv4 and IPv6 LOG targets can not specify a number of arguments related to nfnetlink_log, while the ULOG target is only available for IPv4. Remove those hacks and add a clean way to use nfnetlink_log. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:31 -08:00
Patrick McHardy	39b46fc6f0	[NETFILTER]: x_tables: add port of hashlimit match for IPv4 and IPv6 Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:31 -08:00
Patrick McHardy	d7a5c32442	[NETFILTER]: nfnetlink_log: remove useless prefix length limitation There is no reason for limiting netlink attributes in size. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:30 -08:00
Eric Leblond	829e17a1a6	[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:29 -08:00
Patrick McHardy	1b683b5512	[NETFILTER]: sip conntrack: better NAT handling The NAT handling of the SIP helper has a few problems: - Request headers are only mangled in the reply direction, From/To headers not at all, which can lead to authentication failures with DNAT in case the authentication domain is the IP address - Contact headers in responses are only mangled for REGISTER responses - Headers may be mangled even though they contain addresses not participating in the connection, like alternative addresses - Packets are droppen when domain names are used where the helper expects IP addresses This patch takes a different approach, instead of fixed rules what field to mangle to what content, it adds symetric mapping of From/To/Via/Contact headers, which allows to deal properly with echoed addresses in responses and foreign addresses not belonging to the connection. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:26 -08:00
Patrick McHardy	40883e8184	[NETFILTER]: sip conntrack: do case insensitive SIP header search SIP headers are generally case-insensitive, only SDP headers are case sensitive. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:24 -08:00
Patrick McHardy	9d5b8baa4e	[NETFILTER]: sip conntrack: minor cleanup - Use enum for header field enumeration - Use numerical value instead of pointer to header info structure to identify headers, unexport ct_sip_hdrs - group SIP and SDP entries in header info structure - remove double forward declaration of ct_sip_get_info Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:23 -08:00
Yasuyuki Kozakai	468ec44bd5	[NETFILTER]: conntrack: add '_get' to {ip, nf}_conntrack_expect_find We usually uses 'xxx_find_get' for function which increments reference count. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:21 -08:00

1 2 3 4 5 ...

5206 commits