Commit graph

15166 commits

Author SHA1 Message Date
Mike Stroyan
18955cfcb2 [IPV4] tcp/route: Another look at hash table sizes
The tcp_ehash hash table gets too big on systems with really big memory.
It is worse on systems with pages larger than 4KB.  It wastes memory that
could be better used.  It also makes the netstat command slow because reading
/proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table.

  The default value should not be larger for larger page sizes.  It seems
that the effect of page size is an unintended error dating back a long
time.  I also wonder if the default value really should be a larger
fraction of memory for systems with more memory.  While systems with
really big ram can afford more space for hash tables, it is not clear to
me that they benefit from increasing the allocation ratio for this table.

  The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and
mm/page_alloc.c:alloc_large_system_hash.

tcp_init calls alloc_large_system_hash passing parameters-
    bucketsize=sizeof(struct tcp_ehash_bucket)
    numentries=thash_entries
    scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT)
    limit=0

On i386, PAGE_SHIFT is 12 for a page size of 4K
On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K

The num_physpages test above makes the allocation take a larger fraction
of the total memory on systems with larger memory.  The threshold size
for a i386 system is 512MB.  For an ia64 system with 16KB pages the
threshold is 2GB.

For smaller memory systems-
On i386, scale = (27 - 12) = 15
On ia64, scale = (27 - 14) = 13
For larger memory systems-
On i386, scale = (25 - 12) = 13
On ia64, scale = (25 - 14) = 11

  For the rest of this discussion, I'll just track the larger memory case.

  The default behavior has numentries=thash_entries=0, so the allocated
size is determined by either scale or by the default limit of 1/16 of
total memory.

In alloc_large_system_hash-
|	numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
|	numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
|	numentries >>= 20 - PAGE_SHIFT;
|	numentries <<= 20 - PAGE_SHIFT;

  At this point, numentries is pages for all of memory, rounded up to the
nearest megabyte boundary.

|	/* limit to 1 bucket per 2^scale bytes of low memory */
|	if (scale > PAGE_SHIFT)
|		numentries >>= (scale - PAGE_SHIFT);
|	else
|		numentries <<= (PAGE_SHIFT - scale);

On i386, numentries >>= (13 - 12), so numentries is 1/8196 of
bytes of total memory.
On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of
bytes of total memory.

|        log2qty = long_log2(numentries);
|
|        do {
|                size = bucketsize << log2qty;

bucketsize is 16, so size is 16 times numentries, rounded
down to a power of two.

On i386, size is 1/512 of bytes of total memory.
On ia64, size is 1/128 of bytes of total memory.

For smaller systems the results are
On i386, size is 1/2048 of bytes of total memory.
On ia64, size is 1/512 of bytes of total memory.

  The large page effect can be removed by just replacing
the use of PAGE_SHIFT with a constant of 12 in the calls to
alloc_large_system_hash.  That makes them more like the other uses of
that function from fs/inode.c and fs/dcache.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:12:55 -08:00
Linus Torvalds
2f12c74f0c Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2005-11-29 14:23:38 -08:00
Linus Torvalds
92af254a1b Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2005-11-29 14:23:21 -08:00
Linus Torvalds
f747307ed1 Revert "[PATCH] drivers/message/fusion/mptbase.c: make code static"
This reverts commit 252ac86553.

It impacts the LSI customers using the mptstm target mode drivers
(source tar-ball at

  ftp://ftp.lsil.com/HostAdapterDrivers/linux/Fusion-MPT/mptstm-1.00.13-src.tar.gz

for those who care).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 14:21:57 -08:00
Hugh Dickins
325f04dbca [PATCH] pfnmap: do_no_page BUG_ON again
Use copy_user_highpage directly instead of cow_user_page in do_no_page:
in the immediately following page_cache_release, and elsewhere, it is
assuming that new_page is normal.  If any VM_PFNMAP driver can get to
do_no_page, it's just a BUG (but not in the case of do_anonymous_page).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 14:09:17 -08:00
Hugh Dickins
e5bbe4dfc8 [PATCH] pfnmap: remove src_page from do_wp_page
Clean away do_wp_page's "src_page": cow_user_page makes it unnecessary.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 14:09:16 -08:00
Linus Torvalds
5d2a2dbbc1 cow_user_page: fix page alignment
High Dickins points out that the user virtual address passed to the page
fault handler isn't necessarily page-aligned.

Also, add a comment on why the copy could fail for the user address case.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 14:07:55 -08:00
Linus Torvalds
c9cfcddfd6 VM: add common helper function to create the page tables
This logic was duplicated four times, for no good reason.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 14:03:14 -08:00
David S. Miller
cab3f16feb [SPARC64]: Fix >8K I/O mappings.
Increment the PFN field of the PTE so that the tests
on vm_pfn in mm/memory.c match up.  The TLB ignores these
lower bits for larger page sizes, so it's OK to set things
like this.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 13:59:03 -08:00
Linus Torvalds
4168f7a318 Merge master.kernel.org:/pub/scm/linux/kernel/git/tglx/mtd-2.6 2005-11-29 13:04:07 -08:00
Christoph Hellwig
238f9b063d [PATCH] fix megaraid.c locking
This fixes locking in megaraid.c, namely:

 (1) make sure megaraid_queue release the adapter lock by changing the
     code to have a single return
 (2) remove the errornous scsi_assign_lock call

Testing by Burton Windle.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Burton Windle <bwindle@fint.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 13:03:01 -08:00
Linus Torvalds
238f58d898 Support strange discontiguous PFN remappings
These get created by some drivers that don't generally even want a pfn
remapping at all, but would really mostly prefer to just map pages
they've allocated individually instead.

For now, create a helper function that turns such an incomplete PFN
remapping call into a loop that does that explicit mapping.  In the long
run we almost certainly want to export a totally different interface for
that, though.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 13:01:56 -08:00
Ben Collins
eca351336a [PATCH] Fix missing pfn variables caused by vm changes
I image this showed up because of "unused var..." when the changes
occured, because flush_cache_page() is a noop in most places.  This
showed up for me on parisc however, where flush_cache_page() is a real
function.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 12:57:17 -08:00
Michael S. Tsirkin
e0ae9ecf46 IB/mthca: fix posting of send lists of length >= 255 on mem-free HCAs
On mem-free HCAs, when posting a long list of send requests, a
doorbell must be rung every 255 requests.  Add code to handle this.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 11:33:46 -08:00
Roland Dreier
267ee88ed3 IPoIB: fix error handling in ipoib_open
If ipoib_ib_dev_up() fails after ipoib_ib_dev_open() is called, then
ipoib_ib_dev_stop() needs to be called to clean up.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:55:58 -08:00
Adrian Bunk
2b9175c174 [MTD] Make functions static, include header files with prototypes
This patch contains the following possible cleanups:
- every file should #include the headers containing the prototypes for
  it's global functions
- make needlessly global functions static

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:54:58 +01:00
Michael S. Tsirkin
4f71055a45 IPoIB: protect child list in ipoib_ib_dev_flush
race condition: ipoib_ib_dev_flush is accessing child list without locks.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:53:30 -08:00
Richard Purdie
ee2d49de3e [MTD] chips: make sharps driver usable again
Update the pre-CFI Sharp driver sharps.c so it compiles.  map_read32 /
map_write32 no longer exist in the kernel so the driver is totally broken
as it stands.  The replacement functions use different parameters resulting
in the other changes.

Change collie to use this driver until someone works out why the cfi driver
fails on that machine.

Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Tested-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:49:03 +01:00
Thomas Gleixner
72af3b2c5a [MTD] Remove bogus PQ2FADS driver
Remove disfunctional driver, which slipped through the review mechanism

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:46:16 +01:00
Luiz Capitulino
e2602b347a [MTD] maps: sparse fixup
The patch below fixes the following sparse warning:

drivers/mtd/maps/nettel.c:482:27: warning: Using plain integer as NULL pointer

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:37:56 +01:00
Nicolas Pitre
8bc3b3804a [MTD] cfi_cmdset_0001: relax locking rules for multi hardware partition support
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:36:35 +01:00
David Woodhouse
7ac571f8d0 [MTD] Make some tables 'const' so they can live in .rodata
arjan: drivers/mtd/maps/sc520cdp.c:167: warning: par_table is never written to and should be declared 'const'
arjan: drivers/mtd/maps/pci.c:105: warning: mtd_pci_map is never written to and should be declared 'const'
arjan: mind fixing those up ?

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:33:50 +01:00
John Bowler
3c77354794 [MTD] maps/ixp4xx: half-word boundary and little-endian fixups
ixp4xx updates:
  - Handle reads that don't start on a half-word boundary.
  - Make it work when CPU is in little-endian mode.

Signed-off-by: John Bowler <jbowler@acm.org>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: David Vrabel <dvrabel@arcom.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:29:37 +01:00
Todd Poynor
987d24018d [MTD] CFI: Use 16-bit access to autoselect/read device id data
Recent models of Intel/Sharp and Spansion CFI flash now have significant
bits in the upper byte of device ID codes, read via what Spansion calls
"autoselect" and Intel calls "read device identifier".  Currently these
values are truncated to the low 8 bits in the mtd data structures, as
all CFI read query info has previously been read one byte at a time.
Add a new method for reading 16-bit info, currently just manufacturer
and device codes; datasheets hint at future uses for upper bytes in
other fields.

Signed-off-by: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:27:24 +01:00
Maciej W. Rozycki
3eb8ceac48 [MTD] devices/ms02-nv: phys/virt address fixups
Merge from linux-mips:
Use physical addresses at the interface level, letting drivers remap
them as appropriate.

Signed-off-by: Maciej W. Rozycki <macro@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 19:25:47 +01:00
Roland Dreier
2e86541ec8 IPoIB: don't zero members after we allocate with kzalloc
ipoib_mcast_alloc() uses kzalloc(), so there's no need to zero out
members of the mcast struct after it's allocated.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:25:23 -08:00
Michael S. Tsirkin
de92248789 IPoIB: reinitialize mcast structs' completions for every query
Make sure mcast->done is initialized to uncompleted value before we
submit a new query, so that it's safe to wait on.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:18:45 -08:00
Roland Dreier
5872a9fc28 IPoIB: always set path->query to NULL when query finishes
Always set path->query to NULL when the SA path record query
completes, rather than only when we don't have an address handle.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:13:54 -08:00
Keshavamurthy Anil S
5a94bcfd2a [IA64] Remove getting break_num by decoding instruction
break.b always sets cr.iim to 0 and the current code tries to
get the break_num by decoding instruction. However, their
seems to be a race condition while reading the regs->cr_iip,
as on other cpu the break.b at regs->cr_iip might have been
replaced with the original instruction as a result of
unregister_kprobe() and hence decoding instruction to
obtain break_num will result in wrong value in this case.

Also includes changes to kprobes.c which now has to handle
break number zero.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-29 09:24:39 -08:00
Dean Roe
b77dae5293 [IA64] - Make pfn_valid more precise for SGI Altix systems
A single SGI Altix system can be divided into multiple partitions,
each running their own instance of the Linux kernel.  pfn_valid()
is currently not optimal for any but the first partition, since it
does not compare the pfn with min_low_pfn before calling the more
costly ia64_pfn_valid().

Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-29 09:24:10 -08:00
Linus Torvalds
d70aa5e4b5 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-11-29 08:06:51 -08:00
Linus Torvalds
4570cd6a65 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge 2005-11-29 08:05:54 -08:00
Thomas Gleixner
21eeb7aa11 [JFFS2] Fix the slab cache constructor of 'struct jffs2_inode_info' objects.
JFFS2 initialize f->sem mutex as "locked" in the slab constructor which is a
bug. Objects are freed with unlocked f->sem mutex. So, when they allocated
again, f->sem is unlocked because the slab cache constructor is not called for
them. The constructor is called only once when memory pages are allocated for
objects (namely, when the slab layer allocates new slabs). So, sometimes
'struct jffs2_inode_info' are allocated with unlocked f->sem, sometimes with
locked. This is a bug. Instead, initialize f->sem as unlocked in the
constructor. I.e., in the "constructed" state f->sem must be unlocked.

From: Keijiro Yano <keijiro_yano@yahoo.co.jp>
Acked-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 16:57:17 +01:00
Nick Piggin
fa2a455b02 [PATCH] Fix vma argument in get_usr_pages() for gate areas
The system call gate area handling called vm_normal_page() with the
wrong vma (which was always NULL, and caused an oops).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 07:53:32 -08:00
Sean Young
bc4117f876 [MTD] RFD_FTL: Use lanana assigned major device number
A major block device number is now assigned by lanana.

Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2005-11-29 16:52:26 +01:00
Jeff Garzik
2226340eb8 Merge branch 'master' 2005-11-29 03:50:33 -05:00
YOSHIFUJI Hideaki
220bbd7483 [IPV6]: Implement appropriate dummy rule 4 in ipv6_dev_get_saddr().
Ensure to update hiscore.rule in dummy rule 4 in ipv6_dev_get_saddr().
Pointed out by Yan Zheng <yanzheng@21cn.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-28 22:27:11 -08:00
Roland Dreier
65c7eddaba IPoIB: reinitialize path struct's completion for every query
It's possible that IPoIB will issue multiple SA queries for the same
path struct.  Therefore the struct's completion needs to be
initialized for each query rather than only once when the struct is
allocated, or else we might not wait long enough for later queries to
finish and free the path struct too soon.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-28 21:20:34 -08:00
Paul Mackerras
666acb94d1 powerpc: Export __flush_icache_range for 32-bit
Both 32-bit and 64-bit use the same inline flush_icache_range definition
now, so both need to export __flush_icache_range, not just 64-bit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-29 15:50:58 +11:00
Linus Torvalds
624f54be20 Linux v2.6.15-rc3 2005-11-28 19:51:27 -08:00
Otavio Salvador
0e2d94f6a0 [PATCH] ppc: Export symbol needed by MOL
Export symbol needed to allow MOL to run. This was changed to be inline
in past and forgot to be change here.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-29 13:23:28 +11:00
Linus Torvalds
5d240918e6 Merge master.kernel.org:/home/rmk/linux-2.6-serial 2005-11-28 15:03:28 -08:00
Linus Torvalds
cba2fa1861 Merge master.kernel.org:/home/rmk/linux-2.6-mmc 2005-11-28 15:02:50 -08:00
Linus Torvalds
89a1623df6 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-11-28 15:02:30 -08:00
Miklos Szeredi
2827d0b23b [PATCH] fuse: check for invalid node ID in fuse_create_open()
Check for invalid node ID values in the new atomic create+open method.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:26 -08:00
Miklos Szeredi
f007d5c961 [PATCH] fuse: check directory aliasing in mkdir
Check the created directory inode for aliases in the mkdir() method.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:26 -08:00
Andrea Arcangeli
ea164d73a7 [PATCH] shrinker->nr = LONG_MAX means deadlock for icache
With Andrew Morton <akpm@osdl.org>

The slab scanning code tries to balance the scanning rate of slabs versus the
scanning rate of LRU pages.  To do this, it retains state concerning how many
slabs have been scanned - if a particular slab shrinker didn't scan enough
objects, we remember that for next time, and scan more objects on the next
pass.

The problem with this is that with (say) a huge number of GFP_NOIO
direct-reclaim attempts, the number of objects which are to be scanned when we
finally get a GFP_KERNEL request can be huge.  Because some shrinker handlers
just bail out if !__GFP_FS.

So the patch clamps the number of objects-to-be-scanned to 2* the total number
of objects in the slab cache.

Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:26 -08:00
Jan Kara
154f484b92 [PATCH] Fix oops in vfs_quotaon_mount()
When quota file specified in mount options did not exist, we tried to
dereference NULL pointer later. Fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:26 -08:00
NeilBrown
6aea114a72 [PATCH] md: fix --re-add for raid1 and raid6
If you have an array with a write-intent-bitmap, and you remove a device, then
re-add it, a full recovery isn't needed.  We detect a re-add by looking at
saved_raid_disk.  For raid1, it doesn't matter which disk it was, only whether
or not it was an active device.  The old code being removed set a value of
'mirror' which was then ignored, so it can go.  The changed code performs the
correct check.

For raid6, if there are two missing devices, make sure we chose the right slot
on --re-add rather than always the first slot.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:26 -08:00
NeilBrown
b2a2703c28 [PATCH] md: set default_bitmap_offset properly in set_array_info
If an array is created using set_array_info, default_bitmap_offset isn't set
properly meaning that an internal bitmap cannot be hot-added until the array
is stopped and re-assembled.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00