Commit graph

32605 commits

Author SHA1 Message Date
Chandra Seetharaman
18ee70c9d7 [SCSI] scsi_dh: add the interface scsi_dh_set_params()
When we moved the device handler functionality from dm layer to SCSI layer
we dropped the parameter functionality.

This path adds an interface to scsi dh layer to set device handler
parameters.

Basically, multipath layer need to create a string with all the parameters
and call scsi_dh_set_params() after it called scsi_dh_attach() on a
device.

If a device handler provides such an interface it will handle the parameters
as it expects them.

Reported-by: Eddie Williams <Eddie.Williams@steeleye.com>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Tested-by: Eddie Williams <Eddie.Williams@steeleye.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:14 -05:00
James Bottomley
43d8eb9cfd [SCSI] ses: add support for enclosure component hot removal
Right at the moment, hot removal of a device within an enclosure does
nothing (because the intf_remove only copes with enclosure removal not
with component removal). Fix this by adding a function to remove the
component.  Also needed to fix the prototype of
enclosure_remove_device, since we know the device we've removed but
not the internal component number

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:13 -05:00
James Bottomley
163f52b6cf [SCSI] ses: fix hotplug with multiple devices and expanders
In a situation either with expanders or with multiple enclosure
devices, hot add doesn't always work.  This is because we try to find
a single enclosure device attached to the host.  Fix this by looping
over all enclosure devices attached to the host and also by making the
find loop recognise that the enclosure devices may be expander remote
(i.e. not parented by the host).

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:13 -05:00
Yi Zou
537029f8e9 [SCSI] libfc: Remove FC_FRAME_SG_LEN in fc_fcp_send_data
FC_FRAME_SG_LEN is 4 which is too small when offload is enabled. Actually, the
WARN_ON() in fc_fcp_send_data() should be:

	WARN_ON(skb_shinfo(fp_skb(fp))->nr_frags > MAX_SKB_FRAGS);

But since we will not get anything more than 64K anyway, so there is no need
to do this anyway here. Therefore, I am getting rid of FC_FRAME_SG_LEN here
and the WARN_ON here.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:10 -05:00
Vasu Dev
52ff878c91 [SCSI] fcoe, fnic, libfc: modifies current code paths to use EM anchor list
Modifies current code to use EM anchor list in EM allocation, EM free,
EM reset, exch allocation and exch lookup code paths.

 1. Modifies fc_exch_mgr_alloc to accept EM match function and then
    have allocated EM added to the lport using fc_exch_mgr_add API
    while also updating EM kref for newly added EM.

 2. Updates fc_exch_mgr_free API to accept only lport pointer instead
    EM and then have this API free all EMs of the lport from EM anchor
    list.

 3. Removes single lport pointer link from the EM, which was used in
    associating lport pointer in newly allocated exchange. Instead have
    lport pointer passed along new exchange allocation call path and
    then store passed lport pointer in newly allocated exchange, this
    will allow a single EM instance to be used across more than one
    lport and used in EM reset to reset only lport specific exchanges.

 4. Modifies fc_exch_mgr_reset to reset all EMs from the EM anchor list
    of the lport, adds additional exch lport pointer (ep->lp) check for
    shared EM case to reset exchange specific to a lport requested reset.

 5. Updates exch allocation API fc_exch_alloc to use EM anchor list and
    its anchor match func pointer. The fc_exch_alloc will walk the list
    of EMs until it finds a match, a match will be either null match
    func pointer or call to match function returning true value.

 6. Updates fc_exch_recv to accept incoming frame on local port using
    only lport pointer and frame pointer without specifying EM instance
    of incoming frame. Instead modified fc_exch_recv to locate EM for the
    incoming frame by matching xid of incoming frame against a EM xid range.
    This change was required to use EM list in libfc Rx path and after this
    change the lport fc_exch_mgr pointer emp is not needed anymore, so
    removed emp pointer.

 7. Updates fnic for removed lport emp pointer and above modified libfc APIs
    fc_exch_recv, fc_exch_mgr_alloc and fc_exch_mgr_free.

 8. Removes exch_get and exch_put from libfc_function_template as these
    are no longer needed with EM anchor list and its match function use.
    Also removes its default function fc_exch_get.

A defect this patch introduced regarding the libfc initialization order in
the fnic driver was fixed by Joe Eykholt <jeykholt@cisco.com>.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:08 -05:00
Robert Love
d459b7ea1b [SCSI] libfc: Remove the FC_EM_DBG macro
Currently there is a 1:1 relationship between the lport
and exchange manager. This macro takes an EM as an argument
and determines the lport from it. However, later patches
will use an EM list per lport, so we will no longer have
this 1:1 relationship- this macro must change.

The FC_EM_DBG macro is rarely used. There are four callers,
two can use FC_LPORT_DBG instead and two can be removed
since they're not necessary. This patch makes those changes
and removes the macro.

Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:08 -05:00
Vasu Dev
96316099ac [SCSI] fcoe, libfc: adds exchange manager(EM) anchor list per lport and related APIs
Adds EM list using a anchor struct fc_exch_mgr_anchor, anchor is used
to allow same EM instance sharing across more than one lport on a eth
device, this implementation is per discussed design posted at
http://www.open-fcoe.org/pipermail/devel/2009-June/002566.html.

The shared EM is required for multiple lports on eth device when
using multiple VLANs or NPIV.

Adds fc_exch_mgr_add API to add a EM to the lport and fc_exch_mgr_del
API to delete previously added EM.

Also adds function fc_exch_mgr_destroy() to destroy allocated EM.
The kref is added to the EM to keep track of EM usage count, the EM is
destroyed when no longer in use upon kref reaching to zero.

The caller can specify match function to fc_exch_mgr_add, this
will be used in determining exchange allocation from its EM or not.

Moved calling of fcoe_em_config below fcoe_libfc_config calling,
so that list head lp->ema_list is initialized before configuring
EM.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:07 -05:00
Joe Eykholt
141940548c [SCSI] libfc: rename rport state "NONE" to "DELETE".
State RPORT_ST_NONE was intented to be an invalid state (0), never used.
This was a misguided attempt to be sure it was always initialized.
Having an extra state meaning nothing requires switch statements to
have a case covering that state.

State NONE has been used instead to mean the remote port is being deleted.
Changing the name to RPORT_ST_DELETE.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:06 -05:00
Joe Eykholt
b1d9fd5574 [SCSI] libfc: rename lport NONE state to DISABLED
The state NONE was meant to be invalid, but has been used as
the initial state.  Rename it to be DISABLED, as more descriptive.
Further patches will make it the like the RESET state, except
it won't transition to FLOGI until fc_lport_fabric_login() is called.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:04 -05:00
Joe Eykholt
7f74549ff6 [SCSI] libfc: change debug messages to give host number.
libfc debug messages currently show 'lport: <fc-id>:'
wher <fc-id> is the hex assigned port-id.  When the lport
is logged off, that will be zero, so its hard to distinguish
which instance is involved.  The FC-ID can change
if the port is re-patched or changes VSANs.

Two lports may even have the same FC-ID if connected to isolated SANs.

Change the debug messages to print the SCSI host number "hostN:",
which will not change for the life of the lport.
Still show the FC_ID on lport messages.

Also, add a macro to FC_RPORT_ID_DBG for rport debugging where there's
no rdata structure involved.  It takes the lport and port_id as parameters.
Use this in fc_rport_recv_plogi_req() and fc_rport_recv_logo_req().

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:03 -05:00
Joe Eykholt
beb29a6d42 [SCSI] libfc: remove extra semicolons from debug macros
This is unlikely to cause any problems, but the libfc debug macros
introduce extra undesirable semicolons.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:03 -05:00
Chandra Seetharaman
6c10db72c9 [SCSI] scsi_dh: Reference count scsi_dh_attach
Problem reported: http://marc.info/?l=dm-devel&m=124585978305866&w=2

scsi_dh does not do a refernce count for attach/detach, and this affects
the way it is supposed to work with multipath when a device is not
in the dev_list of the hardware handler.

This patch adds a reference count that counts each attach.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:51:51 -05:00
Linus Torvalds
4dfd79e7b4 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon: add GET_PARAM/INFO support for Z pipes
  drm/radeon/kms: add r100/r200 OQ support.
  drm: Fix sysfs device confusion.
  drm/radeon/kms: implement the bo busy ioctl properly.
2009-08-21 10:45:09 -07:00
Linus Torvalds
f4b0373b26 Make bitmask 'and' operators return a result code
When 'and'ing two bitmasks (where 'andnot' is a variation on it), some
cases want to know whether the result is the empty set or not.  In
particular, the TLB IPI sending code wants to do cpumask operations and
determine if there are any CPU's left in the final set.

So this just makes the bitmask (and cpumask) functions return a boolean
for whether the result has any bits set.

Cc: stable@kernel.org (2.6.30, needed by TLB shootdown fix)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-21 09:26:15 -07:00
Alex Deucher
f779b3e513 drm/radeon: add GET_PARAM/INFO support for Z pipes
Needed for occlusion queries on rv530 chips.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-08-21 19:10:30 +10:00
Dave Airlie
e3b2415e28 drm/radeon/kms: implement the bo busy ioctl properly.
The previous patch assumes the ioctl already existed, when
it actually didn't.

It also didn't return the correct error code.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-08-21 09:51:30 +10:00
Linus Torvalds
cad2c8fd9b Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/kms: teardown crtc correctly when fb is destroyed.
  drm/kms/radeon: cleanup combios TV table like DDX.
  drm/radeon/kms: memset the allocated framebuffer before using it.
  drm/radeon/kms: although LVDS might be possible on crtc 1 don't do it.
  drm/radeon/kms: implement bo busy check + current domain
  drm/radeon/kms: cut down indirects in register accesses.
  drm/radeon/kms: Fix up vertical blank interrupt support.
  drm/radeon/kms: add rv530 R300_SU_REG_DEST + reloc for ZPASS_ADDR
  drm/edid: fixup detailed timings like the X server.
  drm/radeon/kms: Add specific rs690 authorized register table
2009-08-19 10:38:36 -07:00
KOSAKI Motohiro
0753ba01e1 mm: revert "oom: move oom_adj value"
The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
the mm_struct.  It was a very good first step for sanitize OOM.

However Paul Menage reported the commit makes regression to his job
scheduler.  Current OOM logic can kill OOM_DISABLED process.

Why? His program has the code of similar to the following.

	...
	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
	...
	if (vfork() == 0) {
		set_oom_adj(0); /* Invoked child can be killed */
		execve("foo-bar-cmd");
	}
	....

vfork() parent and child are shared the same mm_struct.  then above
set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
change oom_adj for vfork() parent.  Then, vfork() parent (job scheduler)
lost OOM immune and it was killed.

Actually, fork-setting-exec idiom is very frequently used in userland program.
We must not break this assumption.

Then, this patch revert commit 2ff05b2b and related commit.

Reverted commit list
---------------------
- commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 8123681022 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b57 (mm: copy over oom_adj value at fork time)

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-18 16:31:13 -07:00
Linus Torvalds
8486a0f95c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (60 commits)
  net: restore gnet_stats_basic to previous definition
  NETROM: Fix use of static buffer
  e1000e: fix use of pci_enable_pcie_error_reporting
  e1000e: WoL does not work on 82577/82578 with manageability enabled
  cnic: Fix locking in init/exit calls.
  cnic: Fix locking in start/stop calls.
  bnx2: Use mutex on slow path cnic calls.
  cnic: Refine registration with bnx2.
  cnic: Fix symbol_put_addr() panic on ia64.
  gre: Fix MTU calculation for bound GRE tunnels
  pegasus: Add new device ID.
  drivers/net: fixed drivers that support netpoll use ndo_start_xmit()
  via-velocity: Fix test of mii_status bit VELOCITY_DUPLEX_FULL
  rt2x00: fix memory corruption in rf cache, add a sanity check
  ixgbe: Fix receive on real device when VLANs are configured
  ixgbe: Do not return 0 in ixgbe_fcoe_ddp() upon FCP_RSP in DDP completion
  netxen: free napi resources during detach
  netxen: remove netxen workqueue
  ixgbe: fix issues setting rx-usecs with legacy interrupts
  can: fix oops caused by wrong rtnl newlink usage
  ...
2009-08-18 13:55:01 -07:00
Eric Dumazet
c1a8f1f1c8 net: restore gnet_stats_basic to previous definition
In 5e140dfc1f "net: reorder struct Qdisc
for better SMP performance" the definition of struct gnet_stats_basic
changed incompatibly, as copies of this struct are shipped to
userland via netlink.

Restoring old behavior is not welcome, for performance reason.

Fix is to use a private structure for kernel, and
teach gnet_stats_copy_basic() to convert from kernel to user land,
using legacy structure (struct gnet_stats_basic)

Based on a report and initial patch from Michael Spang.

Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-17 21:33:49 -07:00
Eric Paris
1d9959734a security: define round_hint_to_min in !CONFIG_SECURITY
Fix the header files to define round_hint_to_min() and to define
mmap_min_addr_handler() in the !CONFIG_SECURITY case.

Built and tested with !CONFIG_SECURITY

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-08-17 15:09:27 +10:00
Eric Paris
788084aba2 Security/SELinux: seperate lsm specific mmap_min_addr
Currently SELinux enforcement of controls on the ability to map low memory
is determined by the mmap_min_addr tunable.  This patch causes SELinux to
ignore the tunable and instead use a seperate Kconfig option specific to how
much space the LSM should protect.

The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
permissions will always protect the amount of low memory designated by
CONFIG_LSM_MMAP_MIN_ADDR.

This allows users who need to disable the mmap_min_addr controls (usual reason
being they run WINE as a non-root user) to do so and still have SELinux
controls preventing confined domains (like a web server) from being able to
map some area of low memory.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-08-17 15:09:11 +10:00
Eric Paris
9c0d90103c Capabilities: move cap_file_mmap to commoncap.c
Currently we duplicate the mmap_min_addr test in cap_file_mmap and in
security_file_mmap if !CONFIG_SECURITY.  This patch moves cap_file_mmap
into commoncap.c and then calls that function directly from
security_file_mmap ifndef CONFIG_SECURITY like all of the other capability
checks are done.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-08-17 15:08:35 +10:00
Dave Airlie
cefb87efc9 drm/radeon/kms: implement bo busy check + current domain
This implements the busy ioctl along with a current domain check.
returns 0 or -EBUSY
puts the current domain no matter what the answer.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-08-17 12:28:56 +10:00
Linus Torvalds
3493e84de6 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter: Report the cloning task as parent on perf_counter_fork()
  perf_counter: Fix an ipi-deadlock
  perf: Rework/fix the whole read vs group stuff
  perf_counter: Fix swcounter context invariance
  perf report: Don't show unresolved DSOs and symbols when -S/-d is used
  perf tools: Add a general option to enable raw sample records
  perf tools: Add a per tracepoint counter attribute to get raw sample
  perf_counter: Provide hw_perf_counter_setup_online() APIs
  perf list: Fix large list output by using the pager
  perf_counter, x86: Fix/improve apic fallback
  perf record: Add missing -C option support for specifying profile cpu
  perf tools: Fix dso__new handle() to handle deleted DSOs
  perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available
  perf report: Show the tid too in -D
  perf record: Fix .tid and .pid fill-in when synthesizing events
  perf_counter, x86: Fix generic cache events on P6-mobile CPUs
  perf_counter, x86: Fix lapic printk message
2009-08-13 12:24:33 -07:00
Linus Torvalds
919aa96a9c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Fix handling of bad requeue syscall pairing
  futex: Fix compat_futex to be same as futex for REQUEUE_PI
  locking, sched: Give waitqueue spinlocks their own lockdep classes
  futex: Update futex_q lock_ptr on requeue proxy lock
2009-08-13 12:09:16 -07:00
Peter Zijlstra
3dab77fb1b perf: Rework/fix the whole read vs group stuff
Replace PERF_SAMPLE_GROUP with PERF_SAMPLE_READ and introduce
PERF_FORMAT_GROUP to deal with group reads in a more generic
way.

This allows you to get group reads out of read() as well.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey J Ashford <cjashfor@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
LKML-Reference: <20090813103655.117411814@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-13 12:58:04 +02:00
Ingo Molnar
28402971d8 perf_counter: Provide hw_perf_counter_setup_online() APIs
Provide weak aliases for hw_perf_counter_setup_online(). This is
used by the BTS patches (for v2.6.32), but it interacts with
fixes so propagate this upstream. (it has no effect as of yet)

Also export perf_counter_output() to architecture code.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-13 10:13:22 +02:00
Trond Myklebust
1ae88b2e44 NFS: Fix an O_DIRECT Oops...
We can't call nfs_readdata_release()/nfs_writedata_release() without
first initialising and referencing args.context. Doing so inside
nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment()
causes an Oops.

We should rather be calling nfs_readdata_free()/nfs_writedata_free() in
those cases.

Looking at the O_DIRECT code, the "struct nfs_direct_req" is already
referencing the nfs_open_context for us. Since the readdata and writedata
structures carry a reference to that, we can simplify things by getting rid
of the extra nfs_open_context references, so that we can replace all
instances of nfs_readdata_release()/nfs_writedata_release().

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-12 08:21:39 -07:00
Linus Torvalds
d00aa6695b Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
  perf_counter: Zero dead bytes from ftrace raw samples size alignment
  perf_counter: Subtract the buffer size field from the event record size
  perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
  perf_counter: Correct PERF_SAMPLE_RAW output
  perf tools: callchain: Fix bad rounding of minimum rate
  perf_counter tools: Fix libbfd detection for systems with libz dependency
  perf: "Longum est iter per praecepta, breve et efficax per exempla"
  perf_counter: Fix a race on perf_counter_ctx
  perf_counter: Fix tracepoint sampling to be part of generic sampling
  perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
  perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
  perf tools: callchain: Fix 'perf report' display to be callchain by default
  perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
  perf record: Fix the -A UI for empty or non-existent perf.data
  perf util: Fix do_read() to fail on EOF instead of busy-looping
  perf list: Fix the output to not include tracepoints without an id
  perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
  perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
  perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
  perf report: Fix per task mult-counter stat reporting
  ...
2009-08-10 11:48:51 -07:00
Frederic Weisbecker
1853db0e02 perf_counter: Zero dead bytes from ftrace raw samples size alignment
After aligning the ftrace raw samples, there are dead bytes storing
random data from the stack. We don't want to leak these to userspace,
then zero these out.

Before:

	0x2de88 [0x50]: event: 9
	.
	. ... raw event: size 80 bytes
	.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
	.  0010:  68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00  h...h...,......
	.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
	.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
	.  0040:  68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f  h...@.F........
                                                      ^  ^  ^  ^
                                                         Leak

After:

	0x2d318 [0x50]: event: 9
	.
	. ... raw event: size 80 bytes
	.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
	.  0010:  68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00  h...h...h......
	.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
	.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
	.  0040:  68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00  h.....F........
                                                      ^  ^  ^  ^
							 Fixed

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249915116-5210-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
2009-08-10 16:51:19 +02:00
Frederic Weisbecker
304703aba3 perf_counter: Subtract the buffer size field from the event record size
We compute the perf raw sample size by aligning the raw ftrace
event size plus the buffer size field itself. We do that
instead of aligning only the perf raw sample size, so that we
might economize some in some cases.

But this buffer size field is not stored in the perf raw
sample, we must then substract its size from the buffer once we
computed the alignment unless we may get a useless u32 field in
the buffer.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090810141129.GA5124@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-10 16:18:50 +02:00
Peter Zijlstra
2fc391112f locking, sched: Give waitqueue spinlocks their own lockdep classes
Give waitqueue spinlocks their own lockdep classes when they
are initialised from init_waitqueue_head().  This means that
struct wait_queue::func functions can operate other waitqueues.

This is used by CacheFiles to catch the page from a backing fs
being unlocked and to wake up another thread to take a copy of
it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Cc: linux-cachefs@redhat.com
Cc: torvalds@osdl.org
Cc: akpm@linux-foundation.org
LKML-Reference: <20090810113305.17284.81508.stgit@warthog.procyon.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-10 14:43:09 +02:00
Peter Zijlstra
a044560c3a perf_counter: Correct PERF_SAMPLE_RAW output
PERF_SAMPLE_* output switches should unconditionally output the
correct format, as they are the only way to unambiguously parse
the PERF_EVENT_SAMPLE data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249896447.17467.74.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-10 11:33:09 +02:00
Linus Torvalds
17d11ba149 Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Avoid redelivery of edge interrupt before next edge
  KVM: MMU: limit rmap chain length
  KVM: ia64: fix build failures due to ia64/unsigned long mismatches
  KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc
  KVM: fix ack not being delivered when msi present
  KVM: s390: fix wait_queue handling
  KVM: VMX: Fix locking imbalance on emulation failure
  KVM: VMX: Fix locking order in handle_invalid_guest_state
  KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages
  KVM: SVM: force new asid on vcpu migration
  KVM: x86: verify MTRR/PAT validity
  KVM: PIT: fix kpit_elapsed division by zero
  KVM: Fix KVM_GET_MSR_INDEX_LIST
2009-08-09 14:58:21 -07:00
Frederic Weisbecker
3a43ce68ae perf_counter: Fix tracepoint sampling to be part of generic sampling
Based on Peter's comments, make tracepoint sampling generic
just like all the other sampling bits are. This is a rename
with no code changes:

- PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
- struct perf_tracepoint_record to perf_raw_record

We want the system in place that transport tracepoints raw
samples events into the perf ring buffer to be generalized and
usable by any type of counter.

Reported-by; Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 12:54:45 +02:00
Frederic Weisbecker
f413cdb80c perf_counter: Fix/complete ftrace event records sampling
This patch implements the kernel side support for ftrace event
record sampling.

A new counter sampling attribute is added:

   PERF_SAMPLE_TP_RECORD

which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.

Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:

 perf record -f -F 1 -a -e workqueue:workqueue_execution
 perf report -D

 0x21e18 [0x48]: event: 9
 .
 . ... raw event: size 72 bytes
 .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
 .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
 .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
 .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
 .  0040:  e0 b1 31 81 ff ff ff ff                          .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33

The raw ftrace binary record starts at offset 0020.

Translation:

 struct trace_entry {
	type		= 0x2b = 43;
	flags		= 1;
	preempt_count	= 2;
	pid		= 0xa = 10;
	tgid		= 0xa = 10;
 }

 thread_comm = "events/1"
 thread_pid  = 0xa = 10;
 func	    = 0xffffffff8131b1e0 = flush_to_ldisc()

What will come next?

 - Userspace support ('perf trace'), 'flight data recorder' mode
   for perf trace, etc.

 - The unconditional copy from the profiling callback brings
   some costs however if someone wants no such sampling to
   occur, and needs to be fixed in the future. For that we need
   to have an instant access to the perf counter attribute.
   This is a matter of a flag to add in the struct ftrace_event.

 - Take care of the events recursivity! Don't ever try to record
   a lock event for example, it seems some locking is used in
   the profiling fast path and lead to a tracing recursivity.
   That will be fixed using raw spinlock or recursivity
   protection.

 - [...]

 - Profit! :-)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 12:53:48 +02:00
Peter Zijlstra
3a6593050f perf_counter, ftrace: Fix perf_counter integration
Adds possible second part to the assign argument of TP_EVENT().

  TP_perf_assign(
	__perf_count(foo);
	__perf_addr(bar);
  )

Which, when specified make the swcounter increment with @foo instead
of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
associated with the event) when this triggers a counter overflow.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 12:47:25 +02:00
Linus Torvalds
fb385003c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hch/xfs-icache-races
* git://git.kernel.org/pub/scm/linux/kernel/git/hch/xfs-icache-races:
  xfs: fix freeing of inodes not yet added to the inode cache
  vfs: add __destroy_inode
  vfs: fix inode_init_always calling convention
2009-08-07 18:53:44 -07:00
Linus Torvalds
da758ddede Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter: Fix double list iteration in per task precise stats
  perf: Auto-detect libelf
  perf symbol: Fix symbol parsing in certain cases: use the build-id as a symlink
  perf_counter/powerpc: Check oprofile_cpu_type for NULL before using it
  ftrace: Fix perf-tracepoint OOPS
  perf report: Add missing command line options to man page
  perf: Auto-detect libbfd
  perf report: Make --sort comm,dso,symbol the default
2009-08-07 10:43:07 -07:00
Linus Torvalds
389623fef0 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  jffs2: Fix return value from jffs2_do_readpage_nolock()
  mtd: mtdblock: introduce mtdblks_lock
  mtd: remove 'SBC8240 Wind River' Device Driver Code
  mtd: OneNAND: OMAP2/3: free GPMC CS on module removal
  mtd: OneNAND: fix incorrect bufferram offset
  mtd: blkdevs: do not forget to get MTD devices
  mtd: fix the conversion from dev to mtd_info
  mtd: let include/linux/mtd/partitions.h stand on its own
2009-08-07 10:42:31 -07:00
Linus Torvalds
385861206c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: matrix_keypad - make matrix keymap size dynamic
  Input: wistron_btns - support Prestigio Wifi RF kill button
  Input: i8042 - add Asus G1S to noloop exception list
2009-08-07 10:42:14 -07:00
Phillip Lougher
daeb6b6fbe bzip2/lzma/gzip: fix comments describing decompressor API
Fix and improve comments in decompress/generic.h that describe the
decompressor API.  Also remove an unused definition, and rename INBUF_LEN
in lib/decompress_inflate.c to conform to bzip2/lzma naming.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:56 -07:00
KAMEZAWA Hiroyuki
4bfc44958e mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware
At first, init_task's mems_allowed is initialized as this.
 init_task->mems_allowed == node_state[N_POSSIBLE]

And cpuset's top_cpuset mask is initialized as this
 top_cpuset->mems_allowed = node_state[N_HIGH_MEMORY]

Before 2.6.29:
policy's mems_allowed is initialized as this.

  1. update tasks->mems_allowed by its cpuset->mems_allowed.
  2. policy->mems_allowed = nodes_and(tasks->mems_allowed, user's mask)

Updating task's mems_allowed in reference to top_cpuset's one.
cpuset's mems_allowed is aware of N_HIGH_MEMORY, always.

In 2.6.30: After commit 58568d2a82
("cpuset,mm: update tasks' mems_allowed in time"), policy's mems_allowed
is initialized as this.

  1. policy->mems_allowd = nodes_and(task->mems_allowed, user's mask)

Here, if task is in top_cpuset, task->mems_allowed is not updated from
init's one.  Assume user excutes command as #numactrl --interleave=all
,....

  policy->mems_allowd = nodes_and(N_POSSIBLE, ALL_SET_MASK)

Then, policy's mems_allowd can includes a possible node, which has no pgdat.

MPOL's INTERLEAVE just scans nodemask of task->mems_allowd and access this
directly.

  NODE_DATA(nid)->zonelist even if NODE_DATA(nid)==NULL

Then, what's we need is making policy->mems_allowed be aware of
N_HIGH_MEMORY.  This patch does that.  But to do so, extra nodemask will
be on statck.  Because I know cpumask has a new interface of
CPUMASK_ALLOC(), I added it to node.

This patch stands on old behavior.  But I feel this fix itself is just a
Band-Aid.  But to do fundametal fix, we have to take care of memory
hotplug and it takes time.  (task->mems_allowd should be N_HIGH_MEMORY, I
think.)

mpol_set_nodemask() should be aware of N_HIGH_MEMORY and policy's nodemask
should be includes only online nodes.

In old behavior, this is guaranteed by frequent reference to cpuset's
code.  Now, most of them are removed and mempolicy has to check it by
itself.

To do check, a few nodemask_t will be used for calculating nodemask.  But,
size of nodemask_t can be big and it's not good to allocate them on stack.

Now, cpumask_t has CPUMASK_ALLOC/FREE an easy code for get scratch area.
NODEMASK_ALLOC/FREE shoudl be there.

[akpm@linux-foundation.org: cleanups & tweaks]
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:55 -07:00
Christoph Hellwig
2e00c97e2c vfs: add __destroy_inode
When we want to tear down an inode that lost the add to the cache race
in XFS we must not call into ->destroy_inode because that would delete
the inode that won the race from the inode cache radix tree.

This patch provides the __destroy_inode helper needed to fix this,
the actual fix will be in th next patch.  As XFS was the only reason
destroy_inode was exported we shift the export to the new __destroy_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-08-07 14:38:29 -03:00
Christoph Hellwig
54e346215e vfs: fix inode_init_always calling convention
Currently inode_init_always calls into ->destroy_inode if the additional
initialization fails.  That's not only counter-intuitive because
inode_init_always did not allocate the inode structure, but in case of
XFS it's actively harmful as ->destroy_inode might delete the inode from
a radix-tree that has never been added.  This in turn might end up
deleting the inode for the same inum that has been instanciated by
another process and cause lots of cause subtile problems.

Also in the case of re-initializing a reclaimable inode in XFS it would
free an inode we still want to keep alive.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-08-07 14:38:25 -03:00
Eric Miao
d82f1c3534 Input: matrix_keypad - make matrix keymap size dynamic
Remove assumption on the shift and size of rows/columns form
matrix_keypad driver.

Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2009-08-05 22:20:14 -07:00
Peter Zijlstra
af6af30c0f ftrace: Fix perf-tracepoint OOPS
Not all tracepoints are created equal, in specific the ftrace
tracepoints are created with TRACE_EVENT_FORMAT() which does
not generate the needed bits to tie them into perf counters.

For those events, don't create the 'id' file and fail
->profile_enable when their ID is specified through other
means.

Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1249497664.5890.4.camel@laptop>
[ v2: fix build error in the !CONFIG_EVENT_PROFILE case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-06 06:26:09 +02:00
Michael S. Tsirkin
5116d8f6b9 KVM: fix ack not being delivered when msi present
kvm_notify_acked_irq does not check irq type, so that it sometimes
interprets msi vector as irq.  As a result, ack notifiers are not
called, which typially hangs the guest.  The fix is to track and
check irq type.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-05 14:03:43 +03:00
Alex Deucher
6502fbfaf8 drm/radeon: Add support for RS880 chips
These are new AMD IGP chips

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-08-05 12:07:09 +10:00