Implement the hsave MSR, that gives the VCPU a GPA to save the
old guest state in.
v2 allows userspace to save/restore hsave
v4 dummys out the hsave MSR, so we use a host page
v6 remembers the guest's hsave and exports the MSR
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the GIF flag and the clgi and stgi instructions that
set this flag. Only if the flag is set (default), interrupts can be received by
the CPU.
To keep the information about that somewhere, this patch adds a new hidden
flags vector. that is used to store information that does not go into the
vmcb, but is SVM specific.
I tried to write some code to make -no-kvm-irqchip work too, but the first
level guest won't even boot with that atm, so I ditched it.
v2 moves the hflags to x86 generic code
v3 makes use of the new permission helper
v6 only enables interrupt_window if GIF=1
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
These are helpers for the nested SVM implementation.
- nsvm_printk implements a debug printk variant
- nested_svm_do calls a handler that can accesses gpa-based memory
v3 makes use of the new permission checker
v6 changes:
- streamline nsvm_debug()
- remove printk(KERN_ERR)
- SVME check before CPL check
- give GP error code
- use new EFER constant
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
MSR_EFER_SVME_MASK, MSR_VM_CR and MSR_VM_HSAVE_PA are set in KVM
specific headers. Linux does have nice header files to collect
EFER bits and MSR IDs, so IMHO we should put them there.
While at it, I also changed the naming scheme to match that
of the other defines.
(introduced in v6)
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The current VINTR intercept setters don't look clean to me. To make
the code easier to read and enable the possibilty to trap on a VINTR
set, this uses a helper function to set the VINTR intercept.
v2 uses two distinct functions for setting and clearing the bit
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
With a sufficiently new compiler and binutils, code which wasn't
previously generating .eh_frame sections has begun to. Certain
architectures (powerpc, in this case) may generate unexpected relocation
formats in response to this, preventing modules from loading.
While the new relocation types should probably be handled, revert to the
previous behaviour with regards to generation of .eh_frame sections.
(This was reported against Fedora, which appears to be the only distro
doing any building against gcc-4.4 at present: RH bz#486545.)
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Alexandre Oliva <aoliva@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert the change to the orphan dates of Windows 95, DOS, compression.
Add a new orphan date for OS/2.
Signed-off-by: Jody McIntyre <scjody@sun.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
ucc_geth: Fix oops when using fixed-link support
dm9000: locking bugfix
net: update dnet.c for bus_id removal
dnet: DNET should depend on HAS_IOMEM
dca: add missing copyright/license headers
nl80211: Check that function pointer != NULL before using it
sungem: missing net_device_ops
be2net: fix to restore vlan ids into BE2 during a IF DOWN->UP cycle
be2net: replenish when posting to rx-queue is starved in out of mem conditions
bas_gigaset: correctly allocate USB interrupt transfer buffer
smsc911x: reset last known duplex and carrier on open
sh_eth: Fix mistake of the address of SH7763
sh_eth: Change handling of IRQ
netns: oops in ip[6]_frag_reasm incrementing stats
net: kfree(napi->skb) => kfree_skb
net: fix sctp breakage
ipv6: fix display of local and remote sit endpoints
net: Document /proc/sys/net/core/netdev_budget
tulip: fix crash on iface up with shirq debug
virtio_net: Make virtio_net support carrier detection
...
This patch fixes bug #12208:
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject : uml is very slow on 2.6.28 host
This turned out to be not a scheduler regression, but an already
existing problem in ptrace being triggered by subtle scheduler
changes.
The problem is this:
- task A is ptracing task B
- task B stops on a trace event
- task A is woken up and preempts task B
- task A calls ptrace on task B, which does ptrace_check_attach()
- this calls wait_task_inactive(), which sees that task B is still on the runq
- task A goes to sleep for a jiffy
- ...
Since UML does lots of the above sequences, those jiffies quickly add
up to make it slow as hell.
This patch solves this by not rescheduling in read_unlock() after
ptrace_stop() has woken up the tracer.
Thanks to Oleg Nesterov and Ingo Molnar for the feedback.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Grant picked up the wrong version of "Respect _PAGE_COHERENT on classic
ppc32 SW" (commit a4bd6a93c3)
It was missing the code to actually deal with the fixup of
_PAGE_COHERENT based on the CPU feature.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
commit b1c4a9dddf ("ucc_geth: Change
uec phy id to the same format as gianfar's") introduced a regression
in the ucc_geth driver that causes this oops when fixed-link is used:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc0151270
Oops: Kernel access of bad area, sig: 11 [#1]
TMCUTU
NIP: c0151270 LR: c0151270 CTR: c0017760
REGS: cf81fa60 TRAP: 0300 Not tainted (2.6.29-rc8)
MSR: 00009032 <EE,ME,IR,DR> CR: 24024042 XER: 20000000
DAR: 00000000, DSISR: 20000000
TASK = cf81cba0[1] 'swapper' THREAD: cf81e000
GPR00: c0151270 cf81fb10 cf81cba0 00000000 c0272e20 c025f354 00001e80
cf86b08c
GPR08: d1068200 cffffb74 06000000 d106c200 42024042 10085148 0fffd000
0ffc81a0
GPR16: 00000001 00000001 00000000 007ffeb0 00000000 0000c000 cf83f36c
cf83f000
GPR24: 00000030 cf83f360 cf81fb20 00000000 d106c200 20000000 00001e80
cf83f360
NIP [c0151270] ucc_geth_open+0x330/0x1efc
LR [c0151270] ucc_geth_open+0x330/0x1efc
Call Trace:
[cf81fb10] [c0151270] ucc_geth_open+0x330/0x1efc (unreliable)
[cf81fba0] [c0187638] dev_open+0xbc/0x12c
[cf81fbc0] [c0187e38] dev_change_flags+0x8c/0x1b0
This patch fixes the issue by removing offending (and somewhat
duplicate) code from init_phy() routine, and changes _probe()
function to use uec_mdio_bus_name().
Also, since we fully construct phy_bus_id in the _probe() routine,
we no longer need ->phy_address and ->mdio_bus fields in
ucc_geth_info structure.
I wish the patch would be a bit shorter, but it seems like the only
way to fix the issue in a sane way. Luckily, the patch has been
tested with real PHYs and fixed-link, so no further regressions
expected.
Reported-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Tested-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a locking bug in the dm9000 driver. It calls
request_irq() without setting IRQF_DISABLED ... which is
correct for handlers that support IRQ sharing, since that
behavior is not guaranteed for shared IRQs. However, its
IRQ handler then wrongly assumes that IRQs are blocked.
So the fix just uses the right spinlock primitives in the
IRQ handler.
NOTE: this is a classic example of the type of bug which
lockdep currently masks by forcibly setting IRQF_DISABLED
on IRQ handlers that did not request that flag.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'fix-includes' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: merge the non-MMU and MMU versions of siginfo.h
m68k: use the MMU version of unistd.h for all m68k platforms
m68k: merge the non-MMU and MMU versions of signal.h
m68k: merge the non-MMU and MMU versions of ptrace.h
m68k: use MMU version of setup.h for both MMU and non-MMU
m68k: merge the non-MMU and MMU versions of sigcontext.h
m68k: merge the non-MMU and MMU versions of swab.h
m68k: merge the non-MMU and MMU versions of param.h
Update all previous incarnations of my email address to the correct one.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If ecryptfs_encrypted_view or ecryptfs_xattr_metadata were being
specified as mount options, a NULL pointer dereference of crypt_stat
was possible during lookup.
This patch moves the crypt_stat assignment into
ecryptfs_lookup_and_interpose_lower(), ensuring that crypt_stat
will not be NULL before we attempt to dereference it.
Thanks to Dan Carpenter and his static analysis tool, smatch, for
finding this bug.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Acked-by: Dustin Kirkland <kirkland@canonical.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When allocating the memory used to store the eCryptfs header contents, a
single, zeroed page was being allocated with get_zeroed_page().
However, the size of an eCryptfs header is either PAGE_CACHE_SIZE or
ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE (8192), whichever is larger, and is
stored in the file's private_data->crypt_stat->num_header_bytes_at_front
field.
ecryptfs_write_metadata_to_contents() was using
num_header_bytes_at_front to decide how many bytes should be written to
the lower filesystem for the file header. Unfortunately, at least 8K
was being written from the page, despite the chance of the single,
zeroed page being smaller than 8K. This resulted in random areas of
kernel memory being written between the 0x1000 and 0x1FFF bytes offsets
in the eCryptfs file headers if PAGE_SIZE was 4K.
This patch allocates a variable number of pages, calculated with
num_header_bytes_at_front, and passes the number of allocated pages
along to ecryptfs_write_metadata_to_contents().
Thanks to Florian Streibelt for reporting the data leak and working with
me to find the problem. 2.6.28 is the only kernel release with this
vulnerability. Corresponds to CVE-2009-0787
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Acked-by: Dustin Kirkland <kirkland@canonical.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>
Cc: Greg KH <greg@kroah.com>
Cc: dann frazier <dannf@dannf.org>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Florian Streibelt <florian@f-streibelt.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a regression introduced when we switched to using the core
pci_set_power_state(). The chip seems to need the state to be written
over and over again until it sticks, so we do that.
Note that the code is a bit blunt, without timeout, etc... but that's
pretty much because I put back in there the code exactly as it used to
be before the regression. I still add a call to pci_set_power_state()
at the end so that ACPI gets called appropriately on x86.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Raymond Wooninck <tittiatcoke@gmail.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In two dca files copyright and license headers are missing.
This patch adds them there.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NL80211_CMD_GET_MESH_PARAMS and NL80211_CMD_SET_MESH_PARAMS handlers
did not verify whether a function pointer is NULL (not supported by
the driver) before trying to call the function. The former nl80211
command is available for unprivileged users, too, so this can
potentially allow normal users to kill networking (or worse..) if
mac80211 is built without CONFIG_MAC80211_MESH=y.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sungem driver only got partially converted to net_device_ops.
Since this could cause bugs, please push this to 2.6.29
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a patch to reconfigure vlan-ids during an i/f down/up cycle
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a patch to replenish the rx-queue when it is in a starved
state (due to out-of-mem conditions)
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The libaio test harness turned up a problem whereby lookup_ioctx on a
bogus io context was returning the 1 valid io context from the list
(harness/cases/3.p).
Because of that, an extra put_iocontext was done, and when the process
exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio
(since we expect a users count of 1 and instead get 0).
The problem was introduced by "aio: make the lookup_ioctx() lockless"
(commit abf137dd77).
Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not
return with a NULL tpos at the end of the loop, even if the entry was
not found.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove a source of fput() call from inside IRQ context. Myself, like Eric,
wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was
able to, with the attached test program. Independently from this, the bug is
conceptually there, so we might be better off fixing it. This patch adds an
optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd.
Playing with ->f_count directly is not pretty in general, but the alternative
here would be to add a brand new delayed fput() infrastructure, that I'm not
sure is worth it.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sam Ravnborg says:
"We have several architectures that plays strange games with $(CC) and
$(CROSS_COMPILE).
So we need to postpone any use of $(call cc-option..) until we have
included the arch specific Makefile so we try with the correct $(CC)
version."
Requested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] make page table upgrade work again
[S390] make page table walking more robust
[S390] Dont check for pfn_valid() in uaccess_pt.c
[S390] ftrace/mcount: fix kernel stack backchain
[S390] topology: define SD_MC_INIT to fix performance regression
[S390] __div64_31 broken for CONFIG_MARCH_G5
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: Clear space_info full when adding new devices
Btrfs: Fix locking around adding new space_info
Nick Piggin noticed this (very unlikely) race between setting a page
dirty and creating the buffers for it - we need to hold the mapping
private_lock until we've set the page dirty bit in order to make sure
that create_empty_buffers() might not build up a set of buffers without
the dirty bits set when the page is dirty.
I doubt anybody has ever hit this race (and it didn't solve the issue
Nick was looking at), but as Nick says: "Still, it does appear to solve
a real race, which we should close."
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes sure that gcc doesn't try to optimize away wrapping
arithmetic, which the kernel occasionally uses for overflow testing, ie
things like
if (ptr + offset < ptr)
which technically is undefined for non-unsigned types. See
http://bugzilla.kernel.org/show_bug.cgi?id=12597
for details.
Not all versions of gcc support it, so we need to make it conditional
(it looks like it was introduced in gcc-3.4).
Reminded-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When you compile kernel on Sparc64 with heap memory checking and type
"cat /proc/iomem", you get a crash, because pointers in struct
resource are uninitialized.
Most code fills struct resource with zeros, so I assume that it is
responsibility of the caller of request_resource to initialized it,
not the responsibility of request_resource functuion.
After 2.6.29 is out, there could be a check for uninitialized fields
added to request_resource to avoid crashes like this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise it might interrupt switch_to() midstream and use
half-cooked register window state.
Reported-by: Chris Torek <chris.torek@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every USB transfer buffer has to be allocated individually by kmalloc.
Impact: bugfix, no functional change
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Tested-by: Kolja Waschk <kawk@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
smsc911x_phy_adjust_link is called periodically by the phy layer (as
it's run in polling mode), and it only updates the hardware when it sees
a change in duplex or carrier. This patch clears the last known values
every time the interface is brought up, instead of only when the module
is loaded.
Without this patch the adjust_link function never updates the hardware
after an ifconfig down; ifconfig up. On a full duplex link this causes
the tx error counter to increment, even though packets are correctly
transmitted, as the default MAC_CR register setting is for half duplex.
The tx errors are "no carrier" errors, which should be ignored in
full-duplex mode. When MAC_CR is set to "full duplex" mode they are
correctly ignored by the hardware.
Note that even with this patch the tx error counter can increment if
packets are transmitted between "ifconfig up" and the first phy poll
interval. An improved solution would use the phy interrupt with phylib,
but I haven't managed to make this work 100% robustly yet.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Address of SH_TSU_ADDR and ARSTR of SH7763 was wrong.
This revise it.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handling of IRQ of the SH7763/SH7764 CPU which sh_eth supported was
changed.
This revises it for this change.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev can be NULL in ip[6]_frag_reasm for skb's coming from RAW sockets.
Quagga's OSPFD sends fragmented packets on a RAW socket, when netfilter
conntrack reassembles them on the OUTPUT path you hit this code path.
You can test it with something like "hping2 -0 -d 2000 -f AA.BB.CC.DD"
With help from Jarek Poplawski.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sk_buff pointers should be freed with kfree_skb.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
broken by commit 5e739d1752aca4e8f3e794d431503bfca3162df4; AFAICS should
be -stable fodder as well...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Aced-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the regressions cause by
commit 1326c3d5a4
(v2.6.28-rc6-461-g23a12b1) broke the display of local and remote
addresses of an SIT tunnel in iproute2.
nt->parms is used by ipip6_tunnel_init() and therefore need to be
initialized first.
Tracked as http://bugzilla.kernel.org/show_bug.cgi?id=12868
Reported-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NAPI poll parameter netdev_budget is not documented in
kernel-docs. Since it may have a substantial effect on at least some
network loads, it should be.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tulip is currently doing request_irq before it has done its
initialization. This is usually not a problem because it hasn't
enable interrupts yet, but with DEBUG_SHIRQ on, we call the irq handler
when registering the interrupt as a sanity check.
This can result in a NULL ptr dereference, so call tulip_init_ring
before request_irq, and add a free_ring function to do the freeing
now shared with tulip_close.
Tested with a shell loop running ifup, ifdown in a loop a few hundred
times with DEBUG_SHIRQ on.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>