In the OF 'translations' property, the template TTEs in the mappings
never specify the executable bit. This is the case even though some
of these mappings are for OF's code segment.
Therefore, we need to force the execute bit on in every mapping.
This problem can only really trigger on Niagara/sun4v machines and the
history behind this is a little complicated.
Previous to sun4v, the sun4u TTE entries lacked a hardware execute
permission bit. So OF didn't have to ever worry about setting
anything to handle executable pages. Any valid TTE loaded into the
I-TLB would be respected by the chip.
But sun4v Niagara chips have a real hardware enforced executable bit
in their TTEs. So it has to be set or else the I-TLB throws an
instruction access exception with type code 6 (protection violation).
We've been extremely fortunate to not get bitten by this in the past.
The best I can tell is that the OF's mappings for it's executable code
were mapped using permanent locked mappings on sun4v in the past.
Therefore, the fact that we didn't have the exec bit set in the OF
translations we would use did not matter in practice.
Thanks to Greg Onufer for helping me track this down.
Signed-off-by: David S. Miller <davem@davemloft.net>
In the rds_iw_mr_pool struct the free_pinned field keeps track of
memory pinned by free MRs. While this field is incremented properly
upon allocation, it is never decremented upon unmapping. This would
cause the rds_rdma module to crash the kernel upon unloading, by
triggering the BUG_ON in the rds_iw_destroy_mr_pool function.
This change keeps track of the MRs that become unpinned, so that
free_pinned can be decremented appropriately.
Signed-off-by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off-by: Steve Wise <swise@ogc.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to make USB-to-Ethernet-adapters (depending on usbnet) support
timestamping, the "skb_defer_rx_timestamp" and "skb_tx_timestamp" function
calls are added.
Signed-off-by: Michael Riesch <michael@riesch.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't use hardcoded irq num and replace it with
FEC_IRQ_NUM macro.
Signed-off-by: Xiao Jiang <jgq516@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed that the legacy Interrupt handler didn't have the same
ECC warning as did the MSI. So this patch adds it.
Signed-off-by: Don Skidmore <donald.c.skidmore>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The X540 thermal sensor interrupt isn't a General Purpose Interrupt
so doesn't need to be enabled in ixgbe_setup_gpie(). Likewise X540 doesn't
use the SDP0 for thermal sensor so it doesn't need to be enabled for any
device other than 82599.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add code to enable thermal sensors for the x540 hardware, as well as a
thermal interrupt check which will exit with a critical message of a
thermal overheat is detected. Intent of code allows other mac types to
be added with different configuration in the future.
Fixed in this version is the addition of setting the temp_sensor
capable flag which was previously only set for a specific mac.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Revise high and low threshold marks wrt flow control to account
for the X540 devices and latency introduced by the loopback
switch.
Without this it was in theory possible to drop frames on a
supposedly lossless link with X540 or SR-IOV enabled.
Previously we used a magic number in a define to calculate the
threshold values. This made it difficult to sort out exactly
which latencies were or were not being accounted for. Here
I was overly explicit and tried to used #define names that would
be recognizable after reading the IEEE 802.1Qbb specification.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Disable LLI for FCoE since regular interrupt
and their moderation rate works slightly better
for FCoE also.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch is meant to help cleanup the interrupt throttle rate logic by
storing the interrupt throttle rate as a value in microseconds instead of
interrupts per second. The advantage to this approach is that the value
can now be stored in an 16 bit field and doesn't require as much math to
flip the value back and forth since the hardware already used microseconds
when setting the rate.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Changes to clean up the vlan rx path broke trunk vlan. Trunk vlans in
a VF driver are those set using:
"ip link set <pfdev> vf <n> <vlanid>"
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
CC: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Doing an 'ifconfig ethN down' followed by an 'ifconfig ethN up' on a qemu-kvm
guest system configured with two e1000 NICs can result in an 'unable to handle
kernel paging request at 0000000100000000' or 'bad page map in process ...' or
something similar.
These result from a 4096-byte page being corrupted with the following two-word
pattern (16-bytes) repeated throughout the entire page:
0x0000000000000000
0x0000000100000000
There can be other bits set as well. What is a constant is that the 2nd word
has the 32nd bit set. So one could see:
:
0x0000000000000000
0x0000000100000000
0x0000000000000000
0x0000000172adc067 <<< bad pte
0x800000006ec60067
0x0000000700000040
0x0000000000000000
0x0000000100000000
:
Which came from from a process' page table I dumped out when the marked line
was seen as bad by print_bad_pte().
The repeating pattern represents the e1000's two-word receive descriptor:
struct e1000_rx_desc {
__le64 buffer_addr; /* Address of the descriptor's data buffer */
__le16 length; /* Length of data DMAed into data buffer */
__le16 csum; /* Packet checksum */
u8 status; /* Descriptor status */
u8 errors; /* Descriptor Errors */
__le16 special;
};
And the 32nd bit of the 2nd word maps to the 'u8 status' member, and
corresponds to E1000_RXD_STAT_DD which indicates the descriptor is done.
The corruption appears to result from the following...
. An 'ifconfig ethN down' gets us into e1000_close(), which through a number
of subfunctions results in:
1. E1000_RCTL_EN being cleared in RCTL register. [e1000_down()]
2. dma_free_coherent() being called. [e1000_free_rx_resources()]
. An 'ifconfig ethN up' gets us into e1000_open(), which through a number of
subfunctions results in:
1. dma_alloc_coherent() being called. [e1000_setup_rx_resources()]
2. E1000_RCTL_EN being set in RCTL register. [e1000_setup_rctl()]
3. E1000_RCTL_EN being cleared in RCTL register. [e1000_configure_rx()]
4. RDLEN, RDBAH and RDBAL registers being set to reflect the dma page
allocated in step 1. [e1000_configure_rx()]
5. E1000_RCTL_EN being set in RCTL register. [e1000_configure_rx()]
During the 'ifconfig ethN up' there is a window opened, starting in step 2
where the receives are enabled up until they are disabled in step 3, in which
the address of the receive descriptor dma page known by the NIC is still the
previous one which was freed during the 'ifconfig ethN down'. If this memory
has been reallocated for some other use and the NIC feels so inclined, it will
write to that former dma page with predictably unpleasant results.
I realize that in the guest, we're dealing with an e1000 NIC that is software
emulated by qemu-kvm. The problem doesn't appear to occur on bare-metal. Andy
suspects that this is because in the emulator link-up is essentially instant
and traffic can start flowing immediately. Whereas on bare-metal, link-up
usually seems to take at least a few milliseconds. And this might be enough
to prevent traffic from flowing into the device inside the window where
E1000_RCTL_EN is set.
So perhaps a modification needs to be made to the qemu-kvm e1000 NIC emulator
to delay the link-up. But in defense of the emulator, it seems like a bad idea
to enable dma operations before the address of the memory to be involved has
been made known.
The following patch no longer enables receives in e1000_setup_rctl() but leaves
them however they were. It only enables receives in e1000_configure_rx(), and
only after the dma address has been made known to the hardware.
There are two places where e1000_setup_rctl() gets called. The one in
e1000_configure() is followed immediately by a call to e1000_configure_rx(), so
there's really no change functionally (except for the removal of the problem
window. The other is in __e1000_shutdown() and is not followed by a call to
e1000_configure_rx(), so there is a change functionally. But consider...
. An 'ifconfig ethN down' (just as described above).
. A 'suspend' of the system, which (I'm assuming) will find its way into
e1000_suspend() which calls __e1000_shutdown() resulting in:
1. E1000_RCTL_EN being set in RCTL register. [e1000_setup_rctl()]
And again we've re-opened the problem window for some unknown amount of time.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If request_irq fails, the ibmveth driver will overwrite
the rc and end up returning a successful rc on its open
function, resulting in an oops later when a packet gets
sent and buffers are not allocated due to the failed open.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_ac_list and ipv6_fl_list from listening socket are inadvertently
shared with new socket created for connection.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix EEH recovery on new P Series platform by
requesting fundamental reset.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes two off-by-one errors that canceled each other out.
Checking for the same condition two times in bcm_tx_timeout_tsklet() reduced
the count of frames to be sent by one. This did not show up the first time
tx_setup is invoked as an additional frame is sent due to TX_ANNONCE.
Invoking a second tx_setup on the same item led to a reduced (by 1) number of
sent frames.
Reported-by: Andre Naujoks <nautsch@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I got:
Generating server: Tehuti.onmicrosoft.com
baum@tehutinetworks.net
#< #5.1.1 smtp;550 5.1.1 RESOLVER.ADR.RecipNotFound; not found> #SMTP#
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Alexander Indenbaum <baum@tehutinetworks.net>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver has two warning messages that might be triggered
by normal use cases. When they appear, the messages give the
impression of a never ending series of errors.
This commit changes them to debug messages instead.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IEEE 1588 standard defines two kinds of messages, event and general
messages. Event messages require time stamping, and general do not. When
using UDP transport, two separate ports are used for the two message
types.
The BPF designed to recognize event messages incorrectly classifies L2
general messages as event messages. This commit fixes the issue by
extending the filter to check the message type field for L2 PTP packets.
Event messages are be distinguished from general messages by testing
the "general" bit.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for SJW user settings to not set the synchronization
jump width (SJW) to 1 in any case when using the in-kernel bittiming
calculation.
The ip-tool from iproute2 already supports to pass the user defined SJW
value. The given SJW value is sanitized with the controller specific sjw_max
and the calculated tseg2 value. As the SJW can have values up to 4 providing
this value will lead to the maximum possible SJW automatically. A higher SJW
allows higher controller oscillator tolerances.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch also changes writel/readl to iowrite32/ioread32.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the driver for the SJA1000 based PCMCIA card 'CPC-Card' from
EMS Dr. Thomas Wuensche (http://www.ems-wuensche.de).
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Markus Plessing <plessing@ems-wuensche.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an event to monitor comm value changes of tasks. Such an event
becomes vital, if someone desires to control threads of a process in
different manner.
A natural characteristic of threads is its comm value, and helpfully
application developers have an opportunity to change it in runtime.
Reporting about such events via proc connector allows to fine-grain
monitoring and control potentials, for instance a process control daemon
listening to proc connector and following comm value policies can place
specific threads to assigned cgroup partitions.
It might be possible to achieve a pale partial one-shot likeness without
this update, if an application changes comm value of a thread generator
task beforehand, then a new thread is cloned, and after that proc
connector listener gets the fork event and reads new thread's comm value
from procfs stat file, but this change visibly simplifies and extends the
matter.
Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The upper protocol numbers of PPPOE are different, and should be treated
specially.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 7361c36c52 (af_unix: Allow credentials to work across
user and pid namespaces) af_unix performance dropped a lot.
This is because we now take a reference on pid and cred in each write(),
and release them in read(), usually done from another process,
eventually from another cpu. This triggers false sharing.
# Events: 154K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... .................. .........................
#
10.40% hackbench [kernel.kallsyms] [k] put_pid
8.60% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
7.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg
6.11% hackbench [kernel.kallsyms] [k] do_raw_spin_lock
4.95% hackbench [kernel.kallsyms] [k] unix_scm_to_skb
4.87% hackbench [kernel.kallsyms] [k] pid_nr_ns
4.34% hackbench [kernel.kallsyms] [k] cred_to_ucred
2.39% hackbench [kernel.kallsyms] [k] unix_destruct_scm
2.24% hackbench [kernel.kallsyms] [k] sub_preempt_count
1.75% hackbench [kernel.kallsyms] [k] fget_light
1.51% hackbench [kernel.kallsyms] [k]
__mutex_lock_interruptible_slowpath
1.42% hackbench [kernel.kallsyms] [k] sock_alloc_send_pskb
This patch includes SCM_CREDENTIALS information in a af_unix message/skb
only if requested by the sender, [man 7 unix for details how to include
ancillary data using sendmsg() system call]
Note: This might break buggy applications that expected SCM_CREDENTIAL
from an unaware write() system call, and receiver not using SO_PASSCRED
socket option.
If SOCK_PASSCRED is set on source or destination socket, we still
include credentials for mere write() syscalls.
Performance boost in hackbench : more than 50% gain on a 16 thread
machine (2 quad-core cpus, 2 threads per core)
hackbench 20 thread 2000
4.228 sec instead of 9.102 sec
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doing it just before starting to call into cpu_idle() made a sick kind
of sense only because the original bug we fixed (see commit
288d5abec8: "Boot up with usermodehelper disabled") was about problems
with some scheduler data structures not being initialized, and they had
better be initialized at that point.
But it really didn't make any other conceptual sense, and doing it after
the initial "schedule()" call for the idle thread actually opened up a
race: what if the main initialization thread did everything without
needing to sleep, and got all the way into user land too? Without
actually having scheduled back to the idle thread?
Now, in normal circumstances that doesn't ever happen, but it looks like
Richard Cochran triggered exactly that on his ARM IXP4xx machines:
"I have some ARM IXP4xx based machines that use the two on chip MAC
ports (aka NPEs). The NPE needs a firmware in order to function.
Ever since the following commit [that 288d5abec8 one], it is no
longer possible to bring up the interfaces during the init scripts."
with a call trace showing an ioctl coming from user space. Richard says:
"The init is busybox, and the startup script does mount, syslogd, and
then ifup, so that all can go by quickly."
The fix is to move the usermodehelper_enable() into the main 'init'
thread, and just put it after we've done all our initcalls. By then,
everything really should be up, but we've obviously not actually started
the user-mode portion of init yet.
Reported-and-tested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The incremental map updates have a record for each pg_temp mapping that is
to be add/updated (len > 0) or removed (len == 0). The old code was
written as if the updates were a complete enumeration; that was just wrong.
Update the code to remove 0-length entries and drop the rbtree traversal.
This avoids misdirected (and hung) requests that manifest as server
errors like
[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
Signed-off-by: Sage Weil <sage@newdream.net>
We need to apply the modulo pg_num calculation before looking up a pgid in
the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes
(some) misdirected requests that result in messages like
[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
on the server and stall make the client block without getting a reply (at
least until the pg_temp mapping goes way, but that can take a long long
time).
Reorder calc_pg_raw() a bit to make more sense.
Signed-off-by: Sage Weil <sage@newdream.net>
* git://github.com/davem330/net:
ipv6-multicast: Fix memory leak in IPv6 multicast.
ipv6: check return value for dst_alloc
net: check return value for dst_alloc
ipv6-multicast: Fix memory leak in input path.
bnx2x: add missing break in bnx2x_dcbnl_get_cap
bnx2x: fix WOL by enablement PME in config space
bnx2x: fix hw attention handling
net: fix a typo in Documentation/networking/scaling.txt
ath9k: Fix a dma warning/memory leak
rtlwifi: rtl8192cu: Fix unitialized struct
iwlagn: fix dangling scan request
batman-adv: do_bcast has to be true for broadcast packets only
cfg80211: Fix validation of AKM suites
iwlegacy: do not use interruptible waits
iwlegacy: fix command queue timeout
ath9k_hw: Fix Rx DMA stuck for AR9003 chips
* git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6:
[SCSI] 3w-9xxx: fix iommu_iova leak
[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference
[SCSI] scsi: qla4xxx needs libiscsi.o
[SCSI] libsas: fix failure to revalidate domain for anything but the first expander child.
[SCSI] aacraid: reset should disable MSI interrupt
Storing the struct temp_data pointer allocated from create_core_data()
when returning an error has the potential of leaving around a pointer
to freed memory. Reset it to NULL for error returns.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
With recent change "hwmon: (coretemp) don't use kernel assigned CPU
number as platform device ID", the microcode check is now running on
random CPU. Fix that by checking the microcode before creating the
platform device rather than at probe time.
Also avoid calling TO_PHYS_ID(cpu) twice in the same function, it's
expensive.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
A kernel crash is observed when a mounted ext3/ext4 filesystem is
physically removed. The problem is that blk_cleanup_queue() frees up
some resources eg by calling elevator_exit(), which are not checked for
in normal operation. So we should rather move these calls to the
destructor function blk_release_queue() as at that point all remaining
references are gone. However, in doing so we have to ensure that any
externally supplied queue_lock is disconnected as the driver might free
up the lock after the call of blk_cleanup_queue(),
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* 'for-linus' of git://github.com/tiwai/sound:
ASoC: ssm2602: Re-enable oscillator after suspend
ALSA: usb-audio: Check for possible chip NULL pointer before clearing probing flag
ALSA: hda/realtek - Don't detect LO jack when identical with HP
ALSA: hda/realtek - Avoid bogus HP-pin assignment
ALSA: HDA: No power nids on 92HD93
ASoC: omap-mcbsp: Do not attempt to change DAI sysclk if stream is active
If reg_vif_xmit cannot find a routing entry, be sure to
free the skb before returning the error.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
return value of dst_alloc must be checked before use
Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
return value of dst_alloc must be checked before use
Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Have to free the skb before returning if we fail
the fib lookup.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>