Commit graph

440889 commits

Author SHA1 Message Date
Adrien BAK
ffa91880a9 perf tools: Improve error reporting
In the current version, when using perf record, if something goes
wrong in tools/perf/builtin-record.c:375
  session = perf_session__new(file, false, NULL);

The error message:
"Not enough memory for reading per file header"

is issued. This error message seems to be outdated and is not very
helpful. This patch proposes to replace this error message by
"Perf session creation failed"

I believe this issue has been brought to lkml:
https://lkml.org/lkml/2014/2/24/458
although this patch only tackles a (small) part of the issue.

Additionally, this patch improves error reporting in
tools/perf/util/data.c open_file_write.

Currently, if the call to open fails, the user is unaware of it.
This patch logs the error, before returning the error code to
the caller.
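
A minimal userspace sketch of the second change, assuming a helper named
open_for_write_logged() (a hypothetical name, not the function in
tools/perf/util/data.c): when open() fails, the error is reported before
the failing descriptor is handed back to the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: open `path` for writing and report a failure
 * on stderr instead of failing silently. Returns the fd, or -1. */
static int open_for_write_logged(const char *path)
{
	int fd;

	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, S_IRUSR | S_IWUSR);
	if (fd < 0)
		fprintf(stderr, "failed to open %s: %s\n", path, strerror(errno));

	return fd;
}
```

The caller still sees the negative return value, so existing error handling
is unaffected; the only difference is that the reason is now visible.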

Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Adrien BAK <adrien.bak@metascale.org>
Link: http://lkml.kernel.org/r/1397786443.3093.4.camel@beast
[ Reorganize the changelog into paragraphs ]
[ Added empty line after fd declaration in open_file_write ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
2014-04-20 00:15:12 +02:00
Vladimir Nikulichev
922d0e4d9f perf tools: Adjust symbols in VDSO
perf report doesn't resolve function names in the VDSO:

$ perf report --stdio -g flat,0.0,15,callee --sort pid
...
            8.76%
               0x7fff6b1fe861
               __gettimeofday
               ACE_OS::gettimeofday()
...

In this case symbol values should be adjusted the same way as for executables,
relocatable objects and prelinked libraries.

After fix:

$ perf report --stdio -g flat,0.0,15,callee --sort pid
...
            8.76%
               __vdso_gettimeofday
               __gettimeofday
               ACE_OS::gettimeofday()
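
A sketch of what "adjusting" a symbol value can look like. This is an
assumption for illustration, not code quoted from the patch: the symbol's
address is rebased by the difference between the section's load address
and its file offset. The function name and the numbers in the usage note
are hypothetical.

```c
#include <stdint.h>

/* Assumed adjustment (not quoted from the patch): convert a symbol's
 * link-time address into an offset within the file image by removing
 * the difference between the section's address and its file offset. */
static uint64_t adjust_symbol_sketch(uint64_t st_value, uint64_t sh_addr,
				     uint64_t sh_offset)
{
	return st_value - (sh_addr - sh_offset);
}
```

For example, a symbol at 0x1861 in a section loaded at 0x1800 and stored
at file offset 0x800 is rebased to 0x861; when address and offset are
equal, the value is unchanged.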

Signed-off-by: Vladimir Nikulichev <nvs@tbricks.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/969812.163009436-sendEmail@nvs
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
2014-04-20 00:15:11 +02:00
Alexander Yarygin
acb61fc8ed perf kvm: Fix 'Min time' counting in report command
Every event in perf-kvm has a 'stats' structure, which contains
the max/min/average/etc. times of handling this event.
The problem is that the 'perf kvm stat report' command always shows
that 'min time' is 0us for every event. Example:

 # perf kvm stat report

 Analyze events for all VCPUs:

    VM-EXIT    Samples  Samples%     Time%   Min Time   Max Time Avg time
  [..]
  0xB2 MSCH         12     0.07%     0.00%        0us        8us 7.31us ( +-   2.11% )
  0xB2 CHSC         12     0.07%     0.00%        0us       18us 9.39us ( +-   9.49% )
  0xB2 STPX          8     0.05%     0.00%        0us        2us 1.88us ( +-   7.18% )
  0xB2 STSI          7     0.04%     0.00%        0us       44us 16.49us ( +-  38.20% )
  [..]

This happens because the 'stats' structure is not initialized and
stats->min equals 0. Let's initialize the structure for every
event after its allocation using the init_stats() function. This initializes
stats->min to -1 and makes the 'Min time' statistics work:

 # perf kvm stat report

 Analyze events for all VCPUs:

    VM-EXIT    Samples  Samples%     Time%   Min Time   Max Time Avg time
  [..]
  0xB2 MSCH         12     0.07%     0.00%        6us        8us 7.31us ( +-   2.11% )
  0xB2 CHSC         12     0.07%     0.00%        7us       18us 9.39us ( +-   9.49% )
  0xB2 STPX          8     0.05%     0.00%        1us        2us 1.88us ( +-   7.18% )
  0xB2 STSI          7     0.04%     0.00%        1us       44us 16.49us ( +-  38.20% )
  [..]
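
The effect of the initial value can be sketched as follows. The structure
and function names are simplified, hypothetical stand-ins rather than the
perf sources; the point is only that a minimum tracked in an unsigned field
has to start at the largest value, i.e. -1 converted to u64, or no sample
can ever lower it.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for perf's stats structure. */
struct stats_sketch {
	uint64_t n;
	uint64_t min;
	uint64_t max;
};

/* Start min at the largest u64 value ((uint64_t)-1) so that the first
 * sample always replaces it. */
static void init_stats_sketch(struct stats_sketch *s)
{
	s->n = 0;
	s->min = (uint64_t)-1;
	s->max = 0;
}

/* Record one sample and update the running minimum and maximum. */
static void update_stats_sketch(struct stats_sketch *s, uint64_t val)
{
	s->n++;
	if (val < s->min)
		s->min = val;
	if (val > s->max)
		s->max = val;
}
```

With a zero-filled structure, feeding the samples 6 and 8 leaves min at 0;
after init_stats_sketch() the same samples give min 6 and max 8.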

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1397053319-2130-3-git-send-email-borntraeger@de.ibm.com
[ Fixing the perf examples changelog output ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
2014-04-20 00:14:08 +02:00
Venkatesh Srinivas
2422365780 perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:26 +02:00
Masami Hiramatsu
6381c24cd6 kprobes/x86: Fix page-fault handling logic
The current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault (e.g. perf
with callback tracing).

In that case, the page fault is handled by kprobes, which
wrongly concludes that the page fault was caused by the
single-stepping code and tries to reset the IP address
to the probed address.

But in truth the page fault was caused by the
NMI handler, and do_page_fault fails to handle the real
page fault because the IP address has been modified,
causing kernel BUGs like the one below.

 ----
 [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40

To handle this correctly, I fixed the kprobes fault
handler to check that the faulting IP address is in its own
single-step buffer instead of checking the current kprobe
state.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17 10:57:02 +02:00
Linus Torvalds
6ca2a88ad8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Various fixes:

   - reboot regression fix
   - build message spam fix
   - GPU quirk fix
   - 'make kvmconfig' fix

  plus the wire-up of the renameat2() system call on i386"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove the PCI reboot method from the default chain
  x86/build: Supress "Nothing to be done for ..." messages
  x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
  x86/platform: Fix "make O=dir kvmconfig"
  i386: Wire up the renameat2() syscall
2014-04-16 16:40:18 -07:00
Linus Torvalds
2a83dc7e37 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Tooling fixes, plus a simple hardware-enablement patch for the Intel
  RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
  still fine at this stage"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Instead of redirecting flex output, use -o
  perf tools: Fix double free in perf test 21 (code-reading.c)
  perf stat: Initialize statistics correctly
  perf bench: Set more defaults in the 'numa' suite
  perf bench: Fix segfault at the end of an 'all' execution
  perf bench: Update manpage to mention numa and futex
  perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
  perf probe: Fix to handle errors in line_range searching
  perf probe: Fix --line option behavior
  perf tools: Pick up libdw without explicit LIBDW_DIR
  MAINTAINERS: Change e-mail to kernel.org one
  perf callchains: Disable unwind libraries when libelf isn't found
  tools lib traceevent: Do not call warning() directly
  tools lib traceevent: Print event name when show warning if possible
  perf top: Fix documentation of invalid -s option
  perf/x86: Enable DRAM RAPL support on Intel Haswell
2014-04-16 16:38:57 -07:00
Linus Torvalds
17cf7db27b Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
 "ARM VIC (Vectored Irq Controller) irqchip driver fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: vic: Properly chain the cascaded IRQs
2014-04-16 16:36:00 -07:00
Linus Torvalds
d99d5917e7 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "liblockdep fixes and mutex debugging fixes"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/mutex: Fix debug_mutexes
  tools/liblockdep: Add proper versioning to the shared obj
  tools/liblockdep: Ignore asmlinkage and visible
2014-04-16 16:35:18 -07:00
Linus Torvalds
498f96204f Merge tag 'fbdev-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:
 - fix build errors for bf54x-lq043fb and imxfb
 - fbcon fix for da8xx-fb
 - omapdss fixes for hdmi audio, irq handling and fclk calculation

* tag 'fbdev-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  video: bf54x-lq043fb: fix build error
  OMAPDSS: Change struct reg_field to dispc_reg_field
  OMAPDSS: Take pixelclock unit change into account in hdmi_compute_acr()
  OMAPDSS: fix shared irq handlers
  video: imxfb: Select LCD_CLASS_DEVICE unconditionally
  OMAPDSS: fix rounding when calculating fclk rate
  video: da8xx-fb: Fix casting of info->pseudo_palette
2014-04-16 16:03:24 -07:00
Linus Torvalds
5f63517cbf Merge tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pincontrol fixes from Linus Walleij:
 "A first set of pin control fixes for the v3.15 series:

   - Fix a couple of teething problems (barnsjukdomar) on the Rockchip driver.

   - Remove an idiotic debug print I happened to leave behind in the
     Nomadik driver.

   - Fixup the Qualcomm MSM interrupt handling code for the TLMM v2.

   - Three patches renaming the Broadcom Capri driver to BCM28155.  This
     has been falling between the chairs for some time due to some
     cross-tree synchronization misunderstandings, now I'm fed up with
     this and just rename it in this -rc1 phase"

* tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: fix typo in bindings documentation
  Update bcm_defconfig with new pinctrl CONFIG
  pinctrl: Rename Broadcom Capri pinctrl driver
  pinctrl: msm: Correct interrupt code for TLMM v2
  pinctrl: nomadik: delete stray debug print
  pinctrl: rockchip: handle first half of rk3188-bank0 correctly
  pinctrl: rockchip: add return value to rockchip_set_mux
  pinctrl: rockchip: fix offset of mux registers for rk3188
2014-04-16 15:59:16 -07:00
Linus Torvalds
0f689a33ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
 "An update to the oops output with additional information about the
  crash.  The renameat2 system call is enabled.  Two patches in regard
  to the PTR_ERR_OR_ZERO cleanup.  And a bunch of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
  s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
  s390/sclp_vt220: Fix kernel panic due to early terminal input
  s390/compat: fix typo
  s390/uaccess: fix possible register corruption in strnlen_user_srst()
  s390: add 31 bit warning message
  s390: wire up sys_renameat2
  s390: show_registers() should not map user space addresses to kernel symbols
  s390/mm: print control registers and page table walk on crash
  s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
  s390: fix control register update
2014-04-16 11:28:25 -07:00
Linus Torvalds
7d38cc0290 Merge tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull itanium erratum fix from Tony Luck:
 "Small workaround for a rare, but annoying, erratum #237"

* tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
2014-04-16 11:22:45 -07:00
Tony Luck
c0b5a64d93 [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
April 2014 Itanium processor specification update:

http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html

describes this erratum:

=========================================================================
237. Under a complex set of conditions, store to load forwarding for a
sub 8-byte load may complete incorrectly

Problem: A load instruction may complete incorrectly when a code sequence
using 4-byte or smaller load and store operations to the same address
is executed in combination with specific timing of all the following
concurrent conditions: store to load forwarding, alignment checking
enabled, a mis-predicted branch, and complex cache utilization activity.

Implication: The affected sub 8-byte instruction may complete
incorrectly resulting in unpredictable system behavior. There is an
extremely low probability of exposure due to the significant number of
complex microarchitectural concurrent conditions required to encounter
the erratum.

Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
Hyper-Threading will significantly reduce exposure to the conditions
that contribute to encountering the erratum.

Status: See the Summary Table of Changes for the affected steppings.
=========================================================================

[Table of changes essentially lists all models from McKinley to Tukwila]

The PSR.ac bit controls whether the processor will always generate
an unaligned reference trap (0x5a00) for a misaligned data access
(when PSR.ac=1) or if it will let the access succeed when running
on a cpu that implements logic to handle some unaligned accesses.

Way back in 2008 in commit b704882e70
  [IA64] Rationalize kernel mode alignment checking
we made the decision to always enable strict checking. We were
already doing so in trap/interrupt context because the common
preamble code set this bit - but the rest of supervisor code
(and by inheritance user code) ran with PSR.ac=0.

We now reverse that decision and set PSR.ac=0 everywhere in the
kernel (also inherited by user processes). This will avoid the
erratum using the method described in the Itanium specification
update.  Net effect for users is that the processor will handle
unaligned access when it can (typically with a tiny performance
bubble in the pipeline ... but much less invasive than taking a
trap and having the OS perform the access).

Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-04-16 10:20:34 -07:00
Ingo Molnar
5be44a6fb1 x86: Remove the PCI reboot method from the default chain
Steve reported a reboot hang and bisected it back to this commit:

  a4f1987e4c x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

  reboot=t       # triple fault                  ok
  reboot=k       # keyboard ctrl                 FAIL
  reboot=b       # BIOS                          ok
  reboot=a       # ACPI                          FAIL
  reboot=e       # EFI                           FAIL   [system has no EFI]
  reboot=p       # PCI 0xcf9                     FAIL

And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

 - it orders the actual reboot logic to follow the reboot ordering
   pattern - it was in a pretty random order before for no good
   reason.

 - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
   BOOT_CF9_SAFE flags to make the code more obvious.

 - it tries the BIOS reboot method before the PCI reboot method.
   (Since 'BIOS' is a terminal reboot method resulting in a hang
    if it does not work, this is essentially equivalent to removing
    the PCI reboot method from the default reboot chain.)

 - just for the miraculous possibility of terminal (resulting
   in hang) reboot methods of triple fault or BIOS returning
   without having done their job, there's an ordering between
   them as well.

Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-16 08:56:09 +02:00
Linus Torvalds
10ec34fcb1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix BPF filter validation of netlink attribute accesses, from
    Mathias Krause.

 2) Netfilter conntrack generation seqcount not initialized properly,
    from Andrey Vagin.

 3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
    from Patrick McHardy.

 4) Properly limit MTU over ipv6, from Eric Dumazet.

 5) Fix seccomp system call argument population on 32-bit, from Daniel
    Borkmann.

 6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
    skb->mac_len needs to be used.  From Vlad Yasevich.

 7) We have several cases of using socket based communications to
    implement a tunnel.  For example, some tunnels are encapsulations
    over UDP so we use an internal kernel UDP socket to do the
    transmits.

    These tunnels should behave just like other software devices and
    pass the packets on down to the next layer.

    Most importantly we want the top-level socket (eg TCP) that created
    the traffic to be charged for the SKB memory.

    However, once you get into the IP output path, we have code that
    assumed that whatever was attached to skb->sk is an IP socket.

    To keep the top-level socket being charged for the SKB memory,
    whilst satisfying the needs of the IP output path, we now pass in an
    explicit 'sk' argument.

    From Eric Dumazet.

 8) ping_init_sock() leaks group info, from Xiaoming Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
  cxgb4: use the correct max size for firmware flash
  qlcnic: Fix MSI-X initialization code
  ip6_gre: don't allow to remove the fb_tunnel_dev
  ipv4: add a sock pointer to dst->output() path.
  ipv4: add a sock pointer to ip_queue_xmit()
  driver/net: cosa driver uses udelay incorrectly
  at86rf230: fix __at86rf230_read_subreg function
  at86rf230: remove check if AVDD settled
  net: cadence: Add architecture dependencies
  net: Start with correct mac_len in skb_network_protocol
  Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
  cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
  net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
  seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
  qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
  qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
  qlcnic: Fix PVID configuration on eSwitch port.
  qlcnic: Fix max ring count calculation
  qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
  qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
  ...
2014-04-15 20:30:30 -07:00
Steve Wise
6f1d721037 cxgb4: use the correct max size for firmware flash
The wrong max fw size was being used and causing false
"too big" errors running ethtool -f.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 15:50:02 -04:00
Alexander Gordeev
8564ae09e0 qlcnic: Fix MSI-X initialization code
Function qlcnic_setup_tss_rss_intr() might enter an endless
loop if pci_enable_msix() continuously returns a
positive number of MSI-Xs that could have been allocated.
Besides, the function contains an 'err = -EIO;' assignment
that could never be reached. This update fixes the
aforementioned issues.
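
One way to avoid such a loop is sketched below. This is an illustration
under stated assumptions, not the qlcnic fix itself: enable_fn stands in
for an MSI-X enable call that returns 0 on success, a negative errno on
failure, or a positive count of vectors that could have been allocated.
All names here are hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-in for an MSI-X enable call: returns 0 on success,
 * a negative errno on failure, or a positive count of vectors that
 * could have been allocated instead. */
typedef int (*enable_fn)(int nvec);

/* Try the requested count; if a smaller count is suggested, retry once
 * with it rather than looping for as long as positive values come back. */
static int setup_vectors_sketch(enable_fn enable, int requested, int *granted)
{
	int err = enable(requested);

	if (err > 0) {
		requested = err;
		err = enable(requested);
	}
	if (err == 0) {
		*granted = requested;
		return 0;
	}
	return err < 0 ? err : -ENOSPC;
}

/* Pathological stand-in: always claims two vectors are available. */
static int always_suggests_two(int nvec)
{
	(void)nvec;
	return 2;
}

/* Well-behaved stand-in: succeeds for up to four vectors. */
static int accepts_up_to_four(int nvec)
{
	return nvec <= 4 ? 0 : 4;
}
```

With accepts_up_to_four() a request for eight vectors settles on four;
with always_suggests_two() the helper gives up with an error after one
retry instead of spinning.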

Cc: Shahed Shaikh <shahed.shaikh@qlogic.com>
Cc: Dept-HSGLinuxNICDev@qlogic.com
Cc: netdev@vger.kernel.org
Cc: linux-pci@vger.kernel.org

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 15:14:19 -04:00
Nicolas Dichtel
54d63f787b ip6_gre: don't allow to remove the fb_tunnel_dev
It's possible to remove the FB tunnel with the command 'ip link del ip6gre0', but
this is unsafe: the module always assumes that this device exists. For example,
ip6gre_tunnel_lookup() may use it unconditionally.

Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
let ip6gre_destroy_tunnels() do the job).

Introduced by commit c12b395a46 ("gre: Support GRE over IPv6").

CC: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 14:56:19 -04:00
Eric Dumazet
aad88724c9 ipv4: add a sock pointer to dst->output() path.
In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.

The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.

Fixes: 8f646c922d ("vxlan: keep original skb ownership")
Reported-by: lucien xin <lucien.xin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 13:47:15 -04:00
Ingo Molnar
ad466db5b0 Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/urgent
Pull perf/urgent fixes from Jiri Olsa:

  * Instead of redirecting flex output, use -o (Cody P Schafer)

  * Fix double free in perf test 21 (Adrian Hunter)

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-15 19:25:07 +02:00
Eric Dumazet
b0270e9101 ipv4: add a sock pointer to ip_queue_xmit()
ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d5956 ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.

One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.

Fixes: 31c70d5956 ("l2tp: keep original skb ownership")
Reported-by: Zhan Jianyu <nasa4836@gmail.com>
Tested-by: Zhan Jianyu <nasa4836@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 12:58:34 -04:00
Linus Walleij
f6da9fe45c irqchip: vic: Properly chain the cascaded IRQs
We are flagging the parent IRQ as chained, so we must also
make sure to call the chained_irq_[enter|exit] functions for
things to work smoothly.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/1397550484-7119-1-git-send-email-linus.walleij@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-15 18:24:24 +02:00
Cody P Schafer
c9e87a4725 perf tools: Instead of redirecting flex output, use -o
This gives us a real filename instead of having '<stdout>' show up all
over the place when debugging.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1396652539-2416-1-git-send-email-cody@linux.vnet.ibm.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
2014-04-15 13:57:21 +02:00
Adrian Hunter
ae450a7d05 perf tools: Fix double free in perf test 21 (code-reading.c)
perf_evlist__delete() deletes attached cpu and thread maps
but the test is still using them, so remove them from the
evlist before deleting it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/53465E3E.8070201@intel.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
2014-04-15 13:57:14 +02:00
Steven Miao
c26ef3eb3c video: bf54x-lq043fb: fix build error
Fix build error by including linux/gpio.h. Also drop asm/gpio.h which is
not needed.

Signed-off-by: Steven Miao <realmz6@gmail.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2014-04-15 12:44:16 +03:00
Li, Zhen-Hua
1dd333f470 driver/net: cosa driver uses udelay incorrectly
In the cosa driver, udelay with a value of more than 20000 may cause __bad_udelay.
Use msleep instead.

Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 00:08:22 -04:00
Alexander Aring
2168746cfc at86rf230: fix __at86rf230_read_subreg function
The __at86rf230_read_subreg function doesn't mask and shift the register
contents, which it should do. This patch adds the necessary mask and
shift operations to this function.

Now that we have CSMA support, this can cause trouble on state changes:
CSMA support turned on some bits in the TRX_STATUS register that
used to be zero, so not masking broke the check of the TRX_STATUS field
after commanding a state change.
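
The mask-and-shift step can be sketched in isolation. The function name
is hypothetical and the register values in the usage note are purely
illustrative, not taken from the driver or the datasheet.

```c
#include <stdint.h>

/* Extract a sub-register field: keep only the bits selected by `mask`,
 * then shift the field down so it starts at bit 0. */
static uint8_t read_subreg_sketch(uint8_t reg_value, uint8_t mask,
				  unsigned int shift)
{
	return (uint8_t)((reg_value & mask) >> shift);
}
```

For a raw value of 0xA8, a field in the low five bits (mask 0x1f, shift 0)
reads as 0x08 even though upper bits are set, and a field in the top three
bits (mask 0xe0, shift 5) reads as 0x05. Comparing the unmasked 0xA8
against an expected 0x08 would fail, which is the kind of breakage the
commit message describes.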

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 00:08:22 -04:00
Alexander Aring
bb78864a0c at86rf230: remove check if AVDD settled
The AVDD regulator is only enabled when the RF section is in the active TX_ON
(PLL_ON) state. Since commit 7dcbd22a97
("ieee802154: ensure that first RF212 state comes from TRX_OFF"),
we are in the TRX_OFF state at the time at86rf230_hw_init is run.

Note that this test would only fail in case of a severe hardware
malfunction (faulty/shorted power supply, etc.) so it wasn't all that
useful in the first place.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 00:08:22 -04:00
Jean Delvare
ea05df4e8f net: cadence: Add architecture dependencies
The Cadence ethernet chipsets are only used on specific ARM
architectures. Add Kconfig dependencies so that drivers for these
chipsets are only buildable on the relevant architectures.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 00:08:22 -04:00
Linus Torvalds
55101e2d6c Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Marcelo Tosatti:
 - Fix for guest triggerable BUG_ON (CVE-2014-0155)
 - CR4.SMAP support
 - Spurious WARN_ON() fix

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: remove WARN_ON from get_kernel_ns()
  KVM: Rename variable smep to cr4_smep
  KVM: expose SMAP feature to guest
  KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
  KVM: Add SMAP support when setting CR4
  KVM: Remove SMAP bit from CR4_RESERVED_BITS
  KVM: ioapic: try to recover if pending_eoi goes out of range
  KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-14 16:21:28 -07:00
Linus Torvalds
dafe344d22 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull bcm2835 crypto fix from Herbert Xu:
 "This fixes a potential boot crash on bcm2835 due to the recent change
  that now causes hardware RNGs to be accessed on registration"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: bcm2835 - fix oops when rng h/w is accessed during registration
2014-04-14 16:04:14 -07:00
Mikulas Patocka
e79323bd87 user namespace: fix incorrect memory barriers
smp_read_barrier_depends() can be used if there is data dependency between
the readers - i.e. if the read operation after the barrier uses address
that was obtained from the read operation before the barrier.

In this file, there is only a control dependency, no data dependency, so
the use of smp_read_barrier_depends() is incorrect. The code could fail in
the following way:
* the cpu predicts that idx < extents is true and starts executing the
  body of the for loop
* the cpu fetches map->extent[0].first and map->extent[0].count
* the cpu fetches map->nr_extents
* the cpu verifies that idx < extents is true, so it commits the
  instructions in the body of the for loop

The problem is that in this scenario, the cpu read map->extent[0].first
and map->nr_extents in the wrong order. We need a full read memory barrier
to prevent it.
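
A user-space sketch of the required ordering using C11 atomics, not the
kernel code: the acquire fence stands in for a read memory barrier, and
the names id_map, map_publish, map_id_down, lower_first and MAX_EXTENTS
are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_EXTENTS  5
#define ID_NOT_FOUND ((uint32_t)-1)

struct extent {
	uint32_t first;        /* first id covered by this extent */
	uint32_t lower_first;  /* id it maps to */
	uint32_t count;        /* number of ids covered */
};

struct id_map {
	_Atomic uint32_t nr_extents;
	struct extent extent[MAX_EXTENTS];
};

/* Writer: fill the slot first, then publish the new count. */
static void map_publish(struct id_map *map, struct extent e)
{
	uint32_t n = atomic_load_explicit(&map->nr_extents,
					  memory_order_relaxed);

	assert(n < MAX_EXTENTS);
	map->extent[n] = e;
	atomic_store_explicit(&map->nr_extents, n + 1,
			      memory_order_release);
}

/* Reader: the loop condition is only a control dependency. */
static uint32_t map_id_down(struct id_map *map, uint32_t id)
{
	uint32_t extents = atomic_load_explicit(&map->nr_extents,
						memory_order_relaxed);

	/* Keep the extent reads below from being satisfied before the
	 * count read above. */
	atomic_thread_fence(memory_order_acquire);

	for (uint32_t idx = 0; idx < extents; idx++) {
		const struct extent *e = &map->extent[idx];

		if (id >= e->first && id - e->first < e->count)
			return id - e->first + e->lower_first;
	}
	return ID_NOT_FOUND;
}
```

Without the fence, nothing orders the speculative read of
extent[idx] after the read of nr_extents, which is the failure
sequence listed above.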

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-14 16:03:02 -07:00
David S. Miller
00cbc3dcd1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains three Netfilter fixes for your net tree,
they are:

* Fix missing generation sequence initialization which results in a splat
  if lockdep is enabled, it was introduced in the recent works to improve
  nf_conntrack scalability, from Andrey Vagin.

* Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
  disabled otherwise this crashes due to a double release, from Andrey
  Vagin.

* Fix nf_tables cmp fast in big endian, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 19:00:10 -04:00
Vlad Yasevich
1e785f48d2 net: Start with correct mac_len in skb_network_protocol
Sometimes, when the packet arrives at skb_mac_gso_segment()
its skb->mac_len already accounts for some of the mac length
headers in the packet.  This seems to happen when forwarding
through an OpenSSL tunnel.

When we start looking for any vlan headers in skb_network_protocol()
we seem to ignore any of the already known mac headers and start
with an ETH_HLEN.  This results in an incorrect offset, dropped
TSO frames and general slowness of the connection.

We can start counting from the known skb->mac_len
and return at least that much if all mac level headers
are known and accounted for.
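
A simplified user-space sketch of the idea, not the kernel function:
the name network_protocol and its parameters are hypothetical. Here
proto is assumed to be the protocol found at the end of the headers
already covered by mac_len, and the walk over VLAN tags starts from
mac_len rather than from a fixed ETH_HLEN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN     14
#define VLAN_HLEN    4
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/*
 * Walk any VLAN tags that follow the already known mac headers.
 * Returns the encapsulated protocol (0 on a truncated frame) and
 * stores the total mac-level header length in *depth.
 */
static uint16_t network_protocol(const uint8_t *frame, size_t len,
				 uint16_t proto, size_t mac_len,
				 size_t *depth)
{
	size_t off = mac_len;  /* start from what is already known */

	assert(mac_len >= ETH_HLEN);
	while (proto == ETH_P_8021Q || proto == ETH_P_8021AD) {
		if (off + VLAN_HLEN > len)
			return 0;
		/* VLAN header: 2 bytes TCI, 2 bytes encapsulated proto */
		proto = (uint16_t)((frame[off + 2] << 8) | frame[off + 3]);
		off += VLAN_HLEN;
	}
	*depth = off;  /* at least mac_len */
	return proto;
}
```

If no VLAN tag follows the known headers, the loop does not run and
the returned depth equals mac_len.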

Fixes: 53d6471cef (net: Account for all vlan headers in skb_mac_gso_segment)
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Borkman <dborkman@redhat.com>
Tested-by: Martin Filip <nexus+kernel@smoula.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 18:58:58 -04:00
Marcelo Tosatti
b351c39cc9 KVM: x86: remove WARN_ON from get_kernel_ns()
Function and callers can be preempted.

https://bugzilla.kernel.org/show_bug.cgi?id=73721

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-14 17:50:43 -03:00
Feng Wu
66386ade2a KVM: Rename variable smep to cr4_smep
Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:40 -03:00
Feng Wu
de935ae15b KVM: expose SMAP feature to guest
This patch exposes the SMAP feature to the guest.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:37 -03:00
Feng Wu
e1e746b3c5 KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:35 -03:00
Feng Wu
97ec8c067d KVM: Add SMAP support when setting CR4
This patch adds SMAP handling logic when setting CR4 for guests

Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
way to detect SMAP violation.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:34 -03:00
Feng Wu
56d6efc2de KVM: Remove SMAP bit from CR4_RESERVED_BITS
This patch removes SMAP bit from CR4_RESERVED_BITS.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:33 -03:00
Daniel Borkmann
362d52040c Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
This reverts commit ef2820a735 ("net: sctp: Fix a_rwnd/rwnd management
to reflect real state of the receiver's buffer") as it introduced a
serious performance regression on SCTP over IPv4 and IPv6, though not
as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.

Current state:

[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 17:56:21 GMT
Connecting to host 192.168.241.3, port 5201
      Cookie: Lab200slot2.1397238981.812898.548918
[  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
[  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
[  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
[  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
[  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
[  4]   6.21-6.21   sec  0.00 Bytes    0.00 bits/sec
[  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
[  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
[  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
[  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
[  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
[  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
[  4]  16.79-16.79  sec  0.00 Bytes    0.00 bits/sec
[  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
(etc)

[root@Lab200slot2 ~]#  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 19:08:41 GMT
Connecting to host 2001:db8:0:f101::1, port 5201
      Cookie: Lab200slot2.1397243321.714295.2b3f7c
[  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
[  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
[  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
[  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
[  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
[  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
[  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
[  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
[  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
[  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
[  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
[  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
[  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
[  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
(etc)

After patch:

[root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
Time: Mon, 14 Apr 2014 16:40:48 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1397493648.413274.65e131
[  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
[  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
[  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
[  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
[  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
[  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
[  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
[  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec

With the revert applied, SCTP performance on latest upstream is back
to normal for IPv4 and IPv6 and has the same throughput as the 3.4.2
test kernel. Throughput is steady and interval reports are smooth again.

Fixes: ef2820a735 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
Reported-by: Peter Butler <pbutler@sonusnet.com>
Reported-by: Dongsheng Song <dongsheng.song@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Peter Butler <pbutler@sonusnet.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:48 -04:00
Steve Wise
bfae232499 cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
Hardware needs the local device mac address to support hw loopback for
rdma loopback connections.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:48 -04:00
Daniel Borkmann
8c482cdc35 net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has
been wrongly decoded by commit a8fc927780 ("sk-filter: Add ability to
get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS
although it should have been decoded as BPF_LD|BPF_W|BPF_ABS.

In practice, this should not have much side-effect though, as such
conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse
operation PR_GET_SECCOMP will only return the current seccomp mode, but
not the filter itself. Since the transition to the new BPF infrastructure,
it's also not used anymore, so we can simply remove this as it's
unreachable.
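
For illustration, the two opcodes differ only in the size field. The
constants below carry the classic BPF values from the Linux UAPI
headers and are redefined locally to keep the sketch self-contained;
the helper bpf_abs_load_width is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Classic BPF opcode fields. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_LD          0x00
#define BPF_SIZE(code)  ((code) & 0x18)
#define BPF_W           0x00  /* 32-bit word */
#define BPF_H           0x08  /* 16-bit half word */
#define BPF_B           0x10  /* 8-bit byte */
#define BPF_MODE(code)  ((code) & 0xe0)
#define BPF_ABS         0x20

static_assert((BPF_LD | BPF_W | BPF_ABS) != (BPF_LD | BPF_B | BPF_ABS),
	      "word and byte loads must decode to different opcodes");

/* Width in bytes of an absolute load, or -1 for any other opcode. */
static int bpf_abs_load_width(uint16_t code)
{
	if (BPF_CLASS(code) != BPF_LD || BPF_MODE(code) != BPF_ABS)
		return -1;

	switch (BPF_SIZE(code)) {
	case BPF_W:
		return 4;
	case BPF_H:
		return 2;
	case BPF_B:
		return 1;
	default:
		return -1;
	}
}
```

A filter decoded with BPF_B instead of BPF_W would load one byte of
the seccomp data where the original program loaded a full word.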

Fixes: a8fc927780 ("sk-filter: Add ability to get socket filter program (v2)")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:47 -04:00
Daniel Borkmann
2eac764832 seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
Linus reports that on 32-bit x86 Chromium throws the following seccomp
resp. audit log messages:

  audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
syscall=172 compat=0 ip=0xb2dd9852 code=0x30000

  audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
compat=0 ip=0xb2dd9852 code=0x50000

These audit messages are being triggered via audit_seccomp() through
__secure_computing() in seccomp mode (BPF) filter with seccomp return
codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
during filter runtime. Moreover, Linus reports that x86_64 Chromium
seems fine.

The underlying issue that explains this is that the implementation of
populate_seccomp_data() is wrong. Our seccomp data structure sd that
is being shared with user ABI is:

  struct seccomp_data {
    int nr;
    __u32 arch;
    __u64 instruction_pointer;
    __u64 args[6];
  };

Therefore, a simple cast to 'unsigned long *' for storing the value of
the syscall argument via syscall_get_arguments() is just wrong as on
32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
at wrong offsets in args[] member, and thus i) could leak stack memory
to user space and ii) tampers with the logic of seccomp BPF programs
that read out and check for syscall arguments:

  syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);

Tested on 32-bit x86 with Google Chrome, unfortunately only via a remote
test machine through slow ssh X forwarding, but it fixes the issue on
my side. So fix it up by storing args in type correct variables. gcc
is clever and optimizes the copy away in other cases, e.g. x86_64.
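
A user-space sketch of storing the arguments through type correct
variables, not the kernel code: seccomp_data_sketch mirrors the struct
shown above, while get_arguments (a stand-in for
syscall_get_arguments()) and populate_args are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct seccomp_data_sketch {
	int nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[6];
};

static_assert(sizeof(((struct seccomp_data_sketch *)0)->args[0]) == 8,
	      "each args[] slot is 64 bits wide on every architecture");

/* Stand-in helper: writes n native-word values starting at 'first'. */
static void get_arguments(const unsigned long *regs, unsigned int first,
			  unsigned int n, unsigned long *out)
{
	for (unsigned int i = 0; i < n; i++)
		out[i] = regs[first + i];
}

static void populate_args(struct seccomp_data_sketch *sd,
			  const unsigned long *regs)
{
	unsigned long args[6];  /* native word size: 32 or 64 bits */

	get_arguments(regs, 0, 6, args);

	/* Widen each value into its own 64-bit slot. Casting sd->args
	 * to 'unsigned long *' would instead pack two 32-bit values
	 * into each slot on a 32-bit architecture. */
	for (unsigned int i = 0; i < 6; i++)
		sd->args[i] = args[i];
}
```

On a 64-bit target the widening copy is a plain copy of equal-sized
values, which matches the remark that the compiler can optimize it away.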

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 16:26:47 -04:00
David S. Miller
14ed4a5bcb Merge branch 'qlcnic'
Shahed Shaikh says:

====================
qlcnic: Bug fixes

This patch series contains the following bug fixes:

* Send INIT_NIC_FUNC mailbox command as first mailbox
* Fix a panic because of uninitialized delayed_work.
* Fix inconsistent calculation of max rings count.
* Fix PVID configuration issue. Driver needs to clear older
  PVID before adding new one.
* Fix QLogic application/driver interface by packing vNIC information
  array.
* Fix a crash when user tries to disable SR-IOV while VFs are
  still assigned to VMs.

Please apply to net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:58 -04:00
Manish Chopra
696f1943a1 qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
o Disabling SR-IOV while VFs are assigned to VMs crashes the host, so
  return -EPERM when the user requests to disable SR-IOV through pci
  sysfs while VFs are still assigned to VMs.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:53 -04:00
Jitendra Kalsaria
4f03022777 qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
o The application expects the vNIC number as the array index, but the
driver interface returns the configuration in array index form.

o Pack the vNIC information array in the buffer such that application can
access it using vNIC number as the array index.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:52 -04:00
Jitendra Kalsaria
a78b6da89f qlcnic: Fix PVID configuration on eSwitch port.
Clear the older PVID before adding a newer PVID to the eSwitch port.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:52 -04:00
Shahed Shaikh
7b546842b1 qlcnic: Fix max ring count calculation
Do not read max rings count from qlcnic_get_nic_info(). Use driver defined
values for 82xx adapters. In case of 83xx adapters, use minimum of firmware
provided and driver defined values.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14 13:43:52 -04:00