Commit graph

9286 commits

Author SHA1 Message Date
Oleg Nesterov
02be46fba4 ptrace/x86: revert "hw_breakpoints: Fix racy access to ptrace breakpoints"
This reverts commit 87dc669ba2 ("hw_breakpoints: Fix racy access to
ptrace breakpoints").

The patch was fine but we can no longer race with SIGKILL after commit
9899d11f65 ("ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

The patch only removes ptrace_get_breakpoints/ptrace_put_breakpoints and
does a couple of "while at it" cleanups, it doesn't remove other changes
from the reverted commit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:25 -07:00
Linus Torvalds
21884a83b2 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
 "The timer changes contain:

   - posix timer code consolidation and fixes for odd corner cases

   - sched_clock implementation moved from ARM to core code to avoid
     duplication by other architectures

   - alarm timer updates

   - clocksource and clockevents unregistration facilities

   - clocksource/events support for new hardware

   - precise nanoseconds RTC readout (Xen feature)

   - generic support for Xen suspend/resume oddities

   - the usual lot of fixes and cleanups all over the place

  The parts which touch other areas (ARM/XEN) have been coordinated with
  the relevant maintainers.  Though this results in an handful of
  trivial to solve merge conflicts, which we preferred over nasty cross
  tree merge dependencies.

  The patches which have been committed in the last few days are bug
  fixes plus the posix timer lot.  The latter was in akpms queue and
  next for quite some time; they just got forgotten and Frederic
  collected them last minute."

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
  hrtimer: Remove unused variable
  hrtimers: Move SMP function call to thread context
  clocksource: Reselect clocksource when watchdog validated high-res capability
  posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  posix_timers: fix racy timer delta caching on task exit
  posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
  selftests: add basic posix timers selftests
  posix_cpu_timers: consolidate expired timers check
  posix_cpu_timers: consolidate timer list cleanups
  posix_cpu_timer: consolidate expiry time type
  tick: Sanitize broadcast control logic
  tick: Prevent uncontrolled switch to oneshot mode
  tick: Make oneshot broadcast robust vs. CPU offlining
  x86: xen: Sync the CMOS RTC as well as the Xen wallclock
  x86: xen: Sync the wallclock when the system time is set
  timekeeping: Indicate that clock was set in the pvclock gtod notifier
  timekeeping: Pass flags instead of multiple bools to timekeeping_update()
  xen: Remove clock_was_set() call in the resume path
  hrtimers: Support resuming with two or more CPUs online (but stopped)
  timer: Fix jiffies wrap behavior of round_jiffies_common()
  ...
2013-07-06 14:09:38 -07:00
Linus Torvalds
2cb7b5a38c irqdomain refactoring for v3.11
This is the long awaited simplification of irqdomain. It gets rid of the
 different types of irq domains and instead both linear and tree mappings
 can be supported in a single domain. Doing this removes a lot of special
 case code and makes irq domains simpler to understand overall.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR1mnQAAoJEEFnBt12D9kBW7gP/jQpLqFU6PTdsCMX459Ib7fE
 JGLQWYjAsqVbVT3H8LZYOZl+ItIbXSQVk+wf0mislnOzxtGxNCxZeJdniFK2RWyz
 sGNm/eEZfp9bvm9jlrPE615QE7U7c9fQ5dqETNS4UMMFFmjgrsY4Z82zYH2/Y/jK
 YbxFUT6+fep5QaF7lYM7phfrs7FrvnpCoxCio6LO+GgxnMxHSsGJ8L9JcojZ5Qoa
 MwlSwsMaQrWJ6PNVbB2YrtlMxjHkNpQ7EV7bNF693PjACVpXf3ItpvQ1k6W/ns06
 j6Nysj4nbjsTdWOVh5mY7tPfknZUDs38rLT5s+RbfHqySMIGQXp6CzieYaJ+lxV9
 YsItyfZE7h3anebjwRv/9XqLNsDM4KmSIJGcnq+QxNSGxO3rXkItWtpCKaqYdWnN
 wI359rSZ//x2ltFzvB14mZJm2j7r9Hh7gH8M5h8W8YTqCbe3oSES3iRhJqnyPxMX
 of4PsQaTAicuLwberGdVkKafg9XIsHPoebBzDxssB4nxPrEKIILL4g54x1rV17TZ
 kBXurd79FhP4gOQxJHf5uEpdxrQ3Cm/PQpD5CG2rZJrrcV7hfP4QMMFHqqbFSIOu
 2h4M4fZX2gsDh3RLboGRGn0GckSPTHiO05yeeBriLzMCVGcZ1JMqAxLUSJpH/UYl
 EzFy9RpwnwLoUPi8jUXE
 =yJRw
 -----END PGP SIGNATURE-----

Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux

Pull irqdomain refactoring from Grant Likely:
 "This is the long awaited simplification of irqdomain.  It gets rid of
  the different types of irq domains and instead both linear and tree
  mappings can be supported in a single domain.  Doing this removes a
  lot of special case code and makes irq domains simpler to understand
  overall"

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux:
  irq: fix checkpatch error
  irqdomain: Include hwirq number in /proc/interrupts
  irqdomain: make irq_linear_revmap() a fast path again
  irqdomain: remove irq_domain_generate_simple()
  irqdomain: Refactor irq_domain_associate_many()
  irqdomain: Beef up debugfs output
  irqdomain: Clean up aftermath of irq_domain refactoring
  irqdomain: Eliminate revmap type
  irqdomain: merge linear and tree reverse mappings.
  irqdomain: Add a name field
  irqdomain: Replace LEGACY mapping with LINEAR
  irqdomain: Relax failure path on setting up mappings
2013-07-06 12:37:04 -07:00
Thomas Gleixner
2b0f89317e Merge branch 'timers/posix-cpu-timers-for-tglx' of
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core

Frederic sayed: "Most of these patches have been hanging around for
several month now, in -mmotm for a significant chunk. They already
missed a few releases."

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-04 23:11:22 +02:00
Linus Torvalds
80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Linus Torvalds
7f0ef0267e Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - various misc bits
 - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
   distracted.  There has been quite a bit of activity.
 - About half the MM queue
 - Some backlight bits
 - Various lib/ updates
 - checkpatch updates
 - zillions more little rtc patches
 - ptrace
 - signals
 - exec
 - procfs
 - rapidio
 - nbd
 - aoe
 - pps
 - memstick
 - tools/testing/selftests updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
  tools/testing/selftests: don't assume the x bit is set on scripts
  selftests: add .gitignore for kcmp
  selftests: fix clean target in kcmp Makefile
  selftests: add .gitignore for vm
  selftests: add hugetlbfstest
  self-test: fix make clean
  selftests: exit 1 on failure
  kernel/resource.c: remove the unneeded assignment in function __find_resource
  aio: fix wrong comment in aio_complete()
  drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
  drivers/memstick/host/r592.c: convert to module_pci_driver
  drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
  pps-gpio: add device-tree binding and support
  drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
  drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
  drivers/parport/share.c: use kzalloc
  Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
  aoe: update internal version number to v83
  aoe: update copyright date
  aoe: perform I/O completions in parallel
  ...
2013-07-03 17:12:13 -07:00
Jiang Liu
46a841329a mm/x86: prepare for removing num_physpages and simplify mem_init()
Prepare for removing num_physpages and simplify mem_init().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:38 -07:00
Linus Torvalds
f991fae5c6 Power management and ACPI updates for 3.11-rc1
- Hotplug changes allowing device hot-removal operations to fail
   gracefully (instead of crashing the kernel) if they cannot be
   carried out completely.  From Rafael J Wysocki and Toshi Kani.
 
 - Freezer update from Colin Cross and Mandeep Singh Baines targeted
   at making the freezing of tasks a bit less heavy weight operation.
 
 - cpufreq resume fix from Srivatsa S Bhat for a regression introduced
   during the 3.10 cycle causing some cpufreq sysfs attributes to
   return wrong values to user space after resume.
 
 - New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to
   provide information previously available via related_cpus from
   Lan Tianyu.
 
 - cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin,
   Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and
   Tang Yuantian.
 
 - Fix for an ACPICA regression causing suspend/resume issues to
   appear on some systems introduced during the 3.4 development cycle
   from Lv Zheng.
 
 - ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng,
   Chao Guan, and Zhang Rui.
 
 - New cupidle driver for Xilinx Zynq processors from Michal Simek.
 
 - cpuidle fixes and cleanups from Daniel Lezcano.
 
 - Changes to make suspend/resume work correctly in Xen guests from
   Konrad Rzeszutek Wilk.
 
 - ACPI device power management fixes and cleanups from Fengguang Wu
   and Rafael J Wysocki.
 
 - ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo.
 
 - Fix for the IA-64 issue that was the reason for reverting commit
   9f29ab1 and updates of the ACPI scan code from Rafael J Wysocki.
 
 - Mechanism for adding CMOS RTC address space handlers from Lan Tianyu
   (to allow some EC-related breakage to be fixed on some systems).
 
 - Spec-compliant implementation of acpi_os_get_timer() from
   Mika Westerberg.
 
 - Modification of do_acpi_find_child() to execute _STA in order to
   to avoid situations in which a pointer to a disabled device object
   is returned instead of an enabled one with the same _ADR value.
   From Jeff Wu.
 
 - Intel BayTrail PCH (Platform Controller Hub) support for the ACPI
   Intel Low-Power Subsystems (LPSS) driver and modificaions of that
   driver to work around a couple of known BIOS issues from
   Mika Westerberg and Heikki Krogerus.
 
 - EC driver fix from Vasiliy Kulikov to make it use get_user() and
   put_user() instead of dereferencing user space pointers blindly.
 
 - Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and
   Toshi Kani.
 
 - Modification of the "runtime idle" helper routine to take the return
   values of the callbacks executed by it into account and to call
   rpm_suspend() if they return 0, which allows some code bloat
   reduction to be done, from Rafael J Wysocki and Alan Stern.
 
 - New trace points for PM QoS from Sahara <keun-o.park@windriver.com>.
 
 - PM QoS documentation update from Lan Tianyu.
 
 - Assorted core PM code cleanups and changes from Bernie Thompson,
   Bjorn Helgaas, Julius Werner, and Shuah Khan.
 
 - New devfreq driver for the Exynos5-bus device from Abhilash Kesavan.
 
 - Minor devfreq cleanups, fixes and MAINTAINERS update from
   MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and
   Wei Yongjun.
 
 - OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control
   driver updates from Andrii Tseglytskyi and Nishanth Menon.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE
 9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ
 g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq
 wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X
 VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe
 CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I
 fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2
 8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp
 R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z
 9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d
 zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z
 rHBZfYGkJBrbGRu+Q1gs
 =VBBq
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "This time the total number of ACPI commits is slightly greater than
  the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
  remains the most active patch submitter.

  To me, the most significant change is the addition of offline/online
  device operations to the driver core (with the Greg's blessing) and
  the related modifications of the ACPI core hotplug code.  Next are the
  freezer updates from Colin Cross that should make the freezing of
  tasks a bit less heavy weight.

  We also have a couple of regression fixes, a number of fixes for
  issues that have not been identified as regressions, two new drivers
  and a bunch of cleanups all over.

  Highlights:

   - Hotplug changes to support graceful hot-removal failures.

     It sometimes is necessary to fail device hot-removal operations
     gracefully if they cannot be carried out completely.  For example,
     if memory from a memory module being hot-removed has been allocated
     for the kernel's own use and cannot be moved elsewhere, it's
     desirable to fail the hot-removal operation in a graceful way
     rather than to crash the kernel, but currenty a success or a kernel
     crash are the only possible outcomes of an attempted memory
     hot-removal.  Needless to say, that is not a very attractive
     alternative and it had to be addressed.

     However, in order to make it work for memory, I first had to make
     it work for CPUs and for this purpose I needed to modify the ACPI
     processor driver.  It's been split into two parts, a resident one
     handling the low-level initialization/cleanup and a modular one
     playing the actual driver's role (but it binds to the CPU system
     device objects rather than to the ACPI device objects representing
     processors).  That's been sort of like a live brain surgery on a
     patient who's riding a bike.

     So this is a little scary, but since we found and fixed a couple of
     regressions it caused to happen during the early linux-next testing
     (a month ago), nobody has complained.

     As a bonus we remove some duplicated ACPI hotplug code, because the
     ACPI-based CPU hotplug is now going to use the common ACPI hotplug
     code.

   - Lighter weight freezing of tasks.

     These changes from Colin Cross and Mandeep Singh Baines are
     targeted at making the freezing of tasks a bit less heavy weight
     operation.  They reduce the number of tasks woken up every time
     during the freezing, by using the observation that the freezer
     simply doesn't need to wake up some of them and wait for them all
     to call refrigerator().  The time needed for the freezer to decide
     to report a failure is reduced too.

     Also reintroduced is the check causing a lockdep warining to
     trigger when try_to_freeze() is called with locks held (which is
     generally unsafe and shouldn't happen).

   - cpufreq updates

     First off, a commit from Srivatsa S Bhat fixes a resume regression
     introduced during the 3.10 cycle causing some cpufreq sysfs
     attributes to return wrong values to user space after resume.  The
     fix is kind of fresh, but also it's pretty obvious once Srivatsa
     has identified the root cause.

     Second, we have a new freqdomain_cpus sysfs attribute for the
     acpi-cpufreq driver to provide information previously available via
     related_cpus.  From Lan Tianyu.

     Finally, we fix a number of issues, mostly related to the
     CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
     up some code.  The majority of changes from Viresh Kumar with bits
     from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
     Arnd Bergmann, and Tang Yuantian.

   - ACPICA update

     A usual bunch of updates from the ACPICA upstream.

     During the 3.4 cycle we introduced support for ACPI 5 extended
     sleep registers, but they are only supposed to be used if the
     HW-reduced mode bit is set in the FADT flags and the code attempted
     to use them without checking that bit.  That caused suspend/resume
     regressions to happen on some systems.  Fix from Lv Zheng causes
     those registers to be used only if the HW-reduced mode bit is set.

     Apart from this some other ACPICA bugs are fixed and code cleanups
     are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
     Zhang Rui.

   - cpuidle updates

     New driver for Xilinx Zynq processors is added by Michal Simek.

     Multidriver support simplification, addition of some missing
     kerneldoc comments and Kconfig-related fixes come from Daniel
     Lezcano.

   - ACPI power management updates

     Changes to make suspend/resume work correctly in Xen guests from
     Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
     cleanups and fixes of the ACPI device power state selection
     routine.

   - ACPI documentation updates

     Some previously missing pieces of ACPI documentation are added by
     Lv Zheng and Aaron Lu (hopefully, that will help people to
     uderstand how the ACPI subsystem works) and one outdated doc is
     updated by Hanjun Guo.

   - Assorted ACPI updates

     We finally nailed down the IA-64 issue that was the reason for
     reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
     against objects having scan handlers"), so we can fix it and move
     the ACPI scan handler check added to the ACPI video driver back to
     the core.

     A mechanism for adding CMOS RTC address space handlers is
     introduced by Lan Tianyu to allow some EC-related breakage to be
     fixed on some systems.

     A spec-compliant implementation of acpi_os_get_timer() is added by
     Mika Westerberg.

     The evaluation of _STA is added to do_acpi_find_child() to avoid
     situations in which a pointer to a disabled device object is
     returned instead of an enabled one with the same _ADR value.  From
     Jeff Wu.

     Intel BayTrail PCH (Platform Controller Hub) support is added to
     the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
     driver is modified to work around a couple of known BIOS issues.
     Changes from Mika Westerberg and Heikki Krogerus.

     The EC driver is fixed by Vasiliy Kulikov to use get_user() and
     put_user() instead of dereferencing user space pointers blindly.

     Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
     Kani.

   - Assorted power management updates

     The "runtime idle" helper routine is changed to take the return
     values of the callbacks executed by it into account and to call
     rpm_suspend() if they return 0, which allows us to reduce the
     overall code bloat a bit (by dropping some code that's not
     necessary any more after that modification).

     The runtime PM documentation is updated by Alan Stern (to reflect
     the "runtime idle" behavior change).

     New trace points for PM QoS are added by Sahara
     (<keun-o.park@windriver.com>).

     PM QoS documentation is updated by Lan Tianyu.

     Code cleanups are made and minor issues are addressed by Bernie
     Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

   - devfreq updates

     New driver for the Exynos5-bus device from Abhilash Kesavan.

     Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
     Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

   - OMAP power management updates

     Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
     updates from Andrii Tseglytskyi and Nishanth Menon."

* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
  cpufreq: Fix cpufreq regression after suspend/resume
  ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
  PM / Sleep: Warn about system time after resume with pm_trace
  cpufreq: don't leave stale policy pointer in cdbs->cur_policy
  acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
  cpufreq: make sure frequency transitions are serialized
  ACPI: implement acpi_os_get_timer() according the spec
  ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
  ACPI: Add CMOS RTC Operation Region handler support
  ACPI / processor: Drop unused variable from processor_perflib.c
  cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
  ...
2013-07-03 14:35:40 -07:00
Linus Torvalds
a9f4a7005f Thermal limit warnings are too scary and cause unnecessary concern
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRzhAdAAoJEKurIx+X31iByQEP/1eAEHNiSCrrKUm/ovGCiVBZ
 w+ZRJkcKYBj+/3cQUeMds7OevmM8fSU5t+4zGpikobQ75xCjz7vJ6nQNLrSylqQG
 Z4weVme6a529szRwv6H9an+psj5wP0yWAbCgBSw4/O3cogDS0eMD8T6ZhEi8mWJ9
 PRSdqOLdxl7AZpJpJpP3GF7VQLJLDbVv56FcKgAZ3mtFM6qr7HFiTtnTc+Jl33V5
 bM7v23HXyoyRk+JOUMtO+4BlrkKzbtL4Zh8aGw4s71QX3zZbP0BIw9sKffNMD+2k
 oYXOcJjTwbtxQqXKoSKOtvLUQzf/SDqzlFf7yU38Spw+nRECPhVVRO0aAXKOmFKD
 74qyNuW6ZXMw7xH4M7iG6MuM/cZIvueDtp0RSZVc6Gmb5CGb1aXz4GeotvKF5Bev
 nVL65UVo6XwfB4m77ayPYqfD6KX0BsWHdCzWABNl8WbgyCY5dXRJJs/c9iLCl05I
 EXRgWLJK/H/wOY9mj3NGcvRicYJigZP2luJEdwn8mVlpZl+3wstDcgq9Ijxr0LGi
 bj2H9ro3vSaXpAnJw3FR9QbXgH/8Nbb8sV7yBddy9zntrnfEfvrYtrhOKR+fPrQL
 C5zXeaXGEnqCEcYNESSjyhDCa1az49BlJC5ebEHHqdviHzRpVVi/EG920GViWw+D
 z+z25N5hi4yoUF/+1vc/
 =Dm06
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull thermal power-limit update from Tony Luck:
 "Thermal limit warnings are too scary and cause unnecessary concern"

* tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  x86 thermal: Disable power limit notification interrupt by default
  x86 thermal: Delete power-limit-notification console messages
2013-07-03 11:16:09 -07:00
Linus Torvalds
7c6809ff2b Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 UV update from Ingo Molnar:
 "There's a single commit in this tree, which adds support for a new SGI
  UV GRU (Global Reference Unit - fast NUMA messaging ASIC) hardware
  feature to scale up and beyond: an optional distributed mode that will
  allow per-node address mapping of local GRU space, as opposed to
  mapping all GRU hardware to the same contiguous high space"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/UV: Add GRU distributed mode mappings
2013-07-02 16:33:05 -07:00
Linus Torvalds
96a3d998fb Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tracing updates from Ingo Molnar:
 "This tree adds IRQ vector tracepoints that are named after the handler
  and which output the vector #, based on a zero-overhead approach that
  relies on changing the IDT entries, by Seiji Aguchi.

  The new tracepoints look like this:

   # perf list | grep -i irq_vector
    irq_vectors:local_timer_entry                      [Tracepoint event]
    irq_vectors:local_timer_exit                       [Tracepoint event]
    irq_vectors:reschedule_entry                       [Tracepoint event]
    irq_vectors:reschedule_exit                        [Tracepoint event]
    irq_vectors:spurious_apic_entry                    [Tracepoint event]
    irq_vectors:spurious_apic_exit                     [Tracepoint event]
    irq_vectors:error_apic_entry                       [Tracepoint event]
    irq_vectors:error_apic_exit                        [Tracepoint event]
   [...]"

* 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tracing: Add config option checking to the definitions of mce handlers
  trace,x86: Do not call local_irq_save() in load_current_idt()
  trace,x86: Move creation of irq tracepoints from apic.c to irq.c
  x86, trace: Add irq vector tracepoints
  x86: Rename variables for debugging
  x86, trace: Introduce entering/exiting_irq()
  tracing: Add DEFINE_EVENT_FN() macro
2013-07-02 16:31:49 -07:00
Linus Torvalds
3045f94a20 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The changes in this tree are:

   - ACPI APEI (ACPI Platform Error Interface) improvements, by Chen
     Gong
   - misc MCE fixes/cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Update MCE severity condition check
  mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
  ACPI/APEI: Update einj documentation for param1/param2
  ACPI/APEI: Add parameter check before error injection
  ACPI, APEI, EINJ: Fix error return code in einj_init()
  x86, mce: Fix "braodcast" typo
2013-07-02 16:30:46 -07:00
Linus Torvalds
1982269a5c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc improvements:

   - Fix /proc/mtrr reporting
   - Fix ioremap printout
   - Remove the unused pvclock fixmap entry on 32-bit
   - misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Correct function name output
  x86: Fix /proc/mtrr with base/size more than 44bits
  ix86: Don't waste fixmap entries
  x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
  x86_64: Correct phys_addr in cleanup_highmap comment
2013-07-02 16:29:05 -07:00
Linus Torvalds
fdd78889aa Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading update from Ingo Molnar:
 "Two main changes that improve microcode loading on AMD CPUs:

   - Add support for all-in-one binary microcode files that concatenate
     the microcode images of multiple processor families, by Jacob Shin

   - Add early microcode loading (embedded in the initrd) support, also
     by Jacob Shin"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, amd: Another early loading fixup
  x86, microcode, amd: Allow multiple families' bin files appended together
  x86, microcode, amd: Make find_ucode_in_initrd() __init
  x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m
  x86, microcode, amd: Early microcode patch loading support for AMD
  x86, microcode, amd: Refactor functions to prepare for early loading
  x86, microcode: Vendor abstract out save_microcode_in_initrd()
  x86, microcode, intel: Correct typo in printk
2013-07-02 16:28:10 -07:00
Linus Torvalds
d652df0b2f Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU changes from Ingo Molnar:
 "There are two bigger changes in this tree:

   - Add an [early-use-]safe static_cpu_has() variant and other
     robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS
     configurable debugging facility, motivated by recent obscure FPU
     code bugs, by Borislav Petkov

   - Reimplement FPU detection code in C and drop the old asm code, by
     Peter Anvin."

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: Use static_cpu_has_safe before alternatives
  x86: Add a static_cpu_has_safe variant
  x86: Sanity-check static_cpu_has usage
  x86, cpu: Add a synthetic, always true, cpu feature
  x86: Get rid of ->hard_math and all the FPU asm fu
2013-07-02 16:26:44 -07:00
Linus Torvalds
55a0d3ff60 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug update from Ingo Molnar:
 "Misc debuggability improvements:

   - Optimize the x86 CPU register printout a bit
   - Expose the tboot TXT log via debugfs
   - Small do_debug() cleanup"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tboot: Provide debugfs interfaces to access TXT log
  x86: Remove weird PTR_ERR() in do_debug
  x86/debug: Only print out DR registers if they are not power-on defaults
2013-07-02 16:25:06 -07:00
Linus Torvalds
35c23d5d79 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Two changes:

   - Extend 32-bit double fault debugging aid to 64-bit
   - Fix a build warning"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/cacheinfo: Shut up last long-standing warning
  x86: Extend #DF debugging aid to 64-bit
2013-07-02 16:24:24 -07:00
Linus Torvalds
57935b262c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc x86 cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S
  x86, cleanups: Remove extra tab in __flush_tlb_one()
  x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL
2013-07-02 16:23:50 -07:00
Linus Torvalds
002e44bfb5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull asm/x86 changes from Ingo Molnar:
 "Misc changes, with a bigger processor-flags cleanup/reorganization by
  Peter Anvin"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, asm, cleanup: Replace open-coded control register values with symbolic
  x86, processor-flags: Fix the datatypes and add bit number defines
  x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  linux/const.h: Add _BITUL() and _BITULL()
  x86/vdso: Convert use of typedef ctl_table to struct ctl_table
  x86: __force_order doesn't need to be an actual variable
2013-07-02 16:21:45 -07:00
Linus Torvalds
f0bb4c0ab0 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel improvements:

   - watchdog driver improvements by Li Zefan
   - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
   - event multiplexing via hrtimers and other improvements by Stephane
     Eranian
   - kernel stack use optimization by Andrew Hunter
   - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
   - NMI handling rate-limits by Dave Hansen
   - various hw_breakpoint fixes by Oleg Nesterov
   - hw_breakpoint overflow period sampling and related signal handling
     fixes by Jiri Olsa
   - Intel Haswell PMU support by Andi Kleen

  Tooling improvements:

   - Reset SIGTERM handler in workload child process, fix from David
     Ahern.
   - Makefile reorganization, prep work for Kconfig patches, from Jiri
     Olsa.
   - Add automated make test suite, from Jiri Olsa.
   - Add --percent-limit option to 'top' and 'report', from Namhyung
     Kim.
   - Sorting improvements, from Namhyung Kim.
   - Expand definition of sysfs format attribute, from Michael Ellerman.

  Tooling fixes:

   - 'perf tests' fixes from Jiri Olsa.
   - Make Power7 CPI stack events available in sysfs, from Sukadev
     Bhattiprolu.
   - Handle death by SIGTERM in 'perf record', fix from David Ahern.
   - Fix printing of perf_event_paranoid message, from David Ahern.
   - Handle realloc failures in 'perf kvm', from David Ahern.
   - Fix divide by 0 in variance, from David Ahern.
   - Save parent pid in thread struct, from David Ahern.
   - Handle JITed code in shared memory, from Andi Kleen.
   - Fixes for 'perf diff', from Jiri Olsa.
   - Remove some unused struct members, from Jiri Olsa.
   - Add missing liblk.a dependency for python/perf.so, fix from Jiri
     Olsa.
   - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
   - No need to do locking when adding hists in perf report, only 'top'
     needs that, from Namhyung Kim.
   - Fix alignment of symbol column in in the hists browser (top,
     report) when -v is given, from NAmhyung Kim.
   - Fix 'perf top' -E option behavior, from Namhyung Kim.
   - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
   - Fix compile errors in bp_signal 'perf test', from Sukadev
     Bhattiprolu.

  ... and more things"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
  perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
  perf/x86: Fix shared register mutual exclusion enforcement
  perf/x86/intel: Support full width counting
  x86: Add NMI duration tracepoints
  perf: Drop sample rate when sampling is too slow
  x86: Warn when NMI handlers take large amounts of time
  hw_breakpoint: Introduce "struct bp_cpuinfo"
  hw_breakpoint: Simplify *register_wide_hw_breakpoint()
  hw_breakpoint: Introduce cpumask_of_bp()
  hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
  hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
  perf/x86/intel: Add mem-loads/stores support for Haswell
  perf/x86/intel: Support Haswell/v4 LBR format
  perf/x86/intel: Move NMI clearing to end of PMI handler
  perf/x86/intel: Add Haswell PEBS support
  perf/x86/intel: Add simple Haswell PMU support
  perf/x86/intel: Add Haswell PEBS record support
  perf/x86/intel: Fix sparse warning
  perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
  perf/x86/amd: Add IOMMU Performance Counter resource management
  ...
2013-07-02 16:15:23 -07:00
Rafael J. Wysocki
3b4550e0e0 Merge branch 'acpi-pm'
* acpi-pm:
  ACPI / PM: Rework and clean up acpi_dev_pm_get_state()
  ACPI / PM: Replace ACPI_STATE_D3 with ACPI_STATE_D3_COLD in device_pm.c
  ACPI / PM: Rename function acpi_device_power_state() and make it static
  ACPI / PM: acpi_processor_suspend() can be static
  xen / ACPI / sleep: Register an acpi_suspend_lowlevel callback.
  x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
2013-06-28 12:58:30 +02:00
Ingo Molnar
fb476cffd5 Changes to simplify the SDM mean that we can also simplify
the code for SRAR (software recoverable action required) errors.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRzKq7AAoJEKurIx+X31iBviQP+gJ6b9BemXj8/TLjWN3P3mAB
 bQHU9imlSXmggfauC1C4/b60k4G/D8d1hN1Lboak7TbFvY69ONcSodY9w6O9Lknb
 r83xT0oUIe5IYWqC42+godt8P59ed10lUa8F5eaA9DhhzcsCWO2GG/ITqCmyjEom
 zJGKlLytq6ViJf+6ig3gTmWgHU4tvlJvOPh/O0WNWXH9kbOtnsLT8ImY9GMO/R1B
 LdItg2nxbDZmhjLOsKamaj2YpPbZ9Vtvam9CnQwirbUAkdGZfmRsq+K9y8O5MjHJ
 GRqOKwQ3BdKBUIEu36BxgIGHSg8hWYSefGD8nz8IP+QHwy5fx7SREdgx5M+AMSFt
 8DewSUij2wMwH/vuP4KrdYWva6LNMBtR+NItCkzpdY5ykcdoSzo06nCotGcUrxVN
 iEmYywZzO3CTYJj0lWqdrhhiFln/9ZrhXaWUbRoX5m3HAQK7XeTPT+3O7Vs5fWDN
 3Fb/0EzyJjmv+Mysl8/PvR9umJs3ELK5MiVLIAIkuozh8+L33byk/WX+VOfqTsY4
 KXYDm6fck3tRfn+PJGrAMSwIEyhgVJArRJC39PydEUIwjlel6DRU0Rs26wQXtFh8
 DHxTIiEubmvctvi3DnHOztOI3d3EkAayU6Dk4cNSy46FwujJC0v+07EQav6BLA3e
 x1HK+3PVVSQYuqFIeyWa
 =PJXS
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE cleanup from Tony Luck:

 "Changes to simplify the SDM means that we can also simplify
  the code for SRAR (software recoverable action required) errors."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:09:26 +02:00
Qiaowei Ren
13bfd47a0e x86/tboot: Provide debugfs interfaces to access TXT log
These logs come from tboot (Trusted Boot, an open source,
pre-kernel/VMM module that uses Intel TXT to perform a
measured and verified launch of an OS kernel/VMM.).

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gang Wei <gang.wei@intel.com>
Link: http://lkml.kernel.org/r/1372053333-21788-1-git-send-email-qiaowei.ren@intel.com
[ Beautified the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:05:16 +02:00
Chen Gong
33d7885b59 x86/mce: Update MCE severity condition check
Update some SRAR severity conditions check to make it clearer,
according to latest Intel SDM Vol 3(June 2013), table 15-20.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-27 13:45:52 -07:00
Jacob Shin
9608d33b82 x86, microcode, amd: Another early loading fixup
commit cd1c32ca96 is an early premature
rendition of the patch. Augment it with this delta patch to:
  * correctly mark offset and size of the matching bin file
  * use __pa instead of __pa_nodebug during AP load
  * check for !initrd_start before using it

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130620152414.GA6676@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-26 14:55:37 -07:00
Stephane Eranian
983433b581 perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
Make sure intel_pmu_pebs_disable() and intel_pmu_pebs_enable()
are symmetrical w.r.t. PEBS-LL and precise store.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371824448-7306-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:51 +02:00
Stephane Eranian
2f7f73a520 perf/x86: Fix shared register mutual exclusion enforcement
This patch fixes a problem with the shared registers mutual
exclusion code and incremental event scheduling by the
generic perf_event code.

There was a bug whereby the mutual exclusion on the shared
registers was not enforced because of incremental scheduling
abort due to event constraints. As an example on Intel
Nehalem, consider the following events:

group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
group2= L1D_CACHE_LD:I_STATE

The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
are 3 instances here. The first group can be scheduled and is committed.
Then, the generic code tries to schedule group2 and this fails (because
there is no more counter to support the 3rd instance of L1D_CACHE_LD).
But in x86_schedule_events() error path, put_event_contraints() is invoked
on ALL the events and not just the ones that just failed. That causes the
"lock" on the shared offcore_response MSR to be released. Yet the first group
is actually scheduled and is exposed to reprogramming of that shared msr by
the sibling HT thread. In other words, there is no guarantee on what is
measured.

This patch fixes the problem by tagging committed events with the
PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
only the events NOT tagged have their constraint released. The tag
is eventually removed when the event in descheduled.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:49 +02:00
Linus Torvalds
54faf77d06 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Three small fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
  hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
  kprobes: Fix arch_prepare_kprobe to handle copy insn failures
2013-06-26 08:51:44 -10:00
Andi Kleen
069e0c3c40 perf/x86/intel: Support full width counting
Recent Intel CPUs like Haswell and IvyBridge have a new
alternative MSR range for perfctrs that allows writing the full
counter width. Enable this range if the hardware reports it
using a new capability bit.

Currently the perf code queries CPUID to get the counter width,
and sign extends the counter values as needed. The traditional
PERFCTR MSRs always limit to 32bit, even though the counter
internally is larger (usually 48 bits on recent CPUs)

When the new capability is set use the alternative range which
do not have these restrictions.

This lowers the overhead of perf stat slightly because it has to
do less interrupts to accumulate the counter value. On Haswell
it also avoids some problems with TSX aborting when the end of
the counter range is reached.

( See the patch "perf/x86/intel: Avoid checkpointed counters
  causing excessive TSX aborts" for more details. )

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 11:59:25 +02:00
Ingo Molnar
ca02c21674 Better comments so we understand our existing machine check
bank bitmaps - prelude to adding another bitmap soon.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRygQkAAoJEKurIx+X31iB5RsP/R6w/X4wbKlXD8K8e2qhSXU2
 oYVtaMN46M+xLRdNDdE1Y+clVKDQTuapfrlOtBLZVHIkNfqF5jAtu2O1wTERpx+F
 7lDHuWuFbddbqV2vqE6RJ6ZqsUTsrlzrqLSLWGSoHn36DlnkSm+L5vDDG75QzV+w
 fPpGpJTM3kRxPUHnmwvfniJxTiBjsli6FDZ1vGq4Ingu+xxOJDU6+FdWts08hn4Y
 +KLjs1JMJSdo3di05T6t4ARKBgW3B1lJnrIS6fGYMmax+kHQqO/zvzHs3A86LZyN
 7L0toEoZHa8VRMN6C1mw0XsFwPmyMOsrHrRpBlFgHpC7QLAzx6LkxchyDBqPL0mo
 W4bnoyu1QZUvblq1mrsnJa9yeyMjSHAeq2XTBj8Pbv20AKzmCbNl3aJ5tCDhPj4Y
 A+e8vk/tq8zWCjdf3SV/D+wjgEtkeWALZZj70maufu9AKtKXy5repa6DZCKEhyIC
 yh72c3gZ25IkxjwcHmJh+CYaKll9MgdW5toZkftBin2iOdWSaHS9QX2r4I5Awvbi
 m0Iwq5Fg8DSs3AhBLxiJi8L7N+RNa+7GoTH0z6UDtfAY3ExvepfwjPzLxAfTbPw6
 oK2wLfuPiNizl2DX1yNlizGD2YSRiWKoap3Iog84A3IMTIe1nJ+97GHvsVcGna8I
 zXjnZDNQThAgBswtNY1s
 =U/Kg
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-bitmap-comment' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE updates from Tony Luck:

 "Better comments so we understand our existing machine check
  bank bitmaps - prelude to adding another bitmap soon."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 10:53:45 +02:00
H. Peter Anvin
a3d7b7dddc x86, asm, cleanup: Replace open-coded control register values with symbolic
Clean up an unnecessary open-coded control register values.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org
2013-06-25 16:26:06 -07:00
H. Peter Anvin
1adfa76a95 x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
Bit 1 in the x86 EFLAGS is always set.  Name the macro something that
actually tries to explain what it is all about, rather than being a
tautology.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org
2013-06-25 16:25:32 -07:00
Naveen N. Rao
0644414e62 mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps.  Provide comments so that we all know exactly what these
are used for, and why.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-25 13:53:27 -07:00
Yinghai Lu
d5c78673b1 x86: Fix /proc/mtrr with base/size more than 44bits
On one sytem that mtrr range is more then 44bits, in dmesg we have
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-DFFFF write-through
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
[    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
[    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
[    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
[    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   8 disabled
[    0.000000]   9 disabled

but /proc/mtrr report wrong:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

so bit 44 and bit 45 get cut off.

We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
1. for base, we miss cast base_lo to 64bit before shifting.
Fix that by adding u64 casting.

2. for size, it only can handle 44 bits aka 32bits + page_shift
Fix that with 64bit mask instead of 32bit mask_lo, then range could be
more than 44bits.
At the same time, we need to update size_or_mask for old cpus that does
support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
to all 1s, otherwise will not get correct size for them.

Also fix mtrr_add_page: it should check base and (base + size - 1)
instead of base and size, as base and size could be small but
base + size could bigger enough to be out of boundary. We can
use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.

So When are we going to have size more than 44bits? that is 16TiB.

after patch we have right ouput:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

-v2: simply checking in mtrr_add_page according to hpa.

[ hpa: This probably wants to go into -stable only after having sat in
  mainline for a bit.  It is not a regression. ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-25 13:08:10 -07:00
H. Peter Anvin
5236eb968e Merge remote-tracking branch 'trace/tip/x86/trace' into x86/trace
Fix from Steven Rostedt.
2013-06-24 11:01:09 -07:00
Grant Likely
ddaf144c61 irqdomain: Refactor irq_domain_associate_many()
Originally, irq_domain_associate_many() was designed to unwind the
mapped irqs on a failure of any individual association. However, that
proved to be a problem with certain IRQ controllers. Some of them only
support a subset of irqs, and will fail when attempting to map a
reserved IRQ. In those cases we want to map as many IRQs as possible, so
instead it is better for irq_domain_associate_many() to make a
best-effort attempt to map irqs, but not fail if any or all of them
don't succeed. If a caller really cares about how many irqs got
associated, then it should instead go back and check that all of the
irqs is cares about were mapped.

The original design open-coded the individual association code into the
body of irq_domain_associate_many(), but with no longer needing to
unwind associations, the code becomes simpler to split out
irq_domain_associate() to contain the bulk of the logic, and
irq_domain_associate_many() to be a simple loop wrapper.

This patch also adds a new error check to the associate path to make
sure it isn't called for an irq larger than the controller can handle,
and adds locking so that the irq_domain_mutex is held while setting up a
new association.

v3: Fixup missing change to irq_domain_add_tree()
v2: Fixup x86 warning. irq_domain_associate_many() no longer returns an
    error code, but reports errors to the printk log directly. In the
    majority of cases we don't actually want to fail if there is a
    problem, but rather log it and still try to boot the system.

Signed-off-by: Grant Likely <grant.likely@linaro.org>

irqdomain: Fix flubbed irq_domain_associate_many refactoring

commit d39046ec72, "irqdomain: Refactor irq_domain_associate_many()" was
missing the following hunk which causes a boot failure on anything using
irq_domain_add_tree() to allocate an irq domain.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2013-06-24 14:01:42 +01:00
Dave Hansen
0c4df02d73 x86: Add NMI duration tracepoints
This patch has been invaluable in my adventures finding
issues in the perf NMI handler.  I'm as big a fan of
printk() as anybody is, but using printk() in NMIs is
deadly when they're happening frequently.

Even hacking in trace_printk() ended up eating enough
CPU to throw off some of the measurements I was making.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:58 +02:00
Dave Hansen
14c63f17b1 perf: Drop sample rate when sampling is too slow
This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second.  If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.

This way, we don't have a runaway sampling process eating up the
CPU.

This patch can tend to drop the sample rate down to level where
perf doesn't work very well.  *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.

I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:57 +02:00
Dave Hansen
2ab00456ea x86: Warn when NMI handlers take large amounts of time
I have a system which is causing all kinds of problems.  It has
8 NUMA nodes, and lots of cores that can fight over cachelines.
If things are not working _perfectly_, then NMIs can take longer
than expected.

If we get too many of them backed up to each other, we can
easily end up in a situation where we are doing nothing *but*
running NMIs.  The biggest problem, though, is that this happens
_silently_.  You might be lucky to get an hrtimer warning, but
most of the time system simply hangs.

This patch should at least give us some warning before we fall
off the cliff.  the warnings look like this:

	nmi_handle: perf_event_nmi_handler() took: 26095071 ns

The message is triggered whenever we notice the longest NMI
we've seen to date.  You can always view and reset this value
via the debugfs interface if you like.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:56 +02:00
Seiji Aguchi
33e5ff634f x86/tracing: Add config option checking to the definitions of mce handlers
In case CONFIG_X86_MCE_THRESHOLD and CONFIG_X86_THERMAL_VECTOR
are disabled, kernel build fails as follows.

   arch/x86/built-in.o: In function `trace_threshold_interrupt':
   (.entry.text+0x122b): undefined reference to `smp_trace_threshold_interrupt'
   arch/x86/built-in.o: In function `trace_thermal_interrupt':
   (.entry.text+0x132b): undefined reference to `smp_trace_thermal_interrupt'

In this case, trace_threshold_interrupt/trace_thermal_interrupt
are not needed to define.

So, add config option checking to their definitions in entry_64.S.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/51C58B8A.2080808@hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:41:36 +02:00
Steven Rostedt (Red Hat)
2b4bc78956 trace,x86: Do not call local_irq_save() in load_current_idt()
As load_current_idt() is now what is used to update the IDT for the
switches needed for NMI, lockdep debug, and for tracing, it must not
call local_irq_save(). This is because one of the users of this is
lockdep, which does tracing of local_irq_save() and when the debug
trap is hit, we need to update the IDT before tracing interrupts
being disabled. As load_current_idt() is used to do this, calling
local_irq_save() which lockdep traces, defeats the point of calling
load_current_idt().

As interrupts are already disabled when used by lockdep and NMI, the
only other user is tracing that can disable interrupts itself. Simply
have the tracing update disable interrupts before calling load_current_idt()
instead of breaking the other users.

Here's the dump that happened:

------------[ cut here ]------------
WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398()
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled)
Modules linked in:
CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90
 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011
 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0
Call Trace:
 [<ffffffff8192822b>] dump_stack+0x19/0x1b
 [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80
 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffff810341f7>] copy_process+0x2c3/0x1398
 [<ffffffff8103539d>] do_fork+0xa8/0x260
 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f
 [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff810355cf>] SyS_clone+0x16/0x18
 [<ffffffff81938369>] stub_clone+0x69/0x90
 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b
---[ end trace 8b157a9d20ca1aa2 ]---

in fork.c:

 #ifdef CONFIG_PROVE_LOCKING
	DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here
	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-22 13:16:19 -04:00
Linus Torvalds
f71194a7d4 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This series fixes a couple of build failures, and fixes MTRR cleanup
  and memory setup on very specific memory maps.

  Finally, it fixes triggering backtraces on all CPUs, which was
  inadvertently disabled on x86."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Fix dummy variable buffer allocation
  x86: Fix trigger_all_cpu_backtrace() implementation
  x86: Fix section mismatch on load_ucode_ap
  x86: fix build error and kconfig for ia32_emulation and binfmt
  range: Do not add new blank slot with add_range_with_merge
  x86, mtrr: Fix original mtrr range get for mtrr_cleanup
2013-06-21 06:33:48 -10:00
Linus Torvalds
9d0be540d7 KVM fixes for 3.10-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRwc4AAAoJEBvWZb6bTYbyT/wQAIAvvjM+jlqOsZCUjnutdWXf
 VGvJB1jXPEa6cuUkaE3RYO8rLLJz/PC4Geg8I/awtjeSkPvldeu0FihSkdbrNkpJ
 3dSAJ0cR8+ScUZB74Lt1pfwwEAdZk61MU1opD1qy3wWTy5Rk2xp4qzE2aCVJoLkh
 K8gscLrCHR47DELcm+AWCtHt/0VOZSAVe9z/7Qf9eAMCNmH/efHgrZZTxF/1aBkI
 7XaZdqT+d84D/Bc2isdRNvFYt9rkuII2GCeRciw2t3QsgVoBXn7FVK2OXbufIaYW
 /XX0YiNpSnUpcpOtGw2MwzpKgDSRE9QbnOUVsWThTDeG7Q5d72ioCFuRDba8p3Lc
 u146nQM495bAdJQz6IW3JCeiZlQ86/QjFquk9hJuCwQpuHmlRpVEqFbLVnZEnYvk
 lc9dnK7BcaftbMgo/j6DMv6Bpq/EDp1JrPXzNT4BwG7ptArEXzmdoktwpONjrZzI
 1oZV5xCcNhTjyqlndiZKxneKXoqz/3NOdR+EV0yh3+0l4PK6C52jd6PaKRv7qN3l
 Lt35uFrmKqzOR12KVStYREyh1Sy43ADE4tIIPnOBbcXz23Df4/vVSb1Bqv0nPq4i
 JhsRsGoR8ZdDBd7c40XlvpxOiMmubFLSklX0WaKn2yIZH9FMj63Ak0wet7qhaqb8
 lbJRUH8gnK0RCVEoUFpB
 =yGPS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Three one-line fixes for my first pull request; one for x86 host, one
  for x86 guest, one for PPC"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86: kvmclock: zero initialize pvclock shared memory area
  kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
  KVM: x86: remove vcpu's CPL check in host-invoked XCR set
2013-06-21 06:29:22 -10:00
Steven Rostedt (Red Hat)
83ab85140b trace,x86: Move creation of irq tracepoints from apic.c to irq.c
Compiling without CONFIG_X86_LOCAL_APIC set, apic.c will not be
compiled, and the irq tracepoints will not be created via the
CREATE_TRACE_POINTS macro. When CONFIG_X86_LOCAL_APIC is not set,
we get the following build error:

  LD      init/built-in.o
arch/x86/built-in.o: In function `trace_x86_platform_ipi_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o: In function `trace_x86_platform_ipi_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o: In function `trace_irq_work_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o: In function `trace_irq_work_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_exit'
arch/x86/built-in.o:(__jump_table+0x8): undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o:(__jump_table+0x14): undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o:(__jump_table+0x20): undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o:(__jump_table+0x2c): undefined reference to `__tracepoint_irq_work_exit'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

As irq.c is always compiled for x86, it is a more appropriate location
to create the irq tracepoints.

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-21 10:33:28 -04:00
Seiji Aguchi
cf910e83ae x86, trace: Add irq vector tracepoints
[Purpose of this patch]

As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.

http://www.spinics.net/lists/mm-commits/msg85707.html

<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers.  Tracing such events gives
us the information about IRQ interaction with other system events.

The trace also tells where the system is spending its time.  We want to
know which cores are handling interrupts and how they are affecting other
processes in the system.  Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>

On the other hand, my usecase is tracing just local timer event and
getting a value of instruction pointer.

I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.

[Patch Description]

Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.

So, add following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
   - local_timer_vector
   - reschedule_vector
   - call_function_vector
   - call_function_single_vector
   - irq_work_entry_vector
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
makes a zero when tracepoints are disabled. Detailed explanations are as follows.
 - Create trace irq handlers with entering_irq()/exiting_irq().
 - Create a new IDT, trace_idt_table, at boot time by adding a logic to
   _set_gate(). It is just a copy of original idt table.
 - Register the new handlers for tracpoints to the new IDT by introducing
   macros to alloc_intr_gate() called at registering time of irq_vector handlers.
 - Add checking, whether irq vector tracing is on/off, into load_current_idt().
   This has to be done below debug checking for these reasons.
   - Switching to debug IDT may be kicked while tracing is enabled.
   - On the other hands, switching to trace IDT is kicked only when debugging
     is disabled.

In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
used for other purposes.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:34 -07:00
Seiji Aguchi
629f4f9d59 x86: Rename variables for debugging
Rename variables for debugging to describe meaning of them precisely.

Also, introduce a generic way to switch IDT by checking a current state,
debug on/off.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:13 -07:00
Seiji Aguchi
eddc0e922a x86, trace: Introduce entering/exiting_irq()
When implementing tracepoints in interrupt handers, if the tracepoints are
simply added in the performance sensitive path of interrupt handers,
it may cause potential performance problem due to the time penalty.

To solve the problem, an idea is to prepare non-trace/trace irq handers and
switch their IDTs at the enabling/disabling time.

So, let's introduce entering_irq()/exiting_irq() for pre/post-
processing of each irq handler.

A way to use them is as follows.

Non-trace irq handler:
smp_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	exiting_irq();		/* post-processing of this handler */

}

Trace irq_handler:
smp_trace_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	trace_irq_entry();	/* tracepoint for irq entry */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	trace_irq_exit();	/* tracepoint for irq exit */
	exiting_irq();		/* post-processing of this handler */

}

If tracepoints can place outside entering_irq()/exiting_irq() as follows,
it looks cleaner.

smp_trace_irq_handler()
{
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
}

But it doesn't work.
The problem is with irq_enter/exit() being called. They must be called before
trace_irq_enter/exit(),  because of the rcu_irq_enter() must be called before
any tracepoints are used, as tracepoints use  rcu to synchronize.

As a possible alternative, we may be able to call irq_enter() first as follows
if irq_enter() can nest.

smp_trace_irq_hander()
{
	irq_entry();
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
	irq_exit();
}

But it doesn't work, either.
If irq_enter() is nested, it may have a time penalty because it has to check if it
was already called or not. The time penalty is not desired in performance sensitive
paths even if it is tiny.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:01 -07:00
H. Peter Anvin
f037e416af x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S
There is no point in using "xorq" to clear a register... use "xorl" to
clear the bottom 32 bits, and the upper 32 bits get cleared by virtue
of zero extension.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/n/tip-b76zi1gep39c0zs8fbvkhie9@git.kernel.org
2013-06-20 21:30:04 -07:00
H. Peter Anvin
e6bca5a6a8 Linux 3.10-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRvOHmAAoJEHm+PkMAQRiGpOYIAI/OMDjCMro8S1FjDPGr+wsz
 r/u4h4DiHQTdAUSBYhksgXVBSm02hGhDF3tz7u9oAXuv+4zlGUMZNjmEmLOFfdyd
 Ve9vqMrUkdwA0jt0d7AYVjtCy/j/yVe5szXZCksHXOZiyb78SEZPQgHc13CJ4lKI
 K2jPk0sMko7d5ptALPIX+ome48M+3w3k1HuQ62Znm4pFSADcFGpGdf40HP/5LL+I
 Llp3XHJ5OZrcHRal0t39oHJOCWW61bnYXTWel9J7XoUztebF2cRgFydAhtrto42z
 x7ELOKVY197WH8l56cnVHrISneQd4kgWrooxLYy6QsJl73/qvQBX+7nmkoes7so=
 =qeV0
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc6' into x86/cleanups

Linux 3.10-rc6

We need a change that is the mainline tree for further work.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 21:13:55 -07:00
Borislav Petkov
4a90a99c4f x86: Add a static_cpu_has_safe variant
We want to use this in early code where alternatives might not have run
yet and for that case we fall back to the dynamic boot_cpu_has.

For that, force a 5-byte jump since the compiler could be generating
differently sized jumps for each label.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:38:14 -07:00