Commit graph

5041 commits

Author SHA1 Message Date
Ingo Molnar
9676e73a9e Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core
Conflicts:
	kernel/trace/ftrace.c

[ We conflicted here because we backported a few fixes to
  tracing/urgent - which has different internal APIs. ]
2008-11-19 10:04:25 +01:00
Hiroshi Shimamoto
20a4a236c7 x86: uaccess_64: fix return value in __copy_from_user()
__copy_from_user() will return invalid value 16 when it fails to
access user space and the size is 10.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 22:28:58 +01:00
Steve Conklin
093bac154c x86: quirk for reboot stalls on a Dell Optiplex 330
Dell Optiplex 330 appears to hang on reboot. This is resolved by adding
a quirk to set bios reboot.

Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Signed-off-by: Steve Conklin <steve.conklin@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 22:22:29 +01:00
Ingo Molnar
73f56c0d35 Merge branch 'iommu-fixes-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2008-11-18 16:48:49 +01:00
Philipp Kohlbecher
0af40a4b10 x86: more general identifier for Phoenix BIOS
Impact: widen the reach of the low-memory-protect DMI quirk

Phoenix BIOSes variously identify their vendor as "Phoenix Technologies,
LTD" or "Phoenix Technologies LTD" (without the comma.)

This patch makes the identification string in the bad_bios_dmi_table
more general (following a suggestion by Ingo Molnar), so that both
versions are handled.

Again, the patched file compiles cleanly and the patch has been tested
successfully on my machine.

Signed-off-by: Philipp Kohlbecher <xt28@gmx.de>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 16:11:36 +01:00
Joerg Roedel
8501c45cc3 AMD IOMMU: check for next_bit also in unmapped area
Impact: fix possible use of stale IO/TLB entries

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-18 15:44:43 +01:00
Joerg Roedel
695b5676c7 AMD IOMMU: fix fullflush comparison length
Impact: fix comparison length for 'fullflush'

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-18 15:44:42 +01:00
Joerg Roedel
3ce1f93c6d AMD IOMMU: enable device isolation per default
Impact: makes device isolation the default for AMD IOMMU

Some device drivers showed double-free bugs of DMA memory while testing
them with AMD IOMMU. If all devices share the same protection domain
this can lead to data corruption and data loss. Prevent this by putting
each device into its own protection domain per default.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-18 15:44:31 +01:00
Joerg Roedel
e5e1f606ec AMD IOMMU: add parameter to disable device isolation
Impact: add a new AMD IOMMU kernel command line parameter

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-18 15:43:23 +01:00
Ingo Molnar
10db4ef7b9 x86, PEBS/DS: fix code flow in ds_request()
this compiler warning:

  arch/x86/kernel/ds.c: In function 'ds_request':
  arch/x86/kernel/ds.c:368: warning: 'context' may be used uninitialized in this function

Shows that the code flow in ds_request() is buggy - it goes into
the unlock+release-context path even when the context is not allocated
yet.

First allocate the context, then do the other checks.

Also, take care with GFP allocations under the ds_lock spinlock.

Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 15:34:36 +01:00
Frederic Weisbecker
0231022cc3 tracing/function-return-tracer: add the overrun field
Impact: help to find the better depth of trace

We decided to arbitrary define the depth of function return trace as
"20". Perhaps this is not enough. To help finding an optimal depth, we
measure now the overrun: the number of functions that have been missed
for the current thread. By default this is not displayed, we have to
do set a particular flag on the return tracer: echo overrun >
/debug/tracing/trace_options And the overrun will be printed on the
right.

As the trace shows below, the current 20 depth is not enough.

update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 11:11:00 +01:00
Venki Pallipadi
93ce99e849 x86: add rdtsc barrier to TSC sync check
Impact: fix incorrectly marked unstable TSC clock

Patch (commit 0d12cdd "sched: improve sched_clock() performance") has
a regression on one of the test systems here.

With the patch, I see:

 checking TSC synchronization [CPU#0 -> CPU#1]:
 Measured 28 cycles TSC warp between CPUs, turning off TSC clock.
 Marking TSC unstable due to check_tsc_sync_source failed

Whereas, without the patch syncs pass fine on all CPUs:

 checking TSC synchronization [CPU#0 -> CPU#1]: passed.

Due to this, TSC is marked unstable, when it is not actually unstable.
This is because syncs in check_tsc_wrap() goes away due to this commit.

As per the discussion on this thread, correct way to fix this is to add
explicit syncs as below?

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 00:15:02 +01:00
Eric Dumazet
a4a16beade oprofile: fix an overflow in ppro code
reset_value was changed from long to u64 in commit
b991702884 (oprofile: Implement Intel
architectural perfmon support)

But dynamic allocation of this array use a wrong type (long instead of
u64)

Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-11-17 18:47:36 +01:00
Yinghai Lu
d3c6aa1e69 x86: fix es7000 compiling
Impact: fix es7000 build

  CC      arch/x86/kernel/es7000_32.o
arch/x86/kernel/es7000_32.c: In function find_unisys_acpi_oem_table:
arch/x86/kernel/es7000_32.c:255: error: implicit declaration of function acpi_get_table_with_size
arch/x86/kernel/es7000_32.c:261: error: implicit declaration of function early_acpi_os_unmap_memory
arch/x86/kernel/es7000_32.c: In function unmap_unisys_acpi_oem_table:
arch/x86/kernel/es7000_32.c:277: error: implicit declaration of function __acpi_unmap_table
make[1]: *** [arch/x86/kernel/es7000_32.o] Error 1

we applied one patch out of order...

| commit a73aaedd95
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date:   Sun Sep 14 02:33:14 2008 -0700
|
|    x86: check dsdt before find oem table for es7000, v2
|
|    v2: use __acpi_unmap_table()

that patch need:

	x86: use early_ioremap in __acpi_map_table
	x86: always explicitly map acpi memory
	acpi: remove final __acpi_map_table mapping before setting acpi_gbl_permanent_mmap
	acpi/x86: introduce __apci_map_table, v4

submitted to the ACPI tree but not upstream yet.

fix it until those patches applied, need to revert this one

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 10:05:07 +01:00
Markus Metzger
d1f1e9c010 x86, bts: fix unlock problem in ds.c
Fix a problem where ds_request() returned an error without releasing the
ds lock.

Reported-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 08:25:36 +01:00
Frederic Weisbecker
e7d3737ea1 tracing/function-return-tracer: support for dynamic ftrace on function return tracer
This patch adds the support for dynamic tracing on the function return tracer.
The whole difference with normal dynamic function tracing is that we don't need
to hook on a particular callback. The only pro that we want is to nop or set
dynamically the calls to ftrace_caller (which is ftrace_return_caller here).

Some security checks ensure that we are not trying to launch dynamic tracing for
return tracing while normal function tracing is already running.

An example of trace with getnstimeofday set as a filter:

ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:57:38 +01:00
Frederic Weisbecker
b01c746617 tracing/function-return-tracer: add a barrier to ensure return stack index is incremented in memory
Impact: fix possible race condition in ftrace function return tracer

This fixes a possible race condition if index incrementation
is not immediately flushed in memory.

Thanks for Andi Kleen and Steven Rostedt for pointing out this issue
and give me this solution.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:57:37 +01:00
Steven Rostedt
31e889098a ftrace: pass module struct to arch dynamic ftrace functions
Impact: allow archs more flexibility on dynamic ftrace implementations

Dynamic ftrace has largly been developed on x86. Since x86 does not
have the same limitations as other architectures, the ftrace interaction
between the generic code and the architecture specific code was not
flexible enough to handle some of the issues that other architectures
have.

Most notably, module trampolines. Due to the limited branch distance
that archs make in calling kernel core code from modules, the module
load code must create a trampoline to jump to what will make the
larger jump into core kernel code.

The problem arises when this happens to a call to mcount. Ftrace checks
all code before modifying it and makes sure the current code is what
it expects. Right now, there is not enough information to handle modifying
module trampolines.

This patch changes the API between generic dynamic ftrace code and
the arch dependent code. There is now two functions for modifying code:

  ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
       a nop, where the original text is calling addr. (mod is the
       module struct if called by module init)

  ftrace_make_caller(rec, addr) - convert the code rec->ip that should
       be a nop into a caller to addr.

The record "rec" now has a new field called "arch" where the architecture
can add any special attributes to each call site record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:36:02 +01:00
David Woodhouse
52168e60f7 Revert "x86: blacklist DMAR on Intel G31/G33 chipsets"
This reverts commit e51af66308, which was
wrongly hoovered up and submitted about a month after a better fix had
already been merged.

The better fix is commit cbda1ba898
("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do
this blacklisting based on the DMI identification for the offending
motherboard, since sometimes this chipset (or at least a chipset with
the same PCI ID) apparently _does_ actually have an IOMMU.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-15 11:37:16 -08:00
Ingo Molnar
24de38620d Merge branches 'tracing/branch-tracer', 'tracing/fastboot', 'tracing/function-return-tracer' and 'tracing/urgent' into tracing/core 2008-11-13 09:48:03 +01:00
Rafael J. Wysocki
604d205548 x86: make NUMA on 32-bit depend on EXPERIMENTAL again
My previous patch to make CONFIG_NUMA on x86_32 depend on BROKEN
turned out to be unnecessary, after all, since the source of the
hibernation vs CONFIG_NUMA problem turned out to be the fact that
we didn't take the NUMA KVA remapping into account in the
hibernation code.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:28:52 +01:00
Rafael J. Wysocki
97a70e548b x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set
Impact: fix crash during hibernation on 32-bit NUMA

The NUMA code on x86_32 creates special memory mapping that allows
each node's pgdat to be located in this node's memory.  For this
purpose it allocates a memory area at the end of each node's memory
and maps this area so that it is accessible with virtual addresses
belonging to low memory.  As a result, if there is high memory,
these NUMA-allocated areas are physically located in high memory,
although they are mapped to low memory addresses.

Our hibernation code does not take that into account and for this
reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and
with high memory present.  Fix this by adding a special mapping for
the NUMA-allocated memory areas to the temporary page tables created
during the last phase of resume.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:28:51 +01:00
Frederic Weisbecker
1dc1c6adf3 tracing/function-return-tracer: call prepare_ftrace_return by registers
Impact: Optimize a bit the function return tracer

This patch changes the calling convention of prepare_ftrace_return to
pass its arguments by register. This will optimize it a bit and
prepare it to support dynamic tracing.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:15:43 +01:00
Frederic Weisbecker
62d59d17a5 tracing/function-return-tracer: make the function return tracer lockless
Impact: remove spinlocks and irq disabling in function return tracer.

I've tried to figure out all of the race condition that could happen
when the tracer pushes or pops a return address trace to/from the
current thread_info.

Theory:

_ One thread can only execute on one cpu at a time. So this code
  doesn't need to be SMP-safe. Just drop the spinlock.

_ The only race could happen between the current thread and an
  interrupt. If an interrupt is raised, it will increase the index of
  the return stack storage and then execute until the end of the
  tracing to finally free the index it used. We don't need to disable
  irqs.

This is theorical. In practice, I've tested it with a two-core SMP and
had no problem at all. Perhaps -tip testing could confirm it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:15:43 +01:00
Steven Rostedt
2ed84eeb88 trace: rename unlikely profiler to branch profiler
Impact: name change of unlikely tracer and profiler

Ingo Molnar suggested changing the config from UNLIKELY_PROFILE
to BRANCH_PROFILING. I never did like the "unlikely" name so I
went one step farther, and renamed all the unlikely configurations
to a "BRANCH" variant.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 22:27:58 +01:00
Linus Torvalds
5d2007ebc2 Merge branch 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: Fix pit memory leak if unable to allocate irq source id
  KVM: ia64: fix vmm_spin_{un}lock for !CONFIG_SMP
  KVM: VMX: Set IGMT bit in EPT entry
  KVM: Require the PCI subsystem
  x86: KVM guest: fix section mismatch warning in kvmclock.c
  KVM: ia64: Use guest signal mask when blocking
  KVM: MMU: increase per-vcpu rmap cache alloc size
2008-11-12 10:38:42 -08:00
Linus Torvalds
08c1184fa2 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (47 commits)
  ACPI: pci_link: remove acpi_irq_balance_set() interface
  fujitsu-laptop: Add DMI callback for Lifebook S6420
  ACPI: EC: Don't do transaction from GPE handler in poll mode.
  ACPI: EC: lower interrupt storm treshold
  ACPICA: Use spinlock for acpi_{en|dis}able_gpe
  ACPI: EC: restart failed command
  ACPI: EC: wait for last write gpe
  ACPI: EC: make kernel messages more useful when GPE storm is detected
  ACPI: EC: revert msleep patch
  thinkpad_acpi: fingers off backlight if video.ko is serving this functionality
  sony-laptop: fingers off backlight if video.ko is serving this functionality
  msi-laptop: fingers off backlight if video.ko is serving this functionality
  fujitsu-laptop: fingers off backlight if video.ko is serving this functionality
  eeepc-laptop: fingers off backlight if video.ko is serving this functionality
  compal: fingers off backlight if video.ko is serving this functionality
  asus-acpi: fingers off backlight if video.ko is serving this functionality
  Acer-WMI: fingers off backlight if video.ko is serving this functionality
  ACPI video: if no ACPI backlight support, use vendor drivers
  ACPI: video: Ignore devices that aren't present in hardware
  Delete an unwanted return statement at evgpe.c
  ...
2008-11-12 10:24:46 -08:00
Ingo Molnar
2b7d0390a6 tracing: branch tracer, fix vdso crash
Impact: fix bootup crash

the branch tracer missed arch/x86/vdso/vclock_gettime.c from
disabling tracing, which caused such bootup crashes:

  [  201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077

also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
creating DISABLE_UNLIKELY_PROFILE facility for code to turn off
instrumentation on a per file basis.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 13:26:38 +01:00
Steven Rostedt
1f0d69a9fc tracing: profile likely and unlikely annotations
Impact: new unlikely/likely profiler

Andrew Morton recently suggested having an in-kernel way to profile
likely and unlikely macros. This patch achieves that goal.

When configured, every(*) likely and unlikely macro gets a counter attached
to it. When the condition is hit, the hit and misses of that condition
are recorded. These numbers can later be retrieved by:

  /debugfs/tracing/profile_likely    - All likely markers
  /debugfs/tracing/profile_unlikely  - All unlikely markers.

# cat /debug/tracing/profile_unlikely | head
 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
    2167        0   0 do_arch_prctl                  process_64.c         832
       0        0   0 do_arch_prctl                  process_64.c         804
    2670        0   0 IS_ERR                         err.h                34
   71230     5693   7 __switch_to                    process_64.c         673
   76919        0   0 __switch_to                    process_64.c         639
   43184    33743  43 __switch_to                    process_64.c         624
   12740    64181  83 __switch_to                    process_64.c         594
   12740    64174  83 __switch_to                    process_64.c         590

# cat /debug/tracing/profile_unlikely | \
  awk '{ if ($3 > 25) print $0; }' |head -20
   44963    35259  43 __switch_to                    process_64.c         624
   12762    67454  84 __switch_to                    process_64.c         594
   12762    67447  84 __switch_to                    process_64.c         590
    1478      595  28 syscall_get_error              syscall.h            51
       0     2821 100 syscall_trace_leave            ptrace.c             1567
       0        1 100 native_smp_prepare_cpus        smpboot.c            1237
   86338   265881  75 calc_delta_fair                sched_fair.c         408
  210410   108540  34 calc_delta_mine                sched.c              1267
       0    54550 100 sched_info_queued              sched_stats.h        222
   51899    66435  56 pick_next_task_fair            sched_fair.c         1422
       6       10  62 yield_task_fair                sched_fair.c         982
    7325     2692  26 rt_policy                      sched.c              144
       0     1270 100 pre_schedule_rt                sched_rt.c           1261
    1268    48073  97 pick_next_task_rt              sched_rt.c           884
       0    45181 100 sched_info_dequeued            sched_stats.h        177
       0       15 100 sched_move_task                sched.c              8700
       0       15 100 sched_move_task                sched.c              8690
   53167    33217  38 schedule                       sched.c              4457
       0    80208 100 sched_info_switch              sched_stats.h        270
   30585    49631  61 context_switch                 sched.c              2619

# cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
   39900    36577  47 pick_next_task                 sched.c              4397
   20824    15233  42 switch_mm                      mmu_context_64.h     18
       0        7 100 __cancel_work_timer            workqueue.c          560
     617    66484  99 clocksource_adjust             timekeeping.c        456
       0   346340 100 audit_syscall_exit             auditsc.c            1570
      38   347350  99 audit_get_context              auditsc.c            732
       0   345244 100 audit_syscall_entry            auditsc.c            1541
      38     1017  96 audit_free                     auditsc.c            1446
       0     1090 100 audit_alloc                    auditsc.c            862
    2618     1090  29 audit_alloc                    auditsc.c            858
       0        6 100 move_masked_irq                migration.c          9
       1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
       2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
       0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
    4514     2090  31 __grab_cache_page              filemap.c            2149
   12882   228786  94 mapping_unevictable            pagemap.h            50
       4       11  73 __flush_cpu_slab               slub.c               1466
  627757   330451  34 slab_free                      slub.c               1731
    2959    61245  95 dentry_lru_del_init            dcache.c             153
     946     1217  56 load_elf_binary                binfmt_elf.c         904
     102       82  44 disk_put_part                  genhd.h              206
       1        1  50 dst_gc_task                    dst.c                82
       0       19 100 tcp_mss_split_point            tcp_output.c         1126

As you can see by the above, there's a bit of work to do in rethinking
the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
that were more than 25%.

Note:  After submitting my first version of this patch, Andrew Morton
  showed me a version written by Daniel Walker, where I picked up
  the following ideas from:

  1)  Using __builtin_constant_p to avoid profiling fixed values.
  2)  Using __FILE__ instead of instruction pointers.
  3)  Using the preprocessor to stop all profiling of likely
       annotations from vsyscall_64.c.

Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
for their feed back on this patch.

(*) Not ever unlikely is recorded, those that are used by vsyscalls
 (a few of them) had to have profiling disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 11:52:02 +01:00
Ingo Molnar
60a011c736 Merge branch 'tracing/function-return-tracer' into tracing/fastboot 2008-11-12 10:17:09 +01:00
Ingo Molnar
d06bbd6695 Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core
Conflicts:
	kernel/trace/ring_buffer.c
2008-11-12 10:11:37 +01:00
Len Brown
3e0fe36483 Merge branch 'misc' into release 2008-11-11 21:14:11 -05:00
Bjorn Helgaas
32836259ff ACPI: pci_link: remove acpi_irq_balance_set() interface
This removes the acpi_irq_balance_set() interface from the PCI
interrupt link driver.

x86 used acpi_irq_balance_set() to tell the PCI interrupt link
driver to configure links to minimize IRQ sharing.  But the link
driver can easily figure out whether to turn on IRQ balancing
based on the IRQ model (PIC/IOAPIC/etc), so we can get rid of
that external interface.

It's better for the driver to figure this out at init-time.  If
we set it externally via the x86 code, the interface reduces
modularity, and we depend on the fact that acpi_process_madt()
happens before we process the kernel command line.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-11-11 21:12:05 -05:00
Avi Kivity
e17d1dc086 KVM: Fix pit memory leak if unable to allocate irq source id
Reported-By: Daniel Marjamäki <danielm77@spray.se>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-11-11 21:01:51 +02:00
Sheng Yang
928d4bf747 KVM: VMX: Set IGMT bit in EPT entry
There is a potential issue that, when guest using pagetable without vmexit when
EPT enabled, guest would use PAT/PCD/PWT bits to index PAT msr for it's memory,
which would be inconsistent with host side and would cause host MCE due to
inconsistent cache attribute.

The patch set IGMT bit in EPT entry to ignore guest PAT and use WB as default
memory type to protect host (notice that all memory mapped by KVM should be WB).

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-11 21:00:37 +02:00
Avi Kivity
ca93e992fd KVM: Require the PCI subsystem
PCI device assignment makes calls to pci code, so require it to be built
into the kernel.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-11-11 20:56:13 +02:00
Rakib Mullick
a29a2af378 x86: KVM guest: fix section mismatch warning in kvmclock.c
WARNING: arch/x86/kernel/built-in.o(.text+0x1722c): Section mismatch
in reference from the function kvm_setup_secondary_clock() to the
function .devinit.text:setup_secondary_APIC_clock()
The function kvm_setup_secondary_clock() references
the function __devinit setup_secondary_APIC_clock().
This is often because kvm_setup_secondary_clock lacks a __devinit
annotation or the annotation of setup_secondary_APIC_clock is wrong.

Signed-off-by: Md.Rakib H. Mullick <rakib.mullick@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-11 20:55:10 +02:00
Linus Torvalds
f21f237cf5 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context
  nohz: disable tick_nohz_kick_tick() for now
  irq: call __irq_enter() before calling the tick_idle_check
  x86: HPET: enter hpet_interrupt_handler with interrupts disabled
  x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP
  x86: HPET: convert WARN_ON to WARN_ON_ONCE
2008-11-11 10:53:50 -08:00
Marcelo Tosatti
c41ef344de KVM: MMU: increase per-vcpu rmap cache alloc size
The page fault path can use two rmap_desc structures, if:

- walk_addr's dirty pte update allocates one rmap_desc.
- mmu_lock is dropped, sptes are zapped resulting in rmap_desc being
freed.
- fetch->mmu_set_spte allocates another rmap_desc.

Increase to 4 for safety.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-11 20:53:34 +02:00
James Bottomley
6cd10f8db3 x86, voyager: fix smp generic helper voyager breakage
Impact: build/boot fix for x86/Voyager

This change:

| commit 3d44223327
| Author: Jens Axboe <jens.axboe@oracle.com>
| Date:   Thu Jun 26 11:21:34 2008 +0200
|
|     Add generic helpers for arch IPI function calls

didn't wire up the voyager smp call function correctly, so do that
here.  Also make CONFIG_USE_GENERIC_SMP_HELPERS a def_bool y again,
since we now use the generic helpers for every x86 architecture.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <Jens.Axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 12:08:53 +01:00
Ingo Molnar
19b3e9671c tracing: function return tracer, build fix
fix:

 arch/x86/kernel/ftrace.c: In function 'ftrace_return_to_handler':
 arch/x86/kernel/ftrace.c:112: error: implicit declaration of function 'cpu_clock'

cpu_clock() is implicitly included via a number of ways, but its real
location is sched.h. (Build failure is triggerable if enough other
kernel components are turned off.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 12:03:27 +01:00
Ingo Molnar
867f7fb3eb tracing, x86: function return tracer, fix assembly constraints
fix:

 arch/x86/kernel/ftrace.c: Assembler messages:
 arch/x86/kernel/ftrace.c:140: Error: missing ')'
 arch/x86/kernel/ftrace.c:140: Error: junk `(%ebp))' after expression
 arch/x86/kernel/ftrace.c:141: Error: missing ')'
 arch/x86/kernel/ftrace.c:141: Error: junk `(%ebp))' after expression

the [parent_replaced] is used in an =rm fashion, so that constraint
is correct in isolation - but [parent_old] aliases register %0 and uses
it in an addressing mode that is only valid with registers - so change
the constraint from =rm to =r.

This fixes the build failure.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 11:12:18 +01:00
Ingo Molnar
f1c4be5eda tracing, x86: clean up FUNCTION_RET_TRACER Kconfig
Impact: cleanup

move FUNCTION_RET_TRACER to the X86 select section, where we have all the
other options.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 10:29:17 +01:00
Frederic Weisbecker
caf4b323b0 tracing, x86: add low level support for ftrace return tracing
Impact: add infrastructure for function-return tracing

Add low level support for ftrace return tracing.

This plug-in stores return addresses on the thread_info structure of
the current task.

The index of the current return address is initialized when the task
is the first one (init) and when a process forks (the child). It is
not needed when a task does a sys_execve because after this syscall,
it still needs to return on the kernel functions it called.

Note that the code of return_to_handler has been suggested by Steven
Rostedt as almost all of the ideas of improvements in this V3.

For purpose of security, arch/x86/kernel/process_32.c is not traced
because __switch_to() changes the current task during its execution.
That could cause inconsistency in the stored return address of this
function even if I didn't have any crash after testing with tracing on
this function enabled.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 10:29:11 +01:00
Ingo Molnar
e0cb4ebcd9 Merge branch 'tracing/urgent' into tracing/ftrace
Conflicts:
	kernel/trace/trace.c
2008-11-11 09:40:18 +01:00
Rafael J. Wysocki
4694516d19 x86: Make NUMA on 32-bit depend on BROKEN
While investigating the failure of hibernation on 32-bit x86 with
CONFIG_NUMA set, as described in this message
http://marc.info/?l=linux-kernel&m=122634118116226&w=4
I asked some people for help and I was told that it wasn't really
worth the effort, because CONFIG_NUMA was generally broken on 32-bit
x86 systems and it shouldn't be used in such configs.  For this
reason, make CONFIG_NUMA depend on BROKEN instead of EXPERIMENTAL on
x86-32.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Machek <pavel@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-10 13:20:57 -08:00
Matt Fleming
5ceb1a0418 x86: HPET: enter hpet_interrupt_handler with interrupts disabled
Some functions that may be called from this handler require that
interrupts are disabled. Also, combining IRQF_DISABLED and
IRQF_SHARED does not reliably disable interrupts in a handler, so
remove IRQF_SHARED from the irq flags (this irq is not shared anyway).

Signed-off-by: Matt Fleming <mjf@gentoo.org>
Cc: mingo@elte.hu
Cc: venkatesh.pallipadi@intel.com
Cc: "Will Newton" <will.newton@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-11-10 17:38:07 +01:00
Matt Fleming
89d77a1eb6 x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP
In hpet_next_event() we check that the value we just wrote to
HPET_Tn_CMP(timer) has reached the chip. Currently, we're checking that
the value we wrote to HPET_Tn_CMP(timer) is in HPET_T0_CMP, which, if
timer is anything other than timer 0, is likely to fail.

Signed-off-by: Matt Fleming <mjf@gentoo.org>
Cc: mingo@elte.hu
Cc: venkatesh.pallipadi@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-11-10 17:38:07 +01:00
Matt Fleming
1de5b08546 x86: HPET: convert WARN_ON to WARN_ON_ONCE
It is possible to flood the console with call traces if the WARN_ON
condition is true because of the frequency with which this function is
called.

Signed-off-by: Matt Fleming <mjf@gentoo.org>
Cc: mingo@elte.hu
Cc: venkatesh.pallipadi@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-11-10 17:38:07 +01:00
Markus Metzger
f4166c54bf x86, bts: DS and BTS initialization
Impact: widen BTS/PEBS ptrace enablement to more CPU models

Move BTS initialisation out of an #ifdef CONFIG_X86_64 guard.

Assume core2 BTS and DS layout for future models of family 6 processors.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-10 08:50:32 +01:00