opensuse:kernel.git
8 years agov2.6.31.4-rt14
Thomas Gleixner [Tue, 13 Oct 2009 23:14:02 +0000 (01:14 +0200)]
v2.6.31.4-rt14

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agoMerge branch 'rt/head' into rt/2.6.31
Thomas Gleixner [Tue, 13 Oct 2009 20:27:16 +0000 (22:27 +0200)]
Merge branch 'rt/head' into rt/2.6.31

8 years agox86: highmem: Restore the not so leftover function prototype
Thomas Gleixner [Tue, 13 Oct 2009 20:23:40 +0000 (22:23 +0200)]
x86: highmem: Restore the not so leftover function prototype

commit e31b799 (x86: highmem: Remove leftover function prototypes)
removed kmap_atomic_prot_pfn() which is not a leftover, but should
have been left where it was.

Restore it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agoMerge 2.6.31.4
Thomas Gleixner [Tue, 13 Oct 2009 19:20:35 +0000 (21:20 +0200)]
Merge 2.6.31.4

Merge branch 'master' of

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.31.y

into rt/2.6.31

Conflicts:
Makefile

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agoMerge branch 'rt/head' into rt/2.6.31
Thomas Gleixner [Tue, 13 Oct 2009 19:18:56 +0000 (21:18 +0200)]
Merge branch 'rt/head' into rt/2.6.31

8 years agoslab: Cover the numa aliens with the per cpu locked changes
Peter Zijlstra [Tue, 13 Oct 2009 15:05:40 +0000 (10:05 -0500)]
slab: Cover the numa aliens with the per cpu locked changes

The numa aliens tear down is not covered by the per cpu locked changes
which we did to slab. Fix that.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
 mm/slab.c |   36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

8 years agofutex: Handle spurious wake up
Thomas Gleixner [Tue, 13 Oct 2009 19:03:43 +0000 (21:03 +0200)]
futex: Handle spurious wake up

The futex code does not handle spurious wake up in futex_wait and
futex_wait_requeue_pi.

The code assumes that any wake up which was not caused by futex_wake /
requeue or by a timeout was caused by a signal wake up and returns one
of the syscall restart error codes.

In case of a spurious wake up the signal delivery code which deals
with the restart error codes is not invoked and we return that error
code to user space. That causes applications which actually check the
return codes to fail. Blaise reported that on preempt-rt a python test
program run into a exception trap. -rt exposed that due to a built in
spurious wake up accelerator :)

Solve this by checking signal_pending(current) in the wake up path and
handle the spurious wake up case w/o returning to user space.

Reported-by: Blaise Gassend <blaise@willowgarage.com>
Debugged-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
LKML-Reference: <new-submission>

Conflicts:

kernel/futex.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agosoftirq: Name hrtimer softirq also when CONFIG_HIGH_RES_TIMERS=n
Thomas Gleixner [Tue, 13 Oct 2009 16:52:52 +0000 (18:52 +0200)]
softirq: Name hrtimer softirq also when CONFIG_HIGH_RES_TIMERS=n

Remy Bohmer pointed out that we create the hrtimer softirq thread even
when CONFIG_HIGH_RES_TIMERS is off. That results in a softirq-NULL
name for the thread. The thread is needed on -rt

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agosoftirq: Revert "softirq: Do not create hrtimer softirq thread..."
Thomas Gleixner [Tue, 13 Oct 2009 16:51:01 +0000 (18:51 +0200)]
softirq: Revert "softirq: Do not create hrtimer softirq thread..."

This reverts commit d69c5d3773d7e0122358b9c87cc756837021b5f6. The
softirq is necessary even in the CONFIG_HIGH_RES_TIMERS=n case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agofutex: Revert "futex: Wake up waiter outside the hb->lock section"
Thomas Gleixner [Tue, 13 Oct 2009 15:21:18 +0000 (17:21 +0200)]
futex: Revert "futex: Wake up waiter outside the hb->lock section"

This reverts commit 928686b77ab275fd7f828ff24bd510baca995425.

The patch was an optimization of the old futex wake code where we woke
the waiter and then set q->lock_ptr to NULL. When the waiter preempted
the waker then we run into lock contention on q->lock_ptr
aka. hb->lock.

commit f1a11e (futex: remove the wait queue) changes the wakeup logic
by setting q->lock_ptr to NULL _before_ waking the task. It keeps a
reference on the task struct of the to be woken task to avoid an exit
race.

The combination of both patches resulted in different race on -RT:

    A is blocked on futex
    B calls futex_wake
    B sets q(A)->lock_ptr to NULL and puts A on the wake list
    B is preempted
    ...
    A wakes up (e.g. timer, signal)
    A detects q->lock_ptr = NULL and returns
    A waits on a different futex

    B is scheduled back in
    B wakes A
    A sees a spurious wake up

Reported-by: Blaise Gassend <blaise@willowgarage.com>
Debugged-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
 enter the commit message for your changes. Lines starting

8 years agoLinux 2.6.31.4
Greg Kroah-Hartman [Mon, 12 Oct 2009 20:15:40 +0000 (13:15 -0700)]
Linux 2.6.31.4

8 years agosit: fix off-by-one in ipip6_tunnel_get_prl
Sascha Hlusiak [Tue, 29 Sep 2009 11:27:05 +0000 (11:27 +0000)]
sit: fix off-by-one in ipip6_tunnel_get_prl

[ Upstream commit 298bf12ddb25841804f26234a43b89da1b1c0e21 ]

When requesting all prl entries (kprl.addr == INADDR_ANY) and there are
more prl entries than there is space passed from userspace, the existing
code would always copy cmax+1 entries, which is more than can be handled.

This patch makes the kernel copy only exactly cmax entries.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-By: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoax25: Fix SIOCAX25GETINFO ioctl
Eric Dumazet [Sun, 20 Sep 2009 06:32:55 +0000 (06:32 +0000)]
ax25: Fix SIOCAX25GETINFO ioctl

[ Upstream commit 407fc5cf019fc5cb990458a2e38d2c0a27b3cb30 ]

rcv_q & snd_q initializations were reversed in commit
31e6d363abcd0d05766c82f1a9c905a4c974a199
(net: correct off-by-one write allocations reports)

Signed-off-by: Jan Rafaj <jr+netfilter-devel@cedric.unob.cz>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoax25: Fix possible oops in ax25_make_new
Jarek Poplawski [Sun, 27 Sep 2009 10:57:02 +0000 (10:57 +0000)]
ax25: Fix possible oops in ax25_make_new

[ Upstream commit 8c185ab6185bf5e67766edb000ce428269364c86 ]

In ax25_make_new, if kmemdup of digipeat returns an error, there would
be an oops in sk_free while calling sk_destruct, because sk_protinfo
is NULL at the moment; move sk->sk_destruct initialization after this.

BTW of reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoappletalk: Fix skb leak when ipddp interface is not loaded
Arnaldo Carvalho de Melo [Wed, 9 Sep 2009 14:40:12 +0000 (11:40 -0300)]
appletalk: Fix skb leak when ipddp interface is not loaded

[ Upstream commit ffcfb8db540ff879c2a85bf7e404954281443414 ]

And also do a better job of returning proper NET_{RX,XMIT}_ values.

Based on a patch by Mark Smith.

This fixes CVE-2009-2903

Reported-by: Mark Smith <lk-netdev@lk-netdev.nosense.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agosky2: Set SKY2_HW_RAM_BUFFER in sky2_init
Mike McCormack [Mon, 21 Sep 2009 04:08:52 +0000 (04:08 +0000)]
sky2: Set SKY2_HW_RAM_BUFFER in sky2_init

[ Upstream commit 74a61ebf653c6abe459f228eb40e9f24f7ef1fb7 ]

The SKY2_HW_RAM_BUFFER bit in hw->flags was checked in sky2_mac_init(),
 before being set later in sky2_up().

Setting SKY2_HW_RAM_BUFFER in sky2_init() where other hw->flags are set
 should avoid this problem recurring.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agosmsc95xx: fix transmission where ZLP is expected
Steve Glendinning [Tue, 22 Sep 2009 04:00:27 +0000 (04:00 +0000)]
smsc95xx: fix transmission where ZLP is expected

[ Upstream commit ec4756238239f1a331d9fb95bad8b281dad56855 ]

Usbnet framework assumes USB hardware doesn't handle zero length
packets, but SMSC LAN95xx requires these to be sent for correct
operation.

This patch fixes an easily reproducible tx lockup when sending a frame
that results in exactly 512 bytes in a USB transmission (e.g. a UDP
frame with 458 data bytes, due to IP headers and our USB headers).  It
adds an extra flag to usbnet for the hardware driver to indicate that
it can handle and requires the zero length packets.

This patch should not affect other usbnet users, please also consider
for -stable.

Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agonet: Fix sock_wfree() race
Eric Dumazet [Thu, 24 Sep 2009 10:49:24 +0000 (10:49 +0000)]
net: Fix sock_wfree() race

[ Upstream commit d99927f4d93f36553699573b279e0ff98ad7dea6 ]

Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.

A fix is to call sk->sk_write_space(sk) while still
holding a reference on sk.

Reported-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agotcp: fix CONFIG_TCP_MD5SIG + CONFIG_PREEMPT timer BUG()
Robert Varga [Wed, 16 Sep 2009 06:49:21 +0000 (23:49 -0700)]
tcp: fix CONFIG_TCP_MD5SIG + CONFIG_PREEMPT timer BUG()

[ Upstream commit 657e9649e745b06675aa5063c84430986cdc3afa ]

I have recently came across a preemption imbalance detected by:

<4>huh, entered ffffffff80644630 with preempt_count 00000102, exited with 00000101?
<0>------------[ cut here ]------------
<2>kernel BUG at /usr/src/linux/kernel/timer.c:664!
<0>invalid opcode: 0000 [1] PREEMPT SMP

with ffffffff80644630 being inet_twdr_hangman().

This appeared after I enabled CONFIG_TCP_MD5SIG and played with it a
bit, so I looked at what might have caused it.

One thing that struck me as strange is tcp_twsk_destructor(), as it
calls tcp_put_md5sig_pool() -- which entails a put_cpu(), causing the
detected imbalance. Found on 2.6.23.9, but 2.6.31 is affected as well,
as far as I can tell.

Signed-off-by: Robert Varga <nite@hq.alert.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agotun: Return -EINVAL if neither IFF_TUN nor IFF_TAP is set.
Kusanagi Kouichi [Wed, 16 Sep 2009 21:36:13 +0000 (21:36 +0000)]
tun: Return -EINVAL if neither IFF_TUN nor IFF_TAP is set.

[ Upstream commit 36989b90879c785f95b877bdcf65a2527dadd893 ]

After commit 2b980dbd77d229eb60588802162c9659726b11f4
("lsm: Add hooks to the TUN driver") tun_set_iff doesn't
return -EINVAL though neither IFF_TUN nor IFF_TAP is set.

Signed-off-by: Kusanagi Kouichi <slash@ma.neweb.ne.jp>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agonet: unix: fix sending fds in multiple buffers
Miklos Szeredi [Fri, 11 Sep 2009 18:31:45 +0000 (11:31 -0700)]
net: unix: fix sending fds in multiple buffers

[ Upstream commit 8ba69ba6a324b13e1190fc31e41954d190fd4f1d ]

Kalle Olavi Niemitalo reported that:

  "..., when one process calls sendmsg once to send 43804 bytes of
  data and one file descriptor, and another process then calls recvmsg
  three times to receive the 16032+16032+11740 bytes, each of those
  recvmsg calls returns the file descriptor in the ancillary data.  I
  confirmed this with strace.  The behaviour differs from Linux
  2.6.26, where reportedly only one of those recvmsg calls (I think
  the first one) returned the file descriptor."

This bug was introduced by a patch from me titled "net: unix: fix inflight
counting bug in garbage collector", commit 6209344f5.

And the reason is, quoting Kalle:

  "Before your patch, unix_attach_fds() would set scm->fp = NULL, so
  that if the loop in unix_stream_sendmsg() ran multiple iterations,
  it could not call unix_attach_fds() again.  But now,
  unix_attach_fds() leaves scm->fp unchanged, and I think this causes
  it to be called multiple times and duplicate the same file
  descriptors to each struct sk_buff."

Fix this by introducing a flag that is cleared at the start and set
when the fds attached to the first buffer.  The resulting code should
work equivalently to the one on 2.6.26.

Reported-by: Kalle Olavi Niemitalo <kon@iki.fi>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agonet: restore tx timestamping for accelerated vlans
Eric Dumazet [Wed, 30 Sep 2009 23:42:42 +0000 (16:42 -0700)]
net: restore tx timestamping for accelerated vlans

[ Upstream commit 81bbb3d4048cf577b5babcb0834230de391a35c5 ]

Since commit 9b22ea560957de1484e6b3e8538f7eef202e3596
( net: fix packet socket delivery in rx irq handler )

We lost rx timestamping of packets received on accelerated vlans.

Effect is that tcpdump on real dev can show strange timings, since it gets rx timestamps
too late (ie at skb dequeueing time, not at skb queueing time)

14:47:26.986871 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 1
14:47:26.986786 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 1

14:47:27.986888 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 2
14:47:27.986781 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 2

14:47:28.986896 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 3
14:47:28.986780 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 3

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoACPI: fix Compaq Evo N800c (Pentium 4m) boot hang regression
Zhao Yakui [Sun, 27 Sep 2009 07:30:51 +0000 (03:30 -0400)]
ACPI: fix Compaq Evo N800c (Pentium 4m) boot hang regression

commit 3e2ada5867b7e9fa0b296d30fa8f3726ebd0a8b7 upstream.

Don't disable ARB_DISABLE when the familary ID is 0x0F.

http://bugzilla.kernel.org/show_bug.cgi?id=14211

This was a 2.6.31 regression, and so this patch
needs to be applied to 2.6.31.stable

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoACPI: Clarify resource conflict message
Jean Delvare [Tue, 8 Sep 2009 13:31:46 +0000 (15:31 +0200)]
ACPI: Clarify resource conflict message

commit 14f03343ad1080c2fea29ab2c13f05b976c4584e upstream.

The message "ACPI: Device needs an ACPI driver" is misleading. The
device _may_ need an ACPI driver, if the BIOS implemented a custom
API for the device in question (which, AFAIK, can't be checked.) If
not, then either a generic ACPI driver may be used (for example
"thermal"), or nothing can be done (other than a white list).

I propose to reword the message to:

ACPI: If an ACPI driver is available for this device, you should use
it instead of the native driver

which I think is more correct. Comments and suggestions welcome.

I also added a message warning about possible problems and system
instability when users pass acpi_enforce_resources=lax, as suggested
by Len.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Alan Jenkins <sourcejedi.lkml@googlemail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoIMA: open new file for read
Mimi Zohar [Wed, 2 Sep 2009 15:40:32 +0000 (11:40 -0400)]
IMA: open new file for read

commit 6c1488fd581a447ec87c4b59f0d33f95f0aa441b upstream.

When creating a new file, ima_path_check() assumed the new file
was being opened for write. Call ima_path_check() with the
appropriate acc_mode so that the read/write counters are
incremented correctly.

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoPIT fixes to unbreak suspend/resume (bug #14222)
john stultz [Thu, 8 Oct 2009 20:31:45 +0000 (13:31 -0700)]
PIT fixes to unbreak suspend/resume (bug #14222)

Resolved differently upstream in commit 8cab02dc3c58a12235c6d463ce684dded9696848

Ondrej Zary reported a suspend/resume hang with 2.6.31 in bug #14222.

http://bugzilla.kernel.org/show_bug.cgi?id=14222

The hang was bisected to c7121843685de2bf7f3afd3ae1d6a146010bf1fc
however, that was really just the last straw that caused the issue.

The problem was that on suspend, the PIT is removed as a clocksource,
and was using the mult value essentially as a is_enabled() flag. The
mult adjustments done in the commit above caused that usage to break,
causing bad list manipulation and the oops.

Further, on resume, the PIT clocksource is never restored, causing the
system to run in a degraded mode with jiffies as the clocksource.

This issue has since been resolved in 2.6.32-rc by commit
8cab02dc3c58a12235c6d463ce684dded9696848 which removes the clocksource
disabling on suspend. Testing shows no issues there.

So the following patch rectifies the situation for 2.6.31 users of the
PIT clocksource that use suspend and resume (which is probably not that
many).

Many thanks to Ondrej for helping narrow down what was happening, what
caused it, and verifying the fix.

---------------

Avoid using the unprotected clocksource.mult value as an "is_registered"
flag, instead us an explicit flag variable. This avoids possible list
corruption if the clocksource is double-unregistered.

Also re-register the PIT clocksource on resume so folks don't have to
use jiffies after suspend.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agosis5513: fix PIO setup for ATAPI devices
Bartlomiej Zolnierkiewicz [Tue, 6 Oct 2009 14:46:05 +0000 (14:46 +0000)]
sis5513: fix PIO setup for ATAPI devices

commit e13ee546bb06453939014c7b854e77fb643fd6f1 upstream.

Clear prefetch setting before potentially (re-)enabling it in
config_drive_art_rwp() so the transition of the device type on
the port from ATA to ATAPI (i.e. during warm-plug operation)
is handled correctly.

This is a really old bug (it probably goes back to very early
days of the driver) but it was only affecting warm-plug operation
until the recent "ide: try to use PIO Mode 0 during probe if
possible" change (commit 6029336426a2b43e4bc6f4a84be8789a047d139e).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Tested-by: David Fries <david@fries.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agomm: add_to_swap_cache() must not sleep
Daisuke Nishimura [Tue, 22 Sep 2009 00:02:50 +0000 (17:02 -0700)]
mm: add_to_swap_cache() must not sleep

commit 31a5639623a487d6db996c8138c9e53fef2e2d91 upstream.

After commit 355cfa73 ("mm: modify swap_map and add SWAP_HAS_CACHE flag"),
read_swap_cache_async() will busy-wait while a entry doesn't exist in swap
cache but it has SWAP_HAS_CACHE flag.

Such entries can exist on add/delete path of swap cache.  On add path,
add_to_swap_cache() is called soon after SWAP_HAS_CACHE flag is set, and
on delete path, swapcache_free() will be called (SWAP_HAS_CACHE flag is
cleared) soon after __delete_from_swap_cache() is called.  So, the
busy-wait works well in most cases.

But this mechanism can cause soft lockup if add_to_swap_cache() sleeps and
read_swap_cache_async() tries to swap-in the same entry on the same cpu.

This patch calls radix_tree_preload() before swapcache_prepare() and
divides add_to_swap_cache() into two part: radix_tree_preload() part and
radix_tree_insert() part(define it as __add_to_swap_cache()).

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agonet: Fix wrong sizeof
Jean Delvare [Fri, 2 Oct 2009 16:55:19 +0000 (09:55 -0700)]
net: Fix wrong sizeof

commit b607bd900051efc3308c4edc65dd98b34b230021 upstream.

Which is why I have always preferred sizeof(struct foo) over
sizeof(var).

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoKVM: SVM: Handle tsc in svm_get_msr/svm_set_msr correctly
Joerg Roedel [Mon, 12 Oct 2009 09:42:44 +0000 (11:42 +0200)]
KVM: SVM: Handle tsc in svm_get_msr/svm_set_msr correctly

commit 20824f30bb0b8ae0a4099895fd4509f54cf2e1e2 upstream.

When running nested we need to touch the l1 guests
tsc_offset. Otherwise changes will be lost or a wrong value
be read.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoKVM: SVM: Fix tsc offset adjustment when running nested
Joerg Roedel [Mon, 12 Oct 2009 09:41:51 +0000 (11:41 +0200)]
KVM: SVM: Fix tsc offset adjustment when running nested

commit 77b1ab1732feb5e3dcbaf31d2f7547c5229f5f3a upstream.

When svm_vcpu_load is called while the vcpu is running in
guest mode the tsc adjustment made there is lost on the next
emulated #vmexit. This causes the tsc running backwards in
the guest. This patch fixes the issue by also adjusting the
tsc_offset in the emulated hsave area so that it will not
get lost.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoKVM: fix LAPIC timer period overflow
Aurelien Jarno [Fri, 25 Sep 2009 09:09:37 +0000 (11:09 +0200)]
KVM: fix LAPIC timer period overflow

commit b2d83cfa3fdefe5c6573d443d099a18dc3a93c5f upstream.

Don't overflow when computing the 64-bit period from 32-bit registers.

Fixes sourceforge bug #2826486.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoKVM: VMX: flush TLB with INVEPT on cpu migration
Marcelo Tosatti [Thu, 1 Oct 2009 22:16:58 +0000 (19:16 -0300)]
KVM: VMX: flush TLB with INVEPT on cpu migration

commit eb5109e311b5152c0614a28d7d615d087f268f19 upstream.

It is possible that stale EPTP-tagged mappings are used, if a
vcpu migrates to a different pcpu.

Set KVM_REQ_TLB_FLUSH in vmx_vcpu_load, when switching pcpus, which
will invalidate both VPID and EPT mappings on the next vm-entry.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoKVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID
Avi Kivity [Sun, 4 Oct 2009 14:45:13 +0000 (16:45 +0200)]
KVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID

commit 6a54435560efdab1a08f429a954df4d6c740bddf upstream.

The number of entries is multiplied by the entry size, which can
overflow on 32-bit hosts.  Bound the entry count instead.

Reported-by: David Wagner <daw@cs.berkeley.edu>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoASoC: WM8350 capture PGA mutes are inverted
Mark Brown [Mon, 29 Jun 2009 10:17:10 +0000 (11:17 +0100)]
ASoC: WM8350 capture PGA mutes are inverted

commit 5b7dde346881b12246669ae97b3a2793c27b32b6 upstream.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agosound: via82xx: move DXS volume controls to PCM interface
Clemens Ladisch [Tue, 6 Oct 2009 06:21:04 +0000 (08:21 +0200)]
sound: via82xx: move DXS volume controls to PCM interface

commit 2fb930b53f513cbc4c102d415d2923a8a7091337 upstream.

The "VIA DXS" controls are actually volume controls that apply to the
four PCM substreams, so we better indicate this connection by moving the
controls to the PCM interface.

Commit b452e08e73c0e3dbb0be82130217be4b7084299e in 2.6.30 broke the
restoring of these volumes by "alsactl restore" that most distributions
use; the renaming in this patch cures that regression by preventing
alsactl from applying the old, wrong volume levels to the new controls.
http://bugzilla.kernel.org/show_bug.cgi?id=14151
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=532613

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agolibata: fix incorrect link online check during probe
Tejun Heo [Tue, 6 Oct 2009 08:08:40 +0000 (17:08 +0900)]
libata: fix incorrect link online check during probe

commit 3b761d3d437cffcaf160a5d37eb6b3b186e491d5 upstream.

While trying to work around spurious detection retries for
non-existent devices on slave links, commit
816ab89782ac139a8b65147cca990822bb7e8675 incorrectly added link
offline check logic before ata_eh_thaw() was called.  This means that
if an occupied link goes down briefly at the time that offline check
was performed, device class will be cleared to ATA_DEV_NONE and libata
wouldn't retry thus failing detection of the device.

The offline check should be done after the port is thawed together
with online check so that such link glitches can be detected by the
interrupt handler and handled properly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Blechmann <tim@klingt.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoima: ecryptfs fix imbalance message
Mimi Zohar [Mon, 5 Oct 2009 18:25:44 +0000 (14:25 -0400)]
ima: ecryptfs fix imbalance message

commit 36520be8e32b49bd85a63b7b8b40cd07c3da59a5 upstream.

The unencrypted files are being measured.  Update the counters to get
rid of the ecryptfs imbalance message. (http://bugzilla.redhat.com/519737)

Reported-by: Sachin Garg
Cc: Eric Paris <eparis@redhat.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: David Safford <safford@watson.ibm.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoNOHZ: update idle state also when NOHZ is inactive
Eero Nurkkala [Wed, 7 Oct 2009 08:54:26 +0000 (11:54 +0300)]
NOHZ: update idle state also when NOHZ is inactive

commit fdc6f192e7e1ae80565af23cc33dc88e3dcdf184 upstream.

Commit f2e21c9610991e95621a81407cdbab881226419b had unfortunate side
effects with cpufreq governors on some systems.

If the system did not switch into NOHZ mode ts->inidle is not set when
tick_nohz_stop_sched_tick() is called from the idle routine. Therefor
all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
fail to call tick_nohz_start_idle(). This results in bogus idle
accounting information which is passed to cpufreq governors.

Set the inidle flag unconditionally of the NOHZ active state to keep
the idle time accounting correct in any case.

[ tglx: Added comment and tweaked the changelog ]

Reported-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Eero Nurkkala <ext-eero.nurkkala@nokia.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Steven Noonan <steven@uplinklabs.net>
LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agofutex: Fix locking imbalance
Thomas Gleixner [Sun, 4 Oct 2009 07:34:17 +0000 (09:34 +0200)]
futex: Fix locking imbalance

commit eaaea8036d0261d87d7072c5bc88c7ea730c18ac upstream.

Rich reported a lock imbalance in the futex code:

   http://bugzilla.kernel.org/show_bug.cgi?id=14288

It's caused by the displacement of the retry_private label in
futex_wake_op(). The code unlocks the hash bucket locks in the
error handling path and retries without locking them again which
makes the next unlock fail.

Move retry_private so we lock the hash bucket locks when we retry.

Reported-by: Rich Ercolany <rercola@acm.jhu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <dvhltc@us.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agofutex: Nullify robust lists after cleanup
Peter Zijlstra [Mon, 5 Oct 2009 16:17:32 +0000 (18:17 +0200)]
futex: Nullify robust lists after cleanup

commit fc6b177dee33365ccb29fe6d2092223cf8d679f9 upstream.

The robust list pointers of user space held futexes are kept intact
over an exec() call. When the exec'ed task exits exit_robust_list() is
called with the stale pointer. The risk of corruption is minimal, but
still it is incorrect to keep the pointers valid. Actually glibc
should uninstall the robust list before calling exec() but we have to
deal with it anyway.

Nullify the pointers after [compat_]exit_robust_list() has been
called.

Reported-by: Anirban Sinha <ani@anirban.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agofutex: Move exit_pi_state() call to release_mm()
Thomas Gleixner [Mon, 5 Oct 2009 16:18:03 +0000 (18:18 +0200)]
futex: Move exit_pi_state() call to release_mm()

commit 322a2c100a8998158445599ea437fb556aa95b11 upstream.

exit_pi_state() is called from do_exit() but not from do_execve().
Move it to release_mm() so it gets called from do_execve() as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: Anirban Sinha <ani@anirban.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agofutex: fix requeue_pi key imbalance
Darren Hart [Wed, 7 Oct 2009 18:46:54 +0000 (11:46 -0700)]
futex: fix requeue_pi key imbalance

commit da085681014fb43d67d9bf6d14bc068e9254bd49 upstream.

If futex_wait_requeue_pi() wakes prior to requeue, we drop the
reference to the source futex_key twice, once in
handle_early_requeue_pi_wakeup() and once on our way out.

Remove the drop from the handle_early_requeue_pi_wakeup() and keep
the get/drops together in futex_wait_requeue_pi().

Reported-by: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Helge Bahmann <hcb@chaoticmind.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <4ACCE21E.5030805@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoftrace: check for failure for all conversions
Steven Rostedt [Wed, 7 Oct 2009 20:57:56 +0000 (16:57 -0400)]
ftrace: check for failure for all conversions

commit 3279ba37db5d65c4ab0dcdee3b211ccb85bb563f upstream.

Due to legacy code from back when the dynamic tracer used a daemon,
only core kernel code was checking for failures. This is no longer
the case. We must check for failures any time we perform text modifications.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agotracing: correct module boundaries for ftrace_release
jolsa@redhat.com [Wed, 7 Oct 2009 17:00:35 +0000 (19:00 +0200)]
tracing: correct module boundaries for ftrace_release

commit e7247a15ff3bbdab0a8b402dffa1171e5c05a8e0 upstream.

When the module is about the unload we release its call records.
The ftrace_release function was given wrong values representing
the module core boundaries, thus not releasing its call records.

Plus making ftrace_release function module specific.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1254934835-363-3-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoALSA: hda - Added quirk to enable sound on Toshiba NB200
Manoj Iyer [Tue, 22 Sep 2009 23:33:29 +0000 (18:33 -0500)]
ALSA: hda - Added quirk to enable sound on Toshiba NB200

commit 3db6c037c6954ed6d98ef199938e4004fea96908 upstream.

Patch was tested on Toshiba NB200 and is found to enable sound.

Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agox86: Don't leak 64-bit kernel register values to 32-bit processes
Jan Beulich [Wed, 30 Sep 2009 10:22:11 +0000 (11:22 +0100)]
x86: Don't leak 64-bit kernel register values to 32-bit processes

commit 24e35800cdc4350fc34e2bed37b608a9e13ab3b6 upstream.

While 32-bit processes can't directly access R8...R15, they can
gain access to these registers by temporarily switching themselves
into 64-bit mode.

Therefore, registers not preserved anyway by called C functions
(i.e. R8...R11) must be cleared prior to returning to user mode.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4AC34D73020000780001744A@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agotty: Avoid dropping ldisc_mutex over hangup tty re-initialization
Linus Torvalds [Sun, 4 Oct 2009 04:44:21 +0000 (21:44 -0700)]
tty: Avoid dropping ldisc_mutex over hangup tty re-initialization

commit 0b5759c654e74c8dc317ea2c6b3a7476160f688a upstream.

A couple of people have hit the WARN_ON() in drivers/char/tty_io.c,
tty_open() that is unhappy about seeing the tty line discipline go away
during the tty hangup. See for example

http://bugzilla.kernel.org/show_bug.cgi?id=14255

and the reason is that we do the tty_ldisc_halt() outside the
ldisc_mutex in order to be able to flush the scheduled work without a
deadlock with vhangup_work.

However, it turns out that we can solve this particular case by

 - using "cancel_delayed_work_sync()" in tty_ldisc_halt(), which waits
   for just the particular work, rather than synchronizing with any
   random outstanding pending work.

   This won't deadlock, since the buf.work we synchronize with doesn't
   care about the ldisc_mutex, it just flushes the tty ldisc buffers.

 - realize that for this particular case, we don't need to wait for any
   hangup work, because we are inside the hangup codepaths ourselves.

so as a result we can just drop the flush_scheduled_work() entirely, and
then move the tty_ldisc_halt() call to inside the mutex.  That way we
never expose the partially torn down ldisc state to tty_open(), and hold
the ldisc_mutex over the whole sequence.

Reported-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Heinz Diehl <htd@fancy-poultry.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agox86: fix csum_ipv6_magic asm memory clobber
Samuel Thibault [Thu, 1 Oct 2009 22:44:02 +0000 (15:44 -0700)]
x86: fix csum_ipv6_magic asm memory clobber

commit 392d814daf460a9564d29b2cebc51e1ea34e0504 upstream.

Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
memory clobber, as it is only passed the address of the buffer, not a
memory reference to the buffer itself.

This caused failures in Hurd's pfinetv4 when we tried to compile it with
gcc-4.3 (bogus checksums).

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoMerge branch 'rt/head' into rt/2.6.31
Thomas Gleixner [Mon, 12 Oct 2009 18:12:58 +0000 (20:12 +0200)]
Merge branch 'rt/head' into rt/2.6.31

8 years agofutex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me()
Darren Hart [Tue, 22 Sep 2009 05:30:38 +0000 (22:30 -0700)]
futex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me()

PI futexes do not use the same plist_node_empty() test for wakeup.
It was possible for the waiter (in futex_wait_requeue_pi()) to set
TASK_INTERRUPTIBLE after the waker assigned the rtmutex to the
waiter. The waiter would then note the plist was not empty and call
schedule(). The task would not be found by any subsequeuent futex
wakeups, resulting in a userspace hang.

By moving the setting of TASK_INTERRUPTIBLE to before the call to
queue_me(), the race with the waker is eliminated. Since we no
longer call get_user() from within queue_me(), there is no need to
delay the setting of TASK_INTERRUPTIBLE until after the call to
queue_me().

The FUTEX_LOCK_PI operation is not affected as futex_lock_pi()
relies entirely on the rtmutex code to handle schedule() and
wakeup.  The requeue PI code is affected because the waiter starts
as a non-PI waiter and is woken on a PI futex.

Remove the crusty old comment about holding spinlocks() across
get_user() as we no longer do that. Correct the locking statement
with a description of why the test is performed.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053038.8717.97838.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
8 years agoMerge branch 'rt/head' into rt/2.6.31
Thomas Gleixner [Mon, 12 Oct 2009 17:26:22 +0000 (19:26 +0200)]
Merge branch 'rt/head' into rt/2.6.31

8 years agox86: highmem: Remove leftover function prototypes
Thomas Gleixner [Fri, 9 Oct 2009 18:51:54 +0000 (20:51 +0200)]
x86: highmem: Remove leftover function prototypes

-RT replaces kmap_atomic* functions with macros, but we kept the
 function prototypes around. Remove them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agoMerge branch 'rt/head' into rt/2.6.31
Thomas Gleixner [Thu, 8 Oct 2009 08:48:55 +0000 (10:48 +0200)]
Merge branch 'rt/head' into rt/2.6.31

8 years agotime: Remove xtime_cache
john stultz [Fri, 2 Oct 2009 23:24:15 +0000 (16:24 -0700)]
time: Remove xtime_cache

With the prior logarithmic time accumulation patch, xtime will now
always be within one "tick" of the current time, instead of
possibly half a second off.

This removes the need for the xtime_cache value, which always
stored the time at the last interrupt, so this patch cleans that up
removing the xtime_cache related code.

This is a bit simpler, but still could use some wider testing.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1254525855.7741.95.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cherry-picked from timers/core. Conflicts:

kernel/time.c
kernel/time/timekeeping.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agotime: Implement logarithmic time accumulation
Thomas Gleixner [Thu, 8 Oct 2009 08:39:50 +0000 (10:39 +0200)]
time: Implement logarithmic time accumulation

Accumulating one tick at a time works well unless we're using NOHZ.
Then it can be an issue, since we may have to run through the loop
a few thousand times, which can increase timer interrupt caused
latency.

The current solution was to accumulate in half-second intervals
with NOHZ. This kept the number of loops down, however it did
slightly change how we make NTP adjustments. While not an issue
with NTPd users, as NTPd makes adjustments over a longer period of
time, other adjtimex() users have noticed the half-second
granularity with which we can apply frequency changes to the clock.

For instance, if a application tries to apply a 100ppm frequency
correction for 20ms to correct a 2us offset, with NOHZ they either
get no correction, or a 50us correction.

Now, there will always be some granularity error for applying
frequency corrections. However with users sensitive to this error
have seen a 50-500x increase with NOHZ compared to running without
NOHZ.

So I figured I'd try another approach then just simply increasing
the interval. My approach is to consume the time interval
logarithmically. This reduces the number of times through the loop
needed keeping latency down, while still preserving the original
granularity error for adjtimex() changes.

Further, this change allows us to remove the xtime_cache code
(patch to follow), as xtime is always within one tick of the
current time, instead of the half-second updates it saw before.

An earlier version of this patch has been shipping to x86 users in
the RedHat MRG releases for awhile without issue, but I've reworked
this version to be even more careful about avoiding possible
overflows if the shift value gets too large.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1254525473.7741.88.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cherry-picked from timers/core. Conflicts:

kernel/time/timekeeping.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agofutex: Move exit_pi_state() call to release_mm()
Thomas Gleixner [Mon, 5 Oct 2009 16:18:03 +0000 (18:18 +0200)]
futex: Move exit_pi_state() call to release_mm()

exit_pi_state() is called from do_exit() but not from do_execve().
Move it to release_mm() so it gets called from do_execve() as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: stable@kernel.org
Cc: Anirban Sinha <ani@anirban.org>
Cc: Peter Zijlstra <peterz@infradead.org>
8 years agofutex: Nullify robust lists after cleanup
Peter Zijlstra [Mon, 5 Oct 2009 16:17:32 +0000 (18:17 +0200)]
futex: Nullify robust lists after cleanup

The robust list pointers of user space held futexes are kept intact
over an exec() call. When the exec'ed task exits exit_robust_list() is
called with the stale pointer. The risk of corruption is minimal, but
still it is incorrect to keep the pointers valid. Actually glibc
should uninstall the robust list before calling exec() but we have to
deal with it anyway.

Nullify the pointers after [compat_]exit_robust_list() has been
called.

Reported-by: Anirban Sinha <ani@anirban.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: stable@kernel.org
8 years agoNOHZ: update idle state also when NOHZ is inactive
Eero Nurkkala [Wed, 7 Oct 2009 08:54:26 +0000 (11:54 +0300)]
NOHZ: update idle state also when NOHZ is inactive

Commit f2e21c9610991e95621a81407cdbab881226419b had unfortunate side
effects with cpufreq governors on some systems.

If the system did not switch into NOHZ mode ts->inidle is not set when
tick_nohz_stop_sched_tick() is called from the idle routine. Therefor
all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
fail to call tick_nohz_start_idle(). This results in bogus idle
accounting information which is passed to cpufreq governors.

Set the inidle flag unconditionally of the NOHZ active state to keep
the idle time accounting correct in any case.

[ tglx: Added comment and tweaked the changelog ]

Reported-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Eero Nurkkala <ext-eero.nurkkala@nokia.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: stable@kernel.org
LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agoMerge branch 'master' of
Thomas Gleixner [Thu, 8 Oct 2009 07:27:06 +0000 (09:27 +0200)]
Merge branch 'master' of

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.31.y

into rt/2.6.31

Conflicts:
Makefile

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agoLinux 2.6.31.3
Greg Kroah-Hartman [Wed, 7 Oct 2009 21:39:51 +0000 (14:39 -0700)]
Linux 2.6.31.3

8 years agoTTY: fix typos
Alan Stern [Thu, 20 Aug 2009 19:23:47 +0000 (15:23 -0400)]
TTY: fix typos

commit 1f5c13fad4ec5617b610e12205902c06298c096a upstream.

This patch (as1282) fixes some obvious typos in the TTY core.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agofutex: fix requeue_pi key imbalance
Darren Hart [Wed, 7 Oct 2009 18:46:54 +0000 (11:46 -0700)]
futex: fix requeue_pi key imbalance

If futex_wait_requeue_pi() wakes prior to requeue, we drop the
reference to the source futex_key twice, once in
handle_early_requeue_pi_wakeup() and once on our way out.

Remove the drop from the handle_early_requeue_pi_wakeup() and keep
the get/drops together in futex_wait_requeue_pi().

Reported-by: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Helge Bahmann <hcb@chaoticmind.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: stable-2.6.31 <stable@kernel.org>
LKML-Reference: <4ACCE21E.5030805@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agosoftirq: Do not create hrtimer softirq thread when CONFIG_HIGH_RES_TIMERS=n
Thomas Gleixner [Tue, 6 Oct 2009 20:13:37 +0000 (22:13 +0200)]
softirq: Do not create hrtimer softirq thread when CONFIG_HIGH_RES_TIMERS=n

Remy Bohmer pointed out that we create the hrtimer softirq thread even
when CONFIG_HIGH_RES_TIMERS is off. That results in a softirq-NULL
name for the thread.

Skip the thread creation/wakeup/teardown when CONFIG_HIGH_RES_TIMERS=n

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agov2.6.31.2-rt13
Thomas Gleixner [Tue, 6 Oct 2009 09:22:48 +0000 (11:22 +0200)]
v2.6.31.2-rt13

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agoMerge branch 'master' of
Thomas Gleixner [Tue, 6 Oct 2009 09:22:23 +0000 (11:22 +0200)]
Merge branch 'master' of

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.31.y

into rt/2.6.31

Conflicts:
Makefile
arch/powerpc/mm/pgtable.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agov2.6.31.1-rt12
Thomas Gleixner [Tue, 6 Oct 2009 08:16:44 +0000 (10:16 +0200)]
v2.6.31.1-rt12

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agoLinux 2.6.31.2
Greg Kroah-Hartman [Mon, 5 Oct 2009 17:12:06 +0000 (10:12 -0700)]
Linux 2.6.31.2

8 years agoiwlwifi: fix unloading driver while scanning
Wey-Yi Guy [Wed, 30 Sep 2009 20:01:01 +0000 (13:01 -0700)]
iwlwifi: fix unloading driver while scanning

This is commit 5bddf54962bf68002816df710348ba197d6391bb in linux-2.6.

If NetworkManager is busy scanning when user
tries to unload the module, the driver can not be unloaded
because HW still scanning.

Make sure driver sends abort scan host command to uCode if it
is in the middle of scanning during driver unload.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoiwlwifi: traverse linklist to find the valid OTP block
Wey-Yi Guy [Wed, 30 Sep 2009 22:36:15 +0000 (15:36 -0700)]
iwlwifi: traverse linklist to find the valid OTP block

commit 415e49936b4b29b34c2fb561eeab867d41fc43a6 upstream.

For devices using OTP memory, EEPROM image can start from
any one of the OTP blocks. If shadow RAM is disabled, we need to
traverse link list to find the last valid block, then start the EEPROM
image reading.

If OTP is not full, the valid block is the block _before_ the last block
on the link list; the last block on the link list is the empty block
ready for next OTP refresh/update.

If OTP is full, then the last block is the valid block to be used for
configure the device.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoiwlagn: modify digital SVR for 1000
Wey-Yi Guy [Fri, 17 Jul 2009 16:30:14 +0000 (09:30 -0700)]
iwlagn: modify digital SVR for 1000

commit 02c06e4abc0680afd31bf481a803541556757fb6 upstream.

On 1000, there are two Switching Voltage Regulators (SVR). The first one
apply digital voltage level (1.32V) for PCIe block and core. We need to
use this regulator to solve a stability issue related to noisy DC2DC
line in the silicon.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoiwlwifi: update 1000 series API version to match firmware
Jay Sternberg [Fri, 17 Jul 2009 16:30:22 +0000 (09:30 -0700)]
iwlwifi: update 1000 series API version to match firmware

commit cce53aa347c1e023d967b1cb1aa393c725aedba5 upstream.

firmware file now contains build number so API needs to be updated.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoiwlwifi: Handle new firmware file with ucode build number in header
Jay Sternberg [Fri, 17 Jul 2009 16:30:16 +0000 (09:30 -0700)]
iwlwifi: Handle new firmware file with ucode build number in header

commit cc0f555d511a5fe9d4519334c8f674a1dbab9e3a upstream.

Adding new API version to account for change to ucode file format.  New
header includes the build number of the ucode.  This build number is the
SVN revision thus allowing for exact correlation to the code that
generated it.

The header adds the build number so that older ucode images can also be
enhanced to include the build in the future.

some cleanup in iwl_read_ucode needed to ensure old header not used and
reduce unnecessary references through pointer with the data is already
in heap variable.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoNOMMU: Fix MAP_PRIVATE mmap() of objects where the data can be mapped directly
David Howells [Thu, 24 Sep 2009 14:13:10 +0000 (15:13 +0100)]
NOMMU: Fix MAP_PRIVATE mmap() of objects where the data can be mapped directly

commit 645d83c5db970a1c57225e155113b4aa2451e920 upstream.

Fix MAP_PRIVATE mmap() of files and devices where the data in the backing store
might be mapped directly.  Use the BDI_CAP_MAP_DIRECT capability flag to govern
whether or not we should be trying to map a file directly.  This can be used to
determine whether or not a region has been filled in at the point where we call
do_mmap_shared() or do_mmap_private().

The BDI_CAP_MAP_DIRECT capability flag is cleared by validate_mmap_request() if
there's any reason we can't use it.  It's also cleared in do_mmap_pgoff() if
f_op->get_unmapped_area() fails.

Without this fix, attempting to run a program from a RomFS image on a
non-mappable MTD partition results in a BUG as the kernel attempts XIP, and
this can be caught in gdb:

Program received signal SIGABRT, Aborted.
0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547
(gdb) bt
#0  0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547
#1  0xc005f168 in do_mmap_pgoff (file=0xc31a6620, addr=<value optimized out>, len=3808, prot=3, flags=6146, pgoff=0) at mm/nommu.c:1373
#2  0xc00a96b8 in elf_fdpic_map_file (params=0xc33fbbec, file=0xc31a6620, mm=0xc31bef60, what=0xc0213144 "executable") at mm.h:1145
#3  0xc00aa8b4 in load_elf_fdpic_binary (bprm=0xc316cb00, regs=<value optimized out>) at fs/binfmt_elf_fdpic.c:343
#4  0xc006b588 in search_binary_handler (bprm=0x6, regs=0xc33fbce0) at fs/exec.c:1234
#5  0xc006c648 in do_execve (filename=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460, regs=0xc33fbce0) at fs/exec.c:1356
#6  0xc0008cf0 in sys_execve (name=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460) at arch/frv/kernel/process.c:263
#7  0xc00075dc in __syscall_call () at arch/frv/kernel/entry.S:897

Note that this fix does the following commit differently:

commit a190887b58c32d19c2eee007c5eb8faa970a69ba
Author: David Howells <dhowells@redhat.com>
Date:   Sat Sep 5 11:17:07 2009 -0700
nommu: fix error handling in do_mmap_pgoff()

Reported-by: Graff Yang <graff.yang@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agomptsas : PAE Kernel more than 4 GB kernel panic
Kashyap, Desai [Wed, 2 Sep 2009 06:14:57 +0000 (11:44 +0530)]
mptsas : PAE Kernel more than 4 GB kernel panic

commit c55b89fba9872ebcd5ac15cdfdad29ffb89329f0 upstream.

This patch is solving problem for PAE kernel DMA operation.
On PAE system dma_addr and unsigned long will have different
values.
Now dma_addr is not type casted using unsigned long.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoHID: completely remove apple mightymouse from blacklist
Jan Scholz [Wed, 26 Aug 2009 11:18:51 +0000 (13:18 +0200)]
HID: completely remove apple mightymouse from blacklist

commit 42960a13001aa6df52ca9952ce996f94a744ea65 upstream.

Commit fa047e4f6fa63a6e9d0ae4d7749538830d14a343 "HID: fix inverted
wheel for bluetooth version of apple mighty mouse" is incomplete. If
we remove Apple MightyMouse (bluetooth version) from the list of
apple_devices in drivers/hid/hid-apple.c we have to remove it from
hid_blacklist in drivers/hid/hid-core.c as well.

Signed-off-by: Jan Scholz <Scholz@fias.uni-frankfurt.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agopowerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL
Weirich, Bernhard [Thu, 24 Sep 2009 07:16:53 +0000 (17:16 +1000)]
powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL

[I'm going to fix upstream differently, by having all CPU types
actually support _PAGE_SPECIAL, but I prefer the simple and obvious
fix for -stable. -- Ben]

The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on
powerpc is bogus and will end up always defining it, even when
_PAGE_SPECIAL is not supported (in which case it's 0) such as on
8xx or 40x processors.

Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agopowerpc/8xx: Fix regression introduced by cache coherency rewrite
Rex Feany [Thu, 24 Sep 2009 07:16:54 +0000 (17:16 +1000)]
powerpc/8xx: Fix regression introduced by cache coherency rewrite

commit e0908085fc2391c85b85fb814ae1df377c8e0dcb upstream.

After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.

Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agosaa7134: ir-kbd-i2c init data needs a persistent object
Brian Rogers [Wed, 23 Sep 2009 10:05:03 +0000 (03:05 -0700)]
saa7134: ir-kbd-i2c init data needs a persistent object

commit 7aedd5ec87686c557d48584d69ad880c11a0984d upstream.

Tested on MSI TV@nywhere Plus.

Original commit message:

ir-kbd-i2c's ir_probe() function can be called much later (i.e. at
ir-kbd-i2c module load), than the lifetime of a struct IR_i2c_init_data
allocated off of the stack in cx18_i2c_new_ir() at registration time.
Make sure we pass a pointer to a persistent IR_i2c_init_data object at
i2c registration time.

Thanks to Brian Rogers, Dustin Mitchell, Andy Walls and Jean Delvare to
rise this question.

Before this patch, if ir-kbd-i2c were probed after SAA7134, trash data
were used.

Compile tested only, but the patch is identical to em28xx one. So, it
should work properly.

Original-patch-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[brian@xyzw.org: backported for 2.6.31]
Signed-off-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoem28xx: ir-kbd-i2c init data needs a persistent object
Brian Rogers [Wed, 23 Sep 2009 10:05:02 +0000 (03:05 -0700)]
em28xx: ir-kbd-i2c init data needs a persistent object

commit d2ebd0f806fdb6104903365e355675934eec22b2 upstream.

Original commit message:

ir-kbd-i2c's ir_probe() function can be called much later (i.e. at
ir-kbd-i2c module load), than the lifetime of a struct IR_i2c_init_data
allocated off of the stack in cx18_i2c_new_ir() at registration time.
Make sure we pass a pointer to a persistent IR_i2c_init_data object at
i2c registration time.

Thanks to Brian Rogers, Dustin Mitchell, Andy Walls and Jean Delvare to
rise this question.

Before this patch, if ir-kbd-i2c were probed after em28xx, trash data
were used. After the patch, no matter what order, it is properly
reported as tested by me:

input: i2c IR (i2c IR (EM2840 Hauppaug as /class/input/input10
ir-kbd-i2c: i2c IR (i2c IR (EM2840 Hauppaug detected at i2c-4/4-0030/ir0 [em28xx #0]

Original-patch-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[brian@xyzw.org: backported for 2.6.31]
Signed-off-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agodrm/i915: Handle ERESTARTSYS during page fault
Chris Wilson [Tue, 22 Sep 2009 23:43:56 +0000 (00:43 +0100)]
drm/i915: Handle ERESTARTSYS during page fault

commit c715089f49844260f1eeae8e3b55af9468ba1325 upstream.

During a page fault and rebinding the buffer there exists a window for a
signal to arrive during the i915_wait_request() and trigger a
ERESTARTSYS. This used to be handled by returning SIGBUS and thereby
killing the application. Try 'cairo-perf-trace & cairo-test-suite' and
watch X go boom!

The solution as suggested by H. Peter Anvin is to simply return NOPAGE and
leave the higher layers to spot we did not fill the page and resubmit
the page fault.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: Mostly squash it with another commit]
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoFix idle time field in /proc/uptime
Michael Abbott [Thu, 24 Sep 2009 08:15:19 +0000 (10:15 +0200)]
Fix idle time field in /proc/uptime

commit 96830a57de1197519b62af6a4c9ceea556c18c3d upstream.

Git commit 79741dd changes idle cputime accounting, but unfortunately
the /proc/uptime file hasn't caught up.  Here the idle time calculation
from /proc/stat is copied over.

Signed-off-by: Michael Abbott <michael.abbott@diamond.ac.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agommap: avoid unnecessary anon_vma lock acquisition in vma_adjust()
Lee Schermerhorn [Tue, 22 Sep 2009 00:03:40 +0000 (17:03 -0700)]
mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()

commit 252c5f94d944487e9f50ece7942b0fbf659c5c31 upstream.

We noticed very erratic behavior [throughput] with the AIM7 shared
workload running on recent distro [SLES11] and mainline kernels on an
8-socket, 32-core, 256GB x86_64 platform.  On the SLES11 kernel
[2.6.27.19+] with Barcelona processors, as we increased the load [10s of
thousands of tasks], the throughput would vary between two "plateaus"--one
at ~65K jobs per minute and one at ~130K jpm.  The simple patch below
causes the results to smooth out at the ~130k plateau.

But wait, there's more:

We do not see this behavior on smaller platforms--e.g., 4 socket/8 core.
This could be the result of the larger number of cpus on the larger
platform--a scalability issue--or it could be the result of the larger
number of interconnect "hops" between some nodes in this platform and how
the tasks for a given load end up distributed over the nodes' cpus and
memories--a stochastic NUMA effect.

The variability in the results are less pronounced [on the same platform]
with Shanghai processors and with mainline kernels.  With 31-rc6 on
Shanghai processors and 288 file systems on 288 fibre attached storage
volumes, the curves [jpm vs load] are both quite flat with the patched
kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K
jpm] than the unpatched kernel.

Profiling indicated that the "slow" runs were incurring high[er]
contention on an anon_vma lock in vma_adjust(), apparently called from the
sbrk() system call.

The patch:

A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the
anon_vma lock when we're only adjusting the end of a vma, as is the case
for brk().  The comment questions whether it's worth while to optimize for
this case.  Apparently, on the newer, larger x86_64 platforms, with
interesting NUMA topologies, it is worth while--especially considering
that the patch [if correct!] is quite simple.

We can detect this condition--no overlap with next vma--by noting a NULL
"importer".  The anon_vma pointer will also be NULL in this case, so
simply avoid loading vma->anon_vma to avoid the lock.

However, we DO need to take the anon_vma lock when we're inserting a vma
['insert' non-NULL] even when we have no overlap [NULL "importer"], so we
need to check for 'insert', as well.  And Hugh points out that we should
also take it when adjusting vm_start (so that rmap.c can rely upon
vma_address() while it holds the anon_vma lock).

akpm: Zhang Yanmin reprts a 150% throughput improvement with aim7, so it
might be -stable material even though thiss isn't a regression: "this
issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but is
severe on large machine (4*8*2 cpu)"

[hugh.dickins@tiscali.co.uk: test vma start too]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Eric Whitney <eric.whitney@hp.com>
Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agomm: fix anonymous dirtying
Hugh Dickins [Tue, 22 Sep 2009 00:03:29 +0000 (17:03 -0700)]
mm: fix anonymous dirtying

commit 1ac0cb5d0e22d5e483f56b2bc12172dec1cf7536 upstream.

do_anonymous_page() has been wrong to dirty the pte regardless.
If it's not going to mark the pte writable, then it won't help
to mark it dirty here, and clogs up memory with pages which will
need swap instead of being thrown away.  Especially wrong if no
overcommit is chosen, and this vma is not yet VM_ACCOUNTed -
we could exceed the limit and OOM despite no overcommit.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agomm: munlock use follow_page
Hugh Dickins [Tue, 22 Sep 2009 00:03:23 +0000 (17:03 -0700)]
mm: munlock use follow_page

commit 408e82b78bcc9f1b47c76e833c3df97f675947de upstream.

Hiroaki Wakabayashi points out that when mlock() has been interrupted
by SIGKILL, the subsequent munlock() takes unnecessarily long because
its use of __get_user_pages() insists on faulting in all the pages
which mlock() never reached.

It's worse than slowness if mlock() is terminated by Out Of Memory kill:
the munlock_vma_pages_all() in exit_mmap() insists on faulting in all the
pages which mlock() could not find memory for; so innocent bystanders are
killed too, and perhaps the system hangs.

__get_user_pages() does a lot that's silly for munlock(): so remove the
munlock option from __mlock_vma_pages_range(), and use a simple loop of
follow_page()s in munlock_vma_pages_range() instead; ignoring absent
pages, and not marking present pages as accessed or dirty.

(Change munlock() to only go so far as mlock() reached?  That does not
work out, given the convention that mlock() claims complete success even
when it has to give up early - in part so that an underlying file can be
extended later, and those pages locked which earlier would give SIGBUS.)

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Hiroaki Wakabayashi <primulaelatior@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agopage-allocator: limit the number of MIGRATE_RESERVE pageblocks per zone
Mel Gorman [Tue, 22 Sep 2009 00:03:02 +0000 (17:03 -0700)]
page-allocator: limit the number of MIGRATE_RESERVE pageblocks per zone

commit 78986a678f6ec3759a01976749f4437d8bf2d6c3 upstream.

After anti-fragmentation was merged, a bug was reported whereby devices
that depended on high-order atomic allocations were failing.  The solution
was to preserve a property in the buddy allocator which tended to keep the
minimum number of free pages in the zone at the lower physical addresses
and contiguous.  To preserve this property, MIGRATE_RESERVE was introduced
and a number of pageblocks at the start of a zone would be marked
"reserve", the number of which depended on min_free_kbytes.

Anti-fragmentation works by avoiding the mixing of page migratetypes
within the same pageblock.  One way of helping this is to increase
min_free_kbytes because it becomes less like that it will be necessary to
place pages of of MIGRATE_RESERVE is unbounded, the free memory is kept
there in large contiguous blocks instead of helping anti-fragmentation as
much as it should.  With the page-allocator tracepoint patches applied, it
was found during anti-fragmentation tests that the number of
fragmentation-related events were far higher than expected even with
min_free_kbytes at higher values.

This patch limits the number of MIGRATE_RESERVE blocks that exist per zone
to two.  For example, with a sufficient min_free_kbytes, 4MB of memory
will be kept aside on an x86-64 and remain more or less free and
contiguous for the systems uptime.  This should be sufficient for devices
depending on high-order atomic allocations while helping fragmentation
control when min_free_kbytes is tuned appropriately.  As side-effect of
this patch is that the reserve variable is converted to int as unsigned
long was the wrong type to use when ensuring that only the required number
of reserve blocks are created.

With the patches applied, fragmentation-related events as measured by the
page allocator tracepoints were significantly reduced when running some
fragmentation stress-tests on systems with min_free_kbytes tuned to a
value appropriate for hugepage allocations at runtime.  On x86, the events
recorded were reduced by 99.8%, on x86-64 by 99.72% and on ppc64 by
99.83%.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agohugetlb: restore interleaving of bootmem huge pages (2.6.31)
Lee Schermerhorn [Tue, 22 Sep 2009 00:01:04 +0000 (17:01 -0700)]
hugetlb: restore interleaving of bootmem huge pages (2.6.31)

Not upstream as it is fixed differently in .32

I noticed that alloc_bootmem_huge_page() will only advance to the next
node on failure to allocate a huge page.  I asked about this on linux-mm
and linux-numa, cc'ing the usual huge page suspects.  Mel Gorman
responded:

I strongly suspect that the same node being used until allocation
failure instead of round-robin is an oversight and not deliberate
at all. It appears to be a side-effect of a fix made way back in
commit 63b4613c3f0d4b724ba259dc6c201bb68b884e1a ["hugetlb: fix
hugepage allocation with memoryless nodes"]. Prior to that patch
it looked like allocations would always round-robin even when
allocation was successful.

Andy Whitcroft countered that the existing behavior looked like Andi
Kleen's original implementation and suggested that we ask him.  We did and
Andy replied that his intention was to interleave the allocations.  So,
...

This patch moves the advance of the hstate next node from which to
allocate up before the test for success of the attempted allocation.  This
will unconditionally advance the next node from which to alloc,
interleaving successful allocations over the nodes with sufficient
contiguous memory, and skipping over nodes that fail the huge page
allocation attempt.

Note that alloc_bootmem_huge_page() will only be called for huge pages of
order > MAX_ORDER.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years ago/proc/kcore: work around a BUG()
KAMEZAWA Hiroyuki [Tue, 22 Sep 2009 00:01:02 +0000 (17:01 -0700)]
/proc/kcore: work around a BUG()

Not upstream due to other fixes in .32

Works around a BUG() which is triggered when the kernel accesses holes in
vmalloc regions.

BUG: unable to handle kernel paging request at fa54c000
IP: [<c04f687a>] read_kcore+0x260/0x31a
*pde = 3540b067 *pte = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.2/0000:03:00.0/ieee80211/phy0/rfkill0/state
Modules linked in: fuse sco bridge stp llc bnep l2cap bluetooth sunrpc nf_conntrack_ftp ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath uinput usb_storage arc4 ecb snd_hda_codec_realtek snd_hda_intel ath5k snd_hda_codec snd_hwdep iTCO_wdt snd_pcm iTCO_vendor_support pcspkr i2c_i801 mac80211 joydev snd_timer serio_raw r8169 snd soundcore mii snd_page_alloc ath cfg80211 ata_generic i915 drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Sep  4 12:45:16 tuxedu kernel: Pid: 2266, comm: cat Not tainted (2.6.31-rc8 #2) Joybook Lite U101
EIP: 0060:[<c04f687a>] EFLAGS: 00010286 CPU: 0
EIP is at read_kcore+0x260/0x31a
EAX: f5e5ea00 EBX: fa54d000 ECX: 00000400 EDX: 00001000
ESI: fa54c000 EDI: f44ad000 EBP: e4533f4c ESP: e4533f24
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 2266, ti=e4532000 task=f09d19a0 task.ti=e4532000)
Stack:
00005000 00000000 f44ad000 09d9c000 00003000 fa54c000 00001000 f6d16f60
 e4520b80 fffffffb e4533f70 c04ef8eb e4533f98 00008000 09d97000 c04f661a
 e4520b80 09d97000 c04ef88c e4533f8c c04ba531 e4533f98 c04c0930 e4520b80
Call Trace:
[<c04ef8eb>] ? proc_reg_read+0x5f/0x73
[<c04f661a>] ? read_kcore+0x0/0x31a
[<c04ef88c>] ? proc_reg_read+0x0/0x73
[<c04ba531>] ? vfs_read+0x82/0xe1
[<c04c0930>] ? path_put+0x1a/0x1d
[<c04ba62e>] ? sys_read+0x40/0x62
[<c0403298>] ? sysenter_do_call+0x12/0x2d
Code: 39 f3 89 ca 0f 43 f3 89 fb 29 f2 29 f3 39 cf 0f 46 d3 29 55 dc 8d 1c 32 f6 40 0c 01 75 18 89 d1 89 f7 c1 e9 02 2b 7d ec 03 7d e0 <f3> a5 89 d1 83 e1 03 74 02 f3 a4 8b 00 83 7d dc 00 74 04 85 c0
EIP: [<c04f687a>] read_kcore+0x260/0x31a SS:ESP 0068:e4533f24
CR2: 00000000fa54c000

To access vmalloc area which may have memory holes, copy_from_user is
useful.  So this:

 # cat /proc/kcore > /dev/null

will not panic.

This is a minimal fix, suitable for 2.6.30.x and 2.6.31.  More extensive
/proc/kcore changes are planned for 2.6.32.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Nick Craig-Wood <nick@craig-wood.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Reported-by: <kbowa@tuxedu.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: Fix SS endpoint companion descriptor parsing.
Sarah Sharp [Tue, 8 Sep 2009 20:20:16 +0000 (13:20 -0700)]
USB: Fix SS endpoint companion descriptor parsing.

commit 6682bb39e111b34290e25c4d275c5bcf8bbccbe1 upstream.

When there's a descriptor after the SuperSpeed endpoint companion
descriptor, the previous code would have skipped over twice the length it
was supposed to.  This code fixes crashes seen with UASP devices (which
have a UASP descriptor after the SS endpoint companion descriptor).

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Support interrupt transfers.
Sarah Sharp [Wed, 2 Sep 2009 19:14:28 +0000 (12:14 -0700)]
USB: xhci: Support interrupt transfers.

commit 624defa12f304b4d11eda309bc207fa5a1900d0f upstream.

Interrupt transfers are submitted to the xHCI hardware using the same TRB
type as bulk transfers.  Re-use the bulk transfer enqueueing code to
enqueue interrupt transfers.

Interrupt transfers are a bit different than bulk transfers.  When the
interrupt endpoint is to be serviced, the xHC will consume (at most) one
TD.  A TD (comprised of sg list entries) can take several service
intervals to transmit.  The important thing for device drivers to note is
that if they use the scatter gather interface to submit interrupt
requests, they will not get data sent from two different scatter gather
lists in the same service interval.

For now, the xHCI driver will use the service interval from the endpoint's
descriptor (bInterval).  Drivers will need a hook to poll at a more
frequent interval.  Set urb->interval to the interval that the xHCI
hardware will use.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Set -EREMOTEIO when xHC gives bad transfer length.
Sarah Sharp [Fri, 28 Aug 2009 21:28:18 +0000 (14:28 -0700)]
USB: xhci: Set -EREMOTEIO when xHC gives bad transfer length.

commit 2f697f6cbff155b3ce4053a50cdf00b5be4dda11 upstream.

The xHCI hardware reports the number of bytes untransferred for a given
transfer buffer.  If the hardware reports a bytes untransferred value
greater than the submitted buffer size, we want to play it safe and say no
data was transferred.  If the driver considers a short packet to be an
error, remember to set -EREMOTEIO.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Check URB_SHORT_NOT_OK before setting short packet status.
Sarah Sharp [Fri, 28 Aug 2009 21:28:15 +0000 (14:28 -0700)]
USB: xhci: Check URB_SHORT_NOT_OK before setting short packet status.

commit 204970a4bb2f584afc430ae330cd44aee329cea4 upstream.

Make sure that the driver that submitted the URB considers a short packet
an error before setting -EREMOTEIO during a short control transfer.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Check URB's actual transfer buffer size.
Sarah Sharp [Thu, 27 Aug 2009 21:36:24 +0000 (14:36 -0700)]
USB: xhci: Check URB's actual transfer buffer size.

commit 99eb32db45061443ab7552b8fdceae68b90fde55 upstream.

Make sure that the amount of data the xHC says was transmitted is less
than or equal to the size of the requested transfer buffer.  Before, if
the host controller erroneously reported that the number of bytes
untransferred was bigger than the buffer in the URB, urb->actual_length
could be set to a very large size.

Make sure urb->actual_length <= urb->transfer_buffer_length.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Don't touch xhci_td after it's freed.
Sarah Sharp [Thu, 27 Aug 2009 21:36:14 +0000 (14:36 -0700)]
USB: xhci: Don't touch xhci_td after it's freed.

commit 9191eee7b8a0e18c07c06d6da502706805cab6d2 upstream.

On a successful transfer, urb->td is freed before the URB is ready to be
given back to the driver.  Don't touch urb->td after it's freed.  This bug
would have only shown up when xHCI debugging was turned on, and the freed
memory was quickly reused for something else.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Handle babbling endpoints correctly.
Sarah Sharp [Thu, 27 Aug 2009 21:36:03 +0000 (14:36 -0700)]
USB: xhci: Handle babbling endpoints correctly.

commit 83fbcdcca03013bb5af130d6d91eba11e3d3269e upstream.

The 0.95 xHCI spec says that non-control endpoints will be halted if a
babble is detected on a transfer.  The 0.96 xHCI spec says all types of
endpoints will be halted when a babble is detected.  Some hardware that
claims to be 0.95 compliant halts the control endpoint anyway.

When a babble is detected on a control endpoint, check the hardware's
output endpoint context to see if the endpoint is marked as halted.  If
the control endpoint is halted, a reset endpoint command must be issued
and the transfer ring dequeue pointer needs to be moved past the stopped
transfer.  Basically, we treat it as if the control endpoint had stalled.

Handle bulk babbling endpoints as if we got a completion event with a
stall completion code.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Make TRB completion code comparison readable.
Sarah Sharp [Thu, 27 Aug 2009 21:35:53 +0000 (14:35 -0700)]
USB: xhci: Make TRB completion code comparison readable.

commit 66d1eebce5cca916e0b08d961690bb01c64751ef upstream.

Use trb_comp_code instead of getting the completion code from the transfer
event every time.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Add quirk for Fresco Logic xHCI hardware.
Sarah Sharp [Fri, 7 Aug 2009 21:04:55 +0000 (14:04 -0700)]
USB: xhci: Add quirk for Fresco Logic xHCI hardware.

commit ac9d8fe7c6a8041cca5a0738915d2c4e21381421 upstream.

This Fresco Logic xHCI host controller chip revision puts bad data into
the output endpoint context after a Reset Endpoint command.  It needs a
Configure Endpoint command (instead of a Set TR Dequeue Pointer command)
after the reset endpoint command.

Set up the input context before issuing the Reset Endpoint command so we
don't copy bad data from the output endpoint context.  The HW also can't
handle two commands queued at once, so submit the TRB for the Configure
Endpoint command in the event handler for the Reset Endpoint command.

Devices that stall on control endpoints before a configuration is selected
will not work under this Fresco Logic xHCI host controller revision.

This patch is for prototype hardware that will be given to other companies
for evaluation purposes only, and should not reach consumer hands.  Fresco
Logic's next chip rev should have this bug fixed.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Handle stalled control endpoints.
Sarah Sharp [Fri, 7 Aug 2009 21:04:52 +0000 (14:04 -0700)]
USB: xhci: Handle stalled control endpoints.

commit 82d1009f537c2a43be0a410abd33521f76ee3a5a upstream.

When a control endpoint stalls, the next control transfer will clear the
stall.  The USB core doesn't call down to the host controller driver's
endpoint_reset() method when control endpoints stall, so the xHCI driver
has to do all its stall handling for internal state in its interrupt handler.

When the host stalls on a control endpoint, it may stop on the data phase
or status phase of the control transfer.  Like other stalled endpoints,
the xHCI driver needs to queue a Reset Endpoint command and move the
hardware's control endpoint ring dequeue pointer past the failed control
transfer (with a Set TR Dequeue Pointer or a Configure Endpoint command).

Since the USB core doesn't call usb_hcd_reset_endpoint() for control
endpoints, we need to do this in interrupt context when we get notified of
the stalled transfer.  URBs may be queued to the hardware before these two
commands complete.  The endpoint queue will be restarted once both
commands complete.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Support full speed devices.
Sarah Sharp [Fri, 7 Aug 2009 21:04:49 +0000 (14:04 -0700)]
USB: xhci: Support full speed devices.

commit 2d3f1fac7ee8bb4c6fad40f838488edbeabb0c50 upstream.

Full speed devices have varying max packet sizes (8, 16, 32, or 64) for
endpoint 0.  The xHCI hardware needs to know the real max packet size
that the USB core discovers after it fetches the first 8 bytes of the
device descriptor.

In order to fix this without adding a new hook to host controller drivers,
the xHCI driver looks for an updated max packet size for control
endpoints.  If it finds an updated size, it issues an evaluate context
command and waits for that command to finish.  This should only happen in
the initialization and device descriptor fetching steps in the khubd
thread, so blocking should be fine.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
8 years agoUSB: xhci: Set correct max packet size for HS/FS control endpoints.
Sarah Sharp [Fri, 7 Aug 2009 21:04:46 +0000 (14:04 -0700)]
USB: xhci: Set correct max packet size for HS/FS control endpoints.

commit 47aded8ade9fee6779b121b2b156235f261239d7 upstream.

Set the max packet size for the default control endpoint on high speed
devices to be 64 bytes.  High speed devices always have a max packet size
of 64 bytes.  There's no use setting it to eight for the initial 8 byte
descriptor fetch and then issuing (and waiting for) an evaluate context
command to update it to 64 bytes for the subsequent control transfers.

The USB core guesses that the max packet size on a full speed control
endpoint is 64 bytes, and then updates it after the first 8-byte
descriptor fetch.  Change the initial setup for the xHCI internal
representation of the full speed device to have a 64 byte max packet size.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>