5 years agoUSB: EHCI: fix regression in QH unlinking
Alan Stern [Wed, 20 Mar 2013 19:07:26 +0000 (15:07 -0400)]
USB: EHCI: fix regression in QH unlinking

commit d714aaf649460cbfd5e82e75520baa856b4fa0a0 upstream.

This patch (as1670) fixes a regression caused by commit
6402c796d3b4205d3d7296157956c5100a05d7d6 (USB: EHCI: work around
silicon bug in Intel's EHCI controllers).  The workaround goes through
two IAA cycles for each QH being unlinked.  During the first cycle,
the QH is not added to the async_iaa list (because it isn't fully gone
from the hardware yet), which means that list will be empty.

Unfortunately, I forgot to update the IAA watchdog timer routine.  It
thinks that an empty async_iaa list means the timer expiration was an
error, which isn't true any more.  This problem didn't show up during
initial testing because the controllers being tested all had working
IAA interrupts.  But not all controllers do, and when the watchdog
timer expires, the empty-list check prevents the second IAA cycle from
starting.  As a result, URB unlinks never complete.  The check needs
to be removed.

Among the symptoms of the regression are processes stuck in D wait
states and hangs during system shutdown.

Signed-off-by: Alan Stern <>
Reported-and-tested-by: Stephen Warren <>
Reported-and-tested-by: Sven Joachim <>
Reported-by: Andreas Bombe <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoUSB: EHCI: fix regression during bus resume
Alan Stern [Fri, 15 Mar 2013 18:40:26 +0000 (14:40 -0400)]
USB: EHCI: fix regression during bus resume

commit 2a40f324541ee61c22146214349c2ce9f5c30bcf upstream.

This patch (as1663) fixes a regression caused by commit
6e0c3339a6f19d748f16091d0a05adeb1e1f822b (USB: EHCI: unlink one async
QH at a time).  In order to avoid keeping multiple QHs in an unusable
intermediate state, that commit changed unlink_empty_async() so that
it unlinks only one empty QH at a time.

However, when the EHCI root hub is suspended, _all_ async QHs need to
be unlinked.  ehci_bus_suspend() used to do this by calling
unlink_empty_async(), but now this only unlinks one of the QHs, not
all of them.

The symptom is that when the root hub is resumed, USB communications
don't work for some period of time.  This is because ehci-hcd doesn't
realize it needs to restart the async schedule; it assumes that
because some QHs are already on the schedule, the schedule must be

The easiest way to fix the problem is add a new function that unlinks
all the async QHs when the root hub is suspended.

This patch should be applied to all kernels that have the 6e0c3339a6f1

Signed-off-by: Alan Stern <>
Reported-and-tested-by: Adrian Bassett <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoUSB: cdc-acm: fix device unregistration
Johan Hovold [Tue, 19 Mar 2013 08:21:06 +0000 (09:21 +0100)]
USB: cdc-acm: fix device unregistration

commit cb25505fc604292c70fc02143fc102f54c8595f0 upstream.

Unregister tty device in disconnect as is required by the USB stack.

By deferring unregistration to when the last tty reference is dropped,
the parent interface device can get unregistered before the child
resulting in broken hotplug events being generated when the tty is
finally closed:

KERNEL[2290.798128] remove   /devices/pci0000:00/0000:00:1d.7/usb2/2-1/2-1:3.1 (usb)
KERNEL[2290.804589] remove   /devices/pci0000:00/0000:00:1d.7/usb2/2-1 (usb)
KERNEL[2294.554799] remove   /2-1:3.1/tty/ttyACM0 (tty)

The driver must deal with tty callbacks after disconnect by checking the
disconnected flag. Specifically, further opens must be prevented and
this is already implemented.

Acked-by: Oliver Neukum <>
Cc: Oliver Neukum <>
Signed-off-by: Johan Hovold <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoUSB: xhci: correctly enable interrupts
Hannes Reinecke [Mon, 4 Mar 2013 16:14:43 +0000 (17:14 +0100)]
USB: xhci: correctly enable interrupts

commit 00eed9c814cb8f281be6f0f5d8f45025dc0a97eb upstream.

xhci has its own interrupt enabling routine, which will try to
use MSI-X/MSI if present. So the usb core shouldn't try to enable
legacy interrupts; on some machines the xhci legacy IRQ setting
is invalid.

v3: Be careful to not break XHCI_BROKEN_MSI workaround (by trenn)

Cc: Bjorn Helgaas <>
Cc: Oliver Neukum <>
Cc: Thomas Renninger <>
Cc: Yinghai Lu <>
Cc: Frederik Himpe <>
Cc: David Haerdeman <>
Cc: Alan Stern <>
Acked-by: Sarah Sharp <>
Reviewed-by: Thomas Renninger <>
Signed-off-by: Hannes Reinecke <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoUSB: xhci - fix bit definitions for IMAN register
Dmitry Torokhov [Mon, 25 Feb 2013 18:56:01 +0000 (10:56 -0800)]
USB: xhci - fix bit definitions for IMAN register

commit f8264340e694604863255cc0276491d17c402390 upstream.

According to XHCI specification ( the IP is bit 0 and IE is bit 1
of IMAN register. Previously their definitions were reversed.

Even though there are no ill effects being observed from the swapped
definitions (because IMAN_IP is RW1C and in legacy PCI case we come in
with it already set to 1 so it was clearing itself even though we were
setting IMAN_IE instead of IMAN_IP), we should still correct the values.

This patch should be backported to kernels as old as 2.6.36, that
contain the commit 4e833c0b87a30798e67f06120cecebef6ee9644c "xhci: don't
re-enable IE constantly".

Signed-off-by: Dmitry Torokhov <>
Signed-off-by: Sarah Sharp <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agox86-64: Fix the failure case in copy_user_handle_tail()
CQ Tang [Mon, 18 Mar 2013 15:02:21 +0000 (11:02 -0400)]
x86-64: Fix the failure case in copy_user_handle_tail()

commit 66db3feb486c01349f767b98ebb10b0c3d2d021b upstream.

The increment of "to" in copy_user_handle_tail() will have incremented
before a failure has been noted.  This causes us to skip a byte in the
failure case.

Only do the increment when assured there is no failure.

Signed-off-by: CQ Tang <>
Signed-off-by: Mike Marciniszyn <>
Signed-off-by: H. Peter Anvin <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoclockevents: Don't allow dummy broadcast timers
Mark Rutland [Thu, 7 Mar 2013 15:09:24 +0000 (15:09 +0000)]
clockevents: Don't allow dummy broadcast timers

commit a7dc19b8652c862d5b7c4d2339bd3c428bd29c4a upstream.

Currently tick_check_broadcast_device doesn't reject clock_event_devices
with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real
hardware if they have a higher rating value. In this situation, the
dummy timer is responsible for broadcasting to itself, and the core
clockevents code may attempt to call non-existent callbacks for
programming the dummy, eventually leading to a panic.

This patch makes tick_check_broadcast_device always reject dummy timers,
preventing this problem.

Signed-off-by: Mark Rutland <>
Cc: Jon Medhurst (Tixy) <>
Signed-off-by: Thomas Gleixner <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agonfsd: fix bad offset use
Kent Overstreet [Fri, 22 Mar 2013 18:18:24 +0000 (11:18 -0700)]
nfsd: fix bad offset use

commit e49dbbf3e770aa590a8a464ac4978a09027060b9 upstream.

vfs_writev() updates the offset argument - but the code then passes the
offset to vfs_fsync_range(). Since offset now points to the offset after
what was just written, this is probably not what was intended

Introduced by face15025ffdf664de95e86ae831544154d26c9c "nfsd: use
vfs_fsync_range(), not O_SYNC, for stable writes".

Signed-off-by: Kent Overstreet <>
Cc: Al Viro <>
Cc: "Eric W. Biederman" <>
Reviewed-by: Zach Brown <>
Signed-off-by: J. Bruce Fields <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agomd/raid5: ensure sync and DISCARD don't happen at the same time.
NeilBrown [Tue, 12 Mar 2013 01:18:06 +0000 (12:18 +1100)]
md/raid5: ensure sync and DISCARD don't happen at the same time.

commit f8dfcffd0472a0f353f34a567ad3f53568914d04 upstream.

A number of problems can occur due to races between
resync/recovery and discard.

- if sync_request calls handle_stripe() while a discard is
  happening on the stripe, it might call handle_stripe_clean_event
  before all of the individual discard requests have completed
  (so some devices are still locked, but not all).
  Since commit ca64cae96037de16e4af92678814f5d4bf0c1c65
     md/raid5: Make sure we clear R5_Discard when discard is finished.
  this will cause R5_Discard to be cleared for the parity device,
  so handle_stripe_clean_event() will not be called when the other
  devices do become unlocked, so their ->written will not be cleared.
  This ultimately leads to a WARN_ON in init_stripe and a lock-up.

- If handle_stripe_clean_event() does clear R5_UPTODATE at an awkward
  time for resync, it can lead to s->uptodate being less than disks
  in handle_parity_checks5(), which triggers a BUG (because it is

 - keep R5_Discard on the parity device until all other devices have
   completed their discard request
 - make sure we don't try to have a 'discard' and a 'sync' action at
   the same time.
   This involves a new stripe flag to we know when a 'discard' is
   happening, and the use of R5_Overlap on the parity disk so when a
   discard is wanted while a sync is active, so we know to wake up
   the discard at the appropriate time.

Discard support for RAID5 was added in 3.7, so this is suitable for
any -stable kernel since 3.7.

Reported-by: Jes Sorensen <>
Tested-by: Jes Sorensen <>
Signed-off-by: NeilBrown <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoMD RAID5: Avoid accessing gendisk or queue structs when not available
Jonathan Brassow [Thu, 7 Mar 2013 22:22:01 +0000 (16:22 -0600)]
MD RAID5: Avoid accessing gendisk or queue structs when not available

commit e3620a3ad52609f64a2402e4b59300afb4b83b77 upstream.

MD RAID5:  Fix kernel oops when RAID4/5/6 is used via device-mapper

Commit a9add5d (v3.8-rc1) added blktrace calls to the RAID4/5/6 driver.
However, when device-mapper is used to create RAID4/5/6 arrays, the
mddev->gendisk and mddev->queue fields are not setup.  Therefore, calling
things like trace_block_bio_remap will cause a kernel oops.  This patch
conditionalizes those calls on whether the proper fields exist to make
the calls.  (Device-mapper will call trace_block_bio_remap on its own.)

This patch is suitable for the 3.8.y stable kernel.

Signed-off-by: Jonathan Brassow <>
Signed-off-by: NeilBrown <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agomd/raid5: schedule_construction should abort if nothing to do.
NeilBrown [Mon, 4 Mar 2013 01:37:14 +0000 (12:37 +1100)]
md/raid5: schedule_construction should abort if nothing to do.

commit ce7d363aaf1e28be8406a2976220944ca487e8ca upstream.

Since commit 1ed850f356a0a422013846b5291acff08815008b
    md/raid5: make sure to_read and to_write never go negative.

It has been possible for handle_stripe_dirtying to be called
when there isn't actually any work to do.
It then calls schedule_reconstruction() which will set R5_LOCKED
on the parity block(s) even when nothing else is happening.
This then causes problems in do_release_stripe().

So add checks to schedule_reconstruction() so that if it doesn't
find anything to do, it just aborts.

This bug was introduced in v3.7, so the patch is suitable
for -stable kernels since then.

Reported-by: majianpeng <>
Signed-off-by: NeilBrown <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agowatchdog: sp5100_tco: Remove code that may cause a boot failure
Takahisa Tanaka [Sun, 3 Mar 2013 05:52:07 +0000 (14:52 +0900)]
watchdog: sp5100_tco: Remove code that may cause a boot failure

commit 18e4321276fcf083b85b788fee7cf15be29ed72a upstream.

A problem was found on PC's with the SB700 chipset: The PC fails to
load BIOS after running the 3.8.x kernel until the power is completely
cut off. It occurs in all 3.8.x versions and the mainline version as of
2/4. The issue does not occur with the 3.7.x builds.

There are two methods for accessing the watchdog registers.

 1. Re-programming a resource address obtained by allocate_resource()
to chipset.
 2. Use the direct memory-mapped IO access.

The method 1 can be used by all the chipsets (SP5100, SB7x0, SB8x0 or
later). However, experience shows that only PC with the SB8x0 (or
later) chipsets can use the method 2.

This patch removes the method 1, because the critical problem was found.
That's why the watchdog timer was able to be used on SP5100 and SB7x0
chipsets until now.

Signed-off-by: Takahisa Tanaka <>
Signed-off-by: Wim Van Sebroeck <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agowatchdog: sp5100_tco: Set the AcpiMmioSel bitmask value to 1 instead of 2
Takahisa Tanaka [Sun, 3 Mar 2013 05:48:00 +0000 (14:48 +0900)]
watchdog: sp5100_tco: Set the AcpiMmioSel bitmask value to 1 instead of 2

commit 81fc933f176cd95f757bfc8a98109ef422598b79 upstream.

The AcpiMmioSel bit is bit 1 in the AcpiMmioEn register, but the current
sp5100_tco driver is using bit 2.

See 2.3.3 Power Management (PM) Registers page 150 of the
AMD SB800-Series Southbridges Register Reference Guide [1].

        AcpiMmioEn - RW – 8/16/32 bits - [PM_Reg: 24h]
        Field Name        Bits  Default  Description
        AcpiMMioDecodeEn  0     0b       Set to 1 to enable AcpiMMio space.
        AcpiMMIoSel       1     0b       Set AcpiMMio registers to be memory-mapped or IO-mapped space.
                                         0: Memory-mapped space
                                         1: I/O-mapped space

The sp5100_tco driver expects zero as a value of AcpiMmioSel (bit 1).

Fortunately, no problems were caused by this typo, because the default
value of the undocumented misused bit 2 seems to be zero.

However, the sp5100_tco driver should use the correct bitmask value.


Signed-off-by: Takahisa Tanaka <>
Signed-off-by: Paul Menzel <>
Signed-off-by: Wim Van Sebroeck <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoIPoIB: Fix send lockup due to missed TX completion
Mike Marciniszyn [Tue, 26 Feb 2013 15:46:27 +0000 (15:46 +0000)]
IPoIB: Fix send lockup due to missed TX completion

commit 1ee9e2aa7b31427303466776f455d43e5e3c9275 upstream.

Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Reviewed-by: Dean Luick <>
Signed-off-by: Mike Marciniszyn <>
Signed-off-by: Roland Dreier <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoext4: fix data=journal fast mount/umount hang
Theodore Ts'o [Wed, 20 Mar 2013 13:42:11 +0000 (09:42 -0400)]
ext4: fix data=journal fast mount/umount hang

commit 2b405bfa84063bfa35621d2d6879f52693c614b0 upstream.

In data=journal mode, if we unmount the file system before a
transaction has a chance to complete, when the journal inode is being
evicted, we can end up calling into jbd2_log_wait_commit() for the
last transaction, after the journalling machinery has been shut down.

Arguably we should adjust ext4_should_journal_data() to return FALSE
for the journal inode, but the only place it matters is
ext4_evict_inode(), and so to save a bit of CPU time, and to make the
patch much more obviously correct by inspection(tm), we'll fix it by
explicitly not trying to waiting for a journal commit when we are
evicting the journal inode, since it's guaranteed to never succeed in
this case.

This can be easily replicated via:

     mount -t ext4 -o data=journal /dev/vdb /vdb ; umount /vdb

------------[ cut here ]------------
WARNING: at /usr/projects/linux/ext4/fs/jbd2/journal.c:542 __jbd2_log_start_commit+0xba/0xcd()
Hardware name: Bochs
JBD2: bad log_start_commit: 3005630206 3005630206 0 0
Modules linked in:
Pid: 2909, comm: umount Not tainted 3.8.0-rc3 #1020
Call Trace:
 [<c015c0ef>] warn_slowpath_common+0x68/0x7d
 [<c02b7e7d>] ? __jbd2_log_start_commit+0xba/0xcd
 [<c015c177>] warn_slowpath_fmt+0x2b/0x2f
 [<c02b7e7d>] __jbd2_log_start_commit+0xba/0xcd
 [<c02b8075>] jbd2_log_start_commit+0x24/0x34
 [<c0279ed5>] ext4_evict_inode+0x71/0x2e3
 [<c021f0ec>] evict+0x94/0x135
 [<c021f9aa>] iput+0x10a/0x110
 [<c02b7836>] jbd2_journal_destroy+0x190/0x1ce
 [<c0175284>] ? bit_waitqueue+0x50/0x50
 [<c028d23f>] ext4_put_super+0x52/0x294
 [<c020efe3>] generic_shutdown_super+0x48/0xb4
 [<c020f071>] kill_block_super+0x22/0x60
 [<c020f3e0>] deactivate_locked_super+0x22/0x49
 [<c020f5d6>] deactivate_super+0x30/0x33
 [<c0222795>] mntput_no_expire+0x107/0x10c
 [<c02233a7>] sys_umount+0x2cf/0x2e0
 [<c02233ca>] sys_oldumount+0x12/0x14
 [<c08096b8>] syscall_call+0x7/0xb
---[ end trace 6a954cc790501c1f ]---
jbd2_log_wait_commit: error: j_commit_request=-1289337090, tid=0

Signed-off-by: "Theodore Ts'o" <>
Reviewed-by: Jan Kara <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoext4: use s_extent_max_zeroout_kb value as number of kb
Lukas Czerner [Tue, 12 Mar 2013 16:40:04 +0000 (12:40 -0400)]
ext4: use s_extent_max_zeroout_kb value as number of kb

commit 4f42f80a8f08d4c3f52c4267361241885d5dee3a upstream.

Currently when converting extent to initialized, we have to decide
whether to zeroout part/all of the uninitialized extent in order to
avoid extent tree growing rapidly.

The decision is made by comparing the size of the extent with the
configurable value s_extent_max_zeroout_kb which is in kibibytes units.

However when converting it to number of blocks we currently use it as it
was in bytes. This is obviously bug and it will result in ext4 _never_
zeroout extents, but rather always split and convert parts to
initialized while leaving the rest uninitialized in default setting.

Fix this by using s_extent_max_zeroout_kb as kibibytes.

Signed-off-by: Lukas Czerner <>
Signed-off-by: "Theodore Ts'o" <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoext4: use atomic64_t for the per-flexbg free_clusters count
Theodore Ts'o [Tue, 12 Mar 2013 03:39:59 +0000 (23:39 -0400)]
ext4: use atomic64_t for the per-flexbg free_clusters count

commit 90ba983f6889e65a3b506b30dc606aa9d1d46cd2 upstream.

A user who was using a 8TB+ file system and with a very large flexbg
size (> 65536) could cause the atomic_t used in the struct flex_groups
to overflow.  This was detected by PaX security patchset:

This bug was introduced in commit 9f24e4208f7e, so it's been around
since 2.6.30.  :-(

Fix this by using an atomic64_t for struct orlav_stats's

Signed-off-by: "Theodore Ts'o" <>
Reviewed-by: Lukas Czerner <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agousb-storage: add unusual_devs entry for Samsung YP-Z3 mp3 player
Dmitry Artamonow [Sat, 9 Mar 2013 16:30:58 +0000 (20:30 +0400)]
usb-storage: add unusual_devs entry for Samsung YP-Z3 mp3 player

commit 29f86e66428ee083aec106cca1748dc63d98ce23 upstream.

Device stucks on filesystem writes, unless following quirk is passed:
  echo 04e8:5136:m > /sys/module/usb_storage/parameters/quirks

Add corresponding entry to unusual_devs.h

Signed-off-by: Dmitry Artamonow <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoext4: fix the wrong number of the allocated blocks in ext4_split_extent()
Zheng Liu [Mon, 11 Mar 2013 01:20:23 +0000 (21:20 -0400)]
ext4: fix the wrong number of the allocated blocks in ext4_split_extent()

commit 3a2256702e47f68f921dfad41b1764d05c572329 upstream.

This commit fixes a wrong return value of the number of the allocated
blocks in ext4_split_extent.  When the length of blocks we want to
allocate is greater than the length of the current extent, we return a
wrong number.  Let's see what happens in the following case when we
call ext4_split_extent().

  map: [48, 72]
  ex:  [32, 64, u]

'ex' will be split into two parts:
  ex1: [32, 47, u]
  ex2: [48, 64, w]

'map->m_len' is returned from this function, and the value is 24.  But
the real length is 16.  So it should be fixed.

Meanwhile in this commit we use right length of the allocated blocks
when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents
is called.

Signed-off-by: Zheng Liu <>
Signed-off-by: "Theodore Ts'o" <>
Cc: Dmitry Monakhov <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agojbd2: fix use after free in jbd2_journal_dirty_metadata()
Jan Kara [Mon, 11 Mar 2013 17:24:56 +0000 (13:24 -0400)]
jbd2: fix use after free in jbd2_journal_dirty_metadata()

commit ad56edad089b56300fd13bb9eeb7d0424d978239 upstream.

jbd2_journal_dirty_metadata() didn't get a reference to journal_head it
was working with. This is OK in most of the cases since the journal head
should be attached to a transaction but in rare occasions when we are
journalling data, __ext4_journalled_writepage() can race with
jbd2_journal_invalidatepage() stripping buffers from a page and thus
journal head can be freed under hands of jbd2_journal_dirty_metadata().

Fix the problem by getting own journal head reference in
jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers()
which can possibly have the same issue).

Reported-by: Zheng Liu <>
Signed-off-by: Jan Kara <>
Signed-off-by: "Theodore Ts'o" <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agocifs: ignore everything in SPNEGO blob after mechTypes
Jeff Layton [Mon, 11 Mar 2013 13:52:19 +0000 (09:52 -0400)]
cifs: ignore everything in SPNEGO blob after mechTypes

commit f853c616883a8de966873a1dab283f1369e275a1 upstream.

We've had several reports of people attempting to mount Windows 8 shares
and getting failures with a return code of -EINVAL. The default sec=
mode changed recently to sec=ntlmssp. With that, we expect and parse a
SPNEGO blob from the server in the NEGOTIATE reply.

The current decode_negTokenInit function first parses all of the
mechTypes and then tries to parse the rest of the negTokenInit reply.
The parser however currently expects a mechListMIC or nothing to follow the
mechTypes, but Windows 8 puts a mechToken field there instead to carry
some info for the new NegoEx stuff.

In practice, we don't do anything with the fields after the mechTypes
anyway so I don't see any real benefit in continuing to parse them.
This patch just has the kernel ignore the fields after the mechTypes.
We'll probably need to reinstate some of this if we ever want to support

Reported-by: Jason Burgess <>
Reported-by: Yan Li <>
Signed-off-by: Jeff Layton <>
Signed-off-by: Steve French <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agocifs: delay super block destruction until all cifsFileInfo objects are gone
Mateusz Guzik [Fri, 8 Mar 2013 15:30:03 +0000 (16:30 +0100)]
cifs: delay super block destruction until all cifsFileInfo objects are gone

commit 24261fc23db950951760d00c188ba63cc756b932 upstream.

cifsFileInfo objects hold references to dentries and it is possible that
these will still be around in workqueues when VFS decides to kill super
block during unmount.

This results in panics like this one:
BUG: Dentry ffff88001f5e76c0{i=66b4a,n=1M-2} still in use (1) [unmount of cifs cifs]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:943!
Process umount (pid: 1781, threadinfo ffff88003d6e8000, task ffff880035eeaec0)
Call Trace:
 [<ffffffff811b44f3>] shrink_dcache_for_umount+0x33/0x60
 [<ffffffff8119f7fc>] generic_shutdown_super+0x2c/0xe0
 [<ffffffff8119f946>] kill_anon_super+0x16/0x30
 [<ffffffffa036623a>] cifs_kill_sb+0x1a/0x30 [cifs]
 [<ffffffff8119fcc7>] deactivate_locked_super+0x57/0x80
 [<ffffffff811a085e>] deactivate_super+0x4e/0x70
 [<ffffffff811bb417>] mntput_no_expire+0xd7/0x130
 [<ffffffff811bc30c>] sys_umount+0x9c/0x3c0
 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b

Fix this by making each cifsFileInfo object hold a reference to cifs
super block, which implicitly keeps VFS super block around as well.

Signed-off-by: Mateusz Guzik <>
Reviewed-by: Jeff Layton <>
Reported-and-Tested-by: Ben Greear <>
Signed-off-by: Steve French <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrm/radeon/benchmark: make sure bo blit copy exists before using it
Alex Deucher [Tue, 12 Mar 2013 16:53:13 +0000 (12:53 -0400)]
drm/radeon/benchmark: make sure bo blit copy exists before using it

commit fa8d387dc3f62062a6b4afbbb2a3438094fd8584 upstream.

Fixes a segfault on asics without a blit callback.


Reviewed-by: Michel Dänzer <>
Signed-off-by: Alex Deucher <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrm/radeon: fix backend map setup on 1 RB trinity boards
Alex Deucher [Mon, 11 Mar 2013 23:28:39 +0000 (19:28 -0400)]
drm/radeon: fix backend map setup on 1 RB trinity boards

commit 8f612b23a17dce86fef75407e698de6243cc99a1 upstream.

Need to adjust the backend map depending on which RB is
enabled.  This is the trinity equivalent of:

May fix:

Signed-off-by: Alex Deucher <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrm/radeon: fix S/R on VM systems (cayman/TN/SI)
Alex Deucher [Mon, 11 Mar 2013 19:32:26 +0000 (15:32 -0400)]
drm/radeon: fix S/R on VM systems (cayman/TN/SI)

commit fa3daf9aa74a3ac1c87d8188a43d283d06720032 upstream.

We weren't properly tearing down the VM sub-alloctor
on suspend leading to bogus VM PTs on resume.


Reviewed-by: Christian König <>
Tested-by: Dmitry Cherkasov <>
Signed-off-by: Alex Deucher <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrm/radeon: add support for Richland APUs
Alex Deucher [Fri, 8 Mar 2013 18:44:15 +0000 (13:44 -0500)]
drm/radeon: add support for Richland APUs

commit e4d170633fde379f39a90f8a5e7eb619b5d1144d upstream.

Richland APUs are a new version of the Trinity APUs
with performance and power management improvements.

Reviewed-by: Jerome Glisse <>
Signed-off-by: Alex Deucher <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrm/radeon: add Richland pci ids
Alex Deucher [Fri, 8 Mar 2013 18:36:54 +0000 (13:36 -0500)]
drm/radeon: add Richland pci ids

commit b75bbaa038ffc426e88ea3df6c4ae11834fc3e4f upstream.

Reviewed-by: Jerome Glisse <>
Signed-off-by: Alex Deucher <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrm/mgag200: Bug fix: Modified pll algorithm for EH project
Julia Lemire [Mon, 18 Mar 2013 14:17:47 +0000 (10:17 -0400)]
drm/mgag200: Bug fix: Modified pll algorithm for EH project

commit 260b3f1291a75a580d22ce8bfb1499c617272716 upstream.

While testing the mgag200 kms driver on the HP ProLiant Gen8, a
bug was seen.  Once the bootloader would load the selected kernel,
the screen would go black.  At first it was assumed that the
mgag200 kms driver was hanging.  But after setting up the grub
serial output, it was seen that the driver was being loaded
properly.  After trying serval monitors, one finaly displayed
the message "Frequency Out of Range".  By comparing the kms pll
algorithm with the previous mgag200 xorg driver pll algorithm,
discrepencies were found.  Once the kms pll algorithm was
modified, the expected pll values were produced.  This fix was
tested on several monitors of varying native resolutions.

Signed-off-by: Julia Lemire <>
Signed-off-by: Dave Airlie <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodm verity: avoid deadlock
Mikulas Patocka [Wed, 20 Mar 2013 17:21:25 +0000 (17:21 +0000)]
dm verity: avoid deadlock

commit 3b6b7813b198b578aa7e04e4047ddb8225c37b7f upstream.

A deadlock was found in the prefetch code in the dm verity map
function.  This patch fixes this by transferring the prefetch
to a worker thread and skipping it completely if kmalloc fails.

If generic_make_request is called recursively, it queues the I/O
request on the current->bio_list without making the I/O request
and returns. The routine making the recursive call cannot wait
for the I/O to complete.

The deadlock occurs when one thread grabs the bufio_client
mutex and waits for an I/O to complete but the I/O is queued
on another thread's current->bio_list and is waiting to get
the mutex held by the first thread.

The fix recognises that prefetching is not essential.  If memory
can be allocated, it queues the prefetch request to the worker thread,
but if not, it does nothing.

Signed-off-by: Paul Taysom <>
Signed-off-by: Mikulas Patocka <>
Signed-off-by: Alasdair G Kergon <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodm thin: fix discard corruption
Joe Thornber [Wed, 20 Mar 2013 17:21:24 +0000 (17:21 +0000)]
dm thin: fix discard corruption

commit f046f89a99ccfd9408b94c653374ff3065c7edb3 upstream.

Fix a bug in dm_btree_remove that could leave leaf values with incorrect
reference counts.  The effect of this was that removal of a shared block
could result in the space maps thinking the block was no longer used.
More concretely, if you have a thin device and a snapshot of it, sending
a discard to a shared region of the thin could corrupt the snapshot.

Thinp uses a 2-level nested btree to store it's mappings.  This first
level is indexed by thin device, and the second level by logical

Often when we're removing an entry in this mapping tree we need to
rebalance nodes, which can involve shadowing them, possibly creating a
copy if the block is shared.  If we do create a copy then children of
that node need to have their reference counts incremented.  In this
way reference counts percolate down the tree as shared trees diverge.

The rebalance functions were incrementing the children at the
appropriate time, but they were always assuming the children were
internal nodes.  This meant the leaf values (in our case packed
block/flags entries) were not being incremented.

Signed-off-by: Joe Thornber <>
Signed-off-by: Alasdair G Kergon <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoARM: tegra: fix register address of slink controller
Laxman Dewangan [Fri, 22 Mar 2013 18:35:06 +0000 (12:35 -0600)]
ARM: tegra: fix register address of slink controller

commit 57471c8d3c22873f70813820e6b4d2d1fea9629d upstream.

Fix typo on register address of slink3 controller where register
address is wrongly set as 0x7000d480 but it is 0x7000d800.

Signed-off-by: Laxman Dewangan <>
Signed-off-by: Stephen Warren <>
Signed-off-by: Arnd Bergmann <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotarget/file: Bump FD_MAX_SECTORS to 2048 to handle 1M sized I/Os
Nicholas Bellinger [Mon, 18 Mar 2013 20:15:57 +0000 (13:15 -0700)]
target/file: Bump FD_MAX_SECTORS to 2048 to handle 1M sized I/Os

commit f002a24388cc460c8a9be7d446a9871f7c9d52b6 upstream.

This patch bumps the default FILEIO backend FD_MAX_SECTORS value from
1024 -> 2048 in order to allow block_size=512 to handle 1M sized I/Os.

The current default rejects I/Os larger than 512K in sbc_parse_cdb():

[12015.915146] SCSI OP 2ah with too big sectors 1347 exceeds backend
hw_max_sectors: 1024
[12015.977744] SCSI OP 2ah with too big sectors 2048 exceeds backend
hw_max_sectors: 1024

This issue is present in >= v3.5 based kernels, introduced after the
removal of se_task logic.

Reported-by: Viljami Ilola <>
Signed-off-by: Nicholas Bellinger <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotarget/iscsi: Fix mutual CHAP auth on big-endian arches
Andy Grover [Mon, 4 Mar 2013 21:52:09 +0000 (13:52 -0800)]
target/iscsi: Fix mutual CHAP auth on big-endian arches

commit 7ac9ad11b2a5cf77a92b58ee6b672ad2fa155eb1 upstream.


Used a temp var since we take its address in sg_init_one.

Signed-off-by: Andy Grover <>
Signed-off-by: Nicholas Bellinger <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agomqueue: sys_mq_open: do not call mnt_drop_write() if read-only
Vladimir Davydov [Fri, 22 Mar 2013 22:04:51 +0000 (15:04 -0700)]
mqueue: sys_mq_open: do not call mnt_drop_write() if read-only

commit 38d78e587d4960d0db94add518d27ee74bad2301 upstream.

mnt_drop_write() must be called only if mnt_want_write() succeeded,
otherwise the mnt_writers counter will diverge.

mnt_writers counters are used to check if remounting FS as read-only is
OK, so after an extra mnt_drop_write() call, it would be impossible to
remount mqueue FS as read-only.  Besides, on umount a warning would be
printed like this one:

  [ BUG: bad unlock balance detected! ]
  3.9.0-rc3 #5 Not tainted
  a.out/12486 is trying to release lock (sb_writers) at:
  but there are no more locks to release!

Signed-off-by: Vladimir Davydov <>
Cc: Doug Ledford <>
Cc: KOSAKI Motohiro <>
Cc: "Eric W. Biederman" <>
Cc: Al Viro <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrivers/video/ep93xx-fb.c: include <linux/io.h> for devm_ioremap()
H Hartley Sweeten [Fri, 22 Mar 2013 22:04:45 +0000 (15:04 -0700)]
drivers/video/ep93xx-fb.c: include <linux/io.h> for devm_ioremap()

commit e66b05873a7a76afc569da6382509471cba8d5ff upstream.

Commit be8678149701 ("drivers/video/ep93xx-fb.c: use devm_ functions")
introduced a build error:

  drivers/video/ep93xx-fb.c: In function 'ep93xxfb_probe':
  drivers/video/ep93xx-fb.c:532: error: implicit declaration of function 'devm_ioremap'
  drivers/video/ep93xx-fb.c:533: warning: assignment makes pointer from integer without a cast

Include <linux/io.h> to pickup the declaration of 'devm_ioremap'.

Signed-off-by: H Hartley Sweeten <>
Cc: Florian Tobias Schandinat <>
Acked-by: Ryan Mallon <>
Cc: Damien Cassou <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agomm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting
Wanpeng Li [Fri, 22 Mar 2013 22:04:40 +0000 (15:04 -0700)]
mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting

commit d00285884c0892bb1310df96bce6056e9ce9b9d9 upstream.

hugetlb_total_pages is used for overcommit calculations but the current
implementation considers only the default hugetlb page size (which is
either the first defined hugepage size or the one specified by
default_hugepagesz kernel boot parameter).

If the system is configured for more than one hugepage size, which is
possible since commit a137e1cc6d6e ("hugetlbfs: per mount huge page
sizes") then the overcommit estimation done by __vm_enough_memory()
(resp.  shown by meminfo_proc_show) is not precise - there is an
impression of more available/allowed memory.  This can lead to an
unexpected ENOMEM/EFAULT resp.  SIGSEGV when memory is accounted.

  boot: hugepagesz=1G hugepages=1
  the default overcommit ratio is 50
  before patch:

    egrep 'CommitLimit' /proc/meminfo
    CommitLimit:     55434168 kB

  after patch:

    egrep 'CommitLimit' /proc/meminfo
    CommitLimit:     54909880 kB

[ coding-style tweak]
Signed-off-by: Wanpeng Li <>
Acked-by: Michal Hocko <>
Cc: "Aneesh Kumar K.V" <>
Cc: Hillf Danton <>
Cc: KAMEZAWA Hiroyuki <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrivers/rtc/rtc-at91rm9200.c: use a variable for storing IMR
Nicolas Ferre [Fri, 22 Mar 2013 22:04:47 +0000 (15:04 -0700)]
drivers/rtc/rtc-at91rm9200.c: use a variable for storing IMR

commit 0ef1594c017521ea89278e80fe3f80dafb17abde upstream.

On some revisions of AT91 SoCs, the RTC IMR register is not working.
Instead of elaborating a workaround for that specific SoC or IP version,
we simply use a software variable to store the Interrupt Mask Register
and modify it for each enabling/disabling of an interrupt.  The overhead
of this is negligible anyway.

The interrupt mask register (IMR) for the RTC is broken on the AT91SAM9x5
sub-family of SoCs (good overview of the members here: ).  The "user visible
effect" is the RTC doesn't work.

That sub-family is less than two years old and only has devicetree (DT)
support and came online circa lk 3.7 .  The dust is yet to settle on the
DT stuff at least for AT91 SoCs (translation: lots of stuff is still
broken, so much that it is hard to know where to start).

The fix in the patch is pretty simple: just shadow the silicon IMR
register with a variable in the driver.  Some older SoCs (pre-DT) use the
the rtc-at91rm9200 driver (e.g.  obviously the AT91RM9200) and they should
not be impacted by the change.  There shouldn't be a large volume of
interrupts associated with a RTC.

Signed-off-by: Nicolas Ferre <>
Reported-by: Douglas Gilbert <>
Cc: Jean-Christophe PLAGNIOL-VILLARD <>
Cc: Ludovic Desroches <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoKMS: fix EDID detailed timing frame rate
Torsten Duwe [Sat, 23 Mar 2013 14:39:34 +0000 (15:39 +0100)]
KMS: fix EDID detailed timing frame rate

commit c19b3b0f6eed552952845e4ad908dba2113d67b4 upstream.

When KMS has parsed an EDID "detailed timing", it leaves the frame rate
zeroed.  Consecutive (debug-) output of that mode thus yields 0 for
vsync.  This simple fix also speeds up future invocations of

While it is debatable whether this qualifies as a -stable fix I'd apply
it for consistency's sake; drm_helper_probe_single_connector_modes()
does the same thing already for all probed modes.

Signed-off-by: Torsten Duwe <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoKMS: fix EDID detailed timing vsync parsing
Torsten Duwe [Sat, 23 Mar 2013 14:38:22 +0000 (15:38 +0100)]
KMS: fix EDID detailed timing vsync parsing

commit 16dad1d743d31a104a849c8944e6b9eb479f6cd7 upstream.

EDID spreads some values across multiple bytes; bit-fiddling is needed
to retrieve these.  The current code to parse "detailed timings" has a
cut&paste error that results in a vsync offset of at most 15 lines
instead of 63.


and in the "EDID Detailed Timing Descriptor" see bytes 10+11 show why
that needs to be a left shift.

Signed-off-by: Torsten Duwe <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoi2c: tegra: check the clk_prepare_enable() return value
Laxman Dewangan [Fri, 15 Mar 2013 05:34:08 +0000 (05:34 +0000)]
i2c: tegra: check the clk_prepare_enable() return value

commit 132c803f7b70b17322579f6f4f3f65cf68e55135 upstream.

NVIDIA's Tegra SoC allows read/write of controller register only
if controller clock is enabled. System hangs if read/write happens
to registers without enabling clock.

clk_prepare_enable() can be fail due to unknown reason and hence
adding check for return value of this function. If this function
success then only access register otherwise return to caller with

Signed-off-by: Laxman Dewangan <>
Reviewed-by: Stephen Warren <>
Signed-off-by: Wolfram Sang <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoRevert "drm/i915: write backlight harder"
Daniel Vetter [Fri, 22 Mar 2013 14:44:46 +0000 (15:44 +0100)]
Revert "drm/i915: write backlight harder"

commit b1289371fcd580b4c412e6d05c4cb8ac8d277239 upstream.

This reverts commit cf0a6584aa6d382f802f2c3cacac23ccbccde0cd.

Turns out that cargo-culting breaks systems. Note that we can't revert
further, since

commit 770c12312ad617172b1a65b911d3e6564fc5aca8
Author: Takashi Iwai <>
Date:   Sat Aug 11 08:56:42 2012 +0200

    drm/i915: Fix blank panel at reopening lid

fixed a regression in 3.6-rc kernels for which we've never figured out
the exact root cause. But some further inspection of the backlight
code reveals that it's seriously lacking locking. And especially the
asle backlight update is know to get fired (through some smm magic)
when writing specific backlight control registers. So the possibility
of suffering from races is rather real.

Until those races are fixed I don't think it makes sense to try
further hacks. Which sucks a bit, but sometimes that's how it is :(

Tested-by: Takashi Iwai <>
Cc: Jani Nikula <>
Cc: Takashi Iwai <>
Signed-off-by: Daniel Vetter <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrm/i915: bounds check execbuffer relocation count
Kees Cook [Tue, 12 Mar 2013 00:31:45 +0000 (17:31 -0700)]
drm/i915: bounds check execbuffer relocation count

commit 3118a4f652c7b12c752f3222af0447008f9b2368 upstream.

It is possible to wrap the counter used to allocate the buffer for
relocation copies. This could lead to heap writing overflows.


v3: collapse test, improve comment
v2: move check into validate_exec_list

Signed-off-by: Kees Cook <>
Reported-by: Pinkie Pie
Reviewed-by: Chris Wilson <>
Signed-off-by: Daniel Vetter <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agomwifiex: fix potential out-of-boundary access to ibss rate table
Bing Zhao [Fri, 8 Mar 2013 04:00:16 +0000 (20:00 -0800)]
mwifiex: fix potential out-of-boundary access to ibss rate table

commit 5f0fabf84d7b52f979dcbafa3d3c530c60d9a92c upstream.

smatch found this error:

CHECK   drivers/net/wireless/mwifiex/join.c
  error: testing array offset 'i' after use.

Signed-off-by: Bing Zhao <>
Signed-off-by: John W. Linville <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agortlwifi: rtl8192cu: Fix problem that prevents reassociation
Larry Finger [Wed, 13 Mar 2013 15:28:13 +0000 (10:28 -0500)]
rtlwifi: rtl8192cu: Fix problem that prevents reassociation

commit 9437a248e7cac427c898bdb11bd1ac6844a1ead4 upstream.

The driver was failing to clear the BSSID when a disconnect happened. That
prevented a reconnection. This problem is reported at,,, and

Thanks to Jussi Kivilinna for making the critical observation
that led to the solution.

Reported-by: Jussi Kivilinna <>
Tested-by: Jussi Kivilinna <>
Tested-by: Alessandro Lannocca <>
Signed-off-by: Larry Finger <>
Signed-off-by: John W. Linville <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agortlwifi: rtl8192cu: Fix schedule while atomic bug splat
Larry Finger [Wed, 27 Feb 2013 20:10:30 +0000 (14:10 -0600)]
rtlwifi: rtl8192cu: Fix schedule while atomic bug splat

commit 664899786cb49cb52f620e06ac19c0be524a7cfa upstream.

When run at debug 3 or higher, rtl8192cu reports a BUG as follows:

BUG: scheduling while atomic: kworker/u:0/5281/0x00000002
INFO: lockdep is turned off.
Modules linked in: rtl8192cu rtl8192c_common rtlwifi fuse af_packet bnep bluetooth b43 mac80211 cfg80211 ipv6 snd_hda_codec_conexant kvm_amd k
vm snd_hda_intel snd_hda_codec bcma rng_core snd_pcm ssb mmc_core snd_seq snd_timer snd_seq_device snd i2c_nforce2 sr_mod pcmcia forcedeth i2c_core soundcore
 cdrom sg serio_raw k8temp hwmon joydev ac battery pcmcia_core snd_page_alloc video button wmi autofs4 ext4 mbcache jbd2 crc16 thermal processor scsi_dh_alua
 scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh ata_generic pata_acpi pata_amd [last unloaded: rtlwifi]
Pid: 5281, comm: kworker/u:0 Tainted: G        W    3.8.0-wl+ #119
Call Trace:
 [<ffffffff814531e7>] __schedule_bug+0x62/0x70
 [<ffffffff81459af0>] __schedule+0x730/0xa30
 [<ffffffff81326e49>] ? usb_hcd_link_urb_to_ep+0x19/0xa0
 [<ffffffff8145a0d4>] schedule+0x24/0x70
 [<ffffffff814575ec>] schedule_timeout+0x18c/0x2f0
 [<ffffffff81459ec0>] ? wait_for_common+0x40/0x180
 [<ffffffff8133f461>] ? ehci_urb_enqueue+0xf1/0xee0
 [<ffffffff810a579d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81459f65>] wait_for_common+0xe5/0x180
 [<ffffffff8107d1c0>] ? try_to_wake_up+0x2d0/0x2d0
 [<ffffffff8145a08e>] wait_for_completion_timeout+0xe/0x10
 [<ffffffff8132ab1c>] usb_start_wait_urb+0x8c/0x100
 [<ffffffff8132adf9>] usb_control_msg+0xd9/0x130
 [<ffffffffa057dd8d>] _usb_read_sync+0xcd/0x140 [rtlwifi]
 [<ffffffffa057de0e>] _usb_read32_sync+0xe/0x10 [rtlwifi]
 [<ffffffffa04b0555>] rtl92cu_update_hal_rate_table+0x1a5/0x1f0 [rtl8192cu]

The cause is a synchronous read from routine rtl92cu_update_hal_rate_table().
The resulting output is not critical, thus the debug statement is

Reported-by: Jussi Kivilinna <>
Signed-off-by: Larry Finger <>
Signed-off-by: John W. Linville <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotracing: Keep overwrite in sync between regular and snapshot buffers
Steven Rostedt (Red Hat) [Thu, 14 Mar 2013 18:20:54 +0000 (14:20 -0400)]
tracing: Keep overwrite in sync between regular and snapshot buffers

commit 80902822658aab18330569587cdb69ac1dfdcea8 upstream.

Changing the overwrite mode for the ring buffer via the trace
option only sets the normal buffer. But the snapshot buffer could
swap with it, and then the snapshot would be in non overwrite mode
and the normal buffer would be in overwrite mode, even though the
option flag states otherwise.

Keep the two buffers overwrite modes in sync.

Signed-off-by: Steven Rostedt <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotracing: Protect tracer flags with trace_types_lock
Steven Rostedt (Red Hat) [Thu, 14 Mar 2013 17:50:56 +0000 (13:50 -0400)]
tracing: Protect tracer flags with trace_types_lock

commit 69d34da2984c95b33ea21518227e1f9470f11d95 upstream.

Seems that the tracer flags have never been protected from
synchronous writes. Luckily, admins don't usually modify the
tracing flags via two different tasks. But if scripts were to
be used to modify them, then they could get corrupted.

Move the trace_types_lock that protects against tracers changing
to also protect the flags being set.

Signed-off-by: Steven Rostedt <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotracing: Fix free of probe entry by calling call_rcu_sched()
Steven Rostedt (Red Hat) [Wed, 13 Mar 2013 15:15:19 +0000 (11:15 -0400)]
tracing: Fix free of probe entry by calling call_rcu_sched()

commit 740466bc89ad8bd5afcc8de220f715f62b21e365 upstream.

Because function tracing is very invasive, and can even trace
calls to rcu_read_lock(), RCU access in function tracing is done
with preempt_disable_notrace(). This requires a synchronize_sched()
for updates and not a synchronize_rcu().

Function probes (traceon, traceoff, etc) must be freed after
a synchronize_sched() after its entry has been removed from the
hash. But call_rcu() is used. Fix this by using call_rcu_sched().

Also fix the usage to use hlist_del_rcu() instead of hlist_del().

Signed-off-by: Steven Rostedt <>
Cc: Paul McKenney <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotracing: Fix race in snapshot swapping
Steven Rostedt (Red Hat) [Tue, 12 Mar 2013 15:32:32 +0000 (11:32 -0400)]
tracing: Fix race in snapshot swapping

commit 2721e72dd10f71a3ba90f59781becf02638aa0d9 upstream.

Although the swap is wrapped with a spin_lock, the assignment
of the temp buffer used to swap is not within that lock.
It needs to be moved into that lock, otherwise two swaps
happening on two different CPUs, can end up using the wrong
temp buffer to assign in the swap.

Luckily, all current callers of the swap function appear to have
their own locks. But in case something is added that allows two
different callers to call the swap, then there's a chance that
this race can trigger and corrupt the buffers.

New code is coming soon that will allow for this race to trigger.

I've Cc'd stable, so this bug will not show up if someone backports
one of the changes that can trigger this bug.

Signed-off-by: Steven Rostedt <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrm/i915: restrict kernel address leak in debugfs
Kees Cook [Mon, 11 Mar 2013 19:25:19 +0000 (12:25 -0700)]
drm/i915: restrict kernel address leak in debugfs

commit 2563a4524febe8f4a98e717e02436d1aaf672aa2 upstream.

Masks kernel address info-leak in object dumps with the %pK suffix,
so they cannot be used to target kernel memory corruption attacks if
the kptr_restrict sysctl is set.

Signed-off-by: Kees Cook <>
Signed-off-by: Daniel Vetter <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoRevert "drm/i915: try to train DP even harder"
Takashi Iwai [Mon, 11 Mar 2013 17:40:16 +0000 (18:40 +0100)]
Revert "drm/i915: try to train DP even harder"

commit 3b4f819d5eac94ba8fe5e8c061f6dabfe8d7b22c upstream.

This reverts commit 0d71068835e2610576d369d6d4cbf90e0f802a71.

Not only that the commit introduces a bogus check (voltage_tries == 5
will never meet at the inserted code path), it brings the i915 driver
into an endless dp-train loop on HP Z1 desktop machine with IVY+eDP.

At least reverting this commit recovers the framebuffer (but X is
still broken by other reasons...)

Signed-off-by: Takashi Iwai <>
Signed-off-by: Daniel Vetter <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agovfs,proc: guarantee unique inodes in /proc
Linus Torvalds [Fri, 22 Mar 2013 18:44:04 +0000 (11:44 -0700)]
vfs,proc: guarantee unique inodes in /proc

commit 51f0885e5415b4cc6535e9cdcc5145bfbc134353 upstream.

Dave Jones found another /proc issue with his Trinity tool: thanks to
the namespace model, we can have multiple /proc dentries that point to
the same inode, aliasing directories in /proc/<pid>/net/ for example.

This ends up being a total disaster, because it acts like hardlinked
directories, and causes locking problems.  We rely on the topological
sort of the inodes pointed to by dentries, and if we have aliased
directories, that odering becomes unreliable.

In short: don't do this.  Multiple dentries with the same (directory)
inode is just a bad idea, and the namespace code should never have
exposed things this way.  But we're kind of stuck with it.

This solves things by just always allocating a new inode during /proc
dentry lookup, instead of using "iget_locked()" to look up existing
inodes by superblock and number.  That actually simplies the code a bit,
at the cost of potentially doing more inode [de]allocations.

That said, the inode lookup wasn't free either (and did a lot of locking
of inodes), so it is probably not that noticeable.  We could easily keep
the old lookup model for non-directory entries, but rather than try to
be excessively clever this just implements the minimal and simplest
workaround for the problem.

Reported-and-tested-by: Dave Jones <>
Analyzed-by: Al Viro <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosaner proc_get_inode() calling conventions
Al Viro [Sat, 26 Jan 2013 01:11:22 +0000 (20:11 -0500)]
saner proc_get_inode() calling conventions

commit d3d009cb965eae7e002ea5badf603ea8f4c34915 upstream.

Make it drop the pde in *all* cases when no new reference to it is
put into an inode - both when an inode had already been set up
(as we were already doing) and when inode allocation has failed.
Makes for simpler logics in callers...

Signed-off-by: Al Viro <>
Cc: Linus Torvalds <>
Cc: Dave Jones <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoALSA: snd-usb: mixer: ignore -EINVAL in snd_usb_mixer_controls()
Daniel Mack [Tue, 19 Mar 2013 20:09:25 +0000 (21:09 +0100)]
ALSA: snd-usb: mixer: ignore -EINVAL in snd_usb_mixer_controls()

commit 83ea5d18d74f032a760fecde78c0210f66f7f70c upstream.

Creation of individual mixer controls may fail, but that shouldn't cause
the entire mixer creation to fail. Even worse, if the mixer creation
fails, that will error out the entire device probing.

All the functions called by parse_audio_unit() should return -EINVAL if
they find descriptors that are unsupported or believed to be malformed,
so we can safely handle this error code as a non-fatal condition in

That fixes a long standing bug which is commonly worked around by
adding quirks which make the driver ignore entire interfaces. Some of
them might now be unnecessary.

Signed-off-by: Daniel Mack <>
Reported-and-tested-by: Rodolfo Thomazelli <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoALSA: snd-usb: mixer: propagate errors up the call chain
Daniel Mack [Tue, 19 Mar 2013 20:09:24 +0000 (21:09 +0100)]
ALSA: snd-usb: mixer: propagate errors up the call chain

commit 4d7b86c98e445b075c2c4c3757eb6d3d6efbe72e upstream.

In check_input_term() and parse_audio_feature_unit(), propagate the
error value that has been returned by a failing function instead of
-EINVAL. That helps cleaning up the error pathes in the mixer.

Signed-off-by: Daniel Mack <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoALSA: usb: Parse UAC2 extension unit like for UAC1
Torstein Hegge [Tue, 19 Mar 2013 16:12:14 +0000 (17:12 +0100)]
ALSA: usb: Parse UAC2 extension unit like for UAC1

commit 61ac51301e6c6d4ed977d7674ce2b8e713619a9b upstream.

UAC2_EXTENSION_UNIT_V2 differs from UAC1_EXTENSION_UNIT, but can be handled in
the same way when parsing the unit. Otherwise parse_audio_unit() fails when it
sees an extension unit on a UAC2 device.

UAC2_EXTENSION_UNIT_V2 is outside the range allocated by UAC1.

Signed-off-by: Torstein Hegge <>
Acked-by: Daniel Mack <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoALSA: hda - Fix typo in checking IEC958 emphasis bit
Takashi Iwai [Wed, 20 Mar 2013 14:42:00 +0000 (15:42 +0100)]
ALSA: hda - Fix typo in checking IEC958 emphasis bit

commit a686fd141e20244ad75f80ad54706da07d7bb90a upstream.

There is a typo in convert_to_spdif_status() about checking the
emphasis IEC958 status bit.  It should check the given value instead
of the resultant value.

Reported-by: Martin Weishart <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoALSA: hda/cirrus - Fix the digital beep registration
Takashi Iwai [Mon, 18 Mar 2013 10:00:44 +0000 (11:00 +0100)]
ALSA: hda/cirrus - Fix the digital beep registration

commit a86b1a2cd2f81f74e815e07f756edd7bc5b6f034 upstream.

The argument passed to snd_hda_attach_beep_device() is a widget NID
while spec->beep_amp holds the composed value for amp controls.

Signed-off-by: Takashi Iwai <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosfc: Only use TX push if a single descriptor is to be written
Ben Hutchings [Wed, 27 Feb 2013 16:50:38 +0000 (16:50 +0000)]
sfc: Only use TX push if a single descriptor is to be written

[ Upstream commit fae8563b25f73dc584a07bcda7a82750ff4f7672 ]

Using TX push when notifying the NIC of multiple new descriptors in
the ring will very occasionally cause the TX DMA engine to re-use an
old descriptor.  This can result in a duplicated or partly duplicated
packet (new headers with old data), or an IOMMU page fault.  This does
not happen when the pushed descriptor is the only one written.

TX push also provides little latency benefit when a packet requires
more than one descriptor.

Signed-off-by: Ben Hutchings <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosfc: Disable soft interrupt handling during efx_device_detach_sync()
Ben Hutchings [Tue, 5 Mar 2013 01:03:47 +0000 (01:03 +0000)]
sfc: Disable soft interrupt handling during efx_device_detach_sync()

[ Upstream commit 35205b211c8d17a8a0b5e8926cb7c73e9a7ef1ad ]

efx_device_detach_sync() locks all TX queues before marking the device
detached and thus disabling further TX scheduling.  But it can still
be interrupted by TX completions which then result in TX scheduling in
soft interrupt context.  This will deadlock when it tries to acquire
a TX queue lock that efx_device_detach_sync() already acquired.

To avoid deadlock, we must use netif_tx_{,un}lock_bh().

Signed-off-by: Ben Hutchings <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosfc: Detach net device when stopping queues for reconfiguration
Ben Hutchings [Mon, 28 Jan 2013 19:01:06 +0000 (19:01 +0000)]
sfc: Detach net device when stopping queues for reconfiguration

[ Upstream commit 29c69a4882641285a854d6d03ca5adbba68c0034 ]

We must only ever stop TX queues when they are full or the net device
is not 'ready' so far as the net core, and specifically the watchdog,
is concerned.  Otherwise, the watchdog may fire *immediately* if no
packets have been added to the queue in the last 5 seconds.

The device is ready if all the following are true:

(a) It has a qdisc
(b) It is marked present
(c) It is running
(d) The link is reported up

(a) and (c) are normally true, and must not be changed by a driver.
(d) is under our control, but fake link changes may disturb userland.
This leaves (b).  We already mark the device absent during reset
and self-test, but we need to do the same during MTU changes and ring
reallocation.  We don't need to do this when the device is brought
down because then (c) is already false.

Signed-off-by: Ben Hutchings <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosfc: Fix efx_rx_buf_offset() in the presence of swiotlb
Ben Hutchings [Thu, 10 Jan 2013 23:51:54 +0000 (23:51 +0000)]
sfc: Fix efx_rx_buf_offset() in the presence of swiotlb

[ Upstream commits b590ace09d51cd39744e0f7662c5e4a0d1b5d952 and
  c73e787a8db9117d59b5180baf83203a42ecadca ]

We assume that the mapping between DMA and virtual addresses is done
on whole pages, so we can find the page offset of an RX buffer using
the lower bits of the DMA address.  However, swiotlb maps in units of
2K, breaking this assumption.

Add an explicit page_offset field to struct efx_rx_buffer.

Signed-off-by: Ben Hutchings <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosfc: Properly sync RX DMA buffer when it is not the last in the page
Ben Hutchings [Thu, 20 Dec 2012 18:48:20 +0000 (18:48 +0000)]
sfc: Properly sync RX DMA buffer when it is not the last in the page

[ Upstream commit 3a68f19d7afb80f548d016effbc6ed52643a8085 ]

We may currently allocate two RX DMA buffers to a page, and only unmap
the page when the second is completed.  We do not sync the first RX
buffer to be completed; this can result in packet loss or corruption
if the last RX buffer completed in a NAPI poll is the first in a page
and is not DMA-coherent.  (In the middle of a NAPI poll, we will
handle the following RX completion and unmap the page *before* looking
at the content of the first buffer.)

Signed-off-by: Ben Hutchings <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodrivers/net/ethernet/sfc/ptp.c: adjust duplicate test
Julia Lawall [Mon, 21 Jan 2013 03:02:48 +0000 (03:02 +0000)]
drivers/net/ethernet/sfc/ptp.c: adjust duplicate test

[ Upstream commit 56567c6f8751c633581ca7c8e1cf08eed503f5ea ]

Delete successive tests to the same location.  rc was previously tested and
not subsequently updated.  efx_phc_adjtime can return an error code, so the
call is updated so that is tested instead.

A simplified version of the semantic match that finds this problem is as
follows: (

// <smpl>
@s exists@
local idexpression y;
expression x,e;

*if ( \(x == NULL\|IS_ERR(x)\|y != 0\) )
 { ... when forall
   return ...; }
... when != \(y = e\|y += e\|y -= e\|y |= e\|y &= e\|y++\|y--\|&y\)
    when != \(XT_GETPAGE(...,y)\|WMI_CMD_BUF(...)\)
*if ( \(x == NULL\|IS_ERR(x)\|y != 0\) )
 { ... when forall
   return ...; }
// </smpl>

Signed-off-by: Julia Lawall <>
Acked-by: Ben Hutchings <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoinet: limit length of fragment queue hash table bucket lists
Hannes Frederic Sowa [Fri, 15 Mar 2013 11:32:30 +0000 (11:32 +0000)]
inet: limit length of fragment queue hash table bucket lists

[ Upstream commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055 ]

This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.

If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.

I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.

Cc: Eric Dumazet <>
Cc: Jesper Dangaard Brouer <>
Signed-off-by: Hannes Frederic Sowa <>
Acked-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotcp: dont handle MTU reduction on LISTEN socket
Eric Dumazet [Mon, 18 Mar 2013 07:01:28 +0000 (07:01 +0000)]
tcp: dont handle MTU reduction on LISTEN socket

[ Upstream commit 0d4f0608619de59fd8169dd8e72aadc28d80e715 ]

When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a
LISTEN socket, and this socket is currently owned by the user, we
set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags.

This is bad because if we clone the parent before it had a chance to
clear the flag, the child inherits the tsq_flags value, and next
tcp_release_cb() on the child will decrement sk_refcnt.

Result is that we might free a live TCP socket, as reported by

IPv4: Attempt to release TCP socket in state 1

Fix this issue by testing sk_state against TCP_LISTEN early, so that we
set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one)

This bug was introduced in commit 563d34d05786
(tcp: dont drop MTU reduction indications)

Reported-by: dormando <>
Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agobnx2x: fix occasional statistics off-by-4GB error
Maciej Żenczykowski [Fri, 15 Mar 2013 11:56:17 +0000 (11:56 +0000)]
bnx2x: fix occasional statistics off-by-4GB error

[ Upstream commit b009aac12cd0fe34293c68af8ac48b85be3bd858 ]

The UPDATE_QSTAT function introduced on February 15, 2012
in commit 1355b704b9ba "bnx2x: consistent statistics after
internal driver reload" incorrectly fails to handle overflow
during addition of the lower 32-bit field of a stat.

This bug is present since 3.4-rc1 and should thus be considered
a candidate for stable 3.4+ releases.

Google-Bug-Id: 8374428
Signed-off-by: Maciej Żenczykowski <>
Cc: Mintz Yuval <>
Acked-by: Eilon Greenstein <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agovhost/net: fix heads usage of ubuf_info
Michael S. Tsirkin [Sun, 17 Mar 2013 02:46:09 +0000 (02:46 +0000)]
vhost/net: fix heads usage of ubuf_info

[ Upstream commit 46aa92d1ba162b4b3d6b7102440e459d4e4ee255 ]

ubuf info allocator uses guest controlled head as an index,
so a malicious guest could put the same head entry in the ring twice,
and we will get two callbacks on the same value.
To fix use upend_idx which is guaranteed to be unique.

Reported-by: Rusty Russell <>
Signed-off-by: Michael S. Tsirkin <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agobnx2x: add missing napi deletion in error path
Michal Schmidt [Fri, 15 Mar 2013 05:27:54 +0000 (05:27 +0000)]
bnx2x: add missing napi deletion in error path

[ Upstream commit 722c6f585088a2c392b4c5d01b87a584bb8fb73f ]

If the hardware initialization fails in bnx2x_nic_load() after adding
napi objects, they would not be deleted. A subsequent attempt to unload
the bnx2x module detects a corruption in the napi list.

Add the missing napi deletion to the error path.

Signed-off-by: Michal Schmidt <>
Acked-by: Dmitry Kravkov <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agonet: cdc_ncm, cdc_mbim: allow user to prefer NCM for backwards compatibility
Bjørn Mork [Thu, 14 Mar 2013 01:05:13 +0000 (01:05 +0000)]
net: cdc_ncm, cdc_mbim: allow user to prefer NCM for backwards compatibility

[ Upstream commit 1e8bbe6cd02fc300c88bd48244ce61ad9c7d1776 ]

commit bd329e1 ("net: cdc_ncm: do not bind to NCM compatible MBIM devices")
introduced a new policy, preferring MBIM for dual NCM/MBIM functions if
the cdc_mbim driver was enabled.  This caused a regression for users
wanting to use NCM.

Devices implementing NCM backwards compatibility according to section
3.2 of the MBIM v1.0 specification allow either NCM or MBIM on a single
USB function, using different altsettings.  The cdc_ncm and cdc_mbim
drivers will both probe such functions, and must agree on a common
policy for selecting either MBIM or NCM.  Until now, this policy has
been set at build time based on CONFIG_USB_NET_CDC_MBIM.

Use a module parameter to set the system policy at runtime, allowing the
user to prefer NCM on systems with the cdc_mbim driver.

Cc: Greg Suarez <>
Cc: Alexey Orishko <>
Reported-by: Geir Haatveit <>
Reported-by: Tommi Kyntola <>
Signed-off-by: Bjørn Mork <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agortnetlink: Mask the rta_type when range checking
Vlad Yasevich [Wed, 13 Mar 2013 04:18:58 +0000 (04:18 +0000)]
rtnetlink: Mask the rta_type when range checking

[ Upstream commit a5b8db91442fce9c9713fcd656c3698f1adde1d6 ]

Range/validity checks on rta_type in rtnetlink_rcv_msg() do
not account for flags that may be set.  This causes the function
to return -EINVAL when flags are set on the type (for example

Signed-off-by: Vlad Yasevich <>
Acked-by: Thomas Graf <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoRevert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"
Timo Teräs [Wed, 13 Mar 2013 02:37:49 +0000 (02:37 +0000)]
Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"

[ Upstream commit 8c6216d7f118a128678270824b6a1286a63863ca ]

This reverts commit 412ed94744d16806fbec3bd250fd94e71cde5a1f.

The commit is wrong as tiph points to the outer IPv4 header which is
installed at ipgre_header() and not the inner one which is protocol dependant.

This commit broke succesfully opennhrp which use PF_PACKET socket with
ETH_P_NHRP protocol. Additionally ssl_addr is set to the link-layer
IPv4 address. This address is written by ipgre_header() to the skb
earlier, and this is the IPv4 header tiph should point to - regardless
of the inner protocol payload.

Signed-off-by: Timo Teräs <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoskb: Propagate pfmemalloc on skb from head page only
Pavel Emelyanov [Thu, 14 Mar 2013 03:29:40 +0000 (03:29 +0000)]
skb: Propagate pfmemalloc on skb from head page only

[ Upstream commit cca7af3889bfa343d33d5e657a38d876abd10e58 ]


I'm trying to send big chunks of memory from application address space via
TCP socket using vmsplice + splice like this

   mem = mmap(128Mb);
   vmsplice(pipe[1], mem); /* splice memory into pipe */
   splice(pipe[0], tcp_socket); /* send it into network */

When I'm lucky and a huge page splices into the pipe and then into the socket
_and_ client and server ends of the TCP connection are on the same host,
communicating via lo, the whole connection gets stuck! The sending queue
becomes full and app stops writing/splicing more into it, but the receiving
queue remains empty, and that's why.

The __skb_fill_page_desc observes a tail page of a huge page and erroneously
propagates its page->pfmemalloc value onto socket (the pfmemalloc on tail pages
contain garbage). Then this skb->pfmemalloc leaks through lo and due to the

        if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC)) /* true */
            return -ENOMEM
        goto release_and_discard;

no packets reach the socket. Even TCP re-transmits are dropped by this, as skb
cloning clones the pfmemalloc flag as well.

That said, here's the proper page->pfmemalloc propagation onto socket: we
must check the huge-page's head page only, other pages' pfmemalloc and mapping
values do not contain what is expected in this place. However, I'm not sure
whether this fix is _complete_, since pfmemalloc propagation via lo also
oesn't look great.

Both, bit propagation from page to skb and this check in sk_filter, were
introduced by c48a11c7 (netvm: propagate page->pfmemalloc to skb), in v3.5 so
Mel and stable@ are in Cc.

Signed-off-by: Pavel Emelyanov <>
Acked-by: Eric Dumazet <>
Acked-by: Mel Gorman <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotcp: fix skb_availroom()
Eric Dumazet [Thu, 14 Mar 2013 05:40:32 +0000 (05:40 +0000)]
tcp: fix skb_availroom()

[ Upstream commit 16fad69cfe4adbbfa813de516757b87bcae36d93 ]

Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack :

commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx
path) did a poor choice adding an 'avail_size' field to skb, while
what we really needed was a 'reserved_tailroom' one.

It would have avoided commit 22b4a4f22da (tcp: fix retransmit of
partially acked frames) and this commit.

Crash occurs because skb_split() is not aware of the 'avail_size'
management (and should not be aware)

Signed-off-by: Eric Dumazet <>
Reported-by: Mukesh Agrawal <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agonet: qmi_wwan: set correct altsetting for Gobi 1K devices
Bjørn Mork [Wed, 13 Mar 2013 02:25:17 +0000 (02:25 +0000)]
net: qmi_wwan: set correct altsetting for Gobi 1K devices

[ Upstream commit b701f16dd490d3f346724050f17d60beda094998 ]

commit bd877e4 ("net: qmi_wwan: use a single bind function for
all device types") made Gobi 1K devices fail probing.

Using the number of endpoints in the default altsetting to decide
whether the function use one or two interfaces is wrong.  Other
altsettings may provide more endpoints.

With Gobi 1K devices, USB interface #3's altsetting is 0 by default, but
altsetting 0 only provides one interrupt endpoint and is not sufficent
for QMI.  Altsetting 1 provides all 3 endpoints required for qmi_wwan
and works with QMI. Gobi 1K layout for intf#3 is:

    Interface Descriptor:  255/255/255
      bInterfaceNumber        3
      bAlternateSetting       0
      Endpoint Descriptor:  Interrupt IN
    Interface Descriptor:  255/255/255
      bInterfaceNumber        3
      bAlternateSetting       1
      Endpoint Descriptor:  Interrupt IN
      Endpoint Descriptor:  Bulk IN
      Endpoint Descriptor:  Bulk OUT

Prior to commit bd877e4, we would call usbnet_get_endpoints
before giving up finding enough endpoints. Removing the early
endpoint number test and the strict functional descriptor
requirement allow qmi_wwan_bind to continue until
usbnet_get_endpoints has made the final attempt to collect
endpoints.  This restores the behaviour from before commit
bd877e4 without losing the added benefit of using a single bind

The driver has always required a CDC Union functional descriptor
for two-interface functions. Using the existence of this
descriptor to detect two-interface functions is the logically
correct method.

Reported-by: Dan Williams <>
Signed-off-by: Bjørn Mork <>
Tested-by: Dan Williams <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoipv4: fix definition of FIB_TABLE_HASHSZ
Denis V. Lunev [Wed, 13 Mar 2013 00:24:15 +0000 (00:24 +0000)]
ipv4: fix definition of FIB_TABLE_HASHSZ

[ Upstream commit 5b9e12dbf92b441b37136ea71dac59f05f2673a9 ]

a long time ago by the commit

  commit 93456b6d7753def8760b423ac6b986eb9d5a4a95
  Author: Denis V. Lunev <>
  Date:   Thu Jan 10 03:23:38 2008 -0800

    [IPV4]: Unify access to the routing tables.

the defenition of FIB_HASH_TABLE size has obtained wrong dependency:
it should depend upon CONFIG_IP_MULTIPLE_TABLES (as was in the original
code) but it was depended from CONFIG_IP_ROUTE_MULTIPATH

This patch returns the situation to the original state.

The problem was spotted by Tingwei Liu.

Signed-off-by: Denis V. Lunev <>
CC: Tingwei Liu <>
CC: Alexey Kuznetsov <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosctp: don't break the loop while meeting the active_path so as to find the matched...
Xufeng Zhang [Thu, 7 Mar 2013 21:39:37 +0000 (21:39 +0000)]
sctp: don't break the loop while meeting the active_path so as to find the matched transport

[ Upstream commit 2317f449af30073cfa6ec8352e4a65a89e357bdd ]

sctp_assoc_lookup_tsn() function searchs which transport a certain TSN
was sent on, if not found in the active_path transport, then go search
all the other transports in the peer's transport_addr_list, however, we
should continue to the next entry rather than break the loop when meet
the active_path transport.

Signed-off-by: Xufeng Zhang <>
Acked-by: Neil Horman <>
Acked-by: Vlad Yasevich <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosctp: Use correct sideffect command in duplicate cookie handling
Vlad Yasevich [Tue, 12 Mar 2013 15:53:23 +0000 (15:53 +0000)]
sctp: Use correct sideffect command in duplicate cookie handling

[ Upstream commit f2815633504b442ca0b0605c16bf3d88a3a0fcea ]

When SCTP is done processing a duplicate cookie chunk, it tries
to delete a newly created association.  For that, it has to set
the right association for the side-effect processing to work.
However, when it uses the SCTP_CMD_NEW_ASOC command, that performs
more work then really needed (like hashing the associationa and
assigning it an id) and there is no point to do that only to
delete the association as a next step.  In fact, it also creates
an impossible condition where an association may be found by
the getsockopt() call, and that association is empty.  This
causes a crash in some sctp getsockopts.

The solution is rather simple.  We simply use SCTP_CMD_SET_ASOC
command that doesn't have all the overhead and does exactly
what we need.

Reported-by: Karl Heiss <>
Tested-by: Karl Heiss <>
CC: Neil Horman <>
Signed-off-by: Vlad Yasevich <>
Acked-by: Neil Horman <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agotg3: 5715 does not link up when autoneg off
Nithin Sujir [Tue, 12 Mar 2013 15:32:48 +0000 (15:32 +0000)]
tg3: 5715 does not link up when autoneg off

[ Upstream commit 7c6cdead7cc9a99650d15497aae47d7472217eb1 ]

Commit d13ba512cbba7de5d55d7a3b2aae7d83c8921457 ("tg3: Remove
SPEED_UNKNOWN checks") cleaned up the autoneg advertisement by
removing some dead code. One effect of this change was that the
advertisement register would not be updated if autoneg is turned off.

This exposed a bug on the 5715 device w.r.t linking. The 5715 defaults
to advertise only 10Mb Full duplex. But with autoneg disabled, it needs
the configured speed enabled in the advertisement register to link up.

This patch adds the work around to advertise all speeds on the 5715 when
autoneg is disabled.

Reported-by: Marcin Miotk <>
Reviewed-by: Benjamin Li <>
Signed-off-by: Nithin Nayak Sujir <>
Signed-off-by: Michael Chan <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agobonding: don't call update_speed_duplex() under spinlocks
Veaceslav Falico [Tue, 12 Mar 2013 06:31:32 +0000 (06:31 +0000)]
bonding: don't call update_speed_duplex() under spinlocks

[ Upstream commit 876254ae2758d50dcb08c7bd00caf6a806571178 ]

bond_update_speed_duplex() might sleep while calling underlying slave's
routines. Move it out of atomic context in bond_enslave() and remove it
from bond_miimon_commit() - it was introduced by commit 546add79, however
when the slave interfaces go up/change state it's their responsibility to
fire NETDEV_UP/NETDEV_CHANGE events so that bonding can properly update
their speed.

I've tested it on all combinations of ifup/ifdown, autoneg/speed/duplex
changes, remote-controlled and local, on (not) MII-based cards. All changes
are visible.

Signed-off-by: Veaceslav Falico <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agobatman-adv: verify tt len does not exceed packet len
Marek Lindner [Mon, 4 Mar 2013 02:39:49 +0000 (10:39 +0800)]
batman-adv: verify tt len does not exceed packet len

[ Upstream commit b47506d91259c29b9c75c404737eb6525556f9b4 ]

batadv_iv_ogm_process() accesses the packet using the tt_num_changes
attribute regardless of the real packet len (assuming the length check
was done before). Therefore a length check is needed to avoid reading
random memory.

Signed-off-by: Marek Lindner <>
Signed-off-by: Antonio Quartulli <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agonetconsole: don't call __netpoll_cleanup() while atomic
Veaceslav Falico [Mon, 11 Mar 2013 00:21:48 +0000 (00:21 +0000)]
netconsole: don't call __netpoll_cleanup() while atomic

[ Upstream commit 3f315bef23075ea8a98a6fe4221a83b83456d970 ]

__netpoll_cleanup() is called in netconsole_netdev_event() while holding a
spinlock. Release/acquire the spinlock before/after it and restart the
loop. Also, disable the netconsole completely, because we won't have chance
after the restart of the loop, and might end up in a situation where
nt->enabled == 1 and nt-> == NULL.

Signed-off-by: Veaceslav Falico <>
Acked-by: Neil Horman <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agobridge: reserve space for IFLA_BRPORT_FAST_LEAVE
stephen hemminger [Mon, 11 Mar 2013 13:52:17 +0000 (13:52 +0000)]
bridge: reserve space for IFLA_BRPORT_FAST_LEAVE

[ Upstream commit 3da889b616164bde76a37350cf28e0d17a94e979 ]

The bridge multicast fast leave feature was added sufficient space
was not reserved in the netlink message. This means the flag may be
lost in netlink events and results of queries.

Found by observation while looking up some netlink stuff for discussion with Vlad.
Problem introduced by commit c2d3babfafbb9f6629cfb47139758e59a5eb0d80
Author: David S. Miller <>
Date:   Wed Dec 5 16:24:45 2012 -0500

    bridge: implement multicast fast leave

Signed-off-by: Stephen Hemminger <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agonet/ipv4: Ensure that location of timestamp option is stored
David Ward [Mon, 11 Mar 2013 10:43:39 +0000 (10:43 +0000)]
net/ipv4: Ensure that location of timestamp option is stored

[ Upstream commit 4660c7f498c07c43173142ea95145e9dac5a6d14 ]

This is needed in order to detect if the timestamp option appears
more than once in a packet, to remove the option if the packet is
fragmented, etc. My previous change neglected to store the option
location when the router addresses were prespecified and Pointer >
Length. But now the option location is also stored when Flag is an
unrecognized value, to ensure these option handling behaviors are
still performed.

Signed-off-by: David Ward <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agosunsu: Fix panic in case of nonexistent port at "console=ttySY" cmdline option
Tkhai Kirill [Sat, 23 Feb 2013 23:01:15 +0000 (23:01 +0000)]
sunsu: Fix panic in case of nonexistent port at "console=ttySY" cmdline option

[ Upstream commit cb29529ea0030e60ef1bbbf8399a43d397a51526 ]

If a machine has X (X < 4) sunsu ports and cmdline
option "console=ttySY" is passed, where X < Y <= 4,
than the following panic happens:

Unable to handle kernel NULL pointer dereference
TPC: <sunsu_console_setup+0x78/0xe0>
RPC: <sunsu_console_setup+0x74/0xe0>
I7: <register_console+0x378/0x3e0>
Call Trace:
 [0000000000453a38] register_console+0x378/0x3e0
 [0000000000576fa0] uart_add_one_port+0x2e0/0x340
 [000000000057af40] su_probe+0x160/0x2e0
 [00000000005b8a4c] platform_drv_probe+0xc/0x20
 [00000000005b6c2c] driver_probe_device+0x12c/0x220
 [00000000005b6da8] __driver_attach+0x88/0xa0
 [00000000005b4df4] bus_for_each_dev+0x54/0xa0
 [00000000005b5a54] bus_add_driver+0x154/0x260
 [00000000005b7190] driver_register+0x50/0x180
 [00000000006d250c] sunsu_init+0x18c/0x1e0
 [00000000006c2668] do_one_initcall+0xe8/0x160
 [00000000006c282c] kernel_init_freeable+0x12c/0x1e0
 [0000000000603764] kernel_init+0x4/0x100
 [0000000000405f64] ret_from_syscall+0x1c/0x2c
 [0000000000000000]           (null)

1)Fix the panic;
2)Increment registered port number every successful

Signed-off-by: Kirill Tkhai <>
CC: David Miller <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoUSB: EHCI: work around silicon bug in Intel's EHCI controllers
Alan Stern [Fri, 1 Mar 2013 15:50:08 +0000 (10:50 -0500)]
USB: EHCI: work around silicon bug in Intel's EHCI controllers

commit 6402c796d3b4205d3d7296157956c5100a05d7d6 upstream.

This patch (as1660) works around a hardware problem present in some
(if not all) Intel EHCI controllers.  After a QH has been unlinked
from the async schedule and the corresponding IAA interrupt has
occurred, the controller is not supposed access the QH and its qTDs.
There certainly shouldn't be any more DMA writes to those structures.
Nevertheless, Intel's controllers have been observed to perform a
final writeback to the QH's overlay region and to the most recent qTD.
For more information and a test program to determine whether this
problem is present in a particular controller, see

This patch works around the problem by always waiting for two IAA
cycles when unlinking an async QH.  The extra IAA delay gives the
controller time to perform its final writeback.

Surprisingly enough, the effects of this silicon bug have gone
undetected until quite recently.  More through luck than anything
else, it hasn't caused any apparent problems.  However, it does
interact badly with the path that follows this one, so it needs to be

This is the first part of a fix for the regression reported at:

Signed-off-by: Alan Stern <>
Tested-by: Stephen Thirlwall <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoLinux 3.8.4
Greg Kroah-Hartman [Wed, 20 Mar 2013 20:11:19 +0000 (13:11 -0700)]
Linux 3.8.4

5 years agoRevert "drm/i915: reorder setup sequence to have irqs for output setup"
Greg Kroah-Hartman [Mon, 18 Mar 2013 20:16:44 +0000 (13:16 -0700)]
Revert "drm/i915: reorder setup sequence to have irqs for output setup"

Revert commit 2a9810441fcc26cf3f006f015f8a62094fe57a90 which is
commit 52d7ecedac3f96fb562cb482c139015372728638 upstream.

This caused problems in 3.8-stable, but all is fine in 3.9-rc.

Signed-off-by: Greg Kroah-Hartman <>
Cc: Imre Deak <>
Cc: Daniel Vetter <>
5 years agoRevert "drm/i915: enable irqs earlier when resuming"
Greg Kroah-Hartman [Mon, 18 Mar 2013 20:13:39 +0000 (13:13 -0700)]
Revert "drm/i915: enable irqs earlier when resuming"

This reverts commit 31f14f4219d2a74b7a6d86c7798f49141b5eccbe which was
commit 15239099d7a7a9ecdc1ccb5b187ae4cda5488ff9 upstream.

It caused problems in the 3.8-stable series, but 3.9-rc is just fine.

Signed-off-by: Greg Kroah-Hartman <>
Cc: Chris Wilson <>
Cc: Mika Kuoppala <>
Cc: Ilya Tumaykin <>
Cc: Daniel Vetter <>
5 years ago6lowpan: Fix endianness issue in is_addr_link_local().
YOSHIFUJI Hideaki [Sat, 9 Mar 2013 09:11:57 +0000 (09:11 +0000)]
6lowpan: Fix endianness issue in is_addr_link_local().

[ Upstream commit 9026c4927254f5bea695cc3ef2e255280e6a3011 ]

Signed-off-by: YOSHIFUJI Hideaki <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agodcbnl: fix various netlink info leaks
Mathias Krause [Sat, 9 Mar 2013 05:52:21 +0000 (05:52 +0000)]
dcbnl: fix various netlink info leaks

[ Upstream commit 29cd8ae0e1a39e239a3a7b67da1986add1199fc0 ]

The dcb netlink interface leaks stack memory in various places:
* perm_addr[] buffer is only filled at max with 12 of the 32 bytes but
  copied completely,
* no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
  so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
  for ieee_pfc structs, etc.,
* the same is true for CEE -- no in-kernel driver fills the whole

Prevent all of the above stack info leaks by properly initializing the
buffers/structures involved.

Signed-off-by: Mathias Krause <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agortnl: fix info leak on RTM_GETLINK request for VF devices
Mathias Krause [Sat, 9 Mar 2013 05:52:20 +0000 (05:52 +0000)]
rtnl: fix info leak on RTM_GETLINK request for VF devices

[ Upstream commit 84d73cd3fb142bf1298a8c13fd4ca50fd2432372 ]

Initialize the mac address buffer with 0 as the driver specific function
will probably not fill the whole buffer. In fact, all in-kernel drivers
fill only ETH_ALEN of the MAX_ADDR_LEN bytes, i.e. 6 of the 32 possible
bytes. Therefore we currently leak 26 bytes of stack memory to userland
via the netlink interface.

Signed-off-by: Mathias Krause <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agobridge: fix mdb info leaks
Mathias Krause [Sat, 9 Mar 2013 05:52:19 +0000 (05:52 +0000)]
bridge: fix mdb info leaks

[ Upstream commit c085c49920b2f900ba716b4ca1c1a55ece9872cc ]

The bridging code discloses heap and stack bytes via the RTM_GETMDB
netlink interface and via the notify messages send to group RTNLGRP_MDB
afer a successful add/del.

Fix both cases by initializing all unset members/padding bytes with

Cc: Stephen Hemminger <>
Signed-off-by: Mathias Krause <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoipv6: stop multicast forwarding to process interface scoped addresses
Hannes Frederic Sowa [Fri, 8 Mar 2013 02:07:23 +0000 (02:07 +0000)]
ipv6: stop multicast forwarding to process interface scoped addresses

[ Upstream commit ddf64354af4a702ee0b85d0a285ba74c7278a460 ]

a) used struct ipv6_addr_props

a) reverted changes for ipv6_addr_props

a) do not use __ipv6_addr_needs_scope_id

Cc: YOSHIFUJI Hideaki <>
Signed-off-by: Hannes Frederic Sowa <>
Acked-by: YOSHIFUJI Hideaki <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agobridging: fix rx_handlers return code
Cristian Bercaru [Fri, 8 Mar 2013 07:03:38 +0000 (07:03 +0000)]
bridging: fix rx_handlers return code

[ Upstream commit 3bc1b1add7a8484cc4a261c3e128dbe1528ce01f ]

The frames for which rx_handlers return RX_HANDLER_CONSUMED are no longer
counted as dropped. They are counted as successfully received by

This allows network interface drivers to correctly update their RX-OK and
RX-DRP counters based on the result of 'netif_receive_skb'.

Signed-off-by: Cristian Bercaru <>
Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agonetlabel: correctly list all the static label mappings
Paul Moore [Wed, 6 Mar 2013 11:45:24 +0000 (11:45 +0000)]
netlabel: correctly list all the static label mappings

[ Upstream commits 0c1233aba1e948c37f6dc7620cb7c253fcd71ce9 and
  a6a8fe950e1b8596bb06f2c89c3a1a4bf2011ba9 ]

When we have a large number of static label mappings that spill across
the netlink message boundary we fail to properly save our state in the
netlink_callback struct which causes us to repeat the same listings.
This patch fixes this problem by saving the state correctly between
calls to the NetLabel static label netlink "dumpit" routines.

Signed-off-by: Paul Moore <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agomacvlan: Set IFF_UNICAST_FLT flag to prevent unnecessary promisc mode.
Vlad Yasevich [Thu, 7 Mar 2013 10:21:48 +0000 (10:21 +0000)]
macvlan: Set IFF_UNICAST_FLT flag to prevent unnecessary promisc mode.

[ Upstream commit 87ab7f6f2874f1115817e394a7ed2dea1c72549e ]

Macvlan already supports hw address filters.  Set the IFF_UNICAST_FLT
so that it doesn't needlesly enter PROMISC mode when macvlans are

Signed-of-by: Vlad Yasevich <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agoteam: unsyc the devices addresses when port is removed
Vlad Yasevich [Thu, 7 Mar 2013 07:59:25 +0000 (07:59 +0000)]
team: unsyc the devices addresses when port is removed

[ Upstream commit ba81276b1a5e3cf0674cb0e6d9525e5ae0c98695 ]

When a team port is removed, unsync all devices addresses that may have
been synched to the port devices.

CC: Jiri Pirko <>
Signed-off-by: Vlad Yasevich <>
Acked-by: Jiri Pirko <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agobonding: fire NETDEV_RELEASE event only on 0 slaves
Veaceslav Falico [Wed, 6 Mar 2013 07:10:32 +0000 (07:10 +0000)]
bonding: fire NETDEV_RELEASE event only on 0 slaves

[ Upstream commit 80028ea1c0afc24d4ddeb8dd2a9992fff03616ca ]

Currently, if we set up netconsole over bonding and release a slave,
netconsole will stop logging on the whole bonding device. Change the
behavior to stop the netconsole only when the last slave is released.

Signed-off-by: Veaceslav Falico <>
Acked-by: Neil Horman <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
5 years agovxlan: fix oops when delete netns containing vxlan
Zang MingJie [Wed, 6 Mar 2013 04:37:37 +0000 (04:37 +0000)]
vxlan: fix oops when delete netns containing vxlan

[ Upstream commit 9cb6cb7ed11cd3b69c47bb414983603a6ff20b1d ]

The following script will produce a kernel oops:

    sudo ip netns add v
    sudo ip netns exec v ip ad add dev lo
    sudo ip netns exec v ip link set lo up
    sudo ip netns exec v ip ro add dev lo
    sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group dev lo
    sudo ip netns exec v ip link set vxlan0 up
    sudo ip netns del v

where inspect by gdb:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 107]
    0xffffffffa0289e33 in ?? ()
    (gdb) bt
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
    #2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
    #3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
    #4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
    #5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
    #6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
    #7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
    #8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
    #9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
    #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
    #11 <signal handler called>
    #12 0x0000000000000000 in ?? ()
    #13 0x0000000000000000 in ?? ()
    (gdb) fr 0
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    533 struct sock *sk = vn->sock->sk;
    (gdb) l
    528 static int vxlan_leave_group(struct net_device *dev)
    529 {
    530 struct vxlan_dev *vxlan = netdev_priv(dev);
    531 struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
    532 int err = 0;
    533 struct sock *sk = vn->sock->sk;
    534 struct ip_mreqn mreq = {
    535 .imr_multiaddr.s_addr = vxlan->gaddr,
    536 .imr_ifindex = vxlan->link,
    537 };
    (gdb) p vn->sock
    $4 = (struct socket *) 0x0

The kernel calls `vxlan_exit_net` when deleting the netns before shutting down
vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock`
is already gone causes the oops. so we should manually shutdown all interfaces
before deleting `vn->sock` as the patch does.

Signed-off-by: Zang MingJie <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>