Pileus Git - ~andy/linux/log

powerpc/powernv: Refactor PHB diag-data dump

As Ben suggested, the patch prints PHB diag-data with multiple
fields in one line and omits the line if the fields of that
line are all zero.

With the patch applied, the PHB3 diag-data dump looks like:

PHB3 PHB#3 Diag-data (Version: 1)

  brdgCtl:     00000002
  RootSts:     0000000f 00400000 b0830008 00100147 00002000
  nFir:        0000000000000000 0030006e00000000 0000000000000000
  PhbSts:      0000001c00000000 0000000000000000
  Lem:         0000000000100000 42498e327f502eae 0000000000000000
  InAErr:      8000000000000000 8000000000000000 0402030000000000 0000000000000000
  PE[  8] A/B: 8480002b00000000 8000000000000000

[ The current diag data is so big that it overflows the printk
  buffer pretty quickly in cases when we get a handful of errors
  at once which can happen. --BenH
]

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Dump PHB diag-data immediately

The PHB diag-data is important to help locating the root cause for
EEH errors such as frozen PE or fenced PHB. However, the EEH core
enables IO path by clearing part of HW registers before collecting
this data causing it to be corrupted.

This patch fixes this by dumping the PHB diag-data immediately when
frozen/fenced state on PE or PHB is detected for the first time in
eeh_ops::get_state() or next_error() backend.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Increase stack redzone for 64-bit userspace to 512 bytes

The new ELFv2 little-endian ABI increases the stack redzone -- the
area below the stack pointer that can be used for storing data --
from 288 bytes to 512 bytes.  This means that we need to allow more
space on the user stack when delivering a signal to a 64-bit process.

To make the code a bit clearer, we define new USER_REDZONE_SIZE and
KERNEL_REDZONE_SIZE symbols in ptrace.h.  For now, we leave the
kernel redzone size at 288 bytes, since increasing it to 512 bytes
would increase the size of interrupt stack frames correspondingly.

Gcc currently only makes use of 288 bytes of redzone even when
compiling for the new little-endian ABI, and the kernel cannot
currently be compiled with the new ABI anyway.

In the future, hopefully gcc will provide an option to control the
amount of redzone used, and then we could reduce it even more.

This also changes the code in arch_compat_alloc_user_space() to
preserve the expanded redzone.  It is not clear why this function would
ever be used on a 64-bit process, though.

Signed-off-by: Paul Mackerras <paulus@samba.org>
CC: <stable@vger.kernel.org> [v3.13]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/ftrace: bugfix for test_24bit_addr

The branch target should be the func addr, not the addr of func_descr_t.
So using ppc_function_entry() to generate the right target addr.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/crashdump : Fix page frame number check in copy_oldmem_page

In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
decide if the page is backed or not, is not valid when the memory layout is
not continuous.

This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
in the memory. In that case max_pfn points to the end of RTAS, and a hole
between the end of the kdump kernel and RTAS is not backed by PTEs. As a
consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
in a direct way the pages in that hole.

This fix relies on the memblock's service memblock_is_region_memory to
check if the read page is part or not of the directly accessible memory.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly

Currently we're storing a host endian RTAS token in
rtas_stop_self_args.token. We then pass that directly to rtas. This is
fine on big endian however on little endian the token is not what we
expect.

This will typically result in hitting:
panic("Alas, I survived.\n");

To fix this we always use the stop-self token in host order and always
convert it to be32 before passing this to rtas.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ipv6: ipv6_find_hdr restore prev functionality

The commit 9195bb8e381d81d5a315f911904cdf0cfcc919b8 ("ipv6: improve
ipv6_find_hdr() to skip empty routing headers") broke ipv6_find_hdr().

When a target is specified like IPPROTO_ICMPV6 ipv6_find_hdr()
returns -ENOENT when it's found, not the header as expected.

A part of IPVS is broken and possible also nft_exthdr_eval().
When target is -1 which it is most cases, it works.

This patch exits the do while loop if the specific header is found
so the nexthdr could be returned as expected.

Reported-by: Art -kwaak- van Breemen <ard@telegraafnet.nl>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
CC:Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: recompute reachabletime before returning from neigh_periodic_work()

If the neigh table's entries is less than gc_thresh1, the function
will return directly, and the reachabletime will not be recompute,
so the reachabletime can be guessed.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branches 'pm-cpufreq', 'pm-hibernate' and 'acpi-processor'

* pm-cpufreq:
  intel_pstate: Change busy calculation to use fixed point math.

* pm-hibernate:
  PM / hibernate: Fix restore hang in freeze_processes()

* acpi-processor:
  ACPI / processor: Rework processor throttling with work_on_cpu()

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

John W. Linville says:

====================
Regarding the mac80211 bits, Johannes says:

"This time, I have a fix from Arik for scheduled scan recovery (something
that only recently went into the tree), a memory leak fix from Eytan and
a small regulatory bugfix from Inbal. The EAPOL change from Felix makes
rekeying more stable while lots of traffic is flowing, and there's
Emmanuel's and my fixes for a race in the code handling powersaving
clients."

Regarding the NFC bits, Samuel says:

"We only have one candidate for 3.14 fixes, and this is a NCI NULL
pointer dereference introduced during the 3.14 merge window."

Regarding the iwlwifi bits, Emmanuel says:

"This should fix an issue raised in iwldvm when we have lots of
association failures. There is a bugzilla for this bug - it hasn't
been validated by the user, but I hope it will do the trick."

Beyond that...

Amitkumar Karwar brings two mwifiex fixes, one to avoid a NULL pointer
dereference and another to address an improperly timed interrupt.

Arend van Spriel gives us a brcmfmac fix to avoid a crash during
scatter-gather packet transfers.

Avinash Patila offers an mwifiex to avoid an invalid memory access
when a device is removed.

Bing Zhao delivers a simple fix to avoid a naming conflict between
libertas and mwifiex.

Felix Fietkau provides a trio of ath9k fixes that properly account
for sequence numbering in ps-poll frames, reduce the rate for false
positives during baseband hang detection, and fix a regression related
to rx descriptor handling.

James Cameron shows us a libertas fix to ignore zero-length IEs when
processing scan results.

Kirill Tkhai brings a hostap fix to avoid prematurely freeing a timer.

Stanislaw Gruszka fixes an ath9k locking problem.

Sujith Manoharan addresses ETSI compliance for a device handled by
ath9k by adjusting the minimum CCA power threshold values.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add missing bit in default Tx switching

Commit c14db2025 "bnx2x: Correct default Tx switching behaviour" supposedly
changed the default Tx switching behaviour, but was missing the fastpath change
required for FW to pass packets from PFs to VFs.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

kvm, vmx: Really fix lazy FPU on nested guest

Commit e504c9098ed6 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13)
highlighted a real problem, but the fix was subtly wrong.

nested_read_cr0 is the CR0 as read by L2, but here we want to look at
the CR0 value reflecting L1's setup. In other words, L2 might think
that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually
running it with TS=1, we should inject the fault into L1.

The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use
it.

Fixes: e504c9098ed6acd9e1079c5e10e4910724ad429f
Reported-by: Kashyap Chamarty <kchamart@redhat.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Kashyap Chamarty <kchamart@redhat.com>
Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

perf tools: fix BFD detection on opensuse

opensuse libbfd requires -lz -liberty to build. Add those to the BFD
feature detection.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1389469379-13340-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
1) Build fix for ip_vti when NET_IP_TUNNEL is not set.
   We need this set to have ip_tunnel_get_stats64()
   available.

2) Fix a NULL pointer dereference on sub policy usage.
   We try to access a xfrm_state from the wrong array.

3) Take xfrm_state_lock in xfrm_migrate_state_find(),
   we need it to traverse through the state lists.

4) Clone states properly on migration, otherwise we crash
   when we migrate a state with aead algorithm attached.

5) Fix unlink race when between thread context and timer
   when policies are deleted.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: ping: Use socket mark in routing lookup

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem

mac80211: fix association to 20/40 MHz VHT networks

When a VHT network uses 20 or 40 MHz as per the HT operation
information, the channel center frequency segment 0 field in
the VHT operation information is reserved, so ignore it.

This fixes association with such networks when the AP puts 0
into the field, previously we'd disconnect due to an invalid
channel with the message
wlan0: AP VHT information is invalid, disable VHT

Cc: stable@vger.kernel.org
Fixes: f2d9d270c15ae ("mac80211: support VHT association")
Reported-by: Tim Nelson <tim.l.nelson@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

drm/radeon: enable speaker allocation setup on dce3.2

Now that we disable audio while setting up the audio
hw, we should be able to set this up without hangs.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: change audio enable logic

Disable audio around audio hw setup. This may avoid
hangs on certain asics.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: fix audio disable on dce6+

Properly clear the enable bit when audio disable is requested.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org

drm/radeon: free uvd ring on unload

Need to free the uvd ring. Also reshuffle gart tear down to
happen after uvd tear down.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: disable pll sharing for DP on DCE4.1

Causes display problems. We had already disabled
sharing for non-DP displays.

Based on a patch from:
Niels Ole Salscheider <niels_ole@salscheider-online.de>

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=58121

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/radeon: fix missing bo reservation

Otherwise we might get a crash here.

Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: print the supported atpx function mask

Print the supported functions mask in addition to
the version. This is useful in debugging PX
problems since we can see what functions are available.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

Merge tag 'metag-fixes-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag

Pull Metag arch and asm-generic fixes from James Hogan:

- Add the new sched_setattr/sched_getattr syscalls to the asm-generic
   syscall list, which is used by arc, arm64, c6x, hexagon, metag,
  openrisc, score, tile, and unicore32.

- An IRQ affinity bug fix for metag to prevent interrupts being
   vectored to offline CPUs when their affinity is changed via
   /proc/irq/ (thanks tglx).

* tag 'metag-fixes-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  irq-metag*: stop set_affinity vectoring to offline cpus
  asm-generic: add sched_setattr/sched_getattr syscalls

Merge tag 'pwm/for-3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm fix from Thierry Reding:
"Just a single trivial patch to plug a memory leak in an error path"

* tag 'pwm/for-3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: lp3943: Fix potential memory leak during request

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull filesystem fixes from Jan Kara:
"Notification, writeback, udf, quota fixes

  The notification patches are (with one exception) a fallout of my
  fsnotify rework which went into -rc1 (I've extented LTP to cover these
  cornercases to avoid similar breakage in future).

  The UDF patch is a nasty data corruption Al has recently reported,
  the revert of the writeback patch is due to possibility of violating
  sync(2) guarantees, and a quota bug can lead to corruption of quota
  files in ocfs2"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: Allocate overflow events with proper type
  fanotify: Handle overflow in case of permission events
  fsnotify: Fix detection whether overflow event is queued
  Revert "writeback: do not sync data dirtied after sync start"
  quota: Fix race between dqput() and dquot_scan_active()
  udf: Fix data corruption on file type conversion
  inotify: Fix reporting of cookies for inotify events

Merge tag 'upstream-3.14-rc5' of git://git.infradead.org/linux-ubifs

Pull ubifs fix from Artem Bityutskiy:
"Just a single fix for the UBI module unload path which makes sure we
do not touch freed memory"

* tag 'upstream-3.14-rc5' of git://git.infradead.org/linux-ubifs:
UBI: fix some use after free bugs

kvm: x86: fix emulator buffer overflow (CVE-2014-0049)

The problem occurs when the guest performs a pusha with the stack
address pointing to an mmio address (or an invalid guest physical
address) to start with, but then extending into an ordinary guest
physical address.  When doing repeated emulated pushes
emulator_read_write sets mmio_needed to 1 on the first one.  On a
later push when the stack points to regular memory,
mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0.

As a result, KVM exits to userspace, and then returns to
complete_emulated_mmio.  In complete_emulated_mmio
vcpu->mmio_cur_fragment is incremented.  The termination condition of
vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
The code bounces back and fourth to userspace incrementing
mmio_cur_fragment past it's buffer.  If the guest does nothing else it
eventually leads to a a crash on a memcpy from invalid memory address.

However if a guest code can cause the vm to be destroyed in another
vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
can be used by the guest to control the data that's pointed to by the
call to cancel_work_item, which can be used to gain execution.

Fixes: f78146b0f9230765c6315b2e14f56112513389ad
Signed-off-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org (3.5+)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT

Commit 1fcf7ce0c602 (arm: kvm: implement CPU PM notifier) added
support for CPU power-management, using a cpu_notifier to re-init
KVM on a CPU that entered CPU idle.

The code assumed that a CPU entering idle would actually be powered
off, loosing its state entierely, and would then need to be
reinitialized. It turns out that this is not always the case, and
some HW performs CPU PM without actually killing the core. In this
case, we try to reinitialize KVM while it is still live. It ends up
badly, as reported by Andre Przywara (using a Calxeda Midway):

[    3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760
[    3.663897] unexpected data abort in Hyp mode at: 0xc067d150
[    3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0

The trick here is to detect if we've been through a full re-init or
not by looking at HVBAR (VBAR_EL2 on arm64). This involves
implementing the backend for __hyp_get_vectors in the main KVM HYP
code (rather small), and checking the return value against the
default one when the CPU notifier is called on CPU_PM_EXIT.

Reported-by: Andre Przywara <osp@andrep.de>
Tested-by: Andre Przywara <osp@andrep.de>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <rob.herring@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sch_tbf: Fix potential memory leak in tbf_change().

The allocated child qdisc is not freed in error conditions.
Defer the allocation after user configuration turns out to be
valid and acceptable.

Fixes: cc106e441a63b ("net: sched: tbf: fix the calculation of max_size")
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dm thin: allow metadata space larger than supported to go unused

It was always intended that a user could provide a thin metadata device
that is larger than the max supported by the on-disk format.  The extra
space would just go unused.

Unfortunately that never worked.  If the user attempted to use a larger
metadata device on creation they would get an error like the following:

device-mapper: space map common: space map too large
device-mapper: transaction manager: couldn't create metadata space map
device-mapper: thin metadata: tm_create_with_sm failed
device-mapper: table: 252:17: thin-pool: Error creating metadata object
device-mapper: ioctl: error adding target to table

Fix this by allowing the initial metadata space map creation to cap its
size at the max number of blocks supported (DM_SM_METADATA_MAX_BLOCKS).
get_metadata_dev_size() must also impose DM_SM_METADATA_MAX_BLOCKS (via
THIN_METADATA_MAX_SECTORS), otherwise extending metadata would cap at
THIN_METADATA_MAX_SECTORS_WARNING (which is larger than supported).

Also, the calculation for THIN_METADATA_MAX_SECTORS didn't account for
the sizeof the disk_bitmap_header.  So the supported maximum metadata
size is a bit smaller (reduced from 33423360 to 33292800 sectors).

Lastly, remove the "excess space will not be used" warning message from
get_metadata_dev_size(); it resulted in printing the warning multiple
times.  Factor out warn_if_metadata_device_too_big(), call it from
pool_ctr() and maybe_resize_metadata_dev().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>

Merge branch 'liblockdep-fixes' of https://github.com/sashalevin/liblockdep into core/urgent

Pull tools/lib/lockdep/ fixes from Sasha Levin.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

* Fix annotation on stdio/GTK+ interfaces (Namhyung Kim)

* Fix file descriptor leaking while searching DSOs for suitable symtab (Namhyung Kim).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'asoc-v3.14-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Updates for v3.14

A few more driver specific bug fixes, all driver specific things that
only affect users of those devices.

perf: Fix hotplug splat

Drew Richardson reported that he could make the kernel go *boom* when hotplugging
while having perf events active.

It turned out that when you have a group event, the code in
__perf_event_exit_context() fails to remove the group siblings from
the context.

We then proceed with destroying and freeing the event, and when you
re-plug the CPU and try and add another event to that CPU, things go
*boom* because you've still got dead entries there.

Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86: Fix event scheduling

Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
with perf WARN_ON()s triggering. He also provided traces of the failures.

This is I think the relevant bit:

>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)

So we add the insn:p event (fd[23]).

At this point we should have:

  n_events = 2, n_added = 1, n_txn = 1

>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)

We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
that's not visible.

group_sched_in()
  pmu->start_txn() /* nop - BP pmu */
  event_sched_in()
     event->pmu->add()

So here we should end up with:

  0: n_events = 3, n_added = 2, n_txn = 2
  4: n_events = 4, n_added = 3, n_txn = 3

But seeing the below state on x86_pmu_enable(), the must have failed,
because the 0 and 4 events aren't there anymore.

Looking at group_sched_in(), since the BP is the leader, its
event_sched_in() must have succeeded, for otherwise we would not have
seen the sibling adds.

But since neither 0 or 4 are in the below state; their event_sched_in()
must have failed; but I don't see why, the complete state: 0,0,1:p,4
fits perfectly fine on a core2.

However, since we try and schedule 4 it means the 0 event must have
succeeded!  Therefore the 4 event must have failed, its failure will
have put group_sched_in() into the fail path, which will call:

event_sched_out()
  event->pmu->del()

on 0 and the BP event.

Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
giving what we see below:

n_event = 2, n_added = 2, n_txn = 2

>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0

So the problem is that x86_pmu_del(), when called from a
group_sched_in() that fails (for whatever reason), and without x86_pmu
TXN support (because the leader is !x86_pmu), will corrupt the n_added
state.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Prevent rt_time growth to infinity

Kirill Tkhai noted:

  Since deadline tasks share rt bandwidth, we must care about
  bandwidth timer set. Otherwise rt_time may grow up to infinity
  in update_curr_dl(), if there are no other available RT tasks
  on top level bandwidth.

RT task were in fact throttled right after they got enqueued,
and never executed again (rt_time never again went below rt_runtime).

Peter then proposed to accrue DL execution on rt_time only when
rt timer is active, and proposed a patch (this patch is a slight
modification of that) to implement that behavior. While this
solves Kirill problem, it has a drawback.

Indeed, Kirill noted again:

  It looks we may get into a situation, when all CPU time is shared
  between RT and DL tasks:

  rt_runtime = n
  rt_period  = 2n

  | RT working, DL sleeping  | DL working, RT sleeping      |
  -----------------------------------------------------------
  | (1)     duration = n     | (2)     duration = n         | (repeat)
  |--------------------------|------------------------------|
  | (rt_bw timer is running) | (rt_bw timer is not running) |

  No time for fair tasks at all.

While this can happen during the first period, if rq is always backlogged,
RT tasks won't have the opportunity to execute anymore: rt_time reached
rt_runtime during (1), suppose after (2) RT is enqueued back, it gets
throttled since rt timer didn't fire, replenishment is from now on eaten up
by DL tasks that accrue their execution on rt_time (while rt timer is
active - we have an RT task waiting for replenishment). FAIR tasks are
not touched after this first period. Ok, this is not ideal, and the situation
is even worse!

What above (the nice case), practically never happens in reality, where
your rt timer is not aligned to tasks periods, tasks are in general not
periodic, etc.. Long story short, you always risk to overload your system.

This patch is based on Peter's idea, but exploits an additional fact:
if you don't have RT tasks enqueued, it makes little sense to continue
incrementing rt_time once you reached the upper limit (DL tasks have their
own mechanism for throttling).

This cures both problems:

- no matter how many DL instances in the past, you'll have an rt_time
   slightly above rt_runtime when an RT task is enqueued, and from that
   point on (after the first replenishment), the task will normally execute;

- you can still eat up all bandwidth during the first period, but not
   anymore after that, remember that DL execution will increment rt_time
   till the upper limit is reached.

The situation is still not perfect! But, we have a simple solution for now,
that limits how much you can jeopardize your system, as we keep working
towards the right answer: RT groups scheduled using deadline servers.

Reported-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140225151515.617714e2f2cd6c558531ba61@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Switch CPU's presence test order

Commit 82b9580 ("sched/deadline: Test for CPU's presence explicitly")
changed how we check if a CPU returned by cpudeadline machinery is
valid. But, we don't want to call cpu_present() if best_cpu is
equal to -1. So, switch the order of tests inside WARN_ON().

Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: konrad.wilk@oracle.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1393238832-9100-1-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Cleanup RT leftovers from {inc/dec}_dl_migration

In deadline class we do not have group scheduling.

So, let's remove unnecessary

X = X;

equations.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Link: http://lkml.kernel.org/r/1393343543.4089.5.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Fix double normalization of vruntime

dequeue_entity() is called when p->on_rq and sets se->on_rq = 0
which appears to guarentee that the !se->on_rq condition is met.
If the task has done set_current_state(TASK_INTERRUPTIBLE) without
schedule() the second condition will be met and vruntime will be
incorrectly adjusted twice.

In certain cases this can result in the task's vruntime never increasing
past the vruntime of other tasks on the CFS' run queue, starving them of
CPU time.

This patch changes switched_from_fair() to use !p->on_rq instead of
!se->on_rq.

I'm able to cause a task with a priority of 120 to starve all other
tasks with the same priority on an ARM platform running 3.2.51-rt72
PREEMPT RT by writing one character at time to a serial tty (16550 UART)
in a tight loop. I'm also able to verify making this change corrects the
problem on that platform and kernel version.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1392767811-28916-1-git-send-email-george.mccollister@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge remote-tracking branch 'asoc/fix/wm8958' into asoc-linus

Merge remote-tracking branches 'asoc/fix/da732x' and 'asoc/fix/sta32x' into asoc-linus

Merge tag 'asoc-v3.14-rc4' into asoc-linus

ASoC: Fixes for v3.14

A somewhat large set of fixes here due to the identification of some
systematic problems with hard to use APIs in the subsystem.  Takashi did
a lot of work to address the enumeration API which uncovered a number of
off by one bugs caused by confusing APIs while Charles addressed issues
in the locking around DAPM.

# gpg: Signature made Sun 23 Feb 2014 13:29:34 KST using RSA key ID 7EA229BD
# gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>"
# gpg:                 aka "Mark Brown <broonie@debian.org>"
# gpg:                 aka "Mark Brown <broonie@kernel.org>"
# gpg:                 aka "Mark Brown <broonie@tardis.ed.ac.uk>"
# gpg:                 aka "Mark Brown <broonie@linaro.org>"
# gpg:                 aka "Mark Brown <Mark.Brown@linaro.org>"

Merge tag 'asoc-v3.14-rc3' into asoc-linus

ASoC: Fixes for v3.14

A few fixes, all driver speccific ones.  The DaVinci ones aren't as
clear as they should be from the subject lines on the commits but they
fix issues which will prevent correct operation in some use cases and
only affect that particular driver so are reasonably safe.

# gpg: Signature made Wed 19 Feb 2014 13:23:13 KST using RSA key ID 7EA229BD
# gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>"
# gpg:                 aka "Mark Brown <broonie@debian.org>"
# gpg:                 aka "Mark Brown <broonie@kernel.org>"
# gpg:                 aka "Mark Brown <broonie@tardis.ed.ac.uk>"
# gpg:                 aka "Mark Brown <broonie@linaro.org>"
# gpg:                 aka "Mark Brown <Mark.Brown@linaro.org>"

iwlwifi: fix TX status for aggregated packets

Only the first packet is currently handled correctly, but then
all others are assumed to have failed which is problematic. Fix
this, marking them all successful instead (since if they're not
then the firmware will have transmitted them as single frames.)

This fixes the lost packet reporting.

Also do a tiny variable scoping cleanup.

Cc: <stable@vger.kernel.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[Add the dvm part]
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

ASoC: sta32x: Fix wrong enum for limiter2 release rate

There is a typo in the Limiter2 Release Rate control, a wrong enum for
Limiter1 is assigned.  It must point to Limiter2.
Spotted by a compile warning:

In file included from sound/soc/codecs/sta32x.c:34:0:
sound/soc/codecs/sta32x.c:223:29: warning: ‘sta32x_limiter2_release_rate_enum’ defined but not used [-Wunused-variable]
static SOC_ENUM_SINGLE_DECL(sta32x_limiter2_release_rate_enum,
                             ^
include/sound/soc.h:275:18: note: in definition of macro ‘SOC_ENUM_DOUBLE_DECL’
  struct soc_enum name = SOC_ENUM_DOUBLE(xreg, xshift_l, xshift_r, \
                  ^
sound/soc/codecs/sta32x.c:223:8: note: in expansion of macro ‘SOC_ENUM_SINGLE_DECL’
static SOC_ENUM_SINGLE_DECL(sta32x_limiter2_release_rate_enum,
        ^

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Cc: <stable@vger.kernel.org>

iwlwifi: mvm: change of listen interval from 70 to 10

Some APs reject STA association request if a listen interval value exceeds
a threshold of 10. Thus, for example, Cisco APs may deny STA associations
returning status code 12 (Association denied due to reason outside the scope
of 802.11 standard) in the association response frame.

Fixing the issue by setting the default IWL_CONN_MAX_LISTEN_INTERVAL value
from 70 to 10.

Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Max Stepanov <Max.Stepanov@intel.com>
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

Merge tag 'asoc-v3.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v3.14

A somewhat large set of fixes here due to the identification of some
systematic problems with hard to use APIs in the subsystem. Takashi did
a lot of work to address the enumeration API which uncovered a number of
off by one bugs caused by confusing APIs while Charles addressed issues
in the locking around DAPM.

MAINTAINERS: update drm git tree entry

Fix Dave's git tree.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

MAINTAINERS: add entry for drm radeon driver

Add an entry for radeon.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

bonding: disallow enslaving a bond to itself

Enslaving a bond to itself leads to an endless loop and hangs the kernel.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Tested-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools/liblockdep: Use realpath for srctree and objtree

If BUILD_SRC or CURDIR contains tailing '/', the file names passed to gcc will
contain '//'. It will be contained .o's in debuginfo, then confuse debugedit:

https://bugzilla.redhat.com/show_bug.cgi?id=304121

This patch uses realpath command to makesure potential tailing '/'s are removed.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Geng Hui <hui.geng@huawei.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

tools/liblockdep: Add a stub for new rcu_is_watching

Stub out rcu_is_watching(), prevents build error with the updated
tree.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

tools/liblockdep: Mark runtests.sh as executable

runtests.sh is used to run the sanity tests for liblockdep
and should be set +x.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

tools/liblockdep: Add include directory to allow tests to compile

All of the programs in the tests directory require the
liblockdep/mutex.h header in order to compile. Add the include directory
to the compiler options so that the tests can be built with the provided
Makefile.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

tools/liblockdep: Fix include of asm/hash.h

Commit 71ae8aac ("lib: introduce arch optimized hash library")
added an include to <linux/hash.h> for setting up an architecture
specific fast hash.

This patch mirrors the fix used for perf, titled "tools: perf: util: fix
include for non x86 architectures".

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

tools/liblockdep: Fix initialization code path

This makes initialization actually happen. Without it, initialization is
always skipped due to an incorrect conditional statement.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

clk:at91: Fix memory leak in of_at91_clk_master_setup()

cppcheck detected following error
[clk-master.c:245]: (error) Memory leak: characteristics

The original code forgot to free characteristics when
irq_of_parse_and_map() failed.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by Boris BREZILLON <b.brezillon@overkiz.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>

usb: ehci: fix deadlock when threadirqs option is used

ehci_irq() and ehci_hrtimer_func() can deadlock on ehci->lock when
threadirqs option is used. To prevent the deadlock use
spin_lock_irqsave() in ehci_irq().

This change can be reverted when hrtimer callbacks become threaded.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: ftdi_sio: add Cressi Leonardo PID

Hello,

the following patch adds an entry for the PID of a Cressi Leonardo
diving computer interface to kernel 3.13.0.
It is detected as FT232RL.
Works with subsurface.

Signed-off-by: Joerg Dorchain <joerg@dorchain.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPI / processor: Rework processor throttling with work_on_cpu()

acpi_processor_set_throttling() uses set_cpus_allowed_ptr() to make
sure that the (struct acpi_processor)->acpi_processor_set_throttling()
callback will run on the right CPU. However, the function may be
called from a worker thread already bound to a different CPU in which
case that won't work.

Make acpi_processor_set_throttling() use work_on_cpu() as appropriate
instead of abusing set_cpus_allowed_ptr().

Reported-and-tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Cc: All applicable <stable@vger.kernel.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

bonding: fix a div error caused by the slave release path

There's a bug in the slave release function which leads the transmit
functions which use the bond->slave_cnt to a div by 0 because we might
just have released our last slave and made slave_cnt == 0 but at the same
time we may have a transmitter after the check for an empty list which will
fetch it and use it in the slave id calculation.
Fix it by moving the slave_cnt after synchronize_rcu so if this was our
last slave any new transmitters will see an empty slave list which is
checked after rcu lock but before calling the mode transmit functions
which rely on bond->slave_cnt.

Fixes: 278b208375 ("bonding: initial RCU conversion")
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

AX88179_178A: Add VID:DID for Lenovo OneLinkDock Gigabit LAN

Add VID:DID for Lenovo OneLinkDock Gigabit LAN

Signed-off-by: Freddy Xin <freddy@asix.com.tw>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bonding_rtnl'

Ding Tianhong says:

====================
Fix RTNL: assertion failed at net/core/rtnetlink.c

The commit 1d3ee88ae0d
(bonding: add netlink attributes to slave link dev)
make the bond_set_active_slave() and bond_set_backup_slave()
use rtmsg_ifinfo to send slave's states and this functions
should be called in RTNL.

But the 902.3ad and ARP monitor did not hold the RTNL when calling
thses two functions, so fix them.

v1->v2: Add new micro to indicate that the notification should be send
later, not never.
And add a new patch to fix the same problem for ARP mode.

v2->v3: modify the bond_should_notify to should_notify_rtnl, it is more
reasonable, and use bool for should_notify_rtnl.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor

Veaceslav has reported and fix this problem by commit f2ebd477f141bc0
(bonding: restructure locking of bond_ab_arp_probe()). According Jay's
opinion, the current solution is not very well, because the notification
is to indicate that the interface has actually changed state in a meaningful
way, but these calls in the ab ARP monitor are internal settings of the flags
to allow the ARP monitor to search for a slave to become active when there are
no active slaves. The flag setting to active or backup is to permit the ARP
monitor's response logic to do the right thing when deciding if the test
slave (current_arp_slave) is up or not.

So the best way to fix the problem is that we should not send a notification
when the slave is in testing state, and check the state at the end of the
monitor, if the slave's state recover, avoid to send pointless notification
twice. And RTNL is really a big lock, hold it regardless the slave's state
changed or not when the current_active_slave is null will loss performance
(every 100ms), so we should hold it only when the slave's state changed and
need to notify.

I revert the old commit and add new modifications.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for 802.3ad mode

The problem was introduced by the commit 1d3ee88ae0d
(bonding: add netlink attributes to slave link dev).
The bond_set_active_slave() and bond_set_backup_slave()
will use rtmsg_ifinfo to send slave's states, so these
two functions should be called in RTNL.

In 802.3ad mode, acquiring RTNL for the __enable_port and
__disable_port cases is difficult, as those calls generally
already hold the state machine lock, and cannot unconditionally
call rtnl_lock because either they already hold RTNL (for calls
via bond_3ad_unbind_slave) or due to the potential for deadlock
with bond_3ad_adapter_speed_changed, bond_3ad_adapter_duplex_changed,
bond_3ad_link_change, or bond_3ad_update_lacp_rate.  All four of
those are called with RTNL held, and acquire the state machine lock
second.  The calling contexts for __enable_port and __disable_port
already hold the state machine lock, and may or may not need RTNL.

According to the Jay's opinion, I don't think it is a problem that
the slave don't send notify message synchronously when the status
changed, normally the state machine is running every 100 ms, send
the notify message at the end of the state machine if the slave's
state changed should be better.

I fix the problem through these steps:

1). add a new function bond_set_slave_state() which could change
    the slave's state and call rtmsg_ifinfo() according to the input
    parameters called notify.

2). Add a new slave parameter which called should_notify, if the slave's state
    changed and don't notify yet, the parameter will be set to 1, and then if
    the slave's state changed again, the param will be set to 0, it indicate that
    the slave's state has been restored, no need to notify any one.

3). the __enable_port and __disable_port should not call rtmsg_ifinfo
    in the state machine lock, any change in the state of slave could
    set a flag in the slave, it will indicated that an rtmsg_ifinfo
    should be called at the end of the state machine.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: Intel nic drivers

Add a new F: line for the intel subdirectories.

This allows get_maintainers to avoid using git log
and cc'ing people that have submitted clean-up style
patches for all first level directories under
drivers/net/ethernet/intel/

This does not make e100.c maintained.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: check for NULL efx->ptp_data in efx_ptp_event

If we receive a PTP event from the NIC when we haven't set up PTP state
in the driver, we attempt to read through a NULL pointer efx->ptp_data,
triggering a panic.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tcp: use NET_INC_STATS()

While LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES can only be incremented
in tcp_transmit_skb() from softirq (incoming message or timer
activation), it is better to use NET_INC_STATS() instead of
NET_INC_STATS_BH() as tcp_transmit_skb() can be called from process
context.

This will avoid copy/paste confusion when/if we want to add
other SNMP counters in tcp_transmit_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

clk: nomadik: fix multiplatform problem

The Nomadik debugfs screws up multiplatform boots if debugfs
is enabled on the multiplatform image, since it's a simple
initcall that is unconditionally executed and reads from certain
memory locations.

Fix this by checking that the driver has been properly
initialized, so a base offset to the Nomadik SRC controller
exists, before proceeding to register debugfs files.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Mike Turquette <mturquette@linaro.org>

KVM: MMU: drop read-only large sptes when creating lower level sptes

Read-only large sptes can be created due to read-only faults as
follows:

- QEMU pagetable entry that maps guest memory is read-only
due to COW.
- Guest read faults such memory, COW is not broken, because
it is a read-only fault.
- Enable dirty logging, large spte not nuked because it is read-only.
- Write-fault on such memory causes guest to loop endlessly
(which must go down to level 1 because dirty logging is enabled).

Fix by dropping large spte when necessary.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

pwm: lp3943: Fix potential memory leak during request

Fix a memory leak in the lp3943_pwm_request_map() error handling path.
Make sure already allocated pwm map memory is freed correctly.
Detected by Coverity: CID 1162829.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Acked-by: Milo Kim <milo.kim@ti.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>

dm mpath: fix stalls when handling invalid ioctls

An invalid ioctl will never be valid, irrespective of whether multipath
has active paths or not. So for invalid ioctls we do not have to wait
for multipath to activate any paths, but can rather return an error
code immediately. This fix resolves numerous instances of:

udevd[]: worker [] unexpectedly returned with status 0x0100

that have been seen during testing.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

ASoC: da732x: Mark DC offset control registers volatile

The driver reads from the DC offset control registers during callibration
but since the registers are marked as volatile and there is a register
cache the values will not be read from the hardware after the first reading
rendering the callibration ineffective.

It appears that the driver was originally written for the ASoC level
register I/O code but converted to regmap prior to merge and this issue
was missed during the conversion as the framework level volatile register
functionality was not being used.

Signed-off-by: Mark Brown <broonie@linaro.org>
Acked-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Cc: stable@vger.kernel.org

ALSA: hda/realtek - Add more entry for enable HP mute led

I lost this SSID. Add it into the fixup table.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

xfrm: Fix unlink race when policies are deleted.

When a policy is unlinked from the lists in thread context,
the xfrm timer can fire before we can mark this policy as dead.
So reinitialize the bydst hlist, then hlist_unhashed() will
notice that this policy is not linked and will avoid a
doulble unlink of that policy.

Reported-by: Xianpeng Zhao <673321875@qq.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

x86, kaslr: add missed "static" declarations

This silences build warnings about unexported variables and functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20140209215644.GA30339@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, kaslr: export offset in VMCOREINFO ELF notes

Include kASLR offset in VMCOREINFO ELF notes to assist in debugging.

[ hpa: pushing this for v3.14 to avoid having a kernel version with
kASLR where we can't debug output. ]

Signed-off-by: Eugene Surovegin <surovegin@google.com>
Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

PM / hibernate: Fix restore hang in freeze_processes()

During restore, pm_notifier chain are called with
PM_RESTORE_PREPARE.  The firmware_class driver handler
fw_pm_notify does not have a handler for this.  As a result,
it keeps a reader on the kmod.c umhelper_sem.  During
freeze_processes, the call to __usermodehelper_disable tries to
take a write lock on this semaphore and hangs waiting.

Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org>
Acked-by: Ming Lei <ming.lei@canonical.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Change busy calculation to use fixed point math.

Commit fcb6a15c2e (intel_pstate: Take core C0 time into account for
core busy calculation) introduced a regression on some processor SKUs
supported by intel_pstate. This was due to the truncation caused by
using integer math to calculate core busy and C0 percentages.

On a i7-4770K processor operating at 800Mhz going to 100% utilization
the percent busy of the CPU using integer math is 22%, but it actually
is 22.85%. This value scaled to the current frequency returned 97
which the PID interpreted as no error and did not adjust the P state.

Tested on i7-4770K, i7-2600, i5-3230M.

Fixes: fcb6a15c2e7e (intel_pstate: Take core C0 time into account for core busy calculation)
References: https://lkml.org/lkml/2014/2/19/626
References: https://bugzilla.kernel.org/show_bug.cgi?id=70941
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

phy: unmask link partner capabilities

Masking the link partner's capabilities with local capabilities can be
misleading in autonegotiation scenarios such as PAUSE frame
autonegotiation.
This patch calculates the join between the local capabilities and the
link parner capabilities, when it determines the speed and duplex
settings, but does not mask any of the link partner capabilities when
it calculates PAUSE frame settings.

Signed-off-by: Cristian Bercaru <cristian.bercaru@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'akpm' (patches from Andrew Morton)

Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton akpm@linux-foundation.org>:
  MAINTAINERS: change mailing list address for Altera UART drivers
  Makefile: fix build with make 3.80 again
  MAINTAINERS: update L: misuses
  Makefile: fix extra parenthesis typo when CC_STACKPROTECTOR_REGULAR is enabled
  ipc,mqueue: remove limits for the amount of system-wide queues
  memcg: change oom_info_lock to mutex
  mm, thp: fix infinite loop on memcg OOM
  drivers/fmc/fmc-write-eeprom.c: fix decimal permissions
  drivers/iommu/omap-iommu-debug.c: fix decimal permissions
  mm, hwpoison: release page on PageHWPoison() in __do_fault()

net: Fix permission check in netlink_connect()

netlink_sendmsg() was changed to prevent non-root processes from sending
messages with dst_pid != 0.
netlink_connect() however still only checks if nladdr->nl_groups is set.
This patch modifies netlink_connect() to check for the same condition.

Signed-off-by: Mike Pecovnik <mike.pecovnik@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/cxgb4: use remove handler as shutdown handler

Without a shutdown handler, T4 cards behave very badly after a kexec.
Some firmware calls return errors indicating allocation failures, for
example. This is probably because thouse resources were not released by
a BYE message to the firmware, for example.

Using the remove handler guarantees we will use a well tested path.

With this patch I applied, I managed to use kexec multiple times and
probe and iSCSI login worked every time.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qlcnic'

Shahed Shaikh says:

====================
qlcnic: Bug fixes

This patch series includes following bug fixes,

* Fix for return value handling of function qlcnic_enable_msi_legacy().
* Fix for the usage of module parameters for interrupt mode.
  Driver should use flags while checking for driver's interrupt mode instead of
  module parameters.
* Revert commit 1414abea04 (qlcnic: Restrict VF from configuring any VLAN mode),
  in order to save some multicast filters.
* Fix a bug where driver was not re-setting sds ring count to 1 when
  it falls back from MSI-x mode to legacy interrupt mode.

Please apply to net.

Change in v2 -
Dropped patch "qlcnic: reset firmware API lock during driver load" for further rework.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix number of rings when we fall back from msix to legacy.

o Driver was not re-setting sds ring count to 1 after failing
to allocate msi-x interrupts.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Allow any VLAN to be configured from VF.

o This patch reverts commit 1414abea048e0835c43600d62808ed8163897227
  (qlcnic: Restrict VF from configuring any VLAN mode.)
  This will allow same multicast address to be used with any VLAN
  instead of programming seperate (MAC, VLAN) tuples in adapter.
  This will help save some multicast filters.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix usage of use_msi and use_msi_x module parameters

Once interrupts are enabled, instead of using module parameters,
use flags (QLCNIC_MSI_ENABLED and QLCNIC_MSIX_ENABLED) set by driver
to check interrupt mode.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix function return error check

Driver was treating -ve return value as success in case of
qlcnic_enable_msi_legacy() failure

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qeth: postpone freeing of qdio memory

To guarantee that a qdio ccw_device no longer touches the
qdio memory shared with Linux, the qdio ccw_device should
be offline when freeing the qdio memory. Thus this patch
postpones freeing of qdio memory.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: ipv6: better estimate tunnel header cut for correct ufo handling

Currently the UFO fragmentation process does not correctly handle inner
UDP frames.

(The following tcpdumps are captured on the parent interface with ufo
disabled while tunnel has ufo enabled, 2000 bytes payload, mtu 1280,
both sit device):

IPv6:
16:39:10.031613 IP (tos 0x0, ttl 64, id 3208, offset 0, flags [DF], proto IPv6 (41), length 1300)
    192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 1240) 2001::1 > 2001::8: frag (0x00000001:0|1232) 44883 > distinct: UDP, length 2000
16:39:10.031709 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPv6 (41), length 844)
    192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 784) 2001::1 > 2001::8: frag (0x00000001:0|776) 58979 > 46366: UDP, length 5471

We can see that fragmentation header offset is not correctly updated.
(fragmentation id handling is corrected by 916e4cf46d0204 ("ipv6: reuse
ip6_frag_id from ip6_ufo_append_data")).

IPv4:
16:39:57.737761 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPIP (4), length 1296)
    192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57034, offset 0, flags [none], proto UDP (17), length 1276)
    192.168.99.1.35961 > 192.168.99.2.distinct: UDP, length 2000
16:39:57.738028 IP (tos 0x0, ttl 64, id 3210, offset 0, flags [DF], proto IPIP (4), length 792)
    192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57035, offset 0, flags [none], proto UDP (17), length 772)
    192.168.99.1.13531 > 192.168.99.2.20653: UDP, length 51109

In this case fragmentation id is incremented and offset is not updated.

First, I aligned inet_gso_segment and ipv6_gso_segment:
* align naming of flags
* ipv6_gso_segment: setting skb->encapsulation is unnecessary, as we
  always ensure that the state of this flag is left untouched when
  returning from upper gso segmenation function
* ipv6_gso_segment: move skb_reset_inner_headers below updating the
  fragmentation header data, we don't care for updating fragmentation
  header data
* remove currently unneeded comment indicating skb->encapsulation might
  get changed by upper gso_segment callback (gre and udp-tunnel reset
  encapsulation after segmentation on each fragment)

If we encounter an IPIP or SIT gso skb we now check for the protocol ==
IPPROTO_UDP and that we at least have already traversed another ip(6)
protocol header.

The reason why we have to special case GSO_IPIP and GSO_SIT is that
we reset skb->encapsulation to 0 while skb_mac_gso_segment the inner
protocol of GSO_UDP_TUNNEL or GSO_GRE packets.

Reported-by: Wolfgang Walter <linux@stwm.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: change mailing list address for Altera UART drivers

The nios2-dev list has been moved to the RocketBoards infrastructure, so
adjust the address accordingly.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Makefile: fix build with make 3.80 again

According to Documentation/Changes, make 3.80 is still being supported
for building the kernel, hence make files must not make (unconditional)
use of features introduced only in newer versions. Commit 8779657d29c0
("stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG") however
introduced an "else ifdef" construct which make 3.80 doesn't understand.

Also correct a warning message still referencing the old config option
name.

Apart from that I question the use of "ifdef" here (but it was used that
way already prior to said commit): ifeq (,y) would seem more to the
point.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: update L: misuses

L: lines are for the email addresses of traditional mailing lists.
W: lines are for URLs.

Convert two L: misuses to W: links.

Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Makefile: fix extra parenthesis typo when CC_STACKPROTECTOR_REGULAR is enabled

An extra parenthesis typo introduced in 19952a92037e ("stackprotector:
Unify the HAVE_CC_STACKPROTECTOR logic between architectures") is
causing the following error when CONFIG_CC_STACKPROTECTOR_REGULAR is
enabled:

Makefile:608: Cannot use CONFIG_CC_STACKPROTECTOR: -fstack-protector not supported by compiler
Makefile:608: *** missing separator. Stop.

Signed-off-by: Fathi Boudra <fathi.boudra@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,mqueue: remove limits for the amount of system-wide queues

Commit 93e6f119c0ce ("ipc/mqueue: cleanup definition names and
locations") added global hardcoded limits to the amount of message
queues that can be created.  While these limits are per-namespace,
reality is that it ends up breaking userspace applications.
Historically users have, at least in theory, been able to create up to
INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
for some workloads and use cases.  For instance, Madars reports:

"This update imposes bad limits on our multi-process application.  As
  our app uses approaches that each process opens its own set of queues
  (usually something about 3-5 queues per process).  In some scenarios
  we might run up to 3000 processes or more (which of-course for linux
  is not a problem).  Thus we might need up to 9000 queues or more.  All
  processes run under one user."

Other affected users can be found in launchpad bug #1155695:
  https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695

Instead of increasing this limit, revert it entirely and fallback to the
original way of dealing queue limits -- where once a user's resource
limit is reached, and all memory is used, new queues cannot be created.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Madars Vitolins <m@silodev.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org> [3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: change oom_info_lock to mutex

Kirill has reported the following:

  Task in /test killed as a result of limit of /test
  memory: usage 10240kB, limit 10240kB, failcnt 51
  memory+swap: usage 10240kB, limit 10240kB, failcnt 0
  kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
  Memory cgroup stats for /test:

  BUG: sleeping function called from invalid context at kernel/cpu.c:68
  in_atomic(): 1, irqs_disabled(): 0, pid: 66, name: memcg_test
  2 locks held by memcg_test/66:
   #0:  (memcg_oom_lock#2){+.+...}, at: [<ffffffff81131014>] pagefault_out_of_memory+0x14/0x90
   #1:  (oom_info_lock){+.+...}, at: [<ffffffff81197b2a>] mem_cgroup_print_oom_info+0x2a/0x390
  CPU: 2 PID: 66 Comm: memcg_test Not tainted 3.14.0-rc1-dirty #745
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
  Call Trace:
    __might_sleep+0x16a/0x210
    get_online_cpus+0x1c/0x60
    mem_cgroup_read_stat+0x27/0xb0
    mem_cgroup_print_oom_info+0x260/0x390
    dump_header+0x88/0x251
    ? trace_hardirqs_on+0xd/0x10
    oom_kill_process+0x258/0x3d0
    mem_cgroup_oom_synchronize+0x656/0x6c0
    ? mem_cgroup_charge_common+0xd0/0xd0
    pagefault_out_of_memory+0x14/0x90
    mm_fault_error+0x91/0x189
    __do_page_fault+0x48e/0x580
    do_page_fault+0xe/0x10
    page_fault+0x22/0x30

which complains that mem_cgroup_read_stat cannot be called from an atomic
context but mem_cgroup_print_oom_info takes a spinlock.  Change
oom_info_lock to a mutex.

This was introduced by 947b3dd1a84b ("memcg, oom: lock
mem_cgroup_print_oom_info").

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, thp: fix infinite loop on memcg OOM

Masayoshi Mizuma reported a bug with the hang of an application under
the memcg limit.  It happens on write-protection fault to huge zero page

If we successfully allocate a huge page to replace zero page but hit the
memcg limit we need to split the zero page with split_huge_page_pmd()
and fallback to small pages.

The other part of the problem is that VM_FAULT_OOM has special meaning
in do_huge_pmd_wp_page() context.  __handle_mm_fault() expects the page
to be split if it sees VM_FAULT_OOM and it will will retry page fault
handling.  This causes an infinite loop if the page was not split.

do_huge_pmd_wp_zero_page_fallback() can return VM_FAULT_OOM if it failed
to allocate one small page, so fallback to small pages will not help.

The solution for this part is to replace VM_FAULT_OOM with
VM_FAULT_FALLBACK is fallback required.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/fmc/fmc-write-eeprom.c: fix decimal permissions

This 444 should have been octal.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>