Pileus Git - ~andy/linux/log

fib_rules: reorder struct fib_rules fields

Move refcnt, pref, suppress_ifgroup, suppress_prefixlen out of first
cache line, as they are not used in fast path.

Make sure ctarget & fr_net are in first cache line.

(Assuming 64 bit arches and 64 bytes cache lines)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_rules: fix suppressor names and default values

This change brings the suppressor attribute names into line; it also changes
the data types to provide a more consistent interface.

While -1 indicates that the suppressor is not enabled, values >= 0 for
suppress_prefixlen or suppress_ifgroup reject routing decisions violating the
constraint.

This changes the previously presented behaviour of suppress_prefixlen, where a
prefix length _less_ than the attribute value was rejected. After this change,
a prefix length less than *or* equal to the value is considered a violation of
the rule constraint.

It also changes the default values for default and newly added rules (disabling
any suppression for those).

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: cleanup the usage of vlan_dev_priv(dev)

This patch cleanup 2 points for the usage of vlan_dev_priv(dev):
* In vlan_dev.c/vlan_dev_hard_header, we should use the var *vlan directly
  after grabing the pointer at the beginning with
        *vlan = vlan_dev_priv(dev);
  when we need to access the fields of *vlan.
* In vlan.c/register_vlan_device, add the var *vlan pointer
        struct vlan_dev_priv *vlan;
to cleanup the code to access the fields of vlan_dev_priv(new_dev).

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic, bnx2i: Fix bug on some bnx2x devices that don't support iSCSI

On some bnx2x devices, iSCSI is determined to be unsupported only after
firmware is downloaded. We need to check max_iscsi_conn again after
NETDEV_UP and block iSCSI init operations. Without this fix, iscsiadm
can hang as the firmware will not respond to the iSCSI init message.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bond_neigh_parms'

Veaceslav Falico says:

====================
Recent patches revealed an old bug, which was there for quite awhile. It's
related to vlan on top of bonding and ndo_neigh_setup(). When vlan device
is initiated, it calls its real_dev->ndo_neigh_setup(), and in case of
bonding - it will modify neigh_parms->neigh_setup to point to
bond_neigh_init, while neigh_parms are of vlan's dev.

This way, when neigh_parms->neigh_setup() of vlan's dev is called, the
bonding function will be called, which expects the dev to be struct
bonding, but will receive a vlan dev.

It was hidden before because of bond->first_slave usage. Now, with
Nikolay's conversion to list/RCU, first_slave is gone and we hit a null
pointer dereference when working with lists/slave.

First patch moves ndo_neigh_setup() in neigh_parms_alloc() to the bottom,
so that the ->dev will be available to the caller. It doesn't really change
anything, however is needed for the second patch.

Second patch makes bond_neigh_setup() (bond->ndo_neigh_setup()) check if
the neigh_parms are really from a bonding dev, and only modify the
neigh_setup in this case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: modify only neigh_parms owned by us

Otherwise, on neighbour creation, bond_neigh_init() will be called with a
foreign netdev.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neighbour: populate neigh_parms on alloc before calling ndo_neigh_setup

dev->ndo_neigh_setup() might need some of the values of neigh_parms, so
populate them before calling it.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netlink: minor: remove unused pointer in alloc_pg_vec

Variable ptr is being assigned, but never used, so just remove it.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_rules: add route suppression based on ifgroup

This change adds the ability to suppress a routing decision based upon the
interface group the selected interface belongs to. This allows it to
exclude specific devices from a routing decision.

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

icmpv6_filter: allow ICMPv6 messages with bodies < 4 bytes

By using sizeof(_hdr), net/ipv6/raw.c:icmpv6_filter implicitly assumes
that any valid ICMPv6 message is at least eight bytes long, i.e., that
the message body is at least four bytes.

The DIS message of RPL (RFC 6550 section 6.2, from the 6LoWPAN world),
has a minimum length of only six bytes, and is thus blocked by
icmpv6_filter.

RFC 4443 seems to allow even a zero-sized body, making the minimum
allowable message size four bytes.

Signed-off-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

icmpv6_filter: fix "_hdr" incorrectly being a pointer

"_hdr" should hold the ICMPv6 header while "hdr" is the pointer to it.
This worked by accident.

Signed-off-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: simplify VLAN frame check in packet_snd

For ethernet frames, eth_type_trans() already parses the header, so one
can skip this when checking the frame size.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: fix for sending VLAN frames via packet_mmap

Since tpacket_fill_skb() parses the protocol field in ethernet frames'
headers, it's easy to see if any passed frame is a VLAN one and account
for the extended size.

But as the real protocol does not turn up before tpacket_fill_skb()
runs which in turn also checks the frame length, move the max frame
length calculation into the function.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: when sending ethernet frames, parse header for skb->protocol

This may be necessary when the SKB is passed to other layers on the go,
which check the protocol field on their own. An example is a VLAN packet
sent out using AF_PACKET on a bridge interface. The bridging code checks
the SKB size, accounting for any VLAN header only if the protocol field
is set accordingly.

Note that eth_type_trans() sets skb->dev to the passed argument, so this
can be skipped in packet_snd() for ethernet frames, as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Don't lookup dst if transport dst is still valid

When sctp sits on IPv6, sctp_transport_dst_check pass cookie as ZERO,
as a result ip6_dst_check always fail out. This behaviour makes
transport->dst useless, because every sctp_packet_transmit must look
for valid dst.

Add a dst_cookie into sctp_transport, and set the cookie whenever we
get new dst for sctp_transport. So dst validness could be checked
against it.

Since I have split genid for IPv4 and IPv6, also delete/add IPv6 address
will also bump IPv6 genid. So issues we discussed in:
http://marc.info/?l=linux-netdev&m=137404469219410&w=4
have all been sloved for this patch.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'eth_alen'

Joe Perches says:

====================
Convert the uses mac addresses to ETH_ALEN so
it's easier to find and verify where mac addresses
need to be __aligned(2)

Change in V2:
- Remove include/acpi/actbl2.h conversion
It's a file copied from outside ACPI sources

Changes in V3:
- Don't move the pasemi_mac.h mac address to be aligned(2)
Just note that it's unaligned.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: Convert mac address uses of 6 to ETH_ALEN

Use the normal #define to help grep find mac addresses
and ensure that addresses are aligned.

pasemi.h has an unaligned access to mac_addr, unchanged
for now.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Olof Johansson <olof@lixom.net> # pasemi_mac pieces
Signed-off-by: David S. Miller <davem@davemloft.net>

include: Convert ethernet mac address declarations to use ETH_ALEN

It's convenient to have ethernet mac addresses use
ETH_ALEN to be able to grep for them a bit easier and
also to ensure that the addresses are __aligned(2).

Add #include <linux/if_ether.h> as necessary.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

uapi: Convert some uses of 6 to ETH_ALEN

Use the #define where appropriate.

Add #include <linux/if_ether.h>
where appropriate too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qlcnic'

Himanshu Madhani says:

====================
This series contains following patches

o in v2 series, we received feedback on return codes to use standard error
  codes instead of mixing custom error codes. We have modified patch for
  loopback diagnostic test to return standard error codes.

o rest of the 3 patches in the series are for mailbox refactoring

  Current driver-firmware mailbox interface was operating in polling mode
  because of some limitations with the earlier versions of 83xx adapter
  firmware. These issues are resolved now and we are implementing the
  mailbox interface in interrupt mode.

  There are three patches which refactors mailbox handling:
  * Interrupt mode mailbox implantation.
  * Replace poll mode mailbox interfaces with interrupt mode interfaces.
  * Operate mailbox in poll mode when interrupts are not available.

changes from v2 -> v3
* Addressed review feedback to use standard return codes for loopback
   diagnostic test.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Update version to 5.2.45

Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Enable mailbox interface in poll mode when interrupts are not available

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Replace poll mode mailbox interface with interrupt based mailbox interface

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Interrupt based driver firmware mailbox mechanism

o Driver firmware mailbox interface was operating in polling mode
  because of limitations with the earlier versions of 83xx adapter firmware.
  These issues are resolved and we are implementing interrupt based mailbox
  mechanism.

o Data structures and API's for interrupt mode mailbox mechanism.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Enhance diagnostic loopback error codes.

o Enhanced the driver to use standard Linux error codes
o Return a unique error code to indicate loopback is in progress

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bond_rcu'

Nikolay Aleksandrov says:

====================
This patchset aims to lay the groundwork, and do the initial conversion to
RCUism. I decided that it'll be much better to make the bonding RCU
conversion gradual, so patches can be reviewed and tested better rather
than having one huge patch (which I did in the beginning, before this).
The first patch is straightforward and it converts the bonding to the
standard list API, simplifying a lot of code, removing unnecessary local
variables and allowing to use the nice rculist API later. It also takes
care of some minor styling issues (re-arranging local variables longest ->
shortest, removing brackets for single statement if/else, leaving new line
before return statement etc.).
The second patch simplifies the conversion by removing unnecessary
read_lock(&bond->curr_slave_lock) in xmit paths that are to be converted
later, because we only care if the pointer is NULL or a slave there, since
we already have bond->lock the slave can't go away.
The third patch simplifies the broadcast xmit function by removing
the use of curr_active_slave and converting to standard list API. Also this
design of the broadcast xmit function avoids a subtle double packet tx race
when converted to RCU.
The fourth patch factors out the code that transmits skb through a slave
with given id (i.e. rr_tx_counter in rr mode, hashed value in xor mode) and
simplifies the active-backup xmit path because bond_dev_queue_xmit always
consumes the skb. The new bond_xmit_slave_id function is used in rr and xor
modes currently, but the plans are to use it in 3ad mode as well thus it's
made global. I've left the function prototype to be 81 chars so I wouldn't
break it, if this is an issue I can always break it in more lines.
The fifth patch introduces RCU by converting attach/detach and release to
RCU. It also converts dereferencing of curr_active_slave to rcu_dereference
although it's not fully converted to RCU, that is needed for the converted
xmit paths. And it converts roundrobin, broadcast, xor and active-backup
xmit paths to RCU. The 3ad and ALB/TLB modes acquire read_lock(&bond->lock)
to make sure that no slave will be removed and to sync properly with
enslave and release as before.
This way for the price of a little complexity, we'll be able to convert
individual parts of the bonding to RCU, and test them easier in the
process. If this patchset is accepted in some form, I'll post followups
in the next weeks that gradually convert the bonding to RCU and remove the
need for the rwlocks.
For performance notes please refer to patch 5 (RCU conversion one).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: initial RCU conversion

This patch does the initial bonding conversion to RCU. After it the
following modes are protected by RCU alone: roundrobin, active-backup,
broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for
reading, and will be dealt with later. curr_active_slave needs to be
dereferenced via rcu in the converted modes because the only thing
protecting the slave after this patch is rcu_read_lock, so we need the
proper barrier for weakly ordered archs and to make sure we don't have
stale pointer. It's not tagged with __rcu yet because there's still work
to be done to remove the curr_slave_lock, so sparse will complain when
rcu_assign_pointer and rcu_dereference are used, but the alternative to use
rcu_dereference_protected would've created much bigger code churn which is
more difficult to test and review. That will be converted in time.

1. Active-backup mode
1.1 Perf recording while doing iperf -P 4
  - old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU
                 in bonding
  - new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU
                 in bonding
1.2. Bandwidth measurements
  - old bonding: 16.1 gbps consistently
  - new bonding: 17.5 gbps consistently

2. Round-robin mode
2.1 Perf recording while doing iperf -P 4
  - old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU
                 in bonding
  - new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU
                 in bonding
2.2 Bandwidth measurements
  - old bonding: 8 gbps (variable due to packet reorderings)
  - new bonding: 10 gbps (variable due to packet reorderings)

Of course the latency has improved in all converted modes, and moreover
while
doing enslave/release (since it doesn't affect tx anymore).

Also I've stress tested all modes doing enslave/release in a loop while
transmitting traffic.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: factor out slave id tx code and simplify xmit paths

I factored out the tx xmit code which relies on slave id in
bond_xmit_slave_id. It is global because later it can be used also in
3ad mode xmit. Unnecessary obvious comments are removed. Active-backup
mode is simplified because bond_dev_queue_xmit always consumes the skb.
bond_xmit_xor becomes one line because of bond_xmit_slave_id.
bond_for_each_slave_from is not used in bond_xmit_slave_id because later
when RCU is used we can avoid important race condition by using standard
rculist routines.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: simplify broadcast_xmit function

We don't need to start from the curr_active_slave as the frame will be
sent to all eligible slaves anyway, so we remove the unnecessary local
variables, checks and comments, and make it use the standard list API.
This has the nice side-effect that later when it's converted to RCU
a race condition will be avoided which could lead to double packet tx.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove unnecessary read_locks of curr_slave_lock

In all the cases we already hold bond->lock for reading, so the slave
can't get away and the check != NULL is sufficient. curr_active_slave
can still change after the read_lock is unlocked prior to use of the
dereferenced value, so there's no need for it. It either contains a
valid slave which we use (and can't get away), or it is NULL which is
checked.
In some places the read_lock of curr_slave_lock was left because we need
it not to change while performing some action (e.g. syncing current
active slave's addresses, sending ARP requests through the active slave)
such cases will be dealt with individually while converting to RCU.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: convert to list API and replace bond's custom list

This patch aims to remove struct bonding's first_slave and struct
slave's next and prev pointers, and replace them with the standard Linux
list API. The old macros are converted to list API as well and some new
primitives are available now. The checks if there're slaves that used
slave_cnt have been replaced by the list_empty macro.
Also a few small style fixes, changing longest -> shortest line in local
variable declarations, leaving an empty line before return and removing
unnecessary brackets.
This is the first step to gradual RCU conversion.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: bump genid when delete/add address

Server           Client
2001:1::803/64  <-> 2001:1::805/64
2001:2::804/64  <-> 2001:2::806/64

Server side fib binary tree looks like this:

                                   (2001:/64)
                                   /
                                  /
                   ffff88002103c380
                 /                 \
     (2)        /                   \
(2001::803/128)                     ffff880037ac07c0
                                    /               \
                                   /                 \  (3)
                      ffff880037ac0640               (2001::806/128)
                       /             \
             (1)      /               \
        (2001::804/128)               (2001::805/128)

Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3)
destinate to 2001::806 with source address as 2001::804/64. That's because
2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is
where the substantial difference between same prefix configuration and
different prefix configuration :) So packet are still transmitted out to
2001::806 with source address as 2001::804/64.

So bump genid will clear rt in (3), and up layer protocol will eventually
find the right one for themselves.

This problem arised from the discussion in here:
http://marc.info/?l=linux-netdev&m=137404469219410&w=4

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull-request for net-next/master. It consists of two patches
by Fabio Estevam. Them first convert the flexcan driver to use
devm_ioremap_resource(), the second adds return value checking for
clk_prepare_enable().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Revising locking scheme for MAC configuration

On very rare occasions, repeated load/unload stress test in the presence of
our storage driver (bnx2i/bnx2fc) causes a kernel panic in bnx2x code
(NULL pointer dereference). Stack traces indicate the issue happens during MAC
configuration; thorough code review showed that indeed several races exist
in which one thread can iterate over the list of configured MACs while another
deletes entries from the same list.

This patch adds a varient on the single-writer/Multiple-reader lock mechanism -
It utilizes an already exsiting bottom-half lock, using it so that Whenever
a writer is unable to continue due to the existence of another writer/reader,
it pends its request for future deliverance.
The writer / last readers will check for the existence of such requests and
perform them instead of the original initiator.
This prevents the writer from having to sleep while waiting for the lock
to be accessible, which might cause deadlocks given the locks already
held by the writer.

Another result of this patch is that setting of Rx Mode is now made in
sleepable context - Setting of Rx Mode is made under a bottom-half lock, which
was always nontrivial for the bnx2x driver, as the HW/FW configuration requires
wait for completions.
Since sleep was impossible (due to the sleepless-context), various mechanisms
were utilized to prevent the calling thread from sleep, but the truth was that
when the caller thread (i.e, the one calling ndo_set_rx_mode()) returned, the
Rx mode was still not set in HW/FW.

bnx2x_set_rx_mode() will now overtly schedule for the Rx changes to be
configured by the sp_rtnl_task which hold the RTNL lock and is sleepable
context.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix system hang due to fast igmp timer rescheduling

After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event")
we try to acquire rtnl in bond_resend_igmp_join_requests but it can be
scheduled with rtnl already held (e.g. when bond_change_active_slave is
called with rtnl) causing a loop of immediate reschedules + calls because
rtnl_trylock fails each time since it's being already held.
For me this issue leads to system hangs very easy:
modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod
bonding;

The fix is to introduce a small (1 jiffy) delay which is enough for the
sections holding rtnl to finish without putting any strain on the system.
Also adjust the timer in bond_change_active_slave to be 1 jiffy, since
most of the time it's called with rtnl already held.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: support PTP using the tilegx mPIPE (IEEE 1588)

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: remove deprecated NETIF_F_LLTX flag from tile drivers

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: make "tile_net.custom" a proper bool module parameter

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: support TSO for IPv6 in tilegx network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: support multiple mPIPE shims in tilegx network driver

The initial driver support was for a single mPIPE shim on the chip
(as is the case for the Gx36 hardware). The Gx72 chip has two mPIPE
shims, so we extend the driver to handle that case.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: enable GRO in the tilegx network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: fix panic bug in napi support for tilegx network driver

The code used to call napi_disable() in an interrupt handler
(from smp_call_function), which in turn could call msleep().
Unfortunately you can't sleep in an interrupt context.

Luckily it turns out all the NAPI support functions are
just operating on data structures and not on any deeply
per-cpu data, so we can arrange to set up and tear down all
the NAPI state on the core driving the process, and just
do the IRQ enable/disable as a smp_call_function thing.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: update dev->stats directly in tilegx network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: support jumbo frames in the tilegx network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: remove dead is_dup_ack() function from tilepro net driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: avoid bug in tilepro net driver built with old hypervisor

Building against headers from an older Tilera hypervisor can cause
the frags[] array to be overrun. Don't enable TSO in that case.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: support rx_dropped/rx_errors in tilepro net driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: set hw_features and vlan_features in setup

This change allows the user to configure various features of the tile
networking drivers on and off. There is no change to the default
initialization state of either the tilegx or tilepro drivers.

Neither driver needs the ndo_fix_features or ndo_set_features callbacks,
since the generic code already handles the dependencies for
fix_features, and there is no hardware state to tweak in set_features.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Remove unused field grp_id from gfar_priv_grp

grp->grp_id is obsolete. It has no use in the current driver.
Remove it from gfar_priv_grp and put the 'rstat' member
in its place, in the 2nd cache line, as rstat needs fast access.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a temporary sanity check in skb_orphan()

David suggested to add a BUG_ON() to catch if some layer
sets skb->sk pointer without a corresponding destructor.

As skb can sit in a queue, it's mandatory to make sure the
socket cannot disappear, and it's usually done by taking a
reference on the socket, then releasing it from the skb
destructor.

This patch is a follow-up to commit c34a761231b5
("net: skb_orphan() changes") and will be reverted after
catching all possible offenders if any.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: flexcan: Check the return value from clk_prepare_enable()

clk_prepare_enable() may fail, so let's check its return value and propagate it
in the case of error.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: flexcan: Use devm_ioremap_resource()

Using devm_ioremap_resource() can make the code simpler and smaller.

Also, place alloc_candev() after of_match_device() to make error handling
easier.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

ipv6: fib6_rules should return exact return value

With the addition of the suppress operation
(7764a45a8f1fe74d4f7d301eaca2e558e7e2831a ("fib_rules: add .suppress
operation") we rely on accurate error reporting of the fib_rules.actions.

fib6_rule_action always returned -EAGAIN in case we could not find a
matching route and 0 if a rule was matched. This also included a match
for blackhole or prohibited rule actions which could get suppressed by
the new logic.

So adapt fib6_rule_action to always return the correct error code as
its counterpart fib4_rule_action does. This also fixes a possiblity of
nullptr-deref where we don't find a table, thus rt == NULL. Because
the condition rt != ip6_null_entry still holdes it seems we could later
get a nullptr bug on dereference rt->dst.

v2:
a) Fixed a brain fart in the commit msg (the rule => a table, etc). No
changes to the patch.

Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_cgroup.h netprio_cgroup.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

checksum: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cfg80211.h/mac80211.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ax25.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arp/neighbour.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_rxrpc.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

addrconf.h: Remove extern function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: add networking/netdev-FAQ.txt

A collection of expectations and operational details about how
networking development takes place in the context of the netdev
mailing list.

The content is meant to capture specific items that are unique
to netdev workflow, and not re-document generic linux expectations
that are already captured elsewhere.

This was originally proposed[1] as a regular posting mailing list
FAQ, but it probably is more universally accessible here in tree.

[1] https://lwn.net/Articles/559211/

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_rules: add .suppress operation

This change adds a new operation to the fib_rules_ops struct; it allows the
suppression of routing decisions if certain criteria are not met by its
results.

The first implemented constraint is a minimum prefix length added to the
structures of routing rules. If a rule is added with a minimum prefix length
>0, only routes meeting this threshold will be considered. Any other (more
general) routing table entries will be ignored.

When configuring a system with multiple network uplinks and default routes, it
is often convinient to reference the main routing table multiple times - but
omitting the default route. Using this patch and a modified "ip" utility, this
can be achieved by using the following command sequence:

  $ ip route add table secuplink default via 10.42.23.1

  $ ip rule add pref 100            table main prefixlength 1
  $ ip rule add pref 150 fwmark 0xA table secuplink

With this setup, packets marked 0xA will be processed by the additional routing
table "secuplink", but only if no suitable route in the main routing table can
be found. By using a minimal prefixlength of 1, the default route (/0) of the
table "main" is hidden to packets processed by rule 100; packets traveling to
destinations with more specific routing entries are processed as usual.

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove extern from include/net/ scheduling prototypes

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skb_orphan() changes

It is illegal to set skb->sk without corresponding destructor.

Its therefore safe for skb_orphan() to not clear skb->sk if
skb->destructor is not set.

Also avoid clearing skb->destructor if already NULL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netem: Introduce skb_orphan_partial() helper

Commit 547669d483e578 ("tcp: xps: fix reordering issues") added
unexpected reorders in case netem is used in a MQ setup for high
performance test bed.

ETH=eth0
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 32`
do
tc qd add dev $ETH parent 1:$i netem delay 100ms
done

As all tcp packets are orphaned by netem, TCP stack believes it can
set skb->ooo_okay on all packets.

In order to allow producers to send more packets, we want to
keep sk_wmem_alloc from reaching sk_sndbuf limit.

We can do that by accounting one byte per skb in netem queues,
so that TCP stack is not fooled too much.

Tested:

With above MQ/netem setup, scaling number of concurrent flows gives
linear results and no reorders/retransmits

lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100
do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done
n:1 198.46
n:10 2002.69
n:20 4000.98
n:30 6006.35
n:40 8020.93
n:50 10032.3
n:60 12081.9
n:70 13971.3
n:80 16009.7
n:90 17117.3
n:100 17425.5

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: split rt_genid for ipv4 and ipv6

Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:

- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
entries even when the policy is only applied for one address family.

Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: r8a7790: Handle the RFE (Receive FIFO overflow Error) interrupt

The RFE interrupt is enabled for the r8a7790 but isn't handled,
resulting in the interrupts core noticing unhandled interrupts, and
eventually disabling the ethernet IRQ.

Fix it by adding RFE to the bitmask of error interrupts to be handled
for r8a7790.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ixgbe and pci.

The first patch for ixgbe from Greg Rose is the second submission.  The
first submission of "ixgbe: Retain VLAN filtering in promiscuous + VT
mode" had a typo, which Joe Perches pointed out and is fixed in this
submission.

Alex updates the ixgbe driver to use the generic helper pci_vfs_assigned
instead of the driver specific function ixgbe_vfs_are_assigned.

Don Skidmore provides 4 patches for ixgbe, the first being a fix for
flow control ethtool reporting.  Originally ixgbe_device_supports_autoneg_fc()
was expected to be called by only copper devices, which lead to false
information being displayed via ethtool.  Two other patches add support
for fixed fiber for SFP+ devices and the addition of a quad-port x520
adapter.  The last patch simply bumps the driver version.

Emil Tantilov provides 3 fixes for ixgbe, two of which resolve
semaphore lock issues.  The third fix resolves several issues in the
previous implementation of the SFF data dumps of SFP+ modules.

The remaining ixgbe and pci patches are from Jacob Keller.  The pci
patches exposes bus speed, link speed and bus width so that drivers
can take advantage of this information.  In addition, adds a pci function
which obtains minimum link width and speed.  Jacob also provides the
ixgbe patch to incorporate the pci function. He provides a patch that
fixes a lockdep issue created due to ixgbe_ptp_stop always running
cancel_work_sync even if the work item had not been created properly with
INIT_WORK. This issue was found and reported by Stephen Hemminger.

-v2-
* fix patch 3 to be a bool function based on David Miller's feedback
* fix patch 4 debug message based on David Miller's feedback
* fix patch 8 moved the extern declarations to pci.h based on Bjorn
  Helgaas's feedback
* fix patch 11 update the error message to include encoding loss based
* fix patch 8/9/10 title based on Bjorn's feedback
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Remove unused tcpct declarations and comments

Remove declaration, 4 defines and confusing comment that are no longer used
since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Acked-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: add support for quad-port x520 adapter

This is a x520 based quad-port (4x10Gbps) NIC with a single QSFP+
connector. Changes were required to our identify functions due to
different eeprom address which is also included here.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: clear semaphore bits on timeouts

This patch changes the error code path in ixgbe_acquire_swfw_sync() to deal
with cases where acquiring SW semaphore times out.

In cases where the SW/FW semaphore bits were set (i.e. due to a crash) the
driver will hang on load. With this patch the driver will clear
the stuck bits if the semaphore was not acquired in the allotted time.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: rename LL_EXTENDED_STATS to use queue instead of q

This patch renames the stats introduced by the busy poll feature so that they
are more inline with the current statistics naming schemes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix lockdep annotation issue for ptp's work item

This patch fixes a lockdep issue created due to ixgbe_ptp_stop always running
cancel_work_sync even if the work item had not been created properly with
INIT_WORK. This is caused because ixgbe_ptp_stop did not check to actually
ensure PTP was running first. The new implementation introduces a state in the
&adapter->state field which is used to indicate that PTP is running. (This
replaces the IXGBE_FLAG2_PTP_ENABLED field). This state will use the atomic
set_bit, test_bit, and test_and_clear_bit functions. ixgbe_ptp_stop will check
to ensure that PTP was enabled, (and if not, it will not attempt to do any
cleanup work from ixgbe_ptp_init). This resolves the lockdep annotation warning
found by Stephen Hemminger

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: call pcie_get_mimimum_link to check if device has enough bandwidth

This patch uses the new pcie_get_minimum_link function to perform a check to
ensure that the adapter is hooked into a slot which is capable of providing the
necessary bandwidth. This check supersedes the original method which only
checked the current pci device. The new method is capable of determining the
minimum speed and link of an entire PCI chain.

-v2-
* update the error message to include encoding loss

CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

PCI: Add function to obtain minimum link width and speed

A PCI Express device can potentially report a link width and speed which it will
not properly fulfill due to being plugged into a slower link higher in the
chain. This function walks up the PCI bus chain and calculates the minimum link
width and speed of this entire chain. This can be useful to enable a device to
determine if it has enough bandwidth for optimum functionality.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: remove an unneeded check

"ifa->ifa_label" is an array inside the in_ifaddr struct. It can never
be NULL so we can remove this check.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: add support for IPPROTO_IPV6

Support IPPROTO_IPV6 similar to IPPROTO_IPIP

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: clean up IPIP case

Explicitly set proto to ETH_P_IP and jump directly to ip processing.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

PCI: move enum pcie_link_width into pci.h

pcie_link_width is the enum used to define the link width values for a pcie
device. This enum should not be contained solely in pci_hotplug.h, and this
patch moves it next to pci_bus_speed in pci.h

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

PCI: expose pcie_link_speed and pcix_bus_speed arrays

pcie_link_speed and pcix_bus_speed are arrays used by probe.c to correctly
convert lnksta register values into the pci_bus_speed enum. These static arrays
are useful outside probe for this purpose. This patch makes these defines into
conist arrays and exposes them with an extern header in drivers/pci/pci.h

-v2-
* move extern declarations to drivers/pci/pci.h

CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix SFF data dumps of SFP+ modules

This patch fixes several issues with the previous implementation of the
SFF data dump of SFP+ modules:

- removed the __IXGBE_READ_I2C flag - I2C access locking is handled in the
  HW specific routines

- fixed the read loop to read data from ee->offset to ee->len

- the reads fail if __IXGBE_IN_SFP_INIT is set in the process - this is
  needed because on some HW I2C operations can take long time and disrupt
  the SFP and link detection process

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix semaphore lock for I2C read/writes on 82598

ixgbe_read/write_i2c_phy_82598() does not hold the SWFW_SYNC
semaphore for the entire function. Instead the lock is held only
during the phy.ops.read/write_reg operations. As result when the
function is being called simultaneously the I2C read/writes can
be corrupted.

The following patch introduces the SWFW_SYNC semaphore for the
entire ixgbe_read/write_i2c_phy_82598() function. To accomplish
this I had to create 2 separate functions:

ixgbe_read_phy_reg_mdi()
ixgbe_write_phy_reg_mdi()

Those functions are identical to ixgbe_read/write_phy_reg_generic()
sans the locking, and can be used in ixgbe_read/write_i2c_phy_82598()
with the SWFW_SYNC semaphore being held.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: bump version number

Bump the version number to better match with a similar version of the
out of tree driver.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add new media type.

This patch adds support for a new media type fiber_fixed. This is useful
to avoid all the SFP+ hot plug support path on devices who's fix fiber need
not worry about such things. This patch is needed for a following patch
that adds support for "fiber_fixed" devices.

v2: cleaned up logging message based on feedback from David Miller

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'phys_port'

Jiri Pirko says:

====================
This patchset is based on patch by Narendra_K@Dell.com
Once device which can change phys port id during its lifetime adopts this,
NETDEV_CHANGEPHYSPORTID event will be added and driver will call
call_netdevice_notifiers(NETDEV_NETDEV_CHANGEPHYSPORTID, dev) to propagate
the change to userspace.

v1->v2: as suggested by Ben, handle -EOPNOTSUPP in rtnl code (wrapped up ndo call)
v2->v3: adjusted patch 1 commit message
v3->v4: used "%phN" for sysfs printf as suggested by DaveM
        added igb/igbvf implementation as requested by Or Gerlitz
v4->v5: used prandom_u32 to generate id in igb_probe
        removed duplicate code in ibgvf_probe
        pushed dev_err string into one line in igbvf_refresh_ppid
v5->v6: use uuid_le_gen for generating 16-byte phys port id for igb/igbvf
as suggested by BenH

1) Why do we need this, and why do existing facilities fail to provide
   a way to accomplish this?

Currenty there's very hard to tell if two netdevs are using the same physical
port. For sr-iov this can be get by sysfs. For other mechanisms, like NPAR
there's very hard to do it (one must learn it from NIC BIOS). But even for
sr-iov there's no way to say if two netdevs are using the same phys port when
these are passed through to virtual guests.

This patchset provides the generic way of letting this information know to
userspace. This info can be used by apps like NetworkManager, teamd, Wicked,
ovs daemon, etc, to do smarter bonding decisions.

2) Why is the physical port ID defined as a 32 byte opaque cookie?
   What formats and layouts need to be accomodated, and which
   influenced the design of the ID?

For user to distinguish if two netdevs are using the same port, he only needs
to compare their phys port ids. Nothing else is needed. This id has no
structure for security reasons. VF should not know anything about PF.

3) Are IDs globally unique?  Why or why not?  If IDs should be
   globally unique, but only in certain cases, what exactly are those
   cases.

Most of the time only uniqueness needed is in scope of single machine.
There might be case when the id should be unique between couple of machines
in virtualization environment. Given that for example for igb/igbvf 16B uuid
is used, there is no problem for this case as well. But each driver can
implement this differently focusing the hw capabilities and needs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: export physical port id via sysfs

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnl: export physical port id via RT netlink

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add ndo to get id of physical port of the device

This patch adds a ndo for getting physical port of the device. Driver
which is aware of being virtual function of some physical port should
implement this ndo. This is applicable not only for IOV, but for other
solutions (NPAR, multichannel) as well. Basically if there is possible
to have multiple netdevs on the single hw port.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix fc autoneg ethtool reporting.

Originally ixgbe_device_supports_autoneg_fc() was only expected to
be called by copper devices. This would lead to false information
to be displayed via ethtool.

v2: changed ixgbe_device_supports_autoneg_fc() to a bool function,
it returns bool. Based on feedback from David Miller

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Use pci_vfs_assigned instead of ixgbe_vfs_are_assigned

This change makes it so that the ixgbe driver uses the generic helper
pci_vfs_assigned instead of the ixgbe specific function
ixgbe_vfs_are_assigned.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Retain VLAN filtering in promiscuous + VT mode

When using the new bridge FDB interface to allow SR-IOV virtual function
network devices to communicate with SW bridged network devices the
physical function is placed into promiscuous mode and hardware VLAN
filtering is disabled. This defeats the ability to use VLAN tagging
to isolate user networks. When the device is in promiscuous mode and
VT mode simultaneously ensure that VLAN hardware filtering remains
enabled.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: mvneta: support big endian

Use the "swap descriptor" feature of the hardware to properly swap the
descriptors when running in big endian mode. Since the swapping occurs
on 64 bits words, we also need to provide a separate structure layout
for the DMA descriptors between little endian and big endian mode,
like is done in the mv643xx_eth driver.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: move the RX and TX desc macros outside of the structs

The macros used for the various fields of the RX and TX descriptions
are currently declared next to those fields within the structure
definitions of the RX and TX descriptors.

However, in order to support big endian, we'll have to use the "swap
descriptors" features of the hardware, which swaps every byte within
each 64 bits word of the descriptors. This requires a separate
definition of the RX and TX descriptor structures for little and big
endian, as is done in the mv643xx_eth. Those macros can therefore no
longer be defined inside those structures.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: Require CONFIG_INET due to use of IPv4 checksum function

Unlike for IPv6, the IPv4 checksum functions are only available
if CONFIG_INET is set.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add tcp_syncookies mode to allow unconditionally generation of syncookies

| If you want to test which effects syncookies have to your
| network connections you can set this knob to 2 to enable
| unconditionally generation of syncookies.

Original idea and first implementation by Eric Dumazet.

Cc: Florian Westphal <fw@strlen.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: Add support for set MAC address

Adding support for setting MAC address to cpsw device via ndo_set_mac_address

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: handle 64-bit statistics in tilepro network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9p: client: remove unused code and any reference to "cancelled" function

This patch reverts commit

80b45261a0b263536b043c5ccfc4ba4fc27c2acc

which was implementing a 'cancelled' functionality to notify that
a cancelled request will not be replied.

This implementation was not used anywhere and therefore removed.

Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: don't use dev_err when AER enabling fails

The driver uses dev_err when enabling of AER fails (e.g. PCIe AER is not
supported). The dev_info is more appropriate to avoid console pollution.

Cc: sathya.perla@emulex.com
Cc: subbu.seetharaman@emulex.com
Cc: ajit.khaparde@emulex.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>