summary refs log tree commit diff
AgeCommit message (Collapse)Author
2017-12-15net: aquantia: Fill in multicast counter in ndev stats from hardwareIgor Russkikh
This metric comes from HW and is also diff-calculated, like other counters Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Fill ndev stat couters from hardwareIgor Russkikh
Originally they were filled from ring sw counters. These sometimes incorrectly calculate byte and packet amounts when using LRO/LSO and jumboframes. Filling ndev counters from hardware makes them precise. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Extend stat counters to 64bit valuesIgor Russkikh
Device hardware provides only 32bit counters. Using these directly causes byte counters to overflow soon. A separate nic level structure with 64 bit counters is now used to collect incrementally all the stats and report these counters to ethtool stats and ndev stats. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Fix hardware DMA stream overload on large MRRSIgor Russkikh
Systems with large MRRS on device (2K, 4K) with high data rates and/or large MTU, atlantic observes DMA packet buffer overflow. On some systems that causes PCIe transaction errors, hardware NMIs or datapath freeze. This patch 1) Limits MRRS from device side to 2K (thats maximum our hardware supports) 2) Limit maximum size of outstanding TX DMA data read requests. This makes hardware buffers running fine. Signed-off-by: Pavel Belous <pavel.belous@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Fix actual speed capabilities reportingIgor Russkikh
Different hardware device Ids correspond to different maximum speed available. Extra checks were added for devices D108 and D109 to remove unsupported speeds from these device capabilities list. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sock: free skb in skb_complete_tx_timestamp on errorWillem de Bruijn
skb_complete_tx_timestamp must ingest the skb it is passed. Call kfree_skb if the skb cannot be enqueued. Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl") Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()") Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge branch 's390-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2017-12-13 some more patches for 4.15, that fix multiple issues with IP Takeover configuration in qeth. Please queue them up for stable kernels as well (4.9 and newer). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15s390/qeth: update takeover IPs after configuration changeJulian Wiedmann
Any modification to the takeover IP-ranges requires that we re-evaluate which IP addresses are takeover-eligible. Otherwise we might do takeover for some addresses when we no longer should, or vice-versa. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15s390/qeth: lock IP table while applying takeover changesJulian Wiedmann
Modifying the flags of an IP addr object needs to be protected against eg. concurrent removal of the same object from the IP table. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15s390/qeth: don't apply takeover changes to RXIPJulian Wiedmann
When takeover is switched off, current code clears the 'TAKEOVER' flag on all IPs. But the flag is also used for RXIP addresses, and those should not be affected by the takeover mode. Fix the behaviour by consistenly applying takover logic to NORMAL addresses only. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15s390/qeth: apply takeover changes when mode is toggledJulian Wiedmann
Just as for an explicit enable/disable, toggling the takeover mode also requires that the IP addresses get updated. Otherwise all IPs that were added to the table before the mode-toggle, get registered with the old settings. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge tag 'batadv-net-for-davem-20171215' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Initialize the fragment headers, by Sven Eckelmann - Fix a NULL check in BATMAN V, by Sven Eckelmann - Fix kernel doc for the time_setup() change, by Sven Eckelmann - Use the right lock in BATMAN IV OGM Update, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15mlxsw: spectrum: Disable MAC learning for ovs portYuval Mintz
Learning is currently enabled for ports which are OVS slaves - even though OVS doesn't need this indication. Since we're not associating a fid with the port, HW would continuously notify driver of learned [& aged] MACs which would be logged as errors. Fixes: 2b94e58df58c ("mlxsw: spectrum: Allow ports to work under OVS master") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2017-12-13 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Addition of explicit scheduling points to map alloc/free in order to avoid having to hold the CPU for too long, from Eric. 2) Fixing of a corruption in overlapping perf_event_output calls from different BPF prog types on the same CPU out of different contexts, from Daniel. 3) Fallout fixes for recent correction of broken uapi for BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header that needed to be pulled in from asm-generic and for BPF selftests the asm-generic include did not work, so similar asm include scheme was adapted for that problematic header that perf is having with other header files under tools, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13Merge branch 'mlx4-misc-fixes'David S. Miller
Tariq Toukan says: ==================== mlx4 misc fixes This patchset contains misc bug fixes from the team to the mlx4 Core and Eth drivers. Patch 1 by Eugenia fixes an MTU issue in selftest. Patch 2 by Eran fixes an accounting issue in the resource tracker. Patch 3 by Eran fixes a race condition that causes counter inconsistency. Series generated against net commit: 200809716aed fou: fix some member types in guehdr v2: Patch 2: Add reviewer credit, rephrase commit message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net/mlx4_en: Fill all counters under one call of stats lockEran Ben Elisha
Before this patch, the stats_lock was acquired twice. In between the locks Driver sent command to gather some more statistics (per priority and counter statistics). If the stats lock was acquired by get statistics NDO in between we would have report out of sync counters. Fix this by collecting all stats from Firmware in advance and then fill the Software structs under one lock. Fixes: 0b131561a7d6 ("net/mlx4_en: Add Flow control statistics display via ethtool") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net/mlx4_core: Fix wrong calculation of free countersEran Ben Elisha
The field res_free indicates the total number of counters which are available for allocation (reserved and unreserved). Fixed a bug where the reserved counters were subtracted from res_free before any allocation was performed. Before this fix, free counters which were not reserved could not be allocated. Fixes: 9de92c60beaa ("net/mlx4_core: Adjust counter grant policy in the resource tracker") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net/mlx4_en: Fix selftest for small MTUsEugenia Emantayev
Set the minimal MTU threshold for running loopback selftest. MTU should be big enough to include packet payload, NET_IP_ALIGN, Ethernet headers and preamble length. Fixes: e7c1c2c46201 ("mlx4_en: Added self diagnostics test implementation") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: phy: marvell: avoid configuring fiber page for SGMII-to-CopperRussell King
When in SGMII-to-Copper mode, the fiber page is used for the MAC facing link, and does not require configuration of the fiber auto-negotiation settings. Avoid trying. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13dwc-xlgmac: Add co-maintainerJie Deng
Jose Abreu will join to maintain dwc-xlgmac. He will help with new feature development for this driver. Thanks Jose and welcome on board! Signed-off-by: Jie Deng <jiedeng@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13tcp: refresh tcp_mstamp from timers callbacksEric Dumazet
Only the retransmit timer currently refreshes tcp_mstamp We should do the same for delayed acks and keepalives. Even if RFC 7323 does not request it, this is consistent to what linux did in the past, when TS values were based on jiffies. Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Mike Maloney <maloney@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Mike Maloney <maloney@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13tcp: fix potential underestimation on rcv_rttWei Wang
When ms timestamp is used, current logic uses 1us in tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us. This could cause rcv_rtt underestimation. Fix it by always using a min value of 1ms if ms timestamp is used. Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13skge: remove redundunt free_irq under spinlockStephen Hemminger
The code to handle multi-port SKGE boards was freeing IRQ twice. The first one was under lock and might sleep. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: phy: meson-gxl: make function meson_gxl_read_status staticColin Ian King
The function meson_gxl_read_status is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'meson_gxl_read_status' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13of_mdio / mdiobus: ensure mdio devices have fwnode correctly populatedRussell King
Ensure that all mdio devices populate the struct device fwnode pointer as well as the of_node pointer to allow drivers that wish to use fwnode APIs to work. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: phy: fix resume handlingRussell King
When a PHY has the BMCR_PDOWN bit set, it may decide to ignore writes to other registers, or reset the registers to power-on defaults. Micrel PHYs do this for their interrupt registers. The current structure of phylib tries to enable interrupts before resuming (and releasing) the BMCR_PDOWN bit. This fails, causing Micrel PHYs to stop working after a suspend/resume sequence if they are using interrupts. Fix this by ensuring that the PHY driver resume methods do not take the phydev->lock mutex themselves, but the callers of phy_resume() take that lock. This then allows us to move the call to phy_resume() before we enable interrupts in phy_start(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13ARM: dts: vf610-zii-dev: use XAUI for DSA link portsRussell King
Use XAUI rather than XGMII for DSA link ports, as this is the interface mode that the switches actually use. XAUI is the 4 lane bus with clock per direction, whereas XGMII is a 32 bit bus with clock. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: dsa: allow XAUI phy interface modeRussell King
XGMII is a 32-bit bus plus two clock signals per direction. XAUI is four serial lanes per direction. The 88e6190 supports XAUI but not XGMII as it doesn't have enough pins. The same is true of 88e6176. Match on PHY_INTERFACE_MODE_XAUI for the XAUI port type, but keep accepting XGMII for backwards compatibility. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13hippi: Fix a Fix a possible sleep-in-atomic bug in rr_closeJia-Ju Bai
The driver may sleep under a spinlock. The function call path is: rr_close (acquire the spinlock) free_irq --> may sleep To fix it, free_irq is moved to the place without holding the spinlock. This bug is found by my static analysis tool(DSAC) and checked by my code review. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The follow patchset contains Netfilter fixes for your net tree, they are: 1) Fix compilation warning in x_tables with clang due to useless redundant reassignment, from Colin Ian King. 2) Add bugtrap to net_exit to catch uninitialized lists, patch from Vasily Averin. 3) Fix out of bounds memory reads in H323 conntrack helper, this comes with an initial patch to remove replace the obscure CHECK_BOUND macro as a dependency. From Eric Sesterhenn. 4) Reduce retransmission timeout when window is 0 in TCP conntrack, from Florian Westphal. 6) ctnetlink clamp timeout to INT_MAX if timeout is too large, otherwise timeout wraps around and it results in killing the entry that is being added immediately. 7) Missing CAP_NET_ADMIN checks in cthelper and xt_osf, due to no netns support. From Kevin Cernekee. 8) Missing maximum number of instructions checks in xt_bpf, patch from Jann Horn. 9) With no CONFIG_PROC_FS ipt_CLUSTERIP compilation breaks, patch from Arnd Bergmann. 10) Missing netlink attribute policy in nftables exthdr, from Florian Westphal. 11) Enable conntrack with IPv6 MASQUERADE rules, as a357b3f80bc8 should have done in first place, from Konstantin Khlebnikov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: ethernet: arc: fix error handling in emac_rockchip_probeBranislav Radocaj
If clk_set_rate() fails, we should disable clk before return. Found by Linux Driver Verification project (linuxtesting.org). Changes since v2 [1]: * Merged with latest code changes Changes since v1: Update made thanks to David's review, much appreciated David. * Improved inconsistent failure handling of clock rate setting * For completeness of usecase, added arc_emac_probe error handling Signed-off-by: Branislav Radocaj <branislav@radocaj.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: qmi_wwan: add Sierra EM7565 1199:9091Sebastian Sjoholm
Sierra Wireless EM7565 is an Qualcomm MDM9x50 based M.2 modem. The USB id is added to qmi_wwan.c to allow QMI communication with the EM7565. Signed-off-by: Sebastian Sjoholm <ssjoholm@mac.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: igmp: Use correct source address on IGMPv3 reportsKevin Cernekee
Closing a multicast socket after the final IPv4 address is deleted from an interface can generate a membership report that uses the source IP from a different interface. The following test script, run from an isolated netns, reproduces the issue: #!/bin/bash ip link add dummy0 type dummy ip link add dummy1 type dummy ip link set dummy0 up ip link set dummy1 up ip addr add 10.1.1.1/24 dev dummy0 ip addr add 192.168.99.99/24 dev dummy1 tcpdump -U -i dummy0 & socat EXEC:"sleep 2" \ UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 & sleep 1 ip addr del 10.1.1.1/24 dev dummy0 sleep 5 kill %tcpdump RFC 3376 specifies that the report must be sent with a valid IP source address from the destination subnet, or from address 0.0.0.0. Add an extra check to make sure this is the case. Signed-off-by: Kevin Cernekee <cernekee@chromium.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13tipc: eliminate potential memory leakJon Maloy
In the function tipc_sk_mcast_rcv() we call refcount_dec(&skb->users) on received sk_buffers. Since the reference counter might hit zero at this point, we have a potential memory leak. We fix this by replacing refcount_dec() with kfree_skb(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13net: remove duplicate includesPravin Shedge
These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13ipv4: igmp: guard against silly MTU valuesEric Dumazet
IPv4 stack reacts to changes to small MTU, by disabling itself under RTNL. But there is a window where threads not using RTNL can see a wrong device mtu. This can lead to surprises, in igmp code where it is assumed the mtu is suitable. Fix this by reading device mtu once and checking IPv4 minimal MTU. This patch adds missing IPV4_MIN_MTU define, to not abuse ETH_MIN_MTU anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13ipv6: mcast: better catch silly mtu valuesEric Dumazet
syzkaller reported crashes in IPv6 stack [1] Xin Long found that lo MTU was set to silly values. IPv6 stack reacts to changes to small MTU, by disabling itself under RTNL. But there is a window where threads not using RTNL can see a wrong device mtu. This can lead to surprises, in mld code where it is assumed the mtu is suitable. Fix this by reading device mtu once and checking IPv6 minimal MTU. [1] skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20 head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100 RSP: 0018:ffff8801db307508 EFLAGS: 00010286 RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000 RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95 RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020 R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540 FS: 0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> skb_over_panic net/core/skbuff.c:109 [inline] skb_put+0x181/0x1c0 net/core/skbuff.c:1694 add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695 add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817 mld_send_cr net/ipv6/mcast.c:1903 [inline] mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448 call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320 expire_timers kernel/time/timer.c:1357 [inline] __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660 run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 __do_softirq+0x29d/0xbb2 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d3/0x210 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:540 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13Revert "ravb: add workaround for clock when resuming with WoL enabled"Geert Uytterhoeven
This reverts commit fbf3d034f2ff6264183cfa6845770e8cc2a986c8. As of commit 560869100b99a3da ("clk: renesas: cpg-mssr: Restore module clocks during resume"), the workaround is no longer needed. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-12bpf: add schedule points to map alloc/freeEric Dumazet
While using large percpu maps, htab_map_alloc() can hold cpu for hundreds of ms. This patch adds cond_resched() calls to percpu alloc/free call sites, all running in process context. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12Merge branch 'bpf-misc-fixes'Alexei Starovoitov
Daniel Borkmann says: ==================== Couple of outstanding fixes for BPF tree: 1) fixes a perf RB corruption, 2) and 3) fixes a few build issues from the recent bpf_perf_event.h uapi corrections. Thanks! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12bpf: fix broken BPF selftest buildDaniel Borkmann
At least on x86_64, the kernel's BPF selftests seemed to have stopped to build due to 618e165b2a8e ("selftests/bpf: sync kernel headers and introduce arch support in Makefile"): [...] In file included from test_verifier.c:29:0: ../../../include/uapi/linux/bpf_perf_event.h:11:32: fatal error: asm/bpf_perf_event.h: No such file or directory #include <asm/bpf_perf_event.h> ^ compilation terminated. [...] While pulling in tools/arch/*/include/uapi/asm/bpf_perf_event.h seems to work fine, there's no automated fall-back logic right now that would do the same out of tools/include/uapi/asm-generic/bpf_perf_event.h. The usual convention today is to add a include/[uapi/]asm/ equivalent that would pull in the correct arch header or generic one as fall-back, all ifdef'ed based on compiler target definition. It's similarly done also in other cases such as tools/include/asm/barrier.h, thus adapt the same here. Fixes: 618e165b2a8e ("selftests/bpf: sync kernel headers and introduce arch support in Makefile") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12bpf: fix build issues on um due to mising bpf_perf_event.hDaniel Borkmann
Since c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build on i386 or x86_64: [...] CC init/main.o In file included from ../include/linux/perf_event.h:18:0, from ../include/linux/trace_events.h:10, from ../include/trace/syscall.h:7, from ../include/linux/syscalls.h:82, from ../init/main.c:20: ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error: asm/bpf_perf_event.h: No such file or directory #include <asm/bpf_perf_event.h> [...] Lets add missing bpf_perf_event.h also to um arch. This seems to be the only one still missing. Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type") Reported-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Richard Weinberger <richard@sigma-star.at> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Richard Weinberger <richard@sigma-star.at> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12bpf: fix corruption on concurrent perf_event_output callsDaniel Borkmann
When tracing and networking programs are both attached in the system and both use event-output helpers that eventually call into perf_event_output(), then we could end up in a situation where the tracing attached program runs in user context while a cls_bpf program is triggered on that same CPU out of softirq context. Since both rely on the same per-cpu perf_sample_data, we could potentially corrupt it. This can only ever happen in a combination of the two types; all tracing programs use a bpf_prog_active counter to bail out in case a program is already running on that CPU out of a different context. XDP and cls_bpf programs by themselves don't have this issue as they run in the same context only. Therefore, split both perf_sample_data so they cannot be accessed from each other. Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data") Reported-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Song Liu <songliubraving@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12tcp md5sig: Use skb's saddr when replying to an incoming segmentChristoph Paasch
The MD5-key that belongs to a connection is identified by the peer's IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying to an incoming segment from tcp_check_req() that failed the seq-number checks. Thus, to find the correct key, we need to use the skb's saddr and not the daddr. This bug seems to have been there since quite a while, but probably got unnoticed because the consequences are not catastrophic. We will call tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer, thus the connection doesn't really fail. Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11fou: fix some member types in guehdrXin Long
guehdr struct is used to build or parse gue packets, which are always in big endian. It's better to define all guehdr members as __beXX types. Also, in validate_gue_flags it's not good to use a __be32 variable for both Standard flags(__be16) and Private flags (__be32), and pass it to other funcions. This patch could fix a bunch of sparse warnings from fou. Fixes: 5024c33ac354 ("gue: Add infrastructure for flags and options") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11sctp: make sure stream nums can match optlen in sctp_setsockopt_reset_streamsXin Long
Now in sctp_setsockopt_reset_streams, it only does the check optlen < sizeof(*params) for optlen. But it's not enough, as params->srs_number_streams should also match optlen. If the streams in params->srs_stream_list are less than stream nums in params->srs_number_streams, later when dereferencing the stream list, it could cause a slab-out-of-bounds crash, as reported by syzbot. This patch is to fix it by also checking the stream numbers in sctp_setsockopt_reset_streams to make sure at least it's not greater than the streams in the list. Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11net: ipv4: fix for a race condition in raw_sendmsgMohamed Ghannam
inet->hdrincl is racy, and could lead to uninitialized stack pointer usage, so its value should be read only once. Fixes: c008ba5bdc9f ("ipv4: Avoid reading user iov twice after raw_probe_proto_opt") Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11netlink: Add netns check on tapsKevin Cernekee
Currently, a nlmon link inside a child namespace can observe systemwide netlink activity. Filter the traffic so that nlmon can only sniff netlink messages from its own netns. Test case: vpnns -- bash -c "ip link add nlmon0 type nlmon; \ ip link set nlmon0 up; \ tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" & sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \ spi 0x1 mode transport \ auth sha1 0x6162633132330000000000000000000000000000 \ enc aes 0x00000000000000000000000000000000 grep --binary abc123 /tmp/nlmon.pcap Signed-off-by: Kevin Cernekee <cernekee@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11net: sh_eth: do not advertise Gigabit capabilities when not availableThomas Petazzoni
Not all variants of the sh_eth hardware have Gigabit support. Unfortunately, the current driver doesn't tell the PHY about the limited MAC capabilities. Due to this, if you have a Gigabit capable PHY, the PHY will advertise its Gigabit capability and establish a link at 1Gbit/s, even though the MAC doesn't support it. In order to avoid this, we use the recently introduced phy_set_max_speed() to tell the PHY to not advertise speed higher than 100 MBit/s. Tested on a SH7786 platform, with a Gigabit PHY. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11net: phy: meson-gxl: detect LPA corruptionJerome Brunet
The purpose of this change is to fix the incorrect detection of the link partner (LP) advertised capabilities which sometimes happens with this PHY (roughly 1 time in a dozen) This issue may cause the link to be negotiated at 10Mbps/Full or 10Mbps/Half when 100MBps/Full is actually possible. In some case, the link is even completely broken and no communication is possible. To detect the corruption, we must look for a magic undocumented bit in the WOL bank (hint given by the SoC vendor kernel) but this is not enough to cover all cases. We also have to look at the LPA ack. If the LP supports Aneg but did not ack our base code when aneg is completed, we assume something went wrong. The detection of a corrupted LPA triggers a restart of the aneg process. This solves the problem but may take up to 6 retries to complete. Fixes: 7334b3e47aee ("net: phy: Add Meson GXL Internal PHY driver") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>