summary refs log tree commit diff
AgeCommit message (Collapse)Author
2017-09-05rds: Fix non-atomic operation on shared flag variableHåkon Bugge
The bits in m_flags in struct rds_message are used for a plurality of reasons, and from different contexts. To avoid any missing updates to m_flags, use the atomic set_bit() instead of the non-atomic equivalent. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: sched: don't use GFP_KERNEL under spin lockJakub Kicinski
The new TC IDR code uses GFP_KERNEL under spin lock. Which leads to: [ 582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416 [ 582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc [ 582.636939] 2 locks held by tc/3379: [ 582.641049] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff910354ce>] rtnetlink_rcv_msg+0x92e/0x1400 [ 582.650958] #1: (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0 [ 582.662217] Preemption disabled at: [ 582.662222] [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0 [ 582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G W 4.13.0-rc7-debug-00648-g43503a79b9f0 #287 [ 582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016 [ 582.691937] Call Trace: ... [ 582.742460] kmem_cache_alloc+0x286/0x540 [ 582.747055] radix_tree_node_alloc.constprop.6+0x4a/0x450 [ 582.753209] idr_get_free_cmn+0x627/0xf80 ... [ 582.815525] idr_alloc_cmn+0x1a8/0x270 ... [ 582.833804] tcf_idr_create+0x31b/0x8e0 ... Try to preallocate the memory with idr_prealloc(GFP_KERNEL) (as suggested by Eric Dumazet), and change the allocation flags under spin lock. Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05vhost_net: correctly check tx avail during rx busy pollingJason Wang
We check tx avail through vhost_enable_notify() in the past which is wrong since it only checks whether or not guest has filled more available buffer since last avail idx synchronization which was just done by vhost_vq_avail_empty() before. What we really want is checking pending buffers in the avail ring. Fix this by calling vhost_vq_avail_empty() instead. This issue could be noticed by doing netperf TCP_RR benchmark as client from guest (but not host). With this fix, TCP_RR from guest to localhost restores from 1375.91 trans per sec to 55235.28 trans per sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz). Fixes: 030881372460 ("vhost_net: basic polling support") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: mdio-mux: add mdio_mux parameter to mdio_mux_init()Corentin Labbe
mdio_mux_init() use the parameter dev for two distinct thing: 1) Have a device for all devm_ functions 2) Get device_node from it Since it is two distinct purpose, this patch add a parameter mdio_mux that is linked to task 2. This will also permit to register an of_node mdio-mux that lacks a direct owning device. For example a mdio-mux which is a subnode of a real device. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05rxrpc: Make service connection lookup always check for retryDavid Howells
When an RxRPC service packet comes in, the target connection is looked up by an rb-tree search under RCU and a read-locked seqlock; the seqlock retry check is, however, currently skipped if we got a match, but probably shouldn't be in case the connection we found gets replaced whilst we're doing a search. Make the lookup procedure always go through need_seqretry(), even if the lookup was successful. This makes sure we always pick up on a write-lock event. On the other hand, since we don't take a ref on the object, but rely on RCU to prevent its destruction after dropping the seqlock, I'm not sure this is necessary. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: stmmac: Delete dead code for MDIO registrationRomain Perier
This code is no longer used, the logging function was changed by commit fbca164776e4 ("net: stmmac: Use the right logging function in stmmac_mdio_register"). It was previously showing information about the type of the IRQ, if it's polled, ignored or a normal interrupt. As we don't want information loss, I have moved this code to phy_attached_print(). Fixes: fbca164776e4 ("net: stmmac: Use the right logging function in stmmac_mdio_register") Signed-off-by: Romain Perier <romain.perier@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05gianfar: Fix Tx flow control deactivationClaudiu Manoil
The wrong register is checked for the Tx flow control bit, it should have been maccfg1 not maccfg2. This went unnoticed for so long probably because the impact is hardly visible, not to mention the tangled code from adjust_link(). First, link flow control (i.e. handling of Rx/Tx link level pause frames) is disabled by default (needs to be enabled via 'ethtool -A'). Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few old boards), which results in Tx flow control remaining always on once activated. Fixes: 45b679c9a3ccd9e34f28e6ec677b812a860eb8eb ("gianfar: Implement PAUSE frame generation support") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6Ganesh Goudar
MPS_TX_INT_CAUSE[Bubble] is a normal condition for T6, hence ignore this interrupt for T6. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05cxgb4: Fix pause frame count in t4_get_port_statsGanesh Goudar
MPS_STAT_CTL[CountPauseStatTx] and MPS_STAT_CTL[CountPauseStatRx] only control whether or not Pause Frames will be counted as part of the 64-Byte Tx/Rx Frame counters. These bits do not control whether Pause Frames are counted in the Total Tx/Rx Frames/Bytes counters. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05cxgb4: fix memory leakGanesh Goudar
do not reuse the loop counter which is used iterate over the ports, so that sched_tbl will be freed for all the ports. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05tun: rename generic_xdp to skb_xdpJason Wang
Rename "generic_xdp" to "skb_xdp" to avoid confusing it with the generic XDP which will be done at netif_receive_skb(). Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05tun: reserve extra headroom only when XDP is setJason Wang
We reserve headroom unconditionally which could cause unnecessary stress on socket memory accounting because of increased trusesize. Fix this by only reserve extra headroom when XDP is set. Cc: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05Merge branch 'dsa-tx-queues'David S. Miller
Florian Fainelli says: ==================== net: dsa: Allow switch drivers to indicate number of TX queues This patch series extracts the parts of the patch set that are likely not to be controversial and actually bringing multi-queue support to DSA-created network devices. With these patches, we can now use sch_multiq as documented under Documentation/networking/multique.txt and let applications dedice the switch port output queue they want to use. Currently only Broadcom tags utilize that information. Resending based on David's feedback regarding the patches not in patchwork. Changes in v2: - use a proper define for the number of TX queues in bcm_sf2.c (Andrew) Changes from RFC: - dropped the ability to configure RX queues since we don't do anything with those just yet - dropped the patches that dealt with binding the DSA slave network devices queues with their master network devices queues this will be worked on separately. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: dsa: bcm_sf2: Configure IMP port TC2QOS mappingFlorian Fainelli
Even though TC2QOS mapping is for switch egress queues, we need to configure it correclty in order for the Broadcom tag ingress (CPU -> switch) queue selection to work correctly since there is a 1:1 mapping between switch egress queues and ingress queues. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: dsa: bcm_sf2: Advertise number of egress queuesFlorian Fainelli
The switch supports 8 egress queues per port, so indicate that such that net/dsa/slave.c::dsa_slave_create can allocate the right number of TX queues. While at it use SF2_NUM_EGRESS_QUEUE as a define for the number of queues we support. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: dsa: tag_brcm: Set output queue from skb queue mappingFlorian Fainelli
We originally used skb->priority but that was not quite correct as this bitfield needs to contain the egress switch queue we intend to send this SKB to. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: dsa: Allow switch drivers to indicate number of TX queuesFlorian Fainelli
Let switch drivers indicate how many TX queues they support. Some switches, such as Broadcom Starfighter 2 are designed with 8 egress queues. Future changes will allow us to leverage the queue mapping and direct the transmission towards a particular queue. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05bridge: switchdev: Use an helper to clear forward markIdo Schimmel
Instead of using ifdef in the C file. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net/mlx4_core: Use ARRAY_SIZE macroThomas Meyer
Use ARRAY_SIZE macro, rather than explicitly coding some variant of it yourself. Found with: find -type f -name "*.c" -o -name "*.h" | xargs perl -p -i -e 's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\ /\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\) /ARRAY_SIZE(\1)/g' and manual check/verification. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05Merge branch 'flow_dissector-fixes'David S. Miller
Tom Herbert says: ==================== flow_dissector: Flow dissector fixes This patch set fixes some basic issues with __skb_flow_dissect function. Items addressed: - Cleanup control flow in the function; in particular eliminate a bunch of goto's and implement a simplified control flow model - Add limits for number of encapsulations and headers that can be dissected v2: - Simplify the logic for limits on flow dissection. Just set the limit based on the number of headers the flow dissector can processes. The accounted headers includes encapsulation headers, extension headers, or other shim headers. Tested: Ran normal traffic, GUE, and VXLAN traffic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05flow_dissector: Add limit for number of headers to dissectTom Herbert
In flow dissector there are no limits to the number of nested encapsulations or headers that might be dissected which makes for a nice DOS attack. This patch sets a limit of the number of headers that flow dissector will parse. Headers includes network layer headers, transport layer headers, shim headers for encapsulation, IPv6 extension headers, etc. The limit for maximum number of headers to parse has be set to fifteen to account for a reasonable number of encapsulations, extension headers, VLAN, in a packet. Note that this limit does not supercede the STOP_AT_* flags which may stop processing before the headers limit is reached. Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05flow_dissector: Cleanup control flowTom Herbert
__skb_flow_dissect is riddled with gotos that make discerning the flow, debugging, and extending the capability difficult. This patch reorganizes things so that we only perform goto's after the two main switch statements (no gotos within the cases now). It also eliminates several goto labels so that there are only two labels that can be target for goto. Reported-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05soc: ti/knav_dma: include dmaengine headerArnd Bergmann
A header file cleanup apparently caused a build regression with one driver using the knav infrastructure: In file included from drivers/net/ethernet/ti/netcp_core.c:30:0: include/linux/soc/ti/knav_dma.h:129:30: error: field 'direction' has incomplete type enum dma_transfer_direction direction; ^~~~~~~~~ drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_txpipe_open': drivers/net/ethernet/ti/netcp_core.c:1349:21: error: 'DMA_MEM_TO_DEV' undeclared (first use in this function); did you mean 'DMA_MEMORY_MAP'? config.direction = DMA_MEM_TO_DEV; ^~~~~~~~~~~~~~ DMA_MEMORY_MAP drivers/net/ethernet/ti/netcp_core.c:1349:21: note: each undeclared identifier is reported only once for each function it appears in drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_setup_navigator_resources': drivers/net/ethernet/ti/netcp_core.c:1659:22: error: 'DMA_DEV_TO_MEM' undeclared (first use in this function); did you mean 'DMA_DESC_HOST'? config.direction = DMA_DEV_TO_MEM; As the header is no longer included implicitly through netdevice.h, we should include it in the header that references the enum. Fixes: 0dd5759dbb1c ("net: remove dmaengine.h inclusion from netdevice.h") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid referencesArnd Bergmann
We get a new link error in allmodconfig kernels after ftgmac100 started using the ncsi helpers: ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined! ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined! Related to that, we get another error when CONFIG_NET_NCSI is disabled: drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'? drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'? This fixes both problems at once, using a 'static inline' stub helper for the disabled case, and exporting the functions when they are present. Fixes: 51564585d8c6 ("ftgmac100: Support NCSI VLAN filtering when available") Fixes: 21acf63013ed ("net/ncsi: Configure VLAN tag filter") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05bpf: fix numa_node validationEric Dumazet
syzkaller reported crashes in bpf map creation or map update [1] Problem is that nr_node_ids is a signed integer, NUMA_NO_NODE is also an integer, so it is very tempting to declare numa_node as a signed integer. This means the typical test to validate a user provided value : if (numa_node != NUMA_NO_NODE && (numa_node >= nr_node_ids || !node_online(numa_node))) must be written : if (numa_node != NUMA_NO_NODE && ((unsigned int)numa_node >= nr_node_ids || !node_online(numa_node))) [1] kernel BUG at mm/slab.c:3256! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 2946 Comm: syzkaller916108 Not tainted 4.13.0-rc7+ #35 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d2bc60c0 task.stack: ffff8801c0c90000 RIP: 0010:____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: 0018:ffff8801c0c97638 EFLAGS: 00010096 RAX: ffffffffffff8b7b RBX: 0000000001080220 RCX: 0000000000000000 RDX: 00000000ffff8b7b RSI: 0000000001080220 RDI: ffff8801dac00040 RBP: ffff8801c0c976c0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff8801c0c97620 R11: 0000000000000001 R12: ffff8801dac00040 R13: ffff8801dac00040 R14: 0000000000000000 R15: 00000000ffff8b7b FS: 0000000002119940(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020001fec CR3: 00000001d2980000 CR4: 00000000001406f0 Call Trace: __do_kmalloc_node mm/slab.c:3688 [inline] __kmalloc_node+0x33/0x70 mm/slab.c:3696 kmalloc_node include/linux/slab.h:535 [inline] alloc_htab_elem+0x2a8/0x480 kernel/bpf/hashtab.c:740 htab_map_update_elem+0x740/0xb80 kernel/bpf/hashtab.c:820 map_update_elem kernel/bpf/syscall.c:587 [inline] SYSC_bpf kernel/bpf/syscall.c:1468 [inline] SyS_bpf+0x20c5/0x4c40 kernel/bpf/syscall.c:1443 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x440409 RSP: 002b:00007ffd1f1792b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440409 RDX: 0000000000000020 RSI: 0000000020006000 RDI: 0000000000000002 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401d70 R13: 0000000000401e00 R14: 0000000000000000 R15: 0000000000000000 Code: 83 c2 01 89 50 18 4c 03 70 08 e8 38 f4 ff ff 4d 85 f6 0f 85 3e ff ff ff 44 89 fe 4c 89 ef e8 94 fb ff ff 49 89 c6 e9 2b ff ff ff <0f> 0b 0f 0b 0f 0b 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RIP: ____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: ffff8801c0c97638 ---[ end trace d745f355da2e33ce ]--- Kernel panic - not syncing: Fatal exception Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for next-net (part 2) The following patchset contains Netfilter updates for net-next. This patchset includes updates for nf_tables, removal of CONFIG_NETFILTER_DEBUG and a new mode for xt_hashlimit. More specifically, they: 1) Add new rate match mode for hashlimit, this introduces a new revision for this match. The idea is to stop matching packets until ratelimit criteria stands true. Patch from Vishwanath Pai. 2) Add ->select_ops indirection to nf_tables named objects, so we can choose between different flavours of the same object type, patch from Pablo M. Bermudo. 3) Shorter function names in nft_limit, basically: s/nft_limit_pkt_bytes/nft_limit_bytes, also from Pablo M. Bermudo. 4) Add new stateful limit named object type, this allows us to create limit policies that you can identify via name, also from Pablo. 5) Remove unused hooknum parameter in conntrack ->packet indirection. From Florian Westphal. 6) Patches to remove CONFIG_NETFILTER_DEBUG and macros such as IP_NF_ASSERT and IP_NF_ASSERT. From Varsha Rao. 7) Add nf_tables_updchain() helper function and use it from nf_tables_newchain() to make it more maintainable. Similarly, add nf_tables_addchain() and use it too. 8) Add new netlink NLM_F_NONREC flag, this flag should only be used for deletion requests, specifically, to support non-recursive deletion. Based on what we discussed during NFWS'17 in Faro. 9) Use NLM_F_NONREC from table and sets in nf_tables. 10) Support for recursive chain deletion. Table and set deletion commands come with an implicit content flush on deletion, while chains do not. This patch addresses this inconsistency by adding the code to perform recursive chain deletions. This also comes with the bits to deal with the new NLM_F_NONREC netlink flag. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-04netfilter: nf_tables: support for recursive chain deletionPablo Neira Ayuso
This patch sorts out an asymmetry in deletions. Currently, table and set deletion commands come with an implicit content flush on deletion. However, chain deletion results in -EBUSY if there is content in this chain, so no implicit flush happens. So you have to send a flush command in first place to delete chains, this is inconsistent and it can be annoying in terms of user experience. This patch uses the new NLM_F_NONREC flag to request non-recursive chain deletion, ie. if the chain to be removed contains rules, then this returns EBUSY. This problem was discussed during the NFWS'17 in Faro, Portugal. In iptables, you hit -EBUSY if you try to delete a chain that contains rules, so you have to flush first before you can remove anything. Since iptables-compat uses the nf_tables netlink interface, it has to use the NLM_F_NONREC flag from userspace to retain the original iptables semantics, ie. bail out on removing chains that contain rules. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nf_tables: use NLM_F_NONREC for deletion requestsPablo Neira Ayuso
Bail out if user requests non-recursive deletion for tables and sets. This new flags tells nf_tables netlink interface to reject deletions if tables and sets have content. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netlink: add NLM_F_NONREC flag for deletion requestsPablo Neira Ayuso
In the last NFWS in Faro, Portugal, we discussed that netlink is lacking the semantics to request non recursive deletions, ie. do not delete an object iff it has child objects that hang from this parent object that the user requests to be deleted. We need this new flag to solve a problem for the iptables-compat backward compatibility utility, that runs iptables commands using the existing nf_tables netlink interface. Specifically, custom chains in iptables cannot be deleted if there are rules in it, however, nf_tables allows to remove any chain that is populated with content. To sort out this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC flag to obtain the same semantics that iptables provides. This new flag should only be used for deletion requests. Note this new flag value overlaps with the existing: * NLM_F_ROOT for get requests. * NLM_F_REPLACE for new requests. However, those flags should not ever be used in deletion requests. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nf_tables: add nf_tables_addchain()Pablo Neira Ayuso
Wrap the chain addition path in a function to make it more maintainable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nf_tables: add nf_tables_updchain()Pablo Neira Ayuso
nf_tables_newchain() is too large, wrap the chain update path in a function to make it more maintainable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros.Varsha Rao
This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they are no longer required. Replace _ASSERT() macros with WARN_ON(). Signed-off-by: Varsha Rao <rvarsha016@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04net: Replace NF_CT_ASSERT() with WARN_ON().Varsha Rao
This patch removes NF_CT_ASSERT() and instead uses WARN_ON(). Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
2017-09-04netfilter: remove unused hooknum arg from packet functionsFlorian Westphal
tested with allmodconfig build. Signed-off-by: Florian Westphal <fw@strlen.de>
2017-09-04netfilter: nft_limit: add stateful object typePablo M. Bermudo Garay
Register a new limit stateful object type into the stateful object infrastructure. Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nft_limit: replace pkt_bytes with bytesPablo M. Bermudo Garay
Just a small refactor patch in order to improve the code readability. Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nf_tables: add select_ops for stateful objectsPablo M. Bermudo Garay
This patch adds support for overloading stateful objects operations through the select_ops() callback, just as it is implemented for expressions. This change is needed for upcoming additions to the stateful objects infrastructure. Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: xt_hashlimit: add rate match modeVishwanath Pai
This patch adds a new feature to hashlimit that allows matching on the current packet/byte rate without rate limiting. This can be enabled with a new flag --hashlimit-rate-match. The match returns true if the current rate of packets is above/below the user specified value. The main difference between the existing algorithm and the new one is that the existing algorithm rate-limits the flow whereas the new algorithm does not. Instead it *classifies* the flow based on whether it is above or below a certain rate. I will demonstrate this with an example below. Let us assume this rule: iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain If the packet rate is 15/s, the existing algorithm would ACCEPT 10 packets every second and send 5 packets to "new_chain". But with the new algorithm, as long as the rate of 15/s is sustained, all packets will continue to match and every packet is sent to new_chain. This new functionality will let us classify different flows based on their current rate, so that further decisions can be made on them based on what the current rate is. This is how the new algorithm works: We divide time into intervals of 1 (sec/min/hour) as specified by the user. We keep track of the number of packets/bytes processed in the current interval. After each interval we reset the counter to 0. When we receive a packet for match, we look at the packet rate during the current interval and the previous interval to make a decision: if [ prev_rate < user and cur_rate < user ] return Below else return Above Where cur_rate is the number of packets/bytes seen in the current interval, prev is the number of packets/bytes seen in the previous interval and 'user' is the rate specified by the user. We also provide flexibility to the user for choosing the time interval using the option --hashilmit-interval. For example the user can keep a low rate like x/hour but still keep the interval as small as 1 second. To preserve backwards compatibility we have to add this feature in a new revision, so I've created revision 3 for hashlimit. The two new options we add are: --hashlimit-rate-match --hashlimit-rate-interval I have updated the help text to add these new options. Also added a few tests for the new options. Suggested-by: Igor Lubashev <ilubashe@akamai.com> Reviewed-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-03Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-09-03 Here's one last bluetooth-next pull request for the 4.14 kernel: - NULL pointer fix in ca8210 802.15.4 driver - A few "const" fixes - New Kconfig option for disabling legacy interfaces Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03Merge branch 'qualcomm-rmnet-Fix-comments-on-initial-patchset'David S. Miller
Subash Abhinov Kasiviswanathan says: ==================== net: qualcomm: rmnet: Fix comments on initial patchset This series fixes the comments from Dan on the first patch series. Fixes a memory corruption which could occur if mux_id was higher than 32. Remove the RMNET_LOCAL_LOGICAL_ENDPOINT which is no longer used. Make a log message more useful. Combine __rmnet_set_endpoint_config() with rmnet_set_endpoint_config(). Set the mux_id in rmnet_vnd_newlink(). Set the ingress and egress data format directly in newlink. Implement ndo_get_iflink to find the real_dev. Rename the real_dev_info to port to make it similar to other drivers. The conversion of rmnet_devices to a list and hash lookup will be sent as part of a seperate patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03net: qualcomm: rmnet: Rename real_dev_info to portSubash Abhinov Kasiviswanathan
Make it similar to drivers like ipvlan / macvlan so it is easier to read. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03net: qualcomm: rmnet: Implement ndo_get_iflinkSubash Abhinov Kasiviswanathan
This makes it easier to find out the parent dev. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03net: qualcomm: rmnet: Refactor the new rmnet dev creationSubash Abhinov Kasiviswanathan
Data format can be directly set from rmnet_newlink() since the rmnet real dev info is already available. Since __rmnet_get_real_dev_info() is no longer used in rmnet_config.c after removal of those functions, move content to rmnet_get_real_dev_info(). __rmnet_set_endpoint_config() is collapsed into rmnet_set_endpoint_config() since only mux_id was being set additionally within it. Remove an unnecessary mux_id check. Set the mux_id for the rmnet_dev within rmnet_vnd_newlink() itself. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03net: qualcomm: rmnet: Move the device creation logSubash Abhinov Kasiviswanathan
The current log is not very useful as it does not log the device name since it it is prior to registration - (unnamed net_device) (uninitialized): Setting up device Modify to log after the device registration - rmnet1: rmnet dev created Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03net: qualcomm: rmnet: Remove the unused endpoint -1Subash Abhinov Kasiviswanathan
This was used only in the original patch series where the IOCTLs were present and is no longer in use. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03net: qualcomm: rmnet: Fix memory corruption if mux_id is greater than 32Subash Abhinov Kasiviswanathan
rmnet_rtnl_validate() was checking for upto mux_id 254, however the rmnet_devices devices could hold upto 32 entries only. Fix this by increasing the size of the rmnet_devices. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03Merge branch 'nfp-refactor-app-init-and-minor-flower-fixes'David S. Miller
Jakub Kicinski says: ==================== nfp: refactor app init, and minor flower fixes This series is a part 2 to what went into net as a simpler fix. In net we simply moved when existing callbacks are invoked to ensure flower app does not still use representors when lower netdev has already been destroyed. In this series we add a callback to notify apps when vNIC netdevs are fully initialized and they are about to be destroyed. This allows flower to spawn representors at the right time, while keeping the start/stop callbacks for what they are intended to be used - FW initialization over control channel. Patch 4 improves drop monitor interaction and patch 5 changes the default Kconfig selection of flower offload. Patch 6 fixes locking around representor updates which got lost in net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03nfp: flower: restore RTNL locking around representor updatesJakub Kicinski
When we moved to updating representors from a workqueue grabbing the RTNL somehow got lost in the process. Restore it, and make sure RCU lock is not held while we are grabbing the RTNL. RCU protects the representor table, so since we will be under RTNL we can drop RCU lock as soon as we find the netdev pointer. RTNL is needed for the dev_set_mtu() call. Fixes: 2dff19622421 ("nfp: process MTU updates from firmware flower app") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03nfp: build the flower offload by defaultJakub Kicinski
It's reasonable to assume that if user selects to build the NFP driver all offload capabilities will be enabled by default. Change the CONFIG_NFP_APP_FLOWER to default to enabled. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03nfp: be drop monitor friendlyJakub Kicinski
Use dev_consume_skb_any() in place of dev_kfree_skb_any() when control frame has been successfully processed in flower and on the driver's main TX completion path. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>