summary refs log tree commit diff
path: root/include/net/ipv6.h
AgeCommit message (Collapse)Author
2013-01-30ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30ipv6 flowlabel: Convert hash list to RCU.YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29net: frag helper functions for mem limit trackingJesper Dangaard Brouer
This change is primarily a preparation to ease the extension of memory limit tracking. The change does reduce the number atomic operation, during freeing of a frag queue. This does introduce a some performance improvement, as these atomic operations are at the core of the performance problems seen on NUMA systems. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21ipv6: Unshare ip6_nd_hdr() and change return type to void.YOSHIFUJI Hideaki / 吉藤英明
- move ip6_nd_hdr() to its users' source files. In net/ipv6/mcast.c, it will be called ip6_mc_hdr(). - make return type to void since this function never fails. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17ipv6: fix ipv6_prefix_equal64_half mask conversionFabio Baltieri
Fix the 64bit optimized version of ipv6_prefix_equal to convert the bitmask to network byte order only after the bit-shift. The bug was introduced in: 3867517 ipv6: 64bit version of ipv6_prefix_equal(). Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17net: increase fragment memory usage limitsJesper Dangaard Brouer
Increase the amount of memory usage limits for incomplete IP fragments. Arguing for new thresh high/low values: High threshold = 4 MBytes Low threshold = 3 MBytes The fragmentation memory accounting code, tries to account for the real memory usage, by measuring both the size of frag queue struct (inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize. We want to be able to handle/hold-on-to enough fragments, to ensure good performance, without causing incomplete fragments to hurt scalability, by causing the number of inet_frag_queue to grow too much (resulting longer searches for frag queues). For IPv4, how much memory does the largest frag consume. Maximum size fragment is 64K, which is approx 44 fragments with MTU(1500) sized packets. Sizeof(struct ipq) is 200. A 1500 byte packet results in a truesize of 2944 (not 2048 as I first assumed) (44*2944)+200 = 129736 bytes The current default high thresh of 262144 bytes, is obviously problematic, as only two 64K fragments can fit in the queue at the same time. How many 64K fragment can we fit into 4 MBytes: 4*2^20/((44*2944)+200) = 32.34 fragment in queues An attacker could send a separate/distinct fake fragment packets per queue, causing us to allocate one inet_frag_queue per packet, and thus attacking the hash table and its lists. How many frag queue do we need to store, and given a current hash size of 64, what is the average list length. Using one MTU sized fragment per inet_frag_queue, each consuming (2944+200) 3144 bytes. 4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length An attack could send small fragments, the smallest packet I could send resulted in a truesize of 896 bytes (I'm a little surprised by this). 4*2^20/(896+200) = 3827 frag queues -> 59 avg list length When increasing these number, we also need to followup with improvements, that is going to help scalability. Simply increasing the hash size, is not enough as the current implementation does not have a per hash bucket locking. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-16ipv6: Fix endianess warning in ip6_flow_hdr().YOSHIFUJI Hideaki
Commit 3e4e4c1f ("ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.) uses ntohl(), which should be htonl(). Found by Fengguang Wu <fengguang.wu@intel.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: 64bit version of ipv6_prefix_equal().YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: Remove __ipv6_prefix_equal().YOSHIFUJI Hideaki / 吉藤英明
ipv6_prefix_equal() just casts its arguments and it is the only user of __ipv6_prefix_equal(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: 64bit version of ipv6_addr_set().YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: 64bit version of ipv6_addr_v4mapped().YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: 64bit version of ipv6_addr_loopback().YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: 64bit version of ipv6_addr_diff().YOSHIFUJI Hideaki / 吉藤英明
Introduce __ipv6_addr_diff64() to to find the first different bit between two addresses on 64bit architectures. 32bit version is still available as __ipv6_addr_diff32(), and __ipv6_addr_diff() automatically selects appropriate version. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel).YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.YOSHIFUJI Hideaki / 吉藤英明
This is not only for readability but also for optimization. What we do here is to build the 32bit word at the beginning of the ipv6 header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and we do not need to read the tclass portion of the target buffer. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01ipv6: unify logic evaluating inet6_dev's accept_ra propertyShmulik Ladkani
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the logic determining whether to send Router Solicitations is identical to the logic determining whether kernel accepts Router Advertisements. However the condition itself is repeated in several code locations. Unify it by introducing 'ipv6_accept_ra()' accessor. Also, simplify the condition expression, making it more readable. No semantic change. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-12ipv6: improve ipv6_find_hdr() to skip empty routing headersAnsis Atteka
This patch prepares ipv6_find_hdr() function so that it could be able to skip routing headers, where segements_left is 0. This is required to handle multiple routing header case correctly when changing IPv6 addresses. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-11-09ipv6: Move ipv6_find_hdr() out of Netfilter code.Jesse Gross
Open vSwitch will soon also use ipv6_find_hdr() so this moves it out of Netfilter-specific code into a more common location. Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-09-19ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inlineAmerigo Wang
Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19ipv6: unify conntrack reassembly expire code with standard oneAmerigo Wang
Two years ago, Shan Wei tried to fix this: http://patchwork.ozlabs.org/patch/43905/ The problem is that RFC2460 requires an ICMP Time Exceeded -- Fragment Reassembly Time Exceeded message should be sent to the source of that fragment, if the defragmentation times out. " If insufficient fragments are received to complete reassembly of a packet within 60 seconds of the reception of the first-arriving fragment of that packet, reassembly of that packet must be abandoned and all the fragments that have been received for that packet must be discarded. If the first fragment (i.e., the one with a Fragment Offset of zero) has been received, an ICMP Time Exceeded -- Fragment Reassembly Time Exceeded message should be sent to the source of that fragment. " As Herbert suggested, we could actually use the standard IPv6 reassembly code which follows RFC2460. With this patch applied, I can see ICMP Time Exceeded sent from the receiver when the sender sent out 3/4 fragmented IPv6 UDP packet. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24Merge branch 'for-next' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace This is an initial merge in of Eric Biederman's work to start adding user namespace support to the networking. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14net ip6 flowlabel: Make owner a union of struct pid * and kuid_tEric W. Biederman
Correct a long standing omission and use struct pid in the owner field of struct ip6_flowlabel when the share type is IPV6_FL_S_PROCESS. This guarantees we don't have issues when pid wraparound occurs. Use a kuid_t in the owner field of struct ip6_flowlabel when the share type is IPV6_FL_S_USER to add user namespace support. In /proc/net/ip6_flowlabel capture the current pid namespace when opening the file and release the pid namespace when the file is closed ensuring we print the pid owner value that is meaning to the reader of the file. Similarly use from_kuid_munged to print uid values that are meaningful to the reader of the file. This requires exporting pid_nr_ns so that ipv6 can continue to built as a module. Yoiks what silliness Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14gre: Support GRE over IPv6xeb@mail.ru
GRE over IPv6 implementation. Signed-off-by: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18ipv6: add ipv6_addr_hash() helperEric Dumazet
Introduce ipv6_addr_hash() helper doing a XOR on all bits of an IPv6 address, with an optimized x86_64 version. Use it in flow dissector, as suggested by Andrew McGregor, to reduce hash collision probabilities in fq_codel (and other users of flow dissector) Use it in ip6_tunnel.c and use more bit shuffling, as suggested by David Laight, as existing hash was ignoring most of them. Use it in sunrpc and use more bit shuffling, using hash_32(). Use it in net/ipv6/addrconf.c, using hash_32() as well. As a cleanup, use it in net/ipv4/tcp_metrics.c Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrew McGregor <andrewmcgr@gmail.com> Cc: Dave Taht <dave.taht@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().David S. Miller
And delete rt6_redirect(), since it is no longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10ipv6: optimize ipv6 addresses comparesEric Dumazet
On 64 bit arches having efficient unaligned accesses (eg x86_64) we can use long words to reduce number of instructions for free. Joe Perches suggested to change ipv6_masked_addr_cmp() to return a bool instead of 'int', to make sure ipv6_masked_addr_cmp() cannot be used in a sorting function. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19ipv6: bool/const conversions phase2Eric Dumazet
Mostly bool conversions, some inline removals and const additions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18ipv6: bool conversions phase1Eric Dumazet
ipv6_opt_accepted() returns a bool, and can use const pointers ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(), ipv6_addr_orchid() return a bool. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18ip_frag: struct inet_frags match() method returns a boolEric Dumazet
- match() method returns a boolean - return (A && B && C && D) -> return A && B && C && D - fix indentation Signed-off-by: Eric Dumazet <edumazet@google.com>
2012-04-20net: Delete all remaining instances of ctl_pathEric W. Biederman
We don't use struct ctl_path anymore so delete the exported constants. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net ipv4: Remove the unneeded registration of an empty net/ipv4/neighEric W. Biederman
sysctl no longer requires explicit creation of directories. The neigh directory is always populated with at least a default entry so this won't cause any user visible changes. Delete the ipv4_path and the ipv4_skeleton these are no longer needed. Directly register the ipv4_route_table. And since I am an idiot remove the header definitions that I should have removed in the previous patch. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-03ipv6: Add fragment reporting to ipv6_skip_exthdr().Jesse Gross
While parsing through IPv6 extension headers, fragment headers are skipped making them invisible to the caller. This reports the fragment offset of the last header in order to make it possible to determine whether the packet is fragmented and, if so whether it is a first or last fragment. Signed-off-by: Jesse Gross <jesse@nicira.com>
2011-11-22net: remove ipv6_addr_copy()Alexey Dobriyan
C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-14ipv6: reduce percpu needs for icmpv6msg mibsEric Dumazet
Reading /proc/net/snmp6 on a machine with a lot of cpus is very expensive (can be ~88000 us). This is because ICMPV6MSG MIB uses 4096 bytes per cpu, and folding values for all possible cpus can read 16 Mbytes of memory (32MBytes on non x86 arches) ICMP messages are not considered as fast path on a typical server, and eventually few cpus handle them anyway. We can afford an atomic operation instead of using percpu data. This saves 4096 bytes per cpu and per network namespace. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-27ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAITEric Dumazet
commit 66b13d99d96a (ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT) fixed IPv4 only. This part is for the IPv6 side, adding a tclass param to ip6_xmit() We alias tw_tclass and tw_tos, if socket family is INET6. [ if sockets is ipv4-mapped, only IP_TOS socket option is used to fill TOS field, TCLASS is not taken into account ] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21ipv6: make fragment identifications less predictableEric Dumazet
IPv6 fragment identification generation is way beyond what we use for IPv4 : It uses a single generator. Its not scalable and allows DOS attacks. Now inetpeer is IPv6 aware, we can use it to provide a more secure and scalable frag ident generator (per destination, instead of system wide) This patch : 1) defines a new secure_ipv6_id() helper 2) extends inet_getid() to provide 32bit results 3) extends ipv6_select_ident() with a new dest parameter Reported-by: Fernando Gont <fernando@gont.com.ar> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-19ipv6: reduce per device ICMP mib sizesEric Dumazet
ipv6 has per device ICMP SNMP counters, taking too much space because they use percpu storage. needed size per device is : (512+4)*sizeof(long)*number_of_possible_cpus*2 On a 32bit kernel, 16 possible cpus, this wastes more than 64kbytes of memory per ipv6 enabled network device, taken in vmalloc pool. Since ICMP messages are rare, just use shared counters (atomic_long_t) Per network space ICMP counters are still using percpu memory, we might also convert them to shared counters in a future patch. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-24net: Remove __KERNEL__ cpp checks from include/netDavid S. Miller
These header files are never installed to user consumption, so any __KERNEL__ cpp checks are superfluous. Projects should also not copy these files into their userland utility sources and try to use them there. If they insist on doing so, the onus is on them to sanitize the headers as needed. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22inet: constify ip headers and in6_addrEric Dumazet
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12ipv6: Convert to use flowi6 where applicable.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h
2011-03-01xfrm: Handle blackhole route creation via afinfo.David S. Miller
That way we don't have to potentially do this in every xfrm_lookup() caller. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv6: Normalize arguments to ip6_dst_blackhole().David S. Miller
Return a dst pointer which is potentitally error encoded. Don't pass original dst pointer by reference, pass a struct net instead of a socket, and elide the flow argument since it is unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv6: Change final dst lookup arg name to "can_sleep"David S. Miller
Since it indicates whether we are invoked from a sleepable context or not. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv6: Consolidate route lookup sequences.David S. Miller
Route lookups follow a general pattern in the ipv6 code wherein we first find the non-IPSEC route, potentially override the flow destination address due to ipv6 options settings, and then finally make an IPSEC search using either xfrm_lookup() or __xfrm_lookup(). __xfrm_lookup() is used when we want to generate a blackhole route if the key manager needs to resolve the IPSEC rules (in this case -EREMOTE is returned and the original 'dst' is left unchanged). Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC resolution is necessary, we simply fail the lookup completely. All of these cases are encapsulated into two routines, ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which handles unconnected UDP datagram sockets. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22ipv6: Add IPv6 multicast address flag definesLinus Lüssing
This commit adds the missing IPv6 multicast address flag defines to complement the already existing multicast address scope defines and to be able to check these flags nicely in the future. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-23net: return operator cleanupEric Dumazet
Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30snmp: 64bit ipstats_mib for all archesEric Dumazet
/proc/net/snmp and /proc/net/netstat expose SNMP counters. Width of these counters is either 32 or 64 bits, depending on the size of "unsigned long" in kernel. This means user program parsing these files must already be prepared to deal with 64bit values, regardless of user program being 32 or 64 bit. This patch introduces 64bit snmp values for IPSTAT mib, where some counters can wrap pretty fast if they are 32bit wide. # netstat -s|egrep "InOctets|OutOctets" InOctets: 244068329096 OutOctets: 244069348848 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02ipv6: Refactor update of IPv6 flowi destination address for srcrt (RH) optionArnaud Ebalard
There are more than a dozen occurrences of following code in the IPv6 stack: if (opt && opt->srcrt) { struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt; ipv6_addr_copy(&final, &fl.fl6_dst); ipv6_addr_copy(&fl.fl6_dst, rt0->addr); final_p = &final; } Replace those with a helper. Note that the helper overrides final_p in all cases. This is ok as final_p was previously initialized to NULL when declared. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: David S. Miller <davem@davemloft.net>