Commit message    Author    Age    Files    Lines
* Alloc: Disable direct use of mmap [ky/fix-mmap-fragmentation]    Kazuki Yamaguchi    2022-04-01    1    -1/+1
| The way munmap is called breaks VMA merging in Linux and crashes BIRD. Let's fall back to posix_memalign() for now, which takes care of the number of mmap'ed regions.
* Alloc: Use posix_memalign() instead of aligned_alloc()    Ondrej Zajicek (work)    2022-02-08    1    -5/+6
| For compatibility with older systems use posix_memalign(). We can switch to aligned_alloc() when we commit to C11 for multithreading.
* Netlink: Minor cleanup    Ondrej Zajicek (work)    2022-02-08    1    -1/+1
|
* Lib: Update alignment of slabs    Ondrej Zajicek (work)    2022-02-07    1    -2/+2
| Alignment of slabs should be at least sizeof(ptr) to avoid unaligned pointers in slab structures.
|
| Fixme: Use proper way to choose alignment for internal allocators.
* Merge branch 'oz-trie-table'    Ondrej Zajicek (work)    2022-02-06    23    -358/+2945
|\
| * Trie: Fix trie format    Ondrej Zajicek (work)    2022-02-06    2    -21/+110
| | After switching to 16-way tries, trie format ignored unaligned / internal prefixes and only reported the primary prefix of a trie node.
| |
| | Fix trie format by showing internal prefixes based on the 'local' bitmask of a node. Also do basic (intra-node) reconstruction of prefix patterns by finding common subtrees in 'local' bitmask.
| |
| | In future, we could improve that by doing inter-node reconstruction, so prefixes entered as one pattern for a subtree (e.g. 192.168.0.0/18+) would be reported as such, like with aligned prefixes.
| * Nest: Implement locking of prefix tries during walks    Ondrej Zajicek (work)    2022-02-06    3    -2/+95
| | The prune loop may rebuild the prefix trie and therefore invalidate walk state for asynchronous walks (used in 'show route in' cmd). Fix it by adding locking that keeps the old trie in memory until current walks are done.
| |
| | In future this could be improved by rebuilding trie walk states (by lookup for last found prefix) after the prefix trie rebuild.
| * Nest: Implement prefix trie pruning    Ondrej Zajicek (work)    2022-02-06    2    -14/+44
| | When rtable is pruned and network fib nodes are removed, we also need to prune the prefix trie. Unfortunately, rebuilding the prefix trie takes a long time (about 400 ms for 1M networks), so it must not be done atomically; we have to build a new trie while the current one is still active. That may require a considerable amount of temporary memory, so we do that only if we expect a significant trie size reduction.
| * Trie: Add prefix counter    Ondrej Zajicek (work)    2022-02-06    3    -0/+12
| | Add counter of prefixes stored in trie. Works only for 'restricted' tries composed of explicit prefixes (pxlen == l == h), like ones used in rtables.
| * Doc: Describe routing table options    Ondrej Zajicek (work)    2022-02-06    1    -16/+64
| |
| * BGP: Implement flowspec validation procedure    Ondrej Zajicek (work)    2022-02-06    9    -26/+487
| | Implement flowspec validation procedure as described in RFC 8955 sec. 6 and RFC 9117. The validation procedure enforces that only routers in the forwarding path for a network can originate flowspec rules for that network.
| |
| | The patch adds a new mechanism for tracking inter-table dependencies, which is necessary as the flowspec validation depends on IP routes, and flowspec rules must be revalidated when best IP routes change.
| |
| | The validation procedure is disabled by default and requires that the relevant IP table uses a trie, as it uses interval queries for subnets.
| * Nest: Add routing table configuration blocks    Ondrej Zajicek (work)    2022-02-06    2    -31/+60
| | Allow specifying the sorted flag, trie flag, and min/max settle time. Also do not enable trie by default; it must be explicitly enabled.
| * Nest: Add convenience functions to check rtable net type    Ondrej Zajicek (work)    2022-02-06    1    -0/+12
| |
| * Nest: Avoid unnecessary net_format() in 'show route' command    Ondrej Zajicek (work)    2022-02-06    1    -3/+9
| | When output of 'show route' command was generated, net_format() was called for each network prematurely, even if the result was not needed. Fix the code to call net_format() only when needed. This makes queries that process many networks but show only a few (e.g. 'show route where ..', or 'show route count') much faster (about 5x to 10x).
| * Nest: Add trie iteration code to 'show route'    Ondrej Zajicek (work)    2022-02-06    2    -14/+51
| | Add trie iteration code to rt_show_cont() CLI hook and use it to accelerate 'show route in <addr>' commands using interval queries.
| * Nest: Implement 'show route in <addr>' command    Ondrej Zajicek (work)    2022-02-06    3    -6/+24
| | Implement 'show route in <addr>' command, which shows all routes in networks that are subnets of given network. Currently limited to IP network types.
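The subnet query described above can be illustrated with a minimal Python sketch. It is not BIRD code: the function name `routes_in` and the list-based table are hypothetical, and the linear scan stands in for whatever lookup structure the real command uses.

```python
import ipaddress

def routes_in(table, query):
    """Return the networks in `table` that are subnets of `query` (inclusive)."""
    q = ipaddress.ip_network(query)
    result = []
    for net in table:
        n = ipaddress.ip_network(net)
        # subnet_of() raises TypeError on mixed IP versions, so check first.
        if n.version == q.version and n.subnet_of(q):
            result.append(net)
    return result
```

For example, `routes_in(["10.0.0.0/8", "10.1.2.0/24"], "10.1.0.0/16")` keeps only `10.1.2.0/24`, since the /8 covers the query but is not contained in it.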
| * Nest: Attach prefix trie to rtable for faster LPM and interval queries    Ondrej Zajicek (work)    2022-02-06    5    -36/+270
| | Attach a prefix trie to IP/VPN/ROA tables. Use it for net_route() and net_roa_check(). This leads to 3-5x speedups for IPv4 and 5-10x speedup for IPv6 of these calls.
| |
| | TODO:
| |  - Rebuild the trie during rt_prune_table()
| |  - Better way to avoid trie_add_prefix() in net_get() for existing tables
| |  - Make it configurable (?)
| * Trie: Clarify handling of less-common net types    Ondrej Zajicek (work)    2021-12-02    1    -21/+41
| | For convenience, Trie functions generally accept as input values not only NET_IPx types of nets, but also NET_VPNx and NET_ROAx types. But returned values are always NET_IPx types.
| * Trie: Implement longest-prefix-match queries and walks    Ondrej Zajicek (work)    2021-11-26    4    -2/+359
| | The prefix trie now supports longest-prefix-match query by function trie_match_longest_ipX() and it can be extended to iteration over all covering prefixes for a given prefix (from longest to shortest) using TRIE_WALK_TO_ROOT_IPx() macro.
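As a rough illustration of the two operations named above, the following Python sketch finds the covering prefixes of a target from longest to shortest and takes the first one as the longest match. It uses a plain set rather than a trie, and the names `covering_prefixes` and `match_longest` are hypothetical, not BIRD's API.

```python
import ipaddress

def covering_prefixes(prefixes, target):
    """Yield stored prefixes that cover `target`, from longest to shortest."""
    t = ipaddress.ip_network(target)
    stored = {ipaddress.ip_network(p) for p in prefixes}
    # Shorten the target one bit at a time and look each candidate up.
    for plen in range(t.prefixlen, -1, -1):
        candidate = t.supernet(new_prefix=plen)
        if candidate in stored:
            yield str(candidate)

def match_longest(prefixes, target):
    """Return the longest stored prefix covering `target`, or None."""
    return next(covering_prefixes(prefixes, target), None)
```

A real trie reaches the same answers by descending from the root instead of probing every prefix length, which is where the speedup comes from.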
| * Trie: Implement trie walking code    Ondrej Zajicek (work)    2021-11-19    3    -13/+413
| | Trie walking allows enumeration of prefixes in a trie in the usual lexicographic order. Optionally, trie enumeration can be restricted to a chosen subnet (and its descendants).
| * Trie: Simplify network matching code    Ondrej Zajicek (work)    2021-11-13    3    -16/+90
| | Introduce ipX_prefix_equal() and use it to simplify network matching code.
| * Filter: Add prefix trie benchmarks    Ondrej Zajicek (work)    2021-09-25    3    -0/+242
| | Add trie tests intended as benchmarks that use external datasets instead of generated prefixes. As datasets are not included, they are commented out by default.
| * Filter: Improve prefix trie tests    Ondrej Zajicek (work)    2021-09-25    3    -74/+310
| | Add tests explicitly matching insides and outsides of trie and update tests to do testing of both IPv4 and IPv6 tries.
| * Filter: Update trie documentation    Ondrej Zajicek (work)    2021-09-25    1    -44/+69
| |
| * Filter: Fix trie test    Ondrej Zajicek (work)    2021-09-25    1    -2/+3
| | Generated prefixes must be valid.
| * Filter: Faster prefix sets    Ondrej Zajicek (work)    2021-09-25    5    -73/+236
| | Use 16-way (4bit) branching in prefix trie instead of basic binary branching. The change makes IPv4 prefix sets almost 3x faster, but with more memory consumption and a much more complicated algorithm.
| |
| | Together with a previous filter change, it makes IPv4 prefix sets about 4.3x faster and slightly smaller (on my test data).
* | BGP: Make routing loops silent    Ondrej Zajicek (work)    2022-01-28    2    -5/+9
| | One of the previous commits added error logging of invalid routes. This also inadvertently caused error logging of route loops, which should be ignored silently. Fix that.
* | BGP: Use proper class in attribute error messages    Ondrej Zajicek (work)    2022-01-28    3    -15/+21
| | Most error messages in attribute processing are in the rx/decode step and these use the L_REMOTE log class. But there are a few that are in the tx/export step and these should use the L_ERR log class.
| |
| | Use tx-specific macro (REJECT()) in tx/export code and rename field err_withdraw to err_reject in struct bgp_export_state to ensure that appropriate error reporting macros are called in proper contexts.
* | BGP: Improve 'invalid next hop' error reporting    Ondrej Zajicek (work)    2022-01-28    1    -11/+17
| | Distinguish multiple causes of 'invalid next hop' message and report the relevant next hop address.
| |
| | Thanks to Simon Ruderich for the original patch.
* | BGP: Log route updates that were changed to withdraws    Ondrej Zajicek (work)    2022-01-24    3    -1/+12
| | Typical BGP error handling is treat-as-withdraw, where an invalid route is replaced with a withdraw. Log the route network when that happens.
* | .gitlab-ci.yml: minor changes inside the .yml file.    Matous Holinka    2022-01-17    2    -9/+11
| | + ubuntu:21.10 added into the pipeline,
| | - ubuntu:20.10 removed from the pipeline,
| | + misc/docker/ubuntu-21.10-amd64/Dockerfile added,
| | - misc/docker/ubuntu-20.10-amd64/Dockerfile removed.
* | Netlink: Add option to specify netlink socket receive buffer size    Ondrej Zajicek (work)    2022-01-17    5    -1/+2243
| | Add option 'netlink rx buffer' to specify netlink socket receive buffer size. Uses SO_RCVBUFFORCE, so it can override rmem_max limit.
| |
| | Thanks to Trisha Biswas and Michal for the original patches.
* | Netlink: Add another workaround for older kernel headers    Ondrej Zajicek (work)    2022-01-15    1    -0/+7
| | Unfortunately, SOL_NETLINK is both recently added and arch-dependent, so we cannot just define it.
* | Netlink: Add workaround for older kernel headers    Ondrej Zajicek (work)    2022-01-14    1    -0/+4
| |
* | Netlink: Enable strict checking for KRT dumps    Ondrej Zajicek (work)    2022-01-14    1    -11/+59
| | Add strict checking for netlink KRT dumps to avoid PMTU cache records from FNHE table dump along with KRT.
| |
| | Linux Kernel added FNHE table dump to the netlink API in patch:
| |
| | https://patchwork.ozlabs.org/project/netdev/patch/8d3b68cd37fb5fddc470904cdd6793fcf480c6c1.1561131177.git.sbrivio@redhat.com/
| |
| | Therefore, since Linux 5.3 these route cache entries are dumped together with regular routes during periodic KRT scans, which in some cases may be a huge amount of useless data. This can be avoided by using strict checking for netlink dumps:
| |
| | https://lore.kernel.org/netdev/20181008031644.15989-1-dsahern@kernel.org/
| |
| | The patch mitigates the risk of receiving an unknown and potentially large number of FNHE records that would block BIRD I/O in each sync. There is a known issue caused by GRE tunnels on Linux, which seem to create one FNHE record for each destination IP address that is routed through the tunnel, even when the PMTU equals the GRE interface MTU.
| |
| | Thanks to Tomas Hlavacek for the original patch.
* | Netlink: Explicitly skip received cloned routes    Ondrej Zajicek (work)    2022-01-14    1    -3/+7
| | Kernel uses cloned routes to keep route cache entries, but reports them together with regular routes. They were skipped implicitly as they do not have rtm_protocol filled. Add explicit check for cloned flag and skip such routes explicitly.
| |
| | Also, improve debug logs of skipped routes.
* | BGP: Add option 'free bind'    Ondrej Zajicek (work)    2022-01-09    4    -4/+17
| | The BGP 'free bind' option applies the IP_FREEBIND/IPV6_FREEBIND socket option for the BGP listening socket.
| |
| | Thanks to Alexander Zubkov for the idea.
* | IO: Support nonlocal bind in socket interface    Alexander Zubkov    2022-01-08    4    -0/+30
| | Add option to socket interface for nonlocal binding, i.e. binding to an IP address that is not present on interfaces. This behaviour is enabled when SKF_FREEBIND socket flag is set. For Linux systems, it is implemented by IP_FREEBIND socket flag.
| |
| | Minor changes done by committer.
* | Test: Activate some remaining build tests    Ondrej Zajicek (work)    2022-01-05    1    -0/+10
| |
* | Netlink: Do not ignore dead routes from BIRD    Ondrej Zajicek (work)    2022-01-05    1    -4/+4
| | Currently, BIRD ignores dead routes and treats them as absent. But this also applies to its own routes, so it cannot correctly manage such routes in some cases. This patch makes an exception for routes with proto bird when ignoring dead routes, so they can be properly updated or removed.
| |
| | Thanks to Alexander Zubkov for the original patch.
* | Netlink: Improve multipath parsing errors    Ondrej Zajicek (work)    2022-01-05    1    -16/+25
| | Function nl_parse_multipath() should handle errors internally.
* | Conf: Fix parsing full-length IPv6 addresses    Ondrej Zajicek (work)    2022-01-05    2    -1/+21
| | Lexer expression for bytestring was too loose, accepting also full-length IPv6 addresses. It should be restricted such that colon is used between every byte or never.
| |
| | Fix the regex and also add some test cases for it.
| |
| | Thanks to Alexander Zubkov for the bug report.
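The "colon between every byte or never" rule can be sketched with two Python regular expressions. These patterns are illustrative assumptions, not the expressions from BIRD's lexer, and the names `LOOSE`, `STRICT` and `is_bytestring` are hypothetical.

```python
import re

HEX_BYTE = r"[0-9a-fA-F]{2}"

# Loose form: a colon may follow any byte, so a full-length IPv6 address
# (colon after every second byte) also matches.
LOOSE = re.compile(rf"(?:{HEX_BYTE}:?)+")

# Strict form: either a colon between every pair of bytes, or no colons at all.
STRICT = re.compile(rf"{HEX_BYTE}(?:(?::{HEX_BYTE})+|(?:{HEX_BYTE})+)?")

def is_bytestring(token):
    """Return True if `token` is a bytestring under the strict rule."""
    return STRICT.fullmatch(token) is not None
```

With the strict pattern, `01:23:45:67` and `01234567` are accepted, while `2001:0db8:0000:0000:0000:0000:0000:0001` is rejected and can be left for the IPv6 address rule.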
* | gitlab-ci.yml: failing gitlab runner fixed.    Matous    2022-01-05    1    -55/+55
| | 'registry.labs.nic.cz' -> 'registry.nic.cz' changed
* | Doc: Document min/max operators for lists    Alexander Zubkov    2021-12-28    1    -1/+10
| |
* | Filter: Add operators to find minimum and maximum element of sets    Alexander Zubkov    2021-12-28    7    -0/+238
| | Add operators .min and .max to find minimum or maximum element in sets of types: clist, eclist, lclist. Example usage:
| |
| |   bgp_community.min
| |   bgp_ext_community.max
| |   filter(bgp_large_community, [(as1, as2, *)]).min
| |
| | Signed-off-by: Alexander Zubkov <green@qrator.net>
* | Doc: Document community components access operators    Alexander Zubkov    2021-12-28    1    -0/+7
| |
* | Filter: Add operators to pick community components    Alexander Zubkov    2021-12-28    3    -8/+53
| | Add operators that can be used to pick components from pair (standard community) or lc (large community) types. For example:
| |
| |   (10, 20).asn --> 10
| |   (10, 20).data --> 20
| |
| |   (10, 20, 30).asn --> 10
| |   (10, 20, 30).data1 --> 20
| |   (10, 20, 30).data2 --> 30
| |
| | Signed-off-by: Alexander Zubkov <green@qrator.net>
* | BSD: Assume onlink flag on ifaces with only host addresses    Ondrej Zajicek (work)    2021-12-27    1    -3/+33
| | The BSD kernel does not support the onlink flag and BIRD does not use direct routes for next hop validation, but instead depends on interface address ranges. We would like to handle PtMP cases with only host addresses configured, like:
| |
| |   ifconfig wg0 192.168.0.10/32
| |   route add 192.168.0.4 -iface wg0
| |   route add 192.168.0.8 -iface wg0
| |
| | To accept BIRD routes with onlink next-hop, like:
| |
| |   route 192.168.42.0/24 via 192.168.0.4%wg0 onlink
| |
| | BIRD would dismiss the route when receiving it from the kernel, as the next-hop 192.168.0.4 is not part of any interface subnet and the onlink flag is not kept by the BSD kernel.
| |
| | The commit fixes this by assuming that for routes received from the kernel, any next-hop is onlink on ifaces with only host addresses.
| |
| | Thanks to Stefan Haller for the original patch.
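The rule in the last paragraph of that message can be sketched in Python as follows. This is only an illustration of the decision, not BIRD's implementation: the function name `next_hop_acceptable` is hypothetical, and treating an interface with no addresses as not acceptable is an assumption made for the sketch.

```python
import ipaddress

def next_hop_acceptable(iface_addrs, next_hop):
    """Decide whether a kernel route's next hop is usable on an interface."""
    nets = [ipaddress.ip_interface(a).network for a in iface_addrs]
    hop = ipaddress.ip_address(next_hop)
    # Usual case: the next hop lies inside one of the interface subnets.
    if any(hop in n for n in nets):
        return True
    # Only host addresses (/32 or /128) configured: assume the hop is onlink.
    # An interface without any address is rejected here (sketch assumption).
    return bool(nets) and all(n.prefixlen == n.max_prefixlen for n in nets)
```

With the wg0 setup from the message, `next_hop_acceptable(["192.168.0.10/32"], "192.168.0.4")` is true, while an interface holding `192.168.0.10/24` still rejects a next hop of `192.168.1.4`.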
* | RPKI: Add contextual out-of-bound checks in RTR Prefix PDU handler    Job Snijders    2021-12-18    1    -0/+27
| | RFC 6810 and RFC 8210 specify that the "Max Length" value MUST NOT be less than the Prefix Length element (underflow). On the other side, overflow of the Max Length element is also possible; being an 8-bit unsigned integer, it allows for values larger than 32 or 128. This also implicitly ensures there is no overflow of the "Length" value.
| |
| | When a PDU is received where the Max Length field is corrupted, the RTR client (BIRD) should immediately terminate the session, flush all data learned from that cache, and log an error for the operator.
| |
| | Minor changes done by committer.
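The two bounds described above can be sketched as a small Python check. The function name `check_prefix_pdu` and the returned messages are hypothetical; the sketch only mirrors the conditions stated in the commit message, not the actual handler.

```python
def check_prefix_pdu(prefix_len, max_len, ipv6=False):
    """Return None if the lengths are consistent, else an error string."""
    limit = 128 if ipv6 else 32
    # Underflow: Max Length must not be less than Prefix Length.
    if max_len < prefix_len:
        return "max length is less than prefix length"
    # Overflow: the 8-bit field can hold values beyond the address width.
    if max_len > limit:
        return "max length exceeds address width"
    # prefix_len <= max_len <= limit, so the prefix length is bounded as well.
    return None
```

A caller that gets a non-None result would, per the message above, terminate the session, flush the data learned from that cache, and log the error.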
* | Doc: bgp: remove "advertise ipv4"    Simon Ruderich    2021-12-18    1    -7/+0
| | The option was removed in d15b0b0a ("BGP redesign", 2016-12-07) but the documentation wasn't updated.