# GoFlow2

[![Build Status](https://github.com/netsampler/goflow2/workflows/Build/badge.svg)](https://github.com/netsampler/goflow2/actions?query=workflow%3ABuild)
[![Go Reference](https://pkg.go.dev/badge/github.com/netsampler/goflow2.svg)](https://pkg.go.dev/github.com/netsampler/goflow2)

This application is a NetFlow/IPFIX/sFlow collector in Go. It gathers network information (IP, interfaces, routers) from different flow protocols and serializes it in a common format.

You will want to use GoFlow2 if:
* You receive a decent amount of network samples and need horizontal scalability
* You have protocol diversity and need a consistent format
* You require raw samples and build your own aggregation and custom enrichment

This software is the entry point of a pipeline. The storage, transport, enrichment, graphing and alerting parts are not provided.

![GoFlow2 System diagram](/graphics/diagram.png)

## Origins

This work is a fork of a previous [open-source GoFlow code](https://github.com/cloudflare/goflow) built and used at Cloudflare. It lives in its own GitHub organization to be maintained more easily.

Among the differences with the original code: the serializer and transport options have been revamped to make this program more user-friendly and to target new use-cases such as logging providers. The decoding libraries received only minimal changes.

## Modularity

In order to enable load-balancing and optimizations, the GoFlow2 library has a `decoder` which converts the payload of a flow packet into a Go structure.

The `producer` functions (one per protocol) then convert those structures into a protobuf (`pb/flow.pb`) which contains the fields a network engineer is interested in. A flow packet usually contains multiple samples; the protobuf message acts as an abstraction of a single sample.

The `format` directory offers various utilities to process the protobuf, for instance converting it to JSON.

The `transport` directory provides different ways of shipping the result: sending it via Kafka or writing it to a file (or stdout).

GoFlow2 wraps all these functions and chains them together. You can build your own collector using this base and replace parts:
* Use a different transport (e.g: RabbitMQ instead of Kafka)
* Convert to another format (e.g: Cap'n Proto or Avro instead of protobuf)
* Decode different samples (e.g: not only IP networks, add MPLS)
* Use a different metrics system (e.g: [OpenTelemetry](https://opentelemetry.io/))

### Protocol difference

The sampling protocols have distinct features: **sFlow** is a stateless protocol which sends the full header of a packet with router information (interfaces, destination AS), while **NetFlow/IPFIX** relies on templates that describe the fields (e.g: source IPv6).

The sampling rate in NetFlow/IPFIX is provided by **Option Data Sets**. This is why it can take a few minutes before packets can be decoded: the collector has to wait until all the templates (**Option Template** and **Data Template**) are received.

Both of these protocols bundle multiple samples in one packet (**Data Set** in NetFlow/IPFIX and **Flow Sample** in sFlow).

The advantage of using an abstract network flow format such as protobuf is that it enables summing over the protocols (e.g: per ASN or per port, rather than per (ASN, router) and (port, router)).
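Because every protocol is normalized into the same flow record, a downstream consumer can aggregate across routers and protocols with a single piece of code. As a minimal illustration (not part of GoFlow2), the following Go sketch reads the default JSON output on stdin and sums bytes per destination port, regardless of which sampler produced each record; the field names match the sample output shown in the Get started section below.

```go
package main

// Minimal sketch (not part of GoFlow2): aggregate the default JSON
// output per destination port, across all samplers and protocols.
// Usage: ./goflow2 | go run aggregate.go

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// Only the fields needed for this aggregation; other JSON keys are ignored.
type flowRecord struct {
	DstPort      uint16 `json:"DstPort"`
	Bytes        uint64 `json:"Bytes"`
	SamplingRate uint64 `json:"SamplingRate"`
}

func main() {
	totals := make(map[uint16]uint64)
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long lines
	for scanner.Scan() {
		var rec flowRecord
		if err := json.Unmarshal(scanner.Bytes(), &rec); err != nil {
			continue // skip lines that are not flow records
		}
		rate := rec.SamplingRate
		if rate == 0 {
			rate = 1
		}
		// Extrapolate the sampled bytes using the sampling rate.
		totals[rec.DstPort] += rec.Bytes * rate
	}
	for port, bytes := range totals {
		fmt.Printf("port %d: %d bytes\n", port, bytes)
	}
}
```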
To read more about the protocols and how they are mapped inside, check out [this page](/docs/protocols.md).

### Features of GoFlow2

Collection:
* NetFlow v5
* IPFIX/NetFlow v9 (sampling rate provided by the Option Data Set)
* sFlow v5

(adding NetFlow v1, 7 and 8 is being evaluated)

Production:
* Convert to protobuf or JSON
* Print to the console or a file
* Send to Kafka (with optional partitioning)

Monitoring is available via Prometheus metrics.

## Get started

To read about agents that sample network traffic, check this [page](/docs/agents.md).

To set up the collector, download the latest release corresponding to your OS and run the following command (the binaries have a suffix with the version):

```bash
$ ./goflow2
```

By default, this command launches an sFlow collector on port `:6343` and a NetFlow v9/IPFIX collector on port `:2055`, and prints the received samples in JSON format on stdout.

```json
{
    "Type": "SFLOW_5",
    "TimeFlowEnd": 1621820000,
    "TimeFlowStart": 1621820000,
    "TimeReceived": 1621820000,
    "Bytes": 70,
    "Packets": 1,
    "SamplingRate": 100,
    "SamplerAddress": "192.168.1.254",
    "DstAddr": "10.0.0.1",
    "DstMac": "ff:ff:ff:ff:ff:ff",
    "SrcAddr": "192.168.1.1",
    "SrcMac": "ff:ff:ff:ff:ff:ff",
    "InIf": 1,
    "OutIf": 2,
    "Etype": 2048,
    "EtypeName": "IPv4",
    "Proto": 6,
    "ProtoName": "TCP",
    "SrcPort": 443,
    "DstPort": 46344,
    "FragmentId": 54044,
    "FragmentOffset": 16384,
    ...
    "IPTTL": 64,
    "IPTos": 0,
    "TCPFlags": 16
}
```

If you are using a log integration (e.g: Loki with Promtail, Splunk, Fluentd, Google Cloud Logs, etc.), just send the output to a file:

```bash
$ ./goflow2 -transport.file /var/logs/goflow2.log
```

To enable Kafka and send protobuf, use the following arguments:

```bash
$ ./goflow2 -transport=kafka \
    -transport.kafka.brokers=localhost:9092 \
    -transport.kafka.topic=flows \
    -format=pb
```

By default, the distribution will be randomized. To partition the feed (any field of the protobuf is available), the following options can be used:

```
-transport.kafka.hashing=true \
-format.hash=SamplerAddress,DstAS
```

### Docker

You can also run directly with a container:

```
$ sudo docker run -p 6343:6343/udp -p 2055:2055/udp -ti netsampler/goflow2:latest
```

### Mapping extra fields

In the case of exotic template fields or extra payload not supported by GoFlow2 out of the box, it is possible to pass a mapping file using `-mapping mapping.yaml`. A [sample file](cmd/goflow2/mapping.yaml) is available in the `cmd/goflow2` directory.

For instance, certain devices producing IPFIX use `ingressPhysicalInterface` (id: 252) and do not use `ingressInterface` (id: 10). With the following mapping, the interface is mapped into the `InIf` protobuf field without changing the code:

```yaml
ipfix:
  mapping:
    - field: 252
      destination: InIf
    - field: 253
      destination: OutIf
```

### Output format considerations

The JSON format is advised only when consuming a small amount of data directly. For bigger workloads, the protobuf output format provides a binary representation and is preferred. It can also be extended with enrichment as long as the user keeps the same field IDs.

If you want to develop applications, build `pb/flow.proto` into the language you want. When adding custom fields, picking a field ID ≥ 1000 is suggested. Check the docs for more information about [compiling protobuf](/docs/protobuf.md).
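To give an idea of what a protobuf consumer looks like, here is a hedged Go sketch. It assumes `pb/flow.proto` has been compiled into a local package (imported here under the hypothetical path `example.com/yourapp/flowpb`) and that the length prefix written by `-format.protobuf.fixedlen=true` (introduced in the next section) is a protobuf varint, as used by `protodelim`; verify the exact framing against the [protobuf docs](/docs/protobuf.md) before relying on it.

```go
package main

// Hedged sketch of a protobuf consumer. Assumptions (verify before use):
//  - pb/flow.proto was compiled into the package imported as "flowpb" below;
//  - messages are varint length-prefixed (protodelim-style framing).

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"log"
	"os"

	"google.golang.org/protobuf/encoding/protodelim"

	flowpb "example.com/yourapp/flowpb" // hypothetical import path
)

func main() {
	r := bufio.NewReader(os.Stdin)
	for {
		var msg flowpb.FlowMessage
		if err := protodelim.UnmarshalFrom(r, &msg); err != nil {
			if errors.Is(err, io.EOF) {
				return // clean end of stream
			}
			log.Fatalf("decode: %v", err)
		}
		// Type and Bytes are fields of the flow protobuf message.
		fmt.Printf("%v: %d bytes\n", msg.Type, msg.Bytes)
	}
}
```

With such a consumer, the pipe shown in the next section could end in your own program instead of the enricher.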
## Flow Pipeline

A basic enrichment tool is available in the `cmd/enricher` directory. You need to load the MaxMind GeoIP ASN and Country databases using `-db.asn` and `-db.country`.

Running a flow enrichment system is as simple as a pipe. Once you plug the stdin of the enricher into the stdout of GoFlow2 in protobuf mode, the source and destination IP addresses will automatically be mapped to an Autonomous System Number and a country. Similar output options to GoFlow2 are provided.

```bash
$ ./goflow2 -transport.file.sep= -format=pb -format.protobuf.fixedlen=true | \
    ./enricher -db.asn path-to/GeoLite2-ASN.mmdb -db.country path-to/GeoLite2-Country.mmdb
```

For a more scalable production setting, Kafka and protobuf are recommended. Stream operations (aggregation and filtering) can be done with stream-processing tools, for instance Flink, or the more recent Kafka Streams and ksqlDB. Direct storage can be done with data warehouses like ClickHouse.

In some cases, the consumer will require protobuf messages to be prefixed by their length. To do this, use the flag `-format.protobuf.fixedlen=true`.

This repository contains [examples of pipelines](./compose) with docker-compose. The available pipelines are:
* [Kafka+Clickhouse+Grafana](./compose/kcg)
* [Logstash+Elastic+Kibana](./compose/elk)

## User stories

Are you using GoFlow2 in production at scale? Add yourself here!

### Contributions

This project welcomes pull-requests, whether it's documentation, instrumentation (e.g: docker-compose, metrics), internals (protocol libraries), integration (new CLI features) or else! Just make sure to check for the use-cases via an issue.

This software would not exist without the testing and commits from its users and [contributors](docs/contributors.md).

## License

Licensed under the BSD-3 License.