# GoFlow2

[Build Status](https://github.com/netsampler/goflow2/actions?query=workflow%3ABuild)
[Go Reference](https://pkg.go.dev/github.com/netsampler/goflow2)

This application is a NetFlow/IPFIX/sFlow collector in Go.

It gathers network information (IP, interfaces, routers) from different flow protocols
and serializes it in a common format.

You will want to use GoFlow2 if:

* You receive a decent amount of network samples and need horizontal scalability
* You have protocol diversity and need a consistent format
* You require raw samples and want to build your own aggregation and enrichment

This software is the entry point of a pipeline. The storage, transport, enrichment, graphing and alerting are
not provided.

## Origins

This work is a fork of a previous [open-source GoFlow code](https://github.com/cloudflare/goflow) built and used at Cloudflare.
It lives in its own GitHub organization to be maintained more easily.

Among the differences with the original code:
the serializer and transport options have been revamped to make this program more user-friendly
and to target new use-cases like logging providers, while the decoding libraries received only minimal changes.

## Modularity

In order to enable load-balancing and optimizations, the GoFlow2 library has a `decoder` which converts
the payload of a flow packet into a Go structure.

The `producer` functions (one per protocol) then convert those structures into a protobuf (`pb/flow.pb`)
which contains the fields a network engineer is interested in.
Since a flow packet usually contains multiple samples, the protobuf acts as an abstraction of a single sample.

The `format` directory offers various utilities to process the protobuf. It can convert
the messages into other representations such as JSON.

The `transport` directory provides different ways of shipping the protobuf: sending it via Kafka or
writing it to a file (or stdout).

GoFlow2 is a wrapper around all of these functions and chains them together.

You can build your own collector using this base and replace parts (see the sketch after this list):

* Use a different transport (e.g: RabbitMQ instead of Kafka)
* Convert to another format (e.g: Cap'n Proto or Avro instead of protobuf)
* Decode different samples (e.g: not only IP networks, add MPLS)
* Use a different metrics system (e.g: [OpenTelemetry](https://opentelemetry.io/))
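
As an illustration of the transport abstraction, here is a minimal, hypothetical sketch of a replacement transport. The `FlowTransport` interface and `stdoutTransport` type are assumptions made for this example, not the exact goflow2 interfaces; check the `transport` package for the real ones.

```go
package main

import "fmt"

// FlowTransport is a hypothetical stand-in for the transport abstraction:
// it receives each serialized flow record (protobuf or JSON) and ships it
// somewhere. A custom collector could implement it with RabbitMQ, a socket,
// a database client, etc.
type FlowTransport interface {
	Send(key, data []byte) error
	Close() error
}

// stdoutTransport is the simplest possible implementation: print each record.
type stdoutTransport struct{}

func (stdoutTransport) Send(key, data []byte) error {
	_, err := fmt.Println(string(data))
	return err
}

func (stdoutTransport) Close() error { return nil }

func main() {
	var t FlowTransport = stdoutTransport{}
	_ = t.Send(nil, []byte(`{"Type":"SFLOW_5","Bytes":70}`))
	_ = t.Close()
}
```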

### Protocol differences

The sampling protocols have distinct features:

**sFlow** is a stateless protocol which sends the full header of a packet with router information
(interfaces, destination AS) while **NetFlow/IPFIX** rely on templates that contain fields (e.g: source IPv6).

The sampling rate in NetFlow/IPFIX is provided by **Option Data Sets**. This is why it can take a few minutes
for the packets to be decoded until all the templates are received (**Option Template** and **Data Template**).

Both of these protocols bundle multiple samples (**Data Set** in NetFlow/IPFIX and **Flow Sample** in sFlow)
in one packet.

The advantage of using an abstract network flow format, such as protobuf, is that it enables summing across
protocols (e.g: per ASN or per port, rather than per (ASN, router) and (port, router)).

To read more about the protocols and how they are mapped internally, check out this [page](/docs/protocols.md).

### Features of GoFlow2

Collection:

* NetFlow v5
* IPFIX/NetFlow v9 (sampling rate provided by the Option Data Set)
* sFlow v5

(adding NetFlow v1, v7 and v8 is being evaluated)

Production:

* Converts to protobuf or JSON
* Prints to the console or a file
* Sends to Kafka (with optional partitioning)

Monitoring via Prometheus metrics

## Get started

To read about agents that sample network traffic, check this [page](/docs/agents.md).

To set up the collector, download the latest release corresponding to your OS
and run the following command (the binaries have a version suffix):

```bash
$ ./goflow2
```

By default, this command will launch an sFlow collector on port `:6343` and
a NetFlowV9/IPFIX collector on port `:2055`.

By default, the samples received will be printed in JSON format to stdout.

```json
{
    "Type": "SFLOW_5",
    "TimeFlowEnd": 1621820000,
    "TimeFlowStart": 1621820000,
    "TimeReceived": 1621820000,
    "Bytes": 70,
    "Packets": 1,
    "SamplingRate": 100,
    "SamplerAddress": "192.168.1.254",
    "DstAddr": "10.0.0.1",
    "DstMac": "ff:ff:ff:ff:ff:ff",
    "SrcAddr": "192.168.1.1",
    "SrcMac": "ff:ff:ff:ff:ff:ff",
    "InIf": 1,
    "OutIf": 2,
    "Etype": 2048,
    "EtypeName": "IPv4",
    "Proto": 6,
    "ProtoName": "TCP",
    "SrcPort": 443,
    "DstPort": 46344,
    "FragmentId": 54044,
    "FragmentOffset": 16384,
    ...
    "IPTTL": 64,
    "IPTos": 0,
    "TCPFlags": 16
}
```
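
Since each record is a standalone JSON object, it can be post-processed with standard tooling.
For example, multiplying `Bytes` by `SamplingRate` estimates the traffic volume a sample represents.
A minimal sketch assuming `jq` is installed and that flow records are the only data written to stdout:

```bash
$ ./goflow2 | jq '{src: .SrcAddr, dst: .DstAddr, estimated_bytes: (.Bytes * .SamplingRate)}'
```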

If you are using a log integration (e.g: Loki with Promtail, Splunk, Fluentd, Google Cloud Logs, etc.),
just send the output to a file:

```bash
$ ./goflow2 -transport.file /var/logs/goflow2.log
```

To enable Kafka and send protobuf, use the following arguments:

```bash
$ ./goflow2 -transport=kafka -transport.kafka.brokers=localhost:9092 -transport.kafka.topic=flows -format=pb
```
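
To verify that records are arriving, you can consume the topic with any Kafka client.
For example, a quick check with `kcat` (formerly kafkacat), assuming it is installed and pointed at the same broker:

```bash
# the payload is binary protobuf, so expect non-printable output
$ kcat -b localhost:9092 -t flows -C
```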

By default, the distribution will be randomized.
To partition the feed (any field of the protobuf is available), the following options can be used:

```
-transport.kafka.hashing=true \
-format.hash=SamplerAddress,DstAS
```
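
Conceptually, hashing-based partitioning derives a key from the selected fields and maps it to a partition,
so all flows from the same sampler (and destination AS) land on the same partition.
The sketch below is purely illustrative and not goflow2's actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor illustrates key-based partitioning: build a key from the
// chosen fields, hash it, and reduce it modulo the number of partitions.
func partitionFor(samplerAddress string, dstAS uint32, numPartitions uint32) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s-%d", samplerAddress, dstAS)
	return h.Sum32() % numPartitions
}

func main() {
	fmt.Println(partitionFor("192.168.1.254", 65001, 12))
}
```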

By default, compression is disabled when sending data to Kafka.
To change the Kafka compression type on the producer side, configure the following option:

```
-transport.kafka.compression.type=gzip
```

The list of codecs is available in the [Sarama documentation](https://pkg.go.dev/github.com/Shopify/sarama#CompressionCodec).

By default, the collector will listen for IPFIX/NetFlow v9 on port 2055
and sFlow on port 6343.
To change the socket bindings, set the `-listen` argument to a URI
for each protocol (`netflow`, `sflow` and `nfl` as the scheme), separated by commas.
For instance, to create 4 parallel sockets for sFlow and one for NetFlow v5, you can use:

```bash
$ ./goflow2 -listen 'sflow://:6343?count=4,nfl://:2055'
```
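
Similarly, to move NetFlow v9/IPFIX to a non-default port (the port numbers here are only an example), the `netflow` scheme can be used:

```bash
$ ./goflow2 -listen 'netflow://:2056,sflow://:6343'
```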

### Docker

You can also run directly with a container:

```
$ sudo docker run -p 6343:6343/udp -p 2055:2055/udp -ti netsampler/goflow2:latest
```
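
Command-line flags can be appended to the container invocation, assuming the image's entrypoint is the `goflow2` binary (a sketch; adjust the flags and broker address to your setup):

```
$ sudo docker run -p 2055:2055/udp -ti netsampler/goflow2:latest -format=pb -transport=kafka -transport.kafka.brokers=kafka:9092 -transport.kafka.topic=flows
```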

### Mapping extra fields

In the case of exotic template fields or extra payload not supported by GoFlow2
out of the box, it is possible to pass a mapping file using `-mapping mapping.yaml`.
A [sample file](cmd/goflow2/mapping.yaml) is available in the `cmd/goflow2` directory.

For instance, certain devices producing IPFIX use `ingressPhysicalInterface` (id: 252)
and do not use `ingressInterface` (id: 10). Using the following mapping, you can have the interface mapped
to the InIf protobuf field without changing the code.

```yaml
ipfix:
  mapping:
    - field: 252
      destination: InIf
    - field: 253
      destination: OutIf
```

### Output format considerations

The JSON format is advised only when consuming a small amount of data directly.
For bigger workloads, the protobuf output format provides a binary representation
and is preferred.
It can also be extended with enrichment as long as the user keeps the same field IDs.

If you want to develop applications, build `pb/flow.proto` into the language you want.
When adding custom fields, picking a field ID ≥ 1000 is suggested.
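
For example, generating Go bindings could look like the following, assuming `protoc` and the `protoc-gen-go` plugin are installed (paths and options may vary):

```bash
$ protoc --go_opt=paths=source_relative --go_out=. pb/flow.proto
```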

Check the docs for more information about [compiling protobuf](/docs/protobuf.md).

## Flow Pipeline

A basic enrichment tool is available in the `cmd/enricher` directory.
You need to load the Maxmind GeoIP ASN and Country databases using `-db.asn` and `-db.country`.

Running a flow enrichment system is as simple as a pipe.
Once you plug the stdin of the enricher into the stdout of GoFlow2 in protobuf mode,
the source and destination IP addresses will automatically be mapped
against a database of Autonomous System Numbers and Countries.
Similar output options to GoFlow2 are provided.

```bash
$ ./goflow2 -transport.file.sep= -format=pb -format.protobuf.fixedlen=true | ./enricher -db.asn path-to/GeoLite2-ASN.mmdb -db.country path-to/GeoLite2-Country.mmdb
```

For a more scalable production setting, Kafka and protobuf are recommended.
Stream operations (aggregation and filtering) can be done with stream-processing tools,
for instance Flink, or the more recent Kafka Streams and ksqlDB.
Direct storage can be done with data warehouses like Clickhouse.

In some cases, the consumer will require protobuf messages to be prefixed by
their length. To do this, use the flag `-format.protobuf.fixedlen=true`.

This repository contains [examples of pipelines](./compose) with docker-compose.
The available pipelines are:

* [Kafka+Clickhouse+Grafana](./compose/kcg)
* [Logstash+Elastic+Kibana](./compose/elk)

## User stories

Are you using GoFlow2 in production at scale? Add yourself here!

### Contributions

This project welcomes pull-requests, whether it's documentation,
instrumentation (e.g: docker-compose, metrics), internals (protocol libraries),
integration (new CLI features) or anything else!
Just make sure to discuss the use-case in an issue first.

This software would not exist without the testing and commits from
its users and [contributors](docs/contributors.md).

## License

Licensed under the BSD-3 License.