# Reed-Solomon
[![GoDoc][1]][2] [![Build Status][3]][4]

[1]: https://godoc.org/github.com/klauspost/reedsolomon?status.svg
[2]: https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc
[3]: https://travis-ci.org/klauspost/reedsolomon.svg?branch=master
[4]: https://travis-ci.org/klauspost/reedsolomon

Reed-Solomon Erasure Coding in Go, with speeds exceeding 1GB/s per CPU core, implemented in pure Go.

This is a Go port of the [JavaReedSolomon](https://github.com/Backblaze/JavaReedSolomon) library released by
[Backblaze](http://backblaze.com), with some additional optimizations.

For an introduction to erasure coding, see the post on the [Backblaze blog](https://www.backblaze.com/blog/reed-solomon/).

Package home: https://github.com/klauspost/reedsolomon

Godoc: https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc

# Installation
To get the package use the standard:
```bash
go get -u github.com/klauspost/reedsolomon
```

# Changes

## May 2020

* ARM64 optimizations, up to 2.5x faster.
* Added [WithFastOneParityMatrix](https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc#WithFastOneParityMatrix) for faster operation with 1 parity shard.
* Much better performance when using a limited number of goroutines.
* AVX512 now uses multiple cores.
* Stream processing overhaul, big speedups in most cases.
* AVX512 optimizations.

## March 6, 2019

The pure Go implementation is about 30% faster. Minor tweaks to assembler implementations.

## February 8, 2019

AVX512 accelerated version added for Intel Skylake CPUs. This can give up to a 4x speed improvement compared to AVX2.
See [here](https://github.com/klauspost/reedsolomon#performance-on-avx512) for more details.

## December 18, 2018

Assembly code for ppc64le has been contributed; this boosts performance by about 10x on this platform.

## November 18, 2017

Added [WithAutoGoroutines](https://godoc.org/github.com/klauspost/reedsolomon#WithAutoGoroutines), which will attempt
to calculate the optimal number of goroutines to use based on your expected shard size and detected CPU.

## October 1, 2017

* [Cauchy Matrix](https://godoc.org/github.com/klauspost/reedsolomon#WithCauchyMatrix) is now an option.
Thanks to [templexxx](https://github.com/templexxx) for the basis of this.

* The default maximum number of [goroutines](https://godoc.org/github.com/klauspost/reedsolomon#WithMaxGoroutines)
has been increased for better multi-core scaling.

* After several requests, `Reconstruct` and `ReconstructData` now allow slices of zero length but sufficient capacity
to be used instead of allocating new memory.

## August 26, 2017

* The [`Encoder()`](https://godoc.org/github.com/klauspost/reedsolomon#Encoder) interface now contains an `Update`
function contributed by [chenzhongtao](https://github.com/chenzhongtao).

* [Frank Wessels](https://github.com/fwessels) kindly contributed ARM 64 bit assembly,
which gives a huge performance boost on this platform.

## July 20, 2017

`ReconstructData` added to the [`Encoder`](https://godoc.org/github.com/klauspost/reedsolomon#Encoder) interface.
This can cause compatibility issues if you implement your own Encoder. A simple workaround can be added:

```Go
func (e *YourEnc) ReconstructData(shards [][]byte) error {
	return ReconstructData(shards)
}
```

You can of course also do your own implementation.
The [`StreamEncoder`](https://godoc.org/github.com/klauspost/reedsolomon#StreamEncoder)
handles this without modifying the interface.
This is a good lesson on why returning interfaces is not a good design.

# Usage

This section assumes you know the basics of Reed-Solomon encoding.
A good start is this [Backblaze blog post](https://www.backblaze.com/blog/reed-solomon/).

This package performs the calculation of the parity sets. The usage is therefore relatively simple.

First of all, you need to choose your distribution of data and parity shards.
A 'good' distribution is very subjective, and will depend a lot on your usage scenario.
A good starting point is above 5 and below 257 data shards (256 is the maximum supported number),
and the number of parity shards to be 2 or above, and below the number of data shards.

To create an encoder with 10 data shards (where your data goes) and 3 parity shards (calculated):
```Go
enc, err := reedsolomon.New(10, 3)
```
This encoder will work for all parity sets with this distribution of data and parity shards.
The error will only be set if you specify 0 or negative values in any of the parameters,
or if you specify more than 256 data shards.

If you will primarily be using it with one shard size, it is recommended to use
[`WithAutoGoroutines(shardSize)`](https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc#WithAutoGoroutines)
as an additional parameter. This will attempt to calculate the optimal number of goroutines to use for the best speed.
It is not required that all shards are this size.

The data you send and receive is a simple slice of byte slices: `[][]byte`.
In the example above, the top slice must have a length of 13.

```Go
data := make([][]byte, 13)
```
You should then fill the 10 first slices with *equally sized* data,
and create parity shards that will be populated with parity data. In this case we create the data in memory,
but you could for instance also use [mmap](https://github.com/edsrzf/mmap-go) to map files.

```Go
// Create all shards, size them at 50000 each
for i := range data {
	data[i] = make([]byte, 50000)
}

// Fill some data into the data shards
for i, in := range data[:10] {
	for j := range in {
		in[j] = byte((i + j) & 0xff)
	}
}
```

To populate the parity shards, you simply call `Encode()` with your data.
```Go
err = enc.Encode(data)
```
The only case where you should get an error is if the data shards aren't of equal size.
The last 3 shards now contain parity data. You can verify this by calling `Verify()`:

```Go
ok, err = enc.Verify(data)
```

The final (and important) part is to be able to reconstruct missing shards.
For this to work, you need to know which parts of your data are missing.
The encoder *does not know which parts are invalid*, so if data corruption is a likely scenario,
you need to implement a hash check for each shard.

If a byte has changed in your set, and you don't know which shard it is in, there is no way to reconstruct the data set.
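The package does not ship such a check, so below is a minimal sketch of one possible approach, assuming a SHA-256 checksum is stored alongside each shard; `checksum` and `dropCorrupt` are hypothetical helpers, not part of this library:

```Go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// checksum returns the SHA-256 digest of a shard.
func checksum(shard []byte) []byte {
	sum := sha256.Sum256(shard)
	return sum[:]
}

// dropCorrupt nils out every shard that no longer matches its stored
// checksum, so Reconstruct() will treat it as missing and recreate it.
func dropCorrupt(shards, sums [][]byte) {
	for i, shard := range shards {
		if shard != nil && !bytes.Equal(checksum(shard), sums[i]) {
			shards[i] = nil
		}
	}
}

func main() {
	shards := [][]byte{[]byte("shard-0"), []byte("shard-1")}
	sums := [][]byte{checksum(shards[0]), checksum(shards[1])}

	shards[1][0] ^= 0xff // simulate silent corruption
	dropCorrupt(shards, sums)
	fmt.Println(shards[1] == nil) // true: ready for enc.Reconstruct(shards)
}
```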
To indicate missing data, you set the shard to nil before calling `Reconstruct()`:

```Go
// Delete two data shards
data[3] = nil
data[7] = nil

// Reconstruct the missing shards
err := enc.Reconstruct(data)
```
The missing data and parity shards will be recreated. If more than 3 shards are missing, the reconstruction will fail.

If you are only interested in the data shards (for reading purposes) you can call `ReconstructData()`:

```Go
// Delete two data shards
data[3] = nil
data[7] = nil

// Reconstruct just the missing data shards
err := enc.ReconstructData(data)
```

So to sum up reconstruction:
* The number of data/parity shards must match the numbers used for encoding.
* The order of shards must be the same as used when encoding.
* You may only supply data you know is valid.
* Invalid shards should be set to nil.

For complete examples of an encoder and decoder see the
[examples folder](https://github.com/klauspost/reedsolomon/tree/master/examples).

# Splitting/Joining Data

You might have a large slice of data.
To help you split this, there are some helper functions that can split and join a single byte slice.

```Go
bigfile, _ := ioutil.ReadFile("myfile.data")

// Split the file
split, err := enc.Split(bigfile)
```
This will split the file into the number of data shards set when creating the encoder and create empty parity shards.

An important thing to note is that you have to *keep track of the exact input size*.
If the size of the input isn't divisible by the number of data shards, extra zeros will be inserted in the last shard.

To join a data set, use the `Join()` function, which will join the shards and write the result to the `io.Writer` you supply:
```Go
// Join a data set and write it to io.Discard.
err = enc.Join(io.Discard, data, len(bigfile))
```
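Putting the helpers together, here is a minimal sketch of a full split/encode/join round trip; `myfile.data` is a placeholder name and error handling is kept short:

```Go
package main

import (
	"bytes"
	"io/ioutil"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	enc, err := reedsolomon.New(10, 3)
	if err != nil {
		log.Fatal(err)
	}

	bigfile, err := ioutil.ReadFile("myfile.data") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}

	// Split into 10 data shards plus 3 empty parity shards, then encode.
	shards, err := enc.Split(bigfile)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Join back, using the original length to drop the zero padding.
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(bigfile)); err != nil {
		log.Fatal(err)
	}
	// out.Bytes() now equals bigfile.
}
```

Note how `len(bigfile)` is carried through to `Join()`; without the original size, the zero padding in the last shard could not be stripped.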
# Streaming/Merging

It might seem like a limitation that all data should be in memory,
but an important property is that *as long as the number of data/parity shards is the same,
you can merge/split data sets*, and they will remain valid as a separate set.

```Go
// Split the data set of 50000 elements into two of 25000
splitA := make([][]byte, 13)
splitB := make([][]byte, 13)

// Merge into a 100000 element set
merged := make([][]byte, 13)

for i := range data {
	splitA[i] = data[i][:25000]
	splitB[i] = data[i][25000:]

	// Concatenate it to itself
	merged[i] = append(make([]byte, 0, len(data[i])*2), data[i]...)
	merged[i] = append(merged[i], data[i]...)
}

// Each part should still verify as ok.
ok, err := enc.Verify(splitA)
if ok && err == nil {
	log.Println("splitA ok")
}

ok, err = enc.Verify(splitB)
if ok && err == nil {
	log.Println("splitB ok")
}

ok, err = enc.Verify(merged)
if ok && err == nil {
	log.Println("merged ok")
}
```

This means that if you have a data set that may not fit into memory, you can split processing into smaller blocks.
For the best throughput, don't use too small blocks.

This also means that you can divide big input up into smaller blocks, and do reconstruction on parts of your data.
This doesn't give the same flexibility as a higher number of data shards, but it will be much more performant.

# Streaming API

Support for a streaming API has been added, which enables you to perform the same operations, but on streams.
To use the stream API, use the [`NewStream`](https://godoc.org/github.com/klauspost/reedsolomon#NewStream) function
to create the encoding/decoding interfaces.

You can use [`WithConcurrentStreams`](https://godoc.org/github.com/klauspost/reedsolomon#WithConcurrentStreams)
to create an interface that reads/writes concurrently from the streams.

You can specify the size of each operation using
[`WithStreamBlockSize`](https://godoc.org/github.com/klauspost/reedsolomon#WithStreamBlockSize).
This will set the size of each read/write operation.

Input is delivered as `[]io.Reader`, output as `[]io.Writer`, and functionality corresponds to the in-memory API.
Each stream must supply the same amount of data, similar to how each slice must be the same size with the in-memory API.
If an error occurs in relation to a stream,
a [`StreamReadError`](https://godoc.org/github.com/klauspost/reedsolomon#StreamReadError)
or [`StreamWriteError`](https://godoc.org/github.com/klauspost/reedsolomon#StreamWriteError)
will help you determine which stream was the offender.

There is no buffering or timeouts/retry specified. If you want to add that, you need to add it to the Reader/Writer.
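To illustrate the shape of the stream API, here is a minimal, hedged sketch where in-memory `bytes` buffers stand in for what would normally be files or network streams; the shard counts are arbitrary:

```Go
package main

import (
	"bytes"
	"io"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 4 data and 2 parity shards, chosen arbitrarily for this sketch.
	enc, err := reedsolomon.NewStream(4, 2)
	if err != nil {
		log.Fatal(err)
	}

	src := []byte("example payload; in real use this would be a large file")

	// Split the source stream into 4 data shard streams.
	var dataBufs [4]bytes.Buffer
	dst := make([]io.Writer, 4)
	for i := range dst {
		dst[i] = &dataBufs[i]
	}
	if err := enc.Split(bytes.NewReader(src), dst, int64(len(src))); err != nil {
		log.Fatal(err)
	}

	// Encode parity by reading the data shards back as streams.
	data := make([]io.Reader, 4)
	for i := range data {
		data[i] = bytes.NewReader(dataBufs[i].Bytes())
	}
	var parityBufs [2]bytes.Buffer
	if err := enc.Encode(data, []io.Writer{&parityBufs[0], &parityBufs[1]}); err != nil {
		log.Fatal(err)
	}
	log.Println("parity shards written to the two parity buffers")
}
```

On failure, `Encode` may wrap the cause in a `StreamReadError`; its `Stream` field holds the index of the failing reader.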
For complete examples of a streaming encoder and decoder see the
[examples folder](https://github.com/klauspost/reedsolomon/tree/master/examples).

# Advanced Options

You can modify internal options which affect how jobs are split between and processed by goroutines.

To create options, use the WithXXX functions. You can supply options to `New` and `NewStream`.
If no options are supplied, default options are used.

Example of how to supply options:

```Go
enc, err := reedsolomon.New(10, 3, reedsolomon.WithMaxGoroutines(25))
```
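Options can be combined. As an illustrative sketch (the specific values are arbitrary), this pairs a Cauchy matrix with a goroutine count tuned for a known shard size:

```Go
package main

import (
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Illustrative values: a Cauchy instead of a Vandermonde-based matrix,
	// and goroutines tuned for an expected shard size of 50000 bytes.
	enc, err := reedsolomon.New(10, 3,
		reedsolomon.WithCauchyMatrix(),
		reedsolomon.WithAutoGoroutines(50000),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = enc
}
```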
# Performance

Performance depends mainly on the number of parity shards.
In rough terms, doubling the number of parity shards will double the encoding time.

Here are the throughput numbers with some different selections of data and parity shards.
For reference, each shard is 1MB of random data, and 2 CPU cores are used for encoding.

| Data | Parity | Parity % | Pure Go MB/s | SSSE3 MB/s | SSSE3 Speed | Rel. Speed |
|------|--------|----------|--------------|------------|-------------|------------|
| 5    | 2      | 40%      | 576.11       | 2599.20    | 451%        | 100.00%    |
| 10   | 2      | 20%      | 587.73       | 3100.28    | 528%        | 102.02%    |
| 10   | 4      | 40%      | 298.38       | 2470.97    | 828%        | 51.79%     |
| 50   | 20     | 40%      | 59.81        | 713.28     | 1193%       | 10.38%     |

If `runtime.GOMAXPROCS()` is set to a value higher than 1,
the encoder will use multiple goroutines to perform the calculations in `Verify`, `Encode` and `Reconstruct`.

Example of performance scaling on an AMD Ryzen 3950X (16 physical cores, 32 logical cores, AVX2).
The example uses 10 blocks with 1MB data each and 4 parity blocks.

| Threads | Speed      |
|---------|------------|
| 1       | 9979 MB/s  |
| 2       | 18870 MB/s |
| 4       | 33697 MB/s |
| 8       | 51531 MB/s |
| 16      | 59204 MB/s |

Benchmarking `Reconstruct()` followed by a `Verify()` (=`all`) versus just calling `ReconstructData()` (=`data`) gives the following result:
```
benchmark                            all MB/s     data MB/s    speedup
BenchmarkReconstruct10x2x10000-8     2011.67      10530.10     5.23x
BenchmarkReconstruct50x5x50000-8     4585.41      14301.60     3.12x
BenchmarkReconstruct10x2x1M-8        8081.15      28216.41     3.49x
BenchmarkReconstruct5x2x1M-8         5780.07      28015.37     4.85x
BenchmarkReconstruct10x4x1M-8        4352.56      14367.61     3.30x
BenchmarkReconstruct50x20x1M-8       1364.35      4189.79      3.07x
BenchmarkReconstruct10x4x16M-8       1484.35      5779.53      3.89x
```

# Performance on AVX512

The performance on AVX512 has been accelerated for Intel CPUs.
This gives speedups on a per-core basis of typically up to 2x compared to
AVX2, as can be seen in the following table:

```
[...]
```

This speedup has been achieved by computing multiple parity blocks in parallel as opposed to one after the other.
In doing so it is possible to minimize the memory bandwidth required for loading all data shards.
At the same time the calculations are performed in the 512-bit wide ZMM registers, and the surplus of ZMM
registers (32 in total) is used to keep more data around (most notably the matrix coefficients).

# Performance on ARM64 NEON

By exploiting NEON instructions, the performance for ARM has been accelerated.
Below are the performance numbers for a single core on an EC2 m6g.16xlarge (Graviton2) instance (Amazon Linux 2):

```
BenchmarkGalois128K-64        119562     10028 ns/op    13070.78 MB/s
BenchmarkGalois1M-64           14380     83424 ns/op    12569.22 MB/s
BenchmarkGaloisXor128K-64      96508     12432 ns/op    10543.29 MB/s
BenchmarkGaloisXor1M-64        10000    100322 ns/op    10452.13 MB/s
```

# Performance on ppc64le

The performance for ppc64le has been accelerated.
This gives roughly a 10x performance improvement on this architecture, as can be seen below:

```
benchmark                      old MB/s     new MB/s     speedup
BenchmarkGalois128K-160        948.87       8878.85      9.36x
BenchmarkGalois1M-160          968.85       9041.92      9.33x
BenchmarkGaloisXor128K-160     862.02       7905.00      9.17x
BenchmarkGaloisXor1M-160       784.60       6296.65      8.03x
```

# asm2plan9s

[asm2plan9s](https://github.com/fwessels/asm2plan9s) is used for assembling the AVX2 instructions into their BYTE/WORD/LONG equivalents.

# Links
* [Backblaze Open Sources Reed-Solomon Erasure Coding Source Code](https://www.backblaze.com/blog/reed-solomon/).
* [JavaReedSolomon](https://github.com/Backblaze/JavaReedSolomon). Compatible Java library by Backblaze.
* [ocaml-reed-solomon-erasure](https://gitlab.com/darrenldl/ocaml-reed-solomon-erasure). Compatible OCaml implementation.
* [reedsolomon-c](https://github.com/jannson/reedsolomon-c). C version, compatible with output from this package.
* [Reed-Solomon Erasure Coding in Haskell](https://github.com/NicolasT/reedsolomon). Haskell port of the package with similar performance.
* [reed-solomon-erasure](https://github.com/darrenldl/reed-solomon-erasure). Compatible Rust implementation.
* [go-erasure](https://github.com/somethingnew2-0/go-erasure). A similar library using cgo, slower in my tests.
* [Screaming Fast Galois Field Arithmetic](http://www.snia.org/sites/default/files2/SDC2013/presentations/NewThinking/EthanMiller_Screaming_Fast_Galois_Field%20Arithmetic_SIMD%20Instructions.pdf). Basis for the SSSE3 optimizations.

# License

This code, like the original [JavaReedSolomon](https://github.com/Backblaze/JavaReedSolomon), is published under an MIT license. See the LICENSE file for more information.