View on GitHub

binary-trees

Extending implementation of binary trees test of the Computer Language Benchmarks Game for Go language

Boost performance “binary trees” of a benchmark game more than 3x times

The Computer Language Benchmarks Game is an excellent synthetic test for compilers. There is different kind of algorithmic tasks implemented using popular programming languages.

Go language is one of my favorite language which I use for my pet projects and production-ready projects.

I was surprised that Go is slower than several “native” languages (native against VM/JIT) on several tests, for ex. Binary trees.

Binary trees is a memory-intensive using task. It approved by profiling. I’v profiled the best implementation for Go - Go #8. A result - bottleneck is memory allocations.

Buffered allocation

A source implementation has once allocation-intensive point:

func bottomUpTree(depth uint32) *Tree {
    tree := &Tree{}
    // .....
}

A little bit refactoring move that piece of code to an “allocator” to easy using a different kind of allocators in a project:

func (o *AllocatorNaive) NewTree() *Tree {
	return &Tree{}
}

func NewTree(depth uint32, allocator Allocator) (tree *Tree) {
	tree = allocator.NewTree()
    // .....
}

This function was calling about 600 000 000 times of a “naive” allocator for 21 depth, and it was a place for 97.9% memory allocations.

The improvement is based on a task-specific prediction of allocation count. A binary tree has 2^(depth+1) nodes including a root node. The buffered allocator is allocating this count of nodes once for a tree:

func NewAllocatorBuffered(depth uint32) Allocator {
	return &AllocatorBuffered{
		buffer: make([]Tree, 1<<(depth+1)),
	}
}

A result - most calling allocation is that function for 5 580 000 times in ~107 times lesser than a naive allocator.

Build and run

You need Go 1.12+ with module support (GO111MODULE=on) to build or run a project:

# build
go build

# run
./binary-trees -allocator 0 21

# run skipping build 
go run . -allocator 1 21

A command supports options to choose allocator and depth of binary trees. Supported allocators:

Estimate CPU performance of allocators

An environment:

The result:

Allocator Min (sec.) Max (sec.) Avg (sec.)
Naive 15.452 15.520 15.4796
Buffered 4.816 4.895 4.8432
Diff     3.196x

A conclusion

P.S.

Go standard library include sync.Pool. It is your best friend in general cases, as the author of the fasthttp said.