simhashstash

Version: v0.0.0-...-cc63d3f
Published: Apr 25, 2017 | License: MIT | Imports: 5 | Imported by: 0

README

simhashstash

A very basic store for keeping track of near-duplicates.

Useful for things like identifying duplicate comments or bug descriptions.

Also potentially useful for identifying anomalous log messages, given a history of "normal" looking log messages.

TODOs:

  • Persistence
  • Docs, examples
  • Benchmarks

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Node

type Node struct {
	Key uint64
	// Val is a slice of byte slices because multiple values can
	// generate the same simhash, and those collisions must be handled.
	Val [][]byte
}

Implements llrb.Item

func (Node) Less

func (a Node) Less(b llrb.Item) bool
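
The following is a minimal sketch of how a node keyed by simhash might be ordered and how colliding values might be kept together. It is not the package's implementation: the `item` interface is a local stand-in for llrb.Item, the `node` type only mirrors the shape of Node, ordering by Key is an assumption, and `addValue` is a hypothetical helper.

```go
package main

import "fmt"

// item is a local stand-in for llrb.Item so the sketch has no external dependency.
type item interface {
	Less(than item) bool
}

// node mirrors the shape of Node: one simhash key, possibly several values.
type node struct {
	Key uint64
	Val [][]byte
}

// Less orders nodes by Key. This is an assumption about how Node.Less behaves.
func (a node) Less(b item) bool {
	return a.Key < b.(node).Key
}

// addValue (hypothetical helper) appends v to the node's values, so inputs
// that produce the same simhash are stored side by side under one key.
func addValue(n node, v []byte) node {
	n.Val = append(n.Val, v)
	return n
}

func main() {
	n := node{Key: 42}
	n = addValue(n, []byte("first comment"))
	n = addValue(n, []byte("second comment"))

	fmt.Println(len(n.Val))             // 2
	fmt.Println(n.Less(node{Key: 100})) // true
	fmt.Println(node{Key: 100}.Less(n)) // false
}
```

Keeping all values under a single key means a lookup by simhash returns every colliding input at once, at the cost of a slice per node.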

func (*Node) String

func (a *Node) String() string

type PersistedTree

type PersistedTree struct {
	Nodes []Node
}

type Stash

type Stash struct {
	// contains filtered or unexported fields
}

func NewStash

func NewStash() *Stash

func (*Stash) Add

func (s *Stash) Add(k, v []byte)

Add adds k to the list of keys associated with v's simhash matches.

func (*Stash) Query

func (s *Stash) Query(in []byte, thresh uint8) [][]byte

Query returns any matches that are within thresh Hamming distance of in's simhash.
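
To make the threshold concrete, here is a small sketch of Hamming distance between two 64-bit hashes. The function names are hypothetical, and treating thresh as inclusive (distance <= thresh) is an assumption; the package may compare differently.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance returns the number of bit positions at which a and b differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// withinThreshold reports whether a and b differ in at most thresh bits.
// The inclusive comparison is an assumption made for this sketch.
func withinThreshold(a, b uint64, thresh uint8) bool {
	return hammingDistance(a, b) <= int(thresh)
}

func main() {
	a := uint64(0xB) // binary 1011
	b := uint64(0x1) // binary 0001

	fmt.Println(hammingDistance(a, b))    // 2
	fmt.Println(withinThreshold(a, b, 2)) // true
	fmt.Println(withinThreshold(a, b, 1)) // false
}
```

A small thresh only matches inputs whose simhashes are nearly identical; raising it admits looser matches.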

func (*Stash) ReadFrom

func (s *Stash) ReadFrom(stream io.Reader) error

func (*Stash) WriteTo

func (s *Stash) WriteTo(stream io.Writer) error
