Documentation ¶
Overview ¶
Package avro encodes and decodes Avro binary data.
Parse an Avro JSON schema with Parse (or MustParse for package-level vars), then call Schema.Encode / Schema.Decode to convert between Go values and Avro binary. See Schema.Decode for the full Go-to-Avro type mapping.
Basic usage ¶
schema := avro.MustParse(`{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"}
    ]
}`)

type User struct {
    Name string `avro:"name"`
    Age  int    `avro:"age"`
}

// Encode
data, err := schema.Encode(&User{Name: "Alice", Age: 30})

// Decode
var u User
_, err = schema.Decode(data, &u)
Schema evolution ¶
Avro data is always written with a specific schema — the "writer schema." When you read that data later, your application may expect a different schema — the "reader schema." For example, you may have added a field, removed one, or widened a type from int to long. The data on disk doesn't change, but your code expects the new layout.
Resolve bridges this gap. Given the writer and reader schemas, it returns a new schema that knows how to decode the old wire format and produce values in the reader's layout:
- Fields in the reader but not the writer are filled from defaults.
- Fields in the writer but not the reader are skipped.
- Fields that exist in both are matched by name (or alias) and decoded, with type promotion applied where needed (e.g. int → long).
You typically get the writer schema from the data itself: an [ocf] file header embeds it, and schema registries store it by ID or fingerprint.
As a concrete example, suppose v1 of your application wrote User records with just a name:
var writerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"}
    ]
}`)
In v2, you added an email field with a default:
var readerSchema = avro.MustParse(`{
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string", "default": ""}
    ]
}`)

type User struct {
    Name  string `avro:"name"`
    Email string `avro:"email"`
}
To read old v1 data with your v2 struct, resolve the two schemas:
resolved, err := avro.Resolve(writerSchema, readerSchema)

// Decode v1 data: "email" is absent in the old data, so it gets
// the reader default ("").
var u User
_, err = resolved.Decode(v1Data, &u)
// u == User{Name: "Alice", Email: ""}
If you just want to check whether two schemas are compatible without building a resolved schema, use CheckCompatibility.
Single Object Encoding ¶
For sending self-describing values over the wire (as opposed to files, where [ocf] is preferred), use Schema.AppendSingleObject and Schema.DecodeSingleObject. To decode without knowing the schema in advance, extract the fingerprint with SingleObjectFingerprint and look it up in your own registry.
Fingerprinting ¶
Schema.Canonical returns the Parsing Canonical Form for deterministic comparison. Schema.Fingerprint hashes it with any hash.Hash; use NewRabin for the Avro-standard CRC-64-AVRO.
Example ¶
package main

import (
    "fmt"
    "log"

    "github.com/twmb/avro"
)

func main() {
    schema := avro.MustParse(`{
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "age", "type": "int"}
        ]
    }`)

    type User struct {
        Name string `avro:"name"`
        Age  int32  `avro:"age"`
    }

    data, err := schema.Encode(&User{Name: "Alice", Age: 30})
    if err != nil {
        log.Fatal(err)
    }

    var u User
    if _, err := schema.Decode(data, &u); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s is %d\n", u.Name, u.Age)
}
Output: Alice is 30
Index ¶
- func CheckCompatibility(writer, reader *Schema) error
- func NewRabin() hash.Hash64
- func SingleObjectFingerprint(data []byte) (fp [8]byte, rest []byte, err error)
- type CompatibilityError
- type Duration
- type Schema
- func MustParse(schema string) *Schema
- func Parse(schema string) (*Schema, error)
- func Resolve(writer, reader *Schema) (*Schema, error)
- func (s *Schema) AppendEncode(dst []byte, v any) ([]byte, error)
- func (s *Schema) AppendSingleObject(dst []byte, v any) ([]byte, error)
- func (s *Schema) Canonical() []byte
- func (s *Schema) Decode(src []byte, v any) ([]byte, error)
- func (s *Schema) DecodeSingleObject(data []byte, v any) ([]byte, error)
- func (s *Schema) Encode(v any) ([]byte, error)
- func (s *Schema) Fingerprint(h hash.Hash) []byte
- func (s *Schema) String() string
- type SchemaCache
- func NewSchemaCache() *SchemaCache
- func (c *SchemaCache) Parse(schema string) (*Schema, error)
- type SemanticError
- type ShortBufferError
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CheckCompatibility ¶
func CheckCompatibility(writer, reader *Schema) error
CheckCompatibility reports whether data written with the writer schema can be read by the reader schema. It returns nil on success or a *CompatibilityError describing the first incompatibility.
See Resolve for a note on argument order.
Types ¶
type CompatibilityError ¶
type CompatibilityError struct {
    // Path is the dotted path to the incompatible element (e.g. "User.address.zip").
    Path string
    // ReaderType is the Avro type in the reader schema.
    ReaderType string
    // WriterType is the Avro type in the writer schema.
    WriterType string
    // Detail describes the specific incompatibility.
    Detail string
}
CompatibilityError describes an incompatibility between a reader and writer schema, as returned by CheckCompatibility and Resolve.
func (*CompatibilityError) Error ¶
func (e *CompatibilityError) Error() string
type Duration ¶
Duration represents the Avro duration logical type: a 12-byte fixed value containing three little-endian unsigned 32-bit integers representing months, days, and milliseconds.
type Schema ¶
type Schema struct {
    // contains filtered or unexported fields
}
Schema is a compiled Avro schema. Create one with Parse or MustParse, then use Schema.Encode / Schema.Decode to convert between Go values and Avro binary. A Schema is safe for concurrent use.
func MustParse ¶
MustParse is like Parse but panics on error. Useful for package-level var declarations.
func Parse ¶
Parse parses an Avro JSON schema string and returns a compiled *Schema. The input can be a primitive name (e.g. `"string"`), a JSON object (record, enum, array, map, fixed), or a JSON array (union). Named types may reference each other and self-reference. The schema is fully validated: unknown types, duplicate names, invalid defaults, etc. all return errors.
To parse schemas that reference named types from other schemas, use SchemaCache.
func Resolve ¶
Resolve returns a schema that decodes data written with the writer schema and produces values matching the reader schema's layout. The writer schema is what the data was encoded with (typically from an [ocf] file header or a schema registry); the reader schema is what your application expects now.
Decoding with the returned schema handles field addition (defaults), field removal (skip), renaming (aliases), reordering, and type promotion. Encoding with it uses the reader's format.
If the schemas have identical canonical forms, reader is returned as-is. Otherwise CheckCompatibility is called first and any incompatibility is returned as a *CompatibilityError. See the package-level documentation for a full example.
Note: the argument order is (writer, reader), matching source-then-destination convention and Java's GenericDatumReader. This differs from the Avro spec text and hamba/avro, which put reader first.
Example ¶
package main

import (
    "fmt"
    "log"

    "github.com/twmb/avro"
)

func main() {
    // v1 wrote User with just a name.
    writerSchema := avro.MustParse(`{
        "type": "record", "name": "User",
        "fields": [{"name": "name", "type": "string"}]
    }`)

    // v2 added an email field with a default.
    readerSchema := avro.MustParse(`{
        "type": "record", "name": "User",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "email", "type": "string", "default": ""}
        ]
    }`)

    resolved, err := avro.Resolve(writerSchema, readerSchema)
    if err != nil {
        log.Fatal(err)
    }

    // Encode a v1 record (name only).
    v1Data, err := writerSchema.Encode(map[string]any{"name": "Alice"})
    if err != nil {
        log.Fatal(err)
    }

    // Decode old data into the new layout; email gets the default.
    type User struct {
        Name  string `avro:"name"`
        Email string `avro:"email"`
    }
    var u User
    if _, err := resolved.Decode(v1Data, &u); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("name=%s email=%q\n", u.Name, u.Email)
}
Output: name=Alice email=""
func (*Schema) AppendEncode ¶
AppendEncode appends the Avro binary encoding of v to dst. See Schema.Decode for the Go-to-Avro type mapping.
Example ¶
package main

import (
    "fmt"
    "log"

    "github.com/twmb/avro"
)

func main() {
    schema := avro.MustParse(`"string"`)

    // AppendEncode reuses a buffer across calls, avoiding allocation.
    var buf []byte
    var err error
    for _, s := range []string{"hello", "world"} {
        buf, err = schema.AppendEncode(buf[:0], s)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("encoded %q: %d bytes\n", s, len(buf))
    }
}
Output:
encoded "hello": 6 bytes
encoded "world": 6 bytes
func (*Schema) AppendSingleObject ¶
AppendSingleObject appends a Single Object Encoding of v to dst: 2-byte magic, 8-byte CRC-64-AVRO fingerprint, then the Avro binary payload.
Example ¶
package main

import (
    "fmt"
    "log"

    "github.com/twmb/avro"
)

func main() {
    schema := avro.MustParse(`{
        "type": "record",
        "name": "Event",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "name", "type": "string"}
        ]
    }`)

    type Event struct {
        ID   int64  `avro:"id"`
        Name string `avro:"name"`
    }

    // Encode: 2-byte magic + 8-byte fingerprint + Avro payload.
    data, err := schema.AppendSingleObject(nil, &Event{ID: 1, Name: "click"})
    if err != nil {
        log.Fatal(err)
    }

    // Decode.
    var e Event
    if _, err := schema.DecodeSingleObject(data, &e); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("id=%d name=%s\n", e.ID, e.Name)
}
Output: id=1 name=click
func (*Schema) Canonical ¶
Canonical returns the Parsing Canonical Form of the schema, stripping doc, aliases, defaults, and other non-essential attributes. The result is deterministic and suitable for comparison and fingerprinting.
func (*Schema) Decode ¶
Decode reads Avro binary from src into v and returns the remaining bytes. v must be a non-nil pointer to a type compatible with the schema:
- null: any (always decodes to nil)
- boolean: bool, any
- int, long: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, any
- float: float32, float64, any
- double: float64, float32, any
- string: string, []byte, any; also encoding.TextUnmarshaler
- bytes: []byte, string, any
- enum: string, int/int8/.../uint64 (ordinal), any
- fixed: [N]byte, []byte, any
- array: slice, any
- map: map[string]T, any
- union: any, *T (for ["null", T] unions), or the matched branch type
- record: struct (matched by field name or `avro` tag), map[string]any, any
When decoding into any (interface{}), values are returned as their natural Go types: nil, bool, int64, float32, float64, string, []byte, []any for arrays, and map[string]any for both maps and records.
func (*Schema) DecodeSingleObject ¶
DecodeSingleObject decodes a Single Object Encoding message into v after verifying the magic and fingerprint match this schema.
func (*Schema) Fingerprint ¶
Fingerprint hashes the schema's canonical form with h. Use NewRabin for CRC-64-AVRO or crypto/sha256 for cross-language compatibility.
type SchemaCache ¶
type SchemaCache struct {
    // contains filtered or unexported fields
}
SchemaCache accumulates named types across multiple SchemaCache.Parse calls, allowing schemas to reference types defined in previously parsed schemas. This is useful for Schema Registry integrations where schemas have references to other schemas.
Schemas must be parsed in dependency order: referenced types must be parsed before the schemas that reference them.
The returned *Schema from each Parse call is fully resolved and independent of the cache — it can be used for Schema.Encode and Schema.Decode without the cache.
A SchemaCache is safe for concurrent use.
Example ¶
package main

import (
    "fmt"
    "log"

    "github.com/twmb/avro"
)

func main() {
    cache := avro.NewSchemaCache()

    // Parse the Address type first.
    if _, err := cache.Parse(`{
        "type": "record",
        "name": "Address",
        "fields": [
            {"name": "street", "type": "string"},
            {"name": "city", "type": "string"}
        ]
    }`); err != nil {
        log.Fatal(err)
    }

    // User references Address by name.
    schema, err := cache.Parse(`{
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "address", "type": "Address"}
        ]
    }`)
    if err != nil {
        log.Fatal(err)
    }

    type Address struct {
        Street string `avro:"street"`
        City   string `avro:"city"`
    }
    type User struct {
        Name    string  `avro:"name"`
        Address Address `avro:"address"`
    }

    data, err := schema.Encode(&User{
        Name:    "Alice",
        Address: Address{Street: "123 Main St", City: "Springfield"},
    })
    if err != nil {
        log.Fatal(err)
    }

    var u User
    if _, err := schema.Decode(data, &u); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s lives at %s, %s\n", u.Name, u.Address.Street, u.Address.City)
}
Output: Alice lives at 123 Main St, Springfield
func NewSchemaCache ¶
func NewSchemaCache() *SchemaCache
NewSchemaCache returns a new empty SchemaCache.
func (*SchemaCache) Parse ¶
func (c *SchemaCache) Parse(schema string) (*Schema, error)
Parse parses a schema string, registering any named types (records, enums, fixed) in the cache. Named types from previous Parse calls are available for reference resolution. On failure, the cache is not modified.
type SemanticError ¶
type SemanticError struct {
    // GoType is the Go type involved, if applicable.
    GoType reflect.Type
    // AvroType is the Avro schema type (e.g. "int", "record", "boolean").
    AvroType string
    // Field is the record field name, if within a record.
    Field string
    // Err is the underlying error.
    Err error
}
SemanticError indicates a Go type is incompatible with an Avro schema type during encoding or decoding.
func (*SemanticError) Error ¶
func (e *SemanticError) Error() string
func (*SemanticError) Unwrap ¶
func (e *SemanticError) Unwrap() error
type ShortBufferError ¶
type ShortBufferError struct {
    // Type is what was being read (e.g. "boolean", "string", "uint32").
    Type string
    // Need is the number of bytes required (0 if unknown).
    Need int
    // Have is the number of bytes available.
    Have int
}
ShortBufferError indicates the input buffer is too short for the value being decoded.
func (*ShortBufferError) Error ¶
func (e *ShortBufferError) Error() string