semanticfw

v1.1.0
Published: Jan 11, 2026 License: MIT Imports: 18 Imported by: 0

README

Semantic Firewall

Detect logic corruption that bypasses code reviews.



Semantic Firewall generates deterministic fingerprints of your Go code's behavior, not its bytes. It uses Scalar Evolution (SCEV) analysis to prove that syntactically different loops are mathematically identical, and a Semantic Zipper to diff architectural changes without the noise.


Quick Start

# Install
go install github.com/BlackVectorOps/semantic_firewall/cmd/sfw@latest

# Fingerprint a file
sfw check ./main.go

# Semantic diff between two versions
sfw diff old_version.go new_version.go

Check Output:

{
  "file": "./main.go",
  "functions": [
    { "function": "main", "fingerprint": "005efb52a8c9d1e3..." }
  ]
}

Diff Output (The Zipper):

{
  "summary": {
    "semantic_match_pct": 92.5,
    "preserved": 12,
    "modified": 1
  },
  "functions": [
    {
      "function": "HandleLogin",
      "status": "modified",
      "added_ops": ["Call <log.Printf>", "Call <net.Dial>"],
      "removed_ops": []
    }
  ]
}

Why Use This?

"Don't unit tests solve this?" No. Unit tests verify correctness (does input A produce output B?). sfw verifies intent and integrity.

  • A developer refactors a function but secretly adds a network call → unit tests pass, sfw fails.
  • A developer changes a switch to a Strategy Pattern → git diff shows 100 lines changed, sfw diff shows zero logic changes.

Traditional Tooling | Semantic Firewall
Git Diff: shows lines changed (whitespace, renaming = noise) | sfw check: verifies control flow graph identity
Unit Tests: verify input/output (blind to side effects) | sfw diff: isolates actual logic drift from cosmetic changes

Use cases:

  • Supply chain security — Detect backdoors like the xz attack that pass code review
  • Safe refactoring — Prove your refactor didn't change behavior
  • CI/CD gates — Block PRs that alter critical function logic

CI Integration: Blocker & Reporter Modes

sfw supports two distinct CI roles:

  1. Blocker Mode: When a PR claims to be a refactor (via title or semantic-safe label), sfw enforces strict semantic equivalence. Any logic change fails the build.

  2. Reporter Mode: On feature PRs, sfw runs a semantic diff and generates a drift report (e.g., "Semantic Match: 80%"), helping reviewers focus on the code where behavior actually changed.

GitHub Actions Workflow

name: Semantic Firewall

on:
  pull_request:
    branches: [ "main" ]
    types: [opened, synchronize, reopened, labeled]

jobs:
  semantic-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-go@v5
        with:
          go-version: '1.24'

      - name: Install sfw
        run: go install github.com/BlackVectorOps/semantic_firewall/cmd/sfw@latest

      - name: Determine Mode
        id: mode
        run: |
          if [[ "${{ contains(github.event.pull_request.labels.*.name, 'semantic-safe') }}" == "true" ]] || \
             [[ "${{ contains(github.event.pull_request.title, 'refactor') }}" == "true" ]]; then
            echo "mode=BLOCKER" >> "$GITHUB_OUTPUT"
          else
            echo "mode=REPORTER" >> "$GITHUB_OUTPUT"
          fi

      - name: Run Blocker Check
        if: steps.mode.outputs.mode == 'BLOCKER'
        run: sfw check ./

      - name: Run Reporter Diff
        if: steps.mode.outputs.mode == 'REPORTER'
        run: |
          BASE_SHA=${{ github.event.pull_request.base.sha }}
          git diff --name-only "$BASE_SHA" HEAD -- '*.go' | while IFS= read -r file; do
            [ -f "$file" ] || continue
            git show "$BASE_SHA:$file" > old.go 2>/dev/null || touch old.go
            sfw diff old.go "$file" | jq .
            rm old.go
          done

Library Usage

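package main

import (
	"fmt"
	"log"

	semanticfw "github.com/BlackVectorOps/semantic_firewall"
)

func main() {
	// Go source to analyze, supplied as a string.
	src := `package main
func Add(a, b int) int { return a + b }
`

	// Fingerprint every function in the source with the default literal policy.
	results, err := semanticfw.FingerprintSource("example.go", src, semanticfw.DefaultLiteralPolicy)
	if err != nil {
		log.Fatal(err)
	}

	// Print one line per function: name and fingerprint.
	for _, r := range results {
		fmt.Printf("%s: %s\n", r.FunctionName, r.Fingerprint)
	}
}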

Technical Deep Dive

How It Works
  1. Parse — Load Go source into SSA (Static Single Assignment) form
  2. Canonicalize — Normalize variable names, branch ordering, loop structures
  3. Fingerprint — SHA-256 hash of the canonical IR

The result: semantically equivalent code produces identical fingerprints.
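
The three steps above can be sketched with the standard library alone. This is a simplified illustration, not the package's implementation: it walks the AST instead of SSA, renames declared identifiers in first-seen order, and only distinguishes binary operators and literals. The function name fingerprint and the "v0, v1, ..." naming scheme are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// fingerprint hashes a canonical, name-independent rendering of src.
// Simplification: it walks the AST, whereas the README describes SSA.
func fingerprint(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "snippet.go", src, 0)
	if err != nil {
		return "", err
	}

	names := map[*ast.Object]string{} // declared object -> canonical name
	var ir strings.Builder

	ast.Inspect(file, func(n ast.Node) bool {
		switch x := n.(type) {
		case nil:
			ir.WriteString(")") // Inspect passes nil after a node's children
		case *ast.Ident:
			name := x.Name
			if x.Obj != nil { // resolved declaration: rename in first-seen order
				if _, seen := names[x.Obj]; !seen {
					names[x.Obj] = fmt.Sprintf("v%d", len(names))
				}
				name = names[x.Obj]
			}
			fmt.Fprintf(&ir, "(Ident %s", name)
		case *ast.BasicLit:
			fmt.Fprintf(&ir, "(Lit %s", x.Value)
		case *ast.BinaryExpr:
			fmt.Fprintf(&ir, "(Bin %s", x.Op)
		default:
			fmt.Fprintf(&ir, "(%T", n)
		}
		return true
	})

	// Step 3: SHA-256 over the canonical text.
	sum := sha256.Sum256([]byte(ir.String()))
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := fingerprint("package p\nfunc Add(a, b int) int { sum := a + b; return sum }")
	b, _ := fingerprint("package p\nfunc Plus(x, y int) int { total := x + y; return total }")
	c, _ := fingerprint("package p\nfunc Sub(a, b int) int { diff := a - b; return diff }")
	fmt.Println(a == b, a == c) // true false
}
```

Renaming alone leaves the hash unchanged, while swapping + for - changes it. Working on SSA, as the package does, additionally abstracts over statement shape, which this AST walk does not.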

Scalar Evolution (SCEV) Analysis

Standard hashing is brittle: rewriting a "for i := 0" index loop as a "for range" loop changes the hash. sfw solves this with an SCEV engine (scev.go) that solves loops algebraically:

  • Induction Variable Detection: Classifies loop variables as Add Recurrences: {Start, +, Step}
  • Trip Count Derivation: Proves that a range loop and an index loop iterate the same number of times
  • Loop Invariant Hoisting: Invariant expressions (e.g., len(s)) are virtually hoisted, so manual optimizations don't alter fingerprints

Result: Refactor loop syntax freely. If the math is the same, the fingerprint is the same.
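
To make the add-recurrence idea concrete, here is a minimal sketch using math/big. The type addRec and the helpers newAddRec, evaluateAt and tripCount are hypothetical names for this example and are not the package's API. The trip-count formula assumes a loop guarded by "iv < bound" with a positive constant step.

```go
package main

import (
	"fmt"
	"math/big"
)

// addRec models the add recurrence {Start, +, Step}: at iteration k the
// induction variable holds Start + Step*k.
type addRec struct {
	Start, Step *big.Int
}

func newAddRec(start, step int64) addRec {
	return addRec{Start: big.NewInt(start), Step: big.NewInt(step)}
}

// evaluateAt returns Start + Step*k.
func (r addRec) evaluateAt(k int64) *big.Int {
	v := new(big.Int).Mul(r.Step, big.NewInt(k))
	return v.Add(v, r.Start)
}

// tripCount returns how many iterations a loop guarded by "iv < bound"
// executes: ceil((bound - Start) / Step). It reports false when Step is not
// positive, because this sketch cannot bound such a loop.
func tripCount(r addRec, bound int64) (*big.Int, bool) {
	if r.Step.Sign() <= 0 {
		return nil, false
	}
	span := new(big.Int).Sub(big.NewInt(bound), r.Start)
	if span.Sign() <= 0 {
		return big.NewInt(0), true
	}
	span.Add(span, r.Step)
	span.Sub(span, big.NewInt(1))
	return span.Quo(span, r.Step), true
}

func main() {
	// "for i := 0; i < 10; i++" and "for i := range s" with len(s) == 10
	// both reduce to the recurrence {0, +, 1} bounded by 10.
	indexLoop := newAddRec(0, 1)
	rangeLoop := newAddRec(0, 1)
	a, _ := tripCount(indexLoop, 10)
	b, _ := tripCount(rangeLoop, 10)
	fmt.Println(a, b, a.Cmp(b) == 0)           // 10 10 true
	fmt.Println(newAddRec(2, 5).evaluateAt(3)) // 17
}
```

Once both loop forms are described by the same recurrence and the same trip count, a canonical rendering built from those two values no longer depends on which syntax produced them.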

The Semantic Zipper

When logic does change (e.g., architectural refactors), fingerprint comparison fails. The Zipper algorithm (zipper.go) takes two SSA graphs and "zips" them together starting from function parameters:

  • Anchor Alignment: Parameters and free variables establish deterministic entry points
  • Forward Propagation: Traverses use-def chains to match semantically equivalent nodes
  • Divergence Isolation: Reports exactly what changed (e.g., "added Call <net.Dial>, preserved all assignments")

Result: A semantic changelog that ignores renaming, reordering, and helper extraction.
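
The anchor-and-propagate idea can be illustrated on a toy graph. The sketch below is a greedy approximation written for this page, not the algorithm in zipper.go: node, zip and the operation strings are hypothetical. Nodes are listed so that operands precede their users. Two nodes are paired when they have the same operation and their operands were already paired with each other, so parameters, which have no operands, pair up by position.

```go
package main

import "fmt"

// node is one operation in a toy SSA-like function. Args index earlier nodes.
type node struct {
	Op   string
	Args []int
}

// zip aligns oldFn with newFn and returns the operations found only in
// newFn (added) and only in oldFn (removed).
func zip(oldFn, newFn []node) (added, removed []string) {
	matched := map[int]int{} // old index -> new index
	taken := map[int]bool{}  // new indices that are already paired

	for i, o := range oldFn {
		paired := false
		for j, n := range newFn {
			if taken[j] || n.Op != o.Op || len(n.Args) != len(o.Args) {
				continue
			}
			same := true
			for k, a := range o.Args {
				// Every operand must already be zipped to the matching operand.
				if m, ok := matched[a]; !ok || m != n.Args[k] {
					same = false
					break
				}
			}
			if same {
				matched[i], taken[j], paired = j, true, true
				break
			}
		}
		if !paired {
			removed = append(removed, o.Op)
		}
	}
	for j, n := range newFn {
		if !taken[j] {
			added = append(added, n.Op)
		}
	}
	return added, removed
}

func main() {
	oldFn := []node{
		{Op: "Param"}, {Op: "Param"},
		{Op: "Call <checkPassword>", Args: []int{0, 1}},
		{Op: "Return", Args: []int{2}},
	}
	newFn := []node{
		{Op: "Param"}, {Op: "Param"},
		{Op: "Call <log.Printf>", Args: []int{0}},
		{Op: "Call <checkPassword>", Args: []int{0, 1}},
		{Op: "Call <net.Dial>"},
		{Op: "Return", Args: []int{3}},
	}
	added, removed := zip(oldFn, newFn)
	fmt.Println("added:", added, "removed:", removed)
}
```

In this example the password check and the return are preserved even though their positions shifted, and only the two new calls are reported, which mirrors the shape of the diff output shown in Quick Start.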

Security Hardening
  • Cycle Detection: Prevents stack overflow DoS from malformed cyclic graphs
  • IR Injection Prevention: Sanitizes string literals and struct tags to prevent fake instruction injection
  • NaN-Safe Comparisons: Limits branch normalization to integer/string types to avoid floating-point edge cases
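
As an illustration of the IR injection concern, consider a canonical IR that is emitted one instruction per line. A string literal containing a newline could then masquerade as an extra instruction. The sketch below shows one possible mitigation, escaping with strconv.Quote. The function emitConst and the "Const" line format are assumptions for this example, and the package may sanitize differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// emitConst renders a string constant as a single IR line. strconv.Quote
// escapes newlines, quotes and non-printable characters, so the literal
// cannot terminate its own line or close its own quotes.
func emitConst(value string) string {
	return "Const " + strconv.Quote(value)
}

func main() {
	// A hostile literal that tries to smuggle in a fake call instruction.
	hostile := "ok\"\nCall <net.Dial>"
	line := emitConst(hostile)
	fmt.Println(line)
	fmt.Println("newlines in IR line:", strings.Count(line, "\n"))
}
```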

License

MIT License — See LICENSE for details.

Documentation

Index

Constants

This section is empty.

Variables

var DefaultLiteralPolicy = LiteralPolicy{
	AbstractControlFlowComparisons: true,
	KeepSmallIntegerIndices:        true,
	KeepReturnStatusValues:         true,

	KeepStringLiterals: false,
	SmallIntMin:        -16,
	SmallIntMax:        16,
	AbstractOtherTypes: true,
}

DefaultLiteralPolicy represents the standard policy for fingerprinting; it preserves small integers used for indexing and status codes while masking magic numbers and large constants.

var KeepAllLiteralsPolicy = LiteralPolicy{
	AbstractControlFlowComparisons: false,
	KeepSmallIntegerIndices:        true,
	KeepReturnStatusValues:         true,
	KeepStringLiterals:             true,
	SmallIntMin:                    math.MinInt64,
	SmallIntMax:                    math.MaxInt64,
	AbstractOtherTypes:             false,
}

KeepAllLiteralsPolicy is designed for testing or exact matching by disabling most abstractions and expanding the "small" integer range to the full int64 spectrum.

Functions

func AnalyzeSCEV

func AnalyzeSCEV(info *LoopInfo)

AnalyzeSCEV is the main entry point for SCEV analysis on a LoopInfo.

func BuildSSAFromPackages

func BuildSSAFromPackages(initialPkgs []*packages.Package) (*ssa.Program, *ssa.Package, error)

BuildSSAFromPackages constructs the Static Single Assignment (SSA) form from loaded Go packages. It returns the complete program and the target package for analysis.

func ReleaseCanonicalizer

func ReleaseCanonicalizer(c *Canonicalizer)

Types

type Canonicalizer

type Canonicalizer struct {
	Policy     LiteralPolicy
	StrictMode bool
	// contains filtered or unexported fields
}

Canonicalizer transforms an SSA function into a deterministic string representation.

func AcquireCanonicalizer

func AcquireCanonicalizer(policy LiteralPolicy) *Canonicalizer

func NewCanonicalizer

func NewCanonicalizer(policy LiteralPolicy) *Canonicalizer

func (*Canonicalizer) ApplyVirtualControlFlowFromState

func (c *Canonicalizer) ApplyVirtualControlFlowFromState(swappedBlocks map[*ssa.BasicBlock]bool, virtualBinOps map[*ssa.BinOp]token.Token)

func (*Canonicalizer) CanonicalizeFunction

func (c *Canonicalizer) CanonicalizeFunction(fn *ssa.Function) string

type FingerprintResult

type FingerprintResult struct {
	FunctionName string
	Fingerprint  string
	CanonicalIR  string
	Pos          token.Pos
	Line         int
	Filename     string
	// contains filtered or unexported fields
}

FingerprintResult encapsulates the output of the semantic fingerprinting process for a function.

func FingerprintPackages

func FingerprintPackages(initialPkgs []*packages.Package, policy LiteralPolicy, strictMode bool) ([]FingerprintResult, error)

FingerprintPackages iterates over loaded packages to construct SSA and generate results.

func FingerprintSource

func FingerprintSource(filename string, src string, policy LiteralPolicy) ([]FingerprintResult, error)

FingerprintSource analyzes a single Go source file provided as a string. This is the primary entry point for verifying code snippets or patch hunks.

func FingerprintSourceAdvanced

func FingerprintSourceAdvanced(filename string, src string, policy LiteralPolicy, strictMode bool) ([]FingerprintResult, error)

FingerprintSourceAdvanced provides an extended interface for source analysis with strict mode control.

func GenerateFingerprint

func GenerateFingerprint(fn *ssa.Function, policy LiteralPolicy, strictMode bool) FingerprintResult

GenerateFingerprint generates the hash and canonical string representation for an SSA function. This function uses a pooled Canonicalizer to ensure high throughput and low allocation overhead.
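
The pooling mentioned above, together with the AcquireCanonicalizer and ReleaseCanonicalizer pair, suggests an acquire/use/release pattern. The sketch below shows how such a pool is commonly built on sync.Pool. It is an assumption about the general technique, not a description of the package's internals, and the canonicalizer type here is a stand-in with made-up fields.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// canonicalizer is a stand-in for a reusable worker with scratch state.
type canonicalizer struct {
	names map[string]string // SSA register name -> canonical name
	out   strings.Builder   // canonical IR under construction
}

var pool = sync.Pool{
	New: func() any { return &canonicalizer{names: map[string]string{}} },
}

// acquire returns a canonicalizer with empty scratch state.
func acquire() *canonicalizer { return pool.Get().(*canonicalizer) }

// release clears the scratch state and returns c to the pool.
// The caller must not use c afterwards.
func release(c *canonicalizer) {
	for k := range c.names {
		delete(c.names, k)
	}
	c.out.Reset()
	pool.Put(c)
}

func main() {
	c := acquire()
	c.names["t0"] = "v0"
	c.out.WriteString("v0 = Param")
	release(c)

	d := acquire() // may or may not be the same object as c
	fmt.Println(len(d.names), d.out.Len()) // 0 0
	release(d)
}
```

Resetting state on release rather than on acquire keeps stale data from one function out of the next fingerprint even if a caller forgets to initialize.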

func (FingerprintResult) GetSSAFunction added in v1.1.0

func (r FingerprintResult) GetSSAFunction() *ssa.Function

GetSSAFunction returns the underlying SSA function for advanced analysis workflows such as semantic diffing with the Zipper algorithm. Returns nil if not available.

type IVType

type IVType int
const (
	IVTypeUnknown    IVType = iota
	IVTypeBasic             // {S, +, C}
	IVTypeDerived           // Affine: A * IV + B
	IVTypeGeometric         // {S, *, C}
	IVTypePolynomial        // Step is another IV
)

type InductionVariable

type InductionVariable struct {
	Phi   *ssa.Phi
	Type  IVType
	Start SCEV // Value at iteration 0
	Step  SCEV // Update stride
}

InductionVariable describes a detected IV. Reference: Section 3.2 Classification Taxonomy.

type LiteralPolicy

type LiteralPolicy struct {
	AbstractControlFlowComparisons bool
	KeepSmallIntegerIndices        bool
	KeepReturnStatusValues         bool
	// KeepStringLiterals keeps string literals instead of abstracting them.
	KeepStringLiterals bool
	SmallIntMin        int64
	SmallIntMax        int64
	AbstractOtherTypes bool
}

LiteralPolicy defines the configurable strategy for determining which literal values should be abstracted into placeholders during canonicalization. It allows fine-grained control over integer abstraction in different contexts.

func (*LiteralPolicy) ShouldAbstract

func (p *LiteralPolicy) ShouldAbstract(c *ssa.Const, usageContext ssa.Instruction) bool

ShouldAbstract decides whether a given constant should be replaced by a generic placeholder. It analyzes the constant's type, value, and immediate usage context in the SSA graph.
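
The integer part of this decision can be sketched without SSA. The example below is a deliberate simplification: it ignores the usage context that ShouldAbstract also inspects, and the type intPolicy, its methods, and the "<int>" placeholder text are hypothetical. The range -16 to 16 is taken from DefaultLiteralPolicy.

```go
package main

import (
	"fmt"
	"strconv"
)

// intPolicy mirrors only the integer-range fields of LiteralPolicy.
type intPolicy struct {
	KeepSmall          bool
	SmallMin, SmallMax int64
}

// shouldAbstractInt reports whether v would be replaced by a placeholder.
// Simplification: the usage context of the constant is not considered.
func (p intPolicy) shouldAbstractInt(v int64) bool {
	if p.KeepSmall && v >= p.SmallMin && v <= p.SmallMax {
		return false
	}
	return true
}

// canonicalInt renders v as it would appear in a canonical IR.
func (p intPolicy) canonicalInt(v int64) string {
	if p.shouldAbstractInt(v) {
		return "<int>" // hypothetical placeholder text
	}
	return strconv.FormatInt(v, 10)
}

func main() {
	p := intPolicy{KeepSmall: true, SmallMin: -16, SmallMax: 16}
	for _, v := range []int64{0, 3, -16, 17, 4096} {
		fmt.Printf("%d -> %s\n", v, p.canonicalInt(v))
	}
}
```

Under such a policy, changing a buffer size from 4096 to 8192 leaves the canonical text unchanged, while changing an index from 0 to 1 does not.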

type Loop

type Loop struct {
	Header *ssa.BasicBlock
	Latch  *ssa.BasicBlock // Primary source of the backedge

	// Blocks contains all basic blocks within the loop body.
	Blocks map[*ssa.BasicBlock]bool
	// Exits contains blocks inside the loop that have successors outside.
	Exits []*ssa.BasicBlock

	// Hierarchy
	Parent   *Loop
	Children []*Loop

	// Semantic Analysis (populated in scev.go)
	Inductions map[*ssa.Phi]*InductionVariable
	TripCount  SCEV // Symbolic expression
}

Loop represents a natural loop in the SSA graph. Reference: Section 2.3 Natural Loops.

func (*Loop) String

func (l *Loop) String() string

type LoopInfo

type LoopInfo struct {
	Function *ssa.Function
	Loops    []*Loop // Top-level loops (roots of the hierarchy)
	// Map from Header block to Loop object for O(1) lookup
	LoopMap map[*ssa.BasicBlock]*Loop
}

LoopInfo summarizes loop analysis for a single function.

func DetectLoops

func DetectLoops(fn *ssa.Function) *LoopInfo

DetectLoops reconstructs the loop hierarchy using dominance relations. Reference: Section 2.3.1 Algorithm: Detecting Natural Loops.
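
For readers unfamiliar with dominance-based loop detection, the textbook construction is sketched below on a control flow graph given as plain successor lists. It is not the package's implementation: the names loop, dominators and detectLoops are local to this example, block 0 is assumed to be the entry, and every block is assumed reachable. An edge u -> h is a back edge when h dominates u, and the loop body is every block that can reach u without passing through h.

```go
package main

import (
	"fmt"
	"sort"
)

// loop is a natural loop: a back edge Latch -> Header plus every block that
// can reach Latch without passing through Header.
type loop struct {
	Header, Latch int
	Blocks        []int
}

// predecessors inverts the successor lists of g.
func predecessors(g [][]int) [][]int {
	preds := make([][]int, len(g))
	for u, succs := range g {
		for _, v := range succs {
			preds[v] = append(preds[v], u)
		}
	}
	return preds
}

// dominators returns dom[b][d] == true when block d dominates block b.
func dominators(g [][]int) [][]bool {
	n := len(g)
	preds := predecessors(g)
	dom := make([][]bool, n)
	for b := range dom {
		dom[b] = make([]bool, n)
		for d := range dom[b] {
			dom[b][d] = b != 0 || d == 0 // entry: only itself; others start full
		}
	}
	for changed := true; changed; {
		changed = false
		for b := 1; b < n; b++ {
			for d := 0; d < n; d++ {
				if d == b || !dom[b][d] {
					continue
				}
				for _, p := range preds[b] {
					if !dom[p][d] { // d is bypassed via p: not a dominator
						dom[b][d] = false
						changed = true
						break
					}
				}
			}
		}
	}
	return dom
}

// detectLoops finds one natural loop per back edge.
func detectLoops(g [][]int) []loop {
	dom, preds := dominators(g), predecessors(g)
	var loops []loop
	for u, succs := range g {
		for _, h := range succs {
			if !dom[u][h] {
				continue // not a back edge
			}
			in := map[int]bool{h: true}
			work := []int{u}
			for len(work) > 0 { // walk predecessors backwards from the latch
				b := work[len(work)-1]
				work = work[:len(work)-1]
				if in[b] {
					continue
				}
				in[b] = true
				work = append(work, preds[b]...)
			}
			blocks := make([]int, 0, len(in))
			for b := range in {
				blocks = append(blocks, b)
			}
			sort.Ints(blocks)
			loops = append(loops, loop{Header: h, Latch: u, Blocks: blocks})
		}
	}
	return loops
}

func main() {
	// 0: entry -> 1; 1: header -> 2 (body), 3 (exit); 2: body -> 1.
	g := [][]int{{1}, {2, 3}, {1}, {}}
	fmt.Printf("%+v\n", detectLoops(g))
}
```

The Header, Latch and Blocks fields correspond loosely to the fields of the same names on the Loop type above. Nesting (Parent, Children) and exit blocks are left out of this sketch.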

type Renamer

type Renamer func(ssa.Value) string

Renamer is a function that maps an SSA value to its canonical name. This is used to ensure deterministic output regardless of SSA register naming.

type SCEV

type SCEV interface {
	ssa.Value
	EvaluateAt(k *big.Int) *big.Int
	IsLoopInvariant(loop *Loop) bool
	String() string
	// StringWithRenamer returns a canonical string using the provided renamer
	// function to map SSA values to their canonical names (e.g., v0, v1).
	// This is critical for determinism: without it, raw SSA names (t0, t1)
	// would leak into fingerprints, breaking semantic equivalence.
	StringWithRenamer(r Renamer) string
}

SCEV represents a scalar expression.

type SCEVAddRec

type SCEVAddRec struct {
	Start SCEV
	Step  SCEV
	Loop  *Loop
}

SCEVAddRec represents an Add Recurrence: {Start, +, Step}_L. Reference: Section 4.1 The Add Recurrence Abstraction.

func (*SCEVAddRec) EvaluateAt

func (s *SCEVAddRec) EvaluateAt(k *big.Int) *big.Int

func (*SCEVAddRec) IsLoopInvariant

func (s *SCEVAddRec) IsLoopInvariant(loop *Loop) bool

func (*SCEVAddRec) Name

func (s *SCEVAddRec) Name() string

Name is one of the stub methods that satisfy the ssa.Value interface.

func (*SCEVAddRec) Parent

func (s *SCEVAddRec) Parent() *ssa.Function

func (*SCEVAddRec) Pos

func (s *SCEVAddRec) Pos() token.Pos

func (*SCEVAddRec) Referrers

func (s *SCEVAddRec) Referrers() *[]ssa.Instruction

func (*SCEVAddRec) String

func (s *SCEVAddRec) String() string

func (*SCEVAddRec) StringWithRenamer

func (s *SCEVAddRec) StringWithRenamer(r Renamer) string

func (*SCEVAddRec) Type

func (s *SCEVAddRec) Type() types.Type

type SCEVConstant

type SCEVConstant struct {
	Value *big.Int
}

SCEVConstant represents a literal integer constant.

func SCEVFromConst

func SCEVFromConst(c *ssa.Const) *SCEVConstant

func (*SCEVConstant) EvaluateAt

func (s *SCEVConstant) EvaluateAt(k *big.Int) *big.Int

func (*SCEVConstant) IsLoopInvariant

func (s *SCEVConstant) IsLoopInvariant(loop *Loop) bool

func (*SCEVConstant) Name

func (s *SCEVConstant) Name() string

Name is one of the stub methods that satisfy the ssa.Value interface.

func (*SCEVConstant) Parent

func (s *SCEVConstant) Parent() *ssa.Function

func (*SCEVConstant) Pos

func (s *SCEVConstant) Pos() token.Pos

func (*SCEVConstant) Referrers

func (s *SCEVConstant) Referrers() *[]ssa.Instruction

func (*SCEVConstant) String

func (s *SCEVConstant) String() string

func (*SCEVConstant) StringWithRenamer

func (s *SCEVConstant) StringWithRenamer(r Renamer) string

func (*SCEVConstant) Type

func (s *SCEVConstant) Type() types.Type

type SCEVGenericExpr

type SCEVGenericExpr struct {
	Op token.Token
	X  SCEV
	Y  SCEV
}

SCEVGenericExpr represents binary operations like Add/Mul for formulas.

func (*SCEVGenericExpr) EvaluateAt

func (s *SCEVGenericExpr) EvaluateAt(k *big.Int) *big.Int

func (*SCEVGenericExpr) IsLoopInvariant

func (s *SCEVGenericExpr) IsLoopInvariant(loop *Loop) bool

func (*SCEVGenericExpr) Name

func (s *SCEVGenericExpr) Name() string

Name is one of the stub methods that satisfy the ssa.Value interface.

func (*SCEVGenericExpr) Parent

func (s *SCEVGenericExpr) Parent() *ssa.Function

func (*SCEVGenericExpr) Pos

func (s *SCEVGenericExpr) Pos() token.Pos

func (*SCEVGenericExpr) Referrers

func (s *SCEVGenericExpr) Referrers() *[]ssa.Instruction

func (*SCEVGenericExpr) String

func (s *SCEVGenericExpr) String() string

func (*SCEVGenericExpr) StringWithRenamer

func (s *SCEVGenericExpr) StringWithRenamer(r Renamer) string

func (*SCEVGenericExpr) Type

func (s *SCEVGenericExpr) Type() types.Type

type SCEVUnknown

type SCEVUnknown struct {
	Value       ssa.Value
	IsInvariant bool // Explicitly tracks invariance relative to the analysis loop scope
}

SCEVUnknown represents a symbolic value (e.g., parameter or unanalyzable instr).

func (*SCEVUnknown) EvaluateAt

func (s *SCEVUnknown) EvaluateAt(k *big.Int) *big.Int

func (*SCEVUnknown) IsLoopInvariant

func (s *SCEVUnknown) IsLoopInvariant(loop *Loop) bool

func (*SCEVUnknown) Name

func (s *SCEVUnknown) Name() string

Name is one of the stub methods that satisfy the ssa.Value interface.

func (*SCEVUnknown) Parent

func (s *SCEVUnknown) Parent() *ssa.Function

func (*SCEVUnknown) Pos

func (s *SCEVUnknown) Pos() token.Pos

func (*SCEVUnknown) Referrers

func (s *SCEVUnknown) Referrers() *[]ssa.Instruction

func (*SCEVUnknown) String

func (s *SCEVUnknown) String() string

func (*SCEVUnknown) StringWithRenamer

func (s *SCEVUnknown) StringWithRenamer(r Renamer) string

func (*SCEVUnknown) Type

func (s *SCEVUnknown) Type() types.Type

type Zipper added in v1.1.0

type Zipper struct {
	// contains filtered or unexported fields
}

Zipper implements the semantic delta analysis algorithm.

func NewZipper added in v1.1.0

func NewZipper(oldFn, newFn *ssa.Function, policy LiteralPolicy) (*Zipper, error)

NewZipper creates a new analysis session.

func (*Zipper) ComputeDiff added in v1.1.0

func (z *Zipper) ComputeDiff() (*ZipperArtifacts, error)

ComputeDiff executes the Zipper Algorithm Phases.

type ZipperArtifacts added in v1.1.0

type ZipperArtifacts struct {
	OldFunction  string
	NewFunction  string
	MatchedNodes int
	Added        []string
	Removed      []string
	Preserved    bool
}

ZipperArtifacts encapsulates the results of the semantic delta analysis.

Directories

Path | Synopsis
cmd/sfw (command) | Package main provides the sfw CLI tool for semantic fingerprinting of Go source files.
