parser

package
v0.0.0-...-fd949be Latest
Warning

This package is not in the latest version of its module.

Published: Jan 8, 2018 License: Apache-2.0 Imports: 12 Imported by: 0

README

esengine parser ES8(ES2017)

The parser package for esengine handles lexing, parsing, and generating abstract syntax trees for the given ECMAScript source text.

The reference used for the grammars is the one found here: https://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

For lexing Unicode characters, the parts not covered by the ES8 specification (such as the entire set of Z-classified characters for white space) are taken from the following source: http://www.fileformat.info/info/unicode

ECMAScript 8 Grammar

The grammar is ported from the ECMAScript specification to a YAML file which uses the following conventions.

<[A-Za-z]+> represents a non-terminal symbol in the grammar. Non-terminal symbols can take the following form:

<\w+>
<\w+>:
  params: [\w+, ...]
  rhs: [[{Terminal|NonTerminal}, ...], ...]

The following provide the representations of a non-terminal symbol in the right hand side of a production:

<\w+>
<\w+>:
  params:
    passthrough: [{Param}, ...]
    optional: true|false
    conditions: [{Param}, ...]

Terminal symbols can also have conditions, such as [~Yield] yield in the ECMAScript specification, which translates to yield: { conditions: [~Yield] } in our YAML representation of the grammar, where [] represents a list.

<*[A-Za-z]+*> represent custom operations that should occur when a certain position in a production is reached. The <*Lookahead*> operation should take an exclude parameter with a list of terminal or non-terminal symbols that the next token cannot be.

The <*Conditional*> operation represents a condition applied to a right hand side alternative consisting of multiple terminal and non-terminal symbols. The condition operation takes a params: { conditions: [...]} mapping and a parts: [].

<![A-Za-z]+!> represents a placeholder that matches anything except one or more occurrences of the given terminal symbol.
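The notation above can be told apart mechanically. The following is a minimal sketch, not code from the package: the SymbolKind type, the classify function, and the sample entries are all hypothetical names chosen for illustration. It only applies the three bracket conventions described in this section and treats everything else as a terminal.

```go
package main

import (
	"fmt"
	"regexp"
)

// SymbolKind is a hypothetical label for the kinds of grammar entries
// described in this section. It is not part of the parser package.
type SymbolKind string

const (
	NonTerminal SymbolKind = "non-terminal"
	Operation   SymbolKind = "operation"
	Placeholder SymbolKind = "placeholder"
	Terminal    SymbolKind = "terminal"
)

var (
	operationRe   = regexp.MustCompile(`^<\*[A-Za-z]+\*>$`) // e.g. <*Lookahead*>
	placeholderRe = regexp.MustCompile(`^<![A-Za-z]+!>$`)   // e.g. <!Name!>
	nonTerminalRe = regexp.MustCompile(`^<\w+>$`)           // e.g. <Name>
)

// classify maps a single right-hand-side entry to its kind using the
// bracket conventions above. Anything that matches none of them is
// assumed to be a terminal symbol.
func classify(entry string) SymbolKind {
	switch {
	case operationRe.MatchString(entry):
		return Operation
	case placeholderRe.MatchString(entry):
		return Placeholder
	case nonTerminalRe.MatchString(entry):
		return NonTerminal
	default:
		return Terminal
	}
}

func main() {
	// Sample entries are illustrative only.
	for _, e := range []string{"<Statement>", "<*Lookahead*>", "<!LineTerminator!>", "yield"} {
		fmt.Printf("%-20s %s\n", e, classify(e))
	}
}
```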

Documentation

Constants

const (
	MaxPunctuatorLength   = 4
	MaxReservedWordLength = 10
)

Variables

var (
	// ErrInvalidUnicodeSourceText provides the error when source text
	// contains characters which are not valid unicode code points.
	ErrInvalidUnicodeSourceText = errors.New("Non-unicode source text was present in the provided source text")
)

Functions

func DecodeUnicodeEscSeq

func DecodeUnicodeEscSeq(input []rune) []rune

DecodeUnicodeEscSeq deals with decoding a unicode escape sequence into the actual represented code points.
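As a rough illustration of what decoding a unicode escape sequence involves, here is a minimal sketch. It is a stand-in, not the package's implementation: decodeUnicodeEscapes and readEscape are hypothetical names, and the sketch assumes the input may mix ordinary code points with \uXXXX and \u{X...} sequences. It does not combine surrogate pairs.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeUnicodeEscapes is a hypothetical stand-in for illustration only.
// It replaces \uXXXX and \u{X...} sequences with the code point they
// denote and copies every other code point through unchanged.
func decodeUnicodeEscapes(input []rune) []rune {
	out := make([]rune, 0, len(input))
	for i := 0; i < len(input); {
		if input[i] == '\\' && i+1 < len(input) && input[i+1] == 'u' {
			if r, next, ok := readEscape(input, i+2); ok {
				out = append(out, r)
				i = next
				continue
			}
		}
		out = append(out, input[i])
		i++
	}
	return out
}

// readEscape parses the hex digits that follow "\u", starting at pos.
// It returns the decoded rune and the index just past the sequence.
func readEscape(buf []rune, pos int) (rune, int, bool) {
	start, end, next := pos, pos+4, pos+4 // fixed four-digit form
	if pos < len(buf) && buf[pos] == '{' { // braced form: \u{X...}
		start = pos + 1
		end = start
		for end < len(buf) && buf[end] != '}' {
			end++
		}
		if end >= len(buf) { // no closing brace
			return 0, pos, false
		}
		next = end + 1
	}
	if end > len(buf) || start == end {
		return 0, pos, false
	}
	v, err := strconv.ParseUint(string(buf[start:end]), 16, 32)
	if err != nil || v > 0x10FFFF {
		return 0, pos, false
	}
	return rune(v), next, true
}

func main() {
	in := []rune(`caf\u00E9 \u{1F600}`)
	fmt.Println(string(decodeUnicodeEscapes(in)))
}
```

Malformed sequences are left untouched in this sketch; whether the real function does the same is not stated in the documentation.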

func FutureReservedWords

func FutureReservedWords() map[string]rune

func IsDecimalDigit

func IsDecimalDigit(c rune, nonZero bool) bool

IsDecimalDigit determines whether the provided rune is a decimal digit unicode code point or not. If nonZero is true, 0 is excluded; otherwise it is included.
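The effect of the nonZero flag can be sketched as follows. The isDecimalDigit function below is a hypothetical local re-implementation of the documented behaviour, written only so that the example is self-contained.

```go
package main

import "fmt"

// isDecimalDigit mirrors the documented behaviour of IsDecimalDigit.
// It is a local stand-in, not the package's own code.
func isDecimalDigit(c rune, nonZero bool) bool {
	if nonZero {
		return c >= '1' && c <= '9'
	}
	return c >= '0' && c <= '9'
}

func main() {
	fmt.Println(isDecimalDigit('0', false)) // 0 counts as a digit
	fmt.Println(isDecimalDigit('0', true))  // 0 is excluded when nonZero is set
	fmt.Println(isDecimalDigit('7', true))
}
```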

func IsDoubleStringCharacter

func IsDoubleStringCharacter(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsDoubleStringCharacter determines whether the given rune is a valid double-quoted string character.

func IsEscapeCharacter

func IsEscapeCharacter(c rune) bool

IsEscapeCharacter determines whether the provided code point is an escape character as per the ECMAScript 8 specification.

func IsFirstRegExpChar

func IsFirstRegExpChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsFirstRegExpChar determines whether the given code point is a valid regular expression first character.

func IsHexDigit

func IsHexDigit(c rune) bool

IsHexDigit determines whether the provided code point is a hexadecimal digit or not.

func IsHexEscapeSequence

func IsHexEscapeSequence(pos int, buf []rune) (bool, int)

IsHexEscapeSequence determines whether the next set of characters from the given position is that of a valid hexadecimal escape sequence.

func IsIDContinue

func IsIDContinue(c rune) bool

func IsIDStart

func IsIDStart(c rune) bool

func IsIdentifierPart

func IsIdentifierPart(pos int, buf []rune) (bool, int)

func IsLineContinuation

func IsLineContinuation(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsLineContinuation determines whether the next set of characters from the given position is that of a valid line continuation.

func IsLineTerminator

func IsLineTerminator(c rune, charMap map[string]map[rune]rune) bool

func IsLineTerminatorSequence

func IsLineTerminatorSequence(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsLineTerminatorSequence determines whether the next set of characters form a line terminator sequence.
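In the ECMAScript lexical grammar a line terminator sequence is LF, CR, LS, PS, or the pair CR LF. A minimal sketch of that check is shown below. The function name lineTerminatorSequenceLen is hypothetical, the charMap parameter is left out, and the meaning of the returned int (number of code points consumed) is an assumption, since the documentation does not say what the package returns.

```go
package main

import "fmt"

const (
	lf = '\u000A' // LINE FEED
	cr = '\u000D' // CARRIAGE RETURN
	ls = '\u2028' // LINE SEPARATOR
	ps = '\u2029' // PARAGRAPH SEPARATOR
)

// lineTerminatorSequenceLen reports whether a line terminator sequence
// starts at pos and, as an assumption of this sketch, how many code
// points it spans. CR followed by LF is treated as one sequence.
func lineTerminatorSequenceLen(pos int, buf []rune) (int, bool) {
	if pos < 0 || pos >= len(buf) {
		return 0, false
	}
	switch buf[pos] {
	case lf, ls, ps:
		return 1, true
	case cr:
		if pos+1 < len(buf) && buf[pos+1] == lf {
			return 2, true
		}
		return 1, true
	}
	return 0, false
}

func main() {
	buf := []rune("a\r\nb")
	n, ok := lineTerminatorSequenceLen(1, buf)
	fmt.Println(n, ok)
}
```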

func IsNonEscapeCharacter

func IsNonEscapeCharacter(c rune, charMap map[string]map[rune]rune) bool

IsNonEscapeCharacter determines whether the provided code point is a non-escape character.

func IsOctalDigit

func IsOctalDigit(c rune) bool

IsOctalDigit determines whether the provided rune is an octal digit unicode code point or not.

func IsRegExpBackSlashSeq

func IsRegExpBackSlashSeq(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpBackSlashSeq determines whether the provided code point is the beginning of a regular expression backslash sequence.

func IsRegExpBody

func IsRegExpBody(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpBody determines whether the next sequence of code points makes up a regular expression literal body.

func IsRegExpChar

func IsRegExpChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpChar determines whether the given code point is a valid regular expression character.

func IsRegExpClass

func IsRegExpClass(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpClass determines whether the next set of code points makes up a valid regular expression class.

func IsRegExpClassChar

func IsRegExpClassChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpClassChar determines whether the next sequence of code points makes up a valid regular expression character.

func IsRegExpClassChars

func IsRegExpClassChars(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpClassChars determines whether the next code point or set of code points represents a valid regular expression class characters sequence.

func IsRegExpFlags

func IsRegExpFlags(pos int, buf []rune) (bool, int)

IsRegExpFlags determines whether the next sequence of code points makes up a regular expression literal flags section. This takes a naive approach in assuming the flags section has ended as soon as a non-identifier part code point is met.
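The naive approach described above can be sketched as a simple scan. This is an illustration under assumptions, not the package's code: scanRegExpFlags and isIdentPart are hypothetical names, isIdentPart is a simplified stand-in for the package's identifier-part check, and the returned int is assumed to be the index just past the last flag code point.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentPart is a simplified stand-in for an identifier-part check.
// The package's own rules may be stricter or looser.
func isIdentPart(c rune) bool {
	return c == '$' || c == '_' || unicode.IsLetter(c) || unicode.IsDigit(c)
}

// scanRegExpFlags consumes identifier-part code points starting at pos
// and stops at the first code point that is not one. An empty flags
// section is valid in the ECMAScript grammar, so the result is always
// true in this sketch.
func scanRegExpFlags(pos int, buf []rune) (bool, int) {
	end := pos
	for end < len(buf) && isIdentPart(buf[end]) {
		end++
	}
	return true, end
}

func main() {
	buf := []rune("/ab+c/gi;")
	ok, end := scanRegExpFlags(6, buf) // position 6 is just after the closing slash
	fmt.Println(ok, string(buf[6:end]))
}
```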

func IsRegExpNonTerminator

func IsRegExpNonTerminator(c rune, charMap map[string]map[rune]rune) bool

IsRegExpNonTerminator determines whether the provided code point is a valid regular expression non-terminator.

func IsSingleEscapeCharacter

func IsSingleEscapeCharacter(c rune) bool

IsSingleEscapeCharacter determines whether the provided code point is a single escape character.

func IsSingleStringCharacter

func IsSingleStringCharacter(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsSingleStringCharacter determines whether the given rune is a valid single-quoted string character.

func IsStartOfIdentifier

func IsStartOfIdentifier(c rune, pos int, buf []rune) (bool, int)

func IsStringEscapeSequence

func IsStringEscapeSequence(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsStringEscapeSequence determines whether the next set of characters from the given position is that of a valid string escape sequence.

func IsTemplateChar

func IsTemplateChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsTemplateChar determines whether the next code point or sequence of code points makes up a valid template character.

func IsUnicodeEspaceSequence

func IsUnicodeEspaceSequence(pos int, buf []rune) (bool, int)

func Keywords

func Keywords() map[string]rune

func LineTerminators

func LineTerminators() map[rune]rune

func ProcessReservedWord

func ProcessReservedWord(identTkn *Token, kwMap map[string]rune, frwMap map[string]rune)

ProcessReservedWord takes the value of an identifier token and transforms it into a reserved word token if there is a reserved word match.
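Since ProcessReservedWord has no return value, it presumably rewrites the token it is given in place. A minimal sketch of that idea follows. The Token struct is copied from this page, but processReservedWord and the token names "Identifier", "Keyword", and "FutureReservedWord" are assumptions made for illustration. The documentation does not list the names the package actually uses, nor what the rune values in the maps mean, so only key presence is checked here.

```go
package main

import "fmt"

// Token mirrors the Token struct documented on this page.
type Token struct {
	Name  string
	Value string
	Pos   int
}

// processReservedWord is a hypothetical stand-in. If the token's value
// is a key in either map, the token's Name is rewritten in place.
// The Name strings used here are assumptions, not the package's names.
func processReservedWord(tkn *Token, kwMap, frwMap map[string]rune) {
	if _, ok := kwMap[tkn.Value]; ok {
		tkn.Name = "Keyword"
		return
	}
	if _, ok := frwMap[tkn.Value]; ok {
		tkn.Name = "FutureReservedWord"
	}
}

func main() {
	kw := map[string]rune{"while": 1}  // illustrative entries only
	frw := map[string]rune{"enum": 1}  // illustrative entries only
	tkn := &Token{Name: "Identifier", Value: "while", Pos: 0}
	processReservedWord(tkn, kw, frw)
	fmt.Println(tkn.Name, tkn.Value)
}
```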

func Punctuators

func Punctuators() map[string]rune

func ReadTemplateChars

func ReadTemplateChars(pos int, buf []rune, charMap map[string]map[rune]rune) int

ReadTemplateChars parses the next sequence of code points from the specified position that makes up a set of valid TemplateCharacters.

func WhiteSpaceChars

func WhiteSpaceChars() map[rune]rune

Types

type AbstractModuleRecord

type AbstractModuleRecord struct {
	ModuleRecord
	Realm       *RealmRecord
	Environment *LexicalEnvironment
	Namespace   map[string]interface{}
	Evaluated   bool
	HostDefined interface{}
}

type CommentType

type CommentType int

CommentType provides a type alias to distinguish types of comment tokens.

const (
	NonComment CommentType = iota
	SingleLineComment
	MultiLineComment
)

func IsStartOfComment

func IsStartOfComment(c rune, next rune) (bool, CommentType)

IsStartOfComment determines whether the given code point is the beginning of a comment.
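A comment in ECMAScript opens with either // or /*, which is why the function looks at two code points. The sketch below is a hypothetical local version written for illustration. The constant names come from this page, while their numeric values (assigned with iota here) are an assumption.

```go
package main

import "fmt"

// CommentType and its constants mirror the names on this page.
// The use of iota for the values is an assumption of this sketch.
type CommentType int

const (
	NonComment CommentType = iota
	SingleLineComment
	MultiLineComment
)

// isStartOfComment is a local stand-in: it inspects the current code
// point and the one after it to decide which kind of comment, if any,
// begins here.
func isStartOfComment(c rune, next rune) (bool, CommentType) {
	if c != '/' {
		return false, NonComment
	}
	switch next {
	case '/':
		return true, SingleLineComment
	case '*':
		return true, MultiLineComment
	}
	return false, NonComment
}

func main() {
	buf := []rune("/* note */")
	ok, kind := isStartOfComment(buf[0], buf[1])
	fmt.Println(ok, kind == MultiLineComment)
}
```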

type ExportEntry

type ExportEntry struct {
	ExportName    string
	ModuleRequest string
	ImportName    string
	LocalName     string
}

type ImportEntry

type ImportEntry struct {
	ModuleRequest string
	ImportName    string
	LocalName     string
}

type Lexer

type Lexer interface {
	Tokenise(input []rune, goal LexicalGoalSymbol) ([]*Token, error)
	TokeniseUpToType(input []rune, tokenType string, goal LexicalGoalSymbol) ([]*Token, error, int)
	TokeniseUpToToken(input []rune, tokenType string, tokenValue string, goal LexicalGoalSymbol) ([]*Token, error, int)
	Reset()
}

Lexer provides the base definition for a service which deals with tokenising an input slice of code points.

func NewLexer

func NewLexer() Lexer

NewLexer creates a new instance of the default lexer service.

type LexicalEnvironment

type LexicalEnvironment struct {
}

type LexicalGoalSymbol

type LexicalGoalSymbol int

LexicalGoalSymbol provides a type alias to distinguish between lexical goals during the process of tokenisation.

const (
	InputElementDiv LexicalGoalSymbol = iota
	InputElementRegExp
	InputElementRegExpOrTemplateTail
	InputElementTemplateTail
)
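The goal symbol matters because the same code point can start different tokens depending on syntactic context. In the ECMAScript specification, a regular expression literal is permitted under the InputElementRegExp and InputElementRegExpOrTemplateTail goals, while a leading / is a division punctuator under the other two. The sketch below illustrates that rule. The slashStartsRegExp helper is hypothetical, and the iota-based constant values are an assumption.

```go
package main

import "fmt"

// LexicalGoalSymbol and its constants mirror the names on this page.
// The use of iota for the values is an assumption of this sketch.
type LexicalGoalSymbol int

const (
	InputElementDiv LexicalGoalSymbol = iota
	InputElementRegExp
	InputElementRegExpOrTemplateTail
	InputElementTemplateTail
)

// slashStartsRegExp is a hypothetical helper: it reports whether a "/"
// should be read as the start of a regular expression literal rather
// than a division punctuator under the given goal.
func slashStartsRegExp(goal LexicalGoalSymbol) bool {
	switch goal {
	case InputElementRegExp, InputElementRegExpOrTemplateTail:
		return true
	}
	return false
}

func main() {
	fmt.Println(slashStartsRegExp(InputElementDiv))    // division context, as in a / b
	fmt.Println(slashStartsRegExp(InputElementRegExp)) // expression start, as in x = /re/
}
```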

type ModuleRecord

type ModuleRecord interface {
	GetExportedNames([]ModuleRecord) []string
	ResolveExport(string, []*ResolveExportEntry) (*ResolvedBindingRecord, string)
	ModuleDeclarationInstantiation()
	ModuleEvaluation()
}

type ParseNode

type ParseNode struct {
	// The symbol of the current parse node.
	// For non-terminals, the goal symbol and for terminals
	// the terminal symbol value.
	Symbol Symbol
	// Terminal determines whether or not the current parse node
	// is a terminal symbol.
	Terminal bool
	// Children represents from left to right,
	// the child nodes of our current root node.
	Children []*ParseNode
}

ParseNode represents a symbol in the parse tree.
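Because Children is ordered left to right, a depth-first walk visits terminals in source order. The sketch below shows one way to collect them. ParseNode and Symbol are re-declared locally from the definitions on this page so the example is self-contained, the terminals function is a hypothetical helper, and the symbol numbers are arbitrary.

```go
package main

import "fmt"

// Symbol and ParseNode mirror the type definitions on this page.
type Symbol int

type ParseNode struct {
	Symbol   Symbol
	Terminal bool
	Children []*ParseNode
}

// terminals walks the tree depth-first, left to right, and returns the
// symbols of all terminal nodes in the order they are encountered.
func terminals(n *ParseNode) []Symbol {
	if n == nil {
		return nil
	}
	if n.Terminal {
		return []Symbol{n.Symbol}
	}
	var out []Symbol
	for _, child := range n.Children {
		out = append(out, terminals(child)...)
	}
	return out
}

func main() {
	// Symbol values below are arbitrary placeholders.
	tree := &ParseNode{Symbol: 1, Children: []*ParseNode{
		{Symbol: 10, Terminal: true},
		{Symbol: 2, Children: []*ParseNode{
			{Symbol: 11, Terminal: true},
		}},
	}}
	fmt.Println(terminals(tree))
}
```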

type ParseStack

type ParseStack []Symbol

ParseStack provides a stack data structure used in parsing ECMAScript.

This is not thread-safe, but that should not be a problem as it will only be used for the sequential process of parsing the ECMAScript (derivative) grammar.

func (ParseStack) Pop

func (s ParseStack) Pop() (ParseStack, Symbol)

Pop takes a symbol from the top of the stack.

func (ParseStack) Push

func (s ParseStack) Push(v Symbol) ParseStack

Push adds a symbol to the top of the stack.
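Both methods use value receivers and return the updated stack, so callers need to reassign the result, much like the built-in append. The sketch below demonstrates the calling pattern. The types and method signatures are taken from this page, but the method bodies are assumptions written for illustration, including the choice to return the zero Symbol when popping an empty stack.

```go
package main

import "fmt"

// Symbol and ParseStack mirror the type definitions on this page.
type Symbol int

type ParseStack []Symbol

// Push returns a stack with v on top. The body is an assumption of
// this sketch, not the package's code.
func (s ParseStack) Push(v Symbol) ParseStack {
	return append(s, v)
}

// Pop returns the stack without its top element, plus that element.
// Returning the zero Symbol for an empty stack is an assumption.
func (s ParseStack) Pop() (ParseStack, Symbol) {
	if len(s) == 0 {
		return s, 0
	}
	return s[:len(s)-1], s[len(s)-1]
}

func main() {
	var stack ParseStack
	stack = stack.Push(1) // the result must be reassigned
	stack = stack.Push(2)

	var top Symbol
	stack, top = stack.Pop()
	fmt.Println(top, len(stack))
}
```

Calling stack.Push(3) without assigning the result would leave the caller's stack variable unchanged.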

type Parser

type Parser interface {
	ParseScript([]byte, *RealmRecord, interface{}) *ScriptRecord
	ParseModule([]byte, *RealmRecord, interface{}) (ModuleRecord, error)
}

Parser provides the base definition for a service that deals with parsing input into an AST.

func NewParser

func NewParser(lexer Lexer) Parser

NewParser creates a new instance of the default implementation of the parser.

type RealmRecord

type RealmRecord struct {
	GlobalObject map[string]interface{}
}

RealmRecord provides the realm a script has been created in for script evaluation.

type ResolveExportEntry

type ResolveExportEntry struct {
	Module     ModuleRecord
	ExportName string
}

type ResolvedBindingRecord

type ResolvedBindingRecord struct {
	Module      ModuleRecord
	BindingName string
}

type ScriptRecord

type ScriptRecord struct {
	ParseTree *ParseNode
	Errors    []error
	Realm     *RealmRecord
}

ScriptRecord provides the record which encapsulates information about the script being evaluated.

type SourceTextModuleRecord

type SourceTextModuleRecord struct {
	*AbstractModuleRecord
	ParseTree             *ParseNode
	RequestedModules      []string
	ImportEntries         []*ImportEntry
	LocalExportEntries    []*ExportEntry
	IndirectExportEntries []*ExportEntry
	StarExportEntries     []*ExportEntry
}

func (*SourceTextModuleRecord) GetExportedNames

func (r *SourceTextModuleRecord) GetExportedNames(exportStarSet []ModuleRecord) []string

GetExportedNames retrieves the exported names of the module given the provided export star set.

func (*SourceTextModuleRecord) ModuleDeclarationInstantiation

func (r *SourceTextModuleRecord) ModuleDeclarationInstantiation()

func (*SourceTextModuleRecord) ModuleEvaluation

func (r *SourceTextModuleRecord) ModuleEvaluation()

func (*SourceTextModuleRecord) ResolveExport

func (r *SourceTextModuleRecord) ResolveExport(exportName string, resolveSet []*ResolveExportEntry) (*ResolvedBindingRecord, string)

type Symbol

type Symbol int

type Token

type Token struct {
	Name  string
	Value string
	Pos   int
}

Token holds a token produced in the token table of the lexical analysis stage.

func NextToken

func NextToken(pos int, buf []rune, charMap map[string]map[rune]rune,
	pMap map[string]rune, kwMap map[string]rune, frwMap map[string]rune, goal LexicalGoalSymbol) (*Token, int, error)

NextToken attempts to parse the next set of code points as a valid token in the ECMAScript lexical grammar.

func ProcessBinaryIntegerLiteral

func ProcessBinaryIntegerLiteral(pos int, buf []rune) (*Token, int, error)

ProcessBinaryIntegerLiteral deals with extracting a binary integer literal from the provided input.
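A binary integer literal in ECMAScript is 0b or 0B followed by one or more binary digits. The sketch below shows a scan of that shape using the same (token, position, error) return pattern as the Process functions on this page. It is an illustration only: scanBinaryIntegerLiteral is a hypothetical name, the token name "NumericLiteral" is an assumption, and the returned int is assumed to be the index just past the literal.

```go
package main

import (
	"errors"
	"fmt"
)

// Token mirrors the Token struct documented on this page.
type Token struct {
	Name  string
	Value string
	Pos   int
}

var errNotBinary = errors.New("not a binary integer literal")

// scanBinaryIntegerLiteral is a hypothetical stand-in. It expects "0b"
// or "0B" at pos followed by at least one binary digit, and returns the
// token together with the index just past the literal.
func scanBinaryIntegerLiteral(pos int, buf []rune) (*Token, int, error) {
	if pos < 0 || pos+2 >= len(buf) || buf[pos] != '0' ||
		(buf[pos+1] != 'b' && buf[pos+1] != 'B') {
		return nil, pos, errNotBinary
	}
	end := pos + 2
	for end < len(buf) && (buf[end] == '0' || buf[end] == '1') {
		end++
	}
	if end == pos+2 { // prefix present but no digits
		return nil, pos, errNotBinary
	}
	// "NumericLiteral" is an assumed token name.
	return &Token{Name: "NumericLiteral", Value: string(buf[pos:end]), Pos: pos}, end, nil
}

func main() {
	buf := []rune("x = 0b101;")
	tkn, next, err := scanBinaryIntegerLiteral(4, buf)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tkn.Value, next)
}
```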

func ProcessComment

func ProcessComment(pos int, buf []rune, charMap map[string]map[rune]rune, commentType CommentType) (*Token, int, error)

ProcessComment deals with creating a token for a comment that starts from the given position in the provided slice of code points. The position pos includes the two opening terminals.

func ProcessCommonToken

func ProcessCommonToken(pos int, buf []rune, charMap map[string]map[rune]rune,
	kwMap map[string]rune, frwMap map[string]rune, pMap map[string]rune) (*Token, int, error)

ProcessCommonToken deals with attempting to parse the next set of code points as a common token.

func ProcessDecimalLiteral

func ProcessDecimalLiteral(pos int, buf []rune) (*Token, int, error)

ProcessDecimalLiteral attempts to process a decimal literal value.

func ProcessDivPunctuator

func ProcessDivPunctuator(pos int, buf []rune) (*Token, int, error)

func ProcessHexIntegerLiteral

func ProcessHexIntegerLiteral(pos int, buf []rune) (*Token, int, error)

ProcessHexIntegerLiteral attempts to extract a hexadecimal integer literal value to be added to the lexical token table from the provided input.

func ProcessIdentifier

func ProcessIdentifier(
	identifier []rune, startPos int, fromPos int, buf []rune,
	kwMap map[string]rune, frwMap map[string]rune,
) (*Token, int, error)

ProcessIdentifier deals with attempting to extract the next sequence of code points as an identifier token.

func ProcessLineTerminator

func ProcessLineTerminator(pos int, buf []rune) (*Token, int, error)

ProcessLineTerminator reads line terminator code points into the lexical analysis token table.

func ProcessNumericLiteral

func ProcessNumericLiteral(pos int, buf []rune) (*Token, int, error)

ProcessNumericLiteral attempts to process the next set of code points as a numeric literal token.

func ProcessOctalIntegerLiteral

func ProcessOctalIntegerLiteral(pos int, buf []rune) (*Token, int, error)

ProcessOctalIntegerLiteral deals with extracting an octal integer literal from the input rune slice.

func ProcessPunctuator

func ProcessPunctuator(pos int, buf []rune, pMap map[string]rune) (*Token, int, error)

func ProcessRegExpLiteral

func ProcessRegExpLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

ProcessRegExpLiteral attempts to process the next sequence of code points as a regular expression literal.

func ProcessRightBracePunctuator

func ProcessRightBracePunctuator(pos int, buf []rune) (*Token, int, error)

func ProcessStringLiteral

func ProcessStringLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

ProcessStringLiteral attempts to process the code points from the provided position in the input buffer onwards as a string literal.

func ProcessTemplateLiteral

func ProcessTemplateLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

ProcessTemplateLiteral attempts to parse the next sequence of code points from the provided position as a template literal.

func ProcessWhiteSpace

func ProcessWhiteSpace(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

ProcessWhiteSpace attempts to read the current code point as a white space token.
