package parser

Version: v0.0.0-...-fd949be | Published: Jan 8, 2018 | License: Apache-2.0 | Imports: 12 | Imported by: 0

Repository

github.com/freshwebio/esengine

README ¶

esengine parser ES8 (ES2017)

The parser package for esengine handles lexing, parsing, and generating abstract syntax trees for given ECMAScript source code.

The grammars are based on the ECMAScript specification (ECMA-262), available here: https://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

When lexing Unicode characters, details not covered by the ES8 specification (such as the full set of Z-category characters used for white space) are taken from the following source: http://www.fileformat.info/info/unicode

ECMAScript 8 Grammar

The grammar is ported from the ECMAScript specification to a YAML file that uses the following conventions.

<[A-Za-z]+> represents a non-terminal symbol in the grammar. Non-terminal symbols can be defined in either of the following forms:

<\w+>

<\w+>:
  params: [\w+, ...]
  rhs: [[{Terminal|NonTerminal}, ...], ...]

A non-terminal symbol on the right-hand side of a production can be represented in either of the following forms:

<\w+>

<\w+>:
  params:
    passthrough: [{Param}, ...]
    optional: true|false
    conditions: [{Param}, ...]

Terminal symbols can also have conditions. For example, [~Yield] yield in the ECMAScript specification translates to yield: { conditions: [~Yield] } in our YAML representation of the grammar, where [] denotes a list.
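To illustrate how such conditions might be evaluated, here is a minimal Go sketch. It follows the ECMAScript notation, where +Param requires a grammar parameter to be set and ~Param requires it to be unset. The function name conditionsHold, the map of active parameters, and the treatment of unprefixed names as +Param are assumptions for illustration, not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// conditionsHold reports whether every condition in conds is satisfied by the
// set of active grammar parameters. "+Param" requires Param to be active and
// "~Param" requires it to be inactive. Entries without a prefix are treated
// like "+Param" (an assumption of this sketch).
func conditionsHold(conds []string, active map[string]bool) bool {
	for _, c := range conds {
		if strings.HasPrefix(c, "~") {
			if active[strings.TrimPrefix(c, "~")] {
				return false
			}
			continue
		}
		if !active[strings.TrimPrefix(c, "+")] {
			return false
		}
	}
	return true
}

func main() {
	// In IdentifierReference, the terminal "yield" is allowed only when Yield is not set.
	cond := []string{"~Yield"}
	fmt.Println(conditionsHold(cond, map[string]bool{}))              // true
	fmt.Println(conditionsHold(cond, map[string]bool{"Yield": true})) // false
}
```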

<*[A-Za-z]+*> represents a custom operation to perform when a certain position in a production is reached. The <*Lookahead*> operation takes an exclude parameter: a list of terminal or non-terminal symbols that the next token must not be.

The <*Conditional*> operation applies a condition to a right-hand side alternative consisting of multiple terminal and non-terminal symbols. It takes a params: { conditions: [...] } mapping and a parts: [] list.

<![A-Za-z]+!> represents a placeholder indicating that anything except one of the given terminal symbols will follow.

Documentation ¶

Index ¶

Constants
Variables
func DecodeUnicodeEscSeq(input []rune) []rune
func FutureReservedWords() map[string]rune
func IsDecimalDigit(c rune, nonZero bool) bool
func IsDoubleStringCharacter(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)
func IsEscapeCharacter(c rune) bool
func IsFirstRegExpChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)
func IsHexDigit(c rune) bool
func IsHexEscapeSequence(pos int, buf []rune) (bool, int)
func IsIDContinue(c rune) bool
func IsIDStart(c rune) bool
func IsIdentifierPart(pos int, buf []rune) (bool, int)
func IsLineContinuation(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)
func IsLineTerminator(c rune, charMap map[string]map[rune]rune) bool
func IsLineTerminatorSequence(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)
func IsNonEscapeCharacter(c rune, charMap map[string]map[rune]rune) bool
func IsOctalDigit(c rune) bool
func IsRegExpBackSlashSeq(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)
func IsRegExpBody(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)
func IsRegExpChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)
func IsRegExpClass(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)
func IsRegExpClassChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)
func IsRegExpClassChars(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)
func IsRegExpFlags(pos int, buf []rune) (bool, int)
func IsRegExpNonTerminator(c rune, charMap map[string]map[rune]rune) bool
func IsSingleEscapeCharacter(c rune) bool
func IsSingleStringCharacter(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)
func IsStartOfIdentifier(c rune, pos int, buf []rune) (bool, int)
func IsStringEscapeSequence(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)
func IsTemplateChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)
func IsUnicodeEspaceSequence(pos int, buf []rune) (bool, int)
func Keywords() map[string]rune
func LineTerminators() map[rune]rune
func ProcessReservedWord(identTkn *Token, kwMap map[string]rune, frwMap map[string]rune)
func Punctuators() map[string]rune
func ReadTemplateChars(pos int, buf []rune, charMap map[string]map[rune]rune) int
func WhiteSpaceChars() map[rune]rune
type AbstractModuleRecord
type CommentType
- func IsStartOfComment(c rune, next rune) (bool, CommentType)
type ExportEntry
type ImportEntry
type Lexer
- func NewLexer() Lexer
type LexicalEnvironment
type LexicalGoalSymbol
type ModuleRecord
type ParseNode
type ParseStack
- func (s ParseStack) Pop() (ParseStack, Symbol)
- func (s ParseStack) Push(v Symbol) ParseStack
type Parser
- func NewParser(lexer Lexer) Parser
type RealmRecord
type ResolveExportEntry
type ResolvedBindingRecord
type ScriptRecord
type SourceTextModuleRecord
- func (r *SourceTextModuleRecord) GetExportedNames(exportStarSet []ModuleRecord) []string
- func (r *SourceTextModuleRecord) ModuleDeclarationInstantiation()
- func (r *SourceTextModuleRecord) ModuleEvaluation()
- func (r *SourceTextModuleRecord) ResolveExport(exportName string, resolveSet []*ResolveExportEntry) (*ResolvedBindingRecord, string)
type Symbol
type Token
- func NextToken(pos int, buf []rune, charMap map[string]map[rune]rune, pMap map[string]rune, ...) (*Token, int, error)
- func ProcessBinaryIntegerLiteral(pos int, buf []rune) (*Token, int, error)
- func ProcessComment(pos int, buf []rune, charMap map[string]map[rune]rune, commentType CommentType) (*Token, int, error)
- func ProcessCommonToken(pos int, buf []rune, charMap map[string]map[rune]rune, kwMap map[string]rune, ...) (*Token, int, error)
- func ProcessDecimalLiteral(pos int, buf []rune) (*Token, int, error)
- func ProcessDivPunctuator(pos int, buf []rune) (*Token, int, error)
- func ProcessHexIntegerLiteral(pos int, buf []rune) (*Token, int, error)
- func ProcessIdentifier(identifier []rune, startPos int, fromPos int, buf []rune, ...) (*Token, int, error)
- func ProcessLineTerminator(pos int, buf []rune) (*Token, int, error)
- func ProcessNumericLiteral(pos int, buf []rune) (*Token, int, error)
- func ProcessOctalIntegerLiteral(pos int, buf []rune) (*Token, int, error)
- func ProcessPunctuator(pos int, buf []rune, pMap map[string]rune) (*Token, int, error)
- func ProcessRegExpLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)
- func ProcessRightBracePunctuator(pos int, buf []rune) (*Token, int, error)
- func ProcessStringLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)
- func ProcessTemplateLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)
- func ProcessWhiteSpace(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

Constants ¶

const (
	MaxPunctuatorLength   = 4
	MaxReservedWordLength = 10
)

Variables ¶

var (
	// ErrInvalidUnicodeSourceText provides the error when source text
	// contains characters which are not valid unicode code points.
	ErrInvalidUnicodeSourceText = errors.New("Non-unicode source text was present in the provided source text")
)

Functions ¶

func DecodeUnicodeEscSeq ¶

func DecodeUnicodeEscSeq(input []rune) []rune

DecodeUnicodeEscSeq decodes a Unicode escape sequence into the code points it represents.

func FutureReservedWords ¶

func FutureReservedWords() map[string]rune

func IsDecimalDigit ¶

func IsDecimalDigit(c rune, nonZero bool) bool

IsDecimalDigit determines whether the provided rune is a decimal digit code point. If nonZero is true, 0 is excluded; otherwise it is included.
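The effect of nonZero can be sketched as follows. This assumes that only the ASCII digits 0-9 count, as in the ECMAScript DecimalDigit production, with nonZero selecting the NonZeroDigit subset; isDecimalDigit is a hypothetical stand-in, not the package's implementation.

```go
package main

import "fmt"

// isDecimalDigit mirrors the documented behaviour of IsDecimalDigit, assuming
// only the ASCII digits 0-9 qualify. When nonZero is true, '0' is excluded,
// which matches the ECMAScript NonZeroDigit production.
func isDecimalDigit(c rune, nonZero bool) bool {
	if nonZero {
		return c >= '1' && c <= '9'
	}
	return c >= '0' && c <= '9'
}

func main() {
	fmt.Println(isDecimalDigit('0', false)) // true
	fmt.Println(isDecimalDigit('0', true))  // false
	fmt.Println(isDecimalDigit('7', true))  // true
}
```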

func IsDoubleStringCharacter ¶

func IsDoubleStringCharacter(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsDoubleStringCharacter determines whether the code point or sequence of code points at the given position is a valid double-quoted string character.

func IsEscapeCharacter ¶

func IsEscapeCharacter(c rune) bool

IsEscapeCharacter determines whether the provided code point is an escape character as per the ECMAScript 8 specification.

func IsFirstRegExpChar ¶

func IsFirstRegExpChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsFirstRegExpChar determines whether the code point or sequence of code points at the given position is a valid first character of a regular expression body.

func IsHexDigit ¶

func IsHexDigit(c rune) bool

IsHexDigit determines whether the provided code point is a hexadecimal digit or not.

func IsHexEscapeSequence ¶

func IsHexEscapeSequence(pos int, buf []rune) (bool, int)

IsHexEscapeSequence determines whether the next set of characters from the given position forms a valid hexadecimal escape sequence.
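In ECMAScript, HexEscapeSequence is the letter x followed by exactly two hexadecimal digits (as in \x41). A minimal sketch of that check follows. It assumes pos points at the x (the backslash has already been consumed) and that the returned int is the position just past the sequence; both are guesses about the real function's contract, and isHexEscapeSequence is a hypothetical name.

```go
package main

import "fmt"

// isHexDigit reports whether c is one of 0-9, a-f or A-F.
func isHexDigit(c rune) bool {
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')
}

// isHexEscapeSequence checks HexEscapeSequence :: x HexDigit HexDigit at buf[pos].
// Assumptions: pos points at the 'x', and the int result is the position just
// past the sequence (or pos unchanged when there is no match).
func isHexEscapeSequence(pos int, buf []rune) (bool, int) {
	if pos < 0 || pos+2 >= len(buf) || buf[pos] != 'x' {
		return false, pos
	}
	if isHexDigit(buf[pos+1]) && isHexDigit(buf[pos+2]) {
		return true, pos + 3
	}
	return false, pos
}

func main() {
	ok, next := isHexEscapeSequence(1, []rune(`\x41`))
	fmt.Println(ok, next) // true 4
}
```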

func IsIDContinue ¶

func IsIDContinue(c rune) bool

func IsIDStart ¶

func IsIDStart(c rune) bool

func IsIdentifierPart ¶

func IsIdentifierPart(pos int, buf []rune) (bool, int)

func IsLineContinuation ¶

func IsLineContinuation(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsLineContinuation determines whether the next set of characters from the given position is that of a valid line continuation.

func IsLineTerminator ¶

func IsLineTerminator(c rune, charMap map[string]map[rune]rune) bool

func IsLineTerminatorSequence ¶

func IsLineTerminatorSequence(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsLineTerminatorSequence determines whether the next set of characters form a line terminator sequence.
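The ECMAScript LineTerminatorSequence production accepts LF, a CR not followed by LF, LS (U+2028), PS (U+2029), or the pair CR LF. The sketch below encodes that rule. It keeps the (int, bool) result order of IsLineTerminatorSequence but assumes the int is the position just past the match, and it hard-codes the terminators instead of consulting a charMap; lineTerminatorSequence is a hypothetical name.

```go
package main

import "fmt"

const (
	lf = '\u000A' // LINE FEED
	cr = '\u000D' // CARRIAGE RETURN
	ls = '\u2028' // LINE SEPARATOR
	ps = '\u2029' // PARAGRAPH SEPARATOR
)

// lineTerminatorSequence matches a LineTerminatorSequence at buf[pos] and returns
// the position just past it (an assumed meaning) and whether a match was found.
func lineTerminatorSequence(pos int, buf []rune) (int, bool) {
	if pos < 0 || pos >= len(buf) {
		return pos, false
	}
	switch buf[pos] {
	case lf, ls, ps:
		return pos + 1, true
	case cr:
		// CR LF is consumed as a single sequence; a lone CR is also valid.
		if pos+1 < len(buf) && buf[pos+1] == lf {
			return pos + 2, true
		}
		return pos + 1, true
	}
	return pos, false
}

func main() {
	fmt.Println(lineTerminatorSequence(1, []rune("a\r\nb"))) // 3 true
}
```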

func IsNonEscapeCharacter ¶

func IsNonEscapeCharacter(c rune, charMap map[string]map[rune]rune) bool

IsNonEscapeCharacter determines whether the provided code point is a non-escape character.

func IsOctalDigit ¶

func IsOctalDigit(c rune) bool

IsOctalDigit determines whether the provided rune is an octal digit code point or not.

func IsRegExpBackSlashSeq ¶

func IsRegExpBackSlashSeq(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpBackSlashSeq determines whether the code point at the given position is the beginning of a regular expression backslash sequence.

func IsRegExpBody ¶

func IsRegExpBody(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpBody determines whether the next sequence of code points makes up a regular expression literal body.

func IsRegExpChar ¶

func IsRegExpChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpChar determines whether the code point or sequence of code points at the given position is a valid regular expression character.

func IsRegExpClass ¶

func IsRegExpClass(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpClass determines whether the next set of code points makes up a valid regular expression class.

func IsRegExpClassChar ¶

func IsRegExpClassChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpClassChar determines whether the next sequence of code points makes up a valid regular expression class character.

func IsRegExpClassChars ¶

func IsRegExpClassChars(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsRegExpClassChars determines whether the next code point or set of code points represents a valid sequence of regular expression class characters.

func IsRegExpFlags ¶

func IsRegExpFlags(pos int, buf []rune) (bool, int)

IsRegExpFlags determines whether the next sequence of code points makes up a regular expression literal flags section. It takes a naive approach, assuming the flags section has ended as soon as a code point that is not an identifier part is encountered.
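The naive strategy can be sketched as a simple scan. Here IdentifierPart is approximated by Unicode letters, decimal digits, $ and _, so Unicode escapes, ZWNJ/ZWJ and other ID_Continue code points are ignored. The helper names and the meaning of the results (whether any flags were read, and the position just past them) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentifierPartApprox is a simplified stand-in for the ECMAScript
// IdentifierPart production: letters, decimal digits, '$' and '_'.
func isIdentifierPartApprox(c rune) bool {
	return unicode.IsLetter(c) || unicode.IsDigit(c) || c == '$' || c == '_'
}

// scanRegExpFlags consumes code points from pos until one is not an identifier
// part. It returns whether any flags were read and the position just past them.
func scanRegExpFlags(pos int, buf []rune) (bool, int) {
	end := pos
	for end < len(buf) && isIdentifierPartApprox(buf[end]) {
		end++
	}
	return end > pos, end
}

func main() {
	src := []rune("/ab+c/gi.test(s)")
	// The closing slash is at index 5, so the flags start at index 6.
	fmt.Println(scanRegExpFlags(6, src)) // true 8
}
```

Because the scan stops at the first non-identifier-part code point, a typo such as /a/g1 would be read with flags "g1"; rejecting invalid flag letters is left to a later stage.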

func IsRegExpNonTerminator ¶

func IsRegExpNonTerminator(c rune, charMap map[string]map[rune]rune) bool

IsRegExpNonTerminator determines whether the provided code point is a valid regular expression non-terminator.

func IsSingleEscapeCharacter ¶

func IsSingleEscapeCharacter(c rune) bool

IsSingleEscapeCharacter determines whether the provided code point is a single escape character.

func IsSingleStringCharacter ¶

func IsSingleStringCharacter(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsSingleStringCharacter determines whether the code point or sequence of code points at the given position is a valid single-quoted string character.

func IsStartOfIdentifier ¶

func IsStartOfIdentifier(c rune, pos int, buf []rune) (bool, int)

func IsStringEscapeSequence ¶

func IsStringEscapeSequence(pos int, buf []rune, charMap map[string]map[rune]rune) (int, bool)

IsStringEscapeSequence determines whether the next set of characters from the given position is that of a valid string escape sequence.

func IsTemplateChar ¶

func IsTemplateChar(pos int, buf []rune, charMap map[string]map[rune]rune) (bool, int)

IsTemplateChar determines whether the next code point or sequence of code points makes up a valid template character.

func IsUnicodeEspaceSequence ¶

func IsUnicodeEspaceSequence(pos int, buf []rune) (bool, int)

func Keywords ¶

func Keywords() map[string]rune

func LineTerminators ¶

func LineTerminators() map[rune]rune

func ProcessReservedWord ¶

func ProcessReservedWord(identTkn *Token, kwMap map[string]rune, frwMap map[string]rune)

ProcessReservedWord checks the value of an identifier token and, if it matches a reserved word, transforms it into a reserved word token.
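A minimal sketch of that lookup follows, assuming the keyword and future reserved word maps are keyed by the word itself (as the map[string]rune signatures suggest) and that the token is modified in place through the pointer. The token names "IdentifierName" and "ReservedWord" are hypothetical; the real package may use different names.

```go
package main

import "fmt"

// Token mirrors the documented fields of the package's Token type.
type Token struct {
	Name  string
	Value string
	Pos   int
}

// processReservedWord rewrites identTkn as a reserved word token when its value
// is found in either the keyword map or the future reserved word map.
func processReservedWord(identTkn *Token, kwMap map[string]rune, frwMap map[string]rune) {
	_, isKeyword := kwMap[identTkn.Value]
	_, isFutureReserved := frwMap[identTkn.Value]
	if isKeyword || isFutureReserved {
		identTkn.Name = "ReservedWord" // hypothetical token name
	}
}

func main() {
	kw := map[string]rune{"if": 1, "return": 1}
	frw := map[string]rune{"enum": 1}
	tkn := &Token{Name: "IdentifierName", Value: "enum", Pos: 0}
	processReservedWord(tkn, kw, frw)
	fmt.Println(tkn.Name) // ReservedWord
}
```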

func Punctuators ¶

func Punctuators() map[string]rune

func ReadTemplateChars ¶

func ReadTemplateChars(pos int, buf []rune, charMap map[string]map[rune]rune) int

ReadTemplateChars parses the next sequence of code points from the specified position that makes up a set of valid TemplateCharacters.

func WhiteSpaceChars ¶

func WhiteSpaceChars() map[rune]rune

Types ¶

type AbstractModuleRecord ¶

type AbstractModuleRecord struct {
	ModuleRecord
	Realm       *RealmRecord
	Environment *LexicalEnvironment
	Namespace   map[string]interface{}
	Evaluated   bool
	HostDefined interface{}
}

type CommentType ¶

type CommentType int

CommentType distinguishes the types of comment tokens.

const (
	NonComment CommentType = iota
	SingleLineComment
	MultiLineComment
)

func IsStartOfComment ¶

func IsStartOfComment(c rune, next rune) (bool, CommentType)

IsStartOfComment determines whether the code point c, followed by next, is the beginning of a comment and, if so, which type of comment it is.
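In ECMAScript, // opens a single-line comment and /* opens a multi-line comment, so two code points are enough to decide. The sketch below redeclares CommentType locally and assumes the constants are iota-based; isStartOfComment is a hypothetical stand-in for the package function.

```go
package main

import "fmt"

// CommentType mirrors the documented type; iota-based values are assumed.
type CommentType int

const (
	NonComment CommentType = iota
	SingleLineComment
	MultiLineComment
)

// isStartOfComment reports whether c and next open a comment, and which kind.
func isStartOfComment(c rune, next rune) (bool, CommentType) {
	if c != '/' {
		return false, NonComment
	}
	switch next {
	case '/':
		return true, SingleLineComment
	case '*':
		return true, MultiLineComment
	}
	return false, NonComment
}

func main() {
	ok, kind := isStartOfComment('/', '*')
	fmt.Println(ok, kind == MultiLineComment) // true true
}
```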

type ExportEntry ¶

type ExportEntry struct {
	ExportName    string
	ModuleRequest string
	ImportName    string
	LocalName     string
}

type ImportEntry ¶

type ImportEntry struct {
	ModuleRequest string
	ImportName    string
	LocalName     string
}

type Lexer ¶

type Lexer interface {
	Tokenise(input []rune, goal LexicalGoalSymbol) ([]*Token, error)
	TokeniseUpToType(input []rune, tokenType string, goal LexicalGoalSymbol) ([]*Token, error, int)
	TokeniseUpToToken(input []rune, tokenType string, tokenValue string, goal LexicalGoalSymbol) ([]*Token, error, int)
	Reset()
}

Lexer defines a service that tokenises an input slice of code points.

func NewLexer ¶

func NewLexer() Lexer

NewLexer creates a new instance of the default lexer service.

type LexicalEnvironment ¶

type LexicalEnvironment struct {
}

type LexicalGoalSymbol ¶

type LexicalGoalSymbol int

LexicalGoalSymbol distinguishes between lexical goals during tokenisation.

const (
	InputElementDiv LexicalGoalSymbol = iota
	InputElementRegExp
	InputElementRegExpOrTemplateTail
	InputElementTemplateTail
)
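In ECMA-262 (2017, clause 11), the four goal symbols differ mainly in how a / and a } are read: a / starts a RegularExpressionLiteral under the two RegExp goals and a DivPunctuator otherwise, while a } starts a TemplateSubstitutionTail (TemplateMiddle or TemplateTail) under the two TemplateTail goals and a RightBracePunctuator otherwise. The sketch below tabulates that rule. It redeclares the constants locally with assumed iota values, and the helper names are hypothetical.

```go
package main

import "fmt"

// LexicalGoalSymbol mirrors the documented type; iota values are assumed.
type LexicalGoalSymbol int

const (
	InputElementDiv LexicalGoalSymbol = iota
	InputElementRegExp
	InputElementRegExpOrTemplateTail
	InputElementTemplateTail
)

// slashStarts names the production that a '/' (not opening a comment) begins.
func slashStarts(goal LexicalGoalSymbol) string {
	switch goal {
	case InputElementRegExp, InputElementRegExpOrTemplateTail:
		return "RegularExpressionLiteral"
	default:
		return "DivPunctuator"
	}
}

// rightBraceStarts names the production that a '}' begins.
func rightBraceStarts(goal LexicalGoalSymbol) string {
	switch goal {
	case InputElementRegExpOrTemplateTail, InputElementTemplateTail:
		return "TemplateSubstitutionTail"
	default:
		return "RightBracePunctuator"
	}
}

func main() {
	fmt.Println(slashStarts(InputElementDiv))               // DivPunctuator
	fmt.Println(slashStarts(InputElementRegExp))            // RegularExpressionLiteral
	fmt.Println(rightBraceStarts(InputElementTemplateTail)) // TemplateSubstitutionTail
}
```

This is why the parser, not the lexer, has to choose the goal: only the syntactic context tells whether a / after an expression is division or the start of a regular expression.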

type ModuleRecord ¶

type ModuleRecord interface {
	GetExportedNames([]ModuleRecord) []string
	ResolveExport(string, []*ResolveExportEntry) (*ResolvedBindingRecord, string)
	ModuleDeclarationInstantiation()
	ModuleEvaluation()
}

type ParseNode ¶

type ParseNode struct {
	// The symbol of the current parse node.
	// For non-terminals, the goal symbol and for terminals
	// the terminal symbol value.
	Symbol Symbol
	// Terminal determines whether or not the current parse node
	// is a terminal symbol.
	Terminal bool
	// Children represents from left to right,
	// the child nodes of our current root node.
	Children []*ParseNode
}

ParseNode represents a symbol in the parse tree.

type ParseStack ¶

type ParseStack []Symbol

ParseStack provides a stack data structure used in parsing ECMAScript.

ParseStack is not thread-safe. This should not be a problem, because it is only used during the sequential process of parsing the ECMAScript (derivative) grammar.

func (ParseStack) Pop ¶

func (s ParseStack) Pop() (ParseStack, Symbol)

Pop removes the symbol at the top of the stack and returns the resulting stack along with the removed symbol.

func (ParseStack) Push ¶

func (s ParseStack) Push(v Symbol) ParseStack

Push adds a symbol to the top of the stack and returns the resulting stack.
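Because both methods use a value receiver and return a ParseStack, the natural implementation is a slice that grows and shrinks at its end. The sketch below redeclares the types locally; the method bodies and the behaviour of Pop on an empty stack (returning the zero Symbol) are assumptions, not the package's code.

```go
package main

import "fmt"

// Symbol and ParseStack mirror the documented types.
type Symbol int

type ParseStack []Symbol

// Push returns a stack with v on top. Callers must keep the result: s = s.Push(v).
func (s ParseStack) Push(v Symbol) ParseStack {
	return append(s, v)
}

// Pop returns the stack without its top element, together with that element.
// On an empty stack it returns the empty stack and the zero Symbol (assumed).
func (s ParseStack) Pop() (ParseStack, Symbol) {
	if len(s) == 0 {
		return s, 0
	}
	top := s[len(s)-1]
	return s[:len(s)-1], top
}

func main() {
	var s ParseStack
	s = s.Push(1)
	s = s.Push(2)
	s, top := s.Pop()
	fmt.Println(top, len(s)) // 2 1
}
```

Since Push may reuse the backing array, pushing twice onto the same older stack value can make two stacks share elements; always reassigning the returned stack, as above, avoids that.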

type Parser ¶

type Parser interface {
	ParseScript([]byte, *RealmRecord, interface{}) *ScriptRecord
	ParseModule([]byte, *RealmRecord, interface{}) (ModuleRecord, error)
}

Parser defines a service that parses input into an AST.

func NewParser ¶

func NewParser(lexer Lexer) Parser

NewParser creates a new instance of the default implementation of the parser.

type RealmRecord ¶

type RealmRecord struct {
	GlobalObject map[string]interface{}
}

RealmRecord represents the realm in which a script is created, for use during script evaluation.

type ResolveExportEntry ¶

type ResolveExportEntry struct {
	Module     ModuleRecord
	ExportName string
}

type ResolvedBindingRecord ¶

type ResolvedBindingRecord struct {
	Module      ModuleRecord
	BindingName string
}

type ScriptRecord ¶

type ScriptRecord struct {
	ParseTree *ParseNode
	Errors    []error
	Realm     *RealmRecord
}

ScriptRecord encapsulates information about the script being evaluated: its parse tree, any parse errors, and its realm.

type SourceTextModuleRecord ¶

type SourceTextModuleRecord struct {
	*AbstractModuleRecord
	ParseTree             *ParseNode
	RequestedModules      []string
	ImportEntries         []*ImportEntry
	LocalExportEntries    []*ExportEntry
	IndirectExportEntries []*ExportEntry
	StarExportEntries     []*ExportEntry
}

func (*SourceTextModuleRecord) GetExportedNames ¶

func (r *SourceTextModuleRecord) GetExportedNames(exportStarSet []ModuleRecord) []string

GetExportedNames retrieves the exported names of the module given the provided export star set.
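For reference, the ECMAScript 2017 GetExportedNames algorithm collects the module's local and indirect export names, then recursively adds names from each export * target, skipping "default" and duplicates, and returns an empty list when it meets a module already in the export star set (a circular export *). The sketch below follows that shape with simplified, hypothetical types: a map stands in for HostResolveImportedModule, and the export star set is a shared map because the spec appends to one list across all recursive calls. It is not the package's implementation.

```go
package main

import "fmt"

// exportEntry and module are simplified stand-ins for ExportEntry and
// SourceTextModuleRecord.
type exportEntry struct {
	ExportName    string
	ModuleRequest string
}

type module struct {
	LocalExportEntries    []exportEntry
	IndirectExportEntries []exportEntry
	StarExportEntries     []exportEntry
}

// getExportedNames follows the ECMAScript 2017 GetExportedNames steps.
// resolve stands in for HostResolveImportedModule.
func getExportedNames(m *module, exportStarSet map[*module]bool, resolve map[string]*module) []string {
	// Already visited: the start of an "export *" circularity.
	if exportStarSet[m] {
		return nil
	}
	exportStarSet[m] = true

	var names []string
	for _, e := range m.LocalExportEntries {
		names = append(names, e.ExportName)
	}
	for _, e := range m.IndirectExportEntries {
		names = append(names, e.ExportName)
	}
	for _, e := range m.StarExportEntries {
		requested, ok := resolve[e.ModuleRequest]
		if !ok {
			continue // the spec would propagate an error here; this sketch skips it
		}
		for _, n := range getExportedNames(requested, exportStarSet, resolve) {
			// "export *" never re-exports "default", and duplicates are dropped.
			if n != "default" && !contains(names, n) {
				names = append(names, n)
			}
		}
	}
	return names
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	// b.js: export const x = 1; export default 2;
	b := &module{LocalExportEntries: []exportEntry{{ExportName: "x"}, {ExportName: "default"}}}
	// a.js: export const y = 1; export * from "./b.js";
	a := &module{
		LocalExportEntries: []exportEntry{{ExportName: "y"}},
		StarExportEntries:  []exportEntry{{ModuleRequest: "./b.js"}},
	}
	resolve := map[string]*module{"./b.js": b}
	fmt.Println(getExportedNames(a, map[*module]bool{}, resolve)) // [y x]
}
```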

func (*SourceTextModuleRecord) ModuleDeclarationInstantiation ¶

func (r *SourceTextModuleRecord) ModuleDeclarationInstantiation()

func (*SourceTextModuleRecord) ModuleEvaluation ¶

func (r *SourceTextModuleRecord) ModuleEvaluation()

func (*SourceTextModuleRecord) ResolveExport ¶

func (r *SourceTextModuleRecord) ResolveExport(exportName string, resolveSet []*ResolveExportEntry) (*ResolvedBindingRecord, string)

type Symbol ¶

type Symbol int

type Token ¶

type Token struct {
	Name  string
	Value string
	Pos   int
}

Token holds a single token in the token table produced by the lexical analysis stage.

func NextToken ¶

func NextToken(pos int, buf []rune, charMap map[string]map[rune]rune,
	pMap map[string]rune, kwMap map[string]rune, frwMap map[string]rune, goal LexicalGoalSymbol) (*Token, int, error)

NextToken attempts to parse the next set of code points as a valid token in the ECMAScript lexical grammar.

func ProcessBinaryIntegerLiteral ¶

func ProcessBinaryIntegerLiteral(pos int, buf []rune) (*Token, int, error)

ProcessBinaryIntegerLiteral extracts a binary integer literal from the provided input.
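ECMAScript defines BinaryIntegerLiteral as 0b or 0B followed by one or more binary digits. A minimal sketch follows, assuming pos points at the leading 0 and that the returned int is the position just past the literal. The token name "NumericLiteral" is a hypothetical choice, and the spec's rule that the literal must not be immediately followed by an IdentifierStart or DecimalDigit is not checked.

```go
package main

import (
	"errors"
	"fmt"
)

// Token mirrors the documented fields of the package's Token type.
type Token struct {
	Name  string
	Value string
	Pos   int
}

// processBinaryIntegerLiteral sketches BinaryIntegerLiteral :: 0b BinaryDigits | 0B BinaryDigits.
func processBinaryIntegerLiteral(pos int, buf []rune) (*Token, int, error) {
	if pos < 0 || pos+1 >= len(buf) || buf[pos] != '0' || (buf[pos+1] != 'b' && buf[pos+1] != 'B') {
		return nil, pos, errors.New("not a binary integer literal")
	}
	end := pos + 2
	for end < len(buf) && (buf[end] == '0' || buf[end] == '1') {
		end++
	}
	if end == pos+2 {
		return nil, pos, errors.New("binary integer literal has no digits")
	}
	return &Token{Name: "NumericLiteral", Value: string(buf[pos:end]), Pos: pos}, end, nil
}

func main() {
	tkn, next, err := processBinaryIntegerLiteral(0, []rune("0b1010;"))
	fmt.Println(tkn.Value, next, err) // 0b1010 6 <nil>
}
```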

func ProcessComment ¶

func ProcessComment(pos int, buf []rune, charMap map[string]map[rune]rune, commentType CommentType) (*Token, int, error)

ProcessComment creates a token for the comment that starts at position pos of the provided slice of code points. The position pos includes the two opening characters (// or /*).
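In ECMAScript, a single-line comment runs up to, but not including, the next line terminator, while a multi-line comment runs through the closing */. The sketch below applies those rules with pos at the first opening character, as described above. The local types, the token names, the use of the position just past the comment as the int result, and the hard-coded line terminators (in place of a charMap) are all assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type CommentType int

const (
	NonComment CommentType = iota
	SingleLineComment
	MultiLineComment
)

type Token struct {
	Name  string
	Value string
	Pos   int
}

// isLineTerminator reports whether c is LF, CR, LS or PS.
func isLineTerminator(c rune) bool {
	return c == '\n' || c == '\r' || c == '\u2028' || c == '\u2029'
}

// processComment returns the comment token (opening characters included),
// the position just past the comment, and an error for unterminated comments.
func processComment(pos int, buf []rune, commentType CommentType) (*Token, int, error) {
	if pos < 0 || pos+2 > len(buf) {
		return nil, pos, errors.New("not a comment")
	}
	end := pos + 2 // skip "//" or "/*"
	switch commentType {
	case SingleLineComment:
		for end < len(buf) && !isLineTerminator(buf[end]) {
			end++
		}
		return &Token{Name: "SingleLineComment", Value: string(buf[pos:end]), Pos: pos}, end, nil
	case MultiLineComment:
		for end+1 < len(buf) {
			if buf[end] == '*' && buf[end+1] == '/' {
				end += 2
				return &Token{Name: "MultiLineComment", Value: string(buf[pos:end]), Pos: pos}, end, nil
			}
			end++
		}
		return nil, pos, errors.New("unterminated multi-line comment")
	}
	return nil, pos, errors.New("not a comment")
}

func main() {
	tkn, next, _ := processComment(0, []rune("/* a */x"), MultiLineComment)
	fmt.Printf("%q %d\n", tkn.Value, next) // "/* a */" 7
}
```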

func ProcessCommonToken ¶

func ProcessCommonToken(pos int, buf []rune, charMap map[string]map[rune]rune,
	kwMap map[string]rune, frwMap map[string]rune, pMap map[string]rune) (*Token, int, error)

ProcessCommonToken attempts to parse the next set of code points as a common token.

func ProcessDecimalLiteral ¶

func ProcessDecimalLiteral(pos int, buf []rune) (*Token, int, error)

ProcessDecimalLiteral attempts to process a decimal literal value.

func ProcessDivPunctuator ¶

func ProcessDivPunctuator(pos int, buf []rune) (*Token, int, error)

func ProcessHexIntegerLiteral ¶

func ProcessHexIntegerLiteral(pos int, buf []rune) (*Token, int, error)

ProcessHexIntegerLiteral attempts to extract a hexadecimal integer literal from the provided input, to be added to the lexical token table.

func ProcessIdentifier ¶

func ProcessIdentifier(
	identifier []rune, startPos int, fromPos int, buf []rune,
	kwMap map[string]rune, frwMap map[string]rune,
) (*Token, int, error)

ProcessIdentifier attempts to extract the next sequence of code points as an identifier token.

func ProcessLineTerminator ¶

func ProcessLineTerminator(pos int, buf []rune) (*Token, int, error)

ProcessLineTerminator reads line terminator code points into the lexical analysis token table.

func ProcessNumericLiteral ¶

func ProcessNumericLiteral(pos int, buf []rune) (*Token, int, error)

ProcessNumericLiteral attempts to process the next set of code points as a numeric literal token.

func ProcessOctalIntegerLiteral ¶

func ProcessOctalIntegerLiteral(pos int, buf []rune) (*Token, int, error)

ProcessOctalIntegerLiteral extracts an octal integer literal from the input rune slice.

func ProcessPunctuator ¶

func ProcessPunctuator(pos int, buf []rune, pMap map[string]rune) (*Token, int, error)

func ProcessRegExpLiteral ¶

func ProcessRegExpLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

ProcessRegExpLiteral attempts to process the next sequence of code points as a regular expression literal.

func ProcessRightBracePunctuator ¶

func ProcessRightBracePunctuator(pos int, buf []rune) (*Token, int, error)

func ProcessStringLiteral ¶

func ProcessStringLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

ProcessStringLiteral attempts to process the code points from the given position in the input buffer onwards as a string literal.

func ProcessTemplateLiteral ¶

func ProcessTemplateLiteral(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

ProcessTemplateLiteral attempts to parse the next sequence of code points from the provided position as a template literal.

func ProcessWhiteSpace ¶

func ProcessWhiteSpace(pos int, buf []rune, charMap map[string]map[rune]rune) (*Token, int, error)

ProcessWhiteSpace attempts to read the current code point as a white space token.
