Justin Pombrio

What we perceive as reality is a construct of the mind.

Syntax Highlighting

How ought syntax highlighting to be done?


First, recognize that this is a lexing problem, not a parsing one. (In retrospect, this is true only for some languages.) It is not necessary to know the full grammar of a language. But it is imperative to recognize tokens perfectly: if a single string literal terminator is missed, everything after it is miscolored and the attempt to highlight backfires.
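To make the string-terminator point concrete, here is a minimal sketch in Python. Both patterns and the sample line are my own illustrations, not taken from any real highlighter. The naive pattern ends a string at the first quote it sees, even an escaped one, while the escape-aware pattern treats a backslash and the character after it as part of the string.

```python
import re

# Hypothetical patterns for a C-style string literal (illustration only).
NAIVE_STRING = re.compile(r'"[^"]*"')              # ends at the first quote, escaped or not
ESCAPE_AWARE = re.compile(r'"(?:[^"\\\n]|\\.)*"')  # a backslash consumes the next character

# A line of C-like code whose first string contains an escaped quote.
source = r's = "a\"b"; t = "c";'

def string_spans(pattern, text):
    """Return the substrings that the pattern would color as string literals."""
    return [m.group(0) for m in pattern.finditer(text)]

naive = string_spans(NAIVE_STRING, source)
exact = string_spans(ESCAPE_AWARE, source)

print(naive)  # the second "string" is really the code between two literals
print(exact)  # the two actual literals
```

With the naive pattern, the code between the two literals gets colored as a string and the real second literal is left uncolored, which is worse than no highlighting at all.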

Besides splitting tokens, the only other job of a syntax highlighter is to classify and color them. Thus each language needs only a lexing file that describes a few categories of tokens. I would suggest just four universal categories. Does anyone know of a language for which these categories are inappropriate?

Category      C                             Scheme          Haskell                         Extended BNF
Syntax        { } ; ( ) * & + - if struct   ( ) ` ,         { } ; case of data type where   ::= ; [ ] { } ( ) |
Identifier    foo                           foo +           foo Nothing                     <foo>
Literal       3 "hello"                     3 'hi "hello"   3 "hello"                       "hello"
Operator      * & +                                         ++ `elem`
Comment       // /* */                      ;               -- {- -}                        (* *)
(Whitespace)
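As a rough sketch of what such a lexing file could look like, here is one in Python for a small C-like subset. Everything in it is an assumption made for illustration: the rule list `C_RULES`, the regular expressions, the choice to put `=` and friends under "operator" rather than "syntax", and the ANSI color codes. The categories themselves follow the table above. Rules are tried in order, so comments and literals win over operators.

```python
import re

# Hypothetical "lexing file" for a C-like subset: ordered (category, regex) pairs.
C_RULES = [
    ("comment",    r"//[^\n]*|/\*.*?\*/"),
    ("literal",    r'"(?:[^"\\\n]|\\.)*"|\d+'),
    ("syntax",     r"\b(?:if|else|struct|return)\b|[{}();,]"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator",   r"\+\+|[*&+\-=<>]"),
    ("whitespace", r"\s+"),
]

# Arbitrary ANSI color codes per category; whitespace is left uncolored.
ANSI = {"comment": "90", "literal": "32", "syntax": "35",
        "identifier": "36", "operator": "33"}

def tokenize(source, rules):
    """Yield (category, text) pairs that together cover every character of source."""
    master = re.compile(
        "|".join(f"(?P<{name}>{pattern})" for name, pattern in rules),
        re.DOTALL,  # lets /* ... */ comments span several lines
    )
    pos = 0
    while pos < len(source):
        m = master.match(source, pos)
        if m is None:
            # No rule applies: pass the character through rather than abort.
            yield ("unknown", source[pos])
            pos += 1
        else:
            yield (m.lastgroup, m.group(0))
            pos = m.end()

def highlight(source, rules):
    """Wrap each token in the ANSI color of its category."""
    out = []
    for category, text in tokenize(source, rules):
        code = ANSI.get(category)
        out.append(f"\033[{code}m{text}\033[0m" if code else text)
    return "".join(out)

sample = 'if (x) { y = "a;b"; } // done'
tokens = [(c, t) for c, t in tokenize(sample, C_RULES) if c != "whitespace"]
print(highlight(sample, C_RULES))
```

Supporting another language would then mean supplying a different rule list; `tokenize` and `highlight` know nothing about C. Note that the `;` inside the string literal stays part of the literal, because the literal rule is tried before the syntax rule.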