Mon 07 March 2022
This journal entry was done relative to this GitHub commit
On the weekend I did a spot of hacking on the Agda compiler. I found out that the input for the HTML back-end is actually a token stream. That is, the input is what you would expect to get after lexing (not parsing). However, what’s interesting is that there is a piece of data attached to each token called aspect which tells the HTML backend what colour to highlight it. I feel like this data could only be gleaned by parsing.
So we seem to have an interesting situation where parsing is done but the the token stream is re-annotated with extra information.
Here’s an example of the first token on a line that specifies the
type signature of swap
.
( 1294
"swap"
,
, Aspects{ aspect = Just (Name (Just Function) False)
= fromList []
, otherAspects = ""
, note =
, definitionSite (DefinitionSite
Just { defSiteModule =
TopLevelModuleName{ moduleNameRange = NoRange
= "Data" :| ["Sum","Base"]}
, moduleNameParts = 1294
, defSitePos = True
, defSiteHere = Just "swap"
, defSiteAnchor })
= NotOnlyTokenBased}) , tokenBased
This is the Aspect
data structure.
data Aspect
= Comment
| Keyword
| String
| Number
| Hole
| Symbol -- ^ Symbols like forall, =, ->, etc.
| PrimitiveType -- ^ Things like Set and Prop.
| Name (Maybe NameKind) Bool -- ^ Is the name an operator part?
| Pragma -- ^ Text occurring in pragmas that
-- does not have a more specific
-- aspect.
| Background -- ^ Non-code contents in literate Agda
| Markup
-- ^ Delimiters used to separate the Agda code blocks from the
-- other contents in literate Agda
(Eq, Show, Generic) deriving
It seems that the HTML backend does recover the token stream from teh abstract syntax.
Module Agda.Interaction.Highlighting.FromAbstract
has
this at the top:
-- | Extract highlighting syntax from abstract syntax.
--
-- Implements one big fold over abstract syntax.
It works on the data type from module
Agda.Syntax.Abstract
which has this as its header:
{-| The abstract syntax. This is what you get after desugaring and scope
analysis of the concrete syntax. The type checker works on abstract syntax,
producing internal syntax ("Agda.Syntax.Internal").
-}
The entry point to the highlighter is in module
Agda.Interaction.Highlighting.FromAbstract
and it has the
following signature:
runHighlighter ::
Hilite a =>
SourceToModule -> AbsolutePath -> NameKinds -> a ->
HighlightingInfoBuilder
As you can see there is a Hilite
type class. Many
instances are defined in the module. Here is the one for
declarations.
instance Hilite A.Declaration where
hilite = \case
A.Axiom _ax _di ai _occ x e -> hl ai <> hl x <> hl e
A.Generalize _names _di ai x e -> hl ai <> hl x <> hl e
A.Field _di x e -> hlField x <> hl e
A.Primitive _di x e -> hl x <> hl e
A.Mutual _mi ds -> hl ds
A.Section _r x tel ds -> hl x <> hl tel <> hl ds
A.Apply mi x a _ci dir -> hl mi <> hl x <> hl a <> hl dir
A.Import mi x dir -> hl mi <> hl x <> hl dir
A.Open mi x dir -> hl mi <> hl x <> hl dir
A.FunDef _di x _delayed cs -> hl x <> hl cs
A.DataSig _di x tel e -> hl x <> hl tel <> hl e
A.DataDef _di x _uc pars cs -> hl x <> hl pars <> hl cs
A.RecSig _di x tel e -> hl x <> hl tel <> hl e
A.RecDef _di x _uc dir bs e ds -> hl x <> hl dir <> hl bs <> hl e <> hl ds
A.PatternSynDef x xs p -> hl x <> hl xs <> hl p
A.UnquoteDecl _mi _di xs e -> hl xs <> hl e
A.UnquoteDef _di xs e -> hl xs <> hl e
A.ScopedDecl s ds -> hl ds
A.Pragma _r pragma -> hl pragma
where
hl a = hilite a
hlField x = hiliteField (concreteQualifier x) (concreteBase x) (Just $ bindingSite x)
FunDef
Let’s see how the highlighting works on FunDef
. The key
line is
A.FunDef _di x _delayed cs -> hl x <> hl cs
Function hilite
picks out the second field
x
and fourth field cs
. From module
Agda.Syntax.Abstract
we learn that these have types
x :: QName
and cs :: [Clause]
respectively:
| FunDef DefInfo QName Delayed [Clause] -- ^ sequence of function clauses
Data type QName
is defined in
Agda.Syntax.Abstract.Name
data QName = QName { qnameModule :: ModuleName
, qnameName :: Name
}
deriving Data
and Name
is also defined in the same module as:
data Name = Name
{ nameId :: !NameId
, nameConcrete :: C.Name -- ^ The concrete name used for this instance
, nameCanonical :: C.Name -- ^ The concrete name in the original definition (needed by primShowQName, see #4735)
, nameBindingSite :: Range
, nameFixity :: Fixity'
, nameIsRecordName :: Bool
-- ^ Is this the name of the invisible record variable `self`?
-- Should not be printed or displayed in the context, see issue #3584.
} deriving Data
runHighlighter
produces a
HighlightingInfoBuilder
As far as I can tell this produces a HighlightingInfo
at
the end, which is a type synonym for RangeMap Aspects
. The
definition involves a DelayedMerge
data type for efficiency
reasons.
A RangeMap Aspects
can be thought of as mapping a range,
(Int, Int)
, to an Aspects
value.
Okay, now let’s rewind a bit. The token stream is actually produced
in by function
Agda.Interaction.Highlighting.HTML.Base.tokenStream
tokenStream
:: Text -- ^ The contents of the module.
-> HighlightingInfo -- ^ Highlighting information.
-> [TokenInfo]
I learned that the Agda HTML Backend gets its highlighting information from the module interface produced by Agda’s type-checking phase. The highlighting information is provided at the level of tokens not abstract syntax which makes it hard to e.g. pull out just a single function definition, since we don’t have access to a structure which allows us to do that. We just have tokens and what colour they should be highlighted.