Mon 07 March 2022
This journal entry was done relative to this GitHub commit
On the weekend I did a spot of hacking on the Agda compiler. I found out that the input for the HTML back-end is actually a token stream. That is, the input is what you would expect to get after lexing (not parsing). However, what’s interesting is that there is a piece of data attached to each token called aspect which tells the HTML backend what colour to highlight it. I feel like this data could only be gleaned by parsing.
So we seem to have an interesting situation where parsing is done but the the token stream is re-annotated with extra information.
Here’s an example of the first token on a line that specifies the
type signature of swap.
( 1294
,"swap"
, Aspects
{ aspect = Just (Name (Just Function) False)
, otherAspects = fromList []
, note = ""
, definitionSite =
Just (DefinitionSite
{ defSiteModule =
TopLevelModuleName
{ moduleNameRange = NoRange
, moduleNameParts = "Data" :| ["Sum","Base"]}
, defSitePos = 1294
, defSiteHere = True
, defSiteAnchor = Just "swap"
})
, tokenBased = NotOnlyTokenBased})This is the Aspect data structure.
data Aspect
= Comment
| Keyword
| String
| Number
| Hole
| Symbol -- ^ Symbols like forall, =, ->, etc.
| PrimitiveType -- ^ Things like Set and Prop.
| Name (Maybe NameKind) Bool -- ^ Is the name an operator part?
| Pragma -- ^ Text occurring in pragmas that
-- does not have a more specific
-- aspect.
| Background -- ^ Non-code contents in literate Agda
| Markup
-- ^ Delimiters used to separate the Agda code blocks from the
-- other contents in literate Agda
deriving (Eq, Show, Generic)It seems that the HTML backend does recover the token stream from teh abstract syntax.
Module Agda.Interaction.Highlighting.FromAbstract has
this at the top:
-- | Extract highlighting syntax from abstract syntax.
--
-- Implements one big fold over abstract syntax.
It works on the data type from module
Agda.Syntax.Abstract which has this as its header:
{-| The abstract syntax. This is what you get after desugaring and scope
analysis of the concrete syntax. The type checker works on abstract syntax,
producing internal syntax ("Agda.Syntax.Internal").
-}
The entry point to the highlighter is in module
Agda.Interaction.Highlighting.FromAbstract and it has the
following signature:
runHighlighter ::
Hilite a =>
SourceToModule -> AbsolutePath -> NameKinds -> a ->
HighlightingInfoBuilder
As you can see there is a Hilite type class. Many
instances are defined in the module. Here is the one for
declarations.
instance Hilite A.Declaration where
hilite = \case
A.Axiom _ax _di ai _occ x e -> hl ai <> hl x <> hl e
A.Generalize _names _di ai x e -> hl ai <> hl x <> hl e
A.Field _di x e -> hlField x <> hl e
A.Primitive _di x e -> hl x <> hl e
A.Mutual _mi ds -> hl ds
A.Section _r x tel ds -> hl x <> hl tel <> hl ds
A.Apply mi x a _ci dir -> hl mi <> hl x <> hl a <> hl dir
A.Import mi x dir -> hl mi <> hl x <> hl dir
A.Open mi x dir -> hl mi <> hl x <> hl dir
A.FunDef _di x _delayed cs -> hl x <> hl cs
A.DataSig _di x tel e -> hl x <> hl tel <> hl e
A.DataDef _di x _uc pars cs -> hl x <> hl pars <> hl cs
A.RecSig _di x tel e -> hl x <> hl tel <> hl e
A.RecDef _di x _uc dir bs e ds -> hl x <> hl dir <> hl bs <> hl e <> hl ds
A.PatternSynDef x xs p -> hl x <> hl xs <> hl p
A.UnquoteDecl _mi _di xs e -> hl xs <> hl e
A.UnquoteDef _di xs e -> hl xs <> hl e
A.ScopedDecl s ds -> hl ds
A.Pragma _r pragma -> hl pragma
where
hl a = hilite a
hlField x = hiliteField (concreteQualifier x) (concreteBase x) (Just $ bindingSite x)
FunDefLet’s see how the highlighting works on FunDef. The key
line is
A.FunDef _di x _delayed cs -> hl x <> hl cs
Function hilite picks out the second field
x and fourth field cs. From module
Agda.Syntax.Abstract we learn that these have types
x :: QName and cs :: [Clause]
respectively:
| FunDef DefInfo QName Delayed [Clause] -- ^ sequence of function clauses
Data type QName is defined in
Agda.Syntax.Abstract.Name
data QName = QName { qnameModule :: ModuleName
, qnameName :: Name
}
deriving Data
and Name is also defined in the same module as:
data Name = Name
{ nameId :: !NameId
, nameConcrete :: C.Name -- ^ The concrete name used for this instance
, nameCanonical :: C.Name -- ^ The concrete name in the original definition (needed by primShowQName, see #4735)
, nameBindingSite :: Range
, nameFixity :: Fixity'
, nameIsRecordName :: Bool
-- ^ Is this the name of the invisible record variable `self`?
-- Should not be printed or displayed in the context, see issue #3584.
} deriving Data
runHighlighter produces a
HighlightingInfoBuilder
As far as I can tell this produces a HighlightingInfo at
the end, which is a type synonym for RangeMap Aspects. The
definition involves a DelayedMerge data type for efficiency
reasons.
A RangeMap Aspects can be thought of as mapping a range,
(Int, Int), to an Aspects value.
Okay, now let’s rewind a bit. The token stream is actually produced
in by function
Agda.Interaction.Highlighting.HTML.Base.tokenStream
tokenStream
:: Text -- ^ The contents of the module.
-> HighlightingInfo -- ^ Highlighting information.
-> [TokenInfo]
I learned that the Agda HTML Backend gets its highlighting information from the module interface produced by Agda’s type-checking phase. The highlighting information is provided at the level of tokens not abstract syntax which makes it hard to e.g. pull out just a single function definition, since we don’t have access to a structure which allows us to do that. We just have tokens and what colour they should be highlighted.