Hacking the Agda HTML backend

Sean Seefried

Mon 07 March 2022

This journal entry was done relative to this GitHub commit

On the weekend I did a spot of hacking on the Agda compiler. I found out that the input for the HTML back-end is actually a token stream. That is, the input is what you would expect to get after lexing (not parsing). However, what’s interesting is that there is a piece of data attached to each token called aspect which tells the HTML backend what colour to highlight it. I feel like this data could only be gleaned by parsing.

So we seem to have an interesting situation where parsing is done but the the token stream is re-annotated with extra information.

Here’s an example of the first token on a line that specifies the type signature of swap.

( 1294
,"swap"
, Aspects
    { aspect = Just (Name (Just Function) False)
    , otherAspects = fromList []
    , note = ""
    , definitionSite =
         Just (DefinitionSite
           { defSiteModule =
                TopLevelModuleName
                  { moduleNameRange = NoRange
                  , moduleNameParts = "Data" :| ["Sum","Base"]}
                  , defSitePos = 1294
                  , defSiteHere = True
                  , defSiteAnchor = Just "swap"
            })
    , tokenBased = NotOnlyTokenBased})

This is the Aspect data structure.

data Aspect
  = Comment
  | Keyword
  | String
  | Number
  | Hole
  | Symbol                     -- ^ Symbols like forall, =, ->, etc.
  | PrimitiveType              -- ^ Things like Set and Prop.
  | Name (Maybe NameKind) Bool -- ^ Is the name an operator part?
  | Pragma                     -- ^ Text occurring in pragmas that
                               --   does not have a more specific
                               --   aspect.
  | Background                 -- ^ Non-code contents in literate Agda
  | Markup
    -- ^ Delimiters used to separate the Agda code blocks from the
    -- other contents in literate Agda
    deriving (Eq, Show, Generic)

It seems that the HTML backend does recover the token stream from teh abstract syntax.

Module Agda.Interaction.Highlighting.FromAbstract has this at the top:

-- | Extract highlighting syntax from abstract syntax.
--
-- Implements one big fold over abstract syntax.

It works on the data type from module Agda.Syntax.Abstract which has this as its header:

{-| The abstract syntax. This is what you get after desugaring and scope
    analysis of the concrete syntax. The type checker works on abstract syntax,
    producing internal syntax ("Agda.Syntax.Internal").
-}

The entry point to the highlighter is in module Agda.Interaction.Highlighting.FromAbstract and it has the following signature:

runHighlighter ::
  Hilite a =>
  SourceToModule -> AbsolutePath -> NameKinds -> a ->
  HighlightingInfoBuilder

As you can see there is a Hilite type class. Many instances are defined in the module. Here is the one for declarations.

instance Hilite A.Declaration where
  hilite = \case
      A.Axiom _ax _di ai _occ x e            -> hl ai <> hl x <> hl e
      A.Generalize _names _di ai x e         -> hl ai <> hl x <> hl e
      A.Field _di x e                        -> hlField x <> hl e
      A.Primitive _di x e                    -> hl x <> hl e
      A.Mutual _mi ds                        -> hl ds
      A.Section _r x tel ds                  -> hl x <> hl tel <> hl ds
      A.Apply mi x a _ci dir                 -> hl mi <> hl x <> hl a <> hl dir
      A.Import mi x dir                      -> hl mi <> hl x <> hl dir
      A.Open mi x dir                        -> hl mi <> hl x <> hl dir
      A.FunDef _di x _delayed cs             -> hl x <> hl cs
      A.DataSig _di x tel e                  -> hl x <> hl tel <> hl e
      A.DataDef _di x _uc pars cs            -> hl x <> hl pars <> hl cs
      A.RecSig _di x tel e                   -> hl x <> hl tel <> hl e
      A.RecDef _di x _uc dir bs e ds         -> hl x <> hl dir <> hl bs <> hl e <> hl ds
      A.PatternSynDef x xs p                 -> hl x <> hl xs <> hl p
      A.UnquoteDecl _mi _di xs e             -> hl xs <> hl e
      A.UnquoteDef _di xs e                  -> hl xs <> hl e
      A.ScopedDecl s ds                      -> hl ds
      A.Pragma _r pragma                     -> hl pragma
    where
    hl      a = hilite a
    hlField x = hiliteField (concreteQualifier x) (concreteBase x) (Just $ bindingSite x)

Running through an example on FunDef

Let’s see how the highlighting works on FunDef. The key line is

    A.FunDef _di x _delayed cs             -> hl x <> hl cs

Function hilite picks out the second field x and fourth field cs. From module Agda.Syntax.Abstract we learn that these have types x :: QName and cs :: [Clause] respectively:

  | FunDef     DefInfo QName Delayed [Clause] -- ^ sequence of function clauses

Data type QName is defined in Agda.Syntax.Abstract.Name

data QName = QName { qnameModule :: ModuleName
                   , qnameName   :: Name
                   }
    deriving Data

and Name is also defined in the same module as:

data Name = Name
  { nameId           :: !NameId
  , nameConcrete     :: C.Name  -- ^ The concrete name used for this instance
  , nameCanonical    :: C.Name  -- ^ The concrete name in the original definition (needed by primShowQName, see #4735)
  , nameBindingSite  :: Range
  , nameFixity       :: Fixity'
  , nameIsRecordName :: Bool
      -- ^ Is this the name of the invisible record variable `self`?
      --   Should not be printed or displayed in the context, see issue #3584.
  } deriving Data

runHighlighter produces a HighlightingInfoBuilder

As far as I can tell this produces a HighlightingInfo at the end, which is a type synonym for RangeMap Aspects. The definition involves a DelayedMerge data type for efficiency reasons.

A RangeMap Aspects can be thought of as mapping a range, (Int, Int), to an Aspects value.


Okay, now let’s rewind a bit. The token stream is actually produced in by function Agda.Interaction.Highlighting.HTML.Base.tokenStream

tokenStream
     :: Text             -- ^ The contents of the module.
     -> HighlightingInfo -- ^ Highlighting information.
     -> [TokenInfo]

What did I learn about the Agda HTML backend?

I learned that the Agda HTML Backend gets its highlighting information from the module interface produced by Agda’s type-checking phase. The highlighting information is provided at the level of tokens not abstract syntax which makes it hard to e.g. pull out just a single function definition, since we don’t have access to a structure which allows us to do that. We just have tokens and what colour they should be highlighted.