Parsing an indented tree in Haskell

This example shows how to parse an indented tree of data in Haskell using Parsec and the indents package.

> import Control.Applicative
> import Data.Char (isSpace)
> import Data.Either.Utils (forceEither)
> import Data.Monoid
> import System.Environment (getArgs)
> import Text.Parsec hiding (many, optional, (<|>))
> import Text.Parsec.Indent

A basic tree structure:

> data Tree = Node [Tree] | Leaf String

A simple serialization function to easily check the result of our parsing:

> serializeIndentedTree :: Tree -> String
> serializeIndentedTree tree = drop 2 $ s (-1) tree
>   where
>     s i (Node children) = "\n" <> concat (replicate i "    ") <> concatMap (s (i+1)) children
>     s _ (Leaf text)     = text <> " "

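To see what the serializer produces without running the parser, here is a small standalone sketch (base only, kept outside the bird tracks so it is not compiled with this module). It repeats the same logic under the shorter name `serialize` and applies it to a hand-built tree; the `sample` value is made up for illustration. The root `Node` sits at depth -1, which is why the two leading newlines are dropped.

```haskell
-- Standalone sketch, not part of the literate module.
data Tree = Node [Tree] | Leaf String

-- Same logic as serializeIndentedTree, written with (++) so it needs no imports.
serialize :: Tree -> String
serialize tree = drop 2 $ go (-1) tree
  where
    -- A node starts a new line, indented four spaces per nesting level.
    go i (Node children) =
        "\n" ++ concat (replicate i "    ") ++ concatMap (go (i + 1)) children
    -- A leaf is a single word followed by a space.
    go _ (Leaf text) = text ++ " "

-- Hypothetical sample: one top-level node with a single child node.
sample :: Tree
sample = Node [Node [Leaf "lorem", Leaf "ipsum", Node [Leaf "dolor"]]]

main :: IO ()
main = putStrLn (serialize sample)
```

Running it should print "lorem ipsum " on the first line and "    dolor " on the second, each with the trailing space that the `Leaf` case adds.
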
Our main function and some glue:

> main = do
>     args <- getArgs
>     input <- if null args then return example else readFile $ head args
>     putStrLn $ serializeIndentedTree $ forceEither $ parseIndentedTree input
> 
> parseIndentedTree input = runIndent "" $ runParserT aTree () "" input

The actual parser:

Note that the indents package works by storing a SourcePos in a State monad. Its combinators do not consume indentation; they only compare column numbers. Where we consume whitespace therefore matters a great deal.
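
The effect of consuming whitespace on the column number can be checked with Parsec alone. The following standalone sketch (not part of this module; the name `columns` is made up for illustration) reads the current column before and after skipping leading whitespace:

```haskell
-- Standalone sketch, not part of the literate module; needs only parsec.
import Text.Parsec
import Text.Parsec.String (Parser)

-- Report the column before and after skipping leading whitespace.
columns :: Parser (Int, Int)
columns = do
    before <- sourceColumn <$> getPosition
    spaces
    after <- sourceColumn <$> getPosition
    return (before, after)

main :: IO ()
main = print (parse columns "" "    dolor")
```

This should print `Right (1,5)`: Parsec columns start at 1, and the four spaces move the position to column 5. A combinator that only compares columns therefore sees a different value depending on whether the indentation has already been consumed.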

> aTree = Node <$> many aNode
> 
> aNode = spaces *> withBlock makeNode aNodeHeader aNode
> 
> aNodeHeader = many1 aLeaf <* spaces
> 
> aLeaf = Leaf <$> (many1 (satisfy (not . isSpace)) <* many (oneOf " \t"))
> 
> makeNode leaves nodes = Node $ leaves <> nodes

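To make the tree shape concrete, here is a hand-rolled stand-in that builds the same kind of `Tree` from indentation widths using only base. It is a sketch for illustration, not how indents is implemented, and it is looser than the parser above: any deeper-indented line counts as a child, and only spaces are treated as indentation. The names `measure`, `build` and `parseLines` are made up for this example.

```haskell
-- Standalone sketch, base only; not part of the literate module.
data Tree = Node [Tree] | Leaf String deriving (Eq, Show)

-- Pair each non-blank line with its indentation width and its words.
measure :: String -> [(Int, [String])]
measure input =
    [ (length indent, words rest)
    | line <- lines input
    , let (indent, rest) = span (== ' ') line
    , not (null rest)
    ]

-- Each header line owns the following lines that are indented deeper.
build :: [(Int, [String])] -> [Tree]
build [] = []
build ((col, ws) : rest) =
    Node (map Leaf ws ++ build children) : build siblings
  where
    (children, siblings) = span ((> col) . fst) rest

-- Wrap the top-level nodes in a root Node, as aTree does.
parseLines :: String -> Tree
parseLines = Node . build . measure

main :: IO ()
main = print (parseLines "lorem ipsum\n    dolor\nurna\n")
```

This should print `Node [Node [Leaf "lorem",Leaf "ipsum",Node [Leaf "dolor"]],Node [Leaf "urna"]]`: each node holds its header words as leaves followed by its child nodes, the same layout that `makeNode` produces.
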
An example tree:

> example = unlines [
>     "lorem ipsum",
>     "    dolor",
>     "    sit amet",
>     "    consectetur",
>     "        adipiscing elit dapibus",
>     "    sodales",
>     "urna",
>     "    facilisis"
>   ]

The result:

% runhaskell parseIndentedTree.lhs
lorem ipsum 
    dolor 
    sit amet 
    consectetur 
        adipiscing elit dapibus 
    sodales 
urna 
    facilisis 
