oocdoc, Part 3 — parsing

👋 This page was last updated ~12 years ago. Just so you know.

Earlier in this series, I gave brummi a go. However, we've seen that it still doesn't fit our requirements: we need a tool that's fast, easy to install and configure, and that produces beautiful and usable docs.

Yesterday I started building my own documentation generator, and in this series I'll present the challenges I face and how I solve them. This might show a few ooc tricks, perhaps some software design, some good, some bad, but overall I hope it'll be a good read!

Basic application skeleton

I decided to name my tool hopemage after a misspelling of homepage. It has only a handful of Google results and doesn't trigger auto-correct, so I figured it's as good a name as any!

To stub the project, you can use a very nice tool called llamize. It'll create a folder for you and populate it with a .gitignore file, a pre-filled .use file, source and samples folders, and even a basic Travis config file for Continuous Integration.

After running it with llamize hopemage, here's what we get:

├── README.md
├── hopemage.use
├── samples
└── source
    └── hopemage

Time to fill the README and .use file with some basic info about the project, and push it to GitHub.

I don't want to have to type hopemage every time I launch the tool, so the command-line tool will be named homa. For that, I'm creating the file source/hopemage/homa.ooc, with a basic main function in it:

// sdk stuff
import structs/ArrayList

// our stuff
use hopemage

main: func (args: ArrayList<String>) {
    app := Homa new()
    app handle(args)
}

Homa: class {

    versionString := "0.1"
    
    init: func

    handle: func (args: ArrayList<String>) {
        if (args size > 1) {
            parse(args[1])
        } else {
            usage()
            exit(0)
        }
    }

    usage: func {
        "homa v%s" printfln(versionString)
        "Usage: homa FILE" println()
    }

    parse: func (path: String) {
        // TODO: fill in later
    }

}

This is pretty much your stereotypical 'ready to be scaled up' hello world program. We have an application class, Homa, that will handle the various operations our program can perform.

A simple version string is enough for now, no need to dedicate a whole class to it. The main prototype accepts an ArrayList of strings, which is handier than, say, (argc: Int, argv: CString*).

So that rock knows which .ooc file contains the main program, let's pimp our .use file a bit:

Name: hopemage
Version: 0.1
Description: Generates documentation when it feels like it.
SourcePath: source
Main: hopemage/homa

The important part, of course, is the Main directive. Now we can simply launch rock, the ooc compiler, from our project folder. If you've never compiled an ooc program, here's how it looks on my OSX box:

$ rock
Build order: [hopemage, nagaqueen, sdk]
[ OK ]

$ ./homa
homa v0.1
Usage: homa FILE

Not too hard, eh? Let's move on to more serious stuff.

Parsing ooc code

The tool we're making is not language-agnostic. It should have knowledge of the code's structure, and of the types and functions defined there, so it can generate an index, and so on.

However, we're not going to write an ooc parser by hand. I've done that stuff too much in my youth: at 23, I reckon I like to cut me some slack. Instead, we're going to use nagaqueen.

Nagaqueen is a peg/leg grammar for ooc that is meant to be used with greg. Greg is a fork of _why's fork of Ian Piumarta's original peg/leg utility. It generates C code for a parser from a Parsing Expression Grammar.

Since there's a C file that needs to be added to our project, we'll need to be smart about it. Turns out there's a very nice way to do this without resorting to Makefiles or similar pagan rituals.

A few clones later, with greg set up and nagaqueen in my Dev directory, I'm able to do this:

$ cd hopemage

$ mkdir nagaqueen-packed

$ greg ../nagaqueen/grammar/nagaqueen.leg > nagaqueen-packed/nagaqueen.c

And now, we just have to add this to our .use file for the C file to be compiled in with our project:

Requires: nagaqueen
Additionals: nagaqueen-packed/nagaqueen.c
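
Putting the two snippets together, the whole hopemage.use should now look something like this (the order of the directives is just my own choice here, everything else comes straight from above):

```
Name: hopemage
Version: 0.1
Description: Generates documentation when it feels like it.
SourcePath: source
Main: hopemage/homa
Requires: nagaqueen
Additionals: nagaqueen-packed/nagaqueen.c
```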

As we are parsing .ooc files, we need some sort of AST to store their structure. Let's do this quickly, in source/hopemage/ast.ooc:

// sdk stuff
import structs/ArrayList

Module: class {

    types := ArrayList<Type> new()

    init: func

}

Type: class {

    name: String
    doc: Doc

    init: func (=name, =doc) {}

}

Doc: class {

    raw: String

    init: func (=raw)

    parse: static func (input: String) -> This {
        This new(input)
    }

}

Nothing too exciting here, our data structures are pretty dumb, but that'll do for now. Next up, we want a Frontend class to handle the parsing itself.

Nagaqueen comes in the form of an ooc library that is quite easy to use: all you need to do is have a class that extends OocListener, and override whichever callbacks you want to use.

You can think of nagaqueen as similar to SAX, i.e. an event-driven parser. When it encounters elements, it calls functions, and it's up to the listener (in our case, Frontend) to make sense of them and either build an AST, or process the data directly.
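
To make the event-driven idea a bit more concrete, here's a minimal sketch that doesn't involve nagaqueen at all. Listener, NameCollector and fakeParse are names I made up for illustration, and fakeParse just fires callbacks by hand where a real parser would fire them as it reads the file:

```ooc
// sdk stuff
import structs/ArrayList

// Hypothetical listener, for illustration only:
// this is NOT nagaqueen's actual API.
Listener: class {

    init: func

    // default callbacks do nothing
    onClassStart: func (name: String) {}
    onClassEnd: func {}

}

// Collects class names as the events arrive
NameCollector: class extends Listener {

    names := ArrayList<String> new()

    init: func

    onClassStart: func (name: String) {
        names add(name)
    }

}

// Stand-in for a real parser: it only fires events, in order
fakeParse: func (listener: Listener) {
    listener onClassStart("DNS")
    listener onClassEnd()
    listener onClassStart("HostInfo")
    listener onClassEnd()
}

main: func {
    collector := NameCollector new()
    fakeParse(collector)

    for (name in collector names) {
        name println()
    }
}
```

The listener never sees the file as a whole: it only reacts to events and decides what to keep, which is the job our Frontend takes on.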

Here's how a basic Frontend class could look, in source/hopemage/frontend.ooc:

// third-party stuff
use nagaqueen
import nagaqueen/[OocListener]

// our stuff
import hopemage/[ast]

Frontend: class extends OocListener {

    module: Module

    init: func {
        module = Module new()
    }

    strict?: func -> Bool {
        false
    }

    onClassStart: func (name, doc: CString) {
        type := Type new(name toString(), Doc parse(doc toString()))
        module types add(type)
    }

}

We have to use nagaqueen explicitly because, well, we're using it in this file. We have to do it even though it's listed in hopemage.use, because requirements in .use files are only used by tools like sam for package management.

The square brackets in our import directives are there in case we want to import several classes from the same package. We'll later be able to simply add them separated by commas, like so: import hopemage/[ast, module2, module3].

The reason we override the strict? method is that, by default, nagaqueen operates in strict mode, where every callback that isn't overridden is treated as an error: it throws an Exception and prevents the rest of the file from being parsed, even if it's caught. This is useful when implementing a compiler, for example, but in our case we'll ignore most of the callbacks and concentrate on type declarations and function declarations.
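
If strict mode sounds a bit abstract, here's roughly the idea. This is a sketch under my own assumptions, not nagaqueen's actual source: BaseListener is a hypothetical stand-in for OocListener, LenientListener plays the role of our Frontend, and the exception message is made up:

```ooc
// Hypothetical stand-in for OocListener, for illustration only
BaseListener: class {

    init: func

    // strict by default
    strict?: func -> Bool {
        true
    }

    // a callback nobody has overridden complains in strict mode
    onClassStart: func (name: String) {
        if (strict?()) {
            Exception new("Unhandled callback: onClassStart") throw()
        }
    }

}

// Opts out of strict mode, so unhandled callbacks are silently ignored
LenientListener: class extends BaseListener {

    init: func

    strict?: func -> Bool {
        false
    }

}

main: func {
    listener := LenientListener new()
    // no exception is thrown here, since strict?() returns false
    listener onClassStart("DNS")
    "still alive" println()
}
```

Swap LenientListener for BaseListener in main and the same call throws instead, which is what you'd want from a compiler that must not silently skip parts of the grammar.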

The only callback we're overriding is onClassStart (a complete list can be looked up in nagaqueen's source). This handy function tells us the name and doc string of any class declared in the file. We then simply add it to the list of types contained in Module, which is defined in our AST.

Now is a good time to fill out the parse method we left empty in the first place, in Homa:

// (snip)
import hopemage/[frontend, ast]

Homa: class {

    // (snip)

    parse: func (path: String) {
        frontend := Frontend new()
        frontend parse(path)

        module := frontend module

        for (t in module types) {
            "## %s\n\n'''%s'''\n\n" printfln(t name, t doc raw)
        }
    }
}

So, does it work?

To find out, we can simply recompile our application (we might need to run rock -x to clean up temporary files, as we've added a non-trivial C dependency), and launch it against an ooc file:

$ ./homa ~/Dev/rock/sdk/net/DNS.ooc
## DNS

'''
   Allows DNS lookups and reserve lookups
 '''


## HostInfo

'''
   Information about an host, ie. its name and different addresses
 '''

As you can see, it works perfectly - we have class names, and the associated doc strings. We don't do any sort of parsing on the doc strings themselves yet, it's just some raw text - but that's for the next article!

I hope you enjoy this series. Please tell me if I went over some things too quickly, and I'll gladly include additional information in these articles.
