oocdoc, Part 3 — parsing
In the previous article, I gave brummi a go. However, we've seen that it still doesn't fit our requirements: we need a tool that's fast, easy to install and configure, and produces beautiful and usable docs.
Yesterday I started building my own documentation generator, and in this series I'll present the challenges I faced and how I solved them. This might show a few ooc tricks, perhaps some software design, some good, some bad, but overall I hope it'll be a good read!
Basic application skeleton
I decided to name my tool hopemage, after a misspelling of homepage.
It has only a handful of Google results and doesn't trigger auto-correct,
so I figured it's as good a name as any!
To stub the project, you can use a very nice tool called llamize. It creates a folder for you and populates it with a .gitignore file, a pre-filled .use file, source and samples folders, and even a basic Travis config file for Continuous Integration.
After running it with llamize hopemage, here's what we get:
├── README.md
├── hopemage.use
├── samples
└── source
    └── hopemage
Time to fill the README and .use file with some basic info about the project, and push it to GitHub.
I don't want to have to type hopemage every time I launch the tool.
So, the command-line tool will be named homa. For that, I'm creating
the file source/hopemage/homa.ooc, with a basic main function in it:
// sdk stuff
import structs/ArrayList

// our stuff
use hopemage

main: func (args: ArrayList<String>) {
    app := Homa new()
    app handle(args)
}

Homa: class {
    versionString := "0.1"

    init: func

    handle: func (args: ArrayList<String>) {
        if (args size > 1) {
            parse(args[1])
        } else {
            usage()
            exit(0)
        }
    }

    usage: func {
        "homa v%s" printfln(versionString)
        "Usage: homa FILE" println()
    }

    parse: func (path: String) {
        // TODO: fill in later
    }
}
This is pretty much your stereotypical 'ready to be scaled up'
hello world program. We have an application class, Homa, that
will handle the various operations our program can perform.
A simple version string is enough for now, no need to dedicate
a whole class to it. The main prototype accepts an ArrayList of
strings, which is handier than, say, (argc: Int, argv: CString*).
So that rock knows which .ooc file contains the main program, let's pimp our .use file a bit:
Name: hopemage
Version: 0.1
Description: Generates documentation when it feels like it.
SourcePath: source
Main: hopemage/homa
The important part, of course, is the Main directive here. Now we can
simply launch rock, the ooc compiler, from our project folder. If
you've never compiled an ooc program, here's how it looks on my OSX box:
$ rock
Build order: [hopemage, nagaqueen, sdk]
[ OK ]
$ ./homa
homa v0.1
Usage: homa FILE
Not too hard, eh? Let's move on to more serious stuff.
Parsing ooc code
The tool we're making is not language-agnostic. It should have knowledge of the code's structure, the types and functions defined there, so it can interpret that structure, generate an index, and so on.
However, we're not going to write an ooc parser by hand. I've done that stuff too much in my youth: at 23, I reckon I like to cut me some slack. Instead, we're going to use nagaqueen.
Nagaqueen is a peg/leg grammar for ooc that is meant to be used with greg. Greg is a fork of _why's fork of Ian Piumarta's original peg/leg utility. It generates C code for a parser from a Parsing Expression Grammar.
Since there's a C file that needs to be added to our project, we'll need to be smart about it. Turns out there's a very nice way to do this without resorting to Makefiles or similar pagan rituals.
A few clones later, with greg set up and nagaqueen in my Dev directory, I'm able to do this:
$ cd hopemage
$ mkdir nagaqueen-packed
$ greg ../nagaqueen/grammar/nagaqueen.leg > nagaqueen-packed/nagaqueen.c
And now, we just have to add this to our .use file for the C file to be compiled in with our project:
Requires: nagaqueen
Additionals: nagaqueen-packed/nagaqueen.c
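Put together with the directives from earlier, the whole hopemage.use file would then look roughly like this. The ordering of the lines is my assumption; only the directives themselves come from the snippets above:

```
Name: hopemage
Version: 0.1
Description: Generates documentation when it feels like it.
SourcePath: source
Main: hopemage/homa
Requires: nagaqueen
Additionals: nagaqueen-packed/nagaqueen.c
```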
As we are parsing .ooc files, we need some sort of AST
to store their structure. Let's do this quickly, in
source/hopemage/ast.ooc:
// sdk stuff
import structs/ArrayList

Module: class {
    types := ArrayList<Type> new()

    init: func
}

Type: class {
    name: String
    doc: Doc

    init: func (=name, =doc) {}
}

Doc: class {
    raw: String

    init: func (=raw)

    parse: static func (input: String) -> This {
        This new(input)
    }
}
Nothing too exciting here, our data structures are pretty
dumb, but that'll do for now. Next up, we want a Frontend
class to handle the parsing itself.
Nagaqueen comes in the form of an ooc library that is quite
easy to use: all you need to do is have a class that extends
OocListener, and override whichever callbacks you want
to use.
You can think of nagaqueen as similar to SAX, i.e. an event-driven
parser. When it encounters elements, it calls functions, and
it's up to the listener (in our case, Frontend) to make sense
of it and either build an AST, or process the data directly.
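To make the event-driven idea concrete before bringing nagaqueen in, here is a minimal, self-contained sketch. ToyListener, ToyParser and NamePrinter are made-up names for illustration only, not part of nagaqueen, and the "parser" merely walks a list of names and fires one callback per name:

```ooc
import structs/ArrayList

// Hypothetical listener: one callback per kind of element.
ToyListener: abstract class {
    onClassStart: abstract func (name: String)
}

// Hypothetical stand-in for a parser: it only walks a list of
// names and reports each one to the listener.
ToyParser: class {
    init: func

    parse: func (names: ArrayList<String>, listener: ToyListener) {
        for (name in names) {
            listener onClassStart(name)
        }
    }
}

// A concrete listener decides what to do with each event.
NamePrinter: class extends ToyListener {
    init: func

    onClassStart: func (name: String) {
        "found class %s" printfln(name)
    }
}

main: func {
    names := ArrayList<String> new()
    names add("DNS")
    names add("HostInfo")

    ToyParser new() parse(names, NamePrinter new())
}
```

The parser never knows what the listener does with an event, which is why the same grammar can drive a compiler or a documentation tool.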
Here's how a basic Frontend class could look, in source/hopemage/frontend.ooc:
// third-party stuff
use nagaqueen
import nagaqueen/[OocListener]

// our stuff
import hopemage/[ast]

Frontend: class extends OocListener {
    module: Module

    init: func {
        module = Module new()
    }

    strict?: func -> Bool {
        false
    }

    onClassStart: func (name, doc: CString) {
        type := Type new(name toString(), Doc parse(doc toString()))
        module types add(type)
    }
}
The reason we have to use nagaqueen explicitly is because, well,
we're using it in this file. But we have to do it even though it's
listed in hopemage.use, because requirements in .use files are
only used by tools like sam for package management.
The square brackets in our import directives are there in case we
want to import several classes from the same package. We'll later
be able to simply add them separated by commas, like so:
import hopemage/[ast, module2, module3].
The reason we override the strict? method is that, by default,
nagaqueen operates in strict mode, where every callback that isn't
overridden is treated as an error: it throws an Exception and prevents
the rest of the file from being parsed, even if it's caught. This is
useful when implementing a compiler, for example, but in our case we'll
ignore most of the callbacks and concentrate on type declarations
and function declarations.
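As a rough illustration of what such a strict switch amounts to, here is a small self-contained sketch. It is not nagaqueen's actual source: BaseListener, LenientListener, unhandled and the String-only callback signatures are all assumptions made up for this example.

```ooc
// Hypothetical base class: every default callback reports itself
// as unhandled, and strict?() decides whether that is fatal.
BaseListener: class {
    init: func

    strict?: func -> Bool {
        true
    }

    onClassStart: func (name: String) {
        unhandled("onClassStart")
    }

    onFunctionStart: func (name: String) {
        unhandled("onFunctionStart")
    }

    unhandled: func (callback: String) {
        if (strict?()) {
            Exception new("Unimplemented callback: " + callback) throw()
        }
    }
}

// Hypothetical listener that only cares about classes.
LenientListener: class extends BaseListener {
    init: func

    strict?: func -> Bool {
        false
    }

    onClassStart: func (name: String) {
        "class %s" printfln(name)
    }
}

main: func {
    listener := LenientListener new()
    listener onClassStart("DNS")        // handled by our own callback
    listener onFunctionStart("resolve") // ignored, since strict?() is false
}
```

With strict?() left at its default of true, the second call in this sketch would throw instead, which is the behaviour a compiler wants and a documentation tool does not.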
The only callback we're overriding is onClassStart (a complete
list can be looked up in nagaqueen's source). This
handy function tells us the name and doc string of any class declared
in the file. Finally, we simply add it to a list of types contained
in Module, which is defined in our AST.
Now is a good time to fill out the parse method we left empty
in the first place, in Homa:
// (snip)
import hopemage/[frontend, ast]

Homa: class {
    // (snip)

    parse: func (path: String) {
        frontend := Frontend new()
        frontend parse(path)

        module := frontend module
        for (t in module types) {
            "## %s\n\n'''%s'''\n\n" printfln(t name, t doc raw)
        }
    }
}
So, does it work?
To test if it works, we can simply recompile our application (we
might need to run rock -x to clean up temporary files, as we've
added a non-trivial C dependency), and launch it against an ooc
file:
$ ./homa ~/Dev/rock/sdk/net/DNS.ooc
## DNS
'''
Allows DNS lookups and reserve lookups
'''
## HostInfo
'''
Information about an host, ie. its name and different addresses
'''
As you can see, it works perfectly - we have class names, and the associated doc strings. We don't do any sort of parsing on the doc strings themselves yet, it's just some raw text - but that's for the next article!
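If you'd rather try it on something smaller than the SDK, a tiny input file is enough. The file below is a hypothetical sample (the name samples/greeter.ooc and the Greeter class are made up here), and it assumes doc comments written in the /** ... */ form:

```ooc
/**
 * Greets people by name.
 */
Greeter: class {
    init: func

    greet: func (name: String) {
        "Hello, %s!" printfln(name)
    }
}
```

Running ./homa samples/greeter.ooc should then report the Greeter class along with whatever raw text nagaqueen hands over for its doc comment.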
I hope you enjoy this series, please tell me if I went over some things too quickly, I'll gladly include additional information in these articles.