orgmode

Kotlin Multiplatform library for parsing, formatting, and working with Org Mode files and content.

The parser is built from parser combinators, which you can compose to create new parsers or variations of existing ones. Every parser returns an element of type OrgElem that also tracks the tokens (Token) it was built from, giving you a full source map.
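To make the combinator idea concrete, here is a minimal, self-contained sketch. The types `Token`, `ParsingResult`, and `Parser` below are simplified, hypothetical stand-ins (not the library's actual definitions), and `matchText`, `either`, and `map` are illustrative combinators. Each successful result carries the tokens it consumed, which is the same idea that lets an OrgElem provide a source map.

```kotlin
// Simplified, hypothetical types; the library's Token, ParsingResult, and
// parser signatures differ in detail.
data class Token(val text: String)

sealed class ParsingResult<out T> {
    // output: the parsed value; consumed: the tokens it was built from;
    // nextPos: where the next parser should continue.
    data class Success<T>(val output: T, val consumed: List<Token>, val nextPos: Int) : ParsingResult<T>()
    data class Failure(val message: String) : ParsingResult<Nothing>()
}

// A parser reads from a token list at a position and returns a result.
typealias Parser<T> = (tokens: List<Token>, pos: Int) -> ParsingResult<T>

// Matches exactly one token whose text equals `text`.
fun matchText(text: String): Parser<Token> = { tokens, pos ->
    val tok = tokens.getOrNull(pos)
    if (tok != null && tok.text == text) ParsingResult.Success(tok, listOf(tok), pos + 1)
    else ParsingResult.Failure("expected '$text' at position $pos")
}

// Tries `first`; if it fails, tries `second` at the same position.
fun <T> either(first: Parser<T>, second: Parser<T>): Parser<T> = { tokens, pos ->
    val r = first(tokens, pos)
    if (r is ParsingResult.Success) r else second(tokens, pos)
}

// Transforms a parser's output while keeping the consumed tokens (the source map).
fun <A, B> Parser<A>.map(f: (A) -> B): Parser<B> = { tokens, pos ->
    when (val r = this.invoke(tokens, pos)) {
        is ParsingResult.Success -> ParsingResult.Success(f(r.output), r.consumed, r.nextPos)
        is ParsingResult.Failure -> r
    }
}
```

For example, `either(matchText("[X]"), matchText("[ ]")).map { it.text == "[X]" }` builds a checkbox parser out of two smaller ones without writing any new matching logic, and the result still knows which token it came from.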

Installation

The library is available on Maven Central. Currently only the common, Android, iOS, and JVM artifacts are published; Linux and web builds will follow in later versions.

Add the equivalent of the following to your Gradle build script (build.gradle.kts):

// Replace with the library version you want to use. Inside build.gradle.kts a
// bare `$version` would resolve to your own project's version instead.
val orgmodeVersion = "<version>"

repositories {
    mavenCentral()
}

kotlin {
    sourceSets {
        commonMain {
            dependencies {
                implementation("xyz.lepisma:orgmode:$orgmodeVersion")
            }
        }
        androidMain {
            dependencies {
                implementation("xyz.lepisma:orgmode-android:$orgmodeVersion")
            }
        }
        jvmMain {
            dependencies {
                implementation("xyz.lepisma:orgmode-jvm:$orgmodeVersion")
            }
        }
        iosX64Main {
            dependencies {
                implementation("xyz.lepisma:orgmode-iosx64:$orgmodeVersion")
            }
        }
        iosArm64Main {
            dependencies {
                implementation("xyz.lepisma:orgmode-iosarm64:$orgmodeVersion")
            }
        }
        iosSimulatorArm64Main {
            dependencies {
                implementation("xyz.lepisma:orgmode-iossimulatorarm64:$orgmodeVersion")
            }
        }
    }
}

Usage

There are two stages for using the parser:

Lexing converts an org string into a list of xyz.lepisma.orgmode.Token values.
Parsing converts a list of tokens into an org element.
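The two stages can be illustrated with a toy pipeline. This is a hypothetical sketch, not OrgLexer or the library's parsers: `lex` splits a string into word, space, and newline tokens, and `parseParagraphs` groups those tokens into one element per line while keeping the tokens each element was built from.

```kotlin
// Toy, hypothetical token kinds; the library's Token types are richer.
sealed class Tok {
    abstract val text: String
    data class Word(override val text: String) : Tok()
    data class Space(override val text: String) : Tok()
    data class Newline(override val text: String = "\n") : Tok()
}

// Stage 1 (lexing): turn the raw string into a flat list of tokens.
fun lex(input: String): List<Tok> {
    val out = mutableListOf<Tok>()
    var i = 0
    while (i < input.length) {
        val start = i
        when (input[i]) {
            '\n' -> { i++; out.add(Tok.Newline()) }
            ' ', '\t' -> {
                while (i < input.length && (input[i] == ' ' || input[i] == '\t')) i++
                out.add(Tok.Space(input.substring(start, i)))
            }
            else -> {
                while (i < input.length && input[i] !in " \t\n") i++
                out.add(Tok.Word(input.substring(start, i)))
            }
        }
    }
    return out
}

// Stage 2 (parsing): group tokens into elements, here one Paragraph per line,
// keeping the tokens each element was built from.
data class Paragraph(val words: List<String>, val tokens: List<Tok>)

fun toParagraph(tokens: List<Tok>) =
    Paragraph(tokens.filterIsInstance<Tok.Word>().map { it.text }, tokens)

fun parseParagraphs(tokens: List<Tok>): List<Paragraph> {
    val result = mutableListOf<Paragraph>()
    var current = mutableListOf<Tok>()
    for (tok in tokens) {
        current.add(tok)
        if (tok is Tok.Newline) {
            result.add(toParagraph(current))
            current = mutableListOf()
        }
    }
    if (current.isNotEmpty()) result.add(toParagraph(current))
    return result
}
```

Usage: `parseParagraphs(lex("hello org\nworld"))` yields two paragraphs, with words `["hello", "org"]` and `["world"]`.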

The root element of a full document is OrgDocument, which can be parsed like this:

val tokens: List<Token> = OrgLexer(orgString).tokenize()
val document: OrgDocument? = parse(tokens)

Most OrgElem types have their own parser that can be invoked on the tokens making up that element, and the parser combinators in xyz.lepisma.orgmode.core help you compose them. As an example, here is how to parse a line containing an org link:

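import xyz.lepisma.orgmode.lexer.OrgLexer
import xyz.lepisma.orgmode.core.seq
import xyz.lepisma.orgmode.core.matchSOF
import xyz.lepisma.orgmode.core.matchEOF
import xyz.lepisma.orgmode.core.ParsingResult // package assumed; adjust if ParsingResult lives elsewhere
import xyz.lepisma.orgmode.parseOrgLine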

val text = "this is [[attachment:hello world.pdf]]"
val tokens = OrgLexer(text).tokenize()
val parser = seq(matchSOF, parseOrgLine, matchEOF)

// Output is a triple with items matching SOF OrgToken, OrgLine, and EOF OrgToken
val line = (parser.invoke(tokens, 0) as ParsingResult.Success).output.second

// line.items.size shouldBe 5
// (line.items.last() is OrgInlineElem.Link) shouldBe true
// (line.items.last() as OrgInlineElem.Link).title shouldBe null
// (line.items.last() as OrgInlineElem.Link).target shouldBe "hello world.pdf"
// (line.items.last() as OrgInlineElem.Link).type shouldBe "attachment"
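The way `seq(matchSOF, parseOrgLine, matchEOF)` yields a triple can be sketched with a generic three-way sequencing combinator. Everything here (`Tok`, `ParseResult`, `P`, `matchMarker`, and the toy `parseLine`) is a hypothetical simplification rather than the library's code: the combinator runs three parsers one after another, fails if any of them fails, and otherwise packs their outputs into a `Triple`, so `.second` picks out the middle result.

```kotlin
// Hypothetical, simplified types; not the library's Token or ParsingResult.
sealed class Tok {
    object SOF : Tok()
    object EOF : Tok()
    data class Text(val text: String) : Tok()
}

sealed class ParseResult<out T> {
    data class Success<T>(val output: T, val next: Int) : ParseResult<T>()
    object Failure : ParseResult<Nothing>()
}

typealias P<T> = (List<Tok>, Int) -> ParseResult<T>

// Runs pa, pb, pc in order; fails if any fails, else returns all three outputs.
fun <A, B, C> seq(pa: P<A>, pb: P<B>, pc: P<C>): P<Triple<A, B, C>> =
    fun(tokens: List<Tok>, pos: Int): ParseResult<Triple<A, B, C>> {
        val a = pa(tokens, pos) as? ParseResult.Success<A> ?: return ParseResult.Failure
        val b = pb(tokens, a.next) as? ParseResult.Success<B> ?: return ParseResult.Failure
        val c = pc(tokens, b.next) as? ParseResult.Success<C> ?: return ParseResult.Failure
        return ParseResult.Success(Triple(a.output, b.output, c.output), c.next)
    }

// Matches a single marker token such as SOF or EOF.
fun matchMarker(marker: Tok): P<Tok> = { tokens, pos ->
    if (tokens.getOrNull(pos) == marker) ParseResult.Success(marker, pos + 1)
    else ParseResult.Failure
}

val matchSOF = matchMarker(Tok.SOF)
val matchEOF = matchMarker(Tok.EOF)

// Toy "line" parser: collects consecutive Text tokens (possibly none).
val parseLine: P<List<String>> = { tokens, pos ->
    val words = tokens.drop(pos).takeWhile { it is Tok.Text }.map { (it as Tok.Text).text }
    ParseResult.Success(words, pos + words.size)
}
```

Because `matchEOF` must succeed after the line, the whole sequence fails when anything trails the line, which is why anchoring a parser between SOF and EOF checks that it consumed the entire input.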

You can see a work-in-progress visualization of the parse tree structure here.

Going back

To go from an element back to tokens, use the element's tokens property (the example below uses unparse for this step). To go further from tokens to the raw string, call inverseLex on the list of tokens.

// The following will be true in all cases
val reconstructedString = inverseLex(unparse(parse(OrgLexer(orgString).tokenize())!!))
reconstructedString shouldBe orgString
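The round-trip guarantee holds when every token keeps the exact slice of source text it was lexed from, so that inverse lexing is just concatenation. The sketch below assumes that design; `SourceToken`, `lexLossless`, and `inverseLexSketch` are hypothetical names, not the library's OrgLexer or inverseLex.

```kotlin
// Every token stores the exact source text it covers, including whitespace.
data class SourceToken(val kind: String, val text: String)

// Splits input into runs of same-kind characters: whitespace, '*' markers, or other text.
fun lexLossless(input: String): List<SourceToken> {
    fun kindOf(c: Char) = when {
        c.isWhitespace() -> "ws"
        c == '*' -> "star"
        else -> "text"
    }
    val tokens = mutableListOf<SourceToken>()
    var start = 0
    for (i in 1..input.length) {
        // Close the current run at the end of input or when the character kind changes.
        if (i == input.length || kindOf(input[i]) != kindOf(input[start])) {
            tokens.add(SourceToken(kindOf(input[start]), input.substring(start, i)))
            start = i
        }
    }
    return tokens
}

// Inverse lexing is concatenation, since no source text was dropped.
fun inverseLexSketch(tokens: List<SourceToken>): String = tokens.joinToString("") { it.text }
```

The design choice that matters is that whitespace and markup characters become tokens too, rather than being skipped; any lexer that discards input can no longer guarantee this property.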

Supported Org Mode Features

The parser is not complete yet, but the following elements are currently supported (the lexer supports whatever the parser needs):

Properties blocks, both at document level and under section headings.
Document preamble, tags, configs, etc. Support here is incomplete, but basic items like the title work.
Ordered, unordered, nested lists with checkboxes.
Inside org text, only links, datetime stamps, and datetime ranges are supported. Other markup support is on the way.
#hashtags and #hashmetric(value). These are not standard Org Mode features, but I find them helpful.
Horizontal rules
Source, Quote, and Verse blocks
Page Intro, Edits, and Aside blocks. These are additional features not present in Org Mode.
Headings with tags, planning info, TODO states, and priorities
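Two of the line-level syntaxes in the list above, headings and the non-standard #hashtag / #hashmetric(value) markers, can be recognized with regular expressions for illustration. This sketch is an assumption about the syntax, not how the library's lexer and parser handle them: it only knows the fixed keywords TODO and DONE, single-letter priorities like [#A], and colon-delimited tags, and it requires hashtag names to start with a letter. Planning info is not covered.

```kotlin
// Hypothetical regex-based recognizers for illustration only; the library uses
// its lexer and parser combinators instead, and its exact rules may differ.

data class Heading(
    val level: Int,          // number of leading stars
    val todo: String?,       // TODO state, if any (only TODO/DONE are assumed here)
    val priority: Char?,     // e.g. 'A' for [#A]
    val title: String,
    val tags: List<String>,  // e.g. ["work", "urgent"] for :work:urgent:
)

val headingPattern =
    Regex("""(\*+)\s+(?:(TODO|DONE)\s+)?(?:\[#([A-Z])\]\s+)?(.*?)(?:\s+:([\w@#%:]+):)?\s*""")

fun parseHeading(line: String): Heading? {
    val m = headingPattern.matchEntire(line) ?: return null
    // Groups that did not participate in the match come back as empty strings.
    val (stars, todo, priority, title, tags) = m.destructured
    return Heading(
        level = stars.length,
        todo = todo.ifEmpty { null },
        priority = priority.firstOrNull(),
        title = title,
        tags = if (tags.isEmpty()) emptyList() else tags.split(":"),
    )
}

// #name or #name(value); the name must start with a letter and must not follow
// a word character or '#', so "issue#12" is not treated as a hashtag.
data class Hash(val name: String, val value: String?)

val hashPattern = Regex("""(?<![\w#])#([A-Za-z][\w-]*)(?:\(([^()]*)\))?""")

fun findHashes(text: String): List<Hash> =
    hashPattern.findAll(text)
        .map { m -> Hash(m.groupValues[1], m.groups[2]?.value) }
        .toList()
```

Usage: `parseHeading("** TODO [#A] Write docs :work:urgent:")` gives a level-2 heading with state TODO, priority A, title "Write docs", and tags work and urgent, while `findHashes("Ran #distance(5km) today")` finds one hashmetric named distance with value 5km.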

Notable missing items that will be added before calling this parser complete:

Tables
Inline markups like bold, italic, etc.
LaTeX elements
Custom TODO states

Other features like footnotes, citations, and anything else missing will be added as needed.

Combinators

The set of parser combinators is relatively complete and can be found in the xyz.lepisma.orgmode.core package.

Development

Development documentation will be added once the library has stabilized. Until then, check out the source code.

Packages

xyz.lepisma.orgmode (common)
xyz.lepisma.orgmode.core (common)
xyz.lepisma.orgmode.lexer (common)