pico8parse

A Lua parser written in JavaScript, with support for the PICO-8 flavour.
Luaparse was originally written by Oskar Schöldström for his bachelor's thesis at Arcada.

Support for the PICO-8 Flavour of Lua

PICO-8 Lua is based on version 5.2 of the Lua language; as such, most 5.2 syntax and behavior is present in PICO-8 Lua.

The values added for the parser luaVersion option are:

If the targeted version is not in this list, the closest preceding version should accept the same syntax.

TL;DR: PICO-8 Lua differs from standard Lua because it is preprocessed; for example, the '?' "operator" is simply replaced with ' print('. The way this is done introduces many cursed syntaxes (and I don't want to maintain a list of them). To see the result of this preprocessing, you can use the following trick (until it gets fixed) with a cartridge containing exactly:

pico-8 cartridge

__lua__
printh[=[
#include file_with_code_to_inspect
]=]

Miscellaneous

Not Covered / Inaccuracies

These changes (specific to PICO-8's Lua) have been considered out of scope:

Differences from the official interpreter for PICO-8 version 0.2.1:

Comments

PICO-8 Lua reads // as a single-line comment (as in C-inspired languages). Note that this is not a simple alias for --, as it will not start a long-string comment:

// comment

//[[ comment
  not comment,
  but syntax error
]]

Another change to comments: long-string comments cannot use = signs to set their level (depth):

--[==[ comment
  not comment but syntax error
  (no MD highlight to show that)
]==]

This syntax is kept for actual strings:

local text = [==[hey
how's it going?]==]
print(text)
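The comment rules above can be sketched as a single scanner step. The helper below is hypothetical (commentEnd is not part of pico8parse's API) and reads the two examples above as meaning that both //[[ and --[==[ only open a single-line comment, so the lines after them are parsed as code.

```javascript
// Hypothetical sketch: given `source` and the index of a comment opener,
// return the index just past the comment, or -1 for an unterminated long
// comment. Only '--[[' opens a long comment here: '//' and leveled openers
// such as '--[==[' are treated as single-line comments, as in the examples.
function commentEnd(source, start) {
  const rest = source.slice(start);
  if (rest.startsWith('--[[')) {
    const close = source.indexOf(']]', start + 4);
    return close === -1 ? -1 : close + 2;
  }
  if (rest.startsWith('--') || rest.startsWith('//')) {
    const newline = source.indexOf('\n', start);
    return newline === -1 ? source.length : newline; // comment stops at EOL
  }
  throw new Error(`no comment starts at index ${start}`);
}
```

For instance, commentEnd('//[[ a\nb ]]', 0) stops at the first newline, leaving b ]] to be parsed (and rejected) as code.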

Numerals

Binary Literal

PICO-8 accepts binary notation for numerals, the same way Lua 5.2+ does for hexadecimal.
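As a sketch of what a binary literal evaluates to, the hypothetical parseBinaryLiteral below converts one to a JavaScript number. By analogy with Lua 5.2 hexadecimal numerals, it assumes an optional fractional part is allowed (0b11.01 for 3.25); it returns a plain JavaScript number and does not model PICO-8's fixed-point precision.

```javascript
// Hypothetical helper: value of a binary numeral such as "0b1010" or "0b11.01".
// Assumes an optional fractional part, like Lua hexadecimal numerals.
// Exponents and suffixes are not accepted; invalid input yields null.
function parseBinaryLiteral(text) {
  const m = /^0[bB]([01]*)(?:\.([01]*))?$/.exec(text);
  const fraction = m && m[2] !== undefined ? m[2] : '';
  if (!m || m[1] + fraction === '') return null; // at least one digit needed
  let value = 0;
  for (const d of m[1]) value = value * 2 + Number(d); // integer part
  let weight = 0.5;
  for (const d of fraction) {
    value += Number(d) * weight; // each fractional digit halves the weight
    weight /= 2;
  }
  return value;
}
```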

No Exponent, no Complex, no Unsigned nor Long

In some Lua flavours (such as LuaJIT), numerals can carry trailing "U", "I" and "L" suffixes, none of which is supported in PICO-8. Likewise, the exponential and binary exponential notations are treated as errors.
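A rough way to check which numerals remain acceptable is a single regular expression. The hypothetical isPico8Numeral below accepts decimal, hexadecimal and binary numerals with an optional fractional part and rejects exponents and suffixes; it is an approximation under those assumptions, not the grammar pico8parse implements.

```javascript
// Hypothetical approximation of the PICO-8 numeral forms discussed above.
const DECIMAL = /\d+\.?\d*|\.\d+/;                                     // 12, 1.5, .5, 3.
const HEXADECIMAL = /0[xX](?:[0-9a-fA-F]+\.?[0-9a-fA-F]*|\.[0-9a-fA-F]+)/; // 0x11.4
const BINARY = /0[bB](?:[01]+\.?[01]*|\.[01]+)/;                       // 0b11.01

// Anchored on both ends, so exponents (1e3, 0x1p4) and suffixes (1LL, 1i) fail.
const PICO8_NUMERAL = new RegExp(`^(?:${DECIMAL.source}|${HEXADECIMAL.source}|${BINARY.source})$`);

function isPico8Numeral(text) {
  return PICO8_NUMERAL.test(text);
}
```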

Operators

Unary Operators

The following are unary operators on a number (with the same precedence as -, ~ and not):

Binary Operators

Here is the set of bitwise operators (which operate on numbers):

Lua's integer division operator (usually written //) would start a single-line comment in PICO-8, so \ is used instead.

This flavour also implements != as an alias for ~=.

The shift and rotate operators have the same precedence as Lua's shift operators << and >>. Exclusive or (^^), not-equal (!=) and integer division (\) keep the precedence of their Lua counterparts (~, ~= and //).
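These precedence rules can be summarised as a lookup table built on the binary operator levels of the Lua 5.3 reference manual, with each PICO-8-only operator mapped to the Lua operator it shares a level with. LUA53_PRECEDENCE, PICO8_ALIASES and precedenceOf are hypothetical names, not pico8parse internals.

```javascript
// Lua 5.3 binary operator precedence, from lowest (1) to highest (12).
// Unary operators (not, #, -, ~) would sit at level 11, between '*' and '^'.
const LUA53_PRECEDENCE = {
  or: 1,
  and: 2,
  '<': 3, '>': 3, '<=': 3, '>=': 3, '~=': 3, '==': 3,
  '|': 4,
  '~': 5,
  '&': 6,
  '<<': 7, '>>': 7,
  '..': 8,
  '+': 9, '-': 9,
  '*': 10, '/': 10, '//': 10, '%': 10,
  '^': 12,
};

// Hypothetical mapping: PICO-8 operator -> Lua operator with the same level.
const PICO8_ALIASES = {
  '!=': '~=',  // not-equal
  '^^': '~',   // exclusive or
  '\\': '//',  // integer division (// is a comment in PICO-8)
  '>>>': '>>', // logical shift right
  '<<>': '<<', // rotate left
  '>><': '>>', // rotate right
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function precedenceOf(op) {
  const base = has(PICO8_ALIASES, op) ? PICO8_ALIASES[op] : op;
  if (!has(LUA53_PRECEDENCE, base)) throw new Error(`unknown binary operator: ${op}`);
  return LUA53_PRECEDENCE[base];
}
```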

Assignment Operations

Every binary operator that is neither a comparison nor a logical operator (and, or) has an assignment form. The full list is:

assignment  equivalent
----------  ---------------
a += b      a = a + b
a -= b      a = a - b
a *= b      a = a * b
a /= b      a = a / b
a \= b      a = a \ b
a %= b      a = a % b
a ^= b      a = a ^ b
a ..= b     a = a .. b
a |= b      a = a | b
a &= b      a = a & b
a ^^= b     a = a ^^ b
a <<= b     a = a << b
a >>= b     a = a >> b
a >>>= b    a = a >>> b
a <<>= b    a = a <<> b
a >><= b    a = a >>< b
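The table can be read as a purely textual rewrite. The hypothetical expandCompound below applies it to one statement, assuming a simple left-hand side (a name or a dotted field). It wraps the right-hand side in parentheses so that a *= b + c keeps the meaning a = a * (b + c), a detail the table leaves implicit.

```javascript
// Hypothetical helper: rewrite one compound assignment into its plain form.
// Assumes a simple target (name or dotted field); comparisons never match
// because none of them appears in the operator list.
const COMPOUND_OPERATORS = [
  '+', '-', '*', '/', '\\', '%', '^', '..', '|', '&', '^^', '<<', '>>', '>>>', '<<>', '>><',
];

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const COMPOUND_PATTERN = new RegExp(
  '^\\s*([A-Za-z_][\\w.]*)\\s*(' +
    COMPOUND_OPERATORS.map((op) => escapeRegExp(op + '=')).join('|') +
    ')\\s*(.+?)\\s*$'
);

function expandCompound(statement) {
  const m = COMPOUND_PATTERN.exec(statement);
  if (!m) return null; // not a compound assignment
  const [, target, assignOp, value] = m;
  const op = assignOp.slice(0, -1); // drop the trailing '='
  return `${target} = ${target} ${op} (${value})`;
}
```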

Note that to implement this syntax, a new type of AST node was added: 'AssignmentOperatorStatement'. It has the variables and init properties of an 'AssignmentStatement', as well as the operator property of a 'BinaryExpression' node.
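A consumer of the AST that only understands standard Lua nodes could lower this node type back to an 'AssignmentStatement'. The sketch below assumes luaparse-style node shapes, that operator holds the binary operator (such as '+'), and exactly one target and one value; desugarAssignmentOperator is a hypothetical name, not part of pico8parse.

```javascript
// Hypothetical lowering of an AssignmentOperatorStatement (not pico8parse code).
// Assumes `operator` is the binary operator ('+', '..', ...) and a single target.
function desugarAssignmentOperator(node) {
  if (node.type !== 'AssignmentOperatorStatement') return node;
  const [target] = node.variables;
  const [value] = node.init;
  return {
    type: 'AssignmentStatement',
    variables: [target],
    // The same target node is reused as the left operand.
    init: [{ type: 'BinaryExpression', operator: node.operator, left: target, right: value }],
  };
}
```

One caveat of this design: the target node appears twice in the result, so a target with side effects (t[f()] += 1) would evaluate f() twice if the lowered tree were compiled naively.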

Single Line Statements

Important notice: this feature gives meaning to new-line sequences, which standard Lua treats as blank characters and ignores.

Idea

This syntax makes it possible to write certain blocks without delimiting them with tokens, similarly to some other languages (for example, an if statement in JavaScript does not need curly braces when its body consists of a single statement).

if (condition) instruction1()
instruction2()

-- is semantically the same as

if condition then instruction1() end instruction2()

if (condition) instruction1() else instruction2()

-- is semantically the same as

if condition then instruction1() else instruction2() end

while (condition) instruction()

-- is semantically the same as

while condition do instruction() end

Notice how:

Quirks

While this change was initially meant to apply only to the if and while statements, its actual implementation in the PICO-8 Lua runtime leads to unexpectedly valid or invalid syntax, which also affects the standard do, function and for constructs, among other things.

Without any hard evidence of how PICO-8 actually parses its Lua, testing has shown that the actual implementation behaves much like applying the following preprocessing to the raw script:

If a line contains an if token (resp. while) that is not immediately followed by a then token (resp. do) after its condition, the very next new-line sequence (or, failing that, the end of file) is treated as an end token. This end token is in no way tied to the body block of the opened if .. then statement (resp. while .. do), which effectively means that any end of line can act as an end token.
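The rule above can be approximated line by line. The hypothetical expandShorthand below handles only the first if or while on a line, treats the first balanced parenthesized group as the condition, and ignores strings and comments; it models the described behavior under those assumptions and is not PICO-8's or pico8parse's implementation.

```javascript
// Hypothetical line-based model of the single-line if/while rule.
// Only the first if/while on the line is considered; strings and comments
// are not recognized.
function expandShorthand(line) {
  const m = /\b(if|while)\s*\(/.exec(line);
  if (!m) return line;
  const open = m.index + m[0].length - 1; // the '(' opening the condition
  let depth = 0;
  let close = -1;
  for (let i = open; i < line.length; i++) {
    if (line[i] === '(') depth++;
    else if (line[i] === ')' && --depth === 0) {
      close = i;
      break;
    }
  }
  if (close === -1) return line; // unbalanced parentheses: leave untouched
  const [, keyword] = m;
  const opener = keyword === 'if' ? 'then' : 'do';
  const rest = line.slice(close + 1);
  if (new RegExp(`^\\s*${opener}\\b`).test(rest)) return line; // regular form
  const condition = line.slice(open + 1, close);
  // The end of the line becomes an 'end' token.
  return `${line.slice(0, m.index)}${keyword} ${condition} ${opener}${rest} end`;
}
```

Applied to if (cdt) end do, it yields if cdt then end do end, matching the corresponding row of the table below.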

This gives rise to a bunch of cursed yet valid syntaxes (try to guess the scopes):

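PICO-8 Syntax                           | Result of Preprocess
----------------------------------------+--------------------------------------------
if (cdt) do                             | if cdt then do end
end                                     | end
----------------------------------------+--------------------------------------------
if (cdt) end do                         | if cdt then end do end
----------------------------------------+--------------------------------------------
if (cdt) end function fun()             | if cdt then end function fun() end
----------------------------------------+--------------------------------------------
-- incorrect!                           | -- indeed... weird
if (cdt1) end while (cdt2) instr() do   | if cdt1 then end while cdt2 instr() do end
                                        |
-- correct!                             | -- correct!
if (cdt1) end while (cdt2) do instr()   | if cdt1 then end while cdt2 do instr() end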

Additionally

This "single line" statement syntax also comes with an arbitrary rule: the character preceding the if (or while) token must be either:

-- invalid
instr()--[[]]if (cdt) instr()
instr()--[[]]--[[]]if (cdt) instr()
instr() --[[]]--[[]]if (cdt) instr()

-- valid
instr() --[[]]if (cdt) instr()
instr()--[[]] if (cdt) instr()
a = 0xf--[[]]if (cdt) instr()

This dependency on the preceding character/token was removed in version 0.2.4 (making every line above valid, even without comments).

Finally, note that the body of the clause cannot be empty. Or only sometimes: more specifically, a single-line if or while with no instruction following it (before the end of line) is not considered valid, but a single-line if containing an else clause (even an empty one) is valid.

-- invalid
while (cdt)
if (cdt)

-- valid
while (cdt) instr()
if (cdt) instr()
if (cdt) else instr()

-- also valid
if (cdt) else

This was changed in version 0.2.3; from that version onward, the following is possible:

-- valid
while (cdt) ;

-- still invalid
while (cdt)

The special character ? acts as a shorthand for the print function. As such, it expects an argument list as the tokens that follow it.

Example: ? "text", 1, 2, 3 is equivalent to print("text", 1, 2, 3).

However, its implementation is somewhat capricious and very dependent on new-lines:

-- invalid
instr() ?"text"
?"text" ; instr()
?("text", 1)

-- still invalid
instr()    --[[]]    ?"text"

-- valid
?"text"
--[[]]?"text"
?("text"), (1)

From version 0.2.3 onward, however, this was changed to allow the ? "operator" to appear on the same line as a single-line if/while, making the following valid:

-- valid
if (cdt) ?"text"
instr() ?"text"

-- also valid
??"text") -- what?

The actual behavior of the preprocessor regarding the ? is rather simple and explains that last line:

Therefore, ?"text" translates to print("text"), while ??"text" would only translate to print( print("text") (with only one closing parenthesis).
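The inferred behavior (each ? becomes ' print(' and a single ) closes the line) can be sketched as follows. expandPrintShorthand is hypothetical; it skips quoted strings but not long strings or comments, and it does not model the rules about where on a line ? may appear.

```javascript
// Hypothetical model of the '?' shorthand: every '?' outside a quoted string
// becomes ' print(' and one ')' is appended at the end of the line.
// Long strings ([[...]]) and comments are not handled.
function expandPrintShorthand(line) {
  let out = '';
  let quote = null; // current string delimiter while inside a string
  let replaced = false;
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    if (quote) {
      out += c;
      if (c === '\\' && i + 1 < line.length) out += line[++i]; // keep escapes
      else if (c === quote) quote = null;
    } else if (c === '"' || c === "'") {
      quote = c;
      out += c;
    } else if (c === '?') {
      out += ' print(';
      replaced = true;
    } else {
      out += c;
    }
  }
  return replaced ? out + ')' : out;
}
```

It also shows why ?"text" ; instr() is rejected: the closing parenthesis ends up after instr().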

P8SCII

Remark: this feature is only available when specifying a version above 'PICO-8-0.2.2' and might be erroneous in some places.

A new set of escape sequences becomes valid in PICO-8:

The following sequences are already valid in Lua 5.2 and thus are not modified, but they are mentioned because their meaning changed (and maybe their validation pattern too):

Support for the PICO-8 Associated File Format (.p8, text)

Scripts in the PICO-8 Lua flavour are usually part of .p8 files. The convoluted definition of this file format is described below. Only the 'PICO-8-x.y.z' luaVersions expect their input to be a .p8 file (via the strictP8FileFormat feature inherited from the abstract version 'PICO-8'). That said, the ignoreStrictP8FileFormat parser option may be used to disable some of these behaviors.

Uh... note: with the current implementation of ignoreStrictP8FileFormat, only the header check is skipped and the stream is assumed to start in a '__lua__' section; further section-starting sequences are still taken into account.

Overall Format

A typical .p8 file starts with the following two-line header:

pico-8 cartridge // http://www.pico-8.com
version VER

Where VER is a numeric version (decimal integer, no leading sign or zeros).

It may then consist of up to 7 sections, each introduced, in this order, by its respective sequence:

__lua__
__gfx__
__gff__
__label__
__map__
__sfx__
__music__

The sequence usually stands alone on its line, with no other characters (except the EOL). The associated section then runs from the following line until the next section or EOF.

Of these 7 sections, only '__lua__' contains actual Lua code.

More About the Header

The actual rule that makes a .p8 file's header valid is looser than described above: as long as the string "pico-8 cartridge" (16 characters, case-sensitive) is present on the first line, the rest of the file is parsed fine.

In addition to the first line being checked, the very next line is discarded entirely (it typically holds the version VER mention). As a result, if that second line contains one of the 7 sequences presented above, it does not count as starting a section.

Past these 2 lines, every character is discarded until a valid section starts (see below).
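The header rule boils down to very little code. The hypothetical stripP8Header below assumes newline-separated lines (optionally \r\n) and returns the lines in which sections may start, or null when the header is invalid.

```javascript
// Hypothetical sketch of the header rule: line 1 must contain
// "pico-8 cartridge" (case-sensitive), line 2 is always discarded.
function stripP8Header(source) {
  const lines = source.split(/\r?\n/);
  if (!lines[0].includes('pico-8 cartridge')) return null;
  return lines.slice(2); // sections can only start from line 3 onward
}
```

Note that a second line reading __lua__ is dropped like any other second line.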

More About Sections

As shown above, all section-starting sequences look like Python's dunder names; '__xyz__' and '__abc__' are used here to denote any of them.

When reading a .p8 file line by line, a new section starts whenever a line begins with a sequence from the list above. That line is then silently discarded. Because only these first few characters are checked:

The various sections of a .p8 file do not have to appear in the order listed above. Beyond that, they do not have to be present at all, and they can also appear multiple times:

__xyz__
content of section xyz

__abc__
content of section abc

__xyz__
content AGAIN of section xyz

When a file contains multiple sections under the same '__xyz__', their contents are essentially concatenated (line-wise). The snippet above is equivalent to this (note the newlines):

__xyz__
content of section xyz

content AGAIN of section xyz
__abc__
content of section abc

Although the section '__xyz__' now shows an empty line, note that each section also has its own set of internal rules, under which empty lines may be removed.
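The section rules can be sketched as a single pass over the lines that follow the header. splitSections and SECTION_MARKERS are hypothetical names; the sketch keeps empty lines and does not apply any per-section internal rules.

```javascript
// Hypothetical sketch: group post-header lines by section, concatenating
// repeated sections line-wise. Lines before the first marker are dropped.
const SECTION_MARKERS = ['__lua__', '__gfx__', '__gff__', '__label__', '__map__', '__sfx__', '__music__'];

function splitSections(lines) {
  const sections = new Map();
  let current = null; // no section has started yet
  for (const line of lines) {
    // Only the start of the line is checked, so '__lua__ = 42' is a marker too.
    const marker = SECTION_MARKERS.find((m) => line.startsWith(m));
    if (marker) {
      current = marker; // the marker line itself is discarded
      if (!sections.has(current)) sections.set(current, []);
    } else if (current) {
      sections.get(current).push(line);
    }
  }
  return sections;
}
```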

Coming back to parsing Lua, all of this means that the following sources are parsed in somewhat unexpected ways:

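Actual File Content  | Result of Preprocess
---------------------+----------------------
pico-8 cartridge     | pico-8 cartridge
                     |
__lua__              | __lua__
a = [[               | a = [[
__gfx__              | ]]
bonjour              |
]]                   |
__lua__              |
]]                   |
---------------------+----------------------
pico-8 cartridge     | pico-8 cartridge
                     |
__lua__              | __lua__
__lua__ = 42         | print(__lua__)
print(__lua__)       |
---------------------+----------------------
pico-8 cartridge     | pico-8 cartridge
--[[ how to write    |
documentation for    | __lua__
__lua__ PICO-8? ]]   | print("coucou")
print("coucou")      |
---------------------+----------------------
pico-8 cartridge     | pico-8 cartridge
with no section      |
at all               |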