A Lua parser written in JavaScript, with support for the PICO-8 flavour.
Luaparse was originally written by Oskar Schöldström for his bachelor's thesis at Arcada.
PICO-8 Lua is based on version 5.2 of the Lua language; as such, most 5.2 syntaxes and behaviors are present in PICO-8 Lua.
The values added for the parser `luaVersion` option are:
- `'PICO-8'` (should not be used, just for feature inheritance)
- `'PICO-8-0.2.1'`
- `'PICO-8-0.2.2'` (adds support for some escape sequences)
- `'PICO-8-0.2.3'` (empty `if`/`while`, `?` on same line)
- `'PICO-8-0.2.4'` (any character before `if`/`while`)
- `'PICO-8-0.2.4c'` (special `__meta:` sections)
- `'PICO-8-0.2.5'` (normal bitwise xor)

If the targeted version is not in this list, the closest preceding version should have the same syntaxes.
TL;DR: PICO-8 Lua differs from standard Lua because it is preprocessed; for example the `'?'` "operator" is simply replaced with `' print('`. The way this is done introduces many cursed syntaxes (and I don't want to maintain a list of these). To see the result of this preprocessing you can use this same trick (until fixed) with a cartridge containing exactly:
pico-8 cartridge
__lua__
printh[=[
#include file_with_code_to_inspect
]=]
These changes (specific to PICO-8's Lua) have been considered out of scope:
- `#include ..` directive
- `__lua__` section with local a,
Differences with the official interpreter for PICO-8 version 0.2.1:
- `::_::a+=1` (gives a funny syntax error `expected near '+'`)
- `a+=1+cos"0"` (runtime error `attempt to perform arithmetic on global 'cos' [...]`)
- `+=` and such, stare at PICO-8's changelogs for more of these)

PICO-8 Lua reads `//` as a single line comment (similarly to C-inspired languages). Do note that this is not a simple alias for `--` as it will not start a longstring comment:
// comment
//[[ comment
not comment,
but syntax error
]]
Adding to the list of changes to comments, longstring comments cannot use the `=` as depth:
--[==[ comment
not comment but syntax error
(no MD highlight to show that)
]==]
This syntax is kept for actual strings:
local text = [==[hey
how's it going?]==]
print(text)
PICO-8 accepts binary notation for numerals the same way Lua 5.2+ does for hexadecimal:
- `0b101010`: 42
- `0b0.1`: 0.5
- `0b10.`: 2.
- `0b.01`: .25

Some Lua flavours accept numerals with trailing "U", "I" and "L" suffixes (LuaJIT, for instance); none of these is a supported notation in PICO-8. Likewise, the exponential and binary exponential notations are treated as errors:
2.1i
4U
0xB0p99
0.31416E1
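To make the binary notation concrete, here is a minimal sketch of how such a numeral could be evaluated. `parseBinaryNumeral` is a hypothetical helper written for this illustration, not a function of the parser; it assumes a lowercase `0b` prefix and rejects suffixes and exponents, as described above.

```javascript
// Sketch only: evaluate a PICO-8 style binary numeral such as "0b101010" or "0b.01".
// parseBinaryNumeral is a hypothetical name, not part of the parser's API.
function parseBinaryNumeral(text) {
  // Digits are optional on either side of the dot, but at least one is required overall.
  const match = /^0b([01]*)(?:\.([01]*))?$/.exec(text);
  if (!match || (match[1] + (match[2] || "")).length === 0) {
    throw new SyntaxError("malformed binary numeral: " + text);
  }
  const integerPart = match[1] ? parseInt(match[1], 2) : 0;
  const fractionDigits = match[2] || "";
  let fraction = 0;
  for (let i = 0; i < fractionDigits.length; i++) {
    // The i-th digit after the dot weighs 2^-(i+1).
    if (fractionDigits[i] === "1") fraction += 2 ** -(i + 1);
  }
  return integerPart + fraction;
}

console.log(parseBinaryNumeral("0b101010")); // 42
console.log(parseBinaryNumeral("0b0.1"));    // 0.5
console.log(parseBinaryNumeral("0b10."));    // 2
console.log(parseBinaryNumeral("0b.01"));    // 0.25
```

Anything carrying a suffix or an exponent (for example `0b1p3`) does not match the pattern and throws.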
The following are unary operators on a number (with the same precedence as `-`, `~` and `not`):
- `@` example `@0x2000`
- `$` example `$-1`
- `%` example `%100`
Here is a set of bitwise operators (so on numbers):
- `>>>` logical right shift (ignores the sign)
- `<<>` rotate left
- `>><` rotate right
- the exclusive or (`~`) is replaced with the token `^^`
The Lua integer division (usually written `//`) represents a single line comment in PICO-8, so `\` is used instead.
This flavour also implements `!=` as an alias for `~=`.
The logical shift and the rotates have the same precedence as Lua's shift operators `<<` and `>>`. The exclusive or, not-equal and integer division keep the precedence of the Lua operators they stand in for.
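For readers unfamiliar with the logical shift and the rotates, the following sketch shows what they compute on a raw 32-bit pattern. The helper names (`lshr`, `rotl`, `rotr`) are made up for this example, and two simplifications are assumed: operands are passed as their 32-bit representation (PICO-8 numbers being 16.16 fixed point), and the shift count is taken modulo 32.

```javascript
// Sketch only: lshr, rotl and rotr are hypothetical names for `>>>`, `<<>` and `>><`.
// Assumptions: operands are raw 32-bit patterns, shift counts are taken modulo 32.
const toU32 = (x) => x >>> 0;

// `>>>`: logical right shift, zeros are shifted in whatever the sign bit is.
function lshr(bits, n) {
  return toU32(bits) >>> (n & 31);
}

// `<<>`: rotate left, bits leaving on the left re-enter on the right.
function rotl(bits, n) {
  const k = n & 31;
  const u = toU32(bits);
  return toU32((u << k) | (u >>> ((32 - k) & 31)));
}

// `>><`: rotate right, expressed as the complementary left rotation.
function rotr(bits, n) {
  return rotl(bits, 32 - (n & 31));
}

console.log(lshr(0x80000000, 31).toString(16)); // 1
console.log(rotl(0x80000001, 1).toString(16));  // 3
console.log(rotr(0x00000003, 1).toString(16));  // 80000001
```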
Every binary operator that is not a comparison has an assignment syntax. The full list is:

| assignment | equivalent |
|---|---|
| `a += b` | `a = a + b` |
| `a -= b` | `a = a - b` |
| `a *= b` | `a = a * b` |
| `a /= b` | `a = a / b` |
| `a \= b` | `a = a \ b` |
| `a %= b` | `a = a % b` |
| `a ^= b` | `a = a ^ b` |
| `a ..= b` | `a = a .. b` |
| `a \|= b` | `a = a \| b` |
| `a &= b` | `a = a & b` |
| `a ^^= b` | `a = a ^^ b` |
| `a <<= b` | `a = a << b` |
| `a >>= b` | `a = a >> b` |
| `a >>>= b` | `a = a >>> b` |
| `a <<>= b` | `a = a <<> b` |
| `a >><= b` | `a = a >>< b` |
Note that implementing this syntax required a new type of AST node, named `'AssignmentOperatorStatement'`. Such a node presents the properties `variables` and `init` of an `'AssignmentStatement'` as well as the `operator` of a `'BinaryExpression'` node.
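As an illustration of that node shape, the sketch below builds a node by hand for `a += b` and rewrites it into the equivalent plain assignment from the table. The property names come from the description above; the exact `operator` string (`"+="` here) and the `desugar` helper are assumptions made for the example, not the parser's documented output.

```javascript
// Hand-written node for `a += b`. Assumption: `operator` holds the full "+=" token.
const node = {
  type: "AssignmentOperatorStatement",
  operator: "+=",
  variables: [{ type: "Identifier", name: "a" }],
  init: [{ type: "Identifier", name: "b" }],
};

// Hypothetical helper: turn `a OP= b` into `a = a OP b`.
function desugar(statement) {
  const [target] = statement.variables;
  const [value] = statement.init;
  return {
    type: "AssignmentStatement",
    variables: [target],
    init: [
      {
        type: "BinaryExpression",
        // Drop the trailing "=" to recover the binary operator ("+=" gives "+").
        operator: statement.operator.slice(0, -1),
        left: target,
        right: value,
      },
    ],
  };
}

console.log(JSON.stringify(desugar(node), null, 2));
```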
Important notice: this feature gives a meaning to the new-line sequence, which standard Lua treats as a blank character and ignores.
This syntax introduces the possibility to write certain blocks without having to specify their bounds using tokens, similarly to some other languages (for example an `if` statement in JavaScript does not need curly brackets when its body consists of only one "line" of instructions).
if (condition) instruction1()
instruction2()
-- is semantically the same as
if condition then instruction1() end instruction2()
if (condition) instruction1() else instruction2()
-- is semantically the same as
if condition then instruction1() else instruction2() end
while (condition) instruction()
-- is semantically the same as
while condition do instruction() end
Notice how:
- the `if` statement cannot present an `elseif` clause

While this change initially only applies to the `if` and `while` statements, its actual implementation in the PICO-8 Lua runtime leads to unexpectedly correct or incorrect syntaxes, which also impacts the standard `do`, `function` and `for` amongst other things.
Without any evidence to back the following claims as to how PICO-8 parses its Lua, testing has shown that the behavior of the actual implementation is similar to applying the following preprocessing to the raw script:
If a line contains an `if` token (resp. `while`) which does not present an associated `then` token (resp. `do`) right after its condition, the very next new-line sequence (or, for lack thereof, end-of-file) is treated as an `end` token. This `end` token is in no way associated with the body block of the opened `if .. then` statement (resp. `while .. do`) and essentially gives any end-of-line the potential meaning of an `end` token.
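The rule above can be approximated with a small line-based rewrite. This is only a sketch of the observed behavior, not PICO-8's actual code: `expandLine` and `expandShorthand` are hypothetical names, the condition is assumed to be a single parenthesised group directly after the keyword, and strings and comments are not handled.

```javascript
// Sketch only: rewrite `if (cond) body` / `while (cond) body` into regular Lua.
function expandLine(line) {
  const head = /\b(if|while)\s*\(/.exec(line);
  if (!head) return line;

  // Locate the parenthesis that closes the condition.
  const open = head.index + head[0].length - 1;
  let depth = 0;
  let close = -1;
  for (let i = open; i < line.length; i++) {
    if (line[i] === "(") depth++;
    else if (line[i] === ")" && --depth === 0) {
      close = i;
      break;
    }
  }
  if (close === -1) return line;

  const keyword = head[1] === "if" ? "then" : "do";
  const rest = line.slice(close + 1);
  // A regular statement already has its keyword right after the condition.
  if (new RegExp("^\\s*" + keyword + "\\b").test(rest)) return line;

  // Otherwise insert the keyword and let the end of the line act as `end`.
  const condition = line.slice(open + 1, close);
  return line.slice(0, head.index) + head[1] + " " + condition + " " + keyword + rest + " end";
}

function expandShorthand(source) {
  return source.split("\n").map(expandLine).join("\n");
}

console.log(expandShorthand("if (condition) instruction1()\ninstruction2()"));
console.log(expandShorthand("while (condition) instruction()"));
```

Because the appended `end` is purely textual, nothing ties it to the block that was just opened, which is where the stranger cases come from.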
This gives rise to a bunch of cursed valid syntaxes (try to guess the scopes):
PICO-8 Syntax | Result of Preprocess |
---|---|
|
|
|
|
|
|
|
|
Note for slimmer screens: outside of the first row, all are on a single line.
This syntax of "single line" statements also comes with an arbitrary rule that the character preceding the `if` (or `while`) token must be either:
- `a = 0xfif (cdt) instr()` is valid (and `a` is 15)

-- invalid
instr()--[[]]if (cdt) instr()
instr()--[[]]--[[]]if (cdt) instr()
instr() --[[]]--[[]]if (cdt) instr()
-- valid
instr() --[[]]if (cdt) instr()
instr()--[[]] if (cdt) instr()
a = 0xf--[[]]if (cdt) instr()
This dependency on the preceding character/token was removed in version 0.2.4 (making every line above valid, even without comments).
Finally, it is important to notice that the body of the clause cannot be empty. Or maybe just sometimes. To be more specific, a single line `if` or `while` with no instruction following (before end-of-line) is not considered valid, but a single line `if` containing an `else` clause (albeit empty) is valid.
-- invalid
while (cdt)
if (cdt)
-- valid
while (cdt) instr()
if (cdt) instr()
if (cdt) else instr()
-- also valid
if (cdt) else
This was changed in version 0.2.3, from which the following is made possible:
-- valid
while (cdt) ;
-- still invalid
while (cdt)
The special character `?` acts as a shorthand for the `print` function. As such, it expects a parameter list as its following tokens.
Example: `? "text", 1, 2, 3` is equivalent to `print("text", 1, 2, 3)`.
However its implementation is somewhat capricious and very new-line dependent:
- nothing may precede the `?` character on its line except a multiline comment
- what follows the `?` up to the end of the line must be a valid parameter list

-- invalid
instr() ?"text"
?"text" ; instr()
?("text", 1)
-- still invalid
instr() --[[]] ?"text"
-- valid
?"text"
--[[]]?"text"
?("text"), (1)
From version 0.2.3 onward, however, this was changed to allow the `?` "operator" to appear on the same line as a single line `if`/`while`, making the following valid:
-- valid
if (cdt) ?"text"
instr() ?"text"
-- also valid
??"text") -- what?
The actual behavior of the preprocessor regarding the `?` is rather simple and explains that last line:
- `?` is replaced with exactly `print(`

Therefore `?"text"` translates to `print("text")`, with a single closing parenthesis added for the whole line, and `??"text"` would only translate to `print( print("text")` (only one closing parenthesis).
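A textual substitution of this kind can be sketched as follows. `expandPrintShorthand` is a hypothetical helper: it replaces every `?` found outside a quoted string with `print(` and, if any was replaced, appends one `)` at the end of the line. Where exactly PICO-8 places that parenthesis relative to a trailing comment is not covered here, so comments are left out of the sketch.

```javascript
// Sketch only: expand the `?` shorthand on a single line (comments not handled).
function expandPrintShorthand(line) {
  let out = "";
  let quote = null; // delimiter of the string being read, if any
  let replaced = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (quote) {
      out += ch;
      if (ch === "\\" && i + 1 < line.length) out += line[++i]; // copy the escaped character
      else if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      out += ch;
    } else if (ch === "?") {
      replaced = true;
      out += "print(";
    } else {
      out += ch;
    }
  }
  // One closing parenthesis per line, however many `?` were replaced.
  return replaced ? out + ")" : out;
}

console.log(expandPrintShorthand('?"text"'));   // print("text")
console.log(expandPrintShorthand('??"text")')); // print(print("text"))
```

The second call shows why the doubled form only parses when the source supplies the missing parenthesis itself.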
Remark: this feature is only available when specifying a version above `'PICO-8-0.2.2'` and might be erroneous in some places.
A new set of escape sequences becomes valid in PICO-8:
- `\*`
- `\+`
- `\-`
- `\|`
- `\#`
- `\^`
The following sequences are already valid in Lua 5.2 and thus are not modified, but they get a mention as their meaning was changed (and maybe their validation pattern too):
- `\a` "audio"
- `\v` decorate previous character
- `\f` set foreground colour
- `\014` switch to font defined at 0x5600
- `\015` switch to default font

Scripts in the PICO-8 Lua flavour are usually part of .p8 files. The convoluted definition for this file format is described below. Only `'PICO-8-x.y.z'` `luaVersion`s are expected to be .p8 files (via the `strictP8FileFormat` feature inherited from the abstract version `'PICO-8'`). Despite that, the `ignoreStrictP8FileFormat` parser option may be used to ignore some behaviors.
Uh.. note: as per the current implementation of `ignoreStrictP8FileFormat`, only the header check is ignored and the stream is assumed to start in a `'__lua__'` section; but further section-starting sequences are still accounted for.
A typical .p8 file starts with the following two-line header:
pico-8 cartridge // http://www.pico-8.com
version VER
Where `VER` is a numeric version (decimal integer, no leading sign or zeros).
It may then consist of up to 7 sections, each introduced by its respective sequence, in this order:
__lua__
__gfx__
__gff__
__label__
__map__
__sfx__
__music__
The sequence is usually found alone on its line, with no other character (except for EOL). The associated section then starts from the following line until the next section or EOF.
Of these 7 sections, only the `'__lua__'` one consists of actual valid Lua code.
The exact definition that makes a .p8 file's header valid is looser than introduced above: as long as the string `"pico-8 cartridge"` (16 characters, case sensitive) is present on the first line, the rest of the file is parsed fine.
But in addition to checking the first line, the very next one is entirely discarded (it typically contains the `version VER` mention). As a result, if this second line contains one of the 7 sequences presented above, it does not count as starting a section.
Past these 2 lines, any character is discarded until a valid section starts (see below).
As shown above, all section-starting sequences are similar in syntax to Python's dunders; `'__xyz__'` and `'__abc__'` are used here to stand for any of them.
When reading a .p8 file line by line, a new section starts when a line starts with a sequence from the list above. This line is then silently discarded. Because only these first few characters are checked:
- any characters after `'__xyz__'` but before the EOL are effectively discarded
- any character before `'__xyz__'` invalidates the check, no new section is started

The various sections of a .p8 file do not have to be present in the same order as listed above. Beyond that, they do not have to be present at all, and they can just as well be present multiple times:
__xyz__
content of section xyz
__abc__
content of section abc
__xyz__
content AGAIN of section xyz
When a file presents multiple sections under the same `'__xyz__'`, their content is essentially concatenated (line-wise). The snippet above is equivalent to this (note the newlines):
__xyz__
content of section xyz
content AGAIN of section xyz
__abc__
content of section abc
Although the section `'__xyz__'` now shows an empty line, it is important to note that each section also has its own set of internal rules, under which empty lines may be removed.
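The header and section rules described so far can be summed up in a short sketch. `splitSections` and `SECTION_NAMES` are hypothetical names chosen for this example; the code follows the rules as stated in this document (loose header check, second line discarded, prefix-only detection, line-wise concatenation) and ignores each section's own internal rules.

```javascript
// Sketch only: split the text of a .p8 file into its sections.
const SECTION_NAMES = ["lua", "gfx", "gff", "label", "map", "sfx", "music"];

function splitSections(text) {
  const lines = text.split(/\r?\n/);
  // Loose header check: the marker only has to appear somewhere on the first line.
  if (!lines[0].includes("pico-8 cartridge")) {
    throw new Error("not a .p8 file: missing header");
  }
  const sections = {};
  let current = null;
  // The second line is discarded whatever it contains, so reading starts at index 2.
  for (const line of lines.slice(2)) {
    // Only the beginning of the line is checked; trailing characters are dropped.
    const name = SECTION_NAMES.find((n) => line.startsWith("__" + n + "__"));
    if (name) {
      current = name;
      if (!(name in sections)) sections[name] = [];
      continue;
    }
    // Lines found before the first section are discarded.
    if (current !== null) sections[current].push(line);
  }
  // Repeated sections end up concatenated line-wise.
  return Object.fromEntries(
    Object.entries(sections).map(([name, content]) => [name, content.join("\n")])
  );
}

const example = [
  "pico-8 cartridge // http://www.pico-8.com",
  "version 29",
  "__lua__",
  "a = 1",
  "__gfx__",
  "00000000",
  "__lua__",
  "b = 2",
].join("\n");
console.log(splitSections(example).lua); // prints "a = 1" then "b = 2" on two lines
```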
Coming back to parsing Lua, all this means that the following sources get effectively parsed somewhat unexpectedly:
Actual File Content | Result of Preprocess |
---|---|
|
|
|
|
|
|
|
|