NewtonScript ByteCode

Matthew Faupel, Draft 1, 17 Nov 1994
David Arnold, HTML conversion, 8 Aug 1997

Contents

1
Background to ByteCode
1.1
The Representation of Compiled NewtonScript
1.2
How Values are Encoded in NewtonScript
2
ByteCode
2.1
The General Format of ByteCode
2.2
Explanation of the Description of Individual ByteCodes
2.3
ByteCode Descriptions

Abstract

This technical report details my discoveries about NewtonScript bytecode. It is not an official Apple document and as the information was discovered by observation only, it may be neither accurate nor complete. This document is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

The discoveries detailed in this technical report cover how compiled NewtonScript is represented on the Apple MessagePad platform, how values are encoded, the general format of bytecode, and the behaviour of individual bytecodes.

The author is indebted to Jason Harper for his ViewFrame Demo program, which made me realise that bytecode might not be as impenetrable as I first thought. Any omissions or errors in this document are though, entirely my responsibility.


1 Background to ByteCode

In order to understand fully the description of NewtonScript bytecode given in the second section of this report a number of things first need to be made clear.

1.1 The Representation of Compiled NewtonScript

A compiled NewtonScript function is represented on the MessagePad as a frame with the following fixed format:

{ class: 'CodeBlock, instructions: <bytecode of type 'instructions>, literals: [...], argFrame: { _nextArgFrame: {} _Parent: {}, _implementor: {}, ... }, numArgs: <integer> }

To give a concrete example, here is a short NewtonScript function and the frame that represents it:

func( foo, bar ) { begin class: 'CodeBlock, local fred := '[]; instructions: <instructions>, /* 18 A5 7D 02 is return fred */ literals: [ '[] ], end argFrame: { _nextArgFrame: {}, _Parent: {}, _implementor: {}, foo: NIL, bar: NIL, fred: NIL }, numArgs: 2 }

1.2 How Values are Encoded in NewtonScript

A knowledge of how NewtonScript encodes values is useful for understanding some of the bytecodes. Certain simple values such as integers, and the constants True and Nil are encoded as immediate values. More complex values such as floats, icons, frames and arrays are encoded as separate data blocks and then referred to via a pointer.

NewtonScript has a uniform way of encoding both the simple values and the pointers to more complex values within a 32 bit field. It achieves this by using the lowest two bits of the field as a type identifier. What the value means for each of the four possible types is as follows:

00 The field represents a signed integer in the range -229 to +229-1. i.e. if the low two bits of the 32 bit value are zero, the integer value that it represents can be obtained by performing value >> 2.
01 The field represents a pointer to a complex structure. The pointer value can be obtained by performing value & 0xFFFFFFFC.
10 The field represents a unique system value. As far as I'm aware the possible values are:
0x00000002 Nil
0x0000001A True
0x000UUUU6 The Unicode character \uUUUU, e.g. the NewtonScript constant $A translates to the immediate value 0x00000496.
11 The field represents a magic pointer. These are the constants that are reperesented as @ and have names beginning with ROM_ in the NTK definitions file. The number is encoded in the top 30 bits of the field. As an example, ROM_asciishift (@7) is encoded as 0x0000001F, i.e. ....000111 11.

2 ByteCode

2.1 The General Format of ByteCode

All NewtonScript bytecodes are either one or three bytes long. If the low three bits of the first byte in the sequence are all 1 then the following two bytes are also part of the instruction, otherwise it is a single byte instruction. The following are all examples of valid byte codes (shown in hexadecimal):

00 73 5F 02 15 37 00 10

Very often (but not always) the low three bits are used as a count value and if the required value exceeds 6 then all three bits are set and the following two bytes are used instead for the value. Hence this initially rather odd seeming format is actually quite useful for keeping bytecode compact. The use of this count field is explained in more detail below.

2.2 Explanation of the Description of Individual ByteCodes

All of the bytecode descriptions given below follow the same format:

Name Octal Hex Binary
Stack behaviour
Longer description

The bytecodes are given as octal values first, because this makes it easier to show the low three bits of the code as a separate digit. As mentioned above, these three bits often act as a count field. When this is the case, the letter `N' will be used instead of a digit and the role played by the count field will be explained in the stack behaviour and description sections. Remember that any bytecode listed as xxN can be either one or three bytes long!

The hex value of the bytecode is also given as tools such as ViewFrame tend to show raw data in hex format rather than octal. In the case of bytecodes with a count field, the hex value given will be with the count set to zero.

Bytecode is executed on a virtual machine that uses a stack of NewtonScript values (see section 1.2), hence how each instruction modifies the stack important. The stack behaviour is shown as:

{items removed from stack} -> {items placed on stack}

The items are listed in the order that they are pushed onto the stack, i.e. the leftmost is the first item pushed and the rightmost the last. Repetition of items is indicated by braces (...) followed by a repetition count. If an item on the stack must be of a certain type, the description says so, otherwise placeholders such as <#007F>X" are used. References to arrays, frames and symbols are indicated by [], {} and ' respectively. An empty stack is indicated by the symbol *.

2.3 ByteCode Descriptions

Name Octal Hex Binary
Stack behaviour
Longer description
pop 000 00 0000 0000
X -> *
Removes the top item from the stack
dup 001 01 0000 0001
X -> X, X
Duplicates the top item on the stack
return 002 02 0000 0010
* -> *
Exits from a routine. By convention, the top item on the stack is the return value of the routine. It is possible to return from a routine with more items on the stack (or none at all), however the current Apple NewtonScript compiler ensures that the functions that it produces always leave one and only one value on the stack.
self 003 03 0000 0011
* -> self
Pushes a reference to the frame containing the current function onto the stack. In other words this pushes the value represented by the NewtonScript value self.
create closure(?) 004 04 0000 0100
CodeBlock -> CodeBlock
This bytecode is inserted by the compiler whenever it has just pushed a CodeBlock frame either as a parameter to a function call or message send, or as the target of a call...with statement. My guess is that this initialises the three reserved fields in the argFrame of the CodeBlock (see 1.1), but I haven't yet attempted to verify this.
foreach next 005 05 0000 0101
[iterator] -> *
Steps to next item in iteration. See 307 000 021 - foreach for a full explanation.
foreach complete 006 06 0000 0110
[iterator] -> boolean
Checks if iteration is complete. See 307 000 021 - foreach for a full explanation.
end exception handling(?) 007 000 007 07 00 07 0000 0111 0000 0000 0000 0111
* -> *
See on exception for full explanation.
unused(?) 01N 08 through 0F 0000 1???
-
Apparently unused.
unused(?) 02N 10 through 17 0001 0???
-
Apparently unused.
push literal 03N 18 through 1F 0001 1???
* -> literal N
Pushes the literal which is element N of the literals array of this code block onto the stack.
push immediate 04N 20 through 27 0010 0???
* -> N
N here is treated as an immediate value as described in section 1.2. This instruction is used to push any immediate value that can be represented in 16 (sign extended) bits or less. Values that require the full 32 bits are stored as literals and pushed using 03N.

Some examples of the use of this instruction:

20 push 0 22 push Nil 23 push @0 24 push 1 27 00 1A push True 27 02 86 push $( 27 FF F8 push -2
call global 05N 28 through 2F 0010 1???
argument x N, 'FunctionSymbol -> *
N arguments to the function are expected on the stack (pushed in the same order that they are listed in the function parameter list), followed by a reference to the symbol that represents the name of the function (this is stored in the literals array and pushed using 03N). Although the stack description indicates that the instruction itself pushes nothing onto the stack, remember that all NewtonScript functions conventionally push a single item onto the stack before returning.

Example:

max( foo, bar ) translates to: 7B push foo (assuming it to be the first parameter of the function) 7C push bar (assuming it to be the second) 18 push 'max (assuming it to be the first literal in the function) 2A Call a global function with two arguments.
call with 06N 30 through 37 0011 0???
argument x N, CodeBlock -> *
This directly translates the NewtonScript call ... with() construct. N arguments to the function are expected on the stack (pushed in the same order that they are listed in the function parameter list), followed by a reference to a CodeBlock frame that has had the create closure instruction called on it. As before, remember that all NewtonScript functions conventionally push a single item onto the stack before returning.

Example:

call func(foo) return foo with (1) translates to: 24 push 1 18 push func (assuming it to be the first literal in the containing function). Note that inline function definitions like this are encoded as a CodeBlock frame in the literals array of the containing function. 04 Form a closure for the func 31 Call ... with passing one argument.
send message 07N 38 through 3F 0011 1???
argument x N, {Receiver}, 'Message -> *
This directly translates the NewtonScript ":" construct. N arguments to the message are expected on the stack (pushed in the same order that they are listed in the function parameter list), followed by a reference to the receiving frame and then the symbol of the message. Again, remember that all NewtonScript functions conventionally push a single item onto the stack before returning.

Example:

:Open() translates to: 03 push self 18 push 'Open (assuming it to be the first literal in the function) 38 Send message with zero parameters.
conditional send message 10N 40 through 47 0100 0???
argument x N, {Receiver}, 'Message -> * or Nil
This directly translates the NewtonScript ":?" construct. N arguments to the message are expected on the stack (pushed in the same order that they are listed in the function parameter list), followed by a reference to the receiving frame and then the symbol of the message to be sent if the given function exists. If the function doesn't exist, Nil is placed on the stack else it's left up to the function to put something there.

Example:

child:?DoItIfYouCan() translates to: 7B push child (assuming it to be the first parameter of the function) 18 push 'DoItIfYouCan (assuming it to be the first literal in the function) 40 Send message with zero parameters if possible.
inherited send message 11N 48 through 4F 0100 1???
argument x N, 'Message -> *
This directly translates the NewtonScript "inherited:" construct. N arguments to the message are expected on the stack (pushed in the same order that they are listed in the function parameter list), followed by the symbol of the message. Again, remember that all NewtonScript functions conventionally push a single item onto the stack before returning.

Example:

inherited:Open() translates to: 18 push 'Open (assuming it to be the first literal in the function) 48 Send message with zero parameters to inheritance parent.
conditional inherited send message 12N 50 through 57 0101 0???
argument x N, 'Message -> * or Nil
This directly translates the NewtonScript "inherited:?" construct. N arguments to the message are expected on the stack (pushed in the same order that they are listed in the function parameter list), followed by the symbol of the message to be sent if the given function exists. If the function doesn't exist, Nil is placed on the stack else it's left up to the function to put something there.

Example:

inherited:?DoItIfYouCan() translates to: 18 push 'DoItIfYouCan (assuming it to be the first literal in the function) 50 Send message with zero parameters to inheritance parent (if possible).
goto 13N 58 through 5F 0101 1???
* -> *
This instruction causes execution to pass to location N, where N is the offset from the start of the block of bytecode (i.e. gotos are absolute, not relative).
goto if not Nil 14N 60 through 67 0110 0???
X -> *
X is removed from the stack and examined. If it is Nil, execution continues with the next instruction in sequence. If it is not Nil, execution passes to location N where N is the offset from the start of the block of bytecode.
goto if Nil 15N 68 through 6F 0110 1???
X -> *
X is removed from the stack and examined. If it is not Nil, execution continues with the next instruction in sequence. If it is Nil, execution passes to location N where N is the offset from the start of the block of bytecode.
push external variable 16N 06 0111 0???
* -> literal index N
This bytecode pushes onto the stack the value contained in a variable or slot from outside of the function. The name of the variable to push is given by literal N in the function's literals array; this literal should be a symbol reference.

Example:

length( functions ) translates to: 70 push value of functions (assuming 'functions is the first entry in the literals array of the containing CodeBlock) C7 00 12 Return the length of functions.
push local variable 17N 78 through 7F 0111 1???
* -> value of slot N of argFrame
This bytecode pushes onto the stack the value contained in the Nth slot of the CodeBlock's argFrame frame. In other words, this bytecode is commonly used to push either the value of an argument to the function or that of a local variable.

Example:

func( foo ) return foo translates to: 73 push value of foo (remember argument slots start after the three fixed slots in argFrame, hence the index is 3 not 0). 02 Return
create frame 20N 80 through 87 1000 0???
value x N, [frameMap] -> {frame}
This bytecode creates a frame given N slot values on the stack plus a frame map describing the slots in the frame. The frame map is an array whose first item is Nil and then each subsequent item is a reference to a symbol giving the name of the corresponding slot in the frame being created. The values are pushed onto the stack in the order of the slot names in the frame map. If this is as clear as mud, maybe the example will help:

Example:

local fr := { size: 4, count: 7 } translates to: 27 00 10 push 4 27 00 1C push 7 18 push frame map (assuming it to be the first literal in the CodeBlock). The frame map is an array thus: [Nil, 'size, 'count]. 82 Create a frame with two slots A6 Store in fr (assumed to be the 6th slot in argFrame) A final NB. The frame map array isn't a normal array of class Array. Its class is a small integer, whose value seems to be made up from three possible flags. I'm investigating the meaning of these flags at the moment.
create array 21N 88 through 8F 1000 1???
element x N, 'Class -> [Class: ...] or
integer, 'Class -> [Class: ...]
This bytecode creates an array given N element values on the stack plus a reference to a symbol defining the class of the array. The class of the array is specified in NewtonScript with an initial ":" inside the array, e.g.: [stepChildren:]. If no class is specified, then the default class of Array is used.

In the special case of N being 0xFFFF, the instruction takes a single NewtonScript integer from the stack instead of N element values and creates an empty array (i.e. each element is Nil) with the given number of elements.

Example:

local foo := [ 4, 7 ] translates to: 27 00 10 push 4 27 00 1C push 7 18 push 'Array (assuming it to be the first literal in the CodeBlock) 8A Create an array with two elements A5 Store in foo (assumed to be the 5th slot in argFrame)
push slot value 221 91 1001 0001
{frame}, 'SlotName -> slot value
This takes a frame and the name of a slot in that frame and pushes the value contained in the slot. If the slot name is a single name then it is represented by a symbol reference. If the slot name is a path expression (e.g. foo.bar) then it is represented by an array of class pathExpr with one element per section of the path. Each element is then a reference to the symbol for that part of the path.

Example:

return self.foo.bar translates to: 03 push self 18 push '[pathExpr: 'foo, 'bar] (assuming it to be the first literal) 91 Get slot value 02 Return
assign to slot 230 98 1001 1000
{X}, 'SlotName, Y -> *
This assigns the value Y to the slot SlotName of frame X, i.e. X.SlotName := Y. As with push slot value the slot name is represented either by a simple symbol reference or a pathExpr array for a complex path expression.
assign to slot and push result 231 99 1001 1001
{X}, 'SlotName, Y -> Y
Identical to assign to slot except that the value assigned to the slot is also pushed onto the stack.
assign to local variable 24N 06 1010 0???
X -> *
Value X is assigned to the Nth slot of the CodeBlock's argFrame frame. In other words, this bytecode is usually used to set the value of a local variable.

Example:

local x := '[] translates to: 18 push '[] (assuming it to be the first literal) A7 00 10 Assign to x (assuming it to be 16th slot in argFrame)
assign to external variable 25N A8 through AF 1010 1???
X -> *
Value X is assigned to a variable or slot from outside of the function. The name of the variable to assign to is given by literal N in the function's literals array; this literal should be a symbol reference.
increment local variable 26N 1011 0???
Increment -> Increment, local variable N + Increment
This bytecode adds the given integer increment to the local variable in the Nth slot of the CodeBlock's argFrame and the returns both the increment and the new value of the local variable. This instruction is used internally as part of the coding of for loops.
for loop goto 27N B8 through BF 1011 1???
Increment, value, limit -> *
This instruction is placed at the end of a for loop construct. It takes the increment, current value of the counter (after having been incremented by increment local variable and the limit value of the loop. If the counter value now exceeds the loop limit, execution continues with the next instruction in sequence. If it does not, execution passes to location N where N is the offset from the start of the block of bytecode. This instruction is used internally as part of the coding of for loops.
add 300 C0 1100 0000
X, Y -> X+Y
The 30N bytecode series encodes a number of different low-level system function calls. Any NewtonScript operator or built-in function not explicitly mentioned here is implemented by using a standard function call (bytecode 05N). This first bytecode of the series implements addition (of integers or floats).
subtract 301 C1 1100 0001
X, Y -> X-Y
Subtraction of integers or floats.
dereference array object 302 C2 1100 0010
[X], Y -> X[Y]
This pushes the array element Y of array X. Y should be an integer.
assign to array element and push 303 C3 1100 0011
[X], Y, Z -> Z
This assignes value Z to the pushes the array element Y of array X i.e. X[Y] := Z. Y should be an integer.
compare equal 304 C4 1100 0100
X, Y -> Boolean (X=Y)
This compares X and Y for equality and pushes a Boolean result (True or Nil).
not 305 C5 1100 0101
X -> Boolean (not X)
This takes any value X and returns not X. Not Nil is True, not anything else is Nil.
compare not equal 306 C6 1100 0110
X, Y -> Boolean (X<>Y)
This compares X and Y for inequality and pushes a Boolean result (True or Nil).
multiply 307 000 007 C7 00 07 1100 0111 0000 0000 0000 0111
X, Y -> X*Y
This implements multiplication (of integers or floats.
divide with float result 307 000 010 C7 00 08 1100 0111 0000 0000 0000 1000
X, Y -> Float (X/Y)
This implements division (of integers or floats) with a float result, i.e. it corresponds to the "/" operator in NewtonScript.
divide with integer result 307 000 011 C7 00 09 1100 0111 0000 0000 0000 1001
X, Y -> Integer (X div Y)
This implements division of integers with an integer result, i.e. it corresponds to the <#007F>div" operator in NewtonScript.
compare less than 307 000 012 C7 00 0A 1100 0111 0000 0000 0000 1010
X, Y -> Boolean (X < Y)
This checks if X is less than Y and pushes a Boolean result (True or Nil).
compare greater than 307 000 013 C7 00 0B 1100 0111 0000 0000 0000 1011
X, Y -> Boolean (X > Y)
This checks if X is greater than Y and pushes a Boolean result (True or Nil).
compare greater or equal 307 000 014 C7 00 0C 1100 0111 0000 0000 0000 1100
X, Y -> Boolean (X >= Y)
This checks if X is greater than or equal to Y and pushes a Boolean result (True or Nil).
compare less or equal 307 000 015 C7 00 0D 1100 0111 0000 0000 0000 1101
X, Y -> Boolean (X <= Y)
This checks if X is less than or equal to Y and pushes a Boolean result (True or Nil).
binary and 307 000 016 C7 00 0E 1100 0111 0000 0000 0000 1110
Integer X, integer Y -> Integer band(X,Y)
This performs a binary and of the two integers X and Y.
binary or 307 000 017 C7 00 0F 1100 0111 0000 0000 0000 1111
Integer X, integer Y -> Integer bor(X,Y)
This performs a binary or of the two integers X and Y.
binary not 307 000 020 C7 00 10 1100 0111 0000 0000 0001 0000
Integer X -> Integer bnot(X)
This performs a binary not of the integer X.
foreach 307 000 021 C7 00 11 1100 0111 0000 0000 0001 0001
X, Boolean -> [Iterator]
This is the fundamental instruction used to implement all NewtonScript foreach loops. If the Boolean is Nil it begins a normal foreach loop and if it is True it begins a foreach deeply loop. The iterator returned is then passed to the other two related instructions 005 (foreach next) and 006 (foreach complete). The iterator itself is an array containing various useful state information about what is being iterated over and how far the iteration has progressed.

foreach next takes the iterator and modifies it so that the next item in the iteration becomes the current item. foreach complete checks the state of the iterator and returns True if all items have been iterated over and Nil if not.

So, a foreach loop is implemented in bytecode as:

?? push thing being iterated over 27 00 1A/22 push True or Nil (for deeply or not) C7 00 11 Create foreach iterator A? Assign it to a local variable. ... Perform other initialisation (depends on the loop type) 5F ?? ?? Goto end of loop check (Y:) X: ... Code executed each time round the iteration. 7? Push the local variable referencing the iterator 05 And move to the next thing to be iterated Y: 7? Push the iterator again 06 And check if we've finished iterating 6F ?? ?? Go round the loop again if not (X:)
length 307 000 022 C7 00 12 1100 0111 0000 0000 0001 0010
X -> Integer length(X)
This implements the length function.
clone 307 000 023 C7 00 13 1100 0111 0000 0000 0001 0011
X -> clone(X)
This implements the clone function.
set class 307 000 024 C7 00 14 1100 0111 0000 0000 0001 0100
X, 'Symbol -> modified X
This implements the SetClass function e.g. SetClass( data, 'Binary ). X is a complex data structure (frame, array or data block); SetClass sets the class of that structure and then returns a reference to the now modified structure.
add array slot 307 000 025 C7 00 15 1100 0111 0000 0000 0001 0101
[X], Y -> Y
This implements the AddArraySlot function e.g. AddArraySlot( foo, 2 ). Y is added as the last element of the array X.
make string 307 000 026 C7 00 16 1100 0111 0000 0000 0001 0110
[Array of strings] -> String
This takes an array of strings and combines them into a single string. This function is used to implement the NewtonScript operators "&" and "&&". "&&" is currently done by inserting an extra single space string into the array of strings to be translated.

Example:

fred && "me" translates to: 70 push value of fred ('fred is the first literal) 19 1A 1B push " " (second), "me" (third) and 'Array (fourth) 8B Make an array from 3 parameters and a class symbol C7 00 16 Make a string from the array
slot exists 307 000 027 C7 00 17 1100 0111 0000 0000 0001 0111
{X}, 'SlotName -> Boolean (True if X contains the given slot)
This implements the exists operator, but only when checking for the existance of a slot in a frame (e.g. foo.bar exists). When checking for the existance of a variable, a function call (bytecode 05N) to the function HasVar is used instead.
class of 307 000 030 C7 00 18 1100 0111 0000 0000 0001 1000
X -> ClassOf(X)
This implements the ClassOf function.
on exception 31N C8 through CF 1100 1???
('Exception, byte offset)N -> *
This takes N pairs of an exception symbol and an offset into the bytecode to jump to if that exception occurs. This is used to encode the high level try ... onexception construct. Note that unlike all the goto bytecodes, the offset is encoded as a standard NewtonScript integer. At the end of both the normal and the exception code, the bytecode 007 000 007 is executed - I guess it clears whatever state was set up.

Example:

try 1/0 onexception |evt.ex.div0| do nil becomes: 18 push '|evt.ex.div0| (first literal) 27 00 40 push integer 16 C9 onexception '|evt.ex.div0| goto 16 24 push 1 20 push 0 C7 00 08 1/0 07 00 07 End exception block (?) 5F 00 14 Goto 20 (i.e. the end of this block) 16: 22 Push Nil 07 00 07 End exception block (?)


Apple and Newton are trademarks of Apple Computer, Inc., registered in the United States and other countries. MessagePad, NewtonScript and Newton ToolKit are trademarks of Apple Computer, Inc. ViewFrame is a trademark of Jason Harper.