First Commit

2025-02-06 22:24:29 +08:00
parent ed7df4c81e
commit 7539e6a53c
18116 changed files with 6181499 additions and 0 deletions
--- a/externals/nihstro/docs/instruction_set.md
+++ b/externals/nihstro/docs/instruction_set.md
@@ -0,0 +1,315 @@
+# Shader Instruction Set
+
+This page gives an overview of the instruction set supported by nihstro. Note that there is a similar reference list on [3dbrew](http://3dbrew.org/wiki/Shader_Instruction_Set), which documents the actual hardware implementation instead. nihstro seeks to abstract away annoying details, like the fact that there are 3 different CALL instructions, and instead provides convenience shortcuts where possible without giving up flexibility.
+
+# Table of Contents
+
+- [Shader Instruction Set](#shader-instruction-set)
+  - [Arithmetic Instructions](#arithmetic-instructions)
+  - [Flow Control Instructions](#flow-control-instructions)
+  - [Special Purpose Instructions](#special-purpose-instructions)
+
+## Arithmetic Instructions
+Most arithmetic instructions take a destination operand and one or more source operands. Source operands may use any kind of swizzle mask, while destination operands may not use reordering or duplicating swizzle masks. Below you will find a short operation description for each instruction, e.g. `dest[i] = src[i]`, which means that the `i`-th source component (as specified by the swizzle mask) will be assigned to the `i`-th destination component (as specified by the swizzle mask), with `i` ranging from 0 to one less than the number of swizzle mask components. Components not listed in the destination swizzle mask are hence not written.
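+
+As a rough illustration of these rules (a sketch only; the register choices are arbitrary and not taken from a real shader):
+
+    mov r0.xy, c1.zw    // r0.x = c1.z and r0.y = c1.w; r0.z and r0.w are not written
+    mov r1.xyz, c1.xxx  // duplicating source swizzle: r1.x, r1.y and r1.z all receive c1.x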
+
+Static indexing (i.e. indexing with a constant, not to be confused with the above notation) may be done for both operand types. Source operands additionally support *dynamic indexing*, where the index depends on one of the address registers `a0`/`a1` or on the loop counter `lcnt`. Examples:
+* static indexing: `c0[20]`
+* dynamic indexing: `c0[2+a0]`
+
+#### mov: Copy floating point value
+Syntax: `mov dest_operand, src_operand`
+
+Operation: `dest[i] = src[i]`
+
+Restrictions:
+* `src` and `dest` must have the same number of components
+
+#### add: Per-component floating point sum
+Syntax: `add dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = src1[i] + src2[i]`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+Notes:
+* subtraction can be performed using negation: `add r0, c0, -c1`
+* when chaining an addition and a multiplication, consider using `mad` instead
+
+#### mul: Per-component floating point multiplication
+Syntax: `mul dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = src1[i] * src2[i]`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+Notes:
+* division can be performed by computing the reciprocal of `src2` and multiplying by the result: `rcp r0, c1` followed by `mul r0, c0, r0` (as two separate statements)
+* when chaining an addition and a multiplication, consider using `mad` instead
+
+#### mad: Fused multiply-add of three floating point numbers
+Syntax: `mad dest_operand, src1_operand, src2_operand, src3_operand`
+
+Operation: `dest[i] = src1[i] * src2[i] + src3[i]`
+
+Restrictions:
+* `src1`, `src2`, `src3`, and `dest` must have the same number of components
+* not more than two source operands may be float uniform registers
+* no dynamic indexing may be performed on any of the source operands
+
+Notes:
+* when dynamic indexing is not avoidable, use `add` and `mul` instead
+* not supported currently
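+
+A possible fallback sequence for `dest = src1 * src2 + src3` (a sketch under the restrictions listed for `mul` and `add`; the registers and the index expression are placeholders):
+
+    mul r0, c0[2+a0], v0  // r0 = c0[2+a0] * v0; only one source is a uniform or dynamically indexed
+    add r0, r0, r1        // r0 = r0 + r1, completing the multiply-add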
+
+#### max: Copy the greater of two floating point numbers
+Syntax: `max dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = max(src1[i], src2[i])`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+#### min: Copy the smaller of two floating point numbers
+Syntax: `min dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = min(src1[i], src2[i])`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+#### flr: Floating point floor
+Syntax: `flr dest_operand, src_operand`
+
+Operation: `dest[i] = floor(src[i])`
+
+Restrictions:
+* `src` and `dest` must have the same number of components
+
+#### rcp: Floating point reciprocal
+Syntax: `rcp dest_operand, src_operand`
+
+Operation: `dest[i] = 1 / src[i]`
+
+Restrictions:
+* `src` and `dest` must have the same number of components
+
+#### rsq: Floating point reciprocal square root
+Syntax: `rsq dest_operand, src_operand`
+
+Operation: `dest[i] = 1 / sqrt(src[i])`
+
+Restrictions:
+* `src` and `dest` must have the same number of components
+
+#### exp: Floating point base-2 exponential
+Syntax: `exp dest_operand, src_operand`
+
+Operation: `dest[i] = exp2(src[i])`
+
+Restrictions:
+* `src` and `dest` must have the same number of components
+
+#### log: Floating point base-2 logarithm
+Syntax: `log dest_operand, src_operand`
+
+Operation: `dest[i] = log2(src[i])`
+
+Restrictions:
+* `src` and `dest` must have the same number of components
+
+#### dp3: Floating point 3-component dot-product
+Syntax: `dp3 dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = src1[0]*src2[0]+src1[1]*src2[1]+src1[2]*src2[2]`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+#### dp4: Floating point 4-component dot-product
+Syntax: `dp4 dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = src1[0]*src2[0]+src1[1]*src2[1]+src1[2]*src2[2]+src1[3]*src2[3]`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+#### dph: Floating point homogeneous dot-product
+Syntax: `dph dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = src1[0]*src2[0]+src1[1]*src2[1]+src1[2]*src2[2]+src2[3]`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+#### sge: Set to one if greater or equal
+Syntax: `sge dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = (src1[i] >= src2[i]) ? 1.0 : 0.0`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+#### slt: Set to one if (strictly) less
+Syntax: `slt dest_operand, src1_operand, src2_operand`
+
+Operation: `dest[i] = (src1[i] < src2[i]) ? 1.0 : 0.0`
+
+Restrictions:
+* `src1`, `src2`, and `dest` must have the same number of components
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+#### mova: Move to address register
+Syntax: `mova src_operand`
+
+Operation:
+
+    a0 = src.x
+    a1 = src.y
+
+Restrictions:
+* src_operand must be a two-component vector.
+
+Notes:
+* not supported currently
+
+## Flow Control Instructions
+These allow for non-linear code execution, e.g. by conditionally or repeatedly running code.
+
+Some flow control instructions take a "condition" parameter. A condition is either
+* a boolean uniform or
+* an expression consisting of one or two conditional code components, combined via `&&` ("and") or `||` ("or"), and optionally negated. Examples: `cc.x`, `cc.y && !cc.x`
+
+#### cmp: Compare two floating point numbers
+
+Syntax: `cmp src1_operand, src2_operand, op1, op2`
+
+`op1` and `op2` may be any of the strings `==` (equal), `!=` (not equal), `<` (less than), `<=` (less than or equal to), `>` (greater than), and `>=` (greater than or equal to).
+
+Operation:
+
+    cc.x = (src1[0] op1 src2[0])
+    cc.y = (src1[1] op2 src2[1])
+
+Restrictions:
+* `src1` and `src2` must be two-component vectors
+* it is not possible to set `cc.x` without also setting `cc.y`
+* not more than one of the source operands may be a float uniform register and/or use dynamic indexing
+
+Notes:
+* this instruction is used to set conditional codes, which can be used as conditions for `if`/`jmp`/`call`/`break`.
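+
+A sketch of how `cmp` might be paired with `if` (registers are chosen arbitrarily for illustration):
+
+    cmp r0.xy, c0.xy, >=, <  // cc.x = (r0.x >= c0.x), cc.y = (r0.y < c0.y)
+    if cc.x && !cc.y
+        mov r1, c1           // runs when the condition holds
+    else
+        mov r1, c2           // runs otherwise
+    endif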
+
+#### if: Conditional code execution
+Syntax: `if condition`
+
+Operation:
+If `condition` is true, executes the code between itself and the corresponding `else` or `endif` pseudo-instruction. Otherwise, executes the code in the `else` branch if one is given; if not, skips the branch body and continues after the `endif` statement.
+
+Restrictions:
+* not more than one `else` branch may be specified (`else if` syntax is not supported)
+
+Notes:
+* all `if` branches must be closed explicitly using `endif`
+* jumping out of a branch body may result in undefined behavior
+
+Example:
+
+    if cc.x && !cc.y
+        // do stuff
+    else
+        if b0
+            // do other stuff
+        endif
+    endif
+
+#### loop: Repeat code execution
+Syntax: `loop int_uniform`
+
+Operation:
+Initialize `lcnt` to `int_uniform.y`, then process code between `loop` and `endloop` for `int_uniform.x+1` iterations in total. After each iteration, `lcnt` is incremented by `int_uniform.z`.
+
+Restrictions:
+* no swizzle mask may be applied on the given uniform
+* there is no direct way of looping zero times (the easiest way is to use `break` with an extra boolean uniform)
+
+Notes:
+* `lcnt` can be used to dynamically index arrays, e.g. to implement vertex lighting with multiple light sources
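+
+A minimal sketch of such a loop, assuming `i0` holds the loop parameters, that the values to accumulate are stored in float uniforms indexed from 8 onwards, and that `b0` is a boolean uniform the host sets to skip the loop body (all three are hypothetical choices for illustration):
+
+    mov r0, c0                  // start value of the accumulator
+    loop i0                     // i0.x+1 iterations; lcnt starts at i0.y and grows by i0.z
+        break b0                // leave right away if b0 is set (the "zero iterations" workaround)
+        add r0, r0, c0[8+lcnt]  // dynamic indexing with the loop counter
+    endloop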
+
+#### break: Break out of current loop
+Syntax: `break condition`
+
+Operation:
+If `condition` is true, break out of the current loop.
+
+Restrictions:
+* jumping out of a branch body may result in undefined behavior
+
+#### jmp: Jump to code address
+Syntax: `jmp target_label if condition`
+
+Restrictions:
+* jumping out of or into branch bodies or loops may result in undefined behavior
+* there is no way to force a jump without specifying a condition
+
+Notes:
+* if you need to automatically return from a function, use `call` instead
+
+Example:
+
+    main:
+        jmp my_helper_code if b0
+        // if not b0, do other stuff here
+        nop
+        end
+
+    my_helper_code:
+        // do stuff
+        nop
+        end
+
+#### call: Jump to code address and return to caller
+Possible syntaxes:
+* `call target_label until return_label if condition`
+* `call target_label until return_label`
+
+Operation:
+If `condition` is true (or none is given), jumps to `target_label` and processes shader code there until `return_label` is hit, at which point code execution jumps back to the caller.
+
+Restrictions:
+* jumping out of or into branch bodies or loops may result in undefined behavior
+
+Notes:
+* if you don't need to automatically return from a function, use `jmp` instead
+
+Example:
+
+    main:
+        call my_helper_code until end_helper_code
+        nop
+        end
+
+    my_helper_code:
+        // do stuff here
+        nop
+    end_helper_code:
+
+## Special Purpose Instructions
+#### nop: No operation
+Syntax: `nop`
+
+Notes:
+* This may be necessary before using `end` to make sure all pending write operations have been completed
+
+#### end: Finish shader execution
+Syntax: `end`
+
+Operation:
+Stops shader execution.
--- a/externals/nihstro/docs/nihcode_spec.md
+++ b/externals/nihstro/docs/nihcode_spec.md
@@ -0,0 +1,130 @@
+# nihcode Specification
+
+Version 0.1.
+
+This page seeks to be a formal-ish specification of the input assembly language *nihcode* used by the nihstro shader assembler.
+
+## Version information
+This document is intended to give developers an idea of how things are expected to work. Please file issue reports for any deviations of nihstro's behavior from this specification that you find. Similarly, any ambiguities in the specification will be corrected if reported.
+
+## Structure
+nihcode is a sequence of statements, each of which must be put onto a separate line. There are five types of statements:
+* version information statements,
+* include statements,
+* alias declaration statements,
+* label declaration statements, and
+* instruction statements,
+
+each of which is described in its own section below. Additionally, C++-like comments may be inserted at any point and are started using the character sequences `//`, `#`, or `;`. A comment spans the rest of the line after any of these character sequences.
+
+A pseudo-code example of nihcode looks like this:
+
+    // First example shader
+    .version 0.1                  // version information
+    
+    .alias inpos v0               // alias declaration
+    .alias intex v1               // alias declaration
+    .alias pos o0    as position  // alias declaration
+    .alias tex o1.xy as texcoord0 // alias declaration
+
+    .include "utils.h"            // include utility functionality
+
+    main:                         // label declaration
+        mov o0, v0                // instruction
+        mov o1.xy, v1.xy          // instruction
+        nop                       // instruction
+        end                       // instruction
+
+
+## Shader Registers, builtin Identifiers, Swizzle Masks
+A shader can access a number of different registers with different purposes. *Input registers* expose the raw input vertex attribute data, while the output vertex attributes used for rendering are written to *output registers*. External programs can pass parameters to the shader by setting *uniforms*. Additionally, a number of *temporary registers* are free for any use. There are also special-purpose registers, namely the *address registers* and the *conditional code register*.
+
+Registers are referred to using *identifiers*. There are a number of builtin identifiers, each of which refers to one register. Note that most registers are vectors, i.e. they comprise multiple components, which are accessed using swizzle masks.
+* `v0`-`v15`: Input registers (read-only), four-component vectors
+* `r0`-`r15`: Temporary registers (read-write), four-component vectors
+* `c0`-`c95`: Float uniforms (read-only), four-component vectors
+* `i0`-`i3`:  Integer uniforms (read-only), four-component vectors
+* `b0`-`b15`: Boolean uniforms (read-only), scalar
+* `o0`-`o15`: Output registers (write-only), four-component vectors
+* `a0, a1, aL`: Address registers (used with MOVA and dynamic indexing), scalar
+* `cc`: Conditional code register (used with CMP and flow-control instructions), two-component vector
+
+For better readability, one can also define new identifiers, as explained below. Identifier names may only consist of lower- or uppercase letters (a-z, A-Z), underscores, and decimal digits (the latter two of which may not be used as the first character of the name). Additionally, an identifier may be followed by a swizzle mask, separated by the character `.` (e.g. `texcoord.zyx`). Swizzle masks allow for reordering, duplicating, and removing one or more vector components of the identified register (without actually modifying that register).
+
+When used with certain instructions, identifiers may be mentioned along with a sign, an array index, and/or a swizzle mask. Constructs like this are called *expressions*.
+
+The following names are reserved identifiers, and may not be used during declarations:
+* Any names starting with a `gl_` prefix
+* Any names starting with a `dmp_` prefix
+* Any names starting with an underscore prefix
+* Any of the instruction opcodes mentioned below
+
+## Aliases
+### Plain Aliases (any register)
+`.alias <new_identifier> <existing_identifier>{.<swizzle_mask>}`
+
+Declares a new identifier called `new_identifier` which will refer to the same register that `existing_identifier` refers to, applying `swizzle_mask` if specified. All subsequent uses of `new_identifier` are equivalent to using `existing_identifier.swizzle_mask`. Aliases of any register type may be created; note, however, that using output registers requires explicit assignment of an output semantic (see below).
+
+E.g. `.alias input_texture v2.xy`
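+
+With that declaration in place, the following two instructions should be interchangeable (a sketch; `r0` is an arbitrary temporary register):
+
+    mov r0.xy, input_texture
+    mov r0.xy, v2.xy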
+
+### Alias with Assignment of a Semantic (output registers)
+`.alias <new_identifier> <existing_identifier>{.<swizzle_mask>} as <semantic>`
+
+Declares an alias of `existing_identifier` with the name `new_identifier` and assigns the given semantic to the corresponding output register. An output semantic needs to be given to describe how the output vertex attribute is intended to be used after shader execution. `semantic` may be any of the strings `position`, `quaternion`, `color`, `texcoord0`, `texcoord1`, `texcoord2`, and `view`. If not all output register components are being written to, a swizzle mask should be used to denote the "active" components. Note that this swizzle mask may not reorder any components.
+
+E.g. `.alias output_texcoord o1.xy as texcoord0`
+
+### Constant Declarations (uniform registers)
+scalar constants: `.alias <new_identifier> <existing_identifier> as <value>`
+
+vector constants: `.alias <new_identifier> <existing_identifier> as (<x>, <y>{, <z>{, <w>}})`
+
+Declares an alias of `existing_identifier` with the name `new_identifier` and assigns the given default value to it. Default values are parsed by the ctrulib API and automatically applied when enabling a shader. The number of components in the given constant must match the number of components in the specified register.
+
+E.g. `.alias my_const c4 as (0.1, 3.2, -3.14, 0.0)`
+
+## Label Declarations
+`<labelname>:`
+
+Declares a new label with the name `labelname` at the given source line, which can be used in flow control operations. Label names follow the same conventions as identifiers and may not share the same name with an existing identifier.
+
+## Instruction Statements
+Writes the given opcode according to the given arguments to the shader binary. There are a lot of instructions, and each of them uses one of the following formats:
+
+Trivial operations:
+`<opcode>`
+Used by `else`, `emit`, `end`, `endif`, `endloop`, and `nop`.
+
+Arithmetic operations:
+`<opcode> <expression1>{, <expression2>{, <expression3>{, <expression4>}}}`
+Used by `add`, `dp3`, `dp4`, `dph`, `ex2`, `flr`, `lg2`, `mad`, `max`, `min`, `mov`, `mova`, `mul`, `rcp`, `rsq`, `sge` and `slt`. The number of required expressions as well as their meaning depends on the opcode.
+
+E.g. `mul o3.xyz, c4.xyz, v0.xyz`
+
+Compare operation:
+`cmp <expression1>, <expression2>, <op_x>, <op_y>`
+Used exclusively by `cmp`. `expression1` and `expression2` must evaluate to two-component float vectors. `op_x` and `op_y` specify comparison operations for the x and y components of the given expressions, respectively. They may be `==`, `!=`, `<`, `<=`, `>` or `>=`.
+
+E.g. `cmp c0.xy, r2.xy, <=, ==`
+
+Flow control operations:
+`<opcode> <condition>`
+Used by `break` and `if`. `loop` uses the same format but takes an integer uniform in place of the condition.
+
+`<opcode> {<target_label>} {until <return_label>} {if <condition>}`
+Used by `jmp` and `call`.
+
+`condition` may either be an identifier of a boolean uniform or a conditional expression. Examples of conditional expressions are `cc.x`, `!cc.x`, `!cc.xy`, `cc.x && !cc.y`, and `cc.x || cc.y`, where `{!}cc.xy` is equivalent to `{!}cc.x && {!}cc.y`. `target_label` and `return_label` must be label identifiers. Their meaning depends on the given opcode.
+
+For a full list of instructions, see the [instruction set reference](instruction_set.md). You may also want to refer to [3dbrew](http://3dbrew.org/wiki/Shader_Instruction_Set) for low-level documentation on each opcode. It is suggested that you take a look at the nihstro examples to get a better picture of how to apply that information.
+
+## Include Statements
+`.include "filename"`
+
+Replaces the `.include` line with the contents of the given file. The filename is taken to be relative to the file it was included from.
+
+## Version Information
+`.version number`
+
+This statement is a hint for the compiler to see which language specification the shader was written against. It may be used to toggle a compatibility assembling mode.
+
+E.g. `.version 0.1`