Merge pull request #718 from ucb-bar/hammer-docs

Generalizing VLSI Flows docs
2021-01-11 10:11:48 -08:00
parent d1d7bb8f52 06cee8fa42
commit 4caecb9a10
37 changed files with 465 additions and 76 deletions
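
Every reference this commit rewrites now uses a ``<document path>:<section title>`` target. That is the label format Sphinx's ``sphinx.ext.autosectionlabel`` extension produces when ``autosectionlabel_prefix_document = True`` is set in ``conf.py``. This commit view does not show ``conf.py``, so treat that setting as an assumption about the docs build. Prefixing labels with the document path stops sections that share a title on different pages from registering duplicate labels. The sketch below approximates how such a label is normalized. ``autosection_label`` is a hypothetical helper for illustration and is not part of Sphinx.

```python
def autosection_label(docname: str, section_title: str, prefix_document: bool = True) -> str:
    """Approximate the label sphinx.ext.autosectionlabel registers for a section.

    Hypothetical helper for illustration. ``docname`` is the source file path
    relative to the docs root, without the ``.rst`` suffix.
    """
    raw = f"{docname}:{section_title}" if prefix_document else section_title
    # Same normalization as docutils.nodes.fully_normalize_name:
    # lowercase, then collapse runs of whitespace into single spaces.
    return " ".join(raw.lower().split())


if __name__ == "__main__":
    # Section "Rocket Core" in docs/Generators/Rocket.rst
    print(autosection_label("Generators/Rocket", "Rocket Core"))
    # prints: generators/rocket:rocket core

    # Without the document prefix only the title remains, so equal titles collide
    print(autosection_label("Generators/Rocket", "Rocket Core", prefix_document=False))
    # prints: rocket core
```

The label is lowercased, and the ``:ref:`` role also lowercases its target before lookup. As a result, ``customization/Boot-Process:Chipyard Boot Process`` and ``Customization/Boot-Process:Chipyard Boot Process`` should resolve to the same section.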
--- a/docs/Advanced-Concepts/Chip-Communication.rst
+++ b/docs/Advanced-Concepts/Chip-Communication.rst
@@ -54,7 +54,7 @@ sends the TSI command recieved by the simulation stub into the DUT which then co
 command into a TileLink request. This conversion is done by the ``SerialAdapter`` module
 (located in the ``generators/testchipip`` project). In simulation, FESVR
 resets the DUT, writes into memory the test program, and indicates to the DUT to start the program
-through an interrupt (see :ref:`Chipyard Boot Process`). Using TSI is currently the fastest
+through an interrupt (see :ref:`customization/Boot-Process:Chipyard Boot Process`). Using TSI is currently the fastest
 mechanism to communicate with the DUT in simulation.

 In the case of a chip tapeout bringup, TSI commands can be sent over a custom communication
@@ -96,7 +96,7 @@ Starting the TSI or DMI Simulation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 All default Chipyard configurations use TSI to communicate between the simulation and the simulated SoC/DUT. Hence, when running a
-software RTL simulation, as is indicated in the :ref:`Software RTL Simulation` section, you are in-fact using TSI to communicate with the DUT. As a
+software RTL simulation, as described in the :ref:`Simulation/Software-RTL-Simulation:Software RTL Simulation` section, you are in fact using TSI to communicate with the DUT. As a
 reminder, to run a software RTL simulation, run:

 .. code-block:: bash
--- a/docs/Advanced-Concepts/Debugging-BOOM.rst
+++ b/docs/Advanced-Concepts/Debugging-BOOM.rst
@@ -1,8 +1,8 @@
 Debugging BOOM
 ======================

-In addition to the default debugging techniques specified in :ref:`Debugging RTL`,
-single-core BOOM designs can utilize the Dromajo co-simulator (see :ref:`Dromajo`)
+In addition to the default debugging techniques specified in :ref:`Advanced-Concepts/Debugging-RTL:Debugging RTL`,
+single-core BOOM designs can utilize the Dromajo co-simulator (see :ref:`Tools/Dromajo:Dromajo`)
 to verify functionality.

 .. warning:: Dromajo currently only works in single-core BOOM systems without accelerators.
--- a/docs/Advanced-Concepts/Top-Testharness.rst
+++ b/docs/Advanced-Concepts/Top-Testharness.rst
@@ -58,7 +58,7 @@ Tops

 A SoC Top then extends the ``System`` class with traits for custom components.
 In Chipyard, this includes things like adding a NIC, UART, and GPIO as well as setting up the hardware for the bringup method.
-Please refer to :ref:`Communicating with the DUT` for more information on these bringup methods.
+Please refer to :ref:`Advanced-Concepts/Chip-Communication:Communicating with the DUT` for more information on these bringup methods.

 TestHarness
 -------------------------
--- a/docs/Chipyard-Basics/Chipyard-Components.rst
+++ b/docs/Chipyard-Basics/Chipyard-Components.rst
@@ -14,15 +14,15 @@ Processor Cores

 **Rocket Core**
  An in-order RISC-V core.
-  See :ref:`Rocket Core` for more information.
+  See :ref:`Generators/Rocket:Rocket Core` for more information.

 **BOOM (Berkeley Out-of-Order Machine)**
  An out-of-order RISC-V core.
-  See :ref:`Berkeley Out-of-Order Machine (BOOM)` for more information.
+  See :ref:`Generators/BOOM:Berkeley Out-of-Order Machine (BOOM)` for more information.

 **CVA6 Core**
  An in-order RISC-V core written in System Verilog. Previously called Ariane.
-  See :ref:`CVA6 Core` for more information.
+  See :ref:`Generators/CVA6:CVA6 Core` for more information.

 Accelerators
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -31,7 +31,7 @@ Accelerators
  A decoupled vector architecture co-processor.
  Hwacha currently implements a non-standard RISC-V extension, using a vector architecture programming model.
  Hwacha integrates with a Rocket or BOOM core using the RoCC (Rocket Custom Co-processor) interface.
-  See :ref:`Hwacha` for more information.
+  See :ref:`Generators/Hwacha:Hwacha` for more information.

 **Gemmini**
  A matrix-multiply accelerator targeting neural networks
@@ -64,24 +64,24 @@ Tools
  A hardware description library embedded in Scala.
  Chisel is used to write RTL generators using meta-programming, by embedding hardware generation primitives in the Scala programming language.
  The Chisel compiler elaborates the generator into a FIRRTL output.
-  See :ref:`Chisel` for more information.
+  See :ref:`Tools/Chisel:Chisel` for more information.

 **FIRRTL**
  An intermediate representation library for RTL description of digital designs.
  FIRRTL is used as a formalized digital circuit representation between Chisel and Verilog.
  FIRRTL enables manipulation of digital circuits between Chisel elaboration and Verilog generation.
-  See :ref:`FIRRTL` for more information.
+  See :ref:`Tools/FIRRTL:FIRRTL` for more information.

 **Barstools**
  A collection of common FIRRTL transformations used to manipulate a digital circuit without changing the generator source RTL.
-  See :ref:`Barstools` for more information.
+  See :ref:`Tools/Barstools:Barstools` for more information.

 **Dsptools**
  A Chisel library for writing custom signal processing hardware, as well as integrating custom signal processing hardware into an SoC (especially a Rocket-based SoC).

 **Dromajo**
  An RV64GC emulator, originally developed by Esperanto Technologies, that is primarily used for co-simulation.
-  See :ref:`Dromajo` for more information.
+  See :ref:`Tools/Dromajo:Dromajo` for more information.

 Toolchains
 -------------------------------------------
@@ -109,12 +109,12 @@ Sims
 **Verilator**
  Verilator is an open source Verilog simulator.
  The ``verilator`` directory provides wrappers which construct Verilator-based simulators from relevant generated RTL, allowing for execution of test RISC-V programs on the simulator (including vcd waveform files).
-  See :ref:`Verilator (Open-Source)` for more information.
+  See :ref:`Simulation/Software-RTL-Simulation:Verilator (Open-Source)` for more information.

 **VCS**
  VCS is a proprietary Verilog simulator.
  Assuming the user has valid VCS licenses and installations, the ``vcs`` directory provides wrappers which construct VCS-based simulators from relevant generated RTL, allowing for execution of test RISC-V programs on the simulator (including vcd/vpd waveform files).
-  See :ref:`Synopsys VCS (License Required)` for more information.
+  See :ref:`Simulation/Software-RTL-Simulation:Synopsys VCS (License Required)` for more information.

 **FireSim**
  FireSim is an open-source FPGA-accelerated simulation platform, using Amazon Web Services (AWS) EC2 F1 instances on the public cloud.
@@ -122,7 +122,7 @@ Sims
  To model I/O, FireSim includes synthesizable and timing-accurate models for standard interfaces like DRAM, Ethernet, UART, and others.
  The use of the elastic public cloud enables FireSim to scale simulations up to thousands of nodes.
  In order to use FireSim, the repository must be cloned and executed on AWS instances.
-  See :ref:`FireSim` for more information.
+  See :ref:`Simulation/FPGA-Accelerated-Simulation:FireSim` for more information.

 Prototyping
 -------------------------------------------
@@ -130,8 +130,8 @@ Prototyping
 **FPGA Prototyping**
  FPGA prototyping is supported in Chipyard using SiFive's ``fpga-shells``.
  Some examples of FPGAs supported are the Xilinx Arty 35T and VCU118 boards.
-  For a fast and deterministic simulation with plenty of debugging tools, please consider using the :ref:`FireSim` platform.
-  See :ref:`Prototyping Flow` for more information on FPGA prototypes.
+  For a fast and deterministic simulation with plenty of debugging tools, please consider using the :ref:`Simulation/FPGA-Accelerated-Simulation:FireSim` platform.
+  See :ref:`Prototyping/index:Prototyping Flow` for more information on FPGA prototypes.

 VLSI
 -------------------------------------------
@@ -141,4 +141,4 @@ VLSI
  The HAMMER flow provides automated scripts which generate relevant tool commands based on a higher-level description of physical design constraints.
  The Hammer flow also allows for re-use of process technology knowledge by enabling the construction of process-technology-specific plug-ins, which describe particular constraints relating to that process technology (obsolete standard cells, metal layer routing constraints, etc.).
  The Hammer flow requires access to proprietary EDA tools and process technology libraries.
-  See :ref:`Core HAMMER` for more information.
+  See :ref:`VLSI/Hammer:Core HAMMER` for more information.
--- a/docs/Chipyard-Basics/Configs-Parameters-Mixins.rst
+++ b/docs/Chipyard-Basics/Configs-Parameters-Mixins.rst
@@ -17,7 +17,7 @@ Configs
 A *config* is a collection of multiple generator parameters being set to specific values.
 Configs are additive, can override each other, and can be composed of other configs (sometimes referred to as config fragments).
 The naming convention for an additive config or config fragment is ``With<YourConfigName>``, while the naming convention for a non-additive config will be ``<YourConfig>``.
-Configs can take arguments which will in-turn set parameters in the design or reference other parameters in the design (see :ref:`Parameters`).
+Configs can take arguments which will in turn set parameters in the design or reference other parameters in the design (see :ref:`Chipyard-Basics/Configs-Parameters-Mixins:Parameters`).

 This example shows a basic config fragment class that takes in zero arguments and instead uses hardcoded values to set the RTL design parameters.
 In this example, ``MyAcceleratorConfig`` is a Scala case class that defines a set of variables that the generator can use when referencing the ``MyAcceleratorKey`` in the design.
@@ -121,7 +121,7 @@ This is shown in the ``Top`` class where things such as ``CanHavePeripherySerial
 Additional References
 ---------------------------

-Another description of traits/mixins and config fragments is given in :ref:`Keys, Traits, and Configs`.
+Another description of traits/mixins and config fragments is given in :ref:`Customization/Keys-Traits-Configs:Keys, Traits, and Configs`.
 Additionally, a brief explanation of some of these topics (with slightly different naming) is given in the following video: https://www.youtube.com/watch?v=Eko86PGEoDY.

 .. Note:: Chipyard uses the name "config fragments" over "config mixins" to avoid confusion between a mixin applying to a config or to the system ``Top`` (even though both are technically Scala mixins).
--- a/docs/Chipyard-Basics/Initial-Repo-Setup.rst
+++ b/docs/Chipyard-Basics/Initial-Repo-Setup.rst
@@ -73,7 +73,7 @@ This depends on what you are planning to do with Chipyard.

 * If you intend to run a simulation of one of the vanilla Chipyard examples, go to :ref:`sw-rtl-sim-intro` and follow the instructions.

-* If you intend to run a simulation of a custom Chipyard SoC Configuration, go to :ref:`Simulating A Custom Project` and follow the instructions.
+* If you intend to run a simulation of a custom Chipyard SoC configuration, go to :ref:`Simulation/Software-RTL-Simulation:Simulating A Custom Project` and follow the instructions.

 * If you intend to run a full-system FireSim simulation, go to :ref:`firesim-sim-intro` and follow the instructions.

--- a/docs/Customization/Boot-Process.rst
+++ b/docs/Customization/Boot-Process.rst
@@ -34,7 +34,7 @@ FESVR is a program that runs on the host CPU and can read/write arbitrary
 parts of the target system memory using the Tethered Serial Interface (TSI).

 FESVR uses TSI to load a baremetal executable or second-stage bootloader into
-the SoC memory. In :ref:`Software RTL Simulation`, this will be the binary you
+the SoC memory. In :ref:`Simulation/Software-RTL-Simulation:Software RTL Simulation`, this will be the binary you
 pass to the simulator. Once it is finished loading the program, FESVR will
 write to the software interrupt register for CPU 0, which will bring CPU 0
 out of its WFI loop. Once it receives the interrupt, CPU 0 will write to
--- a/docs/Customization/DMA-Devices.rst
+++ b/docs/Customization/DMA-Devices.rst
@@ -22,7 +22,7 @@ that writes zeros to the memory at a configured address.

 We use ``TLHelper.makeClientNode`` to create a TileLink client node for us.
 We then connect the client node to the memory system through the front bus (fbus).
-For more info on creating TileLink client nodes, take a look at :ref:`Client Node`.
+For more info on creating TileLink client nodes, take a look at :ref:`TileLink-Diplomacy-Reference/NodeTypes:Client Node`.

 Once we've created our top-level module including the DMA widget, we can create a configuration for it as we did before.

--- a/docs/Customization/Firrtl-Transforms.rst
+++ b/docs/Customization/Firrtl-Transforms.rst
@@ -5,7 +5,7 @@ Adding a Firrtl Transform

 Similar to how LLVM IR passes can perform transformations and optimizations on software, FIRRTL transforms can
 modify Chisel-elaborated RTL.
-As mentioned in Section :ref:`firrtl`, transforms are modifications that happen on the FIRRTL IR that can modify a circuit.
+As mentioned in Section :ref:`Tools/FIRRTL:firrtl`, transforms are modifications that happen on the FIRRTL IR that can modify a circuit.
 Transforms are a powerful tool to take in the FIRRTL IR that is emitted from Chisel and run analysis or convert the circuit into a new form.

 Where to add transforms
@@ -24,7 +24,7 @@ If you look inside of the `tools/barstools/tapeout/src/main/scala/transforms/Gen
 you can see that FIRRTL is invoked twice, once for the "Top" and once for the "Harness". If you want to add transforms to just modify the DUT, you can add them to ``topTransforms``.
 Otherwise, if you want to add transforms to just modify the test harness, you can add them to ``harnessTransforms``.

-For more information on Barstools, please visit the :ref:`Barstools` section.
+For more information on Barstools, please visit the :ref:`Tools/Barstools:Barstools` section.

 Examples of transforms
 ----------------------
--- a/docs/Customization/Memory-Hierarchy.rst
+++ b/docs/Customization/Memory-Hierarchy.rst
@@ -13,7 +13,7 @@ if you use the ``WithNMedCores`` or ``WithNSmallCores`` configurations, you can
 configure 4 KiB direct-mapped caches for L1I and L1D.

 If you only want to change the size or associativity, there are config
-fragments for those too. See :ref:`Config Fragments` for how to add these to a custom ``Config``.
+fragments for those too. See :ref:`Customization/Keys-Traits-Configs:Config Fragments` for how to add these to a custom ``Config``.

 .. code-block:: scala

--- a/docs/Generators/Gemmini.rst
+++ b/docs/Generators/Gemmini.rst
@@ -5,7 +5,7 @@ The Gemmini project is developing a systolic-array based matrix multiplication u

 Gemmini is implemented as a RoCC accelerator with non-standard RISC-V custom instructions. The Gemmini unit uses the RoCC port of a Rocket or BOOM `tile`, and by default connects to the memory system through the System Bus (i.e., directly to the L2 cache).

-To add a Gemmini unit to an SoC, you should add the ``gemmini.DefaultGemminiConfig`` config fragment to the SoC configurations. To change the configuration of the Gemmini accelerator unit, you can write a custom configuration to replace the ``DefaultGemminiConfig``, which you can view under `generators/gemmini/src/main/scala/configs.scala <https://github.com/ucb-bar/gemmini/blob/master/src/main/scala/gemmini/configs.scala>`__ to see the possible configuration parameters.
+To add a Gemmini unit to an SoC, add the ``gemmini.DefaultGemminiConfig`` config fragment to the SoC configuration. To change the configuration of the Gemmini accelerator unit, write a custom configuration to replace ``DefaultGemminiConfig``; see `generators/gemmini/src/main/scala/gemmini/Configs.scala <https://github.com/ucb-bar/gemmini/blob/master/src/main/scala/gemmini/Configs.scala>`__ for the possible configuration parameters.

 The example Chipyard config includes the following example SoC configuration which includes Gemmini:

--- a/docs/Generators/IceNet.rst
+++ b/docs/Generators/IceNet.rst
@@ -8,11 +8,11 @@ A diagram of IceNet's microarchitecture is shown below.

 .. image:: ../_static/images/nic-design.png

-There are four basic parts of the NIC: the :ref:`Controller`, which takes requests
-from and sends responses to the CPU; the :ref:`Send Path`, which reads data from
-memory and sends it out to the network; the :ref:`Receive Path`, which receives
+There are four basic parts of the NIC: the :ref:`Generators/IceNet:Controller`, which takes requests
+from and sends responses to the CPU; the :ref:`Generators/IceNet:Send Path`, which reads data from
+memory and sends it out to the network; the :ref:`Generators/IceNet:Receive Path`, which receives
 data from the network and writes it to memory; and, optionally,
-the :ref:`Pause Handler`, which generates Ethernet pause frames for the purpose
+the :ref:`Generators/IceNet:Pause Handler`, which generates Ethernet pause frames for the purpose
 of flow control.

 Controller
@@ -78,7 +78,7 @@ Configuration
 To add IceNIC to your design, add ``HasPeripheryIceNIC`` to your lazy module
 and ``HasPeripheryIceNICModuleImp`` to the module implementation. If you
 are confused about the distinction between lazy module and module
-implementation, refer to :ref:`Cake Pattern / Mixin`.
+implementation, refer to :ref:`Chipyard-Basics/Configs-Parameters-Mixins:Cake Pattern / Mixin`.

 Then add the ``WithIceNIC`` config fragment to your configuration. This will
 define ``NICKey``, which IceNIC uses to determine its parameters. The config fragment
--- a/docs/Generators/Rocket-Chip.rst
+++ b/docs/Generators/Rocket-Chip.rst
@@ -30,7 +30,7 @@ The tiles connect to the ``SystemBus``, which connect it to the L2 cache banks.
 The L2 cache banks then connect to the ``MemoryBus``, which connects to the
 DRAM controller through a TileLink to AXI converter.

-To learn more about the memory hierarchy, see :ref:`Memory Hierarchy`.
+To learn more about the memory hierarchy, see :ref:`Customization/Memory-Hierarchy:Memory Hierarchy`.

 MMIO
 ----
--- a/docs/Generators/TestChipIP.rst
+++ b/docs/Generators/TestChipIP.rst
@@ -2,9 +2,9 @@ Test Chip IP
 ============

 Chipyard includes a Test Chip IP library which provides various hardware
-widgets that may be useful when designing SoCs. This includes a :ref:`Serial Adapter`,
-:ref:`Block Device Controller`, :ref:`TileLink SERDES`, :ref:`TileLink Switcher`,
-:ref:`TileLink Ring Network`, and :ref:`UART Adapter`.
+widgets that may be useful when designing SoCs. This includes a :ref:`Generators/TestChipIP:Serial Adapter`,
+:ref:`Generators/TestChipIP:Block Device Controller`, :ref:`Generators/TestChipIP:TileLink SERDES`, :ref:`Generators/TestChipIP:TileLink Switcher`,
+:ref:`Generators/TestChipIP:TileLink Ring Network`, and :ref:`Generators/TestChipIP:UART Adapter`.

 Serial Adapter
 --------------
@@ -14,7 +14,7 @@ processor. An instance of RISC-V frontend server running on the host CPU
 can send commands to the serial adapter to read and write data from the memory
 system. The frontend server uses this functionality to load the test program
 into memory and to poll for completion of the program. More information on
-this can be found in :ref:`Chipyard Boot Process`.
+this can be found in :ref:`Customization/Boot-Process:Chipyard Boot Process`.

 Block Device Controller
 -----------------------
@@ -69,7 +69,7 @@ to the TLXbar provided by RocketChip, but uses ring networks internally rather
 than crossbars. This can be useful for chips with very wide TileLink networks
 (many cores and L2 banks) that can sacrifice cross-section bandwidth to relieve
 wire routing congestion. Documentation on how to use the ring network can be
-found in :ref:`The System Bus`. The implementation itself can be found 
+found in :ref:`Customization/Memory-Hierarchy:The System Bus`. The implementation itself can be found
 `here <https://github.com/ucb-bar/testchipip/blob/master/src/main/scala/Ring.scala>`_,
 and may serve as an example of how to implement your own TileLink network with
 a different topology.
--- a/docs/Generators/index.rst
+++ b/docs/Generators/index.rst
@@ -4,7 +4,7 @@ Included RTL Generators
 ============================

 A Generator can be thought of as a generalized RTL design, written using a mix of meta-programming and standard RTL.
-This type of meta-programming is enabled by the Chisel hardware description language (see :ref:`Chisel`).
+This type of meta-programming is enabled by the Chisel hardware description language (see :ref:`Tools/Chisel:Chisel`).
 A standard RTL design is essentially just a single instance of a design coming from a generator.
 However, by using meta-programming and parameter systems, generators can allow for integration of complex hardware designs in automated ways.
 The following pages introduce the generators integrated with the Chipyard framework.
--- a/docs/Prototyping/General.rst
+++ b/docs/Prototyping/General.rst
@@ -18,7 +18,7 @@ Generating a Bitstream
 ----------------------

 Generating a bitstream for any FPGA target using Vivado is similar to building RTL for a software RTL simulation.
-Similar to a software RTL simulation (:ref:`Simulating A Custom Project`), you can run the following command in the ``fpga`` directory to build a bitstream using Vivado:
+Similar to a software RTL simulation (:ref:`Simulation/Software-RTL-Simulation:Simulating A Custom Project`), you can run the following command in the ``fpga`` directory to build a bitstream using Vivado:

 .. code-block:: shell

@@ -67,4 +67,4 @@ For example, running the bitstream build for an added ILA for a BOOM config.:

    make SUB_PROJECT=vcu118 CONFIG=BoomVCU118Config debug-bitstream

-.. IMPORTANT:: For more extensive debugging tools for FPGA simulations including printf synthesis, assert synthesis, instruction traces, ILAs, out-of-band profiling, co-simulation, and more, please refer to the :ref:`FireSim` platform.
+.. IMPORTANT:: For more extensive debugging tools for FPGA simulations including printf synthesis, assert synthesis, instruction traces, ILAs, out-of-band profiling, co-simulation, and more, please refer to the :ref:`Simulation/FPGA-Accelerated-Simulation:FireSim` platform.
--- a/docs/Prototyping/VCU118.rst
+++ b/docs/Prototyping/VCU118.rst
@@ -45,7 +45,7 @@ For ease of use, you can change the ``FPGAFrequencyKey`` to change the default c

 After the harness is created, the ``BundleBridgeSource``'s must be connected to the ``ChipTop`` IOs.
 This is done with harness binders and io binders (see ``fpga/src/main/scala/vcu118/HarnessBinders.scala`` and ``fpga/src/main/scala/vcu118/IOBinders.scala``).
-For more information on harness binders and io binders, refer to :ref:`IOBinders and HarnessBinders`.
+For more information on harness binders and io binders, refer to :ref:`Customization/IOBinders:IOBinders and HarnessBinders`.

 Introduction to the Bringup Platform
 ------------------------------------
@@ -57,4 +57,4 @@ The TSI Host Widget is used to interact with the DUT from the prototype over a S

 .. Note:: Remember that whenever a new test harness is created (or the config changes, or the config package changes, or ...), you need to modify the make invocation.
    For example, ``make SUB_PROJECT=vcu118 CONFIG=MyNewVCU118Config CONFIG_PACKAGE=this.is.my.scala.package bitstream``.
-    See :ref:`Generating a Bitstream` for information on the various make variables.
+    See :ref:`Prototyping/General:Generating a Bitstream` for information on the various make variables.
--- a/docs/TileLink-Diplomacy-Reference/NodeTypes.rst
+++ b/docs/TileLink-Diplomacy-Reference/NodeTypes.rst
@@ -59,7 +59,7 @@ TileLink messages.

 The ``edge`` object represents the edge of the Diplomacy graph. It contains
 some useful helper functions which will be documented in
-:ref:`TileLink Edge Object Methods`.
+:ref:`TileLink-Diplomacy-Reference/EdgeFunctions:TileLink Edge Object Methods`.

 Manager Node
 ------------
@@ -116,7 +116,7 @@ most MMIO peripherals should set it to.

 The next six arguments start with ``support`` and determine the different
 A channel message types that the manager can accept. The definitions of the
-message types are explained in :ref:`TileLink Edge Object Methods`.
+message types are explained in :ref:`TileLink-Diplomacy-Reference/EdgeFunctions:TileLink Edge Object Methods`.
 The ``TransferSizes`` case class specifies the range of logical sizes (in bytes)
 that the manager can accept for the particular message type. This is an inclusive
 range and all logical sizes must be powers of two. So in this case, the manager
@@ -137,7 +137,7 @@ to handle TileLink requests, it is usually much easier to use a register node.
 This type of node provides a ``regmap`` method that allows you to specify
 control/status registers and automatically generates the logic to handle the
 TileLink protocol. More information about how to use register nodes can be
-found in :ref:`Register Router`.
+found in :ref:`TileLink-Diplomacy-Reference/Register-Router:Register Router`.

 Identity Node
 -------------
@@ -176,7 +176,7 @@ If we want to connect the client and manager groups together, we can now do this
    :end-before: DOC include end: MyClientManagerComplex

 The meaning of the ``:=*`` operator is explained in more detail in the
-:ref:`Diplomacy Connectors` section. In summary, it connects two nodes together
+:ref:`TileLink-Diplomacy-Reference/Diplomacy-Connectors:Diplomacy Connectors` section. In summary, it connects two nodes together
 using multiple edges. The edges in the identity node are assigned in order,
 so in this case ``client1.node`` will eventually connect to ``manager1.node``
 and ``client2.node`` will connect to ``manager2.node``.
@@ -192,7 +192,7 @@ produces the same number of outputs. However, unlike the identity node, the
 adapter node does not simply pass the connections through unchanged.
 It can change the logical and physical interfaces between input and output and
 rewrite messages going through. RocketChip provides a library of adapters,
-which are catalogued in :ref:`Diplomatic Widgets`.
+which are catalogued in :ref:`TileLink-Diplomacy-Reference/Widgets:Diplomatic Widgets`.

 You will rarely need to create an adapter node yourself, but the invocation is
 as follows.
--- a/docs/TileLink-Diplomacy-Reference/Register-Router.rst
+++ b/docs/TileLink-Diplomacy-Reference/Register-Router.rst
@@ -32,7 +32,7 @@ The default value is 4 bytes. The ``concurrency`` argument is the size of the
 internal queue for TileLink requests. By default, this value is 0, which means
 there will be no queue. This value must be greater than 0 if you wish to
 decouple requests and responses for register accesses. This is discussed
-in :ref:`Using Functions`.
+in :ref:`TileLink-Diplomacy-Reference/Register-Router:Using Functions`.

 The main way to interact with the node is to call the ``regmap`` method, which
 takes a sequence of pairs. The first element of the pair is an offset from the
@@ -128,7 +128,7 @@ Register Routers for Other Protocols

 One useful feature of the register router interface is that you can easily
 change the protocol being used. For instance, in the first example in
-:ref:`Basic Usage`, you could simply change the ``TLRegisterNode`` to
+:ref:`TileLink-Diplomacy-Reference/Register-Router:Basic Usage`, you could simply change the ``TLRegisterNode`` to
 an ``AXI4RegisterNode``.

 .. literalinclude:: ../../generators/chipyard/src/main/scala/example/RegisterNodeExample.scala
--- a/docs/TileLink-Diplomacy-Reference/Widgets.rst
+++ b/docs/TileLink-Diplomacy-Reference/Widgets.rst
@@ -81,7 +81,7 @@ The arguments for the five-argument constructor are
 AXI4Buffer
 ----------

-Similar to the :ref:`TLBuffer`, but for AXI4. It also takes ``BufferParams`` objects
+Similar to the :ref:`TileLink-Diplomacy-Reference/Widgets:TLBuffer`, but for AXI4. It also takes ``BufferParams`` objects
 as arguments.

 **Arguments:**
@@ -200,7 +200,7 @@ transactions.
 AXI4Fragmenter
 --------------

-The AXI4Fragmenter is similar to the :ref:`TLFragmenter`, except it can only
+The AXI4Fragmenter is similar to the :ref:`TileLink-Diplomacy-Reference/Widgets:TLFragmenter`, except it can only
 break multi-beat AXI4 transactions into single-beat transactions. This
 effectively serves as an AXI4 to AXI4-Lite converter. The constructor for this
 widget does not take any arguments.
@@ -237,7 +237,7 @@ you will want to use a TLSourceShrinker.
 AXI4IdIndexer
 -------------

-The AXI4 equivalent of :ref:`TLSourceShrinker`. This limits the number of
+The AXI4 equivalent of :ref:`TileLink-Diplomacy-Reference/Widgets:TLSourceShrinker`. This limits the number of
 AWID/ARID bits in the slave AXI4 interface. Useful for connecting to external
 or black box AXI4 ports.

@@ -257,7 +257,7 @@ or black box AXI4 ports.

 The AXI4IdIndexer will create a ``user`` field on the slave interface, as it
 stores the ID of the master requests in this field. If connecting to an AXI4
-interface that doesn't have a ``user`` field, you'll need to use the :ref:`AXI4UserYanker`.
+interface that doesn't have a ``user`` field, you'll need to use the :ref:`TileLink-Diplomacy-Reference/Widgets:AXI4UserYanker`.

 TLWidthWidget
 -------------
@@ -301,7 +301,7 @@ The possible values of ``policy`` are:
   ordering guaranteed
 - ``TLFIFOFixer.allVolatile`` - All managers that have a RegionType of
   ``VOLATILE``, ``PUT_EFFECTS``, or ``GET_EFFECTS`` will have ordering
-   guaranteed (see :ref:`Manager Node` for explanation of region types).
+   guaranteed (see :ref:`TileLink-Diplomacy-Reference/NodeTypes:Manager Node` for explanation of region types).

 TLXbar and AXI4Xbar
 -------------------
@@ -377,14 +377,14 @@ override the default arguments of the constructors for these widgets.
        AXI4Fragmenter() :=
        axi4master.node

-You will need to add an :ref:`AXI4Deinterleaver` after the TLToAXI4 converter
+You will need to add an :ref:`TileLink-Diplomacy-Reference/Widgets:AXI4Deinterleaver` after the TLToAXI4 converter
 because it cannot deal with interleaved read responses. The TLToAXI4 converter
 also uses the AXI4 user field to store some information, so you will need an
-:ref:`AXI4UserYanker` if you want to connect to an AXI4 port without user
+:ref:`TileLink-Diplomacy-Reference/Widgets:AXI4UserYanker` if you want to connect to an AXI4 port without user
 fields.

 Before you connect an AXI4 port to the AXI4ToTL widget, you will need to
-add an :ref:`AXI4Fragmenter` and :ref:`AXI4UserYanker` because the converter cannot
+add an :ref:`TileLink-Diplomacy-Reference/Widgets:AXI4Fragmenter` and :ref:`TileLink-Diplomacy-Reference/Widgets:AXI4UserYanker` because the converter cannot
 deal with multi-beat transactions or user fields.

 TLROM
--- a/docs/Tools/Barstools.rst
+++ b/docs/Tools/Barstools.rst
@@ -23,15 +23,15 @@ An external module reference is a FIRRTL construct that enables a design to refe
 A list of unique SRAM configurations is output to a ``.conf`` file by FIRRTL, which is used to map technology SRAMs.
 Without this transform, FIRRTL will map all ``SeqMem`` s to flip-flop arrays with equivalent behavior, which may lead to a design that is difficult to route.

-The ``.conf`` file is consumed by a tool called MacroCompiler, which is part of the :ref:`Barstools` scala package.
+The ``.conf`` file is consumed by a tool called MacroCompiler, which is part of the :ref:`Tools/Barstools:Barstools` Scala package.
 MacroCompiler is also passed an ``.mdf`` file that describes the available list of technology SRAMs or the capabilities of the SRAM compiler, if one is provided by the foundry.
+Typically a foundry SRAM compiler will be able to generate collateral for a set of different SRAMs based on requirements such as size and aspect ratio (see :ref:`Tools/Barstools:SRAM MDF Fields`).
+Typically a foundry SRAM compiler will be able to generate collateral for a set of different SRAMs based on some requirements on size, aspect ratio, etc. (see :ref:`Tools/Barstools:SRAM MDF Fields`).
 Using a user-customizable cost function, MacroCompiler will select the SRAMs that are the best fit for each dimensionality in the ``.conf`` file.
 This may include over provisioning (e.g. using a 64x1024 SRAM for a requested 60x1024, if the latter is not available) or arraying.
 Arraying can be done in both width and depth, as well as to solve masking constraints.
 For example, a 128x2048 array could be composed of four 64x1024 arrays, with two macros in parallel to create two 128x1024 virtual SRAMs which are combinationally muxed to add depth.
 If this macro requires byte-granularity write masking, but no technology SRAMs support masking, then the tool may choose to use thirty-two 8x1024 arrays in a similar configuration.
-For information on writing ``.mdf`` files, look at `MDF on github <https://github.com/ucb-bar/plsi-mdf>`__ and a brief description in :ref:`SRAM MDF Fields` section.
+For information on writing ``.mdf`` files, look at `MDF on github <https://github.com/ucb-bar/plsi-mdf>`__ and the brief description in the :ref:`Tools/Barstools:SRAM MDF Fields` section.

 The output of MacroCompiler is a Verilog file containing modules that wrap the technology SRAMs into the specified interface names from the ``.conf``.
 If the technology supports an SRAM compiler, then MacroCompiler will also emit HammerIR that can be passed to Hammer to run the compiler itself and generate design collateral.
@@ -103,7 +103,7 @@ This is necessary to facilitate post-synthesis and post-place-and-route simulati
 Simulations after you the design goes through a VLSI flow will use the verilog netlist generated from the flow and will need an untouched test harness to drive it.
 Separating these components into separate files makes this straightforward.
 Without the separation the file that included the test harness would also redefine the DUT which is often disallowed in simulation tools.
-To do this, there is a FIRRTL ``App`` in :ref:`Barstools` called ``GenerateTopAndHarness``, which runs the appropriate transforms to elaborate the modules separately.
+To do this, there is a FIRRTL ``App`` in :ref:`Tools/Barstools:Barstools` called ``GenerateTopAndHarness``, which runs the appropriate transforms to elaborate the modules separately.
 This also renames modules in the test harness so that any modules that are instantiated in both the test harness and the chip are uniquified.

 .. Note:: For VLSI projects, this ``App`` is run instead of the normal FIRRTL ``App`` to elaborate Verilog.
@@ -131,5 +131,5 @@ This, unfortunately, breaks the process-agnostic RTL abstraction, so it is recom
 The simplest way to do this is to have a config fragment that when included updates instantiates the IO cells and connects them in the test harness.
 When simulating chip-specific designs, it is important to include the IO cells.
 The IO cell behavioral models will often assert if they are connected incorrectly, which is a useful runtime check.
-They also keep the IO interface at the chip and test harness boundary (see :ref:`Separating the Top module from the TestHarness module`) consistent after synthesis and place-and-route,
+They also keep the IO interface at the chip and test harness boundary (see :ref:`Tools/Barstools:Separating the Top module from the TestHarness module`) consistent after synthesis and place-and-route,
 which allows the RTL simulation test harness to be reused.
--- a/docs/Tools/Chisel-Testers.rst
+++ b/docs/Tools/Chisel-Testers.rst
@@ -4,4 +4,4 @@ Chisel Testers
 `Chisel Testers <https://github.com/freechipsproject/chisel-testers>`__ is a library for writing tests for Chisel designs.
 It provides a Scala API for interacting with a DUT.
 It can use multiple backends, including things such as Treadle and Verilator.
-See :ref:`Treadle and FIRRTL Interpreter` and :ref:`sw-rtl-sim-intro` for more information on these simulation methods.
+See :ref:`Tools/Treadle:Treadle and FIRRTL Interpreter` and :ref:`sw-rtl-sim-intro` for more information on these simulation methods.
--- a/docs/Tools/Chisel.rst
+++ b/docs/Tools/Chisel.rst
@@ -13,7 +13,7 @@ The Chisel generator starts elaboration using the module and configuration class
 This is where the Chisel "library functions" are called with the parameters given and Chisel tries to construct a circuit based on the Chisel code.
 If a runtime error happens here, Chisel is stating that it cannot "build" your circuit due to "violations" between your code and the Chisel "library".
 However, if that passes, the output of the generator gives you an FIRRTL file and other misc collateral!
-See :ref:`FIRRTL` for more information on how to get a FIRRTL file to Verilog.
+See :ref:`Tools/FIRRTL:FIRRTL` for more information on how to get a FIRRTL file to Verilog.

 For an interactive tutorial on how to use Chisel and get started please visit the `Chisel Bootcamp <https://github.com/freechipsproject/chisel-bootcamp>`__.
 Otherwise, for all things Chisel related including API documentation, news, etc, visit their `website <https://chisel-lang.org/>`__.
--- a/docs/Tools/Dromajo.rst
+++ b/docs/Tools/Dromajo.rst
@@ -19,4 +19,4 @@ An example of a divergence and Dromajo's printout is shown below.
 Dromajo shows the divergence compared to simulation (PC, inst, inst-bits, write data, etc) and also provides the register state on failure.
 It is useful to catch bugs that affect architectural state before a simulation hangs or crashes.

-To use Dromajo with BOOM, refer to :ref:`Debugging RTL` section on Dromajo.
+To use Dromajo with BOOM, refer to the Dromajo section of :ref:`Advanced-Concepts/Debugging-RTL:Debugging RTL`.
--- a/docs/VLSI/Basic-Flow.rst
+++ b/docs/VLSI/Basic-Flow.rst
@@ -0,0 +1,308 @@
+.. _hammer_basic_flow:
+
+Using Hammer To Place and Route a Custom Block
+=================================================
+
+.. IMPORTANT:: In order to use the Hammer VLSI flow, you need access to the Hammer tool and technology plugins. You can obtain these by emailing hammer-plugins-access@lists.berkeley.edu with a request stating which plugin(s) you would like access to. Make sure your email includes your GitHub ID and proof (through affiliation or otherwise) that you have licensed access to the relevant tools.
+
+Initialize the Hammer Plug-ins
+----------------------------------
+In the Chipyard root, run:
+
+.. code-block:: shell
+
+    ./scripts/init-vlsi.sh <tech-plugin-name>
+    
+This will pull the Hammer and CAD tool plugin submodules, assuming the technology plugins are available on GitHub.
+Currently only the asap7 technology plugin is available on GitHub.
+If you have additional private technology plugins (a typical use case for proprietary process technologies that require NDAs and secure servers), you can clone them directly
+into the ``vlsi`` directory with the name ``hammer-<tech-plugin-name>-plugin``.
+For example, for an imaginary process technology called tsmintel3:
+
+.. code-block:: shell
+
+    cd vlsi
+    git clone git@my-secure-server.berkeley.edu:tsmintel3/hammer-tsmintel3-plugin.git
+
+
+Next, load the Hammer environment into the shell:
+
+.. code-block:: shell
+
+    cd vlsi    # (if you haven't done so yet)
+    export HAMMER_HOME=$PWD/hammer
+    source $HAMMER_HOME/sourceme.sh
+
+
+.. Note:: Some VLSI EDA tools are supported only on RHEL-based operating systems. We recommend using Chipyard on RHEL7 and above. However, many VLSI servers still run older operating systems such as RHEL6, whose software packages are older than the basic Chipyard requirements. In order to build Chipyard on RHEL6, you will likely need to use tool packages such as devtoolset (for example, devtoolset-8) and/or build gcc, git, gmake, make, dtc, cc, bison, libexpat and liby from source.
+
+Setting up the Hammer Configuration Files
+--------------------------------------------
+
+The first configuration file that needs to be set up is the Hammer environment configuration file ``env.yml``. In this file you need to set the paths to the EDA tools and license servers you will be using. You do not have to fill in all the fields in this configuration file; you only need to fill in the paths for the tools that you will be using.
+If you are working within a shared server farm environment with an LSF cluster setup (for example, the Berkeley Wireless Research Center), see the additional environment configuration described in the :ref:`VLSI/Basic-Flow:Advanced Environment Setup` section of this page.
+
+Hammer relies on YAML-based configuration files. While these configurations can be consolidated within a single file (as is the case in the ASAP7 tutorial (:ref:`tutorial`) and the ``nangate45``
+OpenROAD example), the suggested way to work with an arbitrary process technology or tool plugins is to use three configuration files, matching the three Hammer concerns: tools, tech, and design.
+The ``vlsi`` directory includes three such example configuration files matching the three concerns: ``example-tools.yml``, ``example-tech.yml``, and ``example-design.yml``.
+
+The ``example-tools.yml`` file configures which EDA tools Hammer will use. This example file uses Cadence Innovus, Genus and Voltus, Synopsys VCS, and Mentor Calibre (which are likely the tools you will use if you're working in the Berkeley Wireless Research Center). Note that tool versions are highly sensitive to the process technology in use. Hence, tool versions that work with one process technology may not work with another (for example, ASAP7 will not work with an Innovus version newer than 18.1, while other proprietary process technologies will likely require newer versions such as 19.1).
+
+The ``example-design.yml`` file contains basic build system information (how many cores/threads to use, etc.), as well as configurations that are specific to the design we are working on such as clock signal name and frequency, power modes, floorplan, and additional constraints that we will add later on.
+
+Finally, the ``example-tech.yml`` file is a template for a process technology plugin configuration. We will copy this file and replace its fields with the appropriate process technology details for the tech plugin that we have access to. For example, for the ``asap7`` tech plugin we will replace the ``<tech_name>`` field with ``asap7`` and the node size ``N`` with ``7``, and set the path to the installation directory of the process technology files.
+
+We recommend copying these example configuration files and giving the copies different names, so that you can keep separate configuration files for different process technologies and designs (e.g. create ``tech-tsmintel3.yml`` from ``example-tech.yml``).
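+
+For example, a minimal sketch of this step for the imaginary tsmintel3 technology might look as follows. The names of the copies are only a suggested convention (hypothetical, not names that Hammer or the Makefile expect); the copies would then be passed through ``INPUT_CONFS`` in place of the ``example-*.yml`` names used in the commands on this page.
+
+.. code-block:: shell
+
+    cd vlsi
+    # Hypothetical per-technology and per-design copies of the example templates
+    cp example-tech.yml tech-tsmintel3.yml
+    cp example-tools.yml tools-tsmintel3.yml
+    cp example-design.yml design-gemmini.yml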
+
+
+Building the Design
+---------------------
+After setting up the configuration files, we elaborate our Chipyard Chisel design into Verilog, while also performing the transformations required to make the Verilog VLSI-friendly.
+Additionally, we automatically generate another set of Hammer configuration files matching this design, which will be used to configure the physical design tools.
+We do so by calling ``make buildfile`` with the appropriate Chipyard configuration variables and Hammer configuration files.
+As in the rest of the Chipyard flows, we specify our SoC configuration using the ``CONFIG`` make variable.
+However, unlike the rest of the Chipyard flows, in physical design we may want to work hierarchically, and therefore on a single module at a time.
+To support this, we can also specify a ``VLSI_TOP`` make variable with the name of a specific Verilog module (which should also match the name of the equivalent Chisel module) that we would like to work on.
+The makefile will automatically call tools such as Barstools and the MacroCompiler (:ref:`Tools/Barstools:Barstools`) in order to make the generated Verilog more VLSI-friendly.
+By default, the MacroCompiler will attempt to map memories into the SRAM options within the Hammer technology plugin. However, if you are working with a new process technology or prefer to work with flip-flop arrays, you can configure the MacroCompiler using the ``MACROCOMPILER_MODE`` make variable. For example, the ASAP7 process technology does not have associated SRAMs, and therefore the ASAP7 Hammer tutorial (:ref:`tutorial`) uses the ``MACROCOMPILER_MODE='--mode synflops'`` option. Note that synthesizing a design with only flip-flops is very slow, and the result often may not meet constraints.
+
+We call the ``make buildfile`` command while also specifying the name of the process technology we are working with (the same ``tech_name`` used for the configuration files and the plugin name) and the configuration files we created. Note that in the ASAP7 tutorial (:ref:`tutorial`) these configuration files are merged into a single file called ``example-asap7.yml``.
+
+Hence, if we want to monolithically place-and-route the entire SoC, the relevant command would be:
+
+.. code-block:: shell
+
+    make buildfile CONFIG=<chipyard_config_name> tech_name=<tech_name> INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
+In a more typical scenario of working on a single module, for example the Gemmini accelerator within the ``GemminiRocketConfig`` Chipyard SoC configuration, the relevant command would be:
+
+.. code-block:: shell
+
+    make buildfile CONFIG=GemminiRocketConfig VLSI_TOP=Gemmini tech_name=tsmintel3 INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
+Running the VLSI Flow
+---------------------
+
+Running a basic VLSI flow using the Hammer default configurations is fairly simple, and consists of simple ``make`` commands with the previously mentioned Make variables.
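+
+Since every step takes the same Make variables, it can be convenient to collect them once in the shell. The following is a sketch assuming a Bash shell and the Gemmini example used on this page; ``HAMMER_ARGS`` is a hypothetical shell variable, not something the Makefile itself reads.
+
+.. code-block:: shell
+
+    # Hypothetical Bash array holding the Make variables shared by all Hammer steps.
+    # INPUT_CONFS is quoted as one element so its space-separated file list stays together.
+    HAMMER_ARGS=(
+        CONFIG=GemminiRocketConfig
+        VLSI_TOP=Gemmini
+        tech_name=tsmintel3
+        "INPUT_CONFS=example-design.yml example-tools.yml example-tech.yml"
+    )
+
+    make syn "${HAMMER_ARGS[@]}"
+    make par "${HAMMER_ARGS[@]}"
+
+Expanding the array with ``"${HAMMER_ARGS[@]}"`` passes each element to ``make`` as a separate command-line variable assignment, so the quoting of ``INPUT_CONFS`` is preserved.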
+
+Synthesis
+^^^^^^^^^
+
+In order to run synthesis, we run ``make syn`` with the matching Make variables. 
+Post-synthesis logs and collateral will be saved in ``build/<config-name>/syn-rundir``. The raw QoR data (area, timing, gate counts, etc.) will be found in ``build/<config-name>/syn-rundir/reports``.
+
+Hence, if we want to monolithically synthesize the entire SoC, the relevant command would be:
+
+.. code-block:: shell
+
+    make syn CONFIG=<chipyard_config_name> tech_name=<tech_name> INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
+In a more typical scenario of working on a single module, for example the Gemmini accelerator within the ``GemminiRocketConfig`` Chipyard SoC configuration, the relevant command would be:
+
+.. code-block:: shell
+
+    make syn CONFIG=GemminiRocketConfig VLSI_TOP=Gemmini tech_name=tsmintel3 INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
+
+It is worth checking the ``final-qor.rpt`` report to make sure that the synthesized design meets timing before moving on to the place-and-route step.
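+
+For example, assuming the report is written to the ``reports`` directory mentioned above (replace ``<config-name>`` with your build directory name), it can be inspected with a pager:
+
+.. code-block:: shell
+
+    # Assumed location of the synthesis QoR summary inside the syn run directory
+    less build/<config-name>/syn-rundir/reports/final-qor.rpt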
+
+Place-and-Route
+^^^^^^^^^^^^^^^
+In order to run place-and-route, we run ``make par`` with the matching Make variables.
+Post-PnR logs and collateral will be saved in ``build/<config-name>/par-rundir``. Specifically, the resulting GDSII file will be in that directory with the suffix ``*.gds``, and timing reports can be found in ``build/<config-name>/par-rundir/timingReports``.
+Place-and-route requires more design details than synthesis. For example, place-and-route requires some basic floorplanning constraints. The default ``example-design.yml`` configuration file template allows the tool (specifically, Cadence Innovus) to use its automatic floorplanning capability within the top level of the design (``ChipTop``). However, if we choose to place-and-route a specific block which is not the SoC top level, we need to change the top-level path name to match the ``VLSI_TOP`` make parameter we are using.
+
+Hence, if we want to monolithically place-and-route the entire SoC with the default tech plugin parameters for power straps and corners, the relevant command would be:
+
+.. code-block:: shell
+
+    make par CONFIG=<chipyard_config_name> tech_name=<tech_name> INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
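+In a more typical scenario of working on a single module, for example the Gemmini accelerator within the ``GemminiRocketConfig`` Chipyard SoC configuration, we first need to change the top-level placement constraint in ``example-design.yml`` so that its path matches ``VLSI_TOP``:
+
+.. code-block:: yaml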
+
+  vlsi.inputs.placement_constraints:
+    - path: "Gemmini"
+      type: toplevel
+      x: 0
+      y: 0
+      width: 300
+      height: 300
+      margins:
+        left: 0
+        right: 0
+        top: 0
+        bottom: 0
+
+The relevant ``make`` command would then be:
+
+.. code-block:: shell
+
+    make par CONFIG=GemminiRocketConfig VLSI_TOP=Gemmini tech_name=tsmintel3 INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
+Note that the width and height specification can vary widely between different modules and levels of the module hierarchy. Make sure to set sane width and height values.
+Place-and-route generally requires more fine-grained input specifications regarding power nets, clock nets, pin assignments and floorplanning. While the template configuration files rely on automatic tool defaults, these will usually result in very poor QoR, and therefore it is recommended to specify better-informed floorplans, pin assignments and power nets. For more information about customizing these parameters, please refer to the :ref:`VLSI/Basic-Flow:Customizing Your VLSI Flow in Hammer` section or to the Hammer documentation.
+Additionally, some Hammer process technology plugins (for example, ASAP7) do not provide sufficient default values for required settings such as power nets and pin assignments. In those cases, these constraints will need to be specified manually in the top-level configuration YAML files, as is done in the ``example-asap7.yml`` configuration file.
+
+Place-and-route tools are very sensitive to process technologies (significantly more so than synthesis tools), and different process technologies may work only with specific tool versions. It is recommended to check which tool version is appropriate for the specific process technology you are working with.
+
+
+.. Note:: If you edit the YAML configuration files in between synthesis and place-and-route, the ``make par`` command will automatically re-run synthesis. If you would like to avoid that and are confident that your configuration file changes do not affect synthesis results, you may use ``make redo-par`` instead.
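+
+For example, a sketch of re-running only place-and-route for the Gemmini block, assuming ``redo-par`` accepts the same Make variables as ``par``:
+
+.. code-block:: shell
+
+    # Re-run PnR on the existing synthesis results (assumes redo-par takes the same variables as par)
+    make redo-par CONFIG=GemminiRocketConfig VLSI_TOP=Gemmini tech_name=tsmintel3 INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"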
+
+
+
+Power Estimation
+^^^^^^^^^^^^^^^^^^^^
+Power estimation in Hammer can be performed in one of two stages: post-synthesis (post-syn) or post-place-and-route (post-par). The most accurate power estimation is post-par, since it includes finer-grained details of the placed instances and wire lengths.
+Post-par power estimation can be based on static average signal toggle rates (also known as "static power estimation"), or on simulation-extracted signal toggle data (also known as "dynamic power estimation").
+
+.. Warning:: In order to run post-par power estimation, make sure that a power estimation tool (such as Cadence Voltus) has been defined in your ``example-tools.yml`` file. Make sure that the power estimation tool (for example, Cadence Voltus) version matches the physical design tool (for example, Cadence Innovus) version, otherwise you will encounter a database mismatch error.
+
+Simulation-extracted power estimation often requires a dedicated test harness for the block under evaluation (DUT). While the Hammer flow supports such configurations (further details can be found in the Hammer documentation), Chipyard's integrated flows support automated simulation-extracted post-par power estimation of the full digital SoC by integrating the software RTL simulation flows with the Hammer VLSI flow. As such, simulation-extracted power estimation of the full digital SoC can be performed by specifying a simple binary executable with the associated ``make`` command:
+
+.. code-block:: shell
+
+    make power-par BINARY=/path/to/baremetal/binary/rv64ui-p-addi.riscv CONFIG=<chipyard_config_name> tech_name=tsmintel3 INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
+
+The simulation-extracted power estimation flow implicitly uses Hammer's gate-level simulation flow (in order to generate the ``saif`` activity data file). This gate-level simulation flow can also be run independently of the power estimation flow using the ``make sim-par`` command.
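+
+As a sketch, and assuming ``sim-par`` takes the same ``BINARY`` and configuration variables as ``power-par``, a stand-alone gate-level simulation of the same test binary might be launched with:
+
+.. code-block:: shell
+
+    # Gate-level simulation only (no power analysis); variables assumed to match power-par
+    make sim-par BINARY=/path/to/baremetal/binary/rv64ui-p-addi.riscv CONFIG=<chipyard_config_name> tech_name=tsmintel3 INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"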
+
+
+.. Note:: The gate-level simulation flow (and therefore the simulation-extracted power estimation) is currently integrated only with the Synopsys VCS simulator. Verilator does not support gate-level simulation, and support for Cadence Incisive is a work in progress.
+
+
+Signoff
+^^^^^^^^^
+
+During chip tapeout, you will need to perform signoff checks to make sure the generated GDSII can be fabricated as intended. This is done using dedicated signoff tools that perform design rule checking (DRC) and layout versus schematic (LVS) verification.
+In most cases, placed-and-routed designs will not pass DRC and LVS on the first attempts due to nuanced design rules and subtle/silent failures of the place-and-route tools. Passing DRC and LVS will often require adding manual placement constraints to "force" the EDA tools into certain patterns.
+If you have placed-and-routed a design with the goal of getting area and power estimates, DRC and LVS are not strictly necessary, since the estimates will likely be quite similar either way. If you are intending to tape out and fabricate a chip, DRC and LVS are mandatory and will likely require multiple iterations of refining manual placement constraints.
+Having a large number of DRC/LVS violations can have a significant impact on the runtime of the place-and-route procedure (since the tools will try to fix each of them several times). A large number of DRC/LVS violations may also indicate that the design is not realistic for this particular process technology, which may have power/area implications.
+
+Since signoff checks are required only for a complete chip tapeout, they are currently not fully automated in Hammer, and often require some additional manual inclusion of custom Makefiles associated with specific process technologies. However, the general steps for running signoff within Hammer (assuming a fully automated tech plugin) are Make commands similar to the previous steps.
+
+In order to run DRC, the relevant ``make`` command is ``make drc``. As in the previous stages, the make command should be accompanied by the relevant configuration Make variables:
+
+.. code-block:: shell
+
+    make drc CONFIG=GemminiRocketConfig VLSI_TOP=Gemmini tech_name=tsmintel3 INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
+
+DRC does not emit easily audited reports, as the names of the violated rules can be quite esoteric. It is often more productive to use the scripts generated by Hammer to open the DRC error database within the appropriate tool. These generated scripts can be called from ``./build/<config-name>/drc-rundir/generated-scripts/view_drc``.
+
+
+In order to run LVS, the relevant ``make`` command is ``make lvs``. As in the previous stages, the make command should be accompanied by the relevant configuration Make variables:
+ 
+.. code-block:: shell
+
+    make lvs CONFIG=GemminiRocketConfig VLSI_TOP=Gemmini tech_name=tsmintel3 INPUT_CONFS="example-design.yml example-tools.yml example-tech.yml"
+
+LVS does not emit easily audited reports, as the violations are often cryptic when seen textually. As a result it is often more productive to visually see the LVS issues using the generated scripts that enable opening the LVS error database within the appropriate tool. These generated scripts can be called from ``./build/<config-name>/lvs-rundir/generated-scripts/view_lvs``.
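+
+For example, assuming the generated viewer scripts are executable and the relevant signoff tool is available in your environment, the DRC and LVS error databases could be opened with (replace ``<config-name>`` with your build directory name):
+
+.. code-block:: shell
+
+    # Open the DRC and LVS results in the signoff tool GUI via the Hammer-generated scripts
+    ./build/<config-name>/drc-rundir/generated-scripts/view_drc
+    ./build/<config-name>/lvs-rundir/generated-scripts/view_lvs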
+
+
+Customizing Your VLSI Flow in Hammer
+----------------------------------------
+
+Advanced Environment Setup
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you have access to a shared LSF cluster and you would like Hammer to submit its compute-intensive jobs to the LSF cluster rather than your login machine, you can add the following code segment to your ``env.yml`` file (filling in the relevant values for the bsub binary path, the number of CPUs requested, and the requested LSF queue):
+
+.. code-block:: yaml
+
+    #submit command (use LSF)
+    vlsi.submit:
+        command: "lsf"
+        settings: [{"lsf": {
+            "bsub_binary": "</path/to/bsub/binary/bsub>",
+            "num_cpus": <N>,
+            "queue": "<lsf_queue>",
+            "extra_args": ["-R", "span[hosts=1]"]
+            }
+        }]
+        settings_meta: "append"
+
+
+
+Composing a Hierarchical Design
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For large designs, a monolithic VLSI flow may take the EDA tools a very long time to process and optimize, to the extent that it is sometimes not feasible.
+Hammer supports a hierarchical physical design flow, which decomposes the design into several specified sub-components and runs the flow on each sub-component separately. Hammer is then able to assemble these blocks together into a top-level design. This hierarchical approach speeds up the VLSI flow for large designs, especially designs in which there may be multiple instantiations of the same sub-components (since the sub-component can simply be replicated in the layout).
+While hierarchical physical design can be performed in multiple ways (top-down, bottom-up, abutment etc.), Hammer currently supports only the bottom-up approach.
+The bottom-up approach traverses a tree representing the hierarchy starting from the leaves and towards the direction of the root (the "top level"), and runs the physical design flow on each node of the hierarchy tree using the previously laid-out children nodes.
+As nodes get closer to the root (or "top level") of the hierarchy, larger sections of the design get laid out.
+
+The Hammer hierarchical flow relies on a manually-specified description of the desired hierarchy tree. The specification of the hierarchy tree is defined based on the instance names in the generated Verilog, which sometimes makes this specification challenging due to inconsistent instance names. Additionally, the specification of the hierarchy tree is intertwined with the manual specification of a floorplan for the design.
+
+For example, if we choose to specify the previously mentioned ``GemminiRocketConfig`` configuration in a hierarchical fashion in which the Gemmini accelerator and the last-level cache are run separately from the top-level SoC, we would replace the floorplan example in ``example-design.yml`` from the :ref:`VLSI/Basic-Flow:Place-and-Route` section with the following specification:
+
+.. code-block:: yaml
+
+    vlsi.inputs.hierarchical.top_module: "ChipTop"
+    vlsi.inputs.hierarchical.mode: manual
+    vlsi.inputs.hierarchical.manual_modules:
+      - ChipTop:
+        - RocketTile
+        - InclusiveCache
+      - RocketTile:
+        - Gemmini
+    vlsi.inputs.hierarchical.manual_placement_constraints:
+      - ChipTop:
+        - path: "ChipTop"
+          type: toplevel
+          x: 0
+          y: 0
+          width: 500
+          height: 500
+          margins:
+            left: 0
+            right: 0
+            top: 0
+            bottom: 0
+      - RocketTile:
+        - path: "chiptop.system.tile_prci_domain.tile"
+          type: hierarchical
+          master: ChipTop
+          x: 0
+          y: 0
+          width: 250
+          height: 250
+          margins:
+            left: 0
+            right: 0
+            top: 0
+            bottom: 0
+      - Gemmini:
+        - path: "chiptop.system.tile_prci_domain.tile.gemmini"
+          type: hierarchical
+          master: RocketTile
+          x: 0
+          y: 0
+          width: 200
+          height: 200
+          margins:
+            left: 0
+            right: 0
+            top: 0
+            bottom: 0
+      - InclusiveCache:
+        - path: "chiptop.system.subsystem_l2_wrapper.l2"
+          type: hierarchical
+          master: ChipTop
+          x: 0
+          y: 0
+          width: 100
+          height: 100
+          margins:
+            left: 0
+            right: 0
+            top: 0
+            bottom: 0
+
+
+In this specification, ``vlsi.inputs.hierarchical.mode`` indicates the manual specification of the hierarchy tree (which is the only mode currently supported by Hammer), ``vlsi.inputs.hierarchical.top_module`` sets the root of the hierarchical tree, ``vlsi.inputs.hierarchical.manual_modules`` enumerates the tree of hierarchical modules, and ``vlsi.inputs.hierarchical.manual_placement_constraints`` enumerates the floorplan for each module.
+
+
+.. Specifying a Custom Floorplan
+.. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+Customizing Generated Tcl Scripts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The ``example-vlsi`` Python script is the Hammer entry script with placeholders for hooks. Hooks are additional snippets of Python and Tcl (via ``x.append()``) that extend the Hammer APIs. Hooks can be inserted using the ``make_pre/post/replacement_hook`` methods as shown in the ``example-vlsi`` entry script. In this particular example, a list of hooks is passed in the ``get_extra_par_hooks`` function in the ``ExampleDriver`` class. Refer to the `Hammer documentation on hooks <https://hammer-vlsi.readthedocs.io/en/latest/Hammer-Use/Hooks.html>`__ for a detailed description of how these are injected into the VLSI flow.
--- a/docs/VLSI/Building-A-Chip.rst
+++ b/docs/VLSI/Building-A-Chip.rst
@@ -10,7 +10,7 @@ Transforming the RTL
 --------------------

 Building a chip requires specializing the generic verilog emitted by FIRRTL to adhere to the constraints imposed by the technology used for fabrication.
-This includes mapping Chisel memories to available technology macros such as SRAMs, mapping the input and output of your chip to connect to technology IO cells, see :ref:`Barstools`.
+This includes mapping Chisel memories to available technology macros such as SRAMs, mapping the input and output of your chip to connect to technology IO cells, see :ref:`Tools/Barstools:Barstools`.
 In addition to these required transformations, it may also be beneficial to transform the RTL to make it more amenable to hierarchical physical design easier.
 This often includes modifying the logical hierarchy to match the physical hierarchy through grouping components together or flattening components into a single larger module.

@@ -49,6 +49,6 @@ Running the VLSI tool flow
 --------------------------

 For the full documentation on how to use the VLSI tool flow, see the `Hammer Documentation <https://hammer-vlsi.readthedocs.io/>`__.
-For an example of how to use the VLSI in the context of Chipyard, see :ref:`ASAP7 Tutorial`.
+For an example of how to use the VLSI in the context of Chipyard, see :ref:`VLSI/Tutorial:ASAP7 Tutorial`.


--- a/docs/VLSI/Tutorial.rst
+++ b/docs/VLSI/Tutorial.rst
@@ -30,7 +30,7 @@ This example gives a suggested file structure and build system. The ``vlsi/`` fo

  * Verilog wrapper around the accelerator and dummy hard macro.

-* example.yml
+* example-asap7.yml

  * Hammer IR for this tutorial.

@@ -77,7 +77,7 @@ Pull the Hammer environment into the shell:
    source $HAMMER_HOME/sourceme.sh

 Building the Design
-------------------
+--------------------
 To elaborate the ``Sha3RocketConfig`` (Rocket Chip w/ the accelerator) and set up all prerequisites for the build system to push just the accelerator + hard macro through the flow:

 .. code-block:: shell
--- a/docs/VLSI/index.rst
+++ b/docs/VLSI/index.rst
@@ -10,5 +10,6 @@ In particular, we aim to support the Hammer physical design generator flow.

   Building-A-Chip
   Hammer
+   Basic-Flow
   Tutorial
   Advanced-Usage
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -190,3 +190,6 @@ texinfo_documents = [
 intersphinx_mapping = {'python' : ('https://docs.python.org/', None),
                       'boom' : ('https://docs.boom-core.org/en/latest/', None),
                        'firesim' : ('http://docs.fires.im/en/latest/', None) }
+
+# resolve label conflict between documents
+autosectionlabel_prefix_document = True
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -11,7 +11,7 @@ Welcome to Chipyard's documentation!
 Chipyard is a framework for designing and evaluating full-system hardware using agile teams.
 It is composed of a collection of tools and libraries designed to provide an integration between open-source and commercial tools for the development of systems-on-chip.

-.. IMPORTANT:: **New to Chipyard?** Jump to the :ref:`Initial Repository Setup` page for setup instructions.
+.. IMPORTANT:: **New to Chipyard?** Jump to the :ref:`Chipyard-Basics/Initial-Repo-Setup:Initial Repository Setup` page for setup instructions.

 Getting Help
 ------------
--- a/scripts/ubuntu-req.sh
+++ b/scripts/ubuntu-req.sh
@@ -14,7 +14,7 @@ sudo apt-get install -y libexpat1-dev libusb-dev libncurses5-dev cmake
 # deps for poky
 sudo apt-get install -y python3.6 patch diffstat texi2html texinfo subversion chrpath git wget
 # deps for qemu
-sudo apt-get install -y libgtk-3-dev
+sudo apt-get install -y libgtk-3-dev gettext
 # deps for firemarshal
 sudo apt-get install -y python3-pip python3.6-dev rsync libguestfs-tools expat ctags
 # install DTC
--- a/vlsi/Makefile
+++ b/vlsi/Makefile
@@ -38,7 +38,7 @@ ENV_YML            ?= $(vlsi_dir)/env.yml
 INPUT_CONFS        ?= $(if $(filter $(tech_name),nangate45),\
                        example-nangate45.yml,\
                        example-asap7.yml)
-HAMMER_EXEC        ?= example-vlsi
+HAMMER_EXEC        ?= ./example-vlsi
 VLSI_TOP           ?= $(TOP)
 VLSI_HARNESS_DUT_NAME ?= chiptop
 VLSI_OBJ_DIR       ?= $(vlsi_dir)/build
--- a/vlsi/example-design.yml
+++ b/vlsi/example-design.yml
@@ -0,0 +1,37 @@
+# General Hammer Inputs Related to the Design and Build System
+
+# Generate a Make include file to aid in running the flow
+vlsi.core.build_system: make
+vlsi.core.max_threads: 12
+
+# Hammer will auto-generate a CPF for simple power designs; see hammer/src/hammer-vlsi/defaults.yml for more info
+vlsi.inputs.power_spec_mode: "auto"
+vlsi.inputs.power_spec_type: "cpf"
+
+# Specify clock signals
+vlsi.inputs.clocks: [
+  {name: "clock", period: "2ns", uncertainty: "0.1ns"}
+]
+
+# Specify pin properties
+# Default pin placement can be set by the tool
+# Default pin layer assignments can be found in some tech plug-ins
+vlsi.inputs.pin_mode: generated
+vlsi.inputs.pin.generate_mode: semi_auto
+
+# Specify the floorplan
+# Default floorplan can be set by the tool
+# The path name should match the VLSI_TOP makefile parameter if it is set
+par.innovus.floorplan_mode: "auto"
+vlsi.inputs.placement_constraints:
+  - path: "ChipTop"
+    type: toplevel
+    x: 0
+    y: 0
+    width: 300
+    height: 300
+    margins:
+      left: 0
+      right: 0
+      top: 0
+      bottom: 0
--- a/vlsi/example-tech.yml
+++ b/vlsi/example-tech.yml
@@ -0,0 +1,7 @@
+# Technology Setup
+vlsi.core.technology: <tech_name>
+vlsi.core.technology_path: ["hammer-<tech_name>-plugin"]
+vlsi.core.technology_path_meta: append
+
+# Technology files installation directory
+technology.<tech_name>.install_dir: "</path/to/technology/pdk/>"
--- a/vlsi/example-tools.yml
+++ b/vlsi/example-tools.yml
@@ -0,0 +1,33 @@
+# SRAM compiler options
+vlsi.core.sram_generator_tool: "sram_compiler"
+# You should specify a location for the SRAM generator in the tech plugin
+vlsi.core.sram_generator_tool_path: []
+vlsi.core.sram_generator_tool_path_meta: "append"
+
+# Tool options. Replace with your tool plugin of choice.
+# Genus options
+vlsi.core.synthesis_tool: "genus"
+vlsi.core.synthesis_tool_path: ["hammer-cadence-plugins/synthesis"]
+vlsi.core.synthesis_tool_path_meta: "append"
+synthesis.genus.version: "1813"
+# Innovus options
+vlsi.core.par_tool: "innovus"
+vlsi.core.par_tool_path: ["hammer-cadence-plugins/par"]
+vlsi.core.par_tool_path_meta: "append"
+par.innovus.version: "191_ISR3"
+par.innovus.design_flow_effort: "standard"
+par.inputs.gds_merge: true
+# Calibre options
+vlsi.core.drc_tool: "calibre"
+vlsi.core.drc_tool_path: ["hammer-mentor-plugins/drc"]
+vlsi.core.lvs_tool: "calibre"
+vlsi.core.lvs_tool_path: ["hammer-mentor-plugins/lvs"]
+# VCS options
+vlsi.core.sim_tool: "vcs"
+vlsi.core.sim_tool_path: ["hammer-synopsys-plugins/sim"]
+sim.vcs.version: "P-2019.06-SP2-5"
+# Voltus options
+vlsi.core.power_tool: "voltus"
+vlsi.core.power_tool_path: ["hammer-cadence-plugins/power"]
+vlsi.core.power_tool_path_meta: "append"
+power.voltus.version: "191_ISR3"
--- a/vlsi/hammer
+++ b/vlsi/hammer
--- a/vlsi/hammer-cadence-plugins
+++ b/vlsi/hammer-cadence-plugins