Merge branch 'dev' into patch-2

2021-04-06 09:57:26 +05:30
parent a7befc8c5c d411f2ac41
commit 78241c278b
47 changed files with 930 additions and 444 deletions
--- a/docs/Advanced-Concepts/Chip-Communication.rst
+++ b/docs/Advanced-Concepts/Chip-Communication.rst
@@ -7,7 +7,7 @@ There are two types of DUTs that can be made: `tethered` or `standalone` DUTs.
 A `tethered` DUT is where a host computer (or just host) must send transactions to the DUT to bringup a program.
 This differs from a `standalone` DUT that can bringup itself (has its own bootrom, loads programs itself, etc).
 An example of a tethered DUT is a Chipyard simulation where the host loads the test program into the DUTs memory and signals to the DUT that the program is ready to run.
-An example of a standalone DUT is a Chipyard simulation where a program can be loaded from an SDCard by default.
+An example of a standalone DUT is a Chipyard simulation where a program can be loaded from an SDCard out of reset.
 In this section, we mainly describe how to communicate to tethered DUTs.

 There are two ways the host (otherwise known as the outside world) can communicate with a tethered Chipyard DUT:
@@ -45,33 +45,21 @@ Using the Tethered Serial Interface (TSI)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 By default, Chipyard uses the Tethered Serial Interface (TSI) to communicate with the DUT.
-TSI protocol is an implementation of HTIF that is used to send commands to the
-RISC-V DUT. These TSI commands are simple R/W commands
-that are able to probe the DUT's memory space. During simulation, the host sends TSI commands to a
-simulation stub called ``SimSerial`` (C++ class) that resides in a ``SimSerial`` Verilog module
-(both are located in the ``generators/testchipip`` project). This ``SimSerial`` Verilog module then
-sends the TSI command recieved by the simulation stub into the DUT which then converts the TSI
-command into a TileLink request. This conversion is done by the ``SerialAdapter`` module
-(located in the ``generators/testchipip`` project). In simulation, FESVR
-resets the DUT, writes into memory the test program, and indicates to the DUT to start the program
-through an interrupt (see :ref:`customization/Boot-Process:Chipyard Boot Process`). Using TSI is currently the fastest
-mechanism to communicate with the DUT in simulation.
-
-In the case of a chip tapeout bringup, TSI commands can be sent over a custom communication
-medium to communicate with the chip. For example, some Berkeley tapeouts have a FPGA
-with a RISC-V soft-core that runs FESVR. The FESVR on the soft-core sends TSI commands
-to a TSI-to-TileLink converter living on the FPGA (i.e. ``SerialAdapter``). After the transaction is
-converted to TileLink, the ``TLSerdesser`` (located in ``generators/testchipip``) serializes the
-transaction and sends it to the chip (this ``TLSerdesser`` is sometimes also referred to as a
-serial-link or serdes). Once the serialized transaction is received on the
-chip, it is deserialized and masters a bus on the chip. The following image shows this flow:
-
-.. image:: ../_static/images/chip-bringup.png
-
-.. note::
-    The ``TLSerdesser`` can also be used as a slave (client), so it can sink memory requests from the chip
-    and connect to off-chip backing memory. Or in other words, ``TLSerdesser`` creates a bi-directional TileLink
-    interface.
+The TSI protocol is an implementation of HTIF that is used to send commands to the RISC-V DUT.
+These TSI commands are simple R/W commands that are able to access the DUT's memory space.
+During simulation, the host sends TSI commands to a simulation stub in the test harness called ``SimSerial``
+(C++ class) that resides in a ``SimSerial`` Verilog module (both are located in the ``generators/testchipip``
+project).
+This ``SimSerial`` Verilog module then sends the TSI command received by the simulation stub
+to an adapter that converts the TSI command into a TileLink request.
+This conversion is done by the ``SerialAdapter`` module (located in the ``generators/testchipip`` project).
+After the transaction is converted to TileLink, the ``TLSerdesser`` (located in ``generators/testchipip``) serializes the
+transaction and sends it to the chip (this ``TLSerdesser`` is sometimes also referred to as a digital serial-link or SerDes).
+Once the serialized transaction is received on the chip, it is deserialized and masters a TileLink bus on the chip
+which handles the request.
+In simulation, FESVR resets the DUT, writes the test program into memory, and signals the DUT to start the program
+through an interrupt (see :ref:`customization/Boot-Process:Chipyard Boot Process`).
+Using TSI is currently the fastest mechanism to communicate with the DUT in simulation (compared to DMI/JTAG) and is also used by FireSim.

 Using the Debug Module Interface (DMI)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -90,14 +78,14 @@ command into a TileLink request. This conversion is done by the DTM named ``Debu
 When the DTM receives the program to load, it starts to write the binary byte-wise into memory.
 This is considerably slower than the TSI protocol communication pipeline (i.e. ``SimSerial``/``SerialAdapter``/TileLink)
 which directly writes the program binary to memory.
-Thus, Chipyard removes the DTM by default in favor of the TSI protocol for DUT communication.

 Starting the TSI or DMI Simulation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-All default Chipyard configurations use TSI to communicate between the simulation and the simulated SoC/DUT. Hence, when running a
-software RTL simulation, as is indicated in the :ref:`simulation/Software-RTL-Simulation:Software RTL Simulation` section, you are in-fact using TSI to communicate with the DUT. As a
-reminder, to run a software RTL simulation, run:
+All default Chipyard configurations use TSI to communicate between the simulation and the simulated SoC/DUT.
+Hence, when running a software RTL simulation, as is indicated in the
+:ref:`simulation/Software-RTL-Simulation:Software RTL Simulation` section, you are in fact using TSI to communicate with the DUT.
+As a reminder, to run a software RTL simulation, run:

 .. code-block:: bash

@@ -105,11 +93,10 @@ reminder, to run a software RTL simulation, run:
   # or
   cd sims/vcs

-   make CONFIG=LargeBoomConfig run-asm-tests
+   make CONFIG=RocketConfig run-asm-tests

-FireSim FPGA-accelerated simulations use TSI by default as well.
-
-If you would like to build and simulate a Chipyard configuration with a DTM configured for DMI communication, then you must tie-off the TSI interface, and instantiate the `SimDTM`. Note that we use `WithTiedOffSerial ++ WithSimDebug` instead of `WithTiedOffDebug ++ WithSimSerial`.
+If you would like to build and simulate a Chipyard configuration with a DTM configured for DMI communication,
+then you must tie off the serial-link interface and instantiate the ``SimDTM``.

 .. literalinclude:: ../../generators/chipyard/src/main/scala/config/RocketConfigs.scala
    :language: scala
@@ -129,14 +116,110 @@ Then you can run simulations with the new DMI-enabled top-level and test-harness
 Using the JTAG Interface
 ------------------------

-The main way to use JTAG with a Rocket Chip based system is to instantiate the Debug Transfer Module (DTM)
-and configure it to use a JTAG interface. The default Chipyard designs instantiate the DTM and configure it
-to use JTAG. You may attach OpenOCD and GDB to any of the default JTAG-enabled designs.
+Another way to interface with the DUT is to use JTAG.
+Similar to the :ref:`Advanced-Concepts/Chip-Communication:Using the Debug Module Interface (DMI)` section, in order to use the JTAG protocol,
+the DUT needs to contain a Debug Transfer Module (DTM) configured to use JTAG instead of DMI.
+Once the JTAG port is exposed, the host can communicate over JTAG to the DUT through a simulation stub
+called ``SimJTAG`` (C++ class) that resides in a ``SimJTAG`` Verilog module (both reside in the ``generators/rocket-chip`` project).
+This simulation stub creates a socket that OpenOCD and GDB can connect to when the simulation is running.
+The default Chipyard designs instantiate the DTM configured to use JTAG (i.e. ``RocketConfig``).
+
+.. note::
+    As mentioned, default Chipyard designs are enabled with JTAG.
+    However, they also use TSI/Serialized-TL with FESVR in case the JTAG interface isn't used.
+    This allows users to choose how to communicate with the DUT (use TSI or JTAG).

 Debugging with JTAG
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~

-Please refer to the following resources on how to debug with JTAG.
+Roughly, the steps to debug with JTAG in simulation are as follows:

-* https://github.com/chipsalliance/rocket-chip#-debugging-with-gdb
-* https://github.com/riscv/riscv-isa-sim#debugging-with-gdb
+1. Build a Chipyard JTAG-enabled RTL design. Remember that default Chipyard designs are JTAG-ready.
+
+.. code-block:: bash
+
+    cd sims/verilator
+    # or
+    cd sims/vcs
+
+    make CONFIG=RocketConfig
+
+2. Run the simulation with remote bit-bang enabled. Since the binary will be loaded and run over JTAG,
+   pass ``none`` as the binary, which prevents FESVR from loading a program. (Adapted from: https://github.com/chipsalliance/rocket-chip#3-launch-the-emulator)
+
+.. code-block:: bash
+
+    # note: this uses Chipyard make invocation to run the simulation to properly wrap the simulation args
+    make CONFIG=RocketConfig BINARY=none SIM_FLAGS="+jtag_rbb_enable=1 --rbb-port=9823" run-binary
+
+3. `Follow the instructions here to connect to the simulation using OpenOCD + GDB. <https://github.com/chipsalliance/rocket-chip#4-launch-openocd>`__
+
+.. note::
+    This section was adapted from the instructions in Rocket Chip and riscv-isa-sim. For more information refer
+    to that documentation: `Rocket Chip GDB Docs <https://github.com/chipsalliance/rocket-chip#-debugging-with-gdb>`__,
+    `riscv-isa-sim GDB Docs <https://github.com/riscv/riscv-isa-sim#debugging-with-gdb>`__
+
+Example Test Chip Bringup Communication
+---------------------------------------
+
+Intro to Typical Chipyard Test Chip
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Most, if not all, Chipyard configurations are tethered using TSI (over a serial-link) and have access
+to external memory through an AXI port (backing AXI memory).
+The following image shows the DUT with this set of default signals:
+
+.. image:: ../_static/images/default-chipyard-config-communication.png
+
+In this setup, the serial-link is connected to the TSI/FESVR peripherals while the AXI port is connected
+to a simulated AXI memory.
+However, AXI ports tend to have many signals, and thus many wires, associated with them. Instead of creating an AXI port off the DUT,
+one can send the memory transactions over the bi-directional serial-link (``TLSerdesser``) so that the main
+interface to the DUT is the serial-link (which has comparatively fewer signals than an AXI port).
+This new setup (shown below) is a typical Chipyard test chip setup:
+
+.. image:: ../_static/images/bringup-chipyard-config-communication.png
+
+Simulation Setup of the Example Test Chip
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To test this type of configuration (TSI/memory transactions over the serial-link), most of the same TSI collateral
+would be used.
+The main difference is that the TileLink-to-AXI converters and simulated AXI memory reside on the other side of the
+serial-link.
+
+.. image:: ../_static/images/chip-bringup-simulation.png
+
+.. note::
+    Here the simulated AXI memory and the converters can be in a different clock domain in the test harness
+    than the reference clock of the DUT.
+    For example, the DUT can be clocked at 3.2GHz while the simulated AXI memory can be clocked at 1GHz.
+    This functionality is implemented in the harness binder that instantiates the TSI collateral, TL-to-AXI converters,
+    and simulated AXI memory.
+    See :ref:`Advanced-Concepts/Harness-Clocks:Creating Clocks in the Test Harness` on how to generate a clock
+    in a harness binder.
+
+This type of simulation setup is done in the following multi-clock configuration:
+
+.. literalinclude:: ../../generators/chipyard/src/main/scala/config/RocketConfigs.scala
+    :language: scala
+    :start-after: DOC include start: MulticlockAXIOverSerialConfig
+    :end-before: DOC include end: MulticlockAXIOverSerialConfig
+
+Bringup Setup of the Example Test Chip after Tapeout
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Assuming this example test chip is taped out and now ready to be tested, we can communicate with the chip using this serial-link.
+For example, a common test setup used at Berkeley to evaluate Chipyard-based test-chips includes an FPGA running a RISC-V soft-core that is able to speak to the DUT (over an FMC).
+This RISC-V soft-core would serve as the host of the test that will run on the DUT.
+This is done by the RISC-V soft-core running FESVR, sending TSI commands to a ``SerialAdapter`` / ``TLSerdesser`` programmed on the FPGA.
+Once the commands are converted to serialized TileLink, then they can be sent over some medium to the DUT
+(like an FMC cable or a set of wires connecting FPGA outputs to the DUT board).
+Similar to simulation, if the chip requests off-chip memory, it can then send the transaction back over the serial-link.
+Then the request can be serviced by the FPGA DRAM.
+The following image shows this flow:
+
+.. image:: ../_static/images/chip-bringup.png
+
+In fact, this exact type of bringup setup is what the following section discusses:
+:ref:`Prototyping/VCU118:Introduction to the Bringup Platform`.
--- a/docs/Advanced-Concepts/Harness-Clocks.rst
+++ b/docs/Advanced-Concepts/Harness-Clocks.rst
@@ -0,0 +1,38 @@
+.. _harness-clocks:
+
+Creating Clocks in the Test Harness
+===================================
+
+Chipyard currently allows the SoC design (everything under ``ChipTop``) to
+have independent clock domains through diplomacy.
+This implies that some reference clock enters the ``ChipTop`` and then is divided down into
+separate clock domains.
+From the perspective of the ``TestHarness`` module, the ``ChipTop`` clock and reset are
+provided by a clock and reset called ``buildtopClock`` and ``buildtopReset``.
+In the default case, ``buildtopClock`` and ``buildtopReset`` are directly wired to the
+clock and reset IOs of the ``TestHarness`` module.
+However, the ``TestHarness`` has the ability to generate a standalone clock and reset signal
+that is separate from the reference clock/reset of ``ChipTop``.
+This gives harness components (including harness binders) the ability to "request" a clock
+for a new clock domain.
+This is useful for simulating systems in which modules in the harness have independent clock domains
+from the DUT.
+
+Requests for a harness clock are handled by the ``HarnessClockInstantiator`` class in ``generators/chipyard/src/main/scala/TestHarness.scala``.
+This class is accessed in harness components by referencing the Rocket Chip parameters key ``p(HarnessClockInstantiatorKey)``.
+Then you can request a clock and synchronized reset at a particular frequency by invoking the ``requestClockBundle`` function.
+Take the following example:
+
+.. literalinclude:: ../../generators/chipyard/src/main/scala/HarnessBinders.scala
+    :language: scala
+    :start-after: DOC include start: HarnessClockInstantiatorEx
+    :end-before: DOC include end: HarnessClockInstantiatorEx
+
+Here you can see the ``p(HarnessClockInstantiatorKey)`` is used to request a clock and reset at ``memFreq`` frequency.
+
+.. note::
+    In the case that the reference clock entering ``ChipTop`` is not the overall reference clock of the simulation
+    (i.e. the clock/reset coming into the ``TestHarness`` module), the ``buildtopClock`` and ``buildtopReset`` can
+    differ from the implicit ``TestHarness`` clock and reset. For example, if the ``ChipTop`` reference is 500MHz but an
+    extra harness clock is requested at 1GHz, the ``TestHarness`` implicit clock/reset will be at 1GHz while the ``buildtopClock``
+    and ``buildtopReset`` will be at 500MHz.
--- a/docs/Advanced-Concepts/index.rst
+++ b/docs/Advanced-Concepts/index.rst
@@ -14,4 +14,5 @@ They expect you to know about Chisel, Parameters, configs, etc.
   Debugging-BOOM
   Resources
   CDEs
+   Harness-Clocks

--- a/docs/Chipyard-Basics/Initial-Repo-Setup.rst
+++ b/docs/Chipyard-Basics/Initial-Repo-Setup.rst
@@ -66,6 +66,22 @@ You can put this in your ``.bashrc`` or equivalent environment setup file to get

 These variables need to be set for the ``make`` system to work properly.

+Pre-built Docker Image
+-------------------------------------------
+
+An alternative to setting up the Chipyard repository locally is to pull the pre-built Docker image from Docker Hub. The image comes with all dependencies installed, Chipyard cloned, and toolchains initialized. This image sets up baseline Chipyard (not including FireMarshal, FireSim, and Hammer initializations). Each image comes with a tag that corresponds to the version of Chipyard cloned/set-up in that image. Not including a tag during the pull will pull the image with the latest version of Chipyard.
+First, pull the Docker image. Run:
+
+.. code-block:: shell
+
+    sudo docker pull ucbbar/chipyard-image:<TAG>
+
+To run the Docker container in an interactive shell, run:
+
+.. code-block:: shell
+
+    sudo docker run -it ucbbar/chipyard-image:<TAG> bash
+
 What's Next?
 -------------------------------------------

--- a/docs/Customization/Heterogeneous-SoCs.rst
+++ b/docs/Customization/Heterogeneous-SoCs.rst
@@ -17,8 +17,8 @@ The following example shows a dual core BOOM with a single core Rocket.

 .. literalinclude:: ../../generators/chipyard/src/main/scala/config/HeteroConfigs.scala
    :language: scala
-    :start-after: DOC include start: DualBoomAndRocket
-    :end-before: DOC include end: DualBoomAndRocket
+    :start-after: DOC include start: DualBoomAndSingleRocket
+    :end-before: DOC include end: DualBoomAndSingleRocket


 Adding Hwachas
@@ -48,7 +48,7 @@ An example is shown below with two BOOM cores, and one Rocket tile with a RoCC a
    :start-after: DOC include start: DualBoomAndRocketOneHwacha
    :end-before: DOC include end: DualBoomAndRocketOneHwacha

-The ``WithMultiRoCCHwacha`` config fragment assigns a Hwacha accelerator to a particular ``hartId`` (in this case, the ``hartId`` of ``2`` corresponds to the Rocket core).
+The ``WithMultiRoCCHwacha`` config fragment assigns a Hwacha accelerator to a particular ``hartId`` (in this case, the ``hartId`` of ``0`` corresponds to the Rocket core).
 Finally, the ``WithMultiRoCC`` config fragment is called.
 This config fragment sets the ``BuildRoCC`` key to use the ``MultiRoCCKey`` instead of the default.
 This must be used after all the RoCC parameters are set because it needs to override the ``BuildRoCC`` parameter.
@@ -56,6 +56,29 @@ If this is used earlier in the configuration sequence, then MultiRoCC does not w

 This config fragment can be changed to put more accelerators on more cores by changing the arguments to cover more ``hartId``'s (i.e. ``WithMultiRoCCHwacha(0,1,3,6,...)``).

+Since config fragments are applied from right-to-left (or bottom-to-top as they are formatted here), the right-most config fragment specifying a core (which is ``freechips.rocketchip.subsystem.WithNBigCores`` in the example above) gets the first hart ID.
+Consider this config:
+
+.. code-block:: scala
+
+    class RocketThenBoomHartIdTestConfig extends Config(
+      new boom.common.WithNLargeBooms(2) ++
+      new freechips.rocketchip.subsystem.WithNBigCores(3) ++
+      new chipyard.config.AbstractConfig)
+
+This specifies an SoC with three Rocket cores and two BOOM cores.
+The Rocket cores would have hart IDs 0, 1, and 2, while the BOOM cores would have hart IDs 3 and 4.
+On the other hand, consider this config which reverses the order of those two fragments:
+
+.. code-block:: scala
+
+    class BoomThenRocketHartIdTestConfig extends Config(
+      new freechips.rocketchip.subsystem.WithNBigCores(3) ++
+      new boom.common.WithNLargeBooms(2) ++
+      new chipyard.config.AbstractConfig)
+
+This also specifies an SoC with three Rocket cores and two BOOM cores, but because the BOOM config fragment is evaluated before the Rocket config fragment, the hart IDs are reversed.
+The BOOM cores would have hart IDs 0 and 1, while the Rocket cores would have hart IDs 2, 3, and 4.
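+
+The ordering rule can also be illustrated with a toy model.
+The following is a minimal sketch in plain Scala, not Chipyard or Rocket Chip code: ``CoreFragment`` and ``assignHartIds`` are hypothetical names introduced only for illustration, and the model assumes nothing beyond the rule stated above (fragments are applied right-to-left, so the right-most core fragment receives the first hart IDs).
+
+.. code-block:: scala
+
+    // Toy model only: CoreFragment and assignHartIds are hypothetical names,
+    // not part of Chipyard or Rocket Chip.
+    object HartIdOrderSketch {
+      // One core-adding config fragment: a label and the number of cores it adds.
+      final case class CoreFragment(kind: String, count: Int)
+
+      // `written` lists the fragments left-to-right, as they appear in the Config.
+      // They are applied right-to-left, so reverse the list before numbering.
+      def assignHartIds(written: Seq[CoreFragment]): Seq[(Int, String)] =
+        written.reverse
+          .flatMap(f => Seq.fill(f.count)(f.kind))
+          .zipWithIndex
+          .map { case (kind, id) => (id, kind) }
+
+      def main(args: Array[String]): Unit = {
+        // Mirrors RocketThenBoomHartIdTestConfig: Rocket gets 0-2, BOOM gets 3-4.
+        println(assignHartIds(Seq(CoreFragment("BOOM", 2), CoreFragment("Rocket", 3))))
+        // Mirrors BoomThenRocketHartIdTestConfig: BOOM gets 0-1, Rocket gets 2-4.
+        println(assignHartIds(Seq(CoreFragment("Rocket", 3), CoreFragment("BOOM", 2))))
+      }
+    }
+
+Swapping the two fragments in the input sequence swaps which core type receives the low hart IDs, which matches the two configs described above.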

 .. [1] Note, in this section "core" and "tile" are used interchangeably but there is subtle distinction between a "core" and "tile" ("tile" contains a "core", L1D/I$, PTW).
    For many places in the documentation, we usually use "core" to mean "tile" (doesn't make a large difference but worth the mention).
--- a/docs/Tools/Barstools.rst
+++ b/docs/Tools/Barstools.rst
@@ -102,7 +102,7 @@ Separating the Top module from the TestHarness module

 Unlike the FireSim and Software simulation flows, a VLSI flow needs to separate the test harness and the chip (a.k.a. DUT) into separate files.
 This is necessary to facilitate post-synthesis and post-place-and-route simulation, as the module names in the RTL and gate-level verilog files would collide.
-Simulations after you the design goes through a VLSI flow will use the verilog netlist generated from the flow and will need an untouched test harness to drive it.
+After your design goes through a VLSI flow, simulations will use the Verilog netlist generated by the flow and will need an untouched test harness to drive it.
 Separating these components into separate files makes this straightforward.
 Without the separation the file that included the test harness would also redefine the DUT which is often disallowed in simulation tools.
 To do this, there is a FIRRTL ``App`` in :ref:`Tools/Barstools:Barstools` called ``GenerateTopAndHarness``, which runs the appropriate transforms to elaborate the modules separately.
--- a/docs/_static/images/bringup-chipyard-config-communication.png
+++ b/docs/_static/images/bringup-chipyard-config-communication.png
--- a/docs/_static/images/chip-bringup-simulation.png
+++ b/docs/_static/images/chip-bringup-simulation.png
--- a/docs/_static/images/chip-bringup.png
+++ b/docs/_static/images/chip-bringup.png
--- a/docs/_static/images/chip-communication.png
+++ b/docs/_static/images/chip-communication.png
--- a/docs/_static/images/default-chipyard-config-communication.png
+++ b/docs/_static/images/default-chipyard-config-communication.png