Merge branch 'main' of https://github.com/ucb-bar/chipyard into klayout-docs

This commit is contained in:
Nayiri K
2023-06-15 11:00:53 -07:00
25 changed files with 453 additions and 92 deletions

View File

@@ -1,10 +1,15 @@
.. _rocc-accelerators:
Adding a RoCC Accelerator
----------------------------
-------------------------
RoCC accelerators are lazy modules that extend the ``LazyRoCC`` class.
Their implementation should extends the ``LazyRoCCModule`` class.
A RoCC accelerator is a component that can be added into a particular Rocket or BooM tile.
It receives instructions that match a certain opcode, talks to other parts of the core or SoC (L1, L2, PTW, FPU), and then optionally writes back a value into the register corresponding with the ``rd`` field of the instruction.
RoCC accelerators are instantiated via modules that extend the ``LazyRoCC`` class.
These modules lazily instantiate another module which extends the ``LazyRoCCModule`` class.
This extra layer of indirection is used so that Diplomacy can figure out how to connect the RoCC module to the chip, without needing to instantiate the module ahead of time.
Lazy modules are further explained in the :ref:`Chipyard-Basics/Configs-Parameters-Mixins:Cake Pattern / Mixin` section.
Below is a minimal instantiation of a RoCC accelerator.
.. code-block:: scala
@@ -31,7 +36,6 @@ Their implementation should extends the ``LazyRoCCModule`` class.
...
}
The ``opcodes`` parameter for ``LazyRoCC`` is the set of custom opcodes that will map to this accelerator.
More on this in the next subsection.
@@ -46,6 +50,47 @@ the ``busy`` signal, which indicates when the accelerator is still handling an i
and the ``interrupt`` signal, which can be used to interrupt the CPU.
Look at the examples in ``generators/rocket-chip/src/main/scala/tile/LazyRoCC.scala`` for detailed information on the different IOs.
There is also more information about each of the signals in `the RoCC Documentation written by UCSD <https://docs.google.com/document/d/1CH2ep4YcL_ojsa3BVHEW-uwcKh1FlFTjH_kg5v8bxVw/edit>`_, although it is updated out of tree and may be out of date.
Accessing Memory via L1 Cache
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A RoCC accelerator can access memory through the L1 Cache of the core it is attached to.
This is a simpler interface for accelerator architects to implement, but will generally have lower achievable throughput than a dedicated TileLink port.
In your ``LazyRoCCModuleImp``, the signal ``io.mem`` is a ``HellaCacheIO``, which is defined in ``generators/rocket-chip/src/main/scala/rocket/HellaCache.scala``.
.. code-block:: scala
class HellaCacheIO(implicit p: Parameters) extends CoreBundle()(p) {
val req = Decoupled(new HellaCacheReq)
val s1_kill = Output(Bool()) // kill previous cycle's req
val s1_data = Output(new HellaCacheWriteData()) // data for previous cycle's req
val s2_nack = Input(Bool()) // req from two cycles ago is rejected
val s2_nack_cause_raw = Input(Bool()) // reason for nack is store-load RAW hazard (performance hint)
val s2_kill = Output(Bool()) // kill req from two cycles ago
val s2_uncached = Input(Bool()) // advisory signal that the access is MMIO
val s2_paddr = Input(UInt(paddrBits.W)) // translated address
val resp = Flipped(Valid(new HellaCacheResp))
val replay_next = Input(Bool())
val s2_xcpt = Input(new HellaCacheExceptions)
val s2_gpa = Input(UInt(vaddrBitsExtended.W))
val s2_gpa_is_pte = Input(Bool())
val uncached_resp = tileParams.dcache.get.separateUncachedResp.option(Flipped(Decoupled(new HellaCacheResp)))
val ordered = Input(Bool())
val perf = Input(new HellaCachePerfEvents())
val keep_clock_enabled = Output(Bool()) // should D$ avoid clock-gating itself?
val clock_enabled = Input(Bool()) // is D$ currently being clocked?
}
At a high level, you must tag requests that you send across this interface using the ``io.mem.req.tag``, and the tag will be returned to you when the data is ready.
Responses may come back out of order if you issue multiple requests, so you can use these tags to tell what data came back.
Note that the number of tag bits is controled by ``dcacheReqTagBits``, which is usually set to 6.
Using more than 6 bits will cause errors or hangs.
Adding RoCC accelerator to Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -67,3 +112,4 @@ For instance, if we wanted to add the previously defined accelerator and route c
new RocketConfig)
To add RoCC instructions in your program, use the RoCC C macros provided in ``tests/rocc.h``. You can find examples in the files ``tests/accum.c`` and ``charcount.c``.

View File

@@ -0,0 +1,9 @@
Prefetchers
====================================
The BAR-fetchers library is a collection of Chisel-implemented prefetchers, designed for compatibility with Chipyard and Rocket-Chip SoCs.
This package implements a generic prefetcher API, and example implementations of NextLine, Strided, and AMPM prefetchers.
Prefetchers can be instantiated in front of a L1D HellaCache, or as TileLink nodes in front of some TileLink bus.
An example configuration using prefetchers is found in the ``PrefetchingRocketConfig``

View File

@@ -0,0 +1,8 @@
Shuttle RISC-V Core
===================
Shuttle is a Rocket-based superscalar in-order RISC-V core, supporting the base RV64IMAFDC instruction set with supervisor and user-mode. Shuttle is a 6-stage core that can be configured to be dual, three, or quad-issue, although dual-issue is the most sensible design point. Shuttle is not designed to meet any power, performance, or area targets. It exists purely as a demonstrative example of another RISC-V CPU design point.
The superscalar microarchitecture presents the most advantages for 1) floating-point kernels and 2) RoCC accelerator kernels, as scalar control code can execute concurrently with floating point or RoCC instructions, maintaining high utilization of those units.
Shuttle is tape-out proven, and has similar physical design complexity as Rocket.

View File

@@ -33,4 +33,6 @@ so changes to the generators themselves will automatically be used when building
fft
NVDLA
Sodor
Shuttle
Mempress
Prefetchers