.H 1 "Cache and TLB Theory of Operation"

This is the theory of operation for the Firefox TLB and cache.

.H 2 "Cache"

Firefox has a 16 Kbyte, 2-way cache.  Each way is implemented with a cache
control unit (CCU) and eight 2K x 8 25 ns SRAM's.  Five of the SRAM's are
used for data and data parity and three are used for tag, status
and tag parity.

.H 3 "Cache Address Lines"

The CCU supports a cache up to 16K bytes deep, so three of its address
lines are unused on Firefox.  These are RAD[0-2].

There is only one tag for each eight data words.  This means that there are
three address lines that do not connect to the tag SRAM's.  These are
RAD[11-13].  These address lines cannot be connected because the tag value
is only valid on the first write to the cache during a cache fill, and the
dirty bit is checked in only one location.

The RAM address lines are routed as a single line with a Schottky diode at the
end to clamp undershoot.  There is also a termination resistor to 2.85 volts.
The main reason that the undershoot is clamped is to prevent it from ringing
back up above .7 volts.  The termination resistor helps some with pull up,
but its main purpose is to keep the DC high level of the driver lower so that
there is not as large a swing on a high to low transition.  This gives
less ringing in the fast case and makes the slow case faster.  The slow case
is mainly capacitive, so a smaller voltage transition will be faster.
The budget for address changing was 1.5 to 6.0 ns.  The simulations show that
Firefox should be between about 1.8 and 5.0 ns.  When a board was
measured it was about 3.5 ns.

.H 3 "Cache Data Lines"

There are 32 data lines NRD[0-31] and 3 data parity lines NRD[32-34].  This
means that there are 5 extra RAM data lines because the SRAM's are 8 bits
wide.
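
The bit budget above can be checked with a few lines of arithmetic.  This is
only a sketch: the constants are the figures quoted in this document, and the
function names are invented for the illustration.

.DS
```python
# Per-way SRAM budget, using only figures quoted in the text.
SRAM_DEPTH = 2 * 1024   # locations per 2K x 8 SRAM
SRAM_WIDTH = 8          # bits per SRAM
DATA_SRAMS = 5          # SRAMs per way used for data and data parity
WAYS = 2

DATA_LINES = 32         # NRD[0-31]
PARITY_LINES = 3        # NRD[32-34]


def spare_data_bits():
    """RAM data bits per way that are left unconnected."""
    return DATA_SRAMS * SRAM_WIDTH - (DATA_LINES + PARITY_LINES)


def cache_data_bytes():
    """Total data capacity of both ways, not counting parity."""
    return WAYS * SRAM_DEPTH * DATA_LINES // 8


print(spare_data_bits())    # 5
print(cache_data_bytes())   # 16384
```
.DE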

The delay budget for the cache data lines is 1.5 ns.  For Firefox the
delay added by the trace was shown to be about .5 ns.
This was done by comparing the rated capacitive load of the SRAM's to the
transmission line load of Firefox.

The data lines are assumed to be between 2.5 and 4.5 inches long.  The minimum
is needed because the CCU floats the data bus and expects the data to
remain valid for a few nanoseconds.  The minimum length is only for
capacitance, so if the trace were made wider it could also be made shorter.

The parity lines are used in the cache to allow error correction.  This is
done by keeping vertical parity in the CCU.  This requires that all writes
be read-modify-writes so that the vertical parity register can be updated.
When there is a parity error the CCU will read all the locations in the cache
(taking parity) to determine which bit is in error.  The initialization
software is required to tell the CCU how big the cache is
so that it knows how many locations to read when there is an error.
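
The following is a minimal software model of the vertical parity scheme
described above, not the CCU logic itself.  It assumes one horizontal parity
bit per word (the hardware has three data parity lines), a single-bit error,
and invented names such as ParityCache.

.DS
```python
class ParityCache:
    """Toy model: per-word parity plus one vertical parity register."""

    def __init__(self, size):
        self.words = [0] * size
        self.hpar = [0] * size   # horizontal parity bit for each word
        self.vertical = 0        # XOR of every word currently stored

    @staticmethod
    def _parity(word):
        return bin(word).count("1") & 1

    def write(self, addr, value):
        old = self.words[addr]            # read the old word ...
        self.vertical ^= old ^ value      # ... update vertical parity ...
        self.words[addr] = value          # ... then write the new word
        self.hpar[addr] = self._parity(value)

    def read(self, addr):
        word = self.words[addr]
        if self._parity(word) != self.hpar[addr]:
            # Parity error: sweep every location, taking column parity.
            column = 0
            for stored in self.words:
                column ^= stored
            # Columns that disagree with the register mark the bad bit.
            word ^= column ^ self.vertical
            self.words[addr] = word
            self.hpar[addr] = self._parity(word)
        return word


cache = ParityCache(8)
for i in range(8):
    cache.write(i, 0x1000 + i)
cache.words[3] ^= 1 << 7      # inject a single-bit error
print(hex(cache.read(3)))     # 0x1003
```
.DE

The sweep has to cover exactly the locations that were folded into the
vertical register, which is why the cache size must be known before the first
error is handled.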

.H 3 "Cache Tag Lines"

There are 24 tag bits.  These are listed below:

.DL
.LI
TAG[0-17] - These are the bits used to compare the real address.
.LI
TAG[18] - Lock bit - This bit is used to lock the entry.  It can be used
by software to lock critical code into the cache.  It can also be used
with the valid bit to map out bad locations in the cache.
.LI
TAG[19] - Dirty bit -  When a cache line is removed this bit
is used to tell if the line has to be written back to memory.
.LI
TAG[20] - Private bit - Used for cache coherency in multi-CPU systems.  This
bit is not used on Firefox.
.LI
TAG[21] - Valid bit - Used to say if the entry at the location is valid.
.LI
TAG[22,23] - Parity bits -  These are the parity bits for the bits listed
above.
.LE

.H 3 "Cache Transaction Types"

There are three types of transaction that the CCU will do.  These are listed
below:

.AL
.LI
READ - This is the CPU read.  The CPU reads a single word.  These can happen
every 40 ns.
.LI
WRITE - This is the read modify write.  It is the CPU write.  The CPU writes
a single word.  These cycles take 80 ns.  (The dirty bit is set.)
.LI
Cache Line Fill - When there is a cache miss the SIU will do a cache line fill.
The CCU will always read the 8 bytes before writing the new 8 bytes, to
update the vertical parity.  The cache line is read while the SIU is getting
the new cache line from main memory.  If the cache line is dirty then the CCU
will pass it on to the SIU to be written back to main memory.
.LE

.H 2 "The TLB"

The Firefox TLB has 2K entries.  The TLB control unit (TCU)
supports up to 4K entries.  The address line that is not used on Firefox is
TAD[1].  There is only one way of associativity in the TLB, so to prevent
interlock it has to be a split instruction and data TLB.  The address line
TAD[0] is used to split the TLB for instructions and data.  There are 1K
entries for instructions and 1K for data.

There are 80 data bits in a TLB entry.  Firefox uses the same RAM's in the
TLB that it uses in the cache (ten 2K x 8 25 ns SRAM's).

.H 3 "TLB Addressing"

The address lines are driven by the TCU.  They are the index for the TLB
RAM's.  They are generated by exclusive-ORing bits 10 - 20 of the
virtual address with part of the space ID.
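
As an illustration only, the index generation can be sketched as below.  Two
things are assumed and are not stated in this document: that bits are numbered
with bit 0 as the most significant bit of a 32 bit virtual offset, so that
bits 10 - 20 sit just above the 11 bit page offset, and that the "part of the
space ID" is its low 11 bits.  The function name is invented.

.DS
```python
INDEX_BITS = 11                      # virtual address bits 10 - 20
INDEX_MASK = (1 << INDEX_BITS) - 1
PAGE_OFFSET_BITS = 11                # 2 Kbyte pages


def tlb_hash(virtual_offset, space_id):
    """XOR the low virtual page number bits with part of the space ID."""
    # Assumption: bit 0 is the MSB, so bits 10 - 20 are the 11 bits
    # immediately above the page offset.
    vpn_low = (virtual_offset >> PAGE_OFFSET_BITS) & INDEX_MASK
    # Assumption: the low 11 bits of the space ID are the ones used.
    sid_part = space_id & INDEX_MASK
    return vpn_low ^ sid_part


# The same virtual page in two spaces lands on different entries.
print(tlb_hash(0x00000800, 0))   # 1
print(tlb_hash(0x00000800, 5))   # 4
```
.DE

Mixing in the space ID spreads identical virtual pages from different spaces
across the array, which matters when there is only one way of associativity.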

The timing for the TLB address and data lines is not as critical as for the
cache.  Only limited simulations were done to verify the timing.  The budget
for address lines changing is 7 ns.  It is not as critical that this spec be
met as it is for the cache address lines to meet the 6 ns spec.

.H 3 "TLB Data Lines"

The TLB is a cache for real addresses.  The real address that corresponds to
the virtual address is the data part of the TLB cache.

The real address is used by the CCU's to see if it matches the real address in
the cache.  There are 21 real address lines (RPN[0-20]).  The first 19 are
the real address and the last two are the parity for the real address.
The TCU does not need the real address and so these 21 lines do not connect to
the TCU.  On writes to the TLB the TCU drives the control and address lines for
the RPN RAM's and the first CCU drives the data lines.

On a real access by the CPU (one that does not use the TLB), the TCU must
drive the RPN bus to the CCU's.  This is why there are TTL buffers in the TLB
array.  When there is a real access the TLB drives the upper address bits onto
TDA[0-18], enables the TTL buffers and disables the SRAM's.  The CCU's also
know that it is a real access and that they are not to check parity on the
RPN.  Parity does not have to be checked because the TLB SRAM's are not being
used.  The TCU does not generate parity on real accesses.

Note that there are 19 bits in the upper part of the real address.  This is the
real page number.  The page size is 2 Kbytes (11 bits).  Two bits are
missing: NAD[1,2].  Leaving them out saves pins
on the CCU and TCU and also allows one less SRAM to be used in each of
the cache arrays.  NAD[0] has to be kept because this bit is used to tell if
the transaction is an IO transaction or not.  IO transactions cannot be cached.
By not using address lines 1 and 2, the Firefox maximum main memory size is
limited to 256 Mbytes.

.H 3 "TLB Tag"

The TLB tag contains 59 bits.  The fields are listed below.  For more
information see the HPPA architecture and instruction reference manual
(HP part number 09740-90014).

.DL
.LI
TDA[0,31,32,58] - Parity bits
.LI
TDA[1] - Valid bit - This bit indicates if the entry is valid.
.LI
TDA[2-17] - Space ID
.LI
TDA[18-30] - Virtual page number - Note that there are 13 virtual page number
bits.  The other 8 are used only for indexing.  Three are used for both.
This is what limits the size options for the TLB.  The TLB can be as
small as 256 entries and as large as 2K entries (each for the instructions and
data).
.LI
TDA[33-39] - Access rights
.LI
TDA[40-54] - Access ID - Sometimes called protection ID.  The access ID is
compared to the protection ID's.
.LI
TDA[55] - Virtual IO bit
.LI
TDA[56] - TLB dirty bit
.LI
TDA[57] - Data break bit
.LE
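
The field list above can be checked mechanically.  The sketch below copies the
bit positions from the list, confirms that every one of the 59 tag bits
belongs to exactly one field, and adds the 21 RPN lines to reach the 80 bit
entry.  The dictionary keys and function names are made up for the example.

.DS
```python
TAG_BITS = 59    # TDA[0-58]
RPN_BITS = 21    # RPN[0-20]

TDA_FIELDS = {
    "parity": [0, 31, 32, 58],
    "valid": [1],
    "space_id": range(2, 18),
    "virtual_page_number": range(18, 31),
    "access_rights": range(33, 40),
    "access_id": range(40, 55),
    "virtual_io": [55],
    "tlb_dirty": [56],
    "data_break": [57],
}


def badly_covered_bits(fields, width):
    """Return bit numbers that are missing or claimed more than once."""
    seen = [0] * width
    for bits in fields.values():
        for bit in bits:
            seen[bit] += 1
    return [bit for bit, count in enumerate(seen) if count != 1]


def entry_bits():
    """Width of one TLB entry: tag plus real page number and parity."""
    return TAG_BITS + RPN_BITS


print(badly_covered_bits(TDA_FIELDS, TAG_BITS))   # []
print(entry_bits())                               # 80
```
.DE

Eighty bits is exactly the width of the ten 2K x 8 SRAM's in the TLB array.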

.H 3 "TLB Error Handling"

When the TLB gets a parity error it treats it much like a TLB miss.  In Firefox
there is always a clean copy of the TLB entry in main memory.  There is a
bug in the TCU that gives some small probability that Firefox will not recover
when there is a TLB parity error.

.H 2 "SRAM Requirements"

This section describes the special requirements that Firefox places on its
SRAM's.  The vendors have agreed to these requirements.  They are
special only in that they are not part of all the standard specs or
are not clear in the specs.  In general they do not require
custom parts or special testing from the vendors.

.H 3 "Timing References"
.H 4 "Inputs"
The AC timing characteristics are specified for input signal transition
times of less than 4 ns between .8 and 2.4 volts measured at the 1.5
volt level.

Input signals (including write control) are not required to change
monotonically.

The following method will be used to determine the minimum and maximum times
that input signals can be considered valid.  This method will be used for
measuring access time and all set up and hold times.

.H 5 "Minimum Delay"

A straight line is drawn from .8 volts to 2.4 volts that is on the left side
of the signal and just touches it.  The slope of the
line is chosen to maximize the delay at 1.5 volts, but the line must have a
transition time no greater than 4 ns.  The minimum delay is the
time at which the line crosses 1.5 volts.

.H 5 "Maximum Delay"

The maximum delay is measured in the same way as the minimum delay except
that the line is drawn to the right of the signal and the delay at 1.5 volts
is minimized.
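
One possible reading of this construction, for a sampled rising edge, is
sketched below.  It is an interpretation and not a vendor test procedure.
The waveform is assumed to be a list of (time in ns, volts) samples, only
samples between .8 and 2.4 volts constrain the line, and the slope is chosen
from a list of candidates supplied by the caller.  All names are invented.

.DS
```python
V_LOW, V_REF, V_HIGH = 0.8, 1.5, 2.4
MAX_TRANSITION_NS = 4.0
MIN_SLOPE = (V_HIGH - V_LOW) / MAX_TRANSITION_NS   # 0.4 volts per ns


def crossing_for_slope(samples, slope, side):
    """Time at which a line of this slope, pushed against the signal
    from the given side, crosses V_REF."""
    times = [t - (v - V_REF) / slope
             for t, v in samples if V_LOW <= v <= V_HIGH]
    return min(times) if side == "left" else max(times)


def valid_time(samples, side, slopes):
    """side="left" gives the minimum delay, side="right" the maximum."""
    allowed = [s for s in slopes if s >= MIN_SLOPE]
    crossings = [crossing_for_slope(samples, s, side) for s in allowed]
    # Left line: maximize the delay.  Right line: minimize it.
    return max(crossings) if side == "left" else min(crossings)


# A clean 2 ns ramp from .8 to 2.4 volts crosses 1.5 volts at 0.875 ns,
# and both constructions should agree with that.
ramp = [(i / 10, 0.8 + 0.08 * i) for i in range(21)]
slopes = [0.4, 0.6, 0.8, 1.2, 2.0]
print(round(valid_time(ramp, "left", slopes), 3))    # 0.875
print(round(valid_time(ramp, "right", slopes), 3))   # 0.875
```
.DE

For a signal that rings, the two lines separate, and the gap between the two
crossings is the window in which the input cannot be assumed valid.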

.H 4 "Outputs"
Outputs are measured at the 1.5 volt level.  When measured at 2.0 volts and
0.8 volts the access time is lengthened by no more than 2 ns.

.H 3 "Other Requirements"

.H 4 "Under Shoot"
The DC VIL min spec is -.5 volts.  Inputs are allowed to go down to -2 volts
for less than 10 ns with no speed degradation or data loss.  This is allowed
to happen on all inputs at the same time at a 30 MHz rate.

.H 4 "Data Hold"

It is assumed that
data will only be held for as long as the outputs remain in the high
impedance state.  If the RAM starts driving data at 0 ns after WE goes
high then data will only be held for 0 ns.

.H 4 "Floating the Data Lines"

As the RAM goes into the high Z state (by CE going high or WE going low)
the logic state of the data outputs will not be affected.
This allows us to disable the
RAM outputs and read the data 20 to 30 ns later, assuming that there is
some capacitive load and very little resistive load on the data outputs
of the RAM.
