.pl 10.95i
.nr Ej 1
.nr Hs 7
.nr Hb 7
.nr Oc 1
.nr Pi 5
.nr Pt 1
.nr Hi 1
.nr Cl 7
.ds HP 12 12 10 10 10 10
.nr Ls 0
.pn 7
.SA 1
.tr ~ 
.PH "''''"
.PF "'\fB~~~~~~~~~~~   PCXN ERS~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\*(DT~~~~~~~~~~~~~~~~~~~~~~~~~%~~~~~~~~~56\fR' ' '"
.TL
TOP GUN EXTERNAL REFERENCE SPECIFICATION (c)
.AF "(c) Copyright 1987 Hewlett-Packard Company"
.AU "Edited by Rob Horning"
.MT 4 1
.OP
.H 1 "INTRODUCTION"
.ad b
.P
This chapter provides a description of the External Reference
Specification (ERS) for the PCXN product.  
.H 2 "Purpose and Scope"
This document is intended to give an overview of the PCXN
product.
.ad b
.P
Specific questions about the PCXN product can be answered by the appropriate
people shown in the table below.
.TB "PCXN Project Staff"
.SP
.TS
center box tab(:);
cb | cb
lfR  | lfR .
NAME:RESPONSIBILITY
=
Russ Sparks:Section Manager
Howell Felsenthal:Hardware Project Manager
Mark Hodapp:Software Project Manager
Rob Horning:Processor Board, ERS
Tom Spencer:Processor Board
Tony Riccio:Clock, Turn-on
John Hoppal:Mechanical
.TE
.H 1 "PCXN PRODUCT OVERVIEW"

PCXN is a NIO based CPU board that uses the PCX CMOS26 chip set.  It complies
with the ESO packaging standard.  It is implemented with the larger of the two
boards supported by the standard (260 x 367 mm).  The first product that will
use PCXN will be a Silver Fox upgrade.

The PCXN CPU board has the following functionality:
.DL
.LI
Main CPU and cache system.
.LI
Hardware floating point.
.LI
NIO interface, arbitration, and clocking.
.LI
PDH and PDC.
.LE

PCXN is based on four unique custom CMOS VLSI chips.  These are the CPU, CMUX
(cache Comparator and Multiplexer), FPC (Floating Point Co-processor), and the
PMIN (Processor Memory Interface for Nio).  Three CMUX chips are required.

.H 2 "The CPU"

The PCX implementation is centered around the CPU chip.  There is no main bus
that connects all the VLSI chips.  The CPU has the following features in
addition to the core CPU functionality:

.DL
.LI
Separate data cache and instruction cache interfaces.  Each interface consists of a 32 bit
data bus, a 21 bit RPN (real page number) bus, a 16 bit cache RAM address bus,
and several control signals.
.LI
A two way associative 64 entry instruction TLB and a two way associative 64 entry data TLB.
The unused portions of the tag SRAM's are used as secondary TLB entry storage.
.LI
Co-processor interface used to communicate with the floating point co-processor.
.LI
Memory/IO bus interface used to communicate with the PMIN.
.LE

The block diagram on the following page shows the PCX partitioning.  The block
diagram is for a general PCX system and not specific to PCXN.  Each area is
discussed below.

.H 3 "Instruction Cache"

The Instruction cache has the following features:
.DL
.LI
The PCXN processor board has a 512K byte instruction cache.  The CMUX and
CPU support from 128K to 512K bytes of instruction cache.
.LI
The cache line size is 32 bytes.
.LI
The cache is two way associative.
.LI
The cache is indexed with the page offset of the virtual address (9 bits) plus
a hashed version of the other part of the virtual address.
.LI
The data portion of the instruction cache is implemented with 64K x 4 15 ns
SRAM's.  PCX will not support 256K x 4 SRAM's.
.LI
The Tag portion of the instruction cache is implemented with 16K x 4 12 ns
SRAM's.  PCX requires the tag SRAM's to be at least one fourth the depth of the
data SRAM's.  This allows for the off chip TLB.
.LI
The instruction cache is protected by parity.  Error correction is not needed
because lines will never be dirty in the instruction cache.  The data in the
instruction cache is always valid in main memory.  Parity errors are
treated as misses and also cause an LPMC.
.LE
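.P
The sizes listed above can be cross-checked with a little arithmetic.
The following is a minimal sketch, not part of the design.  It assumes that
each way of the cache is a separate bank of SRAM's as deep as one data SRAM
and as wide as the data path, which this document does not state explicitly.
The function name and its parameters are illustrative only.
.DS
```python
def cache_geometry(sram_depth, word_bits, ways, line_bytes):
    """Return size and line counts for a cache built from SRAM banks.

    Assumption: each way is one bank that is sram_depth words deep
    and word_bits wide.
    """
    bytes_per_way = sram_depth * word_bits // 8
    return {
        "total_bytes": bytes_per_way * ways,
        "bytes_per_way": bytes_per_way,
        "lines_per_way": bytes_per_way // line_bytes,
        "words_per_line": line_bytes * 8 // word_bits,
    }


# Instruction cache: 64K deep data SRAM's, 32 bit data bus,
# two way associative, 32 byte lines.
icache = cache_geometry(64 * 1024, 32, 2, 32)
print(icache["total_bytes"] // 1024, "K bytes total")
print(icache["lines_per_way"], "lines per way")
```
.DE
.P
Under these assumptions the result agrees with the 512K byte figure given
above.  Each way holds 8K lines, so a 16K deep tag SRAM has locations left
over, which is consistent with the unused tag space being available for the
off chip TLB.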

The CPU provides the address and control signals to the SRAM's and the RPN
from the on chip TLB to the I-CMUX.  The I-CMUX then compares the RPN to both
sets of tags and if there is a match it multiplexes the data to the CPU and
the co-processor (FPC).  The CMUX signals the CPU that there was a hit.
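.P
The compare and multiplex step described above can be modeled in a few
lines.  This is only a behavioral sketch.  The function name, the tuple
layout, and the choice to report a miss when either way has a parity error
are assumptions made for illustration, and the LPMC is not modeled.
.DS
```python
def cmux_lookup(rpn, ways):
    """Compare rpn against the tag of each way and select the data.

    ways is a list of (tag, data, parity_ok) tuples, one per way,
    as read from the SRAM's at the indexed location.
    Returns (hit, data).
    """
    # Assumption: a parity error in any way is reported as a miss.
    if not all(parity_ok for _, _, parity_ok in ways):
        return (False, None)
    for tag, data, _ in ways:
        if tag == rpn:
            # Tag match: multiplex this way's data to the CPU.
            return (True, data)
    return (False, None)


both_ways = [(0x00111, "way 0 data", True), (0x12345, "way 1 data", True)]
print(cmux_lookup(0x12345, both_ways))
```
.DE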

.H 3 "Data Cache"

The data cache differs from the instruction cache in the following ways:
.DL
.LI
It implements error correction.  Error correction is needed because it is
a write back cache and so lines are allowed to be dirty.  The data in the cache
may not be in main memory.  When the data CMUX corrects an error there will be
about a 10 state penalty.  It will also cause an LPMC.
.LI
The data cache is 64 bits wide.  Being 64 bits wide requires that there be two
CMUX chips.  One CMUX is connected to the odd bits and the other one to the
even bits.  The data path to the CPU is only 32 bits wide, but the data path
to the co-processor is 64 bits wide.
.LI
Both the data and the tag portions of the data cache are implemented with 16K
x 4 SRAM's.  The data cache tag must be at least half as deep as the data.
This is because it has the same size cache line but the word is twice as
wide.  The CMUX and the CPU support up to 64K deep SRAM's.
.LE
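.P
The odd and even bit split between the two data CMUX chips can be
illustrated as follows.  This is a sketch only.  The function names are
hypothetical, and bit 0 is taken to be the least significant bit, which may
not match the bit numbering used on the real data path.
.DS
```python
def split_odd_even(word64):
    """Split a 64 bit word into (even_bits, odd_bits), 32 bits each."""
    even = 0
    odd = 0
    for i in range(32):
        even |= ((word64 >> (2 * i)) & 1) << i
        odd |= ((word64 >> (2 * i + 1)) & 1) << i
    return even, odd


def merge_odd_even(even, odd):
    """Rebuild the 64 bit word from the two 32 bit halves."""
    word64 = 0
    for i in range(32):
        word64 |= ((even >> i) & 1) << (2 * i)
        word64 |= ((odd >> i) & 1) << (2 * i + 1)
    return word64


word = 0x0123456789ABCDEF
even_half, odd_half = split_odd_even(word)
print(hex(merge_odd_even(even_half, odd_half)))
```
.DE
.P
Each half is 32 bits wide, so each CMUX handles half of every 64 bit
data cache word.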

.H 3 "SRAM Address Drivers"

The time required to drive the address to the SRAM's is in the critical path
for system timing.  To improve this time a custom bipolar driver is being
designed.  This part will be in a 28 pin ceramic leaded surface mount package.
The package must dissipate as much as 1.75 watts and so it will require a heat
sink.  The data cache and the instruction cache will each require 4 parts
assuming that there are 6 buffers in each package.

.H 3 "TLB"

The PCXN TLB has the following features:
.DL
.LI
Two way associative 64 entry data TLB.
.LI
Two way associative 64 entry instruction TLB.
.LI
Parity checking on both TLB's.  Parity errors in the on chip TLB will cause
HPMC's.  It is not clear if the system will be able to recover from these.
.LI
There is a secondary TLB off chip, contained in the unused portions of the cache
tag SRAM's.  The secondary instruction TLB is in the instruction cache tags,
and the secondary data TLB is in the data cache tags.  When the secondary
TLB is used there is about a 10 state penalty.
.LI
Parity is generated and checked on the secondary TLB's.  Parity errors are
treated as TLB misses and cause an LPMC.
.LI
Both secondary TLB's contain 4K entries (one fourth the tag depth).
.LE
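.P
The two level lookup described above can be sketched as follows.  This is a
simplified model, not the hardware algorithm.  The function name, the use of
plain dictionaries, and the refill of the on chip TLB after a secondary hit
are assumptions.  Associativity, replacement, and parity are not modeled.
.DS
```python
SECONDARY_PENALTY_STATES = 10  # "about a 10 state penalty"


def translate(vpn, primary, secondary):
    """Return (rpn, penalty_states), or (None, None) on a full miss.

    primary models the on chip TLB and secondary models the entries
    held in the cache tag SRAM's.  Both map a virtual page number
    to an RPN.
    """
    if vpn in primary:
        return primary[vpn], 0
    if vpn in secondary:
        rpn = secondary[vpn]
        primary[vpn] = rpn  # assumed refill of the on chip TLB
        return rpn, SECONDARY_PENALTY_STATES
    return None, None


on_chip = {0x10: 0x00100}
off_chip = {0x20: 0x00200}
first = translate(0x20, on_chip, off_chip)
second = translate(0x20, on_chip, off_chip)
print(first, second)
```
.DE
.P
In this model only the first reference to a page held in the secondary TLB
pays the penalty.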

.H 3 "PMIN"

The PMIN interfaces the CPU to the NIO.  The NIO interface is not part of the
CPU for two reasons:
.AL
.LI
The CPU is already a very complex chip and the interface to the PMI is less
complex than the interface to NIO.
.LI
This makes it easy to leverage the CPU to other busses.  The PMI is a fairly
simple chip (compared to the CPU) and this is the only chip that would have to
be changed.
.LE

The PMI does not support smart caches (the CPU does).  It would not
be feasible to put more than one PCX CPU on a NIO bus because the main memory
bandwidth could not support more than one.

.H 3 "FPC"

The floating point co-processor contains the register stack, the co-processor
interface and the floating point chip interface.  PCXN will use BIT floating
point chips (same as Topgun).  Assuming that there are no cache misses and
that the floating point chips can keep up, PCXN
will be able to complete a floating point operation every four clocks (load, load,
operation, store).  The FPC has a 64 bit data path to the data cache.

The FPC is much like the PMI in that if better floating
point chips are developed we can change a fairly simple chip (the FPC) and
have no effect on the other chips in the system.

 
.H 3 "Architecture"
.DL 4 1
.LI
HPPA Architecture, MIDBUS memory bus, CIO IO bus
.LI
Level 2 implementation (64 bit virtual addressing)
.LI
Full Architected 256 MByte Physical IO space
.LI
4 GByte (32 bit) Physical Memory addressing
.LI
System Memory Bus
.DL 4 1
.LI
NIO
.LI
8 MHz operation
.LE
.LE

.H 3 "SRAM Timing"

The AC timing characteristics are specified for input signal transition
times of less than 4 ns between 0.8 and 2.4 volts measured at the 1.5
volt level.
.P
Input signals (including write control) are not required to change
monotonically.
.P
The following method will be used to determine the minimum and maximum times
that input signals can be considered valid.  This method will be used for
measuring access time and all set up and hold times.
.tr~ 
.VL 17 
.LI Minimum~delay:
A straight line is drawn tangent to or intersecting the signal
such that the signal is always to the left
of the line between 0.8 and 2.4 volts.
If the time between the 0.8 and the 2.4 volt level is
greater than 4 ns, the delay is decreased to 4 ns while
still keeping the signal to the left between 0.8 and 2.4 volts.
The minimum delay is measured at the point in time
that the line crosses 1.5 volts.
.LI Maximum~delay:
The maximum delay is measured in the same way as the minimum delay except
that the line is drawn to the right of the signal.
The maximum delay is measured at the point in time
that the line crosses 1.5 volts.
.LE
.P
This method of measurement was arrived at in consultation
with the SRAM vendors to shave a few nanoseconds off of the
way these delays are typically specified.  Typical methods
are more conservative than they need to be.
.H 5 "Outputs"
.ad b
.P
Outputs are measured at the 1.5 volt level.  When measured at 2.0 volts on
a low-to-high transition and
0.8 volts on a high-to-low transition
the access time is lengthened by no more than 2 ns.

