


				  - 1 -



							 5/15/87
						     Rob Horning


       In looking at the problems with dumping the PCX1 cache
       to memory in the time allowed between power-fail warning and
       power fail, I came to realize that the architecture seems to
       be broken.  There is currently a	proposed change	to the
       memory architecture that	would require memory boards to die
       gracefully when the CPU fails to	dump all of the	cache to
       main memory.  This is not a difficult thing to do and it
       does allow limited configuration	systems	that can guarantee
       enough time to dump their cache (like firefox and Cheetah)
       to die ungracefully.  What bothers me is	that we	are
       allowing	systems	to be designed that do not always work.	 We
       would tell customers with critical applications to buy the
       battery back-up option, but we could not	guarantee that it
       would work.

       The current NIO spec has a bandwidth and power-fail warning
       hold-up time that restrict the data cache size to less than
       100K bytes.  Current implementations are resolving this
       problem by providing more hold-up time than the spec
       requires.  This could become a problem if IO expanders that
       could hold memory are designed in the future to just meet
       the spec.
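
       As a rough illustration of this limit, the largest cache
       that can be dumped is bounded by the memory bus bandwidth
       multiplied by the hold-up time, assuming the worst case in
       which every cache line is dirty.  The sketch below is not
       taken from the NIO spec.  The function name is hypothetical,
       and the 20 Mbytes/s figure is an assumption chosen only
       because, with a 5 millisecond hold-up time, it gives a limit
       of about 100K bytes.

```python
def max_dumpable_cache_bytes(bus_bytes_per_s: float, hold_up_s: float) -> float:
    """Upper bound on the cache size that can be written back in time.

    Assumes every line is dirty and that the bus sustains its full
    bandwidth for the whole hold-up interval (simplifying assumptions).
    """
    return bus_bytes_per_s * hold_up_s


# Hypothetical numbers: 20 Mbytes/s sustained bandwidth, 5 ms hold-up time.
limit = max_dumpable_cache_bytes(20e6, 5e-3)
print(f"largest dumpable cache: {limit:.0f} bytes")
```

       Doubling the cache size under this model requires doubling
       either the bandwidth or the hold-up time, which is why a
       fixed hold-up time does not scale with cache growth.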

       The root problem is that caches are growing in size while
       memory bus bandwidth and power supply hold-up time are
       staying the same.  This trend can be expected to continue in
       the future.  The strategy of having a fixed hold-up time to
       dump a cache needs to be reviewed.

       The battery back-up strategy should meet	the following
       goals:

	 1.  Compatibility with	the current strategy.  The current
	     strategy works for	smaller	cache sizes.

	 2.  Be	able to	guarantee functionality.  The architecture
	     should not	be the limiter for battery back-up
	     reliability.

	 3.  Do	not limit system performance to	support	the
	     strategy.

	 4.  Allow for growth to complex multi-bus, multi-processor
	     systems.

	 5.  Implementations should be transparent to the operating
	     system.

	 6.  Do	not put	unreasonable constraints on the	power
	     supply designs.

       Proposed	Strategy -

       I have looked at	several	alternatives and propose that we
       adopt the following strategy:

       Support the current strategy, but augment it to allow a UPS
       to be used to give additional power supply hold-up time.
       The only	effect on NIO would be that you	would have to allow
       the UPS to assert power-fail warning.  NIO expanders could
       be designed to the 5 milli-second hold-up time spec and not
       worry about the memory bandwidth	and cache size.	 The
       expense is put on the system that needs a large cache and
       battery back-up.

       This strategy has very little effect on the architecture.
       If an SPU is designed with limited cache	and does not
       support memory in IO expanders, then the	method for
       providing battery back-up does not matter.

       If an SPU supports memory in IO expanders or if it plans	to
       support larger caches in the future (that cannot be dumped
       to main memory in the hold-up time), then it must provide
       support for a UPS.  The UPS must	power the expanders and	the
       SPU and be able to generate power-fail warning.	The UPS
       implementation is left to the designer.	It could be
       designed	to work	along with the current battery back-up
       strategy	and only provide more hold-up time or it could
       provide power for the entire battery back-up time.
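
       The rule in the two paragraphs above can be sketched as a
       simple check.  This is only an illustration: the class,
       field names and sample numbers are hypothetical, and the
       test of "cache larger than can be dumped" uses the
       simplifying assumption that every cache line is dirty.

```python
from dataclasses import dataclass


@dataclass
class SpuConfig:
    """Hypothetical description of an SPU; field names are illustrative."""
    cache_bytes: int              # data cache size
    bus_bytes_per_s: float        # sustained bandwidth to main memory
    hold_up_s: float              # power-fail warning hold-up time
    memory_in_io_expanders: bool  # True if expanders may hold memory


def requires_ups(spu: SpuConfig) -> bool:
    """Return True when the SPU must provide UPS support under the proposal."""
    # Worst case: the whole cache is dirty and must be written back.
    dumpable = spu.bus_bytes_per_s * spu.hold_up_s
    return spu.memory_in_io_expanders or spu.cache_bytes > dumpable


# Assumed sample values: 20 Mbytes/s bus, 5 ms hold-up time.
small = SpuConfig(64 * 1024, 20e6, 5e-3, memory_in_io_expanders=False)
large = SpuConfig(512 * 1024, 20e6, 5e-3, memory_in_io_expanders=False)
print(requires_ups(small), requires_ups(large))
```

       An SPU that passes this check today but plans a larger
       cache later would still need to provide UPS support, which
       the sketch does not model.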

       Other Alternatives -

       Other alternatives were considered.  Some of these are
       listed below:


	  - Limit the number of dirty cache lines.  This could be
	    done by limiting the cache size or by having a dirty
	    line counter and forcing the cache (or part of the
	    cache) to be flushed when the counter limit was
	    exceeded.  Both of these alternatives would have a bad
	    effect on performance and the second would also
	    complicate the cache interface design.

	  - Battery back the cache.  This has the problem of
	    putting a larger load on the secondary power supply
	    regulator.  It also puts restrictions and burden on the
	    cache interface to ensure that all the signals behave
	    properly during power up and power down.  In most cache
	    systems this would at least cause some delay in the
	    cache access time, which would add directly to the
	    system clock period.  It is highly desirable to keep
	    this critical timing path as simple as possible.

	  - Provide a longer hold-up time.  It does not look
	    feasible to provide a much longer hold-up time.  It is
	    not clear how much is needed.  Expanders would have to
	    be designed with enough hold-up time to allow for the
	    largest cache that they would ever be used with.
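
       To show why the dirty line counter alternative costs
       performance, the toy model below caps the number of dirty
       lines and counts how often the cap forces a flush.  It is a
       software sketch only, not a hardware design.  The class
       name, the cap of 4 lines and the write pattern are all
       hypothetical.

```python
class DirtyLineLimiter:
    """Toy model of a cache that caps the number of dirty lines."""

    def __init__(self, max_dirty: int):
        self.max_dirty = max_dirty
        self.dirty = set()       # addresses of lines currently dirty
        self.forced_flushes = 0  # times the cap forced a flush

    def write(self, line_addr: int) -> None:
        """Mark a line dirty, flushing first if the cap would be exceeded."""
        if line_addr not in self.dirty and len(self.dirty) >= self.max_dirty:
            self.flush()
            self.forced_flushes += 1
        self.dirty.add(line_addr)

    def flush(self) -> None:
        """Write all dirty lines back to main memory (modelled as a clear)."""
        self.dirty.clear()


cache = DirtyLineLimiter(max_dirty=4)
for addr in range(10):
    cache.write(addr)
print(len(cache.dirty), cache.forced_flushes)
```

       Every forced flush in this model stands for memory traffic
       that would not otherwise have occurred at that moment, and
       the counter itself is extra logic in the cache interface.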