
.H 1 "Firefox Error Correction Circuitry"

This is a summary of the Firefox error correction circuitry (ECC).  It focuses
on the features that give it advantages over traditional ECC.

At the highest level, the Firefox ECC provides the following features.  The
combination of these features makes it unique.

.DL
.LI
It allows double-bit errors to be corrected when one of the two bits is a
previously detected hard error.
.LI
It needs only the check bits and RAMs of a standard code that provides
single-bit correction and double-bit detection.
.LI
It does not require real-time software intervention.
.LI
It does not require the hard error to be stuck; the failing bit may return
random values.
.LE

.H 2 "The Problem Solved"

The Firefox memory system has the following goals:

.DL
.LI
Minimize cost
.LI
Maximize performance
.LI
Meet computer group reliability goals
.LE

It also has the following constraints:
.DL
.LI
The Spectrum architecture does not allow real-time software intervention
to aid in correcting errors.
.LI
The soft error rate of the RAM chips is not acceptable without error correction.
.LE

Because of the high soft error rate of the RAM chips, Firefox has single-bit
error correction.  Double-bit error detection can be added at very little
cost, and most ECC circuits include it.  The ability to correct soft errors is
needed at all times.  With standard ECC, however, a word that contains a hard
failure can no longer tolerate a soft error, because the two together form an
uncorrectable double-bit error; a single hard failure therefore requires that
the board be repaired as soon as possible.  The hard failure rate of the RAM
chips may be high enough that Firefox would have trouble meeting its
reliability goals without some form of redundancy.  In Firefox, the double-bit
error detection feature is traded off for the ability to allow hard failures
to stay in the system.

.H 2 "Prior Solutions"

This section discusses some prior solutions to the problem and why they
do not work for Firefox.

.H 3 "Higher Failure Rate"

The most common approach is simply to accept a higher failure rate.  For
Firefox-type systems, the soft error rate of the RAM chips has not been
acceptable, but the hard error rate has been.  In the last few years, however,
there has been considerable pressure to improve reliability, and that pressure
will increase in the next few years.

.H 3 "Double Bit Correction"

There are ECC schemes that correct double-bit errors.  These are expensive
because they require more check bits, and therefore more RAM chips, and more
complex correction logic.
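
The extra cost can be estimated from the Hamming bound: a code with r check bits that corrects every pattern of up to t errors in an n-bit codeword (n = k + r for k data bits) needs 2**r to be at least the number of such patterns, the sum of C(n, i) for i = 0 through t.  The Python sketch below finds the smallest such r.  The 64-bit data word is an assumption made only for illustration; this section does not state the Firefox word width.  The bound is only a lower limit, so practical double-bit-correcting codes may need more check bits than it reports.

```python
from math import comb


def min_check_bits(data_bits, t):
    """Smallest r with 2**r >= sum of C(data_bits + r, i) for i in 0..t.

    This is the Hamming (sphere-packing) lower bound for a code that
    corrects up to t bit errors per codeword.
    """
    r = 0
    while 2 ** r < sum(comb(data_bits + r, i) for i in range(t + 1)):
        r += 1
    return r


DATA_BITS = 64  # assumed word width, for illustration only
sec = min_check_bits(DATA_BITS, 1)
dec = min_check_bits(DATA_BITS, 2)
# An extended Hamming code adds one overall parity bit for double detection.
print("SEC-DED:", sec + 1, "check bits")        # SEC-DED: 8 check bits
print("DEC lower bound:", dec, "check bits")    # DEC lower bound: 12 check bits
```

For 64 data bits this gives 8 check bits for single-bit correction with double-bit detection, against at least 12 for double-bit correction.  If each bit of the word is stored in its own RAM chip, every extra check bit is one more chip per bank.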

.H 3 "Software Correction"

Another solution interrupts the CPU before the bad data is used and corrects
the double-bit error in software.  The software needs to be able to read the
syndrome bits.  This solution requires that one of the failing bits be stuck
(not random).  The Spectrum architecture does not ensure that the CPU can be
interrupted before the bad data is used and still recover.  This could also
be a problem on other systems.


.H 2 "How It Works"

This section discusses how the Firefox ECC works.

The software will periodically poll the memory controller (probably every few
hours or once a day) to see whether any errors have occurred.  The location
and syndrome of each error will be logged on disc.  When a board reports more
than a few errors (probably about 5) with the same syndrome, the error
is considered hard.
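
The hard-error rule above can be sketched as a small software-side log.  The class name ErrorLog, its record method, and the choice to count occurrences per (board, bank, syndrome) are hypothetical; counting per bank is an assumption made so that the bank address written to the controller is unambiguous.  The threshold of 5 follows the estimate of "probably about 5" in the text.  This sketch keeps the log in memory, whereas the text logs to disc, and it leaves out the polling loop itself.

```python
from collections import Counter

HARD_THRESHOLD = 5  # "probably about 5" errors with the same syndrome


class ErrorLog:
    """Hypothetical software-side log of errors polled from memory controllers."""

    def __init__(self, threshold=HARD_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()  # (board, bank, syndrome) -> occurrences
        self.hard = set()        # keys already judged hard

    def record(self, board, bank, syndrome):
        """Log one polled error.  The first time a key reaches the threshold,
        return the (bank, syndrome) pair to write to that board's memory
        controller; otherwise return None."""
        key = (board, bank, syndrome)
        self.counts[key] += 1
        if key not in self.hard and self.counts[key] >= self.threshold:
            self.hard.add(key)
            return bank, syndrome
        return None


log = ErrorLog()
results = [log.record(board=1, bank=2, syndrome=5) for _ in range(6)]
print(results)  # [None, None, None, None, (2, 5), None]
```

Returning the pair only once keeps software from rewriting the same entry into the controller on every later poll.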

When an error is determined to be hard, its syndrome and bank address are
written to the memory controller.  The controller uses this information in
two ways.  First, single-bit errors caused by this bit are not reported,
because there is no need to keep reporting a known bad bit.  Second, it is
used to correct double-bit errors that involve the known bad bit.  When a
double-bit error is detected in that bank, the known bad bit is inverted and
the data is run through the error corrector again.  If one of the two errors
was in the bad bit, inverting it turns the double-bit error into a single-bit
error, which the corrector can then correct.  Because the bit is inverted only
after a double-bit error has been detected, this works whether the failing bit
is stuck or returns random values.  Cycles that require double-bit correction
take slightly longer, but they should be rare (no more than once a week).
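
The two uses of the stored syndrome can be illustrated on a toy extended Hamming (SEC-DED) code.  Everything in this sketch is an assumption rather than the Firefox design: the 8-bit data word, the bit layout, and the names encode, sec_ded, and firefox_read are hypothetical.  In this layout the syndrome of a single-bit error equals the position of the failing bit, so the stored syndrome directly identifies the bit to invert.  The known bad bit is modeled as a (bank, syndrome) pair.

```python
# Toy model of the retry scheme on an extended Hamming (SEC-DED) code.
# Word width, bit layout and all names are illustrative assumptions.

def encode(data):
    """Return a codeword as a list of bits.  Index 0 holds overall parity,
    power-of-two positions hold Hamming check bits, data fills the rest."""
    word, bits, pos = [0], list(data), 1
    while bits:
        word.append(0 if pos & (pos - 1) == 0 else bits.pop(0))
        pos += 1
    p = 1
    while p < len(word):
        # Check bit p makes the parity over all positions with bit p set even.
        word[p] = sum(word[i] for i in range(1, len(word)) if i & p) % 2
        p <<= 1
    word[0] = sum(word) % 2  # overall parity over every other bit
    return word


def syndrome(word):
    """XOR of the positions that hold a 1, plus the overall parity."""
    s = 0
    for i, bit in enumerate(word):
        if bit:
            s ^= i
    return s, sum(word) % 2


def sec_ded(word):
    """Standard SEC-DED decode.  Returns (status, word, syndrome)."""
    s, parity = syndrome(word)
    if parity == 0:
        # Even parity: either no error or an uncorrectable double-bit error.
        return ("ok" if s == 0 else "uncorrectable"), word, s
    if s < len(word):
        fixed = list(word)
        fixed[s] ^= 1  # single-bit error at position s (0 = parity bit)
        return "corrected", fixed, s
    return "uncorrectable", word, s


def firefox_read(word, bank, known_bad=None):
    """known_bad is a hypothetical (bank, syndrome) pair that software has
    written to the controller.  Returns (status, word, report)."""
    status, fixed, s = sec_ded(word)
    here = known_bad is not None and known_bad[0] == bank
    if status == "corrected":
        # Single-bit errors in the known bad bit are corrected silently.
        return status, fixed, not (here and s == known_bad[1])
    if status == "uncorrectable" and here:
        retry = list(word)
        retry[known_bad[1]] ^= 1  # invert the known bad bit and decode again
        status2, fixed2, _ = sec_ded(retry)
        if status2 == "corrected":
            # Assumption: the corrected double-bit event is still reported.
            return "corrected_after_retry", fixed2, True
    return status, fixed, status != "ok"


good = encode([1, 0, 1, 1, 0, 0, 1, 0])
bad = list(good)
bad[5] ^= 1   # hard error in the known bad bit
bad[10] ^= 1  # soft error elsewhere in the same word
print(sec_ded(bad)[0])  # uncorrectable
print(firefox_read(bad, bank=2, known_bad=(2, 5))[0])  # corrected_after_retry
```

If neither of the two errors is in the known bad bit, inverting that bit produces a three-bit error, which a SEC-DED decoder may silently miscorrect.  This is the double-bit detection that Firefox gives up in exchange for tolerating hard failures.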

