One or more (but not all) FEPs can reset during an observation, resulting in loss of science data from the affected FEP(s) for the rest of the science run. The unaffected FEP(s) will continue to operate normally. This differs from the DEA Sequencer Reset During a Science Observation anomaly, which shuts down all FEPs.
FEP resets have occurred three times during the mission; most recently in 2013:
A recurrence appears likely. Side B of the DPA seems particularly susceptible to power transients.
If all of the FEPs were halted, this could be an instance of the DEA Sequencer Reset During a Science Observation anomaly. However, if some FEPs were halted but others continued, then the resets were most likely caused by a momentary “glitch” in the DPA-B +5V supply which halted the CPUs that were currently powered by that supply.
If some FEPs are halted and their deltaOverclock values are large and negative for the first three output nodes and large and positive for the fourth output node, this may be an instance of the Hi/Lo Pixel Anomaly.
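The triage described above can be sketched in Python as follows. This is only an illustration, not an operational tool: the function names, the `in_use` argument, and the `large` threshold are hypothetical, since the procedure does not say how large a deltaOverclock value must be. The sketch also assumes that a single halted FEP showing the overclock pattern is enough to flag the Hi/Lo Pixel Anomaly.

```python
def looks_like_hi_lo(deltas, large):
    """True if the first three output nodes are large and negative and
    the fourth is large and positive. `large` is a caller-supplied,
    hypothetical threshold (the procedure gives no number)."""
    a, b, c, d = deltas
    return all(v <= -large for v in (a, b, c)) and d >= large


def triage(halted, in_use, delta_overclocks=None, large=None):
    """Suggest which anomaly to investigate first.

    halted           -- FEP ids that stopped producing data
    in_use           -- FEP ids configured for this science run
    delta_overclocks -- optional {fep_id: (node0, node1, node2, node3)}
    """
    halted, in_use = set(halted), set(in_use)
    if not halted:
        return "no FEPs halted"
    if halted >= in_use:
        # Every FEP stopped: consider the DEA sequencer anomaly instead.
        return "possible DEA Sequencer Reset"
    if delta_overclocks and large is not None:
        # Assumption: one matching halted FEP is enough to raise the flag.
        if any(looks_like_hi_lo(delta_overclocks[f], large)
               for f in halted if f in delta_overclocks):
            return "possible Hi/Lo Pixel Anomaly"
    return "suspected FEP Reset"


print(triage({3, 4, 5}, range(6)))
```

Whatever the sketch returns, the result is only a starting point; the telemetry analysis in the rest of this procedure is still required.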
If this is a suspected FEP Reset anomaly, some analysis is necessary to be certain.
If the event occurs while you are watching PMON during a comm, then:
Before the anomaly, the pmon display should look normal.
If you happened to have comm right when the reset happened, you would see FEPREC_RESET in red in the bottom left column that would eventually scroll off as more housekeeping packets came in. The CCD/FEP statistics in the middle section would stop accumulating for the FEP(s) that had reset.
If you miss the FEP reset itself, but get comm afterwards, pmon does not complain, but you would notice that fewer CCD/FEPs are accumulating statistics in the center section than you would expect. The header part of the middle section, with the CCD/FEP assignments, would still show the FEPs that had reset, but the display would be static or blank depending on whether pmon had seen any data from the current observation previous to the anomaly. So there is an indication that something is amiss, but it is not in red flashing letters.
If you saw the science report at the end of the run, you would see that the FEPs that had reset would have many fewer exposures/events/etc.
If you missed the FEP reset itself but suspect one occurred because PMON reports that a FEP or FEPs stopped collecting, OR if DS Ops has alerted us to the fact that one or more FEPs stopped collecting, then you can proceed to step 2.
Look at the following DEA Housekeeping and MSID values:
DEA Housekeeping: DPA5VHKB
Telemetry MSIDs: 1DPICBCU, 1DPP0BVO, 1DPICACU
Typically, DPA5VHKB bounces around by +/- a volt. However, if it steadies right around the time of the FEP halt, this indicates that all DPA-B boards were in a reset state. Check whether the DPA-B current (1DPICBCU) dropped at around the same time while the DPA-B voltage (1DPP0BVO) and the DPA-A current and voltage remained steady. If all of these hold, the FEPs did reset.
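As a rough illustration of the check above, the following Python sketch compares each quantity before and after the suspected halt time. It assumes the values have already been extracted as lists of `(time, value)` pairs. The function names and all three thresholds (`steady_frac`, `min_drop`, `tol`) are hypothetical and would have to be chosen by someone familiar with the telemetry. The DPA-A voltage is left out because its MSID is not named in the list above.

```python
from statistics import mean, pstdev


def _split(samples, t_halt):
    """Split (time, value) pairs into values before and after t_halt."""
    before = [v for t, v in samples if t < t_halt]
    after = [v for t, v in samples if t >= t_halt]
    if not before or not after:
        raise ValueError("need samples on both sides of the halt time")
    return before, after


def confirms_fep_reset(dpa5vhkb, dpicbcu, dpp0bvo, dpicacu, t_halt,
                       *, steady_frac, min_drop, tol):
    """Apply the three checks from the text; all thresholds are assumptions."""
    # 1. DPA5VHKB scatter shrinks sharply after the halt.
    hk_before, hk_after = _split(dpa5vhkb, t_halt)
    hk_steadied = pstdev(hk_after) <= steady_frac * pstdev(hk_before)

    # 2. DPA-B current (1DPICBCU) drops by at least min_drop.
    b_before, b_after = _split(dpicbcu, t_halt)
    b_current_dropped = mean(b_before) - mean(b_after) >= min_drop

    # 3. DPA-B voltage (1DPP0BVO) and DPA-A current (1DPICACU) stay put.
    def steady(samples):
        before, after = _split(samples, t_halt)
        return abs(mean(after) - mean(before)) <= tol

    return (hk_steadied and b_current_dropped
            and steady(dpp0bvo) and steady(dpicacu))
```

A plot of the same four quantities around the halt time remains the more reliable check; the sketch only encodes the pattern to look for.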
To look at these values:
When available, obtain the relevant dump data file from the directory:
/dsops/critical/GOT/input/
which is located on the Ops LAN, and extract the relevant MSIDs.
There are a total of 8 boards in the DPA, powered independently:
DPA-A +5V to BEP-A, FEP0, FEP1, and FEP2
DPA-B +5V to BEP-B, FEP3, FEP4, and FEP5
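The board assignments above can be written as a small lookup table, which makes it easy to see which supply a given set of halted FEPs shares. This is a minimal Python sketch; the names `SUPPLY_BOARDS` and `supplies_for` are made up for illustration.

```python
# Board assignments as listed above (8 boards in total).
SUPPLY_BOARDS = {
    "DPA-A +5V": ("BEP-A", "FEP0", "FEP1", "FEP2"),
    "DPA-B +5V": ("BEP-B", "FEP3", "FEP4", "FEP5"),
}


def supplies_for(boards):
    """Return the sorted list of supplies that power the given boards."""
    wanted = set(boards)
    return sorted(supply for supply, powered in SUPPLY_BOARDS.items()
                  if wanted & set(powered))


print(supplies_for(["FEP3", "FEP5"]))  # ['DPA-B +5V']
```

If the halted FEPs all map to a single supply, that is consistent with a glitch on that supply. If they span both, some other cause is more likely.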
Previous occurrences of this anomaly were on the DPA-B side, affecting only the side B FEPs. In that case, one should see:
The timeline of events in the anomaly can be reconstructed by examining the packet data in the MIT psci data files. Peter Ford, Joan Quigley, Royce Buehler, or Catherine Grant can do that. They should find the time of the last event packets from the affected FEPs and the time that the FEP resets were reported in software housekeeping.
If the time between these two events is less than 449 seconds, then the reset was not due to a watchdog timer reset. See Peter Ford’s Obsid 15232 memo for more information.
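The timing test above amounts to a single comparison. The Python sketch below assumes both times are given in seconds on the same clock; the function name is hypothetical. The text only says that a gap under 449 seconds rules out a watchdog timer reset, so a longer gap is treated as inconclusive rather than as proof of one.

```python
WATCHDOG_LIMIT_S = 449  # threshold quoted in the text above


def rules_out_watchdog(t_last_event_packet, t_reset_reported):
    """True if the gap is short enough to rule out a watchdog timer reset.

    Both arguments are times in seconds on the same clock. A False
    result is inconclusive; it does not prove a watchdog reset.
    """
    gap = t_reset_reported - t_last_event_packet
    if gap < 0:
        raise ValueError("reset was reported before the last event packet")
    return gap < WATCHDOG_LIMIT_S


print(rules_out_watchdog(1000.0, 1100.0))  # True: 100 s gap
```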
If you happen to observe the incident on PMON, send a warning email to CXCDS Ops (send to operators@cfa or ascdsops@head). Then do the analysis above when the data is available. If that analysis confirms a FEP reset, then send an email to the Flight Directors alerting them of the incident.
Most likely, we will be notified by CXCDS Ops that data collection on one or more of the CCDs stopped during an observation. We need to:
A WSPOW00000 command should clear the condition. However, any immediately following run that executes WSVIDALLDN instead (such as an event histogram or no-bias run) may be affected, since in this case the anomaly is likely to persist.