These pages detail ACIS-related anomalies that may occur. Each page describes the anomaly itself, when it has happened in the past, how likely it is to happen again, and what the response should be. Links are also given to relevant flight notes, procedures, and other documentation.
Contents:
The DPA-A shuts down anomalously, presumably due to a spurious command.
Note
The diagnosis of and response to this anomaly presented in this document assume that the active BEP is powered by DPA side A.
The DPA-A has shut down 4 times over the mission:
It appears likely that the anomaly will occur again if the mission continues.
Within a major frame (32.8 seconds), one should see:
All other hardware telemetry should be nominal. The current values for these can be found on our Real-Time Telemetry pages. Older data can be examined from the dump files or the engineering archive.
Our real-time web pages will alert us and the Lead System Engineer will call us. We need to:
sot_red_alert announcing that the ACIS team is aware of the DPA-A shutdown and is investigating, and that a telecon will be called when more information is available.
sot_yellow_alert.
sot_red_alert and call a telecon with the FOT, SOT, and FDs to brief them on the diagnosis and the plan to develop a CAP to recover.
(SOP_ACIS_DPAA_ON)
WSVIDALLDN command to power off the video boards (1A_WS007_164.CLD)
(SOP_ACIS_SW_STDFOPTG)
(SOT_SI_SET_ACIS_FP_TEMP_TO_M121C)
sot_shift to inform the project that ACIS is restored to its default configuration.
Note
As of this writing, the latest ACIS Flight Software Patch is F, Optional Patch G. Before preparing the CAP, the latest version of the procedure should be checked.
The DPA-B shuts down anomalously, presumably due to a spurious command.
Note
The diagnosis of and response to this anomaly presented in this document assume that the active BEP is powered by DPA side A.
The DPA-B has shut down once:
It appears likely that the anomaly will occur again if the mission continues.
Within a major frame (32.8 seconds), one should see:
All other hardware telemetry should be nominal. The current values for these can be found on our Real-Time Telemetry pages. Older data can be examined from the dump files or the engineering archive.
Our real-time web pages will alert us and the Lead System Engineer will call us. We need to:
sot_red_alert announcing that the ACIS team is aware of the DPA-B shutdown and is investigating, and that a telecon will be called when more information is available.
sot_yellow_alert.
sot_red_alert and call a telecon with the FOT, SOT, and FDs to brief them on the diagnosis and the plan to develop a CAP to recover.
(SOP_ACIS_DPAB_ON; this can be executed even if a science run is currently in progress, see note below).
(SOP_61021_STANDBY)
(SOP_ACIS_DPAB_ON)
(SOP_ACIS_WARMBOOT_DEAHOUSEKEEPING).
sot_shift to inform the project that ACIS is restored to its default configuration.
Note
At this point in the mission (Jan 2017), it is standard practice to power off unused FEPs to reduce power consumption and keep the electronics temperatures lower. For this reason, it is believed that it is safe to power on the DPA side B during a science run that does not use these FEPs. If this practice is changed later in the mission, this procedure may have to be revisited.
Warning
The FOT DEA-A On procedure and script are currently being updated. Links will need to be checked/changed when that process completes.
The DEA-A shuts down anomalously, presumably due to a spurious command.
Note
The diagnosis of and response to this anomaly assume that the active DEA is DEA-A.
The DEA-A has shut down only once in the mission, in 2005:
It appears it could happen again, but one occurrence provides little guidance.
Within a major frame (32.8 seconds), one should see:
All other hardware telemetry should be nominal. The current values for these can be found on our Real-Time Telemetry pages. Older data can be examined from the dump files or the engineering archive.
Our real-time web pages will alert us and the Lead System Engineer will call us. We need to:
sot_red_alert announcing that the ACIS team is aware of the DEA-A shutdown and is investigating, and that a telecon will be called when more information is available.
sot_yellow_alert.
sot_red_alert and call a telecon with the FOT, SOT, and FDs to brief them on the diagnosis and the plan to develop a CAP to recover.
(SOP_61036_DEAA_ON)
(SOP_ACIS_WARMBOOT_DEAHOUSEKEEPING)
(SOT_SI_SET_ACIS_FP_TEMP_TO_M121C)
sot_shift to inform the project that ACIS is restored to its default configuration.
The DEA sequencer crashes, resulting in loss of science data from all video boards for the rest of the science run.
The DEA Sequencer has reset three times during the mission; most recently in 2013:
It appears likely that the anomaly will occur again.
scienceReport packet that will contain non-zero fepErrorCodes and a non-zero terminationCode.
Most likely we will be notified by CXCDS Ops that data ceased prematurely for an observation. We need to:
One or more (but not all) FEPs can reset during an observation, resulting in loss of science data from the affected FEPs for the rest of the science run.
The unaffected FEPs will continue to operate normally. This differs from the DEA Sequencer Reset During a Science Observation anomaly, which shuts down all FEPs.
FEP resets have occurred three times during the mission; most recently in 2013:
It appears likely it could happen again. Side B of the DPA seems particularly susceptible to power transients.
If all of the FEPs were halted, this could be an instance of DEA Sequencer Reset During a Science Observation. However, if some FEPs were halted but others continued, then the resets were most likely caused by a momentary “glitch” in the DPA-B +5V supply which halted the CPUs that were currently powered by that supply.
To be certain:
Obtain the relevant dump data file from the directory /dsops/critical/GOT/input/, which is located on the Ops LAN.
Run ACORN on the file as per the instructions in “Running ACORN on data dumps in the case of an anomaly (04/06/16)”
Run the MIT decom on the data dump as per the instructions in “Memo on Running MIT tools (04/26/16)”
Look for the line that gives the time of the last exposure packet from the offending FEPs. It will look like this:
136:19:57:18 - Last exposure packets from FEPs 3 and 4, exposureNumber=11324
Look for when the FEPREC_RESET occurred. The line will look like this:
136:19:57:59 - SwHousekeeping FEPREC_RESET reported, count=2, value=4
If the time between those two events is less than 449 seconds, then the reset was not induced by the watchdog timer.
Look at the following MSIDs:
Typically, DPA5VHKB bounces around by +/- a volt. If, however, it steadies right around the time of the FEP halt, this indicates that all DPA-B boards were in a reset state.
Check to see if the DPA-B current (1DPICBCU) dropped at around the same time while the DPA-B voltage (1DPP0BVO) and the DPA-A current and voltage remained steady.
The above behavior confirms that the FEPs actually reset.
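The timing comparison and the DPA5VHKB steadiness check above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the assumption that both log lines begin with a DOY:HH:MM:SS timestamp from the same year, and the `ratio` used to decide that the voltage has "steadied" are hypothetical choices, not part of ACORN or the MIT tools. The voltage samples are made-up values.

```python
import re

WATCHDOG_SECONDS = 449  # threshold quoted in the step above

# Assumes each log line starts with DOY:HH:MM:SS, as in the example lines.
_TIME_RE = re.compile(r"^\s*(\d{1,3}):(\d{2}):(\d{2}):(\d{2})\b")


def doy_seconds(line):
    """Seconds since the start of the year for a 'DOY:HH:MM:SS - ...' line."""
    m = _TIME_RE.match(line)
    if m is None:
        raise ValueError(f"no DOY:HH:MM:SS timestamp in {line!r}")
    doy, hh, mm, ss = (int(g) for g in m.groups())
    return ((doy - 1) * 24 + hh) * 3600 + mm * 60 + ss


def watchdog_ruled_out(last_exposure_line, reset_line):
    """Return (elapsed seconds, True if elapsed is under the 449 s threshold).

    Assumes both lines fall in the same year (no rollover handling).
    """
    elapsed = doy_seconds(reset_line) - doy_seconds(last_exposure_line)
    return elapsed, elapsed < WATCHDOG_SECONDS


def steadied(before, after, ratio=0.25):
    """True if the peak-to-peak spread after the halt is much smaller than before.

    `ratio` is a hypothetical cutoff; choose it by inspecting real DPA5VHKB data.
    """
    return (max(after) - min(after)) < ratio * (max(before) - min(before))


last_exposure = "136:19:57:18 - Last exposure packets from FEPs 3 and 4, exposureNumber=11324"
reset = "136:19:57:59 - SwHousekeeping FEPREC_RESET reported, count=2, value=4"
print(watchdog_ruled_out(last_exposure, reset))  # prints (41, True)

# Made-up voltage samples before and after the halt, for illustration only.
print(steadied([4.1, 5.9, 4.3, 5.8], [5.00, 5.05, 5.02, 5.01]))
```

For the two example lines, the elapsed time is 41 seconds, well under 449 seconds, so a watchdog-induced reset is ruled out in that case.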
Most likely we will be notified by CXCDS Ops that data from one or more of the CCDs stopped during an observation. We need to:
Usually the power down prior to the next observation clears the anomaly. If the target is not on one of the halted FEPs, then it is likely that the science objectives of the observation will be met.
We should examine data from the next observation because power-cycling the FEPs should clear the condition. But if the next observation uses the same configuration, the FEPs will not be power cycled and the anomaly will persist.
Event data stops being reported for one CCD/FEP combination and the delta overclock values are peculiar: large and negative for the first three output nodes and large and positive for the fourth output node.
It appears likely it could happen again.
Both of the following symptoms will be noticed:
deltaOverclock values reported from these FEPs are large and negative for the first three output nodes and large and positive for the fourth output node.
Both of these symptoms can be observed from one of the PMON pages.
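The deltaOverclock signature described above can be expressed as a simple check. This is a minimal sketch, not a flight or ground tool: the function name and the `threshold` for what counts as "large" are hypothetical (the text does not quantify "large"), and the sample values are invented for illustration.

```python
def matches_overclock_signature(delta_overclocks, threshold=100):
    """Check four per-node deltaOverclock values against the anomaly signature.

    delta_overclocks: values for the four output nodes, in node order.
    threshold: hypothetical magnitude above which a value counts as "large".
    """
    if len(delta_overclocks) != 4:
        raise ValueError("expected one deltaOverclock value per output node (4)")
    first_three_negative = all(v <= -threshold for v in delta_overclocks[:3])
    fourth_positive = delta_overclocks[3] >= threshold
    return first_three_negative and fourth_positive


# Invented sample values, for illustration only.
print(matches_overclock_signature([-812, -790, -805, 1033]))  # signature present
print(matches_overclock_signature([2, -1, 0, 3]))             # nominal-looking values
```

A match on this check alone is not sufficient; the text requires that event data from the same CCD/FEP combination has also stopped.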
Most likely we will be notified by CXCDS Ops that data ceased prematurely for a single CCD for an observation. However, there is a chance we can catch this anomaly during a real-time contact in a long observation. If so, and the affected observation is still in progress, we have an SOP written to intervene and dump diagnostic data.
If not, we need to:
Science event telemetry begins before the bias map has been telemetered, overloading direct memory transfers from one or more FEPs, resulting in locked values for the FEP Threshold Plane.
Three times:
Most likely not. The untricklebias patch, installed in Flight Software Version B-opt-C in 2003, calls all biasThief methods from within the science thread, preventing simultaneous telemetry of bias and event packets.
There are two symptoms: interleaved bias and event packets, and frozen threshold crossing values. The most immediately obvious sign is saturated telemetry, with far too many events from the affected FEP or FEPs. On an affected FEP, the threshold crossing counts will not change from one frame to the next. The symptoms disappear with the next science run, and may become apparent only when SSR data are examined.
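The frozen threshold crossing symptom lends itself to a mechanical check: flag any FEP whose count is identical over several consecutive frames. The sketch below assumes the per-frame counts have already been extracted into a dictionary keyed by FEP number; that data layout, the function name, and the `min_frames` parameter are hypothetical, and the sample counts are invented.

```python
def frozen_feps(counts_by_fep, min_frames=3):
    """Return the FEPs whose last `min_frames` threshold crossing counts are identical.

    counts_by_fep: {fep_id: [count for frame 0, count for frame 1, ...]}
    min_frames: hypothetical number of identical consecutive frames required.
    """
    frozen = []
    for fep, counts in sorted(counts_by_fep.items()):
        recent = counts[-min_frames:]
        # Require a full window so short records are never flagged.
        if len(recent) == min_frames and len(set(recent)) == 1:
            frozen.append(fep)
    return frozen


# Invented sample counts, for illustration only.
sample = {
    0: [120, 131, 118, 125],      # varies frame to frame
    3: [4096, 4096, 4096, 4096],  # unchanged from one frame to the next
}
print(frozen_feps(sample))  # prints [3]
```

A FEP that legitimately reports the same count every frame (for example, a constant zero) would also be flagged, so a hit should be read alongside the telemetry saturation symptom rather than on its own.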
In 2001, the response was to recycle FEP power. This step is now built into the start of each SI mode command sequence, so doing it manually is unnecessary. The subsequent science run in the load will execute normally, so no corrective action is necessary.
If we see T-Plane latchup on coming into comm, it may be worth trying to salvage the remainder of the science run. Check whether a CLD exists for a WSPOWXXXXX packet to command the required set of DEA boards and FEPs. If so, and significant time remains for the obsid, execute stop science, WSFEPALLDN, the WSPOWXXXXX command, and start science.
A special case: if the following science run is a no-bias version of the same SI mode, it will not recycle the FEP power. Send a stop science and a WSFEPALLDN command to ACIS.
Afterward, the team may examine the telemetry stream at leisure to see what may have triggered the latchup. If there was no interleaving of bias and events, look for any other simultaneous high demand on DMA transfers out of the affected FEPs.
Once the T-Plane latches, science will be lost from the latched FEPs for the rest of the science run, and some science is likely to be lost from remaining FEPs due to telemetry saturation.
1A_AA000_191.CLD, AA00000000 (stop science command)
1A_WS003_165.CLD, WSFEPALLDN (command to power down all FEPs)
Currently there are two 6-chip and three 5-chip power commands in CLDs. The 5-chip commands all power an additional, unneeded FEP.