The Canadian Space Agency (CSA) is seeking a solution that will decrease the cost of managing the maintenance of space robotics by making use of the large body of stored system data records to train an agent to diagnose and predict failures of the flight hardware. The deadline to propose a solution is October 28th.
Under this challenge, a solution is sought to improve the efficiency of a maintenance program for space robotic sub-systems such as cameras, sensors, and mechanisms. Predictive maintenance (specifically self-diagnosis and failure prediction) is not a new area of research, but with advancements in machine learning (ML) and big data analytics (BDA), predictive maintenance can result in significant improvements in reliability, prediction of servicing needs based on equipment performance patterns, and reduction of equipment downtime.
CSA would like to use AI-based predictive maintenance to minimize on-orbit downtime, extend equipment life, and reduce safety hazards by servicing equipment based on actual wear and tear instead of scheduled service visits. The solution would use historical maintenance data records and downlinked telemetry from past operations as its training data (CSA can provide this data to successful bidders). By providing auto-diagnostics and early failure prediction, the solution would allow both preventive and preparatory actions to be taken. CSA believes that by employing new digital technologies like ML and BDA, the life cycle cost of maintaining future robotic systems can be greatly reduced.
Desired outcomes and considerations
Essential (mandatory) outcomes
The solution must:
- Provide auto-diagnostics and failure prediction based on learning, hypothesis, and analysis using available telemetry data of a space robotic system.
- Provide a data classifier and labeling tool which includes preparation and cleansing to ensure the data is correctly formatted.
- Handle imbalanced data (i.e., nominal operations will be much more frequent than off-nominal).
- Provide a diagnosis to help isolate faults in the system.
- Predict failures of the subsystems or components of the flight system.
- Have an adequate training method using a body of data to reliably diagnose and predict failures of the flight hardware.
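As an illustration of the imbalanced-data requirement above, one common option is to weight each class inversely to its frequency so that rare off-nominal samples are not drowned out during training. The following is a minimal sketch only, assuming two hypothetical labels ("nominal" and "off_nominal") and an invented 98:2 split; it is not a method prescribed by CSA.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label set: nominal operations dominate the record.
labels = ["nominal"] * 98 + ["off_nominal"] * 2
weights = balanced_class_weights(labels)

for cls, weight in weights.items():
    print(f"{cls}: {weight:.3f}")
```

Weights of this kind can be passed to a training loss so that a misclassified off-nominal sample costs more than a misclassified nominal one. Resampling the rare class is another option a bidder might consider.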
The solution should:
- Be applicable to other types of (non-robotics) equipment.
- Include information technology security as part of its inherent design.
Background and context
The Mobile Servicing System (MSS) is an important asset for the ISS program, and its reliability is key to the ongoing life cycle of the station. For future missions such as the cislunar Gateway, advanced space robotics systems will operate in the harsh deep space environment, and the launch cost of spare parts will be greatly increased. The reliability, safe functioning, and operating cost of these robotic systems will be key to the success of the program. The same applies to surface mobility systems such as rovers, to commercial in-orbit services, and to active debris removal systems.
As complex systems operate in space, standard maintenance and fault diagnosis techniques might not be sufficiently cost-effective. Typical diagnostic and early failure prediction methods require extensive experimentation and modeling during the initial system development. Characterizing the on-orbit performance of the system while it is still in Earth gravity and ambient conditions is not feasible, as it would demand extensive engineering analysis at high cost. Modern approaches based on ML and BDA could overcome the shortfalls of a standard maintenance management program.
When the MSS was developed in the late 1990s, the concept of auto-diagnosis was not highly developed and the related technologies were not available; hence, traditional maintenance approaches were adopted. As a result of launch schedules and other considerations, MSS servicing was based on a 2-year cycle with a 1-year mitigation period. Note also that the ISS crew is partially responsible for diagnostic and maintenance activities. The MSS operation logs report that the time spent diagnosing and correcting anomalies varied widely, from 5 minutes to several hours, and some anomalies are not yet understood. The opportunity cost of this time includes delays in the execution of science experiments and potential safety hazards. This finding points to a need for a well-founded and effective self-diagnosis and failure prediction approach.
The steps in a generic use case where an agent (either a ground control operator or an onboard intelligent software agent) performs diagnosis on a space flight system would be as follows:
- A fault occurs in the flight system; evidence of this fault exists within the system telemetry reported to the agent.
- The operations context is noted; essentially this is keeping track of the intent of the current operation and any error identifiers which occur, which can help to refine the search for relevant potential faults.
- If necessary, the agent can choose to query the flight system for additional diagnostic information.
- Using the available information, a likely fault diagnosis is determined. Based on this diagnosis, the flight system can be queried again to confirm the fault, and/or obtain additional data to characterize the fault or search for a root cause.
- If the fault can be cleared by a telecommand, then the agent sends the appropriate command(s).
- The agent verifies that the fault has been correctly repaired. This may involve specifically querying the flight system for additional diagnostic information.
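The steps above can be sketched as a simple control loop. This is a toy illustration under stated assumptions: the `SimulatedFlightSystem` class, the error identifiers, the telecommand names, and the `FAULT_TABLE` lookup are all hypothetical and do not describe any real MSS interface. A learned model would replace the static lookup in an actual solution.

```python
from dataclasses import dataclass, field

# Hypothetical lookup: (operation intent, error identifier)
# -> (diagnosis, clearing telecommand).
FAULT_TABLE = {
    ("payload_insertion", "E_JOINT_STALL"): ("joint brake engaged", "RELEASE_BRAKE"),
    ("payload_insertion", "E_CAM_TIMEOUT"): ("camera link dropped", "RESET_CAMERA"),
}


@dataclass
class SimulatedFlightSystem:
    """Stand-in for a real flight system; every name here is illustrative."""

    active_errors: set = field(default_factory=set)

    def telemetry(self):
        # Evidence of faults as reported to the agent.
        return {"errors": sorted(self.active_errors)}

    def query_diagnostics(self, error_id):
        # Additional diagnostic query: is this fault currently present?
        return {"confirmed": error_id in self.active_errors}

    def send_command(self, command):
        # Toy model: each clearing command removes its matching fault.
        for (_, error_id), (_, clearing_command) in FAULT_TABLE.items():
            if clearing_command == command:
                self.active_errors.discard(error_id)


def diagnose_and_clear(system, operation):
    """Walk the use case: observe, diagnose, confirm, command, verify."""
    report = []
    for error_id in system.telemetry()["errors"]:
        # The operations context (intent + error id) narrows the search.
        entry = FAULT_TABLE.get((operation, error_id))
        if entry is None:
            report.append((error_id, "unknown fault", "escalated"))
            continue
        diagnosis, command = entry
        # Query the flight system again to confirm the diagnosis.
        if not system.query_diagnostics(error_id)["confirmed"]:
            report.append((error_id, diagnosis, "not confirmed"))
            continue
        system.send_command(command)
        # Verify that the fault has actually been cleared.
        cleared = not system.query_diagnostics(error_id)["confirmed"]
        report.append((error_id, diagnosis, "cleared" if cleared else "still present"))
    return report


system = SimulatedFlightSystem(active_errors={"E_JOINT_STALL", "E_UNKNOWN"})
report = diagnose_and_clear(system, "payload_insertion")
for row in report:
    print(row)
```

Faults with no matching entry are escalated rather than guessed at, which mirrors the observation that some logged anomalies remain unexplained.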
Another scenario of interest is autonomous detection of outside-of-nominal performance. While operating autonomously without a communication link to the ground segment:
- The robot performs a routine operation (e.g., insertion of a payload into a receptacle).
- An intelligent agent compares the telemetry reported by the robot throughout the operation to a model based on previous similar operations.
- The agent determines if the performance is within normal variations, or if an anomaly or performance degradation has occurred. If the latter, the data is flagged for priority download at the next communication window so that ground personnel can perform a detailed post-analysis.
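A minimal sketch of this comparison, assuming a single telemetry channel sampled at the same time steps in every run, could use a per-step mean and standard deviation from earlier runs as the "model" and flag large deviations. The sample values, the 3-sigma limit, and the function names are assumptions chosen for illustration; a real solution would likely use a richer learned model.

```python
from statistics import mean, stdev


def build_baseline(previous_runs):
    """Per-time-step mean and standard deviation across earlier runs."""
    steps = zip(*previous_runs)
    return [(mean(step), stdev(step)) for step in steps]


def flag_for_download(run, baseline, z_limit=3.0):
    """Return indices of time steps deviating by more than z_limit sigmas."""
    flagged = []
    for index, (value, (mu, sigma)) in enumerate(zip(run, baseline)):
        if sigma > 0 and abs(value - mu) / sigma > z_limit:
            flagged.append(index)
    return flagged


# Hypothetical telemetry (e.g., a motor current) from four earlier insertions.
previous_runs = [
    [1.0, 2.0, 3.0, 2.0],
    [1.1, 2.1, 2.9, 2.0],
    [0.9, 1.9, 3.1, 2.1],
    [1.0, 2.0, 3.0, 1.9],
]
baseline = build_baseline(previous_runs)

nominal_run = [1.0, 2.05, 3.0, 2.0]
degraded_run = [1.0, 2.0, 4.5, 2.0]

print(flag_for_download(nominal_run, baseline))
print(flag_for_download(degraded_run, baseline))
```

A run with any flagged time step would be marked for priority download at the next communication window.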
Predictive maintenance is a concept that is applied to optimize asset maintenance plans through the prediction of asset failures via data-driven techniques. The success of data-driven approaches such as ML and BDA depends on having a large body of representative data. In addition, the data must be labeled correctly in order to accurately predict failure patterns or perform auto-diagnosis. Many of the data quality challenges can be addressed by deep learning algorithms, which can be used to build more accurate predictive models. These deep learning models will be able to apply insights from previously labeled data to new, unlabeled data, so both predictive and prescriptive analyses will become more accurate over time.
In the future, the Industrial Internet of Things (IIoT) and Deep Learning are expected to play a substantial role in the advancement of predictive analytics and overcome data quality issues and the technology gap. Potential innovative solutions could be considered for Canadarm3 on the cislunar Gateway as well as for other commercial in-orbit and active debris removal services and surface mobility systems; Canada could lead the way in an effort to implement an intelligent maintenance program for space flight systems.