Lab on Rare Event Prediction

Abstract

In the age of Industry 4.0 and smart automation, unplanned downtime costs industries over $50 billion annually. Even with preventive maintenance, industries such as automotive lose more than $2 million per hour to downtime caused by unexpected or “rare” events. The extreme rarity of these events makes their detection and prediction a significant challenge for AI practitioners. The lack of high-quality data, methodological gaps in the literature, and limited practical experience with multimodal data further compound the difficulty of rare event detection and prediction. This lab provides hands-on experience in addressing these challenges by exploring the entire lifecycle of rare event analysis, from data generation and preprocessing to model development and evaluation. It also demonstrates the development of a process ontology and its use for user-level explanations. Participants will be introduced to the few publicly available datasets and, more importantly, gain hands-on experience with a newly developed multimodal dataset designed explicitly for rare event prediction. Through several hands-on sessions, participants will learn how such a high-quality dataset is generated and how to use it to develop rare event prediction models. Those interested in developing AI models on diverse multimodal data for other applications will also benefit from participating.

Introduction

Rare event prediction is pivotal in industrial applications, particularly in Industry 4.0. Rare events are anomalous and, by definition, occur infrequently, yet when they do occur they lead to unplanned downtime, reduced equipment lifespan, and increased energy consumption. They pose significant challenges for predictive modeling because the data are heavily skewed toward normal operation and because evaluation is complex: a model that never predicts the event can still score high overall accuracy. Traditional machine learning methods often struggle with rare events, mainly due to the scarcity of high-fidelity data and the complex interdependencies between system components.
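The evaluation problem can be made concrete with a small sketch. Assuming a hypothetical test set of 1,000 observations of which only 10 are rare events, a trivial model that always predicts “normal” reaches 99% accuracy while catching none of the events; precision, recall, and F1 computed on the rare class expose this immediately. The data and function names below are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels, where 1 marks the rare event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def report(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 for the rare (positive) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical data: 1,000 observations, 1% of which are rare events.
y_true = [1] * 10 + [0] * 990
always_normal = [0] * 1000  # a "model" that never predicts the rare event
print(report(y_true, always_normal))
# accuracy is 0.99, but precision, recall, and F1 are all 0.0
```

This is why rare event work usually reports class-specific metrics (or precision-recall curves) rather than accuracy alone.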

This lab aims to give participants practical know-how on the entire lifecycle of rare event prediction and on generating explanations for the predicted outcomes. Specifically, we conduct four hands-on sessions covering (i) existing datasets and a deep dive into the newly developed Future Factories (FF) dataset, (ii) addressing data scarcity and class imbalance and improving data quality through data augmentation, (iii) algorithms and evaluation methods for rare event prediction, and (iv) developing a process ontology for the manufacturing domain and using it to generate user-level explanations for the predicted outcomes. The lessons from this lab are also relevant to other domains and applications, such as healthcare, finance, and energy, where predictive maintenance can help prevent costly failures in complex systems. Participants will gain insights and skills transferable to any industry where rare events affect operational efficiency and call for advanced predictive techniques.
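Session (ii) addresses scarcity and imbalance through data augmentation. The specific techniques used in the lab are not listed here, so the following is only a minimal sketch of one widely used option, SMOTE-style oversampling: each synthetic rare-event sample is placed at a random point on the line segment between a real rare-event sample and one of its k nearest rare-event neighbors. The function name and the toy feature vectors are hypothetical; for real projects, the `SMOTE` class in `imblearn.over_sampling` (imbalanced-learn) is the standard implementation.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (rare-event feature vectors).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = minority[:idx] + minority[idx + 1:]
        # k nearest minority neighbours of `base` (Euclidean distance)
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D features for four observed rare events.
minority = [(1.0, 2.0), (1.5, 1.8), (0.9, 2.4), (1.2, 2.1)]
synthetic = smote_like(minority, n_new=6, k=2)
print(len(synthetic))  # 6
```

For time series, the same idea is typically applied to fixed-length windows or to extracted features rather than to individual readings.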

Goals of the Lab

The target audience includes academic researchers, data scientists, and practitioners working in industrial AI, particularly those focused on predictive maintenance, anomaly detection, and the application of machine learning techniques in complex manufacturing environments. The lab's goals are for participants to:
- Develop a thorough understanding of the data used in manufacturing, the machinery involved, data collection methodologies, and the specific challenges encountered in manufacturing environments.
- Gain a comprehensive understanding of methods for handling data scarcity in rare event and anomaly prediction, including the application of data augmentation techniques.
- Acquire hands-on experience with various predictive modeling approaches for rare event detection and anomaly prediction using real-world manufacturing datasets.
- Develop insights into the significance of integrating multimodal data (e.g., time-series and images) for enhancing anomaly detection and prediction.
- Learn to apply a process ontology to generate user-level explanations, improving the interpretability and explainability of AI models in manufacturing settings.

Table 1: Detailed Outline of the Lab
Time	Activity	Presenter
10 mins	Introduction and Overview: Introduction to rare event prediction, anomaly detection, smart manufacturing and the objectives of the lab.	Amit Sheth
20 mins	Exploring the Datasets for Rare Event Analysis: Overview of the available public datasets and a deep dive into the Future Factories (FF) setup and the data collection for the different versions of the FF dataset.	Ruwan Wickramarachchi
20 mins	Problem 1: Addressing Data Scarcity and Improving Data Quality: Hands-on lab on using data augmentation techniques and modeling approaches to improve rare event prediction.	Chathurangi Shyalika
20 mins	Problem 2: Model Selection, Development and Evaluation: Discussion and hands-on lab on anomaly prediction using robust modeling techniques, including capturing dependencies between anomalies.	Dhaval Patel
20 mins	Problem 3: The Development and Use of a Process Ontology: Hands-on session on developing a process ontology for rare event and anomaly prediction and using it to generate explanations.	Revathy Venkataramanan
15 mins	Q&A Session: Addressing participants’ questions and providing further clarifications.	All presenters
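Problem 2 mentions capturing dependencies between anomalies. As a purely statistical starting point (not the modeling approach used in the session, which is not detailed here), one can estimate from an anomaly log how often anomaly type b follows anomaly type a within a time horizon. The anomaly names, timestamps, and horizon below are hypothetical.

```python
from collections import Counter

def follow_probabilities(events, horizon):
    """Estimate P(anomaly b occurs within `horizon` time units after anomaly a).

    events: list of (timestamp, anomaly_type) tuples sorted by timestamp.
    Returns {(a, b): probability} for every pair observed at least once.
    """
    occurrences = Counter(kind for _, kind in events)
    followed = Counter()
    for i, (t_a, a) in enumerate(events):
        seen = set()  # count each follower type at most once per trigger event
        for t_b, b in events[i + 1:]:
            if t_b - t_a > horizon:
                break  # events are sorted, so nothing later can be in range
            if b not in seen:
                followed[(a, b)] += 1
                seen.add(b)
    return {pair: n / occurrences[pair[0]] for pair, n in followed.items()}

# Hypothetical anomaly log: (time in seconds, anomaly type).
log = [(0, "gripper_fault"), (2, "conveyor_jam"), (10, "gripper_fault"),
       (11, "conveyor_jam"), (30, "gripper_fault"), (50, "conveyor_jam")]
print(follow_probabilities(log, horizon=5))
# {('gripper_fault', 'conveyor_jam'): 0.6666666666666666}
```

Such conditional frequencies can serve as features or as a sanity check for learned models that claim to capture inter-anomaly dependencies.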

Technical Setup and Tools

To ensure an optimal learning experience, participants should bring a laptop with Python 3 installed. Familiarity with basic terminal operations is also recommended, as it will make interacting with the lab's tools and platforms smoother. The organizers will provide all required software and dependencies so that participants can concentrate on the hands-on exercises and core content without being distracted by setup issues. The lab will primarily use the following tools and platforms:

- Python for programming and implementing machine learning models.
- Google Colab, a cloud-based environment for running Python code with easy access to high-performance GPUs for computationally intensive tasks.
- PyTorch or TensorFlow for deep learning tasks.
- Jupyter Notebooks for interactive coding.
- Scikit-learn for basic machine learning models.
- Pandas and NumPy for data manipulation.
- Matplotlib or Seaborn for data visualization.
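Because participants are asked to arrive with Python 3 installed, a quick self-check before the lab can save time. The script below is a hypothetical helper, not part of the organizers' materials; the package list is an assumption derived from the tools named above (note that scikit-learn is imported as `sklearn`). It uses `importlib.util.find_spec`, which reports whether a module can be found without actually importing it.

```python
import importlib.util
import sys

# Assumed import names for the tools listed above; the organizers'
# actual requirements may differ.
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn"]
DEEP_LEARNING = ["torch", "tensorflow"]  # either framework is sufficient

def check_environment():
    """Report which packages are importable, without importing them."""
    status = {name: importlib.util.find_spec(name) is not None
              for name in REQUIRED + DEEP_LEARNING}
    ready = (sys.version_info >= (3,)
             and all(status[name] for name in REQUIRED)
             and any(status[name] for name in DEEP_LEARNING))
    return ready, status

if __name__ == "__main__":
    ready, status = check_environment()
    print(f"Python {sys.version.split()[0]}")
    for name, found in status.items():
        print(f"  {name:<12} {'found' if found else 'MISSING'}")
    print("Environment ready." if ready else "Some packages are missing.")
```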

Supplementary Materials

The organizers have a rich history of contributions in the domain of rare events and anomaly prediction and conducting well-attended tutorials at international venues. These contributions not only underscore the depth of their expertise but also serve as valuable resources for participants wishing to delve deeper into the subject. The following is a list of supplemental materials that augment the content of the lab:

A Comprehensive Survey on Rare Event Prediction: This paper provides a thorough review of current approaches to rare event prediction, covering data, processing, algorithms, and evaluation methods across various modalities (Shyalika, Wickramarachchi, and Sheth 2023).
Evaluating the Role of Data Enrichment Approaches towards Rare Event Analysis in Manufacturing: This study evaluates how data enrichment techniques, combined with machine learning, enhance rare event detection and prediction in manufacturing processes (Shyalika et al. 2024b).
Analog and Multi-modal Manufacturing Datasets Acquired on the Future Factories Platform: This paper introduces two industry-grade datasets from a manufacturing assembly line, comprising analog time series and multimodal data (including images) collected over 30 hours with deliberately introduced anomalies, to support AI research in manufacturing intelligence (Harik et al. 2024).
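Building multimodal samples from a dataset that combines time series with images requires aligning the two modalities in time. The sketch below assumes both modalities carry timestamps on a shared clock and pairs each image with the most recent sensor reading within a tolerance; the actual schema and sampling rates of the FF datasets are not described here, so the timestamps, function name, and tolerance are hypothetical. At scale, `pandas.merge_asof` with `direction="backward"` and a `tolerance` performs the same backward-nearest matching on DataFrames.

```python
from bisect import bisect_right

def align_images_to_sensors(image_ms, sensor_ms, tolerance_ms):
    """For each image timestamp, return the index of the most recent sensor
    reading taken at or before it, or None if no reading lies within
    `tolerance_ms`. Both timestamp lists must be sorted ascending."""
    matches = []
    for t in image_ms:
        i = bisect_right(sensor_ms, t) - 1  # last reading with time <= t
        if i >= 0 and t - sensor_ms[i] <= tolerance_ms:
            matches.append(i)
        else:
            matches.append(None)
    return matches

# Hypothetical timestamps in milliseconds on a shared clock.
sensor_ms = [0, 100, 200, 300, 400]
image_ms = [50, 250, 1000]
print(align_images_to_sensors(image_ms, sensor_ms, tolerance_ms=100))  # [0, 2, None]
```

Matching only backward in time avoids leaking sensor readings from after the image into the model's input.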

Dataset: Rare Event Classification in Multivariate Time Series: This paper introduces a multivariate time series dataset from the pulp-and-paper manufacturing industry, featuring sensor readings and rare paper break events, intended for building classification models to predict these events and exploring other supervised or unsupervised approaches (Ranjan et al. 2018).
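To turn a dataset with labeled rare events into a *prediction* (rather than detection) task, a common framing is to relabel each time step according to whether an event occurs within the next few steps, and to exclude the event rows themselves. The sketch below illustrates that framing under the assumption of one binary label per time step; the `lead` parameter and function name are hypothetical, and the dataset paper's own recommended setup may differ.

```python
def early_warning_targets(labels, lead):
    """Relabel per-step event labels as early-warning targets.

    Row t gets target 1 if an event occurs in steps t+1 .. t+lead, else 0.
    Rows where the event itself is recorded get None so they can be dropped
    from training (the model should warn before the event, not at it).
    """
    if lead < 1:
        raise ValueError("lead must be at least 1")
    targets = []
    for t, label in enumerate(labels):
        if label == 1:
            targets.append(None)  # the event row itself
        else:
            window = labels[t + 1 : t + 1 + lead]
            targets.append(1 if any(window) else 0)
    return targets

# Hypothetical sequence with one rare event (e.g., a paper break) at step 4.
print(early_warning_targets([0, 0, 0, 0, 1, 0, 0, 0], lead=2))
# [0, 0, 1, 1, None, 0, 0, 0]
```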

Bosch Production Line Performance Dataset: A real-world dataset, drawn from assembly-line manufacturing processes at Bosch, aimed at predicting internal failures (Bosch 2016).
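With failures this rare, a single summary metric that accounts for all four cells of the confusion matrix is useful. To our knowledge, the Kaggle competition built on this dataset scored submissions with the Matthews correlation coefficient (MCC). The sketch below computes MCC directly; the labels are hypothetical, and `sklearn.metrics.matthews_corrcoef` provides an equivalent library implementation.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = failure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a marginal is empty (e.g., no failure ever predicted).
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical: 5 failures among 1,000 parts; the model catches 3 and raises 2 false alarms.
y_true = [1] * 5 + [0] * 995
y_pred = [1, 1, 1, 0, 0] + [1, 1] + [0] * 993
print(round(matthews_corrcoef(y_true, y_pred), 3))  # 0.598
```

Unlike accuracy, MCC stays at 0.0 for a model that never predicts a failure.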

Autonomous Condition-based Asset Maintenance: Industrial assets in smart factories are typically monitored using IoT sensor data. This operational setup enables early detection of abnormal conditions in assets through anomaly detection (Patel et al. 2024).
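A common first baseline for detecting abnormal conditions in a single sensor stream is a rolling z-score: flag a reading when it deviates from the mean of the preceding window by more than a few standard deviations. This is a generic baseline, not the method of Patel et al. (2024); the window size, threshold, and readings below are hypothetical.

```python
from collections import deque
import statistics

def rolling_zscore_anomalies(values, window=20, threshold=3.0):
    """Return indices of readings that deviate from the mean of the preceding
    `window` readings by more than `threshold` population standard deviations."""
    history = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(values):
        if len(history) == window:
            recent = list(history)
            mu = statistics.fmean(recent)
            sigma = statistics.pstdev(recent)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                flagged.append(i)
        history.append(x)  # the current reading joins the reference window
    return flagged

# Hypothetical sensor stream: a small periodic pattern with one spike at index 30.
readings = [10.0 + 0.1 * (i % 5) for i in range(40)]
readings[30] = 15.0
print(rolling_zscore_anomalies(readings))  # [30]
```

Because the spike enters the reference window afterward, it temporarily inflates the standard deviation; more robust variants use the median and median absolute deviation instead.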
Dynamic Process Ontology: Materials and resources on the Dynamic Process Ontology and its use cases are available in a GitHub Repository and a YouTube Video.
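The general idea behind ontology-based, user-level explanations can be sketched as follows: when a model flags a sensor, the ontology is traversed from that sensor to the machine it monitors, the process step that machine performs, and any downstream steps, and the path is phrased in plain language. The sketch stores a toy ontology as plain Python triples; every class, relation, and instance name is hypothetical and does not reflect the actual Dynamic Process Ontology schema, which in practice would be expressed in RDF/OWL and queried with tools such as SPARQL.

```python
# Hypothetical, heavily simplified process ontology as (subject, predicate, object) triples.
TRIPLES = {
    ("PickPart", "is_a", "ProcessStep"),
    ("PlacePart", "is_a", "ProcessStep"),
    ("PickPart", "precedes", "PlacePart"),
    ("Robot1", "is_a", "Machine"),
    ("Robot1", "performs", "PickPart"),
    ("GripperPressure", "is_a", "Sensor"),
    ("GripperPressure", "monitors", "Robot1"),
}

def objects(subject, predicate):
    """All objects linked to `subject` by `predicate`, sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)

def explain(sensor):
    """Trace a flagged sensor to its machine, process step, and downstream
    steps, and phrase each path as a plain-language explanation."""
    explanations = []
    for machine in objects(sensor, "monitors"):
        for step in objects(machine, "performs"):
            text = f"Abnormal {sensor} readings on {machine} point to a problem during {step}"
            downstream = objects(step, "precedes")
            if downstream:
                text += f", which may affect {', '.join(downstream)}"
            explanations.append(text + ".")
    return explanations

print(explain("GripperPressure"))
# ['Abnormal GripperPressure readings on Robot1 point to a problem during PickPart, which may affect PlacePart.']
```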
RI2AP: Robust and Interpretable 2D Anomaly Prediction in Assembly Pipelines: This paper introduces RI2AP, a method designed for robust and interpretable anomaly prediction in rocket assembly pipelines, addressing key challenges in predicting rare anomalies (Shyalika et al. 2024a).

NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines: This research proposes a neurosymbolic multimodal fusion approach that integrates time series and image data for enhanced and interpretable anomaly prediction in assembly pipelines (GitHub Repository).
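For readers new to multimodal fusion, the two simplest baselines are early (feature-level) fusion, which concatenates per-modality features before a single classifier, and late (decision-level) fusion, which combines the per-modality predictions. NSF-MAP's neurosymbolic fusion is more involved and is not reproduced here; the sketch below only illustrates these two generic baselines, with hypothetical feature values, probabilities, weight, and threshold.

```python
def early_fusion(ts_features, img_features):
    """Feature-level fusion: concatenate per-modality feature vectors so that
    a single downstream classifier sees both modalities."""
    return list(ts_features) + list(img_features)

def late_fusion(p_ts, p_img, w_ts=0.5, threshold=0.5):
    """Decision-level fusion: weighted average of per-modality anomaly
    probabilities. Returns (fused_probability, is_anomaly)."""
    if not 0.0 <= w_ts <= 1.0:
        raise ValueError("w_ts must lie in [0, 1]")
    fused = w_ts * p_ts + (1.0 - w_ts) * p_img
    return fused, fused >= threshold

# Hypothetical outputs of a time-series model and an image model for one cycle.
print(early_fusion([0.2, 1.3], [0.7, 0.1, 0.9]))  # [0.2, 1.3, 0.7, 0.1, 0.9]
fused, flag = late_fusion(p_ts=0.8, p_img=0.3, w_ts=0.6)
print(round(fused, 2), flag)                       # 0.6 True
```

Late fusion keeps the modality-specific models independent, which makes it easy to see which modality drove a decision; early fusion lets a model learn cross-modal interactions at the cost of that transparency.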
AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines: This paper presents AssemAI, an interpretable image-based anomaly detection system for smart manufacturing pipelines, featuring a custom YOLO-FF model, tailored image dataset, and integrated explainability techniques to enhance reliability and efficiency in industrial environments (Prasad et al. 2024).

Smart Manufacturing Research at Artificial Intelligence Institute, University of South Carolina: Smart Manufacturing Wiki.
Neurosymbolic AI Research at Artificial Intelligence Institute, University of South Carolina: Neurosymbolic AI Research Wiki.
List of Previously Conducted Tutorials: Tutorials.
Anomaly Detection: Anomaly Detection Code Pattern.

These materials provide a comprehensive backdrop to the lab, offering participants a holistic understanding of the domain and the organizers’ pioneering contributions.

References

Bosch. 2016. Bosch Production Line Performance. Kaggle (kaggle.com). [Accessed 19-Apr-2023].
Harik, R.; Kalach, F. E.; Samaha, J.; Clark, D.; Sander, D.; Samaha, P.; Burns, L.; Yousif, I.; Gadow, V.; Tarekegne, T.; et al. 2024. Analog and Multi-modal Manufacturing Datasets Acquired on the Future Factories Platform. arXiv preprint arXiv:2401.15544.
Patel, D.; Lin, S.; Shah, D.; Jayaraman, S.; Ploennigs, J.; Bhamidipati, A.; and Kalagnanam, J. 2024. AI Model Factory: Scaling AI for Industry 4.0 Applications. Proceedings of the AAAI Conference on Artificial Intelligence, 37(13): 16467–16469.
Prasad, R.; Shyalika, C.; Zand, R.; Kalach, F. E.; Venkataramanan, R.; Harik, R.; and Sheth, A. 2024. AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines. arXiv preprint arXiv:2408.02181. Machine Learning for Predictive Models in Engineering Applications Special Session (MLPMEA) at ICMLA 2024.
Ranjan, C.; Reddy, M.; Mustonen, M.; Paynabar, K.; and Pourak, K. 2018. Dataset: Rare Event Classification in Multivariate Time Series. arXiv preprint arXiv:1809.10717.
Shyalika, C.; Roy, K.; Prasad, R.; Kalach, F. E.; Zi, Y.; Mittal, P.; Narayanan, V.; Harik, R.; and Sheth, A. 2024a. RI2AP: Robust and Interpretable 2D Anomaly Prediction in Assembly Pipelines. Sensors, 24(10): 3244.
Shyalika, C.; Wickramarachchi, R.; El Kalach, F.; Harik, R.; and Sheth, A. 2024b. Evaluating the Role of Data Enrichment Approaches towards Rare Event Analysis in Manufacturing. Sensors, 24(15): 5009.
Shyalika, C.; Wickramarachchi, R.; and Sheth, A. P. 2023. A Comprehensive Survey on Rare Event Prediction. ACM Computing Surveys.

Author Biographies

Chathurangi Shyalika

Chathurangi Shyalika is a Ph.D. student at the AI Institute, University of South Carolina. Her research interests focus on deep learning, multimodal AI, time series analysis, and neurosymbolic AI. She is excited about applying her research to real-world problems such as rare event prediction, anomaly detection, and event understanding in various domains. She has authored several publications in venues such as AAAI, ACM, Sensors, IEEE, Springer, NeurIPS, and ICMLA. Contact her at: jayakodc@email.sc.edu

Ruwan Wickramarachchi

Ruwan Wickramarachchi is a Ph.D. candidate at the AI Institute, University of South Carolina. His dissertation research focuses on introducing expressive knowledge representation and knowledge-infused learning techniques to improve machine perception and context understanding in autonomous systems. He has co-organized several tutorials on neurosymbolic AI. Contact him at: ruwan@email.sc.edu

Revathy Venkataramanan

Revathy Venkataramanan is a Ph.D. candidate at the AI Institute, University of South Carolina, and a research intern at Hewlett Packard Enterprise Labs. Her research focuses on using a neurosymbolic approach to build explainable AI models through multi-contextual grounding, reasoning, and traceability. She works in several domains, such as diet management, AI pipeline optimization, and smart manufacturing. She has co-organized workshops at AAAI 2024 and tutorials at IEEE BigData and KGSW, and has published in venues such as JMIR, IEEE SMC, and ICMLA. Contact her at: revathy@email.sc.edu

Dhaval Patel

Dhaval Patel has been with IBM Research since 2016 and currently works as a Senior Technical Staff Member (STSM). He holds a Ph.D. in Computer Science from the National University of Singapore. Dr. Patel is an expert in data mining, machine learning, and time series data analysis. He is a key contributor to many flagship IBM Research products, including AutoAI-TS and Maximo Application Suite for anomaly detection at scale. Contact him at: pateldha@us.ibm.com

Amit Sheth

Prof. Amit Sheth is an educator, researcher, and entrepreneur. His current research includes neurosymbolic AI, with an emphasis on trustworthy, explainable, and safe AI. He is a fellow of the IEEE, AAAI, AAAS, ACM, and AAIA. Three of the AI companies he founded were based on licensed outcomes of his university research. Contact him at: amit@sc.edu

Developing Explainable Multimodal AI Models with Hands-On Lab on the Lifecycle of Rare Event Prediction in Manufacturing
