Introduction
Information retrieval is fundamental to modern data-driven applications (Ikegwu et al., 2022), as it enables users to quickly and precisely locate relevant information within vast, complex datasets by leveraging sophisticated indexing and ranking techniques (Pakhale, 2023). As the volume and diversity of data continue to grow exponentially (Villalobos et al., 2022; Wang, Li, & He, 2025), alongside increasingly dynamic user behaviors, the need for robust and adaptive retrieval systems has become more pressing than ever (Adadi, 2021; Wang, Li, Lv, et al., 2025). Current state-of-the-art methods, including traditional models like BM25 (Aklouche et al., 2019) and deep learning-based approaches such as bidirectional encoder representations from transformers for information retrieval (BERT-IR; Hambarde & Proenca, 2023), have achieved significant advances in retrieval accuracy and relevance (Li et al., 2025; Wang et al., 2021). However, these methods struggle to cope with non-stationary environments and the challenges of integrating heterogeneous modalities. As a result, their scalability and robustness in real-world applications are significantly limited (Syaharuddin, 2024).
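For context, BM25 ranks documents with a bag-of-words score that combines inverse document frequency with a saturating, length-normalized term frequency, which is precisely what makes it brittle when vocabularies, user intents, and document distributions shift. The sketch below is a minimal illustration of that scoring function; the corpus statistics passed in (doc_freqs, n_docs, avg_doc_len) and the parameter defaults k1 = 1.5 and b = 0.75 are conventional illustrative choices, not settings taken from the cited studies.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avg_doc_len, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query with the BM25 formula."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue
        df = doc_freqs.get(term, 0)
        # Inverse document frequency: rarer terms contribute more to the score.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Saturating term frequency with document-length normalization.
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len))
        score += idf * norm
    return score
```

Because every quantity in this score is a static corpus statistic, the ranking cannot adapt when the collection or the users' interests change, which motivates the adaptive formulations discussed next.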
Addressing these limitations introduces several key challenges. First, user preferences and document collections in dynamic environments are constantly evolving, making static retrieval strategies ineffective (Huang et al., 2024). User interests can shift rapidly due to contextual factors, seasonal trends, or newly emerging topics, requiring retrieval models to dynamically adjust their ranking functions to remain effective. Traditional retrieval methods struggle to capture these shifts because they rely on pre-trained representations that become outdated as user behaviors and content distributions change. Second, the integration of multimodal data, including textual, visual, and metadata information, adds complexity to state representation and ranking optimization (P. Chen et al., 2024; Li, Guan, et al., 2023; X. Li et al., 2023). As new content types emerge and document distributions fluctuate, retrieval models must effectively incorporate diverse features while preventing over-reliance on any single modality. Ensuring robustness across multimodal sources remains a significant challenge, particularly when certain modalities are incomplete or noisy. Third, handling noisy or incomplete feedback, a common occurrence in user interaction data (H. Chen et al., 2021; Ren & Wang, 2024), demands robust mechanisms to maintain system performance under uncertain conditions (Cámara et al., 2022). User interactions such as click-through data may be unreliable due to accidental clicks, non-engaged browsing, or inconsistent behavior. Effective retrieval frameworks must therefore distinguish reliable from noisy feedback and refine ranking strategies without being misled by spurious signals.
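As a concrete illustration of this last challenge, one simple heuristic is to weight click feedback by dwell time so that accidental or non-engaged clicks contribute little to ranking updates. The sketch below is a hypothetical example of such a filter; the field names, the 5-second threshold, and the saturation constant are assumptions made for illustration and are not drawn from the cited studies.

```python
def weight_click_feedback(clicks, min_dwell_seconds=5.0, saturation_seconds=30.0):
    """Assign each clicked document a reliability weight in [0, 1] based on dwell time.

    `clicks` is a list of dicts with 'doc_id' and 'dwell_time' (seconds spent on the
    result page). Very short dwells are treated as likely-accidental clicks (weight 0);
    longer engagement earns a weight that grows linearly and saturates at 1.
    """
    weights = {}
    for click in clicks:
        dwell = click["dwell_time"]
        if dwell < min_dwell_seconds:
            w = 0.0  # likely an accidental or non-engaged click
        else:
            w = min(1.0, (dwell - min_dwell_seconds) / saturation_seconds)
        # Keep the strongest signal observed for each document.
        weights[click["doc_id"]] = max(weights.get(click["doc_id"], 0.0), w)
    return weights
```

Downstream, such weights could scale the reward or gradient contribution of each interaction instead of treating every click as equally informative.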
To tackle these challenges, recent research has explored reinforcement learning (RL)-based retrieval frameworks that optimize ranking strategies through policy networks (Yao et al., 2021). These methods emphasize adaptivity and continuous learning from user interactions (Mishra, 2024), making them particularly promising for dynamic retrieval tasks (Halkiopoulos & Gkintoni, 2024). In addition, multimodal data integration has been increasingly recognized as a way to enhance system understanding of both user intent and content relevance (Bayoudh et al., 2022). However, existing approaches often struggle to balance computational efficiency with adaptability, particularly in large-scale and evolving environments (Guo et al., 2024).
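To make this RL formulation concrete, the following is a minimal sketch of how a policy network might learn a ranking strategy from interaction feedback with a REINFORCE-style update. The network architecture, the softmax sampling over candidate documents, the binary click reward, and all hyperparameters are illustrative assumptions; they are not the specific design of any framework cited above.

```python
import torch
import torch.nn as nn

class ScoringPolicy(nn.Module):
    """Maps a query-document feature vector to a relevance score."""
    def __init__(self, feature_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features):               # features: (n_docs, feature_dim)
        return self.net(features).squeeze(-1)  # scores:   (n_docs,)

def reinforce_step(policy, optimizer, features, feedback):
    """One policy-gradient update: sample a document to show first, reward it by feedback."""
    scores = policy(features)
    dist = torch.distributions.Categorical(torch.softmax(scores, dim=0))
    chosen = dist.sample()                     # stochastic choice of the top result
    reward = feedback[chosen]                  # e.g. 1.0 for a satisfied click, else 0.0
    loss = -dist.log_prob(chosen) * reward     # REINFORCE objective (no baseline term)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(reward)

# Hypothetical usage with random features for ten candidate documents of one query.
policy = ScoringPolicy(feature_dim=16)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
features = torch.randn(10, 16)
feedback = torch.zeros(10)
feedback[3] = 1.0                              # simulated positive interaction
reinforce_step(policy, optimizer, features, feedback)
```

Because the update depends only on sampled interactions rather than a fixed relevance corpus, the policy can, in principle, keep adapting as user behavior drifts, which is the property the adaptive frameworks discussed above seek to exploit at scale.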