Progressive discretization for generative retrieval: A self-supervised approach to high-quality DocID generation

Shunyu Yao¹, Jie Hu¹, Zhiyuan Zhang²

  • ¹China Telecom Research Institute, Beijing, 102209, China.

Neural Networks: the Official Journal of the International Neural Network Society | June 14, 2025

Summary

This study introduces Self-supervised Progressive Discretization (SPD) to create better document identifiers (DocIDs) for generative retrieval. SPD enhances generative retrieval performance by improving DocID quality and reducing information distortion.

Area of Science:

  • Information Retrieval
  • Machine Learning
  • Natural Language Processing

Background:

  • Generative retrieval utilizes large language models as differentiable indices for document memorization and retrieval.
  • Traditional retrieval methods must encode documents and queries separately, which limits their performance.
  • Existing methods for generating document identifiers (DocIDs) often suffer from information distortion due to unsupervised discretization.

Purpose of the Study:

  • To propose a novel framework, Self-supervised Progressive Discretization (SPD), for generating high-quality document identifiers (DocIDs).
  • To improve the performance of generative retrieval systems by addressing limitations in DocID creation.

Main Methods:

  • SPD distills document information into multi-perspective continuous representations using self-supervised learning.
  • A progressive discretization algorithm transforms continuous representations into approximate vectors and discrete DocIDs.
  • The self-supervised model, approximate vectors, and DocIDs are integrated into a query-side training pipeline.
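
The summary does not specify the discretization algorithm, so the following is only an illustrative sketch of the general idea of turning a continuous vector into an approximate vector plus a discrete identifier, one level at a time. It assumes residual quantization with plain k-means as a stand-in technique; this is not the SPD algorithm itself. All names (`kmeans`, `residual_quantize`, `levels`, `k`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def residual_quantize(vectors, levels=3, k=4, seed=0):
    """Assumed stand-in for a progressive discretization step.

    At each level the current residuals are clustered, each document
    receives one code, and the chosen centroid is moved from the
    residual into the running approximate vector.
    Returns (docids, approximate_vectors, mean_squared_error_per_level).
    """
    residuals = [list(v) for v in vectors]
    approx = [[0.0] * len(v) for v in vectors]
    docids = [[] for _ in vectors]
    errors = []
    for level in range(levels):
        centroids = kmeans(residuals, k, seed=seed + level)
        for i, r in enumerate(residuals):
            j = nearest(r, centroids)
            docids[i].append(j)
            approx[i] = [a + c for a, c in zip(approx[i], centroids[j])]
            residuals[i] = [x - c for x, c in zip(r, centroids[j])]
        zero = [0.0] * len(residuals[0])
        errors.append(sum(sq_dist(r, zero) for r in residuals) / len(residuals))
    return [tuple(d) for d in docids], approx, errors
```

Calling `residual_quantize(vectors, levels=3, k=4)` on a list of equal-length float vectors returns one 3-token identifier per document, the matching approximate vectors, and the mean squared reconstruction error after each level. In this sketch the error cannot grow from one level to the next, which is one simple way to quantify how much information a discretization step loses.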

Main Results:

  • SPD successfully creates high-quality, search-oriented DocIDs.
  • The proposed framework achieves state-of-the-art performance in generative retrieval benchmarks.
  • SPD mitigates information distortion during the discretization process.

Conclusions:

  • Self-supervised Progressive Discretization (SPD) offers a robust method for generating effective DocIDs for generative retrieval.
  • The SPD framework significantly advances the capabilities of large-scale generative retrieval systems.
  • This work demonstrates the potential of self-supervised learning in optimizing document representations for retrieval.

Keywords:

  • Generative retrieval
  • Language models
  • Neural network
  • Self-supervised progressive discretization
