BUILDING A QUESTION-ANSWER DATASET FOR VIETNAMESE PUBLIC ADMINISTRATIVE DOCUMENTS

Authors

  • Thanh Ha, Thai Nguyen University of Information and Communication Technology
  • Dinh-Dien La, Department of Information and Communications, Ha Giang, Vietnam
  • Van-Khanh Tran, Thai Nguyen University of Information and Communication Technology
  • Trung-Nghia Phung, Thai Nguyen University of Information and Communication Technology

Keywords

Vietnamese QA dataset, Legal Vietnamese dataset, Public service online, Vietnamese public administrative documents

Abstract

The development of effective chatbots for legal domains poses significant challenges due to the complexity, ambiguity, and specialized language inherent in legal texts. This paper introduces a comprehensive Question-Answer (QA) dataset specifically designed for Vietnamese public administrative documents. This dataset aims to serve as a standardized resource for fine-tuning deep learning models tailored for legal chatbots. The primary goal is to enhance the chatbots' capability to accurately address citizen inquiries regarding procedures in online public services. The dataset was constructed through a meticulous process involving the collection, preprocessing, and annotation of public administrative documents. We ensured a broad coverage of topics relevant to public services and crafted questions that reflect real queries. The dataset is divided into a training set and a test set, facilitating the training and evaluation of machine learning models. Our dataset contributes to the advancement of AI-driven public service solutions in Vietnam, providing a valuable resource for the research community to develop and refine legal chatbots.

Author Biographies

Dinh-Dien La, Department of Information and Communications, Ha Giang, Vietnam

Dinh-Dien La is a PhD student in computer science at the University of Information and Communication Technology, Thai Nguyen University. He is currently Deputy Director of the Department of Information and Communications of Ha Giang province, in charge of digital transformation. His research interests are data science, machine learning, and deep learning in the domains of law and public administration.

Van-Khanh Tran, Thai Nguyen University of Information and Communication Technology

Van-Khanh Tran received his Ph.D. in Natural Language Processing from the Japan Advanced Institute of Science and Technology (JAIST), where his research focused on deep learning for natural language generation in spoken dialogue systems.
He also holds a Master's degree in Information Technology from Manuel S. Enverga University Foundation, Philippines, and a Bachelor's degree in Computer Science from the University of Information and Communication Technology in Vietnam.
He has held research positions at various institutions, including the Applied Artificial Intelligence Institute at Deakin University, Australia, and VinBigdata's Virtual Assistant Center, where he contributed to the development of NLP applications.
He is currently an AI Research Scientist on the NLP team at FPT Smart Cloud's Generative AI (GenAI) Center, where he focuses on developing large language models and AI assistant ecosystems tailored for Vietnamese users. He also serves as the Deputy Head of the Institute of Applied Science and Technology. His research interests include natural language processing, large language models, and AI applications in the legal, healthcare, and finance domains.

Trung-Nghia Phung, Thai Nguyen University of Information and Communication Technology

Assoc. Prof. Trung-Nghia Phung received his Engineering degree in Electronics and Telecommunications from Hanoi University of Science and Technology (HUST) in 2002. He completed his Master of Science degree in Telecommunications at Vietnam National University, Hanoi (VNUH) in 2007 and his PhD degree in Information Science at the Japan Advanced Institute of Science and Technology (JAIST) in 2013. He was Dean of the Faculty of Electronics and Telecommunications and Head of Academic Affairs, and he has been Rector of Thai Nguyen University of Information and Communication Technology (ICTU). He has been a Vice President of the Vietnam Club of Faculties-Institutes-Schools-Universities of ICT (FISU) and President of the FISU Branch in the Northern Midlands, Mountains and Coastal Region of Vietnam. He received the award for excellent young researchers (Golden Globe award) from the Ministry of Science and Technology (MOST) of Vietnam in 2008. His main research interest lies in the interaction between signal processing and machine learning, and he has published more than 70 research papers in this field. He serves as a technical program committee member, organizing co-chair, program co-chair, track chair, section chair, editorial board member, and reviewer for several conferences, journals, and books. He is now an associate editor of the Thai Nguyen University Journal of Science and Technology (ICT section).

Published

2025-06-27