BUILDING A QUESTION-ANSWER DATASET FOR VIETNAMESE PUBLIC ADMINISTRATIVE DOCUMENTS
Keywords:
Vietnamese QA dataset, Legal Vietnamese dataset, Public service online, Vietnamese public administrative documents
Abstract
Developing effective chatbots for legal domains is challenging because legal texts are complex, ambiguous, and written in specialized language. This paper introduces a Question-Answer (QA) dataset designed specifically for Vietnamese public administrative documents. The dataset is intended as a standardized resource for fine-tuning deep learning models for legal chatbots, with the primary goal of improving their ability to answer citizen inquiries about procedures in online public services accurately. It was constructed by collecting, preprocessing, and annotating public administrative documents. We ensured broad coverage of topics relevant to public services and crafted questions that reflect real citizen queries. The dataset is divided into a training set and a test set to support both the training and the evaluation of machine learning models. Our dataset contributes to the advancement of AI-driven public service solutions in Vietnam and gives the research community a resource for developing and refining legal chatbots.