AI-Driven Output Checking for Official Statistics: Leveraging LLMs and Workflow Automation
Researchers rely on confidential and sensitive microdata provided by national statistical institutes and other organizations to conduct their studies. Ensuring confidentiality in the resulting research outputs is a critical challenge, as manual output checking is time-consuming and requires expert knowledge. This paper presents an automated framework that integrates large language models (LLMs), prompt engineering techniques, and workflow automation (n8n) for statistical disclosure control (SDC). The proposed system introduces an AI-driven output checking process comprising code generation, data processing, and output validation. The framework substantially reduces human labor by enabling researchers to pre-check their outputs through a Seamless Within-Activity Review (SWAR) method prior to submission. The paper discusses challenges such as computational cost, confidentiality concerns, and the need for human oversight, especially in the early stages, and explores reinforcement learning as a strategy to improve AI-driven risk assessments over time. The proposed system ensures adherence to data privacy laws while taking a step toward scalable, AI-assisted disclosure control in official statistics.