Uncovering Hidden Patterns in Regulatory Documents: Applying Machine Learning to Single Audits
Section: Washington Update

By C. Christina Ho, VP of Government Analytics and Innovation, Elder Research

Audit and regulatory documents provide considerable opportunities for applying advanced machine learning techniques that can draw out desired insights and findings. Natural Language Processing, or NLP, is a popular sub-domain of artificial intelligence that allows computers to process, analyze, and understand human language. Advances in NLP have opened the door to a myriad of applications throughout the regulatory and compliance landscape.
Federal agencies like the Department of Health and Human Services (HHS) and the Department of Housing and Urban Development (HUD) have begun using predictive analytics models that assess grant recipient ‘realized risk’ based upon a recipient’s single audit findings. In both cases, machine learning models were trained to identify and extract key findings within the reports. Using NLP, each extracted finding was then weighted according to the severity of the concerns it raised.
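To make the weighting step concrete, here is a minimal Python sketch of a severity-weighted audit score. It is not the HHS or HUD method: the agencies used trained models, whereas this sketch substitutes simple keyword matching, and the `Finding` class, the function names, and every weight are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical severity weights. The terms are standard single audit
# vocabulary, but the numbers are invented for this illustration.
SEVERITY_WEIGHTS = {
    "material weakness": 3.0,
    "significant deficiency": 2.0,
    "questioned costs": 1.5,
}
DEFAULT_WEIGHT = 1.0  # assumed weight for a finding with no severity term


@dataclass
class Finding:
    """One finding already extracted from an audit report."""
    text: str


def finding_weight(finding: Finding) -> float:
    """Return the highest weight among the severity terms in the finding."""
    lowered = finding.text.lower()
    matches = [w for term, w in SEVERITY_WEIGHTS.items() if term in lowered]
    return max(matches, default=DEFAULT_WEIGHT)


def audit_score(findings: Iterable[Finding]) -> float:
    """Sum the severity weights of all findings in one audit."""
    return sum(finding_weight(f) for f in findings)
```

Under these assumed weights, an audit with one material weakness and one finding that carries no severity term would score 4.0, and an audit with no findings would score 0.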
HUD used the resulting audit scores as one input to a broader recipient risk model. This new model overhauled previous risk scoring approaches used within the department to produce recipient risk scores that more closely mirror the inherent recipient risk. HHS developed an API through which users could search a database of pre-scored recipient audits by EIN/DUNS identifier and year. For audits not yet scored, users could submit the public URL of the audit PDF, which triggered the scoring algorithm to calculate a risk score.
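The lookup-then-score flow described for the HHS API can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual HHS interface: the `AuditScoreService` class and its method names are hypothetical, an in-memory dictionary stands in for the database, the scoring algorithm is passed in as a plain function, and saving a newly computed score for later lookups is an assumption the article does not confirm.

```python
from typing import Callable, Dict, Optional, Tuple

# (EIN or DUNS identifier, audit year)
ScoreKey = Tuple[str, int]


class AuditScoreService:
    """Hypothetical stand-in for a pre-scored audit lookup service."""

    def __init__(self, scorer: Callable[[str], float]) -> None:
        # The dictionary stands in for the database of pre-scored audits.
        self._scores: Dict[ScoreKey, float] = {}
        # scorer takes the public URL of an audit PDF and returns a score.
        self._scorer = scorer

    def lookup(self, identifier: str, year: int) -> Optional[float]:
        """Return the stored score, or None if the audit is not yet scored."""
        return self._scores.get((identifier, year))

    def get_or_score(
        self, identifier: str, year: int, pdf_url: Optional[str] = None
    ) -> float:
        """Return a stored score, or score the audit at pdf_url on demand."""
        key = (identifier, year)
        if key in self._scores:
            return self._scores[key]
        if pdf_url is None:
            raise KeyError(f"no score for {key} and no audit URL supplied")
        score = self._scorer(pdf_url)
        # Assumption: the new score is saved so later lookups can find it.
        self._scores[key] = score
        return score
```

A caller would first try `lookup` with an identifier and year, then fall back to `get_or_score` with the audit's public URL when no score exists yet.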
For years, grant recipients, including many states, have spent valuable resources on single audits as required by the federal government. The data (audit findings) housed within the Federal Audit Clearinghouse (FAC) was rarely used to inform decisions, given the difficulty of applying analytics to unstructured text. That is no longer the case. Regulatory Technology (RegTech) is a quickly advancing industry. The example applications above highlight the extent to which advancing technology can be used to improve decision-making in a field typically reserved for manual review and analysis. In both cases, single audits provided a rich, unstructured dataset from which models could be trained to replace the laborious task of manual review.