Building Resilient Data Pipelines for Enterprise AI

Overview

Our researchers are developing robust, scalable, and intelligent data infrastructures to support enterprise AI adoption. The program builds on past collaborations with leading technology firms and government agencies through the Australian Research Council (ARC) Training Centre for Information Resilience (CIRES) (https://cires.org.au/) and is driven by expertise in data-centric AI, AI for database systems (AI4DB), and human-in-the-loop visualized data analytics. The work addresses critical challenges in multi-source and multi-modal data discovery, quality, integration, and provenance, all key factors for successful AI deployment. Researchers develop end-to-end solutions for automating data preparation pipelines, including dataset discovery, relationship identification, imputation, and conflict resolution. Advanced algorithms optimize these pipelines against user-defined objectives such as scalability, traceability, latency, and cost-effectiveness. The program also explores AI-driven enhancements to database systems, including index advisors, learned indexes, and query optimization. Outcomes of this research include benchmark datasets, open-source tools, and publications in leading data science venues. These contributions provide actionable insights and technical foundations for building resilient, adaptive, and domain-specific data ecosystems that enable enterprise-scale AI.

A key contribution is BiasNavi, an LLM-powered toolkit designed to help users navigate and manage bias in datasets. Dataset bias undermines fairness, transparency, and reliability in AI systems, so identifying and mitigating it is essential. BiasNavi features an autonomous agent that integrates modules for bias identification, measurement, surfacing, and adaptation. It provides intuitive, personalized, and conversational interfaces that guide users through the bias management pipeline, making responsible AI practices accessible to both technical and non-technical stakeholders. A case study using the COMPAS dataset illustrates how BiasNavi's reasoning capabilities make bias mitigation accessible to a broader range of users.

Project members

Research Leads

Professor Shazia Sadiq

Professor Zhifeng Bao

Professor Gianluca Demartini

Professor Shane Culpepper

Dr Rocky Chen

Dr Junliang Yu
