SAP Releases First ERP-Based Dataset To Fuel AI Research

Tuesday, April 15, 2025

SAP is taking a significant step toward bridging the gap between generative AI and enterprise data with the release of its new dataset, Sales Autocompletion Linked Business Tables (SALT). While AI models such as Large Language Models (LLMs) have shown remarkable results in natural language tasks like writing emails, answering questions, and even crafting wedding speeches, applying these models to the structured, tabular data that is critical for business operations has remained a challenge. A key reason is the scarcity of high-quality, realistic tabular data for training and benchmarking AI in enterprise settings.

SALT, curated using anonymized customer data from SAP’s Enterprise Resource Planning (ERP) system, is designed to address this gap. It features millions of real-world sales order entries spread across interconnected tables and captures the complexity of business data, including heterogeneous data types and column imbalances. The dataset is now publicly available on Hugging Face and GitHub, allowing researchers to work with data that reflects actual business operations.

Obtaining and sharing such data has historically been difficult due to concerns around privacy, confidentiality, and commercial sensitivity. As a result, there has been a growing disconnect between academic research and the realities of enterprise data. SALT aims to close this divide by enabling AI researchers to develop models that are more effective in handling real-world business data.

SAP researchers believe SALT could be a key resource in training better foundation models tailored for enterprise use. “There is a gap between academia and industry in terms of data. We want to enable the research community to work on real problems, not just simulated problems,” says SAP researcher Tassilo Klein.

To further this effort, SAP is also developing its own AI model, the SAP Foundation Model, which is designed to work with tabular data out of the box with minimal training. Complemented by SAP’s Knowledge Graph, this AI model will leverage metadata to understand relationships within complex datasets, making it easier to adapt to various business scenarios.

“SALT is just the beginning. For now, we are starting with just one customer and use case. However, we plan to publish more datasets that cover a diverse set of customers and use cases,” explains Johannes Hoffart, CTO of Business AI at SAP.

The initiative also encourages collaboration with academia, giving researchers the opportunity to test and publish results using real industry data. With SALT and the SAP Foundation Model, SAP is laying the groundwork for a new era of AI-powered enterprise automation, pushing generative AI beyond text and into the heart of business operations.

 
