The open source project OpenLineage has entered the sandbox stage of the LF AI & Data Foundation, where it can now prove itself. OpenLineage originated at Datakin, a company specializing in DataOps, and aims to define an open, cross-industry standard that makes it easier to collect and process metadata and master data at runtime via an API, even in complex AI and data projects.
OpenLineage defines a generic model of run, job and dataset entities that are identified using a consistent naming strategy, as outlined in the following diagram. The entities of this core model can be enriched with additional facets where needed.
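As a rough illustration of such a model, the following Python sketch represents runs, jobs and datasets as plain data classes, each carrying an open-ended dictionary for additional metadata. The class names, field names, the `qualified_name` helper and the example namespaces are assumptions made for this illustration; they are not taken from the official OpenLineage specification or client libraries.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Dataset:
    namespace: str  # where the data lives (hypothetical naming scheme)
    name: str       # name that is unique within the namespace
    facets: Dict[str, Any] = field(default_factory=dict)  # optional extra metadata


@dataclass
class Job:
    namespace: str  # e.g. the scheduler or pipeline the job belongs to
    name: str
    facets: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Run:
    run_id: str     # one concrete execution of a job
    job: Job
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)
    facets: Dict[str, Any] = field(default_factory=dict)


def qualified_name(entity) -> str:
    """Join namespace and name into one identifier (illustrative convention)."""
    return f"{entity.namespace}/{entity.name}"


# Example: a nightly job that reads one table and writes another.
orders = Dataset("warehouse", "public.orders")
summary = Dataset(
    "warehouse",
    "public.order_summary",
    facets={"owner": {"team": "analytics"}},  # enrichment beyond the core model
)
job = Job("nightly-etl", "summarize_orders")
run = Run(str(uuid.uuid4()), job, inputs=[orders], outputs=[summary])

# Serialize the run so it could be sent to a metadata service as JSON.
payload = json.dumps(asdict(run), indent=2)
print(payload)
```

Keeping the extra metadata in a separate dictionary means the three core entities stay small and stable, while individual tools can attach whatever additional information they have.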
To make it easier to trace the origin of data (data lineage) across projects, OpenLineage acts as a central integration layer between data warehouses, analysis tools and SQL engines on the one hand and connected projects on the other, such as the data discovery and metadata engine Amundsen or the metadata service Marquez, which also traces back to Datakin.
Further information on OpenLineage can be found in the announcement from the LF AI & Data Foundation and on the project page on GitHub.