ETL

Data Catalog Structures for ETL Users

1243 views September 14, 2018 admin 3

UDM maintains a clean, well-structured set of endpoints that facilitate easy setup and maintenance of ETL data pulls from numerous sources with just a few API calls to UDMDirect. The database schema is illustrated here:

All dimensional (static) data is provided in UDM_Catalog_* feeds.

UDM_Catalog_Publishers contains the list of all agencies, private companies, government departments, commodity exchanges, etc., that publish the data stored in our databases.

UDM_Catalog_Reports contains the list of all reports published by publishers. Each report belongs to one, and only one Publisher.

UDM_Catalog_Sources provides detailed documentation on all URL addresses and other sources from which we collect data belonging to a particular report. Each Source belongs to one, and only one Report. A Report may have multiple Sources.

UDM_Catalog_Series gives the full list of all time series in our database.

UDM_Catalog_Units is the list of all units of measurement. Each Series has one, and only one Unit.

UDM_Catalog_Areas gives the full list of all geographic areas in our database.

UDM_Catalog_Categories and UDM_Catalog_Measures provide concepts used to classify data into standardized Categories and Measures, to facilitate search for Series of interest.

UDM_Catalog_ValueFeeds provides the full list of all Value_* endpoints containing data.

UDM_Catalog_XRef captures all many-to-many relationships among Publishers, Reports, Sources, Series, Areas, Categories, Measures and ValueFeeds.

UDM_Catalog_Coverage provides the date span (first and last report period) for each Series and Area, stored in each Value_* feed.

The schema provided above also includes a number of Value_* feeds: Value_Daily (daily data), Value_Weekly (weekly data), Value_Monthly (monthly data), Value_Quarterly (quarterly data), and Value_Annual (annual data). Each data record is described by three static concepts: Series, Area, and Source. Value_Special_* feeds require specialized static data, captured in UDM_Catalog_Special_* feeds. This is illustrated in the chart above for Value_Special_CMEDairySpotMarket feed which stores daily, transaction-level data for CME spot dairy market. The specialized data catalog feed is necessary to describe the types of transactions (bids, offers, sales, and closing price).

All feeds targeting ETL are designed to facilitate easy initial load (looping over single-column primary key), and incremental data pulls (timestamp + soft-delete policy ensure full referential integrity). Incremental pulls also enable us to easily handle high frequency of data pull requests during the day, so ETL users can set the update frequency to e.g. 15 minute intervals.

 

 

Was this helpful?