ETL

Data Catalog: Categories

170 views September 10, 2018 September 14, 2018 admin 0

UDM team maintains a carefully curated metadata structures for each value we store in our databases. Each value stored in the database belongs to one, and only one Series. As of September 2018, UDM database stores over 34,000 series. How should one go about looking for a series of interest? To facilitate this process, we recently introduced our own classification of series into meaningful Categories. Consider for example these series: “Boterproductie” (Netherlands), “Elaboración de mantequilla” (Chile), “MRiRW Masło Ekstra, 82% tl., w blokach, średnia ważona cena” (Poland) and “Butter, Stocks in All Warehouses” (United States). Each of these series contain either production, price or inventory of butter. Butter is the concept common to each of these series, and Butter is the category all of them are associated with. Butter, as a category, is defined as “Butter is a fatty product derived exclusively from milk and/or products obtained from milk, principally in the form of an emulsion of the type water-in-oil.” Categories span more than just physical commodities. For example:

Class III Milk:

  • In United States, under Federal Milk Marketing Orders, all skim milk and butterfat used to produce used to produce cream cheese and other spreadable cheeses, hard cheese of types that may be shredded, grated, or crumbled; plastic cream, anhydrous milkfat, and butteroil; and evaporated or sweetened condensed milk in a consumer-type package.

Euro / U.S. Dollar Exchange Rate:

  • U.S. Dollars to One Euro

S&P 500: 

  • The Standard & Poor’s 500, abbreviated as the S&P 500 is an American stock market index based on the market capitalization of 500 large companies having common stock listed on the NYSE or NASDAQ.

Our Excel users can see which Series belong to a particular Category in the tab Series, by filtering the column Category:

Each new Series we add to the database belongs to at least one Category, but may, where appropriate, belong to multiple Categories. Classification of Series into Categories was introduced in June 2018, and series added to the database before that date may not yet be classified, but our plan is to eventually have all series classified into one or more categories.

The ETL data catalog endpoint with the full list of all Categories is /UDM_Catalog_Categories. The fields are:

  • CategoryID (int, not null)
  • CategoryName (nvarchar(400), not null)
  • CategoryDescription (nvarchar(900), not null)
  • CategoryParentID (int, null)
  • LastModified (datetime2(6), not null)
  • Active (bit, not null)

CategoryID is the unique identifying number assigned to each Category, i.e. the primary key for the table where this data resides. CategoryName is the brief name for the Category. CategoryDescription is the verbose description of the Category. Each Category may belong to a single higher-level category. For example, Gouda Cheese, Edam Cheese and Cheddar Cheese are all subcategories under Cheese. The feed UDM_Catalog_Categories contains the timestamp field LastModified which can be used to facilitate incremental pulls. No record is ever hard-deleted from the underlying UDM database table. Instead, soft-deleted records have Active field set to 0. Soft-delete principle in conjunction with the timestamp of last change allows the users to optimize their data pulls by only requesting those fields that have been modified since the last time the data feed was pulled. For example, the request to the server may be:

http://udmdirect.dairymarkets.com/Universal/UDM_Catalog_Categories?$filter=LastModified gt DateTime’2018-09-01T04:30:00.0000000′

or more simply:

http://udmdirect.dairymarkets.com/Universal/UDM_Catalog_Categories?$filter=LastModified gt DateTime’2018-09-01′

ETL users should pull in the cross-reference catalog feed /UDM_Catalog_XRef to obtain information on all relevant many-to-many relationships in the UDM Database.

 

Was this helpful?