ETL

Data Catalog: Areas

329 views September 12, 2018 September 10, 2019 admin 0

Each value stored in the database belongs to one, and only one Area.

Our Excel users can see the full list of Areas in the Excel version of the UDM Data Catalog, in the tab Areas:

The ETL data catalog feed with the full list of all Areas is /UDM_Catalog_Areas. The fields are:

  • AreaID (int, not null)
  • AreaName (nvarchar(400), not null)
  • AreaNameTranslated (nvarchar(900), not null)
  • AreaNameCode (nvarchar(900), not null)
  • AreaDescription (nvarchar(max), not null)
  • LastModified (datetime2(6), not null)
  • Active (bit, not null)

AreaID is the unique identifying number assigned to each Area, i.e. the primary key for the table where this data resides. AreaName is the formal name for the Area. Examples include: United States; Minnesota; Sri Lanka; FMMO 33 – Mideast; World;  Napa County, CA. etc. Areas are not hierarchically structured, and when we have encountered reports that has used a custom area such as ‘Minnesota-Wisconsin’, or ‘19 States‘ we have added those areas to our collection. AreaName is always stored in the original language and letter, e.g. 河北 is the AreaName for Hebei Province in China. To make the collection accessible to all users, we have also included English translation of all names in the field AreaNameTranslated. In this example that would be Hebei. AreaNameCode stores the name of the area with all characters other than A-Z, a-z, and 0-9 removed or replaced by the underscore sign or closest letter, e.g. AreaName: Baden-Württemberg, AreaNameCode: Baden_Wurttemberg. AreaDescription contains the verbose description of the area, where additional information is necessary. For example, U.S. Department of Agriculture uses the area 23 States to present aggregate milk production over top 23 milk producing states in the United States. To capture this area, we have AreaName: 23 States, AreaDescription: Top 23 milk producing states. Currently included states are: Arizona, California, Colorado, Florida, Idaho, Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, New Mexico, New York, Ohio, Oregon, Pennsylvania, South Dakota, Texas, Utah, Vermont, Virginia, Washington, Wisconsin.

The feed UDM_Catalog_Areas contains the timestamp field LastModified which can be used to facilitate incremental pulls. No record is ever hard-deleted from the underlying UDM database table. Instead, soft-deleted records have Active field set to 0. Soft-delete principle in conjunction with the timestamp of last change allows the users to optimize their data pulls by only requesting those fields that have been modified since the last time the data feed was pulled. For example, the request to the server may be:

http://udmdirect.dairymarkets.com/Universal/UDM_Catalog_Areas?$filter=LastModified gt DateTime’2018-07-21T04:30:00.0000000′

or more simply:

http://udmdirect.dairymarkets.com/Universal/UDM_Catalog_Areas?$filter=LastModified gt DateTime’2018-07-21′

Note: AreaFIPS field will be removed from this feed on September 1, 2019. We are preparing a different set of feeds to capture area nomenclatures such as ISO 3166-1, FIPS, etc.

ETL users should pull in the cross-reference catalog feed /UDM_Catalog_XRef to obtain information on all relevant many-to-many relationships in the UDM Database, such as Product-Area, or Area-Report.

To add this table to your database:

 

CREATE TABLE [Report].[Catalog.Areas]( 
[AreaID] [int] NOT NULL,
[AreaName] [nvarchar](400) NOT NULL,
[AreaNameTranslated] [nvarchar](400) NOT NULL,
[AreaNameCode] [nvarchar](400) NOT NULL,
[AreaDescription] [nvarchar](MAX) NULL,
[LastModified] DATETIME2(6) NOT NULL, 
[Active] BIT NOT NULL, 
CONSTRAINT [PK_Catalog.Areas] PRIMARY KEY ([AreaID])
) 
GO

Was this helpful?