ETL

Data Catalog: Reports

470 views June 11, 2018 May 29, 2019 admin 0

UDM team maintains a carefully curated structure of metadata categories for each value we store in our databases. Each data record belongs to one, and only one Report.

Excel files can find the full list of all Reports in the UDM Data Catalog Excel file, under tab Reports.

 

Data Feed for ETL automated connections is /UDM_Catalog_Reports.

The fields in this data feed are:

  • ReportID (int, not null)
  • PublisherID (int, not null, foreign key: UDM_Catalog_Publishers, field: PublisherID)
  • ReportCountryID (int not null, foreign key: UDM_Catalog_Areas, field: AreaID)
  • ReportName (nvarchar(400), not null)
  • ReportNameTranslated (nvarchar(400), null)
  • ReportNameCode (nvarchar(400), null)
  • ReportCountry (nvarchar(50), null)
  • ReportUrl (nvarchar(400), null)
  • LastModified (datetime2(6), not null)
  • Active (bit, not null)

ReportID is the unique identifying number assigned to each report. Each Report belongs to one, and only one Publisher, captured by the foreign key PublisherIDReportName is the full official name of the report that contains the data we collect. For example, “Collecte Hebdomadaire de Lait de Vache” is a report published by FranceAgriMer. UDM standard protocols are to store series names, report names and publisher names using the language in which they were originally published, in this case French. To facilitate easier use we also provide ReportNameTranslated, in this example: “Weekly Collections of Cow’s Milk“. ReportUrl contains the link to the homepage of the report. We may be collecting data from a report from multiple Sources, but only one URL will be listed here.

The feed UDM_Catalog_Reports contains the timestamp field LastModified which can be used to facilitate incremental pulls. No record is ever hard-deleted from the underlying UDM database table. Instead, soft-deleted records have Active field set to 0. Soft-delete principle in conjunction with the timestamp of last change allows the users to optimize their data pulls by only requesting those fields that have been modified since the last time the data feed was pulled. For example, the request to the server may be:

http://udmdirect.dairymarkets.com/Universal/UDM_Catalog_Reports?$filter=LastModified gt DateTime’2018-08-31T15:30:00.0000000′

or more simply:

http://udmdirect.dairymarkets.com/Universal/UDM_Catalog_Reports?$filter=LastModified gt DateTime’2018-08-31′

To add this table to your database:

CREATE TABLE [UDM].[Catalog.Reports]( 
[ReportID] [int] NOT NULL,
[PublisherID] [int] NOT NULL,
[ReportCountryID] [int] NULL,
[ReportName] [nvarchar](400) NOT NULL,
[ReportNameTranslated] [nvarchar] (400) NULL,
[ReportNameCode] [nvarchar] (400) NOT NULL,
[ReportUrl] [nvarchar](400) NULL,
[LastModified] DATETIME2(6) NOT NULL, 
[Active] BIT NOT NULL, 
CONSTRAINT [PK_Catalog.Reports] PRIMARY KEY ([ReportID])
) 
GO 

ALTER TABLE [UDM].[Catalog.Reports] ADD CONSTRAINT [FK_PublisherID_Catalog_Reports] FOREIGN KEY([PublisherID])
REFERENCES [UDM].[Catalog.Publishers] ([PublisherID])
GO

ALTER TABLE [UDM].[Catalog.Reports] ADD CONSTRAINT [FK_ReportCountryID_Catalog_Reports] FOREIGN KEY([ReportCountryID])
REFERENCES [UDM].[Catalog.Areas] ([AreaID])
GO

Was this helpful?