heritage.site

About

heritage.site is an experimental data project that I started in 2022 to discover and learn tools and practices in the data engineering space.

Essentially, it is based on a full backup of en.wikipedia.org: I filter all articles that relate to a heritage site (those containing a specific infobox template) and join them with the related pageview dump to measure each page's popularity.
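A minimal sketch of that filter-and-join step could look like the following. It is an illustration only, not the actual pipeline behind this site: the template name, the simplified pageview line format (`<project> <title> <views>`), and all function names are assumptions.

```python
import re
from collections import Counter

# Assumed template name, for illustration only; the infobox actually
# used by the project is not stated.
INFOBOX_PATTERN = re.compile(
    r"\{\{\s*Infobox[ _]historic[ _]site", re.IGNORECASE
)


def is_heritage_article(wikitext: str) -> bool:
    """Return True if the article body contains the assumed infobox."""
    return INFOBOX_PATTERN.search(wikitext) is not None


def parse_pageviews(lines, project="en.wikipedia"):
    """Sum views per title from lines shaped '<project> <title> <views>'.

    This line format is a simplified, hypothetical one. Malformed lines
    and lines for other projects are skipped.
    """
    views = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] != project:
            continue
        try:
            count = int(parts[2])
        except ValueError:
            continue
        views[parts[1]] += count
    return views


def rank_by_popularity(articles, views):
    """Join heritage articles with their view counts, most viewed first.

    `articles` maps a title to its wikitext. Titles are matched to the
    pageview data by replacing spaces with underscores.
    """
    heritage = [t for t, text in articles.items() if is_heritage_article(text)]
    joined = [(t, views.get(t.replace(" ", "_"), 0)) for t in heritage]
    return sorted(joined, key=lambda pair: (-pair[1], pair[0]))
```

Articles without the infobox are dropped before the join, and heritage pages that never appear in the pageview data get a count of 0 rather than being discarded.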

This involves a lot of data cleansing.

Then the sites are geographically sorted and displayed on this site for convenient access.

There is a lot more I would like to add to the site to explore more data about this domain, so stay tuned.

December 2022 - Stage 1

June 2023 - Stage 2

February 2024 - Stage 3

New grand total of 122,500 sites

Contact

For comments, feedback, or reports, you can fill out this contact form.

Gabriel