Metropolitan Museum of Art and MoMA Collection Analysis

SQL
Data Analysis
Data Visualization
Data Manipulation
PostgreSQL
This project showcases a series of SQL queries designed to analyze and cross-reference art collection data from two prominent museum databases: MoMA (Museum of Modern Art) and the Metropolitan Museum of Art.
It includes operations such as extracting unique tags from artworks, aggregating artworks by these tags, and joining artist information across both collections based on ULAN (Union List of Artist Names) IDs.
The queries perform tasks such as:
  • Identifying unique tags across artworks and counting their occurrences.
  • Aggregating titles of artworks associated with each tag.
  • Matching artists from MoMA to artworks in the Met collection based on ULAN IDs, and vice versa.
  • Grouping and counting Met artworks associated with MoMA artists.
This analysis provides insights into how artists and artworks are interconnected across these collections, offering a data-driven perspective on the categorization and attribution of artworks within major museum databases.

Extract and list all unique tags from the 'metobjects' table.

Each tag is split from a delimited string in the 'tags' column. Results are ordered alphabetically by tag.
sql
SELECT DISTINCT unnest(string_to_array(tags, '|')) AS tag
FROM metobjects 
ORDER BY tag;

Aggregate titles by tags and count the occurrences of each tag.

First, it unnests the tags from the 'tags' column into individual rows. Then, it aggregates titles associated with each tag into an array and counts them. Results are grouped by tag and ordered by the count of items, descending.
sql
WITH unnested_tags AS (
  SELECT 
    unnest(string_to_array(tags, '|')) AS tag,
    title
  FROM 
    metobjects 
),
aggregated_names AS (
  SELECT 
    tag,
    array_agg(title) AS item_names,  -- Aggregates titles into an array for each tag.
    count(title) AS item_count  -- Counts the number of titles for each tag.
  FROM 
    unnested_tags
  GROUP BY 
    tag
  ORDER BY item_count DESC  -- Orders the results by the count of titles, descending.
)
SELECT * FROM aggregated_names;

Join 'momaartists' with 'metobjects' based on matching ULAN IDs.

Matches are found by extracting the numeric ID from the 'Artist ULAN URL' in 'metobjects' and comparing it with the 'ulan' field in 'momaartists'. This query retrieves the artist's ULAN, display name, and the title of associated items.
sql
SELECT
  ma.ulan,
  ma.displayname,
  mo.title AS met_items  -- Titles from 'metobjects' that match the artist's ULAN.
FROM
  momaartists ma
JOIN
  metobjects mo
ON
  ma.ulan::TEXT = REGEXP_REPLACE(mo."Artist ULAN URL", '.*ulan/(\d+).*', '\1');

Similar to Query 3, but aggregates titles into an array and counts the total items per artist.

Results are grouped by artist ULAN and display name, and ordered by the count of associated items, descending. This query gives an overview of how many items from 'metobjects' are associated with each artist.
sql
SELECT
  ma.ulan,
  ma.displayname,
  array_agg(mo.title) AS met_items,  -- Aggregates matching titles into an array for each artist.
  count(ma.displayname) AS met_items_count  -- Counts the number of items associated with each artist.
FROM
  momaartists ma
JOIN
  metobjects mo
ON
  ma.ulan::TEXT = REGEXP_REPLACE(mo."Artist ULAN URL", '.*ulan/(\d+).*', '\1')
GROUP BY
  ma.ulan,
  ma.displayname
ORDER BY
  met_items_count DESC;  -- Orders artists by the count of associated items, descending.