Mark E. Phillips: Compressibility of the DPLA Creator Field by Hub

This is the second post in a series exploring the metadata from the Digital Public Library of America.
In the first post I introduced the idea of using the compressibility of a field as a measure of quality.
In this post I want to look specifically at the dc.creator field in the DPLA metadata dataset.
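To make the idea of compressibility concrete, here is a minimal sketch of one way it could be measured: compress all values of a field together and compare the compressed size to the original size. This is an illustration under assumptions, not necessarily the method used in this series. The function name `compression_ratio`, the choice of zlib, the newline-joined serialization, and the sample creator values are all hypothetical.

```python
import zlib


def compression_ratio(values):
    """Return compressed size divided by original size for a list of strings.

    Assumption: the values are joined one per line and compressed with zlib.
    A lower ratio means the values are more repetitive.
    """
    raw = "\n".join(values).encode("utf-8")
    if not raw:
        return 0.0  # nothing to compress, so avoid dividing by zero
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical sample data: one value repeated many times versus many distinct values.
repetitive = ["Smith, John"] * 1000
varied = [f"Creator {i:04d}, Name {i * 7919 % 10007}" for i in range(1000)]

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"varied:     {compression_ratio(varied):.3f}")
```

A field whose values repeat often compresses to a small fraction of its original size, while a field full of distinct strings compresses much less. How that ratio should be read as a quality signal depends on the field and on the rest of the analysis.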
DC.Creator Overview
The first thing to do is to give you an overview of the creator field in the DPLA metadata dataset.
As I mentioned in the last post, there are a total of 15,816,573 records in the dataset I’m working with. These records are contributed from a wide …

Source: Planet Code4Lib