Vitaly Meursault - Federal Reserve Bank of Philadelphia
Mapping Inventions in the Space of Ideas, 1836–2022: Representation, Measurement, and Validation.
Ina Ganguli, Jeffrey Lin, Vitaly Meursault, Nicholas Reynolds
How well can different methods meaningfully represent inventions in the “space of ideas”? We evaluate four leading natural language processing (NLP) models, each of which produces a different numerical representation of patent text. We design three novel, domain-specific validation tasks to choose among these representations. Sentence-BERT (S-BERT) significantly outperforms other widely used NLP models, creating metrics better aligned with both expert and non-expert human judgment about patent similarity. The choice of representation matters substantially for economic measurement. According to S-BERT, contemporaneous patents have declined in similarity over more than a century, as inventions have “spread out” on an expanding knowledge frontier. Other representations report ambiguous or diverging patterns. We reproduce the S-BERT result using newly digitized records of historical interferences, which show secular declines in the rate of multiple invention. Our results highlight the importance of validation and model selection as an essential step in constructing and using measures derived from patent text.
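To make the idea of similarity among contemporaneous patents concrete, here is a minimal sketch of one way such a metric could be computed once each patent has a numerical representation. It assumes cosine similarity averaged over all pairs within a filing-year cohort; the abstract does not specify the authors' exact metric, so this is an illustration rather than their method. The function names, the `cohorts` dictionary, and the two-dimensional toy vectors are all hypothetical stand-ins for real embeddings such as those produced by S-BERT.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs in one cohort."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings keyed by filing year (not real patent data).
cohorts = {
    1900: [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    2000: [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
}

# One similarity value per cohort; a falling series over time would
# correspond to inventions "spreading out" in the embedding space.
similarity_by_year = {
    year: mean_pairwise_similarity(vecs) for year, vecs in cohorts.items()
}
```

In this toy setup the earlier cohort scores higher than the later one because two of its three vectors coincide. With real data, the result would depend on which representation generates the vectors, which is why the choice of model matters for the measured trend.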