The Winning Font in Court Opinions | Free Law Project

Michael Lissner

At CourtListener, we’re developing a new system to convert scanned court
documents to text. As part of our development we’ve analyzed more than
1,000 court opinions to determine what fonts courts are using.

Now that we have this information, our next step is to create training
data for our OCR system so that it specializes in these fonts. For now,
we’ve attached a spreadsheet with our findings and a script that others
can use to extract font metadata from PDFs.
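
For readers who want a feel for what this kind of extraction involves, here is a minimal sketch. It is not the attached script. It assumes the `pdffonts` command-line tool from Poppler is installed, and the function names (`parse_pdffonts_output`, `fonts_in_pdf`, `tally_fonts`) are hypothetical, chosen for illustration. It also assumes that the first whitespace-separated token on each row of `pdffonts` output is the font name, which may not hold for unusual names.

```python
import re
import subprocess
from collections import Counter

# Subsetted fonts carry a six-letter tag, e.g. "ABCDEF+TimesNewRomanPSMT".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def parse_pdffonts_output(text):
    """Return the font names listed in the text printed by `pdffonts`.

    Assumption: the first two lines are the column header and the dashed
    rule, and each remaining row starts with the font name.
    """
    names = []
    for line in text.splitlines()[2:]:
        if not line.strip():
            continue
        raw_name = line.split()[0]
        names.append(SUBSET_PREFIX.sub("", raw_name))
    return names


def fonts_in_pdf(path):
    """Run `pdffonts` on one PDF and return its font names."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return parse_pdffonts_output(result.stdout)


def tally_fonts(paths):
    """Count how many of the given PDFs use each font."""
    counts = Counter()
    for path in paths:
        # Use a set so a font is counted once per document.
        counts.update(set(fonts_in_pdf(path)))
    return counts
```

Calling `tally_fonts` on a list of PDF paths and then `most_common()` on the result would give a ranking of fonts by the number of documents that use them. Counting each font once per document is a design choice made for this sketch; a per-page or per-character count would answer a different question.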

Unsurprisingly, the top font — drumroll please — is Times New Roman.

Attachments

extract_font_metadata_from_files.py_.txt

font-analysis.ods

