Automated Legislation Summaries
Published on Oct 10, 2024
Chicago Councilmatic is now showing auto-generated summaries for most non-routine legislation!
The City Clerk provides a useful one-sentence description for each bill already, but for more complex bills, you’ve needed to dig through the many legally-dense pages of a given ordinance to understand what it’s about. As a site that aims to make Chicago City Council more accessible to the public, the legislative text itself has always been a huge barrier.
This is actually where a Large Language Model (LLM) like ChatGPT can be useful. By taking in the ordinance text and asking it to provide a short, detailed, and accurate summary, users would be able to get a useful overview quickly without needing to scroll through pages and pages of scanned PDFs. Sounds like an easy win, right? Well, not exactly. Doing it required a bit of work.
First, we needed to figure out what bills were worth summarizing. As Chicago City Council has about 150,000 pieces of legislation, trying to summarize all of them would take a long time and cost us a lot of money (each bill ended up costing us about $0.06 in ChatGPT). As a first pass, we decided to focus on bills that were flagged as ‘key legislation’ (something the Clerk’s website indicates) and from the current legislative session starting in mid-2023 (we may go back and add older legislation later).
Next, we needed to extract the text from each of the documents for each bill and save them as a text file. For this we used tesseract, an OCR command line tool that converts PDFs or images to text.
Finally, we passed this text to Open AI’s ChatGPT API with a prompt to summarize it in 50 words or less. The first time we tried this, however, the summary was not really good. It turns out, we needed to fine tune our ChatGPT prompts and break them up into different pieces. After some research, we found that others have spent some time thinking about legislative summaries, and we found the llm-text-summarization library on GitHub by Sourajit Roychowdhury. If you’re interested in the details of how it works, they provided a useful summary of why and how it works here.
Design and Architecture of llm-text-summarization
To make it compatible with our existing data update process, we ended up forking the llm-text-summarization project and cleaned up the interface a bit to make it easier to call from our existing scripts. You can see how the new bill summarization pipeline works in our chicago-council-scrapers repo.
We consider this a first version of this feature, and plan to make improvements to it over time. If you notice any issues or have suggestions for us on how to improve it, please send us an email at info@datamade.us.
Showing auto-generated legislation summary for SO2024-0007838