Congratulations to 2022-2023 On the Books State Partners:
the University of South Carolina and the University of Virginia!
Partners were selected through a competitive call for proposals (see below). University libraries at the University of South Carolina and the University of Virginia are working with the On the Books team to create legal corpora and identify Jim Crow language in their states’ laws. The partners have diverse project teams made up of experts in legal information, Jim Crow, and machine learning.
Partners are using the workflows and documentation created by the On the Books team to create well-documented textual datasets of laws enacted in their states during the Jim Crow era (1866-1967). They will also use machine learning to identify Jim Crow language in the laws. Partners will train their models using the training set created by the On the Books team, which contains laws that were labeled by experts as either “Jim Crow” or “not Jim Crow”. The cohort meets monthly to work through any challenges that arise while adapting the scripts and workflows created for North Carolina to work with the digitized legal volumes for South Carolina and Virginia.
This work will continue through 2023.
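As a rough illustration of the supervised classification idea described above (train on laws labeled by experts, then predict a label for unseen laws), here is a minimal sketch using a small Naive Bayes classifier written with the Python standard library. It is not the OTB team's model or code: the function names (`tokenize`, `train`, `classify`), the choice of Naive Bayes, and the toy sentences are all assumptions made for illustration, and the toy sentences are invented placeholders rather than entries from the OTB training set.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_laws):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of training laws with that label
    for text, label in labeled_laws:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out a label.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented placeholder sentences (assumption), NOT drawn from the OTB training set.
TOY_TRAINING_SET = [
    ("separate schools shall be maintained for each race", "Jim Crow"),
    ("separate waiting rooms shall be provided for each race", "Jim Crow"),
    ("the county shall levy a tax for road repairs", "not Jim Crow"),
    ("the treasurer shall publish an annual report", "not Jim Crow"),
]

model = train(TOY_TRAINING_SET)
prediction = classify("separate coaches for each race", model)
```

In practice a partner team would replace the toy list with the expert-labeled training set and would likely use an established machine learning library; every predicted label would still go to the team's scholarly expert for review.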
Call for Proposals (CLOSED)
Partner States ($55,000 award)
Teams from two states are being funded.
On the Books: Jim Crow and Algorithms of Resistance (OTB) is a project from the University Libraries at the University of North Carolina at Chapel Hill that has created a text corpus of NC General Statutes from 1866-1967 and used text analysis and expert assessments to identify laws likely to be Jim Crow laws. The Jim Crow laws identified by the project are available as a plain text corpus and can be searched from the OTB website. The project website lists and contextualizes the Jim Crow laws and provides educational resources. A GitHub repository provides documented scripts generated by the project team.
OTB has received funding from the Mellon Foundation to expand the project to two additional states and facilitate the use of OTB products in research and teaching. Funds will be regranted to teams from two partner states, who will have sixteen months to create corpora for their own states and use the OTB training set to identify laws likely to be Jim Crow laws.
The workflow and documentation from OTB are available for use, and the OTB project team will meet with partner states monthly to provide guidance in completing the work. The workflow will need to be altered to accommodate the specific format of the laws from other states. The OTB team is eager to collaborate, learn from partners, and determine the best workflow. Deliverables for partner states may vary somewhat depending on the partners’ vision for the project, but their participation must include:
- Attendance at monthly update meetings with teams from both partner states and the OTB project team.
- Creation of a well-documented plain text corpus of state laws during the Jim Crow era (1866-1967), to be made publicly accessible from a repository.
  - Complete list of volumes for inclusion identified
  - Metadata identified and collected
  - Quality assessment of OCR, whereby randomly selected words and pages are checked for accuracy
- Creation of a well-documented plain text corpus of Jim Crow language identified using machine learning and the OTB training set.
  - Analysis (supervised classification)
  - The algorithm used to identify Jim Crow language will be assessed by the OTB project team to ensure replicability and success
  - Review of results, whereby the team’s scholarly expert will review the Jim Crow language identified by the algorithms to confirm whether laws that were identified through machine learning are Jim Crow or not Jim Crow
- Corpora must be made publicly accessible from a repository.
- Sustainability plan (to be submitted as part of the proposal).
- Promotion of the project by writing papers or presenting at relevant conferences.
- Submission of a final report.
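The OCR quality assessment named in the deliverables (checking randomly selected words and pages for accuracy) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the OTB workflow: the helper names `sample_pages` and `word_accuracy`, the page identifier format, the sample size, and the example word pairs are all hypothetical.

```python
import random


def sample_pages(page_ids, sample_size, seed=0):
    """Draw a reproducible random sample of pages for manual checking."""
    pages = list(page_ids)
    rng = random.Random(seed)  # fixed seed so the same pages can be re-drawn
    return rng.sample(pages, min(sample_size, len(pages)))


def word_accuracy(checked_words):
    """Share of manually checked words whose OCR text matches the page image.

    checked_words is a list of (ocr_word, correct_word) pairs recorded by a reviewer.
    """
    if not checked_words:
        raise ValueError("no words were checked")
    matches = sum(1 for ocr_word, correct_word in checked_words
                  if ocr_word == correct_word)
    return matches / len(checked_words)


# Hypothetical page identifiers for one digitized volume (assumption).
pages = [f"volume1887_page{n:04d}" for n in range(1, 501)]
to_review = sample_pages(pages, sample_size=25, seed=42)

# Hypothetical reviewer notes: (word as OCR produced it, word as printed).
checks = [("county", "county"), ("tbe", "the"), ("shall", "shall"), ("levy", "levy")]
accuracy = word_accuracy(checks)  # 3 of the 4 checked words match
```

Fixing the random seed is a design choice that lets a second reviewer draw the same pages and repeat the check; the sample size and any acceptance threshold would be for each team to decide.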
The project timeline does not accommodate digitization, so the volumes needed to create the corpus covering 1866-1967 must be already digitized and openly accessible. Some gaps in coverage are acceptable.
Project teams should include members with the following expertise:
- Technical expertise in coding and manipulating textual data sets, to create the corpora and run the analysis. The workflow uses scripts written in Python.
- Collections expertise or expertise in legal information, to identify a comprehensive listing of the volumes needed to create the corpora and provide guidance on the legislative practices of the state.
- Scholarly expertise on Jim Crow, to review model results and verify Jim Crow language (expertise may be in history, African American studies, law, or other relevant disciplines).
- Expertise with cleaning and manipulating data and sufficient time devoted to the project, to prepare the corpora.
Who could apply
Applicant institutions must be in the United States. Project teams can be composed of members from different institutions. Each team will have a subaward issued to a single institution, which could distribute funds to collaborators elsewhere.
How the funds may be used
Funds may be used to compensate staff and faculty time, to pay consultants from outside the institution, or to cover conference travel to present on the work. If graduate students are hired for the project, their contributions should be acknowledged in all outputs and publications.
Funding is from the Mellon Foundation, whose funding guidelines state that tuition and indirect costs cannot be covered.
The proposal process
Potential partners submitted a proposal in the following format:
- Name of the project
- Investigators’ names, institutions, bios, and project roles
- List of collaborators or consultants, their titles, institutions/organizations, and project roles
- Proposed activities and rationale (about 1500-2000 words)
  - Vision for the project
  - Description of the work to be done
  - Significance of the proposed work
  - Discuss the rationale for identifying Jim Crow language in the laws for the specific state being proposed
- Budget and Budget Justification
- Describe the availability and provide the location of the digitized volumes needed to create the corpora. Describe any collection gaps in the dates of interest (1866-1967).
- Data Management / Sustainability Plan
  - Describe the data files you will produce and how you will preserve them in the long term. What metadata will be provided, or how do you anticipate documenting your work to make it usable by others? In what open-access repository will you preserve your work? Any repository should be consulted before being listed in your proposal. How will products be sustained over time?
- Statement of support from library director or equivalent, ensuring that the investigators have the skills and capacity to accomplish the proposed research.
- Cover letter from the PI describing how the project will be organized, referencing the Mellon Foundation’s grantmaking policies and indicating that the project team’s organization will comply with them.
The selection committee favored proposals from diverse teams that demonstrated the skills needed to complete the work. Although ideal projects would have complete collections of digitized legal volumes, a comprehensive set is not a requirement.
Submission Deadline: April 15, 2022. A selection committee chose partners on April 29, 2022.
Project Initiation: August 2022
Project Completion: November 2023
Contact: Send questions to Brianna Nuñez: firstname.lastname@example.org