About the Site and API
Introduction
Open ONI provides access to information about digitized newspaper pages. To encourage a wide range of potential uses, we designed several different views of the data we provide, all of which are publicly visible. Each uses common Web protocols, and access is not restricted in any way. You do not need to apply for a special key to use them. Together they make up an extensive application programming interface (API) which you can use to explore all of our data in many ways.
Details about these interfaces are below. In case you want to dive right in, though, we use HTML link conventions to advertise the availability of these views. If you are a software developer or researcher or anyone else who might be interested in programmatic access to the data in Open ONI, we encourage you to look around the site, "view source" often, and follow where the different links take you to get started.
For more information about the open source Open ONI software please see Open ONI on GitHub. Also, please consider subscribing to the chronam-users discussion list if you want to discuss how to use or extend the software or data from its APIs.
The API
Jump to:
- Search the newspaper directory and digitized page contents using OpenSearch.
- Link using our stable URL pattern for Open ONI resources.
- IIIF views of Open ONI resources.
- Linked Data views of Open ONI resources.
- Bulk Data for research and external services.
- CORS and JSONP support for your JavaScript applications.
Searching newspaper pages using OpenSearch
Searching newspaper pages is possible via OpenSearch. This is advertised in a LINK header element of the site's HTML template as "Open ONI Page Search", using this OpenSearch Description document. The search accepts the following query parameters:
- andtext: the search query
- format: 'html' (default), or 'json', or 'atom' (optional)
- page: for paging results (optional)
Examples:
- https://gahistoricnewspapers.galileo.usg.edu/search/pages/results/?andtext=thomas: search for "thomas", HTML response
- https://gahistoricnewspapers.galileo.usg.edu/search/pages/results/?andtext=thomas&format=atom: search for "thomas", Atom response
- https://gahistoricnewspapers.galileo.usg.edu/search/pages/results/?andtext=thomas&format=atom&page=11: search for "thomas", Atom response, starting at page 11
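The parameters above map directly onto a URL query string. As a minimal sketch, a search link could be assembled in Python as follows. The helper name `build_search_url` is hypothetical and not part of Open ONI; the base URL is copied from the examples.

```python
from urllib.parse import urlencode

# Base URL copied from the examples above.
SEARCH_URL = "https://gahistoricnewspapers.galileo.usg.edu/search/pages/results/"


def build_search_url(andtext, fmt=None, page=None):
    """Build a page-search URL. fmt and page are optional, as documented."""
    params = {"andtext": andtext}
    if fmt is not None:
        params["format"] = fmt  # 'html', 'json', or 'atom'
    if page is not None:
        params["page"] = page
    # urlencode percent-encodes values, so multi-word queries are safe.
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url("thomas", fmt="atom", page=11))
```

Leaving `fmt` unset relies on the documented default of an HTML response.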
Link to Open ONI Resources
Open ONI uses links that follow a straightforward pattern. You can use this pattern to construct links into specific newspaper titles, to any of its available issues and their editions, and even to specific pages. These links can be readily bookmarked and shared on other sites.
We are committed to supporting this link pattern over time, so even if we change how the site works, we will redirect any requests to the system using this specific pattern.
The link pattern uses LCCNs, dates, issue numbers, edition numbers, and page sequence numbers.
Examples:
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/: title information for LCCN sn82014351
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/1865-12-30/ed-1/: first available edition from December 30, 1865
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/1865-12-30/ed-1/seq-1/: first available page from first edition, December 30, 1865
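Judging from the examples, the pattern nests as LCCN, then issue date, then `ed-N`, then `seq-N`. The following sketch builds such links. The function name `resource_url` and its parameters are illustrative assumptions rather than anything Open ONI provides.

```python
from datetime import date

# Site root copied from the examples above.
BASE_URL = "https://gahistoricnewspapers.galileo.usg.edu"


def resource_url(lccn, issue_date=None, edition=None, sequence=None):
    """Build a title, issue/edition, or page link, depending on the arguments."""
    parts = ["lccn", lccn]
    if issue_date is not None:
        parts.append(issue_date.isoformat())  # YYYY-MM-DD
        if edition is not None:
            parts.append(f"ed-{edition}")
            if sequence is not None:
                parts.append(f"seq-{sequence}")
    # Each level is only added when the level above it is present.
    return BASE_URL + "/" + "/".join(parts) + "/"


print(resource_url("sn82014351"))
print(resource_url("sn82014351", date(1865, 12, 30), edition=1, sequence=1))
```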
IIIF Views
In addition to the use of JSON in OpenSearch results, there are also IIIF Presentation API and Image API JSON views available for various resources. These IIIF views are typically linked from their HTML representation using the LINK element. For example:
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351.json: title information for LCCN sn82014351 as a IIIF Collection
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/1865-12-30/ed-1.json: first available edition from December 30, 1865 as a IIIF Manifest
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/1865-12-30/ed-1/seq-1.json: first available page from first edition, December 30, 1865 as a IIIF Canvas
- https://gahistoricnewspapers.galileo.usg.edu/newspapers.json: a list of all newspaper titles for which there is digital content, represented as a IIIF Collection
- /reports/batches.json: a list of all batches of content that have been loaded
- https://gahistoricnewspapers.galileo.usg.edu/reports/batches/batch_dlc_fairview_ver01.json: detailed information about a specific batch as a IIIF Collection
Linked Data
Linked Data allows us to connect the information in Open ONI directly to related data on the Web explicitly. Open ONI provides several Linked Data views to make it easy to connect with other information resources and to process and analyze newspaper information with conceptual precision.
We use concepts like Title (defined in DCMI Metadata Terms) and Issue (defined in the Bibliographic Ontology) to describe newspaper titles and issues available in the data. Using these concepts, defined in existing ontologies, can help to ensure that what we mean by "title" and "issue" is consistent with the intent of other publishers of linked data.
These elements are used in RDF views of several types of pages, ranging from a list of the newspaper titles available on the site and information about each, to enumerations of all the pages that make up each issue and all of the files available for each page.
Examples:
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351.rdf: information about The colored American.
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/1865-12-30/ed-1.rdf: information about The colored American. [1865-12-30]
- https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/1865-12-30/ed-1/seq-1.rdf: details about all of the files associated with The colored American. (Augusta, Ga.) 1865-1866, December 30, 1865, Image 1
- https://gahistoricnewspapers.galileo.usg.edu/newspapers.rdf: list of available newspaper titles
Comparing the RDF versions of the links above with their HTML counterpart links, you might notice that the URI pattern we follow for these views is to remove the final slash, replacing it with ".rdf". We follow this pattern to comply with best practices for publishing linked data, and also to keep the URIs easy to understand and use.
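The rule described here is mechanical enough to express in a few lines. The sketch below uses a hypothetical helper, `alternate_view_url`, to derive the alternate view from an HTML link. The `.json` links in the IIIF examples appear to follow the same shape, which is an observation from the examples rather than a documented guarantee.

```python
def alternate_view_url(html_url, extension):
    """Drop the trailing slash from an HTML link and append '.<extension>'."""
    return html_url.rstrip("/") + "." + extension


issue = "https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/1865-12-30/ed-1/"
print(alternate_view_url(issue, "rdf"))
# -> https://gahistoricnewspapers.galileo.usg.edu/lccn/sn82014351/1865-12-30/ed-1.rdf
```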
For each of the HTML pages with a linked data counterpart in RDF, we provide links to those alternate views from the HTML page using the LINK header element. This can support automating the process of using the RDF data in tools like bookmarklets, plugins, and scripts, and it also helps us to advertise the availability of the additional views. In many views, such as newspaper page images, we also provide LINK elements pointing to the various available files (image, text, OCR coordinate XML) for each available page or other potentially useful information. We encourage you to explore the entire site and to look for and use these LINK elements. Just follow your nose, and view the source.
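A script can "view the source" in the same way by collecting LINK elements from a page. The following is a minimal sketch using only the Python standard library. The sample markup, including its `rel` and `type` values, is a hypothetical stand-in and may not match what the site actually emits, so inspect a real page before relying on specific attributes.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the attributes of every <link> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append(dict(attrs))


# Hypothetical markup for illustration only.
SAMPLE_HTML = """
<html><head>
<link rel="alternate" type="application/rdf+xml" href="/lccn/sn82014351.rdf">
<link rel="alternate" type="application/json" href="/lccn/sn82014351.json">
<link rel="stylesheet" href="/static/site.css">
</head><body></body></html>
"""


def alternate_links(html):
    """Return the <link> elements that advertise alternate views."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if link.get("rel") == "alternate"]


for link in alternate_links(SAMPLE_HTML):
    print(link.get("type"), link.get("href"))
```

In practice the HTML would come from fetching a page on the site, and relative `href` values would need to be resolved against the page URL, for example with `urllib.parse.urljoin`.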
In addition to the concepts described above, we use concepts from several other vocabularies in describing materials and also in linking to related data available on other sites. These additional vocabularies and external sites include:
- DBpedia
- Dublin Core and DCMI Terms
- FRBR concepts in RDF
- GeoNames
- LCCN Permalink
- lingvoj.org
- OAI-ORE
- OWL
- RDA
- WorldCat
We are grateful to all of these providers and we hope we can follow their lead in encouraging additional connections between data and vocabulary providers. Please be aware that how we use these vocabularies will likely change over time, as they continue to develop, and as new vocabularies are introduced.
Bulk Data
In certain situations the granular access provided by the API may be somewhat constraining. For example, perhaps you are a researcher who would like to try out new indexing techniques on the millions of pages of OCR data. Or perhaps you are a service provider and anticipate needing to support a high volume of fulltext searches across the corpus, and do not want the Open ONI API as an external dependency. To support these and other potential use cases we are beginning to provide bulk access to the underlying data sets. The initial bulk data sets include:
- Batches: each batch of digitized content is made available via the Batches HTML, Atom and JSON views. These views provide links to where the files comprising the batch can be fetched with a web crawling tool like wget.
- OCR Bulk Data: the complete set of OCR XML and text files that make up the newspaper collection is made available as compressed archive files. These files are listed in the OCR report, and are also made available via Atom and JSON feeds that will allow you to build automated workflows for updating your local collection.
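One way to build such a workflow is to read the Atom feed, collect the links in each entry, and fetch only the archives you do not already have. The sketch below relies only on the standard Atom namespace. The sample feed, its file names, and the `example.org` URLs are invented for illustration, and the real feed may carry additional elements.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed for illustration; not copied from the real OCR feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical OCR feed</title>
  <entry>
    <title>part-000001.tar.bz2</title>
    <link rel="alternate" href="https://example.org/ocr/part-000001.tar.bz2"/>
  </entry>
  <entry>
    <title>part-000002.tar.bz2</title>
    <link rel="alternate" href="https://example.org/ocr/part-000002.tar.bz2"/>
  </entry>
</feed>"""


def archive_links(feed_xml):
    """Return the href of every link found inside an Atom entry."""
    root = ET.fromstring(feed_xml)
    links = []
    for entry in root.iter(ATOM + "entry"):
        for link in entry.findall(ATOM + "link"):
            href = link.get("href")
            if href:
                links.append(href)
    return links


def missing_archives(feed_xml, already_downloaded):
    """Return feed links that are not in the set of URLs already fetched."""
    return [url for url in archive_links(feed_xml) if url not in already_downloaded]


have = {"https://example.org/ocr/part-000001.tar.bz2"}
print(missing_archives(SAMPLE_FEED, have))
```

Tracking what has been downloaded by URL is the simplest choice; comparing entry timestamps would be an alternative if archives are ever republished under the same name.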
CORS and JSONP Support
To help you integrate Open ONI into your JavaScript applications, the OpenSearch and AutoSuggest JSON responses support both Cross-Origin Resource Sharing (CORS) and JSON with Padding (JSONP). CORS and JSONP allow your JavaScript applications to talk to services without the need to proxy the requests yourself.
CORS Example
curl -i 'http://chroniclingamerica.loc.gov/suggest/titles/?q=manh'

HTTP/1.1 200 OK
Date: Mon, 28 Mar 2011 19:45:34 GMT
Expires: Tue, 29 Mar 2011 19:45:37 GMT
ETag: "7d786bec2ca003d86009f8ccdfd72912"
Cache-Control: max-age=86400
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-Requested-With
Content-Length: 7045
Last-Modified: Mon, 28 Mar 2011 19:45:37 GMT
Content-Type: application/x-suggestions+json

[
    "manh",
    [
        "Manhasset life. (Manhasset, N.Y.) 19??-19??",
        "Manhasset mail. (Manhasset, N.Y.) 1927-1986"
    ],
    [
        "sn97063690",
        "sn95071148"
    ],
    [
        "http://chroniclingamerica.loc.gov/lccn/sn97063690/",
        "http://chroniclingamerica.loc.gov/lccn/sn95071148/"
    ]
]
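Going by the example above, the response body is a four-element array: the query string, then parallel lists of titles, LCCNs, and URLs. The sketch below pairs these up into one record per suggestion. The function name `parse_suggestions` and the record keys are illustrative choices, and the positional meaning is inferred from the example rather than from a formal specification.

```python
import json

# Body copied from the example response above.
RESPONSE_BODY = """[
    "manh",
    ["Manhasset life. (Manhasset, N.Y.) 19??-19??",
     "Manhasset mail. (Manhasset, N.Y.) 1927-1986"],
    ["sn97063690", "sn95071148"],
    ["http://chroniclingamerica.loc.gov/lccn/sn97063690/",
     "http://chroniclingamerica.loc.gov/lccn/sn95071148/"]
]"""


def parse_suggestions(body):
    """Split the array into the query and a list of per-title records."""
    query, titles, lccns, urls = json.loads(body)
    records = [
        {"title": title, "lccn": lccn, "url": url}
        for title, lccn, url in zip(titles, lccns, urls)
    ]
    return query, records


query, records = parse_suggestions(RESPONSE_BODY)
for record in records:
    print(record["lccn"], record["title"])
```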
JSONP Example
curl -i 'http://chroniclingamerica.loc.gov/suggest/titles/?q=manh&callback=suggest'

HTTP/1.1 200 OK
Date: Mon, 28 Mar 2011 19:45:34 GMT
Expires: Tue, 29 Mar 2011 19:45:37 GMT
ETag: "7d786bec2ca003d86009f8ccdfd72912"
Cache-Control: max-age=86400
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-Requested-With
Content-Length: 7045
Last-Modified: Mon, 28 Mar 2011 19:45:37 GMT
Content-Type: application/x-suggestions+json

suggest([
    "manh",
    [
        "Manhasset life. (Manhasset, N.Y.) 19??-19??",
        "Manhasset mail. (Manhasset, N.Y.) 1927-1986"
    ],
    [
        "sn97063690",
        "sn95071148"
    ],
    [
        "http://chroniclingamerica.loc.gov/lccn/sn97063690/",
        "http://chroniclingamerica.loc.gov/lccn/sn95071148/"
    ]
]);
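The only difference from the plain JSON response is the padding: the same array wrapped in a call to the function named by the `callback` parameter. In a browser, the script tag executes that call directly. For a non-browser client that receives the padded form, the wrapper can be stripped before parsing, as in this sketch. The helper `unwrap_jsonp` is hypothetical, and the shortened body is illustrative.

```python
import json
import re


def unwrap_jsonp(body, callback):
    """Strip 'callback( ... );' padding and parse the JSON inside it."""
    pattern = r"^\s*" + re.escape(callback) + r"\((.*)\)\s*;?\s*$"
    match = re.match(pattern, body, re.DOTALL)
    if match is None:
        raise ValueError("body is not wrapped in a call to " + callback)
    return json.loads(match.group(1))


# Shortened, illustrative body in the same shape as the example above.
padded = 'suggest(["manh", ["Manhasset life. (Manhasset, N.Y.) 19??-19??"], ["sn97063690"], ["http://chroniclingamerica.loc.gov/lccn/sn97063690/"]]);'
data = unwrap_jsonp(padded, "suggest")
print(data[0], data[2])
```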
CORS is arguably a more elegant solution, and is supported by most modern browsers. However, JSONP might be a better option if your application needs legacy browser support.