Tips on Searching the Web Archive
Descriptive records for web archives are searchable using the Library’s site-wide search box. A quick link to the Archived Website format is also featured in a tab at the top of this program site. Use the Archived Websites link above to see web archives currently available for public access. From https://www.loc.gov/websites, users may also use the faceted search options as well as the main loc.gov search bar to find descriptive records for the web archives.
Each web archive has an item page displaying the descriptive record and thumbnails (generally taken from the earliest capture in the archive) for each URL collected for an organization or person we targeted for archiving. A web archive will have multiple thumbnails if the URL has changed over time. There may be multiple web archives for different parts of the same organization.
Descriptive records identify the event or thematic collection(s) the web archive is associated with, the Library division responsible for developing the collection, and information such as additional URLs that were collected related to the seed URL – typically content hosted on third-party sites or other related domains that the Library has identified for archiving.
The descriptive records will provide a link to the archived resources; look for the "View Captures" link or click on the thumbnail. A date range showing the available captures will be displayed. If access is restricted to onsite use only, the Access Condition statement will notify the user and the link to the archive will not appear. To limit your search to content available outside Library premises, select “Available Online” using the Access Condition facet.
The Library uses OpenWayback to replay the web archives. The archive URL search allows navigating across time and within the archive. It also allows for search of content that may not have descriptive records. Use it to navigate to a specific year, date and time to see a version of an archived resource.
- In the search box, type or paste in any domain or URL and click on the FIND icon to locate results from the Library's web archive.
- The results default to a calendar showing the most recent dates of captures that are available. The blue bubbles represent available captures; the bigger the bubble, the more captures the Library has for a particular day.
- The timeline under the search box provides an overview of how much, and when, the site was archived. Click anywhere on the timeline to navigate to an earlier year.
- The Library's OpenWayback shows all dates on which a resource has been archived by the Library. Some content appears in the archive beyond the time period of the collection it was selected for. Reasons include:
- Because websites are interlinked and embed content from one another, the crawler picks up bits and pieces of other websites while it is archiving a targeted URL.
- If quality review finds issues with the archiving process, content may be crawled longer than anticipated to get a better capture of the content.
- The Library ingested copies of .gov content collected by the Internet Archive from 1996-2001, resulting in some government content that predates the Library's own program start date in 2000.
- A "Not in Archive" error, or another error restricting access, may appear when navigating the archive. It indicates a resource that the Library has not archived, or one that has been restricted to onsite access. See For Researchers for more details.
- The date of capture may change as you navigate the archive, since the archives are interconnected and interlinked. If a resource has not been archived on a particular date, OpenWayback points you to the closest available capture of the missing content.
Advanced URL Search
Advanced users may want to understand the structure of an archived URL and how to edit it directly. As you navigate the archive, you may notice that the URL in the browser's address bar changes. This URL can be edited to narrow or broaden results, or to look for a specific URL in the archive.
A typical URL in the archive will look like this: http://webarchive.loc.gov/all/19970101000000-20161231235959*/http://loc.gov.
There is much information about the archived resource embedded in that URL. Here's what the various components mean, and how you can edit them:
- webarchive.loc.gov = the Library's web archive.
- /all/ = this indicates that you are in the Library's public version of the archive. Users may also see /legacy/ as they navigate some of our earlier collections, such as the September 11 web archive.
- 19970101000000-20161231235959* = this string of numbers represents the date and time of capture, written as yyyymmddhhmmss (year, month, day, hour, minute, second). In this example, two such timestamps joined by a hyphen span a range of dates. Edit any of these components to narrow the time frame or expand it to see more. For example:
- http://webarchive.loc.gov/all/20160106233839/http://www.loc.gov/ will search for a specific date and time in the archive.
- Using a wildcard in place of the date will return all capture dates in the archive for that URL = http://webarchive.loc.gov/all/*/http://www.loc.gov/
- Using a wildcard in place of a portion of the date will similarly return more targeted results. For instance, if you would like to see captures of loc.gov from 2016 only, you would type = http://webarchive.loc.gov/all/2016*/http://www.loc.gov/
- /http://loc.gov = the URL you're looking for in the archive. To move around the archive in a simple way, edit this final part of the address to point to another URL.
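For readers who prefer to generate these addresses rather than edit them by hand, the components listed above can be assembled with a few lines of Python. This is only a minimal sketch: the function names archive_url and capture_timestamp are hypothetical helpers made up for illustration, and the code assumes the three-part layout (collection, date string, target URL) shown in the examples above. It builds the address as text and does not contact the archive.

```python
from datetime import datetime

# Host name taken from the example URLs above.
ARCHIVE_BASE = "http://webarchive.loc.gov"


def capture_timestamp(moment):
    """Format a datetime as the 14-digit yyyymmddhhmmss string."""
    return moment.strftime("%Y%m%d%H%M%S")


def archive_url(target, timestamp="*", collection="all"):
    """Join collection, date string, and target URL into an archive address.

    The default "*" is the wildcard for all capture dates; a partial value
    such as "2016*" narrows the results to captures from that year.
    """
    return f"{ARCHIVE_BASE}/{collection}/{timestamp}/{target}"


# All capture dates for a URL.
print(archive_url("http://www.loc.gov/"))

# Captures from 2016 only.
print(archive_url("http://www.loc.gov/", "2016*"))

# One specific date and time.
moment = datetime(2016, 1, 6, 23, 38, 39)
print(archive_url("http://www.loc.gov/", capture_timestamp(moment)))
```

The three printed addresses correspond to the wildcard, partial-date, and specific-date examples in the list above.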
Note that if any of those components disappear as you are navigating the archive, you might have left the archive and navigated to the "live" web. When you are using the web archive, if a site isn't 100% archived or if the site is constructed in a certain way, you may wander to the live site in some cases.
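One way to check whether an address still has all of those components is to split it with a pattern. The sketch below is an illustration under stated assumptions, not a description of how OpenWayback itself parses addresses: the pattern recognizes only the /all/ and /legacy/ collections and the date strings (digits, hyphen, wildcard) described above, and the names split_archive_url and capture_datetime are hypothetical. Archive addresses of any other shape would be reported as not matching.

```python
import re
from datetime import datetime

# Assumed layout: host, then collection, then date string, then target URL.
ARCHIVE_PATTERN = re.compile(
    r"^https?://webarchive\.loc\.gov/"
    r"(?P<collection>all|legacy)/"
    r"(?P<timestamp>[0-9*-]+)/"
    r"(?P<target>.+)$"
)


def split_archive_url(url):
    """Return the components as a dict, or None if the layout is not matched."""
    match = ARCHIVE_PATTERN.match(url)
    return match.groupdict() if match else None


def capture_datetime(timestamp):
    """Convert a full 14-digit yyyymmddhhmmss string to a datetime.

    Wildcards, partial dates, and ranges do not name a single moment,
    so None is returned for them.
    """
    if re.fullmatch(r"\d{14}", timestamp):
        return datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return None


parts = split_archive_url(
    "http://webarchive.loc.gov/all/20160106233839/http://www.loc.gov/"
)
print(parts)
print(capture_datetime(parts["timestamp"]))

# A live-web address lacks the archive components, so the result is None.
print(split_archive_url("http://www.loc.gov/"))
```

A result of None here suggests only that the address does not follow the layout assumed in this sketch; it is a hint that you may have moved to the live web, not proof.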