How to Search
This page describes how to use the search engine in detail. Refer to the sidebar at right for particular tips and tricks.
QUICK LINKS TO TOPICS ON THIS PAGE
- What You Can Search
- Who Can Search
- Where You Can Search From
- Basic Search
- Phrase Searches
- Boolean Searches
- Search Results
- Refining Results
- Searching in Context
- Advanced Search
- RIF Search
- FAQ
What You Can Search
The entire contents of this disk are available for searching, including:
- Document titles and page content
- Book titles and page content
- Photograph metadata (title and caption)
- Audio transcripts and audio captioning data (but not untranscribed audio)
- Frontmatter and help pages
- Mary Ferrell Database entries
- Dealey Plaza Witness Database entries
- "RIF" document metadata fields
In the case of document and book pages, each scanned page has been fed through Optical Character Recognition (OCR) software to create searchable text "behind the page." Note that the OCR process is imperfect; the harder it is for you to read the words on a poor-quality page, the harder it is for the computer as well. Do not expect perfection, or even something close to it, for fuzzy or faded text.
You can find out for yourself what the computer thinks the text is on a page - click on the "view searchable text on page" control icon in the toolbar to the left of any scanned page image.
Photograph captioning information, database entries, and frontmatter pages feature searchable text which exactly matches those data types' content.
Who Can Search
All pages on this website are freely available for browsing and viewing. A membership is required for two things:
- access to PDF copies of documents
- to run more than a few complimentary searches
Learn more about a Mary Ferrell Foundation Membership
Where You Can Search From
Searches may be initiated from:
- Search box in banner (to right of tabs)
  - searches entire disk
- Search page (click Adv. Search or RIF Search to left of search box in banner)
  - use control panel to fine-tune searches
- Sidebars on document listing pages
  - searches document set and its subsets
- Sidebars in specialized pages:
  - Books
  - Mary Ferrell Database
  - Dealey Plaza Witness Database
- Search box in Document View page
  - searches just current document
Note that other features of the disk also make use of the search engine, for example the "SEE ALL" feature on the RIF Form tab of document pages.
Basic Search
The simplest way to search is to type one or more words - Ruby, Carrico, tsbd, AMSPELL - into the search box in the banner at the top of any page. Searches are not case-sensitive, so RUBY and ruby and RubY are all identical. Run the search by pressing the Enter key or clicking the Search button at the right edge of the box.
Upon hitting Enter or clicking Search, you will be taken to the Search Results page, which displays the highest-ranking page hits. From the Search Results page, you may click to view matching documents, see more pages or results, or refine your search.
If you typed a single word, the results will show pages which feature that word in the page title or body text.
If you typed multiple words, this is NOT treated as a phrase (use quotes for that, as explained subsequently). Instead, all pages which feature ANY of the words appear in the results. Importantly, pages which feature more than one of the search words, and especially words which are "rare", will be ranked higher.
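The multi-word behavior described above can be pictured with a small sketch. This is not the site's actual scoring formula, which is not documented on this page; the inverse-document-frequency weighting, the rank_pages function, and the sample pages are all assumptions, chosen only to show how pages matching more of the words, and rarer words, rise to the top.

```python
import math

def rank_pages(query, pages):
    """Return (page_id, score) pairs for pages containing ANY query word.

    Hypothetical scoring: each matched word adds an inverse-document-
    frequency weight, so words found on fewer pages count for more.
    """
    words = query.lower().split()
    tokenized = {pid: set(text.lower().split()) for pid, text in pages.items()}
    total = len(pages)
    scores = {}
    for pid, page_words in tokenized.items():
        score = 0.0
        for word in words:
            if word in page_words:
                # Count how many pages contain this word (at least one here).
                df = sum(1 for other in tokenized.values() if word in other)
                score += math.log(1 + total / df)
        if score > 0:
            scores[pid] = score
    # Highest score first; ties keep their original order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up sample pages for illustration only.
pages = {
    "p1": "ruby was seen with oswald",
    "p2": "ruby owned a club",
    "p3": "carrico treated the president",
    "p4": "ruby and carrico both testified",
}
ranked = rank_pages("Ruby Carrico", pages)
```

In this sample, the page containing both words ranks first, and the page containing only the rarer word (carrico) ranks above pages containing only the more common word (ruby).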
Phrase Searches
In order to search for a series of consecutive words, enclose the phrase in quotes, for example "Albert Osborne" or "formation of the warren commission". This will return only pages which feature that exact phrase embedded in them.
Actually, "exact phrase" is not quite accurate. Small "stop words" are removed from search phrases, so "formation of the warren commission" actually turns into "formation warren commission". But the pages being searched go through the same stop-word removal process, so the match works. This does mean, though, that "formation of the warren commission" will match a page which really says "formation by a warren commission", as "of the" and "by a" would all be removed. See #11 in the sidebar for the list of stop words.
Omission of stop words can cause some problems - "to be or not to be" turns into an empty phrase. But usually this "sloppiness" is a good thing, and will result in fewer misses due to minor wording changes or stray punctuation (which is also removed).
Also, a certain amount of "slop" is intentionally introduced into phrase searches, to account for stop words and other slight variations. So "we heard the first report" can be found via "we heard first report" or "we heard a first report" or "we heard a this first report".
Note that stop words are removed from all search text, not just phrases. This can occasionally cause problems, but generally improves search by including hits which differ only by these connective words.
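The stop-word handling described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the search engine's real code: the STOP_WORDS set here is a short made-up sample (the real list is the one referenced in the sidebar), the function names are hypothetical, and the extra "slop" allowance mentioned above is not modeled.

```python
import re

# Assumed sample of stop words, for illustration only; not the site's real list.
STOP_WORDS = {"a", "by", "of", "the", "to", "be", "or", "not"}

def normalize(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def phrase_matches(phrase, page_text):
    """True if the normalized phrase appears as consecutive words on the page."""
    needle = normalize(phrase)
    haystack = normalize(page_text)
    if not needle:
        # The phrase consisted only of stop words, so nothing is left to match.
        return False
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))
```

Because the phrase and the page go through the same normalize step, phrase_matches("formation of the warren commission", "formation by a Warren Commission") is true, while "to be or not to be" reduces to an empty list and matches nothing.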
Due to text recognition errors on scanned pages, the longer the phrase the more likely it will miss simply due to recognition errors. If you know a given phrase should exist, but can't find it, try portions of the full phrase.
Boolean Searches
Since a search can often return a great many results, it can be helpful to hone your search by looking only for pages which contain both word (or phrase) A and word (or phrase) B. For example, to find only pages which contain both the name Maheu and the phrase "mafia plots", you would enter the search like this: maheu and "mafia plots".
Note that boolean search is different from phrase search - in phrase search the phrase words must be matched in sequence; boolean search just requires that the words be present somewhere on the same page.
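The difference between a boolean "and" and a phrase can be shown with a minimal sketch. The helper names and the sample page text are hypothetical, and stop-word removal is left out to keep the example short; the point is only that each term must appear somewhere on the page, while the words inside a quoted phrase must appear in sequence.

```python
import re

def tokens(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(page_tokens, phrase):
    """True if the words of the phrase occur consecutively in page_tokens."""
    needle = tokens(phrase)
    n = len(needle)
    return n > 0 and any(
        page_tokens[i:i + n] == needle for i in range(len(page_tokens) - n + 1)
    )

def matches_and(page_text, terms):
    """True only if every term (a single word or a phrase) occurs on the page."""
    page_tokens = tokens(page_text)
    return all(contains_phrase(page_tokens, term) for term in terms)

# Made-up page text for illustration.
page = "Maheu was questioned about the mafia plots against Castro."
```

Here matches_and(page, ["maheu", "mafia plots"]) is true, and so is matches_and(page, ["plots", "mafia"]) because separate terms may appear in any order, but matches_and(page, ["maheu", "plots mafia"]) is false because the phrase words are out of sequence.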
Search Results
Regardless of where a search is invoked from, you are taken to the Search Results page, where the search is conducted and results displayed. If the search was invoked from the banner search box, then the search is conducted across all page types, and a "segregated" results page is shown.
Below a line which displays the number of total hits, the segregated results page shows the top few hits for each type of data; these sections are:
- Frontmatter Pages (help pages and other html pages)
- Essays
- Reports (documents called out specially because they are major reports)
- Documents (all other documents)
- Books
- Journals
- Mary Ferrell Database
- Photos
- Dealey Plaza Witness Database
Each section title displays the number of page hits for that page type. Within each such section, the top few hits are displayed. Each hit consists of:
- If book or photo, thumbnail image
- Title and page number if relevant (the title is a link to view the page)
- Collection (link to document set or resource)
- One or a few lines featuring the searched word or phrase in context, bolded
- Link to see all page hits in the document (if relevant)
Note that each section header has a triangle at its left. Click anywhere in the section header bar to "roll up" that section to hide its results. If you are only interested in reports and documents, for example, you may hide the others. You can roll and unroll each section at any time; the settings you choose are "sticky" and persist across searches until you change them again.
Refining Results
The title (first line in hit) is a link to view the search hit, whether that is a page in a document, a Mary Ferrell Database entry, a photograph, an essay, etc. Books are the exception - for copyright reasons only the short snippet shown on the results page is available.
Note that for journals, documents, and reports (which are just specially-noted documents), all the page hits in that document are "rolled up" into one result. You may see a link at the bottom of the hit of the form "see all page hits in this document." Click on that link to go to a separate page where you can see all the page hits in that document, report, or journal issue.
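The "roll up" of page hits into one result per document can be pictured with a short sketch. The roll_up function and the document identifiers are hypothetical; this only illustrates the grouping idea, not how the site implements it.

```python
def roll_up(page_hits):
    """Group (document_id, page_number) hits into one entry per document.

    Documents keep the order in which their first hit appeared.
    """
    grouped = {}
    for doc_id, page_number in page_hits:
        grouped.setdefault(doc_id, []).append(page_number)
    return list(grouped.items())

# Made-up hits: two pages in one document, one page in another.
hits = [("doc-A", 3), ("doc-B", 1), ("doc-A", 7)]
results = roll_up(hits)
```

Each entry in results stands for one line on the results page; the list of page numbers inside it corresponds to what the "see all page hits in this document" link would reveal.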
If the section - books, reports, documents, etc. - has more hits than the few shown on the segregated results page, the hits are followed by a link to see the next set of results for that type only.
You may "drill down" to see all search hits for a given page type, all hits within a document set (collection), or all hits within a single multi-page document. If there are more hits than are first presented on the results page, a "see more results" button will appear at the bottom. Click that to add more results to the page.
Note that when you click a search hit title to go view the document page or database entry or whatever it is, usually the words or phrases being searched for are highlighted on that page, as shown below.
Searching in Context
The search banner isn't the only place you can search from. There are specialized "search sidebars" in the following places:
- Document set listing pages. Search is scoped to include only documents in the current document set and sub-sets.
- Books main page. Searches only books.
- Photos main page. Searches only photos.
- Mary Ferrell's Database pages. Searches only Mary Ferrell Database entries.
- Stewart Galanor's Dealey Plaza Witness Database. Searches only the Dealey Plaza Witness Database, and shows results on-page rather than going to Search Results.
By design, any of these scoped searches will not result in a "segregated" search results page, but rather one which shows only hits within a given page type (document, book, photo, Mary Ferrell Database, etc.). As noted earlier, the Dealey Plaza Witness Database is special in that searches are presented within the Database page.
Additionally, the document viewer used to read a given document has a within-document search system. A search popup allows searches to be entered, results to be displayed right in the popup, and links to jump to the marked page.
Advanced Search
The search results page doubles as an Advanced Search page, and can be reached directly by clicking on the ADV SEARCH tab in the banner atop each page.
This page allows more control over your search than can be done from the banner search box or search sidebars. By default, these special controls are hidden. Click "enable filters" below the search box to turn them on. The full search control panel has these features:
- Search for: Search entry box with "X" button to clear the input, and "Search" button to initiate the search.
- Search in: Checkboxes to control exactly which page types are searched
- Confine search to document set(s): Revealable panel to control which document sets are searched
- Limit search to date range(s): Date input boxes for setting a date range on search results
Search in: The default is "All", which searches all document types. You can click on one or more of the other types to limit searches to just reports, non-report documents, books, essays, photos, Mary Ferrell Database entries, and other types of pages.
Confine search to document set(s): Clicking on the "change" link where the current document scoping is shown reveals a panel where one or more "document sets" (collections) may be checked for inclusion in the search. Note that this is an advanced feature which is often not needed, because alternatively you can use search sidebars to scope to a particular document set and its sub-sets.
Note that only "top-level" docsets are included in the list - there are several hundred docsets and it would be too cumbersome to display them all. Selecting one of these top-level docsets automatically includes its child sub-sets.
The revealed panel also has a "clear all" link for convenience to reset all checkmarks to off. When no docsets are selected, this means all docsets are included.
Limit search to date range(s): Type in a date of the form MM/DD/YYYY, or use the datepicker popup to select a start date, an end date, or both. Only documents tagged with a date within the specified range will be returned. You may leave one of the date fields blank to keep the start or end date open-ended. Note that this feature has limitations: many documents are undated, and also all documents, including larger reports, have a single date for the entire document, not for individual pages or sections.
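The date-range rules above can be summarized in a brief sketch. The function names are hypothetical, and treating the start and end dates as inclusive is an assumption; the page itself only says that the document's date must fall within the specified range.

```python
from datetime import date, datetime

def parse_date(text):
    """Parse an MM/DD/YYYY string; a blank field means open-ended (None)."""
    text = text.strip()
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None

def in_range(doc_date, start_text="", end_text=""):
    """True if a document with the given single date passes the filter.

    doc_date is a datetime.date, or None for an undated document.
    """
    start, end = parse_date(start_text), parse_date(end_text)
    if start is None and end is None:
        return True   # no date filter in effect
    if doc_date is None:
        return False  # undated documents cannot fall within a range
    if start is not None and doc_date < start:
        return False
    if end is not None and doc_date > end:
        return False
    return True
```

For example, in_range(date(1964, 1, 1), "11/01/1963", "") is true because the end date is left open, while an undated document is excluded as soon as either date field is filled in.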
RIF Search
The Search page is really two search pages in one. Click the "RIF Search" radio button on that page, or click "RIF SEARCH" in the banner tab area, and the Search Page interface changes to accommodate searching of Record Identification Forms, or RIFs.
This site contains over 100,000 documents which contain such "metadata", assigned by the National Archives, and these documents feature the printed RIF page as the first page in the document. The fields on these forms include Agency, Date, From, To, Subjects, Classification, and more, as described below.
You can search a single field for matching values by selecting the field in the dropdown and then typing your search term in the box, and hitting Enter or clicking Search. There are some subtleties associated with particular fields as outlined below:
- Record number. You can type in a full record number, as in 104-10400-10027, or you can enter partial RIF numbers such as 104 or 104-10400. Note that complete components must be used; 104-104 yields 0 hits.
- Title. Full search features: enter a single word, multiple words, or a quoted phrase, and use booleans (e.g.: "marina oswald").
- Record series. Must enter exact record series (though not case-sensitive).
- Agency. Must enter exact agency.
- Agency file number. Full search features (e.g.: jcs or "jcs 2304/189").
- Originator. Must enter exact originator.
- From. Full search features (e.g.: taylor).
- To. Full search features (e.g.: "joint chiefs").
- Date. Enter a 3-part slash-separated date, like 11/22/63. Note that unfortunately 11/22/63 and 11/22/1963 are not the same, and RIF dates are encoded irregularly.
- Page. Enter a page count number, like 15.
- Document type. Must enter exact document type.
- Subjects. Subjects fields consist of an array of semicolon-separated values, which are collapsed into a single string. Full search features; searches across the entire field (e.g.: paine ruth).
- Classification. Must enter exact classification.
- Restrictions. Must enter exact restrictions.
- Current status. Must enter exact current status.
- Last review. Enter a 3-part slash-separated date, like 10/04/98.
- Opening criteria. Must enter exact opening criteria.
- Comments. Full search features (e.g.: "index cards").
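The "complete components" rule for record numbers in the list above can be sketched as follows. The function name is hypothetical, and it is an assumption that partial numbers match from the start of the record number, as in the examples 104 and 104-10400.

```python
def record_number_matches(query, record_number):
    """True if the query's hyphen-separated components are the leading
    components of the record number. Partial components do not match."""
    query_parts = query.strip().split("-")
    record_parts = record_number.split("-")
    return (
        len(query_parts) <= len(record_parts)
        and record_parts[:len(query_parts)] == query_parts
    )
```

With the record number 104-10400-10027, the queries 104, 104-10400, and 104-10400-10027 all match, but 104-104 does not, because 104 is only part of the second component 10400.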
The RIF search form supports date-range filtering as well. The other filters are not available (doctype makes no sense, as all RIF-based documents are of type "document", and document set (collection) filtering is not supported).
The search results for RIF searches show more information for each hit; basically it is the entire RIF form in a compact layout. The field being searched is bolded. Note that in RIF searches the ordering of search results is not particularly important - in fact it is essentially random in most cases, except where only a portion of a field is matched; in this case shorter fields with a higher "percentage match" will be shown first.
Clicking on the title of a search result takes you to view the document. When viewing such a document, there will be a tab labeled RIF above the page. Click on the RIF tab, and you will see the RIF data as a form.
IMPORTANT NOTE: The RIF VIEW tab when viewing documents has a special feature. For many of the fields, there is a link to see all documents which feature that same value in that same field. Clicking this link runs a pre-defined search, putting quotes around the value; the ensuing search results page shows all documents with the same value in that same field.
How do you know what possible values are in these fields? The easiest way is to look at some of them. Well over half of the documents on this disk have RIF data associated with them; CIA files in particular almost all have them.
RIF search results by default are sorted by "relevance," which is merely the default sorting among 5 options, selectable via links at the top-right above the search results:
- RELEVANCE - Measure of how often the search term appears compared to document length (default)
- RECORD NUMBER - The document's record number, e.g., 104-10400-10412
- DATE - The document's date
- AGENCY FILE NO. - The agency's internal file number for this document
- TITLE - The document's title
Note that the sortby field is "sticky" while refining a search - changing the search term, filtering by date, and using the "see more" button. If you click the RIF SEARCH tab in the banner to start a new RIF search, the sortby field will be reset to RELEVANCE.
FAQ
Q. If a search fails, is that proof that the word is not in any document?
A. NO. Assuming you have entered the search correctly, it may fail for the following reasons:
1. The search term really is not in any document.
2. The OCR process mis-recognized the word on one or more pages, so it's there but the search engine doesn't know it.
3. The OCR process found the word, but it was left out of the search index. Because the OCR process creates so many "junk" words when encountering faint or otherwise hard-to-read text, the search index would normally be filled with many hundreds of thousands of junk words. To cope with this, the search engine drops most of the words which appear 3 or fewer times across the entire collection. This pruning is designed to try not to drop actual words and names, only junk, via an "exception list" and other means. But the techniques employed are error-prone. Therefore, there is no doubt that some rarely-appearing names and other valid words are missing from the search index. No search system is perfect; this compromise enhances usability with very little actual downside. Note that any word which appears more than 3 times across the entire collection will be guaranteed to be in the index.
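The index pruning described in reason 3 can be pictured with a simplified sketch. The function name and sample data are hypothetical, and the real engine is described as using an exception list "and other means" that are not specified here; this version models only the frequency cutoff and a plain exception set.

```python
from collections import Counter

def build_index_vocabulary(pages, exceptions=frozenset(), cutoff=3):
    """Return the set of words kept in the index.

    Words appearing more than `cutoff` times across all pages are always
    kept; rarer words survive only if they are on the exception list.
    """
    counts = Counter(word for text in pages for word in text.lower().split())
    return {
        word for word, count in counts.items()
        if count > cutoff or word in exceptions
    }

# Made-up pages: one common name, one OCR junk word, one rare but valid term.
sample_pages = ["oswald oswald oswald oswald", "osw4ld amspell"]
vocabulary = build_index_vocabulary(sample_pages, exceptions={"amspell"})
```

In this sample the junk word osw4ld is dropped, the rare term amspell is kept only because it is on the exception set, and oswald is kept because it appears more than 3 times.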
Q. I'm getting too many results. How can I winnow them down?
A. Try these techniques:
1. Make sure you are using quotes properly. The phrase search "albert osborne" finds all pages which feature the word albert immediately followed by the word osborne. Omitting quotes will find all pages which have the word albert or the word osborne in them, which will be many more pages.
2. Use boolean "and" to combine two searches and get only the pages which hit in both.
3. Use the document type checkboxes, document set search controls, and date filters to confine your search to a subset of documents. You may also limit document types/sets implicitly by searching from the search sidebars in various parts of the disk.
Q. Why can't I see book pages?
A. Since the books on this site are copyrighted, only short excerpts are shown. Viewing of book pages via either browsing or searching is not permitted. There are some book chapters available for reading because the MFF obtained permission - see the Book Previews page.
Q. Sometimes I see the word CLEAR to the left of ADVANCED SEARCH in the search banner. What is that for?
A. When you have clicked through the search results page to a "hit", the words you searched for are highlighted on that page. This is due to the page address including search={xxxx} as part of the address. Clicking CLEAR will remove this from the address and refresh the page, removing the yellow highlights. This can be useful for decluttering the page view, and also if you want to copy the page address to share and don't want the search term to be part of it.
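What CLEAR does to the page address can be sketched with Python's standard urllib.parse module. The example address and the docId parameter are made up for illustration; only the search parameter name comes from the description above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def clear_search_param(url):
    """Return the address with any search=... query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "search"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Hypothetical address; the real site's paths and parameters may differ.
cleaned = clear_search_param("https://example.org/showDoc.html?docId=123&search=ruby")
```

The cleaned address keeps every other parameter, so it still points at the same page but no longer triggers highlighting when shared.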
Q. I ran a few searches and then got a "SEARCH LIMIT REACHED" message. Why?
A. The Mary Ferrell Foundation makes all pages of this website freely available for browsing, online reading, and linking. In order to support the MFF's work, some income is necessary. Our primary source of revenue is from memberships, which afford users unlimited searches. A Professional membership also allows access to PDF copies of documents. Join the MFF today and support our work.