When testing Document Processing, use the sample PDFs below. Upload these documents to test successful parsing, various tax form types, and suspicious document detection.
In the sandbox, the file name alone determines the test result. The actual file contents are ignored.

Pay stubs

Document | Download
Most recent paystub | most.recent.paystub.pdf
Next recent paystub | next.recent.paystub.pdf
First paystub | first.paystub.pdf

Tax documents

Document | Download
W-2 | w2.pdf
1099-DIV | 1099div.pdf
1099-G | 1099g.pdf
1099-INT | 1099int.pdf
1099-MISC | 1099misc.pdf
1099-NEC | 1099nec.pdf
1099-R | 1099r.pdf
SSA-1099 | ssa1099.pdf
1040 | f1040.pdf
For 1099 tax documents, Truv supports parsing formats from any year after 2021. This includes 1099-DIV, 1099-G, 1099-INT, 1099-MISC, 1099-NEC, and 1099-R.

Volunteer documents

Document | Download
Volunteer letter | volunteer_letter.pdf
Volunteer timesheet | volunteer_timesheet.pdf
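
Because the sandbox keys its behavior on the file name, it can help to keep the sample names in one place in test code. The following is a minimal sketch of such a lookup. The file names come from the tables above, but the dictionary keys and the `sandbox_filename` helper are hypothetical names chosen for this example, not Truv identifiers.

```python
# File names are taken from the sample document tables.
# The keys are illustrative labels invented for this sketch.
SANDBOX_FILES = {
    "paystub_most_recent": "most.recent.paystub.pdf",
    "paystub_next_recent": "next.recent.paystub.pdf",
    "paystub_first": "first.paystub.pdf",
    "w2": "w2.pdf",
    "1099_div": "1099div.pdf",
    "1099_g": "1099g.pdf",
    "1099_int": "1099int.pdf",
    "1099_misc": "1099misc.pdf",
    "1099_nec": "1099nec.pdf",
    "1099_r": "1099r.pdf",
    "ssa_1099": "ssa1099.pdf",
    "1040": "f1040.pdf",
    "volunteer_letter": "volunteer_letter.pdf",
    "volunteer_timesheet": "volunteer_timesheet.pdf",
}


def sandbox_filename(doc_type: str) -> str:
    """Return the sandbox sample file name for an illustrative document label."""
    try:
        return SANDBOX_FILES[doc_type]
    except KeyError:
        raise ValueError(f"no sandbox sample for {doc_type!r}") from None
```

A test can then call `sandbox_filename("w2")` instead of hard-coding the string in several places.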

Suspicious document detection

Scenario | Description | Downloads
Tampered documents | Information is falsified or manipulated | Tampered 1, Tampered 2, Tampered 3
Different SSNs | Personal information is inconsistent across documents | SSN 1, SSN 2, SSN 3
Different applicant names | Personal information is inconsistent across documents | Applicant 1, Applicant 2, Applicant 3
No data or invalid data | Information is missing or unable to be parsed | No data 1, No data 2, No data 3

Base64 encoding for Document Collections API

The Document Collections API accepts base64-encoded file content when creating or uploading to a collection. To encode a test document for use with the API:
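# Download a test document
curl -O https://citadelid-resources.s3.us-west-2.amazonaws.com/doc_upload/most.recent.paystub.pdf

# Base64 encode it as a single line. Reading from stdin works with both
# macOS and GNU base64; tr strips the line wraps that GNU base64 inserts.
base64 < most.recent.paystub.pdf | tr -d '\n' > most.recent.paystub.b64

# Use the encoded content in your API call
cat most.recent.paystub.b64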
Pass the base64 string as the content field when creating a document collection:
{
  "files": [
    {
      "filename": "most.recent.paystub.pdf",
      "content": "BASE64_ENCODED_CONTENT"
    }
  ]
}
In sandbox mode, the file name determines the test scenario, not the file content, so the base64 content can come from any valid PDF.
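
The encoding step and the request body can also be produced in code. The sketch below builds the JSON body shown above from a file name and raw PDF bytes. The function name `build_collection_payload` is a hypothetical helper for illustration. The sketch only constructs the body and does not send a request, since the endpoint and authentication details are not covered here.

```python
import base64
import json


def build_collection_payload(filename: str, pdf_bytes: bytes) -> str:
    """Build a JSON request body with one base64-encoded file entry."""
    # b64encode returns bytes; decode to ASCII text so it can be placed in JSON.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    body = {
        "files": [
            {
                # In the sandbox, this name selects the test scenario.
                "filename": filename,
                "content": encoded,
            }
        ]
    }
    return json.dumps(body)
```

For example, after downloading a sample, `build_collection_payload("most.recent.paystub.pdf", open("most.recent.paystub.pdf", "rb").read())` returns a string that can be used as the request body.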