When testing Document Processing, use the sample PDFs below. Upload these documents to test successful parsing, various tax form types, and suspicious document detection.
In the sandbox, the file name alone determines the test result. The actual file contents are ignored.
Pay stubs
| Document | Download |
|---|---|
| Most recent paystub | most.recent.paystub.pdf |
| Next recent paystub | next.recent.paystub.pdf |
| First paystub | first.paystub.pdf |
Tax documents
For 1099 tax documents, Truv supports parsing formats from any year after 2021. Supported forms include 1099-DIV, 1099-G, 1099-INT, 1099-MISC, 1099-NEC, and 1099-R.
Volunteer documents
| Document | Download |
|---|---|
| Volunteer letter | volunteer_letter.pdf |
| Volunteer timesheet | volunteer_timesheet.pdf |
Suspicious document detection
| Scenario | Description | Downloads |
|---|---|---|
| Tampered documents | Information is falsified or manipulated | Tampered 1, Tampered 2, Tampered 3 |
| Different SSNs | Personal information is inconsistent across documents | SSN 1, SSN 2, SSN 3 |
| Different applicant names | Personal information is inconsistent across documents | Applicant 1, Applicant 2, Applicant 3 |
| No data or invalid data | Information is missing or unable to be parsed | No data 1, No data 2, No data 3 |
Base64 encoding for Document Collections API
The Document Collections API accepts base64-encoded file content when creating or uploading to a collection. To encode a test document for use with the API:
# Download a test document
curl -O https://citadelid-resources.s3.us-west-2.amazonaws.com/doc_upload/most.recent.paystub.pdf
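# Base64 encode it (macOS syntax; with GNU coreutils on Linux, use:
#   base64 -w 0 most.recent.paystub.pdf > most.recent.paystub.b64)
base64 -i most.recent.paystub.pdf -o most.recent.paystub.b64
# Print the encoded content to use in your API call
cat most.recent.paystub.b64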
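The encoding step can also be done in Python. This is a minimal sketch, assuming the test PDF has already been downloaded to the working directory. `encode_pdf` is a hypothetical helper name used for illustration, not part of any Truv SDK.

```python
import base64
from pathlib import Path


def encode_pdf(path: str) -> str:
    """Return the file's bytes as a single-line base64 string."""
    raw = Path(path).read_bytes()
    # b64encode returns bytes without line breaks; decode to get a str.
    return base64.b64encode(raw).decode("ascii")
```

For example, `encode_pdf("most.recent.paystub.pdf")` returns the string to use as the encoded content.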
Pass the base64 string as the content field when creating a document collection:
{
  "files": [
    {
      "filename": "most.recent.paystub.pdf",
      "content": "BASE64_ENCODED_CONTENT"
    }
  ]
}
In sandbox mode, the file name determines the test scenario, not the file content. The base64 content can come from any valid PDF.
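Because the sandbox inspects only the file name, one PDF can stand in for several test files. The sketch below builds a request body in the shape of the JSON example above. `build_sandbox_payload` is a hypothetical helper, the file names come from the tables on this page, and sending several files in one collection is an assumption to confirm against the API reference.

```python
import base64
import json


def build_sandbox_payload(pdf_bytes: bytes, filenames: list[str]) -> dict:
    """Build a request body that reuses one PDF under several sandbox file names."""
    content = base64.b64encode(pdf_bytes).decode("ascii")
    # Field names ("files", "filename", "content") follow the JSON example above.
    return {"files": [{"filename": name, "content": content} for name in filenames]}


# Stand-in bytes for illustration only; in practice, read any valid PDF from disk.
placeholder_pdf = b"%PDF-1.4\n%%EOF\n"
payload = build_sandbox_payload(
    placeholder_pdf,
    ["most.recent.paystub.pdf", "next.recent.paystub.pdf"],
)
body = json.dumps(payload, indent=2)  # JSON string ready to send as the request body
```

Every entry carries the same encoded content, so any difference in sandbox results comes from the file names.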