PDF metadata requires sanitization #36

Closed
maxwellfunk opened this issue Jul 19, 2021 · 3 comments · Fixed by #58

@maxwellfunk
Contributor

Description of Issue:

Several posted PDF documents retain the metadata of the original document that was edited to create the PDF (e.g., original filename.docx). We need to sanitize the original metadata in these docs and re-upload them.

References (Docs, Links, Files):

image

Possible Solution

Download the PDFs, edit them to sanitize the metadata, and re-upload the documents to the repository.

@maxwellfunk maxwellfunk self-assigned this Jul 19, 2021
@maxwellfunk maxwellfunk added the documentation Improvements or additions to documentation label Jul 19, 2021
@maxwellfunk
Contributor Author

This may require a bulk download of all .pdf docs, sanitization using Acrobat Standard or Foxit Phantom (full edition), and re-upload.

@JillTunick
Contributor

Turns out only some of the PDFs have old Word titles in their metadata, and it's easy to delete them in the PDF's File > Properties > Title field. I propose this solution to sanitize the metadata:

I'll download and go through the PDFs one by one, yank the old Word titles from those that have that metadata, save those cleaned PDFs in a folder, and then upload them, with the same filenames and the same labels, to replace the ones with funky metadata in the idmanagement.gov/docs/ area. I would use the History info to identify the older version of the cleaned document with the same filename and I'd delete the older version.

Would that work?
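
The triage step in the proposal above (deciding which PDFs carry an old Word title) could be sketched roughly as follows. This is only an illustration, not the process used in this issue: it assumes the Title values have already been read out of each PDF by some other means, and it assumes a leftover title looks like a Word filename, optionally prefixed with "Microsoft Word - ". The function names and sample filenames are hypothetical.

```python
import re

# Assumption: a leftover title is a Word filename such as "draft.docx",
# sometimes prefixed with "Microsoft Word - " by the PDF exporter.
WORD_TITLE = re.compile(r"^(Microsoft Word - )?.+\.(docx?|rtf)$", re.IGNORECASE)


def needs_sanitizing(title):
    """Return True if a PDF Title value looks like an old Word filename."""
    if not title:
        return False
    return bool(WORD_TITLE.match(title.strip()))


def triage(titles_by_file):
    """Given {pdf filename: Title value}, return the sorted filenames to clean."""
    return sorted(
        name for name, title in titles_by_file.items() if needs_sanitizing(title)
    )


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real docs folder.
    sample = {
        "guide.pdf": "Microsoft Word - guide_v3.docx",
        "overview.pdf": "Program Overview",
        "memo.pdf": "",
    }
    for name in triage(sample):
        print(name)
```

Only the files this flags would need to be opened and cleaned by hand; the rest can be left alone.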

@maxwellfunk
Contributor Author

Modifying the properties manually is fine.

Note: We will not be able to modify digitally signed documents, so we will just update and sanitize as they are replaced.
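
To set the signed documents aside before editing, a rough pre-check along these lines might help. It is a heuristic sketch only: it assumes a signed PDF contains a `/ByteRange` entry in its raw bytes, which is typical of signature dictionaries but is not a full signature check, so anything it flags should still be confirmed in a PDF viewer. The function names are hypothetical.

```python
def looks_digitally_signed(pdf_bytes):
    """Heuristic: signature dictionaries normally include a /ByteRange entry."""
    return b"/ByteRange" in pdf_bytes


def split_by_signature(files):
    """Given {filename: raw PDF bytes}, return (editable, signed) name lists."""
    editable, signed = [], []
    for name in sorted(files):
        if looks_digitally_signed(files[name]):
            signed.append(name)
        else:
            editable.append(name)
    return editable, signed
```

Files in the `signed` list would be skipped for now and sanitized when they are next replaced.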
