Hi! Let’s decode a base64-encoded PDF file with Python. We will be using Python 3.8.10. Let’s go! ⚡⚡✨✨
Base64 is a method of encoding binary data as text. More specifically, it represents binary data as a string of ASCII characters. Recall that ASCII is a character encoding standard for electronic communication.
In this example, we are going to encode a PDF file on disk to the base64 format and then decode it back into a PDF. Here is our code:
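To make this concrete, here is a minimal sketch of what base64 does to a few bytes. The three-byte input is an arbitrary value chosen for illustration:

```python
import base64

raw = b"Hi!"                      # three arbitrary bytes
encoded = base64.b64encode(raw)   # bytes containing only ASCII characters
print(encoded)                    # b'SGkh'
print(encoded.decode("ascii"))    # SGkh
print(base64.b64decode(encoded))  # b'Hi!'
```

Every 3 bytes of input become 4 ASCII characters of output, which is why base64 data is roughly a third larger than the original.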
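import base64
# Read the PDF as raw bytes and encode them to base64
with open("sample.pdf", "rb") as pdf_file:
    encoded_string = base64.b64encode(pdf_file.read())
# Decode the base64 bytes back into the original binary data
file_64_decode = base64.b64decode(encoded_string)
# Write the decoded bytes to a new PDF; the with block closes the file for us
with open("sample_decoded.pdf", "wb") as file_result:
    file_result.write(file_64_decode)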
Let’s explain what is happening here:
- We import the base64 module, which is part of Python's standard library, so there is nothing extra to install.
- You should have a PDF file in the same folder as the script with which to test this code. We called ours sample.pdf. You can name yours whatever you wish, but be sure to modify the code to match.
- We read this file from disk in binary mode and pass its contents to the b64encode() function. This function encodes the bytes read from disk to the base64 format and returns the encoded bytes. We save these encoded bytes in the variable encoded_string.
- Now we begin the decode step. We call the b64decode() function, which decodes the base64-encoded bytes in encoded_string and returns the decoded bytes. The decoded bytes are stored in file_64_decode.
- We simply write the decoded bytes file_64_decode to disk as the PDF file sample_decoded.pdf.
- Make sure you don’t have a file named sample_decoded.pdf in the same directory that you want to keep. Opening a file in 'wb' mode overwrites any existing file with that name without warning.
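The steps above can also be wrapped in two small helper functions. This is only a sketch: the names pdf_to_base64 and base64_to_pdf are our own invention for illustration, not part of the base64 module. It returns the encoded data as a str, which is handy if you want to place it in JSON or another text format.

```python
import base64
from pathlib import Path


def pdf_to_base64(pdf_path):
    """Read the file at pdf_path and return its contents as base64 text."""
    raw_bytes = Path(pdf_path).read_bytes()
    # b64encode returns bytes; decode to str so the result is plain text
    return base64.b64encode(raw_bytes).decode("ascii")


def base64_to_pdf(encoded_text, output_path):
    """Decode base64 text and write the resulting bytes to output_path."""
    decoded_bytes = base64.b64decode(encoded_text)
    # write_bytes overwrites output_path if it already exists
    Path(output_path).write_bytes(decoded_bytes)
```

Calling pdf_to_base64("sample.pdf") and passing the result to base64_to_pdf(..., "sample_decoded.pdf") should produce a file that is byte-for-byte identical to the original.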
That’s it! See how easy that was?
Base64 encoding is NOT the same as encryption. The point of encoding anything in Base64 is not to provide security. Rather, it is to represent data that a text-based channel may not handle safely, such as binary content or special characters in a user name or password sent over HTTP, using only plain ASCII characters. Please keep this in mind.
Anyone can simply decode your file or other data, once they know you used base64 to encode it.
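Here is a short sketch of why that matters. The credentials are made up for this illustration; notice that reversing the encoding requires no key or password at all:

```python
import base64

# Hypothetical credentials, invented for this illustration
credentials = b"alice:s3cret"
token = base64.b64encode(credentials)

# No key or password is needed to reverse the encoding
recovered = base64.b64decode(token)
print(recovered == credentials)  # True
```

If the data needs to stay secret, encrypt it first and use base64 only to make the encrypted bytes safe to transmit as text.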
Thanks for reading and good luck! 👌👌👌