Hi! Let’s encode a PDF file with Python in base64 format. We will be using Python 3.8.10. Let’s go! ⚡⚡✨✨
Base64 is a method of encoding binary to text. More specifically, it represents binary data in an ASCII string format. Recall that ASCII is standard for encoding electronic communication.
In this example, we are going to encode a PDF file on disk to the base64 format. Here is our code:
import base64
with open("sample.pdf", "rb") as pdf_file:
encoded_string = base64.b64encode(pdf_file.read())
Let’s explain what’s happening here:
- We import our base64 library. This library should installed by default.
- Have a PDF file with which to test this code. The file should be called sample.pdf and it should reside in the same folder as the python script.
- We read the PDF file from disk and pass it as a parameter to the b64encode() method. This method encodes the file read from disk to the base64 format and returns the encoded bytes. We save these encoded bytes as variable encoded_string.
Simple right? If you print the encoded_string to the screen you will see that it is a very long string of text that isn’t intelligible.
Base64 encoding is NOT the same as encryption. The point of encoding anything in Base64 is not to provide security. Rather, it is to encode non-HTTP-compatible characters that may be in the user name, password or other data into those that are HTTP-compatible.
Anyone can simply decode your file or other data, once they know you used base64 to encode it.
Ready to decode this file back into a regular PDF? Check out how to do that HERE in another great tutorial. Thanks for reading! 👌👌👌