Base64 is a binary-to-text encoding. 64 means 64 chars to represent all possibility of 6 bits.

binary-to-text encoding

A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. Like Base64 encoding.

Why we need binary-to-text encodings?

These encodings are necessary for transmission of data when the channel does not allow binary data (such as email or NNTP) or is not 8-bit clean.

What is Base64

Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in an ASCII string format by translating the data into a radix-64 representation.

Base64 is particularly prevalent on the World Wide Web where its uses include the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files.

Base64 Design

Use A-Z, a-z, 0-9, and +, / totally 64 chars to represent all possibility of 6 bits.

For example: “Man” can be encoded into “TWFu”

echo -n Man | base64
TWFu

Questions

But i still have question, How to express TWFu in computor?

In IT world, CPU, RAM, Disk, Transmission, All of them express data in Bits, 0 or 1. Unicode is making sense because it explain down to 0 and 1. But Base64 encoding data into Base64 text. How do IT world get to know them? For example, “TWFu” can be expressed in Unicode, or ASCII. Then the bits will be different.

Base64 is building on the top of 64 ASCII chars. That means it don’t care how you encode and decode the 64 chars. You need to get 64 chars first and then talk about Base64.

How to tell if one chunk of data is encoded as Base64 or not?

There are no mark in the data to tell that is Base64 encoded. But you can use the following regular expression to check if a string constitutes a valid base64 encoding:

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$
  • In base64 encoding, the character set is [A-Z, a-z, 0-9, and + /]. If the rest length is less than 4, the string is padded with ‘=’ characters.