There are many functions in Python 3 that allow us to easily obtain a properly formatted string. By strings we mean string literals or string variables made up of numbers, letters, special characters, or any combination of the three. This is Part 2 of our tutorial on getting a formatted string in Python, so be sure to check out Part 1. Enjoy!
Date and Time
This is a major one! When we receive strings in different date and time formats, we need to convert them into a valid date and/or time for other operations, reporting, etc.
You will need to understand how to use the format codes of the datetime library to parse a given string into a proper datetime object. As you will see from the example, simply knowing these codes gives you the power to parse almost any string you may come across into a valid datetime object. You can find the full documentation HERE, but for reference I have included the essential, commonly used codes here.
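- %Y: Year (4 digits)
- %m: Month (number)
- %b: Month, abbreviated name
- %B: Month, full name
- %d: Day of month
- %H: Hour (24 hour)
- %I: Hour (12 hour)
- %p: AM/PM
- %M: Minutes
- %S: Seconds
- %f: Microseconds
- %z: UTC offset (for example +0200)
- %Z: Time zone name
- %a: Weekday, abbreviated name
- %A: Weekday, full name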
Please consult the documentation for more information on how to use the critically important datetime library.
#Snippet 25
from datetime import datetime
#Date of Olympics 2024 in Paris
#ds means Date String
#dto Date Time Object
#Example String 1
ds = '2024-07-26 22:00:00.000000'
dto = datetime.strptime(ds, '%Y-%m-%d %H:%M:%S.%f')
print('Date 1:', dto.date())
print('Time 1:', dto.time())
#Example String 2
ds = 'Jul 26 2024 10:00 PM'
dto = datetime.strptime(ds, '%b %d %Y %I:%M %p')
print('Date 2:', dto.strftime("%d %B %Y"))
print('Time 2:', dto.strftime("%I:%M %p"))
#Example String 3
ds = '7/26/2024'
dto = datetime.strptime(ds, '%m/%d/%Y')
print('Date 3:', dto.strftime("%a %d %b %Y"))
print('Weekday 3:', dto.strftime("%A"))
#Example String 4
dto = datetime.strptime("07-26-2024T10:00:00PM+0200", "%m-%d-%YT%I:%M:%S%p%z")
print('Time Zone 4:', dto.strftime("%Z"))
#Output
#Date 1: 2024-07-26
#Time 1: 22:00:00
#Date 2: 26 July 2024
#Time 2: 10:00 PM
#Date 3: Fri 26 Jul 2024
#Weekday 3: Friday
#Time Zone 4: UTC+02:00
Also be sure to understand the difference between strptime() and strftime(). strptime() is what you use to parse a string into a datetime object according to a given format, and strftime() is what you use to convert a datetime object into a string of a specified format. In both cases, knowledge of the codes is critical. The best part is that this knowledge lets you deal with a wide range of date and time strings.
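To make the difference concrete, here is a minimal round trip using only the standard library. The variable names are just for illustration, and the weekday and month names assume the default English locale.

```python
from datetime import datetime

# strptime: string -> datetime object ("p" for parse)
opening = datetime.strptime("26/07/2024 22:00", "%d/%m/%Y %H:%M")

# strftime: datetime object -> string ("f" for format)
iso_style = opening.strftime("%Y-%m-%d %H:%M:%S")
long_style = opening.strftime("%A, %d %B %Y")

print(iso_style)   # 2024-07-26 22:00:00
print(long_style)  # Friday, 26 July 2024 (in the default English locale)
```

The same datetime object can be written out in as many formats as you like once it has been parsed.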
We did a tutorial specifically for Python DateTime which you can find HERE.
Another useful tip is to parse dates with the dateutil library, for which you can find install instructions and documentation HERE.
First you install the library at the command line.
pip install python-dateutil
#Snippet 28
import datetime
import dateutil.parser
date_string = '2017-02-02T00:00:00Z'
#we parse the string, it becomes a datetime object
parsed_date_string = dateutil.parser.parse(date_string)
print(parsed_date_string)
#output looks like this: 2017-02-02 00:00:00+00:00
date_value = datetime.datetime.strptime(str(parsed_date_string), '%Y-%m-%d %H:%M:%S%z')
print(date_value)
#output will also be: 2017-02-02 00:00:00+00:00
Parsing datetime strings is an additional control measure against incorrectly formatted datetimes: a badly formatted string raises an error immediately, before you can use it in calculations.
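One way to put this to work is to wrap the parsing step in a small guard. The sketch below uses only the standard library; parse_or_none is a hypothetical helper name, not part of datetime or dateutil, and the default format is an assumption you would adapt to your data.

```python
from datetime import datetime

def parse_or_none(text, fmt="%Y-%m-%d"):
    """Hypothetical helper: return a datetime if text matches fmt, else None."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        # strptime raises ValueError for a wrong format or an impossible date
        return None

print(parse_or_none("2024-07-26"))  # 2024-07-26 00:00:00
print(parse_or_none("2024-02-30"))  # None (February has no 30th)
print(parse_or_none("26 July"))     # None (does not match the format)
```

Returning None instead of raising is only one possible design; in a data pipeline you may prefer to log the bad value or let the error propagate.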
Phone Number
First we install the library. Find more information HERE.
pip install phonenumbers
#Snippet 24
import phonenumbers
num = phonenumbers.parse("1(206)555-5555", "US")
pn = phonenumbers.format_number(num, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
print(pn)
num = phonenumbers.parse("12065555555", "US")
pn = phonenumbers.format_number(num, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
print(pn)
num = phonenumbers.parse("1206", "US")
pn = phonenumbers.format_number(num, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
print(pn)
#Output
#+1 206-555-5555
#+1 206-555-5555
#+1 1206
Nifty, huh? See how we take a string literal and format it as a phone number, with locale-specific formatting as well? The only issue is that formatting alone does no validation, as you can see in the last line of the output. It is quite easy to end up with invalid phone numbers. This means we must do additional checks to make sure that the phone numbers have the correct number of digits, etc.
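As a rough illustration of such an additional check, the sketch below counts digits with the standard library only. It assumes North American numbers (10 digits, optionally preceded by the country code 1), and looks_like_nanp_number is a hypothetical helper name. The phonenumbers library also ships its own checks, such as is_possible_number() and is_valid_number(), which are worth reading about in its documentation.

```python
import re

def looks_like_nanp_number(raw):
    """Hypothetical check: 10 digits, or 11 digits starting with 1."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]          # strip the leading country code
    return len(digits) == 10

print(looks_like_nanp_number("1(206)555-5555"))  # True
print(looks_like_nanp_number("12065555555"))     # True
print(looks_like_nanp_number("1206"))            # False
```

A digit count only catches obviously malformed input; it does not tell you whether the number is actually assigned.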
NOTE: For your convenience you can find a list of country codes HERE. Keep the list handy if you are going to be working with phone numbers for your project.
File Sizes
Sometimes we need to format numeric literals as proper file sizes but first we must install a library…again! 🙌🙌 Find more information on the library HERE.
pip install hurry.filesize
#Snippet 22
from hurry.filesize import size, si, iec
#Système International (SI) standard, powers of 1000
print(f"{size(56000, system=si)}")
print(f"{size(300000000, system=si)}")
#International Electrotechnical Commission (IEC) standard, powers of 1024
print(f"{size(99000, system=iec)}")
print(f"{size(7777777777, system=iec)}")
#Output
#56K
#300M
#96Ki
#7Gi
Surprise, surprise! Even file sizes have standards, who knew? 🤷‍♂️🤷‍♂️ You will notice that, like numerize (covered in the next section), the size function does not take a string literal. That is not the most convenient, because it means we need to be sure that our string contains numbers only, and convert it to a number, before using it. Remember to do a redundant Type Check just to be safe.
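To see where the two standards differ, here is a hand-rolled sketch of the arithmetic. It is not how hurry.filesize is implemented; rough_size and IEC_SUFFIXES are hypothetical names, and the helper simply divides by 1000 (SI) or 1024 (IEC) and rounds down. It also converts a string to an integer first, which doubles as the check mentioned above.

```python
def rough_size(num_bytes, base=1000, suffixes=("B", "K", "M", "G", "T")):
    """Hypothetical helper: scale num_bytes down by base, rounding down."""
    value = int(num_bytes)  # raises ValueError for non-numeric strings
    index = 0
    while value >= base and index < len(suffixes) - 1:
        value //= base
        index += 1
    return f"{value}{suffixes[index]}"

IEC_SUFFIXES = ("B", "Ki", "Mi", "Gi", "Ti")

print(rough_size("56000"))                           # 56K
print(rough_size("99000", 1024, IEC_SUFFIXES))       # 96Ki
print(rough_size("7777777777", 1024, IEC_SUFFIXES))  # 7Gi
```

The same byte count gives a smaller figure under IEC because each step divides by 1024 rather than 1000.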
Shorten Large Numbers
We will often need to shorten large numbers for reports or graphs. For example:
- 50,000 becomes 50K
- 5,000,000 becomes 5M
That’s where numerize comes in. First we install!🙌 Find more HERE.
pip install numerize
Numerize does not work with string literals directly but it works with numbers and gives you a string that is a shortened form of the input number. Nonetheless, it isn’t hard to see why this is useful if we want to get from a raw string to actual, readable data that we can use.
Passing a string literal to numerize gives a TypeError, so do your Type Checking first: confirm that the string really holds a number, then convert it with int() or float() before handing it to the numerize method.
#Snippet 21
from numerize import numerize
n = numerize.numerize(5999)
print(f"{n}")
n = numerize.numerize(50000)
print(f"{n}")
n = numerize.numerize(5000000)
print(f"{n}")
n = numerize.numerize(5000000000)
print(f"{n}")
#Output
#6K
#50K
#5M
#5B
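Because numerize expects a number rather than a string, a small conversion step in front of it can help. The sketch below uses only the standard library; to_number is a hypothetical helper, and stripping commas assumes the input uses commas as thousands separators.

```python
def to_number(text):
    """Hypothetical helper: return an int or float parsed from text, or None."""
    # Assumption: commas are thousands separators, as in "50,000"
    cleaned = text.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        pass
    try:
        return float(cleaned)
    except ValueError:
        return None

print(to_number("50,000"))  # 50000
print(to_number("5999.5"))  # 5999.5
print(to_number("fifty"))   # None
```

The result, when it is not None, is a plain number that can then be passed on for shortening.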
Currency/Money
Money money money !!!💵💵💵 We must always be mindful that currency/money representations are not universal, so we have to change the locale to suit.
We must install the Babel library first, using whichever of the two commands below matches your setup. Go HERE for instructions:
pip3 install Babel
pip install Babel
#Snippet 19
import babel.numbers
from decimal import Decimal
print(babel.numbers.format_currency( Decimal( "21430332889.92" ), "EUR",locale='de_DE' ))
print(babel.numbers.format_currency( Decimal( "21430332889.92" ), "CHF",locale='gsw_CH' ))
print(babel.numbers.format_currency( Decimal( "21430332889.92" ), "CNY",locale='zh_CN' ))
print(babel.numbers.format_currency( Decimal( "21430332889.92" ), "INR",locale='hi_IN' ))
print(babel.numbers.format_currency( Decimal( "21430332889.92" ), "JPY",locale='en_US' ))
print(babel.numbers.format_currency( Decimal( "21430332889.92" ), "RUB",locale='ru' ))
print(babel.numbers.format_currency(
    Decimal("21430332889.92"),
    "TRY",
    locale='tr',
    format_type='name'))
print(babel.numbers.format_currency( Decimal( "21430332889.92" ), "SAR",locale='tr', format_type='name' ))
print(babel.numbers.format_currency(1, 'USD', locale='en_US', format_type='name'))
#BITCOIN BABAYYYYYY!
print('\u20BF' + " " + str(2))
#Output
#21.430.332.889,92 €
#21’430’332’889.92 CHF
#¥21,430,332,889.92
#₹21,43,03,32,889.92
#¥21,430,332,890
#21 430 332 889,92 ₽
#21.430.332.889,92 Türk lirası
#21.430.332.889,92 Suudi Arabistan riyali
#1.00 US dollar
#₿ 2
Pay attention to how vastly different some currency formats can be from what we are accustomed to.
NOTE: For your convenience, you can find a list of currency codes HERE. Keep this handy for your financial services related projects.
Type Checking
Type checking is about verifying the contents of our strings before we proceed to get the proper string format in Python. It allows us to answer the following questions:
- Are we working with digits only?
- Are we working with letters only?
- Are we working with a combination of letters and digits?
Type checking is important because some formatting operations will not work if our strings contain illegal character types. For example, you cannot multiply letters or format a sentence as a date and time, can you? Make a habit of checking your string types first before moving forward to get a properly formatted string in Python. It will make your life a lot easier and fortunately it’s very simple.
#Snippet 23
print("1231424".isdigit())
print("deadbeefcoffee".isalpha())
print("123X456".isdigit())
print("deadbe777efcoffee".isalpha())
#Output
#True
#True
#False
#False
Simple, no? Trust me, redundant type checks will save you HOURS of troubleshooting ✨ {*PRO TIP*} ✨.
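The checks above can be wrapped in one small helper. This is only a sketch, and describe is a hypothetical name; it also shows a common surprise, namely that isdigit() rejects signs and decimal points, so strings like "-42" and "3.14" need a different test.

```python
def describe(text):
    """Hypothetical helper: label a string by the characters it contains."""
    if text.isdigit():
        return "digits only"
    if text.isalpha():
        return "letters only"
    if text.isalnum():
        return "letters and digits"
    return "other"

for sample in ["1231424", "deadbeef", "abc123", "-42", "3.14", ""]:
    print(repr(sample), "->", describe(sample))

# '1231424' -> digits only
# 'deadbeef' -> letters only
# 'abc123' -> letters and digits
# '-42' -> other
# '3.14' -> other
# '' -> other
```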
Scientific/Percent/Hex/Binary
#Snippet 20
from decimal import Decimal
#Scientific
print( f"{Decimal('46600000'):.2E}")
print( f"{Decimal('46600000'):.2e}")
#Percent
print( f"{Decimal('0.9999'):%}")
print( f"{Decimal('25'):%}")
print( f"{Decimal('0.33'):%}")
#Hexadecimal
print( f"{255:x}")
print( f"{100+55:x}")
print( f"{155:x}")
#Binary
print( f"{1000:b}")
print( f"{100:b}")
print( f"{10:b}")
#Output
#4.66E+7
#4.66e+7
#99.99%
#2500%
#33%
#ff
#9b
#9b
#1111101000
#1100100
#1010
Bonus
Finally, we want to look at reversing or undoing string formatting. Why would we need to do this? Well, the obvious use case is data cleaning. If you have to clean a huge data dump, you may need to know how to properly strip unwanted formatting. This can be tricky, but here is some code you can use to get started:
#Snippet 27
import locale
import re
a='10.000,99 €'
b='₹3,32,889.92'
c='$3.99'
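#we set the locale that matches the string's format BEFORE
#we strip the formatting; the locale must be installed on your system
#(on some platforms the name needs a suffix such as "de_DE.UTF-8")
locale.setlocale(locale.LC_NUMERIC, "de_DE")
#delocalize() normalizes the separators, then the regex keeps only digits and dots
result = re.sub(r'[^0-9.]', '', locale.delocalize(a))
print(result)
locale.setlocale(locale.LC_NUMERIC, "hi_IN")
result = re.sub(r'[^0-9.]', '', locale.delocalize(b))
print(result)
locale.setlocale(locale.LC_NUMERIC, "en_US")
result = re.sub(r'[^0-9.]', '', locale.delocalize(c))
print(result)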
#Output
#10000.99
#332889.92
#3.99
The locale module you saw above is important for ‘internationalization services’, i.e. formatting data depending on region, but conveniently it can also do the reverse. Again, cleaning data may not always be straightforward and may require you to be creative and ingenious. 🧙‍♂️🧙‍♂️
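Not every reversal needs a locale. As a small sketch using only built-ins and the decimal module, the hexadecimal, binary, scientific and percent strings from the earlier section can be turned back into numbers like this:

```python
from decimal import Decimal

# Hexadecimal and binary strings back to integers: pass the base to int()
print(int("ff", 16))         # 255
print(int("1111101000", 2))  # 1000

# Scientific notation back to a plain integer
print(int(Decimal("4.66E+7")))  # 46600000

# Percent string back to a fraction: drop the sign, then divide by 100
print(Decimal("99.99%".rstrip("%")) / 100)  # 0.9999
```

Using Decimal here rather than float keeps the values exact, which matters when the cleaned data feeds financial calculations.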
There is no substitute for hard work! PRACTICE PRACTICE PRACTICE!