I’ve begun to focus more on Python as I’ve taken on different projects this past year. I wanted to make a quick post about the different ways to accomplish the same thing. There’s no ‘wrong way or right way’; it’s more about which approach fits the given situation.
So let’s dive in.
All of the following is available on my GitHub.
To keep this example as fundamental as possible, this is the scenario:
- The data set is provided in a YAML file
- The file must be parsed and the results printed as output
The YAML file ‘network-device.yaml’ has the following contents:
switches:
  - hostname: switch1
    vendor: arista
    location: datacenter
  - hostname: switch2
    vendor: cisco
    location: inventory
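For reference, here is a small sketch of the structure the examples below work with. It assumes the file is loaded with PyYAML's safe_load, in which case the top-level key maps to a list and each list item becomes a plain dict. The literal is written out by hand so the snippet runs without the file.

```python
# Assumption: this mirrors what yaml.safe_load() returns for network-device.yaml.
devices = {
    "switches": [
        {"hostname": "switch1", "vendor": "arista", "location": "datacenter"},
        {"hostname": "switch2", "vendor": "cisco", "location": "inventory"},
    ]
}

# The top-level key maps to a list; each list item is a plain dict.
hostnames = [switch["hostname"] for switch in devices["switches"]]
print(hostnames)
```

Every approach in this post starts from this same dict-of-lists shape; only what happens to each entry afterwards changes.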
Python Scripting
We start by opening the YAML file and parsing it with a simple for loop. The benefit of this approach is its simplicity.
from dataclasses import dataclass
import yaml
from pydantic import BaseModel

with open('network-device.yaml', 'r') as file:
    devices = yaml.safe_load(file)

"""
Parse through the YAML in a simple script
ref. https://pyyaml.org/wiki/PyYAMLDocumentation
"""
for d in devices['switches']:
    print(f" device {d['hostname']} is made by {d['vendor']} and is in location {d['location']}")
Scripting is a great way to start and finish a project in a short amount of time. But if the requirements stretch the program to days or weeks of work, it can become a mess without good documentation written as progress is made.
Python Classes
We then try a Python class to do the same thing.
from dataclasses import dataclass
import yaml
from pydantic import BaseModel

with open('network-device.yaml', 'r') as file:
    devices = yaml.safe_load(file)

"""
Parse through the YAML with a Class
ref. https://docs.python.org/3/library/dataclasses.html
"""
class Devices:
    def __init__(self, hostname, vendor, location):
        self.hostname = hostname
        self.vendor = vendor
        self.location = location

# Build one Devices object per entry in the YAML list
D = [Devices(hostname=d['hostname'], vendor=d['vendor'], location=d['location']) for d in devices['switches']]
for h in D:
    # type(h) shows the class of the instance; type(Devices) would only print <class 'type'>
    print(f" device {h.hostname} is made by {h.vendor} and is in location {h.location} {type(h)}")
I’m building ‘D’ with a list comprehension, which is a compact way of creating the objects of the class; I could also have used a basic ‘for’ loop. The benefit of the Python class approach is that I now have objects I can do more with. They are held in memory, so other methods of this class (or child classes) can work with them without having to re-parse the YAML.
Classes are an extremely powerful tool, but in my opinion they should come with a documented flow of why the code is the way it is, so the next person working on it understands what is going on.
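As a rough sketch of what "doing more" with the in-memory objects could look like, the snippet below adds a describe() method and a small filter helper. The Device class, describe() and by_vendor() are hypothetical names made up for this illustration, and the parsed list is a hand-written stand-in for the YAML contents so the snippet runs on its own.

```python
class Device:
    """Hypothetical variant of the Devices class with one extra method."""

    def __init__(self, hostname, vendor, location):
        self.hostname = hostname
        self.vendor = vendor
        self.location = location

    def describe(self):
        # Reuses attributes already held in memory; no YAML access needed.
        return f"device {self.hostname} is made by {self.vendor} and is in location {self.location}"


# Stand-in for the parsed YAML so the sketch runs without the file.
parsed = [
    {"hostname": "switch1", "vendor": "arista", "location": "datacenter"},
    {"hostname": "switch2", "vendor": "cisco", "location": "inventory"},
]

# Unpack each dict into keyword arguments to build the objects once.
inventory = [Device(**item) for item in parsed]


def by_vendor(devices, vendor):
    """Return the already-built objects that match a vendor."""
    return [d for d in devices if d.vendor == vendor]


arista = by_vendor(inventory, "arista")
print(arista[0].describe())
```

The file is read once, and everything after that works on the objects rather than on raw dictionaries.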
Python Dataclasses
Dataclasses are another great way to instantiate objects, and they are also self-documenting: they ask the programmer to declare each variable’s type, much like a statically typed language (e.g. C). By using the decorator, the dataclass also does a lot of the work under the covers for us by automatically generating the __init__ and __repr__ methods.
"""
Using the dataclass decorator
ref. https://docs.python.org/3/library/dataclasses.html
"""
@dataclass
class OtherDevices:
    hostname: str
    vendor: str
    location: str

A = [OtherDevices(**c) for c in devices['switches']]
for i in A:
    # type(i) shows the class of the instance rather than the metaclass
    print(f" device {i.hostname} is made by {i.vendor} and is in location {i.location} {type(i)}")
One thing to call out is that I used dictionary unpacking (‘**’) to pass each entry as keyword arguments (kwargs) in the list comprehension, instead of spelling out each variable name and its key. This works because the YAML keys match the field names, and it was only done as a time saver.
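To make the generated methods and the unpacking a little more concrete, here is a minimal sketch using only the standard library. The record dict is a hand-written stand-in for one YAML entry, and the last few lines show that the declared types are hints only: nothing checks them at runtime.

```python
from dataclasses import dataclass


@dataclass
class OtherDevices:
    hostname: str
    vendor: str
    location: str


# Stand-in for one entry from the parsed YAML.
record = {"hostname": "switch1", "vendor": "arista", "location": "datacenter"}

# ** unpacks the dict into keyword arguments, so these two calls are equivalent.
unpacked = OtherDevices(**record)
spelled_out = OtherDevices(
    hostname=record["hostname"],
    vendor=record["vendor"],
    location=record["location"],
)

print(unpacked)                 # uses the generated __repr__
print(unpacked == spelled_out)  # uses the generated __eq__

# Type hints are not enforced at runtime: an int hostname is accepted as-is.
unchecked = OtherDevices(hostname=123, vendor="arista", location="datacenter")
print(type(unchecked.hostname))
```

Note that the decorator also generates __eq__ by default, which is why the comparison works without any extra code.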
Pydantic
Pydantic takes this to the next level by asking the user to define models: classes that inherit from BaseModel and declare typed fields.
"""
Using Pydantic
ref. https://pydantic-docs.helpmanual.io/usage/models/
"""
class Switches(BaseModel):
    hostname: str
    vendor: str
    location: str

E = [Switches(**h) for h in devices['switches']]
for i in E:
    # type(i) shows the model class of the instance
    print(f" device {i.hostname} is made by {i.vendor} and is in location {i.location} with {type(i)}")
There is a lot to read about everything Pydantic has to offer, but here is one example: if I change the hostname type to ‘int’, it immediately raises an error saying that the value I’m passing is not a valid integer. This is of huge value, as dataclasses did not do this type validation for us.
"""
Using Pydantic
ref. https://pydantic-docs.helpmanual.io/usage/models/
"""
class Switches(BaseModel):
    hostname: int
    vendor: str
    location: str

E = [Switches(**h) for h in devices['switches']]
for i in E:
    # Not reached: building E already fails validation on the hostname
    print(f" device {i.hostname} is made by {i.vendor} and is in location {i.location} with {type(i)}")
Traceback (most recent call last):
  File "C:\Users\user\Documents\python\ways.py", line 64, in <module>
    E = [Switches(**h) for h in devices['switches']]
  File "C:\Users\user\Documents\python\ways.py", line 64, in <listcomp>
    E = [Switches(**h) for h in devices['switches']]
  File "pydantic\main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Switches
hostname
  value is not a valid integer (type=type_error.integer)
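Instead of letting the program stop with a traceback, the failure can also be caught and reported. The following is a minimal sketch, assuming Pydantic is installed and that ValidationError can be imported from the top-level pydantic package; the record dict is a hand-written stand-in for one YAML entry, and the exact wording of the error message may differ between Pydantic versions.

```python
from pydantic import BaseModel, ValidationError


class Switches(BaseModel):
    hostname: int  # deliberately the wrong type, as in the example above
    vendor: str
    location: str


# Stand-in for one entry from the parsed YAML.
record = {"hostname": "switch1", "vendor": "arista", "location": "datacenter"}

failed_fields = []
try:
    Switches(**record)
except ValidationError as exc:
    # errors() returns a list of dicts; "loc" names the field that failed.
    failed_fields = [err["loc"][0] for err in exc.errors()]
    print(f"invalid record, bad fields: {failed_fields}")
```

Catching the error this way lets a longer script skip or log a bad entry and keep processing the rest of the file.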
There is a lot more to Pydantic, but the obvious immediate value is this: if the program is part of a collaboration across teams, or even within a team, a built-in validator is very nice to have so another member of the team can’t misuse a given method or object. The class, dataclass and Pydantic approaches are also all great ways to start a larger program, as they force the programmer to think about the requirements and about what they want to accomplish and how. Once a well-laid-out plan is in place, actually typing the syntax is fairly trivial. We always want to use the right tool for the right job, and not write code around a problem that should have been caught in the planning phase.
I deliberately didn’t dive into the ‘cons’ of any approach, as the best way to accomplish the same goal depends on the scope and size of a given project, which is extremely variable.