Configuration – YAML Style!


Intro

Files with either the YML or YAML extension are YAML files. The name originally stood for "Yet Another Markup Language" and now officially stands for "YAML Ain't Markup Language". This file format is used to store data. It is a data format (storing data / configuration) rather than a programming language (storing commands / instructions). There may be cases when the data stored is a command that should be run by another system, but it is still considered data because that command is interpreted and consumed by another system in order to be run.

Unless the application consuming the configuration files has a preference, either the YML or YAML extension can be used.

Some examples of YAML uses for Java applications:

In a YAML file, indenting (the spaces at the beginning of each line) is very important. That is how a line of text is tied to a larger part of the configuration.

💚 TIP: Make Spaces Visible When Editing YAML Files

In IntelliJ, open the YAML file you wish to edit. Once that file is opened and selected, then:

Go to the View Menu
Go to Active Editor
Select Show Whitespaces

You should now be able to see spaces (dots) and tabs (arrows), as well as end-of-line markers (paragraph symbols), all of which would normally be displayed as blank space.

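You want to be consistent in a YAML file. YAML does not allow tab characters for indenting, so indent with spaces only and use the same number of spaces at each level. Otherwise, the file may fail to parse or the configuration may not be interpreted as you intend.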

Start and End Of File Markers

These are optional. You may or may not see them in YAML files in the wild. Strictly speaking, they mark the start and end of a YAML document within the file.

Start of file indicated with three dashes

---

End of file indicated with three dots

...
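Because the three dashes mark where a document begins, one file can hold more than one document. A minimal sketch, assuming two made-up documents (the keys environment and port are illustrative only and do not come from any particular application):

```yaml
# First document in the file
---
environment: development
port: 8080
# Three dashes start the second document
---
environment: production
port: 8443
# Three dots explicitly end the last document (optional)
...
```

Whether a consuming application reads only the first document or all of them depends on that application.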

Lists of Values

Lists of values can be represented on multiple lines (with the same indenting), where the contents of the line start with a dash (-) followed by a space ( ). This format lends itself well to peer reviews during pull requests, as having each value on a separate line often helps to highlight the actual change, rather than having to wade through a long line of values to painstakingly, manually find what differs.

---
- 'Mandy'
- 'Cindy'
- 'Aurora'
...

List values can also be represented in an abbreviated form with square brackets ([]). However, putting all of the values on one line may make peer reviews and troubleshooting more difficult, as it increases the work of the reader.

flavors: [ 'Strawberry', 'Lime', 'Mango' ]
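The two list styles describe the same data. As a small illustration (the key flavors_block is a made-up name so that both forms can sit in one file), these two entries hold identical lists:

```yaml
# Block style: one value per line, indented under the key
flavors_block:
  - 'Strawberry'
  - 'Lime'
  - 'Mango'

# Flow style: the same three values on a single line
flavors: [ 'Strawberry', 'Lime', 'Mango' ]
```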

Maps / Dictionaries / Key-Value Pairs

Key-Value pairs describing some larger category of data are called maps / dictionaries. This just means that there is some keyword and an associated value mapped to it. Different languages call them one or the other, but the concept is similar. For example, if looking at yourself, you have a hair-color, eye-color, height, weight, etc. Those are keywords that could have values associated when describing you as a person.

A key-value pair is specified with the format: <key>: <value> where a colon (:) and a space ( ) separate the two.

name: Billy
hair: red

YAML also supports an abbreviated format that puts the whole map on a single line inside curly braces ({}), but this approach makes it harder for the human trying to read/understand the configuration. I would not recommend having all of this data on a single line.

karen: {name: Karen, hair: brown }

I have also seen a hybrid format in the wild with the curly braces ({}), but still using separate lines for nested content:

sammy: {
  name: Sammy,
  hair: brown
}

With this format, the commas between fields are still required, just as in the single-line form. Only a trailing comma after the last field is optional.

Indenting

Lines at the same level (number of spaces) of indenting are considered related (AKA different elements in the same list, or different fields in the same object/class). To indicate a nested structure (something that describes a larger category/object), we indent the nested lines further than their parent. YAML does not require a specific number of spaces, but 2 spaces ( ) per level is a common convention, and the lines within one level must line up.

danny: 
  name: Danny
  hair: brown

In this example, name and hair are things that describe danny.
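Indenting also lets lists and maps be combined. A minimal sketch, assuming a hypothetical people key that holds a list in which each item is a map:

```yaml
people:
  - name: Danny      # first list item is a map with two fields
    hair: brown
  - name: Billy      # second list item, at the same indenting as the first
    hair: red
```

Here hair lines up with name, so both fields belong to the same list item.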

Comments

Having the ability to add comments to the configuration, such as links to where to find more details, a related defect/issue, etc. can be a great help. In a YAML file, a comment starts with a pound sign (#) and runs to the end of the line. The pound sign can be the first thing on the line, or it can follow other content as long as there is at least one space ( ) before it.

 # This is a comment and will be ignored when the YAML file is read into the consuming application.
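A short sketch of where comments can and cannot start (the url key and its value are made up for illustration):

```yaml
# A full-line comment can start in the first column.
name: Billy   # A comment after a value needs at least one space before the pound sign.
url: http://example.com/page#top   # Here '#top' is part of the value, since no space comes before that '#'.
```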

Multi-Line Blocks of Text

There are a couple of different options if you want to write text on multiple lines for a single field in a YAML file:

With Newlines Preserved

To keep the line breaks in the text consumed by the application (but ignore the indenting at the start of each line), you would use the pipe (|) symbol to start the value:

poem: |
  Roses are red,
  Violets are blue,
  I have one definition,
  And new lines, too!
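For comparison, here is a sketch of the same poem written as a double-quoted value with explicit \n escapes. With the plain pipe (|) form shown above, the value also ends with a single newline, so the two forms should produce the same text (the key name poem_escaped is made up for this comparison):

```yaml
poem_escaped: "Roses are red,\nViolets are blue,\nI have one definition,\nAnd new lines, too!\n"
```

The pipe form is usually much easier to read and review than the escaped form.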

With Newlines Converted to Spaces

If you want the flexibility of writing and storing a long text value on multiple lines, but have the line breaks between those lines converted to spaces so that the text is on a single line when consumed by the application, you would use the greater-than (>) symbol to start the value:

build_command: >
  date;
  mvn clean install;
  date

In this example, we want to configure the build_command, when executed, to print out the date and time before and after running another command. The build_command's value will be interpreted as a single line of text with spaces in place of the line breaks. This is a rather simple example, but having three different instructions stored on three different lines can still make it easier on the reader when reviewing and troubleshooting. For example, does each of the interim commands end with a semicolon (;) so that they can be strung together? How many commands do we expect to be run?
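As a sketch of what the consuming application receives, the folded value below and the single-line value after it should be equivalent, assuming there are no trailing spaces at the ends of the folded lines (both key names are made up for this comparison):

```yaml
folded_command: >
  date;
  mvn clean install;
  date

# Same text on one line; the folded form also keeps one newline at the very end.
single_line_command: "date; mvn clean install; date\n"
```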

Quoted Text

As-Is Text – Single Quotes

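If you surround text values in single quotes ('), then YAML does not process any escape sequences inside the value: a backslash stays a backslash. The only special case is a literal single quote, which is written as two single quotes (''). This applies to YAML escapes only. Whether variable references are expanded is decided by the consuming application, not by the quotes (Ansible, for example, still evaluates Jinja2 expressions inside single-quoted values).

as_is_text: 'I will be read exactly as written here.  Escape sequences such as \t will not be interpreted.'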

Interpreted Text – Double Quotes

If you surround text values in double quotes ("), then you can include certain escaped characters, and certain applications, like Ansible, will also let you include variable references. However, the format of such variable references, when supported, is going to vary by the application consuming the YAML configuration file. Ansible, for example, uses Jinja2 templating for variable expressions.

interpreted_text: "\tI am interpreted text.  When consumed, I will start with a tab character, as opposed to a slash and then a t."
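A side-by-side sketch of the two quoting styles (all four key names are made up for illustration). The first and last entries should produce the same text, C:\temp\new:

```yaml
# Single quotes: the backslash is kept as-is; a literal single quote is doubled.
windows_path: 'C:\temp\new'
contraction: 'It''s as-is text.'

# Double quotes: \t and \n become tab and newline; a literal backslash is written as \\.
tab_separated: "name\thair"
escaped_path: "C:\\temp\\new"
```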

Unit Testing Configuration Files

Unit testing of configuration files is something that is easily overlooked. However, it is worth testing that the configuration file:

Is valid per YAML syntax,
Follows the team’s business rules for when to expect different values,
Has all required values defined.

Without these tests, a problem may only surface when the application fails to start (if a runtime validation catches the configuration issue) or, worse, when it simply behaves in an unexpected way for your customers. For example, production environments might only support HTTPS connections, while development environments might support both HTTP and HTTPS connections. I would much rather find out that we had a misconfiguration when a unit test fails in the build than from a customer complaint or a security vulnerability report for a production environment.
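To make the HTTP/HTTPS example concrete, here is a hypothetical configuration that such a unit test might load and check. Every key name here (environments, allowed_protocols) is invented for illustration and does not come from any particular framework:

```yaml
environments:
  development:
    allowed_protocols: [ 'http', 'https' ]   # both are acceptable while developing
  production:
    allowed_protocols: [ 'https' ]           # a test could fail the build if 'http' appears here
```

A test along these lines would parse the file, confirm that both environments are present, and assert that production lists only https.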

Coding Chica