This is a second shot at expressing Path Schema as algebraic objects. See my first attempt. The definitions should be equivelent, and any places they are not indicates a deficency in one of the defintions. This should be a bit more elegant, than before. It is also a bit more extensive. Note that and are now defined differently, and and are what one should be focussing on instead, this is to use the free monoid convention.
In general a path can be described as a a hierachical index, onto a directed multigraph. Noting that “flat” sets, trees, and directed graphs are all particular types of directed multigraphs.
To repeat the introduction:
This post comes from a longish discussion with Fengyang Wang (@TotalVerb), on the JuliaLang Gitter. Its pretty cool stuff.
It is defined here independent of the object (filesystem, document etc) being indexed. The precise implementation of the algebric structure differs, depending on the Path types in question, eg Filesystem vs URL, vs XPATH.
This defintion is generally applicable to paths, such as:
 File paths
 URLs
 XPath
 JSON paths
 Apache ZooKeeper Paths
 Swift Paths (Server/Container/Psuedofolder/Object)
 Globs
The defintion whch follows provides all the the expected functionality on paths
 Definition
 Parentdir, and basename
 Derived Operations
 Resolving a Path to the object
 the Pseudoparent Element
 Other extensions
 Conclusion
Definition
To have the general functionality of a path we only need to define a few items on our index, from these the rest of the functionality flows. This sets a minimal amount of functionality for a path at the most abstract level. Particular path implementations can provide more. The defintion is made from 3 object , , , with 3 rules (1), (2), (3) binding them.
Our path is defined by two sets, , and , and an operation .
 We call the set of relative path components.
 the set of absolute path roots.
 is called the
pathjoin
operator. By convention acts on the right.
 (1) It is required that generates a free monoid from $R$
 We will call the identity element of this monoid .
 We will also call this the relative path root, and call the roots of the path schema
 For convience we will follow the usual definition of
 (2) It is also required that is disjoint from
 (3) It is required that acts faithfully, on
 We call the closure of under , .
 By the associativity of , we know
 From this, and from the faithfulness of on itself (as a free monoid), we know that acts faithfully on all of
 We will also define here for convience:
 Note that is not defined to act with the operator on or on ; and further we suggest that if does in a particular implemention do so, then is likely misdefined.
Number of roots
Depending on the system being indexed, there may be one or more abolute roots.
In linux file systems, there is one root /
.
In windows, there is one root per hard drive C:
, D:
etc, plus a an additional root for named pipes \\.\pipe\
.
One also might wish to consider UNC paths to be part of the same system in windows – these to would have there on roots.
In theory we could also have zero absolute path roots – this would mean there are on absolute paths. However this is not generally useful, as the evaluation function given later, to resolve an index into the object is only defined for absolute paths.
Parentdir, and basename
As the path is a heirachical index, every point in the heirachy has a parent, or is a root.
We define a function . This is called the parentdir
function. Note the similarity to Lisp’s cdr
, or the tail operation.
 for nonroots: we have:
 for roots: we have:
It takes a little to show that this is properly a function, but it comes from the faithfulness of . Note that and , and that these as disjoint.
We define another a function . This is called the basename
function. Note the similarity to Lisp’s car
, or the head operation.
 for nonroots: we have:
 for roots: we have:
Note that for nonroots, the resault is always (and covers every) path componant – that is to say . And for roots, this is the identity. We note that
and are linked. In that: . This can probably be related to a kind of pseduoinverse.
Parentdir and (..
)
As a digression:
In many implementations of paths, it may be tempting to define an element $\varphi \in R$ and state that .
Eg in filepaths this would be ..
. However this is not possible, under the above definition of a path, as then would no longer be a free monoid, or indeed a monoid at all. As by the fact that is the identity; but by definition of . Depending on how this is resolved one might end up with for pathjoin(a, ./..)==a
.
A alternative is to define a normalisation function (or a equalience relation), that resolves this; as we will do later.
Derived Operations
the root function
We define a function to find the root of a path.
 if then
 if then
The root of an element of is always ; and of an element of is always an element of .
To make finding roots fast (as it is useful for the relative_to operation) it seems efficent to always store an element of , as its root (an element of ), and a relative path from that root (an element of ; which we will call the relative compent of the absolute path). We know that all absolute paths can be expressed this way, due to the faithfulness of .
the depth function
We define a function , often call depth or nesting level.
 for roots ,
 for nonroots for $n$ the smallest integer, such that
the parts function
We define a funtion to find the parts of a path. This breaks a path down in to a sequence starting with it’s root, and followed by path componants
 otherwise
 this ends with
We can recombine the path components: for product over by .
Notice this option (like all the other really), can be implemented easily, by implementing it for the and then apply it to the relative component of a absolute path, until it reach the root – and then subsituting the absolute root for .
the within function
, takes two paths, with one inside the other and finds the relative path from the second to the first. It is a weaker version of
 for we say “the path of , within is ”
 let
 let
 is defined only if and
 note that this means they must share a common root.
 then for
Resolving a Path to the object
Finally we have an the operation that turns a absolute path () into a entity, or a set of entities from the Domain being indexed . These operations are less clear, as at this level objects must be considered – the data store (the indexed set) must be accessed to resolve. And issues like symlinks, hardlinks, etc start to matter (To use examples from POSIX filepaths).
 we define to evaluate the path
 For ,
 that is to say is indeed a function.
 Note that the implication is one way, that is to say: there can be multiple paths such that but
Depending on the type of path schema, this may be 1 enitity, or many
 We call path schema with paths that can eval to any nymber of entities MultiPaths Schemas
 eg XPATH or Glob
 We call Path schema with paths which always eval to one or zero entities MonoPath Schemas
 eg URL, POSIX / NT File System paths
 In these case, we can define .
 Where $\bar{A}^\ast$ is the restriction of $A$ to paths that exis.
 That is
Either way we can make some statements about the cardinality of
 or its (perhaps more interesting) contrapositive:
 for MonoPaths we thus have if is defined then is defined.
the Pseudoparent Element
This was touched on before, that for many systems we would like to have a element , that acts like a the parentdir function. But we can’t because it would break the monoid. Now that we have a understanding of evaluating a path to a domain object, it starts to make most sense to talk of this.
This is not defined for all Path Schema, but it is for many (For file paths it is ..
).
Where it exists:
We define , as having the property that: then .
Does even exist? (POSIX compliance)
Note that for POSIX filesystems using this is not POSIX compliant, as POSIX requires that Symlink directories are substituted in to the path, before ..
(our ) is resolved.
Our function does not have that requirement.
As far as I know there is no way to be POSIX complient on the behavour of ..
without actually reading the filesystem; to know what is or is not a symlink.
See this Unix Stack Exchange question for more information.
The behavour here is the default behavour in Bash. ksh, zsh and ash, with the L
flag (which is on by default for cd
and off by default for pwd
). Also in python os.path
, and Node.jl’s path
.
Python3 pathlib, has the correct behavour – in that it does not process ..
at all, except in the final resolve step. i.e. it does not offer any of the following functionality except a final resolve for absolute paths (Their is the defined ealier). The Haskell Path Package bans ..
outright.
This gets particular hairy for Multipaths; which have path components that are more complex than simply directories.
Consider for a Glob: finds all folders below a
with a sibling that is named b
.
Where as which finds all folders below a
.
And so ..
is not for Glob paths, and indeed there is no such element for them.
So not supporting funtions involving may be a good idea for an implementation. Without it though you can not have nor . Except by accessing the backing system.
Normalising, to remove
Using this, and we can now define a normalising function that removes the where possible.
 .
 This has the requirement that then .
 And for and then .
The function we will define removes all instances of from absolute paths (elements of ), and all nonleading instances from relative paths (elements of ) – though some nonleading s will potentenitally become leading.
We consider this on parts the relative componant of the path. That is (which is just if ) In this we repeatedly replace: from the left the earliest occurances of with , where , and . If it were a relative path, then this all we have to do, we simply rejoin the rewritten paths, and it is done. If it is applied to a absolute path, then we then can strip all leading instances of , since , .
The replacement rule method is quite easy to explain in words. But rather awkward to write mathematically.
To supplement it for clarity, see the implementation, of normparts
in this Julia, Jupyter Notebook.
Proving that does have the features we import upon it, with regard to not changing what paths point to, is not done here. Indeed I am yet to do so at all. It seems like to be very involved.
Note that even if for some with , this still does not imply that , as other things, eg POSIX symlinks, can still give multiple paths, to the same domain object.
The relative_to function
Earlier we saw the $within$ function; which could find the relative path of one directory, within the other. But this was limitted, as we could not move up the heirachy. Now using we can.
, takes two paths, and finds the relative path from the second to the first.
 for we say “the path of , relative to is ”
 let
 let

is defined only if , i.e
 let , with ,
such that , is the common path length
 then
 note that the first part of this is equal to if
And with this: . And . (proving that, also looks like it would be very fun, and is not done here).
As suggeted before, if is not defined to meet , and thus is not defined, we can still define the properties of If we do not have
Other extensions
Canonical Name
Some path schema do actually permit a canonical name. In these schema Swift Object Stores are one such path system which permit a canonical name.
Differenciating Directory Path from File Paths
We have differenciated Absolute Paths from Relative Paths, it is also possible to differenciate File paths from Directory Paths. This is done in the Haskell Path package. This places restictions on pathjoin (). For $R_F^\ast\subset R^\ast$ the relative file paths, and for $A_F^\ast\subset A^\ast$ the set absolute file paths. For $R_D^\ast=\subset R^\ast = R^\ast \setminus R_F^\ast$ the set of relative directory paths, and for then only the following joins are permitted, and they have the following results:
This stops from being a monoid, but is still a free monoid. The earlier definitions can be rewritten again, to take this into account, though it does add considerable complexity. We replace all uses of with uses of $R_D^\ast$, paired with $R_F$.
Conclusion
So that is paths. This is a highly abstract set of features. We show that from the simple rules we can get complex functionality, that is applicable accross the wide range of pathing systems It is applicable on all kinds of things, from keyvalues stores to XPATH. Most importantly perhaps, it covers URLs and Filepaths.
In general one might think monopath schema with only as single absolute root, are isomorphic to strings. This is true, but it is important to note the size of the alphabet – which for most path schema in use is countable infinite.
We did not prove any of our assertions, here. And indeed I’ve not written out proof for many of them at all. I hope however this was illuminating.