yokome.features.tree

class yokome.features.tree.DataOnlyTree(value=None)

A tree node that conjunctively gathers conjunctive children.

Semantically, children of this node represent cooccurrences. They may only be other instances of DataOnlyTree.

Nodes of this type may carry satellite data.

Parameters

value – The satellite data to carry.

attach(value=<object object>)

Attach a child to this node in the tree.

Parameters

value – A satellite value to store in the child.

Returns

The newly created child.

from_list()

Build a tree from a nested array structure.

Parameters

array – The data to be stored in the tree. Arrays may be lists or tuples. The first element of a nested array is the data to be stored in a node, the other elements are passed on to its child nodes.

Returns

The root node of the newly created tree.

class yokome.features.tree.DataTree(value=None)

A tree structure with satellite data.

Parameters

value – The sattelite data to carry.

exception yokome.features.tree.InvalidDataError

Exception to be raised on invalid restriction data.

exception yokome.features.tree.InvalidEntryError(reason, *args, **kwargs)

Exception to be raised on invalid restriction requests.

Parameters

reason – The exception that caused this error.

class yokome.features.tree.LabeledTree(data=None)

A tree structure with labels.

Parameters

data – The data to be stored in this tree node.

detach()

Break connection between this node and its parent.

to_dict()

Return a dictionary representation of this tree node.

Returns

A dictionary with an entry 'data' for the satellite data and an entry 'children' that contains a dictionary from this node’s labels to a list of children of this node under the respective label. Each such child is again a dictionary as described herein.

exception yokome.features.tree.MissingDataException

Exception to be raised on missing restriction data.

class yokome.features.tree.StructureTree

A tree data structure without satellite data.

class yokome.features.tree.TemplateTree(data=None, restrictions=None)

A tree data structure that assigns labels to children based on a restriction template.

Data is required to be hashable.

Parameters
  • data – The data to be stored in this tree node.

  • restrictions – A string specifying the location of a JSON file containing the restriction data, a dictionary that encodes the restriction data, or None.

attach(data)

Attach a child to this node in the tree.

Before insertion checks are applied to ensure that the specified data is allowed to be inserted at the intended position according to the restrictions.

Parameters

data – The data to be stored in the child node.

Returns

The newly created child.

classmethod from_dict(structure, restrictions)

Build a tree from nested dictionary structure.

Parameters
  • structure (dict) – The data to be stored in the tree. The data stored under the entry 'data' is the satellite data to be stored in the root node. The entry 'children' contains a dictionary from the labels of the root node to lists of children. Each such child is again a dictionary as described herein and will be created as a child node.

  • restrictions – A string specifying the location of a JSON file containing the restriction data, a dictionary that encodes the restriction data, or None.

Returns

The root node of the newly created tree.

is_valid_data(x)

Determine whether the specified element is valid satellite data.

Parameters

x – The value for which to determine whether it may be stored in the tree.

Returns

True if x may be stored in the tree, False otherwise.

classmethod parse(pos_list, pos_dict)

Parse a linear list of elements, building a hierarchical tree.

Follow the heuristics implied by the restrictions in pos_dict so as to create a tree in which the relations between the elements w.r.t. hierarchy, nonconflict and mutual exclusion are made explicit.

Hierarchy is expressed in the restrictions via parent relationships. Nonconflict and mutual exclusion are expressed using labels. Children of the same parent with the same label are mutually exclusive, while children with different labels are nonconflicting.

Parameters
  • pos_list – A list of elements to be stored.

  • pos_dict

    A dictionary of restrictions of the following form:

    {
      <element>:
        {
          'parents': <list of parent elements>,
          'label': <label under which to store this element>
        },
      ...
    }
    

    There must be one element with an empty parent list. This is the root element.

score(token)

Evaluate how much the token specification in token matches this lexeme specification.

The base scoring function is as follows:

\[\begin{split}\mathrm{score^\prime}(x, y) = \begin{cases}\frac{2 + \sum_{l \in \mathrm{labels}(x)} \max_{x^\prime \in \mathrm{children}(x, l), y^\prime \in \mathrm{children}(y)} \mathrm{score}(x^\prime, y^\prime)}{2 + \frac{\mathrm{length}(x) + \mathrm{length}(y)}{2}}, & \mathrm{data}(x) = \mathrm{data}(y)\\ 0, & \text{otherwise}\end{cases}\end{split}\]

This function has the following properties:

  • The score is in the interval \([0, 1]\), with only a complete match scoring 1, and only an empty match scoring 0. Trees that have their roots at different levels in the hierarchy receive a match score of 0.

  • Data in a child node makes a contribution of at most half of the contribution of the data in its parent node to the overall score.

  • Sibling nodes make a higher contribution to the overall score than child nodes.

  • Additional/missing child nodes that are not in conflict with other child nodes result in a reduced score.

To increase the expressiveness of the score, it is downscaled along a quarter-circle curve in \([0, 1] \times [0, 1]\), i.e.:

\[\mathrm{score}(x, y) = 1 - \sqrt{1 - \mathrm{score^\prime}(x, y)^2}\]

For equal scores, children that are specified earlier are considered first to be part of the match result.

Nodes in unrelated parts of the hierarchy do not affect the score. The same holds for overspecification: Descendant nodes in the token tree that are not found in the lexeme tree are not considered.

In case of multiple inheritance, the behavior on TemplateTree overrides the behavior on DataOnlyTree. A resulting TemplateTree match contains the data from the input, but adheres to this TemplateTree’s restrictions.

class yokome.features.tree.Tree

A generic tree data structure.

attach(value=<object object>)

Attach a child to this node in the tree.

Parameters

value – A satellite value to store in the child.

Returns

The newly created child.

detach()

Detach this node from its parent.

yokome.features.tree.dfs(tree, prefix='', next_sibling=False, shortened=False)

Printing function for DataOnlyTree trees.

For LabeledTree / TemplateTree, use their LabeledTree.__str__() method instead.

Parameters
  • tree – The tree to print.

  • prefix (str) – The prefix to be prepended to each line.

  • next_sibling (bool) – Whether the current node is not the last child of its parent.

  • shortened (bool) – Deprecated. Not used.