yokome.deployment.server

exception yokome.deployment.server.BadRequestError

HTTP Bad Request Error.

yokome.deployment.server.ENGLISH = 'eng'

str: ISO 639-3 language code for English.

yokome.deployment.server.JAPANESE = 'jpn'

str: ISO 639-3 language code for Japanese.

yokome.deployment.server.KANA_RANGES = ((12353, 12438), (12449, 12538))

tuple<tuple<int>>: Character ranges of kana characters.

The ranges contain pronouncable characters only and are expressed as pairs of start (including) and end (including) characters.

yokome.deployment.server.KANA_RATIO = 0.05

float: Minimum kana rate for immediate JPN detection.

exception yokome.deployment.server.NotFoundError

HTTP Not Found Error.

yokome.deployment.server.PORT = 5003

int: Server port to start on.

yokome.deployment.server.TOKEN_SERVANT = 'tokenizer'

str: Tokenizer API location.

yokome.deployment.server.TOKEN_SERVICE = 'tokenize'

str: Tokenization API location for tokenizer API.

exception yokome.deployment.server.UnprocessableEntityError

HTTP Unprocessable Entity Error.

exception yokome.deployment.server.UnsupportedMediaTypeError

HTTP Unsupported Media Type Error.

yokome.deployment.server.WSD_SERVANT = 'wsd'

str: WSD API location.

yokome.deployment.server.WSD_SERVICE = 'disambiguate'

str: Disambiguation API location for WSD API.

yokome.deployment.server.api_disambiguate()

Respond to an HTTP POST request at the disambiguation endpoint.

The expected data has the following form:

{
  'language': <ISO 639-3 language code or null>,
  'tokens': <A sentence, split into its tokens>,
  'i': <position of the token of interest>
}
Returns

An HTTP response. If the request was successful, the data is a JSON of the dictionary returned by disambiguate(). Otherwise, send an error message, see handle_error().

yokome.deployment.server.api_tokenize()

Respond to an HTTP POST request at the tokenizer endpoint.

The expected data has the following form:

{
  'language': <ISO 639-3 language code or null>,
  'text': <the text to tokenize>
}
Returns

An HTTP response. If the request was successful, the data is a JSON dictionary that has an entry 'language' for the provided/detected language and may have an entry 'sentences' for the tokenized sentences. Otherwise, send an error message, see handle_error().

yokome.deployment.server.api_tokenizer_inform()

Respond to an HTTP OPTIONS request at the tokenizer endpoint.

Allow the POST method from all origins.

yokome.deployment.server.api_wsd_inform()

Respond to an HTTP OPTIONS request at the disambiguation endpoint.

Allow the POST method from all origins.

yokome.deployment.server.detect_language(text)

Detect the language in which the text was written.

Currently only support Japanese.

Parameters

text (str) – The text to detect a language in.

Returns

An ISO 639-3 language code, if a language was detected, None otherwise.

yokome.deployment.server.disambiguate(tokens, i, language)

Disambiguate the token at index i for the specified language.

Currently only support Japanese.

Parameters
  • tokens – A sentence, split into its tokens.

  • i (int) – The position of the token of interest in tokens.

  • language (str) – ISO 639-3 language code of the language of the tokens.

Returns

A dictionary containing an entry 'language' for the language and an entry 'lexemes' for the lexemes of the token at index i.

The entry 'lexemes' is list of data on lexemes, ranked by their overall suitability to describe the meaning of the token at i, with their connotations in turn associated with their suitability. Each element is a dictionary of the following form:

{
  'entry_id': <ID of the lexeme in the dictionary>,
  'headwords': <list of lemmas for the lexeme>,
  'discriminator': <int for lexemes with the same main headword>,
  'roles': [
    {
      'poss': <POS tag list for the role>,
      'connotations': [
        {
          'sense_id': <the ID of the connotation within the lexeme>,
          'glosses': ((<gloss_type>, <gloss>), ...),
          'score': <connotation score>
        },
        ...
      ]
    },
    ...
  ],
  'score': <overall lexeme score>
}

Raises

NotImplementedError – If the requested language is not supported.

yokome.deployment.server.handle_error(error)

Catch errors for HTTP requests and provide apt responses.

Handle the following errors:

  • Bad Request Error (400)

  • Not Found Error (404)

  • Unsupported Media Type Error (415)

  • Unprocessable Entity Error (422) (also while catching TypeError/ValueError)

  • Not Implemented Error (501)

All remaining errors results in an Internal Server Error (500).

The data in the response is an error message. In case of the debug mode, a traceback is appended.

Returns

An HTTP response with the respective error.

yokome.deployment.server.tokenize(text, language=None)

Tokenize the specified text for the specified language.

Attempt to detect the language of the text if no language is provided. For Japanese, apply the JUMAN++ morphological analyzer (Morita, Kawahara, Kurohashi 2015).

Parameters
  • text (str) – The text to tokenize.

  • language (str) – ISO 639-3 language code of the language the text is written in. If None, the language is detected.

Returns

A dictionary containing the language. The tokenized sentences are contained only if there are a segmenter and tokenizer for the language.