yokome.deployment.server
- exception yokome.deployment.server.BadRequestError
HTTP Bad Request Error.
- yokome.deployment.server.ENGLISH = 'eng'
str: ISO 639-3 language code for English.
- yokome.deployment.server.JAPANESE = 'jpn'
str: ISO 639-3 language code for Japanese.
- yokome.deployment.server.KANA_RANGES = ((12353, 12438), (12449, 12538))
tuple<tuple<int>>: Code point ranges of kana characters.
The ranges contain pronounceable characters only and are expressed as pairs of start and end code points, both inclusive: U+3041 to U+3096 (hiragana) and U+30A1 to U+30FA (katakana).
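To illustrate how the inclusive ranges are read, here is a minimal membership test. The helper name is_kana is hypothetical and not part of the module; only the constant is copied from the documented value. Note that the long-vowel mark ー (U+30FC) lies outside both ranges.

```python
# Documented value of KANA_RANGES: inclusive code point ranges for
# hiragana (U+3041..U+3096) and katakana (U+30A1..U+30FA).
KANA_RANGES = ((12353, 12438), (12449, 12538))


def is_kana(char: str) -> bool:
    """Hypothetical helper: True if the single character lies in KANA_RANGES."""
    code_point = ord(char)
    # Both range ends are inclusive, hence <= on both sides.
    return any(start <= code_point <= end for start, end in KANA_RANGES)


print(is_kana('あ'), is_kana('ア'), is_kana('ー'), is_kana('a'))
# → True True False False
```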
- yokome.deployment.server.KANA_RATIO = 0.05
float: Minimum ratio of kana characters in a text for it to be immediately detected as Japanese ('jpn').
- exception yokome.deployment.server.NotFoundError
HTTP Not Found Error.
- yokome.deployment.server.PORT = 5003
int: Server port to start on.
- yokome.deployment.server.TOKEN_SERVANT = 'tokenizer'
str: Tokenizer API location.
- yokome.deployment.server.TOKEN_SERVICE = 'tokenize'
str: Location of the tokenization service within the tokenizer API.
- exception yokome.deployment.server.UnprocessableEntityError
HTTP Unprocessable Entity Error.
- exception yokome.deployment.server.UnsupportedMediaTypeError
HTTP Unsupported Media Type Error.
- yokome.deployment.server.WSD_SERVANT = 'wsd'
str: WSD (word sense disambiguation) API location.
- yokome.deployment.server.WSD_SERVICE = 'disambiguate'
str: Location of the disambiguation service within the WSD API.
- yokome.deployment.server.api_disambiguate()
Respond to an HTTP POST request at the disambiguation endpoint.
The expected data has the following form:
{
  'language': <ISO 639-3 language code or null>,
  'tokens': <a sentence, split into its tokens>,
  'i': <position of the token of interest>
}
- Returns
An HTTP response. If the request was successful, the data is the JSON serialization of the dictionary returned by disambiguate(). Otherwise, the response contains an error message; see handle_error().
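A client request for this endpoint might be built as follows. This is a sketch under stated assumptions: the URL path /wsd/disambiguate is assumed to be composed from WSD_SERVANT and WSD_SERVICE on PORT, the JSON content type is assumed (the 415 error suggests the server checks it), and the tokens are shown as plain strings although the server may expect the token format produced by the tokenizer endpoint. The function name build_disambiguate_request is hypothetical.

```python
import json
import urllib.request

# Documented module values.
PORT = 5003
WSD_SERVANT = 'wsd'
WSD_SERVICE = 'disambiguate'


def build_disambiguate_request(tokens, i, language=None, host='localhost'):
    """Build (but do not send) a POST request for the disambiguation endpoint.

    Assumption: the endpoint path is /<WSD_SERVANT>/<WSD_SERVICE>.
    """
    url = f'http://{host}:{PORT}/{WSD_SERVANT}/{WSD_SERVICE}'
    # language=None becomes JSON null.
    payload = {'language': language, 'tokens': tokens, 'i': i}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# Hypothetical tokens; ask about 'ペン' at index 2.
request = build_disambiguate_request(['これ', 'は', 'ペン', 'です'], 2, 'jpn')
print(request.get_method(), request.full_url)
# → POST http://localhost:5003/wsd/disambiguate
# Sending it requires a running server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```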
- yokome.deployment.server.api_tokenize()
Respond to an HTTP POST request at the tokenizer endpoint.
The expected data has the following form:
{
  'language': <ISO 639-3 language code or null>,
  'text': <the text to tokenize>
}
- Returns
An HTTP response. If the request was successful, the data is a JSON dictionary that has an entry 'language' for the provided or detected language and may have an entry 'sentences' for the tokenized sentences. Otherwise, the response contains an error message; see handle_error().
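The tokenizer request can be sketched the same way. The path /tokenizer/tokenize, assumed to be composed from TOKEN_SERVANT and TOKEN_SERVICE on PORT, and the JSON content type are assumptions; build_tokenize_request is a hypothetical name. Passing language=None sends JSON null, which per the description above leaves language detection to the server.

```python
import json
import urllib.request

# Documented module values.
PORT = 5003
TOKEN_SERVANT = 'tokenizer'
TOKEN_SERVICE = 'tokenize'


def build_tokenize_request(text, language=None, host='localhost'):
    """Build (but do not send) a POST request for the tokenizer endpoint.

    Assumption: the endpoint path is /<TOKEN_SERVANT>/<TOKEN_SERVICE>.
    """
    url = f'http://{host}:{PORT}/{TOKEN_SERVANT}/{TOKEN_SERVICE}'
    # language=None is serialized as null, asking the server to detect it.
    body = json.dumps({'language': language, 'text': text}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


request = build_tokenize_request('これはペンです。')
print(request.get_method(), request.full_url)
# → POST http://localhost:5003/tokenizer/tokenize
```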
- yokome.deployment.server.api_tokenizer_inform()
Respond to an HTTP OPTIONS request at the tokenizer endpoint.
Allow the POST method from all origins.
- yokome.deployment.server.api_wsd_inform()
Respond to an HTTP OPTIONS request at the disambiguation endpoint.
Allow the POST method from all origins.
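In CORS terms, allowing POST from all origins usually means answering the browser's OPTIONS preflight with headers like the ones below. The header names are standard CORS; the exact set the server sends, and the helper preflight_headers itself, are assumptions. Allowing Content-Type matters because a JSON POST is not a CORS "simple request", so browsers preflight it.

```python
def preflight_headers(allowed_methods=('POST',)):
    """Hypothetical helper: headers for an OPTIONS (CORS preflight) response."""
    return {
        # Any origin may call the endpoint.
        'Access-Control-Allow-Origin': '*',
        # Methods the actual request may use.
        'Access-Control-Allow-Methods': ', '.join(allowed_methods),
        # Assumption: allow Content-Type so JSON bodies pass the preflight.
        'Access-Control-Allow-Headers': 'Content-Type',
    }


print(preflight_headers())
```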
- yokome.deployment.server.detect_language(text)
Detect the language in which the text was written.
Currently, only Japanese is supported.
- Parameters
text (str) – The text to detect a language in.
- Returns
An ISO 639-3 language code if a language was detected, None otherwise.
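Combining KANA_RANGES and KANA_RATIO, the immediate Japanese check could look roughly like this. It is a sketch, not the module's implementation: it assumes the ratio is taken over all characters of the text and models only the kana check, while the real function may apply further heuristics below the threshold. The name detect_language_sketch is hypothetical. A text written only in kanji, such as '漢字', is not detected by this check alone.

```python
# Documented module values.
KANA_RANGES = ((12353, 12438), (12449, 12538))
KANA_RATIO = 0.05
JAPANESE = 'jpn'


def detect_language_sketch(text):
    """Return 'jpn' if the kana ratio of ``text`` reaches KANA_RATIO, else None.

    Assumption: the ratio is computed over all characters of the text.
    """
    if not text:
        return None
    # Count characters whose code point lies in one of the inclusive ranges.
    kana = sum(
        1 for char in text
        if any(start <= ord(char) <= end for start, end in KANA_RANGES)
    )
    if kana / len(text) >= KANA_RATIO:
        return JAPANESE
    return None


print(detect_language_sketch('これはペンです。'))  # → jpn
print(detect_language_sketch('This is a pen.'))   # → None
```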
- yokome.deployment.server.disambiguate(tokens, i, language)
Disambiguate the token at index i for the specified language.
Currently, only Japanese is supported.
- Parameters
tokens – A sentence, split into its tokens.
i (int) – The position of the token of interest in tokens.
language (str) – ISO 639-3 language code of the language of the tokens.
- Returns
A dictionary containing an entry 'language' for the language and an entry 'lexemes' for the lexemes of the token at index i.
The entry 'lexemes' is a list of lexeme data, ranked by how well each lexeme describes the meaning of the token at i; the connotations of each lexeme are in turn scored by their suitability. Each element is a dictionary of the following form:
{
  'entry_id': <ID of the lexeme in the dictionary>,
  'headwords': <list of lemmas for the lexeme>,
  'discriminator': <int distinguishing lexemes with the same main headword>,
  'roles': [
    {
      'poss': <POS tag list for the role>,
      'connotations': [
        {
          'sense_id': <ID of the connotation within the lexeme>,
          'glosses': ((<gloss_type>, <gloss>), ...),
          'score': <connotation score>
        },
        ...
      ]
    },
    ...
  ],
  'score': <overall lexeme score>
}
- Raises
NotImplementedError – If the requested language is not supported.
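The nested return structure can be walked as shown here to pick the most suitable glosses. The result dictionary is invented for illustration (IDs, POS tags, gloss types, glosses and scores are not real output), and best_glosses is a hypothetical helper. It takes the maximum by score rather than relying on the sort direction of the ranked list.

```python
# Invented disambiguate() result in the documented shape.
result = {
    'language': 'jpn',
    'lexemes': [
        {
            'entry_id': 1,
            'headwords': ['ペン'],
            'discriminator': 1,
            'roles': [
                {
                    'poss': ['noun'],
                    'connotations': [
                        {'sense_id': 1, 'glosses': (('gloss', 'pen'),), 'score': 0.8},
                        {'sense_id': 2, 'glosses': (('gloss', 'writing'),), 'score': 0.2},
                    ],
                },
            ],
            'score': 0.9,
        },
    ],
}


def best_glosses(result):
    """Return the glosses of the best-scoring connotation of the best lexeme."""
    lexemes = result['lexemes']
    if not lexemes:
        return ()
    # Pick by overall score instead of assuming the ranking direction.
    top = max(lexemes, key=lambda lexeme: lexeme['score'])
    # Connotations are grouped by role; flatten them before comparing scores.
    connotations = [c for role in top['roles'] for c in role['connotations']]
    if not connotations:
        return ()
    best = max(connotations, key=lambda c: c['score'])
    return tuple(gloss for _gloss_type, gloss in best['glosses'])


print(best_glosses(result))  # → ('pen',)
```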
- yokome.deployment.server.handle_error(error)
Catch errors for HTTP requests and provide apt responses.
Handle the following errors:
  Bad Request Error (400)
  Not Found Error (404)
  Unsupported Media Type Error (415)
  Unprocessable Entity Error (422), also used when catching TypeError or ValueError
  Not Implemented Error (501)
All remaining errors result in an Internal Server Error (500).
The data in the response is an error message. In debug mode, a traceback is appended.
- Returns
An HTTP response with the respective error.
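The error-to-status mapping described above can be sketched framework-independently. The exception classes below are stand-ins defined only so the sketch runs (their real base classes are not documented here), and error_response is a hypothetical helper returning a (status, message) pair instead of a real HTTP response.

```python
import traceback


# Hypothetical stand-ins for the module's exception classes.
class BadRequestError(Exception):
    pass


class NotFoundError(Exception):
    pass


class UnsupportedMediaTypeError(Exception):
    pass


class UnprocessableEntityError(Exception):
    pass


# Exception types and status codes as listed for handle_error().
STATUS_CODES = (
    (BadRequestError, 400),
    (NotFoundError, 404),
    (UnsupportedMediaTypeError, 415),
    ((UnprocessableEntityError, TypeError, ValueError), 422),
    (NotImplementedError, 501),
)


def error_response(error, debug=False):
    """Return a (status, message) pair for an exception raised by a request."""
    status = 500  # Internal Server Error for everything not listed above
    for types, code in STATUS_CODES:
        if isinstance(error, types):
            status = code
            break
    message = str(error)
    if debug:
        # In debug mode, append the formatted traceback to the message.
        message += '\n' + ''.join(
            traceback.format_exception(type(error), error, error.__traceback__))
    return status, message


print(error_response(ValueError('bad token index')))  # → (422, 'bad token index')
print(error_response(KeyError('tokens'))[0])           # → 500
```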
- yokome.deployment.server.tokenize(text, language=None)
Tokenize the specified text for the specified language.
Attempt to detect the language of the text if no language is provided. For Japanese, apply the JUMAN++ morphological analyzer (Morita, Kawahara, Kurohashi 2015).
- Parameters
text (str) – The text to tokenize.
language (str) – ISO 639-3 language code of the language the text is written in. If None, the language is detected.
- Returns
A dictionary containing the language. The tokenized sentences are included only if a segmenter and a tokenizer exist for the language.
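The documented return behaviour (language always present, sentences only when tooling exists) can be illustrated with a small sketch. Everything here is hypothetical: tokenize_sketch, the tokenizers registry standing in for JUMAN++, the detect callback standing in for detect_language(), the 'sentences' key (borrowed from the api_tokenize description), and the toy tokenizer under the made-up code 'toy'.

```python
from typing import Callable, Dict, List, Optional

Sentences = List[List[str]]


def tokenize_sketch(
    text: str,
    language: Optional[str] = None,
    tokenizers: Optional[Dict[str, Callable[[str], Sentences]]] = None,
    detect: Callable[[str], Optional[str]] = lambda text: None,
) -> dict:
    """Mimic the documented return shape of tokenize()."""
    if language is None:
        # Fall back to language detection when no language is given.
        language = detect(text)
    result = {'language': language}
    tokenizer = (tokenizers or {}).get(language)
    # Sentences are included only if a tokenizer exists for the language.
    if tokenizer is not None:
        result['sentences'] = tokenizer(text)
    return result


def toy(text):
    """Toy tokenizer: split into sentences on '.', then tokens on whitespace."""
    return [s.split() for s in text.split('.') if s.strip()]


print(tokenize_sketch('A pen. A book.', 'toy', {'toy': toy}))
# → {'language': 'toy', 'sentences': [['A', 'pen'], ['A', 'book']]}
print(tokenize_sketch('A pen.', 'eng', {'toy': toy}))
# → {'language': 'eng'}
```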