Ainized-StanfordNLP (ENG)
To read this in Korean, please click here.
While reading about NLP (Natural Language Processing), I came across StanfordNLP. After reading its description, I thought it would work well as an API server. So I implemented an API server in Node.js, dockerized it, and deployed it on Ainize. The deployed API server provides models for over 70 languages. You can use it through the link below.
What is StanfordNLP?
StanfordNLP is a Python natural language analysis package. It is based on the Stanford system for the CoNLL 2018 UD Shared Task, which was introduced in the paper "Universal Dependency Parsing from Scratch". StanfordNLP lets you combine a variety of tools in a pipeline, including the following:
Tokenization
Tokenization is the task of dividing a given corpus into tokens (meaningful units). According to the paper, tokenization and sentence segmentation are treated together as a unit-level sequence tagging problem and are predicted with a BiLSTM (bidirectional LSTM) model. The output is provided in CoNLL-U format.
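As a rough illustration of the CoNLL-U format mentioned above, the following sketch parses a small CoNLL-U fragment in plain Python. The sample text is hand-written for this example (it is not real output from the API server), and the helper name `parse_conllu` is hypothetical; it is not part of StanfordNLP. The sketch also ignores multiword-token and empty-node lines, which real CoNLL-U files may contain.

```python
# Column names defined by the CoNLL-U format (10 tab-separated fields per word line).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hand-written sample for illustration only (an assumption, not real server output).
SAMPLE = (
    "# text = She is happy.\n"
    "1\tShe\tshe\tPRON\tPRP\t_\t3\tnsubj\t_\t_\n"
    "2\tis\tbe\tAUX\tVBZ\t_\t3\tcop\t_\t_\n"
    "3\thappy\thappy\tADJ\tJJ\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
    "\n"
)


def parse_conllu(text):
    """Hypothetical helper: return a list of sentences, each a list of word dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines (e.g. "# text = ...") carry metadata; skipped here.
            continue
        else:
            current.append(dict(zip(FIELDS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for word in sentences[0]:
    print(word["id"], word["form"], word["lemma"], word["upos"])
```

Each word line becomes a dictionary keyed by column name, so the token, its lemma, its tags, and its head index can be read off directly.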
Lemmatization
The goal of lemmatization is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. For example, the English verbs am, is, and are are all forms of the verb be, so the lemma of each of these words is be.
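To make the idea concrete, here is a toy lookup-table sketch of lemmatization. It only illustrates the mapping from inflected forms to a lemma; the table `LEMMA_TABLE` and the function `toy_lemmatize` are made up for this example, and this is not how the StanfordNLP lemmatizer itself works.

```python
# Tiny hand-written lookup table (an assumption for illustration only).
LEMMA_TABLE = {
    "am": "be",
    "is": "be",
    "are": "be",
    "was": "be",
    "were": "be",
}


def toy_lemmatize(word):
    """Return the lemma from the table, or the word unchanged if it is unknown."""
    return LEMMA_TABLE.get(word.lower(), word)


lemmas = [toy_lemmatize(w) for w in ["am", "is", "are", "happy"]]
print(lemmas)
```

A real lemmatizer has to handle words that are not in any table, which is why trained models are used in practice.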
POS (Part-of-Speech) Tagging
POS tagging is the task of identifying the part of speech of each word in a sentence. This API server provides both UPOS and XPOS tags. UPOS stands for universal part-of-speech and follows the Universal Dependencies guidelines. XPOS is a language-specific part-of-speech tag.
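The difference between the two tag types can be sketched as follows. The 17 tags in `UPOS_TAGS` are the universal tag set from Universal Dependencies; the tagged sentence is hand-annotated for this example, and its XPOS column assumes Penn Treebank style tags for English. The helper `is_valid_upos` is hypothetical.

```python
# The universal part-of-speech tag set defined by Universal Dependencies.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Hand-annotated example: (form, UPOS, XPOS).
# The XPOS values assume Penn Treebank style tags for English.
tagged = [
    ("She", "PRON", "PRP"),
    ("is", "AUX", "VBZ"),
    ("happy", "ADJ", "JJ"),
    (".", "PUNCT", "."),
]


def is_valid_upos(tag):
    """Hypothetical helper: check that a tag belongs to the universal tag set."""
    return tag in UPOS_TAGS


for form, upos, xpos in tagged:
    print(f"{form}\tUPOS={upos}\tXPOS={xpos}\tvalid_upos={is_valid_upos(upos)}")
```

UPOS tags are shared across languages, while an XPOS tag such as VBZ is only meaningful within the tag set of one language.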
Dependency Parsing
Dependency parsing is the task of extracting a dependency parse of a sentence, which represents its grammatical structure and defines the relationships between "head" words and the words that modify those heads.
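A dependency parse can be stored as one head index and one relation label per word. The sketch below rebuilds the head-to-modifier structure from such a list. The sentence and its annotations are written by hand for illustration, and `build_tree` is a hypothetical helper, not a StanfordNLP function.

```python
from collections import defaultdict

# Hand-annotated example: (index, form, head index, relation).
# A head index of 0 marks the root of the sentence.
words = [
    (1, "She", 3, "nsubj"),
    (2, "is", 3, "cop"),
    (3, "happy", 0, "root"),
    (4, ".", 3, "punct"),
]


def build_tree(words):
    """Hypothetical helper: map each head to its modifiers and find the root."""
    forms = {idx: form for idx, form, _, _ in words}
    children = defaultdict(list)
    root = None
    for idx, _, head, deprel in words:
        if head == 0:
            root = idx
        else:
            children[head].append((idx, deprel))
    return forms, dict(children), root


forms, children, root = build_tree(words)
print("root:", forms[root])
for idx, deprel in children[root]:
    # Each line reads: modifier --relation--> head
    print(f"{forms[idx]} --{deprel}--> {forms[root]}")
```

In this example "happy" is the root, and the other three words all modify it directly.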
Reference
https://stanfordnlp.github.io/stanfordnlp/index.html
http://nlpprogress.com/english/dependency_parsing.html