LinguaStream
From MachineLearning.
An Integrated Experimentation Environment for Computational Linguistics
LinguaStream lets you design elegant, shareable solutions to many natural language processing (NLP) problems quickly and conveniently. It offers:
* An integrated experimentation and prototyping environment, including a variety of operators that can be visually pipelined and nested to build high-level natural language processing algorithms.
* An advanced document annotation model that allows arbitrary text spans to be marked up in documents, along with rich annotations represented as feature structures.
* A variety of standard operators, such as word or sentence splitting, POS tagging, projection of lexical resources, and statistical analyses.
* A set of advanced operators based on declarative languages, which allow linguistic models to be formalised according to different paradigms: automata, unification grammars, inference rules, constraint sets, etc.
* A set of shell operators for integrating with third-party systems, such as the Syntex syntactic analysis tool.
* A set of XML-based operators for manipulating resources and performing document-engineering tasks.
* A set of visualisation methods that allow analyses of annotated documents to be observed in place, in concordancer-like views, from a statistical point of view, etc.
* Shareability: the platform favours the sharing of NLP resources and algorithms, and we host a repository where they can be described and made available.
* Openness: LinguaStream is multi-platform, can handle documents in any XML format, has a component-based architecture, and offers a clean Java API.
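To make the annotation model and operator pipelining described above more concrete, here is a minimal sketch in Python. It is not LinguaStream's API (which is Java and is not shown on this page): the names `Annotation`, `Document`, `word_splitter`, `lexicon_projector` and `run_pipeline` are hypothetical, and the toy lexicon is invented for illustration. The sketch only assumes what the list states: annotations attach to arbitrary text spans, carry feature structures (modelled here as nested dictionaries), and are produced by operators chained one after another.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Annotation:
    """A stand-off annotation: a text span plus a feature structure."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    features: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.start:ann.end]


# Hypothetical operator type: takes a document and returns it, enriched.
Operator = Callable[[Document], Document]


def word_splitter(doc: Document) -> Document:
    """Mark every run of word characters as a 'word' annotation."""
    for match in re.finditer(r"\w+", doc.text):
        doc.annotations.append(
            Annotation(match.start(), match.end(), {"type": "word"})
        )
    return doc


def lexicon_projector(lexicon: Dict[str, Dict[str, Any]]) -> Operator:
    """Build an operator that projects lexicon entries onto word annotations."""
    def operator(doc: Document) -> Document:
        for ann in doc.annotations:
            if ann.features.get("type") == "word":
                entry = lexicon.get(doc.covered_text(ann).lower())
                if entry is not None:
                    ann.features["lex"] = entry
        return doc
    return operator


def run_pipeline(doc: Document, operators: List[Operator]) -> Document:
    """Apply the operators in order, each seeing the previous one's output."""
    for op in operators:
        doc = op(doc)
    return doc


# Toy lexicon (invented); the nested dict stands in for a feature structure.
lexicon = {"stream": {"pos": "noun", "domain": {"name": "hydrology"}}}
doc = run_pipeline(
    Document("The stream flows."),
    [word_splitter, lexicon_projector(lexicon)],
)
```

Because the annotations reference character offsets rather than modifying the text, later operators can add or enrich annotations without disturbing earlier ones, which is what makes this kind of chaining straightforward.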