Solr Indexer/Searcher WebLab Web Service · This service implements the Indexer and Searcher interfaces of WebLab and connects to a remote Solr engine to perform these functions. The connection is mandatory, so the remote Solr server must be started beforehand. The Solr server URL is configured in src/main/webapp/WEB-INF/cxf-servlet.xml; see in particular "indexerServiceBean" and "searcherServiceBean". The service also implements several "Analyser" services that can be included in a complete search chain in order to: - enrich a ResultSet with metadata about Hits (see resultSetMetaEnricherServiceBean) - highlight snippets in a ResultSet (see highlighterServiceBean) - provide facets related to the current results (see facetSuggestionServiceBean) - suggest spelling corrections for the original query (see spellSuggestionServiceBean)
Group: org.ow2.weblab.webservices - All Dependencies
WebLab WebServices Parent POM · This is a generic parent for Web Services developed for the WebLab platform.
open-search-connector · This is a project generated by the WebLab Maven plug-in.
Language Extraction component · This component processes the text resources contained in the input Resource in order to identify the language in which they are written. A dc:language property is added to every Text section; its value is the name of the ngp file used as the language profile.
Moses translate service · This service relies on Moses, whose command-line tool must be installed. NOT TESTED ON WINDOWS-BASED OPERATING SYSTEMS. A dc:language property is added to every Text section created in parallel to the original Text section.
Local file Exposer · When indexing a shared folder, this service adds the value of the dc:source property to wl:isExposedAs. A transformation can be applied to the value being copied. For this to work, a context file must be added to either your Tomcat or Liferay configuration, and this service must use the URL of your server as the exposition pattern.
Gate Extraction · Gate-based component that can process Text units to extract information using Gate's tools (such as grammars, gazetteers, tokenizers or POS taggers).
Folder Resource Iterator · This service is a QueueManager that browses a folder containing WebLab resources. It provides filtering features to prevent resources from being crawled if needed.
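The browse-and-filter behaviour described above can be sketched as follows. This is only an illustration using the Java standard library: the class name FolderQueueSketch, the Predicate-based filter and the null-on-exhaustion convention are assumptions, and the actual WebLab QueueManager interface is not reproduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical sketch, not the WebLab implementation.
public class FolderQueueSketch {
    private final Deque<Path> queue = new ArrayDeque<>();

    // Walks the folder once and queues every regular file accepted by the filter.
    public FolderQueueSketch(Path folder, Predicate<Path> accept) {
        try (Stream<Path> paths = Files.walk(folder)) {
            paths.filter(Files::isRegularFile)
                 .filter(accept)
                 .sorted()
                 .forEach(queue::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the next accepted file, or null once the queue is exhausted.
    public Path nextResource() {
        return queue.poll();
    }
}
```

A caller would pass a filter such as `p -> p.toString().endsWith(".xml")` and call nextResource() until it returns null.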
Folder Listener · Creates a queue manager which listens to folders and converts files, resources or WARCs on each nextResource call. Each particular type of file is managed through a dedicated implementation.
folder-crawler-service · This simple crawler can be used to iterate over files inside a folder on your file system.
Resource Container using file system. · It's a file system repository. Just configure the file system folder and the component will save and load resources from files in this folder. When you save a resource, the component checks whether the resource's URI exists: if it does, the stored resource is replaced; if not, a unique URI is generated for the repository, all old URIs are replaced with the new one, and the resource is saved. You can load every saved resource and sub-resource.
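The save logic described above (replace if the URI is known, otherwise assign a repository URI and rewrite the old one) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name FileRepositorySketch, the "weblab://sketch-repo/" URI prefix, the Base64 file naming, and the treatment of a resource as a plain XML string. Sub-resource loading is not covered.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.UUID;

// Hypothetical sketch, not the WebLab implementation.
public class FileRepositorySketch {
    private final Path folder;

    public FileRepositorySketch(Path folder) {
        try {
            this.folder = Files.createDirectories(folder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps a URI to a file name that is safe on common file systems.
    private Path fileFor(String uri) {
        String name = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(uri.getBytes(StandardCharsets.UTF_8));
        return folder.resolve(name + ".xml");
    }

    // Saves the resource and returns the URI under which it is stored.
    public String save(String uri, String xml) {
        String target = uri;
        String body = xml;
        if (!Files.exists(fileFor(uri))) {
            // Unknown URI: assign a repository URI and rewrite every occurrence.
            target = "weblab://sketch-repo/" + UUID.randomUUID();
            body = xml.replace(uri, target);
        }
        try {
            Files.write(fileFor(target), body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public String load(String uri) {
        try {
            return new String(Files.readAllBytes(fileFor(uri)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, the first save of a resource returns a newly assigned URI, and saving again under that URI overwrites the stored file.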
Blank Lines Remover · A service which removes all unused blank lines in the text sections of MediaUnits.
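As a rough illustration of this kind of clean-up, the sketch below drops lines that are empty or contain only spaces and tabs. The class and method names are hypothetical, and the assumption that "unused blank lines" means every whitespace-only line may not match the rules the real service applies.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the WebLab implementation.
public class BlankLinesSketch {
    // Matches a whole line made only of spaces or tabs, including its line break.
    private static final Pattern BLANK_LINE = Pattern.compile("(?m)^[ \\t]*\\r?\\n");

    public static String removeBlankLines(String text) {
        return BLANK_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.print(removeBlankLines("first\n\n   \nsecond\n"));
    }
}
```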
XML specific normaliser using Xpath. · An XML-specific normaliser using XPath.
Normaliser using Tika · This service integrates the Apache Tika project. It extracts metadata and text content from many file formats. The input WebLab document is enriched with RDF properties for the metadata and Text unit(s) for the content. The service can be configured through the CXF Spring bean to handle various features (identifying the language or not, providing a normalised XHTML output of the document...).
Sphinx transcription service · This service relies on Sphinx.
Solr Indexer/Searcher WebLab Web Service · This service implements the Indexer and Searcher interfaces of WebLab and uses an embedded Solr engine to perform these functions. Unlike the other Solr service, this version does not require Solr to be installed. The Solr service can be configured through IndexerBean.xml and SearcherBean.xml, whereas the Solr instance configuration resides in the solr directory.
Simple Gazetteer · Loads Gazetteer files from a folder to annotate WebLab documents.
Simple Resource Container using file system. · It's a simple file system repository. Just configure the file system folder and the component will save and load resources from files in this folder. This implementation is able to save and get resources. Its limitations are that it does not assign a new URI; it hashes the existing one to save the files. It is also unable to get or save sub-resources, since it uses the hash as key. In most applications, you should rather use the file-repository, which is able to do so.
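The hash-as-key idea can be sketched as below. The choice of SHA-256, the ".xml" extension and the names UriHashSketch and fileNameFor are assumptions; the description does not say which hash function or file layout the component actually uses.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not the WebLab implementation.
public class UriHashSketch {
    // Derives a fixed-length file name from the resource URI.
    public static String fileNameFor(String uri) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(uri.getBytes(StandardCharsets.UTF_8));
            // 64 hex characters, zero-padded on the left.
            return String.format("%064x", new BigInteger(1, hash)) + ".xml";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

Because the file name is derived from the whole URI, the name computed for a sub-resource URI bears no relation to the name of its parent's file, which is consistent with the limitation on sub-resources noted above.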
RSS splitter using modified ROME · Takes an RSS doc and splits it into multiple WL annotated docs using the ROME RSS parser.