I am glad to announce the first release of the PhraseNET Database Toolbox, an academic freeware application for the retrieval and extraction of phraseological units in Spanish. This is a bug-fix release only; no new features were added. The software is based on my PhD thesis, PhraseNET Automated Detection and Extraction of Phraseological Units, Universitat Politecnica de Valencia (2011).
PhraseNET Database Toolbox is short for PhraseNET Database Toolbox and Numerical Methods for Retrieval and Extraction of Phraseological Units, a name that reflects the key topics of this application. In short, an information extraction system extracts from relevant documents only the information required, in this case phraseological units. An information retrieval system is an ordered storage system for documents in a database.
The methodology of PhraseNET Database Toolbox is based on the SMART system, designed by Salton in 1964. SMART uses the vector space model for the thematic classification of documents, together with relevance feedback to refine the retrieval process.
The vector space model (Salton, 1983) is the most widespread theoretical model in information retrieval. It represents the database as a term/document matrix in which each document is a vector of n elements, where n is the number of indexed terms in the entire collection, that is, the terms likely to occur in any item of the collection.
Each vector element is assigned a numerical value from 0 to 1 corresponding to the importance of the term in the document; the value is 0 if the document does not contain the term or if no weight has been assigned to it. Many systems still use SMART techniques to manage document retrieval in databases.
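The term/document matrix described above can be illustrated with a minimal Python sketch. It is not the PhraseNET or SMART implementation: the function names are hypothetical, the weighting (term frequency divided by the frequency of the most frequent term in the document) is only one simple way to obtain values between 0 and 1, and cosine similarity is assumed as the ranking measure.

```python
import math
from collections import Counter


def build_term_document_matrix(documents):
    """Return (terms, matrix); matrix[d][t] is a weight between 0 and 1."""
    tokenized = [doc.lower().split() for doc in documents]
    # n indexed terms = every distinct term in the whole collection
    terms = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        max_count = max(counts.values()) if counts else 1
        # Assumed weighting: normalised term frequency; 0 if the term is absent
        matrix.append([counts[term] / max_count for term in terms])
    return terms, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors (0.0 for empty vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, documents):
    """Return (score, document index) pairs, best match first."""
    terms, matrix = build_term_document_matrix(documents)
    counts = Counter(query.lower().split())
    max_count = max(counts.values()) if counts else 1
    query_vector = [counts[term] / max_count for term in terms]
    scores = [(cosine_similarity(query_vector, row), index)
              for index, row in enumerate(matrix)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    collection = ["tomar el pelo", "el pelo largo", "dar en el clavo"]
    for score, index in rank_documents("tomar el pelo", collection):
        print(f"{score:.3f}  {collection[index]}")
```

Each row of the matrix is one document vector of n elements, and a query is turned into a vector over the same n terms so that it can be compared with every document.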
Finding the linguistic equivalents of phraseological units across two languages is not an easy task. We therefore consider it important to design and implement a tool able to detect variation in language, i.e. changes due to verb tense, number, gender, etc. The tool that we propose identifies the phraseological units of a textual corpus and looks for their equivalents in other languages. Its novelty is that it detects the units even when their form varies in the text.
The core of the automatic phraseological unit extraction system is a corpus-based algorithm that produces a list of all the units after a contrastive analysis against a dictionary of lexical patterns. The main advantage of this method, compared with others, is that it does not require highly specialized knowledge of phraseology.
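To make the idea of comparing a corpus against a dictionary of lexical patterns more concrete, here is a small Python sketch. It is an assumption-laden illustration, not the PhraseNET algorithm: the dictionary format, the "*" convention for components that may inflect, and the function names are all hypothetical.

```python
import re

# Hypothetical pattern dictionary: canonical unit -> list of components.
# A trailing "*" marks a component that may inflect (tense, number, gender).
LEXICAL_PATTERNS = {
    "tomar el pelo": ["tom*", "el", "pelo"],
    "meter la pata": ["met*", "la", "pata"],
}


def compile_pattern(components):
    """Turn a list of components into a case-insensitive regular expression."""
    parts = []
    for component in components:
        if component.endswith("*"):
            # Inflecting component: fixed stem followed by any word characters
            parts.append(re.escape(component[:-1]) + r"\w*")
        else:
            parts.append(re.escape(component))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def extract_units(text, patterns=LEXICAL_PATTERNS):
    """Return (canonical unit, matched text, offset) tuples in text order."""
    hits = []
    for canonical, components in patterns.items():
        for match in compile_pattern(components).finditer(text):
            hits.append((canonical, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    sample = "Me tomaron el pelo ayer y luego metí la pata."
    for canonical, found, offset in extract_units(sample):
        print(f"{offset:3d}  {found!r} -> {canonical!r}")
```

The offset returned with each hit is what allows a unit to be located in the corpus. A stem followed by a wildcard is a deliberate simplification: it over-matches unrelated words that share the stem and does not handle inserted or reordered components, which a real pattern dictionary would need to address.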
Nevertheless, this process entails some difficulties when adapted to the extraction of units from other languages, difficulties that are inherent to the methodology of information extraction (IE). As a consequence, PhraseNET is constantly evolving and we regularly implement new aspects.
The objectives of this study are threefold: first, to design a tool that detects phraseological units regardless of their linguistic expression; second, to detect the phraseological units in texts, with examples that identify their location in the corpus; and finally, to identify the same patterns in other languages.
Having designed the tool and described its parts and utilities, we conclude that PhraseNET can extract the following variations of phraseological units: morphological, syntactic, lexical, diatopic, diastratic and diaphasic variations; internal modifications (such as reducing a unit by eliminating components, or adding components); and external modifications, in the periphery of the unit. We are aware that this study could cover aspects we have not mentioned, but for the moment we have delimited the basic aspects of the tool in order to improve its characteristics in the future.
At present, all resources can be accessed from a desktop computer running Windows and connected to the Internet; even without a connection, most resources remain accessible. We also offer many resources remotely for those who want to work exclusively over the Internet, regardless of operating environment (Windows or Android), from a mobile phone or tablet.
Technology is increasingly present in everyday life, and people have migrated to the online world. Developing applications that reach the Internet from any device or location is no longer optional for these users: it has become a necessity, both in an increasingly competitive market and when seeking information.
In the last few years, Internet access from mobile devices such as smartphones and tablets has grown much faster than access from desktops or notebooks.
How can a website be developed to serve a huge range of different devices for Internet access? Responsive design is the answer: it presents a layout that adapts to the most diverse types of devices on the market today.
Responsive web design is an approach to website design intended to give users a good viewing experience on all types of device. The aim is for a site to look good not only on a desktop screen, but also on tablets and smartphones.
The application resources are based on Web pages, and some tools enable the user to save the information gathered from web pages to the database and retrieve it later.
According to http://gs.statcounter.com/, which compares worldwide Internet usage from 2009 to 2016, mobile and tablet devices accounted for 0% of Internet usage in 2009, while desktop computers accounted for 100%. Year after year, mobile-based web visits have increased while desktop web visits have decreased. In 2016, mobile-based web visits represented 51.5% and desktop web visits 48.7%.
If people visit the web for searches, messaging, business, social networks, marketplaces and e-commerce, why not also develop a website for academic purposes that is designed for Internet access from a mobile device?
With the above in mind, the third module, named the PhraseNET Database Toolbox for Mobiles, was built on Responsive Web Design and runs exclusively on the Web.
For apps that run online only, we will make available a corpus called CHADES, created in 1996, whose main focus is journalism and Spanish-American literature and which records the diversity of speech of the Americas. At present it contains about five million words, drawn from printed texts that had not yet been converted to electronic form (and were scanned) and from electronic texts found on the Internet. Most of the corpus comes from journalistic publications (70%). Books come second with 26%, and magazines occupy a modest third place at just 4%. CHADES includes notes on the information available: the names of the authors, the article title, page, paragraph, chapter and the date of publication.
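As an illustration of how such notes might be represented, the following Python sketch models one CHADES note as a record. It is a hypothetical data structure, not the actual CHADES format: the class name, field names and source-type labels are assumptions. The word-count estimate additionally assumes that the percentages apply to word counts and that the corpus holds exactly five million words, neither of which the description above states.

```python
from dataclasses import dataclass
from typing import Optional

# Shares of the corpus by source type, as given in the description (percent).
SOURCE_SHARES_PERCENT = {"newspaper": 70, "book": 26, "magazine": 4}


@dataclass
class CorpusNote:
    """One note attached to a text in the corpus (hypothetical layout)."""
    author: str
    title: str
    source_type: str          # assumed labels: "newspaper", "book", "magazine"
    published: str            # date of publication, kept as plain text
    page: Optional[int] = None
    paragraph: Optional[int] = None
    chapter: Optional[str] = None


def filter_by_source(notes, source_type):
    """Return the notes whose source type matches, in their original order."""
    return [note for note in notes if note.source_type == source_type]


def estimated_words(total_words=5_000_000):
    """Rough words per source type, ASSUMING the shares apply to word counts."""
    return {source: total_words * percent // 100
            for source, percent in SOURCE_SHARES_PERCENT.items()}


if __name__ == "__main__":
    # Placeholder records, not real corpus entries
    notes = [
        CorpusNote("Author A", "Article one", "newspaper", "1996", page=3),
        CorpusNote("Author B", "Novel one", "book", "1996", chapter="1"),
    ]
    print(filter_by_source(notes, "book"))
    print(estimated_words())
```

Keeping page, paragraph and chapter optional reflects that not every kind of source would carry all three.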