In ePol we build and maintain the Leipzig Corpus Miner (LCM), an infrastructure for supporting qualitative and quantitative Content Analysis (CA). This integrated application of different technologies was built by the NLP Group at the University of Leipzig. The infrastructure aims to integrate “close reading” procedures on individual documents with procedures of “distant reading”, e.g. the analysis of lexical characteristics of large document collections. To this end, information retrieval systems, lexicometric statistics and machine learning procedures are combined in a coherent framework which enables qualitative data analysts to make use of state-of-the-art Natural Language Processing (NLP) techniques on very large document collections. Applicability of the framework ranges from social sciences to media studies and market research.
The LCM is an infrastructure rather than a complete software package. It combines different technologies to provide a qualitative data analysis environment, accessible through an interface targeted at domain experts unfamiliar with NLP. Analysts are thus put in a position to work on their data with a methodical rather than a technical understanding of the algorithms. The technologies behind the user interface need to support analysts in tasks such as data storage, retrieval, processing and presentation. We integrate technologies such as UIMA, SOLR, MongoDB and Glassfish to create a distributed multi-tier environment capable of processing and storing the 3.5 million text documents of our research corpus.
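The interplay of “distant reading” and “close reading” described above can be illustrated with a small sketch. It is not taken from the LCM code base: the toy corpus, the function names and the tokenization rule are all assumptions made for illustration, and plain Python stands in for the retrieval and processing components named above. Term frequencies over the whole collection represent the distant view, while a keyword-in-context listing leads the analyst back to individual passages.

```python
import re
from collections import Counter

# Hypothetical toy corpus; the research corpus described above is far larger.
DOCS = {
    "doc1": "The parliament debated the new budget. The budget was approved.",
    "doc2": "Critics argued that the budget ignores social policy.",
    "doc3": "Social policy was the main topic of the debate.",
}


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Distant reading: count every term over the whole collection."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts


def keyword_in_context(docs, keyword, window=3):
    """Close reading: list each occurrence of a keyword with its neighbours."""
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((doc_id, left, token, right))
    return hits


if __name__ == "__main__":
    # Collection-level view first, then the individual passages behind a term.
    print(term_frequencies(DOCS).most_common(5))
    for hit in keyword_in_context(DOCS, "budget"):
        print(hit)
```

In a setting like the one described here, the counting step would presumably run against a search index and the context step against stored documents, but the division of labour between the two views stays the same.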
We also address the application of such methods by using R and its Text Mining capabilities. This open source platform allows for rapid prototyping of ideas and for immediate discussion of results and methodology. We have also used R to train and educate scholars in Text Mining methods for the social sciences.
For further information on the LCM, please contact the NLP Group at the University of Leipzig.
- Gregor Wiedemann – [gregor.wiedemann] [at] [uni-leipzig.de]
- Andreas Niekler – [aniekler] [at] [informatik.uni-leipzig.de]
Andreas Niekler, Gregor Wiedemann and Gerhard Heyer: Leipzig Corpus Miner – A Text Mining Infrastructure for Qualitative Data Analysis. In: Terminology and Knowledge Engineering 2014 (TKE 2014), Berlin, 2014.