The Linguistic Engineering Group
The Linguistic Engineering (LE) Group is part of the Department of Artificial
Intelligence at the Institute of Computer Science, Polish Academy of Sciences
(ICS PAS).
People
| Leonard Bolc, PhD (Professor Emeritus) | (Leonard.Bolc@IPIPAN.Waw.PL) |
| Elżbieta Hajnicz, PhD | (Elzbieta.Hajnicz@IPIPAN.Waw.PL) |
| Anna Kupść, PhD (on leave) | (Anna.Kupsc@IPIPAN.Waw.PL) |
| Małgorzata Marciniak, PhD | (Malgorzata.Marciniak@IPIPAN.Waw.PL) |
| Agnieszka Mykowiecka, PhD | (Agnieszka.Mykowiecka@IPIPAN.Waw.PL) |
| Jakub Piskorski, PhD | (Jakub.Piskorski@ipipan.waw.pl) |
| Adam Przepiórkowski, PhD, Head of the Group | (Adam.Przepiorkowski@IPIPAN.Waw.PL) |
| Agata Savary, PhD, Visiting Scholar (2009-2010) | (agata.savary@univ-tours.fr) |
| Tomek Strzałkowski, PhD, Foreign Associate | (tomek@cs.albany.edu) |
| Stan Szpakowicz, PhD, Foreign Associate | (szpak@site.uottawa.ca) |
| Aleksander Wawer, MSc (part time) | (Aleksander.Wawer@IPIPAN.Waw.PL) |
| Marcin Woliński, PhD | (Marcin.Wolinski@IPIPAN.Waw.PL) |
| Alina Wróblewska, MSc | (Alina.Wroblewska@IPIPAN.Waw.PL) |
Research
The main research areas of the Group are:
- (Polish) corpus linguistics; cf. the IPI PAN Corpus of Polish and the National Corpus of Polish;
- syntactic and semantic parsing of Polish; cf. Spejd and Świgra;
- extraction of linguistic knowledge from corpora;
- information extraction;
- sentiment analysis;
- morphosyntactic system of Polish;
- generative linguistic formalisms, esp. HPSG and LFG.
The Group is a member of CLARIN
and FLaReNet.
Current externally funded projects:
- Adaptive problem-solving support system based on text-mining
techniques ‒ a national
Ministry of Science and
Higher Education Innovative Economy Operational Programme
(PO IG) grant, 1 January 2010 ‒ 31 December 2013. Polish title:
Adaptacyjny system wspomagający rozwiązywanie
problemów w oparciu o analizę treści dostępnych źródeł
elektronicznych. PI: Jacek Koronacki.
- Construction of a treebank for Polish using automatic
syntactic analysis ‒ a national
Ministry of Science and
Higher Education research grant (number N N104 224735), 14 October 2008 ‒
13 April 2011. Polish title: Budowa banku drzew składniowych dla języka polskiego
z wykorzystaniem automatycznej analizy składniowej. PI: Marcin
Woliński.
- CLARIN (Common Language Resources and Technology Infrastructure) ‒ a European (ESFRI) infrastructure
project, 1 January 2008 ‒ 31 December 2010. PI at ICS PAS: Adam
Przepiórkowski.
- NKJP (National Corpus of Polish) ‒ a national
Ministry of Science and
Higher Education research/development grant (number R17 003 03), 13 December 2007 ‒
12 December 2010. Polish title: Narodowy Korpus Języka Polskiego. PI: Adam
Przepiórkowski.
Some of our past projects:
- Automatic detection of semantic dependencies within verb
argument structures in large treebanks ‒ a national Ministry of Science and
Higher Education habilitation grant (number
N N516 0165 33), 2 November 2007
‒ 1 November 2009. Polish
title: Automatyczne wykrywanie zależności semantycznych
w strukturze argumentowej czasowników w dużych korpusach
tekstów anotowanych syntaktycznie. PI: Elżbieta Hajnicz.
- LUNA (spoken Language UNderstanding in multilinguAl
communication systems) ‒ a European (IST) Specific Targeted Research
Project (contract number 033549), 4 September 2006 ‒ 3 September
2009. Polish PI: Agnieszka Mykowiecka.
- Spoken language understanding in multilingual communication
systems ‒ Ministry of Science and
Higher Education support for Polish participation in the LUNA project, 1 March 2008 ‒
1 September 2009. Polish title:
Rozumienie mowy w wielojęzycznych
systemach komunikacji. PI: Małgorzata Marciniak.
- LT4eL (Language Technology for
eLearning) ‒ a European (IST) Specific Targeted Research
Project (contract number 027391), 1 December 2005 ‒ 31 May 2008.
Polish PI: Adam Przepiórkowski.
- Automatic extraction of linguistic knowledge from a large corpus
of Polish ‒ a national Ministry of Science and
Higher Education research grant (number 3T11C00328), 9 March
2005 ‒ 8 March 2008. Polish title: Automatyczna ekstrakcja wiedzy lingwistycznej z dużego korpusu języka polskiego.
PI: Adam Przepiórkowski. TaKIPI, the first publicly available tagger of
Polish, was originally developed within this project.
- Information Extraction from Polish free text ‒ a national Ministry of Science and
Higher Education research grant (number 3T11C00727), 20 October 2004 ‒ 19 October
2007. Polish title: Opracowanie narzędzi do ekstrakcji
informacji z tekstów w języku polskim. PI: Agnieszka
Mykowiecka.
- The IPI PAN Corpus of Polish
‒ a national KBN grant
(7T11C04320), 1 April 2001 ‒ 31 March 2004. Polish title:
Anotowany korpus pisanego języka polskiego z dostępem przez internet (z uwzględnieniem zastosowań w inżynierii lingwistycznej). PI: Adam Przepiórkowski.
- A Treebank / Test-Suite of Polish
Utterances ‒ an EU CRIT-2
subproject (ICS-MM), 15 October 1997 ‒ 14 October 2000.
Coordinator: Leonard Bolc.
- An HPSG Grammar of Polish (theory and implementation) ‒ a national KBN grant (8T11C01110), 1 January
1996 ‒ 31 December 1998. Polish title:
Zastosowanie metod inżynierii lingwistycznej do
automatycznej analizy i syntezy tekstów języka
polskiego. PI: Leonard Bolc.
Publicly available tools and resources
Here are some of the tools and resources created within our projects.
Tools (all open source, under GPL):
- Świgra -- a DCG parser,
- Spejd -- a shallow parsing and disambiguation system,
- TaKIPI -- a morphosyntactic tagger for Polish,
- Poliqarp -- a corpus indexing and search engine,
- Dendrarium -- a treebank development system (under development),
- Anotatornia -- a system for multi-level manual annotation of corpora (forthcoming),
- WSDDE -- a system for designing and performing Word Sense Disambiguation experiments (forthcoming),
- etc.
Resources:
Other activities
Links to some other activities of the Group:
Creation Date: Monday, June 19, 1995
Last Modified: Sat Jan 23 12:52:56 CET 2010
Maintained by AP.