Predicting the level of text standardness in user-generated content [Electronic resource]
Ljubešić, Nikola, 1979- ...

Non-standard language as it appears in user-generated content has recently attracted much attention. This paper proposes that non-standardness comes in two basic varieties, technical and ... linguistic, and develops a machine-learning method to discriminate between standard and non-standard texts in these two dimensions. We describe the manual annotation of a dataset of Slovene user-generated content and the features used to build our regression models. We evaluate and discuss the results, where the mean absolute error of the best performing method on a three-point scale is 0.38 for technical and 0.42 for linguistic standardness prediction. Even when using no language-dependent information sources, our predictor still outperforms an OOV-ratio baseline by a wide margin. In addition, we show that very little manually annotated training data is required to perform good prediction. Predicting standardness can help decide when to attempt to normalise the data to achieve better annotation results with standard tools, and provide linguists who are interested in non-standard language with a simple way of selecting only such texts for their research.

Source: Proceedings [Electronic resource] (pp. 371-378)
Type of material - conference contribution ; adult, serious
Publish date - 2015
Language - English
COBISS.SI-ID - 58338402
Author
Ljubešić, Nikola, 1979- |
Fišer, Darja, 1978- |
Erjavec, Tomaž, 1960- |
Čibej, Jaka |
Marko, Dafne |
Pollak, Senja, 1980- |
Škrjanec, Iza
Topics
nestandardni jezik |
spletne uporabniške vsebine |
korpusi |
avtomatska mera jezikovne standardnosti |
nadzorovano strojno učenje |
non-standard language |
user-generated content |
corpora |
automatic language standardness measure |
supervised machine learning
| Links to authors' personal bibliographies | Links to information on researchers in the SICRIS system |
|---|---|
| Ljubešić, Nikola, 1979- | 36871 |
| Fišer, Darja, 1978- | 26294 |
| Erjavec, Tomaž, 1960- | 05023 |
| Čibej, Jaka | 36914 |
| Marko, Dafne | |
| Pollak, Senja, 1980- | 31844 |
| Škrjanec, Iza | |
