Large vocabulary continuous speech recognition of an inflected language using stems and endings

Rotovnik, Tomaž, telekomunikacije ; Sepesy Maučec, Mirjam ; Kačič, Zdravko

In this article, we focus on creating a large vocabulary speech recognition system for the Slovenian language. Currently, state-of-the-art recognition systems are able to use vocabularies with sizes ... of 20,000 to 100,000 words. These systems have mostly been developed for English, which belongs to the group of non-inflectional languages. Slovenian, as a Slavic language, belongs to the group of inflectional languages. Its rich morphology presents a major problem in large vocabulary speech recognition. Compared to English, the Slovenian language requires a vocabulary approximately 10 times larger for the same degree of text coverage. Consequently, the difference in vocabulary size causes a high rate of out-of-vocabulary (OOV) words, and OOV words have a direct impact on recognizer efficiency. The characteristics of inflectional languages were taken into account in developing a new search algorithm, which includes a method for restricting the order of sub-word units and uses separate language models based on sub-words. This search algorithm combines the properties of sub-word-based models (reduced OOV) and word-based models (the length of context). The algorithm also enables better search-space limitation for sub-word models. Using sub-word models, we increase recognizer accuracy and achieve a search space comparable to that of a standard word-based recognizer. Our methods were evaluated in experiments on the SNABI speech database.
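
The two ideas named in the abstract, lower OOV rates with stem and ending units and a restriction on the order of those units, can be illustrated with a small Python sketch. It is a toy under stated assumptions and not the article's search algorithm: the ending set, the splitting rule, the `+` marker for endings, the word lists, and all function names are hypothetical and chosen only for illustration.

```python
# Toy illustration only: the ending set, the splitting rule and the word
# lists are hypothetical and are not taken from the article.

ENDINGS = ("a", "e", "i", "o")  # assumed toy set of inflectional endings


def split_word(word):
    """Split a word into (stem, ending); the ending is "" if none matches."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if len(word) > len(ending) and word.endswith(ending):
            return word[: -len(ending)], ending
    return word, ""


def to_units(words):
    """Rewrite words as sub-word units; a leading '+' marks an ending."""
    units = []
    for word in words:
        stem, ending = split_word(word)
        units.append(stem)
        if ending:
            units.append("+" + ending)
    return units


def oov_rate(tokens, vocabulary):
    """Fraction of tokens that are not covered by the vocabulary."""
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def valid_order(units):
    """Accept a sequence only if every ending directly follows a stem."""
    previous_is_stem = False
    for unit in units:
        is_ending = unit.startswith("+")
        if is_ending and not previous_is_stem:
            return False
        previous_is_stem = not is_ending
    return True


# Inflected forms of "miza" (table) and "knjiga" (book).
train = ["miza", "mize", "knjiga", "knjigo"]
test = ["mizi", "mizo", "knjige", "knjiga"]

word_vocab = set(train)
unit_vocab = set(to_units(train))

word_oov = oov_rate(test, word_vocab)            # 3 of 4 words unseen
unit_oov = oov_rate(to_units(test), unit_vocab)  # 1 of 8 units unseen

print(f"word-level OOV rate: {word_oov:.3f}")
print(f"unit-level OOV rate: {unit_oov:.3f}")
print(valid_order(["miz", "+a"]), valid_order(["+a", "miz"]))
```

In this toy data the test set contains inflected forms that the word vocabulary has never seen, while their stems and most endings are already covered, which is why the unit-level rate is lower. The `valid_order` check stands in for the order restriction only in the simplest sense: it rejects sequences in which an ending does not directly follow a stem.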

Source: Speech Communication. - ISSN 0167-6393 (Vol. 49, iss. 6, 2007, pp. 437-452)

Type of material - article, component part

Publish date - 2007

Language - English

COBISS.SI-ID - 11385366

Links to authors' personal bibliographies	Links to information on researchers in the SICRIS system
Rotovnik, Tomaž, telekomunikacije	21304
Sepesy Maučec, Mirjam	18168
Kačič, Zdravko	06821

Source: Personal bibliographies and: SICRIS
