An Entity of Type: disease, from Named Graph: http://dbpedia.org, within Data Space: dbpedia.org

Automatic indexing is the computerized process of scanning large volumes of documents against a controlled vocabulary, taxonomy, thesaurus, or ontology and using those controlled terms to index large electronic document depositories quickly and effectively. The terms are applied by training a system on the rules that determine which words to match. The rules can also take into account syntax, usage, proximity, and other algorithms, depending on the system and on what the indexing requires. They are expressed as Boolean statements that capture the indexing information from the text. As the number of documents increases exponentially with the proliferation of the Internet, automatic indexing will become essential to maintaining the ability to find relevant information in a sea of irrelevant information.
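
The rule-based matching described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical controlled vocabulary in which each index term carries a Boolean rule: a list of alternatives (OR), each a list of words that must all occur (AND) within a fixed token window (proximity). The names `VOCABULARY`, `tokenize`, `within_window`, and `index_document`, the sample terms, and the window size are illustrative assumptions and do not come from any particular indexing system.

```python
import re

# Hypothetical controlled vocabulary (illustrative only).
# term -> list of alternatives (OR); each alternative is a list of
# words that must all appear (AND) close together (proximity).
VOCABULARY = {
    "Information retrieval": [["information", "retrieval"], ["document", "search"]],
    "Thesauri": [["thesaurus"], ["controlled", "vocabulary"]],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def within_window(tokens, words, window):
    """Return True if all `words` occur inside some run of `window` tokens."""
    for start in range(len(tokens)):
        span = set(tokens[start:start + window])
        if all(word in span for word in words):
            return True
    return False


def index_document(text, vocabulary=VOCABULARY, window=5):
    """Return the sorted controlled terms whose Boolean rule matches the text."""
    tokens = tokenize(text)
    return sorted(
        term
        for term, alternatives in vocabulary.items()
        if any(within_window(tokens, words, window) for words in alternatives)
    )


if __name__ == "__main__":
    print(index_document("The retrieval of information from a thesaurus."))
```

A real system would typically add stemming, phrase handling, and negation (NOT) rules; this sketch only shows how AND, OR, and proximity conditions can combine to assign controlled terms.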

Property Value
dbo:abstract
  • Automatic indexing is the automatic process of reducing a text by means of a computer program in order to obtain relevant terms that aptly characterize its content. The selected terms may be left in natural language, or they may be assigned to the headings of a controlled vocabulary of a retrieval language. The results selected in this way partially overlap with the output produced by human indexers, but the portion of results that does not match is the subject of further research. That research should try to find out how to reconcile these approaches and help make automatic indexing as precise as the work of human indexers. However, this is not a single technology. A range of other methods is used: a combination of indexing algorithms, statistical measures, linguistic analyses, and so on. Algorithms can also be set up so that they examine not only the full text but also the structure of the document, such as headings, headers, and paragraphs. Automatic indexing requires high computing power (cs)
  • Automatic indexing is the computerized process of scanning large volumes of documents against a controlled vocabulary, taxonomy, thesaurus, or ontology and using those controlled terms to index large electronic document depositories quickly and effectively. The terms are applied by training a system on the rules that determine which words to match. The rules can also take into account syntax, usage, proximity, and other algorithms, depending on the system and on what the indexing requires. They are expressed as Boolean statements that capture the indexing information from the text. As the number of documents increases exponentially with the proliferation of the Internet, automatic indexing will become essential to maintaining the ability to find relevant information in a sea of irrelevant information. Natural language systems are trained using seven different methods to help with this sea of irrelevant information: morphological, lexical, syntactic, numerical, phraseological, semantic, and pragmatic. Each of these looks at different parts of speech and terms to build a domain for the specific information being indexed. This domain is then used in the automated indexing process. The automated process can encounter problems, which are primarily caused by two factors: 1) the complexity of the language; and 2) the lack of intuitiveness, and the difficulty of extrapolating concepts from statements, on the part of the computing technology. These are primarily linguistic challenges, and the specific problems involve semantic and syntactic aspects of language. These problems occur in relation to defined keywords. With these keywords, the accuracy of the system can be measured in terms of Hits, Misses, and Noise: respectively, exact matches, keywords that the computerized system missed but a human would not have, and keywords that the computer selected but a human would not have.
The accuracy statistic should be above 85% Hits, with human indexing taken as 100%. This leaves Misses and Noise combined at 15% or less. This scale provides a basis for judging what counts as a good automatic indexing system and shows where problems are being encountered. (en)
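  • Automatic indexing comprises two types: automatic keyword extraction (also called automatic extraction indexing) and automatic [...]. Automatic keyword extraction is an automated technique for identifying meaningful and representative fragments or words. In text mining it is called keyword extraction; in computational linguistics the focus is usually on automatic term recognition; and in information retrieval it is simply called automatic indexing. Automatic indexing falls within the scope of text information extraction, which extracts the specific information people are interested in from text data. (zh)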
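
The Hits, Misses, and Noise measures described in the English abstract can be sketched as simple set operations between the keywords a human indexer assigned and the keywords the system assigned. The abstract does not state the denominator, so this sketch assumes that each percentage is taken over the combined total of Hits, Misses, and Noise, which makes the three shares sum to 100% and matches the statement that 85% Hits leaves 15% or less for Misses and Noise. The function names and the sample keywords are hypothetical.

```python
def score_indexing(human_terms, system_terms):
    """Compare system keywords with human keywords for one document.

    Assumption: each share is computed over hits + misses + noise,
    so the three values sum to 1.0.
    """
    human, system = set(human_terms), set(system_terms)
    hits = human & system      # exact matches
    misses = human - system    # chosen by the human, missed by the system
    noise = system - human     # chosen by the system, not by the human
    total = len(hits) + len(misses) + len(noise)
    if total == 0:
        raise ValueError("no keywords to compare")
    return {
        "hits": len(hits) / total,
        "misses": len(misses) / total,
        "noise": len(noise) / total,
    }


def meets_threshold(scores, min_hits=0.85):
    """True if Hits reach the 85% level, leaving 15% or less for the rest."""
    return scores["hits"] >= min_hits


if __name__ == "__main__":
    human = ["indexing", "thesaurus", "ontology", "taxonomy"]
    system = ["indexing", "thesaurus", "ontology", "internet"]
    scores = score_indexing(human, system)
    print(scores, meets_threshold(scores))
```

In practice the scores would be aggregated over a test collection of documents rather than computed for a single one.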
dbo:wikiPageID
  • 28204669 (xsd:integer)
dbo:wikiPageLength
  • 9105 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID
  • 1107344169 (xsd:integer)
dbo:wikiPageWikiLink
dbp:wikiPageUsesTemplate
dcterms:subject
gold:hypernym
rdf:type
rdfs:comment
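  • Automatic indexing comprises two types: automatic keyword extraction (also called automatic extraction indexing) and automatic [...]. Automatic keyword extraction is an automated technique for identifying meaningful and representative fragments or words. In text mining it is called keyword extraction; in computational linguistics the focus is usually on automatic term recognition; and in information retrieval it is simply called automatic indexing. Automatic indexing falls within the scope of text information extraction, which extracts the specific information people are interested in from text data. (zh)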
  • Automatic indexing is the automatic process of reducing a text by means of a computer program in order to obtain relevant terms that aptly characterize its content. The selected terms may be left in natural language, or they may be assigned to the headings of a controlled vocabulary of a retrieval language. The results selected in this way partially overlap with the output produced by human indexers, but the portion of results that does not match is the subject of further research. That research should try to find out how to reconcile these approaches and help make automatic indexing as precise as the work of human indexers. (cs)
  • Automatic indexing is the computerized process of scanning large volumes of documents against a controlled vocabulary, taxonomy, thesaurus, or ontology and using those controlled terms to index large electronic document depositories quickly and effectively. The terms are applied by training a system on the rules that determine which words to match. The rules can also take into account syntax, usage, proximity, and other algorithms, depending on the system and on what the indexing requires. They are expressed as Boolean statements that capture the indexing information from the text. As the number of documents increases exponentially with the proliferation of the Internet, automatic indexing will become essential to maintaining the ability to find relevant information in a sea of irrelevant information. (en)
rdfs:label
  • Automatická indexace (cs)
  • Automatic indexing (en)
  • 自动标引 (zh)
owl:sameAs
prov:wasDerivedFrom
foaf:isPrimaryTopicOf
is dbo:wikiPageWikiLink of
is foaf:primaryTopic of
This content was extracted from Wikipedia and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License