About: Tf–idf

An entity of type yago:WikicatRankingFunctions, within data space dbpedia.org, associated with source document(s)

http://dbpedia.org/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FTf%E2%80%93idf

In information retrieval, tf–idf (also TF*IDF, TFIDF, TF–IDF, or Tf–idf), short for term frequency–inverse document frequency, is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval searches, text mining, and user modeling. The tf–idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general. tf–idf is one of the most popular term-weighting schemes today. A survey conducted in 2015 showed that 83% of text-based recommender systems in digital libraries use tf–idf.
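
The weighting described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes one common variant (term count divided by document length for tf, and the natural log of N / df for idf), and the function name `tf_idf` and the toy corpus are hypothetical. Other variants of both factors exist.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical), already tokenized into lowercase words.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tf_idf(corpus)
```

In this toy corpus "the" occurs in every document, so its weight is 0 everywhere, while "mat", which occurs in only one document, receives the highest weight in the first document.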

Attributes	Values
rdf:type	yago:Abbreviation107091587 yago:Abstraction100002137 yago:Act100030358 yago:Activity100407535 yago:Algorithm105847438 yago:Code106355894 yago:CodingSystem106353757 yago:Communication100033020 yago:Event100029378 yago:Form106290637 yago:Function113783816 yago:LanguageUnit106284225 yago:MathematicalRelation113783581 yago:Part113809207 yago:Procedure101023820 yago:Program106568978 yago:PsychologicalFeature100023100 yago:Relation100031921 yago:WikicatFunctionsAndMappings yago:WikicatInternetSearchEngines yago:Word106286395 yago:Writing106359877 yago:WrittenCommunication106349220 yago:YagoPermanentlyLocatedEntity yago:Rule105846932 yago:SearchEngine106578654 yago:Software106566077 yago:WikicatAbbreviations yago:WikicatAlgorithmsOnStrings yago:WikicatRankingFunctions
rdfs:label	تي اف-اي دي دف (ar) Tf-idf (ca) Tf-idf (cs) Tf-idf-Maß (de) Tf-idf (es) Tf–idf (eu) Tf–idf (in) Tf-idf (it) TF-IDF (fr) Tf-idf (ja) Tf-idf (ko) TFIDF (pl) Tf–idf (pt) Tf–idf (en) TF-IDF (ru) TF-IDF (uk) Tf-idf (zh)
rdfs:comment	Tf-idf is a method for scoring relevance in text search. The name combines the abbreviations of two terms: * Term frequency: the frequency of a word in a document * Inverse document frequency: the inverse frequency of the word across all documents (cs)
The tf-idf weight (term frequency-inverse document frequency, TF-IDF) is a weight often used in information retrieval. It is a statistical measure used to evaluate how important a word is to a given document in a text corpus. The importance increases proportionally with the number of times the word or term appears in the document but is offset by the frequency of the word in the corpus as a whole. Variants of the weight are often used by search engines as a central tool for scoring and ranking documents by relevance to a user's query. One of the simplest ranking functions is computed by summing the weight for each query term; many more sophisticated ranking functions are variants of this simple model. (ar)
The tf-idf measure (from English term frequency and inverse document frequency) is a statistical measure used in information retrieval to assess the relevance of terms in the documents of a document collection. With the weight computed in this way for a word with respect to the document that contains it, documents returned as hits of a word-based search can be ordered better in the result list than would be possible, for example, with term frequency alone. (de)
TF-IDF (from English term frequency-inverse document frequency) is a weighting method often used in information retrieval and in particular in text mining. This statistical measure evaluates the importance of a term contained in a document relative to a collection or corpus. The weight increases proportionally to the number of occurrences of the word in the document. It also varies with the frequency of the word in the corpus. Variants of the original formula are often used in search engines to assess the relevance of a document to the user's search criteria. (fr)
In information retrieval, tf–idf, TF*IDF, or TFIDF (short for English term frequency–inverse document frequency; Indonesian: frekuensi istilah–inversi frekuensi dokumen) is a statistical measure that describes the importance of a term to a document in a collection or corpus. This measure is often used as a weighting factor in information retrieval searches, text mining, and user modeling. The tf–idf value increases in proportion to the number of occurrences of the term in the document and depends on the number of documents in the corpus that contain the term. (in)
The tf-idf weighting function (term frequency–inverse document frequency) is a function used in information retrieval to measure the importance of a term with respect to a document or a collection of documents. The function increases proportionally to the number of times the term occurs in the document, but grows in inverse proportion to the frequency of the term in the collection. The idea behind this behavior is to give more importance to terms that appear in the document but are infrequent in general. (it)
TF-IDF (from English TF, term frequency, and IDF, inverse document frequency) is a statistical measure used to evaluate the importance of a word in the context of a document that is part of a document collection or corpus. The weight of a word is proportional to the frequency of its use in the document and inversely proportional to the frequency of its use across all documents in the collection. The TF-IDF measure is often used in text analysis and information retrieval tasks, for example as one of the criteria for the relevance of a document to a search query, and when computing document similarity for clustering. (ru)
tf-idf (English: term frequency–inverse document frequency) is a weighting technique commonly used in information retrieval and text mining. tf-idf is a statistical method for evaluating how important a word is to one document in a document collection or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency in the corpus. Various forms of tf-idf weighting are often applied by search engines as a measure or rating of the relevance between a document and a user query. Besides tf-idf, Internet search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results. (zh)
Tf-idf (from English term frequency – inverse document frequency) is a term used in quantitative text analysis; it is the frequency of occurrence of a term in a particular document relative to the term's presence in the whole set of documents analyzed. It is a numerical measure that expresses how relevant a word is in a document of a collection, and therefore identifies the most characteristic words of a document. This measure is often used as a weighting factor in information retrieval. The tf-idf value increases with the number of times a word appears in one document and not in others, which makes it possible to find out which words are more common in this text than in the others. (ca)
Tf-idf (from English term frequency – inverse document frequency), term frequency – inverse document frequency (that is, the frequency of occurrence of the term in the document collection), is a numerical measure that expresses how relevant a word is to a document in a collection. This measure is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the document collection, which accounts for the fact that some words are generally more common than others. (es)
In information retrieval, tf–idf or TFIDF, term frequency–inverse document frequency, is a numerical statistic intended to express how important a word is in a document collection or corpus. It is often used as a weighting factor in information retrieval searches, text mining, and user modeling. The tf-idf value grows proportionally to the number of times a word appears in the document and is balanced by the word's frequency in the corpus, which reflects that some words appear more often in general. Today tf-idf is one of the best-known term-weighting schemes; in digital libraries, 83% of text-based recommender systems use tf-i (eu)
In information retrieval, tf–idf (also TF*IDF, TFIDF, TF–IDF, or Tf–idf), short for term frequency–inverse document frequency, is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval searches, text mining, and user modeling. The tf–idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general. tf–idf is one of the most popular term-weighting schemes today. A survey conducted in 2015 showed that 83% of text-based recommender systems in digital libraries use tf–idf. (en)
In the field of information retrieval, tf–idf (or TF*IDF, TFIDF, TF–IDF, Tf–idf) is short for term frequency–inverse document frequency and is a statistic (a numerical value) intended to reflect how important a word is in a corpus or collection of documents. tf-idf is also often used as a weighting factor in information retrieval, text mining, and user modeling. The tf-idf value of a word increases in proportion to the number of times the word appears in the document, and this increase is offset by the number of documents in the corpus that contain the word. This property helps to adjust for the fact that some words generally appear more often. Today tf-idf is the best-known term-weighting method. A study conducted in 2015 found that 83% of text-based recommender systems in digital libraries used tf-idf. Variants of the tf-idf weighting scheme are often used by search engines as a central tool for scoring and ranking a document's relevance to a user's query (search words). tf-idf can also work well for stop-word filtering in various fields such as automatic summarization and document classification. (ja)
TF-IDF (term frequency - inverse document frequency) is a weight used in information retrieval and text mining; given a collection of several documents, it is a statistical value indicating how important a word is within a particular document. It can be used to extract a document's keywords, to determine the ranking of results in a search engine, or to measure the similarity between documents. TF (term frequency) indicates how often a particular word appears in a document; the higher this value, the more important the word can be considered in the document. However, if the word itself is used frequently across the document collection, that means the word is common. This is called DF (document frequency), and the inverse of this value is called IDF (inverse document frequency). TF-IDF is the product of TF and IDF. (ko)
TFIDF (TF: term frequency, IDF: inverse document frequency), term-frequency weighting with inverse document frequency, is one of the methods for computing the weight of words based on the number of their occurrences, belonging to the group of algorithms that compute […]. Each document is represented by a vector consisting of the weights of the words occurring in that document. TFIDF reports the frequency of occurrence of terms while appropriately balancing the local significance of a term and its significance in the context of the full document collection. The TF-IDF value is computed from the formula: […] where: […] (pl)
The tf–idf value (short for English term frequency–inverse document frequency) is a statistical measure intended to indicate the importance of a word in a document relative to a collection of documents or a linguistic corpus. It is frequently used as a weighting factor in information retrieval and data mining. (pt)
TF-IDF (from English TF, term frequency, and IDF, inverse document frequency) is a statistical indicator used to evaluate the importance of words in the context of a document that is part of a document collection or corpus. The weight (significance) of a word is proportional to the number of uses of the word in the document and inversely proportional to the frequency of its use in the other documents of the collection. The simplest ranking function can be defined as the sum of the TF-IDF of each term in the query. Most advanced ranking functions are based on this simple model. (uk)
dcterms:subject	Statistical natural language processing; Ranking functions; Vector space model
Wikipage page ID	2057290 (xsd:integer)
Wikipage revision ID	1123031029 (xsd:integer)
Link from a Wikipage to another Wikipage	Probability distribution; Proportionality (mathematics); Scikit-learn; Noun phrase; Boolean data type; Information retrieval; Conditional entropy; McGraw-Hill; Okapi BM25; Gensim; Mutual information; Logarithmic scale; Zipf's law; Statistical natural language processing; Ranking functions; Document; Latent Dirichlet allocation; Latent semantic analysis; PageRank; Text corpus; Search engine; Probability theory; Relevance (information retrieval); Hans Peter Luhn; Apache Lucene; Karen Spärck Jones; Heuristic; Word count; Automatic summarization; Frequency (statistics)
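
The Arabic and Ukrainian rdfs:comment values above describe the simplest ranking function as the sum of the tf–idf weights of each query term. A rough sketch of that idea follows, assuming one common weighting variant (tf = count / document length, idf = ln(N / df)); the names `rank` and `docs` and the toy data are hypothetical, and real search engines typically use more refined functions.

```python
import math
from collections import Counter


def rank(query, docs):
    """Order document indices by summed tf-idf of the query terms.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        score = sum(
            (counts[term] / len(doc)) * math.log(n_docs / df[term])
            for term in query
            if counts[term]  # terms absent from the document add nothing
        )
        scores.append(score)
    # Highest score first; ties keep the original document order.
    order = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy documents (hypothetical), already tokenized into lowercase words.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
order, scores = rank(["cat", "mat"], docs)
```

For the query "cat mat", the first document matches both terms and ranks first, the third matches only "cat", and the second matches neither and scores 0.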
