About: Heritrix

Property	Value
dbo:abstract	Heritrix is an Internet Archive crawler designed specifically for web archiving. It is open source and written in Java. The main interface is accessed using a web browser, and there is a command-line tool that can optionally be used to start crawls. Heritrix was developed jointly by the Internet Archive and the Nordic national libraries, based on specifications written in early 2003. It was officially released in early January 2004 and has been continually improved by Internet Archive employees and other interested parties. (ar) Heritrix is a web crawler designed for web archiving. It was written by the Internet Archive. It is available under a free software license and written in Java. The main interface is accessible using a web browser, and there is a command-line tool that can optionally be used to initiate crawls. Heritrix was developed jointly by the Internet Archive and the Nordic national libraries, based on specifications written in early 2003. The first official release was in January 2004, and it has been continually improved by employees of the Internet Archive and other interested parties. For many years Heritrix was not the main crawler used to crawl content for the Internet Archive's web collection. The largest contributor to the collection, as of 2011, is Alexa Internet. Alexa crawls the web for its own purposes, using a crawler named ia_archiver, and then donates the material to the Internet Archive. The Internet Archive itself did some crawling of its own with Heritrix, but only on a smaller scale. Starting in 2008, the Internet Archive began improving performance so that it could do its own wide-scale crawling, and it now collects most of its content itself. (en) Heritrix is a crawler that fetches web files across the Internet. It is open source and written entirely in Java. Its configuration interface is accessible from a web browser, which makes it very versatile and convenient to use, although it can also be launched from the command line. Heritrix was developed jointly by the Internet Archive and the "Nordic National Libraries" in early 2003. The first version was released in January 2004, and it has been continually updated by members of the Internet Archive and third parties. (es) Heritrix is a web crawler designed and used by the Internet Archive for web archiving. It is free software written in Java. Its main interface is accessible from a web browser, but a command-line tool can also optionally be used to launch crawls. Heritrix was developed jointly by the Internet Archive and the Nordic National Libraries in 2003. Its first official release was in January 2004, and it has since been continually improved by members of the Internet Archive and interested third parties. (fr) Heritrix is a web crawler for web archiving developed by the Internet Archive. It is implemented in Java and can be used freely under a free software license. It is operated mainly through a web browser, but a command-line tool can also be used for tasks such as starting a crawl. The name comes from an archaic word for "heiress", meaning a female inheritor. Heritrix was developed jointly by the Internet Archive and the Nordic National Libraries, based on specifications compiled in 2003. The first release was in January 2004, and it has since been continually improved by Internet Archive employees and outside parties interested in web archiving. However, Heritrix came to be used for the Internet Archive's own web collection only considerably later. Formerly, most of the archive was supplied by Alexa Internet, which crawls the web for its own business using its own crawler, ia_archiver, and donates the collected data to the Internet Archive. At first the Internet Archive also did some collecting of its own with Heritrix, but only on a small scale. Starting in 2008, the Internet Archive improved the performance of its own web-wide crawling, and material it collects itself now makes up the majority of the archive. (ja)
dbo:genre	dbr:Web_crawler
dbo:license	dbr:Apache_License
dbo:thumbnail	wiki-commons:Special:FilePath/Heritrix_logo.png?width=300
dbo:wikiPageExternalLink	http://archive-access.sourceforge.net/projects/wayback/ http://crawler.archive.org/faq.html%23windows http://webmasters.stackexchange.com/a/690/21219 http://www.iwaw.net/05/papers/iwaw05-sigurdsson.pdf http://www.webtechniques.com/archives/1997/05/burner/ https://webarchive.jira.com/wiki/display/Heritrix/Heritrix http://archive-access.sourceforge.net/projects/nutch/ http://archive-access.sourceforge.net/projects/wera/ http://www.iwaw.net/04/Mohr.pdf https://archive.org/web/researcher/ArcFileFormat.php https://archive.org/web/researcher/cdx_legend.php https://web.archive.org/web/20060111160619/http:/wiki.lib.umn.edu/DI2/HowToCrawl https://web.archive.org/web/20080101070319/http:/www.webtechniques.com/archives/1997/05/burner/ https://web.archive.org/web/20110612183924/http:/www.iwaw.net/05/papers/iwaw05-sigurdsson.pdf https://web.archive.org/web/20110612184035/http:/www.iwaw.net/04/Mohr.pdf
dbo:wikiPageID	5681427 (xsd:integer)
dbo:wikiPageLength	9318 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1096273807 (xsd:integer)
dbo:wikiPageWikiLink	dbr:Royal_Library_of_the_Netherlands dbr:Bibliotheca_Alexandrina dbr:Bibliothèque_nationale_de_France dbc:Free_web_crawlers dbc:Web_archiving dbr:Unix-like dbr:CiteSeerX dbr:Apache_License dbr:Library_and_Archives_Canada dbr:Library_of_Congress dbr:Linux dbr:Austrian_National_Library dbr:British_Library dbr:Web_browser dbr:Web_crawler dbr:Wget dbr:Alexa_Internet dbr:Free_software_license dbr:HTTP_header dbr:Internet_Archive dbr:Internet_Memory_Foundation dbr:Java_(programming_language) dbr:ARC_(file_format) dbc:2014_software dbr:Web_ARChive dbr:Microsoft_Windows dbr:National_Library_of_Israel dbr:National_Library_of_New_Zealand dbr:National_and_University_Library_of_Iceland dbr:Smithsonian_Institution_Archives dbr:National_Digital_Information_Infrastructure_and_Preservation_Program dbr:National_Library_of_Finland dbr:Web_archiving dbr:Command-line dbr:File:Heritrix_logo.png
dbp:caption	Screenshot of Heritrix Admin Console. (en)
dbp:genre	dbr:Web_crawler
dbp:license	dbr:Apache_License
dbp:logo	145 (xsd:integer)
dbp:name	Heritrix (en)
dbp:operatingSystem	dbr:Unix-like dbr:Linux dbr:Microsoft_Windows
dbp:programmingLanguage	dbr:Java_(programming_language)
dbp:revision	531730721 (xsd:integer)
dbp:screenshot	Heritrix 3.4.0 Web UI.png (en)
dbp:screenshotSize	250 (xsd:integer)
dbp:sourcearticle	Re: Control over the Internet Archive besides just “Disallow /”? (en)
dbp:sourcepath	http://webmasters.stackexchange.com/a/690/21219
dbp:wikiPageUsesTemplate	dbt:Web_crawlers dbt:CCBYSASource dbt:Citation_needed dbt:Cite_conference dbt:Cite_journal dbt:Failed_verification dbt:Infobox_software dbt:Portal dbt:Refbegin dbt:Refend dbt:Short_description dbt:Start_date_and_age dbt:URL dbt:Wikidata dbt:Internet_Archive_navbox
dbp:wordnet_type	http://www.w3.org/2006/03/wn/wn20/instances/synset-software-noun-1
dcterms:subject	dbc:Free_web_crawlers dbc:Web_archiving dbc:2014_software
gold:hypernym	dbr:Crawler
rdf:type	owl:Thing dbo:Software schema:CreativeWork dbo:Work wikidata:Q386724 wikidata:Q7397 yago:Abstraction100002137 yago:Code106355894 yago:CodingSystem106353757 yago:Communication100033020 yago:Program106568978 yago:WikicatInternetSearchEngines yago:Writing106359877 yago:WrittenCommunication106349220 yago:SearchEngine106578654 yago:Software106566077
rdfs:comment	Heritrix is an Internet Archive crawler designed specifically for web archiving. It is open source and written in Java. The main interface is accessed using a web browser, and there is a command-line tool that can optionally be used to start crawls. Heritrix was developed jointly by the Internet Archive and the Nordic national libraries, based on specifications written in early 2003. It was officially released in early January 2004 and has been continually improved by Internet Archive employees and other interested parties. (ar) Heritrix is a crawler that fetches web files across the Internet. It is open source and written entirely in Java. Its configuration interface is accessible from a web browser, which makes it very versatile and convenient to use, although it can also be launched from the command line. Heritrix was developed jointly by the Internet Archive and the "Nordic National Libraries" in early 2003. The first version was released in January 2004, and it has been continually updated by members of the Internet Archive and third parties. (es) Heritrix is a web crawler designed for web archiving. It was written by the Internet Archive. It is available under a free software license and written in Java. The main interface is accessible using a web browser, and there is a command-line tool that can optionally be used to initiate crawls. Heritrix was developed jointly by the Internet Archive and the Nordic national libraries, based on specifications written in early 2003. The first official release was in January 2004, and it has been continually improved by employees of the Internet Archive and other interested parties. (en) Heritrix is a web crawler designed and used by the Internet Archive for web archiving. It is free software written in Java. Its main interface is accessible from a web browser, but a command-line tool can also optionally be used to launch crawls. (fr) Heritrix is a web crawler for web archiving developed by the Internet Archive. It is implemented in Java and can be used freely under a free software license. It is operated mainly through a web browser, but a command-line tool can also be used for tasks such as starting a crawl. The name comes from an archaic word for "heiress", meaning a female inheritor. Heritrix was developed jointly by the Internet Archive and the Nordic National Libraries, based on specifications compiled in 2003. The first release was in January 2004, and it has since been continually improved by Internet Archive employees and outside parties interested in web archiving. However, Heritrix came to be used for the Internet Archive's own web collection only considerably later. Formerly, most of the archive was supplied by Alexa Internet, which crawls the web for its own business using its own crawler, ia_archiver, and donates the collected data to the Internet Archive. At first the Internet Archive also did some collecting of its own with Heritrix, but only on a small scale. (ja)
rdfs:label	هريتركس (ar) Heritrix (es) Heritrix (en) Heritrix (fr) Heritrix (ja)
owl:sameAs	freebase:Heritrix wikidata:Heritrix dbpedia-ar:Heritrix dbpedia-es:Heritrix dbpedia-fi:Heritrix dbpedia-fr:Heritrix dbpedia-ja:Heritrix https://global.dbpedia.org/id/2sR1g
prov:wasDerivedFrom	wikipedia-en:Heritrix?oldid=1096273807&ns=0
foaf:depiction	wiki-commons:Special:FilePath/Heritrix_3.4.0_Web_UI.png wiki-commons:Special:FilePath/Heritrix_logo.png
foaf:homepage	https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
foaf:isPrimaryTopicOf	wikipedia-en:Heritrix
foaf:name	Heritrix (en)
is dbo:wikiPageWikiLink of	dbr:List_of_Web_archiving_initiatives dbr:Wayback_Machine dbr:Web_crawler dbr:Webarchiv dbr:International_Internet_Preservation_Consortium dbr:Internet_Archive dbr:Internet_Memory_Foundation dbr:ARC_(file_format) dbr:Web_ARChive dbr:Australian_Web_Archive dbr:National_and_University_Library_of_Iceland dbr:Web_archiving dbr:PADICAT
is foaf:primaryTopic of	wikipedia-en:Heritrix