About: Unicode equivalence

Property	Value
dbo:abstract	This article deals with Unicode equivalences. Unicode contains many characters. To maintain compatibility with existing standards, some of them are equivalent to other characters or to sequences of characters. Unicode provides two notions of equivalence, canonical and compatibility, the former being a subset of the latter. For example, the character n followed by the combining tilde ◌̃ is canonically equivalent to, and therefore also compatible with, the single Unicode character ñ, whereas the typographic ligature ﬀ is only compatible with (but not canonically equivalent to) the sequence of two f characters. Unicode normalization is a text normalization that transforms characters or sequences of characters into a single equivalent representation, called the "normal form" in this article. This transformation is important because it makes it possible to compare, search, and sort Unicode sequences. For each of the two notions of equivalence, Unicode defines two forms, one composed and one decomposed, giving four normal forms, abbreviated NFC, NFD, NFKC, and NFKD, which are detailed below and are also described in Normalisation Unicode. (fr)
Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters. Unicode provides two such notions, canonical equivalence and compatibility. Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way in applications such as alphabetizing names or searching, and may be substituted for each other. Similarly, each Hangul syllable block that is encoded as a single character may be equivalently encoded as a combination of a leading conjoining jamo, a vowel conjoining jamo, and, if appropriate, a trailing conjoining jamo. Sequences that are defined as compatible are assumed to have possibly distinct appearances, but the same meaning in some contexts. Thus, for example, the code point U+FB00 (the typographic ligature "ﬀ") is defined to be compatible with, but not canonically equivalent to, the sequence U+0066 U+0066 (two Latin "f" letters). Compatible sequences may be treated the same way in some applications (such as sorting and indexing), but not in others; and may be substituted for each other in some situations, but not in others. Sequences that are canonically equivalent are also compatible, but the opposite is not necessarily true. The standard also defines a text normalization procedure, called Unicode normalization, that replaces equivalent sequences of characters so that any two texts that are equivalent will be reduced to the same sequence of code points, called the normalization form or normal form of the original text. For each of the two equivalence notions, Unicode defines two normal forms, one fully composed (where multiple code points are replaced by single points whenever possible), and one fully decomposed (where single points are split into multiple ones). (en)
Unicode equivalence is the specification in the Unicode character encoding standard that certain sequences of code points must represent the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which include similar or identical characters. Unicode provides two notions: canonical equivalence and compatibility. Code point sequences defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined to be the same as the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, these sequences should be displayed in the same manner and treated the same way by applications when alphabetizing names, searching, and so on. The standard defines a procedure called Unicode normalization, which replaces equivalent character sequences so that either of two equivalent texts is reduced to the same sequence of code points. (ko)
Unicode contains many characters that exist to maintain compatibility with existing standards. Some of them are functionally equivalent to other characters or sequences of characters. For this reason, Unicode defines several kinds of equivalence. For example, the character n followed by the combining tilde ◌̃ is equivalent to the single Unicode character ñ. Unicode maintains two standards for defining equivalence. (ja)
To remain compatible with many existing standards, Unicode includes many special characters. Some of these characters are functionally equivalent to other characters or character sequences. Unicode therefore defines some code point sequences as equivalent. Unicode provides two notions of equivalence: canonical equivalence and compatibility equivalence. The former is a subset of the latter. For example, the character n followed by the combining tilde ◌̃ is both canonically equivalent and compatibility equivalent to the Unicode character ñ, whereas the ligature ﬀ is only compatibility equivalent to two f characters. Unicode normalization is a form of text normalization that converts mutually equivalent sequences into the same sequence, which the Unicode standard calls a normalization form. For each notion of equivalence, Unicode defines two forms, one fully composed and one fully decomposed, giving four forms in total, abbreviated NFC, NFD, NFKC, and NFKD. Normalization matters for Unicode text-processing programs because it affects the meaning of comparison, searching, and sorting. (zh)
dbo:wikiPageExternalLink	http://unicode.org/reports/tr15/ http://www.w3.org/International/charlint/ https://www.unicode.org/faq/normalization.html
dbo:wikiPageID	8477071 (xsd:integer)
dbo:wikiPageLength	15709 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1095761778 (xsd:integer)
dbo:wikiPageWikiLink	dbr:Roman_numerals dbr:Samba_(software) dbr:Index_(database) dbr:Character_(computing) dbr:UTF-16 dbr:UTF-8 dbr:Unicode dbr:Unicode_compatibility_characters dbr:Vietnamese_alphabet dbr:Dutch_alphabet dbr:Concatenation dbr:Text_normalization dbr:Equivalence_class dbr:Uconv dbr:Angstrom dbr:Ligature_(typography) dbr:Closure_(mathematics) dbr:Combining_character dbr:Complex_text_layout dbr:Ñ dbr:Dakuten dbr:Full-width dbr:Subscript dbr:Tilde dbr:Typographic_ligature dbr:Alphabet dbr:Alphabetical_order dbr:Hangul_Jamo_(Unicode_block) dbr:Representative_(mathematics) dbr:HTML dbr:Half-width_katakana dbr:Hangul dbr:Language dbr:Latin_alphabet dbr:Bijection dbc:Unicode_algorithms dbr:Superscript dbr:Swedish_language dbr:Code_point dbr:Code_unit dbr:Diacritic dbr:Sorting_algorithm dbr:Spanish_alphabet dbr:Idempotent dbr:Injective_function dbr:OS_X dbr:Canonical_form dbr:Sorting dbr:Volt dbr:Unicode_subscripts_and_superscripts dbr:Diacritics dbr:IDN_homograph_attack dbr:IJ_(digraph) dbr:ISO/IEC_14651 dbr:Precomposed_character dbr:Character_set dbr:Ring_above dbr:Japanese_script dbr:Rich_text dbr:String_searching dbr:Base_character
dbp:wikiPageUsesTemplate	dbt:Refimprove dbt:Reflist dbt:Unicode_navigation
dcterms:subject	dbc:Unicode_algorithms
gold:hypernym	dbr:Specification
rdf:type	yago:WikicatUnicodeAlgorithms yago:Abstraction100002137 yago:Act100030358 yago:Activity100407535 yago:Algorithm105847438 yago:Event100029378 yago:Procedure101023820 yago:PsychologicalFeature100023100 yago:YagoPermanentlyLocatedEntity dbo:ProgrammingLanguage yago:Rule105846932 yago:WikicatAlgorithms
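rdfs:comment	Unicode equivalence is the specification in the Unicode character encoding standard that certain sequences of code points must represent the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which include similar or identical characters. Unicode provides two notions: canonical equivalence and compatibility. Code point sequences defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined to be the same as the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, these sequences should be displayed in the same manner and treated the same way by applications when alphabetizing names, searching, and so on. The standard defines a procedure called Unicode normalization, which replaces equivalent character sequences so that either of two equivalent texts is reduced to the same sequence of code points. (ko)
Unicode contains many characters that exist to maintain compatibility with existing standards. Some of them are functionally equivalent to other characters or sequences of characters. For this reason, Unicode defines several kinds of equivalence. For example, the character n followed by the combining tilde ◌̃ is equivalent to the single Unicode character ñ. Unicode maintains two standards for defining equivalence. (ja)
To remain compatible with many existing standards, Unicode includes many special characters. Some of these characters are functionally equivalent to other characters or character sequences. Unicode therefore defines some code point sequences as equivalent. Unicode provides two notions of equivalence: canonical equivalence and compatibility equivalence. The former is a subset of the latter. For example, the character n followed by the combining tilde ◌̃ is both canonically equivalent and compatibility equivalent to the Unicode character ñ, whereas the ligature ﬀ is only compatibility equivalent to two f characters. Unicode normalization is a form of text normalization that converts mutually equivalent sequences into the same sequence, which the Unicode standard calls a normalization form. For each notion of equivalence, Unicode defines two forms, one fully composed and one fully decomposed, giving four forms in total, abbreviated NFC, NFD, NFKC, and NFKD. Normalization matters for Unicode text-processing programs because it affects the meaning of comparison, searching, and sorting. (zh)
This article deals with Unicode equivalences. Unicode contains many characters. To maintain compatibility with existing standards, some of them are equivalent to other characters or to sequences of characters. Unicode provides two notions of equivalence, canonical and compatibility, the former being a subset of the latter. For example, the character n followed by the combining tilde ◌̃ is canonically equivalent to, and therefore also compatible with, the single Unicode character ñ, whereas the typographic ligature ﬀ is only compatible with (but not canonically equivalent to) the sequence of two f characters. (fr)
Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters. (en)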
rdfs:label	Équivalence Unicode (fr) 유니코드 등가성 (ko) Unicodeの等価性 (ja) Unicode equivalence (en) Unicode等價性 (zh)
owl:sameAs	freebase:Unicode equivalence yago-res:Unicode equivalence wikidata:Unicode equivalence dbpedia-fr:Unicode equivalence dbpedia-ja:Unicode equivalence dbpedia-ko:Unicode equivalence dbpedia-sr:Unicode equivalence dbpedia-zh:Unicode equivalence https://global.dbpedia.org/id/2Mo2v
prov:wasDerivedFrom	wikipedia-en:Unicode_equivalence?oldid=1095761778&ns=0
foaf:isPrimaryTopicOf	wikipedia-en:Unicode_equivalence
is dbo:wikiPageRedirects of	dbr:Normalization_Form_C dbr:Normalization_Form_D dbr:Unicode_normalization dbr:Combining_class dbr:NFC_normalisation dbr:NFC_normalization dbr:NFD_normalisation dbr:NFD_normalization dbr:NFKC dbr:Normalization_Form_Canonical_Composition dbr:Normalization_Form_Canonical_Decomposition dbr:Canonical_decomposition dbr:Canonical_equivalence dbr:Canonically_equivalent dbr:Glyph_Composition_/_Decomposition dbr:Compatibility_decomposition dbr:Compatibility_equivalence dbr:Unicode_normalisation dbr:UTF-8-MAC
is dbo:wikiPageWikiLink of	dbr:Normalization_Form_C dbr:Normalization_Form_D dbr:List_of_Latin-script_digraphs dbr:Ring_(diacritic) dbr:DIN_91379 dbr:UTF-8 dbr:Unicode dbr:Unicode_character_property dbr:Unicode_compatibility_characters dbr:Unicode_normalization dbr:Universal_Disk_Format dbr:Virtaal dbr:Duplicate_characters_in_Unicode dbr:Early_Dynastic_Cuneiform_(Unicode_block) dbr:Combining_class dbr:Text_normalization dbr:Ellipsis dbr:NFC_normalisation dbr:NFC_normalization dbr:NFD_normalisation dbr:NFD_normalization dbr:NFKC dbr:Uconv dbr:Andrew_West_(linguist) dbr:Angstrom dbr:Apache_Subversion dbr:Complex_text_layout dbr:Filename dbr:Normalization_Form_Canonical_Composition dbr:Normalization_Form_Canonical_Decomposition dbr:Overline dbr:HFS_Plus dbr:Internationalized_Resource_Identifier dbr:Symbol_(typeface) dbr:Collation dbr:Trace_monoid dbr:Shadda dbr:IDN_homograph_attack dbr:Precomposed_character dbr:European_ordering_rules dbr:Canonical_decomposition dbr:Canonical_equivalence dbr:Canonically_equivalent dbr:Glyph_Composition_/_Decomposition dbr:Compatibility_decomposition dbr:Compatibility_equivalence dbr:Unicode_normalisation dbr:UTF-8-MAC
is foaf:primaryTopic of	wikipedia-en:Unicode_equivalence
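
The equivalences described in the dbo:abstract value can be checked with a short script. The following is a minimal sketch, assuming Python 3 and its standard-library `unicodedata` module; the helper name `equivalent` and the variable names are illustrative and do not come from the record above. It compares the "n" plus combining tilde sequence with the precomposed "ñ", compares the ligature U+FB00 with two "f" letters under canonical and compatibility normalization, and decomposes one Hangul syllable (U+D55C, chosen here only as an example) into conjoining jamo.

```python
import unicodedata

composed = "\u00F1"          # "ñ" as a single precomposed code point
decomposed = "\u006E\u0303"  # "n" followed by the combining tilde
ligature = "\uFB00"          # the typographic ligature "ﬀ"


def equivalent(a: str, b: str, form: str) -> bool:
    """Return True if a and b normalize to the same code point sequence.

    Hypothetical helper for illustration; `form` is one of
    "NFC", "NFD", "NFKC", "NFKD".
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Canonically equivalent sequences agree under the canonical forms.
canonical = equivalent(composed, decomposed, "NFC")

# The ligature is only compatible with "ff": the canonical form keeps
# them distinct, the compatibility form maps both to the same sequence.
ligature_nfc = equivalent(ligature, "ff", "NFC")
ligature_nfkc = equivalent(ligature, "ff", "NFKC")

# A precomposed Hangul syllable decomposes into leading, vowel, and
# trailing conjoining jamo under NFD, and recomposes under NFC.
syllable = "\uD55C"
jamo = unicodedata.normalize("NFD", syllable)
recomposed = unicodedata.normalize("NFC", jamo)

print(canonical, ligature_nfc, ligature_nfkc)
print([f"U+{ord(c):04X}" for c in jamo], recomposed == syllable)
```

Comparing normalized copies rather than raw strings is the usual design choice here: raw comparison treats "ñ" and "n" plus U+0303 as different strings, while comparison after normalization to the same form treats them as equal. Whether to use the canonical forms (NFC, NFD) or the compatibility forms (NFKC, NFKD) depends on whether distinctions such as the "ﬀ" ligature should survive.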