Wikidata talk:Lexicographical data
Lexicographical data: Place used to discuss any and all aspects of lexicographical data: the project itself, policy and proposals, individual lexicographical items, technical issues, etc.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2025/12.
Connecting adjective senses to items
To connect a verb sense to the corresponding item, we have predicate for (P9970), and I observe that item for this sense (P5137) is used mainly for noun senses. What is the correct way to connect an adjective sense to an item, if there is one? Dv103 (talk) 12:12, 9 August 2025 (UTC)
- There is pertainym of (P8471) but that links to another lexeme sense, not to an item directly. That may be sufficient though? ArthurPSmith (talk) 17:51, 12 August 2025 (UTC)
- That is a nice property, but I think that a property that directly links adjectives to items would be very useful. For context, I am working for Abstract Wikipedia. Do you think that a new property "quality for" is worth proposing? Dv103 (talk) 18:24, 12 August 2025 (UTC)
- Now I've created the proposals Wikidata:Property proposal/quality for and Wikidata:Property proposal/type of quality. Dv103 (talk) 11:46, 19 August 2025 (UTC)
- @Dv103: "Mainly" in your first sentence is not in itself limiting; P5137 is not restricted to use on nouns, and in fact other non-predicative lexical categories may be linked to items using it. Mahir256 (talk) 21:29, 12 August 2025 (UTC)
- I see that, according to the documentation, item for this sense (P5137) should be used only when there is an item having exactly the meaning of a quality. How many items are there indicating a quality? Dv103 (talk) 22:00, 12 August 2025 (UTC)
- @Dv103: Not too many, although items for them certainly can be made based on, among other sources, Concepticon entries (filter the 'Ontological category' for 'Properties'), NCIt entries, and AAT entries. Mahir256 (talk) 15:27, 13 August 2025 (UTC)
- I agree. Items for qualities are valuable and I encourage you to create missing ones 😀 So9q (talk) 09:58, 19 August 2025 (UTC)
New suggestion
Property_talk:P5323#Restrict_to_lexeme_form? So9q (talk) 09:55, 19 August 2025 (UTC)
Multiple representations
Hi,
In the data model of Lexemes, it is possible to add multiple representations to one form. But (AFAIK) it has never been said when and how to use this feature.
Right now, there are ~15 million forms and only ~154 000 with more than one representation (so ~1%), and the feature is mostly used in a few languages (in descending order: Japanese (Q5287), Hebrew (Q9288), Punjabi (Q58635), Sumerian (Q36790), Hindustani (Q11051), which together make up 2/3 of the total). The highest count is on ਅੜਾਉਣੀ/ਅੜੌਣੀ/اڑاوَݨی/اڑاوݨی/اڑاؤݨی/اڑاوَنی/اڑاونی/اڑاؤنی (L744679), with 8 representations on one form, followed by òmo/oˊmo/oʼmo/ohmo (L1417242).
Also, on a more technical side, is there an identifier (or anything) that allows pointing to a specific representation? (Like LX-FY for form Y of lexeme X, or even the weird ID for statements like Lexeme:L1#L1$e30d2889-462b-19fa-1f77-1dd57ea722cb.) The only way I know is to get all representations of a form and then filter by language (but that is more complicated and requires knowing the language in advance).
Cheers, VIGNERON (talk) 08:21, 21 September 2025 (UTC)
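On the technical point above (getting all representations of a form and then filtering by language), here is a minimal Python sketch. It assumes the usual Wikibase lexeme JSON shape, in which each form carries a `representations` map keyed by language code. The sample data, the placeholder IDs (`L0`, `L0-F1`, `L0-F2`) and the helper names are made up for illustration; they are not taken from a real lexeme or from any Wikibase API.

```python
from typing import Any, Dict, List, Optional

# Hand-written sample in the shape of lexeme JSON (assumption: placeholder
# IDs and values, not copied from a real lexeme).
SAMPLE_LEXEME: Dict[str, Any] = {
    "id": "L0",
    "forms": [
        {
            "id": "L0-F1",
            "representations": {
                "en": {"language": "en", "value": "color"},
                "en-gb": {"language": "en-gb", "value": "colour"},
            },
        },
        {
            "id": "L0-F2",
            "representations": {
                "en": {"language": "en", "value": "colors"},
            },
        },
    ],
}


def representation(form: Dict[str, Any], lang: str) -> Optional[str]:
    """Return the representation of `form` for language code `lang`, if any."""
    rep = form.get("representations", {}).get(lang)
    return rep["value"] if rep else None


def forms_with_multiple_representations(lexeme: Dict[str, Any]) -> List[str]:
    """Return the IDs of the forms that have more than one representation."""
    return [
        form["id"]
        for form in lexeme.get("forms", [])
        if len(form.get("representations", {})) > 1
    ]


if __name__ == "__main__":
    first_form = SAMPLE_LEXEME["forms"][0]
    print(representation(first_form, "en-gb"))
    print(forms_with_multiple_representations(SAMPLE_LEXEME))
```

In this sketch the only handle on a single representation is the pair (form ID, language code), which is exactly the limitation described above: the language has to be known in advance.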
How to express usage examples on smaller or dead languages?
I'm trying to put usage example (P5831) on the verb ikó (L1520208) from the Tupi (Q56944) language. The problem is that the property uses monolingual text, which is limited to languages offered by the MediaWiki software. Is there an alternative for this so I could register some examples in this language? Luk3 (talk) 04:19, 15 October 2025 (UTC)
How to deal with forms whose written representation is unknown but not their pronunciation?
Hi. I am working on Lorrain (Q671198) and I'm importing Q106167910. I stumbled on an issue: different dialects use different words, and this dictionary usually accounts for this by stating all the known pronunciations on a "main" entry and having all the variants redirect to that entry. However, it sometimes "forgets" to provide the word and only specifies the pronunciation.
As an example, see Āchpac (L1508548) and the corresponding excerpt from the dictionary s:fr:Page:Zéliqzon - Dictionnaire des patois romans de la Moselle, œuvre complète, 1924.djvu/44:
Āchpac [ǟs̆pak.. S, ās̆pǫk V], n. p. — Aspach, vill. de l’arr. de Sarrebourg.
ǟs̆pak is the pronunciation for 'Āchpac'. ās̆pǫk is the pronunciation for '?'. Nowhere in the dictionary is the written form of ās̆pǫk given.
Having worked on this dictionary for years already, I can assume the latter is going to look like 'Āchpoc' when written, but I feel uncomfortable "guessing" without having a way to properly state that this form was "recreated" on a "best guess basis". Or if it is even a good practice to "recreate" these missing forms in the first place.
As such, on Āchpac (L1508548) and any lexemes where this occurred, I went ahead and created a '?' form with all the proper data. @Lepticed7 suggested instead adding a 'pronunciation' statement at the Lexeme level to state that no known form is associated with this pronunciation but that it exists nonetheless.
Is there a better way? Poslovitch (talk) 09:54, 19 October 2025 (UTC)
Unicode to use Wikidata as a lexicon data source?
See this article on the Unicode blog: https://blog.unicode.org/2025/11/introducing-unicode-inflection-library.html for Unicode Inflexion Library (Q136796507).
It's a library published by the Unicode Consortium, meant to compute inflections for words, much as Wikifunctions aims to do at a low level for language generation. They cite Wikidata as their source for the lexicon and the inflections. I don't know if it's well known here, hence this message. I'll add a word in the next Wikidata Weekly (if it has not been done in an earlier edition…) author TomT0m / talk page 12:31, 15 November 2025 (UTC)
- Turns out it's been done: Wikidata:Status_updates/2025_11_10. I also added a mention of this on the project page associated with this talk page. author TomT0m / talk page 12:39, 15 November 2025 (UTC)