Commons talk:Categories

From Wikimedia Commons, the free media repository
This is the talk page for discussing improvements to Commons:Categories.
Archives: 1, 2, 3, 4, 5

Sortkey recommendations


Regarding this:

The special sortkey τ (lowercase Greek letter Tau) is used to sort templates at the end of the related Commons-category, see for example Category:Transport templates sorted in Category:Transport. (Sorting in Commons is not case sensitive so only uppercase Τ (Tau) is shown.)

I wonder whether this is a good recommendation. I tried it for a few categories and found it quite confusing, because the uppercase Tau is, in practice, visually indistinguishable from the Latin letter T. I see a lot of template categories with a sortkey of ~, and I would actually consider that a very good idea: it's sorted after Z, it's visually recognizable, and it's a character you can probably find on far more keyboards than the Tau.

On a side note, I would expect using three different dashes as sortkeys to create a lot of confusion; many people will find it hard to understand the difference between them. So I would also suggest removing any mention of the em dash and the en dash here.

Thanks -- Reinhard Müller (talk) 15:35, 16 February 2024 (UTC)
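To make the ~ versus Tau point concrete, here is a small Python sketch under one loudly stated assumption: that category listings compare sort keys case-insensitively (modelled here simply by uppercasing, in line with the quoted remark that only uppercase Τ is shown) and then by plain Unicode code point. This is a simplification for illustration, not MediaWiki's actual collation code, and `model_sort_key` and `heading_letter` are hypothetical helper names.

```python
def model_sort_key(sortkey: str) -> str:
    # ASSUMPTION: keys compare case-insensitively (modelled by uppercasing)
    # and then in plain Unicode code-point order.
    return sortkey.upper()


def heading_letter(sortkey: str) -> str:
    # First character of the normalised key: the section heading a
    # category listing would show for this entry under the assumed model.
    return model_sort_key(sortkey)[:1]


keys = [
    "Zeppelins",
    "τTransport templates",  # Greek small tau prefix
    "~Transport templates",  # tilde prefix
    "Transport templates",   # plain alphabetical key
    " Index",                # leading space
    "Buses",
]
ordered = sorted(keys, key=model_sort_key)
for key in ordered:
    h = heading_letter(key)
    print(f"{h!r:6} U+{ord(h):04X}  {key!r}")
```

Under this model both prefixes land after Z, but the Tau entry is filed under a heading (U+03A4) that renders almost exactly like the Latin T (U+0054), which is the confusion described above.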

Agreed. I think common practice is using ~ for anything Commons-related or, more broadly, Wikimedia-related.
The dashes were added by https://commons.wikimedia.org/w/index.php?title=Commons%3ACategories&diff=prev&oldid=703824439 . RZuo (talk) 15:40, 16 February 2024 (UTC)
@W like wiki: any input from your side? --Reinhard Müller (talk) 15:53, 22 February 2024 (UTC)
Hi @Reinhard Müller: Yes, I agree. I can't remember on which Commons page I read this recommendation with the Greek Tau before including it here, but I always had the same problems: I needed to copy and paste it from here, or even made a copy of it on my user page. I was also not happy that it looks the same as a normal "T", but when I wrote this section I could not create a new rule. If we change it now, I appreciate it!
Same with the dashes. Maybe someone has an idea of whether they are useful, but here too I would appreciate a change, in this case deletion. Best regards -- W like wiki (Please ping me!) 16:25, 22 February 2024 (UTC)
Thanks to everybody who commented! I updated the page and hope that I didn't mess up anything regarding the translation. --Reinhard Müller (talk) 18:15, 22 February 2024 (UTC)

Controversial categories


Hi, I'd like to get feedback regarding categories that can be seen as controversial. On en-wiki, there is a rule that

Categorizations should generally be uncontroversial; if the category's topic is likely to spark controversy, then a list article (which can be annotated and referenced) is probably more appropriate.

As far as I can see there is no such policy on Wikicommons. Is there some other policy which deals with this issue? What is the community consensus?

To provide a concrete example, this edit added the category Territories under occupation by Russia back to the category Abkhazia. This is controversial: while the overwhelming majority of countries consider Abkhazia to be part of Georgia, only a minority have explicitly stated that it is occupied by Russia (see Wikipedia:Russian-occupied_territories_in_Georgia#International_position).

I believe that this category isn't helpful, since the category name cannot convey all these nuances. It would be better to create a page/gallery with the related media. I'm pinging User:Laurel_Lodged, who added this category. Alaexis (talk) 09:21, 28 May 2024 (UTC)

I agree that some such policy is needed on Commons. I agree that "Categorizations should generally be uncontroversial". But one editor's uncontroversial is another editor's hot potato. Unlike Wiki, Commons does not lend itself to list article creation. So the likely solution is a case-by-case evaluation and an agreement to adhere to community consensus. By the way, regarding Abkhazia, Wiki itself says, "On 23 October 2008, the Parliament of Georgia declared Abkhazia a Russian-occupied territory, a position shared by most United Nations member states."[1] So it's not just me. Laurel Lodged (talk) 15:05, 28 May 2024 (UTC)
Ehh, where do you see that in the source? I've tagged it on en-wiki. If I'm missing something and the source does say it, then indeed it wouldn't be controversial and I would not object to placing Abkhazia in this category. Alaexis (talk) 20:23, 28 May 2024 (UTC)
"Georgia asserted that the territories of South Ossetia and Abkhazia, including the upper Kodori Valley, were occupied by Russian forces. On 23 October, the Parliament of Georgia adopted a law declaring Abkhazia and South Ossetia “occupied territories” and the Russian Federation a “military occupier.” This claim was reiterated […] In describing the “current occupation” Georgia also stated: “the western part of the former ‘buffer zone’ (the village of Perevi in the Sachkhere District) remains under Russian occupation.”" If Wiki is making claims not supported by sources, then Wiki is the place to make those edits. Laurel Lodged (talk) 10:30, 30 May 2024 (UTC)
Yes, absolutely. But there is a difference between *Georgia* considering it an occupied territory and "most UN members" sharing this position. I never disputed the former. Alaexis (talk) 08:42, 1 June 2024 (UTC)
Question @Alaexis, @Laurel Lodged: It is hard to tell whether this is really a question about general policy or a discussion about a particular case. If it is the latter, it should really be held as a CfD on Category:Abkhazia. If there is a change to policy that you think would help improve things, that should be discussed here, and you can certainly refer to this case as an example. Josh (talk) 20:15, 18 July 2024 (UTC)

FYI: Moved historical page, redirected that target to this page


"Commons:Naming categories" now redirects to this commons: ns page. It was problematic for the number of links (internal and from WD), and the confusion being caused with the pre-existing arrangement.

The page that was at that location is now at Commons:Naming categories (historical). Few links point to it, so moving it should not cause problems for the functional management of this site.  — billinghurst sDrewth 00:02, 31 May 2024 (UTC)

@Billinghurst Thank you for doing that, it is a big help to avoid confusion for folks. Josh (talk) 20:02, 18 July 2024 (UTC)

Sortkey recommendations


A question that has bounced around in my mind a few times: what are the purposes of each of the symbolic sortkeys? The most common ones I see are '(space)', '*', '+' and '~'. What are their roles?

So far, that's not clearly defined, and different people use completely different sortkey prefixes for the same purpose.
I have collected a few ideas about what could be seen as "best practice". I don't know whether we actually want to come up with a policy or at least a recommendation, but if we do, this list might serve as a basis for that. Thanks --Reinhard Müller (talk) 07:02, 9 July 2024 (UTC)

Another thing, while we're discussing sorting: on category pages with accent marks in the titles, I often see {{DEFAULTSORT:}} used to remove the accents. A simple example is 'café', which is turned into {{DEFAULTSORT:cafe}}. If this is something that should be encouraged on the wiki, please feel free to add it to the policy! Juwan (talk) 10:11, 8 July 2024 (UTC)

@Reinhard Müller, thank you for sharing some ideas on ways to use a variety of sort keys to sub-sort by type of sub-topic. I am often frustrated by the willy-nilly use of special characters by users, especially to '+1' their preferred topics to the top of the list. I readily use a few established special characters for sorting non-topical categories, such as a space for index categories, # for numbers, ? for 'unidentified' (maintenance) categories, and ~ for some other types of maintenance categories. For topical non-number categories however, I do not see the attraction of using special character sorting, as it requires a few things at a minimum:
  1. The user must already be familiar with the sort key special character system.
  2. The user must parse the topic they are seeking, in order to figure out which special character they should look under.
  3. The system has to be employed consistently enough that once a user has passed hurdles 1 and 2, they can have some confidence they will actually find what they are looking for.
Currently, none of these are true for a lot of the special characters, and so I generally resist using them for topical categories, and while I think your list is well thought-out, I don't think in the end that it provides any real additional value over using the alphabetical sort system that categories are fundamentally based on.
As for using sort keys for normal alphabetical sorting (e.g. using sortkey 'buildings' to sort Category:Science buildings in Category:Science), that is extremely useful and I use it a lot. I do think some additional guidelines right here on COM:CAT to help users quickly grasp common practices are a good idea. Josh (talk) 19:56, 18 July 2024 (UTC)
It is certainly a very nice scheme. The only issue in my case is the one you've raised. Is this perhaps going too far? As in, is it too complex for someone to understand, especially without having to read the policy? Juwan (talk) 20:21, 18 July 2024 (UTC)
@JnpoJuwan I always try to keep accessibility front and center in my mind when considering categorization. For someone like me who's been on the project since its inception (or close enough anyway), I am able to take the time to learn and apply various elegant schemes for organization, but for newer or irregular users especially, that really isn't practical. Even as a veteran, I am routinely frustrated when I look for something and don't find it (such as buildings under B) but instead have to figure out a) whether the subcategory even exists, and b) what special character someone came up with to sort it under. Having a standardized key list and implementing it consistently might help with that for me, since I spend a lot of time in categories and can learn and keep that knowledge fresh (I even kind of like the scheme), but I still don't think it helps the bulk of less-regular users just looking to sort their contributions or find images for their projects. For this reason, I think it falls down on the accessibility question. Josh (talk) 20:59, 18 July 2024 (UTC)
@JnpoJuwan As far as accent marks (or their suppression) in sort keys are concerned, I have seen some discussion on whether the current sort algorithm handles accents and other diacritics as it should. It is certainly not consistent with how search handles them. I don't know if we really should be suppressing them, though, and I generally don't in the few cases I've had to worry about them. My native language doesn't use diacritics (except in loanwords), so I probably don't have the best intuitive feel for which way to go on this question. Josh (talk) 20:01, 18 July 2024 (UTC)
To give you some perspective: in my native language, Portuguese, at least, we tend to ignore accent marks when sorting, so an algorithm would sort them like this: aa, áb, ac. I haven't seen how other languages handle their sorting (speakers of, for example, Spanish, German, or Swedish would probably want diacritics kept); I need more opinions on that side. Juwan (talk) 20:18, 18 July 2024 (UTC)
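A minimal Python sketch of the "ignore accents when sorting" behaviour described above, which is also what {{DEFAULTSORT:cafe}} on a 'café' page achieves by hand. It is an illustration only, assuming Unicode NFD decomposition is an acceptable way to separate base letters from accents; it is not how MediaWiki collates, and the function names are hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    # Decompose (NFD) so "á" becomes "a" + U+0301, then drop combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_key(text: str) -> str:
    # casefold() makes the comparison case-insensitive as well.
    return fold_accents(text).casefold()


words = ["ac", "áb", "aa"]
print(sorted(words))                              # ['aa', 'ac', 'áb'] (code-point order)
print(sorted(words, key=accent_insensitive_key))  # ['aa', 'áb', 'ac'] (accents ignored)
print(fold_accents("café"))                       # cafe
```

One limitation of this approach: letters without a Unicode decomposition, such as ø or ß, pass through unchanged, so they would still need explicit handling.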
@JnpoJuwan Thank you for that insight; it is always fascinating how different languages have such different perspectives on the world. As a mono-lingual project with a multi-lingual audience, that remains a big challenge for Commons to grapple with. Josh (talk) 21:01, 18 July 2024 (UTC)
Spanish considers ñ a distinct letter between n and o; accented vowels are treated as if the accent weren't there, except if words are otherwise identically spelled (e.g. que and qué), in which case the unaccented one comes first. Historically they treated ch as a single letter sorting after c and ll as a single letter sorting after l, but in the last few decades that has largely disappeared.
German normally sorts ä, ö, and ü as ae, oe, and ue; the difference is considered a typographic convention. Ditto for ß and ss.
Romanian sorts ș after s and ț after t and considers them distinct letters. Similarly a, ă, â are considered distinct letters (in that order), and the same for i and î.
Those are the only languages other than English where I know enough to speak confidently. Inconveniently, as far as I can tell, mediawiki doesn't readily support correctly sorting ñ, ș, or ț, nor the three non-standard Romanian vowels. - Jmabel ! talk 21:43, 18 July 2024 (UTC)
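A rough Python sketch of the Spanish and Romanian rules described above: each language gets a table of letters it treats as distinct (the names `SPANISH`, `ROMANIAN` and `collation_key` are hypothetical), and every other diacritic is ignored except to break ties. It is deliberately simplified: it skips the historical ch/ll rules, treats all tie-breaking accents alike, assumes precomposed input, and does not reflect how MediaWiki actually sorts.

```python
import unicodedata

# Letters each language treats as separate, mapped to (base letter, rank):
# rank 1 sorts right after the base letter, rank 2 after that.
SPANISH = {"ñ": ("n", 1)}
ROMANIAN = {"ă": ("a", 1), "â": ("a", 2), "î": ("i", 1),
            "ș": ("s", 1), "ț": ("t", 1)}


def collation_key(word: str, separate_letters: dict):
    """Primary order by (base, rank); other accents only break ties."""
    primary = []
    accents = 0
    for ch in unicodedata.normalize("NFC", word).lower():
        if ch in separate_letters:
            primary.append(separate_letters[ch])
            continue
        base = unicodedata.normalize("NFD", ch)[0]  # "é" -> "e"
        if base != ch:
            accents += 1  # ignored unless the words otherwise tie
        primary.append((base, 0))
    # Ties (e.g. que/qué) put the unaccented spelling first.
    return (primary, accents)


spanish = ["oso", "ñandú", "nube", "qué", "que", "nuez"]
print(sorted(spanish, key=lambda w: collation_key(w, SPANISH)))

romanian = ["țară", "tuns", "șase", "sus", "ăsta", "ar"]
print(sorted(romanian, key=lambda w: collation_key(w, ROMANIAN)))
```

With plain code-point sorting, "ñandú" would end up after "qué"; with the sketch it lands between the n-words and "oso", and "que" precedes "qué".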
In my native language, Hungarian, ö (and ő) is sorted after o (and ó), and the same goes for u/ú and ü/ű (other diacritic differences – including those between o and ó, between ö and ő, etc. – count only if there’s no other difference; but di- and trigraphs have their own places – if they really are di- or trigraphs and not just those letters next to each other, a rule nearly impossible to create an algorithm for). This means that according to the Hungarian rules, Olaszliszka goes before Öcsöd; according to the German rules, however, it’s just the other way round, Oecsoed being before Olaszliszka! This demonstrates that a Commons-wide default cannot fulfill all languages’ needs, so I think the only sensible default other than the current one is completely disregarding accents, i.e. treating ñ and ň the same as n; ö, ő and ô the same as o; ș, ş and š the same as s, and so on. —Tacsipacsi (talk) 22:21, 18 July 2024 (UTC)
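The Öcsöd/Olaszliszka example can be reproduced with three small, hypothetical sort-key functions in Python: the "disregard accents completely" default proposed above, the German convention described earlier in the thread (ä, ö, ü as ae, oe, ue; ß as ss), and a heavily simplified Hungarian rule (ö/ő after o, ü/ű after u, digraphs and the finer accent rules ignored). These are sketches for comparison only, not an implementation of either language's official collation.

```python
import unicodedata

GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
# Simplified Hungarian: ö/ő follow o, ü/ű follow u; other accents ignored.
HUNGARIAN_SEPARATE = {"ö": "o", "ő": "o", "ü": "u", "ű": "u"}


def strip_accents(text: str) -> str:
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_key(word: str) -> str:
    # The "disregard accents completely" default.
    return strip_accents(word.lower())


def german_key(word: str) -> str:
    # lower() rather than casefold() so "ß" reaches the expansion table.
    lowered = unicodedata.normalize("NFC", word).lower()
    return strip_accents("".join(GERMAN_EXPANSIONS.get(c, c) for c in lowered))


def hungarian_key(word: str):
    # (base letter, 1) sorts after (base letter, 0); digraphs not handled.
    lowered = unicodedata.normalize("NFC", word).lower()
    return [(HUNGARIAN_SEPARATE[c], 1) if c in HUNGARIAN_SEPARATE
            else (strip_accents(c), 0) for c in lowered]


towns = ["Öcsöd", "Olaszliszka"]
for label, key in [("fold", fold_key), ("German", german_key),
                   ("Hungarian", hungarian_key)]:
    print(label, sorted(towns, key=key))

names = ["Muff", "Müller"]
print(sorted(names, key=fold_key))    # Muff first ("muff" < "muller")
print(sorted(names, key=german_key))  # Müller first ("mueller" < "muff")
```

The German and Hungarian keys disagree on the two town names, and the German and plain-folding keys disagree on Müller/Muff, so no single key satisfies all three.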
In short, is what {{DEFAULTSORT:}} tries to achieve a way to bypass MediaWiki's (current) technical restrictions? Juwan (talk) 22:53, 18 July 2024 (UTC)
@JnpoJuwan: not really. Even with {{DEFAULTSORT:}} we have to live with most of those restrictions. But (besides the issues that started this discussion about handling incommensurate subcats separately) {{DEFAULTSORT:}} lets us
  • sort people "last name first" (though increasingly this happens implicitly as a side effect of Wikidata Infoboxes)
  • sort numbers sanely (by default they'd sort alphabetically) so we can force a sequence 1, 2, 3, ... 9, 10, 11, ... 20, ... instead of 1, 10, 11, ... 2, 20, ... 3, ... 9
  • in a language where the name of every public square begins with "Plaza", sort a list of public squares by the part of the name that actually matters, so that not everything is lumped under "P"
As noted above, some other uses are more controversial. - Jmabel ! talk 05:04, 19 July 2024 (UTC)
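The three {{DEFAULTSORT:}} uses listed above can be sketched as small Python functions that compute the sort key text. All names here are hypothetical and do not correspond to any existing Commons tool or template. The zero-padding width of 4 is an arbitrary assumption (numbers above 9999 would need more), and `last_name_first` naively treats the final word as the surname.

```python
import re


def pad_numbers(text: str, width: int = 4) -> str:
    # Zero-pad each run of digits so that plain string order matches
    # numeric order ("2" -> "0002", "10" -> "0010").
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)


def drop_generic_prefix(name: str, prefix: str = "Plaza ") -> str:
    # Sort "Plaza Mayor" under "Mayor" instead of lumping it under "P".
    return name[len(prefix):] if name.startswith(prefix) else name


def last_name_first(full_name: str) -> str:
    # Naive: treats the final word as the surname.
    given, _, surname = full_name.rpartition(" ")
    return f"{surname}, {given}" if given else surname


volumes = ["Volume 10", "Volume 2", "Volume 1", "Volume 20", "Volume 9"]
print(sorted(volumes))                   # 1, 10, 2, 20, 9
print(sorted(volumes, key=pad_numbers))  # 1, 2, 9, 10, 20

squares = ["Plaza Mayor", "Plaza de Armas", "Plaza Bolívar"]
print(sorted(squares, key=lambda n: drop_generic_prefix(n).casefold()))

print(last_name_first("Ada Lovelace"))   # Lovelace, Ada
```

Real names rarely follow the one-word-surname pattern, which is one reason the Wikidata-driven approach mentioned above is more reliable for people.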

Use of English varieties in category names


There was a discussion at Commons talk:Categories/Archive 4#LANGVAR in category names ?, and many users had expressed support for implementing local dialectal names for categories. However, there was no consensus on the proposal by Joshbaumgartner that would have implemented it. So I have modified the proposal and drafted it at User:Sbb1413/ENGVAR proposal. It is not intended to be a separate policy. Rather, it is intended to be additions to and modifications of the existing policy at COM:CAT to accommodate local dialectal terms. The main changes in this proposal include the avoidance of ambiguous dialectal terms. Sbb1413 (he) (talkcontribsuploads) 14:02, 18 July 2024 (UTC)

@Sbb1413 Thank you for raising this for further discussion. This represents a potentially significant redirection in our category naming approach, and as such you are completely correct to frame it as a change/addition to current Commons category policies as opposed to a new stand-alone policy. I think this is a good approach, as it necessarily requires us to consider any impacts on the existing COM:CAT guidance and adjust it at the same time to rectify any inconsistencies that might be exposed were this merely put forward as an unrelated new policy. Unfortunately, I have some other priorities at the moment, but I want to give this due thought and provide comprehensive input, though it may be a week or so before I can do that properly. In the meantime, I would like to get clarity on a couple of items just to understand the starting point here as accurately as possible:
  1. Would the intent be to have an official list of approved language variations for specific topics with due process required for additions or changes, or is each topic to simply be up to the normal wrangling among users over which term is right in their given locale?
  2. Would the intent be to retroactively apply this policy to existing topics, or to stick with the policy of needing more than just langvar reasons to change an existing category?
  3. What ultimate authority, if any, would we be relying on to determine correct langvar?
I'm sure there will be more to follow. Josh (talk) 19:39, 18 July 2024 (UTC)
@Joshbaumgartner My answers:
  1. The intent is to have a consensus-based list of accepted language variations for certain topics. The list will be inserted at the top of a topic category. For small categories, additions and changes would be done boldly, while larger categories would require agreements with some other users.
  2. This policy will be retroactively applied to existing topics.
  3. The ultimate authority to determine the correct ENGVAR is of course consensus.
Sbb1413 (he) (talkcontribsuploads) 11:27, 19 July 2024 (UTC)
I have created a rough draft of a templated list of consensus-based English dialectal terms at User:Sbb1413/ENGVAR template. This can be a standalone template or a part of {{Topic by country}} and {{Country category}} templates. China and Russia are included for their own English terms for "astronaut". Sbb1413 (he) (talkcontribsuploads) 12:03, 19 July 2024 (UTC)
@Sbb1413 Thanks, those are more or less along the lines of what I would have guessed. I think the right way to do the {{Topic by country}}, etc. templates would be to build in a langvar switch of some sort, ideally without requiring manual activation, with the approved variations added into the data templates. However, I wouldn't really worry too much about exactly how to do the templates at this stage. I would simply remark that converting templates to support this will be its own effort for a team to develop and implement once the new policy is enacted. This discussion should focus on what the right policy is and getting the language (no pun) correct for COM:CAT. Templates and other tools will have to follow suit. Josh (talk) 12:43, 19 July 2024 (UTC)
One of the things that strikes me as I think about this direction is that all of the logic for saying that Australia categories should be given Australian English names instead of the universal English topic name, or Canada, the US, etc., is the same logic that would say France should use French or Mexico should have Mexican Spanish variations of a topic name. Obviously, we are starting this discussion limited to English variants, but realistically, I'm not sure that is anything more than a purely arbitrary line we are drawing. Just a thought. Josh (talk) 12:52, 19 July 2024 (UTC)
@Joshbaumgartner We can use the same English term for all countries if it is used by all the major dialects. If such term doesn't exist, some countries will use their own regional terms while others will use the "universal English topic name". Since our core naming policy is to use English in category names as much as possible, we obviously shouldn't extend this proposal to other languages. Although gallery names can use local languages, the dialects of other languages (except Portuguese) don't really differ by spelling or vocabulary. Sbb1413 (he) (talkcontribsuploads) 13:07, 19 July 2024 (UTC)
Well, our core policy is also to apply the Universality Principle, and this proposal is considering upending that. In fact, from a legal theory perspective, I would posit that the universality principle is the primary supporting principle of the English-only naming policy, or at least it is the key reflection of the policy's intent, i.e. if the UP goes (or is neutered), then what real basis is there for the English-only policy? I totally understand that we don't necessarily want to extend this proposal to other languages, as that would be a much bigger fish to fry and may bring in some vocal opposition, but I'm not sure if maintaining English-only is still tenable if the UP is eroded. I don't think this is a point for or against this effort, just a consideration of potential future ramifications. Also, this is a fundamental policy change being considered, so I would not consider anything obvious. As for variations in other languages, I know Spanish spoken in Mexico has plenty of variation vis-a-vis Spain or even other parts of Latin America. That is my only personal practical experience with a non-English language, so I'm 2-for-2 so far on language variations being a thing. However, even if a language is completely homogeneous across its usage, it doesn't change the point that localization is localization, whether we are talking variants or entirely different languages. Josh (talk) 13:32, 19 July 2024 (UTC)