I dropped the "title_1" and "title 2" queries from the previous corpus, since those are no longer being submitted to enwiki, and re-categorized the remaining queries primarily by language. (Some foreign-language queries were previously categorized as "music" or "movies" because they were titles of such works. For the purposes of language detection, I recategorized queries like "O Menino Quadradinho" from "movies" to "Portuguese": even though it is a movie title, it is made up of Portuguese words, in the same way that "The Matrix" is a movie title but also an English phrase.)
Below are the results tables for overall ("TOTAL") performance, and for English, Spanish, Chinese, Portuguese, Arabic and French (the languages with at least 10 manually identified instances in our queries).
The table below shows the number of actual items for each language, and the number identified (with 0.95 probability, maximizing precision) for each language. Note that not all identifications are correct, but it's clear that we're missing almost 400 English identifications, and we have 41 Romanian identifications that are incorrect.
One option would be to determine per-language thresholds. So, English, Chinese and Arabic might have thresholds of 0.0 (accept all of them!), while Romanian gets a threshold of 1.1 (accept none of them!). With more data for the less well-represented languages, which is most of them, we should be able to boost both recall and precision. We'd also have to determine a method for dealing with cases where, for example, Romanian scores 0.714... and English scores 0.285..., of which there are plenty. The simplest approach might be to take the first acceptable candidate (i.e., skip over Romanian, then take English). This approach is essentially doing very simple machine learning on the output of the Elasticsearch language detection plugin, but it could improve performance.
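The "first acceptable candidate" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_language`, the ISO-style language codes, the `(language, probability)` input format, and the 0.95 fallback threshold for languages without their own entry are all hypothetical and not taken from the plugin itself. The threshold values mirror the examples given above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-language thresholds, mirroring the examples in the text:
# 0.0 accepts every detection, 1.1 can never be reached and rejects all.
THRESHOLDS: Dict[str, float] = {"en": 0.0, "zh": 0.0, "ar": 0.0, "ro": 1.1}

# Assumed fallback for languages without their own threshold.
DEFAULT_THRESHOLD = 0.95


def pick_language(
    candidates: List[Tuple[str, float]],
    thresholds: Dict[str, float] = THRESHOLDS,
    default: float = DEFAULT_THRESHOLD,
) -> Optional[str]:
    """Return the highest-scoring language that meets its own threshold.

    `candidates` is a list of (language_code, probability) pairs, assumed
    to be what the detector reports for a single query.
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    for lang, prob in ranked:
        if prob >= thresholds.get(lang, default):
            return lang
    return None  # no candidate was acceptable


# Romanian is skipped because its threshold is unreachable; English is taken.
print(pick_language([("ro", 0.714), ("en", 0.285)]))
```

Keeping the thresholds in a plain dictionary makes it easy to tune them per language as more labeled queries become available.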
Scalable images respect the user's base preference, which may have been selected for that user's particular devices. Module:InfoboxImage, which is used in this infobox, accommodates the use of scaling. However, setting the appropriate scale is slightly more complex than setting a raw "px" value. This guide provides a quick conversion table to make the process of setting a scale easier. It is based on a default thumbnail setting of 220px.
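As a rough illustration of the conversion, the sketch below derives a scale factor from a desired pixel width using the 220px default mentioned above. The function names are hypothetical, and the rounding of the rendered width to the nearest 10 pixels is an assumption about how MediaWiki applies the factor, so treat the output as an estimate rather than a guarantee.

```python
import math

DEFAULT_THUMB_PX = 220  # default thumbnail width stated in the text


def scale_for(px: int, base: int = DEFAULT_THUMB_PX) -> float:
    """Scale factor that approximates `px` for a user on the default size."""
    return round(px / base, 2)


def rendered_width(scale: float, base: int = DEFAULT_THUMB_PX) -> int:
    """Estimated rendered width for a user whose base preference is `base`.

    Assumption: the scaled width is rounded to the nearest 10 pixels.
    """
    return math.floor(base * scale / 10 + 0.5) * 10


scale = scale_for(330)
print(scale, rendered_width(scale), rendered_width(scale, base=300))
```

The second call to `rendered_width` shows why scaling is preferred over a raw "px" value: the same factor yields a proportionally larger image for a user who has chosen a larger base size.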
The title of the source material and the name(s) of the source material writer(s). Use this field in conjunction with screenplay and story where applicable (i.e. "Screen story") if movies are based on previously produced or published material, such as books, plays, articles, old screenplays, etc.