LinguaLibre talk

Difference between revisions of "Stats"

 
Latest revision as of 08:18, 23 June 2021

Hi, would it be possible to change the labels "new speakers" and "new languages" to "speakers of the month" and "languages of the month"? As it stands, it is a bit confusing. Best --Adélaïde Calais WMFr (talk) 10:18, 31 March 2020 (UTC)

It has been changed to "Active speakers" and "Active languages". Each row clearly represents one month. Should be a good balance. Yug (talk) 21:18, 17 February 2021 (UTC)

LinguaLibre vs Commons

@Adélaïde Calais WMFr Examining the early days of both is very interesting. See stats.wikimedia.org/#/commons.wikimedia.org. Yug (talk) 15:04, 3 March 2021 (UTC)

Loading for ever

VIGNERON, the queries no longer seem to succeed (they load indefinitely). Do you have any idea what is happening (I remember it was working yesterday, or two days ago at most)? Pamputt (talk) 08:26, 28 April 2021 (UTC)

@Pamputt Strange. Indeed, it worked yesterday (though I noticed some widely varying lag) and today it still works for me, but it takes very long (I didn't measure, but probably around 2-3 minutes). @Seb35 any idea where this comes from? I don't see any link with the translate tag; more likely it is QueryViz, Blazegraph, or some caching issue. If it is a load issue, maybe we could split this page into several subpages. Cheers, VIGNERON (talk) 13:15, 28 April 2021 (UTC)
VIGNERON Pamputt: I measure ~45 seconds for the 4 queries, the first appearing after about 30 seconds. I don’t think there is a real issue: these queries are a bit heavy, and when you load the page, you are sending 4 queries to Blazegraph at the same time. If necessary, the cache time (currently 10 min) can be increased to a "long" value like 1 or 2 days, so that the page is (almost) always displayed quickly, but its data will always be slightly out of date. Seb35 (talk) 14:47, 30 April 2021 (UTC)
@VIGNERON & Seb35 could you check again? I have now waited 1h30 and only "The most prolific speakers" appeared. I do not know how long I need to wait for the other queries to be displayed. I think that more than 30 seconds is too long for most users; they will just think that it does not work. If we do not have a better solution, I should display a banner indicating that the results take a very long time to be displayed. Pamputt (talk) 20:53, 5 June 2021 (UTC)
Hmm, very weird. I tested again and now all the queries display after 15 seconds. Any explanation? Pamputt (talk) 13:48, 6 June 2021 (UTC)
@Pamputt, VIGNERON, & Seb35 From what I understood, the queries need to load, which takes from 1 minute to ?? minutes, but once they're loaded, if you (or someone else) load the page within the next 10 minutes, they will be displayed almost instantly thanks to the cache. What I don't understand is that, once they're loaded, if you purge the page's cache, they're still displayed instantly (so there may be another cache involved).
Would it be possible to keep this cache permanently (not just for 10 minutes), so that when anyone arrives on the page, they don't face a loading page (which looks broken if you are not aware of the delay)? The values would be updated by users who want the latest version (i.e. at least once a day) by clicking on a dedicated button (and the loading time would only be incurred at that moment, and accepted by that user). Would that be possible?
All the best — WikiLucas (🖋️) 08:06, 7 June 2021 (UTC)
@WikiLucas00 The core issue on this page is that the queries are inherently slow (seemingly 1-2 minutes each, and running the five queries at the same time makes the problem worse). Indeed, there is a 10-minute cache implemented in nginx. Currently the cache is global for all SPARQL requests (through QueryViz, through the Blazegraph interface, and through the future SPARQL federation). I just pushed the updated nginx configuration for the frontend and Blazegraph, see GitHub (https://github.com/lingua-libre/operations/tree/master/nginx).
If I’m not mistaken, the cache can be purged with a hard refresh of the browser (Ctrl+F5).
(Correction: I just tried and this does not work. My Firefox does not send a "Cache-Control: no-cache" header because this is an XHR request and not the main HTTP request, and explicitly sending this Cache-Control header no longer works, probably because nginx bypasses the cache only if proxy_cache_bypass is set to something.)
I don't think it is desirable for SPARQL requests made through the Blazegraph interface to have a long cache, unlike QueryViz requests. However, I can separate these two paths by specifying a different $wgQueryVizEndpoint in LocalSettings.php.
I can propose the following:
  • remove the cache on the Blazegraph interface and the future SPARQL federation,
  • put a long cache on QueryViz requests (long = 12h ?),
  • set "proxy_cache_background_update on" and "proxy_cache_use_stale timeout updating" in nginx so that a possibly stale response is served while the result of the request is being refreshed; the results then stay reasonably fresh, as you propose, as long as the page is visited,
  • possibly add a cron job to refresh the results of the 5 SPARQL requests on this Statistics page every 12h.
It should work in theory, but I have no experience so far with these "stale-updating-requests".
In the long term, QueryViz should be improved to display an error when it receives a 504 response (or another 5xx error), and ideally to allow caching to be controlled through parameters in the wikicode. This needs careful design, because it could require cooperation with nginx and/or cron (the latter to keep the results fresh); alternatively, the nginx cache could be replaced by a MediaWiki cache. All of that needs to be discussed in a Phabricator ticket if desired. I should point out that QueryViz changes are not included in the contract we have with WMFR.
Seb35 (talk) 11:38, 14 June 2021 (UTC)
Let's go this way. Also, what about increasing proxy_read_timeout in nginx so that we can avoid some 504s? As for QueryViz, volunteer developers like myself will take care of that. --Poslovitch (talk) 08:18, 23 June 2021 (UTC)
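For reference, the proposal above could look roughly like the following nginx fragment. This is only a sketch under stated assumptions, not the deployed configuration: the cache zone name, cache path, location paths, upstream address and the 300s timeout are all hypothetical, and only the directive names come from the discussion and the nginx proxy module documentation.

```nginx
# Sketch only, to be placed inside the http { } block.
# The zone name, cache path, location paths and backend address are
# hypothetical; they are not taken from the real Lingua Libre setup.

# "inactive" must be longer than the 12h validity: entries not requested
# within this period are evicted, and then no stale copy is left to serve.
proxy_cache_path /var/cache/nginx/sparql keys_zone=sparql_cache:10m
                 inactive=2d max_size=100m;

upstream blazegraph_backend {
    server 127.0.0.1:9999;  # assumed Blazegraph address
}

server {
    listen 80;

    # Endpoint used only by QueryViz (selected via $wgQueryVizEndpoint).
    location /queryviz-sparql {
        proxy_pass http://blazegraph_backend/bigdata/sparql;  # assumed path

        proxy_cache sparql_cache;
        proxy_cache_valid 200 12h;               # long cache for QueryViz
        proxy_cache_use_stale timeout updating;  # serve stale while refreshing
        proxy_cache_background_update on;        # refresh in a subrequest
        proxy_cache_lock on;                     # one backend query per cache miss

        proxy_read_timeout 300s;                 # default is 60s; fewer 504s
    }

    # Blazegraph interface and future federation: no cache.
    location /sparql {
        proxy_pass http://blazegraph_backend/bigdata/sparql;  # assumed path
        proxy_cache off;
        proxy_read_timeout 300s;
    }
}
```

Note that the background update is only triggered when a request arrives for an expired entry, which is why the cron job mentioned above would still be useful on days when nobody visits the page.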