Discussion:
[Design] Winter
Amir E. Aharoni
2015-07-20 20:49:36 UTC
Does anybody remember Winter? - http://unicorn.wmflabs.org/winter/

Brandon's designs made a lot of sense and looked like a much-needed
refresh of what should be MediaWiki's default skin, but now, a few
months after he left, the project appears to be in limbo.

Is there any intention to follow up on that or to start new work in that
area?

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬
S Page
2015-07-20 21:26:50 UTC
Post by Amir E. Aharoni
Does anybody remember Winter? - http://unicorn.wmflabs.org/winter/
Brandon's designs made a lot of sense and looked like a much-needed
refreshment of what should be MediaWiki's default skin, but now a few
months after he left the project appears to be in limbo.

The Compact personal bar and Fixed header Beta features implement some
ideas from Winter. Both are available on the beta cluster. I know of
bug(s) with CPB that block further deployment.

I think any official work on this would fall under the Reading team; I
don't see it on https://m.mediawiki.org/wiki/Reading/Strategy_and_Roadmap

I agree there's good stuff in Winter.
Legoktm
2015-07-21 05:19:37 UTC
Post by S Page
The Compact personal bar and Fixed header Beta features implement some
ideas from Winter. Both are available on the beta cluster. I know bug(s)
with CPB block further deployment
It was undeployed yesterday[1][2].

[1] https://gerrit.wikimedia.org/r/#/c/225668/
[2] https://phabricator.wikimedia.org/T87489

-- Legoktm
MZMcBride
2015-07-21 05:30:44 UTC
Post by Amir E. Aharoni
Does anybody remember Winter? - http://unicorn.wmflabs.org/winter/
Hi.

Yes, I remember Winter. It was a nice prototype.

Vector needs love, for sure. But my impression is that the Wikimedia
Foundation design team has neither the focus nor the commitment to provide
this support. This means you'll need volunteers, particularly ones capable
of working with the Wikimedia community to modernize the interface without
annoying or angering users (readers and editors alike) in the process.

My personal view is that a gradual approach is preferable to a large and
sudden redesign. There's now an experimental responsive mode in the Vector
skin. I'm very cautiously optimistic about its path forward.

MZMcBride
Trevor Parscal
2015-07-21 06:33:18 UTC
I also believe that iterating on Vector is highly preferable to introducing
a new skin. That said, to try and be as fair as I can to Brandon, he
publicly declared last year at Wikimania London, "Winter is not a skin".

While I didn't understand his explanation of what it was, observationally
it appeared to be the user interface equivalent of a futurist predicting a
few years into the future in reasonable detail. Some of it will end up
being true, some will not. My understanding is that
the Winter implementation was of the semi-functional prototype variety
and little or none of the design work was based on usability research of
Vector, the status quo.

In contrast, Vector really is a skin that was implemented specifically for
production use and is now a battle tested platform from which to build
upon. Also, the UX improvements that were made over Monobook, the status
quo at the time, were based on usability research. This is a practice we
should continue with for future changes.

I know that it's sometimes exciting to people to make dramatic reveals of
proposals for sweeping changes. It's also fun to get excited about
them. However, this grand-unveiling boil-the-ocean approach never works out
in practice. It unnecessarily strains design, development and community
engagement efforts. It is wasteful and reckless. It is arrogant and
ignorant. It's not who we are, and it's not how we do things.

Even Vector was based heavily on Monobook, and in every way in which early
versions of Vector deviated from Monobook, without just cause, it was
"fixed" to be more similar. This was not wrong. Making arbitrary changes
was wrong. Starting from scratch is even worse.

We should carefully continue along the path of iterating on Vector. We
should gradually converge its styling and implementation with that of OOjs
UI. We should continue improving usability and accessibility on a variety
of form factors. We should perform research and base changes on the
findings it produces. This will enable us to move forward with minimal
cost, and far less drama.

- Trevor
Post by MZMcBride
Post by Amir E. Aharoni
Does anybody remember Winter? - http://unicorn.wmflabs.org/winter/
Hi.
Yes, I remember Winter. It was a nice prototype.
Vector needs love, for sure. But my impression is that the Wikimedia
Foundation design team has neither the focus nor the commitment to provide
this support. This means you'll need volunteers, particularly ones capable
of working with the Wikimedia community to modernize the interface without
annoying or angering users (readers and editors alike) in the process.
My personal view is that a gradual approach is preferable to a large and
sudden redesign. There's now an experimental responsive mode in the Vector
skin. I'm very cautiously optimistic about its path forward.
MZMcBride
Ryan Lane
2015-07-21 17:52:03 UTC
Post by Trevor Parscal
I also believe that iterating on Vector is highly preferable to
introducing a new skin. That said, to try and be as fair as I can to
Brandon, he publicly declared last year at Wikimania London, "Winter is not
a skin".
Post by Trevor Parscal
While I didn't understand his explanation of what it was, observationally
it appeared to be the user interface equivalent of a futurist predicting a
few years into the future in reasonable detail. Some of it will end up being
true, some will not. My understanding is that the Winter implementation was
of the semi-functional prototype variety and little or none of the design
work was based on usability research of Vector, the status quo.
Isn't that research 7 years old? I remember the usability project, and for
all intents and purposes it was mostly a failure. We didn't meet most of the
terms of the grants and very little came out of it, other than Vector, which
is a modest change from Monobook.
Post by Trevor Parscal
In contrast, Vector really is a skin that was implemented specifically for
production use and is now a battle tested platform from which to build upon.
Also, the UX improvements that were made over Monobook, the status quo at
the time, were based on usability research. This is a practice we should
continue with for future changes.
"battle tested" == outdated and relatively unchanged in nearly 7 years. The
web evolves and Wikimedia does not (at least for readers).
Post by Trevor Parscal
I know that it's sometimes exciting to people to make dramatic reveals of
proposals for sweeping changes. It's also fun to get excited about
them. However, this grand-unveiling boil-the-ocean approach never works out
in practice. It unnecessarily strains design, development and community
engagement efforts. It is wasteful and reckless. It is arrogant and
ignorant. It's not who we are, and it's not how we do things.
This actually works amazingly well in practice for most organizations. Maybe
Wikimedia /should/ be like this and maybe it /should/ be how Wikimedia does
things. Isn't a motto of the movement "Be bold"? What happened to that?
Maybe we should change it to "Be careful; it's scary to change".
Post by Trevor Parscal
Even Vector was based heavily on Monobook, and in every way in which early
versions of Vector deviated from Monobook, without just cause, it was
"fixed" to be more similar. This was not wrong. Making arbitrary changes was
wrong. Starting from scratch is even worse.
Only because the community is scared of change. Every community is, though.
People got used to Vector and they'd get used to Winter after a month or
two. This happens frequently to other major sites. The thing you need to
keep in mind is that you need to actually hold strong for a few months until
people get used to things, while fixing legitimate bugs.
Post by Trevor Parscal
We should carefully continue along the path of iterating on Vector. We
should gradually converge it's styling and implementation with that of OOjs
UI. We should continue improving usability and accessibility on a variety of
form factors. We should perform research and base changes on the findings it
produces. This will enable us to move forward with minimal cost, and far
less drama.
It's sad that Wikimedia has given up on users.

- Ryan Lane
Federico Leva (Nemo)
2015-07-21 18:26:49 UTC
Post by Ryan Lane
Isn't a motto of the movement "Be bold"?
No. That's a motto for the wikis, because on wikis it's easy to reverse
every edit.

Nemo
Isarra Yos
2015-07-21 18:52:16 UTC
I find this conversation worrying.
Post by Ryan Lane
I also believe that iterating on Vector is highly preferable to introducing
a new skin.
Ideally, each new skin that is introduced is an iteration on the
previous. What worked well is maintained and built upon, what didn't is
changed. We don't ever want to simply throw out what we have.

But we also need to find a point to break off and actually make it into
a new one. Keep going in the direction of Winter (or whatever), and
there comes a point when it simply is not Vector anymore - and that's
fine, but there's no reason to take the Vector that was away from those
who legitimately liked it, either. We allow users (including third-party
users) their preferences, and there is historical value, too, in keeping
the older styles around in some form.
Post by Ryan Lane
In contrast, Vector really is a skin that was implemented specifically for
production use and is now a battle tested platform from which to build upon.
Also, the UX improvements that were made over Monobook, the status quo at
the time, were based on usability research. This is a practice we should
continue with for future changes.
"battle tested" == outdated and relatively unchanged in nearly 7 years. The
web evolves and Wikimedia does not (at least for readers).
Aye, we do need to move on. But there are also lessons in what has
lingered all this time - we need to look at it and understand why in
order to properly address it and serve the underlying needs. This is why
we iterate on what's there, and don't only make drastically new things.
Post by Ryan Lane
I know that it's sometimes exciting to people to make dramatic reveals of
proposals for sweeping changes. It's also fun to get excited about
them. However, this grand-unveiling boil-the-ocean approach never works out
in practice. It unnecessarily strains design, development and community
engagement efforts. It is wasteful and reckless. It is arrogant and
ignorant. It's not who we are, and it's not how we do things.
This actually works amazingly well in practice for most organizations. Maybe
Wikimedia /should/ be this and maybe it /should/ be how Wikimedia does
things.
We are not most organisations; where many answer to external
stakeholders, and the consumers are simply the product, that is not so
here. Wikimedia doesn't just answer to its communities, it IS the
communities - all of them, the various projects, the WMF, GLAM, even
dark corners of Commons and random people doing meetups for editathons -
and its purpose is not profit, but education via a tenable, usable end
result of efforts from all of them.
Post by Ryan Lane
Isn't a motto of the movement "Be bold"? What happened to that?
Maybe we should change things to "Be careful; it's scary to change".
Neither of these works without the other. Being bold, you must be
careful, or it will blow up in your face. Being careful gets you nowhere
without also being bold.
Post by Ryan Lane
Even Vector was based heavily on Monobook, and in every way in which early
versions of Vector deviated from Monobook, without just cause, it was
"fixed" to be more similar. This was not wrong. Making arbitrary changes was
wrong. Starting from scratch is even worse.
Only because the community is scared of change. Every community is, though.
People got used to Vector and they'd get used to Winter after a month or
two. This happens frequently to other major sites. The thing you need to
keep in mind is that you need to actually hold strong for a few months until
people get used to things, while fixing legitimate bugs.
"The community is scared of change" seems to be a common excuse from
those too scared to work with communities outside of their own.

And many communities do propose change - some changes are good, some not
so good, some need more resources to ever actually work. Just shoving
things down people's throats, however, does not work. Consider the
multimedia viewer, which needed an overhaul for copyrights alone and is
still problematic to date. Consider visual editor when it was first
released; even now, when it is so much more powerful, it isn't even
available by default on many major wikis. Consider the typography
refresh, which has been reverted piecemeal over the course of months.
Then look at extensions like massmessage, abusefilter,
timedmediahandler, apisandbox, globalcssjs, and others which considered
the use cases and worked with the end users to make a sensible product
with little reason to reject it. These may be smaller changes, or less
reader-facing, but the way they were developed, never even mind how they
were introduced, is particularly important. People were involved,
problems were considered.

If you want to know what the "community" is afraid of, it's not change.
It's things being developed entirely without them even in mind, getting
shoved at them forcefully, and breaking what workflows they have. Unlike
for some organisations, these are not simply users we profit off of
while they amuse themselves, but volunteers donating their time, effort,
and content, and they are the ones you should be concerning yourself
with always. Not the readers, them.

We make the content work for the readers so that the volunteers' efforts
are not in vain.
Post by Ryan Lane
We should carefully continue along the path of iterating on Vector. We should
gradually converge it's styling and implementation with that of OOjs
UI. We should continue improving usability and accessibility on a variety of
form factors. We should perform research and base changes on the findings it
produces. This will enable us to move forward with minimal cost, and far
less drama.
It's sad that Wikimedia has given up on users.
Who has given up? The fact that we are even having this conversation
seems pretty clear evidence that we haven't just yet.
Ryan Lane
2015-07-21 21:38:18 UTC
Post by Isarra Yos
Aye, we do need to move on. But there are also lessons in what has
lingered all this time - we need to look at it and understand why in
order to properly address it and serve the underlying needs. This is why
we iterate on what's there, and don't only make drastically new things.
Do we actually know the lessons? Are they listed anywhere? Are they valid
anymore? Do modern web practices cover them?

It's great to iterate on things when they are relatively modern. It's folly
to do so when you're almost a decade behind the industry standard. The
argument itself is odd because Vector has not been iterating steadily
towards modern practices. It's been stagnant for years.
Post by Isarra Yos
We are not most organisations; where many answer to external
stakeholders, and the consumers are simply the product, that is not so
here. Wikimedia doesn't just answer to its communities, it IS the
communities - all of them, the various projects, the WMF, GLAM, even
dark corners of Commons and random people doing meetups for editathons -
and its purpose is not profit, but education via a tenable, usable end
result of efforts from all of them.
The community you're talking about is the editor community, which is a tiny
fraction of the overall community, but attempts to speak with authority over
the entirety of it. The vocal portion of the editor community that speaks
with this authority is even a minor fraction of the editor community. We're
talking about .001% of the entire community that holds the entire movement
hostage (5,167 people voted in the last election, and there are 430 million
monthly active readers).

The reader community is massive and has no voice, except their complaints
across the internet. The WMF can and should be the voice for the reader
community.
Post by Isarra Yos
Post by Ryan Lane
Isn't a motto of the movement "Be bold"? What happened to that?
Maybe we should change things to "Be careful; it's scary to change".
Neither of these work without the other. Being bold, you must be
careful, or it will blow up in your face. Being careful gets you nowhere
without also being bold.
The status quo is that change never happens because people are too scared to
change. There's no boldness here. There's hardly even basic assertiveness.
Post by Isarra Yos
"The community is scared of change" seems to be a common excuse from
those too scared to work with communities outside of their own.
Or an argument of those who think it's not in the readers' best interest to
have editors with little to no knowledge of software engineering or UX
design dictating the engineering and design of reader features.
Post by Isarra Yos
And many communities do propose change - some changes are good, some not
so good, some need more resources to ever actually work. Just shoving
things down people's throats, however, does not work. Consider the
multimedia viewer, which needed an overhaul for copyrights alone and is
still problematic to date. Consider visual editor when it was first
released; even now, when it is so much more powerful, it isn't even
available by default on many major wikis. Consider the typography
refresh, which has been piecemeally reverted over the course of months.
Then look at extensions like massmessage, abusefilter,
timedmediahandler, apisandbox, globalcssjs, and others which considered
the use cases and worked with the end users to make a sensible product
with little reason to reject it. These may be smaller changes, or less
reader-facing, but the way they were developed, never even mind how they
were introduced, is particularly important. People were involved,
problems were considered.
If you want to know what the "community" is afraid of, it's not change.
It's things being developed entirely without them even in mind, getting
shoved at them forcefully, and breaking what workflows they have. Unlike
for some organisations, these are not simply users we profit off of
while they amuse themselves, but volunteers donating their time, effort,
and content, and they are the ones you should be concerning yourself
with always. Not the readers, them.
We make the content work for the readers so that the volunteers' efforts
are not in vain.
I've also volunteered my time for the past 10 years, but as an engineer. I
care about Wikimedia more as a reader than as an editor; my experience as a
reader is not great, and the editor community is the primary reason for
this. The WMF's hesitation to make change is heavily based on the pitchforks
and torches lit by this community.
Post by Isarra Yos
Post by Ryan Lane
We should carefully continue along the path of iterating on Vector. We should
gradually converge it's styling and implementation with that of OOjs
UI. We should continue improving usability and accessibility on a variety of
form factors. We should perform research and base changes on the findings it
produces. This will enable us to move forward with minimal cost, and far
less drama.
It's sad that Wikimedia has given up on users.
Who has given up? The fact that we are even having this conversation
seems pretty clear evidence that we haven't just yet.
There's not really a conversation. The UX lead is saying "Winter is dead,
let's continue with the iterations on Vector", though there's no real
iteration going on. The editor community is opposed to any change that
doesn't completely agree with them, where the "them" is around 5,000 people
who also can't agree with each other and aren't qualified to be making the
decisions to begin with.

- Ryan Lane
Legoktm
2015-07-21 22:23:33 UTC
Post by Ryan Lane
There's not really a conversation. The UX lead is saying "Winter is dead,
let's continue with the iterations on Vector", though there's no real
iteration going on.
I'd consider <https://gerrit.wikimedia.org/r/#/c/220667/> to be a good
start of iterating on Vector.

-- Legoktm
Jon Robson
2015-07-21 23:00:31 UTC
My views are most closely aligned with Ryan's, to be honest. Historically
I've lost third-party MediaWiki users because of how it looks, and the
choice out there isn't great. I have yet to meet someone outside our
community who likes how Wikipedia looks; that's always the first thing they
complain about. I fear we suffer from such Stockholm syndrome working in
our codebase that we forget about those voices that don't get heard. We are
the .001%!

Whilst I'm glad to see the patch Lego pointed to merged, I would wager
money that $wgVectorResponsive, when set to true, would cause a lot of
backlash (some people just don't like responsive sites [1]), and I
predict it will need to become a separate skin called VectorResponsive
to keep 'everyone happy'.

I think it's okay to iterate, but from my many experiences in the
MediaWiki skin world, you have to leave the status quo as an option
and make the new skin experience opt-in. Even then it's hard to get
things out of opt-in mode - the personal compact toolbar was well received
for the most part but a complete hack in implementation, yet I saw no
progress in consolidating it into our experience.

Vector is not evolving, otherwise it would have happened already. The
only changes to it in the past 3 years have been badly received
typography changes and minor tweaks.

Traditionally, more skins have created more headaches, but maybe it's
time to rethink this infrastructure [2] and encourage a more abundant
selection of skins on our wikis. From my perspective the lack of
competition in the Wikipedia skin world is preventing innovation. FWIW
I'd love to have a go at making a new skin based on Winter's ideas in
my spare time with a fixed header, but given that I have no confidence
it will ever get on the cluster I have no motivation to do this. Where
is Apex deployed for example [3]? Why can't I try this out on
Wikipedia and see if I prefer the experience?

The closest things I see to MediaWiki are Wikia wikis and WordPress,
and both of those seem to have a much more active and healthy skin
ecosystem. Is this something we want to recreate, or are we saying that
Vector is the only skin MediaWiki will ever need? If that's the case,
I'm troubled.

In MobileFrontend the Minerva skin was created, and I would estimate it is
the most actively developed skin at the moment. We make decisions
that people don't like, to keep the interface as simple and
uncluttered as we possibly can, as that's what it's designed for.
People can choose Vector if they prefer that experience on mobile, and
I truly hope they'll be able to try a responsive version of Vector
too. I'm aware some people hate it but at least it's trying to create
a drastically different Wikipedia site experience and I'd like to see
more skins like this. Choice is an important aspect of any open source
project.

[1] https://www.google.com/search?q=i+hate+responsive+sites&oq=i+hate+responsive+sites&aqs=chrome..69i57j69i60j69i64l3j69i60.2814j0j4&sourceid=chrome&es_sm=91&ie=UTF-8#q=i+hate+responsive+sites&start=10
[2] https://www.mediawiki.org/wiki/Requests_for_comment/Redo_skin_framework
[3] https://www.mediawiki.org/wiki/Skin:Apex
Post by Legoktm
Post by Ryan Lane
There's not really a conversation. The UX lead is saying "Winter is dead,
let's continue with the iterations on Vector", though there's no real
iteration going on.
I'd consider <https://gerrit.wikimedia.org/r/#/c/220667/> to be a good
start of iterating on Vector.
-- Legoktm
bawolff
2015-07-22 10:35:05 UTC
Post by Jon Robson
Traditionally, more skins has created more headaches, but maybe it's
time to rethink this infrastructure [2] and encourage a more abundant
selection of skins on our wikis. From my perspective the lack of
competition in the Wikipedia skin world is preventing innovation. FWIW
I'd love to have a go at making a new skin based on Winter's ideas in
my spare time with a fixed header, but given that I have no confidence
it will ever get on the cluster I have no motivation to do this. Where
is Apex deployed for example [3]? Why can't I try this out on
Wikipedia and see if I prefer the experience?
I've heard complaints from some skin writers that the lack of
stability in MediaWiki's skin system is a major annoyance for them. I
don't know how representative that view is, but redesigning the skin
system every 6 months is probably not a great way to get more skins
made.
Post by Jon Robson
The closest thing I see to MediaWiki are Wikia wiki's and Wordpress
and both of those seem to have a much more active and healthy skin
ecosystem. Is this something we want to recreate or are we saying that
Vector is the only skin MediaWiki will ever need? If that's the case,
I'm troubled.
Last I checked, Wikia was running MediaWiki, and their code was open
source (albeit with weird dependencies).

It's unsurprising that WordPress is beating us in skin diversity, given
the use case and how the install base of WordPress is distributed.

Our skin ecosystem could probably be better, but I'm unconvinced by
this comparison. I don't think it's as horrible as you make it out to be,
though. I've seen plenty of wikis use skins that are not
Monobook/Vector.
Post by Jon Robson
Choice is an important aspect of any open source
project.
As a general statement, that's debatable. There are plenty of open
source projects that specifically try to reduce choice in order to be
minimal, or to meet other requirements. As far as MediaWiki goes, I'd
agree that skin choice is an important goal. It's not entirely clear
that that is an important goal for Wikimedia, though.

--
bawolff
Jonathan Morgan
2015-07-23 20:31:39 UTC
Post by Jon Robson
My views are most closely aligned with Ryan to be honest and
historically I've lost 3rd party users to mediawiki instances because
of how it looks, and the choice isn't great out there. I'm yet to meet
someone outside our community who likes how Wikipedia looks, that's
always the first thing they complain about. I fear we suffer from
Stockholm syndrome working in our codebase that we forget about those
voices that don't get heard. We are the .001%!
If the problem is that important voices (readers') are not being heard, the
solution is to ask them, not push for global deployment of a completely new
and basically untested UI concept. Readers are no less opinionated than
editors, and their wants and needs are no less important or heterogeneous.
The fact that Winter looks more in line with someone's (Ryan's?) idea of
"the industry standard" in 2015 than Vector does not mean it provides a
better experience for anyone.

You can't just assert that Winter's an improvement; you have to test.
Winter was designed based on a certain set of assumptions about what people
want out of their Wikipedia reading/editing experience. Even if you
believe, as I do, that many of these are good/clever/inspired assumptions,
Winter (or new features introduced by Winter) still needs to be tested
before it is deployed as the default option on Wikimedia wikis. Vector
was also designed based on assumptions... but it also had the benefit of a
whole lot of user testing and community consultation.
Post by Jon Robson
I think it's okay to iterate, but from my many experiences in the
mediawiki skin world, you have to leave the status quo as an option
and make the new skin experience opt in. Even then it's hard to get
things out of opt in mode - personal compact toolbar was well received
on the most part but a complete hack in implementation yet I saw no
progress in consolidating it into our experience.
The fact that iterating takes time, and that it's hard to get existing
users to adopt new software, is not a valid argument for making sudden,
sweeping changes to the desktop Wikipedia interface. Iterating takes time
because when it's done well (read: when you're actually iterating, rather
than making ad hoc changes), the software is being improved *for the people
it's designed for* and *for the things it's designed to do*. If you think
it's going to be hard to drive adoption of incremental UI improvements, try
getting buy-in on a whole slew of them introduced all at once, without a
solid rationale or empirical evidence to back up your decision.
Post by Jon Robson
Vector is not evolving, otherwise it would have happened already. The
only changes to it in the past 3 years have been badly received
typography changes and minor tweaks.
This sounds like a problem with process, not a problem with Vector.
Switching to Winter won't fix it. If we somehow managed to introduce Winter
tomorrow, how would we ensure that it continued to evolve?
Post by Jon Robson
Traditionally, more skins has created more headaches, but maybe it's
time to rethink this infrastructure [2] and encourage a more abundant
selection of skins on our wikis. From my perspective the lack of
competition in the Wikipedia skin world is preventing innovation. FWIW
I'd love to have a go at making a new skin based on Winter's ideas in
my spare time with a fixed header, but given that I have no confidence
it will ever get on the cluster I have no motivation to do this. Where
is Apex deployed for example [3]? Why can't I try this out on
Wikipedia and see if I prefer the experience?
This seems to be the heart of the problem (at least, the problem for WMF as
a software company). We need to make it easier to test and then incorporate
test results (including direct user feedback) into products. Again, this is
a process/infrastructure issue, not a problem with our current UI. Tests
can be standard usability studies; single-user opt-in deployments (like
beta features); time-limited pilots for a single wiki, namespace, or page;
or controlled A/B tests with random sampling of a class of users. None of
that has anything to do with whether Winter is better, or worse, than
Vector.

I like Winter. I'd like to see us move in that direction. But what I really
want to do is test whether Winter works for the people it's supposed to:
readers and editors. Because not everyone likes what I like, and not
everyone interacts with Wikipedia/MediaWiki the way I do.

We're talking about Winter like it's one thing, but it's really a
collection of bold, interesting design ideas. I find many of these design
ideas compelling ('sticky' search/menu bar; responsive design), others less
so (hiding the ToC under a hamburger menu...ugh). It's not an all-or-nothing
proposition with Winter, or with Vector. We should be talking about how to
upgrade our testing infrastructure and our design process so that we can
incorporate the best parts of Winter into the default user experience of
MediaWiki. Then we can call it whatever we want.
Post by Jon Robson
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Jon Robson
2015-07-24 00:49:24 UTC
Post by Jonathan Morgan
Post by Jon Robson
My views are most closely aligned with Ryan to be honest and
historically I've lost 3rd party users to mediawiki instances because
of how it looks, and the choice isn't great out there. I'm yet to meet
someone outside our community who likes how Wikipedia looks, that's
always the first thing they complain about. I fear we suffer from
Stockholm syndrome working in our codebase that we forget about those
voices that don't get heard. We are the .001%!
If the problem is that important voices (readers') are not being heard, the
solution is to ask them, not push for global deployment of a completely new
Yes, agreed. The reading web team is actually thinking about ways we
can gather feedback from our reader audience to aid design.
Post by Jonathan Morgan
and basically untested UI concept. Readers are no less opinionated than
editors, and their wants and needs are no less important or heterogeneous.
Whether Winter looks more in line with someone (Ryan's?) idea of "the
industry standard" in 2015 than Vector doesn't mean it provides a better
experience for anyone.
Agreed. I do recognise, however, that Vector is not the best experience,
and I'm lamenting our conservativeness in the area of skins. I'm
personally frustrated that, despite recognising this, we seem to have
no understanding of how to go about making it better. I personally do not
feel empowered to try things and listen to what we have learnt
from experiments. The recent change legoktm points out slaps some
responsive styles on Vector. It's not clear how we are going to test
this and measure whether it is good or bad, and eventually make a
decision on whether we should do it or not (FWIW I think slapping on
media queries is not a recipe for success in making a mobile-device-friendly
experience, but I was happy to see someone try something and I'm happy
to be proved wrong).
Post by Jonathan Morgan
You can't just assert that Winter's an improvement; you have to test.
I'm personally not asserting anything but FWIW I recall user tests
were run on experiments such as the fixed header and showed that
people found located items better. We didn't see it through to
completion though (whoever was involved in that please let me know
what happened).
Post by Jonathan Morgan
Winter was designed based on a certain set of assumptions on what people want out
of their Wikipedia reading/editing experience. Even if you believe, as I do,
that many of these are good/clever/inspired assumptions, Winter (or new
features introduced by Winter) still needs to be tested before they are
deployed as the default option on Wikimedia wikis. Vector was also designed
based assumptions... but it also had the benefit of a whole lot of user
testing and community consultation.
Sure, and as I said above, a lot of them were - but I wasn't involved in that.
Post by Jonathan Morgan
Post by Jon Robson
I think it's okay to iterate, but from my many experiences in the
mediawiki skin world, you have to leave the status quo as an option
and make the new skin experience opt in. Even then it's hard to get
things out of opt in mode - personal compact toolbar was well received
on the most part but a complete hack in implementation yet I saw no
progress in consolidating it into our experience.
The fact that iterating takes time, and that it's hard to get existing users
to adopt new software, is not a valid argument for making sudden, sweeping
changes to the desktop Wikipedia interface.
I wasn't suggesting this, but as Ryan says, various big websites do
big redesigns and do just fine. These designs are not sweeping
changes: they have been iterated on and beta-tested on small
audiences over a period of time, and then suddenly unveiled in full
to the whole audience. So despite the backlash from some users that
big redesigns guarantee, in the long term these websites have made
informed decisions on how the site should look to improve the
usability and experience of their users.

Post by Jonathan Morgan
Iterating takes time because when it's done well (read: when you're
actually iterating, rather than making ad hoc changes),
Sure... but right now we don't even seem to be iterating, and that to me
is the problem. We've tried iterating in beta features, but those
initiatives (personal compact toolbar, typography refresh, multimedia
viewer) struggled for various reasons.
Post by Jonathan Morgan
the software is being improved for the people it's
designed for and for the things its designed to do. If you think it's going
to be hard to drive adoption of incremental UI improvements, try getting buy
in on a whole slew of them introduced all at once, without a solid rationale
or empirical evidence to back up your decision.
Post by Jon Robson
Vector is not evolving, otherwise it would have happened already. The
only changes to it in the past 3 years have been badly received
typography changes and minor tweaks.
This sounds like a problem with process, not a problem with Vector.
And this is the crux of the matter in my opinion and what I am asking.
How do people think we should improve this process? We do a lot of
lamenting and defending on this list but never seem to offer action
items... any bold offers about how we reverse this anti-pattern?
Post by Jonathan Morgan
Post by Jon Robson
Traditionally, more skins has created more headaches, but maybe it's
time to rethink this infrastructure [2] and encourage a more abundant
selection of skins on our wikis. From my perspective the lack of
competition in the Wikipedia skin world is preventing innovation. FWIW
I'd love to have a go at making a new skin based on Winter's ideas in
my spare time with a fixed header, but given that I have no confidence
it will ever get on the cluster I have no motivation to do this. Where
is Apex deployed for example [3]? Why can't I try this out on
Wikipedia and see if I prefer the experience?
This seems to be the heart of the problem (at least, the problem for WMF as
a software company). We need to make it easier to test and then incorporate
test results (including direct user feedback) into products. Again, this is
a process/infrastructure issue, not a problem with our current UI. Tests can
be standard usability studies; single-user opt-in deployments (like beta
features); time-limited pilots for a single wiki, namespace, or page; or
controlled A/B tests with random sampling of a class of users. None of that
has anything to do with whether Winter is better, or worse, than Vector.
I like Winter. I'd like to see us move in that direction. But what I really
want to do is test whether Winter works for the people it's supposed to:
readers and editors. Because not everyone likes what I like, and not
everyone interacts with Wikipedia/MediaWiki the way I do.
We're talking about Winter like it's one thing, but it's really a collection
of bold, interesting design ideas. I find many of these design ideas
compelling ('sticky' search/menu bar; responsive design), other less so
(hiding the ToC under a hamburger menu...ugh). It's not an all or nothing
proposition with Winter, or with Vector. We should be talking about how to
upgrade our testing infrastructure and our design process so that we can
incorporate the best parts of Winter into the default MediaWiki user
experience of MediaWiki. Then we can call it whatever we want.
Post by Jon Robson
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF)
Nihiltres
2015-07-24 08:17:49 UTC
Post by Jon Robson
Post by Jonathan Morgan
This sounds like a problem with process, not a problem with Vector.
And this is the crux of the matter in my opinion and what I am asking.
How do people think we should improve this process? We do a lot of
lamenting and defending on this list but never seem to offer action
items... any bold offers about how we reverse this anti-pattern?
We need the *process* to be more obvious, and the *principles* behind the changes agreed-upon. I'll elaborate, but first, some context…
Post by Jon Robson
What I'm saying is that there should be a process to make an interface
change directed at readers, with stated test results, A/B tested, and
adopted if testing meets the criteria of the test results. The editor
community should have little to no say in the process, except to suggest
experiments or question obviously incorrect test results.
The basic idea is that through proper testing of features you should be able
to know an experience is better for the readers without them having a direct
voice.
An example: Make search more discoverable. Add a feature or make an
interface change to test this. A/B test it. See if the frequency of search
usage increased. See if it adversely affected other metrics. If it helped
search usage and didn't negatively affect other metrics, adopt the change.
The issue is that there will be a vocal minority of people who absolutely
hate this change, no matter what it is. These people should be ignored.
The editor community should have little to no say in the process
or
Post by Jon Robson
a vocal minority
or, worst,
Post by Jon Robson
These people should be ignored.
A/B testing is one thing, but our problems are *social*, are *political*, and that's precisely what I see above. This is not a productive approach, because it pits stakeholders against one another. Wikipedia is not a *competition*, it's supposed to be a *collaboration*. It's even worse when it's framed in the otherwise reasonable context of A/B testing, because that conceals the part of it that has one particular subset of stakeholders decide what metrics (e.g. search) are important. While I do disagree, I don't mean to argue specifically against Ryan Lane's position here—I'm just using it as an example of positions that exacerbate the social problems. It doesn't matter in what ways he or I are right or wrong on the approach if it's going to lead to another conflict.

If we ignore people, or worse, specifically disenfranchise them, that's sure to lead to conflict when the interested stakeholders pursue their interests and thus become that "vocal minority". Rather, we need an obvious process, backed by principles that most of everyone can agree on, so that we don't hit catches like one-sided priorities. Yes, we do need to figure out how to make sure that reader interests are represented in those principles. If the shared process and shared principles lead us to something that some people don't agree with, *then* there might be a justification to tell that minority to stuff it in the name of progress.

I'll leave off there, because the next thing I intuitively want to go on to involves my personal views, and those aren't relevant to this point (they can wait for later). Instead, a question: what *principles* ought to underpin designs moving forward from Vector? If we can't work through disagreements there, we're going to see objections once an unbalanced set of principles is implemented in design patterns.


Nihiltres
Derk-Jan Hartman
2015-07-24 11:13:18 UTC
While I agree in principle with what Nihiltres states, it doesn't help us
very much. There is so much resentment built up on several sides
that I don't see how we are going to get past that.

Also, the scaling required to fulfill everyone's wishes using the stated
methodology is huge. Think in the order of putting at least 25 people on
requirements analysis, design and technology work for a year. My gut
feeling, based on years of dev and Wikimedia experience, tells me that this
would be a bigger project than VE. Which is of course insane for something
that is 'just' a skin, but it's the only way we can right this ship, unless
a lot of people learn something about the virtues of imperfection.

DJ
Post by Nihiltres
Post by Jon Robson
Post by Jonathan Morgan
This sounds like a problem with process, not a problem with Vector.
And this is the crux of the matter in my opinion and what I am asking.
How do people think we should improve this process? We do a lot of
lamenting and defending on this list but never seem to offer action
items... any bold offers about how we reverse this anti-pattern?
We need the *process* to be more obvious, and the *principles* behind the
changes agreed-upon. I'll elaborate, but first, some context

Post by Jon Robson
What I'm saying is that there should be a process to make an interface
change directed at readers, with stated test results, A/B tested, and
adopted if testing meets the criteria of the test results. The editor
community should have little to no say in the process, except to suggest
experiments or question obviously incorrect test results.
The basic idea is that through proper testing of features you should be
able to know an experience is better for the readers without them having
a direct voice.
An example: Make search more discoverable. Add a feature or make an
interface change to test this. A/B test it. See if the frequency of search
usage increased. See if it adversely affected other metrics. If it helped
search usage and didn't negatively affect other metrics, adopt the change.
The issue is that there will be a vocal minority of people who absolutely
hate this change, no matter what it is. These people should be ignored.
The editor community should have little to no say in the process
or
Post by Jon Robson
a vocal minority
or, worst,
Post by Jon Robson
These people should be ignored.
A/B testing is one thing, but our problems are *social*, are *political*,
and that's precisely what I see above. This is not a productive approach,
because it pits stakeholders against one another. Wikipedia is not a
*competition*, it's supposed to be a *collaboration*. It's even worse when
it's framed in the otherwise reasonable context of A/B testing, because
that conceals the part of it that has one particular subset of stakeholders
decide what metrics (e.g. search) are important. While I do disagree, I
don't mean to argue specifically against Ryan Lane's position here—I'm just
using it as an example of positions that exacerbate the social problems. It
doesn't matter in what ways he or I are right or wrong on the approach if
it's going to lead to another conflict.
If we ignore people, or worse, specifically disenfranchise them, that's
sure to lead to conflict when the interested stakeholders pursue their
interests and thus become that "vocal minority". Rather, we need an obvious
process, backed by principles that most of everyone can agree on, so that
we don't hit catches like one-sided priorities. Yes, we do need to figure
out how to make sure that reader interests are represented in those
principles. If the shared process and shared principles lead us to
something that some people don't agree with, *then* there might be a
justification to tell that minority to stuff it in the name of progress.
I'll leave off there, because the next thing I intuitively want to go onto
involve my personal views, and those aren't relevant to this point (they
can wait for later). Instead: a question: what *principles* ought to
underpin designs moving forward from Vector? If we can't work through
disagreements there, we're going to see objections once an unbalanced set
of principles are implemented in design patterns.
Nihiltres
Ryan Lane
2015-07-24 23:54:00 UTC
Post by Nihiltres
Post by Ryan Lane
An example: Make search more discoverable. Add a feature or make an
interface change to test this. A/B test it. See if the frequency of search
usage increased. See if it adversely affected other metrics. If it helped
search usage and didn't negatively affect other metrics, adopt the change.
The issue is that there will be a vocal minority of people who absolutely
hate this change, no matter what it is. These people should be ignored.
The editor community should have little to no say in the process
or
Post by Ryan Lane
a vocal minority
or, worst,
Post by Ryan Lane
These people should be ignored.
A/B testing is one thing, but our problems are *social*, are *political*,
and that's precisely what I see above. This is not a productive approach,
because it pits stakeholders against one another. Wikipedia is not a
*competition*, it's supposed to be a *collaboration*. It's even worse when
it's framed in the otherwise reasonable context of A/B testing, because
that conceals the part of it that has one particular subset of stakeholders
decide what metrics (e.g. search) are important. While I do disagree, I
don't mean to argue specifically against Ryan Lane's position here—I'm just
using it as an example of positions that exacerbate the social problems. It
doesn't matter in what ways he or I are right or wrong on the approach if
it's going to lead to another conflict.
The idea is to remove the social or political problems from the process.
Define the goals and feature sets (this is the part of the process that
requires community interaction), implement and test the changes, review the
results. The data is the voice of the community. It's what proves if an idea
is good or bad.

As I said before, though, there's always some vocal minority that will hate
change, even when it's presented with data proving it to be good. These
people should be ignored at this stage of the process. They can continue to
provide input to future changes, but the data should be authoritative.
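
To make the "implement and test the changes, review the results" step
concrete, here's a rough sketch of the kind of check I mean. It's
illustrative only - the numbers, metric names and thresholds are made up,
not an actual Wikimedia pipeline:

    from math import erf, sqrt

    def two_proportion_z(success_a, total_a, success_b, total_b):
        """Two-sided z-test for the difference between two proportions."""
        p_a, p_b = success_a / total_a, success_b / total_b
        pooled = (success_a + success_b) / (total_a + total_b)
        se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
        return p_b - p_a, p_value

    # Control vs. variant: sessions that used search at least once (made up).
    lift, p = two_proportion_z(success_a=4200, total_a=100000,   # control
                               success_b=4900, total_b=100000)   # variant
    print("search-usage lift: {:+.2%}, p = {:.4f}".format(lift, p))

    # A guardrail metric (e.g. pageviews per session) is checked the same
    # way; the change is adopted only if the primary metric improves
    # significantly and no guardrail metric regresses.

The point is that adoption is decided by the criteria agreed up front, not
by whoever shouts loudest after the fact.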
Post by Nihiltres
If we ignore people, or worse, specifically disenfranchise them, that's
sure to lead to conflict when the interested stakeholders pursue their
interests and thus become that "vocal minority". Rather, we need an obvious
process, backed by principles that most of everyone can agree on, so that
we don't hit catches like one-sided priorities. Yes, we do need to figure
out how to make sure that reader interests are represented in those
principles. If the shared process and shared principles lead us to
something that some people don't agree with, *then* there might be a
justification to tell that minority to stuff it in the name of progress.
I'll leave off there, because the next thing I intuitively want to go on to
involves my personal views, and those aren't relevant to this point (they
can wait for later). Instead, a question: what *principles* ought to
underpin designs moving forward from Vector? If we can't work through
disagreements there, we're going to see objections once an unbalanced set
of principles is implemented in design patterns.
There's not really a lack of principles; there's a lack of reasonable
process. What's wrong with change guided by data science? We know the
scientific process works. The current process is design by a committee
composed mostly of people untrained in the field, with no data
proving anyone's case. Even when there is data, it's often ignored in favor
of consensus of the editor community.

- Ryan
Steven Walling
2015-07-25 04:32:25 UTC
Having been away from WMF engineering and design for almost a year, I'd
like to reiterate that what Ryan is outlining is not some bold, outlandish
idea. In fact, it's standard operating procedure for how the top tier of
product development is done at every non-enterprise software company worth
a damn.

At Quora, we basically follow a version of this philosophy, though we
certainly consult a ton directly with our community before, during and
after the product development process. We just do this in a consultative
way—not a consensus-driven one.

I would say the one unintentionally misleading part of what Ryan is saying
is that it makes it sound like a zero-sum game where the company wins and
the community loses.

It's actually the opposite. Wikimedians today, wanting things to be perfect
according to their standards and for consensus among all to be arrived at,
dramatically slow things down and increase the time/cost of development.
When you have a much more rapid pace of change enabled by data and by the
ability to ignore vocal minorities, it means more stuff gets done. This
would free up huge amounts of design and development hours to focus on
fixing tools near and dear to said vocal minorities, ultimately making
everyone happier.
Post by Trevor Parscal
Post by Ryan Lane
An example: Make search more discoverable. Add a feature or make an
interface change to test this. A/B test it. See if the frequency of search
usage increased. See if it adversely affected other metrics. If it helped
search usage and didn't negatively affect other metrics, adopt the change.
The issue is that there will be a vocal minority of people who absolutely
hate this change, no matter what it is. These people should be ignored.
This is *exactly* the sort of issue that leads to conflict. Some parts
Post by Ryan Lane
The editor community should have little to no say in the process
or
Post by Ryan Lane
a vocal minority
or, worst,
Post by Ryan Lane
These people should be ignored.
A/B testing is one thing, but our problems are *social*, are *political*,
and that's precisely what I see
above. This is not a productive approach, because it pits stakeholders
against one another. Wikipedia is
not a *competition*, it's supposed to be a *collaboration*. It's even
worse when it's framed in the
otherwise reasonable context of A/B testing, because that conceals the
part of it that has one particular
subset of stakeholders decide what metrics (e.g. search) are important.
While I do disagree, I don't mean
to argue specifically against Ryan Lane's position here—I'm just using it
as an example of positions
that exacerbate the social problems. It doesn't matter in what ways he or
I are right or wrong on the
approach if it's going to lead to another conflict.
The idea is to remove the social or political problems from the process.
Define the goals and feature sets (this is the part of the process that
requires community interaction), implement and test the changes, review the
results. The data is the voice of the community. It's what proves if an idea
is good or bad.
As I said before, though, there's always some vocal minority that will hate
change, even when it's presented with data proving it to be good. These
people should be ignored at this stage of the process. They can continue to
provide input to future changes, but the data should be authoritative.
If we ignore people, or worse, specifically disenfranchise them, that's
sure to lead to conflict when the
interested stakeholders pursue their interests and thus become that
"vocal
minority". Rather, we need
an obvious process, backed by principles that most of everyone can agree
on, so that we don't hit catches
like one-sided priorities. Yes, we do need to figure out how to make sure
that reader interests are
represented in those principles. If the shared process and shared
principles lead us to something that
some people don't agree with, *then* there might be a justification to
tell that minority to stuff it in the
name of progress.
I'll leave off there, because the next thing I intuitively want to go
onto
involve my personal views, and
those aren't relevant to this point (they can wait for later). Instead: a
question: what *principles*
ought to underpin designs moving forward from Vector? If we can't work
through disagreements there,
we're going to see objections once an unbalanced set of principles are
implemented in design patterns.
There's not really a lack of principles, there's a lack of reasonable
process. What's wrong with change guided by data science? We know the
scientific process works. The current process is design by a committee
that's comprised mostly of people untrained in the field, with no data
proving anyone's case. Even when there is data it's often ignored in favor
of consensus of the editor community.
- Ryan
Brian Wolff
2015-07-25 13:30:21 UTC
Post by Ryan Lane
The idea is to remove the social or political problems from the process.
Everyone in basically any context wants to remove social and political
problems. Ignoring them is not the same as removing them.
Post by Ryan Lane
Define the goals and feature sets (this is the part of the process that
requires community interaction), implement and test the changes, review the
results. The data is the voice of the community. It's what proves if an idea
is good or bad.
As I said before, though, there's always some vocal minority that will hate
To be clear, I don't think every small vocal minority needs to be taken into
account, and I don't think Wikipedians do either. Sometimes people seem to
use the phrase "vocal minority" for a majority of users in some class.
Post by Ryan Lane
change, even when it's presented with data proving it to be good. These
people should be ignored at this stage of the process. They can continue to
provide input to future changes, but the data should be authoritative.
Data does not prove things "good". Data proves (or more likely provides
some support but not proves) some objective hypothesis. Proving normative
claims with objective data is pretty impossible.

That may sound pedantic, but I think it's an important distinction.
Evidence should be presented in the form of "This change improved
findability of the edit button by 40% among anons in our experiment [link
to details]. Therefore I/we believe this is a good change because I/we think
that findability of the edit button is important". Separating what the data
proves from what are personal opinions about the data is important to make
the "science" sound legitimate and not manipulated.
Post by Ryan Lane
There's not really a lack of principles, there's a lack of reasonable
process. What's wrong with change guided by data science? We know the
scientific process works.
We know it's also extremely easy to manipulate, especially when the science
is only done by one party that has a specific objective. It can also be
myopic, concentrating on one factor while ignoring the whole.

Ultimately the usefulness depends on the skill of whoever is designing
and conducting the experiments.
Post by Ryan Lane
The current process is design by a committee
that's comprised mostly of people untrained in the field, with no data
proving anyone's case. Even when there is data it's often ignored in favor
of consensus of the editor community.
Consensus of the editor community is anecdotal data. That data may be
extremely biased and should be evaluated carefully. But it doesn't make
sense to just throw it out totally, particularly in cases where it's the
only data we have. We should also be evaluating why consensus and data are
conflicting. Maybe there are unstudied factors causing the conflict, so the
two positions are not mutually exclusive.
Ryan Lane
2015-07-27 18:02:20 UTC
Permalink
Post by Brian Wolff
Data does not prove things "good". Data proves (or more likely provides
some support but not proves) some objective hypothesis. Proving normative
claims with objective data is pretty impossible.
Post by Brian Wolff
That may sound pedantic, but I think it's an important distinction.
Evidence should be presented in the form of "This change improved
findability of the edit button by 40% among anons in our experiment [link to
details]. Therefore I/we believe this is a good change because I/we think
that findability of the edit button is important". Separating what the data
proves from what are personal opinions about the data is important to make
the "science" sound legitimate and not manipulated.
It sounds pedantic because it is :). Good/bad in my proposal was targeting
the hypothesis, not the moral concept of good/bad. Good = the hypothesis is
shown to be effective; bad = the hypothesis is shown to be ineffective.

What you've ignored in my proposal is the part where the community input is
part of the formation of the hypothesis. I also mentioned that vocal
minorities should be ignored with the exception of questioning the
methodology of the data analysis.
Post by Brian Wolff
Consensus of the editor community is anecdotal data. That data may be
extremely biased and should be evaluated carefully. But it doesn't make sense
to just throw it out totally, particularly in cases where it's the only data
we have. We should also be evaluating why consensus and data are
conflicting. Maybe there are unstudied factors causing the conflict, so the
two positions are not mutually exclusive.
Anecdotal data should be used as a means of following up on experiments, but
should not be considered in the data set as it's an unreliable source. If
there's a large amount of anecdotal data coming in, it's something that
should be part of the standard data set. There are obviously exceptions to
this, but assuming there's enough data it should be possible to gauge the
effectiveness of changes without relying on anecdotal data.

For instance, if a change negatively affects an editor's workflow, it should
be reflected in data like "avg/p95/p99 time for x action to occur", where x
is some normal editor workflow.
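
Concretely, something like this - a minimal sketch with made-up numbers, not
anything we actually have instrumented - is all it takes to turn raw
start/end timestamps for a workflow into those avg/p95/p99 figures:

# Minimal sketch, not WMF tooling: summarise time-on-task for one editor
# workflow from hypothetical (start, end) timestamp pairs, in seconds.
import math

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]

def workflow_summary(events):
    durations = [end - start for start, end in events]
    return {
        "avg": sum(durations) / len(durations),
        "p95": percentile(durations, 95),
        "p99": percentile(durations, 99),
    }

# Hypothetical numbers: the same workflow measured before and after a change.
before = workflow_summary([(0, 41), (0, 38), (0, 95), (0, 40), (0, 44)])
after = workflow_summary([(0, 35), (0, 33), (0, 120), (0, 36), (0, 37)])
print(before, after)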

- Ryan
Jonathan Morgan
2015-07-27 20:51:33 UTC
Permalink
Post by Ryan Lane
For instance, if a change negatively affects an editor's workflow, it should
be reflected in data like "avg/p95/p99 time for x action to occur", where x
is some normal editor workflow.
That is indeed one way you can provide evidence of correlation; but in live
deployments (which are, at best, quasi-experiments
<https://en.wikipedia.org/wiki/Quasi-experiment>), you seldom get results
that are as unequivocal as the example you're presenting here. And
quantifying the influence of a single causal factor (such as the impact of
a particular UI change on time-on-task for this or that editing workflow)
is even harder.

Knowing that something occurs isn't the same as knowing why. Take the
English Wikipedia editor decline. There has been a lot of good research on
this subject, and we have confidently identified a set of factors that are
likely contributors. Some of these can be directly measured: the decreased
retention rate of newcomers; the effect of early, negative experiences on
newcomer retention; a measurable increase over time in phenomena (like
reverts, warnings, new article deletions) that likely cause those negative
experiences. But none of us who have studied the editor decline believe
that these are the only factors. And many community members who have read
our research don't even accept our premises, let alone our findings.

I'm not at all afraid of sounding pedantic here (or of writing a long-ass
wall of text), because I think that many WMF and former-WMF participants in
this discussion are glossing over important stuff: Yes, we need a more
evidence-based product design process. But we also need a more
collaborative, transparent, and iterative deployment process. Having solid
research and data on the front-end of your product lifecycle is important,
but it's not some kind of magic bullet and is no substitute for community
involvement in product design (through the lifecycle).

We have an excellent Research & Data
<https://wikimediafoundation.org/wiki/Staff_and_contractors#Research_and_Data>
team. The best one we've ever had at WMF. Pound-for-pound, they're as good
as or better than the Data Science teams at Google or Facebook. None of
them would ever claim, as you seem to here, that all you need to build good
products are well-formed hypotheses and access to buckets of log data.

I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in
case he doesn't follow this list). We talked about strategies for deploying
new products on Wikimedia projects: what works, what doesn't. He held up
the design/deployment process for Vector as an example of *good* process,
one that we should (re)adopt.

Vector was created based on extensive user research and community
consultation[1]. Then WMF made a beta, and invited people across projects
to opt-in and try it out on prototype wikis[2]. The product team set public
criteria for when it would release the product as default across
production projects: retention of 80% of the Beta users who had opted in,
after a certain amount of time. When a beta tester opted out, they were
sent a survey to find out why[3]. The product team attempted to triage the
issues reported in these surveys, address them in the next iteration, or
(if they couldn't/wouldn't fix them), at least publicly acknowledge the
feedback. Then they created a phased deployment schedule, and stuck to
it[4].

This was, according to Liam (who's been around the movement a lot longer
than most of us at WMF), a successful strategy. It built trust, and engaged
volunteers as both evangelists and co-designers. I am personally very eager
to hear from other community members who were around at the time what they
thought of the process, and/or whether there are other examples of good WMF
product deployments that we could crib from as we re-assess our current
process. From what I've seen, we still follow many good practices in our
product deployments, but we follow them haphazardly and inconsistently.

Whether or not we (WMF) think it is fair that we have to listen to "vocal
minorities" (Ryan's words), these voices often represent *and influence*
the sentiments of the broader, less vocal, contributor base in important
ways. And we won't be able to get people to accept our conclusions, however
rigorously we demonstrate them or carefully we couch them in scientific
trappings, if they think we're fundamentally incapable of building
something worthwhile, or deploying it responsibly.

We can't run our product development like "every non-enterprise software
company worth a damn" (Steven's words), and that shouldn't be our goal. We
aren't a start-up (most of which fail) that can focus all our resources on
one radical new idea. We aren't a tech giant like Google or Facebook, that
can churn out a bunch of different beta products, throw them at a wall and
see what sticks.

And we're not a commercial community-driven site like Quora or Yelp, which
can constantly monkey with their interface and feature set in order to
maximize ad revenue or try out any old half-baked strategy to monetize its
content. There's a fundamental difference between Wikimedia and Quora. In
Quora's case, a for-profit company built a platform and invited people to
use it. In Wikimedia's case, a bunch of volunteers created a platform,
filled it with content, and then a non-profit company was created to
support that platform, content, and community.

Our biggest opportunity to innovate, as a company, is in our design
process. We have a dedicated, multi-talented, active community of
contributors. Those of us who are getting paid should be working on
strategies for leveraging that community to make better products, rather
than trying to come up with new ways to perform end runs around them.

Jonathan

1.
https://usability.wikimedia.org/wiki/What%27s_new,_questions_and_answers#How_was_it_decided_that_these_changes_would_be_implemented.3F
2. https://usability.wikimedia.org/wiki/Prototype
3. https://usability.wikimedia.org/wiki/Beta_Feedback_Survey
4. https://usability.wikimedia.org/wiki/Releases/Default_Switch
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Brian Wolff
2015-07-27 20:59:18 UTC
Permalink
Post by Jonathan Morgan
I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in
case he doesn't follow this list). We talked about strategies for deploying
new products on Wikimedia projects: what works, what doesn't. He held up
the design/deployment process for Vector as an example of *good* process,
one that we should (re)adopt.
Vector was created based on extensive user research and community
consultation[1]. Then WMF made a beta, and invited people across projects
to opt-in and try it out on prototype wikis[2]. The product team set public
criteria for when it would release the product as default across
production projects: retention of 80% of the Beta users who had opted in,
after a certain amount of time. When a beta tester opted out, they were
sent a survey to find out why[3]. The product team attempted to triage the
issues reported in these surveys, address them in the next iteration, or
(if they couldn't/wouldn't fix them), at least publicly acknowledge the
feedback. Then they created a phased deployment schedule, and stuck to
it[4].
This was, according to Liam (who's been around the movement a lot longer
than most of us at WMF), a successful strategy. It built trust, and engaged
volunteers as both evangelists and co-designers. I am personally very eager
to hear from other community members who were around at the time what they
thought of the process, and/or whether there are other examples of good WMF
product deployments that we could crib from as we re-assess our current
process. From what I've seen, we still follow many good practices in our
product deployments, but we follow them haphazardly and inconsistently.
I agree wholeheartedly with your email. But I wonder if this part is a
bit looking at the past through rose coloured glasses. Vector roll out
was certainly better than some other feature rollouts, but... it was
hardly without pain if I remember correctly. Although it was a long
time ago, and before I was involved on the dev side, so my memory is a
bit fuzzy.

--
-bawolff
Jonathan Morgan
2015-07-27 21:05:23 UTC
Permalink
Post by Brian Wolff
I agree wholeheartedly with your email. But I wonder if this part is a
bit looking at the past through rose coloured glasses. Vector roll out
was certainly better than some other feature rollouts, but... it was
hardly without pain if I remember correctly. Although it was a long
time ago, and before I was involved on the dev side, so my memory is a
bit fuzzy.
I was also a bit surprised to hear that process held up as a positive
example. But it was before my time as well, so I don't have direct
knowledge, just what Liam related to me.

The parts of that process I'm most excited about are:
1. setting public success criteria ahead of time, based on user
adoption/retention
2. the public commitment to iterate*, before broad rollout, based on
specific feedback from beta testers.

- J

*and not just fix bugs, but actually revise/add/eliminate features
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Brian Wolff
2015-07-27 21:20:09 UTC
Permalink
Post by Jonathan Morgan
Post by Brian Wolff
I agree wholeheartedly with your email. But I wonder if this part is a
bit looking at the past through rose coloured glasses. Vector roll out
was certainly better than some other feature rollouts, but... it was
hardly without pain if I remember correctly. Although it was a long
time ago, and before I was involved on the dev side, so my memory is a
bit fuzzy.
I was also a bit surprised to hear that process held up as a positive
example. But it was before my time as well, so I don't have direct
knowledge, just what Liam related to me.
1. setting public success criteria ahead of time, based on user
adoption/retention
2. the public commitment to iterate*, before broad rollout, based on
specific feedback from beta testers.
- J
*and not just fix bugs, but actually revise/add/eliminate features
Yes I agree that setting out public criteria ahead of time is
something very nice. I see a lot of comments from users who feel they
are powerless to prevent the feature from being fully deployed if it
turns out to be bad, and thus don't want to have any trials at all,
because they feel it leads down a road from which there is no turning back.

--bawolff
Federico Leva (Nemo)
2015-07-27 22:24:44 UTC
Permalink
Post by Jonathan Morgan
I was also a bit surprised to hear that process held up as a positive
example. But it was before my time as well, so I don't have direct
knowledge, just what Liam related to me.
Vector was probably the first case where MediaWiki turned into a real
battlefield, with WMF on one side and volunteers (i.e. all the
traditional MediaWiki users and developers) on the other.
http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/49535/

Of course there are still lessons to learn from the process and Trevor's
email proves some were learnt. :) I'm not sure what exactly would be
the things to copy or drop, but as someone involved in communication of
the initiative in 2009–10, I can say that:
* explaining the project was much easier than it is with most WMF
software projects now;
* WMF got much better at releasing data and analysis (when they exist)!

Nemo
Ryan Lane
2015-07-28 05:52:38 UTC
Permalink
On Mon, Jul 27, 2015 at 11:02 AM, Ryan Lane
For instance, if a change negatively affects an editor's workflow, it should
be reflected in data like "avg/p95/p99 time for x action to occur", where x
is some normal editor workflow.
That is indeed one way you can provide evidence of correlation; but in
live deployments (which are, at best, quasi-experiments), you seldom get
results that are as unequivocal as the example you're presenting here.  And
quantifying the influence of a single causal factor (such as the impact of a
particular UI change on time-on-task for this or that editing workflow) is
even harder.
The idea of A/B tests is to try to isolate things. You're not going to get
perfect data all of the time and you'll likely need to retry experiments
with more focus until you can be assured your tests are accurate, but this
is definitely doable in live deployments.

I used editing as an example, but you're right in that it's difficult to get
reliable metrics for a lot of editing actions (though it should be a bit
easier in VE). That's of course why I gave a search example previously,
which is much easier to isolate. In fact, most reader based tests should be
pretty reliable, since the reader feature set is much smaller and the number
of readers is massive. This topic is about skin changes, btw ;).
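
For illustration, with entirely hypothetical numbers (this isn't an existing
WMF tool), a reader-facing search test boils down to something like two
buckets, one success metric, and a significance check:

# Minimal sketch, hypothetical numbers: compare search click-through rates
# between a control bucket and a test bucket with a two-proportion z-test.
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    p_a, p_b = success_a / total_a, success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

# Control vs. a hypothetical new search box placement.
print(two_proportion_z_test(4200, 100000, 4650, 100000))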
Knowing that something occurs isn't the same as knowing why. Take the
English Wikipedia editor decline. There has been a lot of good research on
this subject, and we have confidently identified a set of factors that are
likely contributors. Some of these can be directly measured: the decreased
retention rate of newcomers; the effect of early, negative experiences on
newcomer retention; a measurable increase over time in phenomena (like
reverts, warnings, new article deletions) that likely cause those negative
experiences. But none of us who have studied the editor decline believe that
these are the only factors. And many community members who have read our
research don't even accept our premises, let alone our findings.
The best way to solve a complex problem is to first understand the problem
(which you've done through research), then to break it down into small
actionable parts (you've already mentioned them), then to tackle each
problem by proposing solutions, implementing them in a testable way and then
seeing if the results are positive or not.

The results of changing the warnings had pretty strong indications that the
new messages moved the retention numbers in a positive way, right? Why
shouldn't we trust the data there? If the data wasn't good enough, is there
any way to make the methodology more accurate?
I'm not at all afraid of sounding pedantic here (or of writing a long-ass
wall of text), because I think that many WMF and former-WMF participants in
this discussion are glossing over important stuff: Yes, we need a more
evidence-based product design process. But we also need a more
collaborative, transparent, and iterative deployment process. Having solid
research and data on the front-end of your product lifecycle is important,
but it's not some kind of magic bullet and is no substitute for community
involvement in product design (through the lifecycle).
We have an excellent Research & Data team. The best one we've ever had at
WMF. Pound-for-pound, they're as good as or better than the Data Science
teams at Google or Facebook. None of them would ever claim, as you seem to
here, that all you need to build good products are well-formed hypotheses
and access to buckets of log data. 
Until the very, very recent past there wasn't even the ability to measure
the simplest of things. There are no real-time or close to real-time
measurements. There are no health dashboards for vital community metrics.
There's no experimentation framework. Since there's no experimentation
framework, there are no run-time controls for product managers to run A/B
tests of feature-flagged features. There are very few analytics events in MediaWiki.
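
To show how small the missing run-time piece is, here's a purely
illustrative sketch (not an existing MediaWiki API): deterministic bucketing
for a feature-flagged A/B test, so the same user always sees the same
variant and a product manager only has to toggle the experiment definition:

# Illustrative sketch only, not an existing MediaWiki API.
import hashlib

EXPERIMENTS = {
    # Hypothetical run-time config a product manager could edit.
    "fixed-header-2015": {"enabled": True, "variants": ("control", "test")},
}

def bucket(user_id: str, experiment: str) -> str:
    config = EXPERIMENTS[experiment]
    if not config["enabled"]:
        return "control"
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return config["variants"][int(digest, 16) % len(config["variants"])]

print(bucket("user:12345", "fixed-header-2015"))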

I don't want to sound negative, because I understand why all of this is the
case, since analytics has been poorly resourced, ignored and managed into
the ground until pretty recently, but Wikimedia isn't at the level of most
early startups when it comes to analytics.

Wikimedia does have (and has historically had) excellent researchers that
have been doing amazing work with insanely small amounts of data and
infrastructure.
I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in
case he doesn't follow this list). We talked about strategies for deploying
new products on Wikimedia projects: what works, what doesn't. He held up the
design/deployment process for Vector as an example of good process, one that
we should (re)adopt. 
Vector was created based on extensive user research and community
consultation[1]. Then WMF made a beta, and invited people across projects to
opt-in and try it out on prototype wikis[2]. The product team set public
criteria for when it would release  the product as default across production
projects: retention of 80% of the Beta users who had opted in, after a
certain amount of time. When a beta tester opted out, they were sent a
survey to find out why[3]. The product team attempted to triage the issues
reported in these surveys, address them in the next iteration, or (if they
couldn't/wouldn't fix them), at least publicly acknowledge the feedback.
Then they created a phased deployment schedule, and stuck to it[4]. 
I was on that project (Usability Initiative) as an ops engineer. I was hired
for it, in fact. I remember that project well and I wouldn't call it a major
success. It was successful in that it changed the default skin to something
that was slightly more modern than Monobook, but it was the only truly
successful part of the entire project. I think Vector is the only surviving
code from it. The vast majority of Vector features didn't make it
permanently into the Vector skin. Mostly what stayed around was the "look
and feel" of the skin.

The community was a lot more accepting of change then, but it was still a
pretty massive battle. The PM of that project nearly worked herself to death.
Whether or not we (WMF) think it is fair that we have to listen to "vocal
minorities" (Ryan's words), these voices often represent and influence the
sentiments of the broader, less vocal, contributor base in important ways.
And we won't be able to get people to accept our conclusions, however
rigorously we demonstrate them or carefully we couch them in scientific
trappings, if they think we're fundamentally incapable of building something
worthwhile, or deploying it responsibly.

Yeah. Obviously it's necessary to not ship broken or very buggy code, but
that's a different story. It's also a lot easier to know if your code is
broken when you A/B test it before it's shipped. It should be noticeable
from the metrics, or the metrics aren't good enough.
We can't run our product development like "every non-enterprise software
company worth a damn" (Steven's words), and that shouldn't be our goal. We
aren't a start-up (most of which fail) that can focus all our resources on
one radical new idea. We aren't a tech giant like Google or Facebook, that
can churn out a bunch of different beta products, throw them at a wall and
see what sticks. 

What's your proposal that's somehow better than what most of the rest of the
sites on the internet are doing? Maybe you can't do exactly what they're
doing due to lack of resources, but you can at least do the basics.
And we're not a commercial community-driven site like Quora or Yelp, which
can constantly monkey with their interface and feature set in order to
maximize ad revenue or try out any old half-baked strategy to monetize its
content. There's a fundamental difference between Wikimedia and Quora. In
Quora's case, a for-profit company built a platform and invited people to
use it. In Wikimedia's case, a bunch of volunteers created a platform,
filled it with content, and then a non-profit company was created to support
that platform, content, and community. 
I don't understand how you can say this. This is exactly how fundraising at
WMF works and it's been shown to be incredibly effective. WMF is most likely
the most effective organization in the world at large-scale small donations.
It's this way because it constantly tests changes to see what's more
effective. It does this using almost exactly the methodology I'm describing.
Why can't we bring a little bit of this awesomeness into the rest of the
engineering organization?

- Ryan
Federico Leva (Nemo)
2015-07-28 07:52:17 UTC
Permalink
Post by Ryan Lane
I don't understand how you can say this. This is exactly how fundraising at
WMF works and it's been shown to be incredibly effective.
There is no proof that the WMF fundraising is effective.
https://meta.wikimedia.org/wiki/Talk:Fundraising_2012/Report
When you don't measure costs and externalities, of course any profit
looks good.

Nemo
Jonathan Morgan
2015-07-29 00:51:46 UTC
Permalink
Responses inline!
Post by Ryan Lane
The idea of A/B tests is to try to isolate things. You're not going to get
perfect data all of the time and you'll likely need to retry experiments
with more focus until you can be assured your tests are accurate, but this
is definitely doable in live deployments.
I used editing as an example, but you're right in that it's difficult to get
reliable metrics for a lot of editing actions (though it should be a bit
easier in VE). That's of course why I gave a search example previously,
which is much easier to isolate. In fact, most reader based tests should be
pretty reliable, since the reader feature set is much smaller and the number
of readers is massive. This topic is about skin changes, btw ;).
We started out talking about skin changes, but we've been in
meta-discussion-land for a few days now. That's not surprising (we touched
on probably the biggest perennial conflicts between WMF and the editing
community). What's surprising to me is that, so far, this time the
discussion has been both frank and relatively proactive. So I want to ride
this wave as far as it takes us.

A/B tests are great, and we should use them more often for reader-facing
UI. But a new default skin isn't just reader-facing; it's everyone-facing.
Making things easier, more engaging, or more delightful for non-editors
isn't going to do us much good if it makes things harder, less engaging, or
less delightful for editors.

There are definitely products that are primarily reader-facing. But most of
our products (and certainly the default skin) have a substantial impact on
the editing experience as well. Earlier, you said the editor community "should
be worked around when changes are meant to affect readers and those changes
don't directly negatively affect editor metrics." I counter that:
a) there is no single editor metric, or set of metrics, that we can use to
fully determine the impact of this or that design change on the editing
experience of Wikipedia.
b) even if there were such metrics, it would be highly counterproductive
for WMF to say to editors "we don't care about your experiences, just your
aggregate performance". Also, dickish.

Because I see two issues at play here, and I think they are inextricably
linked: We need to be more evidence-driven, and we need more, not less,
community involvement in our design process.

If we don't become more evidence-driven (which requires updates to both our
processes and our infrastructure), we will always struggle to build
products that meet the needs of our users (readers, editors, third-party
MediaWiki peeps).

But *whether or not we become more evidence-driven, *we will always
struggle to get the products we build implemented, if our most powerful
user group doesn't currently trust us to act in their best interest. Or
even our own.
Post by Ryan Lane
Post by Jonathan Morgan
Knowing that something occurs isn't the same as knowing why. Take the
English Wikipedia editor decline. There has been a lot of good research on
this subject, and we have confidently identified a set of factors that are
likely contributors. Some of these can be directly measured: the decreased
retention rate of newcomers; the effect of early, negative experiences on
newcomer retention; a measurable increase over time in phenomena (like
reverts, warnings, new article deletions) that likely cause those negative
experiences. But none of us who have studied the editor decline believe that
these are the only factors. And many community members who have read our
research don't even accept our premises, let alone our findings.
The best way to solve a complex problem is to first understand the problem
(which you've done through research), then to break it down into small
actionable parts (you've already mentioned them), then to tackle each
problem by proposing solutions, implementing them in a testable way and then
seeing if the results are positive or not.
The results of changing the warnings had pretty strong indications that the
new messages moved the retention numbers in a positive way, right? Why
shouldn't we trust the data there? If the data wasn't good enough, is there
any way to make the methodology more accurate?
The data were good :) Actually, Snuggle and the Teahouse both came out of
this line of research. These two products share several features that
Winter (and most of our major products) don't:

1. They are permanently opt-in: no person has to use them. No Wikimedia
project has to adopt them.
2. They add functionality, rather than replacing it.
3. They are incrementalist approaches to addressing a major issue
identified through careful front-end research.
4. They were designed in collaboration (not just consultation) with editors.
5. They are powered (to this day) by dedicated volunteers who are invested
in their success.
6. They were cheap to build, and are cheap to maintain.

Some of these features probably limit their overall impact. But they
virtually assure their long-term sustainability, which means they can keep
on addressing the newcomer retention problem, even after the
grants/dissertations that supported their development are gone. FWIW, many
other new editor engagement products have had to be scuttled after the
product team that developed them (and championed them) was disbanded, or
the Foundation's priorities changed.

I'm not suggesting that this design approach offers a template for how to
make people <3 VE or whatever, but there are lessons here about how to do
evidence-based design well, and about the advantages of getting core
contributors to feel invested in what you build.
Post by Ryan Lane
Until the very, very recent past there wasn't even the ability to measure
the simplest of things. There's no real-time or close to real-time
measurements. There's no health dashboards for vital community metrics.
There's no experimentation framework. Since there's no experiment framework
there's no run-time controls for product managers to run A/B tests of
feature flagged features. There's very few analytics events in MediaWiki.
I don't want to sound negative, because I understand why all of this is the
case, since analytics has been poorly resourced, ignored and managed into
the ground until pretty recently, but Wikimedia isn't at the level of most
early startups when it comes to analytics.
Wikimedia does have (and has historically had) excellent researchers that
have been doing amazing work with insanely small amounts of data and
infrastructure.
I didn't think you were dissing the researchers; sorry if it came off that
way. My point was that our research & data team know that a) A/B tests
alone aren't usually sufficient to justify major design changes and b) good
science won't convince anyone if they already mistrust or dislike you.
Leila and Aaron, for example, have had to invest a lot of time explaining,
contextualizing, defending their research, trying to (re)build trust so
that people will give their research a fair hearing.
Post by Ryan Lane
I was on that project (Usability Initiative) as an ops engineer. I was hired
for it, in fact. I remember that project well and I wouldn't call it a major
success. It was successful in that it changed the default skin to something
that was slightly more modern than Monobook, but it was the only truly
successful part of the entire project. I think Vector is the only surviving
code from it. The vast majority of Vector features didn't make it
permanently into the Vector skin. Mostly what stayed around was the "look
and feel" of the skin.
The community was a lot more accepting of change then, but it was still a
pretty massive battle. The PM of that project nearly worked herself to death.
Right! It's way harder now. All of us whose jobs require us to interact
with community members around product design have to fight that battle.
There's a lot of mistrust: we're perceived by many as being incompetent,
and/or acting in bad faith vis a vis the core contributors to Wikimedia
projects. It really sucks sometimes.

But we, as an organization (if not as individuals), bear a good deal of
responsibility for the state we're in. A lot of it stems from the way we
have designed and deployed products in the past. Fixing that requires more
than more research and better testing infrastructure. And perpetuating the
meme that the community is afraid of change and that's why we can't have
nice things... certainly doesn't help.
Post by Ryan Lane
Post by Jonathan Morgan
Whether or not we (WMF) think it is fair that we have to listen to "vocal
minorities" (Ryan's words), these voices often represent and influence the
sentiments of the broader, less vocal, contributor base in important ways.
And we won't be able to get people to accept our conclusions, however
rigorously we demonstrate them or carefully we couch them in scientific
trappings, if they think we're fundamentally incapable of building something
worthwhile, or deploying it responsibly.
Yeah. Obviously it's necessary to not ship broken or very buggy code, but
that's a different story. It's also a lot easier to know if your code is
broken when you A/B test it before it's shipped. It should be noticeable
from the metrics, or the metrics aren't good enough.
Post by Jonathan Morgan
We can't run our product development like "every non-enterprise software
company worth a damn" (Steven's words), and that shouldn't be our goal. We
aren't a start-up (most of which fail) that can focus all our resources on
one radical new idea. We aren't a tech giant like Google or Facebook, that
can churn out a bunch of different beta products, throw them at a wall and
see what sticks.
What's your proposal that's somehow better than what most of the rest of the
sites on the internet are doing? Maybe you can't do exactly what they're
doing due to lack of resources, but you can at least do the basics.
My proposal is that we should follow a more participatory design process.
Better tools and research are necessary, but insufficient. And the
"consulting" model that Quora uses isn't appropriate to Wikimedia.

It sounds to me like you and Steven think that we can build faster and
better if we distance ourselves more from the community--abstracting their
experience as metrics, and limiting their participation to consultation.
But I don't think that what's slowing us down is our efforts to work with
communities around what we deploy, where we deploy it, and when. I think
what slows us down is that we constantly say that we're open and
collaborative, but often fail to be open and collaborative when it matters
most. This engenders mistrust, which makes it harder for us to experiment,
delays deployments, results in buggier, less usable, and less useful
products, and virtually guarantees that many of our core users are going to
defer or actively resist adopting what we build.

In order to dig ourselves out, let's pursue a two-pronged strategy of:
a) evidence driven product development: using quantitative and qualitative
research to decide what to build and how to build it
b) a transparent, iterative, and participatory process: telling people what
we intend to build, when and under what circumstances we intend to deploy it,
and consistently addressing the feedback we get from people at every stage,
in good faith

We won't ever succeed with a) if we don't show that we can implement b)
consistently.
Post by Ryan Lane
Post by Jonathan Morgan
And we're not a commercial community-driven site like Quora or Yelp,
which
can constantly monkey with their interface and feature set in order to
maximize ad revenue or try out any old half-baked strategy to monetize its
content. There's a fundamental difference between Wikimedia and Quora. In
Quora's case, a for-profit company built a platform and invited people to
use it. In Wikimedia's case, a bunch of volunteers created a platform,
filled it with content, and then a non-profit company was created to support
that platform, content, and community.
I don't understand how you can say this. This is exactly how fundraising at
WMF works and it's been shown to be incredibly effective. WMF is most likely
the most effective organization in the world at large-scale small donations.
It's this way because it constantly tests changes to see what's more
effective. It does this using almost exactly the methodology I'm describing.
Why can't we bring a little bit of this awesomeness into the rest of the
engineering organization?
Fundraising is great! I love fundraising. And not just because they pay my
salary--they have great research and an enviable testing infrastructure.
But tracking the performance of banners that drive monetary contributions
is a fundamentally different task from tracking the performance (<-- not
sure that word even applies) of a whole new default UI that fundamentally
changes the way both casual readers and dedicated editors interact with
Wikipedia. Fundraising products, and the process by which we design and
evaluate them, aren't representative of our big software products like
Mobile site/apps, Content Translation, VE, Flow, etc.

That's why I'm pushing on your "we can make it work through A/B testing"
thesis around deploying something as radical and complex as Winter, as
opposed to iterating on Vector. A whole new skin affects everyone's
experience of the site in complex and multifaceted ways; there's no single
(or even primary) metric of performance. And we can't expect to short-cut
the design process or short-circuit community involvement. The only way out
is through.

Jonathan
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Monte Hurd
2015-07-29 02:32:16 UTC
Permalink
Hey all!

I'm not very familiar with MediaWiki skins, so apologies if this is
ridiculous, not possible, mentioned already, etc., but what Jonathan said
here really stood out to me as maybe at the heart of the issue:

"But a new default skin isn't just reader-facing; it's everyone-facing.
Making things easier, more engaging, or more delightful for non-editors
isn't going to do us much good if it makes things harder, less engaging, or
less delightful for editors."

My question is, could MediaWiki use one skin when editing (Vector, or
rename it "Vector-Editing") and a copy of Vector ("Vector-Reading" or
something) when not editing (i.e. when reading)? They'd be initially
identical, but going forward they could begin to slowly diverge as required
by their respective editing and reading flows. There'd just have to be a
mechanism to switch at the appropriate time... in theory.
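
Purely as a sketch of the idea - made-up names, and no claim about how
MediaWiki's skin system actually hooks together - the switch could be as
simple as:

# Illustrative sketch only; "Vector-Editing"/"Vector-Reading" are the
# hypothetical skin names from my question above, not real skins.
EDIT_ACTIONS = {"edit", "submit", "visualeditor"}

def choose_skin(request_params: dict) -> str:
    action = request_params.get("action", "view")
    return "Vector-Editing" if action in EDIT_ACTIONS else "Vector-Reading"

print(choose_skin({"action": "edit"}))        # Vector-Editing
print(choose_skin({"title": "Main_Page"}))    # Vector-Reading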

-Monte
Post by Jonathan Morgan
Responses inline!
Post by Ryan Lane
The idea of A/B tests is to try to isolate things. You're not going to get
perfect data all of the time and you'll likely need to retry experiments
with more focus until you can be assured your tests are accurate, but this
is definitely doable in live deployments.
I used editing as an example, but you're right in that it's difficult to get
reliable metrics for a lot of editing actions (though it should be a bit
easier in VE). That's of course why I gave a search example previously,
which is much easier to isolate. In fact, most reader based tests should be
pretty reliable, since the reader feature set is much smaller and the number
of readers is massive. This topic is about skin changes, btw ;).
We started out talking about skin changes, but we've been in
meta-discussion-land for a few days now. That's not surprising (we touched
on probably the biggest perennial conflicts between WMF and the editing
community). What's surprising to me is that, so far, this time the
discussion has been both frank and relatively proactive. So I want to ride
this wave as far as it takes us.
A/B tests are great, and we should use them more often for reader-facing
UI. But a new default skin isn't just reader-facing; it's everyone-facing.
Making things easier, more engaging, or more delightful for non-editors
isn't going to do us much good if it makes things harder, less engaging, or
less delightful for editors.
There are definitely products that are primarily reader-facing. But most
of our products (and certainly the default skin) have a substantial impact
on the editing experience as well. Earlier, you said the editor community "should
be worked around when changes are meant to affect readers and those
a) there is no single editor metric, or set of metrics, that we can use to
fully determine the impact of this or that design change on the editing
experience of Wikipedia.
b) even if there were such metrics, it would be highly counterproductive
for WMF to say to editors "we don't care about your experiences, just your
aggregate performance". Also, dickish.
Because I see two issues at play here, and I think they are inextricably
linked: We need to be more evidence-driven, and we need more, not less,
community involvement in our design process.
If we don't become more evidence-driven (which requires updates to both or
processes and our infrastructure), we will always struggle to build
products that meet the needs of our users (readers, editors, third-party
MediaWiki peeps).
But *whether or not we become more evidence-driven, *we will always
struggle to get the products we build implemented, if our most powerful
user group doesn't currently trust us to act in their best interest. Or
even our own.
Post by Ryan Lane
Post by Jonathan Morgan
Knowing that something occurs isn't the same as knowing why. Take the
English Wikipedia editor decline. There has been a lot of good research on
this subject, and we have confidently identified a set of factors that are
likely contributors. Some of these can be directly measured: the decreased
retention rate of newcomers; the effect of early, negative experiences on
newcomer retention; a measurable increase over time in phenomena (like
reverts, warnings, new article deletions) that likely cause those negative
experiences. But none of us who have studied the editor decline believe that
these are the only factors. And many community members who have read our
research don't even accept our premises, let alone our findings.
The best way to solve a complex problem is to first understand the problem
(which you've done through research), then to break it down into small
actionable parts (you've already mentioned them), then to tackle each
problem by proposing solutions, implementing them in a testable way and then
seeing if the results are positive or not.
The results of changing the warnings had pretty strong indications that the
new messages moved the retention numbers in a positive way, right? Why
shouldn't we trust the data there? If the data wasn't good enough, is there
any way to make them more accurate?methodology.
The data were good :) Actually, Snuggle and the Teahouse both came out of
this line of research. These two products share several features, that
1. They are permanently opt-in: no person has to use them. No Wikimedia
project has to adopt them.
2. They add functionality, rather than replacing it.
3. They are incrementalist approaches to addressing a major issue
identified through careful front-end research.
4. They were designed in collaboration (not just consultation) with editors.
4. They are powered (to this day) by dedicated volunteers who are invested
in their success.
5. They were cheap to build, and are cheap to maintain.
Some of these features probably limit their overall impact. But they
virtually assure their long-term sustainability, which means they can keep
on addressing the newcomer retention problem, even after the
grants/dissertations that supported their development are gone. FWIW, many
other new editor engagement products have had to be scuttled after the
product team that developed them (and championed them) was disbanded, or
the Foundation's priorities changed.
I'm not suggesting that this design approach offers a template for how to
make people <3 VE or whatever, but there are lessons here about how to do
evidence-based design well, and about the advantages of getting core
contributors to feel invested in what you build.
Post by Ryan Lane
Until the very, very recent past there wasn't even the ability to measure
the simplest of things. There's no real-time or close to real-time
measurements. There's no health dashboards for vital community metrics.
There's no experimentation framework. Since there's no experiment framework
there's no run-time controls for product managers to run A/B tests of
feature flagged features. There's very few analytics events in MediaWiki.
I don't want to sound negative, because I understand why all of this is the
case, since analytics has been poorly resourced, ignored and managed into
the ground until pretty recently, but Wikimedia isn't at the level of most
early startups when it comes to analytics.
Wikimedia does have (and has historically had) excellent researchers that
have been doing amazing work with insanely small amounts of data and
infrastructure.
I didn't think you were dissing the researchers; sorry if it came off that
way. My point was that our research & data team know that a) A/B tests
alone aren't usually sufficient to justify major design changes and b) good
science won't convince anyone if they already mistrust or dislike you.
Leila and Aaron, for example, have had to invest a lot of time explaining,
contextualizing, defending their research, trying to (re)build trust so
that people will give their research a fair hearing.
Post by Ryan Lane
I was on that project (Usability Initiative) as an ops engineer. I was hired
for it, in fact. I remember that project well and I wouldn't call it a major
success. It was successful in that it changed the default skin to something
that was slightly more modern than Monobook, but it was the only truly
successful part of the entire project. I think Vector is the only surviving
code from it. The vast majority of Vector features didn't make it
permanently into the Vector skin. Mostly what stayed around was the "look
and feel" of the skin.
The community was a lot more accepting of change then, but it was still a
pretty massive battle. The PM of that project nearly worked herself to death.
Right! It's way harder now. All of us whose jobs require us to interact
with community members around product design have to fight that battle.
There's a lot of mistrust: we're perceived by many as being incompetent,
and/or acting in bad faith vis a vis the core contributors to Wikimedia
projects. It really sucks sometimes.
But we, as an organization (if not as individuals), bear a good deal of
responsibility for the state we're in. A lot of it stems from the way we
have designed and deployed products in the past. Fixing that requires more
than more research and better testing infrastructure. And perpetuating the
meme that the community is afraid of change and that's why we can't have
nice things... certainly doesn't help.
Post by Ryan Lane
Post by Jonathan Morgan
Whether or not we (WMF) think it is fair that we have to listen to
"vocal
minorities" (Ryan's words), these voices often represent and influence the
sentiments of the broader, less vocal, contributor base in important ways.
And we won't be able to get people to accept our conclusions, however
rigorously we demonstrate them or carefully we couch them in scientific
trappings, if they think we're fundamentally incapable of building something
worthwhile, or deploying it responsibly.
Yeah. Obviously it's necessary to not ship broken or very buggy code, but
that's a different story. It's also a lot easier to know if your code is
broken when you A/B test it before it's shipped. It should be noticeable
from the metrics, or the metrics aren't good enough.
Post by Jonathan Morgan
We can't run our product development like "every non-enterprise software
company worth a damn" (Steven's words), and that shouldn't be our goal. We
aren't a start-up (most of which fail) that can focus all our resources on
one radical new idea. We aren't a tech giant like Google or Facebook, that
can churn out a bunch of different beta products, throw them at a wall and
see what sticks.
What's your proposal that's somehow better than what most of the rest of the
sites on the internet are doing? Maybe you can't do exactly what they're
doing due to lack of resources, but you can at least do the basics.
My proposal is that we should follow a more participatory design process.
Better tools and research are necessary, but insufficient. And the
"consulting" model that Quora uses isn't appropriate to Wikimedia.
It sounds to me like you and Steven think that we can build faster and
better if we distance ourselves more from the community--abstracting their
experience as metrics, and limiting their participation to consultation.
But I don't think that what's slowing us down is our efforts to work with
communities around what we deploy, where we deploy it, and when. I think
what slows us down is that we constantly say that we're open and
collaborative, but often fail to be open and collaborative when it matters
most. This engenders mistrust, which makes it harder for us to experiment,
delays deployments, results in buggier, less usable, and less useful
products, and virtually guarantees that many of our core users are going to
defer or actively resist adopting what we build.
a) evidence driven product development: using quantitative and qualitative
research to decide what to build and how to build it
b) a transparent, iterative, and participatory process: telling people
what intend to build, when and under what circumstances we intend to deploy
it, and consistently addressing the feedback we get from people at every
stage, in good faith
We won't ever succeed with a) if we don't show that we can implement b)
consistently.
Post by Ryan Lane
Post by Jonathan Morgan
And we're not a commercial community-driven site like Quora or Yelp,
which
can constantly monkey with their interface and feature set in order to
maximize ad revenue or try out any old half-baked strategy to monetize its
content. There's a fundamental difference between Wikimedia and Quora. In
Quora's case, a for-profit company built a platform and invited people to
use it. In Wikimedia's case, a bunch of volunteers created a platform,
filled it with content, and then a non-profit company was created to support
that platform, content, and community.
I don't understand how you can say this. This is exactly how fundraising at
WMF works and it's been shown to be incredibly effective. WMF is most likely
the most effective organization in the world at large-scale small donations.
It's this way because it constantly tests changes to see what's more
effective. It does this using almost exactly the methodology I'm describing.
Why can't we bring a little bit of this awesomeness into the rest of the
engineering organization?
Fundraising is great! I love fundraising. And not just because they pay my
salary--they have great research and an enviable testing infrastructure.
But tracking the performance of banners that drive monetary contributions
is a fundamentally different task from tracking the performance (<-- not
sure that word even applies) of a whole new default UI that fundamentally
changes the way both casual readers and dedicated editors interact with
Wikipedia. Fundraising products, and the process by which we design and
evaluate them, aren't representative of our big software products like
Mobile site/apps, Content Translation, VE, Flow, etc.
That's why I'm pushing on your "we can make it work through A/B testing"
thesis around deploying something as radical and complex as Winter, as
opposed to iterating on Vector. A whole new skin affects everyone's
experience of the site in complex and multifaceted ways; there's no single
(or even primary) metric of performance. And we can't expect to short-cut
the design process or short-circuit community involvement. The only way out
is through.
Jonathan
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Nihiltres
2015-07-29 09:02:31 UTC
Permalink
Post by Monte Hurd
Hey all!
"But a new default skin isn't just reader-facing; it's everyone-facing. Making things easier, more engaging, or more delightful for non-editors isn't going to do us much good if it makes things harder, less engaging, or less delightful for editors."
My question is, could mediawiki use one skin when editing (Vector, or rename it "Vector-Editing") and a copy of Vector ("Vector-Reading" or something) when not editing (i.e. when reading)? They'd be initially identical, but going forward they could begin to slowly diverge as required by their respective editing and reading flows. There'd just have to be a mechanism to switch at the appropriate time... in theory.
-Monte
No, that's a bad idea. Editing is the core feature of Wikipedia. The interface when not editing should *scream* editability. Disentangle reading from editing, and we risk exacerbating the existing problem of recruiting newbies: it would make it harder for them to acclimatize if there's a big interface shift on top of everything else they have to learn (citing, neutrality, you name it). For example, VisualEditor is an effort to reduce the existing shift, by (largely) removing wikitext from the list of things necessary to learn.


Nihiltres

Liam Wyatt
2015-07-28 12:28:14 UTC
Permalink
Thanks for cc'ing me Jonathan, I wouldn't have seen this otherwise.

TL;DR - Objectively measurable criteria. Clear process. No surprises.

The context of my giving the example of Vector as a good example *of
process* was after the presentation about the future of 'Flow' at
Wikimania.[1] I highly recommend people read the slides of this session if
you've not already - great stuff![2] In particular, I was talking about how
the Usability Initiative team were the first to use an opt-in Beta process
at the WMF. It was the use of iterative development, progressive rollout,
and closed-loop feedback that made their work a successful *process*. I
wasn't talking about the Vector skin per se.

Significantly, they had a publicly declared and measurable criterion for
determining what counted as "community acceptance/support". This criterion
was an 80% retention rate of opt-in users. They did not lock down the features
of one version of their beta and move to the next version until they could
show that 80% of people who tried it preferred it. Moreover, they stuck to
this objective criterion for measuring consensus support all the way to the
final rollout.[3]

This system was a great way to identify people who had the willingness to
change but had concerns, as opposed to getting bogged down by people who
would never willingly accept a change or people who would accept all
changes regardless. It also meant that those people became 'community
advocates' for the new system because they had positive experiences of
their feedback being taken into account.

And I DO remember the process, and the significance that was attached to it
by the team (which included Trevor Parscal), because in 2009 I interviewed
the whole team in person for the Wikipedia Weekly podcast.[4] Far from
"*looking at the past through rose coloured glasses*", I recall the specific
pain-points on the day that the Vector skin became the default. These were
the inter-language links list being autocollapsed, and the Wikipedia logo
being updated.[5] The fact that it was THESE things that caused all the
controversy on the day that Vector went from Beta to opt-out is
instructive. These were the two things that were NOT part of the Beta
testing period - no process, and surprises. The people who had valid
feedback had not been given an opportunity to provide it, so it came
instead in the form of swift criticism on mailing lists.[6]

My support for the concept of a clearly defined, objectively measured rollout
*process* for new features is not new... When Fabrice announced "beta
features" in November 2013, I was the first to respond - referring to the
same examples and telling the same story about the Usability Initiative's
processes.[7]

Then, as now, the "beta features" tab lists the number of users who have
opted in to a tool, but there is no comparative/objective explanation of
what that number actually means! For example, it tells me that 33,418 people
have opted in to "Hovercards", but is that good? How long did it take to reach
that level? How many people have switched it off? What proportion of the
active editorship is that? And most importantly - what relationship does
this number have to whether Hovercards will 'graduate' from or 'fail' the
opt-in Beta process?
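As a rough illustration of the kind of comparative report that would answer
those questions - the 33,418 opt-ins above come from the Beta features tab,
but every other number and threshold here is an invented placeholder - the
'graduation' check could be as simple as this sketch:

    # Hypothetical sketch only: apart from the 33,418 opt-ins quoted
    # above, every figure is an invented placeholder, not real data.
    ever_opted_in = 33_418        # from the Beta features tab
    still_opted_in = 28_000       # hypothetical: users who kept it on
    active_editors = 75_000       # hypothetical: monthly active editors
    graduation_threshold = 0.80   # the old Usability Initiative criterion

    retention = still_opted_in / ever_opted_in
    share_of_active_editors = ever_opted_in / active_editors

    print(f"Retention of opt-in users:  {retention:.1%}")
    print(f"Opt-outs (churn):           {1 - retention:.1%}")
    print(f"Share of active editorship: {share_of_active_editors:.1%}")
    print("Graduates from Beta:", retention >= graduation_threshold)

Publishing something like that per feature, per month, would turn a raw
opt-in count into an answerable question.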

Which brings me to the point I made to Jonathan, and also Pau, at Wikimania
about the future of Flow.
If there are two things we Wikimedians hate most, I've come to believe
they are:
1) The absence of a clear process, or a failure to follow that process
2) Being surprised

We can, generally, abide outcomes/decisions that we don't like (e.g.
article-deletion debates) as long as the process by which the decision was
arrived at was clearly explained and objectively followed. I believe this
is why there was so much anger and frustration about the 'autoconfirm
article creation trial' on en.wp [8] and the 'superprotect' controversy -
because they represented a failure to follow a process, and a surprise
(respectively).

So, even more than the Vector skin or even the Visual Editor, Flow
ABSOLUTELY MUST have a clear, objectively measurable *process* for
measuring community consensus, because it will be replacing
community-designed and community-operated workflows (e.g. [9]). This means
that once it is enabled on a particular workflow:
1) an individual user can't opt back out to the old system, and
2) it will most affect, and be most used by, admins and other
very active users.
Therefore, I believe that this development must be an iterative process of
working on 1 workflow on 1 wiki at a time, with objective measures of
consensus-support that are at least partially *determined by the affected
community itself*. This will be the only way that Flow can gain community
consensus for replacing the existing
template/sub-page/gadget/transclusion/category-based workflows.[10]

Because Flow will be updating admin-centric workflows, if it is rolled out
in a way that is anything less than this, it will strike the community
as hubris - "it is necessary to destroy the town in order to save it".[11]

-Liam / Wittylama

P.S. While you're at it, please make ALL new features go through "Beta
features" with some consistent/discoverable process. As it is, some
things live there permanently in limbo, some things DO have a process
associated with them, and some things bypass the beta system
altogether. As bawolff said, this means people feel they don't have any
influence over the rollout process and therefore choose not to be
involved at all.[12]

[1]
https://wikimania2015.wikimedia.org/wiki/Submissions/User(s)_Talk(ing):_The_future_of_wiki_discussions

[2]
https://wikimania2015.wikimedia.org/wiki/File:User(s)_Talk(ing)_-_Wikimania_2015.pdf
[3] https://blog.wikimedia.org/2010/05/13/a-new-look-for-wikipedia/
[4] Sorry - I can't find the file anymore though. This was the page:
https://en.wikipedia.org/wiki/Wikipedia:WikipediaWeekly/Episode76
[5] https://blog.wikimedia.org/2010/05/13/wikipedia-in-3d/
[6]
https://commons.wikimedia.org/wiki/Talk:Wikipedia/2.0#Logo_revisions_need_input
[7]
https://lists.wikimedia.org/pipermail/wikimedia-l/2013-November/128896.html
[8]
https://en.wikipedia.org/wiki/Wikipedia:Autoconfirmed_article_creation_trial
[9]
https://wikimania2015.wikimedia.org/w/index.php?title=File:User(s)_Talk(ing)_-_Wikimania_2015.pdf&page=4
[10]
https://wikimania2015.wikimedia.org/w/index.php?title=File:User(s)_Talk(ing)_-_Wikimania_2015.pdf&page=8
[11] https://en.wikipedia.org/wiki/B%E1%BA%BFn_Tre#Vietnam_War
[12] https://lists.wikimedia.org/pipermail/design/2015-July/002355.html


wittylama.com
Peace, love & metadata
Post by Jonathan Morgan
Post by Ryan Lane
For instance, if a change negatively affects an editor's workflow, it should
be reflected in data like "avg/p95/p99 time for x action to occur", where x
is some normal editor workflow.
That is indeed one way you can provide evidence of correlation; but in
live deployments (which are, at best, quasi-experiments
<https://en.wikipedia.org/wiki/Quasi-experiment>), you seldom get results
that are as unequivocal as the example you're presenting here. And
quantifying the influence of a single causal factor (such as the impact of
a particular UI change on time-on-task for this or that editing workflow)
is even harder.
Knowing that something occurs isn't the same as knowing why. Take the
English Wikipedia editor decline. There has been a lot of good research on
this subject, and we have confidently identified a set of factors that are
likely contributors. Some of these can be directly measured: the decreased
retention rate of newcomers; the effect of early, negative experiences on
newcomer retention; a measurable increase over time in phenomena (like
reverts, warnings, new article deletions) that likely cause those negative
experiences. But none of us who have studied the editor decline believe
that these are the only factors. And many community members who have read
our research don't even accept our premises, let alone our findings.
I'm not at all afraid of sounding pedantic here (or of writing a long-ass
wall of text), because I think that many WMF and former-WMF participants in
this discussion are glossing over important stuff: Yes, we need a more
evidence-based product design process. But we also need a more
collaborative, transparent, and iterative deployment process. Having solid
research and data on the front-end of your product lifecycle is important,
but it's not some kind of magic bullet and is no substitute for community
involvement in product design (through the lifecycle).
We have an excellent Research & Data
<https://wikimediafoundation.org/wiki/Staff_and_contractors#Research_and_Data>
team. The best one we've ever had at WMF. Pound-for-pound, they're as good
as or better than the Data Science teams at Google or Facebook. None of
them would ever claim, as you seem to here, that all you need to build good
products are well-formed hypotheses and access to buckets of log data.
I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in
case he doesn't follow this list). We talked about strategies for deploying
new products on Wikimedia projects: what works, what doesn't. He held up
the design/deployment process for Vector as an example of *good* process,
one that we should (re)adopt.
Vector was created based on extensive user research and community
consultation[1]. Then WMF made a beta, and invited people across projects
to opt-in and try it out on prototype wikis[2]. The product team set public
criteria for when it would release the product as default across
production projects: retention of 80% of the Beta users who had opted in,
after a certain amount of time. When a beta tester opted out, they were
sent a survey to find out why[3]. The product team attempted to triage the
issues reported in these surveys, address them in the next iteration, or
(if they couldn't/wouldn't fix them), at least publicly acknowledge the
feedback. Then they created a phased deployment schedule, and stuck to
it[4].
This was, according to Liam (who's been around the movement a lot longer
than most of us at WMF), a successful strategy. It built trust, and engaged
volunteers as both evangelists and co-designers. I am personally very eager
to hear from other community members who were around at the time what they
thought of the process, and/or whether there are other examples of good WMF
product deployments that we could crib from as we re-assess our current
process. From what I've seen, we still follow many good practices in our
product deployments, but we follow them haphazardly and inconsistently.
Whether or not we (WMF) think it is fair that we have to listen to "vocal
minorities" (Ryan's words), these voices often represent *and influence*
the sentiments of the broader, less vocal, contributor base in important
ways. And we won't be able to get people to accept our conclusions, however
rigorously we demonstrate them or carefully we couch them in scientific
trappings, if they think we're fundamentally incapable of building
something worthwhile, or deploying it responsibly.
We can't run our product development like "every non-enterprise software
company worth a damn" (Steven's words), and that shouldn't be our goal. We
aren't a start-up (most of which fail) that can focus all our resources on
one radical new idea. We aren't a tech giant like Google or Facebook that
can churn out a bunch of different beta products, throw them at a wall, and
see what sticks.
And we're not a commercial community-driven site like Quora or Yelp, which
can constantly monkey with its interface and feature set in order to
maximize ad revenue or try out any old half-baked strategy to monetize its
content. There's a fundamental difference between Wikimedia and Quora. In
Quora's case, a for-profit company built a platform and invited people to
use it. In Wikimedia's case, a bunch of volunteers created a platform,
filled it with content, and then a non-profit company was created to
support that platform, content, and community.
Our biggest opportunity to innovate, as a company, is in our design
process. We have a dedicated, multi-talented, active community of
contributors. Those of us who are getting paid should be working on
strategies for leveraging that community to make better products, rather
than trying to come up with new ways to perform end runs around them.
Jonathan
1.
https://usability.wikimedia.org/wiki/What%27s_new,_questions_and_answers#How_was_it_decided_that_these_changes_would_be_implemented.3F
2. https://usability.wikimedia.org/wiki/Prototype
3. https://usability.wikimedia.org/wiki/Beta_Feedback_Survey
4. https://usability.wikimedia.org/wiki/Releases/Default_Switch
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Brian Wolff
2015-07-27 23:05:21 UTC
Permalink
Post by Brian Wolff
Post by Brian Wolff
Data does not prove things "good". Data proves (or, more likely, provides
some support for but does not prove) some objective hypothesis. Proving
normative claims with objective data is pretty impossible.
Post by Brian Wolff
That may sound pedantic, but I think it's an important distinction.
Evidence should be presented in the form of "This change improved
findability of the edit button by 40% among anons in our experiment [link to
details]. Therefore I/we believe this is a good change, because I/we think
that findability of the edit button is important". Separating what the data
proves from what are personal opinions about the data is important to make
the "science" sound legitimate and not manipulated.
It sounds pedantic because it is :). Good/bad in my proposal was targeting
the hypothesis, not the moral concept of good/bad. Good = the hypothesis is
shown to be effective; bad = the hypothesis is shown to be ineffective.
At the risk of being a bit nitpicky here, if that's the case, then when
you say "...there's always some vocal minority that will hate change,
even when it's presented with data proving it to be good", what you
really mean is "...there's always some vocal minority that will hate
change, even when it's presented with data proving that there exists
some hypothesis about the change that can be shown to be in effect".

Similarly, when you say "The data is the voice of the community. It's
what proves if an idea is good or bad.", what you really mean is "The data
is the voice of the community. It's what proves if an idea has a hypothesis
which has been shown to be in effect or not be in effect."

While I tend to agree with the versions of these statements where
"good" means a hypothesis is in effect, I don't think they make for a
very compelling argument.
Post by Brian Wolff
What you've ignored in my proposal is the part where the community input is
part of the formation of the hypothesis. I also mentioned that vocal
minorities should be ignored with the exception of questioning the
methodology of the data analysis.
Fair enough. While I don't think data "experiments" should be the
be-all and end-all, they're certainly a useful tool. I agree that
community input in hypothesis formation and methodology critique is
vital to making the best use of that tool.
Post by Brian Wolff
Anecdotal data should be used as a means of following up on experiments, but
should not be considered in the data set as it's an unreliable source. If
there's a large amount of anecdotal data coming in, it's something that
should be part of the standard data set. There's obviously exceptions to
this, but assuming there's enough data it should be possible to gauge the
effectiveness of changes without relying on anecdotal data.
For instance, if a change negatively affects an editor's workflow, it should
be reflected in data like "avg/p95/p99 time for x action to occur", where x
is some normal editor workflow.
Say we wanted to improve discoverability of the edit button for new
users. So we put it in <blink> tags. This pisses off everyone for the
obvious reason. How would we measure user aggravation?
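(The mechanical half of that proposal is easy enough, to be fair - here's a
sketch with entirely made-up timings for some hypothetical workflow. It's
the aggravation part that no percentile will show you.)

    # Sketch with invented data: avg/p95/p99 seconds for one editor
    # workflow, e.g. from clicking "edit" to a saved revision.
    import random

    random.seed(0)
    timings = [random.lognormvariate(3.5, 0.6) for _ in range(5000)]

    def percentile(values, pct):
        # simple nearest-rank approximation
        ordered = sorted(values)
        rank = int(round(pct / 100 * (len(ordered) - 1)))
        return ordered[min(rank, len(ordered) - 1)]

    avg = sum(timings) / len(timings)
    print(f"avg {avg:.1f}s  p95 {percentile(timings, 95):.1f}s"
          f"  p99 {percentile(timings, 99):.1f}s")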

--bawolff
Quim Gil
2015-07-28 12:27:41 UTC
Permalink
Post by Jon Robson
I'd love to have a go at making a new skin based on Winter's ideas in
my spare time with a fixed header, but given that I have no confidence
it will ever get on the cluster I have no motivation to do this. Where
is Apex deployed for example [3]? Why can't I try this out on
Wikipedia and see if I prefer the experience?
mediawiki.org is in the cluster and, as I learned in the past weeks,
experimentation with optional skins should be fine there. It's a first step.

S and I are in the process of requesting the availability of the Blueprint
skin as optional and experimental in mediawiki.org --
https://phabricator.wikimedia.org/T93613. Having Apex (and/or Bluesky,
Foreground, etc.) join the party would be very useful.

Whenever a new "unsolicited redesign" (not the most welcoming and
encouraging term) shows up, we quickly point out problems such a design
would suffer in real use. However, it is not easy at all to iron out the
problems a MediaWiki skin might run into without enabling it on a real
wiki with real users and a real collection of extensions.

More skins available in mediawiki.org would bring movement and progress to
Vector and friends. If a skin wins adoption and excitement in mediawiki.org,
it will only be a matter of time before other projects request it as equally
optional and experimental. This might be a motivation for designers and
frontend developers currently frustrated with discussions like this one to
work on Vector or its alternatives.
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
MZMcBride
2015-07-21 22:44:14 UTC
Permalink
I strongly agree with what Isarra wrote. She's wise.
Post by Ryan Lane
Post by Isarra Yos
Aye, we do need to move on. But there are also lessons in what has
lingered all this time - we need to look at it and understand why in
order to properly address it and serve the underlying needs. This is
why we iterate on what's there, and don't only make drastically new
things.
Do we actually know the lessons? Are they listed anywhere? Are they valid
anymore? Do modern web practices cover them?
We need to do better about this.
Post by Ryan Lane
It's great to iterate on things when they are relatively modern. It's
folly to do so when you're almost a decade behind the industry standard.
The argument itself is odd because Vector has not been iterating steadily
towards modern practices. It's been stagnant for years.
And this.
Post by Ryan Lane
The reader community is massive and has no voice, except their complaints
across the internet. The WMF can and should be the voice for the reader
community.
This is bullshit. "Decisions are made by those who show up." If you want
to be part of the discussion, all you have to do is participate in good
faith. That's how I'm involved, that's how you're involved, that's how
Isarra and Nemo and Risker and nearly everyone else is involved. Pro-tip:
that's not only how Wikipedia works, that's how life works too.
Post by Ryan Lane
Isn't a motto of the movement "Be bold"? What happened to that?
Have you read the English Wikipedia page lately? It's a nightmare. :-)

https://en.wikipedia.org/wiki/Wikipedia:Be_bold

I keep meaning to cut it back at some point. The various namespace
restrictions are such silliness. In any case, for a long time it's been
"be bold, but not reckless." A big top-down redesign (not that you're
directly proposing such, I'm speaking generally) would be reckless.
Post by Ryan Lane
The status quo is that change never happens because people are too scared
to change. There's no boldness here. There's hardly even basic
assertiveness.
Yeah, the community has put in place some protections to ensure that it
doesn't get trampled by a bunch of product managers sitting in San
Francisco. I won't apologize for that, it's a feature, not a flaw.
Post by Ryan Lane
Post by Isarra Yos
"The community is scared of change" seems to be a common excuse from
those too scared to work with communities outside of their own.
Or an argument of those who think it's not in the readers' best interest
to have editors with little to no knowledge of software engineering or UX
design dictating the engineering and design of reader features.
Encyclopedias are only supposed to be written by experts, too, right? :-)
We're getting into trope territory here.
Post by Ryan Lane
There's not really a conversation. The UX lead is saying "Winter is dead,
let's continue with the iterations on Vector", though there's no real
iteration going on. The editor community is opposed to any change that
doesn't completely agree with them, where the "them" is around 5,000
people who also can't agree with each other and aren't qualified to be
making the decisions to begin with.
What would you like to see changed in Vector? Concrete suggestions. For
me, I'd like to see it become a responsive skin (in the process, killing
MobileFrontend) and I'd like to see some of the gradients removed (or at
least re-evaluated). Those are concrete, actionable items that will likely
get resolved this year. Your turn!

MZMcBride
bawolff
2015-07-22 10:20:57 UTC
Permalink
Post by Ryan Lane
The community you're talking about is the editor community, which is a tiny
fraction of the overall community, but attempts to speak with authority over
the entirety of it. The vocal portion of the editor community that speaks
with this authority is even a minor fraction of the editor community. We're
talking about .001% of the entire community that holds the entire movement
hostage (5167 people voted in the last election, and there's 430 million
monthly active readers).
The reader community is massive and has no voice, except their complaints
across the internet. The WMF can and should be the voice for the reader
community.
In my experience, the WMF lacks the ability (or perhaps maturity?) to
be that voice. Every time someone invokes the readers, it's usually to
re-assert their personal opinions on the matter because they're
losing an argument. After all, it's not like the readers are going to
rise up and object that their voice is being appropriated. If it were
possible for computer programmers to magically know what their users
wanted, without gathering any evidence, computer programming would
be an entirely different field. As far as I know, misunderstanding
user requirements is one of the top reasons software projects fail.
The WMF has certainly severely misjudged the requirements of the editor
community at times; why would it be any better at judging the reader
community's?
Post by Ryan Lane
I've also volunteered my time for the past 10 years, but as an engineer. I
care about Wikimedia more as a reader than as an editor and my experience as
a reader is not great and the editor community is the primary reason for
this. The WMF's hesitation to make change is heavily based on the pitchforks
and torches lit by this community.
Blame is easy to throw around. You can just as easily say that the
problem is due to the WMF viewing the community as a problem to be
worked around, creating an antagonistic relationship that degrades
everyone's interests.

--
-bawolff
Ryan Lane
2015-07-24 05:40:01 UTC
Permalink
Post by bawolff
Post by Ryan Lane
The reader community is massive and has no voice, except their complaints
across the internet. The WMF can and should be the voice for the reader
community.
In my experience, the WMF lacks the ability (or perhaps maturity?) to
be that voice. Every time someone invokes the readers, it's usually to
re-assert their personal opinions on the matter because they're
losing an argument. After all, it's not like the readers are going to
rise up and object that their voice is being appropriated. If it were
possible for computer programmers to magically know what their users
wanted, without gathering any evidence, computer programming would
be an entirely different field. As far as I know, misunderstanding
user requirements is one of the top reasons software projects fail.
The WMF has certainly severely misjudged the requirements of the editor
community at times; why would it be any better at judging the reader
community's?
What I'm saying is that there should be a process for making an interface
change directed at readers: state the success criteria up front, A/B test
the change, and adopt it if the results meet those criteria. The editor
community should have little to no say in the process, except to suggest
experiments or question obviously incorrect test results.

The basic idea is that through proper testing of features you should be able
to know an experience is better for the readers without them having a direct
voice.

An example: Make search more discoverable. Add a feature or make an
interface change to test this. A/B test it. See if the frequency of search
usage increased. See if it adversely affected other metrics. If it helped
search usage and didn't negatively affect other metrics, adopt the change.
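To make that concrete, here's a minimal sketch of one common way to read
such a result - a two-proportion z-test on the share of sessions that used
search. The counts are entirely invented, and the choice of test is just
illustrative, not a claim about how WMF analytics actually works:

    # Hypothetical sketch: none of these counts are real; they only show
    # the shape of the decision. Guardrail metrics (edits started,
    # pageviews per session, etc.) would be checked the same way.
    from math import erf, sqrt

    control_sessions, control_searches = 100_000, 12_100
    variant_sessions, variant_searches = 100_000, 12_900

    p1 = control_searches / control_sessions
    p2 = variant_searches / variant_sessions
    pooled = (control_searches + variant_searches) / (
        control_sessions + variant_sessions)
    se = sqrt(pooled * (1 - pooled)
              * (1 / control_sessions + 1 / variant_sessions))
    z = (p2 - p1) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal approx.

    print(f"search rate: control {p1:.2%}, variant {p2:.2%}")
    print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")

If the lift is significant and no guardrail metric regressed, adopt the
change; otherwise iterate or revert.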

The issue is that there will be a vocal minority of people who absolutely
hate this change, no matter what it is. These people should be ignored.
Post by bawolff
Blame is easy to throw around. You can just as easily say that the
problem is due to the WMF viewing the community as a problem to be
worked around, creating an antagonistic relationship that degrades
everyone's interests.
I think there's a lot of blame to be thrown around, but the editor community
is who's being worked around - and they should be worked around when changes
are meant to affect readers and those changes don't directly and negatively
affect editor metrics.

Of course, all of this should be backed up by data, and it's surely a
failing of the WMF that their development process isn't data-driven.

- Ryan
David Abián
2015-07-21 19:22:53 UTC
Permalink
I would like to point out that editors perhaps feel some kind of
'nostalgia' for their skin, but I also see that editors get little help
when they want to configure or improve the design of 'their' wiki: CSS
and JS pages have to be maintained by a few admins without expertise in
web design (I'm one of them), with poor results (in other cases, those
pages simply go unmaintained).

Wikipedia is visually outdated, and the success of many companies
offering redesigns for Wikipedia (and the media coverage those services
attract) makes this problem obvious. But being outdated is not the
whole problem.

Why can big pictures on the articles overflow? Wouldn't it be simple to
add a "max-width:100%"? Too many failures remain after too many years...
--
David Abián - [[User:Abián]]
http://davidabian.com

Wikimedia España
http://www.wikimedia.es
Derk-Jan Hartman
2015-07-22 11:04:50 UTC
Permalink
Post by David Abián
Why can big pictures on the articles overflow? Wouldn't it be simple to
add a "max-width:100%"? Too many failures remain after too many years...
I'm trying out something like this:
https://en.wikipedia.org/w/index.php?title=User%3ATheDJ%2Fvector.css&type=revision&diff=672564266&oldid=580604132

But as I expected it is breaking Template:Panorama, Template:Wide_image etc.

Once more we come to the point where we REALLY need CSS stylesheets per
template, to make sure we can simply change SOMETHING.

DJ
Derk-Jan Hartman
2015-07-22 11:39:53 UTC
Permalink
This is better, since it limits impact to small resolution screens...

https://en.wikipedia.org/w/index.php?title=User%3ATheDJ%2Fvector.css&type=revision&diff=672568310&oldid=580604132

DJ


On Wed, Jul 22, 2015 at 1:04 PM, Derk-Jan Hartman <
Post by Derk-Jan Hartman
Post by David Abián
Why can big pictures on the articles overflow? Wouldn't it be simple to
add a "max-width:100%"? Too many failures remain after too many years...
https://en.wikipedia.org/w/index.php?title=User%3ATheDJ%2Fvector.css&type=revision&diff=672564266&oldid=580604132
But as I expected it is breaking Template:Panorama, Template:Wide_image etc.
Once more we come to the point where we REALLY need CSS stylesheets per
template, to make sure we can simply change SOMETHING.
DJ