[Cz-biology] Auto-created articles about genes

Larry Sanger sanger at citizendium.org
Thu Sep 20 15:01:13 CDT 2007


Thanks for this, Andrew.  My comments:

> >********************************
> > How does this effort relate to the many other gene databases/portals
> > already available?
> 
> There are two main advantages of this effort, both of which 
> stem from the fact that CZ is a wiki.  First, all the other 
> gene portals (our SymAtlas database included) are primarily 
> composed of tag-value pairs (e.g., symbol = "APP", function = 
> "apoptosis", etc.).  Second, other gene portals are 99% 
> one-way communication, from data providers to data consumers. 
>  Of course, we all here know that wikis are a great 
> complementary resource to these types of databases, allowing 
> both free-text and user-contributed gene annotation.

That makes sense.

> >********************************
> > since a parallel effort is intended at Wikipedia, will the intent be
> > substantially different?  Other than our use of subpages, how might our
> > articles/clusters differ from Wikipedia's?  If they wouldn't differ
> > appreciably, is that a reason for us not to do it (or, indeed, to
> > insist on doing it)?
> 
> I've been thinking about this issue quite a bit, since there 
> is a compelling argument that doing parallel efforts at CZ 
> and WP dilutes
> the impact and contributions of both.

Indeed, this is something you ought to think more about.  Most important is
this consideration: you evidently want geneticists to work systematically
and extensively on these articles.  But if that work is done in the context of
Wikipedia, there is an excellent chance it will simply go to waste.  There
are various reasons for this.  The geneticists will grow disgusted with
Wikipedia, as very many experts do, and stop working.  The work that they
do then "rots" on Wikipedia, as no one adequately knowledgeable maintains
it.  There is also this danger on CZ--but it would be less of a danger, I
think, in the long run.

You could also find articles, types of information, work of particular
individuals, etc., all in the crosshairs of overzealous Wikipedia admins.  I
don't know how likely it would be in this particular case, but I wouldn't want
to take my chances myself.  I mean, for example, if I wanted to upload a
database of information about great philosophers and philosophical texts,
say, I certainly wouldn't want it left in the hands of Wikipedia admins.  You
have to understand that you, as an organization and as experts, *don't have
any official authority* on Wikipedia.  Decisions about your information are
not in your hands; they are ultimately in the hands of people who are, I'm
sorry to say, heavily anonymous and immature.

Third, there are two complementary problems.  On the one hand, you split
geneticist participation in a wiki gene encyclopedia between WP and CZ; on
the other hand, you forgo the possibility of a focused and unified effort
in the expert hands of CZ editors and processes.  Considering that most wiki
initiatives fail, period, this may be the most important point of all.  I
would also be much more apt to spend my own time recruiting geneticists if
the project were exclusively a CZ project.  I wouldn't take so much
interest, frankly, if it were competing with WP.

In fact, and this is also important, if the CZ articles were left largely
untouched and the WP articles experienced some development, I would want to
delete them from CZ.  There's no point in having two copies of this same
sort of resource if they aren't both moving forward.

In short, CZ is the right home for this sort of project, and splitting the
scientist population and the mindshare is, frankly, a non-starter as far as
I'm concerned.  Obviously, though, this is up to you.

There is one other consideration.  It might be better to begin life on CZ
and, if there isn't enough interest, then switch to the inferior solution.
This is probably the best way to maximize the success of your project--more
than either starting exclusively on WP, or splitting the difference.

> >********************************
> > It is to be watched whether a pharma company might have any commercial
> > interest, even one not evident to you, in influencing the content in
> > any way of an article they are involved with.
> 
> A valid point, and we welcome the scrutiny.  First, it's 
> worth pointing out that potential biases pertain to hand-made 
> edits as well.  The fact that we're talking about a bot to 
> make automated edits changes the number of contributions I'm 
> (indirectly) making, and not the fact that I work for a 
> company.  Unless CZ plans on excluding all contributors who 
> work for commercial entities, then I think this comes down to 
> a person-by-person evaluation of credentials when approving 
> authorship and editorship and ongoing evaluation of contributions.

Well, I would emphasize a different point.  Insofar as a pharma-funded
organization is supplying the bot and data, we can already see exactly what
the information is they're supplying, and the external links, etc.  We can
RIGHT NOW make a judgment if there is something unfair going on.  The
question isn't really whether a pharma company is benefited; the whole
world would benefit from a kick-ass Citizendium.  The question is whether
the data *unfairly* benefits an entity and does so by our information
unfairly preferring one entity over another.

If you biologists, familiar with the players and resources available, assure
me this isn't the case based on the example provided, I think I'm
comfortable with the situation.

> Third, as was pointed out in an email that Larry forwarded, 
> the functions of the bot and the rules by which it operates 
> are completely transparent.

Exactly.

> As I see it, the only 
> potential conflict of interest is the link from the gene 
> stubs to SymAtlas (the free and public gene portal that we
> created) and the SymAtlas images displayed on the "Gallery" 
> subpage.**

The questions, clearly, are (1) whether there is another free (or
very-commonly-subscribed-to) resource that is as good or better.  Anyone
know?  And (2) whether the (image) information is actually useful to
geneticists.

> ** it turns out that I actually didn't set up the APP example 
> stub how I'd really like to see it.  I intended to put a link 
> directly back to SymAtlas, where additional gene expression 
> data sets are available. Take a look at the WP pages linked 
> above to see basically how I'd propose linking them here 
> ("More reference expression data" link).
> 
> 
> >********************************
> > And what is the long-term plan here?  And why is the license an issue?
> 
> Well, no one asked that first question, but it certainly 
> relates to the second.  Eventually I'd like to incorporate 
> gene wiki content directly into SymAtlas (actually SymAtlas' 
> successor, being developed now.), including reciprocal links. 
>  One link will take CZ/WP users to SymAtlas and its 
> additional gene expression data sets.  Similarly, SymAtlas 
> will display the community-contributed wiki content and link 
> back to CZ/WP.

That's yet another reason, by the way, to have only one home for these
articles, and that it be CZ: here, geneticists can act as editors, and
someone who uses our data doesn't have to negotiate between different CZ and
WP versions of articles.

Personally, I don't have any problem with a corporation profiting from CZ's
information, as long as they--as in this case--bring something significant
to the table.

> >********************************
> > And what are the next immediate steps?
> 	
> The next step as far as CZ goes will be to test whether the WP bot 
> will work with little/no modifications.  There were no 
> objections from the CZ-Tools group, so we hope to do this in 
> the next week or two.

That sounds good.

> The WP bot trial period is done, so we 
> expect to go into mass production mode there later this week. 

Again, I think that's a bad idea.  I would be forced to reconsider my
stance.

> Although hiccups aren't unexpected, I hope to have at least 
> a thousand or so automated and semi-automated WP edits done 
> in the next month.  Not long after that, I hope to draft a 
> manuscript to submit to an academic journal.  If the CZ bot 
> test goes as expected, I think it would be possible to 
> quickly catch up over here (assuming there continues to be 
> support for it here and the licensing issue can be worked 
> out) so that the CZ effort can also be mentioned/highlighted 
> in the manuscript.

The licensing issue can be worked out very quickly, I think.  So far I've
seen no objections from the biologists, and I don't know that I would even
ask the Editorial Council for their opinion on the licensing question,
frankly: such legal questions ultimately must be decided by the legal
owners/trustees of the project.  Of course, if it turned out to be extremely
unpopular, my decision might be influenced.

--Larry


