[Cz-biology] Auto-created articles about genes

Larry Sanger sanger at citizendium.org
Thu Sep 20 15:04:14 CDT 2007


Sorry, a sentence in there should read: "I don't know how likeLY it would be
in this particular case, but I wouldn't want to TAKE my chances
myself."

> -----Original Message-----
> From: cz-biology-bounces at mail.citizendium.org 
> [mailto:cz-biology-bounces at mail.citizendium.org] On Behalf Of 
> Larry Sanger
> Sent: Thursday, September 20, 2007 4:01 PM
> To: 'Biology Workgroup List'
> Subject: Re: [Cz-biology] Auto-created articles about genes
> 
> 
> Thanks for this, Andrew.  My comments:
> 
> > >********************************
> > > How does this effort relate to the many other gene
> > databases/portals
> > >already available?
> > 
> > There are two main advantages of this effort, both of which
> > stem from the fact that CZ is a wiki.  First, all the other 
> > gene portals (our SymAtlas database included) are primarily 
> > composed of tag-value pairs (e.g., symbol = "APP", function = 
> > "apoptosis", etc.).  Second, other gene portals are 99% 
> > one-way communication, from data providers to data consumers. 
> >  Of course, we all here know that wikis are a great 
> > complementary resource to these types of databases, allowing 
> > both free-text and user-contributed gene annotation.
> 
> That makes sense.
> 
> > >********************************
> > > since a parallel effort is intended at Wikipedia, will the
> > intent be
> > >substantially different?  Other than our use of subpages,
> > how might our
> > >articles/clusters differ from  Wikipedia's?  If they 
> wouldn't differ
> > >appreciably, is that a reason for us not to do it  (or, indeed, to 
> > >insist on doing it)?
> > 
> > I've been thinking about this issue quite a bit, since there
> > is a compelling argument that doing parallel efforts at CZ 
> > and WP dilutes
> > the impact and contributions of both.
> 
> Indeed, this is something you ought to think more about.  
> Most important is this consideration: you evidently want 
> geneticists to work systematically and much on these 
> articles.  But if that work is done in the context of 
> Wikipedia, there is an excellent chance it will simply go to 
> waste.  There are various reasons for this.  The geneticists 
> will grow disgusted, as very many experts do, with 
> Wikipedia, and stop working.  The work that they do then 
> "rots" on Wikipedia, as no one adequately knowledgeable 
> maintains it.  There is also this danger on CZ--but it would 
> be less of a danger, I think, in the long run.
> 
> You could also find articles, types of information, work of 
> particular individuals, etc., all in the crosshairs of 
> overzealous Wikipedia admins.  I don't know how like it would 
> be in this particular case, but I wouldn't want to make my 
> chances myself.  I mean, for example, if I wanted to upload a 
> database of information about great philosophers and 
> philosophical texts, say, I certainly wouldn't want it left in 
> the hands of Wikipedia admins.  You have to understand that 
> you, as an organization and as experts, *don't have any 
> official authority* on Wikipedia.  Decisions about your 
> information are not in your hands; they are ultimately in the 
> hands of people who are, I'm sorry to say, heavily anonymous 
> and immature.
> 
> Third, there are two complementary problems.  On the one 
> hand, you split geneticist participation in a wiki gene 
> encyclopedia between WP and CZ; on the other hand, you forgo 
> the possibility of a focused and unified effort in the expert 
> hands of CZ editors and processes.  Considering that most 
> wiki initiatives fail, period, this may be the most important 
> point of all.  I would also be much more apt to spend my own 
> time, recruiting geneticists, if the project were exclusively 
> a CZ project.  I wouldn't take so much interest, frankly, if 
> it were competing with WP.
> 
> In fact, and this is also important, if the CZ articles were 
> left largely untouched and the WP articles experienced some 
> development, I would want to delete them from CZ.  There's no 
> point in having two copies of this same sort of resource if 
> they aren't both moving forward.
> 
> In short, CZ is the right home for this sort of project, and 
> splitting the scientist population and the mindshare is, 
> frankly, a non-starter as far as I'm concerned.  Obviously, 
> though, this is up to you.
> 
> There is one other consideration.  It might be better to 
> begin life on CZ and, if there isn't enough interest, then 
> switch to the inferior solution. This is probably the best 
> way to maximize the success of your project--more than either 
> starting exclusively on WP, or splitting the difference.
> 
> > >********************************
> > > It is to be watched whether a pharma company might have any
> > commercial
> > >interest, even one not evident to you, in influencing the 
> content in
> > >any way  of an article they are involved with.
> > 
> > A valid point, and we welcome the scrutiny.  First, it's
> > worth pointing out that potential biases pertain to hand-made 
> > edits as well.  The fact that we're talking about a bot to 
> > make automated edits changes the number of contributions I'm 
> > (indirectly) making, and not the fact that I work for a 
> > company.  Unless CZ plans on excluding all contributors who 
> > work for commercial entities, then I think this comes down to 
> > a person-by-person evaluation of credentials when approving 
> > authorship and editorship and ongoing evaluation of contributions.
> 
> Well, I would emphasize a different point.  Insofar as a 
> pharma-funded organization is supplying the bot and data, we 
> can already see exactly what the information is they're 
> supplying, and the external links, etc.  We can RIGHT NOW 
> make a judgment if there is something unfair going on.  The 
> question isn't really whether a pharma company is benefitted; 
> the whole world would benefit from a kick-ass Citizendium.  
> The question is whether the data *unfairly* benefits an 
> entity and does so by our information unfairly preferring one 
> entity over another.
> 
> If you biologists, familiar with the players and resources 
> available, assure me this isn't the case based on the example 
> provided, I think I'm comfortable with the situation.
> 
> > Third, as was pointed out in an email that Larry forwarded,
> > the functions of the bot and the rules by which it operates 
> > are completely transparent.
> 
> Exactly.
> 
> > As I see it, the only
> > potential conflict of interest is the link from the gene 
> > stubs to SymAtlas (the free and public gene portal that we
> > created) and the SymAtlas images displayed on the "Gallery" 
> > subpage.**
> 
> The questions, clearly, are (1) whether there is another free (or
> very-commonly-subscribed-to) resource that is as good or 
> better.  Anyone know?  And (2) whether the (image) 
> information is actually useful to geneticists.
> 
> > ** it turns out that I actually didn't set up the APP example
> > stub how I'd really like to see it.  I intended to put a link 
> > directly back to SymAtlas, where additional gene expression 
> > data sets are available. Take a look at the WP pages linked 
> > above to see basically how I'd propose linking them here 
> > ("More reference expression data" link).
> > 
> > 
> > >********************************
> > > And what is the long-term plan here?  And why is the
> > license an issue?
> > 
> > Well, no one asked that first question, but it certainly
> > relates to the second.  Eventually I'd like to incorporate 
> > gene wiki content directly into SymAtlas (actually SymAtlas' 
> > successor, being developed now), including reciprocal links. 
> >  One link will take CZ/WP users to SymAtlas and its 
> > additional gene expression data sets.  Similarly, SymAtlas 
> > will display the community-contributed wiki content and link 
> > back to CZ/WP.
> 
> That's yet another reason, by the way, to have only one home 
> for these articles, and that it be CZ: here, geneticists can 
> act as editors, and someone who uses our data doesn't have to 
> negotiate between different CZ and WP versions of articles.
> 
> Personally, I don't have any problem with a corporation 
> profiting from CZ's information, as long as they--as in this 
> case--bring something significant to the table.
> 
> > >********************************
> > > And what are the next immediate steps?
> > 	
> > The next step for CZ will be to test whether the WP bot
> > will work with little/no modifications.  There were no 
> > objections from the CZ-Tools group, so we hope to do this in 
> > the next week or two.
> 
> That sounds good.
> 
> > The WP bot trial period is done, so we
> > expect to go into mass production mode there later this week. 
> 
> Again, I think that's a bad idea.  I would be forced to 
> reconsider my stance.
> 
> > Although hiccups aren't unexpected, I hope to have at least
> > a thousand or so automated and semi-automated WP edits done 
> > in the next month.  Not long after that, I hope to draft a 
> > manuscript to submit to an academic journal.  If the CZ bot 
> > test goes as expected, I think it would be possible to 
> > quickly catch up over here (assuming there continues to be 
> > support for it here and the licensing issue can be worked 
> > out) so that the CZ effort can also be mentioned/highlighted 
> > in the manuscript.
> 
> The licensing issue can be worked out very quickly, I think.  
> So far I've seen no objections from the biologists, and I 
> don't know that I would even ask the Editorial Council for 
> their opinion on the licensing question, frankly: such legal 
> questions ultimately must be decided by 
> the legal owners/trustees of the project.  Of course, if it 
> turned out to be extremely unpopular, my decision might be influenced.
> 
> --Larry
> 
> _______________________________________________
> Cz-biology mailing list
> Cz-biology at mail.citizendium.org 
> http://mail.citizendium.org/mailman/listinfo/cz-biology
> 


