Tree curation sprint

We held a data curation sprint on Friday, November 30, from 9:30 to 11:30, during which all project personnel set aside time to upload trees and data into Phylografter. Everyone on the project was expected to participate. The goals were twofold:
  1. Test the data upload interface of Phylografter
  2. Produce a set of input trees for the synthesis group to use for testing

This sprint generated a list of bugs and feature requests.

Before Friday morning

1. Register in Phylografter

To upload data, your OpenID (from Google, Yahoo!, or another OpenID provider) must be authorized in Phylografter. Most PIs and other personnel are already authorized; if you aren't, please contact Richard Ree.

2. Set up Mendeley

You should have a Mendeley account and be a member of the OpenTree group. See here for Mendeley information. If studies you want aren't yet in the Mendeley group, add them!

3. Review project documentation

For more background, see the minutes of meetings where the sprint has been discussed; it is also mentioned here.

Also see the Trello board.

Instructions for Friday

We are going to use an etherpad to communicate, add notes, etc.

To avoid race conditions (two people working on the same data set), we use this spreadsheet. Before you start entering a study in Phylografter, check the spreadsheet to make sure no one else has picked it; if no one has, enter it in the spreadsheet.

Using Phylografter

The Phylografter site has upload instructions. Trees can be loaded from Treebase (easier) or entered from outside Treebase (harder).

Upload from Treebase
Go to www.reelab.net/phylografter to upload matrices and trees from published studies, whether or not they are in Treebase. To get to the upload form, mouse over 'Studies' at the top left; the 'New study' and 'Import from Treebase' options will drop down.

[I expect the GUI will be mostly self-explanatory.]

Upload not from Treebase
For information on file formats when not importing from Treebase, see https://github.com/nexml/nexml/wiki/NeXML-Manual

[When uploading, could one use a bare DOI as the bibliographic reference, instead of having to enter the full bibliographic entry manually? If not, what is the preferred bibliographic format? Maybe give an example in the GUI?]

Name resolution
The 'tips' or 'leaves' of each tree must resolve to taxa known to the system (?-OTToL). There is currently no way to add taxa beyond this set. If a tip in a tree you are entering is not known, you may have to re-enter the tree later, once this software shortcoming is addressed.

Tips will usually be known: most tips represent sequences, most sequences are deposited in GenBank, GenBank resolves species names to entries in NCBI Taxonomy, and ?-OTToL knows every species name in NCBI Taxonomy (and many others besides). The system will canonicalize synonyms. For a homonym such as Spirella (which is a plant, a bryozoan, and a foram), you may have to select which sense applies to the data set.

Phylografter's starting species list is internally called 'pre-OTToL' and derives from {list of source taxonomies}. This taxonomy may be updated or replaced in the future, but such changes will not invalidate curation performed now.
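One way to spot tips that may not resolve, before entering a tree, is to pull the tip names out of the tree and query them with the TNRS service described below. The following is a minimal shell sketch under stated assumptions, not part of Phylografter: the function names newick_tips and newick_query_string are hypothetical, and the code assumes a single-line Newick string with unquoted labels and no [comments], with underscores standing for spaces as in the usual Newick convention.

```sh
#!/bin/sh
# Sketch only: newick_tips and newick_query_string are hypothetical helpers.
# Assumes a single-line Newick string with unquoted tip labels and no
# [comments]; underscores in labels are read as spaces.

# Print each tip label on its own line. Blanks between tokens are dropped
# first (unquoted Newick labels cannot contain them). A tip label directly
# follows "(" or ","; labels after ")" belong to internal nodes and are skipped.
newick_tips() {
  printf '%s\n' "$1" |
    tr -d ' \t' |
    grep -oE '[(,][^(),:;]+' |
    sed -e 's/^[(,]//' -e 's/_/ /g'
}

# Join the tip labels with ", " so they can be used as a TNRS queryString.
newick_query_string() {
  newick_tips "$1" | paste -sd, - | sed 's/,/, /g'
}
```

For example, given ((Ulva_lactuca:0.1,Lilium_martagon:0.2)90:0.05,Mus_musculus:0.3); newick_query_string prints Ulva lactuca, Lilium martagon, Mus musculus, which can be used as the queryString value. Trees with quoted labels or comments would need a real Newick parser.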

Cody says (29 October): [Obviously things will be different by the time the sprint starts.]

Well, curl is arguably the easiest way to interact with the service right now. There may be other, better ways that I don't know about, but just plugging a URL into a web browser won't work because the browser won't format the request properly. Once things have stabilized, I can put up a web form GUI.

For now though, if you want to try it, it is pretty easy. On a Mac you can just copy and paste the example curl calls directly into the terminal (and/or edit the list of names to be queried). If you use Windows, I think you might have to download a curl utility...

Regarding the request formatting: if you want to query different names, just change the value of the "queryString" property and leave everything else the same. Names are separated by commas and can contain spaces. For example, replace Ulva, Lilium martagon, Mus musculus musculus in the call below with any list of names.

curl -X POST http://opentree-dev.bio.ku.edu:7474/db/data/ext/TNRS/graphdb/doTNRSForNames \
  -H "Content-Type: application/json" -d '{"queryString":"Ulva, Lilium martagon, Mus musculus musculus"}'
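When querying many names, or names that contain double quotes, editing the JSON body by hand gets error-prone. The following is a minimal sketch, assuming a POSIX shell with curl installed; build_tnrs_payload and query_tnrs are hypothetical helpers, not part of the TNRS service, and the development URL used above may no longer be live.

```sh
#!/bin/sh
# Sketch only: build_tnrs_payload and query_tnrs are hypothetical helpers,
# not part of Phylografter or the TNRS service.

# Development endpoint from the example above; override by setting TNRS_URL.
TNRS_URL="${TNRS_URL:-http://opentree-dev.bio.ku.edu:7474/db/data/ext/TNRS/graphdb/doTNRSForNames}"

# Print the JSON body for doTNRSForNames, taking one name per argument.
# Names are joined with ", " to match the format of the example above.
build_tnrs_payload() {
  joined=""
  sep=""
  for name in "$@"; do
    # Escape backslashes, then double quotes, so a name cannot break the JSON string.
    escaped=$(printf '%s' "$name" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    joined="$joined$sep$escaped"
    sep=", "
  done
  printf '{"queryString":"%s"}\n' "$joined"
}

# POST the names to the TNRS service and print its raw JSON response.
query_tnrs() {
  curl -X POST "$TNRS_URL" \
    -H "Content-Type: application/json" \
    -d "$(build_tnrs_payload "$@")"
}

# Example (sends a network request):
#   query_tnrs Ulva "Lilium martagon" "Mus musculus musculus"
```

The example call sends the same query as the curl command above. Keeping the JSON construction in its own function makes it easy to inspect the request body before anything is sent.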

What next


Coming in 2013: creation of synthetic overall trees formed by combining trees that have overlapping sets of tip names; use of taxonomies to create synthetic trees where phylogenetic trees are unavailable; GUI visualization and download of these trees; and much more. See Requirements gathering.