Ongoing project: sequencing the genome of an Alaskan Utricularia

Anne-Lise · Aug 20, 2017

Hello all,

As I mentioned in the "ID this plant" section, I recently sequenced the entire genome of a Utricularia that grows around here (Fairbanks, AK).
From the look of the pictures I posted in the ID thread, it seems that we're dealing with U. macrorhiza, but this remains to be confirmed with the data I collected.

I am opening this thread to share my work with you and to discuss ideas for what I could do with the genome. I would like to ask what you would like to see in a publication. Is there any question that we might investigate? Is there any ambiguity that we could resolve? I would really like to enrich the publication of this genome with the interests of passionate people. My idea is that, beyond the scientific and technical scope of this project, there may be questions that are well known to people who love and grow carnivorous plants but that escape laypeople. I have been out of the carnivorous plant hobby for too long and I think I might be missing some pertinent perspectives.

So, in this post I will describe as much as I can about where I am: the plant, the site where I collected it and my preliminary results. I will keep you posted as I move forward with the data.

1- The plant:

Please find the pictures on this thread: https://www.terraforums.com/forums/identify-that-plant-/142413-utricularia-wetlands-alaska.html
The blooming starts at the beginning of July and lasts about 10 days. I observed only one type of flower, which leads me to suspect that just one species lives in the spot that I observe. There, I found individuals of diverse sizes, which makes me think that some individuals are older than others. For the sequencing, I used an individual from the "young" population. It had more green parts, which ensured that I would extract fresh DNA from it. This individual was made of 3 stems. I used 2 of them for the DNA extraction and kept one to keep the plant going, in case I needed to extract more DNA. Each stem was about 8 cm long.

2- The site:

Wetlands for sale located in the west part of town at the bottom of the Chena hills. Very brackish water. I located a big colony of the plant in a part that is more than 70 cm deep. The colony is that thick too. I posted pictures of it in the ID thread.
The lakes and bodies of still water start to freeze at the end of September. They freeze down to 2 m deep. Thus, I guess that the wetlands where the Utricularia is located freeze completely. The thaw starts in April and is complete by the end of May.

3- Utricularia's genome background.

In the NCBI database, we can find:
-complete chloroplast sequences from: U. gibba, U. reniformis, U. macrorhiza
-complete mitochondrial sequences from: U. gibba, U. reniformis
-chromosome sequences of the complete genome of U. gibba (maybe U. reniformis - to be confirmed)
-diverse proteins from ~15 species of Utricularia (mostly mitochondrial).

Rough description of these sequences:

-Chloroplast genomes are about 140-150 kilobase pairs (kbp) long
-Mitochondrial genomes: ~860 kbp
-14 chromosomes expected: 4 entirely assembled for U. gibba (3 to 8 megabase pairs, Mbp, long) and 10 others fragmented. U. gibba's genome is 100 Mbp long in total. I expect roughly the same size for our Alaskan plant.

3- The sequencing:

Before the DNA extraction, I "rinsed" the plant in 5 baths of clean water to remove the mud that would have interfered with the chemistry of the DNA preparation. However, this doesn't mean that the plant was sterile. Thus, the extracted DNA likely came mostly from the plant but also from microorganisms associated with it (especially those living in the bladders).
I extracted the DNA from 2 stems and included some bladders. I set aside 4 x 10 bladders that I froze at -80 °C in "RNAlater" solution in case I came upon some funds to do the transcriptomics of the bladders (and of the microbial communities that live in them). Transcriptomics = analysis of the expressed genes, which is different from genome sequencing, which looks at all the genes encoded (not necessarily expressed).
The sequencing was done with Oxford Nanopore technology on a MinION device. I collected 2.5 million reads (fragments of DNA), which represent 10.8 gigabase pairs (Gbp) of nucleotides. I kept the most reliable reads and ended up with 1.6 million of them, representing about 7.5 Gbp. Thus, if the genome is about 100 Mbp long, the filtered reads should cover it about 75 times, which is good for the reliability of the results. However, organelle genomes often dominate over the nuclear chromosomes in sequencing runs. Thus, it is possible that we'll end up with a high coverage of the mitochondrial and plastid genomes and a lower coverage of the nuclear chromosomes.
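For anyone who wants to check the coverage arithmetic, here is a minimal Python sketch. The 7.5 Gbp and 100 Mbp figures come from this post; the function name and the 0.5 "usable fraction" in the second call are purely illustrative assumptions, not measured values.

```python
def expected_coverage(total_bases, genome_size, usable_fraction=1.0):
    """Average depth = bases attributed to the genome / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases * usable_fraction / genome_size


filtered_bases = 7.5e9  # ~7.5 Gbp kept after filtering (figure from the post)
genome_size = 100e6     # ~100 Mbp, assumed from U. gibba

# Every filtered base counted towards the plant genome
print(expected_coverage(filtered_bases, genome_size))

# Hypothetical case: only half of the bases come from nuclear DNA
print(expected_coverage(filtered_bases, genome_size, usable_fraction=0.5))
```

The second call only illustrates how quickly the nuclear coverage drops if organelle or microbial DNA takes a large share of the reads; the real share is not known at this stage.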

4- Preliminary results:

The fragments of DNA that have been sequenced are the pieces of a giant jigsaw, and the sequencing itself was the easy part. The assembly of the genome is the tricky part: it consists of finding the overlaps between the different fragments, assessing their reliability and merging them. This is the bioinformatics work that I am facing now. As I said previously, my first task was to filter the best fragments to decrease the computing workload and improve the quality of the output. So far, I have assembled the genome with 2 different programs and I am in the process of assessing the quality of these 2 assemblies. This work involves big files, command-line programs on a supercomputer and me trying to find my way through all this. In other words: it is not a fast process.
One program (minimap/miniasm) gave me an assembly of 140 Mbp and the other (Canu) gave me 205 Mbp. This is a bit different from the 100 Mbp of U. gibba, but keep in mind that these assemblies encompass bacterial genomes too, and the Alaskan plant might have a slightly bigger genome than U. gibba. My longest assembled fragment is 4.6 Mbp, but it is likely the genome of a bacterium, not a plant chromosome.
One aspect of my work will be to "decontaminate" my assemblies: removing the genomes of bacteria that were sequenced along with the plant. I found a great tool for that and I will soon assess whether it is better to do the decontamination before or after the assembly.
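The numbers quoted above (total assembly size, longest fragment) are the kind of summary statistics usually compared between assemblies, often together with the N50. Below is a minimal Python sketch of how they can be computed from a FASTA file's content. It is not the method used for the figures in this post; the function names and the toy sequences are made up for illustration.

```python
def contig_lengths(fasta_text):
    """Return the length of every sequence in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Total size, longest contig and N50 for a list of contig lengths."""
    if not lengths:
        return {"total": 0, "longest": 0, "n50": 0}
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # N50: length of the contig at which half of the assembly is reached
        if running * 2 >= total:
            n50 = length
            break
    return {"total": total, "longest": ordered[0], "n50": n50}


# Toy assembly with contigs of 10, 6, 4 and 4 bases (hypothetical data)
toy = ">contig1\nACGTACGTAC\n>contig2\nACG\nTAC\n>contig3\nACGT\n>contig4\nACGT\n"
print(assembly_stats(contig_lengths(toy)))
```

For a real assembly, the text would simply be read from the FASTA file written by the assembler; for very large files, streaming the file line by line would be kinder on memory.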

5- Ideas for a paper:

To publish a paper, the basic requirement is to describe the techniques used for this sequencing.

Additional points to explore for the paper:

5.1 - I was thinking of seeing whether reconstructing a phylogeny of Utricularia is possible. It will depend on whether such a thing has been published recently and on the data deposited in the NCBI database for the other species of Utricularia.

5.2 - I thought that I would try to look for the telomeres, as was done for U. gibba. That will tell us whether we managed to reconstitute entire chromosomes.
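A rough idea of what such a telomere check could look like, as a Python sketch. It assumes the Arabidopsis-type plant telomere repeat TTTAGGG; whether that motif applies to this plant is an assumption to verify, and the function names, window size and minimum repeat count are arbitrary choices for illustration.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomere_ends(seq, motif="TTTAGGG", window=200, min_repeats=3):
    """Tell whether each end of `seq` carries a tandem run of `motif`.

    The motif is searched on both strands within the first and the last
    `window` bases. The default motif is an assumption (Arabidopsis-type).
    """
    seq = seq.upper()
    rc = reverse_complement(motif)
    pattern = re.compile(
        "(?:%s){%d,}|(?:%s){%d,}" % (motif, min_repeats, rc, min_repeats)
    )
    return {
        "start": bool(pattern.search(seq[:window])),
        "end": bool(pattern.search(seq[-window:])),
    }


# Toy contigs (hypothetical data): one capped at both ends, one at a single end
complete = "CCCTAAA" * 5 + "ACGT" * 50 + "TTTAGGG" * 5
partial = "ACGT" * 50 + "TTTAGGG" * 5
print(telomere_ends(complete))
print(telomere_ends(partial))
```

A contig with telomeric repeats at both ends is a candidate for a complete chromosome, while a contig with repeats at only one end probably stops somewhere inside a chromosome. Noisy Nanopore reads can blur the repeats, so an exact-match search like this one would most likely miss some real telomeres.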

5.3 - maybe compare this genome to U. gibba's

This is where I am right now. Please feel free to comment and suggest ideas to explore!

I will keep you posted at every significant step of the bioinformatics work.

Tanukimo · Aug 21, 2017

Maybe this paper can help you? https://academic.oup.com/aob/article-abstract/doi/10.1093/aob/mcx056/3904474/Phylogeny-of-the-orchid-like-bladderworts-gen?redirectedFrom=fulltext There is also a brief section about the phylogeny of the Lentibulariaceae in the Genlisea monograph.

Anne-Lise · Aug 21, 2017

Well, thank you Tanukimo, it helps indeed. This phylogeny is pretty recent. Thus, it is not necessary for me to reconstruct one unless I come to radically different results.

emc2 · Aug 23, 2017

It probably already popped up in your PubMed alerts, but if not, http://www.nature.com/articles/s41598-017-08461-5.pdf could be an interesting starting point too.

Anne-Lise · Aug 24, 2017

@emc2: Yeah, I watched the talk they gave at London Calling in 2016. It got me really excited. I was not aware that they did a Nature paper with it. It's awesome! Thank you!!!
I haven't read the paper yet, but I think they used an older generation of the chemistry, which means that we get even better results now.
I'm training with the Utricularia in order to do this kind of sequencing in the field too, with endangered species. I think it's totally doable for small genomes like Utricularia's. Right now, the challenge for me is the bioinformatics that follows. The good news is that Utricularia's genome doesn't seem to be very complex (very few repeat regions). Thus the assembly software does not struggle.
Now, I found out something really exciting this week about another species of Utricularia: some sort of symbiosis. I might have some results regarding that by the end of the week. If that is the case, I'll update this thread.

patrickntd · Aug 24, 2017

Hi Anna,

Re: sequencing, I think you may also need to deal with DNA from fungi and algae in your resulting dataset, since they commonly live together with Utrics in nature.

Re: points to explore:
I think the phylogeny and the comparison to gibba should be done; they are straightforward and informative. It will be even more interesting if you can correlate that with the geographic information. In addition, I am interested to know how this species adapts to the specific environment in AK (freezing cold and short daytime in winter) based on the genome. For example, do the proteins involved in energy utilization, digestion and reproduction have SNPs that adapt them to cold temperatures?

Just my 2 cents. Looking forward to your publication.

Patrick

emc2 · Aug 24, 2017

Anne-Lise said:
Right now, the challenge for me is the bioinformatics that follows. The good news is that Utricularia's genome doesn't seem to be very complex (very few repeat regions). Thus the assembly software does not struggle.

What software do you use with the MinION? Can you use an OLC one, as you end up with long reads?
Unfortunately I have no experience with genomes, I only do exomes and transcriptomes, but I can always have a look at whether I can run it on our HPC cluster.

Anne-Lise · Aug 25, 2017

emc2 said:
What software do you use with the MinION? Can you use an OLC one, as you end up with long reads?
Unfortunately I have no experience with genomes, I only do exomes and transcriptomes, but I can always have a look at whether I can run it on our HPC cluster.

I assembled the genome with Canu and Minimap/miniasm to compare the results of the 2 assemblers. They might be OLCs. They're pretty standard for Nanopore reads. The assemblies went fast and pretty well, I guess. I've got no Illumina reads, so I polished with the Nanopore reads and Racon (another Nanopore-specialized program), and I will also apply a couple of Nanopolish iterations (also designed for Nanopore reads).
I am currently testing binning tools to sort the contigs of the plant from those of the microbiota associated with it. For that, I'm using BusyBee and MetaProb on the assemblies but also on the reads, to see the effect of binning before assembly. As the plant assembles pretty well, I do not rule out redoing the assembly from scratch if I see that binning the reads is beneficial. In that case, I will also focus on the other major bins that may be of significance.
In general for this sequencing, there is no big difficulty except the steep curve of learning Bash, Linux, HPC and the tools. I'm the bottleneck here :-D but it's actually quite interesting to think about the pipeline and make comparisons.
I appreciate your offer and I will contact you for sure if I manage to do the transcriptomics of the bladders, or maybe for the annotation of the genome. I would certainly benefit from your experience! That's awesome, thank you!

emc2 · Aug 25, 2017

I was where you are a few years ago. It's definitely interesting to learn, but it takes a lot of time!

I will PM you my pipeline, which mostly uses SLURM scheduler commands and bash to launch the actual tools.

Anne-Lise · Aug 25, 2017

patrickntd said:
Hi Anna,

Re: sequencing, I think you may also need to deal with DNA from fungi and algae in your resulting dataset, since they commonly live together with Utrics in nature.

Excellent point! I found this paper a couple of days ago and I am definitely keeping my eyes open for Tetrahymena and its algae: https://www.ncbi.nlm.nih.gov/pubmed/27613221
A month ago, when I wasn't aware of this paper, I blasted some reads and I remember seeing some Ciliophora hits, which looks promising now. These past couple of days I looked at the taxonomic assignments of my contigs within prokaryotes (BusyBee does that quickly and nicely). I've got an almost complete genome of a Flavobacterium, but there are also some pseudomonas, comamonadales, methanococcales... But definitely, my interest has now shifted to the Ciliophora and its algae. Fingers crossed, I should know pretty soon!

patrickntd said:
Re: points to explore:
I think the phylogeny and the comparison to gibba should be done; they are straightforward and informative. It will be even more interesting if you can correlate that with the geographic information. In addition, I am interested to know how this species adapts to the specific environment in AK (freezing cold and short daytime in winter) based on the genome. For example, do the proteins involved in energy utilization, digestion and reproduction have SNPs that adapt them to cold temperatures?

This is also an interest of mine, and I am glad that you share my interest in cold adaptation. I understand that the turions help the plant get from one year to the next, but still, they stay frozen hard for 8 months a year (like the wood frogs, lol)! It is worth looking at antifreeze proteins and such. My lab does that with bacteria isolated from sea ice. Hopefully, I'll find a way to adapt this search to the plant.
Regarding a comparison to gibba, I was a bit afraid of it as I have never compared genomes, but if you say that it is easy, I am willing to give it a try! So I might need suggestions for the tool(s) to use. I can definitely find that in the literature, but if you have some experience, I would really appreciate your insight to pick an efficient one that doesn't take days to install and figure out.

Thank you, Patrick !

patrickntd · Aug 25, 2017

Anne,

Unfortunately, I don't have hands-on experience in bioinformatics, so I can't give you technical advice. I think you may not need a whole-genome comparison. Instead, you can compare the proteins or gene families of interest. Good luck.

Anne-Lise · Aug 28, 2017

Thank you @patrickntd. I hope that you're doing okay in Houston. My thoughts are with you and your loved ones.

patrickntd · Aug 29, 2017

Thank you very much.
