Hello all,
As I mentioned in the "ID this plant" section, I recently sequenced the entire genome of a Utricularia that grows around here (Fairbanks, AK).
Judging from the pictures I posted in the ID thread, we seem to be dealing with U. macrorhiza, but the data I collected should confirm this.
I am opening this thread to share my work with you and to discuss ideas for what I could do with the genome. In particular, I would like to know what you would like to see in a publication. Is there a question we could investigate? Is there an ambiguity we could resolve? I would really like to enrich the publication of this genome with the interests of passionate people. My idea is that, beyond the scientific and technical scope of this project, there may be open questions that are well known to people who love and grow carnivorous plants but that escape laypeople. I have been out of the carnivorous plant hobby for too long, and I think I might be missing some relevant perspectives.
So, in this post I will describe as much as I can of where I am at: the plant, the site where I collected it, and my preliminary results. I will keep you posted as I move forward with the data.
1- The plant:
Please find the pictures on this thread: https://www.terraforums.com/forums/identify-that-plant-/142413-utricularia-wetlands-alaska.html
Blooming starts at the beginning of July and lasts about 10 days. I observed only one type of flower, which led me to suspect that a single species lives at the spot I observe. There, I found individuals of various sizes, which makes me think that some individuals are older than others. For the sequencing, I used an individual from the "young" population. It had more green parts, which reassured me that I would extract fresh DNA from it. This individual consisted of 3 stems, each about 8 cm long. I used 2 of them for the DNA extraction and kept the third alive in case I needed to extract more DNA.
2- The site:
A wetland for sale in the west part of town, at the bottom of the Chena hills. Very brackish water. I located a big colony of the plant in a part that is more than 70 cm deep, and the colony is about as thick. I posted pictures of it in the ID thread.
Lakes and bodies of still water start to freeze at the end of September and freeze down to 2 m deep. Thus, I guess that the wetland where the Utricularia grows freezes completely. The thaw starts in April and is complete by the end of May.
3- Utricularia's genome background.
In the NCBI database, we can find:
-complete chloroplast genome sequences from U. gibba, U. reniformis, and U. macrorhiza
-complete mitochondrial genome sequences from U. gibba and U. reniformis
-chromosome-level sequences of the complete genome of U. gibba (maybe also U. reniformis, to be confirmed)
-various proteins from ~15 Utricularia species (mostly mitochondrial).
Rough description of these sequences:
-Chloroplast genomes are about 140-150 kilobase pairs (kbp) long
-Mitochondrial genomes are ~860 kbp
-14 chromosomes expected: 4 entirely assembled for U. gibba (3 to 8 megabase pairs, Mbp, each) and the 10 others fragmented. U. gibba's genome is about 100 Mbp in total, and I expect roughly the same size for our Alaskan plant.
3- The sequencing:
Before the DNA extraction, I "rinsed" the plant in 5 baths of clean water to remove the mud, which would have interfered with the chemistry of the DNA preparation. However, this doesn't mean that the plant was sterile. Thus, the extracted DNA likely came mostly from the plant but also from the microorganisms associated with it (especially those living in the bladders).
I extracted the DNA from 2 stems, including some bladders. I set aside 4 x 10 bladders that I froze at -80 °C in RNAlater solution, in case I find funding to do transcriptomics of the bladders (and of the microbial communities living in them). Transcriptomics is the analysis of the expressed genes, which is different from genome sequencing, which only looks at the encoded genes (not necessarily expressed).
The sequencing was done with Oxford Nanopore technology on a MinION device. I collected 2.5 million reads (sequenced DNA fragments), representing 10.8 gigabase pairs (Gbp) of nucleotides. I filtered the most reliable reads and ended up with 1.6 million of them, representing about 7.5 Gbp. Thus, if the genome is about 100 Mbp long, the filtered reads should cover it about 75 times (7.5 Gbp / 100 Mbp = 75x), which is good for the reliability of the results. However, organelle genomes often dominate over the nuclear chromosomes in sequencing data, because each cell contains many copies of the mitochondrial and plastid genomes. Thus, we may end up with a high coverage of the mitochondrial and plastid genomes and a lower coverage of the nuclear chromosomes.
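For those curious about the numbers, here is a minimal Python sketch of read filtering by length and mean quality, followed by the coverage arithmetic. It is only an illustration, not the filtering tool used for this dataset: the FASTQ parsing, the 1000 bp length cutoff and the Q10 mean-quality cutoff are assumptions chosen for the example. The mean quality is computed by averaging per-base error probabilities rather than raw Phred scores, so a few very bad stretches pull the score down as they should.

```python
import math
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    seq: str
    qual: str  # Phred+33 quality string, one character per base


def parse_fastq(lines):
    """Yield Read objects from FASTQ lines (assumes well-formed 4-line records)."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield Read(header[1:].split()[0], seq, qual)


def mean_quality(qual: str) -> float:
    """Average the per-base error probabilities, then convert back to a Phred score."""
    if not qual:
        return 0.0
    errors = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(errors) / len(errors))


def filter_reads(reads, min_len=1000, min_q=10.0):
    """Keep reads that are long enough and reach the (assumed) mean-quality cutoff."""
    return [r for r in reads if len(r.seq) >= min_len and mean_quality(r.qual) >= min_q]


def expected_coverage(total_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is covered by the reads."""
    return total_bases / genome_size
```

With the figures above, `expected_coverage(7_500_000_000, 100_000_000)` gives 75.0, and 10.8 Gbp spread over 2.5 million reads works out to a mean read length of 4320 bp before filtering.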
4- Preliminary results:
The sequenced DNA fragments are like the pieces of a giant jigsaw puzzle, and the sequencing itself was the easy part. The assembly of the genome is the tricky part: it consists of finding the overlaps between the different fragments, assessing their reliability, and merging them. This is the bioinformatic work I am facing now. As I said previously, my first task was to filter the best fragments to decrease the computing workload and improve the quality of the output. So far, I have assembled the genome with 2 different programs, and I am now assessing the quality of these 2 assemblies. This work involves big files, command-line programs on a supercomputer, and me trying to find my way through all of it. In other words: it is not a fast process.
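To make the jigsaw idea concrete, here is a toy Python sketch of "find the overlaps, then merge": it repeatedly joins the two reads with the longest suffix/prefix overlap. This greedy approach is only an illustration on short, error-free strings; real long-read assemblers such as canu correct sequencing errors and build overlap graphs instead, and the minimum overlap of 3 is an arbitrary choice for the example. Reads fully contained in another read are not handled.

```python
def overlap_len(a: str, b: str, min_olap: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_olap)."""
    start = 0
    while True:
        start = a.find(b[:min_olap], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_olap=3):
    """Merge the best-overlapping pair until no overlaps remain; return the contigs."""
    reads = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap_len(a, b, min_olap)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return reads
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
```

For example, the four overlapping reads `"ATTAGACCTG"`, `"CCTGCCGGAA"`, `"AGACCTGCCG"` and `"GCCGGAATAC"` collapse into the single contig `"ATTAGACCTGCCGGAATAC"`.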
One program (minimap) gave me an assembly of 140 Mbp and the other (canu) gave me 205 Mbp. Both are larger than the ~100 Mbp of U. gibba, but keep in mind that these assemblies also contain bacterial genomes, and the Alaskan plant might have a slightly bigger genome than U. gibba. My longest assembled fragment is 4.6 Mbp, but it is likely the genome of a bacterium, not a plant chromosome.
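Total size and longest fragment are usually reported together with the N50: the length such that contigs at least that long contain half of the assembled bases. A minimal Python sketch that computes these from a FASTA file (the file name in the usage line is hypothetical):

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(contig_lengths):
    """Count, total length, longest contig and N50 of an assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # half of all assembled bases reached
            n50 = length
            break
    return {"count": len(lengths), "total": total,
            "longest": lengths[0] if lengths else 0, "n50": n50}
```

Usage would look like `assembly_stats(fasta_lengths(open("assembly.fasta")))`. Comparing the N50 of the two assemblies is a quick first check, although a larger N50 does not guarantee a more correct assembly.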
One aspect of my work will be to "decontaminate" my assemblies, i.e. to remove the genomes of bacteria that were sequenced along with the plant. I found a great tool for that, and I will soon assess whether it is better to do the decontamination before or after the assembly.
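As a rough illustration of the idea (not the tool mentioned above), one crude first pass is to flag contigs whose GC content falls outside the range expected for the plant, since bacterial genomes often have a different base composition. The 30-45% range below is a hypothetical placeholder, not a measured value for Utricularia; dedicated decontamination tools combine GC with read coverage and taxonomic hits, and GC alone will miss bacteria whose composition resembles the plant's.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def flag_suspect_contigs(contigs: dict, gc_range=(0.30, 0.45)):
    """Names of contigs whose GC fraction lies outside the (hypothetical) plant range."""
    low, high = gc_range
    return [name for name, seq in contigs.items()
            if not low <= gc_fraction(seq) <= high]
```

For instance, with `{"ctg1": "ATGCATGCAT", "ctg2": "GGGCCCGGAT"}` only `ctg2` (80% GC) is flagged.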
5- Ideas for a paper:
To publish a paper, the basic requirement is to describe the techniques used for this sequencing.
Additional points to explore for the paper:
5.1 - I was thinking of trying to reconstruct a phylogeny of Utricularia. Whether this is possible will depend on whether such a phylogeny has been published recently and on the data available in the NCBI database for the other Utricularia species.
5.2 - I would like to look for the telomeres, as was done for U. gibba. That would tell us whether we managed to reconstruct entire chromosomes.
5.3 - Maybe compare this genome to U. gibba's.
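For the telomere idea (5.2), a simple first check is to count telomeric repeat copies near each end of every contig: a contig with repeats at both ends is a candidate complete chromosome. This Python sketch assumes the Arabidopsis-type TTTAGGG repeat found in most plants (an assumption to verify for Utricularia) and uses a hypothetical 1000 bp window. It counts the motif in both orientations, because on the forward strand the repeat reads TTTAGGG at the right end and CCCTAAA at the left end.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def telomere_counts(seq: str, motif: str = "TTTAGGG", window: int = 1000) -> dict:
    """Count non-overlapping motif copies (either strand) in the first and last window bp."""
    seq = seq.upper()
    rc = revcomp(motif)
    head, tail = seq[:window], seq[-window:]
    return {
        "start": head.count(motif) + head.count(rc),
        "end": tail.count(motif) + tail.count(rc),
    }
```

On a made-up contig `"CCCTAAA" * 5 + "ACGT" * 100 + "TTTAGGG" * 4` with `window=50`, this reports 5 copies at the start and 4 at the end.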
This is where I am at right now. Please feel free to comment and suggest ideas to explore!
I will keep you posted at every significant step of the bioinformatic work.