Tuesday, May 28, 2013

Bioinformatics or Bust!

Ok, so that was more than a day’s hiatus I took from the QIIME tutorial, but I had a hooding ceremony to attend! My folks just left Sunday, it was a great visit :)

Today I also sent my dissertation to my committee… and immediately afterward discovered more formatting errors, after battling formatting issues for a week. LeSigh.

Back to the QIIME tutorial using the VirtualBox and Ubuntu interface. Today I learned the following:

1. How to convert a .sff file to a .fna file in FASTA format via QIIME (http://qiime.org/scripts/process_sff.html). It’s easy. In your terminal window, make sure you’re in the directory (cd) that contains the .sff file. Then use the command

process_sff.py -i lakes454.sff

That will convert the input file (-i) named lakes454.sff, which is what 454 pyrosequencing gives you, into two outputs: lakes454.fna and lakes454.qual. The .fna file holds all the 454 sequences in FASTA format. The .qual file tells you about the quality of the bases/sequences. A full description can be found under “Quality Scores” here: http://qiime.org/tutorials/tutorial.html
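
A quick way to see how many sequences ended up in a FASTA file is to count its header lines, since every record starts with a line beginning with ">". This is only a sketch: demo.fna and its two reads are made up for illustration, and you would point grep at lakes454.fna instead.

```shell
# Made-up two-read FASTA file standing in for lakes454.fna (assumption, not real data).
printf '>read1\nACGTACGT\n>read2\nTTGGCCAA\n' > demo.fna

# Count the header lines; each FASTA record has exactly one line starting with ">".
grep -c '^>' demo.fna
```
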

Also, to get info about a script (.py), type in
process_sff.py -h

This will bring up the help file. It tells you all the inputs and outputs you will get. You can also type in
process_sff.py

by itself and get general info about what this script does.

2. Making a mapping file. BY HAND. Yes, I’ve been given the sample name, barcode sequences, forward primer, and reverse primer in a .txt file… but not in the format required by QIIME. So, to do it by hand, I am following the instructions here:

It’s not horrible… okay yes it was. I named the file Lakes_Map1.txt. I started it in Excel with each heading as a separate column, then copied and pasted the sample IDs and barcodes into the appropriate columns, etc. I tried to save it as a tab-delimited .txt file, and Excel added double quotation marks to the .txt file.

I deleted all the double quotation marks by hand in the .txt file itself and saved it again. (Gotta be a better way to do this, but all I could find online were Excel macros and I don’t use those yet.)
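
One possible shortcut for the quote cleanup, sketched here with the standard tr command in the same terminal: it deletes every double-quote character and leaves the tabs alone. The file names quoted_demo.txt and unquoted_demo.txt and the sample row are made up for illustration, and this assumes no field needs to keep a literal double quote.

```shell
# Made-up file standing in for Excel's tab-delimited export (assumption, not real data).
printf '"#SampleID"\t"BarcodeSequence"\t"LinkerPrimerSequence"\t"Description"\n"Lake1"\t"ACGT"\t"GGTT"\t"demo"\n' > quoted_demo.txt

# Delete every double-quote character; tabs and line breaks are left untouched.
tr -d '"' < quoted_demo.txt > unquoted_demo.txt

# Look at the cleaned file.
cat unquoted_demo.txt
```

Writing to a new file instead of overwriting the original keeps the Excel export around in case something goes wrong.
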
I then checked my mapping file to see if there were any formatting errors in it using the command:

check_id_map.py -m Lakes_Map1.txt -o Lakes_output

This generated a new folder entitled Lakes_output with some files in it. The .html file tells you where the errors are and the _corrected.txt file tries to correct them for you. I deleted the …. that were inserted where the errors in my file were, resaved the file as Lakes_Map1.txt (deleted all the old ones), and reran the check_id_map.py command. This time there were no errors. Yay!
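
A rough pre-check that might catch some hand-editing slips before running check_id_map.py is to print how many tab-separated columns each line has. This is only a sketch and not a substitute for the QIIME check: columns_demo.txt and its rows are made up, with the last row deliberately missing a field.

```shell
# Made-up mapping file; the Lake2 row is deliberately one column short (assumption, not real data).
printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\nLake1\tACGT\tGGTT\tdemo1\nLake2\tTGCA\tGGTT\n' > columns_demo.txt

# Print the line number and the number of tab-separated fields on each line.
awk -F'\t' '{ print NR ": " NF " columns" }' columns_demo.txt
```

Any line whose column count differs from the header line is worth a second look.
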

So this is as far as I got today. Not bad! I’m also using the QIIME tutorial files they give to try the new commands first and then using the Lakes data given to me by a co-worker to try these things on REAL data. I feel that’s the only way I am going to learn this process. Today’s bioinformatics “workout” took about 2.25 hours with all the errors and doing stuff by hand.

Signing off for today. Time for a real physical workout.

Monday, May 20, 2013

My journey onward

This week I am being "Hooded". Basically that means I wear some robes (rented) and my mentor and the Dean of my college say a few words about me at the doctoral ceremony. My parents are coming to Delaware from Colorado because they wanted to take pictures...
My actual defense is June 11th.
I'll be happy to see my parents this weekend, regardless.

Why am I not more excited to be getting my doctorate after 27 years of academia (since I started preschool at age 4 in India)? I'm not sure... I think it's because of the following:
1. impending sense of DOOM due to the collapse of the American economy
2. impending sense of DOOM because science doesn't seem to matter to most people
3. impending sense of DOOM due to not having a job lined up yet
4. impending sense of DOOM because I don't know bioinformatics.

So what can I do about this sense of DOOM? Well, probably nothing about number 1 or 2 short term. For number 3, I am applying for everything that interests me, so I'm bound to get lucky sometime.
Number 4 is obvious: teach myself bioinformatics.
It's a painful process, though, since it requires expertise in computers, scripting/programming, understanding many openly available programs and data-harvesting sites, and lots of time.
Today I made some progress in this! I am trying to teach myself QIIME (pronounced "chime", apparently), and I am using their free tutorial to do so. It took me hours just to download and install the VirtualBox and get it running. I had to make space on my hard drive by backing up/deleting old pictures, and I had to figure out how to turn on virtualization technology in the BIOS of my laptop. Finally I got to typing in commands and having QIIME do things for me using their example files.

All I did today was check to make sure the mapping file was properly formatted. Generally you get a file like this from 454 sequencing of 16S rRNAs. The program gave me a corrected file which looks good. YAY.
I will continue with this tomorrow. For now I have to go and format my dissertation. It must be sent to the committee on Monday next week.