Ok, so that was more than a day’s hiatus I took from the
QIIME tutorial, but I had a hooding ceremony to attend! My folks just left
Sunday; it was a great visit :)
Today I also sent my dissertation to my committee… and
immediately afterward discovered more formatting errors, even after battling
formatting issues for a week. LeSigh.
Back to the QIIME tutorial using the VirtualBox and
Ubuntu interface. Today I learned the following:
1. How to convert a .sff file to .fna file in fasta
format via QIIME
(http://qiime.org/scripts/process_sff.html).
It’s easy. Basically, in your terminal window, make sure you’re in the directory
that contains the .sff file (use cd to get there). Then use the command
process_sff.py -i lakes454.sff
That will convert the input file (-i) named lakes454.sff,
which is what 454 pyrosequencing gives you, into two output files: lakes454.fna and
lakes454.qual. The .fna holds all the 454 sequences in fasta format, and the
.qual file tells you about the quality of the bases/sequences. A full
description can be found under “Quality Scores” here: http://qiime.org/tutorials/tutorial.html
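A quick way to sanity-check the conversion is to count the reads in the .fna file. The sketch below is plain Python and not part of QIIME. The function names read_fasta and summarize_fasta are made up for illustration, and it assumes a standard fasta layout where every record starts with a > line.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a new record starts, so hand back the previous one first
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(lines):
    """Return (number of reads, shortest length, longest length)."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    if not lengths:
        return 0, 0, 0
    return len(lengths), min(lengths), max(lengths)


# Usage on the file from this post (uncomment to run on real data):
# with open("lakes454.fna") as handle:
#     print(summarize_fasta(handle))
```

If the read count looks far off from what the sequencing run reported, that is a hint to look at the conversion step again before moving on.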
Also to get info about a script (.py) type in
process_sff.py -h
This will bring up the help file. It tells you all the
inputs and outputs you will get. You can also type in
process_sff.py
by itself to get general info about what this script
does.
2. Making a mapping file. BY HAND. Yes, I’ve been given
the sample name, barcode sequences, forward primer, and reverse primer in a
.txt file… but not in the format required by QIIME. So, to do it by hand, I am
following the instructions here:
It’s not horrible… okay yes it was. I named the file
Lakes_Map1.txt. Started it in Excel with each heading as a separate column.
Then copied and pasted the sample IDs and barcodes into the appropriate columns,
etc. Tried to save it as a tab-delimited .txt file, and Excel added double
quotation marks to the .txt file.
I deleted all the double quotation marks by hand in the .txt
file itself and saved it again. (Gotta be a better way to do this, but all I
could find online were Excel macros and I don’t use those yet.)
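For what it's worth, a few lines of Python can do the same cleanup without touching Excel macros. This is only a sketch: the function names and the input file name are hypothetical, it assumes the file can be read as UTF-8, and it removes every double-quote character, including any that were meant to be there.

```python
def strip_double_quotes(text):
    """Return text with every double-quote character removed."""
    return text.replace('"', "")


def clean_file(in_path, out_path):
    """Write a copy of in_path to out_path with all double quotes removed."""
    # Assumption: the file is UTF-8 (or plain ASCII); change encoding if not.
    with open(in_path, encoding="utf-8") as src:
        cleaned = strip_double_quotes(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)


# Example call; "Lakes_Map1_from_excel.txt" is a made-up name for the
# file Excel saved, and Lakes_Map1.txt is the cleaned copy.
# clean_file("Lakes_Map1_from_excel.txt", "Lakes_Map1.txt")
```

Writing to a new file instead of overwriting the original keeps the Excel export around in case something goes wrong.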
I then checked my mapping file to see if there were any
formatting errors in it using the command:
check_id_map.py -m Lakes_Map1.txt -o Lakes_output
This generated a new folder entitled Lakes_output with
some files in it. The .html file tells you where the errors are and the
_corrected.txt file tries to correct them for you. I deleted the …. that were
inserted where the errors in my file were, resaved the file as
Lakes_Map1.txt (deleted all the old ones), and redid the check_id_map.py
command. This time there were no errors. Yay!
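Building the mapping file with a script instead of Excel would sidestep the quotation-mark problem entirely. The sketch below is not a QIIME tool. The column names are my reading of the QIIME mapping file format (#SampleID first, Description last) and should be double-checked against the QIIME docs; the function name, dictionary keys, sample ID, and sequences are all made up for illustration.

```python
# Assumed column layout; verify against the QIIME mapping file documentation.
HEADER = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence",
          "ReversePrimer", "Description"]


def mapping_lines(samples):
    """Return tab-delimited mapping-file lines, header line first.

    samples is a list of dicts with the (hypothetical) keys
    id, barcode, forward, reverse, description.
    """
    lines = ["\t".join(HEADER)]
    for sample in samples:
        fields = [sample["id"], sample["barcode"], sample["forward"],
                  sample["reverse"], sample["description"]]
        for field in fields:
            # a tab or newline inside a field would break the column layout
            if "\t" in field or "\n" in field:
                raise ValueError(f"bad character in field: {field!r}")
        lines.append("\t".join(fields))
    return lines


# Made-up example data, not the real Lakes samples.
example = [{"id": "Lake1", "barcode": "ACGTACGT", "forward": "ACGTACGTAC",
            "reverse": "TGCATGCATG", "description": "made_up_sample"}]

# To write the file (uncomment to run):
# with open("Lakes_Map1.txt", "w") as out:
#     out.write("\n".join(mapping_lines(example)) + "\n")
```

The file would still need to go through check_id_map.py afterwards, since this sketch only handles the layout and not QIIME's own validation rules.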
So this is as far as I got today. Not bad! I’m also using
the QIIME tutorial files they give to try the new commands first and then using
the Lakes data given to me by a co-worker to try these things on REAL data. I
feel that’s the only way I am going to learn this process. Today’s
bioinformatics “workout” took about 2.25 hours with all the errors and doing
stuff by hand.
Once you have a .csv with extra quotes and stuff, here's how I would handle it:
1. Open the .csv in a text editor, such as Notepad in Windows.
2. Use the "Find and replace" command to Find: "" and replace with nothing.
3. Tell it to "Replace All". (ctrl+z will undo if it yields unexpected results.)
No macro required. :)
Oh good idea. Thanks! :)