BI296: Linux Programming For Bioinformatics

2017 Spring

[Exercise 0]

  1. Follow the instructions to install CentOS on your laptop, together with Windows 7/8/10.
  2. If you use EFI mainboard instead of BIOS/MBR, refer to this tutorial.
  3. Or instead you can use virtual machine to install CentOS.
  4. If you want to install CentOS under MacOS, use Docker instead.

[Exercise 1](Due: March 24, 2017)

  1. There are a list of compression format in Linux, write down them as many as you can.
  2. write down the commands to compress into or decompress from the above format file.
  3. Here is a compressed file, can you convert it to *.tar.Z? Or *.tar.bz2? Which format has greater compression rate? See tar tutorial for further help.

[Exercise 2: Regular Expression](Due: April 21, 2017)

[Exercise 3: BASH](Due: April 28, 2017.)

  1. Convert chipseq.txt into processed.txt.
  2. Write a bash script to calculate the sequence length for each sequence in regions.fas. You should output the sequence name as well as its corresponding length.
  3. Summarize the usage of (), [], {} in bash.
  4. Use some examples to describe the usage of the bash command bc.
  5. Write a bash program to generate random DNA sequence of length 100, and with nucleotide compositions: A(0.30), T(0.30), C(0.20), G(0.20) or whatever you want.
  6. Generate the random sequences of length 200, 500, 1000, and 2000, with the same composition as above.
  7. Use histogram or barplot to test whether the random sequences are consistent with the preset composition.

[Exercise 4: C Programming](Due: May 5, 2017)

  1. Write a C program to generate a random square matrix with a given number of rows and columns, and save it in a file with TAB-delimited format.
  2. Write a C program to read a square matrix from a given file (TAB-separated), without specifying the number of fields for each row.
  3. Write a C program to transpose the square matrix, to compute the diagonal vector.
  4. Here is the C source code for computing the determinant for a square matrix.
  5. Write a Makefile for the above source code for quick compilation.
  6. Write a C program to sort a struct array based on a given property of the struct. Any sorting algorithm can be applied to your program.

[Exercise 5: Python Fundamentals](Due: May 12, 2017)

[Exercise 6: Python Strings and Lists](Due: May 19, 2017)

[Exercise 7: Python Tuples,Dicts and File I/O](Due: May 26, 2017)

[Exercise 8: Object-oriented Programming in Python](Due: June 02, 2017)

[Exercise 9: Python Data Science](Due: June 16, 2017)

[Exercise 10: Algorithms](Due: June 16)

Write a program to compute the suffix arrays of each of the following strings. Assume that string indices start at 1.
  1. dogwood$
  2. banana$
  3. unabashable$
  4. appellee$
  5. ceaselessness$
  6. nonchronological$
  7. possessionlessness$
  8. nonchronological$
  9. noncorroboration$
  10. teletypesetter$
  • Project 1: Sorting an Array in Different Algorithms
  • [Tutorials][Source Code Templates]

    You are required to pack the sorting algorithms into a Python module, so that you can call all the algorithms by importing the module. Besides sorting the numerical array, the module should be able to sort a string array as well.

  • Project 2: Image Compression Algorithms
  • Image Compression algorithms can be compared under categories of lossy and lossless techniques. The best image compression algorithm known for lossy compression is 'WebP'. This algorithm became popular in May, 2014 when Facebook started using it. Google's this new image compression algorithm provides 25-35% more compression as compared to JPEG. Though the algorithm is lossy in nature but the loss of information is far beyond the scope of human visual perception.

    However, when we talk about lossless compression algorithm that produces the smallest output regardless of speed, the 'best' algorithm is probably one of the PAQ-based context mining algorithms, which use a large number of independent context models to predict the next pixel in an image from neighboring pixels, followed by weighted averaging o the predictions and arithmetic coding. Some also use color transformations like (R, G, B)->(G, G-R, G-B) to decrease the correlation between colors. Here are three of the benchmarks:

    1. SqueezeChart
    2. Silesia Compression Benchmark
    3. Bitmap file-type compression test


  • Project 3: Sparse Matrices Representation and Computation
  • A sparse matrix is a matrix which contains only a very small proportion of non-zero elements, such as a 100-by-100 matrix with only 500 non-zero elements. Can you figure out a way to store the matrix in order to reduce the storage demand? And furthermore, define some functions for sparse matrix computation, e.g., computing the sum, difference, product of two sparse matrices, or transpose the matrix, computing the determinant, eigenvalues and eigenvectors.


  • Project 4: Encryption algorithms - Comparison
  • Project 5: Biological Webcrawler System
  • Project 6: Email Spam Detection Algorithms
  • Project 7: Deep Learning Framework