Plotting the coordinates of HiSeq (2500) and MiSeq (v2) reads:
Take a peek at the thumbnail of a MiSeq tile: the tile is actually square, but the edges of the image are darker than the center. Most likely, only spots (clusters) in the center are selected and used for base-calling.
If the surface of each tile on the flow cell is fully covered with oligos that allow library binding, clusters should form evenly across the flow cell. Each tile should have the same cluster density over its whole area, and the sequencing-by-synthesis reactions should run at the same time and with the same performance whether a cluster sits in a corner or at the center. For some reason, however, imaging is not equally good across the tile: brightness drops toward the periphery compared with the image center (an effect similar to vignetting in photography). The circular distribution of reads indicates that clusters at the edges and corners (21% of all clusters) are discarded.
The MiSeq v3 kit generates denser clusters than the v2 kit and gives a higher yield, but the shape and size of the region of 'selected' clusters are much the same with both reagent kits. If the imaging device or the cluster identification algorithm could be improved, all clusters in a tile could be identified and used for base-calling. That would increase the yield of MiSeq runs by ~27% (recovering the discarded 21% on top of the 79% currently kept). It's a bit of a pity that those clusters go to waste.
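The 21% and ~27% figures are what you would expect if the selected reads fill the circle inscribed in a square tile. That geometry is an assumption here, not something measured from the data, and `tile_fractions` is just an illustrative helper name. A quick check of the arithmetic:

```shell
# Assumption (not measured): selected reads fill the circle inscribed in a square tile.
tile_fractions() {
    awk 'BEGIN {
        pi   = atan2(0, -1)      # pi, computed with atan2
        kept = pi / 4            # inscribed circle area / square area
        lost = 1 - kept          # fraction of clusters outside the circle
        gain = 1 / kept - 1      # extra yield if every cluster were used
        printf "kept=%.1f%% lost=%.1f%% gain=%.1f%%\n", kept * 100, lost * 100, gain * 100
    }'
}

tile_fractions
# prints: kept=78.5% lost=21.5% gain=27.3%
```

Under that assumption, about 21% of clusters fall outside the circle, and using them all would raise yield by roughly 27%.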
Code:
# extract tile numbers and x,y coordinates
gunzip -c XXX_S1_L001_R1_001.fastq.gz | sed -n '1~4s/:/ /gp' | cut -d ' ' -f 5-7 > pos.XXX.txt
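The `1~4` address in the sed command is a GNU extension, so it may not work with other sed implementations. Here is a minimal awk sketch of the same extraction, assuming 4-line FASTQ records and headers laid out as `@instrument:run:flowcell:lane:tile:x:y`. The helper name `extract_pos` and the sample record are made up for illustration:

```shell
# Hypothetical helper: print "tile x y" for every FASTQ header line on stdin.
# Assumes 4-line records and headers of the form
# @instrument:run:flowcell:lane:tile:x:y <read info>
extract_pos() {
    awk 'NR % 4 == 1 { split($1, f, ":"); print f[5], f[6], f[7] }'
}

# Made-up record, for illustration only
printf '@M00001:12:000000000-A1B2C:1:1101:15589:1332 1:N:0:1\nACGT\n+\nIIII\n' | extract_pos
# prints: 1101 15589 1332
```

On a real file it would be used as `gunzip -c XXX_S1_L001_R1_001.fastq.gz | extract_pos > pos.XXX.txt`.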
# read the files (specifying the colClasses option instead of using the default can make 'read.table' run MUCH faster)
pos.miseq.all = read.table("pos.miseq.txt", colClasses=c("numeric","numeric","numeric"))
pos.hiseq.all = read.table("pos.hiseq.txt", colClasses=c("numeric","numeric","numeric"))

# set column names
names(pos.miseq.all) = c("t", "x", "y")
names(pos.hiseq.all) = c("t", "x", "y")

# sample a subset (capped at the number of rows, so sample() does not fail on small files)
pos.miseq = pos.miseq.all[sample(nrow(pos.miseq.all), min(10000, nrow(pos.miseq.all))), ]
pos.hiseq = pos.hiseq.all[sample(nrow(pos.hiseq.all), min(50000, nrow(pos.hiseq.all))), ]

# plot for each tile
for (i in sort(unique(pos.miseq$t))) {
  cat(i, "\n")
  # plot hiseq reads first
  plot(pos.hiseq[pos.hiseq$t==i, 2:3], pch=21, col="blue", xlim=c(0,30000), ylim=c(0,100000), main=paste("tile:",i))
  # add miseq reads
  points(pos.miseq[pos.miseq$t==i, 2:3], pch=23, col="orange")
  legend(x="right", c("HiSeq","MiSeq"), bty="n", col=c("blue","orange"), pch=c(21,23))
  # pause for 5 seconds
  Sys.sleep(time=5)
}