A helper script in R
I recently needed to concatenate multiple .csv-files that share the same row names, so that the concatenated matrix consists of the columns from all files, side by side. To that end, I wrote the following R script.
First, let us specify the input directory and the output file.
inDir = "/home/user/csv_files/"
outF = paste(inDir, "concatenated.csv", sep="")
Second, let us list all .csv-files in the input directory and load the first file.
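# pattern is a regular expression, not a glob; full.names=TRUE returns paths
# that read.csv can open regardless of the current working directory
lst = list.files(path=inDir, pattern="\\.csv$", full.names=TRUE)
first = read.csv(lst[1], row.names=1, header=FALSE)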
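One detail worth checking here: the pattern argument of list.files is a regular expression, not a shell-style glob. The following is a small sketch of the difference, using made-up file names that are purely illustrative; glob2rx and grepl are base R functions.

```r
# Hypothetical file names, used only to illustrate the matching rules.
files = c("a.csv", "b.csv", "notes_csv.txt", "a.csv.bak")

# glob2rx() converts a shell-style glob into the equivalent regular expression.
rx = glob2rx("*.csv")

# An unanchored regex matches every name that merely contains "csv".
loose = files[grepl("csv", files)]

# An anchored regex keeps only the names that end in ".csv".
strict = files[grepl("\\.csv$", files)]
from_glob = files[grepl(rx, files)]
```

Here strict and from_glob select the same two files, whereas loose also picks up the .txt and .bak names.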
Third, let us loop through all other .csv-files and attach them to the growing dataframe.
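hndl = first
# seq_along(lst)[-1] is empty if there is only one file; 2:length(lst) would count down (2, 1)
for (i in seq_along(lst)[-1]) {
  add = read.csv(lst[i], row.names=1, header=FALSE)
  # by=0 merges on row names; file-specific suffixes keep the merged column names distinct
  hndl = merge(hndl, add, by=0, all=TRUE, suffixes=c("", paste0(".", i)))
  # merge() returns the row names as a column called Row.names; move them back
  sub = subset(hndl, select=-c(Row.names))
  rownames(sub) = hndl[,'Row.names']
  hndl = sub
}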
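To see what a single pass of this loop does, here is a minimal sketch on two toy data frames. The values and the row names g1 to g3 are hypothetical stand-ins for two loaded files, not data from my actual use case.

```r
# Two toy data frames standing in for two loaded files (hypothetical values).
a = data.frame(V2=c(1, 2, 3), row.names=c("g1", "g2", "g3"))
b = data.frame(V2=c(30, 10), row.names=c("g3", "g1"))

# by=0 merges on row names; all=TRUE keeps rows missing from either side.
m = merge(a, b, by=0, all=TRUE)

# merge() returns the keys as a regular column called "Row.names",
# so they are moved back into the row names before the next merge.
res = subset(m, select=-c(Row.names))
rownames(res) = m[, "Row.names"]
```

Row g2 is absent from b, so its entry in the second column of res is NA rather than the row being dropped.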
Fourth, let us order the rows of the final matrix according to the very first .csv-file, and then save the matrix as output.
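# drop=FALSE keeps a data frame (and its row names) even if only one column is present
out = hndl[match(rownames(first), rownames(hndl)), , drop=FALSE]
write.csv(out, file=outF)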
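The reordering step relies on match, which returns, for each name in its first argument, the position of that name in its second argument. A minimal sketch with hypothetical row names:

```r
# Row order of the first file, and a merged table that merge() has sorted by key.
wanted = c("g3", "g1", "g2")
merged = data.frame(V2=c(1, 2, 3), row.names=c("g1", "g2", "g3"))

# Position of each wanted name within the merged row names.
idx = match(wanted, rownames(merged))

# Index by those positions; drop=FALSE keeps the result a data frame.
reordered = merged[idx, , drop=FALSE]
```

After this, the rows of reordered follow the order given in wanted.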