Split a text file by empty line or string pattern

Say we have the file below, and we want to split it at each empty line.

$ cat infile.txt
hello
hello world

quick fox
runs
away

hello again

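awk can do this. Setting RS to an empty string puts awk in paragraph mode, where each block of lines separated by empty lines is read as a single record. The parentheses around the file name keep the redirection unambiguous across awk implementations.

$ awk '{print $0 > ("outfile" NR)}' RS='' infile.txt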
$ cat outfile1
hello
hello world
$ cat outfile2
quick fox
runs
away
$ cat outfile3
hello again
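
For larger inputs, the one-liner above keeps every output file open until awk exits, which can run into the open-file limit on some systems. Here is a minimal sketch of the same paragraph-mode split that closes each file after writing it. The scratch directory and the variable name out are only for illustration.

```shell
# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)"

# Recreate the sample input: three blocks separated by empty lines.
printf 'hello\nhello world\n\nquick fox\nruns\naway\n\nhello again\n' > infile.txt

# RS='' selects paragraph mode: each block of non-empty lines is one record.
# close() releases each output file as soon as its record has been written.
awk -v RS='' '{
    out = "outfile" NR
    print $0 > out
    close(out)
}' infile.txt

cat outfile2
```

This should produce the same outfile1, outfile2 and outfile3 as before.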

The csplit command can do something similar.

$ csplit --digits=2 --quiet --prefix=outfile infile.txt '/^$/+1' '{*}'
$ cat outfile00
hello
hello world

$ cat outfile01
quick fox
runs
away

$ cat outfile02
hello again

See man csplit for the details of its usage. It can be a little confusing, at least for me. I couldn't find a way to remove the trailing empty line from the output files.
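
One option that may help with the empty lines, assuming a GNU coreutils csplit (version 8.22 or later), is --suppress-matched, which drops the line that matched the pattern. BSD and macOS csplit do not have this option. A minimal sketch, using a scratch directory for illustration:

```shell
# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)"

# Recreate the sample input: three blocks separated by empty lines.
printf 'hello\nhello world\n\nquick fox\nruns\naway\n\nhello again\n' > infile.txt

# Split before every empty line and drop the matched (empty) line itself.
# --suppress-matched is a GNU extension; this is an assumption about your csplit.
csplit --digits=2 --quiet --prefix=outfile --suppress-matched \
    infile.txt '/^$/' '{*}'

cat outfile00
```

Note the pattern is '/^$/' without the +1 offset here, since the matched line is removed rather than kept at the end of the previous piece.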

In the same way, we can split a file on a specific string pattern instead of an empty line.

Let’s try csplit first. This time, we split infile2.txt at each line containing only the string “AAAA”.

$ cat infile2.txt
hello
hello world
AAAA
quick fox
runs
away
AAAA
hello again
$ csplit --digits=2 --quiet --prefix=outfile infile2.txt '/^AAAA$/+1' '{*}'
$ cat outfile00
hello
hello world
AAAA
$ cat outfile01
quick fox
runs
away
AAAA
$ cat outfile02
hello again

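Now let’s try awk. Note that a record separator longer than one character needs an awk such as gawk or mawk; POSIX only specifies the behavior for a single-character RS.

$ awk '{print $0 > ("outfile" NR)}' RS='AAAA\n' infile2.txt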
$ cat outfile1
hello
hello world

$ cat outfile2
quick fox
runs
away

$ cat outfile3
hello again

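Each record already ends with the newline that preceded “AAAA”, and print adds another one. If you don’t like the trailing empty line in the output files, use “printf” instead of “print”. Pass the record as an argument to a "%s" format rather than as the format itself, so that a % character in the data is not interpreted.

$ awk '{printf "%s", $0 > ("outfile" NR)}' RS='AAAA\n' infile2.txt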
$ cat outfile1
hello
hello world
$ cat outfile2
quick fox
runs
away
$ cat outfile3
hello again
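
If your awk does not support a multi-character RS, a line-by-line version avoids the issue. This is only a sketch under that assumption: it starts a new output file whenever a line is exactly AAAA and skips the separator line itself. The variable names n and out are made up for illustration.

```shell
# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)"

# Recreate infile2.txt: blocks separated by lines containing only AAAA.
printf 'hello\nhello world\nAAAA\nquick fox\nruns\naway\nAAAA\nhello again\n' > infile2.txt

awk '
    BEGIN { n = 1; out = "outfile" n }
    # A separator line: close the current file and move on to the next one.
    $0 == "AAAA" { close(out); n++; out = "outfile" n; next }
    # Any other line goes to the current output file.
    { print > out }
' infile2.txt

cat outfile2
```

Because each line is printed with its own newline and the separator is skipped, the output files have no trailing empty line.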

In conclusion, to split a file by empty line or by a specific string pattern, consider using awk or csplit. Personally I prefer awk, as it’s more flexible.

This article is from 夜惊心's blog. Please keep the attribution when reposting.

16 February 2017