Concat to PDF
I wrote a script a few months ago to scrape web novels. I extracted
them as plaintext and saved every chapter. Since some books had over a
hundred chapters, I wanted to concatenate them so it would be easier to
read. Concatenation can be done quickly using cat, and tools like pandoc can generate pretty PDFs.
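As a rough sketch of that first step, the snippet below concatenates chapter files in natural numeric order, so chapter 10 does not land before chapter 2. The function name concat_chapters and the chapter-N.md naming are assumptions for illustration, and sort -V is a GNU coreutils feature.

```shell
# Hypothetical helper: concatenate every *.md file in a directory, in
# natural (version) order, and write the result to stdout.
# Assumes GNU sort (-V) and file names that contain no newlines.
concat_chapters() {
    dir=$1
    # printf emits one path per line; sort -V puts chapter-2.md before
    # chapter-10.md, which plain glob order would not.
    printf '%s\n' "$dir"/*.md | sort -V | while IFS= read -r f; do
        cat "$f"
    done
}
```

Something like concat_chapters ./mybook > merged.md would then produce a single file ready for pandoc.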
Concatenation was a little difficult if the files were sorted oddly
or if only select files needed to be concatenated. I used a little KDE
servicemenu to achieve this task in a fun way (check kf5-config --path services to see where it can be placed):
[Desktop Entry]
Type=Service
Icon=smiley-shape
X-KDE-ServiceTypes=KonqPopupMenu/Plugin
MimeType=all/allfiles;
Actions=mergeEntry;
Encoding=UTF-8
[Desktop Action mergeEntry]
Name=Merge selected file(s)
Icon=document-send
Exec=kdialog --msgbox "Will merge the following files:\n$(echo %F | head -c 500)..." && awk 'FNR==1{print ""}1' %F > "./merged$(date +'%s').out"
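The awk one-liner in the Exec line is worth unpacking. A minimal sketch of the same program as a standalone shell function (the name merge_with_blank_lines is made up here):

```shell
# Hypothetical helper wrapping the same awk program as the Exec line.
merge_with_blank_lines() {
    # FNR resets to 1 at the start of each input file, so an empty line is
    # printed before every file; the trailing 1 prints each line unchanged.
    awk 'FNR==1{print ""}1' "$@"
}
```

The blank line matters for Markdown: without it, the last paragraph of one chapter could run straight into the first line of the next.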
The neat thing about my process is that I was concatenating Markdown files. You know what that means? pandoc can parse these easily and give me a pretty PDF!
For a larger font size (https://stackoverflow.com/a/46055046), use the extarticle document class, which also supports 14pt, 17pt, and 20pt:
pandoc -V geometry:margin=1in -V documentclass="extarticle" -V fontsize=14pt ...
If the input is Markdown:
pandoc -V geometry:margin=1in -V fontsize=12pt -f markdown -t pdf ... ...
If you’re converting books, try fonts (e.g. Garamond) installed on your computer. The mainfont variable only takes effect with the xelatex or lualatex engine:
pandoc -V geometry:margin=1in -V fontsize=12pt -V mainfont="Garamond" --pdf-engine=xelatex -f markdown -t pdf ... ...
OR (check your ~/.fonts directory)
pandoc -V geometry:margin=1in -V fontsize=12pt -V mainfont="pala.ttf" --pdf-engine=xelatex -f markdown -t pdf ... ...
If you want a table of contents listing the chapters:
pandoc -V geometry:margin=1in -V fontsize=12pt -f markdown -t pdf myinput.md -o myoutput.pdf --toc
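If you want top-level headings treated as chapters (this needs a document class that defines chapters, such as book or report): --top-level-division=chapter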
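The table of contents is built from headings, so merged chapters only show up in it if each one starts with a heading. A minimal sketch, assuming the chapter files have no heading of their own and that the file name is an acceptable title (both assumptions, as is the helper name merge_with_headings):

```shell
# Hypothetical helper: merge files and give each one a level-1 Markdown
# heading derived from its file name, so that --toc has entries to list.
merge_with_headings() {
    for f in "$@"; do
        title=$(basename "$f")
        title=${title%.*}            # drop the file extension
        # Blank lines around the heading keep it separate from body text.
        printf '\n# %s\n\n' "$title"
        cat "$f"
    done
}
```

For example, merge_with_headings chapter-1.md chapter-2.md > myinput.md would yield a file with two top-level headings.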
If the text contains CJK characters (see https://stackoverflow.com/a/48090656), you must use xelatex and set a valid CJK font:
pandoc -V geometry:margin=1in -V fontsize=12pt -V CJKmainfont="Noto Sans CJK JP" -f markdown -t pdf myinput.md -o myoutput.pdf --toc --pdf-engine=xelatex --standalone
If you want a complete document with a header and footer rather than a fragment:
-s aka --standalone
And the best settings, all together:
markdown to pdf:
pandoc -V geometry:margin=1in -V documentclass="extarticle" -V fontsize=14pt -V mainfont="Garamond" -V CJKmainfont="Noto Sans CJK JP" -f markdown -t pdf myinput.md -o myoutput.pdf --toc --pdf-engine=xelatex --standalone
epub to pdf:
pandoc -V geometry:margin=1in -V fontsize=12pt -V mainfont="Garamond" -V CJKmainfont="Noto Sans CJK JP" -f epub -t pdf myinput.epub -o myoutput.pdf --pdf-engine=xelatex
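To avoid retyping the long Markdown command above, it could be wrapped in a small shell function. This is only a sketch: the name md2pdf, the argument order, and the PANDOC override variable are all invented for illustration, while the pandoc flags are copied from the command above.

```shell
# Hypothetical wrapper around the "all together" Markdown command.
# Usage: md2pdf input.md output.pdf [mainfont] [cjkfont]
md2pdf() {
    in=$1
    out=$2
    mainfont=${3:-Garamond}            # default fonts taken from the post
    cjkfont=${4:-Noto Sans CJK JP}
    # Set PANDOC=echo to print the arguments instead of running pandoc.
    ${PANDOC:-pandoc} \
        -V geometry:margin=1in \
        -V documentclass=extarticle \
        -V fontsize=14pt \
        -V mainfont="$mainfont" \
        -V CJKmainfont="$cjkfont" \
        -f markdown -t pdf "$in" -o "$out" \
        --toc --pdf-engine=xelatex --standalone
}
```

Calling md2pdf myinput.md myoutput.pdf uses the defaults, and a third argument swaps in another installed font.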
tags: code