Sunday, August 25, 2013

Get the Prajavaani and Kannadaprabha e-papers every day on a Linux machine

Download Prajavaani 

First, let us understand how the newspaper press publishes the Prajavaani e-paper on http://prajavaniepaper.com. Each page is a separate PDF file whose URL follows this pattern:

http://prajavaniepaper.com/pdf/$YEAR/$MONTH/$DAY/${REVERSE_DATE}${EDITION_LETTER}_${PAGE_NUMBER}100.pdf
$YEAR ==> YYYY (e.g. 2013)
$MONTH ==> 01 - 12
$DAY ==> 01 - 31

$REVERSE_DATE ==> YYYYMMDD (e.g. 20130825)
$PAGE_NUMBER ==> three digits, zero-padded (e.g. 001)
$EDITION_LETTER ==> a - z
e.g.
       a ==>Main Edition
       c ==> Classifieds
       g ==> Bhoomika
       h ==> Kreeda Puravani
       j ==> Sapthaahika Puravani
       n ==> PV Metro
       p ==> Siskshana
       z ==> Additional Page
Not all of these editions are printed every day. For example, Sapthaahika Puravani is printed only on Sundays and Bhoomika only on Saturdays. Hence the script has to try every letter while downloading.
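
To make the pattern concrete, here is a minimal sketch that builds the URL of a single page. The function name prajavani_url is only an illustration and is not part of the download script; the three-digit page number followed by "100" is taken from the pattern above.

```shell
#!/bin/bash
# Build the URL of one Prajavani page from the pattern described above.
# Arguments: DD MM YYYY edition_letter page_number
prajavani_url() {
    local day="$1" month="$2" year="$3" letter="$4" page="$5"
    local reverse_date="${year}${month}${day}"
    # The page number is zero-padded to three digits and followed by "100".
    printf 'http://prajavaniepaper.com/pdf/%s/%s/%s/%s%s_%03d100.pdf\n' \
        "$year" "$month" "$day" "$reverse_date" "$letter" "$page"
}

# Page 1 of the main edition of 25 August 2013:
prajavani_url 25 08 2013 a 1
# prints http://prajavaniepaper.com/pdf/2013/08/25/20130825a_001100.pdf
```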

The script below downloads every page of each available edition and then merges the individual PDF pages into one PDF file (pdftk needs to be installed on your machine).
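
Since the script relies on both wget and pdftk, it may help to check for them first. The following is a small sketch, not part of the original script; missing_tools is a hypothetical helper name.

```shell
#!/bin/bash
# Print the name of every listed tool that is not found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Warn (on stderr) if anything the download script needs is absent.
missing=$(missing_tools wget pdftk)
if [ -n "$missing" ]; then
    echo "Please install:" $missing >&2
fi
```

How the missing tools are installed depends on your distribution's package manager.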

Steps to follow:
              1. Copy the code below into a file (/root/downloadpaper.sh) and make it executable. Then give all users write access to $LOGFILE.

              2. As root, add the line below to the /etc/rc.local file. This downloads the paper in the background whenever you boot the machine. The script checks whether the paper has already been downloaded and downloads it only if it has not.

bash  /root/downloadpaper.sh `date +%d` `date +%m` `date +%Y`
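
Step 1 can be sketched as a few commands, run as root. This is only an illustration: prepare_downloader is a hypothetical helper name, and the paths you pass to it are the ones used in this post.

```shell
#!/bin/bash
# Make the download script executable and the log file writable by all users.
# Arguments: path of the script, path of the log file
prepare_downloader() {
    local script="$1" logfile="$2"
    chmod +x "$script" || return 1
    # Create the log file if it does not exist yet
    touch "$logfile" || return 1
    chmod a+w "$logfile"
}
```

It would be called as prepare_downloader /root/downloadpaper.sh /var/log/newspapers.log. To fetch the paper of a particular day by hand, pass the date yourself, for example: bash /root/downloadpaper.sh 25 08 2013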

=====================================
#!/bin/bash

# usage  $0 DD MM YYYY

DAY=$1 
MONTH=$2
YEAR=$3

DATE=$DAY$MONTH$YEAR
REVERSE_DATE=$YEAR$MONTH$DAY
DIRECTORY="$HOME/prajavani/$DATE"
# All users need write access to this file (e.g. chmod a+w)
LOGFILE="/var/log/newspapers.log"

if [ -e "$DIRECTORY/$DATE.pdf" ]; then
        echo "$DATE prajavani paper is already downloaded, hence exiting" >> "$LOGFILE"
        # bash's echo does not expand \n, so use printf for the trailing blank line
        printf '%s\n\n' "===========================================================" >> "$LOGFILE"
        exit
fi

# Wait until the site is reachable (the network may not be up yet at boot).
# -O /dev/null discards the page, so no index.html is left behind.
until wget -q -O /dev/null prajavaniepaper.com
do
        sleep 5
done
echo "Downloading prajavani paper of $1/$2/$3 date on $(date +%d/%m/%Y)" >> "$LOGFILE"
mkdir -p "$DIRECTORY"
cd "$DIRECTORY" || exit 1
INPUT=""

for letter in {a..z}
do
        # Fetch pages 001, 002, ... of this edition until a download fails
        pageno=1
        while wget -q "http://prajavaniepaper.com/pdf/$YEAR/$MONTH/$DAY/${REVERSE_DATE}${letter}_$(printf %03d $pageno)100.pdf"
        do
                # Remember each downloaded page, in order, for merging
                INPUT="$INPUT ${REVERSE_DATE}${letter}_$(printf %03d $pageno)100.pdf"
                pageno=$((pageno + 1))
        done
done

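# Merge only if at least one page was downloaded; delete the pages only if the merge worked
if [ -n "$INPUT" ] && pdftk $INPUT cat output "$DATE.pdf"; then
        rm $INPUT
        printf '\n%s\n' "Finished merging of prajavani pages-- enjoy the reading :) " >> "$LOGFILE"
fi
printf '%s\n\n' "===========================================================" >> "$LOGFILE"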
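
As a variation on the download loop, the sketch below records each downloaded page in a bash array instead of a space-separated string. It is not the script above: fetch_page and collect_edition are hypothetical helper names, and putting the download in its own function makes it possible to try the loop without touching the network.

```shell
#!/bin/bash
# Download one file into the current directory; succeeds only if wget does.
fetch_page() {
    wget -q "$1"
}

# Collect the pages of one edition until the first failed download.
# Arguments: base URL (without trailing slash), file prefix such as 20130825a
# Result: the global array PAGES holds the downloaded file names in order.
collect_edition() {
    local base_url="$1" prefix="$2" pageno=1 file
    PAGES=()
    while :; do
        file="${prefix}_$(printf %03d "$pageno")100.pdf"
        fetch_page "$base_url/$file" || break
        PAGES+=("$file")
        pageno=$((pageno + 1))
    done
}
```

After collect_edition returns, ${#PAGES[@]} is the number of pages found and "${PAGES[@]}" can be passed to pdftk as the list of input files.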

Download Kannadaprabha

The Kannadaprabha paper has no separate letter for each edition, so it is simpler to download. The date in the URL is also formatted differently from Prajavaani: the day and month are written without a leading zero, followed by the year (e.g. 2582013 for 25 August 2013). Apart from that, everything is the same.
The script above is modified slightly, as follows.
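
To illustrate the date format, here is a minimal sketch that builds the URL of one Kannadaprabha page, assuming the URL layout used in the script below. The function name kp_url is only an illustration.

```shell
#!/bin/bash
# Build the URL of one Kannadaprabha page.
# Arguments: DD MM YYYY page_number
kp_url() {
    # ${var#0} removes one leading zero, so 08 becomes 8 and 25 stays 25
    local day="${1#0}" month="${2#0}" year="$3" page="$4"
    echo "http://archives.kannadaprabha.com/pdf/${day}${month}${year}/${page}.pdf"
}

# Page 1 of the paper of 25 August 2013:
kp_url 25 08 2013 1
# prints http://archives.kannadaprabha.com/pdf/2582013/1.pdf
```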

Steps to follow:
              1. Copy the code below into a separate file (for example /root/downloadkannadaprabha.sh, so that it does not overwrite the Prajavaani script) and make it executable. Then give all users write access to $LOGFILE.

              2. As root, add the line below to the /etc/rc.local file. This downloads the paper in the background whenever you boot the machine. The script checks whether the paper has already been downloaded and downloads it only if it has not.

bash  /root/downloadkannadaprabha.sh `date +%d` `date +%m` `date +%Y`

=============================================
#!/bin/bash

DAY=$1
MONTH=$2
YEAR=$3

# The site uses the day and month without a leading zero (e.g. 08 becomes 8).
# ${VAR#0} also leaves a value that was passed without a leading zero untouched.
MONTH=${MONTH#0}
DAY=${DAY#0}

DATE=$DAY$MONTH$YEAR
DIRECTORY="$HOME/kannadaprabha/$DATE"
LOGFILE="/var/log/newspapers.log"

if [ -e "$DIRECTORY/$DATE.pdf" ]; then
        echo "$DATE kannadaprabha paper is already downloaded, hence exiting" >> "$LOGFILE"
        # bash's echo does not expand \n, so use printf for the trailing blank line
        printf '%s\n\n' "===========================================================" >> "$LOGFILE"
        exit
fi

# Wait until the site is reachable (the network may not be up yet at boot).
# -O /dev/null discards the page, so no index.html is left behind.
until wget -q -O /dev/null archives.kannadaprabha.com
do
        sleep 5
done
echo "Downloading kannadaprabha paper of $1/$2/$3 date on $(date +%d/%m/%Y)" >> "$LOGFILE"
mkdir -p "$DIRECTORY"
cd "$DIRECTORY" || exit 1

# Fetch pages 1, 2, ... until a download fails, remembering each page for merging
INPUT=""
pageno=1
while wget -q "http://archives.kannadaprabha.com/pdf/$DATE/$pageno.pdf"
do
        INPUT="$INPUT $pageno.pdf"
        pageno=$((pageno + 1))
done

# pageno is now one more than the number of pages actually downloaded
echo "Finished downloading $((pageno - 1)) kannadaprabha pages" >> "$LOGFILE"


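# Merge only if at least one page was downloaded; delete the pages only if the merge worked
if [ -n "$INPUT" ] && pdftk $INPUT cat output "$DATE.pdf"; then
        rm $INPUT
        printf '\n%s\n' "Finished merging of kannadaprabha pages-- enjoy the reading :) " >> "$LOGFILE"
fi
printf '%s\n\n' "===========================================================" >> "$LOGFILE"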


Thanks,
Shivu