Download Prajavaani
First, let us look at how the newspaper publisher uploads the Prajavaani e-paper to http://prajavaniepaper.com. Each page is a separate PDF file whose URL follows this pattern:
http://prajavaniepaper.com/pdf/$YEAR/$MONTH/$DAY/${REVERSE_DATE}${EDITION_LETTER}_${PAGE_NUMBER}100.pdf
$YEAR ==> YYYY (e.g. 2013)
$MONTH ==> 01 - 12
$DAY ==> 01 - 31
$REVERSE_DATE ==> YYYYMMDD (e.g. 20130825)
$EDITION_LETTER ==> a - z
$PAGE_NUMBER ==> 001, 002, ... (three digits, zero-padded, as used by the script below)
Examples of edition letters:
a ==> Main Edition
c ==> Classifieds
g ==> Bhoomika
h ==> Kreeda Puravani
j ==> Sapthaahika Puravani
n ==> PV Metro
p ==> Shikshana
z ==> Additional Page
Not every edition is printed every day. For example, Sapthaahika Puravani is printed only on Sundays and Bhoomika only on Saturdays. Hence the script has to try every letter when downloading.
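As a small illustration of the URL pattern described above, the sketch below builds the address of a single page. The function name page_url is hypothetical and only used here; the page number is assumed to be passed as a plain integer (1, 2, ...) and is zero-padded to three digits.

```bash
#!/bin/bash
# page_url is a hypothetical helper, not part of the original script.
# Arguments: YYYY MM DD edition-letter page-number (page as a plain integer, e.g. 1)
page_url() {
    printf 'http://prajavaniepaper.com/pdf/%s/%s/%s/%s%s%s%s_%03d100.pdf\n' \
        "$1" "$2" "$3" "$1" "$2" "$3" "$4" "$5"
}

# Page 1 of the main edition (letter a) for 25 August 2013
page_url 2013 08 25 a 1
```

Whether a given edition letter actually exists on a given day can only be found out by requesting its first page.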
Here is a script that downloads every page of each edition available for the given date and then merges the pages into a single PDF file (pdftk must be installed on your machine).
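Since the script depends on both wget and pdftk, it may help to check for them before the first run. The snippet below is only a sketch; require_cmd is a hypothetical helper name and is not used by the script itself.

```bash
#!/bin/bash
# require_cmd is a hypothetical helper: it succeeds only if the named program is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report each missing tool on stderr without aborting
for tool in wget pdftk; do
    if ! require_cmd "$tool"; then
        echo "Missing required tool: $tool" >&2
    fi
done
```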
Steps to follow:
1. Copy the code below into a file (/root/downloadpaper.sh) and make it executable. Then give all users write access to $LOGFILE.
2. As root, add the line below to the /etc/rc.local file. The paper will then be downloaded in the background whenever the machine boots. The script first checks whether the paper for that date has already been downloaded and downloads it only if it has not.
bash /root/downloadpaper.sh `date +%d` `date +%m` `date +%Y` &
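The line above calls date three times. A possible variation, shown only as a sketch, takes all three values from one call so that they always belong to the same day, even if the machine boots right at midnight. The echo stands in for the real invocation.

```bash
#!/bin/bash
# One call to date yields day, month and year together; the unquoted
# command substitution is split on spaces into $1, $2 and $3.
set -- $(date '+%d %m %Y')
DAY=$1
MONTH=$2
YEAR=$3
echo "Would run: bash /root/downloadpaper.sh $DAY $MONTH $YEAR"
```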
=====================================
#!/bin/bash
# usage $0 DD MM YYYY
DAY=$1
MONTH=$2
YEAR=$3
DATE=$DAY$MONTH$YEAR
REVERSE_DATE=$YEAR$MONTH$DAY
DIRECTORY="$HOME/prajavani/$DATE"
# All users need write access to this file (e.g. chmod 777 /var/log/newspapers.log)
LOGFILE="/var/log/newspapers.log"
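# Nothing to do if the merged PDF for this date already exists
if [ -e "$DIRECTORY/$DATE.pdf" ]; then
    echo "$DATE prajavani paper is downloaded, hence exiting" >> "$LOGFILE"
    # -e makes bash's echo interpret the trailing \n
    echo -e "===========================================================\n" >> "$LOGFILE"
    exit
fi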
# Wait until the site is reachable (the network may not be up yet at boot time).
# -O /dev/null discards the page, so no index.html is left behind.
until wget -q -O /dev/null prajavaniepaper.com
do
    sleep 5
done
echo "Downloading prajavani paper of $DAY/$MONTH/$YEAR date on $(date +%d/%m/%Y)" >> "$LOGFILE"
mkdir -p "$DIRECTORY"
cd "$DIRECTORY" || exit 1
INPUT=""
# Try every edition letter; an edition that is not published today has no page 001
for letter in {a..z}
do
    exit_status=0
    pageno=1
    # Fetch pages 001, 002, ... until the first download fails
    while [ $exit_status -eq 0 ]
    do
        wget -q "http://prajavaniepaper.com/pdf/$YEAR/$MONTH/$DAY/${REVERSE_DATE}${letter}_$(printf %03d $pageno)100.pdf"
        exit_status=$?
        pageno=$((pageno + 1))
    done
    # pageno is now two more than the number of pages actually downloaded
    if [ $pageno -gt 2 ]; then
        pageno=$((pageno - 2))
        i=1
        while [ $i -le $pageno ]
        do
            INPUT="$INPUT ${REVERSE_DATE}${letter}_$(printf %03d $i)100.pdf"
            i=$((i + 1))
        done
    fi
done
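# $INPUT is left unquoted on purpose: it holds a space-separated list of file names.
# The single pages are removed only if the merge succeeded.
pdftk $INPUT output "$DATE.pdf" && rm $INPUT
echo -e "\nFinished merging of prajavani pages-- enjoy the reading :) " >> "$LOGFILE"
echo -e "===========================================================\n" >> "$LOGFILE"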
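The script above rebuilds the list of downloaded files from the page counter. An alternative, sketched here under the assumption that no edition has more than 999 pages, is to let the shell list the files that are actually present. The helper name list_pages is hypothetical.

```bash
#!/bin/bash
# list_pages is a hypothetical helper: print the downloaded page files for one
# date in merge order. Globs expand in sorted order and the page numbers are
# zero-padded, so a_001 comes before a_002 and every a page before any c page.
list_pages() {
    # $1 = directory holding the pages, $2 = YYYYMMDD
    for f in "$1"/"$2"[a-z]_[0-9][0-9][0-9]100.pdf; do
        [ -e "$f" ] && printf '%s\n' "${f##*/}"
    done
    return 0
}

# Demonstration on empty placeholder files in a temporary directory
demo=$(mktemp -d)
touch "$demo/20130825c_001100.pdf" "$demo/20130825a_002100.pdf" "$demo/20130825a_001100.pdf"
list_pages "$demo" 20130825
rm -rf "$demo"
```

Inside the download directory, the output of such a helper could be passed to pdftk in place of $INPUT.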
KannadaPrabha
In the Kannadaprabha paper there is no separate letter for each edition, so it is simpler to download. The date format is also handled differently from the Prajavani paper: the day and month are written without a leading zero. Apart from that, everything is the same.
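To make the difference in date format concrete, here is a small sketch that builds the URL of one Kannadaprabha page. The function name kp_url is hypothetical; the URL layout is the one used by the Kannadaprabha script in this post.

```bash
#!/bin/bash
# kp_url is a hypothetical helper, not part of the original script.
# Arguments: DD MM YYYY page-number; a leading zero on day or month is dropped.
kp_url() {
    printf 'http://archives.kannadaprabha.com/pdf/%s%s%s/%s.pdf\n' \
        "${1#0}" "${2#0}" "$3" "$4"
}

# Page 1 for 5 August 2013: the date part of the URL becomes 582013
kp_url 05 08 2013 1
```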
The Prajavani script, slightly modified for Kannadaprabha, is given below.
Steps to follow:
1. Copy the code below into a file (/root/downloadpaper.sh; pick a different file name if the Prajavani script is already saved there, and use that name in step 2) and make it executable. Then give all users write access to $LOGFILE.
2. As root, add the line below to the /etc/rc.local file. The paper will then be downloaded in the background whenever the machine boots. The script first checks whether the paper for that date has already been downloaded and downloads it only if it has not.
bash /root/downloadpaper.sh `date +%d` `date +%m` `date +%Y` &
=============================================
#!/bin/bash
DAY=$1
MONTH=$2
YEAR=$3
# Kannadaprabha URLs use the day and month without a leading zero (05 -> 5)
MONTH=${MONTH#0}
DAY=${DAY#0}
DATE=$DAY$MONTH$YEAR
DIRECTORY="$HOME/kannadaprabha/$DATE"
LOGFILE="/var/log/newspapers.log"
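# Nothing to do if the merged PDF for this date already exists
if [ -e "$DIRECTORY/$DATE.pdf" ]; then
    echo "$DATE kannadaprabha paper is downloaded, hence exiting" >> "$LOGFILE"
    # -e makes bash's echo interpret the trailing \n
    echo -e "===========================================================\n" >> "$LOGFILE"
    exit
fi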
# Wait until the site is reachable (the network may not be up yet at boot time).
# -O /dev/null discards the page, so no index.html is left behind.
until wget -q -O /dev/null archives.kannadaprabha.com
do
    sleep 5
done
echo "Downloading kannadaprabha paper of $1/$2/$3 date on $(date +%d/%m/%Y)" >> "$LOGFILE"
mkdir -p "$DIRECTORY"
cd "$DIRECTORY" || exit 1
exit_status=0
pageno=1
# Fetch pages 1, 2, ... until the first download fails
while [ $exit_status -eq 0 ]
do
    wget -q "http://archives.kannadaprabha.com/pdf/$DATE/$pageno.pdf"
    exit_status=$?
    pageno=$((pageno + 1))
done
i=1
INPUT=""
# pageno is two more than the number of pages actually downloaded
pageno=$((pageno - 2))
echo "Finished downloading $pageno kannadaprabha pages" >> "$LOGFILE"
while [ $i -le $pageno ]
do
    INPUT="$INPUT $i.pdf"
    i=$((i + 1))
done
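# $INPUT is left unquoted on purpose: it holds a space-separated list of file names.
# The single pages are removed only if the merge succeeded.
pdftk $INPUT output "$DATE.pdf" && rm $INPUT
echo -e "\nFinished merging of kannadaprabha pages-- enjoy the reading :) " >> "$LOGFILE"
echo -e "===========================================================\n" >> "$LOGFILE"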
Thanks,
Shivu