During our SharePoint 2010 to SharePoint Online migration, I was asked to archive more than 1,000 article pages (used as company newsletters) as PDFs and store them in a SharePoint Online site. If I could archive those pages as PDFs, there would be no need to keep all those sites, so it was also a good opportunity to clean up our environment before the migration.
Solution:
The first thing that came to mind was to print from Internet Explorer, but the results were horrendous: it took 7-8 pages to print one article page.
I searched for third-party tools and found PDFCreator and wkhtmltopdf.
Option 1) Automating with the wkhtmltopdf tool
Observations:
- Links are clickable
- The banner shows up
- Photos are missing in most cases
- Colors are better than in option 2
- It takes 6-7 pages to print one SharePoint article page
Option 2) Automating with PDFCreator + PowerShell
Observations:
a. Links show up but aren't clickable
b. The banner is missing
c. Photos show up
d. It takes 6-7 pages to print one SharePoint article page
As a test, I tried printing from Chrome, and it worked perfectly: one article page fits on a single PDF page, the colors are better, all links are clickable, and all images get printed.
Running Chrome from the command line with the --headless and --disable-gpu options (plus --print-to-pdf to choose the output file) did the trick.
Headless printing obviously requires Chrome on the server. Since I wasn't allowed to install Chrome on the SharePoint server, I copied the Google folder from Program Files on my desktop and pasted it onto the SharePoint server. That was all it took to make the command work. In PowerShell, you have to give the full path to chrome.exe, as you will see in the code below:
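To make the Chrome options concrete, here is a minimal sketch of a helper that assembles the command-line arguments for converting one page. The function name `Get-ChromePdfArguments` is hypothetical; the three flags are the ones used in the script. Quoting the whole `--print-to-pdf=...` token keeps an output path that contains spaces together as a single argument.

```powershell
# Hypothetical helper: builds the argument list for one headless Chrome print-to-PDF run.
function Get-ChromePdfArguments {
    param(
        [Parameter(Mandatory)] [string] $PageUrl,   # page to render
        [Parameter(Mandatory)] [string] $OutputPdf  # full path of the PDF to write
    )
    # --headless: render the page without opening a browser window
    # --disable-gpu: turn off GPU hardware acceleration
    # --print-to-pdf=<path>: save the rendered page as a PDF at <path>
    @('--headless', '--disable-gpu', "--print-to-pdf=$OutputPdf", $PageUrl)
}

# Example usage (assumes chrome.exe is at this path; the page URL is a placeholder):
# $chrome = 'F:\shishir\Google\Chrome\Application\chrome.exe'
# & $chrome (Get-ChromePdfArguments -PageUrl 'http://server/Pages/news.aspx' -OutputPdf 'C:\temp\news.pdf') | Out-Null
```

When an array is passed to a native command, PowerShell expands it into separate arguments, so the helper's output can be handed straight to `chrome.exe`.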
# This script is intended to work on SharePoint 2010. For higher versions,
# just use CSOM.
# Load Microsoft.SharePoint and Microsoft.SharePoint.Publishing
# (PublishingWeb lives in the Publishing assembly), then open your SharePoint web.
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint")
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint.Publishing")
$site = new-object Microsoft.SharePoint.SPSite("http://your absolute sharepoint 2010 web Url where newsletters/article pages reside")
$web = $site.openweb()
write-host $web.Title
# Get all publishing pages
# Since the Pages library contains folders, the query uses Scope='Recursive'
$pubweb = [Microsoft.SharePoint.Publishing.PublishingWeb]::GetPublishingWeb($web)
$query = new-object Microsoft.SharePoint.SPQuery
$query.ViewAttributes = "Scope='Recursive'"
$pages = $pubweb.GetPublishingPages($query)
write-host 'Number of pages:' $pages.Count
# Loop through the pages.
# $newslettername is the full output path: the PDF keeps the page's file name,
# with .aspx swapped for .pdf. Pages under a 'templates' URL go to a subfolder.
# Give the full path to chrome.exe (here, the folder copied onto the server).
$chrome = 'F:\shishir\Google\Chrome\Application\chrome.exe'
foreach ($listItem in $pages)
{
    $pageurl = $listItem.Uri.AbsoluteUri
    # Drop only the trailing .aspx (a plain '.aspx' regex treats '.' as any character)
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($listItem.Name)
    if ($pageurl -match 'templates') {
        $folder = 'c:\temp\newsletters\templates'
    }
    else {
        $folder = 'c:\temp\newsletters'
    }
    # Make sure the target folder exists before Chrome writes to it
    New-Item -ItemType Directory -Path $folder -Force | Out-Null
    $newslettername = $folder + '\' + $baseName + '.pdf'
    # Convert the .aspx article page (i.e. the newsletter) to PDF.
    # Quote the --print-to-pdf argument so paths with spaces stay intact, and pipe
    # to Out-Null so PowerShell waits for Chrome to exit before the next page.
    & $chrome --headless --disable-gpu "--print-to-pdf=$newslettername" $pageurl | Out-Null
    write-host 'printed the page at' $pageurl
}
$web.Dispose()
$site.Dispose()
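The output-path rule from the script can also be pulled into a small function and checked on its own before running it against 1,000+ pages. This is a sketch; the function name `Get-NewsletterPdfPath` and the `$Root` default are assumptions based on the folders used above. `GetFileNameWithoutExtension` removes only the trailing extension, whereas a regex such as `'.aspx'` treats the dot as a wildcard and can also match inside a name like `myaspxnews.aspx`.

```powershell
# Hypothetical helper mirroring the script's naming rule:
# pages whose URL contains 'templates' go to a 'templates' subfolder,
# and the .aspx extension is swapped for .pdf.
function Get-NewsletterPdfPath {
    param(
        [Parameter(Mandatory)] [string] $PageUrl,   # absolute URL of the article page
        [Parameter(Mandatory)] [string] $PageName,  # page file name, e.g. 'June.aspx'
        [string] $Root = 'C:\temp\newsletters'      # base output folder (assumed)
    )
    # Strip only the trailing extension from the page name
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($PageName)
    $folder = $Root
    # -match is case-insensitive, like the original check
    if ($PageUrl -match 'templates') {
        $folder = $Root + '\templates'
    }
    $folder + '\' + $baseName + '.pdf'
}
```

Calling it for a few known pages and comparing the results with the expected file names is a cheap sanity check before the full run.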