How to Stream Files between Site Collections and Sites in SharePoint with JavaScript

Some time ago I was working on a client-side migration tool for SharePoint Online – to copy files and their metadata between libraries, both within the same site collection and across different site collections. It turned out to be a pretty difficult problem – but was solved in the end by using the browser itself as a lifeboat to carry the data between the site collections. If you’re reading this post you have probably already looked at the Copy web service and “CopyIntoItemsLocal”, which only works when the source and destination are in the same site collection (and causes trouble by populating the hidden “_CopySource” metadata of the file in the destination).

Anyway…

Along the way, I discovered a very nice jQuery extension called SPServices, which neatly wrapped up some of the SharePoint API commands, and made the development much more straightforward than it might otherwise have been. The snippet below uses SPServices to load a document within SharePoint into the browser’s memory, and then to write it back into a file elsewhere in SharePoint.

var source_file_url = "https://server/sites/site_collection_a/library_a/document_a.docx";
var destination_file_url = "https://server/sites/site_collection_b/library_b/document_b.docx";

setTimeout(function() {
    
    var itemstream = "";
    var itemfields = "";
    var streamLength = 0;
    
    // Read the SourceFileURL into memory
    $().SPServices({
        operation: "GetItem",
        Url: source_file_url,
        async: false,
        completefunc: function (xData, Status) {
            itemstream = $(xData.responseXML).find("Stream").text();
            streamLength = itemstream.length;
            $(xData.responseXML).find("FieldInformation").each(function(){
                itemfields += $(this).get(0).xml;
            });
        }
    });
    
    if (streamLength) {
    
        // Write the data back into the DestinationFileURL
        $().SPServices({
            operation: "CopyIntoItems",
            SourceUrl: source_file_url,
            async: false,
            DestinationUrls: [destination_file_url],
            Stream: itemstream,
            Fields:itemfields,
            completefunc: function (xData, Status) {
                // file has copied at this point
            }
        });
    }
    else {
        // failed to read the source file
    }
}, 0);

It’s worth noting that at the point the file has been copied, you probably still need to do some work around setting content types (if that is important to you) – I will cover this when I get a chance to write it up, because it’s NOT trivial (especially doing everything on the client-side).

It’s also worth noting that the snippet above is wrapped in a setTimeout call – a handy trick I discovered when mixing synchronous and asynchronous calls in JavaScript, because it defers the wrapped code until the current call stack has cleared, making sure it completes before anything else happens. Kind of a cheap alternative to promises.


Batch File to Run PowerShell Scripts and Log Console Output to Log Files

The easiest way to schedule a PowerShell script is to run it from a DOS batch file, and schedule execution of the batch file on a Windows server via the Task Scheduler. The following batch file runs a PowerShell script, and organises the console output from the script (via Write-Host) into a text file, stored in a sensibly formatted sub-folder and filename structure.

@ECHO OFF

REM create a logfile name (log_yyyy-mm-dd-hh-mm-ss.log)
REM note - these substrings assume a dd/mm/yyyy regional date format
SET logfilename=log_%DATE:~6,4%-%DATE:~3,2%-%DATE:~0,2%-%TIME:~0,2%-%TIME:~3,2%-%TIME:~6,2%.log

REM replace the leading space in single-digit hours with a zero
SET logfilename=%logfilename: =0%

REM create a folder name (log_yyyy-mm-dd)
SET foldername=log_%DATE:~6,4%-%DATE:~3,2%-%DATE:~0,2%

REM create a log subdirectory
mkdir c:\logs\%foldername%

REM run the powershell script and pipe output to the logfile
powershell.exe "c:\scripts\MyScript.ps1" > "c:\logs\%foldername%\%logfilename%"

The take-away from this script is almost certainly the arcane DOS batch substring syntax required to pull bits and pieces out of the date and time.
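As an aside – the %DATE% substrings above depend on the machine’s regional date format. A more portable sketch (assuming the WMIC utility is present, which it is on most Windows Server builds) derives the timestamp from WMI instead:

REM build a locale-independent timestamp via WMIC (LocalDateTime is yyyymmddhhmmss...)
FOR /F "skip=1 tokens=1" %%A IN ('wmic os get LocalDateTime') DO IF NOT DEFINED ldt SET ldt=%%A
SET logfilename=log_%ldt:~0,4%-%ldt:~4,2%-%ldt:~6,2%-%ldt:~8,2%-%ldt:~10,2%-%ldt:~12,2%.log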


Repairing List Item Permission Actions in Exported Nintex Workflows

While working on a sizeable SharePoint and Nintex Workflow development project recently, I came across a significant issue in Nintex Workflow that the support engineers at Nintex said was not a bug. I disagree with them, and had to find a workaround anyway, so thought I would share it.

The Problem

If you export a workflow from one SharePoint system (e.g. a development farm) and import it into a different SharePoint system (e.g. a production farm), and you are using the “Set Item Permissions” action within your workflow, you will discover that the action is broken after importing. I did a bit of digging and discovered why – the exported workflow describes the permission sets in XML by both their name and their internal IDs (large integers), but only uses the IDs when importing. Nintex Workflow doesn’t try to correlate the permission sets by name on the destination system, so it presumes it cannot find the permission sets described in the workflow actions.

It’s worth repeating – the Nintex support engineer I dealt with said this was by design. I was quite shocked.

The Solution

If you’re working on a sizeable project, you probably have all the workflows exported to a folder on the filesystem. You can therefore process the files to replace the IDs from the original system with those of the target system. So – we can run the following PowerShell script on the files, while they are sitting on the destination server(s):


$url = "https://server/sites/site_collection/subsite"

# connect to the destination web, so we can read its role definitions
$web = Get-SPWeb $url

# Loop through files in Workflows subdirectory
foreach ( $source_file in $(Get-ChildItem './Workflows' -File | Sort-Object -Property Name) ) {

    Write-Host $("Processing [" + $source_file.Name + "]") -foregroundcolor white
    
    $file_content = Get-Content $source_file.FullName
    
    # repair the role definitions in the XML
    Write-Host " - Repairing Role Definition IDs in XML"
    foreach ($role_definition in $web.RoleDefinitions){
        $pattern     = $('\#' + $role_definition.Name + '\;\#None\;\#[0-9]+\$\$\#\#')
        $replacement = $('#' + $role_definition.Name + ';#None;#' + $role_definition.Id + '$$$$##')
        $file_content = $file_content -replace $pattern , $replacement
    }
    
    # Write the file into the modified directory
    $file_content | out-file -encoding utf8 "./Workflows/Modified/$($source_file.Name)"
    
    Write-Host $(" - Finished Processing [" + $source_file.Name + "]")

}

# release resources
$web.Dispose()

The above snippet presumes you have all your workflows in a folder called “Workflows”, alongside the PowerShell script. It also presumes a sub-folder called “Modified” exists within the Workflows folder, to put the modified workflows into. The script does a regex search for the role definition names in the XML (the permission sets), and swaps them out for the matching ones for the destination system. After running the script, you end up with a set of workflow export files that work.

In my mind, this entire situation could have been avoided if the developers at Nintex had been a bit more forward-thinking. At least there is a solution.


Breaking the CopySource connection within SharePoint after using CopyIntoItemsLocal

When you use the CopyIntoItemsLocal method of the Copy.asmx web service in SharePoint, you get a copy of your document, but you also get a piece of unwelcome hidden data in the copied document – a property called “_CopySource”. Normally this is used by the “Send To” function in the SharePoint interface, and allows SharePoint to keep track of what has been copied where, enabling the “Manage Copies” functionality from the source item. The fact that the Copy web service only partially populates it looks like a bug (the Manage Copies dialog is left empty), so we have to deal with it after copying.

The telltale sign that this has happened is a “Go to Source” option appearing in the ECB menu, and permissions issues cropping up if you lock down the source library or list you copied from (SharePoint will challenge the user for a username and password when opening the copied item).

The solution is to empty the “_CopySource” property of the copied items, which you can do via the lists webservice.

Call the UpdateListItems method of the Lists web service (lists.asmx) with the following SOAP envelope:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:m="http://schemas.microsoft.com/sharepoint/soap/">
  <soap:Header>
  </soap:Header>
  <soap:Body>
    <m:UpdateListItems>
      <m:listName>{Library Display Name}</m:listName>
      <m:updates>
        <Batch OnError="Continue" ListVersion="1">
          <Method ID="1" Cmd="Update">
            <Field Name="ID">{Document ID Number}</Field>
            <Field Name="MetaInfo" Property="_CopySource"></Field>
          </Method>
        </Batch>
      </m:updates>
    </m:UpdateListItems>
  </soap:Body>
</soap:Envelope>

Just replace {Library Display Name} and {Document ID Number} with appropriate data.
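If you happen to be doing this from the client side with SPServices (as in the migration snippet earlier), a rough sketch of the same call might look like the following – note that the list name “Documents” and item ID 42 are placeholders:

$().SPServices({
    operation: "UpdateListItems",
    listName: "Documents",
    updates: "<Batch OnError='Continue' ListVersion='1'>" +
             "<Method ID='1' Cmd='Update'>" +
             "<Field Name='ID'>42</Field>" +
             "<Field Name='MetaInfo' Property='_CopySource'></Field>" +
             "</Method></Batch>",
    completefunc: function (xData, Status) {
        // the _CopySource property has been emptied at this point
    }
});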


WordPress Link Farming with Python

A little while ago I was trying to think of a good way to discover interesting blogs to read, and hit upon an idea – what if I could aggregate together the URLs of all the blogs that had commented on the recent posts of another blog. I ended up writing this Python script to do exactly that.

The script performs the following operations:

  • Find all child pages of the URL given (e.g. https://mylovelyblog.wordpress.com)
  • Load each page, and extract all URLs ending with “wordpress.com”
  • Aggregate the found URLs into one list
  • De-duplicate the final list of URLs
  • Output the list of URLs

To use it, you might save the script as “grab_wordpress.py”, and run the following command at the command prompt:

python grab_wordpress.py http://someblog > urls.txt

… which will save all the URLs into a text file called “urls.txt”.

import sys,re,urllib2

# find URLs matching the pattern and return them
def find_blogs(html):
    
    matches = re.findall('[A-Za-z0-9-]+\.wordpress\.com', html)
    unique_matches = list(set(matches))
    result = []

    for match in unique_matches:
        result.append('http://' + match)

    return result

# find URLs within a given page
def find_child_urls(url,html):
    matches = re.findall('href=[\'\"]([^\"\']+)[\'\"]',html)
    unique_matches = list(set(matches))
    result = []
    for match in unique_matches:
        if url in match:
            if match.endswith('/'):
                result.append(match)
    return result

# get the URL passed in
url = sys.argv[1]

# tell the user what we are doing
print 'Fetching [' + url + ']'

# fetch the first page
response = urllib2.urlopen(url)
html = response.read()

# fetch the child URLs
child_urls = find_child_urls(url,html)
print str(len(child_urls)) + ' child URLs'

# loop through the child URLs, fetching the pages, and trawling for wordpress URLs
all_blog_urls = []

for child_url in child_urls:
    print 'Fetching [' + child_url + ']'
    response = urllib2.urlopen(child_url)
    html = response.read()
    blog_urls = find_blogs(html)
    for blog_url in blog_urls:
        all_blog_urls.append(blog_url)

# de-duplicate
all_blog_urls = list(set(all_blog_urls))

print str(len(all_blog_urls)) + ' blog URLs'

for blog_url in all_blog_urls:
    print blog_url

The script could be changed to output an HTML page with a list of anchors, but you could easily do that via a search/replace in a text editor too.
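For what it’s worth, here is a quick sketch of that change (keeping to the Python 2 style of the original script) – the final output loop becomes:

# emit a simple HTML page of anchors instead of plain URLs
print '<html><body>'
for blog_url in all_blog_urls:
    print '<a href="' + blog_url + '">' + blog_url + '</a><br/>'
print '</body></html>'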

Hopefully this is useful for somebody else too!


Removing Webparts from SharePoint Site Pages with Powershell

The following PowerShell snippet shows how to remove a WebPart from a page in SharePoint by its title. I wrote it up because it’s not as straightforward as you might imagine – you need to loop through the WebParts in the page via the “WebPartManager” object, because there is no simple enumeration of the WebParts in a page.

$webpart_title = "Hello World"
$page_filename = "home.aspx"

# Connect to Web
$web = Get-SPWeb "https://intranet.contoso.com"

# Instantiate Webpart Manager
$webpartmanager = $web.GetLimitedWebPartManager($($web.Url + "/SitePages/" + $page_filename),[System.Web.UI.WebControls.WebParts.PersonalizationScope]::Shared)

$webpartsarray = @()

for($i=0;$i -lt $webpartmanager.WebParts.Count;$i++) {
  if($webpartmanager.WebParts[$i].title -eq $webpart_title) {
    $webpartsarray = $webpartsarray + $webpartmanager.WebParts[$i].ID
  }
}

$num_webparts = $webpartsarray.length

if ($num_webparts -gt 0)
{
  Write-Host "Found Webpart [$webpart_title]"
  for($j=0; $j -lt $num_webparts; $j++)
  {
    Write-Host $("Deleting WebPart [" + $webpartsarray[$j] + "]")
    $webpartmanager.DeleteWebPart($webpartmanager.WebParts[$webpartsarray[$j]])
  }
}
else
{
  Write-Host "WebPart Not found"
}

# release resources
$web.Close()
$web.Dispose()


Provisioning SharePoint WebPart Pages, and WebParts with PowerShell

One of the more common exercises you might undertake in a PowerShell script when automating the deployment of infrastructure is the creation of WebPart pages, adding WebParts onto those pages, and potentially making one of those pages the default front page for a given site.

The following code snippets illustrate the methods required for each step of the process.

Connect to SharePoint

Before we can issue any instructions to SharePoint we need to add the SharePoint SnapIn to PowerShell, and connect to a web.

if ((Get-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue) -eq $null)
{
  Add-PSSnapin "Microsoft.SharePoint.PowerShell"
}
$web = Get-SPWeb "https://intranet.contoso.com"

Create a WebPart Page in the Site Pages Library

Next we create a page in the Site Pages library – notice that the layout template is chosen by its internal ID – you can find these out by searching MSDN for “SharePoint Page Layout Template enumeration”.

$site_pages_library = $web.lists["Site Pages"]
$pageTitle = "My Page"
$layoutTemplate = 4 # Template code
$xml = "<Method ID=`"0,NewWebPage`">" +
       "<SetList Scope=`"Request`">" + $site_pages_library.ID + "</SetList>" +
       "<SetVar Name=`"Cmd`">NewWebPage</SetVar>" +
       "<SetVar Name=`"ID`">New</SetVar>" +
       "<SetVar Name=`"Type`">WebPartPage</SetVar>" +
       "<SetVar Name=`"WebPartPageTemplate`">" + $layoutTemplate + "</SetVar>" +
       "<SetVar Name=`"Overwrite`">true</SetVar>" +
       "<SetVar Name=`"Title`">" + $pageTitle + "</SetVar>" +
       "</Method>"
$result = $web.ProcessBatchData($xml)

Add a WebPart to the Page

First we need to instantiate a WebPart manager object, which will be used to manipulate the webparts within a given page.

$webpartmanager = $web.GetLimitedWebPartManager($web.Url + "/SitePages/My%20Page.aspx", [System.Web.UI.WebControls.WebParts.PersonalizationScope]::Shared)

Add a Content Editor WebPart to the Page

Some example code to add a content editor WebPart to a WebPart page. Notice in particular the final command to the WebPart Manager object, which details the section of the page, and an index number within that section to place the WebPart.

$webpart = new-object Microsoft.SharePoint.WebPartPages.ContentEditorWebPart
$webpart.ChromeType = [System.Web.UI.WebControls.WebParts.PartChromeType]::None
$webpart.Title = "Example Content Editor WebPart"
$docXml = New-Object System.Xml.XmlDocument
$contentXml = $docXml.CreateElement("Content")
$inner_xml = "Hello World!"
$contentXml.set_InnerText($inner_xml) > $null
$docXml.AppendChild($contentXml) > $null
$webpart.Content = $contentXml
$webpartmanager.AddWebPart($webpart, "Header", 1) > $null

Add a List View WebPart to the Page

It turns out adding a list view to the page is a bit easier than adding a content editor WebPart. Note that the WebPart takes a copy of the view, so if you later change the underlying view within the list, it will not affect the WebPart.

$list = $web.Lists["My List"]
$view = $list.Views["All Items"]
$webpart = new-object Microsoft.SharePoint.WebPartPages.ListViewWebPart
$webpart.ChromeType = [System.Web.UI.WebControls.WebParts.PartChromeType]::TitleOnly
$webpart.Title = "Example List View WebPart"
$webpart.ListName = $list.ID
$webpart.ViewGuid = $view.ID
$webpartmanager.AddWebPart($webpart, "Body", 1) > $null

Make the new Page the default front page for the site

One of the more common reasons to provision a page through Powershell is as part of a dashboard that will become the user interface for the SharePoint site – this is how you do that.

$root_folder = $web.RootFolder
$root_folder.WelcomePage = "SitePages/My%20Page.aspx"
$root_folder.Update()

Release Resources

Finally, we need to release the resources PowerShell is holding onto with SharePoint.

$web.Close()
$web.Dispose()

Powershell to Clear a Large SharePoint List

When a SharePoint list grows to millions of rows (cough – Nintex Workflow History List!), it becomes a huge problem to clear its contents down. The following script uses a couple of tricks to essentially set a “cursor” to loop through a huge list and clear it down. It deletes in chunks of 1000 items at a time (using the “batch” functionality of the SharePoint API), then empties both the site and site collection recycle bins, and runs repeatedly until the offending list is cleared down.

$site_collection_url = "https://intranet.contoso.com"
$list_title = "Things"
$batch_size = 1000

$site = get-spsite $site_collection_url
$web = get-spweb $site_collection_url

$list = $web.Lists[$list_title]

$query = New-Object Microsoft.SharePoint.SPQuery
$query.ViewAttributes = "Scope='Recursive'";
$query.RowLimit = $batch_size
$caml = '' # an empty CAML query matches every item in the list
$query.Query = $caml
$process_count = 0

do
{
    $start_time = Get-Date
    write-host $(" - [Compiling Batch (" + $batch_size + " items)]") -nonewline

    $list_items = $list.GetItems($query)
    $count = $list_items.Count
    $query.ListItemCollectionPosition = $list_items.ListItemCollectionPosition

    # build a CAML batch of delete commands for ProcessBatchData
    $batch = "<?xml version=`"1.0`" encoding=`"UTF-8`"?><Batch OnError=`"Continue`">"

    for ($j = 0; $j -lt $count; $j++)
    {
        $item = $list_items[$j]
        $batch += "<Method ID=`"$j`"><SetList Scope=`"Request`">$($list.ID)</SetList><SetVar Name=`"ID`">$($item.ID)</SetVar><SetVar Name=`"Cmd`">Delete</SetVar><SetVar Name=`"owsfileref`">$($item.File.ServerRelativeUrl)</SetVar></Method>"
    }

    $batch += "</Batch>"

    write-host " [Sending Batch]" -nonewline
    $result = $web.ProcessBatchData($batch)

    write-host " [Emptying Web Recycle Bin]" -nonewline
    $web.RecycleBin.DeleteAll()

    write-host " [Emptying Site Recycle Bin]" -nonewline
    $site.RecycleBin.DeleteAll()

    $end_time = Get-Date
    $process_count += $batch_size

    write-host $(" [Processing Time " + ($end_time - $start_time).TotalSeconds + "] [Processed " + $process_count + " so far]") -nonewline
    write-host " [Waiting 2 seconds]"

    start-sleep -s 2

}
while ($query.ListItemCollectionPosition -ne $null)

# Release Resources
$web.Dispose()
$site.Dispose()

The take-away from this script is the ListItemCollectionPosition property of the query object – which appears to work like a cursor. I had never seen it before I started searching for solutions to this problem. It may well be useful again in the future.
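As a minimal sketch of the pattern in isolation (assuming $list is an SPList you have already retrieved), paging through a large list looks something like this:

$query = New-Object Microsoft.SharePoint.SPQuery
$query.RowLimit = 100

do {
    $items = $list.GetItems($query)
    # ... process the current page of up to 100 items here ...
    $query.ListItemCollectionPosition = $items.ListItemCollectionPosition
} while ($query.ListItemCollectionPosition -ne $null)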


SharePoint mixing up Folders and Document Sets with WebDAV

When you create a document library, it has two content types by default – Document, and Folder. Folder is hidden from you – but when you list the content types of the library in PowerShell, the following will be reported:

  • Document
  • Folder

When you create a folder over WebDAV, the Folder content type is chosen by SharePoint automatically. Interestingly, if you use PowerShell to remove the Folder content type and then create another folder via WebDAV, it will have NO content type listed, even though the folder will still work as a folder in the library.

Here’s the catch – and how you can confuse SharePoint:

If you remove the Folder content type, add a Document Set content type, and then go to WebDAV and create a folder, the folder will get the Document Set content type. The reason for this appears to be the way SharePoint chooses the folder content type – Folder and Document Set content type IDs both begin with “0x0120”. It looks like SharePoint finds the Document Set content type first, and uses it for the folder. It even does this if you add the Folder content type back, which points towards SharePoint interrogating the content types in the order they were applied to the library.

The solution is to re-write the UniqueContentTypeOrder for the RootFolder of the library. It is a generic list of content types – and you must be careful to use the List content types, not the Web or Site content types, because their IDs will differ.
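A minimal sketch of the fix in PowerShell (assuming a library called “Documents” containing the Folder content type and a hypothetical “My Document Set” content type) might look like this:

$web = Get-SPWeb "https://intranet.contoso.com"
$list = $web.Lists["Documents"]

# build the order from the list content types - Folder first, so WebDAV
# folder creation finds it before the document set
$order = New-Object 'System.Collections.Generic.List[Microsoft.SharePoint.SPContentType]'
$order.Add($list.ContentTypes["Folder"])
$order.Add($list.ContentTypes["My Document Set"])

$root_folder = $list.RootFolder
$root_folder.UniqueContentTypeOrder = $order
$root_folder.Update()

$web.Dispose()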


Bulk Editing Posts at WordPress.com with the REST API

A little while ago I migrated my personal blog over to WordPress.com – and didn’t notice for quite some time that there were some issues in the body text of some of the older posts (the blog has several thousand posts). If the blog had been hosted on my own server, I could have just written a script to do a database update on the content, but it is hosted at wordpress.com – so that wasn’t an option.

I had a play with the WordPress REST API, and am happy to report that it allowed me to not only load all of the posts from my blog via a script, but also update them.

The script below is purely a guide – it will not work “out of the box”, as you will see if you read the various notes. It’s a template you can fashion to do what you want by adding the various pieces together. In my “real” version, all of the snippets are in one script, one after another.

Oh – and finally – worth noting that this is PHP, and I ran it at the command line in a virtual machine running Ubuntu Server 16.x, spun up at Digital Ocean, and then destroyed afterwards. It cost pennies for the time it existed. The only installs I had to do on the VM were PHP 7, and PHP CURL. There would be nothing to stop you converting it into a PHP script running in a browser, except you would probably hit time-outs. The nice thing about running it at the command line is you get to see progress as it runs.

Get an Access Token

Although some methods of the WordPress API (such as retrieving sites and posts) require no authentication, we will be calling update later – so we need to get an access token. To do this you have to configure an application at developer.wordpress.com/apps, which will give you a Client ID and a Client Secret string (the snippet below should be self-explanatory).

$client_id = '...';
$client_secret = '...';
$site_url = 'your_blog_name.wordpress.com';
$username = '...';
$password = '...';

// get an access token
$curl = curl_init( 'https://public-api.wordpress.com/oauth2/token' );
curl_setopt( $curl, CURLOPT_POST, true );
curl_setopt( $curl, CURLOPT_POSTFIELDS, array(
    'client_id' => $client_id,
    'client_secret' => $client_secret,
    'grant_type' => 'password',
    'username' => $username,
    'password' => $password,
) );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1);
$auth = curl_exec( $curl );
$auth = json_decode($auth);
$access_token = $auth->access_token;

print "Access Token [".$access_token."]\r\n\r\n";

Get Site Information

The REST API call to retrieve posts needs the internal WordPress ID of your site – to get this you need to call the Sites API.

// get site info
$site_options = array (
    'http' =>
    array (
    'ignore_errors' => true,
    ),
);
$site_context = stream_context_create( $site_options );
$site_response = file_get_contents(
    'https://public-api.wordpress.com/rest/v1.2/sites/'.$site_url.'/',
    false,
    $site_context
);
$site_response = json_decode( $site_response );
$site_id = $site_response->ID;

Retrieve the Posts and Update Them

To get hold of the posts from the blog, we need to repeatedly call the posts API, with a number of parameters – essentially the number of posts to grab in each iteration, and the number of pages to try and loop through. There are a number of ways of iterating the pages – I have gone with a very hacky way that suited my needs – you could be far more clever, and use the page_handle data that comes back with the response data.

// configuration parameters
$posts_per_page = 20;
$pages = 200;
$search_pattern = "..."; // the pattern to identify content within a post that needs updating
$replace_search_pattern = "..."; // the replacement search pattern (regex)
$replace_pattern = "..."; // the replacement pattern (regex)

// setup the post context
$posts_options = array ( 'http' => array ('ignore_errors' => true, ),);
$posts_context = stream_context_create( $posts_options );

// loop through the pages
for ($page=1; $page<$pages; $page++)
{
    $posts_url = 'https://public-api.wordpress.com/rest/v1.1/sites/'.$site_url.'/posts/?page='.$page.'&number='.$posts_per_page .'&fields=ID,title,content';
    $posts_response = file_get_contents( $posts_url, false, $posts_context);
    $posts_response = json_decode( $posts_response );
    for ($i=0; $i < count($posts_response->posts); $i++) {
        $post = $posts_response->posts[$i];
        print " - ".$post->ID." ".$post->title;

        // does the post have a pattern match in it ?
        $match_result = preg_match($search_pattern,$post->content);
        if ($match_result > 0) {
            print " MATCH FOUND";
            $post_id = $post->ID;
            $updated_content = preg_replace($replace_search_pattern, $replace_pattern, $post->content);

            print "\r\n\r\n".$updated_content."\r\n\r\n";

            // do the update
            $update_options = array (
                'http' => array (
                    'ignore_errors' => true,
                    'method' => 'POST',
                    'header' => array (
                        0 => 'authorization: Bearer '.$access_token,
                        1 => 'Content-Type: application/x-www-form-urlencoded',
                    ),
                'content' => http_build_query( array (
                    'content' => $updated_content,
                    )),
                ),
            );

            $update_context = stream_context_create( $update_options );
            $update_response = file_get_contents('https://public-api.wordpress.com/rest/v1.2/sites/'.$site_id.'/posts/'.$post_id,false,$update_context);
            $update_response = json_decode( $update_response );

            print " UPDATED";
        }

        print "\r\n";
    }
}

It’s a little bit technical in places, but most of this code was lifted from the WordPress API documentation. As I said at the start – this is not a working solution that you can just paste in – it’s a guide to how you can interact with the WordPress.com API from PHP. Hopefully it will be useful to somebody else at some point.
