
How to Remove Duplicate Items in a SharePoint Online List and Keep One Using PowerShell


Hello folks,

We recently faced a critical situation with a customer:
He was running a PowerShell script to fill a SharePoint Online list with data. When the list reached about 7,000 items, an issue occurred in the script, and he ended up with 20,000 items in the same list, 70% of them duplicates.

He wanted to fix the situation safely, with no risk, by removing the duplicate items and keeping the original ones.
To solve the situation, I did the following. This solution can be applied to all SharePoint versions (2010, 2013, 2016, and SharePoint Online), because the script uses the Client Object Model (CSOM). For on-premises versions, replace the SharePointOnlineCredentials line with Windows credentials, for example a System.Net.NetworkCredential object.

      1. I took a backup of the list. If you use SharePoint on-premises, you can back up the site; if you use SharePoint Online, you can use Sharegate. However, this step is not mandatory.
      2. Export the All Items view to Excel using the Export to Excel action in the ribbon, and make sure the ID column is included in the export.
      3. Download and install a free tool named Kutools for Excel from here; the page also describes how to use it to mark the duplicate values in a column.
      4. Copy the IDs of the duplicate rows only (not the original items you want to keep) from Excel, and paste them into a text file on your hard drive, one ID per line.
      5. Then run the PowerShell script below, after updating the DLL paths, site URL, user name, list title, and ID file path:
        # Load the SharePoint Client Object Model (CSOM) assemblies
        Add-Type -Path "C:\Path\SPOnline\Microsoft.SharePoint.Client.dll"
        Add-Type -Path "C:\Path\SPOnline\Microsoft.SharePoint.Client.Runtime.dll"
        # Connection settings: adjust to your tenant, site, and account
        $siteurl = "https://tenant.sharepoint.com/sites/yourSite/"
        $UserName = "siteadmin@tenant.onmicrosoft.com"
        $SecurePassword = Read-Host -Prompt "Please enter your password" -AsSecureString
        $SPOCredentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($UserName, $SecurePassword)
        $context = New-Object Microsoft.SharePoint.Client.ClientContext($siteurl)
        $context.Credentials = $SPOCredentials
        # Get the list that contains the duplicates
        $listTitle = "List Name"
        $list = $context.Web.Lists.GetByTitle($listTitle)
        $context.Load($list)
        $context.ExecuteQuery()
        # One ID per line; if a line holds several values separated by ";", the first one is the ID
        $Ids = Get-Content "C:\temp\List-of-Ids.txt"
        foreach ($line in $Ids)
        {
            if ([string]::IsNullOrWhiteSpace($line)) { continue }   # skip empty lines
            $id = $line.Split(";")[0].Trim()
            # Query for the single item with this ID; the attributes use single quotes so the
            # whole CAML fits in one double-quoted string and $id is expanded
            $query = New-Object Microsoft.SharePoint.Client.CamlQuery
            $query.ViewXml = "<View Scope='RecursiveAll'><Query><Where><Eq><FieldRef Name='ID' /><Value Type='Number'>$id</Value></Eq></Where></Query><RowLimit>1</RowLimit></View>"
            $items = $list.GetItems($query)
            $context.Load($items)
            $context.ExecuteQuery()
            if ($items.Count -eq 0)
            {
                Write-Host "Not found: $id" -ForegroundColor Red
                continue
            }
            # Iterate backwards so deleting an item does not disturb the remaining indexes
            for ($i = $items.Count - 1; $i -ge 0; $i--)
            {
                $items[$i].DeleteObject()
            }
            $context.ExecuteQuery()   # send the queued delete to SharePoint
            Write-Host "Deleted: $id" -ForegroundColor Yellow
        }
        $context.Dispose()


    Now the items will be deleted one by one, and each deleted ID is printed in the console. You can change the PowerShell script to fit your needs.
    This method is safe because you see exactly what you are going to delete before executing the deletion script.
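If you would rather not install Kutools, steps 3 and 4 could also be approximated in plain PowerShell. The sketch below is an alternative technique, not the method used in this post, and it rests on assumptions: the exported sheet was saved as a CSV file with an ID column, the column(s) you pass as -KeyColumns (for example Title) define what counts as a duplicate, and the item with the lowest ID in each group is the original (SharePoint assigns IDs in creation order). Get-DuplicateIdsToDelete and Test-DeletionListIsSafe are hypothetical helper names; the second one adds an extra check before you run the deletion script: every ID to delete exists in the export, and every group keeps at least one item.

```powershell
# Sketch only: assumes a CSV export with an "ID" column, and that the columns
# given in -KeyColumns (for example "Title") define what a duplicate is.

# Hypothetical helper: returns the IDs to delete, keeping the lowest ID in each
# group of rows with identical key values (assumed to be the original item).
function Get-DuplicateIdsToDelete {
    param(
        [Parameter(Mandatory)] [object[]] $Rows,
        [Parameter(Mandatory)] [string[]] $KeyColumns
    )
    $Rows |
        Group-Object -Property $KeyColumns |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            # Sort numerically so ID 10 comes after ID 9, then skip the original row
            $_.Group |
                Sort-Object { [int]$_.ID } |
                Select-Object -Skip 1 |
                ForEach-Object { [int]$_.ID }
        }
}

# Hypothetical helper: returns $true only if every ID to delete exists in the
# export and every group of identical key values keeps at least one item.
function Test-DeletionListIsSafe {
    param(
        [Parameter(Mandatory)] [object[]] $Rows,
        [Parameter(Mandatory)] [string[]] $KeyColumns,
        [int[]] $IdsToDelete = @()
    )
    $toDelete = @{}
    foreach ($id in $IdsToDelete) { $toDelete[$id] = $true }
    $knownIds = @{}
    foreach ($row in $Rows) { $knownIds[[int]$row.ID] = $true }

    $ok = $true
    foreach ($id in $toDelete.Keys) {
        if (-not $knownIds.ContainsKey($id)) {
            Write-Warning "ID $id is not in the export."
            $ok = $false
        }
    }
    foreach ($group in ($Rows | Group-Object -Property $KeyColumns)) {
        # Rows of this group that are NOT scheduled for deletion
        $survivors = @($group.Group | Where-Object { -not $toDelete.ContainsKey([int]$_.ID) })
        if ($survivors.Count -eq 0) {
            Write-Warning "Every item with key '$($group.Name)' would be deleted."
            $ok = $false
        }
    }
    return $ok
}

# Example usage (file paths and the "Title" key column are placeholders):
# $rows = Import-Csv "C:\temp\AllItems.csv"
# $ids  = @(Get-DuplicateIdsToDelete -Rows $rows -KeyColumns "Title")
# if (Test-DeletionListIsSafe -Rows $rows -KeyColumns "Title" -IdsToDelete $ids) {
#     $ids | Set-Content "C:\temp\List-of-Ids.txt"
# }
```

Writing the result to the same List-of-Ids.txt file means the deletion script above can consume it unchanged. Grouping on several columns (for example -KeyColumns "Title", "Created") narrows what is treated as a duplicate; note that Group-Object compares values case-insensitively by default.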