Friday, February 5, 2010

C#.net - Find and Replace in large files

Problem-Scenario: You are working on a large file (e.g. > 1GB) and have to do a simple operation, such as replacing one string with another.
Regex.Replace, String.Replace, an XML DOM object or LINQ don't work, because they all need the whole file in memory! And there are no free tools on the market that do it without choking midway.
  
Solution:
Well, the solution is as simple as it can get. Just use StreamReader & StreamWriter. They don't load the whole file into memory; they stream through your text or XML file a small piece at a time.

Example: (Change the required parameter values for changing a site's site definition)

using System;
using System.IO;

/// <summary>
/// Copies a file line by line, replacing the site definition values on the way.
/// </summary>
/// <param name="SourcefilePath">Path of the source Manifest.xml file.</param>
/// <param name="DestfilePath">Path of the file the changed text is written to.</param>
public void ReplaceInFile(string SourcefilePath, string DestfilePath)
{
    if (!SourcefilePath.Contains(".xml"))
    {
        Console.WriteLine("Please specify Manifest.xml path.. Filename is missing");
        return;
    }
    if (!File.Exists(SourcefilePath))
    {
        Console.WriteLine("File doesn't exist at the specified path\n");
        return;
    }

    // Strings for changing the Configuration IDs
    string OldConfig1 = @"Configuration=""0""";
    string NewConfig = @"Configuration=""2""";

    // WebTemplate="InsideCustompublishingWorkflow": change the Configuration ID to 2
    string searchtext23 = @"WebTemplate=""InsideCustompublishingWorkflow""";

    // WebTemplate="INSIDECustomPUBLISHING": old site template name and its replacement
    string searchtext1 = @"WebTemplate=""INSIDECustomPUBLISHING""";
    string replacetext1 = @"WebTemplate=""INSIDECustomPUBLISHINGnew""";

    // SetupPath="SiteTemplates\INSIDECustomPUBLISHING\: old setup path and its replacement
    string searchtext4 = @"SetupPath=""SiteTemplates\INSIDECustomPUBLISHING\";
    string replacetext4 = @"SetupPath=""SiteTemplates\INSIDECustomPUBLISHINGnew\";

    // The using blocks close both streams even if an exception is thrown
    using (StreamReader streamReader = new StreamReader(SourcefilePath))
    using (StreamWriter streamWriter = new StreamWriter(DestfilePath))
    {
        string data;
        // ReadLine returns null at the end of the file
        while ((data = streamReader.ReadLine()) != null)
        {
            if (data.Contains(searchtext23))
            {
                // Change the configuration
                data = data.Replace(OldConfig1, NewConfig);
            }

            // CHANGE THE SITE TEMPLATE NAME
            if (data.Contains(searchtext1))
            {
                // Change the configuration and the template name
                data = data.Replace(OldConfig1, NewConfig);
                data = data.Replace(searchtext1, replacetext1);
            }

            // CHANGE THE SITE TEMPLATE SETUPPATH
            data = data.Replace(searchtext4, replacetext4);

            // Write the line to the new .xml file
            streamWriter.WriteLine(data);
        }
    }
}
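The example above hard-codes its search strings. As a rough sketch of the same StreamReader/StreamWriter idea with the search and replacement text passed in as parameters, something like the following could be used. The class name LargeFileReplacer, the parameter names and the returned count are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.IO;

public static class LargeFileReplacer
{
    // Hypothetical helper: copies sourcePath to destPath line by line,
    // replacing every occurrence of searchText with replaceText.
    // Returns the number of lines that contained searchText.
    public static long ReplaceInFile(string sourcePath, string destPath,
                                     string searchText, string replaceText)
    {
        if (string.IsNullOrEmpty(searchText))
        {
            throw new ArgumentException("Search text must not be empty.", "searchText");
        }

        long changedLines = 0;

        // Only one line is held in memory at a time.
        using (StreamReader reader = new StreamReader(sourcePath))
        using (StreamWriter writer = new StreamWriter(destPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(searchText))
                {
                    line = line.Replace(searchText, replaceText);
                    changedLines++;
                }
                writer.WriteLine(line);
            }
        }

        return changedLines;
    }
}
```

The source and destination must be different files. Note that this only keeps memory low if the file actually contains line breaks, and that every line is written back with the platform's default line ending.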

 

How to change an existing site's site definition?

Problem-Scenario:
I faced this problem in one of my projects. There were many custom site definitions being used in the organization and a few of them were all messed up, so they wanted me to change the site definition of all the existing sites to just one custom site definition.

Solution:
Now, there are two supported ways of changing a site's site definition. Yeah! You heard it right. It is supported by Microsoft.

The first way is to use the SharePoint deployment API. You can refer to Stefan Goßner's blog on this:
http://blogs.technet.com/stefan_gossner/archive/2007/08/30/deep-dive-into-the-sharepoint-content-deployment-and-migration-api-part-4.aspx
I personally found it difficult, so I analyzed the manifest.xml and did a lot of testing to find a simpler solution which doesn't use the SharePoint API but does the same thing.

Note: You cannot save the site as a template (.stp), because a template is limited to a maximum size of 10MB, which makes that approach useless.

The second way is to manipulate the manifest.xml directly.
Steps:

1) Find the template name, SetupPath, site definition ID and configuration ID of every site definition that needs to be changed, as well as of the new site definition.
For that, go to
C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\1033\XML and open the webtemp.xml of your site definition. After a couple of lines, you will see a line like:
<Template Name="CustomsiteDefitionName" ID="50">
"CustomsiteDefitionName" is the template name and "50" is the template ID (site definition ID).
What is the configuration ID for the configuration we want to use?
Our sites are using "Custom team Site", so our configuration ID is "0".
In the file, you will also see a SetupPath value such as SiteTemplates\PUBLISHING\ (for a publishing site, for example).
Note these down for all site definitions in question.
2) Export the site whose site definition you want to change
stsadm -o export -url <site URL> -filename <export file name> [-includeusersecurity] [-versions <1-4>] -nofilecompression [-quiet]
This stsadm command will export the site, including its subsites, in uncompressed form. The result will be a lot of .dat (data) files and a few XML files; the important ones are Manifest.xml and Requirements.xml.
3) Open Manifest.xml & Requirements.xml and change the TemplateName, TemplateID, ConfigurationID & SetupPath of the current site definition to those of the new site definition.
That's it & you are almost done.

4) Import this site back to its place. You can delete or overwrite the previous version; it's up to you. I would suggest taking a backup first.
5) Enjoy! Your site's site definition has been changed.
Note: If your manifest.xml grows big (more than 1GB, say), DOM manipulation won't work on it. All DOM ways of opening big files fail, since they load everything into memory. Read my blog post on how to handle it...
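For step 1, instead of reading webtemp.xml by eye, the names and IDs could also be listed with a few lines of code. This is only a sketch: it assumes the usual webtemp layout of Template elements (with Name and ID attributes) containing Configuration elements (with ID and Title attributes), and the class name WebTempInspector and the output format are made up for illustration. webtemp.xml is a small file, so loading it whole with LINQ to XML is fine here.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class WebTempInspector
{
    // Hypothetical helper: takes the text of a webtemp.xml file and returns
    // one description per Configuration element found under each Template.
    public static List<string> ListConfigurations(string webTempXml)
    {
        List<string> result = new List<string>();
        XDocument doc = XDocument.Parse(webTempXml);

        foreach (XElement template in doc.Descendants("Template"))
        {
            string name = (string)template.Attribute("Name");
            string id = (string)template.Attribute("ID");

            foreach (XElement config in template.Elements("Configuration"))
            {
                result.Add(string.Format(
                    "{0} (ID {1}), Configuration {2}: {3}",
                    name,
                    id,
                    (string)config.Attribute("ID"),
                    (string)config.Attribute("Title")));
            }
        }

        return result;
    }
}
```

It can be called with the file contents, for example WebTempInspector.ListConfigurations(File.ReadAllText(path)), where path points to your webtemp.xml.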