Recently I have been working through some thorny data and encoding issues whilst integrating via BizTalk. PowerShell really is the new Swiss Army knife, the role Perl always played in my long-past days of Korn shell on AIX UNIX.
As an example, I had to extract a particular field from a c. 1GB XML file and get a count of its unique values. On a 32-bit system a file that size typically won't load into a DOM, so a simple XmlDocument.SelectNodes("...").Count is out. Instead I wrote a few lines of C# in a console application that uses an XmlTextReader to Console.WriteLine the element value every time it encounters a particular element. I can reuse this code over and over. Here is a bit of PowerShell to manipulate the output, saved here as C:\temp\mylist.txt:
get-content C:\temp\mylist.txt | sort -caseSensitive | where {$_.Length -gt 0} | get-unique
This gives me a unique list whilst skipping empty lines. I could go further just in case there is some whitespace crawling in:
get-content C:\temp\mylist.txt | %{$_.Trim()} | where {$_.Length -gt 0} | sort -caseSensitive | get-unique
Maybe I want that counted:
get-content C:\temp\mylist.txt | %{$_.Trim()} | where {$_.Length -gt 0} | sort -caseSensitive | get-unique | measure-object
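For a file this size, the extraction and the unique count could also be done in one streaming pass from PowerShell itself. The following is only a sketch of that idea, not the C# console application described above: the function name Get-UniqueElementValue and its parameters are made up for illustration, and it assumes PowerShell 5 or later for the ::new syntax. It uses System.Xml.XmlReader so the document is never loaded into a DOM, and a case-sensitive HashSet in place of sort and get-unique.

```powershell
# Sketch: stream an XML file and emit the distinct text values of one element.
# Get-UniqueElementValue is a hypothetical helper, not part of PowerShell.
function Get-UniqueElementValue {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $ElementName
    )

    # Ordinal comparer keeps the set case-sensitive, like sort -caseSensitive.
    $seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)

    # .NET resolves relative paths against the process directory, so resolve first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader = [System.Xml.XmlReader]::Create($fullPath)
    $inTarget = $false
    $buffer = ''
    try {
        while ($reader.Read()) {
            $type = $reader.NodeType
            if ($type -eq [System.Xml.XmlNodeType]::Element) {
                # -ceq because XML names are case-sensitive; <x/> has no content to read.
                if ($reader.LocalName -ceq $ElementName -and -not $reader.IsEmptyElement) {
                    $inTarget = $true
                    $buffer = ''
                }
            }
            elseif ($inTarget -and ($type -eq [System.Xml.XmlNodeType]::Text -or
                                    $type -eq [System.Xml.XmlNodeType]::CDATA)) {
                $buffer += $reader.Value
            }
            elseif ($inTarget -and $type -eq [System.Xml.XmlNodeType]::EndElement -and
                    $reader.LocalName -ceq $ElementName) {
                # Trim and skip empties, mirroring the Trim() and Length filters above.
                $value = $buffer.Trim()
                if ($value.Length -gt 0) { [void]$seen.Add($value) }
                $inTarget = $false
            }
        }
    }
    finally {
        $reader.Close()
    }

    # Writing the set to the pipeline enumerates its values.
    $seen
}
```

Wrapping the call in @( ) and reading .Count gives the same number as measure-object, for example @(Get-UniqueElementValue -Path $file -ElementName $name).Count, where $file and $name are your own values.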
Another issue we faced was that we were receiving files in different encodings. What tool should I use to view the hex, see what the bytes in the stream really say, and check the difference? PowerShell:
3> gc C:\temp\FileOne.xml -encoding byte -totalcount 1000 | format-hex
Address: 0 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14 15 16 17 ASCII
-------- ----------------------------------------------------------------------- ------------------------
00000000 EF BB BF 3C 44 69 73 70 6C 61 79 4E 61 6D 65 20 56 61 6C 75 65 3D 22 43 ...<DisplayName Value="C
00000018 65 73 C3 A1 72 69 61 20 C3 89 76 6F 72 61 22 20 6C 61 6E 67 75 61 67 65 es..ria ..vora" language
00000030 3D 22 65 6E 2D 47 42 22 20 2F 3E ="en-GB" />
Here I can see that the UTF-8 byte order mark (EF BB BF) is present, and which bytes represent the accented characters in the XML above: C3 A1 for á and C3 89 for É.
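To check just the byte order mark without reading a hex dump by eye, a small helper can read the first four bytes of a file and compare them against the well-known marks. This is a sketch: Get-BomName is a hypothetical name, and a result of 'none' only means that no mark is present, not that the file is in any particular encoding. The last lines decode the two byte pairs from the dump above as UTF-8 to confirm which characters they are.

```powershell
# Sketch: report which byte order mark, if any, starts a file.
# Get-BomName is a hypothetical helper, not part of PowerShell.
function Get-BomName {
    param([Parameter(Mandatory)] [string] $Path)

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        # Four bytes are enough for the longest mark (UTF-32).
        $bytes = New-Object byte[] 4
        $count = $stream.Read($bytes, 0, 4)
    }
    finally {
        $stream.Dispose()
    }

    # BitConverter renders the bytes as upper-case hex pairs, e.g. 'EF-BB-BF-3C'.
    $hex = [System.BitConverter]::ToString($bytes, 0, $count)

    # UTF-32 is tested first: its little-endian mark begins with the UTF-16 LE mark.
    if     ($hex -eq 'FF-FE-00-00')      { 'UTF-32 LE' }
    elseif ($hex -eq '00-00-FE-FF')      { 'UTF-32 BE' }
    elseif ($hex.StartsWith('EF-BB-BF')) { 'UTF-8' }
    elseif ($hex.StartsWith('FF-FE'))    { 'UTF-16 LE' }
    elseif ($hex.StartsWith('FE-FF'))    { 'UTF-16 BE' }
    else                                 { 'none' }
}

# Decode the two byte pairs from the hex dump as UTF-8.
$decoded = [System.Text.Encoding]::UTF8.GetString([byte[]](0xC3, 0xA1, 0xC3, 0x89))
# $decoded now holds U+00E1 followed by U+00C9.
```

Reading only the first four bytes keeps the check cheap even on very large files.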
As someone who started out as a UNIX sysadmin / Oracle DBA, I can tell you: you have no idea what it is like to have a proper shell again!