
Who Moved My InfoPath Attachment? Files Uploaded Through InfoPath Turn Up "Corrupted"



Source: cnblogs, November 2, 2007

Keywords: forms, SharePoint, data, InfoPath, Office

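I recently got a requirement to extract the data stored in InfoPath forms. Doing so exposed a problem: every file uploaded through InfoPath appears to have some extra data added to it by InfoPath's file attachment control. This is a fairly serious problem, because it breaks the binary layout of the original file, and a program with poor fault tolerance will simply report an error. For example, when I extracted a Word file and saved it to disk, Word 2003 reported an error outright, while Word 2007 reported an error and then repaired the file automatically. That is not the behavior we want.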

Here we go!

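The first step is to find out exactly what data the mischievous InfoPath hides in the file, and where. I wrote a small program that compares the unprocessed file with the processed file byte by byte. When it hits a mismatch, it scans forward in the processed file to look for the matching byte.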

    // Requires: using System; using System.IO;
    // Load both files into memory: the untouched original (src) and the copy that went through InfoPath (dst).
    byte[] src = File.ReadAllBytes(@"c:\source");
    byte[] dst = File.ReadAllBytes(@"c:\dest");

    int destPtr = 0;
    // The extra bound on destPtr keeps the index inside dst if no match is ever found.
    for (int srcPtr = 0; srcPtr < src.Length && destPtr < dst.Length; srcPtr++, destPtr++)
    {
        if (dst[destPtr] != src[srcPtr])
        {
            Console.WriteLine(string.Format("Mismatch: byte {0} of the target file differs from byte {1} of the source file; unmatched byte: {2}", destPtr, srcPtr, dst[destPtr]));
            // Skip ahead in the target until a byte equal to the current source byte turns up.
            for (; destPtr < dst.Length; destPtr++)
            {
                if (dst[destPtr] == src[srcPtr])
                {
                    Console.WriteLine(string.Format("Found the matching byte at position {0}", destPtr));
                    Console.Read();
                    break;
                }
            }
        }
        else
        {
            Console.WriteLine("Byte {0} of source matches byte {1} of dest", srcPtr, destPtr);
        }
    }
    Console.Read();

Testing showed that the new file is 58 bytes larger than the old one. So my guess was right: InfoPath really did tamper with the file!

Now look at the result:

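Fortunately, the data was only added at the head of the file. What remains is to analyze the format of this header. Headers like this are usually variable-length, so once the format is understood, the actual file can be extracted from the InfoPath data dynamically.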
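The observation that the extra data sits only at the head can be checked directly, without scanning for individual matching bytes. The following is a minimal sketch, not the program used in the article: the class and method names (`PrefixCheck`, `LeadingBytesAdded`) are made up for illustration. It tests whether the original file appears unchanged at the end of the processed file and, if so, reports how many bytes were prepended.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original article.
static class PrefixCheck
{
    // Returns the number of extra leading bytes in `processed` when `original`
    // appears unchanged at its end; returns -1 if that is not the case.
    public static int LeadingBytesAdded(byte[] original, byte[] processed)
    {
        int extra = processed.Length - original.Length;
        if (extra < 0)
        {
            return -1; // the processed file is shorter, so it cannot contain the original
        }
        for (int i = 0; i < original.Length; i++)
        {
            if (processed[extra + i] != original[i])
            {
                return -1; // the tail differs, so the change is not a pure prefix
            }
        }
        return extra;
    }

    // Convenience overload that reads both files from disk.
    public static int LeadingBytesAdded(string originalPath, string processedPath)
    {
        return LeadingBytesAdded(File.ReadAllBytes(originalPath), File.ReadAllBytes(processedPath));
    }
}
```

Calling `PrefixCheck.LeadingBytesAdded(@"c:\source", @"c:\dest")` with the two files from the comparison program should return the size of the added header if nothing but a prefix was added.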

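I took those 58 bytes, decoded them as Unicode, and found the name of the file I had uploaded. The InfoPath XML file has no node that stores the file name, yet InfoPath can still display it, so this explains where the name comes from.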

I then consulted this post on the official InfoPath blog: http://blogs.msdn.com/infopath/archive/2004/03/18/92221.aspx

It contains the following passage:

BYTE[4]: Signature (based on the signature for PNG):

(decimal) 199 73 70 65

(hexadecimal) C7 49 46 41

(ASCII C notation) \307 I F A

　　The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a file attachment. The rest identifies the file as an InfoPath File Attachment.

　　DWORD: Size of the header

　　DWORD: IP Version

　　DWORD: dwReserved

　　DWORD: File size

　　DWORD: Size of file name buffer

　　File name buffer: variable size
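The layout quoted above maps naturally onto a small reader. This is a minimal sketch under stated assumptions: the type name `AttachmentHeader` and its members are invented here, and the DWORD fields are assumed to be little-endian (the usual byte order on Windows), which the quoted excerpt does not state explicitly.

```csharp
using System;
using System.IO;

// Hypothetical type for illustration; field names follow the quoted layout.
sealed class AttachmentHeader
{
    public uint HeaderSize;      // DWORD: Size of the header
    public uint Version;         // DWORD: IP Version
    public uint Reserved;        // DWORD: dwReserved
    public uint FileSize;        // DWORD: File size
    public uint NameBufferSize;  // DWORD: Size of file name buffer

    // Signature bytes from the quoted passage: C7 49 46 41.
    static readonly byte[] Signature = { 0xC7, 0x49, 0x46, 0x41 };

    // Reads the six fixed-size fields from the first 24 bytes of `data`.
    public static AttachmentHeader Read(byte[] data)
    {
        if (data.Length < 24)
        {
            throw new InvalidDataException("Too short to contain the fixed-size header fields.");
        }
        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[i] != Signature[i])
            {
                throw new InvalidDataException("Signature does not match C7 49 46 41.");
            }
        }
        // BinaryReader always reads integers as little-endian (assumed byte order).
        using (var reader = new BinaryReader(new MemoryStream(data, 4, 20)))
        {
            var header = new AttachmentHeader();
            header.HeaderSize = reader.ReadUInt32();
            header.Version = reader.ReadUInt32();
            header.Reserved = reader.ReadUInt32();
            header.FileSize = reader.ReadUInt32();
            header.NameBufferSize = reader.ReadUInt32();
            return header;
        }
    }
}
```

Checking the signature first gives an early, clear failure when the bytes are not an InfoPath attachment at all, which is the purpose the quoted passage gives for the non-ASCII first byte.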

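Note that this header consists of six fixed-size fields followed by the variable-size file name buffer. Apart from the four bytes of the BYTE[4] signature, the other five fields are all DWORDs. A DWORD is a double word and a word is two bytes, so one DWORD is 4 bytes. At this point the picture is clear: the first 4*(4+1) = 20 bytes (the signature plus four DWORDs) sit at fixed positions, and the next four bytes, at offsets 20 through 23, hold the size of the file name buffer.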
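With the offsets worked out, the original file can be cut back out of the attachment data. This is a hedged sketch rather than the article's own solution: `AttachmentExtractor` and `ExtractFile` are hypothetical names, and the DWORDs are assumed to be little-endian. The quoted excerpt does not say whether the name buffer size counts bytes or characters, so this sketch does not rely on it. It takes the file-size field at offset 16 and treats the last that-many bytes as the payload, which matches the finding that data was only added at the head. Everything between offset 24 and the payload is decoded as the Unicode (UTF-16) file name.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
static class AttachmentExtractor
{
    const int FixedHeaderBytes = 24; // 4-byte signature + five 4-byte DWORDs
    const int FileSizeOffset = 16;   // signature + header size + version + reserved

    // Returns the embedded file's bytes and outputs the stored file name.
    public static byte[] ExtractFile(byte[] blob, out string fileName)
    {
        if (blob.Length < FixedHeaderBytes)
        {
            throw new InvalidDataException("Too short to contain the fixed-size header fields.");
        }

        uint fileSize;
        // BinaryReader reads little-endian integers (assumed byte order).
        using (var reader = new BinaryReader(new MemoryStream(blob, FileSizeOffset, 4)))
        {
            fileSize = reader.ReadUInt32();
        }

        // The payload is taken to be the trailing fileSize bytes.
        long payloadStart = (long)blob.Length - fileSize;
        if (payloadStart < FixedHeaderBytes)
        {
            throw new InvalidDataException("File size field is larger than the data available.");
        }
        int start = (int)payloadStart;

        // The bytes between the fixed fields and the payload hold the file name;
        // a trailing null terminator, if present, is trimmed.
        fileName = Encoding.Unicode
            .GetString(blob, FixedHeaderBytes, start - FixedHeaderBytes)
            .TrimEnd('\0');

        byte[] payload = new byte[(int)fileSize];
        Buffer.BlockCopy(blob, start, payload, 0, payload.Length);
        return payload;
    }
}
```

For the 58-byte header observed earlier, this would leave 58 - 24 = 34 bytes for the file name buffer. Comparing that figure with the value stored at offsets 20 through 23 is a quick way to find out which unit the field uses.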