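Our product manager reported that the program frequently stopped responding. I had a dump file created on his machine, brought it back, and went through it with WinDbg. It left me with quite a few thoughts.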
Debugging process
Loading symbols
```
>!analyze -v
wow64cpu!CpupSyscallStub+0x9:
```
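The stop is in wow64cpu, which means this is a 32-bit process running under WOW64 on 64-bit Windows, and the dump is showing its 64-bit view. Load the WOW64 extension and switch to the 32-bit context: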
```
>.load wow64exts
>!sw
>!analyze -v
STACK_TEXT:
002294a8 76a43d3c 00000000 002294ec 396663fe ntdll_771d0000!ZwDelayExecution+0x15
00229510 5169f801 00000001 00000000 00035086 KERNELBASE!SleepEx+0x65
00229528 519377cd 0dd20834 0dd20830 00229650 clr!EESleepEx+0x4f
00229538 517afa11 00000000 39676e87 025cfb70 clr!__DangerousSwitchToThread+0x72
00229650 517afad5 00390da0 39676e37 00229700 clr!ThreadNative::StartInner+0x2c1
002296e0 79946049 00229700 00000000 025cfb50 clr!ThreadNative::Start+0x6a
002296f8 79945fe4 00000001 00229738 0802a1c7 mscorlib_ni!System.Threading.Thread.Start(System.Threading.StackCrawlMark ByRef)+0x61
00229704 0802a1c7 023dbe24 025cfb40 00000000 mscorlib_ni!System.Threading.Thread.Start()+0x18
```
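This doesn't look like the scene of the crime: the thread is only sleeping inside `Thread.Start()` (clr!ThreadNative::StartInner → KERNELBASE!SleepEx), apparently waiting for a newly created thread to get going. Let's look at the other threads.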
```
>~*kb
// some output omitted
......
 417  Id: 1480.3cfc Suspend: 0 Teb: 7ea61000 Unfrozen
00 3849f5dc 7721eb4e 00000250 00000000 00000000 ntdll_771d0000!NtWaitForSingleObject+0x15
01 3849f640 7721ea32 00000000 00000000 00000000 ntdll_771d0000!RtlpWaitOnCriticalSection+0x13e
02 3849f668 77209aa9 772d20c0 4f52bc5c 7ea63000 ntdll_771d0000!RtlEnterCriticalSection+0x150
03 3849f6fc 7720984c 3849f76c 4f52bde8 00000000 ntdll_771d0000!LdrpInitializeThread+0xc6
04 3849f748 77209879 3849f76c 771d0000 00000000 ntdll_771d0000!_LdrpInitialize+0x1ad
05 3849f758 00000000 3849f76c 771d0000 00000000 ntdll_771d0000!LdrInitializeThunk+0x10
```
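There are 417 threads, and most of their stacks look like this one: blocked in `LdrpInitializeThread`, waiting to enter the critical section at `772d20c0`. It looks like a deadlock is keeping these threads from ever getting through initialization.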
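Eyeballing hundreds of stacks is tedious. Below is a rough Python sketch, assuming the `~*kb` text has been saved from the debugger (for example with `.logopen`), that groups threads by the critical section they are blocked on. The helper names are hypothetical, and the approach relies on `kb`'s "Args to Child" columns, which are only a heuristic on x86; for `RtlEnterCriticalSection` the first argument is the critical section pointer.

```python
from __future__ import annotations

import re
from collections import Counter

# A `~*kb` thread header looks like " 417  Id: 1480.3cfc Suspend: 0 Teb: 7ea61000 Unfrozen";
# the current thread may be prefixed with "." and the faulting one with "#".
THREAD_HEADER = re.compile(r"^\s*[.#]?\s*(\d+)\s+Id:\s+[0-9a-f]+\.[0-9a-f]+", re.IGNORECASE)


def split_threads(kb_output: str) -> dict[int, list[str]]:
    """Split raw `~*kb` text into {debugger thread number: stack lines}."""
    threads: dict[int, list[str]] = {}
    current = None
    for line in kb_output.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
        elif current is not None and line.strip():
            threads[current].append(line.strip())
    return threads


def waiting_lock(stack: list[str]) -> str | None:
    """Return the first argument of the topmost RtlEnterCriticalSection frame,
    i.e. the critical section this thread is blocked on, or None."""
    for line in stack:
        tokens = line.split()
        for i, token in enumerate(tokens):
            # The three "Args to Child" columns sit right before the symbol.
            if "!RtlEnterCriticalSection" in token and i >= 3:
                return tokens[i - 3].lower()
    return None


def waiters_by_lock(kb_output: str) -> Counter:
    """Count how many threads are blocked on each critical section."""
    return Counter(
        lock
        for lock in map(waiting_lock, split_threads(kb_output).values())
        if lock is not None
    )


# Abridged sample: thread 417's frames come from the dump; thread 0 is made up for illustration.
SAMPLE = """\
  0  Id: 1480.1111 Suspend: 0 Teb: 7efdd000 Unfrozen
00 002294a8 76a43d3c 00000000 002294ec 396663fe ntdll_771d0000!ZwDelayExecution+0x15
 417  Id: 1480.3cfc Suspend: 0 Teb: 7ea61000 Unfrozen
00 3849f5dc 7721eb4e 00000250 00000000 00000000 ntdll_771d0000!NtWaitForSingleObject+0x15
01 3849f640 7721ea32 00000000 00000000 00000000 ntdll_771d0000!RtlpWaitOnCriticalSection+0x13e
02 3849f668 77209aa9 772d20c0 4f52bc5c 7ea63000 ntdll_771d0000!RtlEnterCriticalSection+0x150
03 3849f6fc 7720984c 3849f76c 4f52bde8 00000000 ntdll_771d0000!LdrpInitializeThread+0xc6
"""

if __name__ == "__main__":
    print(waiters_by_lock(SAMPLE))  # Counter({'772d20c0': 1})
```

On the full log, one lock address with a count close to the thread total is the signal worth chasing.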
```
>!locks
Scanned 9 critical sections
```
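That tells us nothing useful, so dump the critical sections in detail with `!cs` (`-l` lists only locked sections, `-o` adds the owning thread's stack, `-s` adds the initialization stack trace when one is available):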
```
>!cs -s -l -o
DebugInfo = 0x772d4380
Critical section = 0x772d20c0 (ntdll_771d0000!LdrpLoaderLock+0x0)
LOCKED
LockCount = 0x187
WaiterWoken = No
OwningThread = 0x00001a0c
RecursionCount = 0x1
LockSemaphore = 0x250
SpinCount = 0x00000000
OwningThread DbgId = ~47s
OwningThread Stack =
ChildEBP RetAddr Args to Child
184fd76c 7721eb4e 00000ed8 00000000 00000000 ntdll_771d0000!NtWaitForSingleObject+0x15 (FPO: [3,0,0])
184fd7d0 7721ea32 00000000 00000000 00000000 ntdll_771d0000!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
184fd7f8 6f882f8e 6f894060 00000000 0dd63b7c ntdll_771d0000!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
184fd820 6f882dc8 06f640c8 00000000 0dd63b78 bcrypt+0x2f8e
184fd874 72269a42 184fd894 72269a70 00000000 bcrypt+0x2dc8
184fd898 7659c167 722a49cc 184fd8c0 721aea6e msxml6!AutoInitSalt::AutoInitSalt+0x1f (FPO: [Non-Fpo]) (CONV: thiscall)
184fd8a4 721aea6e 721aea9c 721aeb90 00000001 msvcrt!_initterm+0x13 (FPO: [Non-Fpo])
184fd8c0 72191456 72190000 00000000 00000000 msxml6!_CRT_INIT+0xc3 (FPO: [Non-Fpo]) (CONV: stdcall)
184fd920 721ae3fe 72190000 00000001 00000000 msxml6!__DllMainCRTStartup+0x9e (FPO: [Non-Fpo]) (CONV: stdcall)
184fd940 77209344 72190000 00000001 00000000 msxml6!InitDllMain+0x90 (FPO: [Non-Fpo]) (CONV: stdcall)
184fd960 7720fde1 7219135c 72190000 00000001 ntdll_771d0000!LdrpCallInitRoutine+0x14
184fda54 7720ea5e 00000000 6f549168 184fdbf8 ntdll_771d0000!LdrpRunInitializeRoutines+0x26f (FPO: [Non-Fpo])
184fdbc8 7724d39f 184fdc38 184fdbf8 0dee0974 ntdll_771d0000!LdrpLoadDll+0x472 (FPO: [Non-Fpo])
184fdc04 76a42e0f 00000000 184fdc58 184fdc38 ntdll_771d0000!LdrLoadDll+0xc7 (FPO: [Non-Fpo])
184fdc4c 75d29c67 00000000 00000000 00002008 KERNELBASE!LoadLibraryExW+0x233 (FPO: [Non-Fpo])
184fdc68 75d29bea 00000000 184fdce4 00002008 ole32!LoadLibraryWithLogging+0x16 (FPO: [Non-Fpo]) (CONV: stdcall)
184fdc8c 75d29ad6 184fdce4 184fdcb0 184fdcb4 ole32!CClassCache::CDllPathEntry::LoadDll+0xaf (FPO: [Non-Fpo]) (CONV: stdcall)
184fdcbc 75d28fde 184fdce4 184fdfcc 184fdcdc ole32!CClassCache::CDllPathEntry::Create_rl+0x37 (FPO: [Non-Fpo]) (CONV: stdcall)
184fdf08 75d28eb3 00000001 184fdfcc 184fdf38 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4 (FPO: [Non-Fpo]) (CONV: thiscall)
184fdf50 75d28db9 00000001 071100e8 184fdf7c ole32!CClassCache::GetClassObjectActivator+0x224 (FPO: [Non-Fpo]) (CONV: stdcall)

${$ntdllwsym}!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
-----------------------------------------
DebugInfo = 0x070be3a8
Critical section = 0x6f894060 (bcrypt+0x14060)
LOCKED
LockCount = 0x1
WaiterWoken = No
OwningThread = 0x000021e4
RecursionCount = 0x1
LockSemaphore = 0xED8
SpinCount = 0x00000000
OwningThread DbgId = ~18s
OwningThread Stack =
ChildEBP RetAddr Args to Child
0b80e2bc 7721eb4e 00000250 00000000 00000000 ntdll_771d0000!NtWaitForSingleObject+0x15 (FPO: [3,0,0])
0b80e320 7721ea32 00000000 00000000 0b80e388 ntdll_771d0000!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
0b80e348 77200329 772d20c0 7c9ba944 6f88275c ntdll_771d0000!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
0b80e3e4 77200262 6f840000 0b80e420 00000000 ntdll_771d0000!LdrGetProcedureAddressEx+0x159 (FPO: [Non-Fpo])
0b80e400 76a41f7c 6f840000 0b80e420 00000000 ntdll_771d0000!LdrGetProcedureAddress+0x18 (FPO: [Non-Fpo])
0b80e428 6f8826e6 6f840000 6f88275c 0dd7a980 KERNELBASE!GetProcAddress+0x44 (FPO: [Non-Fpo])
0b80e43c 6f882fbc 6f840000 0dd7a980 00000000 bcrypt+0x26e6
0b80e470 6f882dc8 0dd7a980 00000000 0dd63c20 bcrypt+0x2fbc
0b80e4c4 6f8564b4 0b80e4f0 76b6b79c 00000000 bcrypt+0x2dc8
0b80e4f4 6f856445 00000002 0b80e5b4 0b80e618 bcryptprimitives!CheckSignaturePadding+0x44 (FPO: [Non-Fpo])
0b80e530 6f886380 0dd7b1b8 0b80e5b4 0b80e618 bcryptprimitives!MSCryptRsaVerifySignature+0x90 (FPO: [Non-Fpo])
0b80e574 76b8515e 0dc37230 0b80e5b4 0b80e618 bcrypt+0x6380
0b80e5a8 76b8511a 0dc37230 76b6b79c 0b80e618 crypt32!I_CryptCNGSignAndEncodeHash+0x18d (FPO: [Non-Fpo])
0b80e5e0 76b84fc5 00000001 0044b478 00000000 crypt32!I_CryptCNGVerifyEncodedSignature+0xba (FPO: [Non-Fpo])
0b80e668 76b84eb4 76b6bf8c 00000001 0044b478 crypt32!I_CryptCNGVerifyCertificateSignedContent+0x15d (FPO: [Non-Fpo])
0b80e6d4 76b65db2 00000000 00000001 00000002 crypt32!CryptVerifyCertificateSignatureEx+0x242 (FPO: [Non-Fpo])
0b80e72c 76b65b7c 0de850b0 0ddb6358 00000000 crypt32!ChainGetSubjectStatus+0x90 (FPO: [Non-Fpo])
0b80e758 76b655a3 0de850b0 00000000 0ddb6358 crypt32!CCertIssuerList::CreateElement+0x51 (FPO: [Non-Fpo])
0b80e790 76b694fc 0de850b0 0de85128 07138910 crypt32!CCertIssuerList::AddIssuer+0x87 (FPO: [Non-Fpo])
0b80e7bc 76b6605d 00000002 0de850b0 0de85128 crypt32!CChainPathObject::FindAndAddIssuersFromCacheByMatchType+0x87 (FPO: [Non-Fpo])

${$ntdllwsym}!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
```
There are many entries, but the first two are enough to crack the case.
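When the `!cs` output runs long, a small parser can pull out just the fields that matter. This is a minimal Python sketch assuming the layout shown above; `CritSec` and `parse_cs` are hypothetical names, and it reads `LockCount` as the number of waiting threads, the same way the analysis below does (so `0x187` becomes 391).

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class CritSec:
    address: int
    symbol: str
    lock_count: int = 0
    owner: int | None = None           # debugger thread number, from "~47s"
    owner_waits_on: int | None = None  # lock the owner is itself trying to enter


def parse_cs(text: str) -> list[CritSec]:
    """Parse `!cs -s -l -o` output into one CritSec per locked critical section."""
    records: list[CritSec] = []
    cur: CritSec | None = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Critical section\s*=\s*0x([0-9a-f]+)\s*\((.*)\)", line, re.I)
        if m:
            cur = CritSec(address=int(m.group(1), 16), symbol=m.group(2))
            records.append(cur)
            continue
        if cur is None:
            continue
        m = re.match(r"LockCount\s*=\s*0x([0-9a-f]+)", line, re.I)
        if m:
            cur.lock_count = int(m.group(1), 16)  # hex -> decimal, e.g. 0x187 -> 391
            continue
        m = re.match(r"OwningThread DbgId\s*=\s*~(\d+)s", line)
        if m:
            cur.owner = int(m.group(1))
            continue
        if cur.owner_waits_on is None and "!RtlEnterCriticalSection" in line:
            tokens = line.split()
            for i, token in enumerate(tokens):
                # First "Args to Child" column = the critical section being entered.
                if "!RtlEnterCriticalSection" in token and i >= 3:
                    cur.owner_waits_on = int(tokens[i - 3], 16)
                    break
    return records


# Abridged from the `!cs -s -l -o` output above.
SAMPLE = """\
DebugInfo = 0x772d4380
Critical section = 0x772d20c0 (ntdll_771d0000!LdrpLoaderLock+0x0)
LOCKED
LockCount = 0x187
OwningThread DbgId = ~47s
OwningThread Stack =
184fd7f8 6f882f8e 6f894060 00000000 0dd63b7c ntdll_771d0000!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
184fd820 6f882dc8 06f640c8 00000000 0dd63b78 bcrypt+0x2f8e
-----------------------------------------
DebugInfo = 0x070be3a8
Critical section = 0x6f894060 (bcrypt+0x14060)
LOCKED
LockCount = 0x1
OwningThread DbgId = ~18s
OwningThread Stack =
0b80e348 77200329 772d20c0 7c9ba944 6f88275c ntdll_771d0000!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
0b80e3e4 77200262 6f840000 0b80e420 00000000 ntdll_771d0000!LdrGetProcedureAddressEx+0x159 (FPO: [Non-Fpo])
"""

if __name__ == "__main__":
    for cs in parse_cs(SAMPLE):
        print(cs)
```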
Thread 18:
0b80e348 77200329 772d20c0 7c9ba944 6f88275c ntdll_771d0000!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
It owns critical section 0x6f894060 and is trying to enter critical section 0x772d20c0.
Thread 47:
184fd7f8 6f882f8e 6f894060 00000000 0dd63b7c ntdll_771d0000!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
It owns critical section 0x772d20c0 but is trying to enter 0x6f894060. Each thread is waiting for a lock the other one holds: a deadlock.
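The reasoning above is cycle detection in a wait-for graph: thread → lock it wants → thread that owns that lock. A minimal Python sketch with the two records hard-coded by hand from the dump; `HELD`, `WAITS_ON` and `find_deadlock` are illustrative names, not part of any debugger API.

```python
from __future__ import annotations

# Lock address -> debugger thread that owns it (from the two `!cs` records).
HELD = {
    0x772D20C0: 47,  # ntdll_771d0000!LdrpLoaderLock, owned by ~47s
    0x6F894060: 18,  # bcrypt+0x14060, owned by ~18s
}
# Thread -> lock it is blocked on (first argument of its RtlEnterCriticalSection frame).
WAITS_ON = {
    47: 0x6F894060,
    18: 0x772D20C0,
}


def find_deadlock(held: dict[int, int], waits_on: dict[int, int]) -> list[int] | None:
    """Walk the wait-for graph (thread -> lock it wants -> thread owning that lock)
    from every waiting thread; return the first cycle of threads found, else None."""
    for start in waits_on:
        path: list[int] = []
        position: dict[int, int] = {}  # thread -> index in path
        thread: int | None = start
        while thread is not None and thread not in position:
            position[thread] = len(path)
            path.append(thread)
            lock = waits_on.get(thread)
            thread = held.get(lock) if lock is not None else None
        if thread is not None:
            return path[position[thread]:]  # only the threads that form the cycle
    return None


if __name__ == "__main__":
    print(find_deadlock(HELD, WAITS_ON))  # [47, 18]
```

Adding one of the hundreds of threads stuck on the loader lock (say `417: 0x772D20C0`) still yields `[47, 18]`: those threads lead into the cycle but are not part of it.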
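Back in the record owned by thread 47, LockCount = 0x187, i.e. 391 threads are waiting on this lock, which roughly matches the total thread count. So the hang is almost certainly caused by critical section 0x772d20c0, the loader lock, never being released. Looking at the two stacks more closely, both threads are inside bcrypt. Thread 47 is running msxml6's DllMain (COM activation in ole32 called LoadLibraryExW), so it holds the loader lock, and msxml6's `AutoInitSalt` initializer then calls into bcrypt, which needs bcrypt's lock. Thread 18 already holds bcrypt's lock while verifying a certificate signature for crypt32, and calls `GetProcAddress`, which needs the loader lock. The two threads take the same pair of locks in opposite order, so it's fairly safe to conclude that Microsoft's bcrypt can deadlock here.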
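To make the lock-order inversion concrete, here is a minimal Python model of it. Plain `threading.Lock` objects stand in for the two Win32 critical sections; the barriers force the interleaving seen in the dump, and the acquire timeout exists only so the demo terminates instead of hanging like the real process. `hold_then_try` and `reproduce_abba` are hypothetical names.

```python
import threading


def hold_then_try(first, second, start, done, results, name):
    """Take `first`, then try to take `second` while still holding `first`."""
    with first:
        start.wait()                       # both threads now hold their first lock
        got = second.acquire(timeout=0.5)  # a real critical section would keep waiting
        if got:
            second.release()
        results[name] = got
        done.wait()                        # keep holding `first` until both have tried


def reproduce_abba():
    loader_lock = threading.Lock()  # stand-in for ntdll!LdrpLoaderLock (0x772d20c0)
    bcrypt_lock = threading.Lock()  # stand-in for bcrypt+0x14060 (0x6f894060)
    start, done = threading.Barrier(2), threading.Barrier(2)
    results = {}
    workers = [
        # ~47s: msxml6 DllMain runs under the loader lock, then calls into bcrypt.
        threading.Thread(target=hold_then_try,
                         args=(loader_lock, bcrypt_lock, start, done, results, "~47s")),
        # ~18s: bcrypt holds its own lock, then GetProcAddress needs the loader lock.
        threading.Thread(target=hold_then_try,
                         args=(bcrypt_lock, loader_lock, start, done, results, "~18s")),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(sorted(reproduce_abba().items()))  # [('~18s', False), ('~47s', False)]
```

Neither thread ever gets its second lock. Without the barriers the bad interleaving only happens when thread 18 enters the bcrypt lock just before thread 47 reaches it, which is why a hang like this is timing-dependent.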
A quick search online turned up someone who had hit the same problem:
https://social.technet.microsoft.com/Forums/Lync/en-US/dee65a4a-ed42-426b-8540-427d2875154f/excel-365-may-experience-a-deadlock-while-opening-encrypted-spreadsheet?forum=Office2016ITPro
So it wasn't a bug in our own client code, which was a relief. But a deadlock in Microsoft's crypto module still feels, well, emmm… not OK.
Afterword
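It's a bit baffling that a crypto utility module takes locks at all. We've seen this problem on Win7; there have been no reports from Win10 so far. In the end, this is a long-criticized part of .NET Framework's design: the runtime can't be separated from the OS and depends too heavily on the system itself. First, problems in a user's own environment can make the software misbehave. Second, because environments vary, some issues show up on some machines and not on others, and there is no way to change the framework. No wonder it never really took off. Fortunately Microsoft changed course in time, and with dotnet core 3 things are finally back on track.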
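WPF really is productive, but given how varied users' system environments are, as long as developers can't build and patch dotnet themselves, it has a fundamental weakness. You find the problem but can't fix it: in the end, Microsoft's confidence in its own framework will come back to hurt it. Let's see whether dotnet 5 brings any major breakthroughs.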