Tuesday, July 29, 2008
Whilst the Linux desktop (GNOME) still feels a little primitive, it certainly has a few features that Windows could learn from.
Labels: linux, technology, windows
Monday, June 30, 2008
Today I signed up for an account with a service that provides a backend I could post short messages to (commonly referred to as micro-blogging) using Emacs. It happens to be known by the name "Twitter", and here's how I got Emacs to talk to it:
;; Set these to skip the prompts; leave them nil to be asked each time.
(setq twitter-post-user nil)
(setq twitter-post-password nil)

(defun twitter-post ()
  "Prompt for a message and post it to Twitter using wget."
  (interactive)
  (let ((message (read-from-minibuffer "Message: "))
        (user (or twitter-post-user
                  (read-from-minibuffer "Username: ")))
        (password (or twitter-post-password
                      (read-passwd "Password: "))))
    ;; Quote every argument so that spaces or quotes in the message
    ;; (or password) cannot break the shell command line.
    (shell-command
     (concat "wget "
             (shell-quote-argument (concat "--user=" user)) " "
             (shell-quote-argument (concat "--password=" password)) " "
             (shell-quote-argument (concat "--post-data=status=" message)) " "
             "http://twitter.com/statuses/update.xml"))))
M-x twitter-post - twittering from Emacs!
PS: I also use Windows, a library that Emacs uses to communicate with Intel hardware.
Sunday, December 23, 2007
A couple of days back I was talking to a friend of mine about, amongst other things, the CLR and the conversation drifted to how the CLR gets loaded into a process. People looking at .NET for the first time are often perplexed by how much a managed assembly resembles a native binary from the outside and yet how the CLR magically takes control over its execution at runtime, and he was no exception. The details are interesting enough that I thought I’d make a blog post.
Every binary (EXE or DLL) designates an address in its code section as being its entry point. This is the address the operating system transfers control to when the binary is executed (EXE) or gets loaded (DLL). Although it can be overridden, the entry point for applications written in C/C++ is typically set by the linker to (w)mainCRTStartup or (w)WinMainCRTStartup, depending on the subsystem.
To check this, set a break point in main and look at the call stack of an application written in C/C++:
ChildEBP RetAddr
0012ff50 00401fbb iat!wmain
0012ffa0 77183833 iat!__tmainCRTStartup+0x15e
0012ffac 778ba9bd kernel32!BaseThreadInitThunk+0xe
0012ffec 00000000 ntdll!_RtlUserThreadStart+0x23
A similar examination for a managed binary would exemplify how the CLR gets loaded into the process. In Windbg, open a managed executable (File->Open Executable). ntdll!LdrpInitializeProcess notices that the process is being debugged and hence breaks into the debugger. Now is a good time to take note of a few things.
Here’s what Windbg says:
ModLoad: 01100000 01108000 image01100000
ModLoad: 77880000 7799e000 ntdll.dll
ModLoad: 79000000 79045000 C:\Windows\system32\mscoree.dll
ModLoad: 77140000 77218000 C:\Windows\system32\KERNEL32.dll
(123c.1754): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=00000000 ecx=001efa10 edx=778e0f34 esi=fffffffe edi=77945d14
eip=778c2ea8 esp=001efa28 ebp=001efa58 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!DbgBreakPoint:
778c2ea8 cc int 3
“image01100000” is the name Windbg has given the managed EXE that was launched, and 0x01100000 is the address at which the image got loaded. However, there is no mention of the entry point, which is stored in one of the PE headers (called the optional header). Since the binary is already mapped, we could dump its memory and decode the headers by hand to get to the entry point, but that would be extremely cumbersome when a tool can do the same thing. Enter dumpbin. Dumpbin, as its name suggests, is a tool that dumps the data contained within a PE.
Here is a partial dump of “dumpbin /HEADERS test.exe”
…
FILE HEADER VALUES
14C machine (x86)
3 number of sections
476D839B time date stamp Sun Dec 23 03:07:31 2007
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
10E characteristics
Executable
Line numbers stripped
Symbols stripped
32 bit word machine
OPTIONAL HEADER VALUES
10B magic # (PE32)
8.00 linker version
400 size of code
600 size of initialized data
0 size of uninitialized data
22EE entry point (004022EE)
2000 base of code
4000 base of data
400000 image base (00400000 to 00407FFF)
2000 section alignment
200 file alignment
4.00 operating system version
0.00 image version
4.00 subsystem version
…
Dumpbin says the entry point of this EXE is at RVA 22EE. An RVA (Relative Virtual Address), however, is not an absolute address; it is an offset from the base of the binary. Windbg’s output already indicated that 0x01100000 is the address at which the binary got loaded. Thus the entry point is 0x01100000 + 0x22EE == 0x011022EE. (Note: dumpbin lists the image’s preferred base address as 0x400000, but that is not where it got loaded. EXEs usually get loaded at their preferred base address; I believe the difference here is due to ASLR.) Now set a breakpoint at 0x011022ee and let the application execute.
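For readers who want to see the arithmetic without dumpbin, here is a rough JavaScript sketch of the same lookup: it reads the entry point RVA and preferred base from the fixed offsets of a PE32 optional header and adds the RVA to a load base. The function names are my own, and makeSampleHeader builds a hypothetical stand-in for the first bytes of test.exe (only the fields used here are filled in, and the 0x80 header offset is an assumption), so treat it as an illustration rather than a PE parser.

```javascript
// Offsets from the PE/COFF layout for 32-bit (PE32) images.
const E_LFANEW_OFFSET = 0x3c;   // DOS header field: file offset of the "PE\0\0" signature
const FILE_HEADER_SIZE = 20;    // COFF file header follows the 4-byte signature
const ENTRY_POINT_OFFSET = 16;  // AddressOfEntryPoint within the optional header
const IMAGE_BASE_OFFSET = 28;   // ImageBase within the PE32 optional header

// Read the entry point RVA and the preferred base from the start of an image.
function readPe32Header(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const peOffset = view.getUint32(E_LFANEW_OFFSET, true); // little-endian
  if (view.getUint32(peOffset, true) !== 0x00004550) {    // bytes 'P' 'E' 0 0
    throw new Error("PE signature not found");
  }
  const optionalHeader = peOffset + 4 + FILE_HEADER_SIZE;
  if (view.getUint16(optionalHeader, true) !== 0x10b) {   // magic # (PE32)
    throw new Error("not a PE32 image");
  }
  return {
    entryPointRva: view.getUint32(optionalHeader + ENTRY_POINT_OFFSET, true),
    preferredBase: view.getUint32(optionalHeader + IMAGE_BASE_OFFSET, true),
  };
}

// An RVA becomes an address only once a base address is chosen.
function rvaToAddress(base, rva) {
  return base + rva;
}

function toHex(value) {
  return "0x" + value.toString(16).padStart(8, "0");
}

// Hypothetical sample: a blank buffer carrying the values from the dumpbin listing.
function makeSampleHeader() {
  const bytes = new Uint8Array(0x200);
  const view = new DataView(bytes.buffer);
  const peOffset = 0x80; // assumed for the sample; real images vary
  view.setUint32(E_LFANEW_OFFSET, peOffset, true);
  view.setUint32(peOffset, 0x00004550, true);
  const optionalHeader = peOffset + 4 + FILE_HEADER_SIZE;
  view.setUint16(optionalHeader, 0x10b, true);
  view.setUint32(optionalHeader + ENTRY_POINT_OFFSET, 0x22ee, true);
  view.setUint32(optionalHeader + IMAGE_BASE_OFFSET, 0x400000, true);
  return bytes;
}

const header = readPe32Header(makeSampleHeader());
const actualLoadBase = 0x01100000; // from Windbg's ModLoad line

console.log(toHex(rvaToAddress(header.preferredBase, header.entryPointRva))); // 0x004022ee
console.log(toHex(rvaToAddress(actualLoadBase, header.entryPointRva)));       // 0x011022ee
```

The first line printed matches the address dumpbin shows in parentheses (computed from the preferred base); the second is the address to break on in this particular debugging session.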
When the breakpoint is hit, here’s what Windbg says:
0:000> bp 0x011022ee
0:000> g
Breakpoint 0 hit
eax=77183821 ebx=7ffdf000 ecx=00000000 edx=011022ee esi=00000000 edi=00000000
eip=011022ee esp=001efeb4 ebp=001efebc iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
image01100000+0x22ee:
011022ee ff2500201001 jmp dword ptr [image01100000+0x2000 (01102000)]
The code at 0x011022ee is a jmp instruction whose target is the address stored at 0x01102000. An indirection in a jmp/call target can indicate, amongst other things, a dynamically imported entry point. Let’s now get back to dumpbin to check if that is the case.
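To see where 0x01102000 comes from, the six instruction bytes Windbg printed (ff 25 00 20 10 01) can be decoded by hand. The JavaScript sketch below does just that for this one encoding; decodeIndirectJmp is a name I made up, and it deliberately rejects anything that is not "jmp dword ptr [disp32]", so it is nowhere near a general x86 decoder.

```javascript
// The instruction bytes at the entry point, as shown by Windbg.
const instruction = Uint8Array.from([0xff, 0x25, 0x00, 0x20, 0x10, 0x01]);

// Decode "jmp dword ptr [disp32]" and return the address of the slot
// that holds the real jump target.
function decodeIndirectJmp(bytes) {
  const opcode = bytes[0];
  const modrm = bytes[1];
  const mod = modrm >> 6;          // top two bits of the ModRM byte
  const reg = (modrm >> 3) & 0x7;  // middle three bits: opcode extension for FF
  const rm = modrm & 0x7;          // low three bits
  // FF /4 is "jmp r/m32"; mod=0 with rm=5 means a bare 32-bit displacement.
  if (opcode !== 0xff || reg !== 4 || mod !== 0 || rm !== 5) {
    throw new Error("not a jmp dword ptr [disp32] instruction");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(2, true);  // x86 stores the operand little-endian
}

const slotAddress = decodeIndirectJmp(instruction);
console.log("0x" + slotAddress.toString(16).padStart(8, "0")); // 0x01102000
```

The operand bytes 00 20 10 01 read backwards give 0x01102000, which is the load base 0x01100000 plus 0x2000.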
“dumpbin /IMPORTS test.exe” dumps the following information:
Section contains the following imports:
mscoree.dll
402000 Import Address Table
4022BC Import Name Table
0 time date stamp
0 Index of first forwarder reference
0 _CorExeMain
The listing says the “Import Address Table” resides at RVA 0x2000 (Note: the dumpbin listing shows 0x402000, the address calculated using the module’s preferred base address of 0x400000). Let’s now take a quick detour to understand how dynamic linking works in Windows, and for that purpose examine the code the compiler generates under the following circumstances:
1. Calling a function implemented in the same binary
test.c
void a(void) { }
void b(void) { a(); }
test.asm
PUBLIC _a
_TEXT SEGMENT
_a PROC
    push ebp
    mov ebp, esp
    pop ebp
    ret 0
_a ENDP
_TEXT ENDS
PUBLIC _b
_TEXT SEGMENT
_b PROC
    push ebp
    mov ebp, esp
    call _a
    pop ebp
    ret 0
_b ENDP
_TEXT ENDS
END
2. Calling a function exported by a DLL
test.c
__declspec(dllimport) void a(void);
void b(void) { a(); }
test.asm
PUBLIC _b
EXTRN __imp__a@0:PROC
_TEXT SEGMENT
_b PROC
    push ebp
    mov ebp, esp
    call DWORD PTR __imp__a@0
    pop ebp
    ret 0
_b ENDP
_TEXT ENDS
END
The call instructions in the two listings show the difference in the code that gets generated. “call _a” means call the function whose address is _a. “call DWORD PTR __imp__a@0” means call the function whose address is stored in the DWORD at address __imp__a@0. (_a and __imp__a@0 are resolved by the linker.) When calling a function exported by a DLL, the compiler puts in an indirection because it does not know where the function’s code will reside in memory.

The above diagram explains this in a little more detail. Each binary has an array of function pointers with one slot for every function it imports from other DLLs. The address of each slot is what appears as an __imp__* symbol in the assembly listing. At runtime the loader populates this array with actual function pointers as the DLLs get loaded. The “Table of function pointers” is formally called the “Import Address Table”, or simply the IAT.
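The same idea can be mimicked in a few lines of JavaScript. This is only a toy model of the mechanism described above, not how the Windows loader is implemented: makeBinary, bindImports and the fake DLL object are all invented for illustration.

```javascript
// Toy model: one table of slots per binary, one slot per imported function.
function makeBinary(importNames) {
  const iat = importNames.map(() => null); // slots are empty until "load time"
  return {
    importNames,
    iat,
    // Rough equivalent of "call DWORD PTR __imp__a": fetch the pointer
    // stored in the slot, then call whatever it points to.
    callImport(name, ...args) {
      const target = iat[importNames.indexOf(name)];
      if (typeof target !== "function") {
        throw new Error(name + " has not been bound by the loader");
      }
      return target(...args);
    },
  };
}

// The "loader" fills each slot from the exports of the DLL that provides it.
function bindImports(binary, dllExports) {
  binary.importNames.forEach((name, slot) => {
    binary.iat[slot] = dllExports[name];
  });
}

const exe = makeBinary(["a"]);
const fakeDll = { a: () => "a() in the DLL" }; // hypothetical exporter

bindImports(exe, fakeDll);
console.log(exe.callImport("a")); // a() in the DLL
```

The calling code never changes; only the contents of the slot do, which is why the compiler can emit the call before anyone knows where the DLL will end up in memory.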
Armed with this knowledge, let’s examine the jmp instruction again. The dumpbin listing confirmed that the target of the jmp instruction was the first entry in the IAT. Let’s now examine what it is.
0:000> dd 0x01102000
01102000 79003a6c 00000000 00000048 00050002
01102010 0000205c 00000238 00000001 06000001
01102020 00000000 00000000 00000000 00000000
01102030 00000000 00000000 00000000 00000000
01102040 00000000 00000000 00000000 00000000
01102050 1e2a000a 00032802 002a0a00 424a5342
01102060 00010001 00000000 0000000c 302e3276
01102070 3730352e 00003732 00050000 0000006c
Dumping the first few DWORDs starting at 0x01102000 shows us that the first DWORD (the target of the jmp instruction) is 0x79003a6c, which happens to be the address of mscoree!_CorExeMain.
mscoree!_CorExeMain:
79003a6c 55 push ebp
79003a6d 8bec mov ebp,esp
79003a6f 51 push ecx
79003a70 56 push esi
79003a71 6a00 push 0
79003a73 8d45fc lea eax,[ebp-4]
79003a76 50 push eax
79003a77 6a01 push 1
79003a79 e8ef000000 call mscoree!GetInstallation (79003b6d)
79003a7e 8bf0 mov esi,eax
79003a80 85f6 test esi,esi
79003a82 0f8c61610100 jl mscoree!_CorExeMain+0x33 (79019be9)
79003a88 68a43a0079 push offset mscoree!`string' (79003aa4)
79003a8d ff75fc push dword ptr [ebp-4]
79003a90 ff1520100079 call dword ptr [mscoree!_imp__GetProcAddress (79001020)]
79003a96 85c0 test eax,eax
79003a98 0f8444610100 je mscoree!_CorExeMain+0x2f (79019be2)
79003a9e ffd0 call eax
79003aa0 5e pop esi
79003aa1 c9 leave
79003aa2 c3 ret
79003aa3 90 nop
Thus the instruction at the entry point of the managed EXE does nothing more than jump to a function that it imports from mscoree.dll. That leads us to the question: what is mscoree.dll?
Mscoree.dll is the CLR’s “startup shim”. All it does is choose a specific version of the CLR (amongst the multiple side-by-side versions present on the machine) that will execute the assembly. This choice is influenced by several factors that are beyond the scope of this blog post. _CorExeMain hosts the CLR, which executes the managed entry point that it determines from the metadata in the assembly.
It is worth mentioning that the SSCLI implementation cannot use the same mechanism. Although the assembly file format is the same, PE is not the native executable format on every operating system the SSCLI supports (read FreeBSD), so those systems cannot simply execute the file. This necessitates “clix.exe”, which accomplishes the same thing the jmp instruction does on Windows.
Labels: clr
Friday, December 14, 2007

Meenakshi Amman temple, Madurai
After having to put up with quite a few unexpected server shutdowns, I finally got over my laziness and moved my blog to a server that (so far) appears to be relatively more stable.
Labels: website
Wednesday, November 21, 2007
Over the weekend I was investigating a few GC root regressions related to a recent change in our runtime. I wanted to check if an object was erroneously being treated as garbage and hence collected between construction and a member method call.
Here’s the code I wrote:
function Person(name)
{
    this.name = name;
}
Person.prototype.getName = function()
{
    CollectGarbage();
    return this.name;
}
(new Person("Bill")).getName();
Looks good enough for the purpose but didn’t work. When executed, this was the error I got:
---------------------------
Windows Script Host
---------------------------
Script: C:\home\kaushis\code\test\test.js
Line: 6
Char: 1
Error: 'undefined' is null or not an object
Code: 800A138F
Source: Microsoft JScript runtime error
---------------------------
OK
---------------------------
I initially assumed the GC had actually collected the object after entering the member function but before ‘this’ could be dereferenced within, which is precisely the type of bug I was after. However, the runtime diagnostic traces indicated otherwise.
My manager, who happened to notice my code, immediately pointed out what I was doing wrong. For people who haven’t guessed it yet, here’s a clue. If line 12 (the last line of the script) is replaced with
print((new Person("Bill")).getName());
it works. For people who still don’t get it here’s another clue. This is the parse tree generated out of the first snippet of code
Pretty obvious. Isn’t it?
In ECMAScript, each statement is terminated by a semicolon, written explicitly or inserted automatically. The automatic semicolon insertion rules are defined by the standard (and are probably more complex than the language’s grammar itself!). A semicolon is inserted at a line break only when the next token cannot continue the statement, and that does not apply to the Person.prototype.getName = [function_expression] case: an opening parenthesis can legally follow a function expression, so the parser treats the subsequent expression as part of the same statement, i.e.
function() { ... }( new Person("Bill") ).getName()
Now, this is valid syntax in ECMAScript that calls the function just defined with a new instance of Person and calls getName on the return value. Lesson learnt – never depend on the automatic semicolon inserter.
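Here is a small sketch that reproduces the problem outside Windows Script Host and shows the one-character fix. It reuses the names from the snippet above but leaves out CollectGarbage(), which is specific to JScript; the try/catch is only there so that both variants can run in one script.

```javascript
function Person(name) {
  this.name = name;
}

let failure = null;
try {
  // No semicolon after the closing brace: the parser keeps reading, so the
  // parenthesised expression on the next line becomes an argument list and
  // the function expression is called immediately instead of being assigned.
  Person.prototype.getName = function () {
    return this.name;
  }
  (new Person("Bill")).getName();
} catch (e) {
  failure = e;
}
console.log(failure instanceof TypeError); // true

// With an explicit semicolon the two lines are separate statements again.
Person.prototype.getName = function () {
  return this.name;
};
const fixedResult = (new Person("Bill")).getName();
console.log(fixedResult); // Bill
```

In the broken variant the function runs without a Person as its receiver, so the chain fails with a TypeError before the assignment to Person.prototype.getName ever takes place.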
Labels: jscript
Sunday, November 04, 2007
Azhagar Kovil, Madurai
Labels: photos
Sunday, February 11, 2007
How many times have you wondered how to get rid of Emacs' cluttered mess of ~ files? I have never liked my directories littered with them. On the other hand, these files have saved me a couple of times, so I wanted the feature but didn't want to see the ~ files. To get rid of them, my makefiles always had a "del *~" under the "clean" rule, until I discovered a convenient alternative. Paste this snippet of Elisp code into your .emacs file to have Emacs save tilde files to an alternate directory that you can clean periodically :-)
;; save backup (~) files in one directory instead of next to each file;
;; create ~/.emacs.backups beforehand
(defun make-backup-file-name (file)
  (concat "~/.emacs.backups/" (file-name-nondirectory file) "~"))
Labels: emacs
The views expressed on this website and weblog are mine and do not necessarily reflect the views of my employer.
Last updated: 5:03 PM 2/10/2007