Thursday, June 6, 2013

Dumping Tomcat packets using tshark

Tshark (the command-line version of Wireshark) is a wonderful tool for dumping packets. I recently used it on my Mac because I couldn't easily get Tomcat to log the HTTP packets coming in on port 8080. Having used it in the past for lots of other reasons, I felt compelled to find a generic solution to this class of problem, where you would otherwise have to rely on application-level logging to figure out why something works or doesn't.

Here is the command I used (lo0 is the loopback interface, since I was running both the client and the server on the same machine):

tshark -f "tcp port 8080" -i lo0 -V

Here is a very good page on tshark that I am sure I will come back to again and again to get more juice out of this tool.
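
If all you care about is the HTTP layer, a variant like the following should trim the output. The -d ("decode as") flag forces traffic on port 8080 to be dissected as HTTP; treat this as a sketch, since flag behavior can vary between tshark versions.

tshark -i lo0 -f "tcp port 8080" -d tcp.port==8080,http -V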

Getting Postman to test Restlet

Postman is a wonderful Chrome application for testing REST method calls. However, it was a bit hard to get it to work for testing REST calls against my Restlet server. Here are the gotchas I faced; I haven't figured them all out completely yet, but I love to record them so I can come back much later and they are still here ;)

1. Postman does not add the Content-Type header by itself. If you select an HTTP method that allows a body (e.g. POST), it lets you create the body and select the format (JSON, XML, etc.), but you must remember to add the Content-Type header yourself.
2. If your application requires authentication, you can add that too. Postman supports basic and digest authentication as well as OAuth (which I would love to test out next).
3. The biggest problem I ran into was when I sent an XML or JSON body and the Restlet server replied with 415 - Unsupported Media Type. The request never even reached my application code! If you write a client using the Restlet framework, choose a media type of MediaType.APPLICATION_XML, and the server-side method is annotated with @Post("txt:xml"), it works. However, when you set the Content-Type header in Postman to application/xml, it does not work. To debug this further, I installed Wireshark and dumped the packet contents. I was surprised to find that the client built using the Restlet framework was actually sending a Content-Type header of text/plain. This had to be some issue on my end. Interestingly, once I made the corresponding change in Postman, its requests started to work as well. These are the two headers I inserted. Note that a Content-Type of text/plain alone still does not work; you must indicate the charset as well to make it work.

Content-Type: text/plain; charset=UTF-8
Accept: application/xml
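
For reference, here is a minimal sketch of the kind of server involved, written against the Restlet 2.x Java API from Scala. The class names, the /echo path, and the port are made up for illustration; the @Post("txt:xml") annotation is the piece from the story above (accept a text entity, answer with XML).

import org.restlet.Component
import org.restlet.data.Protocol
import org.restlet.resource.{Post, ServerResource}

// "txt:xml": accept a text/plain entity, respond with application/xml
class EchoResource extends ServerResource {
  @Post("txt:xml")
  def accept(body: String): String = "<echo>" + body + "</echo>"
}

object EchoServer extends App {
  val component = new Component
  component.getServers.add(Protocol.HTTP, 8080)
  component.getDefaultHost.attach("/echo", classOf[EchoResource])
  component.start()
}

With this running, a Postman POST to http://localhost:8080/echo carrying the two headers above should reach the accept method instead of bouncing with a 415.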


Friday, May 24, 2013

Comparing Oracle NoSQL and MongoDB features

I recently had a chance to go through Oracle's documentation on its NoSQL solution in an effort to evaluate it against MongoDB. The comments below are based on Oracle's 11g Release 2 documentation (dated 4/23/2013), while MongoDB is at release 2.4.3.

Oracle's deployment architecture for its NoSQL solution bears a strong resemblance to MongoDB's, with some restrictions that I expect will go away in subsequent releases. The whole system is deployed as a pre-decided number of replica sets, where each replica set may have a master and several slaves. The durability and consistency model also seems similar to what MongoDB offers, although the explanation and the controls seemed a lot more complex. One of the restrictions is that the user must decide up front how many partitions the whole system will have; these partitions are then allocated among the shards. MongoDB's concept of "chunks" is similar but easier to use and understand.

One of the biggest issues is security: the system has almost no security at the user-access level, and there is no command-line shell to interact with it. The only API available is the Java one. This is clearly not developer friendly right now.

Perhaps the most confusing part was the concept of JSON schemas used to save and retrieve values in the key-value database. Every time you save or retrieve a value you have to specify a schema to serialize and de-serialize the data, and these schemas may have different versions that you need to track. Multiple schemas can be in use concurrently (e.g., each table could be using its own). What was confusing was why Oracle took this approach at all, and even granting that, why it was not hidden under the hood so users don't have to deal with it. The boilerplate code that has to be written to constantly handle this is simply unreadable, and no developer would find this system fun to use.

I also noticed the absence of ad-hoc indexing in the system, something I have begun to appreciate in MongoDB.

Another odd feature: an update made to a set of records sharing the same "major" key can be executed as a transaction. This is because records with the same major key always go to the same partition, which lives on a single replica set and hence on the one physical node currently acting as master. This is one more thing the developer must consider carefully before designing the schema.
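
To make the major-key point concrete, here is a sketch of such a grouped update using the oracle.kv Java client from Scala. The store name, host, and keys are made up, and the API details are from my reading of the documentation, so treat this as illustrative rather than definitive.

import java.util.Arrays.asList
import oracle.kv.{Key, KVStoreConfig, KVStoreFactory, Value}

object MajorKeyTxn extends App {
  // Assumes a store named "kvstore" with a helper host at localhost:5000
  val store = KVStoreFactory.getStore(new KVStoreConfig("kvstore", "localhost:5000"))
  val of = store.getOperationFactory

  // Both keys share the major path ("users", "alice"), so they land on the
  // same partition (and hence the same master) and can run as one atomic batch.
  val ops = asList(
    of.createPut(Key.createKey(asList("users", "alice"), asList("email")),
                 Value.createValue("alice@example.com".getBytes("UTF-8"))),
    of.createPut(Key.createKey(asList("users", "alice"), asList("phone")),
                 Value.createValue("555-0100".getBytes("UTF-8"))))
  store.execute(ops) // all-or-nothing; rejected if the major paths differed
  store.close()
}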

I found MongoDB to be much more mature in terms of a truly flexible schema: the user sees records as full JSON documents, where each record in the same collection can potentially have different fields. In Oracle, if a field is in the schema but missing from a record, it must have a default value (very much like a relational system).

What did seem well thought through was the deployment architecture and the durability and consistency model, and how the user controls them on a per-operation basis. I am not aware of any big deployments using Oracle NoSQL yet, so it would be good to hear if there are any. I am also expecting a big revamp in the next release from Oracle so that it becomes easier to use from a development standpoint.

Tuesday, April 9, 2013

AD Authentication with SVN

Frequently with SVN, you will want to integrate with Active Directory so that users can use their Windows login and password with SVN. Here is how you do it:


The Apache module to use in this case is mod_auth_sspi. Once you have enabled it, set up the SSPI configuration section in subversion.conf. SSPIDomain should be set to the name of the domain you want to authenticate against.
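
As a sketch, the relevant configuration might look like the following. The directive names come from mod_auth_sspi; the location, repository path, and domain name are placeholders you would adjust.

LoadModule sspi_auth_module modules/mod_auth_sspi.so

<Location /svn>
  DAV svn
  SVNParentPath C:/Repositories
  AuthName "Subversion"
  AuthType SSPI
  SSPIAuth On
  SSPIAuthoritative On
  SSPIDomain MYDOMAIN
  SSPIOfferBasic On
  SSPIOmitDomain On
  Require valid-user
</Location>

SSPIOfferBasic lets non-Windows clients fall back to basic authentication, and SSPIOmitDomain is what allows users to type just the account name without the domain prefix.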

When a user logs into SVN, the user ID they type into the SVN authentication prompt is the Windows account name (without the domain) and the password is the domain password.

This may not be enough for you if you are using Bugzilla integration with SVN via SCMBug. Bugzilla has its own AD integration (which I personally have not used) with plenty of documentation around it. If you use it, just change the SCMBug configuration file to pass the user ID through from SVN to Bugzilla. I think that should work, but if you find some tweaking is necessary in SCMBug, please post it back in the comments section for this post. I will definitely appreciate it in my next gig!

Wednesday, April 3, 2013

Interesting case for var -> val conversion in Scala

While Scala supports both the imperative and the functional style of programming, it recommends the latter. One of the challenges I faced was getting rid of a counter in the following BFS (breadth-first) style recursion, a mutable-to-immutable variable conversion. I won't go into specifics but rather look at the pattern to be used here:

def BFSCounter(money: Int, coins: List[Int]): Int = {
  var n = 0                // mutable counter
  // termination checks
  for (...) {              // loop dependent on the input parameters
    n += BFSCounter(...)   // one recursive call per iteration
  }
  n
}

The above code launches multiple recursive calls in each invocation of BFSCounter, and the variable n, which maintains the count, must be incremented and returned.

The strategy for getting rid of the mutable variable is to pass the counter from one call of BFSCounter to the next. To do that, we must also make the number of calls to BFSCounter constant (independent of variables or input parameters). This requires understanding the traversal pattern, the termination checks, and so on. The final code looks like the following:

def BFSCounter2(money: Int, coins: List[Int]): Int = {
  // termination checks
  val count1 = BFSCounter2(... 0 ...)   // first call, counter seeded with 0
  BFSCounter2(... count1 ...)           // next call takes over the running count
}

Note that you may be able to reduce the code to just one call to BFSCounter2, or you may need three. The point is that the number of calls should be finite, and that the counter is passed into BFSCounter2 and passed back out as the return value. The last call's return value is the final return value of the function.
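
To make the pattern concrete, here is a hypothetical instance: the classic coin-change counting problem that the (money, coins) signature suggests. The first version accumulates into a var; the second threads the counter through the recursive calls exactly as described above.

// Mutable version: accumulate into a var
def countChange(money: Int, coins: List[Int]): Int = {
  if (money == 0) 1
  else if (money < 0 || coins.isEmpty) 0
  else {
    var n = 0
    n += countChange(money - coins.head, coins) // branch 1: use the first coin
    n += countChange(money, coins.tail)         // branch 2: skip the first coin
    n
  }
}

// Immutable version: the counter goes in as 'acc' and comes back out
def countChange2(money: Int, coins: List[Int], acc: Int = 0): Int = {
  if (money == 0) acc + 1
  else if (money < 0 || coins.isEmpty) acc
  else countChange2(money, coins.tail,
                    countChange2(money - coins.head, coins, acc))
}

Both return 3 for money = 4 and coins = List(1, 2): the combinations are 1+1+1+1, 1+1+2 and 2+2.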

Scala: Parameter-less functions

Scala has a few tripping points for newbies, and one of them is parameter-less functions, which look just like variable or field names. I came across a very interesting example when researching the difference between def and val in Scala:

Scala has three ways to define a variable, method name or object:
  • def defines a method
  • val defines a fixed value (which cannot be modified - these are like Java final variables)
  • var defines a variable (which can be modified later). These are discouraged in Scala because they are mutable and indicate an imperative style of programming.
Consider the following example, which may fool you (from this question on Stack Overflow):

class Person(val name: String, var age: Int)
def person = new Person("Kumar", 12)
person.age = 20
println(person.age)

These lines of code give the output 12. ALWAYS. The reason is that def person creates a method definition, not an object. Each time person appears in the code, it refers to a brand-new Person object created by the person method; this happens in the assignment as well as in the println call!
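
For contrast, declaring person with val evaluates the right-hand side exactly once, so the mutation sticks:

val person = new Person("Kumar", 12) // a single object, created once
person.age = 20
println(person.age)                  // prints 20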


Another example, from "Programming in Scala, 2nd Edition": if a function literal consists of one statement that takes a single argument, you need not explicitly name and specify the argument. Thus the following code works, and println looks like the name of a variable in this case:

args.foreach(println)

Scala: Call by Name

Functional languages typically implement the call-by-name evaluation strategy natively. There are some special use cases where such an evaluation strategy can boost performance, and one of the examples was given here.

The case in consideration is how log statements can be made more efficient while keeping the code clean. Typically, the log level determines whether a log statement actually does anything.

For example, let's say you wanted to put the following log statement in the code:

logger.info("ok" + "to" + "concatenate" + "string" + "to" + "log" +"message")

If the check of the log level is done inside the log method, then by the time the check happens the string concatenation has already been performed, which hurts performance. So the solution is to wrap the call in a check of the log level:

if (logger.isEnabledFor(Logger.INFO)) {
    // Ok to log now.
    logger.info("ok" + "to" + "concatenate" + "string" + "to" + "log" +"message");
}

In a language like Scala, this is achieved by using a call-by-name argument to the logger method. A call-by-name argument is evaluated (pretty much like a parameter-less function call) only when needed. It is different from lazy evaluation in that it is re-evaluated each time it is needed (call it lazy with no memory!).

Here is the code quoted from the above article:

def log(level: Level, message: => String) = if (logger.level.intValue >= level.intValue) logger.log(level, message)
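
And here is a small self-contained sketch (the names are mine) that you can paste into the REPL to watch the by-name parameter do its job:

object Log {
  var enabled = false
  // 'message' is by-name: the argument expression runs only if it is used
  def log(message: => String): Unit =
    if (enabled) println(message)
}

def expensive(): String = {
  println("building message...")
  "details"
}

Log.log(expensive()) // prints nothing: expensive() is never evaluated
Log.enabled = true
Log.log(expensive()) // prints "building message..." and then "details"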

Another excellent example can be seen here.


Monday, March 25, 2013

Design Pattern Reference

Here are some of the best articles on design patterns on the internet. Once you review the examples and discussions, it's best to do some comparisons next.

  1. Facade
  2. Flyweight
  3. Adapter
  4. Chain of Responsibility
  5. Singleton
  6. Observer
  7. Composite
  8. Command
  9. Strategy
  10. Proxy
  11. Decorator
  12. Broker
  13. Builder
  14. Dependency Injection
  15. State
  16. Bridge
  17. Interface
  18. Prototype
Interesting comparisons to do:

Facade, Adapter: Facade works on an entire subsystem (multiple classes) to make it simpler to use, while Adapter is not so much about simplicity as about mapping one set of calls to another.
Proxy, Decorator: the Proxy pattern binds the class being proxied to the proxy class at compile time; Decorator does this at runtime. Decorator typically intends to add or remove functionality, while Proxy typically exposes the full functionality. A proxy often uses lazy instantiation for the class being proxied, whereas a decorator typically receives the wrapped object in its constructor (see the sketch after this list).
Chain of Responsibility, Decorator: think linked lists v/s wrappers. A linked list can stop the flow at any time without visiting all elements; in addition, wrappers implement both pre- and post-processing.
Decorator, Inheritance: if each operation implemented by a decorator were instead implemented as a subclass, you would have to create a combination subclass for every combination a decorator sequence can produce. Think m+n v/s m times n for the total number of classes.
Strategy, Dependency Injection: DI is a refinement of the Strategy pattern.
Strategy, Command: the Strategy pattern can be used when implementations of a task vary among different classes (which implement the same interface). Command, on the other hand, is much simpler; it makes sense when the caller does not want to get involved in the details of how a handler implements a certain command.
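
Here is a small Scala sketch of the Proxy/Decorator distinction (the type names are made up). The decorator wraps whatever DataSource it is handed at runtime, while the proxy is tied to the concrete class and instantiates it lazily:

trait DataSource { def read(): String }

class FileSource extends DataSource {
  def read(): String = "raw-data"
}

// Decorator: receives any DataSource at runtime and adds behavior around it
class CompressingSource(inner: DataSource) extends DataSource {
  def read(): String = "uncompressed(" + inner.read() + ")"
}

// Proxy: bound to FileSource itself; the real object is created on first use
class FileSourceProxy extends DataSource {
  private lazy val inner = new FileSource
  def read(): String = inner.read()
}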

UML Refresher

Here is a quick UML refresher for class diagrams.

  1. Inheritance is indicated by a solid line with a closed, unfilled arrowhead pointing at the superclass.
  2. Interface implementation is indicated by a dotted line with a closed, unfilled arrowhead pointing at the interface.
  3. A bi-directional association is indicated by a solid line between the two classes. At either end of the line, you place a role name and a multiplicity value.
  4. A uni-directional association is drawn as a solid line with an open arrowhead pointing to the known class.  In a uni-directional association, two classes are related, but only one class knows that the relationship exists.
  5. A basic aggregation relationship indicates that one class is a part of another class. In an aggregation relationship, the child class instance can outlive its parent class. To represent an aggregation relationship, you draw a solid line from the parent class to the part class, and draw an unfilled diamond shape on the parent class's association end.
  6. A composition aggregation relationship is just another form of the aggregation relationship, but the child class's instance lifecycle is dependent on the parent class's instance lifecycle. The composition relationship is drawn like the aggregation relationship, but this time the diamond shape is filled.
  7. Attributes/operations visibility: +/public, #/protected, -/private, ~/package.
Reference

Friday, March 1, 2013

Mapping IIS worker processes to pools

Once in a while you have to diagnose memory or performance issues with .NET applications. The first indication usually comes from the Windows Task Manager, which may show a process hogging CPU or memory, or users may complain of excessive slowness. In my particular case, I recently examined an IIS6 worker process that was spinning at 20% CPU and using 1 GB of memory on a 32-bit Windows 2003 Server. Combined with user complaints that the system was extremely slow and unresponsive, it was something to look into. The investigation is still ongoing, but I am documenting the steps taken so far.

Step #1: Find out which application or applications are causing the problem. Using the process ID from Task Manager, you can use one of the following methods, depending on whether this is IIS6 or IIS7:
  1. On IIS7, the IIS Manager gives an option to locate running worker processes, the pools they belong to, and the requests they are currently processing, right from the GUI.
  2. On IIS6, this can get a little tricky. A little VBScript can be of help: IISApp.vbs. Since it is typically found in the system folders, you can pretty much run it directly from the command prompt (example below). Note that this script won't be found on IIS7 installs.
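
On a typical Windows 2003 Server the script can be invoked like this; it lists each running w3wp.exe process ID alongside the application pool it serves, which lets you map the PID from Task Manager back to a pool:

cscript %SystemRoot%\system32\iisapp.vbs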

Step #2: The next step typically is to start tracking performance counters and see which ones may indicate problems. From the command prompt, type "perfmon". Using perfmon, you can turn on different performance counters. The slowness can be due to a variety of reasons: a queue that is getting very long (a processor or network queue, for example), too many context switches, too many interrupts, or ongoing .NET garbage collection. The important point is to pick a place to start and look for patterns. For example, I noticed that the processor queue length was relatively high and that context switches were very high. The next step, it seemed to me, was to try to isolate the process threads that were switching in and out and find out what was causing them to block so that the OS would switch them out.

Perfmon shows multiple instances of the same process with suffixes like "#1", "#2", etc. This is not very helpful, which brings us to Step #3:

Step #3: The following article from Microsoft describes how to make perfmon show process IDs instead:

http://support.microsoft.com/kb/281884

Monday, February 18, 2013

Scripting on Linux

One of the best user-contributed collections on Unix scripting is available at The Linux Documentation Project. It has lots of illustrative and sometimes idiosyncratic examples that make it better suited to learning than any other online site.

Friday, February 8, 2013

Inno Setup: Creating a scheduled Log Rotator

Setting up a rotation mechanism for log files on a Windows machine can be an interesting problem. This would be really easy on Linux using cron, but on Windows you may find a tool called waRmZip (download it from SourceForge or look at http://winadmin.forret.com/) very useful. It works very well, and you can set up a scheduler job to fire it at a predetermined time. To do this from Inno Setup, use the code below:

function GetDoLogRotateTaskAddParams(Default: String): String;
begin
  // /RU "" makes the scheduled task run under the built-in SYSTEM account
  result := '/Create /RL HIGHEST /F /TN "MODULE Log Rotate" /SC DAILY /ST 23:59 /RU "" /TR "' + ExpandConstant('{sys}\cscript.exe \"{#InstallHomeDir}\Tools\waRmZip.wsf\" C:\Logs /gt:1MB /ma:1 /md:\"C:\Logs\Backup-$DAY\" /q');
  // LogDeleteTime is a global set elsewhere in the script
  if Length(LogDeleteTime) > 0 then begin
    result := result + ' /da:' + LogDeleteTime;
  end;
  result := result + '"';
end;

This assumes that waRmZip has been installed using the following "Files" entry:

Source: waRmZip.wsf; DestDir: {#InstallHomeDir}\Tools;

The scheduled task can then be created using the following entry in the "Run" section of the Inno Setup script:

Filename: "schtasks.exe"; Parameters: "{code:GetDoLogRotateTaskAddParams}"; Description: " Log Rotate Task"; Flags: runhidden; Check: GetDoLogRotateTask;

Automating FTP using WinSCP scripting

The tool WinSCP can be used for scripting FTP-related tasks, which is especially useful for automating FTP steps in an installer. I am posting some sample code here which may be useful. The entire set of scripting capabilities is documented on the WinSCP website.

option batch on
option confirm off
open ftp://<user>:<password>@<host>
cd <remote directory>
put <file>
close
bye

This script can then be invoked using winscp.com as below:

winscp.com /console /script=<script file>

Inno Setup: Creating a Windows Scheduled Task

On some occasions, you may need to create a scheduled task in Windows using Inno Setup. Here is an easy way of doing this with the command "schtasks.exe". Put the following in the "Run" section of the script:

Filename: "schtasks.exe"; Parameters: "{code:GetDoModuleTaskAddParams}"; Description: "Module Job Task"; Flags: runhidden; Check: GetDoModuleTask;

function GetDoModuleTaskAddParams(Default: String): String;
begin
  // /RU "" makes the scheduled task run under the built-in SYSTEM account
  result := '/Create /RL HIGHEST /F /TN "Module Job" /SC DAILY /ST 23:59 /RU "" /TR "' + ModuleSubDir + '\' + ModuleFileName;
  result := result + '"';
end;

Thursday, February 7, 2013

Inno Setup: Deleting files during install

Deleting files selectively can be a daunting task with Inno Setup, especially if the required sequence is: back up the files, then delete them before the new ones are copied on top. In my previous posts I described how to take a backup of files; here are some details on how to delete files from a particular folder or, for that matter, based on a pattern. This may be necessary on several occasions:
  1. Some versions of Windows can be buggy: you copy new DLLs over existing ones and the system doesn't notice and keeps using the old DLLs. I observed this especially on some Windows 2003 Servers when the number of arguments of a function defined inside a DLL changed and a new DLL was copied over. This kind of thing happens rarely, but when it strikes you can spend hours star gazing. The solution is to delete the DLLs and copy the new ones over.
  2. The new files you are copying may be fewer than the old ones, and you don't want old junk left in the folder.
Here it goes:

Step #1: Add a function call in the install directives. I will explain the arguments, and the reason for the complexity, in a bit...

Check:DeleteFiles(ExpandConstant('{app}\\bin\*'), ExpandConstant('{#InstallHomeDir}\UnInstall\{#MyAppVersion}\Websites\\bin'));


Step #2: Create a function to delete the files. 

var
  DirArray: array of String;  // remembers patterns already deleted; persists across Check calls

function DeleteFiles(pattern, dirToCheck: String): Boolean;
var
  i, alen: Integer;
begin
  // Run ONLY if the backup folder exists (i.e. the backup has already happened)
  if not DirExists(dirToCheck) then begin
    result := true;
    exit;
  end;

  // If this pattern has already been deleted once, do nothing
  alen := GetArrayLength(DirArray);
  for i := 0 to alen - 1 do begin
    if CompareText(pattern, DirArray[i]) = 0 then begin
      result := true;
      exit;
    end;
  end;
  //MsgBox(pattern, mbError, MB_OK);
  DelTree(pattern, false, true, false);
  SetArrayLength(DirArray, alen + 1);
  DirArray[alen] := pattern;  // remember the pattern
  result := true;
end;

Importance of the second argument: by now you should know that Inno Setup calls all the Check functions BEFORE it processes any File directives. This means that if you are making backups, the Check function above could delete the files BEFORE the backup ever happens. The second argument tells the function where to look to see whether the files have already been backed up before they are deleted: if the backup folder exists, the files must have been copied (you can extend that check any way you want). The array used in the delete function is also important. The delete must be done ONLY once, before the code is installed; the function may be called again by Inno Setup, and if the delete kicks in again, you can kiss your installed files goodbye!