Overview of CCM to Splunk Conversion Script

Posted by David Veuve - 2010-10-05 10:34:28
Probably the hardest working component of my Splunk CDR project is the Perl script that converts CCM logs to Splunk logs. In addition to just doing the base conversion of CCM CSV format logs to Splunk-readable logs, this script adds a significant amount of additional detail to the log files to make reporting easier.
Its core functionalities are as follows:
  • Read and Remove CCM logs

    CCM Logs will end up being a long list of files in a directory, where SFTP drops them. After processing, the script will either leave the files in place (useful when testing, where you regenerate your /var/log file 50 times per day), move them to another place (if you want to retain the originals), or delete them altogether.

  • Convert CSV logs

    Line one of the CCM log format contains a list of the 70+ fields used in the file. The data types for those fields are in line 2, and then lines 3+ are log entries for each phone call that has completed since the last log file was created (the interval of log file creation is set in CCM Serviceability). The Perl script takes in the first line, parses out the field order, then ultimately prints each entry out in KV format (callingPartyNumber=5555551234).
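To make the conversion step concrete, here is a minimal sketch in Python (not the Perl the script is actually written in). It reads the header line for field order, skips the data-type line, and emits one key=value line per call. The function name cdr_to_kv, the data-type strings, and the sample values are assumptions for illustration only.

```python
import csv
import io

def cdr_to_kv(lines):
    """Convert CCM-style CSV lines into key=value strings.

    Line 1 holds field names, line 2 holds data types (ignored here),
    and every later line is one call record.
    """
    reader = csv.reader(lines)
    fields = next(reader)   # line 1: field order
    next(reader)            # line 2: data types, skipped
    out = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        pairs = []
        for name, value in zip(fields, row):
            # Quote values containing spaces so they stay one token.
            if " " in value:
                value = '"%s"' % value
            pairs.append("%s=%s" % (name, value))
        out.append(" ".join(pairs))
    return out

# Made-up sample data; a real file has 70+ fields.
sample = io.StringIO(
    "callingPartyNumber,lastRedirectDn,duration\n"
    "VARCHAR(50),VARCHAR(50),INTEGER\n"
    "5555551234,2235,42\n"
)
for line in cdr_to_kv(sample):
    print(line)
```

Because the field order is taken from the header rather than hard-coded, a sketch like this keeps working if a CCM upgrade adds or reorders fields.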

  • Write out Splunk-Compatible logs

    In writing this script, I opted to have it output to a normal syslog-style file, rather than any of the other Splunk input options. The rationale for this was purely environmental: in our organization, our backups and design are centered around log files, rather than Splunk buckets. The script has options to either overwrite the destination file every time (ideal for a test environment, with oneshot inputs) or append to it (appropriate for production).

  • Normalize Numbers

    It is essential to normalize numbers for reporting. This converts your internal extension of 2235 into 5555552235. That way, when you do searches, you can just search where the number 5555552235 is involved and get internal, outgoing, and incoming calls alike. Additionally, this process resolves codes that aren't normally valid numbers; in our environment, we ended up with a large number of #555 numbers, which were for our Cisco Contact Center system.
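A rough Python sketch of this kind of normalization follows (again, not the actual Perl). The prefix constant, the special-code table, the "ContactCenter" label, and the rule for stripping a leading 1 are all assumptions; the real script takes its settings from its conf file.

```python
import re

# Hypothetical settings, for illustration only.
EXTENSION_PREFIX = "555555"                 # prepended to 4-digit extensions
SPECIAL_CODES = {"#555": "ContactCenter"}   # assumed label for a non-numeric code

def normalize_number(raw):
    """Return a 10-digit number, a special-code label, or the input unchanged."""
    raw = raw.strip()
    if raw in SPECIAL_CODES:
        return SPECIAL_CODES[raw]
    digits = re.sub(r"\D", "", raw)         # drop punctuation and spaces
    if len(digits) == 4:                    # internal extension
        return EXTENSION_PREFIX + digits
    if len(digits) == 11 and digits.startswith("1"):
        return digits[1:]                   # strip leading US country code
    if len(digits) == 10:
        return digits
    return raw                              # leave anything unrecognized untouched

print(normalize_number("2235"))
print(normalize_number("15555551234"))
```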

  • Look Up Users

    For each phone call with a duration > 0, we should have a local user. This can take a few different formats:

    • Internal->Internal

      These calls will have two local users. The script outputs LocalUser="Joe Smith", LocalUser="Marie Smith". The transforms.conf will read this in as a multi-value pair, so you can search for either user individually and get the call.

    • External->Internal, Internal->External

      These are obvious: the one internal party is the local user.

    • External->External

      In the event that a user forwards incoming calls to an external number, the Local User is going to be the extension specified in lastRedirectDn.

    • External->Voicemail

      Arguably, "Voicemail" could be considered to be the ultimate destination, but I think it is more useful to have the real user listed here. Again, this will be lastRedirectDn.
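The cases above can be sketched roughly as follows, in Python rather than the script's Perl. The directory contents, the voicemail number, and both function names are hypothetical; the point is only to show lastRedirectDn being used as the fallback when neither party resolves to a real local user.

```python
# Hypothetical directory: normalized number -> user name.
DIRECTORY = {"5555552235": "Joe Smith", "5555552240": "Marie Smith"}
VOICEMAIL = {"5555552999"}  # assumed voicemail pilot number

def local_users(calling, called, last_redirect):
    """Return the list of local user names for one call."""
    users = []
    for number in (calling, called):
        if number in VOICEMAIL:
            continue  # resolved through last_redirect below
        if number in DIRECTORY:
            users.append(DIRECTORY[number])
    if not users and last_redirect in DIRECTORY:
        # External->External (forwarded) or External->Voicemail:
        # credit the extension that redirected the call.
        users.append(DIRECTORY[last_redirect])
    return users

def format_local_users(users):
    """Render the names as repeated LocalUser="..." pairs."""
    return ", ".join('LocalUser="%s"' % u for u in users)

print(format_local_users(local_users("5555552235", "5555552240", "")))
```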

  • Look Up User

    Once the script determines a normalized number to look up, the next step in the process is to actually look up the user. The script will currently consult the following sources:

    • Active Directory

      Configured with settings in the conf file, this will look up given attributes and gather line information.

    • CSV File

      The script will take a CSV file with static hosts. This is useful for IPCC ports, voicemail ports, etc. Whereas the Active Directory processing will only take information specific to a single user, the CSV file processing can take a regex that will match given ports. It also has a "Redirect" flag that will indicate whether the Name should be used, or whether the script should do a lookup for the lastRedirectDn. This CSV file is where you will define your Voicemail extensions, so that the script knows to look up the real user, instead of listing the Voicemail user as the LocalUser.

    • CCM System

      I have the infrastructure in place to consult a CCM System, but the code isn't written for this, as we don't need it in our environment. If there is ever a need to add it, it can probably be added very easily.
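To illustrate how a regex-driven CSV source with a Redirect flag might behave, here is a small Python sketch. The column names (pattern, name, redirect), the sample rows, and the order in which sources are consulted are all assumptions, and a plain dictionary stands in for the Active Directory lookup; the real Perl script may differ on each point.

```python
import csv
import io
import re

# Hypothetical static-hosts file: regex pattern, display name, redirect flag.
STATIC_CSV = r"""pattern,name,redirect
^555555299\d$,Voicemail,yes
^555555300\d$,IPCC Port,no
"""

def load_static_hosts(text):
    """Parse the CSV into (compiled regex, name, redirect) tuples."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append((re.compile(row["pattern"]), row["name"], row["redirect"] == "yes"))
    return rows

def lookup(number, last_redirect, static_hosts, directory):
    """Resolve a normalized number to a name, or None if unknown.

    `directory` is a stand-in for the Active Directory source.
    """
    if number in directory:
        return directory[number]
    for pattern, name, redirect in static_hosts:
        if pattern.search(number):
            if redirect:
                # Use the owner of the redirecting extension, not the port name.
                return directory.get(last_redirect)
            return name
    return None

hosts = load_static_hosts(STATIC_CSV)
people = {"5555552235": "Joe Smith"}
print(lookup("5555552991", "5555552235", hosts, people))
```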

  • Determine Call Direction

    Inbound or outbound? The two are billed differently, so the distinction is important.
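One simple way to make this decision, sketched in Python under the assumption that every local number shares a single known prefix after normalization (the prefix constant and function names are hypothetical, and the "Internal" and "External" results are additions for completeness):

```python
INTERNAL_PREFIX = "555555"  # assumed prefix shared by all local numbers

def is_local(number):
    """True if a normalized 10-digit number belongs to our phone system."""
    return len(number) == 10 and number.startswith(INTERNAL_PREFIX)

def call_direction(calling, called):
    """Classify a call by which side of it is local."""
    if is_local(calling) and is_local(called):
        return "Internal"
    if is_local(calling):
        return "Outbound"
    if is_local(called):
        return "Inbound"
    return "External"

print(call_direction("5555552235", "4155550100"))
```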

  • Determine Call Type

    In the US telecom world (at least out here in San Francisco), there are the following types of calls, all of which can have different billing attached:

    • Internal: Doesn't leave the phone system
    • Zone 1: 0-12 Miles
    • Zone 2: 12-15 Miles
    • Zone 3: 16+ Miles
    • IntraLATA: Within your LATA (Local Access and Transport Area)
    • InterLATA: Beyond your LATA, but within the state
    • InterState: Between States
    • International: Between Countries
    • Emergency: 911
    • Outgoing Toll Free: Several different prefixes, with 800, 888, 877 being best known
    • Incoming Toll Free: If you have a toll free number, and someone calls it. (Interestingly, I haven't found a way to distinguish an incoming toll free call; this should be added to the script at some point.)
    • Reverse Billing Numbers: This is arbitrary from the perspective of most end users, but could be important in some scenarios. Certain area codes abuse phone law and charge carriers very high connect fees, upwards of 40 cents per minute, despite being within the US. Many free conference services utilize this functionality, and for that reason, some carriers (Speakeasy, Google Voice) will block access to RBN numbers. Recognition of RBN numbers can be disabled via the conf file.
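A few of these call types can be decided from the dialed number alone, as in the Python sketch below. The zone, LATA, interstate, and RBN categories need rate or geographic data, so this sketch deliberately returns "Unclassified" for them. The function name, the internal prefix, and the use of the 011 international dialing prefix as the test for international calls are assumptions, not the script's actual logic.

```python
# US toll-free area codes.
TOLL_FREE = {"800", "888", "877", "866", "855", "844", "833"}
INTERNAL_PREFIX = "555555"  # assumed prefix of normalized local numbers

def call_type(dialed):
    """Classify a dialed number where the number alone is enough."""
    if dialed == "911":
        return "Emergency"
    if dialed.startswith("011"):          # US international dialing prefix
        return "International"
    if len(dialed) == 10:
        if dialed.startswith(INTERNAL_PREFIX):
            return "Internal"
        if dialed[:3] in TOLL_FREE:
            return "Outgoing Toll Free"
    # Zones, LATA, and interstate calls need external rate data.
    return "Unclassified"

for number in ("911", "8005550100", "011442079460000", "5555552235"):
    print(number, call_type(number))
```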
  • Look Up Area Codes

    All US normalized calls (well, most) should have an area code attached to them. That area code can be resolved to a location such as Chicago, IL or San Francisco, CA, which provides good context for the reporting (in particular, the destination state is very valuable).

    Note: This requires a compatible database. In my test environment, I found one very easily by Googling, but chose not to include it with the package due to legal concerns.
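The lookup itself is simple once a database is available. In the Python sketch below, a two-entry dictionary stands in for that database; the table, the function name, and the returned format are illustrative assumptions only.

```python
# Tiny stand-in for a real area code database.
AREA_CODES = {
    "312": ("Chicago", "IL"),
    "415": ("San Francisco", "CA"),
}

def area_code_location(number):
    """Return 'City, ST' for a normalized 10-digit US number, or None."""
    if len(number) != 10 or not number.isdigit():
        return None
    entry = AREA_CODES.get(number[:3])
    if entry is None:
        return None
    city, state = entry
    return "%s, %s" % (city, state)

print(area_code_location("4155550100"))
```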

  • Look Up International Numbers

    The script will also resolve an international call, and determine whether the call is to a mobile phone or not. This is very useful in controlling the cost of usage.

    Note: This requires a compatible database. There are a variety available (both for cost and for free, with varying accuracy), but I was unable to include a copy.

As you can see, there are a lot of tasks taken on by this Perl script. This will probably expand a bit over time, but I think that for current needs, it is fairly complete. We'll go into how to customize the configuration settings in another post.