collectl (3.7.4-1) unstable; urgency=medium

  * New upstream release 3.7.4
  * typo in $netFilt (should have been $netFiltIgnore) preventing any network
    from being included in totals when --netfilt specified, but also made me
    rethink the way summaries are calculated (see next item)
  * 2 more network types were discovered to be causing double counting in
    summaries, specifically vibr and vnets. since the exceptions occur at a
    far greater rate, it was decided that rather than have a default list of
    those network types to exclude from the summaries, it makes far more sense
    to have a list of those that SHOULD be included, as well as a mechanism
    for handling new summary types. This led to a reinterpretation of
    --netfilt. see the man page and Network.html for more details
  * removed references to XC, which is no longer supported
  * use abs to generate path to exe, simpler and cleaner [thanks Jeff]
  * extended the way formatit is loaded and changed the order in which
    collectl.conf is discovered, noting it should only affect people actually
    modifying code or moving things to non-standard locations. it IS now
    documented in Startup and Initialization.
    [thanks again, Jeff]
  * set max lines to read for diskstats to 20000 for those with really large
    disk counts where 10000 wasn't enough [thanks jean-marc]
  * very rare, but if doing timing and no hires present, $microInterval gets
    set to zero and the division by the interval blows up
  * finally remembered to remove -G and --group which were replaced by --tworaw
  * clarified description of -s defaults in manpage as well as adding a
    pointer to the online documentation on file naming [thanks rob]
  * added additional error message for when files match selection string but
    none contain -date-time.raw [thanks rob]
  * add support for newer kernel CPU stats: guest, guest_nice
  * now that 2.4 kernels are no longer supported, make sure CPU stats contain
    at least the softirq field
  * change headers with % to PCT and remove space, also remove whitespace in
    interrupt detail output for type and devices columns [thanks rob]
  * new switch --ALL, selects summary and detail data for all subsystems
    [thanks rob]
  * new switch --full, selects --verbose, always includes RECORD separator
    and includes which subsystem data is being reported with each interval in
    the RECORD header to make parsing easier for rob [thanks rob]
  * if you DON'T collect tcp data but want to play it back, variables weren't
    initialized to 0 and you get uninit variable warnings
  * if disk name ends with a digit (which can only happen when manually
    changing disk filtering in either collectl.conf or with --rawdskfilt),
    don't include it in disk summary stats [thanks guy]
  * discovered a place where some numa counters go backwards!
    This MUST be a kernel bug but inserted code to mitigate and warn if it
    happens [thanks rob]
  * removed a line of code incorrectly initializing $HCAPosts[] because that
    is now a doubly indexed array [thanks Jeff]
  * discovered tap devices don't set default network speeds correctly and can
    cause 'bogus' messages so use default max
  * make 'Intrpt' header mixed case for CPU details, not all upper
  * new 3rd option for --top, allows one to display the top-n processes
    sorted by any column vertically, similar to playback mode, which in some
    cases can be very handy
  * if only 1 tcp subtype selected with --tcpfilt, was printing column header
    of ERR and I've no idea why. Changed it to TCP.
  * I didn't like --tcpfilt I by itself forcing --verbose, so changed it so
    that just being in the --tcpfilt string will force it, and updated the
    man page as well since --tcpfilt wasn't even documented in it
  * As warned, I'm in the process of dropping direct support for lustre and
    you should contact Peter Piela at TeraScala to get a copy of his lustre
    plugin. Therefore -sl is being removed as a default. To get collectl's
    native lustre support in daemon mode, you must add it to -s. Native
    support will be completely removed around the summer of 2015.

 -- Troy Heber  Wed, 10 Sep 2014 14:13:36 -0600

collectl (3.7.3-1) unstable; urgency=low

  * Support for infiniband extended counters also allows multiple copies to
    run
  * Removed myrinet and quadrics support.
    Also dropped nvidia and sexpr as promised
  * New switch --cpufilt, allows displaying a subset of CPUs for machines
    with high cpu counts

 -- Troy Heber  Tue, 15 Apr 2014 12:26:14 -0600

collectl (3.6.9-1) unstable; urgency=low

  * typo in network plot header loop resulted in infinite loop
  * remove $int/secs from numa hit rate calc AND add more precision to its
    output [thanks stig]

 -- Troy Heber  Fri, 18 Oct 2013 08:55:57 -0600

collectl (3.6.8-1) unstable; urgency=low

  * new flag $exportComm must be set in gexpr/ganglia so that they won't
    generate an error if run without -f or -A [thanks tom]
  * new switch: --intfilt allows filtering of interrupts
  * always log messages of type F/E to syslog in daemon mode even if -m is
    not set [thanks again, tom]
  * wasn't dealing correctly with missing whitespace after network name in
    /proc/net/dev in initRecord() [thanks andy]
  * updated init.d script for suse per the maintainer's instructions
    [thanks tom]
  * extra spaces were being printed in plot mode for tcp stats
  * added entry to envrules.std to deal with intel Phi Co-Processor
  * debian init.d script now does 'exit 1' if status reports 'not running'
  * rawnetignore switch wasn't working correctly
  * found/fixed some subtle problems with --procanalyze as well as some
    cleanup
  * need to ignore first sample after initializing summary arrays
  * need to init summary hashes for thrutime and accumT because you get an
    uninit var in the print routine if there is only a single process entry
  * found a typo in procAnalyze() referencing a $usecs which wasn't being
    used!
  * added error check to make sure --procanalyze with -P requires -s
  * added a little more debugging output for -d128
  * discovered dynamic disk/network detail names for interactive mode were
    not being reported correctly.
    sounds a lot worse than it is because this is typically not done very
    often nor are disks/networks very dynamic except in large, virtualized
    environments such as clouds
  * add to list of devices to exclude from network summary data: tap, dp and
    nl, which are associated with openstack cinder. remember you can always
    add more to that list with --netfilt
  * $lastHour was never referenced and dayInit() called every time a log was
    created so fix logic to update $lastHour correctly AND call initDay() one
    time and do it before newLog() called.
  * closed a couple of file handles that were left open and reportedly
    causing some defunct processes with -sx. [thanks brian]
  * fixed bug in lustre stats recording [thanks roland]
  * clarified --showsubopts text about disk and network filters in that they
    apply to both summary and detail data output
  * fixed problem with --import and --stats
  * --statsopt a didn't work because when changed some internal logic missed
    changing a test of $timestampFlag to $timestampCounter[$rawPFlag] and so
    now $timestampCount can be removed entirely
  * clear $firstpass after 1st pass during playback
  * make sure filename initialized before calling loadConfig so if there is
    an error logsys() doesn't get an undefined var warning
  * to be safe, remove any quotes on net/dsk filters in case included by
    mistake in DaemonCommands string
  * tightened up tests to see if daemonized collectl already running
  * if no Time::HiRes, fudge the value of $microInterval based on -i
    [thanks Domi]
  * new --procOpt k, removes known shells from process listing with -sZ,
    currently set to /bin/sh, /usr/bin/perl, /usr/bin/python and python

 -- Troy Heber  Wed, 16 Oct 2013 09:32:56 -0600

collectl (3.6.7-1) unstable; urgency=low

  * set network speed for vnets to '??'
    so they'll use $DefNetSpeed for bogus checks since the kernel hardcodes
    them to 10, which makes no sense
  * code to print brief totals for -st wasn't included in a conditional so
    you'd always get extra columns of output when -st was NOT included
  * needed to initialize numaMem->{lock} for cases where user selects -sM and
    no data collected
  * added randomize and align switches to graphite module and align switch
    only to gexpr.ph since gexpr uses current times in messages
  * added escape switch to graphite to allow one to change the dots in
    hostname
  * change to suse startup script to look in /usr/sbin instead of /usr/bin
  * added debug mask of 16 to lexpr to help test x= switch
  * can now use commas OR colons with lexpr,x= though commas preferred and
    colons may go away
  * added disk qlen, wait, svctime and util to lexpr
  * it was pointed out that in getExec() I'm initializing $oneline instead of
    $oneLine
  * for debian init script, reverse logic for running start-stop-daemon with
    --test so it will work with busybox too
  * new switch: --cpuopts z (the only option) which suppresses lines of idle
    activity from detailed stats
  * when purging imported detail plot data, only do so if file had changed
  * when playing back multiple files, do NOT try to process a new file that
    has not yet seen the end of the current interval ($timestampCound==1)
  * fix SuSE init.d script

 -- Troy Heber  Sat, 23 Mar 2013 11:35:31 -0600

collectl (3.6.5-1) unstable; urgency=low

  * was not updating new major/minor numbers for a disk when they changed so
    got stuck in a loop which kept reporting disk maj/min changed every
    interval
  * new -r option to purge older .log files, def=12 months
  * fixed DaemonCommands to preserve order so you can override anything by
    adding on the right side of it
  * new 'align' switch added to lexpr so default is NOT to align to whole min
  * for -sE do not convert negative temperatures [thanks kevin]
  * add error handling to 'print' in logmsg
  * vmstat needs to set $sameColsFlag to make header pagination
    work with -p
  * new graphite switch f, use fqdn for host [thanks Bryant]
  * when lexpr called with x= it needs to set summary data flag in case
    nothing else is being reported, otherwise timestamps print after the data
    instead of before
  * lexpr typos: $tcpError, $udpError and $icmpError should not be singular
  * timestamp wasn't being updated for -sD because it was specified in
    $dskdetFormat
  * explicitly close logs before opening new ones in the hope that the
    occasional corrupted file problems with gunzip will go away
  * tcp 'last' variables weren't correctly initialized and so was printing
    bad data on first line of output
  * modified lexpr, gexpr and graphite such that when i= is used, they align
    sending on whole minute boundaries, which is particularly useful with rrd
  * merged snmp and tcp stats under -st and changed export routines to show
    summary error counts for -st. removed snmp.ph from kit. summaries (based
    on --tcpfilt) as does brief format
  * correctly deal with dynamic disks/networks: instead of pulling names from
    header, get them from raw file when discovered
  * simplify code that deals with changed disks, now that more cleanly
    handled
  * replace runtime calls to 'die' with calls to syslog
  * readS was still left in INSTALL! [thanks gavin]
  * added system boot time to header
  * new values for procopts s/S to show process start times
  * graphite.ph now prints loadavgs to 2 decimal places [thanks brandon]
  * extended lexpr,x= functionality to also call an init routine
  * initFormat now returns entire header!
  * if nothing returned from an import module on a printVerbose or printPlot
    call for detail data do not call printText() since it will screw up
    colmux and plot detail file with empty lines
  * new --rawdskignore AND --rawnetignore because sometimes easier to specify
    a pattern of things to ignore
  * removed restriction for running as root to get network speeds via ethtool
    by looking in /sys/devices now
  * slight change to way the disk queue depth is being calculated to provide
    better accuracy [thanks ken]
  * new --dskopts f reports disk details with some fractional values
  * always calculate disk details even when only doing -sd since a plugin
    might want to get at them
  * new graphite switch b, will cause output to be prefaced by a specified
    string [thanks justin]
  * slight change to s= functionality for lexpr, gexpr and graphite: no
    arguments will disable all but imported data, allowing you to log -s data
    to files without sending it over the socket
  * need to give other routines (specifically --import) access to the lexpr
    interval by declaring it with 'our'
  * had to change the way lexpr/gexpr/graphite do min/max/avg since they were
    using a positional index to track intermediate values when clearly a hash
    is required for cases where not all intervals contain same elements
  * -P and --plotflag had different effects on $headerRepeat because prior to
    calling getopts I was peeking ahead for an ARG of -P and not including
    --plo [thanks devilized]
  * gexpr module has wrong units for network packets and with 'g' modes had
    to multiply kb counts by 1024 to convert to bytes, which is the units for
    these that ganglia uses [thanks, trevor]
  * clean up handling of missing ipmitool and root access [thanks trevor]
  * finally remembered to remove readS from the kit [thanks joseba]
  * when filtering a process by the full path with 'f', never include
    collectl itself
  * documented utime in manpage
  * if -i0 set $DefNetSpeed to 0 so we don't throw any 'bogus' network speed
    messages
  * new switches, --rawdiskfilt and
    --rawnetfilt, allow one to filter disks/nets at time of data collection
    so they never appear in raw file
  * added call to IntervalEnd() (if it exists) for --import
  * add option timeout to --address when connecting back to explicit address
  * moved code that deals with fractional intervals and !HiRes closer to
    other interval processing
  * added 'strict' to snmp module as well as 'help' option: snmp,h
  * fixed problems with --import
  * if --import is used to generate detail data with -f and -P not
    specified, collectl throws an error trying to close the detail log which
    clearly hasn't been created
  * when using an interval other than the default AND -s-all, blank lines are
    printed for standard intervals which don't have imported data. this
    applied to brief, verbose AND detail data
  * added some more systems to envrules: Proliant SL230/SL250 Gen 8 and
    SE1170s

 -- Troy Heber  Wed, 13 Feb 2013 10:49:47 -0700

collectl (3.6.3-1) unstable; urgency=low

  * New upstream release 3.6.3
  * finally remembered to remove readS from the kit
  * when filtering a process by the full path with 'f', never include
    collectl itself
  * documented utime in manpage
  * if -i0 set $DefNetSpeed to 0 so we don't throw any 'bogus' network speed
    messages
  * new switches, --rawdiskfilt and --rawnetfilt, allow one to filter
    disks/nets at time of data collection so they never appear in raw file
  * added call to IntervalEnd() (if it exists) for --import
  * add option timeout to --address when connecting back to explicit address
  * moved code that deals with fractional intervals and !HiRes closer to
    other interval processing
  * added 'strict' to snmp module as well as 'help' option: snmp,h
  * fixed problems with --import
  * if --import is used to generate detail data with -f and -P not
    specified, collectl throws an error trying to close the detail log which
    clearly hasn't been created
  * when using an interval other than the default AND -s-all, blank lines are
    printed for standard intervals which don't have imported data.
    this applied to brief, verbose AND detail data
  * added some more systems to envrules: Proliant SL230/SL250 Gen 8 and
    SE1170s
  * fixed serious bug introduced a number of versions ago, which during
    playback of multiple files and specifying date/time caused collectl to
    continue reading first timestamp in each file and generating
    'uninit variable' errors. not harmful, but inefficient and ugly!
  * added exit codes of 0/1 to all the exit points
  * moved help text for --stats from basic to extended
  * found $file=~/rawp/ near line 1440 clearing $1, $2 and $3 and so $prefix,
    $fileDate and $fileTime were not getting set correctly
  * clarified 'No files processed' message to be a little more explicit
  * broaden where collectl looks for lustre modules and also fixed a typo of
    $lustops to $lustOpts
  * procAnalyze() incorrectly totaling fault totals instead of interval
    values
  * optimize new pid processing with --procfilt
  * add new pids to pidSkip{} as appropriate
  * undef pidSkip{} whenever pids wrap
  * added hello.ph and graphite.ph to INSTALL
  * was incorrectly setting DiskFilterFlag to 1 all the time, even when not
    overridden in collectl.conf. while not a bug, it does cause a slight
    increase in overhead

 -- Troy Heber  Thu, 24 May 2012 09:41:42 -0600

collectl (3.6.1-1) unstable; urgency=low

  * New upstream release 3.6.1
  * removed --ssh switch, making detecting the parent going away the default
    behavior
  * added switch --nohup which allows collectl to continue running if parent
    exits, which is more consistent with how --nohup itself works
  * in logmsg ONLY write to STDERR when attached to a terminal
  * serious problem when using --tworaw and a flush interval < that for the
    process data occurs because newer versions of zlib will fail if you try
    to flush to a file that has not been updated. since I don't know which
    version of zlib this started happening in and feel this is a relatively
    rare case, we're just rejecting this combination regardless of zlib
    version.
    I do have an email out to the zlib author and if I ever get to the bottom
    of this will be able to relax this restriction.
  * use gettimeofday() for timestamps in logmsg()
  * enhanced timing parameters when -i0 used. if specified, use 2nd/3rd
    parameters as ratios to the first, making it possible to measure loads
    at ratios other than 1:6:30.
  * discovered --import was missing from man pages and so added it
  * when playing back a file, set $verboseFlag if user specified --verbose
    but NEVER clear it
  * experimental import: snmp, see http://collectl.sourceforge.net/Snmp.html
    for details
  * printf in record() blows up if formatting chars in command string!
    [thanks mike]
  * added accumulated time as a --top sort option
  * changed formatting of accumulated time in process output to simply be
    hh:mm:ss, or mm:ss.ss when less than an hour, to be more in line with top
  * new switches, --stats and --sumstats report stats in brief mode, the
    latter only summary data
  * during playback need to check $numProcessed before reporting none were
    processed
  * stats reporting logic wasn't processing 1st file, checking for
    $numProcessed>1
  * removed -oA and replaced/extended functionality with --stats/--statopts
  * wasn't allowing --procopts when playing back process data unless -sZ,
    which was silly
  * subtle problem found: illegal 'last' in pidNew() because file disappeared
    between initial -e and trying to open it a few usecs later! can't exit a
    sub via last so changed to return(0)
  * our friends at OFED slightly changed the output of perfquery again
    [thanks frederic]

 -- Troy Heber  Mon, 05 Mar 2012 09:33:15 -0700

collectl (3.6.0-1) unstable; urgency=low

  * New upstream release 3.6.0
  * New subsystem: -sM.
    memory detail now shows numa info
  * Bunch of new/updated switches: --dskopts, --netopts, --xopts, --extract
  * Enhanced disk/network output filtering, allowing for exclusion of
    instance names
  * Added new sections to documentation
  * Finally dropped support for 2.4 kernels

 -- Troy Heber  Fri, 16 Dec 2011 10:09:43 -0700

collectl (3.5.1-1) unstable; urgency=low

  * Initial Release (Closes: #535233)

 -- Troy Heber  Fri, 29 Jul 2011 09:18:23 -0600