#
# README  1.7   07/05/17
#

- IMPORTANT: You are strongly encouraged to run cediag(1M) immediately
  upon installation in order to accept the license.  cediag cannot be
  run from cron(1M) or from explorer without the license being accepted.
  The postinstall script will remind you to run cediag to accept the
  license as soon as possible.

- The cediag tool applies Sun's DIMM Replacement Policy to Sun systems
  to recommend when DIMMs should be replaced and under what timeframe.

- There are two packages in this distribution, a 32-bit package (SUNWcest)
  and a 64-bit package (SUNWcestx).  The SUNWcest package must be installed
  prior to the SUNWcestx package.  pkgadd(1M) will enforce this.

- To use the tools in this package, be sure to add /opt/SUNWcest/bin to
  your PATH and /opt/SUNWcest/man to your MANPATH.

- In offline modes, cediag analyzes the /var/adm/messages file for CE messages
  from Solaris.  It is therefore critical that CE error reporting not be turned
  off in your system.  Be sure that ce_verbose_memory (or ce_verbose in early
  versions of Solaris 8 and 9) is not being set to 0 in your /etc/system file.
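
  The check described above can be sketched in Python.  This is a minimal
  illustration, not part of cediag: the function name is hypothetical, and
  it assumes the usual /etc/system conventions ("set name=value" entries
  and comment lines that begin with "*").

```python
import re

# Matches entries such as "set ce_verbose_memory=0" or "set ce_verbose = 0".
# Only the two tunables named in this README are checked.
_CE_SETTING = re.compile(r"^\s*set\s+(ce_verbose_memory|ce_verbose)\s*=\s*(\S+)")


def _is_zero(value):
    """Return True if the value parses as the integer 0 (decimal or hex)."""
    try:
        return int(value, 0) == 0
    except ValueError:
        return False


def ce_reporting_disabled(system_text):
    """Return the names of CE tunables that are set to 0 in the given text."""
    disabled = []
    for line in system_text.splitlines():
        # Assumption: lines starting with '*' are comments in /etc/system.
        if line.lstrip().startswith("*"):
            continue
        match = _CE_SETTING.match(line)
        if match and _is_zero(match.group(2)):
            disabled.append(match.group(1))
    return disabled


if __name__ == "__main__":
    with open("/etc/system") as handle:
        for name in ce_reporting_disabled(handle.read()):
            print("CE reporting is turned off by: set %s=0" % name)
```

  An empty result means neither tunable is being set to 0.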

- cediag requires perl(1) version 5.00503 or later to work correctly.  cediag
  provides its own symbolic link from /opt/SUNWcest/bin/perl to /usr/bin/perl.
  If you cannot install perl version 5.00503 or later as /usr/bin/perl, then
  you can remove the link at /opt/SUNWcest/bin/perl and make a new link in its
  place to the location where you can install this version of perl.

  If you try to run cediag without a valid perl(1) binary or link at
  /usr/bin/perl, you'll see an error similar to this:

  # ls -l cediag
  -r-xr-xr-x   1 root     bin       131385 Nov 11 20:18 cediag
  # ./cediag
  ksh: ./cediag:  not found

- Though these packages will install on versions of Solaris 7 and earlier,
  it is important to note that cestat(1M), also contained herein, will not
  function on these versions.  cestat only works on Solaris 8 with
  108528-24 and later; and on Solaris 9 with 112233-11 and later.

  If you try to run cestat on Solaris 7 or earlier, you're very likely
  to see an error much like this:

  # ./cestat
  ld.so.1: ./sparcv9/cestat: fatal: libc.so.1: version `SUNW_1.19' not found (required by file ./sparcv9/cestat)
  Killed

NEW NOTES:

- This release of cediag adds support for rule 4b, a new rule which
  recognizes an additional pattern of CEs which can create a UE if
  they occur simultaneously.

- A new feature has been added to this release of cediag.  While not an
  official rule, it is called rule 5 supplemental.  In some instances,
  Solaris is not able to retire memory pages that have experienced CEs.
  In these situations, if cediag detects more than 120 CEs on the same
  DIMM and bit, at the same AFAR, within 24 hours without the page being
  retired, cediag will recommend that the DIMM be replaced.  This feature
  is intended to reduce the excessive buildup of messages in the messages
  files and any concern that buildup causes.
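
  The counting behind rule 5 supplemental can be pictured as a sliding
  24-hour window per (DIMM, bit, AFAR).  The sketch below is an
  illustration only, not cediag's implementation: the event tuple layout,
  the function name, and the retired_afars argument are hypothetical, and
  "within 24 hours" is assumed to mean less than 24 hours apart.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

THRESHOLD = 120               # "more than 120 CEs"
WINDOW = timedelta(hours=24)  # "within 24 hours"


def rule5_supplemental(events, retired_afars=frozenset()):
    """Return the DIMMs that exceed THRESHOLD CEs on one (DIMM, bit, AFAR)
    within WINDOW while the page at that AFAR has not been retired.

    events: iterable of (timestamp, dimm, bit, afar) tuples sorted by time.
    This record layout is hypothetical, not cediag's internal format.
    """
    recent = defaultdict(deque)
    flagged = set()
    for when, dimm, bit, afar in events:
        if afar in retired_afars:
            continue  # the page was retired, so the rule does not apply
        window = recent[(dimm, bit, afar)]
        window.append(when)
        # Drop CEs that are 24 hours or more older than the newest one.
        while when - window[0] >= WINDOW:
            window.popleft()
        if len(window) > THRESHOLD:
            flagged.add(dimm)
    return flagged
```

  With this sketch, 121 CEs one minute apart on the same DIMM, bit, and
  AFAR flag the DIMM, while 120 do not.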

- cediag has had several performance enhancements in this release.
  . Prescreening: In earlier releases of cediag, every line from the
	messages files was compared against numerous regular expressions (RE)
	to determine whether it related to the memory DIMM replacement rules.
	Now, every line is matched against a single pre-screen RE to determine
	whether it will match one of the REs related to the rules.  Message
	lines which don't pass the pre-screen are skipped, greatly reducing
	the number of RE comparisons cediag performs.

	Prescreening is only performed in "live mode" (see the man page) and
	is turned on by default.  It can be turned off with '-N'.

  . Checkpointing: Earlier releases of cediag scanned every line of every
	messages file looking for data related to the DIMM replacement rules.
	Checkpointing allows cediag to save the time of the last message it
	analyzed so that it can pick up where it left off.  Actually, it
	starts scanning with the messages printed 24 hours immediately prior
	to that last message scanned so that CE history from the previous 24 
	hours can be recovered for those rules with a 24 hour time requirement.

	When checkpointing is turned off, cediag will print out DIMM
	replacement recommendations based upon all messages in the system,
	every time it is run.

	When checkpointing is turned on, cediag will print DIMM replacement
	recommendations for any rules which were triggered in the 24 hours
	prior to the last run, and for any rules which have been triggered
	since the last run.  When using checkpointing, cediag may trigger
	its rules on different CEs than it would without checkpointing.
	The two modes of operation (checkpointing and non-checkpointing)
	should not be out of sync in this manner by more than one or two CEs.

	Checkpointing is only performed in "live mode" (see the man page) and
	is turned off by default.  It can be turned on with '-P'.

  . Automatic priority adjustments:  If the amount of data in your
	/var/adm/messages files is so large that it takes cediag more than
	five minutes to analyze it, cediag will automatically start lowering
	its execution priority to keep from taking CPU resources from the
	applications running on the system.

	This feature is only enabled in "live mode" (see the man page).
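
  Prescreening and checkpointing, as described above, can be sketched
  together in Python.  This is an illustration under stated assumptions,
  not cediag's code: the rule patterns are stand-ins (cediag's real
  expressions are not reproduced in this README), and the function name
  and the (timestamp, text) entry layout are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Hypothetical stand-ins for the per-rule regular expressions.
RULE_PATTERNS = [
    r"Corrected Memory Error",
    r"Softerror: .* ECC",
    r"page .* retired",
]

# Prescreening: one combined RE tells whether a line could match any rule RE.
PRESCREEN = re.compile("|".join("(?:%s)" % p for p in RULE_PATTERNS))

# Checkpointing restarts 24 hours before the last message analyzed.
LOOKBACK = timedelta(hours=24)


def lines_to_analyze(entries, checkpoint=None, prescreen=True):
    """Select the message lines worth running the rule REs against.

    entries: iterable of (timestamp, text) pairs.
    checkpoint: time of the last message analyzed by the previous run,
    or None when checkpointing is off.
    Returns (selected_entries, new_checkpoint).
    """
    start = checkpoint - LOOKBACK if checkpoint is not None else None
    selected = []
    newest = checkpoint
    for when, text in entries:
        if newest is None or when > newest:
            newest = when
        if start is not None and when < start:
            continue  # older than the 24-hour lookback; already handled
        if prescreen and not PRESCREEN.search(text):
            continue  # cannot match any rule RE, so skip it cheaply
        selected.append((when, text))
    return selected, newest
```

  The returned checkpoint would be saved and passed to the next run, so
  that run rescans only the preceding 24 hours plus anything newer.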

cediag bugs fixed in this release:
6444328 cediag uses uninitialized variables when -c arg is not provided
6346503 cediag needs to implement rule 4B - uniboard, daktari, excalibur
6245432 wrt pages retired: do not rely on messages* but try to use kstat-p.out
6424378 CEDIAG returning warning messages about file more recent when it is not
6390568 cediag support for the UltraSparc IIIi+ is needed
6319316 cediag -v -e fails to report retired pages which are present in messages files
6250522 cediag not interpreting RCE/FRC errors correctly
6249670 cediag doesn't recognize 118558 as a valid KUP for Solaris 9
6204402 usIIIi RCE and RUE events need to be detected
6244964 cediag can attempt to recognize failed page retires
6553544 cediag performance needs improvement

---------------------------------------------------------------------------
- These notes are for cestat(1M).

  cestat(1M) is a tool designed to display statistics for the memory page
  retire (MPR) functionality recently introduced into Solaris 8 and Solaris 9.
  If you run cestat independently of cediag, you should keep in mind that
  cestat works only on MPR capable systems and therefore is providing
  analysis and advice only for rule 5 of the DIMM Replacement Policy.
  Since cediag takes into account the other rules of the Replacement Policy,
  if cediag's advice differs from advice provided by cestat, cediag's advice
  should supersede cestat's advice.

  Using cestat, you may come across bug 4893666.  This is mentioned here
  to raise awareness of this bug to prevent confusion during the use of
  cestat.

  When Dynamic Reconfiguration (DR) is used to unconfigure memory from a
  system, Solaris does not always reduce the number of retired pages, if
  that board contained retired pages, nor does it clear the CE statistics
  against any DIMMs on that board that may have experienced CEs.

  Bug 4893666 documents the case where the retired page count is not
  correctly reduced when the kernel memory cage board is unconfigured
  using DR.  The fix to this bug is found in patch 117350-03 for
  Solaris 8.  For Solaris 9, it is fixed in 116668-03 and later and in
  117124-01 and later.

  zerocecnt(1M), which is delivered with cestat and cediag, clears
  CE statistics from the kernel.  Whenever DR is used to unconfigure a
  system board which has DIMMs which have experienced CEs, zerocecnt
  should be used to clear the CE statistics for those DIMMs.  See the man
  page for zerocecnt for details on its use.


KNOWN BUGS with cestat or cediag:
6191134  cestat displays negative CE statistics on Solaris 8
5087648  zerocecnt should eliminate duplicates


---------------------------------------------------------------------------

NetConnect and cediag

By following these steps, it is possible to have NetConnect interface with
cediag.  This section was taken from the Workaround section of bug 6202798.


Perform the following steps to add a custom pattern for the cediag alarm:

1. Log into the monitored system as the root user.

2. Use a text editor to open the customer.patterns file in the
/etc/opt/SUNWsrshp/ directory.

3. Change the keyword and regular expression in an unedited pattern from NEVER,NO PATTERN to Customer Alarm,cediag,cediag: advice

For example, 1.7.3,Customer Alarm,cediag,cediag: advice

4. Use a text editor to open the ssha_pvr_config.cfg file in the
/etc/opt/SUNWsrshp/ directory.

5. Locate the Customer-Defined Rules section and uncomment the following lines
by removing the # sign. You must also increment the filemon.number value in
the ssha_pvr_config.cfg file so that the lines appear similar to the following:

filemon[3].file.path = /var/adm/messages

filemon[3].rules.path = /etc/opt/SUNWsrshp/customer.patterns

6. When cediag is installed, and the customer.patterns rule is
added for the cediag error messages, there is no need to generate
alarms for individual corrected error messages.

Use a text editor to open the base.patterns file in the /etc/opt/SUNWsrshp/
directory. Insert a "#" in front of the following lines to disable these alarms:

1.4.1,Memory Warning,Corr,Corrected Memory Error on .* is Intermittent
1.4.2,Memory Warning,Corr,Corrected (Ecache|Mtag) Error on .* is Intermittent
1.4.3,Memory Warning,NOTICE,NOTICE: [[]AFT0[]] ([A-Z]+|.*([^C].|C[^E])[)]) Event
1.4.4,Memory Warning,Softerror,Softerror: Intermittent ECC
1.4.5,Memory Warning,Corr,Corrected Memory Error on .* is Persistent
1.4.6,Memory Warning,Corr,Corrected (Ecache|Mtag) Error on .* is Persistent
1.4.7,Memory Warning,Softerror,Softerror: Persistent ECC
1.4.8,Memory Warning,WARN,WARNING: [[]AFT0[]] [0-9]+ soft errors in less than
1.4.9,Memory Warning,WARN,WARNING: [[]AFT0[]] Sticky Softerr.*encountered

For example:
#1.4.1,Memory Warning,Corr,Corrected Memory Error on .* is Intermittent

7. If you want to receive email or pager notification of cediag events, see
"Notifications" in the Sun(SM) Remote Services Net Connect 3.1.1 Customer
Operations Guide to set up hardware alarm notification.

For more information on custom Net Connect alarms, see "Adding Customized
Hardware Alarm Rules" in the Sun(SM) Remote Services Net Connect Customer
Operations Guide.
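
The edit to base.patterns in step 6 can also be scripted.  The Python sketch
below is an illustration only: the function name is hypothetical, and it
assumes each pattern sits on its own line and begins with its ID (1.4.1
through 1.4.9) followed by a comma, as in the listing in step 6.

```python
import re

# Pattern IDs 1.4.1 through 1.4.9, as listed in step 6.
_CE_ALARM = re.compile(r"^1\.4\.[1-9],")


def disable_ce_alarms(text):
    """Return text with a '#' inserted in front of the 1.4.1-1.4.9 patterns."""
    out = []
    for line in text.splitlines(keepends=True):
        # Lines that are already commented out do not match and are kept as-is.
        out.append("#" + line if _CE_ALARM.match(line) else line)
    return "".join(out)


if __name__ == "__main__":
    path = "/etc/opt/SUNWsrshp/base.patterns"
    with open(path) as handle:
        original = handle.read()
    # Print the result for review rather than overwriting the file.
    print(disable_ce_alarms(original), end="")
```

Running the function twice gives the same result as running it once, because
lines that already start with "#" are left alone.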


