Commit Graph

105 Commits

Author SHA1 Message Date
Aaron Miller
4fb6fde74d EDAC: Expose per-DIMM error counts in sysfs
The old csrowX sysfs directories have per-csrow error counters, but the
new dimmX directories do not currently expose error counts.

EDAC already keeps these counts, add them to sysfs so per-DIMM counts
are still available when CONFIG_EDAC_LEGACY_SYSFS=n.

Signed-off-by: Aaron Miller <aaronmiller@fb.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-19 10:29:40 +01:00
Ben Dooks
628ea92f09 EDAC: Make dev_attr_sdram_scrub_rate static
The dev_attr_sdram_scrub_rate is not declared in a header or used
anywhere else, so make it static to fix the following warning:

  drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol
  'dev_attr_sdram_scrub_rate' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-06 15:54:16 +01:00
Mauro Carvalho Chehab
78d88e8a3d edac: rename edac_core.h to edac_mc.h
Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Borislav Petkov
bba142957e EDAC: Correct channel count limit
c44696fff0 ("EDAC: Remove arbitrary limit on number of channels")
lifted the arbitrary limit on memory controller channels in EDAC.
However, the dynamic channel attributes dynamic_csrow_dimm_attr and
dynamic_csrow_ce_count_attr remained 6.

This wasn't a problem except channels 6 and 7 weren't visible in sysfs
on machines with more than 6 channels after the conversion to static
attr groups with

  2c1946b6d6 ("EDAC: Use static attribute groups for managing sysfs entries")

 [ without that, we're exploding in edac_create_sysfs_mci_device()
   because we're dereferencing out of the bounds of the
   dynamic_csrow_dimm_attr array. ]

Add attributes for channels 6 and 7 along with a guard for the
future, should more channels be required and/or to sanity check for
misconfigured machines.

We still need to check against the number of channels present on the MC
first, as Thor reported.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com>
Tested-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: <stable@vger.kernel.org> # 4.2
2016-06-16 10:06:35 +02:00
Tony Luck
ab67b6c22d EDAC: Fix used after kfree() error in edac_unregister_sysfs()
Code flow looks like this:

  device_unregister(&mci->dev);
   -> kobject_put+0x25/0x50
    -> kobject_cleanup+0x77/0x190
      -> device_release+0x32/0xa0
	-> mci_attr_release+0x36/0x70
	  -> kfree(mci);
  bus_unregister(mci->bus);

Fix is to grab a local copy of "mci->bus" and use that when we call
bus_unregister().

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23 13:23:54 +02:00
Borislav Petkov
d4538000ca EDAC: Remove edac_get_sysfs_subsys() error handling
It cannot fail now. We either load EDAC core after having successfully
initialized edac_subsys or we don't.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:41 +01:00
Borislav Petkov
733476cf20 EDAC: Rip out the edac_subsys reference counting
This was really dumb - reference counting for the main EDAC sysfs
object. While we could've simply registered it as the first thing in the
module init path and then hand it around to what needs it.

Do that and rip out all the code around it, thus simplifying the whole
handling significantly.

Move the edac_subsys node back to edac_module.c.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:39 +01:00
Borislav Petkov
12e26969b3 EDAC, mc_sysfs: Fix freeing bus' name
I get the splat below when modprobing/rmmoding EDAC drivers. It happens
because bus->name is invalid after bus_unregister() has run. The Code: section
below corresponds to:

  .loc 1 1108 0
  movq    672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
  .loc 1 1109 0
  popq    %rbx    #

  .loc 1 1108 0
  movq    (%rax), %rdi    # _7->name,
  jmp     kfree   #

and %rax has some funky stuff 2030203020312030 which looks a lot like
something walked over it.

Fix that by saving the name ptr before doing stuff to string it points to.

  general protection fault: 0000 [#1] SMP
  Modules linked in: ...
  CPU: 4 PID: 10318 Comm: modprobe Tainted: G          I EN  3.12.51-11-default+ #48
  Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
  task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
  RIP: 0010:[<ffffffffa019da92>]  [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
  RSP: 0018:ffff88030da3fe28  EFLAGS: 00010292
  RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
  RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
  RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
  R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
  R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
  FS:  00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
  Stack:
  Call Trace:
    i7core_unregister_mci.isra.9
    i7core_remove
    pci_device_remove
    __device_release_driver
    driver_detach
    bus_remove_driver
    pci_unregister_driver
    i7core_exit
    SyS_delete_module
    system_call_fastpath
    0x7fc9bf426536
  Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
  RIP  [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
   RSP <ffff88030da3fe28>

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: <stable@vger.kernel.org> # v3.6..
Fixes: 7a623c0390 ("edac: rewrite the sysfs code to use struct device")
2015-12-11 16:56:38 +01:00
Tan Xiaojun
30f84a891b EDAC: Use edac_debugfs_remove_recursive()
debugfs_remove() is used to remove a file or a directory from the
debugfs filesystem, but mci->debugfs might not empty.

This can be triggered by the following sequence:

1) Enable CONFIG_EDAC_DEBUG
2) insmod an EDAC module (like i3000_edac or similar)
3) rmmod this module
4) we can see files remaining under <debugfs_mountpoint>/edac/ like
   "fake_inject", for example.

Removing edac_core then, causes a NULL pointer dereference.

Reported-by: Yun Wu (Abel) <wuyun.wu@huawei.com>
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-10-14 18:50:32 +02:00
Toshi Kani
d0c9c93019 EDAC: Don't allow empty DIMM labels
Updating dimm_label to an empty string does not make much sense. Change
the sysfs dimm_label store operation to fail a request when an input
string is empty.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: elliott@hpe.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-28 16:39:05 +02:00
Toshi Kani
438470b84c EDAC: Fix sysfs dimm_label store operation
Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
in their store operation:

 1) A newline-terminated input string causes redundant newlines:

  # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat  /sys/bus/mc0/devices/dimm0/dimm_label
  test

  #  od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 164 145 163 164 012 012
            t   e   s   t  \n  \n
  0000006

 2) The original label string (31 characters) cannot be stored due to
    an improper size check:

  # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat /sys/bus/mc0/devices/dimm0/dimm_label

  # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
   0000000 012 012
            \n  \n
   0000002

 3) An input string longer than the buffer size results a wrong label
    info as it allows a retry with the remaining string:

  # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat  /sys/bus/mc0/devices/dimm0/dimm_label
  _TEST

Fix these issues by making the following changes:
 1) Replace a newline character at the end by setting a null. It also
    assures that the string is null-terminated in the label buffer.
 2) Check the label buffer size with 'sizeof(dimm->label)'.
 3) Fail a request if its string exceeds the label buffer size.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Robert Elliott <elliott@hpe.com>
Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 19:45:59 +02:00
Toshi Kani
1ea62c59c8 EDAC: Fix sysfs dimm_label show operation
After

  7d375bffa5 ("sb_edac: Fix support for systems with two home agents per socket")

sysfs "dimm_label" and "chX_dimm_label" show their label string without a
newline "\n" at the end.

  [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
  CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

  [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
  CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

The label strings now have 31 characters, which are the same as
EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN,
the newline in the format "%s\n" is ignored.

  [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
            C   P   U   _   S   r   c   I   D   #   0   _   H   a   #   0
  0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
            _   C   h   a   n   #   0   _   D   I   M   M   #   0  \0
  0000037

Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
snprintf()s in channel_dimm_label_show() and dimmdev_label_show().

Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 19:19:21 +02:00
Borislav Petkov
7ac8bf9bc9 EDAC: Carve out debugfs functionality
... into a separate compilation unit and drop a couple of
CONFIG_EDAC_DEBUG ifdefferies. Rename edac_create_debug_nodes() to
edac_create_debugfs_nodes(), while at it.

No functionality change.

Cc: <linux-edac@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-22 12:29:46 +02:00
Tony Luck
c44696fff0 EDAC: Remove arbitrary limit on number of channels
Currently set to "6", but the reset of the code will dynamically
allocate as needed.  We need to go to "8" today, but drop the check
completely to save doing this again when we need even larger numbers.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-06-03 10:10:22 -03:00
Alexey Khoroshilov
c6b97bcf8e EDAC: Properly unwind on failure path in edac_init()
edac_init() does not deallocate already allocated resources on failure
path.

Found by Linux Driver Verification project (linuxtesting.org).

 [ Boris: The unwind path functions have __exit annotation but are being
   used in an __init function, leading to section mismatches. Drop the
   section annotation and make them normal functions. ]

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:13:07 +01:00
Takashi Iwai
4e8d230de9 EDAC: Allow to pass driver-specific attribute groups
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups.  This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.

edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups()
with NULL groups.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:07:41 +01:00
Takashi Iwai
2c1946b6d6 EDAC: Use static attribute groups for managing sysfs entries
Instead of manual calls of device_create_file() and
device_remove_file(), use static attribute groups with proper
is_visible callbacks for managing the sysfs entries.

This simplifies the code a lot and avoids the possible races.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:06:58 +01:00
Borislav Petkov
f11135d87d EDAC: edac_mc_sysfs: Make stuff static
Fix sparse warnings.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30 14:49:04 +01:00
Junjie Mao
1bf1950c4e EDAC: Fix the leak of mci->bus->name when bus_register fails
Also use goto labels for all failure paths in
edac_create_sysfs_mci_device and update meaningless labels.

Signed-off-by: Junjie Mao <junjie.mao@hotmail.com>
Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl
[ Boris: Use ! for 0 checks and add newlines for less crammed code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30 14:38:45 +01:00
Jim Snow
50043e257c sb_edac: Fix off-by-one error in number of channels
This prevented edac sysfs code from properly handling 6 channels
per memory controller.

Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:51 -02:00
Aristeu Rozanski
7b8278358c edac: add DDR4 and RDDR4
Haswell memory controller can make use of DDR4 and Registered DDR4

Cc: tony.luck@intel.com
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:55 -03:00
Mauro Carvalho Chehab
c897df0e2d Linux 3.14-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTE+9XAAoJEHm+PkMAQRiGrMQIAKI2V49Kj8WlnwGchFvsbGJB
 SLALwNi33T/IBKdZRhrfryBu02Zj7eVvZ2ML35dJEnmF88O+dJBDMTkKV1xalrip
 mtkBrjUnfAI04fq/daLQ1TsAy4qqlra5tSTuDCw8ILOnGPwT0VydIEHNdtmoUIfw
 xlZLxHzny1MslZ78d7uR/cUnV9ylKRRajWzfw1HT8hL51fCt8nRWY0sCvwvl+kMJ
 LsK+6I7mHDUuzA7QBmBI+dhzQgos5+JkkrnpmqHAqwmIh+AI3ksmjUCQ4dM7owrO
 IvEx+ZNDqxAdLcm1WAxATNfxddFXHc62JTvKuuKqTVWuaxVfK1Aqt8MjDMIPeAQ=
 =yV5u
 -----END PGP SIGNATURE-----

Merge tag 'v3.14-rc5' into patchwork

Linux 3.14-rc5

* tag 'v3.14-rc5': (1117 commits)
  Linux 3.14-rc5
  drm/vmwgfx: avoid null pointer dereference at failure paths
  drm/vmwgfx: Make sure backing mobs are cleared when allocated. Update driver date.
  drm/vmwgfx: Remove some unused surface formats
  MAINTAINERS: add maintainer entry for Armada DRM driver
  arm64: Fix !CONFIG_SMP kernel build
  arm64: mm: Add double logical invert to pte accessors
  dm cache: fix truncation bug when mapping I/O to >2TB fast device
  perf tools: Fix strict alias issue for find_first_bit
  powerpc/powernv: Fix indirect XSCOM unmangling
  powerpc/powernv: Fix opal_xscom_{read,write} prototype
  powerpc/powernv: Refactor PHB diag-data dump
  powerpc/powernv: Dump PHB diag-data immediately
  powerpc: Increase stack redzone for 64-bit userspace to 512 bytes
  powerpc/ftrace: bugfix for test_24bit_addr
  powerpc/crashdump : Fix page frame number check in copy_oldmem_page
  powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly
  kvm, vmx: Really fix lazy FPU on nested guest
  perf tools: fix BFD detection on opensuse
  drm/radeon: enable speaker allocation setup on dce3.2
  ...
2014-03-11 06:55:49 -03:00
Borislav Petkov
9da21b1509 EDAC: Poll timeout cannot be zero, p2
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>
2014-02-14 10:40:29 +01:00
Prarit Bhargava
79040cad3f drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero
If you do

  echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec

the following stack trace is output because the edac module is not
designed to poll with a timeout of zero.

  WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0()
  list_add corruption. prev->next should be next (ffff8808291dd1b8), but was           (null). (prev=ffff8808286fe3f8).
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Call Trace:
   <IRQ>
    __list_add+0xac/0xc0
    __internal_add_timer+0xab/0x130
    internal_add_timer+0x17/0x40
    mod_timer_pinned+0xca/0x170
    intel_pstate_timer_func+0x28a/0x380
    call_timer_fn+0x36/0x100
    run_timer_softirq+0x1ff/0x2f0
    __do_softirq+0xf5/0x2e0
    irq_exit+0x10d/0x120
    smp_apic_timer_interrupt+0x45/0x60
    apic_timer_interrupt+0x6d/0x80
   <EOI>
    cpuidle_idle_call+0xb9/0x1f0
    arch_cpu_idle+0xe/0x30
    cpu_startup_entry+0x9e/0x240
    start_secondary+0x1e4/0x290

  kernel BUG at kernel/timer.c:1084!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 12 PID: 0 Comm: swapper/12 Tainted: G        W    3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Call Trace:
   <IRQ>
    run_timer_softirq+0x245/0x2f0
    __do_softirq+0xf5/0x2e0
    irq_exit+0x10d/0x120
    smp_apic_timer_interrupt+0x45/0x60
    apic_timer_interrupt+0x6d/0x80
   <EOI>
    cpuidle_idle_call+0xb9/0x1f0
    arch_cpu_idle+0xe/0x30
    cpu_startup_entry+0x9e/0x240
    start_secondary+0x1e4/0x290
  RIP   cascade+0x93/0xa0

  WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0()
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G        W    3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Workqueue: edac-poller edac_mc_workq_function [edac_core]
  Call Trace:
    dump_stack+0x45/0x56
    warn_slowpath_common+0x7d/0xa0
    warn_slowpath_null+0x1a/0x20
    __queue_delayed_work+0xed/0x1a0
    queue_delayed_work_on+0x27/0x50
    edac_mc_workq_function+0x72/0xa0 [edac_core]
    process_one_work+0x17b/0x460
    worker_thread+0x11b/0x400
    kthread+0xd2/0xf0
    ret_from_fork+0x7c/0xb0

This patch adds a range check in the edac_mc_poll_msec code to check for 0.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Mauro Carvalho Chehab
37e59f876b [media, edac] Change my email address
There are several left overs with my old email address.
Remove their occurrences and add myself at CREDITS, to
allow people to be able to reach me on my new addresses.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-02-07 08:03:07 -02:00
Rashika Kheria
95285933d0 EDAC: Mark edac_create_debug_nodes as static
This patch marks the function edac_create_debug_nodes() as static
because it is not used outside of edac_mc_sysfs.c.

Thus, it also eliminates the following warning:
drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:59:59 +01:00
Ingo Molnar
c874b6ba55 An amd64_edac fix for single channel configurations + trivial cleanups
courtesy of Jingoo Han.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSC6mSAAoJEBLB8Bhh3lVKhKYQALt3lklah1BpZIqyTiLqtizN
 YVF/lSubpcHB2BfnZUzaSInE+17uo21DJ44tQ56O0lWDsxLqadJzTDiJ2BG5Ol3u
 JOIWti/Yl8YoCQQcir6QksBAUkR26iCpelO8kO6J/JWGEekVgl3Oik4NOmYrdN55
 rUoHIyVdK5Z5y4j79Pmth7//+c6OFli1cAeUmBlIvxS9T4T2ZCz30jBim76VTS8H
 AjgaX/aBlE3SxAYoMWLZh1VglukxAVCG2qZ9lm7iNLCpkGwP59jT/DE9Gok2IXBg
 id9SMTkrpitijCyM3oKYox14Tl+QP/brElnWCVyVeRIpkVH8s2WUkU+qHeZztBg9
 8i/aU9x4bOCkDjhvBjIM4jbYJAvvaVKlIXPvANO8xl/A0D7nCsmCs7DpritRHZEr
 3y4N8SQsaamVD083+UaVciAo1XCpl5cNq9gH/Y7+U8h2bIThZn8HTbZ1uWgXaCY1
 OwfElbJDKInsSmDVBEklMI8CF42YsjGQc2JC+A/3M3CapTfepKPg6SvptNQueZb+
 SUw0mgGBumpdDQSHo5tPf5JL1y57ERkdOBVryrqYtr6qdjw3ox9o/+B8eTy3gkg1
 b71LEhzG2UbdSVaHiYhRE/IR3W3yKzNX8Oh3BTHfp04jqnNijx4hHu9oX7j3W59M
 P/bd6GHHI5ZY66aLqCue
 =r2kL
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras

Pull RAS/EDAC updates from Boris Petkov:

 "An amd64_edac fix for single channel configurations + trivial cleanups
  courtesy of Jingoo Han."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15 10:07:20 +02:00
Jingoo Han
c542b53da9 EDAC: Replace strict_strtol() with kstrtol()
The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-24 09:30:03 +02:00
Borislav Petkov
88d84ac973 EDAC: Fix lockdep splat
Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
  dump_stack
  warn_slowpath_common
  warn_slowpath_fmt
  lockdep_init_map
  ? trace_hardirqs_on_caller
  ? trace_hardirqs_on
  debug_mutex_init
  __mutex_init
  bus_register
  edac_create_sysfs_mci_device
  edac_mc_add_mc
  sbridge_probe
  pci_device_probe
  driver_probe_device
  __driver_attach
  ? driver_probe_device
  bus_for_each_dev
  driver_attach
  bus_add_driver
  driver_register
  __pci_register_driver
  ? 0xffffffffa0010fff
  sbridge_init
  ? 0xffffffffa0010fff
  do_one_initcall
  load_module
  ? unset_module_init_ro_nx
  SyS_init_module
  tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because the last is handed in to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-23 16:01:28 -07:00
Jingoo Han
c7f62fc87b EDAC: Replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-08 10:16:33 +02:00
Srivatsa S. Bhat
c8c64d165c EDAC: Don't give write permission to read-only files
I get the following warning on boot:

------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name:  -[8737R2A]-
Write permission without 'store'
...
</snip>

Drilling down, this is related to dynamic channel ce_count attribute
files sporting a S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-05-09 12:40:45 +02:00
Borislav Petkov
8b7719e08a EDAC, mc_sysfs.c: Fix string array pointer types
Those should be const ptr to a const string, fix them.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-25 15:44:25 +01:00
Mauro Carvalho Chehab
9713faecff EDAC: Merge mci.mem_is_per_rank with mci.csbased
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16 06:32:30 +01:00
Mauro Carvalho Chehab
1eef128254 amd64_edac: Correct DIMM sizes
We were filling the csrow size with a wrong value. 16a528ee39 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16 06:32:02 +01:00
Stephen Hemminger
fbe2d3616c EDAC: Make sysfs functions static
Fixes lots of sparse warnings here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-05 11:33:57 +01:00
Mauro Carvalho Chehab
3d958823e2 edac: better report error conditions in debug mode
It is hard to find what's wrong without a proper error
report. Improve it, in debug mode.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:35 -03:00
Mauro Carvalho Chehab
e7100478fa edac: only create sdram_scrub_rate where supported
Currently, sdram_scrub_rate sysfs node is created even if the device
doesn't support get/set the scub rate. Change the logic to only
create this device node when the operation is supported.

Reported-by: Felipe Balbi <balbi@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:31 -03:00
Lans Zhang
44d22e2404 EDAC: Cleanup device deregistering path
Use device_unregister to replace put_device + device_del for
cleanup, and fix the potential use after free.

Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-07 17:43:00 +01:00
Konstantin Khlebnikov
311bd84247 EDAC: Fix kernel panic on module unloading
This patch fixes use-after-free and double-free bugs in
edac_mc_sysfs_exit(). mci_pdev has single reference and put_device()
calls mc_attr_release() which calls kfree(). The following
device_del() works with already released memory. An another kfree() in
edac_mc_sysfs_exit() releses the same memory again. Great.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: stable@vger.kernel.org # 3.[67]
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-07 17:42:58 +01:00
Denis Kirjanov
2d56b109e3 EDAC: Handle error path in edac_mc_sysfs_init() properly
Make sure proper deregistration happens on all error paths in
edac_mc_sysfs_init.

Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
[ Boris: cleanup and concretize commit message ]
Signed-off-by: Borislav Petkov <bp@alien8.de>
2012-11-28 11:56:52 +01:00
Wei Yongjun
db7312a295 EDAC: Convert to use simple_open()
This removes an open coded simple_open() function and replaces file
operations references to the function with simple_open() instead.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:22 +01:00
Josh Hunt
3c0622760a EDAC: Fix mc size reported in sysfs
This is the complement to previous commit "EDAC: Fix csrow size
reported in sysfs". This fixes the memory controller size reporting on
csrow-based memory controllers. The csrow size is already combined for
both channels. Without this patch memory size is reported doubled.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:50 +01:00
Borislav Petkov
16a528ee39 EDAC: Fix csrow size reported in sysfs
On csrow-based memory controllers, we combine the csrow size from both
channels and there's no need to do that again in csrow_size_show which
leads to double the size of a csrow.

Fix it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:40 +01:00
Borislav Petkov
921a689965 EDAC: Pass mci parent
Initialize the mem_ctl_info descriptor of a csrow properly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:23 +01:00
Mauro Carvalho Chehab
38ced28b21 edac: allow specifying the error count with fake_inject
In order to test if the error counters are properly incremented,
add a way to specify how many errors were generated by a trace.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-27 09:01:30 -03:00
Rob Herring
e7930ba49e edac: create top-level debugfs directory
Create a single, top-level "edac" directory for debugfs. An "mc[0-N]"
directory is then created for each memory controller. Individual drivers
can create additional entries such as h/w error injection control.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-12 12:15:49 -03:00
Mauro Carvalho Chehab
9eb07a7fb8 edac: edac_mc_handle_error(): add an error_count parameter
In order to avoid loosing error events, it is desirable to group
error events together and generate a single trace for several identical
errors.

The trace API already allows reporting multiple errors. Change the
handle_error function to also allow that.

The changes at the drivers were made by this small script:

	$file .=$_ while (<>);
	$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
	print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-12 12:15:47 -03:00
Mauro Carvalho Chehab
03f7eae80f edac: remove arch-specific parameter for the error handler
Remove the arch-dependent parameter, as it were not used,
as the MCE tracepoint weren't implemented. It probably doesn't
make sense to have an MCE-specific tracepoint, as this will
cost more bytes at the tracepoint, and tracepoint is not free.

The changes at the EDAC drivers were done by this small perl script:

	$file .=$_ while (<>);
	$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
	print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 13:23:52 -03:00
Mauro Carvalho Chehab
6e84d359b2 edac_mc: Cleanup per-dimm_info debug messages
The edac_mc_alloc() routine allocates one dimm_info device for all
possible memories, including the non-filled ones. The debug messages
there are somewhat confusing. So, cleans them, by moving the code
that prints the memory location to edac_mc, and using it on both
edac_mc_sysfs and edac_mc.

Also, only dumps information when DIMM/ranks are actually
filled.

After this patch, a dimm-based memory controller will print the debug
info as:

[ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0
[ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow:   csrow = ffff8801169be000
[ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow:   csrow->first_page = 0x0
[ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow:   csrow->last_page = 0x0
[ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow:   csrow->page_mask = 0x0
[ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow:   csrow->nr_channels = 3
[ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow:   csrow->channels = ffff8801149c2840
[ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow:   csrow->mci = ffff880117426000
[ 1011.380041] EDAC DEBUG: edac_mc_dump_channel:   channel->chan_idx = 0
[ 1011.380042] EDAC DEBUG: edac_mc_dump_channel:     channel = ffff8801149c2860
[ 1011.380044] EDAC DEBUG: edac_mc_dump_channel:     channel->csrow = ffff8801169be000
[ 1011.380046] EDAC DEBUG: edac_mc_dump_channel:     channel->dimm = ffff88010fe90400
...
[ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0
[ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm:   dimm = ffff88010fe90400
[ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm:   dimm->label = 'CPU#0Channel#0_DIMM#0'
[ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm:   dimm->nr_pages = 0x40000
[ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm:   dimm->grain = 8
[ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm:   dimm->nr_pages = 0x40000
...

(a rank-based memory controller would print, instead of "dimm?", "rank?"
 on the above debug info)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 13:23:49 -03:00
Joe Perches
956b9ba156 edac: Convert debugfX to edac_dbg(X,
Use a more common debugging style.

Remove __FILE__ uses, add missing newlines,
coalesce formats and align arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 13:23:49 -03:00