linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-13 15:41:39 +00:00

Author	SHA1	Message	Date
Eric Dumazet	f930103421	tcp: Namespaceify sysctl_tcp_sack Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 10:53:28 -04:00
Eric Dumazet	eed29f17f0	tcp: add a struct net parameter to tcp_parse_options() We want to move some TCP sysctls to net namespaces in the future. tcp_window_scaling, tcp_sack and tcp_timestamps being fetched from tcp_parse_options(), we need to pass an extra parameter. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 10:53:28 -04:00
Jiri Pirko	a5fcf8a6c9	net: propagate tc filter chain index down the ndo_setup_tc call We need to push the chain index down to the drivers, so they have the information to which chain the rule belongs. For now, no driver supports multichain offload, so only chain 0 is supported. This is needed to prevent chain squashes during offload for now. Later this will be used to implement multichain offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 09:55:53 -04:00
David S. Miller	50dffe7fad	Merge branch 'mlx4-drivers-version-update' Tariq Toukan says: ==================== mlx4 drivers: version update This patchset contains version updates for the MLX4 drivers: Core, EN, and IB. Just like we've done in mlx5, we modify the outdated driver version (reported in ethtool for example). This better reflects the current driver state, and removes the redundant date string. We are not going to change this frequently or even use it. I include the IB patch in this series as it has similar subject and content. It does not cause any kind of conflict with Doug's tree. The rdma mailing list is CCed. Please let me know if I need to submit this differently. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 15:33:02 -04:00
Tariq Toukan	0a528ee9a5	IB/mlx4: Bump driver version Remove date and bump version for mlx4_ib driver. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 15:33:01 -04:00
Tariq Toukan	808df6a209	net/mlx4_en: Bump driver version Remove date and bump version for mlx4_en driver. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 15:33:01 -04:00
Tariq Toukan	cea2a6d81d	net/mlx4_core: Bump driver version Remove date and bump version for mlx4_core driver. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 15:33:00 -04:00
Andrew Lunn	5ebe31d7b2	net: dsa: mv88e6xxx: Have 6161/6123 use EDSA tags The mv88e6161 and mv88e6123 are capable of using EDSA tags when passing frames from the host to the switch and back. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 15:28:23 -04:00
Mark Bloch	57d88182ea	vxlan: use a more suitable function when assigning NULL When stopping the vxlan interface we detach it from the socket. Use RCU_INIT_POINTER() and not rcu_assign_pointer() to do so. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 15:26:03 -04:00
Ganesh Goudar	1dec4cec9f	cxgb4: Fix tids count for ipv6 offload connection the adapter consumes two tids for every ipv6 offload connection be it active or passive, calculate tid usage count accordingly. Also change the signatures of relevant functions to get the address family. Signed-off-by: Rizwan Ansari <rizwana@chelsio.com> Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 15:14:34 -04:00
David S. Miller	e5c5180a23	Merge branch 'nfp-ctrl-vNIC' Jakub Kicinski says: ==================== nfp: ctrl vNIC This series adds the ability to use one vNIC as a control channel for passing messages to and from the application firmware. The implementation restructures the existing netdev vNIC code to be able to deal with nfp_nets with netdev pointer set to NULL. Control vNICs are not visible to userspace (other than for dumping ring state), and since they don't have netdevs we use a tasklet for RX and simple skb list for TX queuing. Due to special status of the control vNIC we have to reshuffle the init code a bit to make sure control vNIC will be fully brought up (and therefore communication with app FW can happen) before any netdev or port is visible to user space. FW will designate which vNIC is supposed to be used as control one by setting _pf%u_net_ctrl_bar symbol. Some FWs depend on metadata being prepended to control message, some prefer to look at queue ID to decide that something is a control message. Our implementation can cater to both. First two users of this code will be eBPF maps and flower offloads. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:42 -04:00
Jakub Kicinski	f9380629fa	nfp: advertise support for NFD ABI 0.5 NFD ABI 0.5 is equivalent to NFD ABI 3.0 but requires that the driver checks the APP id symbol and makes sure it can support given app. Most advanced apps will likely require control vNIC (ability to exchange control messages between the driver and app FW). Detailed app version checking and capability exchange is left to app-specific code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:42 -04:00
Jakub Kicinski	02082701b9	nfp: create control vNICs and wire up rx/tx When driver encounters an nfp_app which has a control message handler defined, allocate a control vNIC. This control channel will be used to exchange data with the application FW such as flow table programming, statistics and global datapath control. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:42 -04:00
Jakub Kicinski	2c7e41c0b2	nfp: allow non-equal distribution of IRQs Thus far the code assumed all vNICs will request similar number of IRQs. This will be no longer true with control vNICs (where 1 IRQ will suffice). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:41 -04:00
Jakub Kicinski	6d4b0d8ed6	nfp: slice the netdev spawning function We want to be able to create a special vNIC for control messages. This vNIC should be created before any netdev is registered to allow nfp_app logic to exchange messages with the FW app before any netdev is visible to user space. Unfortunately we can't enable IRQs until we know how many vNICs we will need to spawn. Divide the function which spawns netdevs for vNICs into three parts: - vNIC/memory allocation; - IRQ allocation; - netdev init and register. This will help us insert the initialization of the control channel after IRQ allocation but before netdev init and register. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:41 -04:00
Jakub Kicinski	21537bc701	nfp: don't clutter init code passing fw_ver around Reading fw version from the BAR is trivial. Don't pass it around through layers of init functions, simply read it again where needed. This commit has the side effect of each vNIC having the exact NFD version from its own control memory, rather than all data vNICs assuming the version of the first one. This should not result in user-visible changes, though. Capabilities of data vNICs of trival apps are identical. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:41 -04:00
Jakub Kicinski	73e253f0e5	nfp: map all queue controllers at once RX and TX queue controllers are interleaved. Instead of creating two mappings which map the same area at slightly different offset, create only one mapping. Always map all queue controllers to simplify the code and allow reusing the mapping for non-data vNICs. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:40 -04:00
Jakub Kicinski	c24ca95ff6	nfp: make vNIC ctrl memory mapping function reusable We will soon need to map control vNIC PCI memory as well as data vNIC memory. Make the function for mapping areas pointed to by an RTsym reusable. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:40 -04:00
Jakub Kicinski	77ece8d5f1	nfp: add control vNIC datapath Since control vNICs don't have a netdev, they can't use napi and queuing stack provides. Add simple tasklet-based data receive and send of control messages with queuing on a skb_list. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:40 -04:00
Jakub Kicinski	5c0dbe9ecf	nfp: prepare config and enable for working without netdevs Out of the three stages of ifup/ifdown (allocate, configure, start) - this commit prepares the configuration stage for working with control vNICs. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:39 -04:00
Jakub Kicinski	a7b1ad0875	nfp: allow allocation and initialization of netdev-less vNICs vNICs used for sending and receiving control messages shouldn't really have a netdev. Add the ability to initialize vNICs for netdev-less operation. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:39 -04:00
Jakub Kicinski	042f4ba62f	nfp: make sure debug accesses don't depend on netdevs We want to be able to inspect the state of descriptor rings of the control vNIC, so it will use the same interface as data vNICs. Make sure the code doesn't use netdevs to determine state of the rings and names things appropriately. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:39 -04:00
Jakub Kicinski	c821e61789	nfp: prepare print macros for use without netdev To be able to reuse print macros easily with control vNICs make the macros check if netdev pointer is populated and use dev_* print functions otherwise. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:38 -04:00
Jakub Kicinski	cd083ce158	nfp: move nfp_net_vecs_init() Move nfp_net_vecs_init() after all datapath functions. We will need to init poll() callbacks from this function soon. No functional changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:38 -04:00
Jakub Kicinski	4621199dbf	nfp: reuse ring free code on close On the close path reuse the ring free helpers introduced for runtime reconfiguration. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:38 -04:00
Jakub Kicinski	ee26756d01	nfp: split out the allocation part of open Our open/close implementations have 3 stages: - allocation/freeing of ring resources, irqs etc., - device config, - device/stack enable (can't fail). Right now all of those stages are placed in separate functions, apart from allocation during open. Fix that. It will make it easier for us to allocate resources for netdev-less vNICs. Because we want to reuse allocation code in netdev-less vNICs leave the netif_set_real_num_[rt]x_queues() calls inside open. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:37 -04:00
Jakub Kicinski	d00ca2f378	nfp: reorder open and close functions We will soon reuse parts of .ndo_stop() for clean up after errors in .ndo_open(). Reorder the associated functions to make that possible. No functional changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 12:51:37 -04:00
Andrew Lunn	2b30842b23	net: fec: Clear and enable MIB counters on imx51 Both the IMX51 and IMX53 datasheet indicates that the MIB counters should be cleared during setup. Otherwise random numbers are returned via ethtool -S. Add a quirk and a function to do this. Tested on an IMX51. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-07 10:06:52 -04:00
David S. Miller	216fe8f021	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Just some simple overlapping changes in marvell PHY driver and the DSA core code. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 22:20:08 -04:00
David S. Miller	9747e23138	Merge branch 'phylib-support-for-MV88X3310-10G-phy' Russell King says: ==================== net: Add phylib support for MV88X3310 10G phy This patch series adds support for the Marvell 88x3310 PHY found on the SolidRun Macchiatobin board. The first patch introduces a set of generic Clause 45 PHY helpers that C45 PHY drivers can make use of if they wish. Patch 2 ensures that the Clause 22 aneg_done function will not be called for incompatible Clause 45 PHYs. Patch 3 fixes the aneg restart to be compatible with C45 PHYs - it can currently only cope with C22 PHYs. Patch 4 moves the "gen10g" driver into the Clause 45 code, grouping all core clause 45 code together. Patch 5 adds the phy_interface_t types for XAUI and 10GBase-KR links. As 10GBase-KR appears to be compatible with XFI and SFI, XFI and SFI, I currently see no reason to add XFI and SFI interface modes. There seems to be vendor code out there using these, but they all alias back to the same hardware settings. Patch 6 adds support for the MV88X3310 PHY, which supports both the copper and fiber interfaces. It should be noted that the MV88X3310 automatically switches its MAC facing interface between 10GBase-KR and SGMII depending on the negotiated speed. This was discussed with Florian, and we agreed to update the phy interface mode depending on the properties of the actual link mode to the PHY. v2: - update sysfs-class-net-phydev documentation - avoid genphy_aneg_done for non-C22 PHYs - expand comment about 0x30 constant - add comment about lack of reset - configure driver using MARVELL_10G_PHY ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 21:14:21 -04:00
Russell King	20b2af32ff	net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support Add phylib support for the Marvell Alaska X 10 Gigabit PHY (MV88X3310). This phy is able to operate at 10G, 1G, 100M and 10M speeds, and only supports Clause 45 accesses. The PHY appears (based on the vendor IDs) to be two different vendors IP, with each devad containing several instances. This PHY driver has only been tested with the RJ45 copper port, fiber port and a Marvell Armada 8040-based ethernet interface. It should be noted that to use the full range of speeds, MAC drivers need to also reconfigure the link mode as per phydev->interface, since the PHY automatically changes its interface mode depending on the negotiated speed. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 21:14:13 -04:00
Russell King	c125ca0918	net: phy: add XAUI and 10GBASE-KR PHY connection types XAUI allows XGMII to reach an extended distance by using a XGXS layer at each end of the MAC to PHY link, operating over four Serdes lanes. 10GBASE-KR is a single lane Serdes backplane ethernet connection method with autonegotiation on the link. Some PHYs use this to connect to the ethernet interface at 10G speeds, switching to other connection types when utilising slower speeds. 10GBASE-KR is also used for XFI and SFI to connect to XFP and SFP fiber modules. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 21:14:13 -04:00
Russell King	921690f2aa	net: phy: split out 10G genphy support Move the old 10G genphy support to sit beside the new clause 45 library functions, so all the 10G phy code is together. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 21:14:13 -04:00
Russell King	002ba7058a	net: phy: hook up clause 45 autonegotiation restart genphy_restart_aneg() can only restart autonegotiation on clause 22 PHYs. Add a phy_restart_aneg() function which selects between the clause 22 and clause 45 restart functionality depending on the PHY type and whether the Clause 45 PHY supports the Clause 22 register set. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 21:14:13 -04:00
Russell King	41408ad519	net: phy: avoid genphy_aneg_done() for PHYs without clause 22 support Avoid calling genphy_aneg_done() for PHYs that do not implement the Clause 22 register set. Clause 45 PHYs may implement the Clause 22 register set along with the Clause 22 extension MMD. Hence, we can't simply block access to the Clause 22 functions based on the PHY being a Clause 45 PHY. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 21:14:13 -04:00
Russell King	5acde34a5a	net: phy: add 802.3 clause 45 support to phylib Add generic helpers for 802.3 clause 45 PHYs for >= 10Gbps support. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 21:14:13 -04:00
Linus Torvalds	b29794ec95	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Made TCP congestion control documentation match current reality, from Anmol Sarma. 2) Various build warning and failure fixes from Arnd Bergmann. 3) Fix SKB list leak in ipv6_gso_segment(). 4) Use after free in ravb driver, from Eugeniu Rosca. 5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet. 6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme Piccoli. 7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping Zhang. 8) Use after free in vxlan deletion, from Mark Bloch. 9) Fix ordering of NAPI poll enabled in ethoc driver, from Max Filippov. 10) Fix stmmac hangs with TSO, from Niklas Cassel. 11) Fix crash in CALIPSO ipv6, from Richard Haines. 12) Clear nh_flags properly on mpls link up. From Roopa Prabhu. 13) Fix regression in sk_err socket error queue handling, noticed by ping applications. From Soheil Hassas Yeganeh. 14) Update mlx4/mlx5 MAINTAINERS information. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits) net: stmmac: fix a broken u32 less than zero check net: stmmac: fix completely hung TX when using TSO net: ethoc: enable NAPI before poll may be scheduled net: bridge: fix a null pointer dereference in br_afspec ravb: Fix use-after-free on `ifconfig eth0 down` net/ipv6: Fix CALIPSO causing GPF with datagram support net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value Revert "sit: reload iphdr in ipip6_rcv" i40e/i40evf: proper update of the page_offset field i40e: Fix state flags for bit set and clean operations of PF iwlwifi: fix host command memory leaks iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265 iwlwifi: mvm: clear new beacon command template struct iwlwifi: mvm: don't fail when removing a key from an inexisting sta iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3 iwlwifi: mvm: fix firmware debug restart recording iwlwifi: tt: move ucode_loaded check under mutex iwlwifi: mvm: support ibss in dqa mode iwlwifi: mvm: Fix command queue number on d0i3 flow iwlwifi: mvm: rs: start using LQ command color ...	2017-06-06 14:30:17 -07:00
Linus Torvalds	e87f327ecd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc Pull sparc fixes from David Miller: 1) Fix TLB context wrap races, from Pavel Tatashin. 2) Cure some gcc-7 build issues. 3) Handle invalid setup_hugepagesz command line values properly, from Liam R Howlett. 4) Copy TSB using the correct address shift for the huge TSB, from Mike Kravetz. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: delete old wrap code sparc64: new context wrap sparc64: add per-cpu mm of secondary contexts sparc64: redefine first version sparc64: combine activate_mm and switch_mm sparc64: reset mm cpumask after wrap sparc/mm/hugepages: Fix setup_hugepagesz for invalid values. sparc: Machine description indices can vary sparc64: mm: fix copy_tsb to correctly copy huge page TSBs arch/sparc: support NR_CPUS = 4096 sparc64: Add __multi3 for gcc 7.x and later. sparc64: Fix build warnings with gcc 7. arch/sparc: increase CONFIG_NODES_SHIFT on SPARC64 to 5	2017-06-06 14:28:18 -07:00
David Rientjes	abb2ea7dfd	compiler, clang: suppress warning for unused static inline functions GCC explicitly does not warn for unused static inline functions for -Wunused-function. The manual states: Warn whenever a static function is declared but not defined or a non-inline static function is unused. Clang does warn for static inline functions that are unused. It turns out that suppressing the warnings avoids potentially complex #ifdef directives, which also reduces LOC. Suppress the warning for clang. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-06-06 14:09:22 -07:00
David S. Miller	b3aefc2fbd	Merge branch 'sparc64-context-wrap-fixes' Pavel Tatashin says: ==================== sparc64: context wrap fixes This patch series contains fixes for context wrap: when we are out of context ids, and need to get a new version. It fixes memory corruption issues which happen when more than number of context ids (currently set to 8K) number of processes are started simultaneously, and processes can get a wrong context. sparc64: new context wrap: - contains explanation of new wrap method, and also explanation of races that it solves sparc64: reset mm cpumask after wrap - explains issue of not reseting cpu mask on a wrap ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:48 -07:00
Pavel Tatashin	0197e41ce7	sparc64: delete old wrap code The old method that is using xcall and softint to get new context id is deleted, as it is replaced by a method of using per_cpu_secondary_mm without xcall to perform the context wrap. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:29 -07:00
Pavel Tatashin	a0582f26ec	sparc64: new context wrap The current wrap implementation has a race issue: it is called outside of the ctx_alloc_lock, and also does not wait for all CPUs to complete the wrap. This means that a thread can get a new context with a new version and another thread might still be running with the same context. The problem is especially severe on CPUs with shared TLBs, like sun4v. I used the following test to very quickly reproduce the problem: - start over 8K processes (must be more than context IDs) - write and read values at a memory location in every process. Very quickly memory corruptions start happening, and what we read back does not equal what we wrote. Several approaches were explored before settling on this one: Approach 1: Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for every process to complete the wrap. (Note: every CPU must WAIT before leaving smp_new_mmu_context_version_client() until every one arrives). This approach ends up with deadlocks, as some threads own locks which other threads are waiting for, and they never receive softint until these threads exit smp_new_mmu_context_version_client(). Since we do not allow the exit, deadlock happens. Approach 2: Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into into C code, and issue new versions to every CPU. This approach adds some overhead to runtime: in switch_mm() we must add some checks to make sure that versions have not changed due to wrap while we were loading the new secondary context. (could be protected by PSTATE_IE but that degrades performance as on M7 and older CPUs as it takes 50 cycles for each access). Also, we still need a global per-cpu array of MMs to know where we need to load new contexts, otherwise we can change context to a thread that is going way (if we received mondo between switch_mm() and switch_to() time). Finally, there are some issues with window registers in rtrap() when context IDs are changed during CPU mondo time. The approach in this patch is the simplest and has almost no impact on runtime. We use the array with mm's where last secondary contexts were loaded onto CPUs and bump their versions to the new generation without changing context IDs. If a new process comes in to get a context ID, it will go through get_new_mmu_context() because of version mismatch. But the running processes do not need to be interrupted. And wrap is quicker as we do not need to xcall and wait for everyone to receive and complete wrap. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:29 -07:00
Pavel Tatashin	7a5b4bbf49	sparc64: add per-cpu mm of secondary contexts The new wrap is going to use information from this array to figure out mm's that currently have valid secondary contexts setup. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:29 -07:00
Pavel Tatashin	c4415235b2	sparc64: redefine first version CTX_FIRST_VERSION defines the first context version, but also it defines first context. This patch redefines it to only include the first context version. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:28 -07:00
Pavel Tatashin	14d0334c67	sparc64: combine activate_mm and switch_mm The only difference between these two functions is that in activate_mm we unconditionally flush context. However, there is no need to keep this difference after fixing a bug where cpumask was not reset on a wrap. So, in this patch we combine these. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:28 -07:00
Pavel Tatashin	5889748573	sparc64: reset mm cpumask after wrap After a wrap (getting a new context version) a process must get a new context id, which means that we would need to flush the context id from the TLB before running for the first time with this ID on every CPU. But, we use mm_cpumask to determine if this process has been running on this CPU before, and this mask is not reset after a wrap. So, there are two possible fixes for this issue: 1. Clear mm cpumask whenever mm gets a new context id 2. Unconditionally flush context every time process is running on a CPU This patch implements the first solution Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:28 -07:00
Liam R. Howlett	f322980b74	sparc/mm/hugepages: Fix setup_hugepagesz for invalid values. hugetlb_bad_size needs to be called on invalid values. Also change the pr_warn to a pr_err to better align with other platforms. Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:03 -07:00
James Clarke	c982aa9c30	sparc: Machine description indices can vary VIO devices were being looked up by their index in the machine description node block, but this often varies over time as devices are added and removed. Instead, store the ID and look up using the type, config handle and ID. Signed-off-by: James Clarke <jrtc27@jrtc27.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541 Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:03 -07:00
Mike Kravetz	654f480762	sparc64: mm: fix copy_tsb to correctly copy huge page TSBs When a TSB grows beyond its current capacity, a new TSB is allocated and copy_tsb is called to copy entries from the old TSB to the new. A hash shift based on page size is used to calculate the index of an entry in the TSB. copy_tsb has hard coded PAGE_SHIFT in these calculations. However, for huge page TSBs the value REAL_HPAGE_SHIFT should be used. As a result, when copy_tsb is called for a huge page TSB the entries are placed at the incorrect index in the newly allocated TSB. When doing hardware table walk, the MMU does not match these entries and we end up in the TSB miss handling code. This code will then create and write an entry to the correct index in the TSB. We take a performance hit for the table walk miss and recreation of these entries. Pass a new parameter to copy_tsb that is the page size shift to be used when copying the TSB. Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 13:45:02 -07:00
Jane Chu	c79a13734d	arch/sparc: support NR_CPUS = 4096 Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info() only allocates a single page for NR_CPUS mondo entries. Thus we cannot use all 4096 CPUs on some SPARC platforms. To fix, allocate (2^order) pages where order is set according to the size of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa are not used in asm code, there are no imm13 offsets from the base PA that will break because they can only reach one page. Orabug: 25505750 Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Atish Patra <atish.patra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 16:41:47 -04:00

1 2 3 4 5 ...

679210 Commits