linux/include/net
Hans Verkuil 626cf23660 poll: add poll_requested_events() and poll_does_not_wait() functions
In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for.  An example
is when a driver needs to start a DMA engine if the caller polls for
POLLIN, but doesn't want to do that if POLLIN is not requested but instead
only POLLOUT or POLLPRI is requested.  This is something that can happen
in the video4linux subsystem among others.

Unfortunately, the current epoll/poll/select implementation doesn't
provide that information reliably.  The poll_table_struct does have it: it
has a key field with the event mask.  But once a poll() call matches one
or more bits of that mask any following poll() calls are passed a NULL
poll_table pointer.

Also, the eventpoll implementation always left the key field at ~0 instead
of using the requested events mask.

This was changed in eventpoll.c so the key field now contains the actual
events that should be polled for as set by the caller.

The solution to the NULL poll_table pointer is to set the qproc field to
NULL in poll_table once poll() matches the events, not the poll_table
pointer itself.  That way drivers can obtain the mask through a new
poll_requested_events inline.

The poll_table_struct can still be NULL since some kernel code calls it
internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
that case poll_requested_events() returns ~0 (i.e.  all events).

Very rarely drivers might want to know whether poll_wait will actually
wait.  If another earlier file descriptor in the set already matched the
events the caller wanted to wait for, then the kernel will return from the
select() call without waiting.  This might be useful information in order
to avoid doing expensive work.

A new helper function poll_does_not_wait() is added that drivers can use
to detect this situation.  This is now used in sock_poll_wait() in
include/net/sock.h.  This was the only place in the kernel that needed
this information.

Drivers should no longer access any of the poll_table internals, but use
the poll_requested_events() and poll_does_not_wait() access functions
instead.  In order to enforce that the poll_table fields are now prepended
with an underscore and a comment was added warning against using them
directly.

This required a change in unix_dgram_poll() in unix/af_unix.c which used
the key field to get the requested events.  It's been replaced by a call
to poll_requested_events().

For qproc it was especially important to change its name since the
behavior of that field changes with this patch since this function pointer
can now be NULL when that wasn't possible in the past.

Any driver accessing the qproc or key fields directly will now fail to compile.

Some notes regarding the correctness of this patch: the driver's poll()
function is called with a 'struct poll_table_struct *wait' argument.  This
pointer may or may not be NULL, drivers can never rely on it being one or
the other as that depends on whether or not an earlier file descriptor in
the select()'s fdset matched the requested events.

There are only three things a driver can do with the wait argument:

1) obtain the key field:

	events = wait ? wait->key : ~0;

   This will still work although it should be replaced with the new
   poll_requested_events() function (which does exactly the same).
   This will now even work better, since wait is no longer set to NULL
   unnecessarily.

2) use the qproc callback. This could be deadly since qproc can now be
   NULL. Renaming qproc should prevent this from happening. There are no
   kernel drivers that actually access this callback directly, BTW.

3) test whether wait == NULL to determine whether poll would return without
   waiting. This is no longer sufficient as the correct test is now
   wait == NULL || wait->_qproc == NULL.

   However, the worst that can happen here is a slight performance hit in
   the case where wait != NULL and wait->_qproc == NULL. In that case the
   driver will assume that poll_wait() will actually add the fd to the set
   of waiting file descriptors. Of course, poll_wait() will not do that
   since it tests for wait->_qproc. This will not break anything, though.

   There is only one place in the whole kernel where this happens
   (sock_poll_wait() in include/net/sock.h) and that code will be replaced
   by a call to poll_does_not_wait() in the next patch.

   Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
   actually waiting. The next file descriptor from the set might match the
   event mask and thus any possible waits will never happen.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
..
9p 9p: Reduce object size with CONFIG_NET_9P_DEBUG 2012-01-05 10:51:44 -06:00
bluetooth Bluetooth: fix conding style issues all over the tree 2012-03-08 02:02:26 -03:00
caif caif-hsi: Add RX flip buffer 2012-02-04 16:06:28 -05:00
irda Fix common misspellings 2011-03-31 11:26:23 -03:00
iucv af_iucv: add shutdown for HS transport 2012-03-07 22:52:24 -08:00
netfilter netfilter: xt_LOG: add __printf() to sb_add() 2012-03-07 17:41:52 +01:00
netns netns: Fail conspicously if someone uses net_generic at an inappropriate time. 2012-01-27 21:06:02 -05:00
nfc NFC: NCI code identation fixes 2012-03-06 15:16:25 -05:00
phonet net: dont hold rtnl mutex during netlink dump callbacks 2011-05-02 15:26:28 -07:00
sctp sctp: Export sctp_do_peeloff 2012-03-08 13:52:08 -08:00
tc_act
act_api.h net: sched: constify tcf_proto and tc_action 2011-07-06 02:52:16 -07:00
addrconf.h ipv6: Remove never used function inet6_ac_check(). 2012-02-01 16:14:17 -05:00
af_ieee802154.h
af_rxrpc.h net: Remove __KERNEL__ cpp checks from include/net 2011-04-24 10:54:56 -07:00
af_unix.h switch unix_sock to struct path 2012-03-20 21:29:41 -04:00
ah.h ipsec: update MAX_AH_AUTH_LEN to support sha512 2011-01-13 21:48:25 -08:00
arp.h ipv4: Eliminate spurious argument to __ipv4_neigh_lookup 2012-02-15 17:48:35 -05:00
atmclip.h atm: clip: Use device neigh support on top of "arp_tbl". 2011-11-30 18:51:03 -05:00
ax25.h atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
ax88796.h
cfg80211-wext.h cfg80211: remove unused wext handler exports 2011-08-08 14:26:29 -04:00
cfg80211.h cfg80211: clarify timestamp in cfg80211_inform_bss 2012-03-13 14:54:20 -04:00
checksum.h
cipso_ipv4.h doc: Update the email address for Paul Moore in various source files 2011-08-01 17:58:33 -07:00
cls_cgroup.h
compat.h net: get rid of some pointless casts to sockaddr 2012-03-11 19:11:22 -07:00
datalink.h
dcbevent.h dcb: Add stub routines for !CONFIG_DCB 2011-10-06 15:49:51 -04:00
dcbnl.h net: dcb: getnumtcs()/setnumtcs() should return an int 2012-03-02 18:16:49 -08:00
dn_dev.h
dn_fib.h decnet: Convert to use flowidn where applicable. 2011-03-12 15:08:55 -08:00
dn_neigh.h
dn_nsp.h
dn_route.h decnet: Convert to use flowidn where applicable. 2011-03-12 15:08:55 -08:00
dn.h decnet: net/dn.h needs net/flow.h 2012-02-15 16:37:44 -05:00
dsa.h dsa: Include linux/if_ether.h to fix build error 2011-12-01 11:41:06 -05:00
dsfield.h
dst_ops.h net: Rename the dst_opt default_mtu method to mtu 2011-11-26 14:29:50 -05:00
dst.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-23 17:13:56 -05:00
esp.h
ethoc.h
fib_rules.h
flow_keys.h flow_dissector: use a 64bit load/store 2011-11-29 13:17:03 -05:00
flow.h ipv4: reset flowi parameters on route connect 2012-02-04 19:29:48 -05:00
garp.h garp: remove last synchronize_rcu() call 2011-05-12 17:46:56 -04:00
gen_stats.h Fix common misspellings 2011-03-31 11:26:23 -03:00
genetlink.h net: Deinline __nlmsg_put and genlmsg_put. -7k code on i386 defconfig. 2012-01-30 15:22:06 -05:00
gre.h
icmp.h ipv4: reduce percpu needs for icmpmsg mibs 2011-11-09 16:04:20 -05:00
ieee80211_radiotap.h wireless: move ieee80211chan2mhz macro 2011-11-11 12:32:50 -05:00
ieee802154_netdev.h
ieee802154.h 6LoWPAN: add fragmentation support 2011-11-14 00:19:42 -05:00
if_inet6.h ipv6: updates to privacy addresses per RFC 4941. 2011-08-01 18:05:00 -07:00
inet6_connection_sock.h inet: Pass flowi to ->queue_xmit(). 2011-05-08 15:28:28 -07:00
inet6_hashtables.h net: use IS_ENABLED(CONFIG_IPV6) 2011-12-11 18:25:16 -05:00
inet_common.h
inet_connection_sock.h net: rename sk_clone to sk_clone_lock 2011-11-08 17:07:07 -05:00
inet_ecn.h inet: add rfc 3168 extract in front of INET_ECN_encapsulate() 2011-10-22 01:25:23 -04:00
inet_frag.h
inet_hashtables.h atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
inet_sock.h net: implement IP_RECVTOS for IP_PKTOPTIONS 2012-02-13 00:46:41 -05:00
inet_timewait_sock.h inet: remove rcu protection on tw_net 2011-12-14 13:34:55 -05:00
inetpeer.h route: Remove redirect_genid 2012-03-08 00:30:32 -08:00
ip6_checksum.h
ip6_fib.h IPv6: Avoid taking write lock for /proc/net/ipv6_route 2011-12-30 17:07:33 -05:00
ip6_route.h Merge branch 'nf-next' of git://1984.lsi.us.es/net-next 2011-12-25 02:21:45 -05:00
ip6_tunnel.h
ip_fib.h ipv4: Call fib_select_default() only when actually necessary. 2011-04-14 15:05:22 -07:00
ip_vs.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-01-02 18:56:49 -05:00
ip.h ipv4: Make ip_call_ra_chain() return bool. 2012-03-09 14:34:50 -08:00
ipcomp.h
ipconfig.h
ipip.h
ipv6.h ipv6: Add fragment reporting to ipv6_skip_exthdr(). 2011-12-03 09:35:10 -08:00
ipx.h net: Remove __KERNEL__ cpp checks from include/net 2011-04-24 10:54:56 -07:00
iw_handler.h Fix common misspellings 2011-03-31 11:26:23 -03:00
lapb.h wan: make LAPB callbacks const 2011-09-16 19:20:20 -04:00
lib80211.h include: replace linux/module.h with "struct module" wherever possible 2011-10-31 19:32:32 -04:00
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h bonding,llc: Fix structure sizeof incompatibility for some PDUs 2011-05-13 15:13:24 -04:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
mac80211.h mac80211: rename bss_conf timestamp to last_tsf 2012-03-13 14:54:20 -04:00
mip6.h
mld.h
ndisc.h ipv6: Remove neigh argument from ndisc_send_redirect() 2012-01-27 21:00:08 -05:00
neighbour.h ipv6: Use universal hash for NDISC. 2011-12-28 15:06:58 -05:00
net_namespace.h net: use IS_ENABLED(CONFIG_IPV6) 2011-12-11 18:25:16 -05:00
net_ratelimit.h net: Kill ratelimit.h dependency in linux/net.h 2011-05-27 13:41:33 -04:00
netdma.h
netevent.h net: Remove __KERNEL__ cpp checks from include/net 2011-04-24 10:54:56 -07:00
netlabel.h doc: Update the email address for Paul Moore in various source files 2011-08-01 17:58:33 -07:00
netlink.h net: Deinline __nlmsg_put and genlmsg_put. -7k code on i386 defconfig. 2012-01-30 15:22:06 -05:00
netprio_cgroup.h netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m 2012-02-10 15:08:57 -05:00
netrom.h
nexthop.h
nl802154.h
p8022.h
ping.h net: ping: fix build failure 2011-05-17 14:16:58 -04:00
pkt_cls.h net: Fix range checks in tcf_valid_offset(). 2010-12-21 12:43:16 -08:00
pkt_sched.h net: sched: constify tcf_proto and tc_action 2011-07-06 02:52:16 -07:00
protocol.h net: use IS_ENABLED(CONFIG_IPV6) 2011-12-11 18:25:16 -05:00
psnap.h
raw.h
rawv6.h net: Remove __KERNEL__ cpp checks from include/net 2011-04-24 10:54:56 -07:00
red.h net_sched: sfq: add optional RED on top of SFQ 2012-01-12 20:05:28 -08:00
regulatory.h cfg80211: pass DFS region to drivers through reg_notifier() 2011-11-21 16:20:41 -05:00
request_sock.h tcp: Change possible SYN flooding messages 2011-09-15 14:49:43 -04:00
rose.h rose: Add length checks to CALL_REQUEST parsing 2011-03-27 17:59:04 -07:00
route.h ipv4: reset flowi parameters on route connect 2012-02-04 19:29:48 -05:00
rtnetlink.h rtnetlink: Fix problem with buffer allocation 2012-02-21 16:56:45 -05:00
sch_generic.h net: Make qdisc_skb_cb upper size bound explicit. 2012-02-09 13:50:34 -05:00
scm.h af_unix: dont send SCM_CREDENTIALS by default 2011-09-28 13:29:50 -04:00
secure_seq.h tcp: add const qualifiers where possible 2011-10-21 05:22:42 -04:00
slhc_vj.h
snmp.h Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2012-01-09 13:08:28 -08:00
sock.h poll: add poll_requested_events() and poll_does_not_wait() functions 2012-03-23 16:58:38 -07:00
stp.h
tcp_memcontrol.h cgroup: remove cgroup_subsys argument from callbacks 2012-02-02 09:20:22 -08:00
tcp_states.h
tcp.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-01 17:57:40 -05:00
timewait_sock.h timewait_sock: Create and use getpeer op. 2010-12-01 18:09:13 -08:00
transp_v6.h net: relax PKTINFO non local ipv6 udp xmit check 2011-08-30 17:39:01 -04:00
udp.h net: use IS_ENABLED(CONFIG_IPV6) 2011-12-11 18:25:16 -05:00
udplite.h net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
wext.h
wimax.h net: wimax: Remove of unused 'rfkill_input' pointer 2011-06-24 17:50:44 -07:00
wpan-phy.h Fix common misspellings 2011-03-31 11:26:23 -03:00
x25.h
x25device.h
xfrm.h xfrm: remove unneeded method typedef declaration in net/xfrm.h. 2012-02-25 20:19:24 -05:00