linux

Author	SHA1	Message	Date
David S. Miller	b4fc1a460f	Merge branch 'bpf-next' Alexei Starovoitov says: ==================== eBPF syscall, verifier, testsuite v14 -> v15: - got rid of macros with hidden control flow (suggested by David) replaced macro with explicit goto or return and simplified where possible (affected patches #9 and #10) - rebased, retested v13 -> v14: - small change to 1st patch to ease 'new userspace with old kernel' problem (done similar to perf_copy_attr()) (suggested by Daniel) - the rest unchanged v12 -> v13: - replaced 'foo __user ' pointers with __aligned_u64 (suggested by David) - added __attribute__((aligned(8)) to 'union bpf_attr' to keep constant alignment between patches - updated manpage and syscall wrappers due to __aligned_u64 - rebased, retested on x64 with 32-bit and 64-bit userspace and on i386, build tested on arm32,sparc64 v11 -> v12: - dropped patch 11 and copied few macros to libbpf.h (suggested by Daniel) - replaced 'enum bpf_prog_type' with u32 to be safe in compat (.. Andy) - implemented and tested compat support (not part of this set) (.. Daniel) - changed 'void log_buf' to 'char ' (.. Daniel) - combined struct bpf_work_struct and bpf_prog_info (.. Daniel) - added better return value explanation to manpage (.. Andy) - added log_buf/log_size explanation to manpage (.. Andy & Daniel) - added a lot more info about prog_type and map_type to manpage (.. Andy) - rebased, tweaked test_stubs Patches 1-4 establish BPF syscall shell for maps and programs. Patches 5-10 add verifier step by step Patch 11 adds test stubs for 'unspec' program type and verifier testsuite from user space Note that patches 1,3,4,7 add commands and attributes to the syscall while being backwards compatible from each other, which should demonstrate how other commands can be added in the future. After this set the programs can be loaded for testing only. They cannot be attached to any events. Though manpage talks about tracing and sockets, it will be a subject of future patches. 
Please take a look at manpage: BPF(2) Linux Programmer's Manual BPF(2) NAME bpf - perform a command on eBPF map or program SYNOPSIS #include <linux/bpf.h> int bpf(int cmd, union bpf_attr attr, unsigned int size); DESCRIPTION bpf() syscall is a multiplexor for a range of different operations on eBPF which can be characterized as "universal in-kernel virtual machine". eBPF is similar to original Berkeley Packet Filter (or "classic BPF") used to filter network packets. Both statically analyze the programs before loading them into the kernel to ensure that programs cannot harm the running system. eBPF extends classic BPF in multiple ways including ability to call in- kernel helper functions and access shared data structures like eBPF maps. The programs can be written in a restricted C that is compiled into eBPF bytecode and executed on the eBPF virtual machine or JITed into native instruction set. eBPF Design/Architecture eBPF maps is a generic storage of different types. User process can create multiple maps (with key/value being opaque bytes of data) and access them via file descriptor. In parallel eBPF programs can access maps from inside the kernel. It's up to user process and eBPF program to decide what they store inside maps. eBPF programs are similar to kernel modules. They are loaded by the user process and automatically unloaded when process exits. Each eBPF program is a safe run-to-completion set of instructions. eBPF verifier statically determines that the program terminates and is safe to execute. During verification the program takes a hold of maps that it intends to use, so selected maps cannot be removed until the program is unloaded. The program can be attached to different events. These events can be packets, tracepoint events and other types in the future. A new event triggers execution of the program which may store information about the event in the maps. 
Beyond storing data the programs may call into in-kernel helper functions which may, for example, dump stack, do trace_printk or other forms of live kernel debugging. The same program can be attached to multiple events. Different programs can access the same map: tracepoint tracepoint tracepoint sk_buff sk_buff event A event B event C on eth0 on eth1 \| \| \| \| \| \| \| \| \| \| --> tracing <-- tracing socket socket prog_1 prog_2 prog_3 prog_4 \| \| \| \| \|--- -----\| \|-------\| map_3 map_1 map_2 Syscall Arguments bpf() syscall operation is determined by cmd which can be one of the following: BPF_MAP_CREATE Create a map with given type and attributes and return map FD BPF_MAP_LOOKUP_ELEM Lookup element by key in a given map and return its value BPF_MAP_UPDATE_ELEM Create or update element (key/value pair) in a given map BPF_MAP_DELETE_ELEM Lookup and delete element by key in a given map BPF_MAP_GET_NEXT_KEY Lookup element by key in a given map and return key of next element BPF_PROG_LOAD Verify and load eBPF program attr is a pointer to a union of type bpf_attr as defined below. size is the size of the union. union bpf_attr { struct { /* anonymous struct used by BPF_MAP_CREATE command / __u32 map_type; __u32 key_size; / size of key in bytes / __u32 value_size; / size of value in bytes / __u32 max_entries; / max number of entries in a map / }; struct { / anonymous struct used by BPF_MAP__ELEM commands / __u32 map_fd; __aligned_u64 key; union { __aligned_u64 value; __aligned_u64 next_key; }; }; struct { /* anonymous struct used by BPF_PROG_LOAD command / __u32 prog_type; __u32 insn_cnt; __aligned_u64 insns; / 'const struct bpf_insn ' / __aligned_u64 license; /* 'const char ' / __u32 log_level; /* verbosity level of eBPF verifier / __u32 log_size; / size of user buffer / __aligned_u64 log_buf; / user supplied 'char ' buffer / }; } __attribute__((aligned(8))); eBPF maps maps is a generic storage of different types for sharing data between kernel and userspace. 
Any map type has the following attributes: . type . max number of elements . key size in bytes . value size in bytes The following wrapper functions demonstrate how this syscall can be used to access the maps. The functions use the cmd argument to invoke different operations. BPF_MAP_CREATE int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, int max_entries) { union bpf_attr attr = { .map_type = map_type, .key_size = key_size, .value_size = value_size, .max_entries = max_entries }; return bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); } bpf() syscall creates a map of map_type type and given attributes key_size, value_size, max_entries. On success it returns process-local file descriptor. On error, -1 is returned and errno is set to EINVAL or EPERM or ENOMEM. The attributes key_size and value_size will be used by verifier during program loading to check that program is calling bpf_map__elem() helper functions with correctly initialized key and that program doesn't access map element value beyond specified value_size. For example, when map is created with key_size = 8 and program does: bpf_map_lookup_elem(map_fd, fp - 4) such program will be rejected, since in-kernel helper function bpf_map_lookup_elem(map_fd, void key) expects to read 8 bytes from 'key' pointer, but 'fp - 4' starting address will cause out of bounds stack access. Similarly, when map is created with value_size = 1 and program does: value = bpf_map_lookup_elem(...); (u32 )value = 1; such program will be rejected, since it accesses value pointer beyond specified 1 byte value_size limit. Currently only hash table map_type is supported: enum bpf_map_type { BPF_MAP_TYPE_UNSPEC, BPF_MAP_TYPE_HASH, }; map_type selects one of the available map implementations in kernel. For all map_types eBPF programs access maps with the same bpf_map_lookup_elem()/bpf_map_update_elem() helper functions. 
BPF_MAP_LOOKUP_ELEM int bpf_lookup_elem(int fd, void *key, void *value) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), }; return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)); } bpf() syscall looks up an element with given key in a map fd. If element is found it returns zero and stores element's value into value. If element is not found it returns -1 and sets errno to ENOENT. BPF_MAP_UPDATE_ELEM int bpf_update_elem(int fd, void *key, void *value) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), }; return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); } The call creates or updates element with given key/value in a map fd. On success it returns zero. On error, -1 is returned and errno is set to EINVAL or EPERM or ENOMEM or E2BIG. E2BIG indicates that number of elements in the map reached max_entries limit specified at map creation time. BPF_MAP_DELETE_ELEM int bpf_delete_elem(int fd, void *key) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), }; return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr)); } The call deletes an element in a map fd with given key. Returns zero on success. If element is not found it returns -1 and sets errno to ENOENT. BPF_MAP_GET_NEXT_KEY int bpf_get_next_key(int fd, void *key, void *next_key) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .next_key = ptr_to_u64(next_key), }; return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr)); } The call looks up an element by key in a given map fd and returns key of the next element into next_key pointer. If key is not found, it returns zero and stores the key of the first element into next_key. If key is the last element, it returns -1 and sets errno to ENOENT. Other possible errno values are ENOMEM, EFAULT, EPERM, EINVAL. This method can be used to iterate over all elements of the map. close(map_fd) will delete the map map_fd. Exiting process will delete all maps automatically. 
eBPF programs BPF_PROG_LOAD This cmd is used to load eBPF program into the kernel. char bpf_log_buf[LOG_BUF_SIZE]; int bpf_prog_load(enum bpf_prog_type prog_type, const struct bpf_insn insns, int insn_cnt, const char license) { union bpf_attr attr = { .prog_type = prog_type, .insns = ptr_to_u64(insns), .insn_cnt = insn_cnt, .license = ptr_to_u64(license), .log_buf = ptr_to_u64(bpf_log_buf), .log_size = LOG_BUF_SIZE, .log_level = 1, }; return bpf(BPF_PROG_LOAD, &attr, sizeof(attr)); } prog_type is one of the available program types: enum bpf_prog_type { BPF_PROG_TYPE_UNSPEC, BPF_PROG_TYPE_SOCKET, BPF_PROG_TYPE_TRACING, }; By picking prog_type program author selects a set of helper functions callable from eBPF program and corresponding format of struct bpf_context (which is the data blob passed into the program as the first argument). For example, the programs loaded with prog_type = TYPE_TRACING may call bpf_printk() helper, whereas TYPE_SOCKET programs may not. The set of functions available to the programs under given type may increase in the future. Currently the set of functions for TYPE_TRACING is: bpf_map_lookup_elem(map_fd, void key) // lookup key in a map_fd bpf_map_update_elem(map_fd, void key, void value) // update key/value bpf_map_delete_elem(map_fd, void key) // delete key in a map_fd bpf_ktime_get_ns(void) // returns current ktime bpf_printk(char fmt, int fmt_size, ...) 
// prints into trace buffer bpf_memcmp(void *ptr1, void *ptr2, int size) // non-faulting memcmp bpf_fetch_ptr(void *ptr) // non-faulting load pointer from any address bpf_fetch_u8(void *ptr) // non-faulting 1 byte load bpf_fetch_u16(void *ptr) // other non-faulting loads bpf_fetch_u32(void *ptr) bpf_fetch_u64(void *ptr) and bpf_context is defined as: struct bpf_context { /* argN fields match one to one to arguments passed to trace events */ u64 arg1, arg2, arg3, arg4, arg5, arg6; /* return value from kretprobe event or from syscall_exit event */ u64 ret; }; The set of helper functions for TYPE_SOCKET is TBD. More program types may be added in the future, for example BPF_PROG_TYPE_USER_TRACING for unprivileged programs. BPF_PROG_TYPE_UNSPEC is used for testing only. Such programs cannot be attached to events. insns array of "struct bpf_insn" instructions insn_cnt number of instructions in the program license license string, which must be GPL compatible to call helper functions marked gpl_only log_buf user supplied buffer that in-kernel verifier is using to store verification log. Log is a multi-line string that should be used by program author to understand how verifier came to conclusion that program is unsafe. The format of the output can change at any time as verifier evolves. log_size size of user buffer. If size of the buffer is not large enough to store all verifier messages, -1 is returned and errno is set to ENOSPC. log_level verbosity level of eBPF verifier, where zero means no logs provided close(prog_fd) will unload eBPF program The maps are accessible from programs and generally tie the two together. Programs process various events (like tracepoint, kprobe, packets) and store the data into maps. User space fetches data from maps. Either the same or a different map may be used by user space as configuration space to alter program behavior on the fly. Events Once an eBPF program is loaded, it can be attached to an event. 
Various kernel subsystems have different ways to do so. For example: setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd)); will attach the program prog_fd to socket sock which was received by prior call to socket(). ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); will attach the program prog_fd to perf event event_fd which was received by prior call to perf_event_open(). Another way to attach the program to a tracing event is: event_fd = open("/sys/kernel/debug/tracing/events/skb/kfree_skb/filter"); write(event_fd, "bpf-123"); / where 123 is eBPF program FD / / here program is attached and will be triggered by events / close(event_fd); / to detach from event / EXAMPLES / eBPF+sockets example: * 1. create map with maximum of 2 elements * 2. set map[6] = 0 and map[17] = 0 * 3. load eBPF program that counts number of TCP and UDP packets received * via map[skb->ip->proto]++ * 4. attach prog_fd to raw socket via setsockopt() * 5. print number of received TCP/UDP packets every second / int main(int ac, char av) { int sock, map_fd, prog_fd, key; long long value = 0, tcp_cnt, udp_cnt; map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 2); if (map_fd < 0) { printf("failed to create map '%s'\n", strerror(errno)); / likely not run as root / return 1; } key = 6; / ip->proto == tcp / assert(bpf_update_elem(map_fd, &key, &value) == 0); key = 17; / ip->proto == udp / assert(bpf_update_elem(map_fd, &key, &value) == 0); struct bpf_insn prog[] = { BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), / r6 = r1 / BPF_LD_ABS(BPF_B, 14 + 9), / r0 = ip->proto / BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),/ (u32 )(fp - 4) = r0 / BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), / r2 = fp / BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), / r2 = r2 - 4 / BPF_LD_MAP_FD(BPF_REG_1, map_fd), / r1 = map_fd / BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), / r0 = map_lookup(r1, r2) / BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), / if (r0 == 0) goto pc+2 / BPF_MOV64_IMM(BPF_REG_1, 1), / r1 = 1 / 
BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* lock *(u64 *)r0 += r1 */ BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */ BPF_EXIT_INSN(), /* return r0 */ }; prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET, prog, sizeof(prog) / sizeof(prog[0]), "GPL"); assert(prog_fd >= 0); sock = open_raw_sock("lo"); assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd)) == 0); for (;;) { key = 6; assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0); key = 17; assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0); printf("TCP %lld UDP %lld packets\n", tcp_cnt, udp_cnt); sleep(1); } return 0; } RETURN VALUE For a successful call, the return value depends on the operation: BPF_MAP_CREATE The new file descriptor associated with eBPF map. BPF_PROG_LOAD The new file descriptor associated with eBPF program. All other commands Zero. On error, -1 is returned, and errno is set appropriately. ERRORS EPERM bpf() syscall was made without sufficient privilege (without the CAP_SYS_ADMIN capability). ENOMEM Cannot allocate sufficient memory. EBADF fd is not an open file descriptor. EFAULT One of the pointers ( key or value or log_buf or insns ) is outside accessible address space. EINVAL The value specified in cmd is not recognized by this kernel. EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid. EINVAL For BPF_MAP_*_ELEM commands, some of the fields of "union bpf_attr" unused by this command are not set to zero. EINVAL For BPF_PROG_LOAD, attempt to load invalid program (unrecognized instruction or uses reserved fields or jumps out of range or loop detected or calls unknown function). EACCES For BPF_PROG_LOAD, though program has valid instructions, it was rejected, since it was deemed unsafe (may access disallowed memory region or uninitialized stack/register or function constraints don't match actual types or misaligned access). In such case it is recommended to call bpf() again with log_level = 1 and examine log_buf for specific reason provided by verifier. 
ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that element with given key was not found. E2BIG program is too large or a map reached max_entries limit (max number of elements). NOTES These commands may be used only by a privileged process (one having the CAP_SYS_ADMIN capability). SEE ALSO eBPF architecture and instruction set is explained in Documentation/networking/filter.txt Linux 2014-09-16 BPF(2) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:40 -04:00
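The manpage wrappers in the merge message above pass every pointer through ptr_to_u64() without showing it, and the changelog notes that user pointers were replaced with __aligned_u64 so that the layout of 'union bpf_attr' stays the same for 32-bit and 64-bit userspace. A minimal sketch of such a helper follows, assuming standard C99 fixed-width types in place of the kernel's __u64; u64_to_ptr() is a hypothetical inverse added here only to illustrate the round trip.

```c
#include <stdint.h>

/* Widen a user pointer to a fixed 64-bit field (sketch, not the kernel source). */
static inline uint64_t ptr_to_u64(const void *ptr)
{
    /* Go through uintptr_t first so a 32-bit pointer is zero-extended. */
    return (uint64_t)(uintptr_t)ptr;
}

/* Hypothetical inverse, for illustration only. */
static inline void *u64_to_ptr(uint64_t val)
{
    return (void *)(uintptr_t)val;
}
```

With a helper of this shape the attribute field is always 8 bytes wide, regardless of the pointer size of the calling process.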
Alexei Starovoitov	3c731eba48	bpf: mini eBPF library, test stubs and verifier testsuite 1. the library includes a trivial set of BPF syscall wrappers: int bpf_create_map(int key_size, int value_size, int max_entries); int bpf_update_elem(int fd, void *key, void *value); int bpf_lookup_elem(int fd, void *key, void *value); int bpf_delete_elem(int fd, void *key); int bpf_get_next_key(int fd, void *key, void *next_key); int bpf_prog_load(enum bpf_prog_type prog_type, const struct sock_filter_int *insns, int insn_len, const char *license); bpf_prog_load() stores verifier log into global bpf_log_buf[] array and BPF_*() macros to build instructions 2. test stubs configure eBPF infra with 'unspec' map and program types. These are fake types used by user space testsuite only. 3. verifier tests valid and invalid programs and expects predefined error log messages from kernel. 40 tests so far. $ sudo ./test_verifier #0 add+sub+mul OK #1 unreachable OK #2 unreachable2 OK #3 out of range jump OK #4 out of range jump2 OK #5 test1 ld_imm64 OK ... Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:15 -04:00
Alexei Starovoitov	17a5267067	bpf: verifier (add verifier core) This patch adds verifier core which simulates execution of every insn and records the state of registers and program stack. Every branch instruction seen during simulation is pushed into state stack. When verifier reaches BPF_EXIT, it pops the state from the stack and continues until it reaches BPF_EXIT again. For program: 1: bpf_mov r1, xxx 2: if (r1 == 0) goto 5 3: bpf_mov r0, 1 4: goto 6 5: bpf_mov r0, 2 6: bpf_exit The verifier will walk insns: 1, 2, 3, 4, 6 then it will pop the state recorded at insn#2 and will continue: 5, 6 This way it walks all possible paths through the program and checks all possible values of registers. While doing so, it checks for: - invalid instructions - uninitialized register access - uninitialized stack access - misaligned stack access - out of range stack access - invalid calling convention - instruction encoding is not using reserved fields Kernel subsystem configures the verifier with two callbacks: - bool (*is_valid_access)(int off, int size, enum bpf_access_type type); that provides information to the verifier which fields of 'ctx' are accessible (remember 'ctx' is the first argument to eBPF program) - const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id); returns argument constraints of kernel helper functions that eBPF program may call, so that verifier can check that R1-R5 types match the prototype More details in Documentation/networking/filter.txt and in kernel/bpf/verifier.c Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:15 -04:00
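The path walk described in the verifier core commit above can be illustrated with a toy model. This is only a sketch under stated assumptions: the instruction set (toy_op), the toy_walk() function and the toy_prog array are hypothetical names invented here, and the model pushes just the branch target onto the stack, whereas the commit says the verifier records the full register and stack state.

```c
enum toy_op { TOY_MOV, TOY_JEQ, TOY_JA, TOY_EXIT };

struct toy_insn {
    enum toy_op op;
    int target; /* 1-based jump target, used by TOY_JEQ and TOY_JA */
};

#define TOY_STACK_MAX 64

/* The six-instruction program from the commit message (1-based numbering). */
static const struct toy_insn toy_prog[6] = {
    { TOY_MOV, 0 },  /* 1: bpf_mov r1, xxx      */
    { TOY_JEQ, 5 },  /* 2: if (r1 == 0) goto 5  */
    { TOY_MOV, 0 },  /* 3: bpf_mov r0, 1        */
    { TOY_JA, 6 },   /* 4: goto 6               */
    { TOY_MOV, 0 },  /* 5: bpf_mov r0, 2        */
    { TOY_EXIT, 0 }, /* 6: bpf_exit             */
};

/*
 * Walk every path of 'prog' and record the visited instruction numbers
 * in 'trace'. Returns the number of recorded steps, or -1 if a jump
 * leaves the program or the trace/stack capacity is exceeded.
 */
static int toy_walk(const struct toy_insn *prog, int len,
                    int *trace, int trace_cap)
{
    int stack[TOY_STACK_MAX];
    int sp = 0, n = 0, pc = 1;

    for (;;) {
        const struct toy_insn *insn;

        if (pc < 1 || pc > len || n >= trace_cap)
            return -1;
        trace[n++] = pc;
        insn = &prog[pc - 1];

        switch (insn->op) {
        case TOY_MOV:
            pc++;
            break;
        case TOY_JA:
            pc = insn->target;
            break;
        case TOY_JEQ:
            /* Remember the taken branch, explore the fall-through first. */
            if (sp >= TOY_STACK_MAX)
                return -1;
            stack[sp++] = insn->target;
            pc++;
            break;
        case TOY_EXIT:
            if (sp == 0)
                return n; /* no pending branches: all paths walked */
            pc = stack[--sp];
            break;
        }
    }
}
```

Calling toy_walk(toy_prog, 6, trace, 16) records the order 1, 2, 3, 4, 6, 5, 6, which matches the walk given in the commit message.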
Alexei Starovoitov	475fb78fbf	bpf: verifier (add branch/goto checks) check that control flow graph of eBPF program is a directed acyclic graph check_cfg() does: - detect loops - detect unreachable instructions - check that program terminates with BPF_EXIT insn - check that all branches are within program boundary Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:15 -04:00
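The checks listed for check_cfg() above (loops, unreachable instructions, termination with an exit, branches within bounds) can be sketched as a depth-first search over a toy instruction set. Everything below is an assumption made for illustration: the cfg_* names, the error codes and the recursive traversal are invented here and are not taken from kernel/bpf/verifier.c.

```c
enum cfg_op { CFG_MOV, CFG_JEQ, CFG_JA, CFG_EXIT };

struct cfg_insn {
    enum cfg_op op;
    int target; /* 0-based jump target, used by CFG_JEQ and CFG_JA */
};

enum { CFG_OK = 0, CFG_LOOP = -1, CFG_UNREACHABLE = -2, CFG_OUT_OF_RANGE = -3 };
enum { CFG_NEW = 0, CFG_ACTIVE, CFG_DONE };

#define CFG_MAX 64

/* Depth-first visit; an edge into a node still being explored is a loop. */
static int cfg_visit(const struct cfg_insn *prog, int len, int pc,
                     unsigned char *state)
{
    int succ[2], nsucc = 0, i, err;

    if (pc < 0 || pc >= len)
        return CFG_OUT_OF_RANGE; /* bad jump or falling off the end */
    if (state[pc] == CFG_ACTIVE)
        return CFG_LOOP;
    if (state[pc] == CFG_DONE)
        return CFG_OK;

    state[pc] = CFG_ACTIVE;
    switch (prog[pc].op) {
    case CFG_MOV:
        succ[nsucc++] = pc + 1;
        break;
    case CFG_JA:
        succ[nsucc++] = prog[pc].target;
        break;
    case CFG_JEQ:
        succ[nsucc++] = pc + 1;
        succ[nsucc++] = prog[pc].target;
        break;
    case CFG_EXIT:
        break; /* no successors */
    }
    for (i = 0; i < nsucc; i++) {
        err = cfg_visit(prog, len, succ[i], state);
        if (err)
            return err;
    }
    state[pc] = CFG_DONE;
    return CFG_OK;
}

/* Returns CFG_OK only if the program is a DAG with no dead instructions. */
static int cfg_check(const struct cfg_insn *prog, int len)
{
    unsigned char state[CFG_MAX] = { 0 };
    int i, err;

    if (len < 1 || len > CFG_MAX)
        return CFG_OUT_OF_RANGE;
    err = cfg_visit(prog, len, 0, state);
    if (err)
        return err;
    for (i = 0; i < len; i++)
        if (state[i] != CFG_DONE)
            return CFG_UNREACHABLE;
    return CFG_OK;
}
```

In this sketch a program that does not end every path with CFG_EXIT is reported as CFG_OUT_OF_RANGE, because the fall-through successor of its last instruction lies past the end of the program.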
Alexei Starovoitov	0246e64d9a	bpf: handle pseudo BPF_LD_IMM64 insn eBPF programs passed from userspace are using pseudo BPF_LD_IMM64 instructions to refer to process-local map_fd. Scan the program for such instructions and if FDs are valid, convert them to 'struct bpf_map' pointers which will be used by verifier to check access to maps in bpf_map_lookup/update() calls. If program passes verifier, convert pseudo BPF_LD_IMM64 into generic by dropping BPF_PSEUDO_MAP_FD flag. Note that eBPF interpreter is generic and knows nothing about pseudo insns. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:15 -04:00
Alexei Starovoitov	cbd3570086	bpf: verifier (add ability to receive verification log) add optional attributes for BPF_PROG_LOAD syscall: union bpf_attr { struct { ... __u32 log_level; /* verbosity level of eBPF verifier */ __u32 log_size; /* size of user buffer */ __aligned_u64 log_buf; /* user supplied 'char *buffer' */ }; }; when log_level > 0 the verifier will return its verification log in the user supplied buffer 'log_buf' which can be used by program author to analyze why verifier rejected given program. 'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt provides several examples of these messages, like the program: BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_LD_MAP_FD(BPF_REG_1, 0), BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), BPF_EXIT_INSN(), will be rejected with the following multi-line message in log_buf: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (b7) r1 = 0 4: (85) call 1 5: (15) if r0 == 0x0 goto pc+1 R0=map_ptr R10=fp 6: (7a) *(u64 *)(r0 +4) = 0 misaligned access off 4 size 8 The format of the output can change at any time as verifier evolves. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:15 -04:00
Alexei Starovoitov	51580e798c	bpf: verifier (add docs) this patch adds all of eBPF verifier documentation and empty bpf_check() The end goal for the verifier is to statically check safety of the program. Verifier will catch: - loops - out of range jumps - unreachable instructions - invalid instructions - uninitialized register access - uninitialized stack access - misaligned stack access - out of range stack access - invalid calling convention More details in Documentation/networking/filter.txt Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:14 -04:00
Alexei Starovoitov	0a542a86d7	bpf: handle pseudo BPF_CALL insn in native eBPF programs userspace is using pseudo BPF_CALL instructions which encode one of 'enum bpf_func_id' inside insn->imm field. Verifier checks that the program passes correct function arguments to the given func_id. If all checks pass, the kernel needs to fix up BPF_CALL->imm fields by replacing func_id with in-kernel function pointer. eBPF interpreter just calls the function. In-kernel eBPF users continue to use generic BPF_CALL. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:14 -04:00
Alexei Starovoitov	09756af468	bpf: expand BPF syscall with program load/unload eBPF programs are similar to kernel modules. They are loaded by the user process and automatically unloaded when process exits. Each eBPF program is a safe run-to-completion set of instructions. eBPF verifier statically determines that the program terminates and is safe to execute. The following syscall wrapper can be used to load the program: int bpf_prog_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns, int insn_cnt, const char *license) { union bpf_attr attr = { .prog_type = prog_type, .insns = ptr_to_u64(insns), .insn_cnt = insn_cnt, .license = ptr_to_u64(license), }; return bpf(BPF_PROG_LOAD, &attr, sizeof(attr)); } where 'insns' is an array of eBPF instructions and 'license' is a string that must be GPL compatible to call helper functions marked gpl_only. Upon successful load the syscall returns prog_fd. Use close(prog_fd) to unload the program. User space tests and examples follow in the later patches Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:14 -04:00
Alexei Starovoitov	db20fd2b01	bpf: add lookup/update/delete/iterate methods to BPF maps 'maps' is a generic storage of different types for sharing data between kernel and userspace. The maps are accessed from user space via BPF syscall, which has commands: - create a map with given type and attributes fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size) returns fd or negative error - lookup key in a given map referenced by fd err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size) using attr->map_fd, attr->key, attr->value returns zero and stores found elem into value or negative error - create or update key/value pair in a given map err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size) using attr->map_fd, attr->key, attr->value returns zero or negative error - find and delete element by key in a given map err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size) using attr->map_fd, attr->key - iterate map elements (based on input key return next_key) err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size) using attr->map_fd, attr->key, attr->next_key - close(fd) deletes the map Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:14 -04:00
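The iterate command above returns a next key for a given input key; the manpage in the bpf-next merge message adds that an unknown key yields the first element and that the last key yields -1 with errno set to ENOENT. A user-space toy model of those semantics is sketched below. The toy_map structure and both functions are hypothetical and do not touch the kernel; the behaviour for an empty map (ENOENT) is an assumption of this sketch.

```c
#include <errno.h>

#define TOY_MAP_MAX 8

/* Hypothetical in-memory stand-in for a map holding int keys. */
struct toy_map {
    int keys[TOY_MAP_MAX];
    int count;
};

/*
 * Mimics the documented BPF_MAP_GET_NEXT_KEY contract:
 *  - key found, not last: store the following key, return 0
 *  - key found, last:     return -1 with errno = ENOENT
 *  - key not found:       store the first key, return 0
 */
static int toy_get_next_key(const struct toy_map *map, const int *key,
                            int *next_key)
{
    int i;

    for (i = 0; i < map->count; i++) {
        if (map->keys[i] != *key)
            continue;
        if (i + 1 == map->count) {
            errno = ENOENT;
            return -1;
        }
        *next_key = map->keys[i + 1];
        return 0;
    }
    if (map->count == 0) {
        errno = ENOENT; /* assumption: nothing to return for an empty map */
        return -1;
    }
    *next_key = map->keys[0];
    return 0;
}

/* Iterate over all elements, starting from a key known to be absent. */
static int toy_count_keys(const struct toy_map *map, int missing_key)
{
    int key = missing_key, next, n = 0;

    while (toy_get_next_key(map, &key, &next) == 0) {
        n++;
        key = next;
    }
    return n;
}
```

The same loop shape would apply to the real bpf_get_next_key() wrapper: start from a key that is not in the map, then feed each returned key back in until the call fails with ENOENT.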
Alexei Starovoitov	749730ce42	bpf: enable bpf syscall on x64 and i386 done as separate commit to ease conflict resolution Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:14 -04:00
Alexei Starovoitov	99c55f7d47	bpf: introduce BPF syscall and maps BPF syscall is a multiplexor for a range of different operations on eBPF. This patch introduces syscall with single command to create a map. Next patch adds commands to access maps. 'maps' is a generic storage of different types for sharing data between kernel and userspace. Userspace example: /* this syscall wrapper creates a map with given type and attributes * and returns map_fd on success. * use close(map_fd) to delete the map */ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, int max_entries) { union bpf_attr attr = { .map_type = map_type, .key_size = key_size, .value_size = value_size, .max_entries = max_entries }; return bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); } 'union bpf_attr' is backwards compatible with future extensions. More details in Documentation/networking/filter.txt and in manpage Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:14 -04:00
Eric Dumazet	4a8e320c92	net: sched: use pinned timers While using a MQ + NETEM setup, I had confirmation that the default timer migration ( /proc/sys/kernel/timer_migration ) is killing us. Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX queues) : EST="est 1sec 4sec" for ETH in eth1 do tc qd del dev $ETH root 2>/dev/null tc qd add dev $ETH root handle 1: mq tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms done We can see that timers get migrated into a single cpu, presumably idle at the time timers are set up. Then all qdisc dequeues run from this cpu and huge lock contention happens. This single cpu is stuck in softirq mode and cannot dequeue fast enough. 39.24% [kernel] [k] _raw_spin_lock 2.65% [kernel] [k] netem_enqueue 1.80% [kernel] [k] netem_dequeue 1.63% [kernel] [k] copy_user_enhanced_fast_string 1.45% [kernel] [k] _raw_spin_lock_bh By pinning qdisc timers on the cpu running the qdisc, we respect proper XPS setting and remove this lock contention. 5.84% [kernel] [k] netem_enqueue 4.83% [kernel] [k] _raw_spin_lock 2.92% [kernel] [k] copy_user_enhanced_fast_string Current Qdiscs that benefit from this change are : netem, cbq, fq, hfsc, tbf, htb. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:26:48 -04:00
David S. Miller	9fb426a642	Merge branch 'gso_send_check' Tom Herbert says: ==================== net: Eliminate gso_send_check gso_send_check presents a lot of complexity for what it is being used for. It seems that there are only two cases where it might be effective: TCP and UFO paths. In these cases, the gso_send_check function initializes the TCP or UDP checksum respectively to the pseudo header checksum so that the checksum computation is appropriately offloaded or computed in the gso_segment functions. The gso_send_check functions are only called from dev.c in skb_mac_gso_segment when ip_summed != CHECKSUM_PARTIAL (which seems very unlikely in TCP case). We can move the logic of this into the respective gso_segment functions where the checksum is initialized if ip_summed != CHECKSUM_PARTIAL. With the above cases handled, gso_send_check is no longer needed, so we can remove all uses of it and the fields in the offload callbacks. With this change, ip_summed in the skb should be preserved through all the layers of gso_segment calls. In follow-on patches, we may be able to remove the check setup code in tcp_gso_segment if we can guarantee that ip_summed will always be CHECKSUM_PARTIAL (verify all paths and probably add an assert in tcp_gro_segment). Tested these patches by: - netperf TCP_STREAM test with GSO enabled - Forced ip_summed != CHECKSUM_PARTIAL with above - Ran UDP_RR with 10000 request size over GRE tunnel. This exercised UFO path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:23:13 -04:00
Tom Herbert	53e5039896	net: Remove gso_send_check as an offload callback The send_check logic was only interesting in cases of TCP offload and UDP UFO where the checksum needed to be initialized to the pseudo header checksum. Now we've moved that logic into the related gso_segment functions so gso_send_check is no longer needed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:22:47 -04:00
Tom Herbert	f71470b37e	udp: move logic out of udp[46]_ufo_send_check In udp[46]_ufo_send_check the UDP checksum is initialized to the pseudo header checksum. We can move this logic into udp[46]_ufo_fragment. After this change udp[46]_ufo_send_check is a no-op. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:22:46 -04:00
Tom Herbert	d020f8f733	tcp: move logic out of tcp_v[64]_gso_send_check In tcp_v[46]_gso_send_check the TCP checksum is initialized to the pseudo header checksum using __tcp_v[46]_send_check. We can move this logic into new tcp[46]_gso_segment functions to be done when ip_summed != CHECKSUM_PARTIAL (ip_summed == CHECKSUM_PARTIAL should be the common case, possibly always true when taking GSO path). After this change tcp_v[46]_gso_send_check is no-op. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:22:46 -04:00
David S. Miller	2fdbfea573	Merge branch 'stmmac' Beniamino Galvani says: ==================== net: stmmac glue layer for Amlogic Meson SoCs the Ethernet controller available in Amlogic Meson6 and Meson8 SoCs is a Synopsys DesignWare MAC IP core, already supported by the stmmac driver. These patches add a glue layer to the driver for the platform-specific settings required by the Amlogic variant. This has been tested on a Amlogic S802 device with the initial Meson support submitted by Carlo Caione [1]. [1] http://lwn.net/Articles/612000/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:13:06 -04:00
Beniamino Galvani	318fd4909d	net: stmmac: meson: document device tree bindings Add the device tree bindings documentation for the Amlogic Meson variant of the Synopsys DesignWare MAC. Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:12:56 -04:00
Beniamino Galvani	0ad5adcdb7	net: stmmac: add Amlogic Meson glue layer The Ethernet controller available in Meson6 and Meson8 SoCs is a Synopsys DesignWare MAC IP core, already supported by the stmmac driver. This glue layer implements some platform-specific settings needed by the Amlogic variant. Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 00:12:56 -04:00
David S. Miller	4daaab4f0c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-09-24 16:48:32 -04:00
Linus Torvalds	b94d525e58	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Here is a quick pull request primarily meant to address the deconfig fallout from changing SCSI_NETLINK from being used via 'select' to being used via 'depends'. I applied a set of 5 patches written by Michal Marek, and then I carefully audited all of the remaining config files, basically: 1) I scanned every arch config file, and if it mentioned CONFIG_INET or CONFIG_UNIX, I made sure it had CONFIG_NET=y 2) After that, I scanned every arch config file, and if it did not have CONFIG_NET=y I made sure it did not reference any networking config options. Finally, we have some late breaking wireless fixes in here from John Linville and co" [ And there's a sparc bpf fix snuck in too ] * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: sparc: bpf_jit: fix loads from negative offsets parisc: Update defconfigs which were missing CONFIG_NET. powerpc: Update defconfigs which were missing CONFIG_NET. s390: Update defconfigs which were missing CONFIG_NET. mips: Update some more defconfigs which were missing CONFIG_NET. sparc: Set CONFIG_NET=y in defconfigs sh: Set CONFIG_NET=y in defconfigs powerpc: Set CONFIG_NET=y in defconfigs parisc: Set CONFIG_NET=y in defconfigs mips: Set CONFIG_NET=y in defconfigs brcmfmac: Fix off by one bug in brcmf_count_20mhz_channels() ath9k: Fix NULL pointer dereference on early irq net: rfkill: gpio: Fix clock status NFC: st21nfca: Fix potential depmod dependency cycle NFC: st21nfcb: Fix depmod dependency cycle NFC: microread: Potential overflows in microread_target_discovered()	2014-09-24 12:45:24 -07:00
Alexei Starovoitov	35607b02db	sparc: bpf_jit: fix loads from negative offsets - fix BPF_LD\|ABS\|IND from negative offsets: make sure to sign extend lower 32 bits in 64-bit register before calling C helpers from JITed code, otherwise 'int k' argument of bpf_internal_load_pointer_neg_helper() function will be added as large unsigned integer, causing packet size check to trigger and abort the program. It's worth noting that JITed code for 'A = A op K' will affect upper 32 bits differently depending whether K is simm13 or not. Since small constants are sign extended, whereas large constants are stored in temp register and zero extended. That is ok and we don't have to pay a penalty of sign extension for every sethi, since all classic BPF instructions have 32-bit semantics and we only need to set correct upper bits when transitioning from JITed code into C. - though instructions 'A &= 0' and 'A *= 0' are odd, JIT compiler should not optimize them out Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 15:04:07 -04:00
David S. Miller	543a2dff5e	Merge tag 'master-2014-09-23' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-09-23 Please consider pulling this one last batch of fixes intended for the 3.17 stream! For the NFC bits, Samuel says: "Hopefully not too late for a handful of NFC fixes: - 2 potential build failures for ST21NFCA and ST21NFCB, triggered by a depmod dependency cycle. - One potential buffer overflow in the microread driver." On top of that... Emil Goode provides a fix for a brcmfmac off-by-one regression which was introduced in the 3.17 cycle. Loic Poulain fixes a polarity mismatch for a variable assignment inside of rfkill-gpio. Wojciech Dubowik prevents a NULL pointer dereference in ath9k. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 15:00:12 -04:00
David S. Miller	c899c3f364	parisc: Update defconfigs which were missing CONFIG_NET. Commit `df568d8e` ("scsi: Use 'depends' with LIBFC instead of 'select'.") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 14:34:29 -04:00
David S. Miller	95d77997fd	powerpc: Update defconfigs which were missing CONFIG_NET. Commit `df568d8e` ("scsi: Use 'depends' with LIBFC instead of 'select'.") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 14:34:29 -04:00
David S. Miller	ff408ba1fc	s390: Update defconfigs which were missing CONFIG_NET. Commit `df568d8e` ("scsi: Use 'depends' with LIBFC instead of 'select'.") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 14:34:20 -04:00
David S. Miller	af4de1b568	mips: Update some more defconfigs which were missing CONFIG_NET. Commit `df568d8e` ("scsi: Use 'depends' with LIBFC instead of 'select'.") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 13:44:16 -04:00
Michal Marek	1ab0b8b200	sparc: Set CONFIG_NET=y in defconfigs Commit `5d6be6a5` ("scsi_netlink : Make SCSI_NETLINK dependent on NET instead of selecting NET") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: sparclinux@vger.kernel.org Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 13:38:30 -04:00
Michal Marek	925f7fadad	sh: Set CONFIG_NET=y in defconfigs Commit `5d6be6a5` ("scsi_netlink : Make SCSI_NETLINK dependent on NET instead of selecting NET") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-sh@vger.kernel.org Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 13:38:30 -04:00
Michal Marek	853e3e1d8e	powerpc: Set CONFIG_NET=y in defconfigs Commit `5d6be6a5` ("scsi_netlink : Make SCSI_NETLINK dependent on NET instead of selecting NET") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 13:38:29 -04:00
Michal Marek	25fee47f9c	parisc: Set CONFIG_NET=y in defconfigs Commit `5d6be6a5` ("scsi_netlink : Make SCSI_NETLINK dependent on NET instead of selecting NET") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-parisc@vger.kernel.org Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 13:38:29 -04:00
Michal Marek	d1630f9ef2	mips: Set CONFIG_NET=y in defconfigs Commit `5d6be6a5` ("scsi_netlink : Make SCSI_NETLINK dependent on NET instead of selecting NET") removed what happened to be the only instance of 'select NET'. Defconfigs that were relying on the select now lack networking support. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-mips@linux-mips.org Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-24 13:38:29 -04:00
Linus Torvalds	02f130a787	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull one last block fix from Jens Axboe: "We've had an issue with scsi-mq where probing takes forever. This was bisected down to the percpu changes for blk_mq_queue_enter(), and the fact we now suffer an RCU grace period when killing a queue. SCSI creates and destroys tons of queues, so this led to 10s of seconds of stalls at boot for some. Tejun has a real fix for this, but it's too involved for 3.17. So this is a temporary workaround to expedite the queue killing until we can fold in the real fix for 3.18 when that merge window opens" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe	2014-09-24 09:51:50 -07:00
Linus Torvalds	2d7ed01e5b	PCI updates for v3.17: Enumeration - Revert "PCI: Don't scan random busses in pci_scan_bridge()" (Bjorn Helgaas) - Revert "PCI: Make sure bus number resources stay within their parents bounds" (Bjorn Helgaas) PCI device hotplug - Fix pciehp pcie_wait_cmd() timeout (Yinghai Lu) Merge tag 'pci-v3.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Here are a few fixes that should be in v3.17. - Reverting "Don't scan random busses" covers up a CardBus regression having to do with allocating CardBus bus numbers. - Reverting "Make sure bus numbers stay within parents bounds" covers up an ACPI _CRS bug that makes us reconfigure a bridge, causing a broken device behind it to stop responding. - The pciehp timeout change fixes some code we added in v3.17. Without the fix, we can send a new hotplug command too early, before the timeout has expired. I hope for better fixes for the reverts, but those will have to come after v3.17" * tag 'pci-v3.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: pciehp: Fix pcie_wait_cmd() timeout Revert "PCI: Make sure bus number resources stay within their parents bounds" Revert "PCI: Don't scan random busses in pci_scan_bridge()"	2014-09-24 09:46:29 -07:00
Linus Torvalds	2368a9426f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes three issues: - if ccp is loaded on a machine without ccp, it will incorrectly activate causing all requests to fail. Fixed by preventing ccp from loading if hardware isn't available. - not all IRQs were enabled for the qat driver, leading to potential stalls when it is used - disabled buggy AVX CTR implementation in aesni" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni - disable "by8" AVX CTR optimization crypto: ccp - Check for CCP before registering crypto algs crypto: qat - Enable all 32 IRQs	2014-09-24 09:37:35 -07:00
Linus Torvalds	eb55a2a95d	media fixes for v3.17-rc7 Merge tag 'media/v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "For some last time fixes: - a regression detected on Kernel 3.16 related to VBI Teletext application breakage on drivers using videobuf2 (see https://bugzilla.kernel.org/show_bug.cgi?id=84401). The bug was noticed on saa7134 (migrated to VB2 on 3.16), but also affects em28xx (migrated on 3.9 to VB2); - two additional sanity checks at videobuf2; - two fixups to restore proper VBI support at the em28xx driver; - two Kernel oops fixups (at cx24123 and cx2341x drivers); - a bug at adv7604 where an if was doing just the opposite as it would be expected; - some documentation fixups to match the behavior defined at the Kernel" * tag 'media/v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] em28xx-v4l: get rid of field "users" in struct em28xx_v4l2" [media] em28xx: fix VBI handling logic [media] DocBook media: improve the poll() documentation [media] DocBook media: fix the poll() 'no QBUF' documentation [media] vb2: fix VBI/poll regression [media] cx2341x: fix kernel oops [media] cx24123: fix kernel oops due to missing parent pointer [media] adv7604: fix inverted condition [media] media/radio: fix radio-miropcm20.c build with io.h header file [media] vb2: fix plane index sanity check in vb2_plane_cookie() [media] DocBook media: update version number and V4L2 changes [media] DocBook media: fix fieldname in struct v4l2_subdev_selection [media] vb2: fix vb2 state check when start_streaming fails [media] videobuf2-core.h: fix comment [media] videobuf2-core: add comments before the WARN_ON [media] videobuf2-dma-sg: fix for wrong GFP mask to sg_alloc_table_from_pages	2014-09-24 09:03:43 -07:00
Linus Torvalds	a90e41e228	Bugfixes for md/raid1 particularly, but not only, fixing new "resync" code. Merge tag 'md/3.17-more-fixes' of git://git.neil.brown.name/md Pull bugfixes for md/raid1 from Neil Brown: "It is amazing how much easier it is to find bugs when you know one is there. Two bug reports resulted in finding 7 bugs! All are tagged for -stable. Those that can't cause (rare) data corruption, cause lockups. Particularly, but not only, fixing new "resync" code" * tag 'md/3.17-more-fixes' of git://git.neil.brown.name/md: md/raid1: fix_read_error should act on all non-faulty devices. md/raid1: count resync requests in nr_pending. md/raid1: update next_resync under resync_lock. md/raid1: Don't use next_resync to determine how far resync has progressed md/raid1: make sure resync waits for conflicting writes to complete. md/raid1: clean up request counts properly in close_sync() md/raid1: be more cautious where we read-balance during resync. md/raid1: intialise start_next_window for READ case to avoid hang	2014-09-24 08:53:33 -07:00
Tejun Heo	0a30288da1	blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe blk-mq uses percpu_ref for its usage counter which tracks the number of in-flight commands and used to synchronously drain the queue on freeze. percpu_ref shutdown takes measureable wallclock time as it involves a sched RCU grace period. This means that draining a blk-mq takes measureable wallclock time. One would think that this shouldn't matter as queue shutdown should be a rare event which takes place asynchronously w.r.t. userland. Unfortunately, SCSI probing involves synchronously setting up and then tearing down a lot of request_queues back-to-back for non-existent LUNs. This means that SCSI probing may take more than ten seconds when scsi-mq is used. This will be properly fixed by implementing a mechanism to keep q->mq_usage_counter in atomic mode till genhd registration; however, that involves rather big updates to percpu_ref which is difficult to apply late in the devel cycle (v3.17-rc6 at the moment). As a stop-gap measure till the proper fix can be implemented in the next cycle, this patch introduces __percpu_ref_kill_expedited() and makes blk_mq_freeze_queue() use it. This is heavy-handed but should work for testing the experimental SCSI blk-mq implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Christoph Hellwig <hch@infradead.org> Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de Fixes: `add703fda9` ("blk-mq: use percpu_ref for mq usage count") Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-09-24 08:29:36 -06:00
Mathias Krause	7da4b29d49	crypto: aesni - disable "by8" AVX CTR optimization The "by8" implementation introduced in commit `22cddcc7df` ("crypto: aes - AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it handles counter block overflows differently. It only accounts the right most 32 bit as a counter -- not the whole block as all other implementations do. This makes it fail the cryptomgr test #4 that specifically tests this corner case. As we're quite late in the release cycle, just disable the "by8" variant for now. Reported-by: Romain Francoise <romain@orebokech.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-09-24 21:15:31 +08:00
Tom Lendacky	c9f21cb638	crypto: ccp - Check for CCP before registering crypto algs If the ccp is built as a built-in module, then ccp-crypto (whether built as a module or a built-in module) will be able to load and it will register its crypto algorithms. If the system does not have a CCP this will result in -ENODEV being returned whenever a command is attempted to be queued by the registered crypto algorithms. Add an API, ccp_present(), that checks for the presence of a CCP on the system. The ccp-crypto module can use this to determine if it should register its crypto algorithms. Cc: stable@vger.kernel.org Reported-by: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-09-24 14:23:34 +08:00
Linus Torvalds	452b6361c4	Last late set of InfiniBand/RDMA fixes for 3.17: - Fixes for the new memory region re-registration support - iSER initiator error path fixes - Grab bag of small fixes for the qib and ocrdma hardware drivers - Larger set of fixes for mlx4, especially in RoCE mode Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma fixes from Roland Dreier: "Last late set of InfiniBand/RDMA fixes for 3.17: - fixes for the new memory region re-registration support - iSER initiator error path fixes - grab bag of small fixes for the qib and ocrdma hardware drivers - larger set of fixes for mlx4, especially in RoCE mode" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits) IB/mlx4: Fix VF mac handling in RoCE IB/mlx4: Do not allow APM under RoCE IB/mlx4: Don't update QP1 in native mode IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses IB/core: When marshaling uverbs path, clear unused fields IB/mlx4: Avoid executing gid task when device is being removed IB/mlx4: Fix lockdep splat for the iboe lock IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up IB/mlx4: Reorder steps in RoCE GID table initialization IB/mlx4: Don't duplicate the default RoCE GID IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs() IB/iser: Bump version to 1.4.1 IB/iser: Allow bind only when connection state is UP IB/iser: Fix RX/TX CQ resource leak on error flow RDMA/ocrdma: Use right macro in query AH RDMA/ocrdma: Resolve L2 address when creating user AH mlx4: Correct error flows in rereg_mr IB/qib: Correct reference counting in debugfs qp_stats IPoIB: Remove unnecessary port query ...	2014-09-23 16:47:34 -07:00
Linus Torvalds	ffd4341d6a	sound fixes for 3.17-rc7 (or final) One fix is about a buggy computation in PCM API function Clemens spotted out, but the impact must be really small as no one really uses it in user-space side. The rest are a trivial fix for a HD-audio model and a USB-audio device-specific regression fix, so all look fairly safe to apply. Merge tag 'sound-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "One fix is about a buggy computation in PCM API function Clemens spotted out, but the impact must be really small as no one really uses it in user-space side. The rest are a trivial fix for a HD-audio model and a USB-audio device-specific regression fix, so all look fairly safe to apply" * tag 'sound-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: snd-usb-caiaq: Fix LED commands for Kore controller ALSA: pcm: fix fifo_size frame calculation ALSA: hda - Add fixup model name lookup for Lemote A1205	2014-09-23 14:47:11 -07:00
Linus Torvalds	31f9bf46a5	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull final block fixes from Jens Axboe: "This week and last we've been fixing some corner cases related to blk-mq, mostly. I ended up pulling most of that out of for-linus yesterday, which is why the branch looks fresh. The rest were postponed for 3.18. This pull request contains: - Fix from Christoph, avoiding a stack overflow when FUA insertion would recurse infinitely. - Fix from David Hildenbrand on races between the timeout handler and uninitialized requests. Fixes a real issue that virtio_blk has run into. - A few fixes from me: - Ensure that request deadline/timeout is ordered before the request is marked as started. - A potential oops on out-of-memory, when we scale the queue depth of the device and retry. - A hang fix on requeue from SCSI, where the hardware queue would be stopped when we attempt to re-run it (and hence nothing would happen, stalling progress). - A fix for commit `2da78092`, where the cleanup path was moved to RCU, but a debug might_sleep() was inadvertently left in the code. This causes warnings for people" * 'for-linus' of git://git.kernel.dk/linux-block: genhd: fix leftover might_sleep() in blk_free_devt() blk-mq: use blk_mq_start_hw_queues() when running requeue work blk-mq: fix potential oops on out-of-memory in __blk_mq_alloc_rq_maps() blk-mq: avoid infinite recursion with the FUA flag blk-mq: Avoid race condition with uninitialized requests blk-mq: request deadline must be visible before marking rq as started	2014-09-23 14:45:09 -07:00
Linus Torvalds	d19eff3acf	Merge branch 'parisc-3.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "We avoid using -mfast-indirect-calls for 64bit kernel builds to prevent building an unbootable kernel due to latest gcc changes. In the pdc_stable/firmware-access driver we fix a few possible stack overflows and we now call secure_computing_strict() instead of secure_computing() which fixes upcoming SECCOMP patches in the for-next trees" * 'parisc-3.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds parisc: pdc_stable.c: Avoid potential stack overflows parisc: pdc_stable.c: Cleaning up unnecessary use of memset in conjunction with strncpy parisc: ptrace: use secure_computing_strict()	2014-09-23 14:05:32 -07:00
John David Anglin	d26a7730b5	parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds In spite of what the GCC manual says, the -mfast-indirect-calls has never been supported in the 64-bit parisc compiler. Indirect calls have always been done using function descriptors irrespective of the -mfast-indirect-calls option. Recently, it was noticed that a function descriptor was always requested when the -mfast-indirect-calls option was specified. This caused problems when the option was used in application code and doesn't make any sense because the whole point of the option is to avoid using a function descriptor for indirect calls. Fixing this broke 64-bit kernel builds. I will fix GCC but for now we need the attached change. This results in the same kernel code as before. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Helge Deller <deller@gmx.de>	2014-09-23 21:38:26 +02:00
Andy Zhou	3c4d1daece	vxlan: Fix bug introduced by commit `acbf74a763` Commit `acbf74a763` ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.") introduced a bug in vxlan_xmit_one() function, causing it to transmit Vxlan packets without proper Vxlan header inserted. The change was not needed in the first place. Revert it. Reported-by: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-23 15:32:10 -04:00
Linus Torvalds	f0eb4a24d4	Need to rebuild defconfig files to cope with removal of "select NET" in drivers/scsi/Kconfig Merge tag 'please-pull-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 defconfig update from Tony Luck: "Need to rebuild defconfig files to cope with removal of "select NET" in drivers/scsi/Kconfig" * tag 'please-pull-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: [IA64] refresh arch/ia64/configs/* using "make savedefconfig"	2014-09-23 12:10:48 -07:00
Linus Torvalds	c1d58658d8	Fix a resource leak in tmp103 driver. Add support for two more processors to fam15h_power driver. Also fix a bug in the same driver to only report the power level on chips which actually support reporting it. Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Fix a resource leak in tmp103 driver - Add support for two more processors to fam15h_power driver - Also fix a bug in the same driver to only report the power level on chips which actually support reporting it * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (tmp103) Fix resource leak bug in tmp103 temperature sensor driver hwmon: (fam15h_power) Add support for two more processors hwmon: (fam15h_power) Make actual power reporting conditional	2014-09-23 11:53:28 -07:00
Tony Luck	e8ee39e227	[IA64] refresh arch/ia64/configs/* using "make savedefconfig" Prompted by a change to drivers/scsi/Kconfig which used to do a "select NET" but now does a "depends on NET". This meant that some configurations ended up without CONFIG_NET=y. Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-09-23 11:09:29 -07:00
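
The sparc bpf_jit fix listed above (35607b02db) turns on how a negative 32-bit offset looks once it sits in a 64-bit register. The following is a minimal, portable C sketch of that effect, not the JIT code itself: the function names and the simplified bounds check are hypothetical and stand in for the real helper only to show why the lower 32 bits must be sign extended before the value is handed to C code.

```c
#include <assert.h>
#include <stdint.h>

/* Offset as C code would see it if the upper 32 bits of the register
 * were left as zero: -2 turns into a large positive number. */
static int64_t offset_zero_extended(uint64_t reg)
{
    return (int64_t)(reg & 0xffffffffu);
}

/* Offset after sign extending the lower 32 bits: bit 31 is copied
 * into the upper half, so the negative value survives. */
static int64_t offset_sign_extended(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;
    return (low & 0x80000000u) ? (int64_t)low - 4294967296LL : (int64_t)low;
}

/* Hypothetical, simplified size check: is base + off inside [0, len)? */
static int in_bounds(int64_t base, int64_t off, int64_t len)
{
    int64_t pos = base + off;
    return pos >= 0 && pos < len;
}
```

With a register holding 0xfffffffe (the 32-bit pattern for -2), the zero-extended reading yields 4294967294 and fails the size check against any realistic length, while the sign-extended reading yields -2 and passes when the base is at least 2.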


470629 Commits