2017-09-19 76 views
6

This question has been asked before on the Internet, but I could not find a good answer. Why does the Linux kernel have both `struct sock` and `struct socket`?

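The Linux kernel networking stack has two structures: `struct socket` and `struct sock`.

The two structures are mostly linked to each other, but they seem to have slightly different lifetimes. One can find `sk` through `sock->sk`, or `sock` through `sk->sk_socket`.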
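To make the two cross pointers concrete, here is a minimal user-space sketch. The `toy_*` names are hypothetical stand-ins, not kernel code, and the lifetime behaviour is an assumption for illustration: the user-facing object is freed first, while the protocol-side object lingers a little longer (as happens, for example, when a TCP connection is still closing after `close()`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel structures; only the cross pointers are modelled. */
struct toy_sock;

struct toy_socket {
    struct toy_sock *sk;          /* mirrors sock->sk */
};

struct toy_sock {
    struct toy_socket *sk_socket; /* mirrors sk->sk_socket */
};

/* Link the two objects in both directions. */
static void toy_link(struct toy_sock *sk, struct toy_socket *sock)
{
    sk->sk_socket = sock;
    sock->sk = sk;
}

/* Break the link in both directions; sk becomes an orphan with no owner. */
static void toy_unlink(struct toy_sock *sk)
{
    if (sk->sk_socket)
        sk->sk_socket->sk = NULL;
    sk->sk_socket = NULL;
}

/* Returns 1 if the protocol-side object survives the user-facing one. */
int toy_link_demo(void)
{
    struct toy_socket *sock = calloc(1, sizeof(*sock));
    struct toy_sock *sk = calloc(1, sizeof(*sk));

    if (!sock || !sk) {
        free(sock);
        free(sk);
        return -1;
    }

    toy_link(sk, sock);
    assert(sock->sk == sk && sk->sk_socket == sock);

    /* The user-facing object goes away first ... */
    toy_unlink(sk);
    free(sock);

    /* ... while sk is still valid, just without a back pointer. */
    int ok = (sk->sk_socket == NULL);
    free(sk);
    return ok;
}
```

Any code that holds only the protocol-side object therefore has to tolerate a NULL back pointer.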

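Why are there two structures for storing information about a socket? Suppose I need to add a new field: when should it go into `struct socket`, and when into `struct sock`?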

UPDATE: Note that I am referring to `struct socket` in include/linux/net.h of the Linux source, which is only for code inside the kernel, and not to /usr/include/sys/socket.h, which is for user space.

+0

Well: there is an inside and an outside. (A similar thing happens with `struct stat`.) – wildplasser

Answers

4

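`struct socket` appears to be the higher-level interface used for system calls (which is why it also has a pointer to the `struct file` that represents the file descriptor).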
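The path from a file descriptor to the socket object can be sketched as follows. This is a user-space toy under stated assumptions: `toy_fd_table`, `toy_install()` and `toy_sockfd_lookup()` are hypothetical names, and the only detail borrowed from the source above is that `struct socket` carries a `file` back pointer. Storing the socket in the file's `private_data` is how this sketch chooses to model the forward direction.

```c
#include <assert.h>
#include <stddef.h>

struct toy_socket;

/* Toy version of an open file; for a socket file, private_data holds the socket. */
struct toy_file {
    void *private_data;
};

/* Toy version of the system-call-facing socket object. */
struct toy_socket {
    short type;
    struct toy_file *file;   /* back pointer, as in socket->file */
};

#define TOY_MAX_FDS 8
static struct toy_file *toy_fd_table[TOY_MAX_FDS];

/* Bind a socket to a file and publish the file under the given descriptor. */
int toy_install(int fd, struct toy_file *file, struct toy_socket *sock)
{
    if (fd < 0 || fd >= TOY_MAX_FDS || toy_fd_table[fd])
        return -1;
    file->private_data = sock;
    sock->file = file;
    toy_fd_table[fd] = file;
    return fd;
}

/* Resolve a descriptor to its socket, or NULL if nothing is installed there. */
struct toy_socket *toy_sockfd_lookup(int fd)
{
    if (fd < 0 || fd >= TOY_MAX_FDS || !toy_fd_table[fd])
        return NULL;
    return toy_fd_table[fd]->private_data;
}
```

Every socket system call starts from a descriptor, so this lookup is the step that ties the file layer to the socket layer.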

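`struct sock` is the in-kernel implementation for AF_INET sockets (there is also `struct unix_sock` for AF_UNIX sockets, which is derived from it). It can be used both by the kernel and by user space (via `struct socket`).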
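"Derived from" in C usually means embedding the base structure as the first member, so a pointer to the base can be cast to the derived type. The sketch below shows that pattern with hypothetical `toy_*` types; the field choices are illustrative assumptions, not the kernel's layout.

```c
#include <assert.h>
#include <stddef.h>

/* Toy base object: fields every protocol needs. */
struct toy_sock {
    int sk_family;
    int sk_rcvbuf;
};

/* Toy derived object: the base comes first, family-specific state follows. */
struct toy_unix_sock {
    struct toy_sock sk;   /* must stay the first member for the cast below */
    char path[108];
};

/* The downcast is only valid while the base sits at offset 0. */
_Static_assert(offsetof(struct toy_unix_sock, sk) == 0,
               "toy_sock must be the first member of toy_unix_sock");

/* Downcast from the generic object to the AF_UNIX-specific one. */
struct toy_unix_sock *toy_unix_sk(struct toy_sock *sk)
{
    return (struct toy_unix_sock *)sk;
}

/* Generic code works on the base type and never sees the derived fields. */
int toy_generic_rcvbuf(const struct toy_sock *sk)
{
    return sk->sk_rcvbuf;
}
```

With this layout, a field shared by every protocol belongs in the base structure, and a field that only one family needs belongs in that family's derived structure.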

Both were added in Linux 1.0, back in 1993, and I doubt you will find a document that spells out the original design decisions.

+0

Okay, but why didn't the kernel developers put all the fields directly into `struct socket`, instead of into `struct sock` and `struct unix_sock`? – user1202136

+1

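@user1202136: Because that would contradict [decomposition](https://en.wikipedia.org/wiki/Decomposition_(computer_science)), which makes things simpler. Also note that `AF_INET` and `AF_UNIX` are only 2 of the 43 (see `AF_MAX`) available socket types. `struct socket` is created by `socket()` before the lower-level implementation structure is created. – myaut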
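The creation order described in this comment can be sketched as a two-step process: the generic object is allocated first, and then a per-family hook attaches the implementation object. Everything below is a user-space toy; the `toy_*` names, the family numbering and the hook table are assumptions for illustration and are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_AF_UNIX 1
#define TOY_AF_INET 2
#define TOY_AF_MAX  3

struct toy_sock   { int family; };
struct toy_socket { int type; struct toy_sock *sk; };

typedef int (*toy_create_fn)(struct toy_socket *sock);

/* Shared helper: allocate the implementation object and attach it. */
static int toy_attach_sk(struct toy_socket *sock, int family)
{
    struct toy_sock *sk = calloc(1, sizeof(*sk));

    if (!sk)
        return -1;
    sk->family = family;
    sock->sk = sk;
    return 0;
}

static int toy_unix_create(struct toy_socket *sock)
{
    return toy_attach_sk(sock, TOY_AF_UNIX);
}

static int toy_inet_create(struct toy_socket *sock)
{
    return toy_attach_sk(sock, TOY_AF_INET);
}

/* One creation hook per address family; unset slots mean "unsupported". */
static toy_create_fn toy_families[TOY_AF_MAX] = {
    [TOY_AF_UNIX] = toy_unix_create,
    [TOY_AF_INET] = toy_inet_create,
};

/* Step 1: allocate the generic socket. Step 2: let the family fill in sk. */
struct toy_socket *toy_socket_create(int family, int type)
{
    struct toy_socket *sock;

    if (family < 0 || family >= TOY_AF_MAX || !toy_families[family])
        return NULL;
    sock = calloc(1, sizeof(*sock));
    if (!sock)
        return NULL;
    sock->type = type;
    if (toy_families[family](sock) != 0) {
        free(sock);
        return NULL;
    }
    return sock;
}

/* Free both halves; safe to call with NULL. */
void toy_socket_release(struct toy_socket *sock)
{
    if (!sock)
        return;
    free(sock->sk);
    free(sock);
}
```

Because the generic half knows nothing about any particular family, adding a new family in this sketch only means adding one more hook to the table.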

0

"The two structures are mostly linked -" Not sure what you mean by that.

I think you can find the answer by looking at the source files that define these structures:

socket -> linux-src/include/linux/net.h 
sock -> linux-src/include/net/sock.h 

socket

* NET  An implementation of the SOCKET network access protocol. 
*  This is the master header file for the Linux NET layer, 
*  or, in plain English: the networking handling part of the 
*  kernel. 

sock

* INET  An implementation of the TCP/IP protocol suite for the LINUX 
*  operating system. INET is implemented using the BSD Socket 
*  interface as the means of communication with the user level. 

These structures are different and represent different abstractions of a socket.

The different kinds of sockets are covered in this answer:

Unix vs BSD vs TCP vs Internet sockets?

Where to define an additional field depends on what you intend to do. Please describe your task.

Please look at the source code.

linux-src/include/linux/net.h

/* 
* NET  An implementation of the SOCKET network access protocol. 
*  This is the master header file for the Linux NET layer, 
*  or, in plain English: the networking handling part of the 
*  kernel. 
* 
* Version: @(#)net.h 1.0.3 05/25/93 
* 
* Authors: Orest Zborowski, <[email protected]> 
*  Ross Biro 
*  Fred N. van Kempen, <[email protected]> 
* 
*  This program is free software; you can redistribute it and/or 
*  modify it under the terms of the GNU General Public License 
*  as published by the Free Software Foundation; either version 
*  2 of the License, or (at your option) any later version. 
*/ 
..... 
..... 
..... 
/** 
* struct socket - general BSD socket 
* @state: socket state (%SS_CONNECTED, etc) 
* @type: socket type (%SOCK_STREAM, etc) 
* @flags: socket flags (%SOCK_NOSPACE, etc) 
* @ops: protocol specific socket operations 
* @file: File back pointer for gc 
* @sk: internal networking protocol agnostic socket representation 
* @wq: wait queue for several uses 
*/ 
struct socket { 
    socket_state  state; 

    kmemcheck_bitfield_begin(type); 
    short   type; 
    kmemcheck_bitfield_end(type); 

    unsigned long  flags; 

    struct socket_wq __rcu *wq; 

    struct file  *file; 
    struct sock  *sk; 
    const struct proto_ops *ops; 
}; 

linux-src/include/net/sock.h

/* 
* INET  An implementation of the TCP/IP protocol suite for the LINUX 
*  operating system. INET is implemented using the BSD Socket 
*  interface as the means of communication with the user level. 
* 
*  Definitions for the AF_INET socket handler. 
* 
* Version: @(#)sock.h 1.0.4 05/13/93 
* 
* Authors: Ross Biro 
*  Fred N. van Kempen, <[email protected]> 
*  Corey Minyard <[email protected]> 
*  Florian La Roche <[email protected]> 
* 
* Fixes: 
*  Alan Cox : Volatiles in skbuff pointers. See 
*     skbuff comments. May be overdone, 
*     better to prove they can be removed 
*     than the reverse. 
*  Alan Cox : Added a zapped field for tcp to note 
*     a socket is reset and must stay shut up 
*  Alan Cox : New fields for options 
* Pauline Middelink : identd support 
*  Alan Cox : Eliminate low level recv/recvfrom 
*  David S. Miller : New socket lookup architecture. 
*    Steve Whitehouse:  Default routines for sock_ops 
*    Arnaldo C. Melo : removed net_pinfo, tp_pinfo and made 
*       protinfo be just a void pointer, as the 
*       protocol specific parts were moved to 
*       respective headers and ipv4/v6, etc now 
*       use private slabcaches for its socks 
*    Pedro Hortas : New flags field for socket options 
* 
* 
*  This program is free software; you can redistribute it and/or 
*  modify it under the terms of the GNU General Public License 
*  as published by the Free Software Foundation; either version 
*  2 of the License, or (at your option) any later version. 
*/ 
.... 
.... 
.... 
/** 
    * struct sock - network layer representation of sockets 
    * @__sk_common: shared layout with inet_timewait_sock 
    * @sk_shutdown: mask of %SEND_SHUTDOWN and/or %RCV_SHUTDOWN 
    * @sk_userlocks: %SO_SNDBUF and %SO_RCVBUF settings 
    * @sk_lock: synchronizer 
    * @sk_kern_sock: True if sock is using kernel lock classes 
    * @sk_rcvbuf: size of receive buffer in bytes 
    * @sk_wq: sock wait queue and async head 
    * @sk_rx_dst: receive input route used by early demux 
    * @sk_dst_cache: destination cache 
    * @sk_dst_pending_confirm: need to confirm neighbour 
    * @sk_policy: flow policy 
    * @sk_receive_queue: incoming packets 
    * @sk_wmem_alloc: transmit queue bytes committed 
    * @sk_tsq_flags: TCP Small Queues flags 
    * @sk_write_queue: Packet sending queue 
    * @sk_omem_alloc: "o" is "option" or "other" 
    * @sk_wmem_queued: persistent queue size 
    * @sk_forward_alloc: space allocated forward 
    * @sk_napi_id: id of the last napi context to receive data for sk 
    * @sk_ll_usec: usecs to busypoll when there is no data 
    * @sk_allocation: allocation mode 
    * @sk_pacing_rate: Pacing rate (if supported by transport/packet scheduler) 
    * @sk_pacing_status: Pacing status (requested, handled by sch_fq) 
    * @sk_max_pacing_rate: Maximum pacing rate (%SO_MAX_PACING_RATE) 
    * @sk_sndbuf: size of send buffer in bytes 
    * @__sk_flags_offset: empty field used to determine location of bitfield 
    * @sk_padding: unused element for alignment 
    * @sk_no_check_tx: %SO_NO_CHECK setting, set checksum in TX packets 
    * @sk_no_check_rx: allow zero checksum in RX packets 
    * @sk_route_caps: route capabilities (e.g. %NETIF_F_TSO) 
    * @sk_route_nocaps: forbidden route capabilities (e.g NETIF_F_GSO_MASK) 
    * @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4) 
    * @sk_gso_max_size: Maximum GSO segment size to build 
    * @sk_gso_max_segs: Maximum number of GSO segments 
    * @sk_lingertime: %SO_LINGER l_linger setting 
    * @sk_backlog: always used with the per-socket spinlock held 
    * @sk_callback_lock: used with the callbacks in the end of this struct 
    * @sk_error_queue: rarely used 
    * @sk_prot_creator: sk_prot of original sock creator (see ipv6_setsockopt, 
    *   IPV6_ADDRFORM for instance) 
    * @sk_err: last error 
    * @sk_err_soft: errors that don't cause failure but are the cause of a 
    *   persistent failure not just 'timed out' 
    * @sk_drops: raw/udp drops counter 
    * @sk_ack_backlog: current listen backlog 
    * @sk_max_ack_backlog: listen backlog set in listen() 
    * @sk_uid: user id of owner 
    * @sk_priority: %SO_PRIORITY setting 
    * @sk_type: socket type (%SOCK_STREAM, etc) 
    * @sk_protocol: which protocol this socket belongs in this network family 
    * @sk_peer_pid: &struct pid for this socket's peer 
    * @sk_peer_cred: %SO_PEERCRED setting 
    * @sk_rcvlowat: %SO_RCVLOWAT setting 
    * @sk_rcvtimeo: %SO_RCVTIMEO setting 
    * @sk_sndtimeo: %SO_SNDTIMEO setting 
    * @sk_txhash: computed flow hash for use on transmit 
    * @sk_filter: socket filtering instructions 
    * @sk_timer: sock cleanup timer 
    * @sk_stamp: time stamp of last packet received 
    * @sk_tsflags: SO_TIMESTAMPING socket options 
    * @sk_tskey: counter to disambiguate concurrent tstamp requests 
    * @sk_zckey: counter to order MSG_ZEROCOPY notifications 
    * @sk_socket: Identd and reporting IO signals 
    * @sk_user_data: RPC layer private data 
    * @sk_frag: cached page frag 
    * @sk_peek_off: current peek_offset value 
    * @sk_send_head: front of stuff to transmit 
    * @sk_security: used by security modules 
    * @sk_mark: generic packet mark 
    * @sk_cgrp_data: cgroup data for this cgroup 
    * @sk_memcg: this socket's memory cgroup association 
    * @sk_write_pending: a write to stream socket waits to start 
    * @sk_state_change: callback to indicate change in the state of the sock 
    * @sk_data_ready: callback to indicate there is data to be processed 
    * @sk_write_space: callback to indicate there is bf sending space available 
    * @sk_error_report: callback to indicate errors (e.g. %MSG_ERRQUEUE) 
    * @sk_backlog_rcv: callback to process the backlog 
    * @sk_destruct: called at sock freeing time, i.e. when all refcnt == 0 
    * @sk_reuseport_cb: reuseport group container 
    * @sk_rcu: used during RCU grace period 
    */ 
struct sock { 
    /* 
    * Now struct inet_timewait_sock also uses sock_common, so please just 
    * don't add nothing before this first member (__sk_common) --acme 
    */ 
    struct sock_common __sk_common; 
#define sk_node   __sk_common.skc_node 
#define sk_nulls_node  __sk_common.skc_nulls_node 
#define sk_refcnt  __sk_common.skc_refcnt 
#define sk_tx_queue_mapping __sk_common.skc_tx_queue_mapping 

#define sk_dontcopy_begin __sk_common.skc_dontcopy_begin 
#define sk_dontcopy_end  __sk_common.skc_dontcopy_end 
#define sk_hash   __sk_common.skc_hash 
#define sk_portpair  __sk_common.skc_portpair 
#define sk_num   __sk_common.skc_num 
#define sk_dport  __sk_common.skc_dport 
#define sk_addrpair  __sk_common.skc_addrpair 
#define sk_daddr  __sk_common.skc_daddr 
#define sk_rcv_saddr  __sk_common.skc_rcv_saddr 
#define sk_family  __sk_common.skc_family 
#define sk_state  __sk_common.skc_state 
#define sk_reuse  __sk_common.skc_reuse 
#define sk_reuseport  __sk_common.skc_reuseport 
#define sk_ipv6only  __sk_common.skc_ipv6only 
#define sk_net_refcnt  __sk_common.skc_net_refcnt 
#define sk_bound_dev_if  __sk_common.skc_bound_dev_if 
#define sk_bind_node  __sk_common.skc_bind_node 
#define sk_prot   __sk_common.skc_prot 
#define sk_net   __sk_common.skc_net 
#define sk_v6_daddr  __sk_common.skc_v6_daddr 
#define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr 
#define sk_cookie  __sk_common.skc_cookie 
#define sk_incoming_cpu  __sk_common.skc_incoming_cpu 
#define sk_flags  __sk_common.skc_flags 
#define sk_rxhash  __sk_common.skc_rxhash 

    socket_lock_t  sk_lock; 
    atomic_t  sk_drops; 
    int   sk_rcvlowat; 
    struct sk_buff_head sk_error_queue; 
    struct sk_buff_head sk_receive_queue; 
    /* 
    * The backlog queue is special, it is always used with 
    * the per-socket spinlock held and requires low latency 
    * access. Therefore we special case it's implementation. 
    * Note : rmem_alloc is in this structure to fill a hole 
    * on 64bit arches, not because its logically part of 
    * backlog. 
    */ 
    struct { 
     atomic_t rmem_alloc; 
     int  len; 
     struct sk_buff *head; 
     struct sk_buff *tail; 
    } sk_backlog; 
#define sk_rmem_alloc sk_backlog.rmem_alloc 

    int   sk_forward_alloc; 
#ifdef CONFIG_NET_RX_BUSY_POLL 
    unsigned int  sk_ll_usec; 
    /* ===== mostly read cache line ===== */ 
    unsigned int  sk_napi_id; 
#endif 
    int   sk_rcvbuf; 

    struct sk_filter __rcu *sk_filter; 
    union { 
     struct socket_wq __rcu *sk_wq; 
     struct socket_wq *sk_wq_raw; 
    }; 
#ifdef CONFIG_XFRM 
    struct xfrm_policy __rcu *sk_policy[2]; 
#endif 
    struct dst_entry *sk_rx_dst; 
    struct dst_entry __rcu *sk_dst_cache; 
    atomic_t  sk_omem_alloc; 
    int   sk_sndbuf; 

    /* ===== cache line for TX ===== */ 
    int   sk_wmem_queued; 
    refcount_t  sk_wmem_alloc; 
    unsigned long  sk_tsq_flags; 
    struct sk_buff  *sk_send_head; 
    struct sk_buff_head sk_write_queue; 
    __s32   sk_peek_off; 
    int   sk_write_pending; 
    __u32   sk_dst_pending_confirm; 
    u32   sk_pacing_status; /* see enum sk_pacing */ 
    long   sk_sndtimeo; 
    struct timer_list sk_timer; 
    __u32   sk_priority; 
    __u32   sk_mark; 
    u32   sk_pacing_rate; /* bytes per second */ 
    u32   sk_max_pacing_rate; 
    struct page_frag sk_frag; 
    netdev_features_t sk_route_caps; 
    netdev_features_t sk_route_nocaps; 
    int   sk_gso_type; 
    unsigned int  sk_gso_max_size; 
    gfp_t   sk_allocation; 
    __u32   sk_txhash; 

    /* 
    * Because of non atomicity rules, all 
    * changes are protected by socket lock. 
    */ 
    unsigned int  __sk_flags_offset[0]; 
#ifdef __BIG_ENDIAN_BITFIELD 
#define SK_FL_PROTO_SHIFT 16 
#define SK_FL_PROTO_MASK 0x00ff0000 

#define SK_FL_TYPE_SHIFT 0 
#define SK_FL_TYPE_MASK 0x0000ffff 
#else 
#define SK_FL_PROTO_SHIFT 8 
#define SK_FL_PROTO_MASK 0x0000ff00 

#define SK_FL_TYPE_SHIFT 16 
#define SK_FL_TYPE_MASK 0xffff0000 
#endif 

    kmemcheck_bitfield_begin(flags); 
    unsigned int  sk_padding : 1, 
       sk_kern_sock : 1, 
       sk_no_check_tx : 1, 
       sk_no_check_rx : 1, 
       sk_userlocks : 4, 
       sk_protocol : 8, 
       sk_type  : 16; 
#define SK_PROTOCOL_MAX U8_MAX 
    kmemcheck_bitfield_end(flags); 

    u16   sk_gso_max_segs; 
    unsigned long   sk_lingertime; 
    struct proto  *sk_prot_creator; 
    rwlock_t  sk_callback_lock; 
    int   sk_err, 
       sk_err_soft; 
    u32   sk_ack_backlog; 
    u32   sk_max_ack_backlog; 
    kuid_t   sk_uid; 
    struct pid  *sk_peer_pid; 
    const struct cred *sk_peer_cred; 
    long   sk_rcvtimeo; 
    ktime_t   sk_stamp; 
    u16   sk_tsflags; 
    u8   sk_shutdown; 
    u32   sk_tskey; 
    atomic_t  sk_zckey; 
    struct socket  *sk_socket; 
    void   *sk_user_data; 
#ifdef CONFIG_SECURITY 
    void   *sk_security; 
#endif 
    struct sock_cgroup_data sk_cgrp_data; 
    struct mem_cgroup *sk_memcg; 
    void   (*sk_state_change)(struct sock *sk); 
    void   (*sk_data_ready)(struct sock *sk); 
    void   (*sk_write_space)(struct sock *sk); 
    void   (*sk_error_report)(struct sock *sk); 
    int   (*sk_backlog_rcv)(struct sock *sk, 
          struct sk_buff *skb); 
    void     (*sk_destruct)(struct sock *sk); 
    struct sock_reuseport __rcu *sk_reuseport_cb; 
    struct rcu_head  sk_rcu; 
}; 
+0

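Copy-pasted source code without any explanation really does not help. I have no specific task; I just want to understand the logic behind the design. For example, why is `sk_uid` in `struct sock` rather than in `struct socket`? – user1202136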
