PSARC/2009/331 IP Datapath Refactoring
PSARC/2008/522 EOF of 2001/070 IPsec HW Acceleration support
PSARC/2009/495 netstat -r flags for blackhole and reject routes
PSARC/2009/496 EOF of XRESOLV
PSARC/2009/494 IP_DONTFRAG socket option
PSARC/2009/515 fragmentation controls for ping and traceroute
6798716 ip_newroute delenda est
6798739 ARP and IP are too separate
6807265 IPv4 ip2mac() support
6756382 Please remove Venus IPsec HWACCEL code
6880632 sendto/sendmsg never returns EHOSTUNREACH in Solaris
6748582 sendmsg() returns OK, but doesn't send message using IPv4-mapped x IPv6 addr
1119790 TCP and path mtu discovery
4637227 should support equal-cost multi-path (ECMP)
5078568 getsockopt() for IPV6_PATHMTU on a non-connected socket should not succeed
6419648 "AR* contract private note" should be removed as part of ATM SW EOL
6274715 Arp could keep the old entry in the cache while it waits for an arp response
6605615 Remove duplicated TCP/IP opt_set/opt_get code; use conn_t
6874677 IP_TTL can be used to send with ttl zero
4034090 arp should not let you delete your own entry
6882140 Implement IP_DONTFRAG socket option
6883858 Implement ping -D option; traceroute -F should work for IPv6 and shared-IP zones
1119792 TCP/IP black hole detection is broken on receiver
4078796 Directed broadcast forwarding code has problems
4104337 restrict the IPPROTO_IP and IPPROTO_IPV6 options based on the socket family
4203747 Source address selection for source routed packets
4230259 pmtu is increased every ip_ire_pathmtu_interval timer value.
4300533 When sticky option ipv6_pktinfo set to bogus address subsequent connect time out
4471035 ire_delete_cache_gw is called through ire_walk unnecessarily
4514572 SO_DONTROUTE socket option doesn't work with IPv6
4524980 tcp_lookup_ipv4() should compare the ifindex against tcpb->tcpb_bound_if
4532714 machine fails to switch quickly among failed default routes
4634219 IPv6 path mtu discovery is broken when using routing header
4691581 udp broadcast handling causes too many replicas
4708405 mcast is broken on machines when all interfaces are IFF_POINTOPOINT
4770457 netstat/route: source address of interface routes pretends to be gateway address
4786974 use routing table to determine routes/interface for multicast
4792619 An ip_fanout_udp_ipc_v6() routine might lead to some simpler code
4816115 Nuke ipsec_out_use_global_policy
4862844 ipsec offload corner case
4867533 tcp_rq and tcp_wq are redundant
4868589 NCEs should be shared across an IPMP group
4872093 unplumbing an improper virtual interface panics in ip_newroute_get_dst_ill()
4901671 FireEngine needs some cleanup
4907617 IPsec identity latching should be done before sending SYN-ACK
4941461 scopeid and IPV6_PKTINFO with UDP/ICMP connect() does not work properly
4944981 ip does nothing with IP6I_NEXTHOP
4963353 IPv4 and IPv6 proto fanout codes could be brought closer
4963360 consider passing zoneid using ip6i_t instead of ipsec_out_t in NDP
4963734 new ip6_asp locking is used incorrectly in ip_newroute_v6()
5008315 IPv6 code passes ip6i_t to IPsec code instead of ip6_t
5009636 memory leak in ip_fanout_proto_v6()
5092337 tcp/udp option handling can use some cleanup
5035841 Solaris can fail to create a valid broadcast ire
5043747 ar_query_xmit: Could not find the ace
5051574 tcp_check_policy is missing some checks
6305037 full hardware checksum is discarded when there're more than 2 mblks in the chain
6311149 ip.c needs to be put through a woodchipper
4708860 Unable to reassemble CGTP fragmented multicast packets
6224628 Large IPv6 packets with IPsec protection sometimes have length mismatch.
6213243 Solaris does not currently support Dead Gateway Detection
5029091 duplicate code in IP's input path for TCP/UDP/SCTP
4674643 through IPv6 CGTP routes, the very first packet is sent only after a while
6207318 Multiple default routes do not round robin connections to routers.
4823410 IP has an inconsistent view of link mtu
5105520 adding interface route to down interface causes ifconfig hang
5105707 advanced sockets API introduced some dead code
6318399 IP option handling for icmp and udp is too complicated
6321434 Every dropped packet in IP should use ip_drop_packet()
6341693 ifconfig mtu should operate on the physical interface, not individual ipif's
6352430 The credentials attached to an mblk are not particularly useful
6357894 uninitialised ipp_hoplimit needs to be cleaned up.
6363568 ip_xmit_v6() may be missing IRE releases in error cases
6364828 ip_rput_forward needs a makeover
6384416 System panics when running as multicast forwarder using multicast tunnels
6402382 TX: UDP v6 slowpath is not modified to handle mac_exempt conns
6418413 assertion failed ipha->ipha_ident == 0||ipha->ipha_ident == 0xFFFF
6420916 assertion failures in ipv6 wput path
6430851 use of b_prev to store ifindex is not 100% safe
6446106 IPv6 packets stored in nce->nce_qd_mp will be sent with incorrect tcp/udp checksums
6453711 SCTP OOTB sent as if generated by global zone
6465212 ARP/IP merge should remove ire_freemblk.esballoc
6490163 ip_input() could misbehave if the first mblk's size is not big enough
6496664 missing ipif_refrele leads to reference leak and deferred crash in ip_wput_ipsec_out_v6
6504856 memory leak in ip_fanout_proto_v6() when using link local outer tunnel addresses
6507765 IRE cache hash function performs badly
6510186 IP_FORWARD_PROG bit is easily overlooked
6514727 cgtp ipv6 failure on snv54
6528286 MULTIRT (CGTP) should offload checksum to hardware
6533904 SCTP: doesn't support traffic class for IPv6
6539415 TX: ipif source selection is flawed for unlabeled gateways
6539851 plumbed unworking nic blocks sending broadcast packets
6564468 non-solaris SCTP stack over rawip socket: netstat command counts rawipInData not rawipOutDatagrams
6568511 ipIfStatsOutDiscards not bumped when discarding an ipsec packet on the wrong NIC
6584162 tcp_g_q_inactive() makes incorrect use of taskq_dispatch()
6603974 round-robin default with many interfaces causes infinite temporary IRE thrashing
6611750 ilm_lookup_ill_index_v4 was born an orphan
6618423 ip_wput_frag_mdt sends out packets that void pfhooks
6620964 IRE max bucket count calculations performed in ip_ire_init() are flawed
6626266 various _broadcasts seem redundant
6638182 IP_PKTINFO + SO_DONTROUTE + CIPSO IP option == panic
6647710 IPv6 possible DoS vulnerability
6657357 nce should be kmem_cache alloc'ed from an nce_cache.
6685131 ilg_add -> conn_ilg_alloc interacting with conn_ilg[] walkers can cause panic.
6730298 adding 0.0.0.0 key with mask != 0 causes 'route delete default' to fail
6730976 vni and ipv6 doesn't quite work.
6740956 assertion failed: mp->b_next == 0L && mp->b_prev == 0L in nce_queue_mp_common()
6748515 BUMP_MIB() is occasionally done on the wrong ill
6753250 ip_output_v6() `notv6' error path has an errant ill_refrele()
6756411 NULL-pointer dereference in ip_wput_local()
6769582 IP must forward packet returned from FW-HOOK
6781525 bogus usesrc usage leads directly to panic
6422839 System panicked in ip_multicast_loopback due to NULL pointer dereference
6785521 initial IPv6 DAD solicitation is dropped in ip_newroute_ipif_v6()
6787370 ipnet devices not seeing forwarded IP packets on outgoing interface
6791187 ip*dbg() calls in ip_output_options() claim to originate from ip_wput()
6794047 nce_fp_mp prevents sharing of NCEs across an IPMP group
6797926 many unnecessary ip0dbg() in ip_rput_data_v6
6846919 Packet queued for ND gets sent in the clear.
6856591 ping doesn't send packets with DF set
6861113 arp module has incorrect dependency path for hook module
6865664 IPV6_NEXTHOP does not work with TCP socket
6874681 No ICMP time exceeded when a router receives packet with ttl = 0
6880977 ip_wput_ire() uses over 1k of stack
6595433 IPsec performance could be significantly better when calling hw crypto provider synchronously
6848397 ifconfig down of an interface can hang.
6849602 IPV6_PATHMTU size issue for UDP
6885359 Add compile-time option for testing pure IPsec overhead
6889268 Odd loopback source address selection with IPMP
6895420 assertion failed: connp->conn_helper_info == NULL
6851189 Routing-related panic occurred during reboot on T2000 system running snv_117
6896174 Post-async-encryption, AH+ESP packets may have misinitialized ipha/ip6
6896687 iptun presents IPv6 with an MTU < 1280
6897006 assertion failed: ipif->ipif_id != 0 in ip_sioctl_slifzone_restart
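
Note on the netstat -r flag changes (PSARC/2009/495): the precedence rules used when building the flag string (indirect wins over gateway, clone wins over host, lowercase 'b' for broadcast so that 'B' can mean blackhole) can be sketched in isolation as below. This is only an illustrative sketch, not the netstat code: the EX_* bits, EX_FLAGBUF and ex_route_flags() are hypothetical names with made-up values, standing in for the real RTF_*/IRE_* definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute bits; the values are illustrative only. */
enum {
	EX_INDIRECT	= 1u << 0,
	EX_GATEWAY	= 1u << 1,
	EX_CLONE	= 1u << 2,
	EX_HOST		= 1u << 3,
	EX_BROADCAST	= 1u << 4,
	EX_REJECT	= 1u << 5,
	EX_BLACKHOLE	= 1u << 6
};

/* 'U' plus at most five more letters plus the terminating NUL fits in 8. */
#define	EX_FLAGBUF	8

/*
 * Build a flag string such as "UGHR" from a set of attribute bits.
 * Returns buf so the call can be used directly as a printf argument.
 */
static char *
ex_route_flags(uint32_t attrs, char *buf, size_t buflen)
{
	char *p = buf;

	assert(buf != NULL && buflen >= EX_FLAGBUF);

	*p++ = 'U';			/* every listed route is "up" */

	/* Indirect wins over gateway: never show both. */
	if (attrs & EX_INDIRECT)
		*p++ = 'I';
	else if (attrs & EX_GATEWAY)
		*p++ = 'G';

	/* Clone wins over host: never show both. */
	if (attrs & EX_CLONE)
		*p++ = 'C';
	else if (attrs & EX_HOST)
		*p++ = 'H';

	if (attrs & EX_BROADCAST)
		*p++ = 'b';		/* lowercase: 'B' is blackhole */
	if (attrs & EX_REJECT)
		*p++ = 'R';
	if (attrs & EX_BLACKHOLE)
		*p++ = 'B';

	*p = '\0';
	return (buf);
}
```

With these assumed bits, a route marked both indirect and gateway formats as "UI", and a rejecting host route through a gateway formats as "UGHR".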
diff --git a/usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c b/usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c
index 96bcec5..1919d21 100644
--- a/usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c
+++ b/usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c
@@ -196,6 +196,7 @@
static void tcp_report(const mib_item_t *item);
static void udp_report(const mib_item_t *item);
static void group_report(mib_item_t *item);
+static void dce_report(mib_item_t *item);
static void print_ip_stats(mib2_ip_t *ip);
static void print_icmp_stats(mib2_icmp_t *icmp);
static void print_ip6_stats(mib2_ipv6IfStatsEntry_t *ip6);
@@ -236,7 +237,7 @@
static boolean_t Aflag = B_FALSE; /* All sockets/ifs/rtng-tbls */
-static boolean_t Dflag = B_FALSE; /* Debug Info */
+static boolean_t Dflag = B_FALSE; /* DCE info */
static boolean_t Iflag = B_FALSE; /* IP Traffic Interfaces */
static boolean_t Mflag = B_FALSE; /* STREAMS Memory Statistics */
static boolean_t Nflag = B_FALSE; /* Numeric Network Addresses */
@@ -248,6 +249,7 @@
static boolean_t Gflag = B_FALSE; /* Multicast group membership */
static boolean_t MMflag = B_FALSE; /* Multicast routing table */
static boolean_t DHCPflag = B_FALSE; /* DHCP statistics */
+static boolean_t Xflag = B_FALSE; /* Debug Info */
static int v4compat = 0; /* Compatible printing format for status */
@@ -276,6 +278,8 @@
static int ipv6MemberEntrySize;
static int ipv6GroupSourceEntrySize;
+static int ipDestEntrySize;
+
static int transportMLPSize;
static int tcpConnEntrySize;
static int tcp6ConnEntrySize;
@@ -298,7 +302,7 @@
/* Flags on routes */
#define FLF_A 0x00000001
-#define FLF_B 0x00000002
+#define FLF_b 0x00000002
#define FLF_D 0x00000004
#define FLF_G 0x00000008
#define FLF_H 0x00000010
@@ -306,7 +310,12 @@
#define FLF_U 0x00000040
#define FLF_M 0x00000080
#define FLF_S 0x00000100
-static const char flag_list[] = "ABDGHLUMS";
+#define FLF_C 0x00000200 /* IRE_IF_CLONE */
+#define FLF_I 0x00000400 /* RTF_INDIRECT */
+#define FLF_R 0x00000800 /* RTF_REJECT */
+#define FLF_B 0x00001000 /* RTF_BLACKHOLE */
+
+static const char flag_list[] = "AbDGHLUMSCIRB";
typedef struct filter_rule filter_t;
@@ -379,14 +388,15 @@
(void) setlocale(LC_ALL, "");
(void) textdomain(TEXT_DOMAIN);
- while ((c = getopt(argc, argv, "adimnrspMgvf:P:I:DRT:")) != -1) {
+ while ((c = getopt(argc, argv, "adimnrspMgvxf:P:I:DRT:")) != -1) {
switch ((char)c) {
case 'a': /* all connections */
Aflag = B_TRUE;
break;
- case 'd': /* turn on debugging */
+ case 'd': /* DCE info */
Dflag = B_TRUE;
+ IFLAGMOD(Iflag_only, 1, 0); /* see macro def'n */
break;
case 'i': /* interface (ill/ipif report) */
@@ -438,6 +448,10 @@
IFLAGMOD(Iflag_only, 1, 0); /* see macro def'n */
break;
+ case 'x': /* turn on debugging */
+ Xflag = B_TRUE;
+ break;
+
case 'f':
process_filter(optarg);
break;
@@ -603,7 +617,7 @@
mib_item_destroy(&previtem);
}
- if (!(Iflag || Rflag || Sflag || Mflag ||
+ if (!(Dflag || Iflag || Rflag || Sflag || Mflag ||
MMflag || Pflag || Gflag || DHCPflag)) {
if (protocol_selected(IPPROTO_UDP))
udp_report(item);
@@ -634,12 +648,14 @@
if (family_selected(AF_INET6))
ndp_report(item);
}
+ if (Dflag)
+ dce_report(item);
mib_item_destroy(&curritem);
}
/* netstat: AF_UNIX behaviour */
if (family_selected(AF_UNIX) &&
- (!(Iflag || Rflag || Sflag || Mflag ||
+ (!(Dflag || Iflag || Rflag || Sflag || Mflag ||
MMflag || Pflag || Gflag)))
unixpr(kc);
(void) kstat_close(kc);
@@ -729,7 +745,7 @@
* us information concerning IRE_MARK_TESTHIDDEN routes.
*/
req = (struct opthdr *)&tor[1];
- req->level = EXPER_IP_AND_TESTHIDDEN;
+ req->level = EXPER_IP_AND_ALL_IRES;
req->name = 0;
req->len = 0;
@@ -755,7 +771,7 @@
getcode = getmsg(sd, &ctlbuf, (struct strbuf *)0, &flags);
if (getcode == -1) {
perror("mibget getmsg(ctl) failed");
- if (Dflag) {
+ if (Xflag) {
(void) fputs("# level name len\n",
stderr);
i = 0;
@@ -774,7 +790,7 @@
toa->PRIM_type == T_OPTMGMT_ACK &&
toa->MGMT_flags == T_SUCCESS &&
req->len == 0) {
- if (Dflag)
+ if (Xflag)
(void) printf("mibget getmsg() %d returned "
"EOD (level %ld, name %ld)\n",
j, req->level, req->name);
@@ -826,7 +842,7 @@
last_item->valp = malloc((int)req->len);
if (last_item->valp == NULL)
goto error_exit;
- if (Dflag)
+ if (Xflag)
(void) printf("msg %d: group = %4d mib_id = %5d"
"length = %d\n",
j, last_item->group, last_item->mib_id,
@@ -1754,6 +1770,7 @@
ipGroupSourceEntrySize = ip->ipGroupSourceEntrySize;
ipRouteAttributeSize = ip->ipRouteAttributeSize;
transportMLPSize = ip->transportMLPSize;
+ ipDestEntrySize = ip->ipDestEntrySize;
assert(IS_P2ALIGNED(ipAddrEntrySize,
sizeof (mib2_ipAddrEntry_t *)));
assert(IS_P2ALIGNED(ipRouteEntrySize,
@@ -1850,7 +1867,7 @@
}
} /* 'for' loop 1 ends */
- if (Dflag) {
+ if (Xflag) {
(void) puts("mib_get_constants:");
(void) printf("\tipv6IfStatsEntrySize %d\n",
ipv6IfStatsEntrySize);
@@ -1872,6 +1889,7 @@
ipv6MemberEntrySize);
(void) printf("\tipv6IfIcmpEntrySize %d\n",
ipv6IfIcmpEntrySize);
+ (void) printf("\tipDestEntrySize %d\n", ipDestEntrySize);
(void) printf("\ttransportMLPSize %d\n", transportMLPSize);
(void) printf("\ttcpConnEntrySize %d\n", tcpConnEntrySize);
(void) printf("\ttcp6ConnEntrySize %d\n", tcp6ConnEntrySize);
@@ -1895,7 +1913,7 @@
/* 'for' loop 1: */
for (; item; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -2542,7 +2560,7 @@
for (tempitem = curritem;
tempitem;
tempitem = tempitem->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -2603,7 +2621,7 @@
/* 'for' loop 1: */
for (; item; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -2632,7 +2650,7 @@
boolean_t first = B_TRUE;
uint32_t new_ifindex;
- if (Dflag)
+ if (Xflag)
(void) printf("if_report: %d items\n",
(item->length)
/ sizeof (mib2_ipAddrEntry_t));
@@ -2944,7 +2962,7 @@
boolean_t first = B_TRUE;
uint32_t new_ifindex;
- if (Dflag)
+ if (Xflag)
(void) printf("if_report: %d items\n",
(item->length)
/ sizeof (mib2_ipv6AddrEntry_t));
@@ -3287,10 +3305,10 @@
(void) pr_netaddr(ap->ipAdEntAddr, ap->ipAdEntNetMask,
abuf, sizeof (abuf));
- (void) printf("%-13s %-14s %-6llu %-5s %-6llu "
+ (void) printf("%-13s %-14s %-6llu %-5s %-6s "
"%-5s %-6s %-6llu\n", abuf,
pr_addr(ap->ipAdEntAddr, dstbuf, sizeof (dstbuf)),
- statptr->ipackets, "N/A", statptr->opackets, "N/A", "N/A",
+ statptr->ipackets, "N/A", "N/A", "N/A", "N/A",
0LL);
}
}
@@ -3337,11 +3355,10 @@
else
(void) pr_prefix6(&ap6->ipv6AddrAddress,
ap6->ipv6AddrPfxLength, abuf, sizeof (abuf));
- (void) printf("%-27s %-27s %-6llu %-5s %-6llu %-5s %-6s\n",
+ (void) printf("%-27s %-27s %-6llu %-5s %-6s %-5s %-6s\n",
abuf, pr_addr6(&ap6->ipv6AddrAddress, dstbuf,
sizeof (dstbuf)),
- statptr->ipackets, "N/A",
- statptr->opackets, "N/A", "N/A");
+ statptr->ipackets, "N/A", "N/A", "N/A", "N/A");
}
}
@@ -3490,7 +3507,7 @@
/* 'for' loop 1: */
for (; item; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -3501,12 +3518,12 @@
switch (item->mib_id) {
case EXPER_IP_GROUP_MEMBERSHIP:
v4grp = item;
- if (Dflag)
+ if (Xflag)
(void) printf("item is v4grp info\n");
break;
case EXPER_IP_GROUP_SOURCES:
v4src = item;
- if (Dflag)
+ if (Xflag)
(void) printf("item is v4src info\n");
break;
default:
@@ -3518,12 +3535,12 @@
switch (item->mib_id) {
case EXPER_IP6_GROUP_MEMBERSHIP:
v6grp = item;
- if (Dflag)
+ if (Xflag)
(void) printf("item is v6grp info\n");
break;
case EXPER_IP6_GROUP_SOURCES:
v6src = item;
- if (Dflag)
+ if (Xflag)
(void) printf("item is v6src info\n");
break;
default:
@@ -3533,7 +3550,7 @@
}
if (family_selected(AF_INET) && v4grp != NULL) {
- if (Dflag)
+ if (Xflag)
(void) printf("%u records for ipGroupMember:\n",
v4grp->length / sizeof (ip_member_t));
@@ -3564,7 +3581,7 @@
if (!Vflag || v4src == NULL)
continue;
- if (Dflag)
+ if (Xflag)
(void) printf("scanning %u ipGroupSource "
"records...\n",
v4src->length/sizeof (ip_grpsrc_t));
@@ -3609,7 +3626,7 @@
}
if (family_selected(AF_INET6) && v6grp != NULL) {
- if (Dflag)
+ if (Xflag)
(void) printf("%u records for ipv6GroupMember:\n",
v6grp->length / sizeof (ipv6_member_t));
@@ -3638,7 +3655,7 @@
if (!Vflag || v6src == NULL)
continue;
- if (Dflag)
+ if (Xflag)
(void) printf("scanning %u ipv6GroupSource "
"records...\n",
v6src->length/sizeof (ipv6_grpsrc_t));
@@ -3683,6 +3700,126 @@
(void) fflush(stdout);
}
+/* --------------------- DCE_REPORT (netstat -d) ------------------------- */
+
+#define FLBUFSIZE 8
+
+/* Assumes flbuf is at least 5 characters; callers use FLBUFSIZE */
+static char *
+dceflags2str(uint32_t flags, char *flbuf)
+{
+ char *str = flbuf;
+
+ if (flags & DCEF_DEFAULT)
+ *str++ = 'D';
+ if (flags & DCEF_PMTU)
+ *str++ = 'P';
+ if (flags & DCEF_UINFO)
+ *str++ = 'U';
+ if (flags & DCEF_TOO_SMALL_PMTU)
+ *str++ = 'S';
+ *str++ = '\0';
+ return (flbuf);
+}
+
+static void
+dce_report(mib_item_t *item)
+{
+ mib_item_t *v4dce = NULL;
+ mib_item_t *v6dce = NULL;
+ int jtemp = 0;
+ char ifname[LIFNAMSIZ + 1];
+ char abuf[MAXHOSTNAMELEN + 1];
+ char flbuf[FLBUFSIZE];
+ boolean_t first;
+ dest_cache_entry_t *dce;
+
+ /* 'for' loop 1: */
+ for (; item; item = item->next_item) {
+ if (Xflag) {
+ (void) printf("\n--- Entry %d ---\n", ++jtemp);
+ (void) printf("Group = %d, mib_id = %d, "
+ "length = %d, valp = 0x%p\n",
+ item->group, item->mib_id, item->length,
+ item->valp);
+ }
+ if (item->group == MIB2_IP && family_selected(AF_INET) &&
+ item->mib_id == EXPER_IP_DCE) {
+ v4dce = item;
+ if (Xflag)
+ (void) printf("item is v4dce info\n");
+ }
+ if (item->group == MIB2_IP6 && family_selected(AF_INET6) &&
+ item->mib_id == EXPER_IP_DCE) {
+ v6dce = item;
+ if (Xflag)
+ (void) printf("item is v6dce info\n");
+ }
+ }
+
+ if (family_selected(AF_INET) && v4dce != NULL) {
+ if (Xflag)
+ (void) printf("%u records for DestCacheEntry:\n",
+ v4dce->length / ipDestEntrySize);
+
+ first = B_TRUE;
+ for (dce = (dest_cache_entry_t *)v4dce->valp;
+ (char *)dce < (char *)v4dce->valp + v4dce->length;
+ /* LINTED: (note 1) */
+ dce = (dest_cache_entry_t *)((char *)dce +
+ ipDestEntrySize)) {
+ if (first) {
+ (void) putchar('\n');
+ (void) puts("Destination Cache Entries: IPv4");
+ (void) puts(
+ "Address PMTU Age Flags");
+ (void) puts(
+ "-------------------- ------ ----- -----");
+ first = B_FALSE;
+ }
+
+ (void) printf("%-20s %6u %5u %-5s\n",
+ pr_addr(dce->DestIpv4Address, abuf, sizeof (abuf)),
+ dce->DestPmtu, dce->DestAge,
+ dceflags2str(dce->DestFlags, flbuf));
+ }
+ }
+
+ if (family_selected(AF_INET6) && v6dce != NULL) {
+ if (Xflag)
+ (void) printf("%u records for DestCacheEntry:\n",
+ v6dce->length / ipDestEntrySize);
+
+ first = B_TRUE;
+ for (dce = (dest_cache_entry_t *)v6dce->valp;
+ (char *)dce < (char *)v6dce->valp + v6dce->length;
+ /* LINTED: (note 1) */
+ dce = (dest_cache_entry_t *)((char *)dce +
+ ipDestEntrySize)) {
+ if (first) {
+ (void) putchar('\n');
+ (void) puts("Destination Cache Entries: IPv6");
+ (void) puts(
+ "Address PMTU "
+ " Age Flags If ");
+ (void) puts(
+ "--------------------------- ------ "
+ "----- ----- ---");
+ first = B_FALSE;
+ }
+
+ (void) printf("%-27s %6u %5u %-5s %s\n",
+ pr_addr6(&dce->DestIpv6Address, abuf,
+ sizeof (abuf)),
+ dce->DestPmtu, dce->DestAge,
+ dceflags2str(dce->DestFlags, flbuf),
+ dce->DestIfindex == 0 ? "" :
+ ifindex2str(dce->DestIfindex, ifname));
+ }
+ }
+ (void) fflush(stdout);
+}
+
/* --------------------- ARP_REPORT (netstat -p) -------------------------- */
static void
@@ -3703,7 +3840,7 @@
/* 'for' loop 1: */
for (; item; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -3713,7 +3850,7 @@
if (!(item->group == MIB2_IP && item->mib_id == MIB2_IP_MEDIA))
continue; /* 'for' loop 1 */
- if (Dflag)
+ if (Xflag)
(void) printf("%u records for "
"ipNetToMediaEntryTable:\n",
item->length/sizeof (mib2_ipNetToMediaEntry_t));
@@ -3798,7 +3935,7 @@
/* 'for' loop 1: */
for (; item; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -3973,7 +4110,7 @@
v4a = v4_attrs;
v6a = v6_attrs;
for (; item != NULL; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -3991,7 +4128,7 @@
else if (item->group == MIB2_IP6 && !family_selected(AF_INET6))
continue; /* 'for' loop 1 */
- if (Dflag) {
+ if (Xflag) {
if (item->group == MIB2_IP) {
(void) printf("%u records for "
"ipRouteEntryTable:\n",
@@ -4161,29 +4298,29 @@
flag_b = FLF_U;
(void) strcpy(flags, "U");
- if (rp->ipRouteInfo.re_ire_type == IRE_DEFAULT ||
- rp->ipRouteInfo.re_ire_type == IRE_PREFIX ||
- rp->ipRouteInfo.re_ire_type == IRE_HOST ||
- rp->ipRouteInfo.re_ire_type == IRE_HOST_REDIRECT) {
+ /* RTF_INDIRECT wins over RTF_GATEWAY - don't display both */
+ if (rp->ipRouteInfo.re_flags & RTF_INDIRECT) {
+ (void) strcat(flags, "I");
+ flag_b |= FLF_I;
+ } else if (rp->ipRouteInfo.re_ire_type & IRE_OFFLINK) {
(void) strcat(flags, "G");
flag_b |= FLF_G;
}
- if (rp->ipRouteMask == IP_HOST_MASK) {
+ /* IRE_IF_CLONE wins over RTF_HOST - don't display both */
+ if (rp->ipRouteInfo.re_ire_type & IRE_IF_CLONE) {
+ (void) strcat(flags, "C");
+ flag_b |= FLF_C;
+ } else if (rp->ipRouteMask == IP_HOST_MASK) {
(void) strcat(flags, "H");
flag_b |= FLF_H;
}
- if (rp->ipRouteInfo.re_ire_type == IRE_HOST_REDIRECT) {
+ if (rp->ipRouteInfo.re_flags & RTF_DYNAMIC) {
(void) strcat(flags, "D");
flag_b |= FLF_D;
}
- if (rp->ipRouteInfo.re_ire_type == IRE_CACHE) {
- /* Address resolution */
- (void) strcat(flags, "A");
- flag_b |= FLF_A;
- }
if (rp->ipRouteInfo.re_ire_type == IRE_BROADCAST) { /* Broadcast */
- (void) strcat(flags, "B");
- flag_b |= FLF_B;
+ (void) strcat(flags, "b");
+ flag_b |= FLF_b;
}
if (rp->ipRouteInfo.re_ire_type == IRE_LOCAL) { /* Local */
(void) strcat(flags, "L");
@@ -4197,6 +4334,14 @@
(void) strcat(flags, "S"); /* Setsrc */
flag_b |= FLF_S;
}
+ if (rp->ipRouteInfo.re_flags & RTF_REJECT) {
+ (void) strcat(flags, "R");
+ flag_b |= FLF_R;
+ }
+ if (rp->ipRouteInfo.re_flags & RTF_BLACKHOLE) {
+ (void) strcat(flags, "B");
+ flag_b |= FLF_B;
+ }
return (flag_b);
}
@@ -4205,9 +4350,9 @@
static const char ire_hdr_v4_compat[] =
"\n%s Table:\n";
static const char ire_hdr_v4_verbose[] =
-" Destination Mask Gateway Device Mxfrg "
-"Rtt Ref Flg Out In/Fwd %s\n"
-"-------------------- --------------- -------------------- ------ ----- "
+" Destination Mask Gateway Device "
+" MTU Ref Flg Out In/Fwd %s\n"
+"-------------------- --------------- -------------------- ------ "
"----- --- --- ----- ------ %s\n";
static const char ire_hdr_v4_normal[] =
@@ -4226,8 +4371,10 @@
char flags[10]; /* RTF_ flags */
uint_t flag_b;
- if (!(Aflag || (rp->ipRouteInfo.re_ire_type != IRE_CACHE &&
+ if (!(Aflag || (rp->ipRouteInfo.re_ire_type != IRE_IF_CLONE &&
rp->ipRouteInfo.re_ire_type != IRE_BROADCAST &&
+ rp->ipRouteInfo.re_ire_type != IRE_MULTICAST &&
+ rp->ipRouteInfo.re_ire_type != IRE_NOROUTE &&
rp->ipRouteInfo.re_ire_type != IRE_LOCAL))) {
return (first);
}
@@ -4253,15 +4400,13 @@
dstbuf, sizeof (dstbuf));
}
if (Vflag) {
- (void) printf("%-20s %-15s %-20s %-6s %5u%c %4u %3u "
+ (void) printf("%-20s %-15s %-20s %-6s %5u %3u "
"%-4s%6u %6u %s\n",
dstbuf,
pr_mask(rp->ipRouteMask, maskbuf, sizeof (maskbuf)),
pr_addrnz(rp->ipRouteNextHop, gwbuf, sizeof (gwbuf)),
octetstr(&rp->ipRouteIfIndex, 'a', ifname, sizeof (ifname)),
rp->ipRouteInfo.re_max_frag,
- rp->ipRouteInfo.re_frag_flag ? '*' : ' ',
- rp->ipRouteInfo.re_rtt,
rp->ipRouteInfo.re_ref,
flags,
rp->ipRouteInfo.re_obpkt,
@@ -4391,12 +4536,68 @@
return (B_TRUE);
}
+/*
+ * Given an IPv6 MIB2 route entry, form the list of flags for the
+ * route.
+ */
+static uint_t
+form_v6_route_flags(const mib2_ipv6RouteEntry_t *rp6, char *flags)
+{
+ uint_t flag_b;
+
+ flag_b = FLF_U;
+ (void) strcpy(flags, "U");
+ /* RTF_INDIRECT wins over RTF_GATEWAY - don't display both */
+ if (rp6->ipv6RouteInfo.re_flags & RTF_INDIRECT) {
+ (void) strcat(flags, "I");
+ flag_b |= FLF_I;
+ } else if (rp6->ipv6RouteInfo.re_ire_type & IRE_OFFLINK) {
+ (void) strcat(flags, "G");
+ flag_b |= FLF_G;
+ }
+
+ /* IRE_IF_CLONE wins over RTF_HOST - don't display both */
+ if (rp6->ipv6RouteInfo.re_ire_type & IRE_IF_CLONE) {
+ (void) strcat(flags, "C");
+ flag_b |= FLF_C;
+ } else if (rp6->ipv6RoutePfxLength == IPV6_ABITS) {
+ (void) strcat(flags, "H");
+ flag_b |= FLF_H;
+ }
+
+ if (rp6->ipv6RouteInfo.re_flags & RTF_DYNAMIC) {
+ (void) strcat(flags, "D");
+ flag_b |= FLF_D;
+ }
+ if (rp6->ipv6RouteInfo.re_ire_type == IRE_LOCAL) { /* Local */
+ (void) strcat(flags, "L");
+ flag_b |= FLF_L;
+ }
+ if (rp6->ipv6RouteInfo.re_flags & RTF_MULTIRT) {
+ (void) strcat(flags, "M"); /* Multiroute */
+ flag_b |= FLF_M;
+ }
+ if (rp6->ipv6RouteInfo.re_flags & RTF_SETSRC) {
+ (void) strcat(flags, "S"); /* Setsrc */
+ flag_b |= FLF_S;
+ }
+ if (rp6->ipv6RouteInfo.re_flags & RTF_REJECT) {
+ (void) strcat(flags, "R");
+ flag_b |= FLF_R;
+ }
+ if (rp6->ipv6RouteInfo.re_flags & RTF_BLACKHOLE) {
+ (void) strcat(flags, "B");
+ flag_b |= FLF_B;
+ }
+ return (flag_b);
+}
+
static const char ire_hdr_v6[] =
"\n%s Table: IPv6\n";
static const char ire_hdr_v6_verbose[] =
-" Destination/Mask Gateway If PMTU Rtt "
+" Destination/Mask Gateway If MTU "
"Ref Flags Out In/Fwd %s\n"
-"--------------------------- --------------------------- ----- ------ ----- "
+"--------------------------- --------------------------- ----- ----- "
"--- ----- ------ ------ %s\n";
static const char ire_hdr_v6_normal[] =
" Destination/Mask Gateway Flags Ref Use "
@@ -4414,47 +4615,14 @@
char flags[10]; /* RTF_ flags */
uint_t flag_b;
- if (!(Aflag || (rp6->ipv6RouteInfo.re_ire_type != IRE_CACHE &&
+ if (!(Aflag || (rp6->ipv6RouteInfo.re_ire_type != IRE_IF_CLONE &&
+ rp6->ipv6RouteInfo.re_ire_type != IRE_MULTICAST &&
+ rp6->ipv6RouteInfo.re_ire_type != IRE_NOROUTE &&
rp6->ipv6RouteInfo.re_ire_type != IRE_LOCAL))) {
return (first);
}
- flag_b = FLF_U;
- (void) strcpy(flags, "U");
- if (rp6->ipv6RouteInfo.re_ire_type == IRE_DEFAULT ||
- rp6->ipv6RouteInfo.re_ire_type == IRE_PREFIX ||
- rp6->ipv6RouteInfo.re_ire_type == IRE_HOST ||
- rp6->ipv6RouteInfo.re_ire_type == IRE_HOST_REDIRECT) {
- (void) strcat(flags, "G");
- flag_b |= FLF_G;
- }
-
- if (rp6->ipv6RoutePfxLength == IPV6_ABITS) {
- (void) strcat(flags, "H");
- flag_b |= FLF_H;
- }
-
- if (rp6->ipv6RouteInfo.re_ire_type == IRE_HOST_REDIRECT) {
- (void) strcat(flags, "D");
- flag_b |= FLF_D;
- }
- if (rp6->ipv6RouteInfo.re_ire_type == IRE_CACHE) {
- /* Address resolution */
- (void) strcat(flags, "A");
- flag_b |= FLF_A;
- }
- if (rp6->ipv6RouteInfo.re_ire_type == IRE_LOCAL) { /* Local */
- (void) strcat(flags, "L");
- flag_b |= FLF_L;
- }
- if (rp6->ipv6RouteInfo.re_flags & RTF_MULTIRT) {
- (void) strcat(flags, "M"); /* Multiroute */
- flag_b |= FLF_M;
- }
- if (rp6->ipv6RouteInfo.re_flags & RTF_SETSRC) {
- (void) strcat(flags, "S"); /* Setsrc */
- flag_b |= FLF_S;
- }
+ flag_b = form_v6_route_flags(rp6, flags);
if (!ire_filter_match_v6(rp6, flag_b))
return (first);
@@ -4468,7 +4636,7 @@
}
if (Vflag) {
- (void) printf("%-27s %-27s %-5s %5u%c %5u %3u "
+ (void) printf("%-27s %-27s %-5s %5u %3u "
"%-5s %6u %6u %s\n",
pr_prefix6(&rp6->ipv6RouteDest,
rp6->ipv6RoutePfxLength, dstbuf, sizeof (dstbuf)),
@@ -4478,8 +4646,6 @@
octetstr(&rp6->ipv6RouteIfIndex, 'a',
ifname, sizeof (ifname)),
rp6->ipv6RouteInfo.re_max_frag,
- rp6->ipv6RouteInfo.re_frag_flag ? '*' : ' ',
- rp6->ipv6RouteInfo.re_rtt,
rp6->ipv6RouteInfo.re_ref,
flags,
rp6->ipv6RouteInfo.re_obpkt,
@@ -4617,7 +4783,7 @@
v4a = v4_attrs;
v6a = v6_attrs;
for (; item != NULL; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -4841,7 +5007,7 @@
v6a = v6_attrs;
/* 'for' loop 1: */
for (; item; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -4916,10 +5082,7 @@
"",
miudp_state(ude->udpEntryInfo.ue_state, attr));
- /*
- * UDP sockets don't have remote attributes, so there's no need to
- * print them here.
- */
+ print_transport_label(attr);
return (first);
}
@@ -4956,10 +5119,7 @@
miudp_state(ude6->udp6EntryInfo.ue_state, attr),
ifnamep == NULL ? "" : ifnamep);
- /*
- * UDP sockets don't have remote attributes, so there's no need to
- * print them here.
- */
+ print_transport_label(attr);
return (first);
}
@@ -5321,7 +5481,7 @@
/* 'for' loop 1: */
for (; item; item = item->next_item) {
- if (Dflag) {
+ if (Xflag) {
(void) printf("\n--- Entry %d ---\n", ++jtemp);
(void) printf("Group = %d, mib_id = %d, "
"length = %d, valp = 0x%p\n",
@@ -5334,7 +5494,7 @@
switch (item->mib_id) {
case EXPER_DVMRP_VIF:
- if (Dflag)
+ if (Xflag)
(void) printf("%u records for ipVifTable:\n",
item->length/sizeof (struct vifctl));
if (item->length/sizeof (struct vifctl) == 0) {
@@ -5377,7 +5537,7 @@
break;
case EXPER_DVMRP_MRT:
- if (Dflag)
+ if (Xflag)
(void) printf("%u records for ipMfcTable:\n",
item->length/sizeof (struct vifctl));
if (item->length/sizeof (struct vifctl) == 0) {
diff --git a/usr/src/cmd/cmd-inet/usr.lib/in.mpathd/mpd_main.c b/usr/src/cmd/cmd-inet/usr.lib/in.mpathd/mpd_main.c
index 28416c4..c062199 100644
--- a/usr/src/cmd/cmd-inet/usr.lib/in.mpathd/mpd_main.c
+++ b/usr/src/cmd/cmd-inet/usr.lib/in.mpathd/mpd_main.c
@@ -2875,7 +2875,7 @@
* us information concerning IRE_MARK_TESTHIDDEN routes.
*/
req = (struct opthdr *)&tor[1];
- req->level = EXPER_IP_AND_TESTHIDDEN;
+ req->level = EXPER_IP_AND_ALL_IRES;
req->name = 0;
req->len = 0;
diff --git a/usr/src/cmd/cmd-inet/usr.lib/mdnsd/mDNSUNP.c b/usr/src/cmd/cmd-inet/usr.lib/mdnsd/mDNSUNP.c
index b76341e..2cea11b 100644
--- a/usr/src/cmd/cmd-inet/usr.lib/mdnsd/mDNSUNP.c
+++ b/usr/src/cmd/cmd-inet/usr.lib/mdnsd/mDNSUNP.c
@@ -407,6 +407,15 @@
if (ifflags & (IFF_NOXMIT | IFF_NOLOCAL | IFF_PRIVATE))
continue;
+ /* A DHCP client will have IFF_UP set yet the address is zero. Ignore */
+ if (lifr->lifr_addr.ss_family == AF_INET) {
+ struct sockaddr_in *sinptr;
+
+ sinptr = (struct sockaddr_in *) &lifr->lifr_addr;
+ if (sinptr->sin_addr.s_addr == INADDR_ANY)
+ continue;
+ }
+
if (*best_lifr != NULL) {
/*
* Check if we found a better interface by checking
diff --git a/usr/src/cmd/cmd-inet/usr.sbin/ifconfig/ifconfig.c b/usr/src/cmd/cmd-inet/usr.sbin/ifconfig/ifconfig.c
index 506b15a..868f9ab 100644
--- a/usr/src/cmd/cmd-inet/usr.sbin/ifconfig/ifconfig.c
+++ b/usr/src/cmd/cmd-inet/usr.sbin/ifconfig/ifconfig.c
@@ -3541,18 +3541,6 @@
Perror2_exit("I_PUSH", IP_MOD_NAME);
/*
- * Push the ARP module onto the interface stream. IP uses
- * this to send resolution requests up to ARP. We need to
- * do this before the SLIFNAME ioctl is sent down because
- * the interface becomes publicly known as soon as the SLIFNAME
- * ioctl completes. Thus some other process trying to bring up
- * the interface after SLIFNAME but before we have pushed ARP
- * could hang. We pop the module again later if it is not needed.
- */
- if (ioctl(ip_fd, I_PUSH, ARP_MOD_NAME) == -1)
- Perror2_exit("I_PUSH", ARP_MOD_NAME);
-
- /*
* Prepare to set IFF_IPV4/IFF_IPV6 flags as part of SIOCSLIFNAME.
* (At this point in time the kernel also allows an override of the
* IFF_CANTCHANGE flags.)
@@ -3679,12 +3667,6 @@
(void) putchar('\n');
}
- /* Check if arp is not actually needed */
- if (lifr.lifr_flags & (IFF_NOARP|IFF_IPV6)) {
- if (ioctl(ip_fd, I_POP, 0) == -1)
- Perror2_exit("I_POP", ARP_MOD_NAME);
- }
-
/*
* Open "/dev/udp" for use as a multiplexor to PLINK the
* interface stream under. We use "/dev/udp" instead of "/dev/ip"
diff --git a/usr/src/cmd/cmd-inet/usr.sbin/ping/ping.c b/usr/src/cmd/cmd-inet/usr.sbin/ping/ping.c
index 2a4ff60..d851dce 100644
--- a/usr/src/cmd/cmd-inet/usr.sbin/ping/ping.c
+++ b/usr/src/cmd/cmd-inet/usr.sbin/ping/ping.c
@@ -159,6 +159,7 @@
int npackets; /* number of packets to send */
static ushort_t tos; /* type-of-service value */
static int hoplimit = -1; /* time-to-live value */
+static int dontfrag; /* IP*_DONTFRAG */
static int timeout = TIMEOUT; /* timeout value (sec) for probes */
static struct if_entry out_if; /* interface argument */
int ident; /* ID for this ping run */
@@ -268,7 +269,7 @@
setbuf(stdout, (char *)0);
while ((c = getopt(argc, argv,
- "abA:c:dF:G:g:I:i:LlnN:P:p:rRSsTt:UvX:x:Y0123?")) != -1) {
+ "abA:c:dDF:G:g:I:i:LlnN:P:p:rRSsTt:UvX:x:Y0123?")) != -1) {
switch ((char)c) {
case 'A':
if (strcmp(optarg, "inet") == 0) {
@@ -301,6 +302,10 @@
options |= SO_DEBUG;
break;
+ case 'D':
+ dontfrag = 1;
+ break;
+
case 'b':
bypass = _B_TRUE;
break;
@@ -1303,8 +1308,6 @@
}
}
- if (nexthop != NULL && !use_udp)
- set_nexthop(family, ai_nexthop, recv_sock);
/*
* We always receive on raw icmp socket. But the sending socket can be
* raw icmp or udp, depending on the use of -U flag.
@@ -1332,9 +1335,6 @@
}
}
- if (nexthop != NULL)
- set_nexthop(family, ai_nexthop, send_sock);
-
/*
* In order to distinguish replies to our UDP probes from
* other pings', we need to know our source port number.
@@ -1368,6 +1368,9 @@
send_sock = recv_sock;
}
+ if (nexthop != NULL)
+ set_nexthop(family, ai_nexthop, send_sock);
+
int_op = 48 * 1024;
if (int_op < datalen)
int_op = datalen;
@@ -1431,6 +1434,7 @@
if (moptions & MULTICAST_TTL) {
char_op = hoplimit;
+ /* Applies to unicast and multicast. */
if (family == AF_INET) {
if (setsockopt(send_sock, IPPROTO_IP, IP_MULTICAST_TTL,
(char *)&char_op, sizeof (char)) == -1) {
@@ -1454,7 +1458,10 @@
*/
}
- /* did the user specify an interface? */
+ /*
+ * Did the user specify an interface?
+ * Applies to unicast, broadcast and multicast.
+ */
if (moptions & MULTICAST_IF) {
struct ifaddrlist *al = NULL; /* interface list */
struct ifaddrlist *my_if;
@@ -1496,6 +1503,8 @@
}
if (family == AF_INET) {
+ struct in_pktinfo pktinfo;
+
if (setsockopt(send_sock, IPPROTO_IP, IP_MULTICAST_IF,
(char *)&my_if->addr.addr,
sizeof (struct in_addr)) == -1) {
@@ -1504,6 +1513,15 @@
strerror(errno));
exit(EXIT_FAILURE);
}
+ bzero(&pktinfo, sizeof (pktinfo));
+ pktinfo.ipi_ifindex = my_if->index;
+ if (setsockopt(send_sock, IPPROTO_IP, IP_PKTINFO,
+ (char *)&pktinfo, sizeof (pktinfo)) == -1) {
+ Fprintf(stderr, "%s: setsockopt "
+ "IP_PKTINFO %s\n", progname,
+ strerror(errno));
+ exit(EXIT_FAILURE);
+ }
} else {
/*
* the outgoing interface is set in set_ancillary_data()
@@ -1525,6 +1543,23 @@
}
}
+ /* Set the option explicitly so we do not depend on the kernel default */
+ if (family == AF_INET) {
+ if (setsockopt(send_sock, IPPROTO_IP, IP_DONTFRAG,
+ (char *)&dontfrag, sizeof (dontfrag)) == -1) {
+ Fprintf(stderr, "%s: setsockopt IP_DONTFRAG %s\n",
+ progname, strerror(errno));
+ exit(EXIT_FAILURE);
+ }
+ } else {
+ if (setsockopt(send_sock, IPPROTO_IPV6, IPV6_DONTFRAG,
+ (char *)&dontfrag, sizeof (dontfrag)) == -1) {
+ Fprintf(stderr, "%s: setsockopt IPV6_DONTFRAG %s\n",
+ progname, strerror(errno));
+ exit(EXIT_FAILURE);
+ }
+ }
+
/* receiving IPv6 extension headers in verbose mode */
if (verbose && family == AF_INET6) {
if (setsockopt(recv_sock, IPPROTO_IPV6, IPV6_RECVHOPOPTS,
@@ -2336,7 +2371,7 @@
Fprintf(stderr, "usage: %s host [timeout]\n", cmdname);
Fprintf(stderr,
/* CSTYLED */
-"usage: %s -s [-l | U] [abdLnRrv] [-A addr_family] [-c traffic_class]\n\t"
+"usage: %s -s [-l | U] [abdDLnRrv] [-A addr_family] [-c traffic_class]\n\t"
"[-g gateway [-g gateway ...]] [-N nexthop] [-F flow_label] [-I interval]\n\t"
"[-i interface] [-P tos] [-p port] [-t ttl] host [data_size] [npackets]\n",
cmdname);
diff --git a/usr/src/cmd/cmd-inet/usr.sbin/route.c b/usr/src/cmd/cmd-inet/usr.sbin/route.c
index b4b16d6..aedef45 100644
--- a/usr/src/cmd/cmd-inet/usr.sbin/route.c
+++ b/usr/src/cmd/cmd-inet/usr.sbin/route.c
@@ -1,5 +1,5 @@
/*
- * Copyright 2007 Sun Microsystems, Inc. All rights reserved.
+ * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
@@ -45,8 +45,6 @@
* @(#)linkaddr.c 8.1 (Berkeley) 6/4/93
*/
-#pragma ident "%Z%%M% %I% %E% SMI"
-
#include <sys/param.h>
#include <sys/file.h>
#include <sys/socket.h>
@@ -175,6 +173,8 @@
{"show", K_SHOW},
#define K_SECATTR 43
{"secattr", K_SECATTR},
+#define K_INDIRECT 44
+ {"indirect", K_INDIRECT},
{0, 0}
};
@@ -655,7 +655,7 @@
(char *)rp < (char *)item->valp + item->length;
/* LINTED */
rp = (mib2_ipRouteEntry_t *)
- ((char *)rp + ipRouteEntrySize)) {
+ ((char *)rp + ipRouteEntrySize)) {
delRouteEntry(rp, NULL, seqno);
seqno++;
}
@@ -670,7 +670,7 @@
if (item->group == MIB2_IP6) {
ipv6RouteEntrySize =
((mib2_ipv6IfStatsEntry_t *)item->valp)->
- ipv6RouteEntrySize;
+ ipv6RouteEntrySize;
assert(IS_P2ALIGNED(ipv6RouteEntrySize,
sizeof (mib2_ipv6RouteEntry_t *)));
break;
@@ -692,7 +692,7 @@
(char *)rp6 < (char *)item->valp + item->length;
/* LINTED */
rp6 = (mib2_ipv6RouteEntry_t *)
- ((char *)rp6 + ipv6RouteEntrySize)) {
+ ((char *)rp6 + ipv6RouteEntrySize)) {
delRouteEntry(NULL, rp6, seqno);
seqno++;
}
@@ -812,7 +812,7 @@
(void) printf("%-20.20s ",
rtm->rtm_flags & RTF_HOST ? routename(sa) :
- netname(sa));
+ netname(sa));
/* LINTED */
sa = (struct sockaddr *)(salen(sa) + (char *)sa);
(void) printf("%-20.20s ", routename(sa));
@@ -861,7 +861,7 @@
cp = "default";
if (cp == NULL && !nflag) {
hp = gethostbyaddr((char *)&in, sizeof (struct in_addr),
- AF_INET);
+ AF_INET);
if (hp != NULL) {
if (((cp = strchr(hp->h_name, '.')) != NULL) &&
(strcmp(cp + 1, domain) == 0))
@@ -892,7 +892,7 @@
cp = "default";
if (cp == NULL && !nflag) {
hp = getipnodebyaddr((char *)&in6,
- sizeof (struct in6_addr), AF_INET6, &error_num);
+ sizeof (struct in6_addr), AF_INET6, &error_num);
if (hp != NULL) {
if (((cp = strchr(hp->h_name, '.')) != NULL) &&
(strcmp(cp + 1, domain) == 0))
@@ -1120,8 +1120,8 @@
break;
case AF_INET6:
if (inet_ntop(AF_INET6,
- &rcip->ri_gate.sin6.sin6_addr, obuf,
- INET6_ADDRSTRLEN) != NULL) {
+ &rcip->ri_gate.sin6.sin6_addr, obuf,
+ INET6_ADDRSTRLEN) != NULL) {
if (nflag) {
(void) fprintf(to, ": gateway %s",
obuf);
@@ -1405,6 +1405,9 @@
return (B_FALSE);
}
break;
+ case K_INDIRECT:
+ rcip->ri_flags |= RTF_INDIRECT;
+ break;
default:
if (dash_keyword) {
syntax_bad_keyword(tok + 1);
@@ -1479,8 +1482,8 @@
}
if (rcip->ri_af == AF_INET6 &&
memcmp(&rcip->ri_mask.sin6.sin6_addr,
- &in6_host_mask,
- sizeof (struct in6_addr)) == 0) {
+ &in6_host_mask,
+ sizeof (struct in6_addr)) == 0) {
rcip->ri_flags |= RTF_HOST;
}
} else {
@@ -1853,8 +1856,8 @@
break;
case AF_INET6:
if (inet_ntop(AF_INET6,
- (void *)&newrt->ri_dst.sin6.sin6_addr,
- obuf, INET6_ADDRSTRLEN) != NULL) {
+ (void *)&newrt->ri_dst.sin6.sin6_addr,
+ obuf, INET6_ADDRSTRLEN) != NULL) {
(void) printf(" %s", obuf);
break;
}
@@ -2236,7 +2239,7 @@
inet_lnaof(sin->sin_addr) == INADDR_ANY)) {
/* This looks like a network address. */
inet_makenetandmask(rcip, ntohl(val),
- sin);
+ sin);
}
}
return (B_TRUE);
@@ -2562,7 +2565,7 @@
static char routeflags[] =
"\1UP\2GATEWAY\3HOST\4REJECT\5DYNAMIC\6MODIFIED\7DONE\010MASK_PRESENT"
"\011CLONING\012XRESOLVE\013LLINFO\014STATIC\015BLACKHOLE"
- "\016PRIVATE\017PROTO2\020PROTO1\021MULTIRT\022SETSRC";
+ "\016PRIVATE\017PROTO2\020PROTO1\021MULTIRT\022SETSRC\023INDIRECT";
static char ifnetflags[] =
"\1UP\2BROADCAST\3DEBUG\4LOOPBACK\5PTP\6NOTRAILERS\7RUNNING\010NOARP"
"\011PPROMISC\012ALLMULTI\013INTELLIGENT\014MULTICAST"
@@ -2623,7 +2626,7 @@
break;
default:
(void) printf("pid: %ld, seq %d, errno %d, flags:",
- rtm->rtm_pid, rtm->rtm_seq, rtm->rtm_errno);
+ rtm->rtm_pid, rtm->rtm_seq, rtm->rtm_errno);
bprintf(stdout, rtm->rtm_flags, routeflags);
pmsg_common(rtm, msglen);
break;
@@ -2649,7 +2652,7 @@
if (rtm->rtm_msglen > (ushort_t)msglen) {
(void) fprintf(stderr,
gettext("message length mismatch, in packet %d, "
- "returned %d\n"), rtm->rtm_msglen, msglen);
+ "returned %d\n"), rtm->rtm_msglen, msglen);
}
if (rtm->rtm_errno) {
(void) fprintf(stderr, "RTM_GET: %s (errno %d)\n",
@@ -2675,7 +2678,7 @@
case RTA_IFP:
if (sa->sa_family == AF_LINK &&
((struct sockaddr_dl *)sa)->
- sdl_nlen != 0)
+ sdl_nlen != 0)
ifp = (struct sockaddr_dl *)sa;
break;
case RTA_SRC:
@@ -3122,8 +3125,8 @@
(void) fprintf(stderr, gettext("mibget %d gives "
"T_ERROR_ACK: TLI_error = 0x%lx, UNIX_error = "
"0x%lx\n"), j, tea->TLI_error, tea->UNIX_error);
- errno = (tea->TLI_error == TSYSERR)
- ? tea->UNIX_error : EPROTO;
+ errno = (tea->TLI_error == TSYSERR) ?
+ tea->UNIX_error : EPROTO;
break;
}
diff --git a/usr/src/cmd/cmd-inet/usr.sbin/traceroute/traceroute.c b/usr/src/cmd/cmd-inet/usr.sbin/traceroute/traceroute.c
index cae75df..b8b5625 100644
--- a/usr/src/cmd/cmd-inet/usr.sbin/traceroute/traceroute.c
+++ b/usr/src/cmd/cmd-inet/usr.sbin/traceroute/traceroute.c
@@ -166,6 +166,7 @@
boolean_t docksum = _B_TRUE; /* calculate checksums */
static boolean_t collect_stat = _B_FALSE; /* print statistics */
boolean_t settos = _B_FALSE; /* set type-of-service field */
+int dontfrag = 0; /* IP*_DONTFRAG */
static int max_timeout = 5; /* quit after this consecutive timeouts */
static boolean_t probe_all = _B_FALSE; /* probe all the IFs of the target */
static boolean_t pick_src = _B_FALSE; /* traceroute picks the src address */
@@ -315,6 +316,7 @@
case 'F':
off = IP_DF;
+ dontfrag = 1;
break;
case 'g':
@@ -1361,6 +1363,24 @@
exit(EXIT_FAILURE);
}
}
+
+ /* Set the option explicitly so we do not depend on the kernel default */
+ if (pr->family == AF_INET) {
+ if (setsockopt(ssock, IPPROTO_IP, IP_DONTFRAG,
+ (char *)&dontfrag, sizeof (dontfrag)) == -1) {
+ Fprintf(stderr, "%s: IP_DONTFRAG %s\n", prog,
+ strerror(errno));
+ exit(EXIT_FAILURE);
+ }
+ } else {
+ if (setsockopt(ssock, IPPROTO_IPV6, IPV6_DONTFRAG,
+ (char *)&dontfrag, sizeof (dontfrag)) == -1) {
+ Fprintf(stderr, "%s: IPV6_DONTFRAG %s\n", prog,
+ strerror(errno));
+ exit(EXIT_FAILURE);
+ }
+ }
+
if (pr->family == AF_INET) {
rcvsock4 = rsock;
sndsock4 = ssock;
diff --git a/usr/src/cmd/devfsadm/misc_link.c b/usr/src/cmd/devfsadm/misc_link.c
index 222699e..84cdb42 100644
--- a/usr/src/cmd/devfsadm/misc_link.c
+++ b/usr/src/cmd/devfsadm/misc_link.c
@@ -104,8 +104,7 @@
"(^ip$)|(^tcp$)|(^udp$)|(^icmp$)|(^sctp$)|"
"(^ip6$)|(^tcp6$)|(^udp6$)|(^icmp6$)|(^sctp6$)|"
"(^rts$)|(^arp$)|(^ipsecah$)|(^ipsecesp$)|(^keysock$)|(^spdsock$)|"
- "(^nca$)|(^rds$)|(^sdp$)|(^ipnet$)|(^dlpistub$)|(^iptunq)|"
- "(^bpf$)",
+ "(^nca$)|(^rds$)|(^sdp$)|(^ipnet$)|(^dlpistub$)|(^bpf$)",
TYPE_EXACT | DRV_RE, ILEVEL_1, minor_name
},
{ "pseudo", "ddi_pseudo",
diff --git a/usr/src/cmd/mdb/common/modules/arp/arp.c b/usr/src/cmd/mdb/common/modules/arp/arp.c
index f36a811..f97cdaa 100644
--- a/usr/src/cmd/mdb/common/modules/arp/arp.c
+++ b/usr/src/cmd/mdb/common/modules/arp/arp.c
@@ -19,12 +19,10 @@
* CDDL HEADER END
*/
/*
- * Copyright 2007 Sun Microsystems, Inc. All rights reserved.
+ * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
-#pragma ident "%Z%%M% %I% %E% SMI"
-
#include <stdio.h>
#include <sys/types.h>
#include <sys/stropts.h>
@@ -36,7 +34,6 @@
#include <inet/common.h>
#include <inet/mi.h>
#include <inet/arp.h>
-#include <inet/arp_impl.h>
#include <inet/ip.h>
#include <netinet/arp.h>
@@ -50,541 +47,10 @@
} arp_cmd_tbl;
/*
- * Table of ARP commands and structure types used for messages between ARP and
- * IP.
+ * All of the ace/arl-related code has been removed. What remains is the
+ * code for handling ioctls and printing ARP headers, which should
+ * probably be moved into the ip/mdb module.
*/
-static const arp_cmd_tbl act_list[] = {
- { AR_ENTRY_ADD, "AR_ENTRY_ADD", "arp`area_t" },
- { AR_ENTRY_DELETE, "AR_ENTRY_DELETE", "arp`ared_t" },
- { AR_ENTRY_QUERY, "AR_ENTRY_QUERY", "arp`areq_t" },
- { AR_ENTRY_SQUERY, "AR_ENTRY_SQUERY", "arp`area_t" },
- { AR_MAPPING_ADD, "AR_MAPPING_ADD", "arp`arma_t" },
- { AR_CLIENT_NOTIFY, "AR_CLIENT_NOTIFY", "arp`arcn_t" },
- { AR_INTERFACE_UP, "AR_INTERFACE_UP", "arp`arc_t" },
- { AR_INTERFACE_DOWN, "AR_INTERFACE_DOWN", "arp`arc_t" },
- { AR_INTERFACE_ON, "AR_INTERFACE_ON", "arp`arc_t" },
- { AR_INTERFACE_OFF, "AR_INTERFACE_OFF", "arp`arc_t" },
- { AR_DLPIOP_DONE, "AR_DLPIOP_DONE", "arp`arc_t" },
- { AR_ARP_CLOSING, "AR_ARP_CLOSING", "arp`arc_t" },
- { AR_ARP_EXTEND, "AR_ARP_EXTEND", "arp`arc_t" },
- { 0, "unknown command", "arp`arc_t" }
-};
-
-/*
- * State information kept during walk over ACE hash table and unhashed mask
- * list.
- */
-typedef struct ace_walk_data {
- ace_t *awd_hash_tbl[ARP_HASH_SIZE];
- ace_t *awd_masks;
- int awd_idx;
-} ace_walk_data_t;
-
-/*
- * Given the kernel address of an arl_t, return the stackid
- */
-static int
-arl_to_stackid(uintptr_t addr)
-{
- arl_t arl;
- queue_t rq;
- ar_t ar;
- arp_stack_t ass;
- netstack_t nss;
-
- if (mdb_vread(&arl, sizeof (arl), addr) == -1) {
- mdb_warn("failed to read arl_t %p", addr);
- return (0);
- }
-
- addr = (uintptr_t)arl.arl_rq;
- if (mdb_vread(&rq, sizeof (rq), addr) == -1) {
- mdb_warn("failed to read queue_t %p", addr);
- return (0);
- }
-
- addr = (uintptr_t)rq.q_ptr;
- if (mdb_vread(&ar, sizeof (ar), addr) == -1) {
- mdb_warn("failed to read ar_t %p", addr);
- return (0);
- }
-
- addr = (uintptr_t)ar.ar_as;
- if (mdb_vread(&ass, sizeof (ass), addr) == -1) {
- mdb_warn("failed to read arp_stack_t %p", addr);
- return (0);
- }
- addr = (uintptr_t)ass.as_netstack;
- if (mdb_vread(&nss, sizeof (nss), addr) == -1) {
- mdb_warn("failed to read netstack_t %p", addr);
- return (0);
- }
- return (nss.netstack_stackid);
-}
-
-static int
-arp_stacks_walk_init(mdb_walk_state_t *wsp)
-{
- if (mdb_layered_walk("netstack", wsp) == -1) {
- mdb_warn("can't walk 'netstack'");
- return (WALK_ERR);
- }
- return (WALK_NEXT);
-}
-
-static int
-arp_stacks_walk_step(mdb_walk_state_t *wsp)
-{
- uintptr_t addr;
- netstack_t nss;
-
- if (mdb_vread(&nss, sizeof (nss), wsp->walk_addr) == -1) {
- mdb_warn("can't read netstack at %p", wsp->walk_addr);
- return (WALK_ERR);
- }
- addr = (uintptr_t)nss.netstack_modules[NS_ARP];
-
- return (wsp->walk_callback(addr, wsp->walk_layer, wsp->walk_cbdata));
-}
-
-static int
-arl_stack_walk_init(mdb_walk_state_t *wsp)
-{
- uintptr_t addr;
-
- if (wsp->walk_addr == NULL) {
- mdb_warn("arl_stack supports only local walks\n");
- return (WALK_ERR);
- }
-
- addr = wsp->walk_addr + OFFSETOF(arp_stack_t, as_arl_head);
- if (mdb_vread(&wsp->walk_addr, sizeof (wsp->walk_addr),
- addr) == -1) {
- mdb_warn("failed to read 'arl_g_head'");
- return (WALK_ERR);
- }
- return (WALK_NEXT);
-}
-
-static int
-arl_stack_walk_step(mdb_walk_state_t *wsp)
-{
- uintptr_t addr = wsp->walk_addr;
- arl_t arl;
-
- if (wsp->walk_addr == NULL)
- return (WALK_DONE);
-
- if (mdb_vread(&arl, sizeof (arl), addr) == -1) {
- mdb_warn("failed to read arl_t at %p", addr);
- return (WALK_ERR);
- }
-
- wsp->walk_addr = (uintptr_t)arl.arl_next;
-
- return ((*wsp->walk_callback)(addr, &arl, wsp->walk_cbdata));
-}
-
-static int
-arl_walk_init(mdb_walk_state_t *wsp)
-{
- if (mdb_layered_walk("arp_stacks", wsp) == -1) {
- mdb_warn("can't walk 'arp_stacks'");
- return (WALK_ERR);
- }
-
- return (WALK_NEXT);
-}
-
-static int
-arl_walk_step(mdb_walk_state_t *wsp)
-{
- if (mdb_pwalk("arl_stack", wsp->walk_callback,
- wsp->walk_cbdata, wsp->walk_addr) == -1) {
- mdb_warn("couldn't walk 'arl_stack' at %p", wsp->walk_addr);
- return (WALK_ERR);
- }
- return (WALK_NEXT);
-}
-
-/*
- * Called with walk_addr being the address of arp_stack_t
- */
-static int
-ace_stack_walk_init(mdb_walk_state_t *wsp)
-{
- ace_walk_data_t *aw;
- uintptr_t addr;
-
- if (wsp->walk_addr == NULL) {
- mdb_warn("ace_stack supports only local walks\n");
- return (WALK_ERR);
- }
-
- aw = mdb_alloc(sizeof (ace_walk_data_t), UM_SLEEP);
-
- addr = wsp->walk_addr + OFFSETOF(arp_stack_t, as_ce_hash_tbl);
- if (mdb_vread(aw->awd_hash_tbl, sizeof (aw->awd_hash_tbl),
- addr) == -1) {
- mdb_warn("failed to read 'as_ce_hash_tbl'");
- mdb_free(aw, sizeof (ace_walk_data_t));
- return (WALK_ERR);
- }
-
- addr = wsp->walk_addr + OFFSETOF(arp_stack_t, as_ce_mask_entries);
- if (mdb_vread(&aw->awd_masks, sizeof (aw->awd_masks),
- addr) == -1) {
- mdb_warn("failed to read 'as_ce_mask_entries'");
- mdb_free(aw, sizeof (ace_walk_data_t));
- return (WALK_ERR);
- }
-
- /* The step routine will start off by incrementing to index 0 */
- aw->awd_idx = -1;
- wsp->walk_addr = 0;
- wsp->walk_data = aw;
-
- return (WALK_NEXT);
-}
-
-static int
-ace_stack_walk_step(mdb_walk_state_t *wsp)
-{
- uintptr_t addr;
- ace_walk_data_t *aw = wsp->walk_data;
- ace_t ace;
-
- /*
- * If we're at the end of the previous list, then find the start of the
- * next list to process.
- */
- while (wsp->walk_addr == NULL) {
- if (aw->awd_idx == ARP_HASH_SIZE)
- return (WALK_DONE);
- if (++aw->awd_idx == ARP_HASH_SIZE) {
- wsp->walk_addr = (uintptr_t)aw->awd_masks;
- } else {
- wsp->walk_addr =
- (uintptr_t)aw->awd_hash_tbl[aw->awd_idx];
- }
- }
-
- addr = wsp->walk_addr;
- if (mdb_vread(&ace, sizeof (ace), addr) == -1) {
- mdb_warn("failed to read ace_t at %p", addr);
- return (WALK_ERR);
- }
-
- wsp->walk_addr = (uintptr_t)ace.ace_next;
-
- return (wsp->walk_callback(addr, &ace, wsp->walk_cbdata));
-}
-
-static void
-ace_stack_walk_fini(mdb_walk_state_t *wsp)
-{
- mdb_free(wsp->walk_data, sizeof (ace_walk_data_t));
-}
-
-static int
-ace_walk_init(mdb_walk_state_t *wsp)
-{
- if (mdb_layered_walk("arp_stacks", wsp) == -1) {
- mdb_warn("can't walk 'arp_stacks'");
- return (WALK_ERR);
- }
-
- return (WALK_NEXT);
-}
-
-static int
-ace_walk_step(mdb_walk_state_t *wsp)
-{
- if (mdb_pwalk("ace_stack", wsp->walk_callback,
- wsp->walk_cbdata, wsp->walk_addr) == -1) {
- mdb_warn("couldn't walk 'ace_stack' at %p", wsp->walk_addr);
- return (WALK_ERR);
- }
- return (WALK_NEXT);
-}
-
-
-/* Common routine to produce an 'ar' text description */
-static void
-ar_describe(const ar_t *ar, char *buf, size_t nbytes, boolean_t addmac)
-{
- if (ar->ar_arl == NULL) {
- queue_t wq, ipq;
- ill_t ill;
- char name[LIFNAMSIZ];
- GElf_Sym sym;
- boolean_t nextip;
-
- if (mdb_vread(&wq, sizeof (wq), (uintptr_t)ar->ar_wq) == -1 ||
- mdb_vread(&ipq, sizeof (ipq), (uintptr_t)wq.q_next) == -1)
- return;
-
- nextip =
- (mdb_lookup_by_obj("ip", "ipwinit", &sym) == 0 &&
- (uintptr_t)sym.st_value == (uintptr_t)ipq.q_qinfo);
-
- if (!ar->ar_on_ill_stream) {
- (void) strcpy(buf, nextip ? "Client" : "Unknown");
- return;
- }
-
- if (!nextip ||
- mdb_vread(&ill, sizeof (ill), (uintptr_t)ipq.q_ptr) == -1 ||
- mdb_readstr(name, sizeof (name),
- (uintptr_t)ill.ill_name) == -1) {
- return;
- }
- (void) mdb_snprintf(buf, nbytes, "IP %s", name);
- } else {
- arl_t arl;
- arlphy_t ap;
- ssize_t retv;
- uint32_t alen;
- uchar_t macaddr[ARP_MAX_ADDR_LEN];
-
- if (mdb_vread(&arl, sizeof (arl), (uintptr_t)ar->ar_arl) == -1)
- return;
- retv = mdb_snprintf(buf, nbytes, "ARP %s ", arl.arl_name);
- if (retv >= nbytes || !addmac)
- return;
- if (mdb_vread(&ap, sizeof (ap), (uintptr_t)arl.arl_phy) == -1)
- return;
- alen = ap.ap_hw_addrlen;
- if (ap.ap_hw_addr == NULL || alen == 0 ||
- alen > sizeof (macaddr))
- return;
- if (mdb_vread(macaddr, alen, (uintptr_t)ap.ap_hw_addr) == -1)
- return;
- mdb_mac_addr(macaddr, alen, buf + retv, nbytes - retv);
- }
-}
-
-/* ARGSUSED2 */
-static int
-ar_cb(uintptr_t addr, const void *arptr, void *dummy)
-{
- const ar_t *ar = arptr;
- char ardesc[sizeof ("ARP ") + LIFNAMSIZ];
-
- ar_describe(ar, ardesc, sizeof (ardesc), B_FALSE);
- mdb_printf("%?p %?p %?p %s\n", addr, ar->ar_wq, ar->ar_arl, ardesc);
- return (WALK_NEXT);
-}
-
-/*
- * Print out ARP client structures.
- */
-/* ARGSUSED2 */
-static int
-ar_cmd(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
-{
- ar_t ar;
-
- if (DCMD_HDRSPEC(flags) && !(flags & DCMD_PIPE_OUT)) {
- mdb_printf("%<u>%?s %?s %?s %s%</u>\n",
- "AR", "WQ", "ARL", "TYPE");
- }
-
- if (flags & DCMD_ADDRSPEC) {
- if (mdb_vread(&ar, sizeof (ar), addr) == -1) {
- mdb_warn("failed to read ar_t at %p", addr);
- return (DCMD_ERR);
- }
- (void) ar_cb(addr, &ar, NULL);
- } else {
- if (mdb_walk("ar", ar_cb, NULL) == -1) {
- mdb_warn("cannot walk ar_t structures");
- return (DCMD_ERR);
- }
- }
- return (DCMD_OK);
-}
-
-/* ARGSUSED2 */
-static int
-arl_cb(uintptr_t addr, const void *arlptr, void *dummy)
-{
- const arl_t *arl = arlptr;
- arlphy_t ap;
- uchar_t macaddr[ARP_MAX_ADDR_LEN];
- char macstr[ARP_MAX_ADDR_LEN*3];
- char flags[4];
- const char *primstr;
-
- mdb_printf("%?p ", addr);
- if (arl->arl_dlpi_pending == DL_PRIM_INVAL)
- mdb_printf("%16s", "--");
- else if ((primstr = mdb_dlpi_prim(arl->arl_dlpi_pending)) != NULL)
- mdb_printf("%16s", primstr);
- else
- mdb_printf("%16x", arl->arl_dlpi_pending);
-
- if (mdb_vread(&ap, sizeof (ap), (uintptr_t)arl->arl_phy) == -1 ||
- ap.ap_hw_addrlen == 0 || ap.ap_hw_addrlen > sizeof (macaddr)) {
- (void) strcpy(macstr, "--");
- } else if (mdb_vread(macaddr, ap.ap_hw_addrlen,
- (uintptr_t)ap.ap_hw_addr) == -1) {
- (void) strcpy(macstr, "?");
- } else {
- mdb_mac_addr(macaddr, ap.ap_hw_addrlen, macstr,
- sizeof (macstr));
- }
-
- /* Print both the link-layer state and the NOARP flag */
- flags[0] = '\0';
- if (arl->arl_flags & ARL_F_NOARP)
- (void) strcat(flags, "N");
- switch (arl->arl_state) {
- case ARL_S_DOWN:
- (void) strcat(flags, "d");
- break;
- case ARL_S_PENDING:
- (void) strcat(flags, "P");
- break;
- case ARL_S_UP:
- (void) strcat(flags, "U");
- break;
- default:
- (void) strcat(flags, "?");
- break;
- }
- mdb_printf(" %8d %-3s %-9s %-17s %5d\n",
- mdb_mblk_count(arl->arl_dlpi_deferred), flags, arl->arl_name,
- macstr, arl_to_stackid((uintptr_t)addr));
- return (WALK_NEXT);
-}
-
-/*
- * Print out ARP link-layer elements.
- */
-/* ARGSUSED2 */
-static int
-arl_cmd(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
-{
- arl_t arl;
-
- if (DCMD_HDRSPEC(flags) && !(flags & DCMD_PIPE_OUT)) {
- mdb_printf("%<u>%?s %16s %8s %3s %9s %-17s %5s%</u>\n",
- "ARL", "DLPI REQ", "DLPI CNT", "FLG", "INTERFACE",
- "HWADDR", "STACK");
- }
-
- if (flags & DCMD_ADDRSPEC) {
- if (mdb_vread(&arl, sizeof (arl), addr) == -1) {
- mdb_warn("failed to read arl_t at %p", addr);
- return (DCMD_ERR);
- }
- (void) arl_cb(addr, &arl, NULL);
- } else {
- if (mdb_walk("arl", arl_cb, NULL) == -1) {
- mdb_warn("cannot walk arl_t structures");
- return (DCMD_ERR);
- }
- }
- return (DCMD_OK);
-}
-
-/* ARGSUSED2 */
-static int
-ace_cb(uintptr_t addr, const void *aceptr, void *dummy)
-{
- const ace_t *ace = aceptr;
- uchar_t macaddr[ARP_MAX_ADDR_LEN];
- char macstr[ARP_MAX_ADDR_LEN*3];
- /* The %b format isn't compact enough for long listings */
- static const char ace_flags[] = "SPDRMLdA ofya";
- const char *cp;
- char flags[sizeof (ace_flags)], *fp;
- int flg;
- in_addr_t inaddr, mask;
- char addrstr[sizeof ("255.255.255.255/32")];
-
- /* Walk the list of flags and produce a string */
- cp = ace_flags;
- fp = flags;
- for (flg = 1; *cp != '\0'; flg <<= 1, cp++) {
- if ((flg & ace->ace_flags) && *cp != ' ')
- *fp++ = *cp;
- }
- *fp = '\0';
-
- /* If it's not resolved, then it has no hardware address */
- if (!(ace->ace_flags & ACE_F_RESOLVED) ||
- ace->ace_hw_addr_length == 0 ||
- ace->ace_hw_addr_length > sizeof (macaddr)) {
- (void) strcpy(macstr, "--");
- } else if (mdb_vread(macaddr, ace->ace_hw_addr_length,
- (uintptr_t)ace->ace_hw_addr) == -1) {
- (void) strcpy(macstr, "?");
- } else {
- mdb_mac_addr(macaddr, ace->ace_hw_addr_length, macstr,
- sizeof (macstr));
- }
-
- /*
- * Nothing other than IP uses ARP these days, so we don't try very hard
- * here to switch out on ARP protocol type. (Note that ARP protocol
- * types are roughly Ethertypes, but are allocated separately at IANA.)
- */
- if (ace->ace_proto != IP_ARP_PROTO_TYPE) {
- (void) mdb_snprintf(addrstr, sizeof (addrstr),
- "Unknown proto %x", ace->ace_proto);
- } else if (mdb_vread(&inaddr, sizeof (inaddr),
- (uintptr_t)ace->ace_proto_addr) != -1 &&
- mdb_vread(&mask, sizeof (mask), (uintptr_t)ace->ace_proto_mask) !=
- -1) {
- /*
- * If it's the standard host mask, then print it normally.
- * Otherwise, use "/n" notation.
- */
- if (mask == (in_addr_t)~0) {
- (void) mdb_snprintf(addrstr, sizeof (addrstr), "%I",
- inaddr);
- } else {
- (void) mdb_snprintf(addrstr, sizeof (addrstr), "%I/%d",
- inaddr, mask == 0 ? 0 : 33 - mdb_ffs(mask));
- }
- } else {
- (void) strcpy(addrstr, "?");
- }
- mdb_printf("%?p %-18s %-8s %-17s %5d\n", addr, addrstr, flags,
- macstr, arl_to_stackid((uintptr_t)ace->ace_arl));
- return (WALK_NEXT);
-}
-
-/*
- * Print out ARP cache entry (ace_t) elements.
- */
-/* ARGSUSED2 */
-static int
-ace_cmd(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
-{
- ace_t ace;
-
- if (DCMD_HDRSPEC(flags) && !(flags & DCMD_PIPE_OUT)) {
- mdb_printf("%<u>%?s %-18s %-8s %-17s %5s%</u>\n",
- "ACE", "PROTOADDR", "FLAGS", "HWADDR", "STACK");
- }
-
- if (flags & DCMD_ADDRSPEC) {
- if (mdb_vread(&ace, sizeof (ace), addr) == -1) {
- mdb_warn("failed to read ace_t at %p", addr);
- return (DCMD_ERR);
- }
- (void) ace_cb(addr, &ace, NULL);
- } else {
- if (mdb_walk("ace", ace_cb, NULL) == -1) {
- mdb_warn("cannot walk ace_t structures");
- return (DCMD_ERR);
- }
- }
- return (DCMD_OK);
-}
/*
* Print an ARP hardware and protocol address pair; used when printing an ARP
@@ -696,148 +162,25 @@
return (DCMD_OK);
}
-/*
- * Print out an arp command formatted in a reasonable manner. This implements
- * the type switch used by ARP.
- *
- * It could also dump the data that follows the header (using offset and length
- * in the various structures), but it currently does not.
- */
-/* ARGSUSED2 */
-static int
-arpcmd_cmd(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
-{
- arc_t arc;
- const arp_cmd_tbl *tp;
- mdb_arg_t subargv;
-
- if (!(flags & DCMD_ADDRSPEC)) {
- mdb_warn("address required to print ARP command\n");
- return (DCMD_ERR);
- }
- if (mdb_vread(&arc, sizeof (arc), addr) == -1) {
- mdb_warn("unable to read arc_t at %p", addr);
- return (DCMD_ERR);
- }
- for (tp = act_list; tp->act_cmd != 0; tp++)
- if (tp->act_cmd == arc.arc_cmd)
- break;
- mdb_printf("%p %s (%s) = ", addr, tp->act_name, tp->act_type);
- subargv.a_type = MDB_TYPE_STRING;
- subargv.a_un.a_str = tp->act_type;
- if (mdb_call_dcmd("print", addr, DCMD_ADDRSPEC, 1, &subargv) == -1)
- return (DCMD_ERR);
- else
- return (DCMD_OK);
-}
-
-static size_t
-mi_osize(const queue_t *q)
-{
- /*
- * The code in common/inet/mi.c allocates an extra word to store the
- * size of the allocation. An mi_o_s is thus a size_t plus an mi_o_s.
- */
- struct mi_block {
- size_t mi_nbytes;
- struct mi_o_s mi_o;
- } m;
-
- if (mdb_vread(&m, sizeof (m), (uintptr_t)q->q_ptr - sizeof (m)) != -1)
- return (m.mi_nbytes - sizeof (m));
-
- return (0);
-}
-
-/*
- * This is called when ::stream is used and an ARP module is seen on the
- * stream. Determine what sort of ARP usage is involved and show an
- * appropriate message.
- */
-static void
-arp_qinfo(const queue_t *qp, char *buf, size_t nbytes)
-{
- size_t size = mi_osize(qp);
- ar_t ar;
-
- if (size != sizeof (ar_t))
- return;
- if (mdb_vread(&ar, sizeof (ar), (uintptr_t)qp->q_ptr) == -1)
- return;
- ar_describe(&ar, buf, nbytes, B_TRUE);
-}
-
-static uintptr_t
-arp_rnext(const queue_t *q)
-{
- size_t size = mi_osize(q);
- ar_t ar;
-
- if (size == sizeof (ar_t) && mdb_vread(&ar, sizeof (ar),
- (uintptr_t)q->q_ptr) != -1)
- return ((uintptr_t)ar.ar_rq);
-
- return (NULL);
-}
-
-static uintptr_t
-arp_wnext(const queue_t *q)
-{
- size_t size = mi_osize(q);
- ar_t ar;
-
- if (size == sizeof (ar_t) && mdb_vread(&ar, sizeof (ar),
- (uintptr_t)q->q_ptr) != -1)
- return ((uintptr_t)ar.ar_wq);
-
- return (NULL);
-}
-
static const mdb_dcmd_t dcmds[] = {
- { "ar", "?", "display ARP client streams for all stacks",
- ar_cmd, NULL },
- { "arl", "?", "display ARP link layers for all stacks", arl_cmd, NULL },
- { "ace", "?", "display ARP cache entries for all stacks",
- ace_cmd, NULL },
{ "arphdr", ":", "display an ARP header", arphdr_cmd, NULL },
- { "arpcmd", ":", "display an ARP command", arpcmd_cmd, NULL },
{ NULL }
};
/* Note: ar_t walker is in genunix.c and net.c; generic MI walker */
static const mdb_walker_t walkers[] = {
- { "arl", "walk list of arl_t links for all stacks",
- arl_walk_init, arl_walk_step, NULL },
- { "arl_stack", "walk list of arl_t links",
- arl_stack_walk_init, arl_stack_walk_step, NULL },
- { "ace", "walk list of ace_t entries for all stacks",
- ace_walk_init, ace_walk_step, NULL },
- { "ace_stack", "walk list of ace_t entries",
- ace_stack_walk_init, ace_stack_walk_step, ace_stack_walk_fini },
- { "arp_stacks", "walk all the arp_stack_t",
- arp_stacks_walk_init, arp_stacks_walk_step, NULL },
{ NULL }
};
-static const mdb_qops_t arp_qops = { arp_qinfo, arp_rnext, arp_wnext };
static const mdb_modinfo_t modinfo = { MDB_API_VERSION, dcmds, walkers };
const mdb_modinfo_t *
_mdb_init(void)
{
- GElf_Sym sym;
-
- if (mdb_lookup_by_obj("arp", "winit", &sym) == 0)
- mdb_qops_install(&arp_qops, (uintptr_t)sym.st_value);
-
return (&modinfo);
}
void
_mdb_fini(void)
{
- GElf_Sym sym;
-
- if (mdb_lookup_by_obj("arp", "winit", &sym) == 0)
- mdb_qops_remove(&arp_qops, (uintptr_t)sym.st_value);
}
diff --git a/usr/src/cmd/mdb/common/modules/genunix/genunix.c b/usr/src/cmd/mdb/common/modules/genunix/genunix.c
index 3e49d9a..e6fe3f7 100644
--- a/usr/src/cmd/mdb/common/modules/genunix/genunix.c
+++ b/usr/src/cmd/mdb/common/modules/genunix/genunix.c
@@ -4770,8 +4770,6 @@
NULL, modchain_walk_step, NULL },
/* from net.c */
- { "ar", "walk ar_t structures using MI for all stacks",
- mi_payload_walk_init, mi_payload_walk_step, NULL, &mi_ar_arg },
{ "icmp", "walk ICMP control structures using MI for all stacks",
mi_payload_walk_init, mi_payload_walk_step, NULL,
&mi_icmp_arg },
@@ -4779,8 +4777,6 @@
mi_walk_init, mi_walk_step, mi_walk_fini, NULL },
{ "sonode", "given a sonode, walk its children",
sonode_walk_init, sonode_walk_step, sonode_walk_fini, NULL },
- { "ar_stacks", "walk all the ar_stack_t",
- ar_stacks_walk_init, ar_stacks_walk_step, NULL },
{ "icmp_stacks", "walk all the icmp_stack_t",
icmp_stacks_walk_init, icmp_stacks_walk_step, NULL },
{ "tcp_stacks", "walk all the tcp_stack_t",
diff --git a/usr/src/cmd/mdb/common/modules/genunix/net.c b/usr/src/cmd/mdb/common/modules/genunix/net.c
index d9f4717..23d6202 100644
--- a/usr/src/cmd/mdb/common/modules/genunix/net.c
+++ b/usr/src/cmd/mdb/common/modules/genunix/net.c
@@ -45,7 +45,6 @@
#include <sys/socketvar.h>
#include <sys/cred_impl.h>
#include <inet/udp_impl.h>
-#include <inet/arp_impl.h>
#include <inet/rawip_impl.h>
#include <inet/mi.h>
#include <fs/sockfs/socktpi_impl.h>
@@ -71,31 +70,6 @@
int af;
} netstat_cb_data_t;
-/* Walkers for various *_stack_t */
-int
-ar_stacks_walk_init(mdb_walk_state_t *wsp)
-{
- if (mdb_layered_walk("netstack", wsp) == -1) {
- mdb_warn("can't walk 'netstack'");
- return (WALK_ERR);
- }
- return (WALK_NEXT);
-}
-
-int
-ar_stacks_walk_step(mdb_walk_state_t *wsp)
-{
- uintptr_t kaddr;
- netstack_t nss;
-
- if (mdb_vread(&nss, sizeof (nss), wsp->walk_addr) == -1) {
- mdb_warn("can't read netstack at %p", wsp->walk_addr);
- return (WALK_ERR);
- }
- kaddr = (uintptr_t)nss.netstack_modules[NS_ARP];
- return (wsp->walk_callback(kaddr, wsp->walk_layer, wsp->walk_cbdata));
-}
-
int
icmp_stacks_walk_init(mdb_walk_state_t *wsp)
{
@@ -201,15 +175,15 @@
static int
net_tcp_ipv4(const tcp_t *tcp)
{
- return ((tcp->tcp_ipversion == IPV4_VERSION) ||
- (IN6_IS_ADDR_UNSPECIFIED(&tcp->tcp_ip_src_v6) &&
+ return ((tcp->tcp_connp->conn_ipversion == IPV4_VERSION) ||
+ (IN6_IS_ADDR_UNSPECIFIED(&tcp->tcp_connp->conn_laddr_v6) &&
(tcp->tcp_state <= TCPS_LISTEN)));
}
static int
net_tcp_ipv6(const tcp_t *tcp)
{
- return (tcp->tcp_ipversion == IPV6_VERSION);
+ return (tcp->tcp_connp->conn_ipversion == IPV6_VERSION);
}
static int
@@ -222,15 +196,15 @@
static int
net_udp_ipv4(const udp_t *udp)
{
- return ((udp->udp_ipversion == IPV4_VERSION) ||
- (IN6_IS_ADDR_UNSPECIFIED(&udp->udp_v6src) &&
+ return ((udp->udp_connp->conn_ipversion == IPV4_VERSION) ||
+ (IN6_IS_ADDR_UNSPECIFIED(&udp->udp_connp->conn_laddr_v6) &&
(udp->udp_state <= TS_IDLE)));
}
static int
net_udp_ipv6(const udp_t *udp)
{
- return (udp->udp_ipversion == IPV6_VERSION);
+ return (udp->udp_connp->conn_ipversion == IPV6_VERSION);
}
int
@@ -399,11 +373,6 @@
return (WALK_NEXT);
}
-const mi_payload_walk_arg_t mi_ar_arg = {
- "ar_stacks", OFFSETOF(arp_stack_t, as_head), sizeof (ar_t),
- MI_PAYLOAD_DEVICE | MI_PAYLOAD_MODULE
-};
-
const mi_payload_walk_arg_t mi_icmp_arg = {
"icmp_stacks", OFFSETOF(icmp_stack_t, is_head), sizeof (icmp_t),
MI_PAYLOAD_DEVICE | MI_PAYLOAD_MODULE
@@ -632,7 +601,7 @@
tcp_kaddr = (uintptr_t)connp->conn_tcp;
if (mdb_vread(&tcps, sizeof (tcp_t), tcp_kaddr) == -1) {
- mdb_warn("failed to read tcp_t at %p", kaddr);
+ mdb_warn("failed to read tcp_t at %p", tcp_kaddr);
return (WALK_ERR);
}
@@ -648,13 +617,13 @@
mdb_printf("%0?p %2i ", tcp_kaddr, tcp->tcp_state);
if (af == AF_INET) {
- net_ipv4addrport_pr(&tcp->tcp_ip_src_v6, tcp->tcp_lport);
+ net_ipv4addrport_pr(&connp->conn_laddr_v6, connp->conn_lport);
mdb_printf(" ");
- net_ipv4addrport_pr(&tcp->tcp_remote_v6, tcp->tcp_fport);
+ net_ipv4addrport_pr(&connp->conn_faddr_v6, connp->conn_fport);
} else if (af == AF_INET6) {
- net_ipv6addrport_pr(&tcp->tcp_ip_src_v6, tcp->tcp_lport);
+ net_ipv6addrport_pr(&connp->conn_laddr_v6, connp->conn_lport);
mdb_printf(" ");
- net_ipv6addrport_pr(&tcp->tcp_remote_v6, tcp->tcp_fport);
+ net_ipv6addrport_pr(&connp->conn_faddr_v6, connp->conn_fport);
}
mdb_printf(" %5i", ns_to_stackid((uintptr_t)connp->conn_netstack));
mdb_printf(" %4i\n", connp->conn_zoneid);
@@ -687,6 +656,9 @@
return (WALK_ERR);
}
+ connp->conn_udp = &udp;
+ udp.udp_connp = connp;
+
if (!((opts & NETSTAT_ALL) || net_udp_active(&udp)) ||
(af == AF_INET && !net_udp_ipv4(&udp)) ||
(af == AF_INET6 && !net_udp_ipv6(&udp))) {
@@ -704,13 +676,13 @@
mdb_printf("%0?p %10s ", (uintptr_t)connp->conn_udp, state);
if (af == AF_INET) {
- net_ipv4addrport_pr(&udp.udp_v6src, udp.udp_port);
+ net_ipv4addrport_pr(&connp->conn_laddr_v6, connp->conn_lport);
mdb_printf(" ");
- net_ipv4addrport_pr(&udp.udp_v6dst, udp.udp_dstport);
+ net_ipv4addrport_pr(&connp->conn_faddr_v6, connp->conn_fport);
} else if (af == AF_INET6) {
- net_ipv6addrport_pr(&udp.udp_v6src, udp.udp_port);
+ net_ipv6addrport_pr(&connp->conn_laddr_v6, connp->conn_lport);
mdb_printf(" ");
- net_ipv6addrport_pr(&udp.udp_v6dst, udp.udp_dstport);
+ net_ipv6addrport_pr(&connp->conn_faddr_v6, connp->conn_fport);
}
mdb_printf(" %5i", ns_to_stackid((uintptr_t)connp->conn_netstack));
mdb_printf(" %4i\n", connp->conn_zoneid);
@@ -740,8 +712,11 @@
return (WALK_ERR);
}
- if ((af == AF_INET && icmp.icmp_ipversion != IPV4_VERSION) ||
- (af == AF_INET6 && icmp.icmp_ipversion != IPV6_VERSION)) {
+ connp->conn_icmp = &icmp;
+ icmp.icmp_connp = connp;
+
+ if ((af == AF_INET && connp->conn_ipversion != IPV4_VERSION) ||
+ (af == AF_INET6 && connp->conn_ipversion != IPV6_VERSION)) {
return (WALK_NEXT);
}
@@ -756,16 +731,16 @@
mdb_printf("%0?p %10s ", (uintptr_t)connp->conn_icmp, state);
if (af == AF_INET) {
- mdb_printf("%*I ", ADDR_V4_WIDTH,
- V4_PART_OF_V6((icmp.icmp_v6src)));
- mdb_printf("%*I ", ADDR_V4_WIDTH,
- V4_PART_OF_V6((icmp.icmp_v6dst.sin6_addr)));
+ net_ipv4addrport_pr(&connp->conn_laddr_v6, connp->conn_lport);
+ mdb_printf(" ");
+ net_ipv4addrport_pr(&connp->conn_faddr_v6, connp->conn_fport);
} else if (af == AF_INET6) {
- mdb_printf("%*N ", ADDR_V6_WIDTH, &icmp.icmp_v6src);
- mdb_printf("%*N ", ADDR_V6_WIDTH, &icmp.icmp_v6dst);
+ net_ipv6addrport_pr(&connp->conn_laddr_v6, connp->conn_lport);
+ mdb_printf(" ");
+ net_ipv6addrport_pr(&connp->conn_faddr_v6, connp->conn_fport);
}
mdb_printf(" %5i", ns_to_stackid((uintptr_t)connp->conn_netstack));
- mdb_printf(" %4i\n", icmp.icmp_zoneid);
+ mdb_printf(" %4i\n", connp->conn_zoneid);
return (WALK_NEXT);
}
@@ -881,57 +856,57 @@
ill_t ill;
*intf = '\0';
- if (ire->ire_type == IRE_CACHE) {
- queue_t stq;
-
- if (mdb_vread(&stq, sizeof (stq), (uintptr_t)ire->ire_stq) ==
- -1)
- return;
- if (mdb_vread(&ill, sizeof (ill), (uintptr_t)stq.q_ptr) == -1)
+ if (ire->ire_ill != NULL) {
+ if (mdb_vread(&ill, sizeof (ill),
+ (uintptr_t)ire->ire_ill) == -1)
return;
(void) mdb_readstr(intf, MIN(LIFNAMSIZ, ill.ill_name_length),
(uintptr_t)ill.ill_name);
- } else if (ire->ire_ipif != NULL) {
- ipif_t ipif;
- char *cp;
-
- if (mdb_vread(&ipif, sizeof (ipif),
- (uintptr_t)ire->ire_ipif) == -1)
- return;
- if (mdb_vread(&ill, sizeof (ill), (uintptr_t)ipif.ipif_ill) ==
- -1)
- return;
- (void) mdb_readstr(intf, MIN(LIFNAMSIZ, ill.ill_name_length),
- (uintptr_t)ill.ill_name);
- if (ipif.ipif_id != 0) {
- cp = intf + strlen(intf);
- (void) mdb_snprintf(cp, LIFNAMSIZ + 1 - (cp - intf),
- ":%u", ipif.ipif_id);
- }
}
}
+const in6_addr_t ipv6_all_ones =
+ { 0xffffffffU, 0xffffffffU, 0xffffffffU, 0xffffffffU };
+
static void
-get_v4flags(const ire_t *ire, char *flags)
+get_ireflags(const ire_t *ire, char *flags)
{
(void) strcpy(flags, "U");
- if (ire->ire_type == IRE_DEFAULT || ire->ire_type == IRE_PREFIX ||
- ire->ire_type == IRE_HOST || ire->ire_type == IRE_HOST_REDIRECT)
+ /* RTF_INDIRECT wins over RTF_GATEWAY - don't display both */
+ if (ire->ire_flags & RTF_INDIRECT)
+ (void) strcat(flags, "I");
+ else if (ire->ire_type & IRE_OFFLINK)
(void) strcat(flags, "G");
- if (ire->ire_mask == IP_HOST_MASK)
- (void) strcat(flags, "H");
- if (ire->ire_type == IRE_HOST_REDIRECT)
+
+ /* IRE_IF_CLONE wins over RTF_HOST - don't display both */
+ if (ire->ire_type & IRE_IF_CLONE)
+ (void) strcat(flags, "C");
+ else if (ire->ire_ipversion == IPV4_VERSION) {
+ if (ire->ire_mask == IP_HOST_MASK)
+ (void) strcat(flags, "H");
+ } else {
+ if (IN6_ARE_ADDR_EQUAL(&ire->ire_mask_v6, &ipv6_all_ones))
+ (void) strcat(flags, "H");
+ }
+
+ if (ire->ire_flags & RTF_DYNAMIC)
(void) strcat(flags, "D");
- if (ire->ire_type == IRE_CACHE)
- (void) strcat(flags, "A");
if (ire->ire_type == IRE_BROADCAST)
- (void) strcat(flags, "B");
+ (void) strcat(flags, "b");
+ if (ire->ire_type == IRE_MULTICAST)
+ (void) strcat(flags, "m");
if (ire->ire_type == IRE_LOCAL)
(void) strcat(flags, "L");
+ if (ire->ire_type == IRE_NOROUTE)
+ (void) strcat(flags, "N");
if (ire->ire_flags & RTF_MULTIRT)
(void) strcat(flags, "M");
if (ire->ire_flags & RTF_SETSRC)
(void) strcat(flags, "S");
+ if (ire->ire_flags & RTF_REJECT)
+ (void) strcat(flags, "R");
+ if (ire->ire_flags & RTF_BLACKHOLE)
+ (void) strcat(flags, "B");
}
static int
@@ -945,8 +920,10 @@
if (ire->ire_ipversion != IPV4_VERSION)
return (WALK_NEXT);
- if (!(*opts & NETSTAT_ALL) && (ire->ire_type == IRE_CACHE ||
- ire->ire_type == IRE_BROADCAST || ire->ire_type == IRE_LOCAL))
+ /* Skip certain IREs by default */
+ if (!(*opts & NETSTAT_ALL) &&
+ (ire->ire_type &
+ (IRE_BROADCAST|IRE_LOCAL|IRE_MULTICAST|IRE_NOROUTE|IRE_IF_CLONE)))
return (WALK_NEXT);
if (*opts & NETSTAT_FIRST) {
@@ -966,10 +943,9 @@
}
}
- gate = (ire->ire_type & (IRE_INTERFACE|IRE_LOOPBACK|IRE_BROADCAST)) ?
- ire->ire_src_addr : ire->ire_gateway_addr;
+ gate = ire->ire_gateway_addr;
- get_v4flags(ire, flags);
+ get_ireflags(ire, flags);
get_ifname(ire, intf);
@@ -977,8 +953,8 @@
mdb_printf("%?p %-*I %-*I %-*I %-6s %5u%c %4u %3u %-3s %5u "
"%u\n", kaddr, ADDR_V4_WIDTH, ire->ire_addr, ADDR_V4_WIDTH,
ire->ire_mask, ADDR_V4_WIDTH, gate, intf,
- ire->ire_max_frag, ire->ire_frag_flag ? '*' : ' ',
- ire->ire_uinfo.iulp_rtt, ire->ire_refcnt, flags,
+ 0, ' ',
+ ire->ire_metrics.iulp_rtt, ire->ire_refcnt, flags,
ire->ire_ob_pkt_count, ire->ire_ib_pkt_count);
} else {
mdb_printf("%?p %-*I %-*I %-5s %4u %5u %s\n", kaddr,
@@ -1025,7 +1001,10 @@
if (ire->ire_ipversion != IPV6_VERSION)
return (WALK_NEXT);
- if (!(*opts & NETSTAT_ALL) && ire->ire_type == IRE_CACHE)
+ /* Skip certain IREs by default */
+ if (!(*opts & NETSTAT_ALL) &&
+ (ire->ire_type &
+ (IRE_BROADCAST|IRE_LOCAL|IRE_MULTICAST|IRE_NOROUTE|IRE_IF_CLONE)))
return (WALK_NEXT);
if (*opts & NETSTAT_FIRST) {
@@ -1045,37 +1024,21 @@
}
}
- gatep = (ire->ire_type & (IRE_INTERFACE|IRE_LOOPBACK)) ?
- &ire->ire_src_addr_v6 : &ire->ire_gateway_addr_v6;
+ gatep = &ire->ire_gateway_addr_v6;
masklen = ip_mask_to_plen_v6(&ire->ire_mask_v6);
(void) mdb_snprintf(deststr, sizeof (deststr), "%N/%d",
&ire->ire_addr_v6, masklen);
- (void) strcpy(flags, "U");
- if (ire->ire_type == IRE_DEFAULT || ire->ire_type == IRE_PREFIX ||
- ire->ire_type == IRE_HOST || ire->ire_type == IRE_HOST_REDIRECT)
- (void) strcat(flags, "G");
- if (masklen == IPV6_ABITS)
- (void) strcat(flags, "H");
- if (ire->ire_type == IRE_HOST_REDIRECT)
- (void) strcat(flags, "D");
- if (ire->ire_type == IRE_CACHE)
- (void) strcat(flags, "A");
- if (ire->ire_type == IRE_LOCAL)
- (void) strcat(flags, "L");
- if (ire->ire_flags & RTF_MULTIRT)
- (void) strcat(flags, "M");
- if (ire->ire_flags & RTF_SETSRC)
- (void) strcat(flags, "S");
+ get_ireflags(ire, flags);
get_ifname(ire, intf);
if (*opts & NETSTAT_VERBOSE) {
mdb_printf("%?p %-*s %-*N %-5s %5u%c %5u %3u %-5s %6u %u\n",
kaddr, ADDR_V6_WIDTH+4, deststr, ADDR_V6_WIDTH, gatep,
- intf, ire->ire_max_frag, ire->ire_frag_flag ? '*' : ' ',
- ire->ire_uinfo.iulp_rtt, ire->ire_refcnt,
+ intf, 0, ' ',
+ ire->ire_metrics.iulp_rtt, ire->ire_refcnt,
flags, ire->ire_ob_pkt_count, ire->ire_ib_pkt_count);
} else {
mdb_printf("%?p %-*s %-*N %-5s %3u %6u %s\n", kaddr,
diff --git a/usr/src/cmd/mdb/common/modules/genunix/net.h b/usr/src/cmd/mdb/common/modules/genunix/net.h
index f2d441e..f72d75f 100644
--- a/usr/src/cmd/mdb/common/modules/genunix/net.h
+++ b/usr/src/cmd/mdb/common/modules/genunix/net.h
@@ -30,7 +30,6 @@
extern "C" {
#endif
-extern struct mi_payload_walk_arg_s mi_ar_arg;
extern struct mi_payload_walk_arg_s mi_icmp_arg;
extern struct mi_payload_walk_arg_s mi_ill_arg;
@@ -42,8 +41,6 @@
extern void mi_walk_fini(mdb_walk_state_t *);
extern int mi_payload_walk_init(mdb_walk_state_t *);
extern int mi_payload_walk_step(mdb_walk_state_t *);
-extern int ar_stacks_walk_init(mdb_walk_state_t *);
-extern int ar_stacks_walk_step(mdb_walk_state_t *);
extern int icmp_stacks_walk_init(mdb_walk_state_t *);
extern int icmp_stacks_walk_step(mdb_walk_state_t *);
extern int tcp_stacks_walk_init(mdb_walk_state_t *);
diff --git a/usr/src/cmd/mdb/common/modules/genunix/streams.c b/usr/src/cmd/mdb/common/modules/genunix/streams.c
index 0458589..d0095c7 100644
--- a/usr/src/cmd/mdb/common/modules/genunix/streams.c
+++ b/usr/src/cmd/mdb/common/modules/genunix/streams.c
@@ -172,7 +172,6 @@
{ SF(0x08), "unused" },
{ SF(MSGMARKNEXT), "Private: b_next's first byte marked" },
{ SF(MSGNOTMARKNEXT), "Private: ... not marked" },
- { SF(MSGHASREF), "Private: msg has reference to owner" },
{ 0, NULL, NULL }
};
diff --git a/usr/src/cmd/mdb/common/modules/genunix/vfs.c b/usr/src/cmd/mdb/common/modules/genunix/vfs.c
index 45dc27a..8001c41 100644
--- a/usr/src/cmd/mdb/common/modules/genunix/vfs.c
+++ b/usr/src/cmd/mdb/common/modules/genunix/vfs.c
@@ -572,8 +572,9 @@
sin_t *sin4;
int scanned = 0;
boolean_t skip_lback = B_FALSE;
+ conn_t *connp = sctp->sctp_connp;
- addr->sa_family = sctp->sctp_family;
+ addr->sa_family = connp->conn_family;
if (sctp->sctp_nsaddrs == 0)
goto done;
@@ -636,18 +637,18 @@
continue;
}
- switch (sctp->sctp_family) {
+ switch (connp->conn_family) {
case AF_INET:
/* LINTED: alignment */
sin4 = (sin_t *)addr;
if ((sctp->sctp_state <= SCTPS_LISTEN) &&
sctp->sctp_bound_to_all) {
sin4->sin_addr.s_addr = INADDR_ANY;
- sin4->sin_port = sctp->sctp_lport;
+ sin4->sin_port = connp->conn_lport;
} else {
sin4 += added;
sin4->sin_family = AF_INET;
- sin4->sin_port = sctp->sctp_lport;
+ sin4->sin_port = connp->conn_lport;
IN6_V4MAPPED_TO_INADDR(&laddr,
&sin4->sin_addr);
}
@@ -660,15 +661,14 @@
sctp->sctp_bound_to_all) {
bzero(&sin6->sin6_addr,
sizeof (sin6->sin6_addr));
- sin6->sin6_port = sctp->sctp_lport;
+ sin6->sin6_port = connp->conn_lport;
} else {
sin6 += added;
sin6->sin6_family = AF_INET6;
- sin6->sin6_port = sctp->sctp_lport;
+ sin6->sin6_port = connp->conn_lport;
sin6->sin6_addr = laddr;
}
- sin6->sin6_flowinfo = sctp->sctp_ip6h->ip6_vcf &
- ~IPV6_VERS_AND_FLOW_MASK;
+ sin6->sin6_flowinfo = connp->conn_flowinfo;
sin6->sin6_scope_id = 0;
sin6->__sin6_src_id = 0;
break;
@@ -712,11 +712,12 @@
struct sockaddr_in6 *sin6;
sctp_faddr_t sctp_primary;
in6_addr_t faddr;
+ conn_t *connp = sctp->sctp_connp;
if (sctp->sctp_faddrs == NULL)
return (-1);
- addr->sa_family = sctp->sctp_family;
+ addr->sa_family = connp->conn_family;
if (mdb_vread(&sctp_primary, sizeof (sctp_faddr_t),
(uintptr_t)sctp->sctp_primary) == -1) {
mdb_warn("failed to read sctp primary faddr");
@@ -724,12 +725,12 @@
}
faddr = sctp_primary.faddr;
- switch (sctp->sctp_family) {
+ switch (connp->conn_family) {
case AF_INET:
/* LINTED: alignment */
sin4 = (struct sockaddr_in *)addr;
IN6_V4MAPPED_TO_INADDR(&faddr, &sin4->sin_addr);
- sin4->sin_port = sctp->sctp_fport;
+ sin4->sin_port = connp->conn_fport;
sin4->sin_family = AF_INET;
break;
@@ -737,7 +738,7 @@
/* LINTED: alignment */
sin6 = (struct sockaddr_in6 *)addr;
sin6->sin6_addr = faddr;
- sin6->sin6_port = sctp->sctp_fport;
+ sin6->sin6_port = connp->conn_fport;
sin6->sin6_family = AF_INET6;
sin6->sin6_flowinfo = 0;
sin6->sin6_scope_id = 0;
@@ -797,7 +798,7 @@
mdb_printf("socket: ");
mdb_nhconvert(&port, &conn_t.conn_lport, sizeof (port));
- mdb_printf("AF_INET %I %d ", conn_t.conn_src, port);
+ mdb_printf("AF_INET %I %d ", conn_t.conn_laddr_v4, port);
/*
* If this is a listening socket, we don't print
@@ -807,7 +808,8 @@
IPCL_IS_UDP(&conn_t) && IPCL_IS_CONNECTED(&conn_t)) {
mdb_printf("remote: ");
mdb_nhconvert(&port, &conn_t.conn_fport, sizeof (port));
- mdb_printf("AF_INET %I %d ", conn_t.conn_rem, port);
+ mdb_printf("AF_INET %I %d ", conn_t.conn_faddr_v4,
+ port);
}
break;
@@ -826,7 +828,7 @@
mdb_printf("socket: ");
mdb_nhconvert(&port, &conn_t.conn_lport, sizeof (port));
- mdb_printf("AF_INET6 %N %d ", &conn_t.conn_srcv6, port);
+ mdb_printf("AF_INET6 %N %d ", &conn_t.conn_laddr_v6, port);
/*
* If this is a listening socket, we don't print
@@ -836,7 +838,8 @@
IPCL_IS_UDP(&conn_t) && IPCL_IS_CONNECTED(&conn_t)) {
mdb_printf("remote: ");
mdb_nhconvert(&port, &conn_t.conn_fport, sizeof (port));
- mdb_printf("AF_INET6 %N %d ", &conn_t.conn_remv6, port);
+ mdb_printf("AF_INET6 %N %d ", &conn_t.conn_faddr_v6,
+ port);
}
break;
@@ -854,6 +857,7 @@
sctp_sock_print(struct sonode *socknode)
{
sctp_t sctp_t;
+ conn_t conns;
struct sockaddr *laddr = mdb_alloc(sizeof (struct sockaddr), UM_SLEEP);
struct sockaddr *faddr = mdb_alloc(sizeof (struct sockaddr), UM_SLEEP);
@@ -864,6 +868,14 @@
return (-1);
}
+ if (mdb_vread(&conns, sizeof (conn_t),
+ (uintptr_t)sctp_t.sctp_connp) == -1) {
+ mdb_warn("failed to read conn_t at %p",
+ (uintptr_t)sctp_t.sctp_connp);
+ return (-1);
+ }
+ sctp_t.sctp_connp = &conns;
+
if (sctp_getsockaddr(&sctp_t, laddr) == 0) {
mdb_printf("socket:");
pfiles_print_addr(laddr);
diff --git a/usr/src/cmd/mdb/common/modules/ip/ip.c b/usr/src/cmd/mdb/common/modules/ip/ip.c
index 28f21ef..da94942 100644
--- a/usr/src/cmd/mdb/common/modules/ip/ip.c
+++ b/usr/src/cmd/mdb/common/modules/ip/ip.c
@@ -52,6 +52,7 @@
#include <ilb/ilb_nat.h>
#include <ilb/ilb_conn.h>
#include <sys/dlpi.h>
+#include <sys/zone.h>
#include <mdb/mdb_modapi.h>
#include <mdb/mdb_ks.h>
@@ -84,15 +85,20 @@
ill_if_t ill_if;
} illif_walk_data_t;
-typedef struct nce_walk_data_s {
- struct ndp_g_s nce_ip_ndp;
- int nce_hash_tbl_index;
- nce_t nce;
-} nce_walk_data_t;
+typedef struct ncec_walk_data_s {
+ struct ndp_g_s ncec_ip_ndp;
+ int ncec_hash_tbl_index;
+ ncec_t ncec;
+} ncec_walk_data_t;
+
+typedef struct ncec_cbdata_s {
+ uintptr_t ncec_addr;
+ int ncec_ipversion;
+} ncec_cbdata_t;
typedef struct nce_cbdata_s {
- uintptr_t nce_addr;
- int nce_ipversion;
+ int nce_ipversion;
+ char nce_ill_name[LIFNAMSIZ];
} nce_cbdata_t;
typedef struct ire_cbdata_s {
@@ -100,6 +106,12 @@
boolean_t verbose;
} ire_cbdata_t;
+typedef struct zi_cbdata_s {
+ const char *zone_name;
+ ip_stack_t *ipst;
+ boolean_t shared_ip_zone;
+} zi_cbdata_t;
+
typedef struct th_walk_data {
uint_t thw_non_zero_only;
boolean_t thw_match;
@@ -122,6 +134,7 @@
typedef struct ill_cbdata_s {
uintptr_t ill_addr;
int ill_ipversion;
+ ip_stack_t *ill_ipst;
boolean_t verbose;
} ill_cbdata_t;
@@ -156,7 +169,7 @@
};
static hash_walk_arg_t proto_hash_arg = {
- OFFSETOF(ip_stack_t, ips_ipcl_proto_fanout),
+ OFFSETOF(ip_stack_t, ips_ipcl_proto_fanout_v4),
0
};
@@ -210,13 +223,15 @@
static int srcid_walk_step(mdb_walk_state_t *);
static int ire_format(uintptr_t addr, const void *, void *);
-static int nce_format(uintptr_t addr, const nce_t *nce, int ipversion);
-static int nce(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv);
-static int nce_walk_step(mdb_walk_state_t *wsp);
-static int nce_stack_walk_init(mdb_walk_state_t *wsp);
-static int nce_stack_walk_step(mdb_walk_state_t *wsp);
-static void nce_stack_walk_fini(mdb_walk_state_t *wsp);
-static int nce_cb(uintptr_t addr, const nce_walk_data_t *iw, nce_cbdata_t *id);
+static int ncec_format(uintptr_t addr, const ncec_t *ncec, int ipversion);
+static int ncec(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv);
+static int ncec_walk_step(mdb_walk_state_t *wsp);
+static int ncec_stack_walk_init(mdb_walk_state_t *wsp);
+static int ncec_stack_walk_step(mdb_walk_state_t *wsp);
+static void ncec_stack_walk_fini(mdb_walk_state_t *wsp);
+static int ncec_cb(uintptr_t addr, const ncec_walk_data_t *iw,
+ ncec_cbdata_t *id);
+static char *nce_l2_addr(const nce_t *, const ill_t *);
static int ipcl_hash_walk_init(mdb_walk_state_t *);
static int ipcl_hash_walk_step(mdb_walk_state_t *);
@@ -262,6 +277,69 @@
return (nss.netstack_stackid);
}
+/* ARGSUSED */
+static int
+zone_to_ips_cb(uintptr_t addr, const void *zi_arg, void *zi_cb_arg)
+{
+ zi_cbdata_t *zi_cb = zi_cb_arg;
+ zone_t zone;
+ char zone_name[ZONENAME_MAX];
+ netstack_t ns;
+
+ if (mdb_vread(&zone, sizeof (zone_t), addr) == -1) {
+ mdb_warn("can't read zone at %p", addr);
+ return (WALK_ERR);
+ }
+
+ (void) mdb_readstr(zone_name, ZONENAME_MAX, (uintptr_t)zone.zone_name);
+
+ if (strcmp(zi_cb->zone_name, zone_name) != 0)
+ return (WALK_NEXT);
+
+ zi_cb->shared_ip_zone = (!(zone.zone_flags & ZF_NET_EXCL) &&
+ (strcmp(zone_name, "global") != 0));
+
+ if (mdb_vread(&ns, sizeof (netstack_t), (uintptr_t)zone.zone_netstack)
+ == -1) {
+ mdb_warn("can't read netstack at %p", zone.zone_netstack);
+ return (WALK_ERR);
+ }
+
+ zi_cb->ipst = ns.netstack_ip;
+ return (WALK_DONE);
+}
+
+static ip_stack_t *
+zone_to_ips(const char *zone_name)
+{
+ zi_cbdata_t zi_cb;
+
+ if (zone_name == NULL)
+ return (NULL);
+
+ zi_cb.zone_name = zone_name;
+ zi_cb.ipst = NULL;
+ zi_cb.shared_ip_zone = B_FALSE;
+
+ if (mdb_walk("zone", (mdb_walk_cb_t)zone_to_ips_cb, &zi_cb) == -1) {
+ mdb_warn("failed to walk zone");
+ return (NULL);
+ }
+
+ if (zi_cb.shared_ip_zone) {
+ mdb_warn("%s is a Shared-IP zone, try '-s global' instead\n",
+ zone_name);
+ return (NULL);
+ }
+
+ if (zi_cb.ipst == NULL) {
+ mdb_warn("failed to find zone %s\n", zone_name);
+ return (NULL);
+ }
+
+ return (zi_cb.ipst);
+}
+
int
ip_stacks_walk_init(mdb_walk_state_t *wsp)
{
@@ -529,8 +607,117 @@
}
int
+nce_walk_init(mdb_walk_state_t *wsp)
+{
+ if (mdb_layered_walk("nce_cache", wsp) == -1) {
+ mdb_warn("can't walk 'nce_cache'");
+ return (WALK_ERR);
+ }
+
+ return (WALK_NEXT);
+}
+
+int
+nce_walk_step(mdb_walk_state_t *wsp)
+{
+ nce_t nce;
+
+ if (mdb_vread(&nce, sizeof (nce), wsp->walk_addr) == -1) {
+ mdb_warn("can't read nce at %p", wsp->walk_addr);
+ return (WALK_ERR);
+ }
+
+ return (wsp->walk_callback(wsp->walk_addr, &nce, wsp->walk_cbdata));
+}
+
+static int
+nce_format(uintptr_t addr, const nce_t *ncep, void *nce_cb_arg)
+{
+ nce_cbdata_t *nce_cb = nce_cb_arg;
+ ill_t ill;
+ char ill_name[LIFNAMSIZ];
+ ncec_t ncec;
+
+ if (mdb_vread(&ncec, sizeof (ncec),
+ (uintptr_t)ncep->nce_common) == -1) {
+ mdb_warn("can't read ncec at %p", ncep->nce_common);
+ return (WALK_NEXT);
+ }
+ if (nce_cb->nce_ipversion != 0 &&
+ ncec.ncec_ipversion != nce_cb->nce_ipversion)
+ return (WALK_NEXT);
+
+ if (mdb_vread(&ill, sizeof (ill), (uintptr_t)ncep->nce_ill) == -1) {
+ mdb_snprintf(ill_name, sizeof (ill_name), "--");
+ } else {
+ (void) mdb_readstr(ill_name,
+ MIN(LIFNAMSIZ, ill.ill_name_length),
+ (uintptr_t)ill.ill_name);
+ }
+
+ if (nce_cb->nce_ill_name[0] != '\0' &&
+ strncmp(nce_cb->nce_ill_name, ill_name, LIFNAMSIZ) != 0)
+ return (WALK_NEXT);
+
+ if (ncec.ncec_ipversion == IPV6_VERSION) {
+
+ mdb_printf("%?p %5s %-18s %?p %6d %N\n",
+ addr, ill_name,
+ nce_l2_addr(ncep, &ill),
+ ncep->nce_fp_mp,
+ ncep->nce_refcnt,
+ &ncep->nce_addr);
+
+ } else {
+ struct in_addr nceaddr;
+
+ IN6_V4MAPPED_TO_INADDR(&ncep->nce_addr, &nceaddr);
+ mdb_printf("%?p %5s %-18s %?p %6d %I\n",
+ addr, ill_name,
+ nce_l2_addr(ncep, &ill),
+ ncep->nce_fp_mp,
+ ncep->nce_refcnt,
+ nceaddr.s_addr);
+ }
+
+ return (WALK_NEXT);
+}
+
+int
+dce_walk_init(mdb_walk_state_t *wsp)
+{
+ wsp->walk_data = (void *)wsp->walk_addr;
+
+ if (mdb_layered_walk("dce_cache", wsp) == -1) {
+ mdb_warn("can't walk 'dce_cache'");
+ return (WALK_ERR);
+ }
+
+ return (WALK_NEXT);
+}
+
+int
+dce_walk_step(mdb_walk_state_t *wsp)
+{
+ dce_t dce;
+
+ if (mdb_vread(&dce, sizeof (dce), wsp->walk_addr) == -1) {
+ mdb_warn("can't read dce at %p", wsp->walk_addr);
+ return (WALK_ERR);
+ }
+
+ /* If ip_stack_t is specified, skip DCEs that don't belong to it. */
+ if ((wsp->walk_data != NULL) && (wsp->walk_data != dce.dce_ipst))
+ return (WALK_NEXT);
+
+ return (wsp->walk_callback(wsp->walk_addr, &dce, wsp->walk_cbdata));
+}
+
+int
ire_walk_init(mdb_walk_state_t *wsp)
{
+ wsp->walk_data = (void *)wsp->walk_addr;
+
if (mdb_layered_walk("ire_cache", wsp) == -1) {
mdb_warn("can't walk 'ire_cache'");
return (WALK_ERR);
@@ -549,53 +736,13 @@
return (WALK_ERR);
}
+ /* If ip_stack_t is specified, skip IREs that don't belong to it. */
+ if ((wsp->walk_data != NULL) && (wsp->walk_data != ire.ire_ipst))
+ return (WALK_NEXT);
+
return (wsp->walk_callback(wsp->walk_addr, &ire, wsp->walk_cbdata));
}
-
-int
-ire_ctable_walk_step(mdb_walk_state_t *wsp)
-{
- uintptr_t kaddr;
- irb_t *irb;
- uint32_t cache_table_size;
- int i;
- ire_cbdata_t ire_cb;
-
- ire_cb.verbose = B_FALSE;
- ire_cb.ire_ipversion = 0;
-
-
- kaddr = wsp->walk_addr + OFFSETOF(ip_stack_t, ips_ip_cache_table_size);
-
- if (mdb_vread(&cache_table_size, sizeof (uint32_t), kaddr) == -1) {
- mdb_warn("can't read ips_ip_cache_table at %p", kaddr);
- return (WALK_ERR);
- }
-
- kaddr = wsp->walk_addr + OFFSETOF(ip_stack_t, ips_ip_cache_table);
- if (mdb_vread(&kaddr, sizeof (kaddr), kaddr) == -1) {
- mdb_warn("can't read ips_ip_cache_table at %p", kaddr);
- return (WALK_ERR);
- }
-
- irb = mdb_alloc(sizeof (irb_t) * cache_table_size, UM_SLEEP|UM_GC);
- if (mdb_vread(irb, sizeof (irb_t) * cache_table_size, kaddr) == -1) {
- mdb_warn("can't read irb at %p", kaddr);
- return (WALK_ERR);
- }
- for (i = 0; i < cache_table_size; i++) {
- kaddr = (uintptr_t)irb[i].irb_ire;
-
- if (mdb_pwalk("ire_next", ire_format, &ire_cb,
- kaddr) == -1) {
- mdb_warn("can't walk 'ire_next' for ire %p", kaddr);
- return (WALK_ERR);
- }
- }
- return (WALK_NEXT);
-}
-
/* ARGSUSED */
int
ire_next_walk_init(mdb_walk_state_t *wsp)
@@ -633,6 +780,9 @@
const ire_t *irep = ire_arg;
ire_cbdata_t *ire_cb = ire_cb_arg;
boolean_t verbose = ire_cb->verbose;
+ ill_t ill;
+ char ill_name[LIFNAMSIZ];
+ boolean_t condemned = irep->ire_generation == IRE_GENERATION_CONDEMNED;
static const mdb_bitmask_t tmasks[] = {
{ "BROADCAST", IRE_BROADCAST, IRE_BROADCAST },
@@ -640,22 +790,12 @@
{ "LOCAL", IRE_LOCAL, IRE_LOCAL },
{ "LOOPBACK", IRE_LOOPBACK, IRE_LOOPBACK },
{ "PREFIX", IRE_PREFIX, IRE_PREFIX },
- { "CACHE", IRE_CACHE, IRE_CACHE },
+ { "MULTICAST", IRE_MULTICAST, IRE_MULTICAST },
+ { "NOROUTE", IRE_NOROUTE, IRE_NOROUTE },
{ "IF_NORESOLVER", IRE_IF_NORESOLVER, IRE_IF_NORESOLVER },
{ "IF_RESOLVER", IRE_IF_RESOLVER, IRE_IF_RESOLVER },
+ { "IF_CLONE", IRE_IF_CLONE, IRE_IF_CLONE },
{ "HOST", IRE_HOST, IRE_HOST },
- { "HOST_REDIRECT", IRE_HOST_REDIRECT, IRE_HOST_REDIRECT },
- { NULL, 0, 0 }
- };
-
- static const mdb_bitmask_t mmasks[] = {
- { "CONDEMNED", IRE_MARK_CONDEMNED, IRE_MARK_CONDEMNED },
- { "TESTHIDDEN", IRE_MARK_TESTHIDDEN, IRE_MARK_TESTHIDDEN },
- { "NOADD", IRE_MARK_NOADD, IRE_MARK_NOADD },
- { "TEMPORARY", IRE_MARK_TEMPORARY, IRE_MARK_TEMPORARY },
- { "USESRC", IRE_MARK_USESRC_CHECK, IRE_MARK_USESRC_CHECK },
- { "PRIVATE", IRE_MARK_PRIVATE_ADDR, IRE_MARK_PRIVATE_ADDR },
- { "UNCACHED", IRE_MARK_UNCACHED, IRE_MARK_UNCACHED },
{ NULL, 0, 0 }
};
@@ -678,6 +818,7 @@
{ "PROTO1", RTF_PROTO1, RTF_PROTO1 },
{ "MULTIRT", RTF_MULTIRT, RTF_MULTIRT },
{ "SETSRC", RTF_SETSRC, RTF_SETSRC },
+ { "INDIRECT", RTF_INDIRECT, RTF_INDIRECT },
{ NULL, 0, 0 }
};
@@ -685,40 +826,53 @@
irep->ire_ipversion != ire_cb->ire_ipversion)
return (WALK_NEXT);
+ if (mdb_vread(&ill, sizeof (ill), (uintptr_t)irep->ire_ill) == -1) {
+ mdb_snprintf(ill_name, sizeof (ill_name), "--");
+ } else {
+ (void) mdb_readstr(ill_name,
+ MIN(LIFNAMSIZ, ill.ill_name_length),
+ (uintptr_t)ill.ill_name);
+ }
+
if (irep->ire_ipversion == IPV6_VERSION && verbose) {
- mdb_printf("%<b>%?p%</b> %40N <%hb>\n"
- "%?s %40N <%hb>\n"
- "%?s %40d %4d <%hb>\n",
- addr, &irep->ire_src_addr_v6, irep->ire_type, tmasks,
- "", &irep->ire_addr_v6, (ushort_t)irep->ire_marks, mmasks,
+ mdb_printf("%<b>%?p%</b>%3s %40N <%hb%s>\n"
+ "%?s %40N\n"
+ "%?s %40d %4d <%hb> %s\n",
+ addr, condemned ? "(C)" : "", &irep->ire_setsrc_addr_v6,
+ irep->ire_type, tmasks,
+ (irep->ire_testhidden ? ", HIDDEN" : ""),
+ "", &irep->ire_addr_v6,
"", ips_to_stackid((uintptr_t)irep->ire_ipst),
irep->ire_zoneid,
- irep->ire_flags, fmasks);
+ irep->ire_flags, fmasks, ill_name);
} else if (irep->ire_ipversion == IPV6_VERSION) {
- mdb_printf("%?p %30N %30N %5d %4d\n",
- addr, &irep->ire_src_addr_v6,
+ mdb_printf("%?p%3s %30N %30N %5d %4d %s\n",
+ addr, condemned ? "(C)" : "", &irep->ire_setsrc_addr_v6,
&irep->ire_addr_v6,
ips_to_stackid((uintptr_t)irep->ire_ipst),
- irep->ire_zoneid);
+ irep->ire_zoneid, ill_name);
} else if (verbose) {
- mdb_printf("%<b>%?p%</b> %40I <%hb>\n"
- "%?s %40I <%hb>\n"
- "%?s %40d %4d <%hb>\n",
- addr, irep->ire_src_addr, irep->ire_type, tmasks,
- "", irep->ire_addr, (ushort_t)irep->ire_marks, mmasks,
+ mdb_printf("%<b>%?p%</b>%3s %40I <%hb%s>\n"
+ "%?s %40I\n"
+ "%?s %40d %4d <%hb> %s\n",
+ addr, condemned ? "(C)" : "", irep->ire_setsrc_addr,
+ irep->ire_type, tmasks,
+ (irep->ire_testhidden ? ", HIDDEN" : ""),
+ "", irep->ire_addr,
"", ips_to_stackid((uintptr_t)irep->ire_ipst),
- irep->ire_zoneid, irep->ire_flags, fmasks);
+ irep->ire_zoneid, irep->ire_flags, fmasks, ill_name);
} else {
- mdb_printf("%?p %30I %30I %5d %4d\n", addr, irep->ire_src_addr,
+ mdb_printf("%?p%3s %30I %30I %5d %4d %s\n", addr,
+ condemned ? "(C)" : "", irep->ire_setsrc_addr,
irep->ire_addr, ips_to_stackid((uintptr_t)irep->ire_ipst),
- irep->ire_zoneid);
+ irep->ire_zoneid, ill_name);
}
return (WALK_NEXT);
@@ -1040,6 +1194,140 @@
}
int
+nce(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
+{
+ nce_t nce;
+ nce_cbdata_t nce_cb;
+ int ipversion = 0;
+ const char *opt_P = NULL, *opt_ill = NULL;
+
+ if (mdb_getopts(argc, argv,
+ 'i', MDB_OPT_STR, &opt_ill,
+ 'P', MDB_OPT_STR, &opt_P, NULL) != argc)
+ return (DCMD_USAGE);
+
+ if (opt_P != NULL) {
+ if (strcmp("v4", opt_P) == 0) {
+ ipversion = IPV4_VERSION;
+ } else if (strcmp("v6", opt_P) == 0) {
+ ipversion = IPV6_VERSION;
+ } else {
+ mdb_warn("invalid protocol '%s'\n", opt_P);
+ return (DCMD_USAGE);
+ }
+ }
+
+ if ((flags & DCMD_LOOPFIRST) || !(flags & DCMD_LOOP)) {
+ mdb_printf("%<u>%?s %5s %18s %?s %s %s %</u>\n",
+ "ADDR", "INTF", "LLADDR", "FP_MP", "REFCNT",
+ "NCE_ADDR");
+ }
+
+ bzero(&nce_cb, sizeof (nce_cb));
+ if (opt_ill != NULL) {
+ (void) strncpy(nce_cb.nce_ill_name, opt_ill, LIFNAMSIZ - 1);
+ }
+ nce_cb.nce_ipversion = ipversion;
+
+ if (flags & DCMD_ADDRSPEC) {
+ (void) mdb_vread(&nce, sizeof (nce_t), addr);
+ (void) nce_format(addr, &nce, &nce_cb);
+ } else if (mdb_walk("nce", (mdb_walk_cb_t)nce_format, &nce_cb) == -1) {
+ mdb_warn("failed to walk nce cache");
+ return (DCMD_ERR);
+ }
+
+ return (DCMD_OK);
+}
+
+/* ARGSUSED */
+static int
+dce_format(uintptr_t addr, const dce_t *dcep, void *dce_cb_arg)
+{
+ static const mdb_bitmask_t dmasks[] = {
+ { "D", DCEF_DEFAULT, DCEF_DEFAULT },
+ { "P", DCEF_PMTU, DCEF_PMTU },
+ { "U", DCEF_UINFO, DCEF_UINFO },
+ { "S", DCEF_TOO_SMALL_PMTU, DCEF_TOO_SMALL_PMTU },
+ { NULL, 0, 0 }
+ };
+ char flagsbuf[2 * A_CNT(dmasks)];
+ int ipversion = *(int *)dce_cb_arg;
+ boolean_t condemned = dcep->dce_generation == DCE_GENERATION_CONDEMNED;
+
+ if (ipversion != 0 && ipversion != dcep->dce_ipversion)
+ return (WALK_NEXT);
+
+ mdb_snprintf(flagsbuf, sizeof (flagsbuf), "%b", dcep->dce_flags,
+ dmasks);
+
+ switch (dcep->dce_ipversion) {
+ case IPV4_VERSION:
+ mdb_printf("%<u>%?p%3s %8s %8d %30I %</u>\n", addr, condemned ?
+ "(C)" : "", flagsbuf, dcep->dce_pmtu, dcep->dce_v4addr);
+ break;
+ case IPV6_VERSION:
+ mdb_printf("%<u>%?p%3s %8s %8d %30N %</u>\n", addr, condemned ?
+ "(C)" : "", flagsbuf, dcep->dce_pmtu, &dcep->dce_v6addr);
+ break;
+ default:
+ mdb_printf("%<u>%?p%3s %8s %8d %30s %</u>\n", addr, condemned ?
+ "(C)" : "", flagsbuf, dcep->dce_pmtu, "");
+ }
+
+ return (WALK_NEXT);
+}
+
+int
+dce(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
+{
+ dce_t dce;
+ const char *opt_P = NULL;
+ const char *zone_name = NULL;
+ ip_stack_t *ipst = NULL;
+ int ipversion = 0;
+
+ if (mdb_getopts(argc, argv,
+ 's', MDB_OPT_STR, &zone_name,
+ 'P', MDB_OPT_STR, &opt_P, NULL) != argc)
+ return (DCMD_USAGE);
+
+ /* Follow the specified zone name to find an ip_stack_t *. */
+ if (zone_name != NULL) {
+ ipst = zone_to_ips(zone_name);
+ if (ipst == NULL)
+ return (DCMD_USAGE);
+ }
+
+ if (opt_P != NULL) {
+ if (strcmp("v4", opt_P) == 0) {
+ ipversion = IPV4_VERSION;
+ } else if (strcmp("v6", opt_P) == 0) {
+ ipversion = IPV6_VERSION;
+ } else {
+ mdb_warn("invalid protocol '%s'\n", opt_P);
+ return (DCMD_USAGE);
+ }
+ }
+
+ if ((flags & DCMD_LOOPFIRST) || !(flags & DCMD_LOOP)) {
+ mdb_printf("%<u>%?s%3s %8s %8s %30s %</u>\n",
+ "ADDR", "", "FLAGS", "PMTU", "DST_ADDR");
+ }
+
+ if (flags & DCMD_ADDRSPEC) {
+ (void) mdb_vread(&dce, sizeof (dce_t), addr);
+ (void) dce_format(addr, &dce, &ipversion);
+ } else if (mdb_pwalk("dce", (mdb_walk_cb_t)dce_format, &ipversion,
+ (uintptr_t)ipst) == -1) {
+ mdb_warn("failed to walk dce cache");
+ return (DCMD_ERR);
+ }
+
+ return (DCMD_OK);
+}
+
+int
ire(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
{
uint_t verbose = FALSE;
@@ -1047,12 +1335,22 @@
ire_cbdata_t ire_cb;
int ipversion = 0;
const char *opt_P = NULL;
+ const char *zone_name = NULL;
+ ip_stack_t *ipst = NULL;
if (mdb_getopts(argc, argv,
'v', MDB_OPT_SETBITS, TRUE, &verbose,
+ 's', MDB_OPT_STR, &zone_name,
'P', MDB_OPT_STR, &opt_P, NULL) != argc)
return (DCMD_USAGE);
+ /* Follow the specified zone name to find an ip_stack_t *. */
+ if (zone_name != NULL) {
+ ipst = zone_to_ips(zone_name);
+ if (ipst == NULL)
+ return (DCMD_USAGE);
+ }
+
if (opt_P != NULL) {
if (strcmp("v4", opt_P) == 0) {
ipversion = IPV4_VERSION;
@@ -1069,13 +1367,13 @@
if (verbose) {
mdb_printf("%?s %40s %-20s%\n"
"%?s %40s %-20s%\n"
- "%<u>%?s %40s %4s %-20s%</u>\n",
+ "%<u>%?s %40s %4s %-20s %s%</u>\n",
"ADDR", "SRC", "TYPE",
"", "DST", "MARKS",
- "", "STACK", "ZONE", "FLAGS");
+ "", "STACK", "ZONE", "FLAGS", "INTF");
} else {
- mdb_printf("%<u>%?s %30s %30s %5s %4s%</u>\n",
- "ADDR", "SRC", "DST", "STACK", "ZONE");
+ mdb_printf("%<u>%?s %30s %30s %5s %4s %s%</u>\n",
+ "ADDR", "SRC", "DST", "STACK", "ZONE", "INTF");
}
}
@@ -1085,7 +1383,8 @@
if (flags & DCMD_ADDRSPEC) {
(void) mdb_vread(&ire, sizeof (ire_t), addr);
(void) ire_format(addr, &ire, &ire_cb);
- } else if (mdb_walk("ire", (mdb_walk_cb_t)ire_format, &ire_cb) == -1) {
+ } else if (mdb_pwalk("ire", (mdb_walk_cb_t)ire_format, &ire_cb,
+ (uintptr_t)ipst) == -1) {
mdb_warn("failed to walk ire table");
return (DCMD_ERR);
}
@@ -1338,7 +1637,7 @@
static void
th_trace_help(void)
{
- mdb_printf("If given an address of an ill_t, ipif_t, ire_t, or nce_t, "
+ mdb_printf("If given an address of an ill_t, ipif_t, ire_t, or ncec_t, "
"print the\n"
"corresponding th_trace_t structure in detail. Otherwise, if no "
"address is\n"
@@ -1354,8 +1653,8 @@
{ "srcid_status", ":",
"display connection structures from ipcl hash tables",
srcid_status },
- { "ill", "?[-v] [-P v4 | v6]", "display ill_t structures",
- ill, ill_help },
+ { "ill", "?[-v] [-P v4 | v6] [-s exclusive-ip-zone-name]",
+ "display ill_t structures", ill, ill_help },
{ "illif", "?[-P v4 | v6]",
"display or filter IP Lower Level InterFace structures", illif,
illif_help },
@@ -1363,10 +1662,14 @@
{ "ip6hdr", ":[-vf]", "display an IPv6 header", ip6hdr },
{ "ipif", "?[-v] [-P v4 | v6]", "display ipif structures",
ipif, ipif_help },
- { "ire", "?[-v] [-P v4|v6]",
+ { "ire", "?[-v] [-P v4|v6] [-s exclusive-ip-zone-name]",
"display Internet Route Entry structures", ire },
- { "nce", "?[-P v4 | v6]", "display Neighbor Cache Entry structures",
- nce },
+ { "nce", "?[-P v4|v6] [-i <interface>]",
+ "display interface-specific Neighbor Cache structures", nce },
+ { "ncec", "?[-P v4 | v6]", "display Neighbor Cache Entry structures",
+ ncec },
+ { "dce", "?[-P v4|v6] [-s exclusive-ip-zone-name]",
+ "display Destination Cache Entry structures", dce },
{ "squeue", ":[-v]", "print core squeue_t info", squeue,
ip_squeue_help },
{ "tcphdr", ":", "display a TCP header", tcphdr },
@@ -1385,7 +1688,7 @@
{ "illif_stack", "walk list of ill interface types",
illif_stack_walk_init, illif_stack_walk_step,
illif_stack_walk_fini },
- { "ill", "walk list of nce structures for all stacks",
+ { "ill", "walk active ill_t structures for all stacks",
ill_walk_init, ill_walk_step, NULL },
{ "ipif", "walk list of ipif structures for all stacks",
ipif_walk_init, ipif_walk_step, NULL },
@@ -1400,19 +1703,21 @@
&srcid_walk_arg },
{ "ire", "walk active ire_t structures",
ire_walk_init, ire_walk_step, NULL },
- { "ire_ctable", "walk ire_t structures in the ctable",
- ip_stacks_common_walk_init, ire_ctable_walk_step, NULL },
{ "ire_next", "walk ire_t structures in the ctable",
ire_next_walk_init, ire_next_walk_step, NULL },
+ { "nce", "walk active nce_t structures",
+ nce_walk_init, nce_walk_step, NULL },
+ { "dce", "walk active dce_t structures",
+ dce_walk_init, dce_walk_step, NULL },
{ "ip_stacks", "walk all the ip_stack_t",
ip_stacks_walk_init, ip_stacks_walk_step, NULL },
{ "th_hash", "walk all the th_hash_t entries",
th_hash_walk_init, th_hash_walk_step, NULL },
- { "nce", "walk list of nce structures for all stacks",
- ip_stacks_common_walk_init, nce_walk_step, NULL },
- { "nce_stack", "walk list of nce structures",
- nce_stack_walk_init, nce_stack_walk_step,
- nce_stack_walk_fini},
+ { "ncec", "walk list of ncec structures for all stacks",
+ ip_stacks_common_walk_init, ncec_walk_step, NULL },
+ { "ncec_stack", "walk list of ncec structures",
+ ncec_stack_walk_init, ncec_stack_walk_step,
+ ncec_stack_walk_fini},
{ "udp_hash", "walk list of conn_t structures in ips_ipcl_udp_fanout",
ipcl_hash_walk_init, ipcl_hash_walk_step,
ipcl_hash_walk_fini, &udp_hash_arg},
@@ -1471,9 +1776,9 @@
}
static char *
-nce_state(int nce_state)
+ncec_state(int ncec_state)
{
- switch (nce_state) {
+ switch (ncec_state) {
case ND_UNCHANGED:
return ("unchanged");
case ND_INCOMPLETE:
@@ -1496,6 +1801,36 @@
}
static char *
+ncec_l2_addr(const ncec_t *ncec, const ill_t *ill)
+{
+ uchar_t *h;
+ static char addr_buf[L2MAXADDRSTRLEN];
+
+ if (ncec->ncec_lladdr == NULL) {
+ return ("None");
+ }
+
+ if (ill->ill_net_type == IRE_IF_RESOLVER) {
+
+ if (ill->ill_phys_addr_length == 0)
+ return ("None");
+ h = mdb_zalloc(ill->ill_phys_addr_length, UM_SLEEP);
+ if (mdb_vread(h, ill->ill_phys_addr_length,
+ (uintptr_t)ncec->ncec_lladdr) == -1) {
+ mdb_warn("failed to read hwaddr at %p",
+ ncec->ncec_lladdr);
+ return ("Unknown");
+ }
+ mdb_mac_addr(h, ill->ill_phys_addr_length,
+ addr_buf, sizeof (addr_buf));
+ } else {
+ return ("None");
+ }
+ mdb_free(h, ill->ill_phys_addr_length);
+ return (addr_buf);
+}
+
+static char *
nce_l2_addr(const nce_t *nce, const ill_t *ill)
{
uchar_t *h;
@@ -1503,29 +1838,24 @@
mblk_t mp;
size_t mblen;
- if (ill->ill_flags & ILLF_XRESOLV) {
- return ("XRESOLV");
- }
-
- if (nce->nce_res_mp == NULL) {
+ if (nce->nce_dlur_mp == NULL)
return ("None");
- }
if (ill->ill_net_type == IRE_IF_RESOLVER) {
-
if (mdb_vread(&mp, sizeof (mblk_t),
- (uintptr_t)nce->nce_res_mp) == -1) {
- mdb_warn("failed to read nce_res_mp at %p",
- nce->nce_res_mp);
+ (uintptr_t)nce->nce_dlur_mp) == -1) {
+ mdb_warn("failed to read nce_dlur_mp at %p",
+ nce->nce_dlur_mp);
+ return ("None");
}
-
- if (ill->ill_nd_lla_len == 0)
+ if (ill->ill_phys_addr_length == 0)
return ("None");
mblen = mp.b_wptr - mp.b_rptr;
if (mblen > (sizeof (dl_unitdata_req_t) + MAX_SAP_LEN) ||
- ill->ill_nd_lla_len > MAX_SAP_LEN ||
- NCE_LL_ADDR_OFFSET(ill) + ill->ill_nd_lla_len > mblen) {
- return ("Truncated");
+ ill->ill_phys_addr_length > MAX_SAP_LEN ||
+ (NCE_LL_ADDR_OFFSET(ill) +
+ ill->ill_phys_addr_length) > mblen) {
+ return ("Unknown");
}
h = mdb_zalloc(mblen, UM_SLEEP);
if (mdb_vread(h, mblen, (uintptr_t)(mp.b_rptr)) == -1) {
@@ -1533,8 +1863,8 @@
mp.b_rptr + NCE_LL_ADDR_OFFSET(ill));
return ("Unknown");
}
- mdb_mac_addr(h + NCE_LL_ADDR_OFFSET(ill), ill->ill_nd_lla_len,
- addr_buf, sizeof (addr_buf));
+ mdb_mac_addr(h + NCE_LL_ADDR_OFFSET(ill),
+ ill->ill_phys_addr_length, addr_buf, sizeof (addr_buf));
} else {
return ("None");
}
@@ -1543,7 +1873,7 @@
}
static void
-nce_header(uint_t flags)
+ncec_header(uint_t flags)
{
if ((flags & DCMD_LOOPFIRST) || !(flags & DCMD_LOOP)) {
@@ -1553,10 +1883,10 @@
}
int
-nce(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
+ncec(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
{
- nce_t nce;
- nce_cbdata_t id;
+ ncec_t ncec;
+ ncec_cbdata_t id;
int ipversion = 0;
const char *opt_P = NULL;
@@ -1577,23 +1907,23 @@
if (flags & DCMD_ADDRSPEC) {
- if (mdb_vread(&nce, sizeof (nce_t), addr) == -1) {
- mdb_warn("failed to read nce at %p\n", addr);
+ if (mdb_vread(&ncec, sizeof (ncec_t), addr) == -1) {
+ mdb_warn("failed to read ncec at %p\n", addr);
return (DCMD_ERR);
}
- if (ipversion != 0 && nce.nce_ipversion != ipversion) {
+ if (ipversion != 0 && ncec.ncec_ipversion != ipversion) {
mdb_printf("IP Version mismatch\n");
return (DCMD_ERR);
}
- nce_header(flags);
- return (nce_format(addr, &nce, ipversion));
+ ncec_header(flags);
+ return (ncec_format(addr, &ncec, ipversion));
} else {
- id.nce_addr = addr;
- id.nce_ipversion = ipversion;
- nce_header(flags);
- if (mdb_walk("nce", (mdb_walk_cb_t)nce_cb, &id) == -1) {
- mdb_warn("failed to walk nce table\n");
+ id.ncec_addr = addr;
+ id.ncec_ipversion = ipversion;
+ ncec_header(flags);
+ if (mdb_walk("ncec", (mdb_walk_cb_t)ncec_cb, &id) == -1) {
+ mdb_warn("failed to walk ncec table\n");
return (DCMD_ERR);
}
}
@@ -1601,10 +1931,10 @@
}
static int
-nce_format(uintptr_t addr, const nce_t *nce, int ipversion)
+ncec_format(uintptr_t addr, const ncec_t *ncec, int ipversion)
{
- static const mdb_bitmask_t nce_flags[] = {
- { "P", NCE_F_PERMANENT, NCE_F_PERMANENT },
+ static const mdb_bitmask_t ncec_flags[] = {
+ { "P", NCE_F_NONUD, NCE_F_NONUD },
{ "R", NCE_F_ISROUTER, NCE_F_ISROUTER },
{ "N", NCE_F_NONUD, NCE_F_NONUD },
{ "A", NCE_F_ANYCAST, NCE_F_ANYCAST },
@@ -1613,15 +1943,15 @@
{ "B", NCE_F_BCAST, NCE_F_BCAST },
{ NULL, 0, 0 }
};
-#define NCE_MAX_FLAGS (sizeof (nce_flags) / sizeof (mdb_bitmask_t))
+#define NCE_MAX_FLAGS (sizeof (ncec_flags) / sizeof (mdb_bitmask_t))
struct in_addr nceaddr;
ill_t ill;
char ill_name[LIFNAMSIZ];
char flagsbuf[NCE_MAX_FLAGS];
- if (mdb_vread(&ill, sizeof (ill), (uintptr_t)nce->nce_ill) == -1) {
- mdb_warn("failed to read nce_ill at %p",
- nce->nce_ill);
+ if (mdb_vread(&ill, sizeof (ill), (uintptr_t)ncec->ncec_ill) == -1) {
+ mdb_warn("failed to read ncec_ill at %p",
+ ncec->ncec_ill);
return (DCMD_ERR);
}
@@ -1629,33 +1959,33 @@
(uintptr_t)ill.ill_name);
mdb_snprintf(flagsbuf, sizeof (flagsbuf), "%hb",
- nce->nce_flags, nce_flags);
+ ncec->ncec_flags, ncec_flags);
- if (ipversion != 0 && nce->nce_ipversion != ipversion)
+ if (ipversion != 0 && ncec->ncec_ipversion != ipversion)
return (DCMD_OK);
- if (nce->nce_ipversion == IPV4_VERSION) {
- IN6_V4MAPPED_TO_INADDR(&nce->nce_addr, &nceaddr);
+ if (ncec->ncec_ipversion == IPV4_VERSION) {
+ IN6_V4MAPPED_TO_INADDR(&ncec->ncec_addr, &nceaddr);
mdb_printf("%?p %-20s %-10s "
"%-8s "
"%-5s %I\n",
- addr, nce_l2_addr(nce, &ill),
- nce_state(nce->nce_state),
+ addr, ncec_l2_addr(ncec, &ill),
+ ncec_state(ncec->ncec_state),
flagsbuf,
ill_name, nceaddr.s_addr);
} else {
mdb_printf("%?p %-20s %-10s %-8s %-5s %N\n",
- addr, nce_l2_addr(nce, &ill),
- nce_state(nce->nce_state),
+ addr, ncec_l2_addr(ncec, &ill),
+ ncec_state(ncec->ncec_state),
flagsbuf,
- ill_name, &nce->nce_addr);
+ ill_name, &ncec->ncec_addr);
}
return (DCMD_OK);
}
static uintptr_t
-nce_get_next_hash_tbl(uintptr_t start, int *index, struct ndp_g_s ndp)
+ncec_get_next_hash_tbl(uintptr_t start, int *index, struct ndp_g_s ndp)
{
uintptr_t addr = start;
int i = *index;
@@ -1671,7 +2001,7 @@
}
static int
-nce_walk_step(mdb_walk_state_t *wsp)
+ncec_walk_step(mdb_walk_state_t *wsp)
{
uintptr_t kaddr4, kaddr6;
@@ -1686,15 +2016,15 @@
mdb_warn("can't read ips_ip_cache_table at %p", kaddr6);
return (WALK_ERR);
}
- if (mdb_pwalk("nce_stack", wsp->walk_callback, wsp->walk_cbdata,
+ if (mdb_pwalk("ncec_stack", wsp->walk_callback, wsp->walk_cbdata,
kaddr4) == -1) {
- mdb_warn("couldn't walk 'nce_stack' for ips_ndp4 %p",
+ mdb_warn("couldn't walk 'ncec_stack' for ips_ndp4 %p",
kaddr4);
return (WALK_ERR);
}
- if (mdb_pwalk("nce_stack", wsp->walk_callback,
+ if (mdb_pwalk("ncec_stack", wsp->walk_callback,
wsp->walk_cbdata, kaddr6) == -1) {
- mdb_warn("couldn't walk 'nce_stack' for ips_ndp6 %p",
+ mdb_warn("couldn't walk 'ncec_stack' for ips_ndp6 %p",
kaddr6);
return (WALK_ERR);
}
@@ -1743,7 +2073,7 @@
mdb_free(iw, sizeof (ipcl_hash_walk_data_t));
return (WALK_ERR);
}
- if (arg->tbl_off == OFFSETOF(ip_stack_t, ips_ipcl_proto_fanout) ||
+ if (arg->tbl_off == OFFSETOF(ip_stack_t, ips_ipcl_proto_fanout_v4) ||
arg->tbl_off == OFFSETOF(ip_stack_t, ips_ipcl_proto_fanout_v6)) {
iw->hash_tbl_size = IPPROTO_MAX;
} else {
@@ -1809,72 +2139,75 @@
* Called with walk_addr being the address of ips_ndp{4,6}
*/
static int
-nce_stack_walk_init(mdb_walk_state_t *wsp)
+ncec_stack_walk_init(mdb_walk_state_t *wsp)
{
- nce_walk_data_t *nw;
+ ncec_walk_data_t *nw;
if (wsp->walk_addr == NULL) {
- mdb_warn("nce_stack requires ndp_g_s address\n");
+ mdb_warn("ncec_stack requires ndp_g_s address\n");
return (WALK_ERR);
}
- nw = mdb_alloc(sizeof (nce_walk_data_t), UM_SLEEP);
+ nw = mdb_alloc(sizeof (ncec_walk_data_t), UM_SLEEP);
- if (mdb_vread(&nw->nce_ip_ndp, sizeof (struct ndp_g_s),
+ if (mdb_vread(&nw->ncec_ip_ndp, sizeof (struct ndp_g_s),
wsp->walk_addr) == -1) {
mdb_warn("failed to read 'ip_ndp' at %p",
wsp->walk_addr);
- mdb_free(nw, sizeof (nce_walk_data_t));
+ mdb_free(nw, sizeof (ncec_walk_data_t));
return (WALK_ERR);
}
- nw->nce_hash_tbl_index = 0;
- wsp->walk_addr = nce_get_next_hash_tbl(NULL,
- &nw->nce_hash_tbl_index, nw->nce_ip_ndp);
+ /*
+ * ncec_get_next_hash_tbl() starts at ++i, so initialize the index to -1.
+ */
+ nw->ncec_hash_tbl_index = -1;
+ wsp->walk_addr = ncec_get_next_hash_tbl(NULL,
+ &nw->ncec_hash_tbl_index, nw->ncec_ip_ndp);
wsp->walk_data = nw;
return (WALK_NEXT);
}
static int
-nce_stack_walk_step(mdb_walk_state_t *wsp)
+ncec_stack_walk_step(mdb_walk_state_t *wsp)
{
uintptr_t addr = wsp->walk_addr;
- nce_walk_data_t *nw = wsp->walk_data;
+ ncec_walk_data_t *nw = wsp->walk_data;
if (addr == NULL)
return (WALK_DONE);
- if (mdb_vread(&nw->nce, sizeof (nce_t), addr) == -1) {
- mdb_warn("failed to read nce_t at %p", addr);
+ if (mdb_vread(&nw->ncec, sizeof (ncec_t), addr) == -1) {
+ mdb_warn("failed to read ncec_t at %p", addr);
return (WALK_ERR);
}
- wsp->walk_addr = (uintptr_t)nw->nce.nce_next;
+ wsp->walk_addr = (uintptr_t)nw->ncec.ncec_next;
- wsp->walk_addr = nce_get_next_hash_tbl(wsp->walk_addr,
- &nw->nce_hash_tbl_index, nw->nce_ip_ndp);
+ wsp->walk_addr = ncec_get_next_hash_tbl(wsp->walk_addr,
+ &nw->ncec_hash_tbl_index, nw->ncec_ip_ndp);
return (wsp->walk_callback(addr, nw, wsp->walk_cbdata));
}
static void
-nce_stack_walk_fini(mdb_walk_state_t *wsp)
+ncec_stack_walk_fini(mdb_walk_state_t *wsp)
{
- mdb_free(wsp->walk_data, sizeof (nce_walk_data_t));
+ mdb_free(wsp->walk_data, sizeof (ncec_walk_data_t));
}
/* ARGSUSED */
static int
-nce_cb(uintptr_t addr, const nce_walk_data_t *iw, nce_cbdata_t *id)
+ncec_cb(uintptr_t addr, const ncec_walk_data_t *iw, ncec_cbdata_t *id)
{
- nce_t nce;
+ ncec_t ncec;
- if (mdb_vread(&nce, sizeof (nce_t), addr) == -1) {
- mdb_warn("failed to read nce at %p", addr);
+ if (mdb_vread(&ncec, sizeof (ncec_t), addr) == -1) {
+ mdb_warn("failed to read ncec at %p", addr);
return (WALK_NEXT);
}
- (void) nce_format(addr, &nce, id->nce_ipversion);
+ (void) ncec_format(addr, &ncec, id->ncec_ipversion);
return (WALK_NEXT);
}
@@ -1918,6 +2251,11 @@
mdb_warn("failed to read ill at %p", addr);
return (WALK_NEXT);
}
+
+ /* If ip_stack_t is specified, skip ILLs that don't belong to it. */
+ if (id->ill_ipst != NULL && ill.ill_ipst != id->ill_ipst)
+ return (WALK_NEXT);
+
return (ill_format((uintptr_t)addr, &ill, id));
}
@@ -2013,7 +2351,7 @@
break;
}
cnt = ill->ill_refcnt + ill->ill_ire_cnt + ill->ill_nce_cnt +
- ill->ill_ilm_walker_cnt + ill->ill_ilm_cnt;
+ ill->ill_ilm_cnt + ill->ill_ncec_cnt;
mdb_printf("%-?p %-8s %-3s ",
addr, ill_name, ill->ill_isv6 ? "v6" : "v4");
if (typebuf != NULL)
@@ -2035,11 +2373,10 @@
strlen(sbuf), "", ill->ill_ire_cnt, "ill_ire_cnt");
mdb_printf("%*s %7d %-18s nces referencing this ill\n",
strlen(sbuf), "", ill->ill_nce_cnt, "ill_nce_cnt");
+ mdb_printf("%*s %7d %-18s ncecs referencing this ill\n",
+ strlen(sbuf), "", ill->ill_ncec_cnt, "ill_ncec_cnt");
mdb_printf("%*s %7d %-18s ilms referencing this ill\n",
strlen(sbuf), "", ill->ill_ilm_cnt, "ill_ilm_cnt");
- mdb_printf("%*s %7d %-18s active ilm walkers\n\n",
- strlen(sbuf), "", ill->ill_ilm_walker_cnt,
- "ill_ilm_walker_cnt");
} else {
mdb_printf("%4d %-?p %-llb\n",
cnt, ill->ill_wq,
@@ -2054,14 +2391,24 @@
ill_t ill_data;
ill_cbdata_t id;
int ipversion = 0;
+ const char *zone_name = NULL;
const char *opt_P = NULL;
uint_t verbose = FALSE;
+ ip_stack_t *ipst = NULL;
if (mdb_getopts(argc, argv,
'v', MDB_OPT_SETBITS, TRUE, &verbose,
+ 's', MDB_OPT_STR, &zone_name,
'P', MDB_OPT_STR, &opt_P, NULL) != argc)
return (DCMD_USAGE);
+ /* Follow the specified zone name to find an ip_stack_t*. */
+ if (zone_name != NULL) {
+ ipst = zone_to_ips(zone_name);
+ if (ipst == NULL)
+ return (DCMD_USAGE);
+ }
+
if (opt_P != NULL) {
if (strcmp("v4", opt_P) == 0) {
ipversion = IPV4_VERSION;
@@ -2076,6 +2423,7 @@
id.verbose = verbose;
id.ill_addr = addr;
id.ill_ipversion = ipversion;
+ id.ill_ipst = ipst;
ill_header(verbose);
if (flags & DCMD_ADDRSPEC) {
@@ -2254,7 +2602,6 @@
{ "CO", IPIF_CONDEMNED, IPIF_CONDEMNED},
{ "CH", IPIF_CHANGING, IPIF_CHANGING},
{ "SL", IPIF_SET_LINKLOCAL, IPIF_SET_LINKLOCAL},
- { "ZS", IPIF_ZERO_SOURCE, IPIF_ZERO_SOURCE},
{ NULL, 0, 0 }
};
static const mdb_bitmask_t fmasks[] = {
@@ -2299,16 +2646,14 @@
}
mdb_snprintf(bitfields, sizeof (bitfields), "%s",
ipif->ipif_addr_ready ? ",ADR" : "",
- ipif->ipif_multicast_up ? ",MU" : "",
ipif->ipif_was_up ? ",WU" : "",
- ipif->ipif_was_dup ? ",WD" : "",
- ipif->ipif_joined_allhosts ? ",JA" : "");
+ ipif->ipif_was_dup ? ",WD" : "");
mdb_snprintf(flagsbuf, sizeof (flagsbuf), "%llb%s",
ipif->ipif_flags, fmasks, bitfields);
mdb_snprintf(sflagsbuf, sizeof (sflagsbuf), "%b",
ipif->ipif_state_flags, sfmasks);
- cnt = ipif->ipif_refcnt + ipif->ipif_ire_cnt + ipif->ipif_ilm_cnt;
+ cnt = ipif->ipif_refcnt;
if (ipifcb->ill.ill_isv6) {
mdb_snprintf(addrstr, sizeof (addrstr), "%N",
@@ -2329,12 +2674,6 @@
mdb_printf("%s |\n%s +---> %4d %-15s "
"Active consistent reader cnt\n",
sbuf, sbuf, ipif->ipif_refcnt, "ipif_refcnt");
- mdb_printf("%*s %10d %-15s "
- "Number of ire's referencing this ipif\n",
- strlen(sbuf), "", ipif->ipif_ire_cnt, "ipif_ire_cnt");
- mdb_printf("%*s %10d %-15s "
- "Number of ilm's referencing this ipif\n\n",
- strlen(sbuf), "", ipif->ipif_ilm_cnt, "ipif_ilm_cnt");
mdb_printf("%-s/%d\n",
addrstr, mask_to_prefixlen(af, &ipif->ipif_v6net_mask));
if (ipifcb->ill.ill_isv6) {
@@ -2473,16 +2812,16 @@
mdb_printf("%-?p %-?p %?d %?d\n", addr, conn->conn_wq,
nss.netstack_stackid, conn->conn_zoneid);
- if (conn->conn_af_isv6) {
+ if (conn->conn_family == AF_INET6) {
mdb_snprintf(src_addrstr, sizeof (rem_addrstr), "%N",
- &conn->conn_srcv6);
+ &conn->conn_laddr_v6);
mdb_snprintf(rem_addrstr, sizeof (rem_addrstr), "%N",
- &conn->conn_remv6);
+ &conn->conn_faddr_v6);
} else {
mdb_snprintf(src_addrstr, sizeof (src_addrstr), "%I",
- V4_PART_OF_V6((conn->conn_srcv6)));
+ V4_PART_OF_V6((conn->conn_laddr_v6)));
mdb_snprintf(rem_addrstr, sizeof (rem_addrstr), "%I",
- V4_PART_OF_V6((conn->conn_remv6)));
+ V4_PART_OF_V6((conn->conn_faddr_v6)));
}
mdb_printf("%s:%-5d\n%s:%-5d\n",
src_addrstr, conn->conn_lport, rem_addrstr, conn->conn_fport);
@@ -2519,7 +2858,7 @@
{
mdb_printf("Prints conn_t structures from the following hash tables: "
"\n\tips_ipcl_udp_fanout\n\tips_ipcl_bind_fanout"
- "\n\tips_ipcl_conn_fanout\n\tips_ipcl_proto_fanout"
+ "\n\tips_ipcl_conn_fanout\n\tips_ipcl_proto_fanout_v4"
"\n\tips_ipcl_proto_fanout_v6\n");
}
diff --git a/usr/src/cmd/mdb/common/modules/sctp/sctp.c b/usr/src/cmd/mdb/common/modules/sctp/sctp.c
index 05f0c38..4165a56 100644
--- a/usr/src/cmd/mdb/common/modules/sctp/sctp.c
+++ b/usr/src/cmd/mdb/common/modules/sctp/sctp.c
@@ -20,12 +20,10 @@
*/
/*
- * Copyright 2007 Sun Microsystems, Inc. All rights reserved.
+ * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
-#pragma ident "%Z%%M% %I% %E% SMI"
-
#include <sys/types.h>
#include <sys/stream.h>
#include <sys/mdb_modapi.h>
@@ -164,7 +162,7 @@
mdb_printf("lastactive\t%?ld\thb_secret\t%?#lx\n", fa->lastactive,
fa->hb_secret);
mdb_printf("rxt_unacked\t%?u\n", fa->rxt_unacked);
- mdb_printf("timer_mp\t%?p\tire\t\t%?p\n", fa->timer_mp, fa->ire);
+ mdb_printf("timer_mp\t%?p\tixa\t\t%?p\n", fa->timer_mp, fa->ixa);
mdb_printf("hb_enabled\t%?d\thb_pending\t%?d\n"
"timer_running\t%?d\tdf\t\t%?d\n"
"pmtu_discovered\t%?d\tisv4\t\t%?d\n"
@@ -566,11 +564,12 @@
{
mdb_printf("\tunderstands_asconf\t%d\n",
sctp->sctp_understands_asconf);
- mdb_printf("\tdebug\t\t\t%d\n", sctp->sctp_debug);
+ mdb_printf("\tdebug\t\t\t%d\n", sctp->sctp_connp->conn_debug);
mdb_printf("\tcchunk_pend\t\t%d\n", sctp->sctp_cchunk_pend);
- mdb_printf("\tdgram_errind\t\t%d\n", sctp->sctp_dgram_errind);
+ mdb_printf("\tdgram_errind\t\t%d\n",
+ sctp->sctp_connp->conn_dgram_errind);
- mdb_printf("\tlinger\t\t\t%d\n", sctp->sctp_linger);
+ mdb_printf("\tlinger\t\t\t%d\n", sctp->sctp_connp->conn_linger);
if (sctp->sctp_lingering)
return;
mdb_printf("\tlingering\t\t%d\n", sctp->sctp_lingering);
@@ -578,7 +577,8 @@
mdb_printf("\tforce_sack\t\t%d\n", sctp->sctp_force_sack);
mdb_printf("\tack_timer_runing\t%d\n", sctp->sctp_ack_timer_running);
- mdb_printf("\trecvdstaddr\t\t%d\n", sctp->sctp_recvdstaddr);
+ mdb_printf("\trecvdstaddr\t\t%d\n",
+ sctp->sctp_connp->conn_recv_ancillary.crb_recvdstaddr);
mdb_printf("\thwcksum\t\t\t%d\n", sctp->sctp_hwcksum);
mdb_printf("\tunderstands_addip\t%d\n", sctp->sctp_understands_addip);
@@ -654,8 +654,8 @@
if (saddr->saddr_ipif_delete_pending == 1)
mdb_printf("/DeletePending");
mdb_printf(")\n");
- mdb_printf("\t\t\tMTU %d id %d zoneid %d IPIF flags %x\n",
- ipif.sctp_ipif_mtu, ipif.sctp_ipif_id,
+ mdb_printf("\t\t\tid %d zoneid %d IPIF flags %x\n",
+ ipif.sctp_ipif_id,
ipif.sctp_ipif_zoneid, ipif.sctp_ipif_flags);
return (WALK_NEXT);
}
@@ -682,8 +682,8 @@
int
sctp(uintptr_t addr, uint_t flags, int argc, const mdb_arg_t *argv)
{
- sctp_t sctp;
- conn_t connp;
+ sctp_t sctps, *sctp;
+ conn_t conns, *connp;
int i;
uint_t opts = 0;
uint_t paddr = 0;
@@ -692,16 +692,23 @@
if (!(flags & DCMD_ADDRSPEC))
return (DCMD_USAGE);
- if (mdb_vread(&sctp, sizeof (sctp), addr) == -1) {
+ if (mdb_vread(&sctps, sizeof (sctps), addr) == -1) {
mdb_warn("failed to read sctp_t at: %p\n", addr);
return (DCMD_ERR);
}
- if (mdb_vread(&connp, sizeof (connp),
- (uintptr_t)sctp.sctp_connp) == -1) {
- mdb_warn("failed to read conn_t at: %p\n", sctp.sctp_connp);
+ sctp = &sctps;
+
+ if (mdb_vread(&conns, sizeof (conns),
+ (uintptr_t)sctp->sctp_connp) == -1) {
+ mdb_warn("failed to read conn_t at: %p\n", sctp->sctp_connp);
return (DCMD_ERR);
}
+ connp = &conns;
+
+ connp->conn_sctp = sctp;
+ sctp->sctp_connp = connp;
+
if (mdb_getopts(argc, argv,
'a', MDB_OPT_SETBITS, MDB_SCTP_SHOW_ALL, &opts,
'f', MDB_OPT_SETBITS, MDB_SCTP_SHOW_FLAGS, &opts,
@@ -726,7 +733,7 @@
/* non-verbose faddrs, suitable for pipelines to sctp_faddr */
if (paddr != 0) {
sctp_faddr_t faddr, *fp;
- for (fp = sctp.sctp_faddrs; fp != NULL; fp = faddr.next) {
+ for (fp = sctp->sctp_faddrs; fp != NULL; fp = faddr.next) {
if (mdb_vread(&faddr, sizeof (faddr), (uintptr_t)fp)
== -1) {
mdb_warn("failed to read faddr at %p",
@@ -738,16 +745,16 @@
return (DCMD_OK);
}
- mdb_nhconvert(&lport, &sctp.sctp_lport, sizeof (lport));
- mdb_nhconvert(&fport, &sctp.sctp_fport, sizeof (fport));
+ mdb_nhconvert(&lport, &connp->conn_lport, sizeof (lport));
+ mdb_nhconvert(&fport, &connp->conn_fport, sizeof (fport));
mdb_printf("%<u>%p% %22s S=%-6hu D=%-6hu% STACK=%d ZONE=%d%</u>", addr,
- state2str(&sctp), lport, fport,
- ns_to_stackid((uintptr_t)connp.conn_netstack), connp.conn_zoneid);
+ state2str(sctp), lport, fport,
+ ns_to_stackid((uintptr_t)connp->conn_netstack), connp->conn_zoneid);
- if (sctp.sctp_faddrs) {
+ if (sctp->sctp_faddrs) {
sctp_faddr_t faddr;
if (mdb_vread(&faddr, sizeof (faddr),
- (uintptr_t)sctp.sctp_faddrs) != -1)
+ (uintptr_t)sctp->sctp_faddrs) != -1)
mdb_printf("%<u> %N%</u>", &faddr.faddr);
}
mdb_printf("\n");
@@ -756,78 +763,78 @@
mdb_printf("%<b>Local and Peer Addresses%</b>\n");
/* Display source addresses */
- mdb_printf("nsaddrs\t\t%?d\n", sctp.sctp_nsaddrs);
+ mdb_printf("nsaddrs\t\t%?d\n", sctp->sctp_nsaddrs);
(void) mdb_pwalk("sctp_walk_saddr", print_saddr, NULL, addr);
/* Display peer addresses */
- mdb_printf("nfaddrs\t\t%?d\n", sctp.sctp_nfaddrs);
+ mdb_printf("nfaddrs\t\t%?d\n", sctp->sctp_nfaddrs);
i = 1;
(void) mdb_pwalk("sctp_walk_faddr", print_faddr, &i, addr);
mdb_printf("lastfaddr\t%?p\tprimary\t\t%?p\n",
- sctp.sctp_lastfaddr, sctp.sctp_primary);
+ sctp->sctp_lastfaddr, sctp->sctp_primary);
mdb_printf("current\t\t%?p\tlastdata\t%?p\n",
- sctp.sctp_current, sctp.sctp_lastdata);
+ sctp->sctp_current, sctp->sctp_lastdata);
}
if (opts & MDB_SCTP_SHOW_OUT) {
mdb_printf("%<b>Outbound Data%</b>\n");
mdb_printf("xmit_head\t%?p\txmit_tail\t%?p\n",
- sctp.sctp_xmit_head, sctp.sctp_xmit_tail);
+ sctp->sctp_xmit_head, sctp->sctp_xmit_tail);
mdb_printf("xmit_unsent\t%?p\txmit_unsent_tail%?p\n",
- sctp.sctp_xmit_unsent, sctp.sctp_xmit_unsent_tail);
- mdb_printf("xmit_unacked\t%?p\n", sctp.sctp_xmit_unacked);
+ sctp->sctp_xmit_unsent, sctp->sctp_xmit_unsent_tail);
+ mdb_printf("xmit_unacked\t%?p\n", sctp->sctp_xmit_unacked);
mdb_printf("unacked\t\t%?u\tunsent\t\t%?ld\n",
- sctp.sctp_unacked, sctp.sctp_unsent);
+ sctp->sctp_unacked, sctp->sctp_unsent);
mdb_printf("ltsn\t\t%?x\tlastack_rxd\t%?x\n",
- sctp.sctp_ltsn, sctp.sctp_lastack_rxd);
+ sctp->sctp_ltsn, sctp->sctp_lastack_rxd);
mdb_printf("recovery_tsn\t%?x\tadv_pap\t\t%?x\n",
- sctp.sctp_recovery_tsn, sctp.sctp_adv_pap);
+ sctp->sctp_recovery_tsn, sctp->sctp_adv_pap);
mdb_printf("num_ostr\t%?hu\tostrcntrs\t%?p\n",
- sctp.sctp_num_ostr, sctp.sctp_ostrcntrs);
+ sctp->sctp_num_ostr, sctp->sctp_ostrcntrs);
mdb_printf("pad_mp\t\t%?p\terr_chunks\t%?p\n",
- sctp.sctp_pad_mp, sctp.sctp_err_chunks);
- mdb_printf("err_len\t\t%?u\n", sctp.sctp_err_len);
+ sctp->sctp_pad_mp, sctp->sctp_err_chunks);
+ mdb_printf("err_len\t\t%?u\n", sctp->sctp_err_len);
mdb_printf("%<b>Default Send Parameters%</b>\n");
mdb_printf("def_stream\t%?u\tdef_flags\t%?x\n",
- sctp.sctp_def_stream, sctp.sctp_def_flags);
+ sctp->sctp_def_stream, sctp->sctp_def_flags);
mdb_printf("def_ppid\t%?x\tdef_context\t%?x\n",
- sctp.sctp_def_ppid, sctp.sctp_def_context);
+ sctp->sctp_def_ppid, sctp->sctp_def_context);
mdb_printf("def_timetolive\t%?u\n",
- sctp.sctp_def_timetolive);
+ sctp->sctp_def_timetolive);
}
if (opts & MDB_SCTP_SHOW_IN) {
mdb_printf("%<b>Inbound Data%</b>\n");
mdb_printf("sack_info\t%?p\tsack_gaps\t%?d\n",
- sctp.sctp_sack_info, sctp.sctp_sack_gaps);
- dump_sack_info((uintptr_t)sctp.sctp_sack_info);
+ sctp->sctp_sack_info, sctp->sctp_sack_gaps);
+ dump_sack_info((uintptr_t)sctp->sctp_sack_info);
mdb_printf("ftsn\t\t%?x\tlastacked\t%?x\n",
- sctp.sctp_ftsn, sctp.sctp_lastacked);
+ sctp->sctp_ftsn, sctp->sctp_lastacked);
mdb_printf("istr_nmsgs\t%?d\tsack_toggle\t%?d\n",
- sctp.sctp_istr_nmsgs, sctp.sctp_sack_toggle);
- mdb_printf("ack_mp\t\t%?p\n", sctp.sctp_ack_mp);
+ sctp->sctp_istr_nmsgs, sctp->sctp_sack_toggle);
+ mdb_printf("ack_mp\t\t%?p\n", sctp->sctp_ack_mp);
mdb_printf("num_istr\t%?hu\tinstr\t\t%?p\n",
- sctp.sctp_num_istr, sctp.sctp_instr);
- mdb_printf("unord_reass\t%?p\n", sctp.sctp_uo_frags);
+ sctp->sctp_num_istr, sctp->sctp_instr);
+ mdb_printf("unord_reass\t%?p\n", sctp->sctp_uo_frags);
}
if (opts & MDB_SCTP_SHOW_RTT) {
mdb_printf("%<b>RTT Tracking%</b>\n");
mdb_printf("rtt_tsn\t\t%?x\tout_time\t%?ld\n",
- sctp.sctp_rtt_tsn, sctp.sctp_out_time);
+ sctp->sctp_rtt_tsn, sctp->sctp_out_time);
}
if (opts & MDB_SCTP_SHOW_FLOW) {
mdb_printf("%<b>Flow Control%</b>\n");
- mdb_printf("txmit_hiwater\t%?d\n"
- "xmit_lowater\t%?d\tfrwnd\t\t%?u\n"
+ mdb_printf("conn_sndbuf\t%?d\n"
+ "conn_sndlowat\t%?d\tfrwnd\t\t%?u\n"
"rwnd\t\t%?u\tinitial rwnd\t%?u\n"
- "rxqueued\t%?u\tcwnd_max\t%?u\n", sctp.sctp_xmit_hiwater,
- sctp.sctp_xmit_lowater, sctp.sctp_frwnd,
- sctp.sctp_rwnd, sctp.sctp_irwnd, sctp.sctp_rxqueued,
- sctp.sctp_cwnd_max);
+ "rxqueued\t%?u\tcwnd_max\t%?u\n", connp->conn_sndbuf,
+ connp->conn_sndlowat, sctp->sctp_frwnd,
+ sctp->sctp_rwnd, sctp->sctp_irwnd, sctp->sctp_rxqueued,
+ sctp->sctp_cwnd_max);
}
if (opts & MDB_SCTP_SHOW_HDR) {
@@ -838,21 +845,21 @@
"ipha\t\t%?p\tip6h\t\t%?p\n"
"ip_hdr_len\t%?d\tip_hdr6_len\t%?d\n"
"sctph\t\t%?p\tsctph6\t\t%?p\n"
- "lvtag\t\t%?x\tfvtag\t\t%?x\n", sctp.sctp_iphc,
- sctp.sctp_iphc6, sctp.sctp_iphc_len,
- sctp.sctp_iphc6_len, sctp.sctp_hdr_len,
- sctp.sctp_hdr6_len, sctp.sctp_ipha, sctp.sctp_ip6h,
- sctp.sctp_ip_hdr_len, sctp.sctp_ip_hdr6_len,
- sctp.sctp_sctph, sctp.sctp_sctph6, sctp.sctp_lvtag,
- sctp.sctp_fvtag);
+ "lvtag\t\t%?x\tfvtag\t\t%?x\n", sctp->sctp_iphc,
+ sctp->sctp_iphc6, sctp->sctp_iphc_len,
+ sctp->sctp_iphc6_len, sctp->sctp_hdr_len,
+ sctp->sctp_hdr6_len, sctp->sctp_ipha, sctp->sctp_ip6h,
+ sctp->sctp_ip_hdr_len, sctp->sctp_ip_hdr6_len,
+ sctp->sctp_sctph, sctp->sctp_sctph6, sctp->sctp_lvtag,
+ sctp->sctp_fvtag);
}
if (opts & MDB_SCTP_SHOW_PMTUD) {
mdb_printf("%<b>PMTUd%</b>\n");
mdb_printf("last_mtu_probe\t%?ld\tmtu_probe_intvl\t%?ld\n"
"mss\t\t%?u\n",
- sctp.sctp_last_mtu_probe, sctp.sctp_mtu_probe_intvl,
- sctp.sctp_mss);
+ sctp->sctp_last_mtu_probe, sctp->sctp_mtu_probe_intvl,
+ sctp->sctp_mss);
}
if (opts & MDB_SCTP_SHOW_RXT) {
@@ -862,33 +869,33 @@
"pp_max_rxt\t%?d\trto_max\t\t%?u\n"
"rto_min\t\t%?u\trto_initial\t%?u\n"
"init_rto_max\t%?u\n"
- "rxt_nxttsn\t%?u\trxt_maxtsn\t%?u\n", sctp.sctp_cookie_mp,
- sctp.sctp_strikes, sctp.sctp_max_init_rxt,
- sctp.sctp_pa_max_rxt, sctp.sctp_pp_max_rxt,
- sctp.sctp_rto_max, sctp.sctp_rto_min,
- sctp.sctp_rto_initial, sctp.sctp_init_rto_max,
- sctp.sctp_rxt_nxttsn, sctp.sctp_rxt_maxtsn);
+ "rxt_nxttsn\t%?u\trxt_maxtsn\t%?u\n", sctp->sctp_cookie_mp,
+ sctp->sctp_strikes, sctp->sctp_max_init_rxt,
+ sctp->sctp_pa_max_rxt, sctp->sctp_pp_max_rxt,
+ sctp->sctp_rto_max, sctp->sctp_rto_min,
+ sctp->sctp_rto_initial, sctp->sctp_init_rto_max,
+ sctp->sctp_rxt_nxttsn, sctp->sctp_rxt_maxtsn);
}
if (opts & MDB_SCTP_SHOW_CONN) {
mdb_printf("%<b>Connection State%</b>\n");
mdb_printf("last_secret_update%?ld\n",
- sctp.sctp_last_secret_update);
+ sctp->sctp_last_secret_update);
mdb_printf("secret\t\t");
for (i = 0; i < SCTP_SECRET_LEN; i++) {
if (i % 2 == 0)
- mdb_printf("0x%02x", sctp.sctp_secret[i]);
+ mdb_printf("0x%02x", sctp->sctp_secret[i]);
else
- mdb_printf("%02x ", sctp.sctp_secret[i]);
+ mdb_printf("%02x ", sctp->sctp_secret[i]);
}
mdb_printf("\n");
mdb_printf("old_secret\t");
for (i = 0; i < SCTP_SECRET_LEN; i++) {
if (i % 2 == 0)
- mdb_printf("0x%02x", sctp.sctp_old_secret[i]);
+ mdb_printf("0x%02x", sctp->sctp_old_secret[i]);
else
- mdb_printf("%02x ", sctp.sctp_old_secret[i]);
+ mdb_printf("%02x ", sctp->sctp_old_secret[i]);
}
mdb_printf("\n");
}
@@ -901,40 +908,40 @@
"T2expire\t%?lu\tT3expire\t%?lu\n"
"msgcount\t%?llu\tprsctpdrop\t%?llu\n"
"AssocStartTime\t%?lu\n",
- sctp.sctp_opkts, sctp.sctp_obchunks,
- sctp.sctp_odchunks, sctp.sctp_oudchunks,
- sctp.sctp_rxtchunks, sctp.sctp_T1expire,
- sctp.sctp_T2expire, sctp.sctp_T3expire,
- sctp.sctp_msgcount, sctp.sctp_prsctpdrop,
- sctp.sctp_assoc_start_time);
+ sctp->sctp_opkts, sctp->sctp_obchunks,
+ sctp->sctp_odchunks, sctp->sctp_oudchunks,
+ sctp->sctp_rxtchunks, sctp->sctp_T1expire,
+ sctp->sctp_T2expire, sctp->sctp_T3expire,
+ sctp->sctp_msgcount, sctp->sctp_prsctpdrop,
+ sctp->sctp_assoc_start_time);
mdb_printf("ipkts\t\t%?llu\tibchunks\t%?llu\n"
"idchunks\t%?llu\tiudchunks\t%?llu\n"
"fragdmsgs\t%?llu\treassmsgs\t%?llu\n",
- sctp.sctp_ipkts, sctp.sctp_ibchunks,
- sctp.sctp_idchunks, sctp.sctp_iudchunks,
- sctp.sctp_fragdmsgs, sctp.sctp_reassmsgs);
+ sctp->sctp_ipkts, sctp->sctp_ibchunks,
+ sctp->sctp_idchunks, sctp->sctp_iudchunks,
+ sctp->sctp_fragdmsgs, sctp->sctp_reassmsgs);
}
if (opts & MDB_SCTP_SHOW_HASH) {
mdb_printf("%<b>Hash Tables%</b>\n");
- mdb_printf("conn_hash_next\t%?p\t", sctp.sctp_conn_hash_next);
- mdb_printf("conn_hash_prev\t%?p\n", sctp.sctp_conn_hash_prev);
+ mdb_printf("conn_hash_next\t%?p\t", sctp->sctp_conn_hash_next);
+ mdb_printf("conn_hash_prev\t%?p\n", sctp->sctp_conn_hash_prev);
mdb_printf("listen_hash_next%?p\t",
- sctp.sctp_listen_hash_next);
+ sctp->sctp_listen_hash_next);
mdb_printf("listen_hash_prev%?p\n",
- sctp.sctp_listen_hash_prev);
- mdb_nhconvert(&lport, &sctp.sctp_lport, sizeof (lport));
+ sctp->sctp_listen_hash_prev);
+ mdb_nhconvert(&lport, &connp->conn_lport, sizeof (lport));
mdb_printf("[ listen_hash bucket\t%?d ]\n",
SCTP_LISTEN_HASH(lport));
- mdb_printf("conn_tfp\t%?p\t", sctp.sctp_conn_tfp);
- mdb_printf("listen_tfp\t%?p\n", sctp.sctp_listen_tfp);
+ mdb_printf("conn_tfp\t%?p\t", sctp->sctp_conn_tfp);
+ mdb_printf("listen_tfp\t%?p\n", sctp->sctp_listen_tfp);
mdb_printf("bind_hash\t%?p\tptpbhn\t\t%?p\n",
- sctp.sctp_bind_hash, sctp.sctp_ptpbhn);
+ sctp->sctp_bind_hash, sctp->sctp_ptpbhn);
mdb_printf("bind_lockp\t%?p\n",
- sctp.sctp_bind_lockp);
+ sctp->sctp_bind_lockp);
mdb_printf("[ bind_hash bucket\t%?d ]\n",
SCTP_BIND_HASH(lport));
}
@@ -943,8 +950,8 @@
mdb_printf("%<b>Cleanup / Close%</b>\n");
mdb_printf("shutdown_faddr\t%?p\tclient_errno\t%?d\n"
"lingertime\t%?d\trefcnt\t\t%?hu\n",
- sctp.sctp_shutdown_faddr, sctp.sctp_client_errno,
- sctp.sctp_lingertime, sctp.sctp_refcnt);
+ sctp->sctp_shutdown_faddr, sctp->sctp_client_errno,
+ connp->conn_lingertime, sctp->sctp_refcnt);
}
if (opts & MDB_SCTP_SHOW_MISC) {
@@ -955,24 +962,25 @@
"active\t\t%?ld\ttx_adaptation_code%?x\n"
"rx_adaptation_code%?x\ttimer_mp\t%?p\n"
"partial_delivery_point\t%?d\n",
- sctp.sctp_bound_if, sctp.sctp_heartbeat_mp,
- sctp.sctp_family, sctp.sctp_ipversion,
- sctp.sctp_hb_interval, sctp.sctp_autoclose,
- sctp.sctp_active, sctp.sctp_tx_adaptation_code,
- sctp.sctp_rx_adaptation_code, sctp.sctp_timer_mp,
- sctp.sctp_pd_point);
+ connp->conn_bound_if, sctp->sctp_heartbeat_mp,
+ connp->conn_family,
+ connp->conn_ipversion,
+ sctp->sctp_hb_interval, sctp->sctp_autoclose,
+ sctp->sctp_active, sctp->sctp_tx_adaptation_code,
+ sctp->sctp_rx_adaptation_code, sctp->sctp_timer_mp,
+ sctp->sctp_pd_point);
}
if (opts & MDB_SCTP_SHOW_EXT) {
mdb_printf("%<b>Extensions and Reliable Ctl Chunks%</b>\n");
mdb_printf("cxmit_list\t%?p\tlcsn\t\t%?x\n"
- "fcsn\t\t%?x\n", sctp.sctp_cxmit_list, sctp.sctp_lcsn,
- sctp.sctp_fcsn);
+ "fcsn\t\t%?x\n", sctp->sctp_cxmit_list, sctp->sctp_lcsn,
+ sctp->sctp_fcsn);
}
if (opts & MDB_SCTP_SHOW_FLAGS) {
mdb_printf("%<b>Flags%</b>\n");
- show_sctp_flags(&sctp);
+ show_sctp_flags(sctp);
}
return (DCMD_OK);
diff --git a/usr/src/common/net/patricia/radix.c b/usr/src/common/net/patricia/radix.c
index 9a1d3f7..cf20852 100644
--- a/usr/src/common/net/patricia/radix.c
+++ b/usr/src/common/net/patricia/radix.c
@@ -1,5 +1,5 @@
/*
- * Copyright 2008 Sun Microsystems, Inc. All rights reserved.
+ * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 1988, 1989, 1993
@@ -367,8 +367,9 @@
* is looking for some other criteria as well. Continue
* looking as if the exact match failed.
*/
- if (t->rn_parent->rn_flags & RNF_ROOT) {
- /* hit the top. have to give up */
+ if (t->rn_dupedkey == NULL &&
+ (t->rn_parent->rn_flags & RNF_ROOT)) {
+ /* no more dupedkeys and hit the top. have to give up */
return (NULL);
}
b = 0;
@@ -486,56 +487,70 @@
{
caddr_t v = v_arg;
struct radix_node *top = head->rnh_treetop;
+ struct radix_node *p, *x;
int head_off = top->rn_offset, vlen = (int)LEN(v);
struct radix_node *t = rn_search(v_arg, top);
caddr_t cp = v + head_off;
int b;
struct radix_node *tt;
+ caddr_t cp2 = t->rn_key + head_off;
+ int cmp_res;
+ caddr_t cplim = v + vlen;
/*
* Find first bit at which v and t->rn_key differ
*/
- {
- caddr_t cp2 = t->rn_key + head_off;
- int cmp_res;
- caddr_t cplim = v + vlen;
-
- while (cp < cplim)
- if (*cp2++ != *cp++)
- goto on1;
- *dupentry = 1;
- return (t);
+ while (cp < cplim)
+ if (*cp2++ != *cp++)
+ goto on1;
+ *dupentry = 1;
+ return (t);
on1:
- *dupentry = 0;
- cmp_res = (cp[-1] ^ cp2[-1]) & 0xff;
- for (b = (cp - v) << 3; cmp_res; b--)
- cmp_res >>= 1;
- }
- {
- struct radix_node *p, *x = top;
- cp = v;
- do {
- p = x;
- if (cp[x->rn_offset] & x->rn_bmask)
- x = x->rn_right;
- else
- x = x->rn_left;
- } while (b > (unsigned)x->rn_bit);
- /* x->rn_bit < b && x->rn_bit >= 0 */
- t = rn_newpair(v_arg, b, nodes);
- tt = t->rn_left;
- if ((cp[p->rn_offset] & p->rn_bmask) == 0)
- p->rn_left = t;
+ *dupentry = 0;
+ cmp_res = (cp[-1] ^ cp2[-1]) & 0xff;
+ /*
+ * (cp - v) is the number of bytes where the match is relevant.
+ * Multiply by 8 to get number of bits. Then reduce this number
+ * by the trailing bits in the last byte where we have a match
+ * by looking at (cmp_res >> 1) in each iteration below.
+ * Note that v starts at the beginning of the key, so, when key
+ * is a sockaddr structure, the preliminary len/family/port bytes
+ * are accounted for.
+ */
+ for (b = (cp - v) << 3; cmp_res; b--)
+ cmp_res >>= 1;
+ cp = v;
+ x = top;
+ do {
+ p = x;
+ if (cp[x->rn_offset] & x->rn_bmask)
+ x = x->rn_right;
else
- p->rn_right = t;
- x->rn_parent = t;
- t->rn_parent = p;
- if ((cp[t->rn_offset] & t->rn_bmask) == 0) {
- t->rn_right = x;
- } else {
- t->rn_right = tt;
- t->rn_left = x;
- }
+ x = x->rn_left;
+ } while (b > (unsigned)x->rn_bit);
+ /* x->rn_bit < b && x->rn_bit >= 0 */
+ /*
+ * now the rightmost bit where v and rn_key differ (b) is <
+ * x->rn_bit.
+ *
+ * We will add a new branch at p. b cannot equal x->rn_bit
+ * because we know we didn't find a duplicated key.
+ * The tree will be re-adjusted so that t is inserted between p
+ * and x.
+ */
+ t = rn_newpair(v_arg, b, nodes);
+ tt = t->rn_left;
+ if ((cp[p->rn_offset] & p->rn_bmask) == 0)
+ p->rn_left = t;
+ else
+ p->rn_right = t;
+ x->rn_parent = t;
+ t->rn_parent = p;
+ if ((cp[t->rn_offset] & t->rn_bmask) == 0) {
+ t->rn_right = x;
+ } else {
+ t->rn_right = tt;
+ t->rn_left = x;
}
return (tt);
}
@@ -718,6 +733,8 @@
* find it among possible duplicate key entries
* anyway, so the above test doesn't hurt.
*
+ * Insert treenodes before tt.
+ *
* We sort the masks for a duplicated key the same way as
* in a masklist -- most specific to least specific.
* This may require the unfortunate nuisance of relocating
@@ -758,22 +775,54 @@
tt->rn_bit = x->rn_bit;
tt->rn_flags |= x->rn_flags & RNF_NORMAL;
}
+ /* BEGIN CSTYLED */
+ /*
+ * at this point the parent-child relationship for p, t, x, tt is
+ * one of the following:
+ * p p
+ * : (left/right child) :
+ * : :
+ * t t
+ * / \ / \
+ * x tt tt x
+ *
+ * tt == saved_tt returned by rn_insert().
+ */
+ /* END CSTYLED */
t = saved_tt->rn_parent;
if (keyduplicated)
goto key_exists;
b_leaf = -1 - t->rn_bit;
+ /*
+ * b_leaf is now normalized to be in the leaf rn_bit format
+ * (it is the rn_bit value of a leaf corresponding to netmask
+ * of t->rn_bit).
+ */
if (t->rn_right == saved_tt)
x = t->rn_left;
else
x = t->rn_right;
- /* Promote general routes from below */
+ /*
+ * Promote general routes from below.
+ * Identify the less specific netmasks and add them to t->rn_mklist
+ */
if (x->rn_bit < 0) {
- for (mp = &t->rn_mklist; x; x = x->rn_dupedkey)
- if (x->rn_mask && (x->rn_bit >= b_leaf) && x->rn_mklist == 0) {
- *mp = m = rn_new_radix_mask(x, 0);
- if (m)
- mp = &m->rm_mklist;
- }
+ /* x is the sibling node. it is a leaf node. */
+ for (mp = &t->rn_mklist; x; x = x->rn_dupedkey)
+ if (x->rn_mask && (x->rn_bit >= b_leaf) &&
+ x->rn_mklist == 0) {
+ /*
+ * x is the first node in the dupedkey chain
+ * without a mklist, and with a shorter mask
+ * than b_leaf. Create a radix_mask
+ * corresponding to x's mask and add it to
+ * t's rn_mklist. The mask list gets created
+ * in decreasing order of mask length.
+ */
+ *mp = m = rn_new_radix_mask(x, 0);
+ if (m)
+ mp = &m->rm_mklist;
+ }
} else if (x->rn_mklist) {
/*
* Skip over masks whose index is > that of new node
@@ -788,6 +837,7 @@
if ((netmask == 0) || (b > t->rn_bit))
return (tt); /* can't lift at all */
b_leaf = tt->rn_bit;
+ /* b is the index of the netmask */
do {
x = t;
t = t->rn_parent;
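The new comment in rn_insert() above describes how b is derived: take the number of bytes examined up to and including the first mismatching byte, multiply by 8, then step back one bit for each significant bit of the XOR of the two mismatching bytes. A minimal standalone sketch of that arithmetic follows. The helper name first_diff_bit() and its signature are assumptions made for illustration; they are not part of radix.c, and the sketch ignores rn_offset and the sockaddr length byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of radix.c: return the index of the first
 * bit at which keys a and b differ, counting from the most significant bit
 * of byte 0, or -1 if the first len bytes are identical.
 */
static int
first_diff_bit(const unsigned char *a, const unsigned char *b, size_t len)
{
	size_t i;
	int bit, cmp_res;

	assert(a != NULL && b != NULL);

	/* Find the first byte that differs. */
	for (i = 0; i < len; i++) {
		if (a[i] != b[i])
			break;
	}
	if (i == len)
		return (-1);	/* duplicate key */

	cmp_res = (a[i] ^ b[i]) & 0xff;
	/*
	 * (i + 1) bytes were examined, so start at (i + 1) * 8 bits and
	 * step back one bit per shift until the XOR is exhausted, as the
	 * loop in rn_insert() does.
	 */
	for (bit = (int)(i + 1) << 3; cmp_res != 0; bit--)
		cmp_res >>= 1;
	return (bit);
}
```

For example, the 4-byte keys 10.0.0.0 and 10.16.0.0 first differ in byte 1, where the XOR is 0x10. The count starts at 16 and is decremented five times, so the helper returns 11.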
diff --git a/usr/src/lib/brand/native/zone/platform.xml b/usr/src/lib/brand/native/zone/platform.xml
index e988200..0225a51 100644
--- a/usr/src/lib/brand/native/zone/platform.xml
+++ b/usr/src/lib/brand/native/zone/platform.xml
@@ -106,7 +106,6 @@
<device match="ipsecesp" ip-type="exclusive" />
<device match="ipstate" ip-type="exclusive" />
<device match="ipsync" ip-type="exclusive" />
- <device match="iptunq" ip-type="exclusive" />
<device match="keysock" ip-type="exclusive" />
<device match="rawip" ip-type="exclusive" />
<device match="rawip6" ip-type="exclusive" />
@@ -117,6 +116,7 @@
<device match="spdsock" ip-type="exclusive" />
<device match="sppp" ip-type="exclusive" />
<device match="sppptun" ip-type="exclusive" />
+ <device match="vni" ip-type="exclusive" />
<!-- Renamed devices to create under /dev -->
<device match="zcons/%z/zoneconsole" name="zconsole" />
diff --git a/usr/src/lib/brand/solaris10/zone/platform.xml b/usr/src/lib/brand/solaris10/zone/platform.xml
index fa396ec..89f7035 100644
--- a/usr/src/lib/brand/solaris10/zone/platform.xml
+++ b/usr/src/lib/brand/solaris10/zone/platform.xml
@@ -123,7 +123,6 @@
<device match="ipsecesp" ip-type="exclusive" />
<device match="ipstate" ip-type="exclusive" />
<device match="ipsync" ip-type="exclusive" />
- <device match="iptunq" ip-type="exclusive" />
<device match="keysock" ip-type="exclusive" />
<device match="rawip" ip-type="exclusive" />
<device match="rawip6" ip-type="exclusive" />
@@ -134,6 +133,7 @@
<device match="spdsock" ip-type="exclusive" />
<device match="sppp" ip-type="exclusive" />
<device match="sppptun" ip-type="exclusive" />
+ <device match="vni" ip-type="exclusive" />
<!-- Renamed devices to create under /dev -->
<device match="zcons/%z/zoneconsole" name="zconsole" />
diff --git a/usr/src/pkgdefs/SUNWckr/prototype_com b/usr/src/pkgdefs/SUNWckr/prototype_com
index 30679b7..86489c1 100644
--- a/usr/src/pkgdefs/SUNWckr/prototype_com
+++ b/usr/src/pkgdefs/SUNWckr/prototype_com
@@ -92,7 +92,6 @@
f none kernel/drv/ipsecah.conf 644 root sys
f none kernel/drv/ipsecesp.conf 644 root sys
f none kernel/drv/iptun.conf 644 root sys
-f none kernel/drv/iptunq.conf 644 root sys
f none kernel/drv/iwscn.conf 644 root sys
f none kernel/drv/keysock.conf 644 root sys
f none kernel/drv/kmdb.conf 644 root sys
diff --git a/usr/src/pkgdefs/SUNWckr/prototype_i386 b/usr/src/pkgdefs/SUNWckr/prototype_i386
index 2a66761..5f886a8 100644
--- a/usr/src/pkgdefs/SUNWckr/prototype_i386
+++ b/usr/src/pkgdefs/SUNWckr/prototype_i386
@@ -103,7 +103,6 @@
f none kernel/drv/ipsecah 755 root sys
f none kernel/drv/ipsecesp 755 root sys
f none kernel/drv/iptun 755 root sys
-f none kernel/drv/iptunq 755 root sys
f none kernel/drv/iwscn 755 root sys
f none kernel/drv/kb8042 755 root sys
f none kernel/drv/keysock 755 root sys
@@ -326,7 +325,6 @@
f none kernel/drv/amd64/ipsecah 755 root sys
f none kernel/drv/amd64/ipsecesp 755 root sys
f none kernel/drv/amd64/iptun 755 root sys
-f none kernel/drv/amd64/iptunq 755 root sys
f none kernel/drv/amd64/iwscn 755 root sys
f none kernel/drv/amd64/kb8042 755 root sys
f none kernel/drv/amd64/keysock 755 root sys
diff --git a/usr/src/pkgdefs/SUNWckr/prototype_sparc b/usr/src/pkgdefs/SUNWckr/prototype_sparc
index e086c94..c2824f9 100644
--- a/usr/src/pkgdefs/SUNWckr/prototype_sparc
+++ b/usr/src/pkgdefs/SUNWckr/prototype_sparc
@@ -94,7 +94,6 @@
f none kernel/drv/sparcv9/ipsecah 755 root sys
f none kernel/drv/sparcv9/ipsecesp 755 root sys
f none kernel/drv/sparcv9/iptun 755 root sys
-f none kernel/drv/sparcv9/iptunq 755 root sys
f none kernel/drv/sparcv9/isp 755 root sys
f none kernel/drv/sparcv9/iwscn 755 root sys
f none kernel/drv/sparcv9/kb8042 755 root sys
diff --git a/usr/src/pkgdefs/SUNWhea/prototype_com b/usr/src/pkgdefs/SUNWhea/prototype_com
index 3129ef6..e3bfe3f 100644
--- a/usr/src/pkgdefs/SUNWhea/prototype_com
+++ b/usr/src/pkgdefs/SUNWhea/prototype_com
@@ -242,6 +242,7 @@
f none usr/include/inet/arp.h 644 root bin
f none usr/include/inet/common.h 644 root bin
f none usr/include/inet/ip.h 644 root bin
+f none usr/include/inet/ip_arp.h 644 root bin
f none usr/include/inet/ip_if.h 644 root bin
f none usr/include/inet/ip_ire.h 644 root bin
f none usr/include/inet/ip_ftable.h 644 root bin
diff --git a/usr/src/pkgdefs/etc/exception_list_i386 b/usr/src/pkgdefs/etc/exception_list_i386
index 09514a0..ee760eb 100644
--- a/usr/src/pkgdefs/etc/exception_list_i386
+++ b/usr/src/pkgdefs/etc/exception_list_i386
@@ -365,7 +365,6 @@
usr/lib/amd64/libipsecutil.so i386
usr/lib/amd64/llib-lipsecutil.ln i386
#
-usr/include/inet/arp_impl.h i386
usr/include/inet/rawip_impl.h i386
usr/include/inet/udp_impl.h i386
usr/include/inet/tcp_impl.h i386
diff --git a/usr/src/pkgdefs/etc/exception_list_sparc b/usr/src/pkgdefs/etc/exception_list_sparc
index 5a32c55..533552b 100644
--- a/usr/src/pkgdefs/etc/exception_list_sparc
+++ b/usr/src/pkgdefs/etc/exception_list_sparc
@@ -354,7 +354,6 @@
usr/share/lib/locale/com/sun/dhcpmgr/cli/dhtadm/ResourceBundle.properties sparc
usr/share/lib/locale/com/sun/dhcpmgr/cli/pntadm/ResourceBundle.properties sparc
#
-usr/include/inet/arp_impl.h sparc
usr/include/inet/rawip_impl.h sparc
usr/include/inet/udp_impl.h sparc
usr/include/inet/tcp_impl.h sparc
diff --git a/usr/src/tools/scripts/bfu.sh b/usr/src/tools/scripts/bfu.sh
index be82000..e4e9a36 100644
--- a/usr/src/tools/scripts/bfu.sh
+++ b/usr/src/tools/scripts/bfu.sh
@@ -8010,6 +8010,12 @@
rm -f $root/kernel/strmod/sparcv9/tun
rm -f $root/kernel/strmod/amd64/tun
+ # Remove obsolete iptunq
+ rm -f $root/kernel/drv/iptunq
+ rm -f $root/kernel/drv/iptunq.conf
+ rm -f $root/kernel/drv/amd64/iptunq
+ rm -f $root/kernel/drv/sparcv9/iptunq
+
#
# Remove libtopo platform XML files that have been replaced by propmap
# files.
diff --git a/usr/src/uts/common/Makefile.files b/usr/src/uts/common/Makefile.files
index 042685b..550606f 100644
--- a/usr/src/uts/common/Makefile.files
+++ b/usr/src/uts/common/Makefile.files
@@ -514,7 +514,7 @@
TSWTCL_OBJS += tswtcl.o tswtclddi.o
-ARP_OBJS += arpddi.o arp.o arp_netinfo.o
+ARP_OBJS += arpddi.o
ICMP_OBJS += icmpddi.o
@@ -535,13 +535,15 @@
sctp_addr.o tn_ipopt.o tnet.o ip_netinfo.o
IP_ILB_OBJS = ilb.o ilb_nat.o ilb_conn.o ilb_alg_hash.o ilb_alg_rr.o
-IP_OBJS += igmp.o ipmp.o ip.o ip6.o ip6_asp.o ip6_if.o ip6_ire.o ip6_rts.o \
- ip_if.o ip_ire.o ip_listutils.o ip_mroute.o \
- ip_multi.o ip2mac.o ip_ndp.o ip_opt_data.o ip_rts.o ip_srcid.o \
+IP_OBJS += igmp.o ipmp.o ip.o ip6.o ip6_asp.o ip6_if.o ip6_ire.o \
+ ip6_rts.o ip_if.o ip_ire.o ip_listutils.o ip_mroute.o \
+ ip_multi.o ip2mac.o ip_ndp.o ip_rts.o ip_srcid.o \
ipddi.o ipdrop.o mi.o nd.o optcom.o snmpcom.o ipsec_loader.o \
spd.o ipclassifier.o inet_common.o ip_squeue.o squeue.o \
ip_sadb.o ip_ftable.o proto_set.o radix.o ip_dummy.o \
- ip_helper_stream.o iptunq.o \
+ ip_helper_stream.o \
+ ip_output.o ip_input.o ip6_input.o ip6_output.o ip_arp.o \
+ conn_opt.o ip_attr.o ip_dce.o \
$(IP_ICMP_OBJS) \
$(IP_RTS_OBJS) \
$(IP_TCP_OBJS) \
@@ -644,8 +646,6 @@
IPTUN_OBJS += iptun_dev.o iptun_ctl.o iptun.o
-IPTUNQ_OBJS += iptunq_ddi.o
-
AGGR_OBJS += aggr_dev.o aggr_ctl.o aggr_grp.o aggr_port.o \
aggr_send.o aggr_recv.o aggr_lacp.o
diff --git a/usr/src/uts/common/fs/sockfs/sockcommon.h b/usr/src/uts/common/fs/sockfs/sockcommon.h
index f3ffe45..fac10a8 100644
--- a/usr/src/uts/common/fs/sockfs/sockcommon.h
+++ b/usr/src/uts/common/fs/sockfs/sockcommon.h
@@ -184,8 +184,7 @@
extern void so_enqueue_msg(struct sonode *, mblk_t *, size_t);
extern void so_process_new_message(struct sonode *, mblk_t *, mblk_t *);
-extern mblk_t *socopyinuio(uio_t *, ssize_t, size_t, ssize_t, size_t, int *,
- cred_t *);
+extern mblk_t *socopyinuio(uio_t *, ssize_t, size_t, ssize_t, size_t, int *);
extern mblk_t *socopyoutuio(mblk_t *, struct uio *, ssize_t, int *);
extern boolean_t somsghasdata(mblk_t *);
diff --git a/usr/src/uts/common/fs/sockfs/sockcommon_sops.c b/usr/src/uts/common/fs/sockfs/sockcommon_sops.c
index 48a3e37..4521fdd 100644
--- a/usr/src/uts/common/fs/sockfs/sockcommon_sops.c
+++ b/usr/src/uts/common/fs/sockfs/sockcommon_sops.c
@@ -470,8 +470,7 @@
so->so_proto_props.sopp_maxpsz,
so->so_proto_props.sopp_wroff,
so->so_proto_props.sopp_maxblk,
- so->so_proto_props.sopp_tail, &error,
- cr)) == NULL) {
+ so->so_proto_props.sopp_tail, &error)) == NULL) {
break;
}
ASSERT(uiop->uio_resid >= 0);
diff --git a/usr/src/uts/common/fs/sockfs/sockcommon_subr.c b/usr/src/uts/common/fs/sockfs/sockcommon_subr.c
index a244c65..9b806d0 100644
--- a/usr/src/uts/common/fs/sockfs/sockcommon_subr.c
+++ b/usr/src/uts/common/fs/sockfs/sockcommon_subr.c
@@ -471,7 +471,7 @@
/* Copy userdata into a new mblk_t */
mblk_t *
socopyinuio(uio_t *uiop, ssize_t iosize, size_t wroff, ssize_t maxblk,
- size_t tail_len, int *errorp, cred_t *cr)
+ size_t tail_len, int *errorp)
{
mblk_t *head = NULL, **tail = &head;
@@ -499,11 +499,7 @@
blocksize = MIN(iosize, maxblk);
ASSERT(blocksize >= 0);
- if (is_system_labeled())
- mp = allocb_cred(wroff + blocksize + tail_len,
- cr, curproc->p_pid);
- else
- mp = allocb(wroff + blocksize + tail_len, BPRI_MED);
+ mp = allocb(wroff + blocksize + tail_len, BPRI_MED);
if (mp == NULL) {
*errorp = ENOMEM;
return (head);
diff --git a/usr/src/uts/common/fs/sockfs/socktpi.c b/usr/src/uts/common/fs/sockfs/socktpi.c
index b2a178f..bfbd67a 100644
--- a/usr/src/uts/common/fs/sockfs/socktpi.c
+++ b/usr/src/uts/common/fs/sockfs/socktpi.c
@@ -5506,205 +5506,6 @@
so_lock_single(so); /* Set SOLOCKED */
mutex_exit(&so->so_lock);
- /*
- * For SOCKET or TCP level options, try to set it here itself
- * provided socket has not been popped and we know the tcp
- * structure (stored in so_priv).
- */
- if ((level == SOL_SOCKET || level == IPPROTO_TCP) &&
- (so->so_family == AF_INET || so->so_family == AF_INET6) &&
- (so->so_version == SOV_SOCKSTREAM) &&
- (so->so_proto_handle != NULL)) {
- tcp_t *tcp = (tcp_t *)so->so_proto_handle;
- boolean_t onoff;
-
-#define intvalue (*(int32_t *)optval)
-
- switch (level) {
- case SOL_SOCKET:
- switch (option_name) { /* Check length param */
- case SO_DEBUG:
- case SO_REUSEADDR:
- case SO_DONTROUTE:
- case SO_BROADCAST:
- case SO_USELOOPBACK:
- case SO_OOBINLINE:
- case SO_DGRAM_ERRIND:
- if (optlen != (t_uscalar_t)sizeof (int32_t)) {
- error = EINVAL;
- eprintsoline(so, error);
- mutex_enter(&so->so_lock);
- goto done2;
- }
- ASSERT(optval);
- onoff = intvalue != 0;
- handled = B_TRUE;
- break;
- case SO_SNDTIMEO:
- case SO_RCVTIMEO:
- if (get_udatamodel() == DATAMODEL_NONE ||
- get_udatamodel() == DATAMODEL_NATIVE) {
- if (optlen !=
- sizeof (struct timeval)) {
- error = EINVAL;
- eprintsoline(so, error);
- mutex_enter(&so->so_lock);
- goto done2;
- }
- } else {
- if (optlen !=
- sizeof (struct timeval32)) {
- error = EINVAL;
- eprintsoline(so, error);
- mutex_enter(&so->so_lock);
- goto done2;
- }
- }
- ASSERT(optval);
- handled = B_TRUE;
- break;
- case SO_LINGER:
- if (optlen !=
- (t_uscalar_t)sizeof (struct linger)) {
- error = EINVAL;
- eprintsoline(so, error);
- mutex_enter(&so->so_lock);
- goto done2;
- }
- ASSERT(optval);
- handled = B_TRUE;
- break;
- }
-
- switch (option_name) { /* Do actions */
- case SO_LINGER: {
- struct linger *lgr = (struct linger *)optval;
-
- if (lgr->l_onoff) {
- tcp->tcp_linger = 1;
- tcp->tcp_lingertime = lgr->l_linger;
- so->so_linger.l_onoff = SO_LINGER;
- so->so_options |= SO_LINGER;
- } else {
- tcp->tcp_linger = 0;
- tcp->tcp_lingertime = 0;
- so->so_linger.l_onoff = 0;
- so->so_options &= ~SO_LINGER;
- }
- so->so_linger.l_linger = lgr->l_linger;
- handled = B_TRUE;
- break;
- }
- case SO_SNDTIMEO:
- case SO_RCVTIMEO: {
- struct timeval tl;
- clock_t val;
-
- if (get_udatamodel() == DATAMODEL_NONE ||
- get_udatamodel() == DATAMODEL_NATIVE)
- bcopy(&tl, (struct timeval *)optval,
- sizeof (struct timeval));
- else
- TIMEVAL32_TO_TIMEVAL(&tl,
- (struct timeval32 *)optval);
- val = tl.tv_sec * 1000 * 1000 + tl.tv_usec;
- if (option_name == SO_RCVTIMEO)
- so->so_rcvtimeo = drv_usectohz(val);
- else
- so->so_sndtimeo = drv_usectohz(val);
- break;
- }
-
- case SO_DEBUG:
- tcp->tcp_debug = onoff;
-#ifdef SOCK_TEST
- if (intvalue & 2)
- sock_test_timelimit = 10 * hz;
- else
- sock_test_timelimit = 0;
-
- if (intvalue & 4)
- do_useracc = 0;
- else
- do_useracc = 1;
-#endif /* SOCK_TEST */
- break;
- case SO_DONTROUTE:
- /*
- * SO_DONTROUTE, SO_USELOOPBACK and
- * SO_BROADCAST are only of interest to IP.
- * We track them here only so
- * that we can report their current value.
- */
- tcp->tcp_dontroute = onoff;
- if (onoff)
- so->so_options |= option_name;
- else
- so->so_options &= ~option_name;
- break;
- case SO_USELOOPBACK:
- tcp->tcp_useloopback = onoff;
- if (onoff)
- so->so_options |= option_name;
- else
- so->so_options &= ~option_name;
- break;
- case SO_BROADCAST:
- tcp->tcp_broadcast = onoff;
- if (onoff)
- so->so_options |= option_name;
- else
- so->so_options &= ~option_name;
- break;
- case SO_REUSEADDR:
- tcp->tcp_reuseaddr = onoff;
- if (onoff)
- so->so_options |= option_name;
- else
- so->so_options &= ~option_name;
- break;
- case SO_OOBINLINE:
- tcp->tcp_oobinline = onoff;
- if (onoff)
- so->so_options |= option_name;
- else
- so->so_options &= ~option_name;
- break;
- case SO_DGRAM_ERRIND:
- tcp->tcp_dgram_errind = onoff;
- if (onoff)
- so->so_options |= option_name;
- else
- so->so_options &= ~option_name;
- break;
- }
- break;
- case IPPROTO_TCP:
- switch (option_name) {
- case TCP_NODELAY:
- if (optlen != (t_uscalar_t)sizeof (int32_t)) {
- error = EINVAL;
- eprintsoline(so, error);
- mutex_enter(&so->so_lock);
- goto done2;
- }
- ASSERT(optval);
- tcp->tcp_naglim = intvalue ? 1 : tcp->tcp_mss;
- handled = B_TRUE;
- break;
- }
- break;
- default:
- handled = B_FALSE;
- break;
- }
- }
-
- if (handled) {
- mutex_enter(&so->so_lock);
- goto done2;
- }
-
optmgmt_req.PRIM_type = T_SVR4_OPTMGMT_REQ;
optmgmt_req.MGMT_flags = T_NEGOTIATE;
optmgmt_req.OPT_length = (t_scalar_t)sizeof (oh) + optlen;
diff --git a/usr/src/uts/common/inet/Makefile b/usr/src/uts/common/inet/Makefile
index 052c010..3d45e48 100644
--- a/usr/src/uts/common/inet/Makefile
+++ b/usr/src/uts/common/inet/Makefile
@@ -28,12 +28,12 @@
# include global definitions
include ../../../Makefile.master
-HDRS= arp.h arp_impl.h common.h ipclassifier.h ip.h ip6.h ipdrop.h ipnet.h \
+HDRS= arp.h common.h ipclassifier.h ip.h ip6.h ipdrop.h ipnet.h \
ipsecah.h ipsecesp.h ipsec_info.h iptun.h ip6_asp.h ip_if.h ip_ire.h \
ip_multi.h ip_netinfo.h ip_ndp.h ip_rts.h ipsec_impl.h keysock.h \
led.h mi.h mib2.h nd.h optcom.h sadb.h sctp_itf.h snmpcom.h tcp.h \
tcp_sack.h tcp_stack.h udp_impl.h rawip_impl.h ipp_common.h \
- ip_ftable.h ip_impl.h ip_stack.h tcp_impl.h wifi_ioctl.h \
+ ip_ftable.h ip_impl.h ip_stack.h ip_arp.h tcp_impl.h wifi_ioctl.h \
ip2mac.h ip2mac_impl.h
ROOTDIRS= $(ROOT)/usr/include/inet
diff --git a/usr/src/uts/common/inet/arp.h b/usr/src/uts/common/inet/arp.h
index 4351c91..de0602e 100644
--- a/usr/src/uts/common/inet/arp.h
+++ b/usr/src/uts/common/inet/arp.h
@@ -28,7 +28,6 @@
#define _INET_ARP_H
#include <sys/types.h>
-#include <net/if.h>
#ifdef __cplusplus
extern "C" {
@@ -45,30 +44,7 @@
#define RARP_REQUEST 3
#define RARP_RESPONSE 4
-#define AR_IOCTL (((unsigned)'A' & 0xFF)<<8)
-#define CMD_IN_PROGRESS 0x10000
-
-#define AR_ENTRY_ADD (AR_IOCTL + 1)
-#define AR_ENTRY_DELETE (AR_IOCTL + 2)
-#define AR_ENTRY_QUERY (AR_IOCTL + 3)
-#define AR_ENTRY_SQUERY (AR_IOCTL + 6)
-#define AR_MAPPING_ADD (AR_IOCTL + 7)
-#define AR_CLIENT_NOTIFY (AR_IOCTL + 8)
-#define AR_INTERFACE_UP (AR_IOCTL + 9)
-#define AR_INTERFACE_DOWN (AR_IOCTL + 10)
-#define AR_INTERFACE_ON (AR_IOCTL + 12)
-#define AR_INTERFACE_OFF (AR_IOCTL + 13)
-#define AR_DLPIOP_DONE (AR_IOCTL + 14)
-/*
- * This is not an ARP command per se, it is used to interface between
- * ARP and IP during close.
- */
-#define AR_ARP_CLOSING (AR_IOCTL + 16)
-#define AR_ARP_EXTEND (AR_IOCTL + 17)
-#define AR_IPMP_ACTIVATE (AR_IOCTL + 18)
-#define AR_IPMP_DEACTIVATE (AR_IOCTL + 19)
-
-/* Both ace_flags and area_flags; must also modify arp.c in mdb */
+/* ace_flags; must also modify arp.c in mdb */
#define ACE_F_PERMANENT 0x0001
#define ACE_F_PUBLISH 0x0002
#define ACE_F_DYING 0x0004
@@ -84,123 +60,6 @@
#define ACE_F_DELAYED 0x0800 /* rescheduled on arp_defend_rate */
#define ACE_F_DAD_ABORTED 0x1000 /* DAD was aborted on link down */
-/* ared_flags */
-#define ARED_F_PRESERVE_PERM 0x0001 /* preserve permanent ace */
-
-/* ARP Command Structures */
-
-/* arc_t - Common command overlay */
-typedef struct ar_cmd_s {
- uint32_t arc_cmd;
- uint32_t arc_name_offset;
- uint32_t arc_name_length;
-} arc_t;
-
-/*
- * NOTE: when using area_t for an AR_ENTRY_SQUERY, the area_hw_addr_offset
- * field isn't what you might think. See comments in ip_multi.c where
- * the routine ill_create_squery() is called, and also in the routine
- * itself, to see how this field is used *only* when the area_t holds
- * an AR_ENTRY_SQUERY.
- */
-typedef struct ar_entry_add_s {
- uint32_t area_cmd;
- uint32_t area_name_offset;
- uint32_t area_name_length;
- uint32_t area_proto;
- uint32_t area_proto_addr_offset;
- uint32_t area_proto_addr_length;
- uint32_t area_proto_mask_offset;
- uint32_t area_flags; /* Same values as ace_flags */
- uint32_t area_hw_addr_offset;
- uint32_t area_hw_addr_length;
-} area_t;
-
-typedef struct ar_entry_delete_s {
- uint32_t ared_cmd;
- uint32_t ared_name_offset;
- uint32_t ared_name_length;
- uint32_t ared_proto;
- uint32_t ared_proto_addr_offset;
- uint32_t ared_proto_addr_length;
- uint32_t ared_flags;
-} ared_t;
-
-typedef struct ar_entry_query_s {
- uint32_t areq_cmd;
- uint32_t areq_name_offset;
- uint32_t areq_name_length;
- uint32_t areq_proto;
- uint32_t areq_target_addr_offset;
- uint32_t areq_target_addr_length;
- uint32_t areq_flags;
- uint32_t areq_sender_addr_offset;
- uint32_t areq_sender_addr_length;
- uint32_t areq_xmit_count; /* 0 ==> cache lookup only */
- uint32_t areq_xmit_interval; /* # of milliseconds; 0: default */
- /* # ofquests to buffer; 0: default */
- uint32_t areq_max_buffered;
- uchar_t areq_sap[8]; /* to insert in returned template */
-} areq_t;
-
-#define AR_EQ_DEFAULT_XMIT_COUNT 6
-#define AR_EQ_DEFAULT_XMIT_INTERVAL 1000
-#define AR_EQ_DEFAULT_MAX_BUFFERED 4
-
-/*
- * Structure used with AR_ENTRY_LLAQUERY to map from the link_addr
- * (in Neighbor Discovery option format excluding the option type and
- * length) to a hardware address.
- * The response has the same format as for an AR_ENTRY_SQUERY - an M_CTL with
- * arel_hw_addr updated.
- * An IPv6 address will be passed in AR_ENTRY_LLAQUERY so that atmip
- * can send it in AR_CLIENT_NOTIFY messages.
- */
-typedef struct ar_entry_llaquery_s {
- uint32_t arel_cmd;
- uint32_t arel_name_offset;
- uint32_t arel_name_length;
- uint32_t arel_link_addr_offset;
- uint32_t arel_link_addr_length;
- uint32_t arel_hw_addr_offset;
- uint32_t arel_hw_addr_length;
- uint32_t arel_ip_addr_offset;
- uint32_t arel_ip_addr_length;
-} arel_t;
-
-typedef struct ar_mapping_add_s {
- uint32_t arma_cmd;
- uint32_t arma_name_offset;
- uint32_t arma_name_length;
- uint32_t arma_proto;
- uint32_t arma_proto_addr_offset;
- uint32_t arma_proto_addr_length;
- uint32_t arma_proto_mask_offset;
- uint32_t arma_proto_extract_mask_offset;
- uint32_t arma_flags;
- uint32_t arma_hw_addr_offset;
- uint32_t arma_hw_addr_length;
- /* Offset were we start placing */
- uint32_t arma_hw_mapping_start;
- /* the mask&proto_addr */
-} arma_t;
-
-/* Structure used to notify ARP of changes to IPMP group topology */
-typedef struct ar_ipmp_event_s {
- uint32_t arie_cmd;
- uint32_t arie_name_offset;
- uint32_t arie_name_length;
- char arie_grifname[LIFNAMSIZ];
-} arie_t;
-
-/* Structure used to notify clients of interesting conditions. */
-typedef struct ar_client_notify_s {
- uint32_t arcn_cmd;
- uint32_t arcn_name_offset;
- uint32_t arcn_name_length;
- uint32_t arcn_code; /* Notification code. */
-} arcn_t;
-
/* Client Notification Codes */
#define AR_CN_BOGON 1
#define AR_CN_ANNOUNCE 2
diff --git a/usr/src/uts/common/inet/arp/arp.c b/usr/src/uts/common/inet/arp/arp.c
deleted file mode 100644
index abdbc39..0000000
--- a/usr/src/uts/common/inet/arp/arp.c
+++ /dev/null
@@ -1,4883 +0,0 @@
-/*
- * CDDL HEADER START
- *
- * The contents of this file are subject to the terms of the
- * Common Development and Distribution License (the "License").
- * You may not use this file except in compliance with the License.
- *
- * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
- * or http://www.opensolaris.org/os/licensing.
- * See the License for the specific language governing permissions
- * and limitations under the License.
- *
- * When distributing Covered Code, include this CDDL HEADER in each
- * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
- * If applicable, add the following below this CDDL HEADER, with the
- * fields enclosed by brackets "[]" replaced with your own identifying
- * information: Portions Copyright [yyyy] [name of copyright owner]
- *
- * CDDL HEADER END
- */
-/*
- * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
- * Use is subject to license terms.
- */
-/* Copyright (c) 1990 Mentat Inc. */
-
-/* AR - Address Resolution Protocol */
-
-#include <sys/types.h>
-#include <sys/stream.h>
-#include <sys/stropts.h>
-#include <sys/strsubr.h>
-#include <sys/errno.h>
-#include <sys/strlog.h>
-#include <sys/dlpi.h>
-#include <sys/sockio.h>
-#define _SUN_TPI_VERSION 2
-#include <sys/tihdr.h>
-#include <sys/socket.h>
-#include <sys/ddi.h>
-#include <sys/sunddi.h>
-#include <sys/cmn_err.h>
-#include <sys/sdt.h>
-#include <sys/vtrace.h>
-#include <sys/strsun.h>
-#include <sys/policy.h>
-#include <sys/zone.h>
-#include <sys/ethernet.h>
-#include <sys/zone.h>
-#include <sys/random.h>
-#include <sys/sdt.h>
-#include <sys/hook_event.h>
-
-#include <inet/common.h>
-#include <inet/optcom.h>
-#include <inet/mi.h>
-#include <inet/nd.h>
-#include <inet/snmpcom.h>
-#include <net/if.h>
-#include <inet/arp.h>
-#include <netinet/ip6.h>
-#include <netinet/arp.h>
-#include <inet/ip.h>
-#include <inet/ip_ire.h>
-#include <inet/ip_ndp.h>
-#include <inet/mib2.h>
-#include <inet/arp_impl.h>
-
-/*
- * ARP entry life time and design notes
- * ------------------------------------
- *
- * ARP entries (ACEs) must last at least as long as IP knows about a given
- * MAC-IP translation (i.e., as long as the IRE cache entry exists). It's ok
- * if the ARP entry lasts longer, but not ok if it is removed before the IP
- * entry. The reason for this is that if ARP doesn't have an entry, we will be
- * unable to detect the difference between an ARP broadcast that represents no
- * change (same, known address of sender) and one that represents a change (new
- * address for existing entry). In the former case, we must not notify IP, or
- * we can suffer hurricane attack. In the latter case, we must notify IP, or
- * IP will drift out of sync with the network.
- *
- * Note that IP controls the lifetime of entries, not ARP.
- *
- * We don't attempt to reconfirm aging entries. If the system is no longer
- * talking to a given peer, then it doesn't matter if we have the right mapping
- * for that peer. It would be possible to send queries on aging entries that
- * are active, but this isn't done.
- *
- * IPMP Notes
- * ----------
- *
- * ARP is aware of IPMP. In particular, IP notifies ARP about all "active"
- * (able to transmit data packets) interfaces in a given group via
- * AR_IPMP_ACTIVATE and AR_IPMP_DEACTIVATE messages. These messages, combined
- * with the "IPMP arl_t" that ARP creates over the IPMP DLPI stub driver,
- * enable ARP to track all the arl_t's that are in the same group and thus
- * ensure that ACEs are shared across each group and the arl_t that ARP
- * chooses to transmit on for a given ACE is optimal.
- *
- * ARP relies on IP for hardware address updates. In particular, if the
- * hardware address of an interface changes (DL_NOTE_PHYS_ADDR), then IP will
- * bring the interface down and back up -- and as part of bringing it back
- * up, will send messages to ARP that allow it to update the affected arl's
- * with new hardware addresses.
- *
- * N.B.: One side-effect of this approach is that when an interface fails and
- * then starts to repair, it will temporarily populate the ARP cache with
- * addresses that are owned by it rather than the group's arl_t. To address
- * this, we could add more messages (e.g., AR_IPMP_JOIN and AR_IPMP_LEAVE),
- * but as the issue appears to be only cosmetic (redundant entries in the ARP
- * cache during interace repair), we've kept things simple for now.
- */
-
-/*
- * This is used when scanning for "old" (least recently broadcast) ACEs. We
- * don't want to have to walk the list for every single one, so we gather up
- * batches at a time.
- */
-#define ACE_RESCHED_LIST_LEN 8
-
-typedef struct {
- arl_t *art_arl;
- uint_t art_naces;
- ace_t *art_aces[ACE_RESCHED_LIST_LEN];
-} ace_resched_t;
-
-#define ACE_RESOLVED(ace) ((ace)->ace_flags & ACE_F_RESOLVED)
-#define ACE_NONPERM(ace) \
- (((ace)->ace_flags & (ACE_F_RESOLVED | ACE_F_PERMANENT)) == \
- ACE_F_RESOLVED)
-
-#define AR_DEF_XMIT_INTERVAL 500 /* time in milliseconds */
-#define AR_LL_HDR_SLACK 32 /* Leave the lower layer some room */
-
-#define AR_SNMP_MSG T_OPTMGMT_ACK
-#define AR_DRAINING (void *)0x11
-
-/*
- * The IPv4 Link Local address space is special; we do extra duplicate checking
- * there, as the entire assignment mechanism rests on random numbers.
- */
-#define IS_IPV4_LL_SPACE(ptr) (((uchar_t *)ptr)[0] == 169 && \
- ((uchar_t *)ptr)[1] == 254)
-
-/*
- * Check if the command needs to be enqueued by seeing if there are other
- * commands ahead of us or if some DLPI response is being awaited. Usually
- * there would be an enqueued command in the latter case, however if the
- * stream that originated the command has closed, the close would have
- * cleaned up the enqueued command. AR_DRAINING signifies that the command
- * at the head of the arl_queue has been internally dequeued on completion
- * of the previous command and is being called from ar_dlpi_done.
- */
-#define CMD_NEEDS_QUEUEING(mp, arl) \
- (mp->b_prev != AR_DRAINING && (arl->arl_queue != NULL || \
- arl->arl_dlpi_pending != DL_PRIM_INVAL))
-
-#define ARH_FIXED_LEN 8
-
-/*
- * Macro used when creating ACEs to determine the arl that should own it.
- */
-#define OWNING_ARL(arl) \
- ((arl)->arl_ipmp_arl != NULL ? (arl)->arl_ipmp_arl : arl)
-
-/*
- * MAC-specific intelligence. Shouldn't be needed, but the DL_INFO_ACK
- * doesn't quite do it for us.
- */
-typedef struct ar_m_s {
- t_uscalar_t ar_mac_type;
- uint32_t ar_mac_arp_hw_type;
- t_scalar_t ar_mac_sap_length;
- uint32_t ar_mac_hw_addr_length;
-} ar_m_t;
-
-typedef struct msg2_args {
- mblk_t *m2a_mpdata;
- mblk_t *m2a_mptail;
-} msg2_args_t;
-
-static mblk_t *ar_alloc(uint32_t cmd, int);
-static int ar_ce_create(arl_t *arl, uint32_t proto, uchar_t *hw_addr,
- uint32_t hw_addr_len, uchar_t *proto_addr,
- uint32_t proto_addr_len, uchar_t *proto_mask,
- uchar_t *proto_extract_mask, uint32_t hw_extract_start,
- uchar_t *sender_addr, uint32_t flags);
-static void ar_ce_delete(ace_t *ace);
-static void ar_ce_delete_per_arl(ace_t *ace, void *arg);
-static ace_t **ar_ce_hash(arp_stack_t *as, uint32_t proto,
- const uchar_t *proto_addr, uint32_t proto_addr_length);
-static ace_t *ar_ce_lookup(arl_t *arl, uint32_t proto,
- const uchar_t *proto_addr, uint32_t proto_addr_length);
-static ace_t *ar_ce_lookup_entry(arl_t *arl, uint32_t proto,
- const uchar_t *proto_addr, uint32_t proto_addr_length);
-static ace_t *ar_ce_lookup_from_area(arp_stack_t *as, mblk_t *mp,
- ace_t *matchfn());
-static ace_t *ar_ce_lookup_mapping(arl_t *arl, uint32_t proto,
- const uchar_t *proto_addr, uint32_t proto_addr_length);
-static ace_t *ar_ce_lookup_permanent(arp_stack_t *as, uint32_t proto,
- uchar_t *proto_addr, uint32_t proto_addr_length);
-static boolean_t ar_ce_resolve(ace_t *ace, const uchar_t *hw_addr,
- uint32_t hw_addr_length);
-static void ar_ce_walk(arp_stack_t *as, void (*pfi)(ace_t *, void *),
- void *arg1);
-
-static void ar_client_notify(const arl_t *arl, mblk_t *mp, int code);
-static int ar_close(queue_t *q);
-static int ar_cmd_dispatch(queue_t *q, mblk_t *mp, boolean_t from_wput);
-static void ar_cmd_drain(arl_t *arl);
-static void ar_cmd_done(arl_t *arl);
-static mblk_t *ar_dlpi_comm(t_uscalar_t prim, size_t size);
-static void ar_dlpi_send(arl_t *, mblk_t *);
-static void ar_dlpi_done(arl_t *, t_uscalar_t);
-static int ar_entry_add(queue_t *q, mblk_t *mp);
-static int ar_entry_delete(queue_t *q, mblk_t *mp);
-static int ar_entry_query(queue_t *q, mblk_t *mp);
-static int ar_entry_squery(queue_t *q, mblk_t *mp);
-static int ar_interface_up(queue_t *q, mblk_t *mp);
-static int ar_interface_down(queue_t *q, mblk_t *mp);
-static int ar_interface_on(queue_t *q, mblk_t *mp);
-static int ar_interface_off(queue_t *q, mblk_t *mp);
-static int ar_ipmp_activate(queue_t *q, mblk_t *mp);
-static int ar_ipmp_deactivate(queue_t *q, mblk_t *mp);
-static void ar_ll_cleanup_arl_queue(queue_t *q);
-static void ar_ll_down(arl_t *arl);
-static arl_t *ar_ll_lookup_by_name(arp_stack_t *as, const char *name);
-static arl_t *ar_ll_lookup_from_mp(arp_stack_t *as, mblk_t *mp);
-static void ar_ll_init(arp_stack_t *, ar_t *, mblk_t *mp);
-static void ar_ll_set_defaults(arl_t *, mblk_t *mp);
-static void ar_ll_clear_defaults(arl_t *);
-static int ar_ll_up(arl_t *arl);
-static int ar_mapping_add(queue_t *q, mblk_t *mp);
-static boolean_t ar_mask_all_ones(uchar_t *mask, uint32_t mask_len);
-static ar_m_t *ar_m_lookup(t_uscalar_t mac_type);
-static int ar_nd_ioctl(queue_t *q, mblk_t *mp);
-static int ar_open(queue_t *q, dev_t *devp, int flag, int sflag,
- cred_t *credp);
-static int ar_param_get(queue_t *q, mblk_t *mp, caddr_t cp, cred_t *cr);
-static boolean_t ar_param_register(IDP *ndp, arpparam_t *arppa, int cnt);
-static int ar_param_set(queue_t *q, mblk_t *mp, char *value,
- caddr_t cp, cred_t *cr);
-static void ar_query_delete(ace_t *ace, void *ar);
-static void ar_query_reply(ace_t *ace, int ret_val,
- uchar_t *proto_addr, uint32_t proto_addr_len);
-static clock_t ar_query_xmit(arp_stack_t *as, ace_t *ace);
-static void ar_rput(queue_t *q, mblk_t *mp_orig);
-static void ar_rput_dlpi(queue_t *q, mblk_t *mp);
-static void ar_set_address(ace_t *ace, uchar_t *addrpos,
- uchar_t *proto_addr, uint32_t proto_addr_len);
-static int ar_slifname(queue_t *q, mblk_t *mp);
-static int ar_set_ppa(queue_t *q, mblk_t *mp);
-static int ar_snmp_msg(queue_t *q, mblk_t *mp_orig);
-static void ar_snmp_msg2(ace_t *, void *);
-static void ar_wput(queue_t *q, mblk_t *mp);
-static void ar_wsrv(queue_t *q);
-static void ar_xmit(arl_t *arl, uint32_t operation, uint32_t proto,
- uint32_t plen, const uchar_t *haddr1, const uchar_t *paddr1,
- const uchar_t *haddr2, const uchar_t *paddr2, const uchar_t *dstaddr,
- arp_stack_t *as);
-static void ar_cmd_enqueue(arl_t *arl, mblk_t *mp, queue_t *q,
- ushort_t cmd, boolean_t);
-static mblk_t *ar_cmd_dequeue(arl_t *arl);
-
-static void *arp_stack_init(netstackid_t stackid, netstack_t *ns);
-static void arp_stack_fini(netstackid_t stackid, void *arg);
-static void arp_stack_shutdown(netstackid_t stackid, void *arg);
-
-boolean_t arp_no_defense = B_FALSE;
-
-/*
- * All of these are alterable, within the min/max values given,
- * at run time. arp_publish_interval and arp_publish_count are
- * set by default to 2 seconds and 5 respectively. This is
- * useful during FAILOVER/FAILBACK to make sure that the ARP
- * packets are not lost. It is assumed that this does not affect
- * normal operation.
- */
-static arpparam_t arp_param_arr[] = {
- /* min max value name */
- { 30000, 3600000, 300000, "arp_cleanup_interval"},
- { 1000, 20000, 2000, "arp_publish_interval"},
- { 1, 20, 5, "arp_publish_count"},
- { 0, 20000, 1000, "arp_probe_delay"},
- { 10, 20000, 1500, "arp_probe_interval"},
- { 0, 20, 3, "arp_probe_count"},
- { 0, 20000, 100, "arp_fastprobe_delay"},
- { 10, 20000, 150, "arp_fastprobe_interval"},
- { 0, 20, 3, "arp_fastprobe_count"},
- { 0, 3600000, 300000, "arp_defend_interval"},
- { 0, 20000, 100, "arp_defend_rate"},
- { 0, 3600000, 15000, "arp_broadcast_interval"},
- { 5, 86400, 3600, "arp_defend_period"}
-};
-#define as_cleanup_interval as_param_arr[0].arp_param_value
-#define as_publish_interval as_param_arr[1].arp_param_value
-#define as_publish_count as_param_arr[2].arp_param_value
-#define as_probe_delay as_param_arr[3].arp_param_value
-#define as_probe_interval as_param_arr[4].arp_param_value
-#define as_probe_count as_param_arr[5].arp_param_value
-#define as_fastprobe_delay as_param_arr[6].arp_param_value
-#define as_fastprobe_interval as_param_arr[7].arp_param_value
-#define as_fastprobe_count as_param_arr[8].arp_param_value
-#define as_defend_interval as_param_arr[9].arp_param_value
-#define as_defend_rate as_param_arr[10].arp_param_value
-#define as_broadcast_interval as_param_arr[11].arp_param_value
-#define as_defend_period as_param_arr[12].arp_param_value
-
-static struct module_info arp_mod_info = {
- 0, "arp", 0, INFPSZ, 512, 128
-};
-
-static struct qinit arprinit = {
- (pfi_t)ar_rput, NULL, ar_open, ar_close, NULL, &arp_mod_info
-};
-
-static struct qinit arpwinit = {
- (pfi_t)ar_wput, (pfi_t)ar_wsrv, ar_open, ar_close, NULL, &arp_mod_info
-};
-
-struct streamtab arpinfo = {
- &arprinit, &arpwinit
-};
-
-/*
- * TODO: we need a better mechanism to set the ARP hardware type since
- * the DLPI mac type does not include enough predefined values.
- */
-static ar_m_t ar_m_tbl[] = {
- { DL_CSMACD, ARPHRD_ETHER, -2, 6}, /* 802.3 */
- { DL_TPB, ARPHRD_IEEE802, -2, 6}, /* 802.4 */
- { DL_TPR, ARPHRD_IEEE802, -2, 6}, /* 802.5 */
- { DL_METRO, ARPHRD_IEEE802, -2, 6}, /* 802.6 */
- { DL_ETHER, ARPHRD_ETHER, -2, 6}, /* Ethernet */
- { DL_FDDI, ARPHRD_ETHER, -2, 6}, /* FDDI */
- { DL_IB, ARPHRD_IB, -2, 20}, /* Infiniband */
- { DL_OTHER, ARPHRD_ETHER, -2, 6}, /* unknown */
-};
-
-/*
- * Note that all routines which need to queue the message for later
- * processing have to be ioctl_aware to be able to queue the complete message.
- * Following are command entry flags in arct_flags
- */
-#define ARF_IOCTL_AWARE 0x1 /* Arp command can come down as M_IOCTL */
-#define ARF_ONLY_CMD 0x2 /* Command is exclusive to ARP */
-#define ARF_WPUT_OK 0x4 /* Command is allowed from ar_wput */
-
-/* ARP Cmd Table entry */
-typedef struct arct_s {
- int (*arct_pfi)(queue_t *, mblk_t *);
- uint32_t arct_cmd;
- int arct_min_len;
- uint32_t arct_flags;
- int arct_priv_req; /* Privilege required for this cmd */
- const char *arct_txt;
-} arct_t;
-
-/*
- * AR_ENTRY_ADD, QUERY and SQUERY are used by sdp, hence they need to
- * have ARF_WPUT_OK set.
- */
-static arct_t ar_cmd_tbl[] = {
- { ar_entry_add, AR_ENTRY_ADD, sizeof (area_t),
- ARF_IOCTL_AWARE | ARF_ONLY_CMD | ARF_WPUT_OK, OP_CONFIG,
- "AR_ENTRY_ADD" },
- { ar_entry_delete, AR_ENTRY_DELETE, sizeof (ared_t),
- ARF_IOCTL_AWARE | ARF_ONLY_CMD, OP_CONFIG, "AR_ENTRY_DELETE" },
- { ar_entry_query, AR_ENTRY_QUERY, sizeof (areq_t),
- ARF_IOCTL_AWARE | ARF_ONLY_CMD | ARF_WPUT_OK, OP_NP,
- "AR_ENTRY_QUERY" },
- { ar_entry_squery, AR_ENTRY_SQUERY, sizeof (area_t),
- ARF_IOCTL_AWARE | ARF_ONLY_CMD | ARF_WPUT_OK, OP_NP,
- "AR_ENTRY_SQUERY" },
- { ar_mapping_add, AR_MAPPING_ADD, sizeof (arma_t),
- ARF_IOCTL_AWARE | ARF_ONLY_CMD, OP_CONFIG, "AR_MAPPING_ADD" },
- { ar_interface_up, AR_INTERFACE_UP, sizeof (arc_t),
- ARF_ONLY_CMD, OP_CONFIG, "AR_INTERFACE_UP" },
- { ar_interface_down, AR_INTERFACE_DOWN, sizeof (arc_t),
- ARF_ONLY_CMD, OP_CONFIG, "AR_INTERFACE_DOWN" },
- { ar_interface_on, AR_INTERFACE_ON, sizeof (arc_t),
- ARF_ONLY_CMD, OP_CONFIG, "AR_INTERFACE_ON" },
- { ar_interface_off, AR_INTERFACE_OFF, sizeof (arc_t),
- ARF_ONLY_CMD, OP_CONFIG, "AR_INTERFACE_OFF" },
- { ar_ipmp_activate, AR_IPMP_ACTIVATE, sizeof (arie_t),
- ARF_ONLY_CMD, OP_CONFIG, "AR_IPMP_ACTIVATE" },
- { ar_ipmp_deactivate, AR_IPMP_DEACTIVATE, sizeof (arie_t),
- ARF_ONLY_CMD, OP_CONFIG, "AR_IPMP_DEACTIVATE" },
- { ar_set_ppa, (uint32_t)IF_UNITSEL, sizeof (int),
- ARF_IOCTL_AWARE | ARF_WPUT_OK, OP_CONFIG, "IF_UNITSEL" },
- { ar_nd_ioctl, ND_GET, 1,
- ARF_IOCTL_AWARE | ARF_WPUT_OK, OP_NP, "ND_GET" },
- { ar_nd_ioctl, ND_SET, 1,
- ARF_IOCTL_AWARE | ARF_WPUT_OK, OP_CONFIG, "ND_SET" },
- { ar_snmp_msg, AR_SNMP_MSG, sizeof (struct T_optmgmt_ack),
- ARF_IOCTL_AWARE | ARF_WPUT_OK | ARF_ONLY_CMD, OP_NP,
- "AR_SNMP_MSG" },
- { ar_slifname, (uint32_t)SIOCSLIFNAME, sizeof (struct lifreq),
- ARF_IOCTL_AWARE | ARF_WPUT_OK, OP_CONFIG, "SIOCSLIFNAME" }
-};
-
-/*
- * Lookup and return an arl appropriate for sending packets with either source
- * hardware address `hw_addr' or source protocol address `ip_addr', in that
- * order. If neither was specified or neither match, return any arl in the
- * same group as `arl'.
- */
-static arl_t *
-ar_ipmp_lookup_xmit_arl(arl_t *arl, uchar_t *hw_addr, uint_t hw_addrlen,
- uchar_t *ip_addr)
-{
- arlphy_t *ap;
- ace_t *src_ace;
- arl_t *xmit_arl = NULL;
- arp_stack_t *as = ARL_TO_ARPSTACK(arl);
-
- ASSERT(arl->arl_flags & ARL_F_IPMP);
-
- if (hw_addr != NULL && hw_addrlen != 0) {
- xmit_arl = as->as_arl_head;
- for (; xmit_arl != NULL; xmit_arl = xmit_arl->arl_next) {
- /*
- * There may be arls with the same HW address that are
- * not in our IPMP group; we don't want those.
- */
- if (xmit_arl->arl_ipmp_arl != arl)
- continue;
-
- ap = xmit_arl->arl_phy;
- if (ap != NULL && ap->ap_hw_addrlen == hw_addrlen &&
- bcmp(ap->ap_hw_addr, hw_addr, hw_addrlen) == 0)
- break;
- }
-
- DTRACE_PROBE4(xmit_arl_hwsrc, arl_t *, arl, arl_t *,
- xmit_arl, uchar_t *, hw_addr, uint_t, hw_addrlen);
- }
-
- if (xmit_arl == NULL && ip_addr != NULL) {
- src_ace = ar_ce_lookup_permanent(as, IP_ARP_PROTO_TYPE, ip_addr,
- IP_ADDR_LEN);
- if (src_ace != NULL)
- xmit_arl = src_ace->ace_xmit_arl;
-
- DTRACE_PROBE4(xmit_arl_ipsrc, arl_t *, arl, arl_t *,
- xmit_arl, uchar_t *, ip_addr, uint_t, IP_ADDR_LEN);
- }
-
- if (xmit_arl == NULL) {
- xmit_arl = as->as_arl_head;
- for (; xmit_arl != NULL; xmit_arl = xmit_arl->arl_next)
- if (xmit_arl->arl_ipmp_arl == arl && xmit_arl != arl)
- break;
-
- DTRACE_PROBE2(xmit_arl_any, arl_t *, arl, arl_t *, xmit_arl);
- }
-
- return (xmit_arl);
-}
-
-/*
- * ARP Cache Entry creation routine.
- * Cache entries are allocated within timer messages and inserted into
- * the global hash list based on protocol and protocol address.
- */
-static int
-ar_ce_create(arl_t *arl, uint_t proto, uchar_t *hw_addr, uint_t hw_addr_len,
- uchar_t *proto_addr, uint_t proto_addr_len, uchar_t *proto_mask,
- uchar_t *proto_extract_mask, uint_t hw_extract_start, uchar_t *sender_addr,
- uint_t flags)
-{
- static ace_t ace_null;
- ace_t *ace;
- ace_t **acep;
- uchar_t *dst;
- mblk_t *mp;
- arp_stack_t *as = ARL_TO_ARPSTACK(arl);
- arl_t *xmit_arl;
- arlphy_t *ap;
-
- if ((flags & ~ACE_EXTERNAL_FLAGS_MASK) || arl == NULL)
- return (EINVAL);
-
- if (proto_addr == NULL || proto_addr_len == 0 ||
- (proto == IP_ARP_PROTO_TYPE && proto_addr_len != IP_ADDR_LEN))
- return (EINVAL);
-
- if (flags & ACE_F_MYADDR)
- flags |= ACE_F_PUBLISH | ACE_F_AUTHORITY;
-
- /*
- * Latch a transmit arl for this ace.
- */
- if (arl->arl_flags & ARL_F_IPMP) {
- ASSERT(proto == IP_ARP_PROTO_TYPE);
- xmit_arl = ar_ipmp_lookup_xmit_arl(arl, hw_addr, hw_addr_len,
- sender_addr);
- } else {
- xmit_arl = arl;
- }
-
- if (xmit_arl == NULL || xmit_arl->arl_phy == NULL)
- return (EINVAL);
-
- ap = xmit_arl->arl_phy;
-
- if (!hw_addr && hw_addr_len == 0) {
- if (flags == ACE_F_PERMANENT) { /* Not publish */
- /* 224.0.0.0 to zero length address */
- flags |= ACE_F_RESOLVED;
- } else { /* local address and unresolved case */
- hw_addr = ap->ap_hw_addr;
- hw_addr_len = ap->ap_hw_addrlen;
- if (flags & ACE_F_PUBLISH)
- flags |= ACE_F_RESOLVED;
- }
- } else {
- flags |= ACE_F_RESOLVED;
- }
-
- /* Handle hw_addr_len == 0 for DL_ENABMULTI_REQ etc. */
- if (hw_addr_len != 0 && hw_addr == NULL)
- return (EINVAL);
- if (hw_addr_len < ap->ap_hw_addrlen && hw_addr_len != 0)
- return (EINVAL);
- if (!proto_extract_mask && (flags & ACE_F_MAPPING))
- return (EINVAL);
-
- /*
- * If the underlying link doesn't have reliable up/down notification or
- * if we're working with the IPv4 169.254.0.0/16 Link Local Address
- * space, then don't use the fast timers. Otherwise, use them.
- */
- if (ap->ap_notifies &&
- !(proto == IP_ARP_PROTO_TYPE && IS_IPV4_LL_SPACE(proto_addr))) {
- flags |= ACE_F_FAST;
- }
-
- /*
- * Allocate the timer block to hold the ace.
- * (ace + proto_addr + proto_addr_mask + proto_extract_mask + hw_addr)
- */
- mp = mi_timer_alloc(sizeof (ace_t) + proto_addr_len + proto_addr_len +
- proto_addr_len + hw_addr_len);
- if (!mp)
- return (ENOMEM);
- ace = (ace_t *)mp->b_rptr;
- *ace = ace_null;
- ace->ace_proto = proto;
- ace->ace_mp = mp;
- ace->ace_arl = arl;
- ace->ace_xmit_arl = xmit_arl;
-
- dst = (uchar_t *)&ace[1];
-
- ace->ace_proto_addr = dst;
- ace->ace_proto_addr_length = proto_addr_len;
- bcopy(proto_addr, dst, proto_addr_len);
- dst += proto_addr_len;
- /*
- * The proto_mask allows us to add entries which will let us respond
- * to requests for a group of addresses. This makes it easy to provide
- * proxy ARP service for machines that don't understand about the local
- * subnet structure, if, for example, there are BSD4.2 systems lurking.
- */
- ace->ace_proto_mask = dst;
- if (proto_mask != NULL) {
- bcopy(proto_mask, dst, proto_addr_len);
- dst += proto_addr_len;
- } else {
- while (proto_addr_len-- > 0)
- *dst++ = (uchar_t)~0;
- }
-
- if (proto_extract_mask != NULL) {
- ace->ace_proto_extract_mask = dst;
- bcopy(proto_extract_mask, dst, ace->ace_proto_addr_length);
- dst += ace->ace_proto_addr_length;
- } else {
- ace->ace_proto_extract_mask = NULL;
- }
- ace->ace_hw_extract_start = hw_extract_start;
- ace->ace_hw_addr_length = hw_addr_len;
- ace->ace_hw_addr = dst;
- if (hw_addr != NULL) {
- bcopy(hw_addr, dst, hw_addr_len);
- dst += hw_addr_len;
- }
-
- ace->ace_flags = flags;
- if (ar_mask_all_ones(ace->ace_proto_mask,
- ace->ace_proto_addr_length)) {
- acep = ar_ce_hash(as, ace->ace_proto, ace->ace_proto_addr,
- ace->ace_proto_addr_length);
- } else {
- acep = &as->as_ce_mask_entries;
- }
- if ((ace->ace_next = *acep) != NULL)
- ace->ace_next->ace_ptpn = &ace->ace_next;
- *acep = ace;
- ace->ace_ptpn = acep;
- return (0);
-}
-
-/* Delete a cache entry. */
-static void
-ar_ce_delete(ace_t *ace)
-{
- ace_t **acep;
-
- /* Get out of the hash list. */
- acep = ace->ace_ptpn;
- if (ace->ace_next)
- ace->ace_next->ace_ptpn = acep;
- acep[0] = ace->ace_next;
- /* Mark it dying in case we have a timer about to fire. */
- ace->ace_flags |= ACE_F_DYING;
- /* Complete any outstanding queries immediately. */
- ar_query_reply(ace, ENXIO, NULL, (uint32_t)0);
- /* Free the timer, immediately, or when it fires. */
- mi_timer_free(ace->ace_mp);
-}
-
-/*
- * ar_ce_walk routine. Delete the ace if it is associated with the arl
- * that is going away.
- */
-static void
-ar_ce_delete_per_arl(ace_t *ace, void *arl)
-{
- if (ace->ace_arl == arl || ace->ace_xmit_arl == arl) {
- ace->ace_flags &= ~ACE_F_PERMANENT;
- ar_ce_delete(ace);
- }
-}
-
-/*
- * ar_ce_walk routine used when deactivating an `arl' in a group. Deletes
- * `ace' if it was using `arl_arg' as its output interface.
- */
-static void
-ar_ce_ipmp_deactivate(ace_t *ace, void *arl_arg)
-{
- arl_t *arl = arl_arg;
-
- ASSERT(!(arl->arl_flags & ARL_F_IPMP));
-
- if (ace->ace_arl == arl) {
- ASSERT(ace->ace_xmit_arl == arl);
- /*
- * This ACE is tied to the arl leaving the group (e.g., an
- * ACE_F_PERMANENT for a test address) and is not used by the
- * group, so we can leave it be.
- */
- return;
- }
-
- if (ace->ace_xmit_arl != arl)
- return;
-
- ASSERT(ace->ace_arl == arl->arl_ipmp_arl);
-
- /*
- * IP should've already sent us messages asking us to move any
- * ACE_F_MYADDR entries to another arl, but there are two exceptions:
- *
- * 1. The group was misconfigured with interfaces that have duplicate
- * hardware addresses, but in.mpathd was unable to offline those
- * duplicate interfaces.
- *
- * 2. The messages from IP were lost or never created (e.g. due to
- * memory pressure).
- *
- * We handle the first case by just quietly deleting the ACE. Since
- * the second case cannot be distinguished from a more serious bug in
- * the IPMP framework, we ASSERT() that this can't happen on DEBUG
- * systems, but quietly delete the ACE on production systems (the
- * deleted ACE will render the IP address unreachable).
- */
- if (ace->ace_flags & ACE_F_MYADDR) {
- arlphy_t *ap = arl->arl_phy;
- uint_t hw_addrlen = ap->ap_hw_addrlen;
-
- ASSERT(hw_addrlen == ace->ace_hw_addr_length &&
- bcmp(ap->ap_hw_addr, ace->ace_hw_addr, hw_addrlen) == 0);
- }
-
- /*
- * NOTE: it's possible this arl got selected as the ace_xmit_arl when
- * creating an ACE_F_PERMANENT ACE on behalf of an SIOCS*ARP ioctl for
- * an IPMP IP interface. But it's still OK for us to delete such an
- * ACE since ipmp_illgrp_refresh_arpent() will ask us to recreate it
- * and we'll pick another arl then.
- */
- ar_ce_delete(ace);
-}
-
-/* Cache entry hash routine, based on protocol and protocol address. */
-static ace_t **
-ar_ce_hash(arp_stack_t *as, uint32_t proto, const uchar_t *proto_addr,
- uint32_t proto_addr_length)
-{
- const uchar_t *up = proto_addr;
- unsigned int hval = proto;
- int len = proto_addr_length;
-
- while (--len >= 0)
- hval ^= *up++;
- return (&as->as_ce_hash_tbl[hval % ARP_HASH_SIZE]);
-}
-
-/* Cache entry lookup. Try to find an ace matching the parameters passed. */
-ace_t *
-ar_ce_lookup(arl_t *arl, uint32_t proto, const uchar_t *proto_addr,
- uint32_t proto_addr_length)
-{
- ace_t *ace;
-
- ace = ar_ce_lookup_entry(arl, proto, proto_addr, proto_addr_length);
- if (!ace)
- ace = ar_ce_lookup_mapping(arl, proto, proto_addr,
- proto_addr_length);
- return (ace);
-}
-
-/*
- * Cache entry lookup. Try to find an ace matching the parameters passed.
- * Look only for exact entries (no mappings)
- */
-static ace_t *
-ar_ce_lookup_entry(arl_t *arl, uint32_t proto, const uchar_t *proto_addr,
- uint32_t proto_addr_length)
-{
- ace_t *ace;
- arp_stack_t *as = ARL_TO_ARPSTACK(arl);
-
- if (!proto_addr)
- return (NULL);
- ace = *ar_ce_hash(as, proto, proto_addr, proto_addr_length);
- for (; ace; ace = ace->ace_next) {
- if ((ace->ace_arl == arl ||
- ace->ace_arl == arl->arl_ipmp_arl) &&
- ace->ace_proto_addr_length == proto_addr_length &&
- ace->ace_proto == proto) {
- int i1 = proto_addr_length;
- uchar_t *ace_addr = ace->ace_proto_addr;
- uchar_t *mask = ace->ace_proto_mask;
- /*
- * Note that the ace_proto_mask is applied to the
- * proto_addr before comparing to the ace_addr.
- */
- do {
- if (--i1 < 0)
- return (ace);
- } while ((proto_addr[i1] & mask[i1]) == ace_addr[i1]);
- }
- }
- return (ace);
-}
-
-/*
- * Extract cache entry lookup parameters from an external command message, then
- * call the supplied match function.
- */
-static ace_t *
-ar_ce_lookup_from_area(arp_stack_t *as, mblk_t *mp, ace_t *matchfn())
-{
- uchar_t *proto_addr;
- area_t *area = (area_t *)mp->b_rptr;
-
- proto_addr = mi_offset_paramc(mp, area->area_proto_addr_offset,
- area->area_proto_addr_length);
- if (!proto_addr)
- return (NULL);
- return ((*matchfn)(ar_ll_lookup_from_mp(as, mp), area->area_proto,
- proto_addr, area->area_proto_addr_length));
-}
-
-/*
- * Cache entry lookup. Try to find an ace matching the parameters passed.
- * Look only for mappings.
- */
-static ace_t *
-ar_ce_lookup_mapping(arl_t *arl, uint32_t proto, const uchar_t *proto_addr,
- uint32_t proto_addr_length)
-{
- ace_t *ace;
- arp_stack_t *as = ARL_TO_ARPSTACK(arl);
-
- if (!proto_addr)
- return (NULL);
- ace = as->as_ce_mask_entries;
- for (; ace; ace = ace->ace_next) {
- if (ace->ace_arl == arl &&
- ace->ace_proto_addr_length == proto_addr_length &&
- ace->ace_proto == proto) {
- int i1 = proto_addr_length;
- uchar_t *ace_addr = ace->ace_proto_addr;
- uchar_t *mask = ace->ace_proto_mask;
- /*
- * Note that the ace_proto_mask is applied to the
- * proto_addr before comparing to the ace_addr.
- */
- do {
- if (--i1 < 0)
- return (ace);
- } while ((proto_addr[i1] & mask[i1]) == ace_addr[i1]);
- }
- }
- return (ace);
-}
-
-/*
- * Look for a permanent entry for proto_addr across all interfaces.
- */
-static ace_t *
-ar_ce_lookup_permanent(arp_stack_t *as, uint32_t proto, uchar_t *proto_addr,
- uint32_t proto_addr_length)
-{
- ace_t *ace;
-
- ace = *ar_ce_hash(as, proto, proto_addr, proto_addr_length);
- for (; ace != NULL; ace = ace->ace_next) {
- if (!(ace->ace_flags & ACE_F_PERMANENT))
- continue;
- if (ace->ace_proto_addr_length == proto_addr_length &&
- ace->ace_proto == proto) {
- int i1 = proto_addr_length;
- uchar_t *ace_addr = ace->ace_proto_addr;
- uchar_t *mask = ace->ace_proto_mask;
-
- /*
- * Note that the ace_proto_mask is applied to the
- * proto_addr before comparing to the ace_addr.
- */
- do {
- if (--i1 < 0)
- return (ace);
- } while ((proto_addr[i1] & mask[i1]) == ace_addr[i1]);
- }
- }
- return (ace);
-}
-
-/*
- * ar_ce_resolve is called when a response comes in to an outstanding request.
- * Returns 'true' if the address has changed and we need to tell the client.
- * (We don't need to tell the client if there's still an outstanding query.)
- */
-static boolean_t
-ar_ce_resolve(ace_t *ace, const uchar_t *hw_addr, uint32_t hw_addr_length)
-{
- boolean_t hwchanged;
-
- if (hw_addr_length == ace->ace_hw_addr_length) {
- ASSERT(ace->ace_hw_addr != NULL);
- hwchanged = bcmp(hw_addr, ace->ace_hw_addr,
- hw_addr_length) != 0;
- if (hwchanged)
- bcopy(hw_addr, ace->ace_hw_addr, hw_addr_length);
- /*
- * No need to bother with ar_query_reply if no queries are
- * waiting.
- */
- ace->ace_flags |= ACE_F_RESOLVED;
- if (ace->ace_query_mp != NULL)
- ar_query_reply(ace, 0, NULL, (uint32_t)0);
- if (hwchanged)
- return (B_TRUE);
- }
- return (B_FALSE);
-}
-
-/*
- * This function performs two tasks:
- * 1. Resolution of unresolved entries and update of resolved entries.
- * 2. Detection of nodes with our own IP address (duplicates).
- *
- * If the resolving ARL is in the same group as a matching ACE's ARL, then
- * update the ACE. Otherwise, make no updates.
- *
- * For all entries, we first check to see if this is a duplicate (probable
- * loopback) message. If so, then just ignore it.
- *
- * Next, check to see if the entry has completed DAD. If not, then we've
- * failed, because someone is already using the address. Notify IP of the DAD
- * failure and remove the broken ace.
- *
- * Next, we check if we're the authority for this address. If so, then it's
- * time to defend it, because the other node is a duplicate. Report it as a
- * 'bogon' and let IP decide how to defend.
- *
- * Finally, if it's unresolved or if the arls match, we just update the MAC
- * address. This allows a published 'static' entry to be updated by an ARP
- * request from the node for which we're a proxy ARP server.
- *
- * Note that this logic does not update published ARP entries for mismatched
- * arls, as for example when we proxy arp across 2 subnets with differing
- * subnet masks.
- *
- * Return Values below
- */
-
-#define AR_NOTFOUND 1 /* No matching ace found in cache */
-#define AR_MERGED 2 /* Matching ace updated (RFC 826 Merge_flag) */
-#define AR_LOOPBACK 3 /* Our own arp packet was received */
-#define AR_BOGON 4 /* Another host has our IP addr. */
-#define AR_FAILED 5 /* Duplicate Address Detection has failed */
-#define AR_CHANGED 6 /* Address has changed; tell IP (and merged) */
-
-static int
-ar_ce_resolve_all(arl_t *arl, uint32_t proto, const uchar_t *src_haddr,
- uint32_t hlen, const uchar_t *src_paddr, uint32_t plen, arl_t **ace_arlp)
-{
- ace_t *ace;
- ace_t *ace_next;
- int i1;
- const uchar_t *paddr;
- uchar_t *ace_addr;
- uchar_t *mask;
- int retv = AR_NOTFOUND;
- arp_stack_t *as = ARL_TO_ARPSTACK(arl);
-
- ace = *ar_ce_hash(as, proto, src_paddr, plen);
- for (; ace != NULL; ace = ace_next) {
-
- /* ar_ce_resolve may delete the ace; fetch next pointer now */
- ace_next = ace->ace_next;
-
- if (ace->ace_proto_addr_length != plen ||
- ace->ace_proto != proto) {
- continue;
- }
-
- /*
- * Note that the ace_proto_mask is applied to the proto_addr
- * before comparing to the ace_addr.
- */
- paddr = src_paddr;
- i1 = plen;
- ace_addr = ace->ace_proto_addr;
- mask = ace->ace_proto_mask;
- while (--i1 >= 0) {
- if ((*paddr++ & *mask++) != *ace_addr++)
- break;
- }
- if (i1 >= 0)
- continue;
-
- *ace_arlp = ace->ace_arl;
-
- /*
- * If the IP address is ours, and the hardware address matches
- * one of our own arls, then this is a broadcast packet
- * emitted by one of our interfaces, reflected by the switch
- * and received on another interface. We return AR_LOOPBACK.
- */
- if (ace->ace_flags & ACE_F_MYADDR) {
- arl_t *hw_arl = as->as_arl_head;
- arlphy_t *ap;
-
- for (; hw_arl != NULL; hw_arl = hw_arl->arl_next) {
- ap = hw_arl->arl_phy;
- if (ap != NULL && ap->ap_hw_addrlen == hlen &&
- bcmp(ap->ap_hw_addr, src_haddr, hlen) == 0)
- return (AR_LOOPBACK);
- }
- }
-
- /*
- * If the entry is unverified, then we've just verified that
- * someone else already owns this address, because this is a
- * message with the same protocol address but different
- * hardware address. NOTE: the ace_xmit_arl check ensures we
- * don't send duplicate AR_FAILEDs if arl is in an IPMP group.
- */
- if ((ace->ace_flags & ACE_F_UNVERIFIED) &&
- arl == ace->ace_xmit_arl) {
- ar_ce_delete(ace);
- return (AR_FAILED);
- }
-
- /*
- * If the IP address matches ours and we're authoritative for
- * this entry, then some other node is using our IP addr, so
- * return AR_BOGON. Also reset the transmit count to zero so
- * that, if we're currently in initial announcement mode, we
- * switch back to the lazier defense mode. Knowing that
- * there's at least one duplicate out there, we ought not
- * blindly announce. NOTE: the ace_xmit_arl check ensures we
- * don't send duplicate AR_BOGONs if arl is in an IPMP group.
- */
- if ((ace->ace_flags & ACE_F_AUTHORITY) &&
- arl == ace->ace_xmit_arl) {
- ace->ace_xmit_count = 0;
- return (AR_BOGON);
- }
-
- /*
- * Only update this ACE if it's on the same network -- i.e.,
- * it's for our ARL or another ARL in the same IPMP group.
- */
- if (ace->ace_arl == arl || ace->ace_arl == arl->arl_ipmp_arl) {
- if (ar_ce_resolve(ace, src_haddr, hlen))
- retv = AR_CHANGED;
- else if (retv == AR_NOTFOUND)
- retv = AR_MERGED;
- }
- }
-
- if (retv == AR_NOTFOUND)
- *ace_arlp = NULL;
- return (retv);
-}
-
-/* Pass arg1 to the pfi supplied, along with each ace in existence. */
-static void
-ar_ce_walk(arp_stack_t *as, void (*pfi)(ace_t *, void *), void *arg1)
-{
- ace_t *ace;
- ace_t *ace1;
- int i;
-
- for (i = 0; i < ARP_HASH_SIZE; i++) {
- /*
- * We walk the hash chain in a way that allows the current
- * ace to get blown off by the called routine.
- */
- for (ace = as->as_ce_hash_tbl[i]; ace; ace = ace1) {
- ace1 = ace->ace_next;
- (*pfi)(ace, arg1);
- }
- }
- for (ace = as->as_ce_mask_entries; ace; ace = ace1) {
- ace1 = ace->ace_next;
- (*pfi)(ace, arg1);
- }
-}
-
-/*
- * Send a copy of interesting packets to the corresponding IP instance.
- * The corresponding IP instance is the ARP-IP-DEV instance for this
- * DEV (i.e. ARL).
- */
-static void
-ar_client_notify(const arl_t *arl, mblk_t *mp, int code)
-{
- ar_t *ar = ((ar_t *)arl->arl_rq->q_ptr)->ar_arl_ip_assoc;
- arcn_t *arcn;
- mblk_t *mp1;
- int arl_namelen = strlen(arl->arl_name) + 1;
-
- /* Looks like the association disappeared */
- if (ar == NULL) {
- freemsg(mp);
- return;
- }
-
- /* ar is the corresponding ARP-IP instance for this ARL */
- ASSERT(ar->ar_arl == NULL && ar->ar_wq->q_next != NULL);
-
- mp1 = allocb(sizeof (arcn_t) + arl_namelen, BPRI_MED);
- if (mp1 == NULL) {
- freemsg(mp);
- return;
- }
- DB_TYPE(mp1) = M_CTL;
- mp1->b_cont = mp;
- arcn = (arcn_t *)mp1->b_rptr;
- mp1->b_wptr = (uchar_t *)&arcn[1] + arl_namelen;
- arcn->arcn_cmd = AR_CLIENT_NOTIFY;
- arcn->arcn_name_offset = sizeof (arcn_t);
- arcn->arcn_name_length = arl_namelen;
- arcn->arcn_code = code;
- bcopy(arl->arl_name, &arcn[1], arl_namelen);
-
- putnext(ar->ar_wq, mp1);
-}
-
-/*
- * Send a delete-notify message down to IP. We've determined that IP doesn't
- * have a cache entry for the IP address itself, but it may have other cache
- * entries with the same hardware address, and we don't want to see those grow
- * stale. (The alternative is sending down updates for every ARP message we
- * get that doesn't match an existing ace. That's much more expensive than an
- * occasional delete and reload.)
- */
-static void
-ar_delete_notify(const ace_t *ace)
-{
- const arl_t *arl = ace->ace_arl;
- const arlphy_t *ap = ace->ace_xmit_arl->arl_phy;
- mblk_t *mp;
- size_t len;
- arh_t *arh;
-
- len = sizeof (*arh) + 2 * ace->ace_proto_addr_length;
- mp = allocb(len, BPRI_MED);
- if (mp == NULL)
- return;
- arh = (arh_t *)mp->b_rptr;
- mp->b_wptr = (uchar_t *)arh + len;
- U16_TO_BE16(ap->ap_arp_hw_type, arh->arh_hardware);
- U16_TO_BE16(ace->ace_proto, arh->arh_proto);
- arh->arh_hlen = 0;
- arh->arh_plen = ace->ace_proto_addr_length;
- U16_TO_BE16(ARP_RESPONSE, arh->arh_operation);
- bcopy(ace->ace_proto_addr, arh + 1, ace->ace_proto_addr_length);
- bcopy(ace->ace_proto_addr, (uchar_t *)(arh + 1) +
- ace->ace_proto_addr_length, ace->ace_proto_addr_length);
- ar_client_notify(arl, mp, AR_CN_ANNOUNCE);
-}
-
-/* ARP module close routine. */
-static int
-ar_close(queue_t *q)
-{
- ar_t *ar = (ar_t *)q->q_ptr;
- char name[LIFNAMSIZ];
- arl_t *arl, *xarl;
- arl_t **arlp;
- cred_t *cr;
- arc_t *arc;
- mblk_t *mp1;
- int index;
- arp_stack_t *as = ar->ar_as;
-
- TRACE_1(TR_FAC_ARP, TR_ARP_CLOSE,
- "arp_close: q %p", q);
-
- arl = ar->ar_arl;
- if (arl == NULL) {
- index = 0;
- /*
- * If this is the <ARP-IP-Driver> stream send down
- * a closing message to IP and wait for IP to send
- * an ack. This helps to make sure that messages
- * that are currently being sent up by IP are not lost.
- */
- if (ar->ar_on_ill_stream) {
- mp1 = allocb(sizeof (arc_t), BPRI_MED);
- if (mp1 != NULL) {
- DB_TYPE(mp1) = M_CTL;
- arc = (arc_t *)mp1->b_rptr;
- mp1->b_wptr = mp1->b_rptr + sizeof (arc_t);
- arc->arc_cmd = AR_ARP_CLOSING;
- putnext(WR(q), mp1);
- while (!ar->ar_ip_acked_close)
- /* If we are interrupted break out */
- if (qwait_sig(q) == 0)
- break;
- }
- }
- /* Delete all our pending queries, 'arl' is not dereferenced */
- ar_ce_walk(as, ar_query_delete, ar);
- /*
- * The request could be pending on some arl_queue also. This
- * happens if the arl is not yet bound, and bind is pending.
- */
- ar_ll_cleanup_arl_queue(q);
- } else {
- index = arl->arl_index;
- (void) strcpy(name, arl->arl_name);
- arl->arl_closing = 1;
- while (arl->arl_queue != NULL)
- qwait(arl->arl_rq);
-
- if (arl->arl_state == ARL_S_UP)
- ar_ll_down(arl);
-
- while (arl->arl_state != ARL_S_DOWN)
- qwait(arl->arl_rq);
-
- if (arl->arl_flags & ARL_F_IPMP) {
- /*
- * Though rude, someone could force the IPMP arl
- * closed without removing the underlying interfaces.
- * In that case, force the ARLs out of the group.
- */
- xarl = as->as_arl_head;
- for (; xarl != NULL; xarl = xarl->arl_next) {
- if (xarl->arl_ipmp_arl != arl || xarl == arl)
- continue;
- ar_ce_walk(as, ar_ce_ipmp_deactivate, xarl);
- xarl->arl_ipmp_arl = NULL;
- }
- }
-
- ar_ll_clear_defaults(arl);
- /*
- * If this is the control stream for an arl, delete anything
- * hanging off our arl.
- */
- ar_ce_walk(as, ar_ce_delete_per_arl, arl);
- /* Free any messages waiting for a bind_ack */
- /* Get the arl out of the chain. */
- rw_enter(&as->as_arl_lock, RW_WRITER);
- for (arlp = &as->as_arl_head; *arlp;
- arlp = &(*arlp)->arl_next) {
- if (*arlp == arl) {
- *arlp = arl->arl_next;
- break;
- }
- }
-
- ASSERT(arl->arl_dlpi_deferred == NULL);
- ar->ar_arl = NULL;
- rw_exit(&as->as_arl_lock);
-
- mi_free((char *)arl);
- }
- /* Let's break the association between an ARL and IP instance */
- if (ar->ar_arl_ip_assoc != NULL) {
- ASSERT(ar->ar_arl_ip_assoc->ar_arl_ip_assoc != NULL &&
- ar->ar_arl_ip_assoc->ar_arl_ip_assoc == ar);
- ar->ar_arl_ip_assoc->ar_arl_ip_assoc = NULL;
- ar->ar_arl_ip_assoc = NULL;
- }
- cr = ar->ar_credp;
- /* mi_close_comm frees the instance data. */
- (void) mi_close_comm(&as->as_head, q);
- qprocsoff(q);
- crfree(cr);
-
- if (index != 0) {
- hook_nic_event_t info;
-
- info.hne_nic = index;
- info.hne_lif = 0;
- info.hne_event = NE_UNPLUMB;
- info.hne_data = name;
- info.hne_datalen = strlen(name);
- (void) hook_run(as->as_net_data->netd_hooks,
- as->as_arpnicevents, (hook_data_t)&info);
- }
- netstack_rele(as->as_netstack);
- return (0);
-}
-
-/*
- * Dispatch routine for ARP commands. This routine can be called out of
- * either ar_wput or ar_rput, in response to IOCTLs or M_PROTO messages.
- */
-/* TODO: error reporting for M_PROTO case */
-static int
-ar_cmd_dispatch(queue_t *q, mblk_t *mp_orig, boolean_t from_wput)
-{
- arct_t *arct;
- uint32_t cmd;
- ssize_t len;
- mblk_t *mp = mp_orig;
- cred_t *cr = NULL;
-
- if (!mp)
- return (ENOENT);
-
- /* We get both M_PROTO and M_IOCTL messages, so watch out! */
- if (DB_TYPE(mp) == M_IOCTL) {
- struct iocblk *ioc;
- ioc = (struct iocblk *)mp->b_rptr;
- cmd = ioc->ioc_cmd;
- cr = ioc->ioc_cr;
- mp = mp->b_cont;
- if (!mp)
- return (ENOENT);
- } else {
- cr = msg_getcred(mp, NULL);
- /* For initial messages between IP and ARP, cr can be NULL */
- if (cr == NULL)
- cr = ((ar_t *)q->q_ptr)->ar_credp;
- }
- len = MBLKL(mp);
- if (len < sizeof (uint32_t) || !OK_32PTR(mp->b_rptr))
- return (ENOENT);
- if (mp_orig == mp)
- cmd = *(uint32_t *)mp->b_rptr;
- for (arct = ar_cmd_tbl; ; arct++) {
- if (arct >= A_END(ar_cmd_tbl))
- return (ENOENT);
- if (arct->arct_cmd == cmd)
- break;
- }
- if (len < arct->arct_min_len) {
- /*
- * If the command is exclusive to ARP, we return EINVAL,
- * else we need to pass the command downstream, so return
- * ENOENT
- */
- return ((arct->arct_flags & ARF_ONLY_CMD) ? EINVAL : ENOENT);
- }
- if (arct->arct_priv_req != OP_NP) {
- int error;
-
- if ((error = secpolicy_ip(cr, arct->arct_priv_req,
- B_FALSE)) != 0)
- return (error);
- }
- /* Disallow many commands except if from rput i.e. from IP */
- if (from_wput && !(arct->arct_flags & ARF_WPUT_OK)) {
- return (EINVAL);
- }
-
- if (arct->arct_flags & ARF_IOCTL_AWARE)
- mp = mp_orig;
-
- DTRACE_PROBE3(cmd_dispatch, queue_t *, q, mblk_t *, mp,
- arct_t *, arct);
- return (*arct->arct_pfi)(q, mp);
-}
-
-/* Allocate and do common initializations for DLPI messages. */
-static mblk_t *
-ar_dlpi_comm(t_uscalar_t prim, size_t size)
-{
- mblk_t *mp;
-
- if ((mp = allocb(size, BPRI_HI)) == NULL)
- return (NULL);
-
- /*
- * DLPIv2 says that DL_INFO_REQ and DL_TOKEN_REQ (the latter
- * of which we don't seem to use) are sent with M_PCPROTO, and
- * that other DLPI messages are M_PROTO.
- */
- DB_TYPE(mp) = (prim == DL_INFO_REQ) ? M_PCPROTO : M_PROTO;
-
- mp->b_wptr = mp->b_rptr + size;
- bzero(mp->b_rptr, size);
- ((union DL_primitives *)mp->b_rptr)->dl_primitive = prim;
-
- return (mp);
-}
-
-static void
-ar_dlpi_dispatch(arl_t *arl)
-{
- mblk_t *mp;
- t_uscalar_t primitive = DL_PRIM_INVAL;
-
- while (((mp = arl->arl_dlpi_deferred) != NULL) &&
- (arl->arl_dlpi_pending == DL_PRIM_INVAL)) {
- union DL_primitives *dlp = (union DL_primitives *)mp->b_rptr;
-
- DTRACE_PROBE2(dlpi_dispatch, arl_t *, arl, mblk_t *, mp);
-
- ASSERT(DB_TYPE(mp) == M_PROTO || DB_TYPE(mp) == M_PCPROTO);
- arl->arl_dlpi_deferred = mp->b_next;
- mp->b_next = NULL;
-
- /*
- * If this is a DL_NOTIFY_CONF, no ack is expected.
- */
- if ((primitive = dlp->dl_primitive) != DL_NOTIFY_CONF)
- arl->arl_dlpi_pending = dlp->dl_primitive;
- putnext(arl->arl_wq, mp);
- }
-
- if (arl->arl_dlpi_pending == DL_PRIM_INVAL) {
- /*
- * No pending DLPI operation.
- */
- ASSERT(mp == NULL);
- DTRACE_PROBE1(dlpi_idle, arl_t *, arl);
-
- /*
- * If the last DLPI message dispatched is DL_NOTIFY_CONF,
- * it is not associated with any pending cmd request, so drain
- * the rest of the pending cmd requests; otherwise call
- * ar_cmd_done() to finish up the current pending cmd
- * operation.
- */
- if (primitive == DL_NOTIFY_CONF)
- ar_cmd_drain(arl);
- else
- ar_cmd_done(arl);
- } else if (mp != NULL) {
- DTRACE_PROBE2(dlpi_defer, arl_t *, arl, mblk_t *, mp);
- }
-}
-
-/*
- * The following two functions serialize DLPI messages to the driver, much
- * along the lines of ill_dlpi_send and ill_dlpi_done in IP. Basically,
- * we wait for a DLPI message, sent downstream, to be acked before sending
- * the next. If there are DLPI messages that have not yet been sent, queue
- * this message (mp), else send it downstream.
- */
-static void
-ar_dlpi_send(arl_t *arl, mblk_t *mp)
-{
- mblk_t **mpp;
-
- ASSERT(arl != NULL);
- ASSERT(DB_TYPE(mp) == M_PROTO || DB_TYPE(mp) == M_PCPROTO);
-
- /* Always queue the message. Tail insertion */
- mpp = &arl->arl_dlpi_deferred;
- while (*mpp != NULL)
- mpp = &((*mpp)->b_next);
- *mpp = mp;
-
- ar_dlpi_dispatch(arl);
-}
-
-/*
- * Called when a DLPI control message has been acked; send down the next
- * queued message (if any).
- * The DLPI messages of interest being bind, attach, unbind and detach since
- * these are the only ones sent by ARP via ar_dlpi_send.
- */
-static void
-ar_dlpi_done(arl_t *arl, t_uscalar_t prim)
-{
- if (arl->arl_dlpi_pending != prim) {
- DTRACE_PROBE2(dlpi_done_unexpected, arl_t *, arl,
- t_uscalar_t, prim);
- return;
- }
-
- DTRACE_PROBE2(dlpi_done, arl_t *, arl, t_uscalar_t, prim);
- arl->arl_dlpi_pending = DL_PRIM_INVAL;
- ar_dlpi_dispatch(arl);
-}
-
-/*
- * Send a DL_NOTE_REPLUMB_DONE message down to the driver to indicate
- * the replumb process has already been done. Note that mp is either a
- * DL_NOTIFY_IND message or an AR_INTERFACE_DOWN message (comes from IP).
- */
-static void
-arp_replumb_done(arl_t *arl, mblk_t *mp)
-{
- ASSERT(arl->arl_state == ARL_S_DOWN && arl->arl_replumbing);
-
- mp = mexchange(NULL, mp, sizeof (dl_notify_conf_t), M_PROTO,
- DL_NOTIFY_CONF);
- ((dl_notify_conf_t *)(mp->b_rptr))->dl_notification =
- DL_NOTE_REPLUMB_DONE;
- arl->arl_replumbing = B_FALSE;
- ar_dlpi_send(arl, mp);
-}
-
-static void
-ar_cmd_drain(arl_t *arl)
-{
- mblk_t *mp;
- queue_t *q;
-
- /*
- * Run the commands that have been enqueued while we were waiting
- * for the last command (AR_INTERFACE_UP or AR_INTERFACE_DOWN)
- * to complete.
- */
- while ((mp = arl->arl_queue) != NULL) {
- if (((uintptr_t)mp->b_prev & CMD_IN_PROGRESS) != 0) {
- /*
- * The current command is an AR_INTERFACE_UP or
- * AR_INTERFACE_DOWN and is waiting for a DLPI ack
- * from the driver. Return. We can't make progress now.
- */
- break;
- }
-
- mp = ar_cmd_dequeue(arl);
- mp->b_prev = AR_DRAINING;
- q = mp->b_queue;
- mp->b_queue = NULL;
-
- /*
- * Don't call put(q, mp), since it can reorder messages
- * by sending the current message to the end of
- * arp's syncq.
- */
- if (q->q_flag & QREADR)
- ar_rput(q, mp);
- else
- ar_wput(q, mp);
- }
-}
-
-static void
-ar_cmd_done(arl_t *arl)
-{
- mblk_t *mp;
- int cmd;
- int err;
- mblk_t *mp1;
- mblk_t *dlpi_op_done_mp = NULL;
- queue_t *dlpi_op_done_q;
- ar_t *ar_arl;
- ar_t *ar_ip;
-
- ASSERT(arl->arl_state == ARL_S_UP || arl->arl_state == ARL_S_DOWN);
-
- /*
- * If the current operation was initiated by IP there must be
- * an op enqueued in arl_queue. But if ar_close has sent down
- * a detach/unbind, there is no command enqueued. Also if the IP-ARP
- * stream has closed the cleanup would be done and there won't be any mp
- */
- if ((mp = arl->arl_queue) == NULL)
- return;
-
- if ((cmd = (uintptr_t)mp->b_prev) & CMD_IN_PROGRESS) {
- mp1 = ar_cmd_dequeue(arl);
- ASSERT(mp == mp1);
-
- cmd &= ~CMD_IN_PROGRESS;
- if (cmd == AR_INTERFACE_UP) {
- /*
- * There is an ioctl waiting for us...
- */
- if (arl->arl_state == ARL_S_UP)
- err = 0;
- else
- err = EINVAL;
-
- dlpi_op_done_mp = ar_alloc(AR_DLPIOP_DONE, err);
- if (dlpi_op_done_mp != NULL) {
- /*
- * Better performance if we send the response
- * after the potential MAPPING_ADD commands
- * that are likely to follow. (Do it after
- * ar_cmd_drain() below, instead of putnext right now.)
- */
- dlpi_op_done_q = WR(mp->b_queue);
- }
-
- if (err == 0) {
- /*
- * Now that we have the ARL instance
- * corresponding to the IP instance let's make
- * the association here.
- */
- ar_ip = (ar_t *)mp->b_queue->q_ptr;
- ar_arl = (ar_t *)arl->arl_rq->q_ptr;
- ar_arl->ar_arl_ip_assoc = ar_ip;
- ar_ip->ar_arl_ip_assoc = ar_arl;
- }
-
- inet_freemsg(mp);
- } else if (cmd == AR_INTERFACE_DOWN && arl->arl_replumbing) {
- /*
- * The arl is successfully brought down and this is
- * a result of the DL_NOTE_REPLUMB process. Reset
- * mp->b_prev first (it keeps the 'cmd' information
- * at this point).
- */
- mp->b_prev = NULL;
- arp_replumb_done(arl, mp);
- } else {
- inet_freemsg(mp);
- }
- }
-
- ar_cmd_drain(arl);
-
- if (dlpi_op_done_mp != NULL) {
- DTRACE_PROBE3(cmd_done_next, arl_t *, arl,
- queue_t *, dlpi_op_done_q, mblk_t *, dlpi_op_done_mp);
- putnext(dlpi_op_done_q, dlpi_op_done_mp);
- }
-}
-
-/*
- * Queue all arp commands coming from clients. Typically these commands
- * come from IP, but could also come from other clients. The commands
- * are serviced in FIFO order. Some commands, typically AR_INTERFACE_UP
- * and AR_INTERFACE_DOWN, need to wait and restart after the DLPI
- * response from the driver is received. ar_dlpi_done restarts
- * the command and then dequeues the queue at arl_queue and calls ar_rput
- * or ar_wput for each enqueued command. AR_DRAINING is used to signify
- * that the command is being executed through a drain from ar_dlpi_done.
- * Functions handling the individual commands such as ar_entry_add
- * check for this flag in b_prev to determine whether the command has
- * to be enqueued for later processing or must be processed now.
- *
- * b_next used to thread the enqueued command mblks
- * b_queue used to identify the queue of the originating request(client)
- * b_prev used to store the command itself for easy parsing.
- */
-static void
-ar_cmd_enqueue(arl_t *arl, mblk_t *mp, queue_t *q, ushort_t cmd,
- boolean_t tail_insert)
-{
- mp->b_queue = q;
- if (arl->arl_queue == NULL) {
- ASSERT(arl->arl_queue_tail == NULL);
- mp->b_prev = (void *)((uintptr_t)(cmd | CMD_IN_PROGRESS));
- mp->b_next = NULL;
- arl->arl_queue = mp;
- arl->arl_queue_tail = mp;
- } else if (tail_insert) {
- mp->b_prev = (void *)((uintptr_t)cmd);
- mp->b_next = NULL;
- arl->arl_queue_tail->b_next = mp;
- arl->arl_queue_tail = mp;
- } else {
- /* head insert */
- mp->b_prev = (void *)((uintptr_t)cmd | CMD_IN_PROGRESS);
- mp->b_next = arl->arl_queue;
- arl->arl_queue = mp;
- }
-}
-
-static mblk_t *
-ar_cmd_dequeue(arl_t *arl)
-{
- mblk_t *mp;
-
- if (arl->arl_queue == NULL) {
- ASSERT(arl->arl_queue_tail == NULL);
- return (NULL);
- }
- mp = arl->arl_queue;
- arl->arl_queue = mp->b_next;
- if (arl->arl_queue == NULL)
- arl->arl_queue_tail = NULL;
- mp->b_next = NULL;
- return (mp);
-}
-
-/*
- * Standard ACE timer handling: compute 'fuzz' around a central value or from 0
- * up to a value, and then set the timer. The randomization is necessary to
- * prevent groups of systems from falling into synchronization on the network
- * and producing ARP packet storms.
- */
-static void
-ace_set_timer(ace_t *ace, boolean_t initial_time)
-{
- clock_t intv, rnd, frac;
-
- (void) random_get_pseudo_bytes((uint8_t *)&rnd, sizeof (rnd));
- /* Note that clock_t is signed; must chop off bits */
- rnd &= (1ul << (NBBY * sizeof (rnd) - 1)) - 1;
- intv = ace->ace_xmit_interval;
- if (initial_time) {
- /* Set intv to be anywhere in the [1 .. intv] range */
- if (intv <= 0)
- intv = 1;
- else
- intv = (rnd % intv) + 1;
- } else {
- /* Compute 'frac' as 20% of the configured interval */
- if ((frac = intv / 5) <= 1)
- frac = 2;
- /* Set intv randomly in the range [intv-frac .. intv+frac] */
- if ((intv = intv - frac + rnd % (2 * frac + 1)) <= 0)
- intv = 1;
- }
- mi_timer(ace->ace_arl->arl_wq, ace->ace_mp, intv);
-}
-
-/*
- * Process entry add requests from external messages.
- * It is also called by ip_rput_dlpi_writer() through
- * ipif_resolver_up() to change hardware address when
- * an asynchronous hardware address change notification
- * arrives from the driver.
- */
-static int
-ar_entry_add(queue_t *q, mblk_t *mp_orig)
-{
- area_t *area;
- ace_t *ace;
- uchar_t *hw_addr;
- uint32_t hw_addr_len;
- uchar_t *proto_addr;
- uint32_t proto_addr_len;
- uchar_t *proto_mask;
- arl_t *arl;
- mblk_t *mp = mp_orig;
- int err;
- uint_t aflags;
- boolean_t unverified;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- /* We handle both M_IOCTL and M_PROTO messages. */
- if (DB_TYPE(mp) == M_IOCTL)
- mp = mp->b_cont;
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL)
- return (EINVAL);
- /*
- * Newly received commands from clients go to the tail of the queue.
- */
- if (CMD_NEEDS_QUEUEING(mp_orig, arl)) {
- DTRACE_PROBE3(eadd_enqueued, queue_t *, q, mblk_t *, mp_orig,
- arl_t *, arl);
- ar_cmd_enqueue(arl, mp_orig, q, AR_ENTRY_ADD, B_TRUE);
- return (EINPROGRESS);
- }
- mp_orig->b_prev = NULL;
-
- area = (area_t *)mp->b_rptr;
- aflags = area->area_flags;
-
- /*
- * If the previous entry wasn't published and we are now going
- * to publish, then we need to do address verification. The previous
- * entry may have been a local unpublished address or even an external
- * address. If the entry we find was in an unverified state we retain
- * this.
- * If it's a new published entry, then we're obligated to do
- * duplicate address detection now.
- */
- ace = ar_ce_lookup_from_area(as, mp, ar_ce_lookup_entry);
- if (ace != NULL) {
- unverified = !(ace->ace_flags & ACE_F_PUBLISH) &&
- (aflags & ACE_F_PUBLISH);
- if (ace->ace_flags & ACE_F_UNVERIFIED)
- unverified = B_TRUE;
- ar_ce_delete(ace);
- } else {
- unverified = (aflags & ACE_F_PUBLISH) != 0;
- }
-
- /* Allow client to request DAD restart */
- if (aflags & ACE_F_UNVERIFIED)
- unverified = B_TRUE;
-
- /* Extract parameters from the message. */
- hw_addr_len = area->area_hw_addr_length;
- hw_addr = mi_offset_paramc(mp, area->area_hw_addr_offset, hw_addr_len);
- proto_addr_len = area->area_proto_addr_length;
- proto_addr = mi_offset_paramc(mp, area->area_proto_addr_offset,
- proto_addr_len);
- proto_mask = mi_offset_paramc(mp, area->area_proto_mask_offset,
- proto_addr_len);
- if (proto_mask == NULL) {
- DTRACE_PROBE2(eadd_bad_mask, arl_t *, arl, area_t *, area);
- return (EINVAL);
- }
- err = ar_ce_create(
- arl,
- area->area_proto,
- hw_addr,
- hw_addr_len,
- proto_addr,
- proto_addr_len,
- proto_mask,
- NULL,
- (uint32_t)0,
- NULL,
- aflags & ~ACE_F_MAPPING & ~ACE_F_UNVERIFIED & ~ACE_F_DEFEND);
- if (err != 0) {
- DTRACE_PROBE3(eadd_create_failed, arl_t *, arl, area_t *, area,
- int, err);
- return (err);
- }
-
- if (aflags & ACE_F_PUBLISH) {
- arlphy_t *ap;
-
- ace = ar_ce_lookup(arl, area->area_proto, proto_addr,
- proto_addr_len);
- ASSERT(ace != NULL);
-
- ap = ace->ace_xmit_arl->arl_phy;
-
- if (hw_addr == NULL || hw_addr_len == 0) {
- hw_addr = ap->ap_hw_addr;
- } else if (aflags & ACE_F_MYADDR) {
- /*
- * If hardware address changes, then make sure
- * that the hardware address and hardware
- * address length fields in arlphy_t get updated
- * too. Otherwise, they will continue carrying
- * the old hardware address information.
- */
- ASSERT((hw_addr != NULL) && (hw_addr_len != 0));
- bcopy(hw_addr, ap->ap_hw_addr, hw_addr_len);
- ap->ap_hw_addrlen = hw_addr_len;
- }
-
- if (ace->ace_flags & ACE_F_FAST) {
- ace->ace_xmit_count = as->as_fastprobe_count;
- ace->ace_xmit_interval = as->as_fastprobe_delay;
- } else {
- ace->ace_xmit_count = as->as_probe_count;
- ace->ace_xmit_interval = as->as_probe_delay;
- }
-
- /*
- * If the user has disabled duplicate address detection for
- * this kind of interface (fast or slow) by setting the probe
- * count to zero, then pretend as if we've verified the
- * address, and go right to address defense mode.
- */
- if (ace->ace_xmit_count == 0)
- unverified = B_FALSE;
-
- /*
- * If we need to do duplicate address detection, then kick that
- * off. Otherwise, send out a gratuitous ARP message in order
- * to update everyone's caches with the new hardware address.
- */
- if (unverified) {
- ace->ace_flags |= ACE_F_UNVERIFIED;
- if (ace->ace_xmit_interval == 0) {
- /*
- * User has configured us to send the first
- * probe right away. Do so, and set up for
- * the subsequent probes.
- */
- DTRACE_PROBE2(eadd_probe, ace_t *, ace,
- area_t *, area);
- ar_xmit(ace->ace_xmit_arl, ARP_REQUEST,
- area->area_proto, proto_addr_len,
- hw_addr, NULL, NULL, proto_addr, NULL, as);
- ace->ace_xmit_count--;
- ace->ace_xmit_interval =
- (ace->ace_flags & ACE_F_FAST) ?
- as->as_fastprobe_interval :
- as->as_probe_interval;
- ace_set_timer(ace, B_FALSE);
- } else {
- DTRACE_PROBE2(eadd_delay, ace_t *, ace,
- area_t *, area);
- /* Regular delay before initial probe */
- ace_set_timer(ace, B_TRUE);
- }
- } else {
- DTRACE_PROBE2(eadd_announce, ace_t *, ace,
- area_t *, area);
- ar_xmit(ace->ace_xmit_arl, ARP_REQUEST,
- area->area_proto, proto_addr_len, hw_addr,
- proto_addr, ap->ap_arp_addr, proto_addr, NULL, as);
- ace->ace_last_bcast = ddi_get_lbolt();
-
- /*
- * If AUTHORITY is set, it is not just a proxy arp
- * entry; we believe we're the authority for this
- * entry. In that case, and if we're not just doing
- * one-off defense of the address, we send more than
- * one copy, so we'll still have a good chance of
- * updating everyone even when there's a packet loss
- * or two.
- */
- if ((aflags & ACE_F_AUTHORITY) &&
- !(aflags & ACE_F_DEFEND) &&
- as->as_publish_count > 0) {
- /* Account for the xmit we just did */
- ace->ace_xmit_count = as->as_publish_count - 1;
- ace->ace_xmit_interval =
- as->as_publish_interval;
- if (ace->ace_xmit_count > 0)
- ace_set_timer(ace, B_FALSE);
- }
- }
- }
- return (0);
-}
-
-/* Process entry delete requests from external messages. */
-static int
-ar_entry_delete(queue_t *q, mblk_t *mp_orig)
-{
- ace_t *ace;
- arl_t *arl;
- mblk_t *mp = mp_orig;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- /* We handle both M_IOCTL and M_PROTO messages. */
- if (DB_TYPE(mp) == M_IOCTL)
- mp = mp->b_cont;
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL)
- return (EINVAL);
- /*
- * Newly received commands from clients go to the tail of the queue.
- */
- if (CMD_NEEDS_QUEUEING(mp_orig, arl)) {
- DTRACE_PROBE3(edel_enqueued, queue_t *, q, mblk_t *, mp_orig,
- arl_t *, arl);
- ar_cmd_enqueue(arl, mp_orig, q, AR_ENTRY_DELETE, B_TRUE);
- return (EINPROGRESS);
- }
- mp_orig->b_prev = NULL;
-
- /*
- * Need to know if it is a mapping or an exact match. Check exact
- * match first.
- */
- ace = ar_ce_lookup_from_area(as, mp, ar_ce_lookup);
- if (ace != NULL) {
- ared_t *ared = (ared_t *)mp->b_rptr;
-
- /*
- * If it's a permanent entry, then the client is the one who
- * told us to delete it, so there's no reason to notify.
- */
- if (ACE_NONPERM(ace))
- ar_delete_notify(ace);
- /*
- * Only delete the ARP entry if it is non-permanent or if the
- * ARED_F_PRESERVE_PERM flag is not set.
- */
- if (ACE_NONPERM(ace) ||
- !(ared->ared_flags & ARED_F_PRESERVE_PERM)) {
- ar_ce_delete(ace);
- }
- return (0);
- }
- return (ENXIO);
-}
-
-/*
- * Process entry query requests from external messages.
- * Bump up the ire_stats_freed for all errors except
- * EINPROGRESS - which means the packet has been queued.
- * For all other errors the packet is going to be freed
- * and hence we account for ire being freed if it
- * is a M_PROTO message.
- */
-static int
-ar_entry_query(queue_t *q, mblk_t *mp_orig)
-{
- ace_t *ace;
- areq_t *areq;
- arl_t *arl;
- int err;
- mblk_t *mp = mp_orig;
- uchar_t *proto_addr;
- uchar_t *sender_addr;
- uint32_t proto_addr_len;
- clock_t ms;
- boolean_t is_mproto = B_TRUE;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- /* We handle both M_IOCTL and M_PROTO messages. */
- if (DB_TYPE(mp) == M_IOCTL) {
- is_mproto = B_FALSE;
- mp = mp->b_cont;
- }
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL) {
- DTRACE_PROBE2(query_no_arl, queue_t *, q, mblk_t *, mp);
- err = EINVAL;
- goto err_ret;
- }
- /*
- * Newly received commands from clients go to the tail of the queue.
- */
- if (CMD_NEEDS_QUEUEING(mp_orig, arl)) {
- DTRACE_PROBE3(query_enqueued, queue_t *, q, mblk_t *, mp_orig,
- arl_t *, arl);
- ar_cmd_enqueue(arl, mp_orig, q, AR_ENTRY_QUERY, B_TRUE);
- return (EINPROGRESS);
- }
- mp_orig->b_prev = NULL;
-
- areq = (areq_t *)mp->b_rptr;
- proto_addr_len = areq->areq_target_addr_length;
- proto_addr = mi_offset_paramc(mp, areq->areq_target_addr_offset,
- proto_addr_len);
- if (proto_addr == NULL) {
- DTRACE_PROBE1(query_illegal_address, areq_t *, areq);
- err = EINVAL;
- goto err_ret;
- }
- /* Stash the reply queue pointer for later use. */
- mp->b_prev = (mblk_t *)OTHERQ(q);
- mp->b_next = NULL;
- if (areq->areq_xmit_interval == 0)
- areq->areq_xmit_interval = AR_DEF_XMIT_INTERVAL;
- ace = ar_ce_lookup(arl, areq->areq_proto, proto_addr, proto_addr_len);
- if (ace != NULL && (ace->ace_flags & ACE_F_OLD)) {
- /*
- * This is a potentially stale entry that IP's asking about.
- * Since IP is asking, it must not have an answer anymore,
- * either due to periodic ARP flush or due to SO_DONTROUTE.
- * Rather than go forward with what we've got, restart
- * resolution.
- */
- DTRACE_PROBE2(query_stale_ace, ace_t *, ace, areq_t *, areq);
- ar_ce_delete(ace);
- ace = NULL;
- }
- if (ace != NULL) {
- mblk_t **mpp;
- uint32_t count = 0;
-
- /*
- * There is already a cache entry. This means there is either
- * a permanent entry, or address resolution is in progress.
- * If the latter, there should be one or more queries queued
- * up. We link the current one in at the end, if there aren't
- * too many outstanding.
- */
- for (mpp = &ace->ace_query_mp; mpp[0]; mpp = &mpp[0]->b_next) {
- if (++count > areq->areq_max_buffered) {
- DTRACE_PROBE2(query_overflow, ace_t *, ace,
- areq_t *, areq);
- mp->b_prev = NULL;
- err = EALREADY;
- goto err_ret;
- }
- }
- /* Put us on the list. */
- mpp[0] = mp;
- if (count != 0) {
- /*
- * If a query was already queued up, then we must not
- * have an answer yet.
- */
- DTRACE_PROBE2(query_in_progress, ace_t *, ace,
- areq_t *, areq);
- return (EINPROGRESS);
- }
- if (ACE_RESOLVED(ace)) {
- /*
- * We have an answer already.
- * Keep a dup of mp since proto_addr points to it
- * and mp has been placed on the ace_query_mp list.
- */
- mblk_t *mp1;
-
- DTRACE_PROBE2(query_resolved, ace_t *, ace,
- areq_t *, areq);
- mp1 = dupmsg(mp);
- ar_query_reply(ace, 0, proto_addr, proto_addr_len);
- freemsg(mp1);
- return (EINPROGRESS);
- }
- if (ace->ace_flags & ACE_F_MAPPING) {
- /* Should never happen */
- DTRACE_PROBE2(query_unresolved_mapping, ace_t *, ace,
- areq_t *, areq);
- mpp[0] = mp->b_next;
- err = ENXIO;
- goto err_ret;
- }
- DTRACE_PROBE2(query_unresolved, ace_t *, ace, areq_t *, areq);
- } else {
- /* No ace yet. Make one now. (This is the common case.) */
- if (areq->areq_xmit_count == 0) {
- DTRACE_PROBE2(query_template, arl_t *, arl,
- areq_t *, areq);
- mp->b_prev = NULL;
- err = ENXIO;
- goto err_ret;
- }
- /*
- * Check for sender addr being NULL or not before
- * we create the ace. It is easy to cleanup later.
- */
- sender_addr = mi_offset_paramc(mp,
- areq->areq_sender_addr_offset,
- areq->areq_sender_addr_length);
- if (sender_addr == NULL) {
- DTRACE_PROBE2(query_no_sender, arl_t *, arl,
- areq_t *, areq);
- mp->b_prev = NULL;
- err = EINVAL;
- goto err_ret;
- }
- err = ar_ce_create(OWNING_ARL(arl), areq->areq_proto, NULL, 0,
- proto_addr, proto_addr_len, NULL,
- NULL, (uint32_t)0, sender_addr,
- areq->areq_flags);
- if (err != 0) {
- DTRACE_PROBE3(query_create_failed, arl_t *, arl,
- areq_t *, areq, int, err);
- mp->b_prev = NULL;
- goto err_ret;
- }
- ace = ar_ce_lookup(arl, areq->areq_proto, proto_addr,
- proto_addr_len);
- if (ace == NULL || ace->ace_query_mp != NULL) {
- /* Shouldn't happen! */
- DTRACE_PROBE3(query_lookup_failed, arl_t *, arl,
- areq_t *, areq, ace_t *, ace);
- mp->b_prev = NULL;
- err = ENXIO;
- goto err_ret;
- }
- ace->ace_query_mp = mp;
- }
- ms = ar_query_xmit(as, ace);
- if (ms == 0) {
- /* Immediate reply requested. */
- ar_query_reply(ace, ENXIO, NULL, (uint32_t)0);
- } else {
- mi_timer(ace->ace_arl->arl_wq, ace->ace_mp, ms);
- }
- return (EINPROGRESS);
-err_ret:
- if (is_mproto) {
- ip_stack_t *ipst = as->as_netstack->netstack_ip;
-
- BUMP_IRE_STATS(ipst->ips_ire_stats_v4, ire_stats_freed);
- }
- return (err);
-}
-
-/* Handle simple query requests. */
-static int
-ar_entry_squery(queue_t *q, mblk_t *mp_orig)
-{
- ace_t *ace;
- area_t *area;
- arl_t *arl;
- uchar_t *hw_addr;
- uint32_t hw_addr_len;
- mblk_t *mp = mp_orig;
- uchar_t *proto_addr;
- int proto_addr_len;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- if (DB_TYPE(mp) == M_IOCTL)
- mp = mp->b_cont;
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL)
- return (EINVAL);
- /*
- * Newly received commands from clients go to the tail of the queue.
- */
- if (CMD_NEEDS_QUEUEING(mp_orig, arl)) {
- DTRACE_PROBE3(squery_enqueued, queue_t *, q, mblk_t *, mp_orig,
- arl_t *, arl);
- ar_cmd_enqueue(arl, mp_orig, q, AR_ENTRY_SQUERY, B_TRUE);
- return (EINPROGRESS);
- }
- mp_orig->b_prev = NULL;
-
- /* Extract parameters from the request message. */
- area = (area_t *)mp->b_rptr;
- proto_addr_len = area->area_proto_addr_length;
- proto_addr = mi_offset_paramc(mp, area->area_proto_addr_offset,
- proto_addr_len);
- hw_addr_len = area->area_hw_addr_length;
- hw_addr = mi_offset_paramc(mp, area->area_hw_addr_offset, hw_addr_len);
- if (proto_addr == NULL || hw_addr == NULL) {
- DTRACE_PROBE1(squery_illegal_address, area_t *, area);
- return (EINVAL);
- }
- ace = ar_ce_lookup(arl, area->area_proto, proto_addr, proto_addr_len);
- if (ace == NULL) {
- return (ENXIO);
- }
- if (hw_addr_len < ace->ace_hw_addr_length) {
- return (EINVAL);
- }
- if (ACE_RESOLVED(ace)) {
- /* Got it, prepare the response. */
- ASSERT(area->area_hw_addr_length == ace->ace_hw_addr_length);
- ar_set_address(ace, hw_addr, proto_addr, proto_addr_len);
- } else {
- /*
- * We have an incomplete entry. Set the length to zero and
- * just return out the flags.
- */
- area->area_hw_addr_length = 0;
- }
- area->area_flags = ace->ace_flags;
- if (mp == mp_orig) {
- /* Non-ioctl case */
- /* TODO: change message type? */
- DB_TYPE(mp) = M_CTL; /* Caught by ip_wput */
- DTRACE_PROBE3(squery_reply, queue_t *, q, mblk_t *, mp,
- arl_t *, arl);
- qreply(q, mp);
- return (EINPROGRESS);
- }
- return (0);
-}
-
-/* Process an interface down causing us to detach and unbind. */
-/* ARGSUSED */
-static int
-ar_interface_down(queue_t *q, mblk_t *mp)
-{
- arl_t *arl;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL || arl->arl_closing) {
- DTRACE_PROBE2(down_no_arl, queue_t *, q, mblk_t *, mp);
- return (EINVAL);
- }
-
- /*
- * Newly received commands from clients go to the tail of the queue.
- */
- if (CMD_NEEDS_QUEUEING(mp, arl)) {
- DTRACE_PROBE3(down_enqueued, queue_t *, q, mblk_t *, mp,
- arl_t *, arl);
- ar_cmd_enqueue(arl, mp, q, AR_INTERFACE_DOWN, B_TRUE);
- return (EINPROGRESS);
- }
- mp->b_prev = NULL;
- /*
- * The arl is already down, no work to do.
- */
- if (arl->arl_state == ARL_S_DOWN) {
- if (arl->arl_replumbing) {
- /*
- * The arl is already down and this is a result of
- * the DL_NOTE_REPLUMB process. Return EINPROGRESS
- * so this mp won't be freed by ar_rput().
- */
- arp_replumb_done(arl, mp);
- return (EINPROGRESS);
- } else {
- /* ar_rput frees the mp */
- return (0);
- }
- }
-
- /*
- * This command cannot complete in a single pass.
- * It has to be restarted after the receipt of the ack from
- * the driver. So we need to enqueue the command (at the head).
- */
- ar_cmd_enqueue(arl, mp, q, AR_INTERFACE_DOWN, B_FALSE);
-
- ASSERT(arl->arl_state == ARL_S_UP);
-
- /* Free all arp entries for this interface */
- ar_ce_walk(as, ar_ce_delete_per_arl, arl);
-
- ar_ll_down(arl);
- /* Return EINPROGRESS so that ar_rput does not free the 'mp' */
- return (EINPROGRESS);
-}
-
-
-/* Process an interface up causing the info req sequence to start. */
-/* ARGSUSED */
-static int
-ar_interface_up(queue_t *q, mblk_t *mp)
-{
- arl_t *arl;
- int err;
- mblk_t *mp1;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL || arl->arl_closing) {
- DTRACE_PROBE2(up_no_arl, queue_t *, q, mblk_t *, mp);
- err = EINVAL;
- goto done;
- }
-
- /*
- * Newly received commands from clients go to the tail of the queue.
- */
- if (CMD_NEEDS_QUEUEING(mp, arl)) {
- DTRACE_PROBE3(up_enqueued, queue_t *, q, mblk_t *, mp,
- arl_t *, arl);
- ar_cmd_enqueue(arl, mp, q, AR_INTERFACE_UP, B_TRUE);
- return (EINPROGRESS);
- }
- mp->b_prev = NULL;
-
- /*
- * The arl is already up. No work to do.
- */
- if (arl->arl_state == ARL_S_UP) {
- err = 0;
- goto done;
- }
-
- /*
- * This command cannot complete in a single pass.
- * It has to be restarted after the receipt of the ack from
- * the driver. So we need to enqueue the command (at the head).
- */
- ar_cmd_enqueue(arl, mp, q, AR_INTERFACE_UP, B_FALSE);
-
- err = ar_ll_up(arl);
-
- /* Return EINPROGRESS so that ar_rput does not free the 'mp' */
- return (EINPROGRESS);
-
-done:
- /* caller frees 'mp' */
-
- mp1 = ar_alloc(AR_DLPIOP_DONE, err);
- if (mp1 != NULL) {
- q = WR(q);
- DTRACE_PROBE3(up_send_err, queue_t *, q, mblk_t *, mp1,
- int, err);
- putnext(q, mp1);
- }
- return (err);
-}
-
-/*
- * Given an arie_t `mp', find the arl_t's that it names and return them
- * in `*arlp' and `*ipmp_arlp'. If they cannot be found, return B_FALSE.
- */
-static boolean_t
-ar_ipmp_lookup(arp_stack_t *as, mblk_t *mp, arl_t **arlp, arl_t **ipmp_arlp)
-{
- arie_t *arie = (arie_t *)mp->b_rptr;
-
- *arlp = ar_ll_lookup_from_mp(as, mp);
- if (*arlp == NULL) {
- DTRACE_PROBE1(ipmp_lookup_no_arl, mblk_t *, mp);
- return (B_FALSE);
- }
-
- arie->arie_grifname[LIFNAMSIZ - 1] = '\0';
- *ipmp_arlp = ar_ll_lookup_by_name(as, arie->arie_grifname);
- if (*ipmp_arlp == NULL) {
- DTRACE_PROBE1(ipmp_lookup_no_ipmp_arl, mblk_t *, mp);
- return (B_FALSE);
- }
-
- DTRACE_PROBE2(ipmp_lookup, arl_t *, *arlp, arl_t *, *ipmp_arlp);
- return (B_TRUE);
-}
-
-/*
- * Bind an arl_t to an IPMP group arl_t.
- */
-static int
-ar_ipmp_activate(queue_t *q, mblk_t *mp)
-{
- arl_t *arl, *ipmp_arl;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- if (!ar_ipmp_lookup(as, mp, &arl, &ipmp_arl))
- return (EINVAL);
-
- if (arl->arl_ipmp_arl != NULL) {
- DTRACE_PROBE1(ipmp_activated_already, arl_t *, arl);
- return (EALREADY);
- }
-
- DTRACE_PROBE2(ipmp_activate, arl_t *, arl, arl_t *, ipmp_arl);
- arl->arl_ipmp_arl = ipmp_arl;
- return (0);
-}
-
-/*
- * Unbind an arl_t from an IPMP group arl_t and update the ace_t's so
- * that it is no longer part of the group.
- */
-static int
-ar_ipmp_deactivate(queue_t *q, mblk_t *mp)
-{
- arl_t *arl, *ipmp_arl;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- if (!ar_ipmp_lookup(as, mp, &arl, &ipmp_arl))
- return (EINVAL);
-
- if (ipmp_arl != arl->arl_ipmp_arl) {
- DTRACE_PROBE2(ipmp_deactivate_notactive, arl_t *, arl, arl_t *,
- ipmp_arl);
- return (EINVAL);
- }
-
- DTRACE_PROBE2(ipmp_deactivate, arl_t *, arl, arl_t *,
- arl->arl_ipmp_arl);
- ar_ce_walk(as, ar_ce_ipmp_deactivate, arl);
- arl->arl_ipmp_arl = NULL;
- return (0);
-}
-
-/*
- * Enable an interface to process ARP_REQUEST and ARP_RESPONSE messages.
- */
-/* ARGSUSED */
-static int
-ar_interface_on(queue_t *q, mblk_t *mp)
-{
- arl_t *arl;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL) {
- DTRACE_PROBE2(on_no_arl, queue_t *, q, mblk_t *, mp);
- return (EINVAL);
- }
-
- DTRACE_PROBE3(on_intf, queue_t *, q, mblk_t *, mp, arl_t *, arl);
- arl->arl_flags &= ~ARL_F_NOARP;
- return (0);
-}
-
-/*
- * Disable an interface from processing
- * ARP_REQUEST and ARP_RESPONSE messages
- */
-/* ARGSUSED */
-static int
-ar_interface_off(queue_t *q, mblk_t *mp)
-{
- arl_t *arl;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL) {
- DTRACE_PROBE2(off_no_arl, queue_t *, q, mblk_t *, mp);
- return (EINVAL);
- }
-
- DTRACE_PROBE3(off_intf, queue_t *, q, mblk_t *, mp, arl_t *, arl);
- arl->arl_flags |= ARL_F_NOARP;
- return (0);
-}
-
-/*
- * The queue 'q' is closing. Walk all the arl's and free any message
- * pending in the arl_queue if it originated from the closing q.
- * Also cleanup the ip_pending_queue, if the arp-IP stream is closing.
- */
-static void
-ar_ll_cleanup_arl_queue(queue_t *q)
-{
- arl_t *arl;
- mblk_t *mp;
- mblk_t *mpnext;
- mblk_t *prev;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
- ip_stack_t *ipst = as->as_netstack->netstack_ip;
-
- for (arl = as->as_arl_head; arl != NULL; arl = arl->arl_next) {
- for (prev = NULL, mp = arl->arl_queue; mp != NULL;
- mp = mpnext) {
- mpnext = mp->b_next;
- if ((void *)mp->b_queue == (void *)q ||
- (void *)mp->b_queue == (void *)OTHERQ(q)) {
- if (prev == NULL)
- arl->arl_queue = mp->b_next;
- else
- prev->b_next = mp->b_next;
- if (arl->arl_queue_tail == mp)
- arl->arl_queue_tail = prev;
- if (DB_TYPE(mp) == M_PROTO &&
- *(uint32_t *)mp->b_rptr == AR_ENTRY_QUERY) {
- BUMP_IRE_STATS(ipst->ips_ire_stats_v4,
- ire_stats_freed);
- }
- inet_freemsg(mp);
- } else {
- prev = mp;
- }
- }
- }
-}
-
-/*
- * Look up a lower level tap by name.
- */
-static arl_t *
-ar_ll_lookup_by_name(arp_stack_t *as, const char *name)
-{
- arl_t *arl;
-
- for (arl = as->as_arl_head; arl; arl = arl->arl_next) {
- if (strcmp(arl->arl_name, name) == 0) {
- return (arl);
- }
- }
- return (NULL);
-}
-
-/*
- * Look up a lower level tap using parameters extracted from the common
- * portion of the ARP command.
- */
-static arl_t *
-ar_ll_lookup_from_mp(arp_stack_t *as, mblk_t *mp)
-{
- arc_t *arc = (arc_t *)mp->b_rptr;
- uint8_t *name;
- size_t namelen = arc->arc_name_length;
-
- name = mi_offset_param(mp, arc->arc_name_offset, namelen);
- if (name == NULL || name[namelen - 1] != '\0')
- return (NULL);
- return (ar_ll_lookup_by_name(as, (char *)name));
-}
-
-static void
-ar_ll_init(arp_stack_t *as, ar_t *ar, mblk_t *mp)
-{
- arl_t *arl;
- dl_info_ack_t *dlia = (dl_info_ack_t *)mp->b_rptr;
-
- ASSERT(ar->ar_arl == NULL);
-
- if ((arl = (arl_t *)mi_zalloc(sizeof (arl_t))) == NULL)
- return;
-
- if (dlia->dl_mac_type == SUNW_DL_IPMP) {
- arl->arl_flags |= ARL_F_IPMP;
- arl->arl_ipmp_arl = arl;
- }
-
- arl->arl_provider_style = dlia->dl_provider_style;
- arl->arl_rq = ar->ar_rq;
- arl->arl_wq = ar->ar_wq;
-
- arl->arl_dlpi_pending = DL_PRIM_INVAL;
-
- ar->ar_arl = arl;
-
- /*
- * If/when ARP gets pushed into the IP module then this code to make
- * a number uniquely identify an ARP instance can be removed and the
- * ifindex from IP used. Rather than try and reinvent or copy the
- * code used by IP for the purpose of allocating an index number
- * (and trying to keep the number small), just allocate it in an
- * ever increasing manner. This index number isn't ever exposed to
- * users directly, its only use is for providing the pfhooks interface
- * with a number it can use to uniquely identify an interface in time.
- *
- * Using a 32bit counter, over 136 plumbs would need to be done every
- * second of every day (non-leap year) for it to wrap around and the
- * for() loop below to kick in as a performance concern.
- */
- if (as->as_arp_counter_wrapped) {
- arl_t *arl1;
-
- do {
- for (arl1 = as->as_arl_head; arl1 != NULL;
- arl1 = arl1->arl_next)
- if (arl1->arl_index ==
- as->as_arp_index_counter) {
- as->as_arp_index_counter++;
- if (as->as_arp_index_counter == 0) {
- as->as_arp_counter_wrapped++;
- as->as_arp_index_counter = 1;
- }
- break;
- }
- } while (arl1 != NULL);
- } else {
- arl->arl_index = as->as_arp_index_counter;
- }
- as->as_arp_index_counter++;
- if (as->as_arp_index_counter == 0) {
- as->as_arp_counter_wrapped++;
- as->as_arp_index_counter = 1;
- }
-}
-
-/*
- * This routine is called during module initialization when the DL_INFO_ACK
- * comes back from the device. We set up defaults for all the device dependent
- * doo-dads we are going to need. This will leave us ready to roll if we are
- * attempting auto-configuration. Alternatively, these defaults can be
- * overridden by initialization procedures possessing higher intelligence.
- */
-static void
-ar_ll_set_defaults(arl_t *arl, mblk_t *mp)
-{
- ar_m_t *arm;
- dl_info_ack_t *dlia = (dl_info_ack_t *)mp->b_rptr;
- dl_unitdata_req_t *dlur;
- uchar_t *up;
- arlphy_t *ap;
-
- ASSERT(arl != NULL);
-
- /*
- * Clear any stale defaults that might exist.
- */
- ar_ll_clear_defaults(arl);
-
- if (arl->arl_flags & ARL_F_IPMP) {
- /*
- * If this is an IPMP arl_t, we have nothing to do,
- * since we will never transmit or receive.
- */
- return;
- }
-
- ap = kmem_zalloc(sizeof (arlphy_t), KM_NOSLEEP);
- if (ap == NULL)
- goto bad;
- arl->arl_phy = ap;
-
- if ((arm = ar_m_lookup(dlia->dl_mac_type)) == NULL)
- arm = ar_m_lookup(DL_OTHER);
- ASSERT(arm != NULL);
-
- /*
- * We initialize based on parameters in the (currently) not too
- * exhaustive ar_m_tbl.
- */
- if (dlia->dl_version == DL_VERSION_2) {
- /* XXX DLPI spec allows dl_sap_length of 0 before binding. */
- ap->ap_saplen = dlia->dl_sap_length;
- ap->ap_hw_addrlen = dlia->dl_brdcst_addr_length;
- } else {
- ap->ap_saplen = arm->ar_mac_sap_length;
- ap->ap_hw_addrlen = arm->ar_mac_hw_addr_length;
- }
- ap->ap_arp_hw_type = arm->ar_mac_arp_hw_type;
-
- /*
- * Allocate the hardware and ARP addresses; note that the hardware
- * address cannot be filled in until we see the DL_BIND_ACK.
- */
- ap->ap_hw_addr = kmem_zalloc(ap->ap_hw_addrlen, KM_NOSLEEP);
- ap->ap_arp_addr = kmem_alloc(ap->ap_hw_addrlen, KM_NOSLEEP);
- if (ap->ap_hw_addr == NULL || ap->ap_arp_addr == NULL)
- goto bad;
-
- if (dlia->dl_version == DL_VERSION_2) {
- if ((up = mi_offset_param(mp, dlia->dl_brdcst_addr_offset,
- ap->ap_hw_addrlen)) == NULL)
- goto bad;
- bcopy(up, ap->ap_arp_addr, ap->ap_hw_addrlen);
- } else {
- /*
- * No choice but to assume a broadcast address of all ones,
- * known to work on some popular networks.
- */
- (void) memset(ap->ap_arp_addr, ~0, ap->ap_hw_addrlen);
- }
-
- /*
- * Make us a template DL_UNITDATA_REQ message which we will use for
- * broadcasting resolution requests, and which we will clone to hand
- * back as responses to the protocols.
- */
- ap->ap_xmit_mp = ar_dlpi_comm(DL_UNITDATA_REQ, ap->ap_hw_addrlen +
- ABS(ap->ap_saplen) + sizeof (dl_unitdata_req_t));
- if (ap->ap_xmit_mp == NULL)
- goto bad;
-
- dlur = (dl_unitdata_req_t *)ap->ap_xmit_mp->b_rptr;
- dlur->dl_priority.dl_min = 0;
- dlur->dl_priority.dl_max = 0;
- dlur->dl_dest_addr_length = ap->ap_hw_addrlen + ABS(ap->ap_saplen);
- dlur->dl_dest_addr_offset = sizeof (dl_unitdata_req_t);
-
- /* NOTE: the destination address and sap offsets are permanently set */
- ap->ap_xmit_sapoff = dlur->dl_dest_addr_offset;
- ap->ap_xmit_addroff = dlur->dl_dest_addr_offset;
- if (ap->ap_saplen < 0)
- ap->ap_xmit_sapoff += ap->ap_hw_addrlen; /* sap last */
- else
- ap->ap_xmit_addroff += ap->ap_saplen; /* addr last */
-
- *(uint16_t *)((caddr_t)dlur + ap->ap_xmit_sapoff) = ETHERTYPE_ARP;
- return;
-bad:
- ar_ll_clear_defaults(arl);
-}
-
-static void
-ar_ll_clear_defaults(arl_t *arl)
-{
- arlphy_t *ap = arl->arl_phy;
-
- if (ap != NULL) {
- arl->arl_phy = NULL;
- if (ap->ap_hw_addr != NULL)
- kmem_free(ap->ap_hw_addr, ap->ap_hw_addrlen);
- if (ap->ap_arp_addr != NULL)
- kmem_free(ap->ap_arp_addr, ap->ap_hw_addrlen);
- freemsg(ap->ap_xmit_mp);
- kmem_free(ap, sizeof (arlphy_t));
- }
-}
-
-static void
-ar_ll_down(arl_t *arl)
-{
- mblk_t *mp;
- ar_t *ar;
-
- ASSERT(arl->arl_state == ARL_S_UP);
-
- /* Let's break the association between an ARL and IP instance */
- ar = (ar_t *)arl->arl_rq->q_ptr;
- if (ar->ar_arl_ip_assoc != NULL) {
- ASSERT(ar->ar_arl_ip_assoc->ar_arl_ip_assoc != NULL &&
- ar->ar_arl_ip_assoc->ar_arl_ip_assoc == ar);
- ar->ar_arl_ip_assoc->ar_arl_ip_assoc = NULL;
- ar->ar_arl_ip_assoc = NULL;
- }
-
- arl->arl_state = ARL_S_PENDING;
-
- mp = arl->arl_unbind_mp;
- ASSERT(mp != NULL);
- ar_dlpi_send(arl, mp);
- arl->arl_unbind_mp = NULL;
-
- if (arl->arl_provider_style == DL_STYLE2) {
- mp = arl->arl_detach_mp;
- ASSERT(mp != NULL);
- ar_dlpi_send(arl, mp);
- arl->arl_detach_mp = NULL;
- }
-}
-
-static int
-ar_ll_up(arl_t *arl)
-{
- mblk_t *attach_mp = NULL;
- mblk_t *bind_mp = NULL;
- mblk_t *detach_mp = NULL;
- mblk_t *unbind_mp = NULL;
- mblk_t *info_mp = NULL;
- mblk_t *notify_mp = NULL;
-
- ASSERT(arl->arl_state == ARL_S_DOWN);
-
- if (arl->arl_provider_style == DL_STYLE2) {
- attach_mp =
- ar_dlpi_comm(DL_ATTACH_REQ, sizeof (dl_attach_req_t));
- if (attach_mp == NULL)
- goto bad;
- ((dl_attach_req_t *)attach_mp->b_rptr)->dl_ppa =
- arl->arl_ppa;
-
- detach_mp =
- ar_dlpi_comm(DL_DETACH_REQ, sizeof (dl_detach_req_t));
- if (detach_mp == NULL)
- goto bad;
- }
-
- info_mp = ar_dlpi_comm(DL_INFO_REQ, sizeof (dl_info_req_t));
- if (info_mp == NULL)
- goto bad;
-
- /* Allocate and initialize a bind message. */
- bind_mp = ar_dlpi_comm(DL_BIND_REQ, sizeof (dl_bind_req_t));
- if (bind_mp == NULL)
- goto bad;
- ((dl_bind_req_t *)bind_mp->b_rptr)->dl_sap = ETHERTYPE_ARP;
- ((dl_bind_req_t *)bind_mp->b_rptr)->dl_service_mode = DL_CLDLS;
-
- unbind_mp = ar_dlpi_comm(DL_UNBIND_REQ, sizeof (dl_unbind_req_t));
- if (unbind_mp == NULL)
- goto bad;
-
- notify_mp = ar_dlpi_comm(DL_NOTIFY_REQ, sizeof (dl_notify_req_t));
- if (notify_mp == NULL)
- goto bad;
- ((dl_notify_req_t *)notify_mp->b_rptr)->dl_notifications =
- DL_NOTE_LINK_UP | DL_NOTE_LINK_DOWN | DL_NOTE_REPLUMB;
-
- arl->arl_state = ARL_S_PENDING;
- if (arl->arl_provider_style == DL_STYLE2) {
- ar_dlpi_send(arl, attach_mp);
- ASSERT(detach_mp != NULL);
- arl->arl_detach_mp = detach_mp;
- }
- ar_dlpi_send(arl, info_mp);
- ar_dlpi_send(arl, bind_mp);
- arl->arl_unbind_mp = unbind_mp;
- ar_dlpi_send(arl, notify_mp);
- return (0);
-
-bad:
- freemsg(attach_mp);
- freemsg(bind_mp);
- freemsg(detach_mp);
- freemsg(unbind_mp);
- freemsg(info_mp);
- freemsg(notify_mp);
- return (ENOMEM);
-}
-
-/* Process mapping add requests from external messages. */
-static int
-ar_mapping_add(queue_t *q, mblk_t *mp_orig)
-{
- arma_t *arma;
- mblk_t *mp = mp_orig;
- ace_t *ace;
- uchar_t *hw_addr;
- uint32_t hw_addr_len;
- uchar_t *proto_addr;
- uint32_t proto_addr_len;
- uchar_t *proto_mask;
- uchar_t *proto_extract_mask;
- uint32_t hw_extract_start;
- arl_t *arl;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- /* We handle both M_IOCTL and M_PROTO messages. */
- if (DB_TYPE(mp) == M_IOCTL)
- mp = mp->b_cont;
- arl = ar_ll_lookup_from_mp(as, mp);
- if (arl == NULL)
- return (EINVAL);
- /*
- * Newly received commands from clients go to the tail of the queue.
- */
- if (CMD_NEEDS_QUEUEING(mp_orig, arl)) {
- DTRACE_PROBE3(madd_enqueued, queue_t *, q, mblk_t *, mp_orig,
- arl_t *, arl);
- ar_cmd_enqueue(arl, mp_orig, q, AR_MAPPING_ADD, B_TRUE);
- return (EINPROGRESS);
- }
- mp_orig->b_prev = NULL;
-
- arma = (arma_t *)mp->b_rptr;
- ace = ar_ce_lookup_from_area(as, mp, ar_ce_lookup_mapping);
- if (ace != NULL)
- ar_ce_delete(ace);
- hw_addr_len = arma->arma_hw_addr_length;
- hw_addr = mi_offset_paramc(mp, arma->arma_hw_addr_offset, hw_addr_len);
- proto_addr_len = arma->arma_proto_addr_length;
- proto_addr = mi_offset_paramc(mp, arma->arma_proto_addr_offset,
- proto_addr_len);
- proto_mask = mi_offset_paramc(mp, arma->arma_proto_mask_offset,
- proto_addr_len);
- proto_extract_mask = mi_offset_paramc(mp,
- arma->arma_proto_extract_mask_offset, proto_addr_len);
- hw_extract_start = arma->arma_hw_mapping_start;
- if (proto_mask == NULL || proto_extract_mask == NULL) {
- DTRACE_PROBE2(madd_illegal_mask, arl_t *, arl, arpa_t *, arma);
- return (EINVAL);
- }
- return (ar_ce_create(
- arl,
- arma->arma_proto,
- hw_addr,
- hw_addr_len,
- proto_addr,
- proto_addr_len,
- proto_mask,
- proto_extract_mask,
- hw_extract_start,
- NULL,
- arma->arma_flags | ACE_F_MAPPING));
-}
-
-static boolean_t
-ar_mask_all_ones(uchar_t *mask, uint32_t mask_len)
-{
- if (mask == NULL)
- return (B_TRUE);
-
- while (mask_len-- > 0) {
- if (*mask++ != 0xFF) {
- return (B_FALSE);
- }
- }
- return (B_TRUE);
-}
-
-/* Find an entry for a particular MAC type in the ar_m_tbl. */
-static ar_m_t *
-ar_m_lookup(t_uscalar_t mac_type)
-{
- ar_m_t *arm;
-
- for (arm = ar_m_tbl; arm < A_END(ar_m_tbl); arm++) {
- if (arm->ar_mac_type == mac_type)
- return (arm);
- }
- return (NULL);
-}
-
-/* Respond to Named Dispatch requests. */
-static int
-ar_nd_ioctl(queue_t *q, mblk_t *mp)
-{
- ar_t *ar = (ar_t *)q->q_ptr;
- arp_stack_t *as = ar->ar_as;
-
- if (DB_TYPE(mp) == M_IOCTL && nd_getset(q, as->as_nd, mp))
- return (0);
- return (ENOENT);
-}
-
-/* ARP module open routine. */
-static int
-ar_open(queue_t *q, dev_t *devp, int flag, int sflag, cred_t *credp)
-{
- ar_t *ar;
- int err;
- queue_t *tmp_q;
- mblk_t *mp;
- netstack_t *ns;
- arp_stack_t *as;
-
- TRACE_1(TR_FAC_ARP, TR_ARP_OPEN,
- "arp_open: q %p", q);
- /* Allow a reopen. */
- if (q->q_ptr != NULL) {
- return (0);
- }
-
- ns = netstack_find_by_cred(credp);
- ASSERT(ns != NULL);
- as = ns->netstack_arp;
- ASSERT(as != NULL);
-
- /* mi_open_comm allocates the instance data structure, etc. */
- err = mi_open_comm(&as->as_head, sizeof (ar_t), q, devp, flag, sflag,
- credp);
- if (err) {
- netstack_rele(as->as_netstack);
- return (err);
- }
-
- /*
- * We are D_MTPERMOD so it is safe to do qprocson before
- * the instance data has been initialized.
- */
- qprocson(q);
-
- ar = (ar_t *)q->q_ptr;
- ar->ar_rq = q;
- q = WR(q);
- ar->ar_wq = q;
- crhold(credp);
- ar->ar_credp = credp;
- ar->ar_as = as;
-
- /*
- * Probe for the DLPI info if we are not pushed on IP or UDP. Wait for
- * the reply. In case of error call ar_close() which will take
- * care of doing everything required to close this instance, such
- * as freeing the arl, restarting the timer on a different queue etc.
- */
- if (strcmp(q->q_next->q_qinfo->qi_minfo->mi_idname, "ip") == 0 ||
- strcmp(q->q_next->q_qinfo->qi_minfo->mi_idname, "udp") == 0) {
- arc_t *arc;
-
- /*
- * We are pushed directly on top of IP or UDP. There is no need
- * to send down a DL_INFO_REQ. Return success. This could
- * either be an ill stream (i.e. <arp-IP-Driver> stream)
- * or a stream corresponding to an open of /dev/arp
- * (i.e. <arp-IP> stream). Note that we don't support
- * pushing some module in between arp and IP.
- *
- * Tell IP, though, that we're an extended implementation, so
- * it knows to expect a DAD response after bringing an
- * interface up. Old ATM drivers won't do this, and IP will
- * just bring the interface up immediately.
- */
- ar->ar_on_ill_stream = (q->q_next->q_next != NULL);
- if (!ar->ar_on_ill_stream || arp_no_defense)
- return (0);
- mp = allocb(sizeof (arc_t), BPRI_MED);
- if (mp == NULL) {
- (void) ar_close(RD(q));
- return (ENOMEM);
- }
- DB_TYPE(mp) = M_CTL;
- arc = (arc_t *)mp->b_rptr;
- mp->b_wptr = mp->b_rptr + sizeof (arc_t);
- arc->arc_cmd = AR_ARP_EXTEND;
- putnext(q, mp);
- return (0);
- }
- tmp_q = q;
- /* Get the driver's queue */
- while (tmp_q->q_next != NULL)
- tmp_q = tmp_q->q_next;
-
- ASSERT(tmp_q->q_qinfo->qi_minfo != NULL);
-
- if (strcmp(tmp_q->q_qinfo->qi_minfo->mi_idname, "ip") == 0 ||
- strcmp(tmp_q->q_qinfo->qi_minfo->mi_idname, "udp") == 0) {
- /*
- * We don't support pushing ARP arbitrarily on an IP or UDP
- * driver stream. ARP has to be pushed directly above IP or
- * UDP.
- */
- (void) ar_close(RD(q));
- return (ENOTSUP);
- } else {
- /*
- * Send down a DL_INFO_REQ so we can find out what we are
- * talking to.
- */
- mp = ar_dlpi_comm(DL_INFO_REQ, sizeof (dl_info_req_t));
- if (mp == NULL) {
- (void) ar_close(RD(q));
- return (ENOMEM);
- }
- putnext(ar->ar_wq, mp);
- while (ar->ar_arl == NULL) {
- if (!qwait_sig(ar->ar_rq)) {
- (void) ar_close(RD(q));
- return (EINTR);
- }
- }
- }
- return (0);
-}
-
-/* Get current value of Named Dispatch item. */
-/* ARGSUSED */
-static int
-ar_param_get(queue_t *q, mblk_t *mp, caddr_t cp, cred_t *cr)
-{
- arpparam_t *arppa = (arpparam_t *)cp;
-
- (void) mi_mpprintf(mp, "%d", arppa->arp_param_value);
- return (0);
-}
-
-/*
- * Walk through the param array specified registering each element with the
- * named dispatch handler.
- */
-static boolean_t
-ar_param_register(IDP *ndp, arpparam_t *arppa, int cnt)
-{
- for (; cnt-- > 0; arppa++) {
- if (arppa->arp_param_name && arppa->arp_param_name[0]) {
- if (!nd_load(ndp, arppa->arp_param_name,
- ar_param_get, ar_param_set,
- (caddr_t)arppa)) {
- nd_free(ndp);
- return (B_FALSE);
- }
- }
- }
- return (B_TRUE);
-}
-
-/* Set new value of Named Dispatch item. */
-/* ARGSUSED */
-static int
-ar_param_set(queue_t *q, mblk_t *mp, char *value, caddr_t cp, cred_t *cr)
-{
- long new_value;
- arpparam_t *arppa = (arpparam_t *)cp;
-
- if (ddi_strtol(value, NULL, 10, &new_value) != 0 ||
- new_value < arppa->arp_param_min ||
- new_value > arppa->arp_param_max) {
- return (EINVAL);
- }
- arppa->arp_param_value = new_value;
- return (0);
-}
-
-/*
- * Process an I_PLINK ioctl. If the lower stream is an arp device stream,
- * append another mblk to the chain, that will carry the device name,
- * and the muxid. IP uses this info to lookup the corresponding ill, and
- * set the ill_arp_muxid atomically, as part of the I_PLINK, instead of
- * waiting for the SIOCSLIFMUXID (which may never happen if ifconfig is
- * killed, with the bad effect of not being able to unplumb
- * subsequently).
- */
-static int
-ar_plink_send(queue_t *q, mblk_t *mp)
-{
- char *name;
- mblk_t *muxmp;
- mblk_t *mp1;
- ar_t *ar = (ar_t *)q->q_ptr;
- arp_stack_t *as = ar->ar_as;
- struct linkblk *li;
- struct ipmx_s *ipmxp;
- queue_t *arpwq;
-
- mp1 = mp->b_cont;
- ASSERT((mp1 != NULL) && (mp1->b_cont == NULL));
- li = (struct linkblk *)mp1->b_rptr;
- arpwq = li->l_qbot;
-
- /*
- * Allocate a new mblk which will hold an ipmx_s and chain it to
- * the M_IOCTL chain. The final chain will consist of 3 mblks,
- * namely the M_IOCTL, followed by the linkblk, followed by the ipmx_s
- */
- muxmp = allocb(sizeof (struct ipmx_s), BPRI_MED);
- if (muxmp == NULL)
- return (ENOMEM);
- ipmxp = (struct ipmx_s *)muxmp->b_wptr;
- ipmxp->ipmx_arpdev_stream = 0;
- muxmp->b_wptr += sizeof (struct ipmx_s);
- mp1->b_cont = muxmp;
-
- /*
- * The l_qbot represents the uppermost write queue of the
- * lower stream. Walk down this stream till we hit ARP.
- * We can safely walk, since STREAMS has made sure the stream
- * cannot close till the IOCACK goes up, and is not interruptible.
- */
- while (arpwq != NULL) {
- /*
- * Beware of broken modules like logsubr.c that
- * may not have a q_qinfo or qi_minfo.
- */
- if ((q->q_qinfo != NULL) && (q->q_qinfo->qi_minfo != NULL)) {
- name = arpwq->q_qinfo->qi_minfo->mi_idname;
- if (name != NULL && name[0] != NULL &&
- (strcmp(name, arp_mod_info.mi_idname) == 0))
- break;
- }
- arpwq = arpwq->q_next;
- }
-
- /*
- * Check if arpwq corresponds to an arp device stream, by walking
- * the mi list. If it does, then add the muxid and device name info
- * for use by IP. IP will send the M_IOCACK.
- */
- if (arpwq != NULL) {
- for (ar = (ar_t *)mi_first_ptr(&as->as_head); ar != NULL;
- ar = (ar_t *)mi_next_ptr(&as->as_head, (void *)ar)) {
- if ((ar->ar_wq == arpwq) && (ar->ar_arl != NULL)) {
- ipmxp->ipmx_arpdev_stream = 1;
- (void) strcpy((char *)ipmxp->ipmx_name,
- ar->ar_arl->arl_name);
- break;
- }
- }
- }
-
- putnext(q, mp);
- return (0);
-}
-
-/*
- * ar_ce_walk routine to delete any outstanding queries for an ar that is
- * going away.
- */
-static void
-ar_query_delete(ace_t *ace, void *arg)
-{
- ar_t *ar = arg;
- mblk_t **mpp = &ace->ace_query_mp;
- mblk_t *mp;
- arp_stack_t *as = ar->ar_as;
- ip_stack_t *ipst = as->as_netstack->netstack_ip;
-
- while ((mp = *mpp) != NULL) {
- /* The response queue was stored in the query b_prev. */
- if ((queue_t *)mp->b_prev == ar->ar_wq ||
- (queue_t *)mp->b_prev == ar->ar_rq) {
- *mpp = mp->b_next;
- if (DB_TYPE(mp) == M_PROTO &&
- *(uint32_t *)mp->b_rptr == AR_ENTRY_QUERY) {
- BUMP_IRE_STATS(ipst->ips_ire_stats_v4,
- ire_stats_freed);
- }
- inet_freemsg(mp);
- } else {
- mpp = &mp->b_next;
- }
- }
-}
-
-/*
- * This routine is called either when an address resolution has just been
- * found, or when it is time to give up, or in some other error situation.
- * If a non-zero ret_val is provided, any outstanding queries for the
- * specified ace will be completed using that error value. Otherwise,
- * the completion status will depend on whether the address has been
- * resolved.
- */
-static void
-ar_query_reply(ace_t *ace, int ret_val, uchar_t *proto_addr,
- uint32_t proto_addr_len)
-{
- mblk_t *areq_mp;
- mblk_t *mp;
- mblk_t *xmit_mp;
- queue_t *arl_wq = ace->ace_arl->arl_wq;
- arp_stack_t *as = ARL_TO_ARPSTACK(ace->ace_arl);
- ip_stack_t *ipst = as->as_netstack->netstack_ip;
- arlphy_t *ap = ace->ace_xmit_arl->arl_phy;
-
- /*
- * On error or completion for a query, we need to shut down the timer.
- * However, the timer must not be stopped for an interface doing
- * Duplicate Address Detection, or it will never finish that phase.
- */
- if (!(ace->ace_flags & (ACE_F_UNVERIFIED | ACE_F_AUTHORITY)))
- mi_timer(arl_wq, ace->ace_mp, -1L);
-
- /* Establish the appropriate return value. */
- if (ret_val == 0) {
- if (!ACE_RESOLVED(ace) || ap == NULL)
- ret_val = ENXIO;
- }
- /* Terminate all outstanding queries. */
- while ((mp = ace->ace_query_mp) != 0) {
- /* The response queue was saved in b_prev. */
- queue_t *q = (queue_t *)mp->b_prev;
- mp->b_prev = NULL;
- ace->ace_query_mp = mp->b_next;
- mp->b_next = NULL;
- /*
- * If we have the answer, attempt to get a copy of the xmit
- * template to prepare for the client.
- */
- if (ret_val == 0 &&
- (xmit_mp = copyb(ap->ap_xmit_mp)) == NULL) {
- /* Too bad, buy more memory. */
- ret_val = ENOMEM;
- }
- /* Complete the response based on how the request arrived. */
- if (DB_TYPE(mp) == M_IOCTL) {
- struct iocblk *ioc = (struct iocblk *)mp->b_rptr;
-
- ioc->ioc_error = ret_val;
- if (ret_val != 0) {
- DB_TYPE(mp) = M_IOCNAK;
- ioc->ioc_count = 0;
- putnext(q, mp);
- continue;
- }
- /*
- * Return the xmit mp out with the successful IOCTL.
- */
- DB_TYPE(mp) = M_IOCACK;
- ioc->ioc_count = MBLKL(xmit_mp);
- /* Remove the areq mblk from the IOCTL. */
- areq_mp = mp->b_cont;
- mp->b_cont = areq_mp->b_cont;
- } else {
- if (ret_val != 0) {
- /* TODO: find some way to let the guy know? */
- inet_freemsg(mp);
- BUMP_IRE_STATS(ipst->ips_ire_stats_v4,
- ire_stats_freed);
- continue;
- }
- /*
- * In the M_PROTO case, the areq message is followed by
- * a message chain to be returned to the protocol. ARP
- * doesn't know (or care) what is in this chain, but in
- * the event that the reader is pondering the
- * relationship between ARP and IP (for example), the
- * areq is followed by an incipient IRE, and then the
- * original outbound packet. Here we detach the areq.
- */
- areq_mp = mp;
- mp = mp->b_cont;
- }
- ASSERT(ret_val == 0 && ap != NULL);
- if (ap->ap_saplen != 0) {
- /*
- * Copy the SAP type specified in the request into
- * the xmit mp.
- */
- areq_t *areq = (areq_t *)areq_mp->b_rptr;
- bcopy(areq->areq_sap, xmit_mp->b_rptr +
- ap->ap_xmit_sapoff, ABS(ap->ap_saplen));
- }
- /* Done with the areq message. */
- freeb(areq_mp);
- /*
- * Copy the resolved hardware address into the xmit mp
- * or perform the mapping operation.
- */
- ar_set_address(ace, xmit_mp->b_rptr + ap->ap_xmit_addroff,
- proto_addr, proto_addr_len);
- /*
- * Now insert the xmit mp after the response message. In
- * the M_IOCTL case, it will be the returned data block. In
- * the M_PROTO case, (again using IP as an example) it will
- * appear after the IRE and before the outbound packet.
- */
- xmit_mp->b_cont = mp->b_cont;
- mp->b_cont = xmit_mp;
- putnext(q, mp);
- }
-
- /*
- * Unless we are responding from a permanent cache entry, start the
- * cleanup timer or (on error) delete the entry.
- */
- if (!(ace->ace_flags & (ACE_F_PERMANENT | ACE_F_DYING))) {
- if (!ACE_RESOLVED(ace) || ap == NULL) {
- /*
- * No need to notify IP here, because the entry was
- * never resolved, so IP can't have any cached copies
- * of the address.
- */
- ar_ce_delete(ace);
- } else {
- mi_timer(arl_wq, ace->ace_mp, as->as_cleanup_interval);
- }
- }
-}
-
-/*
- * Returns number of milliseconds after which we should either rexmit or abort.
- * Return of zero means we should abort.
- */
-static clock_t
-ar_query_xmit(arp_stack_t *as, ace_t *ace)
-{
- areq_t *areq;
- mblk_t *mp;
- uchar_t *proto_addr;
- uchar_t *sender_addr;
- ace_t *src_ace;
- arl_t *xmit_arl = ace->ace_xmit_arl;
-
- mp = ace->ace_query_mp;
- /*
- * ar_query_delete may have just blown off the outstanding
- * ace_query_mp entries because the client who sent the query
- * went away. If this happens just before the ace_mp timer
- * goes off, we'd find a null ace_query_mp which is not an error.
- * The unresolved ace itself, and the timer, will be removed
- * when the arl stream goes away.
- */
- if (!mp)
- return (0);
- if (DB_TYPE(mp) == M_IOCTL)
- mp = mp->b_cont;
- areq = (areq_t *)mp->b_rptr;
- if (areq->areq_xmit_count == 0)
- return (0);
- areq->areq_xmit_count--;
- proto_addr = mi_offset_paramc(mp, areq->areq_target_addr_offset,
- areq->areq_target_addr_length);
- sender_addr = mi_offset_paramc(mp, areq->areq_sender_addr_offset,
- areq->areq_sender_addr_length);
-
- /*
- * Get the ace for the sender address, so that we can verify that
- * we have one and that DAD has completed.
- */
- src_ace = ar_ce_lookup(xmit_arl, areq->areq_proto, sender_addr,
- areq->areq_sender_addr_length);
- if (src_ace == NULL) {
- DTRACE_PROBE3(xmit_no_source, ace_t *, ace, areq_t *, areq,
- uchar_t *, sender_addr);
- return (0);
- }
-
- /*
- * If we haven't yet finished duplicate address checking on this source
- * address, then do *not* use it on the wire. Doing so will corrupt
- * the world's caches. Just allow the timer to restart. Note that
- * duplicate address checking will eventually complete one way or the
- * other, so this cannot go on "forever."
- */
- if (src_ace->ace_flags & ACE_F_UNVERIFIED) {
- DTRACE_PROBE2(xmit_source_unverified, ace_t *, ace,
- ace_t *, src_ace);
- areq->areq_xmit_count++;
- return (areq->areq_xmit_interval);
- }
-
- DTRACE_PROBE3(xmit_send, ace_t *, ace, ace_t *, src_ace,
- areq_t *, areq);
-
- ar_xmit(xmit_arl, ARP_REQUEST, areq->areq_proto,
- areq->areq_sender_addr_length, xmit_arl->arl_phy->ap_hw_addr,
- sender_addr, xmit_arl->arl_phy->ap_arp_addr, proto_addr, NULL, as);
- src_ace->ace_last_bcast = ddi_get_lbolt();
- return (areq->areq_xmit_interval);
-}
-
-/* Our read side put procedure. */
-static void
-ar_rput(queue_t *q, mblk_t *mp)
-{
- arh_t *arh;
- arl_t *arl;
- arl_t *client_arl;
- ace_t *dst_ace;
- uchar_t *dst_paddr;
- int err;
- uint32_t hlen;
- struct iocblk *ioc;
- mblk_t *mp1;
- int op;
- uint32_t plen;
- uint32_t proto;
- uchar_t *src_haddr;
- uchar_t *src_paddr;
- uchar_t *dst_haddr;
- boolean_t is_probe;
- boolean_t is_unicast = B_FALSE;
- dl_unitdata_ind_t *dlindp;
- int i;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- TRACE_1(TR_FAC_ARP, TR_ARP_RPUT_START,
- "arp_rput_start: q %p", q);
-
- /*
- * We handle ARP commands from below both in M_IOCTL and M_PROTO
- * messages. Actual ARP requests and responses will show up as
- * M_PROTO messages containing DL_UNITDATA_IND blocks.
- */
- switch (DB_TYPE(mp)) {
- case M_IOCTL:
- err = ar_cmd_dispatch(q, mp, B_FALSE);
- switch (err) {
- case ENOENT:
- DB_TYPE(mp) = M_IOCNAK;
- if ((mp1 = mp->b_cont) != 0) {
- /*
- * Collapse the data as a note to the
- * originator.
- */
- mp1->b_wptr = mp1->b_rptr;
- }
- break;
- case EINPROGRESS:
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "ioctl/inprogress");
- return;
- default:
- DB_TYPE(mp) = M_IOCACK;
- break;
- }
- ioc = (struct iocblk *)mp->b_rptr;
- ioc->ioc_error = err;
- if ((mp1 = mp->b_cont) != 0)
- ioc->ioc_count = MBLKL(mp1);
- else
- ioc->ioc_count = 0;
- qreply(q, mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "ioctl");
- return;
- case M_CTL:
- /*
- * IP is acking the AR_ARP_CLOSING message that we sent
- * in ar_close.
- */
- if (MBLKL(mp) == sizeof (arc_t)) {
- if (((arc_t *)mp->b_rptr)->arc_cmd == AR_ARP_CLOSING)
- ((ar_t *)q->q_ptr)->ar_ip_acked_close = 1;
- }
- freemsg(mp);
- return;
- case M_PCPROTO:
- case M_PROTO:
- dlindp = (dl_unitdata_ind_t *)mp->b_rptr;
- if (MBLKL(mp) >= sizeof (dl_unitdata_ind_t) &&
- dlindp->dl_primitive == DL_UNITDATA_IND) {
- is_unicast = (dlindp->dl_group_address == 0);
- arl = ((ar_t *)q->q_ptr)->ar_arl;
- if (arl != NULL && arl->arl_phy != NULL) {
- /* Real messages from the wire! */
- break;
- }
- putnext(q, mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "default");
- return;
- }
- err = ar_cmd_dispatch(q, mp, B_FALSE);
- switch (err) {
- case ENOENT:
- /* Miscellaneous DLPI messages get shuffled off. */
- ar_rput_dlpi(q, mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "proto/dlpi");
- break;
- case EINPROGRESS:
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "proto");
- break;
- default:
- inet_freemsg(mp);
- break;
- }
- return;
- default:
- putnext(q, mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "default");
- return;
- }
- /*
- * If the IFF_NOARP flag is on, then do not process any
- * incoming ARP_REQUEST or incoming ARP_RESPONSE.
- */
- if (arl->arl_flags & ARL_F_NOARP) {
- freemsg(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "interface has IFF_NOARP set");
- return;
- }
-
- /*
- * What we should have at this point is a DL_UNITDATA_IND message
- * followed by an ARP packet. We do some initial checks and then
- * get to work.
- */
- mp1 = mp->b_cont;
- if (mp1 == NULL) {
- freemsg(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "baddlpi");
- return;
- }
- if (mp1->b_cont != NULL) {
- /* No fooling around with funny messages. */
- if (!pullupmsg(mp1, -1)) {
- freemsg(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "pullupmsgfail");
- return;
- }
- }
- arh = (arh_t *)mp1->b_rptr;
- hlen = arh->arh_hlen;
- plen = arh->arh_plen;
- if (MBLKL(mp1) < ARH_FIXED_LEN + 2 * hlen + 2 * plen) {
- freemsg(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "short");
- return;
- }
- /*
- * hlen 0 is used for RFC 1868 UnARP.
- *
- * Note that the rest of the code checks that hlen is what we expect
- * for this hardware address type, so might as well discard packets
- * here that don't match.
- */
- if ((hlen > 0 && hlen != arl->arl_phy->ap_hw_addrlen) || plen == 0) {
- DTRACE_PROBE2(rput_bogus, arl_t *, arl, mblk_t *, mp1);
- freemsg(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "hlenzero/plenzero");
- return;
- }
- /*
- * Historically, Solaris has been lenient about hardware type numbers.
- * We should check here, but don't.
- */
- DTRACE_PROBE2(rput_normal, arl_t *, arl, arh_t *, arh);
-
- DTRACE_PROBE3(arp__physical__in__start,
- arl_t *, arl, arh_t *, arh, mblk_t *, mp);
-
- ARP_HOOK_IN(as->as_arp_physical_in_event, as->as_arp_physical_in,
- arl->arl_index, arh, mp, mp1, as);
-
- DTRACE_PROBE1(arp__physical__in__end, mblk_t *, mp);
-
- if (mp == NULL)
- return;
-
- proto = (uint32_t)BE16_TO_U16(arh->arh_proto);
- src_haddr = (uchar_t *)arh;
- src_haddr = &src_haddr[ARH_FIXED_LEN];
- src_paddr = &src_haddr[hlen];
- dst_haddr = &src_haddr[hlen + plen];
- dst_paddr = &src_haddr[hlen + plen + hlen];
- op = BE16_TO_U16(arh->arh_operation);
-
- /* Determine if this is just a probe */
- for (i = 0; i < plen; i++)
- if (src_paddr[i] != 0)
- break;
- is_probe = i >= plen;
-
- /*
- * RFC 826: first check if the <protocol, sender protocol address> is
- * in the cache, if there is a sender protocol address. Note that this
- * step also handles resolutions based on source.
- *
- * Note that IP expects that each notification it receives will be
- * tied to the ill it received it on. Thus, we must talk to it over
- * the arl tied to the resolved IP address (if any), hence client_arl.
- */
- if (is_probe)
- err = AR_NOTFOUND;
- else
- err = ar_ce_resolve_all(arl, proto, src_haddr, hlen, src_paddr,
- plen, &client_arl);
-
- switch (err) {
- case AR_BOGON:
- ar_client_notify(client_arl, mp1, AR_CN_BOGON);
- mp1 = NULL;
- break;
- case AR_FAILED:
- ar_client_notify(client_arl, mp1, AR_CN_FAILED);
- mp1 = NULL;
- break;
- case AR_LOOPBACK:
- DTRACE_PROBE2(rput_loopback, arl_t *, arl, arh_t *, arh);
- freemsg(mp1);
- mp1 = NULL;
- break;
- }
- if (mp1 == NULL) {
- freeb(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "unneeded");
- return;
- }
-
- /*
- * Now look up the destination address. By RFC 826, we ignore the
- * packet at this step if the target isn't one of our addresses. This
- * is true even if the target is something we're trying to resolve and
- * the packet is a response. To avoid duplicate responses, we also
- * ignore the packet if it was multicast/broadcast to an arl that's in
- * an IPMP group but was not the designated xmit_arl for the ACE.
- *
- * Note that in order to do this correctly, we need to know when to
- * notify IP of a change implied by the source address of the ARP
- * message. That implies that the local ARP table has entries for all
- * of the resolved entries cached in the client. This is why we must
- * notify IP when we delete a resolved entry and we know that IP may
- * have cached answers.
- */
- dst_ace = ar_ce_lookup_entry(arl, proto, dst_paddr, plen);
- if (dst_ace == NULL || !ACE_RESOLVED(dst_ace) ||
- (dst_ace->ace_xmit_arl != arl && !is_unicast) ||
- !(dst_ace->ace_flags & ACE_F_PUBLISH)) {
- /*
- * Let the client know if the source mapping has changed, even
- * if the destination provides no useful information for the
- * client.
- */
- if (err == AR_CHANGED)
- ar_client_notify(client_arl, mp1, AR_CN_ANNOUNCE);
- else
- freemsg(mp1);
- freeb(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "nottarget");
- return;
- }
-
- /*
- * If the target is unverified by DAD, then one of two things is true:
- * either it's someone else claiming this address (on a probe or an
- * announcement) or it's just a regular request. The former is
- * failure, but a regular request is not.
- */
- if (dst_ace->ace_flags & ACE_F_UNVERIFIED) {
- /*
- * Check for a reflection. Some misbehaving bridges will
- * reflect our own transmitted packets back to us.
- */
- if (hlen == dst_ace->ace_hw_addr_length &&
- bcmp(src_haddr, dst_ace->ace_hw_addr, hlen) == 0) {
- DTRACE_PROBE3(rput_probe_reflected, arl_t *, arl,
- arh_t *, arh, ace_t *, dst_ace);
- freeb(mp);
- freemsg(mp1);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "reflection");
- return;
- }
-
- /*
- * Conflicts seen via the wrong interface may be bogus.
- * Multiple interfaces on the same segment imply any conflict
- * will also be seen via the correct interface, so we can ignore
- * anything not matching the arl from the ace.
- */
- if (arl != dst_ace->ace_arl) {
- DTRACE_PROBE3(rput_probe_misdirect, arl_t *, arl,
- arh_t *, arh, ace_t *, dst_ace);
- freeb(mp);
- freemsg(mp1);
- return;
- }
- /*
- * Responses targeting our HW address that are not responses to
- * our DAD probe must be ignored as they are related to requests
- * sent before DAD was restarted. Note: response to our DAD
- * probe will have been handled by ar_ce_resolve_all() above.
- */
- if (op == ARP_RESPONSE &&
- (bcmp(dst_haddr, dst_ace->ace_hw_addr, hlen) == 0)) {
- DTRACE_PROBE3(rput_probe_stale, arl_t *, arl,
- arh_t *, arh, ace_t *, dst_ace);
- freeb(mp);
- freemsg(mp1);
- return;
- }
- /*
- * Responses targeted to HW addresses which are not ours but
- * sent to our unverified proto address are also conflicts.
- * These may be reported by a proxy rather than the interface
- * with the conflicting address, dst_paddr is in conflict
- * rather than src_paddr. To ensure IP can locate the correct
- * ipif to take down, it is necessary to copy dst_paddr to
- * the src_paddr field before sending it to IP. The same is
- * required for probes, where src_paddr will be INADDR_ANY.
- */
- if (is_probe) {
- /*
- * In this case, client_arl will be invalid (e.g.,
- * since probes don't have a valid sender address).
- * But dst_ace has the appropriate arl.
- */
- bcopy(dst_paddr, src_paddr, plen);
- ar_client_notify(dst_ace->ace_arl, mp1, AR_CN_FAILED);
- ar_ce_delete(dst_ace);
- } else if (op == ARP_RESPONSE) {
- bcopy(dst_paddr, src_paddr, plen);
- ar_client_notify(client_arl, mp1, AR_CN_FAILED);
- ar_ce_delete(dst_ace);
- } else if (err == AR_CHANGED) {
- ar_client_notify(client_arl, mp1, AR_CN_ANNOUNCE);
- } else {
- DTRACE_PROBE3(rput_request_unverified, arl_t *, arl,
- arh_t *, arh, ace_t *, dst_ace);
- freemsg(mp1);
- }
- freeb(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "unverified");
- return;
- }
-
- /*
- * If it's a request, then we reply to this, and if we think the
- * sender's unknown, then we create an entry to avoid unnecessary ARPs.
- * The design assumption is that someone ARPing us is likely to send us
- * a packet soon, and that we'll want to reply to it.
- */
- if (op == ARP_REQUEST) {
- const uchar_t *dstaddr = src_haddr;
- clock_t now;
-
- /*
- * This implements periodic address defense based on a modified
- * version of the RFC 3927 requirements. Instead of sending a
- * broadcasted reply every time, as demanded by the RFC, we
- * send at most one broadcast reply per arp_broadcast_interval.
- */
- now = ddi_get_lbolt();
- if ((now - dst_ace->ace_last_bcast) >
- MSEC_TO_TICK(as->as_broadcast_interval)) {
- DTRACE_PROBE3(rput_bcast_reply, arl_t *, arl,
- arh_t *, arh, ace_t *, dst_ace);
- dst_ace->ace_last_bcast = now;
- dstaddr = arl->arl_phy->ap_arp_addr;
- /*
- * If this is one of the long-suffering entries, then
- * pull it out now. It no longer needs separate
- * defense, because we're doing now that with this
- * broadcasted reply.
- */
- dst_ace->ace_flags &= ~ACE_F_DELAYED;
- }
-
- ar_xmit(arl, ARP_RESPONSE, dst_ace->ace_proto, plen,
- dst_ace->ace_hw_addr, dst_ace->ace_proto_addr,
- src_haddr, src_paddr, dstaddr, as);
- if (!is_probe && err == AR_NOTFOUND &&
- ar_ce_create(OWNING_ARL(arl), proto, src_haddr, hlen,
- src_paddr, plen, NULL, NULL, 0, NULL, 0) == 0) {
- ace_t *ace;
-
- ace = ar_ce_lookup(arl, proto, src_paddr, plen);
- ASSERT(ace != NULL);
- mi_timer(ace->ace_arl->arl_wq, ace->ace_mp,
- as->as_cleanup_interval);
- }
- }
- if (err == AR_CHANGED) {
- freeb(mp);
- ar_client_notify(client_arl, mp1, AR_CN_ANNOUNCE);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "reqchange");
- } else {
- freemsg(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_RPUT_END,
- "arp_rput_end: q %p (%S)", q, "end");
- }
-}
-
-static void
-ar_ce_restart_dad(ace_t *ace, void *arl_arg)
-{
- arl_t *arl = arl_arg;
- arp_stack_t *as = ARL_TO_ARPSTACK(arl);
-
- if ((ace->ace_xmit_arl == arl) &&
- (ace->ace_flags & (ACE_F_UNVERIFIED|ACE_F_DAD_ABORTED)) ==
- (ACE_F_UNVERIFIED|ACE_F_DAD_ABORTED)) {
- /*
- * Slight cheat here: we don't use the initial probe delay
- * in this obscure case.
- */
- if (ace->ace_flags & ACE_F_FAST) {
- ace->ace_xmit_count = as->as_fastprobe_count;
- ace->ace_xmit_interval = as->as_fastprobe_interval;
- } else {
- ace->ace_xmit_count = as->as_probe_count;
- ace->ace_xmit_interval = as->as_probe_interval;
- }
- ace->ace_flags &= ~ACE_F_DAD_ABORTED;
- ace_set_timer(ace, B_FALSE);
- }
-}
-
-/* DLPI messages, other than DL_UNITDATA_IND are handled here. */
-static void
-ar_rput_dlpi(queue_t *q, mblk_t *mp)
-{
- ar_t *ar = q->q_ptr;
- arl_t *arl = ar->ar_arl;
- arlphy_t *ap = NULL;
- union DL_primitives *dlp;
- const char *err_str;
- arp_stack_t *as = ar->ar_as;
-
- if (arl != NULL)
- ap = arl->arl_phy;
-
- if (MBLKL(mp) < sizeof (dlp->dl_primitive)) {
- putnext(q, mp);
- return;
- }
- dlp = (union DL_primitives *)mp->b_rptr;
- switch (dlp->dl_primitive) {
- case DL_ERROR_ACK:
- /*
- * ce is confused about how DLPI works, so we have to interpret
- * an "error" on DL_NOTIFY_ACK (which we never could have sent)
- * as really meaning an error on DL_NOTIFY_REQ.
- *
- * Note that supporting DL_NOTIFY_REQ is optional, so printing
- * out an error message on the console isn't warranted except
- * for debug.
- */
- if (dlp->error_ack.dl_error_primitive == DL_NOTIFY_ACK ||
- dlp->error_ack.dl_error_primitive == DL_NOTIFY_REQ) {
- ar_dlpi_done(arl, DL_NOTIFY_REQ);
- freemsg(mp);
- return;
- }
- err_str = dl_primstr(dlp->error_ack.dl_error_primitive);
- DTRACE_PROBE2(rput_dl_error, arl_t *, arl,
- dl_error_ack_t *, &dlp->error_ack);
- switch (dlp->error_ack.dl_error_primitive) {
- case DL_UNBIND_REQ:
- if (arl->arl_provider_style == DL_STYLE1)
- arl->arl_state = ARL_S_DOWN;
- break;
- case DL_DETACH_REQ:
- case DL_BIND_REQ:
- arl->arl_state = ARL_S_DOWN;
- break;
- case DL_ATTACH_REQ:
- break;
- default:
- /* If it's anything else, we didn't send it. */
- putnext(q, mp);
- return;
- }
- ar_dlpi_done(arl, dlp->error_ack.dl_error_primitive);
- (void) mi_strlog(q, 1, SL_ERROR|SL_TRACE,
- "ar_rput_dlpi: %s failed, dl_errno %d, dl_unix_errno %d",
- err_str, dlp->error_ack.dl_errno,
- dlp->error_ack.dl_unix_errno);
- break;
- case DL_INFO_ACK:
- DTRACE_PROBE2(rput_dl_info, arl_t *, arl,
- dl_info_ack_t *, &dlp->info_ack);
- if (arl != NULL && arl->arl_dlpi_pending == DL_INFO_REQ) {
- /*
- * We have a response back from the driver. Go set up
- * transmit defaults.
- */
- ar_ll_set_defaults(arl, mp);
- ar_dlpi_done(arl, DL_INFO_REQ);
- } else if (arl == NULL) {
- ar_ll_init(as, ar, mp);
- }
- /* Kick off any awaiting messages */
- qenable(WR(q));
- break;
- case DL_OK_ACK:
- DTRACE_PROBE2(rput_dl_ok, arl_t *, arl,
- dl_ok_ack_t *, &dlp->ok_ack);
- switch (dlp->ok_ack.dl_correct_primitive) {
- case DL_UNBIND_REQ:
- if (arl->arl_provider_style == DL_STYLE1)
- arl->arl_state = ARL_S_DOWN;
- break;
- case DL_DETACH_REQ:
- arl->arl_state = ARL_S_DOWN;
- break;
- case DL_ATTACH_REQ:
- break;
- default:
- putnext(q, mp);
- return;
- }
- ar_dlpi_done(arl, dlp->ok_ack.dl_correct_primitive);
- break;
- case DL_NOTIFY_ACK:
- DTRACE_PROBE2(rput_dl_notify, arl_t *, arl,
- dl_notify_ack_t *, &dlp->notify_ack);
- /*
- * We mostly care about interface-up transitions, as this is
- * when we need to redo duplicate address detection.
- */
- if (ap != NULL) {
- ap->ap_notifies = (dlp->notify_ack.dl_notifications &
- DL_NOTE_LINK_UP) != 0;
- }
- ar_dlpi_done(arl, DL_NOTIFY_REQ);
- break;
- case DL_BIND_ACK:
- DTRACE_PROBE2(rput_dl_bind, arl_t *, arl,
- dl_bind_ack_t *, &dlp->bind_ack);
- if (ap != NULL) {
- caddr_t hw_addr;
-
- hw_addr = (caddr_t)dlp + dlp->bind_ack.dl_addr_offset;
- if (ap->ap_saplen > 0)
- hw_addr += ap->ap_saplen;
- bcopy(hw_addr, ap->ap_hw_addr, ap->ap_hw_addrlen);
- }
- arl->arl_state = ARL_S_UP;
- ar_dlpi_done(arl, DL_BIND_REQ);
- break;
- case DL_NOTIFY_IND:
- DTRACE_PROBE2(rput_dl_notify_ind, arl_t *, arl,
- dl_notify_ind_t *, &dlp->notify_ind);
-
- if (dlp->notify_ind.dl_notification == DL_NOTE_REPLUMB) {
- arl->arl_replumbing = B_TRUE;
- if (arl->arl_state == ARL_S_DOWN) {
- arp_replumb_done(arl, mp);
- return;
- }
- break;
- }
-
- if (ap != NULL) {
- switch (dlp->notify_ind.dl_notification) {
- case DL_NOTE_LINK_UP:
- ap->ap_link_down = B_FALSE;
- ar_ce_walk(as, ar_ce_restart_dad, arl);
- break;
- case DL_NOTE_LINK_DOWN:
- ap->ap_link_down = B_TRUE;
- break;
- }
- }
- break;
- case DL_UDERROR_IND:
- DTRACE_PROBE2(rput_dl_uderror, arl_t *, arl,
- dl_uderror_ind_t *, &dlp->uderror_ind);
- (void) mi_strlog(q, 1, SL_ERROR | SL_TRACE,
- "ar_rput_dlpi: "
- "DL_UDERROR_IND, dl_dest_addr_length %d dl_errno %d",
- dlp->uderror_ind.dl_dest_addr_length,
- dlp->uderror_ind.dl_errno);
- putnext(q, mp);
- return;
- default:
- DTRACE_PROBE2(rput_dl_badprim, arl_t *, arl,
- union DL_primitives *, dlp);
- putnext(q, mp);
- return;
- }
- freemsg(mp);
-}
-
-static void
-ar_set_address(ace_t *ace, uchar_t *addrpos, uchar_t *proto_addr,
- uint32_t proto_addr_len)
-{
- uchar_t *mask, *to;
- int len;
-
- ASSERT(ace->ace_hw_addr != NULL);
-
- bcopy(ace->ace_hw_addr, addrpos, ace->ace_hw_addr_length);
- if (ace->ace_flags & ACE_F_MAPPING &&
- proto_addr != NULL &&
- ace->ace_proto_extract_mask) { /* careful */
- len = MIN((int)ace->ace_hw_addr_length
- - ace->ace_hw_extract_start,
- proto_addr_len);
- mask = ace->ace_proto_extract_mask;
- to = addrpos + ace->ace_hw_extract_start;
- while (len-- > 0)
- *to++ |= *mask++ & *proto_addr++;
- }
-}
-
-static int
-ar_slifname(queue_t *q, mblk_t *mp_orig)
-{
- ar_t *ar = q->q_ptr;
- arl_t *arl = ar->ar_arl;
- struct lifreq *lifr;
- mblk_t *mp = mp_orig;
- arl_t *old_arl;
- mblk_t *ioccpy;
- struct iocblk *iocp;
- hook_nic_event_t info;
- arp_stack_t *as = ar->ar_as;
-
- if (ar->ar_on_ill_stream) {
- /*
- * This command is for IP, since it is coming down
- * the <arp-IP-driver> stream. Return ENOENT so that
- * it will be sent downstream by the caller
- */
- return (ENOENT);
- }
- /* We handle both M_IOCTL and M_PROTO messages */
- if (DB_TYPE(mp) == M_IOCTL)
- mp = mp->b_cont;
- if (q->q_next == NULL || arl == NULL) {
- /*
- * If the interface was just opened and
- * the info ack has not yet come back from the driver
- */
- DTRACE_PROBE2(slifname_no_arl, queue_t *, q,
- mblk_t *, mp_orig);
- (void) putq(q, mp_orig);
- return (EINPROGRESS);
- }
-
- if (MBLKL(mp) < sizeof (struct lifreq)) {
- DTRACE_PROBE2(slifname_malformed, queue_t *, q,
- mblk_t *, mp);
- }
-
- if (arl->arl_name[0] != '\0') {
- DTRACE_PROBE1(slifname_already, arl_t *, arl);
- return (EALREADY);
- }
-
- lifr = (struct lifreq *)mp->b_rptr;
-
- if (strlen(lifr->lifr_name) >= LIFNAMSIZ) {
- DTRACE_PROBE2(slifname_bad_name, arl_t *, arl,
- struct lifreq *, lifr);
- return (ENXIO);
- }
-
- /* Check whether the name is already in use. */
-
- old_arl = ar_ll_lookup_by_name(as, lifr->lifr_name);
- if (old_arl != NULL) {
- DTRACE_PROBE2(slifname_exists, arl_t *, arl, arl_t *, old_arl);
- return (EEXIST);
- }
-
- /* Make a copy of the message so we can send it downstream. */
- if ((ioccpy = allocb(sizeof (struct iocblk), BPRI_MED)) == NULL ||
- (ioccpy->b_cont = copymsg(mp)) == NULL) {
- if (ioccpy != NULL)
- freeb(ioccpy);
- return (ENOMEM);
- }
-
- (void) strlcpy(arl->arl_name, lifr->lifr_name, sizeof (arl->arl_name));
-
- /* The ppa is sent down by ifconfig */
- arl->arl_ppa = lifr->lifr_ppa;
-
- /*
- * A network device is not considered to be fully plumb'd until
- * its name has been set using SIOCSLIFNAME. Once it has
- * been set, it cannot be set again (see code above), so there
- * is currently no danger in this function causing two NE_PLUMB
- * events without an intervening NE_UNPLUMB.
- */
- info.hne_nic = arl->arl_index;
- info.hne_lif = 0;
- info.hne_event = NE_PLUMB;
- info.hne_data = arl->arl_name;
- info.hne_datalen = strlen(arl->arl_name);
- (void) hook_run(as->as_net_data->netd_hooks, as->as_arpnicevents,
- (hook_data_t)&info);
-
- /* Chain in the new arl. */
- rw_enter(&as->as_arl_lock, RW_WRITER);
- arl->arl_next = as->as_arl_head;
- as->as_arl_head = arl;
- rw_exit(&as->as_arl_lock);
- DTRACE_PROBE1(slifname_set, arl_t *, arl);
-
- /*
- * Send along a copy of the ioctl; this is just for hitbox. Use
- * M_CTL to avoid confusing anyone else who might be listening.
- */
- DB_TYPE(ioccpy) = M_CTL;
- iocp = (struct iocblk *)ioccpy->b_rptr;
- bzero(iocp, sizeof (*iocp));
- iocp->ioc_cmd = SIOCSLIFNAME;
- iocp->ioc_count = msgsize(ioccpy->b_cont);
- ioccpy->b_wptr = (uchar_t *)(iocp + 1);
- putnext(arl->arl_wq, ioccpy);
-
- return (0);
-}
-
-static int
-ar_set_ppa(queue_t *q, mblk_t *mp_orig)
-{
- ar_t *ar = (ar_t *)q->q_ptr;
- arl_t *arl = ar->ar_arl;
- int ppa;
- char *cp;
- mblk_t *mp = mp_orig;
- arl_t *old_arl;
- arp_stack_t *as = ar->ar_as;
-
- if (ar->ar_on_ill_stream) {
- /*
- * This command is for IP, since it is coming down
- * the <arp-IP-driver> stream. Return ENOENT so that
- * it will be sent downstream by the caller
- */
- return (ENOENT);
- }
-
- /* We handle both M_IOCTL and M_PROTO messages. */
- if (DB_TYPE(mp) == M_IOCTL)
- mp = mp->b_cont;
- if (q->q_next == NULL || arl == NULL) {
- /*
- * If the interface was just opened and
- * the info ack has not yet come back from the driver.
- */
- DTRACE_PROBE2(setppa_no_arl, queue_t *, q,
- mblk_t *, mp_orig);
- (void) putq(q, mp_orig);
- return (EINPROGRESS);
- }
-
- if (arl->arl_name[0] != '\0') {
- DTRACE_PROBE1(setppa_already, arl_t *, arl);
- return (EALREADY);
- }
-
- do {
- q = q->q_next;
- } while (q->q_next != NULL);
- cp = q->q_qinfo->qi_minfo->mi_idname;
-
- ppa = *(int *)(mp->b_rptr);
- (void) snprintf(arl->arl_name, sizeof (arl->arl_name), "%s%d", cp, ppa);
-
- old_arl = ar_ll_lookup_by_name(as, arl->arl_name);
- if (old_arl != NULL) {
- DTRACE_PROBE2(setppa_exists, arl_t *, arl, arl_t *, old_arl);
- /* Make it a null string again */
- arl->arl_name[0] = '\0';
- return (EBUSY);
- }
-
- arl->arl_ppa = ppa;
- DTRACE_PROBE1(setppa_done, arl_t *, arl);
- /* Chain in the new arl. */
- rw_enter(&as->as_arl_lock, RW_WRITER);
- arl->arl_next = as->as_arl_head;
- as->as_arl_head = arl;
- rw_exit(&as->as_arl_lock);
-
- return (0);
-}
-
-static int
-ar_snmp_msg(queue_t *q, mblk_t *mp_orig)
-{
- mblk_t *mpdata, *mp = mp_orig;
- struct opthdr *optp;
- msg2_args_t args;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- if (mp == NULL)
- return (0);
- /*
- * ar_cmd_dispatch() already checked for us that "mp->b_cont" is valid
- * in case of an M_IOCTL message.
- */
- if (DB_TYPE(mp) == M_IOCTL)
- mp = mp->b_cont;
-
- optp = (struct opthdr *)(&mp->b_rptr[sizeof (struct T_optmgmt_ack)]);
- if (optp->level == MIB2_IP && optp->name == MIB2_IP_MEDIA) {
- /*
- * Put our ARP cache entries in the ipNetToMediaTable mp from
- * IP. Due to a historical side effect of IP's MIB code, it
- * always passes us a b_cont, but the b_cont should be empty.
- */
- if ((mpdata = mp->b_cont) == NULL || MBLKL(mpdata) != 0)
- return (EINVAL);
-
- args.m2a_mpdata = mpdata;
- args.m2a_mptail = NULL;
- ar_ce_walk(as, ar_snmp_msg2, &args);
- optp->len = msgdsize(mpdata);
- }
- putnext(q, mp_orig);
- return (EINPROGRESS); /* so that rput() exits doing nothing... */
-}
-
-static void
-ar_snmp_msg2(ace_t *ace, void *arg)
-{
- const char *name = "unknown";
- mib2_ipNetToMediaEntry_t ntme;
- msg2_args_t *m2ap = arg;
-
- ASSERT(ace != NULL && ace->ace_arl != NULL);
- if (ace->ace_arl != NULL)
- name = ace->ace_arl->arl_name;
-
- /*
- * Fill in ntme using the information in the ACE.
- */
- ntme.ipNetToMediaType = (ace->ace_flags & ACE_F_PERMANENT) ? 4 : 3;
- ntme.ipNetToMediaIfIndex.o_length = MIN(OCTET_LENGTH, strlen(name));
- bcopy(name, ntme.ipNetToMediaIfIndex.o_bytes,
- ntme.ipNetToMediaIfIndex.o_length);
-
- bcopy(ace->ace_proto_addr, &ntme.ipNetToMediaNetAddress,
- MIN(sizeof (uint32_t), ace->ace_proto_addr_length));
-
- ntme.ipNetToMediaInfo.ntm_mask.o_length =
- MIN(OCTET_LENGTH, ace->ace_proto_addr_length);
- bcopy(ace->ace_proto_mask, ntme.ipNetToMediaInfo.ntm_mask.o_bytes,
- ntme.ipNetToMediaInfo.ntm_mask.o_length);
- ntme.ipNetToMediaInfo.ntm_flags = ace->ace_flags;
-
- ntme.ipNetToMediaPhysAddress.o_length =
- MIN(OCTET_LENGTH, ace->ace_hw_addr_length);
- if ((ace->ace_flags & ACE_F_RESOLVED) == 0)
- ntme.ipNetToMediaPhysAddress.o_length = 0;
- bcopy(ace->ace_hw_addr, ntme.ipNetToMediaPhysAddress.o_bytes,
- ntme.ipNetToMediaPhysAddress.o_length);
-
- /*
- * All entries within the ARP cache are unique, and there are no
- * preexisting entries in the ipNetToMediaTable mp, so just add 'em.
- */
- (void) snmp_append_data2(m2ap->m2a_mpdata, &m2ap->m2a_mptail,
- (char *)&ntme, sizeof (ntme));
-}
-
-/* Write side put procedure. */
-static void
-ar_wput(queue_t *q, mblk_t *mp)
-{
- int err;
- struct iocblk *ioc;
- mblk_t *mp1;
-
- TRACE_1(TR_FAC_ARP, TR_ARP_WPUT_START,
- "arp_wput_start: q %p", q);
-
- /*
- * Here we handle ARP commands coming from controlling processes
- * either in the form of M_IOCTL messages, or M_PROTO messages.
- */
- switch (DB_TYPE(mp)) {
- case M_IOCTL:
- switch (err = ar_cmd_dispatch(q, mp, B_TRUE)) {
- case ENOENT:
- /*
- * If it is an I_PLINK, process it. Otherwise
- * we don't recognize it, so pass it down.
- * Since ARP is a module there is always someone
- * below.
- */
- ASSERT(q->q_next != NULL);
- ioc = (struct iocblk *)mp->b_rptr;
- if ((ioc->ioc_cmd != I_PLINK) &&
- (ioc->ioc_cmd != I_PUNLINK)) {
- putnext(q, mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_WPUT_END,
- "arp_wput_end: q %p (%S)",
- q, "ioctl/enoent");
- return;
- }
- err = ar_plink_send(q, mp);
- if (err == 0) {
- return;
- }
- if ((mp1 = mp->b_cont) != 0)
- mp1->b_wptr = mp1->b_rptr;
- break;
- case EINPROGRESS:
- /*
- * If the request resulted in an attempt to resolve
- * an address, we return out here. The IOCTL will
- * be completed in ar_rput if something comes back,
- * or as a result of the timer expiring.
- */
- TRACE_2(TR_FAC_ARP, TR_ARP_WPUT_END,
- "arp_wput_end: q %p (%S)", q, "inprog");
- return;
- default:
- DB_TYPE(mp) = M_IOCACK;
- break;
- }
- ioc = (struct iocblk *)mp->b_rptr;
- if (err != 0)
- ioc->ioc_error = err;
- if (ioc->ioc_error != 0) {
- /*
- * Don't free b_cont as IP/IB needs
- * it to identify the request.
- */
- DB_TYPE(mp) = M_IOCNAK;
- }
- ioc->ioc_count = msgdsize(mp->b_cont);
- qreply(q, mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_WPUT_END,
- "arp_wput_end: q %p (%S)", q, "ioctl");
- return;
- case M_FLUSH:
- if (*mp->b_rptr & FLUSHW)
- flushq(q, FLUSHDATA);
- if (*mp->b_rptr & FLUSHR) {
- flushq(RD(q), FLUSHDATA);
- *mp->b_rptr &= ~FLUSHW;
- qreply(q, mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_WPUT_END,
- "arp_wput_end: q %p (%S)", q, "flush");
- return;
- }
- /*
- * The normal behavior of a STREAMS module should be
- * to pass down M_FLUSH messages. However there is a
- * complex sequence of events during plumb/unplumb that
- * can cause DLPI messages in the driver's queue to be
- * flushed. So we don't send down M_FLUSH. This has been
- * reported for some drivers (Eg. le) that send up an M_FLUSH
- * in response to unbind request which will eventually be
- * looped back at the mux head and sent down. Since IP
- * does not queue messages in a module instance queue
- * of IP, nothing is lost by not sending down the flush.
- */
- freemsg(mp);
- return;
- case M_PROTO:
- case M_PCPROTO:
- /*
- * Commands in the form of PROTO messages are handled very
- * much the same as IOCTLs, but no response is returned.
- */
- switch (err = ar_cmd_dispatch(q, mp, B_TRUE)) {
- case ENOENT:
- if (q->q_next) {
- putnext(q, mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_WPUT_END,
- "arp_wput_end: q %p (%S)", q,
- "proto/enoent");
- return;
- }
- break;
- case EINPROGRESS:
- TRACE_2(TR_FAC_ARP, TR_ARP_WPUT_END,
- "arp_wput_end: q %p (%S)", q, "proto/einprog");
- return;
- default:
- break;
- }
- break;
- case M_IOCDATA:
- /*
- * We pass M_IOCDATA downstream because it could be as a
- * result of a previous M_COPYIN/M_COPYOUT message sent
- * upstream.
- */
- /* FALLTHRU */
- case M_CTL:
- /*
- * We also send any M_CTL downstream as it could
- * contain control information for a module downstream.
- */
- putnext(q, mp);
- return;
- default:
- break;
- }
- /* Free any message we don't understand */
- freemsg(mp);
- TRACE_2(TR_FAC_ARP, TR_ARP_WPUT_END,
- "arp_wput_end: q %p (%S)", q, "end");
-}
-
-static boolean_t
-arp_say_ready(ace_t *ace)
-{
- mblk_t *mp;
- arl_t *arl = ace->ace_arl;
- arlphy_t *ap = ace->ace_xmit_arl->arl_phy;
- arh_t *arh;
- uchar_t *cp;
-
- mp = allocb(sizeof (*arh) + 2 * (ace->ace_hw_addr_length +
- ace->ace_proto_addr_length), BPRI_MED);
- if (mp == NULL) {
- /* skip a beat on allocation trouble */
- ace->ace_xmit_count = 1;
- ace_set_timer(ace, B_FALSE);
- return (B_FALSE);
- }
- /* Tell IP address is now usable */
- arh = (arh_t *)mp->b_rptr;
- U16_TO_BE16(ap->ap_arp_hw_type, arh->arh_hardware);
- U16_TO_BE16(ace->ace_proto, arh->arh_proto);
- arh->arh_hlen = ace->ace_hw_addr_length;
- arh->arh_plen = ace->ace_proto_addr_length;
- U16_TO_BE16(ARP_REQUEST, arh->arh_operation);
- cp = (uchar_t *)(arh + 1);
- bcopy(ace->ace_hw_addr, cp, ace->ace_hw_addr_length);
- cp += ace->ace_hw_addr_length;
- bcopy(ace->ace_proto_addr, cp, ace->ace_proto_addr_length);
- cp += ace->ace_proto_addr_length;
- bcopy(ace->ace_hw_addr, cp, ace->ace_hw_addr_length);
- cp += ace->ace_hw_addr_length;
- bcopy(ace->ace_proto_addr, cp, ace->ace_proto_addr_length);
- cp += ace->ace_proto_addr_length;
- mp->b_wptr = cp;
- ar_client_notify(arl, mp, AR_CN_READY);
- DTRACE_PROBE1(ready, ace_t *, ace);
- return (B_TRUE);
-}
-
-/*
- * Pick the longest-waiting aces for defense.
- */
-static void
-ace_reschedule(ace_t *ace, void *arg)
-{
- ace_resched_t *art = arg;
- ace_t **aces;
- ace_t **acemax;
- ace_t *atemp;
-
- if (ace->ace_xmit_arl != art->art_arl)
- return;
- /*
- * Only published entries that are ready for announcement are eligible.
- */
- if ((ace->ace_flags & (ACE_F_PUBLISH | ACE_F_UNVERIFIED | ACE_F_DYING |
- ACE_F_DELAYED)) != ACE_F_PUBLISH) {
- return;
- }
- if (art->art_naces < ACE_RESCHED_LIST_LEN) {
- art->art_aces[art->art_naces++] = ace;
- } else {
- aces = art->art_aces;
- acemax = aces + ACE_RESCHED_LIST_LEN;
- for (; aces < acemax; aces++) {
- if ((*aces)->ace_last_bcast > ace->ace_last_bcast) {
- atemp = *aces;
- *aces = ace;
- ace = atemp;
- }
- }
- }
-}
-
-/*
- * Reschedule the ARP defense of any long-waiting ACEs. It's assumed that this
- * doesn't happen very often (if at all), and thus it needn't be highly
- * optimized. (Note, though, that it's actually O(N) complexity, because the
- * outer loop is bounded by a constant rather than by the length of the list.)
- */
-static void
-arl_reschedule(arl_t *arl)
-{
- arlphy_t *ap = arl->arl_phy;
- ace_resched_t art;
- int i;
- ace_t *ace;
- arp_stack_t *as = ARL_TO_ARPSTACK(arl);
-
- i = ap->ap_defend_count;
- ap->ap_defend_count = 0;
- /* If none could be sitting around, then don't reschedule */
- if (i < as->as_defend_rate) {
- DTRACE_PROBE1(reschedule_none, arl_t *, arl);
- return;
- }
- art.art_arl = arl;
- while (ap->ap_defend_count < as->as_defend_rate) {
- art.art_naces = 0;
- ar_ce_walk(as, ace_reschedule, &art);
- for (i = 0; i < art.art_naces; i++) {
- ace = art.art_aces[i];
- ace->ace_flags |= ACE_F_DELAYED;
- ace_set_timer(ace, B_FALSE);
- if (++ap->ap_defend_count >= as->as_defend_rate)
- break;
- }
- if (art.art_naces < ACE_RESCHED_LIST_LEN)
- break;
- }
- DTRACE_PROBE1(reschedule, arl_t *, arl);
-}
-
-/*
- * Write side service routine. The only action here is delivery of transmit
- * timer events and delayed messages while waiting for the info_ack (ar_arl
- * not yet set).
- */
-static void
-ar_wsrv(queue_t *q)
-{
- ace_t *ace;
- arlphy_t *ap;
- mblk_t *mp;
- clock_t ms;
- arp_stack_t *as = ((ar_t *)q->q_ptr)->ar_as;
-
- TRACE_1(TR_FAC_ARP, TR_ARP_WSRV_START,
- "arp_wsrv_start: q %p", q);
-
- while ((mp = getq(q)) != NULL) {
- switch (DB_TYPE(mp)) {
- case M_PCSIG:
- if (!mi_timer_valid(mp))
- continue;
- ace = (ace_t *)mp->b_rptr;
- if (ace->ace_flags & ACE_F_DYING)
- continue;
- ap = ace->ace_xmit_arl->arl_phy;
- if (ace->ace_flags & ACE_F_UNVERIFIED) {
- ASSERT(ace->ace_flags & ACE_F_PUBLISH);
- ASSERT(ace->ace_query_mp == NULL);
- /*
- * If the link is down, give up for now. IP
- * will give us the go-ahead to try again when
- * the link restarts.
- */
- if (ap->ap_link_down) {
- DTRACE_PROBE1(timer_link_down,
- ace_t *, ace);
- ace->ace_flags |= ACE_F_DAD_ABORTED;
- continue;
- }
- if (ace->ace_xmit_count > 0) {
- DTRACE_PROBE1(timer_probe,
- ace_t *, ace);
- ace->ace_xmit_count--;
- ar_xmit(ace->ace_xmit_arl, ARP_REQUEST,
- ace->ace_proto,
- ace->ace_proto_addr_length,
- ace->ace_hw_addr, NULL, NULL,
- ace->ace_proto_addr, NULL, as);
- ace_set_timer(ace, B_FALSE);
- continue;
- }
- if (!arp_say_ready(ace))
- continue;
- DTRACE_PROBE1(timer_ready, ace_t *, ace);
- ace->ace_xmit_interval =
- as->as_publish_interval;
- ace->ace_xmit_count = as->as_publish_count;
- if (ace->ace_xmit_count == 0)
- ace->ace_xmit_count++;
- ace->ace_flags &= ~ACE_F_UNVERIFIED;
- }
- if (ace->ace_flags & ACE_F_PUBLISH) {
- clock_t now;
-
- /*
- * If an hour has passed, then free up the
- * entries that need defense by rescheduling
- * them.
- */
- now = ddi_get_lbolt();
- if (as->as_defend_rate > 0 &&
- now - ap->ap_defend_start >
- SEC_TO_TICK(as->as_defend_period)) {
- ap->ap_defend_start = now;
- arl_reschedule(ace->ace_xmit_arl);
- }
- /*
- * Finish the job that we started in
- * ar_entry_add. When we get to zero
- * announcement retransmits left, switch to
- * address defense.
- */
- ASSERT(ace->ace_query_mp == NULL);
- if (ace->ace_xmit_count > 0) {
- ace->ace_xmit_count--;
- DTRACE_PROBE1(timer_announce,
- ace_t *, ace);
- } else if (ace->ace_flags & ACE_F_DELAYED) {
- /*
- * This guy was rescheduled as one of
- * the really old entries needing
- * on-going defense. Let him through
- * now.
- */
- DTRACE_PROBE1(timer_send_delayed,
- ace_t *, ace);
- ace->ace_flags &= ~ACE_F_DELAYED;
- } else if (as->as_defend_rate > 0 &&
- (ap->ap_defend_count >=
- as->as_defend_rate ||
- ++ap->ap_defend_count >=
- as->as_defend_rate)) {
- /*
- * If we're no longer allowed to send
- * unbidden defense messages, then just
- * wait for rescheduling.
- */
- DTRACE_PROBE1(timer_excess_defense,
- ace_t *, ace);
- ace_set_timer(ace, B_FALSE);
- continue;
- } else {
- DTRACE_PROBE1(timer_defend,
- ace_t *, ace);
- }
- ar_xmit(ace->ace_xmit_arl, ARP_REQUEST,
- ace->ace_proto,
- ace->ace_proto_addr_length,
- ace->ace_hw_addr,
- ace->ace_proto_addr,
- ace->ace_xmit_arl->arl_phy->ap_arp_addr,
- ace->ace_proto_addr, NULL, as);
- ace->ace_last_bcast = now;
- if (ace->ace_xmit_count == 0)
- ace->ace_xmit_interval =
- as->as_defend_interval;
- if (ace->ace_xmit_interval != 0)
- ace_set_timer(ace, B_FALSE);
- continue;
- }
-
- /*
- * If this is a non-permanent (regular) resolved ARP
- * entry, then it's now time to check if it can be
- * retired. As an optimization, we check with IP
- * first, and just restart the timer if the address is
- * still in use.
- */
- if (ACE_NONPERM(ace)) {
- if (ace->ace_proto == IP_ARP_PROTO_TYPE &&
- ndp_lookup_ipaddr(*(ipaddr_t *)
- ace->ace_proto_addr, as->as_netstack)) {
- ace->ace_flags |= ACE_F_OLD;
- mi_timer(ace->ace_arl->arl_wq,
- ace->ace_mp,
- as->as_cleanup_interval);
- } else {
- ar_delete_notify(ace);
- ar_ce_delete(ace);
- }
- continue;
- }
-
- /*
- * ar_query_xmit returns the number of milliseconds to
- * wait following this transmit. If the number of
- * allowed transmissions has been exhausted, it will
- * return zero without transmitting. If that happens
- * we complete the operation with a failure indication.
- * Otherwise, we restart the timer.
- */
- ms = ar_query_xmit(as, ace);
- if (ms == 0)
- ar_query_reply(ace, ENXIO, NULL, (uint32_t)0);
- else
- mi_timer(q, mp, ms);
- continue;
- default:
- put(q, mp);
- continue;
- }
- }
- TRACE_1(TR_FAC_ARP, TR_ARP_WSRV_END,
- "arp_wsrv_end: q %p", q);
-}
-
-/* ar_xmit is called to transmit an ARP Request or Response. */
-static void
-ar_xmit(arl_t *arl, uint32_t operation, uint32_t proto, uint32_t plen,
- const uchar_t *haddr1, const uchar_t *paddr1, const uchar_t *haddr2,
- const uchar_t *paddr2, const uchar_t *dstaddr, arp_stack_t *as)
-{
- arh_t *arh;
- uint8_t *cp;
- uint_t hlen;
- mblk_t *mp;
- arlphy_t *ap = arl->arl_phy;
-
- ASSERT(!(arl->arl_flags & ARL_F_IPMP));
-
- if (ap == NULL) {
- DTRACE_PROBE1(xmit_no_arl_phy, arl_t *, arl);
- return;
- }
-
- /* IFF_NOARP flag is set or link down: do not send arp messages */
- if ((arl->arl_flags & ARL_F_NOARP) || ap->ap_link_down)
- return;
-
- hlen = ap->ap_hw_addrlen;
- if ((mp = copyb(ap->ap_xmit_mp)) == NULL)
- return;
-
- mp->b_cont = allocb(AR_LL_HDR_SLACK + ARH_FIXED_LEN + (hlen * 4) +
- plen + plen, BPRI_MED);
- if (mp->b_cont == NULL) {
- freeb(mp);
- return;
- }
-
- /* Get the L2 destination address for the message */
- if (haddr2 == NULL)
- dstaddr = ap->ap_arp_addr;
- else if (dstaddr == NULL)
- dstaddr = haddr2;
-
- /*
- * Figure out where the target hardware address goes in the
- * DL_UNITDATA_REQ header, and copy it in.
- */
- cp = mi_offset_param(mp, ap->ap_xmit_addroff, hlen);
- ASSERT(cp != NULL);
- if (cp == NULL) {
- freemsg(mp);
- return;
- }
- bcopy(dstaddr, cp, hlen);
-
- /* Fill in the ARP header. */
- cp = mp->b_cont->b_rptr + (AR_LL_HDR_SLACK + hlen + hlen);
- mp->b_cont->b_rptr = cp;
- arh = (arh_t *)cp;
- U16_TO_BE16(ap->ap_arp_hw_type, arh->arh_hardware);
- U16_TO_BE16(proto, arh->arh_proto);
- arh->arh_hlen = (uint8_t)hlen;
- arh->arh_plen = (uint8_t)plen;
- U16_TO_BE16(operation, arh->arh_operation);
- cp += ARH_FIXED_LEN;
- bcopy(haddr1, cp, hlen);
- cp += hlen;
- if (paddr1 == NULL)
- bzero(cp, plen);
- else
- bcopy(paddr1, cp, plen);
- cp += plen;
- if (haddr2 == NULL)
- bzero(cp, hlen);
- else
- bcopy(haddr2, cp, hlen);
- cp += hlen;
- bcopy(paddr2, cp, plen);
- cp += plen;
- mp->b_cont->b_wptr = cp;
-
- DTRACE_PROBE3(arp__physical__out__start,
- arl_t *, arl, arh_t *, arh, mblk_t *, mp);
-
- ARP_HOOK_OUT(as->as_arp_physical_out_event, as->as_arp_physical_out,
- arl->arl_index, arh, mp, mp->b_cont, as);
-
- DTRACE_PROBE1(arp__physical__out__end, mblk_t *, mp);
-
- if (mp == NULL)
- return;
-
- /* Ship it out. */
- if (canputnext(arl->arl_wq))
- putnext(arl->arl_wq, mp);
- else
- freemsg(mp);
-}
-
-static mblk_t *
-ar_alloc(uint32_t cmd, int err)
-{
- uint32_t len;
- mblk_t *mp;
- mblk_t *mp1;
- char *cp;
- arc_t *arc;
-
- /* For now only one type of command is accepted */
- if (cmd != AR_DLPIOP_DONE)
- return (NULL);
- len = sizeof (arc_t);
- mp = allocb(len, BPRI_HI);
- if (!mp)
- return (NULL);
-
- DB_TYPE(mp) = M_CTL;
- cp = (char *)mp->b_rptr;
- arc = (arc_t *)(mp->b_rptr);
- arc->arc_cmd = cmd;
- mp->b_wptr = (uchar_t *)&cp[len];
- len = sizeof (int);
- mp1 = allocb(len, BPRI_HI);
- if (!mp1) {
- freeb(mp);
- return (NULL);
- }
- cp = (char *)mp->b_rptr;
- /* Initialize the error code */
- *((int *)mp1->b_rptr) = err;
- mp1->b_wptr = (uchar_t *)&cp[len];
- linkb(mp, mp1);
- return (mp);
-}
-
-void
-arp_ddi_init(void)
-{
- /*
- * We want to be informed each time a stack is created or
- * destroyed in the kernel, so we can maintain the
- * set of arp_stack_t's.
- */
- netstack_register(NS_ARP, arp_stack_init, arp_stack_shutdown,
- arp_stack_fini);
-}
-
-void
-arp_ddi_destroy(void)
-{
- netstack_unregister(NS_ARP);
-}
-
-/*
- * Initialize the ARP stack instance.
- */
-/* ARGSUSED */
-static void *
-arp_stack_init(netstackid_t stackid, netstack_t *ns)
-{
- arp_stack_t *as;
- arpparam_t *pa;
-
- as = (arp_stack_t *)kmem_zalloc(sizeof (*as), KM_SLEEP);
- as->as_netstack = ns;
-
- pa = (arpparam_t *)kmem_alloc(sizeof (arp_param_arr), KM_SLEEP);
- as->as_param_arr = pa;
- bcopy(arp_param_arr, as->as_param_arr, sizeof (arp_param_arr));
-
- (void) ar_param_register(&as->as_nd,
- as->as_param_arr, A_CNT(arp_param_arr));
-
- as->as_arp_index_counter = 1;
- as->as_arp_counter_wrapped = 0;
-
- rw_init(&as->as_arl_lock, NULL, RW_DRIVER, NULL);
- arp_net_init(as, stackid);
- arp_hook_init(as);
-
- return (as);
-}
-
-/* ARGSUSED */
-static void
-arp_stack_shutdown(netstackid_t stackid, void *arg)
-{
- arp_stack_t *as = (arp_stack_t *)arg;
-
- arp_net_shutdown(as);
-}
-
-/*
- * Free the ARP stack instance.
- */
-/* ARGSUSED */
-static void
-arp_stack_fini(netstackid_t stackid, void *arg)
-{
- arp_stack_t *as = (arp_stack_t *)arg;
-
- arp_hook_destroy(as);
- arp_net_destroy(as);
- rw_destroy(&as->as_arl_lock);
- nd_free(&as->as_nd);
- kmem_free(as->as_param_arr, sizeof (arp_param_arr));
- as->as_param_arr = NULL;
- kmem_free(as, sizeof (*as));
-}
diff --git a/usr/src/uts/common/inet/arp/arp_netinfo.c b/usr/src/uts/common/inet/arp/arp_netinfo.c
deleted file mode 100644
index 9d9c6a5..0000000
--- a/usr/src/uts/common/inet/arp/arp_netinfo.c
+++ /dev/null
@@ -1,376 +0,0 @@
-/*
- * CDDL HEADER START
- *
- * The contents of this file are subject to the terms of the
- * Common Development and Distribution License (the "License").
- * You may not use this file except in compliance with the License.
- *
- * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
- * or http://www.opensolaris.org/os/licensing.
- * See the License for the specific language governing permissions
- * and limitations under the License.
- *
- * When distributing Covered Code, include this CDDL HEADER in each
- * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
- * If applicable, add the following below this CDDL HEADER, with the
- * fields enclosed by brackets "[]" replaced with your own identifying
- * information: Portions Copyright [yyyy] [name of copyright owner]
- *
- * CDDL HEADER END
- */
-/*
- * Copyright 2008 Sun Microsystems, Inc. All rights reserved.
- * Use is subject to license terms.
- */
-
-#include <sys/param.h>
-#include <sys/types.h>
-#include <sys/systm.h>
-#include <sys/cmn_err.h>
-#include <sys/stream.h>
-#include <sys/sunddi.h>
-#include <sys/hook.h>
-#include <sys/hook_impl.h>
-#include <sys/netstack.h>
-#include <net/if.h>
-
-#include <sys/neti.h>
-#include <sys/hook_event.h>
-#include <inet/arp_impl.h>
-
-/*
- * ARP netinfo entry point declarations.
- */
-static int arp_getifname(net_handle_t, phy_if_t, char *, const size_t);
-static int arp_getmtu(net_handle_t, phy_if_t, lif_if_t);
-static int arp_getpmtuenabled(net_handle_t);
-static int arp_getlifaddr(net_handle_t, phy_if_t, lif_if_t, size_t,
- net_ifaddr_t [], void *);
-static int arp_getlifzone(net_handle_t, phy_if_t, lif_if_t, zoneid_t *);
-static int arp_getlifflags(net_handle_t, phy_if_t, lif_if_t, uint64_t *);
-static phy_if_t arp_phygetnext(net_handle_t, phy_if_t);
-static phy_if_t arp_phylookup(net_handle_t, const char *);
-static lif_if_t arp_lifgetnext(net_handle_t, phy_if_t, lif_if_t);
-static int arp_inject(net_handle_t, inject_t, net_inject_t *);
-static phy_if_t arp_routeto(net_handle_t, struct sockaddr *, struct sockaddr *);
-static int arp_ispartialchecksum(net_handle_t, mblk_t *);
-static int arp_isvalidchecksum(net_handle_t, mblk_t *);
-
-static net_protocol_t arp_netinfo = {
- NETINFO_VERSION,
- NHF_ARP,
- arp_getifname,
- arp_getmtu,
- arp_getpmtuenabled,
- arp_getlifaddr,
- arp_getlifzone,
- arp_getlifflags,
- arp_phygetnext,
- arp_phylookup,
- arp_lifgetnext,
- arp_inject,
- arp_routeto,
- arp_ispartialchecksum,
- arp_isvalidchecksum
-};
-
-/*
- * Register ARP netinfo functions.
- */
-void
-arp_net_init(arp_stack_t *as, netstackid_t stackid)
-{
- netid_t id;
-
- id = net_getnetidbynetstackid(stackid);
- ASSERT(id != -1);
-
- as->as_net_data = net_protocol_register(id, &arp_netinfo);
- ASSERT(as->as_net_data != NULL);
-}
-
-void
-arp_net_shutdown(arp_stack_t *as)
-{
- if (as->as_arpnicevents != NULL) {
- (void) net_event_shutdown(as->as_net_data,
- &as->as_arp_nic_events);
- }
-
- if (as->as_arp_physical_out != NULL) {
- (void) net_event_shutdown(as->as_net_data,
- &as->as_arp_physical_out_event);
- }
-
- if (as->as_arp_physical_in != NULL) {
- (void) net_event_shutdown(as->as_net_data,
- &as->as_arp_physical_in_event);
- }
-
- (void) net_family_shutdown(as->as_net_data, &as->as_arproot);
-}
-
-/*
- * Unregister ARP netinfo functions.
- */
-void
-arp_net_destroy(arp_stack_t *as)
-{
- if (net_protocol_unregister(as->as_net_data) == 0)
- as->as_net_data = NULL;
-}
-
-/*
- * Initialize ARP hook family and events
- */
-void
-arp_hook_init(arp_stack_t *as)
-{
- HOOK_FAMILY_INIT(&as->as_arproot, Hn_ARP);
- if (net_family_register(as->as_net_data, &as->as_arproot) != 0) {
- cmn_err(CE_NOTE, "arp_hook_init: "
- "net_family_register failed for arp");
- }
-
- HOOK_EVENT_INIT(&as->as_arp_physical_in_event, NH_PHYSICAL_IN);
- as->as_arp_physical_in = net_event_register(as->as_net_data,
- &as->as_arp_physical_in_event);
- if (as->as_arp_physical_in == NULL) {
- cmn_err(CE_NOTE, "arp_hook_init: "
- "net_event_register failed for arp/physical_in");
- }
-
- HOOK_EVENT_INIT(&as->as_arp_physical_out_event, NH_PHYSICAL_OUT);
- as->as_arp_physical_out = net_event_register(as->as_net_data,
- &as->as_arp_physical_out_event);
- if (as->as_arp_physical_out == NULL) {
- cmn_err(CE_NOTE, "arp_hook_init: "
- "net_event_register failed for arp/physical_out");
- }
-
- HOOK_EVENT_INIT(&as->as_arp_nic_events, NH_NIC_EVENTS);
- as->as_arpnicevents = net_event_register(as->as_net_data,
- &as->as_arp_nic_events);
- if (as->as_arpnicevents == NULL) {
- cmn_err(CE_NOTE, "arp_hook_init: "
- "net_event_register failed for arp/nic_events");
- }
-}
-
-void
-arp_hook_destroy(arp_stack_t *as)
-{
- if (as->as_arpnicevents != NULL) {
- if (net_event_unregister(as->as_net_data,
- &as->as_arp_nic_events) == 0)
- as->as_arpnicevents = NULL;
- }
-
- if (as->as_arp_physical_out != NULL) {
- if (net_event_unregister(as->as_net_data,
- &as->as_arp_physical_out_event) == 0)
- as->as_arp_physical_out = NULL;
- }
-
- if (as->as_arp_physical_in != NULL) {
- if (net_event_unregister(as->as_net_data,
- &as->as_arp_physical_in_event) == 0)
- as->as_arp_physical_in = NULL;
- }
-
- (void) net_family_unregister(as->as_net_data, &as->as_arproot);
-}
-
-/*
- * Determine the name of the lower level interface
- */
-static int
-arp_getifname(net_handle_t net, phy_if_t phy_ifdata, char *buffer,
- const size_t buflen)
-{
- arl_t *arl;
- arp_stack_t *as;
- netstack_t *ns = net->netd_stack->nts_netstack;
-
- ASSERT(buffer != NULL);
- ASSERT(ns != NULL);
-
- as = ns->netstack_arp;
- rw_enter(&as->as_arl_lock, RW_READER);
- for (arl = as->as_arl_head; arl != NULL; arl = arl->arl_next) {
- if (arl->arl_index == phy_ifdata) {
- (void) strlcpy(buffer, arl->arl_name, buflen);
- rw_exit(&as->as_arl_lock);
- return (0);
- }
- }
- rw_exit(&as->as_arl_lock);
-
- return (1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static int
-arp_getmtu(net_handle_t net, phy_if_t phy_ifdata, lif_if_t ifdata)
-{
- return (-1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static int
-arp_getpmtuenabled(net_handle_t net)
-{
- return (-1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static int
-arp_getlifaddr(net_handle_t net, phy_if_t phy_ifdata, lif_if_t ifdata,
- size_t nelem, net_ifaddr_t type[], void *storage)
-{
- return (-1);
-}
-
-/*
- * Determine the instance number of the next lower level interface
- */
-static phy_if_t
-arp_phygetnext(net_handle_t net, phy_if_t phy_ifdata)
-{
- arl_t *arl;
- int index;
- arp_stack_t *as;
- netstack_t *ns = net->netd_stack->nts_netstack;
-
- ASSERT(ns != NULL);
-
- as = ns->netstack_arp;
- rw_enter(&as->as_arl_lock, RW_READER);
- if (phy_ifdata == 0) {
- arl = as->as_arl_head;
- } else {
- for (arl = as->as_arl_head; arl != NULL;
- arl = arl->arl_next) {
- if (arl->arl_index == phy_ifdata) {
- arl = arl->arl_next;
- break;
- }
- }
- }
-
- index = (arl != NULL) ? arl->arl_index : 0;
-
- rw_exit(&as->as_arl_lock);
-
- return (index);
-}
-
-/*
- * Given a network interface name, find its ARP layer instance number.
- */
-static phy_if_t
-arp_phylookup(net_handle_t net, const char *name)
-{
- arl_t *arl;
- int index;
- arp_stack_t *as;
- netstack_t *ns = net->netd_stack->nts_netstack;
-
- ASSERT(name != NULL);
- ASSERT(ns != NULL);
-
- index = 0;
- as = ns->netstack_arp;
- rw_enter(&as->as_arl_lock, RW_READER);
- for (arl = as->as_arl_head; arl != NULL; arl = arl->arl_next) {
- if (strcmp(name, arl->arl_name) == 0) {
- index = arl->arl_index;
- break;
- }
- }
- rw_exit(&as->as_arl_lock);
-
- return (index);
-
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static lif_if_t
-arp_lifgetnext(net_handle_t net, phy_if_t ifp, lif_if_t lif)
-{
- return ((lif_if_t)-1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static int
-arp_inject(net_handle_t net, inject_t injection, net_inject_t *neti)
-{
- return (-1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static phy_if_t
-arp_routeto(net_handle_t net, struct sockaddr *addr, struct sockaddr *next)
-{
- return ((phy_if_t)-1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-int
-arp_ispartialchecksum(net_handle_t net, mblk_t *mb)
-{
- return (-1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static int
-arp_isvalidchecksum(net_handle_t net, mblk_t *mb)
-{
- return (-1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static int
-arp_getlifzone(net_handle_t net, phy_if_t phy_ifdata, lif_if_t ifdata,
- zoneid_t *zoneid)
-{
- return (-1);
-}
-
-/*
- * Unsupported with ARP.
- */
-/*ARGSUSED*/
-static int
-arp_getlifflags(net_handle_t net, phy_if_t phy_ifdata, lif_if_t ifdata,
- uint64_t *flags)
-{
- return (-1);
-}
diff --git a/usr/src/uts/common/inet/arp/arpddi.c b/usr/src/uts/common/inet/arp/arpddi.c
index 2cc56b7..de83332 100644
--- a/usr/src/uts/common/inet/arp/arpddi.c
+++ b/usr/src/uts/common/inet/arp/arpddi.c
@@ -19,7 +19,7 @@
* CDDL HEADER END
*/
/*
- * Copyright 2008 Sun Microsystems, Inc. All rights reserved.
+ * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
/* Copyright (c) 1990 Mentat Inc. */
@@ -27,10 +27,8 @@
#include <sys/types.h>
#include <sys/conf.h>
#include <sys/modctl.h>
-#include <sys/ksynch.h>
#include <inet/common.h>
#include <inet/ip.h>
-#include <inet/arp_impl.h>
#define INET_NAME "arp"
#define INET_MODDESC "ARP STREAMS module"
@@ -39,28 +37,16 @@
#define INET_DEVSTRTAB ipinfov4
#define INET_MODSTRTAB arpinfo
#define INET_DEVMTFLAGS IP_DEVMTFLAGS /* since as a driver we're ip */
-#define INET_MODMTFLAGS (D_MP | D_MTPERMOD)
+#define INET_MODMTFLAGS D_MP
#include "../inetddi.c"
-extern void arp_ddi_init(void);
-extern void arp_ddi_destroy(void);
-
int
_init(void)
{
int error;
- /*
- * Note: After mod_install succeeds, another thread can enter
- * therefore all initialization is done before it and any
- * de-initialization needed done if it fails.
- */
- arp_ddi_init();
error = mod_install(&modlinkage);
- if (error != 0)
- arp_ddi_destroy();
-
return (error);
}
@@ -70,8 +56,6 @@
int error;
error = mod_remove(&modlinkage);
- if (error == 0)
- arp_ddi_destroy();
return (error);
}
diff --git a/usr/src/uts/common/inet/arp_impl.h b/usr/src/uts/common/inet/arp_impl.h
deleted file mode 100644
index 38d0d1a..0000000
--- a/usr/src/uts/common/inet/arp_impl.h
+++ /dev/null
@@ -1,253 +0,0 @@
-/*
- * CDDL HEADER START
- *
- * The contents of this file are subject to the terms of the
- * Common Development and Distribution License (the "License").
- * You may not use this file except in compliance with the License.
- *
- * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
- * or http://www.opensolaris.org/os/licensing.
- * See the License for the specific language governing permissions
- * and limitations under the License.
- *
- * When distributing Covered Code, include this CDDL HEADER in each
- * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
- * If applicable, add the following below this CDDL HEADER, with the
- * fields enclosed by brackets "[]" replaced with your own identifying
- * information: Portions Copyright [yyyy] [name of copyright owner]
- *
- * CDDL HEADER END
- */
-/*
- * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
- * Use is subject to license terms.
- */
-
-#ifndef _ARP_IMPL_H
-#define _ARP_IMPL_H
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#ifdef _KERNEL
-
-#include <sys/types.h>
-#include <sys/stream.h>
-#include <net/if.h>
-#include <sys/netstack.h>
-
-/* ARP kernel hash size; used for mdb support */
-#define ARP_HASH_SIZE 256
-
-/* Named Dispatch Parameter Management Structure */
-typedef struct arpparam_s {
- uint32_t arp_param_min;
- uint32_t arp_param_max;
- uint32_t arp_param_value;
- char *arp_param_name;
-} arpparam_t;
-
-/* ARL Structure, one per link level device */
-typedef struct arl_s {
- struct arl_s *arl_next; /* ARL chain at arl_g_head */
- queue_t *arl_rq; /* Read queue pointer */
- queue_t *arl_wq; /* Write queue pointer */
- t_uscalar_t arl_ppa; /* DL_ATTACH parameter */
- char arl_name[LIFNAMSIZ]; /* Lower level name */
- mblk_t *arl_unbind_mp;
- mblk_t *arl_detach_mp;
- t_uscalar_t arl_provider_style; /* From DL_INFO_ACK */
- mblk_t *arl_queue; /* Queued commands head */
- mblk_t *arl_queue_tail; /* Queued commands tail */
- uint32_t arl_flags; /* ARL_F_* values below */
- t_uscalar_t arl_dlpi_pending; /* pending DLPI request */
- mblk_t *arl_dlpi_deferred; /* Deferred DLPI messages */
- uint_t arl_state; /* lower interface state */
- uint_t arl_closing : 1, /* stream is closing */
- arl_replumbing : 1; /* Wait for IP to bring down */
- uint32_t arl_index; /* instance number */
- struct arlphy_s *arl_phy; /* physical info, if any */
- struct arl_s *arl_ipmp_arl; /* pointer to group arl_t */
-} arl_t;
-
-/*
- * There is no field to get from an arl_t to an arp_stack_t, but this
- * macro does it.
- */
-#define ARL_TO_ARPSTACK(_arl) (((ar_t *)(_arl)->arl_rq->q_ptr)->ar_as)
-
-/* ARL physical info structure, one per physical link level device */
-typedef struct arlphy_s {
- uint32_t ap_arp_hw_type; /* hardware type */
- uchar_t *ap_arp_addr; /* multicast address to use */
- uchar_t *ap_hw_addr; /* hardware address */
- uint32_t ap_hw_addrlen; /* hardware address length */
- mblk_t *ap_xmit_mp; /* DL_UNITDATA_REQ template */
- t_uscalar_t ap_xmit_addroff; /* address offset in xmit_mp */
- t_uscalar_t ap_xmit_sapoff; /* sap offset in xmit_mp */
- t_scalar_t ap_saplen; /* sap length */
- clock_t ap_defend_start; /* start of 1-hour period */
- uint_t ap_defend_count; /* # of unbidden broadcasts */
- uint_t ap_notifies : 1, /* handles DL_NOTE_LINK */
- ap_link_down : 1; /* DL_NOTE status */
-} arlphy_t;
-
-/* ARP Cache Entry */
-typedef struct ace_s {
- struct ace_s *ace_next; /* Hash chain next pointer */
- struct ace_s **ace_ptpn; /* Pointer to previous next */
- struct arl_s *ace_arl; /* Associated arl */
- uint32_t ace_proto; /* Protocol for this ace */
- uint32_t ace_flags;
- uchar_t *ace_proto_addr;
- uint32_t ace_proto_addr_length;
- uchar_t *ace_proto_mask; /* Mask for matching addr */
- uchar_t *ace_proto_extract_mask; /* For mappings */
- uchar_t *ace_hw_addr;
- uint32_t ace_hw_addr_length;
- uint32_t ace_hw_extract_start; /* For mappings */
- mblk_t *ace_mp; /* mblk we are in */
- mblk_t *ace_query_mp; /* outstanding query chain */
- clock_t ace_last_bcast; /* last broadcast Response */
- clock_t ace_xmit_interval;
- int ace_xmit_count;
- arl_t *ace_xmit_arl; /* xmit on this arl */
-} ace_t;
-
-#define ARPHOOK_INTERESTED_PHYSICAL_IN(as) \
- (as->as_arp_physical_in_event.he_interested)
-#define ARPHOOK_INTERESTED_PHYSICAL_OUT(as) \
- (as->as_arp_physical_out_event.he_interested)
-
-#define ARP_HOOK_IN(_hook, _event, _ilp, _hdr, _fm, _m, as) \
- \
- if ((_hook).he_interested) { \
- hook_pkt_event_t info; \
- \
- info.hpe_protocol = as->as_net_data; \
- info.hpe_ifp = _ilp; \
- info.hpe_ofp = 0; \
- info.hpe_hdr = _hdr; \
- info.hpe_mp = &(_fm); \
- info.hpe_mb = _m; \
- if (hook_run(as->as_net_data->netd_hooks, \
- _event, (hook_data_t)&info) != 0) { \
- if (_fm != NULL) { \
- freemsg(_fm); \
- _fm = NULL; \
- } \
- _hdr = NULL; \
- _m = NULL; \
- } else { \
- _hdr = info.hpe_hdr; \
- _m = info.hpe_mb; \
- } \
- }
-
-#define ARP_HOOK_OUT(_hook, _event, _olp, _hdr, _fm, _m, as) \
- \
- if ((_hook).he_interested) { \
- hook_pkt_event_t info; \
- \
- info.hpe_protocol = as->as_net_data; \
- info.hpe_ifp = 0; \
- info.hpe_ofp = _olp; \
- info.hpe_hdr = _hdr; \
- info.hpe_mp = &(_fm); \
- info.hpe_mb = _m; \
- if (hook_run(as->as_net_data->netd_hooks, \
- _event, (hook_data_t)&info) != 0) { \
- if (_fm != NULL) { \
- freemsg(_fm); \
- _fm = NULL; \
- } \
- _hdr = NULL; \
- _m = NULL; \
- } else { \
- _hdr = info.hpe_hdr; \
- _m = info.hpe_mb; \
- } \
- }
-
-#define ACE_EXTERNAL_FLAGS_MASK \
- (ACE_F_PERMANENT | ACE_F_PUBLISH | ACE_F_MAPPING | ACE_F_MYADDR | \
- ACE_F_AUTHORITY)
-
-/*
- * ARP stack instances
- */
-struct arp_stack {
- netstack_t *as_netstack; /* Common netstack */
- void *as_head; /* AR Instance Data List Head */
- caddr_t as_nd; /* AR Named Dispatch Head */
- struct arl_s *as_arl_head; /* ARL List Head */
- arpparam_t *as_param_arr; /* ndd variable table */
-
- /* ARP Cache Entry Hash Table */
- ace_t *as_ce_hash_tbl[ARP_HASH_SIZE];
- ace_t *as_ce_mask_entries;
-
- /*
- * With the introduction of netinfo (neti kernel module),
- * it is now possible to access data structures in the ARP module
- * without the code being executed in the context of the IP module,
- * thus there is no locking being enforced through the use of STREAMS.
- * as_arl_lock is used to protect as_arl_head list.
- */
- krwlock_t as_arl_lock;
-
- uint32_t as_arp_index_counter;
- uint32_t as_arp_counter_wrapped;
-
- /* arp_neti.c */
- hook_family_t as_arproot;
-
- /*
- * Hooks for ARP
- */
- hook_event_t as_arp_physical_in_event;
- hook_event_t as_arp_physical_out_event;
- hook_event_t as_arp_nic_events;
-
- hook_event_token_t as_arp_physical_in;
- hook_event_token_t as_arp_physical_out;
- hook_event_token_t as_arpnicevents;
-
- net_handle_t as_net_data;
-};
-typedef struct arp_stack arp_stack_t;
-
-#define ARL_F_NOARP 0x01
-#define ARL_F_IPMP 0x02
-
-#define ARL_S_DOWN 0x00
-#define ARL_S_PENDING 0x01
-#define ARL_S_UP 0x02
-
-/* AR Structure, one per upper stream */
-typedef struct ar_s {
- queue_t *ar_rq; /* Read queue pointer */
- queue_t *ar_wq; /* Write queue pointer */
- arl_t *ar_arl; /* Associated arl */
- cred_t *ar_credp; /* Credentials associated w/ open */
- struct ar_s *ar_arl_ip_assoc; /* ARL - IP association */
- uint32_t
- ar_ip_acked_close : 1, /* IP has acked the close */
- ar_on_ill_stream : 1; /* Module below is IP */
- arp_stack_t *ar_as;
-} ar_t;
-
-extern void arp_hook_init(arp_stack_t *);
-extern void arp_hook_destroy(arp_stack_t *);
-extern void arp_net_init(arp_stack_t *, netstackid_t);
-extern void arp_net_shutdown(arp_stack_t *);
-extern void arp_net_destroy(arp_stack_t *);
-
-#endif /* _KERNEL */
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _ARP_IMPL_H */
diff --git a/usr/src/uts/common/inet/ip.h b/usr/src/uts/common/inet/ip.h
index 5a7e05b..88a1406 100644
--- a/usr/src/uts/common/inet/ip.h
+++ b/usr/src/uts/common/inet/ip.h
@@ -55,8 +55,6 @@
#include <sys/squeue.h>
#include <net/route.h>
#include <sys/systm.h>
-#include <sys/multidata.h>
-#include <sys/list.h>
#include <net/radix.h>
#include <sys/modhash.h>
@@ -94,6 +92,7 @@
/* Number of bits in an address */
#define IP_ABITS 32
+#define IPV4_ABITS IP_ABITS
#define IPV6_ABITS 128
#define IP_HOST_MASK (ipaddr_t)0xffffffffU
@@ -101,14 +100,6 @@
#define IP_CSUM(mp, off, sum) (~ip_cksum(mp, off, sum) & 0xFFFF)
#define IP_CSUM_PARTIAL(mp, off, sum) ip_cksum(mp, off, sum)
#define IP_BCSUM_PARTIAL(bp, len, sum) bcksum(bp, len, sum)
-#define IP_MD_CSUM(pd, off, sum) (~ip_md_cksum(pd, off, sum) & 0xffff)
-#define IP_MD_CSUM_PARTIAL(pd, off, sum) ip_md_cksum(pd, off, sum)
-
-/*
- * Flag to IP write side to indicate that the appln has sent in a pre-built
- * IP header. Stored in ipha_ident (which is otherwise zero).
- */
-#define IP_HDR_INCLUDED 0xFFFF
#define ILL_FRAG_HASH_TBL_COUNT ((unsigned int)64)
#define ILL_FRAG_HASH_TBL_SIZE (ILL_FRAG_HASH_TBL_COUNT * sizeof (ipfb_t))
@@ -137,17 +128,12 @@
#define UDPH_SIZE 8
-/* Leave room for ip_newroute to tack on the src and target addresses */
-#define OK_RESOLVER_MP(mp) \
- ((mp) && ((mp)->b_wptr - (mp)->b_rptr) >= (2 * IP_ADDR_LEN))
-
/*
* Constants and type definitions to support IP IOCTL commands
*/
#define IP_IOCTL (('i'<<8)|'p')
#define IP_IOC_IRE_DELETE 4
#define IP_IOC_IRE_DELETE_NO_REPLY 5
-#define IP_IOC_IRE_ADVISE_NO_REPLY 6
#define IP_IOC_RTS_REQUEST 7
/* Common definitions used by IP IOCTL data structures */
@@ -157,31 +143,6 @@
uint_t ipllc_name_length;
} ipllc_t;
-/* IP IRE Change Command Structure. */
-typedef struct ipic_s {
- ipllc_t ipic_ipllc;
- uint_t ipic_ire_type;
- uint_t ipic_max_frag;
- uint_t ipic_addr_offset;
- uint_t ipic_addr_length;
- uint_t ipic_mask_offset;
- uint_t ipic_mask_length;
- uint_t ipic_src_addr_offset;
- uint_t ipic_src_addr_length;
- uint_t ipic_ll_hdr_offset;
- uint_t ipic_ll_hdr_length;
- uint_t ipic_gateway_addr_offset;
- uint_t ipic_gateway_addr_length;
- clock_t ipic_rtt;
- uint32_t ipic_ssthresh;
- clock_t ipic_rtt_sd;
- uchar_t ipic_ire_marks;
-} ipic_t;
-
-#define ipic_cmd ipic_ipllc.ipllc_cmd
-#define ipic_ll_name_length ipic_ipllc.ipllc_name_length
-#define ipic_ll_name_offset ipic_ipllc.ipllc_name_offset
-
/* IP IRE Delete Command Structure. */
typedef struct ipid_s {
ipllc_t ipid_ipllc;
@@ -257,16 +218,8 @@
#define Q_TO_ICMP(q) (Q_TO_CONN((q))->conn_icmp)
#define Q_TO_RTS(q) (Q_TO_CONN((q))->conn_rts)
-/*
- * The following two macros are used by IP to get the appropriate
- * wq and rq for a conn. If it is a TCP conn, then we need
- * tcp_wq/tcp_rq else, conn_wq/conn_rq. IP can use conn_wq and conn_rq
- * from a conn directly if it knows that the conn is not TCP.
- */
-#define CONNP_TO_WQ(connp) \
- (IPCL_IS_TCP(connp) ? (connp)->conn_tcp->tcp_wq : (connp)->conn_wq)
-
-#define CONNP_TO_RQ(connp) RD(CONNP_TO_WQ(connp))
+#define CONNP_TO_WQ(connp) ((connp)->conn_wq)
+#define CONNP_TO_RQ(connp) ((connp)->conn_rq)
#define GRAB_CONN_LOCK(q) { \
if (q != NULL && CONN_Q(q)) \
@@ -278,9 +231,6 @@
mutex_exit(&(Q_TO_CONN(q))->conn_lock); \
}
-/* "Congestion controlled" protocol */
-#define IP_FLOW_CONTROLLED_ULP(p) ((p) == IPPROTO_TCP || (p) == IPPROTO_SCTP)
-
/*
* Complete the pending operation. Usually an ioctl. Can also
* be a bind or option management request that got enqueued
@@ -295,63 +245,13 @@
}
/*
- * Flags for the various ip_fanout_* routines.
- */
-#define IP_FF_SEND_ICMP 0x01 /* Send an ICMP error */
-#define IP_FF_HDR_COMPLETE 0x02 /* Call ip_hdr_complete if error */
-#define IP_FF_CKSUM 0x04 /* Recompute ipha_cksum if error */
-#define IP_FF_RAWIP 0x08 /* Use rawip mib variable */
-#define IP_FF_SRC_QUENCH 0x10 /* OK to send ICMP_SOURCE_QUENCH */
-#define IP_FF_SYN_ADDIRE 0x20 /* Add IRE if TCP syn packet */
-#define IP_FF_IPINFO 0x80 /* Used for both V4 and V6 */
-#define IP_FF_SEND_SLLA 0x100 /* Send source link layer info ? */
-#define IPV6_REACHABILITY_CONFIRMATION 0x200 /* Flags for ip_xmit_v6 */
-#define IP_FF_NO_MCAST_LOOP 0x400 /* No multicasts for sending zone */
-
-/*
- * Following flags are used by IPQoS to determine if policy processing is
- * required.
- */
-#define IP6_NO_IPPOLICY 0x800 /* Don't do IPQoS processing */
-#define IP6_IN_LLMCAST 0x1000 /* Multicast */
-
-#define IP_FF_LOOPBACK 0x2000 /* Loopback fanout */
-#define IP_FF_SCTP_CSUM_ERR 0x4000 /* sctp pkt has failed chksum */
-
-#ifndef IRE_DB_TYPE
-#define IRE_DB_TYPE M_SIG
-#endif
-
-#ifndef IRE_DB_REQ_TYPE
-#define IRE_DB_REQ_TYPE M_PCSIG
-#endif
-
-#ifndef IRE_ARPRESOLVE_TYPE
-#define IRE_ARPRESOLVE_TYPE M_EVENT
-#endif
-
-/*
* Values for squeue switch:
*/
-
#define IP_SQUEUE_ENTER_NODRAIN 1
#define IP_SQUEUE_ENTER 2
-/*
- * This is part of the interface between Transport provider and
- * IP which can be used to set policy information. This is usually
- * accompanied with O_T_BIND_REQ/T_BIND_REQ.ip_bind assumes that
- * only IPSEC_POLICY_SET is there when it is found in the chain.
- * The information contained is an struct ipsec_req_t. On success
- * or failure, either the T_BIND_ACK or the T_ERROR_ACK is returned.
- * IPSEC_POLICY_SET is never returned.
- */
-#define IPSEC_POLICY_SET M_SETOPTS
+#define IP_SQUEUE_FILL 3
-#define IRE_IS_LOCAL(ire) ((ire != NULL) && \
- ((ire)->ire_type & (IRE_LOCAL | IRE_LOOPBACK)))
-
-#define IRE_IS_TARGET(ire) ((ire != NULL) && \
- ((ire)->ire_type != IRE_BROADCAST))
+extern int ip_squeue_flag;
/* IP Fragmentation Reassembly Header */
typedef struct ipf_s {
@@ -387,71 +287,6 @@
#define ipf_src V4_PART_OF_V6(ipf_v6src)
#define ipf_dst V4_PART_OF_V6(ipf_v6dst)
-typedef enum {
- IB_PKT = 0x01,
- OB_PKT = 0x02
-} ip_pkt_t;
-
-#define UPDATE_IB_PKT_COUNT(ire)\
- { \
- (ire)->ire_ib_pkt_count++; \
- if ((ire)->ire_ipif != NULL) { \
- /* \
- * forwarding packet \
- */ \
- if ((ire)->ire_type & (IRE_LOCAL|IRE_BROADCAST)) \
- atomic_add_32(&(ire)->ire_ipif->ipif_ib_pkt_count, 1);\
- else \
- atomic_add_32(&(ire)->ire_ipif->ipif_fo_pkt_count, 1);\
- } \
- }
-
-#define UPDATE_OB_PKT_COUNT(ire)\
- { \
- (ire)->ire_ob_pkt_count++;\
- if ((ire)->ire_ipif != NULL) { \
- atomic_add_32(&(ire)->ire_ipif->ipif_ob_pkt_count, 1); \
- } \
- }
-
-#define IP_RPUT_LOCAL(q, mp, ipha, ire, recv_ill) \
-{ \
- switch (ipha->ipha_protocol) { \
- case IPPROTO_UDP: \
- ip_udp_input(q, mp, ipha, ire, recv_ill); \
- break; \
- default: \
- ip_proto_input(q, mp, ipha, ire, recv_ill, 0); \
- break; \
- } \
-}
-
-/*
- * NCE_EXPIRED is TRUE when we have a non-permanent nce that was
- * found to be REACHABLE more than ip_ire_arp_interval ms ago.
- * This macro is used to age existing nce_t entries. The
- * nce's will get cleaned up in the following circumstances:
- * - ip_ire_trash_reclaim will free nce's using ndp_cache_reclaim
- * when memory is low,
- * - ip_arp_news, when updates are received.
- * - if the nce is NCE_EXPIRED(), it will deleted, so that a new
- * arp request will need to be triggered from an ND_INITIAL nce.
- *
- * Note that the nce state transition follows the pattern:
- * ND_INITIAL -> ND_INCOMPLETE -> ND_REACHABLE
- * after which the nce is deleted when it has expired.
- *
- * nce_last is the timestamp that indicates when the nce_res_mp in the
- * nce_t was last updated to a valid link-layer address. nce_last gets
- * modified/updated :
- * - when the nce is created
- * - every time we get a sane arp response for the nce.
- */
-#define NCE_EXPIRED(nce, ipst) (nce->nce_last > 0 && \
- ((nce->nce_flags & NCE_F_PERMANENT) == 0) && \
- ((TICK_TO_MSEC(lbolt64) - nce->nce_last) > \
- (ipst)->ips_ip_ire_arp_interval))
-
#endif /* _KERNEL */
/* ICMP types */
@@ -560,7 +395,17 @@
#define IPH_DF 0x4000 /* Don't fragment */
#define IPH_MF 0x2000 /* More fragments to come */
#define IPH_OFFSET 0x1FFF /* Where the offset lives */
-#define IPH_FRAG_HDR 0x8000 /* IPv6 don't fragment bit */
+
+/* Byte-order specific values */
+#ifdef _BIG_ENDIAN
+#define IPH_DF_HTONS 0x4000 /* Don't fragment */
+#define IPH_MF_HTONS 0x2000 /* More fragments to come */
+#define IPH_OFFSET_HTONS 0x1FFF /* Where the offset lives */
+#else
+#define IPH_DF_HTONS 0x0040 /* Don't fragment */
+#define IPH_MF_HTONS 0x0020 /* More fragments to come */
+#define IPH_OFFSET_HTONS 0xFF1F /* Where the offset lives */
+#endif
/* ECN code points for IPv4 TOS byte and IPv6 traffic class octet. */
#define IPH_ECN_NECT 0x0 /* Not ECN-Capable Transport */
@@ -571,10 +416,8 @@
struct ill_s;
typedef void ip_v6intfid_func_t(struct ill_s *, in6_addr_t *);
-typedef boolean_t ip_v6mapinfo_func_t(uint_t, uint8_t *, uint8_t *, uint32_t *,
- in6_addr_t *);
-typedef boolean_t ip_v4mapinfo_func_t(uint_t, uint8_t *, uint8_t *, uint32_t *,
- ipaddr_t *);
+typedef void ip_v6mapinfo_func_t(struct ill_s *, uchar_t *, uchar_t *);
+typedef void ip_v4mapinfo_func_t(struct ill_s *, uchar_t *, uchar_t *);
/* IP Mac info structure */
typedef struct ip_m_s {
@@ -582,8 +425,8 @@
int ip_m_type; /* From <net/if_types.h> */
t_uscalar_t ip_m_ipv4sap;
t_uscalar_t ip_m_ipv6sap;
- ip_v4mapinfo_func_t *ip_m_v4mapinfo;
- ip_v6mapinfo_func_t *ip_m_v6mapinfo;
+ ip_v4mapinfo_func_t *ip_m_v4mapping;
+ ip_v6mapinfo_func_t *ip_m_v6mapping;
ip_v6intfid_func_t *ip_m_v6intfid;
ip_v6intfid_func_t *ip_m_v6destintfid;
} ip_m_t;
@@ -591,20 +434,14 @@
/*
* The following functions attempt to reduce the link layer dependency
* of the IP stack. The current set of link specific operations are:
- * a. map from IPv4 class D (224.0/4) multicast address range to the link
- * layer multicast address range.
- * b. map from IPv6 multicast address range (ff00::/8) to the link
- * layer multicast address range.
- * c. derive the default IPv6 interface identifier from the interface.
- * d. derive the default IPv6 destination interface identifier from
+ * a. map from IPv4 class D (224.0/4) multicast address range or the
+ * IPv6 multicast address range (ff00::/8) to the link layer multicast
+ * address.
+ * b. derive the default IPv6 interface identifier from the interface.
+ * c. derive the default IPv6 destination interface identifier from
* the interface (point-to-point only).
*/
-#define MEDIA_V4MINFO(ip_m, plen, bphys, maddr, hwxp, v4ptr) \
- (((ip_m)->ip_m_v4mapinfo != NULL) && \
- (*(ip_m)->ip_m_v4mapinfo)(plen, bphys, maddr, hwxp, v4ptr))
-#define MEDIA_V6MINFO(ip_m, plen, bphys, maddr, hwxp, v6ptr) \
- (((ip_m)->ip_m_v6mapinfo != NULL) && \
- (*(ip_m)->ip_m_v6mapinfo)(plen, bphys, maddr, hwxp, v6ptr))
+extern void ip_mcast_mapping(struct ill_s *, uchar_t *, uchar_t *);
/* ip_m_v6*intfid return void and are never NULL */
#define MEDIA_V6INTFID(ip_m, ill, v6ptr) (ip_m)->ip_m_v6intfid(ill, v6ptr)
#define MEDIA_V6DESTINTFID(ip_m, ill, v6ptr) \
@@ -616,107 +453,38 @@
#define IRE_LOCAL 0x0004 /* Route entry for local address */
#define IRE_LOOPBACK 0x0008 /* Route entry for loopback address */
#define IRE_PREFIX 0x0010 /* Route entry for prefix routes */
+#ifndef _KERNEL
+/* Keep so user-level still compiles */
#define IRE_CACHE 0x0020 /* Cached Route entry */
+#endif
#define IRE_IF_NORESOLVER 0x0040 /* Route entry for local interface */
/* net without any address mapping. */
#define IRE_IF_RESOLVER 0x0080 /* Route entry for local interface */
/* net with resolver. */
#define IRE_HOST 0x0100 /* Host route entry */
+/* Keep so user-level still compiles */
#define IRE_HOST_REDIRECT 0x0200 /* only used for T_SVR4_OPTMGMT_REQ */
+#define IRE_IF_CLONE 0x0400 /* Per host clone of IRE_IF */
+#define IRE_MULTICAST 0x0800 /* Special - not in table */
+#define IRE_NOROUTE 0x1000 /* Special - not in table */
#define IRE_INTERFACE (IRE_IF_NORESOLVER | IRE_IF_RESOLVER)
+
+#define IRE_IF_ALL (IRE_IF_NORESOLVER | IRE_IF_RESOLVER | \
+ IRE_IF_CLONE)
#define IRE_OFFSUBNET (IRE_DEFAULT | IRE_PREFIX | IRE_HOST)
-#define IRE_CACHETABLE (IRE_CACHE | IRE_BROADCAST | IRE_LOCAL | \
- IRE_LOOPBACK)
-#define IRE_FORWARDTABLE (IRE_INTERFACE | IRE_OFFSUBNET)
-
+#define IRE_OFFLINK IRE_OFFSUBNET
/*
- * If an IRE is marked with IRE_MARK_CONDEMNED, the last walker of
- * the bucket should delete this IRE from this bucket.
+ * Note that we view IRE_NOROUTE as ONLINK since we can "send" to them without
+ * going through a router; the result of sending will be an error/icmp error.
*/
-#define IRE_MARK_CONDEMNED 0x0001
-
-/*
- * An IRE with IRE_MARK_PMTU has ire_max_frag set from an ICMP error.
- */
-#define IRE_MARK_PMTU 0x0002
-
-/*
- * An IRE with IRE_MARK_TESTHIDDEN is used by in.mpathd for test traffic. It
- * can only be looked up by requesting MATCH_IRE_MARK_TESTHIDDEN.
- */
-#define IRE_MARK_TESTHIDDEN 0x0004
-
-/*
- * An IRE with IRE_MARK_NOADD is created in ip_newroute_ipif when the outgoing
- * interface is specified by e.g. IP_PKTINFO. The IRE is not added to the IRE
- * cache table.
- */
-#define IRE_MARK_NOADD 0x0008 /* Mark not to add ire in cache */
-
-/*
- * IRE marked with IRE_MARK_TEMPORARY means that this IRE has been used
- * either for forwarding a packet or has not been used for sending
- * traffic on TCP connections terminated on this system. In both
- * cases, this IRE is the first to go when IRE is being cleaned up.
- */
-#define IRE_MARK_TEMPORARY 0x0010
-
-/*
- * IRE marked with IRE_MARK_USESRC_CHECK means that while adding an IRE with
- * this mark, additional atomic checks need to be performed. For eg: by the
- * time an IRE_CACHE is created, sent up to ARP and then comes back to IP; the
- * usesrc grouping could have changed in which case we want to fail adding
- * the IRE_CACHE entry
- */
-#define IRE_MARK_USESRC_CHECK 0x0020
-
-/*
- * IRE_MARK_PRIVATE_ADDR is used for IP_NEXTHOP. When IP_NEXTHOP is set, the
- * routing table lookup for the destination is bypassed and the packet is
- * sent directly to the specified nexthop. The associated IRE_CACHE entries
- * should be marked with IRE_MARK_PRIVATE_ADDR flag so that they don't show up
- * in regular ire cache lookups.
- */
-#define IRE_MARK_PRIVATE_ADDR 0x0040
-
-/*
- * When we send an ARP resolution query for the nexthop gateway's ire,
- * we use esballoc to create the ire_t in the AR_ENTRY_QUERY mblk
- * chain, and mark its ire_marks with IRE_MARK_UNCACHED. This flag
- * indicates that information from ARP has not been transferred to a
- * permanent IRE_CACHE entry. The flag is reset only when the
- * information is successfully transferred to an ire_cache entry (in
- * ire_add()). Attempting to free the AR_ENTRY_QUERY mblk chain prior
- * to ire_add (e.g., from arp, or from ip`ip_wput_nondata) will
- * require that the resources (incomplete ire_cache and/or nce) must
- * be cleaned up. The free callback routine (ire_freemblk()) checks
- * for IRE_MARK_UNCACHED to see if any resources that are pinned down
- * will need to be cleaned up or not.
- */
-
-#define IRE_MARK_UNCACHED 0x0080
-
-/*
- * The comment below (and for other netstack_t references) refers
- * to the fact that we only do netstack_hold in particular cases,
- * such as the references from open streams (ill_t and conn_t's
- * pointers). Internally within IP we rely on IP's ability to cleanup e.g.
- * ire_t's when an ill goes away.
- */
-typedef struct ire_expire_arg_s {
- int iea_flush_flag;
- ip_stack_t *iea_ipst; /* Does not have a netstack_hold */
-} ire_expire_arg_t;
-
-/* Flags with ire_expire routine */
-#define FLUSH_ARP_TIME 0x0001 /* ARP info potentially stale timer */
-#define FLUSH_REDIRECT_TIME 0x0002 /* Redirects potentially stale */
-#define FLUSH_MTU_TIME 0x0004 /* Include path MTU per RFC 1191 */
+#define IRE_ONLINK (IRE_IF_ALL|IRE_LOCAL|IRE_LOOPBACK| \
+ IRE_BROADCAST|IRE_MULTICAST|IRE_NOROUTE)
/* Arguments to ire_flush_cache() */
#define IRE_FLUSH_DELETE 0
#define IRE_FLUSH_ADD 1
+#define IRE_FLUSH_GWCHANGE 2
/*
* Open/close synchronization flags.
@@ -724,31 +492,21 @@
* depends on the atomic 32 bit access to that field.
*/
#define CONN_CLOSING 0x01 /* ip_close waiting for ip_wsrv */
-#define CONN_IPSEC_LOAD_WAIT 0x02 /* waiting for load */
-#define CONN_CONDEMNED 0x04 /* conn is closing, no more refs */
-#define CONN_INCIPIENT 0x08 /* conn not yet visible, no refs */
-#define CONN_QUIESCED 0x10 /* conn is now quiescent */
-
-/* Used to check connection state flags before caching the IRE */
-#define CONN_CACHE_IRE(connp) \
- (!((connp)->conn_state_flags & (CONN_CLOSING|CONN_CONDEMNED)))
+#define CONN_CONDEMNED 0x02 /* conn is closing, no more refs */
+#define CONN_INCIPIENT 0x04 /* conn not yet visible, no refs */
+#define CONN_QUIESCED 0x08 /* conn is now quiescent */
+#define CONN_UPDATE_ILL 0x10 /* conn_update_ill in progress */
/*
- * Parameter to ip_output giving the identity of the caller.
- * IP_WSRV means the packet was enqueued in the STREAMS queue
- * due to flow control and is now being reprocessed in the context of
- * the STREAMS service procedure, consequent to flow control relief.
- * IRE_SEND means the packet is being reprocessed consequent to an
- * ire cache creation and addition and this may or may not be happening
- * in the service procedure context. Anything other than the above 2
- * cases is identified as IP_WPUT. Most commonly this is the case of
- * packets coming down from the application.
+ * Flags for dce_flags field. Specifies which information has been set.
+ * dce_ident is always present, but the other ones are identified by the flags.
*/
-#ifdef _KERNEL
-#define IP_WSRV 1 /* Called from ip_wsrv */
-#define IP_WPUT 2 /* Called from ip_wput */
-#define IRE_SEND 3 /* Called from ire_send */
+#define DCEF_DEFAULT 0x0001 /* Default DCE - no pmtu or uinfo */
+#define DCEF_PMTU 0x0002 /* Different than interface MTU */
+#define DCEF_UINFO 0x0004 /* dce_uinfo set */
+#define DCEF_TOO_SMALL_PMTU 0x0008 /* Smaller than IPv4/IPv6 MIN */
+#ifdef _KERNEL
/*
* Extra structures need for per-src-addr filtering (IGMPv3/MLDv2)
*/
@@ -786,90 +544,80 @@
} mrec_t;
/* Group membership list per upper conn */
+
/*
- * XXX add ilg info for ifaddr/ifindex.
- * XXX can we make ilg survive an ifconfig unplumb + plumb
- * by setting the ipif/ill to NULL and recover that later?
+ * We record the multicast information from the socket option in
+ * ilg_ifaddr/ilg_ifindex. This allows rejoining the group in the case when
+ * the ifaddr (or ifindex) disappears and later reappears, potentially on
+ * a different ill. The IPv6 multicast socket options and ioctls all specify
+ * the interface using an ifindex. For IPv4 some socket options/ioctls use
+ * the interface address and others use the index. We record here the method
+ * that was actually used (and leave the other of ilg_ifaddr or ilg_ifindex
+ * at zero) so that we can rejoin the way the application intended.
*
- * ilg_ipif is used by IPv4 as multicast groups are joined using an interface
- * address (ipif).
- * ilg_ill is used by IPv6 as multicast groups are joined using an interface
- * index (phyint->phyint_ifindex).
- * ilg_ill is NULL for IPv4 and ilg_ipif is NULL for IPv6.
+ * We track the ill on which we will or already have joined an ilm using
+ * ilg_ill. When we have succeeded joining the ilm and have a refhold on it
+ * then we set ilg_ilm. Thus intentionally there is a window where ilg_ill is
+ * set and ilg_ilm is not set. This allows clearing ilg_ill as a signal that
+ * the ill is being unplumbed and the ilm should be discarded.
*
* ilg records the state of multicast memberships of a socket end point.
* ilm records the state of multicast memberships with the driver and is
* maintained per interface.
*
- * There is no direct link between a given ilg and ilm. If the
- * application has joined a group G with ifindex I, we will have
- * an ilg with ilg_v6group and ilg_ill. There will be a corresponding
- * ilm with ilm_ill/ilm_v6addr recording the multicast membership.
- * To delete the membership:
- *
- * a) Search for ilg matching on G and I with ilg_v6group
- * and ilg_ill. Delete ilg_ill.
- * b) Search the corresponding ilm matching on G and I with
- * ilm_v6addr and ilm_ill. Delete ilm.
- *
- * For IPv4 the only difference is that we look using ipifs, not ills.
+ * The ilg state is protected by conn_ilg_lock.
+ * The ilg will not be freed until ilg_refcnt drops to zero.
*/
-
-/*
- * The ilg_t and ilm_t members are protected by ipsq. They can be changed only
- * by a thread executing in the ipsq. In other words add/delete of a
- * multicast group has to execute in the ipsq.
- */
-#define ILG_DELETED 0x1 /* ilg_flags */
typedef struct ilg_s {
+ struct ilg_s *ilg_next;
+ struct ilg_s **ilg_ptpn;
+ struct conn_s *ilg_connp; /* Back pointer to get lock */
in6_addr_t ilg_v6group;
- struct ipif_s *ilg_ipif; /* Logical interface we are member on */
- struct ill_s *ilg_ill; /* Used by IPv6 */
- uint_t ilg_flags;
+ ipaddr_t ilg_ifaddr; /* For some IPv4 cases */
+ uint_t ilg_ifindex; /* IPv6 and some other IPv4 cases */
+ struct ill_s *ilg_ill; /* Where ilm is joined. No refhold */
+ struct ilm_s *ilg_ilm; /* With ilm_refhold */
+ uint_t ilg_refcnt;
mcast_record_t ilg_fmode; /* MODE_IS_INCLUDE/MODE_IS_EXCLUDE */
slist_t *ilg_filter;
+ boolean_t ilg_condemned; /* Conceptually deleted */
} ilg_t;
/*
* Multicast address list entry for ill.
- * ilm_ipif is used by IPv4 as multicast groups are joined using ipif.
- * ilm_ill is used by IPv6 as multicast groups are joined using ill.
- * ilm_ill is NULL for IPv4 and ilm_ipif is NULL for IPv6.
+ * ilm_ill is used by IPv4 and IPv6
+ *
+ * The ilm state (and other multicast state on the ill) is protected by
+ * ill_mcast_lock. Operations that change state on both an ilg and ilm
+ * in addition use ill_mcast_serializer to ensure that we can't have
+ * interleaving between e.g., add and delete operations for the same conn_t,
+ * group, and ill.
*
* The comment below (and for other netstack_t references) refers
* to the fact that we only do netstack_hold in particular cases,
- * such as the references from open streams (ill_t and conn_t's
+ * such as the references from open endpoints (ill_t and conn_t's
* pointers). Internally within IP we rely on IP's ability to cleanup e.g.
* ire_t's when an ill goes away.
*/
-#define ILM_DELETED 0x1 /* ilm_flags */
typedef struct ilm_s {
in6_addr_t ilm_v6addr;
int ilm_refcnt;
uint_t ilm_timer; /* IGMP/MLD query resp timer, in msec */
- struct ipif_s *ilm_ipif; /* Back pointer to ipif for IPv4 */
struct ilm_s *ilm_next; /* Linked list for each ill */
uint_t ilm_state; /* state of the membership */
- struct ill_s *ilm_ill; /* Back pointer to ill for IPv6 */
- uint_t ilm_flags;
- boolean_t ilm_notify_driver; /* Need to notify the driver */
+ struct ill_s *ilm_ill; /* Back pointer to ill - ill_ilm_cnt */
zoneid_t ilm_zoneid;
int ilm_no_ilg_cnt; /* number of joins w/ no ilg */
mcast_record_t ilm_fmode; /* MODE_IS_INCLUDE/MODE_IS_EXCLUDE */
slist_t *ilm_filter; /* source filter list */
slist_t *ilm_pendsrcs; /* relevant src addrs for pending req */
rtx_state_t ilm_rtx; /* SCR retransmission state */
+ ipaddr_t ilm_ifaddr; /* For IPv4 netstat */
ip_stack_t *ilm_ipst; /* Does not have a netstack_hold */
} ilm_t;
#define ilm_addr V4_PART_OF_V6(ilm_v6addr)
-typedef struct ilm_walker {
- struct ill_s *ilw_ill; /* associated ill */
- struct ill_s *ilw_ipmp_ill; /* associated ipmp ill (if any) */
- struct ill_s *ilw_walk_ill; /* current ill being walked */
-} ilm_walker_t;
-
/*
* Soft reference to an IPsec SA.
*
@@ -898,40 +646,28 @@
* In the presence of IPsec policy, fully-bound conn's bind a connection
* to more than just the 5-tuple, but also a specific IPsec action and
* identity-pair.
+ * The identity pair is accessed from both the receive and transmit sides,
+ * hence it is maintained in the ipsec_latch_t structure. conn_latch and
+ * ixa_ipsec_latch point to it.
+ * The policy and actions are stored in conn_latch_in_policy and
+ * conn_latch_in_action for the inbound side, and in ixa_ipsec_policy and
+ * ixa_ipsec_action for the transmit side.
*
- * As an optimization, we also cache soft references to IPsec SA's
- * here so that we can fast-path around most of the work needed for
+ * As an optimization, we also cache soft references to IPsec SA's in
+ * ip_xmit_attr_t so that we can fast-path around most of the work needed for
* outbound IPsec SA selection.
- *
- * Were it not for TCP's detached connections, this state would be
- * in-line in conn_t; instead, this is in a separate structure so it
- * can be handed off to TCP when a connection is detached.
*/
typedef struct ipsec_latch_s
{
kmutex_t ipl_lock;
uint32_t ipl_refcnt;
- uint64_t ipl_unique;
- struct ipsec_policy_s *ipl_in_policy; /* latched policy (in) */
- struct ipsec_policy_s *ipl_out_policy; /* latched policy (out) */
- struct ipsec_action_s *ipl_in_action; /* latched action (in) */
- struct ipsec_action_s *ipl_out_action; /* latched action (out) */
- cred_t *ipl_local_id;
struct ipsid_s *ipl_local_cid;
struct ipsid_s *ipl_remote_cid;
unsigned int
- ipl_out_action_latched : 1,
- ipl_in_action_latched : 1,
- ipl_out_policy_latched : 1,
- ipl_in_policy_latched : 1,
-
ipl_ids_latched : 1,
- ipl_pad_to_bit_31 : 27;
-
- ipsa_ref_t ipl_ref[2]; /* 0: ESP, 1: AH */
-
+ ipl_pad_to_bit_31 : 31;
} ipsec_latch_t;
#define IPLATCH_REFHOLD(ipl) { \
@@ -939,97 +675,19 @@
ASSERT((ipl)->ipl_refcnt != 0); \
}
-#define IPLATCH_REFRELE(ipl, ns) { \
+#define IPLATCH_REFRELE(ipl) { \
ASSERT((ipl)->ipl_refcnt != 0); \
membar_exit(); \
if (atomic_add_32_nv(&(ipl)->ipl_refcnt, -1) == 0) \
- iplatch_free(ipl, ns); \
+ iplatch_free(ipl); \
}
/*
* peer identity structure.
*/
-
typedef struct conn_s conn_t;
/*
- * The old IP client structure "ipc_t" is gone. All the data is stored in the
- * connection structure "conn_t" now. The mapping of old and new fields looks
- * like this:
- *
- * ipc_ulp conn_ulp
- * ipc_rq conn_rq
- * ipc_wq conn_wq
- *
- * ipc_laddr conn_src
- * ipc_faddr conn_rem
- * ipc_v6laddr conn_srcv6
- * ipc_v6faddr conn_remv6
- *
- * ipc_lport conn_lport
- * ipc_fport conn_fport
- * ipc_ports conn_ports
- *
- * ipc_policy conn_policy
- * ipc_latch conn_latch
- *
- * ipc_irc_lock conn_lock
- * ipc_ire_cache conn_ire_cache
- *
- * ipc_state_flags conn_state_flags
- * ipc_outgoing_ill conn_outgoing_ill
- *
- * ipc_dontroute conn_dontroute
- * ipc_loopback conn_loopback
- * ipc_broadcast conn_broadcast
- * ipc_reuseaddr conn_reuseaddr
- *
- * ipc_multicast_loop conn_multicast_loop
- * ipc_multi_router conn_multi_router
- * ipc_draining conn_draining
- *
- * ipc_did_putbq conn_did_putbq
- * ipc_unspec_src conn_unspec_src
- * ipc_policy_cached conn_policy_cached
- *
- * ipc_in_enforce_policy conn_in_enforce_policy
- * ipc_out_enforce_policy conn_out_enforce_policy
- * ipc_af_isv6 conn_af_isv6
- * ipc_pkt_isv6 conn_pkt_isv6
- *
- * ipc_ipv6_recvpktinfo conn_ipv6_recvpktinfo
- *
- * ipc_ipv6_recvhoplimit conn_ipv6_recvhoplimit
- * ipc_ipv6_recvhopopts conn_ipv6_recvhopopts
- * ipc_ipv6_recvdstopts conn_ipv6_recvdstopts
- *
- * ipc_ipv6_recvrthdr conn_ipv6_recvrthdr
- * ipc_ipv6_recvrtdstopts conn_ipv6_recvrtdstopts
- * ipc_fully_bound conn_fully_bound
- *
- * ipc_recvif conn_recvif
- *
- * ipc_recvslla conn_recvslla
- * ipc_acking_unbind conn_acking_unbind
- * ipc_pad_to_bit_31 conn_pad_to_bit_31
- *
- * ipc_proto conn_proto
- * ipc_incoming_ill conn_incoming_ill
- * ipc_pending_ill conn_pending_ill
- * ipc_unbind_mp conn_unbind_mp
- * ipc_ilg conn_ilg
- * ipc_ilg_allocated conn_ilg_allocated
- * ipc_ilg_inuse conn_ilg_inuse
- * ipc_ilg_walker_cnt conn_ilg_walker_cnt
- * ipc_refcv conn_refcv
- * ipc_multicast_ipif conn_multicast_ipif
- * ipc_multicast_ill conn_multicast_ill
- * ipc_drain_next conn_drain_next
- * ipc_drain_prev conn_drain_prev
- * ipc_idl conn_idl
- */
-
-/*
* This is used to match an inbound/outbound datagram with policy.
*/
typedef struct ipsec_selector {
@@ -1069,22 +727,6 @@
#define IPSEC_POLICY_MAX 5 /* Always max + 1. */
/*
- * Folowing macro is used whenever the code does not know whether there
- * is a M_CTL present in the front and it needs to examine the actual mp
- * i.e the IP header. As a M_CTL message could be in the front, this
- * extracts the packet into mp and the M_CTL mp into first_mp. If M_CTL
- * mp is not present, both first_mp and mp point to the same message.
- */
-#define EXTRACT_PKT_MP(mp, first_mp, mctl_present) \
- (first_mp) = (mp); \
- if ((mp)->b_datap->db_type == M_CTL) { \
- (mp) = (mp)->b_cont; \
- (mctl_present) = B_TRUE; \
- } else { \
- (mctl_present) = B_FALSE; \
- }
-
-/*
* Check with IPSEC inbound policy if
*
* 1) per-socket policy is present - indicated by conn_in_enforce_policy.
@@ -1113,11 +755,6 @@
/*
* Information cached in IRE for upper layer protocol (ULP).
- *
- * Notice that ire_max_frag is not included in the iulp_t structure, which
- * it may seem that it should. But ire_max_frag cannot really be cached. It
- * is fixed for each interface. For MTU found by PMTUd, we may want to cache
- * it. But currently, we do not do that.
*/
typedef struct iulp_s {
boolean_t iulp_set; /* Is any metric set? */
@@ -1128,17 +765,21 @@
uint32_t iulp_rpipe; /* Receive pipe size. */
uint32_t iulp_rtomax; /* Max round trip timeout. */
uint32_t iulp_sack; /* Use SACK option (TCP)? */
+ uint32_t iulp_mtu; /* Settable with routing sockets */
+
uint32_t
iulp_tstamp_ok : 1, /* Use timestamp option (TCP)? */
iulp_wscale_ok : 1, /* Use window scale option (TCP)? */
iulp_ecn_ok : 1, /* Enable ECN (for TCP)? */
iulp_pmtud_ok : 1, /* Enable PMTUd? */
- iulp_not_used : 28;
-} iulp_t;
+ /* These three are passed out by ip_set_destination */
+ iulp_localnet: 1, /* IRE_ONLINK */
+ iulp_loopback: 1, /* IRE_LOOPBACK */
+ iulp_local: 1, /* IRE_LOCAL */
-/* Zero iulp_t. */
-extern const iulp_t ire_uinfo_null;
+ iulp_not_used : 25;
+} iulp_t;
/*
* The conn drain list structure (idl_t).
@@ -1173,7 +814,6 @@
struct idl_s {
conn_t *idl_conn; /* Head of drain list */
kmutex_t idl_lock; /* Lock for this list */
- conn_t *idl_conn_draining; /* conn that is draining */
uint32_t
idl_repeat : 1, /* Last conn must re-enable */
/* drain list again */
@@ -1182,36 +822,38 @@
};
#define CONN_DRAIN_LIST_LOCK(connp) (&((connp)->conn_idl->idl_lock))
+
/*
* Interface route structure which holds the necessary information to recreate
- * routes that are tied to an interface (namely where ire_ipif != NULL).
+ * routes that are tied to an interface i.e. have ire_ill set.
+ *
* These routes which were initially created via a routing socket or via the
* SIOCADDRT ioctl may be gateway routes (RTF_GATEWAY being set) or may be
- * traditional interface routes. When an interface comes back up after being
- * marked down, this information will be used to recreate the routes. These
- * are part of an mblk_t chain that hangs off of the IPIF (ipif_saved_ire_mp).
+ * traditional interface routes. When an ill comes back up after being
+ * down, this information will be used to recreate the routes. These
+ * are part of an mblk_t chain that hangs off of the ILL (ill_saved_ire_mp).
*/
typedef struct ifrt_s {
ushort_t ifrt_type; /* Type of IRE */
in6_addr_t ifrt_v6addr; /* Address IRE represents. */
- in6_addr_t ifrt_v6gateway_addr; /* Gateway if IRE_OFFSUBNET */
- in6_addr_t ifrt_v6src_addr; /* Src addr if RTF_SETSRC */
+ in6_addr_t ifrt_v6gateway_addr; /* Gateway if IRE_OFFLINK */
+ in6_addr_t ifrt_v6setsrc_addr; /* Src addr if RTF_SETSRC */
in6_addr_t ifrt_v6mask; /* Mask for matching IRE. */
uint32_t ifrt_flags; /* flags related to route */
- uint_t ifrt_max_frag; /* MTU (next hop or path). */
- iulp_t ifrt_iulp_info; /* Cached IRE ULP info. */
+ iulp_t ifrt_metrics; /* Routing socket metrics */
+ zoneid_t ifrt_zoneid; /* zoneid for route */
} ifrt_t;
#define ifrt_addr V4_PART_OF_V6(ifrt_v6addr)
#define ifrt_gateway_addr V4_PART_OF_V6(ifrt_v6gateway_addr)
-#define ifrt_src_addr V4_PART_OF_V6(ifrt_v6src_addr)
#define ifrt_mask V4_PART_OF_V6(ifrt_v6mask)
+#define ifrt_setsrc_addr V4_PART_OF_V6(ifrt_v6setsrc_addr)
/* Number of IP addresses that can be hosted on a physical interface */
#define MAX_ADDRS_PER_IF 8192
/*
* Number of Source addresses to be considered for source address
- * selection. Used by ipif_select_source[_v6].
+ * selection. Used by ipif_select_source_v4/v6.
*/
#define MAX_IPIF_SELECT_SOURCE 50
@@ -1245,16 +887,13 @@
#define IPIF_CONDEMNED 0x1 /* The ipif is being removed */
#define IPIF_CHANGING 0x2 /* A critcal ipif field is changing */
#define IPIF_SET_LINKLOCAL 0x10 /* transient flag during bringup */
-#define IPIF_ZERO_SOURCE 0x20 /* transient flag during bringup */
/* IP interface structure, one per local address */
typedef struct ipif_s {
struct ipif_s *ipif_next;
struct ill_s *ipif_ill; /* Back pointer to our ill */
int ipif_id; /* Logical unit number */
- uint_t ipif_mtu; /* Starts at ipif_ill->ill_max_frag */
in6_addr_t ipif_v6lcl_addr; /* Local IP address for this if. */
- in6_addr_t ipif_v6src_addr; /* Source IP address for this if. */
in6_addr_t ipif_v6subnet; /* Subnet prefix for this if. */
in6_addr_t ipif_v6net_mask; /* Net mask for this interface. */
in6_addr_t ipif_v6brd_addr; /* Broadcast addr for this interface. */
@@ -1262,47 +901,29 @@
uint64_t ipif_flags; /* Interface flags. */
uint_t ipif_metric; /* BSD if metric, for compatibility. */
uint_t ipif_ire_type; /* IRE_LOCAL or IRE_LOOPBACK */
- mblk_t *ipif_arp_del_mp; /* Allocated at time arp comes up, to */
- /* prevent awkward out of mem */
- /* condition later */
- mblk_t *ipif_saved_ire_mp; /* Allocated for each extra */
- /* IRE_IF_NORESOLVER/IRE_IF_RESOLVER */
- /* on this interface so that they */
- /* can survive ifconfig down. */
- kmutex_t ipif_saved_ire_lock; /* Protects ipif_saved_ire_mp */
-
- mrec_t *ipif_igmp_rpt; /* List of group memberships which */
- /* will be reported on. Used when */
- /* handling an igmp timeout. */
/*
- * The packet counts in the ipif contain the sum of the
- * packet counts in dead IREs that were affiliated with
- * this ipif.
+ * The packet count in the ipif contains the sum of the
+ * packet counts in dead IRE_LOCAL/LOOPBACK for this ipif.
*/
- uint_t ipif_fo_pkt_count; /* Forwarded thru our dead IREs */
uint_t ipif_ib_pkt_count; /* Inbound packets for our dead IREs */
- uint_t ipif_ob_pkt_count; /* Outbound packets to our dead IREs */
+
/* Exclusive bit fields, protected by ipsq_t */
unsigned int
- ipif_multicast_up : 1, /* ipif_multicast_up() successful */
ipif_was_up : 1, /* ipif was up before */
ipif_addr_ready : 1, /* DAD is done */
ipif_was_dup : 1, /* DAD had failed */
-
- ipif_joined_allhosts : 1, /* allhosts joined */
ipif_added_nce : 1, /* nce added for local address */
- ipif_pad_to_31 : 26;
+
+ ipif_pad_to_31 : 28;
+
+ ilm_t *ipif_allhosts_ilm; /* For all-nodes join */
+ ilm_t *ipif_solmulti_ilm; /* For IPv6 solicited multicast join */
uint_t ipif_seqid; /* unique index across all ills */
uint_t ipif_state_flags; /* See IPIF_* flag defs above */
uint_t ipif_refcnt; /* active consistent reader cnt */
- /* Number of ire's and ilm's referencing this ipif */
- uint_t ipif_ire_cnt;
- uint_t ipif_ilm_cnt;
-
- uint_t ipif_saved_ire_cnt;
zoneid_t ipif_zoneid; /* zone ID number */
timeout_id_t ipif_recovery_id; /* Timer for DAD recovery */
boolean_t ipif_trace_disable; /* True when alloc fails */
@@ -1313,40 +934,12 @@
* part of a group will be pointed to, and an ill cannot disappear
* while it's in a group.
*/
- struct ill_s *ipif_bound_ill;
- struct ipif_s *ipif_bound_next; /* bound ipif chain */
- boolean_t ipif_bound; /* B_TRUE if we successfully bound */
-} ipif_t;
+ struct ill_s *ipif_bound_ill;
+ struct ipif_s *ipif_bound_next; /* bound ipif chain */
+ boolean_t ipif_bound; /* B_TRUE if we successfully bound */
-/*
- * IPIF_FREE_OK() means that there are no incoming references
- * to the ipif. Incoming refs would prevent the ipif from being freed.
- */
-#define IPIF_FREE_OK(ipif) \
- ((ipif)->ipif_ire_cnt == 0 && (ipif)->ipif_ilm_cnt == 0)
-/*
- * IPIF_DOWN_OK() determines whether the incoming pointer reference counts
- * would permit the ipif to be considered quiescent. In order for
- * an ipif or ill to be considered quiescent, the ire and nce references
- * to that ipif/ill must be zero.
- *
- * We do not require the ilm references to go to zero for quiescence
- * because the quiescence checks are done to ensure that
- * outgoing packets do not use addresses from the ipif/ill after it
- * has been marked down, and incoming packets to addresses on a
- * queiscent interface are rejected. This implies that all the
- * ire/nce's using that source address need to be deleted and future
- * creation of any ires using that source address must be prevented.
- * Similarly incoming unicast packets destined to the 'down' address
- * will not be accepted once that ire is gone. However incoming
- * multicast packets are not destined to the downed address.
- * They are only related to the ill in question. Furthermore
- * the current API behavior allows applications to join or leave
- * multicast groups, i.e., IP_ADD_MEMBERSHIP / LEAVE_MEMBERSHIP, using a
- * down address. Therefore the ilm references are not included in
- * the _DOWN_OK macros.
- */
-#define IPIF_DOWN_OK(ipif) ((ipif)->ipif_ire_cnt == 0)
+ struct ire_s *ipif_ire_local; /* Our IRE_LOCAL or LOOPBACK */
+} ipif_t;
/*
* The following table lists the protection levels of the various members
@@ -1371,9 +964,7 @@
* ill_g_lock ill_g_lock
* ipif_ill ipsq + down ipif write once
* ipif_id ipsq + down ipif write once
- * ipif_mtu ipsq
* ipif_v6lcl_addr ipsq + down ipif up ipif
- * ipif_v6src_addr ipsq + down ipif up ipif
* ipif_v6subnet ipsq + down ipif up ipif
* ipif_v6net_mask ipsq + down ipif up ipif
*
@@ -1383,29 +974,31 @@
* ipif_metric
* ipif_ire_type ipsq + down ill up ill
*
- * ipif_arp_del_mp ipsq ipsq
- * ipif_saved_ire_mp ipif_saved_ire_lock ipif_saved_ire_lock
- * ipif_igmp_rpt ipsq ipsq
- *
- * ipif_fo_pkt_count Approx
* ipif_ib_pkt_count Approx
- * ipif_ob_pkt_count Approx
*
* bit fields ill_lock ill_lock
*
+ * ipif_allhosts_ilm ipsq ipsq
+ * ipif_solmulti_ilm ipsq ipsq
+ *
* ipif_seqid ipsq Write once
*
* ipif_state_flags ill_lock ill_lock
* ipif_refcnt ill_lock ill_lock
- * ipif_ire_cnt ill_lock ill_lock
- * ipif_ilm_cnt ill_lock ill_lock
- * ipif_saved_ire_cnt
- *
* ipif_bound_ill ipsq + ipmp_lock ipsq OR ipmp_lock
* ipif_bound_next ipsq ipsq
* ipif_bound ipsq ipsq
+ *
+ * ipif_ire_local ipsq + ips_ill_g_lock ipsq OR ips_ill_g_lock
*/
+/*
+ * Return values from ip_laddr_verify_{v4,v6}
+ */
+typedef enum { IPVL_UNICAST_UP, IPVL_UNICAST_DOWN, IPVL_MCAST, IPVL_BCAST,
+ IPVL_BAD} ip_laddr_t;
+
+
#define IP_TR_HASH(tid) ((((uintptr_t)tid) >> 6) & (IP_TR_HASH_MAX - 1))
#ifdef DEBUG
@@ -1422,18 +1015,12 @@
/* IPv4 compatibility macros */
#define ipif_lcl_addr V4_PART_OF_V6(ipif_v6lcl_addr)
-#define ipif_src_addr V4_PART_OF_V6(ipif_v6src_addr)
#define ipif_subnet V4_PART_OF_V6(ipif_v6subnet)
#define ipif_net_mask V4_PART_OF_V6(ipif_v6net_mask)
#define ipif_brd_addr V4_PART_OF_V6(ipif_v6brd_addr)
#define ipif_pp_dst_addr V4_PART_OF_V6(ipif_v6pp_dst_addr)
/* Macros for easy backreferences to the ill. */
-#define ipif_wq ipif_ill->ill_wq
-#define ipif_rq ipif_ill->ill_rq
-#define ipif_net_type ipif_ill->ill_net_type
-#define ipif_ipif_up_count ipif_ill->ill_ipif_up_count
-#define ipif_type ipif_ill->ill_type
#define ipif_isv6 ipif_ill->ill_isv6
#define SIOCLIFADDR_NDX 112 /* ndx of SIOCLIFADDR in the ndx ioctl table */
@@ -1524,7 +1111,7 @@
boolean_t ipx_current_done; /* is the current operation done? */
int ipx_current_ioctl; /* current ioctl, or 0 if no ioctl */
ipif_t *ipx_current_ipif; /* ipif for current op */
- ipif_t *ipx_pending_ipif; /* ipif for ipsq_pending_mp */
+ ipif_t *ipx_pending_ipif; /* ipif for ipx_pending_mp */
mblk_t *ipx_pending_mp; /* current ioctl mp while waiting */
boolean_t ipx_forced; /* debugging aid */
#ifdef DEBUG
@@ -1642,24 +1229,62 @@
krwlock_t irb_lock; /* Protect this bucket */
uint_t irb_refcnt; /* Protected by irb_lock */
uchar_t irb_marks; /* CONDEMNED ires in this bucket ? */
-#define IRB_MARK_CONDEMNED 0x0001
-#define IRB_MARK_FTABLE 0x0002
+#define IRB_MARK_CONDEMNED 0x0001 /* Contains some IRE_IS_CONDEMNED */
+#define IRB_MARK_DYNAMIC 0x0002 /* Dynamically allocated */
+ /* Once IPv6 uses radix then IRB_MARK_DYNAMIC will always be set */
uint_t irb_ire_cnt; /* Num of active IRE in this bucket */
- uint_t irb_tmp_ire_cnt; /* Num of temporary IRE */
- struct ire_s *irb_rr_origin; /* origin for round-robin */
int irb_nire; /* Num of ftable ire's that ref irb */
ip_stack_t *irb_ipst; /* Does not have a netstack_hold */
} irb_t;
#define IRB2RT(irb) (rt_t *)((caddr_t)(irb) - offsetof(rt_t, rt_irb))
-/* The following are return values of ip_xmit_v4() */
-typedef enum {
- SEND_PASSED = 0, /* sent packet out on wire */
- SEND_FAILED, /* sending of packet failed */
- LOOKUP_IN_PROGRESS, /* ire cache found, ARP resolution in progress */
- LLHDR_RESLV_FAILED /* macaddr resl of onlink dst or nexthop failed */
-} ipxmit_state_t;
+/* Forward declarations */
+struct dce_s;
+typedef struct dce_s dce_t;
+struct ire_s;
+typedef struct ire_s ire_t;
+struct ncec_s;
+typedef struct ncec_s ncec_t;
+struct nce_s;
+typedef struct nce_s nce_t;
+struct ip_recv_attr_s;
+typedef struct ip_recv_attr_s ip_recv_attr_t;
+struct ip_xmit_attr_s;
+typedef struct ip_xmit_attr_s ip_xmit_attr_t;
+
+struct tsol_ire_gw_secattr_s;
+typedef struct tsol_ire_gw_secattr_s tsol_ire_gw_secattr_t;
+
+/*
+ * This is a structure for a one-element route cache that is passed
+ * by reference between ip_input and ill_inputfn.
+ */
+typedef struct {
+ ire_t *rtc_ire;
+ ipaddr_t rtc_ipaddr;
+ in6_addr_t rtc_ip6addr;
+} rtc_t;
+
+/*
+ * Note: Temporarily use 64 bits, and will probably go back to 32 bits after
+ * more cleanup work is done.
+ */
+typedef uint64_t iaflags_t;
+
+/* The ill input function pointer type */
+typedef void (*pfillinput_t)(mblk_t *, void *, void *, ip_recv_attr_t *,
+ rtc_t *);
+
+/* The ire receive function pointer type */
+typedef void (*pfirerecv_t)(ire_t *, mblk_t *, void *, ip_recv_attr_t *);
+
+/* The ire send and postfrag function pointer types */
+typedef int (*pfiresend_t)(ire_t *, mblk_t *, void *,
+ ip_xmit_attr_t *, uint32_t *);
+typedef int (*pfirepostfrag_t)(mblk_t *, nce_t *, iaflags_t, uint_t, uint32_t,
+ zoneid_t, zoneid_t, uintptr_t *);
+
#define IP_V4_G_HEAD 0
#define IP_V6_G_HEAD 1
@@ -1733,26 +1358,12 @@
/*
* Capabilities, possible flags for ill_capabilities.
*/
-
-#define ILL_CAPAB_AH 0x01 /* IPsec AH acceleration */
-#define ILL_CAPAB_ESP 0x02 /* IPsec ESP acceleration */
-#define ILL_CAPAB_MDT 0x04 /* Multidata Transmit */
+#define ILL_CAPAB_LSO 0x04 /* Large Send Offload */
#define ILL_CAPAB_HCKSUM 0x08 /* Hardware checksumming */
#define ILL_CAPAB_ZEROCOPY 0x10 /* Zero-copy */
#define ILL_CAPAB_DLD 0x20 /* DLD capabilities */
#define ILL_CAPAB_DLD_POLL 0x40 /* Polling */
#define ILL_CAPAB_DLD_DIRECT 0x80 /* Direct function call */
-#define ILL_CAPAB_DLD_LSO 0x100 /* Large Segment Offload */
-
-/*
- * Per-ill Multidata Transmit capabilities.
- */
-typedef struct ill_mdt_capab_s ill_mdt_capab_t;
-
-/*
- * Per-ill IPsec capabilities.
- */
-typedef struct ill_ipsec_capab_s ill_ipsec_capab_t;
/*
* Per-ill Hardware Checksumming capbilities.
@@ -1775,15 +1386,18 @@
typedef struct ill_rx_ring ill_rx_ring_t;
/*
- * Per-ill Large Segment Offload capabilities.
+ * Per-ill Large Send Offload capabilities.
*/
typedef struct ill_lso_capab_s ill_lso_capab_t;
/* The following are ill_state_flags */
#define ILL_LL_SUBNET_PENDING 0x01 /* Waiting for DL_INFO_ACK from drv */
#define ILL_CONDEMNED 0x02 /* No more new ref's to the ILL */
-#define ILL_CHANGING 0x04 /* ILL not globally visible */
-#define ILL_DL_UNBIND_IN_PROGRESS 0x08 /* UNBIND_REQ is sent */
+#define ILL_DL_UNBIND_IN_PROGRESS 0x04 /* UNBIND_REQ is sent */
+#define ILL_DOWN_IN_PROGRESS 0x08 /* ILL is going down - no new nce's */
+#define ILL_LL_BIND_PENDING 0x0020 /* XXX Reuse ILL_LL_SUBNET_PENDING ? */
+#define ILL_LL_UP 0x0040
+#define ILL_LL_DOWN 0x0080
/* Is this an ILL whose source address is used by other ILL's ? */
#define IS_USESRC_ILL(ill) \
@@ -1796,10 +1410,9 @@
((ill)->ill_usesrc_grp_next != NULL))
/* Is this an virtual network interface (vni) ILL ? */
-#define IS_VNI(ill) \
- (((ill) != NULL) && \
+#define IS_VNI(ill) \
(((ill)->ill_phyint->phyint_flags & (PHYI_LOOPBACK|PHYI_VIRTUAL)) == \
- PHYI_VIRTUAL))
+ PHYI_VIRTUAL)
/* Is this a loopback ILL? */
#define IS_LOOPBACK(ill) \
@@ -1900,18 +1513,41 @@
* ARP up-to-date as the active set of interfaces in the group changes.
*/
typedef struct ipmp_arpent_s {
- mblk_t *ia_area_mp; /* AR_ENTRY_ADD pointer */
ipaddr_t ia_ipaddr; /* IP address for this entry */
boolean_t ia_proxyarp; /* proxy ARP entry? */
boolean_t ia_notified; /* ARP notified about this entry? */
list_node_t ia_node; /* next ARP entry in list */
+ uint16_t ia_flags; /* nce_flags for the address */
+ size_t ia_lladdr_len;
+ uchar_t *ia_lladdr;
} ipmp_arpent_t;
+struct arl_s;
+
+/*
+ * Per-ill capabilities.
+ */
+struct ill_hcksum_capab_s {
+ uint_t ill_hcksum_version; /* interface version */
+ uint_t ill_hcksum_txflags; /* capabilities on transmit */
+};
+
+struct ill_zerocopy_capab_s {
+ uint_t ill_zerocopy_version; /* interface version */
+ uint_t ill_zerocopy_flags; /* capabilities */
+};
+
+struct ill_lso_capab_s {
+ uint_t ill_lso_flags; /* capabilities */
+ uint_t ill_lso_max; /* maximum size of payload */
+};
+
/*
* IP Lower level Structure.
* Instance data structure in ip_open when there is a device below us.
*/
typedef struct ill_s {
+ pfillinput_t ill_inputfn; /* Fast input function selector */
ill_if_t *ill_ifptr; /* pointer to interface type */
queue_t *ill_rq; /* Read queue. */
queue_t *ill_wq; /* Write queue. */
@@ -1922,6 +1558,8 @@
uint_t ill_ipif_up_count; /* Number of IPIFs currently up. */
uint_t ill_max_frag; /* Max IDU from DLPI. */
+ uint_t ill_current_frag; /* Current IDU from DLPI. */
+ uint_t ill_mtu; /* User-specified MTU; SIOCSLIFMTU */
char *ill_name; /* Our name. */
uint_t ill_ipif_dup_count; /* Number of duplicate addresses. */
uint_t ill_name_length; /* Name length, incl. terminator. */
@@ -1941,8 +1579,9 @@
uint8_t *ill_frag_ptr; /* Reassembly state. */
timeout_id_t ill_frag_timer_id; /* timeout id for the frag timer */
ipfb_t *ill_frag_hash_tbl; /* Fragment hash list head. */
- ipif_t *ill_pending_ipif; /* IPIF waiting for DL operation. */
+ krwlock_t ill_mcast_lock; /* Protects multicast state */
+ kmutex_t ill_mcast_serializer; /* Serialize across ilg and ilm state */
ilm_t *ill_ilm; /* Multicast membership for ill */
uint_t ill_global_timer; /* for IGMPv3/MLDv2 general queries */
int ill_mcast_type; /* type of router which is querier */
@@ -1955,22 +1594,20 @@
uint8_t ill_mcast_rv; /* IGMPv3/MLDv2 robustness variable */
int ill_mcast_qi; /* IGMPv3/MLDv2 query interval var */
- mblk_t *ill_pending_mp; /* IOCTL/DLPI awaiting completion. */
/*
* All non-NULL cells between 'ill_first_mp_to_free' and
* 'ill_last_mp_to_free' are freed in ill_delete.
*/
#define ill_first_mp_to_free ill_bcast_mp
mblk_t *ill_bcast_mp; /* DLPI header for broadcasts. */
- mblk_t *ill_resolver_mp; /* Resolver template. */
mblk_t *ill_unbind_mp; /* unbind mp from ill_dl_up() */
mblk_t *ill_promiscoff_mp; /* for ill_leave_allmulti() */
mblk_t *ill_dlpi_deferred; /* b_next chain of control messages */
- mblk_t *ill_ardeact_mp; /* deact mp from ipmp_ill_activate() */
mblk_t *ill_dest_addr_mp; /* mblk which holds ill_dest_addr */
mblk_t *ill_replumb_mp; /* replumb mp from ill_replumb() */
mblk_t *ill_phys_addr_mp; /* mblk which holds ill_phys_addr */
-#define ill_last_mp_to_free ill_phys_addr_mp
+ mblk_t *ill_mcast_deferred; /* b_next chain of IGMP/MLD packets */
+#define ill_last_mp_to_free ill_mcast_deferred
cred_t *ill_credp; /* opener's credentials */
uint8_t *ill_phys_addr; /* ill_phys_addr_mp->b_rptr + off */
@@ -1986,37 +1623,33 @@
ill_dlpi_style_set : 1,
ill_ifname_pending : 1,
- ill_join_allmulti : 1,
ill_logical_down : 1,
ill_dl_up : 1,
-
ill_up_ipifs : 1,
+
ill_note_link : 1, /* supports link-up notification */
ill_capab_reneg : 1, /* capability renegotiation to be done */
ill_dld_capab_inprog : 1, /* direct dld capab call in prog */
-
ill_need_recover_multicast : 1,
- ill_pad_to_bit_31 : 19;
+
+ ill_replumbing : 1,
+ ill_arl_dlpi_pending : 1,
+
+ ill_pad_to_bit_31 : 18;
/* Following bit fields protected by ill_lock */
uint_t
ill_fragtimer_executing : 1,
ill_fragtimer_needrestart : 1,
- ill_ilm_cleanup_reqd : 1,
- ill_arp_closing : 1,
-
- ill_arp_bringup_pending : 1,
- ill_arp_extend : 1, /* ARP has DAD extensions */
ill_manual_token : 1, /* system won't override ill_token */
ill_manual_linklocal : 1, /* system won't auto-conf linklocal */
- ill_pad_bit_31 : 24;
+ ill_pad_bit_31 : 28;
/*
* Used in SIOCSIFMUXID and SIOCGIFMUXID for 'ifconfig unplumb'.
*/
- int ill_arp_muxid; /* muxid returned from plink for arp */
- int ill_ip_muxid; /* muxid returned from plink for ip */
+ int ill_muxid; /* muxid returned from plink */
/* Used for IP frag reassembly throttling on a per ILL basis. */
uint_t ill_ipf_gen; /* Generation of next fragment queue */
@@ -2033,20 +1666,13 @@
uint_t ill_dlpi_capab_state; /* State of capability query, IDCS_* */
uint_t ill_capab_pending_cnt;
uint64_t ill_capabilities; /* Enabled capabilities, ILL_CAPAB_* */
- ill_mdt_capab_t *ill_mdt_capab; /* Multidata Transmit capabilities */
- ill_ipsec_capab_t *ill_ipsec_capab_ah; /* IPsec AH capabilities */
- ill_ipsec_capab_t *ill_ipsec_capab_esp; /* IPsec ESP capabilities */
ill_hcksum_capab_t *ill_hcksum_capab; /* H/W cksumming capabilities */
ill_zerocopy_capab_t *ill_zerocopy_capab; /* Zero-copy capabilities */
ill_dld_capab_t *ill_dld_capab; /* DLD capabilities */
ill_lso_capab_t *ill_lso_capab; /* Large Segment Offload capabilities */
mblk_t *ill_capab_reset_mp; /* Preallocated mblk for capab reset */
- /*
- * Fields for IPv6
- */
uint8_t ill_max_hops; /* Maximum hops for any logical interface */
- uint_t ill_max_mtu; /* Maximum MTU for any logical interface */
uint_t ill_user_mtu; /* User-specified MTU via SIOCSLIFLNKINFO */
uint32_t ill_reachable_time; /* Value for ND algorithm in msec */
uint32_t ill_reachable_retrans_time; /* Value for ND algorithm msec */
@@ -2057,20 +1683,6 @@
uint32_t ill_xmit_count; /* ndp max multicast xmits */
mib2_ipIfStatsEntry_t *ill_ip_mib; /* ver indep. interface mib */
mib2_ipv6IfIcmpEntry_t *ill_icmp6_mib; /* Per interface mib */
- /*
- * Following two mblks are allocated common to all
- * the ipifs when the first interface is coming up.
- * It is sent up to arp when the last ipif is coming
- * down.
- */
- mblk_t *ill_arp_down_mp;
- mblk_t *ill_arp_del_mapping_mp;
- /*
- * Used for implementing IFF_NOARP. As IFF_NOARP is used
- * to turn off for all the logicals, it is here instead
- * of the ipif.
- */
- mblk_t *ill_arp_on_mp;
phyint_t *ill_phyint;
uint64_t ill_flags;
@@ -2094,11 +1706,11 @@
*/
uint_t ill_ifname_pending_err;
avl_node_t ill_avl_byppa; /* avl node based on ppa */
- void *ill_fastpath_list; /* both ire and nce hang off this */
+ list_t ill_nce; /* pointer to nce_s list */
uint_t ill_refcnt; /* active refcnt by threads */
uint_t ill_ire_cnt; /* ires associated with this ill */
kcondvar_t ill_cv;
- uint_t ill_ilm_walker_cnt; /* snmp ilm walkers */
+ uint_t ill_ncec_cnt; /* ncecs associated with this ill */
uint_t ill_nce_cnt; /* nces associated with this ill */
uint_t ill_waiters; /* threads waiting in ipsq_enter */
/*
@@ -2119,6 +1731,17 @@
void *ill_flownotify_mh; /* Tx flow ctl, mac cb handle */
uint_t ill_ilm_cnt; /* ilms referencing this ill */
uint_t ill_ipallmulti_cnt; /* ip_join_allmulti() calls */
+ ilm_t *ill_ipallmulti_ilm;
+
+ mblk_t *ill_saved_ire_mp; /* Allocated for each extra IRE */
+ /* with ire_ill set so they can */
+ /* survive the ill going down and up. */
+ kmutex_t ill_saved_ire_lock; /* Protects ill_saved_ire_mp, cnt */
+ uint_t ill_saved_ire_cnt; /* # entries */
+ struct arl_ill_common_s *ill_common;
+ ire_t *ill_ire_multicast; /* IRE_MULTICAST for ill */
+ clock_t ill_defend_start; /* start of 1 hour period */
+ uint_t ill_defend_count; /* # of announce/defends per ill */
/*
* IPMP fields.
*/
@@ -2131,6 +1754,8 @@
uint_t ill_bound_cnt; /* # of data addresses bound to ill */
ipif_t *ill_bound_ipif; /* ipif chain bound to ill */
timeout_id_t ill_refresh_tid; /* ill refresh retry timeout id */
+
+ uint32_t ill_mrouter_cnt; /* mrouter allmulti joins */
} ill_t;
/*
@@ -2139,15 +1764,17 @@
*/
#define ILL_FREE_OK(ill) \
((ill)->ill_ire_cnt == 0 && (ill)->ill_ilm_cnt == 0 && \
- (ill)->ill_nce_cnt == 0)
+ (ill)->ill_ncec_cnt == 0 && (ill)->ill_nce_cnt == 0)
/*
- * An ipif/ill can be marked down only when the ire and nce references
+ * An ipif/ill can be marked down only when the ire and ncec references
* to that ipif/ill goes to zero. ILL_DOWN_OK() is a necessary condition
* quiescence checks. See comments above IPIF_DOWN_OK for details
* on why ires and nces are selectively considered for this macro.
*/
-#define ILL_DOWN_OK(ill) (ill->ill_ire_cnt == 0 && ill->ill_nce_cnt == 0)
+#define ILL_DOWN_OK(ill) \
+ (ill->ill_ire_cnt == 0 && ill->ill_ncec_cnt == 0 && \
+ ill->ill_nce_cnt == 0)
/*
* The following table lists the protection levels of the various members
@@ -2162,7 +1789,8 @@
* ill_error ipsq None
* ill_ipif ill_g_lock + ipsq ill_g_lock OR ipsq
* ill_ipif_up_count ill_lock + ipsq ill_lock OR ipsq
- * ill_max_frag ipsq Write once
+ * ill_max_frag ill_lock ill_lock
+ * ill_current_frag ill_lock ill_lock
*
* ill_name ill_g_lock + ipsq Write once
* ill_name_length ill_g_lock + ipsq Write once
@@ -2179,23 +1807,22 @@
*
* ill_frag_timer_id ill_lock ill_lock
* ill_frag_hash_tbl ipsq up ill
- * ill_ilm ipsq + ill_lock ill_lock
- * ill_mcast_type ill_lock ill_lock
- * ill_mcast_v1_time ill_lock ill_lock
- * ill_mcast_v2_time ill_lock ill_lock
- * ill_mcast_v1_tset ill_lock ill_lock
- * ill_mcast_v2_tset ill_lock ill_lock
- * ill_mcast_rv ill_lock ill_lock
- * ill_mcast_qi ill_lock ill_lock
- * ill_pending_mp ill_lock ill_lock
+ * ill_ilm ill_mcast_lock(WRITER) ill_mcast_lock(READER)
+ * ill_global_timer ill_mcast_lock(WRITER) ill_mcast_lock(READER)
+ * ill_mcast_type ill_mcast_lock(WRITER) ill_mcast_lock(READER)
+ * ill_mcast_v1_time ill_mcast_lock(WRITER) ill_mcast_lock(READER)
+ * ill_mcast_v2_time ill_mcast_lock(WRITER) ill_mcast_lock(READER)
+ * ill_mcast_v1_tset ill_mcast_lock(WRITER) ill_mcast_lock(READER)
+ * ill_mcast_v2_tset ill_mcast_lock(WRITER) ill_mcast_lock(READER)
+ * ill_mcast_rv ill_mcast_lock(WRITER) ill_mcast_lock(READER)
+ * ill_mcast_qi ill_mcast_lock(WRITER) ill_mcast_lock(READER)
*
- * ill_bcast_mp ipsq ipsq
- * ill_resolver_mp ipsq only when ill is up
* ill_down_mp ipsq ipsq
* ill_dlpi_deferred ill_lock ill_lock
* ill_dlpi_pending ipsq + ill_lock ipsq or ill_lock or
* absence of ipsq writer.
* ill_phys_addr_mp ipsq + down ill only when ill is up
+ * ill_mcast_deferred ill_lock ill_lock
* ill_phys_addr ipsq + down ill only when ill is up
* ill_dest_addr_mp ipsq + down ill only when ill is up
* ill_dest_addr ipsq + down ill only when ill is up
@@ -2204,8 +1831,7 @@
* exclusive bit flags ipsq_t ipsq_t
* shared bit flags ill_lock ill_lock
*
- * ill_arp_muxid ipsq Not atomic
- * ill_ip_muxid ipsq Not atomic
+ * ill_muxid ipsq Not atomic
*
* ill_ipf_gen Not atomic
* ill_frag_count atomics atomics
@@ -2215,7 +1841,7 @@
* ill_dlpi_capab_state ipsq ipsq
* ill_max_hops ipsq Not atomic
*
- * ill_max_mtu
+ * ill_mtu ill_lock None
*
* ill_user_mtu ipsq + ill_lock ill_lock
* ill_reachable_time ipsq + ill_lock ill_lock
@@ -2230,9 +1856,6 @@
* ill_xmit_count ipsq + down ill write once
* ill_ip6_mib ipsq + down ill only when ill is up
* ill_icmp6_mib ipsq + down ill only when ill is up
- * ill_arp_down_mp ipsq ipsq
- * ill_arp_del_mapping_mp ipsq ipsq
- * ill_arp_on_mp ipsq ipsq
*
* ill_phyint ipsq, ill_g_lock, ill_lock Any of them
* ill_flags ill_lock ill_lock
@@ -2247,7 +1870,7 @@
* ill_refcnt ill_lock ill_lock
* ill_ire_cnt ill_lock ill_lock
* ill_cv ill_lock ill_lock
- * ill_ilm_walker_cnt ill_lock ill_lock
+ * ill_ncec_cnt ill_lock ill_lock
* ill_nce_cnt ill_lock ill_lock
* ill_ilm_cnt ill_lock ill_lock
* ill_src_ipif ill_g_lock ill_g_lock
@@ -2256,8 +1879,12 @@
* ill_dhcpinit atomics atomics
* ill_flownotify_mh write once write once
* ill_capab_pending_cnt ipsq ipsq
- *
- * ill_bound_cnt ipsq ipsq
+ * ill_ipallmulti_cnt ill_lock ill_lock
+ * ill_ipallmulti_ilm ill_lock ill_lock
+ * ill_saved_ire_mp ill_saved_ire_lock ill_saved_ire_lock
+ * ill_saved_ire_cnt ill_saved_ire_lock ill_saved_ire_lock
+ * ill_arl ??? ???
+ * ill_ire_multicast ipsq + quiescent none
* ill_bound_ipif ipsq ipsq
* ill_actnode ipsq + ipmp_lock ipsq OR ipmp_lock
* ill_grpnode ipsq + ill_g_lock ipsq OR ill_g_lock
@@ -2267,6 +1894,7 @@
* ill_refresh_tid ill_lock ill_lock
* ill_grp (for IPMP ill) write once write once
* ill_grp (for underlying ill) ipsq + ill_g_lock ipsq OR ill_g_lock
+ * ill_mrouter_cnt atomics atomics
*
* NOTE: It's OK to make heuristic decisions on an underlying interface
* by using IS_UNDER_IPMP() or comparing ill_grp's raw pointer value.
@@ -2311,7 +1939,6 @@
#define IPI_GET_CMD 0x8 /* branch to mi_copyout on success */
/* unused 0x10 */
#define IPI_NULL_BCONT 0x20 /* ioctl has not data and hence no b_cont */
-#define IPI_PASS_DOWN 0x40 /* pass this ioctl down when a module only */
extern ip_ioctl_cmd_t ip_ndx_ioctl_table[];
extern ip_ioctl_cmd_t ip_misc_ioctl_table[];
@@ -2362,6 +1989,430 @@
char *ip_ndp_name;
} ipndp_t;
+/* IXA Notification types */
+typedef enum {
+ IXAN_LSO, /* LSO capability change */
+ IXAN_PMTU, /* PMTU change */
+ IXAN_ZCOPY /* ZEROCOPY capability change */
+} ixa_notify_type_t;
+
+typedef uint_t ixa_notify_arg_t;
+
+typedef void (*ixa_notify_t)(void *, ip_xmit_attr_t *ixa, ixa_notify_type_t,
+ ixa_notify_arg_t);
+
+/*
+ * Attribute flags that are common to the transmit and receive attributes
+ */
+#define IAF_IS_IPV4 0x80000000 /* ipsec_*_v4 */
+#define IAF_TRUSTED_ICMP 0x40000000 /* ipsec_*_icmp_loopback */
+#define IAF_NO_LOOP_ZONEID_SET 0x20000000 /* Zone that shouldn't have */
+ /* a copy */
+#define IAF_LOOPBACK_COPY 0x10000000 /* For multi and broadcast */
+
+#define IAF_MASK 0xf0000000 /* Flags that are common */
+
+/*
+ * Transmit side attributes used between the transport protocols and IP as
+ * well as inside IP. It is also used to cache information in the conn_t i.e.
+ * replaces conn_ire and the IPsec caching in the conn_t.
+ */
+struct ip_xmit_attr_s {
+ iaflags_t ixa_flags; /* IXAF_*. See below */
+
+ uint32_t ixa_free_flags; /* IXA_FREE_*. See below */
+ uint32_t ixa_refcnt; /* Using atomics */
+
+ /*
+ * Always initialized independently of ixa_flags settings.
+ * Used by ip_xmit so we keep them up front for cache locality.
+ */
+ uint32_t ixa_xmit_hint; /* For ECMP and GLD TX ring fanout */
+ uint_t ixa_pktlen; /* Always set. For frag and stats */
+ zoneid_t ixa_zoneid; /* Assumed always set */
+
+ /* Always set for conn_ip_output(); might be stale */
+ /*
+ * Since TCP keeps the conn_t around past the process going away
+ * we need to use the "notr" (e.g, ire_refhold_notr) for ixa_ire,
+ * ixa_nce, and ixa_dce.
+ */
+ ire_t *ixa_ire; /* Forwarding table entry */
+ uint_t ixa_ire_generation;
+ nce_t *ixa_nce; /* Neighbor cache entry */
+ dce_t *ixa_dce; /* Destination cache entry */
+ uint_t ixa_dce_generation;
+ uint_t ixa_src_generation; /* If IXAF_VERIFY_SOURCE */
+
+ uint32_t ixa_src_preferences; /* prefs for src addr select */
+ uint32_t ixa_pmtu; /* IXAF_VERIFY_PMTU */
+
+ /* Set by ULP if IXAF_VERIFY_PMTU; otherwise set by IP */
+ uint32_t ixa_fragsize;
+
+ int8_t ixa_use_min_mtu; /* IXAF_USE_MIN_MTU values */
+
+ pfirepostfrag_t ixa_postfragfn; /* Set internally in IP */
+
+ in6_addr_t ixa_nexthop_v6; /* IXAF_NEXTHOP_SET */
+#define ixa_nexthop_v4 V4_PART_OF_V6(ixa_nexthop_v6)
+
+ zoneid_t ixa_no_loop_zoneid; /* IXAF_NO_LOOP_ZONEID_SET */
+
+ uint_t ixa_scopeid; /* For IPv6 link-locals */
+
+ uint_t ixa_broadcast_ttl; /* IXAF_BROADCAST_TTL_SET */
+
+ uint_t ixa_multicast_ttl; /* Assumed set for multicast */
+ uint_t ixa_multicast_ifindex; /* Assumed set for multicast */
+ ipaddr_t ixa_multicast_ifaddr; /* Assumed set for multicast */
+
+ int ixa_raw_cksum_offset; /* If IXAF_SET_RAW_CKSUM */
+
+ uint32_t ixa_ident; /* For IPv6 fragment header */
+
+ /*
+ * Cached LSO information.
+ */
+ ill_lso_capab_t ixa_lso_capab; /* Valid when IXAF_LSO_CAPAB */
+
+ uint64_t ixa_ipsec_policy_gen; /* Generation from iph_gen */
+ /*
+ * The following IPsec fields are only initialized when
+ * IXAF_IPSEC_SECURE is set. Otherwise they contain garbage.
+ */
+ ipsec_latch_t *ixa_ipsec_latch; /* Just the ids */
+ struct ipsa_s *ixa_ipsec_ah_sa; /* Hard reference SA for AH */
+ struct ipsa_s *ixa_ipsec_esp_sa; /* Hard reference SA for ESP */
+ struct ipsec_policy_s *ixa_ipsec_policy; /* why are we here? */
+ struct ipsec_action_s *ixa_ipsec_action; /* For reflected packets */
+ ipsa_ref_t ixa_ipsec_ref[2]; /* Soft reference to SA */
+ /* 0: ESP, 1: AH */
+
+ /*
+ * The selectors here are potentially different than the SPD rule's
+ * selectors, and we need to have both available for IKEv2.
+ *
+ * NOTE: "Source" and "Dest" are w.r.t. outbound datagrams. Ports can
+ * be zero, and the protocol number is needed to make the ports
+ * significant.
+ */
+ uint16_t ixa_ipsec_src_port; /* Source port number of d-gram. */
+ uint16_t ixa_ipsec_dst_port; /* Destination port number of d-gram. */
+ uint8_t ixa_ipsec_icmp_type; /* ICMP type of d-gram */
+ uint8_t ixa_ipsec_icmp_code; /* ICMP code of d-gram */
+
+ sa_family_t ixa_ipsec_inaf; /* Inner address family */
+#define IXA_MAX_ADDRLEN 4 /* Max addr len. (in 32-bit words) */
+ uint32_t ixa_ipsec_insrc[IXA_MAX_ADDRLEN]; /* Inner src address */
+ uint32_t ixa_ipsec_indst[IXA_MAX_ADDRLEN]; /* Inner dest address */
+ uint8_t ixa_ipsec_insrcpfx; /* Inner source prefix */
+ uint8_t ixa_ipsec_indstpfx; /* Inner destination prefix */
+
+ uint8_t ixa_ipsec_proto; /* IP protocol number for d-gram. */
+
+ /* Always initialized independently of ixa_flags settings */
+ uint_t ixa_ifindex; /* Assumed always set */
+ uint16_t ixa_ip_hdr_length; /* Points to ULP header */
+ uint8_t ixa_protocol; /* Protocol number for ULP cksum */
+ ts_label_t *ixa_tsl; /* Always set. NULL if not TX */
+ ip_stack_t *ixa_ipst; /* Always set */
+ uint32_t ixa_extra_ident; /* Set if LSO */
+ cred_t *ixa_cred; /* For getpeerucred */
+ pid_t ixa_cpid; /* For getpeerucred */
+
+#ifdef DEBUG
+ kthread_t *ixa_curthread; /* For serialization assert */
+#endif
+ squeue_t *ixa_sqp; /* Set from conn_sqp as a hint */
+ uintptr_t ixa_cookie; /* cookie to use for tx flow control */
+
+ /*
+ * Must be set by ULP if any of IXAF_VERIFY_LSO, IXAF_VERIFY_PMTU,
+ * or IXAF_VERIFY_ZCOPY is set.
+ */
+ ixa_notify_t ixa_notify; /* Registered upcall notify function */
+ void *ixa_notify_cookie; /* ULP cookie for ixa_notify */
+};
+
+/*
+ * Flags to indicate which transmit attributes are set.
+ * Split into "xxx_SET" ones which indicate that the "xxx" field is set, and
+ * single flags.
+ */
+#define IXAF_REACH_CONF 0x00000001 /* Reachability confirmation */
+#define IXAF_BROADCAST_TTL_SET 0x00000002 /* ixa_broadcast_ttl valid */
+#define IXAF_SET_SOURCE 0x00000004 /* Replace if broadcast */
+#define IXAF_USE_MIN_MTU 0x00000008 /* IPV6_USE_MIN_MTU */
+
+#define IXAF_DONTFRAG 0x00000010 /* IP*_DONTFRAG */
+#define IXAF_VERIFY_PMTU 0x00000020 /* ixa_pmtu/ixa_fragsize set */
+#define IXAF_PMTU_DISCOVERY 0x00000040 /* Create/use PMTU state */
+#define IXAF_MULTICAST_LOOP 0x00000080 /* IP_MULTICAST_LOOP */
+
+#define IXAF_IPSEC_SECURE 0x00000100 /* Need IPsec processing */
+#define IXAF_UCRED_TSL 0x00000200 /* ixa_tsl from SCM_UCRED */
+#define IXAF_DONTROUTE 0x00000400 /* SO_DONTROUTE */
+#define IXAF_NO_IPSEC 0x00000800 /* Ignore policy */
+
+#define IXAF_PMTU_TOO_SMALL 0x00001000 /* PMTU too small */
+#define IXAF_SET_ULP_CKSUM 0x00002000 /* Calculate ULP checksum */
+#define IXAF_VERIFY_SOURCE 0x00004000 /* Check that source is ok */
+#define IXAF_NEXTHOP_SET 0x00008000 /* ixa_nexthop set */
+
+#define IXAF_PMTU_IPV4_DF 0x00010000 /* Set IPv4 DF */
+#define IXAF_NO_DEV_FLOW_CTL 0x00020000 /* Protocol needs no flow ctl */
+#define IXAF_NO_TTL_CHANGE 0x00040000 /* Internal to IP */
+#define IXAF_IPV6_ADD_FRAGHDR 0x00080000 /* Add fragment header */
+
+#define IXAF_IPSEC_TUNNEL 0x00100000 /* Tunnel mode */
+#define IXAF_NO_PFHOOK 0x00200000 /* Skip xmit pfhook */
+#define IXAF_NO_TRACE 0x00400000 /* When back from ARP/ND */
+#define IXAF_SCOPEID_SET 0x00800000 /* ixa_scopeid set */
+
+#define IXAF_MULTIRT_MULTICAST 0x01000000 /* MULTIRT for multicast */
+#define IXAF_NO_HW_CKSUM 0x02000000 /* Force software cksum */
+#define IXAF_SET_RAW_CKSUM 0x04000000 /* Use ixa_raw_cksum_offset */
+#define IXAF_IPSEC_GLOBAL_POLICY 0x08000000 /* Policy came from global */
+
+/* Note the following uses bits 0x10000000 through 0x80000000 */
+#define IXAF_IS_IPV4 IAF_IS_IPV4
+#define IXAF_TRUSTED_ICMP IAF_TRUSTED_ICMP
+#define IXAF_NO_LOOP_ZONEID_SET IAF_NO_LOOP_ZONEID_SET
+#define IXAF_LOOPBACK_COPY IAF_LOOPBACK_COPY
+
+/* Note: use the upper 32 bits */
+#define IXAF_VERIFY_LSO 0x100000000 /* Check LSO capability */
+#define IXAF_LSO_CAPAB 0x200000000 /* Capable of LSO */
+#define IXAF_VERIFY_ZCOPY 0x400000000 /* Check Zero Copy capability */
+#define IXAF_ZCOPY_CAPAB 0x800000000 /* Capable of ZEROCOPY */
+
+/*
+ * The normal flags for sending packets e.g., icmp errors
+ */
+#define IXAF_BASIC_SIMPLE_V4 (IXAF_SET_ULP_CKSUM | IXAF_IS_IPV4)
+#define IXAF_BASIC_SIMPLE_V6 (IXAF_SET_ULP_CKSUM)
+
+/*
+ * Normally these fields do not have a hold. But in some cases they do, for
+ * instance when we've gone through ip_*_attr_to/from_mblk.
+ * We use ixa_free_flags to indicate that they have a hold and need to be
+ * released on cleanup.
+ */
+#define IXA_FREE_CRED 0x00000001 /* ixa_cred needs to be rele */
+#define IXA_FREE_TSL 0x00000002 /* ixa_tsl needs to be rele */
+
+/*
+ * Simplistic way to set the ixa_xmit_hint for locally generated traffic
+ * and forwarded traffic. The shift amounts are based on the size of the
+ * structs to discard the low order bits which don't have much if any variation
+ * (coloring in kmem_cache_alloc might provide some variation).
+ *
+ * Basing the locally generated hint on the address of the conn_t means that
+ * the packets from the same socket/connection do not get reordered.
+ * Basing the hint for forwarded traffic on the ill_ring_t means that
+ * packets from the same NIC+ring are likely to use the same outbound ring
+ * hence we get low contention on the ring in the transmitting driver.
+ */
+#define CONN_TO_XMIT_HINT(connp) ((uint32_t)(((uintptr_t)connp) >> 11))
+#define ILL_RING_TO_XMIT_HINT(ring) ((uint32_t)(((uintptr_t)ring) >> 7))
+
+/*
+ * IP set Destination Flags used by function ip_set_destination,
+ * ip_attr_connect, and conn_connect.
+ */
+#define IPDF_ALLOW_MCBC 0x1 /* Allow multi/broadcast */
+#define IPDF_VERIFY_DST 0x2 /* Verify destination addr */
+#define IPDF_SELECT_SRC 0x4 /* Select source address */
+#define IPDF_LSO 0x8 /* Try LSO */
+#define IPDF_IPSEC 0x10 /* Set IPsec policy */
+#define IPDF_ZONE_IS_GLOBAL 0x20 /* From conn_zone_is_global */
+#define IPDF_ZCOPY 0x40 /* Try ZEROCOPY */
+#define IPDF_UNIQUE_DCE 0x80 /* Get a per-destination DCE */
+
+/*
+ * Receive side attributes used between the transport protocols and IP as
+ * well as inside IP.
+ */
+struct ip_recv_attr_s {
+ iaflags_t ira_flags; /* See below */
+
+ uint32_t ira_free_flags; /* IRA_FREE_*. See below */
+
+ /*
+ * This is a hint for TCP SYN packets.
+ * Always initialized independently of ira_flags settings
+ */
+ squeue_t *ira_sqp;
+ ill_rx_ring_t *ira_ring; /* Internal to IP */
+
+ /* For ip_accept_tcp when IRAF_TARGET_SQP is set */
+ squeue_t *ira_target_sqp;
+ mblk_t *ira_target_sqp_mp;
+
+ /* Always initialized independently of ira_flags settings */
+ uint32_t ira_xmit_hint; /* For ECMP and GLD TX ring fanout */
+ zoneid_t ira_zoneid; /* ALL_ZONES unless local delivery */
+ uint_t ira_pktlen; /* Always set. For frag and stats */
+ uint16_t ira_ip_hdr_length; /* Points to ULP header */
+ uint8_t ira_protocol; /* Protocol number for ULP cksum */
+ uint_t ira_rifindex; /* Received ifindex */
+ uint_t ira_ruifindex; /* Received upper ifindex */
+ ts_label_t *ira_tsl; /* Always set. NULL if not TX */
+ /*
+ * ira_rill and ira_ill are set inside IP, but not when conn_recv is
+ * called; ULPs should use ira_ruifindex instead.
+ */
+ ill_t *ira_rill; /* ill where packet came */
+ ill_t *ira_ill; /* ill where IP address hosted */
+ cred_t *ira_cred; /* For getpeerucred */
+ pid_t ira_cpid; /* For getpeerucred */
+
+ /* Used when IRAF_VERIFIED_SRC is set; this source was ok */
+ ipaddr_t ira_verified_src;
+
+ /*
+ * The following IPsec fields are only initialized when
+ * IRAF_IPSEC_SECURE is set. Otherwise they contain garbage.
+ */
+ struct ipsec_action_s *ira_ipsec_action; /* how we made it in.. */
+ struct ipsa_s *ira_ipsec_ah_sa; /* SA for AH */
+ struct ipsa_s *ira_ipsec_esp_sa; /* SA for ESP */
+
+ ipaddr_t ira_mroute_tunnel; /* IRAF_MROUTE_TUNNEL_SET */
+
+ zoneid_t ira_no_loop_zoneid; /* IRAF_NO_LOOP_ZONEID_SET */
+
+ uint32_t ira_esp_udp_ports; /* IRAF_ESP_UDP_PORTS */
+
+ /*
+ * For IP_RECVSLLA and ip_ndp_conflict/find_solicitation.
+ * Same size as max for sockaddr_dl
+ */
+#define IRA_L2SRC_SIZE 244
+ uint8_t ira_l2src[IRA_L2SRC_SIZE]; /* If IRAF_L2SRC_SET */
+
+ /*
+ * Local handle that we use to do lazy setting of ira_l2src.
+ * We defer setting l2src until needed but we do before any
+ * ip_input pullupmsg or copymsg.
+ */
+ struct mac_header_info_s *ira_mhip; /* Could be NULL */
+};
+
+/*
+ * Flags to indicate which receive attributes are set.
+ */
+#define IRAF_SYSTEM_LABELED 0x00000001 /* is_system_labeled() */
+#define IRAF_IPV4_OPTIONS 0x00000002 /* Performance */
+#define IRAF_MULTICAST 0x00000004 /* Was multicast at L3 */
+#define IRAF_BROADCAST 0x00000008 /* Was broadcast at L3 */
+#define IRAF_MULTIBROADCAST (IRAF_MULTICAST|IRAF_BROADCAST)
+
+#define IRAF_LOOPBACK 0x00000010 /* Looped back by IP */
+#define IRAF_VERIFY_IP_CKSUM 0x00000020 /* Need to verify IP */
+#define IRAF_VERIFY_ULP_CKSUM 0x00000040 /* Need to verify TCP,UDP,etc */
+#define IRAF_SCTP_CSUM_ERR 0x00000080 /* sctp pkt has failed chksum */
+
+#define IRAF_IPSEC_SECURE 0x00000100 /* Passed AH and/or ESP */
+#define IRAF_DHCP_UNICAST 0x00000200
+#define IRAF_IPSEC_DECAPS 0x00000400 /* Was packet decapsulated */
+ /* from a matching inner packet? */
+#define IRAF_TARGET_SQP 0x00000800 /* ira_target_sqp is set */
+#define IRAF_VERIFIED_SRC 0x00001000 /* ira_verified_src set */
+#define IRAF_RSVP 0x00002000 /* RSVP packet for rsvpd */
+#define IRAF_MROUTE_TUNNEL_SET 0x00004000 /* From ip_mroute_decap */
+#define IRAF_PIM_REGISTER 0x00008000 /* From register_mforward */
+
+#define IRAF_TX_MAC_EXEMPTABLE 0x00010000 /* Allow MAC_EXEMPT readdown */
+#define IRAF_TX_SHARED_ADDR 0x00020000 /* Arrived on ALL_ZONES addr */
+#define IRAF_ESP_UDP_PORTS 0x00040000 /* NAT-traversal packet */
+#define IRAF_NO_HW_CKSUM 0x00080000 /* Force software cksum */
+
+#define IRAF_ICMP_ERROR 0x00100000 /* Send to conn_recvicmp */
+#define IRAF_ROUTER_ALERT 0x00200000 /* IPv6 router alert */
+#define IRAF_L2SRC_SET 0x00400000 /* ira_l2src has been set */
+#define IRAF_L2SRC_LOOPBACK 0x00800000 /* Came from us */
+
+#define IRAF_L2DST_MULTICAST 0x01000000 /* Multicast at L2 */
+#define IRAF_L2DST_BROADCAST 0x02000000 /* Broadcast at L2 */
+/* Unused 0x04000000 */
+/* Unused 0x08000000 */
+
+/* Below starts with 0x10000000 */
+#define IRAF_IS_IPV4 IAF_IS_IPV4
+#define IRAF_TRUSTED_ICMP IAF_TRUSTED_ICMP
+#define IRAF_NO_LOOP_ZONEID_SET IAF_NO_LOOP_ZONEID_SET
+#define IRAF_LOOPBACK_COPY IAF_LOOPBACK_COPY
+
+/*
+ * Normally these fields do not have a hold. But in some cases they do, for
+ * instance when we've gone through ip_*_attr_to/from_mblk.
+ * We use ira_free_flags to indicate that they have a hold and need to be
+ * released on cleanup.
+ */
+#define IRA_FREE_CRED 0x00000001 /* ira_cred needs to be rele */
+#define IRA_FREE_TSL 0x00000002 /* ira_tsl needs to be rele */
+
+/*
+ * Optional destination cache entry for path MTU information,
+ * and ULP metrics.
+ */
+struct dce_s {
+ uint_t dce_generation; /* Changed since cached? */
+ uint_t dce_flags; /* See below */
+ uint_t dce_ipversion; /* IPv4/IPv6 version */
+ uint32_t dce_pmtu; /* Path MTU if DCEF_PMTU */
+ uint32_t dce_ident; /* Per destination IP ident. */
+ iulp_t dce_uinfo; /* Metrics if DCEF_UINFO */
+
+ struct dce_s *dce_next;
+ struct dce_s **dce_ptpn;
+ struct dcb_s *dce_bucket;
+
+ union {
+ in6_addr_t dceu_v6addr;
+ ipaddr_t dceu_v4addr;
+ } dce_u;
+#define dce_v4addr dce_u.dceu_v4addr
+#define dce_v6addr dce_u.dceu_v6addr
+ /* Note that for IPv6+IPMP we use the ifindex for the upper interface */
+ uint_t dce_ifindex; /* For IPv6 link-locals */
+
+ kmutex_t dce_lock;
+ uint_t dce_refcnt;
+ uint64_t dce_last_change_time; /* Path MTU. In seconds */
+
+ ip_stack_t *dce_ipst; /* Does not have a netstack_hold */
+};
+
+/*
+ * Values for dce_generation.
+ *
+ * If a DCE has DCE_GENERATION_CONDEMNED, the last dce_refrele should delete
+ * it.
+ *
+ * DCE_GENERATION_VERIFY is never stored in dce_generation but it is
+ * stored in places that cache DCE (such as ixa_dce_generation).
+ * It is used as a signal that the cache is stale and needs to be reverified.
+ */
+#define DCE_GENERATION_CONDEMNED 0
+#define DCE_GENERATION_VERIFY 1
+#define DCE_GENERATION_INITIAL 2
+#define DCE_IS_CONDEMNED(dce) \
+ ((dce)->dce_generation == DCE_GENERATION_CONDEMNED)
+
+
+/*
+ * Values for ips_src_generation.
+ *
+ * SRC_GENERATION_VERIFY is never stored in ips_src_generation but it is
+ * stored in places that cache IREs (ixa_src_generation). It is used as a
+ * signal that the cache is stale and needs to be reverified.
+ */
+#define SRC_GENERATION_VERIFY 0
+#define SRC_GENERATION_INITIAL 1
+
/*
* The kernel stores security attributes of all gateways in a database made
* up of one or more tsol_gcdb_t elements. Each tsol_gcdb_t contains the
@@ -2453,183 +2504,28 @@
*/
struct tsol_tnrhc;
-typedef struct tsol_ire_gw_secattr_s {
+struct tsol_ire_gw_secattr_s {
kmutex_t igsa_lock; /* lock to protect following */
struct tsol_tnrhc *igsa_rhc; /* host entry for gateway */
tsol_gc_t *igsa_gc; /* for prefix IREs */
- tsol_gcgrp_t *igsa_gcgrp; /* for cache IREs */
-} tsol_ire_gw_secattr_t;
-
-/*
- * Following are the macros to increment/decrement the reference
- * count of the IREs and IRBs (ire bucket).
- *
- * 1) We bump up the reference count of an IRE to make sure that
- * it does not get deleted and freed while we are using it.
- * Typically all the lookup functions hold the bucket lock,
- * and look for the IRE. If it finds an IRE, it bumps up the
- * reference count before dropping the lock. Sometimes we *may* want
- * to bump up the reference count after we *looked* up i.e without
- * holding the bucket lock. So, the IRE_REFHOLD macro does not assert
- * on the bucket lock being held. Any thread trying to delete from
- * the hash bucket can still do so but cannot free the IRE if
- * ire_refcnt is not 0.
- *
- * 2) We bump up the reference count on the bucket where the IRE resides
- * (IRB), when we want to prevent the IREs getting deleted from a given
- * hash bucket. This makes life easier for ire_walk type functions which
- * wants to walk the IRE list, call a function, but needs to drop
- * the bucket lock to prevent recursive rw_enters. While the
- * lock is dropped, the list could be changed by other threads or
- * the same thread could end up deleting the ire or the ire pointed by
- * ire_next. IRE_REFHOLDing the ire or ire_next is not sufficient as
- * a delete will still remove the ire from the bucket while we have
- * dropped the lock and hence the ire_next would be NULL. Thus, we
- * need a mechanism to prevent deletions from a given bucket.
- *
- * To prevent deletions, we bump up the reference count on the
- * bucket. If the bucket is held, ire_delete just marks IRE_MARK_CONDEMNED
- * both on the ire's ire_marks and the bucket's irb_marks. When the
- * reference count on the bucket drops to zero, all the CONDEMNED ires
- * are deleted. We don't have to bump up the reference count on the
- * bucket if we are walking the bucket and never have to drop the bucket
- * lock. Note that IRB_REFHOLD does not prevent addition of new ires
- * in the list. It is okay because addition of new ires will not cause
- * ire_next to point to freed memory. We do IRB_REFHOLD only when
- * all of the 3 conditions are true :
- *
- * 1) The code needs to walk the IRE bucket from start to end.
- * 2) It may have to drop the bucket lock sometimes while doing (1)
- * 3) It does not want any ires to be deleted meanwhile.
- */
-
-/*
- * Bump up the reference count on the IRE. We cannot assert that the
- * bucket lock is being held as it is legal to bump up the reference
- * count after the first lookup has returned the IRE without
- * holding the lock. Currently ip_wput does this for caching IRE_CACHEs.
- */
-
-#ifdef DEBUG
-#define IRE_UNTRACE_REF(ire) ire_untrace_ref(ire);
-#define IRE_TRACE_REF(ire) ire_trace_ref(ire);
-#else
-#define IRE_UNTRACE_REF(ire)
-#define IRE_TRACE_REF(ire)
-#endif
-
-#define IRE_REFHOLD_NOTR(ire) { \
- atomic_add_32(&(ire)->ire_refcnt, 1); \
- ASSERT((ire)->ire_refcnt != 0); \
-}
-
-#define IRE_REFHOLD(ire) { \
- IRE_REFHOLD_NOTR(ire); \
- IRE_TRACE_REF(ire); \
-}
-
-#define IRE_REFHOLD_LOCKED(ire) { \
- IRE_TRACE_REF(ire); \
- (ire)->ire_refcnt++; \
-}
-
-/*
- * Decrement the reference count on the IRE.
- * In architectures e.g sun4u, where atomic_add_32_nv is just
- * a cas, we need to maintain the right memory barrier semantics
- * as that of mutex_exit i.e all the loads and stores should complete
- * before the cas is executed. membar_exit() does that here.
- *
- * NOTE : This macro is used only in places where we want performance.
- * To avoid bloating the code, we use the function "ire_refrele"
- * which essentially calls the macro.
- */
-#define IRE_REFRELE_NOTR(ire) { \
- ASSERT((ire)->ire_refcnt != 0); \
- membar_exit(); \
- if (atomic_add_32_nv(&(ire)->ire_refcnt, -1) == 0) \
- ire_inactive(ire); \
-}
-
-#define IRE_REFRELE(ire) { \
- if (ire->ire_bucket != NULL) { \
- IRE_UNTRACE_REF(ire); \
- } \
- IRE_REFRELE_NOTR(ire); \
-}
-
-/*
- * Bump up the reference count on the hash bucket - IRB to
- * prevent ires from being deleted in this bucket.
- */
-#define IRB_REFHOLD(irb) { \
- rw_enter(&(irb)->irb_lock, RW_WRITER); \
- (irb)->irb_refcnt++; \
- ASSERT((irb)->irb_refcnt != 0); \
- rw_exit(&(irb)->irb_lock); \
-}
-#define IRB_REFHOLD_LOCKED(irb) { \
- ASSERT(RW_WRITE_HELD(&(irb)->irb_lock)); \
- (irb)->irb_refcnt++; \
- ASSERT((irb)->irb_refcnt != 0); \
-}
+};
void irb_refrele_ftable(irb_t *);
-/*
- * Note: when IRB_MARK_FTABLE (i.e., IRE_CACHETABLE entry), the irb_t
- * is statically allocated, so that when the irb_refcnt goes to 0,
- * we simply clean up the ire list and continue.
- */
-#define IRB_REFRELE(irb) { \
- if ((irb)->irb_marks & IRB_MARK_FTABLE) { \
- irb_refrele_ftable((irb)); \
- } else { \
- rw_enter(&(irb)->irb_lock, RW_WRITER); \
- ASSERT((irb)->irb_refcnt != 0); \
- if (--(irb)->irb_refcnt == 0 && \
- ((irb)->irb_marks & IRE_MARK_CONDEMNED)) { \
- ire_t *ire_list; \
- \
- ire_list = ire_unlink(irb); \
- rw_exit(&(irb)->irb_lock); \
- ASSERT(ire_list != NULL); \
- ire_cleanup(ire_list); \
- } else { \
- rw_exit(&(irb)->irb_lock); \
- } \
- } \
-}
extern struct kmem_cache *rt_entry_cache;
-/*
- * Lock the fast path mp for access, since the fp_mp can be deleted
- * due a DL_NOTE_FASTPATH_FLUSH in the case of IRE_BROADCAST
- */
-
-#define LOCK_IRE_FP_MP(ire) { \
- if ((ire)->ire_type == IRE_BROADCAST) \
- mutex_enter(&ire->ire_nce->nce_lock); \
- }
-#define UNLOCK_IRE_FP_MP(ire) { \
- if ((ire)->ire_type == IRE_BROADCAST) \
- mutex_exit(&ire->ire_nce->nce_lock); \
- }
-
typedef struct ire4 {
- ipaddr_t ire4_src_addr; /* Source address to use. */
ipaddr_t ire4_mask; /* Mask for matching this IRE. */
ipaddr_t ire4_addr; /* Address this IRE represents. */
- ipaddr_t ire4_gateway_addr; /* Gateway if IRE_CACHE/IRE_OFFSUBNET */
- ipaddr_t ire4_cmask; /* Mask from parent prefix route */
+ ipaddr_t ire4_gateway_addr; /* Gateway including for IRE_ONLINK */
+ ipaddr_t ire4_setsrc_addr; /* RTF_SETSRC */
} ire4_t;
typedef struct ire6 {
- in6_addr_t ire6_src_addr; /* Source address to use. */
in6_addr_t ire6_mask; /* Mask for matching this IRE. */
in6_addr_t ire6_addr; /* Address this IRE represents. */
- in6_addr_t ire6_gateway_addr; /* Gateway if IRE_CACHE/IRE_OFFSUBNET */
- in6_addr_t ire6_cmask; /* Mask from parent prefix route */
+ in6_addr_t ire6_gateway_addr; /* Gateway including for IRE_ONLINK */
+ in6_addr_t ire6_setsrc_addr; /* RTF_SETSRC */
} ire6_t;
typedef union ire_addr {
@@ -2637,116 +2533,132 @@
ire4_t ire4_u;
} ire_addr_u_t;
-/* Internet Routing Entry */
-typedef struct ire_s {
+/*
+ * Internet Routing Entry
+ * When we have multiple identical IREs we logically add them by manipulating
+ * ire_identical_ref and ire_delete first decrements
+ * that and when it reaches 1 we know it is the last IRE.
+ * "identical" is defined as being the same for:
+ * ire_addr, ire_netmask, ire_gateway, ire_ill, ire_zoneid, and ire_type
+ * For instance, multiple IRE_BROADCASTs for the same subnet number are
+ * viewed as identical, and so are the IRE_INTERFACEs when there are
+ * multiple logical interfaces (on the same ill) with the same subnet prefix.
+ */
+struct ire_s {
struct ire_s *ire_next; /* The hash chain must be first. */
struct ire_s **ire_ptpn; /* Pointer to previous next. */
uint32_t ire_refcnt; /* Number of references */
- mblk_t *ire_mp; /* Non-null if allocated as mblk */
- queue_t *ire_rfq; /* recv from this queue */
- queue_t *ire_stq; /* send to this queue */
- union {
- uint_t *max_fragp; /* Used only during ire creation */
- uint_t max_frag; /* MTU (next hop or path). */
- } imf_u;
-#define ire_max_frag imf_u.max_frag
-#define ire_max_fragp imf_u.max_fragp
- uint32_t ire_frag_flag; /* IPH_DF or zero. */
- uint32_t ire_ident; /* Per IRE IP ident. */
- uint32_t ire_tire_mark; /* Used for reclaim of unused. */
+ ill_t *ire_ill;
+ uint32_t ire_identical_ref; /* IRE_INTERFACE, IRE_BROADCAST */
uchar_t ire_ipversion; /* IPv4/IPv6 version */
- uchar_t ire_marks; /* IRE_MARK_CONDEMNED etc. */
ushort_t ire_type; /* Type of IRE */
+ uint_t ire_generation; /* Generation including CONDEMNED */
uint_t ire_ib_pkt_count; /* Inbound packets for ire_addr */
uint_t ire_ob_pkt_count; /* Outbound packets to ire_addr */
- uint_t ire_ll_hdr_length; /* Non-zero if we do M_DATA prepends */
time_t ire_create_time; /* Time (in secs) IRE was created. */
- uint32_t ire_phandle; /* Associate prefix IREs to cache */
- uint32_t ire_ihandle; /* Associate interface IREs to cache */
- ipif_t *ire_ipif; /* the interface that this ire uses */
uint32_t ire_flags; /* flags related to route (RTF_*) */
/*
- * Neighbor Cache Entry for IPv6; arp info for IPv4
+ * ire_testhidden is TRUE for INTERFACE IREs of IS_UNDER_IPMP(ill)
+ * interfaces
*/
- struct nce_s *ire_nce;
+ boolean_t ire_testhidden;
+ pfirerecv_t ire_recvfn; /* Receive side handling */
+ pfiresend_t ire_sendfn; /* Send side handling */
+ pfirepostfrag_t ire_postfragfn; /* Bottom end of send handling */
+
uint_t ire_masklen; /* # bits in ire_mask{,_v6} */
ire_addr_u_t ire_u; /* IPv4/IPv6 address info. */
irb_t *ire_bucket; /* Hash bucket when ire_ptphn is set */
- iulp_t ire_uinfo; /* Upper layer protocol info. */
- /*
- * Protects ire_uinfo, ire_max_frag, and ire_frag_flag.
- */
kmutex_t ire_lock;
- uint_t ire_ipif_seqid; /* ipif_seqid of ire_ipif */
- uint_t ire_ipif_ifindex; /* ifindex associated with ipif */
- clock_t ire_last_used_time; /* Last used time */
+ clock_t ire_last_used_time; /* For IRE_LOCAL reception */
tsol_ire_gw_secattr_t *ire_gw_secattr; /* gateway security attributes */
- zoneid_t ire_zoneid; /* for local address discrimination */
+ zoneid_t ire_zoneid;
+
/*
- * ire's that are embedded inside mblk_t and sent to the external
- * resolver use the ire_stq_ifindex to track the ifindex of the
- * ire_stq, so that the ill (if it exists) can be correctly recovered
- * for cleanup in the esbfree routine when arp failure occurs.
- * Similarly, the ire_stackid is used to recover the ip_stack_t.
+ * Cached information of where to send packets that match this route.
+ * The ire_dep_* information is used to determine when ire_nce_cache
+ * needs to be updated.
+ * ire_nce_cache is the fastpath for the Neighbor Cache Entry
+ * for IPv6; arp info for IPv4
+ * Since this is a cache setup and torn down independently of
+ * applications we need to use nce_ref{rele,hold}_notr for it.
*/
- uint_t ire_stq_ifindex;
- netstackid_t ire_stackid;
+ nce_t *ire_nce_cache;
+
+ /*
+ * Quick check whether the ire_type and ire_masklen indicates
+ * that the IRE can have ire_nce_cache set i.e., whether it is
+ * IRE_ONLINK and for a single destination.
+ */
+ boolean_t ire_nce_capable;
+
+ /*
+ * Dependency tracking so we can safely cache IRE and NCE pointers
+ * in offlink and onlink IREs.
+ * These are locked under the ips_ire_dep_lock rwlock. Write held
+ * when modifying the linkage.
+ * ire_dep_parent (Also chain towards IRE for nexthop)
+ * ire_dep_parent_generation: ire_generation of ire_dep_parent
+ * ire_dep_children (From parent to first child)
+ * ire_dep_sib_next (linked list of siblings)
+ * ire_dep_sib_ptpn (linked list of siblings)
+ *
+ * The parent has a ire_refhold on each child, and each child has
+ * an ire_refhold on its parent.
+ * Since ire_dep_parent is a cache setup and torn down independently of
+ * applications we need to use ire_ref{rele,hold}_notr for it.
+ */
+ ire_t *ire_dep_parent;
+ ire_t *ire_dep_children;
+ ire_t *ire_dep_sib_next;
+ ire_t **ire_dep_sib_ptpn; /* Pointer to previous next */
+ uint_t ire_dep_parent_generation;
+
+ uint_t ire_badcnt; /* Number of times ND_UNREACHABLE */
+ uint64_t ire_last_badcnt; /* In seconds */
+
+ /* ire_defense* and ire_last_used_time are only used on IRE_LOCALs */
uint_t ire_defense_count; /* number of ARP conflicts */
uint_t ire_defense_time; /* last time defended (secs) */
+
boolean_t ire_trace_disable; /* True when alloc fails */
ip_stack_t *ire_ipst; /* Does not have a netstack_hold */
-} ire_t;
+ iulp_t ire_metrics;
+};
/* IPv4 compatibility macros */
-#define ire_src_addr ire_u.ire4_u.ire4_src_addr
#define ire_mask ire_u.ire4_u.ire4_mask
#define ire_addr ire_u.ire4_u.ire4_addr
#define ire_gateway_addr ire_u.ire4_u.ire4_gateway_addr
-#define ire_cmask ire_u.ire4_u.ire4_cmask
+#define ire_setsrc_addr ire_u.ire4_u.ire4_setsrc_addr
-#define ire_src_addr_v6 ire_u.ire6_u.ire6_src_addr
#define ire_mask_v6 ire_u.ire6_u.ire6_mask
#define ire_addr_v6 ire_u.ire6_u.ire6_addr
#define ire_gateway_addr_v6 ire_u.ire6_u.ire6_gateway_addr
-#define ire_cmask_v6 ire_u.ire6_u.ire6_cmask
+#define ire_setsrc_addr_v6 ire_u.ire6_u.ire6_setsrc_addr
+
+/*
+ * Values for ire_generation.
+ *
+ * If an IRE is marked with IRE_IS_CONDEMNED, the last walker of
+ * the bucket should delete this IRE from this bucket.
+ *
+ * IRE_GENERATION_VERIFY is never stored in ire_generation but it is
+ * stored in places that cache IREs (such as ixa_ire_generation and
+ * ire_dep_parent_generation). It is used as a signal that the cache is
+ * stale and needs to be reverified.
+ */
+#define IRE_GENERATION_CONDEMNED 0
+#define IRE_GENERATION_VERIFY 1
+#define IRE_GENERATION_INITIAL 2
+#define IRE_IS_CONDEMNED(ire) \
+ ((ire)->ire_generation == IRE_GENERATION_CONDEMNED)
/* Convenient typedefs for sockaddrs */
typedef struct sockaddr_in sin_t;
typedef struct sockaddr_in6 sin6_t;
-/* Address structure used for internal bind with IP */
-typedef struct ipa_conn_s {
- ipaddr_t ac_laddr;
- ipaddr_t ac_faddr;
- uint16_t ac_fport;
- uint16_t ac_lport;
-} ipa_conn_t;
-
-typedef struct ipa6_conn_s {
- in6_addr_t ac6_laddr;
- in6_addr_t ac6_faddr;
- uint16_t ac6_fport;
- uint16_t ac6_lport;
-} ipa6_conn_t;
-
-/*
- * Using ipa_conn_x_t or ipa6_conn_x_t allows us to modify the behavior of IP's
- * bind handler.
- */
-typedef struct ipa_conn_extended_s {
- uint64_t acx_flags;
- ipa_conn_t acx_conn;
-} ipa_conn_x_t;
-
-typedef struct ipa6_conn_extended_s {
- uint64_t ac6x_flags;
- ipa6_conn_t ac6x_conn;
-} ipa6_conn_x_t;
-
-/* flag values for ipa_conn_x_t and ipa6_conn_x_t. */
-#define ACX_VERIFY_DST 0x1ULL /* verify destination address is reachable */
-
/* Name/Value Descriptor. */
typedef struct nv_s {
uint64_t nv_value;
@@ -2784,110 +2696,83 @@
* to support the needs of such tools and private definitions moved to
* private headers.
*/
-struct ip6_pkt_s {
+struct ip_pkt_s {
uint_t ipp_fields; /* Which fields are valid */
- uint_t ipp_sticky_ignored; /* sticky fields to ignore */
- uint_t ipp_ifindex; /* pktinfo ifindex */
in6_addr_t ipp_addr; /* pktinfo src/dst addr */
- uint_t ipp_unicast_hops; /* IPV6_UNICAST_HOPS */
- uint_t ipp_multicast_hops; /* IPV6_MULTICAST_HOPS */
+#define ipp_addr_v4 V4_PART_OF_V6(ipp_addr)
+ uint_t ipp_unicast_hops; /* IPV6_UNICAST_HOPS, IP_TTL */
uint_t ipp_hoplimit; /* IPV6_HOPLIMIT */
uint_t ipp_hopoptslen;
- uint_t ipp_rtdstoptslen;
+ uint_t ipp_rthdrdstoptslen;
uint_t ipp_rthdrlen;
uint_t ipp_dstoptslen;
- uint_t ipp_pathmtulen;
uint_t ipp_fraghdrlen;
ip6_hbh_t *ipp_hopopts;
- ip6_dest_t *ipp_rtdstopts;
+ ip6_dest_t *ipp_rthdrdstopts;
ip6_rthdr_t *ipp_rthdr;
ip6_dest_t *ipp_dstopts;
ip6_frag_t *ipp_fraghdr;
- struct ip6_mtuinfo *ipp_pathmtu;
- in6_addr_t ipp_nexthop; /* Transmit only */
- uint8_t ipp_tclass;
- int8_t ipp_use_min_mtu;
+ uint8_t ipp_tclass; /* IPV6_TCLASS */
+ uint8_t ipp_type_of_service; /* IP_TOS */
+ uint_t ipp_ipv4_options_len; /* Len of IPv4 options */
+ uint8_t *ipp_ipv4_options; /* Ptr to IPv4 options */
+ uint_t ipp_label_len_v4; /* Len of TX label for IPv4 */
+ uint8_t *ipp_label_v4; /* TX label for IPv4 */
+ uint_t ipp_label_len_v6; /* Len of TX label for IPv6 */
+ uint8_t *ipp_label_v6; /* TX label for IPv6 */
};
-typedef struct ip6_pkt_s ip6_pkt_t;
+typedef struct ip_pkt_s ip_pkt_t;
-extern void ip6_pkt_free(ip6_pkt_t *); /* free storage inside ip6_pkt_t */
-
-/*
- * This struct is used by ULP_opt_set() functions to return value of IPv4
- * ancillary options. Currently this is only used by udp and icmp and only
- * IP_PKTINFO option is supported.
- */
-typedef struct ip4_pkt_s {
- uint_t ip4_ill_index; /* interface index */
- ipaddr_t ip4_addr; /* source address */
-} ip4_pkt_t;
-
-/*
- * Used by ULP's to pass options info to ip_output
- * currently only IP_PKTINFO is supported.
- */
-typedef struct ip_opt_info_s {
- uint_t ip_opt_ill_index;
- uint_t ip_opt_flags;
-} ip_opt_info_t;
-
-/*
- * value for ip_opt_flags
- */
-#define IP_VERIFY_SRC 0x1
-
-/*
- * This structure is used to convey information from IP and the ULP.
- * Currently used for the IP_RECVSLLA, IP_RECVIF and IP_RECVPKTINFO options.
- * The type of information field is set to IN_PKTINFO (i.e inbound pkt info)
- */
-typedef struct ip_pktinfo {
- uint32_t ip_pkt_ulp_type; /* type of info sent */
- uint32_t ip_pkt_flags; /* what is sent up by IP */
- uint32_t ip_pkt_ifindex; /* inbound interface index */
- struct sockaddr_dl ip_pkt_slla; /* has source link layer addr */
- struct in_addr ip_pkt_match_addr; /* matched address */
-} ip_pktinfo_t;
-
-/*
- * flags to tell UDP what IP is sending; in_pkt_flags
- */
-#define IPF_RECVIF 0x01 /* inbound interface index */
-#define IPF_RECVSLLA 0x02 /* source link layer address */
-/*
- * Inbound interface index + matched address.
- * Used only by IPV4.
- */
-#define IPF_RECVADDR 0x04
+extern void ip_pkt_free(ip_pkt_t *); /* free storage inside ip_pkt_t */
+extern ipaddr_t ip_pkt_source_route_v4(const ip_pkt_t *);
+extern in6_addr_t *ip_pkt_source_route_v6(const ip_pkt_t *);
+extern int ip_pkt_copy(ip_pkt_t *, ip_pkt_t *, int);
+extern void ip_pkt_source_route_reverse_v4(ip_pkt_t *);
/* ipp_fields values */
-#define IPPF_IFINDEX 0x0001 /* Part of in6_pktinfo: ifindex */
-#define IPPF_ADDR 0x0002 /* Part of in6_pktinfo: src/dst addr */
-#define IPPF_SCOPE_ID 0x0004 /* Add xmit ip6i_t for sin6_scope_id */
-#define IPPF_NO_CKSUM 0x0008 /* Add xmit ip6i_t for IP6I_NO_*_CKSUM */
+#define IPPF_ADDR 0x0001 /* Part of in6_pktinfo: src/dst addr */
+#define IPPF_HOPLIMIT 0x0002 /* Overrides unicast and multicast */
+#define IPPF_TCLASS 0x0004 /* Overrides class in sin6_flowinfo */
-#define IPPF_RAW_CKSUM 0x0010 /* Add xmit ip6i_t for IP6I_RAW_CHECKSUM */
-#define IPPF_HOPLIMIT 0x0020
-#define IPPF_HOPOPTS 0x0040
-#define IPPF_RTHDR 0x0080
+#define IPPF_HOPOPTS 0x0010 /* ipp_hopopts set */
+#define IPPF_RTHDR 0x0020 /* ipp_rthdr set */
+#define IPPF_RTHDRDSTOPTS 0x0040 /* ipp_rthdrdstopts set */
+#define IPPF_DSTOPTS 0x0080 /* ipp_dstopts set */
-#define IPPF_RTDSTOPTS 0x0100
-#define IPPF_DSTOPTS 0x0200
-#define IPPF_NEXTHOP 0x0400
-#define IPPF_PATHMTU 0x0800
+#define IPPF_IPV4_OPTIONS 0x0100 /* ipp_ipv4_options set */
+#define IPPF_LABEL_V4 0x0200 /* ipp_label_v4 set */
+#define IPPF_LABEL_V6 0x0400 /* ipp_label_v6 set */
-#define IPPF_TCLASS 0x1000
-#define IPPF_DONTFRAG 0x2000
-#define IPPF_USE_MIN_MTU 0x04000
-#define IPPF_MULTICAST_HOPS 0x08000
+#define IPPF_FRAGHDR 0x0800 /* Used for IPsec receive side */
-#define IPPF_UNICAST_HOPS 0x10000
-#define IPPF_FRAGHDR 0x20000
+/*
+ * Data structure which is passed to conn_opt_get/set.
+ * The conn_t is included even though it can be inferred from queue_t.
+ * setsockopt and getsockopt use conn_ixa and conn_xmit_ipp. However,
+ * when handling ancillary data we use separate ixa and ipps.
+ */
+typedef struct conn_opt_arg_s {
+ conn_t *coa_connp;
+ ip_xmit_attr_t *coa_ixa;
+ ip_pkt_t *coa_ipp;
+ boolean_t coa_ancillary; /* Ancillary data and not setsockopt */
+ uint_t coa_changed; /* See below */
+} conn_opt_arg_t;
-#define IPPF_HAS_IP6I \
- (IPPF_IFINDEX|IPPF_ADDR|IPPF_NEXTHOP|IPPF_SCOPE_ID| \
- IPPF_NO_CKSUM|IPPF_RAW_CKSUM|IPPF_HOPLIMIT|IPPF_DONTFRAG| \
- IPPF_USE_MIN_MTU|IPPF_MULTICAST_HOPS|IPPF_UNICAST_HOPS)
+/*
+ * Flags for what changed.
+ * If we want to be more efficient in the future we can have more fine
+ * grained flags e.g., a flag for just IP_TOS changing.
+ * For now we either call ip_set_destination (for "route changed")
+ * and/or conn_build_hdr_template/conn_prepend_hdr (for "header changed").
+ */
+#define COA_HEADER_CHANGED 0x0001
+#define COA_ROUTE_CHANGED 0x0002
+#define COA_RCVBUF_CHANGED 0x0004 /* SO_RCVBUF */
+#define COA_SNDBUF_CHANGED 0x0008 /* SO_SNDBUF */
+#define COA_WROFF_CHANGED 0x0010 /* Header size changed */
+#define COA_ICMP_BIND_NEEDED 0x0020
+#define COA_OOBINLINE_CHANGED 0x0040
#define TCP_PORTS_OFFSET 0
#define UDP_PORTS_OFFSET 0
@@ -2902,32 +2787,21 @@
#define IPIF_LOOKUP_FAILED 2 /* Used as error code */
#define ILL_CAN_LOOKUP(ill) \
- (!((ill)->ill_state_flags & (ILL_CONDEMNED | ILL_CHANGING)) || \
+ (!((ill)->ill_state_flags & ILL_CONDEMNED) || \
IAM_WRITER_ILL(ill))
-#define ILL_CAN_WAIT(ill, q) \
- (((q) != NULL) && !((ill)->ill_state_flags & (ILL_CONDEMNED)))
+#define ILL_IS_CONDEMNED(ill) \
+ ((ill)->ill_state_flags & ILL_CONDEMNED)
#define IPIF_CAN_LOOKUP(ipif) \
- (!((ipif)->ipif_state_flags & (IPIF_CONDEMNED | IPIF_CHANGING)) || \
+ (!((ipif)->ipif_state_flags & IPIF_CONDEMNED) || \
IAM_WRITER_IPIF(ipif))
-/*
- * If the parameter 'q' is NULL, the caller is not interested in wait and
- * restart of the operation if the ILL or IPIF cannot be looked up when it is
- * marked as 'CHANGING'. Typically a thread that tries to send out data will
- * end up passing NULLs as the last 4 parameters to ill_lookup_on_ifindex and
- * in this case 'q' is NULL
- */
-#define IPIF_CAN_WAIT(ipif, q) \
- (((q) != NULL) && !((ipif)->ipif_state_flags & (IPIF_CONDEMNED)))
+#define IPIF_IS_CONDEMNED(ipif) \
+ ((ipif)->ipif_state_flags & IPIF_CONDEMNED)
-#define IPIF_CAN_LOOKUP_WALKER(ipif) \
- (!((ipif)->ipif_state_flags & (IPIF_CONDEMNED)) || \
- IAM_WRITER_IPIF(ipif))
-
-#define ILL_UNMARK_CHANGING(ill) \
- (ill)->ill_state_flags &= ~ILL_CHANGING;
+#define IPIF_IS_CHANGING(ipif) \
+ ((ipif)->ipif_state_flags & IPIF_CHANGING)
/* Macros used to assert that this thread is a writer */
#define IAM_WRITER_IPSQ(ipsq) ((ipsq)->ipsq_xop->ipx_writer == curthread)
@@ -2956,9 +2830,9 @@
#define RELEASE_ILL_LOCKS(ill_1, ill_2) \
{ \
if (ill_1 != NULL) \
- mutex_exit(&(ill_1)->ill_lock); \
+ mutex_exit(&(ill_1)->ill_lock); \
if (ill_2 != NULL && ill_2 != ill_1) \
- mutex_exit(&(ill_2)->ill_lock); \
+ mutex_exit(&(ill_2)->ill_lock); \
}
/* Get the other protocol instance ill */
@@ -2975,20 +2849,13 @@
struct lifreq *ci_lifr; /* the lifreq struct passed down */
} cmd_info_t;
-/*
- * List of AH and ESP IPsec acceleration capable ills
- */
-typedef struct ipsec_capab_ill_s {
- uint_t ill_index;
- boolean_t ill_isv6;
- struct ipsec_capab_ill_s *next;
-} ipsec_capab_ill_t;
-
extern struct kmem_cache *ire_cache;
extern ipaddr_t ip_g_all_ones;
-extern uint_t ip_loopback_mtu; /* /etc/system */
+extern uint_t ip_loopback_mtu; /* /etc/system */
+extern uint_t ip_loopback_mtuplus;
+extern uint_t ip_loopback_mtu_v6plus;
extern vmem_t *ip_minor_arena_sa;
extern vmem_t *ip_minor_arena_la;
@@ -3014,18 +2881,18 @@
#define ips_ip_g_send_redirects ips_param_arr[5].ip_param_value
#define ips_ip_g_forward_directed_bcast ips_param_arr[6].ip_param_value
#define ips_ip_mrtdebug ips_param_arr[7].ip_param_value
-#define ips_ip_timer_interval ips_param_arr[8].ip_param_value
-#define ips_ip_ire_arp_interval ips_param_arr[9].ip_param_value
-#define ips_ip_ire_redir_interval ips_param_arr[10].ip_param_value
+#define ips_ip_ire_reclaim_fraction ips_param_arr[8].ip_param_value
+#define ips_ip_nce_reclaim_fraction ips_param_arr[9].ip_param_value
+#define ips_ip_dce_reclaim_fraction ips_param_arr[10].ip_param_value
#define ips_ip_def_ttl ips_param_arr[11].ip_param_value
#define ips_ip_forward_src_routed ips_param_arr[12].ip_param_value
#define ips_ip_wroff_extra ips_param_arr[13].ip_param_value
-#define ips_ip_ire_pathmtu_interval ips_param_arr[14].ip_param_value
+#define ips_ip_pathmtu_interval ips_param_arr[14].ip_param_value
#define ips_ip_icmp_return ips_param_arr[15].ip_param_value
#define ips_ip_path_mtu_discovery ips_param_arr[16].ip_param_value
-#define ips_ip_ignore_delete_time ips_param_arr[17].ip_param_value
+#define ips_ip_pmtu_min ips_param_arr[17].ip_param_value
#define ips_ip_ignore_redirect ips_param_arr[18].ip_param_value
-#define ips_ip_output_queue ips_param_arr[19].ip_param_value
+#define ips_ip_arp_icmp_error ips_param_arr[19].ip_param_value
#define ips_ip_broadcast_ttl ips_param_arr[20].ip_param_value
#define ips_ip_icmp_err_interval ips_param_arr[21].ip_param_value
#define ips_ip_icmp_err_burst ips_param_arr[22].ip_param_value
@@ -3046,7 +2913,7 @@
#define ips_ipv6_send_redirects ips_param_arr[35].ip_param_value
#define ips_ipv6_ignore_redirect ips_param_arr[36].ip_param_value
#define ips_ipv6_strict_dst_multihoming ips_param_arr[37].ip_param_value
-#define ips_ip_ire_reclaim_fraction ips_param_arr[38].ip_param_value
+#define ips_src_check ips_param_arr[38].ip_param_value
#define ips_ipsec_policy_log_interval ips_param_arr[39].ip_param_value
#define ips_pim_accept_clear_messages ips_param_arr[40].ip_param_value
#define ips_ip_ndp_unsolicit_interval ips_param_arr[41].ip_param_value
@@ -3055,21 +2922,37 @@
/* Misc IP configuration knobs */
#define ips_ip_policy_mask ips_param_arr[44].ip_param_value
-#define ips_ip_multirt_resolution_interval ips_param_arr[45].ip_param_value
+#define ips_ip_ecmp_behavior ips_param_arr[45].ip_param_value
#define ips_ip_multirt_ttl ips_param_arr[46].ip_param_value
-#define ips_ip_multidata_outbound ips_param_arr[47].ip_param_value
-#define ips_ip_ndp_defense_interval ips_param_arr[48].ip_param_value
-#define ips_ip_max_temp_idle ips_param_arr[49].ip_param_value
-#define ips_ip_max_temp_defend ips_param_arr[50].ip_param_value
-#define ips_ip_max_defend ips_param_arr[51].ip_param_value
-#define ips_ip_defend_interval ips_param_arr[52].ip_param_value
-#define ips_ip_dup_recovery ips_param_arr[53].ip_param_value
-#define ips_ip_restrict_interzone_loopback ips_param_arr[54].ip_param_value
-#define ips_ip_lso_outbound ips_param_arr[55].ip_param_value
-#define ips_igmp_max_version ips_param_arr[56].ip_param_value
-#define ips_mld_max_version ips_param_arr[57].ip_param_value
-#define ips_ip_pmtu_min ips_param_arr[58].ip_param_value
-#define ips_ipv6_drop_inbound_icmpv6 ips_param_arr[59].ip_param_value
+#define ips_ip_ire_badcnt_lifetime ips_param_arr[47].ip_param_value
+#define ips_ip_max_temp_idle ips_param_arr[48].ip_param_value
+#define ips_ip_max_temp_defend ips_param_arr[49].ip_param_value
+#define ips_ip_max_defend ips_param_arr[50].ip_param_value
+#define ips_ip_defend_interval ips_param_arr[51].ip_param_value
+#define ips_ip_dup_recovery ips_param_arr[52].ip_param_value
+#define ips_ip_restrict_interzone_loopback ips_param_arr[53].ip_param_value
+#define ips_ip_lso_outbound ips_param_arr[54].ip_param_value
+#define ips_igmp_max_version ips_param_arr[55].ip_param_value
+#define ips_mld_max_version ips_param_arr[56].ip_param_value
+#define ips_ipv6_drop_inbound_icmpv6 ips_param_arr[57].ip_param_value
+#define ips_arp_probe_delay ips_param_arr[58].ip_param_value
+#define ips_arp_fastprobe_delay ips_param_arr[59].ip_param_value
+#define ips_arp_probe_interval ips_param_arr[60].ip_param_value
+#define ips_arp_fastprobe_interval ips_param_arr[61].ip_param_value
+#define ips_arp_probe_count ips_param_arr[62].ip_param_value
+#define ips_arp_fastprobe_count ips_param_arr[63].ip_param_value
+#define ips_ipv4_dad_announce_interval ips_param_arr[64].ip_param_value
+#define ips_ipv6_dad_announce_interval ips_param_arr[65].ip_param_value
+#define ips_arp_defend_interval ips_param_arr[66].ip_param_value
+#define ips_arp_defend_rate ips_param_arr[67].ip_param_value
+#define ips_ndp_defend_interval ips_param_arr[68].ip_param_value
+#define ips_ndp_defend_rate ips_param_arr[69].ip_param_value
+#define ips_arp_defend_period ips_param_arr[70].ip_param_value
+#define ips_ndp_defend_period ips_param_arr[71].ip_param_value
+#define ips_ipv4_icmp_return_pmtu ips_param_arr[72].ip_param_value
+#define ips_ipv6_icmp_return_pmtu ips_param_arr[73].ip_param_value
+#define ips_ip_arp_publish_count ips_param_arr[74].ip_param_value
+#define ips_ip_arp_publish_interval ips_param_arr[75].ip_param_value
extern int dohwcksum; /* use h/w cksum if supported by the h/w */
#ifdef ZC_TEST
@@ -3102,13 +2985,13 @@
((ipst)->ips_ip4_loopback_out_event.he_interested)
#define HOOKS6_INTERESTED_LOOPBACK_OUT(ipst) \
((ipst)->ips_ip6_loopback_out_event.he_interested)
-
/*
- * Hooks macros used inside of ip
+ * Hooks macros used inside of ip
+ * The callers use the above INTERESTED macros first, hence
+ * the he_interested check is superfluous.
*/
-#define FW_HOOKS(_hook, _event, _ilp, _olp, _iph, _fm, _m, _llm, ipst) \
- \
- if ((_hook).he_interested) { \
+#define FW_HOOKS(_hook, _event, _ilp, _olp, _iph, _fm, _m, _llm, ipst, _err) \
+ if ((_hook).he_interested) { \
hook_pkt_event_t info; \
\
_NOTE(CONSTCOND) \
@@ -3121,12 +3004,15 @@
info.hpe_mp = &(_fm); \
info.hpe_mb = _m; \
info.hpe_flags = _llm; \
- if (hook_run(ipst->ips_ipv4_net_data->netd_hooks, \
- _event, (hook_data_t)&info) != 0) { \
+ _err = hook_run(ipst->ips_ipv4_net_data->netd_hooks, \
+ _event, (hook_data_t)&info); \
+ if (_err != 0) { \
ip2dbg(("%s hook dropped mblk chain %p hdr %p\n",\
(_hook).he_name, (void *)_fm, (void *)_m)); \
- freemsg(_fm); \
- _fm = NULL; \
+ if (_fm != NULL) { \
+ freemsg(_fm); \
+ _fm = NULL; \
+ } \
_iph = NULL; \
_m = NULL; \
} else { \
@@ -3135,9 +3021,8 @@
} \
}
-#define FW_HOOKS6(_hook, _event, _ilp, _olp, _iph, _fm, _m, _llm, ipst) \
- \
- if ((_hook).he_interested) { \
+#define FW_HOOKS6(_hook, _event, _ilp, _olp, _iph, _fm, _m, _llm, ipst, _err) \
+ if ((_hook).he_interested) { \
hook_pkt_event_t info; \
\
_NOTE(CONSTCOND) \
@@ -3150,12 +3035,15 @@
info.hpe_mp = &(_fm); \
info.hpe_mb = _m; \
info.hpe_flags = _llm; \
- if (hook_run(ipst->ips_ipv6_net_data->netd_hooks, \
- _event, (hook_data_t)&info) != 0) { \
+ _err = hook_run(ipst->ips_ipv6_net_data->netd_hooks, \
+ _event, (hook_data_t)&info); \
+ if (_err != 0) { \
ip2dbg(("%s hook dropped mblk chain %p hdr %p\n",\
(_hook).he_name, (void *)_fm, (void *)_m)); \
- freemsg(_fm); \
- _fm = NULL; \
+ if (_fm != NULL) { \
+ freemsg(_fm); \
+ _fm = NULL; \
+ } \
_iph = NULL; \
_m = NULL; \
} else { \
@@ -3194,24 +3082,6 @@
#define IP_LOOPBACK_ADDR(addr) \
(((addr) & N_IN_CLASSA_NET == N_IN_LOOPBACK_NET))
-#ifdef DEBUG
-/* IPsec HW acceleration debugging support */
-
-#define IPSECHW_CAPAB 0x0001 /* capability negotiation */
-#define IPSECHW_SADB 0x0002 /* SADB exchange */
-#define IPSECHW_PKT 0x0004 /* general packet flow */
-#define IPSECHW_PKTIN 0x0008 /* driver in pkt processing details */
-#define IPSECHW_PKTOUT 0x0010 /* driver out pkt processing details */
-
-#define IPSECHW_DEBUG(f, x) if (ipsechw_debug & (f)) { (void) printf x; }
-#define IPSECHW_CALL(f, r, x) if (ipsechw_debug & (f)) { (void) r x; }
-
-extern uint32_t ipsechw_debug;
-#else
-#define IPSECHW_DEBUG(f, x) {}
-#define IPSECHW_CALL(f, r, x) {}
-#endif
-
extern int ip_debug;
extern uint_t ip_thread_data;
extern krwlock_t ip_thread_rwlock;
@@ -3235,8 +3105,6 @@
/* Default MAC-layer address string length for mac_colon_addr */
#define MAC_STR_LEN 128
-struct ipsec_out_s;
-
struct mac_header_info_s;
extern void ill_frag_timer(void *);
@@ -3252,86 +3120,173 @@
extern const char *mac_colon_addr(const uint8_t *, size_t, char *, size_t);
extern void ip_lwput(queue_t *, mblk_t *);
extern boolean_t icmp_err_rate_limit(ip_stack_t *);
-extern void icmp_time_exceeded(queue_t *, mblk_t *, uint8_t, zoneid_t,
- ip_stack_t *);
-extern void icmp_unreachable(queue_t *, mblk_t *, uint8_t, zoneid_t,
- ip_stack_t *);
-extern mblk_t *ip_add_info(mblk_t *, ill_t *, uint_t, zoneid_t, ip_stack_t *);
-cred_t *ip_best_cred(mblk_t *, conn_t *, pid_t *);
-extern mblk_t *ip_bind_v4(queue_t *, mblk_t *, conn_t *);
-extern boolean_t ip_bind_ipsec_policy_set(conn_t *, mblk_t *);
-extern int ip_bind_laddr_v4(conn_t *, mblk_t **, uint8_t, ipaddr_t,
- uint16_t, boolean_t);
-extern int ip_proto_bind_laddr_v4(conn_t *, mblk_t **, uint8_t, ipaddr_t,
- uint16_t, boolean_t);
-extern int ip_proto_bind_connected_v4(conn_t *, mblk_t **,
- uint8_t, ipaddr_t *, uint16_t, ipaddr_t, uint16_t, boolean_t, boolean_t,
- cred_t *);
-extern int ip_bind_connected_v4(conn_t *, mblk_t **, uint8_t, ipaddr_t *,
- uint16_t, ipaddr_t, uint16_t, boolean_t, boolean_t, cred_t *);
+extern void icmp_frag_needed(mblk_t *, int, ip_recv_attr_t *);
+extern mblk_t *icmp_inbound_v4(mblk_t *, ip_recv_attr_t *);
+extern void icmp_time_exceeded(mblk_t *, uint8_t, ip_recv_attr_t *);
+extern void icmp_unreachable(mblk_t *, uint8_t, ip_recv_attr_t *);
+extern boolean_t ip_ipsec_policy_inherit(conn_t *, conn_t *, ip_recv_attr_t *);
+extern void *ip_pullup(mblk_t *, ssize_t, ip_recv_attr_t *);
+extern void ip_setl2src(mblk_t *, ip_recv_attr_t *, ill_t *);
+extern mblk_t *ip_check_and_align_header(mblk_t *, uint_t, ip_recv_attr_t *);
+extern mblk_t *ip_check_length(mblk_t *, uchar_t *, ssize_t, uint_t, uint_t,
+ ip_recv_attr_t *);
+extern mblk_t *ip_check_optlen(mblk_t *, ipha_t *, uint_t, uint_t,
+ ip_recv_attr_t *);
+extern mblk_t *ip_fix_dbref(mblk_t *, ip_recv_attr_t *);
extern uint_t ip_cksum(mblk_t *, int, uint32_t);
extern int ip_close(queue_t *, int);
extern uint16_t ip_csum_hdr(ipha_t *);
-extern void ip_proto_not_sup(queue_t *, mblk_t *, uint_t, zoneid_t,
- ip_stack_t *);
+extern void ip_forward_xmit_v4(nce_t *, ill_t *, mblk_t *, ipha_t *,
+ ip_recv_attr_t *, uint32_t, uint32_t);
+extern boolean_t ip_forward_options(mblk_t *, ipha_t *, ill_t *,
+ ip_recv_attr_t *);
+extern int ip_fragment_v4(mblk_t *, nce_t *, iaflags_t, uint_t, uint32_t,
+ uint32_t, zoneid_t, zoneid_t, pfirepostfrag_t postfragfn,
+ uintptr_t *cookie);
+extern void ip_proto_not_sup(mblk_t *, ip_recv_attr_t *);
extern void ip_ire_g_fini(void);
extern void ip_ire_g_init(void);
extern void ip_ire_fini(ip_stack_t *);
extern void ip_ire_init(ip_stack_t *);
+extern void ip_mdata_to_mhi(ill_t *, mblk_t *, struct mac_header_info_s *);
extern int ip_openv4(queue_t *q, dev_t *devp, int flag, int sflag,
cred_t *credp);
extern int ip_openv6(queue_t *q, dev_t *devp, int flag, int sflag,
cred_t *credp);
extern int ip_reassemble(mblk_t *, ipf_t *, uint_t, boolean_t, ill_t *,
size_t);
-extern int ip_opt_set_ill(conn_t *, int, boolean_t, boolean_t,
- int, int, mblk_t *);
extern void ip_rput(queue_t *, mblk_t *);
extern void ip_input(ill_t *, ill_rx_ring_t *, mblk_t *,
struct mac_header_info_s *);
+extern void ip_input_v6(ill_t *, ill_rx_ring_t *, mblk_t *,
+ struct mac_header_info_s *);
+extern mblk_t *ip_input_common_v4(ill_t *, ill_rx_ring_t *, mblk_t *,
+ struct mac_header_info_s *, squeue_t *, mblk_t **, uint_t *);
+extern mblk_t *ip_input_common_v6(ill_t *, ill_rx_ring_t *, mblk_t *,
+ struct mac_header_info_s *, squeue_t *, mblk_t **, uint_t *);
+extern void ill_input_full_v4(mblk_t *, void *, void *,
+ ip_recv_attr_t *, rtc_t *);
+extern void ill_input_short_v4(mblk_t *, void *, void *,
+ ip_recv_attr_t *, rtc_t *);
+extern void ill_input_full_v6(mblk_t *, void *, void *,
+ ip_recv_attr_t *, rtc_t *);
+extern void ill_input_short_v6(mblk_t *, void *, void *,
+ ip_recv_attr_t *, rtc_t *);
+extern ipaddr_t ip_input_options(ipha_t *, ipaddr_t, mblk_t *,
+ ip_recv_attr_t *, int *);
+extern boolean_t ip_input_local_options(mblk_t *, ipha_t *, ip_recv_attr_t *);
+extern mblk_t *ip_input_fragment(mblk_t *, ipha_t *, ip_recv_attr_t *);
+extern mblk_t *ip_input_fragment_v6(mblk_t *, ip6_t *, ip6_frag_t *, uint_t,
+ ip_recv_attr_t *);
+extern void ip_input_post_ipsec(mblk_t *, ip_recv_attr_t *);
+extern void ip_fanout_v4(mblk_t *, ipha_t *, ip_recv_attr_t *);
+extern void ip_fanout_v6(mblk_t *, ip6_t *, ip_recv_attr_t *);
+extern void ip_fanout_proto_conn(conn_t *, mblk_t *, ipha_t *, ip6_t *,
+ ip_recv_attr_t *);
+extern void ip_fanout_proto_v4(mblk_t *, ipha_t *, ip_recv_attr_t *);
+extern void ip_fanout_send_icmp_v4(mblk_t *, uint_t, uint_t,
+ ip_recv_attr_t *);
+extern void ip_fanout_udp_conn(conn_t *, mblk_t *, ipha_t *, ip6_t *,
+ ip_recv_attr_t *);
+extern void ip_fanout_udp_multi_v4(mblk_t *, ipha_t *, uint16_t, uint16_t,
+ ip_recv_attr_t *);
+extern mblk_t *zero_spi_check(mblk_t *, ip_recv_attr_t *);
+extern void ip_build_hdrs_v4(uchar_t *, uint_t, const ip_pkt_t *, uint8_t);
+extern int ip_find_hdr_v4(ipha_t *, ip_pkt_t *, boolean_t);
+extern int ip_total_hdrs_len_v4(const ip_pkt_t *);
+
extern mblk_t *ip_accept_tcp(ill_t *, ill_rx_ring_t *, squeue_t *,
mblk_t *, mblk_t **, uint_t *cnt);
-extern void ip_rput_dlpi(queue_t *, mblk_t *);
-extern void ip_rput_forward(ire_t *, ipha_t *, mblk_t *, ill_t *);
-extern void ip_rput_forward_multicast(ipaddr_t, mblk_t *, ipif_t *);
+extern void ip_rput_dlpi(ill_t *, mblk_t *);
+extern void ip_rput_notdata(ill_t *, mblk_t *);
extern void ip_mib2_add_ip_stats(mib2_ipIfStatsEntry_t *,
mib2_ipIfStatsEntry_t *);
extern void ip_mib2_add_icmp6_stats(mib2_ipv6IfIcmpEntry_t *,
mib2_ipv6IfIcmpEntry_t *);
-extern void ip_udp_input(queue_t *, mblk_t *, ipha_t *, ire_t *, ill_t *);
-extern void ip_proto_input(queue_t *, mblk_t *, ipha_t *, ire_t *, ill_t *,
- uint32_t);
extern void ip_rput_other(ipsq_t *, queue_t *, mblk_t *, void *);
extern ire_t *ip_check_multihome(void *, ire_t *, ill_t *);
-extern void ip_setpktversion(conn_t *, boolean_t, boolean_t, ip_stack_t *);
-extern void ip_trash_ire_reclaim(void *);
-extern void ip_trash_timer_expire(void *);
-extern void ip_wput(queue_t *, mblk_t *);
-extern void ip_output(void *, mblk_t *, void *, int);
-extern void ip_output_options(void *, mblk_t *, void *, int,
- ip_opt_info_t *);
+extern void ip_send_potential_redirect_v4(mblk_t *, ipha_t *, ire_t *,
+ ip_recv_attr_t *);
+extern int ip_set_destination_v4(ipaddr_t *, ipaddr_t, ipaddr_t,
+ ip_xmit_attr_t *, iulp_t *, uint32_t, uint_t);
+extern int ip_set_destination_v6(in6_addr_t *, const in6_addr_t *,
+ const in6_addr_t *, ip_xmit_attr_t *, iulp_t *, uint32_t, uint_t);
-extern void ip_wput_ire(queue_t *, mblk_t *, ire_t *, conn_t *, int,
- zoneid_t);
-extern void ip_wput_local(queue_t *, ill_t *, ipha_t *, mblk_t *, ire_t *,
- int, zoneid_t);
-extern void ip_wput_multicast(queue_t *, mblk_t *, ipif_t *, zoneid_t);
-extern void ip_wput_nondata(ipsq_t *, queue_t *, mblk_t *, void *);
+extern int ip_output_simple(mblk_t *, ip_xmit_attr_t *);
+extern int ip_output_simple_v4(mblk_t *, ip_xmit_attr_t *);
+extern int ip_output_simple_v6(mblk_t *, ip_xmit_attr_t *);
+extern int ip_output_options(mblk_t *, ipha_t *, ip_xmit_attr_t *,
+ ill_t *);
+extern void ip_output_local_options(ipha_t *, ip_stack_t *);
+
+extern ip_xmit_attr_t *conn_get_ixa(conn_t *, boolean_t);
+extern ip_xmit_attr_t *conn_get_ixa_tryhard(conn_t *, boolean_t);
+extern ip_xmit_attr_t *conn_replace_ixa(conn_t *, ip_xmit_attr_t *);
+extern ip_xmit_attr_t *conn_get_ixa_exclusive(conn_t *);
+extern ip_xmit_attr_t *ip_xmit_attr_duplicate(ip_xmit_attr_t *);
+extern void ip_xmit_attr_replace_tsl(ip_xmit_attr_t *, ts_label_t *);
+extern void ip_xmit_attr_restore_tsl(ip_xmit_attr_t *, cred_t *);
+boolean_t ip_recv_attr_replace_label(ip_recv_attr_t *, ts_label_t *);
+extern void ixa_inactive(ip_xmit_attr_t *);
+extern void ixa_refrele(ip_xmit_attr_t *);
+extern boolean_t ixa_check_drain_insert(conn_t *, ip_xmit_attr_t *);
+extern void ixa_cleanup(ip_xmit_attr_t *);
+extern void ira_cleanup(ip_recv_attr_t *, boolean_t);
+extern void ixa_safe_copy(ip_xmit_attr_t *, ip_xmit_attr_t *);
+
+extern int conn_ip_output(mblk_t *, ip_xmit_attr_t *);
+extern boolean_t ip_output_verify_local(ip_xmit_attr_t *);
+extern mblk_t *ip_output_process_local(mblk_t *, ip_xmit_attr_t *, boolean_t,
+ boolean_t, conn_t *);
+
+extern int conn_opt_get(conn_opt_arg_t *, t_scalar_t, t_scalar_t,
+ uchar_t *);
+extern int conn_opt_set(conn_opt_arg_t *, t_scalar_t, t_scalar_t, uint_t,
+ uchar_t *, boolean_t, cred_t *);
+extern boolean_t conn_same_as_last_v4(conn_t *, sin_t *);
+extern boolean_t conn_same_as_last_v6(conn_t *, sin6_t *);
+extern int conn_update_label(const conn_t *, const ip_xmit_attr_t *,
+ const in6_addr_t *, ip_pkt_t *);
+
+extern int ip_opt_set_multicast_group(conn_t *, t_scalar_t,
+ uchar_t *, boolean_t, boolean_t);
+extern int ip_opt_set_multicast_sources(conn_t *, t_scalar_t,
+ uchar_t *, boolean_t, boolean_t);
+extern int conn_getsockname(conn_t *, struct sockaddr *, uint_t *);
+extern int conn_getpeername(conn_t *, struct sockaddr *, uint_t *);
+
+extern int conn_build_hdr_template(conn_t *, uint_t, uint_t,
+ const in6_addr_t *, const in6_addr_t *, uint32_t);
+extern mblk_t *conn_prepend_hdr(ip_xmit_attr_t *, const ip_pkt_t *,
+ const in6_addr_t *, const in6_addr_t *, uint8_t, uint32_t, uint_t,
+ mblk_t *, uint_t, uint_t, uint32_t *, int *);
+extern void ip_attr_newdst(ip_xmit_attr_t *);
+extern void ip_attr_nexthop(const ip_pkt_t *, const ip_xmit_attr_t *,
+ const in6_addr_t *, in6_addr_t *);
+extern int conn_connect(conn_t *, iulp_t *, uint32_t);
+extern int ip_attr_connect(const conn_t *, ip_xmit_attr_t *,
+ const in6_addr_t *, const in6_addr_t *, const in6_addr_t *, in_port_t,
+ in6_addr_t *, iulp_t *, uint32_t);
+extern int conn_inherit_parent(conn_t *, conn_t *);
+
+extern void conn_ixa_cleanup(conn_t *connp, void *arg);
+
+extern boolean_t conn_wantpacket(conn_t *, ip_recv_attr_t *, ipha_t *);
+extern uint_t ip_type_v4(ipaddr_t, ip_stack_t *);
+extern uint_t ip_type_v6(const in6_addr_t *, ip_stack_t *);
+
+extern void ip_wput_nondata(queue_t *, mblk_t *);
extern void ip_wsrv(queue_t *);
extern char *ip_nv_lookup(nv_t *, int);
extern boolean_t ip_local_addr_ok_v6(const in6_addr_t *, const in6_addr_t *);
extern boolean_t ip_remote_addr_ok_v6(const in6_addr_t *, const in6_addr_t *);
extern ipaddr_t ip_massage_options(ipha_t *, netstack_t *);
extern ipaddr_t ip_net_mask(ipaddr_t);
-extern void ip_newroute(queue_t *, mblk_t *, ipaddr_t, conn_t *, zoneid_t,
- ip_stack_t *);
-extern ipxmit_state_t ip_xmit_v4(mblk_t *, ire_t *, struct ipsec_out_s *,
- boolean_t, conn_t *);
-extern int ip_hdr_complete(ipha_t *, zoneid_t, ip_stack_t *);
+extern void arp_bringup_done(ill_t *, int);
+extern void arp_replumb_done(ill_t *, int);
extern struct qinit iprinitv6;
-extern struct qinit ipwinitv6;
extern void ipmp_init(ip_stack_t *);
extern void ipmp_destroy(ip_stack_t *);
@@ -3347,12 +3302,11 @@
extern void ipmp_illgrp_del_ipif(ipmp_illgrp_t *, ipif_t *);
extern ill_t *ipmp_illgrp_next_ill(ipmp_illgrp_t *);
extern ill_t *ipmp_illgrp_hold_next_ill(ipmp_illgrp_t *);
-extern ill_t *ipmp_illgrp_cast_ill(ipmp_illgrp_t *);
extern ill_t *ipmp_illgrp_hold_cast_ill(ipmp_illgrp_t *);
extern ill_t *ipmp_illgrp_ipmp_ill(ipmp_illgrp_t *);
extern void ipmp_illgrp_refresh_mtu(ipmp_illgrp_t *);
-extern ipmp_arpent_t *ipmp_illgrp_create_arpent(ipmp_illgrp_t *, mblk_t *,
- boolean_t);
+extern ipmp_arpent_t *ipmp_illgrp_create_arpent(ipmp_illgrp_t *,
+ boolean_t, ipaddr_t, uchar_t *, size_t, uint16_t);
extern void ipmp_illgrp_destroy_arpent(ipmp_illgrp_t *, ipmp_arpent_t *);
extern ipmp_arpent_t *ipmp_illgrp_lookup_arpent(ipmp_illgrp_t *, ipaddr_t *);
extern void ipmp_illgrp_refresh_arpent(ipmp_illgrp_t *);
@@ -3373,19 +3327,25 @@
extern ill_t *ipmp_ipif_hold_bound_ill(const ipif_t *);
extern boolean_t ipmp_ipif_is_dataaddr(const ipif_t *);
extern boolean_t ipmp_ipif_is_stubaddr(const ipif_t *);
+extern boolean_t ipmp_packet_is_probe(mblk_t *, ill_t *);
+extern ill_t *ipmp_ill_get_xmit_ill(ill_t *, boolean_t);
+extern void ipmp_ncec_flush_nce(ncec_t *);
+extern void ipmp_ncec_fastpath(ncec_t *, ill_t *);
extern void conn_drain_insert(conn_t *, idl_tx_list_t *);
+extern void conn_setqfull(conn_t *, boolean_t *);
+extern void conn_clrqfull(conn_t *, boolean_t *);
extern int conn_ipsec_length(conn_t *);
-extern void ip_wput_ipsec_out(queue_t *, mblk_t *, ipha_t *, ill_t *,
- ire_t *);
extern ipaddr_t ip_get_dst(ipha_t *);
-extern int ipsec_out_extra_length(mblk_t *);
-extern int ipsec_in_extra_length(mblk_t *);
-extern mblk_t *ipsec_in_alloc(boolean_t, netstack_t *);
-extern boolean_t ipsec_in_is_secure(mblk_t *);
-extern void ipsec_out_process(queue_t *, mblk_t *, ire_t *, uint_t);
-extern void ipsec_out_to_in(mblk_t *);
-extern void ip_fanout_proto_again(mblk_t *, ill_t *, ill_t *, ire_t *);
+extern uint_t ip_get_pmtu(ip_xmit_attr_t *);
+extern uint_t ip_get_base_mtu(ill_t *, ire_t *);
+extern mblk_t *ip_output_attach_policy(mblk_t *, ipha_t *, ip6_t *,
+ const conn_t *, ip_xmit_attr_t *);
+extern int ipsec_out_extra_length(ip_xmit_attr_t *);
+extern int ipsec_out_process(mblk_t *, ip_xmit_attr_t *);
+extern int ip_output_post_ipsec(mblk_t *, ip_xmit_attr_t *);
+extern void ipsec_out_to_in(ip_xmit_attr_t *, ill_t *ill,
+ ip_recv_attr_t *);
extern void ire_cleanup(ire_t *);
extern void ire_inactive(ire_t *);
@@ -3407,14 +3367,13 @@
extern uint8_t ipoptp_next(ipoptp_t *);
extern uint8_t ipoptp_first(ipoptp_t *, ipha_t *);
-extern int ip_opt_get_user(const ipha_t *, uchar_t *);
+extern int ip_opt_get_user(conn_t *, uchar_t *);
extern int ipsec_req_from_conn(conn_t *, ipsec_req_t *, int);
extern int ip_snmp_get(queue_t *q, mblk_t *mctl, int level);
extern int ip_snmp_set(queue_t *q, int, int, uchar_t *, int);
extern void ip_process_ioctl(ipsq_t *, queue_t *, mblk_t *, void *);
extern void ip_quiesce_conn(conn_t *);
extern void ip_reprocess_ioctl(ipsq_t *, queue_t *, mblk_t *, void *);
-extern void ip_restart_optmgmt(ipsq_t *, queue_t *, mblk_t *, void *);
extern void ip_ioctl_finish(queue_t *, mblk_t *, int, int, ipsq_t *);
extern boolean_t ip_cmpbuf(const void *, uint_t, boolean_t, const void *,
@@ -3425,32 +3384,36 @@
extern boolean_t ipsq_pending_mp_cleanup(ill_t *, conn_t *);
extern void conn_ioctl_cleanup(conn_t *);
-extern ill_t *conn_get_held_ill(conn_t *, ill_t **, int *);
-struct tcp_stack;
-extern void ip_xmit_reset_serialize(mblk_t *, int, zoneid_t, struct tcp_stack *,
- conn_t *);
-
-struct multidata_s;
-struct pdesc_s;
-
-extern mblk_t *ip_mdinfo_alloc(ill_mdt_capab_t *);
-extern mblk_t *ip_mdinfo_return(ire_t *, conn_t *, char *, ill_mdt_capab_t *);
-extern mblk_t *ip_lsoinfo_alloc(ill_lso_capab_t *);
-extern mblk_t *ip_lsoinfo_return(ire_t *, conn_t *, char *,
- ill_lso_capab_t *);
-extern uint_t ip_md_cksum(struct pdesc_s *, int, uint_t);
-extern boolean_t ip_md_addr_attr(struct multidata_s *, struct pdesc_s *,
- const mblk_t *);
-extern boolean_t ip_md_hcksum_attr(struct multidata_s *, struct pdesc_s *,
- uint32_t, uint32_t, uint32_t, uint32_t);
-extern boolean_t ip_md_zcopy_attr(struct multidata_s *, struct pdesc_s *,
- uint_t);
extern void ip_unbind(conn_t *);
extern void tnet_init(void);
extern void tnet_fini(void);
+/*
+ * Hook functions to enable cluster networking
+ * On non-clustered systems these vectors must always be NULL.
+ */
+extern int (*cl_inet_isclusterwide)(netstackid_t stack_id, uint8_t protocol,
+ sa_family_t addr_family, uint8_t *laddrp, void *args);
+extern uint32_t (*cl_inet_ipident)(netstackid_t stack_id, uint8_t protocol,
+ sa_family_t addr_family, uint8_t *laddrp, uint8_t *faddrp,
+ void *args);
+extern int (*cl_inet_connect2)(netstackid_t stack_id, uint8_t protocol,
+ boolean_t is_outgoing, sa_family_t addr_family, uint8_t *laddrp,
+ in_port_t lport, uint8_t *faddrp, in_port_t fport, void *args);
+extern void (*cl_inet_getspi)(netstackid_t, uint8_t, uint8_t *, size_t,
+ void *);
+extern void (*cl_inet_getspi)(netstackid_t stack_id, uint8_t protocol,
+ uint8_t *ptr, size_t len, void *args);
+extern int (*cl_inet_checkspi)(netstackid_t stack_id, uint8_t protocol,
+ uint32_t spi, void *args);
+extern void (*cl_inet_deletespi)(netstackid_t stack_id, uint8_t protocol,
+ uint32_t spi, void *args);
+extern void (*cl_inet_idlesa)(netstackid_t, uint8_t, uint32_t,
+ sa_family_t, in6_addr_t, in6_addr_t, void *);
+
+
/* Hooks for CGTP (multirt routes) filtering module */
#define CGTP_FILTER_REV_1 1
#define CGTP_FILTER_REV_2 2
@@ -3491,73 +3454,6 @@
extern int ip_cgtp_filter_unregister(netstackid_t);
extern int ip_cgtp_filter_is_registered(netstackid_t);
-/* Flags for ire_multirt_lookup() */
-
-#define MULTIRT_USESTAMP 0x0001
-#define MULTIRT_SETSTAMP 0x0002
-#define MULTIRT_CACHEGW 0x0004
-
-/* Debug stuff for multirt route resolution. */
-#if defined(DEBUG) && !defined(__lint)
-/* Our "don't send, rather drop" flag. */
-#define MULTIRT_DEBUG_FLAG 0x8000
-
-#define MULTIRT_TRACE(x) ip2dbg(x)
-
-#define MULTIRT_DEBUG_TAG(mblk) \
- do { \
- ASSERT(mblk != NULL); \
- MULTIRT_TRACE(("%s[%d]: tagging mblk %p, tag was %d\n", \
- __FILE__, __LINE__, \
- (void *)(mblk), (mblk)->b_flag & MULTIRT_DEBUG_FLAG)); \
- (mblk)->b_flag |= MULTIRT_DEBUG_FLAG; \
- } while (0)
-
-#define MULTIRT_DEBUG_UNTAG(mblk) \
- do { \
- ASSERT(mblk != NULL); \
- MULTIRT_TRACE(("%s[%d]: untagging mblk %p, tag was %d\n", \
- __FILE__, __LINE__, \
- (void *)(mblk), (mblk)->b_flag & MULTIRT_DEBUG_FLAG)); \
- (mblk)->b_flag &= ~MULTIRT_DEBUG_FLAG; \
- } while (0)
-
-#define MULTIRT_DEBUG_TAGGED(mblk) \
- (((mblk)->b_flag & MULTIRT_DEBUG_FLAG) ? B_TRUE : B_FALSE)
-#else
-#define MULTIRT_DEBUG_TAG(mblk) ASSERT(mblk != NULL)
-#define MULTIRT_DEBUG_UNTAG(mblk) ASSERT(mblk != NULL)
-#define MULTIRT_DEBUG_TAGGED(mblk) B_FALSE
-#endif
-
-/*
- * Per-ILL Multidata Transmit capabilities.
- */
-struct ill_mdt_capab_s {
- uint_t ill_mdt_version; /* interface version */
- uint_t ill_mdt_on; /* on/off switch for MDT on this ILL */
- uint_t ill_mdt_hdr_head; /* leading header fragment extra space */
- uint_t ill_mdt_hdr_tail; /* trailing header fragment extra space */
- uint_t ill_mdt_max_pld; /* maximum payload buffers per Multidata */
- uint_t ill_mdt_span_limit; /* maximum payload span per packet */
-};
-
-struct ill_hcksum_capab_s {
- uint_t ill_hcksum_version; /* interface version */
- uint_t ill_hcksum_txflags; /* capabilities on transmit */
-};
-
-struct ill_zerocopy_capab_s {
- uint_t ill_zerocopy_version; /* interface version */
- uint_t ill_zerocopy_flags; /* capabilities */
-};
-
-struct ill_lso_capab_s {
- uint_t ill_lso_on; /* on/off switch for LSO on this ILL */
- uint_t ill_lso_flags; /* capabilities */
- uint_t ill_lso_max; /* maximum size of payload */
-};
-
/*
* rr_ring_state cycles in the order shown below from RR_FREE through
* RR_FREE_IN_PROG and back to RR_FREE.
@@ -3669,18 +3565,61 @@
extern void ip_squeue_quiesce_ring(ill_t *, ill_rx_ring_t *);
extern void ip_squeue_restart_ring(ill_t *, ill_rx_ring_t *);
extern void ip_squeue_clean_all(ill_t *);
+extern boolean_t ip_source_routed(ipha_t *, ip_stack_t *);
extern void tcp_wput(queue_t *, mblk_t *);
-extern int ip_fill_mtuinfo(struct in6_addr *, in_port_t,
- struct ip6_mtuinfo *, netstack_t *);
-extern ipif_t *conn_get_held_ipif(conn_t *, ipif_t **, int *);
+extern int ip_fill_mtuinfo(conn_t *, ip_xmit_attr_t *,
+ struct ip6_mtuinfo *);
extern hook_t *ipobs_register_hook(netstack_t *, pfv_t);
extern void ipobs_unregister_hook(netstack_t *, hook_t *);
extern void ipobs_hook(mblk_t *, int, zoneid_t, zoneid_t, const ill_t *,
ip_stack_t *);
typedef void (*ipsq_func_t)(ipsq_t *, queue_t *, mblk_t *, void *);
+extern void dce_g_init(void);
+extern void dce_g_destroy(void);
+extern void dce_stack_init(ip_stack_t *);
+extern void dce_stack_destroy(ip_stack_t *);
+extern void dce_cleanup(uint_t, ip_stack_t *);
+extern dce_t *dce_get_default(ip_stack_t *);
+extern dce_t *dce_lookup_pkt(mblk_t *, ip_xmit_attr_t *, uint_t *);
+extern dce_t *dce_lookup_v4(ipaddr_t, ip_stack_t *, uint_t *);
+extern dce_t *dce_lookup_v6(const in6_addr_t *, uint_t, ip_stack_t *,
+ uint_t *);
+extern dce_t *dce_lookup_and_add_v4(ipaddr_t, ip_stack_t *);
+extern dce_t *dce_lookup_and_add_v6(const in6_addr_t *, uint_t,
+ ip_stack_t *);
+extern int dce_update_uinfo_v4(ipaddr_t, iulp_t *, ip_stack_t *);
+extern int dce_update_uinfo_v6(const in6_addr_t *, uint_t, iulp_t *,
+ ip_stack_t *);
+extern int dce_update_uinfo(const in6_addr_t *, uint_t, iulp_t *,
+ ip_stack_t *);
+extern void dce_increment_generation(dce_t *);
+extern void dce_increment_all_generations(boolean_t, ip_stack_t *);
+extern void dce_refrele(dce_t *);
+extern void dce_refhold(dce_t *);
+extern void dce_refrele_notr(dce_t *);
+extern void dce_refhold_notr(dce_t *);
+mblk_t *ip_snmp_get_mib2_ip_dce(queue_t *, mblk_t *, ip_stack_t *ipst);
+
+extern ip_laddr_t ip_laddr_verify_v4(ipaddr_t, zoneid_t,
+ ip_stack_t *, boolean_t);
+extern ip_laddr_t ip_laddr_verify_v6(const in6_addr_t *, zoneid_t,
+ ip_stack_t *, boolean_t, uint_t);
+extern int ip_laddr_fanout_insert(conn_t *);
+
+extern boolean_t ip_verify_src(mblk_t *, ip_xmit_attr_t *, uint_t *);
+extern int ip_verify_ire(mblk_t *, ip_xmit_attr_t *);
+
+extern mblk_t *ip_xmit_attr_to_mblk(ip_xmit_attr_t *);
+extern boolean_t ip_xmit_attr_from_mblk(mblk_t *, ip_xmit_attr_t *);
+extern mblk_t *ip_xmit_attr_free_mblk(mblk_t *);
+extern mblk_t *ip_recv_attr_to_mblk(ip_recv_attr_t *);
+extern boolean_t ip_recv_attr_from_mblk(mblk_t *, ip_recv_attr_t *);
+extern mblk_t *ip_recv_attr_free_mblk(mblk_t *);
+extern boolean_t ip_recv_attr_is_mblk(mblk_t *);
+
/*
* Squeue tags. Tags only need to be unique when the callback function is the
* same to distinguish between different calls, but we use unique tags for
@@ -3729,16 +3668,8 @@
#define SQTAG_CONNECT_FINISH 41
#define SQTAG_SYNCHRONOUS_OP 42
#define SQTAG_TCP_SHUTDOWN_OUTPUT 43
-#define SQTAG_XMIT_EARLY_RESET 44
+#define SQTAG_TCP_IXA_CLEANUP 44
-#define NOT_OVER_IP(ip_wq) \
- (ip_wq->q_next != NULL || \
- (ip_wq->q_qinfo->qi_minfo->mi_idname) == NULL || \
- strcmp(ip_wq->q_qinfo->qi_minfo->mi_idname, \
- IP_MOD_NAME) != 0 || \
- ip_wq->q_qinfo->qi_minfo->mi_idnum != IP_MOD_ID)
-
-#define PROTO_FLOW_CNTRLD(connp) (connp->conn_flow_cntrld)
#endif /* _KERNEL */
#ifdef __cplusplus
diff --git a/usr/src/uts/common/inet/ip/conn_opt.c b/usr/src/uts/common/inet/ip/conn_opt.c
new file mode 100644
index 0000000..a46d7c4
--- /dev/null
+++ b/usr/src/uts/common/inet/ip/conn_opt.c
@@ -0,0 +1,2933 @@
+/*
+ * CDDL HEADER START
+ *
+ * The contents of this file are subject to the terms of the
+ * Common Development and Distribution License (the "License").
+ * You may not use this file except in compliance with the License.
+ *
+ * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
+ * or http://www.opensolaris.org/os/licensing.
+ * See the License for the specific language governing permissions
+ * and limitations under the License.
+ *
+ * When distributing Covered Code, include this CDDL HEADER in each
+ * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
+ * If applicable, add the following below this CDDL HEADER, with the
+ * fields enclosed by brackets "[]" replaced with your own identifying
+ * information: Portions Copyright [yyyy] [name of copyright owner]
+ *
+ * CDDL HEADER END
+ */
+
+/*
+ * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
+ * Use is subject to license terms.
+ */
+/* Copyright (c) 1990 Mentat Inc. */
+
+#include <sys/types.h>
+#include <sys/stream.h>
+#include <sys/strsun.h>
+#define _SUN_TPI_VERSION 2
+#include <sys/tihdr.h>
+#include <sys/xti_inet.h>
+#include <sys/ucred.h>
+#include <sys/zone.h>
+#include <sys/ddi.h>
+#include <sys/sunddi.h>
+#include <sys/cmn_err.h>
+#include <sys/debug.h>
+#include <sys/atomic.h>
+#include <sys/policy.h>
+
+#include <sys/systm.h>
+#include <sys/param.h>
+#include <sys/kmem.h>
+#include <sys/sdt.h>
+#include <sys/socket.h>
+#include <sys/ethernet.h>
+#include <sys/mac.h>
+#include <net/if.h>
+#include <net/if_types.h>
+#include <net/if_arp.h>
+#include <net/route.h>
+#include <sys/sockio.h>
+#include <netinet/in.h>
+#include <net/if_dl.h>
+
+#include <inet/common.h>
+#include <inet/mi.h>
+#include <inet/mib2.h>
+#include <inet/nd.h>
+#include <inet/arp.h>
+#include <inet/snmpcom.h>
+#include <inet/kstatcom.h>
+
+#include <netinet/igmp_var.h>
+#include <netinet/ip6.h>
+#include <netinet/icmp6.h>
+#include <netinet/sctp.h>
+
+#include <inet/ip.h>
+#include <inet/ip_impl.h>
+#include <inet/ip6.h>
+#include <inet/ip6_asp.h>
+#include <inet/tcp.h>
+#include <inet/ip_multi.h>
+#include <inet/ip_if.h>
+#include <inet/ip_ire.h>
+#include <inet/ip_ftable.h>
+#include <inet/ip_rts.h>
+#include <inet/optcom.h>
+#include <inet/ip_ndp.h>
+#include <inet/ip_listutils.h>
+#include <netinet/igmp.h>
+#include <netinet/ip_mroute.h>
+#include <netinet/udp.h>
+#include <inet/ipp_common.h>
+
+#include <net/pfkeyv2.h>
+#include <inet/sadb.h>
+#include <inet/ipsec_impl.h>
+#include <inet/ipdrop.h>
+#include <inet/ip_netinfo.h>
+
+#include <inet/ipclassifier.h>
+#include <inet/sctp_ip.h>
+#include <inet/sctp/sctp_impl.h>
+#include <inet/udp_impl.h>
+#include <sys/sunddi.h>
+
+#include <sys/tsol/label.h>
+#include <sys/tsol/tnet.h>
+
+static sin_t sin_null; /* Zero address for quick clears */
+static sin6_t sin6_null; /* Zero address for quick clears */
+
+/*
+ * Return the total size needed for the different ancillary data items
+ */
+uint_t
+conn_recvancillary_size(conn_t *connp, crb_t recv_ancillary,
+ ip_recv_attr_t *ira, mblk_t *mp, ip_pkt_t *ipp)
+{
+ uint_t ancil_size;
+ ip_stack_t *ipst = connp->conn_netstack->netstack_ip;
+
+ /*
+ * If IP_RECVDSTADDR is set we include the destination IP
+ * address as an option. With IP_RECVOPTS we include all
+ * the IP options.
+ */
+ ancil_size = 0;
+ if (recv_ancillary.crb_recvdstaddr &&
+ (ira->ira_flags & IRAF_IS_IPV4)) {
+ ancil_size += sizeof (struct T_opthdr) +
+ sizeof (struct in_addr);
+ IP_STAT(ipst, conn_in_recvdstaddr);
+ }
+
+ /*
+ * ip_recvpktinfo is used for both AF_INET and AF_INET6, but
+ * the ancillary data items differ
+ */
+ if (recv_ancillary.crb_ip_recvpktinfo &&
+ connp->conn_family == AF_INET) {
+ ancil_size += sizeof (struct T_opthdr) +
+ sizeof (struct in_pktinfo);
+ IP_STAT(ipst, conn_in_recvpktinfo);
+ }
+
+ if ((recv_ancillary.crb_recvopts) &&
+ (ipp->ipp_fields & IPPF_IPV4_OPTIONS)) {
+ ancil_size += sizeof (struct T_opthdr) +
+ ipp->ipp_ipv4_options_len;
+ IP_STAT(ipst, conn_in_recvopts);
+ }
+
+ if (recv_ancillary.crb_recvslla) {
+ ip_stack_t *ipst = connp->conn_netstack->netstack_ip;
+ ill_t *ill;
+
+ /* Make sure ira_l2src is set up if not already */
+ if (!(ira->ira_flags & IRAF_L2SRC_SET)) {
+ ill = ill_lookup_on_ifindex(ira->ira_rifindex, B_FALSE,
+ ipst);
+ if (ill != NULL) {
+ ip_setl2src(mp, ira, ill);
+ ill_refrele(ill);
+ }
+ }
+ ancil_size += sizeof (struct T_opthdr) +
+ sizeof (struct sockaddr_dl);
+ IP_STAT(ipst, conn_in_recvslla);
+ }
+
+ if (recv_ancillary.crb_recvif) {
+ ancil_size += sizeof (struct T_opthdr) + sizeof (uint_t);
+ IP_STAT(ipst, conn_in_recvif);
+ }
+
+ /*
+ * ip_recvpktinfo is used for both AF_INET and AF_INET6, but
+ * the ancillary data items differ
+ */
+ if (recv_ancillary.crb_ip_recvpktinfo &&
+ connp->conn_family == AF_INET6) {
+ ancil_size += sizeof (struct T_opthdr) +
+ sizeof (struct in6_pktinfo);
+ IP_STAT(ipst, conn_in_recvpktinfo);
+ }
+
+ if (recv_ancillary.crb_ipv6_recvhoplimit) {
+ ancil_size += sizeof (struct T_opthdr) + sizeof (int);
+ IP_STAT(ipst, conn_in_recvhoplimit);
+ }
+
+ if (recv_ancillary.crb_ipv6_recvtclass) {
+ ancil_size += sizeof (struct T_opthdr) + sizeof (int);
+ IP_STAT(ipst, conn_in_recvtclass);
+ }
+
+ if (recv_ancillary.crb_ipv6_recvhopopts &&
+ (ipp->ipp_fields & IPPF_HOPOPTS)) {
+ ancil_size += sizeof (struct T_opthdr) + ipp->ipp_hopoptslen;
+ IP_STAT(ipst, conn_in_recvhopopts);
+ }
+ /*
+ * To honor RFC3542 when an application asks for both IPV6_RECVDSTOPTS
+ * and IPV6_RECVRTHDR, we pass up the item rthdrdstopts (the destination
+ * options that appear before a routing header).
+ * We also pass them up if IPV6_RECVRTHDRDSTOPTS is set.
+ */
+ if (ipp->ipp_fields & IPPF_RTHDRDSTOPTS) {
+ if (recv_ancillary.crb_ipv6_recvrthdrdstopts ||
+ (recv_ancillary.crb_ipv6_recvdstopts &&
+ recv_ancillary.crb_ipv6_recvrthdr)) {
+ ancil_size += sizeof (struct T_opthdr) +
+ ipp->ipp_rthdrdstoptslen;
+ IP_STAT(ipst, conn_in_recvrthdrdstopts);
+ }
+ }
+ if ((recv_ancillary.crb_ipv6_recvrthdr) &&
+ (ipp->ipp_fields & IPPF_RTHDR)) {
+ ancil_size += sizeof (struct T_opthdr) + ipp->ipp_rthdrlen;
+ IP_STAT(ipst, conn_in_recvrthdr);
+ }
+ if ((recv_ancillary.crb_ipv6_recvdstopts ||
+ recv_ancillary.crb_old_ipv6_recvdstopts) &&
+ (ipp->ipp_fields & IPPF_DSTOPTS)) {
+ ancil_size += sizeof (struct T_opthdr) + ipp->ipp_dstoptslen;
+ IP_STAT(ipst, conn_in_recvdstopts);
+ }
+ if (recv_ancillary.crb_recvucred && ira->ira_cred != NULL) {
+ ancil_size += sizeof (struct T_opthdr) + ucredsize;
+ IP_STAT(ipst, conn_in_recvucred);
+ }
+
+ /*
+ * If SO_TIMESTAMP is set allocate the appropriate sized
+ * buffer. Since gethrestime() expects a pointer aligned
+ * argument, we allocate space necessary for extra
+ * alignment (even though it might not be used).
+ */
+ if (recv_ancillary.crb_timestamp) {
+ ancil_size += sizeof (struct T_opthdr) +
+ sizeof (timestruc_t) + _POINTER_ALIGNMENT;
+ IP_STAT(ipst, conn_in_timestamp);
+ }
+
+ /*
+ * If IP_RECVTTL is set allocate the appropriate sized buffer
+ */
+ if (recv_ancillary.crb_recvttl &&
+ (ira->ira_flags & IRAF_IS_IPV4)) {
+ ancil_size += sizeof (struct T_opthdr) + sizeof (uint8_t);
+ IP_STAT(ipst, conn_in_recvttl);
+ }
+
+ return (ancil_size);
+}
+
+/*
+ * Lay down the ancillary data items at "ancil_buf".
+ * Assumes caller has used conn_recvancillary_size to allocate a sufficiently
+ * large buffer - ancil_size.
+ */
+void
+conn_recvancillary_add(conn_t *connp, crb_t recv_ancillary,
+ ip_recv_attr_t *ira, ip_pkt_t *ipp, uchar_t *ancil_buf, uint_t ancil_size)
+{
+ /*
+ * Copy in destination address before options to avoid
+ * any padding issues.
+ */
+ if (recv_ancillary.crb_recvdstaddr &&
+ (ira->ira_flags & IRAF_IS_IPV4)) {
+ struct T_opthdr *toh;
+ ipaddr_t *dstptr;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IP;
+ toh->name = IP_RECVDSTADDR;
+ toh->len = sizeof (struct T_opthdr) + sizeof (ipaddr_t);
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ dstptr = (ipaddr_t *)ancil_buf;
+ *dstptr = ipp->ipp_addr_v4;
+ ancil_buf += sizeof (ipaddr_t);
+ ancil_size -= toh->len;
+ }
+
+ /*
+ * ip_recvpktinfo is used for both AF_INET and AF_INET6, but
+ * the items passed up are different.
+ */
+ if (recv_ancillary.crb_ip_recvpktinfo &&
+ connp->conn_family == AF_INET) {
+ ip_stack_t *ipst = connp->conn_netstack->netstack_ip;
+ struct T_opthdr *toh;
+ struct in_pktinfo *pktinfop;
+ ill_t *ill;
+ ipif_t *ipif;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IP;
+ toh->name = IP_PKTINFO;
+ toh->len = sizeof (struct T_opthdr) + sizeof (*pktinfop);
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ pktinfop = (struct in_pktinfo *)ancil_buf;
+
+ pktinfop->ipi_ifindex = ira->ira_ruifindex;
+ pktinfop->ipi_spec_dst.s_addr = INADDR_ANY;
+
+ /* Find a good address to report */
+ ill = ill_lookup_on_ifindex(ira->ira_ruifindex, B_FALSE, ipst);
+ if (ill != NULL) {
+ ipif = ipif_good_addr(ill, IPCL_ZONEID(connp));
+ if (ipif != NULL) {
+ pktinfop->ipi_spec_dst.s_addr =
+ ipif->ipif_lcl_addr;
+ ipif_refrele(ipif);
+ }
+ ill_refrele(ill);
+ }
+ pktinfop->ipi_addr.s_addr = ipp->ipp_addr_v4;
+ ancil_buf += sizeof (struct in_pktinfo);
+ ancil_size -= toh->len;
+ }
+
+ if ((recv_ancillary.crb_recvopts) &&
+ (ipp->ipp_fields & IPPF_IPV4_OPTIONS)) {
+ struct T_opthdr *toh;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IP;
+ toh->name = IP_RECVOPTS;
+ toh->len = sizeof (struct T_opthdr) + ipp->ipp_ipv4_options_len;
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ bcopy(ipp->ipp_ipv4_options, ancil_buf,
+ ipp->ipp_ipv4_options_len);
+ ancil_buf += ipp->ipp_ipv4_options_len;
+ ancil_size -= toh->len;
+ }
+
+ if (recv_ancillary.crb_recvslla) {
+ ip_stack_t *ipst = connp->conn_netstack->netstack_ip;
+ struct T_opthdr *toh;
+ struct sockaddr_dl *dstptr;
+ ill_t *ill;
+ int alen = 0;
+
+ ill = ill_lookup_on_ifindex(ira->ira_rifindex, B_FALSE, ipst);
+ if (ill != NULL)
+ alen = ill->ill_phys_addr_length;
+
+ /*
+ * For loopback multicast and broadcast the packet arrives
+ * with ira_ruifindex being the physical interface, but
+ * ira_l2src is all zero since ip_postfrag_loopback doesn't
+ * know our l2src. We don't report the address in that case.
+ */
+ if (ira->ira_flags & IRAF_LOOPBACK)
+ alen = 0;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IP;
+ toh->name = IP_RECVSLLA;
+ toh->len = sizeof (struct T_opthdr) +
+ sizeof (struct sockaddr_dl);
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ dstptr = (struct sockaddr_dl *)ancil_buf;
+ dstptr->sdl_family = AF_LINK;
+ dstptr->sdl_index = ira->ira_ruifindex;
+ if (ill != NULL)
+ dstptr->sdl_type = ill->ill_type;
+ else
+ dstptr->sdl_type = 0;
+ dstptr->sdl_nlen = 0;
+ dstptr->sdl_alen = alen;
+ dstptr->sdl_slen = 0;
+ bcopy(ira->ira_l2src, dstptr->sdl_data, alen);
+ ancil_buf += sizeof (struct sockaddr_dl);
+ ancil_size -= toh->len;
+ if (ill != NULL)
+ ill_refrele(ill);
+ }
+
+ if (recv_ancillary.crb_recvif) {
+ struct T_opthdr *toh;
+ uint_t *dstptr;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IP;
+ toh->name = IP_RECVIF;
+ toh->len = sizeof (struct T_opthdr) + sizeof (uint_t);
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ dstptr = (uint_t *)ancil_buf;
+ *dstptr = ira->ira_ruifindex;
+ ancil_buf += sizeof (uint_t);
+ ancil_size -= toh->len;
+ }
+
+ /*
+ * ip_recvpktinfo is used for both AF_INET and AF_INET6, but
+ * the items passed up are different.
+ */
+ if (recv_ancillary.crb_ip_recvpktinfo &&
+ connp->conn_family == AF_INET6) {
+ struct T_opthdr *toh;
+ struct in6_pktinfo *pkti;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IPV6;
+ toh->name = IPV6_PKTINFO;
+ toh->len = sizeof (struct T_opthdr) + sizeof (*pkti);
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ pkti = (struct in6_pktinfo *)ancil_buf;
+ if (ira->ira_flags & IRAF_IS_IPV4) {
+ IN6_IPADDR_TO_V4MAPPED(ipp->ipp_addr_v4,
+ &pkti->ipi6_addr);
+ } else {
+ pkti->ipi6_addr = ipp->ipp_addr;
+ }
+ pkti->ipi6_ifindex = ira->ira_ruifindex;
+
+ ancil_buf += sizeof (*pkti);
+ ancil_size -= toh->len;
+ }
+ if (recv_ancillary.crb_ipv6_recvhoplimit) {
+ struct T_opthdr *toh;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IPV6;
+ toh->name = IPV6_HOPLIMIT;
+ toh->len = sizeof (struct T_opthdr) + sizeof (uint_t);
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ *(uint_t *)ancil_buf = ipp->ipp_hoplimit;
+ ancil_buf += sizeof (uint_t);
+ ancil_size -= toh->len;
+ }
+ if (recv_ancillary.crb_ipv6_recvtclass) {
+ struct T_opthdr *toh;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IPV6;
+ toh->name = IPV6_TCLASS;
+ toh->len = sizeof (struct T_opthdr) + sizeof (uint_t);
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+
+ if (ira->ira_flags & IRAF_IS_IPV4)
+ *(uint_t *)ancil_buf = ipp->ipp_type_of_service;
+ else
+ *(uint_t *)ancil_buf = ipp->ipp_tclass;
+ ancil_buf += sizeof (uint_t);
+ ancil_size -= toh->len;
+ }
+ if (recv_ancillary.crb_ipv6_recvhopopts &&
+ (ipp->ipp_fields & IPPF_HOPOPTS)) {
+ struct T_opthdr *toh;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IPV6;
+ toh->name = IPV6_HOPOPTS;
+ toh->len = sizeof (struct T_opthdr) + ipp->ipp_hopoptslen;
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ bcopy(ipp->ipp_hopopts, ancil_buf, ipp->ipp_hopoptslen);
+ ancil_buf += ipp->ipp_hopoptslen;
+ ancil_size -= toh->len;
+ }
+ /*
+ * To honor RFC3542 when an application asks for both IPV6_RECVDSTOPTS
+ * and IPV6_RECVRTHDR, we pass up the item rthdrdstopts (the destination
+ * options that appear before a routing header).
+ * We also pass them up if IPV6_RECVRTHDRDSTOPTS is set.
+ */
+ if (ipp->ipp_fields & IPPF_RTHDRDSTOPTS) {
+ if (recv_ancillary.crb_ipv6_recvrthdrdstopts ||
+ (recv_ancillary.crb_ipv6_recvdstopts &&
+ recv_ancillary.crb_ipv6_recvrthdr)) {
+ struct T_opthdr *toh;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IPV6;
+ toh->name = IPV6_DSTOPTS;
+ toh->len = sizeof (struct T_opthdr) +
+ ipp->ipp_rthdrdstoptslen;
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ bcopy(ipp->ipp_rthdrdstopts, ancil_buf,
+ ipp->ipp_rthdrdstoptslen);
+ ancil_buf += ipp->ipp_rthdrdstoptslen;
+ ancil_size -= toh->len;
+ }
+ }
+ if (recv_ancillary.crb_ipv6_recvrthdr &&
+ (ipp->ipp_fields & IPPF_RTHDR)) {
+ struct T_opthdr *toh;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IPV6;
+ toh->name = IPV6_RTHDR;
+ toh->len = sizeof (struct T_opthdr) + ipp->ipp_rthdrlen;
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ bcopy(ipp->ipp_rthdr, ancil_buf, ipp->ipp_rthdrlen);
+ ancil_buf += ipp->ipp_rthdrlen;
+ ancil_size -= toh->len;
+ }
+ if ((recv_ancillary.crb_ipv6_recvdstopts ||
+ recv_ancillary.crb_old_ipv6_recvdstopts) &&
+ (ipp->ipp_fields & IPPF_DSTOPTS)) {
+ struct T_opthdr *toh;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IPV6;
+ toh->name = IPV6_DSTOPTS;
+ toh->len = sizeof (struct T_opthdr) + ipp->ipp_dstoptslen;
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ bcopy(ipp->ipp_dstopts, ancil_buf, ipp->ipp_dstoptslen);
+ ancil_buf += ipp->ipp_dstoptslen;
+ ancil_size -= toh->len;
+ }
+
+ if (recv_ancillary.crb_recvucred && ira->ira_cred != NULL) {
+ struct T_opthdr *toh;
+ cred_t *rcr = connp->conn_cred;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = SOL_SOCKET;
+ toh->name = SCM_UCRED;
+ toh->len = sizeof (struct T_opthdr) + ucredsize;
+ toh->status = 0;
+ (void) cred2ucred(ira->ira_cred, ira->ira_cpid, &toh[1], rcr);
+ ancil_buf += toh->len;
+ ancil_size -= toh->len;
+ }
+ if (recv_ancillary.crb_timestamp) {
+ struct T_opthdr *toh;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = SOL_SOCKET;
+ toh->name = SCM_TIMESTAMP;
+ toh->len = sizeof (struct T_opthdr) +
+ sizeof (timestruc_t) + _POINTER_ALIGNMENT;
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ /* Align for gethrestime() */
+ ancil_buf = (uchar_t *)P2ROUNDUP((intptr_t)ancil_buf,
+ sizeof (intptr_t));
+ gethrestime((timestruc_t *)ancil_buf);
+ ancil_buf = (uchar_t *)toh + toh->len;
+ ancil_size -= toh->len;
+ }
+
+ /*
+ * CAUTION:
+ * Due to alignment issues, processing of the IP_RECVTTL
+ * option must always be last. Adding any option
+ * processing after this will cause an alignment panic.
+ */
+ if (recv_ancillary.crb_recvttl &&
+ (ira->ira_flags & IRAF_IS_IPV4)) {
+ struct T_opthdr *toh;
+ uint8_t *dstptr;
+
+ toh = (struct T_opthdr *)ancil_buf;
+ toh->level = IPPROTO_IP;
+ toh->name = IP_RECVTTL;
+ toh->len = sizeof (struct T_opthdr) + sizeof (uint8_t);
+ toh->status = 0;
+ ancil_buf += sizeof (struct T_opthdr);
+ dstptr = (uint8_t *)ancil_buf;
+ *dstptr = ipp->ipp_hoplimit;
+ ancil_buf += sizeof (uint8_t);
+ ancil_size -= toh->len;
+ }
+
+ /* Consumed all of allocated space */
+ ASSERT(ancil_size == 0);
+
+}
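conn_recvancillary_size() and conn_recvancillary_add() form a two-pass pattern: the first pass totals the bytes each requested item needs, and the second pass lays the items down under the same conditions and asserts that nothing is left over. The sketch below illustrates that contract in userland C. The `opt_hdr_t` layout, the `want_t` flags and the `anc_*` helpers are hypothetical stand-ins, not the real `struct T_opthdr`, `crb_t` or kernel routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct T_opthdr; not the real TPI layout. */
typedef struct opt_hdr {
	uint32_t len;		/* header plus payload, in bytes */
	uint32_t level;
	uint32_t name;
	uint32_t status;
} opt_hdr_t;

/* Hypothetical request flags, playing the role of the crb_t bits. */
typedef struct want {
	unsigned want_ifindex : 1;
	unsigned want_hoplimit : 1;
} want_t;

/* Pass 1: compute how many bytes the requested items need. */
static size_t
anc_size(want_t w)
{
	size_t size = 0;

	if (w.want_ifindex)
		size += sizeof (opt_hdr_t) + sizeof (uint32_t);
	if (w.want_hoplimit)
		size += sizeof (opt_hdr_t) + sizeof (uint32_t);
	return (size);
}

/* Append one header plus 32-bit payload; return the bytes written. */
static size_t
anc_put(unsigned char *buf, uint32_t level, uint32_t name, uint32_t val)
{
	opt_hdr_t hdr;

	hdr.len = (uint32_t)(sizeof (hdr) + sizeof (val));
	hdr.level = level;
	hdr.name = name;
	hdr.status = 0;
	memcpy(buf, &hdr, sizeof (hdr));
	memcpy(buf + sizeof (hdr), &val, sizeof (val));
	return (hdr.len);
}

/*
 * Pass 2: lay the items down using the same tests as anc_size().
 * Returns the bytes left over, which is 0 when the two passes agree.
 */
static size_t
anc_fill(want_t w, unsigned char *buf, size_t size, uint32_t ifindex,
    uint32_t hoplimit)
{
	size_t used;

	if (w.want_ifindex) {
		used = anc_put(buf, 1, 1, ifindex);
		buf += used;
		size -= used;
	}
	if (w.want_hoplimit) {
		used = anc_put(buf, 1, 2, hoplimit);
		buf += used;
		size -= used;
	}
	return (size);
}
```

The design point is that every condition in the sizing pass must be mirrored exactly in the fill pass; a mismatch would either overrun the buffer or trip the final `ASSERT(ancil_size == 0)`.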
+
+/*
+ * This routine retrieves the current status of socket options.
+ * It returns the size of the option retrieved, or -1.
+ */
+int
+conn_opt_get(conn_opt_arg_t *coa, t_scalar_t level, t_scalar_t name,
+ uchar_t *ptr)
+{
+ int *i1 = (int *)ptr;
+ conn_t *connp = coa->coa_connp;
+ ip_xmit_attr_t *ixa = coa->coa_ixa;
+ ip_pkt_t *ipp = coa->coa_ipp;
+ ip_stack_t *ipst = ixa->ixa_ipst;
+ uint_t len;
+
+ ASSERT(MUTEX_HELD(&coa->coa_connp->conn_lock));
+
+ switch (level) {
+ case SOL_SOCKET:
+ switch (name) {
+ case SO_DEBUG:
+ *i1 = connp->conn_debug ? SO_DEBUG : 0;
+ break; /* goto sizeof (int) option return */
+ case SO_KEEPALIVE:
+ *i1 = connp->conn_keepalive ? SO_KEEPALIVE : 0;
+ break;
+ case SO_LINGER: {
+ struct linger *lgr = (struct linger *)ptr;
+
+ lgr->l_onoff = connp->conn_linger ? SO_LINGER : 0;
+ lgr->l_linger = connp->conn_lingertime;
+ }
+ return (sizeof (struct linger));
+
+ case SO_OOBINLINE:
+ *i1 = connp->conn_oobinline ? SO_OOBINLINE : 0;
+ break;
+ case SO_REUSEADDR:
+ *i1 = connp->conn_reuseaddr ? SO_REUSEADDR : 0;
+ break; /* goto sizeof (int) option return */
+ case SO_TYPE:
+ *i1 = connp->conn_so_type;
+ break; /* goto sizeof (int) option return */
+ case SO_DONTROUTE:
+ *i1 = (ixa->ixa_flags & IXAF_DONTROUTE) ?
+ SO_DONTROUTE : 0;
+ break; /* goto sizeof (int) option return */
+ case SO_USELOOPBACK:
+ *i1 = connp->conn_useloopback ? SO_USELOOPBACK : 0;
+ break; /* goto sizeof (int) option return */
+ case SO_BROADCAST:
+ *i1 = connp->conn_broadcast ? SO_BROADCAST : 0;
+ break; /* goto sizeof (int) option return */
+
+ case SO_SNDBUF:
+ *i1 = connp->conn_sndbuf;
+ break; /* goto sizeof (int) option return */
+ case SO_RCVBUF:
+ *i1 = connp->conn_rcvbuf;
+ break; /* goto sizeof (int) option return */
+ case SO_RCVTIMEO:
+ case SO_SNDTIMEO:
+ /*
+ * Pass these two options through for use by third-party
+ * protocols. Here we just return zero.
+ */
+ *i1 = 0;
+ break;
+ case SO_DGRAM_ERRIND:
+ *i1 = connp->conn_dgram_errind ? SO_DGRAM_ERRIND : 0;
+ break; /* goto sizeof (int) option return */
+ case SO_RECVUCRED:
+ *i1 = connp->conn_recv_ancillary.crb_recvucred;
+ break; /* goto sizeof (int) option return */
+ case SO_TIMESTAMP:
+ *i1 = connp->conn_recv_ancillary.crb_timestamp;
+ break; /* goto sizeof (int) option return */
+#ifdef SO_VRRP
+ case SO_VRRP:
+ *i1 = connp->conn_isvrrp;
+ break; /* goto sizeof (int) option return */
+#endif
+ case SO_ANON_MLP:
+ *i1 = connp->conn_anon_mlp;
+ break; /* goto sizeof (int) option return */
+ case SO_MAC_EXEMPT:
+ *i1 = (connp->conn_mac_mode == CONN_MAC_AWARE);
+ break; /* goto sizeof (int) option return */
+ case SO_MAC_IMPLICIT:
+ *i1 = (connp->conn_mac_mode == CONN_MAC_IMPLICIT);
+ break; /* goto sizeof (int) option return */
+ case SO_ALLZONES:
+ *i1 = connp->conn_allzones;
+ break; /* goto sizeof (int) option return */
+ case SO_EXCLBIND:
+ *i1 = connp->conn_exclbind ? SO_EXCLBIND : 0;
+ break;
+ case SO_PROTOTYPE:
+ *i1 = connp->conn_proto;
+ break;
+
+ case SO_DOMAIN:
+ *i1 = connp->conn_family;
+ break;
+ default:
+ return (-1);
+ }
+ break;
+ case IPPROTO_IP:
+ if (connp->conn_family != AF_INET)
+ return (-1);
+ switch (name) {
+ case IP_OPTIONS:
+ case T_IP_OPTIONS:
+ if (!(ipp->ipp_fields & IPPF_IPV4_OPTIONS))
+ return (0);
+
+ len = ipp->ipp_ipv4_options_len;
+ if (len > 0) {
+ bcopy(ipp->ipp_ipv4_options, ptr, len);
+ }
+ return (len);
+
+ case IP_PKTINFO: {
+ /*
+ * This also handles IP_RECVPKTINFO.
+ * IP_PKTINFO and IP_RECVPKTINFO have same value.
+ * Differentiation is based on the size of the
+ * argument passed in.
+ */
+ struct in_pktinfo *pktinfo;
+
+#ifdef notdef
+ /* optcom doesn't provide a length with "get" */
+ if (inlen == sizeof (int)) {
+ /* This is IP_RECVPKTINFO option. */
+ *i1 = connp->conn_recv_ancillary.
+ crb_ip_recvpktinfo;
+ return (sizeof (int));
+ }
+#endif
+ /* XXX assumes that caller has room for max size! */
+
+ pktinfo = (struct in_pktinfo *)ptr;
+ pktinfo->ipi_ifindex = ixa->ixa_ifindex;
+ if (ipp->ipp_fields & IPPF_ADDR)
+ pktinfo->ipi_spec_dst.s_addr = ipp->ipp_addr_v4;
+ else
+ pktinfo->ipi_spec_dst.s_addr = INADDR_ANY;
+ return (sizeof (struct in_pktinfo));
+ }
+ case IP_DONTFRAG:
+ *i1 = (ixa->ixa_flags & IXAF_DONTFRAG) != 0;
+ return (sizeof (int));
+ case IP_TOS:
+ case T_IP_TOS:
+ *i1 = (int)ipp->ipp_type_of_service;
+ break; /* goto sizeof (int) option return */
+ case IP_TTL:
+ *i1 = (int)ipp->ipp_unicast_hops;
+ break; /* goto sizeof (int) option return */
+ case IP_DHCPINIT_IF:
+ return (-1);
+ case IP_NEXTHOP:
+ if (ixa->ixa_flags & IXAF_NEXTHOP_SET) {
+ *(ipaddr_t *)ptr = ixa->ixa_nexthop_v4;
+ return (sizeof (ipaddr_t));
+ } else {
+ return (0);
+ }
+
+ case IP_MULTICAST_IF:
+ /* 0 address if not set */
+ *(ipaddr_t *)ptr = ixa->ixa_multicast_ifaddr;
+ return (sizeof (ipaddr_t));
+ case IP_MULTICAST_TTL:
+ *(uchar_t *)ptr = ixa->ixa_multicast_ttl;
+ return (sizeof (uchar_t));
+ case IP_MULTICAST_LOOP:
+ *ptr = (ixa->ixa_flags & IXAF_MULTICAST_LOOP) ? 1 : 0;
+ return (sizeof (uint8_t));
+ case IP_RECVOPTS:
+ *i1 = connp->conn_recv_ancillary.crb_recvopts;
+ break; /* goto sizeof (int) option return */
+ case IP_RECVDSTADDR:
+ *i1 = connp->conn_recv_ancillary.crb_recvdstaddr;
+ break; /* goto sizeof (int) option return */
+ case IP_RECVIF:
+ *i1 = connp->conn_recv_ancillary.crb_recvif;
+ break; /* goto sizeof (int) option return */
+ case IP_RECVSLLA:
+ *i1 = connp->conn_recv_ancillary.crb_recvslla;
+ break; /* goto sizeof (int) option return */
+ case IP_RECVTTL:
+ *i1 = connp->conn_recv_ancillary.crb_recvttl;
+ break; /* goto sizeof (int) option return */
+ case IP_ADD_MEMBERSHIP:
+ case IP_DROP_MEMBERSHIP:
+ case MCAST_JOIN_GROUP:
+ case MCAST_LEAVE_GROUP:
+ case IP_BLOCK_SOURCE:
+ case IP_UNBLOCK_SOURCE:
+ case IP_ADD_SOURCE_MEMBERSHIP:
+ case IP_DROP_SOURCE_MEMBERSHIP:
+ case MCAST_BLOCK_SOURCE:
+ case MCAST_UNBLOCK_SOURCE:
+ case MCAST_JOIN_SOURCE_GROUP:
+ case MCAST_LEAVE_SOURCE_GROUP:
+ case MRT_INIT:
+ case MRT_DONE:
+ case MRT_ADD_VIF:
+ case MRT_DEL_VIF:
+ case MRT_ADD_MFC:
+ case MRT_DEL_MFC:
+ /* cannot "get" the value for these */
+ return (-1);
+ case MRT_VERSION:
+ case MRT_ASSERT:
+ (void) ip_mrouter_get(name, connp, ptr);
+ return (sizeof (int));
+ case IP_SEC_OPT:
+ return (ipsec_req_from_conn(connp, (ipsec_req_t *)ptr,
+ IPSEC_AF_V4));
+ case IP_BOUND_IF:
+ /* Zero if not set */
+ *i1 = connp->conn_bound_if;
+ break; /* goto sizeof (int) option return */
+ case IP_UNSPEC_SRC:
+ *i1 = connp->conn_unspec_src;
+ break; /* goto sizeof (int) option return */
+ case IP_BROADCAST_TTL:
+ if (ixa->ixa_flags & IXAF_BROADCAST_TTL_SET)
+ *(uchar_t *)ptr = ixa->ixa_broadcast_ttl;
+ else
+ *(uchar_t *)ptr = ipst->ips_ip_broadcast_ttl;
+ return (sizeof (uchar_t));
+ default:
+ return (-1);
+ }
+ break;
+ case IPPROTO_IPV6:
+ if (connp->conn_family != AF_INET6)
+ return (-1);
+ switch (name) {
+ case IPV6_UNICAST_HOPS:
+ *i1 = (int)ipp->ipp_unicast_hops;
+ break; /* goto sizeof (int) option return */
+ case IPV6_MULTICAST_IF:
+ /* 0 index if not set */
+ *i1 = ixa->ixa_multicast_ifindex;
+ break; /* goto sizeof (int) option return */
+ case IPV6_MULTICAST_HOPS:
+ *i1 = ixa->ixa_multicast_ttl;
+ break; /* goto sizeof (int) option return */
+ case IPV6_MULTICAST_LOOP:
+ *i1 = (ixa->ixa_flags & IXAF_MULTICAST_LOOP) ? 1 : 0;
+ break; /* goto sizeof (int) option return */
+ case IPV6_JOIN_GROUP:
+ case IPV6_LEAVE_GROUP:
+ case MCAST_JOIN_GROUP:
+ case MCAST_LEAVE_GROUP:
+ case MCAST_BLOCK_SOURCE:
+ case MCAST_UNBLOCK_SOURCE:
+ case MCAST_JOIN_SOURCE_GROUP:
+ case MCAST_LEAVE_SOURCE_GROUP:
+ /* cannot "get" the value for these */
+ return (-1);
+ case IPV6_BOUND_IF:
+ /* Zero if not set */
+ *i1 = connp->conn_bound_if;
+ break; /* goto sizeof (int) option return */
+ case IPV6_UNSPEC_SRC:
+ *i1 = connp->conn_unspec_src;
+ break; /* goto sizeof (int) option return */
+ case IPV6_RECVPKTINFO:
+ *i1 = connp->conn_recv_ancillary.crb_ip_recvpktinfo;
+ break; /* goto sizeof (int) option return */
+ case IPV6_RECVTCLASS:
+ *i1 = connp->conn_recv_ancillary.crb_ipv6_recvtclass;
+ break; /* goto sizeof (int) option return */
+ case IPV6_RECVPATHMTU:
+ *i1 = connp->conn_ipv6_recvpathmtu;
+ break; /* goto sizeof (int) option return */
+ case IPV6_RECVHOPLIMIT:
+ *i1 = connp->conn_recv_ancillary.crb_ipv6_recvhoplimit;
+ break; /* goto sizeof (int) option return */
+ case IPV6_RECVHOPOPTS:
+ *i1 = connp->conn_recv_ancillary.crb_ipv6_recvhopopts;
+ break; /* goto sizeof (int) option return */
+ case IPV6_RECVDSTOPTS:
+ *i1 = connp->conn_recv_ancillary.crb_ipv6_recvdstopts;
+ break; /* goto sizeof (int) option return */
+ case _OLD_IPV6_RECVDSTOPTS:
+ *i1 =
+ connp->conn_recv_ancillary.crb_old_ipv6_recvdstopts;
+ break; /* goto sizeof (int) option return */
+ case IPV6_RECVRTHDRDSTOPTS:
+ *i1 = connp->conn_recv_ancillary.
+ crb_ipv6_recvrthdrdstopts;
+ break; /* goto sizeof (int) option return */
+ case IPV6_RECVRTHDR:
+ *i1 = connp->conn_recv_ancillary.crb_ipv6_recvrthdr;
+ break; /* goto sizeof (int) option return */
+ case IPV6_PKTINFO: {
+ /* XXX assumes that caller has room for max size! */
+ struct in6_pktinfo *pkti;
+
+ pkti = (struct in6_pktinfo *)ptr;
+ pkti->ipi6_ifindex = ixa->ixa_ifindex;
+ if (ipp->ipp_fields & IPPF_ADDR)
+ pkti->ipi6_addr = ipp->ipp_addr;
+ else
+ pkti->ipi6_addr = ipv6_all_zeros;
+ return (sizeof (struct in6_pktinfo));
+ }
+ case IPV6_TCLASS:
+ *i1 = ipp->ipp_tclass;
+ break; /* goto sizeof (int) option return */
+ case IPV6_NEXTHOP: {
+ sin6_t *sin6 = (sin6_t *)ptr;
+
+ if (!(ixa->ixa_flags & IXAF_NEXTHOP_SET))
+ return (0);
+
+ *sin6 = sin6_null;
+ sin6->sin6_family = AF_INET6;
+ sin6->sin6_addr = ixa->ixa_nexthop_v6;
+
+ return (sizeof (sin6_t));
+ }
+ case IPV6_HOPOPTS:
+ if (!(ipp->ipp_fields & IPPF_HOPOPTS))
+ return (0);
+ bcopy(ipp->ipp_hopopts, ptr,
+ ipp->ipp_hopoptslen);
+ return (ipp->ipp_hopoptslen);
+ case IPV6_RTHDRDSTOPTS:
+ if (!(ipp->ipp_fields & IPPF_RTHDRDSTOPTS))
+ return (0);
+ bcopy(ipp->ipp_rthdrdstopts, ptr,
+ ipp->ipp_rthdrdstoptslen);
+ return (ipp->ipp_rthdrdstoptslen);
+ case IPV6_RTHDR:
+ if (!(ipp->ipp_fields & IPPF_RTHDR))
+ return (0);
+ bcopy(ipp->ipp_rthdr, ptr, ipp->ipp_rthdrlen);
+ return (ipp->ipp_rthdrlen);
+ case IPV6_DSTOPTS:
+ if (!(ipp->ipp_fields & IPPF_DSTOPTS))
+ return (0);
+ bcopy(ipp->ipp_dstopts, ptr, ipp->ipp_dstoptslen);
+ return (ipp->ipp_dstoptslen);
+ case IPV6_PATHMTU:
+ return (ip_fill_mtuinfo(connp, ixa,
+ (struct ip6_mtuinfo *)ptr));
+ case IPV6_SEC_OPT:
+ return (ipsec_req_from_conn(connp, (ipsec_req_t *)ptr,
+ IPSEC_AF_V6));
+ case IPV6_SRC_PREFERENCES:
+ return (ip6_get_src_preferences(ixa, (uint32_t *)ptr));
+ case IPV6_DONTFRAG:
+ *i1 = (ixa->ixa_flags & IXAF_DONTFRAG) != 0;
+ return (sizeof (int));
+ case IPV6_USE_MIN_MTU:
+ if (ixa->ixa_flags & IXAF_USE_MIN_MTU)
+ *i1 = ixa->ixa_use_min_mtu;
+ else
+ *i1 = IPV6_USE_MIN_MTU_MULTICAST;
+ break;
+ case IPV6_V6ONLY:
+ *i1 = connp->conn_ipv6_v6only;
+ return (sizeof (int));
+ default:
+ return (-1);
+ }
+ break;
+ case IPPROTO_UDP:
+ switch (name) {
+ case UDP_ANONPRIVBIND:
+ *i1 = connp->conn_anon_priv_bind;
+ break;
+ case UDP_EXCLBIND:
+ *i1 = connp->conn_exclbind ? UDP_EXCLBIND : 0;
+ break;
+ default:
+ return (-1);
+ }
+ break;
+ case IPPROTO_TCP:
+ switch (name) {
+ case TCP_RECVDSTADDR:
+ *i1 = connp->conn_recv_ancillary.crb_recvdstaddr;
+ break;
+ case TCP_ANONPRIVBIND:
+ *i1 = connp->conn_anon_priv_bind;
+ break;
+ case TCP_EXCLBIND:
+ *i1 = connp->conn_exclbind ? TCP_EXCLBIND : 0;
+ break;
+ default:
+ return (-1);
+ }
+ break;
+ default:
+ return (-1);
+ }
+ return (sizeof (int));
+}
+
+static int conn_opt_set_socket(conn_opt_arg_t *coa, t_scalar_t name,
+ uint_t inlen, uchar_t *invalp, boolean_t checkonly, cred_t *cr);
+static int conn_opt_set_ip(conn_opt_arg_t *coa, t_scalar_t name,
+ uint_t inlen, uchar_t *invalp, boolean_t checkonly, cred_t *cr);
+static int conn_opt_set_ipv6(conn_opt_arg_t *coa, t_scalar_t name,
+ uint_t inlen, uchar_t *invalp, boolean_t checkonly, cred_t *cr);
+static int conn_opt_set_udp(conn_opt_arg_t *coa, t_scalar_t name,
+ uint_t inlen, uchar_t *invalp, boolean_t checkonly, cred_t *cr);
+static int conn_opt_set_tcp(conn_opt_arg_t *coa, t_scalar_t name,
+ uint_t inlen, uchar_t *invalp, boolean_t checkonly, cred_t *cr);
+
+/*
+ * This routine sets the most common socket options including some
+ * that are transport/ULP specific.
+ * It returns errno or zero.
+ *
+ * For fixed length options, no sanity check of the passed-in
+ * length is done. It is assumed that the *_optcom_req()
+ * routines do the right thing.
+ */
+int
+conn_opt_set(conn_opt_arg_t *coa, t_scalar_t level, t_scalar_t name,
+ uint_t inlen, uchar_t *invalp, boolean_t checkonly, cred_t *cr)
+{
+ ASSERT(MUTEX_NOT_HELD(&coa->coa_connp->conn_lock));
+
+ /* We have different functions for different levels */
+ switch (level) {
+ case SOL_SOCKET:
+ return (conn_opt_set_socket(coa, name, inlen, invalp,
+ checkonly, cr));
+ case IPPROTO_IP:
+ return (conn_opt_set_ip(coa, name, inlen, invalp,
+ checkonly, cr));
+ case IPPROTO_IPV6:
+ return (conn_opt_set_ipv6(coa, name, inlen, invalp,
+ checkonly, cr));
+ case IPPROTO_UDP:
+ return (conn_opt_set_udp(coa, name, inlen, invalp,
+ checkonly, cr));
+ case IPPROTO_TCP:
+ return (conn_opt_set_tcp(coa, name, inlen, invalp,
+ checkonly, cr));
+ default:
+ return (0);
+ }
+}
+
+/*
+ * Handle SOL_SOCKET
+ * Note that we do not handle SO_PROTOTYPE here. The ULPs that support
+ * it implement their own checks and setting of conn_proto.
+ */
+/* ARGSUSED1 */
+static int
+conn_opt_set_socket(conn_opt_arg_t *coa, t_scalar_t name, uint_t inlen,
+ uchar_t *invalp, boolean_t checkonly, cred_t *cr)
+{
+ conn_t *connp = coa->coa_connp;
+ ip_xmit_attr_t *ixa = coa->coa_ixa;
+ int *i1 = (int *)invalp;
+ boolean_t onoff = (*i1 == 0) ? 0 : 1;
+
+ switch (name) {
+ case SO_ALLZONES:
+ if (IPCL_IS_BOUND(connp))
+ return (EINVAL);
+ break;
+#ifdef SO_VRRP
+ case SO_VRRP:
+ if (secpolicy_ip_config(cr, checkonly) != 0)
+ return (EACCES);
+ break;
+#endif
+ case SO_MAC_EXEMPT:
+ if (secpolicy_net_mac_aware(cr) != 0)
+ return (EACCES);
+ if (IPCL_IS_BOUND(connp))
+ return (EINVAL);
+ break;
+ case SO_MAC_IMPLICIT:
+ if (secpolicy_net_mac_implicit(cr) != 0)
+ return (EACCES);
+ break;
+ }
+ if (checkonly)
+ return (0);
+
+ mutex_enter(&connp->conn_lock);
+ /* Here we set the actual option value */
+ switch (name) {
+ case SO_DEBUG:
+ connp->conn_debug = onoff;
+ break;
+ case SO_KEEPALIVE:
+ connp->conn_keepalive = onoff;
+ break;
+ case SO_LINGER: {
+ struct linger *lgr = (struct linger *)invalp;
+
+ if (lgr->l_onoff) {
+ connp->conn_linger = 1;
+ connp->conn_lingertime = lgr->l_linger;
+ } else {
+ connp->conn_linger = 0;
+ connp->conn_lingertime = 0;
+ }
+ break;
+ }
+ case SO_OOBINLINE:
+ connp->conn_oobinline = onoff;
+ coa->coa_changed |= COA_OOBINLINE_CHANGED;
+ break;
+ case SO_REUSEADDR:
+ connp->conn_reuseaddr = onoff;
+ break;
+ case SO_DONTROUTE:
+ if (onoff)
+ ixa->ixa_flags |= IXAF_DONTROUTE;
+ else
+ ixa->ixa_flags &= ~IXAF_DONTROUTE;
+ coa->coa_changed |= COA_ROUTE_CHANGED;
+ break;
+ case SO_USELOOPBACK:
+ connp->conn_useloopback = onoff;
+ break;
+ case SO_BROADCAST:
+ connp->conn_broadcast = onoff;
+ break;
+ case SO_SNDBUF:
+ /* ULP has range checked the value */
+ connp->conn_sndbuf = *i1;
+ coa->coa_changed |= COA_SNDBUF_CHANGED;
+ break;
+ case SO_RCVBUF:
+ /* ULP has range checked the value */
+ connp->conn_rcvbuf = *i1;
+ coa->coa_changed |= COA_RCVBUF_CHANGED;
+ break;
+ case SO_RCVTIMEO:
+ case SO_SNDTIMEO:
+ /*
+ * Pass these two options through for use by third-party
+ * protocols.
+ */
+ break;
+ case SO_DGRAM_ERRIND:
+ connp->conn_dgram_errind = onoff;
+ break;
+ case SO_RECVUCRED:
+ connp->conn_recv_ancillary.crb_recvucred = onoff;
+ break;
+ case SO_ALLZONES:
+ connp->conn_allzones = onoff;
+ coa->coa_changed |= COA_ROUTE_CHANGED;
+ if (onoff)
+ ixa->ixa_zoneid = ALL_ZONES;
+ else
+ ixa->ixa_zoneid = connp->conn_zoneid;
+ break;
+ case SO_TIMESTAMP:
+ connp->conn_recv_ancillary.crb_timestamp = onoff;
+ break;
+#ifdef SO_VRRP
+ case SO_VRRP:
+ connp->conn_isvrrp = onoff;
+ break;
+#endif
+ case SO_ANON_MLP:
+ connp->conn_anon_mlp = onoff;
+ break;
+ case SO_MAC_EXEMPT:
+ connp->conn_mac_mode = onoff ?
+ CONN_MAC_AWARE : CONN_MAC_DEFAULT;
+ break;
+ case SO_MAC_IMPLICIT:
+ connp->conn_mac_mode = onoff ?
+ CONN_MAC_IMPLICIT : CONN_MAC_DEFAULT;
+ break;
+ case SO_EXCLBIND:
+ connp->conn_exclbind = onoff;
+ break;
+ }
+ mutex_exit(&connp->conn_lock);
+ return (0);
+}
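conn_opt_set_socket() works in two phases: a first switch that only validates (and may fail with an errno), an early return when `checkonly` is set, and a second switch that commits the value under `conn_lock`. A minimal userland sketch of that shape follows. `conn_sketch_t`, `OPT_KEEPALIVE`, `OPT_ALLZONES` and `opt_set_sketch` are hypothetical names, and the locking is only indicated in a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection state; the real conn_t has many more fields. */
typedef struct conn_sketch {
	bool bound;
	bool allzones;
	bool keepalive;
} conn_sketch_t;

/* Hypothetical option names for this sketch only. */
enum { OPT_KEEPALIVE = 1, OPT_ALLZONES = 2 };

static int
opt_set_sketch(conn_sketch_t *c, int name, int value, bool checkonly)
{
	bool onoff = (value != 0);

	assert(c != NULL);

	/* Phase 1: reject invalid requests without modifying any state. */
	switch (name) {
	case OPT_ALLZONES:
		if (c->bound)
			return (EINVAL);
		break;
	}
	if (checkonly)
		return (0);

	/*
	 * Phase 2: commit the value. The kernel code holds conn_lock
	 * around this switch; the sketch is single-threaded.
	 */
	switch (name) {
	case OPT_KEEPALIVE:
		c->keepalive = onoff;
		break;
	case OPT_ALLZONES:
		c->allzones = onoff;
		break;
	}
	return (0);
}
```

Keeping validation separate from the commit means a `checkonly` call can answer "would this succeed?" with no side effects, and a failed set leaves the previous value intact.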
+
+/* Handle IPPROTO_IP */
+static int
+conn_opt_set_ip(conn_opt_arg_t *coa, t_scalar_t name, uint_t inlen,
+ uchar_t *invalp, boolean_t checkonly, cred_t *cr)
+{
+ conn_t *connp = coa->coa_connp;
+ ip_xmit_attr_t *ixa = coa->coa_ixa;
+ ip_pkt_t *ipp = coa->coa_ipp;
+ int *i1 = (int *)invalp;
+ boolean_t onoff = (*i1 == 0) ? 0 : 1;
+ ipaddr_t addr = (ipaddr_t)*i1;
+ uint_t ifindex;
+ zoneid_t zoneid = IPCL_ZONEID(connp);
+ ipif_t *ipif;
+ ip_stack_t *ipst = connp->conn_netstack->netstack_ip;
+ int error;
+
+ if (connp->conn_family != AF_INET)
+ return (EINVAL);
+
+ switch (name) {
+ case IP_TTL:
+ /* Don't allow zero */
+ if (*i1 < 1 || *i1 > 255)
+ return (EINVAL);
+ break;
+ case IP_MULTICAST_IF:
+ if (addr == INADDR_ANY) {
+ /* Clear */
+ ifindex = 0;
+ break;
+ }
+ ipif = ipif_lookup_addr(addr, NULL, zoneid, ipst);
+ if (ipif == NULL)
+ return (EHOSTUNREACH);
+ /* not supported by the virtual network iface */
+ if (IS_VNI(ipif->ipif_ill)) {
+ ipif_refrele(ipif);
+ return (EINVAL);
+ }
+ ifindex = ipif->ipif_ill->ill_phyint->phyint_ifindex;
+ ipif_refrele(ipif);
+ break;
+ case IP_NEXTHOP: {
+ ire_t *ire;
+
+ if (addr == INADDR_ANY) {
+ /* Clear */
+ break;
+ }
+ /* Verify that the next-hop is on-link */
+ ire = ire_ftable_lookup_v4(addr, 0, 0, IRE_ONLINK, NULL, zoneid,
+ NULL, MATCH_IRE_TYPE, 0, ipst, NULL);
+ if (ire == NULL)
+ return (EHOSTUNREACH);
+ ire_refrele(ire);
+ break;
+ }
+ case IP_OPTIONS:
+ case T_IP_OPTIONS: {
+ uint_t newlen;
+
+ if (ipp->ipp_fields & IPPF_LABEL_V4)
+ newlen = inlen + ((ipp->ipp_label_len_v4 + 3) & ~3);
+ else
+ newlen = inlen;
+ if ((inlen & 0x3) || newlen > IP_MAX_OPT_LENGTH) {
+ return (EINVAL);
+ }
+ break;
+ }
+ case IP_PKTINFO: {
+ struct in_pktinfo *pktinfo;
+
+ /* Two different valid lengths */
+ if (inlen != sizeof (int) &&
+ inlen != sizeof (struct in_pktinfo))
+ return (EINVAL);
+ if (inlen == sizeof (int))
+ break;
+
+ pktinfo = (struct in_pktinfo *)invalp;
+ if (pktinfo->ipi_spec_dst.s_addr != INADDR_ANY) {
+ switch (ip_laddr_verify_v4(pktinfo->ipi_spec_dst.s_addr,
+ zoneid, ipst, B_FALSE)) {
+ case IPVL_UNICAST_UP:
+ case IPVL_UNICAST_DOWN:
+ break;
+ default:
+ return (EADDRNOTAVAIL);
+ }
+ }
+ if (!ip_ifindex_valid(pktinfo->ipi_ifindex, B_FALSE, ipst))
+ return (ENXIO);
+ break;
+ }
+ case IP_BOUND_IF:
+ ifindex = *(uint_t *)i1;
+
+ /* Just check it is ok. */
+ if (!ip_ifindex_valid(ifindex, B_FALSE, ipst))
+ return (ENXIO);
+ break;
+ }
+ if (checkonly)
+ return (0);
+
+ /* Here we set the actual option value */
+ /*
+ * conn_lock protects the bitfields, and is used to
+ * set the fields atomically. Not needed for ixa settings since
+ * the caller has an exclusive copy of the ixa.
+ * We can not hold conn_lock across the multicast options though.
+ */
+ switch (name) {
+ case IP_OPTIONS:
+ case T_IP_OPTIONS:
+ /* Save options for use by IP. */
+ mutex_enter(&connp->conn_lock);
+ error = optcom_pkt_set(invalp, inlen,
+ (uchar_t **)&ipp->ipp_ipv4_options,
+ &ipp->ipp_ipv4_options_len);
+ if (error != 0) {
+ mutex_exit(&connp->conn_lock);
+ return (error);
+ }
+ if (ipp->ipp_ipv4_options_len == 0) {
+ ipp->ipp_fields &= ~IPPF_IPV4_OPTIONS;
+ } else {
+ ipp->ipp_fields |= IPPF_IPV4_OPTIONS;
+ }
+ mutex_exit(&connp->conn_lock);
+ coa->coa_changed |= COA_HEADER_CHANGED;
+ coa->coa_changed |= COA_WROFF_CHANGED;
+ break;
+
+ case IP_TTL:
+ mutex_enter(&connp->conn_lock);
+ ipp->ipp_unicast_hops = *i1;
+ mutex_exit(&connp->conn_lock);
+ coa->coa_changed |= COA_HEADER_CHANGED;
+ break;
+ case IP_TOS:
+ case T_IP_TOS:
+ mutex_enter(&connp->conn_lock);
+ if (*i1 == -1) {
+ ipp->ipp_type_of_service = 0;
+ } else {
+ ipp->ipp_type_of_service = *i1;
+ }
+ mutex_exit(&connp->conn_lock);
+ coa->coa_changed |= COA_HEADER_CHANGED;
+ break;
+ case IP_MULTICAST_IF:
+ ixa->ixa_multicast_ifindex = ifindex;
+ ixa->ixa_multicast_ifaddr = addr;
+ coa->coa_changed |= COA_ROUTE_CHANGED;
+ break;
+ case IP_MULTICAST_TTL:
+ ixa->ixa_multicast_ttl = *invalp;
+ /* Handled automatically by ip_output */
+ break;
+ case IP_MULTICAST_LOOP: