Name: CRAWDAD cmu/hotspot
Creator: Jeffrey Pang
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Wireless Networking

Abstract

Dataset of all visible APs of 13 hotspot locations in Seattle, WA over one week.

We measured the performance and application support of all visible APs at 13 hotspot locations around University Avenue, Seattle, WA, near the University of Washington over the course of 1 week.

date/time of measurement start: 2009-10-07

date/time of measurement end: 2009-10-15

collection environment: Users expect Internet connectivity wherever they travel and many of their devices, such as iPods and wireless cameras, rely on local area Wi-Fi access points (APs) to obtain connectivity. Even smart phone users may employ Wi-Fi instead of 3G and WiMAX to improve the performance of bandwidth intensive applications or to avoid data charges. Fortunately, there is often a large selection of commercial APs to choose from. For example, JiWire (http://www.jiwire.com/), a hotspot directory, reports 395 to 1,071 commercial APs in each of the top ten U.S. metropolitan areas. Nonetheless, some users report that some APs block applications and have poorer than advertised performance, so selecting the best commercial AP is not always straightforward. To verify these reports, we present a measurement study of commercial APs in hotspot settings. We measure APs from the perspective of a typical Wi-Fi user who is inside an establishment. Our study examines the performance and application support of all visible APs at 13 hotspot locations around University Avenue, Seattle, WA, near the University of Washington over the course of 1 week. All locations are single-room coffee or tea shops. Most APs we measured are not open. In addition to each hotspot’s official AP, the APs of hotspots nearby are also usually visible. APs of the free public seattle wifi network are sometimes visible at all locations. APs belonging to the University of Washington network are sometimes visible due to proximity to campus buildings, though these were never the best performing at any location. Our study offers a lower bound on the number and diversity of APs, as more may become available.

network configuration: We collected measurements with a commodity laptop with an Atheros 802.11b/g miniPCI card attached to the laptop’s internal antennas. We implemented a custom wireless network manager for associating to APs and performing measurements after association. Our implementation is based on the Mark-and-Sweep war driving tool, which is described in “Mark-and-sweep: getting the inside scoop on neighborhood networks” (IMC, 2008) by D. Han, A. Agarwala, D. G. Andersen and M. Kaminsky.

data collection methodology: Measurements were performed as follows:
* For each location (loc_persistent.loc_name), we performed a number of trials. Each trial is identified by trial.id.
* During each trial, we sat down at the location and scanned for all visible BSSIDs with SNR > 10 dB. Then, serially, we performed a measurement test on each visible BSSID. Each measurement test is identified by ap.id.
* During each measurement test, we performed a sequence of tests:
  1) We first attempt to associate and obtain a DHCP address. This test uses the wicrawl associate_and_dhcp plugin.
  2) If successful, we then check whether there is a web portal that we must bypass to obtain Internet connectivity. We also perform a number of local scans to discover clients on the LAN (ARP scan, UPnP scan, Bonjour scan, CIFS scan). This test uses the wicrawl portal_check plugin.
  3) Once we obtain Internet connectivity, we perform the remainder of the tests with the following wicrawl plugins: bandwidth_up, bandwidth_down, tcp_bw, traceroute, natcheck, port_check (UDP upload, UDP download, TCP up/down, traceroute, NAT type, and jitter + loss + port block checking, respectively).
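The control flow of one trial, as described above, can be sketched roughly as follows. This is an illustrative outline only, not the authors' measurement code: the function run_trial, the run_plugin callback, and the shape of the scan input are hypothetical, and only the plugin names and the 10 dB threshold come from the text.

```python
from typing import Callable, Dict, List

SNR_THRESHOLD_DB = 10  # from the text: only BSSIDs with SNR > 10 dB are tested

# Plugin names as listed in the methodology; grouping them in a list is our choice.
POST_PORTAL_PLUGINS = ["bandwidth_up", "bandwidth_down", "tcp_bw",
                       "traceroute", "natcheck", "port_check"]


def run_trial(scan: Dict[str, float],
              run_plugin: Callable[[str, str], bool]) -> Dict[str, List[str]]:
    """Return, for each tested BSSID, the plugins that were run, in order.

    scan maps BSSID -> SNR in dB (hypothetical input format).
    run_plugin(bssid, plugin) is a hypothetical callback returning True on success.
    """
    results: Dict[str, List[str]] = {}
    for bssid, snr in scan.items():          # BSSIDs are tested one after another
        if snr <= SNR_THRESHOLD_DB:
            continue                         # too weak to count as visible
        ran = ["associate_and_dhcp"]
        if run_plugin(bssid, "associate_and_dhcp"):
            ran.append("portal_check")
            if run_plugin(bssid, "portal_check"):
                # Internet connectivity obtained: run the remaining tests.
                for plugin in POST_PORTAL_PLUGINS:
                    run_plugin(bssid, plugin)
                    ran.append(plugin)
        results[bssid] = ran
    return results
```

Each stage gates the next, which is why later columns in the ap table can be empty for tests that failed at association, DHCP, or the portal.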

Traceset

sql_tables

Traceset of all visible APs of 13 hotspot locations in Seattle, WA over one week.

file: wifireports-udistrict-20081007-anon.tar.gz
description: We measured the performance and application support of all visible APs at 13 hotspot locations around University Avenue, Seattle, WA, near the University of Washington over the course of 1 week.
measurement purpose: Network Diagnosis, Network Performance Analysis

sql_tables Traces

ap: Database table of each measurement trial on APs from 13 hotspot locations in Seattle, WA over one week.

configuration: Our measurement data is stored in several relational database tables. It is distributed as an SQL file that you should be able to import into the relational database of your choice (we use MySQL). We assume in this document that the database is called wifi. The tables are as follows:
+------------------+
| Tables_in_wifi   |
+------------------+
| ap               | - data on each measurement trial on APs
| ap_persistent    | - unique AP BSSID
| arp_devices      | - MAC addresses that responded to ARP queries
| bad_measurements | - measurement trial data that is flawed
| loc_persistent   | - data on each location
| local_scans      | - local scan measurement data
| plugin_output    | - actual wicrawl output (empty if anonymized)
| tcp_ports        | - tcp port scan measurement data
| trial            | - data on each trial at each location
| udp_ports        | - udp port scan measurement data
| wifi_info        | - estimated loss data (unused)
+------------------+
format: Data from each measurement test is saved in ap, local_scans, tcp_ports, udp_ports, as follows. In the table ap:
id - unique ID for measurement test
random_mac - unused (ignore this field)
trial_id - trial.id that this measurement belongs to
scantime - time scanning started
bssid - BSSID of the AP that we are testing
ssid - SSID of the AP that we are testing
channel - 802.11 channel that we are on
power - median SNR of beacons that we measured
kismet_packets - number of beacons that we measured
kismet_best_signal - highest SNR of beacons that we measured
kismet_mean_signal - mean SNR of beacons that we measured
kismet_best_noise - lowest noise of beacons that we measured
kismet_mean_noise - mean noise of beacons that we measured
kismet_median_noise - median noise of beacons that we measured
encryption - AP's type of encryption
rates - AP's supported rates
output_file - file in the filesystem with wicrawl output
mitm_file - SSL proxy log file (for portal login)
pcap_file - pcap file in the filesystem
monitor_file - monitor mode pcap file in the filesystem
associate_success - true if association succeeded
associate_tries - number of association tries
associate_fail_reason - reason association failed
dhcp_success - true if DHCP succeeded
dhcp_tries - number of DHCP tries
ip - IP address assigned to us
gateway - IP address of the gateway
name_servers - nameservers assigned to us
domain_name - domain name of the local domain
realip - external facing IP address (non-NATed)
portal_exists - true if there is a login portal to bypass
portal_title - title of the login portal HTML page
portal_refresh_url - HTTP refresh URL for portal, if any
portal_fetch_time - time it took to fetch the portal page
portal_tries - number of tries to fetch the portal page
google_fetch_time - time it took to fetch google.com after portal page
portal_success - true if we bypassed the portal page (or none existed)
portal_fail_reason - reason we failed to bypass the portal page
nat_type - type of NAT, as reported by STUN
udp_bw_up - UDP upload measurement in Mbps
udp_bw_down - UDP download measurement in Mbps
tcp_bw_up - TCP upload measurement in Mbps
tcp_bw_down - TCP download measurement in Mbps
ping_type - type of ping used for RTT and loss measurements
rtt_avg - mean RTT to our measurement server
rtt_std_dev - stddev of the RTT to our measurement server
loss_rate - ping loss rate to our measurement server
wifi_loss_ping_type - type of ping used for estimating wifi loss
wifi_loss_target_type - whether our wifi loss estimate pinged the AP (gw) or our measurement server (server)
wifi_loss_rate_big - ping loss rate with 1500B packets, 802.11 retries disabled
wifi_loss_rate_small - ping loss rate with 40B packets, 802.11 retries disabled
dns_check1 - whether we succeeded in fetching a CMU DNS name
dns_check2 - same as above (see port_check plugin)
traceroute - output of traceroute to our measurement server (null if anonymized)
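As a rough illustration of how ap.trial_id links a measurement test to its trial and location, the sketch below computes the mean TCP download rate per location. It uses Python's built-in sqlite3 as a stand-in for MySQL, creates only the handful of columns it needs, and fills them with made-up rows that are not taken from the dataset. Storing true as 1 is an assumption about the encoding, and mean_tcp_down_by_location is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trial (id INTEGER PRIMARY KEY, loc_name TEXT);
CREATE TABLE ap (id INTEGER PRIMARY KEY, trial_id INTEGER, bssid TEXT,
                 associate_success INTEGER, tcp_bw_down REAL);
-- Made-up rows for illustration only; they are not taken from the dataset.
INSERT INTO trial VALUES (1, 'shop_a'), (2, 'shop_b');
INSERT INTO ap VALUES (1, 1, '00:00:00:00:00:01', 1, 2.0),
                      (2, 1, '00:00:00:00:00:02', 1, 4.0),
                      (3, 2, '00:00:00:00:00:03', 0, NULL),
                      (4, 2, '00:00:00:00:00:04', 1, 1.5);
""")


def mean_tcp_down_by_location(conn):
    """Mean TCP download (Mbps) per location, over tests that associated."""
    rows = conn.execute("""
        SELECT t.loc_name, AVG(a.tcp_bw_down)
        FROM ap AS a
        JOIN trial AS t ON a.trial_id = t.id
        WHERE a.associate_success = 1        -- assumption: true is stored as 1
          AND a.tcp_bw_down IS NOT NULL
        GROUP BY t.loc_name
        ORDER BY t.loc_name
    """).fetchall()
    return dict(rows)


per_location = mean_tcp_down_by_location(conn)
```

The same SELECT should work against the imported MySQL database, provided the boolean columns are encoded the way this sketch assumes.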

ap_persistent: Database table of unique AP BSSID from 13 hotspot locations in Seattle, WA over one week.

format: We also summarize information about each AP BSSID that we ever saw in ap_persistent:
bssid - BSSID of AP
ssid - SSID of AP
encryption - encryption of AP
associate_successes - number of association successes over all trials
dhcp_successes - number of DHCP successes over all trials
portal_exists - true if AP has a login portal
portal_successes - number of portal click-through successes over all trials
requires_payment - true if AP requires payment to use
requires_purchase - true if we have to buy something to use the AP
requires_membership - true if we have to be a member of some organization to use the AP (typically this means University of Washington)

arp_devices: Database table of MAC addresses that responded to ARP queries, from 13 hotspot locations in Seattle, WA over one week.

format: The ARP scan also produces auxiliary information, stored in the arp_devices table, that lists all MAC addresses that responded to the ARP scan during each measurement test:
mac - MAC address that responded
ap_id - ap.id that identifies this measurement test

bad_measurements: Database table of flawed measurement trial data from 13 hotspot locations in Seattle, WA over one week.

format: Finally, there were known errors in some measurements and we list those in the bad_measurements table:
ap_id - ap.id that identifies this measurement test
associate - true if association test has errors
dhcp - true if DHCP test has errors
portal - true if portal test has errors
nat - true if NAT test has errors
udp_bw_up - true if UDP upload test has errors
udp_bw_down - true if UDP download test has errors
tcp_bw_up - true if TCP upload test has errors
tcp_bw_down - true if TCP download test has errors
ping - true if ping RTT/jitter/loss test has errors
wifi_ping - true if wifi loss test has errors
dns - true if DNS check test has errors
traceroute - true if traceroute has errors
upnp - true if UPnP scan has errors
mdns - true if mDNS scan has errors
arp - true if ARP scan has errors
tcp_ports - true if TCP port block check has errors
udp_ports - true if UDP port block check has errors
comments - user entered comments about errors
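One plausible way to use bad_measurements is to drop flagged values before analysis. The sketch below does this for tcp_bw_down with a LEFT JOIN, so that tests without any bad_measurements row are kept. It again uses sqlite3 as a stand-in for MySQL with made-up rows. It assumes true is stored as 1 and that each ap_id appears at most once in bad_measurements; clean_tcp_down is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ap (id INTEGER PRIMARY KEY, tcp_bw_down REAL);
CREATE TABLE bad_measurements (ap_id INTEGER, tcp_bw_down INTEGER);
-- Made-up rows for illustration only; they are not taken from the dataset.
INSERT INTO ap VALUES (1, 2.5), (2, 0.1), (3, 3.5);
INSERT INTO bad_measurements VALUES (2, 1), (3, 0);
""")


def clean_tcp_down(conn):
    """Return (ap.id, tcp_bw_down) for tests whose TCP download is not flagged."""
    return conn.execute("""
        SELECT a.id, a.tcp_bw_down
        FROM ap AS a
        LEFT JOIN bad_measurements AS b ON b.ap_id = a.id
        -- No matching row (NULL) or a false flag (0) both mean "not flagged".
        WHERE COALESCE(b.tcp_bw_down, 0) = 0
        ORDER BY a.id
    """).fetchall()


clean_rows = clean_tcp_down(conn)
```

Filtering per column rather than per row keeps a test's other measurements usable when only one of its sub-tests was flawed.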

loc_persistent: Database table of data on each location from 13 hotspot locations in Seattle, WA over one week.

format: Each location is described in the loc_persistent table:
loc_name - canonical name of location
latitude - GPS latitude (arbitrary from measurements)
longitude - GPS longitude (arbitrary from measurements)
trials - number of trials taken at this location
duration_mean - average time to complete a trial
duration_min - min time to complete a trial
duration_max - max time to complete a trial
official_ssids - comma-separated list of "official" SSIDs for hotspot

local_scans: Database table of local scan measurement data from 13 hotspot locations in Seattle, WA over one week.

format: Data from each measurement test is saved in ap, local_scans, tcp_ports, udp_ports, as follows. In the local_scans table:
ap_id - ap.id that identifies this measurement test
before_portal - true if we performed the scan before bypassing the login portal (sometimes we test before and after)
upnp_ips - number of distinct IPs advertising UPnP services
upnp_total - total number of UPnP services advertised
mdns_ips - number of distinct IPs advertising an mDNS name
mdns_workstations - number of ._workstation._tcp.local mDNS names
mdns_total - number of mDNS names total
smb_hosts - number of responding CIFS clients (samba)
arp - number of distinct local IP addresses responding to ARP (excluding our IP and the AP's IP)
arp_macs - number of distinct MAC addresses responding to ARP (excluding our MAC and the AP's MAC)
arp_scanned - number of local IP addresses scanned in the ARP scan
upnp_output - output of UPnP scanner (null if anonymized)
mdns_output - output of mDNS scanner (null if anonymized)
smb_output - output of SMB scanner (null if anonymized)
arp_output - output of nmap ARP scanner (null if anonymized)

tcp_ports: Database table of tcp port scan measurement data from 13 hotspot locations in Seattle, WA over one week.

format: Data from each measurement test is saved in ap, local_scans, tcp_ports, udp_ports, as follows. In the tcp_ports and udp_ports tables:
ap_id - ap.id that identifies this measurement test
udp### - y if some probes to port ### succeeded, n otherwise
tcp### - same as above, but r if redirected to a man-in-the-middle
udp###_time - RTT of probes to port ###
tcp###_time - time to set up the TCP connection to port ###
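To make the y / n / r encoding concrete, the sketch below tallies the outcome of probes to one TCP port across measurement tests. The column name tcp80 is an assumption: the documentation only gives the pattern tcp###, so check the imported schema for the ports actually present. The rows are made up, sqlite3 stands in for MySQL, and port_status_counts is a hypothetical helper name.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical column tcp80, following the documented tcp### pattern.
CREATE TABLE tcp_ports (ap_id INTEGER, tcp80 TEXT, tcp80_time REAL);
-- Made-up rows for illustration only; they are not taken from the dataset.
INSERT INTO tcp_ports VALUES (1, 'y', 0.05), (2, 'r', 0.01),
                             (3, 'n', NULL), (4, 'y', 0.07);
""")

# Meaning of each code, as given in the field descriptions.
LABELS = {"y": "succeeded", "n": "no_success", "r": "redirected"}


def port_status_counts(conn, column):
    """Count probe outcomes for one tcp### column of tcp_ports."""
    # Validate the name before interpolating it: identifiers cannot be bound
    # as SQL parameters.
    if not (column.startswith("tcp") and column[3:].isdigit()):
        raise ValueError(f"unexpected column name: {column!r}")
    rows = conn.execute(f"SELECT {column} FROM tcp_ports").fetchall()
    return dict(Counter(LABELS.get(value, "unknown") for (value,) in rows))


counts = port_status_counts(conn, "tcp80")
```

For udp_ports the same approach applies with the udp### columns, where only y and n are documented.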

trial: Database table of data on each trial at each location from 13 hotspot locations in Seattle, WA over one week.

format: For each trial, we save the following data:
id - unique ID for the trial
unixtime - time the trial started
success - whether the trial succeeded or was marked a failure by the user
latitude - GPS latitude
longitude - GPS longitude
duration - how long it took to run all the measurement tests
loc_name - where the trial took place (refers to loc_persistent.loc_name)
comments - any user entered comments about the trial
mac - measurement client's MAC address
ifconfig - the output of ifconfig after the trial
iwconfig - the output of iwconfig after the trial
netstat - the output of netstat after the trial
group_name - group that this trial belongs to (usually groups trials by day)
dir - directory in the filesystem with the log files from this trial
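If unixtime is a count of seconds since the Unix epoch, which the field name suggests but the documentation does not state, trials can be grouped by calendar day as in the sketch below. Grouping in UTC is our choice for illustration; the measurements were taken in Seattle, so local-time days may differ, and group_name may already provide the authors' own grouping. The function trials_per_day and the sample tuples are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def trials_per_day(trials):
    """Group (id, unixtime, loc_name) tuples by UTC calendar day.

    Assumes unixtime is in seconds since 1970-01-01 00:00:00 UTC.
    """
    by_day = defaultdict(list)
    for trial_id, unixtime, loc_name in trials:
        day = datetime.fromtimestamp(unixtime, tz=timezone.utc).date().isoformat()
        by_day[day].append((trial_id, loc_name))
    return dict(by_day)


# Made-up timestamps near the epoch, chosen only to keep the example obvious.
grouped = trials_per_day([(1, 0, "shop_a"), (2, 3600, "shop_b"), (3, 86400, "shop_a")])
```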

udp_ports: Database table of udp port scan measurement data from 13 hotspot locations in Seattle, WA over one week.

format: Data from each measurement test is saved in ap, local_scans, tcp_ports, udp_ports, as follows. In the tcp_ports and udp_ports tables:
ap_id - ap.id that identifies this measurement test
udp### - y if some probes to port ### succeeded, n otherwise
tcp### - same as above, but r if redirected to a man-in-the-middle
udp###_time - RTT of probes to port ###
tcp###_time - time to set up the TCP connection to port ###

Instructions:

The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort.

About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing.

CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022.

Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques.

Please acknowledge the source of the Data in any publications or presentations reporting use of this Data.

Citation:

Jeffrey Pang, cmu/hotspot, https://doi.org/10.15783/C7RP4W, Date: 20090415

Dataset Files

wifireports-udistrict-20081007-anon.tar.gz (2.18 MB)

Documentation

Attachment	Size
cmu-hotspot-readme.txt	1.55 KB

These datasets are part of Community Resource for Archiving Wireless Data (CRAWDAD). CRAWDAD began in 2004 at Dartmouth College as a place to share wireless network data with the research community. Its purpose was to enable access to data from real networks and real mobile users at a time when collecting such data was challenging and expensive. The archive has continued to grow since its inception, and starting in summer 2022 is being housed on IEEE DataPort.
