Cisco SFS InfiniBand Host Drivers User Guide for Linux
Release 3.2.0
June 2007

Americas Headquarters
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
http://www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 527-0883

Text Part Number: OL-12309-01...
Contents

Preface
  Audience
  Organization
  Conventions
  Root and Non-root Conventions in Examples
  Related Documentation
  Obtaining Documentation, Obtaining Support, and Security Guidelines

CHAPTER 1 About Host Drivers
  Introduction
  Architecture
  Supported Protocols
    IPoIB
  Supported APIs
    MVAPICH MPI
    uDAPL
    Intel MPI
    HP MPI
  HCA Utilities and Diagnostics

CHAPTER 2 Installing Host Drivers...
  Subinterfaces
    Creating a Subinterface Associated with a Specific IB Partition
    Removing a Subinterface Associated with a Specific IB Partition
  Verifying IPoIB Functionality
  IPoIB Performance
  Sample Startup Configuration File
  IPoIB High Availability
    Merging Physical Ports
    Unmerging Physical Ports

CHAPTER 4 SCSI RDMA Protocol
  Introduction
  Configuring SRP...
CHAPTER 7 MVAPICH MPI
  Introduction
  Initial Setup
  Configuring SSH
  Editing Environment Variables
    Setting Environment Variables in System-Wide Startup Files
    Editing Environment Variables in the User's Shell Startup Files
    Editing Environment Variables Manually
  MPI Bandwidth Test Performance
  MPI Latency Test Performance
  Intel MPI Benchmarks (IMB) Test Performance
  Compiling MPI Programs...
Preface

This preface describes who should read the Cisco SFS InfiniBand Host Drivers User Guide for Linux, how it is organized, and its document conventions. It includes the following sections:

• Audience, page vii
• Organization, page vii
• Conventions, page viii
•...
Conventions

This document uses the following conventions:

Convention
boldface font
italic font
Menu1 > Menu2 > Item…
{ x | y | z }
[ x | y | z ]
string
screen font
boldface screen font
italic screen...
Notes use the following convention:

Note Means reader take note. Notes contain helpful suggestions or references to material not covered in the manual.

Cautions use the following convention:

Caution Means reader be careful. In this situation, you might do something that could result in equipment damage or loss of data.
Obtaining Documentation, Obtaining Support, and Security Guidelines...
CHAPTER 1 About Host Drivers

This chapter describes host drivers and includes the following sections:

• Introduction, page 1-1
• Architecture, page 1-2
• Supported Protocols, page 1-3
• Supported APIs, page 1-4
• HCA Utilities and Diagnostics, page 1-4

Note For expansions of acronyms and abbreviations used in this publication, see Appendix A, “Acronyms and Abbreviations.”...
Note See the “Root and Non-root Conventions in Examples” section on page ix for the significance of prompts used in the examples in this chapter.

Architecture

Figure 1-1 displays the software architecture of the protocols and APIs that HCAs support. The figure displays ULPs and APIs in relation to other IB software elements.
Supported Protocols

This section describes the supported protocols and includes the following topics:

• IPoIB
•...

Note Protocol here refers to software in the networking layer in kernel space.

IPoIB

The IPoIB protocol passes IP traffic over the IB network. Configuring IPoIB requires steps similar to those for configuring IP on an Ethernet network.
Supported APIs

This section describes the supported APIs and includes the following topics:

• MVAPICH MPI
• uDAPL
• Intel MPI
• HP MPI

Note API refers to software in the networking layer in user space.

MVAPICH MPI

MPI is a standard message-passing library interface, with bindings for C, C++, and Fortran, that can be used to implement message-passing programs.
CHAPTER 2 Installing Host Drivers

This chapter includes the following sections:

• Introduction, page 2-1
• Contents of ISO Image, page 2-2
• Installing Host Drivers from an ISO Image, page 2-2
• Uninstalling Host Drivers from an ISO Image, page 2-3

Note See the “Root and Non-root Conventions in Examples”...
Contents of ISO Image

The ISO image contains the following directories and files:

• docs/
  This directory contains the related documents.
• tsinstall
  This is the installation script.
• redhat/
  This directory contains the binary RPMs for Red Hat Enterprise Linux.
• suse/...
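The listing above can double as a pre-installation check. The following is a minimal sketch, assuming the ISO image is already mounted (for example, with mount -o loop) and that tsinstall is run from the image root without arguments; the function names and the DRY_RUN switch are hypothetical helpers for illustration, not part of the driver package.

```shell
# Sketch only: verify the ISO layout listed above, then run tsinstall from it.
# Assumptions: image already mounted; tsinstall takes no arguments.
# DRY_RUN=1 prints commands instead of running them (hypothetical convention).

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

check_iso_layout() {
  local root=$1 missing=0 d
  for d in docs redhat suse; do
    [ -d "$root/$d" ] || { echo "missing directory: $d/" >&2; missing=1; }
  done
  [ -f "$root/tsinstall" ] || { echo "missing file: tsinstall" >&2; missing=1; }
  return "$missing"
}

install_from_iso() {
  local root=$1
  check_iso_layout "$root" || return 1
  # Run from the image root, in a subshell so the caller's directory is kept.
  (cd "$root" && run ./tsinstall)
}
```

For example, after mounting the image on /mnt, host1# install_from_iso /mnt checks the layout and starts the installation script.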
Uninstalling Host Drivers from an ISO Image...
CHAPTER 3 IP over IB Protocol

This chapter describes the IP over IB (IPoIB) protocol and includes the following sections:

• Introduction, page 3-1
• Manually Configuring IPoIB for Default IB Partition, page 3-2
• Subinterfaces, page 3-2
• Verifying IPoIB Functionality, page 3-5
•...
Manually Configuring IPoIB for Default IB Partition

To manually configure IPoIB for the default IB partition, perform the following steps:

Step 1 Log in to your Linux host.

Step 2 To configure the interface, enter the ifconfig command with the following items:

• The appropriate IB interface (ib0 or ib1 on a host with one HCA)
•...
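The manual configuration step can be wrapped in a small function. This is a sketch, assuming the remaining ifconfig items are an IPv4 address and a netmask, and that a trailing up is wanted; configure_ipoib and DRY_RUN are hypothetical names, and the netmask in the usage line is a placeholder.

```shell
# Sketch only: assign an IPv4 address and netmask to an IPoIB interface.
# Assumed items: interface, address, netmask; "up" brings the link up.
# DRY_RUN=1 prints the command instead of running it (running needs root).

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

configure_ipoib() {
  local iface=$1 addr=$2 mask=$3
  case $iface in
    ib[0-9]*) ;;
    *) echo "not an IB interface: $iface" >&2; return 2 ;;
  esac
  if [ -z "$addr" ] || [ -z "$mask" ]; then
    echo "usage: configure_ipoib IFACE ADDRESS NETMASK" >&2; return 2
  fi
  run ifconfig "$iface" "$addr" netmask "$mask" up
}
```

For example, host1# configure_ipoib ib0 192.168.0.1 255.255.252.0 configures ib0; with DRY_RUN=1 it only prints the ifconfig command line.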
Creating a Subinterface Associated with a Specific IB Partition

To create a subinterface associated with a specific IB partition, perform the following steps:

Step 1 Create a partition on an IB SFS. Alternatively, you can create the partition of the IB interface on the host first, and then create the partition for the ports on the IB SFS.
...NOARP
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b)

Verify that you see the ib0.8002 output.

Step 6 Configure the new interface just as you would the parent interface. (See the “Manually Configuring IPoIB for Default IB Partition”...
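The ib0.8002 name in the output above suggests a naming rule: the parent interface, a dot, and the P_Key as four hexadecimal digits. The sketch below applies that rule, which is inferred from this one example rather than documented, and checks for the interface under /sys/class/net; both function names are hypothetical.

```shell
# Sketch only: derive a subinterface name from a parent and P_Key, following
# the inferred ib0.8002 pattern. A bare value such as 8002 is read as hex.

subif_name() {
  local parent=$1 hex=${2#0[xX]}
  case $hex in
    ''|*[!0-9a-fA-F]*) echo "invalid P_Key: $2" >&2; return 2 ;;
  esac
  # A P_Key is 16 bits, so at most four hex digits.
  if [ "${#hex}" -gt 4 ]; then echo "P_Key out of range: $2" >&2; return 2; fi
  printf '%s.%04x\n' "$parent" "$((0x$hex))"
}

# Succeeds if the kernel currently has a network interface with this name.
subif_exists() { [ -e "/sys/class/net/$1" ]; }
```

For example, subif_exists "$(subif_name ib0 0x8002)" reports whether ib0.8002 has been created.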
IPoIB Performance

This section describes how to verify IPoIB performance by running the Bandwidth test and the Latency test. These tests are described in detail at the following URL:

http://www.netperf.org/netperf/training/Netperf.html

To verify IPoIB performance, perform the following steps:

Step 1 Download Netperf from the following URL:

http://www.netperf.org/netperf/NetperfPage.html...
Step 5 Run the Netperf Latency test.

Run the test once, and stop the server so that it does not repeat the test. The following example shows how to run the Latency test, and then stop the Netperf server:

host2$ netperf -H 192.168.0.1 -c -C -t TCP_RR -- -r 1,1
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.1 (192.168.0.1) port 0 AF_INET...
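To run both tests the same way each time, the invocations can be kept in two small functions. The latency flags are copied from the example above (-c and -C report local and remote CPU use, and -r 1,1 sets 1-byte request and response sizes); using the TCP_STREAM test for bandwidth is an assumption, and the function names and DRY_RUN switch are hypothetical.

```shell
# Sketch only: Netperf wrappers for the IPoIB tests. Start netserver on the
# server host first. DRY_RUN=1 prints the command instead of running it.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Request/response latency test, flags as in the example above.
netperf_latency()   { run netperf -H "$1" -c -C -t TCP_RR -- -r 1,1; }

# Bulk-transfer bandwidth test (TCP_STREAM is an assumed choice).
netperf_bandwidth() { run netperf -H "$1" -c -C -t TCP_STREAM; }
```

For example, after starting netserver on host1, run host2$ netperf_latency 192.168.0.1.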
Sample Startup Configuration File

IP addresses that are configured manually are not persistent across reboots. You must use a configuration file to configure IPoIB when the host boots. Two sample configurations are included in this section. The following sample configuration shows an example file named ifcfg-ib0 that resides on a Linux host in /etc/sysconfig/network-scripts/ on RHEL3 and RHEL4.
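A startup file like this can be generated instead of typed by hand. The sketch below writes the usual Red Hat network-scripts keys (DEVICE, BOOTPROTO, IPADDR, NETMASK, ONBOOT), which is an assumption about what your sample file needs; compare the result with the sample configuration before you use it. The function name is hypothetical, and the target directory is a parameter so the sketch can be tried outside /etc/sysconfig/network-scripts.

```shell
# Sketch only: write a static ifcfg-<iface> file into DIR.
# The key set below is assumed; check it against the sample in this section.

write_ifcfg() {
  if [ $# -ne 4 ]; then
    echo "usage: write_ifcfg DIR IFACE ADDRESS NETMASK" >&2; return 2
  fi
  local dir=$1 iface=$2 addr=$3 mask=$4
  cat > "$dir/ifcfg-$iface" <<EOF
DEVICE=$iface
BOOTPROTO=static
IPADDR=$addr
NETMASK=$mask
ONBOOT=yes
EOF
}
```

For example, host1# write_ifcfg /etc/sysconfig/network-scripts ib0 192.168.0.1 255.255.252.0 creates ifcfg-ib0 (the netmask is a placeholder).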
Step 3 Take the interfaces offline. You cannot merge interfaces until you bring them down.

The following example shows how to take the interfaces offline:

host1# ifconfig ib0 down
host1# ifconfig ib1 down

Step 4 Merge the two ports into one virtual IPoIB high availability port by entering the ipoibcfg merge command with the IB identifiers of the first and the second IB ports on the HCA.
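The merge procedure runs its steps in a fixed order, so it can be scripted. In this sketch the ipoibcfg path comes from this chapter, but passing the two interface names (ib0 ib1) as the IB identifiers to ipoibcfg merge is an assumption; check the identifiers your release expects. The function name and the DRY_RUN switch are hypothetical.

```shell
# Sketch only: take both ports down, merge them, then list the result.
# The "merge ib0 ib1" argument form is assumed, not confirmed.
# DRY_RUN=1 prints the commands instead of running them.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

merge_ipoib_ports() {
  local first=${1:-ib0} second=${2:-ib1}
  local ipoibcfg=/usr/local/topspin/sbin/ipoibcfg
  # Interfaces must be down before they can be merged.
  run ifconfig "$first" down || return 1
  run ifconfig "$second" down || return 1
  run "$ipoibcfg" merge "$first" "$second" || return 1
  run "$ipoibcfg" list
}
```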
Step 3 Display the available interfaces by entering the ipoibcfg list command.

The following example shows how to display the available interfaces:

host1# /usr/local/topspin/sbin/ipoibcfg list
ib0 (P_Key 0xffff) (SL:255) (Ports: InfiniHost0/1, Active: InfiniHost0/1)
ib1 (P_Key 0xffff) (SL:255) (Ports: InfiniHost0/2, Active: InfiniHost0/2)

Step 4 Enable the interfaces by entering the ifconfig command with the appropriate IB interface argument and the up argument.
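When checking the result of a merge or unmerge, it helps to reduce the ipoibcfg list output to the interface, P_Key, and active port. The sketch below parses the line layout shown in the example above; other releases might print it differently, and the function name is hypothetical.

```shell
# Sketch only: print "interface P_Key active_port" for each line of
# "ipoibcfg list" output read from standard input.

parse_ipoibcfg_list() {
  sed -n 's/^\([^ ]*\) (P_Key \(0x[0-9a-fA-F]*\)).*Active: \([^)]*\)).*$/\1 \2 \3/p'
}
```

For example, host1# /usr/local/topspin/sbin/ipoibcfg list | parse_ipoibcfg_list prints one line per interface, such as ib0 0xffff InfiniHost0/1.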
CHAPTER 4 SCSI RDMA Protocol

This chapter describes the SCSI RDMA Protocol (SRP) and includes the following sections:

• Introduction, page 4-1
• Configuring SRP, page 4-1
• Verifying SRP, page 4-7

Note See the “Root and Non-root Conventions in Examples” section on page ix for the significance of prompts used in the examples in this chapter.
Configuring SRP

This section describes how to configure your IB fabric to connect an SRP host to a SAN and includes the following topics:

• Configuring ITLs when Using Fibre Channel Gateway, page 4-2
• Configuring SRP Host, page 4-6
•...
Step 3 Bring up the Fibre Channel gateways on your SFS by performing the following steps:

a. Launch Element Manager.
b. Double-click the Fibre Channel gateway card that you want to bring up. The Fibre Channel Card window opens.
Configuring ITLs with Element Manager while Global Policy Restrictions Apply

This section describes how to configure ITLs with Element Manager while global policy restrictions apply. These instructions apply to environments where the portmask policy and LUN masking policy are both restricted.
Step 9 Click the Next > button. The Define New SRP Host window displays a recommended WWNN for the host and recommended WWPNs that represent the host on all existing and potential Fibre Channel gateway ports.

Note Although you can manually configure the WWNN or WWPNs, we recommend that you use the default values to avoid conflicts.
Two additional Sun Model T4 SRP LUNs are available after the configuration is complete.

Model: ST373307LC Rev: 0006
ANSI SCSI revision: 03
Model: GEM318P Rev: 1
ANSI SCSI revision: 02
Model: ST373307LC Rev: 0006...
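One quick way to confirm that new LUNs appeared is to count the devices in /proc/scsi/scsi before and after the configuration. This sketch assumes one Model: entry per device, as in the listing above; the function name is hypothetical.

```shell
# Sketch only: count SCSI devices (one "Model:" entry each) in /proc/scsi/scsi
# or in a saved copy passed as the first argument.

count_scsi_devices() {
  grep -c 'Model:' "${1:-/proc/scsi/scsi}" || true   # grep exits 1 on zero matches
}
```

For example, record before=$(count_scsi_devices) before you configure SRP, and afterward compare it with $(count_scsi_devices); in the example above, the difference would be 2.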
Verifying SRP

This section describes how to verify SRP functionality and verify SRP host-to-storage connections with the Element Manager GUI and includes the following sections:

• Verifying SRP Functionality, page 4-7
• Verifying with Element Manager, page 4-8
•...
512000 inodes, 1023996 blocks
51199 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1048576000
32 block groups
32768 blocks per group, 32768 fragments per group
16000 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Creating journal (8192 blocks): done...
CHAPTER 5 Sockets Direct Protocol

This chapter describes the Sockets Direct Protocol and includes the following sections:

• Introduction, page 5-1
• Configuring IPoIB Interfaces, page ...
• Converting Sockets-Based Application, page 5-2
• SDP Performance, page 5-4
• Netperf Server with IPoIB and SDP, page 5-6

Note See the “Root and Non-root Conventions in Examples”...
Converting Sockets-Based Application

This section describes how to convert sockets-based applications. You can convert your sockets-based applications to use SDP instead of TCP by using one of two conversion types. This section includes the following topics:

• Explicit/Source Code Conversion Type, page 5-2
•...
Log Statement

This section describes the log statement. The log directive allows the user to specify which debug and error messages are sent and where they are sent. The log statement format is as follows:

log [destination stderr | syslog | file filename] [min-level 1-9]

Command
destination stderr...
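To avoid typos in the log statement, it can be generated from its two parts. This sketch only builds a line in the format shown above and rejects values outside it; it always emits both parts even though the format marks them optional, and where the line is placed in the SDP configuration is not covered here. The function name is hypothetical.

```shell
# Sketch only: build "log destination ... min-level N" from DEST LEVEL [FILE].

sdp_log_statement() {
  local dest=$1 level=$2 file=$3 target
  case $dest in
    stderr|syslog) target="destination $dest" ;;
    file)
      [ -n "$file" ] || { echo "file destination needs a filename" >&2; return 2; }
      target="destination file $file" ;;
    *) echo "unknown destination: $dest" >&2; return 2 ;;
  esac
  case $level in
    [1-9]) ;;
    *) echo "min-level must be 1-9" >&2; return 2 ;;
  esac
  echo "log $target min-level $level"
}
```

For example, sdp_log_statement syslog 4 prints log destination syslog min-level 4.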
...shared program

The ip_port parameter matches against an IP address, prefix length, and port range. The format is as follows:

ip_addr[/prefix_length][:start_port[-end_port]]

The prefix length is optional; if it is missing, it defaults to /32 (the length of one host). The ending port in the range is optional; if it is missing, it defaults to the starting port.
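The matching rule can be made concrete with a small evaluator. This sketch implements the defaults described above (missing prefix means /32, missing end port means the start port) for IPv4 dotted-quad input; treating a rule with no port part as matching every port is an assumption, because the text does not say. The function names are hypothetical.

```shell
# Sketch only: does ADDR:PORT match an ip_port rule?
# Usage: ip_port_matches RULE ADDR PORT   (exit status 0 on match)

# Dotted quad to 32-bit integer; the subshell body keeps the IFS change local.
ip_to_int() ( IFS=.; set -- $1; echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 )) )

ip_port_matches() {
  local rule=$1 addr=$2 port=$3
  local host=${rule%%:*} ports= net prefix=32 mask start end
  case $rule in *:*) ports=${rule#*:} ;; esac
  net=${host%%/*}
  case $host in */*) prefix=${host#*/} ;; esac       # default stays /32
  case $prefix in ''|*[!0-9]*) return 2 ;; esac
  [ "$prefix" -le 32 ] || return 2
  mask=$(( $prefix == 0 ? 0 : (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$addr") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ] || return 1
  [ -n "$ports" ] || return 0                        # assumed: no ports = any port
  start=${ports%%-*}
  end=${ports#*-}                                    # no "-" leaves end = start
  [ "$port" -ge "$start" ] && [ "$port" -le "$end" ]
}
```

For example, the rule 192.168.0.0/24:5000-5010 matches 192.168.0.7 on port 5005, while the rule 192.168.0.1:5000 does not match port 5001.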
...
Socket  Socket
Size    Size
bytes   bytes

87380   16384...

The following list describes the parameters for the netperf command:

• 192.168.0.1...

The notable performance values in the example above are as follows:

• Throughput is 6.60 gigabits per second.
• Client CPU utilization is 23.79 percent of the client CPU.
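The notable values can be pulled out of a Netperf bandwidth result automatically. This sketch assumes the usual TCP_STREAM result line produced with -c -C: receive socket size, send socket size, send message size, elapsed time, throughput in 10^6 bits/s, local CPU %, remote CPU %, then service demand. Throughput is divided by 1000 to give gigabits per second, matching the units quoted above. The function name is hypothetical.

```shell
# Sketch only: summarize a Netperf TCP_STREAM result line read from stdin.
# Assumed columns: 1-4 sizes/time, 5 throughput (10^6 bits/s), 6-7 CPU %.

netperf_stream_summary() {
  awk 'NF >= 7 && $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9.]+$/ {
         printf "throughput=%.2f Gb/s client_cpu=%s%% server_cpu=%s%%\n", $5 / 1000, $6, $7
       }'
}
```

Header lines are skipped because their first field is not numeric, so the complete netperf output can be piped in.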
The notable performance values in the example above are as follows:

• Client CPU utilization is 6.26 percent of the client CPU.
• Server CPU utilization is 7.22 percent of the server CPU.
• Latency is 18.01 microseconds.

Latency is calculated as follows:

(1 / Transaction rate per second) / 2 * 1,000,000 = one-way average latency in microseconds

Step 7 To end the test, shut down the Netperf server.
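The formula above can be applied directly to the transaction rate that a TCP_RR test reports. Each transaction is one request and one response, so 1 / rate is the average round-trip time in seconds, and half of it is the one-way latency. This is a minimal sketch; the function name is hypothetical.

```shell
# Sketch only: one-way average latency in microseconds from a TCP_RR rate.

one_way_latency_us() {
  awk -v rate="$1" 'BEGIN {
    if (rate <= 0) { print "rate must be positive" > "/dev/stderr"; exit 2 }
    printf "%.2f\n", (1 / rate) / 2 * 1000000
  }'
}
```

For example, one_way_latency_us 25000 prints 20.00.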
The following list describes the parameters for the netperf command:

• 192.168.0.1...

The notable performance values in the example above are as follows:

• Throughput is 6.60 gigabits per second.
• Client CPU utilization is 23.79 percent of the client CPU.
• Server CPU utilization is 21.37 percent of the server CPU.
CHAPTER 6 uDAPL

This chapter describes uDAPL and includes the following sections:

• Introduction, page 6-1
• uDAPL Test Performance, page 6-1
• Compiling uDAPL Programs, page 6-4

Note See the “Root and Non-root Conventions in Examples” section on page ix for the significance of prompts used in the examples in this chapter.

Introduction

uDAPL defines a single set of user-level APIs for all RDMA-capable transports.
uDAPL Throughput Test Performance

The Throughput test measures RDMA WRITE throughput using uDAPL. To run the uDAPL Throughput test, perform the following steps:

Step 1 Start the Throughput test on the server host. The syntax for the server is as follows:

/usr/local/topspin/bin/thru_server.x device_name RDMA_size iterations batch_size

The following example shows how to start the Throughput test on the server host:...
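The server syntax above takes three numeric arguments after the device name, so a wrapper can catch mistyped values before the test starts. This is a sketch; the function name, the DRY_RUN switch, and the argument values in the usage line are hypothetical.

```shell
# Sketch only: start thru_server.x after checking that RDMA_SIZE, ITERATIONS,
# and BATCH_SIZE are positive integers. DRY_RUN=1 prints the command instead.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

start_thru_server() {
  if [ $# -ne 4 ]; then
    echo "usage: start_thru_server DEVICE RDMA_SIZE ITERATIONS BATCH_SIZE" >&2; return 2
  fi
  local device=$1 n
  shift
  for n in "$@"; do
    case $n in
      ''|*[!0-9]*|0) echo "not a positive integer: $n" >&2; return 2 ;;
    esac
  done
  run /usr/local/topspin/bin/thru_server.x "$device" "$@"
}
```

For example, host1$ start_thru_server ib0 262144 500 100 (placeholder values) starts the server on ib0.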
Step 3 View the Throughput test results from the server.

The following example shows the Throughput test results:

Created an EP with ep_handle = 0x2a95f8a300
queried max_recv_dtos = 256
queried max_request_dtos = 1024
Accept issued...
Received an event on ep_handle = 0x2a95f8a300
Context = 29a
Connected!
received rmr_context = 1b3b78 target_address = 95e3a000 segment_length = 40000...
host2$ /usr/local/topspin/bin/lat_client.x ib0 192.168.0.1 200000 1 0

• ib0 is the name of the device.
• 192.168.0.1 is the IPoIB address of the server host.
• 200000 is the number of RDMAs to perform for the test.
• 1 is the size in bytes of the RDMA WRITE.
•...
Initial Setup

This section describes the initial MPI setup. MPI can be used with either IPoIB or Ethernet IP addresses. The drivers for MPI are automatically loaded at boot time if IPoIB or SDP is loaded. If neither IPoIB nor SDP is used, the MPI drivers can still be loaded at boot time.
To configure SSH, perform the following steps:

Step 1 Log in to the host that you want to configure as the local host, host1.

The following example shows how to log in to the host:

login: username
Password: password
host1$

Note...
Step 5 Change into the .ssh directory that you created.

The following example shows how to change into the .ssh directory:

host1$ cd .ssh

Step 6 Copy the public key that was just generated to the authorized keys file.

The following example shows how to copy the public key to the authorized keys file:

host1$ cp id_dsa.pub authorized_keys
host1$ chmod 0600 authorized_keys...
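Steps 5 and 6 can be combined into one function. Unlike the cp command above, this sketch appends the key, so keys that are already authorized are kept. The 0700 mode on the .ssh directory is common OpenSSH practice rather than a step from this guide, and the function name is hypothetical.

```shell
# Sketch only: add a public key to SSH_DIR/authorized_keys with safe modes.
# Usage: install_authorized_key PUBKEY [SSH_DIR]   (SSH_DIR defaults to ~/.ssh)

install_authorized_key() {
  local pubkey=$1 ssh_dir=${2:-$HOME/.ssh}
  [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }
  mkdir -p "$ssh_dir" && chmod 0700 "$ssh_dir" || return 1
  # Append instead of overwrite so existing authorized keys survive.
  cat "$pubkey" >> "$ssh_dir/authorized_keys" || return 1
  chmod 0600 "$ssh_dir/authorized_keys"
}
```

For example, host1$ install_authorized_key ~/.ssh/id_dsa.pub performs Steps 5 and 6 in one command.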
Step 9 Return to host1 and copy the authorized keys file to the .ssh directory on host2.

The following example shows how to return to host1 and copy the authorized keys file to the directory that was created:

host1$ scp authorized_keys host2:.ssh

Note If this is the first time you have logged in to host2 using SSH or SCP, you see an authenticity message for host2.
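Before you run MPI jobs, it is worth confirming that each remote host accepts the key without asking for a password. With OpenSSH's BatchMode=yes option, ssh fails instead of prompting, and the remote command true does nothing. Answer the authenticity message once interactively first, as described in the note above. The function name and DRY_RUN switch are hypothetical.

```shell
# Sketch only: report whether each host accepts passwordless SSH.
# DRY_RUN=1 prints the ssh commands instead of running them.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

check_passwordless_ssh() {
  local host status=0
  for host in "$@"; do
    if run ssh -o BatchMode=yes "$host" true; then
      echo "$host: ok"
    else
      echo "$host: FAILED" >&2; status=1
    fi
  done
  return "$status"
}
```

For example, host1$ check_passwordless_ssh host2 prints host2: ok when the key setup works.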
Setting Environment Variables in System-Wide Startup Files

Use this method to set a system-wide default MPI implementation. It is the easiest method for end users: users who log in automatically have an MPI implementation set up for them without executing any special commands to find MPI executables, such as mpirun or mpicc.
Editing Environment Variables Manually

Typically, you edit environment variables manually when you need to use a given MPI implementation temporarily. For example, when it is not desirable to change the default MPI implementation, you can edit the environment variables manually so that MVAPICH is used in the shell where the variables are set.
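For a manual, per-shell setup, the main change is putting the MVAPICH bin directory at the front of PATH. This sketch defaults to /usr/local/topspin/mpi/mpich/bin, the directory used in the Latency test example in this chapter. Other variables your environment might need are not covered. The function name is hypothetical, and it must run in the current shell (for example, defined in that shell or sourced) to take effect.

```shell
# Sketch only: prepend the MVAPICH bin directory to PATH for this shell.
# Calling it twice does not add a duplicate entry.

use_mvapich() {
  local bindir=${1:-/usr/local/topspin/mpi/mpich/bin}
  case ":$PATH:" in
    *":$bindir:"*) ;;                 # already on PATH: leave it unchanged
    *) PATH=$bindir:$PATH ;;
  esac
  export PATH
}
```

After host1$ use_mvapich, the command command -v mpirun_rsh should resolve to the MVAPICH copy in that shell only.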
When the test completes successfully, you see output that is similar to the following:

# OSU MPI Bandwidth Test (Version 2.2)
# Size
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304

MPI Latency Test Performance

This section describes the MPI Latency test performance.
• The name of the hostfile
• The latency executable name

The following example shows how to run the MVAPICH MPI Latency test:

host1$ mpirun_rsh -np 2 -hostfile /tmp/hostfile \
/usr/local/topspin/mpi/mpich/bin/osu_latency

When the test completes successfully, you see output that is similar to the following:

# OSU MPI Latency Test (Version 2.2)
# Size
1024...
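The Latency test needs a hostfile that names both hosts. This sketch writes one host name per line, which is an assumed hostfile format, and then runs the mpirun_rsh command from the example above. The function name and DRY_RUN switch are hypothetical.

```shell
# Sketch only: write a two-host hostfile and run osu_latency with mpirun_rsh.
# DRY_RUN=1 still writes the hostfile but only prints the mpirun_rsh command.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

run_osu_latency() {
  if [ $# -ne 3 ]; then
    echo "usage: run_osu_latency HOSTFILE HOST1 HOST2" >&2; return 2
  fi
  local hostfile=$1
  shift
  printf '%s\n' "$@" > "$hostfile"    # one host name per line (assumed format)
  run mpirun_rsh -np 2 -hostfile "$hostfile" /usr/local/topspin/mpi/mpich/bin/osu_latency
}
```

For example, host1$ run_osu_latency /tmp/hostfile host1 host2 reproduces the command shown above.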
When your installation is not working properly, the IMB test might fail with VAPI_RETRY_EXEC errors. Check the output of the PingPong, PingPing, and Sendrecv bandwidth measurements against known good results on similar architectures and devices. Low bandwidth values, especially at high numbers of nodes, might indicate either severe congestion or functionality problems within the IB fabric.
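Comparing against known good results is easier if the low values are pulled out automatically. This sketch assumes that the IMB PingPong table follows a "# Benchmarking PingPong" line and has four columns (#bytes, #repetitions, t[usec], Mbytes/sec). It prints message sizes whose bandwidth is below a threshold that you choose from your own reference runs. The function name is hypothetical.

```shell
# Sketch only: print "bytes Mbytes/sec" for PingPong rows below THRESHOLD.
# Usage: imb_low_pingpong THRESHOLD < imb_output.txt

imb_low_pingpong() {
  awk -v thr="$1" '
    /^# Benchmarking/ { in_pp = ($3 == "PingPong"); next }
    in_pp && NF == 4 && $1 ~ /^[0-9]+$/ && $1 > 0 && $4 + 0 < thr + 0 { print $1, $4 }
  '
}
```

Zero-byte rows are skipped because they always report 0 Mbytes/sec. Extending the same check to PingPing or Sendrecv would need their own column layouts.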
When the test completes successfully, you see output similar to the following:

#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V2.3, MPI-1 part
#---------------------------------------------------
# Date : Thu Oct 12 17:48:21 2006
# Machine : x86_64
# System
# Release : 2.6.9-42.ELsmp
# Version : #1 SMP Wed Jul 12 23:32:02 EDT 2006...
Compiling MPI Programs

This section describes how to compile MPI programs. Compiling MPI applications from source code requires adding several compiler and linker flags. MVAPICH MPI provides wrapper compilers that add all appropriate compiler and linker flags to the command line and then invoke the appropriate underlying compiler, such as the GNU or Intel compilers, to perform the compilation and linking.
Step 3 Select the language and compiler of your choice from the selection of compiler wrappers available in Table 7-2.

Table 7-2
Language
...
Fortran 77
Fortran 90

Step 4 Compile the examples as shown here:

host1$ cd $HOME/mpi-examples
host1$ mpicc.i -o hello_c hello_c.c
host1$ mpiCC.i -o hello_cxx hello_cxx.cc
host1$ mpif77.i -o hello_f77 hello_f77.f...
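The three compile commands above follow one pattern: choose the wrapper by source suffix, and name the output after the source file without its suffix. This sketch encodes only the three suffix-to-wrapper pairs shown in Step 4 (.c, .cc, .f); other languages in Table 7-2 are left out because their wrappers are not listed here. The function name and DRY_RUN switch are hypothetical.

```shell
# Sketch only: compile one example with the wrapper used in Step 4.
# DRY_RUN=1 prints the compile command instead of running it.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

compile_mpi_example() {
  local src=$1 wrapper base
  case $src in
    *.c)  wrapper=mpicc.i;  base=${src%.c} ;;
    *.cc) wrapper=mpiCC.i;  base=${src%.cc} ;;
    *.f)  wrapper=mpif77.i; base=${src%.f} ;;
    *) echo "no wrapper mapping for $src" >&2; return 2 ;;
  esac
  run "$wrapper" -o "$base" "$src"
}
```

For example, from $HOME/mpi-examples, for f in hello_c.c hello_cxx.cc hello_f77.f; do compile_mpi_example "$f"; done builds all three examples.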
CHAPTER 8 HCA Utilities and Diagnostics

This chapter describes the HCA utilities and diagnostics and includes the following sections:

• Introduction, page 8-1
• hca_self_test Utility, page 8-1
• tvflash Utility, page 8-3
• Diagnostics, page 8-5

Introduction

The sections in this chapter discuss HCA utilities and diagnostics. These features address basic usability and provide starting points for troubleshooting.
Host Driver Initialization ... PASS
Number of HCA Ports Active ... 2
Port State of Port #0 on HCA #0 ... UP 4X
Port State of Port #1 on HCA #0 ... UP 4X
Error Counter Check on HCA #0 ... PASS
Kernel Syslog Check ...
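On hosts with several HCAs, the self-test output is easier to review if only the problem lines are shown. This sketch splits each line at " ... " and reports checks whose result contains FAIL, plus port-state lines that do not start with UP. The FAIL result string is an assumption, because the example above shows only passing results. The function name is hypothetical.

```shell
# Sketch only: print "check: result" for failing checks in hca_self_test
# output read from standard input.

hca_self_test_problems() {
  awk -F ' [.][.][.] ' '
    NF == 2 && ($2 ~ /FAIL/ || ($1 ~ /^Port State/ && $2 !~ /^UP/)) { print $1 ": " $2 }
  '
}
```

No output means that none of the parsed checks reported a problem.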
tvflash Utility

This section describes the tvflash utility and includes the following topics:

• Viewing Card Type and Firmware Version, page 8-3
• Upgrading Firmware, page 8-4

Note The installation script handles the firmware upgrade automatically. You should not need to upgrade the firmware manually.
Upgrading Firmware

To upgrade firmware on your host, perform the following steps:

Note Upon installation of the host drivers, the firmware is automatically updated, if required. However, if you have outdated firmware on a previously installed HCA, you can upgrade the firmware manually.

Log in to your host, and flash the updated firmware binary to your local device.
Diagnostics

This section includes diagnostics information. A few diagnostic programs are included with the Linux IB host drivers. The vstat utility prints IB information. The following example shows a vstat utility display:

host1# /usr/local/topspin/bin/vstat
1 HCA found:
hca_id=InfiniHost0
pci_location={BUS=0x07,DEV/FUNC=0x00}...
APPENDIX A Acronyms and Abbreviations

Table A-1 defines the acronyms and abbreviations that are used in this guide.

Table A-1 List of Acronyms and Abbreviations

Acronym: GUID, IPoIB, MVAPICH, MPI, OFED, Open MPI, RAID, RDMA, SCSI...
Expansion: Application Program Interface, command-line interface...
Table A-1 List of Acronyms and Abbreviations (continued)

Acronym   Expansion
SSH       Secure Shell Protocol
TCP       Transmission Control Protocol
uDAPL     User Direct Access Programming Library
ULP       upper-level protocol
WWNN      world-wide node name
WWPN      world-wide port name