# /etc/wgetrc is the default location of the global startup file; .wgetrc is the user startup file.
# How to Download a Website Using wget
wget -r www.dlsite.com
#This downloads the pages recursively up to a maximum of 5 levels deep.
#Five levels deep might not be enough to get everything from the site. You can use the -l switch to set the number of levels you wish to go to as follows:
wget -r -l10 www.dlsite.com
#If you want infinite recursion you can use the following:
wget -r -l inf www.dlsite.com
# How to Download Certain File Types
wget -A "*.mp3" -r www.dlsite.com
#The reverse of this is to ignore certain files. Perhaps you don't want to download executables. In this case, you would use the following syntax:
wget -R "*.exe" -r www.dlsite.com
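#The -A/-R lists follow a simple rule (spelled out under --accept/--reject below): an element containing any of the wildcard characters *, ?, [ or ] is matched as a glob pattern against the file name; anything else is treated as a plain suffix. A rough sketch of that rule in portable shell (matches_element is illustrative, not wget code):

```shell
# Sketch of wget's documented accept/reject rule; matches_element
# is an illustrative helper, not part of wget itself.
matches_element() {
  file=$1 elem=$2
  case $elem in
    *[]*?[]*)   # element contains a wildcard: glob match
      case $file in
        $elem) return 0 ;;
        *)     return 1 ;;
      esac ;;
    *)          # no wildcard: plain suffix test
      case $file in
        *"$elem") return 0 ;;
        *)        return 1 ;;
      esac ;;
  esac
}

matches_element song.mp3 '*.mp3' && echo "pattern match"
matches_element song.mp3 .mp3    && echo "suffix match"
matches_element setup.exe .mp3   || echo "no match"
```

#So -A "*.mp3" and -A .mp3 accept the same files; the quoted pattern form just makes the glob explicit.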
#Other Parameters
-b, --background Go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.
-o logfile, --output-file=logfile Log all messages to logfile. The messages are normally reported to standard error.
-a logfile, --append-output=logfile Append to logfile. This option is the same as -o, only it appends to logfile instead of overwriting the old log file. If logfile does not exist, a new file is created.
-q, --quiet Turn off wget's output.
-v, --verbose Turn on verbose output, with all the available data. The default output is verbose.
-nv, --no-verbose Non-verbose output. Turn off verbose without being completely quiet (use -q for that), which means that error messages and basic information still get printed.
-i file, --input-file=file Read URLs from a local or external file. If "-" is specified as file, URLs are read from the standard input. (Use "./-" to read from a file literally named "-".)
-F, --force-html When input is read from a file, force it to be treated as an HTML file. This enables you to retrieve relative links from existing HTML files on your local disk, by adding <base href="url"> to HTML, or using the --base command-line option.
-t number, --tries=number Set number of retries to number. Specify 0 or inf for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried.
-O file, --output-document=file The documents will not be written to the appropriate files, but all will be concatenated together and written to file.
-c, --continue Continue getting a partially-downloaded file. This option is useful when you want to finish up a download started by a previous instance of wget, or by another program. For instance: wget -c ftp://dlsite/filename
--progress=type Select the progress indicator you want to use. Legal indicators are "dot" and "bar".
-N, --timestamping Turn on time stamping. Output file will have timestamp matching remote copy; if file already exists locally, and remote file is not newer, no download will occur.
--no-use-server-timestamps Don't set the local file's timestamp by the one on the server.
-S, --server-response Print the headers sent by HTTP servers and responses sent by FTP servers.
--spider When invoked with this option, wget will behave as a web spider, which means that it will not download the pages, just check that they are there. For example, you can use wget to check your bookmarks: wget --spider --force-html -i bookmarks.html
-T seconds, --timeout=seconds Set the network timeout to seconds seconds. This option is equivalent to specifying --dns-timeout, --connect-timeout, and --read-timeout, all at the same time.
--dns-timeout=seconds Set the DNS lookup timeout to seconds seconds. DNS lookups that don't complete within the specified time will fail. By default, there is no timeout on DNS lookups, other than that implemented by system libraries.
--connect-timeout=seconds Set the connect timeout to seconds seconds. TCP connections that take longer to establish will be aborted. By default, there is no connect timeout, other than that implemented by system libraries.
--read-timeout=seconds Set the read (and write) timeout to seconds seconds. Reads that take longer will fail. The default value for read timeout is 900 seconds.
--limit-rate=amount Limit the download speed to amount bytes per second. The amount may be expressed in bytes, kilobytes (with the k suffix), or megabytes (with the m suffix). For example, --limit-rate=20k will limit the retrieval rate to 20 KB/s. This option is useful when, for whatever reason, you don't want wget to consume the entire available bandwidth.
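#The k and m suffixes here are binary multiples (1k = 1024 bytes), assuming the conventional kilobyte wget's documentation describes. A tiny helper sketch of the conversion (to_bytes is illustrative, not part of wget):

```shell
# Convert a --limit-rate style amount to bytes per second,
# assuming binary multiples (1k = 1024). Illustrative only.
to_bytes() {
  case $1 in
    *k) echo $(( ${1%k} * 1024 )) ;;
    *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}
to_bytes 20k   # 20480, the bytes/s cap set by --limit-rate=20k
```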
-w seconds, --wait=seconds Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the m suffix, in hours using h suffix, or in days using d suffix.
--waitretry=seconds If you don't want wget to wait between every retrieval, but only between retries of failed downloads, you can use this option. wget will use linear backoff, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify. Therefore, a value of 10 will actually make wget wait up to (1 + 2 + ... + 10) = 55 seconds per file. By default, wget will assume a value of 10 seconds.
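#The arithmetic above is just a linear backoff sum. A small sketch of the worst-case wait per file (total_wait is an illustrative helper, not wget code):

```shell
# Cumulative wait implied by --waitretry=N: wget backs off
# linearly, so the worst case per file is 1 + 2 + ... + N seconds.
total_wait() {
  n=$1 sum=0 i=1
  while [ "$i" -le "$n" ]; do
    sum=$(( sum + i ))
    i=$(( i + 1 ))
  done
  echo "$sum"
}
total_wait 10   # 55, matching the figure quoted above for the default of 10
```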
--random-wait Some websites may perform log analysis to identify retrieval programs such as wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0 and 2*wait seconds, where wait was specified using the --wait option, to mask wget's presence from such analysis.
--no-dns-cache Turn off caching of DNS lookups. Normally, wget remembers the addresses it looked up from DNS so it doesn't have to repeatedly contact the DNS server for the same (typically small) set of addresses it retrieves. This cache exists in memory only; a new wget run will contact DNS again.
--retry-connrefused Consider "connection refused" a transient error and try again. Normally wget gives up on a URL when it is unable to connect to the site because failure to connect is taken as a sign that the server is not running at all and that retries would not help. This option is for mirroring unreliable sites whose servers tend to disappear for short periods of time.
--user=user, --password=password Specify the username user and password for both FTP and HTTP file retrieval. These parameters can be overridden using the --ftp-user and --ftp-password options for FTP connections and the --http-user and --http-password options for HTTP connections.
--ask-password Prompt for a password for each connection established. Cannot be specified when --password is being used, because they are mutually exclusive.
--unlink Force wget to unlink the file instead of clobbering the existing file. This option is useful for downloading to a directory with hardlinks.
-nd, --no-directories Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the file names will get extensions .n).
-x, --force-directories The opposite of -nd; create a hierarchy of directories, even if one would not have been created otherwise. For example, wget -x http://fly.srk.fer.hr/robots.txt will save the downloaded file to fly.srk.fer.hr/robots.txt.
-nH, --no-host-directories Disable generation of host-prefixed directories. By default, invoking wget with -r http://dlsite/ will create a structure of directories beginning with dlsite/. This option disables such behaviour.
--protocol-directories Use the protocol name as a directory component of local file names. For example, with this option, wget -r http://host will save to http/host/... rather than just to host/....
--cut-dirs=number Ignore number directory components. This option is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
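#For example, combined with -nH, a file retrieved from http://host/pub/sub/dir/file would be saved as dir/file under --cut-dirs=2. A sketch of that trimming in portable shell (local_path is illustrative, not wget code; --cut-dirs only removes directory components, never the file name itself):

```shell
# Sketch of the path trimming done by -nH --cut-dirs=N:
# -nH drops the host component, then N leading directory
# components are cut. local_path is illustrative only.
local_path() {
  path=$1 n=$2
  while [ "$n" -gt 0 ] && [ "${path#*/}" != "$path" ]; do
    path=${path#*/}     # drop one leading directory component
    n=$(( n - 1 ))
  done
  echo "$path"
}
local_path pub/sub/dir/file 0   # pub/sub/dir/file  (plain -nH)
local_path pub/sub/dir/file 2   # dir/file          (-nH --cut-dirs=2)
```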
--http-user=user, --http-password=password Specify the username user and password password on an HTTP server. According to the challenge, wget will encode them using either the "basic" (insecure) or the "digest" authentication scheme.
--ignore-length Unfortunately, some HTTP servers (CGI programs, to be more precise) send out bogus "Content-Length" headers, which makes wget start to bray like a stuck pig, as it thinks not all the document was retrieved. You can spot this syndrome if wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte. With this option, wget ignores the "Content-Length" header, as if it never existed.
--private-key=file Read the private key from file. This option allows you to provide the private key in a file separate from the certificate.
--private-key-type=type Specify the type of the private key. Accepted values are PEM (the default) and DER.
-r, --recursive Turn on recursive retrieving.
-l depth, --level=depth Specify recursion maximum depth level depth. The default maximum depth is 5.
-K, --backup-converted When converting a file, backup the original version with an .orig suffix. Affects the behavior of -N.
-m, --mirror Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
-p, --page-requisites This option causes wget to download all the files that are necessary to properly display a given HTML page, including such things as inlined images, sounds, and referenced stylesheets. Ordinarily, when downloading a single HTML page, any requisite documents that may be needed to display it properly are not downloaded. Using -r together with -l can help, but since wget does not ordinarily distinguish between external and inlined documents, one is generally left with "leaf documents" that are missing their requisites.
-A acclist, --accept acclist; -R rejlist, --reject rejlist Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters *, ?, [ or ] appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix.
-D domain-list, --domains=domain-list Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.
--exclude-domains domain-list Specify the domains that are not to be followed.
--follow-ftp Follow FTP links from HTML documents. Without this option, wget will ignore all the FTP links.
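#Putting several of the options above together, a polite mirroring invocation might look like the following. The URL is a placeholder; the command is only printed here, nothing is fetched:

```shell
# A combined mirror command assembled from the options described
# above (-m, -p, --wait, --random-wait, -A). Placeholder URL;
# printed rather than run, so this sketch performs no download.
cmd='wget -m -p --wait=1 --random-wait -A "*.html,*.css" https://example.com/'
echo "$cmd"
```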