1 / 10

Efficient Data Filtering Techniques for Beginners

Learn how to filter large data sets effectively using commands like head, tail, sort, wc, uniq, and more. Follow along with examples using text files on GitHub. Enhance your data processing skills today!

joleary
Download Presentation

Efficient Data Filtering Techniques for Beginners

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lab 7: Filtering

  2. What is filtering? • Filtering is a process by which a large set of data is restricted by certain conditions to make the data set smaller

  3. Examples Files on GitHub • There are three text files that I am using to demonstrate filtering techniques • You can download these onto your VMs if you want to follow along • random-words.txt • random-alphanumeric.txt • random-passwords.txt

  4. Head • The head command will output the first part of a file • The syntax is head [OPTIONS] [FILE] • Example: • head -5 random-words.txt • Head -1 random-passwords.txt

  5. Tail • The tail command will output the last part of a file • The syntax is tail [options] [file] • Examples: • tail -5 random-words.txt • tail -1 random-passwords.txt

  6. Sorting • Sorting is a common filtering technique • There is a built-in sort command • The syntax is sort [options] [file] • Example: • sort random-passwords.txt • This will sort the contents of the file alphabetically

  7. More Sorting • There are numerous options that you can use with the sort command • Some common options: • -r: this will reverse the sorting • -c: this will check to see if the contents are already sorted • -o: let’s you specify an output file for sorting

  8. Word Count • The wc command will print out information about word count, lines, and bytes in a file • wc random-passwords.txt • Optional Arguments: • -c: display the number of bytes in the file • -l: display the number of newline characters in the file • -w: display the number of words in the file

  9. Unique Items • To see unique items of duplicates in a file, you can use the uniq command • The syntax is uniq [options] [file] • uniq dups.txt • Note: this command only works when duplicate items are adjacent to each other, run the sort command first before using uniq

  10. More & Less • The more and less commands can be used to help with reading large files • They will display as much information as they can in the terminal and then you can scroll through the rest of it at your own leisure

More Related